The Perils of Mining on Unstable Data Foundations
Jul 27, 2012 by Dylan Jones in Data Management, Data Quality
Data mining has evolved into a critical skill for many organisations that are looking to find new areas for improvement. I’ve always found data mining to be one of the more innovative and “out there” pastimes within corporate data management.
For me it conjures up the Wild West of the data world. You ride in without any promises or commitments and 24 hours later you’ve hit a vein of business performance gold and discovered a pattern within the data that points to some previously undiscovered issue or benefit.
Data mining doesn’t always conform to the (data) rules. Quite often you need to mash up and merge wildly different datasets to find patterns and correlations that were never intended. You’re often transposing and transforming existing data structures into completely new creations in an effort to abstract a new perspective on how the business is operating along a product or service line.
As data volumes grow exponentially, I think we’ll see an ever greater need for data mining desperados to come into town. We need more “data storytellers” to help unravel what’s really happening in the labyrinths of our businesses, but we must always bear in mind the quality of the data on which these discoveries are made.
I’ve seen data miners far more experienced than me be led blindly down a new vein of insight, only to be publicly embarrassed by an underlying lack of data quality. One of the big problems is that data mining forces you to connect the dots, joining disparate datasets together in a bid to tell a bigger story. Whilst this is fine for exposing an initial discovery, what invariably follows is some kind of operational process or report that is left behind to continuously monitor progress.
This is where we hit problems when data quality management is ignored. Data mining should always be free-form in nature. You should not face restrictions when linking and merging, slicing and dicing, transforming and calculating. However, once you move from mining to ongoing operations, you need to understand the limitations of the data and enforce some kind of governance on the rules your joins and transformations imply. Demonstrating the value of data mining is never an easy task, but all your efforts can be wiped out in an instant if stakeholders are publicly embarrassed by the “fool’s gold” your mining processes have mistakenly uncovered.
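To make that last step a little more concrete, here is a minimal sketch of the kind of check you might run before a mined join is promoted into an ongoing report. It is an illustration only, not a prescribed method: the function names `join_key_report` and `is_fit_for_operations`, the `customer_id` field, the sample rows and the 95% match threshold are all assumptions made up for the example.

```python
from collections import Counter


def join_key_report(left_rows, right_rows, key):
    """Summarise how safely left_rows can be joined to right_rows on `key`."""
    left_keys = [row.get(key) for row in left_rows]
    right_keys = [row.get(key) for row in right_rows]

    # Rows with a missing key can never match and quietly drop out of a join.
    present_left = [k for k in left_keys if k not in (None, "")]
    missing_left = len(left_keys) - len(present_left)

    # Duplicate keys on the lookup side fan rows out and inflate totals.
    right_counts = Counter(k for k in right_keys if k not in (None, ""))
    duplicated_right = sorted(k for k, n in right_counts.items() if n > 1)

    # Left keys with no partner on the right are orphans.
    orphans = sorted({k for k in present_left if k not in right_counts})

    matched = sum(1 for k in present_left if k in right_counts)
    match_rate = matched / len(left_keys) if left_keys else 0.0

    return {
        "missing_left": missing_left,
        "duplicated_right": duplicated_right,
        "orphans": orphans,
        "match_rate": match_rate,
    }


def is_fit_for_operations(report, min_match_rate=0.95):
    """Hypothetical gate: the threshold is an assumption, not a standard."""
    return (
        report["match_rate"] >= min_match_rate
        and not report["duplicated_right"]
        and report["missing_left"] == 0
    )


# Made-up sample data for illustration only.
orders = [
    {"customer_id": "C1", "amount": 120},
    {"customer_id": "C2", "amount": 80},
    {"customer_id": "", "amount": 45},
    {"customer_id": "C9", "amount": 60},
]
customers = [
    {"customer_id": "C1", "region": "North"},
    {"customer_id": "C2", "region": "South"},
    {"customer_id": "C2", "region": "East"},
]

report = join_key_report(orders, customers, "customer_id")
ready = is_fit_for_operations(report)
```

During free-form mining you might happily ignore a report like this. The point of the gate is that the same join, once it feeds a recurring report, carries an implied rule (one customer per `customer_id`, every order has a customer) that somebody now has to own.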




