Tag Archives: Data Profiling
Aug 16, 2012 by Phil Simon
A frequent topic of this blog these days is data tracking. Companies like Amazon, Apple, Facebook, Google, and others are able to extensively log what we’re doing online so they can predict what we will do (or, perhaps more accurately, what we might buy).
Jun 29, 2012 by Dylan Jones
One of the biggest problems facing any company that manages large amounts of tangible assets such as equipment, parts, stock and physical inventory is ensuring that every item is accounted for. I’ve written in the past about how many assets become “stranded” when data quality defects are introduced.
Apr 17, 2012 by David Loshin
Data attributes that have misleading names are like ticking time bombs awaiting the wrong scenario. While we were considering ideas for mitigating an existing issue related to data attributes with names that did not correctly describe the values the attributes held, one of my colleagues noted that the problem was much more insidious than bad naming. Changing the attributes’ names would not work, since apparently there were numerous applications that had been designed based on those same mistaken assumptions.
Apr 10, 2012 by David Loshin
In my last post, we came across what is probably a common problem: data attributes have names that don’t accurately reflect what those attributes store. So why can’t we just modify the name of the attribute to something that makes more sense? The gut-reaction answer is that there are multiple impacts that are not immediately known. The data attribute may be used in many different applications. Changing the name of the attribute is going to make all those applications fail, so duh, that would not make sense.
Apr 03, 2012 by David Loshin
I was recently working on a customer engagement in which a profile of a process’s data set led to the identification of some potential anomalies. These potential issues were related to the use of data values based on the names of the attributes, and what those names presumably meant. However, after some investigation, my direct contact at the customer was made aware that even though the column names implied aspects about how the data values were used, in fact those attributes were not used that way at all, and no one should make any assumptions about the underlying semantics or usage scenarios for the data.
Mar 12, 2012 by Joyce Norris-Montanari
Obviously, the answer must be because of change. Usually, once the requirements are completed the data modeler(s) can start creating the data model for most any application. However, issues seem to always come up that cause change. Maybe the requirements were not signed off on by the right people OR the business analyst forgot something very important. Either of those issues causes delay and change in the data model.
Feb 17, 2012 by Dylan Jones
Someone asked me the following question this week: “Where is the best place to start data profiling for the first time?”
One obvious answer is “where most data quality issues are,” but that’s not much help when you’re starting out, so the advice I gave was to “check your data plumbing.”