There are many dimensions of data quality (completeness, consistency, accuracy, and timeliness, to name just a few of the most common), but one that is commonly overlooked is . . . metadata.
I know—some of you are probably saying:
“Jim, what’s the matter with you? Metadata is not a dimension of data quality.”
With all due respect (as always, of course!), my response is:
“Sometimes the best way to know what’s the matter with your data quality is to ask: what’s the meta with your data?”
What exactly is metadata?
The simplest definition for metadata is “data about data.”
In other words, metadata can be thought of as a label that provides a definition, description, and context for data.
Common examples include relational table definitions and flat file layouts. More detailed examples of metadata include conceptual and logical data models.
Because metadata tells you what a piece of data means, it often plays an integral role, among its many other uses, in determining how that data is used.
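To make the idea of metadata as a label a little more concrete, here is a minimal sketch of a relational table definition written out as plain Python. Everything in it is hypothetical and invented for illustration: the `ColumnMetadata` class, the `CUSTOMER_TABLE` layout, and the `describe` helper are assumptions, not part of any real schema or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    """A label for one column: its definition, description, and context."""
    name: str
    data_type: str
    description: str
    nullable: bool = True


# Hypothetical table definition, made up for this example.
CUSTOMER_TABLE = [
    ColumnMetadata("customer_id", "INTEGER",
                   "Surrogate key for a customer record", nullable=False),
    ColumnMetadata("status", "VARCHAR(10)",
                   "Account status: 'active' or 'closed'", nullable=False),
    ColumnMetadata("last_order_date", "DATE",
                   "Date of the most recent order; empty if never ordered"),
]


def describe(columns):
    """Render each column's metadata as one human-readable line."""
    lines = []
    for col in columns:
        required = "" if col.nullable else ", required"
        lines.append(f"{col.name} ({col.data_type}{required}): {col.description}")
    return lines


for line in describe(CUSTOMER_TABLE):
    print(line)
```

None of this is the data itself. It is only the label on the data, which is exactly why it is so easy to overlook.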
The perfect wrong answer
As Henrik Liliendahl Sørensen recently explained on his blog, the shared understanding of the label (i.e., metadata) attached to many key business metrics can be the real data quality issue associated with the metric. Ignoring this important point can lead to providing the perfect wrong answer to common business questions.
How many watches do you own?
A famous quote, sometimes referred to as Segal’s Law, states that:
“A man with one watch knows what time it is. A man with two watches is never sure.”
When it comes to the metrics used to make (or explain) critical business decisions, I have often seen the “we have too many watches” phenomenon turn out to be the underlying cause of the confusion and contention surrounding the (often conflicting) answers to common business questions, such as:
- How many customers do we have?
- How many products did we sell?
- How much revenue did we generate?
Another example of metadata, then, is a clear definition of what the terms customers, products, and revenue actually mean.
The metadata associated with the data used to form the basis of the answers to these questions can cause a “framing effect” where the answer is correct from a certain point of view (i.e., depending upon which watch you are using to tell time).
Therefore, you should always verify the metadata as well as the data.
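Here is a small sketch of the “too many watches” problem applied to the first question, “How many customers do we have?” The records, the field names, and the four candidate definitions of “customer” are all made up for illustration. The point is only that the same data gives different answers depending on which label you read it through.

```python
from datetime import date

# Hypothetical customer records, invented for this example.
CUSTOMERS = [
    {"id": 1, "status": "active", "last_order": date(2024, 5, 1)},
    {"id": 2, "status": "active", "last_order": None},
    {"id": 3, "status": "closed", "last_order": date(2023, 1, 15)},
    {"id": 4, "status": "active", "last_order": date(2022, 11, 30)},
]

# Each "watch" is one possible definition of the word "customer".
DEFINITIONS = {
    "every account on file": lambda c: True,
    "accounts with active status": lambda c: c["status"] == "active",
    "accounts that have ever ordered": lambda c: c["last_order"] is not None,
    "accounts that ordered since 2024-01-01": lambda c: (
        c["last_order"] is not None and c["last_order"] >= date(2024, 1, 1)
    ),
}


def count_customers(customers, definitions):
    """Count the same records once per definition of 'customer'."""
    return {
        label: sum(1 for c in customers if rule(c))
        for label, rule in definitions.items()
    }


ANSWERS = count_customers(CUSTOMERS, DEFINITIONS)
for label, total in ANSWERS.items():
    print(f"{total} customers, counting {label}")
```

With these four sample records the answers are 4, 3, 3, and 1. Every one of them is correct from a certain point of view, and two of the definitions even agree on the number while disagreeing about which customers they counted.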
It’s okay to own more than one watch. After all, there is more than one time zone. So, when someone asks you what time it is, instead of responding in your local time (correct from your perspective), you should ask them—where in the world are you?
In the comments section of Henrik’s blog post, I paraphrased King Claudius (from Hamlet—after all, it’s been a few posts since I made a Shakespearean reference):
“The labels attached to critical data must not unwatched go.”
What’s the Meta with your Data?
How do you define metadata? How do you use metadata in your organization?
What is the relationship between metadata and data quality?
Do you think metadata is (or should be considered) a dimension of data quality?