What's the Meta with your Data?
Jan 13, 2010 by Jim Harris in Data Quality, Uncategorized
There are many dimensions of data quality (completeness, consistency, accuracy, and timeliness, to name just a few of the most common). However, one that is commonly overlooked is . . . metadata.
I know—some of you are probably saying:
“Jim, what’s the matter with you? Metadata is not a dimension of data quality.”
With all due respect (as always, of course!), my response is:
“Sometimes, the best way to know what’s the matter with your data quality is to ask: what’s the meta with your data?”
What exactly is metadata?
The simplest definition for metadata is “data about data.”
In other words, metadata can be thought of as a label that provides a definition, description, and context for data.
Common examples include relational table definitions and flat file layouts. More detailed examples of metadata include conceptual and logical data models.
Therefore, metadata—among its many other uses—often plays an integral role in determining your data usage.
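To make the "label" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customer table and its single row are invented purely for illustration. The row is the data; the table definition that SQLite reports back (column names, declared types, and NOT NULL flags) is the metadata.

```python
import sqlite3

# Hypothetical table, created in memory purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " signup_date TEXT)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Acme Corp', '2010-01-13')")

# The data: the rows themselves.
rows = conn.execute("SELECT * FROM customer").fetchall()

# The metadata: the relational table definition.
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = [
    (name, col_type, bool(notnull))
    for _cid, name, col_type, notnull, _default, _pk
    in conn.execute("PRAGMA table_info(customer)")
]

print("data:    ", rows)
print("metadata:", columns)
```

Without the second result, the value '2010-01-13' is just a string; the column label is what tells you it is a signup date.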
The perfect wrong answer
As Henrik Liliendahl Sørensen recently explained on his blog, the shared understanding of the label (i.e., metadata) attached to many key business metrics can represent the real data quality issue associated with the metric. Ignoring this important point can lead to providing the perfect wrong answer to common business questions.
How many watches do you own?
A famous quote, sometimes referred to as Segal’s Law, states that:
“A man with one watch knows what time it is. A man with two watches is never sure.”
When it comes to the metrics used to make (or explain) critical business decisions, I have often witnessed the “we have too many watches” phenomenon being the underlying cause of the confusion and contention surrounding the (often conflicting) answers to common business questions, such as:
- How many customers do we have?
- How many products did we sell?
- How much revenue did we generate?
Therefore, another example of metadata is providing clear definitions of what the terms customers, products, and revenue actually mean.
The metadata associated with the data used to form the basis of the answers to these questions can cause a “framing effect” where the answer is correct from a certain point of view (i.e., depending upon which watch you are using to tell time).
Therefore, you should always verify the metadata as well as the data.
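The "too many watches" effect can be sketched in a few lines of Python. Everything here is an assumption made up for illustration: the records, the as-of date, and the three candidate definitions of "customer." The point is only that the same data gives a different, internally correct answer under each definition.

```python
from datetime import date

# Hypothetical records: (customer_id, account_status, last_purchase_date)
records = [
    (1, "active", date(2009, 11, 2)),
    (2, "active", date(2007, 3, 15)),
    (3, "closed", date(2009, 12, 20)),
    (4, "prospect", None),
    (5, "active", date(2008, 6, 30)),
]

as_of = date(2010, 1, 13)  # assumed reporting date

# Three plausible (invented) definitions of "customer": three different watches.
definitions = {
    "anyone on file": lambda status, last: True,
    "has an active account": lambda status, last: status == "active",
    "purchased in the last 365 days": lambda status, last: (
        last is not None and (as_of - last).days <= 365
    ),
}

# Count the same records under each definition.
counts = {
    label: sum(1 for _id, status, last in records if rule(status, last))
    for label, rule in definitions.items()
}

for label, count in counts.items():
    print(f"How many customers do we have? ({label}): {count}")
```

Each count is right from a certain point of view, which is why the definition needs to travel with the number.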
It’s okay to own more than one watch. After all, there is more than one time zone. So, when someone asks you what time it is, instead of responding in your local time (correct from your perspective), you should ask them—where in the world are you?
In the comments section of Henrik’s blog post, I paraphrased King Claudius (from Hamlet—after all, it’s been a few posts since I made a Shakespearean reference):
“The labels attached to critical data must not unwatched go.”
What’s the Meta with your Data?
How do you define metadata? How do you use metadata in your organization?
What is the relationship between metadata and data quality?
Do you think metadata is (or should be considered) a dimension of data quality?





Charles Blyth
Jan 13, 2010
Jim
Another great post. We have discussed before how important ‘context’ is in the realm of data quality. Metadata is a key entity when it comes to defining and understanding context in data, and therefore I agree, it is an integral part of data quality.
PS: No ‘olde English’ quotes here, are you feeling alright?
Phil Simon
Jan 13, 2010
Interesting post, Jim.
This line:
“A man with one watch knows what time it is. A man with two watches is never sure.”
…made me think of the NFL adage:
If you have two starting quarterbacks, you have none.
Rob Paller
Jan 13, 2010
Jim,
I think metadata transcends many data initiatives. The trouble with metadata is that it can be very time consuming just trying to get consensus on what exactly a product, customer, or vendor is to a company, or what a student, alumnus, or benefactor is to a university. It requires that every part of the enterprise is represented and with that comes politics, ownership, and power struggles.
How many enterprises come together with the full intention of doing it right when a new operational system comes online or a data quality program begins, only to let metadata fall by the wayside when things get heated and ugly in the conference room? It takes a special individual to facilitate (mediate?) such endeavors with sufficient support from management (C-level?) to keep things from stagnating and ultimately being abandoned because the project is falling behind schedule.
Jim Harris
Jan 13, 2010
@Charles Yes, context is key in data quality. No ‘olde English’ quotes, but how about more Shakespearean paraphrasing:
“What’s in a name? That which we call data
By any other name would stink without good metadata.”
@Phil Good (American, the only REAL kind – oh no he didn’t!) football analogy.
@Rob Excellent points. I agree that “metadata transcends many data initiatives.” I believe that data quality also transcends data initiatives.
I am coining a new term: Data Transcendentalism. Paraphrasing Ralph Waldo Emerson:
“So shall we come to look at the world of data with new eyes. It shall answer the endless inquiry of business intelligence. What is a Single Version of the Truth? What is good data? Build, therefore, metadata that accurately depicts the wide world of your most decision-critical enterprise information. The faster you conform your organization to the best practices in your collective business context, the sooner you will realize its great potential.”
Thanks for your comments, your feedback (as always) is greatly appreciated.
Garnie Bolling
Jan 13, 2010
Jim, excellent “context”
Back to the basics: since the data will “transcend” across the enterprise, it is the core definition, the all-serving “core” metadata, that should be the basis of your “golden view(s).”
Taking that to the next step, as we know, there will be a need for multiple “views” of that metadata, such as sending “gold information” to another application with its own metadata requirements.
So I like your Transcendentalism philosophy for data.
Oh, and one more thing: answering a question with a question…. Yes, always a good thing; we need to find the real question behind the question asked.
Thanks Jim for some great posts and insight.
Monis Iqbal
Jan 13, 2010
I’m taking the example of a database. Here we can call the table statistics meta-data from the perspective of the DB vendors and DB admins.
From the business domain perspective, the meta-data changes to what they expect in high level reports.
I think it may or may not be a part of data quality, depending on what data (meta-data) it holds.
Dalton Cervo
Jan 13, 2010
Hi Jim,
Very good posting indeed! I think metadata is one of the most overlooked aspects of data management, and I think that’s because it is pretty darn hard!
Metadata can potentially encompass so many levels. From a single data element on the database to a more complex entity, such as customer, for example, which will be a composite of other elements and/or entities. You mention revenue, which is another big one, with so many dependencies and context related issues.
Metadata, in my opinion, is closely associated with quality of data and processes. I see lots of inconsistent workflows because of misinterpretation of the meaning of the data.
You really got me thinking about metadata as a dimension. But, as much as I think it makes sense, I prefer to have it separate. Metadata is data as well, and as such, it is itself fair game for data quality. You could potentially apply dimensions of quality to metadata. As much as I like recursive algorithms, I think that’s kind of a stretch here.
Furthermore, I see Metadata Management as a much bigger task when compared to other dimensions. You may even need a separate repository to track things such as the source of a data value, transformations performed, business rules applied, etc.
I have more to say on this, but this comment is getting too long. Maybe I’ll add a post to complement your excellent thought provoking entry.
Thanks!
Dalton.
Christopher Blotto
Jan 13, 2010
Jim, great post.
I actually feel that your question could be reversed and looked at as… is DQ a dimension of metadata?
Metadata is the hot commodity of 2010. The hot commodity evolution for Information Management has taken us through MDM, then Governance, to DQ, and now Metadata.
Bottom line, users want all of the things that you led with: accuracy, completeness, transparency, and trust.
Our methodology is to look at information processing then look at all of these critical components to achieve the desired outcomes. With the emergence of the semantic web, ontology modeling is going to bring knowing more about our data to the forefront of any information centric program. I see business processes driving information requirements, ontology modeling driving consensus across LOBs, metadata driving governance as well as information quality management (stewardship), with technology components such as data quality, integration, rules management, and MDM being enablers.
Lineage traceability to dynamically link metadata from an enterprise model aligned to process and repository (application, ODS, EDW…) centric metadata will be imperative to truly enable operational stewardship, which is the ultimate enabler for DQ.
Jim Harris
Jan 13, 2010
@Garnie Great points (and by the way, you have a really great name – or perhaps I just have a really boring name). Back on topic, I definitely agree with you that there are “core” metadata that the enterprise needs to share as a common foundation, as well as allowing multiple views providing the necessary flexibility for day-to-day operations – just as long as there are justifiable business reasons for doing so.
You don’t want @Charles and me going off on our Battle of the “Single Version of the Truth” – well, if you do, then check out this link:
http://www.ocdqblog.com/home/beyond-a-single-version-of-the-truth.html
And I definitely like answering a question with a question – I was never able to shake the habit most of us developed as kids, where you keep asking “why?”
@Monis (another really great name!) You make an excellent point about perspective and that not all metadata necessarily relates to data quality – such as the table statistics you mentioned, or operational metadata such as start, end, and duration for process runtimes.
@Dalton (does everyone have a cooler name than mine?) Deep thought here: “Metadata is data as well, and as such, it is itself a fair game for data quality.” I call this the Cartesian Recursive Paradox – “I am meta-data, therefore I am data; therefore I exist within data quality.”
@Christopher (okay, I might be able to compete with your first name, but your last name is way cooler than mine!) Metadata is not a dimension of data quality because data quality is a dimension of metadata. Ah – the ontological argument has arrived – wow, what’s with all the deep philosophy today? But seriously, ontology modeling is an excellent example of rich and pervasive enterprise metadata architecture sharing the collective business context needed to understand (and properly leverage) decision-critical enterprise information assets.
Thanks everyone for your awesome comments – once again proving that your feedback is the best part of the blogging experience.
Monis Iqbal
Jan 13, 2010
@Jim thanks for the compliments on the name
and btw don’t be too harsh on your name although I know you are kidding. After all, having a name like Joe doesn’t make you an Average Joe, right?
Rayk
Jan 13, 2010
It may have been mentioned already, but I need to add my comment on this great post (and yes, I want some warm feedback on my name as well).
Currently I’m working on collecting requirements for a Meta data Management System (MDMS), because this is part of our relatively new Data Quality Program. I must say that I have some problems with this.
IMHO it is not a solution to start collecting metadata only to prove that something is wrong with your data. There must be something more, and so I reversed the argument:
Let us introduce a MDMS, set up processes (including data owners etc.), maintain stuff properly, and spread expectations (valid values) and check routines into the relevant systems. The result is? Data quality.
I totally agree with “quality is a dimension of metadata” (and would like to use this phrase). My expectation is that, among other things, quality in data is the result of active metadata management.
Love you guys!
Jim Harris
Jan 13, 2010
@Rayk (Yes, yet another fantastic name!) I wasn’t suggesting that the point of collecting metadata is simply to prove something is wrong with your data.
It was more about clarifying the business context of the data, in order to improve understanding and therefore, among other things, verify usage and help determine if an actual data quality issue exists – which, to do so comprehensively, requires other “dimensions” of data quality beyond metadata.
However, I don’t think we are really that far apart in what we are saying. Metadata and data quality are interrelated – and they are both intricately interrelated with all enterprise data initiatives.
I have a tendency to see “data quality management” in all things and you, at least it appears to me, have a tendency to see “metadata management” in all things.
Neither perspective is either right or wrong – but more likely, a matter of semantics.