In an early 2010 blog post, I stated that sometimes the best way to know what’s the matter with your data quality is to ask the question: What’s the Meta with your Data?
The blog post received great comments, including philosophical and literary theories about the relationship between metadata and data quality. A few of my favorites:
- Cartesian Recursive Paradox of Metadata: “I am meta-data, therefore I am data; therefore I exist within data quality.”
- Ontological Argument of Metadata: “Metadata is not a dimension of data quality because data quality is a dimension of metadata.”
- From Data and Metadata, an unfinished play by William Shakespeare: “What’s in a name? That which we call data, by any other name would stink without good metadata.”
Although its simplest definition is “data about data,” metadata can be thought of as a label that provides a definition, description and context for data. Common examples include relational table definitions and flat file layouts. More detailed examples include conceptual and logical data models.
What’s the Meta with your Tweet?
A social media example of metadata is a hashtag, which Twitter users include in their tweets in order to tag them for search engines and trending topics websites. David Loshin recently blogged about the data quality of the #dataquality hashtag, and Henrik Liliendahl Sørensen recently blogged about the semantic ambiguities inherent with using #MDM as the hashtag for Master Data Management.
Tagging is a great example of one of the semantic challenges of metadata. As Wikipedia explains, when users freely choose their own tags, rather than selecting terms from a controlled vocabulary, the result is often a folksonomy. The resulting metadata can include homonyms (the same tag used with different meanings) and synonyms (multiple tags for the same concept), which may lead to inappropriate data relationships and inefficient searches for data about a particular subject.
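To make the homonym and synonym problem concrete, here is a minimal Python sketch of checking free-form tags against a controlled vocabulary. Everything in it is an assumption for illustration: the vocabulary entries, the second meaning listed for #MDM, and the `normalize_tag` function are hypothetical and are not taken from Twitter or from any tagging library.

```python
# Hypothetical controlled vocabulary: each canonical term maps to the
# free-form tags we choose to treat as its synonyms.
CONTROLLED_VOCABULARY = {
    "data-quality": {"dataquality", "dq", "data_quality"},
    "master-data-management": {"masterdata", "masterdatamanagement"},
}

# Hypothetical homonyms: one tag, several possible meanings.
AMBIGUOUS_TAGS = {
    "mdm": ["master-data-management", "mobile-device-management"],
}


def normalize_tag(tag):
    """Return a (status, candidate_terms) pair for a free-form tag."""
    key = tag.lstrip("#").lower()
    # A homonym cannot be resolved from the tag alone, so report every meaning.
    if key in AMBIGUOUS_TAGS:
        return ("ambiguous", AMBIGUOUS_TAGS[key])
    # A synonym collapses to its single canonical term.
    for canonical, synonyms in CONTROLLED_VOCABULARY.items():
        if key == canonical or key in synonyms:
            return ("resolved", [canonical])
    # Anything else is a folksonomy tag the vocabulary does not cover.
    return ("unknown", [key])


for tag in ["#DataQuality", "#DQ", "#MDM", "#bigdata"]:
    print(tag, normalize_tag(tag))
```

In this sketch the synonyms #DataQuality and #DQ collapse to one term, while #MDM is flagged rather than guessed at, which is roughly the trade-off a controlled vocabulary makes against the freedom of a folksonomy.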
Let’s Meta a Data
In the popular game show Let’s Make a Deal, contestants choose a potential prize concealed behind one of three doors, which are ambiguously labeled Door #1, Door #2 and Door #3. Behind the doors are the grand prize (e.g., a brand new luxury car), a lesser prize (e.g., a small amount of money) and a “zonk” (a booby prize, such as a tiny toy car).
Choosing the right metadata to describe your data is like playing a data management version of the game show called Let’s Meta a Data, where you must create a system of unambiguous semantics so that your organization’s “contestants” are not forced to choose potentially useful data concealed behind a relational table definition ambiguously labeled Column #1, Column #2 and Column #3.
I’m not sure if a sound metadata management strategy can guarantee you’ll always win the grand prize, but without good metadata, your data quality is more likely to be data zonk-ity.
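As a rough illustration of what sits behind the doors, here is a small Python sketch of a data dictionary that attaches a name, definition and data type to otherwise anonymous columns. The column names, definitions and the `describe` helper are all hypothetical, invented for this example rather than drawn from any real schema or metadata tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    name: str        # business-friendly label
    definition: str  # what the values mean
    data_type: str   # expected type of the values


# Hypothetical data dictionary for a hypothetical customer table.
DATA_DICTIONARY = {
    "column_1": ColumnMetadata(
        "customer_name", "Legal name of the customer", "text"
    ),
    "column_2": ColumnMetadata(
        "signup_date", "Date the account was opened", "date"
    ),
}


def describe(column):
    """Describe a column, or admit that nothing is known about it."""
    meta = DATA_DICTIONARY.get(column)
    if meta is None:
        # No metadata recorded: the contestant is left guessing.
        return f"{column}: zonk (no metadata recorded)"
    return f"{column} -> {meta.name} ({meta.data_type}): {meta.definition}"


for column in ["column_1", "column_2", "column_3"]:
    print(describe(column))
```

Here column_3 has no entry, so anyone choosing it is picking a door blind.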
What’s the Meta with You?
How do you play Let’s Meta a Data? (i.e., what’s your organization’s metadata management strategy?)
Please share both your metadata (i.e., Name, Email, Website) and your data (i.e., Comment) below.