Defining Data Quality Metrics

May 19, 2008 in Compliance, Data Governance, Data Quality

I have been getting a lot of questions about the use of metrics to support a data governance activity, especially in the context of the presentation I made at the recent DataFlux executive briefings on Data Governance. We all agree that poor data quality impacts the business, and therefore governance helps monitor and control the quality of data across the enterprise. But how do we monitor the quality of data? Using metrics, of course, and that means we need a process for defining DQ metrics.

Here is a high-level approach to defining DQ metrics:

  1. Select one of the identified critical business impacts associated with poor data quality
  2. Evaluate any data dependencies associated with that business impact
  3. For each data dependency, list the associated business client data expectations
  4. For each data expectation, specify the associated dimension of data quality and one or more business rules that can be used to determine conformance of the data to expectations
  5. For each selected business rule, describe the process for measuring conformance
  6. For each business rule, specify an acceptability threshold

The end result is a set of measurement processes that provide raw data quality scores, which can be rolled up to quantify conformance to business user data quality expectations. A measurement that does not meet its specified acceptability threshold indicates non-conformance and signals that some data remediation is necessary.
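To make steps 4 through 6 concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than part of any particular tool: the `BusinessRule` class, the `conformance` and `evaluate` helpers, the sample customer records, and the threshold values are all hypothetical. It only shows how a business rule, a measurement process, and an acceptability threshold might fit together.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence

Record = Mapping[str, object]


@dataclass(frozen=True)
class BusinessRule:
    """A hypothetical business rule tied to one dimension of data quality."""
    name: str
    dimension: str                      # e.g. "completeness", "validity"
    check: Callable[[Record], bool]     # True when a record conforms
    threshold: float                    # minimum acceptable conforming fraction


def conformance(rule: BusinessRule, records: Sequence[Record]) -> float:
    """Raw score: the fraction of records that satisfy the rule."""
    if not records:
        # Assumption: an empty data set is treated as vacuously conforming.
        return 1.0
    passed = sum(1 for r in records if rule.check(r))
    return passed / len(records)


def evaluate(rules: Sequence[BusinessRule], records: Sequence[Record]) -> dict:
    """Measure every rule and compare its score to its threshold."""
    report = {}
    for rule in rules:
        score = conformance(rule, records)
        report[rule.name] = {
            "dimension": rule.dimension,
            "score": score,
            "threshold": rule.threshold,
            "conforms": score >= rule.threshold,
        }
    return report


# Hypothetical data and rules, for illustration only.
customers = [
    {"id": 1, "email": "a@example.com", "zip": "10001"},
    {"id": 2, "email": "", "zip": "1000"},
    {"id": 3, "email": "c@example.com", "zip": "94105"},
    {"id": 4, "email": "d@example.com", "zip": "60601"},
]

rules = [
    BusinessRule("email_present", "completeness",
                 lambda r: bool(r.get("email")), threshold=0.95),
    BusinessRule("zip_is_5_digits", "validity",
                 lambda r: str(r.get("zip", "")).isdigit()
                 and len(str(r.get("zip", ""))) == 5,
                 threshold=0.70),
]

report = evaluate(rules, customers)

# One possible roll-up: the fraction of rules that meet their thresholds.
rollup = sum(1 for r in report.values() if r["conforms"]) / len(report)

for name, result in report.items():
    status = "OK" if result["conforms"] else "NEEDS REMEDIATION"
    print(f"{name}: {result['score']:.2f} "
          f"(threshold {result['threshold']:.2f}) {status}")
```

In this made-up data, three of the four records pass each rule, so both rules score 0.75. The e-mail rule falls short of its 0.95 threshold and is flagged for remediation, while the ZIP rule clears its 0.70 threshold. The roll-up shown is only one option; a weighted average per dimension or per business impact would be just as reasonable.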

2 Responses to “Defining Data Quality Metrics”

  1. Leon Schwartz

    May 21, 2008

    This is great high level stuff.
    But when you get down to it, the devil is in the details.

    I have been trying to operationalize the simplest DQ dimension, accuracy, for a while now. Understanding the perception of measurable accuracy metrics by business leaders with budgets is key. It is not a simple function.

  2. David Loshin

    May 22, 2008

    On the contrary, accuracy may be one of the hardest DQ dimensions, specifically because of the challenge of clarifying what accuracy means in every context, and how it can be measured. Comparing against a system of record is good if the data set is small, but when it gets to be big, one might have to look at sampling for verification as a way of reporting metric scores.
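As a rough illustration of the sampling idea, the sketch below estimates an accuracy score by verifying only a random sample against a system of record. The function name `sample_accuracy`, the record layout, and the data are all hypothetical, and the margin of error uses a plain normal approximation at 95% confidence, which is an assumption that becomes unreliable for small samples or scores very close to 0 or 1.

```python
import math
import random


def sample_accuracy(records, system_of_record, key, field, sample_size, seed=None):
    """Estimate the fraction of records whose `field` matches the system of record.

    Only `sample_size` randomly chosen records are verified, standing in for
    a lookup that would be too costly to run against the full data set.
    Returns (estimated_accuracy, margin_of_error).
    """
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    n = len(sample)
    if n == 0:
        raise ValueError("cannot estimate accuracy from an empty sample")

    matches = 0
    for record in sample:
        reference = system_of_record.get(record[key])
        # A record with no reference entry is counted as inaccurate.
        if reference is not None and reference.get(field) == record[field]:
            matches += 1

    estimate = matches / n
    # Normal-approximation margin of error at roughly 95% confidence.
    margin = 1.96 * math.sqrt(estimate * (1 - estimate) / n)
    return estimate, margin


# Hypothetical data: every tenth record carries a misspelled city.
records = [{"id": i, "city": "Boston" if i % 10 else "Bostn"}
           for i in range(10_000)]
system_of_record = {i: {"city": "Boston"} for i in range(10_000)}

estimate, margin = sample_accuracy(
    records, system_of_record, key="id", field="city",
    sample_size=500, seed=42,
)
print(f"estimated accuracy: {estimate:.3f} +/- {margin:.3f}")
```

The true accuracy of this made-up data set is 0.9, and the sampled estimate should land near that value while checking only 500 of the 10,000 records. Reporting the margin alongside the score makes it clear that the metric is an estimate rather than an exact count.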
