I have been getting a lot of questions about using metrics to support data governance activities, especially in the context of the presentation I made at the recent DataFlux executive briefings on Data Governance. We all agree that poor data quality impacts the business, and therefore governance helps monitor and control the quality of data across the enterprise. But how do we monitor the quality of data? Using metrics, of course, and that means we need a process for defining DQ metrics.
Here is a high-level approach to defining DQ metrics:
- Select one of the identified critical business impacts associated with poor data quality
- Evaluate any data dependencies associated with that business impact
- For each data dependency, list the associated business client data expectations
- For each data expectation, specify the associated dimension of data quality and one or more business rules that can be used to determine conformance of the data to expectations
- For each selected business rule, describe the process for measuring conformance
- For each business rule, specify an acceptability threshold
The end result is a set of measurement processes that provide raw data quality scores, which can roll up to quantify conformance to business user data quality expectations. Measurements that fall below the specified acceptability thresholds indicate non-conformance, signaling that some data remediation is necessary.
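The steps above can be sketched in code: each business rule becomes a conformance check tied to a data quality dimension and an acceptability threshold, and the measurement process scores a data set against each rule. This is a minimal illustration, not a prescribed implementation; the rules, thresholds, and sample records are hypothetical.

```python
def email_present(record):
    """Completeness rule: the customer record must include an email address."""
    return bool(record.get("email"))

def valid_state_code(record):
    """Validity rule: the state must be a known two-letter code (toy reference set)."""
    return record.get("state") in {"NY", "CA", "TX", "MD"}

# Each rule is paired with the DQ dimension it measures and its
# acceptability threshold (hypothetical values for illustration).
rules = [
    ("completeness", email_present, 0.98),
    ("validity", valid_state_code, 0.95),
]

# Hypothetical sample records to score.
records = [
    {"email": "a@example.com", "state": "NY"},
    {"email": "", "state": "CA"},
    {"email": "b@example.com", "state": "ZZ"},
]

def measure(records, rules):
    """Compute a raw conformance score per rule and compare it to its threshold."""
    results = []
    for dimension, rule, threshold in rules:
        score = sum(rule(r) for r in records) / len(records)
        results.append({
            "dimension": dimension,
            "rule": rule.__name__,
            "score": score,
            "threshold": threshold,
            "conforms": score >= threshold,
        })
    return results

for r in measure(records, rules):
    status = "OK" if r["conforms"] else "REMEDIATE"
    print(f"{r['dimension']:13} {r['rule']:18} {r['score']:.2f} "
          f"(threshold {r['threshold']}) {status}")
```

With this shape, the per-rule scores can be aggregated by dimension or by business impact area, giving the roll-up view the governance program needs, and any score under its threshold flags the data set for remediation.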