We’re all accustomed to the big headlines when data quality issues hit the news. A trader enters an incorrect digit and billions are wiped off an entire stock exchange. A double-debit spreadsheet error means a country suddenly gains billions of euros it never realised it had.
These are the poster children of data quality, so often cited by vendors of services and technology: “Look what happens when you don’t manage data correctly!”
The problem is that these types of issues stem from infrequent variations in quality. Traders don’t lose billions every time they log a trade, and countries don’t understate their debt by massive margins every time they compile a regulatory report.
These kinds of “low-frequency” issues can actually be quite damaging for data quality evangelists who want to take action and get things done. I’ve witnessed senior managers block progress because they simply can’t envision this kind of media frenzy in their own companies.
The thing is, they’re right. Most data quality issues occur far more frequently than these big headline issues, but their impacts are much more subtle and often hidden from day-to-day operations. The reason is that most processes operate in a stable manner. There is variation in quality throughout the process, but if you were to graph this variation it would probably be within acceptable bounds – at least what the organisation considers “the norm,” anyway.
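To make that idea of “graphing the variation” concrete, here is a minimal sketch of a Shewhart-style control chart check. The daily error rates and the 3-sigma limits are illustrative assumptions, not figures from any real process:

```python
import statistics

# Hypothetical daily error rates (%) for a stable data-entry process.
# These figures are purely illustrative.
daily_error_rates = [1.2, 0.9, 1.1, 1.4, 1.0, 1.3, 0.8, 1.2, 1.1, 0.9]

mean = statistics.mean(daily_error_rates)
stdev = statistics.stdev(daily_error_rates)

# Classic control limits: the process mean plus or minus 3 standard deviations.
upper_limit = mean + 3 * stdev
lower_limit = max(mean - 3 * stdev, 0.0)

for day, rate in enumerate(daily_error_rates, start=1):
    in_control = lower_limit <= rate <= upper_limit
    status = "within norm" if in_control else "special cause?"
    print(f"Day {day}: {rate:.1f}% ({status})")

print(f"Process mean {mean:.2f}%, limits [{lower_limit:.2f}%, {upper_limit:.2f}%]")
```

Every point in this series sits comfortably inside the limits, so the chart declares the process “stable” – which is exactly the trap: the limits describe what the process does, not what the knowledge worker needs.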
Root causes of “once in a lifetime” data quality issues are generally special causes. The underlying problems may well be connected to a common lack of data quality management, poor training, ageing technology or any number of other factors, but it’s often a specific set of events that conspired to cause a major incident.
The danger is that you focus too much energy on preventing these special causes and ignore the real issues: those that lurk in seemingly stable processes. Just because a process is stable does not mean the data meets the demands of the knowledge worker; it simply means the process is operating as specified. The problem is that many process specifications are not defined with quality of data, or even quality of service, in mind. To really improve data quality for the long term and remove variation in process performance, you need to build data quality into those process specifications – even if they appear to deliver stable and operationally acceptable results.
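As a sketch of what building data quality into a specification might look like in practice, the example below attaches explicit quality rules to a process step so they run as part of the specified process rather than as an afterthought. The field names, thresholds and `QualityRule` structure are hypothetical, chosen purely for illustration:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class QualityRule:
    """A named data-quality rule that is part of the process specification."""
    name: str
    check: Callable[[dict], bool]

# Illustrative rules for a trade-capture record; the thresholds are assumptions.
rules = [
    QualityRule("quantity is positive", lambda r: r.get("quantity", 0) > 0),
    QualityRule("price within sane bounds", lambda r: 0 < r.get("price", 0) < 1_000_000),
    QualityRule("counterparty present", lambda r: bool(r.get("counterparty"))),
]

def process_record(record: dict) -> None:
    # Quality checks are evaluated inside the process itself.
    failures = [rule.name for rule in rules if not rule.check(record)]
    if failures:
        raise ValueError(f"Record rejected, failed rules: {failures}")
    # ... downstream processing would continue here ...

process_record({"quantity": 100, "price": 42.5, "counterparty": "ACME"})
```

The point of the design is that the rules live alongside the functional logic, so a “stable” process can no longer quietly deliver data that fails the knowledge worker’s requirements.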