For decades, data quality experts have been telling us poor quality is bad for our data, bad for our decisions, bad for our business, and just plain all around bad, bad, bad—did I already mention it’s bad?
So why does poor data quality continue to exist and persist?
Have the experts been all talk, but with no plan for taking action? Have the technology vendors not been evolving their data quality tools to become more powerful, easier to use, and more aligned with the business processes that create data and the technical architectures that manage data?
Have the business schools been unleashing morons into the workforce who can’t design a business process correctly? Have employees been intentionally corrupting data in an attempt to undermine their employers’ success?
Surely a perfectly rational organization would never suffer from poor data quality?
I am personally fascinated with behavioral economics, which is a relatively new field combining aspects of both psychology and economics.
The basic assumption underlying standard economics is that we will always make rational decisions in our best interest, often justified by a simple cost-benefit analysis.
Behavioral economics more realistically acknowledges that we are not always rational and, most important, that our irrationality is neither random nor senseless; it is quite predictable when the complex psychology of human behavior is considered.
The basic assumption underlying most theories of data quality is that, since the business benefits of high quality data are obvious when compared to the detrimental effects of poor quality data, any people, processes, or technology that allow poor data quality must be either acting irrationally or otherwise somehow defective.
Therefore, preventative measures, once put into place, will correct “the problem” and alleviate any need for future corrective action, such as data cleansing, and everything, and everyone, will then be rational and wonderful in a world of perfect data quality.
But what really is the root cause of poor data quality?
Late last year, in an intentionally provocative blog post, Julian Schwarzenbach declared that there is no such thing as a data quality problem because people are the root cause of all data quality problems.
If we recognize this fact, Julian explained, then solving data quality problems involves solving people problems. Although Julian was partially countering the views of some who believe that technology alone is the solution, there are, without question, some data quality problems that are indeed attributable to people problems.
Julian provided an excellent list exemplifying how a lack of data ownership, as well as the assumption that data quality is someone else’s responsibility, is the fundamental root cause of many data quality problems.
So if people can cause poor data quality, then how do we “correct” their behavior?
Behavioral Data Quality
Whether or not it is a relatively new field, I am using the term Behavioral Data Quality to describe the inclusion of aspects of psychology within the data quality profession.
I have only briefly touched on this subject in my blog posts The Poor Data Quality Jar and The Scarlet DQ, but some of my fellow behavioral data quality scientists have also contributed blog posts on this important topic (listed here in chronological order):
- Rob Paller – A Data Quality Riot Act
- Phil Wright – Can motivations impact the state of data quality?
- Dylan Jones – How Are You Creating a Pull for Data Quality in Your Organization?
- James Standen – Data quality behavioral modification?
- Rich Murnane – How to identify your “Data Quality Savages”
- Julian Schwarzenbach – The Data Accident Investigation Board
- Jill Wanless – Attributes of a Data Rock Star
The Upside of Irrationality is the recently published and provocative follow-up book by Dan Ariely. Although I look forward to reading it, I doubt I could make a similar case for the upside of poor data quality.
However, his first book, Predictably Irrational, explained the dangers of not testing our intuitions, thinking we can always predict our behavior, and assuming our behavior will always be rational.
Understanding these flawed perspectives can help us better understand the root causes of our predictably poor data quality.
Most important, it can help us develop far more effective tactics and strategies for implementing successful and sustainable data quality improvements.
What (rationally or irrationally) say you?