Predictably Poor Data Quality
Jun 09, 2010 by Jim Harris in Data Quality
For decades, data quality experts have been telling us poor quality is bad for our data, bad for our decisions, bad for our business, and just plain all around bad, bad, bad—did I already mention it’s bad?
So why does poor data quality continue to exist and persist?
Have the experts been all talk, but with no plan for taking action? Have the technology vendors not been evolving their data quality tools to become more powerful, easier to use, and more aligned with the business processes that create data and the technical architectures that manage data?
Have the business schools been unleashing morons into the workforce who can’t design a business process correctly? Have employees been intentionally corrupting data in an attempt to undermine their employers’ success?
Wouldn’t any perfectly rational organization never suffer from poor data quality?
I recently finished reading the excellent book Predictably Irrational by Dan Ariely, the James B. Duke Professor of Psychology and Behavioral Economics at Duke University.
I am personally fascinated with behavioral economics, which is a relatively new field combining aspects of both psychology and economics.
The basic assumption underlying standard economics is that we will always make rational decisions in our best interest, often justified by a simple cost-benefit analysis.
Behavioral economics more realistically acknowledges that we are not always rational, and most important—our irrationality is neither random nor senseless, but instead it is quite predictable when the complex psychology of human behavior is considered.
The basic assumption underlying most theories of data quality is that since the business benefits of high quality data are obvious when compared to the detrimental effects of poor quality, then any people, processes, or technology which allow poor data quality must be either acting irrationally or otherwise be somehow defective.
Therefore, preventative measures, once put into place, will correct “the problem” and alleviate any need for future corrective action, such as data cleansing, and everything, and everyone, will then be rational and wonderful in a world of perfect data quality.
But what really is the root cause of poor data quality?
Late last year, in an intentionally provocative blog post, Julian Schwarzenbach declared that there is no such thing as a data quality problem because people are the root cause of all data quality problems.
If we recognized this fact, Julian explained, then solving data quality problems involves solving people problems. Although Julian was partially countering the views of some who believe that technology alone is the solution, there are, without question, some data quality problems which are indeed attributable to people problems.
Julian provided an excellent list exemplifying how a lack of data ownership, as well as assuming data quality is someone else’s responsibility, is the fundamental root cause of many data quality problems.
So if people can cause poor data quality, then how do we “correct” their behavior?
Behavioral Data Quality
Whether or not it is a relatively new field, I am using the term Behavioral Data Quality to describe the inclusion of aspects of psychology within the data quality profession.
I have only briefly touched on this subject in my blog posts The Poor Data Quality Jar and The Scarlet DQ, but some of my fellow behavioral data quality scientists have also contributed blog posts on this important topic (listed here in chronological order):
- Rob Paller – A Data Quality Riot Act
- Phil Wright – Can motivations impact the state of data quality?
- Dylan Jones – How Are You Creating a Pull for Data Quality in Your Organization?
- James Standen – Data quality behavioral modification?
- Rich Murnane – How to identify your “Data Quality Savages”
- Julian Schwarzenbach – The Data Accident Investigation Board
- Jill Wanless – Attributes of a Data Rock Star
Conclusion
The Upside of Irrationality is the recently published and provocative follow-up book by Dan Ariely. Although I look forward to reading it, I doubt I could make a similar case for the upside of poor data quality.
However, the first book explained the dangers of not testing our intuitions, thinking we can always predict our behavior, and assuming our behavior will always be rational.
Better understanding these flawed perspectives can help us truly better understand the root causes of our predictably poor data quality.
Most important, it can help us develop far more effective tactics and strategies for implementing successful and sustainable data quality improvements.
What (rationally or irrationally) say you?





Garnie Bolling
Jun 09, 2010
Jim, good stuff,
An example of how behavior can have a positive or a negative effect on data quality is reporting tools. I am sure many of us have experienced a period when we had to input “data” to an application that produces a report for the leadership team. It was a “job requirement” to insert & update the data. If the application is useless to the user inputting the data, then what incentive does that person have to ensure that the data is correct? In a word, none… confession: I have been guilty of just putting in data that I know they want to read…
You also posted a while back (and many of our colleagues have said) that the best way to ensure data quality is to make sure that the data input is closest to the source, i.e. if you are online shopping, you should update your email / address / name… that should be considered the “best, trusted” version of the data. Now add the spin: “how do you incent the person to keep the data up to date or correct?”
This is where your behavioral point of view comes in. If there is value perceived in updating / creating data, then there will be greater chances that the person will update correct / timely information. (I will make sure that my address is correct when I buy something online)
Now what about application-to-application data? Well, those are chapters about Trust / Weights & Process of Data Quality, which your past postings have addressed.
Again, thanks for posting, you piqued my interest in the behavioral data quality area… will be watching it …
Grant Martin
Jun 09, 2010
Thanks for a thought-provoking read, Jim. I usually subscribe to the theory that there are multiple causes for just about every effect, with poor data quality being no exception.
However, Julian Schwarzenbach’s somewhat tongue-in-cheek contention that people are the root cause of all data problems is difficult to dismiss because, let’s face it, people are 100% responsible for the technology that measures data. And in a world without people, the data is still floating around out there in its perfect form. It’s only when we humans attempt to make some sense out of all this almost infinitely complex data that the fun begins.
Jim Harris
Jun 09, 2010
Thanks for your great comments, Garnie and Grant.
@Garnie – Predictably excellent point about the incentive for, or motivation of, the user who is entering the data to do so “correctly.” It is often all too easy to villainize (not that I am at all suggesting this was your point) data entry for causing data quality issues, as if they were somehow deliberately attempting to undermine the quality of the organization’s information. Most often, users actually do make every effort to make data fit its immediate purpose of use. However, many times, it’s the unknown future uses of the originally entered data that are the context for what in hindsight appear to be obvious data quality issues.
@Grant – Your excellent point incites me to paraphrase Shakespeare (okay, in truth it gives me an excuse to paraphrase Shakespeare):
“There is nothing either good or bad about data’s quality, but people make it so.”
As well as paraphrase Nobel laureate Murray Gell-Mann:
“Think how hard data quality would be if data could think.”
Or as I have previously paraphrased John Lennon:
“Imagine there’s no defects
It’s easy if you try
No data cleansing beneath us
Above us only sky
Imagine all the data
Living with quality
Imagine there’s no companies
It isn’t hard to do
Nothing to manage or govern
And no experts too
Imagine all the data
Living life in peace”
Phil Simon
Jun 10, 2010
Great post, Jim.
This particularly resonated with me:
Although Julian was partially countering the views of some who believe that technology alone is the solution, there are, without question, some data quality problems which are indeed attributable to people problems.
As applications automate more and more things, many often believe that, if the system can’t find a problem, then there is none. Nothing could be further from the truth.
My friend Bob Charette has written extensively about the dangers of automation:
http://spectrum.ieee.org/riskfactor/aerospace/aviation/us-national-transportation-safety-board-looks-at-aviation-automation-and-complacency
Long story short, the presence of technology in no way obviates the need for people to continue to be involved. Or am I being irrational?
Dylan Jones
Jun 11, 2010
Great post Jim.
I totally agree with you. In recent years I’ve tried to focus my efforts as much on discovering how to initiate change management as on the technology wizardry.
I’ve seen too many flawless business cases thrown out simply because of fear, politics, uncertainty – we’re behavioural creatures and unless the entire information chain has people who buy into the culture of quality data all bets are off.
Jim Harris
Jun 11, 2010
Thanks for your great comments, Phil and Dylan.
@Phil – Thanks for sharing the article about automation. My favorite quote is “No light comes on to tell you that you’re being complacent.” Therefore, I think you are being very rational when you say that the presence of technology in no way obviates the need for people to continue to be involved. Now, if we could only find some predictably rational people to be involved.
@Dylan – I definitely agree that we’re behavioral creatures and a culture of quality data is necessary for success. Since culture has just been rightly introduced into the mix, I guess that one of my future posts is going to have to discuss Data Quality Sociology. At this rate, data quality is going to need its own online university.
Julian Schwarzenbach
Jun 13, 2010
Jim,
A great post pulling in many strands of thought around the area of behavioural data quality (a good label for the subject). Thank you for the prominent mention of some of our posts and thoughts.
It is interesting that since that original post about people being the root of all data quality problems, and some subsequent related posts, we have been getting continued interest and comment on the subject. Over the last two weeks we have presented our Data Zoo and related subjects at a couple of seminars with interesting responses.
At a BCS/DAMA UK seminar on Data Quality the ideas resonated with many of the Data Quality professionals, who were keen to find out each other’s behaviour types. The previous session included a number of psychologists in the audience who initially had concerns about some of the theory behind the model, but subsequently were identifying an external party as a “Data Squirrel” due to them hanging on to key data – so had picked up some of the concepts covered.
I think Garnie’s point hits the nail on the head – data behaviours are driven by perceived consequences:
* If ordering on line, you ensure your address is correct
* If submitting a tax return you make sure all the entries are correct
* If you change bank accounts, you will make double sure that payroll get the details correct
Where staff do not see an immediate importance, or there is no feedback for poor/missing data, there is a much higher risk of not getting what the organisation requires. This may be due to a desire to wilfully do the minimum required to get a job done, or it may be due to a perception that the user knows best and that not all the data is really needed/important.
A recent conversation with someone who now works in governance in the finance sector illustrated this point well – the person concerned had been responsible for data entry of client details, and the custom and practice picked up from colleagues was to default entries to one value. This person’s next role was in the regulatory reporting team, where this data turned out to be vital for particular reports, and the defaulted values resulted in large amounts of extra work to prevent regulatory fines!
It would be interesting to hear whether these problems are just human nature or are endemic in particular cultures. Do similar problems exist in Japan, Singapore and Germany, where there appears to be a much stronger culture of complying with rules and procedures?
Julian
Jim Harris
Jun 14, 2010
Thanks for your great comment, Julian.
As I commented on your excellent blog series that led to your Data Zoo white paper (I encourage everyone to follow the link that Julian provided; the white paper is free to download, no registration required):
Personality management is often more important than data management, since without the former, the latter will not have much of a chance to be successful.
I also agree with Garnie’s point about data behaviors being driven by their perceived consequences. However, as the experiments documented in Dan Ariely’s book demonstrated very effectively, there are circumstances where we will predictably make a seemingly irrational choice because of variables that can alter our perception of the consequences.
Returning to the Data Zoo, it is possible that even a Data Evangelist can act like a Data Anarchist, convinced that they are a one-person army and have to do everything their way or it won’t get done properly, or when highly stressed, become very pessimistic and therefore a Whinger, or Squirrel away important information until they feel it is the right time to reveal it — but then completely forget what it was in the first place.
My point is that although actual split personalities (i.e., dissociative identity disorder for the psychiatrists and psychologists in the audience) are a truly rare real-world occurrence, in the data management space they might actually be the very definition of “normal.”
Your point about cultural differences is an extremely important one. However, data quality issues can still (and do) arise even when the tendency to comply with established rules and procedures is more commonplace.
I am not trying to cast all of this as hopeless complexity that we can never figure out. On the contrary, I believe there is hope as long as we stop assuming that we can always predict behavior, that behavior will always be rational, and that when it isn’t, it must have been random or senseless.
The root causes of our predictably poor data quality may be far more complex than we ever imagined, but by acknowledging this, we can develop far more effective tactics and strategies for implementing successful and sustainable data quality improvements.
Best Regards,
Jim
Julian Schwarzenbach
Jun 14, 2010
Jim,
Thanks for the response.
One of the areas where the Data Zoo has been refined over time is a change from these being stated as personality traits to behaviour types. Personalities are generally recognised as being core to an individual and do not tend to change much over time; whereas behaviours can be more easily changed and also a person can exhibit a number of different behaviours. This allows the Data Zoo to work better as a concept without having to cover split personalities!
Julian
Jim Harris
Jun 14, 2010
Excellent point, Julian!
Thanks for the clarification.
Every single one of my split personalities is in total agreement that it is all about behavior and not personality.
Jim
Beth Breidenbach
Jun 26, 2010
I’m late to the party, but this is an insightful post.
At the end of the day, human behavior is both the root cause and solution. Technology doesn’t cause or solve the data quality challenge. Rather, it’s a tool that exacerbates or aids human behavior in either direction.
I believe the same is true for metadata quality — and am publishing a blog post to that effect (with pointers to this post).
Great job!
Jim Harris
Jun 28, 2010
Thanks for your great comment, Beth!
Excellent point about how human behavior is BOTH the root cause AND the solution.
I have always found it puzzling when organizations try to resolve people issues by applying more technology or simply better technology.
Only people can resolve people issues.
Best Regards,
Jim
P.S. Everybody please check out Beth’s blog post: Predictably Poor MetaData Quality