A Tale of Two Q’s
Jan 20, 2010 by Jim Harris in Data Quality
As with many complex challenges, data quality often feels as if it is caught within the eternal struggle between theory and practice.
I refer to the theory of data quality as The Big Q.
I refer to the practice of data quality as the little q.
Therefore, and with apologies to Charles Dickens and his A Tale of Two Cities, I refer to data quality’s struggle between theory and practice as A Tale of Two Q’s:
“It was the best of times, it was the worst of times.
It was the age of wisdom, it was the age of foolishness. It was the epoch of belief, it was the epoch of incredulity. It was the season of Procrastination, it was the season of Perfection. It was the spring of Maturity, it was the winter of Reality. We had everything before us, we had nothing before us, we were all going direct to High Quality Data, we were all going direct the other way.
In short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for Theory or for Practice, in the superlative degree of comparison only.”
The Big Q, Defect Prevention, and “Best Theory”
The primary trait of The Big Q is defect prevention, which I refer to as “Best Theory.”
The Big Q is the proactive approach to data quality.
Advocating root cause analysis and business process improvement, defect prevention is essentially the cure for the quality issues that ail your data—by preventing data quality problems before they happen.
This is undeniably the Best Theory of Data Quality.
Most Data Quality Theoreticians usually play the Maturity card—as in, does your organization possess the necessary maturity for proactive data quality.
Numerous capability and maturity models are available, providing stages ranging from initial or undisciplined, through tactical or reactive, then strategic or proactive, up to optimized or governed.
The bottom line is that a data governance framework is necessary, as are considerable patience, understanding, and dedication, because it will require a strategic organizational transformation that doesn’t happen overnight.
the little q, data cleansing, and “actual practice”
The primary trait of the little q is data cleansing, which I refer to as “actual practice.”
Yes, the little q is the reactive approach to data quality.
The common (and deserved) criticism is that it essentially treats the symptoms without curing the disease—by correcting data quality problems after they have been created—and without correcting their root cause (and sometimes even ignoring it).
However, this is undeniably the actual practice of data quality.
Most data quality practitioners usually play the Reality card—as in, the unavoidable reality is that data cleansing is used to correct the data problems that are currently plaguing critical business decisions on a daily basis.
In fact, many would argue that although it only alleviates the symptoms without curing the disease, reactive data cleansing is a triage, where the priority is to stabilize the patient—since a cure for the underlying condition is worthless if the patient dies before it can be administered.
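To make the contrast between the two Q’s concrete, here is a minimal sketch in Python. It is only an illustration, not a recommended implementation: the record layout, the two rules (a non-empty name and a plausible email address), and the function names `validate`, `accept_at_entry`, and `cleanse` are all assumptions invented for this example. Defect prevention applies the rules before a record is stored, while data cleansing applies them to records that are already there.

```python
import re

# Hypothetical rule set: a record needs a non-empty name and a plausible email.
# All field values are assumed to be strings.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate(record):
    """Return a list of rule violations for one record (empty means clean)."""
    problems = []
    if not record.get("name", "").strip():
        problems.append("name is missing")
    if not EMAIL_PATTERN.match(record.get("email", "").strip()):
        problems.append("email is malformed")
    return problems


def accept_at_entry(store, record):
    """The Big Q: refuse a defective record before it is ever stored."""
    problems = validate(record)
    if problems:
        raise ValueError("; ".join(problems))
    store.append(record)


def cleanse(store):
    """the little q: repair what can be repaired, report what cannot."""
    cleaned, rejected = [], []
    for record in store:
        # Trim stray whitespace and normalize the email to lower case.
        fixed = {key: value.strip() for key, value in record.items()}
        fixed["email"] = fixed.get("email", "").lower()
        # Records that still break a rule need manual attention.
        (rejected if validate(fixed) else cleaned).append(fixed)
    return cleaned, rejected
```

Note that both functions share the same `validate` rules. In this sketch the difference between the two Q’s is not what gets checked but when: at the point of entry, or after the defect already exists.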
Doing data quality well is a far, far better thing to do . . .
But how exactly—do you—do DQ?
Are you a Data Quality Theoretician or a data quality practitioner?
In A Tale of Two Q’s, which Q are you?
Or is this apparent struggle all just Much Ado About Nothing?
(And yes, I realize that I just mixed my literary metaphors.)
Epilogue
Perhaps data cleansing should be used to correct your critical business problems today, while defect prevention is busy building a better tomorrow for your organization?
Maybe theory and practice merge, combining data cleansing and defect prevention into your hybrid discipline for enterprise-wide data quality?
What say you?





Graham Rhind
Jan 20, 2010
I’m a Big Q man and get very frustrated with organisations’ concentration on little q.
Two comments:
1) Data doesn’t suddenly and magically appear in organisations and unexpectedly start causing problems. Organisations collecting data have an opportunity at the beginning of the process of preventing data quality issues – the Big Q – and preventing the little q problems later on. The problem is that at that stage there is usually no q at all, and this is down to business structures – you can’t persuade a company to spend money on prevention because you can’t prove ROI – companies would rather spend to solve a problem they can measure than spend less preventing a problem whose financial consequences they could only have guessed at.
2) Let’s be honest: little q may be the reality, but name me an organisation, any organisation, anywhere, that gets beyond the patient life-saving stage and then starts working on the Big Q. Even companies that manage to save their patients then take their eye off the ball, rest on their laurels, and let their existing data deteriorate whilst continuing to allow bad data into the company through all sorts of entryways.
Nope, for me it has to be Big Q, no contest!
Phil Simon
Jan 20, 2010
Jim
I enjoyed this post. Lamentably, you are right. “The little q” indeed “is undeniably the actual practice of data quality.”
I often wonder why organizations are so reactive about DQ and really most things, when you think about it. Some have called me negative or pessimistic for focusing on what could happen. I suppose that I have deep-seated childhood issues or something.
What do you think it would take for organizations to embrace The Big Q?
Phil Wright
Jan 20, 2010
Great post Jim.
I’m an advocate of The Big Q, as we all should be, however, there have been times where I have had no choice but to put on the little q hat.
For instance, in cases of departmental data quality issues, perhaps it hasn’t been possible to find the correct sponsor, or fixing the issues at source has been deemed ‘low priority’ or ‘high cost with little return’. The result is a ‘data quality firewall’ type of affair, where a department profiles incoming data to ensure it meets their standards, and if not, an issue is flagged to fix.
A perfect example of “treating the symptoms but not curing the disease”. Not through choice you understand, but through necessity.
Dylan Jones
Jan 20, 2010
I think you’ve been tapping my phone line, that’s what I say!
Just had a conversation covering exactly this with a firm proponent of the Big Q (I think you can guess who). We agreed to differ, but I think you’ve opened up a big can of worms.
The most commented post on our site this year (by a country mile) was related to whether data cleansing is now a reality for data-driven businesses (http://bit.ly/3g0V5K), I think this demonstrates how deep the opinion runs on this topic.
I think the challenge I lay before the Big Q’ers is one of practicality – yes, we all agree that little q creates another cost-centre but is it really non-value adding in the eyes of the business if it is truly solving their pain? In many cases data cleansing creates a clear ROI, a repetitive cost indeed, but a positive return all the same.
The curve ball I always throw into this debate is one of the time-boxed data migration. If we have 12 months to collapse multiple systems into a new target we simply have no option but to implement lots of little q activities, there is neither the time nor the value to implement Big Q.
And we see this problem every day, the poor folks in business just want to get the job done. Is it really fair to expect them to wait until the organisation wakes up to its data governance responsibilities?
The problem I see with a “no-cleanse” approach is that defect prevention at source is often, well, next to impossible isn’t it? The best we can hope to do in many cases is to push measurement, resolution and monitoring further up the information chain. We often can’t change those ageing apps to integrate complex data quality rules to check that widget belongs to a master list on the other side of our enterprise.
That said, the most rewarding projects I’ve been involved in have been where we completely eliminated defects at source, the effects can often be felt right across the business immediately, this is still the goal to aim for.
So, I’m very much a “progressive Q” kind of practitioner. Fight the fires today and then practice fire prevention tomorrow (I accept that is a complete plagiarism of your closing statement!).
Great debate – get your slippers and cocoa ready, this is going to run and run…
Jim Harris
Jan 20, 2010
Thanks everyone for your comments, your feedback is greatly appreciated.
@Graham – So, data isn’t like Gremlins running around computer systems at night and causing trouble while no one is watching? Well, at the very least, you probably shouldn’t get your data wet – or feed it after midnight.
Seriously though, you (as always) make excellent points. The ROI of defect prevention is completely theoretical – but nonetheless sound and vitally important to avoiding dire financial consequences. This makes The Big Q your data quality insurance policy – which has the same sales difficulty as all insurance policies – do I really need to insure against the possibility that a tree will fall on my car?
@PhilSimon – Yes, there is a deep psychological issue at work here – for both organizations and the individual professionals they employ – we all have a seemingly natural tendency to be reactive and not proactive – and not just when it comes to data quality.
Unfortunately, just like many things in life, the importance of data quality is often a lesson that can only be learned and not taught – meaning that many organizations have to face the harsh failure of only relying on the reactive approach, before embracing the proactive – or more likely, a hybrid approach.
@PhilWright – You have summed up nicely the paradox of data quality – we all know The Big Q is the vastly superior approach, but reality sometimes necessitates the little q – which is why I advocate the hybrid approach – we truly need both Q’s to do DQ well.
@Dylan – Excellent point about time-boxed data migrations, where the little q often rules (when data quality isn’t completely ignored). I have witnessed the same on many new system implementations where a massive little q is used to perform data cleansing prior to the initial load.
I also definitely agree that a completely “no-cleanse” approach (i.e., it’s The Big Q or Nothing) is doomed to fail – as you stated, although it is right to push defect prevention as far up the chain as possible, some source systems simply cannot be re-engineered – especially many legacy applications.
I also like your fire-fighting analogy for the hybrid approach:
“Fight the fires today and then practice fire prevention tomorrow.”
And I encourage everyone to check out more about this debate on Data Quality Pro:
http://www.dataqualitypro.com/data-quality-home/debate-is-data-cleansing-a-reality-for-the-data-driven-busin.html
Garnie Bolling
Jan 20, 2010
I reply: it depends Jim.
Where am I, and what is in front of me ? Static, New, Local, Creation phase… oh yes Big Q…
Existing, Complex, Large, Over reaching phase… well little q.
I look at this with a twisted view on physics: String Theory is the latest “Theory of All things, the one molecular and massive body equation.” BUT, we see in 3 dimensions, and String Theory says there are 11 dimensions…
How do you test your 11 Dimensions ? or Big Q for your Enterprise ? obviously, lots and lots of assumptions (and faith, if you allow that)
But we can’t stop living in the 3 dimensions we see now: it is real, we see it, and it directly impacts our daily life. That is the little q, with known issues and all its quirks.
Thanks Jim, I always enjoy your posts, and your insight. Allowing us to “think and contemplate” what is happening out in the Data Quality world.
Jim Harris
Jan 20, 2010
@Garnie – Thanks for your comment. Yes, “it depends” is the only honest reply in my opinion as well.
Excellent use of String Theory – I occasionally read books about physics (in layman’s terms) and it occasionally finds its way into my writing since I see in it so many excellent metaphors and analogies.
Perhaps a hybrid discipline for enterprise-wide data quality, which combines data cleansing and defect prevention, is the best “Theory of Everything” for Data Quality?
Thorsten
Jan 22, 2010
Jim,
another thoughtful and great post!
My first reaction was that you make it sound like two different, almost mutually exclusive approaches. This is where I disagree (a bit).
I think that the correct approach is working on “both q and Q” (is there a letter for that? maybe q|Q?) at the same time. So if you have to fix a problem (q), always add a bit of prevention (Q). Same the other way around: When you work on prevention (Q), always look at the data and “prove” that the prevention is necessary (and correct the identified records).
For me, this is also a “feature” of Deming/TQM cycles that should be the basis for DQM as well.
What are your thoughts?
Thanks
Thorsten
Jim Harris
Jan 22, 2010
Thorsten,
Thanks for your excellent comment.
The reason that I wrote this post as an either/or struggle is because so many people I encounter within the industry describe it that way – it is perhaps the most deeply polarizing debate around data quality.
Some advocates of Deming/TQM evangelize The Big Q as the ONLY acceptable approach – again, as I stated in the post – by playing the Maturity card, which I have always found a little insulting and also unrealistic. Yes, organizations and practitioners need to advance their understanding of data quality and include the implementation of defect prevention. However, today’s realities include defective records that must be cleansed before they are used as the basis to make a critical business decision.
Therefore, I definitely agree with you that the Two Q’s are not mutually exclusive approaches – I have always advocated a hybrid approach that combines them both into a single best practice.
Best Regards,
Jim
William Sharp
Feb 03, 2010
Loved the post, Jim! It addresses “a day in the life” of a DQ professional. I agree that the little q gets more attention than the big Q. However, this seems logical given how most data quality problems are identified and reported. After all, designing a system with the big Q in mind is almost admitting that you’ll mess up somewhere along the line, no? Maybe I’m short-sighted on this one? Nonetheless, this post is a great explanation of the DQ spectrum! Great job (as usual)!
Jim Harris
Feb 03, 2010
William,
Thanks for your great (as usual) comment.
You raise a common challenge – it’s always more difficult to get funding for preventative measures.
As you noted, many might ask:
“You want us to invest in building system features based on the possibility that something may go wrong with the data in the future that we are not seeing happening right now?”
As opposed to the relative ease of getting funding for reactive measures, where (to borrow Dylan’s analogy) you don’t need to convince people that putting out the blazing fire that is burning down their business would be a really good idea.
However, practicing fire (i.e., defect) prevention is obviously important and many fires could have been avoided (or at least minimized) if we had done a better job building the system “up to code” to extend the building analogy – such as not constructing our (data ware)house out of obviously flammable material (i.e., no validation rules enforced during data entry, really?).
Of course, to continue the analogy, practicing fire prevention will not guarantee that a fire couldn’t still happen – kind of like the difference between flame-resistant and non-flammable materials.
Something in the building is capable of catching fire – and fires always find a way to spread – and so do data quality problems . . .
Best Regards,
Jim