Tag Archives: data cleansing

The Fate is Not in the Stars, But in Ourselves

Aug 16, 2011 by

Errors do not appear in data randomly. In fact, the very concept of “error” may itself be flawed. The data itself is not the error; rather, it is the result of a scenario in which one or more events caused the data to become inconsistent with the real-world entity being modeled. This could be a user error (such as typing the wrong address), or it could be a sequence of other events that the data did not “track.” For an inconsistent address, that might mean that the individual moved, the house was destroyed in a hurricane, or the Post Office modified the ZIP codes.

The Nature of the Beast

Aug 09, 2011 by

In my last entry I started to ask about the difference between the current view of “proactive data quality,” by which we really mean “early reactivity to known errors,” and truly proactive data quality, in which we anticipate, find, and eliminate errors latent within our existing contexts. I gave two examples of error detection.

The Persistence of Error

Aug 02, 2011 by

A few weeks ago I posted a blog entry about analyzing the existence of errors in a data set as a way of anticipating data quality flaws and failures before they incur any business impact. More to the point, we always talk about being “proactive” when it comes to data quality inspection, but what is normally meant is continuous monitoring of compliance with data quality rules of which we are already aware. If we really think about that, are we being proactive, or just arranging to be reactive earlier in the process?

Gertie the Sea Lion

May 16, 2011 by

While on Santa Cruz Island in the Galapagos, we stayed at a great bed and breakfast called The Solymar. It has a swimming pool, which was one of the reasons I picked this establishment. After a long day of hiking across broken-up lava to get to the “special” swimming hole of the locals, I went swimming at a great little beach, then chose to come back to the hotel instead of ocean kayaking. I was a bit tired from all that hiking, and I swear to you, I could hear the bartender at the pool calling me to come back for a cold refreshing beverage.

Controlled Data Cleansing

Oct 12, 2010 by

Once we have come to grips with the fact that it is not a cardinal sin to correct a data error when necessary, the next set of questions centers on the pragmatic aspects of data cleansing, such as:

- What kinds of data cleansing will we do? There is a broad range of corrections that can be applied, ranging from…

Quality vs. Cleansing

Oct 05, 2010 by

Data quality consultants consistently beat the drums about the difference between data quality management and data cleansing. In fact, there may even be a perceived level of condescension. Data cleansing? Hah, of course you would never change the data when you could eliminate the root cause of the introduction of errors. Why, that is preposterous!

Or is it? Actually, let’s…