Tag Archives: ETL

Facing Maturity – Your Baby is NOT Really that Pretty, but Is the Baby Really Ugly?

Feb 27, 2012 by

A friend of mine, in her new position, has discovered that the database design for the data warehouse is just not that easy to use. So, she called me and asked, “Joyce, how do I tell them their baby is ugly?” After I burst out laughing for a few minutes, I asked her exactly what was not working for her. Listed below are her issues with the current data warehouse:

 

What’s the Big Deal?

Dec 23, 2010 by

I am winding up what has been at times a slightly stressful consulting gig. For me, this is like shooting an 88 in golf: not great by the standards of many folks but, for me, a comparatively successful round. We’ve had some challenges moving data around but, in the end, my client is happy with what I accomplished – and the time and expense involved.

 

The Limitations of Band-Aids

Nov 11, 2010 by

Most data management professionals are adept at trickery. No, we don’t lie to folks and maintain elaborate ruses – at least most of us don’t. Rather, when it comes to data, we’re skilled at the art of manipulation. We get data in one form and, for a wide variety of reasons, change it to another. More often than not, our organizations would benefit from addressing some of these data management problems at a core level. Unfortunately, organizations are not run by folks like us. When push comes to shove, we have to heed the oft-hurried calls of CXOs and “just get it done.” In this post, I’m going to discuss the limitations of Band-Aids.

 

70-80%

Nov 09, 2010 by

ETL accounts for 70-80% of the effort of building a data warehouse. We all know this – it is conventional wisdom, and is a metric used many places to justify the use of data integration tools for business intelligence projects. But where did this estimate come from?

The “70-80%” figure is widely cited. In about 10 minutes of searching on Google, I found:

a blog entry from 2009,

an article reference from 2004,

a 2002 article from the UK Operations Research Society,

one of my own articles from 2003,

various tool vendors going as far back as 1997,

and numerous other citations. But where is this estimate from? And more interestingly, is it valid today?

 

Facebook and Common Sense!

Apr 05, 2010 by

My niece just became a fan of “The problem with Common Sense…..Is that it’s not that Common”. When I saw this message show up on her Facebook wall, I started thinking about how I assume that everyone knows how vital data integration and master data management (MDM) are to an organization. I mean, OMG isn’t it COMMON SENSE?

Wouldn’t it be even better if the MDM software, data quality software, and data profiling software were all on the same platform? That is, they would share one repository (truly integrated), run from one software installation (i.e., one engine with multiple options), and offer one user interface to learn. Wouldn’t this be great for our data management initiative?

 

Data Modeling for BI (Business Intelligence)

Mar 29, 2010 by

I was having lunch with a friend a couple of weeks ago, and she told me she liked data modeling for data warehouses because it was so easy. I thought about this for a while and decided that data modeling for a business intelligence or data warehouse initiative could be approached differently depending on the circumstances. For example:

  1. If the enterprise data warehouse (EDW) is already complete and the data is available, then data modeling would be very physical and would require creating new data marts based on the business requirements.
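As a rough illustration of case 1, here is a minimal sketch of a data mart built physically on top of an existing EDW. It assumes a hypothetical EDW star schema (`edw_sales_fact`, `edw_date_dim`, `edw_customer_dim`) and a single hypothetical business requirement, monthly sales by region. Python's built-in `sqlite3` module is used only so the example runs anywhere; all table and column names are invented for illustration.

```python
import sqlite3

# Hypothetical EDW tables; in a real warehouse these already exist and are loaded.
EDW_DDL = """
CREATE TABLE edw_date_dim (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE edw_customer_dim (customer_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE edw_sales_fact (date_key INTEGER, customer_key INTEGER, amount REAL);
"""

# The mart is shaped by one (hypothetical) business requirement:
# monthly sales totals and order counts by region.
MART_SQL = """
DROP TABLE IF EXISTS mart_monthly_region_sales;
CREATE TABLE mart_monthly_region_sales AS
SELECT d.year, d.month, c.region,
       SUM(f.amount) AS total_sales,
       COUNT(*)      AS order_count
FROM edw_sales_fact AS f
JOIN edw_date_dim     AS d ON d.date_key = f.date_key
JOIN edw_customer_dim AS c ON c.customer_key = f.customer_key
GROUP BY d.year, d.month, c.region;
"""


def build_mart(conn: sqlite3.Connection) -> list[tuple]:
    """Rebuild the data mart from the EDW tables and return its rows."""
    conn.executescript(MART_SQL)
    return conn.execute(
        "SELECT year, month, region, total_sales, order_count "
        "FROM mart_monthly_region_sales ORDER BY year, month, region"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(EDW_DDL)
    # A few sample EDW rows standing in for data that is already loaded.
    conn.executemany("INSERT INTO edw_date_dim VALUES (?, ?, ?)",
                     [(20120101, 2012, 1), (20120201, 2012, 2)])
    conn.executemany("INSERT INTO edw_customer_dim VALUES (?, ?)",
                     [(1, "East"), (2, "West")])
    conn.executemany("INSERT INTO edw_sales_fact VALUES (?, ?, ?)",
                     [(20120101, 1, 100.0), (20120101, 1, 50.0),
                      (20120101, 2, 75.0), (20120201, 2, 20.0)])
    for row in build_mart(conn):
        print(row)
```

Because the EDW already holds clean, integrated data, the modeling work here is mostly physical: deciding the grain (month and region) and materializing the mart for that one requirement. A different requirement would simply get its own mart query.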

 

Cloud Computing and ETL

Nov 03, 2009 by

I have been working on a small project exploring cloud computing, high-performance programming models, and transient applications such as the T (transformation) part of ETL. By “transient” I mean that the operation is not an ongoing operational activity but is basically a batch process that is executed when needed. That being said, if the extracted data sets must undergo many modifications (parsing, standardization, normalization, aggregation, reduction, summarization, etc.), and that takes a long time on a single server, would it not make sense to speed up execution by employing multiple processors? This certainly makes sense if the operations are largely independent, and much of the T is. For example, parsing the tokens out of name strings can be done in parallel, as can standardization and normalization.
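To make the idea concrete, here is a minimal sketch of a record-independent transformation (name parsing plus simple standardization) spread across several processes with Python's `concurrent.futures.ProcessPoolExecutor`. The parsing rules, function names, and sample data are hypothetical and far simpler than what a real data quality tool would apply; a pool of processes on one machine stands in for the multiple cloud processors discussed above.

```python
import re
from concurrent.futures import ProcessPoolExecutor


def parse_name(raw: str) -> dict:
    """Parse and standardize one name string (hypothetical, simplified rules)."""
    # Standardize: trim, collapse internal whitespace, and title-case.
    cleaned = re.sub(r"\s+", " ", raw.strip()).title()
    # Handle "Last, First Middle" by moving the surname to the end.
    if "," in cleaned:
        last, rest = (part.strip() for part in cleaned.split(",", 1))
        cleaned = f"{rest} {last}".strip()
    tokens = cleaned.split(" ") if cleaned else []
    return {
        "first": tokens[0] if tokens else "",
        "middle": " ".join(tokens[1:-1]),
        "last": tokens[-1] if len(tokens) > 1 else "",
    }


def transform_parallel(names: list[str], workers: int = 4) -> list[dict]:
    """Apply parse_name to every record across a pool of worker processes."""
    # Each record is independent, so the work can be split freely;
    # map() returns results in the same order as the input.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(parse_name, names, chunksize=256))


if __name__ == "__main__":
    raw_names = ["  mary   jones-smith ", "Smith, John Q", "ada lovelace"]
    for row in transform_parallel(raw_names, workers=2):
        print(row)
```

The `chunksize` argument batches records sent to each worker, because shipping records between processes has a cost; parallelism only pays off when the per-record transformation work outweighs that overhead. Spreading the same work across separate machines would need a distributed framework, which this sketch does not attempt.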

 

Data Integration – Where Does it Fit in an Organization?

Jun 09, 2009 by

In my last couple of blog posts, I concentrated on Data Integration as a practice or discipline. I also talked about the need for Data Integration in an organization. Data Integration is just one part of an Enterprise Information Management (EIM) practice at your organization. EIM has the following components:

 

Problems with Data Integration

Jun 02, 2009 by

Data integration is not a new problem. In the 1960s, companies started adopting database management systems instead of flat file systems as a corporate repository for information. This thinking naturally led to the integration of data into one place for the following reasons:

  1. One source of data to maintain for the corporation
  2. One platform to maintain for the corporation
  3. Fewer security issues – every system had the same levels of security applied during the application