I am winding up what has been at times a slightly stressful consulting gig. For me, this is like shooting an 88 in golf: not great by the standards of many folks but, for me, a comparatively successful round. We’ve had some challenges moving data around but, in the end, my client is happy with what I accomplished – and the time and expense involved. As I prepared to walk away, though, a few last-minute changes caused a bit of a ruckus. In this post, I describe the perils of moving targets as they relate to data management.
After six weeks, most of the main financial table layouts, indexes, and relationships to one another had been more or less finalized. Taking it down a level, I had defined each field’s data type and I knew which files would arrive with headers. The file layouts provided by the client had been reasonably solid, and I knew what I would get.
Just as important, I knew what I wouldn’t get. I knew what I would have to create via batch processes, temp tables, funky if-then statements (great band name, eh?), and other creative methods. In short, while not elegant, I had built a working ETL tool that did what my client needed when we started talking in late October.
Things were good. Until they weren’t.
Some IT folks had made some ostensibly minor changes to a few files that were, at least to them, “unimportant.” These changes included:
- A numeric field now contained letters.
- Another field previously supplied was now missing.
- Commas that had been stripped were now included.
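Each of those surprises is easy to catch before a load if you check for it. Here’s a minimal sketch of the kind of pre-load validation that would flag all three; the column count, field positions, and rules are illustrative assumptions, not the client’s actual layout:

```python
import csv
import io

EXPECTED_COLUMNS = 5   # assumed column count for this feed
NUMERIC_FIELDS = {0}   # assume the first column must be purely numeric

def validate_rows(text):
    """Return (row_number, problem) tuples for a delimited feed."""
    problems = []
    for i, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        # A missing field or an unstripped comma changes the field count.
        if len(row) != EXPECTED_COLUMNS:
            problems.append((i, f"expected {EXPECTED_COLUMNS} fields, got {len(row)}"))
        # Letters in a numeric field break downstream type conversions.
        for col in NUMERIC_FIELDS:
            if col < len(row) and not row[col].isdigit():
                problems.append((i, f"field {col} not numeric: {row[col]!r}"))
    return problems

# One clean row, one with a letter in the numeric field, one missing a field:
feed = ("123,ACME,RETAIL,100.00,USD\n"
        "12A,ACME,RETAIL,100.00,USD\n"
        "456,ACME,RETAIL,100.00\n")
for line_no, msg in validate_rows(feed):
    print(line_no, msg)
```

A check like this, run the moment a file lands, turns a silent downstream reporting error into a loud, immediate one.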
Admittedly, these changes were not to “main” files; they were made to files that populated tables designed to produce supplemental reports. When I pointed out these changes to my boss, he initially seemed dubious. “What’s the big deal?” he quipped.
The Big Deal
While he’s come a long way on the technical side in the six weeks that I’ve known him, he doesn’t quite know what he doesn’t know. I showed him how, for example, an extraneous comma threw off all of the fields during the import process. I could no longer accurately delimit the data; every field shifted one position over. So, instead of landing in “CUSTOMER_RELATIONSHIP,” the “123” code now wound up in the “CUSTOMER_TYPE” field.
Let’s follow the chain here. The value “123” isn’t a valid value in the CUST_TYPE table, causing additional queries to exclude that information. The end result: a few lost transactions that prevented a trial balance report from reconciling with a GL transaction report. And with the 30 or so steps involved in morphing the data, the source of the error was far from obvious, wasting precious time and causing a great deal of frustration. It’s not like you can Google this type of error.
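You can see the whole failure in a few lines. This is a hypothetical four-column layout (the real files had far more fields); one unstripped comma in the customer name shifts everything right, and “123” lands under the wrong header:

```python
import csv
import io

# Illustrative column layout, not the client's actual one.
HEADER = ["CUSTOMER_ID", "CUSTOMER_NAME", "CUSTOMER_RELATIONSHIP", "CUSTOMER_TYPE"]

clean = "1001,Acme Ltd,123,RETAIL\n"
dirty = "1001,Acme, Ltd,123,RETAIL\n"  # unstripped comma inside the name

for label, raw in [("clean", clean), ("dirty", dirty)]:
    row = next(csv.reader(io.StringIO(raw)))
    record = dict(zip(HEADER, row))
    print(label, record["CUSTOMER_TYPE"])
```

The clean row parses CUSTOMER_TYPE as “RETAIL”; the dirty row puts “123” there instead, and any later lookup against the CUST_TYPE table silently drops the record. No error message, just missing transactions.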
You might think that data management types are a little uptight. Why do we insist on so much precision and resist ostensibly minor changes to layouts? Why are we sometimes downright defiant when you want to make a little tweak? Because we’re not creating simple text documents here. We’re parsing vast amounts of data, taking into account all of those little complexities that only make sense to you. While you may not be able to be logical with your data, at least be consistent. You’ll make both of our jobs much, much easier.
What say you?