David Loshin, Cowbell, and the Myth of (Completely) Unstructured Data
Mar 01, 2012 by Phil Simon in Data Management
Most people don’t think of a book as data.
I do.
In fact, while I was editing 101 Lightbulb Moments in Data Management, I used a very unorthodox, very structured approach to manage the hundreds of posts from the roundtablers on data-oriented topics. Why? The short answer: it made sense. The longer answer: Ultimately, I wanted the book to be balanced in a number of ways, including:
- number of posts per contributor
- number of posts by topic
- number of posts by contributor by topic
I strived to represent each of the contributors and topics as equally as possible. While I consider myself friendly with each member of this forum, I didn’t want someone complaining to me that Jim Harris’s 18 posts trumped his or her 14 posts. I also didn’t want the data quality section, for instance, to contain 100 pages while there were only 20 pages on an equally important topic like data governance.
And, I’ll admit it. I didn’t want to tick off Dylan Jones.
You don’t either.
Ever.
Also, I wanted to tell the folks at DataFlux exactly where we were light. I’d frequently tell Scott Batchelor that we needed more MDM content or “more Loshin.” (In case you were wondering, there’s no such thing as “enough Loshin.”)
It’s like cowbell.
Structure This!
Now, I’m a big pivot table guy (link may not work due to SOPA protests). I exported all WordPress posts from this site into a flat file, imported it into Excel, saved it as a proper workbook, and dutifully kept track of where we were at any given point. I could always tell you exactly how many lightbulb moments had been chosen, how many Jill Dyché had contributed (and about what), and even how many posts contained Rush references (just about all of them).
Rush is also like cowbell.
I seriously doubt that too many people have thought of editing a book in this manner, but I stand by my methods. You see, to me, just about everything is data. A book is no exception. Yes, the data is typically unstructured, but that doesn’t mean that it has to stay that way. Nor does it mean that you can’t apply a little structure.
No, I didn’t count how many times Jim Harris riffed on a song or the average length of David Loshin’s posts–although I could have. That would have been overkill. Still, being able to simply answer questions made managing the whole project much, much easier than it would otherwise have been.
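For readers who would rather script it than pivot it, here is a minimal sketch of the same three tallies (posts per contributor, per topic, and per contributor by topic) in Python. This is not the workbook I actually used. The column names (author, topic, title) and the sample rows are assumptions made up for illustration, and a real WordPress export would need its own mapping.

```python
import csv
import io
from collections import Counter

# Hypothetical flat-file export. The column names and rows below are
# invented for illustration and are not taken from the real data.
SAMPLE_EXPORT = """author,topic,title
Contributor A,data quality,Post 1
Contributor A,data governance,Post 2
Contributor B,data quality,Post 3
Contributor B,MDM,Post 4
Contributor A,data quality,Post 5
"""


def tally_posts(flat_file_text):
    """Count posts per author, per topic, and per (author, topic) pair."""
    by_author = Counter()
    by_topic = Counter()
    by_author_topic = Counter()
    for row in csv.DictReader(io.StringIO(flat_file_text)):
        by_author[row["author"]] += 1
        by_topic[row["topic"]] += 1
        by_author_topic[(row["author"], row["topic"])] += 1
    return by_author, by_topic, by_author_topic


by_author, by_topic, by_author_topic = tally_posts(SAMPLE_EXPORT)

# Print the busiest contributors and topics first.
for author, count in by_author.most_common():
    print(f"{author}: {count} posts")
for topic, count in by_topic.most_common():
    print(f"{topic}: {count} posts")
```

The three counters correspond to the three balance checks listed earlier, so a thin section or an over-represented contributor shows up as soon as the counts are printed.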
Simon Says
I’d argue that most people would benefit from approaching their unstructured data in a similar manner. To wit, there’s no such thing as completely unstructured data. It’s a myth, a spook story.
You can always assign times, dates, and handles to tweets. You can use semantic analysis on blog posts and web pages.
Just because data is initially unstructured doesn’t mean that it has to stay that way.
Period.
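To make the tweet example concrete, here is a rough sketch of what "applying a little structure" might look like in Python. Everything in it is an assumption for illustration: the TweetRecord fields, the structure_tweet helper, the handle, and the sample text are all hypothetical, and it does not use any real Twitter API. The point is only that a handle, a timestamp, and a few pattern matches turn free text into something you can count and filter.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class TweetRecord:
    """A hypothetical structured view of one tweet."""
    handle: str
    posted_at: datetime
    mentions: List[str]
    hashtags: List[str]
    text: str


def structure_tweet(handle, iso_timestamp, text):
    """Attach a handle and timestamp, and pull mentions and hashtags from the text."""
    return TweetRecord(
        handle=handle.lstrip("@"),
        posted_at=datetime.fromisoformat(iso_timestamp),
        mentions=re.findall(r"@(\w+)", text),
        hashtags=re.findall(r"#(\w+)", text),
        text=text,
    )


# Made-up sample input, not a real tweet.
record = structure_tweet(
    "@example_user",
    "2012-03-01T09:30:00",
    "More #cowbell please, @another_user #data",
)
print(record.handle, record.posted_at.date(), record.hashtags, record.mentions)
```

The original text is kept in the record alongside the extracted fields, so nothing is lost by adding structure.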
Feedback
What say you?





Jim Harris
Mar 01, 2012
Well structured blog post, Phil.
Yes, data management always needs a little more Cowbell, a little more Loshin, and a lot more Rush.
Unstructured data represents the largest segment of the rising data volumes we are seeing today. I definitely agree that completely unstructured data is a myth, and I think the disruptive paradigm shift that we face is reevaluating how much structure has to be imposed on data before it can be used.
Historically, data had to be structured (and cleansed, transformed, integrated, etc.) before it was used. Not only is that approach becoming less practical because of how much data we are dealing with, but the reality is that data doesn’t always need a high degree of structure in order to be useful.
And I think books are an excellent example of deriving value from somewhat unstructured data, which is why I used data management books to discuss deriving value from unstructured data in my video post DQ-View: Data Is as Data Does.
Exit the Cowbell Warrior
marc smith
Mar 01, 2012
With all due respect to David, Cowbell is like Rush.
David Loshin
Mar 02, 2012
Somewhere around 250 words is the average.
Phil Simon
Mar 02, 2012
Thanks for the comments, guys. If a book can be made more structured, then I think just about anything can.