Big Data “In the Air Tonight”
May 14, 2012 by Rich Murnane in Big Data
Rich Murnane continues Big Data Week at the Roundtable, as he wonders if Big Data is all a misunderstanding…
According to the all-knowing Wikipedia, when singer/songwriter Phil Collins wrote the 1981 hit In the Air Tonight, the lyrics were not about a drowning incident he witnessed, as most people thought. The lyrics were written spontaneously while Collins was reminiscing about the “anger he felt after divorcing his first wife Andrea in 1979”.
So, a misunderstanding and an old problem (love), sounds a bit like “Big Data” to me…
“Big Data” a misunderstanding?
“Big Data” means different things to different people, hence the misunderstanding. To many techies “Big Data” means technologies such as those in the Hadoop family, along with their associated distributed physical data architecture.
To the business executive, “Big Data” is all about making the business better by “data mining” through these piles of crap data to find insights about the business which would never have been available had the executive not cut the check for managing all this data.
To the DataGeeks among us, it’s all about V3 (Volume, Velocity, Variety). Organizations which never had to manage terabytes and petabytes of data (Volume) now have to figure out how to manage it effectively and efficiently. Our data is now growing much faster than we’ve experienced in our careers. Are there things we should be doing differently if our databases are growing at a rate of 500% per year? What about 2000% a year? Talk about Velocity. And talk about Variety: we’re now concerning ourselves with machine-generated data such as sensor and log data. Add unstructured and semi-structured content such as video, binary documents, and XML documents, and we’re no longer thinking in rows & columns. It’s a brave new world.
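To get a feel for what growth rates like 500% or 2000% a year mean, here is a rough back-of-the-envelope sketch in Python. It assumes that "growing 500% per year" means the database adds five times its current size every year (so it ends each year at 6x), that the growth compounds, and that we start from a made-up 1 TB. The function name and the starting size are purely illustrative.

```python
def projected_size_tb(start_tb: float, annual_growth_pct: float, years: int) -> float:
    """Project database size under compounding annual growth.

    Assumption: growth of N% means the size increases BY N% each year,
    so 500% growth multiplies the size by 6 every year.
    """
    yearly_factor = 1 + annual_growth_pct / 100
    return start_tb * yearly_factor ** years


if __name__ == "__main__":
    start_tb = 1.0  # hypothetical starting size, in terabytes
    for growth_pct in (500, 2000):
        for year in range(1, 4):
            size = projected_size_tb(start_tb, growth_pct, year)
            print(f"{growth_pct}% per year, after {year} year(s): {size:,.0f} TB")
```

Under these assumptions, a database that looks small today outgrows its original capacity planning within a year or two, which is why the "should we be doing things differently?" question matters.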
“Big Data” an old problem?
Large government agencies along with the big players in the web have been managing “Big Data” for much longer than the rest of us. There are lessons to be learned from all these folks, particularly about parallel processing and distributed physical data architectures. Large data processing shops such as credit card companies typically managed large datasets by purchasing big old mainframes and storing everything in file-based data stores. In the late 1990s when I was a DBA on a Very Large Database (VLDB), I remember opening a trouble ticket with my RDBMS vendor asking them “ahh, is there anything I need to do if my database is growing 200% per month?”. Folks adding value to their organizations by using data isn’t anything new either. What’s new is the attention this facet of a business is getting these days.
The best part about “Big Data” is that people are really starting to understand that data is an asset to an organization. The part that keeps me up at night is making sure everyone understands that the same best practices and data management principles apply to data sets of any size, big or small.
“Big Data” is here In the Air Tonight, don’t you think?
Until next time…Rich

Monark Vyas
Jun 26, 2012
Hi Rich
Though I agree with your points, I am still unable to find someone who has taken up the discussion around whether the Big Data hype is real or not. As you mention, Big Data has been around from the very first times we started working on data. Systems/technologies have normally been overwhelmed with data (V3).
Can you provide your perspective on whether all organizations are jumping on the bandwagon too early or prematurely? I have observed everyone wants to do Big Data because it’s the next/current big thing. There are so many challenges within their organizations around maturity which they do not count on. Orgs want to integrate data from Facebook and Twitter, while they haven’t yet reached a level where they are effectively using the data that they have internally. Are orgs biting off more than they can chew?
Rich Murnane
Jul 25, 2012
Hello Monark,
I’m sorry it has taken me close to one month to reply to your comment. For one reason or another I didn’t even know you had made the comment.
Regarding “is Big Data real or not”, I’d say “it depends”. I wrote a little about this in the following post and I might use your question as a starting point for a future post:
http://www.dataroundtable.com/?p=9171
As for my perspective on whether or not all organizations are jumping on the bandwagon too early, I can’t speak for “all” organizations but there is certainly something to be said about the intrigue of using data to gain a competitive advantage.
Yes, Big Data is in the air. Every executive with a pulse and an MBA is looking to their DataGeeks and asking them “what should we be doing with Big Data?”. Many of us DataGeeks answer the call with things like “install Hadoop” instead of the probably more appropriate reply of “What should we be doing with our business?”. If you keep the conversations about the business objectives and use your data assets (big or small) to better your business, you’ll have a much more productive conversation which will lead to much bigger benefits than just “installing Hadoop”.
As for your question about “orgs biting off more than they can chew?”, some probably are and some probably aren’t. The organizations that manage their current data assets correctly, understand where their business needs to go, and know how data helps achieve these goals should probably go “all in”. The others should start slow.
I really enjoyed Jill Dyche’s post at the following URL which touches on the subject of the “Big Data Gotchas”, which I believe to be very relevant to your questions.
http://blogs.hbr.org/cs/2012/04/how_to_avoid_the_big_data_gotc_1.html
Another interesting point which I haven’t written about yet is that, at first glance, many of the new Big Data technologies appear to be better suited for allowing software engineers to be better DataGeeks, which has essentially created a new type of DataGeek. Historically speaking (and generalizing probably way too much), most DataGeeks were folks who knew SQL, could plug away on a relational database and spin your data six ways to Sunday. These new technologies appear not to have been built for these types of folks; they seem to be more oriented toward the software person who sits up late at night coding and likes to tinker. We’ll see what happens now that we’ve opened the DataGeek market up to these folks. It should be a wild ride.
Thanks for the comment and please reach out to me on LinkedIn if you’d like to connect and keep the conversation flowing offline.
Best…Rich