Rich Murnane continues Big Data Week at the Roundtable, as he wonders if Big Data is all a misunderstanding…
According to the all-knowing Wikipedia, when singer/songwriter Phil Collins wrote the 1981 hit In the Air Tonight, the lyrics were not about a drowning incident he witnessed, as most people thought. The lyrics were written spontaneously while Collins was reminiscing about the “anger he felt after divorcing his first wife Andrea in 1979”.
So, a misunderstanding and an old problem (love). Sounds a bit like “Big Data” to me…
“Big Data” a misunderstanding?
“Big Data” means different things to different people, hence the misunderstanding. To many techies, “Big Data” means technologies such as those in the Hadoop family, along with their associated distributed physical data architecture.
To the business executive, “Big Data” is all about making their business better by “data mining” through these piles of crap data to find insights about the business which would never have been available had the executive not cut the check for managing all this data.
To the DataGeeks among us, it’s all about V3 (Volume, Velocity, Variety). Organizations which never had to manage terabytes and petabytes of data (Volume) now have to figure out how to manage it effectively and efficiently. Our data is now growing so much faster than we’ve experienced in our careers. Are there things we should be doing differently if our databases are growing at a rate of 500% per year? What about 2000% a year? Talk about Velocity. And talk about Variety: we’re now concerning ourselves with machine-generated data such as sensor and log data. Add unstructured content such as video, binary documents, and XML documents, and we’re no longer thinking in rows & columns. A brave new world.
“Big Data” an old problem?
Large government agencies, along with the big players on the web, have been managing “Big Data” for much longer than the rest of us. There are lessons to be learned from all these folks, particularly about parallel processing and distributed physical data architectures. Large data processing shops such as credit card companies typically managed large datasets by purchasing big old mainframes and storing everything in file-based data stores. In the late 1990s, when I was a DBA on a Very Large Database (VLDB), I remember opening a trouble ticket with my RDBMS vendor asking them “ahh, is there anything I need to do if my database is growing 200% per month?”. Folks adding value to their organizations by using data isn’t anything new either; what’s new is the attention this facet of a business is getting these days.
The best part about “Big Data” is that people are really starting to understand that data is an asset to an organization. The part that keeps me up at night is making sure everyone understands that the same best practices and data management principles apply to data sets of any size, big or small.
“Big Data” is here In the Air Tonight, don’t you think?
Until next time…Rich