Where do we store all of this data?
It’s a question that many CIOs are asking themselves these days. Relational databases just aren’t built to store petabytes of unstructured data. Today, the usual suspects include Hadoop as well as columnar, NoSQL, and NewSQL databases. Yes, data storage costs have plummeted over the last fifteen years, but do we really need to store historical information?
Jeff Hawkins doesn’t think so. From a recent New York Times piece:
“It only makes sense to look at old data if you think the world doesn’t change,” said Mr. Hawkins. “You don’t remember the specific muscles you just used to pick up a coffee cup, or all the words you heard this morning; you might remember some of the ideas.”
If no data needs to be saved over a long term and real-time data can stream in all the information that is needed, a big part of the tech industry has a problem. Data storage companies like EMC and Hewlett-Packard thrive on storing massive amounts of data cheaply. Data analysis companies including Microsoft, IBM and SAS fetch that data and crunch the history to find patterns. They and others rely on both the traditional relational databases from Oracle, and newer “unstructured” databases like Hadoop.
Hawkins believes that machine intelligence will soon obviate the need for traditional data storage. So, is he right?
A Definitive Maybe
I have mixed feelings here. On one hand, I’m completely on board with data being a means to an end. He who has the most data doesn’t win. That is, it’s not about the data per se; it’s about what you do with the data. For instance, Amazon, Apple, Facebook and Google are worth a boatload of money because they monetize their data, not because of the “inherent value” of their data. In other words, one shouldn’t think of data as cash. However, if used effectively, data can result in increased revenue and profits. So, if we can make better decisions without storing data in the traditional sense, so be it.
On the other hand, we are not machines. How many of us are comfortable blindly trusting whatever a computer tells us, every time? Most of us want the ability to drill down into the underlying data, whether we usually take advantage of it or not. Plus, as my friend Bob Charette writes, there are major dangers associated with automation. What if we can’t “fix” a computer or machine because we don’t have the ability or the data? What then?
Hawkins may ultimately prove to be right. In 20 years, companies like EMC may go the way of Kodak because we just don’t need to store data anymore. I’m hard-pressed to believe, though, that this day is coming anytime soon.
What say you?
This post was inspired by my friend Alan Berkson.