Tag Archives: unstructured data
Apr 11, 2013 by Phil Simon
I understand big things. Some of my favorite songs exceed 20 minutes. Many of my favorite books push 500 pages. Heat is one of my favorite movies and it clocks in at nearly three hours. Sometimes big ideas just can’t be compressed into small packages.
Mar 08, 2012 by Phil Simon
I’ve noticed that you people spend a great deal of time on social networks, especially when kickers miss field goals in football games. I find this really interesting. It seems like a relatively recent phenomenon. I would think that there’s a great deal of unlocked value in the data generated by social networks such as Twitter, Facebook, and LinkedIn.
Mar 01, 2012 by Phil Simon
Most people don’t think of a book as data.
Nov 08, 2011 by David Loshin
In my last post, I talked about Hadoop and how its core programming model, based on MapReduce, enables elastic yet scalable distributed parallel processing. We should note, though, that the simplicity of the programming model does somewhat restrict its utility when left in the hands of programmers who are inexperienced with parallel programming. So it is worth considering what makes a problem amenable to a Hadoop-type solution.
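As a rough illustration of the kind of problem that fits this model, here is a minimal word-count sketch in plain Python. It does not use Hadoop's actual API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and only mimic the map, group-by-key, and reduce steps on a single machine. The point is that each record is mapped without reference to any other record, and the per-key combination is a simple sum, so the work could in principle be split across many nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (word, 1) pairs for one input record; no other record is needed."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key; addition is order-independent."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence over a list of text records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data is big", "data is data"])
print(counts)
```

A problem tends to suit this style when, as here, the input can be processed record by record and the results for each key can be merged in any order.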
Oct 27, 2011 by Phil Simon
In an era of rapid technological change, I’d argue that the very concept of data has changed a great deal over the last few years. Unstructured data has exploded. We are seeing the growth of columnar databases (think right, not down). Data storage costs have plummeted.
Amid this sea change, one friend remains – like a lighthouse in a violent storm. Amazingly, one simple and longstanding tool continues to provide so much utility.
Oct 06, 2011 by Phil Simon
I’ve been doing a good deal of travel lately and came across a pretty neat iPhone app called TripIt. TripIt provides a single view of your trip, regardless of its components. Book a flight via Travelocity, for instance, and then you can forward the email confirmation of the flight to firstname.lastname@example.org. TripIt allows one to consolidate his/her travel plans in an incredibly useful and straightforward manner.
Jun 30, 2011 by Phil Simon
A friend of mine (call him Larry here) is a hard-core techie. I’m happy when I can keep up with him when we “geek out” and talk about SQL statements, Cartesian products, and types of joins.
Oct 21, 2010 by Phil Simon
I recently attended the DataFlux IDEAS 2010 Conference and had a chance to meet many people I’ve been following for quite some time. In no particular order, they included: Jim Harris, David Loshin, Jill Dyché, Evan Levy, Dalton Cervo, Rich Murnane, and a host of cool DataFlux folks. I also made some new friends. In this post (and a subsequent one), I’ll be discussing the lessons learned at the conference.
I have a strong affinity for round numbers, so here are five of my top 10 lessons.
10. Most organizations are still struggling with basic data management.
Many organizations are trying to get their arms around their data. Besieged by reduced headcounts and massive amounts of data, some really smart people are fighting the good fight. Yet, there are limits to what even the smartest among us can do. SQL statements have their limitations, and many more organizations would benefit from data profiling and quality tools.