Tag Archives: unstructured data

Big Data: Get a Little Bit Pregnant

Apr 11, 2013 by

I understand big things. Some of my favorite songs exceed 20 minutes. Many of my favorite books push 500 pages. Heat is one of my favorite movies and it clocks in at nearly three hours. Sometimes big ideas just can’t be compressed into small packages.

 

Big Data and the 2012 Presidential Election

Nov 29, 2012 by

Soon after the U.S. presidential election, Time magazine ran a fascinating piece on how well the Obama team managed its data. From the piece:

 

Dator on the Carrots and Sticks of Sentiment Analysis

Mar 08, 2012 by

Greetings Earthlings:

I’ve noticed that you people spend a great deal of time on social networks, especially when kickers miss field goals in football games. I find this really interesting. It seems like a relatively recent phenomenon. I would think that there’s a great deal of unlocked value in the data generated by social networks such as Twitter, Facebook, and LinkedIn.

 

David Loshin, Cowbell, and the Myth of (Completely) Unstructured Data

Mar 01, 2012 by

Most people don’t think of a book as data.

I do.

 

What Types of Activities are Suited to Hadoop?

Nov 08, 2011 by

In my last post, I talked about Hadoop and how its core programming model, based on MapReduce, enables elastic yet scalable distributed parallel processing. We should note, though, that the simplicity of the programming model does somewhat restrict its utility when left in the hands of programmers who are inexperienced with parallel programming. So it is worth asking: what makes a problem amenable to a Hadoop-type solution?
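
To make the MapReduce model a little more concrete, here is a minimal single-process sketch in Python, assuming the classic word-count problem as the example. It does not use Hadoop at all: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for illustration, and the shuffle step only imitates the grouping that a framework would perform across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (done by the framework in practice)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over an in-memory list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data about data"]
    print(word_count(docs))
```

In this sketch each document is mapped without reference to any other, and each key is reduced on its own. Problems with that shape are generally the ones that split cleanly into mapper and reducer tasks.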

 

In Praise of the Flat File

Oct 27, 2011 by

In an era of rapid technological change, I’d argue that the very concept of data has changed a great deal over the last few years. Unstructured data has exploded. We are seeing the growth of columnar databases (think right, not down). Data storage costs have plummeted.

Amid this sea change, one friend remains – like a lighthouse in a violent storm. Amazingly, one simple and longstanding tool continues to provide so much utility.

 

On Efficiency, Travel, and Unstructured Data

Oct 06, 2011 by

I’ve been doing a good deal of travel lately and came across a pretty neat iPhone app called TripIt. TripIt provides a single view of your trip, regardless of its components. Book a flight via Travelocity, for instance, and then you can forward the email confirmation of the flight to plans@tripit.com. TripIt allows one to consolidate his/her travel plans in an incredibly useful and straightforward manner.

 

Rush, Dr. Evil, and the Email Approach to Data Management

Jun 30, 2011 by

A friend of mine (call him Larry here) is a hardcore techie. I’m happy when I can keep up with him when we “geek out” and talk about SQL statements, Cartesian products, and types of joins.

 

Lessons and Observations from IDEAS, Part I

Oct 21, 2010 by

I recently attended the DataFlux IDEAS 2010 Conference and had a chance to meet many people I’ve been following for quite some time. In no particular order, they included: Jim Harris, David Loshin, Jill Dyché, Evan Levy, Dalton Cervo, Rich Murnane, and a host of cool DataFlux folks. I also made some new friends. In this post (and a subsequent one), I’ll be discussing the lessons learned at the conference.

Lessons

I have a strong affinity for round numbers, so here are five of my top 10 lessons.

10. Most organizations are still struggling with basic data management.

Many organizations are trying to get their arms around their data. Besieged by reduced headcounts and massive amounts of data, some really smart people are fighting the good fight. Yet there are limits to what even the smartest among us can do. SQL statements have their limitations, and many more organizations would benefit from data profiling and data quality tools.