Sometimes it’s Okay to be Shallow

May 30, 2012 in Big Data

Big data seems like a daunting challenge because, as data management professionals, we have been taught by experts, and learned from experience, that we always have to dive deep into data in order to discover meaningful business insights and support daily business operations.

In other words, we are always diving into the deep end of the data swimming pool, and when we are swimming in big data, that deep end appears to be bottomless. However, sometimes we can discover meaningful insights in big data without diving into that apparently bottomless deep end.

Let’s use a simple example. I had never watched the television show Lost. When I signed up for Netflix, I noticed that I could watch all 6 seasons (121 episodes). So, I asked two of my closest friends, who had both watched the show while it was on-air, whether or not I should invest some of my free time watching the show.

One of my friends loved the show. My other friend hated the show. I know a lot about my two friends. I know what types of shows they typically like and dislike. So, I could have performed a detailed analysis by comparing their opinions about Lost with their opinions about other shows that we had all seen.

But instead, I checked the available data on Netflix. 6,709,610 people had rated the show, giving Lost an average rating of 3.9 stars on a five-star scale, where 1 = “Hated It”, 2 = “Didn’t Like It”, 3 = “Liked It”, 4 = “Really Liked It”, and 5 = “Loved It”. Of those 6,709,610 people, 934 also provided a written review to explain their rating.

Now, of course, I know nothing about any of the people who provided ratings and reviews on Netflix. Furthermore, I could not have performed a more detailed analysis even if I had wanted to, since Netflix doesn’t share the deep end of its data swimming pool. All I had access to was the aggregated, general sentiment of a large group of unknown, unqualified strangers. I couldn’t perform a deep dive into this data, so I stayed in the shallow end of the pool. I watched all 6 seasons of Lost and ended up giving it a 4-star rating.

Obviously, solving business problems with big data is more important than using it to choose what television show to watch on Netflix. And there will be many times when we will have to dive deep into big data. But there will also be some big data use cases where depth and detailed analysis will not be necessary in order to solve a business problem. So, I just wanted to let you know that, sometimes, it’s okay to be shallow.

Read this related Jim Harris blog post:
Big Data: Structure and Quality

2 Responses to “Sometimes it’s Okay to be Shallow”

  1. Phil Simon

    May 30, 2012

    “Lost” in deep data, eh?

    • Jim Harris

      May 30, 2012

      Yes, Phil. I often find myself lost in deep data, as if I were stranded on a not-so-deserted island being chased by the Smoke Monster of Poor Data Quality, while being attacked by Mysterious Others, who are performing an identity resolution project, verifying master data lists provided by Jacob, who also checks my transaction data for any criminal or fraudulent activity, and if he finds any, then he has Richard tell Ben to have me sent to Room 23 on Hydra Island for a while and then leave me wallowing in the deep end of the Big Data tanks . . .

      Hmmm, maybe it’s no coincidence that I wrote a blog post about it being okay to be shallow with data analysis after having spent 121 episodes trying to figure out the plot of Lost :-)
