Big data seems like a daunting challenge because, as data management professionals, we have been taught by experts, and learned from experience, that we always have to dive deep into data in order to discover meaningful business insights and support daily business operations.
In other words, we are always diving into the deep end of the data swimming pool, and when we are swimming in big data, that deep end appears to be bottomless. However, sometimes we can discover meaningful insights in big data without diving into that apparently bottomless deep end.
Let’s use a simple example. I had never watched the television show Lost. When I signed up for Netflix, I noticed that I could watch all 6 seasons (121 episodes). So, I asked two of my closest friends, who had both watched the show while it was on the air, whether or not I should invest some of my free time in watching it.
One of my friends loved the show. My other friend hated the show. I know a lot about my two friends. I know what types of shows they typically like and dislike. So, I could have performed a detailed analysis by comparing their opinions about Lost with their opinions about other shows that we had all seen.
But instead, I checked the available data on Netflix. A total of 6,709,610 people had rated the show, giving Lost an average rating of 3.9 stars on a five-star scale where 1 = “Hated It”, 2 = “Didn’t Like It”, 3 = “Liked It”, 4 = “Really Liked It”, 5 = “Loved It”. Of those 6,709,610 people, 934 also provided a written review to explain their rating.
Now, of course, I know nothing about any of the people who provided ratings and reviews on Netflix. Furthermore, I could not have performed a more detailed analysis even if I had wanted to, since Netflix doesn’t share the deep end of its data swimming pool. All I had access to was the aggregated, general sentiment of a large group of unknown, unqualified strangers. I couldn’t perform a deep dive into this data, so I stayed in the shallow end of the pool. I watched all 6 seasons of Lost and ended up giving it a 4-star rating.
Obviously, solving business problems with big data is more important than using it to choose what television show to watch on Netflix. And there will be many times when we will have to dive deep into big data. But there will also be some big data use cases where depth and detailed analysis will not be necessary in order to solve a business problem. So, I just wanted to let you know that, sometimes, it’s okay to be shallow.