<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Data Roundtable</title>
	<atom:link href="http://www.dataroundtable.com/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.dataroundtable.com</link>
	<description>The Data Roundtable</description>
	<lastBuildDate>Fri, 18 May 2012 14:00:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Big Data Information Chain Challenges</title>
		<link>http://www.dataroundtable.com/?p=10696</link>
		<comments>http://www.dataroundtable.com/?p=10696#comments</comments>
		<pubDate>Fri, 18 May 2012 14:00:09 +0000</pubDate>
		<dc:creator>Dylan Jones</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Big Data Week]]></category>
		<category><![CDATA[information chain]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=10696</guid>
		<description><![CDATA[Dylan Jones (@dataqualitypro) wraps up Big Data Week with "Big Data Information Chain Challenges."]]></description>
			<content:encoded><![CDATA[<p><em>Dylan Jones wraps up our Big Data Week event with a possible end-game for our tale of Big Data&#8230; <span id="more-10696"></span><br />
</em></p>
<p>Much of the talk about Big Data these days revolves around the 3 dimensions cited by Gartner: Volume, Variety, Velocity. Companies getting started with Big Data invariably have to come to terms with the challenges of dealing with new storage approaches, unstructured data sources and demands for hyper-speed analytics and processing.</p>
<p>I sometimes feel what we’re missing in the Big Data narrative is the end-game: what are companies going to do with all this data? If we mull on this for a moment, we start to see the importance of enterprise data quality management, not just for our new Big Data sources but for our existing data sets and all along the Information Chain.</p>
<p>For example, imagine your Big Data objective as a logistics company is to store, process and make sense of millions of <a title="RFID" href="http://en.wikipedia.org/wiki/RFID" target="_blank">RFID</a> readings across your entire shipping inventory. The amount of data generated by these sensor networks can be incredibly useful but this information can only be utilised for operational and competitive advantage if an entire Information Chain is in place, connecting the sensory data to conventional corporate data.</p>
<p>The container identifier needs to be connected to the shipment manifest identifier. The manifest needs to be connected to the shipping route. The route perhaps needs to be connected to meteorological “Big Data” or to forecasted transit times from the GPS readings relayed continuously off the ship. All of this helps the logistics company predict when the forward delivery trucks need to be in place at the port so that their assets are fully optimised.</p>
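<p>To make the chain concrete, here is a minimal sketch of following a single RFID reading from container to manifest to route to an estimated arrival. Every identifier, lookup table and transit time below is invented for illustration; a real system would pull these from operational databases and a proper forecasting model.</p>

```python
from datetime import datetime, timedelta

# Hypothetical sample data; every identifier and value is invented for illustration.
rfid_readings = [
    {"container_id": "C-001", "read_at": datetime(2012, 5, 18, 8, 0)},
    {"container_id": "C-002", "read_at": datetime(2012, 5, 18, 8, 5)},
    {"container_id": "C-999", "read_at": datetime(2012, 5, 18, 8, 9)},  # no manifest on file
]
container_to_manifest = {"C-001": "M-10", "C-002": "M-11"}
manifest_to_route = {"M-10": "R-A", "M-11": "R-B"}
route_transit_hours = {"R-A": 36, "R-B": 52}  # stand-in for a GPS or weather-based forecast


def estimate_arrivals(readings):
    """Follow each reading along the chain and report where the chain breaks."""
    arrivals, broken = {}, []
    for reading in readings:
        container = reading["container_id"]
        manifest = container_to_manifest.get(container)
        route = manifest_to_route.get(manifest)
        hours = route_transit_hours.get(route)
        if hours is None:
            # A missing link anywhere makes the sensor reading unusable downstream.
            broken.append(container)
            continue
        arrivals[container] = reading["read_at"] + timedelta(hours=hours)
    return arrivals, broken


arrivals, broken = estimate_arrivals(rfid_readings)
print(arrivals)
print("Containers with a broken chain:", broken)
```

<p>The third reading is the interesting one: the sensor data is perfectly good, yet it yields no arrival estimate because the conventional manifest record is missing.</p>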
<p>It quickly becomes apparent that the customer and shipping records need the highest levels of accuracy, completeness and consistency (amongst many other dimensions) so that they can leverage all this Big Data intelligence to deliver a higher value of service, reduced operating costs and increased competitive advantage.</p>
<p>As we start to map out a typical business use case for Big Data we can see how value can only be delivered when data quality is managed holistically across the entire Information Chain. This may be a wake-up call for companies who see Big Data as somehow separate to their existing data landscape.</p>
<div>
<div class="call-out-box">Like Shark Week for Data Geeks,<br />
it&#8217;s &#8220;Big Data Week&#8221; at the Roundtable! Read what our experts are saying about <a title="Big Data Week at the Data Roundtable" href="http://www.dataroundtable.com/?tag=big-data-week" target="_self">Big Data</a>!</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&amp;p=10696</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>George Carlin and Big Data</title>
		<link>http://www.dataroundtable.com/?p=10327</link>
		<comments>http://www.dataroundtable.com/?p=10327#comments</comments>
		<pubDate>Thu, 17 May 2012 14:00:43 +0000</pubDate>
		<dc:creator>Phil Simon</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Big Data Week]]></category>
		<category><![CDATA[platforms]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=10327</guid>
		<description><![CDATA[Phil Simon (@philsimon) on our propensity to introduce new terms – and worry about their meaning later.]]></description>
			<content:encoded><![CDATA[<p><em>Phil Simon continues Big Data Week with a tribute to a comedy great. What&#8217;s that got to do with Big Data, you ask? Read on&#8230;</em><span id="more-10327"></span></p>
<p>To paraphrase the immortal comedian <a title="George Carlin" href="http://www.georgecarlin.com/" target="_blank">George Carlin</a>, words should simply convey information to people. Far too often, however, they are used to confuse people. (In a <a title="Carlin on Euphemisms" href="http://www.youtube.com/watch?v=CNk_kzQCclo" target="_blank">famous bit on euphemisms and soft language</a>, he talks about the unfortunate evolution of the term <em><a title="Shell Shock" href="http://en.wikipedia.org/wiki/Combat_stress_reaction" target="_blank">shell shock</a></em> to <em>post-traumatic stress disorder</em> – two terms for essentially the same thing.)</p>
<p>I thought of Carlin&#8217;s words while listening to a recent IBM event entitled <a title="Simulcast" href="https://events.unisfair.com/index.jsp?code=socialmedia&amp;seid=33755&amp;eid=556" target="_blank">Smarter Analytics Leadership Summit Simulcast</a>. After a while, the discussion turned to Big Data as a Platform – and that just rubbed me the wrong way.</p>
<p>Do you know what <em>Big Data as a Platform</em> even means? I sure don&#8217;t – and I like to think that <a title="The Age of the Platform" href="http://www.theageoftheplatform.com" target="_blank">I know a thing or two about platforms</a>. The very fact that everything seems to be a platform these days dilutes the meaning of the term.</p>
<p><em>Disclaimer: I have no bone to pick with IBM in this post. I have nothing against the company or any of its employees.</em></p>
<p>There are many problems with introducing new business jargon, as I pointed out <a title="Jargon" href="http://www.dataroundtable.com/?p=9015" target="_self">a few months ago on this site</a>. In this post, I&#8217;ll keep my rant to Big Data as a Platform.</p>
<p>If &#8220;Big Data as a Platform&#8221; catches on (and I sure hope that it doesn&#8217;t), I can just see BDAP conferences and self-anointed BDAP &#8220;experts&#8221; who drop buzzwords in every sentence and confuse the very people they are brought in to help.</p>
<p>Think about it this way:</p>
<ul>
<li>How many people agree on the &#8220;proper&#8221; <a title="Big Data" href="http://en.wikipedia.org/wiki/Big_Data" target="_blank">definition of Big Data</a>?</li>
<li>How many people agree on the &#8220;proper&#8221; definition of a platform?</li>
<li>How many people will even know what Big Data as a Platform means?</li>
</ul>
<h3>Simon Says</h3>
<p>Now, within any given organization, no one is saying that everyone has to agree on everything for anything to get done. But consider the following questions:</p>
<ul>
<li>All else being equal, doesn&#8217;t a common understanding of terms engender better results?</li>
<li>Isn&#8217;t there less room for confusion, miscommunication, and misinterpretation when everyone is on the same page?</li>
<li>And how can everyone be on the same page when lofty buzzwords replace common sense and concrete terms?</li>
</ul>
<h3>Feedback</h3>
<p>What say you?</p>
<div>
<div class="call-out-box">Like Shark Week for Data Geeks,<br />
it&#8217;s &#8220;Big Data Week&#8221; at the Roundtable! Read what our experts are saying about <a title="Big Data Week at the Data Roundtable" href="http://www.dataroundtable.com/?tag=big-data-week" target="_self">Big Data</a>!</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&amp;p=10327</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Big Data: Dangerous to Sit this one Out</title>
		<link>http://www.dataroundtable.com/?p=10682</link>
		<comments>http://www.dataroundtable.com/?p=10682#comments</comments>
		<pubDate>Wed, 16 May 2012 19:00:55 +0000</pubDate>
		<dc:creator>Thomas Redman</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Big Data Week]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=10682</guid>
		<description><![CDATA[Thomas Redman drops by for Big Data Week, as he details the Dangers inherent in waiting out Big Data.]]></description>
			<content:encoded><![CDATA[<p><em>Midway through Big Data Week at the Data Roundtable, Thomas Redman drops by to argue a better definition, while detailing the dangers of a &#8220;wait and see&#8221; attitude as it pertains to Big Data&#8230;<span id="more-10682"></span><br />
</em></p>
<p>It seems to me that organizations that adopt a “wait and see” attitude with respect to big data play a dangerous game.  I make this claim fully aware that:</p>
<ol>
<li>Like most “shiny new things,” the “hype-to-substance ratio” (see below) is high – and getting higher every day.</li>
<li>Gaining any real, sustained value and advantage will take a lot more dedication than almost anyone imagines.</li>
<li>“This time will be different” is almost never true.</li>
</ol>
<p>Still, I make my claim.  I do so for three reasons. As background, I think the popular definition of “big data,” based on exceeding current storage and processing capabilities, completely misses the point.  For, except for a limited few, technical horsepower was never the limiting factor.  A better definition would reflect overall intellectual, managerial, and organizational capabilities to understand what the data mean and leverage those insights.  To “put the data to work” in other words.</p>
<p>For an organization struggling to interpret spreadsheets properly, a mid-sized transaction system may represent big data.  For others, a warehouse may be the most meaningful “big data.”  In a slightly different vein, a time-series analysis may represent a step up for a company that is currently doing year-over-year comparisons.</p>
<p><strong>Reason one:</strong> Viewed in this light, “the big data opportunity” is building a smarter organization.  Never a bad strategy.</p>
<p><strong>Reason two:</strong> There are pent-up customer demands in almost every market.  Customers need better financial products, cheaper, better health care, and on and on.  The essential ideas to meet these needs could well lie hidden in the data.  Finding them could yield intoxicating rewards.</p>
<p><strong>Reason three:</strong> Your competitors are almost certainly reading the same stuff you are.  The true danger is dithering while your competitor finds new opportunity in data and gains an advantage you can’t match.</p>
<p>Danger indeed!</p>
<p>Note to readers:  As it matures, big data will almost surely introduce new metrics into our lexicon.  I may have done so with “the hype-to-substance ratio.”  The basic notions are surely age-old, but creating a mathematical construct may be new.  I did a quick Google search and didn’t find the term.  But maybe I missed something.  Please let me know if you’ve seen it.  If no one has, I’ll get to work on a proper definition and method of measurement!</p>
<div>
<div class="call-out-box">Like Shark Week for Data Geeks,<br />
it&#8217;s &#8220;Big Data Week&#8221; at the Roundtable! Read what our experts are saying about <a title="Big Data Week at the Data Roundtable" href="http://www.dataroundtable.com/?tag=big-data-week" target="_self">Big Data</a>!</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&amp;p=10682</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Big Data: Structure and Quality</title>
		<link>http://www.dataroundtable.com/?p=10689</link>
		<comments>http://www.dataroundtable.com/?p=10689#comments</comments>
		<pubDate>Wed, 16 May 2012 14:00:21 +0000</pubDate>
		<dc:creator>Jim Harris</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Big Data Week]]></category>
		<category><![CDATA[Data Quality]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=10689</guid>
		<description><![CDATA[Today on Big Data Week: Jim Harris (@ocdqblog) tackles "Big Data: Structure and Quality."]]></description>
			<content:encoded><![CDATA[<p><em>It&#8217;s Wednesday on Big Data Week, and Jim Harris expands upon the definition of Big Data by looking at quality and structure&#8230;<span id="more-10689"></span><br />
</em></p>
<p>In a <a title="Swimming in Big Data by Jim Harris on the Data Roundtable" href="http://www.dataroundtable.com/?p=10144">previous post</a>, I noted, as many others also have, that <a title="Read Data Roundtable blog posts about Big Data" href="http://www.dataroundtable.com/?cat=561">Big Data</a> is about <a title="HoardaBytes and the Big Data Lebowski by Jim Harris on Obsessive-Compulsive Data Quality" href="http://www.ocdqblog.com/home/hoardabytes-and-the-big-data-lebowski.html" target="_blank">more than just data volume</a>. Its other two most commonly cited characteristics – <a title="Our Increasingly Data-Constructed World by Jim Harris on Obsessive-Compulsive Data Quality" href="http://www.ocdqblog.com/home/our-increasingly-data-constructed-world.html" target="_blank">variety</a> and <a title="The Speed of Decision by Jim Harris on Obsessive-Compulsive Data Quality" href="http://www.ocdqblog.com/home/the-speed-of-decision.html" target="_blank">velocity</a> – further complicate the big data challenge. But before I continue, please permit me to greatly oversimplify traditional data management as a two-step process:</p>
<ol>
<li><strong>Structure</strong></li>
<li><strong>Qualify</strong></li>
</ol>
<p>The easiest example of step one is <a title="Wikipedia article about the relational model" href="http://en.wikipedia.org/wiki/Relational_model" target="_blank">the relational model</a>, which has dominated the data management industry since the 1980s, fostering the long-held belief that data has to be structured <em>before</em> it can be used. The second step is the long-held belief, at least among data quality professionals, that data also has to be qualified <em>before</em> it can be used (verifying <a title="Completeness is a Two-Way Street by Rich Murnane on the Data Roundtable" href="http://www.dataroundtable.com/?p=6567">completeness</a>, <a title="Data Quality and the Cupertino Effect by Jim Harris on Obsessive-Compulsive Data Quality" href="http://www.ocdqblog.com/home/data-quality-and-the-cupertino-effect.html" target="_blank">validity</a>, <a title="DQ-Tip: &quot;There is no such thing as data accuracy...&quot; by Jim Harris on Obsessive-Compulsive Data Quality" href="http://www.ocdqblog.com/home/dq-tip-there-is-no-such-thing-as-data-accuracy.html" target="_blank">accuracy</a>, etc.).</p>
<p>These two steps require a methodical approach that is slower than the velocity of big data, which refers to not only how fast data is being produced, but also how fast data must be processed to meet demand. And the biggest increase in volume comes from the variety of big data, which consists mostly of unstructured or semi-structured data. So, from my perspective, most of the big data angst is about the fear that traditional data management techniques cannot effectively and efficiently structure and qualify big data before it can be used.</p>
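<p>As a rough illustration of the &#8220;qualify&#8221; step, the sketch below checks one already-structured record for completeness and validity. The field names and the email pattern are assumptions made up for this example, not a standard; note also that accuracy cannot be verified by format rules like these, since it requires comparison against the real world.</p>

```python
import re

# Hypothetical schema and rule; neither comes from any particular system.
REQUIRED_FIELDS = ("customer_id", "email", "postal_code")
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def qualify(record):
    """Return a list of data quality issues found in one structured record."""
    issues = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            issues.append(f"missing {field}")
    # Validity: a present email must at least look like an email address.
    email = record.get("email")
    if email and not EMAIL_PATTERN.match(email):
        issues.append("invalid email")
    return issues


print(qualify({"customer_id": "42", "email": "a@example.com", "postal_code": "12345"}))
print(qualify({"customer_id": "42", "email": "not-an-email"}))
```

<p>Both checks presuppose that step one has already happened: there is nothing to test until the data has been given fields.</p>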
<h3>Different Uses, Different Approaches</h3>
<p>We must acknowledge that some big data use cases differ considerably from traditional use cases, requiring us to reevaluate how we structure big data and how we <a title="The Lies We Tell Data by Jim Harris on the Data Roundtable" href="http://www.dataroundtable.com/?p=10524">assess the quality of big data</a>.</p>
<p>An excellent example is <a title="The Dark Side of the Mood by Jim Harris on the Data Roundtable" href="http://www.dataroundtable.com/?p=10364">sentiment analysis</a>, which analyzes large amounts of largely unstructured data in an attempt to understand how customers think and feel about products and services. By its very nature, determining the sentiments your customers have requires a different data management approach than, for example, determining the number of <a title="Identifying Duplicate Customers by Jim Harris on Obsessive-Compulsive Data Quality" href="http://www.ocdqblog.com/identify-duplicate-customers/" target="_blank">duplicate customer records</a> you have.</p>
<p>In his book <a title="The Secret Life of Pronouns: What Our Words Say About Us by James Pennebaker" href="http://www.amazon.com/The-Secret-Life-Pronouns-Words/dp/1608194809" target="_blank"><em>The Secret Life of Pronouns: What Our Words Say About Us</em></a>, social psychologist and language expert James Pennebaker shared insights from his groundbreaking research in computational linguistics – in essence, counting the frequency of words we use – to show that our language carries secrets about, among other things, our thoughts and feelings.</p>
<p>&#8220;Sociolinguists,&#8221; Pennebaker explained, &#8220;focus on broad social dimensions such as gender, race, social class, and power. Their approach is qualitative, involving recording and analyzing conversations on a case-by-case basis. It is slow, painstaking work. Over the course of a year, a good sociolinguist may analyze only a few interactions. Whereas the qualitative approach is powerful at getting an in-depth understanding of a small group of interactions, the methods are not designed to get an accurate picture of an entire society or culture. This is where computer-based text analysis methods can help. By analyzing the blogs of hundreds of thousands of people, for example, the computer-based methods can quickly determine the nature of gender differences as a function of age, class, native language, region, and other domains. In other words, a relatively slow but careful qualitative approach can give us an in-depth view of a small group of people; a computer-based quantitative approach provides a broader social and cultural perspective. The two methods, then, complement each other in ways that the two research camps often fail to appreciate.&#8221;</p>
<p>So, the <em>loosely-structured, quantitative</em> approach of counting and categorizing the individual words in a very large data set to assess a general, but broad, aggregated sentiment (e.g., positive, negative, or neutral) is a very different approach than the <em>highly-structured, qualitative</em> approach of evaluating the complete sentences and paragraphs in a very small data set to assess a more specific, but narrow, detailed sentiment (i.e., providing more comprehensive and contextual feedback).</p>
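<p>The loosely-structured, quantitative approach can be sketched in a few lines: count words across many texts and compare tallies. The word lists and sample posts below are invented for illustration, and real sentiment analysis relies on much larger lexicons and handles negation, sarcasm and context that this sketch ignores.</p>

```python
import re
from collections import Counter

# Hypothetical word lists; a real lexicon would be far larger.
POSITIVE = {"great", "love", "fast", "reliable"}
NEGATIVE = {"slow", "broken", "hate", "late"}


def aggregate_sentiment(texts):
    """Count individual words across all texts and return one broad label."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    positive = sum(counts[word] for word in POSITIVE)
    negative = sum(counts[word] for word in NEGATIVE)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


posts = [
    "Love the new service, delivery was fast",
    "Package arrived late and the box was broken",
    "Great support, reliable tracking",
]
print(aggregate_sentiment(posts))
```

<p>The result is a single aggregated label for the whole set. It says nothing about why the second customer was unhappy, which is exactly the detail the qualitative approach would recover.</p>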
<p>Another excellent example of a data management solution that relies on a loosely-structured, quantitative approach is Internet search engines, which rank their results primarily according to the frequency with which the key words in your search term appear on websites. Of course, as we all know, this doesn&#8217;t always guarantee the highest quality search results, but it does enable us to <em>very quickly</em> search a <em>very large</em> number of websites from a <em>wide variety</em> of sources.</p>
<h3>Discussing Structure and Quality</h3>
<p>Big data discussions often turn into debates due to the misperception that big data always requires sacrificing <em>structured data quality</em> in favor of <em>un-or-semi-structured data quantity</em>. But the reality is that sometimes one of these approaches will be more applicable for certain use cases, and other times, these approaches will complement each other in ways that data management professionals may fail to appreciate, or perhaps just reflexively refuse to accept.</p>
<p>In my opinion, in order to move the big data discussion forward, and, more importantly, enable our organizations to develop strategies for using big data to solve business problems, we have to stop fiercely defending our traditional data management perspectives about structure and quality.</p>
<div>
<div class="call-out-box">Like Shark Week for Data Geeks,<br />
it&#8217;s &#8220;Big Data Week&#8221; at the Roundtable! Read what our experts are saying about <a title="Big Data Week at the Data Roundtable" href="http://www.dataroundtable.com/?tag=big-data-week" target="_self">Big Data</a>!</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&amp;p=10689</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Big Data: Latent Latency</title>
		<link>http://www.dataroundtable.com/?p=10675</link>
		<comments>http://www.dataroundtable.com/?p=10675#comments</comments>
		<pubDate>Tue, 15 May 2012 14:00:51 +0000</pubDate>
		<dc:creator>David Loshin</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Data Integration]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Big Data Week]]></category>
		<category><![CDATA[Data Management]]></category>
		<category><![CDATA[Hadoop]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=10675</guid>
		<description><![CDATA[David Loshin (@davidloshin) cuts to the chase with "Big Data: Latent Latency."]]></description>
			<content:encoded><![CDATA[<p><em>As we continue Big Data Week at the Data Roundtable, David Loshin steps in to examine the benefits of analysis platforms like Hadoop and the problems with latency &#8230;<span id="more-10675"></span><br />
</em></p>
<p>I will cut to the chase here: the current hysteria regarding the all-encompassing benefits of big data analytics, along with its catchy 3 (or 4) “V” formula of “Volume, Variety, Velocity” (and “Value”) is essentially predicated on the expectation that given an operational environment, parallel file system, and parallelizing programming environment, one can accomplish analyses in a much shorter time frame. This compressed “time to information” would enable real-time decision-making (or at least “faster” decision-making), and so each data management tool vendor is working as hard as it can to demonstrate that its tools can be aligned with Hadoop and thereby support “big data.”</p>
<p>There are benefits to providing a commodity-based high-performance computing platform for algorithmic implementation. And as long as the massive volumes of data are available, these high-performance computing engines (such as those developed using Hadoop) should perform reasonably well. The bottleneck, though, is the data.</p>
<p>Using a big data analysis platform such as one built using Hadoop, you benefit from the inherent parallelization of execution, but the hidden cost is the latency associated with data access. Accessing data from disk is slow enough, but imagine trying to access and then pump petabytes through the limited network bandwidth to stream the data into the analytical platform. The latency associated with data motion is often glossed over when reporting application execution times, but I wonder whether the fantastic execution times would be as good if you added in the latency associated with the data access.</p>
<p>The challenge is that the latency problem gets worse as the data volumes grow, so if you are developing a big data application, testing it using a reasonably-sized data set, you might not even notice the tax that the data loading process is assessing. However, you must consider the scalability issues, and if your parallel environment hasn’t been fitted with equally scalable networking and I/O channels, you will eventually feel the pinch. In fact, MapReduce is not insensitive to the latency issue, since each transition between Map and Reduce phases will, by necessity, require broadcasting data across the system’s network.</p>
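<p>A back-of-the-envelope model shows why the pinch arrives. The sketch below assumes, purely for illustration, that all data crosses one shared link of fixed bandwidth, that computation divides evenly across nodes, and that transfer and computation do not overlap. The per-gigabyte compute cost and the bandwidth figure are made-up parameters, not measurements of Hadoop or any other platform.</p>

```python
def job_time_seconds(data_gb, nodes, compute_sec_per_gb=2.0, bandwidth_gb_per_sec=1.0):
    """Return (transfer, compute) seconds under a deliberately crude model.

    Assumptions (all hypothetical): every byte crosses one shared link of fixed
    bandwidth, compute divides evenly across nodes, and the two do not overlap.
    """
    transfer = data_gb / bandwidth_gb_per_sec
    compute = data_gb * compute_sec_per_gb / nodes
    return transfer, compute


def transfer_share(data_gb, nodes):
    """Fraction of total job time spent moving data rather than computing."""
    transfer, compute = job_time_seconds(data_gb, nodes)
    return transfer / (transfer + compute)


# One petabyte (1,000,000 GB) on progressively larger clusters.
for nodes in (10, 100, 1000):
    print(nodes, "nodes:", round(transfer_share(1_000_000, nodes), 3))
```

<p>Under these assumptions, adding nodes shrinks only the compute term, so data movement takes an ever larger share of the total. Reported execution times that leave out the transfer term would hide that share entirely.</p>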
<p>Since data access and exchange latency is the limiting factor for big data performance, it has the potential to be the cloud that rains on the big data parade. That being said, alleviating the latency bottleneck is likely to become a key issue for anyone who wants to tackle <em>really </em>big data. That means high performance data integration.</p>
<p>Yes, even though the term “data integration” is not as sexy or mesmerizing as “big data,” it might be the name of the technology that enables high performance analytics. And to accommodate the massive volumes, it means a number of key considerations for companies providing data integration technology. Here are some things to think about:</p>
<ul>
<li>Optimizing communication channels to provide pipelined streaming of data</li>
<li>Embedding computation within the communication network (remember “active networks” research?)</li>
<li>Data federation and virtualization</li>
<li>High-speed virtual data caching</li>
<li>Query optimization prior to “pushing-down” to the source</li>
<li>Integrated event stream processing within the integration layers</li>
<li>Compression</li>
<li>Dynamic data realignment (to take advantage of alternate record layouts and orientations)</li>
<li>Bulk data loading</li>
<li>Data replication</li>
</ul>
<p>This is just a short laundry list. I am pretty convinced that any solution for high performance computation without a strategy for high performance data movement is bound to be bound by the latent latency inherent in data access.</p>
<div>
<div class="call-out-box">Like Shark Week for Data Geeks,<br />
it&#8217;s &#8220;Big Data Week&#8221; at the Roundtable! Read what our experts are saying about <a title="Big Data Week at the Data Roundtable" href="http://www.dataroundtable.com/?tag=big-data-week" target="_self">Big Data</a>!</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&amp;p=10675</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Big Data &#8220;In the Air Tonight&#8221;</title>
		<link>http://www.dataroundtable.com/?p=10671</link>
		<comments>http://www.dataroundtable.com/?p=10671#comments</comments>
		<pubDate>Mon, 14 May 2012 19:00:50 +0000</pubDate>
		<dc:creator>Rich Murnane</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Big Data Week]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=10671</guid>
		<description><![CDATA[Rich Murnane (@murnane) senses Big Data Week is... "In The Air Tonight."]]></description>
			<content:encoded><![CDATA[<p><em>Rich Murnane continues Big Data Week at the Roundtable, as he wonders if Big Data is all a misunderstanding&#8230;<span id="more-10671"></span><br />
</em></p>
<p>According to the all-knowing <a title="In the Air Tonight" href="http://en.wikipedia.org/wiki/In_the_Air_Tonight" target="_blank">Wikipedia</a>, when singer/songwriter Phil Collins wrote the 1981 hit <em>In the Air Tonight</em>, the lyrics were not about a drowning incident he witnessed, <a title="Drowning Incident" href="http://www.google.com/url?q=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FStan_(song)&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNEf2RyQ-ZhuSHutvrH_B2QDDUjqhA" target="_blank">like most people thought</a>.  The lyrics were written spontaneously while Collins was reminiscing about the <em>&#8220;anger he felt after divorcing his first wife Andrea in 1979&#8221;</em>.</p>
<p>So, a misunderstanding and an old problem (love), sounds a bit like &#8220;Big Data&#8221; to me&#8230;</p>
<h3><strong>&#8220;Big Data&#8221; a misunderstanding?</strong></h3>
<p>&#8220;Big Data&#8221; means different things to different people, hence the misunderstanding.  To many techies &#8220;Big Data&#8221; means technologies such as those in the Hadoop family, along with their associated distributed physical data architecture.</p>
<p>To the business executive, &#8220;Big Data&#8221; is all about making their business better by &#8220;data mining&#8221; through these piles of <span style="text-decoration: line-through;">crap</span> data to find insight about business which would never have been available had the executive not cut the check for managing all this data.</p>
<p>To DataGeeks among us, it&#8217;s all about V<sup>3</sup> (Volume, Velocity, Variety).  Organizations which never had to manage terabytes and petabytes of data (Volume) now have to figure out how to manage it in an effective and efficient manner.  Our data is now growing so much faster than we&#8217;ve experienced in our careers. Are there things we should be doing differently if our databases are growing at a rate of 500% per year?  What about 2000% a year?  Talk about Velocity.  And talk about Variety: we&#8217;re now concerning ourselves with machine-generated data such as sensor and log data.  Add unstructured documents such as video, binary documents, and XML documents, and we&#8217;re no longer thinking in rows &amp; columns. It&#8217;s a brave new world.</p>
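<p>To put those growth rates in perspective, here is a small compounding sketch. It assumes that &#8220;growing N% per year&#8221; means the database ends each year at (1 + N/100) times its starting size, which is one common reading but not the only one, and the 1 TB starting size is arbitrary.</p>

```python
def projected_size_tb(start_tb, annual_growth_pct, years):
    """Compound growth, reading 'grows N% per year' as size * (1 + N/100) each year."""
    return start_tb * (1 + annual_growth_pct / 100) ** years


# A hypothetical 1 TB database after three years at the two rates mentioned.
print(projected_size_tb(1, 500, 3))
print(projected_size_tb(1, 2000, 3))
```

<p>On that reading, 1 TB becomes 216 TB after three years at 500% and 9,261 TB at 2000%, which is why capacity plans built for steady, modest growth stop working.</p>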
<h3><strong>&#8220;Big Data&#8221; an old problem?</strong></h3>
<p>Large government agencies along with the big players in the web have been managing &#8220;Big Data&#8221; for much longer than the rest of us.  There are lessons to be learned from all these folks, particularly about parallel processing and distributed physical data architectures.  Large data processing shops such as credit card companies typically managed large datasets by purchasing big old mainframes and storing everything on file based data stores.  In the late 1990s, when I was a DBA on a Very Large Database (VLDB), I remember opening a trouble ticket with my RDBMS vendor asking them <em>&#8220;ahh, is there anything I need to do if my database is growing 200% per month?&#8221;</em>.  Folks adding value to their organizations by using data isn&#8217;t anything new either; what&#8217;s new is the attention this facet of a business is getting these days.</p>
<p>The best part about &#8220;Big Data&#8221; is that people are really starting to understand that data is an asset to an organization.  The part that keeps me up at night is making sure everyone understands that the same best practices and data management principles apply to data sets of any size, big or small.</p>
<p>&#8220;Big Data&#8221; is here <em>In the Air Tonight</em>, don&#8217;t you think?</p>
<p>Until next time&#8230;Rich</p>
<div>
<div class="call-out-box">Like Shark Week for Data Geeks, <br />it&#8217;s &#8220;Big Data Week&#8221; at the Roundtable! Read what our experts are saying about <a title="Big Data Week at the Data Roundtable" href="http://www.dataroundtable.com/?tag=big-data-week" target="_self">Big Data</a>!</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&amp;p=10671</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Big Data: Use it or Lose it!</title>
		<link>http://www.dataroundtable.com/?p=10663</link>
		<comments>http://www.dataroundtable.com/?p=10663#comments</comments>
		<pubDate>Mon, 14 May 2012 14:00:12 +0000</pubDate>
		<dc:creator>Joyce Norris-Montanari</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Big Data Week]]></category>
		<category><![CDATA[Hadoop]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=10663</guid>
		<description><![CDATA[Joyce Norris-Montanari (@jmontanari) kicks off Big Data Week with "Big Data: Use it or Lose it!"]]></description>
			<content:encoded><![CDATA[<p><em>Joyce Norris-Montanari kicks off Big Data Week at The Data Roundtable! We asked our experts for their thoughts on Big Data, and they responded in a big way. Joyce begins her post with a simple question&#8230;<span id="more-10663"></span><br />
</em></p>
<p>What is &#8220;Big Data&#8221; anyway?  For some companies big data is based on the volume of huge data tables or files.  Is 6 billion rows big data? To some companies, it is.  Some companies consider graphics inclusion in the data warehouse as big data because it brings complexity to the drawing board of BI.  An example would be storing and showing a picture of a product in the reporting and analytics environment alongside product historical sales information.  Other companies believe that big data consists of different complex data types that are brought together on one platform, where sophisticated software crawls through the data searching for something we didn’t already know.</p>
<p>There is hardware and software available just for crawling through the complex data, but it is expensive. I get calls almost daily now from people wanting to know if I know anything about Hadoop.  That means people are starting to think in the direction of big and complex data.  So it is all the more important for us to make sure we are using it for the right reasons, or we could lose the opportunity to learn something new about our data and business.</p>
<p>My biggest fear about implementing a processor and software to analyze big data is this: what if we go to the trouble and expense of processing big data, and no one uses it?  What if we don’t take the results of the analysis and apply them to our day-to-day business?</p>
<p>It is crucial to make sure the requirements we gather are RIGHT for our company when we talk about big data.  Not every company has a need for big data – it should all be based on scope and requirements.</p>
<div>
<div class="call-out-box">In preparation for Big Data Week, read more on &#8220;<a title="Big Data on the Data Roundtable" href="http://www.dataroundtable.com/?tag=big-data" target="_self">Big Data</a>,&#8221; and then keep<br />
an eye on the Roundtable the rest of the week as our experts further the discussion.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&amp;p=10663</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On Uncertainty and Data Minimalism</title>
		<link>http://www.dataroundtable.com/?p=10387</link>
		<comments>http://www.dataroundtable.com/?p=10387#comments</comments>
		<pubDate>Thu, 10 May 2012 14:00:00 +0000</pubDate>
		<dc:creator>Phil Simon</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[The Age of the Platform]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=10387</guid>
		<description><![CDATA[Phil Simon (@philsimon) on understanding the need to simplify our data lives. ]]></description>
			<content:encoded><![CDATA[<p><a title="Wikipedia - Ludwig" href="http://en.wikipedia.org/wiki/Ludwig_Mies_van_der_Rohe" target="_blank">Ludwig Mies van der Rohe</a> (March 27, 1886 – August 17, 1969) was a German-American architect. Among the famous quotes attributed to him is &#8220;less is more.&#8221; In other words, blame <a title="Minimalism" href="http://en.wikipedia.org/wiki/Minimalist_architecture#Minimalist_design" target="_blank">minimalism</a> at least partially on him.</p>
<p>Now, for you data-overwhelmed folks, I understand the desire for a simpler view of your data universe. I sympathize. While I&#8217;m a small business owner, I too am awash in a sea of data. Along these lines, I recently spoke with media expert <a title="Catherine Davis" href="https://twitter.com/#!/catherinedavis1" target="_blank">Catherine Davis</a> on the challenges of knowing where to spend my limited marketing budget.</p>
<h3>Uncertainty + Data = Less Uncertainty</h3>
<p>You see, it&#8217;s not easy for me to know where I should spend my limited time and financial resources. Among the many questions that I face on a daily basis are:</p>
<ul>
<li>Should I write &#8220;exposure-only&#8221; guest posts for <em>Inc</em> or <em>Huffington Post</em>? What percentage of links to the book&#8217;s site will result in conversions?</li>
<li>Should I spend $1,000 on an ad that theoretically reaches 200,000 people? Will this ad pay for itself?</li>
<li>Should I attend a conference at my own expense in the hope that I can meet some people who can help me promote <em><a title="The Age of the Platform" href="http://www.theageoftheplatform.com" target="_blank">The Age of the Platform</a></em>?</li>
<li>Do tweets or other social media mentions matter?</li>
</ul>
<p>The answer to each of these questions: I honestly don&#8217;t know.</p>
<p>And there&#8217;s the rub for the independent author: It&#8217;s extremely difficult to determine the source of book sales. There are just too many data points for me to know for certain how people found out about my book, much less bought it. Sure, there are major events that drive sales – e.g., a review on Slashdot, an Oprah appearance (back in the day, at least). For the most part, though, I&#8217;m clutching at straws.</p>
<p>At least for me, data management is more art than science. If I had written a book in my teens, at least it would have been easier for me to track my sales and assess the potential effectiveness of any marketing efforts. These days, it just ain&#8217;t easy to figure out what&#8217;s going on. While it might not have been easy for authors 20 years ago, I&#8217;d argue that it was probably <em>easier</em>.</p>
<p>Ah, the days of relative data minimalism.</p>
<h3>Simon Says</h3>
<p>And if it&#8217;s this difficult for me, then imagine the CMO of a multinational organization running twelve concurrent international campaigns launched at different times. Can organizations apply some scientific and mathematical rigor to their data management practices? Of course. But I don&#8217;t see a day anytime soon in which the vast majority of organizations will be able to completely eradicate doubt and uncertainty from their decision-making processes.</p>
<p>Brass tacks: Embrace the uncertainty – and use your data to minimize it.</p>
<h3>Feedback</h3>
<p>What say you?</p>
<div>
<div class="call-out-box">Learn more about Phil Simon&#8217;s book,<br />
<em><a title="The Age of the Platform" href="http://www.theageoftheplatform.com/" target="_blank">The Age of the Platform</a></em>.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&amp;p=10387</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Books That Influenced my Thinking: The Goal</title>
		<link>http://www.dataroundtable.com/?p=10645</link>
		<comments>http://www.dataroundtable.com/?p=10645#comments</comments>
		<pubDate>Wed, 09 May 2012 18:00:08 +0000</pubDate>
		<dc:creator>Thomas Redman</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[influential books]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=10645</guid>
		<description><![CDATA[If we took a poll, I bet that more people would cite The Goal than any other title as the most influential book on quality of all time.  Devotees might note the importance of Goldratt’s theory of constraints as the feature they liked best.  But most would cite its readability. Over three million copies have been sold. For The [...]]]></description>
			<content:encoded><![CDATA[<p>If we took a poll, I bet that more people would cite <em>The Goal</em> than any other title as the most influential book on quality of all time.  Devotees might note the importance of Goldratt’s theory of constraints as the feature they liked best.  But most would cite its readability. Over three million copies have been sold.<span id="more-10645"></span></p>
<p>For <em>The Goal</em> is a novel, told in first person by Alex Rogo, a beleaguered plant manager, as he struggles to implement the lessons imparted by his old teacher, Jonah.  Jonah steadfastly refuses to tell Alex anything, instead posing questions that Alex must work through by himself.</p>
<p>It is <em>The Goal’s</em> homespun examples that most impacted me.   In one, Alex is taking his son’s scout troop on a hike.  One of the boys is slower than the others and, as Alex works out how to maintain order, he reflects on a related problem at the plant.  The powerful lesson is that quality management is not just for work.  It provides powerful tools for everyday life, raising kids, managing one’s weight, and so forth.  Sometime later I read Covey’s <em>The Seven Habits of Highly Effective People,</em> which further brought the lesson home.</p>
<p>But wait.  There’s more.  Too many of us have a tendency to think that what we do is inherently complex.  But after <em>The Goal</em> I knew:  If I can’t explain it (whatever it is) to a “regular person,” I don’t understand it well enough.</p>
<p>This is a really important lesson.  Many of my clients complain that “their management doesn’t get the data quality joke.”  My response is always the same, “Then you’re not telling it right.”</p>
<p>If you haven’t done so already, read <em>The Goal.</em> It is a fast read.  And it is even better as a slow, careful re-read.</p>
<p>Next week:  Drucker</p>
<div>
<div class="call-out-box">Read more of Thomas Redman&#8217;s<br />
&#8220;<a title="Thomas Redman's &quot;Books That Influenced My Thinking.&quot;" href="http://www.dataroundtable.com/?tag=influential-books" target="_self">Books That Influenced My Thinking</a>.&#8221;</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&amp;p=10645</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Could Unlimited Data Limit Data Silos?</title>
		<link>http://www.dataroundtable.com/?p=10650</link>
		<comments>http://www.dataroundtable.com/?p=10650#comments</comments>
		<pubDate>Wed, 09 May 2012 14:00:23 +0000</pubDate>
		<dc:creator>Jim Harris</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Data Management]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data silo]]></category>
		<category><![CDATA[unlimited data plans]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=10650</guid>
		<description><![CDATA[Jim Harris (@OCDQBlog) asks, "Could Unlimited Data Limit Data Silos?"]]></description>
			<content:encoded><![CDATA[<p>Some strong storms recently caused an extended disruption in the Internet service provided by my local cable company, which provided me with an opportunity to test the reliability of my new smartphone&#8217;s mobile broadband connectivity, as well as see just how <em>unlimited</em> my unlimited data plan really is.<span id="more-10650"></span></p>
<p>After successfully passing both aspects of this test, I analyzed my smartphone&#8217;s log file and was surprised to discover how much data I used – nearly 5 GB in less than 24 hours.</p>
<p>Of course, the reason that mobile providers offer unlimited (as well as a variety of limited) data plans is that, in <a title="Our Increasingly Data-Constructed World by Jim Harris on Obsessive-Compulsive Data Quality" href="http://www.ocdqblog.com/home/our-increasingly-data-constructed-world.html" target="_blank">our increasingly data-constructed world</a>, more and more of our daily activities, both personal and professional, involve using data – and sometimes using a lot more data than we realize.</p>
<p>When people discuss the unrelenting data management trend of <a title="Read Data Roundtable blog posts about Big Data" href="http://www.dataroundtable.com/?cat=561">Big Data</a>, many express concerns about <a title="Information Overload Revisited by Jim Harris on Obsessive-Compulsive Data Quality" href="http://www.ocdqblog.com/home/information-overload-revisited.html" target="_blank">Information Overload</a>. But when that term was originally coined over 40 years ago, the primary concern was not about the increasing <em>amount</em> of information, but instead the increasing <em>access</em> to information. I think that the unlimited data plans from mobile providers are an excellent example of this because most smartphone data usage is about access to, not accumulation of, data.</p>
<p>For example, a large percentage of my smartphone data usage was allocated to data streaming services (Pandora for music, Netflix for television shows). With data streaming, I am accessing data without accumulating data, i.e., I am not downloading audio and video files to save in my smartphone&#8217;s storage (which has very limited capacity anyway).</p>
<p>Of course, one reason for this is financial, e.g., if I wanted to download the MP3 file of a favorite song, I would have to purchase it, whereas Pandora offers a <em>free</em> music streaming service (if you ignore the cost of buying a smartphone and paying for a monthly mobile service plan).</p>
<p>However, to me the more interesting aspect of my smartphone <em>data usage</em> is that I am not creating a smartphone <em>data silo</em>. Unlimited data access has not required unlimited data storage since the vast majority of the data I use via my smartphone is not retained.</p>
<p>Compare this with the data silos that proliferate within the enterprise data management landscape of most organizations. Many, if not most, data silos consist of data that was retained for a very specific and <em>short-term</em> use. However, after this data is used, it remains retained, eventually becoming <a title="George Mallory and Data Mountaineering by Jim Harris on the Data Roundtable" href="http://www.dataroundtable.com/?p=7196">data that&#8217;s managed just because it&#8217;s there</a>.</p>
<p>Sometimes, I think that we create data silos just because we can. So, I can&#8217;t help but wonder – could unlimited data limit data silos?</p>
<p>In other words, could the enterprise data management equivalent of an unlimited data plan provide users access to unlimited data streaming services <em>without</em> allowing them to create local copies of enterprise data, therefore limiting the proliferation and retention of data silos?</p>
<div>
<div class="call-out-box">Read this related Jim Harris blog post: <a title="Sharing Data by Jim Harris on the Data Roundtable" href="http://www.dataroundtable.com/?p=6508" target="_self">Sharing Data</a></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&amp;p=10650</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

