<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Data Roundtable</title>
	<atom:link href="http://www.dataroundtable.com/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.dataroundtable.com</link>
	<description>The Data Roundtable</description>
	<lastBuildDate>Tue, 18 Jun 2013 13:00:13 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.2</generator>
		<item>
		<title>Customer Analytics: Classification vs. Segmentation</title>
		<link>http://www.dataroundtable.com/?p=13176&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=classification-vs-segmentation</link>
		<comments>http://www.dataroundtable.com/?p=13176#comments</comments>
		<pubDate>Tue, 18 Jun 2013 13:00:13 +0000</pubDate>
		<dc:creator>David Loshin</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[customer data]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=13176</guid>
		<description><![CDATA[David Loshin (@davidloshin) on using analytics to pinpoint your best customers.]]></description>
			<content:encoded><![CDATA[<p>Last time we looked at a starting point for a classification model for determining the “goodness” of customers, based on some selected dimensions of value, measures, weights, scores and classification levels and thresholds. These classifications divide your customers according to your own criteria.<span id="more-13176"></span></p>
<p>It may also be worth exploring the similarities among the customers within each class, which can be used in two different ways. The first is for segmentation purposes: to identify the values of specific variables that can be used proactively to predict which class a new customer will fall into. For example, many of the “good” customers might live in an area with a median annual household income between $75,000 and $95,000 and own their own homes. That would suggest that a new customer who has an annual household income of $84,000 and lives in her own home is likely to be a “good” customer.</p>
<p>The benefit of this segmentation is that it can guide other decisions in the marketing and customer acquisition process. One case in point: if “good” customers live in an area with a median annual household income between $75,000 and $95,000 and own their own homes, perhaps the best place for media spend is radio ads in areas where people with those incomes own their own homes.</p>
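<p>The segmentation idea can be sketched as a simple rule. This is a minimal Python sketch, assuming the “good” profile is exactly the income band and home-ownership trait from the example; the <code>Customer</code> class and the <code>likely_class</code> function are hypothetical names for illustration, and in practice the profile would come from analysing the customers in each class rather than being hard-coded.</p>

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    household_income: int  # annual household income, in dollars
    owns_home: bool


# Hypothetical profile of the "good" class, taken from the example in the text.
GOOD_INCOME_MIN = 75_000
GOOD_INCOME_MAX = 95_000


def matches_good_profile(c: Customer) -> bool:
    """True if the customer shares the traits observed among "good" customers."""
    return GOOD_INCOME_MAX >= c.household_income >= GOOD_INCOME_MIN and c.owns_home


def likely_class(c: Customer) -> str:
    """Predict a class for a new customer; anything outside the profile stays unknown."""
    return "good" if matches_good_profile(c) else "unknown"


new_customer = Customer("new applicant", 84_000, owns_home=True)
print(likely_class(new_customer))  # good
```

<p>A real segmentation would usually test many candidate variables and keep only those that actually separate the classes; a single hard-coded rule is just the simplest case.</p>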
<p>The second is for promotion: to determine whether any customers in one class have the potential to be promoted into a higher customer classification. For example, if one of the customers in the “fair” class has an annual household income of $82,500 and lives in his own home, there is a case to be made for trying to influence that customer’s behavior so that he becomes a “good” customer. That might mean making the customer an offer designed to raise his score, such as urging him to spend more, buy more items or remain a customer for a longer time. And, as we will discuss next time, one way to influence behavior is through price.</p>
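<p>Finding promotion candidates is the same profile check applied to existing customers in a lower class. A minimal Python sketch, assuming each customer record already carries the class assigned by the scoring model; the field names and the hard-coded “good” profile are hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    current_class: str     # e.g. "good", "fair", as assigned by the scoring model
    household_income: int  # annual household income, in dollars
    owns_home: bool


def matches_good_profile(c: Customer) -> bool:
    # Hypothetical "good" profile from the example: $75,000 to $95,000, owns a home.
    return 95_000 >= c.household_income >= 75_000 and c.owns_home


def promotion_candidates(customers):
    """Names of customers classed "fair" who already look like "good" customers."""
    return [c.name for c in customers
            if c.current_class == "fair" and matches_good_profile(c)]


customers = [
    Customer("C1", "fair", 82_500, True),   # matches the profile: candidate
    Customer("C2", "fair", 60_000, True),   # income outside the band
    Customer("C3", "good", 85_000, True),   # already "good"
]
print(promotion_candidates(customers))  # ['C1']
```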
<div>
<div class="call-out-box">Read more posts by <a title="David Loshin posts" href="http://www.dataroundtable.com/?author=2" target="_blank">David Loshin</a>.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&#038;p=13176</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>20 Encounters of the Information Management Kind – #7 Data Conversion Strategies Are Where Quality Counts</title>
		<link>http://www.dataroundtable.com/?p=13282&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=20-encounters-of-the-information-management-kind-7-data-conversion-strategies-is-where-quality-counts</link>
		<comments>http://www.dataroundtable.com/?p=13282#comments</comments>
		<pubDate>Mon, 17 Jun 2013 13:00:39 +0000</pubDate>
		<dc:creator>Joyce Norris-Montanari</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[data conversion]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=13282</guid>
		<description><![CDATA[Joyce Norris-Montanari explains why data conversion is where data quality counts.]]></description>
			<content:encoded><![CDATA[<p>I don’t believe you can commit to the success of a data conversion without addressing quality (or lack thereof). Do you agree? If so, then why are there so many conversion projects that just move data from one place to another? <span id="more-13282"></span>Let me give you an example. I had a client who did a HUGE database conversion from flat files to an Oracle database (yes, the flat files had seven years of history). Here is the chain of events:</p>
<ol>
<li>The business requirements were gathered, and they basically stated that the business users wanted the same reports they had now. Mistake #1: promising the same level of reports is usually a poor way to set up a project. Instead, offer an enhancement using words like &#8220;drill thru,&#8221; &#8220;drill across,&#8221; etc. This sets the expectation that the new reports are BETTER, AND THEY SHOULD BE!</li>
<li>ETL platform chosen</li>
<li>Data model for new environment created with referential integrity</li>
<li>Project plan created</li>
<li>Resources acquired</li>
<li>Kick-off planned</li>
</ol>
<p>During the requirements gathering and the data modeling, no one really addressed the quality of the data. No profiling, and no quick SQL queries to check out the data. This cost time during implementation. Here&#8217;s why:</p>
<ol>
<li>As soon as some data didn’t load, they dropped the referential integrity constraints and loaded garbage</li>
<li>The garbage had to be filtered out for the reports</li>
<li>The data models had to reflect the lack of database-enforced referential integrity</li>
</ol>
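<p>A quick SQL check of the kind that was skipped here can find the rows that would violate referential integrity before the load, instead of after the constraints are dropped. This is a minimal sketch using Python’s built-in <code>sqlite3</code> module; the <code>customer</code> and <code>orders</code> tables are hypothetical stand-ins for the client’s schema, and the table and column names are assumed to be trusted identifiers because they are interpolated into the SQL rather than passed as parameters.</p>

```python
import sqlite3


def find_orphans(conn, child, fk, parent, pk):
    """Return child rows whose foreign key has no matching parent row.

    Rows with a NULL foreign key are returned too, since they match no parent.
    """
    sql = (f"SELECT c.* FROM {child} c "
           f"LEFT JOIN {parent} p ON c.{fk} = p.{pk} "
           f"WHERE p.{pk} IS NULL")
    return conn.execute(sql).fetchall()


# Hypothetical staging tables loaded from flat files, with no constraints yet.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99);
""")
orphans = find_orphans(conn, "orders", "customer_id", "customer", "customer_id")
print(orphans)  # [(12, 99)]
```

<p>Running checks like this during requirements and modelling turns “some data didn’t load” into a known list of rows to resolve with the business before go-live.</p>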
<p>Planning for quality is definitely in my top 10!</p>
<div>
<div class="call-out-box">Read how the Miami Herald Media Company <a title="Miami Herald Media Company success story" href="http://www.sas.com/success/miamiherald-data-quality.html" target="_blank">improved its data quality</a> in this customer success story.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&#038;p=13282</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How Can You Ensure the Readiness of Your Data During Data Migration?</title>
		<link>http://www.dataroundtable.com/?p=13393&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=how-can-you-ensure-the-readiness-of-your-data-during-data-migration</link>
		<comments>http://www.dataroundtable.com/?p=13393#comments</comments>
		<pubDate>Fri, 14 Jun 2013 13:00:01 +0000</pubDate>
		<dc:creator>Dylan Jones</dc:creator>
				<category><![CDATA[Data Migration]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=13393</guid>
		<description><![CDATA[How can you ensure the readiness of your data during data migration?]]></description>
			<content:encoded><![CDATA[<p dir="ltr">Are you embarking on a data migration in the near future? If so, there is one nagging question that will loom across every stage leading up to the final moment of truth as your data finally lands in the target system:</p>
<blockquote>
<p dir="ltr">&#8220;Will the migrated data be able to support our business functions post-migration?&#8221;<span id="more-13393"></span></p>
</blockquote>
<p dir="ltr">Businesses often place their entire faith in a contractor or supplier. They think that because they have a contract and the supplier has a “proven method,” they are protected from failures caused by data that simply isn&#8217;t ready for operation in the target system.</p>
<p dir="ltr">You only have to read the press and media to learn that a contract is no insurance policy.</p>
<p dir="ltr"><strong>Why does this “data readiness problem” occur?</strong></p>
<p dir="ltr">Most businesses don’t perform large data migrations often, so project leaders and sponsors typically lack the expertise and experience to recognise the danger signs. Because data migration is still a relatively immature profession, it’s all too easy to omit critical steps from the project plan, allowing poor-quality data to wreak havoc in target systems.</p>
<p dir="ltr"><strong>What should businesses be doing to ensure data readiness during data migration?</strong></p>
<p dir="ltr">Business leaders need to take action from the outset because data migration is a <em>business</em> initiative. Yes, you’re migrating data, but the core activity is business transformation, and for this reason it demands a correspondingly high level of business involvement.</p>
<p dir="ltr">Project leaders need to ensure key sponsors and experts from the business community are heavily involved in the decision-making processes for determining data readiness. They need to allocate their most valuable resources to ensure that post-migration the target systems will function like clockwork.</p>
<p dir="ltr">Even if your suppliers or contractors don&#8217;t demand it, your organisation has to push for greater involvement, particularly when it comes to ensuring the readiness of data.</p>
<p dir="ltr"><strong>Why is this “business integration” tactic so uncommon?</strong></p>
<p>One of the most common practices in data migration is “dumping the problem on the supplier.” This is a flawed approach driven by an assumption that data belongs to IT and can therefore be wholly outsourced to IT teams, either internally or externally.</p>
<p>The key fact that leaders ignore is that data is not some IT by-product that can be shunted over the fence to the contractor of choice. Data drives the business functions that fuel services and business performance. Removing business involvement from the data migration life cycle is akin to saying “we don’t care about the core functions of our business.”</p>
<p dir="ltr">This is commercial suicide and a completely outdated approach. If you want to create successful migrations, <em>the business has to get some skin in the game.</em></p>
<p dir="ltr">By taking an active part in the development, testing and assurance processes of data migration, the business can guide the correct decisions and actions to ensure healthy, operational data after the data migration and beyond.</p>
<p dir="ltr"><strong>What tactics are useful for ensuring data readiness during and after data migration?</strong></p>
<ul>
<li>
<p dir="ltr">Create contracts that stipulate business involvement during each phase of the migration</p>
</li>
<li>
<p dir="ltr">Get the business to determine which data it will need, the rules that will bind it and the quality levels required for go-live; don’t let it use the “we need everything” response</p>
</li>
<li>
<p dir="ltr">Perform extensive data quality assessments before, during and after the data migration</p>
</li>
<li>
<p dir="ltr">Create use-case tests that reflect the full range of functions required of the target environment</p>
</li>
<li>
<p dir="ltr">Perform target system testing based on full-volume data loads, not small sample sets</p>
</li>
<li>
<p dir="ltr">Convince senior management to stop perceiving data migration as an IT-centric initiative; show managers the business relevance</p>
</li>
<li>
<p dir="ltr">At the earliest possible stage, visually demonstrate to the business how its data will look when migrated across to the target system</p>
</li>
<li>
<p dir="ltr">Set expectations up front as to what data will be migrated across and get business leaders to sign off on these expectations</p>
</li>
<li>
<p dir="ltr">Provide assistance to downstream data users so that they can assess their own data readiness and sign off on any changes to their data feeds</p>
</li>
<li>
<p dir="ltr">Use modern software and methods that allow metadata, business rules and data quality management processes to create great value going forward</p>
</li>
<li>
<p dir="ltr">Develop migration architectures and software that are not just for one-time use but become part of the business and IT landscape, adding value for years to come</p>
</li>
</ul>
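<p>The data quality assessments and full-volume tests in this list can start with simple reconciliation between source and target. This is a minimal Python sketch, assuming both sides can be read as lists of dictionaries with the same column names; the sample records, the <code>profile</code> and <code>compare</code> functions and the <code>tolerance</code> parameter are all hypothetical. On a real project it would run against full-volume extracts, not samples.</p>

```python
def profile(rows, columns):
    """Row count plus the share of missing (None or empty) values per column."""
    n = len(rows)
    rates = {}
    for col in columns:
        missing = sum(1 for r in rows if r.get(col) in (None, ""))
        rates[col] = missing / n if n else 0.0
    return n, rates


def compare(source, target, columns, tolerance=0.0):
    """List readiness issues: lost rows, or columns that got emptier in the target."""
    src_n, src_rates = profile(source, columns)
    tgt_n, tgt_rates = profile(target, columns)
    issues = []
    if src_n != tgt_n:
        issues.append(f"row count: source {src_n}, target {tgt_n}")
    for col in columns:
        if tgt_rates[col] > src_rates[col] + tolerance:
            issues.append(f"{col}: missing {src_rates[col]:.0%} in source, "
                          f"{tgt_rates[col]:.0%} in target")
    return issues


source = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": None},
    {"id": 4, "email": "d@example.com"},
]
target = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": None},
]
for issue in compare(source, target, ["id", "email"]):
    print(issue)
```

<p>Checks like these are only a floor; the business still has to define which rules and quality levels make the data ready for go-live.</p>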
<p><em>How does your data readiness plan compare to the suggested activities above? Please add your comments below if there is anything I’ve missed or anything that needs further explanation.</em></p>
<div>
<div class="call-out-box">Read a related white paper: <a title="Enhancing Your Chance for Successful Data Migration" href="http://www.sas.com/reg/wp/corp/5969" target="_blank">Enhancing Your Chance for Successful Data Migration</a>.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&#038;p=13393</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Interviews, HR and Hiring Data Scientists</title>
		<link>http://www.dataroundtable.com/?p=12639&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=interviews-hr-and-hiring-data-scientists</link>
		<comments>http://www.dataroundtable.com/?p=12639#comments</comments>
		<pubDate>Thu, 13 Jun 2013 13:00:24 +0000</pubDate>
		<dc:creator>Phil Simon</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[HR]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=12639</guid>
		<description><![CDATA[Phil Simon (@philsimon) on the need to check boxes in the hiring process.]]></description>
			<content:encoded><![CDATA[<p>What makes a great data scientist? It&#8217;s an interesting question and, to be sure, an increasingly important one now that we&#8217;ve entered the era of Big Data.</p>
<p><span id="more-12639"></span></p>
<p>On his <a title="Blog - Dempsey" href="http://binalytics.wordpress.com/2013/01/10/on-becoming-a-data-scientist-part-1-the-destination/" target="_blank">blog</a>, <a title="Dempsey" href="http://www.linkedin.com/in/andrewdempsey" target="_blank">Andrew Dempsey</a> lists three high-level skills:</p>
<ul>
<li>Math – They know some blend of statistics, data mining and machine learning</li>
<li>Code – They can do the above through programming, widget-based software or a combination</li>
<li>Communicate – They can effectively communicate their findings and recommendations</li>
</ul>
<p>So, how do you find these people? And how do you know when one is right in front of you?</p>
<h3>Are these people actually good?</h3>
<p>As someone who knows a thing or two about HR, let me chime in. For a long time now, <a title="Job Interviews" href="http://www.fastcompany.com/660537/careers-why-traditional-job-interviews-dont-work" target="_blank">traditional interviews have been terrible predictors of ultimate success on the job</a>. I&#8217;ve seen first-hand how people who looked so good on paper did very poorly in their new positions.</p>
<p>The hiring problem is particularly acute with respect to Big Data and data science because the terms are so poorly understood, especially by HR folks who (as a general rule) aren&#8217;t terribly skilled at analyzing data. I&#8217;d argue that many hiring managers are the least equipped to understand the true skills and abilities of the very people before them.</p>
<p>Of course, there are exceptions to that broad statement. And, lest I overstate things, HR rarely makes these types of hiring decisions sans consulting line employees with whom the prospective new hire will be working.</p>
<h3>Simon Says</h3>
<p>Rare is the recruiter who understands that hiring the right Big Data people is <em>not</em> about checking boxes on a checklist. Yes, for data scientists, technical skills really do matter. However, finding adroit data scientists requires more than knowing how to code in <a title="R" href="http://www.r-project.org/" target="_blank">R</a>. As I write in <em>Too Big to Ignore</em>, it&#8217;s more about a state of mind, a natural curiosity to solve problems and ask questions.</p>
<h3>Feedback</h3>
<p>How do you find your data scientists?</p>
<div>
<div class="call-out-box">Related post: <a title="Don’t confuse a data scientist with a rocket scientist" href="http://www.sas.com/knowledge-exchange/business-analytics/uncategorized/don%e2%80%99t-confuse-a-data-scientist-with-a-rocket-scientist/index.html" target="_blank">Don’t confuse a data scientist with a rocket scientist</a>.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&#038;p=12639</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An Issue with Reporting Data Quality Issues</title>
		<link>http://www.dataroundtable.com/?p=13384&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=an-issue-with-reporting-data-quality-issues</link>
		<comments>http://www.dataroundtable.com/?p=13384#comments</comments>
		<pubDate>Wed, 12 Jun 2013 13:00:54 +0000</pubDate>
		<dc:creator>Jim Harris</dc:creator>
				<category><![CDATA[Data Quality]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=13384</guid>
		<description><![CDATA[Jim Harris (@ocdqblog) explores what can happen if it's too easy to report data quality issues.]]></description>
			<content:encoded><![CDATA[<p>An organization’s perspective on data quality is often revealed by its <a title="dataroundtable.com/?p=2144" href="http://www.dataroundtable.com/?p=2144" target="_blank">data auditing practices</a>. Some organizations practice data quality ignorance by not performing data audits, assuming that if they don’t check it or hear anyone screaming about it, their data quality must be good enough. <span id="more-13384"></span>Other organizations persist in data quality pretense by carefully performing data audits in such a way as to project the appearance of high-quality data, as sometimes happens with <a title="ocdqblog.com/home/red-flag-or-red-herring.html" href="http://www.ocdqblog.com/home/red-flag-or-red-herring.html" target="_blank">red herrings submitted to avoid raising red flags</a> with regulatory compliance.</p>
<p>Just as important as performing regular data audits is fostering an environment where everyone in the organization can report data quality issues <a title="ocdqblog.com/home/the-scarlet-dq.html" href="http://www.ocdqblog.com/home/the-scarlet-dq.html" target="_blank">without fear of blame or reprisal</a>. However, improving an organization’s ability to report data quality issues can sometimes have unintended consequences, similar to improving a city’s ability to report crimes.</p>
<p>“If the police report an increased number of burglaries in a neighborhood,” Nate Silver asked in his book <a title="amazon.com/Signal-Noise-Most-Predictions-Fail/dp/159420411X" href="http://www.amazon.com/Signal-Noise-Most-Predictions-Fail/dp/159420411X" target="_blank"><em>The Signal and the Noise: Why Most Predictions Fail but Some Don&#8217;t</em></a>, “is that because they are being more vigilant and are catching crimes that they had missed before, or have made it easier to report them? Or is it because the neighborhood is becoming more dangerous?”</p>
<p>As an example of the complex relationship between crime reporting and crime rates, Silver explained that “New York does not allow you to file a police report online, while San Francisco does, as I found out when my rental car was broken into there in a reporting trip for this book. San Francisco is doing a better job of helping citizens and visitors to report and prevent crimes. But perversely, this makes its reported crime rate higher.”</p>
<p>The same thing happens in an organization that makes it easier to report data quality issues. As more issues are reported, the organization’s data quality will be perceived as being worse than it was assumed to be when reporting and auditing were not performed on a regular basis.</p>
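<p>The arithmetic behind this effect is simple: the number of reported issues is roughly the number of real issues multiplied by the chance that any one of them gets reported. A minimal Python sketch with hypothetical numbers shows reported counts rising as reporting gets easier while the underlying count stays fixed; it assumes the reporting rate can be estimated, which in practice is the hard part.</p>

```python
def expected_reported(true_issues: int, report_rate: float) -> float:
    """Issues that surface when each real issue is reported with probability report_rate."""
    return true_issues * report_rate


TRUE_ISSUES = 200  # hypothetical: the actual number of data quality issues, held constant

for rate in (0.10, 0.25, 0.50):
    reported = expected_reported(TRUE_ISSUES, rate)
    # Dividing by the reporting rate recovers the (unchanged) underlying count.
    print(f"report rate {rate:.0%}: {reported:.0f} reported, "
          f"{reported / rate:.0f} estimated actual")
```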
<p>But it would be a crime to allow this issue with reporting data quality issues to prevent you from being more vigilant about auditing your data and reporting any data quality issues that you find.</p>
<div class="call-out-box">Read this related Jim Harris blog post: <a title="dataroundtable.com/?p=12675" href="http://www.dataroundtable.com/?p=12675" target="_blank">What is Being Measured is Intrinsically Fuzzy</a></div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&#038;p=13384</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ideas for Addressing Customer Classification</title>
		<link>http://www.dataroundtable.com/?p=13174&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=customer-types</link>
		<comments>http://www.dataroundtable.com/?p=13174#comments</comments>
		<pubDate>Tue, 11 Jun 2013 13:00:28 +0000</pubDate>
		<dc:creator>David Loshin</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[customers]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=13174</guid>
		<description><![CDATA[David Loshin (@davidloshin) offers a new approach to addressing customer classification.]]></description>
			<content:encoded><![CDATA[<p>Last time we started to look at methods used in setting product prices, and I asked whether knowing a customer’s type would help determine a “fair” price for an item, one that might vary by customer type.<span id="more-13174"></span></p>
<p>It might be simple to suggest a specific hierarchy of customer types in relation to some mythical scale of “customer goodness,” and we see this somewhat implicitly applied across the board in the literature surrounding customer centricity and relationship management. Some examples include how to handle your “best customers,” or ways of getting rid of your “worst customers.” The challenge is that without a unit of measure and a scale for goodness, how do organizations classify their customers according to that comparative ranking?</p>
<p>I’d like to suggest two ideas that might help us address customer classification in a way that is easier to manage. The first involves establishing discrete measures and a scale for customer goodness. The second involves having a variety of customer classifications that are not tied to the concept of goodness.</p>
<p>This week, let’s look at the first. Here are some tasks for establishing discrete measures and a scale for customer classification:</p>
<ul>
<li>Selecting some key dimensions of value and a unit of measure (such as annual sales in dollars, number of items purchased or duration of the relationship in months),</li>
<li>Selecting a weighting factor for each measured dimension,</li>
<li>Deciding on the number of customer goodness levels,</li>
<li>Setting thresholds for each level, and</li>
<li>Coming up with discrete measures of goodness.</li>
</ul>
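<p>One way to turn the first few tasks into a single score is a weighted sum of the measured dimensions. A minimal Python sketch, assuming the three example dimensions from the list, arbitrary hypothetical weights and a z-score standardization step (not specified in the post) so that dollars, item counts and months can be combined on one scale:</p>

```python
from statistics import mean, pstdev

# Hypothetical weights for the three example dimensions; the post notes these
# may initially be set somewhat arbitrarily.
WEIGHTS = {"annual_sales": 0.5, "items_purchased": 0.3, "tenure_months": 0.2}


def composite_scores(customers):
    """Weighted sum of standardized dimension values, one score per customer id."""
    # Standardize each dimension so dollars, item counts and months share a scale.
    stats = {}
    for dim in WEIGHTS:
        values = [c[dim] for c in customers]
        stats[dim] = (mean(values), pstdev(values))

    scores = {}
    for c in customers:
        total = 0.0
        for dim, weight in WEIGHTS.items():
            mu, sigma = stats[dim]
            z = (c[dim] - mu) / sigma if sigma else 0.0  # a constant column adds nothing
            total += weight * z
        scores[c["id"]] = total
    return scores


customers = [
    {"id": "A", "annual_sales": 12_000, "items_purchased": 40, "tenure_months": 36},
    {"id": "B", "annual_sales": 3_000, "items_purchased": 10, "tenure_months": 6},
    {"id": "C", "annual_sales": 7_500, "items_purchased": 25, "tenure_months": 21},
]
for cid, score in sorted(composite_scores(customers).items()):
    print(cid, round(score, 3))
```

<p>Standardizing first keeps a dimension measured in large units, such as dollars, from swamping the others regardless of its weight.</p>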
<p>The weighting factors might initially be set somewhat arbitrarily. For some of the other decisions, we can presume that customer scores follow a normal distribution. That means we can start out by dividing the range from three standard deviations below the mean to three above into six bands, each one standard deviation wide, to set the customer goodness levels and the thresholds for each level, since more than 99% of the scores should lie within three standard deviations of the mean. Each threshold below is the percentile of the score at a band boundary:</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td valign="top" width="221">
<p align="center"><strong>Customer Classification</strong></p>
</td>
<td valign="top" width="221">
<p align="center"><strong>Threshold Score (Percentile)</strong></p>
</td>
</tr>
<tr>
<td valign="top" width="221">Golden</td>
<td valign="top" width="221">&gt; 97.8%</td>
</tr>
<tr>
<td valign="top" width="221">Best</td>
<td valign="top" width="221">83.6% &#8211; 97.8%</td>
</tr>
<tr>
<td valign="top" width="221">Good</td>
<td valign="top" width="221">50% &#8211; 83.6%</td>
</tr>
<tr>
<td valign="top" width="221">Fair</td>
<td valign="top" width="221">15.8% &#8211; 50%</td>
</tr>
<tr>
<td valign="top" width="221">Bad</td>
<td valign="top" width="221">2.2% &#8211; 15.8%</td>
</tr>
<tr>
<td valign="top" width="221">Worst</td>
<td valign="top" width="221">&lt; 2.2%</td>
</tr>
</tbody>
</table>
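<p>Reading the table as one-standard-deviation bands, the percentile thresholds follow directly from the normal cumulative distribution function. A minimal Python sketch using the standard library’s <code>statistics.NormalDist</code>, assuming the composite scores are normally distributed and that their mean and standard deviation have already been estimated; the <code>classify</code> function is a hypothetical helper:</p>

```python
from statistics import NormalDist

# Band edges in standard deviations from the mean, highest band first.
BANDS = [(2.0, "Golden"), (1.0, "Best"), (0.0, "Good"), (-1.0, "Fair"), (-2.0, "Bad")]


def percentile_thresholds():
    """Lower percentile threshold of each band under a standard normal distribution."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return {label: round(std_normal.cdf(edge) * 100, 1) for edge, label in BANDS}


def classify(score: float, mean: float, stdev: float) -> str:
    """Place a score in a band by how many standard deviations it sits from the mean."""
    z = (score - mean) / stdev
    for edge, label in BANDS:
        if z > edge:
            return label
    return "Worst"  # more than two standard deviations below the mean


print(percentile_thresholds())
print(classify(0.5, mean=0.0, stdev=1.0))  # Good
```

<p>If the scores turn out to be skewed rather than normal, the same bands can instead be cut at the observed percentiles of the actual scores.</p>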
<p>After evaluating the groupings, you might want to tweak the measures, weights and thresholds – perhaps you did not pull in some expected demographic, or you know a really good customer who didn’t get classified as a good customer. However, this provides at least one starting point for classification. Next time we will look at understanding the characteristics of individuals within each of those classifications.</p>
<div>
<div class="call-out-box">Read more posts by <a title="David Loshin posts" href="http://www.dataroundtable.com/?author=2" target="_blank">David Loshin</a>.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&#038;p=13174</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>20 Encounters of the Information Management Kind – #6 Converting History! Does it Make Sense?</title>
		<link>http://www.dataroundtable.com/?p=13280&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=20-encounters-of-the-information-management-kind-6-converting-history-does-it-make-sense</link>
		<comments>http://www.dataroundtable.com/?p=13280#comments</comments>
		<pubDate>Mon, 10 Jun 2013 13:00:36 +0000</pubDate>
		<dc:creator>Joyce Norris-Montanari</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[data warehouse]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=13280</guid>
		<description><![CDATA[When should you convert history in a data warehouse? Joyce Norris-Montanari explains.]]></description>
			<content:encoded><![CDATA[<p>There are two cases I can think of where you may have to consider whether to convert history in a data warehouse:<span id="more-13280"></span></p>
<p>1. <strong>Initial creation of the data warehouse.</strong> In the past, we have always considered whether converting history data was feasible, wherever that history resided: in the source system, a spreadsheet or another makeshift data warehouse. In some cases, the source system has grown large and much of the historical data is not needed or used. So we consider bringing into the data warehouse only the historical data that the business requirements dictate. You need to be careful here not to make the data warehouse your archival system. We only want the historical data that the business will actually use!</p>
<p>2. <strong>When you are changing or revamping the data warehouse.</strong> In this instance, the business may have changed or you are incorporating enhanced data. The enhanced data may be something that is purchased (probably by marketing or sales), and we want to apply this data to historical data warehouse records.</p>
<p>So you have to ask yourself (and the business users): IS IT WORTH IT? If you google or bing &#8220;feasibility studies,&#8221; you will find good ideas on what to include in your own feasibility study. The resources required to convert history are people, hardware and software. Always consider offering the option to &#8220;START HISTORY FROM IMPLEMENTATION DATE&#8221;… this is the cheapest and easiest way to deal with history.</p>
<div>
<div class="call-out-box">Read more posts by <a title="Joyce Norris-Montanari posts" href="http://www.dataroundtable.com/?author=5" target="_blank">Joyce Norris-Montanari</a>.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&#038;p=13280</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introducing a New Data Quality Dimension: Irrelevance</title>
		<link>http://www.dataroundtable.com/?p=13318&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=introducing-a-new-data-quality-dimension-irrelevance</link>
		<comments>http://www.dataroundtable.com/?p=13318#comments</comments>
		<pubDate>Fri, 07 Jun 2013 13:00:42 +0000</pubDate>
		<dc:creator>Dylan Jones</dc:creator>
				<category><![CDATA[Data Quality]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=13318</guid>
		<description><![CDATA[How do you deal with irrelevant data? Dylan Jones weighs in.]]></description>
			<content:encoded><![CDATA[<p dir="ltr">Do you want to know what one of the single largest causes of bad data is?</p>
<p>Irrelevant data.</p>
<p dir="ltr">Irrelevant data has no value or place within your business, yet for many reasons it is still being maintained (badly).<span id="more-13318"></span></p>
<p>It accounts for huge amounts of physical real estate within your data landscape and, if left ignored, it can skew your data quality assessments, confuse the heck out of knowledge workers and cause unnecessary bloat within your systems.</p>
<p><strong>Where does irrelevant data come from?</strong></p>
<p>There are numerous reasons for the creation of irrelevant data, but here are some of the most common:</p>
<ul>
<li>
<p dir="ltr"><strong>COTS</strong>: Commercial off-the-shelf systems are often designed to cater to a broad industry. As a result, there are often many screens and data structures that simply don’t relate to every organisation.</p>
</li>
<li>
<p dir="ltr"><strong>M&amp;A inheritance</strong>: Legacy systems are integrated or migrated from newly acquired or merged companies.</p>
</li>
<li>
<p dir="ltr"><strong>Shifting business models and processes</strong>: Over time, your underlying business processes have changed, but these changes haven’t been reflected in the data.</p>
</li>
<li>
<p dir="ltr"><strong>Lack of archival strategy</strong>: Old data and data structures are not routinely archived or removed from systems.</p>
</li>
</ul>
<p>Do any of the above scenarios apply to your organisation? If they do, then welcome to the “Data Irrelevance Dimension.”</p>
<p><strong>What are the benefits of removing irrelevant data?</strong></p>
<p dir="ltr">By removing irrelevant data you get immediate gains such as increased query performance and reduced storage requirements. However, the main benefits come from “decluttering” some of the baggage that is slowing down your core user processes.</p>
<p dir="ltr">For example, imagine that you have inherited a COTS system that has field entries for international addresses, yet you only deal with domestic markets. Having to tab past redundant fields can slow down call centre staff or lead to information being entered in the wrong fields.</p>
<p dir="ltr">Perhaps you’ve inherited an asset management system as a result of a merger. The acquired organisation had a slightly different business model that stored site information for health and safety reasons that isn&#8217;t applicable to your organisation. The data remains in the system, and when the original organisation&#8217;s data is migrated into the acquired system you now have irrelevant data that persists.</p>
<p dir="ltr">Getting rid of irrelevant data simplifies user processes and improves operational performance. That is more than enough reason to explore its removal, but where should you start?</p>
<p dir="ltr"><strong>How do you get rid of irrelevant data?</strong></p>
<p dir="ltr">I typically approach this with the full support of the business. You need to get them bought into the process.</p>
<p dir="ltr">First, profile the data and look for trends and obvious events in the lifetime history of the data. You’ll often see fields that have not been updated for several years. Having business experts with you increases the chance of spotting occurrences of irrelevance.</p>
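<p dir="ltr">A first profiling pass for stale fields can be automated. This Python sketch is a minimal illustration, assuming each record carries a last-updated date and treating a field as a candidate for irrelevance if no record updated within the last three years holds a value in it. The field names, the three-year window and the use of the row’s update date as a proxy are all hypothetical, and the output is a list for business experts to review, not a deletion list.</p>

```python
from datetime import date, timedelta


def stale_fields(rows, fields, updated_col, as_of, years=3):
    """Fields with no populated value in any row updated within the last `years` years.

    A row's update date is used as a proxy for when its field values were last touched.
    """
    cutoff = as_of - timedelta(days=365 * years)
    last_seen = {}  # field -> most recent update date of a row where it was populated
    for row in rows:
        for field in fields:
            if row.get(field) not in (None, ""):
                updated = row[updated_col]
                if updated > last_seen.get(field, date.min):
                    last_seen[field] = updated
    # Fields never populated fall back to date.min and are always reported.
    return sorted(f for f in fields if cutoff > last_seen.get(f, date.min))


rows = [
    {"id": 1, "intl_postcode": "", "phone": "555-0100", "updated": date(2013, 5, 1)},
    {"id": 2, "intl_postcode": "SW1A 1AA", "phone": "555-0101", "updated": date(2008, 3, 9)},
    {"id": 3, "intl_postcode": None, "phone": "", "updated": date(2012, 11, 20)},
]
print(stale_fields(rows, ["intl_postcode", "phone", "fax"], "updated", as_of=date(2013, 6, 7)))
# ['fax', 'intl_postcode']
```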
<p dir="ltr">Defining what data you need is another critical activity. Getting the business community to agree on common terms and definitions can help determine which data should be deleted.</p>
<p dir="ltr">Functional modelling is another great way to perform a gap analysis of the data and functions you have compared with what you really need. It also helps validate your current business model and business processes.</p>
<p dir="ltr">Somewhat more challenging is removing screen design elements that map to redundant data structures. This can be problematic with COTS solutions, where changes to screen and application design are often not supported. However, most modern systems allow you to customise much of the application infrastructure. (Hint: the ability to customise an application should also be one of your criteria when evaluating any new COTS solution.)</p>
<p dir="ltr">You will also need to perform an information chain analysis to see which systems depend on the irrelevant data or data structures. Applications and ETL scripts can fail quite easily when the underlying schemas and information sources are changed in an uncontrolled manner. Offer an amnesty to downstream users of redundant data structures, but give them a deadline by which they must come forward.</p>
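<p dir="ltr">One way to start an information chain analysis is to record which systems feed which, then walk that map outward from the structure you want to remove. The Python sketch below does a breadth-first traversal over a hand-built dictionary; the system names are hypothetical, and in practice the map might come from ETL metadata, interface documentation or interviews with downstream users.</p>

```python
from collections import deque


def downstream_dependents(feeds, start):
    """Return every system reachable from `start` in a source -> consumers map.

    Uses a breadth-first search; `seen` guards against cycles between systems.
    """
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for consumer in feeds.get(node, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    seen.discard(start)  # a cycle back to the start is not a "dependent"
    return sorted(seen)


# Hypothetical information chain for a redundant health-and-safety table.
feeds = {
    "asset.site_hs_info": ["etl_nightly_load", "hs_report"],
    "etl_nightly_load": ["data_warehouse"],
    "data_warehouse": ["finance_dashboard", "regulatory_extract"],
}
print(downstream_dependents(feeds, "asset.site_hs_info"))
```

<p dir="ltr">Each name in the result is a system, or an owner, to contact during the amnesty period before the structure is changed or removed.</p>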
<p dir="ltr">How have you got rid of irrelevant data in the past? How did it impact your organisation?</p>
<p dir="ltr">Please share your experiences in the comments below.</p>
<div>
<div class="call-out-box">Read a white paper on data quality: <a title="data quality white paper" href="http://www.sas.com/reg/wp/corp/52450" target="_blank">Building a Data Quality Scorecard for Operational Data Governance</a>.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&#038;p=13318</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On Data and the Gender Gap</title>
		<link>http://www.dataroundtable.com/?p=12766&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=on-data-and-the-gender-gap</link>
		<comments>http://www.dataroundtable.com/?p=12766#comments</comments>
		<pubDate>Thu, 06 Jun 2013 13:00:56 +0000</pubDate>
		<dc:creator>Phil Simon</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[compensation]]></category>
		<category><![CDATA[Sheryl Sandberg]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=12766</guid>
		<description><![CDATA[.@philsimon on gender issues and the data behind them.]]></description>
			<content:encoded><![CDATA[<p><a title="Sandberg Book" href="http://www.amazon.com/Lean-In-Women-Work-Will/dp/0385349947" target="_blank">Sheryl Sandberg&#8217;s new book</a> has, as expected, ignited a debate over women in the workplace. Sandberg contends that many women don&#8217;t negotiate very well and that this inability is largely responsible for &#8220;the gender gap.&#8221;</p>
<p><span id="more-12766"></span>Dice recently reported that <a title="Gender Gap Gone?" href="http://media.dice.com/report/spotlight-on-women-in-tech-3/" target="_blank">the gap has largely disappeared</a>. Yes, salary differences remain between the sexes, but they can be explained by &#8220;legitimate&#8221; factors like experience.</p>
<h3>Looking at the (Hypothetical) Data</h3>
<p>Before I became a proper techie, I worked in corporate HR. I learned long ago that everyone is dispassionate about <em>other people&#8217;s</em> compensation. When the conversation turns to their own, though, emotions run high.</p>
<p>So, are men paid more than women? And are there mitigating factors?</p>
<p><em>Note that, in this post, I use hypothetical data to make a point.</em></p>
<p>The average salaries for programmers show that there is, in fact, a genuine difference of about eight percent between men and women:</p>
<p><a href="http://simonsandbox.wpengine.com/wp-content/uploads/2013/03/df1.jpg"><img class="alignnone size-medium wp-image-8774" src="http://simonsandbox.wpengine.com/wp-content/uploads/2013/03/df1-300x46.jpg" alt="df1" width="300" height="46" /></a></p>
<p>But macro-level statistics may mask perfectly valid reasons for this difference. Perhaps the generic title of <em>programmer</em> obscures differences in skills. In 2013, knowing how to code in <a title="COBOL" href="http://en.wikipedia.org/wiki/COBOL" target="_blank">COBOL</a> isn&#8217;t necessarily as valuable as knowing PHP, right? Or what about experience? All else being equal, programmers with more experience might earn more than those new to the field, irrespective of gender.</p>
<p>Once years of experience are factored in, the gender differences become much smaller, even nonexistent in some cases:</p>
<p><a href="http://simonsandbox.wpengine.com/wp-content/uploads/2013/03/df2.jpg"><img class="alignnone size-medium wp-image-8775" src="http://simonsandbox.wpengine.com/wp-content/uploads/2013/03/df2-300x120.jpg" alt="df2" width="300" height="120" /></a></p>
<p>In the data above, you&#8217;ll notice a clear outlier. The guy with seven years of experience makes the most money of anyone. Perhaps he has a unique programming skill? A good recruiter? Maybe he knows something about the VP&#8217;s extracurricular activities and gets a little extra to keep his mouth shut.</p>
<p>Whatever the reason, this outlier can skew the numbers for everyone if the sample size is small. So, is it?</p>
<p><a href="http://simonsandbox.wpengine.com/wp-content/uploads/2013/03/df3.jpg"><img class="alignnone size-medium wp-image-8776" src="http://simonsandbox.wpengine.com/wp-content/uploads/2013/03/df3-300x121.jpg" alt="df3" width="300" height="121" /></a></p>
<p>In short, yes. If we were talking about 1,100 or 11,000 programmers, one outlier would have a negligible impact. With only 11, however, each data point has a disproportionate impact.</p>
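<p>To see why sample size matters, the short Python sketch below adds a single high earner to an otherwise identical group and measures how far the mean moves. The salary figures are invented round numbers, not the hypothetical data behind the tables above.</p>

```python
from statistics import mean, median


def outlier_effect(n, typical=80_000, outlier=160_000):
    """Return (% shift in the mean, median) when 1 of n salaries is an outlier.

    n - 1 people earn `typical` and one person earns `outlier`;
    both figures are invented for illustration.
    """
    salaries = [typical] * (n - 1) + [outlier]
    shift_pct = (mean(salaries) - typical) / typical * 100
    return shift_pct, median(salaries)


for n in (11, 1_100):
    shift, med = outlier_effect(n)
    print(f"n={n:>5}: mean up {shift:.2f}%, median {med:,.0f}")
```

<p>With 11 people, the single outlier lifts the mean by about 9%; with 1,100 it moves the mean by less than 0.1%. The median stays at 80,000 in both cases because it depends only on the middle of the sorted list, not on how extreme the top value is.</p>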
<h3>Simon Says</h3>
<p>I&#8217;m not saying that some level of gender discrimination doesn&#8217;t exist today. (This post used simple, hypothetical data.) However, before making any controversial statement about any issue, make sure that you&#8217;ve looked at some data first.</p>
<h3>Feedback</h3>
<p>What say you?</p>
<div>
<div class="call-out-box">Subscribe to the <a title="SAS Information Management News" href="https://login.sas.com/opensso/UI/Login?realm=/extweb&amp;goto=http://www.sas.com:80/profile/user/subscribe.htm?subcode%3D215&amp;locale=en_US" target="_blank">SAS Information Management e-newsletter</a> for more insightful blog posts, articles, webcasts and white papers.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&#038;p=12766</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Fallacy of Defect Prevention</title>
		<link>http://www.dataroundtable.com/?p=13339&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=the-fallacy-of-defect-prevention</link>
		<comments>http://www.dataroundtable.com/?p=13339#comments</comments>
		<pubDate>Wed, 05 Jun 2013 13:00:47 +0000</pubDate>
		<dc:creator>Jim Harris</dc:creator>
				<category><![CDATA[Data Quality]]></category>

		<guid isPermaLink="false">http://www.dataroundtable.com/?p=13339</guid>
		<description><![CDATA[Jim Harris (@ocdqblog) on the fallacy of defect prevention.]]></description>
			<content:encoded><![CDATA[<p>One of my least favorite phrases in the data quality industry is “getting data right the first time, every time.” It’s not that I disagree with the premise of defect prevention. Even though it’s impossible to truly prevent every defect before it happens, <a title="dataroundtable.com/?p=1711" href="http://www.dataroundtable.com/?p=1711" target="_blank">defect prevention is highly recommended</a> because the more control that is enforced where data originates, the better the overall quality of enterprise information will be.<span id="more-13339"></span></p>
<p>However, a major problem with defect prevention is the assumption that defects are easily detected and can always be known <em>a priori</em> (i.e., not dependent on experience using the data in any business context) when, in fact, many defects can only be detected <em>a posteriori</em> (i.e., dependent on experience using the data in a specific business context), such as data defects that cause a business process to fail.</p>
<p>According to <a title="eccma.org/iso8000/iso8000home.php" href="http://www.eccma.org/iso8000/iso8000home.php" target="_blank">ISO 8000 standards</a>, when it comes to data quality (more specifically, data accuracy), <a title="ocdqblog.com/home/dq-tip-there-is-no-such-thing-as-data-accuracy.html" href="http://www.ocdqblog.com/home/dq-tip-there-is-no-such-thing-as-data-accuracy.html" target="_blank">there are only assertions</a>. You need to use data to test its assertions before you can determine its quality. Furthermore, quality is never achieved as a permanent state; it&#8217;s always in flux because of the universality of change. Even data that is defect-free today could be considered defective tomorrow. Therefore, assertions of data quality must be continually reasserted.</p>
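<p>As a loose illustration of continually reasserting data quality, the Python sketch below keeps assertions as named checks and re-runs all of them against each new batch of records. The two rules and the field names are hypothetical. Note that this only covers defects you can state <em>a priori</em>; it cannot catch the <em>a posteriori</em> defects described above, which surface only when the data is used.</p>

```python
from typing import Callable, Dict, Iterable

Record = Dict[str, object]
Check = Callable[[Record], bool]

# Hypothetical assertions; real ones come from the business context of the data.
ASSERTIONS: Dict[str, Check] = {
    "postcode_present": lambda r: bool(r.get("postcode")),
    "credit_limit_non_negative": lambda r: r.get("credit_limit", 0) >= 0,
}


def reassert(records: Iterable[Record], assertions: Dict[str, Check]) -> Dict[str, int]:
    """Run every assertion against every record and count the failures.

    Intended to be called on each new load or snapshot, because a batch that
    passed yesterday says nothing about the data arriving today.
    """
    failures = {name: 0 for name in assertions}
    for record in records:
        for name, check in assertions.items():
            if not check(record):
                failures[name] += 1
    return failures


todays_batch = [
    {"postcode": "SW1A 1AA", "credit_limit": 500},
    {"postcode": "", "credit_limit": -20},
]
print(reassert(todays_batch, ASSERTIONS))
```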
<p>And, as Dylan Jones recently blogged, <a title="dataroundtable.com/?p=13309" href="http://www.dataroundtable.com/?p=13309" target="_blank">poor quality data isn’t a pitfall, but an opportunity to learn</a>. He recommended transforming the way you approach bad data by finding the hidden story, tracking problems back to their source and asking questions of those who created the data. This story about the people, processes and technologies involved in creating poor-quality data can often reveal considerable opportunities for increased profits, morale and operational efficiency.</p>
<p>Of course, those learning opportunities will often lead to implementing new defect prevention procedures or strengthening existing ones. Just <a title="ocdqblog.com/home/there-is-no-such-thing-as-a-root-cause.html" href="http://www.ocdqblog.com/home/there-is-no-such-thing-as-a-root-cause.html" target="_blank">don’t equate defect prevention with defect elimination</a>; that is the preventable defective reasoning that I refer to as the fallacy of defect prevention.</p>
<div class="call-out-box">Read this related Daniel Teachey blog post: <a title="blogs.sas.com/content/datamanagement/2013/05/31/the-value-of-data-quality/" href="http://blogs.sas.com/content/datamanagement/2013/05/31/the-value-of-data-quality/" target="_blank">The Value of Data Quality</a></div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataroundtable.com/?feed=rss2&#038;p=13339</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
