Why Can’t We Predict the Weather?
Jan 18, 2012 by Jim Harris in Big Data, Data Management, Data Quality
In an edited excerpt of his new book, Too Big to Know, David Weinberger explained “Thomas Jefferson and George Washington recorded daily weather observations, but they didn’t record them hourly or by the minute. Not only did they have other things to do, such data didn’t seem useful. Even after the invention of the telegraph enabled the centralization of weather data, the 150 volunteers who received weather instruments from the Smithsonian Institution in 1849 still reported only once a day.”
Nowadays there is, as Weinberger continued, “a literally immeasurable, continuous stream of climate data from satellites circling the earth, buoys bobbing in the ocean, and Wi-Fi-enabled sensors in the rain forest. We are measuring temperatures, rainfall, wind speeds, carbon dioxide levels, and pressure pulses of solar wind.”
Has all of this additional data, and our analysis of it, allowed us to reliably predict the weather?
No, of course not. But why? Does meteorological data suffer from data quality issues? No, the completeness and accuracy (and many other quality dimensions) of this data are astounding. So, is meteorological data not being delivered fast enough to support real-time data-driven decisions about weather forecasting? No, in fact, the velocity of this data is about as real-time as real-time gets.
So, it must be a decision quality problem then, right? In other words, meteorologists must not know how to make high-quality decisions using all of that real-time high-quality meteorological data. Well, as much as we all like to complain about the ineptness of our local weather forecasters, meteorologists are actually well-trained, competent scientists performing numerical weather prediction using computer simulations built on complex mathematical models.
“Models this complex,” Weinberger explained, “often fail us, because the world is more complex than our models can capture. But sometimes they can predict accurately how the system will behave. At their most complex these are sciences of emergence and complexity, studying properties of systems that cannot be seen by looking only at the parts, and cannot be well predicted except by looking at what happens.”
This reminded me of an old statistics joke about mistaking a correlation for a cause, which says that the best predictive variable of whether it will rain is a spike in the sale of umbrellas. It’s a joke because obviously people buy more umbrellas when it’s already raining, so umbrella sales do not forecast rain.
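To see why the joke works, here is a minimal Python sketch using made-up numbers (the `rain` and `sales` series and the `pearson` helper are hypothetical, purely for illustration). It assumes umbrella sales simply react to the same day's rain, then compares how well sales line up with today's rain versus tomorrow's rain.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up daily record: 1 means it rained that day, 0 means it stayed dry.
rain = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1]

# Assumption: people buy umbrellas on the day it rains, so sales follow rain.
sales = [5 + 20 * r for r in rain]

# Sales versus rain on the same day: an almost perfect match.
same_day = pearson(sales, rain)

# Sales today versus rain tomorrow: what a forecast would actually need.
next_day = pearson(sales[:-1], rain[1:])

print(f"sales vs. same-day rain: {same_day:.2f}")
print(f"sales vs. next-day rain: {next_day:.2f}")
```

In this toy data, sales track same-day rain almost perfectly yet say very little about the next day, which is the whole point of the joke: a strong correlation can describe what is already happening without forecasting anything.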
Data-Driven Decision Making
Data-driven decision making exists at the intersection of data quality and decision quality, where quality data supports quality business decisions. The need for improved data quality reflects our organizations’ need to make better business decisions faster than ever before, using better data and more varied sources and types of data, with more transparency in data-driven decision making and its business results.
Despite the fact that the business world will forever remain as predictable as the weather, you cannot turn a blind eye to the need for data-driven decision making best practices, or the reality that no best practice can eliminate the potential for poor data quality and decision quality – nor the potential for poor business results even despite better data quality and decision quality.
Central to continuous improvement is the importance of closing the feedback loops that make data-driven decisions more transparent through better monitoring, allowing the organization to learn from its decision-making mistakes and make adjustments when necessary.
Although it cannot predict business success, continuous improvement can enable better decisions with better data, which can reliably forecast better business performance for your organization.





Phil Simon
Jan 18, 2012
I hadn’t heard that joke before. Brilliant.
Jim Harris
Jan 18, 2012
Yes, some of the best statistics jokes come from mistaking correlations for causes, which, because of the prevalence of those mistakes in data-driven decision making, are often just as sad as they are funny . . .
. . . plus or minus two standard deviations, of course