Big Data titillates the modern imagination with grand and strangely credible promises. It also raises equally plausible risks. I’ve written about the promises and the pitfalls (see here, here, here and here), but I do not believe we can harness the potential of data science, or mitigate the attendant risks, if we reduce the broad repercussions of Big Data to simplistic constructs that miss the point (e.g., pros and cons, costs and benefits, good or bad).
These reductionist perspectives typically assume that the future of data science hinges on the predictive and explanatory value of the risk models it produces. So, if data scientists (formerly known as statisticians) can predict the next Enron or the next Fukushima, then we’ll all turn to these beautiful minds in gratitude and celebrate the redemptive impact of their genius. Based on my recent experience, I caution the geniuses not to hold their breath.
In my last job, I worked closely with data scientists who developed several statistical models that accurately identified stocks with the most aggressive accounting practices and the highest relative risk of major share price drops. We subjected the models to extensive testing, which consistently showed that eliminating these risky stocks from investment portfolios would materially improve risk-adjusted returns. Along with some successes in sales and marketing, we also met with staunch resistance, which inspired this essay last year: Do We Really Want to Detect Accounting Fraud?
I am now working with scientists who are using novel data analysis methods to predict the next major earthquake. I’ll share details about this work in the coming days and weeks. For now, I refer to earthquake prediction mainly to illustrate a larger point: like any large-scale innovation, data science can overcome its methodological challenges (e.g., prediction, risk assessment) much more easily than it can overcome institutional inertia and dogmatism. As I recently learned, it’s hard to get mainstream media to cover earthquake forecasts because mainstream science has enshrined the dogma that earthquakes are unpredictable.
We do not yet know that this position is wrong, but we know that it’s overdue for a thorough reassessment. The resistance to this scrutiny brings to mind the work of Thomas Kuhn, the physicist, historian and philosopher of science who coined the term “paradigm shift” in his 1962 book The Structure of Scientific Revolutions. There, Kuhn explained that dominant paradigms survive, as they should, as long as they can describe, explain or predict observed phenomena more effectively than competing theories. But the accumulation of anomalous events that do not conform to the reigning paradigm warrants and typically causes the paradigm’s revision or displacement. Conscientious scientists do not resist this inevitability. They embrace it. The recent headlines below suggest the anomalies are accumulating:
- We may be getting closer to predicting big earthquakes (Vox, 5/17): “Some seismologists have come up with a new proposed precursor – and the recent large earthquakes in Tohoku, Japan and Chile have lent that proposal some credibility.”
- Pair of seismologists publicly wonder if it might be possible to predict largest earthquakes (Phys.org, 5/16): “Together they [seismologists Emily Brodsky and Thorne Lay at the University of California] published a Perspective piece in the journal Science, questioning the traditional belief in the earth sciences field that it’s impossible to predict earthquakes of any kind and likely will always be that way.”
- Are scientists getting closer to predicting major earthquakes? (CBS News, 5/16): “Whether earthquakes are predictable or not is still an open question, but perhaps there is now some cause for optimism.”
- Fracking linked to Ohio earthquakes, officials say (CBS News, 5/14): “The state has temporarily shut down a group of wells suspected of causing quakes and the Department of Natural Resources is setting tougher standards for drilling permits.”
- Scientists Warn of Quake Risk From Fracking Operations (National Geographic, 5/2): “Evidence continues to accumulate that the activities associated with the North American oil and gas boom can lead to unintended, man-made tremors, or ‘induced seismicity.’”
- How Mexicans know when an earthquake is coming (The Economist, 4/27): “Seismologists installed sensors in the south of the country that detect the first tremors and send a warning to the capital. The seismic wave moves at about 7,000 miles per hour. That sounds fast, but it means that it takes the quake nearly two minutes to travel the 200 miles from Oaxaca to Mexico City.”
- Earthquake science in the era of big data (USC News, 2/24): “…by beefing up and modernizing the region’s seismographic network and then crunching the massive reams of resulting data, scientists from SCEC have been able to piece together a clearer, more granular picture of the varying risk that regions throughout Southern California face due to earthquakes.”
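The early-warning arithmetic in the Economist item above is easy to verify. A minimal check, using only the two figures the article quotes (a ~7,000 mph seismic wave and ~200 miles from Oaxaca to Mexico City):

```python
# Verify the Economist's early-warning arithmetic: how long does a
# seismic wave moving ~7,000 mph take to cover ~200 miles?
wave_speed_mph = 7_000    # figure quoted in the article
distance_miles = 200      # Oaxaca -> Mexico City, as quoted

travel_time_minutes = distance_miles / wave_speed_mph * 60
print(f"Warning window: {travel_time_minutes:.1f} minutes")  # ~1.7 minutes
```

That ~1.7-minute window is the article’s “nearly two minutes” of warning: the sensors near the epicenter detect the first tremors, and the alert reaches the capital electronically, far ahead of the wave itself.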
Related Blog Posts and Other Links
- Centenarian Chain Smokers: Lessons about the Predictive and Explanatory Value of ESG Research: “Risk modeling is not prediction. The former is an earnest attempt to make thoughtful decisions in the context of complex uncertainty. The latter is a centerpiece in Wall Street’s media-fueled culture of ‘propheteering’.”
- New Frontiers in Risk Modeling: “In The Structure of Scientific Revolutions, Kuhn explained that dominant paradigms survive, as they should, as long as they can describe, explain or predict observed phenomena more effectively than competing theories. But the accumulation of anomalous events that do not conform to the reigning paradigm warrants and typically causes the paradigm’s revision or displacement.”
- Propheteers: The Future of Wall Street’s False Prophets: “Along with legitimate scholars and creative geniuses, the art and science of forecasting has always attracted opportunists eager to exploit people’s anxieties about the inherently uncertain future.”
- Pricing Earthquake Reinsurance (Ophir Gottlieb, Stanford University): “Earthquake occurrences are modeled statistically with variants of the Hawkes Process. Monte Carlo simulations are then used to obtain the price of tranches for different market parameters.”
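The Gottlieb quote above pairs two standard ingredients: a Hawkes (self-exciting) point process, in which each quake temporarily raises the likelihood of further quakes, and Monte Carlo simulation over its realizations. A minimal sketch of that combination, simulating a Hawkes process with an exponential kernel via Ogata’s thinning algorithm (the parameter values here are illustrative, not taken from the paper):

```python
import math
import random

def simulate_hawkes(mu, alpha, beta, horizon, rng):
    """Simulate event times of a Hawkes process on [0, horizon] via
    Ogata's thinning.  Intensity: lam(t) = mu + sum_i alpha*exp(-beta*(t - t_i))
    over past events t_i.  Stationary when alpha/beta < 1."""
    events = []
    t = 0.0
    while t < horizon:
        # The current intensity bounds the intensity until the next event,
        # because the exponential kernel only decays between events.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)        # candidate inter-arrival time
        if t >= horizon:
            break
        lam_t = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        if rng.random() <= lam_t / lam_bar:  # accept with prob lam(t)/lam_bar
            events.append(t)                 # each event excites future intensity

    return events

# Monte Carlo step: average the event count over many simulated catalogs.
# Long-run expectation is roughly mu*horizon/(1 - alpha/beta) = 150 here.
counts = [len(simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=100.0,
                              rng=random.Random(seed)))
          for seed in range(200)]
print(sum(counts) / len(counts))
```

A reinsurance pricer would replace the simple event count with a loss function over each simulated catalog (magnitudes, tranche attachment points), but the simulate-then-average structure is the same.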