Study shows how machine learning could predict rare disastrous events, like earthquakes or pandemics
When it comes to predicting disasters brought on by extreme events (think earthquakes, pandemics or “rogue waves” that could destroy coastal structures), computational modeling faces an almost insurmountable challenge: Statistically speaking, these events are so rare that there’s just not enough data on them to use predictive models to accurately forecast when they’ll happen next.
But a team of researchers from Brown University and Massachusetts Institute of Technology say it doesn’t have to be that way.
In a new study in Nature Computational Science, the scientists describe how they combined statistical algorithms — which need less data to make accurate, efficient predictions — with a powerful machine learning technique developed at Brown and trained it to predict scenarios, probabilities and sometimes even the timeline of rare events despite the lack of historical record on them.
Doing so, the research team found that this new framework can provide a way to circumvent the need for massive amounts of data that are traditionally needed for these kinds of computations, instead essentially boiling down the grand challenge of predicting rare events to a matter of quality over quantity.
“You have to realize that these are stochastic events,” said George Karniadakis, a professor of applied mathematics and engineering at Brown and a study author. “An outburst of pandemic like COVID-19, environmental disaster in the Gulf of Mexico, an earthquake, huge wildfires in California, a 30-meter wave that capsizes a ship — these are rare events and because they are rare, we don’t have a lot of historical data. We don’t have enough samples from the past to predict them further into the future. The question that we tackle in the paper is: What is the best possible data that we can use to minimize the number of data points we need?”
The researchers found the answer in a sequential sampling technique called active learning. These types of statistical algorithms are not only able to analyze data input into them, but more importantly, they can learn from the information to label new relevant data points that are equally or even more important to the outcome that’s being calculated. At the most basic level, they allow more to be done with less. More