COVID radar: Genetic sequencing can help predict severity of next variant
As public health officials around the world contend with the latest surge of the COVID-19 pandemic, researchers at Drexel University have created a computer model that could help them be better prepared for the next one. Using machine learning algorithms, trained to identify correlations between changes in the genetic sequence of the COVID-19 virus and upticks in transmission, hospitalizations and deaths, the model can provide an early warning about the severity of new variants.
More than two years into the pandemic, scientists and public health officials are doing their best to predict how mutations of the SARS-CoV-2 virus are likely to make it more transmissible, evasive to the immune system and likely to cause severe infections. But collecting and analyzing the genetic data to identify new variants — and linking it to the specific patients who have been sickened by it — is still an arduous process.
Because of this, most public health projections about new “variants of concern” — as the World Health Organization categorizes them — are based on surveillance testing and observation of the regions where they are already spreading.
“The speed with which new variants, like Omicron have made their way around the globe means that by the time public health officials have a good handle on how vulnerable their population might be, the virus has already arrived,” said Bahrad A. Sokhansanj, PhD, an assistant research professor in Drexel’s College of Engineering who led development of the computer model. “We’re trying to give them an early warning system — like advanced weather modeling for meteorologists — so they can quickly predict how dangerous a new variant is likely to be — and prepare accordingly.”
The Drexel model, which was recently published in the journal Computers in Biology and Medicine, is driven by a targeted analysis of the genetic sequence of the virus’s spike protein — the part of the virus that allows it to evade the immune system and infect healthy cells, it is also the part known to have mutated most frequently throughout the pandemic — combined with a mixed effects machine learning analysis of factors such as age, sex and geographic location of COVID patients.
Learning to Find Patterns
The research team used a newly developed machine learning algorithm, called GPBoost, based on methods commonly used by large companies to analyze sales data. Via a textual analysis, the program can quickly home in on the areas of the genetic sequence that are most likely to be linked to changes in the severity of the variant. More