Computer model predicts dominant SARS-CoV-2 variants
Scientists at the Broad Institute of MIT and Harvard and the University of Massachusetts Medical School have developed a machine learning model that can analyze millions of SARS-CoV-2 genomes and predict which viral variants will likely dominate and cause surges in COVID-19 cases. The model, called PyR0 (pronounced “pie-are-nought”), could help researchers identify which parts of the viral genome will be less likely to mutate and hence be good targets for vaccines that will work against future variants. The findings appear today in Science.
The researchers trained the machine-learning model using 6 million SARS-CoV-2 genomes that were in the GISAID database in January 2022. They showed how their tool can also estimate the effect of genetic mutations on the virus’s fitness — its ability to multiply and spread through a population. When the team tested their model on viral genomic data from January 2022, it predicted the rise of the BA.2 variant, which became dominant in many countries in March 2022. PyR0 would have also identified the alpha variant (B.1.1.7) by late November 2020, a month before the World Health Organization listed it as a variant of concern.
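To make the idea of fitness concrete, here is a toy sketch rather than the PyR0 implementation. It assumes that a lineage's growth rate is the sum of per-mutation effects and that lineage shares follow a softmax over time. The mutation names, effect sizes, and starting proportions are hypothetical values chosen only for illustration.

```python
import math

def lineage_growth_rate(mutations, effects):
    """Growth rate as the sum of per-mutation effects (additive assumption)."""
    return sum(effects.get(m, 0.0) for m in mutations)

def lineage_shares(init_log_prevalence, growth_rates, t):
    """Share of each lineage at time t: softmax of (log prevalence + rate * t)."""
    logits = [a + r * t for a, r in zip(init_log_prevalence, growth_rates)]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical per-mutation effects on growth rate (per day); not real estimates.
effects = {"mut_A": 0.05, "mut_B": 0.03}

# Two hypothetical lineages: a baseline and a variant carrying both mutations.
lineages = {"baseline": [], "variant": ["mut_A", "mut_B"]}
rates = [lineage_growth_rate(muts, effects) for muts in lineages.values()]

# Assume the variant starts at 1% of sequenced genomes.
init = [math.log(0.99), math.log(0.01)]

shares_day0 = lineage_shares(init, rates, 0)
shares_day90 = lineage_shares(init, rates, 90)

for name, s0, s90 in zip(lineages, shares_day0, shares_day90):
    print(f"{name}: day 0 = {s0:.3f}, day 90 = {s90:.3f}")
```

Under these assumed numbers, a variant with even a modest growth advantage goes from rare to the majority of genomes within a few months, which is the kind of signal a model like this looks for in the data.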
The research team includes first author Fritz Obermeyer, a machine learning fellow at the Broad Institute when the study began, and senior authors Jacob Lemieux, an instructor of medicine at Harvard Medical School and Massachusetts General Hospital, and Pardis Sabeti, an institute member at Broad, a professor at the Center for Systems Biology and the Department of Organismic and Evolutionary Biology at Harvard University, and a professor in the Department of Immunology and Infectious Disease at the Harvard T. H. Chan School of Public Health. Sabeti is also a Howard Hughes Medical Institute investigator.
PyR0 is based on a machine learning framework called Pyro, which was originally developed by a team at Uber AI Labs. In 2020, three members of that team, including Obermeyer and Martin Jankowiak, the study’s second author, joined the Broad Institute and began applying the framework to biology.
“This work was the result of biologists and geneticists coming together with software engineers and computer scientists,” Lemieux said. “We were able to tackle some really challenging questions in public health that no single disciplinary approach could have answered on its own.”
“This kind of machine learning-based approach that looks at all the data and combines that into a single prediction is extremely valuable,” said Sabeti. “It gives us a leg up on identifying what’s emerging and could be a potential threat.”
The future of SARS-CoV-2