More stories

  • Act now to prevent uncontrolled rise in carbon footprint of computational science

    Cambridge scientists have set out principles for how computational science — which powers discoveries from unveiling the mysteries of the universe to developing treatments to fight cancer to improving our understanding of the human genome, but can have a substantial carbon footprint — can be made more environmentally sustainable.
    Writing in Nature Computational Science, researchers from the Department of Public Health and Primary Care at the University of Cambridge argue that the scientific community needs to act now if it is to prevent a potentially uncontrolled rise in the carbon footprint of computational science as data science and algorithms increase in usage.
    Dr Loïc Lannelongue, who is a research associate in biomedical data science and a postdoctoral associate at Jesus College, Cambridge, said: “Science has transformed our understanding of the world around us and has led to great benefits to society. But this has come with a not-insignificant — and not always well understood — impact on the environment. As scientists — as with people working in every sector — it’s important that we do what we can to reduce the carbon footprint of our work to ensure that the benefits of our discoveries are not outweighed by their environmental costs.”
    Recent studies have begun to explore the environmental impacts of scientific research, with an initial focus on scientific conferences and experimental laboratories. For example, the 2019 Fall Meeting of the American Geophysical Union was estimated to have emitted 80,000 tonnes of CO2e* (tCO2e), equivalent to the average weekly emissions of the city of Edinburgh, UK. The annual carbon footprint of a typical life science laboratory has been estimated to be around 20 tCO2e.
    But there is one aspect of research that often gets overlooked — and which can have a substantial environmental impact: high performance and cloud computing.
    In 2020, the Information and Communication Technologies sector was estimated to have made up between 1.8% and 2.8% of global greenhouse gas emissions — more than aviation (1.9%). In addition to the environmental effects of electricity usage, manufacturing and disposal of hardware, there are also concerns around data centres’ water usage and land footprint.

    Professor Michael Inouye said: “While the environmental impact of experimental ‘wet’ labs is more immediately obvious, the impact of algorithms is less clear and often underestimated. While new hardware, lower-energy data centres and more efficient high performance computing systems can help reduce their impact, the increasing ubiquity of artificial intelligence and data science more generally means their carbon footprint could grow exponentially in coming years if we don’t act now.”
    To help address this issue, the team has developed GREENER (Governance, Responsibility, Estimation, Energy and embodied impacts, New collaborations, Education and Research), a set of principles to allow the computational science community to lead the way in sustainable research practices, maximising computational science’s benefit to both humanity and the environment.
    Governance and Responsibility — Everyone involved in computational science has a role to play in making the field more sustainable: individual and institutional responsibility is a necessary step to ensure transparency and the reduction of greenhouse gas emissions.
    For example, institutions themselves can be key to managing and expanding centralised data infrastructures, and in ensuring that procurement decisions take into account both the manufacturing and operational footprint of hardware purchases. IT teams in high performance computing (HPC) centres can play a key role, both in terms of training and helping scientists monitor the carbon footprint of their work. Principal Investigators can encourage their teams to think about this issue and give access to suitable training. Funding bodies can influence researchers by requiring estimates of carbon footprints to be included in funding applications.
    Estimate and report the energy consumption of algorithms — Estimating and monitoring the carbon footprint of computations identifies inefficiencies and opportunities for improvement.

    User-level metrics are crucial to understanding environmental impacts and promoting personal responsibility. The financial cost of running computations is often negligible, particularly in academia, and scientists may have the impression of unlimited and inconsequential computing capacity. Quantifying the carbon footprint of individual projects helps raise awareness of the true costs of research.
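    The arithmetic behind such estimates is straightforward: energy is runtime multiplied by hardware power draw, scaled up by the data centre’s power usage effectiveness (PUE), and emissions are that energy multiplied by the carbon intensity of the local grid. The sketch below illustrates the calculation; the function and every default value are illustrative assumptions, not figures from the paper.

    ```python
    # Minimal sketch of a job-level carbon estimate. All parameter values are
    # illustrative assumptions; real estimates need the actual hardware specs
    # and local grid data.
    def job_footprint_gco2e(runtime_hours: float,
                            n_cores: int,
                            power_per_core_w: float = 12.0,   # assumed core draw (W)
                            memory_gb: float = 64.0,
                            power_per_gb_w: float = 0.375,    # assumed memory draw (W/GB)
                            pue: float = 1.67,                # data-centre overhead factor
                            carbon_intensity: float = 231.0): # assumed grid gCO2e/kWh
        """Return estimated emissions in grams of CO2-equivalent."""
        power_kw = (n_cores * power_per_core_w + memory_gb * power_per_gb_w) / 1000
        energy_kwh = runtime_hours * power_kw * pue  # PUE adds cooling and overheads
        return energy_kwh * carbon_intensity

    # Example: a 48-hour job on 16 cores with 64 GB of RAM (~4 kgCO2e).
    print(f"{job_footprint_gco2e(48, 16):.0f} gCO2e")
    ```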
    Tackling Energy and embodied impacts through New collaborations — Minimising carbon intensity — that is, the carbon footprint of producing electricity — is one of the most immediately impactful ways to reduce greenhouse gas emissions. This could involve relocating computations to low-carbon settings and countries, but this needs to be done with equity in mind. Carbon intensities can differ by as much as three orders of magnitude between the top and bottom performing high-income countries (from 0.10 gCO2e/kWh in Iceland to 770 gCO2e/kWh in Australia).
    The footprint of user devices is also a factor: one estimate found that almost three-quarters (72%) of the energy footprint of streaming a video to a laptop is from the laptop, with 23% used in transmission and a mere 5% at the data centre.
    Another key consideration is data storage. The carbon footprint of storing data depends on numerous factors, but the life cycle footprint of storing one terabyte of data for a year is of the order of 10 kg CO2e. This issue is exacerbated by the duplication of datasets so that each institution, and sometimes each research group, can keep its own copy. Large (hyperscale) data centres are expected to be more energy efficient, but they may also encourage unnecessary increases in the scale of computing (the ‘rebound effect’).
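    Using that order-of-magnitude figure, the cost of duplication is easy to work out. A back-of-envelope sketch in which the dataset size and replica count are invented:

    ```python
    # Back-of-envelope storage footprint, using the ~10 kgCO2e per terabyte-year
    # life-cycle figure quoted above. The scenario itself is hypothetical.
    tb_stored = 500          # hypothetical shared dataset, in terabytes
    copies = 8               # institutional and group-level duplicates
    kg_per_tb_year = 10      # life-cycle estimate quoted in the article

    annual_kg = tb_stored * copies * kg_per_tb_year
    print(f"{annual_kg / 1000:.0f} tCO2e per year")  # 40 tCO2e per year
    ```

    On the article’s own numbers, that hypothetical duplication alone would match the annual footprint of two typical life science laboratories.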
    Education and Research — Education is essential to raise awareness of the issues with different stakeholders. Integrating sustainability into computational training courses is a tangible first step toward reducing carbon footprints. Investing in research that will catalyse innovation in the field of environmentally sustainable computational science is a crucial role for funders and institutions to play.
    Recent studies found that the most widely-used programming languages in research, such as R and Python, tend to be the least energy efficient ones, highlighting the importance of having trained Research Software Engineers within research groups to ensure that the algorithms used are efficiently implemented. There is also scope to use current tools more efficiently by better understanding and monitoring how coding choices impact carbon footprints.
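    As a small illustration of how much implementation choices matter, the sketch below compares an interpreted Python loop with a vectorised equivalent. On fixed hardware, runtime is only a rough proxy for energy use, but the gap it exposes is exactly the kind a Research Software Engineer would close.

    ```python
    # Same computation, two implementations: coding choices change energy use.
    import time
    import numpy as np

    x = np.random.rand(10_000_000)

    t0 = time.perf_counter()
    total = 0.0
    for v in x:               # interpreted Python loop: slow and energy-hungry
        total += v * v
    t1 = time.perf_counter()

    t2 = time.perf_counter()
    total_vec = np.dot(x, x)  # vectorised: runs in optimised compiled code
    t3 = time.perf_counter()

    print(f"loop:       {t1 - t0:.2f} s")
    print(f"vectorised: {t3 - t2:.4f} s")  # typically orders of magnitude faster
    ```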
    Dr Lannelongue said: “Computational scientists have a real opportunity to lead the way in sustainability, but this is going to involve a change in our culture and the ways we work. There will need to be more transparency, more awareness, better training and resources, and improved policies.
    “Cooperation, open science, and equitable access to low-carbon computing facilities will also be crucial. We need to make sure that sustainable solutions work for everyone, because such solutions frequently benefit least the very populations, often in low- and middle-income countries, who suffer the most from climate change.”
    Professor Inouye added: “Everyone in the field — from funders to journals to institutions down to individuals — plays an important role and can, themselves, make a positive impact. We have an immense opportunity to make a change, but the clock is ticking.”
    The research was a collaboration with major stakeholders including Health Data Research UK, EMBL-EBI, Wellcome and UK Research and Innovation (UKRI).
    *CO2e, or CO2-equivalent, summarises the global warming impacts of a range of greenhouse gases and is the standard metric for carbon footprints, although its accuracy is sometimes debated.

  • ‘Toggle switch’ can help quantum computers cut through the noise

    What good is a powerful computer if you can’t read its output? Or readily reprogram it to do different jobs? People who design quantum computers face these challenges, and a new device may make them easier to solve.
    The device, introduced by a team of scientists at the National Institute of Standards and Technology (NIST), includes two superconducting quantum bits, or qubits, which are a quantum computer’s analogue to the logic bits in a classical computer’s processing chip. The heart of this new strategy relies on a “toggle switch” device that connects the qubits to a circuit called a “readout resonator” that can read the output of the qubits’ calculations.
    This toggle switch can be flipped into different states to adjust the strength of the connections between the qubits and the readout resonator. When toggled off, all three elements are isolated from each other. When the switch is toggled on to connect the two qubits, they can interact and perform calculations. Once the calculations are complete, the toggle switch can connect either of the qubits and the readout resonator to retrieve the results.
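    To make the switching logic concrete, here is a toy state model of the connectivity just described. It is an illustrative sketch only; the state names are ours, not NIST’s control software.

    ```python
    # Toy model of the toggle-switch states described above (illustrative only).
    from enum import Enum

    class Toggle(Enum):
        OFF = "qubits and readout resonator all isolated"
        QUBIT_QUBIT = "qubit 1 <-> qubit 2: perform calculations"
        QUBIT1_READOUT = "qubit 1 <-> readout resonator: retrieve result"
        QUBIT2_READOUT = "qubit 2 <-> readout resonator: retrieve result"
        BOTH_READOUT = "both qubits <-> readout: joint measurement (see below)"

    # A compute-then-measure sequence consistent with the description:
    for state in (Toggle.OFF, Toggle.QUBIT_QUBIT, Toggle.QUBIT1_READOUT, Toggle.OFF):
        print(f"{state.name:15} -> {state.value}")
    ```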
    Having a programmable toggle switch goes a long way toward reducing noise, a common problem in quantum computer circuits that makes it difficult for qubits to make calculations and show their results clearly.
    “The goal is to keep the qubits happy so that they can calculate without distractions, while still being able to read them out when we want to,” said Ray Simmonds, a NIST physicist and one of the paper’s authors. “This device architecture helps protect the qubits and promises to improve our ability to make the high-fidelity measurements required to build quantum information processors out of qubits.”
    The team, which also includes scientists from the University of Massachusetts Lowell, the University of Colorado Boulder and Raytheon BBN Technologies, describes its results in a paper published today in Nature Physics.
    Quantum computers, which are still at a nascent stage of development, would harness the bizarre properties of quantum mechanics to do jobs that even our most powerful classical computers find intractable, such as aiding in the development of new drugs by performing sophisticated simulations of chemical interactions.

    However, quantum computer designers still confront many problems. One of these is that quantum circuits are kicked around by external or even internal noise, which arises from defects in the materials used to make the computers. This noise is essentially random behavior that can create errors in qubit calculations.
    Present-day qubits are inherently noisy by themselves, but that’s not the only problem. Many quantum computer designs have what is called a static architecture, where each qubit in the processor is physically connected to its neighbors and to its readout resonator. The fabricated wiring that connects qubits together and to their readout can expose them to even more noise.
    Such static architectures have another disadvantage: They cannot be reprogrammed easily. A static architecture’s qubits could do a few related jobs, but for the computer to perform a wider range of tasks, it would need to swap in a different processor design with a different qubit organization or layout. (Imagine changing the chip in your laptop every time you needed to use a different piece of software, and then consider that the chip needs to be kept a smidgen above absolute zero, and you get why this might prove inconvenient.)
    The team’s programmable toggle switch sidesteps both of these problems. First, it prevents circuit noise from creeping into the system through the readout resonator and prevents the qubits from having a conversation with each other when they are supposed to be quiet.
    “This cuts down on a key source of noise in a quantum computer,” Simmonds said.

    Second, the opening and closing of the switches between elements are controlled with a train of microwave pulses sent from a distance, rather than through a static architecture’s physical connections. Integrating more of these toggle switches could be the basis of a more easily programmable quantum computer. The microwave pulses can also set the order and sequence of logic operations, meaning a chip built with many of the team’s toggle switches could be instructed to perform any number of tasks.
    “This makes the chip programmable,” Simmonds said. “Rather than having a completely fixed architecture on the chip, you can make changes via software.”
    One last benefit is that the toggle switch can also turn on the measurement of both qubits at the same time. This ability to ask both qubits to reveal themselves as a couple is important for tracking down quantum computational errors.
    The qubits in this demonstration, as well as the toggle switch and the readout circuit, were all made of superconducting components that conduct electricity without resistance and must be operated at very cold temperatures. The toggle switch itself is made from a superconducting quantum interference device, or “SQUID,” which is very sensitive to magnetic fields passing through its loop. Driving a microwave current through a nearby antenna loop can induce interactions between the qubits and the readout resonator when needed.
    At this point, the team has only worked with two qubits and a single readout resonator, but Simmonds said they are preparing a design with three qubits and a readout resonator, and they have plans to add more qubits and resonators as well. Further research could offer insights into how to string many of these devices together, potentially offering a way to construct a powerful quantum computer with enough qubits to solve the kinds of problems that, for now, are insurmountable.

  • Generative AI models are encoding biases and negative stereotypes in their users

    In the space of a few months, generative AI models such as ChatGPT, Google’s Bard and Midjourney have been adopted by more and more people in a variety of professional and personal ways. But a growing body of research is underlining that they encode biases and negative stereotypes in their users, as well as mass-generating and spreading seemingly accurate but nonsensical information. Worryingly, marginalised groups are disproportionately affected by the fabrication of this nonsensical information.
    In addition, mass fabrication has the potential to influence human belief as the models that drive it become increasingly common, populating the World Wide Web. Not only do people grab information from the web, but much of the primary training material used by AI models comes from here too. In other words, a continuous feedback loop evolves in which biases and nonsense become repeated and accepted again and again.
    These findings — and a plea for psychologists and machine learning experts to work together very swiftly to assess the scale of the issue and devise solutions — are published today in a thought-provoking Perspective in the leading international journal Science, co-authored by Abeba Birhane, who is an adjunct assistant professor in Trinity’s School of Computer Science and Statistics (working with Trinity’s Complex Software Lab) and Senior Fellow in Trustworthy AI at the Mozilla Foundation.
    Prof Birhane said: “People regularly communicate uncertainty through phrases such as ‘I think,’ response delays, corrections, and speech disfluencies. By contrast, generative models give confident, fluent responses with no uncertainty representations nor the ability to communicate their absence. As a result, this can cause greater distortion compared with human inputs and lead to people accepting answers as factually accurate. These issues are exacerbated by financial and liability interests incentivising companies to anthropomorphise generative models as intelligent, sentient, empathetic, or even childlike.”
    One example provided in the Perspective focuses on how statistical regularities in a model assigned Black defendants higher risk scores. Court judges who learned these patterns may then change their sentencing practices to match the predictions of the algorithm. This basic mechanism of statistical learning could lead a judge to believe that Black individuals are more likely to reoffend, even if use of the system is stopped by regulations like those recently adopted in California.
    Of particular concern is the fact that biases and fabricated information are not easy to dislodge once an individual has accepted them. Children are at especially high risk because they are more vulnerable to belief distortion: they are more likely to anthropomorphise technology and are more easily influenced.
    What is needed is swift, detailed analysis that measures the impact of generative models on human beliefs and biases.
    Prof Birhane said: “Studies and subsequent interventions would be most effectively focused on impacts on the marginalised populations who are disproportionately affected by both fabrications and negative stereotypes in model outputs. Additionally, resources are needed for the education of the public, policymakers, and interdisciplinary scientists to give realistically informed views of how generative AI models work and to correct existing misinformation and hype surrounding these new technologies.”

  • Perovskite solar cells set new record for power conversion efficiency

    Perovskite solar cells designed by a team of scientists from the National University of Singapore (NUS) have attained a world record efficiency of 24.35% with an active area of 1 cm2. This achievement paves the way for cheaper, more efficient and durable solar cells.
    To facilitate consistent comparisons and benchmarking of different solar cell technologies, the photovoltaic (PV) community uses a standard size of at least 1 cm2 to report the efficiency of one-sun solar cells in the “Solar Cell Efficiency Tables.” Prior to the record-breaking feat by the NUS team, the best 1-cm2 perovskite solar cell recorded a power conversion efficiency of 23.7%. This ground-breaking achievement in maximising power generation from next-generation renewable energy sources will be crucial to securing the world’s energy future.
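    For context, power conversion efficiency is simply electrical power out divided by incident light power in. Under the standard one-sun test condition of roughly 100 mW/cm2, the record figure works out as in this short sketch (our arithmetic, not the paper’s):

    ```python
    # Worked arithmetic: what 24.35% efficiency means at standard one-sun
    # illumination (AM1.5G, ~100 mW/cm^2) over a 1 cm^2 active area.
    one_sun_mw_per_cm2 = 100.0
    area_cm2 = 1.0
    efficiency = 0.2435

    power_out_mw = one_sun_mw_per_cm2 * area_cm2 * efficiency
    print(f"{power_out_mw:.2f} mW of electrical output")  # ~24.35 mW
    ```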
    Perovskites are a class of materials that exhibit high light absorption efficiency and ease of fabrication, making them promising for solar cell applications. In the past decade, perovskite solar cell technology has achieved several breakthroughs, and the technology continues to evolve. However, record efficiencies have typically been achieved on much smaller lab-scale devices, and performance tends to fall as the active area grows.
    “To address this challenge, we undertook a dedicated effort to develop innovative and scalable technologies aimed at improving the efficiency of 1-cm2 perovskite solar cells. Our objective was to bridge the efficiency gap and unlock the full potential of larger-sized devices,” said Assistant Professor Hou Yi, leader of the NUS research team comprising scientists from the Department of Chemical and Biomolecular Engineering under the NUS College of Design and Engineering as well as the Solar Energy Research Institute of Singapore (SERIS), a university-level research institute in NUS.
    He added, “Building on more than 14 years of perovskite solar cell development, this work represents the first instance of an inverted-structure perovskite solar cell exceeding normal-structured perovskite solar cells with an active area of 1 cm2, and this is mainly attributed to the innovative charge transporting material incorporated in our perovskite solar cells. Since inverted-structure perovskite solar cells always offer excellent stability and scalability, achieving a higher efficiency than normal-structured perovskite cells represents a significant milestone in commercialising this cutting-edge technology.”
    This milestone achievement by Asst Prof Hou Yi and his team has been included in the Solar Cell Efficiency Tables (Version 62) in 2023. Published by scientific journal Progress in Photovoltaics on 21 June 2023, these consolidated tables show an extensive listing of the highest independently confirmed efficiencies for solar cells and modules.

    Low-cost, efficient and stable solar cell technology
    The record-breaking accomplishment was made by successfully incorporating a novel interface material into perovskite solar cells.
    “The introduction of this novel interface material brings forth a range of advantageous attributes, including excellent optical, electrical, and chemical properties. These properties work synergistically to enhance both the efficiency and longevity of perovskite solar cells, paving the way for significant improvements in their performance and durability,” explained team member Dr Li Jia, postdoctoral researcher at SERIS.
    The promising results reported by the NUS team mark a pivotal milestone in advancing the commercialisation of a low-cost, efficient, stable perovskite solar cell technology. “Our findings set the stage for the accelerated commercialisation and integration of solar cells into various energy systems. We are excited by the prospects of our invention that represents a major contribution to a sustainable and renewable energy future,” said team member Mr Wang Xi, an NUS doctoral student.
    Towards a greener future
    Building upon this exciting development, Asst Prof Hou and his team aim to push the boundaries of perovskite solar cell technology even further.
    Another key area of focus is to improve the stability of perovskite solar cells, as perovskite materials are sensitive to moisture and can degrade over time. Asst Prof Hou commented, “We are developing a customised accelerating aging methodology to bring this technology from the lab to the fab. One of our next goals is to deliver perovskite solar cells with 25 years of operational stability.”
    The team is also working to scale up the solar cells to modules by expanding the dimensions of the perovskite solar cells and demonstrating their viability and effectiveness on a larger scale.
    “The insights gained from our current study will serve as a roadmap for developing stable, and eventually, commercially viable perovskite solar cell products that can serve as sustainable energy solutions to help reduce our reliance on fossil fuels,” Asst Prof Hou added.

  • Breakthrough innovation could solve temperature issues for source-gated transistors and lead to low-cost, flexible displays

    Low-cost, flexible displays that use very little energy could be a step closer, thanks to an innovation from the University of Surrey that solves a problem that has plagued source-gated transistors (SGTs).
    SGTs are not widely used because the performance of current designs varies with temperature. To solve this problem, scientists from the University of Surrey have developed a new design for the part of the transistor called the source. They have proposed adding very thin layers of insulating material at the source contact to change the way in which electric charges flow.
    Dr Radu Sporea, project lead from the University of Surrey, said:
    “We used a rapidly emerging semiconductor material called IGZO or indium-gallium-zinc oxide to create the next generation of source-gated transistors. Through nanoscale contact engineering, we obtained transistors that are much more stable with temperature than previous attempts. Device simulations allowed us to understand this effect.
    “This new design adds temperature stability to SGTs and retains usual benefits like using low power, producing high signal amplification, and being more reliable under different conditions. While source-gated transistors are not mainstream because of a handful of performance limitations, we are steadily chipping away at their shortcomings.”
    A source-gated transistor (SGT) is a special type of transistor that combines two fundamental components of electronics — a thin-film transistor and a carefully engineered metal-semiconductor contact. It has many advantages over traditional transistors, including using less power and being more stable. SGTs are suitable for large-area electronics and are promising candidates to be used in various fields such as medicine, engineering and computing.
    Salman Alfarisyi performed the simulations at the University of Surrey as part of his final-year undergraduate project. Salman said:
    “Source-gated transistors could be the building block of new power-efficient flexible electronics technology that helps to meet our energy needs without damaging the health of our planet. For example, their sensing and signal amplification ability makes it easy to recommend them as key elements for medical devices that interface with our entire body, allowing us to better understand human health.”
    The study has been published in IEEE Transactions on Electron Devices.
    The University of Surrey is a world-leading centre for excellence in sustainability — where our multi-disciplinary research connects society and technology to equip humanity with the tools to tackle climate change, clean our air, reduce the impacts of pollution on health and help us live better, more sustainable lives. The University is committed to improving its own resource efficiency on its estate and being a sector leader, aiming to be carbon neutral by 2030. A focus on research that makes a difference to the world has contributed to Surrey being ranked 55th in the world in the Times Higher Education (THE) University Impact Rankings 2022, which assesses more than 1,400 universities’ performance against the United Nations’ Sustainable Development Goals (SDGs).

  • Physicists discover a new switch for superconductivity

    Under certain conditions — usually exceedingly cold ones — some materials shift their structure to unlock new, superconducting behavior. This structural shift is known as a “nematic transition,” and physicists suspect that it offers a new way to drive materials into a superconducting state where electrons can flow entirely friction-free.
    But what exactly drives this transition in the first place? The answer could help scientists improve existing superconductors and discover new ones.
    Now, MIT physicists have identified the key to how one class of superconductors undergoes a nematic transition, and it’s in surprising contrast to what many scientists had assumed.
    The physicists made their discovery studying iron selenide (FeSe), a two-dimensional material that is the highest-temperature iron-based superconductor. The material is known to switch to a superconducting state at temperatures as high as 70 kelvins (about -334 degrees Fahrenheit). Though still ultracold, this transition temperature is higher than that of most superconducting materials.
    The higher the temperature at which a material can exhibit superconductivity, the more promising it can be for use in the real world, such as for realizing powerful electromagnets for more precise and lightweight MRI machines or high-speed, magnetically levitating trains.
    For those and other possibilities, scientists will first need to understand what drives a nematic switch in high-temperature superconductors like iron selenide. In other iron-based superconducting materials, scientists have observed that this switch occurs when individual atoms suddenly shift their magnetic spin toward one coordinated, preferred magnetic direction.

    But the MIT team found that iron selenide shifts through an entirely new mechanism. Rather than undergoing a coordinated shift in spins, atoms in iron selenide undergo a collective shift in their orbital energy. It’s a fine distinction, but one that opens a new door to discovering unconventional superconductors.
    “Our study reshuffles things a bit when it comes to the consensus that was created about what drives nematicity,” says Riccardo Comin, the Class of 1947 Career Development Associate Professor of Physics at MIT. “There are many pathways to get to unconventional superconductivity. This offers an additional avenue to realize superconducting states.”
    Comin and his colleagues will publish their results in a study appearing in Nature Materials. Co-authors at MIT include Connor Occhialini, Shua Sanchez, and Qian Song, along with Gilberto Fabbris, Yongseong Choi, Jong-Woo Kim, and Philip Ryan at Argonne National Laboratory.
    Following the thread
    The word “nematicity” stems from the Greek word “nema,” meaning “thread” — for instance, to describe the thread-like body of the nematode worm. Nematicity is also used to describe conceptual threads, such as coordinated physical phenomena. For example, in the study of liquid crystals, nematic behavior can be observed when molecules assemble in coordinated lines.

    In recent years, physicists have used nematicity to describe a coordinated shift that drives a material into a superconducting state. Strong interactions between electrons cause the material as a whole to stretch infinitesimally, like microscopic taffy, in one particular direction that allows electrons to flow freely in that direction. The big question has been what kind of interaction causes the stretching. In some iron-based materials, this stretching seems to be driven by atoms that spontaneously shift their magnetic spins to point in the same direction. Scientists have therefore assumed that most iron-based superconductors make the same, spin-driven transition.
    But iron selenide seems to buck this trend. The material, which happens to transition into a superconducting state at the highest temperature of any iron-based material, also seems to lack any coordinated magnetic behavior.
    “Iron selenide has the least clear story of all these materials,” says Sanchez, who is an MIT postdoc and NSF MPS-Ascend Fellow. “In this case, there’s no magnetic order. So, understanding the origin of nematicity requires looking very carefully at how the electrons arrange themselves around the iron atoms, and what happens as those atoms stretch apart.”
    A super continuum
    In their new study, the researchers worked with ultrathin, millimeter-long samples of iron selenide, which they glued to a thin strip of titanium. They mimicked the structural stretching that occurs during a nematic transition by physically stretching the titanium strip, which in turn stretched the iron selenide samples. As they stretched the samples by a fraction of a micron at a time, they looked for any properties that shifted in a coordinated fashion.
    Using ultrabright X-rays, the team tracked how the atoms in each sample were moving, as well as how each atom’s electrons were behaving. After a certain point, they observed a definite, coordinated shift in the atoms’ orbitals. Atomic orbitals are essentially energy levels that an atom’s electrons can occupy. In iron selenide, electrons can occupy one of two orbital states around an iron atom. Normally, the choice of which state to occupy is random. But the team found that as they stretched the iron selenide, its electrons began to overwhelmingly prefer one orbital state over the other. This signaled a clear, coordinated shift, and with it a new mechanism of nematicity and superconductivity.
    “What we’ve shown is that there are different underlying physics when it comes to spin versus orbital nematicity, and there’s going to be a continuum of materials that go between the two,” says Occhialini, an MIT graduate student. “Understanding where you are on that landscape will be important in looking for new superconductors.”
    This research was supported by the Department of Energy, the Air Force Office of Scientific Research, and the National Science Foundation.

  • New microcomb device advances photonic technology

    A new tool for generating microwave signals could help propel advances in wireless communication, imaging, atomic clocks, and more.
    Frequency combs are photonic devices that produce many equally spaced laser lines, each locked to a specific frequency to produce a comb-like structure. They can be used to generate high-frequency, stable microwave signals, and scientists have been attempting to miniaturize the approach so that the combs can be used on microchips.
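    The comb structure follows a simple rule: line n sits at f_n = f_offset + n × f_rep, where f_rep is the line spacing. Photodetecting the comb then yields a microwave tone at f_rep, which is why a tunable comb makes a tunable microwave source. A small sketch with assumed, illustrative values:

    ```python
    # Comb lines are equally spaced: f_n = f_offset + n * f_rep.
    # Both values below are assumed for illustration, not taken from the paper.
    f_offset_hz = 80e6   # assumed offset frequency
    f_rep_hz = 10e9      # assumed line spacing (repetition rate)

    lines = [f_offset_hz + n * f_rep_hz for n in range(5)]
    print([f"{f / 1e9:.2f} GHz" for f in lines])
    # Photodetecting neighbouring lines produces a microwave beat note at f_rep.
    ```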
    Scientists have been limited in their ability to tune these microcombs quickly enough to make them effective. But a team of researchers led by the University of Rochester’s Qiang Lin, professor of electrical and computer engineering and optics, outlined a new high-speed tunable microcomb in Nature Communications.
    “One of the hottest areas of research in nonlinear integrated photonics is trying to produce this kind of a frequency comb on a chip-scale device,” says Lin. “We are excited to have developed the first microcomb device to produce a highly tunable microwave source.”
    The device is a lithium niobate resonator that allows users to manipulate the bandwidth and frequency modulation rates several orders of magnitude faster than existing microcombs.
    “The device provides a new approach to electro-optic processing of coherent microwaves and opens up a great avenue towards high-speed control of soliton comb lines that is crucial for many applications including frequency metrology, frequency synthesis, RADAR/LiDAR, sensing, and communication,” says Yang He ’20 (PhD), who was an electrical and computer engineering postdoctoral scholar in Lin’s lab and is the first author on the paper.
    Other coauthors from Lin’s group include Raymond Lopez-Rios, Usman A. Javid, Jingwei Ling, Mingxiao Li, and Shixin Xue.
    The project was a collaboration between faculty and students at Rochester’s Department of Electrical and Computer Engineering and Institute of Optics as well as the California Institute of Technology. The work was supported in part by the Defense Threat Reduction Agency, the Defense Advanced Research Projects Agency, and the National Science Foundation.

  • Now, every biologist can use machine learning

    The amount of data generated by scientists today is massive, thanks to the falling costs of sequencing technology and the increasing amount of available computing power. But parsing through all that data to uncover useful information is like searching for a molecular needle in a haystack. Machine learning (ML) and other artificial intelligence (AI) tools can dramatically speed up the process of data analysis, but most ML tools are difficult for non-ML experts to access and use. Recently, automated machine learning (AutoML) methods have been developed that can automate the design and deployment of ML tools, but they are often very complex and require a facility with ML that few scientists outside of the AI field have.
    A group of scientists at the Wyss Institute for Biologically Inspired Engineering at Harvard University and MIT has now filled that unmet need by building a new, comprehensive AutoML platform designed for biologists with little to no ML experience. Their platform, called BioAutoMATED, can use sequences of nucleic acids, peptides, or glycans as input data, and its performance is comparable to other AutoML platforms while requiring minimal user input. The platform is described in a new paper published in Cell Systems and is available to download from GitHub.
    “Our tool is for folks who don’t have the ability to build their own custom ML models, who find themselves asking questions like, ‘I have this cool data set, will ML even work for it? How do I get it into an ML model? The complexity of ML is what’s stopping me from going further with this data set, so how do I overcome that?’,” said co-first author Jackie Valeri, a graduate student in the lab of Wyss Core Faculty member Jim Collins, Ph.D. “We wanted to make it easy for biologists and experts in other domains to use the power of ML and AutoML to answer fundamental questions and help uncover biology that means something.”
    AutoML for all
    Like many great ideas, the seed that would become BioAutoMATED was planted not in the lab, but over lunch. Valeri and co-first authors Luis Soenksen, Ph.D. and Katie Collins were eating together at one of the Wyss Institute’s dining tables when they realized that despite the Institute’s reputation as a world-class destination for biological research, only a handful of the top experts working there were capable of building and training ML models that could greatly benefit their work.
    “We decided that we needed to do something about that, because we wanted the Wyss to be at the forefront of the AI biotech revolution, and we also wanted the development of these tools to be driven by biologists, for biologists,” said Soenksen, a Postdoctoral Fellow at the Wyss Institute who is also a serial entrepreneur in the science and technology space. “Now, everyone agrees that AI is the future, but four years ago when we got this idea, it wasn’t that obvious, particularly for biological research. So, it started as a tool that we wanted to build to serve ourselves and our Wyss colleagues, but now we know that it can serve much more.”
    While various AutoML systems have already been developed to simplify the process of generating ML models from datasets, they typically have drawbacks; among them, the fact that each AutoML tool is designed to look at only one type of model (e.g., neural networks) when searching for an optimal solution. This limits the resulting model to a narrow set of possibilities, when in reality, a different type of model altogether may be more optimal. Another issue is that most AutoML tools aren’t designed specifically to take biological sequences as their input data. Some tools have been developed that use language models for analyzing biological sequences, but these lack automation features and are difficult to use.

    To build a robust all-in-one AutoML for biology, the team modified three existing AutoML tools that each use a different approach for generating models: AutoKeras, which searches for optimal neural networks; DeepSwarm, which uses swarm-based algorithms to search for convolutional neural networks; and TPOT, which searches non-neural networks using a variety of methods including genetic programming and self-learning. BioAutoMATED then produces standardized output results for all three tools, so that the user can easily compare them and determine which type produces the most useful insights from their data.
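    Conceptually, the orchestration pattern looks like the sketch below: run each backend’s search, then put the results into one standard format so they can be compared. The function names and scores are stand-ins for illustration; see the BioAutoMATED GitHub repository for the tool’s actual interface.

    ```python
    # Illustrative sketch of a three-backend AutoML comparison. The search
    # functions and the dummy scores stand in for AutoKeras, DeepSwarm, and
    # TPOT; they are not BioAutoMATED's real API.
    def search_autokeras(X, y):   # stand-in for a neural-network search
        return {"model": "best_nn", "accuracy": 0.91}

    def search_deepswarm(X, y):   # stand-in for a swarm-based CNN search
        return {"model": "best_cnn", "accuracy": 0.88}

    def search_tpot(X, y):        # stand-in for a non-neural pipeline search
        return {"model": "best_pipeline", "accuracy": 0.90}

    def compare_backends(X, y):
        """Run all three searches and report results in one standard format."""
        backends = {"AutoKeras": search_autokeras,
                    "DeepSwarm": search_deepswarm,
                    "TPOT": search_tpot}
        results = {name: fn(X, y) for name, fn in backends.items()}
        for name, res in sorted(results.items(), key=lambda kv: -kv[1]["accuracy"]):
            print(f"{name:10} {res['model']:14} accuracy={res['accuracy']:.2f}")
        return results
    ```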
    The team built BioAutoMATED to be able to take as inputs DNA, RNA, amino acid, and glycan (sugar molecules found on the surfaces of cells) sequences of any length, type, or biological function. BioAutoMATED automatically pre-processes the input data, then generates models that can predict biological functions from the sequence information alone.
    The platform also has a number of features that help users determine whether they need to gather additional data to improve the quality of the output, learn which features of a sequence the models “paid attention” to most (and thus may be of more biological interest), and design new sequences for future experiments.
    Nucleotides and peptides and glycans, oh my!
    To test-drive their new framework, the team first used it to explore how changing the sequence of a stretch of RNA called the ribosome binding site (RBS) affected the efficiency with which a ribosome could bind to the RNA and translate it into protein in E. coli bacteria. They fed their sequence data into BioAutoMATED, which identified a model generated by the DeepSwarm algorithm that could accurately predict translation efficiency. This model performed as well as models created by a professional ML expert, but was generated in just 26.5 minutes and only required ten lines of input code from the user (other models can require more than 750). They also used BioAutoMATED to identify which areas of the sequence seemed to be the most important in determining translation efficiency, and to design new sequences that could be tested experimentally.

    They then moved on to trials of feeding peptide and glycan sequence data into BioAutoMATED and using the results to answer specific questions about those sequences. The system generated highly accurate information about which amino acids in a peptide sequence are most important in determining an antibody’s ability to bind to the drug ranibizumab (Lucentis), and also classified different types of glycans into immunogenic and non-immunogenic groups based on their sequences. The team also used it to optimize the sequences of RNA-based toehold switches, informing the design of new toehold switches for experimental testing with minimal input coding from the user.
    “Ultimately, we were able to show that BioAutoMATED helps people 1) recognize patterns in biological data, 2) ask better questions about that data, and 3) answer those questions quickly, all within a single framework — without having to become an ML expert themselves,” said Katie Collins, who is currently a graduate student at the University of Cambridge and worked on the project while an undergraduate at MIT.
    Any models predicted with the help of BioAutoMATED, as with any other ML tool, need to be experimentally validated in the lab whenever possible. But the team is hopeful that it could be further integrated into the ever-growing set of AutoML tools, one day extending its function beyond biological sequences to any sequence-like object, such as fingerprints.
    “Machine learning and artificial intelligence tools have been around for a while now, but it’s only with the recent development of user-friendly interfaces that they’ve exploded in popularity, as in the case of ChatGPT,” said Jim Collins, who is also the Termeer Professor of Medical Engineering & Science at MIT. “We hope that BioAutoMATED can enable the next generation of biologists to faster and more easily discover the underpinnings of life.”
    “Enabling non-experts to use these platforms is critical for being able to harness ML techniques’ full potential to solve long-standing problems in biology, and beyond. This advance by the Collins team is a major step forward for making AI a key collaborator for biologists and bioengineers,” said Wyss Founding Director Don Ingber, M.D., Ph.D., who is also the Judah Folkman Professor of Vascular Biology at Harvard Medical School and Boston Children’s Hospital, and the Hansjörg Wyss Professor of Bioinspired Engineering at the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS).
    Additional authors of the paper include George Cai from the Wyss Institute and Harvard Medical School; former Wyss Institute members Pradeep Ramesh, Rani Powers, Nicolaas Angenent-Mari, and Diogo Camacho; and Felix Wong and Timothy Lu from MIT.
    This research was supported by the Defense Threat Reduction Agency (grant HDTRA-12210032), the DARPA SD2 program, the Paul G. Allen Frontiers Group, the Wyss Institute for Biologically Inspired Engineering, an MIT-Takeda Fellowship, CONACyT grant 342369/408970, and an MIT-TATA Center fellowship (2748460).