Cambridge scientists have set out principles for making computational science more environmentally sustainable. The field powers discoveries ranging from unveiling the mysteries of the universe to developing treatments for cancer and improving our understanding of the human genome, but it can have a substantial carbon footprint.
Writing in Nature Computational Science, researchers from the Department of Public Health and Primary Care at the University of Cambridge argue that the scientific community needs to act now if it is to prevent a potentially uncontrolled rise in the carbon footprint of computational science as data science and algorithms increase in usage.
Dr Loïc Lannelongue, who is a research associate in biomedical data science and a postdoctoral associate at Jesus College, Cambridge, said: “Science has transformed our understanding of the world around us and has led to great benefits to society. But this has come with a not-insignificant — and not always well understood — impact on the environment. As scientists — as with people working in every sector — it’s important that we do what we can to reduce the carbon footprint of our work to ensure that the benefits of our discoveries are not outweighed by their environmental costs.”
Recent studies have begun to explore the environmental impacts of scientific research, with an initial focus on scientific conferences and experimental laboratories. For example, the 2019 Fall Meeting of the American Geophysical Union was estimated to emit 80,000 tonnes of CO2e* (tCO2e), equivalent to the average weekly emissions of the city of Edinburgh, UK. The annual carbon footprint of a typical life science laboratory has been estimated to be around 20 tCO2e.
But there is one aspect of research that often gets overlooked — and which can have a substantial environmental impact: high performance and cloud computing.
In 2020, the Information and Communication Technologies sector was estimated to have made up between 1.8% and 2.8% of global greenhouse gas emissions — more than aviation (1.9%). In addition to the environmental effects of electricity usage, manufacturing and disposal of hardware, there are also concerns around data centres’ water usage and land footprint.
Professor Michael Inouye said: “While the environmental impact of experimental ‘wet’ labs is more immediately obvious, the impact of algorithms is less clear and often underestimated. While new hardware, lower-energy data centres and more efficient high performance computing systems can help reduce their impact, the increasing ubiquity of artificial intelligence and data science more generally means their carbon footprint could grow exponentially in coming years if we don’t act now.”
To help address this issue, the team has developed GREENER (Governance, Responsibility, Estimation, Energy and embodied impacts, New collaborations, Education and Research), a set of principles to allow the computational science community to lead the way in sustainable research practices, maximising computational science’s benefit to both humanity and the environment.
Governance and Responsibility — Everyone involved in computational science has a role to play in making the field more sustainable: individual and institutional responsibility is a necessary step to ensure transparency and the reduction of greenhouse gas emissions.
For example, institutions themselves can be key to managing and expanding centralised data infrastructures, and to ensuring that procurement decisions take into account both the manufacturing and operational footprint of hardware purchases. IT teams in high performance computing (HPC) centres can play a key role, both in training scientists and in helping them monitor the carbon footprint of their work. Principal Investigators can encourage their teams to think about this issue and give them access to suitable training. Funding bodies can influence researchers by requiring estimates of carbon footprints to be included in funding applications.
Estimate and report the energy consumption of algorithms — Estimating and monitoring the carbon footprint of computations identifies inefficiencies and opportunities for improvement.
User-level metrics are crucial to understanding environmental impacts and promoting personal responsibility. The financial cost of running computations is often negligible, particularly in academia, and scientists may have the impression of unlimited and inconsequential computing capacity. Quantifying the carbon footprint of individual projects helps raise awareness of the true costs of research.
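As a rough illustration of what a user-level estimate could look like, the sketch below multiplies the energy a job draws by the carbon intensity of the electricity it uses. The function name, its parameters and every number in the usage example (power draw, facility overhead factor, carbon intensity) are illustrative assumptions rather than values or methods taken from the paper.

```python
def estimate_footprint_gco2e(runtime_hours, power_draw_watts,
                             carbon_intensity_g_per_kwh, overhead_factor=1.0):
    """Rough carbon footprint of one computation, in grams of CO2e.

    Hypothetical helper (not from the paper):
      energy (kWh) = runtime x average power x facility overhead
      footprint    = energy x carbon intensity of the electricity
    """
    if min(runtime_hours, power_draw_watts, carbon_intensity_g_per_kwh) < 0:
        raise ValueError("runtime, power and carbon intensity must be non-negative")
    if overhead_factor < 1.0:
        raise ValueError("overhead_factor must be at least 1.0")

    energy_kwh = runtime_hours * (power_draw_watts / 1000.0) * overhead_factor
    return energy_kwh * carbon_intensity_g_per_kwh


# Assumed example: a 10-hour job drawing 200 W on average, in a facility
# with a 1.5x overhead (cooling etc.), on a 250 gCO2e/kWh electricity grid.
footprint = estimate_footprint_gco2e(10, 200, 250, overhead_factor=1.5)
print(f"Estimated footprint: {footprint:.0f} gCO2e")
```

A real estimate would need measured power draw and a location-specific carbon intensity; the point here is only that the calculation itself is simple enough to run alongside every project.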
Tackling Energy and embodied impacts through New collaborations — Minimising carbon intensity — that is, the carbon footprint of producing electricity — is one of the most immediately impactful ways to reduce greenhouse gas emissions. This could involve relocating computations to low-carbon settings and countries, but this needs to be done with equity in mind. Carbon intensities can differ by as much as three orders of magnitude between the top and bottom performing high-income countries (from 0.10 gCO2e/kWh in Iceland to 770 gCO2e/kWh in Australia).
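To make the effect of location concrete, the following sketch applies the two carbon intensities quoted above to the same hypothetical workload. The 100 kWh job size and the helper name are assumptions chosen for illustration; only the two intensity figures come from the text.

```python
import math

# Carbon intensities quoted in the article, in gCO2e per kWh.
CARBON_INTENSITY = {"Iceland": 0.10, "Australia": 770.0}

# Assumed workload, for illustration only: a job that consumes 100 kWh.
ENERGY_KWH = 100.0


def footprint_kg(energy_kwh, intensity_g_per_kwh):
    """Footprint in kg CO2e for a given energy use and grid carbon intensity."""
    return energy_kwh * intensity_g_per_kwh / 1000.0


footprints = {
    country: footprint_kg(ENERGY_KWH, intensity)
    for country, intensity in CARBON_INTENSITY.items()
}

# Ratio between the two grids, and the same ratio in orders of magnitude.
ratio = CARBON_INTENSITY["Australia"] / CARBON_INTENSITY["Iceland"]
orders_of_magnitude = math.log10(ratio)

for country, kg in footprints.items():
    print(f"{country}: {kg:.2f} kg CO2e for {ENERGY_KWH:.0f} kWh")
print(f"Ratio: about {ratio:.0f}x")
```

With these figures the same job emits roughly 0.01 kg CO2e on the Icelandic grid and 77 kg CO2e on the Australian one, a ratio of about 7,700.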
The footprint of user devices is also a factor: one estimate found that almost three-quarters (72%) of the energy footprint of streaming a video to a laptop is from the laptop, with 23% used in transmission and a mere 5% at the data centre.
Another key consideration is data storage. The carbon footprint of storing data depends on numerous factors, but the life cycle footprint of storing one terabyte of data for a year is of the order of 10 kg CO2e. This issue is exacerbated by the duplication of such datasets in order for each institution, and sometimes each research group, to have a copy. Large (hyperscale) data centres are expected to be more energy efficient, but they may also encourage unnecessary increases in the scale of computing (the ‘rebound effect’).
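The cost of duplication can be sketched with the order-of-magnitude figure above. The dataset size, retention period and number of copies below are hypothetical, and the 10 kg CO2e per terabyte-year figure is treated as a constant even though the text notes that it depends on numerous factors.

```python
# Order-of-magnitude figure from the article: about 10 kg CO2e per terabyte-year.
KG_CO2E_PER_TB_YEAR = 10.0


def storage_footprint_kg(dataset_tb, copies=1, years=1.0,
                         kg_per_tb_year=KG_CO2E_PER_TB_YEAR):
    """Approximate life cycle footprint (kg CO2e) of storing a dataset.

    Hypothetical helper: scales linearly with size, duration and number of copies.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return dataset_tb * copies * years * kg_per_tb_year


# Assumed scenario: a 50 TB dataset kept for 5 years,
# either once centrally or duplicated across 8 research groups.
single_copy = storage_footprint_kg(50, copies=1, years=5)
duplicated = storage_footprint_kg(50, copies=8, years=5)
avoidable = duplicated - single_copy

print(f"Single shared copy: {single_copy:,.0f} kg CO2e")
print(f"Eight copies:       {duplicated:,.0f} kg CO2e")
print(f"Avoidable:          {avoidable:,.0f} kg CO2e")
```

Under these assumptions, sharing one copy instead of keeping eight would avoid about 17.5 tonnes of CO2e over five years.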
Education and Research — Education is essential to raise awareness of the issues with different stakeholders. Integrating sustainability into computational training courses is a tangible first step toward reducing carbon footprints. Investing in research that will catalyse innovation in the field of environmentally sustainable computational science is a crucial role for funders and institutions to play.
Recent studies found that the most widely used programming languages in research, such as R and Python, tend to be the least energy-efficient ones, highlighting the importance of having trained Research Software Engineers within research groups to ensure that the algorithms used are efficiently implemented. There is also scope to use current tools more efficiently by better understanding and monitoring how coding choices impact carbon footprints.
Dr Lannelongue said: “Computational scientists have a real opportunity to lead the way in sustainability, but this is going to involve a change in our culture and the ways we work. There will need to be more transparency, more awareness, better training and resources, and improved policies.
“Cooperation, open science, and equitable access to low-carbon computing facilities will also be crucial. We need to make sure that sustainable solutions work for everyone, as they frequently have the least benefit for populations, often in low- and middle-income countries, who suffer the most from climate change.”
Professor Inouye added: “Everyone in the field — from funders to journals to institutions down to individuals — plays an important role and can, themselves, make a positive impact. We have an immense opportunity to make a change, but the clock is ticking.”
The research was a collaboration with major stakeholders including Health Data Research UK, EMBL-EBI, Wellcome and UK Research and Innovation (UKRI).
*CO2e, or CO2-equivalent, summarises the global warming impacts of a range of greenhouse gases and is the standard metric for carbon footprints, although its accuracy is sometimes debated.