More stories

  • in

    Google’s deepfake hunter sees what you can’t—even in videos without faces

    In an era where manipulated videos can spread disinformation, bully people, and incite harm, UC Riverside researchers have created a powerful new system to expose these fakes.
    Amit Roy-Chowdhury, a professor of electrical and computer engineering, and doctoral candidate Rohit Kundu, both from UCR’s Marlan and Rosemary Bourns College of Engineering, teamed up with Google scientists to develop an artificial intelligence model that detects video tampering — even when manipulations go far beyond face swaps and altered speech. (Roy-Chowdhury is also the co-director of the UC Riverside Artificial Intelligence Research and Education (RAISE) Institute, a new interdisciplinary research center at UCR.)
    Their new system, called the Universal Network for Identifying Tampered and synthEtic videos (UNITE), detects forgeries by examining not just faces but full video frames, including backgrounds and motion patterns. This analysis makes it one of the first tools capable of identifying synthetic or doctored videos that do not rely on facial content.
    “Deepfakes have evolved,” Kundu said. “They’re not just about face swaps anymore. People are now creating entirely fake videos — from faces to backgrounds — using powerful generative models. Our system is built to catch all of that.”
    UNITE’s development comes as text-to-video and image-to-video generation have become widely available online. These AI platforms enable virtually anyone to fabricate highly convincing videos, posing serious risks to individuals, institutions, and democracy itself.
    “It’s scary how accessible these tools have become,” Kundu said. “Anyone with moderate skills can bypass safety filters and generate realistic videos of public figures saying things they never said.”
    Kundu explained that earlier deepfake detectors focused almost entirely on face cues.

    “If there’s no face in the frame, many detectors simply don’t work,” he said. “But disinformation can come in many forms. Altering a scene’s background can distort the truth just as easily.”
    To address this, UNITE uses a transformer-based deep learning model to analyze video clips. It detects subtle spatial and temporal inconsistencies — cues often missed by previous systems. The model draws on a foundational AI framework known as SigLIP, which extracts features not bound to a specific person or object. A novel training method, dubbed “attention-diversity loss,” prompts the system to monitor multiple visual regions in each frame, preventing it from focusing solely on faces.
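    The idea behind an attention-diversity term can be illustrated with a toy calculation. The sketch below is not the published UNITE loss; it assumes a hypothetical penalty equal to the mean pairwise cosine similarity between the attention maps of different heads, so that heads fixating on the same region score high and heads covering different regions score low. The function names and the example maps are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length attention maps."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def diversity_penalty(head_maps):
    """Mean pairwise cosine similarity across attention heads.

    head_maps: one list of attention weights over image patches per head.
    Hypothetical formulation for illustration, not the UNITE training loss.
    """
    n = len(head_maps)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(cosine(head_maps[i], head_maps[j]) for i, j in pairs) / len(pairs)


# Two heads locked onto the same patch (say, a face) ...
same_region = [[0.9, 0.1, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0]]
# ... versus two heads attending to different patches of the frame.
spread_out = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

print(diversity_penalty(same_region), diversity_penalty(spread_out))
```

    Adding such a penalty to the training objective would push the heads apart, which is the behavior the researchers describe: attention spread over multiple regions rather than concentrated on faces.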
    The result is a universal detector capable of flagging a range of forgeries — from simple facial swaps to complex, fully synthetic videos generated without any real footage.
    “It’s one model that handles all these scenarios,” Kundu said. “That’s what makes it universal.”
    The researchers presented their findings at the high-ranking 2025 Conference on Computer Vision and Pattern Recognition (CVPR) in Nashville, Tenn. Titled “Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content,” their paper, led by Kundu, outlines UNITE’s architecture and training methodology. Co-authors include Google researchers Hao Xiong, Vishal Mohanty, and Athula Balachandra. Co-sponsored by the IEEE Computer Society and the Computer Vision Foundation, CVPR is among the highest-impact scientific publication venues in the world.
    The collaboration with Google, where Kundu interned, provided access to expansive datasets and computing resources needed to train the model on a broad range of synthetic content, including videos generated from text or still images — formats that often stump existing detectors.
    Though still in development, UNITE could soon play a vital role in defending against video disinformation. Potential users include social media platforms, fact-checkers, and newsrooms working to prevent manipulated videos from going viral.
    “People deserve to know whether what they’re seeing is real,” Kundu said. “And as AI gets better at faking reality, we have to get better at revealing the truth.”

  • in

    One small qubit, one giant leap for quantum computing

    On July 8, 2025, physicists from Aalto University in Finland published a transmon qubit coherence measurement that dramatically surpasses previously published records. The millisecond coherence measurement marks a quantum leap in computational technology; the previous maximum echo coherence measurements approached 0.6 milliseconds.
    Longer qubit coherence allows for an extended window of time in which quantum computers can execute error-free operations, enabling more complex quantum computations and more quantum logic operations before errors occur. Not only does this allow for more calculations with noisy quantum computers, but it also decreases the resources needed for quantum error correction, which is a path to noiseless quantum computing.
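    A rough calculation shows why coherence time translates into more operations. The sketch below assumes that coherence decays as a simple exponential, exp(-t / T2), which is a simplification, and it uses a hypothetical gate duration of 50 nanoseconds and a hypothetical 99% coherence floor; neither figure comes from the study. Only the 0.5 and 1 millisecond coherence times are taken from the reported measurements.

```python
import math


def coherence_remaining(elapsed_s, t2_s):
    """Fraction of coherence left after elapsed_s, assuming exp(-t / T2) decay."""
    return math.exp(-elapsed_s / t2_s)


def gates_within_budget(t2_s, gate_s, min_coherence):
    """Number of back-to-back gates that fit before coherence drops below min_coherence."""
    # exp(-n * gate / T2) >= min_coherence  <=>  n <= -T2 * ln(min_coherence) / gate
    return math.floor(-t2_s * math.log(min_coherence) / gate_s)


GATE_S = 50e-9   # hypothetical gate duration (assumption, not from the article)
FLOOR = 0.99     # hypothetical minimum acceptable coherence (assumption)

for t2 in (0.5e-3, 1.0e-3):  # reported median and maximum echo coherence times
    print(f"T2 = {t2 * 1e3:.1f} ms -> {gates_within_budget(t2, GATE_S, FLOOR)} gates")
```

    Under these assumptions the gate budget scales linearly with the coherence time, so doubling the coherence time roughly doubles the number of operations available before errors dominate.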
    “We have just measured an echo coherence time for a transmon qubit that landed at a millisecond at maximum with a median of half a millisecond,” says Mikko Tuokkola, the PhD student who conducted and analyzed the measurements. The median reading is particularly significant, as it also surpasses current recorded readings.
    The findings have just been published in the prestigious peer-reviewed journal Nature Communications.
    The researchers report their approach as thoroughly as possible, with the aim of making it reproducible for research groups around the world.
    Finland cements position at forefront of quantum
    Tuokkola was supervised at Aalto University by postdoctoral researcher Dr. Yoshiki Sunada, who fabricated the chip and built the measurement setup.

    “We have been able to reproducibly fabricate high-quality transmon qubits. The fact that this can be achieved in a cleanroom which is accessible for academic research is a testament to Finland’s leading position in quantum science and technology,” adds Sunada, who is currently working at Stanford University, USA.
    The work is a result of the Quantum Computing and Devices (QCD) research group which is a part of Aalto University’s Department of Applied Physics, Academy of Finland Centre of Excellence in Quantum Technology (QTF), and the Finnish Quantum Flagship (FQF).
    The qubit was fabricated by the QCD group at Aalto using high-quality superconducting film supplied by the Technical Research Centre of Finland (VTT). The success reflects the high quality of Micronova cleanrooms at OtaNano, Finland’s national research infrastructure for micro-, nano-, and quantum technologies.
    “This landmark achievement has strengthened Finland’s standing as a global leader in the field, moving the needle forward on what can be made possible with the quantum computers of the future,” says Professor of Quantum Technology Mikko Möttönen, who heads the QCD group.
    Scaling up the quantum computers of the future requires advancements across several domains. Among them are noise reduction, qubit-count increases, and the qubit coherence time improvements at the center of the new observations from the QCD group. The group has just opened one senior staff position and two postdoctoral positions to help achieve future breakthroughs faster.

  • in

    A simple twist fooled AI—and revealed a dangerous flaw in medical ethics

    A study by investigators at the Icahn School of Medicine at Mount Sinai, in collaboration with colleagues from Rabin Medical Center in Israel and other collaborators, suggests that even the most advanced artificial intelligence (AI) models can make surprisingly simple mistakes when faced with complex medical ethics scenarios.
    The findings, which raise important questions about how and when to rely on large language models (LLMs), such as ChatGPT, in health care settings, were reported in the July 22 online issue of NPJ Digital Medicine (DOI: 10.1038/s41746-025-01792-y).
    The research team was inspired by Daniel Kahneman’s book “Thinking, Fast and Slow,” which contrasts fast, intuitive reactions with slower, analytical reasoning. It has been observed that large language models (LLMs) falter when classic lateral-thinking puzzles receive subtle tweaks. Building on this insight, the study tested how well AI systems shift between these two modes when confronted with well-known ethical dilemmas that had been deliberately tweaked.
    “AI can be very powerful and efficient, but our study showed that it may default to the most familiar or intuitive answer, even when that response overlooks critical details,” says co-senior author Eyal Klang, MD, Chief of Generative AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai. “In everyday situations, that kind of thinking might go unnoticed. But in health care, where decisions often carry serious ethical and clinical implications, missing those nuances can have real consequences for patients.”
    To explore this tendency, the research team tested several commercially available LLMs using a combination of creative lateral thinking puzzles and slightly modified well-known medical ethics cases. In one example, they adapted the classic “Surgeon’s Dilemma,” a widely cited 1970s puzzle that highlights implicit gender bias. In the original version, a boy is injured in a car accident with his father and rushed to the hospital, where the surgeon exclaims, “I can’t operate on this boy — he’s my son!” The twist is that the surgeon is his mother, though many people don’t consider that possibility due to gender bias. In the researchers’ modified version, they explicitly stated that the boy’s father was the surgeon, removing the ambiguity. Even so, some AI models still responded that the surgeon must be the boy’s mother. The error reveals how LLMs can cling to familiar patterns, even when contradicted by new information.
    In another example to test whether LLMs rely on familiar patterns, the researchers drew from a classic ethical dilemma in which religious parents refuse a life-saving blood transfusion for their child. Even when the researchers altered the scenario to state that the parents had already consented, many models still recommended overriding a refusal that no longer existed.
    “Our findings don’t suggest that AI has no place in medical practice, but they do highlight the need for thoughtful human oversight, especially in situations that require ethical sensitivity, nuanced judgment, or emotional intelligence,” says co-senior corresponding author Girish N. Nadkarni, MD, MPH, Chair of the Windreich Department of Artificial Intelligence and Human Health, Director of the Hasso Plattner Institute for Digital Health, Irene and Dr. Arthur M. Fishberg Professor of Medicine at the Icahn School of Medicine at Mount Sinai, and Chief AI Officer of the Mount Sinai Health System. “Naturally, these tools can be incredibly helpful, but they’re not infallible. Physicians and patients alike should understand that AI is best used as a complement to enhance clinical expertise, not a substitute for it, particularly when navigating complex or high-stakes decisions. Ultimately, the goal is to build more reliable and ethically sound ways to integrate AI into patient care.”
    “Simple tweaks to familiar cases exposed blind spots that clinicians can’t afford,” says lead author Shelly Soffer, MD, a Fellow at the Institute of Hematology, Davidoff Cancer Center, Rabin Medical Center. “It underscores why human oversight must stay central when we deploy AI in patient care.”

    Next, the research team plans to expand their work by testing a wider range of clinical examples. They’re also developing an “AI assurance lab” to systematically evaluate how well different models handle real-world medical complexity.
    The paper is titled “Pitfalls of Large Language Models in Medical Ethics Reasoning.”
    The study’s authors, as listed in the journal, are Shelly Soffer, MD; Vera Sorin, MD; Girish N. Nadkarni, MD, MPH; and Eyal Klang, MD.
    About Mount Sinai’s Windreich Department of AI and Human Health
    Led by Girish N. Nadkarni, MD, MPH — an international authority on the safe, effective, and ethical use of AI in health care — Mount Sinai’s Windreich Department of AI and Human Health is the first of its kind at a U.S. medical school, pioneering transformative advancements at the intersection of artificial intelligence and human health.
    The Department is committed to leveraging AI in a responsible, effective, ethical, and safe manner to transform research, clinical care, education, and operations. By bringing together world-class AI expertise, cutting-edge infrastructure, and unparalleled computational power, the department is advancing breakthroughs in multi-scale, multimodal data integration while streamlining pathways for rapid testing and translation into practice.

    The Department benefits from dynamic collaborations across Mount Sinai, including with the Hasso Plattner Institute for Digital Health at Mount Sinai — a partnership between the Hasso Plattner Institute for Digital Engineering in Potsdam, Germany, and the Mount Sinai Health System — which complements its mission by advancing data-driven approaches to improve patient care and health outcomes.
    At the heart of this innovation is the renowned Icahn School of Medicine at Mount Sinai, which serves as a central hub for learning and collaboration. This unique integration enables dynamic partnerships across institutes, academic departments, hospitals, and outpatient centers, driving progress in disease prevention, improving treatments for complex illnesses, and elevating quality of life on a global scale.
    In 2024, the Department’s innovative NutriScan AI application, developed by the Mount Sinai Health System Clinical Data Science team in partnership with Department faculty, earned Mount Sinai Health System the prestigious Hearst Health Prize. NutriScan is designed to facilitate faster identification and treatment of malnutrition in hospitalized patients. This machine learning tool improves malnutrition diagnosis rates and resource utilization, demonstrating the impactful application of AI in health care.
    * Mount Sinai Health System member hospitals: The Mount Sinai Hospital; Mount Sinai Brooklyn; Mount Sinai Morningside; Mount Sinai Queens; Mount Sinai South Nassau; Mount Sinai West; and New York Eye and Ear Infirmary of Mount Sinai

  • in

    This tiny metal switches magnetism without magnets — and could power the future of electronics

    Research from the University of Minnesota Twin Cities gives new insight into a material that could make computer memory faster and more energy-efficient.
    The study was recently published in Advanced Materials, a peer-reviewed scientific journal. The researchers also have a patent on the technology.
    As technology continues to grow, so does the demand for emerging memory technology. Researchers are looking for alternatives and complements to existing memory solutions that can perform at high levels with low energy consumption to increase the functionality of everyday technology.
    In this new research, the team demonstrated a more efficient way to control magnetization in tiny electronic devices using a material called Ni₄W, a combination of nickel and tungsten. The team found that this low-symmetry material produces powerful spin-orbit torque (SOT), a key mechanism for manipulating magnetism in next-generation memory and logic technologies.
    “Ni₄W reduces power usage for writing data, potentially cutting energy use in electronics significantly,” said Jian-Ping Wang, a senior author on the paper and a Distinguished McKnight Professor and Robert F. Hartmann Chair in the Department of Electrical and Computer Engineering (ECE) at the University of Minnesota Twin Cities.
    This technology could help reduce the electricity consumption of devices like smartphones and data centers, making future electronics both smarter and more sustainable.
    “Unlike conventional materials, Ni₄W can generate spin currents in multiple directions, enabling ‘field-free’ switching of magnetic states without the need for external magnetic fields. We observed high SOT efficiency with multi-direction in Ni₄W both on its own and when layered with tungsten, pointing to its strong potential for use in low-power, high-speed spintronic devices,” said Yifei Yang, a fifth-year Ph.D. student in Wang’s group and a co-first author on the paper.

    Ni₄W is made from common metals and can be manufactured using standard industrial processes. Its low cost makes the material very attractive to industry partners, and it could soon be built into technology we use every day, such as smart watches and phones.
    “We are very excited to see that our calculations confirmed the choice of the material and the SOT experimental observation,” said Seungjun Lee, a postdoctoral fellow in ECE and the co-first author on the paper.
    The next steps are to grow these materials into a device that is even smaller than in their previous work.
    In addition to Wang, Yang and Lee, the ECE team included Paul Palmberg Professor Tony Low, another senior author on the paper, Yu-Chia Chen, Qi Jia, Brahmudutta Dixit, Duarte Sousa, Yihong Fan, Yu-Han Huang, Deyuan Lyu and Onri Jay Benally. This work was done with Michael Odlyzko, Javier Garcia-Barriocanal, Guichuan Yu and Greg Haugstad from the University of Minnesota Characterization Facility, along with Zach Cresswell and Shuang Liang from the Department of Chemical Engineering and Materials Science.
    This work was supported by SMART (Spintronic Materials for Advanced InforRmation Technologies), a world-leading research center that brings together experts from across the nation to develop technologies for spin-based computing and memory systems. SMART was one of the seven centers of nCORE, a Semiconductor Research Corporation program sponsored by the National Institute of Standards and Technology. This work is being supported by the Global Research Collaboration Logic and Memory program. This study was done in collaboration with the University of Minnesota Characterization Facility and the Minnesota Nano Center.

  • in

    This flat chip uses twisted light to reveal hidden images

    Imagine trying to wear a left-handed glove on your right hand: it doesn’t fit because left and right hands are mirror images that can’t be superimposed on each other. This ‘handedness’ is what scientists call chirality, and it plays a fundamental role in biology, chemistry, and materials science. Most DNA molecules and sugars are right-handed, while most amino acids are left-handed. Reversing a molecule’s handedness can render a nutrient useless or a drug inactive and even harmful.
    Light can also be left or right ‘handed’. When a light beam is circularly polarized, its electric field corkscrews through space in either a left-handed or right-handed spiral. Because chiral structures interact differently with these two types of twisted light beams, shining a circularly polarized light on a sample – and comparing how much of each twist is absorbed, reflected, or delayed – lets scientists read out the sample’s own handedness. However, this effect is extremely weak, which makes precise control of chirality an essential but challenging task.
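    The comparison described above is commonly summarized as a single number, the absorption dissymmetry factor g = 2(A_L - A_R) / (A_L + A_R), where A_L and A_R are the absorbances for left- and right-handed circularly polarized light. The short sketch below computes it; the function name and the sample readings are hypothetical and are not taken from the study.

```python
def dissymmetry_factor(absorb_left, absorb_right):
    """Absorption dissymmetry factor g = 2 * (A_L - A_R) / (A_L + A_R).

    g is 0 for a sample that treats both polarizations equally and
    reaches +2 or -2 when only one handedness is absorbed.
    """
    total = absorb_left + absorb_right
    if total == 0:
        raise ValueError("sample absorbs neither polarization")
    return 2.0 * (absorb_left - absorb_right) / total


# Hypothetical readings: a weakly chiral sample absorbs the two twists almost equally.
weak_signal = dissymmetry_factor(0.501, 0.499)
print(weak_signal)
```

    A value this close to zero illustrates why the effect is hard to measure without structures that amplify the difference between the two polarizations.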
    Now, scientists from the Bionanophotonic Systems Laboratory in EPFL’s School of Engineering have collaborated with those in Australia to create artificial optical structures called metasurfaces: 2D lattices composed of tiny elements (meta-atoms) that can easily tune their chiral properties. By varying the orientation of meta-atoms within a lattice, scientists can control the resulting metasurface’s interaction with polarized light.
    “Our ‘chiral design toolkit’ is elegantly simple, and yet more powerful than previous approaches, which tried to control light through very complex meta-atom geometries. Instead, we leverage the interplay between the shape of the meta-atom and the symmetry of the metasurface lattice,” explains Bionanophotonics Lab head Hatice Altug.
    The innovation, which has potential applications in data encryption, biosensing, and quantum technologies, has been published in Nature Communications.
    An invisible, dual layer watermark
    The team’s metasurface, made of germanium and calcium difluoride, presents a gradient of meta-atoms with orientations that vary continuously along a chip. The shape and angles of these meta-atoms, as well as the lattice symmetry, all work together to tune the response of the metasurface to polarized light.

    In a proof-of-concept experiment, the scientists encoded two different images simultaneously on a metasurface optimized for the invisible mid-infrared range of the electromagnetic spectrum. For the first image of an Australian cockatoo, the image data were encoded in the size of the meta-atoms – which represented pixels – and decoded with unpolarized light. The second image was encoded using the orientation of the meta-atoms so that, when exposed to circularly polarized light, the metasurface revealed a picture of the iconic Swiss Matterhorn.
    “This experiment showcased our technique’s ability to produce a dual layer ‘watermark’ invisible to the human eye, paving the way for advanced anticounterfeiting, camouflage and security applications,” says Bionanophotonic Systems Lab researcher Ivan Sinev.
    Beyond encryption, the team’s approach has potential applications for quantum technologies, many of which rely on polarized light to perform computations. The ability to map chiral responses across large surfaces could also streamline biosensing.
    “We can use chiral metastructures like ours to sense, for example, drug composition or purity from small-volume samples. Nature is chiral, and the ability to distinguish between left- and right-handed molecules is essential, as it could make the difference between a medicine and a toxin,” says Bionanophotonic Systems Lab researcher Felix Richter.

  • in

    This AI-powered lab runs itself—and discovers new materials 10x faster

    Researchers have demonstrated a new technique that allows “self-driving laboratories” to collect at least 10 times more data than previous techniques at record speed. The advance – which is published in Nature Chemical Engineering – dramatically expedites materials discovery research, while slashing costs and environmental impact.
    Self-driving laboratories are robotic platforms that combine machine learning and automation with chemical and materials sciences to discover materials more quickly. The automated process allows machine-learning algorithms to make use of data from each experiment when predicting which experiment to conduct next to achieve whatever goal was programmed into the system.
    “Imagine if scientists could discover breakthrough materials for clean energy, new electronics, or sustainable chemicals in days instead of years, using just a fraction of the materials and generating far less waste than the status quo,” says Milad Abolhasani, corresponding author of a paper on the work and ALCOA Professor of Chemical and Biomolecular Engineering at North Carolina State University. “This work brings that future one step closer.”
    Until now, self-driving labs utilizing continuous flow reactors have relied on steady-state flow experiments. In these experiments, different precursors are mixed together and chemical reactions take place, while continuously flowing in a microchannel. The resulting product is then characterized by a suite of sensors once the reaction is complete.
    “This established approach to self-driving labs has had a dramatic impact on materials discovery,” Abolhasani says. “It allows us to identify promising material candidates for specific applications in a few months or weeks, rather than years, while reducing both costs and the environmental impact of the work. However, there was still room for improvement.”
    Steady-state flow experiments require the self-driving lab to wait for the chemical reaction to take place before characterizing the resulting material. That means the system sits idle while the reactions take place, which can take up to an hour per experiment.
    “We’ve now created a self-driving lab that makes use of dynamic flow experiments, where chemical mixtures are continuously varied through the system and are monitored in real time,” Abolhasani says. “In other words, rather than running separate samples through the system and testing them one at a time after reaching steady-state, we’ve created a system that essentially never stops running. The sample is moving continuously through the system and, because the system never stops characterizing the sample, we can capture data on what is taking place in the sample every half second.

    “For example, instead of having one data point about what the experiment produces after 10 seconds of reaction time, we have 20 data points – one after 0.5 seconds of reaction time, one after 1 second of reaction time, and so on. It’s like switching from a single snapshot to a full movie of the reaction as it happens. Instead of waiting around for each experiment to finish, our system is always running, always learning.”
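    The arithmetic in that example is easy to check. The sketch below lists the reaction times at which a continuously monitored sample would be characterized, using the 10-second reaction time and half-second interval from the quote; the function name is illustrative and not part of the researchers' software.

```python
def sample_times(reaction_time_s, interval_s):
    """Reaction times (in seconds) at which a continuously monitored sample is read."""
    count = round(reaction_time_s / interval_s)
    return [interval_s * (i + 1) for i in range(count)]


# Dynamic flow: one reading every 0.5 s over a 10 s reaction.
times = sample_times(10.0, 0.5)
# Steady state: a single reading once the reaction has finished.
steady_state = [10.0]

print(len(times), len(steady_state))
```

    One steady-state reading becomes 20 readings over the same reaction, which is the kind of data gain the researchers report on a larger scale.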
    Collecting this much additional data has a big impact on the performance of the self-driving lab.
    “The most important part of any self-driving lab is the machine-learning algorithm the system uses to predict which experiment it should conduct next,” Abolhasani says. “This streaming-data approach allows the self-driving lab’s machine-learning brain to make smarter, faster decisions, honing in on optimal materials and processes in a fraction of the time. That’s because the more high-quality experimental data the algorithm receives, the more accurate its predictions become, and the faster it can solve a problem. This has the added benefit of reducing the amount of chemicals needed to arrive at a solution.”
    In this work, the researchers found the self-driving lab that incorporated a dynamic flow system generated at least 10 times more data than self-driving labs that used steady-state flow experiments over the same period of time, and was able to identify the best material candidates on the very first try after training.
    “This breakthrough isn’t just about speed,” Abolhasani says. “By reducing the number of experiments needed, the system dramatically cuts down on chemical use and waste, advancing more sustainable research practices.
    “The future of materials discovery is not just about how fast we can go, it’s also about how responsibly we get there,” Abolhasani says. “Our approach means fewer chemicals, less waste, and faster solutions for society’s toughest challenges.”
    The paper, “Flow-Driven Data Intensification to Accelerate Autonomous Materials Discovery,” will be published July 14 in the journal Nature Chemical Engineering. Co-lead authors of the paper are Fernando Delgado-Licona, a Ph.D. student at NC State; Abdulrahman Alsaiari, a master’s student at NC State; and Hannah Dickerson, a former undergraduate at NC State. The paper was co-authored by Philip Klem, an undergraduate at NC State; Arup Ghorai, a former postdoctoral researcher at NC State; Richard Canty and Jeffrey Bennett, current postdoctoral researchers at NC State; Pragyan Jha, Nikolai Mukhin, Junbin Li and Sina Sadeghi, Ph.D. students at NC State; Fazel Bateni, a former Ph.D. student at NC State; and Enrique A. López-Guajardo of Tecnologico de Monterrey.
    This work was done with support from the National Science Foundation under grants 1940959, 2315996 and 2420490; and from the University of North Carolina Research Opportunities Initiative program.

  • in

    This magnetic breakthrough could make AI 10x more efficient

    The rapid rise in AI applications has placed increasingly heavy demands on our energy infrastructure. All the more reason to find energy-saving solutions for AI hardware. One promising idea is the use of so-called spin waves to process information. A team from the Universities of Münster and Heidelberg (Germany) led by physicist Prof. Rudolf Bratschitsch (Münster) has now developed a new way to produce waveguides in which the spin waves can propagate particularly far. They have thus created the largest spin waveguide network to date. Furthermore, the group succeeded in specifically controlling the properties of the spin wave transmitted in the waveguide. For example, they were able to precisely alter the wavelength and reflection of the spin wave at a certain interface. The study was published in the scientific journal Nature Materials.
    The electron spin is a quantum mechanical quantity that is also described as the intrinsic angular momentum. The alignment of many spins in a material determines its magnetic properties. If an alternating current is applied to a magnetic material with an antenna, thereby generating a changing magnetic field, the spins in the material can generate a spin wave.
    Spin waves have already been used to create individual components, such as logic gates that process binary input signals into binary output signals, or multiplexers that select one of various input signals. Up until now, however, the components were not connected to form a larger circuit. “The fact that larger networks such as those used in electronics have not yet been realised, is partly due to the strong attenuation of the spin waves in the waveguides that connect the individual switching elements – especially if they are narrower than a micrometre and therefore on the nanoscale,” explains Rudolf Bratschitsch.
    The group used the material with the lowest attenuation currently known: yttrium iron garnet (YIG). The researchers inscribed individual spin-wave waveguides into a 110-nanometre-thin film of this magnetic material using a silicon ion beam and produced a large network with 198 nodes. The new method allows complex structures of high quality to be produced flexibly and reproducibly.
    The German Research Foundation (DFG) funded the project as part of the Collaborative Research Centre 1459 “Intelligent Matter.”

  • in

    Scientists discover the moment AI truly understands language

    The language capabilities of today’s artificial intelligence systems are astonishing. We can now engage in natural conversations with systems like ChatGPT, Gemini, and many others, with a fluency nearly comparable to that of a human being. Yet we still know very little about the internal processes in these networks that lead to such remarkable results.
    A new study published in the Journal of Statistical Mechanics: Theory and Experiment (JSTAT) reveals a piece of this mystery. It shows that when small amounts of data are used for training, neural networks initially rely on the position of words in a sentence. However, as the system is exposed to enough data, it transitions to a new strategy based on the meaning of the words. The study finds that this transition occurs abruptly, once a critical data threshold is crossed — much like a phase transition in physical systems. The findings offer valuable insights for understanding the workings of these models.
    Just like a child learning to read, a neural network starts by understanding sentences based on the positions of words: depending on where words are located in a sentence, the network can infer their relationships (are they subjects, verbs, objects?). However, as the training continues — the network “keeps going to school” — a shift occurs: word meaning becomes the primary source of information.
    This, the new study explains, is what happens in a simplified model of the self-attention mechanism, a core building block of transformer language models like the ones we use every day (ChatGPT, Gemini, Claude, etc.). A transformer is a neural network architecture designed to process sequences of data, such as text, and it forms the backbone of many modern language models. Transformers specialize in understanding relationships within a sequence and use the self-attention mechanism to assess the importance of each word relative to the others.
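    For readers who want to see the mechanism concretely, here is a minimal sketch of scaled dot-product self-attention. It makes several simplifying assumptions: the query, key and value projections that a real transformer learns are replaced by the identity, the two-dimensional word vectors are made up for illustration, and no positional information is added to them, so the weights here depend only on the vectors themselves. It is not the solvable model analyzed in the study.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention with identity query/key/value maps."""
    dim = len(embeddings[0])
    outputs, weights = [], []
    for query in embeddings:
        # Score every word against the current one, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in embeddings]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all word vectors.
        outputs.append([sum(w * vec[c] for w, vec in zip(row, embeddings))
                        for c in range(dim)])
    return outputs, weights


# Toy, made-up vectors for three words of a sentence.
tokens = {"Mary": [1.0, 0.0], "eats": [0.0, 1.0], "apple": [1.0, 1.0]}
outputs, weights = self_attention(list(tokens.values()))

for word, row in zip(tokens, weights):
    print(word, [round(w, 3) for w in row])
```

    Each row of weights says how much one word draws on every other word. In a full model, position information is typically mixed into the word vectors, which is what gives the network a choice between relying on position and relying on meaning.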
    “To assess relationships between words,” explains Hugo Cui, a postdoctoral researcher at Harvard University and first author of the study, “the network can use two strategies, one of which is to exploit the positions of words.” In a language like English, for example, the subject typically precedes the verb, which in turn precedes the object. “Mary eats the apple” is a simple example of this sequence.
    “This is the first strategy that spontaneously emerges when the network is trained,” Cui explains. “However, in our study, we observed that if training continues and the network receives enough data, at a certain point — once a threshold is crossed — the strategy abruptly shifts: the network starts relying on meaning instead.”
    “When we designed this work, we simply wanted to study which strategies, or mix of strategies, the networks would adopt. But what we found was somewhat surprising: below a certain threshold, the network relied exclusively on position, while above it, only on meaning.”
    Cui describes this shift as a phase transition, borrowing a concept from physics. Statistical physics studies systems composed of enormous numbers of particles (like atoms or molecules) by describing their collective behavior statistically. Similarly, neural networks — the foundation of these AI systems — are composed of large numbers of “nodes,” or neurons (named by analogy to the human brain), each connected to many others and performing simple operations. The system’s intelligence emerges from the interaction of these neurons, a phenomenon that can be described with statistical methods.

    This is why we can speak of an abrupt change in network behavior as a phase transition, similar to how water, under certain conditions of temperature and pressure, changes from liquid to gas.
    “Understanding from a theoretical viewpoint that the strategy shift happens in this manner is important,” Cui emphasizes. “Our networks are simplified compared to the complex models people interact with daily, but they can give us hints to begin to understand the conditions that cause a model to stabilize on one strategy or another. This theoretical knowledge could hopefully be used in the future to make the use of neural networks more efficient, and safer.”
    The research by Hugo Cui, Freya Behrens, Florent Krzakala, and Lenka Zdeborová, titled “A Phase Transition between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention,” is published in JSTAT as part of the Machine Learning 2025 special issue and is included in the proceedings of the NeurIPS 2024 conference.