More stories

  • in

    Researchers apply quantum computing methods to protein structure prediction

    Researchers from Cleveland Clinic and IBM recently published findings in the Journal of Chemical Theory and Computation that could lay the groundwork for applying quantum computing methods to protein structure prediction. This publication is the first peer-reviewed quantum computing paper from the Cleveland Clinic-IBM Discovery Accelerator partnership.
    For decades, researchers have leveraged computational approaches to predict protein structures. A protein folds itself into a structure that determines how it functions and binds to other molecules in the body. These structures determine many aspects of human health and disease.
    By accurately predicting the structure of a protein, researchers can better understand how diseases spread and thus how to develop effective therapies. Cleveland Clinic postdoctoral fellow Bryan Raubenolt, Ph.D., and IBM researcher Hakan Doga, Ph.D., spearheaded a team to discover how quantum computing can improve current methods.
    In recent years, machine learning techniques have made significant progress in protein structure prediction. These methods rely on training data (a database of experimentally determined protein structures) to make predictions, so they are constrained by how many proteins they have been taught to recognize. Accuracy can therefore drop when the algorithms encounter a protein that is mutated or very different from those on which they were trained, which is common with genetic disorders.
    The alternative method is to simulate the physics of protein folding. Simulations allow researchers to look at a given protein’s various possible shapes and find the most stable one. The most stable shape is critical for drug design.
    The challenge is that, beyond a certain protein size, these simulations are nearly impossible on a classical computer. In a way, increasing the size of the target protein is comparable to increasing the dimensions of a Rubik’s cube. For a small protein with 100 amino acids, a classical computer would need time equal to the age of the universe to exhaustively search all the possible outcomes, says Dr. Raubenolt.
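    The scale of an exhaustive search can be illustrated with a rough back-of-the-envelope estimate. The sketch below is not from the study: the number of states per amino acid, the evaluation rate and the function names are all illustrative assumptions, chosen only to show how quickly the search space grows with protein length.

```python
# Back-of-the-envelope estimate of an exhaustive conformational search.
# All three constants are illustrative assumptions, not values from the study.
STATES_PER_RESIDUE = 3              # assumed backbone states per amino acid
CONFORMATIONS_PER_SECOND = 1e15     # assumed (very generous) evaluation rate
AGE_OF_UNIVERSE_SECONDS = 4.35e17   # roughly 13.8 billion years


def search_space(n_residues, states=STATES_PER_RESIDUE):
    """Number of conformations if every residue picks a state independently."""
    return states ** n_residues


def exhaustive_search_seconds(n_residues, rate=CONFORMATIONS_PER_SECOND):
    """Time to enumerate every conformation at the assumed evaluation rate."""
    return search_space(n_residues) / rate


if __name__ == "__main__":
    seconds = exhaustive_search_seconds(100)
    print(f"conformations for 100 residues: {search_space(100):.2e}")
    print(f"universe ages needed: {seconds / AGE_OF_UNIVERSE_SECONDS:.2e}")
```

    Under these assumptions, adding a single amino acid triples the search space, which is why the problem outgrows brute-force methods so quickly.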
    To help overcome these limitations, the research team applied a mix of quantum and classical computing methods. This framework could allow quantum algorithms to address the areas that are challenging for state-of-the-art classical computing, including protein size, intrinsic disorder, mutations and the physics involved in protein folding. The framework was validated by accurately predicting the folding of a small fragment of a Zika virus protein on a quantum computer and comparing the result with state-of-the-art classical methods.

    The quantum-classical hybrid framework’s initial results outperformed both a classical physics-based method and AlphaFold2. Although AlphaFold2 is designed to work best with larger proteins, the result nonetheless demonstrates this framework’s ability to create accurate models without directly relying on substantial training data.
    The researchers used a quantum algorithm to first model the lowest energy conformation for the fragment’s backbone, which is typically the most computationally demanding step of the calculation. Classical approaches were then used to convert the results obtained from the quantum computer, reconstruct the protein with its sidechains, and perform final refinement of the structure with classical molecular mechanics force fields. The project shows one of the ways that problems can be deconstructed into parts, with quantum computing methods addressing some parts and classical computing others, for increased accuracy.
    “One of the most unique things about this project is the number of disciplines involved,” says Dr. Raubenolt. “Our team’s expertise ranges from computational biology and chemistry, structural biology, software and automation engineering, to experimental atomic and nuclear physics, mathematics, and of course quantum computing and algorithm design. It took the knowledge from each of these areas to create a computational framework that can mimic one of the most important processes for human life.”
    The team’s combination of classical and quantum computing methods is an essential step for advancing our understanding of protein structures, and how they impact our ability to treat and prevent disease. The team plans to continue developing and optimizing quantum algorithms that can predict the structure of larger and more sophisticated proteins.
    “This work is an important step forward in exploring where quantum computing capabilities could show strengths in protein structure prediction,” says Dr. Doga. “Our goal is to design quantum algorithms that can find how to predict protein structures as realistically as possible.”

  • in

    Theoretical quantum speedup with the quantum approximate optimization algorithm

    In a new paper published in Science Advances on May 29, researchers at JPMorgan Chase, the U.S. Department of Energy’s (DOE) Argonne National Laboratory and Quantinuum demonstrated clear evidence of a quantum algorithmic speedup for the quantum approximate optimization algorithm (QAOA).
    This algorithm has been studied extensively and has been implemented on many quantum computers. It has potential application in fields such as logistics, telecommunications, financial modeling and materials science.
    “This work is a significant step towards reaching quantum advantage, laying the foundation for future impact in production,” said Marco Pistoia, head of Global Technology Applied Research at JPMorgan Chase.
    The team examined whether a quantum algorithm with low implementation costs could provide a quantum speedup over the best-known classical methods. QAOA was applied to the Low Autocorrelation Binary Sequences problem, which has significance in understanding the behavior of physical systems, signal processing and cryptography. The study showed that as the algorithm is asked to tackle increasingly large problems, the time it takes to solve them grows at a slower rate than that of a classical solver.
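    For readers unfamiliar with the Low Autocorrelation Binary Sequences problem, the sketch below states it in code using the standard textbook formulation: find a sequence of +1 and -1 values whose off-peak autocorrelations are as small as possible. The brute-force solver is purely illustrative and is not the classical solver or the QAOA implementation used in the study; the function names are hypothetical.

```python
from itertools import product


def autocorrelation(seq, k):
    """Aperiodic autocorrelation of a +1/-1 sequence at lag k."""
    return sum(seq[i] * seq[i + k] for i in range(len(seq) - k))


def sidelobe_energy(seq):
    """Sum of squared autocorrelations over all non-zero lags (to be minimized)."""
    return sum(autocorrelation(seq, k) ** 2 for k in range(1, len(seq)))


def brute_force_labs(n):
    """Check all 2**n sequences and return one with the lowest sidelobe energy."""
    return min(product((1, -1), repeat=n), key=sidelobe_energy)


# The length-13 Barker code, a well-known low-autocorrelation sequence.
BARKER_13 = (1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1)

if __name__ == "__main__":
    best = brute_force_labs(7)
    print(best, sidelobe_energy(best))
    print(sidelobe_energy(BARKER_13))
```

    Because the brute-force search visits 2**n sequences, its running time doubles with every added element, which is the kind of scaling a faster solver, classical or quantum, tries to beat.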
    To explore the quantum algorithm’s performance in an ideal noiseless setting, JPMorgan Chase and Argonne jointly developed a simulator to evaluate the algorithm’s performance at scale. It was built on the Polaris supercomputer, accessed through the Argonne Leadership Computing Facility (ALCF), a DOE Office of Science user facility. The ALCF is supported by DOE’s Advanced Scientific Computing Research program.
    “The large-scale quantum circuit simulations efficiently utilized the DOE petascale supercomputer Polaris located at the ALCF. These results show how high performance computing can complement and advance the field of quantum information science,” said Yuri Alexeev, a computational scientist at Argonne. Jeffrey Larson, a computational mathematician in Argonne’s Mathematics and Computer Science Division, also contributed to this research.
    To take the first step toward practical realization of the speedup in the algorithm, the researchers demonstrated a small-scale implementation on Quantinuum’s System Model H1 and H2 trapped-ion quantum computers. Using algorithm-specific error detection, the team reduced the impact of errors on algorithmic performance by up to 65%.
    “Our long-standing partnership with JPMorgan Chase led to this meaningful and noteworthy three-way research experiment that also brought in Argonne. The results could not have been achieved without the unprecedented and world leading quality of our H-Series Quantum Computer, which provides a flexible device for executing error-correcting and error-detecting experiments on top of gate fidelities that are years ahead of other quantum computers,” said Ilyas Khan, founder and chief product officer of Quantinuum.

  • in

    Modular, scalable hardware architecture for a quantum computer

    Quantum computers hold the promise of being able to quickly solve extremely complex problems that might take the world’s most powerful supercomputer decades to crack.
    But achieving that performance involves building a system with millions of interconnected building blocks called qubits. Making and controlling so many qubits in a hardware architecture is an enormous challenge that scientists around the world are striving to meet.
    Toward this goal, researchers at MIT and MITRE have demonstrated a scalable, modular hardware platform that integrates thousands of interconnected qubits onto a customized integrated circuit. This “quantum-system-on-chip” (QSoC) architecture enables the researchers to precisely tune and control a dense array of qubits. Multiple chips could be connected using optical networking to create a large-scale quantum communication network.
    By tuning qubits across 11 frequency channels, this QSoC architecture allows for a new proposed protocol of “entanglement multiplexing” for large-scale quantum computing.
    The team spent years perfecting an intricate process for manufacturing two-dimensional arrays of atom-sized qubit microchiplets and transferring thousands of them onto a carefully prepared complementary metal-oxide semiconductor (CMOS) chip. This transfer can be performed in a single step.
    “We will need a large number of qubits, and great control over them, to really leverage the power of a quantum system and make it useful. We are proposing a brand new architecture and a fabrication technology that can support the scalability requirements of a hardware system for a quantum computer,” says Linsen Li, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this architecture.
    Li’s co-authors include Ruonan Han, an associate professor in EECS, leader of the Terahertz Integrated Electronics Group, and member of the Research Laboratory of Electronics (RLE); senior author Dirk Englund, professor of EECS, principal investigator of the Quantum Photonics and Artificial Intelligence Group and of RLE; as well as others at MIT, Cornell University, the Delft Institute of Technology, the Army Research Laboratory, and the MITRE Corporation. The paper appears in Nature.

    Diamond microchiplets
    While there are many types of qubits, the researchers chose to use diamond color centers because of their scalability advantages. They previously used such qubits to produce integrated quantum chips with photonic circuitry.
    Qubits made from diamond color centers are “artificial atoms” that carry quantum information. Because diamond color centers are solid-state systems, the qubit manufacturing is compatible with modern semiconductor fabrication processes. They are also compact and have relatively long coherence times, which refers to the amount of time a qubit’s state remains stable, due to the clean environment provided by the diamond material.
    In addition, diamond color centers have photonic interfaces, which allow them to be remotely entangled, or connected, with other qubits that aren’t adjacent to them.
    “The conventional assumption in the field is that the inhomogeneity of the diamond color center is a drawback compared to identical quantum memory like ions and neutral atoms. However, we turn this challenge into an advantage by embracing the diversity of the artificial atoms: Each atom has its own spectral frequency. This allows us to communicate with individual atoms by voltage tuning them into resonance with a laser, much like tuning the dial on a tiny radio,” says Englund.
    This tuning is especially difficult because the researchers must achieve it at a large scale to compensate for the qubit inhomogeneity in a large system.

    To communicate across qubits, they need to have multiple such “quantum radios” dialed into the same channel. Achieving this condition becomes near-certain when scaling to thousands of qubits. The researchers met this requirement by integrating a large array of diamond color center qubits onto a CMOS chip, which provides the control dials. The chip can be incorporated with built-in digital logic that rapidly and automatically reconfigures the voltages, enabling the qubits to reach full connectivity.
    “This compensates for the inhomogeneous nature of the system. With the CMOS platform, we can quickly and dynamically tune all the qubit frequencies,” Li explains.
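    Why a shared channel becomes near-certain at scale can be illustrated with a toy model borrowed from the classic birthday problem. The sketch below is an assumption-laden illustration, not the researchers’ analysis: it treats each qubit’s natural frequency as falling at random into one of a fixed number of equally likely bins, and the bin count used in the example is a hypothetical figure.

```python
def probability_shared_channel(n_qubits, n_bins):
    """Chance that at least two of n_qubits land in the same frequency bin.

    Toy model: each qubit falls independently and uniformly into one of
    n_bins bins (an assumption made only for illustration).
    """
    if n_qubits > n_bins:
        return 1.0
    p_all_distinct = 1.0
    for i in range(n_qubits):
        p_all_distinct *= (n_bins - i) / n_bins
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # Hypothetical example: one million distinguishable frequency bins.
    for n in (10, 100, 1000, 4000):
        print(n, round(probability_shared_channel(n, 1_000_000), 4))
```

    In this toy model the probability of a match climbs from negligible to almost certain as the number of qubits grows from tens to thousands, even when the number of bins is far larger than the number of qubits.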
    Lock-and-release fabrication
    To build this QSoC, the researchers developed a fabrication process to transfer diamond color center “microchiplets” onto a CMOS backplane at a large scale.
    They started by fabricating an array of diamond color center microchiplets from a solid block of diamond. They also designed and fabricated nanoscale optical antennas that enable more efficient collection of the photons emitted by these color center qubits in free space.
    Then, they designed and mapped out the chip from the semiconductor foundry. Working in the MIT.nano cleanroom, they post-processed a CMOS chip to add microscale sockets that match up with the diamond microchiplet array.
    They built an in-house transfer setup in the lab and applied a lock-and-release process to integrate the two layers by locking the diamond microchiplets into the sockets on the CMOS chip. Since the diamond microchiplets are weakly bonded to the diamond surface, when they release the bulk diamond horizontally, the microchiplets stay in the sockets.
    “Because we can control the fabrication of both the diamond and the CMOS chip, we can make a complementary pattern. In this way, we can transfer thousands of diamond chiplets into their corresponding sockets all at the same time,” Li says.
    The researchers demonstrated a 500-micron by 500-micron area transfer for an array with 1,024 diamond nanoantennas, but they could use larger diamond arrays and a larger CMOS chip to further scale up the system. In fact, they found that with more qubits, tuning the frequencies actually requires less voltage for this architecture.
    “In this case, if you have more qubits, our architecture will work even better,” Li says.
    The team tested many nanostructures before they determined the ideal microchiplet array for the lock-and-release process. However, making quantum microchiplets is no easy task, and the process took years to perfect.
    “We have iterated and developed the recipe to fabricate these diamond nanostructures in MIT cleanroom, but it is a very complicated process. It took 19 steps of nanofabrication to get the diamond quantum microchiplets, and the steps were not straightforward,” he adds.
    Alongside their QSoC, the researchers developed an approach to characterize the system and measure its performance on a large scale. To do this, they built a custom cryo-optical metrology setup.
    Using this technique, they demonstrated an entire chip with over 4,000 qubits that could be tuned to the same frequency while maintaining their spin and optical properties. They also built a digital twin simulation that connects the experiment with digitized modeling, which helps them understand the root causes of the observed phenomenon and determine how to efficiently implement the architecture.
    In the future, the researchers could boost the performance of their system by refining the materials they used to make qubits or developing more precise control processes. They could also apply this architecture to other solid-state quantum systems.

  • in

    Bio-inspired cameras and AI help drivers detect pedestrians and obstacles faster

    It’s every driver’s nightmare: a pedestrian stepping out in front of the car seemingly out of nowhere, leaving only a fraction of a second to brake or steer the wheel and avoid the worst. Some cars now have camera systems that can alert the driver or activate emergency braking. But these systems are not yet fast or reliable enough, and they will need to improve dramatically if they are to be used in autonomous vehicles where there is no human behind the wheel.
    Quicker detection using less computational power
    Now, Daniel Gehrig and Davide Scaramuzza from the Department of Informatics at the University of Zurich (UZH) have combined a novel bio-inspired camera with AI to develop a system that can detect obstacles around a car much quicker than current systems and using less computational power. The study is published in this week’s issue of Nature.
    Most current cameras are frame-based, meaning they take snapshots at regular intervals. Those currently used for driver assistance on cars typically capture 30 to 50 frames per second and an artificial neural network can be trained to recognize objects in their images — pedestrians, bikes, and other cars. “But if something happens during the 20 or 30 milliseconds between two snapshots, the camera may see it too late. The solution would be increasing the frame rate, but that translates into more data that needs to be processed in real-time and more computational power,” says Daniel Gehrig, first author of the paper.
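    The gap between snapshots follows directly from the frame rate, as the short sketch below shows. The function names and the example driving speed are assumptions added for illustration; only the frame rates come from the text above.

```python
def frame_interval_ms(frames_per_second):
    """Time between two consecutive snapshots, in milliseconds."""
    return 1000.0 / frames_per_second


def blind_distance_m(frames_per_second, speed_kmh):
    """Distance a vehicle travels between two snapshots, in meters."""
    speed_m_per_s = speed_kmh / 3.6
    return speed_m_per_s / frames_per_second


if __name__ == "__main__":
    for fps in (30, 50):
        # 72 km/h is an arbitrary example speed, not a figure from the study.
        print(fps, frame_interval_ms(fps), blind_distance_m(fps, 72))
```

    Raising the frame rate shrinks both numbers, but every extra frame is more data to transmit and process, which is the trade-off Gehrig describes.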
    Combining the best of two camera types with AI
    Event cameras are a recent innovation based on a different principle. Instead of a constant frame rate, they have smart pixels that record information every time they detect fast movements. “This way, they have no blind spot between frames, which allows them to detect obstacles more quickly. They are also called neuromorphic cameras because they mimic how human eyes perceive images,” says Davide Scaramuzza, head of the Robotics and Perception Group. But they have their own shortcomings: they can miss things that move slowly and their images are not easily converted into the kind of data that is used to train the AI algorithm.
    Gehrig and Scaramuzza came up with a hybrid system that combines the best of both worlds: It includes a standard camera that collects 20 images per second, a relatively low frame rate compared to the ones currently in use. Its images are processed by an AI system, called a convolutional neural network, that is trained to recognize cars or pedestrians. The data from the event camera is coupled to a different type of AI system, called an asynchronous graph neural network, which is particularly apt for analyzing 3-D data that change over time. Detections from the event camera are used to anticipate detections by the standard camera and also boost its performance. “The result is a visual detector that can detect objects just as quickly as a standard camera taking 5,000 images per second would do but requires the same bandwidth as a standard 50-frame-per-second camera,” says Daniel Gehrig.
    One hundred times faster detections using less data
    The team tested their system against the best cameras and visual algorithms currently on the automotive market and found that it detects objects one hundred times faster while reducing both the amount of data that must be transmitted between the camera and the onboard computer and the computational power needed to process the images, without affecting accuracy. Crucially, the system can effectively detect cars and pedestrians that enter the field of view between two subsequent frames of the standard camera, providing additional safety for both the driver and traffic participants, which can make a huge difference, especially at high speeds.
    According to the scientists, the method could be made even more powerful in the future by integrating cameras with LiDAR sensors, like the ones used on self-driving cars. “Hybrid systems like this could be crucial to allow autonomous driving, guaranteeing safety without leading to a substantial growth of data and computational power,” says Davide Scaramuzza.

  • in

    AI helps medical professionals read confusing EEGs to save lives

    Researchers at Duke University have developed an assistive machine learning model that greatly improves the ability of medical professionals to read the electroencephalography (EEG) charts of intensive care patients.
    Because EEG readings are the only method for knowing when unconscious patients are in danger of suffering a seizure or are having seizure-like events, the computational tool could help save thousands of lives each year. The results appear online May 23 in the New England Journal of Medicine AI.
    EEGs use small sensors attached to the scalp to measure the brain’s electrical signals, producing a long line of up and down squiggles. When a patient is having a seizure, these lines jump up and down dramatically like a seismograph during an earthquake — a signal that is easy to recognize. But other medically important anomalies called seizure-like events are much more difficult to discern.
    “The brain activity we’re looking at exists along a continuum, where seizures are at one end, but there’s still a lot of events in the middle that can also cause harm and require medication,” said Dr. Brandon Westover, associate professor of neurology at Massachusetts General Hospital and Harvard Medical School. “The EEG patterns caused by those events are more difficult to recognize and categorize confidently, even by highly trained neurologists, which not every medical facility has. But doing so is extremely important to the health outcomes of these patients.”
    To build a tool to help make these determinations, the doctors turned to the laboratory of Cynthia Rudin, the Earl D. McLean, Jr. Professor of Computer Science and Electrical and Computer Engineering at Duke. Rudin and her colleagues specialize in developing “interpretable” machine learning algorithms. While most machine learning models are “black boxes” that make it impossible for a human to know how they reach their conclusions, interpretable machine learning models essentially must show their work.
    The research group started by gathering EEG samples from over 2,700 patients and having more than 120 experts pick out the relevant features in the graphs, categorizing them as either a seizure, one of four types of seizure-like events or ‘other.’ Each type of event appears in EEG charts as certain shapes or repetitions in the undulating lines. But because these charts are rarely steadfast in their appearance, telltale signals can be interrupted by bad data or can mix together to create a confusing chart.
    “There is a ground truth, but it’s difficult to read,” said Stark Guo, a Ph.D. student working in Rudin’s lab. “The inherent ambiguity in many of these charts meant we had to train the model to place its decisions within a continuum rather than well-defined separate bins.”
    When displayed visually, that continuum looks something like a multicolored starfish swimming away from a predator. Each differently colored arm represents one type of seizure-like event the EEG could represent. The closer the algorithm puts a specific chart toward the tip of an arm, the surer it is of its decision, while those placed closer to the central body are less certain.

    Besides this visual classification, the algorithm also points to the patterns in the brainwaves that it used to make its determination and provides three examples of professionally diagnosed charts that it sees as being similar.
    “This lets a medical professional quickly look at the important sections and either agree that the patterns are there or decide that the algorithm is off the mark,” said Alina Barnett, a postdoctoral research associate in the Rudin lab. “Even if they’re not highly trained to read EEGs, they can make a much more educated decision.”
    Putting the algorithm to the test, the collaborative team had eight medical professionals with relevant experience categorize 100 EEG samples into the six categories, once with the help of AI and once without. The performance of all of the participants greatly improved, with their overall accuracy rising from 47% to 71%. Their performance also rose above that of participants who used a similar “black box” algorithm in a previous study.
    “Usually, people think that black box machine learning models are more accurate, but for many important applications, like this one, it’s just not true,” said Rudin. “It’s much easier to troubleshoot models when they are interpretable. And in this case, the interpretable model was actually more accurate. It also provides a bird’s eye view of the types of anomalous electrical signals that occur in the brain, which is really useful for care of critically ill patients.”
    This work was supported by the National Science Foundation (IIS-2147061, HRD-2222336, IIS-2130250, 2014431), the National Institutes of Health (R01NS102190, R01NS102574, R01NS107291, RF1AG064312, RF1NS120947, R01AG073410, R01HL161253, K23NS124656, P20GM130447) and the DHHS LB606 Nebraska Stem Cell Grant.

  • in

    Public have no difficulty getting to grips with an extra thumb, study finds

    Cambridge researchers have shown that members of the public have little trouble in learning very quickly how to use a third thumb — a controllable, prosthetic extra thumb — to pick up and manipulate objects.
    The team tested the robotic device on a diverse range of participants, which they say is essential for ensuring new technologies are inclusive and can work for everyone.
    An emerging area of future technology is motor augmentation — using motorised wearable devices such as exoskeletons or extra robotic body parts to advance our motor capabilities beyond current biological limitations.
    While such devices could improve the quality of life for healthy individuals who want to enhance their productivity, the same technologies can also provide people with disabilities new ways to interact with their environment.
    Professor Tamar Makin from the Medical Research Council (MRC) Cognition and Brain Sciences Unit at the University of Cambridge said: “Technology is changing our very definition of what it means to be human, with machines increasingly becoming a part of our everyday lives, and even our minds and bodies.
    “These technologies open up exciting new opportunities that can benefit society, but it’s vital that we consider how they can help all people equally, especially marginalised communities who are often excluded from innovation research and development. To ensure everyone will have the opportunity to participate and benefit from these exciting advances, we need to explicitly integrate and measure inclusivity during the earliest possible stages of the research and development process.”
    Dani Clode, a collaborator within Professor Makin’s lab, has developed the Third Thumb, an extra robotic thumb aimed at increasing the wearer’s range of movement, enhancing their grasping capability and expanding the carrying capacity of the hand. This allows the user to perform tasks that might be otherwise challenging or impossible to complete with one hand or to perform complex multi-handed tasks without having to coordinate with other people.

    The Third Thumb is worn on the opposite side of the palm to the biological thumb and controlled by a pressure sensor placed under each big toe or foot. Pressure from the right toe pulls the Thumb across the hand, while the pressure exerted with the left toe pulls the Thumb up toward the fingers. The extent of the Thumb’s movement is proportional to the pressure applied, and releasing pressure moves it back to its original position.
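    The control scheme described above amounts to a proportional mapping from toe pressure to thumb movement. The sketch below is a minimal illustration of that idea only: the normalized pressure scale, the clamping and the function names are assumptions and do not describe the device’s actual firmware.

```python
def clamp(value, low=0.0, high=1.0):
    """Restrict value to the range [low, high]."""
    return max(low, min(high, value))


def thumb_position(right_toe_pressure, left_toe_pressure, max_pressure=1.0):
    """Map toe pressures to a normalized (across, up) thumb displacement.

    Right-toe pressure moves the thumb across the hand and left-toe pressure
    moves it up toward the fingers. Zero pressure returns the thumb to its
    rest position at (0.0, 0.0). max_pressure is a hypothetical sensor limit.
    """
    across = clamp(right_toe_pressure / max_pressure)
    up = clamp(left_toe_pressure / max_pressure)
    return across, up


if __name__ == "__main__":
    print(thumb_position(0.0, 0.0))   # rest position
    print(thumb_position(0.5, 0.25))  # partial movement on both axes
```

    Because the two axes are driven independently, pressing with both toes at once combines the two movements.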
    In 2022, the team had the opportunity to test the Third Thumb at the annual Royal Society Summer Science Exhibition, where members of the public of all ages were able to use the device during different tasks. The results are published today in Science Robotics.
    Over the course of five days, the team tested 596 participants, ranging in age from three to 96 years old and from a wide range of demographic backgrounds. Of these, only four were unable to use the Third Thumb, either because it did not fit their hand securely, or because they were unable to control it with their feet (the pressure sensors developed specifically for the exhibition were not suitable for very lightweight children).
    Participants were given up to a minute to familiarise themselves with the device, during which time the team explained how to perform one of two tasks.
    The first task involved picking up pegs from a pegboard one at a time with just the Third Thumb and placing them in a basket. Participants were asked to move as many pegs as possible in 60 seconds. 333 participants completed this task.
    The second task involved using the Third Thumb together with the wearer’s biological hand to manipulate and move five or six different foam objects. The objects were of various shapes that required different manipulations to be used, increasing the dexterity of the task. Again, participants were asked to move as many objects as they could into the basket within a maximum of 60 seconds. 246 participants completed this task.

    Almost everyone was able to use the device straightaway. 98% of participants were able to successfully manipulate objects using the Third Thumb during the first minute of use, with only 13 participants unable to perform the task.
    Ability levels between participants were varied, but there were no differences in performance between genders, nor did handedness change performance — despite the Thumb always being worn on the right hand. There was no definitive evidence that people who might be considered ‘good with their hands’ — for example, they were learning to play a musical instrument, or their jobs involved manual dexterity — were any better at the tasks.
    Older and younger adults had a similar level of ability when using the new technology, though further investigation just within the older adults age bracket revealed a decline in performance with increasing age. The researchers say this effect could be due to the general degradation in sensorimotor and cognitive abilities that are associated with ageing and may also reflect a generational relationship to technology.
    Performance was generally poorer among younger children. Six out of the 13 participants that could not complete the task were below the age of 10 years old, and of those that did complete the task, the youngest children tended to perform worse compared to older children. But even older children (aged 12-16 years) struggled more than young adults.
    Dani said: “Augmentation is about designing a new relationship with technology — creating something that extends beyond being merely a tool to becoming an extension of the body itself. Given the diversity of bodies, it’s crucial that the design stage of wearable technology is as inclusive as possible. It’s equally important that these devices are accessible and functional for a wide range of users. Additionally, they should be easy for people to learn and use quickly.”
    Co-author Lucy Dowdall, also from the MRC Cognition and Brain Sciences Unit, added: “If motor augmentation, and even broader human-machine interactions, are to be successful, they’ll need to integrate seamlessly with the user’s motor and cognitive abilities. We’ll need to factor in different ages, genders, weight, lifestyles, disabilities, as well as people’s cultural, financial backgrounds, and even likes or dislikes of technology. Physical testing of large and diverse groups of individuals is essential to achieve this goal.”
    There are countless examples of where a lack of inclusive design considerations has led to technological failure: Automated speech recognition systems that convert spoken language to text have been shown to perform better listening to white voices over Black voices. Some augmented reality technologies have been found to be less effective for users with darker skin tones. Women face a higher health risk from car accidents, due to car seats and seatbelts being primarily designed to accommodate ‘average’ male-sized dummies during crash testing. Hazardous power and industrial tools designed for right-hand-dominant use or grip have resulted in more accidents when operated by left-handers forced to use their non-dominant hand.
    This research was funded by the European Research Council, Wellcome, the Medical Research Council and Engineering and Physical Sciences Research Council.

  • in

    Tracking animals without markers in the wild

    Researchers from the Cluster of Excellence Collective Behaviour developed a computer vision framework for posture estimation and identity tracking which they can use in indoor environments as well as in the wild. They have thus taken an important step towards markerless tracking of animals in the wild using computer vision and machine learning.
    Two pigeons are pecking grains in a park in Konstanz. A third pigeon flies in. There are four cameras in the immediate vicinity. Doctoral students Alex Chan and Urs Waldmann from the Cluster of Excellence Collective Behaviour at the University of Konstanz are filming the scene. After an hour, they return with the footage to their office to analyze it with a computer vision framework for posture estimation and identity tracking. The framework detects all pigeons and draws a box around each one. It records central body parts and determines each bird's posture, its position, and its interaction with the other pigeons around it. All of this happens without any markers being attached to the pigeons and without any need for human intervention. This would not have been possible just a few years ago.
    3D-MuPPET, a framework to estimate and track 3D poses of up to 10 pigeons
    Markerless methods for animal posture tracking have developed rapidly in recent years, but frameworks and benchmarks for tracking large animal groups in 3D are still lacking. To close this gap, researchers from the Cluster of Excellence Collective Behaviour at the University of Konstanz and the Max Planck Institute of Animal Behavior present 3D-MuPPET, a framework to estimate and track 3D poses of up to 10 pigeons at interactive speed using multiple camera views. The related paper was recently published in the International Journal of Computer Vision (IJCV).
    Important milestone in animal posture tracking and automatic behavioural analysis
    Urs Waldmann and Alex Chan recently finalized a new method, called 3D-MuPPET, which stands for 3D Multi-Pigeon Pose Estimation and Tracking. 3D-MuPPET is a computer vision framework for posture estimation and identity tracking of up to 10 individual pigeons from 4 camera views, based on data collected both in captive environments and in the wild. “We trained a 2D keypoint detector and triangulated points into 3D, and also show that models trained on single pigeon data work well with multi-pigeon data,” explains Urs Waldmann. This is a first example of 3D animal posture tracking for an entire group of up to 10 individuals. The new framework thus gives biologists a concrete method to design experiments and measure animal posture for automatic behavioural analysis. “This framework is an important milestone in animal posture tracking and automatic behavioural analysis,” say Alex Chan and Urs Waldmann.
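    The triangulation step mentioned in the quote can be illustrated with a small sketch. This is not the 3D-MuPPET code. It assumes that each camera's 2D keypoint has already been converted, using the camera calibration, into a viewing ray with a known origin and direction, and it then finds the 3D point closest to all rays in the least-squares sense. The function names `triangulate_rays` and `solve3` and the camera positions are hypothetical.

```python
import math


def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("rays are parallel; the point is not determined")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(3):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][3] / M[i][i] for i in range(3)]


def triangulate_rays(origins, directions):
    """Return the 3D point with the smallest summed squared distance to all rays.

    Each ray contributes the projector (I - d d^T), which measures the
    component of an offset that is perpendicular to the ray direction d.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += p
                b[i] += p * o[j]
    return solve3(A, b)


if __name__ == "__main__":
    # Four hypothetical camera positions, all observing the same keypoint.
    keypoint = [1.0, 2.0, 3.0]
    cams = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [10.0, 10.0, 5.0]]
    rays = [[k - c for k, c in zip(keypoint, cam)] for cam in cams]
    print(triangulate_rays(cams, rays))
```

    With noisy detections the rays no longer meet in a single point, and the least-squares solution acts as a compromise between the camera views. At least two non-parallel rays are needed for the point to be determined.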
    Framework can be used in the wild
    In addition to tracking pigeons indoors, the framework also extends to pigeons in the wild. “Using a model that can identify the outline of any object in an image called the Segment Anything Model, we further trained a 2D keypoint detector with a masked pigeon from the captive data, then applied the model to pigeon videos outdoors without any extra model finetuning,” states Alex Chan. 3D-MuPPET presents one of the first case studies on how to transition from tracking animals in captivity to tracking animals in the wild, allowing fine-scale behaviours of animals to be measured in their natural habitats. The methods could be applied to other species in future work, with potential uses in large-scale collective behaviour research and non-invasive species monitoring.
    3D-MuPPET showcases a powerful and flexible framework for researchers who would like to use 3D posture reconstruction for multiple individuals to study collective behaviour in any environment or species. As long as a multi-camera setup and a 2D posture estimator are available, the framework can be applied to track the 3D postures of any animal.

  • in

    Research finds improving AI large language models helps better align with human brain activity

    With generative artificial intelligence (GenAI) transforming the social interaction landscape in recent years, large language models (LLMs), which use deep-learning algorithms to train GenAI platforms to process language, have been put in the spotlight. A recent study by The Hong Kong Polytechnic University (PolyU) found that LLMs behave more like the human brain when they are trained in ways that more closely resemble how humans process language. The finding offers important insights for brain studies and the development of AI models.
    Current large language models (LLMs) mostly rely on a single type of pretraining — contextual word prediction. This simple learning strategy has achieved surprising success when combined with massive training data and model parameters, as shown by popular LLMs such as ChatGPT. Recent studies also suggest that word prediction in LLMs can serve as a plausible model for how humans process language. However, humans do not simply predict the next word but also integrate high-level information in natural language comprehension.
    A research team led by Prof. LI Ping, Dean of the Faculty of Humanities and Sin Wai Kin Foundation Professor in Humanities and Technology at PolyU, has incorporated the next sentence prediction (NSP) task into model pretraining and examined the correlation between the model’s data and brain activation. NSP simulates one central process of discourse-level comprehension in the human brain: evaluating whether a pair of sentences is coherent. The study was recently published in the academic journal Science Advances.
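    To make the NSP task concrete, the sketch below builds training pairs in the way commonly used for BERT-style pretraining: roughly half of the pairs contain the true next sentence, and the rest pair a sentence with a randomly chosen, non-adjacent one. The article does not describe how the study constructed its data, so the 50/50 split, the function name `make_nsp_pairs`, and the sample sentences are assumptions for illustration only.

```python
import random


def make_nsp_pairs(sentences, rng):
    """Build (sentence_a, sentence_b, is_next) examples from an ordered text.

    Assumption: a 50/50 split between coherent and incoherent pairs.
    """
    if len(sentences) < 3:
        raise ValueError("need at least three sentences to sample negatives")
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            # Coherent pair: the sentence that actually follows.
            pairs.append((sentences[i], sentences[i + 1], True))
        else:
            # Incoherent pair: any sentence other than this one or its successor.
            candidates = [j for j in range(len(sentences)) if j not in (i, i + 1)]
            j = rng.choice(candidates)
            pairs.append((sentences[i], sentences[j], False))
    return pairs


if __name__ == "__main__":
    text = [
        "The pigeon landed in the park.",
        "It began pecking at the grain.",
        "A second pigeon soon joined it.",
        "Both flew off when a dog ran past.",
    ]
    for a, b, is_next in make_nsp_pairs(text, random.Random(0)):
        print(is_next, "|", a, "->", b)
```

    A model trained on such pairs learns to output whether the second sentence follows the first, which is the coherence judgment the article describes.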
    The research team trained two models, one with NSP enhancement and the other without; both also learned word prediction. Functional magnetic resonance imaging (fMRI) data were collected from people reading connected or disconnected sentences. The research team then examined how closely the patterns from each model matched the brain patterns in the fMRI data.
    It was clear that training with NSP provided benefits. The model with NSP matched human brain activity in multiple areas much better than the model trained only on word prediction. Its mechanism also nicely maps onto established neural models of human discourse comprehension. The results gave new insights into how our brains process full discourse such as conversations. For example, parts of the right side of the brain, not just the left, helped understand longer discourse. The model trained with NSP could also better predict how fast someone read — showing that simulating discourse comprehension through NSP helped AI understand humans better.
    Recent LLMs, including ChatGPT, have relied on vastly increasing the training data and model size to achieve better performance. Prof. Li Ping said, “There are limitations in just relying on such scaling. Advances should also be aimed at making the models more efficient, relying on less rather than more data. Our findings suggest that diverse learning tasks such as NSP can improve LLMs to be more human-like and potentially closer to human intelligence.”
    He added, “More importantly, the findings show how neurocognitive researchers can leverage LLMs to study higher-level language mechanisms of our brain. They also promote interaction and collaboration between researchers in the fields of AI and neurocognition, which will lead to future studies on AI-informed brain studies as well as brain-inspired AI.”