More stories

  • Using AI to decode dog vocalizations

    Have you ever wished you could understand what your dog is trying to say to you? University of Michigan researchers are exploring the possibilities of AI, developing tools that can identify whether a dog’s bark conveys playfulness or aggression.
    The same models can also glean other information from animal vocalizations, such as the animal’s age, breed and sex. The study, a collaboration with Mexico’s National Institute of Astrophysics, Optics and Electronics (INAOE) in Puebla, finds that AI models originally trained on human speech can be used as a starting point to train new systems that target animal communication.
    The results were presented at the Joint International Conference on Computational Linguistics, Language Resources and Evaluation.
    “By using speech processing models initially trained on human speech, our research opens a new window into how we can leverage what we built so far in speech processing to start understanding the nuances of dog barks,” said Rada Mihalcea, the Janice M. Jenkins Collegiate Professor of Computer Science and Engineering, and director of U-M’s AI Laboratory.
    “There is so much we don’t yet know about the animals that share this world with us. Advances in AI can be used to revolutionize our understanding of animal communication, and our findings suggest that we may not have to start from scratch.”
    One of the prevailing obstacles to developing AI models that can analyze animal vocalizations is the lack of publicly available data. While there are numerous resources and opportunities for recording human speech, collecting such data from animals is more difficult.
    “Animal vocalizations are logistically much harder to solicit and record,” said Artem Abzaliev, lead author and U-M doctoral student in computer science and engineering. “They must be passively recorded in the wild or, in the case of domestic pets, with the permission of owners.”
    Because of this dearth of usable data, techniques for analyzing dog vocalizations have proven difficult to develop, and the ones that do exist are limited by a lack of training material. The researchers overcame these challenges by repurposing an existing model that was originally designed to analyze human speech.

    This approach enabled the researchers to tap into robust models that form the backbone of the various voice-enabled technologies we use today, including voice-to-text and language translation. These models are trained to distinguish nuances in human speech, like tone, pitch and accent, and convert this information into a format that a computer can use to identify what words are being said, recognize the individual speaking, and more.
    “These models are able to learn and encode the incredibly complex patterns of human language and speech,” Abzaliev said. “We wanted to see if we could leverage this ability to discern and interpret dog barks.”
    The researchers used a dataset of dog vocalizations recorded from 74 dogs of varying breed, age and sex, in a variety of contexts. Humberto Pérez-Espinosa, a collaborator at INAOE, led the team who collected the dataset. Abzaliev then used the recordings to modify a machine-learning model — a type of computer algorithm that identifies patterns in large data sets. The team chose a speech representation model called Wav2Vec2, which was originally trained on human speech data.
    With this model, the researchers were able to generate representations of the acoustic data collected from the dogs and interpret these representations. They found that Wav2Vec2 not only succeeded at four classification tasks but also outperformed other models trained specifically on dog bark data, with accuracy figures of up to 70%.
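    As a rough illustration of this transfer-learning setup, the sketch below fine-tunes a pretrained Wav2Vec2 checkpoint for a bark-classification task using the Hugging Face transformers library. The label set, checkpoint name and file path are illustrative assumptions, not the authors’ code or data.

```python
# Hedged sketch: reusing a human-speech Wav2Vec2 encoder for dog bark classification.
# The labels, checkpoint and paths are assumptions for illustration only.
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

LABELS = ["playful", "aggressive", "fearful", "other"]  # assumed 4-way bark-context labels

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2ForSequenceClassification.from_pretrained(
    "facebook/wav2vec2-base", num_labels=len(LABELS)
)  # encoder weights come from human-speech pretraining; the classification head is new

def classify_bark(wav_path: str) -> str:
    """Classify one recording after the model has been fine-tuned on labeled barks (not shown)."""
    waveform, sr = torchaudio.load(wav_path)
    mono = torchaudio.functional.resample(waveform, sr, 16_000).mean(dim=0)
    inputs = extractor(mono.numpy(), sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

# Example: print(classify_bark("bark_clip.wav"))
```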
    “This is the first time that techniques optimized for human speech have been built upon to help with the decoding of animal communication,” Mihalcea said. “Our results show that the sounds and patterns derived from human speech can serve as a foundation for analyzing and understanding the acoustic patterns of other sounds, such as animal vocalizations.”
    In addition to establishing human speech models as a useful tool in analyzing animal communication — which could benefit biologists, animal behaviorists and more — this research has important implications for animal welfare. Understanding the nuances of dog vocalizations could greatly improve how humans interpret and respond to the emotional and physical needs of dogs, thereby enhancing their care and preventing potentially dangerous situations, the researchers said.

  • New model allows a computer to understand human emotions

    Researchers at the University of Jyväskylä, Finland, have developed a model that enables computers to interpret and understand human emotions, utilizing principles of mathematical psychology. This advancement could significantly improve the interface between humans and smart technologies, including artificial intelligence systems, making them more intuitive and responsive to user feelings.
    According to Jussi Jokinen, Associate Professor of Cognitive Science, the model could be used by a computer in the future to predict, for example, when a user will become annoyed or anxious. In such situations, the computer could, for example, give the user additional instructions or redirect the interaction.
    In everyday interactions with computers, users commonly experience emotions such as joy, irritation, and boredom. Despite the growing prevalence of artificial intelligence, current technologies often fail to acknowledge these user emotions.
    The model developed in Jyväskylä can currently predict whether the user is feeling happiness, boredom, irritation, rage, despair or anxiety.
    “Humans naturally interpret and react to each other’s emotions, a capability that machines fundamentally lack,” Jokinen explains. “This discrepancy can make interactions with computers frustrating, especially if the machine remains oblivious to the user’s emotional state.”
    The research project led by Jokinen uses mathematical psychology to find solutions to the problem of misalignment between intelligent computer systems and their users.
    “Our model can be integrated into AI systems, granting them the ability to psychologically understand emotions and thus better relate to their users,” Jokinen says.

    The research is based on a theory of emotion; the next step is to influence the user’s emotions
    The research is anchored in a theory postulating that emotions are generated when human cognition evaluates events from various perspectives.
    Jokinen elaborates: “Consider a computer error during a critical task. This event is assessed by the user’s cognition as being counterproductive. An inexperienced user might react with anxiety and fear due to uncertainty on how to resolve the error, whereas an experienced user might feel irritation, annoyed at having to waste time resolving the issue. Our model predicts the user’s emotional response by simulating this cognitive evaluation process.”
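    To make the appraisal idea concrete, the toy function below maps Jokinen’s error example onto a predicted emotion. It is a deliberately simplified illustration of appraisal-style reasoning, not the Jyväskylä model, and the threshold is invented.

```python
# Toy appraisal sketch (not the published model): the same blocked-goal event is
# appraised differently depending on the user's experience, per the error example above.
def predict_emotion(goal_blocked: bool, user_expertise: float) -> str:
    """user_expertise in [0, 1]; the 0.5 threshold is invented for illustration."""
    if not goal_blocked:
        return "joy"
    if user_expertise < 0.5:
        return "anxiety"      # uncertain how to resolve the error
    return "irritation"       # knows the fix, annoyed at the wasted time

print(predict_emotion(goal_blocked=True, user_expertise=0.2))  # -> "anxiety"
```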
    The next phase of this project will explore potential applications of this emotional understanding.
    “With our model, a computer could preemptively predict user distress and attempt to mitigate negative emotions,” Jokinen suggests.
    “This proactive approach could be utilized in various settings, from office environments to social media platforms, improving user experience by sensitively managing emotional dynamics.”
    The implications of such technology are profound, offering a glimpse into a future where computers are not merely tools, but empathetic partners in user interaction.

  • New open-source platform allows users to evaluate performance of AI-powered chatbots

    Researchers have developed a platform for the interactive evaluation of AI-powered chatbots such as ChatGPT.
    A team of computer scientists, engineers, mathematicians and cognitive scientists, led by the University of Cambridge, developed an open-source evaluation platform called CheckMate, which allows human users to interact with and evaluate the performance of large language models (LLMs).
    The researchers tested CheckMate in an experiment where human participants used three LLMs — InstructGPT, ChatGPT and GPT-4 — as assistants for solving undergraduate-level mathematics problems.
    The team studied how well LLMs can assist participants in solving problems. Despite a generally positive correlation between a chatbot’s correctness and perceived helpfulness, the researchers also found instances where the LLMs were incorrect but still useful for the participants. However, participants sometimes judged incorrect LLM outputs to be correct, most notably with LLMs optimised for chat.
    The researchers suggest that models which communicate uncertainty, respond well to user corrections, and can provide a concise rationale for their recommendations make better assistants. Given these current shortcomings, human users of LLMs should verify the outputs carefully.
    The results, reported in the Proceedings of the National Academy of Sciences (PNAS), could be useful both in informing AI literacy training and in helping developers improve LLMs for a wider range of uses.
    While LLMs are becoming increasingly powerful, they can also make mistakes and provide incorrect information, which could have negative consequences as these systems become more integrated into our everyday lives.

    “LLMs have become wildly popular, and evaluating their performance in a quantitative way is important, but we also need to evaluate how well these systems work with and can support people,” said co-first author Albert Jiang, from Cambridge’s Department of Computer Science and Technology. “We don’t yet have comprehensive ways of evaluating an LLM’s performance when interacting with humans.”
    The standard way to evaluate LLMs relies on static pairs of inputs and outputs, which disregards the interactive nature of chatbots, and how that changes their usefulness in different scenarios. The researchers developed CheckMate to help answer these questions, designed for but not limited to applications in mathematics.
    “When talking to mathematicians about LLMs, many of them fall into one of two main camps: either they think that LLMs can produce complex mathematical proofs on their own, or that LLMs are incapable of simple arithmetic,” said co-first author Katie Collins from the Department of Engineering. “Of course, the truth is probably somewhere in between, but we wanted to find a way of evaluating which tasks LLMs are suitable for and which they aren’t.”
    The researchers recruited 25 mathematicians, from undergraduate students to senior professors, to interact with three different LLMs (InstructGPT, ChatGPT, and GPT-4) and evaluate their performance using CheckMate. Participants worked through undergraduate-level mathematical theorems with the assistance of an LLM and were asked to rate each individual LLM response for correctness and helpfulness. Participants did not know which LLM they were interacting with.
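    The sketch below shows what such an interactive, blinded evaluation loop could look like in code. It is an illustration of the protocol described here, not the actual CheckMate implementation; the rating scale and helper functions are assumptions.

```python
# Illustrative sketch of a blinded interact-and-rate session (not the CheckMate codebase).
import random

def ask_model(model_name: str, prompt: str) -> str:
    # Placeholder for a real API call to the assigned LLM.
    return f"[{model_name} reply to: {prompt}]"

def evaluation_session(models: list[str], problem: str) -> list[dict]:
    assigned = random.choice(models)  # the participant never sees which model was assigned
    ratings = []
    print(f"Problem: {problem}")
    while (prompt := input("Your message (blank line to finish): ").strip()):
        reply = ask_model(assigned, prompt)
        print(reply)
        ratings.append({
            "model": assigned,
            "prompt": prompt,
            "reply": reply,
            "correctness": int(input("Rate correctness (0-5): ")),   # scale is illustrative
            "helpfulness": int(input("Rate helpfulness (0-5): ")),
        })
    return ratings

# Example: evaluation_session(["InstructGPT", "ChatGPT", "GPT-4"], "Prove that sqrt(2) is irrational.")
```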
    The researchers recorded the sorts of questions asked by participants, how participants reacted when they were presented with a fully or partially incorrect answer, whether and how they attempted to correct the LLM, or if they asked for clarification. Participants had varying levels of experience with writing effective prompts for LLMs, and this often affected the quality of responses that the LLMs provided.
    An example of an effective prompt is “what is the definition of X” (X being a concept in the problem), as chatbots can be very good at retrieving concepts they know of and explaining them to the user.

    “One of the things we found is the surprising fallibility of these models,” said Collins. “Sometimes, these LLMs will be really good at higher-level mathematics, and then they’ll fail at something far simpler. It shows that it’s vital to think carefully about how to use LLMs effectively and appropriately.”
    However, like the LLMs, the human participants also made mistakes. The researchers asked participants to rate how confident they were in their own ability to solve the problem they were using the LLM for. In cases where the participant was less confident in their own abilities, they were more likely to rate incorrect generations by the LLM as correct.
    “This kind of gets to a big challenge of evaluating LLMs, because they’re getting so good at generating nice, seemingly correct natural language, that it’s easy to be fooled by their responses,” said Jiang. “It also shows that while human evaluation is useful and important, it’s nuanced, and sometimes it’s wrong. Anyone using an LLM, for any application, should always pay attention to the output and verify it themselves.”
    Based on the results from CheckMate, the researchers say that newer generations of LLMs are increasingly able to collaborate helpfully and correctly with human users on undergraduate-level maths problems, as long as the user can assess the correctness of LLM-generated responses. Even if the answers may be memorised and can be found somewhere on the internet, LLMs have the advantage over traditional search engines of being flexible in their inputs and outputs (though they should not replace search engines in their current form).
    While CheckMate was tested on mathematical problems, the researchers say their platform could be adapted to a wide range of fields. In the future, this type of feedback could be incorporated into the LLMs themselves, although none of the CheckMate feedback from the current study has been fed back into the models.
    “These kinds of tools can help the research community to have a better understanding of the strengths and weaknesses of these models,” said Collins. “We wouldn’t use them as tools to solve complex mathematical problems on their own, but they can be useful assistants, if the users know how to take advantage of them.”
    The research was supported in part by the Marshall Commission, the Cambridge Trust, Peterhouse, Cambridge, The Alan Turing Institute, the European Research Council, and the Engineering and Physical Sciences Research Council (EPSRC), part of UK Research and Innovation (UKRI).

  • Microscope system sharpens scientists’ view of neural circuit connections

    The brain’s ability to learn comes from “plasticity,” in which neurons constantly edit and remodel the tiny connections called synapses that they make with other neurons to form circuits. To study plasticity, neuroscientists seek to track it at high resolution across whole cells, but plasticity doesn’t wait for slow microscopes, and brain tissue is notorious for scattering light and blurring images. In a paper in Scientific Reports, a collaboration of MIT engineers and neuroscientists describes a new microscopy system designed for fast, clear, and frequent imaging of the living brain.
    The system, called “multiline orthogonal scanning temporal focusing” (mosTF), works by scanning brain tissue with lines of light in perpendicular directions. As with other live brain imaging systems that rely on “two-photon microscopy,” this scanning light “excites” photon emission from brain cells that have been engineered to fluoresce when stimulated. In the team’s tests, the new system proved eight times faster than a two-photon scope that scans point by point, and it achieved a four-fold better signal-to-background ratio (a measure of image clarity) than a two-photon system that scans in only one direction.
    “Tracking rapid changes in circuit structure in the context of the living brain remains a challenge,” said co-author Elly Nedivi, William R. (1964) and Linda R. Young Professor of Neuroscience in The Picower Institute for Learning and Memory and MIT’s Departments of Biology and Brain and Cognitive Sciences. “While two-photon microscopy is the only method that allows high resolution visualization of synapses deep in scattering tissue, such as the brain, the required point by point scanning is mechanically slow. The mosTF system significantly reduces scan time without sacrificing resolution.”
    Scanning a whole line of a sample is inherently faster than just scanning one point at a time, but it kicks up a lot of scattering. To manage that scattering, some scope systems just discard scattered photons as noise, but then they are lost, said lead author Yi Xue, an assistant professor at UC Davis and a former graduate student in the lab of corresponding author Peter T.C. So, professor of mechanical engineering and biological engineering at MIT. Newer single-line systems and mosTF produce a stronger signal (thereby resolving smaller and fainter features of stimulated neurons) by algorithmically reassigning scattered photons back to their origin. In a two-dimensional image, that process is better accomplished with the information produced by a two-dimensional, perpendicular-direction system such as mosTF than by a one-dimensional, single-direction system, Xue said.
    “Our excitation light is a line rather than a point — more like a light tube than a light bulb — but the reconstruction process can only reassign photons to the excitation line and cannot handle scattering within the line,” Xue explained. “Therefore, scattering correction is only performed along one dimension for a 2D image. To correct scattering in both dimensions, we need to scan the sample and correct scattering along the other dimension as well, resulting in an orthogonal scanning strategy.”
    In the study the team tested their system head-to-head against a point-by-point scope (a two-photon laser scanning microscope — TPLSM) and a line-scanning temporal focusing microscope (lineTF). They imaged fluorescent beads through water and through a lipid-infused solution that better simulates the kind of scattering that arises in biological tissue. In the lipid solution, mosTF produced images with a 36-times better signal-to-background ratio than lineTF.
    For a more definitive proof, Xue worked with Josiah Boivin in the Nedivi lab to image neurons in the brain of a live, anesthetized mouse, using mosTF. Even in this much more complex environment, where the pulsations of blood vessels and the movement of breathing provide additional confounds, the mosTF scope still achieved a four-fold better signal-to-background ratio. Importantly, it was able to reveal the features where many synapses dwell: the spines that protrude along the vine-like processes, or dendrites, that grow out of the neuron cell body. Monitoring plasticity requires being able to watch those spines grow, shrink, come and go, across the entire cell, Nedivi said.
    “Our continued collaboration with the So lab and their expertise with microscope development has enabled in vivo studies that are unapproachable using conventional, out-of-the-box two photon microscopes,” she added.
    So said he is already planning further improvements to the technology.
    “We’re continuing to work toward the goal of developing even more efficient microscopes to look at plasticity even more efficiently,” he said. “The speed of mosTF is still limited by needing to use high-sensitivity, low-noise cameras that are often slow. We are now working on a next-generation system with new types of detectors, such as hybrid photomultiplier or avalanche photodiode arrays, that are both sensitive and fast.”

  • Unraveling the physics of knitting

    Knitting, the age-old craft of looping and stitching natural fibers into fabrics, has received renewed attention for its potential applications in advanced manufacturing. Far beyond their use for garments, knitted textiles are ideal for designing and fabricating emerging technologies like wearable electronics or soft robotics — structures that need to move and bend.
    Knitting transforms one-dimensional yarn into two-dimensional fabrics that are flexible, durable, and highly customizable in shape and elasticity. But to create smart textile design techniques that engineers can use, understanding the mechanics behind knitted materials is crucial.
    Physicists from the Georgia Institute of Technology have taken the technical know-how of knitting and added mathematical backing to it. In a study led by Elisabetta Matsumoto, associate professor in the School of Physics, and Krishma Singal, a graduate researcher in Matsumoto’s lab, the team used experiments and simulations to quantify and predict how knit fabric response can be programmed. By establishing a mathematical theory of knitted materials, the researchers hope that knitting — and textiles in general — can be incorporated into more engineering applications.
    Their research paper, “Programming Mechanics in Knitted Materials, Stitch by Stitch,” was published in the journal Nature Communications.
    “For centuries, hand knitters have used different types of stitches and stitch combinations to specify the geometry and ‘stretchiness’ of garments, and much of the technical knowledge surrounding knitting has been handed down by word of mouth,” said Matsumoto.
    But while knitting has often been dismissed as unskilled, poorly paid “women’s work,” the properties of knits can be more complex than those of traditional engineering materials like rubbers or metals.
    For this project, the team wanted to decode the underlying principles that direct the elastic behavior of knitted fabrics. These principles are governed by the nuanced interplay of stitch patterns, geometry, and yarn topology — the undercrossings or overcrossings in a knot or stitch. “A lot of yarn isn’t very stretchy, yet once knit into a fabric, the fabric exhibits emergent elastic behavior,” Singal said.

    “Experienced knitters can identify which fabrics are stretchier than others and have an intuition for its best application,” she added. “But by understanding how these fabrics can be programmed and how they behave, we can expand knitting’s application into a variety of fields beyond clothing.”
    Through a combination of experiments and simulations, Matsumoto and Singal explored the relationships among yarn manipulation, stitch patterns, and fabric elasticity, and how these factors work together to affect bulk fabric behavior. They began with physical yarn and fabric stretching experiments to identify main parameters, such as how bendable or fluffy the yarn is, and the length and radius of yarn in a given stitch.
    They then used the experimental results to design simulations that examine the yarn inside a stitch, much like an X-ray. Because it is difficult to see inside stitches during the physical measurements, the simulations reveal which parts of the yarn interact with other parts, and they are built to recreate the physical measurements as accurately as possible.
    Through these experiments and simulations, Singal and Matsumoto showed the profound impact that design variations can have on fabric response and uncovered the remarkable programmability of knitting. “We discovered that by using simple adjustments in how you design a fabric pattern, you can change how stretchy or stiff the bulk fabric is,” Singal said. “How the yarn is manipulated, what stitches are formed, and how the stitches are patterned completely alter the response of the final fabric.”
    Matsumoto envisions that the insights gleaned from their research will enable knitted textile design to become more commonly used in manufacturing and product design. Their discovery that simple stitch patterning can alter a fabric’s elasticity points to knitting’s potential for cutting-edge interactive technologies like soft robotics, wearables, and haptics.
    “We think of knitting as an additive manufacturing technique — like 3D printing, and you can change the material properties just by picking the right stitch pattern,” Singal said.
    Matsumoto and Singal plan to push the boundaries of knitted fabric science even further, as there are still numerous questions about knitted fabrics to be answered.
    “Textiles are ubiquitous and we use them everywhere in our lives,” Matsumoto said. “Right now, the hard part is that designing them for specific properties relies on having a lot of experience and technical intuition. We hope our research helps make textiles a versatile tool for engineers and scientists too.”

  • AI detects more breast cancers with fewer false positives

    Using artificial intelligence (AI), breast radiologists in Denmark have improved breast cancer screening performance and reduced the rate of false-positive findings. Results of the study were published today in Radiology, a journal of the Radiological Society of North America (RSNA).
    Mammography successfully reduces breast cancer mortality, but also carries the risk of false-positive findings. In recent years, researchers have studied the use of AI systems in screening.
    “We believe AI has the potential to improve screening performance,” said Andreas D. Lauritzen, Ph.D., a post-doctoral student at the University of Copenhagen and researcher at Gentofte Hospital in Denmark.
    When used to triage likely normal screening results or assist with decision support, AI also can substantially reduce radiologist workload.
    “Population-based screening with mammography reduces breast cancer mortality, but it places a substantial workload on radiologists who must read a large number of mammograms, the majority of which don’t warrant a recall of the patient,” Dr. Lauritzen said. “The reading workload is further compounded when screening programs employ double reading to improve cancer detection and decrease false-positive recalls.”
    Dr. Lauritzen and colleagues set out to compare workload and screening performance in two cohorts of women who underwent screening before and after AI implementation.
    The retrospective study compared two groups of women between the ages of 50 and 69 who underwent biennial mammography screening in the Capital Region of Denmark.

    In the first group, two radiologists read the mammograms of women screened between October 2020 and November 2021 before the implementation of AI. The screening mammograms of the second group of women performed between November 2021 and October 2022 were initially analyzed by AI. Mammograms deemed likely to be normal by AI were then read by one of 19 specialized full-time breast radiologists (called a single-read). The remaining mammograms were read by two radiologists (called a double-read) with AI-assisted decision support.
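    As a rough illustration of this two-track workflow, the toy function below routes an exam based on an AI suspicion score. The score scale and threshold are invented for illustration and do not reflect the commercial system’s interface.

```python
# Toy sketch of the triage workflow described above (threshold and scale are invented).
def route_mammogram(ai_score: float, likely_normal_threshold: float = 5.0) -> str:
    """Return the reading pathway for a screening exam, given an AI suspicion score on 0-10."""
    if ai_score < likely_normal_threshold:
        return "single read by one specialized breast radiologist"
    return "double read by two radiologists with AI-assisted decision support"

for score in (1.2, 7.8):
    print(score, "->", route_mammogram(score))
```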
    The commercially available AI system used for screening was trained with deep learning to highlight and rate suspicious lesions and calcifications within mammograms. All women who underwent mammographic screening were followed for at least 180 days. Invasive cancers and ductal carcinoma in situ (DCIS) detected through screening were confirmed through needle biopsy or surgical specimens.
    In total, 60,751 women were screened without AI, and 58,246 women were screened with the AI system. In the AI implementation group, 66.9% (38,977) of the screenings were single-read, and 33.1% (19,269) were double-read with AI assistance.
    Compared to screening without AI, screening with the AI system detected significantly more breast cancers (0.82% versus 0.70%) and had a lower false-positive rate (1.63% versus 2.39%).
    “In the AI-screened group, the recall rate decreased by 20.5 percent, and the radiologists’ reading workload was lowered by 33.4 percent,” Dr. Lauritzen said.
    The positive predictive value of AI screening was also greater than that of screening without AI (33.5% versus 22.5%). In the AI group, a higher proportion of invasive cancers detected were 1 centimeter or less in size (44.93% vs. 36.60%).

    “All screening performance indicators improved except for the node-negative rate which showed no evidence of change,” Dr. Lauritzen said.
    Dr. Lauritzen said more research is needed to evaluate long-term outcomes and ensure overdiagnosis does not increase.
    “Radiologists typically have access to the women’s previous screening mammograms, but the AI system does not,” he said. “That’s something we’d like to work on in the future.”
    It is also important to note that not all countries follow the same breast cancer screening protocols and intervals; U.S. breast cancer screening protocols differ from those used in Denmark.

  • A technique for more effective multipurpose robots

    Let’s say you want to train a robot so it understands how to use tools and can then quickly learn to make repairs around your house with a hammer, wrench, and screwdriver. To do that, you would need an enormous amount of data demonstrating tool use.
    Existing robotic datasets vary widely in modality — some include color images while others are composed of tactile imprints, for instance. Data could also be collected in different domains, like simulation or human demos. And each dataset may capture a unique task and environment.
    It is difficult to efficiently incorporate data from so many sources in one machine-learning model, so many methods use just one type of data to train a robot. But robots trained this way, with a relatively small amount of task-specific data, are often unable to perform new tasks in unfamiliar environments.
    In an effort to train better multipurpose robots, MIT researchers developed a technique to combine multiple sources of data across domains, modalities, and tasks using a type of generative AI known as diffusion models.
    They train a separate diffusion model to learn a strategy, or policy, for completing one task using one specific dataset. Then they combine the policies learned by the diffusion models into a general policy that enables a robot to perform multiple tasks in various settings.
    In simulations and real-world experiments, this training approach enabled a robot to perform multiple tool-use tasks and adapt to new tasks it did not see during training. The method, known as Policy Composition (PoCo), led to a 20 percent improvement in task performance when compared to baseline techniques.
    “Addressing heterogeneity in robotic datasets is like a chicken-egg problem. If we want to use a lot of data to train general robot policies, then we first need deployable robots to get all this data. I think that leveraging all the heterogeneous data available, similar to what researchers have done with ChatGPT, is an important step for the robotics field,” says Lirui Wang, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on PoCo.

    Wang’s coauthors include Jialiang Zhao, a mechanical engineering graduate student; Yilun Du, an EECS graduate student; Edward Adelson, the John and Dorothy Wilson Professor of Vision Science in the Department of Brain and Cognitive Sciences and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and senior author Russ Tedrake, the Toyota Professor of EECS, Aeronautics and Astronautics, and Mechanical Engineering, and a member of CSAIL. The research will be presented at the Robotics: Science and Systems Conference.
    Combining disparate datasets
    A robotic policy is a machine-learning model that takes inputs and uses them to perform an action. One way to think about a policy is as a strategy. In the case of a robotic arm, that strategy might be a trajectory, or a series of poses that move the arm so it picks up a hammer and uses it to pound a nail.
    Datasets used to learn robotic policies are typically small and focused on one particular task and environment, like packing items into boxes in a warehouse.
    “Every single robotic warehouse is generating terabytes of data, but it only belongs to that specific robot installation working on those packages. It is not ideal if you want to use all of these data to train a general machine,” Wang says.
    The MIT researchers developed a technique that can take a series of smaller datasets, like those gathered from many robotic warehouses, learn separate policies from each one, and combine the policies in a way that enables a robot to generalize to many tasks.

    They represent each policy using a type of generative AI model known as a diffusion model. Diffusion models, often used for image generation, learn to create new data samples that resemble samples in a training dataset by iteratively refining their output.
    But rather than teaching a diffusion model to generate images, the researchers teach it to generate a trajectory for a robot. They do this by adding noise to the trajectories in a training dataset. The diffusion model gradually removes the noise and refines its output into a trajectory.
    This technique, known as Diffusion Policy, was previously introduced by researchers at MIT, Columbia University, and the Toyota Research Institute. PoCo builds off this Diffusion Policy work.
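    To make the noise-and-denoise idea concrete, here is a hypothetical training-step sketch for a trajectory diffusion policy. The noise schedule and the model’s call signature are simplifying assumptions, not the actual Diffusion Policy implementation.

```python
# Hypothetical DDPM-style training step for a trajectory diffusion policy (assumptions noted).
import torch

def diffusion_training_step(model, optimizer, trajectory, num_diffusion_steps=100):
    """trajectory: tensor of shape (horizon, action_dim) taken from a demonstration dataset."""
    t = torch.randint(0, num_diffusion_steps, (1,))
    alpha = 1.0 - t.float() / num_diffusion_steps     # toy noise schedule for illustration
    noise = torch.randn_like(trajectory)
    noisy = alpha.sqrt() * trajectory + (1 - alpha).sqrt() * noise
    predicted_noise = model(noisy, t)                 # the network learns to predict the added noise
    loss = torch.nn.functional.mse_loss(predicted_noise, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```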
    The team trains each diffusion model with a different type of dataset, such as one with human video demonstrations and another gleaned from teleoperation of a robotic arm.
    Then the researchers perform a weighted combination of the individual policies learned by all the diffusion models, iteratively refining the output so the combined policy satisfies the objectives of each individual policy.
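    A deliberately simplified sketch of that weighted composition is shown below: at every refinement step each policy proposes a denoising update, and the updates are blended with fixed weights. This illustrates the idea as described here, not the actual PoCo algorithm.

```python
# Simplified policy-composition sketch (not the PoCo implementation): blend per-policy
# denoising updates at each refinement step of a trajectory.
import numpy as np

def compose_policies(policies, weights, horizon=16, action_dim=7, steps=50, seed=0):
    """policies: callables (noisy_trajectory, step) -> proposed denoising update."""
    rng = np.random.default_rng(seed)
    traj = rng.normal(size=(horizon, action_dim))        # start from pure noise
    for t in reversed(range(steps)):
        updates = [p(traj, t) for p in policies]         # one proposal per trained policy
        blended = sum(w * u for w, u in zip(weights, updates))
        traj = traj - blended                            # take the blended denoising step
    return traj

# Dummy stand-ins for a simulation-trained and a real-world-trained policy:
sim_policy = lambda x, t: 0.02 * x
real_policy = lambda x, t: 0.02 * (x - 0.1)
trajectory = compose_policies([sim_policy, real_policy], weights=[0.5, 0.5])
```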
    Greater than the sum of its parts
    “One of the benefits of this approach is that we can combine policies to get the best of both worlds. For instance, a policy trained on real-world data might be able to achieve more dexterity, while a policy trained on simulation might be able to achieve more generalization,” Wang says.
    Because the policies are trained separately, one could mix and match diffusion policies to achieve better results for a certain task. A user could also add data in a new modality or domain by training an additional Diffusion Policy with that dataset, rather than starting the entire process from scratch.
    The researchers tested PoCo in simulation and on real robotic arms that performed a variety of tool-use tasks, such as using a hammer to pound a nail and flipping an object with a spatula. PoCo led to a 20 percent improvement in task performance compared to baseline methods.
    “The striking thing was that when we finished tuning and visualized it, we can clearly see that the composed trajectory looks much better than either one of them individually,” Wang says.
    In the future, the researchers want to apply this technique to long-horizon tasks where a robot would pick up one tool, use it, then switch to another tool. They also want to incorporate larger robotics datasets to improve performance.
    “We will need all three kinds of data to succeed for robotics: internet data, simulation data, and real robot data. How to combine them effectively will be the million-dollar question. PoCo is a solid step on the right track,” says Jim Fan, senior research scientist at NVIDIA and leader of the AI Agents Initiative, who was not involved with this work.
    This research is funded, in part, by Amazon, the Singapore Defense Science and Technology Agency, the U.S. National Science Foundation, and the Toyota Research Institute.

  • New machine learning method can better predict spine surgery outcomes

    Researchers who had been using Fitbit data to help predict surgical outcomes have a new method to more accurately gauge how patients may recover from spine surgery.
    Using machine learning techniques developed at the AI for Health Institute at Washington University in St. Louis, Chenyang Lu, the Fullgraf Professor in the university’s McKelvey School of Engineering, collaborated with Jacob Greenberg, MD, assistant professor of neurosurgery at the School of Medicine, to develop a way to predict recovery more accurately from lumbar spine surgery.
    The results, published this month in the journal Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, show that their model outperforms previous models at predicting spine surgery outcomes. This is important because in lower back surgery and many other types of orthopedic operations, outcomes vary widely depending not only on the patient’s structural disease but also on physical and mental health characteristics that differ across patients.
    Surgical recovery is influenced by both preoperative physical and mental health. Some people may experience catastrophizing, or excessive worry, in the face of pain, which can make pain and recovery worse. Others may suffer from physiological problems that cause worse pain. If physicians can get a heads-up on the various pitfalls for each patient, they can develop better individualized treatment plans.
    “By predicting the outcomes before the surgery, we can help establish some expectations and help with early interventions and identify high risk factors,” said Ziqi Xu, a PhD student in Lu’s lab and first author on the paper.
    Previous work in predicting surgery outcomes typically used patient questionnaires given once or twice in clinics that capture only one static slice of time.
    “It failed to capture the long-term dynamics of physical and psychological patterns of the patients,” Xu said. Prior work training machine learning algorithms also focused on just one aspect of surgery outcome while ignoring “the inherent multidimensional nature of surgery recovery,” she added.

    Researchers have used mobile health data from Fitbit devices to monitor and measure recovery and compare activity levels over time, but this research has shown that activity data combined with longitudinal assessment data are more accurate in predicting how the patient will do after surgery, Greenberg said.
    The current work offers a “proof of principle” showing that, with multimodal machine learning, doctors can see a much more accurate “big picture” of all the interrelated factors that affect recovery. Preceding this work, the team first laid out the statistical methods and protocol to ensure they were feeding the AI the right balanced diet of data.
    Prior to the current publication, the team published an initial proof of principle in Neurosurgery showing that patient-reported and objective wearable measurements improve predictions of early recovery compared to traditional patient assessments. In addition to Greenberg and Xu, Madelynn Frumkin, a PhD psychological and brain sciences student in Thomas Rodebaugh’s laboratory in Arts & Sciences, was co-first author on that work. Wilson “Zack” Ray, MD, the Henry G. and Edith R. Schwartz Professor of neurosurgery in the School of Medicine, was co-senior author, along with Rodebaugh and Lu. Rodebaugh is now at the University of North Carolina at Chapel Hill.
    In that research, they show that Fitbit data can be correlated with multiple surveys that assess a person’s social and emotional state. They collected that data via “ecological momentary assessments” (EMAs), which employ smartphones to give patients frequent prompts to assess mood, pain levels and behavior multiple times throughout the day.
    “We combine wearables, EMAs and clinical records to capture a broad range of information about the patients, from physical activities to subjective reports of pain and mental health, and to clinical characteristics,” Lu said.
    Greenberg added that state-of-the-art statistical tools that Rodebaugh and Frumkin have helped advance, such as “Dynamic Structural Equation Modeling,” were key in analyzing the complex, longitudinal EMA data.

    For the most recent study, they took all of those factors and developed a new machine learning technique, “Multi-Modal Multi-Task Learning” (M3TL), to effectively combine these different types of data and predict multiple recovery outcomes.
    In this approach, the AI learns to weigh the relatedness among the outcomes while capturing their differences from the multimodal data, Lu adds.
    The method identifies information shared across the interrelated tasks of predicting different outcomes and leverages it to help the model make accurate predictions, according to Xu.
    It all comes together in a final package that produces a predicted change in each patient’s post-operative pain interference and physical function scores.
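    As a minimal illustration of this multi-task setup, the sketch below uses a shared encoder over concatenated wearable, EMA and clinical features with one output head per recovery outcome. The architecture, dimensions and loss weights are assumptions, not the authors’ M3TL code.

```python
# Minimal multi-task sketch (not the authors' M3TL implementation): shared encoder,
# one regression head per recovery outcome named in the article.
import torch
import torch.nn as nn

class MultiTaskRecoveryModel(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.pain_interference_head = nn.Linear(hidden_dim, 1)
        self.physical_function_head = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        h = self.shared(x)  # information shared across both prediction tasks
        return self.pain_interference_head(h), self.physical_function_head(h)

def multitask_loss(pred_pain, pred_func, true_pain, true_func, weights=(1.0, 1.0)):
    mse = nn.functional.mse_loss
    return weights[0] * mse(pred_pain, true_pain) + weights[1] * mse(pred_func, true_func)
```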
    Greenberg says the study is ongoing as they continue to fine tune their models so they can take these more detailed assessments, predict outcomes and, most notably, “understand what types of factors can potentially be modified to improve longer term outcomes.”
    This study was funded by grants from AO Spine North America, the Cervical Spine Research Society, the Scoliosis Research Society, the Foundation for Barnes-Jewish Hospital, Washington University/BJC Healthcare Big Ideas Competition, the Fullgraf Foundation, and the National Institute of Mental Health (1F31MH124291-01A).