More stories

  • Chatbot outperformed physicians in clinical reasoning in head-to-head study

    ChatGPT-4, an artificial intelligence program designed to understand and generate human-like text, outperformed internal medicine residents and attending physicians at two academic medical centers at processing medical data and demonstrating clinical reasoning. In a research letter published in JAMA Internal Medicine, physician-scientists at Beth Israel Deaconess Medical Center (BIDMC) compared a large language model’s (LLM) reasoning abilities directly against human performance using standards developed to assess physicians.
    “It became clear very early on that LLMs can make diagnoses, but anybody who practices medicine knows there’s a lot more to medicine than that,” said Adam Rodman, MD, an internal medicine physician and investigator in the department of medicine at BIDMC. “There are multiple steps behind a diagnosis, so we wanted to evaluate whether LLMs are as good as physicians at doing that kind of clinical reasoning. It’s a surprising finding that these things are capable of showing reasoning equivalent to or better than people’s throughout the evolution of a clinical case.”
    Rodman and colleagues used a previously validated tool developed to assess physicians’ clinical reasoning, called the revised-IDEA (r-IDEA) score. The investigators recruited 21 attending physicians and 18 residents, who each worked through one of 20 selected clinical cases composed of four sequential stages of diagnostic reasoning. The authors instructed the physicians to write out and justify their differential diagnoses at each stage. The chatbot GPT-4 was given a prompt with identical instructions and ran all 20 clinical cases. The answers were then scored for clinical reasoning (r-IDEA score) and several other measures of reasoning.
    “The first stage is the triage data, when the patient tells you what’s bothering them and you obtain vital signs,” said lead author Stephanie Cabral, MD, a third-year internal medicine resident at BIDMC. “The second stage is the system review, when you obtain additional information from the patient. The third stage is the physical exam, and the fourth is diagnostic testing and imaging.”
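    To make the staged protocol concrete, here is a hypothetical sketch of how revealing a case stage by stage to a large language model could look in code. The actual prompts, case text, and model settings used in the study are not given in the article, so every detail below (the OpenAI client usage, the prompt wording, the stage labels' exact phrasing) is illustrative rather than the authors' setup.

    ```python
    # Hypothetical sketch of staged prompting, loosely following the study design
    # described above. The actual prompts, case text and model settings are not
    # given in the article, so treat every detail here as an assumption.
    from openai import OpenAI

    client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

    STAGES = ["triage data", "system review", "physical exam",
              "diagnostic testing and imaging"]

    def staged_differentials(case_stages: list[str]) -> list[str]:
        """Reveal one clinical case stage by stage and collect the model's written,
        justified differential diagnosis after each stage (for later r-IDEA scoring)."""
        answers, history = [], ""
        for stage_name, stage_text in zip(STAGES, case_stages):
            history += f"\n--- {stage_name} ---\n{stage_text}"
            response = client.chat.completions.create(
                model="gpt-4",  # assumed model identifier
                messages=[
                    {"role": "system",
                     "content": ("You are working through a clinical case. After each stage, "
                                 "list your differential diagnoses and justify each one.")},
                    {"role": "user", "content": history},
                ],
            )
            answers.append(response.choices[0].message.content)
        return answers
    ```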
    Rodman, Cabral and their colleagues found that the chatbot earned the highest r-IDEA scores, with a median score of 10 out of 10 for the LLM, 9 for attending physicians and 8 for residents. It was more of a draw between the humans and the bot when it came to diagnostic accuracy (how high the correct diagnosis appeared on the list of diagnoses they provided) and correct clinical reasoning. But the bot was also “just plain wrong,” with instances of incorrect reasoning in its answers, significantly more often than the residents, the researchers found. The finding underscores the notion that AI will likely be most useful as a tool to augment, not replace, the human reasoning process.
    “Further studies are needed to determine how LLMs can best be integrated into clinical practice, but even now, they could be useful as a checkpoint, helping us make sure we don’t miss something,” Cabral said. “My ultimate hope is that AI will improve the patient-physician interaction by reducing some of the inefficiencies we currently have and allow us to focus more on the conversation we’re having with our patients.”
    “Early studies suggested AI could make diagnoses if all the information was handed to it,” Rodman said. “What our study shows is that AI demonstrates real reasoning — maybe better reasoning than people through multiple steps of the process. We have a unique chance to improve the quality and experience of healthcare for patients.”
    Co-authors included Zahir Kanjee, MD, Philip Wilson, MD, and Byron Crowe, MD, of BIDMC; Daniel Restrepo, MD, of Massachusetts General Hospital; and Raja-Elie Abdulnour, MD, of Brigham and Women’s Hospital.
    This work was conducted with support from Harvard Catalyst | The Harvard Clinical and Translational Science Center (National Center for Advancing Translational Sciences, National Institutes of Health) (award UM1TR004408) and financial contributions from Harvard University and its affiliated academic healthcare centers.
    Potential Conflicts of Interest: Rodman reports grant funding from the Gordon and Betty Moore Foundation. Crowe reports employment and equity in Solera Health. Kanjee reports receipt of royalties for books edited and membership on a paid advisory board for medical education products not related to AI from Wolters Kluwer, as well as honoraria for continuing medical education delivered from Oakstone Publishing. Abdulnour reports employment by the Massachusetts Medical Society (MMS), a not-for-profit organization that owns NEJM Healer. Abdulnour does not receive royalties from sales of NEJM Healer and does not have equity in NEJM Healer. No funding was provided by the MMS for this study. Abdulnour reports grant funding from the Gordon and Betty Moore Foundation via the National Academy of Medicine Scholars in Diagnostic Excellence.

  • Research reveals language barriers limit effectiveness of cybersecurity resources

    The idea for Fawn Ngo’s latest research came from a television interview.
    Ngo, a University of South Florida criminologist, had spoken with a Vietnamese language network in California about her interest in better understanding how people become victims of cybercrime.
    Afterward, she began receiving phone calls from viewers recounting their own experiences of victimization.
    “Some of the stories were unfortunate and heartbreaking,” said Ngo, an associate professor in the USF College of Behavioral and Community Sciences. “They made me wonder about the availability and accessibility of cybersecurity information and resources for non-English speakers. Upon investigating further, I discovered that such information and resources were either limited or nonexistent.”
    The result is what’s believed to be the first study to explore the links among demographic characteristics, cyber hygiene practices and cyber victimization using a sample of limited English proficiency internet users.
    Ngo is the lead author of an article, “Cyber Hygiene and Cyber Victimization Among Limited English Proficiency (LEP) Internet Users: A Mixed-Method Study,” which was just published in the journal Victims & Offenders. The article’s co-authors are Katherine Holman, a USF graduate student and former Georgia state prosecutor, and Anurag Agarwal, professor of information systems, analytics and supply chain at Florida Gulf Coast University.
    Their research, which focused on Spanish and Vietnamese speakers, led to two closely connected main takeaways: LEP internet users share the same concern about cyber threats and the same desire for online safety as any other individual, but they are constrained by a lack of culturally and linguistically appropriate resources, which also hampers accurate collection of cyber victimization data among vulnerable populations. Online guidance that provides the most effective educational tools and reporting forms is available only in English; the most notable example is the website of the Internet Crime Complaint Center, which serves as the FBI’s primary apparatus for combatting cybercrime. As a result, the study showed that many well-intentioned LEP users still engage in risky online behaviors such as using unsecured networks and sharing passwords. For example, only 29 percent of the study’s focus group participants avoided using public Wi-Fi over the previous 12 months, and only 17 percent said they had antivirus software installed on their digital devices.

    Previous research cited in Ngo’s paper has shown that underserved populations exhibit poorer cybersecurity knowledge and outcomes, most commonly in the form of computer viruses and hacked accounts, including social media accounts. Often, that is because they lack awareness and understanding, not because of disinterest, Ngo said.
    “According to cybersecurity experts, humans are the weakest link in the chain of cybersecurity,” Ngo said. “If we want to secure our digital borders, we must ensure that every member in society, regardless of their language skills, is well-informed about the risks inherent in the cyber world.”
    The study’s findings point to a need for providing cyber hygiene information and resources in multiple formats, including visual aids and audio guides, to accommodate diverse literacy levels within LEP communities, Ngo said. She added that further research is needed to address the current security gap and ensure equitable access to cybersecurity resources for all Internet users.
    In the meantime, Ngo is preparing to launch a website with cybersecurity information and resources in different languages and a link to report victimization.
    “It’s my hope that cybersecurity information and resources will become as readily accessible in other languages as other vital information, such as information related to health and safety,” Ngo said. “I also want LEP victims to be included in national data and statistics on cybercrime and their experiences accurately represented and addressed in cybersecurity initiatives.”

  • Artificial intelligence boosts super-resolution microscopy

    Generative artificial intelligence (AI) might be best known from text- or image-creating applications like ChatGPT or Stable Diffusion, but its usefulness is increasingly being demonstrated in other scientific fields. In recent work to be presented at the upcoming International Conference on Learning Representations (ICLR), researchers from the Center for Advanced Systems Understanding (CASUS) at the Helmholtz-Zentrum Dresden-Rossendorf (HZDR), in collaboration with colleagues from Imperial College London and University College London, have introduced a new open-source algorithm called the Conditional Variational Diffusion Model (CVDM). Based on generative AI, this model improves the quality of images by reconstructing them from randomness. In addition, the CVDM is computationally less expensive than established diffusion models, and it can be easily adapted for a variety of applications.
    With the advent of big data and new mathematical and data science methods, researchers aim to decipher as-yet-unexplained phenomena in biology, medicine or the environmental sciences using inverse problem approaches. Inverse problems deal with recovering the causal factors that led to certain observations. Suppose you have a grayscale version of an image and want to recover the colors: there are usually several valid solutions, as, for example, a light blue and a light red shirt look identical in the grayscale image. The solution to this inverse problem can therefore be the image with the light blue shirt or the one with the light red shirt.
    Analyzing microscopic images can also be a typical inverse problem. “You have an observation: your microscopic image. Applying some calculations, you then can learn more about your sample than first meets the eye,” says Gabriel della Maggiora, PhD student at CASUS and lead author of the ICLR paper. The results can be higher-resolution or better-quality images. However, the path from the observations, i.e. the microscopic images, to the “super images” is usually not obvious. Additionally, observational data is often noisy, incomplete, or uncertain. This all adds to the complexity of solving inverse problems, making them exciting mathematical challenges.
    The power of generative AI models like Sora
    One of the powerful tools to tackle inverse problems with is generative AI. Generative AI models in general learn the underlying distribution of the data in a given training dataset. A typical example is image generation. After the training phase, generative AI models generate completely new images that are, however, consistent with the training data.
    Among the different generative AI variations, a particular family named diffusion models has recently gained popularity among researchers. With diffusion models, an iterative data generation process starts from basic noise, a concept used in information theory to mimic the effect of many random processes that occur in nature. Concerning image generation, diffusion models have learned which pixel arrangements are common and uncommon in the training dataset images. They generate the new desired image bit by bit until a pixel arrangement coincides best with the underlying structure of the training data. A good example of the power of diffusion models is the US software company OpenAI’s text-to-video model Sora. An implemented diffusion component gives Sora the ability to generate videos that appear more realistic than anything AI models have created before.
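    To make the iterative generation idea tangible, here is a minimal, self-contained sketch of a standard reverse diffusion loop with a fixed, hand-chosen noise schedule. It is a toy illustration, not the CVDM or any model from the paper: the "denoiser" is a placeholder, and a real system would use a trained neural network.

    ```python
    # Toy illustration of the iterative generation loop in a diffusion model:
    # start from pure noise and remove a little of it at every step, following a
    # fixed, hand-chosen noise schedule. The "denoiser" here is a placeholder; a
    # real model would use a trained neural network and produce meaningful images.
    import numpy as np

    T = 1000                                    # number of diffusion steps
    betas = np.linspace(1e-4, 0.02, T)          # predefined (fixed) noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    def dummy_denoiser(x, t):
        """Stand-in for a trained network that predicts the noise present in x at step t."""
        return np.zeros_like(x)

    rng = np.random.default_rng(0)
    x = rng.standard_normal((28, 28))           # begin with pure Gaussian noise
    for t in reversed(range(T)):
        eps = dummy_denoiser(x, t)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                               # re-inject a bit of noise, except at the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    # x is now a "generated" sample (meaningless here because the denoiser is untrained)
    ```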
    But there is one drawback. “Diffusion models have long been known as computationally expensive to train. Some researchers were recently giving up on them exactly for that reason,” says Dr. Artur Yakimovich, Leader of a CASUS Young Investigator Group and corresponding author of the ICLR paper. “But new developments like our Conditional Variational Diffusion Model allow minimizing ‘unproductive runs’, which do not lead to the final model. By lowering the computational effort and hence power consumption, this approach may also make diffusion models more eco-friendly to train.”
    Clever training does the trick — not just in sports

    These ‘unproductive runs’ are an important drawback of diffusion models. One of the reasons is that the model is sensitive to the choice of the predefined schedule controlling the dynamics of the diffusion process. The schedule governs how the noise is added: too little or too much, in the wrong place or at the wrong time, and there are many possible scenarios that end in a failed training run. So far, this schedule has been set as a hyperparameter that has to be tuned for each and every new application. In other words, while designing the model, researchers usually settle on a schedule by trial and error. In the new paper presented at ICLR, the authors instead incorporated the schedule into the training phase, so that their CVDM is capable of finding the optimal schedule on its own. The model then yielded better results than other models relying on a predefined schedule.
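    The following PyTorch fragment sketches what it means to make the schedule a trainable component rather than a fixed hyperparameter. It illustrates the general idea only, not the CVDM's actual formulation; the parameterization, network, and optimizer choices are assumptions.

    ```python
    # Sketch of the general idea of a learnable noise schedule (not the CVDM itself):
    # the schedule becomes a trainable module optimized jointly with the denoiser,
    # instead of a hyperparameter tuned by trial and error.
    import torch
    import torch.nn as nn

    class LearnableSchedule(nn.Module):
        def __init__(self, T: int = 1000):
            super().__init__()
            self.raw = nn.Parameter(torch.zeros(T))  # unconstrained parameters

        def forward(self) -> torch.Tensor:
            # softplus + cumsum keep the noise levels positive and monotonically increasing
            increments = torch.nn.functional.softplus(self.raw)
            gamma = torch.cumsum(increments, dim=0)
            return gamma / gamma[-1]                 # normalized noise levels in (0, 1]

    schedule = LearnableSchedule()
    denoiser = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 784))
    optimizer = torch.optim.Adam(
        list(denoiser.parameters()) + list(schedule.parameters()), lr=1e-4
    )
    # In a full training loop (not shown), the diffusion loss would use noise levels
    # taken from schedule(), so gradients would update the schedule alongside the denoiser.
    ```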
    Among other applications, the authors demonstrated the applicability of the CVDM to a scientific problem: super-resolution microscopy, a typical inverse problem. Super-resolution microscopy aims to overcome the diffraction limit, a limit that restricts resolution due to the optical characteristics of the microscope system. To surmount this limit algorithmically, data scientists reconstruct higher-resolution images by eliminating both blurring and noise from recorded, limited-resolution images. In this scenario, the CVDM yielded comparable or even superior results compared to commonly used methods.
    “Of course, there are several methods out there to increase the meaningfulness of microscopic images — some of them relying on generative AI models,” says Yakimovich. “But we believe that our approach has some new unique properties that will leave an impact in the imaging community, namely high flexibility and speed at a comparable or even better quality compared to other diffusion model approaches. In addition, our CVDM provides direct hints where it is not very sure about the reconstruction — a very helpful property that sets the path forward to address these uncertainties in new experiments and simulations.”

  • Revolutionary biomimetic olfactory chips to enable advanced gas sensing and odor detection

    A research team led by the School of Engineering of the Hong Kong University of Science and Technology (HKUST) has addressed the long-standing challenge of creating artificial olfactory sensors with arrays of diverse high-performance gas sensors. Their newly developed biomimetic olfactory chips (BOC) are able to integrate nanotube sensor arrays on nanoporous substrates with up to 10,000 individually addressable gas sensors per chip, a configuration that is similar to how olfaction works for humans and other animals.
    For decades, researchers worldwide have been developing artificial olfaction and electronic noses (e-noses) with the aim of emulating the intricate mechanism of the biological olfactory system to effectively discern complex odorant mixtures. Yet major challenges in their development lie in the difficulty of miniaturizing the system and increasing its recognition capabilities in determining the exact gas species and their concentrations within complex odorant mixtures.
    To tackle these issues, the research team led by Prof. FAN Zhiyong, Chair Professor at HKUST’s Department of Electronic & Computer Engineering and Department of Chemical & Biological Engineering, used an engineered material composition gradient that allows for wide arrays of diverse sensors on one small nanostructured chip. Leveraging the power of artificial intelligence, their biomimetic olfactory chips exhibit exceptional sensitivity to various gases with excellent distinguishability for mixed gases and 24 distinct odors. With a vision to expand their olfactory chip’s applications, the team also integrated the chips with vision sensors on a robot dog, creating a combined olfactory and visual system that can accurately identify objects in blind boxes.
    The development of the biomimetic olfactory chips will not only improve the existing broad applications of artificial olfaction and e-nose systems in food, environmental, medical and industrial process control, but also open up new possibilities in intelligent systems, such as advanced robots and portable smart devices, for applications in security patrols and rescue operations.
    For example, in their applications in real-time monitoring and quality control, the biomimetic olfactory chips can be used to detect and analyze specific odors or volatile compounds associated with different stages of industrial processes to ensure safety; detect any abnormal or hazardous gases in environmental monitoring; and identify leakage in pipes to facilitate timely repair.
    The technology presented in this study is a pivotal breakthrough in the realm of odor digitization. While the digitization of visual information has become ubiquitous, facilitated by modern and mature image-sensing technologies, scent-based information has so far remained untapped due to the absence of advanced odor sensors. The work conducted by Prof. Fan’s team paves the way for the development of biomimetic odor sensors with immense potential. With further advancements, these sensors could find widespread use, akin to the ubiquitous presence of miniaturized cameras in cell phones and portable electronics, thereby enriching and enhancing people’s quality of life.
    “In the future, with the development of suitable biocompatible materials, we hope that the biomimetic olfactory chip can also be placed on the human body to allow us to smell odors that normally cannot be smelled. It can also monitor abnormalities in the volatile organic molecules in our breath and emitted by our skin, to warn us of potential diseases, reaching further potential of biomimetic engineering,” said Prof. Fan.

  • Could AI play a role in locating damage to the brain after stroke?

    Artificial intelligence (AI) may serve as a future tool for neurologists to help locate where in the brain a stroke occurred. In a new study, AI processed text from health histories and neurologic examinations to locate lesions in the brain. The study, which looked specifically at the large language model called generative pre-trained transformer 4 (GPT-4), is published in the March 27, 2024, online issue of Neurology® Clinical Practice, an official journal of the American Academy of Neurology.
    A stroke can cause long-term disability or even death. Knowing where a stroke has occurred in the brain helps predict long-term effects such as problems with speech and language or a person’s ability to move part of their body. It can also help determine the best treatment and a person’s overall prognosis.
    Damage to the brain tissue from a stroke is called a lesion. A neurologic exam can help locate lesions, when paired with a review of a person’s health history. The exam involves symptom evaluation and thinking and memory tests. People with stroke often have brain scans to locate lesions.
    “Not everyone with stroke has access to brain scans or neurologists, so we wanted to determine whether GPT-4 could accurately locate brain lesions after stroke based on a person’s health history and a neurologic exam,” said study author Jung-Hyun Lee, MD, of State University of New York (SUNY) Downstate Health Sciences University in Brooklyn and a member of the American Academy of Neurology.
    The study used 46 published cases of people who had stroke. Researchers gathered text from participants’ health histories and neurologic exams. The raw text was fed into GPT-4. Researchers asked it to answer three questions: whether a participant had one or more lesions; on which side of the brain lesions were located; and in which region of the brain the lesions were found. They repeated these questions for each participant three times. Results from GPT-4 were then compared to brain scans for each participant.
    Researchers found that GPT-4 processed the text from the health histories and neurologic exams to locate lesions in many participants’ brains, identifying which side of the brain the lesion was on, as well as the specific brain region, with the exception of lesions in the cerebellum and spinal cord.
    For the majority of people, GPT-4 was able to identify on which side of the brain lesions were found with a sensitivity of 74% and a specificity of 87%. Sensitivity is the percentage of actual positives that are correctly identified as positive. Specificity is the percentage of negatives that are correctly identified. It also identified the brain region with a sensitivity of 85% and a specificity of 94%.
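    For reference, the sketch below shows how these two rates are computed from raw counts. The individual counts are invented for illustration only, since the article reports just the resulting percentages.

    ```python
    # Standard definitions of the two reported rates, with made-up counts for
    # illustration only (the article gives the percentages, not the raw counts).
    def sensitivity(true_pos: int, false_neg: int) -> float:
        """Share of actual positives that are correctly identified as positive."""
        return true_pos / (true_pos + false_neg)

    def specificity(true_neg: int, false_pos: int) -> float:
        """Share of actual negatives that are correctly identified as negative."""
        return true_neg / (true_neg + false_pos)

    print(round(sensitivity(34, 12), 2))  # 0.74 -> a 74% sensitivity
    print(round(specificity(40, 6), 2))   # 0.87 -> an 87% specificity
    ```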

    When looking at how often the three tests had the same result for each participant, GPT-4 was consistent for 76% of participants regarding the number of brain lesions. It was consistent for 83% of participants for the side of the brain, and for 87% of participants regarding the brain regions.
    However, when combining its responses to all three questions across all three times, GPT-4 provided accurate answers for 41% of participants.
    “While not yet ready for use in the clinic, large language models such as generative pre-trained transformers have the potential not only to assist in locating lesions after stroke but also to reduce health care disparities, because they can function across different languages,” said Lee. “The potential for use is encouraging, especially due to the great need for improved health care in underserved areas across multiple countries where access to neurologic care is limited.”
    A limitation of the study is that the accuracy of GPT-4 depends on the quality of the information it is provided. While researchers had detailed health histories and neurologic exam information for each participant, such information is not always available for everyone who has a stroke.

  • Robot, can you say ‘cheese’?

    What would you do if you walked up to a robot with a human-like head and it smiled at you first? You’d likely smile back and perhaps feel the two of you were genuinely interacting. But how does a robot know how to do this? Or a better question, how does it know to get you to smile back?
    While we’re getting accustomed to robots that are adept at verbal communication, thanks in part to advancements in large language models like ChatGPT, their nonverbal communication skills, especially facial expressions, have lagged far behind. Designing a robot that can not only make a wide range of facial expressions but also know when to use them has been a daunting task.
    Tackling the challenge
    The Creative Machines Lab at Columbia Engineering has been working on this challenge for more than five years. In a new study published today in Science Robotics, the group unveils Emo, a robot that anticipates facial expressions and executes them simultaneously with a human. It has even learned to predict a forthcoming smile about 840 milliseconds before the person smiles, and to co-express the smile simultaneously with the person.
    The team, led by Hod Lipson, a leading researcher in the fields of artificial intelligence (AI) and robotics, faced two challenges: how to mechanically design an expressively versatile robotic face, which involves complex hardware and actuation mechanisms, and how to know which expression to generate so that it appears natural, timely, and genuine. The team proposed training a robot to anticipate future facial expressions in humans and execute them simultaneously with a person. The timing of these expressions was critical — delayed facial mimicry looks disingenuous, but facial co-expression feels more genuine since it requires correctly inferring the human’s emotional state for timely execution.
    How Emo connects with you
    Emo is a human-like head with a face that is equipped with 26 actuators that enable a broad range of nuanced facial expressions. The head is covered with a soft silicone skin with a magnetic attachment system, allowing for easy customization and quick maintenance. For more lifelike interactions, the researchers integrated high-resolution cameras within the pupil of each eye, enabling Emo to make eye contact, crucial for nonverbal communication.

    The team developed two AI models: one that predicts human facial expressions by analyzing subtle changes in the target face, and another that generates the motor commands needed to produce the corresponding facial expressions.
    To train the robot to make facial expressions, the researchers put Emo in front of a camera and let it make random movements. After a few hours, the robot had learned the relationship between its facial expressions and the motor commands — much the way humans practice facial expressions by looking in the mirror. This is what the team calls “self modeling” — similar to our human ability to imagine what we look like when we make certain expressions.
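    A minimal sketch of this self-modeling step might look like the following. The data, the facial-landmark representation, the network size, and the training details are all assumptions made for illustration; the article does not describe Emo's actual model architecture or data format.

    ```python
    # Illustrative sketch of the self-modeling step: random motor commands are issued,
    # the resulting facial configuration is recorded by a camera, and a network learns
    # to map observed expressions back to the commands that produced them. The data
    # here are synthetic and the representation (face landmarks) is an assumption.
    import torch
    import torch.nn as nn

    N_ACTUATORS = 26          # Emo's actuator count, per the article
    N_LANDMARKS = 2 * 68      # assumed: 68 (x, y) facial landmarks from the camera

    # Stand-ins for logged self-exploration data.
    commands = torch.rand(5000, N_ACTUATORS)
    landmarks = torch.rand(5000, N_LANDMARKS)

    inverse_model = nn.Sequential(
        nn.Linear(N_LANDMARKS, 128), nn.ReLU(), nn.Linear(128, N_ACTUATORS), nn.Sigmoid()
    )
    optimizer = torch.optim.Adam(inverse_model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for epoch in range(10):   # a real run would train for much longer on real logs
        pred = inverse_model(landmarks)
        loss = loss_fn(pred, commands)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # At run time, an anticipated human expression could be fed through inverse_model
    # to obtain motor commands that reproduce (co-express) it on the robot's face.
    ```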
    Then the team ran videos of human facial expressions for Emo to observe frame by frame. After training, which lasted a few hours, Emo could predict people’s facial expressions by observing tiny changes in their faces as they begin to form an intent to smile.
    “I think predicting human facial expressions accurately is a revolution in HRI [human-robot interaction]. Traditionally, robots have not been designed to consider humans’ expressions during interactions. Now, the robot can integrate human facial expressions as feedback,” said the study’s lead author Yuhang Hu, a PhD student in Lipson’s lab at Columbia Engineering. “When a robot makes co-expressions with people in real time, it not only improves the interaction quality but also helps in building trust between humans and robots. In the future, when interacting with a robot, it will observe and interpret your facial expressions, just like a real person.”
    What’s next
    The researchers are now working to integrate verbal communication, using a large language model like ChatGPT, into Emo. As robots become more capable of behaving like humans, Lipson is well aware of the ethical considerations associated with this new technology.
    “Although this capability heralds a plethora of positive applications, ranging from home assistants to educational aids, it is incumbent upon developers and users to exercise prudence and ethical considerations,” says Lipson, James and Sally Scapa Professor of Innovation in the Department of Mechanical Engineering at Columbia Engineering, co-director of the Makerspace at Columbia, and a member of the Data Science Institute. “But it’s also very exciting — by advancing robots that can interpret and mimic human expressions accurately, we’re moving closer to a future where robots can seamlessly integrate into our daily lives, offering companionship, assistance, and even empathy. Imagine a world where interacting with a robot feels as natural and comfortable as talking to a friend.”

  • More efficient TVs, screens and lighting

    New multidisciplinary research from the University of St Andrews could lead to more efficient televisions, computer screens and lighting.
    Researchers at the Organic Semiconductor Centre in the School of Physics and Astronomy, and the School of Chemistry have proposed a new approach to designing efficient light-emitting materials in a paper published this week in Nature (27 March).
    Light-emitting materials are used in organic light-emitting diodes (OLEDs) that are now found in the majority of mobile phone displays and smartwatches, and some televisions and automotive lighting.
    The latest generation of emitter materials under development produces OLEDs that have high efficiency at low brightness but suffer reduced efficiency as the brightness is increased to the levels required for lighting and outdoor applications. This problem is known as ‘efficiency roll-off’.
    Researchers have identified the combination of features of materials required to overcome this problem. Guidelines developed by the team of researchers, led by Professor Ifor Samuel and Professor Eli Zysman-Colman, will help OLED researchers develop materials that maintain high efficiency at high brightness, enabling the latest materials to be used for applications in displays, lighting and medicine.
    Commenting on the research, Professor Zysman-Colman explained that the findings “provide clearer insight into the link between the properties of the emitter material and the performance of the OLED.”
    Professor Samuel said, “Our new approach to this problem will help to develop bright, efficient and colourful OLEDs that use less power.”

  • New software enables blind and low-vision users to create interactive, accessible charts

    A growing number of tools enable users to make online data representations, like charts, that are accessible for people who are blind or have low vision. However, most tools require an existing visual chart that can then be converted into an accessible format.
    This creates barriers that prevent blind and low-vision users from building their own custom data representations, and it can limit their ability to explore and analyze important information.
    A team of researchers from MIT and University College London (UCL) wants to change the way people think about accessible data representations.
    They created a software system called Umwelt (which means “environment” in German) that can enable blind and low-vision users to build customized, multimodal data representations without needing an initial visual chart.
    Umwelt, an authoring environment designed for screen-reader users, incorporates an editor that allows someone to upload a dataset and create a customized representation, such as a scatterplot, that can include three modalities: visualization, textual description, and sonification. Sonification involves converting data into nonspeech audio.
    The system, which can represent a variety of data types, includes a viewer that enables a blind or low-vision user to interactively explore a data representation, seamlessly switching between each modality to interact with data in a different way.
    The researchers conducted a study with five expert screen-reader users who found Umwelt to be useful and easy to learn. In addition to offering an interface that empowered them to create data representations — something they said was sorely lacking — the users said Umwelt could facilitate communication between people who rely on different senses.

    “We have to remember that blind and low-vision people aren’t isolated. They exist in these contexts where they want to talk to other people about data,” says Jonathan Zong, an electrical engineering and computer science (EECS) graduate student and lead author of a paper introducing Umwelt. “I am hopeful that Umwelt helps shift the way that researchers think about accessible data analysis. Enabling the full participation of blind and low-vision people in data analysis involves seeing visualization as just one piece of this bigger, multisensory puzzle.”
    Joining Zong on the paper are fellow EECS graduate students Isabella Pedraza Pineros and Mengzhu “Katie” Chen; Daniel Hajas, a UCL researcher who works with the Global Disability Innovation Hub; and senior author Arvind Satyanarayan, associate professor of computer science at MIT, who leads the Visualization Group in the Computer Science and Artificial Intelligence Laboratory. The paper will be presented at the ACM Conference on Human Factors in Computing Systems (CHI).
    De-centering visualization
    The researchers previously developed interactive interfaces that provide a richer experience for screen reader users as they explore accessible data representations. Through that work, they realized most tools for creating such representations involve converting existing visual charts.
    Aiming to decenter visual representations in data analysis, Zong and Hajas, who lost his sight at age 16, began co-designing Umwelt more than a year ago.
    At the outset, they realized they would need to rethink how to represent the same data using visual, auditory, and textual forms.

    “We had to put a common denominator behind the three modalities. By creating this new language for representations, and making the output and input accessible, the whole is greater than the sum of its parts,” says Hajas.
    To build Umwelt, they first considered what is unique about the way people use each sense.
    For instance, a sighted user can see the overall pattern of a scatterplot and, at the same time, move their eyes to focus on different data points. But for someone listening to a sonification, the experience is linear since data are converted into tones that must be played back one at a time.
    “If you are only thinking about directly translating visual features into nonvisual features, then you miss out on the unique strengths and weaknesses of each modality,” Zong adds.
    They designed Umwelt to offer flexibility, enabling a user to switch between modalities easily when one would better suit their task at a given time.
    To use the editor, one uploads a dataset to Umwelt, which employs heuristics to automatically create default representations in each modality.
    If the dataset contains stock prices for companies, Umwelt might generate a multiseries line chart, a textual structure that groups data by ticker symbol and date, and a sonification that uses tone length to represent the price for each date, arranged by ticker symbol.
    The default heuristics are intended to help the user get started.
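    The following sketch illustrates the kind of heuristic that could map a dataset's field types to default specifications for the three modalities. It is not Umwelt's actual implementation, schema, or programming language; the field-type labels and spec structure are invented for illustration.

    ```python
    # Invented sketch of default-representation heuristics; this is not Umwelt's
    # implementation, schema, or language. It only shows how field types might be
    # mapped to starter specs for the visual, textual, and sonified modalities.
    def default_specs(fields: dict[str, str]) -> dict:
        """fields maps column names to rough types: 'quantitative', 'temporal', 'nominal'.
        Assumes at least one quantitative field is present."""
        quant = [f for f, t in fields.items() if t == "quantitative"]
        temporal = [f for f, t in fields.items() if t == "temporal"]
        nominal = [f for f, t in fields.items() if t == "nominal"]

        visual = {"mark": "line" if temporal else "point",
                  "x": (temporal or quant)[0], "y": quant[0],
                  "color": nominal[0] if nominal else None}
        textual = {"group_by": nominal + temporal, "describe": quant}
        sonification = {"tone_length": quant[0],             # value mapped to tone duration
                        "sequence_by": (nominal or temporal)[0]}
        return {"visualization": visual, "text": textual, "sonification": sonification}

    # For the stock-price example above:
    print(default_specs({"ticker": "nominal", "date": "temporal", "price": "quantitative"}))
    ```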
    “In any kind of creative tool, you have a blank-slate effect where it is hard to know how to begin. That is compounded in a multimodal tool because you have to specify things in three different representations,” Zong says.
    The editor links interactions across modalities, so if a user changes the textual description, that information is adjusted in the corresponding sonification. Someone could utilize the editor to build a multimodal representation, switch to the viewer for an initial exploration, then return to the editor to make adjustments.
    Helping users communicate about data
    To test Umwelt, they created a diverse set of multimodal representations, from scatterplots to multiview charts, to ensure the system could effectively represent different data types. Then they put the tool in the hands of five expert screen reader users.
    Study participants mostly found Umwelt to be useful for creating, exploring, and discussing data representations. One user said Umwelt was like an “enabler” that decreased the time it took them to analyze data. The users agreed that Umwelt could help them communicate about data more easily with sighted colleagues.
    Moving forward, the researchers plan to create an open-source version of Umwelt that others can build upon. They also want to integrate tactile sensing into the software system as an additional modality, enabling the use of tools like refreshable tactile graphics displays.
    “In addition to its impact on end users, I am hoping that Umwelt can be a platform for asking scientific questions around how people use and perceive multimodal representations, and how we can improve the design beyond this initial step,” says Zong.
    This work was supported, in part, by the National Science Foundation and the MIT Morningside Academy for Design Fellowship.