More stories

  • Can AI learn like us?

    It reads. It talks. It collates mountains of data and recommends business decisions. Today’s artificial intelligence might seem more human than ever. However, AI still has several critical shortcomings.
    “As impressive as ChatGPT and all these current AI technologies are, in terms of interacting with the physical world, they’re still very limited. Even in things they do, like solve math problems and write essays, they take billions and billions of training examples before they can do them well,” explains Cold Spring Harbor Laboratory (CSHL) NeuroAI Scholar Kyle Daruwalla.
    Daruwalla has been searching for new, unconventional ways to design AI that can overcome such computational obstacles. And he might have just found one.
    The key was moving data. Nowadays, most of modern computing’s energy consumption comes from bouncing data around. In artificial neural networks, which are made up of billions of connections, data can have a very long way to go. So, to find a solution, Daruwalla looked for inspiration in one of the most computationally powerful and energy-efficient machines in existence — the human brain.
    Daruwalla designed a new way for AI algorithms to move and process data much more efficiently, based on how our brains take in new information. The design allows individual AI “neurons” to receive feedback and adjust on the fly rather than wait for a whole circuit to update simultaneously. This way, data doesn’t have to travel as far and gets processed in real time.
    “In our brains, our connections are changing and adjusting all the time,” Daruwalla says. “It’s not like you pause everything, adjust, and then resume being you.”
    The new machine-learning model provides evidence for an as-yet-unproven theory that correlates working memory with learning and academic performance. Working memory is the cognitive system that enables us to stay on task while recalling stored knowledge and experiences.
    “There have been theories in neuroscience of how working memory circuits could help facilitate learning. But there isn’t something as concrete as our rule that actually ties these two together. And so that was one of the nice things we stumbled into here. The theory led to a rule where adjusting each synapse individually necessitated this working memory sitting alongside it,” says Daruwalla.
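    The gist can be pictured with a toy local-learning sketch (an illustration of the general idea only, not Daruwalla’s published rule; the names and the exact update below are assumptions): each synapse keeps a short-lived, working-memory-like trace of recent activity and uses it, together with locally available feedback, to adjust its own weight on the fly instead of waiting for a global, network-wide update.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    n_in, n_out = 8, 4
    W = rng.normal(scale=0.1, size=(n_out, n_in))   # synaptic weights
    trace = np.zeros_like(W)                        # per-synapse working-memory trace

    def local_step(x, target, lr=0.05, decay=0.9):
        """One on-the-fly update: each synapse adjusts itself using only locally
        available signals (its neuron's error and a decaying memory of recent
        pre/post co-activity), with no global pause-and-update pass."""
        global W, trace
        y = np.tanh(W @ x)                      # neuron outputs
        err = target - y                        # per-neuron feedback signal
        trace = decay * trace + np.outer(y, x)  # leaky trace of recent co-activity
        W += lr * err[:, None] * trace          # immediate, per-synapse adjustment
        return float(np.mean(err ** 2))

    for step in range(500):
        x = rng.normal(size=n_in)
        target = np.tanh(x[:n_out])             # toy task: reproduce part of the input
        mse = local_step(x, target)

    print(f"toy error after 500 local updates: {mse:.3f}")
    ```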
    Daruwalla’s design may help pioneer a new generation of AI that learns like we do. That would not only make AI more efficient and accessible — it would also be somewhat of a full-circle moment for neuroAI. Neuroscience has been feeding AI valuable data since long before ChatGPT uttered its first digital syllable. Soon, it seems, AI may return the favor.

  • Creation of a power-generating, gel electret-based device

    A team of researchers from NIMS, Hokkaido University and Meiji Pharmaceutical University has developed a gel electret capable of stably retaining a large electrostatic charge. The team then combined this gel with highly flexible electrodes to create a sensor capable of perceiving low-frequency vibrations (e.g., vibrations generated by human motion) and converting them into output voltage signals. This device may potentially be used as a wearable healthcare sensor.
    Interest in the development of soft, lightweight, power-generating materials has been growing in recent years for use in soft electronics designed for various purposes, such as healthcare and robotics. Electret materials capable of stably retaining electrostatic charge may be used to develop vibration-powered devices without external power sources. NIMS has been leading efforts to develop a low-volatility, room-temperature alkyl-π liquid composed of a π-conjugated dye moiety and flexible yet branched alkyl chains (a type of hydrocarbon compound). The alkyl-π liquids exhibit excellent charge retention properties, can be applied to other materials (e.g., through painting and impregnation) and are easily formable. However, when these liquids have been combined with electrodes to create flexible devices, they have proven difficult to immobilize and seal, resulting in leakage issues. Moreover, the electrostatic charge retention capacities of alkyl-π liquids needed to be increased in order to improve their power generation capabilities.
    The research team recently succeeded in creating an alkyl-π gel by adding a trace amount of a low-molecular-weight gelator to an alkyl-π liquid. The elastic storage modulus of this gel was found to be 40 million times that of its liquid counterpart, which greatly simplified fixation and sealing. Moreover, the gel electret obtained by charging this gel achieved a 24% increase in charge retention compared to the base material (i.e., the alkyl-π liquid), thanks to the improved confinement of electrostatic charges within the gel. The team then combined flexible electrodes with the gel electret to create a vibration sensor. This sensor was able to perceive vibrations with frequencies as low as 17 Hz and convert them into an output voltage of 600 mV — 83% higher than the voltage generated by an alkyl-π liquid electret-based sensor.
    In future research, the team aims to develop wearable sensors capable of responding to subtle vibrations and various strain deformations by further improving the electret’s charging characteristics (i.e., charge capacity and charge lifetime) and the strength of the alkyl-π gel. Additionally, since this gel is recyclable and reusable as a vibration sensor material, its use is expected to help promote a circular economy.

  • A railroad of cells

    Looking under the microscope, a group of cells slowly moves forward in a line, like a train on the tracks. The cells navigate through complex environments. A new approach by researchers involving the Institute of Science and Technology Austria (ISTA) now shows how they do this and how they interact with each other. The experimental observations and the mathematical framework developed from them are published in Nature Physics.
    The majority of the cells in the human body cannot move. Some specific ones, however, can go to different places. For example, in wound healing, cells move through the body to repair damaged tissue. They travel either alone or in groups of various sizes. Although the process is increasingly understood, little is known about how cells interact while traveling and how they collectively navigate the complex environments found in the body. An interdisciplinary team of theoretical physicists at the Institute of Science and Technology Austria (ISTA) and experimentalists from the University of Mons in Belgium has now gained new insights.
    Much like social dynamics experiments, where understanding the interactions of a small group of people is easier than analyzing an entire society, the scientists studied the traveling behavior of a small group of cells in well-defined in vitro surroundings, i.e. outside a living organism, in a Petri dish equipped with interior features. Based on their findings, they developed a framework of interaction rules, which is now published in Nature Physics.
    Cells travel in trains
    David Brückner rushes back to his office to grab his laptop. “I think it’s better to show some videos of our experiments,” he says excitedly and presses play. The video shows a Petri dish. Microstripes — one-dimensional lanes guiding cell movement — are printed on the substrate beside a zebrafish scale made up of numerous cells. Special wound-healing cells, known as “keratocytes,” start to stretch away from the scale, forming branches into the lanes. “At first, cells stick together through adhesive molecules on their surface — it’s like they’re holding hands,” explains Brückner. Suddenly, the bond breaks off, and the cells assemble into tiny groups, moving forward like trains along tracks. “The length of the train is always different. Sometimes it’s two, sometimes it’s ten. It depends on the initial conditions.”
    Eléonore Vercruysse and Sylvain Gabriele from the University of Mons in Belgium observed this phenomenon while investigating keratocytes and their wound-healing features within different geometrical patterns. To help interpret these puzzling observations, they reached out to theoretical physicists David Brückner and Edouard Hannezo at ISTA.
    Cells have a steering wheel
    “There’s a gradient within each cell that determines where the cell is going. It’s called ‘polarity’ and it’s like the cell’s very own steering wheel,” says Brückner. “Cells communicate their polarity to neighboring cells, allowing them to move in concert.” But how they do so has remained a big puzzle in the field. Brückner and Hannezo started brainstorming. The two scientists developed a mathematical model combining a cell’s polarity, their interactions, and the geometry of their surroundings. They then transferred the framework into computer simulations, which helped them visualize different scenarios.

    The first thing the scientists in Austria looked at was the speed of the cell trains. The simulation revealed that the speed of the trains is independent of their length, whether they consist of two or ten cells. “Imagine if the first cell did all the work, dragging the others behind it; the overall performance would decrease,” says Hannezo. “But that’s not the case. Within the trains, all the cells are polarized in the same direction. They are aligned, in sync, and move forward smoothly.” In other words, the trains operate like an all-wheel drive rather than just a front-wheel drive.
    As a next step, the theoreticians examined the effects of increasing the width of the lanes and the cell clusters in their simulations. Compared to cells moving in single file, clusters were much slower. The explanation is quite simple: the more cells are clustered together, the more they bump into each other. These collisions cause them to polarize away from each other and move in opposite directions. The cells are no longer properly aligned, which disrupts the flow of movement and drastically reduces the overall speed. This phenomenon was also observed in the in vitro experiments in the Belgian lab.
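    A stripped-down simulation conveys the flavour of such a model (the actual framework is in the Nature Physics paper; the parameter values and interaction terms below are illustrative assumptions): each cell carries a polarity that is nudged by contact forces and by its neighbours’ polarity, and the cells are confined to a one-dimensional lane.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    n_cells, dt, steps = 5, 0.01, 2000
    x = np.sort(rng.uniform(0.0, 5.0, n_cells))  # positions along a 1D lane
    p = rng.normal(scale=0.1, size=n_cells)      # polarity: each cell's "steering wheel"

    k_rep = 5.0      # short-range repulsion when cells press against each other
    k_align = 2.0    # how strongly neighbours share their polarity
    tau = 1.0        # polarity relaxation time

    for _ in range(steps):
        # contact forces between neighbouring cells along the lane
        f = np.zeros(n_cells)
        gaps = np.diff(x)
        push = np.where(gaps < 1.0, k_rep * (1.0 - gaps), 0.0)
        f[:-1] -= push                      # left cell of each contact is pushed back
        f[1:] += push                       # right cell is pushed forward
        v = p + f                           # overdamped motion: velocity = polarity + forces

        # polarity relaxes toward the cell's own velocity and aligns with neighbours
        neigh = np.empty(n_cells)
        neigh[1:-1] = 0.5 * (p[:-2] + p[2:])
        neigh[0], neigh[-1] = p[1], p[-2]
        p += ((v - p) / tau + k_align * (neigh - p)) * dt \
             + rng.normal(scale=0.05, size=n_cells) * np.sqrt(dt)
        x += v * dt

    aligned = np.all(np.sign(p) == np.sign(p.mean()))
    print(f"mean polarity {p.mean():+.2f}; train moving as one unit: {aligned}")
    ```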
    Dead end? No problem for cell clusters
    From an efficiency standpoint, it sounds like moving in clusters is not ideal. However, the model predicted that it also had its benefits when cells navigate through complex terrain, as they do, for instance, in the human body. To test this, the scientists added a dead end, both in the experiments and in the simulations. “Trains of cells get to the dead end quickly, but struggle to change direction. Their polarization is well aligned, and it’s very hard for them to agree on switching around,” says Brückner. “Whereas in the cluster, quite a few cells are already polarized in the other direction, making the change of direction way easier.”
    Trains or clusters?
    Naturally, the question arises: when do cells move in clusters, and when do they move in trains? The answer is that both scenarios are observed in nature. For example, some developmental processes rely on clusters of cells moving from one side to the other, while others depend on small trains of cells moving independently. “Our model doesn’t only apply to a single process. Instead, it is a broadly applicable framework showing that placing cells in an environment with geometric constraints is highly instructive, as it challenges them and allows us to decipher their interactions with each other,” Hannezo adds.
    A small train packed with information
    Recent publications by the Hannezo group suggest that cell communication propagates in waves — an interplay between biochemical signals, physical behavior, and motion. The scientists’ new model now provides a physical foundation for these cell-to-cell interactions, possibly aiding in understanding the big picture. Based on this framework, the collaborators can delve deeper into the molecular players involved in this process. According to Brückner, the behaviors revealed by these small cell trains can help us understand large-scale movements, such as those seen in entire tissues.

  • Researchers leverage shadows to model 3D scenes, including objects blocked from view

    Imagine driving through a tunnel in an autonomous vehicle, but unbeknownst to you, a crash has stopped traffic up ahead. Normally, you’d need to rely on the car in front of you to know you should start braking. But what if your vehicle could see around the car ahead and apply the brakes even sooner?
    Researchers from MIT and Meta have developed a computer vision technique that could someday enable an autonomous vehicle to do just that.
    They have introduced a method that creates physically accurate, 3D models of an entire scene, including areas blocked from view, using images from a single camera position. Their technique uses shadows to determine what lies in obstructed portions of the scene.
    They call their approach PlatoNeRF, based on Plato’s allegory of the cave, a passage from the Greek philosopher’s “Republic” in which prisoners chained in a cave discern the reality of the outside world based on shadows cast on the cave wall.
    By combining lidar (light detection and ranging) technology with machine learning, PlatoNeRF can generate more accurate reconstructions of 3D geometry than some existing AI techniques. Additionally, PlatoNeRF is better at smoothly reconstructing scenes where shadows are hard to see, such as those with high ambient light or dark backgrounds.
    In addition to improving the safety of autonomous vehicles, PlatoNeRF could make AR/VR headsets more efficient by enabling a user to model the geometry of a room without the need to walk around taking measurements. It could also help warehouse robots find items in cluttered environments faster.
    “Our key idea was taking these two things that have been done in different disciplines before and pulling them together — multibounce lidar and machine learning. It turns out that when you bring these two together, that is when you find a lot of new opportunities to explore and get the best of both worlds,” says Tzofi Klinghoffer, an MIT graduate student in media arts and sciences, affiliate of the MIT Media Lab, and lead author of a paper on PlatoNeRF.

    Klinghoffer wrote the paper with his advisor, Ramesh Raskar, associate professor of media arts and sciences and leader of the Camera Culture Group at MIT; senior author Rakesh Ranjan, a director of AI research at Meta Reality Labs; as well as Siddharth Somasundaram at MIT, and Xiaoyu Xiang, Yuchen Fan, and Christian Richardt at Meta. The research will be presented at the Conference on Computer Vision and Pattern Recognition.
    Shedding light on the problem
    Reconstructing a full 3D scene from one camera viewpoint is a complex problem.
    Some machine-learning approaches employ generative AI models that try to guess what lies in the occluded regions, but these models can hallucinate objects that aren’t really there. Other approaches attempt to infer the shapes of hidden objects using shadows in a color image, but these methods can struggle when shadows are hard to see.
    For PlatoNeRF, the MIT researchers built on these approaches using a new sensing modality called single-photon lidar. Lidars map a 3D scene by emitting pulses of light and measuring the time it takes that light to bounce back to the sensor. Because single-photon lidars can detect individual photons, they provide higher-resolution data.
    The researchers use a single-photon lidar to illuminate a target point in the scene. Some light bounces off that point and returns directly to the sensor. However, most of the light scatters and bounces off other objects before returning to the sensor. PlatoNeRF relies on these second bounces of light.

    By calculating how long it takes light to bounce twice and then return to the lidar sensor, PlatoNeRF captures additional information about the scene, including depth. The second bounce of light also contains information about shadows.
    The system traces the secondary rays of light — those that bounce off the target point to other points in the scene — to determine which points lie in shadow (due to an absence of light). Based on the location of these shadows, PlatoNeRF can infer the geometry of hidden objects.
    The lidar sequentially illuminates 16 points, capturing multiple images that are used to reconstruct the entire 3D scene.
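    In rough numbers (the values below are made up for illustration, not taken from the paper), the timing measurement constrains geometry as follows: the measured round-trip time gives the total two-bounce path length, and subtracting the known distance to the illuminated target leaves the combined distance from the target to the secondary point and back to the sensor, which confines that secondary point to an ellipsoid.

    ```python
    # Illustrative two-bounce time-of-flight arithmetic (assumed numbers, not PlatoNeRF code).
    C = 299_792_458.0  # speed of light in m/s

    def two_bounce_path_length(round_trip_time_s: float) -> float:
        """Total distance travelled: lidar -> target -> secondary point -> lidar."""
        return C * round_trip_time_s

    t_measured = 40e-9         # assumed 40 ns round trip
    d_lidar_to_target = 3.0    # metres, known because the lidar aimed at this point

    total = two_bounce_path_length(t_measured)
    remaining = total - d_lidar_to_target   # target -> secondary point -> lidar legs
    print(f"total path {total:.2f} m, remaining two legs {remaining:.2f} m")

    # Every secondary point consistent with this timing lies on an ellipsoid whose foci
    # are the target point and the sensor; combining the constraints from many such
    # measurements (16 illuminated points in this work) pins down the visible geometry,
    # and the shadows those points cast reveal what is hidden.
    ```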
    “Every time we illuminate a point in the scene, we are creating new shadows. Because we have all these different illumination sources, we have a lot of light rays shooting around, so we are carving out the region that is occluded and lies beyond the visible eye,” Klinghoffer says.
    A winning combination
    Key to PlatoNeRF is the combination of multibounce lidar with a special type of machine-learning model known as a neural radiance field (NeRF). A NeRF encodes the geometry of a scene into the weights of a neural network, which gives the model a strong ability to interpolate, or estimate, novel views of a scene.
    This ability to interpolate also leads to highly accurate scene reconstructions when combined with multibounce lidar, Klinghoffer says.
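    Loosely speaking, a NeRF is just a small neural network that is queried at 3D coordinates. The sketch below is a generic NeRF-style network, not the PlatoNeRF architecture; the layer sizes and positional encoding are assumptions made for illustration.

    ```python
    import torch
    import torch.nn as nn

    class TinyRadianceField(nn.Module):
        """A toy NeRF-style network: 3D point -> (density, RGB colour)."""
        def __init__(self, n_freqs: int = 6, hidden: int = 64):
            super().__init__()
            self.n_freqs = n_freqs
            in_dim = 3 + 3 * 2 * n_freqs          # raw xyz + sin/cos positional encoding
            self.mlp = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 4),              # density + RGB
            )

        def encode(self, xyz: torch.Tensor) -> torch.Tensor:
            feats = [xyz]
            for k in range(self.n_freqs):
                feats += [torch.sin(2**k * xyz), torch.cos(2**k * xyz)]
            return torch.cat(feats, dim=-1)

        def forward(self, xyz: torch.Tensor):
            out = self.mlp(self.encode(xyz))
            density = torch.relu(out[..., :1])     # non-negative density
            rgb = torch.sigmoid(out[..., 1:])      # colour in [0, 1]
            return density, rgb

    # Querying the field at a batch of 3D points; training against lidar and shadow
    # measurements (as PlatoNeRF does) would adjust the weights so that these outputs
    # match the observed light transport.
    field = TinyRadianceField()
    points = torch.rand(1024, 3)
    density, rgb = field(points)
    print(density.shape, rgb.shape)
    ```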
    “The biggest challenge was figuring out how to combine these two things. We really had to think about the physics of how light is transporting with multibounce lidar and how to model that with machine learning,” he says.
    They compared PlatoNeRF to two common alternative methods, one that only uses lidar and the other that only uses a NeRF with a color image.
    They found that their method was able to outperform both techniques, especially when the lidar sensor had lower resolution. This would make their approach more practical to deploy in the real world, where lower resolution sensors are common in commercial devices.
    “About 15 years ago, our group invented the first camera to ‘see’ around corners, which works by exploiting multiple bounces of light, or ‘echoes of light.’ Those techniques used special lasers and sensors, and used three bounces of light. Since then, lidar technology has become more mainstream, which led to our research on cameras that can see through fog. This new work uses only two bounces of light, which means the signal-to-noise ratio is very high, and 3D reconstruction quality is impressive,” Raskar says.
    In the future, the researchers want to try tracking more than two bounces of light to see how that could improve scene reconstructions. In addition, they are interested in applying more deep learning techniques and combining PlatoNeRF with color image measurements to capture texture information.
    Further information: https://openaccess.thecvf.com/content/CVPR2024/html/Klinghoffer_PlatoNeRF_3D_Reconstruction_in_Platos_Cave_via_Single-View_Two-Bounce_Lidar_CVPR_2024_paper.html

  • Breakthrough may clear major hurdle for quantum computers

    The potential of quantum computers is currently thwarted by a trade-off problem. Quantum systems that can carry out complex operations are less tolerant to errors and noise, while systems that are more protected against noise are harder and slower to compute with. Now a research team from Chalmers University of Technology, in Sweden, has created a unique system that combats the dilemma, thus paving the way for longer computation time and more robust quantum computers.
    For the impact of quantum computers to be realised in society, quantum researchers first need to deal with some major obstacles. So far, errors and noise stemming from, for example, electromagnetic interference or magnetic fluctuations, cause the sensitive qubits to lose their quantum states — and subsequently their ability to continue the calculation. The amount of time a quantum computer can work on a problem is therefore still limited. Additionally, for a quantum computer to be able to tackle complex problems, quantum researchers need to find a way to control the quantum states. Like a car without a steering wheel, quantum states may be considered somewhat useless if there is no efficient control system to manipulate them.
    However, the research field is facing a trade-off problem. Quantum systems that allow for efficient error correction and longer computation times are on the other hand deficient in their ability to control quantum states — and vice versa. But now a research team at Chalmers University of Technology has managed to find a way to battle this dilemma.
    “We have created a system that enables extremely complex operations on a multi-state quantum system, at an unprecedented speed,” says Simone Gasparinetti, leader of the 202Q-lab at Chalmers University of Technology and senior author of the study.
    Deviates from the two-quantum-state principle
    While the building blocks of a classical computer, bits, have either the value 1 or 0, the most common building blocks of quantum computers, qubits, can have the value 1 and 0 at the same time — in any combination. The phenomenon is called superposition and is one of the key ingredients that enable a quantum computer to perform simultaneous calculations, with enormous computing potential as a result. However, qubits encoded in physical systems are extremely sensitive to errors, which has led researchers in the field to search for ways to detect and correct these errors.
    The system created by the Chalmers researchers is based on so-called continuous-variable quantum computing and uses harmonic oscillators, a type of microscopic component, to encode information linearly. The oscillators used in the study consist of thin strips of superconducting material patterned on an insulating substrate to form microwave resonators, a technology fully compatible with the most advanced superconducting quantum computers. The method is previously known in the field and departs from the two-quantum-state principle as it offers a much larger number of physical quantum states, thus making quantum computers significantly better equipped against errors and noise.
    “Think of a qubit as a blue lamp that, quantum mechanically, can be both switched on and off simultaneously. In contrast, a continuous variable quantum system is like an infinite rainbow, offering a seamless gradient of colours. This illustrates its ability to access a vast number of states, providing far richer possibilities than the qubit’s two states,” says Axel Eriksson, researcher in quantum technology at Chalmers University of Technology and lead author of the study.
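    In textbook notation (a standard comparison, not a description of the Chalmers hardware), the difference can be written as:

    ```latex
    % Qubit: a superposition of two basis states.
    |\psi\rangle_{\text{qubit}} = \alpha\,|0\rangle + \beta\,|1\rangle,
    \qquad |\alpha|^2 + |\beta|^2 = 1
    % Harmonic oscillator (continuous-variable system): a superposition over an
    % infinite ladder of photon-number states |n>.
    |\psi\rangle_{\text{osc}} = \sum_{n=0}^{\infty} c_n\,|n\rangle,
    \qquad \sum_{n=0}^{\infty} |c_n|^2 = 1
    ```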

    Combats trade-off problem between operation complexity and fault tolerance
    Although continuous-variable quantum computing based on harmonic oscillators enables improved error correction, its linear nature does not allow for complex operations to be carried out. Attempts to combine harmonic oscillators with control systems such as superconducting quantum systems have been made but have been hindered by the so-called Kerr effect, which scrambles the many quantum states offered by the oscillator, canceling the desired benefit.
    By putting a control device inside the oscillator, the Chalmers researchers were able to circumvent the Kerr effect and combat the trade-off problem. The system preserves the advantages of the harmonic oscillators, such as a resource-efficient path towards fault tolerance, while enabling accurate control of quantum states at high speed. The system is described in an article published in Nature Communications and may pave the way for more robust quantum computers.
    “Our community has often tried to keep superconducting elements away from quantum oscillators, not to scramble the fragile quantum states. In this work, we have challenged this paradigm. By embedding a controlling device at the heart of the oscillator we were able to avoid scrambling the many quantum states while at the same time being able to control and manipulate them. As a result, we demonstrated a novel set of gate operations performed at very high speed,” says Simone Gasparinetti.

  • Advanced artificial intelligence: A revolution for sustainable agriculture

    The rise of advanced artificial intelligence (edge AI) could well mark the beginning of a new era for sustainable agriculture. A recent study proposes a roadmap for integrating this technology into farming practices. The aim? To improve the efficiency, quality and safety of agricultural production, while addressing a range of environmental, social and economic challenges.
    One of the main objectives of sustainable agricultural practices is to efficiently feed a growing world population. Digital technology, such as artificial intelligence (AI), can bring substantial benefits to agriculture by improving farming practices that increase the efficiency, yield, quality and safety of agricultural production. Edge AI refers to the implementation of artificial intelligence in an edge computing environment. “This technology enables calculations to be carried out close to where the data is collected, rather than in a centralized cloud computing facility or off-site datacenter,” explains Moussa El Jarroudi, researcher in Crop Environment and Epidemiology at the University of Liège (Belgium). “This means devices can make smarter decisions faster, without connecting to the cloud or off-site datacenters.”
    In a new study published in the scientific journal Nature Sustainability, a scientific team led by Moussa El Jarroudi demonstrates how to overcome these challenges and how AI can be practically integrated into agricultural systems to meet the growing needs of sustainable food production. “Deploying AI in agriculture is not without its challenges. It requires innovative solutions and the right infrastructure. Experts like Professor Said Hamdioui of Delft University of Technology have developed low-energy systems capable of operating autonomously.” Although challenges remain, particularly in the context of climate change, the prospects opened up by these advances are promising.
    The University of Liège played a crucial role in this study, contributing cutting-edge resources and expertise in the fields of artificial intelligence and sustainable agriculture. ULiège researchers have developed innovative edge AI solutions and conducted in-depth analyses of their potential impact on agricultural practices.
    A new era for agriculture
    “The results of our study are part of a growing trend to integrate advanced technologies into agriculture to achieve sustainability goals,” continues Benoît Mercatoris, co-author of the study and agronomy researcher at ULiège. “The adoption of edge AI can transform agricultural practices by increasing resource efficiency, improving crop quality and reducing environmental impacts. This technology is positioning itself as an essential pillar for the future of sustainable agriculture.”
    The applications are vast: improving crop management with real-time data, optimizing the use of resources such as water and fertilizers, reducing post-harvest losses, increasing food safety, and enhancing monitoring and response capabilities to changing weather conditions. This study paves the way for smarter, more environmentally friendly agriculture, thanks to edge AI: a technological revolution that could well transform the way we produce and consume.

  • Towards wider 5G network coverage: Novel wirelessly powered relay transceiver

    A novel 256-element wirelessly powered transceiver array for non-line-of-sight 5G communication, featuring efficient wireless power transmission and high power conversion efficiency, has been designed by scientists at Tokyo Tech. The innovative design can extend 5G network coverage even to places with link blockage, improving flexibility and coverage area, and potentially making high-speed, low-latency communication more accessible.
    Millimeter-wave 5G communication, which uses extremely high-frequency radio signals (24 to 100 GHz), is a promising technology for next-generation wireless communication, offering high speed, low latency, and large network capacity. However, current 5G networks face two key challenges. The first is a low signal-to-noise ratio (SNR); a high SNR is crucial for reliable communication. The second is link blockage, which refers to the disruption of the signal between transmitter and receiver by obstacles such as buildings.
    Beamforming is a key technique that improves SNR for long-distance millimeter-wave communication. It uses an array of antennas to focus radio signals into a narrow beam in a specific direction, akin to focusing a flashlight beam on a single point. However, it is limited to line-of-sight communication, in which the transmitter and receiver must have an unobstructed straight-line path, and the received signal can be degraded by obstacles. Furthermore, concrete and modern glass materials can cause high propagation losses. Hence, there is an urgent need for a non-line-of-sight (NLoS) relay system to extend 5G network coverage, especially indoors.
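    A back-of-the-envelope illustration of why large arrays help (idealised textbook arithmetic, not a measurement from this study): with perfectly coherent combining, an N-element array improves the received SNR by roughly a factor of N over a single element.

    ```python
    import math

    def ideal_array_gain_db(n_elements: int) -> float:
        """Idealised SNR improvement from coherent combining across an array."""
        return 10.0 * math.log10(n_elements)

    for n in (16, 64, 256):
        print(f"{n:4d} elements -> about {ideal_array_gain_db(n):4.1f} dB of array gain")

    # 256 elements give roughly 24 dB, which is why large arrays are attractive for
    # recovering the SNR lost at millimeter-wave frequencies (real systems fall short
    # of this ideal because of losses, mismatch and imperfect phase alignment).
    ```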
    To address these issues, a team of researchers led by Associate Professor Atsushi Shirane from the Laboratory for Future Interdisciplinary Research of Science and Technology at Tokyo Institute of Technology (Tokyo Tech) designed a novel wirelessly powered relay transceiver for 28 GHz millimeter-wave 5G communication. Their study has been published in the Proceedings of the 2024 IEEE MTT-S International Microwave Symposium.
    Explaining the motivation behind their study, Shirane says, “Previously, for NLoS communication, two types of 5G relays have been explored: an active type and a wirelessly powered type. While the active relay can maintain a good SNR even with few rectifier arrays, it has high power consumption. The wirelessly powered type does not require a dedicated power supply but needs many rectifier arrays to maintain SNR due to low conversion gain, and it uses CMOS diodes with less than ten percent power conversion efficiency. Our design addresses their issues while using commercially available semiconductor integrated circuits (ICs).”
    The proposed transceiver consists of 256 rectifier arrays with 24 GHz wireless power transfer (WPT). These arrays are built from discrete ICs, including gallium arsenide diodes, baluns (which interface between balanced and unbalanced signal lines), DPDT switches, and digital ICs. Notably, the transceiver is capable of simultaneous data and power transmission, converting the 24 GHz WPT signal to direct current (DC) while facilitating 28 GHz bi-directional transmission and reception at the same time. The 24 GHz signal is received at each rectifier individually, while the 28 GHz signal is transmitted and received using beamforming. Both signals can be received from the same or different directions, and the 28 GHz signal can be transmitted either by retro-reflection along the direction of the 24 GHz pilot signal or in any other direction.
    Testing revealed that the proposed transceiver can achieve a power conversion efficiency of 54% and a conversion gain of -19 decibels, higher than conventional transceivers, while maintaining SNR over long distances. Additionally, it generates about 56 milliwatts of power, which can be increased even further by adding more arrays. This can also improve the resolution of the transmission and reception beams. “The proposed transceiver can contribute to the deployment of the millimeter-wave 5G network even to places where the link is blocked, improving installation flexibility and coverage area,” remarks Shirane about the benefits of their device.

  • Researchers teach AI to spot what you’re sketching

    A new way to teach artificial intelligence (AI) to understand human line drawings — even from non-artists — has been developed by a team from the University of Surrey and Stanford University.
    The new model approaches human levels of performance in recognising scene sketches.
    Dr Yulia Gryaditskaya, Lecturer at Surrey’s Centre for Vision, Speech and Signal Processing (CVSSP) and Surrey Institute for People-Centred AI (PAI), said:
    “Sketching is a powerful language of visual communication. It is sometimes even more expressive and flexible than spoken language.
    “Developing tools for understanding sketches is a step towards more powerful human-computer interaction and more efficient design workflows. Examples include being able to search for or create images by sketching something.”
    People of all ages and backgrounds use drawings to explore new ideas and communicate. Yet, AI systems have historically struggled to understand sketches.
    AI has to be taught how to understand images. Usually, this involves a labour-intensive process of collecting labels for every pixel in the image. The AI then learns from these labels.

    Instead, the team taught the AI using a combination of sketches and written descriptions. It learned to group pixels, matching them against one of the categories in a description.
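    One generic way to picture that training signal (a toy, CLIP-style illustration with made-up embeddings; it is not the Surrey/Stanford model itself): groups of strokes or pixels and the category words from the written description are mapped into a shared embedding space, and each group is assigned to the category it matches best.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    categories = ["kite", "tree", "giraffe"]   # words taken from a scene description
    # Random vectors standing in for a learned text encoder and a learned sketch encoder.
    text_emb = {c: rng.normal(size=16) for c in categories}
    stroke_groups = {f"group_{i}": rng.normal(size=16) for i in range(5)}

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Assign every group of strokes/pixels to the best-matching description category.
    for name, emb in stroke_groups.items():
        scores = {c: cosine(emb, t) for c, t in text_emb.items()}
        best = max(scores, key=scores.get)
        print(f"{name} -> {best} (similarity {scores[best]:+.2f})")
    ```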
    The resulting AI displayed a much richer and more human-like understanding of these drawings than previous approaches. It correctly identified and labelled kites, trees, giraffes and other objects with 85% accuracy, outperforming other models that relied on labelled pixels.
    As well as identifying objects in a complex scene, it could identify which pen strokes were intended to depict each object. The new method works well with informal sketches drawn by non-artists, as well as drawings of objects it was not explicitly trained on.
    Professor Judith Fan, Assistant Professor of Psychology at Stanford University, said:
    “Drawing and writing are among the most quintessentially human activities and have long been useful for capturing people’s observations and ideas.
    “This work represents exciting progress towards AI systems that understand the essence of the ideas people are trying to get across, regardless of whether they are using pictures or text.”
    The research forms part of Surrey’s Institute for People-Centred AI, and in particular its SketchX programme. Using AI, SketchX seeks to understand the way we see the world by the way we draw it.

    Professor Yi-Zhe Song, Co-director of the Institute for People-Centred AI, and SketchX lead, said:
    “This research is a prime example of how AI can enhance fundamental human activities like sketching. By understanding rough drawings with near-human accuracy, this technology has immense potential to empower people’s natural creativity, regardless of artistic ability.”
    The findings will be presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024, which takes place in Seattle from 17-21 June 2024.