More stories


    Researchers use generative AI to design novel proteins

    Researchers at the University of Toronto have developed an artificial intelligence system that can create proteins not found in nature using generative diffusion, the same technology behind popular image-creation platforms such as DALL-E and Midjourney.
    The system will help advance the field of generative biology, which promises to speed drug development by making the design and testing of entirely new therapeutic proteins more efficient and flexible.
    “Our model learns from image representations to generate fully new proteins, at a very high rate,” says Philip M. Kim, a professor in the Donnelly Centre for Cellular and Biomolecular Research at U of T’s Temerty Faculty of Medicine. “All our proteins appear to be biophysically real, meaning they fold into configurations that enable them to carry out specific functions within cells.”
    Today, the journal Nature Computational Science published the findings, the first of their kind in a peer-reviewed journal. Kim’s lab also published a pre-print on the model last summer through the open-access server bioRxiv, ahead of two similar pre-prints from last December, RF Diffusion by the University of Washington and Chroma by Generate Biomedicines.
    Proteins are made from chains of amino acids that fold into three-dimensional shapes, which in turn dictate protein function. Those shapes evolved over billions of years and are varied and complex, but also limited in number. With a better understanding of how existing proteins fold, researchers have begun to design folding patterns not produced in nature.
    But a major challenge, says Kim, has been to imagine folds that are both possible and functional. “It’s been very hard to predict which folds will be real and work in a protein structure,” says Kim, who is also a professor in the departments of molecular genetics and computer science at U of T. “By combining biophysics-based representations of protein structure with diffusion methods from the image generation space, we can begin to address this problem.”
    The new system, which the researchers call ProteinSGM, draws from a large set of image-like representations of existing proteins that encode their structure accurately. The researchers feed these images into a generative diffusion model, which gradually adds noise until each image becomes all noise. The model tracks how the images become noisier and then runs the process in reverse, learning how to transform random pixels into clear images that correspond to fully novel proteins.
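    The forward-then-reverse logic described above can be sketched in a few lines. The snippet below is only an illustration of the general diffusion idea, with a toy array standing in for the image-like protein representation and a placeholder where ProteinSGM's trained network would go.

    ```python
    # Minimal sketch of the forward/reverse diffusion idea (illustrative only;
    # ProteinSGM's actual score-based model and training are not reproduced here).
    import numpy as np

    rng = np.random.default_rng(0)

    def forward_noise(x0, t, T=1000, beta_min=1e-4, beta_max=0.02):
        """Corrupt a clean 'image' x0 up to timestep t by mixing in Gaussian noise."""
        betas = np.linspace(beta_min, beta_max, T)
        alpha_bar = np.prod(1.0 - betas[: t + 1])   # cumulative signal retention
        noise = rng.standard_normal(x0.shape)
        return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

    # Toy stand-in for an image-like protein representation (e.g. a distance map).
    x0 = rng.random((64, 64))

    # By the final timestep the representation is essentially pure noise ...
    xT = forward_noise(x0, t=999)

    # ... and a trained network would walk the process backwards, step by step,
    # turning random pixels into a representation of a novel protein. Only the
    # loop structure is shown; `denoise_step` is a placeholder for the learned model.
    def denoise_step(xt, t):
        return xt  # a real model would predict and remove the noise at step t

    x = rng.standard_normal((64, 64))
    for t in reversed(range(1000)):
        x = denoise_step(x, t)
    ```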

    Jin Sub (Michael) Lee, a doctoral student in the Kim lab and first author on the paper, says that optimizing the early stage of this image generation process was one of the biggest challenges in creating ProteinSGM. “A key idea was the proper image-like representation of protein structure, such that the diffusion model can learn how to generate novel proteins accurately,” says Lee, who is from Vancouver but did his undergraduate degree in South Korea and master’s in Switzerland before choosing U of T for his doctorate.
    Also difficult was validation of the proteins produced by ProteinSGM. The system generates many structures, often unlike anything found in nature. Almost all of them look real according to standard metrics, says Lee, but the researchers needed further proof.
    To test their new proteins, Lee and his colleagues first turned to OmegaFold, a recently developed alternative to DeepMind’s AlphaFold 2 software. Both platforms use AI to predict the structure of proteins based on amino acid sequences.
    With OmegaFold, the team confirmed that almost all their novel sequences fold into the desired and also novel protein structures. They then chose a smaller number to create physically in test tubes, to confirm the structures were proteins and not just stray strings of chemical compounds.
    “With matches in OmegaFold and experimental testing in the lab, we could be confident these were properly folded proteins. It was amazing to see validation of these fully new protein folds that don’t exist anywhere in nature,” Lee says.
    Next steps based on this work include further development of ProteinSGM for antibodies and other proteins with the most therapeutic potential, Kim says. “This will be a very exciting area for research and entrepreneurship,” he adds.
    Lee says he would like to see generative biology move toward joint design of protein sequences and structures, including protein side-chain conformations. Most research to date has focused on generation of backbones, the primary chemical structures that hold proteins together.
    “Side-chain configurations ultimately determine protein function, and although designing them means an exponential increase in complexity, it may be possible with proper engineering,” Lee says. “We hope to find out.”


    The future of data storage lies in DNA microcapsules

    Storing data in DNA sounds like science fiction, yet it lies in the near future. Professor Tom de Greef expects the first DNA data center to be up and running within five to ten years. Data won’t be stored as zeros and ones in a hard drive but in the base pairs that make up DNA: AT and CG. Such a data center would take the form of a lab, many times smaller than the ones today. De Greef can already picture it all. In one part of the building, new files will be encoded via DNA synthesis. Another part will contain large fields of capsules, each capsule packed with a file. A robotic arm will remove a capsule, read its contents and place it back.
    We’re talking about synthetic DNA. In the lab, bases are stuck together in a certain order to form synthetically produced strands of DNA. Files and photos that are currently stored in data centers can then be stored in DNA. For now, the technique is suitable only for archival storage. This is because the reading of stored data is very expensive, so you want to consult the DNA files as little as possible.
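    As a rough illustration of the encoding step, binary data can be mapped two bits at a time onto the four bases. The scheme below is a toy example, not the error-tolerant encoding used in real DNA storage, which also avoids problematic sequences such as long runs of the same base.

    ```python
    # Toy mapping of bytes to DNA bases, two bits per base (illustrative only).
    BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
    BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

    def encode(data: bytes) -> str:
        bits = "".join(f"{byte:08b}" for byte in data)
        return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

    def decode(strand: str) -> bytes:
        bits = "".join(BASE_TO_BITS[base] for base in strand)
        return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

    strand = encode(b"hello")
    assert decode(strand) == b"hello"
    print(strand)  # CGGACGCCCGTACGTACGTT
    ```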
    Large, energy-guzzling data centers made obsolete
    Data storage in DNA offers many advantages. A DNA file can be stored much more compactly, for instance, and the lifespan of the data is also many times longer. But perhaps most importantly, this new technology renders large, energy-guzzling data centers obsolete. And this is desperately needed, warns De Greef, “because in three years, we will generate so much data worldwide that we won’t be able to store half of it.”
    Together with PhD student Bas Bögels, Microsoft and a group of university partners, De Greef has developed a new technique to make the innovation of data storage with synthetic DNA scalable. The results have been published today in the journal Nature Nanotechnology. De Greef works at the Department of Biomedical Engineering and the Institute for Complex Molecular Systems (ICMS) at TU Eindhoven and serves as a visiting professor at Radboud University.
    Scalable
    The idea of using strands of DNA for data storage emerged in the 1980s but was far too difficult and expensive at the time. It became technically possible three decades later, when DNA synthesis started to take off. George Church, a geneticist at Harvard Medical School, elaborated on the idea in 2011. Since then, synthesis and the reading of data have become exponentially cheaper, finally bringing the technology to the market.

    In recent years, De Greef and his group have looked mainly into reading the stored data. For the time being, this is the biggest problem facing this new technique. The PCR method currently used for this, called ‘random access’, is highly error-prone. You can therefore only read one file at a time and, in addition, the data quality deteriorates too much each time you read a file. Not exactly scalable.
    Here’s how it works: PCR (polymerase chain reaction) creates millions of copies of the piece of DNA that you need by adding a primer carrying the desired DNA code. Laboratory coronavirus tests, for example, are based on this: even a minuscule amount of viral material from your nose becomes detectable once it has been copied so many times. But if you want to read multiple files simultaneously, you need multiple primer pairs doing their work at the same time, and this introduces many errors into the copying process.
    Every capsule contains one file
    This is where the capsules come into play. De Greef’s group developed a microcapsule of proteins and a polymer and then anchored one file per capsule. De Greef: “These capsules have thermal properties that we can use to our advantage.” Above 50 degrees Celsius, the capsules seal themselves, allowing the PCR process to take place separately in each capsule. Not much room for error then. De Greef calls this ‘thermo-confined PCR’. In the lab, it has so far managed to read 25 files simultaneously without significant error.
    If you then lower the temperature again, the copies detach from the capsule and the anchored original remains, meaning that the quality of your original file does not deteriorate. De Greef: “We currently stand at a loss of 0.3 percent after three reads, compared to 35 percent with the existing method.”
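    To see how those figures relate to per-read degradation, a per-read loss compounds over repeated reads roughly as below. The per-read rates here are assumptions chosen only to reproduce the reported three-read totals, not numbers from the study.

    ```python
    # Illustrative compounding of an assumed per-read loss over three reads,
    # chosen to land near the reported totals (~0.3% vs ~35%).
    for per_read_loss in (0.001, 0.134):
        remaining = (1 - per_read_loss) ** 3
        print(f"per-read loss {per_read_loss:.1%} -> lost after 3 reads: {1 - remaining:.1%}")
    ```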
    Searchable with fluorescence
    And that’s not all. De Greef has also made the data library even easier to search. Each file is given a fluorescent label and each capsule its own color. A device can then recognize the colors and separate them from one another. This brings us back to the imaginary robotic arm at the beginning of this story, which will neatly select the desired file from the pool of capsules in the future.
    This solves the problem of reading the data. De Greef: “Now it’s just a matter of waiting until the costs of DNA synthesis fall further. The technique will then be ready for application.” He hopes the Netherlands will soon open the world’s first DNA data center.


    Quantum computer in reverse gear

    Today’s computers are based on microprocessors that execute so-called gates. A gate can, for example, be an AND operation, i.e. an operation that outputs 1 only if both input bits are 1. These gates, and thus computers, are irreversible. That is, algorithms cannot simply run backwards. “If you take the multiplication 2*2=4, you cannot simply run this operation in reverse, because 4 could be 2*2, but likewise 1*4 or 4*1,” explains Wolfgang Lechner, professor of theoretical physics at the University of Innsbruck. If this were possible, however, it would be feasible to factorize large numbers, i.e. divide them into their factors, a task whose difficulty is an important pillar of modern cryptography.

    Martin Lanthaler, Ben Niehoff and Wolfgang Lechner from the Department of Theoretical Physics at the University of Innsbruck and the quantum spin-off ParityQC have now developed exactly this inversion of algorithms with the help of quantum computers. The starting point is a classical logic circuit, which multiplies two numbers. If two integers are entered as the input value, the circuit returns their product. Such a circuit is built from irreversible operations. “However, the logic of the circuit can be encoded within ground states of a quantum system,” explains Martin Lanthaler from Wolfgang Lechner’s team. “Thus, both multiplication and factorization can be understood as ground-state problems and solved using quantum optimization methods.”
    Superposition of all possible results
    “The core of our work is the encoding of the basic building blocks of the multiplier circuit, specifically AND gates, half and full adders with the parity architecture as the ground state problem on an ensemble of interacting spins,” says Martin Lanthaler. The coding allows the entire circuit to be built from repeating subsystems that can be arranged on a two-dimensional grid. By stringing several of these subsystems together, larger problem instances can be realized. Instead of the classical brute force method, where all possible factors are tested, quantum methods can speed up the search process: To find the ground state, and thus solve an optimization problem, it is not necessary to search the whole energy landscape, but deeper valleys can be reached by “tunneling.”
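    As a toy illustration of treating a gate as a ground-state problem, a single AND gate can be written as a penalty function whose minima are exactly the valid rows of its truth table. The paper's parity-architecture encoding of full multiplier circuits on interacting spins is considerably richer than this sketch.

    ```python
    # Minimal sketch of encoding a logic gate as a ground-state problem
    # (illustrative only; not the parity-architecture spin encoding of the paper).
    from itertools import product

    def and_gate_energy(a: int, b: int, c: int) -> int:
        """Standard QUBO penalty whose minimum (0) is reached exactly when c = a AND b."""
        return a * b - 2 * (a + b) * c + 3 * c

    # Brute-force the energy landscape: the ground states are the valid truth-table rows.
    ground_states = [(a, b, c)
                     for a, b, c in product((0, 1), repeat=3)
                     if and_gate_energy(a, b, c) == 0]
    print(ground_states)  # [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
    ```

    Chaining such penalties for every gate of a multiplier yields a spin system whose ground states encode consistent input/output assignments; fixing the output to a given number and minimizing the energy then amounts to finding its factors.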
    The current research work provides a blueprint for a new type of quantum computer to solve the factorization problem, which is a cornerstone of modern cryptography. This blueprint is based on the parity architecture developed at the University of Innsbruck and can be implemented on all current quantum computing platforms.
    The results were recently published in Communications Physics, a Nature Portfolio journal. Financial support for the research was provided by the Austrian Science Fund FWF, the European Union and the Austrian Research Promotion Agency FFG, among others.


    Researchers detect and classify multiple objects without images

    Researchers have developed a new high-speed way to detect the location, size and category of multiple objects without acquiring images or requiring complex scene reconstruction. Because the new approach greatly decreases the computing power necessary for object detection, it could be useful for identifying hazards while driving.
    “Our technique is based on a single-pixel detector, which enables efficient and robust multi-object detection directly from a small number of 2D measurements,” said research team leader Liheng Bian from the Beijing Institute of Technology in China. “This type of image-free sensing technology is expected to solve the problems of heavy communication load, high computing overhead and low perception rate of existing visual perception systems.”
    Today’s image-free perception methods can only achieve classification, single object recognition or tracking. To accomplish all three at once, the researchers developed a technique known as image-free single-pixel object detection (SPOD). In the Optica Publishing Group journal Optics Letters, they report that SPOD can achieve an object detection accuracy of just over 80%.
    The SPOD technique builds on the research group’s previous work on image-free sensing for efficient scene perception, which includes image-free classification, segmentation and character recognition based on a single-pixel detector.
    “For autonomous driving, SPOD could be used with lidar to help improve scene reconstruction speed and object detection accuracy,” said Bian. “We believe that it has a high enough detection rate and accuracy for autonomous driving while also reducing the transmission bandwidth and computing resource requirements needed for object detection.”
    Detection without images
    Automating advanced visual tasks — whether navigating a vehicle or tracking a moving plane — usually requires detailed images of a scene to extract the features necessary to identify an object. However, this requires either complex imaging hardware or complicated reconstruction algorithms, which leads to high computational cost, long running times and a heavy data-transmission load. For this reason, the traditional “image first, perceive later” approach may not be best for object detection.

    Image-free sensing methods based on single-pixel detectors can cut down on the computational power needed for object detection. Instead of employing a pixelated detector such as a CMOS or CCD, single-pixel imaging illuminates the scene with a sequence of structured light patterns and then records the transmitted light intensity to acquire the spatial information of objects. This information is then used to computationally reconstruct the object or to calculate its properties.
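    A back-of-the-envelope model of the measurement step looks like the sketch below: each structured pattern modulates the scene, and the single-pixel detector records one total intensity per pattern. The random patterns, toy scene and 5% sampling rate here are placeholders, not the small optimized pattern set that SPOD learns.

    ```python
    # Minimal sketch of single-pixel measurement (illustrative only; SPOD's
    # optimized patterns and learned decoder are not shown).
    import numpy as np

    rng = np.random.default_rng(0)

    H, W = 32, 32                    # scene resolution
    scene = rng.random((H, W))       # stand-in for the (unknown) scene
    sampling_rate = 0.05             # 5% sampling, as in the demonstration
    n_patterns = int(sampling_rate * H * W)

    # Random binary illumination patterns (SPOD instead uses a small, optimized set).
    patterns = rng.integers(0, 2, size=(n_patterns, H, W)).astype(float)

    # Each measurement is the total light reaching the single-pixel detector.
    measurements = patterns.reshape(n_patterns, -1) @ scene.ravel()
    print(measurements.shape)        # (51,) -- these 1D measurements feed the detection network
    ```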
    For SPOD, the researchers used a small but optimized structured light pattern to quickly scan the entire scene and obtain 2D measurements. These measurements are fed into a deep learning model known as a transformer-based encoder to extract the high-dimensional meaningful features in the scene. These features are then fed into a multi-scale attention network-based decoder, which outputs the class, location and size information of all targets in the scene simultaneously.
    “Compared to the full-size pattern used by other single-pixel detection methods, the small, optimized pattern produces better image-free sensing performance,” said group member Lintao Peng. “Also, the multi-scale attention network in the SPOD decoder reinforces the network’s attention to the target area in the scene. This allows more efficient extraction of scene features, enabling state-of-the-art object detection performance.”
    Proof-of-concept demonstration
    To experimentally demonstrate SPOD, the researchers built a proof-of-concept setup. Images randomly selected from the PASCAL VOC 2012 test dataset were printed on film and used as target scenes. At a sampling rate of 5%, the average time to complete spatial light modulation and image-free object detection per scene with SPOD was just 0.016 seconds. This is much faster than performing scene reconstruction first (0.05 seconds) and then object detection (0.018 seconds), a combined 0.068 seconds per scene. SPOD showed an average detection accuracy of 82.2% for all the object classes included in the test dataset.
    “Currently, SPOD cannot detect every possible object category because the existing object detection dataset used to train the model only contains 80 categories,” said Peng. “However, when faced with a specific task, the pre-trained model can be fine-tuned to achieve image-free multi-object detection of new target classes for applications such as pedestrian, vehicle or boat detection.”
    Next, the researchers plan to extend the image-free perception technology to other kinds of detectors and computational acquisition systems to achieve reconstruction-free sensing.


    Engineers tap into good vibrations to power the Internet of Things

    In a world hungry for clean energy, engineers have created a new material that converts the simple mechanical vibrations all around us into electricity to power sensors in everything from pacemakers to spacecraft.
    The first of its kind and the product of a decade of work by researchers at the University of Waterloo and the University of Toronto, the novel generating system is compact, reliable, low-cost and very, very green.
    “Our breakthrough will have a significant social and economic impact by reducing our reliance on non-renewable power sources,” said Asif Khan, a Waterloo researcher and co-author of a new study on the project. “We need these energy-generating materials more critically at this moment than at any other time in history.”
    The system Khan and his colleagues developed is based on the piezoelectric effect, which generates an electrical current by applying pressure — mechanical vibrations are one example — to an appropriate substance.
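    For a rough sense of scale, the direct piezoelectric effect relates generated charge to applied force through a material coefficient. The numbers below are generic assumptions chosen only for illustration, not values reported in the study.

    ```python
    # Rough, illustrative estimate of charge from the direct piezoelectric effect,
    # Q = d33 * F (assumed values; not figures from the study).
    d33 = 200e-12        # piezoelectric charge coefficient in C/N (assumed)
    force = 5.0          # applied force in newtons (assumed, e.g. a light press)
    charge = d33 * force
    print(f"{charge * 1e9:.1f} nC")  # 1.0 nC
    ```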
    The effect was discovered in 1880, and since then, a limited number of piezoelectric materials, such as quartz and Rochelle salts, have been used in technologies ranging from sonar and ultrasonic imaging to microwave devices.
    The problem is that until now, traditional piezoelectric materials used in commercial devices have had limited capacity for generating electricity. They also often use lead, which Khan describes as “detrimental to the environment and human health.”
    The researchers solved both problems.
    They started by growing a large single crystal of a molecular metal-halide compound called edabco copper chloride using the Jahn-Teller effect, a well-known chemistry concept related to spontaneous geometrical distortion of a crystal field.
    Khan said that highly piezoelectric material was then used to fabricate nanogenerators “with a record power density that can harvest tiny mechanical vibrations in any dynamic circumstances, from human motion to automotive vehicles” in a process requiring neither lead nor non-renewable energy.
    The nanogenerator is tiny — 2.5 centimetres square and about the thickness of a business card — and could be conveniently used in countless situations. It has the potential to power sensors in a vast array of electronic devices, including billions needed for the Internet of Things — the burgeoning global network of objects embedded with sensors and software that connect and exchange data with other devices.
    Dr. Dayan Ban, a researcher at the Waterloo Institute for Nanotechnology, said that in future, an aircraft’s vibrations could power its sensory monitoring systems, or a person’s heartbeat could keep their battery-free pacemaker running.
    “Our new material has shown record-breaking performance,” said Ban, a professor of electrical and computer engineering. “It represents a new path forward in this field.”


    ‘Raw’ data show AI signals mirror how the brain listens and learns

    New research from the University of California, Berkeley, shows that artificial intelligence (AI) systems can process signals in a way that is remarkably similar to how the brain interprets speech, a finding scientists say might help explain the black box of how AI systems operate.
    Using a system of electrodes placed on participants’ heads, scientists with the Berkeley Speech and Computation Lab measured brain waves as participants listened to a single syllable — “bah.” They then compared that brain activity to the signals produced by an AI system trained to learn English.
    “The shapes are remarkably similar,” said Gasper Begus, assistant professor of linguistics at UC Berkeley and lead author on the study published recently in the journal Scientific Reports. “That tells you similar things get encoded, that processing is similar.”
    A side-by-side comparison graph of the two signals shows that similarity strikingly.
    “There are no tweaks to the data,” Begus added. “This is raw.”
    AI systems have recently advanced by leaps and bounds. Since ChatGPT ricocheted around the world last year, these tools have been forecast to upend sectors of society and revolutionize how millions of people work. But despite these impressive advances, scientists have had a limited understanding of how exactly the tools they created operate between input and output.

    A question and answer in ChatGPT has been the benchmark to measure an AI system’s intelligence and biases. But what happens between those steps has been something of a black box. Knowing how and why these systems provide the information they do — how they learn — becomes essential as they become ingrained in daily life in fields spanning health care to education.
    Begus and his co-authors, Alan Zhou of Johns Hopkins University and T. Christina Zhao of the University of Washington, are among a cadre of scientists working to crack open that box.
    To do so, Begus turned to his training in linguistics.
    When we listen to spoken words, Begus said, the sound enters our ears and is converted into electrical signals. Those signals then travel through the brainstem and to the outer parts of our brain. With the electrode experiment, researchers traced that path in response to 3,000 repetitions of a single sound and found that the brain waves for speech closely followed the actual sounds of language.
    The researchers transmitted the same recording of the “bah” sound through an unsupervised neural network — an AI system — that could interpret sound. Using a technique developed in the Berkeley Speech and Computation Lab, they measured the coinciding waves and documented them as they occurred.
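    Comparing the two kinds of waveforms in their raw form amounts to little more than the sketch below. The signals here are synthetic stand-ins, not the study's EEG recordings or network outputs.

    ```python
    # Minimal sketch of comparing two raw waveforms (illustrative stand-in signals only).
    import numpy as np

    t = np.linspace(0, 0.3, 3000)                      # 300 ms window
    brain_wave = np.sin(2 * np.pi * 10 * t)            # toy stand-in for the measured response
    ai_wave = 0.8 * np.sin(2 * np.pi * 10 * t + 0.1)   # toy stand-in for the network's signal

    # Pearson correlation between the raw waveforms: values near 1 mean similar shape.
    similarity = np.corrcoef(brain_wave, ai_wave)[0, 1]
    print(f"{similarity:.3f}")
    ```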

    Previous research required extra steps to compare waves from the brain and machines. Studying the waves in their raw form will help researchers understand and improve how these systems learn and increasingly come to mirror human cognition, Begus said.
    “I’m really interested as a scientist in the interpretability of these models,” Begus said. “They are so powerful. Everyone is talking about them. And everyone is using them. But much less is being done to try to understand them.”
    Begus believes that what happens between input and output doesn’t have to remain a black box. Understanding how those signals compare to the brain activity of human beings is an important benchmark in the race to build increasingly powerful systems. So is knowing what’s going on under the hood.
    For example, having that understanding could help put guardrails on increasingly powerful AI models. It could also improve our understanding of how errors and bias are baked into the learning processes.
    Begus said he and his colleagues are collaborating with other researchers using brain imaging techniques to measure how these signals might compare. They’re also studying how other languages, like Mandarin, are decoded in the brain differently and what that might indicate about knowledge.
    Many models are trained on visual cues, like colors or written text — both of which have thousands of variations at the granular level. Language, however, opens the door for a more solid understanding, Begus said.
    The English language, for example, has just a few dozen sounds.
    “If you want to understand these models, you have to start with simple things. And speech is way easier to understand,” Begus said. “I am very hopeful that speech is the thing that will help us understand how these models are learning.”
    In cognitive science, one of the primary goals is to build mathematical models that resemble humans as closely as possible. The newly documented similarities in brain waves and AI waves are a benchmark on how close researchers are to meeting that goal.
    “I’m not saying that we need to build things like humans,” Begus said. “I’m not saying that we don’t. But understanding how different architectures are similar or different from humans is important.”


    Deep neural network provides robust detection of disease biomarkers in real time

    Sophisticated systems for the detection of biomarkers — molecules such as DNA or proteins that indicate the presence of a disease — are crucial for real-time diagnostic and disease-monitoring devices.
    Holger Schmidt, distinguished professor of electrical and computer engineering at UC Santa Cruz, and his group have long been focused on developing unique, highly sensitive devices called optofluidic chips to detect biomarkers.
    Schmidt’s graduate student Vahid Ganjalizadeh led an effort to use machine learning to enhance these systems by improving their ability to accurately classify biomarkers. The deep neural network he developed classifies particle signals with 99.8 percent accuracy in real time, on a system that is relatively cheap and portable for point-of-care applications, as shown in a new paper in the journal Scientific Reports.
    When taking biomarker detectors into the field or a point-of-care setting such as a health clinic, the signals received by the sensors may not be as high quality as those in a lab or a controlled environment. This may be due to a variety of factors, such as the need to use cheaper chips to bring down costs, or environmental characteristics such as temperature and humidity.
    To address the challenges of a weak signal, Schmidt and his team developed a deep neural network that can identify the source of that weak signal with high confidence. The researchers trained the neural network with known training signals, teaching it to recognize potential variations it could see, so that it can recognize patterns and identify new signals with very high accuracy.
    First, a parallel cluster wavelet analysis (PCWA) approach designed in Schmidt’s lab detects that a signal is present. Then, the neural network processes the potentially weak or noisy signal, identifying its source. This system works in real time, so users are able to receive results in a fraction of a second.
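    A minimal sketch of that detect-then-classify flow might look like the code below, assuming a crude amplitude test in place of PCWA and an untrained toy network in place of the published model; PyTorch is used only for illustration.

    ```python
    # Minimal sketch of the two-stage idea: flag that a signal is present, then
    # classify it with a small neural network (illustrative only).
    import numpy as np
    import torch
    import torch.nn as nn

    def signal_present(trace, threshold=3.0):
        """Crude amplitude test standing in for the wavelet-based (PCWA) detection step."""
        return float(np.max(np.abs(trace))) > threshold * float(np.std(trace))

    class SignalClassifier(nn.Module):
        """Tiny 1D CNN mapping a fixed-length fluorescence trace to a biomarker class."""
        def __init__(self, n_classes=3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(1, 8, kernel_size=7, padding=3), nn.ReLU(),
                nn.MaxPool1d(4),
                nn.Conv1d(8, 16, kernel_size=7, padding=3), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                nn.Linear(16, n_classes),
            )
        def forward(self, x):
            return self.net(x)

    trace = np.random.randn(256).astype(np.float32)
    trace[100:110] += 8.0                       # synthetic particle peak
    if signal_present(trace):
        model = SignalClassifier()              # in practice: a trained model on the edge device
        logits = model(torch.from_numpy(trace).view(1, 1, -1))
        print("predicted class:", int(logits.argmax()))
    ```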

    “It’s all about making the most of possibly low quality signals, and doing that really fast and efficiently,” Schmidt said.
    A smaller version of the neural network model can run on portable devices. In the paper, the researchers run the system on a Google Coral Dev Board, a relatively cheap edge device for accelerated execution of artificial intelligence algorithms. This means the system also requires less power for processing than other techniques.
    “Unlike some research that requires running on supercomputers to do high-accuracy detection, we proved that even a compact, portable, relatively cheap device can do the job for us,” Ganjalizadeh said. “It makes it available, feasible, and portable for point-of-care applications.”
    The entire system is designed to be used completely locally, meaning the data processing can happen without internet access, unlike other systems that rely on cloud computing. This also provides a data security advantage, because results can be produced without the need to share data with a cloud server provider.
    It is also designed to be able to give results on a mobile device, eliminating the need to bring a laptop into the field.
    “You can build a more robust system that you could take out to under-resourced or less-developed regions, and it still works,” Schmidt said.
    This improved system will work for any other biomarkers Schmidt’s lab’s systems have been used to detect in the past, such as COVID-19, Ebola, flu, and cancer biomarkers. Although they are currently focused on medical applications, the system could potentially be adapted for the detection of any type of signal.
    To push the technology further, Schmidt and his lab members plan to add even more dynamic signal processing capabilities to their devices. This will simplify the system and combine the processing techniques needed to detect signals at both low and high concentrations of molecules. The team is also working to bring discrete parts of the setup into the integrated design of the optofluidic chip.


    A touch-responsive fabric armband — for flexible keyboards, wearable sketchpads

    It’s time to roll up your sleeves for the next advance in wearable technology — a fabric armband that’s actually a touch pad. In ACS Nano, researchers say they have devised a way to make playing video games, sketching cartoons and signing documents easier. Their proof-of-concept silk armband turns a person’s forearm into a keyboard or sketchpad. The three-layer, touch-responsive material interprets what a user draws or types and converts it into images on a computer.
    Computer trackpads and electronic signature-capture devices seem to be everywhere, but they aren’t as widely used in wearables. Researchers have suggested making flexible touch-responsive panels from clear, electrically conductive hydrogels, but these substances are sticky, making them hard to write on and irritating to the skin. So, Xueji Zhang, Lijun Qu, Mingwei Tian and colleagues wanted to incorporate a similar hydrogel into a comfortable fabric sleeve for drawing or playing games on a computer.
    The researchers sandwiched a pressure-sensitive hydrogel between layers of knit silk. The top piece was coated in graphene nanosheets to make the fabric electrically conductive. Attaching the sensing panel to electrodes and a data collection system produced a pressure-responsive pad with rapid, real-time sensing when a finger slid over it to write numbers and letters. The device was then incorporated into an arm-length silk sleeve with a touch-responsive area on the forearm. In experiments, a user controlled the direction of blocks in a computer game and used the armband to sketch colorful cartoons in a computer drawing program. The researchers say that their proof-of-concept wearable touch panel could inspire the next generation of flexible keyboards and wearable sketchpads.
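    For a sense of how such a pad could act as a keyboard, a sketch along the following lines would convert a frame of pressure readings into a key press. The grid size, key layout and threshold are assumptions for illustration, not details from the paper.

    ```python
    # Toy mapping from a frame of fabric-pad pressure readings to a key press
    # (illustrative only; grid size, layout and threshold are assumptions).
    import numpy as np

    ROWS, COLS = 3, 10
    KEYS = [list("qwertyuiop"), list("asdfghjkl;"), list("zxcvbnm,./")]

    def read_key(pressure_map, threshold=0.5):
        """Return the key under the strongest press, or None if nothing is pressed."""
        if pressure_map.max() < threshold:
            return None
        r, c = np.unravel_index(np.argmax(pressure_map), pressure_map.shape)
        return KEYS[r][c]

    frame = np.zeros((ROWS, COLS))
    frame[1, 3] = 0.9          # simulated press roughly where "f" sits
    print(read_key(frame))     # f
    ```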