More stories

  • A roadmap to help AI technologies speak African languages

    From text-generating ChatGPT to voice-activated Siri, artificial intelligence-powered tools are designed to aid our everyday life — as long as you speak a language they support. These technologies are out of reach for billions of people who don’t use English, French, Spanish or other mainstream languages, but researchers in Africa are looking to change that. In a study published August 11 in the journal Patterns, scientists draw a roadmap to develop better AI-driven tools for African languages.
    “It doesn’t make sense to me that there are limited AI tools for African languages,” says first author and AI researcher Kathleen Siminyu of the Masakhane Research Foundation, a grassroots network of African scientists who aim to spur accessible AI tools for those who speak African languages. “Inclusion and representation in the advancement of language technology is not a patch you put at the end — it’s something you think about up front.”
    Many of these tools rely on a field of AI called natural language processing, a technology that enables computers to understand human languages. Computers can master a language through training, where they pick up on patterns in speech and text data. However, they fail when data in a particular language is scarce, as is the case for many African languages. To fill the gap, the research team first identified key players involved in developing African language tools and explored their experiences, motivations, focus areas, and challenges. These people include writers and editors who create and curate content, as well as linguists, software engineers, and entrepreneurs who are crucial in establishing the infrastructure for language tools.
    Interviews with the key players revealed four central themes to consider in designing African language tools. First, bearing the legacy of colonization, Africa is a multilingual society where African languages are central to people’s cultural identities and key to participation in education, politics, the economy, and more. Second, there is a need to support African content creation. This includes building basic tools such as dictionaries, spell checkers, and keyboards for African languages, and removing financial and administrative barriers to translating government communications into multiple national languages, including African languages. Third, the creation of African language technologies will benefit from collaboration between linguistics and computer science, with a focus on human-centered tools that help individuals unlock greater potential. Fourth, developers should be mindful of communities and ethical practices during the collection, curation, and use of data. “There’s a growing number of organizations working in this space, and this study allows us to coordinate efforts in building impactful language tools,” says Siminyu. “The findings highlight and articulate what the priorities are, in terms of time and financial investments.”
    Next, the team plans to expand the study and include more participants to understand the communities that AI language technologies may impact. They will also address barriers that may hinder people’s access to the technology. The team hopes their study could serve as a roadmap to help develop a wide range of language tools, from translation services to misinformation-catching content moderators. The findings may also pave the way to preserve indigenous African languages.
    “I would love for us to live in a world where Africans can have as good quality of life and access to information and opportunities as somebody fluent in English, French, Mandarin, or other languages,” says Siminyu.

  • Tool finds bias in state-of-the-art generative AI model

    Text-to-image (T2I) generative artificial intelligence tools are increasingly powerful and widespread, capable of creating nearly any image from just a few words of input. T2I generative AI can produce convincingly realistic photos and videos, which are being used for a growing range of purposes, from art to political campaigning.
    However, the algorithmic models that power these tools are trained on data from humans, and can replicate human biases in the images they produce, such as biases around gender and skin tone. These biases can harm marginalized populations, reinforcing stereotypes and potentially leading to discrimination.
    To address these implicit biases, Assistant Professor of Computer Science and Engineering Xin (Eric) Wang and a team of researchers from Baskin Engineering at UC Santa Cruz created a tool called the Text to Image Association Test, which provides a quantitative measurement of complex human biases embedded in T2I models, evaluating biases across dimensions such as gender, race, career, and religion. They used this tool to identify and quantify bias in the state-of-the-art generative model Stable Diffusion.
    The tool is detailed in a paper for the 2023 Association for Computational Linguistics (ACL) conference, a premier computer science conference, and is available for use in a demo version.
    “I think both the model owners and users care about this issue,” said Jialu Wang, a UCSC computer science and engineering Ph.D. student and the first author on the paper. “If the user is from an unprivileged group, they may not want to see just the privileged group reflected in the images they generate.”
    To use the tool, a user tells the model to produce an image for a neutral prompt, for example “child studying science.” Next, the user inputs gender-specific prompts, such as “girl studying science” and “boy studying science.” The tool then calculates the distance between the images generated with the neutral prompt and those generated with each of the specific prompts; the difference between those two distances is a quantitative measure of bias, as sketched in the example below.
    Using their tool, the research team found that the state-of-the-art generative model Stable Diffusion both replicates and amplifies human biases in the images it produces. The tool tests the association between two concepts, such as science and art, and two attributes, such as male and female. It then gives an association score between the concept and the attribute, along with a value indicating how confident the tool is in that score.
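    As a rough illustration of the distance-based score described above, here is a minimal sketch, not the authors' actual implementation, that assumes the images have already been generated and embedded into vectors (for example with an off-the-shelf image encoder); the array names and embedding step are hypothetical assumptions.

```python
import numpy as np

def mean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Mean pairwise Euclidean distance between two sets of image embeddings."""
    # a: (n, d) embeddings of one image set, b: (m, d) embeddings of another
    diffs = a[:, None, :] - b[None, :, :]          # (n, m, d) pairwise differences
    return float(np.linalg.norm(diffs, axis=-1).mean())

def bias_score(neutral: np.ndarray, attr_a: np.ndarray, attr_b: np.ndarray) -> float:
    """Difference between the neutral set's distance to each attribute-specific set.

    A score near 0 suggests the neutral-prompt images ("child studying science")
    sit equally far from both attribute prompts ("girl ..." / "boy ...");
    a large positive or negative value indicates a lean toward one side.
    """
    return mean_distance(neutral, attr_a) - mean_distance(neutral, attr_b)

# Hypothetical usage with precomputed embeddings of generated images:
# score = bias_score(emb_neutral, emb_girl, emb_boy)
```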

  • Effectiveness of video gameplay restrictions questioned in new study

    Legal restrictions placed on the amount of time young people in China can play video games may be less effective than originally thought, a new study has revealed.
    To investigate the effectiveness of the policy, a team of researchers led by the University of York analysed over 7 billion hours of playtime data from tens of thousands of games, drawn from over two billion accounts belonging to players in China, where legal restrictions on playtime for young people have been in place since 2019.
    The research team, however, did not find evidence of a decrease in heavy play of games after these restrictions were put in place.
    The video games industry has witnessed a surge in popularity, and as many as 4 billion people are now estimated to engage in gaming worldwide each year.
    Many countries across the globe have expressed concerns about the number of hours young people spend playing video games and the potential impact of this on wellbeing. In response to these concerns, in 2019 China restricted playtime for people under 18.
    China is one of the first countries to explore legal means of restricting gameplay for young people with the aim of limiting the potential risks of gaming to wellbeing, and the policy was assumed to be effective, with some bodies suggesting that it had resolved issues relating to disordered gaming.
    Dr David Zendle, from the University of York’s Department of Computer Science, said: “Policymakers around the world have been discussing how to understand the impact of video gameplay, particularly on young people, for some time now, and how to ensure a healthy relationship with games. The UK government, for example, has recently issued guidelines for high quality research into gaming and wellbeing to inform future decision making.

  • Turning ChatGPT into a ‘chemistry assistant’

    Developing new materials requires significant time and labor, but some chemists are now hopeful that artificial intelligence (AI) could one day shoulder much of this burden. In a new study in the Journal of the American Chemical Society, a team prompted a popular AI model, ChatGPT, to perform one particularly time-consuming task: searching scientific literature. With that data, they built a second tool, a model to predict experimental results.
    Reports from previous studies offer a vast trove of information that chemists need, but finding and parsing the most relevant details can be laborious. For example, those interested in designing highly porous, crystalline metal-organic frameworks (MOFs) — which have potential applications in areas such as clean energy — must sort through hundreds of scientific papers describing a variety of experimental conditions. Researchers have previously attempted to coax AI to take over this task; however, the language processing models they used required significant technical expertise, and applying them to new topics meant changing the program. Omar Yaghi and colleagues wanted to see if the next generation of language models, which includes ChatGPT, could offer a more accessible, flexible way to extract information.
    To analyze text from scientific papers, the team gave ChatGPT prompts, or instructions, guiding it through three processes intended to identify and summarize the experimental information the manuscripts contained. The researchers carefully constructed these prompts to minimize the model’s tendency to make up responses, a phenomenon known as hallucination, and to ensure the best responses possible.
    When tested on 228 papers describing MOF syntheses, the system extracted more than 26,000 factors relevant to making roughly 800 of these compounds. With these data, the team trained a separate AI model to predict the crystalline state of MOFs from the reported conditions. Finally, to make the data more user friendly, they built a chatbot to answer questions about it. The team notes that, unlike previous AI-based efforts, this one does not require coding expertise. What’s more, scientists can shift its focus simply by adjusting the wording of the prompts. This new system, which they dub the “ChatGPT Chemistry Assistant,” could also be useful in other fields of chemistry, according to the researchers.
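    To make the prompt-guided extraction step described above more concrete, here is a minimal sketch of how such a workflow might look with a general-purpose chat-completion API; the prompt wording, model name, and field list are illustrative assumptions, not the study's actual "ChatGPT Chemistry Assistant" prompts.

```python
from openai import OpenAI  # assumes the OpenAI Python client is installed and configured

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# Hypothetical instruction that keeps the model tied to the source text,
# in the spirit of the study's hallucination-minimizing prompts.
SYSTEM_PROMPT = (
    "You extract metal-organic framework (MOF) synthesis conditions from text. "
    "Report only values stated explicitly in the excerpt; write 'not reported' otherwise. "
    "Return JSON with keys: metal_source, linker, solvent, temperature_C, time_h."
)

def extract_conditions(paper_excerpt: str) -> str:
    """Ask the model to summarize synthesis parameters from one paper excerpt."""
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": paper_excerpt},
        ],
        temperature=0,  # favor deterministic, conservative answers
    )
    return response.choices[0].message.content
```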

  • How sure is sure? Incorporating human error into machine learning

    Researchers are developing a way to incorporate one of the most human of characteristics — uncertainty — into machine learning systems.
    Human error and uncertainty are concepts that many artificial intelligence systems fail to grasp, particularly in systems where a human provides feedback to a machine learning model. Many of these systems are programmed to assume that humans are always certain and correct, but real-world decision-making includes occasional mistakes and uncertainty.
    Researchers from the University of Cambridge, along with The Alan Turing Institute, Princeton, and Google DeepMind, have been attempting to bridge the gap between human behaviour and machine learning, so that uncertainty can be more fully accounted for in AI applications where humans and machines are working together. This could help reduce risk and improve trust and reliability of these applications, especially where safety is critical, such as medical diagnosis.
    The team adapted a well-known image classification dataset so that humans could provide feedback and indicate their level of uncertainty when labelling a particular image. The researchers found that training with these uncertain labels can improve the systems’ performance in handling uncertain feedback (a simplified sketch of such ‘soft’ labelling appears below), although human input also caused the overall performance of the hybrid systems to drop. Their results will be reported at the AAAI/ACM Conference on Artificial Intelligence, Ethics and Society (AIES 2023) in Montréal.
    ‘Human-in-the-loop’ machine learning systems — a type of AI system that enables human feedback — are often framed as a promising way to reduce risks in settings where automated models cannot be relied upon to make decisions alone. But what if the humans are unsure?
    “Uncertainty is central in how humans reason about the world but many AI models fail to take this into account,” said first author Katherine Collins from Cambridge’s Department of Engineering. “A lot of developers are working to address model uncertainty, but less work has been done on addressing uncertainty from the person’s point of view.”
    We are constantly making decisions based on the balance of probabilities, often without really thinking about it. Most of the time — for example, if we wave at someone who looks just like a friend but turns out to be a total stranger — there’s no harm if we get things wrong. However, in certain applications, uncertainty comes with real safety risks.
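    As one way to picture what training with uncertain labels can mean in practice, the sketch below turns an annotator's stated confidence into a soft probability distribution over classes and uses it as the training target; the class count, confidence values, and loss choice are invented for illustration and are not the study's dataset or method.

```python
import torch
import torch.nn.functional as F

NUM_CLASSES = 3  # toy setting with three image classes

def soft_target(label: int, confidence: float, num_classes: int = NUM_CLASSES) -> torch.Tensor:
    """Spread (1 - confidence) evenly over the other classes.

    A rater who is 70% sure an image is class 0 yields [0.7, 0.15, 0.15]
    instead of the hard one-hot target [1, 0, 0].
    """
    target = torch.full((num_classes,), (1.0 - confidence) / (num_classes - 1))
    target[label] = confidence
    return target

# Hypothetical batch: model logits and two annotations with stated confidence.
logits = torch.randn(2, NUM_CLASSES, requires_grad=True)
targets = torch.stack([
    soft_target(label=0, confidence=0.9),   # annotator fairly sure
    soft_target(label=2, confidence=0.55),  # annotator quite unsure
])

# Cross-entropy with probabilistic (soft) targets, supported in recent PyTorch versions.
loss = F.cross_entropy(logits, targets)
loss.backward()
```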

  • How randomized data can improve our security

    Huge streams of data pass through our computers and smartphones every day. In simple terms, technical devices contain two essential units to process this data: a processor, which acts as a kind of control center, and RAM, which serves as the working memory. Modern processors use a cache as a bridge between the two, since main memory is much slower at delivering data than the processor is at processing it. This cache often contains private data that can be an attractive target for attackers. A team of scientists from Bochum, Germany, in cooperation with researchers from Japan, has now developed an innovative cipher that not only offers greater security than previous approaches but is also more efficient and faster. They are presenting their work at the prestigious USENIX Security Symposium in Anaheim, California (USA).
    The team includes Dr. Federico Canale and Professor Gregor Leander from the Chair of Symmetric Cryptography, Jan Philipp Thoma and Professor Tim Güneysu from the Chair of Security Engineering, all from Ruhr University Bochum, as well as Yosuke Todo from NTT Social Informatics Laboratories and Rei Ueno from Tohoku University (Japan).
    Cache not well protected against side-channel attacks until now
    Years ago, CASA PI Professor Yuval Yarom, who has been at Ruhr University since April 2023, discovered that the cache is not well protected against a certain type of attack. The serious Spectre and Meltdown vulnerabilities made headlines at the time because they affected all popular microprocessors as well as cloud services. Caches are unobtrusive, but they perform an important task: they store data that is requested very frequently, and their main function is to reduce latency. If the CPU had to fetch every piece of data it needs from the slower RAM, the system would slow down considerably. This is why the CPU keeps certain data in the cache. However, attackers can exploit this communication between CPU and cache. Their method: they overwrite the cache’s unsecured data, so the system has to request the data from main memory because it can no longer find it in the cache. This process is measurably slower. “In so-called timing side-channel attacks, attackers can measure the time differences and use them to observe memory accesses by other programs. Thus, they can steal private keys for encryption algorithms, for example,” explains Jan Philipp Thoma from the Chair of Security Engineering.
    Innovative mathematical solution
    While patches have been developed to fix the vulnerability for certain attacks, they have failed to provide provable security. The team from Bochum and Japan has now come up with an innovative solution: “Our idea is to use mathematical processes to randomize the data in the cache,” explains Gregor Leander, who recently received an ERC Advanced Grant for his research. This randomization in the CPU’s caches can help prevent attacks by keeping attackers from removing data from the cache in a predictable way.
    “The interdisciplinary approach combining cryptography and hardware security considerations is a novelty in computer security. While there have been previous ideas for randomized cache architectures, none have been very efficient and none have been able to fully withstand strong attackers,” said Tim Güneysu, who heads the Chair of Security Engineering. The new SCARF design uses block cipher encryption, a completely new idea for the field, according to the researchers. “Normally we encrypt data in 128-bit blocks; in the cache we sometimes work with only 10 bits. This is a complex process, because it takes much longer to mix this data with a large key,” said Gregor Leander. The large key is needed because a shorter encryption of such small amounts of data could be broken more easily by attackers.
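    To give a flavor of the general idea, and not of the SCARF cipher itself, the toy sketch below maps memory addresses to cache set indices through a secret keyed function, so an attacker who does not know the key cannot predict which addresses collide in the same cache set; the bit widths and the use of HMAC as the keyed function are illustrative assumptions (real designs use a dedicated low-latency block cipher so that the mapping is invertible).

```python
import hmac
import hashlib
import secrets

INDEX_BITS = 10                      # toy cache with 2**10 = 1024 sets
NUM_SETS = 1 << INDEX_BITS

# Secret key held by the hardware; regenerating it re-randomizes the mapping.
KEY = secrets.token_bytes(16)

def randomized_set_index(address: int, key: bytes = KEY) -> int:
    """Map an address to a cache set via a keyed function instead of plain address bits.

    A conventional cache uses a fixed slice of the address (effectively
    address % NUM_SETS), which an attacker can compute too. Here the index
    depends on a secret key, so collisions are unpredictable from outside.
    """
    digest = hmac.new(key, address.to_bytes(8, "little"), hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "little") % NUM_SETS

# Example: two nearby addresses land in unrelated, key-dependent sets.
print(randomized_set_index(0x1000), randomized_set_index(0x1040))
```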

  • Turning big data into better breeds and varieties: Can AI help feed the planet?

    Artificial intelligence could hold the key to feeding 10 billion people by 2050 in the face of climate change and rapidly evolving pests and pathogens, according to researchers at The University of Queensland.
    Professor Lee Hickey from UQ’s Queensland Alliance for Agriculture and Food Innovation said AI offered opportunities to accelerate the development of high performing plants and animals for better farm sustainability and profitability.
    “Breeders are collecting billions of data points, but the big challenge is how we turn this colossal amount of data into knowledge to support smarter decisions in the breeding process,” Professor Hickey said.
    “AI can help to identify which plants and animals we use for crossing or carry forward to the next generation.”
    Professor Ben Hayes, the co-inventor of genomic prediction, said the QAAFI team had identified four applications for AI in crop and livestock breeding.
    “The first one is deciding what to breed — it might sound simple, but this decision is becoming more complex,” Professor Hayes said.
    “In an increasingly challenging environment, consumer acceptance will be more important, so AI is a good way to pull together the preferences of millions of people.

  • A new weapon in the war on robocall scams

    The latest weapon in the war on robocalls is an automated system that analyzes the content of these unsolicited bulk calls to shed light on both the scope of the problem and the types of scams being perpetrated via robocalls. The tool, called SnorCall, is designed to help regulators, phone carriers and other stakeholders better understand and monitor robocall trends — and take action against related criminal activity.
    “Although telephone service providers, regulators and researchers have access to call metadata — such as the number being called and the length of the call — they do not have tools to investigate what is being said on robocalls at the vast scale required,” says Brad Reaves, corresponding author of a paper on the work and an assistant professor of computer science at North Carolina State University.
    “For one thing, providers don’t want to listen in on calls — it raises significant privacy concerns. But robocalls are a huge problem, and are often used to conduct criminal fraud. To better understand the scope of this problem, and gain insights into these scams, we need to know what is being said on these robocalls.
    “We’ve developed a tool that allows us to characterize the content of robocalls,” Reaves says. “And we’ve done it without violating privacy concerns; in collaboration with a telecommunications company called Bandwidth, we operate more than 60,000 phone numbers that are used solely by us to monitor unsolicited robocalls. We did not use any phone numbers of actual customers.”
    The new tool, SnorCall, essentially records all robocalls received on the monitored phone lines. It bundles together robocalls that use the same audio, reducing the number of robocalls whose content needs to be analyzed by around an order of magnitude. These recorded robocalls are then transcribed and analyzed by a machine learning framework called Snorkel that can be used to characterize each call.
    “SnorCall essentially uses labels to identify what each robocall is about,” Reaves says. “Does it mention a specific company or government program? Does it request specific personal information? If so, what kind? Does it request money? If so, how much? This is all fed into a database that we can use to identify trends or behaviors.”
    As a proof of concept, the researchers used SnorCall to assess 232,723 robocalls collected over 23 months on the more than 60,000 phone lines dedicated to the study.
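    To illustrate the kind of rule-based labeling described above, here is a minimal sketch using the open-source Snorkel library that the SnorCall framework builds on; the transcripts, label names, and rules are invented examples, not the study's actual labeling functions.

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier

ABSTAIN, OTHER, IRS_SCAM = -1, 0, 1

@labeling_function()
def mentions_irs(x):
    """Flag transcripts that claim to be from the IRS."""
    return IRS_SCAM if "irs" in x.transcript.lower() else ABSTAIN

@labeling_function()
def demands_payment(x):
    """Flag transcripts that demand an immediate or unusual form of payment."""
    keywords = ("gift card", "wire transfer", "pay immediately")
    return IRS_SCAM if any(k in x.transcript.lower() for k in keywords) else ABSTAIN

# Toy transcripts standing in for transcribed robocall audio.
df = pd.DataFrame({"transcript": [
    "This is the IRS. Pay immediately or a warrant will be issued.",
    "Your car's extended warranty is about to expire.",
]})

applier = PandasLFApplier([mentions_irs, demands_payment])
label_matrix = applier.apply(df)   # one column of votes per labeling function
print(label_matrix)
```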