More stories

  • in

    New data science platform speeds up Python queries

    Researchers from Brown University and MIT have developed a new data science framework that allows users to process data with the programming language Python — without paying the “performance tax” normally associated with a user-friendly language.
    The new framework, called Tuplex, can process data queries written in Python up to 90 times faster than industry-standard data systems like Apache Spark or Dask. The research team unveiled the system in research presented at SIGMOD 2021, a premier data processing conference, and has made the software freely available to all.
    “Python is the primary programming language used by people doing data science,” said Malte Schwarzkopf, an assistant professor of computer science at Brown and one of the developers of Tuplex. “That makes a lot of sense. Python is widely taught in universities, and it’s an easy language to get started with. But when it comes to data science, there’s a huge performance tax associated with Python because platforms can’t process Python efficiently on the back end.”
    Platforms like Spark perform data analytics by distributing tasks across multiple processor cores or machines in a data center. That parallel processing allows users to deal with giant data sets that would choke a single computer to death. Users interact with these platforms by inputting their own queries, which contain custom logic written as “user-defined functions” or UDFs. UDFs specify custom logic, like extracting the number of bedrooms from the text of a real estate listing for a query that searches all of the real estate listings in the U.S. and selects all the ones with three bedrooms.
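    The bedroom example above can be sketched as a small Python UDF. This is only an illustration in plain Python, not Tuplex or Spark code: the function name `extract_bedrooms`, the regular expression, and the sample listings are all hypothetical, and a real platform would apply the function across many machines rather than in a local list comprehension.

```python
import re


def extract_bedrooms(listing_text):
    """Hypothetical UDF: return the bedroom count found in a listing, or None."""
    # Assumed pattern: a number followed by "bed...", "bd", or "br" (any case).
    match = re.search(r"(\d+)\s*(?:bed|bd|br)", listing_text, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


# Made-up sample data standing in for a large set of real estate listings.
listings = [
    "Sunny 3 bedroom home with large yard",
    "Downtown studio, great views",
    "Spacious 4 BR / 2 BA colonial",
    "Cozy 3bd cottage near the lake",
]

# The "query": keep only the listings whose UDF result equals three bedrooms.
three_bedroom = [text for text in listings if extract_bedrooms(text) == 3]
```

    A data platform would run the same kind of function once per record, which is why the cost of executing each UDF call matters so much at scale.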
    Because of its simplicity, Python is the language of choice for creating UDFs in the data science community. In fact, the Tuplex team cites a recent poll showing that 66% of data platform users utilize Python as their primary language. The problem is that analytics platforms have trouble dealing with those bits of Python code efficiently.
    Data platforms are written in high-level computer languages that are compiled before running. Compilers are programs that take computer language and turn it into machine code, meaning sets of instructions that a computer processor can quickly execute. Python, however, is not compiled beforehand. Instead, computers interpret Python code line by line while the program runs, which can mean far slower performance.

  • in

    How children integrate information

    Children learn a huge number of words in the early preschool years. A two-year-old might be able to say just a handful of words, while a five-year-old is quite likely to know many thousands. How do children achieve this marvelous feat? The question has occupied psychologists for over a century: In countless carefully designed experiments, researchers titrate the information children use to learn new words. Yet how children integrate these different types of information has remained unclear.
    “We know that children use a lot of different information sources in their social environment, including their own knowledge, to learn new words. But the picture that emerges from the existing research is that children have a bag of tricks that they can use,” says Manuel Bohn, a researcher at the Max Planck Institute for Evolutionary Anthropology.
    For example, if you show a child an object they already know — say a cup — as well as an object they have never seen before, the child will usually think that a word they never heard before belongs with the new object. Why? Children use information in the form of their existing knowledge of words (the thing you drink out of is called a “cup”) to infer that the object that doesn’t have a name goes with the name that doesn’t have an object. Other information comes from the social context: children remember past interactions with a speaker to find out what they are likely to talk about next.
    “But in the real world, children learn words in complex social settings in which more than just one type of information is available. They have to use their knowledge of words while interacting with a speaker. Word learning always requires integrating multiple, different information sources,” Bohn continues. An open question is how children combine different, sometimes even conflicting, sources of information.
    Predictions by a computer program
    In a new study, a team of researchers from the Max Planck Institute for Evolutionary Anthropology, MIT, and Stanford University takes on this issue. In a first step, they conducted a series of experiments to measure children’s sensitivity to different information sources. Next, they formulated a computational cognitive model which details the way that this information is integrated.

  • in

    Decoding electron dynamics

    Electron motion in atoms and molecules is of fundamental importance to many physical, biological, and chemical processes. Exploring electron dynamics within atoms and molecules is essential for understanding and manipulating these phenomena. Pump-probe spectroscopy is the conventional technique. The 1999 Nobel Prize in Chemistry provides a well-known example wherein femtosecond pumped laser pulses served to probe the atomic motion involved in chemical reactions. However, because the timescale of electron motion within atoms and molecules is on the order of attoseconds (10^-18 seconds) rather than femtoseconds (10^-15 seconds), attosecond pulses are required to probe electron motion. With the development of the attosecond technology, lasers with pulse durations shorter than 100 attoseconds have become available, providing opportunities for probing and manipulating electron dynamics in atoms and molecules.
    Another important method for probing electron dynamics is based on strong-field tunneling ionization. In this method, a strong femtosecond laser is employed to induce tunneling ionization, a quantum mechanical phenomenon that causes electrons to tunnel through the potential barrier and escape from the atom or molecule. This process provides photoelectron-encoded information about ultrafast electron dynamics. Based on the relationship between the ionization time and the final momentum of the tunneling ionized photoelectron, electron dynamics can be observed with attosecond-scale resolution.
    The relationship between ionization time and the final momentum of the tunneling photoelectron has been theoretically established in terms of a “quantum orbit” model and the accuracy of the relationship has been verified experimentally. But which quantum orbits contribute to the photoelectron yield in strong-field tunneling ionization has remained a mystery, as well as how different orbits correspond differently to momentum and ionization times. So, identifying the quantum orbits is vital to the study of ultrafast dynamic processes using tunneling ionization.
    As reported in Advanced Photonics, researchers at Huazhong University of Science and Technology (HUST) proposed a scheme to identify and weigh the quantum orbits in strong-field tunneling ionization. In their scheme, a second harmonic (SH) frequency is introduced to perturb the tunneling ionization process. The perturbation SH is much weaker than the fundamental field, so it does not change the final momentum of the electron that is tunneling toward ionization. However, it can significantly alter the photoelectron yield, due to the highly nonlinear nature of tunneling ionization. Because of different ionization times, different quantum orbits have different responses to the intervening SH field. By changing the phase of the SH field relative to the fundamental driving field and monitoring the responses of the photoelectron yield, the quantum orbits of tunneling ionized electrons can be accurately identified. Based on this scheme, the mysteries of the so-called “long” and “short” quantum orbits in strong-field tunneling ionization can be resolved, and their relative contribution to the photoelectron yield at each momentum can be accurately weighed. This is a very important development for the application of strong-field tunneling ionization as a method of photoelectron spectroscopy.
    The study was a collaborative team effort led by HUST graduate students Jia Tan, under the supervision of Professor Yueming Zhou, along with Shengliang Xu and Xu Han, under the supervision of Professor Qingbin Zhang. It indicates that the hologram generated by the multi-orbit contribution from the photoelectronic spectrum can provide valuable information regarding the phase of the tunneled electron. Its wave packet encodes rich information about atomic and molecular electron dynamics. According to Peixiang Lu, HUST professor, vice director of the Wuhan National Laboratory for Optoelectronics, and senior author of the study, “Attosecond temporal and subangstrom spatial resolution measurement of electron dynamics is made possible by this new scheme for resolving and weighing quantum orbits.”
    Story Source:
    Materials provided by SPIE–International Society for Optics and Photonics. Note: Content may be edited for style and length.

  • in

    Machine learning helps in predicting when immunotherapy will be effective

    When it comes to defense, the body relies on attack thanks to the lymphatic and immune systems. The immune system is like the body’s own personal police force as it hunts down and eliminates pathogenic villains.
    “The body’s immune system is very good at identifying cells that are acting strangely. These include cells that could develop into tumors or cancer in the future,” says Federica Eduati from the department of Biomedical Engineering at TU/e. “Once detected, the immune system strikes and kills the cells.”
    Stopping the attack
    But it’s not always so straightforward as tumor cells can develop ways to hide themselves from the immune system.
    “Unfortunately, tumor cells can block the natural immune response. Proteins on the surface of a tumor cell can turn off the immune cells and effectively put them in sleep mode,” says Oscar Lapuente-Santana, PhD researcher in the Computational Biology group.
    Fortunately, there is a way to wake up the immune cells and restore their antitumor immunity, and it’s based on immunotherapy.

  • in

    Common errors in internet energy analysis

    When it comes to understanding and predicting trends in energy use, the internet is a tough nut to crack. So say energy researchers Eric Masanet, of UC Santa Barbara, and Jonathan Koomey, of Koomey Analytics. The two just published a peer-reviewed commentary in the journal Joule discussing the pitfalls that plague estimates of the internet’s energy and carbon impacts.
    The paper describes how these errors can lead well-intentioned studies to predict massive energy growth in the information technology (IT) sector, which often doesn’t materialize. “We’re not saying the energy use of the internet isn’t a problem, or that we shouldn’t worry about it,” Masanet explained. “Rather, our main message is that we all need to get better at analyzing internet energy use and avoiding these pitfalls moving forward.”
    Masanet, the Mellichamp Chair in Sustainability Science for Emerging Technologies at UCSB’s Bren School of Environmental Science & Management, has researched energy analysis of IT systems for more than 15 years. Koomey, who has studied the subject for over three decades, was for many years a staff scientist and group leader at Lawrence Berkeley National Lab, and has served as a visiting professor at Stanford University, Yale University and UC Berkeley. The article, which has no external funding source, arose out of their combined experiences and observations and was motivated by the rising public interest in internet energy use. Although the piece contains no new data or conclusions about the current energy use or environmental impacts of different technologies and sectors, it raises some important technical issues the field currently faces.
    Masanet and Koomey’s work involves gathering data and building models of energy use to understand trends and make predictions. Unfortunately, IT systems are complicated and data is scarce. “The internet is a really complex system of technologies and it changes fast,” Masanet said. What’s more, in the competitive tech industry, companies often guard energy and performance data as proprietary trade secrets. “There’s a lot of engineering that goes into their operations,” he added, “and they often don’t want to give that up.”
    Four fallacies
    This feeds directly into the first of four major pitfalls the two researchers identified: oversimplification. Every model is a simplification of a real-world system. It has to be. But simplification becomes a pitfall when analysts overlook important aspects of the system. For example, models that underestimate improvements to data center efficiency often overestimate growth in their energy use.
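    The data center example can be made concrete with a toy calculation. The sketch below is not the researchers' model: the function `project_energy`, the compounding formula, and every number in it are hypothetical, chosen only to show how an underestimated efficiency gain inflates a projection.

```python
def project_energy(base_energy, demand_growth, efficiency_gain, years):
    """Toy projection: energy = demand x energy needed per unit of demand.

    demand_growth   -- assumed yearly growth in demand for service (0.25 = +25%)
    efficiency_gain -- assumed yearly drop in energy per unit of demand (0.20 = -20%)
    """
    yearly_factor = (1 + demand_growth) * (1 - efficiency_gain)
    return base_energy * yearly_factor ** years


# Hypothetical inputs: 100 units of energy today, demand growing 25% per year.
with_efficiency = project_energy(100, 0.25, 0.20, years=5)      # gains offset growth
underestimated = project_energy(100, 0.25, 0.05, years=5)       # gains set too low

print(f"efficiency gains included:      {with_efficiency:.1f}")
print(f"efficiency gains underestimated: {underestimated:.1f}")
```

    With these made-up inputs, the first projection stays roughly flat while the second more than doubles, even though demand grows identically in both.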

  • in

    Researchers look to human 'social sensors' to better predict elections and other trends

    Election outcomes are notoriously difficult to predict. In 2016, for example, most polls suggested that Hillary Clinton would win the presidency, but Donald Trump defeated her. Researchers cite multiple explanations for the unreliability in election forecasts — some voters are difficult to reach, and some may wish to remain hidden. Among those who do respond to surveys, some may change their minds after being polled, while others may be embarrassed or afraid to report their true intentions.
    In a new perspective piece for Nature, Santa Fe Institute researchers Mirta Galesic, Jonas Dalege, Henrik Olsson, Daniel Stein, Tamara van der Does, and their collaborators* propose a surprising way to get around these shortcomings in survey design — not just in the world of politics, but in other types of research as well. While it’s widely assumed that cognitive bias clouds our assessment of the people around us, their research and that of others suggests that in fact, our estimations of what our friends and family believe are often accurate.
    “We realized that if we ask a national sample of people about who their friends are going to vote for, we get more accurate predictions than if we ask them who they’re going to vote for,” says Galesic, who is the corresponding author. “We found that people are actually pretty good at estimating the beliefs of people around them.”
    That means researchers can gather highly accurate information about social trends and groups by asking about a person’s social circle rather than interrogating their own individual beliefs. That’s because as highly social creatures, we have become very good at sizing up those around us — what researchers call “social sensing.”
    When people are selected to represent a particular group, their perceptions, combined with new computational models of human social dynamics, can be used to identify emerging trends and better predict political and health-related developments in particular, the team writes. This approach, combining elements of psychology and sociology, can even be harnessed to devise interventions that “could steer social systems in different directions” after a major event, such as a natural disaster or a mass shooting, they suggest.
    “I really hope human social sensing will be included in the standard social science toolbox, because I think it can be a very useful strategy for predicting and modeling societal trends,” Galesic says.
    * Mirta Galesic (Santa Fe Institute), Wändi Bruine de Bruin (University of Southern California), Jonas Dalege (Santa Fe Institute), Scott Feld (Purdue University); Frauke Kreuter (LMU Munich, University of Maryland); Henrik Olsson (Santa Fe Institute); Drazen Prelec (Sloan School of Management, MIT); Daniel Stein (New York University, Santa Fe Institute), and Tamara van der Does (Santa Fe Institute) are co-authors on the perspective piece.
    Story Source:
    Materials provided by Santa Fe Institute. Note: Content may be edited for style and length.

  • in

    New research lifts the clouds on land clearing and biodiversity loss

    QUT researchers have developed a new machine learning mathematical system that helps to identify and detect changes in biodiversity, including land clearing, when satellite imagery is obstructed by clouds.
    Using statistical methods to quantify uncertainty, the research, published in Remote Sensing in Ecology and Conservation, analysed available satellite images of a 180km square area in central south-east Queensland.
    The region is home to many native species including the critically endangered northern hairy-nosed wombat and the vulnerable greater glider, and the area mainly consists of forest, pasture, and agricultural land.
    Dr Jacinta Holloway-Brown says measuring changes in forest cover over time is essential to track and preserve habitats and is a key sustainable development goal by the United Nations and World Bank to manage forests sustainably.
    “Satellite imagery is important as it is too difficult and expensive to frequently collect field data over large, forested areas,” Dr Holloway-Brown said.
    “The problem with using satellite imagery is large portions of the earth are obscured by clouds and this cloud cover causes large and frequent amounts of missing data.”
    Dr Holloway-Brown said it was estimated, based on 12 years of satellite imagery, that on average approximately 67 per cent of the earth is obscured by cloud cover.

  • in

    Thinking in 3D improves mathematical skills

    Spatial reasoning ability in small children reflects how well they will perform in mathematics later. Researchers from the University of Basel recently came to this conclusion, making the case for better cultivation of spatial reasoning.
    Good math skills open career doors in the natural sciences as well as technical and engineering fields. However, a nationwide study on basic skills conducted in Switzerland in 2019 found that schoolchildren achieved only modest results in mathematics. But it seems possible to begin promoting math skills from a young age, as Dr. Wenke Möhring’s team of researchers from the University of Basel reported after studying nearly 600 children.
    The team found a correlation between children’s spatial sense at the age of three and their mathematical abilities in primary school. “We know from past studies that adults think spatially when working with numbers — for example, represent small numbers to the left and large ones to the right,” explains Möhring. “But little research has been done on how spatial reasoning at an early age affects children’s learning and comprehension of mathematics later.”
    The study, which was published in the journal Learning and Instruction, suggests that there is a strong correlation between early spatial skills and the comprehension of mathematical concepts later. The researchers also ruled out the possibility that this correlation is due to other factors, such as socio-economic status or language ability. Exactly how spatial ability affects mathematical skills in children is still unclear, but the spatial conception of numbers might play a role.
    The findings are based on the analysis of data from 586 children in Basel, Switzerland. As part of a project on language acquisition of German as a second language, the researchers gave three-year-old children a series of tasks to test cognitive, socio-emotional and spatial abilities. For example, the children were asked to arrange colored cubes in certain shapes. The researchers repeated these tests four times at an interval of about 15 months and compared the results with the academic performance of seven-year-old children in the first grade.
    The researchers also closely examined whether the pace of development, i.e. particularly rapid development of spatial abilities, can predict future mathematical ability. Past studies with a small sample size had found a correlation, but Möhring and her colleagues were unable to confirm this in their own study. Three-year-old children who started out with low spatial abilities improved them faster in the subsequent years, but still performed at a lower level in mathematics when they were seven years old. Despite faster development, by the time they began school these children had still not fully caught up with the children possessing higher initial spatial reasoning skills.
    “Parents often push their children in the area of language skills,” says Möhring. “Our results suggest how important it is to cultivate spatial reasoning at an early age as well.” There are simple ways to do this, such as using “spatial language” (larger, smaller, same, above, below) and toys — e.g. building blocks — that help improve spatial reasoning ability.
    Spatial reasoning and gender
    The researchers found that boys and girls are practically indistinguishable in terms of their spatial reasoning ability at the age of three, but in subsequent years this develops more slowly in girls. Möhring and her colleagues suspect that boys may hear more “spatial language” and that toys typically designed for boys often promote spatial reasoning, whereas toys for girls focus mainly on social skills. Children may also internalize their parents’ and teachers’ expectations and then, as they grow up, live up to stereotypes, such as the notion that women do not perform as well in the areas of spatial reasoning and mathematics as men.
    Story Source:
    Materials provided by University of Basel. Note: Content may be edited for style and length.