More stories

  • Tool transforms world landmark photos into 4D experiences

    Using publicly available tourist photos of world landmarks such as the Trevi Fountain in Rome or Top of the Rock in New York City, Cornell University researchers have developed a method to create maneuverable 3D images that show changes in appearance over time.
    The method, which employs deep learning to ingest and synthesize tens of thousands of mostly untagged and undated photos, solves a problem that has eluded experts in computer vision for six decades.
    “It’s a new way of modeling scenes that not only allows you to move your head and see, say, the fountain from different viewpoints, but also gives you controls for changing the time,” said Noah Snavely, associate professor of computer science at Cornell Tech and senior author of “Crowdsampling the Plenoptic Function,” presented at the European Conference on Computer Vision, held virtually Aug. 23-28.
    “If you really went to the Trevi Fountain on your vacation, the way it would look would depend on what time you went — at night, it would be lit up by floodlights from the bottom. In the afternoon, it would be sunlit, unless you went on a cloudy day,” Snavely said. “We learned the whole range of appearances, based on time of day and weather, from these unorganized photo collections, such that you can explore the whole range and simultaneously move around the scene.”
    Representing a place in a photorealistic way is challenging for traditional computer vision, partly because of the sheer number of textures to be reproduced. “The real world is so diverse in its appearance and has different kinds of materials — shiny things, water, thin structures,” Snavely said.
    Another problem is the inconsistency of the available data. Describing how something looks from every possible viewpoint in space and time — known as the plenoptic function — would be a manageable task with hundreds of webcams affixed around a scene, recording data day and night. But since this isn’t practical, the researchers had to develop a way to compensate.
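    To make that concept concrete, the plenoptic function can be pictured as a single function from viewing position, direction and time to a color. The sketch below illustrates only this interface; it is not code from the paper:

        # Illustration only, not the paper's code: the plenoptic function
        # as an interface. A dense webcam array would sample it on a grid;
        # tourist photos sample it sparsely and irregularly.
        def plenoptic(x, y, z, theta, phi, t):
            """RGB color seen from position (x, y, z), looking in direction
            (theta, phi), at time t."""
            raise NotImplementedError("this is what the model learns to approximate")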
    “There may not be a photo taken at 4 p.m. from this exact viewpoint in the data set. So we have to learn from a photo taken at 9 p.m. at one location, and a photo taken at 4:03 from another location,” Snavely said. “And we don’t know the granularity of when these photos were taken. But using deep learning allows us to infer what the scene would have looked like at any given time and place.”
    The researchers introduced a new scene representation called Deep Multiplane Images to interpolate appearance in four dimensions — 3D, plus changes over time. Their method is inspired in part by a classic animation technique developed by the Walt Disney Company in the 1930s, which uses layers of transparencies to create a 3D effect without redrawing every aspect of a scene.
    “We use the same idea invented for creating 3D effects in 2D animation to create 3D effects in real-world scenes, to create this deep multilayer image by fitting it to all these disparate measurements from the tourists’ photos,” Snavely said. “It’s interesting that it kind of stems from this very old, classic technique used in animation.”
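    As a rough sketch of the multiplane idea (array shapes and names here are illustrative, not the paper's implementation), a view is rendered by alpha-compositing a stack of semi-transparent color layers from back to front:

        import numpy as np

        def composite_multiplane(layers):
            """Composite a stack of (H, W, 4) RGBA planes, ordered nearest
            first, with the standard 'over' operator from back to front."""
            out = np.zeros(layers[0].shape[:2] + (3,))
            for rgba in reversed(layers):                # farthest plane first
                rgb, alpha = rgba[..., :3], rgba[..., 3:4]
                out = alpha * rgb + (1.0 - alpha) * out
            return out

    Shifting each plane according to its depth before compositing is what produces the parallax of a new viewpoint; per the article, a deep network infers the layers themselves from the disparate tourist photos.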
    In the study, they showed that this model could be trained to create a scene using around 50,000 publicly available images found on sites such as Flickr and Instagram. The method has implications for computer vision research, as well as virtual tourism — particularly useful at a time when few can travel in person.
    “You can get the sense of really being there,” Snavely said. “It works surprisingly well for a range of scenes.”
    First author of the paper is Cornell Tech doctoral student Zhengqi Li. Abe Davis, assistant professor of computer science in the Faculty of Computing and Information Science, and Cornell Tech doctoral student Wenqi Xian also contributed.
    The research was partly supported by philanthropist Eric Schmidt, former CEO of Google, and Wendy Schmidt, by recommendation of the Schmidt Futures Program.

    Story Source:
    Materials provided by Cornell University. Original written by Melanie Lefkowitz. Note: Content may be edited for style and length.

  • New perception metric balances reaction time, accuracy

    Researchers at Carnegie Mellon University have developed a new metric for evaluating how well self-driving cars respond to changing road conditions and traffic, making it possible for the first time to compare perception systems for both accuracy and reaction time.
    Mengtian Li, a Ph.D. student in CMU’s Robotics Institute, said academic researchers tend to develop sophisticated algorithms that can accurately identify hazards, but may demand a lot of computation time. Industry engineers, by contrast, tend to prefer simple, less accurate algorithms that are fast and require less computation, so the vehicle can respond to hazards more quickly.
    This tradeoff is a problem not only for self-driving cars, but also for any system that requires real-time perception of a dynamic world, such as autonomous drones and augmented reality systems. Yet until now, there’s been no systematic measure that balances accuracy and latency — the delay between when an event occurs and when the perception system recognizes that event. This lack of an appropriate metric has made it difficult to compare competing systems.
    The new metric, called streaming perception accuracy, was developed by Li, together with Deva Ramanan, associate professor in the Robotics Institute, and Yu-Xiong Wang, assistant professor at the University of Illinois at Urbana-Champaign. They presented it last month at the virtual European Conference on Computer Vision, where it received a best paper honorable mention award.
    Streaming perception accuracy is measured by comparing the output of the perception system at each moment with the ground-truth state of the world.
    “By the time you’ve finished processing inputs from sensors, the world has already changed,” Li explained, noting that the car has traveled some distance while the processing occurs.
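    A minimal sketch of that evaluation rule, assuming timestamped streams of predictions and ground truth (the names and data formats here are illustrative, not the benchmark's API): at every ground-truth instant, the system is scored on the most recent prediction it had actually finished computing.

        def streaming_pairs(preds, truths):
            """preds: list of (finish_time, prediction), sorted by time.
            truths: list of (time, ground_truth), sorted by time.
            Pairs each ground-truth instant with the latest prediction that
            was finished by then, so latency directly costs accuracy."""
            pairs, k = [], -1
            for t, truth in truths:
                while k + 1 < len(preds) and preds[k + 1][0] <= t:
                    k += 1               # advance to the newest finished prediction
                if k >= 0:
                    pairs.append((preds[k][1], truth))
            return pairs

    Any conventional accuracy measure, such as average precision for object detection, can then be computed over these pairs.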
    “The ability to measure streaming perception offers a new perspective on existing perception systems,” Ramanan said. Systems that perform well according to classic measures of performance may perform quite poorly on streaming perception. Optimizing such systems using the newly introduced metric can make them far more reactive.
    One insight from the team’s research is that the solution isn’t necessarily for the perception system to run faster, but to occasionally take a well-timed pause. Skipping the processing of some frames prevents the system from falling farther and farther behind real-time events, Ramanan added.
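    In scheduling terms, that pause might look like the following check (a toy sketch, not the authors' scheduler): process a frame only if its result would be ready before fresher input arrives.

        def should_skip(now, processing_time, next_frame_time):
            """Skip the current frame if its result would only be ready after
            a newer frame has arrived; this keeps the pipeline from drifting
            ever further behind real time."""
            return now + processing_time > next_frame_time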
    Another insight is to add forecasting methods to the perception processing. Just as a batter in baseball swings at where they think the ball is going to be — not where it is — a vehicle can anticipate some movements by other vehicles and pedestrians. The team’s streaming perception measurements showed that the extra computation necessary for making these forecasts doesn’t significantly harm accuracy or latency.
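    A toy version of that forecasting step (the team's actual forecaster is part of the paper; this constant-velocity extrapolation is only an illustration):

        def forecast_box(prev_box, cur_box, steps_ahead):
            """Constant-velocity forecast of a bounding box (x, y, w, h):
            extrapolate the change between the last two observations so the
            output describes where the object will be, not where it was."""
            return tuple(c + (c - p) * steps_ahead
                         for p, c in zip(prev_box, cur_box))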
    The CMU Argo AI Center for Autonomous Vehicle Research, directed by Ramanan, supported this research, as did the Defense Advanced Research Projects Agency.

    Story Source:
    Materials provided by Carnegie Mellon University. Original written by Byron Spice. Note: Content may be edited for style and length.

  • Virtual tourism could offer new opportunities for travel industry, travelers

    A new proposal for virtual travel, using advanced mathematical techniques and combining livestream video with existing photos and videos of travel hotspots, could help revitalize an industry that has been devastated by the coronavirus pandemic, according to researchers at the Medical College of Georgia at Augusta University.
    In a new proposal published in the Cell Press journal Patterns, Dr. Arni S.R. Srinivasa Rao, a mathematical modeler and director of the medical school’s Laboratory for Theory and Mathematical Modeling, and co-author Dr. Steven Krantz, a professor of mathematics and statistics at Washington University, suggest using data science to improve on existing television and internet-based tourism experiences. Their technique involves measuring and then digitizing the curvatures and angles of objects and the distances between them using drone footage, photos and videos, and could make virtual travel experiences more realistic for viewers and help revitalize the tourism industry, they say.
    They call this proposed technology LAPO, or Live Streaming with Actual Proportionality of Objects. LAPO employs both information geometry — the measures of an object’s curvatures, angles and area — and conformal mapping, which uses the measures of angles between the curves of an object and accounts for the distance between objects, to make images of people, places and things seem more real.
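    LAPO itself is only proposed in the paper, but the angle preservation it leans on is easy to check numerically. In this sketch, the conformal map f(z) = z**2 (conformal away from z = 0) scales and rotates tangent directions at a point by its derivative, leaving the angle between them unchanged:

        import numpy as np

        def angle_between(u, v):
            """Angle between two directions encoded as complex numbers."""
            return np.arccos(np.real(u * np.conj(v)) / (abs(u) * abs(v)))

        z0 = 1.0 + 1.0j                         # point at which angles are compared
        d1, d2 = 1.0 + 0.0j, np.exp(0.7j)       # two tangent directions at z0
        df = 2 * z0                             # derivative of f(z) = z**2 at z0
        print(angle_between(d1, d2))            # 0.7 radians before the map
        print(angle_between(df * d1, df * d2))  # 0.7 radians after the map as well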
    “This is about having a new kind of technology that uses advanced mathematical techniques to turn digitized data, captured live at a tourist site, into more realistic photos and videos with more of a feel for the location than you would get watching a movie or documentary,” says corresponding author Rao. “When you go see the Statue of Liberty for instance, you stand on the bank of the Hudson River and look at it. When you watch a video of it, you can only see the object from one angle. When you measure and preserve multiple angles and digitize that in video form, you could visualize it from multiple angles. You would feel like you’re there while you’re sitting at home.”
    Their proposed combination of techniques is novel, Rao says. “Information geometry has seen wide applications in physics and economics, but the angle preservation of the captured footage is never applied,” he says.
    Rao and Krantz say the technology could help mediate some of the pandemic’s impact on the tourism industry and offer other advantages.
    Those include cost-effectiveness, because virtual tourism would be cheaper; health safety, because it can be done from the comfort of home; time savings, because travel time is eliminated; accessibility, because tourism hotspots not routinely accessible to seniors or those with physical disabilities would become so; safety and security, because risks such as becoming a victim of crime while traveling are eliminated; and minimal equipment, because a standard home computer with a graphics card and internet access is all that’s needed to enjoy a “virtual trip.”
    “Virtual tourism (also) creates new employment opportunities for virtual tour guides, interpreters, drone pilots, videographers and photographers, as well as those building the new equipment for virtual tourism,” the authors write.
    “People would pay for these experiences like they pay airlines, hotels and tourist spots during regular travel,” Rao says. “The payments could go to each individual involved in creating the experience or to a company that creates the entire trip, for example.”
    Next steps include looking for investors and partners in the hospitality, tourism and technology industries, he says.
    The World Travel and Tourism Council, the trade group representing major global travel companies, projects a global loss of 75 million jobs and $2.1 trillion in revenue if the pandemic continues for several more months.
    Rao is a professor of health economics and modeling in the MCG Department of Population Health Sciences.

  • A new method for directed networks could help multiple levels of science

    Many complex systems have underlying networks: they have nodes which represent units of the system and their edges indicate connections between the units. In some contexts, the connections are symmetric, but in many they are directed, for example, indicating flows from one unit to another or which units affect which other units.
    A prime example of this is a food web, in which the nodes represent species and there is a directed edge from each species to those which eat it. In a directed network, the ecological concept of ‘trophic level’ allows one to assign a height to each node in such a way that on average the height goes up by one along each edge.
    The trophic levels can help to associate function to nodes, for example, plant, herbivore, carnivore in a food web. The concept was reinvented in economics, where it is called ‘upstreamness’, though it can be traced back to Leontief and the ‘output multiplier’. It is also an ingredient in the construction of SinkRank, a measure of contribution to systemic risk.
    Alongside ‘trophic level’, there is also ‘trophic incoherence’; this is the standard deviation of the distribution of height differences along edges and it gives a measure of the extent to which the directed edges fail to line up. The trophic incoherence is an indicator of network structure that has been related to stability, percolation, cycles, normality and various other system properties.
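    As a concrete sketch of those two classic quantities (conventions assumed here: W[i, j] weights the edge from node i to node j, and incoherence is taken as the standard deviation of level differences over edges):

        import numpy as np

        def classic_trophic_levels(W):
            """Basal nodes (no incoming edges) sit at level 1; every other node
            sits one above the weighted mean level of the nodes feeding it."""
            n = len(W)
            w_in = W.sum(axis=0)
            d = np.where(w_in > 0, w_in, 1.0)  # avoid dividing by zero at basal nodes
            return np.linalg.solve(np.eye(n) - W.T / d[:, None], np.ones(n))

        def trophic_incoherence(levels, W):
            """Standard deviation of (h_j - h_i) over edges i -> j:
            0 means every edge climbs exactly one level."""
            i, j = np.nonzero(W)
            return float(np.std(levels[j] - levels[i]))

        # Toy food web: plant (0) -> herbivore (1) -> carnivore (2)
        W = np.zeros((3, 3))
        W[0, 1] = W[1, 2] = 1.0
        h = classic_trophic_levels(W)           # [1., 2., 3.]
        print(h, trophic_incoherence(h, W))     # perfectly coherent: incoherence 0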
    Trophic level and incoherence are limited in various ways, however: they require the network to have basal nodes (ones with no incoming edges); they give those basal nodes too much emphasis; when there is more than one basal node, they do not give a stable way to determine levels and incoherence for a piece of a network; and they do not give a natural notion of maximal incoherence.
    In the paper, ‘How directed is a directed network?’, published today (9 September) in the journal Royal Society Open Science, researchers from the University of Warwick and the University of Birmingham reveal a new method for analysing hierarchies in complex networks and illustrate it with applications to economics, language and gene expression.
    The researchers introduce improved notions of trophic level and trophic coherence, which do not require basal or top nodes, are as easy to compute as the old notions, and are connected in the same way with network properties such as normality, cycles and spectral radius. They expect this to be a valuable tool in domains from ecology and biochemistry to economics, social science and humanities.
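    A minimal sketch of how the improved quantities can be computed, following the formulation in the published paper rather than anything given in this article: the improved levels minimise the weighted squared deviation of each edge from a one-level climb, and the residual of that fit is the new incoherence.

        import numpy as np

        def improved_trophic_levels(W):
            """Levels h minimising sum_ij W[i, j] * (h[j] - h[i] - 1)**2.
            No basal nodes are required; h is defined only up to an additive
            constant, fixed here so the lowest level is 0."""
            w_in, w_out = W.sum(axis=0), W.sum(axis=1)
            L = np.diag(w_in + w_out) - W - W.T      # singular, Laplacian-like
            h = np.linalg.lstsq(L, w_in - w_out, rcond=None)[0]
            return h - h.min()

        def trophic_incoherence_F0(W, h):
            """F0 in [0, 1]: 0 is a perfect hierarchy, 1 maximal incoherence."""
            i, j = np.nonzero(W)
            return float((W[i, j] * (h[j] - h[i] - 1.0) ** 2).sum() / W.sum())

        # A pure 3-cycle has no hierarchy: all levels tie and F0 = 1.
        W = np.zeros((3, 3))
        W[0, 1] = W[1, 2] = W[2, 0] = 1.0
        h = improved_trophic_levels(W)
        print(h, trophic_incoherence_F0(W, h))       # F0 = 1.0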
    Professor Robert MacKay, from the Mathematics Institute at the University of Warwick comments:
    “Our method makes hierarchical structure apparent in directed networks and quantifies the extent to which the edges do not line up. We expect it to be useful in disparate contexts, such as determining the extent of influence in a social network or organisational management, assessing the situation of the UK in the face of Brexit trade talks, illuminating how biochemical reaction networks function, and understanding how the brain works.”

    Story Source:
    Materials provided by University of Warwick. Note: Content may be edited for style and length.

  • Terahertz receiver for 6G wireless communications

    Future wireless networks of the 6th generation (6G) will consist of a multitude of small radio cells that need to be connected by broadband communication links. In this context, wireless transmission at THz frequencies represents a particularly attractive and flexible solution. Researchers have now developed a novel concept for low-cost terahertz receivers.