A robot moves a toy package of butter around a table in the Intelligent Robotics and Vision Lab at The University of Texas at Dallas. With every push, the robot is learning to recognize the object through a new system developed by a team of UT Dallas computer scientists.
The new system has the robot push objects multiple times to collect a sequence of images, which the system then uses to segment all the objects in the sequence until the robot recognizes them. Previous approaches have relied on a single push or grasp by the robot to “learn” the object.
The team presented its research paper at the Robotics: Science and Systems conference July 10-14 in Daegu, South Korea. Papers for the conference are selected for their novelty, technical quality, significance, potential impact and clarity.
The day when robots can cook dinner, clear the kitchen table and empty the dishwasher is still a long way off. But the research group has made a significant advance with its robotic system that uses artificial intelligence to help robots better identify and remember objects, said Dr. Yu Xiang, senior author of the paper.
“If you ask a robot to pick up the mug or bring you a bottle of water, the robot needs to recognize those objects,” said Xiang, assistant professor of computer science in the Erik Jonsson School of Engineering and Computer Science.
The UTD researchers’ technology is designed to help robots detect a wide variety of objects found in environments such as homes and to generalize, or identify, similar versions of common items such as water bottles that come in varied brands, shapes or sizes.
Inside Xiang’s lab is a storage bin full of toy packages of common foods, such as spaghetti, ketchup and carrots, which are used to train the lab robot, named Ramp. Ramp is a Fetch Robotics mobile manipulator robot that stands about 4 feet tall on a round mobile platform. Ramp has a long mechanical arm with seven joints. At the end is a square “hand” with two fingers to grasp objects.
Xiang said robots learn to recognize items much as children learn to interact with toys.
“After pushing the object, the robot learns to recognize it,” Xiang said. “With that data, we train the AI model so the next time the robot sees the object, it does not need to push it again. By the second time it sees the object, it will just pick it up.”
What is new about the researchers’ method is that the robot pushes each item 15 to 20 times, whereas previous interactive perception methods use only a single push. Xiang said multiple pushes enable the robot to take more photos with its RGB-D camera, which includes a depth sensor, to learn about each item in more detail. This reduces the potential for mistakes.
The task of recognizing, differentiating and remembering objects, called segmentation, is one of the primary functions needed for robots to complete tasks.
“To the best of our knowledge, this is the first system that leverages long-term robot interaction for object segmentation,” Xiang said.
Ninad Khargonkar, a computer science doctoral student, said working on the project has helped him improve the algorithm that helps the robot make decisions.
“It’s one thing to develop an algorithm and test it on an abstract data set; it’s another thing to test it out on real-world tasks,” Khargonkar said. “Seeing that real-world performance — that was a key learning experience.”
The next step for the researchers is to improve other functions, including planning and control, which could enable tasks such as sorting recycled materials.
Other UTD authors of the paper included computer science graduate student Yangxiao Lu; computer science seniors Zesheng Xu and Charles Averill; Kamalesh Palanisamy MS’23; Dr. Yunhui Guo, assistant professor of computer science; and Dr. Nicholas Ruozzi, associate professor of computer science. Dr. Kaiyu Hang from Rice University also participated.
The research was supported in part by the Defense Advanced Research Projects Agency as part of its Perceptually-enabled Task Guidance program, which develops AI technologies to help users perform complex physical tasks by providing task guidance with augmented reality to expand their skill sets and reduce errors.
Conference paper submitted to arXiv: https://arxiv.org/abs/2302.03793