That’s funny — but AI models don’t get the joke
Large neural networks, a form of artificial intelligence, can generate thousands of jokes along the lines of “Why did the chicken cross the road?” But do they understand why they’re funny?
 Using hundreds of entries from the New Yorker magazine’s Cartoon Caption Contest as a testbed, researchers challenged AI models and humans with three tasks: matching a joke to a cartoon; identifying a winning caption; and explaining why a winning caption is funny.
 In all tasks, humans performed demonstrably better than machines, even as AI advances such as ChatGPT have closed the performance gap. So are machines beginning to “understand” humor? In short, they’re making some progress, but aren’t quite there yet.
 “The way people challenge AI models for understanding is to build tests for them — multiple choice tests or other evaluations with an accuracy score,” said Jack Hessel, Ph.D. ’20, research scientist at the Allen Institute for AI (AI2). “And if a model eventually surpasses whatever humans get at this test, you think, ‘OK, does this mean it truly understands?’ It’s a defensible position to say that no machine can truly `understand’ because understanding is a human thing. But, whether the machine understands or not, it’s still impressive how well they do on these tasks.”
 Hessel is lead author of “Do Androids Laugh at Electric Sheep? Humor ‘Understanding’ Benchmarks from The New Yorker Caption Contest,” which won a best-paper award at the 61st annual meeting of the Association for Computational Linguistics, held July 9-14 in Toronto.
 Lillian Lee ’93, the Charles Roy Davis Professor in the Cornell Ann S. Bowers College of Computing and Information Science, and Yejin Choi, Ph.D. ’10, professor in the Paul G. Allen School of Computer Science and Engineering at the University of Washington, and the senior director of common-sense intelligence research at AI2, are also co-authors on the paper.
 For their study, the researchers compiled 14 years’ worth of New Yorker caption contests — more than 700 in all. Each contest included: a captionless cartoon; that week’s entries; the three finalists selected by New Yorker editors; and, for some contests, crowd quality estimates for each submission. More
 