Speech and language recognition technology is a rapidly developing field, which has led to the emergence of novel speech dialog systems, such as Amazon Alexa and Siri. A significant milestone in the development of dialog artificial intelligence (AI) systems is the addition of emotional intelligence. A system able to recognize the emotional states of the user, in addition to understanding language, would generate a more empathetic response, leading to a more immersive experience for the user.
“Multimodal sentiment analysis” is a group of methods that constitute the gold standard for an AI dialog system with sentiment detection. These methods can automatically analyze a person’s psychological state from their speech, voice color, facial expression, and posture and are crucial for human-centered AI systems. The technique could potentially realize an emotionally intelligent AI with beyond-human capabilities, which understands the user’s sentiment and generates a response accordingly.
However, current emotion estimation methods focus only on observable information and do not account for the information contained in unobservable signals, such as physiological signals. Such signals are a potential gold mine of emotional information that could substantially improve sentiment estimation performance.
In a new study published in the journal IEEE Transactions on Affective Computing, researchers from Japan added physiological signals to multimodal sentiment analysis for the first time. The collaborative team comprised Associate Professor Shogo Okada from Japan Advanced Institute of Science and Technology (JAIST) and Prof. Kazunori Komatani from the Institute of Scientific and Industrial Research at Osaka University. “Humans are very good at concealing their feelings. The internal emotional state of a user is not always accurately reflected by the content of the dialog, but since it is difficult for a person to consciously control their biological signals, such as heart rate, it may be useful to use these for estimating their emotional state. This could make for an AI with sentiment estimation capabilities that are beyond human,” explains Dr. Okada.
The team analyzed 2468 exchanges with a dialog AI, obtained from 26 participants, to estimate the level of enjoyment experienced by the user during the conversation. Participants were also asked to assess how enjoyable or boring they found the conversation. The team used the multimodal dialog data set “Hazumi1911,” which uniquely combines speech recognition, voice color sensors, and facial expression and posture detection with skin potential, a form of physiological response sensing.
“On comparing all the separate sources of information, the biological signal information proved to be more effective than voice and facial expression. When we combined the language information with biological signal information to estimate the self-assessed internal state while talking with the system, the AI’s performance became comparable to that of a human,” comments an excited Dr. Okada.
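To make the idea of combining modalities concrete, here is a minimal Python sketch of feature-level fusion: language features and physiological features for each exchange are concatenated into one vector before classification. This is an illustration only, not the study's method. The article does not describe the team's models or features, so the nearest-centroid classifier, the function names, and all the numbers below are assumptions made up for this example.

```python
from math import dist
from statistics import fmean


def fuse(language_features, physio_features):
    """Feature-level fusion: concatenate the two per-exchange feature vectors."""
    return list(language_features) + list(physio_features)


def fit_centroids(samples, labels):
    """Compute one mean vector per label (a nearest-centroid classifier)."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        centroids[label] = [fmean(column) for column in zip(*rows)]
    return centroids


def predict(centroids, sample):
    """Return the label whose centroid is closest to the sample."""
    return min(centroids, key=lambda label: dist(centroids[label], sample))


# Hypothetical toy data, NOT taken from Hazumi1911.
# Each row: (language features, physiological features, self-reported label).
# The language features look alike in every row, mimicking a user who
# conceals boredom in what they say; only the physiological value differs.
exchanges = [
    ([0.9, 0.1], [0.2], "enjoyed"),
    ([0.8, 0.3], [0.3], "enjoyed"),
    ([0.7, 0.2], [0.8], "bored"),
    ([0.9, 0.2], [0.9], "bored"),
]

fused = [fuse(lang, physio) for lang, physio, _ in exchanges]
labels = [label for _, _, label in exchanges]
centroids = fit_centroids(fused, labels)

# Classify a new exchange with polite wording but an elevated physiological value.
print(predict(centroids, fuse([0.85, 0.2], [0.85])))
```

In this toy setup the wording alone cannot separate the two groups, so the prediction is driven by the physiological feature. That mirrors, in a very simplified way, the motivation Dr. Okada gives for adding biological signals.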
These findings suggest that the detection of physiological signals in humans, which typically remain hidden from our view, could pave the way for highly emotionally intelligent AI-based dialog systems, making for more natural and satisfying human-machine interactions. Moreover, emotionally intelligent AI systems could help identify and monitor mental illness by sensing a change in daily emotional states. They could also come in handy in education, where the AI could gauge whether the learner is interested and excited about a topic of discussion, or bored, leading to changes in teaching strategy and more efficient educational services.
Story Source:
Materials provided by Japan Advanced Institute of Science and Technology.