Can a robot laugh with you?
Spoken dialogue systems must be able to express empathy to achieve natural interaction with human users. Laughter is one such empathetic behavior, but generating it requires a high level of dialogue understanding.
Thus, implementing laughter in existing systems, such as in conversational robots, has been challenging.
As a first step toward solving this problem, rather than generating laughter from user dialogue, we focus on “shared laughter,” in which the user laughs first (the initial laugh), using either a solo laugh or a speech laugh, and the system laughs in turn (the response laugh).
The proposed system consists of three models: 1) initial laugh detection, 2) shared laughter prediction, and 3) laugh type selection.
We trained each model on a corpus of human–robot speed dating dialogues. The first model, a recurrent neural network, detects initial laughs with an F1 score of 82.6 percent. The second model uses the acoustic and prosodic features of the initial laugh and predicts shared laughter with accuracy above chance. The third model uses the same features of the initial laugh to select the type of the system’s response laugh: social or mirthful.
We then implemented the full shared laughter generation system in an attentive listening dialogue system and conducted a dialogue listening experiment. The proposed system improved impressions of the dialogue system, such as perceived empathy, compared with a naive baseline without laughter and with a reactive system that always responded with social laughs only.
We propose that our system can be used in situated robot interaction and emphasize the need to integrate appropriate empathetic laughter into conversational robots and agents.
1 Introduction
Dialogue systems are commonly implemented in robots and virtual agents, with applications in task-based and conversational scenarios. For conversational scenarios, the focus is on natural language processing and on the emulation of other real conversational phenomena, such as backchannels, turn-taking, and fillers (Inoue et al., 2016; Hara et al., 2018; Hussain et al., 2019; Lala et al., 2019; Skantze, 2021). Laughter is another such phenomenon.
The implementation of a laughter model is a non-trivial task. Systems that try to emulate everyday conversation still struggle with deciding when to laugh. Laughter stimuli are not always explicit, although humor recognition in a textual medium can produce reasonable results (Chen and Soo, 2018; Weller and Seppi, 2019; Annamoradnejad and Zoghi, 2020).
Situated conversation presents issues such as speech recognition errors, prosody, and timing that may complicate a system’s ability to respond adequately to a joke in real time. Furthermore, the type of laughter used as a reaction to a stimulus can influence the atmosphere of a conversation.
For example, a user who describes an unfortunate event they experienced may be satisfied with a sympathetic chuckle, but a cheerful laugh would be inappropriate and could make the user feel embarrassed.
Given these types of issues in situated conversation, we propose another method of laughter implementation, shared laughter, in which the user initially laughs, and then the system responds with laughter as an empathetic response. In the case of human–human behavior, research has suggested that shared laughter can be framed in terms of speaker invitation and listener acceptance (Glenn, 1991).
Furthermore, it is clear that not all speaker laughs are invitations to respond with a shared laugh, so humans have to decide when it is appropriate to respond with laughter (Holt, 2010; Bonin et al., 2014). The prosodic and acoustic features of the laughter itself also differ between initial and response laughs (Truong and Trouvain, 2014). Therefore, shared laughter in the real world cannot be reduced to a call-and-response mechanism. This motivates us to construct a more appropriate computational model for shared laughter.
From a systems perspective, the shared laughter approach requires completing several sub-tasks. The first sub-task is laughter detection, which simply determines whether a user has laughed. Once a laugh is detected, the next step is to decide whether the agent should laugh in response. Our study demonstrates that shared laughter occurs less often than unshared laughter.
We also propose that the type of laughter generated by the system should be considered. Although humans have a vast range of laughs, a system may be restricted to only a few fixed laugh utterances. We show that more subtle “social” laughs are a necessary system feature, in addition to more explicit “mirthful” laughs.
The basis of our work is to implement this type of model in a real-time system. Given a user’s utterance, the system should be able to detect whether the utterance is a laugh, predict whether it should engage in shared laughter, and finally select the type of laugh to use as a reaction, as sketched below.
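To make the cascade concrete, the following is a minimal sketch of such a pipeline in Python. The class and function names, thresholds, and random placeholder decisions are hypothetical illustrations of the module interfaces, not our actual implementation; the real models are the trained detector, predictor, and selector described later in this article.

import random
from dataclasses import dataclass
from enum import Enum


class LaughType(Enum):
    NONE = "none"          # stay silent
    SOCIAL = "social"      # subtle, polite chuckle
    MIRTHFUL = "mirthful"  # explicit, cheerful laugh


@dataclass
class Utterance:
    """One unit of user audio with its extracted features (placeholders)."""
    acoustic_features: list   # e.g., power and pitch statistics
    prosodic_features: list   # e.g., F0 contour and duration


def detect_initial_laugh(utt: Utterance) -> bool:
    # Stage 1: is this utterance a laugh? (an RNN detector in our system;
    # replaced by a random stub here so the sketch runs stand-alone)
    return random.random() < 0.1


def predict_shared_laugh(utt: Utterance) -> bool:
    # Stage 2: is this laugh an invitation the system should accept?
    return random.random() < 0.5


def select_laugh_type(utt: Utterance) -> LaughType:
    # Stage 3: choose a social or mirthful response laugh.
    return LaughType.SOCIAL if random.random() < 0.7 else LaughType.MIRTHFUL


def respond(utt: Utterance) -> LaughType:
    # Run the full cascade for one user utterance.
    if not detect_initial_laugh(utt):
        return LaughType.NONE   # no laugh detected
    if not predict_shared_laugh(utt):
        return LaughType.NONE   # laugh detected, but not an invitation
    return select_laugh_type(utt)

The point of the cascade is that each stage can veto the response, so a false detection or a non-inviting laugh does not automatically produce an ill-timed response laugh.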
This study analyzes and annotates a large corpus of human–robot interactions to understand the frequency and types of laughter used in shared laughter. We then extract training data to create models that address the three tasks described earlier. Finally, we conduct a subjective experiment to evaluate the implemented model in a listening task. This work uses Japanese as the target language, with the goal of implementing the shared laughter model in the android ERICA (Inoue et al., 2016).
Our contributions are useful for researchers in conversational dialogue systems. We show that shared laughter improves the perception of the system and that the type of laughter is influential. The sub-tasks of the shared laughter system can be modularized to enable incremental progress. The current implementation uses only audio data and can function in real time, making it relatively easy to generalize to other conversational systems.
The remainder of this article is organized as follows: Section 2 summarizes related work. Section 3 introduces the dialogue corpus and the annotation of shared laughter samples. Section 4 explains the proposed system, each module, and their evaluations. Section 5 evaluates, via crowdsourcing, the shared laughter dialogue samples generated by the proposed system, and we then discuss limitations and future work before concluding.
2 Related work
Laughter has been well studied in the scientific literature, including analyses of its function in human conversation and interaction (Provine, 2001; Glenn, 2003). Hearing laughter from others is known to trigger our own laughter and to be “contagious” (Provine, 1992), for example, when we see and hear laughter on television. Shared laughter in conversation is a behavior and expression that arises as a form of mimicry (Estow et al., 2007; Navarretta, 2016).
Furthermore, shared laughter has been the focus of conversation analysis in which the intensity, timing, and the type of response laugh have been systematically studied to identify patterns in laughter behavior (Bonin et al., 2014; Gupta et al., 2015; El Haddad et al., 2019). One study has examined laughter in the context of human–robot interactions (Batliner et al., 2019).
From an intelligent systems perspective, automatic laughter detection is arguably the most common modeling task, with both audio and visual features used for training (Truong and Van Leeuwen, 2007; Cosentino et al., 2016; Turker et al., 2017; Akhtar et al., 2018; Kantharaju et al., 2018; Ataollahi and Suarez, 2019; Gosztolya and Tóth, 2019), and ubiquitous devices such as computer microphones and web cameras can provide reasonably accurate detection.
These studies primarily distinguish between speech and laughter for an inter-pausal unit (IPU) or perform detection using a continuous model. They often try to capture information to predict user engagement or other internal states rather than dynamically respond to the actual laugh. Our previous work trained a model to predict shared laughter, although subjective experiments were not conducted, and the method of extracting training samples was not thorough (Lala et al., 2020).
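As a minimal illustration of the IPU-level formulation, the sketch below classifies a single audio segment as laughter or speech from MFCC summary statistics. The feature set (13 MFCCs, mean and standard deviation) and the SVM classifier are our own illustrative choices for this sketch, not the methods of any specific study cited above.

import numpy as np
import librosa
from sklearn.svm import SVC


def ipu_features(path, sr=16000):
    # Summarize one inter-pausal unit as the mean and std of its MFCCs.
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape (13, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])


def train_ipu_detector(paths, labels):
    # paths: IPU audio files; labels: 1 = laughter, 0 = speech.
    X = np.stack([ipu_features(p) for p in paths])
    clf = SVC(kernel="rbf")   # a simple baseline classifier
    return clf.fit(X, labels)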
Naturally, researchers have also attempted to integrate laughing behavior into robots and agents. An early example is the AVLaughterCycle (Urbain et al., 2009), which detects laughter from the user and mimics it through the virtual agent Greta.
Subsequent studies expanded on this by detecting features of the user’s laughter to generate a more suitable laugh for the agent (Niewiadomski et al., 2013; Hofmann et al., 2015) and to increase the user’s engagement and amusement. Another study used laughter in a social robot during a quiz game to analyze user engagement (Türker et al., 2017).
These studies used an external stimulus as a trigger for laughter, such as a funny video or game. Our study targets a different scenario, where the interaction is a dyadic conversation. In this case, we assume that the trigger for laughter is based entirely on the content of the conversation.
Laughter generation in robots and agents has also been addressed, but primarily in terms of animation or movement to produce realistic laughs (Niewiadomski and Pelachaud, 2012; Ishi et al., 2016a; Ishi et al., 2019). For the majority of agents, the range of laughter utterances is restricted by the text-to-speech system, which cannot generate speech laughs, although recent research has addressed this issue by producing a large variety of natural-sounding laughs (Mori et al., 2019; Tits et al., 2020; Luong and Yamagishi, 2021).
Our study is positioned as an integrated laughter system for dyadic chatting conversation. We intend the system to be used in a real-time, situated environment and as a module that can be integrated into existing agents and social robot systems rather than as a stand-alone system. We emphasize that decisions regarding whether and how to laugh are a necessary component of a laughter system, and current agent systems tend to overlook the importance of laugh type selection.