Response selection and turn-taking for a sensitive artificial listening agent


Maat, Mark ter (2011) Response selection and turn-taking for a sensitive artificial listening agent. thesis.

Abstract: This thesis focusses on two aspects of the interaction between a user and a
virtual human, namely the perception of turn-taking strategies and the selection
of appropriate responses. This research was carried out in the context of
the SEMAINE project, in which a virtual listening agent was built: a virtual
agent that tries to keep the user talking for as long as possible. The
system comprises four characters, each with a distinct emotional
state: a happy, a gloomy, an aggressive, and a pragmatic one.
The first part describes the study of how different turn-taking strategies
used by a dialogue system influence the perception that users have of that
system. The strategies vary the start time of the next
turn (starting before the user's turn has finished, directly when it finishes,
or after a small pause) and the reaction when overlapping speech is
detected (stop speaking, continue normally, or continue with a raised voice).
These strategies were evaluated in two studies. In the first study, users had
to listen to simulated, non-intelligible conversations in which one participant
used a predetermined turn-taking strategy. In the second study, users were
interviewed by a dialogue system, but the exact timing of each question was
controlled by a human wizard. After each study, the users had to complete
a questionnaire containing semantic differential scales about how they perceived
the participant in the conversation.
The final part describes the response selection of the listening agent. We
decided to select an appropriate response based on the non-verbal input,
rather than on the content of the user's speech, to make the listening agent
capable of responding appropriately regardless of the topic. This thesis first
describes the handcrafted models and then the more data-driven approach.
In the data-driven approach, humans annotated videos containing user turns with appropriate
possible responses. Classifiers were then used to learn how to respond
after a user's turn. The classifiers were tested by letting them predict appropriate
responses for new fragments and letting humans rate these responses. We
found that some classifiers produced significantly more appropriate responses
than a random model.
