Brighton Pavilion

10th Annual Conference of the International Speech Communication Association

ISCA Interspeech 2009 Brighton

Technical Programme

This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.

Wed-Ses2-S1:
Special Session: Active Listening & Synchrony

Time: Wednesday 13:30 Place: East Wing 4 Type: Special
Chair: Nick Campbell & Joakim Gustafson

13:30 Understanding Speaker-Listener Interactions

Dirk Heylen (University of Twente)

We provide an eclectic generic framework to understand the back-and-forth interactions between participants in a conversation, highlighting the complexity of the actions that listeners are engaged in. Communicative actions of one participant implicate the "other" in many ways. In this paper, we try to enumerate some essential relevant dimensions of this reciprocal dependence.

13:50 Detecting changes in speech expressiveness in participants of a radio program

Plínio Barbosa (Speech Prosody Studies Group/Dep. of Linguistics/Inst.Est. Ling., Univ. of Campinas, Brazil)

A method for speech expressiveness change detection is presented which combines a dimensional analysis of speech expression, a Principal Component Analysis technique, and multiple regression analysis. From the three inferred rates of activation, valence, and involvement, two PCA factors explain 97% of the variance of the judges' evaluations of a corpus of radio show interaction. The multiple regression analysis predicted the values of the two listener-oriented, PCA-derived dimensions of promptness and empathy from the acoustic parameters automatically obtained from a set of 206 utterances produced by the radio show's participants. Analysed chronologically, the utterances reveal expression change from automatic acoustic analysis.

14:10 An Audio-Visual Approach to Measuring Discourse Synchrony in Multimodal Conversation Data

Nick Campbell (Trinity College Dublin)

This paper describes recent work on the automatic extraction of visual and audio parameters relating to the detection of synchrony in discourse, and to the modelling of active listening for advanced speech technology. It reports findings based on image processing that reliably identify the strong entrainment between members of a group conversation, and describes techniques for the extraction and analysis of such information.

14:30 Towards Flexible Representations for Analysis of Accommodation of Temporal Features in Spontaneous Dialogue Speech

Spyros Kousidis (Digital Media Center, Dublin Institute of Technology, Ireland)
David Dorran (Audio Research Group, Dublin Institute of Technology, Ireland)
Ciaran McDonnell (Digital Media Center, Dublin Institute of Technology, Ireland)
Eugene Coyle (Audio Research Group, Dublin Institute of Technology)

Current advances in spoken interface design point to a shift towards more “human-like” interaction, as opposed to the traditional “push-to-talk” approach. However, human dialogue is characterized by synchrony and multi-modality, and these properties are not captured by traditional representation approaches, such as turn succession. This paper proposes an alternative representation schema for recorded (human) dialogues, which employs per-frame averages of speaker turn distribution, in order to inform further analyses of temporal features (pauses and overlaps) in terms of inter-speaker accommodation. Preliminary results of such analyses are provided.
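
As a rough illustration of what a per-frame representation of speaker turns could look like, the sketch below splits a recording into fixed-length frames and computes, for each frame, the fraction of time each speaker is talking, together with the overlap and pause fractions. The abstract does not describe the authors' actual schema, so the frame length, the function names (`frame_activity`, `frame_overlap`) and the toy turn times are all assumptions made for this example.

```python
def interval_overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def frame_activity(turns, frame_len, total_len):
    """Fraction of each frame covered by the given (start, end) intervals.

    Assumes the intervals of a single speaker do not overlap each other.
    """
    n_frames = int(total_len // frame_len)
    fractions = []
    for i in range(n_frames):
        f_start, f_end = i * frame_len, (i + 1) * frame_len
        spoken = sum(interval_overlap(s, e, f_start, f_end) for s, e in turns)
        fractions.append(spoken / frame_len)
    return fractions


def frame_overlap(turns_a, turns_b, frame_len, total_len):
    """Fraction of each frame during which both speakers talk at once."""
    both = []
    for sa, ea in turns_a:
        for sb, eb in turns_b:
            start, end = max(sa, sb), min(ea, eb)
            if end > start:
                both.append((start, end))
    return frame_activity(both, frame_len, total_len)


# Hypothetical turn times in seconds for a 6-second, two-speaker dialogue.
speaker_a = [(0.0, 2.0), (5.0, 6.0)]
speaker_b = [(1.5, 4.0)]

act_a = frame_activity(speaker_a, 2.0, 6.0)
act_b = frame_activity(speaker_b, 2.0, 6.0)
overlap = frame_overlap(speaker_a, speaker_b, 2.0, 6.0)
# Silence is whatever part of the frame neither speaker covers.
pause = [1.0 - a - b + o for a, b, o in zip(act_a, act_b, overlap)]

print(act_a, act_b, overlap, pause)
```

Unlike a plain sequence of turns, a per-frame table of this kind keeps simultaneous speech and shared silence visible, which is the information an analysis of pauses and overlaps would need.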

14:50 Are we ‘in sync’: Turn-taking in collaborative dialogues

Štefan Beňuš (Constantine the Philosopher University, Nitra, Slovakia and Slovak Academy of Sciences, Bratislava, Slovakia)

We used a corpus of collaborative task-oriented dialogues in American English to compare two units of rhythmic structure – pitch accents and syllables – within the coupled oscillator model of rhythmical entrainment in turn-taking proposed in Wilson & Wilson (2005). We found that pitch accents are a slightly better fit than syllables as the unit of rhythmical structure for the model, but we also observed weak support for the model in general. Some turn-taking types such as 'pause interruptions' and 'backchanneling' had more salient rhythmical characteristics than others.

15:10 An Audio-Visual Attention System for Online Association Learning

Martin Heckmann (Honda Research Institute Europe GmbH)
Holger Brandl (Research Institute for Cognition and Robotics, University of Bielefeld)
Xavier Domont (University of Darmstadt, Institut für Automatisierungstechnik, FG Regelungstheorie)
Bram Bolder (Honda Research Institute Europe GmbH)
Frank Joublin (Honda Research Institute Europe GmbH)
Christian Goerick (Honda Research Institute Europe GmbH)

We present an audio-visual attention system for speech-based interaction with a humanoid robot where a tutor can teach visual properties/locations (e.g. "left") and corresponding, arbitrary speech labels. The acoustic signal is segmented via the attention system and speech labels are learned from a few repetitions of the label by the tutor. The attention system integrates bottom-up stimulus-driven saliency calculation (delay-and-sum beamforming, adaptive noise level estimation) and top-down modulation (spectral properties, segment length, movement and interaction status of the robot). We evaluate the performance of different aspects of the system based on a small dataset.
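
The abstract names delay-and-sum beamforming as one ingredient of the bottom-up saliency calculation but gives no implementation details. The sketch below shows only the textbook idea: each microphone channel is advanced by its arrival delay so that the target signal lines up across channels, and the aligned channels are then averaged. Whole-sample delays, the function name `delay_and_sum` and the toy two-microphone signal are assumptions made for this example, not the authors' system.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay and average them.

    channels: list of equal-rate sample lists, one per microphone.
    delays:   arrival delay of the target at each microphone, in samples.
    """
    # Only keep time steps for which every shifted channel has a sample.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(n)
    ]


# Hypothetical source signal that reaches microphone 1 two samples late.
source = [0, 1, 0, -1, 0, 1, 0, -1]
mic0 = source + [0, 0]
mic1 = [0, 0] + source

steered = delay_and_sum([mic0, mic1], [0, 2])
print(steered)
```

When the delays match the true direction of the source, the channels add coherently and the source is recovered; sound arriving with other delays is misaligned and partly cancels in the average, which is what makes the output usable as a direction-dependent cue.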