Brighton Pavilion

10th Annual Conference of the International Speech Communication Association

ISCA Interspeech 2009 Brighton

Tutorials Day - Sunday 6 September 2009

T-8: Statistical approaches to dialogue systems

Presented by Jason Williams, Steve Young and Blaise Thomson

Outline

Although spoken dialog systems are widely deployed in numerous commercial applications, they are far from a solved problem. Speech recognition errors and the unpredictability of users’ behavior often confound dialog systems, leading to failed interactions. In the absence of perfect speech recognition, advances in higher-level dialog technology hold the promise of enabling more robust spoken interfaces, and expanding the scope of tasks suitable for real-world spoken dialog systems.

Recent work has explored applying statistical techniques to spoken dialogue systems. For example, whereas traditional techniques track a single hypothesis for the dialog state and rely on local confidence scores, recent work tracks a “beam” of plausible dialog states. Evidence that recurs across N-best lists can be combined, and a proper probability estimate can be assigned to each hypothesis based on all recognitions over the course of the dialog. In addition, techniques like reinforcement learning have been applied to choosing system actions. This enables more features of the dialog history to inform action choices, and allows candidate dialog paths to be explored automatically to find optimal sequences of actions in far greater detail than a human designer could feasibly do.
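As a rough illustration of how evidence from several N-best lists can be combined into one probability per hypothesis, the following minimal sketch applies a Bayesian update to a belief over a single slot. It is not the presenters' method. The function name `update_belief`, the example city values, the confidence scores, and the rule that leftover confidence mass is shared evenly by unlisted values are all assumptions made for illustration, as is the simplification that the user's goal does not change during the dialog.

```python
def update_belief(belief, nbest):
    """Return the belief after one user turn (hypothetical helper).

    belief: dict mapping each candidate value to its current probability.
    nbest:  dict mapping recognized values to confidence scores (sum <= 1).
    Assumption: the user's goal is fixed for the whole dialog.
    """
    unlisted = [v for v in belief if v not in nbest]
    listed_mass = sum(nbest.values())
    # Assumption: confidence mass not given to any N-best entry is shared
    # evenly by the values the recognizer did not list.
    leftover = max(1.0 - listed_mass, 0.0) / len(unlisted) if unlisted else 0.0

    # Multiply the prior by the per-turn evidence, then renormalize.
    unnormalized = {v: p * nbest.get(v, leftover) for v, p in belief.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        return dict(belief)  # no usable evidence this turn; keep the prior
    return {v: p / total for v, p in unnormalized.items()}


# Illustrative data: "boston" is never the top recognition result,
# but it appears on both N-best lists.
cities = ["boston", "austin", "houston"]
belief = {city: 1.0 / len(cities) for city in cities}

turns = [
    {"austin": 0.5, "boston": 0.4},
    {"houston": 0.5, "boston": 0.4},
]
for nbest in turns:
    belief = update_belief(belief, nbest)

best = max(belief, key=belief.get)
print(best, belief)
```

In this made-up example, a system that tracks only the top hypothesis of each turn would see two conflicting answers, whereas the accumulated belief favors the value that both lists have in common.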

These ideas have been formalized into models based on Markov decision processes (MDPs), partially observable Markov decision processes (POMDPs), and Bayesian networks with utility maximization, among others. Specialized versions of these techniques have been tailored to the real-time dialog management problem. Empirically, systems implemented using these techniques have been shown to outperform traditional methods in simulation and with real people.

Even so, many of these techniques come from disciplines peripheral to the speech, language, and dialog research community. For example, POMDPs and reinforcement learning come from the operations research and AI traditions, and are uncommon in speech and language applications. It can be difficult to acquaint oneself with this research area without a substantial investment in learning the underlying methods.

The objective of this tutorial is to provide a comprehensive, cohesive overview of statistical techniques in dialog management for the newcomer. Specifically, we will start by motivating the research area, showing how traditional techniques fail and, intuitively, why statistical techniques would be expected to do better. Then, in a classroom-style presentation, we will explain the core algorithms and how they have been applied to spoken dialogue systems. Our intention is to provide a cohesive treatment of the techniques using a unified, common notation in order to give the audience a clear understanding of how the techniques interrelate. Finally, we will report results from the literature to provide an indication of the impact in practice. Throughout the tutorial we will draw on both our own work and the literature (with citations throughout), and wherever possible we will use audio/video recordings of interactions to illustrate operation. We will provide lecture notes and a comprehensive bibliography. Our aim is that attendees of this course should be able to readily read papers in this area, comment on them meaningfully, and (we hope!) suggest avenues for future work and begin research enquiries of their own in an area rich in open challenges.

Speaker Biographies

Jason Williams is Principal Member of Technical Staff at AT&T Labs – Research. He received a BSE in Electrical Engineering from Princeton University in 1998, and at Cambridge University he received an M.Phil. in Computer Speech and Language Processing in 1999 and a Ph.D. in Information Engineering in 2006. His main research interests are dialog management, the design of spoken language systems, and planning under uncertainty. He has more than 20 technical publications, and has given over 20 technical talks to conferences, workshops, and research groups. For 3 years he taught the spoken dialog systems portion of Cambridge’s M.Phil. course in Computer Speech, Text and Internet Technology. He is currently Editor-in-chief of the IEEE SLTC’s Newsletter. Prior to entering research, he built commercial spoken dialogue systems for Tellme Networks (now Microsoft) and others. He also served as a consultant with McKinsey & Company’s Business Technology Office.

Steve Young is Professor of Information Engineering and Head of the Information Engineering Division at Cambridge University, UK. His main research interests lie in the area of spoken language systems including speech recognition, speech synthesis and dialogue management. He was the original author of the HTK toolkit and a key contributor in the development of large vocabulary speech recognition systems. More recently, he has pioneered the development of statistical approaches to dialogue management. He is a Fellow of the Royal Academy of Engineering, the Institution of Electrical Engineers (IEE), the Institute of Electrical and Electronics Engineers (IEEE) and the RSA. He is also a member of the British Computer Society (BCS). He was Editor of Computer Speech and Language from 1993 to 2004. In 2004, he was a recipient of an IEEE Signal Processing Society Technical Achievement Award, and in 2008 he was elected Fellow of the International Speech Communication Association.

Blaise Thomson is currently completing a Ph.D. in Statistical Dialogue Systems at the University of Cambridge. He received a B.Sc. in Actuarial Science, Mathematics, Statistics and Computer Science from the University of Cape Town in 2003, and an M.Phil. in Computer Speech, Text and Internet Technologies from Cambridge University in 2006. He has 9 technical publications and is a co-chair of the 2009 ACL Student Research Workshop. His main research interests are in spoken dialog systems, user modeling and learning algorithms.