T-8: Statistical approaches to dialogue systems
Presented by
Jason Williams, Steve Young and Blaise Thomson
Outline
Although spoken dialog systems are widely deployed in commercial applications, they are far from a solved
problem. Speech recognition errors and the unpredictability of users’ behavior often confound dialog systems, leading
to failed interactions. In the absence of perfect speech recognition, advances in higher-level dialog technology hold
the promise of enabling more robust spoken interfaces, and expanding the scope of tasks suitable for real-world
spoken dialog systems.
Recent work has explored applying statistical techniques to spoken dialogue systems. For example, whereas traditional
techniques track a single hypothesis for the dialog state and rely on local confidence scores, recent work tracks
a “beam” of plausible dialog states. Evidence that recurs across N-best lists can be combined, and a proper probability
estimate can be assigned to each hypothesis based on all recognitions over the course of the dialog. In addition,
techniques like reinforcement learning have been applied to choosing system actions. This allows more features
of the dialog history to inform action choices, and allows candidate dialog paths to be explored automatically to find
optimal sequences of actions in far greater detail than a human designer could feasibly do.
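As a rough illustration of combining evidence across N-best lists, the following Python sketch tracks a probability distribution over a single slot (a destination city) across two turns. The observation model is an assumption made for this example only: each listed hypothesis uses its confidence score as a likelihood, and the leftover probability mass is spread evenly over unlisted values. The function name, city names, and scores are hypothetical and are not taken from any particular system.

```python
def update_belief(belief, nbest):
    """Bayes update of a belief over user goals given one N-best list.

    belief: dict mapping each candidate goal to its current probability.
    nbest:  dict mapping recognized hypotheses to confidence scores
            (assumed here to sum to at most 1).
    """
    unlisted = [goal for goal in belief if goal not in nbest]
    # Assumption: mass not claimed by the N-best list is shared
    # evenly among the goals that were not recognized at all.
    leftover = max(1.0 - sum(nbest.values()), 0.0)
    per_unlisted = leftover / len(unlisted) if unlisted else 0.0

    unnormalized = {
        goal: prior * nbest.get(goal, per_unlisted)
        for goal, prior in belief.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        return dict(belief)  # uninformative observation: keep the prior
    return {goal: value / total for goal, value in unnormalized.items()}


def track(goals, turns):
    """Start from a uniform prior and fold in each turn's N-best list."""
    belief = {goal: 1.0 / len(goals) for goal in goals}
    for nbest in turns:
        belief = update_belief(belief, nbest)
    return belief


goals = ["boston", "austin", "houston", "dallas"]
turns = [
    {"austin": 0.40, "boston": 0.30},   # turn 1: "boston" is second best
    {"houston": 0.40, "boston": 0.35},  # turn 2: "boston" is second best again
]
final_belief = track(goals, turns)
best_goal = max(final_belief, key=final_belief.get)
print(best_goal, round(final_belief[best_goal], 3))
```

In this toy run "boston" is never the top hypothesis in either list, yet it ends up with the highest probability because it is the only value supported by both recognitions. A tracker that kept only the top hypothesis of each turn would not recover it.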
These ideas have been formalized into models based on Markov decision processes (MDPs), partially observable
Markov decision processes (POMDPs), and Bayesian networks with utility maximization, among others. Specialized
versions of these techniques have been tailored to the real-time dialog management problem. Empirically, systems
implemented using these techniques have been shown to outperform traditional methods in simulation and with
real people.
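To make the MDP formulation concrete, here is a minimal sketch of value iteration on a toy dialog MDP, written in Python. Everything in it is an assumption chosen for illustration: the three states, the three actions, the per-turn cost of 1, the rewards of +20 and -20 for a correct or incorrect submission, the 0.8 probability that an unconfirmed hypothesis is correct, and the discount factor. It is not the formulation used by any specific system from the literature.

```python
GAMMA = 0.95       # discount factor (assumed)
P_CORRECT = 0.8    # chance an unconfirmed hypothesis is right (assumed)
STATES = ["none", "heard", "confirmed"]
ACTIONS = ["ask", "confirm", "submit"]


def outcomes(state, action):
    """Return (probability, next_state, reward) triples for a toy dialog MDP."""
    if action == "ask":
        # Asking always yields a fresh, unconfirmed hypothesis.
        return [(1.0, "heard", -1.0)]
    if action == "confirm":
        if state == "heard":
            return [(P_CORRECT, "confirmed", -1.0),
                    (1.0 - P_CORRECT, "none", -1.0)]
        return [(1.0, state, -1.0)]  # nothing new to confirm
    # action == "submit": the dialog ends in the terminal state "end".
    if state == "confirmed":
        return [(1.0, "end", 20.0)]
    if state == "heard":
        return [(P_CORRECT, "end", 20.0), (1.0 - P_CORRECT, "end", -20.0)]
    return [(1.0, "end", -20.0)]


def q_value(values, state, action):
    """Expected return of taking `action` in `state`, then following `values`."""
    return sum(prob * (reward + GAMMA * values.get(nxt, 0.0))
               for prob, nxt, reward in outcomes(state, action))


def value_iteration(sweeps=200):
    """Run a fixed number of synchronous Bellman backups, return (values, policy)."""
    values = {state: 0.0 for state in STATES}
    for _ in range(sweeps):
        values = {state: max(q_value(values, state, action) for action in ACTIONS)
                  for state in STATES}
    policy = {state: max(ACTIONS, key=lambda a, s=state: q_value(values, s, a))
              for state in STATES}
    return values, policy


values, policy = value_iteration()
print(policy)
```

Under these assumed numbers the computed policy asks when nothing has been heard, confirms an unconfirmed hypothesis, and submits once confirmed. Lowering the cost of a wrong submission or raising `P_CORRECT` can make skipping the confirmation preferable, which is the kind of trade-off that optimization resolves automatically rather than by hand-tuned rules.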
Even so, many of these techniques come from disciplines peripheral to the speech, language, and dialog research
community. For example, POMDPs and reinforcement learning come from the operations research and AI traditions,
and are uncommon in speech and language applications. It can be difficult to acquaint oneself with this research
area without a substantial investment in learning the underlying methods.
The objective of this tutorial is to provide a comprehensive, cohesive overview of statistical techniques in dialog
management for the newcomer. Specifically, we will first motivate the research area by showing how traditional
techniques fail and, intuitively, why statistical techniques would be expected to do better. Then, in a classroom-style
presentation, we will explain the core algorithms and how they have been applied to spoken dialogue systems. Our
intention is to provide a cohesive treatment of the techniques using a unified, common notation in order to give the
audience a clear understanding of how the techniques interrelate. Finally we will report results from the literature
to provide an indication of the impact in practice. Throughout the tutorial we will draw on both our own work and
the literature (with citations throughout), and wherever possible we will use audio/video recordings of interactions
to illustrate operation. We will provide lecture notes and a comprehensive bibliography. Our aim is that attendees
of this course should be able to readily read papers in this area, comment on them meaningfully, and (we hope!)
suggest avenues for future work and begin research enquiries of their own in an area rich in open challenges.
Speaker Biography
Jason Williams is Principal Member of Technical Staff at AT&T Labs – Research. He received a BSE in Electrical
Engineering from Princeton University in 1998, and at Cambridge University he received an M.Phil. in Computer
Speech and Language Processing in 1999 and a Ph.D. in Information Engineering in 2006. His main research interests
are dialog management, the design of spoken language systems, and planning under uncertainty. He has more than
20 technical publications, and has given over 20 technical talks to conferences, workshops, and research groups. For
3 years he taught the spoken dialog systems portion of Cambridge’s M.Phil. course in Computer Speech, Text and
Internet Technology. He is currently Editor-in-chief of the IEEE SLTC’s Newsletter. Prior to entering research, he
built commercial spoken dialogue systems for Tellme Networks (now Microsoft) and others. He also served as a
consultant with McKinsey & Company’s Business Technology Office.
Steve Young is Professor of Information Engineering and Head of the Information Engineering Division at Cambridge
University, UK. His main research interests lie in the area of spoken language systems including speech recognition,
speech synthesis and dialogue management. He was the original author of the HTK toolkit and a key contributor to
the development of large vocabulary speech recognition systems. More recently, he has pioneered the development of
statistical approaches to dialogue management. He is a Fellow of the Royal Academy of Engineering, the Institution
of Electrical Engineers (IEE), the Institute of Electrical and Electronics Engineers (IEEE) and the RSA. He is also
a member of the British Computer Society (BCS). He was Editor of Computer Speech and Language from 1993 to
2004. In 2004, he was a recipient of an IEEE Signal Processing Society Technical Achievement Award, and in 2008
he was elected Fellow of the International Speech Communication Association.
Blaise Thomson is currently completing a Ph.D. in Statistical Dialogue Systems at the University of Cambridge.
He received a B.Sc. in Actuarial Science, Mathematics, Statistics and Computer Science from the University of Cape
Town in 2003, and an M.Phil. in Computer Speech, Text and Internet Technologies from Cambridge University in
2006. He has 9 technical publications and is a co-chair of the 2009 ACL Student Research Workshop. His main
research interests are in spoken dialog systems, user modeling and learning algorithms.