10th Annual Conference of the International Speech Communication Association

ISCA Interspeech 2009 Brighton

Tutorials Day - Sunday 6 September 2009

T-5: In-Vehicle Speech Processing & Analysis

Presented by John H.L. Hansen and Pinar Boyraz

Outline

Recent trends in technology have transformed the modern automobile into an interactive device equipped with new interfaces (e.g., speech recognition for command/control), communication systems (cell-phone), multimedia applications (internet, music, video, iPod, PDF, text messaging), navigation systems and several driver-assistance systems for a better and safer driving experience (e.g., road and traffic information feeds, lane departure warning, speed limit warning). This transition has two main implications for in-vehicle system design. First, it brings opportunities to upgrade automobiles into intelligent, interactive platforms that enhance the driving experience in terms of both comfort and safety. Second, drivers are exposed to new sources of distraction via visual, auditory and cognitive channels. When accident causes are examined closely, more than 90% of them are linked to driver errors due to limited perception, limited cognition or faulty decision processes. It is also estimated that, in 2006, 60% of all cell-phone calls were initiated while the caller was in a vehicle. Therefore, the design of effective in-vehicle interactive systems has a crucial impact on how safely future vehicles can be operated without excessive driver workload.

Since the task of driving is widely known to rely mostly on visual feedback and situation awareness, the auditory perception channel is thought to be under-utilized. In addition, eyes-off-the-road duration is directly related to an increase in the risk of accident involvement (on average, if a driver is looking away for more than 1.5 seconds, it is considered a distraction). Therefore, it has been proposed that present and future interactive driver assistance systems be based on speech technology, since this would be a natural and convenient way to reduce driver workload.

In this tutorial, we will focus on speech technology for in-vehicle use by discussing cutting-edge developments in the following two applications:

  1. Speech as interface: Development of robust speech recognition systems under vehicle-noise conditions (e.g., engine, open windows, A/C operation). This field of study includes the in-vehicle application of microphone arrays, which use beamforming algorithms to reduce the effect of noise on speech recognition. The resulting system can be employed as a driver-vehicle interface for entering prompts and commands for music search and for control of in-vehicle systems such as the cell-phone, A/C and windows, instead of manual operation, which also engages the driver visually.
  2. Speech as monitoring system: Speech can be used to design a sub-module for driver-monitoring systems. Over the last two decades, studies of speech under stress have contributed to improving the performance of ASR systems. Detecting stress in speech can also help improve the performance of driver-monitoring systems, which conventionally rely on computer vision applications of driver head and eye tracking. On the other hand, the effects of introducing speech technologies as an interface can be assessed via driver behaviour modeling studies. In this part, we will therefore mainly explore two questions:
    1. Can speech analysis systems contribute to obtaining more reliable driver monitoring/distraction detection systems for use in active safety?
    2. Does speech (prompted, neutral, under risky conditions, under car noise) affect the attention level or attention span of drivers? Is it a cause of distraction that increases accident risk?

Speaker Biography

John H.L. Hansen received the Ph.D. and M.S. degrees in Electrical Engineering from Georgia Institute of Technology, Atlanta, Georgia, in 1988 and 1983, and the B.S.E.E. degree from Rutgers University, College of Engineering, New Brunswick, N.J. in 1982. He joined the University of Texas at Dallas (UTD), Erik Jonsson School of Engineering and Computer Science in the fall of 2005, where he is Professor and Department Chairman of Electrical Engineering, and holds the Distinguished University Chair in Telecommunications Engineering. He also holds a joint appointment as Professor in the School of Brain and Behavioral Sciences (Speech & Hearing). At UTD, he established the Center for Robust Speech Systems (CRSS), which is part of the Human Language Technology Research Institute. Previously, he served as Department Chairman and Professor in the Dept. of Speech, Language and Hearing Sciences (SLHS), and Professor in the Dept. of Electrical & Computer Engineering, at Univ. of Colorado Boulder (1998-2005), where he co-founded the Center for Spoken Language Research. In 1988, he established the Robust Speech Processing Laboratory (RSPL) and continues to direct research activities in CRSS at UTD. In 2007, he was named IEEE Fellow for contributions in "Robust Speech Recognition in Stress and Noise," and is currently serving as Member of the IEEE Signal Processing Society Speech Technical Committee and Educational Technical Committee. Previously, he has served as Technical Advisor to the U.S. Delegate for NATO (IST/TG-01), IEEE Signal Processing Society Distinguished Lecturer (2005/06), Associate Editor for IEEE Trans. Speech & Audio Processing (1992-99), Associate Editor for IEEE Signal Processing Letters (1998-2000), and Editorial Board Member for the IEEE Signal Processing Magazine (2001-03). He has also served as guest editor of the Oct. 1994 special issue on Robust Speech Recognition for IEEE Trans. Speech & Audio Proc.
He has served on the Speech Communications Technical Committee for the Acoustical Society of America (2000-03), and is serving as a member of the ISCA (International Speech Communication Association) Advisory Council. His research interests span the areas of digital speech processing, analysis and modeling of speech and speaker traits, speech enhancement, feature estimation in noise, robust speech recognition with emphasis on spoken document retrieval, and in-vehicle interactive systems for hands-free human-computer interaction. He was the recipient of a Whitaker Foundation Biomedical Research Award and an NSF Research Initiation Award, and has been named a Lilly Foundation Teaching Fellow for “Contributions to the Advancement of Engineering Education.” He has supervised 43 (18 PhD, 25 MS) thesis candidates, was the recipient of the 2005 University of Colorado Teacher Recognition Award as voted by the student body, and is author/co-author of 292 journal and conference papers in the field of speech processing and communications, coauthor of the textbook Discrete-Time Processing of Speech Signals (IEEE Press, 2000), co-editor of DSP for In-Vehicle and Mobile Systems (Springer, 2004) and Advances for In-Vehicle and Mobile Systems: Challenges for International Standards (Springer, 2006), and lead author of the report “The Impact of Speech Under ‘Stress’ on Military Speech Technology” (NATO RTO-TR-10, 2000). He also organized and served as General Chair for ICSLP/Interspeech-2002: International Conference on Spoken Language Processing, Sept. 16-20, 2002, and will serve as Technical Program Chair for IEEE ICASSP-2010, Dallas, TX.

Pinar Boyraz received double-major B.S. degrees in Textile Engineering and Mechanical Engineering, with a focus on System Dynamics and Control Engineering, with high honors (4th and 2nd place respectively in graduation) from Istanbul Technical University, Istanbul, Turkey in 2003 and 2004. She was awarded her PhD degree in Mechatronics from Loughborough University, United Kingdom in July 2008. During her studies she was awarded merit-based scholarships by the Haci Omer Sabanci Foundation (TR, 1998-2003) and the Royal Academy of Engineering (UK, 2007), and a full scholarship from Loughborough University (UK, 2004-2007). She has been a Research Associate in the Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas (UTD), Richardson, U.S.A. since February 2008. Her research interests include applications of control theory, robust and optimal control, signal processing, vehicle/system dynamics, image/video processing and analysis, and applications of artificial intelligence (Artificial Neural Networks, Fuzzy Inference Systems and Genetic Algorithms). Her recent focus is mathematical modelling (e.g., Hidden Markov Models, Gaussian Mixtures and Control Theory) of driver behaviour for the development of active vehicle safety and control systems.