T-5: In-Vehicle Speech Processing & Analysis
Presented by
John H.L. Hansen and Pinar Boyraz
Outline
Recent trends in technology have resulted in the modern automobile being transformed into an interactive device
equipped with new interfaces (e.g., speech recognition for command/control), communication systems (cell-phone),
multi-media applications (internet, music, video – iPod, PDF, text messaging), navigation systems and several
driver-assistance systems for better and safer driving experiences (e.g., road and traffic information feeds, lane
departure warning, speed limit warning). This transition has two main implications for in-vehicle system design.
First, it brings opportunities to upgrade automobiles as intelligent, interactive platform devices in favor of enhanced
driving experiences in terms of both comfort and safety. The second implication is that drivers are exposed to new
sources of distraction via visual, auditory and cognitive channels. When accident causes are examined closely, more
than 90% of them are linked to driver errors due to limited perception, cognition or faulty decision processes. Also,
in 2006, it was estimated that 60% of all cell-phone calls were initiated while the caller was in a vehicle. Therefore,
designing effective in-vehicle interactive systems has a crucial impact on how safely future vehicles can be operated
without excessive driver workload.
Since it is widely known that the task of driving mostly relies on visual feedback and situation awareness, the
auditory perception channel is thought to be under-utilized. In addition, eyes-off-the-road duration is directly
related to an increase in the risk of accident involvement (on average, if a driver is looking away for more than 1.5
sec, it is considered a distraction). Therefore, it has been proposed that present and future interactive driver
assistance systems be based on speech technology, since this will be a natural and convenient solution for reducing
driver workload.
In this tutorial, we will focus on speech technology for in-vehicle use by discussing the cutting-edge developments
in these two applications:
- Speech as interface: Robust speech recognition system development under vehicle-noise conditions (e.g.,
engine, open windows, A/C operation). This field of study includes the application of microphone arrays for
in-vehicle use, employing beamforming algorithms to reduce the effect of noise on speech recognition. The
resultant system can be employed as a driver-vehicle interface for entering prompts and commands for music
search and for control of in-vehicle systems such as cell-phone, A/C, windows, etc., instead of manual operation, which
also engages the driver visually.
- Speech as monitoring system: Speech can be used to design a sub-module for driver-monitoring systems.
For the last two decades, studies of speech under stress have contributed to improving the performance of ASR
systems. Detecting stress in speech can also help improve the performance of driver monitoring systems,
which conventionally rely on computer vision applications of driver head and eye tracking. On the other
hand, the effects of introducing speech technologies as an interface can be assessed via driver behaviour
modeling studies. Therefore, in this part, we will mainly explore two areas:
- Can speech analysis systems contribute to more reliable driver monitoring/distraction detection
systems for use in active safety?
- Does speech (prompted, neutral, under risky conditions, under car noise) affect the attention level/span
of drivers? Is it a cause of distraction that increases accident risk?
Speaker Biography
John H.L. Hansen received the Ph.D. and M.S. degrees in Electrical Engineering from Georgia Institute of Technology,
Atlanta, Georgia, in 1988 and 1983, and the B.S.E.E. degree from Rutgers University, College of Engineering, New
Brunswick, N.J. in 1982. He joined the University of Texas at Dallas (UTD), Erik Jonsson School of Engineering and
Computer Science in the fall of 2005, where he is Professor and Department Chairman of Electrical Engineering, and
holds the Distinguished University Chair in Telecommunications Engineering. He also holds a joint appointment as
Professor in the School of Brain and Behavioral Sciences (Speech & Hearing). At UTD, he established the Center for
Robust Speech Systems (CRSS) which is part of the Human Language Technology Research Institute. Previously,
he served as Department Chairman and Professor in the Dept. of Speech, Language and Hearing Sciences (SLHS),
and Professor in the Dept. of Electrical & Computer Engineering, at Univ. of Colorado Boulder (1998-2005), where
he co-founded the Center for Spoken Language Research. In 1988, he established the Robust Speech Processing Laboratory
(RSPL) and continues to direct research activities in CRSS at UTD. In 2007, he was named IEEE Fellow for
contributions in "Robust Speech Recognition in Stress and Noise," and is currently serving as Member of the IEEE
Signal Processing Society Speech Technical Committee and Educational Technical Committee. Previously, he has
served as Technical Advisor to U.S. Delegate for NATO (IST/TG-01), IEEE Signal Processing Society Distinguished
Lecturer (2005/06), Associate Editor for IEEE Trans. Speech & Audio Processing (1992-99), Associate Editor for
IEEE Signal Processing Letters (1998-2000), Editorial Board Member for the IEEE Signal Processing Magazine
(2001-03). He has also served as guest editor of the Oct. 1994 special issue on Robust Speech Recognition for IEEE
Trans. Speech & Audio Proc. He has served on the Speech Communications Technical Committee for the Acoustical
Society of America (2000-03), and is serving as a member of the ISCA (Inter. Speech Communications Association)
Advisory Council. His research interests span the areas of digital speech processing, analysis and modeling of speech
and speaker traits, speech enhancement, feature estimation in noise, robust speech recognition with emphasis on
spoken document retrieval, and in-vehicle interactive systems for hands-free human-computer interaction. He was
the recipient of a Whitaker Foundation Biomedical Research Award, an NSF Research Initiation Award, and has
been named a Lilly Foundation Teaching Fellow for “Contributions to the Advancement of Engineering Education.”
He has supervised 43 (18 PhD, 25 MS) thesis candidates, was the recipient of the 2005 University of Colorado Teacher
Recognition Award as voted by the student body, and is author/co-author of 292 journal and conference papers in
the field of speech processing and communications, coauthor of the textbook Discrete-Time Processing of Speech
Signals (IEEE Press, 2000), co-editor of DSP for In-Vehicle and Mobile Systems (Springer, 2004) and Advances for
In-Vehicle and Mobile Systems: Challenges for International Standards (Springer, 2006), and lead author of the
report “The Impact of Speech Under ‘Stress’ on Military Speech Technology,” (NATO RTO-TR-10, 2000). He also
organized and served as General Chair for ICSLP/Interspeech-2002: International Conference on Spoken Language
Processing, Sept. 16-20, 2002, and will serve as Technical Program Chair for IEEE ICASSP-2010, Dallas, TX.
Pinar Boyraz received the double major B.S. degrees in Textile Engineering and Mechanical Engineering with a focus
in System Dynamics and Control Engineering with high honors (4th and 2nd place respectively in graduation) from
Istanbul Technical University, Istanbul, Turkey in 2003 and 2004. She was awarded her PhD degree in Mechatronics
from Loughborough University, United Kingdom in July 2008. She was awarded merit-based scholarships by the
Haci Omer Sabanci Foundation (TR, 1998-2003) and the Royal Academy of Engineering (UK, 2007), and a full scholarship
from Loughborough University (UK, 2004-2007) during her studies. She has been a Research Associate in the
Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas (UTD), Richardson, U.S.A.
since February 2008. Her research interests include applications of control theory, robust & optimal control; signal
processing, vehicle/system dynamics, image/video processing and analysis, applications of artificial intelligence
(Artificial Neural Networks, Fuzzy Inference Systems and Genetic Algorithms). Her recent focus is mathematical
modelling (e.g., Hidden Markov Models, Gaussian Mixtures and Control Theory) of driver behaviour for development
of active vehicle safety and control systems.