T-3: Language and Dialect Recognition
Presented by
Jiri Navratil
Outline
Spoken language recognition (a.k.a. Language ID or LID) is the task of recognizing the language of a speech sample spoken
by an unknown speaker. Language ID finds applications in multi-lingual dialog systems, distillation, diarization and
indexing systems, speaker detection, and speech recognition. Often, LID is one of the first and necessary
processing steps in a speech processing system. Furthermore, language, dialect, and accent are of interest in
diarization and indexing/search, and may play an important auxiliary role in identifying speakers.
LID has seen almost four decades of active research. Benefiting from the development of public multi-lingual
corpora in the 1990s, progress in LID technology accelerated tremendously in the 2000s. While the availability
of large corpora served as an enabler, the series of NIST-administered Language Recognition
Evaluations (LRE) provided the research community with a common ground for comparison and proved to be a strong
catalyst. The LRE series also gave rise to a "cross-pollination effect" by effectively bringing together the
speaker and language recognition communities, which thus share and spread their respective methods and techniques.
In the past five years or so, considerable success has been achieved by developing techniques to deal
with channel and session variability, to improve acoustic language modeling by means of discriminative methods,
and to further refine the basic phonotactic approaches.
The goal of this tutorial is to survey the LID area both from a historical perspective and in its current
state. Several important milestones contributing to the growth of the LID area will be identified. In a second,
larger part, the most successful state-of-the-art probabilistic approaches and modeling techniques will be described
in more detail. These include various phonotactic architectures, UBM-GMMs, discriminative techniques,
and subspace modeling tricks. The closely related problem of detecting dialects will be discussed in the final part.
Speaker Biography
Jiri Navratil is a Research Staff Member in the Multilingual Analytics and User Technologies department at IBM
Research. He is involved in research efforts in the area of voice-based authentication, spoken language ID, and
statistical machine translation. Jiri received his M.Sc. and Ph.D. degrees in Electrical Engineering from Ilmenau
Technical University, Germany, in 1994 and 1998, respectively. In 1999, he joined the IBM Thomas J. Watson
Research Center in Yorktown Heights, NY, where he is now a member of the Statistical Content Analytics Group
in the Multilingual Analytics and User Technologies department. He has published more than 40 conference and
journal papers, and filed 10 patent applications. Dr. Navratil is a recipient of the 1999 Johann-Philipp-Reis Prize
awarded by the German Association for Electrical, Electronic & Information Technologies (VDE), Deutsche
Telekom, and the cities of Friedrichsdorf and Gelnhausen, for outstanding contributions to the field of language
recognition. He has received multiple invention achievement awards and a technical group award from IBM, and served
as the Watson chair of the User Interface Technologies interest area in 2004-2006.