10th Annual Conference of the International Speech Communication Association

ISCA Interspeech 2009 Brighton

Tutorials Day - Sunday 6 September 2009

T-3: Language and Dialect Recognition

Presented by Jiri Navratil

Outline

Spoken language recognition (a.k.a. language ID or LID) is the task of recognizing the language of a speech sample spoken by an unknown speaker. LID finds applications in multilingual dialog systems, distillation, diarization and indexing systems, speaker detection, and speech recognition. It is often one of the first and necessary processing steps in a speech processing system. Furthermore, language, dialect, and accent are of interest in diarization and indexing/search, and may play an important auxiliary role in identifying speakers.

LID has seen almost four decades of active research. Benefiting from the development of public multilingual corpora in the 1990s, progress in LID technology accelerated tremendously in the 2000s. While the availability of large corpora served as an enabler, the series of NIST-administered Language Recognition Evaluations (LRE) provided the research community with a common ground for comparison and proved to be a strong catalyst. The LRE series also gave rise to a "cross-pollination effect" by bringing the speaker and language recognition communities together, so that each adopted the other's methods and techniques. In the past five years or so, considerable success has been achieved by developing techniques to deal with channel and session variability, improving acoustic language modeling by means of discriminative methods, and further refining the basic phonotactic approaches.

The goal of this tutorial is to survey the LID area from a historical perspective as well as in its current state. Several important milestones that contributed to the growth of the area will be identified. In a second, larger part, the most successful state-of-the-art probabilistic approaches and modeling techniques will be described in more detail. These include various phonotactic architectures, UBM-GMMs, discriminative techniques, and subspace modeling tricks. The closely related problem of detecting dialects will be discussed in the final part.

Speaker Biography

Jiri Navratil is a Research Staff Member in the Multilingual Analytics and User Technologies department at IBM Research. He is involved in research efforts in the areas of voice-based authentication, spoken language ID, and statistical machine translation. Jiri received his M.Sc. and Ph.D. degrees in Electrical Engineering from Ilmenau Technical University, Germany, in 1994 and 1998, respectively. In 1999, he joined the IBM Thomas J. Watson Research Center in Yorktown Heights, NY, where he is now a member of the Statistical Content Analytics Group in the Multilingual Analytics and User Technologies department. He has published more than 40 conference and journal papers and filed 10 patent applications. Dr. Navratil is a recipient of the 1999 Johann-Philipp-Reis prize, awarded by the German Association for Electrical, Electronic & Information Technologies (VDE), Deutsche Telekom, and the cities of Friedrichsdorf and Gelnhausen, for outstanding contributions to the field of language recognition. He has received multiple invention achievement awards and a technical group award from IBM, and served as the Watson chair of the User Interface Technologies interest area from 2004 to 2006.