Brighton Pavilion

10th Annual Conference of the International Speech Communication Association

ISCA Interspeech 2009 Brighton

Technical Programme

This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.

Wed-Ses3-O1:
Language Recognition

Time: Wednesday 16:00  Place: Main Hall  Type: Oral
Chair: Honza Černocký

16:00 A Human Benchmark for Language Recognition

Rosemary Orr (University College Utrecht)
David van Leeuwen (TNO Human Factors)

In this study, we explore a human benchmark in language recognition, for the purpose of comparing human performance to machine performance in the context of the NIST LRE 2007. Humans are categorised in terms of language proficiency, and performance is presented per proficiency. The main challenge in this work is the design of a test and application of a performance metric which allows a meaningful comparison of humans and machines. The main result of this work is that where subjects have lexical knowledge of a language, even at a low level, they perform as well as the state of the art in language recognition systems in 2007.

16:20 Large Margin Estimation of Gaussian Mixture Model Parameters with Extended Baum-Welch for Spoken Language Recognition

Donglai Zhu (Institute for Infocomm Research, Singapore)
Bin Ma (Institute for Infocomm Research, Singapore)
Haizhou Li (Institute for Infocomm Research, Singapore)

Discriminative training (DT) methods for acoustic models, such as SVMs and MMI-trained GMMs, have proved effective in spoken language recognition. In this paper we propose a DT method for GMMs using large margin (LM) estimation. Unlike traditional MMI or MCE methods, LM estimation attempts to enhance the generalization ability of the GMM so that it can deal with new data that is mismatched with the training data. We define the multi-class separation margin as a function of GMM likelihoods, and derive update formulae for the GMM parameters with the extended Baum-Welch algorithm. Results on the NIST language recognition evaluation (LRE) 2007 task show that LM estimation achieves better performance and faster convergence than MMI estimation.
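The abstract does not give the exact form of the multi-class separation margin. A minimal sketch, assuming the common definition (log-likelihood of the true language minus the best competing log-likelihood), is shown below. The function name `separation_margin` and the example scores are hypothetical and are not taken from the paper.

```python
def separation_margin(log_likelihoods, true_label):
    """Margin of one utterance: true-class score minus the best rival score.

    `log_likelihoods` maps each language label to the GMM log-likelihood of
    the utterance. This definition is an assumption, not the paper's formula.
    """
    rivals = [ll for label, ll in log_likelihoods.items() if label != true_label]
    if not rivals:
        raise ValueError("need at least one competing language")
    return log_likelihoods[true_label] - max(rivals)


# Hypothetical per-language log-likelihoods for a single utterance.
scores = {"en": -10.0, "fr": -12.5, "de": -11.0}
margin = separation_margin(scores, "en")
```

A positive margin means the utterance is classified correctly; large margin training would aim to push small or negative margins upward.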

16:40 Linguistically-motivated automatic classification of regional French varieties

Cécile Woehrling (LIMSI-CNRS)
Philippe Boula de Mareüil (LIMSI-CNRS)
Martine Adda-Decker (LIMSI-CNRS)

The goal of this study is to automatically differentiate French varieties (standard French and French varieties spoken in the South of France, Alsace, Belgium and Switzerland) by applying a linguistically-motivated approach. We took advantage of automatic phoneme alignment to measure vowel formants, consonant (de)voicing, pronunciation variants as well as prosodic cues. These features were then used to identify French varieties by applying classification techniques. On large corpora of hundreds of speakers, over 80% correct identification scores were obtained. The confusions between varieties and the features used (by decision trees) are linguistically grounded.

17:00 Discriminative Acoustic Language Recognition via Channel-Compensated GMM Statistics

Niko Brummer (AGNITIO)
Albert Strasheim (AGNITIO)
Valiantsina Hubeika (Brno University of Technology)
Pavel Matejka (Brno University of Technology)
Lukas Burget (Brno University of Technology)
Ondrej Glembek (Brno University of Technology)

We propose a novel design for acoustic feature-based automatic spoken language recognizers. Our design is inspired by recent advances in text-independent speaker recognition, where intra-class variability is modeled by factor analysis in Gaussian mixture model (GMM) space. We use approximations to GMM-likelihoods which allow variable-length data sequences to be represented as statistics of fixed size. Our experiments on NIST LRE'07 show that variability-compensation of these statistics can reduce error-rates by a factor of three. Finally, we show that further improvements are possible with discriminative logistic regression training.
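The idea of representing variable-length sequences as fixed-size statistics can be sketched as follows. This assumes the zeroth- and first-order statistics commonly used in factor analysis for speaker recognition, computed against a diagonal-covariance GMM; the abstract does not say which statistics or approximations the authors use, and all function names here are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_statistics(frames, weights, means, variances):
    """Accumulate zeroth- and first-order statistics over all frames.

    The output size depends only on the number of components and the
    feature dimension, not on the number of frames.
    """
    n_comp, dim = len(weights), len(means[0])
    zeroth = [0.0] * n_comp
    first = [[0.0] * dim for _ in range(n_comp)]
    for x in frames:
        log_p = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        peak = max(log_p)  # subtract the maximum for numerical stability
        unnorm = [math.exp(lp - peak) for lp in log_p]
        total = sum(unnorm)
        for c in range(n_comp):
            post = unnorm[c] / total  # posterior of component c for this frame
            zeroth[c] += post
            for d in range(dim):
                first[c][d] += post * x[d]
    return zeroth, first
```

Sequences of three frames and of three hundred frames both yield a zeroth-order vector with one entry per component and a first-order matrix of components by dimensions, which is what makes them convenient inputs for compensation and for a fixed-size classifier.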

17:20 Language Score Calibration using Adapted Gaussian Back-end

Mohamed Faouzi BenZeghiba (LIMSI-CNRS)
Jean-Luc Gauvain (LIMSI-CNRS)
Lori Lamel (LIMSI-CNRS)

The generative Gaussian back-end and discriminative logistic regression are the most widely used approaches for language score fusion and calibration. Combining these two approaches can significantly improve performance. This paper proposes the use of an adapted Gaussian back-end, where the mean of the language-dependent Gaussian is adapted from the mean of a language-specific background Gaussian via maximum a posteriori (MAP) estimation. Experiments are conducted using the LRE-07 evaluation data. Compared to the conventional Gaussian back-end approach for a closed-set task, relative improvements in C_avg of 50%, 17% and 4.2% are obtained on the 30s, 10s and 3s conditions, respectively. In addition, the estimated scores are better calibrated. A combination with logistic regression results in a system with the best calibrated scores.
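MAP adaptation of a Gaussian mean can be sketched as below, assuming the standard relevance-factor form in which the adapted mean interpolates between the sample mean and the background mean. The abstract does not state the exact update or any hyperparameter values, so the `relevance` parameter and the function name are assumptions for illustration.

```python
def map_adapt_mean(background_mean, samples, relevance=16.0):
    """Relevance-MAP update of a mean vector.

    With n samples, the adapted mean is
        alpha * sample_mean + (1 - alpha) * background_mean,
    where alpha = n / (n + relevance). `relevance` is an assumed
    hyperparameter, not a value from the paper.
    """
    n = len(samples)
    if n == 0:
        return list(background_mean)  # no data: fall back to the prior mean
    dim = len(background_mean)
    sample_mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    alpha = n / (n + relevance)
    return [
        alpha * sample_mean[d] + (1 - alpha) * background_mean[d]
        for d in range(dim)
    ]
```

With few score vectors for a language, the adapted mean stays close to the background mean; with many, it approaches the sample mean.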

17:40 A Framework for Discriminative SVM/GMM Systems for Language Recognition

William Campbell (MIT Lincoln Laboratory)
Zahi Karam (MIT Lincoln Laboratory, DSPG Research Laboratory of Electronics at MIT)

Language recognition with support vector machines and shifted-delta cepstral features has been an excellent performer in NIST-sponsored language evaluation for many years. A novel improvement of this method has been the introduction of hybrid SVM/GMM systems. These systems use GMM supervectors as an SVM expansion for classification. In prior work, methods for scoring SVM/GMM systems have been introduced based upon either standard SVM scoring or GMM scoring with a pushed model. Although prior work showed experimentally that GMM scoring yielded better results, no framework was available to explain the connection between SVM scoring and GMM scoring. In this paper, we show that there are interesting connections between SVM scoring and GMM scoring. We provide a framework both theoretically and experimentally that connects the two scoring techniques. This connection should provide the basis for further research in SVM discriminative training for GMM models.
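A GMM supervector expansion can be sketched as follows, assuming the widely used construction in which each component mean is scaled by the square root of its mixture weight and by the inverse standard deviation, then stacked into one vector. The abstract does not spell out the expansion, so this scaling and the function names are assumptions.

```python
import math


def supervector(weights, means, variances):
    """Stack the scaled component means of a diagonal-covariance GMM.

    Each mean entry is multiplied by sqrt(weight) / sqrt(variance), so that
    a plain dot product between two supervectors acts as a kernel between
    the two adapted GMMs. This scaling is an assumed, commonly used choice.
    """
    vec = []
    for w, mean, var in zip(weights, means, variances):
        vec.extend(math.sqrt(w) * m / math.sqrt(v) for m, v in zip(mean, var))
    return vec


def linear_kernel(a, b):
    """Dot product between two supervectors of equal length."""
    return sum(x * y for x, y in zip(a, b))
```

In such a setup, the weights and variances would typically come from a shared background model and only the means would differ per utterance, so every utterance maps to a vector of the same length that a linear SVM can score.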