Brighton Pavilion

10th Annual Conference of the International Speech Communication Association

ISCA Interspeech 2009 Brighton

Technical Programme

This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.

Mon-Ses3-O1:
Automatic Speech Recognition: Language Models I

Time: Monday 16:00 Place: Main Hall Type: Oral
Chair: Steve Renals

16:00 Back-Off Language Model Compression

Boulos Harb (Google, Inc.)
Ciprian Chelba (Google, Inc.)
Jeffrey Dean (Google, Inc.)
Sanjay Ghemawat (Google, Inc.)

With the availability of large amounts of training data relevant to speech recognition scenarios, scalability becomes a very productive way to improve language model performance. We present a technique that represents a back-off n-gram language model using arrays of integer values and thus renders it amenable to effective block compression. We propose a few such compression algorithms and evaluate the resulting language model along two dimensions: memory footprint, and speed reduction relative to the uncompressed one. We experimented with a model that uses a 32-bit word vocabulary (at most 4B words) and log-probabilities and back-off weights each quantized to 1 byte. The best compression algorithm achieves 2.6 bytes/n-gram at 18X slower than uncompressed.
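The abstract does not spell out the compression algorithms, so the following is only an illustrative sketch of one generic block-compression scheme for sorted integer arrays (delta encoding followed by variable-length byte encoding). It is not the authors' method; the function names and the sample block are hypothetical.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer in 7-bit groups, lowest group first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def compress_block(sorted_ids: list[int]) -> bytes:
    """Delta-encode a sorted block of integers, then varint-encode each gap."""
    out = bytearray()
    prev = 0
    for value in sorted_ids:
        out += encode_varint(value - prev)
        prev = value
    return bytes(out)


def decompress_block(data: bytes) -> list[int]:
    """Invert compress_block: decode the gaps and rebuild the running sum."""
    values = []
    current = 0  # last reconstructed value
    gap = 0      # gap currently being decoded
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += gap
            values.append(current)
            gap = 0
            shift = 0
    return values


# Hypothetical block of sorted word identifiers.
sample_block = [3, 7, 300, 301, 70000]
packed = compress_block(sample_block)
```

For this sample block the five values pack into 8 bytes, compared with 20 bytes when each is stored as a 32-bit integer. Decoding has to walk the block sequentially, which gives a sense of why compressed lookups are slower than uncompressed ones.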

16:20 Improving Broadcast News Transcription with a Precision Grammar and Discriminative Reranking

Tobias Kaufmann (ETH Zurich)
Thomas Ewender (ETH Zurich)
Beat Pfister (ETH Zurich)

We propose a new approach to integrating a precision grammar into speech recognition. The approach is based on a novel robust parsing technique and discriminative reranking. By reranking the 100-best output of the LIMSI German broadcast news transcription system, we achieved a significant reduction of the word error rate by 9.6% relative. To our knowledge, this is the first significant improvement for a real-world broad-domain speech recognition task due to a precision grammar.

16:40 Use of Contexts in Language Model Interpolation and Adaptation

Xunying Liu (Cambridge University Engineering Department)
Mark Gales (Cambridge University Engineering Department)
Phil Woodland (Cambridge University Engineering Department)

Language models (LMs) are often constructed by building component models on multiple text sources to be interpolated using global, context-free weights. By re-adjusting these weights, LMs may be adapted to a target domain of a particular genre, epoch or other higher-level attributes. Other factors that determine the "usefulness" of sources on a context-dependent basis, such as modeling resolution, generalization, topics and styles, are poorly modeled. To overcome this problem, this paper investigates a context-dependent form of LM interpolation and adaptation. In previous research, it was used primarily for LM adaptation. In this paper, a range of schemes to combine context-dependent weights obtained from training and test data to improve LM adaptation are proposed. Consistent perplexity and error rate gains of 6% relative were obtained on a state-of-the-art broadcast recognition task.
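The difference between global and context-dependent interpolation weights can be illustrated with a minimal sketch. It assumes plain linear interpolation and a simple lookup of weights by word history with a fallback to the global weights; the component models, weight values and function names are all hypothetical and are not taken from the paper.

```python
def interpolate(component_probs, weights):
    """Linear interpolation: sum over components of weight * P_m(word | history)."""
    return sum(lam * p for lam, p in zip(weights, component_probs))


def context_dependent_prob(word, history, components, global_weights, context_weights):
    """Use history-specific weights when available, else the global weights."""
    weights = context_weights.get(history, global_weights)
    probs = [model(word, history) for model in components]
    return interpolate(probs, weights)


# Toy component models (assumed): each returns a fixed P(word | history).
def news_lm(word, history):
    return 0.2


def conversational_lm(word, history):
    return 0.6


components = [news_lm, conversational_lm]
global_weights = (0.5, 0.5)
# Hypothetical weights for one history in which the news source is trusted more.
context_weights = {("stock",): (0.9, 0.1)}

p_global = context_dependent_prob("prices", ("the",), components,
                                  global_weights, context_weights)
p_context = context_dependent_prob("prices", ("stock",), components,
                                   global_weights, context_weights)
```

With these toy numbers the unseen history falls back to the global weights (probability 0.4), while the history with its own weights gives 0.9 * 0.2 + 0.1 * 0.6 = 0.24. Each weight vector must sum to one for the result to remain a probability distribution.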

17:00 Exploiting Chinese Character Models to Improve Speech Recognition Performance

J. L. Hieronymus (NASA Ames Research Center)
X. Liu (Cambridge University Engineering Department)
M. J. F. Gales (Cambridge University Engineering Department)
P.C. Woodland (Cambridge University Engineering Department)

The Chinese language is based on characters which are syllabic in nature. Since languages have syllabotactic rules which govern the construction of syllables and their allowed sequences, Chinese character sequence models can be used as a first level approximation. N-gram character sequence models were trained on 4.3 billion characters. Characters are used as a first level recognition unit with multiple pronunciations per character. The CU-HTK Mandarin word-based system was used to recognize words, which were then converted to character sequences. The character-only error rates of one-best recognition were slightly worse than those of word-based character recognition. However, combining the two systems using log-linear combination gives better results than either system separately. An equally weighted combination gave consistent CER gains of 0.1-0.2% absolute over the word-based standard system.
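As a rough illustration of log-linear combination with equal weights, the sketch below sums weighted log scores from two systems over the hypotheses they share and picks the highest-scoring one. The hypothesis labels, scores and function name are hypothetical; the abstract does not describe how the combination was actually implemented.

```python
import math


def log_linear_combine(scores_a, scores_b, weight_a=0.5, weight_b=0.5):
    """Combine two systems' log scores; return the best hypothesis and all scores.

    Only hypotheses scored by both systems are considered.
    """
    shared = scores_a.keys() & scores_b.keys()
    if not shared:
        raise ValueError("the two systems share no hypotheses")
    combined = {
        hyp: weight_a * scores_a[hyp] + weight_b * scores_b[hyp]
        for hyp in shared
    }
    best = max(combined, key=combined.get)
    return best, combined


# Toy log scores (assumed) for three candidate character sequences.
word_system = {"h1": math.log(0.5), "h2": math.log(0.4), "h3": math.log(0.1)}
char_system = {"h1": math.log(0.2), "h2": math.log(0.5), "h3": math.log(0.3)}

best_hyp, combined_scores = log_linear_combine(word_system, char_system)
```

In this toy case the word-based scores alone prefer h1, but the equally weighted combination selects h2 because the character-based scores favour it strongly.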

17:20 Constraint selection for topic-based MDI adaptation of language models

Gwénolé Lecorvé (IRISA/INSA, France)
Guillaume Gravier (IRISA/CNRS, France)
Pascale Sébillot (IRISA/INSA, France)

This paper presents an unsupervised topic-based language model adaptation method which specializes the standard minimum discrimination information (MDI) approach by identifying and combining topic-specific features. By acquiring a topic terminology from a thematically coherent corpus, language model adaptation is restricted to re-estimating the probabilities of n-grams ending with topic-specific words, keeping other probabilities untouched. Experiments are carried out on a large set of spoken documents about various topics. Results show significant perplexity and recognition improvements which outperform results of classical adaptation techniques.

17:40 Nonstationary Latent Dirichlet Allocation for Speech Recognition

Chuang-Hua Chueh (National Cheng Kung University)
Jen-Tzung Chien (National Cheng Kung University)

Latent Dirichlet allocation (LDA) has been successful for document modeling. LDA extracts the latent topics across documents. Words in a document are generated by the same topic distribution. However, in real-world documents, the usage of words in different paragraphs is varied and accompanied by different writing styles. This study extends LDA and copes with the variations of topic information within a document. We build the nonstationary LDA (NLDA) by incorporating a Markov chain which is used to detect the stylistic segments in a document. Each segment corresponds to a particular style in the composition of a document. This NLDA can exploit the topic information between documents as well as the word variations within a document. We accordingly establish a Viterbi-based variational Bayesian procedure. A language model adaptation scheme using NLDA is developed for speech recognition. Experimental results show improvement of NLDA over LDA in terms of perplexity and word error rate.