
10th Annual Conference of the International Speech Communication Association

ISCA Interspeech 2009 Brighton

Technical Programme

This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.

Tue-Ses1-O1:
ASR: Discriminative Training

Time: Tuesday 10:00 Place: Main Hall Type: Oral
Chair: Erik McDermott

10:00 On the Semi-Supervised Learning of Multi-Layered Perceptrons

Jonathan Malkin (University of Washington)
Amarnag Subramanya (University of Washington)
Jeff Bilmes (University of Washington)

We present a novel approach for training a multi-layered perceptron (MLP) in a semi-supervised fashion. Our objective function, when optimized, balances training set accuracy with fidelity to a graph-based manifold over all points. Additionally, the objective favors smoothness via an entropy regularizer over classifier outputs as well as straightforward L2 regularization. Our approach also scales well enough to enable large-scale training. The results demonstrate significant improvement on several phone classification tasks over baseline MLPs.
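The four terms named in the abstract above can be combined roughly as in the following sketch. This is not the authors' formulation: the function name `ssl_objective`, the weights `gamma`, `kappa`, and `lam`, the use of KL divergence for the graph term, and the assumption that the entropy term rewards higher-entropy outputs are all illustrative choices.

```python
import math

def cross_entropy(target, probs):
    # Supervised loss between a one-hot target and a predicted distribution.
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

def kl(p, q):
    # KL divergence between two distributions with strictly positive q.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    # Shannon entropy of a probability vector.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def ssl_objective(outputs, labels, graph, weights,
                  gamma=1.0, kappa=0.1, lam=0.01):
    """Hypothetical semi-supervised objective (to be minimized).

    outputs: list of MLP output distributions for all points
             (labeled and unlabeled).
    labels:  dict mapping index -> one-hot target (labeled points only).
    graph:   dict mapping (i, j) -> similarity weight between points.
    weights: flat list of model parameters.
    """
    supervised = sum(cross_entropy(t, outputs[i]) for i, t in labels.items())
    # Graph term: similar points should have similar output distributions.
    manifold = sum(w * kl(outputs[i], outputs[j]) for (i, j), w in graph.items())
    # Entropy term (assumed sign): reward higher-entropy outputs.
    ent = sum(entropy(p) for p in outputs)
    l2 = sum(w * w for w in weights)
    return supervised + gamma * manifold - kappa * ent + lam * l2
```

With a graph edge between two points whose outputs agree, the manifold term is zero; an edge between points whose outputs disagree raises the objective, which is the behaviour the graph-based term is meant to penalize.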

10:20 Generalized Discriminative Feature Transformation for Speech Recognition

Roger Hsiao (InterACT, Language Technologies Institute, Carnegie Mellon University)
Tanja Schultz (InterACT, Language Technologies Institute, Carnegie Mellon University)

We propose a new algorithm called Generalized Discriminative Feature Transformation (GDFT) for acoustic models in speech recognition. GDFT is based on Lagrange relaxation of a transformed optimization problem. We show that existing discriminative feature transformation methods such as feature space MMI/MPE (fMMI/MPE) and region dependent linear transformation (RDLT), as well as a non-discriminative feature transformation, constrained maximum likelihood linear regression (CMLLR), are special cases of GDFT. We evaluate the performance of GDFT for Iraqi large vocabulary continuous speech recognition (LVCSR).

10:40 A Fast Online Algorithm for Large Margin Training of Continuous Density Hidden Markov Models

Chih-Chieh Cheng (University of California, San Diego)
Fei Sha (University of Southern California)
Lawrence Saul (University of California, San Diego)

We propose an online learning algorithm for large margin training of continuous density hidden Markov models. The online algorithm updates the model parameters incrementally after the decoding of each training utterance. For large margin training, the algorithm attempts to separate the log-likelihoods of correct and incorrect transcriptions by an amount proportional to their Hamming distance. We evaluate this approach to hidden Markov modeling on the TIMIT speech database. We find that the algorithm yields significantly lower phone error rates than other approaches--both online and batch--that do not attempt to enforce a large margin. We also find that the algorithm converges much more quickly than analogous batch optimizations for large margin training.
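The margin criterion described above can be illustrated with a small sketch. It only measures how far one incorrect transcription falls short of the required margin; the function names and the proportionality constant `rho` are hypothetical, and the paper's actual parameter update is not reproduced here.

```python
def hamming(ref, hyp):
    # Number of positions at which two equal-length label sequences differ.
    if len(ref) != len(hyp):
        raise ValueError("sequences must have the same length")
    return sum(r != h for r, h in zip(ref, hyp))

def margin_violation(loglik_correct, loglik_wrong, ref, hyp, rho=1.0):
    # Required separation grows in proportion to the Hamming distance.
    required = rho * hamming(ref, hyp)
    actual = loglik_correct - loglik_wrong
    # Zero when the margin is satisfied, positive otherwise (hinge form).
    return max(0.0, required - actual)
```

In an online setting, a positive violation for the current utterance would trigger a parameter update, while a zero violation would leave the model unchanged.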

11:00 Maximum Mutual Information Estimation via Second Order Cone Programming for Large Vocabulary Continuous Speech Recognition

Dalei Wu (Department of Computer Science and Engineering, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3, CANADA)
Baojie Li (Department of Computer Science and Engineering, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3, CANADA)
Hui Jiang (Department of Computer Science and Engineering, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3, CANADA)

In this paper, we extend our previous work on convex optimization methods to MMIE-based discriminative training for large vocabulary continuous speech recognition. Specifically, we reformulate MMIE training as a second order cone programming (SOCP) problem using convex relaxation techniques that we previously proposed. Moreover, the entire SOCP formulation is developed for word graphs instead of N-best lists in order to handle large vocabulary tasks. The proposed method is evaluated on the standard WSJ-5k task, and experimental results show that it significantly outperforms the conventional EBW method in terms of both recognition accuracy and convergence behavior. Our experiments also show that the proposed SOCP method is efficient enough to handle some relatively large HMM sets normally used in large vocabulary tasks.

11:20 Hidden Conditional Random Field with Distribution Constraints for Phone Classification

Dong Yu (Microsoft Research)
Li Deng (Microsoft Research)
Alex Acero (Microsoft Research)

We advance the recently proposed hidden conditional random field (HCRF) model by replacing the moment constraints (MCs) with distribution constraints (DCs). We point out that the DCs are the same as the traditional MCs for binary features but regularize the probability distribution of continuous-valued features better than the MCs do. We show that under the DCs the HCRF model is no longer log-linear but embeds the model parameters in non-linear functions. We provide an effective solution to the resulting optimization problem by converting it to the traditional log-linear form in a higher-dimensional feature space using cubic splines. We demonstrate that a 20.8% classification error rate can be achieved on the TIMIT phone classification task using the HCRF-DC model. This result is superior to any published single-system result on this task, including the HCRF-MC model, the discriminatively trained HMMs, and the large-margin HMMs using the same features.

11:40 Deterministic Annealing Based Training Algorithm for Bayesian Speech Recognition

Sayaka Shiota (Nagoya Institute of Technology)
Kei Hashimoto (Nagoya Institute of Technology)
Yoshihiko Nankaku (Nagoya Institute of Technology)
Keiichi Tokuda (Nagoya Institute of Technology)

This paper proposes a deterministic annealing based training algorithm for Bayesian speech recognition. The Bayesian method is a statistical technique for estimating reliable predictive distributions by marginalizing model parameters. However, the local maxima problem in the Bayesian method is more serious than in the ML-based approach, because the Bayesian method treats not only state sequences but also model parameters as latent variables. The deterministic annealing EM (DAEM) algorithm has been proposed to alleviate the local maxima problem in the EM algorithm, and its effectiveness has been reported in HMM-based speech recognition using the ML criterion. In this paper, the DAEM algorithm is applied to Bayesian speech recognition to relax the local maxima problem. Speech recognition experiments show that the proposed method achieves higher performance than the conventional methods.
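As background, the core idea of deterministic annealing EM is to temper the latent-variable posterior with an inverse temperature that is raised gradually towards 1. The sketch below shows only that generic tempering step for a discrete latent variable; the function name `tempered_posterior` and the parameter name `beta` are illustrative, and the Bayesian variant proposed in the paper is not reproduced.

```python
import math

def tempered_posterior(log_joint, beta):
    """Posterior over latent states proportional to exp(beta * log_joint).

    log_joint: list of log p(observation, state) values, one per state.
    beta:      inverse temperature in [0, 1]; beta = 1 gives the usual
               EM posterior, beta = 0 gives a uniform distribution.
    """
    scaled = [beta * lj for lj in log_joint]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Starting with a small `beta` flattens the posterior so that early iterations are less sensitive to initialization; increasing `beta` towards 1 over the course of training recovers the standard E-step.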