10th Annual Conference of the International Speech Communication Association
Interspeech 2009 Brighton
Technical Programme
This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.
Tue-Ses1-O1: ASR: Discriminative Training
Time: Tuesday 10:00
Place: Main Hall
Type: Oral
Chair: Erik McDermott
| 10:00 | On the Semi-Supervised Learning of Multi-Layered Perceptrons
Jonathan Malkin (University of Washington) Amarnag Subramanya (University of Washington) Jeff Bilmes (University of Washington)
We present a novel approach for training a multi-layered perceptron
(MLP) in a semi-supervised fashion. Our objective function, when
optimized, balances training set accuracy with fidelity to a
graph-based manifold over all points. Additionally, the objective
favors smoothness via an entropy regularizer over classifier outputs
as well as straightforward L2 regularization. Our approach also
scales well enough to enable large-scale training. The results
demonstrate significant improvement on several phone classification
tasks over baseline MLPs.
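
As a rough illustration of how the four terms named in the abstract might combine, the following is a minimal sketch, not the authors' formulation. The function name, the use of KL divergence for the graph term, the sign convention for the entropy term, and the weights `gamma`, `kappa` and `lam` are all assumptions made for this example.

```python
import numpy as np

def semi_supervised_objective(probs, labels, labeled_idx, graph_weights, params,
                              gamma=1.0, kappa=0.1, lam=1e-3, eps=1e-12):
    """Hypothetical objective to minimise (all names are illustrative).

    probs:         (n, k) classifier outputs for all points, rows sum to 1
    labels:        class ids for the labelled points
    labeled_idx:   row indices of the labelled points in probs
    graph_weights: (n, n) non-negative similarity graph over all points
    params:        flat array of MLP parameters
    """
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    # Training-set accuracy: cross-entropy on the labelled points only.
    supervised = -np.sum(np.log(p[labeled_idx, labels]))
    # Graph fidelity: neighbouring points should have similar outputs,
    # measured here (by assumption) as sum_ij w_ij * KL(p_i || p_j).
    log_ratio = np.log(p[:, None, :]) - np.log(p[None, :, :])
    kl = np.sum(p[:, None, :] * log_ratio, axis=2)
    graph = np.sum(np.asarray(graph_weights) * kl)
    # Entropy regulariser: higher output entropy is rewarded (assumed sign).
    entropy = -np.sum(p * np.log(p))
    # Plain L2 penalty on the parameters.
    l2 = np.sum(np.asarray(params, dtype=float) ** 2)
    return supervised + gamma * graph - kappa * entropy + lam * l2
```

In this sketch, unlabelled points contribute only through the graph and entropy terms, which is what makes the training semi-supervised.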
|
| 10:20 | Generalized Discriminative Feature Transformation for Speech Recognition
Roger Hsiao (InterACT, Language Technologies Institute, Carnegie Mellon University) Tanja Schultz (InterACT, Language Technologies Institute, Carnegie Mellon University)
We propose a new algorithm called Generalized Discriminative Feature Transformation (GDFT) for acoustic models in speech recognition. GDFT is based on Lagrange relaxation of a transformed optimization problem. We show that existing discriminative feature transformation methods such as feature space MMI/MPE (fMMI/MPE) and region dependent linear transformation (RDLT), as well as a non-discriminative feature transformation, constrained maximum likelihood linear regression (CMLLR), are special cases of GDFT. We evaluate the performance of GDFT for Iraqi large vocabulary continuous speech recognition (LVCSR).
|
| 10:40 | A Fast Online Algorithm for Large Margin Training of Continuous Density Hidden Markov Models
Chih-Chieh Cheng (University of California, San Diego) Fei Sha (University of Southern California) Lawrence Saul (University of California, San Diego)
We propose an online learning algorithm for large margin training of continuous density hidden Markov models. The online algorithm updates the model parameters incrementally after the decoding of each training utterance. For large margin training, the algorithm attempts to separate the log-likelihoods of correct and incorrect transcriptions by an amount proportional to their Hamming distance. We evaluate this approach to hidden Markov modeling on the TIMIT speech database. We find that the algorithm yields significantly lower phone error rates than other approaches--both online and batch--that do not attempt to enforce a large margin. We also find that the algorithm converges much more quickly than analogous batch optimizations for large margin training.
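
The margin criterion described above can be sketched as a hinge-style penalty. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function names, the `scale` parameter, and the use of equal-length label sequences are hypothetical choices for this example.

```python
def hamming_distance(ref, hyp):
    """Count positions where two equal-length label sequences differ."""
    if len(ref) != len(hyp):
        raise ValueError("sequences must have the same length")
    return sum(r != h for r, h in zip(ref, hyp))

def margin_violation(score_ref, score_hyp, ref, hyp, scale=1.0):
    """Hinge penalty: zero once the correct transcription's log-likelihood
    exceeds the competitor's by at least scale * Hamming distance."""
    required_margin = scale * hamming_distance(ref, hyp)
    return max(0.0, required_margin - (score_ref - score_hyp))
```

An online update would adjust the model parameters only for utterances where this penalty is positive, which is one way to read "updates the model parameters incrementally after the decoding of each training utterance".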
|
| 11:00 | Maximum Mutual Information Estimation via Second Order Cone Programming for Large Vocabulary Continuous Speech Recognition
Dalei Wu (Department of Computer Science and Engineering, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3, CANADA) Baojie Li (Department of Computer Science and Engineering, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3, CANADA) Hui Jiang (Department of Computer Science and Engineering, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3, CANADA)
In this paper, we extend our previous work on convex optimization methods to MMIE-based discriminative training for large vocabulary continuous speech recognition. Specifically, we reformulate MMIE training as a second order cone programming (SOCP) problem using convex relaxation techniques that we have previously proposed. Moreover, the entire SOCP formulation has been developed for word graphs instead of N-best lists in order to handle large vocabulary tasks. The proposed method has been evaluated on the standard WSJ-5k task, and experimental results show that it significantly outperforms the conventional EBW method in terms of recognition accuracy as well as convergence behavior. Our experiments also show that the proposed SOCP method is efficient enough to handle some relatively large HMM sets normally used in large vocabulary tasks.
|
| 11:20 | Hidden Conditional Random Field with Distribution Constraints for Phone Classification
Dong Yu (Microsoft Research) Li Deng (Microsoft Research) Alex Acero (Microsoft Research)
We advance the recently proposed hidden conditional random field (HCRF) model by replacing the moment constraints (MCs) with distribution constraints (DCs). We point out that the DCs are the same as the traditional MCs for binary features but regularize the probability distribution of continuous-valued features better than the MCs. We show that under the DCs the HCRF model is no longer log-linear but embeds the model parameters in non-linear functions. We provide an effective solution to the resulting optimization problem by converting it to the traditional log-linear form in a higher-dimensional feature space using cubic splines. We demonstrate that a 20.8% classification error rate can be achieved on the TIMIT phone classification task using the HCRF-DC model. This result is superior to any published single-system result on this task, including the HCRF-MC model, discriminatively trained HMMs, and large-margin HMMs using the same features.
|
| 11:40 | Deterministic Annealing Based Training Algorithm for Bayesian Speech Recognition
Sayaka Shiota (Nagoya Institute of Technology) Kei Hashimoto (Nagoya Institute of Technology) Yoshihiko Nankaku (Nagoya Institute of Technology) Keiichi Tokuda (Nagoya Institute of Technology)
This paper proposes a deterministic annealing based training algorithm
for Bayesian speech recognition.
The Bayesian method is a statistical technique for estimating reliable
predictive distributions by marginalizing model parameters.
However, the local maxima problem in the Bayesian method is more serious
than in the ML-based approach, because the Bayesian method treats
not only state sequences but also model parameters as latent variables.
The deterministic annealing EM (DAEM) algorithm
has been proposed to alleviate the local maxima problem in the
EM algorithm, and its effectiveness has been reported in HMM-based
speech recognition using the ML criterion.
In this paper, the DAEM algorithm is applied to Bayesian speech
recognition to alleviate the local maxima problem.
Speech recognition experiments show that the proposed method
achieves higher performance than the conventional methods.
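
For readers unfamiliar with deterministic annealing, the core idea in the standard (ML) EM setting is to flatten the E-step posteriors with a temperature parameter. The sketch below shows only that tempered E-step and is not the Bayesian variant this paper proposes; the function name and the `beta` argument are assumptions made for this example.

```python
import numpy as np

def tempered_posteriors(log_joint, beta):
    """Annealed E-step: posteriors proportional to p(x, z) ** beta.

    log_joint: (n, k) array of log p(x_n, z = k)
    beta:      inverse temperature in [0, 1]; beta = 1 recovers ordinary EM
    """
    scaled = beta * np.asarray(log_joint, dtype=float)
    # Subtract the row maximum before exponentiating for numerical stability.
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    post = np.exp(scaled)
    return post / post.sum(axis=1, keepdims=True)
```

Training typically starts with a small `beta`, where the posteriors are close to uniform and the objective is smoother, and raises it towards 1 over successive iterations.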
|