10th Annual Conference of the International Speech Communication Association
Interspeech 2009 Brighton
|
Technical Programme
This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.
Tue-Ses2-O4: Speaker Diarisation
| Time: | Tuesday 13:30 |
Place: | East Wing 3 |
Type: | Oral |
| Chair: | Douglas Reynolds |
| 13:30 | A Study of New Approaches to Speaker Diarization
Douglas Reynolds (MIT Lincoln Laboratory) Patrick Kenny (CRIM) Fabio Castaldo (Politecnico di Torino)
This paper reports on work carried out at the 2008 JHU Summer Workshop
examining new approaches to speaker diarization. Four different systems
were developed and experiments were conducted using summed-channel telephone
data from the 2008 NIST SRE. The systems are a baseline agglomerative
clustering system, a new Variational Bayes system using eigenvoice speaker
models, a streaming system using a mix of low dimensional speaker factors and
classic segmentation and clustering, and a new hybrid system combining
the baseline system with a new cosine-distance speaker factor clustering.
Results are presented using the Diarization Error Rate (DER) as well as
the equal error rate (EER) obtained when using diarization outputs for a
speaker detection task.
The best configurations of the diarization system produced DERs of 3.5-4.6%,
and we demonstrate a weak correlation between EER and DER.
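For readers unfamiliar with the metric, the DER is conventionally defined as the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The following is a minimal sketch of that conventional definition only; the function name and the example durations are hypothetical and are not taken from the paper.

```python
def diarization_error_rate(missed: float, false_alarm: float,
                           confusion: float, total_speech: float) -> float:
    """Conventional DER: error time divided by total reference speech time.

    All arguments are durations in seconds. The names are illustrative
    assumptions, not an API from the paper or from any scoring tool.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Hypothetical durations: 1.0 s missed, 0.5 s false alarm, 2.0 s confusion
# over 100 s of reference speech.
example_der = diarization_error_rate(1.0, 0.5, 2.0, 100.0)
```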
|
| 13:50 | Redefining the Bayesian Information Criterion for Speaker Diarisation
Themos Stafylakis (Institute for Language and Speech Processing, National Technical University of Athens) Vassilis Katsouros (Institute for Language and Speech Processing) George Carayannis (Institute for Language and Speech Processing, National Technical University of Athens)
A novel approach to the Bayesian Information Criterion (BIC) is introduced. The new criterion redefines the penalty terms of the BIC, such that each parameter is penalized with the effective sample size it is trained with. Contrary to Local-BIC, the proposed criterion scores overall clustering hypotheses and therefore is not restricted to hierarchical clustering algorithms. Contrary to Global-BIC, it provides a local dissimilarity measure that depends only on the statistics of the examined clusters and not on the overall sample size. We tested our criterion with two benchmark tests and found significant improvement in performance in the speaker diarisation task.
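As background, the conventional local BIC test for deciding whether two clusters should stay separate can be sketched as below. This is the classic criterion, not the redefined criterion proposed in the paper, and it is simplified to one-dimensional Gaussian clusters. The function names, the `penalty_weight` parameter, and the sign convention (positive means "keep separate") are assumptions made for illustration.

```python
import math
from typing import Sequence


def ml_variance(x: Sequence[float]) -> float:
    """Maximum-likelihood variance of a 1-D sample."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def delta_bic(x_i: Sequence[float], x_j: Sequence[float],
              penalty_weight: float = 1.0) -> float:
    """Classic local delta-BIC for two 1-D Gaussian clusters.

    Positive values favour modelling the clusters separately; negative
    values favour merging. The penalty uses the local sample size
    n_i + n_j. Clusters with zero variance are not handled here.
    """
    n_i, n_j = len(x_i), len(x_j)
    n = n_i + n_j
    merged = list(x_i) + list(x_j)
    # Log-likelihood gain of two Gaussians over a single merged Gaussian.
    gain = 0.5 * (n * math.log(ml_variance(merged))
                  - n_i * math.log(ml_variance(x_i))
                  - n_j * math.log(ml_variance(x_j)))
    # A 1-D Gaussian has two free parameters (mean and variance).
    num_params = 2
    penalty = 0.5 * penalty_weight * num_params * math.log(n)
    return gain - penalty
```

In this conventional form every parameter is penalized with the same sample size, which is the aspect the abstract says the new criterion changes.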
|
| 14:10 | Speaker Diarization Using Divide-and-Conquer
Shih-Sian Cheng (Institute of Information Science, Academia Sinica, Taipei, Taiwan) Chun-Han Tseng (Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan) Chia-Ping Chen (Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan) Hsin-Min Wang (Institute of Information Science, Academia Sinica, Taipei, Taiwan)
Speaker diarization systems consist of two core
components: speaker segmentation and speaker clustering. The
current state-of-the-art speaker diarization systems usually apply
hierarchical agglomerative clustering (HAC) for speaker clustering
after segmentation. However, HAC's quadratic computational
complexity with respect to the number of data samples inevitably
limits its application in large-scale data sets. In this paper, we
propose a divide-and-conquer (DAC) framework for speaker
diarization. It recursively partitions the input speech stream
into two sub-streams, performs diarization on them separately, and then combines the diarization results obtained from them using HAC. The experimental results show that the proposed framework is faster than the conventional segmentation and clustering-based approach while achieving comparable diarization accuracy. Moreover, the proposed framework obtains a higher speedup over the conventional approach on a larger test data set.
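The recursive structure described in this abstract can be illustrated with a toy sketch. It assumes each speech segment is summarised by a single number and that clusters are compared by centroid distance against a fixed threshold; the paper's actual features, distance measure, and stopping rule are not given here, so `hac`, `dac_diarize`, `threshold`, and `leaf_size` are all hypothetical.

```python
from typing import List

Cluster = List[float]  # toy cluster: a list of 1-D segment features


def centroid(cluster: Cluster) -> float:
    return sum(cluster) / len(cluster)


def hac(clusters: List[Cluster], threshold: float) -> List[Cluster]:
    """Naive agglomerative clustering: repeatedly merge the closest pair
    of clusters until the closest pair is farther apart than threshold."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = abs(centroid(clusters[i]) - centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def dac_diarize(segments: List[float], threshold: float,
                leaf_size: int = 4) -> List[Cluster]:
    """Split the stream in two, diarize each half recursively, then
    combine the two sets of clusters with HAC."""
    if len(segments) <= leaf_size:
        return hac([[s] for s in segments], threshold)
    mid = len(segments) // 2
    left = dac_diarize(segments[:mid], threshold, leaf_size)
    right = dac_diarize(segments[mid:], threshold, leaf_size)
    return hac(left + right, threshold)
```

Because the combining step runs HAC over clusters rather than individual segments, each HAC call sees far fewer items than a single pass over the whole stream would, which is the intuition behind the reported speedup.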
|
| 14:30 | KL Realignment for Speaker Diarization with Multiple Feature Streams
Deepu Vijayasenan (Idiap Research Institute, 1920 Martigny, CH) Fabio Valente (Idiap Research Institute, 1920 Martigny, CH) Herve Bourlard (Idiap Research Institute, 1920 Martigny, CH)
This paper investigates the use of Kullback-Leibler
(KL) divergence based realignment with application to speaker
diarization. KL divergence based realignment operates
directly on the speaker posterior distribution estimates
and is compared with traditional realignment performed using
an HMM/GMM system. We hypothesize that using posterior estimates
to re-align speaker boundaries is more robust than Gaussian
mixture models in the case of multiple feature streams with
different statistical properties. Experiments are run on the NIST
RT06 data. They reveal that in the case of conventional MFCC features
the two approaches have the same performance, while the
KL based system outperforms the HMM/GMM re-alignment in
the case of a combination of multiple feature streams (MFCC and
TDOA). Furthermore, we discuss the possible extension to other
feature sets.
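To make the idea of comparing posterior distributions concrete, the sketch below computes the KL divergence between two discrete distributions and assigns each frame to the speaker whose reference distribution is closest. This frame-by-frame assignment is a deliberate simplification for illustration and is not the realignment procedure of the paper, which is not specified in the abstract; `kl_divergence`, `assign_frames`, and the `eps` floor are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions over the same outcomes.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps
    (an assumption) to avoid division by zero.
    """
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)


def assign_frames(frame_posteriors: List[Sequence[float]],
                  speaker_posteriors: Dict[str, Sequence[float]]) -> List[str]:
    """Label each frame with the speaker whose reference distribution has
    the smallest KL divergence from the frame's posterior.

    No temporal smoothing or minimum-duration constraint is applied.
    """
    labels = []
    for frame in frame_posteriors:
        best = min(speaker_posteriors,
                   key=lambda spk: kl_divergence(frame, speaker_posteriors[spk]))
        labels.append(best)
    return labels
```

For example, with reference distributions `{"A": [0.8, 0.2], "B": [0.1, 0.9]}`, a frame with posterior `[0.9, 0.1]` is assigned to `"A"`.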
|
| 14:50 | Speech Overlap Detection in a Two-Pass Speaker Diarization System
Marijn Huijbregts (University of Twente) David van Leeuwen (TNO Human Factors) Franciska de Jong (University of Twente)
In this paper we present the two-pass speaker diarization system that we developed for the NIST RT09s evaluation. In the first pass of our system a model for speech overlap detection is generated automatically. This model is used in two ways to reduce the diarization errors due to overlapping speech. First, it is used in a second diarization pass to remove overlapping speech from the data while training the speaker models. Second, it is used to find speech overlap for the final segmentation so that overlapping speech segments can be generated. The experiments show that our overlap detection method improves the performance of all three of our system configurations.
|
| 15:10 | Improved Speaker Diarization of Meeting Speech with Recurrent Selection of Representative Speech Segments and Participant Interaction Pattern Modeling
Kyu Han (University of Southern California) Shrikanth Narayanan (University of Southern California)
In this work we describe two distinct novel improvements to our speaker diarization system, previously proposed for analysis of meeting speech. The first approach focuses on recurrent selection of representative speech segments for speaker clustering, while the other is based on participant interaction pattern modeling. The former selects speech segments with high relevance to speaker clustering, especially from a robust cluster modeling perspective, and keeps updating them throughout the clustering procedure. The latter statistically models conversation patterns between meeting participants and applies them as a priori information when refining diarization results. Experimental results reveal that the two proposed approaches improve performance by 29.82% (relative) in terms of diarization error rate in tests on 13 meeting excerpts from various meeting speech corpora.
|