|
10th Annual Conference of the International Speech Communication Association
Interspeech 2009 Brighton
|
Technical Programme
This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.
Tue-Ses1-S1: Special Session: Advanced Voice Function Assessment
| Time: | Tuesday 10:00 |
Place: | East Wing 4 |
Type: | Special |
| Chair: | Anna Barney & Mette Pedersen |
| 10:00 | Acoustic and High-Speed Digital Imaging Based Analysis of Pathological Voice Contributes to Better Understanding and Differential Diagnosis of Neurological Dysphonias and of Mimicking Phonatory Disorders
Krzysztof Izdebski (Pacific Voice and Speech Foundation & Department of Otolaryngology: Head & Neck Surgery, Stanford Voice & Swallowing Center, Stanford University School of Medicine) Yuling Yan (Department of Bioengineering, Santa Clara University & Department of Otolaryngology, Stanford University School of Medicine) Melda Kunduk (Department of Communication Sciences and Disorders, Louisiana State University)
Using Nyquist-plot definitions and high-speed digital imaging (HSDI) based analyses of the acoustic and visual database of similar-sounding, neurologically driven pathological phonations, we categorized these signals and provided an in-depth explanation of how these sounds differ and how they are generated at the glottic level. Combined evaluations based on modern technology strengthened our knowledge and improved objective guidelines on how to approach clinical diagnosis “by ear”, significantly aiding the process of differential diagnosis of complex pathological voice qualities in non-laboratory settings. Index Terms: HSDI, Nyquist-plots, voice quality, tremor overpressure, vocal arrests, neurologic dysphonias, functional dysphonias, mimicking disorders
|
| 10:20 | Normalized Modulation Spectral Features for Cross-Database Voice Pathology Detection
Maria Markaki (Computer Science Department, University of Crete) Yannis Stylianou (Computer Science Department, University of Crete)
In this paper, we employ normalized modulation spectral analysis
for voice pathology detection. Such normalization is important
when there is a mismatch between training and testing conditions,
in other words, when the detection system is employed in real
(testing) conditions. Modulation spectra usually produce a
high-dimensional space. For classification purposes, the size
of the original space is reduced using Higher Order Singular Value
Decomposition (HOSVD). Further, we select the most relevant features
based on the mutual information between subjective voice quality
and computed features, which leads to a modulation spectral
representation adapted to the classification task. For voice
pathology detection, the adaptive modulation spectra are combined
with an SVM classifier. To simulate real testing conditions,
we used two independently recorded databases: one for training and
the other for testing. We address the difference in signal
characteristics between training and testing data through subband
normalization of modulation spectral features. Simulations show
that feature normalization enables the cross-database detection
of pathological voices even when the training and test data come
from different databases.
|
| 10:40 | Speech sample salience analysis for speech cycle detection
Christophe Mertens (Laboratory of Images, Signals and Telecommunication Devices, CP 165/51, Faculté des Sciences Appliquées. Université Libre de Bruxelles) Francis Grenez (Laboratory of Images, Signals and Telecommunication Devices, CP 165/51, Faculté des Sciences Appliquées. Université Libre de Bruxelles) Jean Schoentgen (National Fund for Scientific Research, Belgium)
The presentation proposes a method for the measurement
of cycle lengths in voiced speech. The background is the
study of acoustic cues of slow (vocal tremor) and fast
(vocal jitter) perturbations of the vocal frequency. Here,
these acoustic cues are obtained by means of a temporal
method that detects speech cycles via the so-called
salience of the speech signal samples. The method does
not require that the signal be locally periodic or that the average
period length be known a priori. Several implementations
are considered and discussed. Salience analysis
is compared with the auto-correlation method for cycle
detection implemented in Praat.
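To make the notion of fast cycle-length perturbation concrete, the following minimal sketch computes a commonly used "local jitter" measure from a list of cycle lengths. It assumes the cycle lengths have already been measured by some detector; the function name and the example values are hypothetical, and the sketch does not reproduce the salience analysis described in the abstract.

```python
def local_jitter(cycle_lengths):
    """Mean absolute difference between consecutive cycle lengths,
    divided by the mean cycle length (a common 'local jitter' definition)."""
    if len(cycle_lengths) < 2:
        raise ValueError("need at least two cycle lengths")
    # Absolute change from each cycle to the next one.
    diffs = [abs(b - a) for a, b in zip(cycle_lengths, cycle_lengths[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_length = sum(cycle_lengths) / len(cycle_lengths)
    return mean_abs_diff / mean_length


# Hypothetical cycle lengths in milliseconds.
steady = [10.0, 10.0, 10.0, 10.0, 10.0]
perturbed = [10.0, 11.0, 10.0, 11.0]

print(local_jitter(steady))
print(local_jitter(perturbed))
```

A perfectly regular cycle sequence gives zero jitter; slow drifts such as tremor would need a separate, longer-term measure.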
|
| 11:00 | The Use of Telephone Speech Recordings for Assessment and Monitoring of Cognitive Function in Elderly People
Viliam Rapcan (Trinity Centre for Bioengineering, Trinity College Dublin, Ireland) Shona D'Arcy (Trinity Centre for Bioengineering, Trinity College Dublin, Ireland) Nils Penard (Trinity College Institute of Neuroscience, Trinity College Dublin, Ireland) Ian H. Robertson (Trinity College Institute of Neuroscience, Trinity College Dublin, Ireland) Richard B. Reilly (Trinity Centre for Bioengineering & Trinity College Institute of Neuroscience, Trinity College Dublin, Ireland)
Cognitive assessment in the clinic is a time-consuming and expensive task. Speech may be employed as a means of monitoring cognitive function in elderly people. Extraction of speech characteristics from speech recorded remotely over a telephone was investigated and compared to speech characteristics extracted from recordings made in a controlled environment. Results demonstrate that, with minor changes to the feature extraction algorithm, speech characteristics can be reliably extracted from telephone-quality speech (overall accuracy of 93.2%). With further development of a fully automated IVR system, an early screening system for cognitive decline may be easily realized.
|
| 11:20 | Optimized Feature set to Assess Acoustic Perturbations in Dysarthric Speech
Sunil Nagaraja (Department of Electrical and Computer Engineering, University of New Brunswick, Canada) Eduardo Castillo Guerra (Department of Electrical and Computer Engineering, University of New Brunswick, Canada)
This paper focuses on the optimization of features derived to characterize the acoustic perturbations encountered in a group of neurological disorders known as Dysarthria. The work derives a set of orthogonal features that enable acoustic analyses of dysarthric speech from eight different Dysarthria types. The feature set is composed of combinations of objective measurements obtained with digital signal processing algorithms and perceptual judgments of the most reliably perceived acoustic perturbations. The effectiveness of the features in providing relevant information about the disorders is evaluated with different classifiers, yielding a classification rate of up to 93.7%.
|
| 11:40 | A Microphone-Independent Visualization Technique for Speech Disorders
Andreas Maier (Universität Erlangen-Nürnberg, Abteilung für Phoniatrie und Pädaudiologie) Stefan Wenhardt (Universität Erlangen-Nürnberg, Lehrstuhl für Mustererkennung) Tino Haderlein (Universität Erlangen-Nürnberg, Lehrstuhl für Mustererkennung) Maria Schuster (Universität Erlangen-Nürnberg, Abteilung für Phoniatrie und Pädaudiologie) Elmar Nöth (Universität Erlangen-Nürnberg, Lehrstuhl für Mustererkennung)
In this paper we introduce a novel method for the visualization
of speech disorders. We demonstrate the method with disordered
speech and a control group. However, both groups were recorded
using two different microphones. The projection of the patient data
using a single microphone yields significant correlations between the
coordinates on the map and certain criteria of the disorder which
were perceptually rated. However, projection of data from multiple
microphones reduces this correlation. Usually, the acoustical mismatch
between the microphones is greater than the mismatch between
the speakers, i.e., not the disorders but the microphones form
clusters in the visualization. Based on an extension of the Sammon
mapping, we are able to create a map which projects the same speakers
onto the same position even if multiple microphones are used.
Furthermore, our method also restores the correlation between the
map coordinates and the perceptual assessment.
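As background for the mapping mentioned above, here is a minimal sketch of the standard Sammon stress, the error that a plain Sammon mapping minimizes: it compares pairwise distances in the original feature space with those on the map. This is the textbook criterion only, not the microphone-independent extension proposed in the paper; the function name and the toy points are hypothetical.

```python
import math


def sammon_stress(high_dim_points, map_points):
    """Standard Sammon stress between original points and their map positions."""
    n = len(high_dim_points)
    weighted_error = 0.0
    total_distance = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_orig = math.dist(high_dim_points[i], high_dim_points[j])
            d_map = math.dist(map_points[i], map_points[j])
            if d_orig == 0.0:
                continue  # identical original points carry no distance information
            # Squared distance error, weighted by the inverse original distance.
            weighted_error += (d_orig - d_map) ** 2 / d_orig
            total_distance += d_orig
    return weighted_error / total_distance


# Hypothetical 3-D feature vectors and two candidate 2-D maps.
features = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
flattened = [(x, y) for x, y, _ in features]     # drops the third coordinate
same_space = [(x, y, z) for x, y, z in features]  # distances fully preserved

print(sammon_stress(features, flattened))
print(sammon_stress(features, same_space))
```

A stress of zero means all pairwise distances are preserved; larger values indicate a more distorted map.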
|
| 12:00 | Evaluation of the Effect of the GSM Full Rate codec on the Automatic Detection of Laryngeal Pathologies Based on Cepstral Analysis
Ruben Fraile (Universidad Politecnica de Madrid) Carmelo Sanchez (Universidad Politecnica de Madrid) Juan I. Godino-Llorente (Universidad Politecnica de Madrid) Nicolas Saenz-Lechon (Universidad Politecnica de Madrid) Victor Osma-Ruiz (Universidad Politecnica de Madrid) Juana M. Gutierrez (Universidad Politecnica de Madrid)
Advances in speech signal analysis during the last decade have allowed the development of automatic algorithms for non-invasive detection of laryngeal pathologies. Bearing in mind the extension of these automatic methods to remote diagnosis scenarios, this paper analyzes the performance of a pathology detector based on Mel Frequency Cepstral Coefficients when the speech signal has undergone the distortion of a speech codec such as the GSM FR codec, which is used in one of today's most widespread communications networks. It is shown that the overall performance of the automatic detection of pathologies is degraded by less than 5%, and that such degradation is not due to the codec itself, but to the bandwidth limitation needed at its input. These results indicate that the GSM system can be more suitable for implementing remote voice assessment than the analogue telephone channel.
|
| 12:20 | Cepstral analysis of vocal dysperiodicities in disordered connected speech
Ali Alpan (Laboratory of Images, Signals & Telecommunication Devices, Université Libre de Bruxelles, Brussels, Belgium) Jean Schoentgen (National Fund for Scientific Research, Belgium) Youri Maryn (Department of Otorhinolaryngology and Head & Neck Surgery, Department of Speech-Language Pathology and Audiology, Sint-Jan General Hospital, Bruges, Belgium) Francis Grenez (Laboratory of Images, Signals & Telecommunication Devices, Université Libre de Bruxelles, Brussels, Belgium) Peter Murphy (Department of Electronic and Computer Engineering, University of Limerick, Limerick, Ireland)
Several studies have shown that the amplitude of the first rahmonic peak (R1) in the cepstrum is an indicator of hoarse voice quality. The cepstrum is obtained by taking the inverse Fourier transform of the log-magnitude spectrum. In the present study, a number of spectral analysis processing steps are implemented, including period-synchronous and period-asynchronous analysis, as well as harmonic-synchronous and harmonic-asynchronous spectral band-limitation prior to computing the cepstrum. The analysis is applied to connected speech signals. The correlation between the amplitude of R1 and perceptual ratings is examined for a corpus comprising 28 normophonic and 223 dysphonic speakers. One observes that the correlation between R1 and perceptual ratings increases when the spectrum is band-limited prior to computing the cepstrum. In addition, comparisons are made with a popular cepstral cue, the cepstral peak prominence (CPP).
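The cepstrum definition given in the abstract can be sketched in a few lines. The example below is a minimal illustration assuming NumPy and a single, already windowed frame; the function names, the 60 to 400 Hz search range, and the synthetic pulse train are assumptions made for illustration, and none of the paper's synchronous analysis or band-limitation steps is reproduced.

```python
import numpy as np


def real_cepstrum(frame):
    """Inverse Fourier transform of the log-magnitude spectrum of one frame."""
    frame = np.asarray(frame, dtype=float)
    # A small offset avoids log(0) for empty spectral bins.
    log_magnitude = np.log(np.abs(np.fft.rfft(frame)) + 1e-12)
    return np.fft.irfft(log_magnitude, n=len(frame))


def first_rahmonic(frame, fs, f0_min=60.0, f0_max=400.0):
    """Return (quefrency index, amplitude) of the largest cepstral peak
    within the quefrency range of plausible vocal periods."""
    cepstrum = real_cepstrum(frame)
    q_lo = int(fs / f0_max)
    q_hi = int(fs / f0_min)
    peak = q_lo + int(np.argmax(cepstrum[q_lo:q_hi + 1]))
    return peak, float(cepstrum[peak])


# Hypothetical test signal: a pulse train with a 160-sample period
# (100 Hz at a 16 kHz sampling rate).
fs = 16000
frame = np.zeros(1024)
frame[::160] = 1.0

peak, r1 = first_rahmonic(frame, fs)
print(peak / fs, r1)
```

For real speech the frame would normally be windowed first, and the peak amplitude would be lower and noisier than for this idealized pulse train.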
|
| 12:40 | Standard information from patients: the usefulness of self-evaluation measured with the French version of the VHI
Lise Crevier-Buchman (Department of Otolaryngology, Head & Neck Surgery, Hôpital Européen Georges Pompidou, Université Paris Descartes, Paris, France / Lab. Phonétique et Phonologie, UMR 7018 CNRS-Paris3/Sorbonne Nouvelle, Paris, France) Stephanie Borel (Department of Otolaryngology, Head & Neck Surgery, Hôpital Européen Georges Pompidou, Université Paris Descartes, Paris, France / Lab. Phonétique et Phonologie, UMR 7018 CNRS-Paris3/Sorbonne Nouvelle, Paris, France) Stephane Hans (Department of Otolaryngology, Head & Neck Surgery, Hôpital Européen Georges Pompidou, Université Paris Descartes, Paris, France) Madeleine Menard (Department of Otolaryngology, Head & Neck Surgery, Hôpital Européen Georges Pompidou, Université Paris Descartes, Paris, France) Jacqueline Vaissiere (Lab. Phonétique et Phonologie, UMR 7018 CNRS-Paris3/Sorbonne Nouvelle, Paris, France)
The Voice Handicap Index (VHI) is a scale designed to measure voice disability in daily life. Two groups of patients were evaluated. The first group comprised patients with glottic carcinoma treated by cordectomy: types I & II (13 patients), type III (5 patients), and type V (5 patients). Evaluation was done preoperatively and postoperatively for 12 months. The second group comprised patients with unilateral vocal fold paralysis treated by thyroplasty (17 patients). Evaluation was done before and 3 months after surgery. Total VHI and the emotional and physical subscales improved significantly for type I & II cordectomy and for thyroplasty. The VHI can provide insight into a patient’s handicap.
|
| 13:00 | Intelligibility Assessment in Children with Cleft Lip and Palate in Italian and German
Marcello Scipioni (Politecnico di Milano, Polo Regionale di Como, Italy) Matteo Gerosa (FBK - Fondazione Bruno Kessler, Trento, Italy) Diego Giuliani (FBK - Fondazione Bruno Kessler, Trento, Italy) Elmar Nöth (Chair of Pattern Recognition, Friedrich-Alexander-University Erlangen-Nuremberg, Germany) Andreas Maier (Chair of Pattern Recognition, Friedrich-Alexander-University Erlangen-Nuremberg, Germany)
Current research has shown that the speech intelligibility in
children with cleft lip and palate (CLP) can be estimated automatically using speech recognition methods. On German CLP
data high and significant correlations between human ratings
and the recognition accuracy of a speech recognition system
were already reported. In this paper we investigate whether
the approach is also suitable for other languages. Therefore,
we compare the correlations obtained on German data with the
correlations on Italian data. A high and significant correlation
(r=0.76; p < 0.01) was identified on the Italian data. This result
does not differ significantly from the results on German data
(p > 0.05).
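The kind of correlation reported above can be illustrated with a minimal sketch of the Pearson coefficient between human ratings and recognition accuracy. The function name and all numbers below are hypothetical and are not data from the paper; the significance tests mentioned in the abstract are not reproduced.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the centred values.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-speaker values: perceptual intelligibility ratings
# and word accuracy (%) of a speech recognizer.
human_ratings = [1.0, 2.0, 2.5, 3.5, 4.0]
word_accuracy = [35.0, 48.0, 50.0, 66.0, 71.0]

print(pearson_r(human_ratings, word_accuracy))
```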
|
| 13:20 | Universidade de Aveiro’s Voice Evaluation Protocol
Luis M. T. Jesus (IEETA and ESSUA, Universidade de Aveiro, Portugal) Anna Barney (ISVR, University of Southampton, UK) Ricardo Santos (Hospital Privado da Trofa, Portugal) Janine Caetano (Agrupamento de Escolas Serra da Gardunha, Fundão, Portugal) Juliana Jorge (RAIZ, Esmoriz, Portugal) Pedro Sá Couto (Departamento de Matemática da Universidade de Aveiro, Portugal)
This paper presents Universidade de Aveiro’s Voice Evaluation Protocol for European Portuguese (EP), and a preliminary inter-rater reliability study. Ten patients with vocal pathology were assessed by two Speech and Language Therapists (SLTs). Protocol parameters such as overall severity, roughness, breathiness, change of loudness (CAPE-V), grade, breathiness and strain (GRBAS), glottal attack, respiratory support, respiratory-phonatory-articulatory coordination, digital laryngeal manipulation, voice quality after manipulation, muscular tension and diagnosis presented high reliability and were highly correlated (good inter-rater agreement and high value of correlation). Values for the overall severity and grade were similar to those reported in the literature.
|