Brighton Pavilion

10th Annual Conference of the International Speech Communication Association

ISCA Interspeech 2009 Brighton

Technical Programme

This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.

Tue-Ses1-S1:
Special Session: Advanced Voice Function Assessment

Time: Tuesday 10:00 Place: East Wing 4 Type: Special
Chair: Anna Barney & Mette Pedersen

10:00 Acoustic and High-Speed Digital Imaging Based Analysis of Pathological Voice Contributes to Better Understanding and Differential Diagnosis of Neurological Dysphonias and of Mimicking Phonatory Disorders

Krzysztof Izdebski (Pacific Voice and Speech Foundation & Department of Otolaryngology: Head & Neck Surgery, Stanford Voice & Swallowing Center, Stanford University School of Medicine)
Yuling Yan (Department of Bioengineering, Santa Clara University & Department of Otolaryngology, Stanford University School of Medicine)
Melda Kunduk (Department of Communication Sciences and Disorders, Louisiana State University)

Using Nyquist-plot definitions and HSDI-based analyses of an acoustic and visual database of similar-sounding, neurologically driven pathological phonations, we categorized these signals and provided an in-depth explanation of how these sounds differ and how they are generated at the glottic level. Combined evaluations based on modern technology strengthened our knowledge and improved objective guidelines on how to approach clinical diagnosis “by ear”, significantly aiding the process of differential diagnosis of complex pathological voice qualities in non-laboratory settings. Index Terms: HSDI, Nyquist-plots, voice quality, tremor overpressure, vocal arrests, neurologic dysphonias, functional dysphonias, mimicking disorders

10:20 Normalized Modulation Spectral Features for Cross-Database Voice Pathology Detection

Maria Markaki (Computer Science Department, University of Crete)
Yannis Stylianou (Computer Science Department, University of Crete)

In this paper, we employ normalized modulation spectral analysis for voice pathology detection. Such normalization is important when there is a mismatch between training and testing conditions, in other words, when the detection system is employed in real (testing) conditions. Modulation spectra usually produce a high-dimensional space. For classification purposes, the size of the original space is reduced using Higher Order Singular Value Decomposition (HOSVD). Further, we select the most relevant features based on the mutual information between subjective voice quality and computed features, which leads to a modulation spectral representation adapted to the classification task. For voice pathology detection, the adaptive modulation spectra are combined with an SVM classifier. To simulate real testing conditions, we used two independently recorded databases, one for training and the other for testing. We address the difference in signal characteristics between training and testing data through subband normalization of modulation spectral features. Simulations show that feature normalization enables the cross-database detection of pathological voices even when training and test data are different.

10:40 Speech sample salience analysis for speech cycle detection

Christophe Mertens (Laboratory of Images, Signals and Telecommunication Devices, CP 165/51, Faculté des Sciences Appliquées. Université Libre de Bruxelles)
Francis Grenez (Laboratory of Images, Signals and Telecommunication Devices, CP 165/51, Faculté des Sciences Appliquées. Université Libre de Bruxelles)
Jean Schoentgen (National Fund for Scientific Research, Belgium)

The presentation proposes a method for the measurement of cycle lengths in voiced speech. The background is the study of acoustic cues of slow (vocal tremor) and fast (vocal jitter) perturbations of the vocal frequency. Here, these acoustic cues are obtained by means of a temporal method that detects speech cycles via the so-called salience of the speech signal samples. The method does not require that the signal be locally periodic or that the average period length be known a priori. Several implementations are considered and discussed. Salience analysis is compared with the auto-correlation method for cycle detection implemented in Praat.

11:00 The Use of Telephone Speech Recordings for Assessment and Monitoring of Cognitive Function in Elderly People

Viliam Rapcan (Trinity Centre for Bioengineering, Trinity College Dublin, Ireland)
Shona D'Arcy (Trinity Centre for Bioengineering, Trinity College Dublin, Ireland)
Nils Penard (Trinity College Institute of Neuroscience, Trinity College Dublin, Ireland)
Ian H. Robertson (Trinity College Institute of Neuroscience, Trinity College Dublin, Ireland)
Richard B. Reilly (Trinity Centre for Bioengineering & Trinity College Institute of Neuroscience, Trinity College Dublin, Ireland)

Cognitive assessment in the clinic is a time-consuming and expensive task. Speech may be employed as a means of monitoring cognitive function in elderly people. Extraction of speech characteristics from speech recorded remotely over a telephone was investigated and compared to speech characteristics extracted from recordings made in a controlled environment. Results demonstrate that speech characteristics can be reliably extracted from telephone-quality speech (with an overall accuracy of 93.2%) with only small changes to the feature extraction algorithm. With further development of a fully automated IVR system, an early screening system for cognitive decline may be easily realized.

11:20 Optimized Feature Set to Assess Acoustic Perturbations in Dysarthric Speech

Sunil Nagaraja (Department of Electrical and Computer Engineering, University of New Brunswick, Canada)
Eduardo Castillo Guerra (Department of Electrical and Computer Engineering, University of New Brunswick, Canada)

This paper focuses on the optimization of features derived to characterize the acoustic perturbations encountered in a group of neurological disorders known as Dysarthria. The work derives a set of orthogonal features that enable acoustic analyses of dysarthric speech from eight different Dysarthria types. The feature set is composed of combinations of objective measurements obtained with digital signal processing algorithms and perceptual judgments of the most reliably perceived acoustic perturbations. The effectiveness of the features in providing relevant information about the disorders is evaluated with different classifiers, yielding a classification rate of up to 93.7%.

11:40 A Microphone-Independent Visualization Technique for Speech Disorders

Andreas Maier (Universität Erlangen-Nürnberg, Abteilung für Phoniatrie und Pädaudiologie)
Stefan Wenhardt (Universität Erlangen-Nürnberg, Lehrstuhl für Mustererkennung)
Tino Haderlein (Universität Erlangen-Nürnberg, Lehrstuhl für Mustererkennung)
Maria Schuster (Universität Erlangen-Nürnberg, Abteilung für Phoniatrie und Pädaudiologie)
Elmar Nöth (Universität Erlangen-Nürnberg, Lehrstuhl für Mustererkennung)

In this paper we introduce a novel method for the visualization of speech disorders. We demonstrate the method with disordered speech and a control group. However, both groups were recorded using two different microphones. The projection of the patient data using a single microphone yields significant correlations between the coordinates on the map and certain criteria of the disorder which were perceptually rated. However, projection of data from multiple microphones reduces this correlation. Usually, the acoustical mismatch between the microphones is greater than the mismatch between the speakers, i.e., not the disorders but the microphones form clusters in the visualization. Based on an extension of the Sammon mapping, we are able to create a map which projects the same speakers onto the same position even if multiple microphones are used. Furthermore, our method also restores the correlation between the map coordinates and the perceptual assessment.
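
As a rough illustration of the criterion behind a Sammon mapping, the sketch below computes the standard Sammon stress between a set of high-dimensional points and a candidate low-dimensional map. It shows only the classical stress measure, not the microphone-independent extension described in the abstract; the function names `pairwise_distances` and `sammon_stress` and the toy points are assumptions made for this example.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance matrix for an (n_points, n_dims) array."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def sammon_stress(high_dim, low_dim):
    """Standard Sammon stress: distance errors weighted by 1 / original distance.

    Assumes all points in high_dim are distinct (no zero distances).
    """
    d_high = pairwise_distances(high_dim)
    d_low = pairwise_distances(low_dim)
    upper = np.triu_indices(len(high_dim), k=1)  # each pair counted once
    dh, dl = d_high[upper], d_low[upper]
    return float(((dh - dl) ** 2 / dh).sum() / dh.sum())

# Toy data (hypothetical): a 3-4-5 triangle lying in the z = 0 plane of 3-D space.
points_3d = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
map_2d = points_3d[:, :2]   # dropping z preserves every distance
map_1d = points_3d[:, :1]   # keeping only x distorts two of the three distances

stress_2d = sammon_stress(points_3d, map_2d)
stress_1d = sammon_stress(points_3d, map_1d)
```

A Sammon mapping searches for the low-dimensional coordinates that minimise this stress; a map that preserves all pairwise distances has stress zero.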

12:00 Evaluation of the Effect of the GSM Full Rate codec on the Automatic Detection of Laryngeal Pathologies Based on Cepstral Analysis

Ruben Fraile (Universidad Politecnica de Madrid)
Carmelo Sanchez (Universidad Politecnica de Madrid)
Juan I. Godino-Llorente (Universidad Politecnica de Madrid)
Nicolas Saenz-Lechon (Universidad Politecnica de Madrid)
Victor Osma-Ruiz (Universidad Politecnica de Madrid)
Juana M. Gutierrez (Universidad Politecnica de Madrid)

Advances in speech signal analysis during the last decade have allowed the development of automatic algorithms for the non-invasive detection of laryngeal pathologies. Bearing in mind the extension of these automatic methods to remote diagnosis scenarios, this paper analyzes the performance of a pathology detector based on Mel Frequency Cepstral Coefficients when the speech signal has undergone the distortion of a speech codec such as the GSM FR codec, which is used in one of today's most widespread communication networks. It is shown that the overall performance of the automatic detection of pathologies is degraded by less than 5%, and that such degradation is not due to the codec itself, but to the bandwidth limitation needed at its input. These results indicate that the GSM system may be more suitable for implementing remote voice assessment than the analogue telephone channel.

12:20 Cepstral analysis of vocal dysperiodicities in disordered connected speech

Ali Alpan (Laboratory of Images, Signals & Telecommunication Devices, Université Libre de Bruxelles, Brussels, Belgium)
Jean Schoentgen (National Fund for Scientific Research, Belgium)
Youri Maryn (Department of Otorhinolaryngology and Head & Neck Surgery, Department of Speech-Language Pathology and Audiology, Sint-Jan General Hospital, Bruges, Belgium)
Francis Grenez (Laboratory of Images, Signals & Telecommunication Devices, Université Libre de Bruxelles, Brussels, Belgium)
Peter Murphy (Department of Electronic and Computer Engineering, University of Limerick, Limerick, Ireland)

Several studies have shown that the amplitude of the first rahmonic peak (R1) in the cepstrum is an indicator of hoarse voice quality. The cepstrum is obtained by taking the inverse Fourier Transform of the log-magnitude spectrum. In the present study, a number of spectral analysis processing steps are implemented, including period-synchronous and period-asynchronous analysis, as well as harmonic-synchronous and harmonic-asynchronous spectral band-limitation prior to computing the cepstrum. The analysis is applied to connected speech signals. The correlation between the amplitude of R1 and perceptual ratings is examined for a corpus comprising 28 normophonic and 223 dysphonic speakers. One observes that the correlation between R1 and perceptual ratings increases when the spectrum is band-limited prior to computing the cepstrum. In addition, comparisons are made with a popular cepstral cue, the cepstral peak prominence (CPP).
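
The cepstrum definition given in the abstract can be sketched in a few lines. The example below is a minimal illustration and not the authors' processing chain: it omits windowing, band-limitation and period-synchronous analysis, and the function names, the pitch search range (60 to 400 Hz) and the synthetic decaying pulse train are all assumptions chosen so that the expected peak is known in closed form.

```python
import numpy as np

def real_cepstrum(frame):
    """Inverse Fourier transform of the log-magnitude spectrum of a frame."""
    spectrum = np.fft.fft(frame)
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)  # small floor avoids log(0)
    return np.real(np.fft.ifft(log_magnitude))

def first_rahmonic(cepstrum, fs, f0_min=60.0, f0_max=400.0):
    """Largest cepstral peak within the quefrency range of plausible pitch periods.

    Returns (quefrency in seconds, peak amplitude). The f0 limits are assumed values.
    """
    lo = int(fs / f0_max)
    hi = int(fs / f0_min)
    idx = lo + int(np.argmax(cepstrum[lo:hi + 1]))
    return idx / fs, float(cepstrum[idx])

# Synthetic frame (hypothetical): pulses every 160 samples (100 Hz at 16 kHz),
# each pulse 0.6 times the amplitude of the previous one.
fs = 16000
period = 160
decay = 0.6
frame = np.zeros(1024)
for m, start in enumerate(range(0, len(frame), period)):
    frame[start] = decay ** m

quefrency, r1 = first_rahmonic(real_cepstrum(frame), fs)
```

For this toy signal the peak falls at a quefrency of 10 ms, the pulse period, with an amplitude close to `decay / 2`. On real connected speech the frame would normally be windowed first, and the peak amplitude drops as the voice becomes less periodic.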

12:40 Standard information from patients: the usefulness of self-evaluation measured with the French version of the VHI

Lise Crevier-Buchman (Department of Otolaryngology, Head & Neck Surgery, Hôpital Européen Georges Pompidou, Université Paris Descartes, Paris, France / Lab. Phonétique et Phonologie, UMR 7018 CNRS-Paris3/Sorbonne Nouvelle, Paris, France)
Stephanie Borel (Department of Otolaryngology, Head & Neck Surgery, Hôpital Européen Georges Pompidou, Université Paris Descartes, Paris, France / Lab. Phonétique et Phonologie, UMR 7018 CNRS-Paris3/Sorbonne Nouvelle, Paris, France)
Stephane Hans (Department of Otolaryngology, Head & Neck Surgery, Hôpital Européen Georges Pompidou, Université Paris Descartes, Paris, France)
Madeleine Menard (Department of Otolaryngology, Head & Neck Surgery, Hôpital Européen Georges Pompidou, Université Paris Descartes, Paris, France)
Jacqueline Vaissiere (Lab. Phonétique et Phonologie, UMR 7018 CNRS-Paris3/Sorbonne Nouvelle, Paris, France)

The Voice Handicap Index (VHI) is a scale designed to measure voice disability in daily life. Two groups of patients were evaluated. One group comprised patients with glottic carcinoma treated by cordectomy: type I & II (13 patients), type III (5 patients) and type V (5 patients). Evaluation was done pre- and postoperatively for 12 months. The other group comprised patients with unilateral vocal fold paralysis treated by thyroplasty (17 patients). Evaluation was done before and 3 months after surgery. Total VHI and the emotional and physical subscales improved significantly for type I & II cordectomy and for thyroplasty. The VHI can provide insight into a patient’s handicap.

13:00 Intelligibility Assessment in Children with Cleft Lip and Palate in Italian and German

Marcello Scipioni (Politecnico di Milano, Polo Regionale di Como, Italy)
Matteo Gerosa (FBK - Fondazione Bruno Kessler, Trento, Italy)
Diego Giuliani (FBK - Fondazione Bruno Kessler, Trento, Italy)
Elmar Nöth (Chair of Pattern Recognition, Friedrich-Alexander-University Erlangen-Nuremberg, Germany)
Andreas Maier (Chair of Pattern Recognition, Friedrich-Alexander-University Erlangen-Nuremberg, Germany)

Current research has shown that the speech intelligibility of children with cleft lip and palate (CLP) can be estimated automatically using speech recognition methods. On German CLP data, high and significant correlations between human ratings and the recognition accuracy of a speech recognition system have already been reported. In this paper we investigate whether the approach is also suitable for other languages. Therefore, we compare the correlations obtained on German data with the correlations on Italian data. A high and significant correlation (r=0.76; p < 0.01) was identified on the Italian data. These results do not differ significantly from the results on German data (p > 0.05).

13:20 Universidade de Aveiro’s Voice Evaluation Protocol

Luis M. T. Jesus (IEETA and ESSUA, Universidade de Aveiro, Portugal)
Anna Barney (ISVR, University of Southampton, UK)
Ricardo Santos (Hospital Privado da Trofa, Portugal)
Janine Caetano (Agrupamento de Escolas Serra da Gardunha, Fundão, Portugal)
Juliana Jorge (RAIZ, Esmoriz, Portugal)
Pedro Sá Couto (Departamento de Matemática da Universidade de Aveiro, Portugal)

This paper presents Universidade de Aveiro’s Voice Evaluation Protocol for European Portuguese (EP), and a preliminary inter-rater reliability study. Ten patients with vocal pathology were assessed by two Speech and Language Therapists (SLTs). Protocol parameters such as overall severity, roughness, breathiness, change of loudness (CAPE-V), grade, breathiness and strain (GRBAS), glottal attack, respiratory support, respiratory-phonatory-articulatory coordination, digital laryngeal manipulation, voice quality after manipulation, muscular tension and diagnosis presented high reliability and were highly correlated (good inter-rater agreement and high value of correlation). Values for the overall severity and grade were similar to those reported in the literature.