10th Annual Conference of the International Speech Communication Association

ISCA Interspeech 2009 Brighton

Technical Programme

This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.

Tue-Ses1-P1: Human Speech Production II

Time: Tuesday 10:00   Place: Hewison Hall   Type: Poster
Chair: Martin Cooke

#1 Simple Physical Models of the Vocal Tract for Education in Speech Science

Takayuki Arai (Sophia University)

In speech-related fields, physical models of the vocal tract are effective tools for teaching acoustics. Arai’s cylinder-type models are based on Chiba and Kajiyama’s measurements of vocal-tract shapes, and they demonstrate vowel production quickly and effectively. In this study, we developed physical models with simplified shapes as educational tools to illustrate how vocal-tract shape accounts for the differences among vowels. As a result, the five Japanese vowels were produced by tube-connected models, in which several uniform tubes with different cross-sectional areas and lengths are connected, as in Fant’s and Arai’s three-tube models.

#2 Auto-meshing Algorithm for Acoustic Analysis of Vocal Tract

Kyohei Hayashi (Future University Hakodate)
Nobuhiro Miki (Future University Hakodate)

We propose a new auto-meshing algorithm for acoustic analysis of the vocal tract using the Finite Element Method (FEM). In our algorithm, the three-dimensional vocal-tract domain is decomposed into two domains, a surface domain and an inner domain, so that the overlapping domain decomposition method can be employed. Surface blocks can be meshed with smooth surfaces using NURBS interpolation. We show example meshes for the vocal-tract shape of the Japanese vowel /a/, together with the results of a trial FEM simulation.

#3 Voice production model employing an interactive boundary-layer analysis of glottal flow

Tokihiko Kaburagi (Department of Acoustic Design, Faculty of Design, Kyushu University)
Katsunori Daimo (Graduate School of Design, Kyushu University)
Shogo Nakamura (School of Design, Kyushu University)

A voice production model has been studied that takes into account the essential aerodynamic and acoustic phenomena of phonation. Acoustic voice sources are produced by the volume flow through the glottis. A precise flow analysis is therefore performed based on the boundary-layer approximation and the viscous-inviscid interaction between the boundary layer and the core flow. This flow analysis supplies information on the separation point of the glottal flow and the thickness of the boundary layer, and thus yields an effective prediction of the flow behavior. When the flow analysis is combined with a mechanical model of the vocal folds, the acoustic wave generated by the glottal flow travels through the vocal tract, and a pressure change develops in the vicinity of the glottis. This change can affect the glottal flow and the motion of the folds, causing source-filter interaction. Preliminary simulations were conducted while varying the relationship between the fundamental and formant frequencies, and their results are reported.

#4 Characteristics of Two-Dimensional Finite Difference Techniques for Vocal Tract Analysis and Voice Synthesis

Matt Speed (Audio Lab, Department of Electronics, University of York)
Damian Murphy (Audio Lab, Department of Electronics, University of York)
David Howard (Audio Lab, Department of Electronics, University of York)

Both digital waveguide and finite difference techniques are numerical methods that have been shown to be appropriate for acoustic modelling applications. Whilst the application of the digital waveguide mesh to vocal tract modelling has been the subject of previous work, comparable finite difference techniques have yet to be tested in this application. This study explores the characteristics of such a finite difference approach to two-dimensional vocal tract modelling. Initial results suggest that finite difference techniques alone are not ideal, owing to their non-dynamic behaviour and their poor representation of admittance discontinuities when approximating three-dimensional geometries. They do, however, introduce robust boundary formulations, and they have a valid and useful application in modelling non-vital static volumes, particularly the nasal tract.
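For readers unfamiliar with the finite difference approach, a minimal sketch of the standard explicit (leapfrog) scheme for the 2D wave equation on a rectilinear grid follows. At the Courant limit λ = 1/√2 the interior update reduces to p_next = 0.5 × (sum of the four neighbours) - p_prev, the form known to be equivalent to the rectilinear digital waveguide mesh. The rectangular domain and the zero-pressure outer edge are simplifying assumptions for illustration; they are not the vocal-tract geometry or the boundary formulations studied in the paper, and the sketch says nothing about the admittance-discontinuity issue.

```python
def fdtd_2d(nx: int, ny: int, steps: int,
            src: tuple[int, int], probe: tuple[int, int]) -> list[float]:
    """Simulate the 2D wave equation with the standard explicit finite difference scheme.

    Uses Courant number lambda = 1/sqrt(2), for which the interior update is
        p_next[i][j] = 0.5 * (sum of the 4 neighbours in p_now) - p_prev[i][j].
    Edge cells are held at zero pressure (a simplifying assumption).
    Returns the pressure recorded at `probe` after each time step.
    """
    p_prev = [[0.0] * ny for _ in range(nx)]
    p_now = [[0.0] * ny for _ in range(nx)]
    p_now[src[0]][src[1]] = 1.0  # unit impulse excitation at the source cell
    output = []
    for _ in range(steps):
        p_next = [[0.0] * ny for _ in range(nx)]
        for i in range(1, nx - 1):
            for j in range(1, ny - 1):
                neighbours = p_now[i + 1][j] + p_now[i - 1][j] + p_now[i][j + 1] + p_now[i][j - 1]
                p_next[i][j] = 0.5 * neighbours - p_prev[i][j]
        p_prev, p_now = p_now, p_next  # advance one time step
        output.append(p_now[probe[0]][probe[1]])
    return output


# Impulse response of a 40 x 20 cell duct, excited near one end and observed near the other.
response = fdtd_2d(40, 20, 200, src=(5, 10), probe=(34, 10))
```

The scheme is explicit and stable only for λ ≤ 1/√2, so the grid spacing fixes the sample rate; this coupling is one practical difference from a pure digital waveguide formulation.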

#5 Adaptation of a predictive model of tongue shapes

Chao Qin (EECS, School of Engineering, University of California, Merced)
Miguel Carreira-Perpiñán (EECS, School of Engineering, University of California, Merced)

It is possible to recover the full midsagittal contour of the tongue with submillimetric accuracy from the location of just 3–4 landmarks on it. This involves fitting a predictive mapping from the landmarks to the contour using a training set consisting of contours extracted from ultrasound recordings. However, extracting sufficient contours is a slow and costly process. Here, we consider adapting a predictive mapping obtained for one condition (such as a given recording session, recording modality, speaker or speaking style) to a new condition, given only a few new contours and no correspondences. We propose an extremely fast method based on estimating a 2D-wise linear alignment mapping, and show that it recovers very accurate predictive models from about 10 new contours.

#6 Using sensor orientation information for computational head stabilisation in 3D Electromagnetic Articulography (EMA)

Christian Kroos (MARCS Auditory Laboratories, University of Western Sydney, Australia)

We propose a new, simple algorithm that makes use of the sensor orientation information in 3D Electromagnetic Articulography (EMA) for computational head stabilisation. The algorithm also provides a well-defined procedure for the case where only two sensors are available for head motion tracking, and it allows position coordinates and orientation angles to be combined for head stabilisation, with equal weighting of each kind of information. An evaluation showed that the method using the orientation angles produced the most reliable results.

#7 Collision Threshold Pressure Before and After Vocal Loading

Laura Enflo (Dept. of Speech, Music and Hearing, School of Computer Science & Communication, KTH, Sweden)
Johan Sundberg (Dept. of Speech, Music and Hearing, School of Computer Science & Communication, KTH, Sweden)
Friedemann Pabst (Hospital Dresden Friedrichstadt, Dresden, Germany)

The phonation threshold pressure (PTP) has been found to increase during vocal fatigue. In the present study we compare PTP and collision threshold pressure (CTP) before and after vocal loading in singer and non-singer voices. Seven subjects repeated the vowel sequence /a,e,i,o,u/ at an SPL of at least 80 dB at 0.3 m for 20 min. Before and after this loading, the subjects’ voices were recorded while they repeated the syllable /pa/ in a diminuendo. Oral pressure during the /p/ occlusion was used as a measure of subglottal pressure. Both CTP and PTP increased significantly after the vocal loading.
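Because loading levels are quoted at a specific microphone distance (here 0.3 m), comparing them with levels reported at other distances requires a distance correction. A minimal sketch using the inverse-square law, assuming free-field spreading from a point source, which a real recording room only approximates:

```python
import math


def spl_at_distance(spl_db: float, r_measured: float, r_target: float) -> float:
    """Convert an SPL measured at r_measured (m) to the level expected at r_target (m).

    Assumes free-field spherical spreading from a point source (inverse-square law),
    so the level changes by 20 * log10(r_measured / r_target) dB.
    """
    if r_measured <= 0 or r_target <= 0:
        raise ValueError("distances must be positive")
    return spl_db + 20.0 * math.log10(r_measured / r_target)


# Under these assumptions, 80 dB at 0.3 m corresponds to roughly 69.5 dB at 1 m.
print(round(spl_at_distance(80.0, 0.3, 1.0), 1))  # 69.5
```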

#8 Gender differences in the realization of vowel-initial glottalization

Elke Philburn (University of Manchester, Department of Linguistics and English Language)

The aim of the study was to investigate gender-dependent differences in the realization of German glottalized vowel onsets. Laryngographic data of semi-spontaneous speech were collected from four male and four female speakers of Standard German. Relative vocal fold contact duration was measured for glottalized vowel onsets as well as for non-glottalized controls. The results show that female subjects realized the glottalized vowel onsets with greater maximum vocal fold contact duration than male subjects, and that the glottalized vowel onsets produced by females were more clearly distinguished from the non-glottalized controls.
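Relative vocal fold contact duration is commonly derived from the laryngograph (Lx/EGG) signal as a contact quotient: the fraction of each glottal cycle during which the signal exceeds a criterion level placed between the cycle's minimum and maximum. A minimal sketch of such a criterion-level computation for one cycle; the abstract does not say which method or threshold was used, so the 25% default here is purely illustrative.

```python
def contact_quotient(cycle: list[float], criterion: float = 0.25) -> float:
    """Fraction of one glottal cycle with the Lx/EGG signal above a criterion level.

    The threshold sits `criterion` of the way from the cycle minimum to its maximum
    (0.25 is an illustrative default, not a standard). Assumes `cycle` holds equally
    spaced samples spanning exactly one period, with larger values meaning more contact.
    """
    if not cycle:
        raise ValueError("cycle must contain samples")
    lo, hi = min(cycle), max(cycle)
    threshold = lo + criterion * (hi - lo)
    above = sum(1 for x in cycle if x > threshold)
    return above / len(cycle)


# A stylised cycle with contact during 4 of 10 samples.
print(contact_quotient([0.0] * 6 + [1.0] * 4))  # 0.4
```

Cycle segmentation (for example from Lx peaks or zero crossings) is left out; in practice the choice of criterion level strongly affects absolute values, so comparisons are most meaningful within one method.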

#9 Stability and composition of functional synergies for speech movements in children and adults

Hayo Terband (Medical Psychology/Pediatric Neurology Centre/ENT, Radboud University Nijmegen Medical Centre, Nijmegen, the Netherlands)
Frits van Brenk (Department of Speech and Language Therapy, University of Strathclyde, Glasgow, UK)
Pascal van Lieshout (Department of Speech-Language Pathology, Oral Dynamics Lab; Department of Psychology; Institute of Biomaterials and Biomedical Engineering, University of Toronto, and Toronto Rehabilitation Institute, Toronto, Canada)
Lian Nijland (Medical Psychology/Pediatric Neurology Centre/ENT, Radboud University Nijmegen Medical Centre, Nijmegen, the Netherlands)
Ben Maassen (Medical Psychology/Pediatric Neurology Centre/ENT, Radboud University Nijmegen Medical Centre, Nijmegen, the Netherlands; Department of Neurolinguistics, University of Groningen, Groningen, the Netherlands)

The consistency and composition of functional synergies for speech movements were investigated in 7-year-old children and adults in a reiterated speech task using electromagnetic articulography (EMA). Results showed higher variability in children for tongue tip and jaw movement trajectories, but not for lower lip movement trajectories. Furthermore, the relative contribution of the lower lip to oral closure was smaller in children than in adults, whereas no such difference was found for the tongue tip. These results support and extend findings of non-linearity in speech motor development and illustrate the importance of a multi-measure approach in studying speech motor development.
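Trajectory variability across repetitions is often summarised with the spatiotemporal index (STI): each repetition is linearly time-normalised to a fixed number of points and amplitude-normalised (z-scored), and the across-repetition standard deviations at each relative time point are summed. The abstract does not state which variability measure was used, so this is an illustrative sketch of one common measure; the 50-point grid follows the usual STI convention.

```python
from statistics import mean, pstdev, stdev


def _resample(signal: list[float], n_points: int) -> list[float]:
    """Linearly interpolate `signal` onto `n_points` equally spaced time points."""
    if len(signal) < 2:
        raise ValueError("each trajectory needs at least two samples")
    out = []
    for k in range(n_points):
        pos = k * (len(signal) - 1) / (n_points - 1)
        i = min(int(pos), len(signal) - 2)
        frac = pos - i
        out.append(signal[i] * (1 - frac) + signal[i + 1] * frac)
    return out


def _zscore(signal: list[float]) -> list[float]:
    """Amplitude-normalise a trajectory to zero mean and unit standard deviation."""
    m, s = mean(signal), pstdev(signal)
    if s == 0:
        raise ValueError("trajectory has no movement")
    return [(x - m) / s for x in signal]


def spatiotemporal_index(trials: list[list[float]], n_points: int = 50) -> float:
    """Sum of across-trial standard deviations of time- and amplitude-normalised trajectories."""
    if len(trials) < 2:
        raise ValueError("need at least two repetitions")
    normed = [_zscore(_resample(t, n_points)) for t in trials]
    return sum(stdev(column) for column in zip(*normed))
```

Lower values indicate more stable movement patterning; because both time and amplitude are normalised, the index reflects differences in trajectory shape rather than in duration or size.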

#10 An analysis of speech rate strategies in aging

Frits van Brenk (Department of Speech and Language Therapy, University of Strathclyde, Glasgow, UK; Medical Psychology/Pediatric Neurology Centre/ENT, Radboud University Nijmegen Medical Centre, Nijmegen, the Netherlands)
Hayo Terband (Medical Psychology/Pediatric Neurology Centre/ENT, Radboud University Nijmegen Medical Centre, Nijmegen, the Netherlands)
Pascal van Lieshout (Department of Speech-Language Pathology, Oral Dynamics Lab; Department of Psychology; Institute of Biomaterials and Biomedical Engineering, University of Toronto, and Toronto Rehabilitation Institute, Toronto, Canada)
Anja Lowit (Department of Speech and Language Therapy, University of Strathclyde, Glasgow, UK)
Ben Maassen (Medical Psychology/Pediatric Neurology Centre/ENT, Radboud University Nijmegen Medical Centre, Nijmegen, the Netherlands; Department of Neurolinguistics, University of Groningen, Groningen, the Netherlands)

Effects of age and speech rate on movement cycle duration were assessed using electromagnetic articulography. In a repetitive task, syllables were articulated at eight rates, elicited by metronome and by self-pacing. Results indicate that increased speech rate is associated with greater stability of movement cycle duration, while decreased rate leads to less uniform cycle durations, supporting the view that alterations in speech rate are associated with different motor control strategies involving durational manipulations. The relative contribution of closing-movement duration increases as speech rate decreases, and this strategy is more dominant for elderly speakers.
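Cycle-duration stability can be illustrated with a simple computation: cycle durations taken as intervals between successive peaks of a kinematic trace, with stability summarised by the coefficient of variation (SD divided by mean). Both the peak-based segmentation and the coefficient of variation are assumptions for illustration; the abstract does not specify how cycles were delimited or how stability was quantified.

```python
from statistics import mean, stdev


def peak_times(signal: list[float], sample_rate: float) -> list[float]:
    """Times (s) of simple local maxima, used here as crude cycle landmarks."""
    return [i / sample_rate for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]


def cycle_duration_cv(landmark_times: list[float]) -> float:
    """Coefficient of variation (SD / mean) of successive cycle durations."""
    durations = [b - a for a, b in zip(landmark_times, landmark_times[1:])]
    if len(durations) < 2:
        raise ValueError("need at least three landmarks (two cycles)")
    return stdev(durations) / mean(durations)


# Landmarks with irregular spacing: durations 0.2, 0.3 and 0.1 s give a CV of about 0.5.
print(round(cycle_duration_cv([0.0, 0.2, 0.5, 0.6]), 3))  # 0.5
```

A lower coefficient of variation corresponds to the more uniform cycle durations the abstract reports at faster rates; real articulator traces would need smoothing before peak picking.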

#11 Variability and stability in collaborative dialogues: turn-taking and filled pauses

Štefan Beňuš (Constantine the Philosopher University, Nitra, Slovakia and Slovak Academy of Sciences, Bratislava, Slovakia)

Filled pauses have important and varied functions in turn-taking behavior, and a better understanding of this relationship opens new ways of improving the quality and naturalness of dialogue systems. We use a corpus of collaborative task-oriented dialogues to provide new insights into the relationship between filled pauses and turn-taking based on temporal and acoustic features. We then explore which of these patterns are stable and robust across speakers, which are prone to entrainment depending on the conversational partner, and which are variable and noisy. Our findings suggest that intensity is the least stable feature, followed by pitch-related features, whereas temporal features relating filled pauses to chunking and turn-taking are the most stable.

#12 Speaking in the presence of a competing talker

Youyi Lu (University of Sheffield)
Martin Cooke (Ikerbasque and University of the Basque Country)

How do speakers cope with a competing talker? This study investigated whether speakers are able to retime their contributions to take advantage of temporal fluctuations in the background, thereby reducing any adverse effects for an interlocutor. Speech was produced in quiet and in competing-talker, modulated-noise and stationary backgrounds, with and without a communicative task. An analysis of the timing of contributions relative to the background indicated a significantly reduced chance of overlap for the modulated-noise backgrounds relative to quiet, with competing speech resulting in the least overlap. Strong evidence for an active overlap-avoidance strategy is presented.
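The overlap analysis can be illustrated by representing the speaker's activity and the background as frame-level activity tracks and computing the proportion of the speaker's active frames that coincide with background activity, alongside the proportion expected if the speaker's timing ignored the background. A minimal sketch, assuming binary tracks at a common frame rate; the abstract does not say how activity was detected or how overlap was normalised.

```python
def overlap_proportion(speaker_active: list[bool], background_active: list[bool]) -> float:
    """Proportion of the speaker's active frames that coincide with background activity."""
    if len(speaker_active) != len(background_active):
        raise ValueError("tracks must have the same number of frames")
    speech_frames = sum(speaker_active)
    if speech_frames == 0:
        return 0.0
    overlapped = sum(s and b for s, b in zip(speaker_active, background_active))
    return overlapped / speech_frames


def chance_overlap(background_active: list[bool]) -> float:
    """Overlap expected if speaker timing were independent of the background."""
    if not background_active:
        raise ValueError("background track is empty")
    return sum(background_active) / len(background_active)


speaker = [True, True, False, False]
background = [True, False, True, False]
print(overlap_proportion(speaker, background), chance_overlap(background))  # 0.5 0.5
```

An observed proportion reliably below the chance value would be consistent with active overlap avoidance; a real analysis would also need a significance test across speakers and sessions.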