Brighton Pavilion

10th Annual Conference of the International Speech Communication Association

ISCA Interspeech 2009 Brighton

Technical Programme

This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.

Mon-Ses2-O2:
Production: Articulatory modelling

Time: Monday 13:30  Place: East Wing 1  Type: Oral
Chair: Rob Van Son

13:30 Feedforward Control of A 3D Physiological Articulatory Model for Vowel Production

Qiang Fang (Phonetics Lab., Institute of Linguistics, Chinese Academy of Social Sciences)
Akikazu Nishikido (IIPL, School of Information Science, Japan Advanced Institute of Science and Technology)
Jianwu Dang (IIPL, School of Information Science, Japan Advanced Institute of Science and Technology)
Aijun Li (Phonetics Lab., Institute of Linguistics, Chinese Academy of Social Sciences)

A 3D physiological articulatory model has been developed to account for the biomechanical properties of the speech organs in speech production. To use the model to investigate the mechanism of speech production, a feedforward control strategy is needed that generates proper muscle activations for the desired articulatory targets. In this paper, we develop a feedforward control module for the 3D physiological articulatory model. In the feedforward control process, an input articulatory target, specified by articulatory parameters, is transformed into an intrinsic representation of articulation; a muscle activation pattern is then estimated by a proposed mapping function. The results show that the proposed feedforward control strategy can control the 3D physiological articulatory model with high accuracy, both acoustically and articulatorily.

13:50 Articulatory Modeling Based on Semi-polar Coordinates and Guided PCA Technique

Jun Cai (Groupe Parole, LORIA-CNRS & INRIA, BP 239, 54600 Vandoeuvre-lès-Nancy, France)
Yves Laprie (Groupe Parole, LORIA-CNRS & INRIA, BP 239, 54600 Vandoeuvre-lès-Nancy, France)
Julie Busset (Groupe Parole, LORIA-CNRS & INRIA, BP 239, 54600 Vandoeuvre-lès-Nancy, France)
Fabrice Hirsch (Institut de Phonétique de Strasbourg, 2, rue Descartes, 67084 Strasbourg, France)

Research on 2-dimensional static articulatory modeling has been performed using the semi-polar coordinate system and guided PCA of lateral X-ray images of the vocal tract. The density of the grid lines in the semi-polar system has been increased to improve descriptive precision. New parameters have been introduced to describe the movements of the tongue apex. An extra feature, the tongue root, has been extracted as one of the elementary factors in order to improve the precision of the tongue model. New methods still remain to be developed for describing the movements of the tongue apex.

14:10 Sequencing of Articulatory Gestures using Cost Optimization

Juraj Simko (University College Dublin)
Fred Cummins (University College Dublin)

Within the framework of articulatory phonology (AP), gestures function as primitives, and their ordering in time is provided by a gestural score. Determining how they should be sequenced in time has been something of a challenge. We modify the task dynamic implementation of AP, by defining tasks to be the desired positions of physically embodied end effectors. This allows us to investigate the optimal sequencing of gestures based on a parametric cost function. Costs evaluated include precision of articulation, articulatory effort, and gesture duration. We find that a simple optimization using these costs results in stable gestural sequences that reproduce several known coarticulatory effects.

14:30 From experiments to articulatory motion - a three dimensional talking head model

Xiao Bo Lu (Bioengineering Institute, the University of Auckland, Auckland, New Zealand)
C. William Thorpe (Bioengineering Institute, the University of Auckland, Auckland, New Zealand)
Kylie Foster (Department of Food and Health, the University of Massey, Auckland, New Zealand)
Peter Hunter (Bioengineering Institute, the University of Auckland, Auckland, New Zealand)

The goal of this study is to develop a customised computer model that can accurately represent the motions of the vocal articulators during vowels and consonants. Models of the articulators were constructed as finite element (FE) meshes based on digitised high-resolution MRI (magnetic resonance imaging) scans obtained during rest breathing. Articulatory kinematics during speech were obtained by EMA (electromagnetic articulography) and video of the face. The movement information thus acquired was applied to the FE model to provide jaw motion, modeled as a rigid body, and tongue, cheek and lip movement, modeled with a free-form deformation technique. The motion of the epiglottis has also been considered in the model.

14:50 Towards Robust Glottal Source Modeling

Javier Pérez (TALP Research Center, Universitat Politècnica de Catalunya (UPC), Barcelona, Spain)
Antonio Bonafonte (TALP Research Center, Universitat Politècnica de Catalunya (UPC), Barcelona, Spain)

We present a new method for the simultaneous estimation of the derivative glottal waveform and the vocal tract filter. The algorithm is pitch-synchronous and uses overlapping frames of several glottal cycles to increase the robustness and quality of the estimation. Two parametric models for the glottal waveform are used: the KLGLOTT88 model during the convex optimization iteration, and the LF model for the final parametrization. To evaluate the performance, we use a synthetic corpus built from real data published in several studies. A second corpus, consisting of isolated vowels uttered with different voice qualities, has been specially recorded for this work. In terms of glottal waveform matching, the algorithm has been found to perform well with most of the voice qualities present in the synthetic dataset. The performance is also good with the real vowel dataset in terms of resynthesis quality.
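As a rough illustration of why KLGLOTT88 suits a convex optimization step, the sketch below evaluates the form in which the model is commonly written: a cubic polynomial for the glottal flow over the open phase, linear in its two coefficients `a` and `b`. The abstract does not give the formulation, so the parameter names (`t0` for the pitch period, `oq` for the open quotient, `av` for the amplitude of voicing) and the coefficient formulas are assumptions taken from the usual textbook presentation, and the spectral tilt filter of the full model is left out. This is not the authors' implementation.

```python
def klglott88_coefficients(t0, oq, av):
    """Polynomial coefficients (a, b) of the open-phase flow a*t**2 - b*t**3.

    Assumed textbook parametrization: t0 is the pitch period in seconds,
    oq the open quotient (0 < oq <= 1), av the amplitude of voicing.
    """
    a = 27.0 * av / (4.0 * oq ** 2 * t0)
    b = 27.0 * av / (4.0 * oq ** 3 * t0 ** 2)
    return a, b


def klglott88_flow(t, t0, oq, av):
    """Glottal flow at time t (seconds from glottal opening) within one period."""
    if t < 0.0 or t >= oq * t0:
        return 0.0  # glottis closed: no flow
    a, b = klglott88_coefficients(t0, oq, av)
    return a * t ** 2 - b * t ** 3


def klglott88_flow_derivative(t, t0, oq, av):
    """Time derivative of the glottal flow, the quantity estimated in the abstract."""
    if t < 0.0 or t >= oq * t0:
        return 0.0
    a, b = klglott88_coefficients(t0, oq, av)
    return 2.0 * a * t - 3.0 * b * t ** 2


if __name__ == "__main__":
    # Hypothetical values: 100 Hz pitch, open quotient 0.6, unit amplitude.
    t0, oq, av = 0.01, 0.6, 1.0
    for i in range(11):
        t = i * t0 / 10.0
        print(f"{t:.4f}  {klglott88_flow(t, t0, oq, av):.6f}")
```

Because the waveform depends linearly on `a` and `b`, fitting it to an observed frame reduces to a least-squares problem; a nonlinear model such as LF would not have this property.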

15:10 Sliding Vocal-tract Model and its Application for Vowel Production

Takayuki Arai (Sophia University)

In a previous study, Arai implemented a sliding vocal-tract model based on Fant’s three-tube model and demonstrated its usefulness for education in acoustics and speech science. The sliding vocal-tract model consists of a long outer cylinder and a short inner cylinder, which simulates tongue constriction in the vocal tract. This model can produce different vowels by sliding the inner cylinder and changing the degree of constriction. In this study, we investigated the model’s coverage of the vowel space and explored its application for vowel production in the speech and hearing sciences.
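The effect of sliding the inner cylinder can be sketched with a lossless three-tube approximation: a back cavity, a narrow constriction and a front cavity, closed at the glottis end and ideally open at the lips. The code below cascades the standard acoustic chain matrix of each tube and searches for the frequencies at which the lip-side volume velocity transfer has a pole. All dimensions, the function names and the speed-of-sound value are hypothetical choices for illustration; losses and lip radiation are ignored, so this is not a reproduction of the physical model described in the abstract.

```python
import math

SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, moist air


def sliding_sections(total_len, slider_len, slider_pos, outer_area, gap_area):
    """Return (length, area) tube sections from the closed end to the open end.

    slider_pos is the distance from the closed end to the start of the inner
    cylinder. Lengths are in cm and areas in cm^2 (hypothetical dimensions).
    """
    front_len = total_len - slider_pos - slider_len
    if slider_pos < 0 or front_len < 0:
        raise ValueError("slider must lie inside the outer cylinder")
    sections = [(slider_pos, outer_area), (slider_len, gap_area), (front_len, outer_area)]
    return [(length, area) for length, area in sections if length > 0]


def chain_d(freq_hz, sections):
    """Element D of the cascaded chain matrix [[A, jB], [jC, D]].

    With zero pressure at the open end, glottal flow = D * lip flow,
    so resonances are the zeros of D. The factor rho*c cancels out.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND_CM_S
    a, b, c, d = 1.0, 0.0, 0.0, 1.0
    for length, area in sections:
        cos_kl, sin_kl = math.cos(k * length), math.sin(k * length)
        z = 1.0 / area  # characteristic impedance up to the constant rho*c
        a2, b2, c2, d2 = cos_kl, z * sin_kl, sin_kl / z, cos_kl
        a, b, c, d = (a * a2 - b * c2, a * b2 + b * d2,
                      c * a2 + d * c2, d * d2 - c * b2)
    return d


def find_resonances(sections, f_max=5000.0, step=5.0):
    """Locate zeros of chain_d below f_max by a sign-change scan plus bisection."""
    resonances = []
    f_lo, v_lo = step, chain_d(step, sections)
    while f_lo < f_max:
        f_hi = f_lo + step
        v_hi = chain_d(f_hi, sections)
        if v_lo != 0.0 and v_lo * v_hi <= 0.0:
            lo, hi, v_at_lo = f_lo, f_hi, v_lo
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                v_mid = chain_d(mid, sections)
                if v_at_lo * v_mid <= 0.0:
                    hi = mid
                else:
                    lo, v_at_lo = mid, v_mid
            resonances.append(0.5 * (lo + hi))
        f_lo, v_lo = f_hi, v_hi
    return resonances


if __name__ == "__main__":
    # Slide a 3 cm constriction along a 17.5 cm tube and list the first three resonances.
    for pos in (2.0, 6.0, 10.0):
        secs = sliding_sections(17.5, 3.0, pos, 8.0, 0.5)
        print(pos, [round(f) for f in find_resonances(secs)[:3]])
```

When the gap area equals the outer area, the sections form a uniform tube and the search recovers the quarter-wavelength resonances, which is a convenient sanity check before trying different slider positions.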