Brighton Pavilion

10th Annual Conference of the International Speech Communication Association

ISCA Interspeech 2009 Brighton

Technical Programme

This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.

Tue-Ses3-S2:
Special Session: Measuring the Rhythm of Speech

Time: Tuesday 16:00   Place: East Wing 4   Type: Special
Chair: Daniel Hirst & Greg Kochanski

#0 Investigating Changes in the Rhythm of Maori over Time

Margaret Maclagan (University of Canterbury, New Zealand)
Catherine Watson (University of Auckland, New Zealand)
Jeanette King (University of Canterbury, New Zealand)
Ray Harlow (University of Waikato, New Zealand)
Laura Thompson (University of Auckland, New Zealand)
Peter Keegan (University of Auckland, New Zealand)

Present-day Maori elders comment that the mita (which includes rhythm) of the Maori language has changed over time. This paper presents the first results in a study of the change of Maori rhythm. PVI analyses did not capture this change. Perceptual experiments, using extracts of speech low-pass filtered to 400 Hz, demonstrated that Maori and English speech could be distinguished. Listeners who spoke Maori were more accurate than those who spoke only English. The English and Maori speech of groups of different speakers born at different times was perceived differently, indicating that the rhythm of Maori has indeed changed over time.

#0 The Dynamic Dimension of the Global Speech-Rhythm Attributes

Jan Volín (Institute of Phonetics, Charles University in Prague)
Petr Pollák (Faculty of Electrical Engineering, Czech Technical University in Prague)

Recent years have revealed that certain global attributes of speech rhythm can be quite successfully captured with respect to consonantal and vocalic intervals in spoken texts. One of the many problems of this approach lies in complex syllabic structures. Unless we make an a priori phonological decision, sonorous consonants may contribute to either the vocalic or the consonantal part of the speech signal in post-initial and pre-final positions of syllabic onsets and codas. A procedure is offered to avoid phonological dilemmas together with tedious manual work. The method is tested on continuous Czech and English texts read out by several professionals.

#0 Vowel duration in pre-geminate contexts in Polish

Zofia Malisz (Adam Mickiewicz University, Poznan)

The study presents Polish experimental data on the variability of vowel duration in the context of following singleton and geminate consonants. The aim of the study is to explain the low vocalic variability values obtained from "rhythm metrics" based analyses of speech rhythm. It also aims at contributing to the discussion about current dynamical models of speech rhythm that contain assumptions of the relative temporal stability of the vowel-to-vowel sequence. The results suggest that vowels in Polish co-vary with following consonant length in a roughly proportionate manner. An interpretation of the effect is offered where a fortition process overrides the possibility of temporal compensation.

Index Terms: gemination, vowel duration, speech rhythm, Polish

#0 Effects of Mora-timing in English Rhythm Control by Japanese Learners

Shizuka Nakamura (Graduate School of Global Information and Telecommunication Studies, Waseda University, Japan)
Hiroaki Kato (National Institute of Information and Communications Technology / Advanced Telecommunications Research Institute International, Japan)
Yoshinori Sagisaka (Graduate School of Global Information and Telecommunication Studies, Waseda University, Japan)

In our previous studies on an objective evaluation of English rhythm control by Japanese learners, we noticed that the mora-timing to which Japanese learners are accustomed might unfavorably affect their production of stress-timed English speech. In this paper, we analyzed durational differences between Japanese learners and native speakers in corresponding speech units, such as stressed/unstressed syllables, strong/weak vowels, syllables in content/function words, and closed/open syllables, from the perspective of the contrast between stressed and unstressed syllables. Correlation analyses of these differences and subjective evaluation scores confirmed that the durational differences caused by mora-timing strongly affected subjective evaluation by native teachers.

16:00 The rhythm of text and the rhythm of utterances: from metrics to models.

Daniel Hirst (CNRS, Aix-Marseille Université, Aix-en-Provence, France)

The typological classification of languages as stress-timed, syllable-timed and mora-timed did not stand up to empirical investigation, which found little or no evidence for the different types of isochrony which had been assumed to be the basis for the classification. In recent years, there has been a renewal of interest with the development of empirical metrics for measuring rhythm. In this paper it is shown that some of these metrics are more sensitive to the rhythm of the text than to the rhythm of the utterance itself. While a number of recent proposals have been made for improving these metrics, it is proposed that what is needed is more detailed studies of large corpora in order to develop more sophisticated models of the way in which prosodic structure is realised in different languages. New data on British English are presented using the Aix-Marsec corpus.

16:20 No Time to Lose? Time Shrinking Effects Enhance the Impression of Rhythmic "Isochrony" and Fast Speech Rate

Petra Wagner (Universität Bielefeld)
Andreas Windmann (Universität Bielefeld)

Time Shrinking denotes the psycho-acoustic shrinking effect of a short interval on one or several subsequent longer intervals. Its effectiveness in the domain of speech perception has so far not been examined. Two perception experiments clearly suggest the influence of relative duration patterns triggering time shrinking on the perception of tempo and rhythmical isochrony or rather "evenness". A comparison between the experimental data and duration patterns across various languages suggests a strong influence of time shrinking on the impression of isochrony in speech and perceptual speech rate. Our results thus emphasize the necessity of taking into account relative timing within rhythmical domains such as feet, phrases or narrow rhythm units as a complementary perspective to popular global rhythm variability metrics.

16:40 Measuring speech rhythm variation in a model-based framework

Plínio Barbosa (Speech Prosody Studies Group/Dep. of Linguistics/Inst.Est. Ling., Univ. of Campinas, Brazil)

A coupled-oscillators-model-based method for measuring speech rhythm is presented. This model explains cross-linguistic differences in rhythm as deriving from varying degrees of coupling strength between a syllable oscillator and a phrase stress oscillator. The method was applied to three texts read aloud in French, in Brazilian and European Portuguese by seven speakers. The results reproduce the early findings on rhythm typology for these languages/varieties with the following advantages: it successfully accounts for speech rate variation, related to the syllabic oscillator frequency in the model; it takes only syllable-sized units into account, not splitting syllables into vowels and consonants; the consequences of phrase stress magnitude on stress group duration are directly considered; both universal and language-specific aspects of speech rhythm are captured by the model.

17:00 Rhythm measures with language-independent segmentation

Anastassia Loukina (Phonetics laboratory, University of Oxford, United Kingdom)
Greg Kochanski (Phonetics laboratory, University of Oxford, United Kingdom)
Chilin Shih (EALC/Linguistics, University of Illinois, Urbana-Champaign, USA)
Elinor Keane (Phonetics laboratory, University of Oxford, United Kingdom)
Ian Watson (Phonetics laboratory, University of Oxford, United Kingdom)

We compare 15 measures of speech rhythm based on an automatic segmentation of speech into vowel-like and consonant-like regions. This allows us to apply identical segmentation criteria to all languages and compute rhythm measures over a large corpus. It may also approximate more closely the segmentation available to pre-lexical infants, who have been claimed to discriminate between languages. We find that within-language variation is large and comparable to the language-to-language differences we observed. We evaluate the success of different measures in separating languages and show that the efficiency of measures depends on the languages included in the corpus. Rhythm appears to be described by two dimensions and different published rhythm measures capture different aspects of it.
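
For readers unfamiliar with interval-based rhythm measures, the sketch below shows how three widely published ones can be computed once speech has been segmented into vowel-like and consonant-like regions. It is an illustration only: the abstract does not say which 15 measures were compared, and the choice of %V, ΔC and nPVI, the function names, and the `(label, duration)` input format are all assumptions made here, not the authors' method.

```python
from statistics import pstdev

# Assumed input format (hypothetical): a list of (label, duration) pairs,
# where label is "V" for a vowel-like region and "C" for a consonant-like one.

def percent_v(intervals):
    """%V: share of total duration taken up by vowel-like regions, in percent."""
    total = sum(d for _, d in intervals)
    vocalic = sum(d for label, d in intervals if label == "V")
    return 100.0 * vocalic / total

def delta_c(intervals):
    """Delta-C: standard deviation of consonant-like interval durations."""
    consonantal = [d for label, d in intervals if label == "C"]
    return pstdev(consonantal)

def npvi(durations):
    """Normalised Pairwise Variability Index over successive durations.

    Each pairwise difference is divided by the mean of the pair, which
    reduces the influence of overall speech rate. Needs at least two values.
    """
    terms = [abs(a - b) / ((a + b) / 2.0)
             for a, b in zip(durations, durations[1:])]
    return 100.0 * sum(terms) / len(terms)

# Toy data in milliseconds, invented for illustration.
segments = [("C", 80), ("V", 120), ("C", 80), ("V", 120)]
vowels = [d for label, d in segments if label == "V"]
print(percent_v(segments), delta_c(segments), npvi(vowels))
```

Because identical functions are applied to every language's segmentation, any difference in the resulting values comes from the durations themselves rather than from language-specific labelling decisions.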