
10th Annual Conference of the International Speech Communication Association

ISCA Interspeech 2009 Brighton

Technical Programme

This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.

Wed-Ses1-S1:
Special Session: Lessons and Challenges Deploying Voice Search

Time: Wednesday 10:00 Place: East Wing 4 Type: Special
Chair: Mike Cohen & Mike Phillips

10:00 Role of Natural Language Understanding in Voice Local Search

Junlan Feng (AT&T Labs Research)
Srinivas Bangalore (AT&T Labs Research)
Mazin Gilbert (AT&T Labs Research)

Speak4it is a voice-enabled local search system currently available for iPhone devices. The natural language understanding (NLU) component is one of the key technology modules in this system. The role of NLU in voice-enabled local search is twofold: (a) parse the automatic speech recognition (ASR) output (1-best and word lattices) into meaningful segments that contribute to high-precision local search, and (b) understand the user’s intent. This paper is concerned with the first task of NLU. In previous work, we presented a scalable approach to parsing that is built upon a text indexing and search framework and can also parse ASR lattices. In this paper, we propose an algorithm to improve the baseline by extracting the “subjects” of the query. Experimental results indicate that lattice-based query parsing outperforms ASR 1-best based parsing by 2.1% absolute, and that extracting subjects in the query improves the robustness of search.

10:20 Recognition and Correction of Voice Web Search Queries

Keith Vertanen (University of Cambridge)
Per Ola Kristensson (University of Cambridge)

In this work we investigate how to recognize and correct voice web search queries. We describe our corpus of web search queries and show how it was used to improve the accuracy of recognition. We show that using a search-specific vocabulary with automatically generated pronunciations is superior to using a vocabulary limited to a fixed pronunciation dictionary. We conducted a formative user study to investigate recognition and correction aspects of voice search in a mobile context. In the user study, we found that despite a word error rate of 48%, users were able to speak and correct search queries in about 18 seconds. Users did this while walking around using a mobile touch-screen device.

10:40 Voice Search and Everything Else – What Users Are Saying to the Vlingo Top Level Voice UI

Chao Wang (Vlingo)

No abstract available.

11:00 Searching Google by Voice

Johan Schalkwyk (Google)

No abstract available.

11:20 Multiple-hypotheses searches from deeply parsed requests to multiple-evidences scoring: the DeepQA challenge

Roberto Sicconi (IBM)

No abstract available.

11:40 Research Areas in Voice Search: Lessons from Microsoft Deployments

Geoffrey Zweig (Microsoft)

No abstract available.