Friday, February 4, 2011

Crowdsourcing for speech technology

Title: Crowdsourcing for speech technology
Speaker: Gabriel Parent

When: February 9th (12:00 pm to 1:00 pm) in GHC6501

Abstract: The growing need for speech data over the last few decades has led to new guidelines for speech transcription. However, even with these "quick transcription" guidelines, transcribing a large quantity of speech using traditional methods is very costly and slow. The use of crowdsourcing has considerably changed the way speech data is acquired and processed. In this talk, I will give an overview of the research on using crowdsourcing for speech labeling, speech acquisition, and spoken dialog system assessment. I will then present both the design principles we used and the results we obtained using Mechanical Turk to transcribe over 250,000 speech utterances. Important issues such as quality control and throughput will also be addressed.