Crowdsourcing for Machine Translation

Speaker: Vamshi Ambati
Location: GHC 4405
Date: September 12, 2010: Noon

Slides: [pdf]

Machine translation for low-resource languages has not received wide attention, primarily because of the lack of existing parallel corpora and of access to language experts who can create such resources. In this talk, I will share our experience using crowdsourcing platforms such as Mechanical Turk to reach bilingual speakers on the web. I will also discuss how to design the task for effective elicitation from the crowd.

When working with crowd data, the objectives are twofold: maximizing the quality of data from multiple non-experts, and minimizing the cost of annotation by pruning noisy annotators. I will discuss our recent machine translation experiments on selecting high-quality crowd translations by explicitly modeling annotator reliability based on agreement with other submissions. I will also present some preliminary results on cost minimization and report on how well these techniques adapt to machine translation and how feasible they are there.
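
To make the idea of agreement-based reliability concrete, here is a minimal sketch in Python. It is not the method presented in the talk. It assumes a simple token-overlap (Jaccard) measure as the agreement score, and the function names (token_overlap, annotator_reliability, select_translations) and the input layout are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two translations.

    Assumption: a crude stand-in for whatever agreement measure is really used.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def annotator_reliability(submissions):
    """Average agreement of each annotator with the others on shared sentences.

    submissions: {sentence_id: {annotator_id: translation}}  (hypothetical layout)
    Annotators who never share a sentence with anyone get no score.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for translations in submissions.values():
        for annotator, text in translations.items():
            for other, other_text in translations.items():
                if other == annotator:
                    continue
                totals[annotator] += token_overlap(text, other_text)
                counts[annotator] += 1
    return {a: totals[a] / counts[a] for a in counts}


def select_translations(submissions, reliability):
    """Per sentence, keep the translation with the highest
    reliability-weighted agreement with the other submissions."""
    selected = {}
    for sentence_id, translations in submissions.items():
        def score(annotator):
            text = translations[annotator]
            return sum(
                reliability.get(other, 0.0) * token_overlap(text, other_text)
                for other, other_text in translations.items()
                if other != annotator
            )
        best = max(translations, key=score)
        selected[sentence_id] = translations[best]
    return selected
```

Under these assumptions, an annotator whose submissions rarely overlap with anyone else's receives a low reliability score, and a threshold on that score would be one simple way to prune noisy annotators and reduce annotation cost.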