Discriminative Phrase-Based Statistical Machine Translation

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Informatics

Abstract

Statistical Machine Translation (SMT) has made great improvements over the last decade. A striking property of these systems is that they make minimal use of linguistic knowledge about translation. All knowledge about how to translate sentences is gathered in a data-driven manner from parallel corpora (sentences paired with their translations).

Alongside this observation, we can project that the volume of parallel corpora available for training will not increase at a substantial rate. This suggests that further progress in SMT will come from better modelling of the data we already have, which means bringing linguistics to the translation problem.

For some languages, linguistic constraints are easily obtained. For other languages, this information is less widely available. We intend to see whether an improvement in translation can be obtained even when using impoverished knowledge sources.

To carry out this integration successfully, we need a flexible framework. We shall extend an existing approach (which yields state-of-the-art results) using techniques from discriminative machine learning (maximum entropy). These techniques will not only allow us to integrate linguistics easily into the translation process, but should also allow us to improve upon the state of the art simply through better modelling. Better modelling brings serious scaling problems, which we have experience in tackling.

The language pairs we shall investigate will include German-English, Arabic-English and Chinese-English.

Finally, we shall compete in international Machine Translation evaluation exercises. This will involve automatic and manual evaluation of our translation quality, and will allow comparison of our approaches with those of other groups.
