Project Details
Core Technologies for Statistical Machine Translation
Applicant
Professor Dr.-Ing. Hermann Ney
Subject Area
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Term
from 2017 to 2020
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 327471424
This proposal is an updated and revised version of our proposal submitted to the DFG in 2015.

Much progress has been made in statistical machine translation (SMT). Nevertheless, the existing statistical methods do not yet capture all the relevant interdependencies between the words of the source and target languages. This RWTH Aachen proposal focuses on three problems in order to improve the state of the art in SMT of written and spoken language:

1) Artificial neural networks (NN): We will extend the existing NN approaches to better model the dependencies between the source sentence and the target sentence; special attention will be given to recurrent neural networks and to the word re-ordering problem. Word re-ordering is a serious problem for German because its word order tends to differ substantially from that of other European languages.

2) Extended translation models and improved, consistent training: The existing phrase-based approaches lack a sound statistical basis; in particular, there is no consistent procedure for training the phrases in the phrase-based approach.

3) Interface for spoken language input: In addition to the machine translation of text, this project will also consider speech translation. The output of the ASR (automatic speech recognition) engine is the input to the SMT engine, and we will improve the interface between ASR and SMT by methods such as punctuation prediction, enriched word hypothesis lattices, and joint optimization of the ASR-SMT pipeline.
DFG Programme
Research Grants