Project

ArisToCAT - Assessing The Comprehensibility of Automatic Translations

Acronym

G006417N

Code

3G006417W

Duration

01 January 2017 → 31 December 2020

Funding

Research Foundation - Flanders (FWO)

Promotor

Lieve Macken

Research disciplines

Natural sciences
- Natural language processing
Humanities and the arts
- Translation studies
- Interpreting studies

Keywords

automatic translations, Language technology, Translation Studies

Project description

Machine translation (MT) systems cannot guarantee that the text they produce will be fluent and coherent in both syntax and semantics. Erroneous words and syntax occur frequently in machine-translated text, leaving the reader to guess parts of the intended message. This project (i) analyzes eye movement data to investigate to what extent the lack of predictability in MT-generated texts impairs comprehension, and (ii) tries to automatically estimate the comprehensibility of machine-translated text. To tackle the first research objective, we will collect and analyze eye movements of participants reading Dutch machine-translated text. In a first experiment, we investigate the impact of different categories of MT errors (syntactic versus semantic, function words versus content words, short-distance versus long-distance triggers of errors) on comprehension. In a second experiment, the participants read six short machine-translated texts of approximately 300-400 words for comprehension. To tackle the second research objective, an MT comprehensibility estimation system for Dutch will be built. The system takes as input a machine-translated sentence and tries to detect the MT errors that seriously hamper comprehension. We start off with a basic system incorporating baseline features such as sentence length and word frequency, and gradually add features derived from language models of increasing complexity, namely n-gram, dependency and neural language models.
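
To make the baseline features mentioned above more concrete, the following is a minimal illustrative sketch, not the project's actual system. It computes sentence length, a smoothed mean log word frequency, and the mean surprisal under a bigram language model. The function names, the add-one smoothing, the `<s>` start symbol and the toy Dutch corpus are all assumptions made for illustration only.

```python
import math
from collections import Counter


def train_counts(corpus):
    """Collect unigram and bigram counts from tokenized reference sentences.

    `corpus` is a list of token lists (hypothetical input format).
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in corpus:
        padded = ["<s>"] + tokens  # assumed sentence-start symbol
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def baseline_features(tokens, unigrams, bigrams):
    """Return sentence length, mean log frequency and mean bigram surprisal."""
    if not tokens:
        raise ValueError("cannot compute features for an empty sentence")
    vocab_size = len(unigrams)
    total = sum(unigrams.values())
    # Add-one smoothed log relative frequency of each word (assumption).
    mean_log_freq = sum(
        math.log((unigrams[t] + 1) / (total + vocab_size)) for t in tokens
    ) / len(tokens)
    # Surprisal in bits of each word given its predecessor, add-one smoothed.
    padded = ["<s>"] + tokens
    surprisals = [
        -math.log2((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
        for prev, word in zip(padded, padded[1:])
    ]
    return {
        "length": len(tokens),
        "mean_log_freq": mean_log_freq,
        "mean_bigram_surprisal": sum(surprisals) / len(surprisals),
    }


# Toy usage: the corpus and test sentences are invented examples.
toy_corpus = [["de", "kat", "slaapt"], ["de", "hond", "slaapt"]]
uni, bi = train_counts(toy_corpus)
print(baseline_features(["de", "kat", "slaapt"], uni, bi))
print(baseline_features(["slaapt", "de", "de"], uni, bi))
```

In this sketch, a word order that is unlikely under the reference corpus yields a higher mean surprisal, which is the kind of predictability signal that richer (dependency or neural) language models would be expected to capture more accurately.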