A brilliant academic future for machine translation

This article was originally posted on the TAUS Blog.

On April 30 and May 1, the European Association for Machine Translation (EAMT) sponsored a workshop, held over two half-days, on training translator trainers to teach machine translation. The objective was to give trainers the understanding, confidence and skills they need to teach machine translation to translators. The workshop was organized by the SALIS department at Dublin City University (DCU), and the presenters were Prof. Andy Way, Prof. Dorothy Kenny, Dr. Sharon O’Brien, KantanMT’s founder and chief architect Tony O’Dowd, and Markus Foti from the European Commission’s DGT machine translation team.

The latest on machine translation and a brief lesson in extraterrestrial languages

Prof. Way opened the workshop with a presentation on the latest developments in machine translation and discussed the technological aspects that translator trainers should understand and pass on in their own classrooms. The focus was on the different approaches to machine translation from a historical perspective, with an emphasis on the currently dominant statistical paradigm. The geek needle went all the way into the red when Prof. Way presented the Centauri-into-Arcturan translation exercise devised by Kevin Knight in 1997. If you do not know this exercise, I recommend reading Knight’s paper on it, which has a nice little plot twist at the end.

Teaching statistical machine translation to translators

Prof. Kenny explained how machine translation is taught at DCU: the curriculum, its content, the time allocated and the various assignments. The DCU curriculum rests on the premise that, in the machine translation field, a translator’s role should not be limited to post-editing but should also span other skills, from terminology management to engine training and quality evaluation. In short, we should see future translators as a kind of language engineer.

The topics generally covered in the DCU Master’s program include rule-based machine translation (system architecture, linguistic issues), statistical machine translation (probabilistic language processing, n-grams, language models, translation models, phrase tables, the noisy channel model, the log-linear model, domain specificity), quality evaluation (human and automatic evaluation, BLEU, WER, TER, etc.) and post-editing. In lab sessions, students are guided through building a statistical machine translation engine: from selecting source texts and evaluation metrics, through optimizing the source texts and translation processes and improving the engine, to evaluating the raw output.
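To give a flavour of one of the automatic metrics in that list, here is a minimal Python sketch of word error rate (WER): the word-level edit distance (substitutions, deletions and insertions) between a reference translation and the MT output, divided by the number of words in the reference. It assumes naive whitespace tokenization and is only a teaching illustration; it is not the implementation used at DCU or in any particular evaluation toolkit, and the function name `word_error_rate` is my own.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()   # naive whitespace tokenization (assumption)
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j]: minimum number of edits turning the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                      # deletion
                dist[i][j - 1] + 1,                      # insertion
                dist[i - 1][j - 1] + substitution_cost,  # match or substitution
            )

    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    # one substitution (sat -> sit) and one deletion (the): 2 edits / 6 words
    print(f"WER = {word_error_rate(reference, hypothesis):.3f}")  # WER = 0.333
```

A score of 0.0 means the output matches the reference word for word; because insertions count too, WER can exceed 1.0 for very long, wrong output. TER builds on the same idea but also allows shifts of whole word blocks as a single edit.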

It all sounds very interesting and infused with the right level of ambition, but the attendees I spoke to during the break wondered whether 10 hours of lectures and 10 hours of lab work are enough to produce translators with real machine translation expertise, translators who can genuinely help software engineers and computational linguists fine-tune the engines. Ambition is good, but it needs to go hand in hand with realistic goals, especially when, as Prof. Kenny herself admitted, most students still lacked basic skills in handling file formats.

Teaching post-editing and machine translation evaluation

Dr. O’Brien’s session was engaging, too. Revisiting her 2002 paper on post-editing, she asked participants fundamental questions about when and how post-editing and MT evaluation skills should be taught: Should post-editing be taught only to highly skilled translators, or is it suitable for beginners as well? What skills make a good post-editor (deep knowledge of both the source and target language, terminology skills, knowledge of machine translation, etc.)? There was also a discussion of the tools that can help students explore the benefits and challenges of post-editing and machine translation evaluation, such as Google Translate and the Asiya evaluation toolkit.

The only aspect of this session that raised doubts was that the DCU curriculum presents post-editing as just another form of revision, whereas practice shows that it differs deeply from traditional revision and requires very specific skills.

Promotional presentation

The second day featured an interesting presentation. Tony O’Dowd from KantanMT (the company that makes its platform available to DCU students) gave a fast-paced, very engaging hands-on demonstration of how to build a machine translation engine and translate documents with the platform. Tony O’Dowd was very generous with his knowledge, but the presentation was very much a sales talk. “KantanMT makes Machine Translation easy”: apparently, to develop a machine translation engine all you need to do is 1) gather training data; 2) upload it and press the Build button; and 3) translate client files. Indeed, during the demo we saw that it all looks very simple. The user interface is almost self-explanatory and, in general, the platform offers plenty of sophisticated options. However, bearing in mind that the data and files we used had been gathered, cleaned and pre-processed by KantanMT, I have to wonder whether this kind of practical exercise is rather simplistic and somewhat trivializes the complexity of machine translation in a professional setting.

A European glitch

Markus Foti from the DGT’s MT team offered a peek at what happens behind the apparently simple UI of the EU machine translation engines. The MT@EC platform is available as an online service to the staff of the European institutions and bodies, to public administrations in the EU Member States for pilot projects, and to EMT (European Master in Translation) universities within the framework of collaboration projects. Release 2.0 of the system was launched in 2014, and by the end of that year it already contained 725 million monolingual segments, growing by around 2.6 million per month.

Through its user interface, available in 24 languages, the system can translate texts in various file formats, including TMX and XLIFF. I admit I was puzzled when Mr. Foti asked whether we thought XLIFF would really become a popular format in the near future.

There was also an unfortunate glitch in the demonstration: when participants requested the translation of a French document into English, many of them received an empty Word file.

The role of TAUS

Machine translation and all the related skills are slowly but surely making their way into university curricula around the world, and the DCU curriculum is a brilliant beginning. Here, TAUS’s legacy as a think tank and resource center can prove extremely useful to academic institutions that want to introduce machine translation courses for translation students.

I was surprised that, although most of the attendees (PhD students, lecturers and researchers) knew TAUS as a center that promotes translation automation (and, more specifically, machine translation), no one was aware of our free academic membership program, our Talents Directory or our Post-editors Directory. Only a couple of the workshop’s participants had taken our online post-editing course, and, again, no one knew that this course contains practical exercises in 20 languages.

Clearly, there is a lot of work ahead for us. We invite everyone involved in translator training to contact us and discuss possible cooperation.
