This article was first published in Multilingual, December 2015.
Post-editing of machine translation (MT) is as old as machine translation itself and, just like machine translation, it has generated controversy since its inception. Many professional translators refuse to post-edit MT output because of the low quality and inherent shortcomings of MT engines, while many language service providers (LSPs) cannot figure out the best practices for using and offering MT and post-editing services. A general misunderstanding of what post-editing really is, and its frequent conflation with revision, can be considered the common denominator for the problems of both groups.
Academic seal of approval
MT and its related skills are slowly but surely making their way into university curricula all over the world. One example is the translation curriculum at Dublin City University (DCU). On April 30 and May 1, 2015, the School for Applied Language and Intercultural Studies (SALIS) at DCU hosted the “MT Train the Trainer” workshop, sponsored by the European Association for Machine Translation (EAMT). The workshop’s objective was to provide trainers with the understanding, confidence and skills necessary to teach MT to professional and would-be translators.
At the beginning of his presentation, one of the speakers, Prof. Andy Way, told the participants (among them many lecturers from European universities belonging to the EMT network) that one of his main tasks – and one of the main goals of his course on statistical machine translation (SMT) – is debunking myths and fighting prejudices about machine translation. In fact, misunderstandings are still widespread and deeply rooted in the academic world as well as in the industry.
Key points of discussion during the workshop were the skills of a good post-editor, MT evaluation, and the post-editing skills to be included in a training program. In this respect, the fundamental question was – and still is – whether post-editing should be taught only to highly skilled translators or to beginners as well.
These questions remain largely unanswered, as emerged from the DCU curriculum that Prof. Dorothy Kenny and Dr. Sharon O’Brien illustrated. The master’s program in translation technology at DCU runs over an eight-month academic year, divided into two semesters. MT and post-editing is one of five compulsory modules; of the four remaining modules, two are centered on translation practice and the profession, one on terminology and one on translation theory. The curriculum also comprises optional modules such as localization and corpus linguistics.
The DCU program focuses on general background knowledge of various MT-related topics, from the difference between rule-based machine translation and SMT to quality evaluation and post-editing. In lab sessions, students are guided through the building of an SMT engine, starting with the selection of source texts and evaluation metrics, moving through the optimization of source texts and translation processes and the improvement of the engine, and ending with the evaluation of the raw output. In brief, the DCU curriculum offers a good beginning and the promise of a bright academic future for MT and post-editing. However, one question comes to mind: does the profile of future post-editors really need to look so complicated?
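The article does not say which metrics the DCU lab sessions rely on, but BLEU is a common choice for scoring raw MT output against a human reference. As a minimal sketch (using Python’s NLTK library, with invented sample sentences), evaluating a single machine-translated segment might look like this:

```python
# Minimal sketch of automatic MT evaluation with BLEU, one of the
# standard metrics students might apply to raw SMT output.
# The sample sentences below are invented for illustration.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Human reference translation, tokenized.
reference = "the agreement was signed by both parties".split()
# Raw output from the SMT engine, tokenized.
hypothesis = "the agreement was signed from both parts".split()

# Smoothing avoids zero scores when a higher-order n-gram is absent,
# which is common when scoring single sentences.
smoothie = SmoothingFunction().method1
score = sentence_bleu([reference], hypothesis, smoothing_function=smoothie)
print(f"Sentence BLEU: {score:.3f}")
```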
Dublin is once again at the forefront in implementing translation technology in academic curricula, but other universities are now following DCU’s example and working to include optional courses on machine translation and post-editing. Even the conservative Italian academic world is lining up to join the cause: both the University of Bologna and UNINT (Università degli Studi Internazionali) in Rome, for instance, have planned a similar course on MT and post-editing for the 2015/2016 academic year. At first glance, the programs seem to be of a very theoretical nature. Browsing through the curricula and reference material, one cannot fail to notice two issues: the most recent recommended articles date back to 2012, and there are no references to post-editing issues in different languages. The UNINT course description mentions lab assignments in only one language combination (English into Italian). In any case, it will be interesting to see how these curricula evolve.
During DCU’s “MT Train the Trainer” workshop, two main weak spots were eventually identified: the still evolving nature of post-editing and the lack of shared methodologies for an efficient post-editing process.
The ISO/DIS 18587 standard
Shared methodologies and standards are important because they define the building blocks of common protocols to be adopted by all. Standards are not compulsory: We don’t have to adhere to them, but they are meant to make life easier. Think of electric plugs and how the differences and similarities between them affect us.
After 70 years, post-editing is still very much an evolving skill, and a standard could therefore be premature, unless it is meant to be regulatory and guiding rather than normalizing.
Unfortunately, the long-awaited ISO standard on post-editing (still in draft, now at the close-of-voting stage) represents a step away from reality: first, because there was no direct contribution from translation practitioners; second, because it aims to establish principles and requirements for a discipline that is still very much in flux and that seems to be heading in a completely different direction.
For example, the draft defines post-editing as the revision of a machine-translated text (see section 2.1.4 of the document), which clearly goes against the nature of post-editing itself as a process and skill set inherently different from translation and, therefore, from revision.
The goals and scope of MT are different from those of human translation, and the same applies to post-editing and revision. It is therefore pointless to keep comparing these four different areas. The definition offered by TAUS in its 2010 report (“the process of improving a machine-generated translation with a minimum of manual labor”) allows for a more precise distinction between the two activities.
Section 3 of the ISO draft, covering preproduction and production, lists a long series of tasks without any clear methodological indication of how to proceed and, in a way, includes some tasks belonging to the more traditional translation process (e.g., terminology checks and format control).
The confusion in the ISO standard continues in Section 4.1 with the list of translation skills presented as necessary competences. Translation competence is the first to stand out, together with the “linguistic and textual competence in the source language and the target language”, which leaves no room for monolingual post-editing done by subject matter experts.
In short, the standard draft on post-editing is too little, too soon. The risk of such an early normalization is that it could very quickly become outdated.
Post-editing in CAT tool environments
In recent years, MT has become one of the functionalities available to users of computer-assisted translation (CAT) tools, first through general-purpose engines like Google Translate and Microsoft Translator and then through APIs for other commercial engines. All main CAT tools nowadays have plug-ins for the machine translation engines offered by the main technology providers of our industry. This might be one good explanation for the sudden popularity of post-editing, although some practitioners are not ready to admit to having been charmed by the productivity offered by an MT engine.
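The exact integration differs from tool to tool, but conceptually such a plug-in reduces to an authenticated HTTP call that sends a source segment and receives a suggestion. The sketch below is purely illustrative: the endpoint, parameter names and key are hypothetical placeholders, not any real provider’s API.

```python
# Conceptual sketch of how a CAT tool plug-in might request an MT
# suggestion over HTTP. The endpoint, parameters and key below are
# hypothetical placeholders, not any real provider's API.
import requests

MT_ENDPOINT = "https://mt.example.com/v1/translate"  # hypothetical
API_KEY = "your-api-key-here"                        # hypothetical

def machine_translate(segment: str, source: str = "en", target: str = "it") -> str:
    """Return the raw MT output for a single source segment."""
    response = requests.post(
        MT_ENDPOINT,
        json={"text": segment, "from": source, "to": target},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["translation"]
```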
In a blog post published on August 18, 2015, Memsource made a bold statement, declaring that post-editing might be going mainstream. Based on data collected within the Memsource community, over 50% of all translations from English to Spanish and from English to German are done by combining a translation memory with a machine translation engine. Whether this is enough to claim that post-editing is overtaking conventional translation remains to be seen; it would be interesting to compare the Memsource data with those from other CAT tool providers.
Another good point made in the aforementioned blog post – and one that ISO/DIS 18587 ignores – is the evolution of post-editing from conventional to interactive. Conventional post-editing was born with machine translation and happens when the raw output is sent to the post-editor “as is”, while interactive post-editing is enabled by integration with CAT tools and allows users to leverage both fuzzy matches and machine translation suggestions for higher productivity.
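To make the distinction concrete, here is a simplified sketch of the interactive scenario, not the logic of any particular CAT tool: for each segment, the tool offers the best translation memory fuzzy match if it clears a threshold, and falls back to the raw MT suggestion otherwise. The 75% threshold and the use of difflib as a stand-in fuzzy-match measure are assumptions made for illustration.

```python
# Simplified sketch of interactive post-editing: offer the best TM fuzzy
# match if it clears a threshold, otherwise fall back to raw MT output.
# The threshold and the use of difflib as a similarity measure are
# illustrative assumptions, not any specific tool's algorithm.
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.75  # assumed; CAT tools commonly use 70-75%

def best_suggestion(source: str, tm: dict[str, str], mt_output: str) -> tuple[str, str]:
    """Return (origin, suggestion) for the post-editor to work from."""
    best_score, best_target = 0.0, ""
    for tm_source, tm_target in tm.items():
        score = SequenceMatcher(None, source, tm_source).ratio()
        if score > best_score:
            best_score, best_target = score, tm_target
    if best_score >= FUZZY_THRESHOLD:
        return ("TM fuzzy match", best_target)
    return ("raw MT", mt_output)
```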
SDL was the first CAT tool provider to catch on to the changing nature of post-editing. The SDL online post-editing course is available as part of the company’s certification program and is offered free to SDL Trados license holders. It focuses mainly on the three statistical engine types used at SDL (baseline, vertical and customized) and, more specifically, on the BeGlobal technology, with recommendations on how best to use SDL Trados for efficient automatic and manual post-editing.