This article was originally published on the Wordbee Blog. Wordbee are the makers of the popular translation management system and CAT Tool for translators.
Following the article on three different approaches to post-editing, I received questions on how to set up post-editing projects. Here I’ll try to answer that and provide some strategies.
Not All Engines Are Created Equal
Not all machine learning is created equal, and this applies to machine translation as well. If you go through the long list of machine translation connectors available in Wordbee, you’ll see that they range from general to vertical to customized.
Google Translate, Amazon Translate, Microsoft Translator and DeepL Translator are general machine translation systems that need to be used with caution. For these, be aware of the type of data you feed the system (medical, technical, legal, etc.) when you translate. Also, because they are general systems, it’s advisable to treat the machine translation output as suggestions to be checked for accuracy and terminology.
The more vertical a machine translation engine is, the better. A machine translation system that is customized for language pair, domain and text typology will produce output of reasonably high quality.
A New Idea of Post-Editing
Keep in mind that the idea of post-editing has been changing with the spread of neural machine translation (NMT). With statistical machine translation (SMT), errors were predictable and easy to spot. Therefore, experts advised us to start post-editing by focusing on the target text and referring to the source text only when in doubt. In this respect, TAUS’s definition of PEMT (2010) was indicative: “Post-editing is the process of improving a machine-generated translation with a minimum of manual labour”.
With the introduction of NMT, fluency has improved to the extent that it sometimes veils accuracy. This means that post-editing is getting closer (or getting back) to traditional revision. Because of the misleading fluency of NMT systems, we now have to get the meaning of the source text first and then compare it with the raw MT output to make sure that the translation is correct and adequate. Back to contrastive analysis, it seems.
Adequacy Still Rules
In the previous paragraph, I wrote “adequate,” not “perfect”. The risk of using a comparative approach to post-editing is exactly this: linguists might want to achieve the goal of a “perfect translation” that has no place in a workflow with machine translation.
In a recent article, Marco Trombetti (Translated’s founder and CEO) provided the results of an analysis conducted on 20 million translated words. He points out that “suggestions from other humans have an average correction rate of 11% rather than 0%.” This means that linguists tend to correct even the 100% matches contained in a translation memory. Is it simply a case of accidental over-post-editing? Or is it because it’s still difficult to put aside one’s own taste and style?
Remember: After the post-editing task, wherever possible, have subject-matter experts check the translation produced by an NMT system for terminology accuracy and consistency. To this end, a Quality Assurance (QA) step built into your workflow will help conduct linguistic checks, including dates, numbers, forbidden terms, markup, leading/trailing spaces, translation consistency, text length, etc.
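To give an idea of what such automated checks look at, here is a minimal Python sketch covering four of them: numbers, forbidden terms, leading/trailing spaces and text length. It is only an illustration and not how Wordbee or any other tool implements its QA; the names `Segment` and `qa_issues`, the forbidden-term list and the 1.5 length ratio are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical container for one source/target pair (not a Wordbee type).
@dataclass
class Segment:
    source: str
    target: str

# A run of digits, optionally with "." or "," separators inside it.
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def _numbers(text):
    """Return the numbers in text as sorted digit strings.

    Separators are stripped so that "1,000.50" and "1.000,50" compare equal.
    """
    return sorted(re.sub(r"[.,]", "", n) for n in NUMBER_RE.findall(text))


def qa_issues(seg, forbidden_terms=(), max_length_ratio=1.5):
    """Return a list of human-readable QA warnings for one segment.

    max_length_ratio is an assumed threshold; tune it per language pair.
    """
    issues = []

    # Numbers: the same digit sequences should appear on both sides.
    if _numbers(seg.source) != _numbers(seg.target):
        issues.append("number mismatch")

    # Forbidden terms: case-insensitive substring match in the target.
    lowered = seg.target.lower()
    for term in forbidden_terms:
        if term.lower() in lowered:
            issues.append(f"forbidden term: {term}")

    # Leading/trailing spaces: the target should mirror the source.
    if seg.source[:1].isspace() != seg.target[:1].isspace():
        issues.append("leading whitespace mismatch")
    if seg.source[-1:].isspace() != seg.target[-1:].isspace():
        issues.append("trailing whitespace mismatch")

    # Text length: flag targets much longer than the source.
    if len(seg.target) > max_length_ratio * len(seg.source):
        issues.append("target too long")

    return issues


if __name__ == "__main__":
    seg = Segment("Pay 1,000.50 EUR by 12 May. ",
                  "Zahlen Sie 1.000,50 EUR bis 21. Mai.")
    for issue in qa_issues(seg):
        print(issue)
```

Checks like these only flag candidates for review. A date that is legitimately reformatted or a number spelled out in words will trigger a warning, so the final decision stays with the linguist or the subject-matter expert.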
To evaluate your supplier’s performance, take a look at how you can set up the Supplier Performance Index.
Clear and direct post-editing guidelines are essential. They must indicate the required level of post-editing (light or full), the purpose of the text, and the target group. If the project is split among various post-editors, these guidelines will be of capital importance to ensure the right level of consistency.
Two documents make a good starting point and reference for your guidelines: the TAUS Post-Editing Guidelines, with a recap available in various languages, and sQuid’s Guide to Post-Editing, which contains specific instructions for project managers and a list of suggested readings.
Both documents provide a blueprint for your own post-editing guidelines and will guide you through aspects like communication with your team, engine evaluation and advice on automatic post-editing.