Post-editing: From Controversy to Consensus


This article was originally published on the TAUS Blog.

Until a year or two ago, post-editing of machine translation (PEMT) was considered taboo, like sex, religion and politics. Professional translators refused to post-edit machine translation output, partly because of the low quality of machine translation engines and partly because they were raging against machine translation the same way they had raged against CAT tools some 20 years before.

Language service providers, on the other hand, still hadn’t figured out the best practices for using and offering machine translation and PEMT services. Ignorance of what PEMT really entails, and of how it differs from revision, can be considered the common denominator of the problems of both groups. There is still a small hotbed of resistance to MT and PEMT, but it has become less and less vociferous of late.

In fact, post-editing is becoming an integral part of the mainstream translation production process, as many CAT tools are connected to MT systems through APIs and, therefore, can pre-translate all “empty” segments in a project, thus offering suggestions to translators. And, as an MT engine improves, the quality – and thus the usefulness – of the raw output improves as well, helping translators increase their productivity.
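As a rough illustration of the pre-translation step described above, the sketch below fills only the empty target segments of a project with MT suggestions and marks them for post-editing. The `Segment` class, the `pretranslate` function and the `fake_mt` stand-in are hypothetical names invented for this example; a real CAT tool would call its MT provider's API at that point.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    source: str
    target: str = ""   # empty means the segment has no translation yet
    origin: str = ""   # "mt" marks a machine suggestion that needs post-editing


def pretranslate(segments: List[Segment], mt: Callable[[str], str]) -> int:
    """Fill empty targets with MT output and return how many were filled."""
    filled = 0
    for seg in segments:
        if not seg.target.strip():
            seg.target = mt(seg.source)
            seg.origin = "mt"
            filled += 1
    return filled


def fake_mt(text: str) -> str:
    # Stand-in for a real MT API call (an assumption, for illustration only).
    return "[MT] " + text


project = [
    Segment("Hello world", "Hallo Welt", origin="tm"),  # already translated
    Segment("Add to basket"),                            # empty segment
]
count = pretranslate(project, fake_mt)
```

Segments that already carry a translation are left untouched, so the translator only sees MT suggestions where nothing better was available.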

The steady increase in the number of participants in the online TAUS post-editing course is further evidence of a wider interest in PEMT. Since the launch of the course in March 2014, we have registered over 700 participants: translators with various levels of experience, project managers, subject matter experts and translation students.

Worth mentioning is the fact that many language service providers are prepared to cover the cost of the TAUS course for their translators in order to create teams of post-editors for specific projects.

At the latest edition of Localization World (LocWorld28) in Berlin, PEMT was also a major topic of discussion at the TAUS track.

The State of Post-editing

One of the sessions in the track set out to offer three different points of view: those of a translation automation entrepreneur, a researcher and a language service provider.

The entrepreneur

Diego Bartolomé, CEO of Tauyou (a company specialized in machine translation, natural language processing tools and process automation for the translation industry), gave a sketch of the current state of post-editing. Tauyou boasts a database of 1000+ post-editors, mostly recruited via ProZ.com. It isn’t clear, though, how many of them have actually had some training or previous experience in post-editing. Bartolomé confirmed that MT tools are becoming more translator-friendly, mainly thanks to integration through APIs. According to him, the API economy is the next not-to-be-missed big thing for the translation industry (as well as for the rest of the business world), as APIs can facilitate the exchange of business competencies and skills.

The researcher

John Moran (Centre for Next Generation Localization) gave an overview of the available tools to measure and improve translators’ productivity. Moran encouraged the audience to set aside the current practices of a) asking a translator to report the number of hours worked and/or b) letting market forces decide the PEMT discounts. According to Moran, the first method is unreliable, especially in those cases when the impact of MT on productivity is less than 30%, while the second could lead to long-term translator/vendor attrition, due to post-editors’ rates falling below any decent level. Moran suggested using the SLAB test method (SLAB being a fancy acronym for Segment-Level A/B testing) in combination with various tools, e.g. MemoQ, MateCAT, Studio Time Tracker, etc. We are happy to say that the TAUS DQF is among his recommendations.
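To make the idea of segment-level A/B testing more concrete, here is a minimal sketch of how such a measurement could be computed. It assumes, as our reading of the method rather than a description of Moran's actual tooling, that segments are randomly split into a group shown with an MT suggestion ("A") and a group translated from scratch ("B"), and that the time spent on each segment is logged. The function names and the sample log are invented for illustration.

```python
import random


def assign_conditions(segment_ids, seed=0):
    """Randomly split segments into 'A' (MT shown) and 'B' (no MT)."""
    rng = random.Random(seed)
    ids = list(segment_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {sid: ("A" if i < half else "B") for i, sid in enumerate(ids)}


def words_per_hour(log, condition):
    """Throughput for one condition; each log row has 'condition', 'words', 'seconds'."""
    rows = [r for r in log if r["condition"] == condition]
    words = sum(r["words"] for r in rows)
    seconds = sum(r["seconds"] for r in rows)
    return words / seconds * 3600


def mt_speedup(log):
    """Relative throughput gain of post-editing (A) over translating from scratch (B)."""
    return words_per_hour(log, "A") / words_per_hour(log, "B") - 1.0


# Invented timing data for one translator on one document.
log = [
    {"condition": "A", "words": 10, "seconds": 30},
    {"condition": "A", "words": 20, "seconds": 60},
    {"condition": "B", "words": 10, "seconds": 40},
    {"condition": "B", "words": 20, "seconds": 80},
]
conditions = assign_conditions(range(10))
speedup = mt_speedup(log)
```

Because both conditions are measured for the same translator on the same document, the comparison does not depend on self-reported hours.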

The language service provider

Selçuk Özcan, co-founder of Transistent Language Automation Services (an Istanbul-based company offering MT services), analyzed the main steps of an MT and PEMT workflow: from engine training (with bilingual and monolingual corpora analytics and terminology extraction) to gap and broken pattern detection and rule and data patch distinction. Two further steps are necessary to arrive at a mature production system:

  • From quality evaluation and quality level definition to the evaluation and error-typology annotation of the MT output before it is sent to post-editors.
  • From the compilation of clear post-editing guidelines to the evaluation of the post-edited task.

Overall, the three presentations confirmed that, although a lot is happening in the field of PEMT, things still need to move further, towards more clearly defined best practices to be shared and implemented by all.

eBay’s Case Study

During another session, Lucie Le Naour and Costantis Galatis showed us what happens backstage at eBay.

Translation is essential to eBay’s business. Buyers want to see the whole website in their own language. A poor-quality translation represents an unnecessary risk: firstly, it pushes users towards competitors that can offer services with better linguistic quality; secondly, a bad translation can negatively affect a user’s opinion of the company.

The translation of user-generated content clearly brings some challenges: the short texts that form the search queries, the item descriptions and the titles are very often linguistically imperfect. They lack context (pay attention to the item descriptions next time you visit the eBay website and you’ll see they consist of a number of terms one after the other) and aren’t normalized: they are often written in informal language and can contain spelling and grammar mistakes.

eBay’s workflow is a good example of how post-editing should be integrated. The company has precise guidelines for each content typology and detailed instructions for the vendors. The translation data is post-processed with regex before the vendors start post-editing, and lists for almost everything are available, e.g. blacklists of non-usable expressions, lists of usable and non-usable acronyms, do-not-translate lists of brands and names.
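The kind of preparation described above can be sketched in a few lines. The example below is not eBay's actual pipeline: the regex rules, the list contents and the function names are all assumptions made for illustration. It normalizes raw MT output with regular expressions and then flags segments that drop a do-not-translate term or contain a blacklisted expression.

```python
import re

# Hypothetical lists; real ones would be maintained per language and content type.
DO_NOT_TRANSLATE = ["iPhone", "Levi's"]
BLACKLIST = ["cheap knock-off"]


def normalize(text: str) -> str:
    """Light regex clean-up of raw MT output before post-editing."""
    text = re.sub(r"\s+", " ", text).strip()       # collapse runs of whitespace
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)   # drop spaces before punctuation
    return text


def check(source: str, target: str) -> list:
    """Return human-readable issues for the post-editor to review."""
    issues = []
    for term in DO_NOT_TRANSLATE:
        # A protected term in the source must survive unchanged in the target.
        if term in source and term not in target:
            issues.append(f"missing do-not-translate term: {term}")
    for phrase in BLACKLIST:
        if re.search(re.escape(phrase), target, flags=re.IGNORECASE):
            issues.append(f"blacklisted expression: {phrase}")
    return issues


cleaned = normalize("New  iPhone 6 , 16 GB  ")
issues = check("New iPhone 6 case", "New I-Phone 6 case")
```

Running checks like these before the vendors start means post-editors spend their time on linguistic problems rather than on mechanical ones.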

Post-editing at TAUS

The post-editing course we offer at TAUS has been designed to present post-editing as a skill distinct from the more conventional revision. The course covers the main differences between SMT and RbMT, pre-editing, quality evaluation, and light and full post-editing.

The feedback from participants has been very positive from the start and – hear, hear – many skeptical translators have changed their opinion about this new skill.

In addition, following requests from our members, we have developed over 22 language-specific post-editing exercises (an error typology evaluation exercise and a productivity measurement exercise). We started with the FIGS languages and are now venturing into more exotic linguistic fields (Finnish and Ukrainian language modules were added just last week). We invite you to take a look at the list of available language modules; if your language is not yet on the list, drop us a line and we will develop it for you.

Finally, the post-editing course will become part of the offering of the newborn TAUS Academy, but that, as they say, is yet another blog post.