When I am involved in training colleagues or students (e.g., the ITV students) and have to talk about machine translation and post-editing, I have to remind myself not to take a few things for granted:
- My story shouldn’t become too technical, otherwise most of the participants will have a case of MEGO (My Eyes Glaze Over).
- I need to point out that, although far from perfect, Google Translate has been constantly improving since its inception, especially considering that it is not a language- and domain-specific system. Inevitably, one or more students come up with the “I am better than GT” statement in class or on their social media accounts.
- It is vital to start using a CAT tool and build your translation memories and term bases from the very beginning; yes, even if you’re still a student, and yes, even if the texts you’re supposed to translate for homework are all on different topics. Once you realize how important translation memories and term bases are for your work as a linguist, they quickly become your best friends.
Leveraging your own language data
Now, let’s say you have always been collecting your language data. Do you really take care of them as you should? Do you…
- … insert the metadata in a new TM?
- … do a bit of maintenance after a certain amount of time, especially if you harvest language data from other sources?
- … make sure that the terminology is consolidated?
- … run automatic QA checks (empty or duplicate segments, terminology, etc.)?
- … filter out obsolete translations?
If you answered yes to all these questions, you are one of the very few smart TM users. I must confess I can be quite sloppy at times. If you answered no, ask yourself: Why should I spend some of my precious free time maintaining my TMs? Well, because clean and consistent TMs can help improve the quality of your work and boost your productivity.
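Some of the checks above lend themselves to a quick script. Here is a minimal sketch, using only Python’s standard library, that flags two of the problems from the checklist: empty target segments and identical source segments translated in different ways (the function name and the two-language layout are my own assumptions; real-world TMX files with inline tags or more than two languages would need extra handling).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def qa_check_tmx(tmx_source):
    """Flag empty targets and sources that have more than one target.

    tmx_source: a file path or file object containing TMX data,
    assumed to hold bilingual translation units (source first).
    """
    empties = []                    # source segments whose target is empty
    by_source = defaultdict(set)    # source text -> set of target texts
    for tu in ET.parse(tmx_source).getroot().iter("tu"):
        # itertext() flattens any inline markup inside <seg>
        segs = ["".join(seg.itertext()).strip() for seg in tu.iter("seg")]
        if len(segs) < 2:
            continue                # skip incomplete translation units
        source, target = segs[0], segs[1]
        if not target:
            empties.append(source)
        else:
            by_source[source].add(target)
    inconsistent = {s: t for s, t in by_source.items() if len(t) > 1}
    return empties, inconsistent
```

Run it over each of your .tmx exports, then fix the flagged entries in your TM tool; it is no substitute for a proper QA feature in a CAT tool, but it catches the low-hanging fruit.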
In addition to this, as the datafication of the translation industry is now a reality and data seems to be the new fuel, your language data represents a very precious drop of that new oil that is used to train MT systems. But what about using your TMs to create your own machine translation engine, an engine that only you can access and use, and that is not subject to any external influence/contamination?
Enter Slate Desktop.
I have been aware of the existence of Slate Desktop (“SD”) since October 2015, when Tom Hoar, CEO of Precision Translation Tools, was still putting the final touches to the product. Tom Hoar is not new to the translation industry. After I read some reviews of SD from a couple of colleagues, I found it hard to turn down his invitation to join the Blogging Translator Review Program.
Some background information: Slate Desktop is an out-of-the-box Moses-based solution that transforms your TMs into translation engines. These engines can be integrated into your CAT tool via a plugin and suggest draft translations based on the vocabulary and style of your TMs. Once you have downloaded and installed the program and trained one or more engines, everything happens offline, on your desktop. This is the point that Precision Translation Tools wants to stress the most: confidentiality is set at 100%. It’s your data, your engine, and only you can access it.
So, how difficult is it to build an MT engine with SD?
I decided to make life difficult for SD and to feed it a real-life scenario.
I selected three .tmx files totaling 250,768 segments, which I had started collecting back in 2003 from the same client. The selected .tmx files were client-specific but contained my translations, the translations of a colleague who takes over when I am on holiday, and translations from some other unknown colleagues (in the few cases when my client had been “unfaithful”). Also, these .tmx files were somewhat messy: they contained a few terminological discrepancies; some sentences were long and convoluted, while others were only one word long, and so on. All in all, a great mess.
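If you want to size up your own corpus before training, counting the translation units in each .tmx file takes only a few lines of standard-library Python (a sketch; the file names in the usage example are placeholders, not my actual files):

```python
import xml.etree.ElementTree as ET

def count_segments(tmx_files):
    """Return per-file and total counts of translation units (<tu>)
    across a list of TMX file paths or file objects."""
    counts = [sum(1 for _ in ET.parse(f).getroot().iter("tu"))
              for f in tmx_files]
    return counts, sum(counts)

# Hypothetical usage:
# counts, total = count_segments(["client_a.tmx", "client_b.tmx"])
```

Knowing the total upfront helps you judge whether a corpus is big enough to be worth training on, and whether a long build time is plausible for its size.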
I installed the program on my recently retired computer, a 4-year-old Samsung laptop with a Core i3 processor, 8GB of memory and a 256GB SSD.
Installing SD was easy. The knowledge base on the SD website is a fair step-by-step guide, even though some screenshots might look a bit different from what you see on your screen, probably because they refer to a previous version of the product.
Next, I uploaded the files and entered the engine parameters. It took just a few minutes to create a corpus.
Then, it was time to hit the “Build an engine” button. To create an engine, SD goes through 18 stages. If you want to follow the progress step by step, you can read the changing text on your screen and learn more about the various processing stages. Or you can walk away and do something else; there are some log files that you can check afterwards, if you like. My engine was ready in 9 hours (give or take). If your computer is faster, chances are you won’t need that much time (depending also on the size of your corpus).
In my case, at the end of the process, the following screen appeared on my monitor:
The most important piece of information in the above screenshot is the quality quotient, which expresses the engine’s capacity to mimic the user’s work. My quality quotient of 29.75% is within the average values, as indicated on the SD website:
25% to 45% – You’re among the most common Slate Desktop user group. Continue your evaluations and start measuring the changes in your capacity.
SD also creates an evaluation report that you can view in Excel. Here is mine.
In general, my first results are not that bad, considering the messy TMs I used. SD also allows me to enforce terminology, if I want; on the other hand, a mismatch like upper/lower case can only be fixed manually, at least for the moment.
The final test
SD can be used standalone or within several CAT tools, including the one I use, memoQ.
If you can read Italian, you’ll see that the translation offered by Google Translate (first screenshot, orange sentence) requires less post-editing than the one by SD (second screenshot, orange sentence). Keep in mind, though, that Google Translate has already gone neural.
Because memoQ doesn’t allow me to pre-translate using an MT engine combined with the fragment assembly functionality, I also pre-translated the same text using only the same TMs I had used to build the SD engine. The results were disappointing.
So, is SD going to be a translator’s best friend?
Difficult to say. If you’re a translator who works with texts that are not confidential, your best bet would be to choose Google Translate (of course, it also depends on your language pairs). If you translate mostly confidential texts and your clients require maximum privacy, then SD could help you leverage your language data. The price tag remains a sore point: SD might be a serious investment, especially if you consider the rate war going on in the translation industry.
At the same time, SD could be a good alternative for small agencies that a) have a large amount of data but don’t translate volumes large enough to justify a subscription to an online MT service and b) work with confidential texts. In this case, the agency could create an engine, use it to pre-translate a text and then send the file to its translators for conventional post-editing.
As for me, my next step will be to clean up my TMs, rebuild the engine and see if SD will really boost my productivity. To be continued.