This article was originally published on the TAUS Blog.
Can translation service providers and technology providers meet translation buyers’ requests? How can we evaluate the quality of both source and target within the translation process? How do we measure localization processes? These are just a few of the questions that came up during the roundtable hosted by TAUS in Barcelona on May 12, 2016.
Integration Is Key
“There is no more interesting industry than the translation industry nowadays.” These were the opening words of Jaap van der Meer (TAUS Director), who reminded us that we are living in the Convergence Era, when translation is a full-fledged utility, available to everyone on every single electronic device, be it a computer, a tablet, a cellphone or a TV screen. We’re not only talking about Google Translate – although Google is partly responsible for bringing machine translation to the masses. We are talking about a fusion of technologies: machine translation, text-to-speech (and speech-to-text), sign-to-speech and augmented reality based on OCR, to mention just a few examples.
No technology is an isolated item anymore: We are witnessing the development of systems broad enough to bring CAT tools, TMs and MT engines, TMS functionalities, CMSs and CRMs together under the same cloud. This convergence of technologies calls for a new business model. We also need to measure all activities and processes with the help of data, whether linguistic or process-related. We need, in short, to get ready for the next phase, which Van der Meer calls the Singularity Era, characterized by deep learning, neural networks, robots and artificial intelligence.
Everything Is Correlated
The one-day session was centered around five themes: Vision & Strategy, Data & Data Sharing, Quality and Localization Effectiveness, Post-editing, Game-Changers and Innovation. As became clear over the course of the day, and in spite of the compartmentalized presentations, these five themes are closely interrelated: There cannot be a strategy without technology integration and a well-designed business model; there cannot be quality measurement without data; and there can be no innovation if we don’t know how to measure and evaluate localization processes.
Luigi Muzii (sQuid) explained the important role of data in measuring business processes. Although measurements are important in themselves, the various sets of data must be correlated to yield useful insights. In addition to the standard financial and operational indicators, it is necessary to collect other types of data, especially performance data, and to analyze them through correlation (e.g., total productivity correlated to productivity per vendor; job-related data compared with data about deadlines, volumes, compensation and reworking). In this way, language service providers can give their clients a clear indication of the expected results. Modern technology must support content profiling and ranking, so that the expected quality can be predicted.
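To make the idea of correlating data sets concrete, here is a minimal sketch in Python. It is not taken from the presentation: the job records, the field names (`words_per_hour`, `rework_rate`) and the choice of the Pearson coefficient are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Hypothetical job records: names and values are invented for illustration.
jobs = [
    {"vendor": "A", "words_per_hour": 420, "rework_rate": 0.04},
    {"vendor": "A", "words_per_hour": 610, "rework_rate": 0.09},
    {"vendor": "B", "words_per_hour": 380, "rework_rate": 0.03},
    {"vendor": "B", "words_per_hour": 700, "rework_rate": 0.12},
    {"vendor": "C", "words_per_hour": 500, "rework_rate": 0.06},
]

speed = [job["words_per_hour"] for job in jobs]
rework = [job["rework_rate"] for job in jobs]

# A value near +1 would suggest that faster jobs tend to need more reworking.
r = pearson(speed, rework)
print(f"speed vs. rework correlation: {r:.2f}")
```

The same function could be applied to any pair of indicators mentioned above, such as volumes against deadlines. A correlation only signals that two measurements move together; it does not by itself explain why.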
Language Policy and Technology
During the roundtable, David Pérez Fernández, a representative of the Ministry of Industry, Energy and Tourism, announced efforts by the local authorities to strengthen the position of minority languages and cultures in the Autonomous Region of Catalonia by funding various (technological) projects.
According to the linguistic census held by the Government of Catalonia in 2013, Spanish is the most widely spoken language in Catalonia (46.53%), followed by Catalan (37.26%). The Catalan language has been granted special legal status within the region: it is an official language, together with Spanish and Aranese Occitan (spoken by some 22,000 people) [1].
As the Apertium project shows us, minority languages are not afraid of technology. Gema Ramirez-Sanchez (Prompsit), a self-declared Apertium activist, underlined in her presentation that, after 11 years of creating MT systems, Apertium cannot be considered only a platform for rule-based machine translation anymore. It is a vast language resource repository (41 language pairs are currently available), with particular focus on minority languages: Besides Catalan, Occitan and Aragonese, the Apertium website hosts linguistic data for other languages like Asturian, Icelandic, Maltese and Macedonian. The resources are shared under GNU GPL and are downloaded thousands of times every week. Data sharing is one of the main characteristics of the Apertium community.
Quality Data: A Nice-To-Have or a Must?
Data is the new fuel of the translation industry. But how can we collect linguistic data? What should we do to overcome the legal/technical issues regarding translation data sharing? What should we measure and how? While most answers to these questions can be found in the Translation Data Landscape Report that TAUS published in December 2015, at the roundtable participants had the chance to see and discuss two mini case studies.
The presentation by Doron Schwartzblat (Straker Translations) made clear that “if you don’t capitalize on your big data, you are going to be left behind.” One example: Straker Translations has developed an algorithm that assembles an ad-hoc team of translators for each project, removing the project manager’s bias from the selection of linguistic resources.
María Azqueta and María Illescas (SeproTec) showed how the TAUS Quality Dashboard is set to become an important tool and an industry standard for measuring quality. Azqueta and Illescas underlined how simple the Quality Dashboard is to install and how quickly metrics become available for every single project. Project information is always at hand: The Quality Dashboard allows various comparisons between post-editing jobs, CAT tools, MT engines etc. Even SeproTec translators have given positive feedback: They don’t feel that they are being monitored, and the measurement, especially in terms of time, is much more realistic.
As Paola Valli (TAUS Product Manager for the Quality Dashboard) explained, the Quality Dashboard is a much-needed platform for collecting the key data points. It allows solid benchmarking, both of internal processes and against the industry as a whole. The Quality Dashboard is, according to Valli, “a democratic tool” in that it involves the whole translation supply chain: it is not limited to measuring the quality of machine translation projects, but can also be applied to the evaluation of human translation and various translation memory activities.
The TAUS Quality Dashboard is integrated through the TAUS DQF API, which has been freely available since 2014. A number of technology providers (MateCat and SDL) have already integrated it; others (like LingoTek and Memsource) are in the process of doing so.
There Is an API for Everything
This is certainly true of apps, and it is now true of APIs as well. Diego Bartolomé (tauyou) stated that API-fication is essential for the translation industry.
An API makes it possible to connect a cloud platform to other systems. Most cloud-based CAT tools and ERP systems offer APIs that prove useful in streamlining the more mundane tasks of the translation workflow. “How to remove people from workflow? With an API,” explains Bartolomé, half-joking. “If you eliminate the human factor from the equation, you could create some interesting options for your clients.” Provided, of course, that your organization has the right expertise in-house.
According to Bartolomé, APIs can contribute to new business models, moving away from the conventional price per word/segment model towards freemium or subscription-based models (based on a fixed number of API calls) or task-based models. Gengo, Unbabel and Translated.net are examples of translation companies exploiting the power of APIs.
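The difference between per-word and subscription pricing can be sketched with a little arithmetic. The Python snippet below is an illustration only: every rate, fee and quota in it is invented, and the per-call overage charge is an assumption rather than something described at the roundtable.

```python
def per_word_cost(words: int, rate_cents_per_word: int) -> int:
    """Conventional model: the price grows linearly with the word count."""
    return words * rate_cents_per_word


def subscription_cost(
    api_calls: int,
    included_calls: int,
    monthly_fee_cents: int,
    overage_cents_per_call: int,
) -> int:
    """Subscription model: a flat fee covers a fixed number of API calls.

    Calls beyond the quota are billed per call (an assumed policy).
    """
    extra_calls = max(0, api_calls - included_calls)
    return monthly_fee_cents + extra_calls * overage_cents_per_call


# Hypothetical month: all figures below are made up for illustration.
words_translated = 10_000
calls_made = 1_200

per_word = per_word_cost(words_translated, rate_cents_per_word=10)
subscription = subscription_cost(
    calls_made,
    included_calls=1_000,
    monthly_fee_cents=50_000,
    overage_cents_per_call=20,
)

print(f"per-word model:     {per_word / 100:.2f}")
print(f"subscription model: {subscription / 100:.2f}")
```

Amounts are kept in whole cents to avoid floating-point rounding. Which model is cheaper depends entirely on the buyer's volume and on how many words each API call carries, which is why the comparison has to be run on real usage data.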
The Search for Game Changers
Are APIs enough to meet the requests of translation buyers? In the last part of the roundtable, Patricia Paladini Adell (CA Technologies) outlined the ideal requirements for language service providers.
The old waterfall approach is no longer sufficient for software localization: Steps like user interface validation or traditional terminology and query management need to be eliminated to meet the sprints and cycles of the agile methodology. The traditional approach of language service providers is now an obstacle to software localization: The turnaround time is so short that vendors don’t have the time to build ad-hoc translation teams and send files back and forth.
Automation and continuous delivery: These are the key points of the localization process as envisaged by CA Technologies. What translation buyers in the software industry need is a fully automated end-to-end process, going from the developer’s code repository to the translation editor. Within this process, translation memory leverage and quality control features must be guaranteed; all the tools for quality check, terminology and query management must be integrated in the workflow.
Are there any language service providers or state-of-the-art technology providers ready to cover all these requirements? CA Technologies is challenging the localization industry to move away from old-fashioned approaches.
TAUS Roundtable: A More Accessible Format
News and updates from the localization and translation technology industry usually come from Dublin, Seattle, Silicon Valley and other places where most of the industry’s events are held.
For its roundtables – a format devised for the first time this year – TAUS has chosen somewhat unusual locations, like Vienna, Barcelona and Riga, bringing to our attention some regions of the translation industry that up to now have had a low profile – not by choice – and where some interesting research and business projects are taking place.
The TAUS roundtables aim to find a better connection with all translation practitioners. If you find the more traditional TAUS events too crowded or too high-level, you might want to consider joining a roundtable discussion (the next one is in Riga on June 1st), where you’ll be able to exchange ideas and information with a smaller group of peers.
1. Source: Wikipedia