This article was originally published on the TAUS Blog, on November 26, 2014.
At the latest edition of Localization World in Vancouver, the Baltic company Tilde presented an integrated cloud platform offering machine translation in different flavors, in combination with terminology tools and a multilingual data library.
We talked to Andrejs Vasiļjevs (Tilde’s CEO) and Rihards Kalniņš (Tilde’s International Development Manager) about it.
After Die Wende
The two years preceding 1991 are recorded in history, but what do you remember of 1991? A quick search will tell you that in 1991 Tim Berners-Lee announced the World Wide Web; Linus Torvalds released the first version of the Linux kernel… and Latvia, Estonia and Lithuania became independent. The term Internet of Things was coined following the introduction of RFID (Radio Frequency IDentification) tags; the first Starbucks Coffee in California was opened; and Arnold Schwarzenegger was back on the big screen in “Terminator 2: Judgment Day”.
What you may not discover easily is that Tilde also developed its first product in 1991: a keyboard driver for Latvian. Andrejs Vasiļjevs explains: “In Latvian we have a number of diacritical marks. To write them with the keyboard driver you had to press the tilde key (~, to the left of 1) first – and then the corresponding letter. The tilde key was a gateway to the Latvian language on computers.”
From then on, the company began developing dictionaries, spell-checkers and other language tools, first for Latvian and subsequently for other languages.
Fast forward
It’s 1998: Larry Page and Sergey Brin are founding Google, and Microsoft is introducing the almost forgotten Windows 98 – the second major release in the Windows 9x line of operating systems. The Space Shuttle Endeavour launches the first American component of the International Space Station, and the Coen brothers bring the epic Dude to life in “The Big Lebowski”.
Meanwhile, Tilde is taking its first steps into the field of machine translation. The rule-based approach, considered state of the art at the time, prevails. “It took quite an effort to develop the first Latvian/English and Latvian/Russian systems using the rule-based paradigm.” In previous years, the company had built up deep expertise and various language tools, such as POS taggers, morphological analyzers, and syntactic analyzers.
Enter Google Translate
In 2007, the first-generation iPhone went on sale, Google Translate began to steal the technological spotlight, and Statistical Machine Translation (SMT) technology reached maturity. Tilde combined this technology with its linguistic know-how to improve machine translation quality.
Switching to SMT made it much easier for Tilde to expand the range of supported languages. From the get-go, the Baltic company has shown a predilection for the smaller languages (first the Baltic languages – Latvian, Estonian and Lithuanian – to be followed by Bulgarian and Slovenian, Swedish and Danish, as well as Portuguese and Spanish) that are not well served by Google Translate or by Microsoft Translator.
“Within the EU context, Baltic languages are very interesting,” explains Kalniņš. “There aren’t many speakers. Finding, collecting and extracting data can prove difficult, but what we have been able to do – thanks to the great amount of work we did in the past in the field of language technology – is leverage these skills for other minority languages as well.”
Smaller languages are not only small in terms of number of speakers and availability of resources; they also tend to be morphologically complex. “Take Estonian: The Estonian language has 14 noun cases. Tilde’s MT system for Estonian has proven to achieve a higher score than Google Translate. By putting in an extra effort we can now offer valuable MT tools for the speakers of this language,” continues Kalniņš. Tilde has been collaborating with various universities, including the University of Edinburgh, where, as Vasiļjevs says, “the father of Moses used to work”; the University of Zagreb, for research projects on the languages of the Balkan region; and the University of Uppsala (OPUS corpus).
Loyal to the core (business)
Although deeply involved with the development of LetsMT (the technology that powers Tilde’s machine translation solutions), the company has never forgotten its first love: terminology. In 2013, Tilde presented Terminology as a Service (TaaS), a cloud platform to gather and process multilingual terminology data.
TaaS is an EU-funded project that Tilde developed in cooperation with the Cologne University of Applied Sciences (Germany), the University of Sheffield (UK), Kilgray (Hungary), and TAUS. The platform streamlines the terminology workflow by automatically extracting terms and looking up candidate terms in various online databases.
Beaming it all up
LetsMT and TaaS are now combined in a single cloud platform, with the addition of a multilingual data library. “We provide quite a huge online resource of clean and verified terminology, and users can benefit and pick up the collections best suited for their needs,” explains Vasiļjevs. The repository contains 4 million terms and 2.5 billion parallel sentences in 125 languages.
Flexibility is key to this platform. For the MT engines, users can choose among three options: Ready-to-use MT (domain-specific engines for the legal, pharmaceutical and IT fields), to get a taste of the possibilities offered by machine translation; Custom MT, for higher-quality translation output; and Build-your-own MT, for users who like to keep control of their own MT engines.
What makes Tilde’s cloud platform different from all others? Vasiļjevs relates: “First of all, we combine language-specific SMT components with terminology tools and services to address major weaknesses in machine translation. In addition, our offering focuses on smaller languages. Many MT players focus on a few widely spoken languages. But to address a global market you need to cover other languages as well, especially the 24 official languages of the European Union.”
The immediate future
Having already built MT systems for the Latvian and the Lithuanian governments, Tilde has cut its technological teeth providing MT solutions outside the translation industry for various scenarios: translation tools for government employees, large-scale data analysis, international media monitoring, integration into e-services, and access to cultural information.
The first six months of 2015 will be marked by the Latvian EU presidency, for which Tilde has provided a number of MT engines. Officials, journalists, diplomats and other EU professionals will have the chance to put this technology to good use.
“We are eager to see how MT technology is going to be put into use to facilitate multilingual communications,” concludes Vasiļjevs.
More of everything
Eric Schmidt, Google’s executive chairman, sums up the current trend very nicely in his book How Google Works: networks, the Internet, mobile devices, and cloud computing have brought disruption into most industries. In combination with the relevant tools and applications, cloud computing offers virtually infinite computing power and storage.
This is more or less what is happening in the translation industry right now: most companies are moving their products and services to the cloud in order to gain – and offer – more power, flexibility and customization.
No more limits or boundaries, and enough room to store one’s happiness.