Standards, Terminology and Europe



I co-wrote this article with Luigi Muzii. The article was originally published in Multilingual Magazine.

We use standards every day, in all aspects of our lives. Some standards have been around for hundreds or even thousands of years. Think, for example, of weights and measures and how their differences and similarities affect us all.

Standards provide a shared reference framework that ensures safety, reliability, interoperability, and transparency, with partners having common expectations on each other’s performances and products.

In an ideal market there would be no need for standards, but the more a real market grows in complexity the more they become important and proliferate, and eventually become plethoric.

In fact, standards organizations operating in large economies are increasingly extending their scope, and authoritativeness becomes an issue when standards are released covering niche areas, without being conclusive.

Battling on and through standards

Today’s exponential pace of technological evolution and the sluggish operating model of standards organizations are jeopardizing the relevance of any new standards, most of which are doomed from inception. In fact, in addition to the delay in releasing de jure (formally ratified) standards, their adoption is generally voluntary, and this is an incentive for de facto (market-driven) standards to prevail through widespread use. The ratification of de facto standards often comes after they have achieved a dominant position, thus strengthening it.

The last three decades have seen the rise of process standards, often following a regulatory approach. On the one hand, certification is frequently sought mostly to meet customer expectations, enhance perception, and increase reputation. On the other hand, due to their regulatory intention, process standards are perceived as constricting, inflated and unhelpful.

The huge assortment of standards nowadays promotes the belief that there is no real intent to join forces and maximize efforts. On the contrary, this plethora of standards appears to be the result of faction fights.

Translation and terminology standards

For example, the existing translation and terminology standards are mostly detached from reality and seem the product of purely academic exercises in reiterating century-old practices that look rather unfashionable to outsiders.

Standards look like a way for the whole translation community to recover from its failure to have the relevance and importance of translation acknowledged and unchallenged, but this topic is increasingly restricted to narrower and narrower circles, frantically and feebly clamoring for recognition.

Terminology is exemplary in this respect.

In 1991, during the 3rd TermNet Summer School in Vienna, Christian Galinski predicted that, before the turn of the millennium, the importance of terminology would eventually be universally acknowledged. Mr. Galinski also predicted that terminology would be reserved its own place among C-level executives.

A quarter of a century later, terminology is still an ancillary discipline for a belittled profession, with a lot of specialized literature considering terminology as an indisputable subject. In sacred circles, though, discussions are still around data categories, semantic interoperability, data modelling — obviously according to unsung standards — and term formation in association with standards.

As futurist Raymond Kurzweil illustrated with brilliantly devised imagery, we are in the second half of the chessboard, and no standards effort, however smart, can keep pace with technological evolution.
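
The chessboard image refers to the old story of grains doubling on each square of a chessboard. The following is a minimal sketch of that arithmetic, assuming the usual telling (one grain on the first square, doubled on every following square), which the article itself does not spell out; the variable names are purely illustrative.

```python
# Doubling on a chessboard: one grain on square 1, twice as many on each next square.
# Assumption: the usual version of the story (start at 1, 64 squares).
SQUARES = 64

grains_on_square = [2 ** (n - 1) for n in range(1, SQUARES + 1)]

first_half_total = sum(grains_on_square[:32])   # squares 1 to 32
first_square_second_half = grains_on_square[32]  # square 33 alone
whole_board_total = sum(grains_on_square)        # squares 1 to 64

print(f"First half of the board: {first_half_total:,} grains")
print(f"Square 33 alone:         {first_square_second_half:,} grains")
print(f"Whole board:             {whole_board_total:,} grains")
```

The first square of the second half already holds more than the entire first half combined, which is why any process that moves at a fixed pace falls further behind with every step.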

Terminology plays a crucial role in accessing and managing information, especially today, but it is still a knowledge-intensive, labor-demanding human task. Users are more and more often unaware of (and possibly uninterested in) its principles and methods, and the many terminological standards available become obsolete as soon as they are published because of the slowness of the process and the verticality of topics and efforts.

Every year, TermNet, the Vienna-headquartered International Network for Terminology, organizes an online training course with a final exam that requires the presentation of an application scenario. The course is sponsored by the European Certification and Qualification Association, a non-profit association whose aim is to provide a worldwide unified certification scheme for numerous professions. Sessions are held by academics and experts tackling the main aspects of terminology management. Participants are given useful information and examples, but almost no practical exercises on term extraction, stop-word list building, term data handling, and real-life scenarios in general, while much time is devoted to data categories, data modelling, semantic interoperability, and even team management theory.
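
For readers unfamiliar with such exercises, here is a minimal sketch of what a term extraction exercise with a stop-word list could look like. It is not taken from the TermNet course: the stop-word list, the `candidate_terms` function, its parameters, and the sample text are all hypothetical, and real term extraction tools use far richer linguistic and statistical filters.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "are",
    "for", "on", "by", "with", "be", "that",
}


def candidate_terms(text, max_len=3, min_freq=2):
    """Return word n-grams (1 to max_len words) that occur at least
    min_freq times and neither start nor end with a stop word."""
    # Simplification: punctuation is dropped, so n-grams may span sentences.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    return {term: freq for term, freq in counts.items() if freq >= min_freq}


# Illustrative sample text, not from any real corpus.
sample = (
    "A subordinated bond is repaid after other debt. "
    "Holders of a subordinated bond bear more risk than holders of senior debt."
)
terms = candidate_terms(sample)
for term, freq in sorted(terms.items()):
    print(f"{freq}  {term}")
```

The output of such a pass is only a list of candidates; deciding which of them are actual terms, and documenting them in a term base, remains the human part of the job.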


Reality check

How much time can translators — be they freelance or in-house linguists — really spend on terminology, if we consider the productivity level and the strict deadlines that are imposed by the various parties involved in a translation project?

From experience we know that translators hardly have the time to quickly click on the concordance option in a CAT tool to browse through the translation memory they were given and add terms with a second click to a given term base. We also know that the exchange of term bases from one CAT tool to another will bring loss of metadata, import problems, wasted time and, in general, a severe headache.

A lesson from lexicography

In 2007 Erin McKean, a lexicographer and editor for the Oxford American English Dictionary, gave an enthusiastic TED Talk on the joys of lexicography. Her objective was clear even for a layman: the creation of an online dictionary collecting not only all the traditionally accepted words and definitions, but also new words and new uses for old words. The talk became a huge success.

In 2015, Ms. McKean heads the world’s biggest online English dictionary by number of words. Example sentences are pulled from major news media (the Wall Street Journal, to name one) and from books available in the public domain (Project Gutenberg and the Internet Archive), as well as from other sources across the web, including less conventional ones, like blogs. The website also offers all sorts of information on each word: synonyms, hypernyms, hyponyms, words used in the same context, a reverse dictionary, and tags.

Of course, there are differences between lexicography and terminology. One might suffice for all: while the former is descriptive, the latter tends to be more normalizing, if not prescriptive. But Ms. McKean’s dictionary is pointing us in the right direction: collaborative, cloud-based translation environments that allow the sharing of linguistic data (in the form of translation memories and term bases) coming from all the parties involved in a translation project are the best way forward.

A role for Europe

If it is true that terminology plays a crucial role in accessing and managing information, not much effort has been made so far to promote terminology and translation knowledge, as well as acknowledge their importance and value.

The Old Continent is where standardization was born and is still home to translation studies and to research, staffing, and resource organizations. And yet, most efforts have focused on updating terminology and translation standards and issuing new ones, without giving evidence of their actual impact, if any, on the evolution of society.

Like translation, terminology is a complex, time-demanding, knowledge-intensive task, and it is hard to show its cost effectiveness and to get as many people as possible to take an interest in it and to see, exploit, and acknowledge its benefits.

Perhaps potential users could benefit from the definition and actual dissemination of basic criteria and requirements for using terminology and profiting from it. They could hardly be interested in theory, even when it relates to methods and applications.

Missed opportunities

As we write, a controversy is raging over the insolvency of four Italian regional banks. Many unknowing customers of these banks were pushed to buy subordinated bonds, and eventually lost their life savings.

IATE has three entries for ‘obbligazione subordinata,’ all marked as reliable, whose definitions are mostly overlapping and inconsistent with ‘standard’ methodology.

The only entry available in Wikipedia, in English, is for ‘subordinated debt,’ with the equivalent, in Italian, of ‘debito non garantito’ (junior debt), containing a reference to an obscure ‘credito chirografario’ (unsecured debt, in English, in IATE).

This is solid evidence of the importance of terminology and of terminological resources. But how many non-linguists, and maybe even linguists, know of the existence of IATE?

And yet, this is not an isolated case. Fifteen years ago, at Linate Airport in Milan, Italy, a SAS airliner carrying 110 people collided on take-off with a business jet carrying four people bound. All 114 people on both aircraft were killed, as well as four ground personnel. Investigations identified a number of deficiencies in airport procedures, including violations of ICAO regulations on the part of air traffic controllers, ranging from uncorrected incorrect read-backs to the usage of non-standard phraseology in communications, with a specific irrelevant term, extension, leading to a fatal misunderstanding.

All this calls into question the weight and trustworthiness of terminology standards. We also need to mention that neither ISO nor the other standard-setting bodies provide any public term base whatsoever.

In a 2001 report for the now long-defunct LISA titled Terminology Management in the Localization Industry, author Kara Warburton somberly noted that “Globally active organizations whose core business is not communications-related (translation, localization, information management, etc.) are generally unaware of the benefits of performing terminology management.” More recently, a Common Sense Advisory survey revealed that only 41 percent of localization-mature organizations have some terminology management policy in place, almost solely translation-oriented.

Things do not seem to have changed much since then.

Ten years ago, in an article in volume 13 issue 3 of KMWorld titled The high cost of not finding information, Susan Feldman reported that, in 2001, IDC began to gather data on the costs an organization has to face when it doesn’t find the information needed. IDC’s study showed that knowledge workers spent 15% to 35% of their time searching for information, that searches were successfully completed 50% of the time or less, and that only 21% of workers found the information they needed 85% to 100% of the time. The time spent looking for information and not finding it cost an organization a total of $6 million a year, not including opportunity costs or the costs of reworking the existing information that could not be located. Reworking the information that was not found cost that organization a further $12 million a year (15% of time spent in duplicating existing information). The opportunity cost of not locating and retrieving information amounted to more than $15 million per year.
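
Taken together, the three figures reported above can be added up as a rough lower bound. The sketch below does only that arithmetic; the variable names are illustrative, and treating the three cost categories as additive is an assumption based on the wording of the summary above.

```python
# Yearly costs of not finding information, as quoted in the text (USD).
wasted_search_usd = 6_000_000         # time spent searching without success
rework_usd = 12_000_000               # re-creating information that existed
opportunity_usd_at_least = 15_000_000  # "more than $15 million": a lower bound

# Assumption: the three categories do not overlap and can simply be summed.
total_usd_at_least = wasted_search_usd + rework_usd + opportunity_usd_at_least

print(f"Lower bound on yearly cost: ${total_usd_at_least:,}")
print(f"Rework alone is {rework_usd / wasted_search_usd:.0f}x the wasted search time")
```

On these figures, the organization described loses at least $33 million a year, and the wasted search time, the most visible item, is the smallest of the three.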

Also, in a study for the EU-funded MULTIDOC project in 2010, Jörg Schütz and Rita Nübel claimed that terminology has a cost multiplier of 10 for localization and of 20 for maintenance.

Terminology management can be extremely costly in the short term, especially for a localization-negligent organization. According to a JD Edwards study presented at the TAMA conference in Antwerp in February 2001, one terminological entry cost $150.00.

Again, this data could generally be considered valid today.

Terminology is indeed a (rare) commodity: useful but expensive, because it requires considerable resources. It should be easy to understand that terminology work and management must be sustainable, and that this requires the ability to estimate revenues.

Google research showed that Google saves an average of fifteen minutes per query (compared with the library, once you are in the library). Averaged over time, this amounts to a saving of 3.75 minutes per day; at the average hourly wage of Americans ($22), this works out to about $500 per adult worker per year.
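
The arithmetic behind the $500 figure can be retraced in a few lines. This is a sketch under stated assumptions: the three input figures come from the text, while counting 365 days per year is an assumption made here to reproduce the quoted result, and the variable names are illustrative.

```python
# Figures quoted in the text.
MINUTES_SAVED_PER_QUERY = 15
MINUTES_SAVED_PER_DAY = 3.75
HOURLY_WAGE_USD = 22

# Assumption: the saving accrues every calendar day of the year.
DAYS_PER_YEAR = 365

# Implied query rate: 3.75 / 15 = 0.25, i.e. one such query every four days.
queries_per_day = MINUTES_SAVED_PER_DAY / MINUTES_SAVED_PER_QUERY

hours_saved_per_year = MINUTES_SAVED_PER_DAY * DAYS_PER_YEAR / 60
value_per_year_usd = hours_saved_per_year * HOURLY_WAGE_USD

print(f"Implied queries per day: {queries_per_day}")
print(f"Hours saved per year:    {hours_saved_per_year:.1f}")
print(f"Value per year:          ${value_per_year_usd:.0f}")
```

Under these assumptions the yearly value comes to roughly $502, consistent with the "about $500" quoted above.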

Consider now the IBM estimates saying that it would take a doctor 160 hours of reading each and every week just to keep up with relevant new literature, and how this task can be made easier with proper indexing.

Many potential terminology users may not be very interested in the standards themselves, but rather in the associated terminology. Of the hundreds of standards available at ISO and regional standards bodies, more than half contain terminology. This could then be harmonized, structured, and made publicly and freely available.

And yet, no speaker at the most prominent event on terminology in Europe, the TOTh Workshop, hosted this year by the Terminology Coordination Unit of the European Parliament, dealt with the issue of making terminology a popular topic and discipline, let alone its cost.

In November, the European Association for Terminology (EAFT) will celebrate its 20th anniversary in the historical first hemicycle of the European Parliament with a flashback on the activity in terminology in the past 20 years. During the event, a prize will be awarded to the best thesis on terminology. Rather than financing mammoth DGT-oriented educational programs with the typical EU regulatory aim (have you ever heard of the bendy banana law?), the DGT could fund a program for the consolidation of the many dust-collecting terminological archives scattered all along the Old Continent in its innumerable universities. This program could be entrusted to a pool of outstanding graduates from the universities feeding the ranks of underpaid DGT interns.

On the other hand, the DGT has been doping the European language industry for decades, and academic institutions have vied to conform to its needs, thus breeding flocks of mostly inadequate would-be translation professionals and feeding them the illusion of brilliant careers and well-paid jobs.

The DGT is the largest translation service in the world. It overpays and pampers its employees while underpaying freelancers, sometimes even vexing them with absurd claims and harsh remarks, and, at the same time, it offers unashamed academics who stay away from market reality a rare chance to chart unlikely academic paths.

DGT’s quota is 1% of the overall EU budget, an amount of money that is usually enough to cover important expenses in almost all EU member states and in many advanced economies.

The average productivity of a DGT translator is below roughly 800 words per day (obtained by dividing the total in-house volume of words produced by the DGT by the number of translators). This is less than a third of the average productivity of an experienced freelance professional, at a cost that is at least ten times higher.

Recent estimates put the outsourced share of translation production at around 26%, corresponding to roughly 150 million words, thus significantly impacting the EU translation market, especially for minor language combinations.

Not by translation alone

Basic strategic planning involves estimating the market size, the growth rate of both the market and the business, as well as the investments required to achieve the business goals (market share, revenues, position, reputation, etc.).

This indulgence towards EU institutions allowed the European translation community to elude any strategic planning, in the vain belief that EU institutions would run all the necessary research that could then allow buyers and providers to succeed even beyond local boundaries.

Any research effort should consider the market at large, spot and analyze unmet demand, and identify any signs of change. Actually, this is a job for the many industry organizations based and operating mostly in Western Europe. But even umbrella organizations are almost inactive in this respect, while the pulverization (that is, the extreme fragmentation) of representation reflects the intrinsic weakness of the industry.

Pulverization is also at the origin of the lack of innovation in the translation business, together with a disinclination to collaborate and the highly conservative nature of the players. Even the regular mergers and acquisitions have never reduced pulverization nor produced any real innovation. They aim at complementing customer bases and offerings and at extending market penetration, very seldom at acquiring greater financial strength to fund innovation. Innovation is viewed as an inescapable evil, yet it is necessary, though not sufficient. Europe, especially Western Europe, has been lagging behind on this front.

As Mariana Mazzucato brilliantly explained in The Entrepreneurial State, real innovation cannot exist without public involvement. And yet, for years, the DGT has been striving to justify its expenditures (and existence) rather than to illustrate its goals and merits, while most EU-funded projects remain unknown or ignored, and/or produce no spin-offs.

The last real innovation in translation was translation memories, a quarter of a century ago. And yet, just think, they were born in Europe. Even TMS are a very peculiar, abridged application of project management software, far from workflow management systems, which remain extraneous to the translation business, although they could be a leap forward. And, again, they were born in Europe too. The same goes for Moses, the open-source SMT engine.

Why? Make an educated guess.

Today, the translation community in Europe is still at a navel-gazing stage, especially in the academic field, but not only. Quality is a perfect example. It is a most debated topic and yet it is still at the I-know-it-when-I-see-it and error-catching stage, affected by an incurable red-pen syndrome. In the best case, the best minds are working on yet another quality standard, some other metrics, and some fashionable application to count errors.

The future is past

In the last two decades, the ability to effectively use and integrate the wide range of software tools forming the typical translator’s toolbox has become pivotal. Today, translating is less and less a question of language knowledge and more and more one of knowing how to use that knowledge and the right tools to exploit it. The integration of machine translation into the now widespread, comprehensive, and increasingly mundane translation tools is making machine translation and post-editing part of a translator’s daily job.

Last year definitively established data as the lifeline of our online existence. With hardware increasingly being commoditized and software simply a click away, data is gold. Machine learning technologies are revolutionizing everything, from image recognition to voice transcription to machine translation. These technologies require massive amounts of training data.

Translators will have to be able to build parallel corpora; produce, access, and use (big) data; and process unstructured datasets to mine, produce, and manage rich terminology data. Yet formal translation education still does not consider linguistic data and its manipulation from an innovative perspective.

Terabytes of translation data are produced in Europe every year. But, as Andrew Joscelyne and Anna Samiotou explained in the TAUS Translation Data Landscape Report, data sources are heterogeneous and unbalanced in terms of language coverage, and private owners can be reluctant to give their translation data away for free or even to open source it. Traditional public sources of translation data are already no longer enough. Incentives are necessary for a translation open data project that prevents any conflicts of interest.

Futurists, visionaries and wishful thinkers

The translation community remains rather closed and definitely conservative. Business models and production processes remain unchanged, together with the diffidence towards innovation. Yet many business scholars argue that innovation is not coming up with something big and new, but instead recombining things that already exist.

Maybe, when advocating innovation in the translation industry, most insiders are just indulging in some wishful thinking. Still too often, translation is depicted as a highly technical and dynamic process requiring both human and technological involvement, complicated to the point that no step can be deemed definitely removable or absolutely necessary. Now, technology is already playing a growing role in every area of everyday (working) life, and translation technologies will certainly replace a certain way of applying knowledge.

Despite any autosuggestion effort, translation is still scarcely recognized perhaps because demand is prompted by factors other than those traditionally proposed by industry players; more than quality, customers seem to be increasingly interested in accessibility, convenience, price, and speed. These last two factors seem to be most decisive, while most customers are seemingly disoriented by the absence of a fair balance of efficiency, ease of integration, convenience, and return on investment.