This article was originally published on the Wordbee Blog. Wordbee are the makers of the popular translation management system and CAT tool for translators.
During a recent Wordbee panel on terminology management, held to launch our new terminology management tool, one of the questions raised was how to start building a terminology database and how to do so efficiently.
There is no need to start from scratch when harvesting terminology: you can start with your translation memories. In fact, a translation memory (TM) is a trove of valuable terminology that can be mined either manually or semi-automatically. With small TMs, you can browse the segments and pick out the terms you need using the concordance function. With larger TMs, you may prefer to work semi-automatically and take advantage of the various free and open-source term-mining programs available.
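To make the semi-automatic approach more concrete, here is a minimal Python sketch of the simplest kind of term mining: counting recurring word n-grams across TM segments and discarding those that begin or end with a function word. It assumes the source-language segments have already been exported from the TM as plain strings; the function name `candidate_terms`, the deliberately tiny `STOPWORDS` list and the default thresholds are illustrative choices, not the method used by any particular term-mining program.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be far longer.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "for", "and", "or",
             "is", "are", "be", "with", "your", "you", "can", "this", "that"}


def candidate_terms(segments, max_n=3, min_freq=2):
    """Count word n-grams (1..max_n words) that neither start nor end with a stopword."""
    counts = Counter()
    for segment in segments:
        # Lowercase and split into words, keeping hyphenated words together.
        words = re.findall(r"[\w-]+", segment.lower())
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    # Keep only candidates seen at least min_freq times, most frequent first.
    return [(term, c) for term, c in counts.most_common() if c >= min_freq]
```

Dedicated term-extraction tools usually add linguistic filters (for example part-of-speech patterns) and stronger statistics than raw frequency, so treat the output of a sketch like this as a list of candidates to review, not as finished term entries.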
Translation memories may also contain free-range definitions that can help you fill in term entries. You can spot free-range definitions (not official ones, mind you) quite easily within a text, because authors and copywriters often use ready-made phrases such as “also known as”, “which means” or “meaning” to signal that an unfamiliar word or a technical term is being introduced.
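Spotting these marker phrases is easy to automate. The sketch below assumes the TM segments are available as a list of strings and simply flags every segment that contains one of the phrases quoted above; `MARKERS` and `definition_candidates` are hypothetical names, and the hits still need a human eye, since a word like “meaning” is often used in other senses.

```python
import re

# The marker phrases quoted above; extend the tuple for other wordings or languages.
MARKERS = ("also known as", "which means", "meaning")

# Word boundaries on both sides avoid matching inside longer words such as "meaningful".
MARKER_RE = re.compile(
    r"\b(" + "|".join(re.escape(m) for m in MARKERS) + r")\b",
    re.IGNORECASE,
)


def definition_candidates(segments):
    """Yield (segment, marker) for each segment that contains a definition marker."""
    for segment in segments:
        match = MARKER_RE.search(segment)
        if match:
            yield segment, match.group(1).lower()
```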
Once you have processed your TM, you can also beef up your termbases by collecting terms from other freely available online databases.
The Internet is, of course, a wonderful trove where you can find everything and more, and terminological sources are just a click away. But, just as there is fake news, there are unreliable, shoddy sources out there too. Here are some basic criteria to help you choose terminology sources.
Also, to point you in the right direction, the Wordbee team has selected 10 of the best terminology sources available online.
Please do send us your suggestions and let’s make this list grow.
IATE
IATE (Interactive Terminology for Europe) is the most convenient starting point. It is the interinstitutional terminology database of the European Union and has been used in the EU institutions and agencies since summer 2004. The IATE website is managed by the EU Translation Centre in Luxembourg on behalf of the project partners. The FAQ page contains intriguing information about the work going on behind the scenes (who enters terminology, how terms are selected, etc.).
The main benefit is that IATE is available for download: users can extract terms for one or more languages, for one specific domain or for a domain cluster. The subsets are provided in TBX format and can be imported into your own termbase.
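Because TBX is an XML format, you can also inspect a downloaded subset before importing it. The following sketch uses only Python's standard library and assumes the common TBX layout: concept entries (`termEntry`, or `conceptEntry` in TBX v3) containing language sections (`langSet`, or `langSec` in TBX v3) that hold `term` elements. Real exports carry much more information (definitions, reliability codes, domains) that this sketch ignores, and `read_tbx` is a hypothetical helper, not part of any IATE tooling.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the xml:lang attribute to this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

ENTRY_TAGS = {"termEntry", "conceptEntry"}  # concept entries (TBX 2008 / TBX v3)
LANG_TAGS = {"langSet", "langSec"}          # language sections inside an entry


def local_name(tag):
    """Strip any '{namespace}' prefix so namespaced TBX v3 files match too."""
    return tag.rsplit("}", 1)[-1]


def read_tbx(root):
    """Return one dict per concept entry: {"id": ..., "terms": {lang: [term, ...]}}."""
    entries = []
    for entry in root.iter():
        if local_name(entry.tag) not in ENTRY_TAGS:
            continue
        terms = {}
        for lang_section in entry.iter():
            if local_name(lang_section.tag) not in LANG_TAGS:
                continue
            lang = lang_section.get(XML_LANG, "")
            # tig/ntig/termSec wrappers vary, so look for <term> at any depth.
            for term in lang_section.iter():
                if local_name(term.tag) == "term":
                    text = "".join(term.itertext()).strip()
                    if text:
                        terms.setdefault(lang, []).append(text)
        entries.append({"id": entry.get("id"), "terms": terms})
    return entries
```

For a file on disk you would call something like `read_tbx(ET.parse("iate_subset.tbx").getroot())`, where the file name is only a placeholder.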
WIPO PEARL
WIPO PEARL, the multilingual terminology database of the World Intellectual Property Organization, is a thing of beauty. Developed by WIPO language experts and terminologists, it contains terms in Arabic, Chinese, English, French, German, Japanese, Korean, Portuguese, Russian and Spanish.
Terms are validated and assigned a reliability score. Also, if your search returns no results, the WIPO machine translation engine offers a translation suggestion. But most importantly, WIPO PEARL lets you search by concept or subject field (concept maps).
Microsoft Language Portal
The Microsoft Language Portal is a must for those who develop localized versions of applications to be integrated into Microsoft products. The database can also be used to integrate Microsoft terminology into other termbases or as a basic IT glossary for developing applications.
It is available in approximately 100 languages. Just like IATE, the Microsoft terminology for each language is provided in TBX format.
Extra benefit: from the same website you can also download the Microsoft localization style guides in various languages.
TERMDAT & JURIVOC
Terminology in Switzerland is taken very seriously. The Swiss Federal Administration employs about 200 translators and specialized linguists. Their work is assisted by the Terminology Section, which places the terminology database TERMDAT at their (and our) disposal.
TERMDAT is a multilingual specialist dictionary and contains the terminology of Swiss law, public administration and the public sector in Switzerland’s four national languages, i.e. German, French, Italian and Romansh, as well as in English.
The entries explain legal and administrative terminology and, in most cases, the equivalent term is also given in the other national languages. Besides terms and their definitions and explanatory notes, TERMDAT contains further useful information such as official designations and abbreviations of Swiss and international authorities, institutions and organisations, and the titles and abbreviations of all federal legislation.
TERMDAT can be complemented with JURIVOC, the trilingual thesaurus of the Swiss Federal Supreme Court.
TERMITE
TERMITE is the Telecommunication Terminology Database of the International Telecommunication Union (ITU), the specialised agency of the United Nations responsible for information and communication technologies. The ITU is also, by the way, one of the oldest international organizations.
TERMITE contains approximately 60,000 entries, comprising all the terms that appeared in previously published lexicons. Although mainly devoted to telecommunication terms, the database, which is constantly updated, also covers other technical fields as well as administrative and financial terms. The entries are in English, French, Spanish and sometimes Russian (transcribed), with a small number of entries in Italian, German and Portuguese.
Electropedia
Like TERMITE, Electropedia is a specialized terminology database. It is curated by the International Electrotechnical Commission (IEC), the international standards organization that prepares and publishes international standards for all electrical, electronic and related technologies.
Electropedia contains over 22,000 terminological entries in English and French, organized by subject area. It also offers equivalent terms in various other languages: Arabic, Chinese, Czech, Finnish, German, Italian, Japanese, Korean, Norwegian (both Bokmål and Nynorsk), Polish, Portuguese, Russian, Serbian, Slovenian, Spanish and Swedish.
ECHA-term
ECHA, the European Chemicals Agency, manages the technical and administrative aspects of implementing the European Union regulation on the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH), which aims to protect human health and the environment.
ECHA-term is a dynamic database, constantly updated thanks to the work of subject-matter experts and linguists. The database contains over 1,200 entries in all official EU languages, with the main REACH terms also available in Croatian.
There is a customizable download option: Excel or TBX format; the whole database or even a single entry; monolingual, bilingual or multilingual.
GEMET
GEMET stands for General Multilingual Environmental Thesaurus. It is considered the reference vocabulary of the European Environment Agency (EEA) and is available in 32 languages.
Users can access terms in three main ways: a hierarchical listing (terms are distributed within a conceptual structure formed by four main supergroups, which in turn are divided into some 32 groups); a thematic listing (40 themes); and a more traditional alphabetical listing. Labels and definitions can be downloaded in RDF format.
AGROVOC
The Food and Agriculture Organization (FAO) publishes AGROVOC, a terminology database containing over 36,000 concepts available in 36 languages.
All areas of interest to FAO are covered in the database: food, nutrition, agriculture, forestry, fisheries, names of animals and plants, environment, biological notions, techniques of plant cultivation, etc. The thesaurus is hierarchically organized under 25 top concepts. Users can search by alphabetical or hierarchical listing.