Glossary Conversion Details

Overview

TBX (TermBase eXchange) is a family of XML-based languages for the interchange of terminological information. Each member of the family is called a TML (Terminological Markup Language), or informally a "dialect" of TBX. All TMLs share a core structure, in which information is represented on one of three structural levels: concept, language, and term. Concept entries contain language entries, which in turn contain entries for individual terms. The core structure also provides a set of generic elements for attaching descriptive and administrative information to these entries. Different TMLs can employ these generic elements differently.
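The three-level nesting described above can be pictured with a small data model. This is only an illustrative sketch in Python: the class and field names (ConceptEntry, LanguageEntry, TermEntry, info) are invented for this example and are not TBX element names or part of convert_glossary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TermEntry:
    """Term level: one individual term."""
    term: str
    # Stand-in for the generic descriptive/administrative elements.
    info: Dict[str, str] = field(default_factory=dict)


@dataclass
class LanguageEntry:
    """Language level: groups the terms of one language."""
    language: str
    terms: List[TermEntry] = field(default_factory=list)
    info: Dict[str, str] = field(default_factory=dict)


@dataclass
class ConceptEntry:
    """Concept level: groups the language entries for one concept."""
    concept_id: str
    languages: List[LanguageEntry] = field(default_factory=list)
    info: Dict[str, str] = field(default_factory=dict)


# One concept, expressed in two languages, with one term each.
entry = ConceptEntry(
    concept_id="c1",
    languages=[
        LanguageEntry("en", [TermEntry("dog", {"part_of_speech": "noun"})]),
        LanguageEntry("fr", [TermEntry("chien", {"part_of_speech": "noun"})]),
    ],
)
```

Here every level carries its own info mapping, mirroring the point that descriptive and administrative information can be attached at the concept, language, or term level.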

TBX-Glossary is one such TML, designed to support the interchange of glossary data among several formats: UTX-Simple, GlossML, the TBX family, and OLIF. Its expressive capacities are intentionally limited; it is designed to express only such essential data as can be unambiguously represented in all of these formats. This design goal is the main point differentiating TBX-Glossary from other standard TMLs such as TBX-Basic (intended to serve the most common needs in localization) or TBX-Default (intended to provide a broad array of terminological data categories taken from ISO 12620).

The convert_glossary program performs this interchange by converting glossary files among these various formats.

Limitations

As stated on the conversion page, you can convert among the four glossary file types named in the overview: UTX-Simple, GlossML, TBX, and OLIF.

Each of the four formats can represent kinds of data that some of the other three cannot. Therefore, not every file in one of these formats can be converted to another format. Moreover, most of the formats require at least one kind of data that another format does not require. To be fully convertible among formats, a file must contain all data that any of the formats requires, and it must not contain any data that any of the formats cannot represent. The data that satisfy both conditions are listed below under Convertible Data.
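The rule in the preceding paragraph amounts to two set operations: the data a file must contain is the union of what each format requires, and the data it may contain is the intersection of what each format can represent. The sketch below illustrates that reasoning only. The format names (format_a, format_b, format_c) and their capability tables are placeholders invented for the example; they do not describe the real formats.

```python
# Placeholder capability tables. These are NOT the real requirements of
# UTX-Simple, GlossML, TBX, or OLIF; they only illustrate the set logic.
REQUIRED = {
    "format_a": {"term"},
    "format_b": {"term", "part_of_speech"},
    "format_c": {"term", "subject_field"},
}
REPRESENTABLE = {
    "format_a": {"term", "part_of_speech", "subject_field", "definition"},
    "format_b": {"term", "part_of_speech", "subject_field", "definition",
                 "usage_status"},
    "format_c": {"term", "part_of_speech", "subject_field", "definition"},
}


def convertible_profile(required, representable):
    """Return (must_have, may_have) for full convertibility."""
    # Must contain everything that ANY format requires.
    must_have = set().union(*required.values())
    # May contain only what EVERY format can represent.
    may_have = set.intersection(*representable.values())
    return must_have, may_have


def is_fully_convertible(categories, required, representable):
    """True if the file's data categories fit between the two bounds."""
    must_have, may_have = convertible_profile(required, representable)
    return must_have <= categories <= may_have
```

With these placeholder tables, a file holding term, part_of_speech, and subject_field passes; adding usage_status fails because format_a and format_c cannot represent it, and dropping subject_field fails because format_c requires it.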

Convertible Data

Glossary-wide
    Mandatory
        source and target language
        subject field
    Optional
        glossary note
Per-entry
    Mandatory
        source and target term
        source and target part of speech
    Optional
        source note
        source and/or target definition
        source and/or target definition source citation
        source and/or target contextual example
        source and/or target contextual example source citation

Data placed in these categories should be in plain text, without XML-like markup or tab characters. For details on how these data categories are represented in each format, see the corresponding page linked below:

If the input file violates these requirements, the converter program will emit a warning. It may then stop the conversion process, or it may proceed with a best-effort attempt. The existence of an output file is therefore not evidence of success; only a run that produces no warnings is.
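Because the converter may still write an output file after a warning, it can help to check a glossary against the requirements above before converting it. The following is a minimal sketch of such a pre-check, not the converter's own logic. The key names (source_language, source_pos, and so on) and the in-memory layout of one header dictionary plus a list of entry dictionaries are assumptions made for the example, and the markup test is a rough heuristic.

```python
import re

# Hypothetical key names for the mandatory categories listed above.
GLOSSARY_MANDATORY = ("source_language", "target_language", "subject_field")
ENTRY_MANDATORY = ("source_term", "target_term", "source_pos", "target_pos")

# Heuristic for "XML-like markup": something shaped like a tag or an entity.
MARKUP = re.compile(r"</?[A-Za-z][^>]*>|&#?\w+;")


def text_problems(name, value):
    """Report tab characters and XML-like markup in one value."""
    problems = []
    if "\t" in value:
        problems.append(f"{name}: contains a tab character")
    if MARKUP.search(value):
        problems.append(f"{name}: contains XML-like markup")
    return problems


def check_glossary(header, entries):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for key in GLOSSARY_MANDATORY:
        if not header.get(key):
            problems.append(f"glossary: missing {key}")
    for name, value in header.items():
        problems += text_problems(f"glossary.{name}", value)
    for i, entry in enumerate(entries, start=1):
        for key in ENTRY_MANDATORY:
            if not entry.get(key):
                problems.append(f"entry {i}: missing {key}")
        for name, value in entry.items():
            problems += text_problems(f"entry {i}.{name}", value)
    return problems


header = {"source_language": "en", "target_language": "fr",
          "subject_field": "zoology"}
entries = [{"source_term": "dog", "target_term": "<b>chien</b>",
            "source_pos": "noun"}]
for problem in check_glossary(header, entries):
    print(problem)
```

An empty result from a check like this does not guarantee a clean conversion, since it covers only missing mandatory categories and the plain-text restriction, not the structural differences among the formats.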

(By "kinds of data" above we mean both data categories and broader, structural qualities of the glossary. The four formats embody different models of what a glossary is, and conversion requires common ground on these modeling concerns just as it requires agreement on required and permitted data categories. Thus the seemingly vague phrase.)