Translation memory: what is a segment?

A translation memory (TM) is a software system that breaks a source text down into units known as ‘segments’ and builds a database of equivalent segments in different languages.

A segment is the basic semantic unit of a text. Although a segment could be an entire sentence, it is more usually a small group of words: ‘the red house’, for example, or ‘eighty-three’.

Once the TM has split the text into segments, the translator goes to work, translating them one by one. Each time a segment is translated, the TM ‘learns’ it by storing the source segment together with its translation, and the next time a text is fed into that TM, it searches for any segments it has already learned. For example, the TM learns that ‘the red house’ means ‘la casa roja’ in Spanish; the next time it comes across ‘the red house’ in an English-to-Spanish translation, it automatically suggests the translation it has already seen.
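To make the ‘learning’ step concrete, here is a minimal sketch of an exact-match TM in Python. It assumes the memory is nothing more than a dictionary of source segments and their translations; the class name, the method names and the normalisation rule (lower-casing and collapsing whitespace) are illustrative assumptions, not how any particular TM product works.

```python
class TranslationMemory:
    """Toy exact-match TM: maps source segments to stored translations."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _normalise(segment):
        # Assumption: ignore case and repeated whitespace when comparing.
        return " ".join(segment.lower().split())

    def learn(self, source, target):
        """Store a translated segment so it can be suggested later."""
        self._store[self._normalise(source)] = target

    def lookup(self, source):
        """Return the stored translation, or None if the segment is new."""
        return self._store.get(self._normalise(source))


tm = TranslationMemory()
tm.learn("the red house", "la casa roja")

suggestion = tm.lookup("The red  house")   # same segment, different spacing
missing = tm.lookup("the blue house")      # never seen before
print(suggestion, missing)
```

A real TM would also record the language pair, the client and the date alongside each entry; those details are left out here to keep the idea visible.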

This suggestion is known as a ‘candidate’. Some TMs only search for identical candidates. Others will also retrieve segments which are merely similar to segments in the source text. If a segment is similar but not identical to one the TM already knows, the TM flags the candidate as a ‘fuzzy match’. A fuzzy matching algorithm calculates how similar the already-translated segment (the fuzzy match) is to the segment in the source text, and indicates the degree of similarity, often as a percentage score or a colour code.
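As a rough illustration of fuzzy matching, the sketch below scores similarity with Python’s standard difflib.SequenceMatcher. This is a stand-in: commercial TMs use their own similarity measures, and the 0.7 threshold, the colour bands and the function names here are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a score between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(segment, memory, threshold=0.7):
    """Return (source, target, score) for the closest stored segment,
    or None if nothing reaches the (assumed) threshold."""
    best = None
    for source, target in memory.items():
        score = similarity(segment, source)
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best


def colour_band(score):
    """Map a score to a colour code (bands are illustrative only)."""
    if score == 1.0:
        return "green"    # identical candidate
    if score >= 0.85:
        return "yellow"   # close fuzzy match
    return "orange"       # distant fuzzy match


memory = {"the red house": "la casa roja"}

match = best_match("the red houses", memory)
no_match = best_match("completely different text", memory)
print(match, no_match)
```

Here ‘the red houses’ is close enough to the stored ‘the red house’ to be offered as a fuzzy candidate, while the unrelated segment falls below the threshold and returns nothing.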

Having fed the source text into the TM, the translator then has various possible ways to deal with candidates, fuzzy or otherwise. In the case of an identical candidate, they will often have to do no more than check it before they click ‘accept’; a fuzzy candidate, on the other hand, generally requires a closer analysis and some adjustment before it is accepted.

Let’s take an example. Imagine that the TM already recognised the segment ‘Dear Sir’; if you entered another document which contained the segment ‘Dear Sir/Madam’, it would suggest the translation that it had already learned for ‘Dear Sir’, indicating that its suggestion was a fuzzy match. The translator would then decide whether to translate the new segment entirely from scratch, or adapt the TM’s suggestion. In this case, they would probably take the fuzzy match and add the relevant extra word.

Segments which have no existing match in the TM must, of course, be translated ‘manually’. Once this has been done, the freshly translated segment is stored in the TM and used again in future texts, as well as later on in the text at hand. This means that if a segment or group of segments is used more than once in the same text (and the TM has not seen it previously), all the translator has to do is translate the first occurrence of that segment, and the TM will automatically suggest the match each time it occurs later on in the same text.
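The way a repeated segment only needs translating once can be sketched as follows. This is a simplified illustration: the function name is hypothetical, the translator is represented by a plain callback, matches are exact only, and the sketch accepts every TM suggestion automatically, whereas in practice the translator would check each one.

```python
def translate_document(segments, memory, translate_manually):
    """Translate segments in order, reusing the memory wherever possible.

    `memory` is a dict of source -> target and is updated in place, so a
    segment translated early in the text is available later in the same text.
    """
    results = []
    for segment in segments:
        if segment in memory:
            # Already known: the TM supplies the stored translation.
            results.append((segment, memory[segment], "from TM"))
        else:
            # New segment: the translator does the work once.
            target = translate_manually(segment)
            memory[segment] = target
            results.append((segment, target, "manual"))
    return results


# Stand-in for the human translator, recording each segment it is asked for.
human = {"the red house": "la casa roja", "eighty-three": "ochenta y tres"}
calls = []


def human_translator(segment):
    calls.append(segment)
    return human[segment]


segments = ["the red house", "eighty-three", "the red house"]
results = translate_document(segments, {}, human_translator)
for row in results:
    print(row)
```

The translator is asked about ‘the red house’ only once; its second occurrence is filled in from the memory.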

As part of the quality control process, and to ensure that mistakes are not buried inside the TM and repeated in future documents, a project manager should check the newly translated segments before they are used to update the TM.

A TM can be built for each customer that a translation agency works with regularly. This TM is used exclusively with that customer’s documents, and it quickly ‘learns’ the customer’s preferences.