In the field of translation studies, a bitext is a merged document composed of both source- and target-language versions of a given text.
Bitexts are generated by a piece of software called an alignment tool, or a bitext tool, which automatically aligns the original and translated versions of the same text. The tool generally matches these two texts sentence by sentence. A collection of bitexts is called a bitext database or a bilingual corpus, and can be consulted with a search tool.
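The sentence-by-sentence matching described above can be illustrated with a minimal sketch. It assumes that both texts contain the same number of sentences in the same order and that sentences end in ".", "!" or "?"; an actual alignment tool must also cope with sentences that are merged, split, omitted or reordered in translation. The function names `split_sentences` and `align_one_to_one` are hypothetical and do not belong to any particular tool.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def align_one_to_one(source_text, target_text):
    """Pair the n-th source sentence with the n-th target sentence.

    Assumption: the translation keeps a strict one-to-one sentence
    correspondence. If the counts differ, the sketch gives up rather
    than guess at an alignment.
    """
    src = split_sentences(source_text)
    tgt = split_sentences(target_text)
    if len(src) != len(tgt):
        raise ValueError(
            f"cannot align one-to-one: {len(src)} source vs {len(tgt)} target sentences"
        )
    return list(zip(src, tgt))


# Hypothetical sample texts.
bitext = align_one_to_one(
    "The cat sleeps. The dog barks.",
    "Le chat dort. Le chien aboie.",
)
for source_sentence, target_sentence in bitext:
    print(source_sentence, "|", target_sentence)
```

The result is an ordered list of sentence pairs, which is the simplest form a bitext can take.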
History
The idea of the bitext is attributed to Brian Harris, who first wrote a paper on the concept in 1988. It has since been promoted by RALI (Recherche appliquée en linguistique informatique, or Applied Research in Computational Linguistics), a group of computer scientists and linguists based at the Université de Montréal who study natural language processing. Pierre Isabelle and Claude Bédard are noted promoters of the concept.
Bitexts and translation memories
The concept of the bitext shows certain similarities with that of the translation memory. The main difference is that a translation memory is a database whose segments (matched sentences) are stored independently of their original context, so the original sentence order is lost, whereas a bitext retains the original sentence order.
Bitexts are designed to be consulted by a human translator, not by a machine. Consequently, small alignment errors or minor discrepancies that would cause a translation memory to fail are of no importance.
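The structural difference described in this section can be sketched in a few lines. The example is purely illustrative: it assumes a bitext can be modelled as an ordered list of sentence pairs and a translation memory as a lookup table keyed by source segment. The sample sentences and the function `search_with_context` are hypothetical and do not reflect the storage format of any actual tool.

```python
# Hypothetical bitext: sentence pairs kept in their original document order.
bitext = [
    ("Open the valve.", "Ouvrez la vanne."),
    ("Wait ten seconds.", "Attendez dix secondes."),
    ("Close it again.", "Refermez-la."),
]

# Translation-memory-like store: each segment is looked up on its own,
# with no record of where it stood in the document.
translation_memory = {source: target for source, target in bitext}


def search_with_context(pairs, term, window=1):
    """Return each pair whose source contains `term`, with its neighbours."""
    hits = []
    for i, (source, _) in enumerate(pairs):
        if term.lower() in source.lower():
            start = max(0, i - window)
            hits.append(pairs[start:i + window + 1])
    return hits


# Consulting the bitext shows the hit together with the preceding sentence.
for passage in search_with_context(bitext, "close"):
    for source, target in passage:
        print(source, "|", target)

# The segment lookup returns only the isolated translation.
print(translation_memory.get("Close it again."))
```

In the bitext, a translator who finds "Close it again." can read the preceding sentences to see what "it" refers to; the segment lookup returns the stored translation with no surrounding text.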