Most commercial translation memory applications now include an aligner as part of a suite of tools. Aligners are used to produce translation memory files (e.g. in the industry-standard TMX format) from legacy translations and their corresponding source files. The resulting file can then be used within a translation memory application, which provides easy access to the source/target segment pairs in the translation memory file.
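As a rough idea of what "access to the source/target segment pairs" involves, the Python sketch below pulls pairs out of a TMX document using only the standard library's `xml.etree.ElementTree`. It assumes the usual `<tu>`/`<tuv>`/`<seg>` structure. The function name `read_tmx_pairs` is hypothetical, and unlike a real translation memory application it naively flattens any inline markup inside `<seg>` to plain text.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs from a TMX document string.

    Language codes are compared case-insensitively but otherwise exactly,
    so "en" will not match "en-US".
    """
    src_key, tgt_key = src_lang.lower(), tgt_lang.lower()
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older TMX versions used a plain "lang" attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # Naively flatten any inline markup to its text content.
                segs[lang.lower()] = "".join(seg.itertext())
        # Keep only translation units that contain both requested languages.
        if src_key in segs and tgt_key in segs:
            pairs.append((segs[src_key], segs[tgt_key]))
    return pairs


if __name__ == "__main__":
    sample = (
        '<tmx version="1.4"><header/><body>'
        '<tu><tuv xml:lang="en"><seg>Hello.</seg></tuv>'
        '<tuv xml:lang="de"><seg>Hallo.</seg></tuv></tu>'
        "</body></tmx>"
    )
    print(read_tmx_pairs(sample, "en", "de"))
```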
Although the advantage of aligning relevant legacy texts is obvious, a major drawback is that such tools generally align mechanically rather than intelligently. Where the source and target texts have different numbers of segments (e.g. because the original translator merged two sentences into a single sentence in the translation), the resulting translation memory file is misaligned. Aligners generally provide an interface through which such misalignments can be corrected manually. However, this can be a time-consuming process, and it is usually worthwhile only when the legacy translation is known to be useful, such as when the original source text has been modified and a new translation of it is required.
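To illustrate why a merged sentence throws off naive one-to-one pairing, and how an aligner can compensate, here is a minimal Python sketch of length-based alignment by dynamic programming. It is loosely modelled on the Gale-Church observation that sentence lengths in characters correlate across a translation, but the cost function (absolute length difference plus fixed penalties for anything other than a 1:1 pairing) is a simplification chosen for illustration. The function `align_by_length` and its penalty values are hypothetical and are not taken from any of the tools described here.

```python
def align_by_length(src, tgt, merge_penalty=10.0, skip_penalty=30.0):
    """Align two sentence lists with 1-1, 2-1, 1-2, 1-0 and 0-1 beads.

    Bead cost = |characters on source side - characters on target side|
    plus a fixed penalty for any bead that is not 1-1. This is a crude
    stand-in for the probabilistic cost used by Gale and Church.
    """
    inf = float("inf")
    n, m = len(src), len(tgt)
    # cost[i][j] = cheapest alignment of src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    beads = [(1, 1, 0.0), (2, 1, merge_penalty), (1, 2, merge_penalty),
             (1, 0, skip_penalty), (0, 1, skip_penalty)]
    # Row-major order guarantees cost[i][j] is final before it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + abs(s_len - t_len) + penalty
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace back from the end to recover the beads, joining merged sentences.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((" ".join(src[pi:i]), " ".join(tgt[pj:j])))
        i, j = pi, pj
    return pairs[::-1]


if __name__ == "__main__":
    src = ["The cat sat down.", "It was happy.", "Then it slept."]
    tgt = ["Die Katze setzte sich zufrieden hin.", "Dann schlief sie."]
    print(list(zip(src, tgt)))        # naive 1:1 pairing mispairs sentence 2, drops sentence 3
    print(align_by_length(src, tgt))  # first two English sentences form one bead
```

The penalties control how readily the aligner proposes merges or unmatched segments. Real aligners tune such costs statistically rather than picking round numbers.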
Bitext2tmx
Launched in January 2006, bitext2tmx is a Java application that aligns two plain-text files. It features a clean graphical user interface and self-explanatory functions for splitting and merging cells, among others. Bitext2tmx is distributed under the GNU General Public License and is the work of Susana Santos, with the support of other members of the team associated with Mikel L. Forcada.
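Splitting and merging cells amounts to simple edits on one column of an alignment. The Python sketch below is purely illustrative: the helpers `merge_cells`, `split_cell` and `pair_columns` are hypothetical and are not bitext2tmx's own (Java) code. It only shows how a single merge can bring two columns back into step.

```python
from itertools import zip_longest


def merge_cells(column, index):
    """Merge cell `index` with the cell below it, joining them with a space."""
    if not 0 <= index < len(column) - 1:
        raise IndexError("no cell below to merge with")
    merged = column[index] + " " + column[index + 1]
    return column[:index] + [merged] + column[index + 2:]


def split_cell(column, index, offset):
    """Split cell `index` at character `offset` into two trimmed cells."""
    text = column[index]
    head, tail = text[:offset].rstrip(), text[offset:].lstrip()
    return column[:index] + [head, tail] + column[index + 1:]


def pair_columns(source, target):
    """Pair cells row by row; an unmatched cell is paired with an empty string."""
    return list(zip_longest(source, target, fillvalue=""))


if __name__ == "__main__":
    source = ["The cat sat down.", "It was happy.", "Then it slept."]
    target = ["Die Katze setzte sich zufrieden hin.", "Dann schlief sie."]
    print(pair_columns(source, target))                  # three rows, last target cell empty
    print(pair_columns(merge_cells(source, 0), target))  # two rows, now in step
```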
The screenshot below shows bitext2tmx being used to align the original English text of the GNU General Public License with the German translation produced by Katja Lachmann Übersetzungen and Peter Gerwinski.
LF Aligner
Developed by Hungarian translator András Farkas. Written mainly in Perl; also makes use of other open-source utilities such as hunalign and pdftotext.
LF Aligner is an interactive command-line tool. It promises "intelligent" sentence-level segmenting.
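As a rough idea of what rule-based sentence segmentation involves, the Python sketch below splits after `.`, `!` or `?` when whitespace and a capital letter follow, unless the token ending the candidate sentence is on an abbreviation list. This is not LF Aligner's actual algorithm. The function `split_sentences` and the short English abbreviation list are hypothetical, and practical segmenters use much larger, language-specific rule sets.

```python
import re

# Hypothetical abbreviation list; real segmenters use per-language lists.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Dr.", "Mr.", "Mrs.", "No."}


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split after ., ! or ? followed by whitespace and a capital letter,
    unless the token ending the candidate sentence is a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]\s+(?=[A-Z])", text):
        end = match.start() + 1  # keep the punctuation mark in the sentence
        last_token = text[start:end].split()[-1]
        if last_token in abbreviations:
            continue  # e.g. "Dr. Smith": not a sentence boundary
        sentences.append(text[start:end])
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    print(split_sentences(
        "Dr. Smith wrote the aligner. It runs on Linux! "
        "Does it segment well? Mostly."
    ))
```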
Heartsome TMX Editor
Like bitext2tmx (see above), Heartsome's modestly and perhaps confusingly named "TMX Editor" is a Java-based utility. It is a commercial application, though competitively priced (at the time of writing, US$68/€60 for the personal edition). TMX Editor can align files in the following formats: RTF, HTML, XML, plain text, JavaScript, PO, and all OpenOffice.org formats, and it offers a range of functions that streamline the alignment process and reduce the amount of manual intervention required.
Stingray Document Aligner
An alignment tool from Maxprograms, the company responsible for the Swordfish translation memory application. Stingray is the work of Rodolfo Raya, formerly a programmer at Heartsome, and can align files in a wide range of formats. It currently costs €70, and a free, fully functional 30-day trial version is available.
GMA
A tool for "geometric mapping and alignment". It runs on Java and is licensed under the GPL. (With thanks to Patrick Hall for the tip.)
PlusTools (running on CrossOver Linux)
The PlusTools utility, which is associated with the Wordfast translation memory application, includes an alignment tool and runs nicely on Linux with the aid of CrossOver Office.
Wordfisher aligner (running on CrossOver Linux)
Wordfisher also has its own alignment utility, which some users consider superior to the aligners bundled with expensive commercial CAT tools.
aligner.py
A simple Python script for creating a TMX file from two texts. Written by Dmitri Gabinski.
bligner.py
A simple Python script for creating a TMX file from two texts. Written by Didier Briel.
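For a sense of what such a script has to do at minimum, here is a Python sketch (not the code of either script) that pairs the non-empty lines of two texts one-to-one and writes a TMX 1.4 document with the standard library's `xml.etree.ElementTree`. It assumes one sentence per line in both inputs and refuses to proceed when the line counts differ. The function name `texts_to_tmx` and the header values `creationtool="sketch"` and `o-tmf="none"` are placeholders.

```python
import xml.etree.ElementTree as ET

# Keys in this namespace are serialised by ElementTree as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def texts_to_tmx(src_text, tgt_text, src_lang, tgt_lang):
    """Pair non-empty lines of two texts one-to-one and return a TMX 1.4 string."""
    src_lines = [line.strip() for line in src_text.splitlines() if line.strip()]
    tgt_lines = [line.strip() for line in tgt_text.splitlines() if line.strip()]
    if len(src_lines) != len(tgt_lines):
        # A naive line pairing would silently misalign; stop instead.
        raise ValueError(
            f"segment count mismatch: {len(src_lines)} vs {len(tgt_lines)}")
    tmx = ET.Element("tmx", version="1.4")
    # The attributes TMX 1.4 requires on <header>; values here are placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "none", "adminlang": "en",
        "srclang": src_lang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in zip(src_lines, tgt_lines):
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    print(texts_to_tmx("Hello.\nGoodbye.\n", "Hallo.\nTschuess.\n", "en", "de"))
```

Raising an error on a count mismatch, rather than pairing lines regardless, is a deliberate choice: it surfaces exactly the kind of merged-sentence problem described at the start of this section instead of hiding it in the output file.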
Other resources
Linux and UNIX are popular in the academic world, and university mainframes have been used as the platform for projects in computational linguistics. These projects have spawned a number of alignment tools and multilingual corpora, some of which are publicly accessible.
Multilingual Corpora: Available Resources
Catalogue of resources and links relating to:
Parallel Corpora
Multilingual Corpora
Projects
Parallel Corpora Tools (parallel concordancers, sentence aligners, word aligners)
Links to more resources on alignment. (Thanks again to Patrick Hall.)