PDF (Portable Document Format) is something of a double-edged sword as far as Linux users are concerned. On the one hand, the format is designed to be used on the widest possible range of platforms, and it is becoming increasingly popular, not only for electronic publishing but also for document interchange. Together, these two factors level the playing field for Linux somewhat: a PDF document on Linux is the same as a PDF document on Windows, and the fact that customers increasingly send texts to translators in this format gives the lie to the adage that "all customers want Word".
On the other hand, PDF is eminently unsuitable as a file format for further processing. It was never intended for this purpose. Quite the opposite, in fact: PDF files are designed to stay, and look, the same on any platform. Editing them is highly impractical, and extracting text and formatting from them is fraught with difficulty. Hopefully customers can be educated about this; in the meantime, though, we are going to have to deal with the format.
There is another side to PDF, however: when used properly, the format is a blessing rather than a curse. It has its uses for translators, too.
For instance, I now use PDF files for my price lists. It saves time and effort, and I can be confident that the result looks exactly the way I want it to look when the customer opens it.
General information from Adobe on the PDF file format.
Viewing PDF files
Adobe Reader is a more fully-featured PDF viewer.
Converting PDF files to other formats for processing
KWord can read PDF files, and can save them to other useful formats such as OpenOffice Writer and RTF with reasonable results.
LibreOffice and OpenOffice.org can read PDF files, saving them in the OpenOffice.org/LibreOffice Draw format, which can then be edited in OpenOffice.org/LibreOffice. The resulting file broadly retains the format of the original PDF and can be re-exported to it; this may be an acceptable procedure for the translation of files containing small amounts of text "for information".
A handy open-source command-line utility for converting PDF files to HTML.
Adobe Reader has a useful "Save as Text" function.
Although primarily an OCR application, ABBYY OCR CLI is a very effective tool for extracting text from PDF files, whether scanned or containing machine-readable text, and is also able to save the output in a number of formats including plain text, HTML and RTF.
pdftotext is an open-source command-line utility for converting PDF files to plain text. It is likely to be supplied with mainstream Linux distributions.
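For anyone who needs to convert many files, pdftotext can be driven from a short script. The following is a minimal sketch in Python, assuming that pdftotext is installed and on the PATH; the function names are illustrative and not part of any tool. It uses the -layout option (to keep the page layout as far as possible) and the -enc option (to request UTF-8 output).

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Return the argument list for a pdftotext call (nothing is executed)."""
    command = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical page layout where it can.
        command.append("-layout")
    command += [str(pdf_path), str(txt_path)]
    return command


def pdf_to_text(pdf_path, txt_path=None, keep_layout=True):
    """Convert one PDF to plain text and return the path of the text file."""
    pdf_path = Path(pdf_path)
    # Default: write the text file next to the PDF, with a .txt extension.
    txt_path = Path(txt_path) if txt_path else pdf_path.with_suffix(".txt")
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on the PATH")
    # check=True raises CalledProcessError if pdftotext reports a failure.
    subprocess.run(
        build_pdftotext_command(pdf_path, txt_path, keep_layout), check=True
    )
    return txt_path
```

Calling pdf_to_text("source.pdf") (a placeholder file name) would write source.txt alongside the PDF. Scanned PDFs without a text layer will yield little or no text this way and need an OCR tool instead.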
Creating PDF files
Most word processors running on Linux are now able to produce PDF files directly, usually by way of a "Print to PDF" option.
Annotating PDF files
PDF-XChange Viewer is a small Windows utility that can be used, among other things, to annotate PDF files. It appears to run acceptably on Crossover Office.
Editing PDF files
Conventional wisdom has it that PDF files are simply not designed to be edited. The proper approach is to make any editing changes – and therefore translation of text – in the original file from which the PDF was created, and then to re-export this file to PDF.
With its Infix PDF editor, Iceni has now moved the goalposts. Infix employs a text reflow function that enables users to edit a PDF file in much the same way as with a word processor. Not only that, Infix includes an XML export function specifically intended for translators. This function exports the "stories", i.e. the text portions of the document, to an XML file that can be translated externally and then re-imported into the original PDF.
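The export, translate, re-import cycle described above can be illustrated with a small Python sketch. Note that the XML layout used here (a document element containing story elements with an id attribute) is a made-up stand-in and not Infix's actual export format; the function names are likewise hypothetical. The point is only that the structure stays untouched while the text of each story is swapped for its translation.

```python
import xml.etree.ElementTree as ET

# Hypothetical export file: NOT Infix's real schema, just an illustration.
SAMPLE_EXPORT = """<document>
  <story id="1">Price list</story>
  <story id="2">All prices exclude VAT.</story>
</document>"""


def extract_stories(xml_text):
    """Return {story id: source text} for every <story> element."""
    root = ET.fromstring(xml_text)
    return {story.get("id"): story.text or "" for story in root.iter("story")}


def apply_translations(xml_text, translations):
    """Return the XML with each story's text replaced by its translation.

    Stories that have no entry in `translations` keep their source text.
    """
    root = ET.fromstring(xml_text)
    for story in root.iter("story"):
        story_id = story.get("id")
        if story_id in translations:
            story.text = translations[story_id]
    return ET.tostring(root, encoding="unicode")
```

In practice the translation step would be carried out in a translation tool rather than with a dictionary, but the principle is the same: only the text content changes, so the file can be matched back to the original PDF on re-import.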
Infix is a Windows product, but Iceni has taken care to ensure that it will run on Crossover Linux.
Although it is still doubtful whether editing PDF files in Infix can be considered a professional procedure for the production of print-ready PDFs, it certainly represents a convenient way of translating documents "for information".
This HowTo describes the use of Infix in conjunction with OmegaT for translation purposes.