Gettext i18n system
Gettext is an internationalization and localization system with implementations in almost every programming language out there. Originally implemented by Sun Microsystems in the early 1990s, it can be seen as the grandfather of all i18n resource files; chances are good that many of our readers are younger than gettext. It started out as a library for the C programming language and is commonly used as the standard i18n system for C, C++, PHP and Python.
Almost every programming language has a library, often an official one, for using gettext in your application. There can also be good reasons to choose gettext as a replacement for the standard i18n system of your favorite programming language or framework (e.g. ResourceBundle in Java or the I18n API in Rails).
Imagine the following scenario: your web application is written in Rails, your apps in Java and Swift, and your web presence is a WordPress or Drupal site. Sooner or later, you will realize that your translators are working in four different systems, just because there is no common resource file format that all your frameworks can use. If you are forced to use different i18n systems and want to replace them with one pervasive solution, gettext might be a great choice. That said, with a tool like LingoHub, resource file formats and syntax no longer matter.
However, using gettext comes with drawbacks and caveats. For example, it does not have a standardized placeholder syntax. Developers often keep the syntax of their previous system because they are familiar with it and do not have to change anything. By doing so, you still have to handle placeholder replacement separately in each of your systems. I therefore recommend converting all placeholders to a single syntax. The printf format syntax is a good choice, since it is implemented in many programming languages.
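As a rough illustration of why a single placeholder syntax pays off, the Python sketch below extracts printf-style conversions (`%s`, `%d`, `%1$s`, ...) from a source string and its translation and checks that they agree, similar in spirit to what `msgfmt --check-format` does for entries flagged `c-format`. The regular expression and the helper names `placeholders` and `placeholders_match` are assumptions for illustration and only cover the common printf conversions.

```python
import re

# printf-style conversion: optional argument index (2$), flags, width, precision,
# optional length modifier, then the conversion letter. "%%" is a literal percent sign.
PRINTF_RE = re.compile(
    r"%(?:%|(?:\d+\$)?[-+ #0]*(?:\d+|\*)?(?:\.(?:\d+|\*))?(?:hh|h|ll|l|L|z|j|t)?[diouxXeEfFgGcsp])"
)


def placeholders(text: str) -> list[str]:
    """Return the printf conversions in text, ignoring literal '%%'."""
    return [m.group(0) for m in PRINTF_RE.finditer(text) if m.group(0) != "%%"]


def placeholders_match(msgid: str, msgstr: str) -> bool:
    """Check that a translation uses the same placeholders as its source string."""
    a, b = placeholders(msgid), placeholders(msgstr)
    if all("$" in p for p in a + b):
        return sorted(a) == sorted(b)  # positional (%1$s): translators may reorder
    return a == b  # plain %s/%d: order and types must match exactly


print(placeholders_match("Hello %s", "Hallo %s"))         # True
print(placeholders_match("%d files", "%s Dateien"))       # False
print(placeholders_match("%1$s sent %2$d", "%2$d von %1$s"))  # True
```

Positional arguments such as `%1$s` are what let a translation reorder placeholders; plain `%s`/`%d` must keep their order, which is why the sketch compares them strictly.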
Besides, gettext can offer you features that your standard system might be missing:
- pluralization support
- different types of comments
- different types of flags for every segment
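Pluralization is the feature that most often matters in practice: a PO file declares in its `Plural-Forms` header how many forms a language has and a C-like expression that picks the form for a count n. The Python sketch below ports the rule commonly documented for Polish (e.g. in the GNU gettext manual) and uses it to choose between three example translations of a hypothetical "%d file" entry (`msgstr[0]` to `msgstr[2]`).

```python
# Plural-Forms rule commonly used for Polish:
#   nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);


def polish_plural(n: int) -> int:
    """Return the msgstr[index] to use for count n (Python port of the C expression)."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


# Hypothetical msgstr[0], msgstr[1], msgstr[2] for msgid "%d file" / msgid_plural "%d files"
FORMS = ["%d plik", "%d pliki", "%d plików"]


def files_label(n: int) -> str:
    return FORMS[polish_plural(n)] % n


for n in (1, 2, 5, 22, 112):
    print(files_label(n))  # 1 plik, 2 pliki, 5 plików, 22 pliki, 112 plików (one per line)
```

English needs only two forms, so systems that just distinguish "one" and "other" break down for languages like Polish; gettext moves that logic into each language's PO header.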
Gettext resource file formats and gettext i18n
Gettext uses three different resource file types to store your translatable segments.
PO (Portable Objects)
Files with the extension .po are the central file format for gettext translations. They are human-readable (and editable) text files. Usually there is one PO file per language, and you can split them up further by category (gettext calls these text domains).
Here you can see an example of a PO file. We have already covered the format in detail in our PHP internationalization with gettext tutorial.
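To make the structure concrete, here is a small PO sample (a header entry plus one entry carrying a translator comment, an extracted comment, a source reference and a `c-format` flag) together with a minimal Python reader. `parse_po` is a hypothetical helper for illustration only: it handles simple msgid/msgstr pairs, skips plurals and msgctxt, and approximates PO's C-style escapes with `ast.literal_eval`.

```python
import ast

# Sample PO content: a header entry (msgid "") and one translated entry.
SAMPLE_PO = r'''
# German translation (translator comment)
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

#. Greeting shown on the start page (extracted comment)
#: src/main.c:42
#, c-format
msgid "Hello %s"
msgstr "Hallo %s"
'''


def parse_po(text: str) -> dict[str, str]:
    """Return {msgid: msgstr}; sketch only: no plurals, no msgctxt."""
    catalog: dict[str, str] = {}
    fields: dict[str, str] = {}
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            # blank lines and translator (#), extracted (#.), reference (#:)
            # and flag (#,) comments carry no translatable text
            continue
        if line.startswith('"'):
            fields[current] += ast.literal_eval(line)  # continues the previous keyword
            continue
        keyword, _, quoted = line.partition(" ")
        if keyword == "msgid" and "msgstr" in fields:
            catalog[fields["msgid"]] = fields["msgstr"]  # previous entry is complete
            fields = {}
        current = keyword
        fields[current] = ast.literal_eval(quoted)
    if "msgid" in fields and "msgstr" in fields:
        catalog[fields["msgid"]] = fields["msgstr"]
    return catalog


print(parse_po(SAMPLE_PO)["Hello %s"])  # Hallo %s
```

The entry with the empty msgid is the PO header; it holds metadata such as the character set and, for languages with plurals, the `Plural-Forms` rule.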
Once you have a PO file, you need to exchange it with your translator. After receiving the translated file, you can use it in your application.
When using LingoHub, you do not have to pass the PO files on to your translator. Even more importantly, translators do not have to edit those files at all. PO files must be machine-readable and follow a strict syntax. If the person editing a file forgets a quote or uses characters that the declared character set does not allow, gettext can no longer read the file, which can leave your application unusable.
- LingoHub will check the validity of files at import.
- Translators use the LingoHub editor to translate the project by focussing on the text. They no longer have to deal with the syntax of PO files.
- After the translation process is finished you can export the syntactically correct PO files.
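The kind of syntax and character-set check described above can be sketched in a few lines of Python. `validate_po` is a hypothetical helper, not LingoHub's implementation: it assumes UTF-8 unless told otherwise and only checks that every non-comment line is a keyword followed by one properly quoted string, or a quoted continuation line. `msgfmt --check` performs far more thorough checks.

```python
import re

# One quoted PO string: escaped characters (\" \\ \n ...) or anything except " and \.
QUOTED = r'"(?:\\.|[^"\\])*"'
# A keyword followed by a quoted string, or a bare quoted continuation line.
LINE_RE = re.compile(
    rf"^(?:(?:msgctxt|msgid|msgid_plural|msgstr(?:\[\d+\])?)\s+)?{QUOTED}$"
)


def validate_po(data: bytes, encoding: str = "utf-8") -> list[str]:
    """Return a list of problems; an empty list means the file passed these checks."""
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError as exc:
        return [f"not valid {encoding}: {exc.reason} at byte {exc.start}"]
    errors = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are always allowed
        if not LINE_RE.match(line):
            errors.append(f"line {number}: malformed entry {raw!r}")
    return errors


good = b'msgid "Hello"\nmsgstr "Hallo"\n'
missing_quote = b'msgid "Hello"\nmsgstr "Hallo\n'
wrong_charset = 'msgid "Grüße"\nmsgstr "Greetings"\n'.encode("latin-1")
print(validate_po(good))  # []
print(validate_po(missing_quote))
print(validate_po(wrong_charset))
```

Both failure modes mentioned above show up here: the forgotten closing quote is reported with its line number, and Latin-1 bytes in a file declared as UTF-8 are rejected before any parsing happens.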
POT (Portable Object Templates)
POT files have the same structure as PO files, but they only contain the keys (msgid) of your translatable content, with empty translations. They describe what needs to be translated and therefore do not belong to any particular language. Using POT files makes sense if your keys are texts in your source language (rather than generic keys).
The workflow of using POT files in your translation process is as follows:
- POT files are created by extracting translatable strings from your source code or CMS (how strings are extracted depends on the programming language and framework).
- You pass the POT files on to your translators.
- Your translators will import the files into their CAT tool.
- After the translation process is finished, translators will create PO files.
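For C and many other languages, the extraction step is usually done with GNU `xgettext`. To illustrate what happens, the Python sketch below collects the string literals passed to `_()` in a piece of Python source and writes them as a minimal POT file with empty translations. `extract_msgids`, `po_quote` and `make_pot` are hypothetical helpers; they ignore the comments, plurals and source references that real extractors record.

```python
import ast


def extract_msgids(source: str) -> list[str]:
    """Return the unique string literals passed to _() in source order."""
    calls = [
        node for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name) and node.func.id == "_"
        and node.args and isinstance(node.args[0], ast.Constant)
        and isinstance(node.args[0].value, str)
    ]
    calls.sort(key=lambda c: (c.lineno, c.col_offset))  # ast.walk is not in source order
    msgids: list[str] = []
    for call in calls:
        if call.args[0].value not in msgids:
            msgids.append(call.args[0].value)
    return msgids


def po_quote(s: str) -> str:
    """Quote a string for a PO/POT file (backslash, double quote and newline escaped)."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'


def make_pot(msgids: list[str]) -> str:
    """Build POT text: a header entry plus one empty msgstr per msgid."""
    lines = ['msgid ""', 'msgstr ""', '"Content-Type: text/plain; charset=UTF-8\\n"', ""]
    for msgid in msgids:
        lines += [f"msgid {po_quote(msgid)}", 'msgstr ""', ""]
    return "\n".join(lines)


source = 'print(_("Hello"))\nprint(_("Goodbye"))\nprint(_("Hello"))\n'
print(make_pot(extract_msgids(source)))
```

Each language then gets its own PO file created from this template, with the `msgstr` lines filled in by the translators.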
LingoHub allows you to import POT files and export the corresponding PO files, which is great for starting a translation project. However, once the project is running, it is better to continue working with PO files. LingoHub offers a fundamental feature called continuous translation: if you export, adapt and re-import your resource files, LingoHub detects the changes automatically and keeps the project in sync with your resource files.
MO (Machine Objects)
MO files are the compiled, machine-readable representation of PO files. They are smaller and are valid once created: since an invalid PO file might break your application, msgfmt refuses to create an MO file from a PO file with syntax errors.
Nowadays, many developers load PO files instead of MO files in their applications, since having a smaller, faster-to-process file is no longer that important. Moreover, the files' validity is checked by their continuous integration process. The human-readable format also gives you more certainty that the correct translations are released.
The easiest way to create an MO file is with GNU msgfmt:
msgfmt en.po -o en.mo
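msgfmt is the tool to use in practice. To show what it produces, the Python sketch below serializes a small catalog into the little-endian MO layout (a 7-word header, a table of (length, offset) pairs for the msgids, a second one for the translations, then NUL-terminated strings) and loads the result with Python's standard `gettext.GNUTranslations`. `build_mo` is a hypothetical helper; unlike msgfmt it writes no hash table and performs no validation.

```python
import gettext
import io
import struct


def build_mo(catalog: dict[str, str]) -> bytes:
    """Serialize {msgid: msgstr} into the little-endian MO layout (no hash table)."""
    keys = sorted(catalog, key=lambda k: k.encode("utf-8"))  # header msgid "" sorts first
    ids = [k.encode("utf-8") for k in keys]
    strs = [catalog[k].encode("utf-8") for k in keys]
    n = len(keys)
    orig_off = 7 * 4              # header: magic, revision, count, 2 table offsets, hash size/offset
    trans_off = orig_off + 8 * n  # each table entry is (length, offset)
    data_off = trans_off + 8 * n
    table, data = [], b""
    for s in ids + strs:          # msgid table first, then translation table
        table.append((len(s), data_off + len(data)))
        data += s + b"\0"         # strings are NUL-terminated
    header = struct.pack("<7I", 0x950412DE, 0, n, orig_off, trans_off, 0, 0)
    tables = b"".join(struct.pack("<2I", length, offset) for length, offset in table)
    return header + tables + data


CATALOG = {
    "": "Content-Type: text/plain; charset=UTF-8\n"
        "Plural-Forms: nplurals=2; plural=(n != 1);\n",
    "Hello": "Hallo",
    "%d file\0%d files": "%d Datei\0%d Dateien",  # plural entry: forms joined by NUL
}

t = gettext.GNUTranslations(io.BytesIO(build_mo(CATALOG)))
print(t.gettext("Hello"))                        # Hallo
print(t.ngettext("%d file", "%d files", 3) % 3)  # 3 Dateien
```

The binary tables let a runtime look up strings by offset without parsing any text, which is where MO files get their size and speed advantage over PO files.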