Resource Files

This section explains in general how LingoHub imports and exports resource files, and which options you have for managing your different resource files via LingoHub. Specific documentation for every supported resource file type can be found in the child pages.

Parsing the file name

First, LingoHub tries to interpret the name of the uploaded file. The file name should follow one of these patterns:

  1. <basename><locale-separator><locale>.<extension> (e.g. “activerecord_en-US.yml”, “public.de_AT.xml”)
  2. <locale>.<extension> (e.g. “en.yml”, “de-AT.strings”)
  3. <basename>.<extension> (e.g. “Localizable.strings”)

LingoHub then tries to extract the <basename> and the <locale> from this file name.

For 1) it looks for the <locale>. If the <locale> is preceded by a <locale-separator> (either “.” or “_”), the part before the separator is interpreted as the <basename>.
For 2) it detects that there is no <basename>, only the <locale>.
For 3) LingoHub does not find a <locale>, so it does not know which language the imported translations belong to. The import process stops here and you are asked to provide the language in the “Resource Imports” view.
There is one exception to this rule: some resource file formats, such as Rails I18n YAML, XLIFF and CSV, carry the locale in the file content, and LingoHub extracts the <locale> from there.
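
The three file name patterns can be illustrated with a minimal Python sketch. It is not LingoHub's actual parser: the regular expressions, the function name parse_filename and the assumption that a locale is a lowercase two-letter language with an optional uppercase two-letter region are all hypothetical.

```python
import re

# Assumption: a locale is a two-letter language, optionally followed by
# "-" or "_" and a two-letter region (e.g. "en", "en-US", "de_AT").
LOCALE = r"[a-z]{2}(?:[-_][A-Z]{2})?"

# Pattern 2: <locale>.<extension>
LOCALE_ONLY = re.compile(rf"^(?P<locale>{LOCALE})\.(?P<ext>[^.]+)$")
# Pattern 1: <basename><locale-separator><locale>.<extension>
WITH_BASENAME = re.compile(
    rf"^(?P<basename>.+)[._](?P<locale>{LOCALE})\.(?P<ext>[^.]+)$"
)


def parse_filename(filename):
    """Return (basename, locale, extension); basename or locale may be None."""
    match = LOCALE_ONLY.match(filename)
    if match:
        return None, match["locale"], match["ext"]
    match = WITH_BASENAME.match(filename)
    if match:
        return match["basename"], match["locale"], match["ext"]
    # Pattern 3: no locale found, the language has to come from elsewhere.
    basename, _, ext = filename.rpartition(".")
    return basename, None, ext


print(parse_filename("activerecord_en-US.yml"))  # ('activerecord', 'en-US', 'yml')
print(parse_filename("de-AT.strings"))           # (None, 'de-AT', 'strings')
print(parse_filename("Localizable.strings"))     # ('Localizable', None, 'strings')
```

A result with a locale of None corresponds to case 3, where the language has to be supplied in the “Resource Imports” view or read from the file content.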

When using LingoHub SCM integration

We recommend using our SCM integration (GitHub, Bitbucket) to synchronize the resource files in your repository with your LingoHub project.

With the integration, we have the full path of your resource files. If we are not able to extract the locale from the file name, we try to find it in a path segment:

  • resources/en/strings.xml
  • root/en-US/Localizable.strings
  • root/en.lproj/Localizable.strings
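
As a rough illustration of the path lookup, the following Python sketch checks each directory segment for a locale. The function name locale_from_path, the locale pattern and the choice to check the segment closest to the file first are assumptions, not documented LingoHub behavior.

```python
import re
from pathlib import PurePosixPath

# Assumption: a segment is a locale if it is a two-letter language with an
# optional region, optionally followed by the iOS ".lproj" suffix.
LOCALE_SEGMENT = re.compile(r"^([a-z]{2}(?:[-_][A-Z]{2})?)(?:\.lproj)?$")


def locale_from_path(path):
    """Return the locale found in a directory segment of path, or None."""
    directories = PurePosixPath(path).parent.parts
    # Check the directory closest to the file first (an assumption).
    for segment in reversed(directories):
        match = LOCALE_SEGMENT.match(segment)
        if match:
            return match.group(1)
    return None


print(locale_from_path("resources/en/strings.xml"))          # en
print(locale_from_path("root/en-US/Localizable.strings"))    # en-US
print(locale_from_path("root/en.lproj/Localizable.strings")) # en
```

A pattern this loose would also match unrelated two-letter directories such as “js”, so a real implementation would presumably validate the candidate against the list of supported locales.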

Locale information

The language part of the <locale> in the file name has to be a two-letter ISO 639-1 code, optionally followed by a region (e.g. “en-US”).
For convenience, we also accept “_” as an alternative region separator, so you can use either “en-US” or “en_US”.

The list of supported locales can be found here.

 

Parsing the file content

Determining the character encoding

For most resource file types the character encoding is not specified at all, or only recommended as a best practice. So when a file is imported to LingoHub, we have to determine its character encoding.
This is done by trying to parse the file using several encodings. If parsing the file succeeds, we assume that we have applied the correct encoding. If we are not able to detect the charset, the import fails. In that case the file is most likely corrupt in some way.
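
The trial-and-error approach can be sketched in Python as follows. The candidate list, its order and the toy key=value parser are assumptions made for this example; the encodings LingoHub actually tries are not listed here.

```python
# Hypothetical candidates, tried in this order.
CANDIDATE_ENCODINGS = ["utf-8", "utf-16", "iso-8859-1"]


def parse_properties(text):
    """Toy parser: every non-empty line must look like 'key=value'."""
    pairs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, separator, value = line.partition("=")
        if not separator or not key.strip() or not key.isascii():
            raise ValueError(f"not a key=value line: {line!r}")
        pairs[key.strip()] = value.strip()
    return pairs


def detect_encoding(data, parse=parse_properties):
    """Return the first candidate encoding under which data decodes and parses."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            parse(data.decode(encoding))
        except (UnicodeError, ValueError):
            continue  # wrong encoding (or unparsable text), try the next one
        return encoding
    raise ValueError("could not determine the character encoding")


print(detect_encoding("welcome=Grüße".encode("utf-8")))       # utf-8
print(detect_encoding("welcome=Grüße".encode("iso-8859-1")))  # iso-8859-1
```

The parse step matters because some encodings, such as ISO-8859-1, decode any byte sequence without error; only a successful parse gives some evidence that the decoded text is meaningful.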

When exporting a file LingoHub will use the most common character encoding for the given resource file type. This can be overridden in the Export Settings.

Extracting segments

Once LingoHub knows the locale, it parses the content of the file and extracts the title of each translation as well as its content. How this content is imported can be customized in the Import Settings.

How the title is extracted differs for every format. Most of the time it is clear how to extract the title, e.g. for hierarchical formats like YAML or XLIFF the nested keys are concatenated to form a unique key.
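
For hierarchical formats, the concatenation of nested keys could look like the following Python sketch. The “.” separator and the function name flatten are assumptions; the separator LingoHub actually uses may differ per format.

```python
def flatten(tree, prefix=""):
    """Turn nested dictionaries into a flat {title: content} mapping."""
    flat = {}
    for key, value in tree.items():
        # Assumption: nested keys are joined with "." to form the title.
        title = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, title))
        else:
            flat[title] = value
    return flat


content = {"user": {"welcome": "Hello", "bye": "Goodbye"}, "title": "Home"}
print(flatten(content))
# {'user.welcome': 'Hello', 'user.bye': 'Goodbye', 'title': 'Home'}
```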

If the Import Setting ‘Text segment keys should be unique per’ is set to “Project”, the title has to be unique throughout the whole LingoHub project. So if a title is present in one resource file, it must not be present in another resource file (one with a different basename). For example:

  • you upload the file “public.en.properties” containing the title “welcome”
  • you upload the file “general.en.properties” containing the title “welcome”
  • now LingoHub assumes that the key has been moved from the file “public” to the file “general” and updates its database accordingly
  • if you download the file “general.en.properties” the segment “welcome” will be present, but it won’t be present in “public.en.properties” anymore.

If the Import Setting ‘Text segment keys should be unique per’ is set to “Resource File”, the title only has to be unique within a resource file (identified by its basename). With this option it is possible to have more than one segment with the same title in a LingoHub project. For example:

  • you upload the file “public.en.properties” containing the title “welcome”
  • you upload the file “general.en.properties” containing the title “welcome”
  • now both resource files “public” and “general” contain a segment with title “welcome”. LingoHub handles them as 2 different segments that might have different content.
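
The difference between the two settings comes down to what identifies a segment. The Python sketch below models this with a small in-memory store; the class SegmentStore and the values "project" and "resource_file" are hypothetical names for illustration, not part of any LingoHub API.

```python
class SegmentStore:
    """Toy model of how the uniqueness setting changes a segment's identity."""

    def __init__(self, unique_per):
        if unique_per not in ("project", "resource_file"):
            raise ValueError("unique_per must be 'project' or 'resource_file'")
        self.unique_per = unique_per
        self.segments = {}  # identity -> {"file", "title", "content"}

    def _identity(self, basename, title):
        # Per project: the title alone identifies the segment.
        # Per resource file: the (basename, title) pair identifies it.
        return title if self.unique_per == "project" else (basename, title)

    def import_segment(self, basename, title, content):
        self.segments[self._identity(basename, title)] = {
            "file": basename, "title": title, "content": content,
        }

    def titles_in(self, basename):
        return sorted(
            s["title"] for s in self.segments.values() if s["file"] == basename
        )


per_project = SegmentStore("project")
per_project.import_segment("public", "welcome", "Welcome")
per_project.import_segment("general", "welcome", "Welcome")
print(per_project.titles_in("public"))   # []  (the key moved to "general")
print(per_project.titles_in("general"))  # ['welcome']
```

With SegmentStore("resource_file"), the same two imports leave “welcome” in both files as two independent segments.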

Segments can also be deactivated:

  • you upload the file “public.en.properties” containing the title “welcome”
  • you upload the file “public.en.properties” again, but now the title “welcome” isn’t defined in the file anymore
  • LingoHub assumes now that the segment “welcome” was deleted during your development process
  • so LingoHub will deactivate this segment
  • the translators won’t see it any longer in the editor (therefore they won’t translate a segment that is obsolete)
  • if you export this file again the segment won’t be present
  • by importing a file that contains the title again, this segment will be reactivated with all its history, so you won’t lose any previously made translation effort

By default the deactivation of a segment is only triggered by uploading a file of the source language. This can be customized by changing the Import Settings.
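
The deactivation and reactivation cycle can be modeled with a short Python sketch. The data layout (a dictionary per segment with an "active" flag and a "history" list) and the function name sync_segments are assumptions made for this example.

```python
def sync_segments(segments, imported_titles, is_source_language=True):
    """Update the segments of one resource file after an upload.

    segments maps title -> {"active": bool, "history": list}.
    """
    for title in imported_titles:
        # New titles are created; known titles are (re)activated and keep
        # their existing history.
        segment = segments.setdefault(title, {"active": True, "history": []})
        segment["active"] = True
    if is_source_language:
        # By default only an upload in the source language deactivates
        # segments that are missing from the uploaded file.
        for title, segment in segments.items():
            if title not in imported_titles:
                segment["active"] = False
    return segments


segments = {}
sync_segments(segments, {"welcome", "bye"})
segments["welcome"]["history"].append("Willkommen")

sync_segments(segments, {"bye"})             # "welcome" was removed
print(segments["welcome"]["active"])         # False

sync_segments(segments, {"welcome", "bye"})  # "welcome" is back
print(segments["welcome"])                   # {'active': True, 'history': ['Willkommen']}
```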

Handling of content

In every resource file format the content is escaped and quoted, either because the specification of the format requires it or because the character encoding used cannot represent all characters.
While importing, LingoHub applies these rules to the file content to create human-readable strings.

We do this transformation to be able to show the translators a content like “A text with “Ümlauts” & ελληνικά characters” instead of “A text with \”&#x00DC;mlauts\” &amp; &#x03B5;&#x03BB;&#x03BB;&#x03B7;&#x03BD;&#x03B9;&#x03BA;&#x03AC; characters”.
Translators would struggle to read the escaped text and could not be expected to enter the correct escaping for their translations. This task is done by LingoHub.

But this means we might lose the escaping originally used in the uploaded file. So if you import a file to LingoHub and export it again, the result might differ because an alternative escaping is used. The exported files will, however, always be syntactically correct according to the specification of the resource file format.
Example: the uploaded file may have contained &#x00DC; for the character “Ü”; LingoHub might export it as &Uuml; instead, as long as this results in a correct resource file.
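
The round trip can be reproduced with Python's standard library. This is only an analogy for what happens during import and export: the helper names to_readable and to_escaped are hypothetical, and the export side here happens to emit decimal character references rather than the forms mentioned above.

```python
import html


def to_readable(raw):
    """Import side: resolve character references into readable text."""
    return html.unescape(raw)


def to_escaped(text):
    """Export side: escape markup characters, then replace every non-ASCII
    character with a decimal character reference."""
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


original = "&#x00DC;mlauts &amp; &#x03B5;"
readable = to_readable(original)
exported = to_escaped(readable)

print(readable)  # Ümlauts & ε
print(exported)  # &#220;mlauts &amp; &#949;
print(to_readable(exported) == readable)  # True
```

The exported string differs from the original one, yet both decode to the same readable text, which is the kind of difference described above.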

Handling of comments

If the resource file specification supports comments, we import these comments as well. Some resource file types (like XLIFF, resx/w …) clearly define how to specify comments for a segment. For other, mostly line-based formats (like properties, iOS strings …) we take the following approach:

  • the first comment in the file is interpreted as the header comment. It is stored along with the file metadata so that it can be exported again.
  • all other comments are associated with the next segment after the comment and are shown as information for the translators in our editor
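
For a line-based format, the approach above might be sketched in Python like this. The example assumes a simplified properties syntax (“#” comments, key=value lines) and treats the first run of consecutive comment lines as the header comment; both are assumptions for illustration.

```python
def parse_with_comments(text):
    """Return (header_comment, [(key, value, comment), ...])."""
    header = None
    header_done = False
    block = []     # comment lines collected since the last segment
    segments = []

    def close_header():
        # The first comment block in the file becomes the header comment.
        nonlocal header, header_done, block
        if not header_done and block:
            header = "\n".join(block)
            header_done = True
            block = []

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            block.append(stripped.lstrip("#").strip())
        elif not stripped:
            close_header()  # a blank line ends the first comment block
        else:
            close_header()
            key, _, value = stripped.partition("=")
            comment = "\n".join(block) or None
            segments.append((key.strip(), value.strip(), comment))
            block = []
    return header, segments


source = """# Main application strings

# Greeting shown on the start page
welcome=Welcome
bye=Goodbye
"""
header, segments = parse_with_comments(source)
print(header)    # Main application strings
print(segments)
# [('welcome', 'Welcome', 'Greeting shown on the start page'), ('bye', 'Goodbye', None)]
```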

Comments may also contain LingoChecks per segment.

 

 


