LingoHub Academy

What is a Locale?

A locale is a set of parameters that defines the language and regional preferences a user wants to see in their user interface. A locale identifier usually consists of a language code and a country or region code.

Locale basics

One of the most important rules of successful localization is to keep language and other cultural variables separate from the application itself. Localizing externalized content for a new market is much easier than re-creating the software from scratch. This is why we distinguish between source files and the locales into which their content is translated and adapted.
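As a minimal sketch of this separation, the snippet below keeps user-facing strings in per-locale catalogs outside the application logic. The catalog contents, the `translate` helper, and the fallback order (region, then language, then source locale) are illustrative assumptions, not the API of any particular library.

```python
# Hypothetical message catalogs. In a real project these would live in
# external resource files, for example one file per locale.
CATALOGS = {
    "en": {"greeting": "Hello", "farewell": "Goodbye"},
    "de": {"greeting": "Hallo", "farewell": "Auf Wiedersehen"},
    "de_AT": {"greeting": "Servus"},
}

SOURCE_LOCALE = "en"


def translate(key: str, locale: str) -> str:
    """Look up a key, falling back from region to language to source locale."""
    language = locale.split("_")[0]
    for candidate in (locale, language, SOURCE_LOCALE):
        message = CATALOGS.get(candidate, {}).get(key)
        if message is not None:
            return message
    return key  # last resort: show the key itself


print(translate("greeting", "de_AT"))  # Servus
print(translate("farewell", "de_AT"))  # Auf Wiedersehen
print(translate("greeting", "fr_FR"))  # Hello
```

Because the application only ever asks for a key, adding a new market means adding a catalog, not changing code.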

Essentially, a locale is a collection of region-specific conventions for a given language. The information a locale provides falls into several categories:

  • Character classification
  • Time and date formats (names of months and days, abbreviations)
  • Monetary formats (currency symbols and their position, numeric separators, etc.)
  • Paper formats (A4, B5, Letter, Legal, etc.)
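To make these categories concrete, here is a small sketch of the kind of data a locale carries. The `LocaleConventions` class, the helper functions, and the three hand-written entries are assumptions for illustration; a real application would read such values from a locale database rather than hard-code them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LocaleConventions:
    date_pattern: str           # strftime pattern giving day/month/year order
    decimal_separator: str
    currency_symbol: str
    symbol_before_amount: bool
    paper_size: str


# Hand-written sample values, for illustration only.
CONVENTIONS = {
    "en_US": LocaleConventions("%m/%d/%Y", ".", "$", True, "Letter"),
    "en_GB": LocaleConventions("%d/%m/%Y", ".", "£", True, "A4"),
    "de_DE": LocaleConventions("%d.%m.%Y", ",", "€", False, "A4"),
}


def format_date(day: date, locale: str) -> str:
    """Render a date using the locale's day/month/year order."""
    return day.strftime(CONVENTIONS[locale].date_pattern)


def format_price(amount: float, locale: str) -> str:
    """Render an amount with the locale's separator and symbol position.

    No currency conversion and no digit grouping is performed.
    """
    conv = CONVENTIONS[locale]
    number = f"{amount:.2f}".replace(".", conv.decimal_separator)
    if conv.symbol_before_amount:
        return f"{conv.currency_symbol}{number}"
    return f"{number} {conv.currency_symbol}"


for name in CONVENTIONS:
    print(name, format_date(date(2024, 3, 5), name), format_price(9.5, name))
```

The same date and amount come out as `03/05/2024 $9.50`, `05/03/2024 £9.50`, and `05.03.2024 9,50 €`, which is exactly the kind of difference a locale is meant to capture.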

Because countries, regions, and cultures often use different conventions to format times, dates, and numbers and to parse words and phrases, locales help software makers in several areas:

  • Identification of languages within a text
  • Encoding of resource files
  • Processing of texts
  • Layout and input method of texts
  • Selection of fonts and visual elements that suit a culture

Understanding locale names and ISO standards

Locale names usually combine a two-letter language code with a two-letter territory code. The language code comes from the ISO 639 standard, while the territory code is usually taken from ISO 3166. Unicode-based environments typically use these ISO standards.

These locale identification standards matter for two reasons. Firstly, they make correct language detection possible. Secondly, they capture the data formats typical of a region even when the language is the same or very similar, since countries can share some locale data and formats but not all of them (currency, for example). A good example is American English, identified as en_US, versus British English, identified as en_GB.
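The structure of such identifiers can be sketched with a small parser. The `parse_locale` function and `LocaleId` type below are hypothetical names, and the pattern deliberately handles only the simple language-plus-territory form; real-world identifiers can carry further parts such as a script or an encoding (for example `zh-Hant-TW` or `de_DE.UTF-8`), which this sketch rejects.

```python
import re
from typing import NamedTuple, Optional


class LocaleId(NamedTuple):
    language: str              # ISO 639 language code, e.g. "en"
    territory: Optional[str]   # ISO 3166 territory code, e.g. "US", or None


# Two or three lowercase letters for the language, optionally followed by
# "_" or "-" and a two-letter uppercase territory code.
_PATTERN = re.compile(r"^([a-z]{2,3})(?:[_-]([A-Z]{2}))?$")


def parse_locale(identifier: str) -> LocaleId:
    """Split a simple locale identifier into language and territory."""
    match = _PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"Unrecognized locale identifier: {identifier!r}")
    return LocaleId(match.group(1), match.group(2))


print(parse_locale("en_US"))  # LocaleId(language='en', territory='US')
print(parse_locale("en-GB"))  # LocaleId(language='en', territory='GB')
print(parse_locale("de"))     # LocaleId(language='de', territory=None)
```

Both en_US and en_GB share the language `en`, so an application can reuse English translations while still choosing region-specific formats based on the territory.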

ISO 639 Standards

It’s important to note that ISO 639 comes in several parts, and choosing the right one for a localization project is not always easy. The most relevant parts for language identifiers are:

  • ISO 639-1
  • ISO 639-2
  • ISO 639-3

Each of these standards defines a set of regulated language identifiers. ISO 639-1 assigns two-letter codes to the world's major languages. ISO 639-2 uses three-letter codes and covers more languages, including ancient languages and even constructed languages (conlangs) such as Klingon. ISO 639-3 also uses three-letter codes and aims to cover all known human languages. When choosing a code set, think about the nature of your content and the audiences you may want to reach in the future.

Adopting this growth-oriented mindset early might save you a lot of trouble later.
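The difference between the code sets is easiest to see side by side. The table below is a hand-picked sample, and the `LanguageCodes` type and `to_iso639_1` helper are hypothetical names for illustration; note that some languages have no two-letter ISO 639-1 code at all.

```python
from typing import NamedTuple, Optional


class LanguageCodes(NamedTuple):
    name: str
    iso639_1: Optional[str]  # two-letter code, missing for many languages
    iso639_2: str            # three-letter code
    iso639_3: str            # three-letter code


# A small hand-picked sample, not a complete registry.
LANGUAGES = [
    LanguageCodes("English", "en", "eng", "eng"),
    LanguageCodes("Spanish", "es", "spa", "spa"),
    LanguageCodes("Latin", "la", "lat", "lat"),
    LanguageCodes("Ancient Greek", None, "grc", "grc"),
    LanguageCodes("Klingon", None, "tlh", "tlh"),
]


def to_iso639_1(three_letter: str) -> Optional[str]:
    """Return the two-letter code for a three-letter code, or None if
    the language has no ISO 639-1 code. Raises KeyError if unknown."""
    for entry in LANGUAGES:
        if three_letter in (entry.iso639_2, entry.iso639_3):
            return entry.iso639_1
    raise KeyError(three_letter)


print(to_iso639_1("spa"))  # es
print(to_iso639_1("tlh"))  # None
```

If your content might one day target a language without a two-letter code, storing three-letter codes from the start avoids a later migration.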

What is the Common Locale Data Repository (CLDR)?

The Common Locale Data Repository (CLDR) is a project of the Unicode Consortium that provides extensive locale data. This data includes number, date, time, and currency formats, case classification for characters, script identifiers, and more.

Companies such as Apple, IBM, Microsoft and Google use CLDR as a reliable source of parsing and formatting patterns for locales.
