LingoHub Academy

Our experience, knowledge and lessons learned - all here just for you.

What is Pluralization (p11n)?

Pluralization (p11n) is the process of changing nouns from singular to plural form. Not all languages change plural forms of nouns in the same way.

While some languages just change a suffix (the end of the word), there is a sea of irregularities. Languages are a fickle communication tool and there’s no one golden rule for pluralization. Luckily, there are a myriad of solutions for tackling the problem.

Plural Forms vs. Plural Rules

Firstly, it’s important to understand that there’s a difference between plural forms and plural rules.

Plural forms are the actual changes that happen to a noun when the word changes grammatical number. For example, Chinese and Japanese have a single form for plurals. German, English, and Spanish have two forms. Some Slavic languages, on the other hand, have three or more forms for plural nouns. And then there are languages like Arabic and Welsh that have six forms.

Plural rules will tell you how to "get" the plural form. Sometimes it means adding a simple suffix like "s". That way, in English we go from "hat" to "hats". Fairly easy.
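To make the distinction concrete, here is a minimal sketch in which the forms are plain arrays of strings and a rule is a function that picks one of them for a given count. The names `PluralRule`, `englishRule`, `russianRule` and `pluralize` are invented for this illustration, Russian is used only as one example of a Slavic language, and the rules are hand-written for whole, non-negative numbers rather than taken from any library.

```typescript
// A plural rule maps a count to an index into the list of plural forms.
// Hand-written for illustration; only whole, non-negative counts are handled.
type PluralRule = (n: number) => number;

// English: two forms, with the singular used only for exactly 1.
const englishRule: PluralRule = (n) => (n === 1 ? 0 : 1);

// Russian: three forms for whole numbers, chosen by the last one or two digits.
const russianRule: PluralRule = (n) => {
  const mod10 = n % 10;
  const mod100 = n % 100;
  if (mod10 === 1 && mod100 !== 11) return 0; // 1, 21, 31, ...
  if (mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14)) return 1; // 2-4, 22-24, ...
  return 2; // 0, 5-20, 25-30, ...
};

// Combine a count with the form the rule selects.
function pluralize(n: number, forms: string[], rule: PluralRule): string {
  return `${n} ${forms[rule(n)]}`;
}

console.log(pluralize(1, ["hat", "hats"], englishRule)); // 1 hat
console.log(pluralize(3, ["hat", "hats"], englishRule)); // 3 hats
console.log(pluralize(21, ["шляпа", "шляпы", "шляп"], russianRule)); // 21 шляпа
console.log(pluralize(5, ["шляпа", "шляпы", "шляп"], russianRule)); // 5 шляп
```

The forms are data that a translator can supply, while the rule is logic that belongs to the language, which is why the two are worth keeping apart.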

How to solve the pluralization problem for software translation?

In general, there are two options:

  1. Adapting your content so that there is a fixed part of the text which is not influenced by the change of the number of the noun. This solution is pretty easy to implement and cheap to sustain.

  2. The programmer provides both the singular and plural forms, which the system then recognizes and applies. For example, a pluralizing algorithm could automatically add an appropriate suffix at the end of the word.
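The two options can be sketched roughly as follows. This is an illustration for English only: `fixedLabel`, `chooseForm`, `addSuffix` and the small `irregular` table are hypothetical names made up for this example, and the suffix logic is deliberately naive.

```typescript
// Option 1: keep the counted noun in a fixed label, so one string fits any count.
function fixedLabel(count: number): string {
  return `Number of hats: ${count}`;
}

// Option 2: the programmer supplies both forms and the code picks one.
// The "singular only for exactly 1" check is valid for English, not for every language.
function chooseForm(count: number, singular: string, plural: string): string {
  return `${count} ${count === 1 ? singular : plural}`;
}

// A naive suffix-based pluralizer with a tiny table for irregular nouns.
const irregular: Record<string, string> = { child: "children", person: "people" };

function addSuffix(noun: string): string {
  if (Object.prototype.hasOwnProperty.call(irregular, noun)) {
    return irregular[noun];
  }
  // Nouns ending in s, x, z, ch or sh take "es"; everything else takes "s".
  return /(s|x|z|ch|sh)$/.test(noun) ? `${noun}es` : `${noun}s`;
}

console.log(fixedLabel(3)); // Number of hats: 3
console.log(chooseForm(1, "hat", "hats")); // 1 hat
console.log(chooseForm(3, "hat", addSuffix("hat"))); // 3 hats
console.log(addSuffix("box")); // boxes
console.log(addSuffix("child")); // children
```

The first option needs no language-specific logic at all, which is what makes it cheap to maintain. The second reads more naturally but only works as long as the selection and suffix rules match the target language.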

Understanding CLDR (Common Locale Data Repository)

The Unicode Consortium has created the CLDR (Common Locale Data Repository) project with the aim of tackling different plural forms and rules.

First, you write the fixed part of the message and include the variable element as a {placeholder}. CLDR then supplies a large body of locale data for many languages, including their plural forms and rules.

With CLDR, up to 6 different plural forms are categorized:

  • Zero
  • One (singular)
  • Two (dual)
  • Few (paucal)
  • Many
  • Other (general plural form—also used if the language only has a single form)

Once the number of the noun has been recognized properly and correlated to the proper plural form, it is much easier to rely on placeholders.
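As a rough sketch of how this looks in practice, the standard `Intl.PluralRules` API in JavaScript and TypeScript returns one of the six category names for a given locale and number, drawing on the plural data shipped with the runtime. The `Messages` type, the `formatCount` helper and the `{count}` placeholder syntax below are assumptions made up for this example, not part of CLDR or of any particular translation format.

```typescript
// The six plural categories listed above.
type Category = "zero" | "one" | "two" | "few" | "many" | "other";

// One message template per category; "other" is required as the fallback.
type Messages = Partial<Record<Category, string>> & { other: string };

// Pick the template for the count's plural category and fill in the placeholder.
function formatCount(locale: string, count: number, messages: Messages): string {
  const category = new Intl.PluralRules(locale).select(count) as Category;
  const template = messages[category] ?? messages.other;
  return template.replace("{count}", String(count));
}

const enHats: Messages = { one: "{count} hat", other: "{count} hats" };

console.log(formatCount("en", 1, enHats)); // 1 hat
console.log(formatCount("en", 4, enHats)); // 4 hats

// Arabic uses all six categories. The exact result depends on the locale data
// available in the runtime, so no expected output is shown here.
for (const n of [0, 1, 2, 3, 11, 100]) {
  console.log(n, new Intl.PluralRules("ar").select(n));
}
```

Because every locale must at least define `other`, falling back to it keeps the helper safe when a translation does not provide a template for every category.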

Tips for easier Pluralization

Here are a couple of useful tips and tricks to help you out:

  • Avoid creating long sentences where meaning can be easily lost or misinterpreted
  • Avoid using sentences that are too short or depend too much on context
  • Avoid pronouns as they can be quite ambiguous depending on the language
  • Avoid poor text segmentation and unclear text strings
  • Don’t forget about revision
