The first step to successful localization of your product is choosing the right language. The second step is choosing the right alphabet or writing script. And lastly, you need to choose a software translation management tool.
Of the roughly 7,000 living languages in the world, many have dual alphabets or even several active writing scripts. In this article, we will cover the different types of writing systems, discuss languages with dual alphabets, and tackle the puzzle of multiscript languages.
Types of writing systems
What are writing systems? Put simply, a writing system is a set of symbols used to represent a language. Linguists group writing systems by the type of symbol they use.
1. Pictographic/ideographic writing systems
Pictographic writing systems use pictures to represent words or abstract ideas. The most famous historical example is hieroglyphs. Today, the most commonly used pictographic system is emoji.
Emojis have been a hit ever since Shigetaka Kurita invented them in 1999. Nowadays almost every internet user in the world is familiar with them. It is tempting to assume that they are popular because their meaning is universal and they make expression easy. That idea is not completely wrong, but it’s not completely true either.
The tricky part about pictographic or ideographic writing systems is that they are far from universal. Some linguists would even argue that you can't read pictographic writing systems, just interpret them.
For example, an average user will be able to assume that a [facepalm] emoji represents something bad. On the other hand, we would struggle to assume what a [sun] hieroglyph means. Besides, pictographic writing systems are tied to the culture they come from. With that in mind, it’s not surprising that emojis were invented in Japan.
2. Syllabary
A syllabary uses symbols that each represent a whole syllable.
Most languages that use syllabic or ideographic scripts don't rely on a single script. Usually, the end result is a mix of the two, which offers freedom of expression. It can also create huge potential for misinterpretation and poor translation.
3. Alphabets
Alphabets are writing systems in which a single symbol represents an individual sound. They branch out into several groups, depending on whether the alphabet has symbols for both consonants and vowels or only for consonants. For example, the Roman, Greek and Cyrillic alphabets have symbols for both consonants and vowels. The Arabic and Hebrew alphabets primarily write consonants, with vowels marked optionally or only in part.
But even among languages that use an alphabetic writing system, there are dualities. For example, Serbian uses both the Roman and the Cyrillic alphabet. Here’s how it works.
What is digraphia?
Digraphia is the use of two writing scripts for the same language. It can refer to a language that uses two scripts at the same time (synchronic digraphia). The other type, diachronic digraphia, occurs when one script is replaced by another over time.
Dual Alphabet in the Serbian language
An interesting example of synchronic digraphia is Serbian. The Serbian language has two alphabets, one Roman and one Cyrillic. Speakers use both, and the choice of which to use is in most cases completely individual. For example:
Ana ima psa.
Ана има пса.
The two sentences are identical in everything except script.
Now, if this sounds like a bit of a headache, there are upsides too. The alphabets are “mirrored”, which means each sound and symbol has a twin (a few Cyrillic letters correspond to Latin two-letter combinations such as lj, nj and dž). This makes the process of transliteration (switching between scripts) fairly easy.
However, when choosing the script for your product, be sure to invest resources in research. In the excellent article “Pragmatics meets Ideology: Digraphia and non-standard orthographic practices in Serbian online news forums”, Dejan Ivkovic points out how relevant the target group is when choosing the script. He uses the example of online news sites, their type of content and their target groups. In the case of Serbian, opting for the Roman alphabet will allow regional access to your product, while the Cyrillic alphabet will definitely narrow it.
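To see why the one-to-one pairing makes transliteration easy, here is a minimal sketch in Python that converts Serbian Cyrillic to Latin with a plain lookup table. The function name `cyrillic_to_latin` is made up for this illustration. The sketch only covers the Cyrillic-to-Latin direction; going the other way takes more care, because Latin uses two-letter combinations (lj, nj, dž) for some sounds.

```python
# The 30 lowercase letters of the Serbian Cyrillic alphabet, in order.
CYRILLIC = "абвгдђежзијклљмнњопрстћуфхцчџш"

# Their Latin counterparts, in the same order. Three of them are digraphs.
LATIN = [
    "a", "b", "v", "g", "d", "đ", "e", "ž", "z", "i",
    "j", "k", "l", "lj", "m", "n", "nj", "o", "p", "r",
    "s", "t", "ć", "u", "f", "h", "c", "č", "dž", "š",
]

# Build the lookup table for lowercase, then add the uppercase letters.
CYR_TO_LAT = dict(zip(CYRILLIC, LATIN))
CYR_TO_LAT.update(
    {cyr.upper(): lat.capitalize() for cyr, lat in zip(CYRILLIC, LATIN)}
)


def cyrillic_to_latin(text: str) -> str:
    """Replace every Serbian Cyrillic letter with its Latin twin.

    Characters that are not in the table (spaces, punctuation, digits)
    are passed through unchanged.
    """
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text)


print(cyrillic_to_latin("Ана има пса."))  # Ana ima psa.
```

A real product would more likely rely on an established transliteration library, but the table above is the whole idea.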
Multiscript languages
Japanese, which we mentioned earlier in the article, is a good example of a multiscript language. The modern Japanese writing system uses a mix of logographic and syllabic scripts.
There are three of them:
- Kanji - Kanji is a logographic system derived from Chinese characters. People use it to write most nouns and personal names.
- Katakana - Katakana is a syllabic system that is used for onomatopoeia and for transliterating foreign words and names.
- Hiragana - Hiragana is another syllabic system. Japanese speakers use it to mark grammatical particles and modify verbs and adjectives.
It is quite possible to find all three scripts in one sentence. With several thousand characters in the Japanese language and three scripts, having a native speaker to help you with your localization process is a must.
Here are some examples of different writing scripts and their languages:
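To check which scripts a piece of Japanese text actually mixes, one option is to look at the official Unicode name of each character. The sketch below uses Python's standard `unicodedata` module; the helper names `script_of` and `scripts_in` and the sample sentence are made up for this illustration, and the name-prefix check is a simplification rather than a complete script detector.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Classify one character by the prefix of its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "Kanji"
    if name.startswith("HIRAGANA"):
        return "Hiragana"
    if name.startswith("KATAKANA"):
        # Also catches the long-vowel mark used in Katakana words.
        return "Katakana"
    return "Other"


def scripts_in(text: str) -> set:
    """Return the set of Japanese scripts that appear in the text."""
    return {script_of(ch) for ch in text} - {"Other"}


# "I drink coffee": a Kanji pronoun and verb stem, Katakana for the
# loanword "coffee", and Hiragana for particles and the verb ending.
print(sorted(scripts_in("私はコーヒーを飲みます")))
# ['Hiragana', 'Kanji', 'Katakana']
```

A check like this can flag strings for a native-speaker review, but it can't tell you whether the choice of script is the right one.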
Writing Script | Languages |
---|---|
Latin | Italian, French, Portuguese, Spanish, Romanian, Nordic languages, English, German, Chinese (Pinyin romanization), Indonesian, Polynesian languages, Polish, Swahili, Turkish, Albanian, Hungarian, Somali, Vietnamese |
Chinese | Chinese, Japanese (Kanji), Korean (Hanja), Vietnamese (historically), Cantonese |
Arabic | Arabic, Persian, Urdu, Punjabi, Pashto, Kazakh, Kurdish |
Cyrillic | Bulgarian, Russian, Serbian, Ukrainian, Macedonian, Belarusian |
Kana | Japanese |
Hebrew | Hebrew, Yiddish |
Telugu | Telugu, Sanskrit, Gondi |
Tamil | Tamil, Kanikkaran, Badaga, Irula, Paniya, Sanskrit, Saurashtra |
International standards for language codes
To make things easier, the International Organization for Standardization created an official international standard for language codes. ISO 639-1 defines two-letter codes for languages, while ISO 639-2 and ISO 639-3 expand on that first list with three-letter codes.
In addition, there is an international set of codes for writing scripts. ISO 15924 (http://www.unicode.org/iso15924/iso15924-codes.html) defines four-letter codes that specify the script a language is written in. Let’s take our examples from before.
- Ana ima psa. The combined code would be sr-Latn: ‘sr’ (ISO 639-1) for Serbian and ‘Latn’ (ISO 15924) for Latin script.
- Ана има пса. The combined code would be sr-Cyrl: ‘sr’ (ISO 639-1) for Serbian and ‘Cyrl’ (ISO 15924) for Cyrillic script.
The first ISO 639 standard was developed to make things easier for linguists and bibliographers. Nowadays, understanding ISO standards and choosing the right ISO codes for the languages your product supports is the first step to successful localization.
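In software, codes like sr-Latn and sr-Cyrl usually show up as language tags in the style of IETF BCP 47. As a rough sketch, the Python snippet below splits such a tag into its language and script parts. The names `LanguageTag` and `parse_tag` are made up for this illustration, and the parser is deliberately simplified: it only looks at the first two subtags and ignores regions, variants and extensions.

```python
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str          # ISO 639 language code, e.g. "sr"
    script: Optional[str]  # ISO 15924 script code, e.g. "Latn", or None


def parse_tag(tag: str) -> LanguageTag:
    """Split a tag such as "sr-Latn" into language and script.

    Assumption: a second subtag counts as a script only if it is
    exactly four letters long, which is the shape of ISO 15924 codes.
    """
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    script = None
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        # Script codes are conventionally written in title case.
        script = parts[1].title()
    return LanguageTag(language, script)


print(parse_tag("sr-Latn"))  # LanguageTag(language='sr', script='Latn')
print(parse_tag("sr-Cyrl"))  # LanguageTag(language='sr', script='Cyrl')
print(parse_tag("ja"))       # LanguageTag(language='ja', script=None)
```

For production use, a dedicated locale library is a safer choice than a hand-written parser.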
How the Internet is changing things
In its early days, the Internet was restricted by ASCII (American Standard Code for Information Interchange). Because ASCII is based on the English version of the Roman alphabet, the world adapted. Internet users adapted their native, non-Roman writing systems to the new Roman keyboards.
These tech-driven changes influenced a lot of languages like Serbian and Japanese.
For example, for Serbian that meant setting Cyrillic aside for a long time and adapting the Roman alphabet by removing diacritics. Even now, these changes dominate the informal use of Serbian online. On the other hand, the dominance of the Roman alphabet inspired the romanization of Japanese and Greek. It's possible it inspired the creation of emojis too.
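To make the "removing diacritics" step concrete, here is a small Python sketch that reduces Serbian Latin text to plain ASCII letters, roughly the way informal online writing does. The function name `strip_diacritics` is made up for this illustration, and writing đ as "dj" is an assumption: it is one common informal convention, not the only one.

```python
import unicodedata

# "đ" has no Unicode decomposition, so it needs an explicit replacement.
# Assumption: use the informal "dj" spelling.
SPECIAL = {"đ": "dj", "Đ": "Dj"}


def strip_diacritics(text: str) -> str:
    """Return the text with Serbian Latin diacritics removed."""
    replaced = "".join(SPECIAL.get(ch, ch) for ch in text)
    # NFD splits letters like "š" into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", replaced)
    # Keep the base letters and drop the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("čaša"))   # casa
print(strip_diacritics("Đorđe"))  # Djordje
```

Note that this conversion loses information: once the diacritics are gone, "š" and "s" look the same, so the original spelling cannot be restored automatically.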
How do different scripts influence the market?
Localization and translation can be a challenge even when you are just aiming to transfer the meaning of words. Choosing the proper writing script to translate into matters just as much. Japanese uses Katakana for foreign names. Serbian uses Cyrillic for official documents and the Roman alphabet for less formal content.
All these nuances are relevant for whatever product you are introducing to the market. Proper localization and review by native language professionals are an investment you can’t forget. And Lingohub is here to help you get where you want to be.