Internationalization (i18n): A Simple Definition

“Think globally, act locally.”

Akio Morita (1921-99), co-founder of Sony, said these famous words. Local culture is about much more than language. Much like how different groups in the United States have their own in-jokes, dialects, idioms and customs, different countries have their own ideas about how things should be done and how things should be presented. That’s why software, websites and apps need to be developed with internationalization (i18n) in mind.

What Is Internationalization (i18n)?

Internationalization, often written as i18n, is the process of preparing a product so that it can be taken to other countries. It doesn’t just mean being able to change languages; it means being able to accept different forms of data, different settings to match local customs and different strings of data, and to process them correctly.

The W3C defines it as follows: “Internationalization is the design and development of a product, application or document content that enables easy localization for target audiences that vary in culture, region, or language.”

According to the W3C, internationalization (i18n) normally includes:

  1. “Designing and developing in a way that removes barriers to localization or international deployment. This includes such things as enabling the use of Unicode, or ensuring the proper handling of legacy character encodings where appropriate, taking care over the concatenation of strings, avoiding dependance in code of user-interface string values, etc.”

  2. “Providing support for features that may not be used until localization occurs. For example, adding markup in your DTD to support bidirectional text, or for identifying language. Or adding to CSS support for vertical text or other non-Latin typographic features.”

  3. “Enabling code to support local, regional, language, or culturally related preferences. Typically this involves incorporating predefined localization data and features derived from existing libraries or user preferences. Examples include date and time formats, local calendars, number formats and numeral systems, sorting and presentation of lists, handling of personal names and forms of address, etc.”

  4. “Separating localizable elements from source code or content, such that localized alternatives can be loaded or selected based on the user’s international preferences as needed.”

What Is Localization (l10n)?

Localization is simply the act of changing a piece of software to suit a different locale. In many ways, internationalization can be thought of as building the structure of a piece of software so that it can be adjusted for different markets, and localization is the process of actually doing so for a specific market.

The W3C describes localization as follows:

“Localization refers to the adaptation of a product, application or document content to meet the language, cultural and other requirements of a specific target market (a locale).

Localization is sometimes written as l10n, where 10 is the number of letters between l and n.

Often thought of only as a synonym for translation of the user interface and documentation, localization is often a substantially more complex issue. It can entail customization related to:

  1. Numeric, date and time formats
  2. Use of currency
  3. Keyboard usage
  4. Collation and sorting
  5. Symbols, icons and colors
  6. Text and graphics containing references to objects, actions or ideas which, in a given culture, may be subject to misinterpretation or viewed as insensitive.
  7. Varying legal requirements
  8. and many more things.

Localization may even necessitate a comprehensive rethinking of logic, visual design, or presentation if the way of doing business (eg., accounting) or the accepted paradigm for learning (eg., focus on individual vs. group) in a given locale differs substantially from the originating culture.”

Why Is Internationalization (i18n) Important?

In a number of Asian countries, the family name comes first and the given name second. Your software needs to understand the difference and present the right fields and labels so that each person enters their data correctly. All of this is put in place by the process of internationalization.
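To make the name-order point concrete, here is a minimal sketch in Python. The `FAMILY_NAME_FIRST` set and the `display_name` function are hypothetical names invented for this illustration, and the list of languages is a simplified assumption rather than a complete rule.

```python
# Hypothetical, simplified set of languages that conventionally
# put the family name before the given name.
FAMILY_NAME_FIRST = {"ja", "ko", "zh", "hu", "vi"}


def display_name(given: str, family: str, language: str) -> str:
    """Return the full name in the order conventional for the language."""
    if language in FAMILY_NAME_FIRST:
        return f"{family} {given}"
    return f"{given} {family}"


print(display_name("Akio", "Morita", "en"))  # Akio Morita
print(display_name("Akio", "Morita", "ja"))  # Morita Akio
```

The important part is that the given name and family name are stored as separate fields, so the order is a presentation decision made per locale rather than something baked into the data.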

Similarly, not everyone uses a ZIP code. Most countries have postcodes, and even then, the format can differ substantially. In Canada, for example, a postcode takes the form X0X 0X0, where X is a letter and 0 is a number. In the United Kingdom, however, a postcode can take the form X00 0XX, XX00 0XX, XX0 0XX or X0 0XX. In Brazil, postcodes take the form 00000-000. Appropriate internationalization creates software that can handle multiple inputs. Even better is when the software can automatically check those inputs to ensure that the right format is used for the right country.
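A rough sketch of per-country postcode checking in Python might look like the following. The patterns are deliberately simplified assumptions based on the formats described above (the UK pattern also admits forms such as X0X 0XX), and `POSTCODE_PATTERNS` and `is_valid_postcode` are hypothetical names; real postal rules have more exceptions than this.

```python
import re

# Simplified, illustrative patterns keyed by country code.
POSTCODE_PATTERNS = {
    "CA": re.compile(r"[A-Z]\d[A-Z] \d[A-Z]\d"),            # X0X 0X0
    "GB": re.compile(r"[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}"),   # X0 0XX to XX00 0XX
    "BR": re.compile(r"\d{5}-\d{3}"),                       # 00000-000
    "US": re.compile(r"\d{5}(-\d{4})?"),                    # ZIP or ZIP+4
}


def is_valid_postcode(country: str, postcode: str) -> bool:
    """Check a postcode against the pattern for its country, if one is known."""
    pattern = POSTCODE_PATTERNS.get(country)
    if pattern is None:
        # Unknown country: accept the input rather than reject a real address.
        return True
    return pattern.fullmatch(postcode.strip().upper()) is not None


print(is_valid_postcode("CA", "K1A 0B1"))    # True
print(is_valid_postcode("GB", "12345"))      # False
print(is_valid_postcode("BR", "01310-100"))  # True
```

Accepting input for countries without a known pattern is a design choice: turning away a valid order is usually worse than storing an unchecked postcode.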

All of these are important aspects of developing software that consumers can relate to and use appropriately. A business that cannot accept orders because its software cannot handle postcodes properly is not going to last in the international market for very long. Even software that gets given names and family names mixed up is going to create distance between the software and its user base. This means that code must be internationalized throughout the development process and not as an afterthought.

As a brief example, Baidu is the number one search engine in China. It reached this position because it can resonate effectively with people within China, because it’s targeted specifically for them. While this is an example of localization rather than internationalization, it’s able to do better than Google because of this targeting and understanding of local cultures, restrictions and, possibly most importantly, government requirements, such as access to user information and, reportedly, censorship. Because Baidu isn’t particularly well internationalized, however, it hasn’t been able to break into any markets outside China. Although this is unlikely to be a concern in a country with more than 1 billion potential users, it does limit potential future growth.

Google, on the other hand, has been able to break into most markets thanks to its internationalized software. Because it’s easily adaptable to a wide variety of locales, it can present interesting information that meets the searcher’s requirements, whether that person is in South Africa, the United States or Russia. In a similar vein, its Android operating system, Google Chrome browser and numerous other products are all effectively internationalized, so they can be easily converted to meet the user’s cultural and personal requirements.

How Does Internationalization (i18n) Affect Developers?

Drilling down into code a little more, it’s clear that there are several good practices that go into reliable and trustworthy internationalization. As an example, around a third of all WordPress downloads are for localized non-English versions. This means that those developing various plug-ins need to take into account localization when building them and ensure their versions are fully internationalized.

This means, for example, that they shouldn’t use PHP variables inside the strings passed to a translation function. Translation tools typically scan the source code and pull out the strings that need to be translated, which in WordPress are marked with __(). If a PHP variable sits inside such a string, the tool pulls the variable out along with the text because it doesn’t know any better. The string built at run time then no longer matches the one that was extracted, and an accidental deletion by a translator can render the entire line of code worthless. This kind of error can be difficult to track down.
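The same pitfall exists outside PHP. The sketch below uses Python rather than WordPress code, with a hypothetical in-memory `FRENCH` dictionary and a hypothetical `translate` function standing in for a real translation catalog and lookup call. It shows why the variable should be substituted after the lookup and not before it.

```python
# Hypothetical in-memory catalog standing in for a real translation file.
FRENCH = {"Hello, {name}": "Bonjour, {name}"}


def translate(message: str, catalog: dict) -> str:
    """Look up a source string; fall back to the original if it is missing."""
    return catalog.get(message, message)


name = "Marie"

# Fragile: the variable is interpolated before the lookup, so the key
# becomes "Hello, Marie", which no catalog contains.
wrong = translate(f"Hello, {name}", FRENCH)

# Robust: look up the constant string, then fill in the placeholder.
right = translate("Hello, {name}", FRENCH).format(name=name)

print(wrong)  # Hello, Marie
print(right)  # Bonjour, Marie
```

A named placeholder also gives translators something readable to move around within the sentence, which matters when the target language needs a different word order.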

In addition, it’s essential to translate phrases rather than individual words. A simplistic example is the different positioning of French and English adjectives. In English, you would say, “He is an orange man.” In French, you would say, “C’est un homme orange” (literally, “he is a man orange”). If you translate word by word, the English word order is carried over and doesn’t sound right to a native French speaker. Other languages, Polish in particular, have different pluralization rules for different numbers. This brings a serious complication to translations because the system must be able to return different word forms for different numbers. Even in English, you use a different form after a single object than you do for a pair of objects (one potato, two potatoes). Even worse, some words are just not translatable, so the translator has to create an approximation that gets the meaning across accurately.
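As a sketch of what “different words for different numbers” means in practice, the Python below selects a plural form by rule. The Polish rule follows the widely used gettext plural formula for that language. The `FORMS` table and the `count_files` function are hypothetical names for this illustration, and the word “file” is simply a convenient example noun.

```python
def plural_index_en(n: int) -> int:
    """English: one form for exactly 1, another for everything else."""
    return 0 if n == 1 else 1


def plural_index_pl(n: int) -> int:
    """Polish: three forms, chosen by the last digits of the number."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1
    return 2


# Each language supplies its rule plus one message per plural form.
FORMS = {
    "en": (plural_index_en, ["{n} file", "{n} files"]),
    "pl": (plural_index_pl, ["{n} plik", "{n} pliki", "{n} plików"]),
}


def count_files(n: int, language: str) -> str:
    """Return the message in the plural form that matches n."""
    index_for, forms = FORMS[language]
    return forms[index_for(n)].format(n=n)


for n in (1, 2, 5, 12, 22):
    print(count_files(n, "en"), "|", count_files(n, "pl"))
```

Note that 22 takes the same Polish form as 2 while 12 does not, which is why a simple “singular or plural” switch in the code cannot be made to work for every language.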

Disambiguation is also essential, particularly for homonyms. A homonym is a word that has multiple meanings, which, unfortunately, describes a great many English words. Context tells us a lot, but when you have multiple strings to be translated, it may not be obvious what that context is. Take the word “comment.” Is it a comment on the site, or are you being asked to make a comment? Getting this wrong in other languages can make a difference to the success of your software. In WordPress, use _x() to attach a context string that tells translators which meaning is intended.
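The idea behind a context string can be sketched in a few lines of Python. This is not WordPress code: the `FRENCH` catalog and the `translate_with_context` function are hypothetical stand-ins that key each translation on both the context and the source word.

```python
# Hypothetical catalog keyed on (context, source string).
FRENCH = {
    ("noun: a remark left by a reader", "Comment"): "Commentaire",
    ("verb: label on a button", "Comment"): "Commenter",
}


def translate_with_context(context: str, message: str, catalog: dict) -> str:
    """Look up a string together with its context; fall back to the original."""
    return catalog.get((context, message), message)


print(translate_with_context("noun: a remark left by a reader", "Comment", FRENCH))
print(translate_with_context("verb: label on a button", "Comment", FRENCH))
```

The same English word produces two different French strings, and the context text doubles as a note to the translator about where the string appears.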

Other languages, such as Java, are relatively easy to internationalize, but even then, Java internationalization is a tricky subject for those not used to the process. That’s why it’s best to start the process correctly from the beginning and use tools designed specifically for internationalization.

All of this also makes it easier for translators to understand and modify your text without affecting how the website, software or app functions.

Internationalization (i18n) Gone Wrong

Bad internationalization (i18n) typically means bad localization. A classic example is an e-commerce site that localizes only its prices while the product descriptions, weights and measures remain in the original language and units. Because a large number of websites are based in the United States, European, Asian and African customers are often given quantities and product descriptions in a language, and in units, that are not their own. For many, pounds, feet, inches and ounces are not easy to convert, so customers leave the website because they don’t understand what they’re being offered. For clothing retailers, the same number can mean vastly different sizes in different countries. A size 10 in the United Kingdom would be a 38 in Europe. A size 38 in the United States, however, would be vastly different: the European size 38 is actually a US size 6. Good coding would allow automatic conversion of data so that it appears in the target language and cultural context, at least for the website’s prime markets, and good coding has to start at the beginning of the development process.
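For weights, the automatic conversion described above could be sketched as follows in Python. The `IMPERIAL_COUNTRIES` set and the `display_weight` function are hypothetical, and treating only the United States as a pounds market is an assumption made to keep the example short.

```python
KG_PER_POUND = 0.45359237  # exact by international definition

# Assumption for this sketch: only these markets are shown pounds.
IMPERIAL_COUNTRIES = {"US"}


def display_weight(pounds: float, country: str) -> str:
    """Show a weight stored in pounds using the unit the market expects."""
    if country in IMPERIAL_COUNTRIES:
        return f"{pounds:.1f} lb"
    return f"{pounds * KG_PER_POUND:.1f} kg"


print(display_weight(10, "US"))  # 10.0 lb
print(display_weight(10, "DE"))  # 4.5 kg
```

A fuller version would also localize the number itself, since many European locales write the decimal separator as a comma.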

In addition, different parts of the world use different date forms. In the United States, January 2 would be written 1/2. In the UK, that would mean February 1. This can make a big difference to delivery dates and could be a big factor as to whether your customer wants to purchase from you.
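The date ambiguity is easy to reproduce. In this Python sketch, `DATE_PATTERNS` and `format_date` are hypothetical names, and the table covers only the two conventions mentioned above plus an unambiguous year-month-day fallback, which is an assumption added for the example.

```python
from datetime import date

# Illustrative patterns only; a real application would rely on locale data.
DATE_PATTERNS = {
    "US": "{m}/{d}/{y}",            # month first
    "GB": "{d}/{m}/{y}",            # day first
    "ISO": "{y}-{m:02d}-{d:02d}",   # unambiguous fallback
}


def format_date(value: date, region: str) -> str:
    """Format a date in the order the region expects."""
    pattern = DATE_PATTERNS.get(region, DATE_PATTERNS["ISO"])
    return pattern.format(d=value.day, m=value.month, y=value.year)


delivery = date(2024, 1, 2)  # January 2
print(format_date(delivery, "US"))  # 1/2/2024
print(format_date(delivery, "GB"))  # 2/1/2024
print(format_date(delivery, "JP"))  # 2024-01-02
```

Storing the date as a real date value and formatting it only at display time is what keeps a delivery promised for January 2 from being read as February 1.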

Partial translations are often the result of bad internationalization as well. In some cases, menus may be untranslated, or essential contact information may be only in English. Similarly, the website may not be able to translate certain portions at all, which is particularly common when untranslatable JPEGs or PNGs are used instead of text. This is quite common with adverts.

Different layouts are also required for different cultures. Typically, a minimalist layout is fine for many countries, but in some, such as Japan, a much denser layout is more common. Good internationalization means that you can present different products, layouts and even colors for different audiences, whereas bad internationalization means you have to use exactly the same layout for everyone.

Bad translation is often an example of bad localization rather than bad internationalization, but it’s still important to know that the two subjects are closely linked.

What Good Internationalization (i18n) Means

Ultimately, good internationalization ensures your software, app or website works across a variety of cultures and target markets. It means that every piece of text should be translatable and that there shouldn’t be any code that relies on text being input in a specific language or alphabet. It should be able to render prices in an appropriate format, dates in a way that makes sense for the reader and variable values, such as names and quantities, so that they read naturally in each language.

Most importantly, good internationalization that’s accomplished with the right software means that you can hand off your program to your translators, safe in the knowledge that they can simply get translating and return it to you without any code changes required. This makes development much easier, bug fixing simpler and updates easier still: if you only have to update the code and not the translation, you save a significant amount of money in the long term.

This article was originally published in THECONVERSATION.COM