Translation platforms cannot replace humans

But they are still astonishingly useful

In the past few months free online translators have suddenly got much better. This may come as a surprise to those who have tried to make use of them in the past. But in November Google unveiled a new version of Translate. The old version, called “phrase-based” machine translation, worked on hunks of a sentence separately, with an output that was usually choppy and often inaccurate.

The new system still makes mistakes, but these are now relatively rare, where once they were ubiquitous. It uses an artificial neural network, linking digital “neurons” in several layers, each one feeding its output to the next layer, in an approach that is loosely modelled on the human brain. Neural-translation systems, like the phrase-based systems before them, are first “trained” by huge volumes of text translated by humans. But the neural version takes each word, and uses the surrounding context to turn it into a kind of abstract digital representation. It then tries to find the closest matching representation in the target language, based on what it has learned before. Neural translation handles long sentences much better than previous versions did.
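The "closest matching representation" idea can be illustrated with a deliberately tiny sketch. It is not how Google Translate works: the three-number vectors below are invented by hand, the French and English word lists are arbitrary, and each word is looked up on its own, whereas a real neural system learns its representations from data and takes the surrounding context into account. The sketch only shows what picking the nearest representation by cosine similarity means.

```python
import math

# Made-up 3-dimensional "representations" (an assumption for illustration).
# Real systems learn much larger vectors from human-translated text.
SOURCE_VECTORS = {
    "chien": (0.9, 0.1, 0.0),
    "chat": (0.1, 0.9, 0.0),
    "maison": (0.0, 0.1, 0.9),
}
TARGET_VECTORS = {
    "dog": (0.8, 0.2, 0.1),
    "cat": (0.2, 0.8, 0.1),
    "house": (0.1, 0.0, 0.9),
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_target_word(source_word):
    """Return the target word whose vector is most similar to the source word's."""
    source_vec = SOURCE_VECTORS[source_word]
    return max(
        TARGET_VECTORS,
        key=lambda word: cosine_similarity(source_vec, TARGET_VECTORS[word]),
    )


print([closest_target_word(w) for w in ("chien", "chat", "maison")])
# prints ['dog', 'cat', 'house']
```

Because the lookup ignores context, a toy like this would stumble on exactly the cases the new systems handle better, such as long sentences or words with several meanings.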

The new Google Translate began by translating eight languages to and from English, most of them European. It is much easier for machines (and humans) to translate between closely related languages. But Google has also extended its neural engine to languages like Chinese (included in the first batch) and, more recently, to Arabic, Hebrew, Russian and Vietnamese, an exciting leap forward for these languages that are both important and difficult. On April 25th Google extended neural translation to nine Indian languages. Microsoft also has a neural system for several hard languages.

Google Translate does still occasionally garble sentences. The introduction to a Haaretz story in Hebrew had text that Google translated as: “According to the results of the truth in the first round of the presidential elections, Macaron and Le Pen went to the second round on May 7. In third place are Francois Peyon of the Right and Jean-Luc of Lanschon on the far left.” If you don’t know what this is about, it is nigh on useless. But if you know that it is about the French election, you can see that the engine has badly translated “samples of the official results” as “results of the truth”. It has also given odd transliterations for (Emmanuel) Macron and (François) Fillon (P and F can be the same letter in Hebrew). And it has done something particularly funny with Jean-Luc Mélenchon’s surname. “Me-” can mean “of” in Hebrew. The system is “dumb”, having no way of knowing that Mr Mélenchon is a French politician. It has merely been trained on lots of text previously translated from Hebrew to English.

Such fairly predictable errors should gradually be winnowed out as the programmers improve the system. But some “mistakes” from neural-translation systems can seem mysterious. Users have found that typing in random characters in languages such as Thai, for example, results in Google producing oddly surreal “translations” like: “There are six sparks in the sky, each with six spheres. The sphere of the sphere is the sphere of the sphere.”

Although this might put a few postmodern poets out of work, neural-translation systems aren’t ready to replace humans any time soon. Literature requires far too supple an understanding of the author’s intentions and culture for machines to do the job. And for critical work—technical, financial or legal, say—small mistakes (of which even the best systems still produce plenty) are unacceptable; a human will at the very least have to be at the wheel to vet and edit the output of automatic systems.

Online translating is of great benefit to the globally curious. Many people long to see what other cultures are reading and talking about, but have no time to learn the languages. Though still finding its feet, the new generation of translation software dangles the promise of being able to do just that.

The New Language Barrier: Zero Digital Content

A language barrier doesn’t have to be something that exists between two people. It can also exist between a person and a machine. In fact, a lack of digital content in thousands of languages is a new kind of language barrier—one that prevents millions of people around the world from getting information.




For literate native English speakers—and even highly proficient nonnatives—there’s no language barrier in the Digital Age.

Approximately 2.5 billion pages on the Internet are in English. (This is about half of the total number of pages online.)

In other words, if you speak English and you can’t find what you’re looking for, then it’s not for lack of content. It’s a case of user error. You may want to brush up on your search skills.

There is so much digital content available in English that one person could never begin to read even a small fraction of it in a lifetime.

For example, the English version of Wikipedia alone has almost 5 million pages. (It’s no surprise, then, that the SEO tool Ahrefs ranks English Wikipedia No. 14 on its list of top websites in the world.)

If you were to read two Wikipedia articles every hour of every day for the next 100 years, you wouldn’t even get through 40% of the available content on the English version of the site.
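The arithmetic behind that claim can be checked with a short sketch. It assumes the figures given above (two articles an hour, around the clock, for 100 years, against roughly 5 million pages) and ignores leap days. The function name is made up for illustration.

```python
def share_of_wikipedia_read(articles_per_hour=2, years=100, total_pages=5_000_000):
    """Return the fraction of pages read at a constant, nonstop reading rate."""
    hours = 24 * 365 * years  # ignores leap days
    articles_read = articles_per_hour * hours
    return articles_read / total_pages


share = share_of_wikipedia_read()
print(f"{share:.1%}")  # prints 35.0%
```

That is 1,752,000 articles in a century, a little over a third of the total and comfortably under the 40% mark.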

Suffice it to say, English speakers have hit the linguistic lottery. No matter what topic they’re interested in, users can find digital content—and lots of it—in English.


If you don’t speak English—or one of several dozen very widely spoken languages—the amount of digital content available to you goes down precipitously.

Even the amount of online content in Korean—which boasts a not-too-shabby 80 million native speakers worldwide—is paltry. It represents just 0.9% of the pages on the Web.

In an article that appeared in The Atlantic called “The Internet Isn’t Available in Most Languages,” Katharine Schwab examines the language barrier that confronts Internet users.

In short, a person who speaks a language in which there is little or no digital content is at a disadvantage.

The piece cites Chichewa, a language spoken by 12 million people (2007 data). Most Chichewa speakers live in Malawi.

Because there are so few articles in Chichewa on the Internet, native speakers can access only a fraction of the content available to speakers of English, Chinese, French, Spanish, German, Arabic, and so on.

Only five percent of the languages spoken in the world today are represented online—that’s about 350 out of about 7,000 living languages.

We can guess why some languages dominate on the Web and others don’t.

Take German. Although German speakers make up only a small fraction of the world’s population, German is all over the Internet.

Germany, Austria, and Switzerland all have developed economies and high literacy rates, of course, so this isn’t surprising. (See our article on Internet users by region and language for more.)



Schwab uses as the basis for her article a report by the Broadband Commission for Digital Development. You can read “The State of Broadband 2015: Broadband as a Foundation for Sustainable Development” for yourself.

The findings? The report is the source of the figure cited earlier: digital content exists for only about 5% of all living languages, or roughly 350 out of 7,000.

The statistic may shock you, but languages are not evenly distributed among the planet’s inhabitants.

Roughly 1 billion people speak Mandarin Chinese, for example. Contrast that figure with the number of speakers of Shelta: about 6,000. It makes sense that one abounds on the Internet and the other doesn’t.

Still, the problem for speakers of rare or less vital languages is very real.

While you might go online to check box scores or watch a cat video, other people in the world may use the Web to learn how to put in a well. The stakes can be high.

Broadband access, then, is not just about having enough money. And it’s not just about having a good Wi-Fi signal, either.

It’s about having online content rich enough to provide information—news, weather, sports, job vacancies, and much more. In this new way of thinking, broadband access is access to the world.

