Google Translate and Wikipedia

The main idea behind Wikipedia was to make knowledge freely available anywhere in the world. Right now, however, most of that information is available only in English. The English version of Wikipedia has 5.5 million articles, making it the largest edition by far, and only about 15 of the 301 editions have more than a million articles. The quality of those articles varies dramatically, with vital content often missing entirely.

Along came Google Translate: Google partnered with the Wikimedia Foundation to provide articles in different languages through translation. The partnership was meant to address the shortcomings of existing machine translation tools while making as many articles as possible available in every edition. The Wikimedia Foundation integrated Google Translate into its content translation tool, which at the time was using open-source translation software. Unfortunately, it didn’t work well for editors working on the non-English editions. The translations created more problems than they solved and prolonged the debate over whether Wikipedia should be using machine translation at all.

Available as a beta feature, the content translation tool allows editors to generate a preview of a new article based on an automated translation from another edition. Used properly, the tool can help editors save considerable time and build out understaffed editions, but used carelessly, it can be disastrous. Global administrators have pointed out terrible translations: “Village pump” in the English version became “bomb the village” when put through machine translation into Portuguese. Responding to the popular perception that ‘Google Translate is flawless’, one administrator said, “Obviously it isn’t. It isn’t meant to be a replacement for knowing the language.”

It is quite surprising to hear claims that, thanks to Artificial Intelligence, Machine Translation has reached “parity” with human translation. Those stories usually refer to narrow, specialized tests of machine translation’s abilities. When the software is deployed in the wild, the limitations of artificial intelligence become glaring. AI translation is shallow: it produces text with surface-level fluency that lacks the deeper meaning of words and sentences, so the content loses its originality and effectiveness. AI systems learn to translate by studying patterns in large bodies of training data, which means they are blind to the nuances of language that appear less frequently, and they lack the common sense of human translators.

Machine translation may never be a viable way to create articles on Wikipedia, simply because it cannot understand complex human phrases that don’t carry over between languages. Smaller projects may always have a lower standard of quality than the English Wikipedia. Quality is relative, and unfinished or poorly written articles are impossible to stamp out completely. In some parts of the world, Wikipedia is seen as untrustworthy, a reputation that shoddy translations of English articles do nothing to help. For these reasons, machine translation will (probably) never be a viable way of submitting articles to Wikipedia.
