

Why AI cannot yet do high-quality translation (and perhaps, never will)

Yehoshua Bar-Hillel, a linguist and machine translation pioneer in the early 1950s, listed three requirements for “machine intelligence”: 

  • the ability to manipulate language;
  • having background knowledge about the world; and
  • reasoning and computing abilities, all at the level of a high school graduate.

He added that the effort needed to meet these prerequisites “would be incomparably greater than that required for putting man on Venus.”

Earlier this year, OpenAI announced that it expects to create an AI “superintelligence” and warned that “the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction.” Gosh! The company added that while superintelligence “seems far off now, we believe it could arrive this decade.”

Back in the 1950s, Yehoshua Bar-Hillel offered a simple sentence to show why he believed high-quality machine (or AI) translation was only a dream:

The box was in the pen.

Here is the context: “Little John was looking for his toy box. Finally he found it. The box was in the pen. John was very happy.”

This is an example of “semantic ambiguity”. Wikipedia defines this term as follows: “an expression is semantically ambiguous when it can have multiple meanings. The higher the amount of synonyms a word has, the higher the degree of ambiguity. Like other kinds of ambiguity, semantic ambiguities are often clarified by context or by prosody. One’s comprehension of a sentence in which a semantically ambiguous word is used is strongly influenced by the general structure of the sentence.”

In this context, ‘pen’ has a number of meanings, two of which are closely related. A pen can be a writing instrument, an enclosure for animals, or a ‘playpen’ for children. In this case, John found his toy box in his playpen. Any human reader or translator with sufficient knowledge of English would understand this instantly from the context. But for a machine, which has no life experience and no eyes with which to see the world and how we operate within it, this is hard, if not impossible, to process.

How it started…

For earlier incarnations of machine translation based on natural language processing (NLP), this was indeed a problem. Google Translate (which is no longer a state-of-the-art language model) failed this test in a number of languages, translating “the pen” as a writing instrument. Bard, Google’s alternative to ChatGPT, also failed, translating “the pen” as an animal enclosure. The tester then asked Bard for examples of ‘pen’ meaning an enclosure in which small children play. Bard provided several, but it was unable to ‘join the dots’ and apply them to the context, so the text remained ambiguous. (For details of these tests, read the full article in Forbes.)

In 1955, Bar-Hillel wrote:

What makes an intelligent human reader grasp this meaning so unhesitatingly is…. his knowledge that the relative sizes of pens, in the sense of writing implements, toy boxes, and pens, in the sense of playpens, are such that when someone writes under ordinary circumstances and in something like the given context, “The box was in the pen,” he almost certainly refers to a playpen and most certainly not to a writing pen.

This knowledge is not at the disposal of the electronic computer and none of the dictionaries or programs for the elimination of polysemy puts this knowledge at its disposal.

Yehoshua Bar-Hillel

Put simply, AI doesn’t have, and cannot have, the life knowledge of a three-year-old child.

How it’s going…

We are led to believe that the latest translation models, Large Language Models (LLMs), which are trained on vast amounts of data harvested mostly from the internet, have solved this problem of context and semantic ambiguity. But this is not the first time such promises have been made. By 2006, statistical language understanding and neural networks were seen as the future of machine translation, and once again we were assured that high-quality AI translation was just around the corner.

In 2010, Hector Levesque at the University of Toronto developed a new test, now known as the Winograd Schema Challenge, this time based on ambiguous pronouns. For example:

The trophy doesn’t fit in the brown suitcase because it is too large. What is too large?

A. The trophy

B. The suitcase

In 2016, the most successful of the six AI programs that took part in Levesque’s test scored 58%. Soon afterwards, the transformer architecture (the T in ChatGPT) enabled networks to better assess the relationships between words that are far apart in a text. This was a breakthrough for machine translation: in 2019, the most successful AI scored 90% on Levesque’s test, and in 2020, GPT-3 scored 88%.

This year, researcher and journalist Gil Press ran his own test with ChatGPT, presenting it with the following sentence and question:

The sack of potatoes had been placed below the bag of flour, so it had to be moved first.

What had to be moved first?

ChatGPT responded:

In the given sentence, “it” refers to the sack of potatoes. The sentence “The sack of potatoes had been placed below the bag of flour, so it had to be moved first” means that the sack of potatoes had to be moved first before accessing or moving the bag of flour, which was placed on top of it.


Next, Press swapped the positions of the objects, changing “below” to “above”. ChatGPT responded:

“The sack of potatoes had been placed above the bag of flour, so it had to be moved first.”

This means that the bag of flour had to be moved first because it was below the sack of potatoes, and you needed to move it out of the way before accessing the sack of potatoes placed above it.


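In both cases, ChatGPT chose the object at the bottom, which is precisely the one that cannot be moved until the other is out of the way. Again, we see a fundamental misunderstanding of space and motion, and of how objects move and are manipulated.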

What’s coming…?

In conclusion, the latest LLMs are undoubtedly vastly better than older NLP machine translation models, and can often produce accurate translations. However, there is always the possibility that they will completely fail to understand the context of the text provided.

Perhaps the problem is the basic premise that machines will, one day, learn how to function like a human brain. Machines are not human; they do not process information in the way we do. Perhaps, rather than trying to ‘replace’ human workers, we should be looking at ways in which AI can augment and support our own efforts.

Find out more: What is Computer Assisted Translation?

Find out more: Machine Translation: Tips And Tricks
