Has Statistical Machine Translation had its time?
With the news that Facebook has now fully transitioned to a neural network-based A.I. system (Neural Machine Translation), many people are asking whether this is the end of days for Statistical MT.
“Of the 1.6 billion people who actively use Facebook, more than half…don’t speak English at all. Most of them don’t speak each other’s language,” said Alan Packer, Engineering Director and head of the Language Technology team at Facebook. And yet, English is still predominantly the language of the web. Most published text on the internet is in English and, as a result, content producers are frequently biased toward English.
Facebook’s users are from multiple countries and speak multiple languages. Collectively, they have generated more than two trillion posts and comments, a total that grows by over a billion each day. Packer commented, “Pretty clearly, we’re not going to solve this problem with a roomful or even a building-full of human translators.” To have even “a hope of solving this problem, we need A.I.; we need automation.”
Statistical Machine Translation
For a long time, most free web translation systems, such as those from Facebook and Google, used statistical, or phrase-based, MT. In other words, the machine learnt frequently used phrases and vocabulary from existing translations and matched them against new content it processed. These systems are effective translators, but as Packer puts it, “They don’t sound like they came from a human. They’re not natural, they don’t flow well.”
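To make the phrase-matching idea concrete, here is a minimal sketch in Python. It is a toy, not how Facebook or Google built their systems: the phrase table entries and the `translate` function are invented for illustration, and a real statistical system would score many competing phrase candidates with probabilities learnt from data, rather than simply taking the longest match.

```python
# Toy phrase table (English -> Spanish). These entries are hypothetical
# examples, not data from any real translation system.
PHRASE_TABLE = {
    ("good", "morning"): "buenos días",
    ("thank", "you"): "gracias",
    ("good",): "bueno",
    ("friend",): "amigo",
    ("my",): "mi",
}

# Longest phrase (in words) that the table knows about.
MAX_PHRASE_LEN = max(len(key) for key in PHRASE_TABLE)


def translate(sentence):
    """Greedily replace the longest known phrase at each position."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in PHRASE_TABLE:
                output.append(PHRASE_TABLE[chunk])
                i += n
                break
        else:
            # Unknown word: pass it through untranslated.
            output.append(words[i])
            i += 1
    return " ".join(output)


print(translate("Good morning my friend"))
print(translate("Thank you good friend"))
```

The second call shows the weakness Packer describes: each phrase is matched on its own, so the output comes out word-for-word correct but stilted, because nothing in the lookup considers how the sentence flows as a whole.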
Neural Machine Translation
Certainly, Neural Machine Translation (NMT) is the new buzzword, but it really is a revolutionary concept in MT technology. NMT uses A.I. to learn speech patterns and syntax, as well as vocabulary, so that it can predict the likelihood of a sequence of words. The problem is that these platforms require enormous processing power, because neural networks are loosely modelled on the way a brain processes information.
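What “predicting the likelihood of a sequence of words” means can be sketched in a few lines of Python. Note the swap: this is a simple count-based bigram model with add-one smoothing, not a neural network, and the three-sentence corpus and function names are made up for illustration. An NMT system learns far richer patterns, but the underlying question is the same: how probable is this string of words?

```python
from collections import Counter

# Tiny made-up training corpus; a real system would learn from
# millions of sentences.
CORPUS = [
    "i like the new phone",
    "i like the movie",
    "the movie was great",
]


def train_bigrams(sentences):
    """Count single words and adjacent word pairs, with a start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def sequence_probability(sentence, unigrams, bigrams):
    """Multiply the smoothed probability of each word given the previous one."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split()
    probability = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one smoothing so unseen pairs get a small, non-zero probability.
        probability *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
    return probability


unigrams, bigrams = train_bigrams(CORPUS)
natural = sequence_probability("i like the movie", unigrams, bigrams)
scrambled = sequence_probability("movie the like i", unigrams, bigrams)
print(natural > scrambled)
```

The natural word order scores higher than the scrambled one because its word pairs were seen in the training sentences. A translation system can use this kind of score to prefer output that sounds like something a person would actually say.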
Google launched its own NMT platform in 2016, and people worldwide noticed the dramatic improvements in its service as a result. In fact, in 2016 Google’s chief executive, Sundar Pichai, said Google was going to be “A.I. first.”
Now, Facebook has taken the same route. They have attempted to take this A.I. technology and apply it, at scale, to Facebook products.
Learning to speak Zoomer with NMT
Douglas Adams famously wrote, “Numbers written on restaurant bills within the confines of restaurants do not follow the same mathematical laws as numbers written on any other pieces of paper in any other parts of the Universe.” Bistromathics was his response.
The language of Facebook is not like the language anywhere else. It obeys different rules, and frequently disregards those rules too! Facebook language is über informal. Users throw shade, they cushion, they may even speak cursive! They misspell – often quite deliberately – maybe your mierdas touch got you pluto’d? Nah, that’ll happen in Nevuary. (I know, I know, I should have typed that in aLtErNaTiNg CaPs.)
Most NMT systems are trained largely on academic data sets and data mined from the Internet. Packer explains that they are “looking for parallel corpora,” that is, “the same document in multiple languages on the web.”
The problem is that this data usually comes from sources such as government documents, conference notes, or user manuals. Packer commented, “It’s great that I can find my dishwasher manual online so I can figure out how to get the lemon seeds out of the ‘spinny’ thing. [But] it turns out the language that’s in that dishwasher manual has very little to do with the language people are using to talk to each other on Facebook.”
Neural network-based MT can learn idiomatic expressions and metaphors. It can find appropriate cultural equivalents rather than translating literally, which, strangely, can sometimes produce text that makes no sense at all!
In short, NMT is great news for all of us with a multinational or multicultural social group, or an interest in different cultures or languages. It’s great for Googling and for Facebook users. But it’s also great for bots. Be careful in the soft hours, zoomers! You never know what you’re talking to. Learn to lfg!