Introduction to Machine Translation


A brief introduction to the different types of Machine Translation (with help from Wikipedia):


Statistical machine translation (SMT) – this approach relies on statistical analysis of large bilingual corpora to train stochastic models that describe the mapping between a source language (SL) and a target language (TL).
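
To make the idea of learning a mapping from a bilingual corpus concrete, here is a minimal sketch of one classic SMT building block, IBM Model 1, which estimates word translation probabilities with expectation-maximization. The three-sentence German-English corpus, the function name train_ibm_model1 and the iteration count are illustrative assumptions; real systems train on millions of sentence pairs and combine several models.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=20):
    """Estimate t(target_word | source_word) with EM (IBM Model 1, no NULL word)."""
    target_vocab = {w for _, tgt in pairs for w in tgt}
    # Start from a uniform distribution over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(target_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of each source word
        # E-step: distribute each target word over the source words
        # of the same sentence, in proportion to the current t values.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: re-normalise the expected counts into probabilities.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


# Toy parallel corpus (hypothetical data, for illustration only).
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

model = train_ibm_model1(corpus)
for tgt_word, src_word in [("house", "haus"), ("the", "haus"), ("book", "buch")]:
    print(f"t({tgt_word} | {src_word}) = {model[(tgt_word, src_word)]:.3f}")
```

Although no pair is labelled word by word, the co-occurrence statistics alone push the probability of "house" given "haus" above that of "the" given "haus", which is the kind of mapping the definition above refers to.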


Rule-based machine translation (RBMT) – systems built on linguistic information about the source and target languages, drawn mainly from (unilingual, bilingual or multilingual) dictionaries and grammars that cover the main semantic, morphological, and syntactic regularities of each language. Given input sentences in a source language, an RBMT system transforms them into output sentences in a target language on the basis of morphological, syntactic, and semantic analysis of both languages involved in the translation task.
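
The analysis, transfer and generation steps described above can be sketched with a toy English-to-Spanish example. Everything here is a simplifying assumption made for illustration: the five-word LEXICON, the single reordering rule (adjective before noun becomes noun before adjective) and the function names. A real RBMT system would also handle morphology such as gender and number agreement, which this sketch sidesteps by using only masculine singular nouns.

```python
# Hypothetical bilingual dictionary: English word -> (part of speech, Spanish word).
LEXICON = {
    "the": ("DET", "el"),
    "red": ("ADJ", "rojo"),
    "big": ("ADJ", "grande"),
    "car": ("NOUN", "coche"),
    "dog": ("NOUN", "perro"),
}


def analyze(sentence):
    """Analysis: tag each source word with its part of speech."""
    return [(word, LEXICON[word][0]) for word in sentence.lower().split()]


def transfer(tagged):
    """Transfer: apply a syntactic rule, ADJ NOUN -> NOUN ADJ."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return out


def generate(tagged):
    """Generation: replace each source word with its target-language entry."""
    return " ".join(LEXICON[word][1] for word, _ in tagged)


def translate(sentence):
    return generate(transfer(analyze(sentence)))


print(translate("the red car"))
print(translate("the big dog"))
```

Every piece of translation knowledge in this sketch is written by hand, which is why building and maintaining rule sets for a full language pair is costly.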


The demand for SMT has increased significantly over the past few years, because it is more effective than RBMT in terms of cost and time. A further advantage of SMT is the ready availability of platforms and algorithms. This means that much of the work of building a corpus and training a model may already be done, and existing resources can often be obtained far more cheaply than building them from scratch.


Neural machine translation (NMT) – an approach to machine translation that uses a large artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model. NMT models require only a fraction of the memory needed by traditional statistical machine translation (SMT) models. Furthermore, unlike conventional translation systems, all parts of the neural translation model are trained jointly (end-to-end) to maximize translation performance.
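
One common way to write down "predicting the likelihood of a sequence of words" is the chain-rule factorization below. The notation is an assumption introduced here for illustration: x is the source sentence, y_1, ..., y_T are the target words, theta denotes all network parameters, and D is the parallel training corpus.

```latex
% Likelihood of a target sentence y given a source sentence x,
% factorized one word at a time:
P(y \mid x; \theta) = \prod_{t=1}^{T} P\left(y_t \mid y_1, \dots, y_{t-1}, x; \theta\right)

% End-to-end training: choose the parameters that maximize the
% log-likelihood of the reference translations in the corpus D:
\theta^{*} = \arg\max_{\theta} \sum_{(x, y) \in D} \log P(y \mid x; \theta)
```

Because a single set of parameters appears in both expressions, every part of the network is adjusted together during training, which is what "trained jointly (end-to-end)" refers to.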