The term machine translation (MT) refers to the automatic translation of text
from one natural language into another. The ideal aim of machine translation
systems is to produce the best possible translation without human assistance.
Every machine translation system requires translation programs, supported by
automated dictionaries and grammars.
The translation quality of machine translation systems can be improved by
pre-editing the input. Pre-editing means adjusting the input by marking
prefixes, suffixes, clause boundaries, etc. Translation quality can also be
improved by controlling the vocabulary. The output of machine translation
should be post-edited to correct any remaining errors. Post-editing is
required especially for health-related information.
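The two pre-editing ideas above, marking clause boundaries and controlling
the vocabulary, can be sketched as follows. This is a minimal illustration
only: the vocabulary list, the " | " boundary marker and the function names
are assumptions made for the example, not part of any particular MT system.

```python
import re

# Hypothetical controlled vocabulary; a real system would use a far larger list.
CONTROLLED_VOCABULARY = {
    "take", "one", "tablet", "after", "each", "meal", "with", "water",
}


def mark_clause_boundaries(text):
    """Insert an explicit ' | ' marker after every comma or semicolon."""
    return re.sub(r"\s*([,;])\s*", r"\1 | ", text)


def out_of_vocabulary(text):
    """Return the words that are not in the controlled vocabulary, in order."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word not in CONTROLLED_VOCABULARY]


def pre_edit(text):
    """Return the marked-up text and the words a human editor should replace."""
    return mark_clause_boundaries(text), out_of_vocabulary(text)


marked, flagged = pre_edit("Take one tablet after each meal, with plenty of water")
print(marked)   # Take one tablet after each meal, | with plenty of water
print(flagged)  # ['plenty', 'of']
```

The flagged words would be rephrased by the pre-editor before the text is
passed to the translation system.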
TYPES OF MACHINE TRANSLATION SYSTEMS
Machine translation systems that produce translations between only two
particular languages are called bilingual systems, and those that produce
translations for any given pair of languages are called multilingual systems.
Multilingual systems may be either uni-directional or bi-directional.
Bi-directional multilingual systems are preferred, as they can translate from
any given language to any other given language and vice versa.
[Figure: Machine Translation Pyramid]
DIRECT MACHINE TRANSLATION APPROACH
The direct translation approach is the oldest and least popular approach.
Machine translation systems that use this approach translate a language,
called the source language (SL), directly into another language, called the
target language (TL). Direct translation systems are basically bilingual and
uni-directional. The approach needs only a little syntactic and semantic
analysis: SL analysis is oriented specifically to the production of
representations appropriate for one particular TL.
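A minimal sketch of the direct approach, assuming a toy English-to-Spanish
system. The dictionary entries, the word-class sets and the single reordering
rule are hypothetical and chosen only for illustration. The only analysis
performed is the one this particular TL needs (Spanish usually places the
adjective after the noun), which is why such a system is tied to one language
pair and one direction.

```python
# Toy English-to-Spanish bilingual dictionary (hypothetical entries).
BILINGUAL_DICT = {"the": "el", "black": "negro", "cat": "gato", "sleeps": "duerme"}

# Minimal word-class information, kept only because the reordering rule needs it.
ADJECTIVES = {"black"}
NOUNS = {"cat"}


def direct_translate(sentence):
    """Translate word for word, swapping ADJECTIVE NOUN into NOUN ADJECTIVE."""
    words = sentence.lower().split()
    reordered = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            # TL-specific local reordering
            reordered.extend([words[i + 1], words[i]])
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    # Words missing from the dictionary are passed through unchanged.
    return " ".join(BILINGUAL_DICT.get(word, word) for word in reordered)


print(direct_translate("The black cat sleeps"))  # el gato negro duerme
```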
INTERLINGUA APPROACH
The interlingua approach is intended to translate SL texts into more than one
target language. Translation is from the SL to an intermediate form called
the interlingua (IL) and then from the IL to the TL. The interlingua may be an
artificial language or an auxiliary language like Esperanto with a universal
vocabulary. The interlingua approach requires complete resolution of all
ambiguities in the SL text.
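The two-step SL to IL to TL flow can be sketched as below. The concept labels
(CAT, SLEEP, ...), the frame layout and the tiny lexicons are all hypothetical
stand-ins for a real interlingua. The point of the sketch is that one analysis
step feeds any number of TL generators, and that analysis fails outright if a
word cannot be mapped to a concept, mirroring the requirement that all SL
ambiguities be resolved.

```python
# Hypothetical SL (English) lexicon mapping words to language-neutral concepts.
EN_TO_CONCEPT = {"cat": "CAT", "dog": "DOG", "sleeps": "SLEEP", "eats": "EAT"}

# One hypothetical generation lexicon per target language.
GENERATORS = {
    "es": {"CAT": "el gato", "DOG": "el perro", "SLEEP": "duerme", "EAT": "come"},
    "fr": {"CAT": "le chat", "DOG": "le chien", "SLEEP": "dort", "EAT": "mange"},
}


def analyse(sentence):
    """SL to IL: map a 'the <noun> <verb>' sentence to a language-neutral frame."""
    words = [word for word in sentence.lower().split() if word != "the"]
    if len(words) != 2:
        raise ValueError("this toy analyser only handles 'the <noun> <verb>'")
    agent, predicate = words
    # A KeyError here means the word could not be resolved to a concept.
    return {"agent": EN_TO_CONCEPT[agent], "predicate": EN_TO_CONCEPT[predicate]}


def generate(frame, target_language):
    """IL to TL: realise the frame using the lexicon of the chosen TL."""
    lexicon = GENERATORS[target_language]
    return f"{lexicon[frame['agent']]} {lexicon[frame['predicate']]}"


frame = analyse("The cat sleeps")
print(frame)                  # {'agent': 'CAT', 'predicate': 'SLEEP'}
print(generate(frame, "es"))  # el gato duerme
print(generate(frame, "fr"))  # le chat dort
```

Adding a further target language in this sketch only requires a new entry in
GENERATORS; the analysis step is left untouched.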
TRANSFER APPROACH
Unlike the interlingua approach, which has two stages, the transfer approach
involves three stages. In the first stage, SL texts are converted into
abstract SL-oriented representations. In the second stage, the SL-oriented
representations are converted into equivalent TL-oriented representations.
The final texts are generated in the third stage. In the transfer approach,
complete resolution of the ambiguities of the SL text is not required; only
the ambiguities inherent in the language itself are tackled. Three types of
dictionaries are required: SL dictionaries, TL dictionaries and a bilingual
transfer dictionary. Transfer systems have separate grammars for SL analysis,
for TL generation and for the transformation of SL structures into equivalent
TL forms.
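The three stages and the three dictionaries can be sketched as follows,
again for a toy English-to-Spanish pair. Every dictionary entry, the
(lemma, part-of-speech) representation and the single structural transfer
rule are assumptions made for the example; real transfer systems use much
richer representations and grammars.

```python
# SL dictionary: surface word -> (lemma, part of speech). Hypothetical entries.
SL_DICT = {
    "the": ("the", "DET"),
    "black": ("black", "ADJ"),
    "cat": ("cat", "NOUN"),
    "sleeps": ("sleep", "VERB"),
}

# Bilingual transfer dictionary: SL lemma -> TL lemma.
TRANSFER_DICT = {"the": "el", "black": "negro", "cat": "gato", "sleep": "dormir"}

# TL dictionary: (TL lemma, part of speech) -> inflected surface form.
TL_DICT = {("dormir", "VERB"): "duerme"}


def analyse(sentence):
    """Stage 1: SL text -> SL-oriented representation."""
    return [SL_DICT[word] for word in sentence.lower().split()]


def transfer(sl_repr):
    """Stage 2: SL-oriented -> TL-oriented representation."""
    tl_items = [(TRANSFER_DICT[lemma], pos) for lemma, pos in sl_repr]
    tl_repr = []
    i = 0
    while i < len(tl_items):
        # Structural transfer rule: ADJ NOUN becomes NOUN ADJ.
        if (i + 1 < len(tl_items)
                and tl_items[i][1] == "ADJ" and tl_items[i + 1][1] == "NOUN"):
            tl_repr.extend([tl_items[i + 1], tl_items[i]])
            i += 2
        else:
            tl_repr.append(tl_items[i])
            i += 1
    return tl_repr


def generate(tl_repr):
    """Stage 3: TL-oriented representation -> TL text."""
    # Fall back to the bare lemma when no inflected form is listed.
    return " ".join(TL_DICT.get(item, item[0]) for item in tl_repr)


def translate(sentence):
    return generate(transfer(analyse(sentence)))


print(translate("The black cat sleeps"))  # el gato negro duerme
```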
EMPIRICAL MACHINE TRANSLATION APPROACH
The empirical approach is an emerging one that uses large amounts of raw data
in the form of parallel corpora. The raw data consists of texts and their
translations. Example-based MT, analogy-based MT, memory-based MT, and
case-based MT are techniques that use the empirical approach. Basically, all
these techniques use a corpus or database of translated examples. Statistical
machine translation is also corpus-based, but differs in that it depends on
statistical modelling of the word order of the target language and of
source-target word equivalences. Statistical machine translation
automatically learns lexical and structural preferences from corpora.
Statistical models offer a good solution to the ambiguity problem. They are
robust and work well even in the presence of errors and new data. IBM
researchers pioneered the first statistical approach to machine translation
in the 1980s. The IBM group relied on the source-channel approach, a
framework for combining a word-based translation model and a language model.
The translation model ensures that the machine translation system produces
target hypotheses corresponding to the source sentence. The language model
favours grammatically correct output.
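In the source-channel formulation, the system picks the target sentence e
that maximises P(e) * P(f | e) for a source sentence f, where P(f | e) comes
from the translation model and P(e) from the language model. The sketch below
shows how the two models combine when ranking candidate translations. All
probabilities are made-up toy values, the candidate list and word alignments
are supplied by hand, and the function names are hypothetical; a real system
would estimate the models from parallel corpora and search over candidates
itself.

```python
import math

# Toy word-based translation model: P(source word | target word).
TRANSLATION_MODEL = {("el", "the"): 0.9, ("gato", "cat"): 0.8, ("negro", "black"): 0.7}

# Toy bigram language model over the target language: P(word | previous word).
BIGRAM_LM = {
    ("<s>", "the"): 0.5,
    ("the", "black"): 0.2,
    ("black", "cat"): 0.3,
    ("the", "cat"): 0.3,
    ("cat", "black"): 0.01,
}

FLOOR = 1e-6  # small probability assigned to unseen events


def translation_logprob(source, target, alignment):
    """log P(f | e); alignment[i] is the target index aligned to source word i."""
    return sum(
        math.log(TRANSLATION_MODEL.get((word, target[alignment[i]]), FLOOR))
        for i, word in enumerate(source)
    )


def language_logprob(target):
    """log P(e) under the bigram model."""
    total, previous = 0.0, "<s>"
    for word in target:
        total += math.log(BIGRAM_LM.get((previous, word), FLOOR))
        previous = word
    return total


def decode(source_sentence, candidates):
    """Return the candidate that maximises log P(e) + log P(f | e)."""
    source = source_sentence.split()

    def score(candidate):
        target_sentence, alignment = candidate
        target = target_sentence.split()
        return translation_logprob(source, target, alignment) + language_logprob(target)

    return max(candidates, key=score)[0]


CANDIDATES = [
    ("the cat black", [0, 1, 2]),  # word-for-word order
    ("the black cat", [0, 2, 1]),  # same words, English word order
]
print(decode("el gato negro", CANDIDATES))  # the black cat
```

Both candidates receive the same translation-model score because they contain
the same word pairs; the language model is what separates them, which
illustrates its role in steering the output toward well-formed target text.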