- Great Painters
- Accounting
- Fundamentals of Law
- Marketing
- Shorthand
- Concept Cars
- Videogames
- The World of Sports

- Blogs
- Free Software
- Google
- My Computer

- PHP Language and Applications
- Wikipedia
- Windows Vista

- Education
- Masterpieces of English Literature
- American English

- English Dictionaries
- The English Language

- Medical Emergencies
- The Theory of Memory
- The Beatles
- Dances
- Microphones
- Musical Notation
- Music Instruments
- Batteries
- Nanotechnology
- Cosmetics
- Diets
- Vegetarianism and Veganism
- Christmas Traditions
- Animals

- Fruits And Vegetables


  1. Adobe Reader
  2. Adware
  3. Altavista
  4. AOL
  5. Apple Macintosh
  6. Application software
  7. Arrow key
  8. Artificial Intelligence
  9. ASCII
  10. Assembly language
  11. Automatic translation
  12. Avatar
  13. Babylon
  14. Bandwidth
  15. Bit
  16. BitTorrent
  17. Black hat
  18. Blog
  19. Bluetooth
  20. Bulletin board system
  21. Byte
  22. Cache memory
  23. Celeron
  24. Central processing unit
  25. Chat room
  26. Client
  27. Command line interface
  28. Compiler
  29. Computer
  30. Computer bus
  31. Computer card
  32. Computer display
  33. Computer file
  34. Computer games
  35. Computer graphics
  36. Computer hardware
  37. Computer keyboard
  38. Computer networking
  39. Computer printer
  40. Computer program
  41. Computer programmer
  42. Computer science
  43. Computer security
  44. Computer software
  45. Computer storage
  46. Computer system
  47. Computer terminal
  48. Computer virus
  49. Computing
  50. Conference call
  51. Context menu
  52. Creative commons
  53. Creative Commons License
  54. Creative Technology
  55. Cursor
  56. Data
  57. Database
  58. Data storage device
  59. Debuggers
  60. Demo
  61. Desktop computer
  62. Digital divide
  63. Discussion groups
  64. DNS server
  65. Domain name
  66. DOS
  67. Download
  68. Download manager
  69. DVD-ROM
  70. DVD-RW
  71. E-mail
  72. E-mail spam
  73. File Transfer Protocol
  74. Firewall
  75. Firmware
  76. Flash memory
  77. Floppy disk drive
  78. GNU
  79. GNU General Public License
  80. GNU Project
  81. Google
  82. Google AdWords
  83. Google bomb
  84. Graphics
  85. Graphics card
  86. Hacker
  87. Hacker culture
  88. Hard disk
  89. High-level programming language
  90. Home computer
  91. HTML
  92. Hyperlink
  93. IBM
  94. Image processing
  95. Image scanner
  96. Instant messaging
  97. Instruction
  98. Intel
  99. Intel Core 2
  100. Interface
  101. Internet
  102. Internet bot
  103. Internet Explorer
  104. Internet protocols
  105. Internet service provider
  106. Interoperability
  107. IP addresses
  108. IPod
  109. Joystick
  110. JPEG
  111. Keyword
  112. Laptop computer
  113. Linux
  114. Linux kernel
  115. Liquid crystal display
  116. List of file formats
  117. List of Google products
  118. Local area network
  119. Logitech
  120. Machine language
  121. Mac OS X
  122. Macromedia Flash
  123. Mainframe computer
  124. Malware
  125. Media center
  126. Media player
  127. Megabyte
  128. Microsoft
  129. Microsoft Windows
  130. Microsoft Word
  131. Mirror site
  132. Modem
  133. Motherboard
  134. Mouse
  135. Mouse pad
  136. Mozilla Firefox
  137. Mp3
  138. MPEG
  139. MPEG-4
  140. Multimedia
  141. Musical Instrument Digital Interface
  142. Netscape
  143. Network card
  144. News ticker
  145. Office suite
  146. Online auction
  147. Online chat
  148. Open Directory Project
  149. Open source
  150. Open source software
  151. Opera
  152. Operating system
  153. Optical character recognition
  154. Optical disc
  155. output
  156. PageRank
  157. Password
  158. Pay-per-click
  159. PC speaker
  160. Peer-to-peer
  161. Pentium
  162. Peripheral
  163. Personal computer
  164. Personal digital assistant
  165. Phishing
  166. Pirated software
  167. Podcasting
  168. Pointing device
  169. POP3
  170. Programming language
  171. QuickTime
  172. Random access memory
  173. Routers
  174. Safari
  175. Scalability
  176. Scrollbar
  177. Scrolling
  178. Scroll wheel
  179. Search engine
  180. Security cracking
  181. Server
  182. Simple Mail Transfer Protocol
  183. Skype
  184. Social software
  185. Software bug
  186. Software cracker
  187. Software library
  188. Software utility
  189. Solaris Operating Environment
  190. Sound Blaster
  191. Soundcard
  192. Spam
  193. Spamdexing
  194. Spam in blogs
  195. Speech recognition
  196. Spoofing attack
  197. Spreadsheet
  198. Spyware
  199. Streaming media
  200. Supercomputer
  201. Tablet computer
  202. Telecommunications
  203. Text messaging
  204. Trackball
  205. Trojan horse
  206. TV card
  207. Unicode
  208. Uniform Resource Identifier
  209. Unix
  210. URL redirection
  211. USB flash drive
  212. USB port
  213. User interface
  214. Vlog
  215. Voice over IP
  216. Warez
  217. Wearable computer
  218. Web application
  219. Web banner
  220. Web browser
  221. Web crawler
  222. Web directories
  223. Web indexing
  224. Webmail
  225. Web page
  226. Website
  227. Wiki
  228. Wikipedia
  229. WIMP
  230. Windows CE
  231. Windows key
  232. Windows Media Player
  233. Windows Vista
  234. Word processor
  235. World Wide Web
  236. Worm
  237. XML
  238. X Window System
  239. Yahoo
  240. Zombie computer

This article is from:

All text is available under the terms of the GNU Free Documentation License: 

Machine translation

From Wikipedia, the free encyclopedia

(Redirected from Automatic translation)

Machine translation, sometimes referred to by the acronym MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its basic level, MT performs simple substitution of atomic words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.

Current machine translation software often allows for customisation by domain or profession (such as weather reports) — improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows then that machine translation of government and legal documents more readily produces usable output than conversation or less standardised text.

Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators, and in some cases can even produce output that can be used "as is". However, current systems are unable to produce output of the same quality as a human translator, particularly where the text to be translated uses casual language.


The translation process, whether for translation, can be stated simply as:

  1. Decoding the meaning of the source text, and
  2. Re-encoding this meaning in the target language.

Behind this simple procedure there lies a complex cognitive operation. For example, to decode the meaning of the source text in its entirety, the translator must interpret and analyse all the features of the text, a process which requires in-depth knowledge of both the grammar, semantics, syntax, idioms, and the like of the source language, as well as the culture of its speakers. The translator needs the same in-depth knowledge to re-encode the meaning in the target language.

Therein lies the challenge in machine translation: how to program a computer to "understand" a text as a person does and also to "create" a new text in the target language that "sounds" as if it has been written by a person.

This problem can be tackled in a number of ways.


Pyramid showing comparative depths of intermediary representation, interlingual machine translation at the peak, followed by transfer-based, then direct translation.
Pyramid showing comparative depths of intermediary representation, interlingual machine translation at the peak, followed by transfer-based, then direct translation.

Machine translation can use a method based on linguistic rules, which means that words will be translated in a linguistic way — the most suitable (orally speaking) words of the target language will replace the ones in the source language.

It is often argued that the success of machine translation requires the problem of natural language understanding to be solved first.

Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated. According to the nature of the intermediary representation, an approach is described as interlingual machine translation or transfer-based machine translation. These methods require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.

Given enough data, machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what is written by the other native speaker. The difficulty is getting enough data of the right kind to support the particular method. For example, the large multilingual corpus of data needed for statistical methods to work is not necessary for the grammar-based methods. But then, the grammar methods need a skilled linguist to carefully design the grammar that they use.

To translate between closely related languages, a technique referred to as shallow-transfer machine translation may be used.

Dictionary-based machine translation

Main article: Dictionary-based machine translation

Machine translation can use a method based on dictionary entries, which means that the words will be translated as a dictionary does — word by word, usually without much correlation of meaning between them.

Statistical machine translation

Main article: Statistical machine translation

Statistical machine translation tries to generate translations using statistical methods based on bilingual text corpora, such as the Canadian Hansard corpus, the English-French record of the Canadian parliament and EUROPARL, the record of the European Parliament. Where such corpora are available, impressive results can be achieved translating texts of a similar kind, but such corpora are still very rare. The first statistical machine translation software was CANDIDE from IBM. Google currently uses SYSTRAN, but is working on a statistical translation method for most of their machine translation in the future. Recently, they improved their translation capabilities by inputting approximately 200 billion words from United Nations materials to train their system. Accuracy of the translation has improved drastically since. [1]

Example-based machine translation

Main article: Example-based machine translation

Example-based machine translation (EBMT) approach is often characterised by its use of a bilingual corpus as its main knowledge base, at run-time. It is essentially a translation by analogy and can be viewed as an implementation of case-based reasoning approach of machine learning.

Interlingual machine translation

Main article: Interlingual machine translation

Interlingual machine translation is one instance of rule-based machine translation approaches. According to this approach, the source language, ie. the text to be translated is transformed into an interlingual, ie. source/target language independent representation. The target language is then generated out of the interlingua.

Major issues

Word sense disambiguation

Main article: Word sense disambiguation

Word sense disambiguation concerns finding a suitable translation when a word can have more than one meaning. The problem was first raised in the 1950s by Yehoshua Bar-Hillel [2]. He pointed out that without a "universal encyclopaedia", a machine would never be able to distinguish between the two meanings of a word. Today there are numerous approaches designed to overcome this problem. They can be approximately divided into "shallow" approaches and "deep" approaches.

Shallow approaches assume no knowledge of the text. They simply apply statistical methods to the words surrounding the ambiguous word. Deep approaches presume a comprehensive knowledge of the word. So far, shallow approaches have been more successful.

Named entities

Related to named entity recognition in information extraction.


Main article: History of machine translation

The history of machine translation generally starts in the 1950s after the second world war. The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. The experiment was a great success and ushered in an era of significant funding for machine translation research. The authors claimed that within three or five years, machine translation would be a solved problem.

However, the real progress was much slower, and after the ALPAC report in 1966, which found that the ten years long research had failed to fulfill the expectations, the funding was dramatically reduced. Starting in the late 1980s, as computational power increased and became less expensive, more interest began to be shown in statistical models for machine translation.

Today there are many software programs for translating natural language, several of them online, such as the SYSTRAN system which powers both Google translate and the AltaVista's Babelfish. Although there is no system that provides the holy-grail of "Fully automatic high quality machine translation" (FAHQMT), many systems provide reasonable output.

Real world applications

Despite their inherent limitations, MT programs are currently used by various organisations around the world. Probably the largest institutional user is the European Commission, which uses a highly customised version of the commercial MT system SYSTRAN to handle the automatic translation of a large volume of preliminary drafts of documents for internal use.

A Danish translation agency, Lingtech A/S [3], has been translating patent applications from English to Danish since 1993 using a proprietary rule-based machine translation system, PaTrans, working together with the translation memory based Trados commercial CAT tool. The system requires both manual pre- and post-editing, but the monthly output is still approximately 400,000 words per operator.

The Spanish daily newspaper Periódico de Catalunya is translated from Spanish into Catalan with an MT system [4].

Google has reported that promising results were obtained using proprietary statistic machine translation engine [5]. This engine is currently used in the Google Translation tools for Arabic <-> English and Chinese <-> English, with more language pairs soon to be migrated from the SYSTRAN engine to the Google engine [citation needed]. Uwe Muegge has implemented a demo website [6] that uses a controlled language in combination with the Google engine to produce fully automatic, high quality machine translations of his English, German, and French web sites.

With the recent focus on terrorism, the military sources in US invest significant amounts of money in natural language engineering. In-Q-Tel [7] (a venture capital fund, largely funded by the US Intelligence Community, to stimulate new technologies through private sector entrepreneurs) brought up companies like Language Weaver. Currently the military community is interested in translation and processing of languages like Arabic, Pashto, and Dari. [citation needed] Information Processing Technology Office in DARPA hosts programs like TIDES and Babylon Translator. US Air Force has awarded a $1 million contract to develop a language translation technology. [8]


There are various methods for evaluating the performance of machine translation systems, the oldest is by using human judges to tell the quality of a translation. Newer automated methods include BLEU, NIST and METEOR.

Currently, the product of machine translation is sometimes called a "gisting translation" — unless one is proficient in both languages, MT will often produce only a rough translation that will at best allow the reader to "get the gist" of the source text, but is unlikely to convey a complete understanding of it. The user may find the raw translation sufficiently useful as it is.


In the words of the European Association for Machine Translation (EAMT):

Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains. [9] (1997)

See also

  • Artificial Intelligence
  • Cognitive science
  • Computer-assisted translation
  • Computer-assisted reviewing
  • Distributed Language Translation
  • Eurotra
  • History of machine translation
  • Language acquisition
  • Language identification
  • List of Machine Translation software
  • Parallel text alignment
  • Translation
  • Universal grammar
  • Universal Networking Language
  • Universal translator
  • List of research laboratories for machine translation


  1. ^
  2. ^ Milestones in machine translation - No.6: Bar-Hillel and the nonfeasibility of FAHQT by John Hutchins
  3. ^
  4. ^ Informe sobre el sistema de traducción automática del Periódico de Catalunya (in Spanish)
  5. ^ Google Blog: The machines do the translating (by Franz Och)
  6. ^ This demo website uses a controlled language in combination with the Google engine
  7. ^ In-Q-Tel
  8. ^ GCN — Air force wants to build a universal translator
  9. ^ European Association for Machine Translation — What is Machine Translation?


  1. Hutchins, W. John; and Harold L. Somers (1992). An Introduction to Machine Translation. London: Academic Press. ISBN 0-12-362830-X.

External links

  • DARPA challenge -- build the ultimate speech translation machine
  • International Association for Machine Translation (IAMT)
  • Association for Machine Translation in the Americas (AMTA)
  • European Association for Machine Translation (EAMT)
  • Asia-Pacific Association for Machine Translation (APAMT)
  • Association for Computational Linguistics
  • Machine Translation, an introductory guide to MT by D.J.Arnold et al. (1994)
  • Machine Translation Archive by John Hutchins. An electronic repository (and bibliography) of articles, books and papers in the field of machine translation and computer-based translation technology
  • Machine translation (computer-based translation) — Publications by John Hutchins (includes PDFs of several books on machine translation)
  • NIST 2005 Machine Translation Evaluation Official Results
  • Machine Translation and Minority Languages
  • John Hutchins 1999
Retrieved from ""