===Statistical===
{{main|Statistical machine translation}}
Statistical machine translation tried to generate translations using [[statistical methods]] based on bilingual text corpora, such as the [[Hansard#Translation|Canadian Hansard]] corpus, the English-French record of the Canadian Parliament, and [[Europarl corpus|EUROPARL]], the record of the [[European Parliament]]. Where such corpora were available, good results were achieved when translating similar texts, but such corpora were rare for many language pairs. The first statistical machine translation software was [[CANDIDE]] from [[IBM]]. In 2005, Google improved its internal translation capabilities by training its system on approximately 200 billion words from United Nations materials, which improved translation accuracy.<ref>{{cite web |url=http://blog.outer-court.com/archive/2005-05-22-n83.html |title=Google Translator: The Universal Language |publisher=Blog.outer-court.com |date=25 January 2007 |access-date=2012-06-12 |archive-date=20 November 2008 |archive-url=https://web.archive.org/web/20081120030225/http://blog.outer-court.com/archive/2005-05-22-n83.html |url-status=live }}</ref> The biggest shortcomings of SMT were its dependence on huge amounts of parallel text, its problems with morphologically rich languages (especially when translating ''into'' such languages), and its inability to correct singleton errors.

Some work has been done on the use of multiparallel [[text corpus|corpora]], that is, bodies of text that have been translated into three or more languages. Using these methods, a text that has been translated into two or more languages can be used in combination to produce a more accurate translation into a third language than would be possible using only one of those source languages.<ref>{{Cite conference |last=Schwartz |first=Lane |date=2008 |title=Multi-Source Translation Methods |url=https://dowobeha.github.io/papers/amta08.pdf |conference=Paper presented at the 8th Biennial Conference of the Association for Machine Translation in the Americas |archive-url=https://web.archive.org/web/20160629171944/http://dowobeha.github.io/papers/amta08.pdf |archive-date=29 June 2016 |access-date=3 November 2017 |url-status=live}}</ref><ref>{{Cite conference |last1=Cohn |first1=Trevor |last2=Lapata |first2=Mirella |date=2007 |title=Machine Translation by Triangulation: Making Effective Use of Multi-Parallel Corpora |url=http://homepages.inf.ed.ac.uk/mlap/Papers/acl07.pdf |conference=Paper presented at the 45th Annual Meeting of the Association for Computational Linguistics, June 23–30, 2007, Prague, Czech Republic |archive-url=https://web.archive.org/web/20151010171334/http://homepages.inf.ed.ac.uk/mlap/Papers/acl07.pdf |archive-date=10 October 2015 |access-date=3 February 2015 |url-status=live}}</ref><ref>{{Cite journal |last1=Nakov |first1=Preslav |last2=Ng |first2=Hwee Tou |date=2012 |title=Improving Statistical Machine Translation for a Resource-Poor Language Using Related Resource-Rich Languages |url=https://jair.org/index.php/jair/article/view/10764 |journal=Journal of Artificial Intelligence Research |volume=44 |pages=179–222 |arxiv=1401.6876 |doi=10.1613/jair.3540 |doi-access=free}}</ref>
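One way to exploit multiparallel data is triangulation through pivot languages, the idea behind the Cohn and Lapata paper cited above. The sketch below is a simplified illustration, not the paper's full system. It estimates a source-to-target phrase probability by marginalising over pivot phrases, p(t | s) = Σ<sub>i</sub> p(t | i) · p(i | s), which assumes the target is conditionally independent of the source given the pivot. It then combines the estimates obtained through several pivots by linear interpolation. The phrase tables, the French/Spanish/German/English toy entries, the helper names <code>triangulate</code> and <code>interpolate</code>, and the equal weights are all hypothetical.

```python
from collections import defaultdict

# Hypothetical phrase table: table[src_phrase][tgt_phrase] = p(tgt | src).
PhraseTable = dict[str, dict[str, float]]


def triangulate(src_to_pivot: PhraseTable, pivot_to_tgt: PhraseTable) -> PhraseTable:
    """Estimate p(t | s) = sum over pivot phrases i of p(t | i) * p(i | s)."""
    result: PhraseTable = {}
    for s, pivots in src_to_pivot.items():
        scores: dict[str, float] = defaultdict(float)
        for i, p_i_given_s in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for t, p_t_given_i in pivot_to_tgt.get(i, {}).items():
                scores[t] += p_t_given_i * p_i_given_s
        result[s] = dict(scores)
    return result


def interpolate(tables: list[PhraseTable], weights: list[float]) -> PhraseTable:
    """Linearly interpolate several estimates of p(t | s); weights should sum to 1."""
    combined: PhraseTable = {}
    for table, w in zip(tables, weights):
        for s, targets in table.items():
            row = combined.setdefault(s, {})
            for t, p in targets.items():
                row[t] = row.get(t, 0.0) + w * p
    return combined


# Toy tables: French source, Spanish and German pivots, English target.
fr_es = {"maison": {"casa": 0.9, "hogar": 0.1}}
es_en = {"casa": {"house": 0.8, "home": 0.2}, "hogar": {"home": 0.9, "household": 0.1}}
fr_de = {"maison": {"Haus": 1.0}}
de_en = {"Haus": {"house": 0.9, "building": 0.1}}

via_es = triangulate(fr_es, es_en)
via_de = triangulate(fr_de, de_en)
combined = interpolate([via_es, via_de], [0.5, 0.5])

best = max(combined["maison"], key=combined["maison"].get)
print(best)  # house
```

Each pivot covers different translations. For example, only the German route proposes "building" and only the Spanish route proposes "household". Interpolation therefore pools evidence from both. A real system would learn the weights and use these scores as features alongside a directly trained source-to-target table.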