==Evaluation==
{{Main|Evaluation of machine translation}}

Many factors affect how machine translation systems are evaluated, including the intended use of the translation, the nature of the machine translation software, and the nature of the translation process. Different programs may work well for different purposes. For example, [[statistical machine translation]] (SMT) typically outperforms [[example-based machine translation]] (EBMT), but researchers have found that for English-to-French translation, EBMT performs better.<ref name="Way 295–309">{{cite journal|last=Way|first=Andy|author2=Nano Gough|title=Comparing Example-Based and Statistical Machine Translation|journal=Natural Language Engineering|date=20 September 2005|volume=11|issue=3|pages=295–309|doi=10.1017/S1351324905003888|doi-broken-date=1 November 2024 |s2cid=3242163}}</ref> The same holds for technical documents, which SMT can translate more easily because of their formal language.

In certain applications, however, e.g., product descriptions written in a [[controlled language]], a [[dictionary-based machine translation|dictionary-based machine-translation]] system has produced satisfactory translations that require no human intervention save for quality inspection.<ref>Muegge (2006), "[http://www.mt-archive.info/Aslib-2006-Muegge.pdf Fully Automatic High Quality Machine Translation of Restricted Text: A Case Study] {{Webarchive|url=https://web.archive.org/web/20111017043848/http://www.mt-archive.info/Aslib-2006-Muegge.pdf |date=17 October 2011 }}," in ''Translating and the computer 28. Proceedings of the twenty-eighth international conference on translating and the computer, 16–17 November 2006, London'', London: Aslib. {{ISBN|978-0-85142-483-5}}.</ref>

There are various means for evaluating the output quality of machine translation systems. The oldest is the use of human judges<ref>{{cite web |url=http://www.morphologic.hu/public/mt/2008/compare12.htm |title=Comparison of MT systems by human evaluation, May 2008 |publisher=Morphologic.hu |access-date=2012-06-12 |archive-url=https://web.archive.org/web/20120419072313/http://www.morphologic.hu/public/mt/2008/compare12.htm |archive-date=19 April 2012 |url-status=dead |df=dmy-all }}</ref> to assess a translation's quality. Although human evaluation is time-consuming, it remains the most reliable way to compare different systems, such as rule-based and statistical systems.<ref>Anderson, D.D. (1995). [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.961.5377&rep=rep1&type=pdf Machine translation as a tool in second language learning] {{Webarchive|url=https://web.archive.org/web/20180104073518/http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.961.5377&rep=rep1&type=pdf |date=4 January 2018 }}. CALICO Journal. 13(1). 68–96.</ref> [[Automate]]d means of evaluation include [[Bilingual evaluation understudy|BLEU]], [[NIST (metric)|NIST]], [[METEOR]], and [[LEPOR]].<ref>Han et al. (2012), "[http://repository.umac.mo/jspui/bitstream/10692/1747/1/10205_0_%5B2012-12-08~15%5D%20C.%20%28COLING2012%29%20LEPOR.pdf LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors] {{Webarchive|url=https://web.archive.org/web/20180104073506/http://repository.umac.mo/jspui/bitstream/10692/1747/1/10205_0_%5B2012-12-08~15%5D%20C.%20%28COLING2012%29%20LEPOR.pdf |date=4 January 2018 }}," in ''Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012): Posters, pages 441–450'', Mumbai, India.</ref>
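The following is a minimal sketch, not drawn from the cited studies, of how one such automated metric can be computed in practice; it uses the BLEU implementation in the Python [[Natural Language Toolkit|NLTK]] library, and the example sentences and choice of smoothing are illustrative assumptions only.

<syntaxhighlight lang="python">
# Minimal sketch: sentence-level BLEU with NLTK (example data is invented for illustration).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the cat sat on the mat".split()          # tokenized human reference translation
hypothesis = "the cat is sitting on the mat".split()  # tokenized machine translation output

# Smoothing prevents a zero score when some higher-order n-grams have no matches,
# which is common for short sentences.
smoothing = SmoothingFunction().method1
score = sentence_bleu([reference], hypothesis, smoothing_function=smoothing)

print(f"BLEU: {score:.3f}")  # values closer to 1.0 indicate greater n-gram overlap with the reference
</syntaxhighlight>

In practice, evaluations score a large held-out test set with corpus-level variants of such metrics (for example, NLTK's corpus_bleu or the sacrebleu package) rather than a single sentence pair.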
Relying exclusively on unedited machine translation ignores the fact that communication in [[natural language|human language]] is context-embedded and that it takes a person to comprehend the [[context (language use)|context]] of the original text with a reasonable degree of probability. Even purely human-generated translations are prone to error. Therefore, to ensure that a machine-generated translation will be useful to a human being and that publishable-quality translation is achieved, such translations must be reviewed and edited by a human.<ref>J.M. Cohen observes (p. 14): "Scientific translation is the aim of an age that would reduce all activities to [[Technology|techniques]]. It is impossible however to imagine a literary-translation machine less complex than the human brain itself, with all its knowledge, reading, and discrimination."</ref>

The late [[Claude Piron]] wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve [[ambiguity|ambiguities]] in the [[source text]], which the [[grammatical]] and [[Lexical (semiotics)|lexical]] exigencies of the target language require to be resolved. Such research is a necessary prelude to the pre-editing needed to provide input for machine-translation software, so that the output will not be [[garbage in garbage out|meaningless]].<ref name="NIST">See the [https://www.nist.gov/speech/tests/mt/ annually performed NIST tests since 2001] {{Webarchive|url=https://web.archive.org/web/20090322202656/http://nist.gov/speech/tests/mt/ |date=22 March 2009 }} and [[Bilingual Evaluation Understudy]]</ref>

In addition to disambiguation problems, decreased accuracy can occur because of varying levels of training data for machine translation programs. Both example-based and statistical machine translation rely on a vast array of real example sentences as a base for translation, and when too many or too few sentences are analyzed, accuracy is jeopardized. Researchers found that when a program is trained on 203,529 sentence pairs, accuracy actually decreases.<ref name="Way 295–309"/> The optimal level of training data seems to be just over 100,000 sentences, possibly because as training data increases, the number of possible sentences increases, making it harder to find an exact translation match.

Flaws in machine translation have been noted for [[Humour in translation|their entertainment value]].
Two videos uploaded to [[YouTube]] in April 2017 involve two Japanese [[hiragana]] characters えぐ (''[[E (kana)|e]]'' and ''[[Ku (kana)|gu]]'') being repeatedly pasted into Google Translate, with the resulting translations quickly degrading into nonsensical phrases such as "DECEARING EGG" and "Deep-sea squeeze trees", which are then read in increasingly absurd voices;<ref>{{Cite web|url=https://www.businessinsider.com/google-translate-fails-2017-11|title=4 times Google Translate totally dropped the ball|first=Mark|last=Abadi|website=Business Insider}}</ref><ref>{{Cite web|url=https://nlab.itmedia.co.jp/nl/articles/1704/16/news013.html|title=回数を重ねるほど狂っていく　Google翻訳で「えぐ」を英訳すると奇妙な世界に迷い込むと話題に|trans-title=The more you repeat it, the crazier it gets: translating "egu" into English with Google Translate leads into a bizarre world|language=ja|website=ねとらぼ}}</ref> the full-length version of the video has 6.9 million views {{as of|lc=y|March 2022|post=.}}<ref>{{Cite web|url=https://www.youtube.com/watch?v=3-rfBsWmo0M|title=えぐ|date=12 April 2017 |via=www.youtube.com}}</ref>