Parallel text
{{short description|Text placed alongside its translation or translations}}
{{Distinguish|Parallel novel}}
[[File:Rosetta Stone.JPG|thumb|right|The [[Rosetta Stone]], a [[stele]] engraved with the same decree in both of the [[Egyptian language|Ancient Egyptian scripts]] as well as [[Ancient Greek]]. Its discovery was key to [[Decipherment of ancient Egyptian scripts|deciphering]] the Ancient Egyptian language.]]
{{Refimprove|date=May 2008}}
A '''parallel text''' is a text placed alongside its translation or translations.<ref name="Chan2014">{{Cite book |last=Chan |first=Sin-Wai |url=https://books.google.com/books?id=S0FWBQAAQBAJ |title=Routledge Encyclopedia of Translation Technology |date=2015 |publisher=Routledge |isbn=978-1-315-74912-9 |location=London |language=en}}</ref><ref name="WilliamsSennrich2016">{{Cite book |last1=Williams |first1=Philip |url=https://books.google.com/books?id=bd3dDAAAQBAJ |title=Syntax-based Statistical Machine Translation |last2=Sennrich |first2=Rico |last3=Post |first3=Matt |last4=Koehn |first4=Philipp |date=2016 |publisher=Morgan & Claypool |isbn=978-1-62705-502-4}}</ref> '''Parallel text alignment''' is the identification of the corresponding sentences in both halves of the parallel text. The [[Loeb Classical Library]] and the [[Clay Sanskrit Library]] are two examples of dual-language series of texts. Reference [[Bible translations|Bibles]] may contain the original languages and a translation, or several translations by themselves, for ease of comparison and study; [[Origen]]'s [[Hexapla]] (Greek for "sixfold") placed six versions of the Old Testament side by side. A famous example is the [[Rosetta Stone]], whose discovery made it possible to begin [[Decipherment of ancient Egyptian scripts|deciphering]] the [[Egyptian language|Ancient Egyptian language]].

Large collections of parallel texts are called '''parallel corpora''' (see [[text corpus]]).
Sentence-level alignment of parallel corpora is a prerequisite for many areas of [[linguistics|linguistic]] research. During translation, sentences can be split, merged, deleted, inserted or reordered by the translator, which makes alignment a non-trivial task.

Parallel texts may be used in [[language education]].<ref>Abdallah, A. (2021). Impact of using parallel text strategy on teaching reading to intermediate II level students. International Journal on Social and Education Sciences (IJonSES), 3(1), 95-108. https://doi.org/10.46328/ijonses.48</ref>

==Types of parallel corpora==
Parallel corpora can be classified into four main categories:{{Citation needed|date=February 2021}}
* A ''parallel corpus'' contains translations of the same document in two or more languages, aligned at least at the sentence level. Such corpora tend to be rarer than the less-comparable types.{{Citation needed|date=February 2021}}
* A ''noisy parallel corpus'' contains bilingual sentences that are not perfectly aligned or have poor quality translations. Nevertheless, most of its contents are bilingual translations of a specific document.
* A ''comparable corpus'' is built from non-sentence-aligned and untranslated bilingual documents, but the documents are topic-aligned.
* A ''quasi-comparable corpus'' includes very heterogeneous and non-parallel bilingual documents that may or may not be topic-aligned.

==Noise in corpora==
Large corpora used as training sets for [[machine translation]] algorithms are usually extracted from large bodies of similar sources, such as databases of news articles written in the first and second languages describing similar events. However, extracted fragments may be noisy, with extra elements inserted in each corpus. Extraction techniques can differentiate between [[bilingual]] elements represented in both corpora and [[monolingual]] elements represented in only one corpus, in order to extract cleaner parallel fragments of bilingual elements.
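
As an illustration of how misaligned or monolingual material might be screened out of candidate sentence pairs, the following minimal Python sketch applies a sentence-length ratio heuristic. It is a simple stand-in rather than the extraction technique of any particular system; the function names and the threshold of 2.0 are assumptions chosen for the example.

```python
def length_ratio(source, target):
    """Return the ratio of the longer to the shorter sentence, in characters."""
    shorter, longer = sorted((len(source.strip()), len(target.strip())))
    if shorter == 0:
        # One side is empty: treat the pair as having no bilingual counterpart.
        return float("inf")
    return longer / shorter


def filter_noisy_pairs(pairs, max_ratio=2.0):
    """Split candidate pairs into (kept, rejected) by length ratio.

    max_ratio is an assumed threshold; a real system would tune it per
    language pair and usually combine it with lexical evidence.
    """
    kept, rejected = [], []
    for source, target in pairs:
        if length_ratio(source, target) <= max_ratio:
            kept.append((source, target))
        else:
            rejected.append((source, target))
    return kept, rejected


if __name__ == "__main__":
    candidates = [
        ("The committee approved the budget.", "Le comité a approuvé le budget."),
        ("Yes.", "Oui, et la séance a été levée après un long débat sur le budget."),
        ("", "Texte sans équivalent."),
    ]
    kept, rejected = filter_noisy_pairs(candidates)
    print(len(kept), "kept;", len(rejected), "rejected")
```

A length check of this kind only catches gross mismatches. It cannot tell a good translation from an unrelated sentence of similar length, which is why it is offered here only as a sketch.
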

Comparable corpora are used to directly obtain knowledge for translation purposes. High-quality parallel data is difficult to obtain, however, especially for under-resourced languages.<ref>{{Cite journal |last=Wołk |first=Krzysztof |date=2015 |title=Noisy-Parallel and Comparable Corpora Filtering Methodology for the Extraction of Bi-Lingual Equivalent Data at Sentence Level |journal=Computer Science |volume=16 |issue=2 |pages=169–184 |arxiv=1510.04500 |bibcode=2015arXiv151004500W |doi=10.7494/csci.2015.16.2.169 |doi-access=free |s2cid=12860633}}</ref>

== Bitext ==
{{main|Bitext word alignment}}
In the field of [[translation studies]], a '''bitext''' is a merged document composed of both source- and target-language versions of a given text.

Bitexts are generated by a piece of software called an ''alignment tool'', or a ''bitext tool'', which automatically aligns the original and translated versions of the same text. The tool generally matches these two texts sentence by sentence. A collection of bitexts is called a ''bitext database'' or a ''bilingual corpus'', and can be consulted with a search tool.

===Bitexts and translation memories===
{{main|Translation memory}}
''Bitexts'' have some similarities with translation memories. The most salient difference is that a translation memory loses the original context, while a bitext retains the original sentence order. That said, some implementations of translation memory, such as [[Translation Memory eXchange]] (TMX), a standard [[XML]] format for exchanging translation memories between [[computer-assisted translation]] (CAT) programs, can preserve the original order of sentences.

Bitexts are designed to be consulted by a human [[translation|translator]], not by a machine. As such, small alignment errors or minor discrepancies that would cause a translation memory to fail are of no importance.
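
As a rough illustration of how an ordered bitext can be stored in a TMX-style file, the following Python sketch writes sentence pairs as translation units and reads them back in document order. The helper names, the header values and the sample sentences are assumptions made for this example, and the output is not validated against the TMX specification.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the attribute xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def bitext_to_tmx(pairs, src_lang, tgt_lang):
    """Serialize ordered (source, target) sentence pairs as a TMX-style string."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",  # placeholder value (assumption)
        "creationtoolversion": "0.1",      # placeholder value (assumption)
        "segtype": "sentence",
        "o-tmf": "plain",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    # One translation unit per aligned pair, written in original sentence order.
    for index, (source, target) in enumerate(pairs, start=1):
        unit = ET.SubElement(body, "tu", tuid=str(index))
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            variant = ET.SubElement(unit, "tuv", {XML_LANG: lang})
            ET.SubElement(variant, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


def tmx_to_bitext(tmx_string):
    """Read the translation units back as (source, target) tuples, in file order."""
    root = ET.fromstring(tmx_string)
    return [
        tuple(variant.findtext("seg") for variant in unit.findall("tuv"))
        for unit in root.iter("tu")
    ]


if __name__ == "__main__":
    sample = [
        ("The cat sleeps.", "Le chat dort."),
        ("It is raining.", "Il pleut."),
    ]
    document = bitext_to_tmx(sample, "en", "fr")
    print(tmx_to_bitext(document) == sample)
```

Because the translation units are written and read in sequence, the round trip keeps the sentence order of the original text, which is the property that distinguishes a bitext from an unordered translation memory.
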

In his original 1988 article, Harris also posited that a bitext represents how translators hold their source and target texts together in their mental working memories as they progress. However, this hypothesis has not been followed up.<ref>{{Cite journal |last=Harris |first=B. |date=March 1988 |title=Bi-Text, A New Concept in Translation Theory |url=http://mt-archive.info/LangMonthly-54-1988-Harris.pdf |url-status=dead |journal=Language Monthly |volume=54 |pages=8–10 |archive-url=https://web.archive.org/web/20180302103859/http://mt-archive.info/LangMonthly-54-1988-Harris.pdf |archive-date=2018-03-02}}</ref>

Online bitexts and translation memories may also be called {{anchor|OBC}}online bilingual concordances. Several are available on the public Web, including [[Linguee]], [[Reverso (language tools)|Reverso]], and TradooIT.<ref>{{Cite thesis |last=Genette |first=Marie |title=How Reliable Are Online Bilingual Concordancers? An investigation of ''Linguee'', ''TradooIT'', ''WeBiText'' and ''ReversoContext'' and Their Reliability Through a Contrastive Analysis of Complex Prepositions from French to English |date=2016 |degree=M.A. |publisher=Université catholique de Louvain & Universitetet i Oslo |url=http://urn.nb.no/URN:NBN:no-55054 |hdl=10852/51577 |hdl-access=free}}</ref><ref>{{Cite web |title=TradooIT – Concordancier bilingue |url=http://tradooit.com}}</ref><ref>{{Cite conference |last1=Désilets |first1=Alain |last2=Farley |first2=Benoît |last3=Stojanović |first3=Marta |last4=Patenaude |first4=Geneviève |date=2008 |title=WeBiText: Building Large Heterogeneous Translation Memories from Parallel Web Content |conference=Proceedings of Translating and the Computer |volume=30 |pages=27–28 |s2cid=14586900 }}</ref>

== See also ==
* [[Bilingual inscription]]
* [[Computer-assisted reviewing]]
* [[Example-based machine translation]]
* [[Natural language processing]]
* [[Polyglot (book)]]
* [[Ruby character]]
* [[Statistical machine translation]]

== References ==
{{reflist}}

==External links==
=== Parallel corpora ===
* [https://web.archive.org/web/20060619034515/http://langtech.jrc.it/JRC-Acquis.html The JRC-Acquis Multilingual Parallel Corpus] of the total body of [[European Union]] (EU) law: ''[[Acquis Communautaire]]'' with 231 language pairs.<ref>{{Cite conference |last1=Steinberger |first1=Ralf |last2=Pouliquen |first2=Bruno |last3=Widiger |first3=Anna |last4=Ignat |first4=Camelia |last5=Erjavec |first5=Tomaž |last6=Tufiş |first6=Dan |last7=Varga |first7=Dániel |date=2006 |title=The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages |conference=Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC'2006).
Genoa, Italy, 24–26 May 2006 |conference-url=http://www.lrec-conf.org/proceedings/lrec2006/}}</ref>
* [http://www.statmt.org/europarl/ European Parliament Proceedings Parallel Corpus 1996–2011]
* [http://opus.lingfil.uu.se/ The Opus project aims at collecting freely available parallel corpora]
* [http://alaginrc.nict.go.jp/WikiCorpus/index_E.html Japanese-English Bilingual Corpus of Wikipedia's Kyoto Articles] {{Webarchive|url=https://web.archive.org/web/20120822235848/http://alaginrc.nict.go.jp/WikiCorpus/index_E.html |date=2012-08-22 }}
* [http://www.linguateca.pt/COMPARA/ COMPARA – Portuguese/English parallel corpora]
* [http://www.termsearch.info TERMSEARCH – English/Russian/French parallel corpora (Major international treaties, conventions, agreements, etc.)]
* [http://www.tradooit.com TradooIT – English/French/Spanish – Free Online tools]
* [https://web.archive.org/web/20070707091815/http://www.inuktitutcomputing.ca/NunavutHansard/en/ Nunavut Hansard – English/Inuktitut parallel corpus]
* [http://parasolcorpus.org ParaSol – A parallel corpus of Slavic and other languages]
* [http://glosbe.com/tmem Glosbe: Multilanguage parallel corpora] {{Webarchive|url=https://web.archive.org/web/20130527211044/http://glosbe.com/tmem/ |date=2013-05-27 }} with online search interface
* [https://wiki.korpus.cz/doku.php/en:cnk:intercorp InterCorp: A multilingual parallel corpus], 40 languages aligned with Czech, with an [https://kontext.korpus.cz/first_form?corpname=intercorp_v13_en&usesubcorp= online search interface]
* [http://olanto.org/ myCAT – Olanto], concordancer (open source AGPL) with online search on JRC and UNO corpus
* [http://www.translationautomation.com/ TAUS], with online search interface.
* [http://www.linguatools.com/ linguatools] multilingual parallel corpora, online search interface.
* [https://www.sketchengine.eu/eurlex-corpus/ EUR-Lex Corpus], built from the [[EUR-Lex]] database, which consists of [[European Union law]] and other public documents of the [[European Union]]
* [http://langrid.org Language Grid – Multilingual service platform that includes parallel text services]

===Documentation===
* [https://web.archive.org/web/20040417031546/http://www.up.univ-mrs.fr/~veronis/biblios/ptp.htm Parallel text processing bibliography by J. Veronis and M.-D. Mahimon]
* [https://web.archive.org/web/20060913013656/https://www.cs.unt.edu/~rada/wpt/ Proceedings of the 2003 Workshop on Building and Using Parallel Texts]
* [https://web.archive.org/web/20060913025814/https://www.cs.unt.edu/~rada/wpt05/ Proceedings of the 2005 Workshop on Building and Using Parallel Texts]

===Alignment tools===
* [http://www-i6.informatik.rwth-aachen.de/web/Tools/GIZA++.html GIZA++ alignment tool (1999)]
* [https://bitbucket.org/tiedemann/uplug Uplug – tools for processing parallel corpora (2003)]
* [https://web.archive.org/web/20111004235757/http://nl.ijs.si/telri/Vanilla/ An implementation of the Gale and Church sentence alignment algorithm (2005)]
* [http://mokk.bme.hu/resources/hunalign/ The Hunalign sentence aligner (2005)]
* [http://champollion.sourceforge.net/ Champollion (2006)]
* [https://github.com/loomchild/maligna mALIGNa (2008–2020)]
* [https://github.com/braunefe/Gargantua Gargantua sentence aligner (2010)]
* [https://github.com/rsennrich/Bleualign Bleualign – machine translation based sentence alignment (2010)]
* [https://github.com/riklopfer/YASA YASA (2013)]
* [https://gitlab.cl.uzh.ch/sparcling/hierarchical_alignment_tool Hierarchical alignment tool (HAT) (2018)] {{Webarchive|url=https://web.archive.org/web/20200705030751/https://gitlab.cl.uzh.ch/sparcling/hierarchical_alignment_tool |date=2020-07-05 }}
* [https://github.com/thompsonb/vecalign Vecalign sentence alignment algorithm (2019)]
* [http://phraseotext.univ-grenoble-alpes.fr/webAlignToolkit/ Web Alignment Tool at University of Grenoble]

{{Natural language processing}}

[[Category:Translation databases]]
[[Category:Language acquisition]]
[[Category:Corpus linguistics]]