Dictionary-based machine translation
== "DKvec" ==
"DKvec is a method for extracting bilingual lexicons, from noisy parallel corpora based on arrival distances of words in noisy parallel corpora". This method emerged in response to two problems in the statistical extraction of bilingual lexicons: "(1) How can noisy parallel corpora be used? (2) How can non-parallel yet comparable corpora be used?"<ref name=":4">{{Cite book|title = Machine Translation and the Information Soup|volume = 1529|publisher = CR Subject Classification (1998): I.2.7, H.3, F.4.3, H.5, J.5 Springer-Verlag Berlin Heidelberg New York|isbn=978-3-540-65259-5|last = David Farwell Laurie Gerber Eduard Hovy|s2cid = 19677267|doi = 10.1007/3-540-49478-2|series = Lecture Notes in Computer Science|year = 1998|hdl = 11693/27676}}</ref>

The "DKvec" method has been valuable for machine translation because of its results in trials conducted on both English–Japanese and English–Chinese noisy parallel corpora. The figures for accuracy "show a 55.35% precision from a small corpus and 89.93% precision from a larger corpus".<ref name=":4" /> These figures suggest that methods such as "DKvec" have influenced the evolution of machine translation in general, and of dictionary-based machine translation in particular.

Algorithms used for extracting [[parallel corpora]] in a bilingual format exploit the following rules in order to achieve satisfactory accuracy and overall quality:<ref name=":4" />
# Words have one sense per corpus
# Words have a single translation per corpus
# There are no missing translations in the target document
# Frequencies of bilingual word occurrences are comparable
# Positions of bilingual word occurrences are comparable

These methods can be used to generate, or to look for, occurrence patterns, which in turn are used to produce the binary occurrence vectors used by the "DKvec" method.
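
The ideas of arrival distances and binary occurrence vectors can be illustrated with a minimal Python sketch. It is not the published "DKvec" implementation: the function names, the whitespace tokenisation, and the fixed number of segments are assumptions made only for illustration. Here an arrival distance is taken to be the gap, in tokens, between consecutive occurrences of a word, and a binary occurrence vector records whether the word appears in each equal-sized segment of the text.

```python
from typing import List


def positions(tokens: List[str], word: str) -> List[int]:
    """Return the token offsets at which `word` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == word]


def arrival_distances(pos: List[int]) -> List[int]:
    """Gaps between consecutive occurrences (assumed reading of 'arrival distance')."""
    return [later - earlier for earlier, later in zip(pos, pos[1:])]


def binary_occurrence_vector(tokens: List[str], word: str, n_segments: int) -> List[int]:
    """Mark each of `n_segments` equal-sized segments with 1 if `word` occurs in it.

    The segmentation scheme is a hypothetical choice for this sketch.
    """
    vector = [0] * n_segments
    # Ceiling division so that every token falls inside some segment.
    segment_length = max(1, -(-len(tokens) // n_segments))
    for p in positions(tokens, word):
        vector[min(p // segment_length, n_segments - 1)] = 1
    return vector


if __name__ == "__main__":
    text = "the cat sat on the mat and the dog ran".split()
    occurrences = positions(text, "the")
    print(occurrences)
    print(arrival_distances(occurrences))
    print(binary_occurrence_vector(text, "the", 5))
```

Under the rules listed above, a word and its translation would be expected to yield similar vectors in the two halves of a corpus, which is what makes such vectors comparable across languages.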