Spell checker
==Design==

A basic spell checker carries out the following processes:

* It scans the text and extracts the words contained in it.
* It then compares each word with a known list of correctly spelled words (i.e. a dictionary). This might contain just a list of words, or it might also contain additional information, such as hyphenation points or lexical and grammatical attributes.
* An additional step is a language-dependent algorithm for handling [[morphology (linguistics)|morphology]]. Even for a lightly inflected language like [[English language|English]], the spell checker will need to consider different forms of the same word, such as plurals, verbal forms, [[contraction (grammar)|contraction]]s, and [[possessive (linguistics)|possessive]]s. For many other languages, such as those featuring agglutination and more complex declension and conjugation, this part of the process is more complicated.

It is unclear whether morphological analysis—allowing for many forms of a word depending on its grammatical role—provides a significant benefit for English, though its benefits for highly [[synthetic language]]s such as German, Hungarian, or Turkish are clear.

As an adjunct to these components, the program's [[user interface]] allows users to approve or reject replacements and modify the program's operation.

Spell checkers can use [[approximate string matching]] algorithms such as [[Levenshtein distance]] to find correct spellings of misspelled words.<ref>{{Cite book|last=Perner|first=Petra|url=https://books.google.com/books?id=wnXJfsCGQC8C&q=%22spell+checking%22|title=Advances in Data Mining: Applications and Theoretical Aspects: 10th Industrial Conference, ICDM 2010, Berlin, Germany, July 12-14, 2010. Proceedings|date=2010-07-05|publisher=Springer Science & Business Media|isbn=978-3-642-14399-1|language=en}}</ref>

An alternative type of spell checker uses solely statistical information, such as [[n-gram]]s, to recognize errors instead of correctly spelled words.
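As an illustrative sketch only (not the method of any particular product or patent), a purely statistical checker of this kind can count character n-grams over a corpus of correctly spelled text and flag words containing n-grams that were never, or rarely, observed. All function names and the threshold below are invented for the example:

```python
from collections import Counter

def char_ngrams(word, n=3):
    """Character n-grams of a boundary-padded word, e.g. 'cat' -> '$ca', 'cat', 'at$'."""
    padded = f"${word}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def train(corpus_words, n=3):
    """Count character n-grams over a corpus of correctly spelled words."""
    counts = Counter()
    for w in corpus_words:
        counts.update(char_ngrams(w.lower(), n))
    return counts

def looks_misspelled(word, counts, n=3, threshold=1):
    """Flag a word if any of its n-grams occurs fewer than `threshold` times."""
    return any(counts[g] < threshold for g in char_ngrams(word.lower(), n))
```

Note that the quality of such a checker depends almost entirely on the size and coverage of the training corpus, since a rare but correct word can contain n-grams the corpus never produced.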
This approach usually requires a lot of effort to obtain sufficient statistical information. Key advantages include needing less runtime storage and the ability to correct errors in words that are not included in a dictionary.<ref>U.S. Patent 6618697, [https://patentimages.storage.googleapis.com/84/a6/5f/6b58b2e2c2da12/US6618697.pdf Method for rule-based correction of spelling and grammar errors]</ref>

<!-- The cited U.S. patent was used to implement a dictionary-less spelling correction algorithm for Graffiti on the Palm Pilot in only about 8k of memory. Another reference is "What Makes a Great Invention?", Wall Street Journal, 10/23/2003, https://www.wsj.com/articles/SB106684550065867100. An n-gram based algorithm is also included in Solr: http://lucidworks.com/blog/getting-started-spell-checking-with-apache-lucene-and-solr/. -->

In some cases, spell checkers use a fixed list of misspellings and [[spelling suggestion|suggestions]] for those misspellings; this less flexible approach is often used in paper-based correction methods, such as the ''see also'' entries of encyclopedias.

[[Clustering algorithm]]s have also been used for spell checking<ref>de Amorim, R.C.; Zampieri, M. (2013) [http://anthology.aclweb.org/R/R13/R13-1.pdf#page=200 Effective Spell Checking Methods Using Clustering Algorithms.] {{Webarchive|url=https://web.archive.org/web/20170817162117/http://anthology.aclweb.org/R/R13/R13-1.pdf#page=200 |date=2017-08-17 }} Proceedings of Recent Advances in Natural Language Processing (RANLP2013). Hissar, Bulgaria. pp. 172-178.</ref> combined with phonetic information.<ref>Zampieri, M.; de Amorim, R.C. (2014) [https://www.researchgate.net/profile/Renato_Amorim/publication/262603118_Between_Sound_and_Spelling_Combining_Phonetics_and_Clustering_Algorithms_to_Improve_Target_Word_Recovery/links/0a85e53cd2485a27fb000000.pdf Between Sound and Spelling: Combining Phonetics and Clustering Algorithms to Improve Target Word Recovery.] Proceedings of the 9th International Conference on Natural Language Processing (PolTAL). Lecture Notes in Computer Science (LNCS). Springer. pp. 438-449.</ref>
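The dictionary-based design outlined at the start of this section (scan the text, look each word up, and suggest nearby dictionary entries by [[Levenshtein distance]]) can be sketched as follows. This is a minimal illustration, assuming a plain set of lowercase correct words and a simple word-extraction regex; all names are chosen for the example:

```python
import re

def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def check(text, dictionary, max_distance=2):
    """Extract words from the text; for each word missing from the
    dictionary, suggest entries within max_distance edits, nearest first."""
    report = {}
    for word in re.findall(r"[a-zA-Z']+", text):
        if word.lower() not in dictionary:
            report[word] = sorted(
                (w for w in dictionary
                 if levenshtein(word.lower(), w) <= max_distance),
                key=lambda w: levenshtein(word.lower(), w))
    return report
```

For example, with `dictionary = {"the", "quick", "brown", "fox"}`, `check("teh quick fox", dictionary)` flags only `"teh"` and proposes `"the"`. A production checker would replace the linear scan of the dictionary with an index (such as a trie or BK-tree), since computing the distance to every entry is too slow for large word lists.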