{{Short description|Algorithm for indexing of words by their pronunciation}}
A '''phonetic algorithm''' is an [[algorithm]] for [[index (publishing)|indexing]] [[word]]s by their [[pronunciation]]. If the algorithm is based on orthography, it depends crucially on the spelling system of the language it is designed for. Because most phonetic algorithms were developed for [[English language|English]], they are less useful for indexing words in other languages.<ref>{{cite book|last1=Li|first=Nan|url=https://books.google.com/books?id=cZvDnVVmI98C&pg=PA232|title=Exploring the grand challenges for next generation E-Business : 8th Workshop on E-Business, WEB 2009, Phoenix, AZ, USA, December 15, 2009, Revised selected papers|last2=Hitchcock|first2=Peter|last3=Blustein|first3=James|last4=Bliemel|first4=Michael|date=2011|publisher=Springer|isbn=9783642174483 |editor=H. Raghav Rao |editor2=Raj Sharman |editor3=T. S. Raghu |location=Berlin|page=232|access-date=31 December 2020}}</ref> Because [[English orthography|English spelling]] varies significantly with factors such as a word's origin, its usage over time, and borrowings from other languages, phonetic algorithms necessarily take into account numerous rules and exceptions.<ref>{{cite book |last1=Cohen |first1=Eli B. |title=Growing Information: Part 2 |date=2009 |publisher=Informing Science |location=Santa Rosa, Calif. |isbn=978-1-932886-17-7 |page=498 |url=https://books.google.com/books?id=t7RDjagG1FAC&pg=PA498 |language=en}}</ref> More general phonetic matching algorithms take articulatory features into account.<ref>Ladefoged, Peter. [https://aclanthology.org/C69-5701.pdf "The measurement of phonetic similarity."] In International Conference on Computational Linguistics COLING 1969: Preprint No. 57. 1969.</ref>

Phonetic search has many applications. One early use case was trademark search, which checks that newly registered trademarks do not risk infringing on existing trademarks by virtue of their pronunciation.<ref>McAllister, Robert, and Benny Brodda. "Development of a new speech comprehension test with a phonological distance metric." In Proceedings of Fonetik, vol. 44, pp. 149-152. 2002.</ref><ref>Fall, Caspar J., and Christophe Giraud-Carrier. "Searching trademark databases for verbal similarities." World Patent Information 27, no. 2 (2005): 135-143.</ref>

==Algorithms==
Among the best-known phonetic algorithms are:

* [[Soundex]], which was developed to encode surnames for use in censuses. Soundex codes are four-character strings composed of a single letter followed by three digits.
* [[Daitch–Mokotoff Soundex]], which is a refinement of Soundex designed to better match surnames of Slavic and Germanic origin. Daitch–Mokotoff Soundex codes are strings composed of six numeric digits.
* [[Cologne phonetics]], which is similar to Soundex but better suited to German words.
* [[Metaphone]] and [[Double Metaphone]], which are suitable for use with most English words, not just names. Metaphone algorithms are the basis for many popular [[spell checkers]].
* [[New York State Identification and Intelligence System]] (NYSIIS), which maps similar [[phonemes]] to the same letter. The result is a string that can be pronounced by the reader without decoding.
* [[Match Rating Approach]], developed by Western Airlines in 1977, which combines an encoding technique with a range comparison technique.
* [[Caverphone]], created to assist in data matching between late 19th century and early 20th century electoral rolls, optimized for accents present in parts of New Zealand.
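
As an illustration of the code format described for Soundex above (one letter followed by three digits), the following is a minimal Python sketch of the commonly described American Soundex rules. The function name <code>soundex</code> and the table name <code>CODES</code> are illustrative choices rather than part of any standard, and the sketch assumes plain ASCII names; real implementations differ in details such as the handling of prefixes and non-English letters.

```python
# Minimal sketch of American Soundex (assumed rule set, ASCII input only).
# Consonant groups that share a digit; vowels, H, W and Y have no digit.
CODES = {}
for digit, letters in {
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return a four-character code: first letter plus three digits."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's digit suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated digit is kept
    # Pad with zeros, then truncate to one letter plus three digits.
    return (first + "".join(digits) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Under these assumptions, names that differ only in vowels or in consonants from the same group, such as "Robert" and "Rupert", receive the same code.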

== Common uses ==
* [[Spell checkers]] often contain phonetic algorithms. The [[Metaphone]] algorithm, for example, can take an incorrectly spelled word and create a code. The code is then looked up in a directory of words with the same or a similar Metaphone code, and the words found become possible alternative spellings.
* [[Search engine technology|Search]] functionality often uses phonetic algorithms to find results that do not exactly match the terms used in the search. Searching for names can be difficult because many names have multiple alternative spellings. An example is the name [[wikt:Claire|Claire]], which has two alternatives, Clare and Clair, both pronounced the same. Searching for one spelling would not show results for the other two. With [[Soundex]], all three variations produce the same code, C460, so a search based on the Soundex code returns all three.
* [[Data deduplication]] efforts use phonetic algorithms to bucket records into groups of similar-sounding names for further evaluation.
* [[Speech to text]] modules use phonetic encoding to find the set of dictionary words that are pronounced similarly to the phonemes extracted from the processed audio signal.
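
The search and deduplication uses above both amount to grouping strings by their phonetic code. The following Python sketch shows one way this could look, using a compact Soundex encoder as the key. The helper names <code>build_index</code> and <code>search</code> and the sample name list are hypothetical and chosen only for illustration; the encoder assumes ASCII input and the commonly described American Soundex rules.

```python
from collections import defaultdict

# Soundex digit groups (assumed American Soundex rules, ASCII input only).
_GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
_CODE = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}


def soundex(name):
    """Compact Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = letters[0]
    prev = _CODE.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are ignored entirely
        code = _CODE.get(ch, "")
        if code and code != prev:
            out += code
        prev = code
    return (out + "000")[:4]


def build_index(names, encode=soundex):
    """Bucket names by phonetic code (hypothetical helper)."""
    index = defaultdict(list)
    for name in names:
        index[encode(name)].append(name)
    return index


def search(index, query, encode=soundex):
    """Return every indexed name whose code matches the query's code."""
    return index.get(encode(query), [])


# Hypothetical sample data.
names = ["Claire", "Clare", "Clair", "Robert", "Rupert", "Smith"]
index = build_index(names)
print(soundex("Claire"))       # C460
print(search(index, "Clare"))  # ['Claire', 'Clare', 'Clair']
```

Because the encoder is passed in as a parameter, the same bucketing could in principle be driven by another phonetic algorithm; buckets produced this way are candidates for further evaluation rather than confirmed matches.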

== See also ==
* [[Approximate string matching]]
* [[Hamming distance]]
* [[Levenshtein distance]]
* [[Damerau–Levenshtein distance]]

== References ==
{{reflist}}
* {{DADS|phonetic coding|phoneticCoding}}

== External links ==
* Algorithm for [http://shape-of-code.coding-guidelines.com/2012/03/16/generating-sounds-like-and-accented-words/ converting words to phonemes] and back.
* [http://rockymadden.com/stringmetric/ StringMetric project], a [[Scala programming language|Scala]] library of phonetic algorithms.
* [https://yomguithereal.github.io/clj-fuzzy/ clj-fuzzy project], a [[Clojure]] library of phonetic algorithms.
* [https://github.com/danielmarcelino/soundexBR SoundexBR], a phonetic algorithm library implemented in [[R (programming language)|R]].
* [https://yomguithereal.github.io/talisman/phonetics/ Talisman], a [[JavaScript]] library collecting various phonetic algorithms that can be tried online.

[[Category:Phonetic algorithms| ]]
[[Category:Phonology]]