Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Edit distance
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
==Computation== The first algorithm for computing minimum edit distance between a pair of strings was published by [[Frederick J. Damerau|Damerau]] in 1964.<ref>{{Cite journal |title=Techniques for Automatically Correcting Words in Text |first=Karen |last=Kukich |journal=ACM Computing Surveys |volume=24 |issue=4 |year=1992 |url=http://dc-pubs.dbs.uni-leipzig.de/files/Kukich1992Techniqueforautomatically.pdf |doi=10.1145/146370.146380 |pages=377–439 |s2cid=5431215 |access-date=2017-11-09 |archive-url=https://web.archive.org/web/20160927030713/http://dc-pubs.dbs.uni-leipzig.de/files/Kukich1992Techniqueforautomatically.pdf |archive-date=2016-09-27 |url-status=dead }}</ref> ===Common algorithm=== {{main|Wagner–Fischer algorithm}} Using Levenshtein's original operations, the (nonsymmetric) edit distance from <math>a = a_1\ldots a_m</math> to <math>b = b_1\ldots b_n</math> is given by <math>d_{mn}</math>, defined by the [[Recursive definition|recurrence]]<ref name="slp"/> :<math>\begin{align}d_{i0} &= \sum_{k=1}^{i} w_\mathrm{del}(a_{k}), & & \quad \text{for}\; 1 \leq i \leq m \\ d_{0j} &= \sum_{k=1}^{j} w_\mathrm{ins}(b_{k}), & & \quad \text{for}\; 1 \leq j \leq n \\ d_{ij} &= \begin{cases} d_{i-1, j-1} & \text{for}\; a_{i} = b_{j}\\ \min \begin{cases} d_{i-1, j} + w_\mathrm{del}(a_{i})\\ d_{i,j-1} + w_\mathrm{ins}(b_{j}) \\ d_{i-1,j-1} + w_\mathrm{sub}(a_{i}, b_{j}) \end{cases} & \text{for}\; a_{i} \neq b_{j}\end{cases} & & \quad \text{for}\; 1 \leq i \leq m, 1 \leq j \leq n.\end{align}</math> This algorithm can be generalized to handle transpositions by adding another term in the recursive clause's minimization.<ref name="ukkonen83"/> The straightforward, [[Recursion (computer science)|recursive]] way of evaluating this recurrence takes [[exponential time]]. Therefore, it is usually computed using a [[dynamic programming]] algorithm that is commonly credited to [[Wagner–Fischer algorithm|Wagner and Fischer]],<ref>{{cite journal |author1=R. Wagner |author2=M. Fischer |s2cid=13381535 |title=The string-to-string correction problem |journal=J. ACM |volume=21 |year=1974 |pages=168–178 |doi=10.1145/321796.321811|doi-access=free }}</ref> although it has a history of multiple invention.<ref name="slp"/><ref name="ukkonen83"/> After completion of the Wagner–Fischer algorithm, a minimal sequence of edit operations can be read off as a backtrace of the operations used during the dynamic programming algorithm starting at <math>d_{mn}</math>. This algorithm has a [[time complexity]] of Θ({{mvar|m}}{{mvar|n}}) where {{mvar|m}} and {{mvar|n}} are the lengths of the strings. When the full dynamic programming table is constructed, its [[space complexity]] is also {{nowrap|Θ({{mvar|m}}{{mvar|n}})}}; this can be improved to {{nowrap|Θ(min({{mvar|m}},{{mvar|n}}))}} by observing that at any instant, the algorithm only requires two rows (or two columns) in memory. However, this optimization makes it impossible to read off the minimal series of edit operations.<ref name="ukkonen83"/> A linear-space solution to this problem is offered by [[Hirschberg's algorithm]].<ref>{{cite book |last=Skiena |first=Steven |author-link=Steven Skiena |title = The Algorithm Design Manual |publisher=[[Springer Science+Business Media]] |edition=2nd |year = 2010 |isbn=978-1-849-96720-4|bibcode=2008adm..book.....S }}</ref>{{rp|634}} A general recursive divide-and-conquer framework for solving such recurrences and extracting an optimal sequence of operations cache-efficiently in space linear in the size of the input is given by Chowdhury, Le, and Ramachandran.<ref name="CLR-08">{{cite journal |last1=Chowdhury |first1=Rezaul |last2=Le |first2=Hai-Son |last3=Ramachandran |first3=Vijaya |title=Cache-oblivious dynamic programming for bioinformatics |journal=IEEE/ACM Transactions on Computational Biology and Bioinformatics |date=July 2010 |volume=7 |issue=3 |pages=495–510 |doi=10.1109/TCBB.2008.94 |pmid=20671320 |s2cid=2532039 |url=https://ieeexplore.ieee.org/document/4609376}}</ref> ===Improved algorithms=== Improving on the Wagner–Fisher algorithm described above, [[Esko Ukkonen|Ukkonen]] describes several variants,<ref>{{cite journal |title=Algorithms for approximate string matching |journal=Information and Control |volume=64 |issue=1–3 |year=1985 |doi=10.1016/S0019-9958(85)80046-2|url=http://www.cs.helsinki.fi/u/ukkonen/InfCont85.PDF |pages=100–118|last1=Ukkonen |first1=Esko |doi-access=free }}</ref> one of which takes two strings and a maximum edit distance {{mvar|s}}, and returns {{nowrap|min({{mvar|s}}, {{mvar|d}})}}. It achieves this by only computing and storing a part of the dynamic programming table around its diagonal. This algorithm takes time {{nowrap|O({{mvar|s}}×min({{mvar|m}},{{mvar|n}}))}}, where {{mvar|m}} and {{mvar|n}} are the lengths of the strings. Space complexity is {{nowrap|O({{mvar|s}}<sup>2</sup>)}} or {{nowrap|O({{mvar|s}})}}, depending on whether the edit sequence needs to be read off.<ref name="ukkonen83"/> Further improvements by [[Gad Landau|Landau]], [[Eugene Myers|Myers]], and Schmidt [https://dblp.org/pers/hd/s/Schmidt:Jeanette_P=] give an {{nowrap|O({{mvar|s}}{{sup|2}} + max({{mvar|m}},{{mvar|n}}))}} time algorithm.<ref>{{cite journal |year=1998 |last1=Landau |last2=Myers |last3=Schmidt |citeseerx=10.1.1.38.1766 |title=Incremental String Comparison |journal=SIAM Journal on Computing |volume=27 |issue=2 |pages=557–582 |doi=10.1137/S0097539794264810}}</ref> For a finite alphabet and edit costs which are multiples of each other, the fastest known exact algorithm is of Masek and Paterson<ref>{{Cite journal|last1=Masek|first1=William J.|last2=Paterson|first2=Michael S.|date=February 1980|title=A faster algorithm computing string edit distances|journal=Journal of Computer and System Sciences|volume=20|issue=1|pages=18–31|doi=10.1016/0022-0000(80)90002-1|issn=0022-0000|doi-access=free|hdl=1721.1/148933|hdl-access=free}}</ref> having worst case runtime of O(nm/logn).
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)