Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Levenshtein distance
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
== Definition == The Levenshtein distance between two strings <math>a, b</math> (of length <math>|a|</math> and <math>|b|</math> respectively) is given by <math>\operatorname{lev}(a, b)</math> where : <math>\operatorname{lev}(a, b) = \begin{cases} |a| & \text{ if } |b| = 0, \\ |b| & \text{ if } |a| = 0, \\ \operatorname{lev}\big(\operatorname{tail}(a),\operatorname{tail}(b)\big) & \text{ if } \operatorname{head}(a)= \operatorname{head}(b), \\ 1 + \min \begin{cases} \operatorname{lev}\big(\operatorname{tail}(a), b\big) \\ \operatorname{lev}\big(a, \operatorname{tail}(b)\big) \\ \operatorname{lev}\big(\operatorname{tail}(a), \operatorname{tail}(b)\big) \\ \end{cases} & \text{ otherwise} \end{cases}</math> where the <math>\operatorname{tail}</math> of some string <math>x</math> is a string of all but the first character of <math>x</math> (i.e. <math>\operatorname{tail}(x_0x_1 \dots x_n)=x_1x_2 \dots x_n</math>), and <math>\operatorname{head}(x)</math> is the first character of <math>x</math> (i.e. <math>\operatorname{head}(x_0x_1 \dots x_n)=x_0</math>). Either the notation <math>x[n]</math> or <math>x_n</math> is used to refer to the <math>n</math>th character of the string <math>x</math>, [[Zero-based numbering|counting from 0]], thus <math>\operatorname{head}(x)=x_0=x[0]</math>. The first element in the minimum corresponds to deletion (from <math>a</math> to <math>b</math>), the second to insertion and the third to replacement. This definition corresponds directly to [[Levenshtein distance#Recursive|the naive recursive implementation]]. === Example === [[File:Levenshtein distance animation.gif|thumb|right|400px| Edit distance matrix for two words using cost of substitution as 1 and cost of deletion or insertion as 0.5]] For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following 3 edits change one into the other, and there is no way to do it with fewer than 3 edits: # '''k'''itten β '''s'''itten (substitution of "s" for "k"), # sitt'''e'''n β sitt'''i'''n (substitution of "i" for "e"), # sittin β sittin'''g''' (insertion of "g" at the end). A simple example of a deletion can be seen with "uninformed" and "uniformed" which have a distance of 1: # uni'''n'''formed β uniformed (deletion of "n"). ===Upper and lower bounds=== The Levenshtein distance has several simple upper and lower bounds. These include: * It is at least the absolute value of the difference of the sizes of the two strings. * It is at most the length of the longer string. * It is zero if and only if the strings are equal. * If the strings have the same size, the [[Hamming distance]] is an upper bound on the Levenshtein distance. The Hamming distance is the number of positions at which the corresponding symbols in the two strings are different. * The Levenshtein distance between two strings is no greater than the sum of their Levenshtein distances from a third string ([[triangle inequality]]). An example where the Levenshtein distance between two strings of the same length is strictly less than the Hamming distance is given by the pair "flaw" and "lawn". Here the Levenshtein distance equals 2 (delete "f" from the front; insert "n" at the end). The [[Hamming distance]] is 4.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)