Index of coincidence
==Calculation==
The index of coincidence provides a measure of how likely it is to draw two matching letters by randomly selecting two letters from a given text. The chance of drawing a given letter in the text is (number of times that letter appears) / (length of the text). The chance of drawing that same letter again (without replacement) is (appearances − 1) / (text length − 1). The product of these two values gives the chance of drawing that letter twice in a row. Computing this product for each letter that appears in the text and summing the products gives the chance of drawing two of a kind. This probability can then be normalized by multiplying it by a coefficient ''c'', typically the number of letters in the alphabet (26 for English).

:<math> \mathbf{IC} = c \times \left({\left({\frac{n_\mathrm{a}}{N} \times \frac{n_\mathrm{a} - 1}{N - 1}}\right) + \left({\frac{n_\mathrm{b}}{N} \times \frac{n_\mathrm{b} - 1}{N - 1}}\right) + \cdots + \left({\frac{n_\mathrm{z}}{N} \times \frac{n_\mathrm{z} - 1}{N - 1}}\right)}\right)</math>

where ''c'' is the normalizing coefficient (26 for English), ''n''<sub>a</sub> is the number of times the letter "a" appears in the text, and ''N'' is the length of the text.

We can express the index of coincidence '''IC''' for a given letter-frequency distribution as a summation:

:<math>\mathbf{IC} = \frac{\displaystyle\sum_{i=1}^{c}n_i(n_i -1)}{N(N-1)/c}</math>

where ''N'' is the length of the text and ''n''<sub>1</sub> through ''n<sub>c</sub>'' are the [[Letter frequencies|frequencies]] (as integers) of the ''c'' letters of the alphabet (''c'' = 26 for monocase [[English language|English]]). The sum of the ''n<sub>i</sub>'' is necessarily ''N''. The products {{math|''n''(''n'' − 1)}} count the ordered pairs that can be drawn from ''n'' elements, which is twice the number of [[combinations]] of ''n'' elements taken two at a time; the same factor of 2 appears in the denominator {{math|''N''(''N'' − 1)}}, so it cancels out.
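The summation formula can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are hypothetical, and the text is lowercased with every character outside the 26-letter alphabet dropped before counting. A second helper evaluates the product form so the two expressions can be compared.

```python
import string
from collections import Counter


def letter_counts(text: str, alphabet: str = string.ascii_lowercase) -> Counter:
    """Count n_i per letter. Assumption: case is folded, non-alphabet characters are dropped."""
    return Counter(ch for ch in text.lower() if ch in alphabet)


def index_of_coincidence(text: str, alphabet: str = string.ascii_lowercase) -> float:
    """Summation form: IC = sum(n_i * (n_i - 1)) / (N * (N - 1) / c)."""
    counts = letter_counts(text, alphabet)
    n_total = sum(counts.values())  # N, the sum of the n_i
    if n_total < 2:
        raise ValueError("at least two letters are needed to form a pair")
    c = len(alphabet)  # normalizing coefficient (26 for monocase English)
    coincidences = sum(n * (n - 1) for n in counts.values())
    return coincidences / (n_total * (n_total - 1) / c)


def index_of_coincidence_products(text: str, alphabet: str = string.ascii_lowercase) -> float:
    """Product form: c * sum((n_i / N) * ((n_i - 1) / (N - 1)))."""
    counts = letter_counts(text, alphabet)
    n_total = sum(counts.values())
    if n_total < 2:
        raise ValueError("at least two letters are needed to form a pair")
    c = len(alphabet)
    return c * sum((n / n_total) * ((n - 1) / (n_total - 1)) for n in counts.values())


if __name__ == "__main__":
    sample = "AABB"  # n_a = 2, n_b = 2, N = 4
    print(index_of_coincidence(sample))           # about 8.667 (= 26 * 4 / 12)
    print(index_of_coincidence_products(sample))  # same value, up to rounding
```

Both functions return the same value up to floating-point rounding, because the product form only distributes the same denominator {{math|''N''(''N'' − 1)}} over each term.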
Each of the ''n<sub>i</sub>'' occurrences of the ''i''-th letter matches each of the remaining {{math|''n<sub>i</sub>'' − 1}} occurrences of the same letter. There are a total of {{math|''N''(''N'' − 1)}} letter pairs in the entire text, and 1/''c'' is the probability of a match for each pair, assuming a uniform [[random]] distribution of the characters (the "null model"; see below). Thus, this formula gives the ratio of the total number of coincidences observed to the total number of coincidences that one would expect from the null model.<ref>{{cite journal |last=Mountjoy |first=Marjorie | title= The Bar Statistics | journal=NSA Technical Journal | year=1963 | volume=VII | issue=2,4}} Published in two parts.</ref>

The expected average value for the IC can be computed from the relative letter frequencies {{mvar|''f<sub>i</sub>''}} of the source language:

:<math>\mathbf{IC}_{\mathrm{expected}} = \frac{\displaystyle\sum_{i=1}^{c}{f_i}^2}{1/c}.</math>

If all {{mvar|c}} letters of an alphabet were equally probable, each {{mvar|''f<sub>i</sub>''}} would equal 1/''c'', the sum of squares would be {{math|1=''c'' × (1/''c'')<sup>2</sup> = 1/''c''}}, and the expected index would therefore be 1.0. The actual monographic IC for [[telegraph]]ic English text is around 1.73, reflecting the unevenness of [[natural language|natural-language]] letter distributions.

Sometimes values are reported without the normalizing denominator, for example {{math|1=0.067 = 1.73/26}} for English; such values may be called ''κ''<sub>p</sub> ("kappa-plaintext") rather than IC, with ''κ''<sub>r</sub> ("kappa-random") used to denote the denominator {{math|1/''c''}} (which is the expected coincidence rate for a uniform distribution of the same alphabet, {{math|1=0.0385=1/26}} for English). English plaintext will generally fall somewhere in the range of 1.5 to 2.0 (normalized calculation).<ref>{{Cite journal |last=Kontou |first=Eleni |date=18 May 2020 |title=Index of Coincidence |url=https://core.ac.uk/display/327259203 |journal=University of Leicester Open Journals |via=[[CORE_(research_service)|CORE]]}}</ref>
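The relationship between ''κ''<sub>p</sub>, ''κ''<sub>r</sub>, and the normalized expected IC can be sketched the same way. The function names and the mapping-of-relative-frequencies input are assumptions for illustration. No real letter-frequency table is included, so the example only checks the uniform case (expected index 1.0), a deliberately skewed toy distribution, and the conversion of 1.73 back to an unnormalized value.

```python
import string
from typing import Mapping


def kappa_plaintext(freqs: Mapping[str, float]) -> float:
    """kappa_p: sum of squared relative frequencies f_i (the unnormalized value)."""
    return sum(f * f for f in freqs.values())


def kappa_random(c: int = 26) -> float:
    """kappa_r = 1/c: coincidence rate for a uniform distribution over c letters."""
    return 1 / c


def expected_ic(freqs: Mapping[str, float], c: int = 26) -> float:
    """Normalized expected IC = kappa_p / kappa_r."""
    if abs(sum(freqs.values()) - 1.0) > 1e-6:
        raise ValueError("relative frequencies should sum to 1")
    return kappa_plaintext(freqs) / kappa_random(c)


if __name__ == "__main__":
    uniform = {ch: 1 / 26 for ch in string.ascii_lowercase}
    print(expected_ic(uniform))  # 1.0, up to floating-point rounding

    # Hypothetical toy distribution, not real English data: all mass on two letters.
    skewed = {"a": 0.5, "b": 0.5}
    print(expected_ic(skewed))  # 0.5 / (1/26) = 13.0, up to rounding

    # Undo the normalization, as in 1.73 / 26 for English.
    print(1.73 * kappa_random(26))  # about 0.0665
```

Dividing by ''κ''<sub>r</sub> rather than hard-coding 26 keeps the sketch usable for alphabets of other sizes, which is the role the coefficient ''c'' plays in the formulas above.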