Index of coincidence
==Generalization==
The above description is only an introduction to the use of the index of coincidence, which is related to the general concept of [[correlation]]. Various forms of the index of coincidence have been devised: the "delta" I.C. (given by the formula above) in effect measures the [[autocorrelation]] of a single distribution, whereas a "kappa" I.C. is used when matching two text strings.<ref>{{cite book | last=Kahn | first=David | author-link=David Kahn (writer) | title=The Codebreakers - The Story of Secret Writing | orig-year=1967 | publisher=Macmillan | location=New York | isbn=0-684-83130-9 | year=1996}}</ref>

Although in some applications constant factors such as <math>c</math> and <math>N</math> can be ignored, in more general situations there is considerable value in truly ''indexing'' each I.C. against the value to be expected for the [[null hypothesis]] (usually: no match and a uniform random symbol distribution), so that in every situation the [[expected value]] for no correlation is 1.0. Thus, any form of I.C. can be expressed as the ratio of the number of coincidences actually observed to the number of coincidences expected (according to the null model), using the particular test setup.

Under this null model, two aligned texts of length <math>N</math> are expected to agree in <math>N/c</math> positions, so the formula for kappa I.C. is

:<math>\mathbf{IC} = \frac{\displaystyle\sum_{j=1}^{N}[a_j=b_j]}{N/c},</math>

where <math>N</math> is the common aligned length of the two texts ''A'' and ''B'', and the bracketed term is defined as 1 if the <math>j</math>-th letter of text ''A'' matches the <math>j</math>-th letter of text ''B'', otherwise 0.

A related concept, the "bulge" of a distribution, measures the discrepancy between the observed I.C. and the null value of 1.0. The number of cipher alphabets used in a [[polyalphabetic cipher]] may be estimated by dividing the expected bulge of the delta I.C. for a single alphabet by the observed bulge for the message, although in many cases (such as when a [[Vigenère cipher|repeating key]] was used) better techniques are available.
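
The quantities described in this section can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names <code>kappa_ic</code>, <code>delta_ic</code> and <code>estimate_alphabets</code> are hypothetical, <code>c</code> is assumed to be the alphabet size, and the delta I.C. is assumed to take the usual normalized form <math>c \sum n_i(n_i-1) / (N(N-1))</math>, since the formula referred to above lies outside this section.

```python
from collections import Counter


def kappa_ic(a: str, b: str, c: int = 26) -> float:
    """Normalized kappa I.C.: observed coincidences divided by the N/c expected."""
    n = min(len(a), len(b))  # common aligned length N
    if n == 0:
        raise ValueError("both texts must be non-empty")
    # zip stops at the shorter text, so exactly n positions are compared.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / (n / c)


def delta_ic(text: str, c: int = 26) -> float:
    """Normalized delta I.C. of a single text (assumed form, see lead-in)."""
    n = len(text)
    if n < 2:
        raise ValueError("text must contain at least two symbols")
    counts = Counter(text)
    coincidences = sum(k * (k - 1) for k in counts.values())
    return c * coincidences / (n * (n - 1))


def estimate_alphabets(observed_ic: float, single_alphabet_ic: float) -> float:
    """Expected single-alphabet bulge divided by the observed bulge."""
    observed_bulge = observed_ic - 1.0
    if observed_bulge <= 0:
        raise ValueError("observed I.C. shows no bulge above the null value 1.0")
    return (single_alphabet_ic - 1.0) / observed_bulge
```

For instance, <code>kappa_ic("ABCD", "ABXY", c=4)</code> counts two matches against an expected <math>4/4 = 1</math> and returns 2.0. The value passed as <code>single_alphabet_ic</code> depends on the plaintext language and has to be supplied by the caller; as the text notes, the resulting estimate is rough, and better techniques are often available when a repeating key was used.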