==Generalization==

The above description is only an introduction to the use of the index of coincidence, which is related to the general concept of [[correlation]].  Various forms of the index of coincidence have been devised: the "delta" I.C. (given by the formula above) in effect measures the [[autocorrelation]] of a single distribution, whereas a "kappa" I.C. is used when matching two text strings.<ref>{{cite book | last=Kahn | first=David | author-link=David Kahn (writer) | title=The Codebreakers - The Story of Secret Writing | orig-year=1967 | publisher=Macmillan | location=New York | isbn=0-684-83130-9 | year=1996}}</ref>  Although constant factors such as <math>c</math> and <math>N</math> can be ignored in some applications, in more general situations there is considerable value in truly ''indexing'' each I.C. against the value expected under the [[null hypothesis]] (usually: no match and a uniform random symbol distribution), so that in every situation the [[expected value]] for no correlation is 1.0.  Thus, any form of I.C. can be expressed as the ratio of the number of coincidences actually observed to the number of coincidences expected under the null model, for the particular test setup.

Applying this ratio to two aligned texts, the formula for the kappa I.C. is

:<math>\mathbf{IC} = \frac{\displaystyle\sum_{j=1}^{N}[a_j=b_j]}{N/c},</math>

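where <math>N</math> is the common aligned length of the two texts ''A'' and ''B'', <math>c</math> is the normalizing coefficient used above (the number of letters in the alphabet), and the bracketed term is defined as 1 if the <math>j</math>-th letter of text ''A'' matches the <math>j</math>-th letter of text ''B'', and 0 otherwise.  The denominator <math>N/c</math> is the number of matches expected under the null hypothesis of uniformly random letters.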
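
The kappa formula can be illustrated with a short Python sketch.  The function name <code>kappa_ic</code> and the default alphabet size of 26 are assumptions made for this example; the texts are assumed to be already aligned and reduced to a common alphabet, and only the overlapping portion is compared.

```python
def kappa_ic(text_a: str, text_b: str, alphabet_size: int = 26) -> float:
    """Kappa I.C. of two aligned texts, normalized so the null expectation is 1.0.

    alphabet_size plays the role of c in the formula (assumed 26 here).
    """
    # N: the common aligned length of the two texts.
    n = min(len(text_a), len(text_b))
    if n == 0:
        raise ValueError("the texts must overlap in at least one position")

    # Sum of the bracketed terms [a_j = b_j]; zip stops at the shorter text.
    matches = sum(1 for a, b in zip(text_a, text_b) if a == b)

    # matches / (N / c), written as matches * c / N.
    return matches * alphabet_size / n
```

With this normalization, two unrelated uniformly random texts give a value near 1.0, while two identical texts give the maximum value, equal to the alphabet size.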

A related concept, the "bulge" of a distribution, is the difference between the observed I.C. and the null value of 1.0.  The number of cipher alphabets used in a [[polyalphabetic cipher]] may be estimated by dividing the expected bulge of the delta I.C. for a single alphabet by the observed bulge for the message, although in many cases (such as when a [[Vigenère cipher|repeating key]] was used) better techniques are available.
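
The bulge estimate can be sketched in Python as follows.  This is only an illustration under stated assumptions: the names <code>delta_ic</code> and <code>estimate_alphabet_count</code> are invented for the example, the delta I.C. is assumed to take the normalized form <math>c \sum n_i(n_i-1) / (N(N-1))</math>, and the default single-alphabet value of 1.73 is an approximate figure commonly quoted for English that should be replaced for other languages.

```python
from collections import Counter


def delta_ic(text: str, alphabet_size: int = 26) -> float:
    """Normalized delta I.C. of one text (assumed form: c * sum n_i(n_i-1) / (N(N-1)))."""
    n = len(text)
    if n < 2:
        raise ValueError("at least two letters are needed")
    # n_i * (n_i - 1) counts ordered pairs of positions holding the same letter.
    coincidences = sum(k * (k - 1) for k in Counter(text).values())
    return alphabet_size * coincidences / (n * (n - 1))


def estimate_alphabet_count(observed_ic: float,
                            single_alphabet_ic: float = 1.73) -> float:
    """Expected single-alphabet bulge divided by the observed bulge.

    single_alphabet_ic is an assumed, language-dependent value.
    """
    observed_bulge = observed_ic - 1.0
    if observed_bulge <= 0:
        raise ValueError("no bulge observed; the estimate is undefined")
    expected_bulge = single_alphabet_ic - 1.0
    return expected_bulge / observed_bulge
```

For instance, an observed I.C. of 1.146 with the assumed value of 1.73 gives 0.73 / 0.146, or about five alphabets.  The result is a rough real-valued estimate rather than an exact count, and it becomes unreliable as the observed bulge approaches zero.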