Editing Vigenère cipher (section)

==Cryptanalysis==
The idea behind the Vigenère cipher, like all other polyalphabetic ciphers, is to disguise the plaintext [[letter frequency]] to interfere with a straightforward application of [[frequency analysis]]. For instance, if <code>P</code> is the most frequent letter in a ciphertext whose plaintext is in [[English language|English]], one might suspect that <code>P</code> corresponds to <code>e</code> since <code>e</code> is the most frequently used letter in English. However, by using the Vigenère cipher, <code>e</code> can be enciphered as different ciphertext letters at different points in the message, which defeats simple frequency analysis.

The primary weakness of the Vigenère cipher is the repeating nature of its [[key (cryptography)|key]]. If a cryptanalyst correctly guesses the key's length ''n'', the cipher text can be treated as ''n'' interleaved [[Caesar cipher]]s, which can easily be broken individually. The key length may be discovered by [[Brute-force attack|brute force]] testing each possible value of ''n'', or [[#Kasiski examination|Kasiski examination]] and the [[#Friedman test|Friedman test]] can help to determine the key length (see below: {{section link||Kasiski examination}} and {{section link||Friedman test}}).

=== Kasiski examination ===
{{further|Kasiski examination}}

In 1863, [[Friedrich Kasiski]] was the first to publish a successful general attack on the Vigenère cipher.<ref>{{cite book|last1=Kasiski|first1=F. W.|title=Die Geheimschriften und die Dechiffrir-Kunst|trans-title=Cryptograms and the art of deciphering|date=1863|publisher=E.S. Mittler und Sohn|location=Berlin, (Germany)|url=https://books.google.com/books?id=fB5dAAAAcAAJ&pg=PP7|language=de}}</ref> Earlier attacks relied on knowledge of the plaintext or the use of a recognizable word as a key. Kasiski's method had no such dependencies. Although Kasiski was the first to publish an account of the attack, it is clear that others had been aware of it. In 1854, [[Charles Babbage]] was goaded into breaking the Vigenère cipher when John Hall Brock Thwaites submitted a "new" cipher to the ''Journal of the Society of the Arts''.<ref>See:
*  {{cite journal|last1=Thwaites|first1=J.H.B.|title=Secret, or cypher writing|journal=Journal of the Society of Arts|date=11 August 1854|volume=2|issue=90|pages=663–664|url=https://books.google.com/books?id=nxw9AQAAIAAJ&pg=PA663}}
*  {{cite journal|last1="C." (Charles Babbage)|title=Mr. Thwaites's cypher|journal=Journal of the Society of Arts|date=1 September 1854|volume=2|issue=93|pages=707–708|url=https://books.google.com/books?id=nxw9AQAAIAAJ&pg=PA707}}
*  {{cite book|last1=Babbage|first1=Charles|title=Passages from the Life of a Philosopher|date=1864|publisher=Longman|location=London, England|page=[https://archive.org/details/bub_gb_2T0AAAAAQAAJ/page/n509 496]|url=https://archive.org/details/bub_gb_2T0AAAAAQAAJ}}</ref><ref>Thwaites filed for a patent for his "new" cipher system:
*  [https://babel.hathitrust.org/cgi/pt?id=gri.ark:/13960/t4gn2h303;view=1up;seq=798 "Weekly list of patents sealed. … 1727. John Hall Brock Thwaites, Bristol – Improvements in apparatus to facilitate communication by cypher."] in: ''Journal of the Society of Arts'', '''2''' (99): 792 (13 October 1854).
*  [https://books.google.com/books?id=7xEFAAAAQAAJ&pg=PA211 "Thwaites, John Hall Brock, of Bristol, dentist. ''Improvements in apparatus to facilitate the communication by cypher''.  Application dated August 7, 1854. (No. 1727.)"] in:  ''The Mechanics' Magazine'', '''62''' (1647): 211 (3 March 1855).</ref> When Babbage showed that Thwaites' cipher was essentially just another recreation of the Vigenère cipher, Thwaites presented a challenge to Babbage: given an original text (from Shakespeare's ''[[The Tempest]]'': Act 1, Scene 2) and its enciphered version, he was to find the key words that Thwaites had used to encipher the original text. Babbage soon found the key words: "two" and "combined".  Babbage then enciphered the same passage from Shakespeare using different key words and challenged Thwaites to find Babbage's key words.<ref>See:
*  {{cite journal|last1=Thwaites|first1=John H.B.|title=Secret or cypher writing|journal=Journal of the Society of Arts|date=15 September 1854|volume=2|issue=95|pages=732–733|url=https://babel.hathitrust.org/cgi/pt?id=gri.ark:/13960/t4gn2h303;view=1up;seq=738}}
*  {{cite journal|last1="C" (Charles Babbage)|title=Mr. Thwaites's cypher|journal=Journal of the Society of Arts|date=6 October 1854|volume=2|issue=98|pages=776–777|url=https://babel.hathitrust.org/cgi/pt?id=gri.ark:/13960/t4gn2h303;view=1up;seq=782}}</ref> Babbage never explained the method that he used. Studies of Babbage's notes reveal that he had used the method later published by Kasiski and suggest that he had been using the method as early as 1846.<ref name="Franksen1985">{{cite book|author=Ole Immanuel Franksen|title=Mr. Babbage's Secret: The Tale of a Cypher and APL|url=https://books.google.com/books?id=53dQAAAAMAAJ|year=1985|publisher=Prentice Hall|isbn=978-0-13-604729-2}}</ref>

The [[Kasiski examination]], also called the Kasiski test, takes advantage of the fact that repeated words are, by chance, sometimes encrypted using the same key letters, leading to repeated groups in the ciphertext. For example, consider the following encryption using the keyword <code>ABCD</code>:

 Key:        '''''ABCDAB'''''CDABCDABCD'''''ABCDAB'''''CDABCD
 Plaintext:  '''''crypto'''''isshortfor'''''crypto'''''graphy
 Ciphertext: '''''CSASTP'''''KVSIQUTGQU'''''CSASTP'''''IUAQJB

There is an easily noticed repetition in the ciphertext, and so the Kasiski test will be effective.

The distance between the repetitions of <code>CSASTP</code> is 16. If it is assumed that the repeated segments represent the same plaintext segments, that implies that the key is 16, 8, 4, 2, or 1 characters long. (All [[factorization|factors]] of the distance are possible key lengths; a key of length one is just a simple [[Caesar cipher]], and its [[cryptanalysis]] is much easier.) Since key lengths 2 and 1 are unrealistically short, one needs to try only lengths 16, 8, and 4. Longer messages make the test more accurate because they usually contain more repeated ciphertext segments. The following ciphertext has two segments that are repeated:
 Ciphertext: {{color|light-dark(red, lightcoral)|'''''VHVS'''''}}SP{{color|light-dark(blue, lightblue)|'''''QUCE'''''}}MRVBVBBB{{color|light-dark(red, lightcoral)|'''''VHVS'''''}}URQGIBDUGRNICJ{{color|light-dark(blue, lightblue)|'''''QUCE'''''}}RVUAXSSR
The distance between the repetitions of <code>VHVS</code> is 18. If it is assumed that the repeated segments represent the same plaintext segments, that implies that the key is 18, 9, 6, 3, 2, or 1 characters long. The distance between the repetitions of <code>QUCE</code> is 30 characters. That means that the key length could be 30, 15, 10, 6, 5, 3, 2, or 1 characters long. By taking the [[intersection (set theory)|intersection]] of those sets, one could safely conclude that the most likely key length is 6 since 3, 2, and 1 are unrealistically short.<!-- the key is CARBON and the message is THERECOULDYETBEANOTHERGEOGRAPHERWHOWOULDDISAGREE -->

===Friedman test===
The Friedman test (sometimes known as the kappa test) was invented during the 1920s by [[William F. Friedman]], who used the [[index of coincidence]], which measures the unevenness of the cipher letter frequencies to break the cipher. By knowing the probability <math>\kappa_\text{p}</math> that any two randomly chosen source language letters are the same (around 0.067 for [[Case sensitivity|case-insensitive]] English) and the probability of a coincidence for a uniform random selection from the alphabet <math>\kappa_\text{r}</math> ({{frac|1|26}} = 0.0385 for English), the key length can be estimated as the following:

: <math>\frac{\kappa_\text{p}-\kappa_\text{r}}{\kappa_\text{o}-\kappa_\text{r}}</math>

from the observed coincidence rate

: <math>\kappa_\text{o}=\frac{\sum_{i=1}^{c}n_i(n_i -1)}{N(N-1)}</math>

in which ''c'' is the size of the alphabet (26 for English), ''N'' is the length of the text and ''n''<sub>1</sub> to ''n''<sub>''c''</sub> are the observed ciphertext [[letter frequencies]], as integers.

That is, however, only an approximation; its accuracy increases with the length of the text. It would, in practice, be necessary to try various key lengths that are close to the estimate.<ref>{{cite book |editor=Henk C.A. van Tilborg |title=Encyclopedia of Cryptography and Security |url=https://archive.org/details/encyclopediacryp00tilb |url-access=limited |publisher=Springer |edition=First |isbn=0-387-23473-X |pages=[https://archive.org/details/encyclopediacryp00tilb/page/n127 115] |year=2005}}</ref> A better approach for repeating-key ciphers is to copy the ciphertext into rows of a matrix with as many columns as an assumed key length and then to compute the average [[Index of coincidence#Example|index of coincidence]] with each column considered separately. When that is done for each possible key length, the highest average index of coincidence then corresponds to the most-likely key length.<ref>{{cite journal | author=Mountjoy, Marjorie | title= The Bar Statistics | journal=NSA Technical Journal | year=1963 | volume=VII | issue=2,4}}  Published in two parts.</ref> Such tests may be supplemented by information from the Kasiski examination.

===Frequency analysis===
{{Main|Frequency analysis}}
Once the length of the key is known, the ciphertext can be rewritten into that many columns, with each column corresponding to a single letter of the key. Each column consists of plaintext that has been encrypted by a single [[Caesar cipher]]. The Caesar key (shift) is just the letter of the Vigenère key that was used for that column. Using methods similar to those used to break the Caesar cipher, the letters in the ciphertext can be discovered.

An improvement to the Kasiski examination, known as [[Auguste Kerckhoffs|Kerckhoffs]]' method, matches each column's letter frequencies to shifted plaintext frequencies to discover the key letter (Caesar shift) for that column. Once every letter in the key is known, all the cryptanalyst has to do is to decrypt the ciphertext and reveal the plaintext.<ref>{{cite web |url=http://courses.umass.edu/cs415/labs/lab1/415-lab1-crypto.pdf |title=Lab exercise: Vigenere, RSA, DES, and Authentication Protocols |access-date=2006-11-10 |work=CS 415: Computer and Network Security |archive-url=https://web.archive.org/web/20110723140549/http://courses.umass.edu/cs415/labs/lab1/415-lab1-crypto.pdf |archive-date=2011-07-23 |url-status=dead }}</ref> Kerckhoffs' method is not applicable if the Vigenère table has been scrambled, rather than using normal alphabetic sequences, but Kasiski examination and coincidence tests can still be used to determine key length.

===Key elimination===
The Vigenère cipher, with normal alphabets, essentially uses modulo arithmetic, which is commutative. Therefore, if the key length is known (or guessed), subtracting the cipher text from itself, offset by the key length, will produce the plain text subtracted from itself, also offset by the key length. If any "probable word" in the plain text is known or can be guessed, its self-subtraction can be recognized, which allows recovery of the key by subtracting the known plaintext from the cipher text. Key elimination is especially useful against short messages. For example, using <code>LION</code> as the key below:

{|
| Plaintext: || <code>thequickbrownfoxjumpsoverthelazydog</code>
|-
| Key: || <code>LIONLIONLIONLIONLIONLIONLIONLIONLIO</code>
|-
| Ciphertext: || <code>EPSDFQQXMZCJYNCKUCACDWJRCBVRWINLOWU</code>
|}

Then subtract the ciphertext from itself with a shift of the key length 4 for <code>LION</code>.

{|
| Ciphertext (original): || <code>'''EPSD'''FQQXMZCJYNCKUCACDWJRCBVRWINLOWU</code>
|-
| Ciphertext (shifted): || <code>FQQXMZCJYNCKUCACDWJRCBVRWINLOWU____</code>
|-
| Result (difference): || <code>ZZCGTROOOMAZELCIRGRLBVOAGTIGIMT{{strikethrough|LOWU}}</code>
|}

Which is nearly equivalent to subtracting the plaintext from itself by the same shift.

{|
| Plaintext (original): || <code>'''theq'''uickbrownfoxjumpsoverthelazydog</code>
|-
| Plaintext (shifted): || <code>uickbrownfoxjumpsoverthelazydog____</code>
|-
| Result (difference): || <code>zzcgtrooomazelcirgrlbvoagtigimt{{strikethrough|ydog}}</code>
|}

Which is algebraically represented for <math>i \in [1, n - m]</math> as:

:<math>
\begin{align}
(C_i - C_{(i + m)}) \bmod \ell &= (E_K(M_i) - E_K(M_{(i + m)})) \bmod \ell \\
 &= ((M_i + K_{(i \bmod m)}) \bmod \ell - (M_{(i + m)} + K_{((i + m) \bmod m)}) \bmod \ell) \bmod \ell \\
 &= ((M_i + K_{(i \bmod m)}) - (M_{(i + m)} + K_{((i + m) \bmod m)})) \bmod \ell \\
 &= (M_i + K_{(i \bmod m)} - M_{(i + m)} - K_{((i + m) \bmod m)}) \bmod \ell \\
 &= (M_i - M_{(i + m)} + K_{(i \bmod m)} - K_{((i + m) \bmod m)}) \bmod \ell \\
 &= (M_i - M_{(i + m)} + K_{(i \bmod m)} - K_{(i \bmod m)}) \bmod \ell \\
 &= (M_i - M_{(i + m)}) \bmod \ell \\
\end{align}</math>

In this example, the words <code>brownfox</code> are known.

{|
| Plaintext (original): || <code>brownfox</code>
|-
| Plaintext (shifted): || <code>nfox____</code>
|-
| Result (difference): || <code>omaz{{strikethrough|nfox}}</code>
|}

This result <code>omaz</code> corresponds with the 9th through 12th letters in the result of the larger examples above. The known section and its location is verified.

Subtract <code>brow</code> from that range of the ciphertext.

{|
| Ciphertext: || <code>EPSDFQQX'''MZCJ'''YNCKUCACDWJRCBVRWINLOWU</code>
|-
| Plaintext: || <code>________brow_______________________</code>
|-
| Key: || <code>{{strikethrough|EPSDFQQX}}'''LION'''{{strikethrough|YNCKUCACDWJRCBVRWINLOWU}}</code>
|}

This produces the final result, the reveal of the key <code>LION</code>.