This is done on Maple. Question 1 (10 points) [Index of Coincidence] In this que
ID: 3663147 • Letter: T
Question
This is done on Maple.
Question 1 (10 points) [Index of Coincidence]
In this question, you compute the index of coincidence of a string. Given a string
"S=s[1]...s[t],"
of length
t
of symbols from an alphabet
A = a[1] .. a[n]
comprised of
n
symbols. Let
"f[i] ="
the number of occurrences of the
i
th symbol
a[i]
in the string
"S. "
The index of coincidence is equal to
"(∑)f[i]^(2)."
The index of coincidence can be used to determine if a cypher is a monoalphabetic substitution cypher or a polyalphabetic substitution cypher. Assume that English is the language of the plaintext. In a monoalphabetic substitution cypher, the frequencies
f[i]
should be approximately equal to the frequencies of
a[i]
in English. In an ideal polyalphabetic substitution cypher, the frequencies
f[i]
should be approximately equal to
1
-
n
, i.e. each letter occurs with equal probability. Note that it can be shown (see question 5) that the index of coincidence is minimal when each letter occurs with identical frequency (i.e. the uniform distribution) and maximal when one letter occurs with frequency 1.
restart;
with(StringTools):
with(Statistics):
with(LinearAlgebra):
In this question we assume that
alphabet := Iota("a".."z");
"abcdefghijklmnopqrstuvwxyz"
The probabilities of the letters in the alphabet occurring in typical English text are given in the list FE. The
i
th element of FE is the probability for the
i
th letter in the alphabet. The probability distribution can be visualized with the bar graph produced by ColumnGraph.
FE := [0.8167000000e-1, 0.1492000000e-1, 0.2782000000e-1, 0.4253000000e-1, .1270200000, 0.2228000000e-1, 0.2015000000e-1, 0.6094000000e-1, 0.6966000000e-1, 0.1530000000e-2, 0.7720000000e-2, 0.4025000000e-1, 0.2406000000e-1, 0.6749000000e-1, 0.7507000000e-1, 0.1929000000e-1, 0.9500000000e-3, 0.5987000000e-1, 0.6327000000e-1, 0.9056000000e-1, 0.2758000000e-1, 0.9780000000e-2, 0.2360000000e-1, 0.1500000000e-2, 0.1974000000e-1, 0.7400000000e-3];
[0.08167000000, 0.01492000000, 0.02782000000, 0.04253000000,
0.1270200000, 0.02228000000, 0.02015000000, 0.06094000000,
0.06966000000, 0.001530000000, 0.007720000000, 0.04025000000,
0.02406000000, 0.06749000000, 0.07507000000, 0.01929000000,
0.0009500000000, 0.05987000000, 0.06327000000, 0.09056000000,
0.02758000000, 0.009780000000, 0.02360000000, 0.001500000000,
0.01974000000, 0.0007400000000]
ColumnGraph(FE);
a) Calculate the Index of Coincidence for English letters using the standard probabilities. Also calculate the Index of Coincidence assuming each letter in the alphabet occurs equally often. You can use the DotProduct function from Maple's LinearAlgebra package.
b) Calculate the Index of Coincidence for the following two cyphers. First you should compute a list of fequencies of the letters in the two cyphers, and then you should calculate the Index of Coincidence. Which one is more likely to be a monoalphabetic substitution cypher?
You may use the command CountCharacterOccurrences from the StringTools package to count character frequencies. You may want to use the command add to calculate the Index of Coincidence.
Explanation / Answer
Definition: Index of Coincidence. Denoted by I, the index of coincidence represents the probability that two randomly selected letters are identical. Index of Coincidence for Monoalphabetic Ciphers In monoalphabetic ciphers, the frequencies of letters in standard English are preserved when converting from plaintext to ciphertext. The following example illustrates why this is true. Example 4: Illustrate why the Caesar shift cipher preserves frequencies when converting from plain the ciphertext. Solution: Recall the index of coincidence represents the probability that two randomly selected letters are identical. Since monoalphabetic ciphers preserves frequencies, the index of coincidence of the plaintext alphabet of standard English will be exactly the same as the index of coincidence of the ciphertext alphabet for a monoalphabetic cipher. Using the result from Example 3, this fact results in the following statement: Index of Coincidence for Monoalphabetic Ciphers Index of Coincidence for Polyalphabetic Ciphers In a polyalphabetic cipher, the goal is to distribute the letter frequencies so that each letters has the same likelihood of occurring in the ciphertext. The next example determines what the index of coincidence is for a polyalphabetic cipher for a large collection of letters. Example 5: Determine the probability that two randomly selected letters are identical of the ciphertext of a message enciphered with a polyalphabetic cipher, assuming there are a very large number of letters in the ciphertext. Solution: ince the index of coincidence represents the probability that two randomly selected letters are identical, Example 5 allows us to make the following statement: Index of Coincidence for Polyalphabetic Ciphers The index of coincidence values for monoalphabetic (0.065) and polyalphabetic ciphers (0.0385) were derived assuming that the plaintext message has a very large number of letters. When messages are enciphered and deciphered, these messages are normally much shorter. Hence, the index of coincidence for a typical enciphered message enciphered will be bounded somewhere between 0.0385 and 0.065. This leads to the following statement: Index of Coincidence Bound For a typical ciphertext message, the index of coincidence I satisfies: Fact: If I is close to 0.0385, then the cipher is likely to have been obtained from a polyalphabetic cipher. If I is closer to 0.065, the cipher is likely to be monoalphabetic. Knowing what the value for the index of coincidence tells us, we now need to derive a formula for calculating it. Before doing this, we need to recall the following fact concerning summation notation: Fact: Summation notation is a shorthand notation in mathematics for indicating the sum of many terms. We say that represents the sum of k terms , where the index i starts at the first term (i = 1) and we sum until we reach the upper index k of the summation symbol. Example 6: Compute . Solution: Derivation of Formula for the Index of Coincidence for a Given Ciphertext Message Suppose a ciphertext message is received. Let be the counts of the number of occurrences of the alphabet letters A, B, C, …, Y, Z that occur in the ciphertext (note that the subscript of each variable corresponds to the MOD 26 alphabet assignment number of the corresponding letter). Suppose represents the total sum of all of the letters in the ciphertext. Recall that the index of coincidence represents the probability that two randomly selected letters are identical. Using the multiplication principle of probability for n total letters, we can compute the following probabilities for each individual letter: Since these probabilities are mutually exclusive, we can sum the individual probabilities for each letters to find the index on coincidence. We summarize this result. Formula for the Index of Coincidence The index of coincidence I is given by the formula: where n represents the total number of letters in the ciphertext and represents the number of letters corresponding to each individual letters in the ciphertext with , , , etc. The next example illustrates an example of how this formula is used. Example 7: Suppose we receive the message "HLUBN WFSFK IGIHM GBSIM MBSEJ MAFUT QECII LJSUB BAXMA JCWXC MBSGZ GGSMK BHUQB ETVUS MLMER CFDTW UBASW ERFIE LOMVY SIMMY YEDDM MSGZA NCOFY YTIHL JRYOH KLOFH IEFKQ OFAAI ZGIEJ HAKNZ JSQRU QXDKW HSNNF AOUMO ROFAA IZPIQ YHQFY SWEFK ILDPQ GIXUE ADFWN NFVYO TRXRG QKRUS HVYHA GYONT TZISI EPUOF XAZRN ZTSQK BGIIS MIMII SMIMX HAHUF ZNMFG WIMMB QWQLT SMZTU XRBSA EMFGW IHUAM WQFFV CKNSM TIJYY RCOJR ASBOE YHQAI KYPAK YJKUX VUFIG GBCFY HQKIJ QDFVU LBOGZ XTQOI MIMWH QOXUQ EMBIX KYAIB SAEFC UKPYA ILKJL RRIQT URSYD QUOYS OJLXR IQTUB IHC" Use the index of coincidence to decide if this ciphertext was produced by a monoalphabetic or olyalphabetic cipher. Solution: Using the Friedman Maplet, we generate the frequency table for the letters in this message: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Letter Frequency Table This gives total letters. Thus the index of coincidence is: I = = = = = Since 0.043186 is much closer to 0.0385 than 0.065, the cipher is likely polyalphabetic. Using the Index of Coincidence to Estimate the Keyword Length for the Vigenere Cipher So far we have used the index of coincidence to determine whether a cipher is polyalphabetic or monoalphabetic. Once this is determined, it can be used to estimate the keyword length. Knowing the keyword length is the first essential step when attempting to break the Vigenere cipher. The following formula gives an estimate of the keyword length: Keyword Length Formula for the Vigenere Cipher where Keyword Length Formula for the Vigenere Cipher
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.