Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

2. A spelling checker program reads an input file and prints al the words not fo

ID: 3734612 • Letter: 2

Question

2. A spelling checker program reads an input file and prints al the words not found in some on-line dictionary. Suppose the dictionary contains 20,000 words and the input file is large, so that the algorithm can make only one pass through the input file. A simple strategy is to read the dictionary into a hash table and look for each input word as it is read (a) Assuming that an average word is 7 characters long and that it is possible to store words of L characters in L 1 bytes (so space wastage is not a consideration), and assuming a quadratic probing hash table, how much space does the table require? (b) If memory is limited and the entire dictionary cannot be stored in a hash table, we can still get an efficient algorithm that almost always works. We declare a table of boolean (initialized to false) from 0 to Ta esize-1. As we read in a word, we set table [hash (word)] = true, which of the following is true? i. If a word hashes to a location with value that is false, the word is not in the dictionary. ii. If a word hashes to a location with value that is true, the word is in the dictionary. iii. Suppose we choose Tablesize-200,007 How much memory does this require? iv. What is the probability of an error in this scheme?

Explanation / Answer

(a) As space is not a consideration here, for every word we can find some place to store every word. The placement of word depends on it's hash and the collision will be resolved by quadratic probing. So total space = (20000 * space for each word)

= 20000* 8 bytes

= 160000 bytes.

8 because average length of word is 7 and it given we require L + 1 bytes to store a word of length L.

(b) (i) It is true because if it was there in dictionary then hash would have been set to true.

(ii) Need not be, because some other word may be mapped to same hash and it is because of that hash value might have been set to true.

(iii) Assumig boolean takes one bit to store, total will be 200,007 bits.

(iv) If table size is 200,007 then there won't be any error because the collision will be resolved by quadratic probing. So it is zero.