Spelling Checker Program C++ Using a hash table/function, print out a list of wo
ID: 3693207 • Letter: S
Question
Spelling Checker Program C++
Using a hash table/function, print out a list of words that appear to be misspelled.
A hash table contains buckets into which an object (data item) can be placed. When a hash function is applied to an object, a hash value is generated. The hash value is used to determine which bucket the object is assigned to. Hashed associated containers in STL have a constructor that takes the number of buckets and guarantees the bucket count will be at least that number.
A bucket is a cluster (or a sub container) that holds a set of data items that hash to the same table location. Obviously, you can not store 25K words in 1373 slots and you need to use some kind of chaining schemes such as linear probing or the second hashing. The size of a bucket is independent from the number of data items you put into the hash. So if you have too many buckets, the hash will not have many collisions but you may waste the storage and you may have to deal with a rather complex hash function and longer keys. If you have too small number of buckets, then you have to deal with frequent collisions. Finding a good bucket number would play an important role in reducing collisions. That's why we usually pick a prime number for the number of bucket. We picked 1373 for the bucket number.
You can use an STL for this project. Make sure to use the unordered versions (implemented as hash table) not the ordered versions (implemented as trees).
You will find unordered_map and unordered_set act the same, but the unordered_map requires that you map relates two items together. This assigment requires that you insert a single item (a word) into a hash table, so you would expect to use an unordered_set
http://www.cplusplus.com/reference/unordered_set/unordered_set/ (Links to an external site.)
Since you are dealing with strings, you could use the standard hash function for strings (will happen by default). (If you wanted a different function, you would need to overload the ( ) operator. To set the starting hash table size, you would use .rehash(). According to documentation, an unordered set will automatically increase the number of the buckets as required.
OR
you can use an unordered multiset. The difference here is that the number of buckets is automatically increased, based on the number of collisions,but still allows some collisions, that you can access with iterators pointing in a linked list to the next collision.
http://www.cplusplus.com/reference/unordered_set/unordered_multiset/ (Links to an external site.)(Make sure to compile with std=C++11).
OR
You can implement your own hash, based on the textbook implementation but you need to come up with a scheme to include buckets (clusters) and collision resolution. Hash set from STL makes this very easy by using a constructor with a bucket value, although the exact bucket count may be reevaluated and altered by the library (it's OK for the homework. you don't have to constrain the program to use 1373 buckets)
If you're implementing your own hash, here's a sample function to create a key from a string. Basically it uses a prime number and the character ascii values for the % function.
class HFstring
{
public:
unsigned int operator()(const string& item) const
{
unsigned int prime = 2049982463;
int n = 0, i;
for (i = 0; i < item.length(); i++)
n = n*8 + item[i];
return n > 0 ? (n % prime) : (-n % prime);
}
};
You don't have to interpret verb tense, plurals, conjugations etc. All you have to do is to check with each word with the dictionary.
THANKS!
Explanation / Answer
#include <iostream>
#include <string>
#include <unordered_set>
#include <fstream>
#include <algorithm>
#include <iterator>
#include <sstream>
using namespace std;
int main() {
unordered_set<string> dict;
string line;
ifstream file ("dict.dat");
if (file.is_open()) {
while (getline(file, line)) {
dict.insert(line);
}
}
string input;
cout << "Enter name of input file" << endl;
cin >> input;
ifstream inputFile (input);
if (inputFile.is_open()) {
while (getline(inputFile, line)) {
stringstream ss(line);
istream_iterator<string> begin(ss);
istream_iterator<string> end;
string word;
while (begin != end) {
word = *begin++;
transform(word.begin(), word.end(), word.begin(), ::tolower);
if (word.back() == '.' || word.back() == ',' || word.back() == '?' || word.back() == '!') {
word = word.erase(word.length()-1);
}
unordered_set<string>::const_iterator got = dict.find (word);
if ( got == dict.end() )
std::cout << word << endl;
}
}
}
}
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.