Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Prompt for the name of an input text file to check. This file will contain a num

ID: 3709881 • Letter: P

Question


Prompt for the name of an input text file to check. This file will contain a number of words.

For this assignment a word is any sequence of one or more characters separated by one or more Spaces or newlines (Sounds like << of a String).

Print out the words that could be misspelled, then print out the number of Words in the Dictionary, number of Words in the File, number of words not in the dictionary.

Here are two files to check against (you may be a few off depending on how you coded):

25021 dictionary words, 28 words in file, 3 misspelled (Your algorithms should come up with the correct answer for this one)

IMPLEMENTATION

Implement with an array (SIZE=1373) of linked lists. Your linked lists should contain the word that was hashed to that array. When you land on a particular array cell (equal to the hash of the word), traverse the linked list until you either find the word, or the nullptr...then add the word. (You can use the STL list if you choose). For your hash function, you will be hashing strings.

To get a hash string you should declare a variable something like this:
hash<string> hashStr;

then when you want to hash a particular string (let's say called string1)

hashStr(string1);

This "function" will produce a long data type (which you should mod by 1373)

You don't have to interpret verb tense, plurals, conjugations etc. All you have to do is to check with each word against the dictionary.

/********************************************************************************/

Extra Credit (5 pts) Ensure you document in your readme.txt:

You could be reading text from a book, so you could have characters that are not alpha. Since all of the words in the dictionary are lowercase, you should convert the first character to lower case (tolower( ) ). Loop through the entire word and remove non-alpha characters. If you include cctype, you will be able to ask if isalnum.   Also note that strings have an erase command which allows you to erase a character that is non-alpha. Use http://www.cplusplus.com/reference/locale/tolower/ (Links to an external site.)Links to an external site. as an example way to convert. A for loop could be useful in converting all characters.

Test against

25021 dictionary words, 78451 words in file, 17729 words misspelled (Note: you may have hundreds more words misspelled depending on which characters you delete in checking the database).

/********************************************************************************/

HASH TABLES

A hash table contains buckets into which an object (data item) can be placed. When a hash function is applied to an object, a hash value is generated. The hash value is used to determine which bucket the object is assigned to.

A bucket is a cluster (or a sub container) that holds a set of data items that hash to the same table location. Obviously, you can not store 25K words in 1373 slots and you need to use some kind of chaining schemes such as linear probing or the second hashing. The size of a bucket is independent from the number of data items you put into the hash. So if you have too many buckets, the hash will not have many collisions but you may waste the storage and you may have to deal with a rather complex hash function and longer keys. If you have too small number of buckets, then you have to deal with frequent collisions. Finding a good bucket number would play an important role in reducing collisions. That's why we usually pick a prime number for the number of bucket. We picked 1373 for the bucket number.

Explanation / Answer

#include <iostream>

#include <fstream>

#include <cctype>

#include <cstring>

#include <string>

#include <iomanip>

#include <ctime>

#include <limits>

#include "HashTable.h"

using namespace std;

// iterator declaration for hash table

typedef HashTable<string>::Iterator iterDec;

// hash table size

const int TABLE_SIZE = 19000;

// strtok delimiters

const char* DELIMITERS = " ,.-':;?()+*/\%$#!"@^&";

// function prototypes

void PrintTableStats(HashTable<string>& hashTable);

int SpellCheck(HashTable<string>& hashTable, string word);

string ToLowerCase(string word);

int main()

{

// declare variables

int result = 0;

string userInput;

string currWord;

clock_t beg; // used to time the hashtable load

clock_t end; // used to time the hashtable load

char response;

ifstream infile file1;

HashTable<string> hashTable(TABLE_SIZE);

// open the dictionary file

infile.open("dict.txt");

// check if the file exists, EXIT if it doesnt

if(infile.fail())

{

cout<<"nn**ERROR - The dictionary file could not be found...n";

exit(1);

}

cerr<<"nLoading dictionary....";

beg = clock(); // start the timer

// get data from file and put into hashtable

while(infile >> currWord)

{

// makes sure duplicate words arent inserted into table

if(!hashTable.Count(currWord))

{

hashTable.Insert(currWord);

}

}

infile.close();

PrintTableStats(hashTable);

end = clock()-beg; // end the timer

cout<<"nnDictionary loaded in "<<

(double)end / ((double)CLOCKS_PER_SEC)<<" secs!";

// creates a line separator

cout<<endl;

cout.fill('-');

cout<<left<<setw(50)<<""<<endl;

do{ // get user input

cout<<"n>> Please enter a name of the file To be checked: ";

getline(cin,userInput);

cout<<endl;

file1.open(userInput);

while(file1 >> splitInput)

{

// split each word from the string into individual words to check if

// they are spelled correctly

char* splitInput = strtok(const_cast<char*>(userInput.c_str()),DELIMITERS);

while(splitInput!=NULL)

{

currWord = splitInput;

currWord = ToLowerCase(currWord);

result += SpellCheck(hashTable,currWord);

splitInput = strtok(NULL,DELIMITERS);

}

}//while

// display results

if(result > 0)

{

cout<<"Number of words spelled incorrectly: "<<result<<endl;

result = 0;

}

// ask for more data

cout<<"nDo you want to enter another sentence? (y/n): ";

cin >> response;

cin.ignore(numeric_limits<streamsize>::max(),' '); // clear the cin buffer

}while(toupper(response)=='Y');

cout<<"nBYE!!n";

return 0;

}// end of main

void PrintTableStats(HashTable<string>& hashTable)

{

int largestBucket = -9999999;

int largestIndex = 0;

int smallestBucket = 9999999;

int smallestIndex = 0;

double numBuckestUsed = 0;

ofstream outfile("OUTPUT_HashTable_Stats_programmingnotes_freeweq_com.txt");

for(int x=0; x < hashTable.TableSize(); ++x)

{

// iterator is used to traverse each hashtable bucket

iterDec it = hashTable.begin(x);

if(!hashTable.IsEmpty(x))

{

if(smallestBucket > hashTable.BucketSize(x))

{

smallestBucket = hashTable.BucketSize(x);

smallestIndex = x;

}

if(largestBucket < hashTable.BucketSize(x))

{

largestBucket = hashTable.BucketSize(x);

largestIndex = x;

}

++numBuckestUsed;

outfile<<"nBucket #"<<x<<": ";

for(int y = 0; y < hashTable.BucketSize(x); ++y)

{

outfile <<" "<< it[y] << endl;

}

}

}

cout<<"Complete!n";

// creates a line separator

cout<<endl;

cout.fill('-');

cout<<left<<setw(50)<<""<<endl;

cout<<"Total dictionary words = "<<hashTable.TotalElems()<<endl

<<"Hash table size = "<<hashTable.TableSize()<<endl

<<"Largest bucket size = "<<largestBucket<< " items at index #"<<largestIndex<<endl

<<"Smallest bucket size = "<<smallestBucket<< " items at index #"<<smallestIndex<<endl

<<"Total buckets used = "<<numBuckestUsed<<endl

<<"Total percent of hash table used = "<<(numBuckestUsed/hashTable.TableSize())*100<<"%"<<endl

<<"Average bucket size = "<<(hashTable.TotalElems()/numBuckestUsed)<<" items";

}// end of PrintTableStats

int SpellCheck(HashTable<string>& hashTable, string word)

{

int result = 0;

int suggestion = 0;

string remove[256];

int numRemove=0;

if(!hashTable.Count(word))

{

++result;

cout<<"** "<<word<<": ";

// alteration & insertion

for(unsigned x = 0; x < word.length(); ++x)

{

string alteration = word;

for(char c = 'a'; c <= 'z'; ++c)

{

//alteration

alteration[x] = c;

if(hashTable.Count(alteration))

{

cout<<alteration<<", ";

remove[numRemove++] = alteration;

++suggestion;

// remove the entry so it isnt displayed multiple times

hashTable.Remove(alteration);

}

//insertion

string insertion = word.substr(0, x) + c + word.substr(x);

if(hashTable.Count(insertion))

{

cout<<insertion<<", ";

remove[numRemove++] = insertion;

++suggestion;

// remove the entry so it isnt displayed multiple times

hashTable.Remove(insertion);

}

}

}

// transposition & deletion

for(unsigned x = 0; x < word.length()-1;++x)

{

// transposition

string transposition = word.substr(0,x) + word[x+1] + word[x] + word.substr(x+2);

if(hashTable.Count(transposition))

{

cout<<transposition<<", ";

remove[numRemove++] = transposition;

++suggestion;

// remove the entry so it isnt displayed multiple times

hashTable.Remove(transposition);

}

// deletion

string deletion = word.substr(0, x)+ word.substr(x + 1);

if(hashTable.Count(deletion))

{

cout<<deletion<<", ";

remove[numRemove++] = deletion;

++suggestion;

// remove the entry so it isnt displayed multiple times

hashTable.Remove(deletion);

}

}

// place the removed items back inside the hash table

while(numRemove>=0)

{

hashTable.Insert(remove[numRemove--]);

}

if(suggestion < 1)

{

cout<<"No spelling suggestion found...";

}

cout<<endl<<endl;

}

return result;

}// end of SpellCheck

string ToLowerCase(string word)

{

for(unsigned x = 0; x < word.length(); ++x)

{

word[x] = tolower(word[x]);

}

return word;

}