HashTable.h is the h file that outlines everything, HashTable.cpp creates it and
ID: 3833516 • Letter: H
Question
HashTable.h is the h file that outlines everything, HashTable.cpp creates it and FinalProject.cpp opens the command line argument, takes user input, uses the defined functions and displays results.
You need to build a hash table using two different algorithms for collision resolution – open addressing and chaining. Your hash table should be build from the provided data set called playerData. You also need to evaluate the performance of each algorithm in terms of how many collisions are generated while the table is built and the operations required to retrieve data from the table.
Assignment data
There is a file called playerData that includes every Major League Baseball player on every team since 1985. For example, the first two rows in the file show the header row, which shows what each column includes, and the first row of data. The first row of data is:
1985, ATL, NL, barkele01, 870000, Len, Barker, 1955, USA, 225, 77, R, R
The fields in the header row show:
yearID – year, between 1985 and 2016
teamID – three-letter abbreviation for team
leagueID – league that the team belongs to, either National League (NL) or American League (AL)
playerID – ID for the player, used in the Lahman database where I downloaded data salary – how much money the player made that year
firstName – player’s first name
lastName – player’s last name
birthYear – year the player was born
birthCountry – country where the player was born
weight – player’s weight
height – player’s height
bats – how the player bats, either right (R ) or left (L) handed, or switch (S).
throws – how the player throws, either right (R ) or left (L) handed.
Assignment requirements
Build hash table
You need to process the data provided and build a hash table, where the key for entries in the hash table is the player’s first and last name, concatenated. For example, for the first row in the file, the key would be LenBarker.
For a hash function, you can use the hashSum function discussed in lecture, or any other hash function that you want to try. In your project report, describe the hash function that you use.
Each player should only appear in the hash table one time. Each record in the hash table needs to include the player’s information, including first and last name, birth year and country, height and weight, and batting and throwing. Each record also needs to include a list of all the teams and years that the player was active, and their salary for that year. For example, the hash table entry for Len Barker would be:
First name: Len
Last name: Barker Year born: 1955 Country born: USA Weight: 225
Height: 77
Bats: R
Throws: R
Teams and salary: 1985, ATL, NL, 870000 1986, ATL, NL, 880000 1987, ATL, NL, 890000 1987, ML4, AL, 72500 1988, ATL, NL, 900000
The teams and salary entries are taken from the playerData file for every row where the player’s name appears. Note: you could have multiple players with the same name. You will need some method for verifying the uniqueness of a player. For example, if there were another Len Barker in the data file, you might want to check the height and weight or year born information to determine if it’s the same person.
Set hash table size using command line argument
Use a command line argument to set the size of the hash table. There are 5072 unique players in the playerData file. Run your program multiple times, using 5072 as the minimum hash table size, and increase the size of the hash table up to 15K- 20K. In your project write up, describe the effect that increasing the hash table size had on the number of collisions for both algorithms.
The file data I have uploaded is not this large, but this is the format it will be in. The info must be read in and opened from a comand line argument of playerData.txt The program you write should also be able to handle collisions such as the same player with different years. The file section I have copied may not have collisions, but change some of the data to test that your collision checking works.
Collision analysis
As part of this assignment, you need to evaluate the performance of two collision resolution algorithms – open addressing and chaining. My suggestion for approaching this problem is to build two hash tables as you are reading the data from the file – one hash table resolves collisions using open addressing and the other hash table resolves collisions using chaining.
Collisions during building the table
As you are building the hash table, you need to count the number of collisions that each collision resolution algorithm generates. You also need to count the number of search operations needed to find a location for the record. For example, if you are using chaining and there is a collision and you traverse three records in a linked list to find an open location, then the number of search operations is 3. If you get a collision using open addressing and evaluate 4 entries in the hash table before finding an open location, then the number of search operations is 4.
Once the table is built, display the following information clearly for the user:
Hash table size: <table size>
Collisions using open addressing: <collisions> Collisions using chaining: <chaining collisions>
Search operations using open addressing: <search ops> Search operations using chaining: <search ops chaining>
User input
Your program needs to present the user with the option of looking up a player in the hash table. A suggested menu is as follows:
Query hash table
Quit program
If the user selects Query hash table, they should be asked to input the name of a player. If the player is in the hash table, display the record for that player, including all of the information shown on the previous page for the Len Barker example. If the player is not in the hash table, display “Player not found”, and display the menu again so the user can select another player or quit the program.
Linear searches needed to retrieve data
In addition to displaying player information, your program should also display the number of search operations needed to find the player in the hash table for both open addressing and chaining.
For example, if the hash code for a player is 0, and no other players are stored at that index, then there would be zero searches needed to find the players information. However, if there are 5 players with the same hash code, and the player you need was stored at the end of the linked list for that index using chaining, then the number of searches is 5.
Display the following information, in addition to the player information: Search operations using open addressing: <search ops>
Search operations using chaining: <search ops chaining>
Hints
Your code needs to be separated into multiple files, including a HashTable.h, HashTable.cpp, and FinalProject.cpp.
If you use an array for your hash table, you will need to use dynamic memory to create the array. For the chaining algorithm, you store a dynamically created array of pointers. For the open addressing algorithm, you need a regular dynamic array.
Each player should only be stored in the hash table one time. The list of teams for that player needs to be an item in the player’s record.
Explanation / Answer
The Hash Table
The dictionary will consist of a list of words, separated by whitespace. For convenience, the words will be given in lower case, so you do not need to worry about capitalization issues. You will need to insert them into a hash table that grows dynamically as necessary to hold the dictionary while keeping the load factor low enough. It is up to you to decide how to handle collisions: separate chaining, linear probing, quadratic probing, or rehashing. Your hash table must be implemented as a general-purpose class, although it does not need to be templated.
Hash Functions
Designing a good hash function is something of a black art. We have provided a separate Web page that briefly discusses some hash functions and how they work. However, you don't necessarily have to write a hash function as part of this assignment.
Because we don't have time to cover hash functions in lecture, and because of the limited amount of time you have to work on the program, we have provided a hash function for you. Actually, we have provided several for you to choose from, together with a header file that you can #include so that they are easy to use.
If you are short on time, we suggest that you use hashStringCRC (with a prime table size) or hashStringBUZ as your hash function. However, if you have more time, we suggest that you experiment with several different hash functions to find out which works best (in terms of the collision statistics).
All of the hash functions have descriptive comments in the source file. Before you choose a function, be sure to read the comments (for example, some functions work very badly with certain table sizes).
You are not required to use one of our hash functions. If you want to experiment with writing your own, please feel free. However, you should test your function thoroughly so that you can be sure that it gives you good collision statistics in a wide variety of conditions.
Hash Table Statistics
To help you understand how your hashing code works, you should track and report the following statistics:
Note that all but the first of these statistics will need to be reinitialized whenever you expand the table. After you read the dictionary, you should report the above statistics to cerr. If you wind up with a collision chain longer than about 20, there is something seriously wrong with your hash function or your collision method, and points will be deducted. (This means that linear probing is probably inappropriate.)
Spell Checking
Once the dictionary has been created, your program will read a list of words from standard input. If a word is found in the dictionary, your program should produce no output. Otherwise, you should generate suggested corrections and write them, together with the original word (converted to lowercase), as a single output line. For example, suppose the input word was "Wird". The output might be:
Unlike the dictionary, the words input to your program may be in any case. You can convert a string to lower case by including the ctype.h header file and using the isupper and tolower functions:
Generating Corrections
The easiest way to generate corrections in a spell checker is a trial-and-error method. If we assume that the misspelled word contains only a single error, we can try all possible corrections and look each up in the dictionary.
Traditionally, spelling checkers have looked for four possible errors: a wrong letter ("wird"), an inserted letter ("woprd"), a deleted letter ("wrd"), or a pair of adjacent transposed letters ("wrod"). To simplify this assignment, you will only need to deal with the first possibility, a wrong letter. When a word isn't found in the dictionary, you will need to look up all variants that can be generated by changing one letter. For example, given "wird," you should look up "aird", "bird", "cird", etc. through "zird", then "ward", "wbrd", "wcrd" through "wzrd", and so forth. Whenever you find a match in the dictionary, you should add it to your output line.
Input Format
Both the dictionary and the file to be spell-checked consist of arbitrary-length words separated by whitespace. The easy way to represent them is as C++ strings. You can then easily read them in and manipulate them using something like:
When used with a string, the >> operator will skip over any whitespace and then grab the next string of non-whitespace characters -- exactly what you need.
For convenience, neither the dictionary nor the input file will contain punctuation. If you would like to test your spelling checker on a "real" input file (such as your README), you can remove the punctuation with the tr program. For example, you could do:
(You may find it instructive to study the tr manual page to learn how the above command works.)
If you have a file that has already been cleaned up (so it only contains alphabetics and whitespace), you could do:
You can also just type directly to stdin:
In that case, you'll need to type control-D at the end of your input to generate an EOF.
Sample Dictionaries
You may wish to create a very small sample dictionary of your own for initial testing. A slightly larger dictionary of 341 words should help you to get most of your bugs out. When you're fairly confident, you can try your luck with over 34,000 words in an all-lowercase version of the ispell dictionary.
Output Format
The spell-check program should produce one line of statistical output on cerr, and zero or more lines of correction output on cout.
The statistical output should be in the format:
n expansions, load factor f, n collisions, longest chain n
where n is an integer and f is a floating-point (double) number.
Each line in the correction output should consist of the incorrect word, followed by a colon and zero or more corrections, separated by spaces. There should be exactly one space after the colon (unless there are no corrections), and there should be no space at the end of the line. The following are examples of valid output lines:
If every word in the input is found in the dictionary, the spell checker should produce no output on cout.
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.