Write a program to read protein sequences from a file, count them, and allow for
Question
Write a program to read protein sequences from a file, count them, and allow for retrieval of any single protein sequence.
Read in proteins and store them in a hash table. You do not know what the proteins are ahead of time (pretend that the input dataset may change), so you will have to resolve collisions.
The input file is very large, but you happen to know that each protein is fewer than 30 amino acids long, so you can store each one in a 30-character string. You also know that the file contains many copies of fewer than 20 unique proteins, so you can use a data array with 40 elements, which is twice as much space as you need, to reduce the number of collisions. Each element will contain the key value itself (the protein) and the number of times it occurs in the input file (the count). Use the following data structure:
struct arrayelement {
    char protein[30];
    int count;
};
arrayelement proteins[40];
The hash function is:
h(key) = ( first_letter_of_key + (2 * last_letter_of_key) ) % 40
where, A = 0, B = 1, …, Z = 25.
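As a rough illustration of the formula above, here is a minimal C++ sketch of the hash function. The names `TABLE_SIZE`, `letter_value`, and `hash_protein` are hypothetical (they do not appear in the assignment), and the sketch assumes every key is a non-empty string of uppercase letters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

const int TABLE_SIZE = 40;  // hypothetical name for the 40-element table size

// Map 'A'..'Z' to 0..25, as the assignment specifies.
// Assumption: the input contains only uppercase letters.
int letter_value(char c) {
    assert(c >= 'A' && c <= 'Z');
    return c - 'A';
}

// h(key) = (first_letter_of_key + 2 * last_letter_of_key) % 40
int hash_protein(const char *key) {
    std::size_t len = std::strlen(key);
    assert(len > 0);  // an empty key has no first or last letter
    return (letter_value(key[0]) + 2 * letter_value(key[len - 1])) % TABLE_SIZE;
}
```

For example, AWGKKKTKTQFQFPTADANCDCDD starts with A (0) and ends with D (3), so it hashes to (0 + 2 * 3) % 40 = 6. BIKFPLVHANQHVDNSVRWGIKDW starts with B (1) and ends with W (22), so it hashes to (1 + 44) % 40 = 5.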
Generate output of the form:
Protein Count
BIKFPLVHANQHVDNSVRWGIKDW 5929
AWGKKKTKTQFQFPTADANCDCDD 7865
Etc for all of them…
Please enter a sequence: AWGKKKTKTQFQFPTADANCDCDD 7865 FOUND
Please enter a sequence: LADYGAGABORNTHISWAY NOT FOUND
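One possible way to produce the report and answer the lookup prompts is sketched below in C++. It assumes the table was filled with linear probing and that `count == 0` marks an empty slot. The names `TABLE_SIZE`, `hash_protein`, `find_count`, `print_table`, and `report` are hypothetical helpers, not part of the assignment.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

const int TABLE_SIZE = 40;  // hypothetical name for the 40-element table size

struct arrayelement {
    char protein[30];
    int count;
};

// Same hash as the assignment; assumes a non-empty, uppercase key.
int hash_protein(const char *key) {
    assert(key[0] != '\0');
    int first = key[0] - 'A';
    int last = key[std::strlen(key) - 1] - 'A';
    return (first + 2 * last) % TABLE_SIZE;
}

// Return the stored count for key, or 0 if it is not in the table.
// Assumes count == 0 marks an empty slot and insertion used linear probing.
int find_count(const arrayelement table[], const char *key) {
    int index = hash_protein(key);
    for (int probes = 0; probes < TABLE_SIZE; ++probes) {
        if (table[index].count == 0) return 0;  // empty slot: key was never stored
        if (std::strcmp(table[index].protein, key) == 0) return table[index].count;
        index = (index + 1) % TABLE_SIZE;       // keep probing, wrapping at the end
    }
    return 0;  // every slot checked without a match
}

// Print every occupied slot in the "Protein Count" layout.
void print_table(const arrayelement table[]) {
    std::printf("Protein Count\n");
    for (int i = 0; i < TABLE_SIZE; ++i)
        if (table[i].count > 0)
            std::printf("%s %d\n", table[i].protein, table[i].count);
}

// Answer one query in the "<count> FOUND" / "NOT FOUND" style.
void report(const arrayelement table[], const char *key) {
    int count = find_count(table, key);
    if (count > 0) std::printf("%d FOUND\n", count);
    else std::printf("NOT FOUND\n");
}
```

The lookup stops at the first empty slot because, with linear probing and no deletions, a stored key can never sit beyond an empty slot on its probe path.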
// The file processing algorithm
While(there are proteins)
    Read in a protein
    Hash the initial index into the proteins table
    While(forever)
        If(found key in table)
            Increment count
            Break;
        If(found empty spot in table)
            Copy key into table
            Increment count
            Break;
        Increment index; // collision! Try the next spot!
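The inner loop of the algorithm above is linear probing. A minimal C++ sketch of that step follows, under two assumptions the pseudocode does not state: the index wraps back to 0 after slot 39, and the loop gives up after 40 probes instead of running forever. The names `TABLE_SIZE`, `hash_protein`, and `record_protein` are hypothetical.

```cpp
#include <cassert>
#include <cstring>

const int TABLE_SIZE = 40;  // hypothetical name for the 40-element table size

struct arrayelement {
    char protein[30];
    int count;
};

// Same hash as the assignment; assumes a non-empty, uppercase key.
int hash_protein(const char *key) {
    assert(key[0] != '\0');
    int first = key[0] - 'A';
    int last = key[std::strlen(key) - 1] - 'A';
    return (first + 2 * last) % TABLE_SIZE;
}

// Count one occurrence of key. Returns the slot used, or -1 if the table is full.
// Assumes the table starts zeroed, so count == 0 marks an empty slot.
int record_protein(arrayelement table[], const char *key) {
    int index = hash_protein(key);
    for (int probes = 0; probes < TABLE_SIZE; ++probes) {
        if (std::strcmp(table[index].protein, key) == 0) {  // found key in table
            ++table[index].count;
            return index;
        }
        if (table[index].count == 0) {                      // found empty spot
            std::strncpy(table[index].protein, key, sizeof(table[index].protein) - 1);
            table[index].protein[sizeof(table[index].protein) - 1] = '\0';
            table[index].count = 1;
            return index;
        }
        index = (index + 1) % TABLE_SIZE;  // collision: try the next spot, wrapping around
    }
    return -1;  // all 40 slots hold other keys
}
```

Bounding the loop by the table size is a defensive choice. The assignment promises fewer than 20 unique proteins, so under that guarantee the loop always finds the key or an empty slot well before the limit.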
This is the link to the protein.txt file
http://wserver.flc.losrios.edu/~ross/CISP430S16.SPLWQKFGXWKTGZHS.BOB/proteins.txt
This code should be written in C or C++.
Please include sample output for this code.
Explanation / Answer
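#include <cstdio>
#include <cstring>

struct arrayelement {
    char protein[30];
    int count;
};
arrayelement proteins[40];  // global, so zero-initialized: count == 0 marks an empty spot

// h(key) = ( first_letter_of_key + (2 * last_letter_of_key) ) % 40, where A = 0, ..., Z = 25
int h(const char *key) {
    int first = key[0] - 'A';
    int last = key[strlen(key) - 1] - 'A';
    return (first + 2 * last) % 40;
}

// The file processing program
// "in" is assumed to be the already-opened proteins.txt, one protein per whitespace-separated token
void process(FILE *in) {
    char key[30];
    while (fscanf(in, "%29s", key) == 1) {  // while there are proteins, read in a protein
        int index = h(key);                 // hash the initial index into the table
        while (true) {                      // terminates as long as the table is never full
            if (strcmp(proteins[index].protein, key) == 0) {  // key in table
                proteins[index].count++;
                break;
            }
            if (proteins[index].count == 0) {  // empty spot
                strcpy(proteins[index].protein, key);
                proteins[index].count++;
                break;
            }
            index = (index + 1) % 40;  // collision! Try the next spot!
        }
    }
}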