Functions are important mechanism for encapsulation and code reuse. They allow y
ID: 3739787 • Letter: F
Question
Functions are important mechanism for encapsulation and code reuse. They allow you to abstract away the messy details of how things are implemented so that you can concentrate on the algorithm you are working on. In this homework, you will practice writing functions by implementing an interesting algorithm called a "Bloom Filter". There are two problems in this homework. Related C++ HackerRank Problems Introduction -> Functions Bloom Filter From https://en.wikipedia.org/wiki/Bloom_filter: "A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed (though this can be addressed with a "counting" filter); the more elements that are added to the set, the larger the probability of false positives." The Bloom filter itself is stored as an array of m Boolean values, which all start out as false. To add an object to the filter (in our case strings), we compute k hash functions for the string, and set the bit at the hash indices to true. To test if a string is in the filter, we compute the k hash functions for that string and check to see if the values stored at those locations in the filter are true or false. If any of them are false, then the string is definitely not in the set, but if they are all true then the string is probably in the set. Fairly rigorous analysis has been done on the error rates of Bloom filters (some of which is described in the Wikipedia article), but a quick rule of thumb is that using 10 bits per item stored in the filter and 7 hashes (k=7) will result in a false positive rate of about 1%. The final functionality needed to implement the Bloom filter is the hash functions themselves. For a String s with letters s0…s{n-1}, a positive integer p, and a Bloom filter of size m, we can define a hash h_p(s) as: $hp(s) = (p^0 s0 + p^1 s1 … + p^{n-1} s{n-1}) MOD m$ where s_i is the ASCII value of the letter (in other words, you just treat the char as an integer in the calculation). Recall that the % operator computes the MOD function in C++. Also note that to prevent overflow, you should apply the % operator to your result after every operation. In addition, you can't use the pow function to get the powers of p, since the powers of p themselves overflow. To make 7 different hash functions, we simply use 7 different p values. Typcially, prime numbers work well, so we will use the values: 31, 37, 41, 43, 47, 53, and 59.
Problem 3a : Bloom Filter Hash Function Write a function that will compute the Bloom filter hash function defined above (hp) for an input string, an input value of p, and an input value of $m$. Your function must take three parameters: a string s, a integer value for p, and an integer filter size m. It must return an integer that is the value of the hash function hp(s). The main function of your program should get the three inputs from the user, then call the hash function and print its value. The inputs are (1) a line of text that is the input string to be hashed, (2) the p value for the hash function, and (3) the size of the filter, m. Your program must have a second function to do the hash calculation. (The hash calculation may not be in main.)
Explanation / Answer
here is the code for the given problem
#include <iostream>
#include <string>
#include <vector>
#include <iterator>
typedef unsigned int (*HashFunction)(std::string);
class BloomFilter {
unsigned int NumofCells;
unsigned int numofFunctioons;
bool* cell;
std::vector<HashFunction> hashFunctions;
public:
BloomFilter(unsigned int numbCells, std::vector<HashFunction> funcs) : NumofCells(numbCells), hashFunctions(funcs) {
cell = (bool*)calloc(numbCells, sizeof(bool));
}
void add(std::string str) {
for (std::vector<HashFunction>::iterator iter = hashFunctions.begin(); iter != hashFunctions.end(); iter++) {
cell[(*iter)(str) % NumofCells] = true;
}
}
bool search(std::string str) {
bool strInSet = true;
for (std::vector<HashFunction>::iterator iter = hashFunctions.begin(); iter != hashFunctions.end(); iter++) {
if (cell[(*iter)(str) % NumofCells] == false) {
strInSet = false;
break;
}
}
return strInSet;
}
~BloomFilter() {
free(cell);
cell = NULL;
}
};
unsigned int djb2(std::string str) {
unsigned int hash = 5381;
for (std::string::iterator iter = str.begin(); iter != str.end(); iter++) {
hash = ((hash << 5) + hash) + *iter;
}
return hash;
}
unsigned int sdbm(std::string str) {
unsigned int hash = 0;
for (std::string::iterator iter = str.begin(); iter != str.end(); iter++) {
hash = ((hash << 6) + (hash << 16) - hash) + *iter;
}
return hash;
}
int main() {
std::vector<HashFunction> hashFunctions;
hashFunctions.push_back(djb2);
hashFunctions.push_back(sdbm);
BloomFilter bf(1024, hashFunctions);
std::vector<std::string> setOfStrings({
"Hello World!",
"sweet potato",
"C++",
"alpha",
"beta",
"gamma"
});
std::cout << "Bloom filter created." << std::endl;
std::cout << "Bloom filter tests for the following set of strings:" << std::endl;
for (std::vector<std::string>::iterator iter = setOfStrings.begin(); iter != setOfStrings.end(); iter++) {
bf.add(*iter);
std::cout << " " + *iter << std::endl;
}
std::vector<std::string> testSetOfStrings({
"Hello World!",
"sweet potato",
"C++",
"alpha",
"beta",
"gamma",
"delta",
"where am i?",
"foo",
"bar"
});
std::cout << "Testing set inclusion for the following strings:" << std::endl;
std::cout << std::boolalpha;
for (std::vector<std::string>::iterator iter = testSetOfStrings.begin(); iter != testSetOfStrings.end(); iter++) {
std::cout << " " + *iter + ": " << bf.search(*iter) << std::endl;
}
getchar();
}
Related Questions
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.