Question
Recall that in separate chaining, we create a linked list (or array) at each position of the hash table. When multiple items collide at that position, we store them in that list/array. Since to find an item, we now have to perform a linear search through that "bucket", it's important to bound how many items are likely to be stored in any one bucket.
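For concreteness, here is a minimal Python sketch of a separate-chaining table (the class and method names are purely illustrative and not part of the problem); note that lookup does a linear scan of the chosen bucket, which is exactly why bucket sizes matter:

    # Minimal separate-chaining hash table sketch (illustrative only).
    class ChainedHashTable:
        def __init__(self, m):
            self.m = m
            self.buckets = [[] for _ in range(m)]   # one chain per position

        def insert(self, key, value):
            bucket = self.buckets[hash(key) % self.m]
            for i, (k, _) in enumerate(bucket):     # replace if key already present
                if k == key:
                    bucket[i] = (key, value)
                    return
            bucket.append((key, value))

        def lookup(self, key):
            bucket = self.buckets[hash(key) % self.m]
            for k, v in bucket:                     # linear scan through the chain
                if k == key:
                    return v
            return None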
For the entire first problem, we assume that the load factor is 1, i.e., we are placing m items in m hash buckets. (Obviously, the guarantees will be better if the load factor is lower, but this way, we will avoid having an extra parameter you have to deal with.) Prove the following:
The probability that any hash bucket contains more than ln(m) items goes to 0 as m goes to infinity. (As a result, for large hash tables, all hash operations are very likely to run in time O(log m).)
[With a somewhat more complicated analysis, one can prove an upper bound of O(log m/log log m), and with more work also a lower bound of Ω(log m/log log m). But you don't have to prove that.]
Guideline:
As an outline of a proof that you might want to follow (though you don't have to):
As we discussed in class, you can/should assume that the hash function maps each item independently to a uniformly random position between 0 and m-1 (where m is the hash table's size).
First, focus on just one hash bucket i.
If bucket i ends up with at least ln m keys, then there must be some set (let's call it S) of exactly ln m keys that end up in it.
How many different sets S of ln m keys are there?
For any one set S of exactly ln m keys, what is the probability that they all end up in bucket i (and the others end up in bucket i or somewhere else)?
To upper bound the probability that at least one of these sets ends up in bucket i, you can use the "Union Bound": It says that the probability of the union of events is no more than the sum of probabilities of the individual events. In particular, it implies that the probability that at least one of the sets S ends up in bucket i is at most the sum of their individual probabilities.
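In symbols (writing k = ln m for short and treating it as an integer for convenience; this shorthand is not part of the original problem, and this is just one way to write the step), the last three hints become:

\[
\#\{\text{sets } S\} = \binom{m}{k},
\qquad
\Pr[\text{all keys of a fixed } S \text{ hash to bucket } i] = \frac{1}{m^{k}},
\]
\[
\Pr[\text{bucket } i \text{ receives at least } k \text{ keys}]
\;\le\; \binom{m}{k}\cdot\frac{1}{m^{k}}.
\]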
You will probably end up with some Binomial coefficients somewhere. A useful formula here is that
\[
\binom{n}{k} \;\le\; \frac{n^{k}}{k!}.
\]
Now the formula should simplify nicely for a single bucket i.
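For example, applying that estimate with n = m and k = ln m (the same shorthand as above), the single-bucket bound collapses to:

\[
\binom{m}{k}\cdot\frac{1}{m^{k}}
\;\le\; \frac{m^{k}}{k!}\cdot\frac{1}{m^{k}}
\;=\; \frac{1}{k!}.
\]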
To go from a single bucket to all buckets, you can again use the Union Bound, which says that the probability that at least one bucket has more than ln m keys is at most the sum of the probabilities for the individual buckets.
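With the single-bucket bound 1/k! from above, this second union bound reads, in the same notation:

\[
\Pr[\text{some bucket receives at least } k \text{ keys}] \;\le\; \frac{m}{k!}.
\]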
Finally, you may want to use the following consequence of Stirling's Approximation:
\[
n! \;\ge\; \left(\frac{n}{e}\right)^{n}
\]
for all n, where e = 2.71828... is Euler's number.
If all has gone well, you can now take limits as m goes to infinity. Alternatively, consider whether your numerator or denominator grows more quickly asymptotically.
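For instance, combining the Stirling estimate with the bound m/k! and substituting k = ln m (still the shorthand introduced above), one possible way the calculation finishes is:

\[
\frac{m}{k!} \;\le\; m\left(\frac{e}{k}\right)^{k}
\;=\; m \cdot e^{\,\ln m\,(1-\ln\ln m)}
\;=\; m^{\,2-\ln\ln m} \;\to\; 0 \quad \text{as } m \to \infty,
\]

since ln ln m → ∞ makes the exponent arbitrarily negative (the denominator wins).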
You may also want to look at the analysis of Bloom Filters in the lecture notes, which contains some useful ideas for analyzing this type of scenario.
Explanation / Answer
Assume, as stated in the outline, that the hash function places each of the m keys independently and uniformly at random into one of the m buckets, and write k = ln m (treated as an integer for convenience).

Fix one bucket i. For any fixed set S of k keys, the probability that every key in S hashes to bucket i is (1/m)^k, and there are (m choose k) possible choices of S. If bucket i ends up with at least k keys, then some such set S must have landed entirely in bucket i, so by the union bound the probability of this event is at most (m choose k)·(1/m)^k ≤ (m^k/k!)·(1/m)^k = 1/k!, using the binomial-coefficient estimate from the outline.

Applying the union bound a second time, now over all m buckets, the probability that any bucket ends up with at least k = ln m keys is at most m/k!. By the Stirling consequence k! ≥ (k/e)^k, this is at most m·(e/ln m)^{ln m}. Since (e/ln m)^{ln m} = e^{(ln m)(1 - ln ln m)} = m^{1 - ln ln m}, the whole bound equals m^{2 - ln ln m}, which tends to 0 as m → ∞ because ln ln m → ∞ and the exponent becomes arbitrarily negative.

Therefore, with probability approaching 1 as m grows, no bucket contains more than ln m items, every chain has length O(ln m), and all hash operations on a table with load factor 1 run in O(log m) time. (A sharper analysis shows the longest chain is in fact Θ(log m / log log m), but that stronger bound is not required here.)
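As an informal sanity check (not part of the proof), the following short Python simulation, with illustrative function names, throws m keys into m uniformly random buckets and reports the longest chain next to ln m:

    # Empirical check: place m keys into m buckets uniformly at random and
    # compare the longest chain to ln m. Illustrative sketch only.
    import math
    import random

    def longest_chain(m, trials=20):
        # Largest bucket size observed over several independent trials.
        worst = 0
        for _ in range(trials):
            counts = [0] * m
            for _ in range(m):                 # m keys, uniform independent buckets
                counts[random.randrange(m)] += 1
            worst = max(worst, max(counts))
        return worst

    for m in (10**3, 10**4, 10**5):
        print(f"m = {m:>6}: longest chain = {longest_chain(m)}, ln m = {math.log(m):.1f}")

On typical runs the longest chain stays at or below ln m, and falls further below it as m grows, which is consistent with (though of course not a proof of) the bound above.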