

Question

A. Explain the Collision term in a Hash Table.

B. Use the following hash function to build a hash table of array size 12:

    hashFunction(s)
        s = s.toUpperCase()
        hash = (((s.charAt(1) + s.charAt(0)) / 2) - 'A') MOD 12
        return hash

Use the hash function and the table below to insert the following strings in the hash table:

Apple, Banana, Jack, Kite, Motorcycle, Book, Computer, Data, Information, Blue, Green, Red, Yellow, Orange, Car.

    Char:    A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
    Decimal: 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90

C. What collision handling technique did you use to deal with collisions?

D. What is the name of the other collision handling technique?

Explanation / Answer

A.

The term collision is used in two slightly different senses in theoretical computer science and telecommunications. In computer science, it refers to a situation where a function maps two distinct inputs to the same output. In telecommunications, a collision occurs when two nodes of a network attempt to transmit at the same time.

A collision or clash occurs when two different inputs to a function, typically one used to compress large data items into a smaller or fixed size, produce the same output, called (depending on the application) a hash value, checksum, fingerprint, or digest.

Collisions are unavoidable whenever members of a very large set (such as all possible person names, or all possible computer files) are mapped to a relatively short bit string. This is merely an instance of the pigeonhole principle.
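As a small illustration (a hedged sketch in Java; the compress function and class name are ours, not part of the question), two distinct strings are forced into the same bucket once the output range is restricted to 12 slots:

    // Sketch: compressing keys into only 12 buckets forces collisions
    // (pigeonhole principle). compress() is a made-up helper.
    public class CollisionDemo {
        static int compress(String s) {
            // Map a string to one of 12 buckets using its first character.
            return (s.toUpperCase().charAt(0) - 'A') % 12;
        }

        public static void main(String[] args) {
            // 'A' - 'A' = 0 and 'M' - 'A' = 12, and 12 % 12 = 0.
            System.out.println(compress("Apple"));      // 0
            System.out.println(compress("Motorcycle")); // 0 -- a collision
        }
    }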

C.

Hash Table Collision Handling

There are two basic methods: separate chaining and open addressing.

Separate Chaining

This method hangs an additional data structure off of the buckets: for example, the bucket array becomes an array of linked lists. To find an item, we first go to the bucket and then compare keys along the chain. This is a popular method, and if linked lists are used the hash table never fills up.
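A minimal separate-chaining sketch in Java (class and method names are ours; this is an illustration, not the question's required code):

    import java.util.LinkedList;

    // Each bucket holds a linked list of (key, value) entries.
    public class ChainedHashTable<K, V> {
        private static class Entry<K, V> {
            final K key;
            V value;
            Entry(K key, V value) { this.key = key; this.value = value; }
        }

        private final LinkedList<Entry<K, V>>[] buckets;

        @SuppressWarnings("unchecked")
        public ChainedHashTable(int capacity) {
            buckets = new LinkedList[capacity];
            for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
        }

        private int index(K key) {
            return Math.floorMod(key.hashCode(), buckets.length);
        }

        // Insert or update: go to the bucket, then compare keys along the chain.
        public void put(K key, V value) {
            for (Entry<K, V> e : buckets[index(key)]) {
                if (e.key.equals(key)) { e.value = value; return; }
            }
            buckets[index(key)].add(new Entry<>(key, value));
        }

        // Average cost is O(n/N): one bucket lookup plus a short chain scan.
        public V get(K key) {
            for (Entry<K, V> e : buckets[index(key)]) {
                if (e.key.equals(key)) return e.value;
            }
            return null; // not found
        }
    }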


The load factor is f = n/N, where n is the number of items stored in the hash table and N is the number of buckets. We would like the load factor to be less than 1.

The cost of get(k) is on average O(n/N).

Open Addressing

The problem with separate chaining is that the data structure can grow without bound. Sometimes this is not appropriate because storage is finite, for example in embedded processors.

Open addressing does not introduce a new structure. If a collision occurs, we look for availability in the next spot generated by an algorithm. Open addressing is generally used where storage space is at a premium, e.g., in embedded processors. Note that open addressing is not necessarily faster than separate chaining.

Methods for Open Addressing:

Linear Probing:

We try to insert an item (k, e) into bucket A[i] and find it full, so the next bucket we try is:

A[(i + 1) mod N]

then A[(i + 2) mod N], and so on.

For example, with 11 buckets the probes advance one slot at a time through the array; the probing is linear.

Note that, unlike separate chaining, the hash table can fill up.

There is also the question of what to do when we remove an item. We could repair the array A, but this is too costly. Instead we mark the bucket as available/deactivated. A subsequent findElement(k) then skips over available/deactivated buckets, while insertItem(k, e) may insert into an available/deactivated bucket.

Clustering slows down searches.
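A hedged linear-probing sketch in Java with available/deactivated markers (all names are ours; a production version would also resize, and would keep probing past a reusable slot to guard against duplicate keys):

    // Open addressing with linear probing and "available" (tombstone) markers.
    public class LinearProbingTable<K, V> {
        private static final Object AVAILABLE = new Object(); // deactivated marker
        private final Object[] keys;
        private final Object[] values;
        private final int N;

        public LinearProbingTable(int capacity) {
            N = capacity;
            keys = new Object[N];
            values = new Object[N];
        }

        private int hash(K key) { return Math.floorMod(key.hashCode(), N); }

        // Probe A[i], A[(i + 1) mod N], A[(i + 2) mod N], ...
        public void insertItem(K key, V value) {
            int i = hash(key);
            for (int j = 0; j < N; j++) {
                int slot = (i + j) % N;
                // Empty and deactivated buckets may both receive the new item.
                if (keys[slot] == null || keys[slot] == AVAILABLE) {
                    keys[slot] = key;
                    values[slot] = value;
                    return;
                }
            }
            throw new IllegalStateException("table full"); // open addressing can fill up
        }

        @SuppressWarnings("unchecked")
        public V findElement(K key) {
            int i = hash(key);
            for (int j = 0; j < N; j++) {
                int slot = (i + j) % N;
                if (keys[slot] == null) return null;   // never used: stop probing
                if (keys[slot] == AVAILABLE) continue; // skip deactivated bucket
                if (keys[slot].equals(key)) return (V) values[slot];
            }
            return null;
        }

        // Instead of repairing the array, mark the bucket available/deactivated.
        public void removeElement(K key) {
            int i = hash(key);
            for (int j = 0; j < N; j++) {
                int slot = (i + j) % N;
                if (keys[slot] == null) return;
                if (keys[slot] != AVAILABLE && keys[slot].equals(key)) {
                    keys[slot] = AVAILABLE;
                    values[slot] = null;
                    return;
                }
            }
        }
    }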

Quadratic Probing:

A[(i + f(j)) mod N] where j = 0, 1, 2, ... and f(j) = j²

This helps avoid primary clustering, although secondary clustering can still occur. We can imagine a more complicated function for f.
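A sketch of just the quadratic probe sequence (the bucket count N = 11 and the starting index i = 3 are our assumptions):

    public class QuadraticProbe {
        // f(j) = j^2, so we probe A[(i + j*j) mod N] for j = 0, 1, 2, ...
        static int probe(int i, int j, int N) {
            return Math.floorMod(i + j * j, N);
        }

        public static void main(String[] args) {
            // With i = 3 and N = 11 the probe sequence is 3, 4, 7, 1, 8, ...
            for (int j = 0; j < 5; j++) System.out.println(probe(3, j, 11));
        }
    }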

Double Hashing:

Use a second hash function h'.

A[(i + f(j)) mod N] where f(j) = j*h'(k), and h'(k) should never evaluate to zero. Example: h'(k) = q - (k mod q) for some prime q < N. Note that still i = h(k).
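A sketch of the double-hashing probe sequence (the choice q = 7, the key k = 25, and the primary hash h(k) = k mod N are our assumptions):

    public class DoubleHashProbe {
        // h'(k) = q - (k mod q) is always in 1..q, so it never evaluates to zero.
        static int hPrime(int k, int q) {
            return q - Math.floorMod(k, q);
        }

        // Probe A[(i + j*h'(k)) mod N] for j = 0, 1, 2, ..., with i = h(k).
        static int probe(int i, int j, int k, int q, int N) {
            return Math.floorMod(i + j * hPrime(k, q), N);
        }

        public static void main(String[] args) {
            int N = 11, q = 7, k = 25;
            int i = Math.floorMod(k, N); // assumed primary hash h(k) = k mod N, so i = 3
            // h'(25) = 7 - (25 mod 7) = 3, so the probes step by 3: 3, 6, 9, 1, ...
            for (int j = 0; j < 4; j++) System.out.println(probe(i, j, k, q, N));
        }
    }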

D.

Load factor

The performance of most collision resolution methods does not depend directly on the number n of stored entries, but depends strongly on the table's load factor, the ratio n/s between n and the size s of its bucket array. With a good hash function, the average lookup cost is nearly constant as the load factor increases from 0 up to 0.7 or so. Beyond that point, the probability of collisions and the cost of handling them both increase.
On the other hand, as the load factor approaches zero, the size of the hash table increases with little improvement in the search cost, and memory is wasted.
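A small sketch of a resize check driven by the load factor (the 0.7 threshold follows the figure above; class and method names are ours):

    public class LoadFactorCheck {
        // Resize (and rehash) once the load factor n/s crosses the threshold.
        static boolean shouldResize(int n, int s) {
            return (double) n / s > 0.7;
        }

        public static void main(String[] args) {
            System.out.println(shouldResize(8, 12)); // 8/12 ~ 0.67 -> false
            System.out.println(shouldResize(9, 12)); // 9/12 = 0.75 -> true
        }
    }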

Separate chaining with other structures

Instead of a list, one can use any other data structure that supports the required operations. By using a self-balancing tree, for example, the theoretical worst-case time of a hash table can be brought down to O(log n) rather than O(n). However, this approach is only worth the trouble and extra memory cost if long delays must be avoided at all costs (e.g., in a real-time application), or if one expects to have many entries hashed to the same slot (e.g., if one expects extremely non-uniform or even malicious key distributions).
The variant called an array hash table uses a dynamic array to store all the entries that hash to the same slot. Each newly inserted entry gets appended to the end of the dynamic array assigned to the slot. The dynamic array is resized in an exact-fit manner, meaning it is grown only by as many bytes as needed. Alternative techniques, such as growing the array by block sizes or pages, were found to improve insertion performance, but at a cost in space. This variation makes more efficient use of CPU caching and the TLB (translation lookaside buffer), since slot entries are stored in sequential memory positions. It also dispenses with the next pointers required by linked lists, which saves space. Despite frequent array resizing, the space overheads incurred by the operating system, such as memory fragmentation, were found to be small.
An elaboration on this approach is the so-called dynamic perfect hashing, where a bucket that contains k entries is organized as a perfect hash table with k² slots. While it uses more memory (n² slots for n entries, in the worst case), this variant has guaranteed constant worst-case lookup time, and low amortized time for insertion.

Robin Hood hashing

One interesting variation on double-hashing collision resolution is Robin Hood hashing. The idea is that a key already inserted may be displaced by a new key if its probe count is larger than that of the key at the current position. The net effect is that worst-case search times in the table are reduced. This is similar to Knuth's ordered hash tables, except that the criterion for bumping a key does not depend on a direct relationship between the keys. Since both the worst case and the variation in the number of probes are reduced dramatically, an interesting variation is to probe the table starting at the expected successful probe value and then expand from that position in both directions. External Robin Hood hashing is an extension of this algorithm in which the table is stored in an external file and each table position corresponds to a fixed-size page or bucket with B records.
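A hedged sketch of Robin Hood insertion over linear probing (integer keys only, no deletion or resizing; all names and the table size are ours):

    public class RobinHoodTable {
        private final int N = 16;
        private final Integer[] keys = new Integer[N];

        private int home(int key) { return Math.floorMod(key, N); }

        // Probe count of the key stored at slot: its distance from its home bucket.
        private int probeCount(int slot) {
            return Math.floorMod(slot - home(keys[slot]), N);
        }

        public void insert(int key) {
            int slot = home(key);
            int dist = 0; // probe count of the key we are carrying
            for (int step = 0; step < N; step++, slot = (slot + 1) % N, dist++) {
                if (keys[slot] == null) { keys[slot] = key; return; }
                if (keys[slot] == key) return; // already present
                int residentDist = probeCount(slot);
                // Displace the resident if its probe count is smaller than ours.
                if (residentDist < dist) {
                    int displaced = keys[slot];
                    keys[slot] = key;
                    key = displaced; // carry the displaced key onward
                    dist = residentDist;
                }
            }
            throw new IllegalStateException("table full");
        }
    }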

Cuckoo hashing

Another alternative open-addressing solution is cuckoo hashing, which ensures constant lookup time in the worst case, and constant amortized time for insertions and deletions. It uses two or more hash functions, which means any key/value pair could be in two or more locations. For lookup, the first hash function is used; if the key/value is not found, then the second hash function is used, and so on. If a collision happens during insertion, then the key is re-hashed with the second hash function to map it to another bucket. If all hash functions are used and there is still a collision, then the key it collided with is removed to make space for the new key, and the old key is re-hashed with one of the other hash functions, which maps it to another bucket. If that location also results in a collision, then the process repeats until there is no collision or the process traverses all the buckets, at which point the table is resized. By combining multiple hash functions with multiple cells per bucket, very high space utilisation can be achieved.
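A hedged cuckoo-hashing sketch with two tables and two toy hash functions (real implementations pick independent hash functions and rehash or resize on failure; here we just report the cycle, and all names are ours):

    public class CuckooTable {
        private final int N = 11;
        private final Integer[] t1 = new Integer[N];
        private final Integer[] t2 = new Integer[N];

        private int h1(int k) { return Math.floorMod(k, N); }
        private int h2(int k) { return Math.floorMod(k / N, N); } // toy second hash

        // Lookup checks at most one slot per table: constant worst-case time.
        public boolean contains(int k) {
            return Integer.valueOf(k).equals(t1[h1(k)])
                || Integer.valueOf(k).equals(t2[h2(k)]);
        }

        public void insert(int k) {
            if (contains(k)) return;
            for (int kicks = 0; kicks < 2 * N; kicks++) {
                // Place k in table 1, evicting any occupant.
                Integer evicted = t1[h1(k)];
                t1[h1(k)] = k;
                if (evicted == null) return;
                // The evicted key moves to its table-2 slot, possibly evicting again.
                k = evicted;
                evicted = t2[h2(k)];
                t2[h2(k)] = k;
                if (evicted == null) return;
                k = evicted;
            }
            // A displacement cycle: a real table would rehash or resize here.
            throw new IllegalStateException("cuckoo cycle detected");
        }
    }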

Hopscotch hashing

Another alternative open-addressing solution is hopscotch hashing, which combines the approaches of cuckoo hashing and linear probing, yet seems in general to avoid their limitations. In particular it works well even when the load factor grows beyond 0.9. The algorithm is well suited for implementing a resizable concurrent hash table.

The hopscotch hashing algorithm works by defining a neighborhood of buckets near the original hashed bucket, where a given entry is always found. Thus, search is limited to the number of entries in this neighborhood, which is logarithmic in the worst case, constant on average, and with proper alignment of the neighborhood typically requires one cache miss. When inserting an entry, one first attempts to add it to a bucket in the neighborhood. However, if all buckets in this neighborhood are occupied, the algorithm traverses buckets in sequence until an open slot (an unoccupied bucket) is found (as in linear probing). At that point, since the empty bucket is outside the neighborhood, items are repeatedly displaced in a sequence of hops (in a manner reminiscent of cuckoo hashing, but with the difference that in this case the empty slot is being moved into the neighborhood, instead of items being moved out with the hope of eventually finding an empty slot). Each hop brings the open slot closer to the original neighborhood, without invalidating the neighborhood property of any of the buckets along the way. In the end the open slot has been moved into the neighborhood, and the entry being inserted can be added to it.
