Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

From the answers to (When) is hash table lookup O(1)?, I gather that hash tables

ID: 646421 • Letter: F

Question

From the answers to (When) is hash table lookup O(1)?, I gather that hash tables have O(1) worst-case behavior, at least amortized, when the data satisfies certain statistical conditions, and there are techniques to help make these conditions broad. However, from a programmer's perspective, I don't know in advance what my data will be: it often comes from some external source. And I rarely have all the data at once: often insertions and deletions happen at a rate that's not far below the rate of lookups, so preprocessing the data to fine-tune the hash function is out. So, taking a step out: given some knowledge about data source, how can I determine whether a hash table has a chance of having O(1) operations, and possibly which techniques to use on my hash function?

Explanation / Answer

I am not a data structures expert, but the usual theoretical approach to

hashing is that one defines a family of functions (e.g., ha,b(x)=ax+bmodp) and

then considers the behavior on a worst case input on a randomly chosen member

of the family, where the adversary doesn't know the random choice in advance.

This is similar to how randomized algorithms are analyzed as well: the

expectation is taken over the algorithm's choices, not the input distribution.

In the past, according to a Usenix paper by Crosby and Wallach, common

programming languages didn't do anything like this, leaving lots of web apps

(and other servers) open to a DoS attack based on manufacturing collisions.

(The paper is from 2003, but it suggests that Dan Bernstein had discovered the

same idea quite a bit earlier.)

A quick google search provides claims that the state of the art in terms of

implementations has both improved and not improved.

Another aside is that in a high-bandwidth world, timing attacks make it not-

so-hard to find collisions online (as opposed to offline as the Crosby-Wallach

link suggests). I seem to remember that Daniel Golovin had results some years

ago on data structures that aren't vulnerable to timing attacks, but I don't

know if they are widely used.

Hire Me For All Your Tutoring Needs
Integrity-first tutoring: clear explanations, guidance, and feedback.
Drop an Email at
drjack9650@gmail.com
Chat Now And Get Quote