From the answers to (When) is hash table lookup O(1)?, I gather that hash tables
ID: 648011 • Letter: F
Question
From the answers to (When) is hash table lookup O(1)?, I gather that hash tables have O(1) worst-case behavior, at least amortized, when the data satisfies certain statistical conditions, and there are techniques to help make these conditions broad.
However, from a programmer's perspective, I don't know in advance what my data will be: it often comes from some external source. And I rarely have all the data at once: often insertions and deletions happen at a rate that's not far below the rate of lookups, so preprocessing the data to fine-tune the hash function is out.
So, taking a step out: given some knowledge about data source, how can I determine whether a hash table has a chance of having O(1) operations, and possibly which techniques to use on my hash function?
Explanation / Answer
I am not a data structures expert, but the usual theoretical approach to hashing is that one defines a family of functions (e.g., ha,b(x)=ax+bmodp) and then considers the behavior on a worst case input on a randomly chosen member of the family, where the adversary doesn't know the random choice in advance. This is similar to how randomized algorithms are analyzed as well: the expectation is taken over the algorithm's choices, not the input distribution.
In the past, according to a Usenix paper by Crosby and Wallach, common programming languages didn't do anything like this, leaving lots of web apps (and other servers) open to a DoS attack based on manufacturing collisions. (The paper is from 2003, but it suggests that Dan Bernstein had discovered the same idea quite a bit earlier.)
A quick google search provides claims that the state of the art in terms of implementations has both improved and not improved.
Another aside is that in a high-bandwidth world, timing attacks make it not-so-hard to find collisions online (as opposed to offline as the Crosby-Wallach link suggests). I seem to remember that Daniel Golovin had results some years ago on data structures that aren't vulnerable to timing attacks, but I don't know if they are widely used.
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.