Youareinterestedinanalyzingsomehard-to-obtaindatafromtwosepa- rate databases. Ea
ID: 3784082 • Letter: Y
Question
Youareinterestedinanalyzingsomehard-to-obtaindatafromtwosepa- rate databases. Each database contains n numerical values—so there are 2n values total—and you may assume that no two values are the same. You’d like to determine the median of this set of 2n values, which we will define here to be the nth smallest value.
However, the only way you can access these values is through queries to the databases. In a single query, you can specify a value k to one of the two databases, and the chosen database will return the kth smallest value that it contains. Since queries are expensive, you would like to compute the median using as few queries as possible.
Give an algorithm that finds the median value using at most O(log n) queries.
Explanation / Answer
Here's how I thought this through. Since it's an educational problem, I suggest you stop reading if any of the points makes you think, "aha, I hadn't thought of that, I can think further along those lines by myself".
1) Same observation as sdcwc: with a possible slight quibble over whether it starts at 0 or 1, the database can be thought of as a sorted array. I ask for element 0, I get the smallest. I ask for 12, I get the 13th smallest. And so on. I find this easier to visualise.
2) We know we're looking for an O(log n) algorithm. That means that roughly speaking we're looking for one of two things:
Either we start out by calculating the smallest element, then calculate the 2nd smallest, 4th smallest, 8th, etc, until we're up to the size we want, each step in constant time. This doesn't sound promising to me.
Or we start out with a problem of size n then we perform some constant-time operation which allows us to solve the original problem by solving a problem of size n/2. Obviously we can solve the problem for n=1 in constant time, to complete the algorithm. This sounds a bit more plausible.
Actually it doesn't necessarily have to be n/2 each time. It could be n/3, or 999*n/1000, and the result will still be O(log n). But there's no harm in looking for n/2 first.
3) How are we going to reduce the problem like that? Well, if we can discount/remove m elements from the start of one array or the other as being smaller than the kth smallest, then we can find the (k-m)th smallest element of the resulting pair of arrays, and it will be the kth smallest of the original arrays.
4) Finally, the breakthrough observation is that if the mth smallest element of array A is smaller than the mth smallest element of array B, then that mth element of A cannot possibly be the (2m)th smallest element of the two arrays combined. It's smaller than that (or of equal value: I'm not sure whether "no repeats" means "no repeats in each database", or "no repeats between the databases combined"), since there are at most 2*(m-1) elements strictly smaller than it in both arrays combined.
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.