Question
Describe the divide-and-conquer search algorithm and explain its efficiency. Consider two different Split functions: Split = Lo, and Split = (Lo + Hi) / 2. Draw the trees of recursive calls. Assume that we are searching a large database with 4 million items represented as an ordered array. How many probes into the list will the binary search require before finding the target or concluding that it cannot be found? Assuming that it takes 1 microsecond to access an item, estimate the execution time of the binary search.
Explanation / Answer
In computer science, divide and conquer (D&C) is an algorithm design paradigm based on multi-branched recursion. A divide and conquer algorithm works by recursively breaking down a problem into two or more sub-problems of the same (or related) type (divide), until these become simple enough to be solved directly (conquer). The solutions to the sub-problems are then combined to give a solution to the original problem.
This divide and conquer technique is the basis of efficient algorithms for all kinds of problems, such as sorting (e.g., quicksort, merge sort), multiplying large numbers (e.g. Karatsuba), syntactic analysis (e.g., top-down parsers), and computing the discrete Fourier transform (FFTs).
Understanding and designing D&C algorithms is a complex skill that requires a good understanding of the nature of the underlying problem to be solved. As when proving a theorem by induction, it is often necessary to replace the original problem with a more general or complicated problem in order to initialize the recursion, and there is no systematic method for finding the proper generalization. These D&C complications are seen when optimizing the calculation of a Fibonacci number with efficient double recursion.
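As an illustration of that last point (a minimal C sketch added here; the text does not prescribe a particular method), the fast-doubling identities F(2k) = F(k)(2F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2 let a single recursion over the pair (F(k), F(k+1)) compute F(n) in O(log n) arithmetic steps, instead of the exponentially many calls made by the naive double recursion:

#include <stdio.h>

/* fast doubling: returns F(n) and stores F(n+1) in *next
   F(2k)   = F(k) * (2*F(k+1) - F(k))
   F(2k+1) = F(k)^2 + F(k+1)^2 */
static unsigned long long fib_pair(unsigned n, unsigned long long *next)
{
    if (n == 0) {
        *next = 1;                                /* F(1) */
        return 0;                                 /* F(0) */
    }
    unsigned long long b;
    unsigned long long a = fib_pair(n / 2, &b);   /* a = F(k), b = F(k+1) */
    unsigned long long c = a * (2 * b - a);       /* F(2k)   */
    unsigned long long d = a * a + b * b;         /* F(2k+1) */
    if (n % 2 == 0) {
        *next = d;                                /* F(n+1) = F(2k+1) */
        return c;                                 /* F(n)   = F(2k)   */
    } else {
        *next = c + d;                            /* F(n+1) = F(2k+2) */
        return d;                                 /* F(n)   = F(2k+1) */
    }
}

int main(void)
{
    unsigned long long next;
    printf("F(90) = %llu\n", fib_pair(90, &next));   /* 2880067194370816120 */
    return 0;
}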
The name "divide and conquer" is sometimes applied also to algorithms that reduce each problem to only one sub-problem, such as the binary search algorithm for finding a record in a sorted list (or its analog in numerical computing, the bisection algorithm for root finding).[1] These algorithms can be implemented more efficiently than general divide-and-conquer algorithms; in particular, if they use tail recursion, they can be converted into simple loops. Under this broad definition, however, every algorithm that uses recursion or loops could be regarded as a "divide and conquer algorithm". Therefore, some authors consider that the name "divide and conquer" should be used only when each problem may generate two or more subproblems.[2] The name decrease and conquer has been proposed instead for the single-subproblem class.[3]
Algorithm efficiency
The divide-and-conquer paradigm often helps in the discovery of efficient algorithms. It was the key, for example, to Karatsuba's fast multiplication method, the quicksort and mergesort algorithms, the Strassen algorithm for matrix multiplication, and fast Fourier transforms.
In all these examples, the D&C approach led to an improvement in the asymptotic cost of the solution. For example, if the base cases have constant-bounded size, the work of splitting the problem and combining the partial solutions is proportional to the problem's size n, and there is a bounded number p of subproblems of size about n/p at each stage, then the running time satisfies the recurrence T(n) = p T(n/p) + O(n), and the cost of the divide-and-conquer algorithm will be O(n log n).
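For instance (a minimal C sketch added for illustration; it is not part of the original answer), merge sort matches this pattern with p = 2 subproblems of roughly n/2 elements each and a linear-time merge, so its total cost is O(n log n):

#include <stdio.h>
#include <string.h>

/* sort A[lo..hi) using a scratch buffer tmp of the same length:
   divide in O(1), conquer two halves recursively, combine (merge) in O(n) */
static void merge_sort(int A[], int tmp[], int lo, int hi)
{
    if (hi - lo < 2)
        return;                            /* base case: 0 or 1 element */

    int mid = lo + (hi - lo) / 2;          /* divide */
    merge_sort(A, tmp, lo, mid);           /* conquer left half  */
    merge_sort(A, tmp, mid, hi);           /* conquer right half */

    int i = lo, j = mid, k = lo;           /* combine: merge the sorted halves */
    while (i < mid && j < hi)
        tmp[k++] = (A[i] <= A[j]) ? A[i++] : A[j++];
    while (i < mid) tmp[k++] = A[i++];
    while (j < hi)  tmp[k++] = A[j++];
    memcpy(A + lo, tmp + lo, (size_t)(hi - lo) * sizeof A[0]);
}

int main(void)
{
    int A[] = {9, 4, 11, 1, 8, 6, 3};
    int tmp[7];
    merge_sort(A, tmp, 0, 7);
    for (int i = 0; i < 7; i++)
        printf("%d ", A[i]);               /* prints 1 3 4 6 8 9 11 */
    printf("\n");
    return 0;
}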
The choice of Split function determines the shape of the tree of recursive calls. With Split = Lo, each recursive call shrinks the range by only one element, so the chain of calls can be n levels deep and the search degenerates into a sequential scan requiring O(n) probes. With Split = (Lo + Hi) / 2, each call halves the range, so the tree of recursive calls is only about log2 n levels deep; this halving strategy is the binary search described below.
The binary search algorithm begins by comparing the target value to the value of the middle element of the sorted array. If the target value equals the middle element's value, the position is returned and the search is finished. If the target value is less than the middle element's value, the search continues on the lower half of the array; if the target value is greater, the search continues on the upper half. This process repeats, eliminating half of the remaining elements at each step and comparing the target value to the middle element of what remains, until the target value is either found (and its position is returned) or the remaining range is empty (and "not found" is returned).
For the database in the question, with 4,000,000 items stored in an ordered array, binary search needs at most 22 probes, because 2^22 = 4,194,304 is the first power of two that exceeds 4,000,000. At 1 microsecond per access, the search therefore takes roughly 22 microseconds, compared with up to 4,000,000 probes (about 4 seconds) for the degenerate Split = Lo version.
Example
Sorted array: L = [1, 3, 4, 6, 8, 9, 11]
Target value: X = 4
Compare X to 6. X is smaller. Repeat with L = [1, 3, 4].
Compare X to 3. X is larger. Repeat with L = [4].
Compare X to 4. X equals 4, so the position is returned.
Algorithm
Recursive
A straightforward implementation of binary search is recursive. The initial call uses the indices of the entire array to be searched. The procedure then calculates an index midway between the two indices, determines which of the two subarrays to search, and then does a recursive call to search that subarray. Each of the calls is tail recursive, so a compiler need not make a new stack frame for each call. The variables imin and imax are the lowest and highest inclusive indices that are searched.
int binary_search(int A[], int key, int imin, int imax)
{
    // test if array is empty
    if (imax < imin)
        // set is empty, so return value showing not found
        return KEY_NOT_FOUND;
    else
    {
        // calculate midpoint to cut set in half
        int imid = midpoint(imin, imax);

        // three-way comparison
        if (A[imid] > key)
            // key is in lower subset
            return binary_search(A, key, imin, imid - 1);
        else if (A[imid] < key)
            // key is in upper subset
            return binary_search(A, key, imid + 1, imax);
        else
            // key has been found
            return imid;
    }
}
It is invoked with initial imin and imax values of 0 and N-1 for a zero-based array of length N.
The number type "int" used in the code affects how the midpoint calculation can be implemented correctly. With unbounded integers, the midpoint can be calculated as "(imin + imax) / 2". In practical programming, however, the calculation is performed with integers of limited range, and the intermediate result "(imin + imax)" might overflow. With bounded integers, the midpoint can be calculated correctly as "imin + ((imax - imin) / 2)".
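Neither midpoint nor KEY_NOT_FOUND is part of standard C; the sketch below (an illustrative assumption, using the overflow-safe formula just described) shows one way to supply them, together with an example invocation. It is meant to be compiled in the same file as the recursive binary_search above, with the helper placed before the search function:

#include <stdio.h>

#define KEY_NOT_FOUND (-1)      /* sentinel: never a valid array index */

/* overflow-safe midpoint of two inclusive indices */
static int midpoint(int imin, int imax)
{
    return imin + (imax - imin) / 2;
}

/* the recursive binary_search shown above, compiled in this same file */
int binary_search(int A[], int key, int imin, int imax);

int main(void)
{
    int A[] = {1, 3, 4, 6, 8, 9, 11};
    int n = (int)(sizeof A / sizeof A[0]);

    printf("%d\n", binary_search(A, 4, 0, n - 1));   /* prints 2 (index of 4)      */
    printf("%d\n", binary_search(A, 5, 0, n - 1));   /* prints -1 (KEY_NOT_FOUND)  */
    return 0;
}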
Iterative
The binary search algorithm can also be expressed iteratively with two index limits that progressively narrow the search range.[3]
int binary_search(int A[], int key, int imin, int imax)
{
    // continue searching while [imin, imax] is not empty
    while (imin <= imax)
    {
        // calculate the midpoint for roughly equal partition
        int imid = midpoint(imin, imax);

        if (A[imid] == key)
            // key found at index imid
            return imid;
        // determine which subarray to search
        else if (A[imid] < key)
            // change min index to search upper subarray
            imin = imid + 1;
        else
            // change max index to search lower subarray
            imax = imid - 1;
    }
    // key was not found
    return KEY_NOT_FOUND;
}
Deferred detection of equality
The above iterative and recursive versions take three paths based on the key comparison: one path for less than, one path for greater than, and one path for equality. (There are two conditional branches.) The path for equality is taken only when the record is finally matched, so it is rarely taken. That branch path can be moved outside the search loop in the deferred test for equality version of the algorithm. The following algorithm uses only one conditional branch per iteration.[4]
// inclusive indices
//   0 <= imin when using truncate toward zero divide
//     imid = (imin + imax) / 2;
//   imin unrestricted when using truncate toward minus infinity divide
//     imid = (imin + imax) >> 1; or
//     imid = (int)floor((imin + imax) / 2.0);
int binary_search(int A[], int key, int imin, int imax)
{
    // continually narrow search until just one element remains
    while (imin < imax)
    {
        int imid = midpoint(imin, imax);

        // code must guarantee the interval is reduced at each iteration
        assert(imid < imax);
        // note: 0 <= imin < imax implies imid will always be less than imax

        // reduce the search
        if (A[imid] < key)
            imin = imid + 1;
        else
            imax = imid;
    }
    // At exit of while:
    //   if A[] is empty, then imax < imin
    //   otherwise imax == imin

    // deferred test for equality
    if ((imax == imin) && (A[imin] == key))
        return imin;
    else
        return KEY_NOT_FOUND;
}
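A brief usage sketch for this variant (assuming it is compiled together with the midpoint helper and KEY_NOT_FOUND definition from the earlier sketch). Because the loop narrows the range to the first element that is not less than the key, this version returns the leftmost match when the array contains duplicate keys:

#include <stdio.h>

int main(void)
{
    int A[] = {1, 2, 2, 2, 3, 6, 9};
    int n = (int)(sizeof A / sizeof A[0]);

    printf("%d\n", binary_search(A, 2, 0, n - 1));   /* prints 1: leftmost 2  */
    printf("%d\n", binary_search(A, 5, 0, n - 1));   /* prints -1: not found  */
    return 0;
}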
Arithmetic
In a practical implementation, the variables used to represent the indices will often be of finite size, hence only capable of representing a finite range of values. For example, 32-bit unsigned integers can only hold values from 0 to 4294967295. 32-bit signed integers can only hold values from -2147483648 to 2147483647. If the binary search algorithm is to operate on large arrays, this has three implications:
Both imin - 1 and imax + 1 must be representable in the chosen integer type. Continuing the 32-bit unsigned example, the largest index that imax may safely take is 4294967294, not 4294967295: if key is greater than the element at index 4294967295, the final iteration attempts to store 4294967295 + 1 = 4294967296 into imin, which cannot be represented. An equivalent issue exists at the lower limit, where imid - 1 becomes negative (and, for an unsigned type, wraps around) when the first element of the array is at index zero and key is smaller than it.
If the midpoint is calculated as imid = (imin + imax) / 2, the intermediate sum (imin + imax) can exceed the number range whenever imax is greater than 4294967295/2 (for unsigned) or 2147483647/2 (for signed) and the search moves toward the upper end of the array. This is avoided by computing imid = imin + (imax - imin) / 2, as noted above. This overflow bug existed, for example, in the Java SDK's Arrays.binarySearch() from version 1.2 through 5.0 and was fixed in 6.0.[11]
The sentinel KEY_NOT_FOUND must be a valid value of the return type, yet one that can never be a legitimate index of the array (-1 works for the signed int return type used here).
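Finally, a self-contained sketch (the probe counter and the synthetic sorted array are illustrative additions, not part of the algorithm) that ties the code back to the original question: searching an ordered array of 4,000,000 items requires at most about 22 probes, so at 1 microsecond per access the search finishes in roughly 22 microseconds.

#include <stdio.h>
#include <stdlib.h>

#define KEY_NOT_FOUND (-1)

/* iterative binary search, instrumented to count probes (array accesses) */
static int binary_search_counted(const int A[], int key,
                                 int imin, int imax, int *probes)
{
    *probes = 0;
    while (imin <= imax)
    {
        int imid = imin + (imax - imin) / 2;   /* overflow-safe midpoint */
        (*probes)++;
        if (A[imid] == key)
            return imid;
        else if (A[imid] < key)
            imin = imid + 1;
        else
            imax = imid - 1;
    }
    return KEY_NOT_FOUND;
}

int main(void)
{
    enum { N = 4000000 };                      /* array size from the question */
    int *A = malloc((size_t)N * sizeof *A);
    if (A == NULL)
        return 1;
    for (int i = 0; i < N; i++)
        A[i] = 2 * i;                          /* sorted data: even numbers */

    int probes;
    binary_search_counted(A, 1, 0, N - 1, &probes);    /* an absent (odd) key */
    printf("probes for a missing key: %d\n", probes);  /* prints 22 */

    free(A);
    return 0;
}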