2. Verify experimentally that the average depth of the nodes in randomly built b
ID: 668376 • Letter: 2
Question
2. Verify experimentally that the average depth of the nodes in randomly built binary search trees containing n nodes is o (log n) for trees containing n = 4, 32,256, and 2048 nodes. Of course, it won^?t be sufficient to generate just one randomly built binary search tree for each value of n. You?ll have to generate many trees for each value of n. calculate the average depth of the nodes for each tree generated, and then calculate the average depth of all the trees generated .For n= 4, 32, 256, and 2048, generate8, 64, 512, and 4096 trees, respectively. Write the output that you generate into a comma-separated file and then import the file into an MS-Excel compatible spreadsheet. Compare the average depths on a graph, where n is on the x-axis and the average depth is on the y-axis. Hand-written graphs or graphs generated from manually entered data will not be accepted. NOTE: This problem has nothing to do with the algorithm timing stuff in Problem 1!! Do your work on one of the Linux machines in CL 115. To log onto one of these machines, follow the instructions given above in Problem 1. Submit to TIR Courses: (1) the source code for your program, (2) a script file showing the compilation and execution of your program, and the contents of the average depth file (use the UNIX cat command to show the contents?). (3) the average depth file, and(4) the MS- Excel file containing the graph of the average depths. If you are working alone, submit to TIR Courses: (1) the source code for any .cpp and .h files you might have created (and only the . p,p and .h files) zipped into one file and (2) a single script file showing the execution of your program and the contents of the average depth file (use the UNIX cat command to show the contents of the average depth file in the script), (3) the average depth file, and (4) the MS-Excel compatible file containing the graph of the average depths.Explanation / Answer
Let's first think about this intuitively. In the best-case scenario, the tree is perfectly balanced; in the worst-case scenario, the tree is entirely unbalanced:
Starting from the root node p, this left tree has twice as many nodes at each succeeding depth, such that the tree has n=hi=02i=2h+11 nodes and a height h (which is in this case 3). With a little math, n2h+11hlog2(n+1)1log2n, which is to say it has O(logn)height. For the entirely unbalanced tree, the height of tree is simply n1O(n). So we have our bounds.
If we were constructing a balanced tree from an ordered list {1,2,…,n}, we'd choose the middle element to be our root node. If we are instead randomly constructing a tree, any of the n nodes are equally likely to be picked and the height of our tree is:
heighttree=1+max(heightleft subtree,heightright subtree)
We know that in a binary search tree, the left subtree must only contain keys less than the root node. Thus, if we randomly choose the ith element, the left subtree has i1 elements and the right subtree has ni elements, so more compactly: hn=1+max(hi1,hni). From there, it makes sense that if each element is equally likely to be picked, the expected value is just the average of all cases (rather than a weighted average). Hence: E[hn]=1nni=1[1+max(hi1,hni)]
As I'm sure you've noticed, I've deviated slightly from how CLRS proves this, because CLRS uses two relatively common proof techniques that are disconcerting for the uninitiated. The first is to use exponents (or logarithms) of what we want to find (in this case height), which makes the math work out a little more cleanly; the second is to use indicator functions (which I'm just going to ignore here). CLRS defines the exponential height as Yn=2hn, so the analogous recurrence is Yn=2×max(Yi1,Yni).
Assuming independence (that each draw of an element (out of the available elements) to be the root of a subtree is irrespective of all previous draws), we still have the relation:
E[Yn]=i=1n1nE[2×max(Yi1,Yni)]=2ni=1nE[max(Yi1,Yni)]
for which I did two steps: (1) moving the 1n outside because it is a constant and one of the properties of summations is that ici=cii, and (2) moving the 2 outside because it is also a constant and one of the properties of expected values is E[ax]=aE[x]. Now we're going to replace the max function with something larger because otherwise simplifying is difficult. If we argue for nonnegative X, Y: E[max(X,Y)]E[max(X,Y)+min(X,Y)]=E[X]+E[Y], then:
E[Yn]2ni=1n(E[Yi1]+E[Yni])=2ni=0n12E[Yi]
such that the last step follows from the observation that for i=1, Yi1=Y0 and Yni=Yn1and going all the way to i=n, Yi1=Yn1 and Yni=Y0, so every term Y0 to Yn1 appears twice, so we can replace the entire summation with an analogous one. The good news is that we have a recurrence E[Yn]4nn1i=0E[Yi]; the bad news, is that we're not much further than where we started.
At this point, CLRS pulls an induction proof E[Yn]14(n+33) out of their ... repertoire of mathematical experience, one that includes an identity n1i=0(i+33)=(n+34) they leave to the user to prove. What's important about their choice is that its largest term is n3, and recall that we are using exponential height Yn=2hn such that hn=log2n3=3log2nO(logn). Perhaps someone will comment why this particular binomial was chosen. The general idea though is to bound from above our recurrence with an expression nk for some constant k.
To conclude with a one liner:
2E[Xn]E[Yn]4ni=0n1E[Yi]14(n+33)=(n+3)(n+2)(n+1)24E[hn]=O(logn)
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.