Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

1. Suppose you need to sort a relation of 40 gigabytes, with 4 kilobyte blocks,

ID: 3531617 • Letter: 1

Question

1. Suppose you need to sort a relation of 40 gigabytes, with 4 kilobyte blocks, using a memory size of 40 megabytes. Suppose the cost of a seek is 5 milliseconds, while the disk transfer rate is 40 megabytes per second. (18 points) a. Find the cost of sorting the relation, in seconds, with bb = 1 and with bb = 100. b. In each case, how many merge passes are required? c. Suppose a flash storage device is used instead of a disk, and it has a seek time of 1 microsecond, and a transfer rate of 40 megabytes per second. Recompute the cost of sorting the relation, in seconds, with bb = 1 and with bb = 100, in this setting.

Explanation / Answer

i. Find the cost of sorting the relation, in seconds, with bb = 1 and with bb = 100.

cost = b * tT + S * tS

First find b:
br = number of relation blocks = relation size/block size
= 40 gigabytes/4 kilobytes = 10,000,000 blocks

M = number of memory blocks = memory size/block size
= 40 megabytes/4 kilobytes =10,000 blocks

B = number of block transfers = br(2[logM-1(br/M)]+1)
= 10,000,000 * (2[log9999(10,000,000/10,000)] + 1)
= 10,000,000 * (2[4(1,000)] + 1)
= 10,000,000 * (2[4,000] + 1)
= 10,000,000 * (8,000 + 1)
= 10,000,000 * 8,001
= 80,010,000,000 transfers

Next find tT:
tT = block transfer time = block size/disk transfer rate = 4,000/40,000,000
= 0.0001 seconds


Next find S:
S = number of seeks = 2[br/M] + [br/bb](2[logM-1(br/M)]-1)

case one:
= 2[10,000,000/10,000]+[10,000,000/1](2[log9999(10,000,000/10,000)]-1)
= 2[1,000]+10,000,000(2[4(1,000)]-1)
= 2,000+10,000,000(2[4,000]-1)
= 2,000+10,000,000(8,000-1)
= 2,000+10,000,000(7,999)
= 2,000 + 79,990,000,000
= 79,990,002,000 seeks

case two:
= 2[10,000,000/10,000] + [10,000,000/100](2[log9999(10,000,000/10,000)]-1)
= 2[1,000]+100,000(2[4(1,000)]-1)
= 2,000+100,000 (2[4,000]-1)
= 2,000+100,000 (8,000-1)
= 2,000+100,000 (7,999)
= 2,000 + 799,900,000
= 799,902,000 seeks


Then find tS:
tS = disk seek cost = number of seeks because the problem involves a relation that is larger than the available memory

So