Question

Let us use some simple estimates to compare the cost and performance of a terabyte storage
system made entirely from disks with one that incorporates tertiary storage. Suppose
that magnetic disks each hold 10 gigabytes, cost $1000, transfer 5 megabytes per second,
and have an average access latency of 15 milliseconds. Suppose that a tape library costs
$10 per gigabyte, transfers 10 megabytes per second, and has an average access latency of
20 seconds. Compute the total cost, the maximum total data rate, and the average waiting
time for a pure disk system. (Do you need to make any assumptions about the workload?
If you do, what are they?) Now, suppose that 5 percent of the data are frequently used,
so they must reside on disk, but the other 95 percent are archived in the tape library. And
suppose that 95 percent of the requests are handled by the disk system, and the other 5
percent are handled by the library. What are the total cost, the maximum total data rate,
and the average waiting time for this hierarchical storage system?

Explanation / Answer

First let's consider the pure disk system. One terabyte is 1024 GB, so strictly we would need
103 disks at 10 GB each. But since this question is about approximations, we will simplify
the arithmetic by rounding off the numbers: the pure disk system will have 100 drives. The
cost of the disk drives would be $100,000, plus about 20% for cables, power supplies, and
enclosures, i.e., around $120,000. The aggregate data rate would be 100 × 5 MB/s, or
500 MB/s. The average waiting time depends on the workload.
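
As a quick check on the disk-count, cost, and data-rate arithmetic, here is a minimal Python sketch. It assumes, as the answer does, a flat 20% overhead on the disk cost for cables, power supplies, and enclosures, and it assumes every disk can stream at its full 5 MB/s at the same time, which is how the maximum total data rate is computed here. The function name `pure_disk_system` and the constant names are hypothetical, chosen only for illustration.

```python
import math

# Parameters from the question.
DISK_CAPACITY_GB = 10
DISK_COST_USD = 1000
DISK_RATE_MB_S = 5

# Assumption from the answer: about 20% extra for cables, power supplies, enclosures.
OVERHEAD_PERCENT = 20


def pure_disk_system(capacity_gb: float) -> tuple[int, int, int]:
    """Return (number of disks, total cost in USD, aggregate rate in MB/s).

    Assumes every disk can stream at its full rate simultaneously.
    """
    disks = math.ceil(capacity_gb / DISK_CAPACITY_GB)
    # Integer arithmetic keeps the dollar figure exact.
    cost = disks * DISK_COST_USD * (100 + OVERHEAD_PERCENT) // 100
    rate = disks * DISK_RATE_MB_S
    return disks, cost, rate


if __name__ == "__main__":
    for capacity in (1024, 1000):  # exact terabyte vs. the rounded figure
        disks, cost, rate = pure_disk_system(capacity)
        print(f"{capacity} GB: {disks} disks, ${cost:,}, {rate} MB/s")
```

Calling it with 1024 GB reproduces the strict count of 103 disks, while 1000 GB gives the rounded 100 disks, $120,000, and 500 MB/s used in the answer.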

Suppose that the requests are for transfers of size 8 KB and that they are randomly
distributed over the disk drives. If the system is lightly loaded, a typical request will
arrive at an idle disk, so the response time will be the 15 ms access latency plus about
2 ms of transfer time (8 KB at 5 MB/s takes about 1.6 ms), or roughly 17 ms. If the system
is heavily loaded, the delay will increase, roughly in proportion to the queue length.
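
The light-load figure and the heavy-load remark can be sketched under a deliberately simple model, which is an assumption here and not part of the question: 1 MB = 1,000 KB, and each disk serves its own first-come, first-served queue in which every request takes the same time (access latency plus transfer). A request that finds `q` requests ahead of it then waits `q + 1` service times, so the delay grows linearly with the queue length. The names `service_time_s` and `response_time_s` are hypothetical.

```python
# Parameters from the question; the 8 KB request size is the answer's workload assumption.
ACCESS_LATENCY_S = 0.015  # average disk access latency
DISK_RATE_KB_S = 5_000    # 5 MB/s, assuming 1 MB = 1,000 KB
REQUEST_KB = 8


def service_time_s(request_kb: float = REQUEST_KB) -> float:
    """Time for an idle disk to serve one request: access latency plus transfer."""
    return ACCESS_LATENCY_S + request_kb / DISK_RATE_KB_S


def response_time_s(requests_ahead: int) -> float:
    """Response time under a simple FCFS model with identical service times.

    A request that finds `requests_ahead` requests queued (or in service)
    waits for all of them to finish and then for its own service.
    """
    return (requests_ahead + 1) * service_time_s()


if __name__ == "__main__":
    for q in (0, 1, 4, 9):
        print(f"{q} requests ahead: {response_time_s(q) * 1000:.1f} ms")
```

With these numbers an idle disk answers in about 16.6 ms, which the answer rounds to 17 ms. Real disks reorder requests and have variable seek times, so this model is only a rough illustration of the proportionality claim.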

Now let's consider the hierarchical storage system. The total disk space required is 5% of
1 TB, which (with the rounded figure of 1,000 GB) is 50 GB. Consequently, we need 5 disks, so
the cost of the disk storage is $5,000 (plus 20%, i.e., $6,000). The 950 GB tape library
costs 950 × $10 = $9,500. Thus the total storage cost is $15,500. The maximum total data
rate depends on the number of drives in the tape library. We suppose there is only 1 tape
drive. Then the aggregate data rate is 5 × 5 MB/s from the disks plus 1 × 10 MB/s from the
tape drive, i.e., 35 MB/s.

For a lightly loaded system, 95% of the requests will be satisfied by the disks with a delay
of about 17 ms. The other 5% of the requests will be satisfied by the tape library, with a
delay of slightly more than 20 seconds. Thus the average delay will be
(95 × 0.017 s + 5 × 20 s) / 100 ≈ 1.02 s, or about 1 second. Even with an empty request
queue at the tape library, the tape drive's latency is responsible for almost all of the
system's response latency, because 1/20th of the workload is sent to a device that has a
20-second latency. If the system is more heavily loaded, the average delay will increase in
proportion to the length of the queue of requests waiting for service from the tape drive.
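
The hierarchical-system figures can be reproduced from the question's parameters in the same way. This minimal sketch assumes the answer's choices: 1 TB rounded to 1,000 GB, the 20% overhead applied to the disks only, a single tape drive, and light-load delays of about 17 ms per disk request and 20 s per tape request. `hierarchical_system` and the constant names are hypothetical.

```python
import math

# Parameters from the question.
DISK_CAPACITY_GB = 10
DISK_COST_USD = 1000
DISK_RATE_MB_S = 5
TAPE_COST_PER_GB_USD = 10
TAPE_RATE_MB_S = 10

# Assumptions made in the answer.
TOTAL_GB = 1000            # 1 TB rounded to 1,000 GB
DISK_FRACTION_PERCENT = 5  # share of the data kept on disk
OVERHEAD_PERCENT = 20      # cables, power supplies, enclosures (disks only)
TAPE_DRIVES = 1            # drives in the tape library
DISK_DELAY_S = 0.017       # light-load disk response time
TAPE_DELAY_S = 20.0        # light-load tape response time (latency dominates)
DISK_REQUEST_PERCENT = 95  # share of requests served by the disks


def hierarchical_system() -> tuple[int, int, int, float]:
    """Return (disks, total cost in USD, aggregate rate in MB/s, mean delay in s)."""
    disk_gb = TOTAL_GB * DISK_FRACTION_PERCENT // 100
    tape_gb = TOTAL_GB - disk_gb
    disks = math.ceil(disk_gb / DISK_CAPACITY_GB)

    disk_cost = disks * DISK_COST_USD * (100 + OVERHEAD_PERCENT) // 100
    tape_cost = tape_gb * TAPE_COST_PER_GB_USD
    cost = disk_cost + tape_cost

    # Maximum rate: every disk and every tape drive streaming at full speed at once.
    rate = disks * DISK_RATE_MB_S + TAPE_DRIVES * TAPE_RATE_MB_S

    # Request-weighted mean of the two light-load delays.
    tape_request_percent = 100 - DISK_REQUEST_PERCENT
    delay = (DISK_REQUEST_PERCENT * DISK_DELAY_S
             + tape_request_percent * TAPE_DELAY_S) / 100
    return disks, cost, rate, delay


if __name__ == "__main__":
    disks, cost, rate, delay = hierarchical_system()
    print(f"{disks} disks + tape: ${cost:,}, {rate} MB/s, {delay:.2f} s average delay")
```

Under the same full-speed assumption, each additional tape drive (`TAPE_DRIVES`) would add 10 MB/s to the aggregate rate, but it would not change the light-load average delay, which is dominated by the 20 s tape latency.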

The hierarchical system is much cheaper. For the 95% of the requests that are served by the
disks, the performance is as good as in the pure-disk system. But the maximum data rate of
the hierarchical system is much worse than that of the pure-disk system, as is the average
response time.
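
To put "much cheaper" and "much worse" into numbers, the sketch below divides the light-load figures for the two systems. The figures are hard-coded inputs derived from the question's parameters under the answer's assumptions (1 TB rounded to 1,000 GB, 20% disk overhead, 8 KB requests, one tape drive); the comments show where each comes from, and the names are hypothetical.

```python
# Light-load figures under the answer's assumptions (1 TB rounded to 1,000 GB,
# 20% overhead on disks, 8 KB requests, one tape drive).
PURE_DISK = {
    "cost_usd": 120_000,  # 100 disks x $1,000, plus 20%
    "rate_mb_s": 500,     # 100 disks x 5 MB/s
    "delay_s": 0.017,     # 15 ms access + about 2 ms transfer on an idle disk
}
HIERARCHICAL = {
    "cost_usd": 15_500,   # 5 disks x $1,000 plus 20%, plus 950 GB x $10
    "rate_mb_s": 35,      # 5 disks x 5 MB/s + 1 tape drive x 10 MB/s
    "delay_s": (95 * 0.017 + 5 * 20) / 100,  # request-weighted mean delay
}


def tradeoffs() -> dict[str, float]:
    """Return how many times better or worse the hierarchical system is."""
    return {
        "cost saving (pure / hierarchical)":
            PURE_DISK["cost_usd"] / HIERARCHICAL["cost_usd"],
        "rate loss (pure / hierarchical)":
            PURE_DISK["rate_mb_s"] / HIERARCHICAL["rate_mb_s"],
        "delay penalty (hierarchical / pure)":
            HIERARCHICAL["delay_s"] / PURE_DISK["delay_s"],
    }


if __name__ == "__main__":
    for name, factor in tradeoffs().items():
        print(f"{name}: {factor:.1f}x")
```

Under these assumptions the hierarchical system costs about 7.7 times less, but its peak data rate is about 14 times lower and its light-load average delay is about 60 times longer.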