Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

A database architect must decide how to best design a database server for perfor

ID: 3876164 • Letter: A

Question

A database architect must decide how to best design a database server for performance. Assume that the database is 50 GB in size. The architect can use up to 32 GB of memory, may choose between hard drives and solid state drives, but only has $1000 to spend on memory and drives. Given the following limitations, come up with the best server design and provide its queries/second.

The database server requires at least 4 GB of memory without buffering any data. Any memory provided beyond 4 GB will buffer the data.

Must have secondary storage (HDD or SSD) of at least twice the database size for persistence.

Costs: Memory = $50/GB, SSD = $5/GB, HDD = $1/GB

Two types of queries: lookup and scan. Lookup finds a single record. # of lookup queries/second per type: Memory = 100K, HDD = 1K, SSD = 5K Assume that buffering in memory allows for faster lookup queries. For example, if memory is 29 GB, then 25 GB of data will be buffered in memory (50%), so lookup can be done from memory 50% of the time.

A scan reads all data in the database. Scan rate: Memory = 20 GB/sec., SSD = 500 MB/sec., HDD = 100 MB/sec. We will assume a scan can read as much out of the memory buffer that is there. i.e. If 50% of data is buffered in memory, then only the other 50% needs to be read from disk.

Query mix is 99% lookup and 1% scans. We will assume (naively) that lookups and scans will not affect each other's execution performance.

Calculate the time to perform 1 million queries given the above query parameters and then convert that into queries/second.

Test these two configurations: one with HDD and one with SSD with minimum storage amount (100GB) then spend rest of money to maximize memory available.

Explanation / Answer

Answer:

Given:

Size of database: 50 GB

Useable memory: 32 GB

Amount available to spend: $1000

Minimum memory requirement: 4 GB (without buffering)

Secondary storage requirement: At least twice the database size = 2*50 = 100 GB (at least)

Cost:

Memory = $50/GB, SSD = $5/GB, HDD = $1/GB

Types of query: 1) Lookup, 2) Scan

Query required to run: 1 million

Proportion of lookup queries = 1*99/100 = 0.99 million = 990000

Proportion of scan queries = 1*1/100 = 0.01 million = 10000

1. Server configuration 1:

Secondary storage: 100 GB HDD

Cost of secondary storage = 100*1 = $100

Balance amount to be spend on memory = 1000 - 100 = $900

Size of memory costing $900 = 900/50 = 18 GB

Memory available for buffering = 18 - 4 = 14 GB

Data to be read from HDD = 50 - 14 = 36 GB

Calculation for Lookup query:

Lookup for buffed data will be done from memory, while for rest will be done from HDD.

%data to be lookup from memory = 14*100/50 = 28%

Hence number of queries to be lookup from memory = 990000*28/100 = 277200

Number of queries to be lookup from HDD = 990000 - 277200 = 712800

Time required for lookup queries from memory = 277200/100000 = 2.772 seconds

Time required for lookup queries from HDD = 712800/1000 = 712.8 seconds

Total time required for lookup queries = 712.8+2.772 = 715.572 seconds

Calculation for scan query:

Time required to read buffered data from memory = 14/20 = 0.7 seconds

Time required to read 36 GB data from HDD = 36000/100 = 360 seconds

Total time required for scan queries = 360+0.7 = 360.7 seconds

Total time required for all queries = 715.572+360.7 = 1076.272 seconds

Queries/second = 1000000/1076.272 = 929.133 queries/second

-----------------------------------------

2. Server configuration 2:

Secondary storage: 100 GB SDD

Cost of secondary storage = 100*5 = $500

Balance amount to be spend on memory = 1000 - 500 = $500

Size of memory costing $500 = 500/50 = 10 GB

Memory available for buffering = 10 - 4 = 6 GB

Data to be read from HDD = 50 - 6 = 44 GB

Calculation for Lookup query:

Lookup for buffed data will be done from memory, while for rest will be done from SDD.

%data to be lookup from memory = 6*100/50 = 12%

Hence number of queries to be lookup from memory = 990000*12/100 = 118800

Number of queries to be lookup from SDD = 990000 - 118800 = 871200

Time required for lookup queries from memory = 118800/100000 = 1.118 seconds

Time required for lookup queries from SDD = 871200/5000 = 174.24 seconds

Total time required for lookup queries = 174.24+1.118 = 175.358 seconds

Calculation for scan query:

Time required to read buffered data from memory = 6/20 = 0.3 seconds

Time required to read 44 GB data from SDD = 44000/500 = 88 seconds

Total time required for scan queries = 88+0.3 = 88.3 seconds

Total time required for all queries = 175.358+88.3 = 263.658 seconds

Queries/second = 1000000/263.658 = 3792.792 queries/second

Conclusion: From this calculation, it is clearer that SSD based configuration will have much better performance than HDD based configuration i.e. almost 3-4 times better.

Hire Me For All Your Tutoring Needs
Integrity-first tutoring: clear explanations, guidance, and feedback.
Drop an Email at
drjack9650@gmail.com
Chat Now And Get Quote