
DS 220 Midterm


Question

DS 220 Midterm

1. (a) You declared a table with one phone-number column. Now, you have customers who want to use three phone numbers in their account settings. What do you do if you have the data stored in (a) a relational database and (b) a document database such as MongoDB?
(b) You want to enable atomic updates to phone numbers and addresses of users in MongoDB. What should you do in your design so that you can support that?
(c) What is eventual consistency? What is the advantage of having eventual consistency?
(d) Your most frequent query is as follows: "Return all comments made in response to comment#, posted by a student". Show how you would store your data and what you would index using which type of indexing (B+-tree or hash tree)?

2. (a) Provide three clear examples using two transactions, the statements of the two transactions, and a timing diagram to show a write-write conflict, a read-write conflict, and a write-read conflict where the result is not equal to a serializable schedule. Show the initial value of the data and the final value in the three cases.
(b) Data warehouses use denormalized schema design. Show a small example that uses a denormalized schema. What are the advantages and disadvantages of denormalization?
(c) What does information gain capture? When do you stop expanding a branch in a decision tree?
(d) Consider the database D depicted in Table 1, containing five transactions, each containing several items. Consider minsup = 60% and minconf = 80%.

Table 1: Database D of transactions to be analyzed.

TID     Items
T100    {B, O, N, E, C, O}
T200    {B, O, N, E, C, A}
T300    {C, A, N, E, C, A}
T400    {F, A, N, E, C, A}
T500    {F, A, C, A}

(i) Find all frequent 4-itemsets and 3-itemsets in the database.
(ii) Find association rules from these itemsets that satisfy the support and confidence thresholds stated above.
(e) For what type of problems do you enable replication and for which do you enable sharding? Identify situations when replication and sharding can reduce downtime (state two scenarios, one for each).
(f) When do you use a relational database, a key-value database, and a document database? Clearly identify the characteristics of the data, schema, and queries that you will consider to make such a choice.

3. (a) What is the difference between ACID and BASE?
(b) What is a journal in MongoDB? Why do we need it?

Explanation / Answer

ACID and BASE are two common consistency models.

The ACID acronym stands for:

Atomicity -> Either all the operations in a transaction succeed, or all of them are rolled back.

Consistency -> A transaction takes the database from one valid state to another; it never leaves the database in an inconsistent state.

Isolation -> Transactions running at the same time do not interfere with each other; each one behaves as if it were running alone.

Durability -> Once a transaction has committed, its changes survive failures such as a crash or a power loss.
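
To make atomicity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table and the transfer and balances helpers are hypothetical names made up for this illustration; they are not part of the exam. The second transfer credits one account and then fails on the debit, and the rollback undoes the credit as well.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single transaction (illustrative helper)."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            # Credit first, so a failure on the debit leaves something to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; nothing was kept.
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    rows = conn.execute("SELECT name, balance FROM accounts ORDER BY name")
    return dict(rows.fetchall())

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates are applied
failed = transfer(conn, "alice", "bob", 500)  # debit fails, credit is rolled back
print(ok, failed, balances(conn))
```

After the failed transfer, bob's balance is unchanged even though his credit had already been executed inside the transaction, which is the all-or-nothing behavior described above. The CHECK constraint also shows consistency in a small way: the database refuses to enter a state with a negative balance.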

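The BASE acronym stands for:

Basically Available -> The database appears to work most of the time; it keeps responding to requests even when some nodes have failed.

Soft state -> The replicas do not have to be mutually consistent all of the time, so the state of the system may change even without new writes while updates propagate.

Eventual consistency -> The store does become consistent, but only at some later point: if no new writes arrive, all replicas eventually converge to the same value.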
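
The difference from ACID is easiest to see in a toy model. The sketch below is only an illustration under simplifying assumptions: the EventuallyConsistentStore class and its write, read, and sync methods are invented for this answer and do not mirror how MongoDB or any real store implements replication. A write is acknowledged after reaching one replica, and the other replicas catch up later.

```python
class EventuallyConsistentStore:
    """Toy key-value store with lazily updated replicas (hypothetical model)."""

    def __init__(self, n_replicas=3):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = []  # (replica_index, key, value) updates not yet applied

    def write(self, key, value):
        # Basically available: acknowledge as soon as one replica has the value.
        self.replicas[0][key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def read(self, key, replica=0):
        # Soft state: a replica may return a stale (or missing) value.
        return self.replicas[replica].get(key)

    def sync(self):
        # Eventual consistency: apply queued updates in the order they were written.
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()

store = EventuallyConsistentStore()
store.write("phone", "555-0100")
stale = store.read("phone", replica=2)   # replica 2 has not seen the write yet
store.sync()
fresh = store.read("phone", replica=2)   # replicas have now converged
print(stale, fresh)
```

The write returns without waiting for every replica, which is where the availability and latency advantage comes from; the cost is that a read between write and sync can see old data.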