


Question

DS 220 Midterm

1. (a) You declared a table with one phone-number column. Now, you have customers who want to use three phone numbers in their account settings. What do you do if you have the data stored in (a) a relational database and (b) a document database such as MongoDB?

(b) You want to enable atomic updates to phone numbers and addresses of users in MongoDB. What should you do in your design so that you can support that?

(c) What is eventual consistency? What is the advantage of having eventual consistency?

(d) Your most frequent query is as follows: "Return all comments made in response to comment#, posted by a student". Show how you would store your data and what you would index using which type of indexing (B+-tree or hash tree)?

2. (a) Provide three clear examples using two transactions, the statements of the two transactions, and a timing diagram to show a write-write conflict, a read-write conflict, and a write-read conflict where the result is not equal to a serializable schedule. Show the initial value of the data and the final value in the three cases.

(b) Data warehouses use denormalized schema design. Show a small example that uses a denormalized schema. What are the advantages and disadvantages of denormalization?

(c) What does information gain capture? When do you stop expanding a branch in a decision tree?

(d) Consider the database D depicted in Table 1, containing five transactions, each containing several items. Consider minsup = 60% and minconf = 80%.

Table 1: Database D of transactions to be analyzed.

TID     Items
T100    {B, O, N, E, C, O}
T200    {B, O, N, E, C, A}
T300    {C, A, N, E, C, A}
T400    {F, A, N, E, C, A}
T500    {F, A, C, A}

(i) Find all frequent 4-itemsets and 3-itemsets in the database.

(ii) Find association rules from these itemsets that satisfy the support and confidence thresholds stated above.

(e) For what type of problems do you enable replication and for which do you enable sharding? Identify situations when replication and sharding can reduce downtime (state two scenarios, one for each).

(f) When do you use a relational database, a key-value database, and a document database? Clearly identify the characteristics of the data, schema, and queries that you will consider to make such a choice.

3. (a) What is the difference between ACID and BASE?

(b) What is a journal in MongoDB? Why do we need it?

Explanation / Answer

Answer to 2(b):

A data warehouse is a large store of data accumulated from a wide range of sources and is used mostly for querying. Joins over very large tables are slow, so warehouse schemas try to avoid them wherever possible. Hence, de-normalization is used in data warehouses.

For example,

Tables :

Employee (EmpID, EmpName, DeptID, PayrollID, Designation)

Dept(DeptID, DeptName, DeptHead)

Payroll(PayrollID, PayrollName, Amount)

This is a normalised scenario.

Now, to query EmpName, DeptName, DeptHead and Amount we have to join all three tables. Because a data warehouse deals with huge amounts of data, these joins make the query slow in the normalised design.

SELECT e.EmpName, d.DeptName, d.DeptHead, p.Amount
FROM Employee e
INNER JOIN Dept d ON e.DeptID = d.DeptID
INNER JOIN Payroll p ON e.PayrollID = p.PayrollID;

To de-normalize this, we combine the three tables into a single table that holds the descriptive columns of all of them. The join keys DeptID and PayrollID are no longer needed.

Table :

EmployeeDetails(EmpID, EmpName, DeptName, DeptHead, PayrollName, Amount, Designation)

Now, to get EmpName, DeptName, DeptHead and Amount we no longer have to join three tables. All the data is in one table, so the query only has to read that table and runs faster.

SELECT e.EmpName, e.DeptName, e.DeptHead, e.Amount
FROM EmployeeDetails e;
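The comparison above can be tried out with a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database. The sample rows (names, departments, amounts) are hypothetical and exist only for illustration. The sketch builds EmployeeDetails from the three-way join once, then checks that the join-free query returns the same rows as the joined query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dept (DeptID INTEGER PRIMARY KEY, DeptName TEXT, DeptHead TEXT);
CREATE TABLE Payroll (PayrollID INTEGER PRIMARY KEY, PayrollName TEXT, Amount INTEGER);
CREATE TABLE Employee (
    EmpID INTEGER PRIMARY KEY,
    EmpName TEXT,
    DeptID INTEGER REFERENCES Dept(DeptID),
    PayrollID INTEGER REFERENCES Payroll(PayrollID),
    Designation TEXT
);

-- Hypothetical sample rows, for illustration only.
INSERT INTO Dept VALUES (1, 'Sales', 'Asha'), (2, 'IT', 'Ben');
INSERT INTO Payroll VALUES (10, 'Grade A', 5000), (20, 'Grade B', 4000);
INSERT INTO Employee VALUES
    (100, 'Carl', 1, 10, 'Manager'),
    (101, 'Dina', 2, 20, 'Engineer'),
    (102, 'Eli',  2, 10, 'Analyst');

-- Denormalize: pay for the three-way join once and store the result
-- as a single wide table.
CREATE TABLE EmployeeDetails AS
SELECT e.EmpID, e.EmpName, d.DeptName, d.DeptHead,
       p.PayrollName, p.Amount, e.Designation
FROM Employee e
INNER JOIN Dept d ON e.DeptID = d.DeptID
INNER JOIN Payroll p ON e.PayrollID = p.PayrollID;
""")

# Normalized design: every read needs two joins.
normalized_rows = conn.execute("""
    SELECT e.EmpName, d.DeptName, d.DeptHead, p.Amount
    FROM Employee e
    INNER JOIN Dept d ON e.DeptID = d.DeptID
    INNER JOIN Payroll p ON e.PayrollID = p.PayrollID
    ORDER BY e.EmpID
""").fetchall()

# Denormalized design: the same answer from a single table, no joins.
denormalized_rows = conn.execute("""
    SELECT EmpName, DeptName, DeptHead, Amount
    FROM EmployeeDetails
    ORDER BY EmpID
""").fetchall()

print(normalized_rows == denormalized_rows)
for row in denormalized_rows:
    print(row)
```

On three rows the speed difference is not measurable; the point of the sketch is only that both designs return the same data, while the second query avoids the join work at read time.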

Hence, a data warehouse uses a de-normalised schema design.

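De-normalization | Advantages :

Queries read from one table instead of joining several, so reads are faster.

Queries are simpler to write, as the second SELECT above shows.

De-normalization | Disadvantages :

The same values (for example DeptName and DeptHead) are repeated in every employee row, so the data is redundant and takes more storage.

Updates are slower and more error-prone: changing a department head means changing every row of that department, and missing one row leaves the data inconsistent.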
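The main cost of the single-table design, redundant copies that can drift apart, can be illustrated with a small sketch. It again assumes Python's built-in sqlite3 module, and the rows are hypothetical sample data, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeDetails (
    EmpID INTEGER PRIMARY KEY, EmpName TEXT, DeptName TEXT, DeptHead TEXT,
    PayrollName TEXT, Amount INTEGER, Designation TEXT
);
-- Hypothetical sample rows: DeptHead 'Ben' is stored once per IT employee.
INSERT INTO EmployeeDetails VALUES
    (100, 'Carl', 'Sales', 'Asha', 'Grade A', 5000, 'Manager'),
    (101, 'Dina', 'IT',    'Ben',  'Grade B', 4000, 'Engineer'),
    (102, 'Eli',  'IT',    'Ben',  'Grade A', 5000, 'Analyst');
""")

# Changing the head of IT: with a separate Dept table this is one row,
# but in the denormalized table every IT employee row must be rewritten.
cur = conn.execute(
    "UPDATE EmployeeDetails SET DeptHead = 'Fay' WHERE DeptName = 'IT'"
)
rows_touched = cur.rowcount

# If an update reaches only some of the copies, the table contradicts itself:
# the IT department now appears to have two different heads.
conn.execute("UPDATE EmployeeDetails SET DeptHead = 'Gus' WHERE EmpID = 101")
heads_for_it = conn.execute("""
    SELECT DISTINCT DeptHead FROM EmployeeDetails
    WHERE DeptName = 'IT' ORDER BY DeptHead
""").fetchall()

print(rows_touched)
print(heads_for_it)
```

This is why denormalization suits a data warehouse, where data is loaded in bulk and then mostly read, better than a system with frequent small updates.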