Question
DS 220 Midterm

1. (a) You declared a table with one phone-number column. Now, you have customers who want to use three phone numbers in their account settings. What do you do if you have the data stored in (a) a relational database and (b) a document database such as MongoDB?
(b) You want to enable atomic updates to phone numbers and addresses of users in MongoDB. What should you do in your design so that you can support that?
(c) What is eventual consistency? What is the advantage of having eventual consistency?
(d) Your most frequent query is as follows: "Return all comments made in response to comment#, posted by a student". Show how you would store your data and what you would index using which type of indexing (B+-tree or hash tree)?

2. (a) Provide three clear examples using two transactions, the statements of the two transactions, and a timing diagram to show a write-write conflict, a read-write conflict, and a write-read conflict where the result is not equal to a serializable schedule. Show the initial value of the data and the final value in the three cases.
(b) Data warehouses use denormalized schema design. Show a small example that uses a denormalized schema. What are the advantages and disadvantages of denormalization?
(c) What does information gain capture? When do you stop expanding a branch in a decision tree?
(d) Consider the database D depicted in Table 1, containing five transactions, each containing several items. Consider minsup = 60% and minconf = 80%.

Table 1: Database D of transactions to be analyzed.
TID     Items
T100    {B, O, N, E, C, O}
T200    {B, O, N, E, C, A}
T300    {C, A, N, E, C, A}
T400    {F, A, N, E, C, A}
T500    {F, A, C, A}

(i) Find all frequent 4-itemsets and 3-itemsets in the database.
(ii) Find association rules from these itemsets that satisfy the support and confidence thresholds stated above.
(e) For what type of problems do you enable replication and for which do you enable sharding? Identify situations when replication and sharding can reduce downtime (state two scenarios, one for each).
(f) When do you use a relational database, a key-value database, and a document database? Clearly identify the characteristics of the data, schema, and queries that you will consider to make such a choice.

3. (a) What is the difference between ACID and BASE?
(b) What is a journal in MongoDB? Why do we need it?

Explanation / Answer
Solution:
Information Gain:
Information gain is the decrease in entropy achieved when the dataset is split on an attribute. When constructing a decision tree, the attribute with the highest information gain is chosen as the split at each node.
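To make the definition concrete, here is a minimal Python sketch that computes entropy and information gain on a tiny data set. The helper names (`entropy`, `information_gain`) and the toy rows are assumptions made up for illustration; they are not part of the exam.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Parent entropy minus the weighted entropy of the child nodes
    obtained by splitting on `attribute`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: "outlook" separates the classes perfectly,
# while "windy" tells us nothing about the class.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes"]

gain_outlook = information_gain(rows, labels, "outlook")
gain_windy = information_gain(rows, labels, "windy")
print(gain_outlook, gain_windy)
```

In this toy example the split on "outlook" removes all uncertainty, so it has the larger gain and would be picked first.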
When do you stop expanding a branch?
We keep expanding a branch as long as the entropy at the node is greater than 0. As soon as the entropy becomes 0, meaning every sample at that node belongs to the same class, we stop expanding and declare the node a leaf node.
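The stopping rule can be sketched as a small recursive tree builder in Python. This is only an illustration under stated assumptions: the function name `build_tree`, the nested-dict tree layout, and the toy data are hypothetical, and the extra check for "no attributes left" is an added practical safeguard that the answer above does not mention.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def build_tree(rows, labels, attributes):
    # Stopping rule from the answer: entropy 0 means the node is pure,
    # so it becomes a leaf labelled with that single class.
    if entropy(labels) == 0:
        return labels[0]
    # Assumption (not in the answer): with no attributes left to split on,
    # fall back to a majority-class leaf.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    def remainder(attr):
        # Weighted entropy of the children after splitting on attr.
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    # Lowest remaining entropy is the same as highest information gain.
    best = min(attributes, key=remainder)
    rest = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [l for _, l in pairs]
        tree[best][value] = build_tree(sub_rows, sub_labels, rest)
    return tree


# Hypothetical toy data.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes"]

tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
```

Here both children of the "outlook" split are pure, so the recursion stops after one level and "windy" is never used.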
I hope this helps. If you find any problem, please comment below. Don't forget to give a thumbs up if you liked it. :)