DS 220 Midterm

Question

DS 220 Midterm

1. (a) You declared a table with one phone-number column. Now you have customers who want to use three phone numbers in their account settings. What do you do if you have the data stored in (a) a relational database and (b) a document database such as MongoDB?
(b) You want to enable atomic updates to the phone numbers and addresses of users in MongoDB. What should you do in your design so that you can support that?
(c) What is eventual consistency? What is the advantage of having eventual consistency?
(d) Your most frequent query is as follows: "Return all comments made in response to comment#, posted by a student". Show how you would store your data and what you would index using which type of indexing (B+-tree or hash tree)?

2. (a) Provide three clear examples using two transactions, the statements of the two transactions, and a timing diagram to show a write-write conflict, a read-write conflict, and a write-read conflict where the result is not equal to a serializable schedule. Show the initial value of the data and the final value in the three cases.
(b) Data warehouses use denormalized schema design. Show a small example that uses a denormalized schema. What are the advantages and disadvantages of denormalization?
(c) What does information gain capture? When do you stop expanding a branch in a decision tree?
(d) Consider the database D depicted in Table 1, containing five transactions, each containing several items. Consider minsup = 60% and minconf = 80%.

Table 1: Database D of transactions to be analyzed.

TID   Items
T100  {B, O, N, E, C, O}
T200  {B, O, N, E, C, A}
T300  {C, A, N, E, C, A}
T400  {F, A, N, E, C, A}
T500  {F, A, C, A}

(i) Find all frequent 4-itemsets and 3-itemsets in the database.
(ii) Find association rules from these itemsets that satisfy the support and confidence thresholds stated above.
(e) For what type of problems do you enable replication and for which do you enable sharding? Identify situations when replication and sharding can reduce downtime (state two scenarios, one for each).
(f) When do you use a relational database, a key-value database, and a document database? Clearly identify the characteristics of the data, schema, and queries that you will consider to make such a choice.

3. (a) What is the difference between ACID and BASE?
(b) What is a journal in MongoDB? Why do we need it?

Explanation / Answer

3b) What is a journal in MongoDB? Why do we need it?

Answer)
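A journal in MongoDB is a write-ahead log: a write operation is first applied to a private view of the data in memory, and a description of the resulting change is written to the journal directory before the change reaches the data files. (The private-view and shared-view model described here is that of the older MMAPv1 storage engine.) Because the data files are mapped into memory twice, once as the shared view and once as the private view, journaling doubles the virtual memory that MongoDB uses. The private view is not connected to the data files, so the OS cannot flush changes made in it to disk; MongoDB itself has to save those changes.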

MongoDB writes the changes made in the private view to the journal file as short descriptions of which bytes in which file changed, and the journal appends each change description it receives. MongoDB then replays the journaled changes on the shared view, and the shared view is flushed to the data files on disk.

This is how the data reaches the disk. Journaling lets MongoDB record changes durably without flushing the shared view to the data files on every write: changes go from the private view to the journal, and only later from the shared view to the data files. We need the journal so that data can be recovered after an abrupt shutdown that happens before the changes have been flushed to disk. On restart, MongoDB replays the journal entries that had not yet reached the data files.
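The flow from private view to journal to shared view to data files, and the replay after a crash, can be illustrated with a small Python sketch. This is a toy model only, not MongoDB code: the class `ToyJournaledStore` and its methods (`write`, `group_commit`, `flush`, `crash`, `recover`) are hypothetical names invented for the illustration, and plain dictionaries and lists stand in for memory-mapped views and on-disk files.

```python
class ToyJournaledStore:
    """Toy model of journaled writes (hypothetical, not MongoDB's implementation)."""

    def __init__(self):
        self.data_file = {}     # stands in for the durable data files on disk
        self.journal = []       # stands in for the durable, append-only journal
        self.shared_view = {}   # in-memory view that is flushed to the data files
        self.private_view = {}  # in-memory view that receives writes first
        self.pending = []       # changes not yet written to the journal

    def write(self, key, value):
        # A write lands in the private view and is remembered as pending.
        self.private_view[key] = value
        self.pending.append((key, value))

    def group_commit(self):
        # Append the pending change descriptions to the journal,
        # then apply the same changes to the shared view.
        self.journal.extend(self.pending)
        for key, value in self.pending:
            self.shared_view[key] = value
        self.pending = []

    def flush(self):
        # Flush the shared view to the data files; the journal entries
        # covered by this flush are no longer needed.
        self.data_file = dict(self.shared_view)
        self.journal = []

    def crash(self):
        # An abrupt shutdown loses everything held only in memory.
        self.shared_view = {}
        self.private_view = {}
        self.pending = []

    def recover(self):
        # On restart, start from the data files and replay the journal.
        self.shared_view = dict(self.data_file)
        for key, value in self.journal:
            self.shared_view[key] = value
        self.private_view = dict(self.shared_view)
        self.flush()


def demo():
    store = ToyJournaledStore()
    store.write("a", 1)
    store.group_commit()
    store.flush()          # "a" is in the data files
    store.write("b", 2)
    store.group_commit()   # "b" is only in the journal and the shared view
    store.write("c", 3)    # "c" never reaches the journal
    store.crash()
    store.recover()
    return store.data_file


print(demo())
```

In this sketch the write of "b" survives the crash because it was journaled even though it was never flushed, while "c" is lost because the crash happened before it reached the journal. That gap between a write and its journal commit is the window of possible loss that journaling narrows but does not eliminate.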