
Briefly explain why we use data mining, specifically on large data sets (no copying)


Question

No copying, please. Briefly explain why we use data mining, specifically on large data sets (big data).

List and briefly explain the major data mining tasks, with examples.

Explain the different approaches to handling missing attribute values during data cleaning.

Explain each of the following characteristics of a data warehouse, as mentioned in its definition:

“A data warehouse is a (1) subject-oriented, (2) integrated, (3) time-variant, and (4) nonvolatile collection of data in support of (5) management’s decision-making process”

Explanation / Answer

Data warehouse:-

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process.

Subject-oriented:-
A data warehouse is organized around major subjects, such as customer, supplier, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, it focuses on modeling and analyzing data for decision makers.

Integrated:-
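A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied so that naming conventions, encodings, and attribute measures are consistent across those sources.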
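As a rough illustration of what integration involves, the sketch below merges rows from two sources into one consistent schema. The source names, field names, and gender codes are all hypothetical, chosen only to show inconsistent naming, encoding, and units being reconciled.

```python
# Hypothetical rows from two operational sources with inconsistent conventions.
crm_rows = [
    {"cust_id": 1, "gender": "M", "revenue_usd": 120.0},
    {"cust_id": 2, "gender": "F", "revenue_usd": 80.5},
]
billing_rows = [
    {"customer": "3", "sex": "male", "revenue_cents": 4550},
]

# One agreed encoding for the warehouse (an assumption for this sketch).
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def from_crm(row):
    """Map a CRM row onto the unified warehouse schema."""
    return {
        "customer_id": int(row["cust_id"]),
        "gender": GENDER_CODES[row["gender"].lower()],
        "revenue_usd": float(row["revenue_usd"]),
    }


def from_billing(row):
    """Map a billing row onto the same schema, converting cents to dollars."""
    return {
        "customer_id": int(row["customer"]),
        "gender": GENDER_CODES[row["sex"].lower()],
        "revenue_usd": row["revenue_cents"] / 100,
    }


def integrate(crm, billing):
    """Combine both sources into one list with consistent names and units."""
    unified = [from_crm(r) for r in crm] + [from_billing(r) for r in billing]
    return sorted(unified, key=lambda r: r["customer_id"])


warehouse_rows = integrate(crm_rows, billing_rows)
```

After this step every row uses the same key name, the same gender code, and the same currency unit, regardless of which source it came from.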

Time-variant:-
Data are stored to provide information from a historical perspective. Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.
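One way to picture a time element inside the key is a table keyed by both an entity and a date. The sketch below is only an assumption about how that might look; `price_history`, `record_snapshot`, and `price_as_of` are hypothetical names, not part of any particular warehouse product.

```python
from datetime import date

# Keyed by (product_id, snapshot_date): the time element is explicit in the key,
# so older values are kept alongside newer ones instead of being overwritten.
price_history = {}


def record_snapshot(product_id, snapshot_date, price):
    """Store the price of a product as observed on a given date."""
    price_history[(product_id, snapshot_date)] = price


def price_as_of(product_id, as_of):
    """Return the most recent recorded price on or before `as_of`, or None."""
    candidates = [
        (snap_date, price)
        for (pid, snap_date), price in price_history.items()
        if pid == product_id and snap_date <= as_of
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]


record_snapshot("P1", date(2023, 1, 1), 9.99)
record_snapshot("P1", date(2024, 1, 1), 11.49)
```

Because both snapshots are retained, a question such as "what was the price in mid-2023?" can still be answered after the price has changed.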

Non-volatile:-
A data warehouse is always a physically separate store of data, transformed from the application data found in the operational environment. Because of this separation, it does not require transaction processing, recovery, or concurrency control mechanisms. Data access usually involves only two operations: the initial loading of data and read access to it.
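Reading non-volatile as "loaded, then only read", a minimal sketch of such a store exposes a load operation and a query operation and nothing else. The class `WarehouseStore` and its methods are hypothetical, written only to illustrate the idea.

```python
class WarehouseStore:
    """Toy store that supports loading and reading, but no update or delete."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Append copies of rows taken from the operational environment."""
        self._rows.extend(dict(r) for r in rows)

    def query(self, predicate):
        """Return copies of matching rows so callers cannot alter stored data."""
        return [dict(r) for r in self._rows if predicate(r)]


store = WarehouseStore()
store.load([
    {"subject": "sales", "year": 2023, "amount": 100},
    {"subject": "sales", "year": 2024, "amount": 150},
])

recent = store.query(lambda r: r["year"] >= 2024)
recent[0]["amount"] = 0  # changes only the returned copy, not the stored row
```

With no in-place updates, the store has no need for the locking or rollback machinery that an operational database relies on.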