Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Data mining has several challenges that can relate to data scientists, predictiv

ID: 3747295 • Letter: D

Question

Data mining has several challenges that can relate to data scientists, predictive modelers, analysts, and other industry-specific. However, organizations try to manage and integrate data issues to excess in daily projects. Some of the Data Mining challenges could be categorized into seven areas as Scalability, Dimensionality, Complex and Heterogeneous Data, Data Quality, Data Ownership and Distribution, Privacy Preservation.

a)Select a case study company of your choice that uses the huge amount of data in their daily operations. (Case study company can relate to health, supply chain, transportation, telecommunication, or other industry).

b)Describe the company mission and objectives.

c)Explain the data mining challenges that the organization faces using the seven areas identified above.

d)Provide strategies on how to handle the challenges with justification to support your ideas.

e)General Paper Format: APA format

Explanation / Answer

Answer)

Data mining is the calculation and the process of identifying patterns in the huge datasets using methods of data analysis, machine learning, database systems calculation, and statistics.

a) A company that uses the huge amount of data in their daily operations is the e-tailer Amazon, who has got GBs of data generated from their customer and their systems. We are just going to focus on customer data here, which is huge.

b) The company mission is to serve as a one of a kind e-commerce leader in the global market by serving its customer's great deals at deal prices. Thus Amazon is committed to serving its customers. Having huge data of the data order, their likes and dislikes and their buying habit generate a huge amount of data to be analyzed by the data mining.

c) Amazon nevertheless uses machine learning and data mining to analyze data and recommend products to the customer according to their buying habits. The challenges faced by the organization are:

Data Quality - bad quality of data is an issue where the collection and usage of the data is a problem.

Complex and Heterogeneous Data - These data cannot be easily analyzed and thus difficult to use.

Privacy Preservation - We have to analyze data using data mining and machine learning and also maintain the privacy of the users, which is lost somewhere in the process.

d) The strategies on how Amazon can handle the challenges are;

i) Simplifying the complicated and Heterogeneous Data

ii) Masking the actual data to keep the privacy of the customers and also acheiving the analysis of the data

iii) Bad quality data should be filtered and made good quality, otherwise they should not be considered.

iv) Data collection methods to be improved.