Data Mining - Error Evaluation Review the Bubba Gump survey in the Final Project
Question
Review the Bubba Gump survey in the Final Project Document for missing data. Do you see any patterns in the missing data? What could the sources of error be? How might different sources of error and missing values impact the quality of data used in data mining activities?
Assignment must follow these formatting guidelines: double spacing, 12-point Times New Roman font, one-inch margins, and APA citations. Page length requirements: 1–2 pages.
first_name  last_name  city         county                state  zip    ZIP_2  Restaurant  RES_VISITS  WEB_PURCH_YN  Webstore_Spend  WEB_VISITS  THIRD_SPEND  THIRD_VISITS  Age  Married_YN  MARR_BIN  Income
Lenna       Paprocki   Anchorage    Anchorage             AK     99501  99     155         2           1             191           3           63           1             28   N           0         42
Roxane      Campain    Fairbanks    Fairbanks North Star  AK     99708  99     60          1           0             0               1           0            0             49   Y           1         112
Erick       Ferencz    Fairbanks    Fairbanks North Star  AK     99712  99     155         2           0             0               0           0            0             44   Y           1         74
Penney      Weight     Anchorage    Anchorage             AK     99515  99     90          1           0             0               0           26           1             19   N           0         47
Wilda       Giguere    Anchorage    Anchorage             AK     99501  99     131         3           1             223           2           0            0             25   Y           1         55
Gail        Kitty      Anchorage    Anchorage             AK     99501  99     106         2           0             0               1           0            0             66   N           0         38
Carin       Deleo      Little Rock  Pulaski               AR     72202  72     40          1           0             0               1           31           1             28   N           0         64
Mattie      Poquette   Phoenix      Maricopa              AZ     85013  85     95          2           0             0               0           0            1             32   Y           1         35
Arminda     Parvis     Phoenix      Maricopa              AZ     85017  85     111         2           0             0               1           0            1             26   Y           1         72
Explanation / Answer
I could identify only one pattern in the missing data. Wilda and Lenna are the only respondents with values for Webstore_Spend (223 and 191, respectively); none of the others has any value in that field. This could be an error of concentrating all of the values on these two people. Other respondents may also have spent money at the webstore, but that spending is not reflected in their records. This is a typical case of incorrect summarizing: the values may have been recorded individually, but during the summarizing process they could have been added to only two people. That would explain why only two people have values while the others have none.
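To make the pattern described above easier to check, here is a minimal Python sketch that counts the zero entries in each spend and visit column and lists which respondents have a nonzero Webstore_Spend. The values are transcribed from the nine survey rows; the helper names `zero_counts` and `nonzero_respondents` are hypothetical, and treating a zero as a possibly unrecorded value is an assumption, since the survey does not say how missing values were coded.

```python
# Values transcribed from the nine rows of the Bubba Gump survey table, in table order.
NAMES = ["Lenna", "Roxane", "Erick", "Penney", "Wilda", "Gail", "Carin", "Mattie", "Arminda"]
COLUMNS = {
    "Restaurant": [155, 60, 155, 90, 131, 106, 40, 95, 111],
    "WEB_PURCH_YN": [1, 0, 0, 0, 1, 0, 0, 0, 0],
    "Webstore_Spend": [191, 0, 0, 0, 223, 0, 0, 0, 0],
    "WEB_VISITS": [3, 1, 0, 0, 2, 1, 1, 0, 1],
    "THIRD_SPEND": [63, 0, 0, 26, 0, 0, 31, 0, 0],
    "THIRD_VISITS": [1, 0, 0, 1, 0, 0, 1, 1, 1],
}


def zero_counts(columns):
    """Count zero entries per column (assumption: a zero may stand in for an unrecorded value)."""
    return {name: sum(1 for value in values if value == 0) for name, values in columns.items()}


def nonzero_respondents(column):
    """Return the first names of respondents with a nonzero value in the given column."""
    return [name for name, value in zip(NAMES, COLUMNS[column]) if value != 0]


if __name__ == "__main__":
    for name, count in zero_counts(COLUMNS).items():
        print(f"{name:<15} zeros: {count} of {len(NAMES)}")
    print("Nonzero Webstore_Spend:", nonzero_respondents("Webstore_Spend"))
```

Here `nonzero_respondents("Webstore_Spend")` returns `["Lenna", "Wilda"]`. Running the same helper on `WEB_PURCH_YN` also returns those two names, so comparing the two columns is a quick way to judge whether a zero in Webstore_Spend means "no purchase" or "value not recorded".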
These kinds of errors affect the quality of data used in data mining activities. Their impacts are as follows:
If resources are allocated based on this data alone, locations whose fields show zero values will not receive any resources, and service at those locations will eventually suffer.
The budgeting process will also be affected by this kind of data. Seeing zero values, decision-makers will not provide proper financial backing and assistance to that location.
Because a well-functioning centre is not given proper resources and budget, it may slip from its original standing in the business. This will hamper the centre's performance, and lower performance will in turn reduce the profits of the company as a whole.
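The allocation and budgeting effects described above can be illustrated with a small sketch. It totals the recorded Webstore_Spend per city from the survey rows, then applies a hypothetical rule that splits a budget in proportion to recorded spend. The 10,000 budget, the proportional rule, and the function names are illustrative assumptions, not anything stated in the survey.

```python
from collections import defaultdict

# (city, Webstore_Spend) pairs transcribed from the nine survey rows.
ROWS = [
    ("Anchorage", 191), ("Fairbanks", 0), ("Fairbanks", 0),
    ("Anchorage", 0), ("Anchorage", 223), ("Anchorage", 0),
    ("Little Rock", 0), ("Phoenix", 0), ("Phoenix", 0),
]


def spend_by_city(rows):
    """Total the recorded Webstore_Spend for each city."""
    totals = defaultdict(int)
    for city, spend in rows:
        totals[city] += spend
    return dict(totals)


def allocate(budget, totals):
    """Hypothetical rule: split a budget in proportion to each city's recorded spend."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Nothing recorded anywhere, so the rule gives every city nothing.
        return {city: 0.0 for city in totals}
    return {city: budget * spend / grand_total for city, spend in totals.items()}


if __name__ == "__main__":
    totals = spend_by_city(ROWS)
    for city, share in allocate(10_000, totals).items():
        print(f"{city:<12} recorded spend {totals[city]:>4}  allocation {share:>9.2f}")
```

With the transcribed rows, Anchorage (414 in recorded webstore spend) receives the entire budget, while Fairbanks, Little Rock, and Phoenix receive nothing because their recorded spend is zero, whether or not those zeros reflect real behavior.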