


Question

You are given some data by a collaborator and asked to build a two-class classifier with n = 1000 observations and p = 500 features, to predict the risk of a customer defaulting on a loan. Unfortunately, about 25% of the features are missing at random (and not the same 25% for each observation). The result is that nearly every observation has some missing features. How would you deal with this? Suppose you later learn that some of the features, like monthly income, are not missing at random but are more likely to be missing because the mortgage company has lost track of the customer. How would you deal with this issue?

Explanation / Answer

In this case a large share of the data (about 25%) is missing, so first try to recover the missing values from the original source if possible.

We cannot simply delete every row that contains a missing value, because nearly every observation has at least one and we would lose almost all of the data.
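To see how severe the loss would be, here is a small simulation in base R. It is only a sketch: the matrix X is simulated stand-in data, and it assumes each cell is missing independently with probability 0.25, which is one reading of "25% missing at random".

```r
# Simulated stand-in for the collaborator's data: n = 1000 rows, p = 500 features
set.seed(1)
n <- 1000
p <- 500
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Assumption: each cell is missing independently with probability 0.25
X[matrix(runif(n * p) < 0.25, nrow = n)] <- NA

# Number of rows that would survive deleting every row with a missing value
n_complete <- sum(complete.cases(X))

# Probability that a single row has no missing cell at all
p_row_complete <- 0.75^p

print(n_complete)
print(p_row_complete)
```

Under that assumption the chance of a fully observed row is 0.75^500, which is effectively zero, so row deletion is not an option here.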

We could also replace each missing value with the mean or median of its feature, but this is not recommended: it shrinks the variance of the feature and weakens its relationships with the other features.
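The drawback of mean imputation can be checked on a toy example. The sketch below uses a made-up income vector (the name, mean, and spread are illustrative assumptions, not values from the question) and compares the standard deviation before and after filling the gaps with the mean.

```r
# Hypothetical feature; values are simulated for illustration only
set.seed(2)
income <- rnorm(1000, mean = 4000, sd = 1000)

# Remove roughly 25% of the values at random
income_obs <- income
income_obs[runif(1000) < 0.25] <- NA

# Mean imputation: fill every gap with the mean of the observed values
income_mean_imp <- income_obs
income_mean_imp[is.na(income_mean_imp)] <- mean(income_obs, na.rm = TRUE)

# Spread of the observed values versus the mean-imputed vector
sd_observed <- sd(income_obs, na.rm = TRUE)
sd_imputed <- sd(income_mean_imp)

print(c(observed = sd_observed, imputed = sd_imputed))
```

The imputed vector has a smaller standard deviation because every filled value sits exactly at the mean and adds no spread.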

A better option is the R package mice (Multivariate Imputation by Chained Equations).

Install the mice package and run the following code:

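install.packages("mice")  # only needed once

library(mice)

# DataName is a placeholder for your data frame
# m = 5 creates five imputed data sets; seed makes the result reproducible
x1 <- mice(DataName, m = 5, seed = 100)
# complete() returns the first imputed data set by default
x2 <- complete(x1)
View(x2)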

The missing values are now filled in x2.
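For the second part of the question, where monthly income is more likely to be missing when the company has lost track of the customer, the missingness itself may carry information about default risk. One common approach, offered here as a sketch and not as part of the mice workflow above, is to record an indicator column before imputing so the classifier can use it as a feature. The data frame loans and its column names are hypothetical, and the values are made-up toy numbers.

```r
# Hypothetical toy data; column names and values are illustrative only
loans <- data.frame(
  monthly_income = c(3200, NA, 4100, NA, 2800),
  default        = c(0, 1, 0, 1, 0)
)

# 1 when income was not recorded, 0 otherwise; create this before imputing
loans$income_missing <- as.integer(is.na(loans$monthly_income))

# Default rate among rows with (0) and without (1) a recorded income
default_rate_by_missing <- tapply(loans$default, loans$income_missing, mean)

print(loans)
print(default_rate_by_missing)
```

If the default rate differs clearly between the two groups in the real data, that is a sign the missingness is informative, and the indicator should be kept as a predictor alongside the imputed income.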

If you have any questions about this, please comment.
