A. Develop the best logistic regression model that can predict the wage X1
Question
A. Develop the best logistic regression model that can predict the wage X1 by using any combination of the following variables: total unit (X2), constructed unit (X3), equipment used (X4), city location (X5) and total cost of a project (X6). Make sure that you partition your data into a 60% training set and a 40% validation set, with the default seed of 12345, before running the logistic regression (15 points).
B. Explain the results of your logistic regression model (i.e., p-value, R-square, coefficient, training set accuracy, and validation set accuracy) (10 points).
Wage (X1), Total Unit (X2), Contracted Units (X3), Equipment Used (X4), City Location (X5), Total Cost (X6)
X1  X2   X3   X4  X5  X6
0   50   5    2   1   83680
1   25   2    3   1   73604
0   55   1    2   1   101562
0   68   3    2   1   91055
1   35   3    2   1   41790
0   24   2    2   1   75770
1   12   2    4   1   37420
0   20   1    2   1   58000
1   48   2    2   1   97800
0   36   2    3   1   73960
0   40   1    2   1   98720
1   39   4    2   1   54190
0   26   1    1   1   67800
1   25   1    4   1   66760
0   70   3    3   2   88055
0   36   2    2   2   68045
0   68   5    2   2   104580
0   68   2    1   2   93780
0   12   2    2   2   30000
0   12   1    2   2   53900
0   50   2    2   2   90800
0   180  10   2   2   259420
0   70   5    2   2   107385
0   212  6    2   2   274755
1   150  5    2   2   212800
1   30   2    2   2   74700
0   2    0.2  2   2   16564
1   56   4    1   2   92952
0   10   0.5  2   3   25634
1   15   2    2   3   67280
0   160  5    2   3   182055
0   64   5    2   3   68168
0   35   1    2   3   48240
0   33   1    2   3   45487
1   18   4    2   3   45140
0   150  8    2   3   104000
0   6    2    2   3   4240
0   30   1    2   3   45860
0   65   3    3   3   48920
1   15   2    2   3   57060
0   80   2    2   3   89150
0   75   4    2   3   86250
0   12   1    3   3   65210
0   68   3    3   4   91450
0   10   2    2   4   31160
0   48   3    2   4   90140
0   75   4    3   4   72000
1   10   1    2   4   28668
1   50   5    2   4   83600
0   40   1    3   4   107742
0   36   3    2   4   42400
0   60   5    2   4   57350
1   36   2    2   4   58460
0   102  3    3   4   105680
0   75   3    2   4   93000
1   24   0.5  3   4   55495
0   40   2    3   5   56875
0   75   3    2   5   92460
0   40   1    2   5   45800
0   43   5    2   5   57650
0   40   2    2   5   68500
0   12   0.5  2   5   23000
0   18   2    2   5   48770
0   65   2    5   5   68925
0   100  5    2   5   105000
1   28   4    2   5   68530
0   53   2    3   5   79800
0   100  5    2   5   68160
0   150  5    2   5   185060
0   35   3    4   5   59800
0   80   1    3   5   79128
0   48   2    2   5   50670
0   120  3    2   5   154628
0   210  6    4   5   186500
1   48   2    2   5   59760
0   10   5    2   5   35600
0   20   1    3   5   46425
0   25   1    4   5   45200
1   140  3    2   6   90000
0   25   2    1   6   49560
0   45   2    2   6   56870
0   75   3    2   6   96200
0   42   4    4   6   56550
0   60   5    2   6   95420
0   80   2    2   6   101545
0   12   1    2   6   28153
0   35   1    2   6   68520
0   75   3    3   6   97822
0   68   5    4   6   86250
0   60   3    3   6   87230
0   44   1    2   6   45920
0   62   4    3   6   75910
0   60   3    5   6   86280
1   250  2    2   6   255455
0   50   2    2   6   86480
0   78   1    2   6   85240
1   36   3    2   6   45256
Explanation / Answer
A. R code:
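> library(caTools)  # provides sample.split()
> data=read.csv(file.choose(),header=TRUE)
> names(data)
[1] "X1" "X2" "X3" "X4" "X5" "X6"
> set.seed(12345)
> split=sample.split(data,SplitRatio=0.6)  # caution: sample.split() expects the outcome vector (e.g. data$X1); given a data frame it returns one flag per column (6 flags), which subset() recycles down the rows, so this is not a 60/40 row split
> train=subset(data,split==TRUE)
> test=subset(data,split==FALSE)
> # logistic regression with all five predictors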
> model=glm(X1~X2+X3+X4+X5+X6,data=train,family=binomial)
> summary(model)
Call:
glm(formula = X1 ~ X2 + X3 + X4 + X5 + X6, family = binomial,
data = train)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.1664 -0.7418 -0.5801 -0.2900 2.3097
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.320e-01 1.667e+00 0.139 0.889
X2 -1.352e-02 2.000e-02 -0.676 0.499
X3 3.392e-01 2.583e-01 1.314 0.189
X4 -3.250e-01 5.705e-01 -0.570 0.569
X5 -1.273e-01 2.187e-01 -0.582 0.560
X6 -5.986e-06 1.846e-05 -0.324 0.746
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 54.553 on 48 degrees of freedom
Residual deviance: 50.122 on 43 degrees of freedom
AIC: 62.122
Number of Fisher Scoring iterations: 4
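Part B asks for the coefficients to be explained. Logistic-regression estimates are changes in log-odds, so exponentiating them gives odds ratios, which are easier to read. The sketch below hardcodes the estimates and standard errors from the summary above and builds Wald 95% intervals; with the fitted object in the session, `exp(coef(model))` and `exp(confint.default(model))` should give the same numbers. The names `est`, `se` and `odds` are illustrative.

```r
# Estimates and standard errors copied from summary(model) above
est <- c(X2 = -1.352e-02, X3 = 3.392e-01, X4 = -3.250e-01,
         X5 = -1.273e-01, X6 = -5.986e-06)
se  <- c(X2 = 2.000e-02, X3 = 2.583e-01, X4 = 5.705e-01,
         X5 = 2.187e-01, X6 = 1.846e-05)

z <- qnorm(0.975)  # about 1.96 for a 95% Wald interval

# Odds ratio = exp(coefficient); interval = exp(estimate +/- z * SE)
odds <- data.frame(
  odds_ratio = exp(est),
  lower_95   = exp(est - z * se),
  upper_95   = exp(est + z * se),
  row.names  = names(est)
)
print(round(odds, 4))

# X6 changes by tiny amounts per unit, so report the odds ratio per 10,000 increase
print(exp(10000 * est[["X6"]]))
```

Every interval contains 1, which matches the predictor p-values all being above 0.05. For example, X3 has an odds ratio of about 1.40 (each extra contracted unit multiplies the odds of X1 = 1 by about 1.4), but its interval of roughly 0.85 to 2.33 is too wide to rule out no effect.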
B.
> anova(model,test="Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: X1
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 48 54.553
X2 1 1.44177 47 53.111 0.2299
X3 1 2.00638 46 51.105 0.1566
X4 1 0.60134 45 50.503 0.4381
X5 1 0.27562 44 50.228 0.5996
X6 1 0.10535 43 50.122 0.7455
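Logistic regression has no ordinary R-square; a common substitute is McFadden's pseudo R-square, 1 - (residual deviance / null deviance). Using the deviances printed above, the sketch below computes it, runs the likelihood-ratio test of all five predictors together, and checks the reported AIC (residual deviance plus twice the number of coefficients). On the fitted object, `1 - model$deviance / model$null.deviance` and `AIC(model)` give the same quantities; the variable names here are illustrative.

```r
# Deviances and degrees of freedom copied from summary(model) / anova(model)
null_dev  <- 54.553; null_df  <- 48L
resid_dev <- 50.122; resid_df <- 43L

# McFadden pseudo R-square: share of the null deviance removed by the predictors
mcfadden_r2 <- 1 - resid_dev / null_dev

# Likelihood-ratio (deviance) test of X2..X6 jointly against an intercept-only model
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# Consistency check: AIC = residual deviance + 2 * (number of coefficients)
k   <- 6L  # intercept + X2, X3, X4, X5, X6
aic <- resid_dev + 2 * k

cat(sprintf("McFadden R^2 = %.3f\nLR chi-square = %.3f on %d df, p = %.3f\nAIC = %.3f\n",
            mcfadden_r2, lr_stat, lr_df, lr_p, aic))
```

With these numbers the pseudo R-square is about 0.08 and the joint test gives p of about 0.49, so the five predictors together explain little of X1 and do not improve significantly on an intercept-only model.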
> fit=predict(model,newdata=test,type='response')  # predicted P(X1 = 1) for the validation (test) set
> fit=ifelse(fit>0.5,1,0)  # classify with a 0.5 cutoff
> misclass_error=mean(fit!=test$X1)
> print(paste('ACCURACY',1-misclass_error))
[1] "ACCURACY 0.791666666666667"
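Validation-set accuracy = 79.17%: with a 0.5 cutoff, about 79% of the validation rows are classified correctly. This should be read against the class balance: 76 of the 97 projects have X1 = 0, so always predicting 0 would be right about 78% of the time on the full data. None of the predictors is significant at the 5% level: their p-values range from 0.189 (X3) to 0.746 (X6) in the coefficient table and from 0.157 to 0.746 in the sequential deviance table.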
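The transcript's split gave 49 training and 48 validation rows rather than 60/40, and it reports only validation accuracy. Below is a minimal sketch of a row-level 60/40 partition that also reports training accuracy, assuming the data are entered directly from the table in the question (97 rows) instead of through `file.choose()`. It uses base R's `sample()` on row indices; `caTools::sample.split(projects$X1, SplitRatio = 0.6)` is an alternative that keeps the share of X1 = 1 similar in both sets. `set.seed(12345)` makes the R split repeatable, but it will not reproduce the partition another tool draws with its own seed of 12345, so the accuracies will differ from the transcript. The names `projects`, `accuracy`, `train_idx` and `valid` are illustrative.

```r
# Data entered from the table in the question: 97 projects, grouped by city (X5 = 1..6)
projects <- data.frame(
  X1 = c(0,1,0,0,1,0,1,0,1,0,0,1,0,1,
         0,0,0,0,0,0,0,0,0,0,1,1,0,1,
         0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,
         0,0,0,0,1,1,0,0,0,1,0,0,1,
         0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,
         1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1),
  X2 = c(50,25,55,68,35,24,12,20,48,36,40,39,26,25,
         70,36,68,68,12,12,50,180,70,212,150,30,2,56,
         10,15,160,64,35,33,18,150,6,30,65,15,80,75,12,
         68,10,48,75,10,50,40,36,60,36,102,75,24,
         40,75,40,43,40,12,18,65,100,28,53,100,150,35,80,48,120,210,48,10,20,25,
         140,25,45,75,42,60,80,12,35,75,68,60,44,62,60,250,50,78,36),
  X3 = c(5,2,1,3,3,2,2,1,2,2,1,4,1,1,
         3,2,5,2,2,1,2,10,5,6,5,2,0.2,4,
         0.5,2,5,5,1,1,4,8,2,1,3,2,2,4,1,
         3,2,3,4,1,5,1,3,5,2,3,3,0.5,
         2,3,1,5,2,0.5,2,2,5,4,2,5,5,3,1,2,3,6,2,5,1,1,
         3,2,2,3,4,5,2,1,1,3,5,3,1,4,3,2,2,1,3),
  X4 = c(2,3,2,2,2,2,4,2,2,3,2,2,1,4,
         3,2,2,1,2,2,2,2,2,2,2,2,2,1,
         2,2,2,2,2,2,2,2,2,2,3,2,2,2,3,
         3,2,2,3,2,2,3,2,2,2,3,2,3,
         3,2,2,2,2,2,2,5,2,2,3,2,2,4,3,2,2,4,2,2,3,4,
         2,1,2,2,4,2,2,2,2,3,4,3,2,3,5,2,2,2,2),
  X5 = rep(1:6, times = c(14, 14, 15, 13, 22, 19)),
  X6 = c(83680,73604,101562,91055,41790,75770,37420,58000,97800,73960,98720,54190,67800,66760,
         88055,68045,104580,93780,30000,53900,90800,259420,107385,274755,212800,74700,16564,92952,
         25634,67280,182055,68168,48240,45487,45140,104000,4240,45860,48920,57060,89150,86250,65210,
         91450,31160,90140,72000,28668,83600,107742,42400,57350,58460,105680,93000,55495,
         56875,92460,45800,57650,68500,23000,48770,68925,105000,68530,79800,68160,185060,59800,
         79128,50670,154628,186500,59760,35600,46425,45200,
         90000,49560,56870,96200,56550,95420,101545,28153,68520,97822,86250,87230,45920,75910,
         86280,255455,86480,85240,45256)
)

# Random 60/40 split over rows, repeatable within R via set.seed()
set.seed(12345)
n         <- nrow(projects)
train_idx <- sample(seq_len(n), size = round(0.6 * n))  # 58 training rows
train     <- projects[train_idx, ]
valid     <- projects[-train_idx, ]                     # remaining 39 rows

model <- glm(X1 ~ X2 + X3 + X4 + X5 + X6, data = train, family = binomial)

# Share of rows whose 0/1 prediction (probability > cutoff) matches the observed X1
accuracy <- function(fit, newdata, cutoff = 0.5) {
  prob <- predict(fit, newdata = newdata, type = "response")
  mean(as.integer(prob > cutoff) == newdata$X1)
}
train_acc <- accuracy(model, train)
valid_acc <- accuracy(model, valid)

# Validation confusion matrix (rows = observed X1, columns = predicted X1)
valid_pred <- as.integer(predict(model, newdata = valid, type = "response") > 0.5)
print(table(observed = valid$X1, predicted = valid_pred))
cat(sprintf("Training accuracy:   %.3f\nValidation accuracy: %.3f\n", train_acc, valid_acc))
```

Comparing the two accuracies shows whether the model does noticeably better on the rows it was fitted to than on new rows; with only 21 of the 97 projects having X1 = 1, both figures should also be compared with the roughly 78% obtained by always predicting 0.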