45256 Please solve using XLminer in Excel, the problem I keep having is that my

ID: 3324211 • Letter: 4

Question

Please solve using XLMiner in Excel. The problem I keep having is that my validation and training sets produce 0's every time, and they don't seem to be sorting the information.

A. Develop the best logistic regression model that can predict the wage (X1) using any combination of the following variables: total unit (X2), constructed unit (X3), equipment used (X4), city location (X5), and total cost of a project (X6). Make sure that you partition your data into a 60% training set and a 40% validation set, with the default seed of 12345, before running the logistic regression (15 points). Solve using XLMiner Data Partition and Logistic Regression.

B. Explain the results of your logistic regression model (i.e., p-value, R-square, coefficient, training set accuracy, and validation set accuracy) (10 points).

Wage (X1)  Total Unit (X2)  Contracted Units (X3)  Equipment Used (X4)  City Location (X5)  Total Cost (X6)
0  50  5  2  1  83680
1  25  2  3  1  73604
0  55  1  2  1  101562
0  68  3  2  1  91055
1  35  3  2  1  41790
0  24  2  2  1  75770
1  12  2  4  1  37420
0  20  1  2  1  58000
1  48  2  2  1  97800
0  36  2  3  1  73960
0  40  1  2  1  98720
1  39  4  2  1  54190
0  26  1  1  1  67800
1  25  1  4  1  66760
0  70  3  3  2  88055
0  36  2  2  2  68045
0  68  5  2  2  104580
0  68  2  1  2  93780
0  12  2  2  2  30000
0  12  1  2  2  53900
0  50  2  2  2  90800
0  180  10  2  2  259420
0  70  5  2  2  107385
0  212  6  2  2  274755
1  150  5  2  2  212800
1  30  2  2  2  74700
0  2  0.2  2  2  16564
1  56  4  1  2  92952
0  10  0.5  2  3  25634
1  15  2  2  3  67280
0  160  5  2  3  182055
0  64  5  2  3  68168
0  35  1  2  3  48240
0  33  1  2  3  45487
1  18  4  2  3  45140
0  150  8  2  3  104000
0  6  2  2  3  4240
0  30  1  2  3  45860
0  65  3  3  3  48920
1  15  2  2  3  57060
0  80  2  2  3  89150
0  75  4  2  3  86250
0  12  1  3  3  65210
0  68  3  3  4  91450
0  10  2  2  4  31160
0  48  3  2  4  90140
0  75  4  3  4  72000
1  10  1  2  4  28668
1  50  5  2  4  83600
0  40  1  3  4  107742
0  36  3  2  4  42400
0  60  5  2  4  57350
1  36  2  2  4  58460
0  102  3  3  4  105680
0  75  3  2  4  93000
1  24  0.5  3  4  55495
0  40  2  3  5  56875
0  75  3  2  5  92460
0  40  1  2  5  45800
0  43  5  2  5  57650
0  40  2  2  5  68500
0  12  0.5  2  5  23000
0  18  2  2  5  48770
0  65  2  5  5  68925
0  100  5  2  5  105000
1  28  4  2  5  68530
0  53  2  3  5  79800
0  100  5  2  5  68160
0  150  5  2  5  185060
0  35  3  4  5  59800
0  80  1  3  5  79128
0  48  2  2  5  50670
0  120  3  2  5  154628
0  210  6  4  5  186500
1  48  2  2  5  59760
0  10  5  2  5  35600
0  20  1  3  5  46425
0  25  1  4  5  45200
1  140  3  2  6  90000
0  25  2  1  6  49560
0  45  2  2  6  56870
0  75  3  2  6  96200
0  42  4  4  6  56550
0  60  5  2  6  95420
0  80  2  2  6  101545
0  12  1  2  6  28153
0  35  1  2  6  68520
0  75  3  3  6  97822
0  68  5  4  6  86250
0  60  3  3  6  87230
0  44  1  2  6  45920
0  62  4  3  6  75910
0  60  3  5  6  86280
1  250  2  2  6  255455
0  50  2  2  6  86480
0  78  1  2  6  85240
1  36  3  2  6

Explanation / Answer

A. R code:

> library(caTools) # provides sample.split
> data=read.csv(file.choose(),header=T)
> names(data)
[1] "X1" "X2" "X3" "X4" "X5" "X6"
> set.seed(12345)
> split=sample.split(data,SplitRatio=0.6)
> train=subset(data,split==TRUE)
> test=subset(data,split==FALSE)
> #logistic regression model
> model=glm(X1~X2+X3+X4+X5+X6,data=train,family=binomial)
> summary(model)

Call:
glm(formula = X1 ~ X2 + X3 + X4 + X5 + X6, family = binomial,
data = train)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.1664  -0.7418  -0.5801  -0.2900   2.3097

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.320e-01  1.667e+00   0.139    0.889
X2          -1.352e-02  2.000e-02  -0.676    0.499
X3           3.392e-01  2.583e-01   1.314    0.189
X4          -3.250e-01  5.705e-01  -0.570    0.569
X5          -1.273e-01  2.187e-01  -0.582    0.560
X6          -5.986e-06  1.846e-05  -0.324    0.746

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 54.553 on 48 degrees of freedom
Residual deviance: 50.122 on 43 degrees of freedom
AIC: 62.122

Number of Fisher Scoring iterations: 4
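
The partition step above passes the whole data frame to sample.split. As far as I understand the caTools documentation, its first argument is meant to be a vector of labels such as data$X1, so the resulting mask may not have one entry per row. A minimal base-R sketch of a 60/40 split that keeps the 0/1 proportions similar in both parts is shown here. The helper name stratified_split is my own, and the small data frame holds only X1 and X2 from the first 28 rows of the question's table, purely for illustration.

```r
# Stratified train/validation split on the outcome vector (base R only).
# stratified_split is a hypothetical helper, not part of any package.
stratified_split <- function(y, ratio = 0.6, seed = 12345) {
  set.seed(seed)
  in_train <- logical(length(y))          # one TRUE/FALSE per observation
  for (cls in unique(y)) {
    idx <- which(y == cls)                # row positions of this class
    n_train <- round(ratio * length(idx)) # rows of this class sent to training
    chosen <- idx[sample.int(length(idx), n_train)]
    in_train[chosen] <- TRUE
  }
  in_train
}

# X1 and X2 from the first 28 rows of the question's table (illustration only)
d <- data.frame(
  X1 = c(0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1),
  X2 = c(50, 25, 55, 68, 35, 24, 12, 20, 48, 36, 40, 39, 26, 25,
         70, 36, 68, 68, 12, 12, 50, 180, 70, 212, 150, 30, 2, 56)
)

in_train <- stratified_split(d$X1, ratio = 0.6, seed = 12345)
train <- d[in_train, ]
valid <- d[!in_train, ]

# Share of 1s in each part; with stratification they should be close
print(c(train = mean(train$X1), valid = mean(valid$X1)))
print(c(n_train = nrow(train), n_valid = nrow(valid)))
```

With the full data set, the same mask could be passed to subset() in place of split before fitting glm. The rows selected will not match XLMiner's partition even with the same seed, since the two programs use different random number generators.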

B.

> anova(model,test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: X1

Terms added sequentially (first to last)


     Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL                    48     54.553
X2    1  1.44177        47     53.111   0.2299
X3    1  2.00638        46     51.105   0.1566
X4    1  0.60134        45     50.503   0.4381
X5    1  0.27562        44     50.228   0.5996
X6    1  0.10535        43     50.122   0.7455
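
Part B also asks about R-square and the coefficients. glm does not print an R-square, so the sketch below computes one common substitute, McFadden's pseudo R-square, from the null and residual deviances printed above, together with an overall likelihood-ratio test and the odds ratios implied by the coefficients. All numbers are copied from the output above; choosing McFadden's measure is my assumption, and XLMiner may report a differently defined R-square.

```r
# Values copied from the printed summary() and anova() output above
null_dev  <- 54.553; null_df  <- 48
resid_dev <- 50.122; resid_df <- 43

# McFadden's pseudo R-square; for a 0/1 response the deviance equals
# -2 * log-likelihood, so a ratio of deviances can be used directly
mcfadden_r2 <- 1 - resid_dev / null_dev

# Likelihood-ratio test of the full model against the intercept-only model
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# Odds ratios: multiplicative change in the odds of X1 = 1 per unit increase
coefs <- c(X2 = -1.352e-02, X3 = 3.392e-01, X4 = -3.250e-01,
           X5 = -1.273e-01, X6 = -5.986e-06)
odds_ratios <- exp(coefs)

print(round(mcfadden_r2, 4))
print(c(statistic = lr_stat, df = lr_df, p_value = lr_p))
print(round(odds_ratios, 4))
```

A pseudo R-square close to 0 and a large p-value for the likelihood-ratio test would both indicate that the five predictors together explain little of the variation in X1.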

> fit=predict(model,newdata=test,type='response') #prediction based on test data set

> fit=ifelse(fit>0.5,1,0)
> misclasificerror=mean(fit!=test$X1)
> print(paste('ACCURACY',1-misclasificerror))
[1] "ACCURACY 0.791666666666667"

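Validation-set accuracy (0.5 cutoff) = 79.167%. In the model summary, no coefficient is significant at the 5% level: the smallest p-value is 0.189 (X3).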
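
An accuracy figure is easier to judge next to a confusion matrix and the accuracy of always predicting the more common class, which may also be relevant to the predictions coming out as all 0's. The sketch below is one way to compute all three. The helper name evaluate_cutoff is my own, and the probabilities and labels are made-up illustration values, not output from the fitted model.

```r
# Accuracy, confusion matrix, and majority-class baseline for 0/1 predictions.
# evaluate_cutoff is a hypothetical helper, not part of any package.
evaluate_cutoff <- function(prob, actual, cutoff = 0.5) {
  pred <- ifelse(prob > cutoff, 1, 0)
  list(
    accuracy  = mean(pred == actual),
    # fixed factor levels keep the table 2 x 2 even if one class is never predicted
    confusion = table(actual    = factor(actual, levels = c(0, 1)),
                      predicted = factor(pred,   levels = c(0, 1))),
    # accuracy obtained by always predicting the more common class
    baseline  = max(mean(actual == 0), mean(actual == 1))
  )
}

# Made-up values for illustration only
actual <- c(0, 0, 1, 0, 1, 0, 0, 0)
prob   <- c(0.10, 0.30, 0.45, 0.20, 0.60, 0.15, 0.55, 0.25)

res <- evaluate_cutoff(prob, actual)
print(res$accuracy)
print(res$confusion)
print(res$baseline)
```

Calling the helper with predict(model, newdata=train, type='response') and train$X1 would give the training-set accuracy that part B asks for, and the same call with test gives the validation figure. If the accuracy is no higher than the baseline, the model is doing no better than predicting 0 for every project.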