Question
can you do it on R?

For this exercise use the ptitanic data from the rpart.plot package. (The rpart.plot package depends on the rpart package.) Use ?rpart.plot::ptitanic to learn about this dataset. We will use logistic regression to help predict which passengers aboard the Titanic will survive based on various attributes.

# install.packages("rpart")
# install.packages("rpart.plot")
library(rpart)
library(rpart.plot)
data("ptitanic")

For simplicity, we will remove any observations with missing data. Additionally, we will create a test and train dataset.

ptitanic <- na.omit(ptitanic)
set.seed(42)
trn_idx <- sample(nrow(ptitanic), 300)
ptitanic_trn <- ptitanic[trn_idx, ]
ptitanic_tst <- ptitanic[-trn_idx, ]

(a) Consider the model

log( p(x) / (1 - p(x)) ) = beta0 + beta1*x1 + beta2*x2 + beta3*x3 + beta4*x4

where p(x) is the probability that a certain passenger survives given their attributes and
- x1 is a dummy variable that takes the value 1 if a passenger was 2nd class.
- x2 is a dummy variable that takes the value 1 if a passenger was 3rd class.
- x3 is a dummy variable that takes the value 1 if a passenger was male.
- x4 is the age in years of a passenger.

Fit this model to the training data and report its deviance.

(b) Use the model fit in (a) and an appropriate statistical test to determine if class played a significant role in surviving on the Titanic. Use alpha = 0.01. Report:
- The null hypothesis of the test
- The test statistic of the test
- The p-value of the test
- A statistical decision
- A practical conclusion

(c) Use the model fit in (a) and an appropriate statistical test to determine if an interaction between age and sex played a significant role in surviving on the Titanic. Use alpha = 0.01. Report:
- The null hypothesis of the test
- The test statistic of the test
- The p-value of the test
- A statistical decision
- A practical conclusion

(d) Use the model fit in (a) as a classifier that seeks to minimize the misclassification rate. Classify each of the passengers in the test dataset.
Report the misclassification rate, the sensitivity, and the specificity of this classifier. (Use survived as the positive class.)

Explanation / Answer
Call:
glm(formula = y ~ x1 + x2 + x3 + x4, family = binomial, data = ptitanic_trn)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3596 -0.6763 -0.4272 0.6918 2.4098
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.662144 0.392065 9.341 < 2e-16 ***
x1 -1.449896 0.269336 -5.383 7.32e-08 ***
x2 -2.379391 0.270474 -8.797 < 2e-16 ***
x3 -2.472139 0.197830 -12.496 < 2e-16 ***
x4 -0.037678 0.007914 -4.761 1.93e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1005.42 on 745 degrees of freedom
Residual deviance: 692.38 on 741 degrees of freedom
AIC: 702.38
Number of Fisher Scoring iterations: 4
The deviance of the model in (a) is its residual deviance: 692.38.

b)

H0: beta1 = beta2 = 0 (class has no effect on survival)
H1: at least one of beta1, beta2 is non-zero

Strictly, this joint null calls for a likelihood-ratio test comparing nested models, but the individual Wald tests in the summary above already settle the question: at alpha = 0.01 both x1 (p = 7.32e-08) and x2 (p < 2e-16) are significant. We reject H0 and conclude that passenger class played a significant role in survival on the Titanic.
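The joint test can be carried out formally as a likelihood-ratio test with anova(); a minimal sketch, assuming the ptitanic_trn data frame and the model object built in the code at the end of this answer:

```r
# Likelihood-ratio test for H0: beta1 = beta2 = 0 (no class effect).
# Fit the reduced model without the class dummies, then compare deviances.
model_noclass <- glm(y ~ x3 + x4, data = ptitanic_trn, family = binomial)
anova(model_noclass, model, test = "LRT")
# The test statistic is the drop in deviance, chi-square on 2 df;
# at alpha = 0.01 the critical value is qchisq(0.99, df = 2), about 9.21.
```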
c)
We add an interaction term x3:x4 and refit the model:
Call:
glm(formula = y ~ x1 + x2 + x3 + x4 + x3:x4, family = binomial,
data = ptitanic_trn)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3900 -0.6734 -0.3874 0.7212 2.6433
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.97410 0.43272 6.873 6.28e-12 ***
x1 -1.61137 0.29040 -5.549 2.88e-08 ***
x2 -2.50019 0.28542 -8.760 < 2e-16 ***
x3 -1.00677 0.43346 -2.323 0.02020 *
x4 -0.00709 0.01161 -0.610 0.54156
x3:x4 -0.05450 0.01500 -3.633 0.00028 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1005.42 on 745 degrees of freedom
Residual deviance: 678.48 on 740 degrees of freedom
AIC: 690.48
Number of Fisher Scoring iterations: 5
The coefficient on x3:x4 has p = 0.00028 < 0.01, so we reject the null of no interaction: the interaction between age and sex played a significant role in survival.
The AIC also decreases (690.48 vs. 702.38), so the interaction model is preferred on that criterion as well.
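The same comparison can be made formally with a likelihood-ratio test between the two fitted models; a sketch, assuming model and ptitanic_trn from the code below:

```r
# Likelihood-ratio test for H0: no age:sex interaction (coefficient is 0).
model_int <- glm(y ~ x1 + x2 + x3 + x4 + x3:x4,
                 data = ptitanic_trn, family = binomial)
anova(model, model_int, test = "LRT")
# From the summaries above, the deviance drops 692.38 - 678.48 = 13.9
# on 1 df, which exceeds qchisq(0.99, df = 1), about 6.63, so reject H0.
```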
d)
          predicted
actual      0    1
   0      143   30
   1       36   91
accuracy = (143 + 91) / 300 = 0.78, so the misclassification rate = 66 / 300 = 0.22
precision = 91 / (30 + 91) = 91 / 121 = 0.752
sensitivity (recall, with survived as the positive class) = 91 / (36 + 91) = 91 / 127 = 0.717
specificity = 143 / (143 + 30) = 143 / 173 = 0.827
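The requested metrics follow directly from the confusion matrix above (positive class = survived = 1); a self-contained check:

```r
# Counts read off the confusion matrix (rows = actual, cols = predicted)
tn <- 143; fp <- 30   # actual 0
fn <- 36;  tp <- 91   # actual 1
misclass    <- (fp + fn) / (tn + fp + fn + tp)  # 66 / 300  = 0.22
sensitivity <- tp / (tp + fn)                   # 91 / 127, about 0.717
specificity <- tn / (tn + fp)                   # 143 / 173, about 0.827
c(misclass = misclass, sensitivity = sensitivity, specificity = specificity)
```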
code

library(rpart)
library(rpart.plot)
data("ptitanic")

# remove observations with missing data
ptitanic <- na.omit(ptitanic)

# create dummy variables
ptitanic$x1 <- ifelse(ptitanic$pclass == "2nd", 1, 0)
ptitanic$x2 <- ifelse(ptitanic$pclass == "3rd", 1, 0)
ptitanic$x3 <- ifelse(ptitanic$sex == "male", 1, 0)
ptitanic$x4 <- ptitanic$age
ptitanic$y  <- ifelse(ptitanic$survived == "survived", 1, 0)

# train/test split
set.seed(42)
indx <- sample(nrow(ptitanic), 300)
ptitanic_trn <- ptitanic[-indx, ]
ptitanic_tst <- ptitanic[indx, ]

# fit the logistic regression model
model <- glm(y ~ x1 + x2 + x3 + x4, data = ptitanic_trn, family = binomial)
summary(model)

# classify the test set with a 0.5 cutoff and build the confusion matrix
y_pred <- predict(model, ptitanic_tst, type = "response")
table(ptitanic_tst$y, ifelse(y_pred > 0.5, 1, 0))