Question
can you do it on R?

For this exercise use the ptitanic data from the rpart.plot package. (The rpart.plot package depends on the rpart package.) Use ?rpart.plot::ptitanic to learn about this dataset. We will use logistic regression to help predict which passengers aboard the Titanic will survive based on various attributes.

# install.packages("rpart")
# install.packages("rpart.plot")
library(rpart)
library(rpart.plot)
data("ptitanic")

For simplicity, we will remove any observations with missing data. Additionally, we will create a test and train dataset.

ptitanic <- na.omit(ptitanic)
set.seed(42)
trn_idx <- sample(nrow(ptitanic), 300)
ptitanic_trn <- ptitanic[trn_idx, ]
ptitanic_tst <- ptitanic[-trn_idx, ]

(a) Consider the model

log( p(x) / (1 - p(x)) ) = beta0 + beta1*x1 + beta2*x2 + beta3*x3 + beta4*x4

where p(x) is the probability that a certain passenger survives given their attributes and
- x1 is a dummy variable that takes the value 1 if a passenger was 2nd class.
- x2 is a dummy variable that takes the value 1 if a passenger was 3rd class.
- x3 is a dummy variable that takes the value 1 if a passenger was male.
- x4 is the age in years of a passenger.

Fit this model to the training data and report its deviance.

(b) Use the model fit in (a) and an appropriate statistical test to determine if class played a significant role in surviving on the Titanic. Use alpha = 0.01. Report:
- The null hypothesis of the test
- The test statistic of the test
- The p-value of the test
- A statistical decision
- A practical conclusion

(c) Use the model fit in (a) and an appropriate statistical test to determine if an interaction between age and sex played a significant role in surviving on the Titanic. Use alpha = 0.01. Report:
- The null hypothesis of the test
- The test statistic of the test
- The p-value of the test
- A statistical decision
- A practical conclusion

(d) Use the model fit in (a) as a classifier that seeks to minimize the misclassification rate. Classify each of the passengers in the test dataset.
Report the misclassification rate, the sensitivity, and the specificity of this classifier. (Use survived as the positive class.)

Explanation / Answer
Call:
glm(formula = y ~ x1 + x2 + x3 + x4, family = binomial, data = ptitanic_trn)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3596 -0.6763 -0.4272 0.6918 2.4098
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.662144 0.392065 9.341 < 2e-16 ***
x1 -1.449896 0.269336 -5.383 7.32e-08 ***
x2 -2.379391 0.270474 -8.797 < 2e-16 ***
x3 -2.472139 0.197830 -12.496 < 2e-16 ***
x4 -0.037678 0.007914 -4.761 1.93e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1005.42 on 745 degrees of freedom
Residual deviance: 692.38 on 741 degrees of freedom
AIC: 702.38
Number of Fisher Scoring iterations: 4
The deviance of the model in (a) is its residual deviance: 692.38.

b)

H0: beta1 = beta2 = 0 (class has no effect on survival)
H1: at least one of beta1, beta2 is non-zero

Strictly, this joint null calls for a likelihood-ratio test comparing nested models, but the individual Wald tests in the summary above already settle the question: at alpha = 0.01 both x1 (p = 7.32e-08) and x2 (p < 2e-16) are significant. We reject H0 and conclude that passenger class played a significant role in survival on the Titanic.
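The joint test can be carried out formally as a likelihood-ratio test with anova(); a minimal sketch, assuming the ptitanic_trn data frame and the model object built in the code at the end of this answer:

```r
# Likelihood-ratio test for H0: beta1 = beta2 = 0 (no class effect).
# Fit the reduced model without the class dummies, then compare deviances.
model_noclass <- glm(y ~ x3 + x4, data = ptitanic_trn, family = binomial)
anova(model_noclass, model, test = "LRT")
# The test statistic is the drop in deviance, chi-square on 2 df;
# at alpha = 0.01 the critical value is qchisq(0.99, df = 2), about 9.21.
```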
c)
We add an interaction term x3:x4 and refit the model:
Call:
glm(formula = y ~ x1 + x2 + x3 + x4 + x3:x4, family = binomial,
data = ptitanic_trn)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3900 -0.6734 -0.3874 0.7212 2.6433
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.97410 0.43272 6.873 6.28e-12 ***
x1 -1.61137 0.29040 -5.549 2.88e-08 ***
x2 -2.50019 0.28542 -8.760 < 2e-16 ***
x3 -1.00677 0.43346 -2.323 0.02020 *
x4 -0.00709 0.01161 -0.610 0.54156
x3:x4 -0.05450 0.01500 -3.633 0.00028 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1005.42 on 745 degrees of freedom
Residual deviance: 678.48 on 740 degrees of freedom
AIC: 690.48
Number of Fisher Scoring iterations: 5
The coefficient on x3:x4 has p = 0.00028 < 0.01, so we reject the null of no interaction: the interaction between age and sex played a significant role in survival.
The AIC also decreases (690.48 vs. 702.38), so the interaction model is preferred on that criterion as well.
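The same comparison can be made formally with a likelihood-ratio test between the two fitted models; a sketch, assuming model and ptitanic_trn from the code below:

```r
# Likelihood-ratio test for H0: no age:sex interaction (coefficient is 0).
model_int <- glm(y ~ x1 + x2 + x3 + x4 + x3:x4,
                 data = ptitanic_trn, family = binomial)
anova(model, model_int, test = "LRT")
# From the summaries above, the deviance drops 692.38 - 678.48 = 13.9
# on 1 df, which exceeds qchisq(0.99, df = 1), about 6.63, so reject H0.
```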
d)
          predicted
actual      0    1
   0      143   30
   1       36   91
accuracy = (143 + 91) / 300 = 0.78, so the misclassification rate = 66 / 300 = 0.22
precision = 91 / (30 + 91) = 91 / 121 = 0.752
sensitivity (recall, with survived as the positive class) = 91 / (36 + 91) = 91 / 127 = 0.717
specificity = 143 / (143 + 30) = 143 / 173 = 0.827
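The requested metrics follow directly from the confusion matrix above (positive class = survived = 1); a self-contained check:

```r
# Counts read off the confusion matrix (rows = actual, cols = predicted)
tn <- 143; fp <- 30   # actual 0
fn <- 36;  tp <- 91   # actual 1
misclass    <- (fp + fn) / (tn + fp + fn + tp)  # 66 / 300  = 0.22
sensitivity <- tp / (tp + fn)                   # 91 / 127, about 0.717
specificity <- tn / (tn + fp)                   # 143 / 173, about 0.827
c(misclass = misclass, sensitivity = sensitivity, specificity = specificity)
```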
code

library(rpart)
library(rpart.plot)
data("ptitanic")

# remove observations with missing data
ptitanic <- na.omit(ptitanic)

# create dummy variables
ptitanic$x1 <- ifelse(ptitanic$pclass == "2nd", 1, 0)
ptitanic$x2 <- ifelse(ptitanic$pclass == "3rd", 1, 0)
ptitanic$x3 <- ifelse(ptitanic$sex == "male", 1, 0)
ptitanic$x4 <- ptitanic$age
ptitanic$y  <- ifelse(ptitanic$survived == "survived", 1, 0)

# train/test split
set.seed(42)
indx <- sample(nrow(ptitanic), 300)
ptitanic_trn <- ptitanic[-indx, ]
ptitanic_tst <- ptitanic[indx, ]

# fit the logistic regression model
model <- glm(y ~ x1 + x2 + x3 + x4, data = ptitanic_trn, family = binomial)
summary(model)

# classify the test set with a 0.5 cutoff and build the confusion matrix
y_pred <- predict(model, ptitanic_tst, type = "response")
table(ptitanic_tst$y, ifelse(y_pred > 0.5, 1, 0))