Question
2. For the prostate data set, fit a model with lpsa as the response, and the other variables as predictors.
(a) Suppose a new patient with the following values arrives: lcavol = 1.45000, lweight = 3.59801, age = 63.00000, lbph = 0.30010, svi = 0.00000, lcp = -0.79851, gleason = 7.00000, pgg45 = 15.00000. Predict the lpsa for this patient along with an appropriate 95% prediction interval.
(b) Repeat the questions in (a) for a patient with the same values except that he is age 20. Explain why the prediction interval is wider.
(c) For the model of the previous question, remove all the predictors that are not significant at the 5% level. Using the reduced model, recompute the predictions for the x values given in the previous questions (a) and (b). Are the new prediction intervals wider or narrower than in parts (a) and (b)? Which predictions would you prefer? Explain.
Explanation / Answer
> install.packages("faraway")   #one-time install; the function is install.packages and the package name is quoted
> library(faraway)
Warning message:
package ‘faraway’ was built under R version 3.4.3
> head(prostate)
      lcavol lweight age      lbph svi      lcp gleason pgg45     lpsa
1 -0.5798185  2.7695  50 -1.386294   0 -1.38629       6     0 -0.43078
2 -0.9942523  3.3196  58 -1.386294   0 -1.38629       6     0 -0.16252
3 -0.5108256  2.6912  74 -1.386294   0 -1.38629       7    20 -0.16252
4 -1.2039728  3.2828  58 -1.386294   0 -1.38629       6     0 -0.16252
5  0.7514161  3.4324  62 -1.386294   0 -1.38629       6     0  0.37156
6 -1.0498221  3.2288  50 -1.386294   0 -1.38629       6     0  0.76547
> data<-as.data.frame(prostate)
> model<-lm(lpsa~.,data=data)
> summary(model)
Call:
lm(formula = lpsa ~ ., data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-1.7331 -0.3713 -0.0170  0.4141  1.6381

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.669337   1.296387   0.516  0.60693
#the remaining rows of this table are printed after the reduced-model summary further down
> #we need to predict lpsa where
> #lcavol=1.45,lweight=3.59801,age=63,lbph=0.3001,svi=0,lcp=-0.79851
> #gleason=7,pgg45=15
> #predict() for lm objects is part of base R (stats package), so no extra package is needed to get intervals
> coeff<-model$coefficients
> coeff
 (Intercept)       lcavol      lweight          age         lbph          svi          lcp
 0.669336698  0.587021826  0.454467424 -0.019637176  0.107054031  0.766157326 -0.105474263
     gleason        pgg45
 0.045141598  0.004525231
> lca=1.45
> lwei=3.59801
> age=63
> lbph=0.3001
> svi=0
> lcp=-0.79851
> gle=7
> ppg=15
> intercept=0.669336698
> #the intercept and the svi term both belong in the linear predictor
> pred_value_lpsa<-(intercept+lca*0.587021826+lwei*0.454467424+age*(-0.019637176)+lbph*0.107054031+svi*0.766157326+lcp*(-0.105474263)+gle*0.045141598+ppg*0.004525231)
> pred_value_lpsa
[1] 2.418773
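The hand calculation can also be written with named vectors, which makes it harder to drop a term such as the intercept by accident. The following is a minimal sketch that hard-codes the coefficients exactly as printed by `coeff`; the names `b`, `x_a`, `x_b` and `linear_predictor` are illustrative and do not come from the original answer.

```r
# Coefficients copied from the printed output of `coeff` (full model)
b <- c("(Intercept)" = 0.669336698, lcavol = 0.587021826, lweight = 0.454467424,
       age = -0.019637176, lbph = 0.107054031, svi = 0.766157326,
       lcp = -0.105474263, gleason = 0.045141598, pgg45 = 0.004525231)

# Patient from part (a); the leading 1 multiplies the intercept
x_a <- c("(Intercept)" = 1, lcavol = 1.45, lweight = 3.59801, age = 63,
         lbph = 0.3001, svi = 0, lcp = -0.79851, gleason = 7, pgg45 = 15)

# Patient from part (b): same values except age 20
x_b <- x_a
x_b["age"] <- 20

# Point prediction = sum of coefficient * value, with entries matched by name
linear_predictor <- function(coefs, x) sum(coefs * x[names(coefs)])

pred_a <- linear_predictor(b, x_a)   # about 2.4188
pred_b <- linear_predictor(b, x_b)   # about 3.2632
print(c(age63 = pred_a, age20 = pred_b))
```

The two predictions differ by 43 times the age coefficient, since only age changes between the patients. This gives point predictions only; the interval still has to come from predict() on the fitted model.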
> #predict() expects the new observation as a data frame passed through the argument newdata;
> #an argument spelled new_data is silently ignored, and predict() then returns intervals
> #for the fitted values of all 97 training rows rather than for the new patient
> #interval="prediction" gives the interval for one new patient; "confidence" would give the narrower interval for the mean response
> new_patient<-data.frame(lcavol=1.45,lweight=3.59801,age=63,lbph=0.3001,svi=0,lcp=-0.79851,gleason=7,pgg45=15)
> pred<-predict(model,newdata=new_patient,interval="prediction",level=0.95)
> pred
#pred has a single row: fit is the point prediction (intercept plus the sum of coefficient*value, about 2.4188 for these values),
#and lwr, upr are the limits of the 95% prediction interval for this patient
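For reference, the two kinds of interval that predict() can return differ by a single term under the square root. A sketch in standard linear-model notation, which is not taken from the original answer: x_0 is the new patient's predictor vector with a leading 1, X is the design matrix of the fitted model, and n - p is the residual degrees of freedom.

```latex
% 95% confidence interval for the mean response at x_0
\hat{y}_0 \pm t_{n-p}^{(0.025)} \, \hat{\sigma} \sqrt{x_0^{T} (X^{T}X)^{-1} x_0}

% 95% prediction interval for a single new response at x_0
\hat{y}_0 \pm t_{n-p}^{(0.025)} \, \hat{\sigma} \sqrt{1 + x_0^{T} (X^{T}X)^{-1} x_0}
```

For the full model the summary output gives n - p = 88 and a residual standard error of 0.7084. The extra 1 accounts for the new patient's own random error, so the prediction interval is always wider than the confidence interval at the same x_0.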
#b)
> #same patient with age 20; again a data frame passed via newdata, with interval="prediction"
> new_patient_20<-data.frame(lcavol=1.45,lweight=3.59801,age=20,lbph=0.3001,svi=0,lcp=-0.79851,gleason=7,pgg45=15)
> pred1<-predict(model,newdata=new_patient_20,interval="prediction",level=0.95)
> pred1
> #check of the point prediction by hand, with the intercept included (the svi term is 0 for this patient)
> age=20
> pred_value_lpsa_newage<-(0.669336698+lca*0.587021826+lwei*0.454467424+age*(-0.019637176)+lbph*0.107054031+lcp*(-0.105474263)+gle*0.045141598+ppg*0.004525231)
> pred_value_lpsa_newage
[1] 3.263172
#The prediction interval is wider than in (a) because age 20 lies far from the ages in the data
#(the rows shown by head(prostate) have ages 50 to 74), so the model is extrapolating and the
#standard error of the fitted value at this point is larger
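The widening of the interval at age 20 can be checked numerically by comparing the standard error of the fitted value for the two patients. This is a minimal sketch that assumes the faraway package is installed; the object names `full`, `patient_a`, `patient_b`, `se_a`, `se_b` and `width` are illustrative.

```r
library(faraway)   # provides the prostate data frame

full <- lm(lpsa ~ ., data = prostate)

# The patients from parts (a) and (b) as one-row data frames
patient_a <- data.frame(lcavol = 1.45, lweight = 3.59801, age = 63, lbph = 0.3001,
                        svi = 0, lcp = -0.79851, gleason = 7, pgg45 = 15)
patient_b <- transform(patient_a, age = 20)

# se.fit is sigma_hat * sqrt(x0' (X'X)^-1 x0); it grows as x0 moves away from the data
se_a <- unname(predict(full, newdata = patient_a, se.fit = TRUE)$se.fit)
se_b <- unname(predict(full, newdata = patient_b, se.fit = TRUE)$se.fit)

# 95% prediction intervals and their widths
pi_a <- predict(full, newdata = patient_a, interval = "prediction", level = 0.95)
pi_b <- predict(full, newdata = patient_b, interval = "prediction", level = 0.95)
width <- c(age63 = unname(pi_a[, "upr"] - pi_a[, "lwr"]),
           age20 = unname(pi_b[, "upr"] - pi_b[, "lwr"]))

print(range(prostate$age))          # observed ages, to see how far 20 lies outside them
print(c(se_a = se_a, se_b = se_b))  # standard error of the fitted value for each patient
print(width)                        # prediction interval widths
```

The residual standard error and the t quantile are the same for both patients, so any difference in width comes entirely from the se.fit term.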
#c)
#the coefficients table of the full model shows which predictors contribute significantly
#at the 5% level only lcavol, lweight and svi qualify (p = 2.11e-09, 0.00896 and 0.00233);
#age (p = 0.082) and lbph (p = 0.070) are significant only at the 10% level
#model_1 below still keeps age and lbph
> model_1<-lm(lpsa~lcavol+lweight+age+lbph+svi,data=data)
> summary(model_1)
Call:
lm(formula = lpsa ~ lcavol + lweight + age + lbph + svi, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-1.83505 -0.39396  0.00414  0.46336  1.57888

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.95100    0.83175   1.143 0.255882
lcavol       0.56561    0.07459   7.583 2.77e-11 ***
lweight      0.42369    0.16687   2.539 0.012814 *
age         -0.01489    0.01075  -1.385 0.169528
lbph         0.11184    0.05805   1.927 0.057160 .
svi          0.72095    0.20902   3.449 0.000854 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7073 on 91 degrees of freedom
Multiple R-squared: 0.6441, Adjusted R-squared: 0.6245
F-statistic: 32.94 on 5 and 91 DF, p-value: < 2.2e-16
#adjusted R squared is 0.6245 here against 0.6234 for the full model, a very small increase
#multiple R squared drops from 0.6548 to 0.6441, as it must when predictors are removed; only the adjusted value can rise
#in model_1 age and lbph are again not significant at 5%, so the reduced model asked for in (c) is lpsa~lcavol+lweight+svi
#that model does not contain age, so it gives the same prediction and interval for the patients in (a) and (b)
#remaining output of summary(model) for the full model; these rows continue the first coefficient table after (Intercept)
            Estimate Std. Error t value Pr(>|t|)
lcavol      0.587022   0.087920   6.677 2.11e-09 ***
lweight     0.454467   0.170012   2.673  0.00896 **
age        -0.019637   0.011173  -1.758  0.08229 .
lbph        0.107054   0.058449   1.832  0.07040 .
svi         0.766157   0.244309   3.136  0.00233 **
lcp        -0.105474   0.091013  -1.159  0.24964
gleason     0.045142   0.157465   0.287  0.77503
pgg45       0.004525   0.004421   1.024  0.30886
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7084 on 88 degrees of freedom
Multiple R-squared: 0.6548, Adjusted R-squared: 0.6234
F-statistic: 20.86 on 8 and 88 DF, p-value: < 2.2e-16
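Part (c) asks for the predictions to be recomputed with only the predictors that are significant at the 5% level in the full model, which according to the coefficient table are lcavol, lweight and svi. The following is a minimal sketch of that comparison, assuming the faraway package is installed; the names `full`, `reduced`, `patients` and `widths` are illustrative and not part of the original answer.

```r
library(faraway)   # provides the prostate data frame

full    <- lm(lpsa ~ ., data = prostate)
# Keep only the predictors with p < 0.05 in the full-model summary
reduced <- lm(lpsa ~ lcavol + lweight + svi, data = prostate)

# Both patients in one data frame: part (a) has age 63, part (b) has age 20
patient_a <- data.frame(lcavol = 1.45, lweight = 3.59801, age = 63, lbph = 0.3001,
                        svi = 0, lcp = -0.79851, gleason = 7, pgg45 = 15)
patients <- rbind(patient_a, transform(patient_a, age = 20))
rownames(patients) <- c("a_age63", "b_age20")

# 95% prediction intervals under each model
pi_full    <- predict(full,    newdata = patients, interval = "prediction", level = 0.95)
pi_reduced <- predict(reduced, newdata = patients, interval = "prediction", level = 0.95)

# Interval widths side by side
widths <- cbind(full    = pi_full[, "upr"]    - pi_full[, "lwr"],
                reduced = pi_reduced[, "upr"] - pi_reduced[, "lwr"])

print(pi_full)
print(pi_reduced)
print(widths)
```

Because age is not in the reduced model, its two rows are identical, and for the age 20 patient it avoids the extrapolation that inflates the full-model interval. Whether the reduced-model interval is also narrower for the age 63 patient is not obvious in advance and has to be read off the `widths` output.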