Question
In this problem, you are asked to simulate (along the same lines as the class example) the first two steps of a forward stepwise regression. In the table on the left, "y" represents the response (dependent) variable, while the six x variables are the possible regressor (a.k.a. predictor or independent) variables. Select the variable most highly correlated (in absolute value) with y as the first variable to enter. After determining which of the remaining x's is most highly correlated (in absolute value) with the residuals from the first regression, regress those residuals against it. Add the two simple regressions together. Determine the error associated with that model and complete columns A and C in the table on the right. Finally, perform a "regular" multiple linear regression (MLR) using the two x-variables you chose and complete columns B and D in the table on the right. (Note that "error" and "residual" both refer to the difference between the observed and predicted y-values.) You may attach a separate table with the data, but PUT THE SUMS UNDER COLUMNS C AND D IN THE SPACES BELOW.

Data (left table):

  y  x1  x2  x3   x4  x5  x6
240  38  24  91  100  71  43
236  19  21  90   95  67  36
270   5  24  88  110  62  39
274  65  25  87   88  28  43
301  74  25  91   94  28  32
316  86  26  94   99  28  29
300  88  25  87   97  22  38
296  79  25  86   96  34  23
267  26  24  88  110  28  26
276  45  25  91  105  45  24
288   9  25  90  100  40  44
261  66  23  89   98  40  35

Right table: Col A = error after the 2nd step, Col B = residual from the "regular" MLR, Col C = Col A squared, Col D = Col B squared, with the SUMS of columns C and D at the bottom.

Explanation / Answer
> y<-c(240,236,270,274,301,316,300,296,267,276,288,261)
> x1<-c(38,19,5,65,74,86,88,79,26,45,9,66)
> x2<-c(24,21,24,25,25,26,25,25,24,25,25,23)
> x3<-c(91,90,88,87,91,94,87,86,88,91,90,89)
> x4<-c(100,95,110,88,94,99,97,96,110,105,100,98)
> x5<-c(71,67,62,28,28,28,22,34,28,45,40,40)
> x6<-c(43,36,39,43,32,29,38,23,26,24,44,35)
> cor(y,x1)
[1] 0.6063294
> cor(y,x2)
[1] 0.826963
> cor(y,x3)
[1] 0.09285061
> cor(y,x4)
[1] -0.1326605
> cor(y,x5)
[1] -0.7820574
> cor(y,x6)
[1] -0.3234644
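The six cor() calls above can also be run in one pass; a small sketch (not part of the original transcript, reusing the same data) that picks the regressor with the largest absolute correlation:

```r
# Compute all six correlations with y at once and select the one that is
# largest in absolute value.
y  <- c(240,236,270,274,301,316,300,296,267,276,288,261)
xs <- list(
  x1 = c(38,19,5,65,74,86,88,79,26,45,9,66),
  x2 = c(24,21,24,25,25,26,25,25,24,25,25,23),
  x3 = c(91,90,88,87,91,94,87,86,88,91,90,89),
  x4 = c(100,95,110,88,94,99,97,96,110,105,100,98),
  x5 = c(71,67,62,28,28,28,22,34,28,45,40,40),
  x6 = c(43,36,39,43,32,29,38,23,26,24,44,35)
)
cors <- sapply(xs, function(x) cor(y, x))
names(which.max(abs(cors)))   # "x2"
```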
Hence, y is most highly correlated (in absolute value) with x2.
We regress y on x2.
> model1 <- lm(y ~ x2)
> model1
Call:
lm(formula = y ~ x2)
Coefficients:
(Intercept) x2
-100.52 15.52
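As a quick sanity check (not part of the original transcript), the fitted line y-hat = -100.52 + 15.52*x2 reproduces the first printed residual below:

```r
# Recompute the step-1 fit from the data above and check the first residual:
# y-hat = intercept + slope * x2, residual = y - y-hat.
y  <- c(240,236,270,274,301,316,300,296,267,276,288,261)
x2 <- c(24,21,24,25,25,26,25,25,24,25,25,23)
fit <- lm(y ~ x2)
unname(coef(fit))              # roughly -100.52 and 15.52
y[1] - unname(fitted(fit)[1])  # about -31.91, matching residuals1[1]
```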
> residuals1 <- model1$residuals
The residuals from model1 are:
1 -31.9107143
2 10.6428571
3 -1.9107143
4 -13.4285714
5 13.5714286
6 13.0535714
7 12.5714286
8 8.5714286
9 -4.9107143
10 -11.4285714
11 0.5714286
12 4.6071429
> cor(residuals1,x1)
[1] 0.3686749
> cor(residuals1,x3)
[1] -0.000676516
> cor(residuals1,x4)
[1] -0.1986911
> cor(residuals1,x5)
[1] -0.4626613
> cor(residuals1,x6)
[1] -0.3247839
So the residuals are most highly correlated (in absolute value) with x5.
We regress the residuals from model1 on x5.
> model2 <- lm(residuals1 ~ x5)
> residuals2 <- model2$residuals
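The "add the two simple regressions together" step can be checked directly: the fitted values of model1 and model2 sum to the stepwise prediction, and residuals2 is exactly y minus that sum. A sketch (not part of the original transcript, reusing the same data):

```r
# Stepwise prediction = step-1 fit plus the fit of the step-1 residuals on
# x5; its error is identical to the residuals of model2.
y  <- c(240,236,270,274,301,316,300,296,267,276,288,261)
x2 <- c(24,21,24,25,25,26,25,25,24,25,25,23)
x5 <- c(71,67,62,28,28,28,22,34,28,45,40,40)
model1 <- lm(y ~ x2)
model2 <- lm(residuals(model1) ~ x5)
stepwise_pred <- fitted(model1) + fitted(model2)
err <- y - stepwise_pred
all.equal(unname(err), unname(residuals(model2)))   # TRUE
```

This identity holds because residuals2 = residuals1 - fitted(model2) = (y - fitted(model1)) - fitted(model2).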
Hence, the error after step 2 (Col A) is:
1 -20.6166896
2 20.4268172
3 5.9856651
4 -18.3677410
5 8.6322590
6 8.1144019
7 5.3671621
8 5.8973559
9 -9.8498838
10 -9.9499665
11 0.1624527
12 4.1981670
Doing a regular MLR of y on x2 and x5:
> model3 <- lm(y ~ x2 + x5)
> residuals3 <- model3$residuals
The residuals from the regular MLR (Col B) are:
1 -14.846446
2 9.836142
3 9.506547
4 -18.224045
5 8.775955
6 13.378498
7 4.011283
8 7.540626
9 -14.826589
10 -5.557477
11 3.305297
12 -2.899791
Col A squared (Col C) is:
1 425.0478908
2 417.2548622
3 35.8281864
4 337.3739091
5 74.5158956
6 65.8435176
7 28.8064294
8 34.7788064
9 97.0202118
10 99.0018339
11 0.0263909
12 17.6246064
And their sum = 1633.123
Col B squared (Col D) is:
1 220.416951
2 96.749687
3 90.374445
4 332.115833
5 77.017378
6 178.984220
7 16.090394
8 56.861037
9 219.827751
10 30.885551
11 10.924988
12 8.408787
And their sum = 1338.657
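As a final sanity check (not part of the original answer), both column sums can be computed directly with sum(). Note that the MLR sum of squared errors cannot exceed the stepwise sum, since least squares fits both coefficients jointly:

```r
# Reproduce the Col C and Col D sums and compare the two models' SSE.
y  <- c(240,236,270,274,301,316,300,296,267,276,288,261)
x2 <- c(24,21,24,25,25,26,25,25,24,25,25,23)
x5 <- c(71,67,62,28,28,28,22,34,28,45,40,40)
model1 <- lm(y ~ x2)
model2 <- lm(residuals(model1) ~ x5)
model3 <- lm(y ~ x2 + x5)
sse_stepwise <- sum(residuals(model2)^2)   # Col C sum, about 1633.123
sse_mlr      <- sum(residuals(model3)^2)   # Col D sum, about 1338.657
sse_mlr <= sse_stepwise                    # TRUE
```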