Question
Using the seatpos data, fit the linear regression of hipcenter on all of the other variables.
(a) Produce a summary of the regression results.
(b) Do any variables appear to be significant based on the individual t-tests for their
coefficients? What about based on the overall F-test (for all of the variables together)?
(c) Compute the variance inflation factors (VIFs) for the variables. Using the threshold of 10 to determine if a VIF indicates a problem of collinearity, which variables have a VIF
indicating a possible problem?
(d) Reduce the model by removing all variables that had VIFs you identified as problematic in the previous part. Produce a summary of the regression results.
(e) For the model of the previous part, do any variables appear to be significant based on the individual t-tests for their coefficients? What about based on the overall F-test (for
all of the variables together)?
(f) Compute the VIFs for the reduced set of variables. (Have they changed?) Again using the threshold of 10, which variables have a VIF indicating a possible problem?
Install the faraway package in R, which contains the seatpos data, or download it from: https://cran.r-project.org/web/packages/faraway/index.html
Explanation / Answer
The complete R snippet is as follows:
library(faraway)  # provides the seatpos data and vif()
data.df <- data.frame(seatpos)
# (a) fit the full model: hipcenter on all of the other variables
fit <- lm(hipcenter ~ ., data = data.df)
summary(fit)
# (c) get the VIF values
vif(fit)
# rule of thumb used here: sqrt(VIF) > 2, i.e. VIF > 4, flags a variable
# (this is stricter than the threshold of 10 stated in the question)
sqrt(vif(fit)) > 2
## HtShoes, Ht, Seated, Arm and Leg are flagged by this rule
# (d) refit the model with the remaining variables
myformula <- hipcenter ~ Age + Weight + Thigh
fit1 <- lm(myformula, data = data.df)
summary(fit1)
The results are:
> summary(fit)
Call:
lm(formula = hipcenter ~ ., data = data.df)
Residuals:
    Min      1Q  Median      3Q     Max
-73.827 -22.833  -3.678  25.017  62.337
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 436.43213  166.57162   2.620   0.0138 *
Age           0.77572    0.57033   1.360   0.1843
Weight        0.02631    0.33097   0.080   0.9372
HtShoes      -2.69241    9.75304  -0.276   0.7845
Ht            0.60134   10.12987   0.059   0.9531
Seated        0.53375    3.76189   0.142   0.8882
Arm          -1.32807    3.90020  -0.341   0.7359
Thigh        -1.14312    2.66002  -0.430   0.6706
Leg          -6.43905    4.71386  -1.366   0.1824
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 37.72 on 29 degrees of freedom
Multiple R-squared: 0.6866, Adjusted R-squared: 0.6001
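F-statistic: 7.94 on 8 and 29 DF, p-value: 1.306e-05
None of the individual predictors is significant at the 0.05 level (every p-value except the intercept's is above 0.18). However, the overall F-test is highly significant, which suggests that the model is affected by multicollinearity: the predictors jointly explain hipcenter, but their individual effects cannot be separated. The variance inflation factors (VIFs) quantify this.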
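The two kinds of test quoted in the summary can be reproduced from the printed numbers alone. The following is a minimal sketch using only base R; `overall_f` is a hypothetical helper name, and the inputs are the rounded values copied from the summary output above, so the results should only match the printed ones approximately.

```r
# Overall F-statistic from R^2:
#   F = (R^2 / p) / ((1 - R^2) / df_resid)
# where p is the number of predictors and df_resid the residual degrees of freedom.
overall_f <- function(r2, p, df_resid) {
  f <- (r2 / p) / ((1 - r2) / df_resid)
  c(F = f, p_value = pf(f, p, df_resid, lower.tail = FALSE))
}

# Full model: R^2 = 0.6866, 8 predictors, 29 residual degrees of freedom
full <- overall_f(0.6866, 8, 29)
print(full)

# Two-sided t-test p-value for a single coefficient,
# here Leg with t = -1.366 on 29 degrees of freedom
p_leg <- 2 * pt(abs(-1.366), df = 29, lower.tail = FALSE)
print(p_leg)
```

The F-test asks whether all slopes are zero at once, while each t-test asks whether one slope is zero given all the others, which is why the two can disagree when predictors are strongly correlated.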
> vif(fit)
       Age     Weight    HtShoes         Ht     Seated        Arm      Thigh        Leg
  1.997931   3.647030 307.429378 333.137832   8.951054   4.496368   2.762886   6.694291
> sqrt(vif(fit)) > 2
    Age  Weight HtShoes      Ht  Seated     Arm   Thigh     Leg
  FALSE   FALSE    TRUE    TRUE    TRUE    TRUE   FALSE    TRUE
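For reference, the VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing predictor j on all of the other predictors. A minimal sketch of that definition in base R follows; `manual_vif` is a hypothetical helper (not part of faraway), and the seatpos usage is guarded because it assumes the faraway package is installed.

```r
# VIF_j = 1 / (1 - R_j^2), with R_j^2 taken from the regression of
# predictor j on all of the other predictors.
manual_vif <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    aux_fit <- lm(reformulate(others, response = v), data = predictors)
    1 / (1 - summary(aux_fit)$r.squared)
  })
}

# Usage with the seatpos data, if the faraway package is available
if (requireNamespace("faraway", quietly = TRUE)) {
  data(seatpos, package = "faraway")
  vifs <- manual_vif(seatpos[, names(seatpos) != "hipcenter"])
  print(round(vifs, 2))
  print(names(vifs)[vifs > 10])      # threshold of 10 from the question
  print(names(vifs)[sqrt(vifs) > 2]) # stricter rule used in this answer
}
```

Comparing against the values printed above, only HtShoes and Ht exceed the threshold of 10, while the sqrt(VIF) > 2 rule additionally flags Seated, Arm and Leg.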
> myformula <- hipcenter~ Age + Weight +Thigh
> fit1 <- lm(myformula, data = data.df)
> summary(fit1)
Call:
lm(formula = myformula, data = data.df)
Residuals:
    Min      1Q  Median      3Q     Max
-84.764 -26.436   2.596  20.809  84.995
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 126.7917    69.6700   1.820  0.07759 .
Age           1.0654     0.4438   2.401  0.02198 *
Weight       -0.7679     0.2315  -3.316  0.00218 **
Thigh        -5.4259     2.1400  -2.535  0.01599 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 41.29 on 34 degrees of freedom
Multiple R-squared: 0.5597, Adjusted R-squared: 0.5208
F-statistic: 14.41 on 3 and 34 DF, p-value: 3.194e-06
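When we refit the model after removing the variables flagged by the VIF check, all three remaining predictors (Age, Weight and Thigh) are significant at the 0.05 level, and the overall F-test remains highly significant (p-value 3.194e-06). Note that the sqrt(VIF) > 2 rule used here corresponds to VIF > 4; with the threshold of 10 stated in the question, only HtShoes (307.43) and Ht (333.14) exceed it.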
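The question itself asks for a threshold of 10, and part (f) asks for the VIFs of the reduced model, neither of which the snippet above runs. A minimal sketch of that variant follows, under some assumptions: `vif_from_cor` and `reduce_by_vif` are hypothetical helpers (not part of faraway), all predictors are numeric, and the VIFs are taken as the diagonal of the inverse correlation matrix of the predictors, which equals 1 / (1 - R_j^2). No output is shown because this refit was not part of the original answer.

```r
# VIFs as the diagonal of the inverse correlation matrix of the predictors
vif_from_cor <- function(predictors) {
  setNames(diag(solve(cor(predictors))), names(predictors))
}

# Hypothetical helper: drop every predictor whose VIF exceeds `threshold`
# in a single pass (part (d)), refit, and recompute the VIFs (part (f)).
reduce_by_vif <- function(data, response, threshold = 10) {
  predictors <- data[, setdiff(names(data), response), drop = FALSE]
  vifs_before <- vif_from_cor(predictors)
  dropped <- names(vifs_before)[vifs_before > threshold]
  kept <- setdiff(names(predictors), dropped)
  fit <- lm(reformulate(kept, response = response), data = data)
  list(
    vifs_before = vifs_before,
    dropped = dropped,
    fit = fit,
    vifs_after = vif_from_cor(predictors[, kept, drop = FALSE])
  )
}

# Usage with the seatpos data, if the faraway package is available
if (requireNamespace("faraway", quietly = TRUE)) {
  data(seatpos, package = "faraway")
  res <- reduce_by_vif(seatpos, "hipcenter", threshold = 10)
  print(res$dropped)       # variables with VIF > 10 in the full model
  print(summary(res$fit))  # parts (d) and (e): reduced-model summary
  print(res$vifs_after)    # part (f): VIFs after the reduction
}
```

Dropping variables in one pass follows the wording of part (d); an iterative scheme that removes the largest VIF and recomputes would be a different procedure and may keep more variables.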
Please note that we can answer only 4 subparts of a question at a time, as per the answering guidelines.