Question

Using the seatpos data, fit the linear regression of hipcenter on all of the other variables.

(a) Produce a summary of the regression results.

(b) Do any variables appear to be significant based on the individual t-tests for their
coefficients? What about based on the overall F-test (for all of the variables together)?

(c) Compute the variance inflation factors (VIFs) for the variables. Using the threshold of 10 to determine if a VIF indicates a problem of collinearity, which variables have a VIF
indicating a possible problem?

(d) Reduce the model by removing all variables that had VIFs you identified as problematic in the previous part. Produce a summary of the regression results.

(e) For the model of the previous part, do any variables appear to be significant based on the individual t-tests for their coefficients? What about based on the overall F-test (for
all of the variables together)?

(f) Compute the VIFs for the reduced set of variables. (Have they changed?) Again using the threshold of 10, which variables have a VIF indicating a possible problem?

Install the faraway package in R, which provides the seatpos data, or download it at: https://cran.r-project.org/web/packages/faraway/index.html

Explanation / Answer

The complete R snippet is as follows:

library(faraway)

data.df <- data.frame(seatpos)

# fit the full model: hipcenter on all of the other variables
fit <- lm(hipcenter ~ ., data = data.df)
summary(fit)

# get the vif values
vif(fit)

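# flag variables with sqrt(VIF) > 2 (equivalently VIF > 4), a stricter rule than the threshold of 10 in the question
sqrt(vif(fit)) > 2

## HtShoes, Ht, Seated, Arm, and Leg are flagged by this rule; only HtShoes and Ht exceed the threshold of 10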

# refit the model without the flagged variables
myformula <- hipcenter ~ Age + Weight + Thigh

# fit the reduced model
fit1 <- lm(myformula, data = data.df)
summary(fit1)

The results are

> summary(fit)

Call:
lm(formula = hipcenter ~ ., data = data.df)

Residuals:
    Min      1Q  Median      3Q     Max
-73.827 -22.833  -3.678  25.017  62.337

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 436.43213  166.57162   2.620   0.0138 *
Age           0.77572    0.57033   1.360   0.1843
Weight        0.02631    0.33097   0.080   0.9372
HtShoes      -2.69241    9.75304  -0.276   0.7845
Ht            0.60134   10.12987   0.059   0.9531
Seated        0.53375    3.76189   0.142   0.8882
Arm          -1.32807    3.90020  -0.341   0.7359
Thigh        -1.14312    2.66002  -0.430   0.6706
Leg          -6.43905    4.71386  -1.366   0.1824
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 37.72 on 29 degrees of freedom
Multiple R-squared: 0.6866,   Adjusted R-squared: 0.6001
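F-statistic: 7.94 on 8 and 29 DF, p-value: 1.306e-05

None of the individual predictors is significant: every slope has a p-value above 0.05, and only the intercept falls below it. The overall F-test, however, is highly significant (p-value 1.306e-05). A model that is jointly significant while no single predictor is significant is a typical symptom of multicollinearity, so we next compute the variance inflation factors.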
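The contrast between the individual t-tests and the overall F-test can be checked directly from the numbers printed above. The following is a minimal sketch that copies the F statistic, its degrees of freedom, and the slope p-values from the summary output by hand; the variable names (f_value, slope_p, and so on) are illustrative and do not appear in the original snippet.

```r
# Values copied from the summary(fit) output: F = 7.94 on 8 and 29 DF.
f_value <- 7.94
num_df  <- 8
den_df  <- 29

# The overall p-value is the upper-tail probability of the F distribution.
overall_p <- pf(f_value, num_df, den_df, lower.tail = FALSE)
overall_p

# Slope p-values copied from the coefficient table (intercept excluded).
slope_p <- c(Age = 0.1843, Weight = 0.9372, HtShoes = 0.7845, Ht = 0.9531,
             Seated = 0.8882, Arm = 0.7359, Thigh = 0.6706, Leg = 0.1824)

# Is any individual predictor significant at the 0.05 level?
any(slope_p < 0.05)   # FALSE
```

With a fitted model object available, the same quantities could be read from summary(fit)$fstatistic and coef(summary(fit)) instead of being typed in.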

> vif(fit)
       Age     Weight    HtShoes         Ht     Seated        Arm      Thigh
  1.997931   3.647030 307.429378 333.137832   8.951054   4.496368   2.762886
       Leg
  6.694291
> sqrt(vif(fit)) > 2
    Age  Weight HtShoes      Ht  Seated     Arm   Thigh     Leg
  FALSE   FALSE    TRUE    TRUE    TRUE    TRUE   FALSE    TRUE
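To see where the VIF values above come from, each one can be reproduced from its definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing predictor j on all of the other predictors. The sketch below assumes the faraway package is installed; vif_by_hand is a hypothetical helper written for this illustration, not a function from faraway.

```r
library(faraway)  # provides the seatpos data and vif()

# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), with R_j^2 taken from the
# regression of predictor j on all of the remaining predictors.
vif_by_hand <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    aux_fit <- lm(reformulate(others, response = v), data = predictors)
    1 / (1 - summary(aux_fit)$r.squared)
  })
}

# Drop the response so that only the eight predictors remain.
predictors <- seatpos[, names(seatpos) != "hipcenter"]
manual_vif <- vif_by_hand(predictors)
round(manual_vif, 3)

# Predictors whose VIF exceeds the threshold of 10 from the question.
names(manual_vif)[manual_vif > 10]
```

A very large VIF for Ht and HtShoes is expected here because each is almost perfectly predicted by the other, which drives R_j^2 close to 1.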

> myformula <- hipcenter ~ Age + Weight + Thigh
> fit1 <- lm(myformula, data = data.df)
> summary(fit1)

Call:
lm(formula = myformula, data = data.df)

Residuals:
    Min      1Q  Median      3Q     Max
-84.764 -26.436   2.596  20.809  84.995

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 126.7917    69.6700   1.820  0.07759 .
Age           1.0654     0.4438   2.401  0.02198 *
Weight       -0.7679     0.2315  -3.316  0.00218 **
Thigh        -5.4259     2.1400  -2.535  0.01599 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 41.29 on 34 degrees of freedom
Multiple R-squared: 0.5597,   Adjusted R-squared: 0.5208
F-statistic: 14.41 on 3 and 34 DF, p-value: 3.194e-06

After refitting with only Age, Weight, and Thigh, all three predictors are significant at the 0.05 level, and the overall F-test remains significant (p-value 3.194e-06).
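The reduced model above drops every variable flagged by the sqrt(VIF) > 2 rule. Applied literally, the threshold of 10 from parts (c), (d), and (f) singles out only HtShoes and Ht in the vif(fit) output. The sketch below shows one way to build that reduced model and recompute its VIFs, assuming the faraway package is installed; the names high_vif, reduced, and fit_reduced are illustrative, and no output is shown because the original answer does not report results for this model.

```r
library(faraway)  # provides the seatpos data and vif()

# Full model and its variance inflation factors.
fit <- lm(hipcenter ~ ., data = seatpos)
full_vif <- vif(fit)

# Variables whose VIF exceeds the threshold of 10 stated in the question.
high_vif <- names(full_vif)[full_vif > 10]
high_vif

# Remove those variables and refit on what is left.
reduced <- seatpos[, !(names(seatpos) %in% high_vif)]
fit_reduced <- lm(hipcenter ~ ., data = reduced)
summary(fit_reduced)

# Part (f): VIFs for the reduced set, checked against the same threshold.
reduced_vif <- vif(fit_reduced)
reduced_vif
names(reduced_vif)[reduced_vif > 10]
```

Comparing reduced_vif with full_vif shows how much of the inflation for the remaining variables was due to HtShoes and Ht.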
