Question

The dataset uswages, from the faraway package in R, is drawn as a sample from the Current Population Survey in 1988.

Install the faraway package in R, or download the data at: https://cran.r-project.org/web/packages/faraway/index.html

(a) Fit a regression model with weekly wages as the response and years of education and experience as predictors. Present the output.
(b) What percentage of variation in the response is explained by these predictors? (Percentage of variance explained is the same as the coefficient of determination.)
(c) Which observation has the largest (positive) residual? Give the case number.
(d) Compute the mean and median of the residuals. Explain what the difference between the mean and the median indicates.
(e) For two people with the same education and one year difference in experience, what would be the difference in predicted weekly wages?
(f) Compute the correlation of the residuals with the fitted values. Plot residuals against fitted values. Explain the value of this correlation using the geometric (projection) interpretation of least squares.

Explanation / Answer

The R snippet below covers parts (a) through (d) of the question.

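library(faraway)

# load the data set
data(uswages)

# regression model

fit <- lm(wage ~ educ + exper, data = uswages)

# summarise the results
summary(fit)

# largest positive residual, and the case (row name) it belongs to
max(fit$residuals)
which.max(fit$residuals)

# mean and median of residuals

mean(fit$residuals)
median(fit$residuals)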
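
For part (b), the coefficient of determination can also be pulled out of the fitted model directly instead of being read off the printed summary. The following is a minimal sketch, assuming the faraway package is installed; the variable names `r2_summary`, `rss`, `tss`, and `r2_manual` are illustrative and not part of the original answer. It compares the value stored by `summary()` with the definition 1 - RSS/TSS.

```r
library(faraway)   # provides the uswages data frame
data(uswages)

fit <- lm(wage ~ educ + exper, data = uswages)

# R-squared as stored in the summary object
r2_summary <- summary(fit)$r.squared

# The same quantity from its definition: 1 - RSS / TSS
rss <- sum(residuals(fit)^2)                        # residual sum of squares
tss <- sum((uswages$wage - mean(uswages$wage))^2)   # total sum of squares
r2_manual <- 1 - rss / tss

# Percentage of variation explained, computed both ways
round(100 * c(summary = r2_summary, manual = r2_manual), 2)
```

Both entries should agree with the Multiple R-squared line of the summary output.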

The results are

> # regression model
>
> fit <- lm(wage ~ educ+exper,uswages)
>
> # summarise the results
> summary(fit)

Call:
lm(formula = wage ~ educ + exper, data = uswages)

Residuals:
    Min      1Q  Median      3Q     Max
-1018.2  -237.9   -50.9   149.9  7228.6

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -242.7994    50.6816  -4.791 1.78e-06 ***
educ          51.1753     3.3419  15.313  < 2e-16 ***
exper          9.7748     0.7506  13.023  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 427.9 on 1997 degrees of freedom
Multiple R-squared: 0.1351,   Adjusted R-squared: 0.1343
# hence the model explains only about 13.5% of the variation in weekly wages
F-statistic: 156 on 2 and 1997 DF, p-value: < 2.2e-16

>
> max(fit$residuals)
[1] 7228.612
>
> # mean and median of residuals
>
> mean(fit$residuals)
[1] -6.317169e-16
> median(fit$residuals)
[1] -50.86827

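The mean of the residuals is zero up to floating-point error (about -6.3e-16). This says nothing about how well the model fits: whenever the model includes an intercept, least squares forces the residuals (actual minus predicted) to sum to zero. The median, the middle value of the residuals, is -50.87, well below the mean. This gap indicates that the residual distribution is right-skewed: more than half of the observations have wages below what the model predicts, while a small number of very high earners (the largest residual is 7228.6) pull the mean back up to zero.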
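
The mean and median comparison can be checked numerically. The sketch below assumes the faraway package is installed; the name `res` and the histogram settings are illustrative choices, not part of the original answer. It confirms that the residuals sum to roughly zero, puts the mean and median side by side, and draws a histogram in which any long right tail should be visible.

```r
library(faraway)   # provides the uswages data frame
data(uswages)

fit <- lm(wage ~ educ + exper, data = uswages)
res <- residuals(fit)

# With an intercept in the model, the residuals sum to zero up to rounding error
sum(res)

# Mean vs. median: a median below the mean points to a long right tail
c(mean = mean(res), median = median(res))

# Share of observations whose wage falls below the fitted value
mean(res < 0)

# Histogram of the residuals (50 bins is an arbitrary choice)
hist(res, breaks = 50,
     main = "Residuals of wage ~ educ + exper", xlab = "Residual")
```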

Please note that, per the answering guidelines, we can answer only four subparts of a question at a time.
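
Parts (e) and (f) are not covered by the answer above. As a rough sketch of how they might be approached, again assuming the faraway package is installed: in a linear model the predicted gap between two people who differ only by one year of experience is the exper coefficient (about 9.77 in the summary output), and the correlation between residuals and fitted values is zero up to floating-point error, because least squares projects the response onto the space spanned by the predictors and the residual vector is orthogonal to that space. The education and experience values in `people`, and the names `wage_gap` and `res_fit_cor`, are hypothetical choices made for illustration.

```r
library(faraway)   # provides the uswages data frame
data(uswages)

fit <- lm(wage ~ educ + exper, data = uswages)

# (e) Two hypothetical people: same education, one year apart in experience.
# The values 12 and 10/11 are arbitrary; any pair one year apart gives the same gap.
people <- data.frame(educ = c(12, 12), exper = c(10, 11))
wage_gap <- diff(predict(fit, newdata = people))
wage_gap             # matches the exper coefficient
coef(fit)["exper"]

# (f) Residuals are orthogonal to the fitted values, so their correlation
# is zero up to floating-point error.
res_fit_cor <- cor(residuals(fit), fitted(fit))
res_fit_cor

# Residuals against fitted values, with a reference line at zero
plot(fitted(fit), residuals(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

The plot is worth reading alongside the correlation: a zero correlation only rules out a linear trend in the residuals, so any fan shape or outliers would still show up in the scatter.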
