Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

hi. this is a Regression analysis with R In here i want a get values df, SSE, R

ID: 2926795 • Letter: H

Question

hi.

this is a Regression analysis with R

In here i want a get values df, SSE, R square, MSE, Cp, PRESSp

like this

but i don't know the R-code.

Can you help me??

below is basic R-code

#######################################################

set.seed(1)
x1 <- c(98.352,100.416,90.858,91.146,101.194,77.820,117.960,112.186,308.404,289.196,116.564,106.006,125.424,119.184,101.000,164.928,133.938,155.682,180.768,119.788,150.844,175.902,121.862,167.214,162.800,182.832,240.000)
x2 <- c(34.720,35.310,22.750,40.500,44.550,44.550,58.500,95.200,98.000,128.000,64.350,49.883,55.200,66.660,50.000,51.500,69.020,71.020,78.000,55.200,40.000,98.900,67.265,91.500,80.000,73.262,50.000)
x3 <- c(19.96,30.00,23.50,24.64,22.42,19.76,24.80,30.02,68.40,60.00,24.50,31.04,19.50,22.42,20.40,33.28,29.76,27.52,30.00,25.12,33.80,36.40,33.04,35.54,30.08,36.62,24.00)
x4 <- c(4.2,6.2,4.0,5.4,4.2,5.6,5.1,3.2,4.2,1.4,3.2,3.0,3.0,3.2,4.6,5.0,2.2,1.7,2.3,4.0,2.2,5.0,4.4,4.8,0.3,3.1,3.0)
y <- c(12.95,14.75,13.95,12.95,14.95,14.95,15.45,14.45,42.45,41.45,17.95,15.75,15.50,15.45,15.00,18.45,20.95,20.25,21.95,18.75,18.95,22.25,18.95,19.45,18.45,22.90,20.50)

mydata <- data.frame(y,x1,x2,x3,x4)

fit <- lm(y~x1+x2+x3+x4, data=mydata)

Explanation / Answer

First we can fit the lm model in R as you have shown:

x1 <- c(98.352,100.416,90.858,91.146,101.194,77.820,117.960,112.186,308.404,289.196,116.564,106.006,125.424,119.184,101.000,164.928,133.938,155.682,180.768,119.788,150.844,175.902,121.862,167.214,162.800,182.832,240.000)
x2 <- c(34.720,35.310,22.750,40.500,44.550,44.550,58.500,95.200,98.000,128.000,64.350,49.883,55.200,66.660,50.000,51.500,69.020,71.020,78.000,55.200,40.000,98.900,67.265,91.500,80.000,73.262,50.000)
x3 <- c(19.96,30.00,23.50,24.64,22.42,19.76,24.80,30.02,68.40,60.00,24.50,31.04,19.50,22.42,20.40,33.28,29.76,27.52,30.00,25.12,33.80,36.40,33.04,35.54,30.08,36.62,24.00)
x4 <- c(4.2,6.2,4.0,5.4,4.2,5.6,5.1,3.2,4.2,1.4,3.2,3.0,3.0,3.2,4.6,5.0,2.2,1.7,2.3,4.0,2.2,5.0,4.4,4.8,0.3,3.1,3.0)
y <- c(12.95,14.75,13.95,12.95,14.95,14.95,15.45,14.45,42.45,41.45,17.95,15.75,15.50,15.45,15.00,18.45,20.95,20.25,21.95,18.75,18.95,22.25,18.95,19.45,18.45,22.90,20.50)
mydata <- data.frame(y,x1,x2,x3,x4)
fit <- lm(y~x1+x2+x3+x4, data=mydata)

Now we can see the summary of the model as below using the summary function :

> summary(fit)

Call:

lm(formula = y ~ x1 + x2 + x3 + x4, data = mydata)

Residuals:

Min 1Q Median 3Q Max

-3.4891 -1.3574 0.1337 1.0686 3.4938

Coefficients:

Estimate Std. Error t value Pr(>|t|)   

(Intercept) 1.21874 2.04661 0.595 0.55759   

x1 0.05195 0.01383 3.756 0.00109 **

x2 0.01159 0.02534 0.458 0.65169   

x3 0.34941 0.07268 4.807 8.41e-05 ***

x4 -0.21894 0.33149 -0.660 0.51582   

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.039 on 22 degrees of freedom

Multiple R-squared: 0.9313, Adjusted R-squared: 0.9188

F-statistic: 74.53 on 4 and 22 DF, p-value: 1.817e-12

We can see that df is mentioned as 22.

SSE can be calculated by taking sum of squares of residuals as below:

sum(fit$residuals^2)

=91.44876

R squared is mentioned in the summary as 0.9313

The Residual standard error mentioned in the summary is RMSE. So we can just square this number to get MSE.

So MSE = (2.039)^2 = 4.157521

Not sure if we can get the value of Mallows Cp from the lm object, but we can use the leaps package for getting this value as below. We'll need to load the leaps library first.

library(leaps)

leaps(mydata[,2:5],mydata$y,method="Cp")

$which
1 2 3 4
1 FALSE FALSE TRUE FALSE
1 TRUE FALSE FALSE FALSE
1 FALSE TRUE FALSE FALSE
1 FALSE FALSE FALSE TRUE
2 TRUE FALSE TRUE FALSE
2 FALSE FALSE TRUE TRUE
2 FALSE TRUE TRUE FALSE
2 TRUE TRUE FALSE FALSE
2 TRUE FALSE FALSE TRUE
2 FALSE TRUE FALSE TRUE
3 TRUE FALSE TRUE TRUE
3 TRUE TRUE TRUE FALSE
3 FALSE TRUE TRUE TRUE
3 TRUE TRUE FALSE TRUE
4 TRUE TRUE TRUE TRUE

$label
[1] "(Intercept)" "1" "2" "3" "4"   

$size
[1] 2 2 2 2 3 3 3 3 3 3 4 4 4 4 5

$Cp
[1] 20.943616 28.958146 128.227503 266.280912 1.901360 16.002160 19.557496
[8] 25.174420 30.636710 129.792782 3.209442 3.436208 17.109908 26.110366
[15] 5.000000

Here we can see that the last model created is with all the 4 variables, and the Cp value is 5.

Predictively adjusted residuals can be calcualted as below:
pr <- resid(fit)/(1 - lm.influence(fit)$hat)

PRESS is :
sum(pr^2)

=170.481