hi. this is a Regression analysis with R In here i want a get values df, SSE, R
ID: 2926795 • Letter: H
Question
hi.
this is a Regression analysis with R
In here i want a get values df, SSE, R square, MSE, Cp, PRESSp
like this
but i don't know the R-code.
Can you help me??
below is basic R-code
#######################################################
set.seed(1)
x1 <- c(98.352,100.416,90.858,91.146,101.194,77.820,117.960,112.186,308.404,289.196,116.564,106.006,125.424,119.184,101.000,164.928,133.938,155.682,180.768,119.788,150.844,175.902,121.862,167.214,162.800,182.832,240.000)
x2 <- c(34.720,35.310,22.750,40.500,44.550,44.550,58.500,95.200,98.000,128.000,64.350,49.883,55.200,66.660,50.000,51.500,69.020,71.020,78.000,55.200,40.000,98.900,67.265,91.500,80.000,73.262,50.000)
x3 <- c(19.96,30.00,23.50,24.64,22.42,19.76,24.80,30.02,68.40,60.00,24.50,31.04,19.50,22.42,20.40,33.28,29.76,27.52,30.00,25.12,33.80,36.40,33.04,35.54,30.08,36.62,24.00)
x4 <- c(4.2,6.2,4.0,5.4,4.2,5.6,5.1,3.2,4.2,1.4,3.2,3.0,3.0,3.2,4.6,5.0,2.2,1.7,2.3,4.0,2.2,5.0,4.4,4.8,0.3,3.1,3.0)
y <- c(12.95,14.75,13.95,12.95,14.95,14.95,15.45,14.45,42.45,41.45,17.95,15.75,15.50,15.45,15.00,18.45,20.95,20.25,21.95,18.75,18.95,22.25,18.95,19.45,18.45,22.90,20.50)
mydata <- data.frame(y,x1,x2,x3,x4)
fit <- lm(y~x1+x2+x3+x4, data=mydata)
Explanation / Answer
First we can fit the lm model in R as you have shown:
x1 <- c(98.352,100.416,90.858,91.146,101.194,77.820,117.960,112.186,308.404,289.196,116.564,106.006,125.424,119.184,101.000,164.928,133.938,155.682,180.768,119.788,150.844,175.902,121.862,167.214,162.800,182.832,240.000)
x2 <- c(34.720,35.310,22.750,40.500,44.550,44.550,58.500,95.200,98.000,128.000,64.350,49.883,55.200,66.660,50.000,51.500,69.020,71.020,78.000,55.200,40.000,98.900,67.265,91.500,80.000,73.262,50.000)
x3 <- c(19.96,30.00,23.50,24.64,22.42,19.76,24.80,30.02,68.40,60.00,24.50,31.04,19.50,22.42,20.40,33.28,29.76,27.52,30.00,25.12,33.80,36.40,33.04,35.54,30.08,36.62,24.00)
x4 <- c(4.2,6.2,4.0,5.4,4.2,5.6,5.1,3.2,4.2,1.4,3.2,3.0,3.0,3.2,4.6,5.0,2.2,1.7,2.3,4.0,2.2,5.0,4.4,4.8,0.3,3.1,3.0)
y <- c(12.95,14.75,13.95,12.95,14.95,14.95,15.45,14.45,42.45,41.45,17.95,15.75,15.50,15.45,15.00,18.45,20.95,20.25,21.95,18.75,18.95,22.25,18.95,19.45,18.45,22.90,20.50)
mydata <- data.frame(y,x1,x2,x3,x4)
fit <- lm(y~x1+x2+x3+x4, data=mydata)
Now we can see the summary of the model as below using the summary function :
> summary(fit)
Call:
lm(formula = y ~ x1 + x2 + x3 + x4, data = mydata)
Residuals:
Min 1Q Median 3Q Max
-3.4891 -1.3574 0.1337 1.0686 3.4938
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.21874 2.04661 0.595 0.55759
x1 0.05195 0.01383 3.756 0.00109 **
x2 0.01159 0.02534 0.458 0.65169
x3 0.34941 0.07268 4.807 8.41e-05 ***
x4 -0.21894 0.33149 -0.660 0.51582
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.039 on 22 degrees of freedom
Multiple R-squared: 0.9313, Adjusted R-squared: 0.9188
F-statistic: 74.53 on 4 and 22 DF, p-value: 1.817e-12
We can see that df is mentioned as 22.
SSE can be calculated by taking sum of squares of residuals as below:
sum(fit$residuals^2)
=91.44876
R squared is mentioned in the summary as 0.9313
The Residual standard error mentioned in the summary is RMSE. So we can just square this number to get MSE.
So MSE = (2.039)^2 = 4.157521
Not sure if we can get the value of Mallows Cp from the lm object, but we can use the leaps package for getting this value as below. We'll need to load the leaps library first.
library(leaps)
leaps(mydata[,2:5],mydata$y,method="Cp")
$which
1 2 3 4
1 FALSE FALSE TRUE FALSE
1 TRUE FALSE FALSE FALSE
1 FALSE TRUE FALSE FALSE
1 FALSE FALSE FALSE TRUE
2 TRUE FALSE TRUE FALSE
2 FALSE FALSE TRUE TRUE
2 FALSE TRUE TRUE FALSE
2 TRUE TRUE FALSE FALSE
2 TRUE FALSE FALSE TRUE
2 FALSE TRUE FALSE TRUE
3 TRUE FALSE TRUE TRUE
3 TRUE TRUE TRUE FALSE
3 FALSE TRUE TRUE TRUE
3 TRUE TRUE FALSE TRUE
4 TRUE TRUE TRUE TRUE
$label
[1] "(Intercept)" "1" "2" "3" "4"
$size
[1] 2 2 2 2 3 3 3 3 3 3 4 4 4 4 5
$Cp
[1] 20.943616 28.958146 128.227503 266.280912 1.901360 16.002160 19.557496
[8] 25.174420 30.636710 129.792782 3.209442 3.436208 17.109908 26.110366
[15] 5.000000
Here we can see that the last model created is with all the 4 variables, and the Cp value is 5.
Predictively adjusted residuals can be calcualted as below:
pr <- resid(fit)/(1 - lm.influence(fit)$hat)
PRESS is :
sum(pr^2)
=170.481
Related Questions
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.