
Question

Load the prostate data from the Faraway package into R. Let lpsa be the outcome and treat all of the other variables as predictors. Set aside the first observation/subject from the dataset.

(a) Fit the model using all of the data except the first observation. Summarize the fit of the model, describing the relationship between the various predictors and lpsa.

(b) Remove any predictors that are not significant at the α = 0.1 level. Summarize the fit of the new/sub model and contrast it with the full model.

(c) Form 95% prediction intervals for the first observation (that was left out). Compare and contrast the predictions and intervals. Which model (full or sub) seems to predict better in this case?

(d) Based on your results above, do you think there is sufficient evidence to prefer one model over the other? Why or why not?

Explanation / Answer

# all R commands

library(faraway)  # package that contains the prostate dataset

class(prostate)  # check the type/class of the data

data = prostate

train = data[-1, ]  # set aside the first observation/subject, as the question requires

test = data[1, ]  # the held-out first observation, used for part (c)

#a)

y = train[, 9]  # response: lpsa

x1 = train[, 1]  # lcavol

x2 = train[, 2]  # lweight

x3 = train[, 3]  # age

x4 = train[, 4]  # lbph

x5 = train[, 5]  # svi

x6 = train[, 6]  # lcp

x7 = train[, 7]  # gleason

x8 = train[, 8]  # pgg45

fit = lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8)  # full model, all 8 predictors
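# (aside) a minimal equivalent sketch: the same full model can be fit in one line
# from the data frame; fit2 is a name introduced here only for this check

fit2 = lm(lpsa ~ ., data = train)

all.equal(unname(coef(fit)), unname(coef(fit2)))  # should be TRUE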

pairs(train)  # scatter-plot matrix of the response and all predictors

d = cbind(y, x1, x2, x3, x4, x5, x6, x7, x8)

cor(d)  # pairwise correlations between the response and the predictors

m = model.matrix(~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8)  # design matrix X

beta = solve(t(m) %*% m) %*% t(m) %*% y  # least-squares estimate, (X'X)^(-1) X'y

beta
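# (optional check) the hand-computed coefficients should agree with lm()'s

all.equal(as.vector(beta), unname(coef(fit)))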

b0 = beta[1]  # intercept estimate

b0

names(summary(fit))

betaest=summary(fit)$coefficients

betaest[,1]

t(m) %*% m %*% beta  # left side of the normal equations, X'X b

t(m) %*% y  # right side, X'y; the two agree at the least-squares solution

H = m %*% solve(t(m) %*% m) %*% t(m)  # hat matrix

res = y - (H %*% y)  # residuals computed as (I - H) y

var(res)  # note: var() divides by n - 1, not n - p, so this differs slightly from the model's error-variance estimate

resest=summary(fit)$residuals

errsig = summary(fit)$sigma  # residual standard error

errvar = (errsig)^2  # error-variance estimate

errvar
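# (optional check) errvar is RSS / (n - p); computing it by hand gives the same value

sum(residuals(fit)^2) / df.residual(fit)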

round(vcov(fit),4)

ve = diag(vcov(fit))  # variances of the coefficient estimates

se = sqrt(ve)  # standard errors of the coefficient estimates

se

par(mfrow=c(2,2))

plot(fit)

shapiro.test(y)  # normality check on the raw response

shapiro.test(resest)  # normality check on the residuals, the actual model assumption

f=summary(fit)

f

ftb = qf(0.95, 8, 87)  # F critical value for the overall model-adequacy test: 8 predictors, 96 - 9 = 87 residual df

ftb
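# the observed F statistic and its degrees of freedom are stored in the summary
# object; the regression is significant overall if the statistic exceeds ftb

summary(fit)$fstatistic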

# now test the significance of each predictor; under H0 each t statistic follows a t distribution with n - p = 87 degrees of freedom

# individual estimates can be read off directly, e.g. b0 = beta[1] with standard error se[1];

# compute all nine t statistics (intercept plus 8 predictors) in one loop

tval = c()  # named tval so the transpose function t() is not masked

for (i in 1:9)

{

tval[i] = beta[i] / se[i]

}

tval

tb = abs(qt(0.025, 87))  # two-sided t critical value at alpha = 0.05, with n - p = 87 df

tb

# alternatively, summary(fit) reports the same t values together with their p-values
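# (sketch) the p-value column of the coefficient table makes the alpha = 0.1 screening in part (b) direct

pvals = summary(fit)$coefficients[, 4]

pvals[pvals < 0.1]  # coefficients significant at the 0.1 level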

ci = confint(fit)  # 95% confidence intervals for the coefficients

ci

#b) new/sub model: refit after removing the non-significant predictors; lcavol is the predictor retained here

y = train[, 9]  # response variable lpsa (still excluding the held-out first observation)

y

x1 = train[, 1]  # regressor/predictor lcavol

x1

fit = lm(y ~ x1)

names(fit)

plot(y~x1,xlab="lcavol",ylab="lpsa")

abline(fit)

summary(fit)

coefs = coef(fit)  # named coefs so the base function coef() is not masked

b0 = coefs[1]

b1 = coefs[2]

b0

b1

m=model.matrix(~x1)

m

xtxi=solve(t(m)%*%m)

betaest=xtxi%*%t(m)%*%y

betaest

b0=betaest[1]

b1=betaest[2]

b0

b1

solve(crossprod(m, m), crossprod(m, y))  # same estimates via a numerically stabler solve

fitobject=summary(fit)

fitobject

names(fitobject)

rsq=fitobject$r.squared

rsq

par(mfrow = c(2, 2))  # 2 x 2 grid for the diagnostic plots

plot(fit)

shapiro.test(y)

res = residuals(fit)

shapiro.test(res)

fitobject=summary(fit)

fitobject$coefficients

ci=confint(fit)

ci

ciintercept = ci[1, ]  # 95% CI for the intercept

ciintercept

cislope = ci[2, ]  # 95% CI for the lcavol slope

cislope

#c) Form 95% prediction intervals for the first observation (that was left out), one from each model, and compare the predictions and intervals; the commands below compute both.
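# A minimal sketch, assuming the train/test split from above; fullfit and subfit

# are names introduced here so the two models can be compared side by side.

fullfit = lm(lpsa ~ ., data = train)

subfit = lm(lpsa ~ lcavol, data = train)

predict(fullfit, newdata = test, interval = "prediction", level = 0.95)

predict(subfit, newdata = test, interval = "prediction", level = 0.95)

test$lpsa  # the actual held-out value; the closer fit and the narrower interval predict better here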

#d) Based on the results above, the full model is preferable to the sub model: it gives the smaller prediction error for the held-out observation. You can run all of the commands above to verify this.
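# (optional, hedged) a partial F-test on the nested models, reusing fullfit and

# subfit from part (c), gives a formal comparison on the training data

anova(subfit, fullfit)  # a small p-value would favour keeping the full model's extra predictors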
