Question
I need help coding this problem. It is about OLS regression with a categorical variable (gear). The dataset is mtcars in R.
I should find the answer without using the lm function (predictors: hp, qsec, wt, and gear as a categorical variable; response: mpg).
Thank you.
Your turn: Consider the variable gear (number of forward gears). Let's handle this variable as a categorical one. Form a new design matrix X by adding two dummy indicators for number of gears 4 and 5. Calculate a new set of coefficient estimates with this new set of predictors. Compare your results with those returned by the following call of lm()
# regression output with lm()
lm(mpg ~ hp + qsec + wt + factor(gear), data = mtcars)
## Call:
## lm(formula = mpg ~ hp + qsec + wt + factor(gear), data = mtcars)
## Coefficients:
##   (Intercept)             hp           qsec             wt  factor(gear)4  factor(gear)5
##      24.13327        0.02164        0.56761        3.66248        1.15593        2.24468
Explanation / Answer
ans) To fit the linear regression on mtcars in R, the predictors are hp, qsec, wt, and gear, where gear is a categorical variable, and the response is mpg. For a categorical variable we first extract it from the data and count its categories. Here gear has 3 categories (3, 4 and 5), so we replace it with 2 dummy variables: indicators for gear = 4 and gear = 5, with gear = 3 as the baseline, as the question asks. The design matrix X also needs a column of 1s for the intercept, since lm() fits one by default. We then perform the usual regression using the matrix formulas. First we find the estimated regression coefficients, denoted beta_hat = (X'X)^(-1) X'y. Then we find SSRes and SST to obtain the R-square value and the overall F statistic. The R code is given below:
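For reference, the matrix formulas behind these steps can be written out as follows. This is a summary under the assumption that X carries a leading column of ones, with n = 32 observations and p = 5 predictors besides the intercept; the symbols H and the vector of ones are notation introduced here, not taken from the question.

```latex
\begin{aligned}
\hat{\beta} &= (X^{\top}X)^{-1}X^{\top}y, &
H &= X(X^{\top}X)^{-1}X^{\top},\\
SS_{Res} &= y^{\top}(I_n - H)\,y, &
SS_{T} &= y^{\top}\!\left(I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top}\right)y,\\
R^{2} &= 1 - \frac{SS_{Res}}{SS_{T}}, &
F &= \frac{(SS_{T} - SS_{Res})/p}{SS_{Res}/(n-p-1)}.
\end{aligned}
```

With n = 32 and p = 5 the F statistic has (5, 26) degrees of freedom, which is where the factor 26/5 in the code comes from.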
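d=mtcars    # built-in data set with 32 cars
summary(d)  # quick look at all variables
d2=d$gear   # the categorical predictor
table(d2)   # shows the three categories: 3, 4 and 5 forward gears
# response
y=d$mpg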
# creating 2 dummy variables for a categorical variable of 3 categories
# (gear = 3 is the baseline, as with factor(gear) in lm())
x4=as.numeric(d2==4) # first dummy variable: 1 for gear = 4 and 0 otherwise
x5=as.numeric(d2==5) # second dummy variable: 1 for gear = 5 and 0 otherwise
#other continuous predictors
x1=d$hp
x2=d$qsec
x3=d$wt
# design matrix; the column of 1s gives the intercept that lm() fits by default
x=cbind(1,x1,x2,x3,x4,x5)
colnames(x)=c("(Intercept)","hp","qsec","wt","gear4","gear5")
n=nrow(x)   # 32 observations
p=ncol(x)-1 # 5 predictors besides the intercept
beta=solve(t(x)%*%x)%*%t(x)%*%y # OLS estimates, to be compared with the lm() coefficients
phat=x%*%solve(t(x)%*%x)%*%t(x) # hat (projection) matrix
id=diag(n)
sut=rep(1,n)
sstot=t(y)%*%(id-((sut%*%t(sut))/n))%*%y # total sum of squares, the value is 1126.047
ssres=t(y)%*%(id-phat)%*%y # residual sum of squares
Rsq=1-(ssres/sstot) # R-square
F=((sstot-ssres)/ssres)*((n-p-1)/p) # overall F statistic on (5, 26) degrees of freedom
F_q=qf(0.95,p,n-p-1) # the value is 2.58679
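Comparing the observed F value with the 0.95 quantile of the F(5, 26) distribution (a test at the 5% significance level), the observed value is larger, so we reject the hypothesis that all slope coefficients are zero and conclude that the regressors are jointly significant.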
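The question also asks for a comparison with lm(). A minimal, self-contained sketch of that check is given here, assuming gear = 3 is the reference level (the default for factor(gear)) and that the design matrix carries an explicit column of ones. The names X, beta_hat, fit and max_diff are chosen for illustration only, and solving the normal equations with solve(A, b) instead of forming the inverse is a substitution made here for numerical stability.

```r
# Sketch: OLS by the normal equations, checked against lm().
d <- mtcars

# Design matrix: intercept, three numeric predictors, and dummies for
# gear = 4 and gear = 5 (gear = 3 is the reference level).
X <- cbind(
  "(Intercept)" = 1,
  hp    = d$hp,
  qsec  = d$qsec,
  wt    = d$wt,
  gear4 = as.numeric(d$gear == 4),
  gear5 = as.numeric(d$gear == 5)
)
y <- d$mpg

# Solve (X'X) b = X'y directly rather than inverting X'X explicitly.
beta_hat <- drop(solve(crossprod(X), crossprod(X, y)))

# Reference fit with lm(); used only for the comparison.
fit <- lm(mpg ~ hp + qsec + wt + factor(gear), data = d)

# Largest absolute difference between the two sets of estimates.
max_diff <- max(abs(beta_hat - unname(coef(fit))))

print(round(beta_hat, 5))
print(max_diff)
```

If the hand-built design matrix matches what lm() constructs internally, max_diff should be zero up to floating-point rounding. A large difference usually points to a missing intercept column or to a different choice of reference level for the dummies.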