SOLVE USING RSTUDIO: Predict the per capita crime rate (crim) using the other variables in the Boston data set
Question
SOLVE USING RSTUDIO:
Predict the per capita crime rate (crim) using the other variables in the Boston data set, which is available in the MASS library.
(a) After loading the MASS library, use ?Boston to access information about the data set and answer the following questions. Note that you may also find the summary() function useful.
i. Not including crim, how many variables are in the data set? In other words, what is p?
ii. Are there any missing values in the data set? If so, remove them.
iii. What is the sample size (once missing values have been removed, if necessary)? In other words, what is N?
iv. Are there any qualitative variables in the data set? If so, list them.
(b) Split the data set into a training set and a test set. Use set.seed(1).
(c) Fit a linear model using least squares on the training set and report the test error obtained.
(d) Fit a ridge regression model on the training set, with λ chosen by cross-validation (use set.seed(2)). Report the test error obtained.
(e) Fit a lasso model on the training set, with λ chosen by cross-validation (use set.seed(2)). Report the test error obtained, along with the number of non-zero coefficient estimates.
(f) Comment on the results obtained. Is there much difference among the test errors resulting from these three approaches? How accurately can we predict the per capita crime rate? What features seem to be related to per capita crime rate?
Explanation / Answer
Part A)
1) library(MASS)
data("Boston")
str(Boston)
'data.frame': 506 obs. of 14 variables:
$ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
$ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
$ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
$ chas : int 0 0 0 0 0 0 0 0 0 0 ...
$ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
$ rm : num 6.58 6.42 7.18 7 7.15 ...
$ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
$ dis : num 4.09 4.97 4.97 6.06 6.06 ...
$ rad : int 1 2 2 3 3 3 5 5 5 5 ...
$ tax : num 296 242 242 222 222 222 311 311 311 311 ...
$ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
$ black : num 397 397 393 395 397 ...
$ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
$ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
dim(Boston)
506 14
p = 13 predictors, excluding crim
2) summary(Boston)
sum(is.na(Boston))
# 0
There are no missing values, so nothing needs to be removed.
3) N = 506 (the number of rows).
4) There are no qualitative variables in the Boston data set: every column is stored as numeric or integer. (chas is the 0/1 Charles River dummy, but it is already coded as an integer.)
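A quick way to verify this (a small sketch, assuming the Boston data frame is loaded as above):
sapply(Boston, class)
# all columns report "numeric" or "integer", so no factor (qualitative) columns are present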
Part B)
set.seed(1)   # required by the question
indp = sample(2, nrow(Boston), replace = TRUE, prob = c(0.8, 0.2))
trainingdata = Boston[indp == 1, ]
testdata = Boston[indp == 2, ]
dim(trainingdata)
dim(testdata)
dim(Boston)
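Note that sample(2, ..., prob = c(0.8, 0.2)) gives a split whose sizes vary slightly from run to run. An index-based split (a sketch, not part of the original answer) fixes the sizes exactly:
set.seed(1)
train_idx    <- sample(seq_len(nrow(Boston)), size = round(0.8 * nrow(Boston)))
trainingdata <- Boston[train_idx, ]
testdata     <- Boston[-train_idx, ]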
Part C)
crim.fit = lm(crim ~ ., data = trainingdata)   # least-squares fit on the training set
trainingdata_MSE = mean(crim.fit$residuals^2)
trainingdata_MSE
# training MSE: 40.85946
mean((testdata$crim - predict.lm(crim.fit, testdata))^2)
# test MSE: 39.12132
Part D)
library(glmnet)
# glmnet() takes a model matrix and a response vector, so build them for both sets first
x_train <- model.matrix(crim ~ ., trainingdata)[, -1]
y_train <- trainingdata$crim
x_test  <- model.matrix(crim ~ ., testdata)[, -1]
y_test  <- testdata$crim
set.seed(2)
ridge.mod <- glmnet(x_train, y_train, alpha = 0, nlambda = 100, lambda.min.ratio = 0.0001)
cv.out    <- cv.glmnet(x_train, y_train, alpha = 0, nlambda = 100, lambda.min.ratio = 0.0001)
plot(cv.out)
best.lambda <- cv.out$lambda.min
best.lambda
# 0.5842299
predict(ridge.mod, s = best.lambda, type = "coefficients")
# ridge keeps all 14 coefficients (intercept + 13 predictors) non-zero
ridge_pred <- predict(ridge.mod, s = best.lambda, newx = x_test)
mean((ridge_pred - y_test)^2)
# test MSE: 38.54707
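Part E) The original answer does not show the lasso fit. A minimal sketch, reusing the x_train, y_train, x_test, and y_test objects defined in Part D):
set.seed(2)
cv.lasso   <- cv.glmnet(x_train, y_train, alpha = 1)            # alpha = 1 selects the lasso penalty
lasso.mod  <- glmnet(x_train, y_train, alpha = 1)
lasso_pred <- predict(lasso.mod, s = cv.lasso$lambda.min, newx = x_test)
mean((lasso_pred - y_test)^2)                                   # lasso test MSE
lasso_coef <- predict(lasso.mod, s = cv.lasso$lambda.min, type = "coefficients")
sum(lasso_coef != 0)                                            # non-zero coefficients (including the intercept)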
Part F) The ridge test MSE (about 38.5) is essentially the same as the test MSE from the least-squares fit (about 39.1), so the regularized model does not improve much on the plain linear model for this split.