The file BostonHousing.xls contains information collected by the U.S. Bureau of
Question
The file BostonHousing.xls contains information collected by the U.S. Bureau of the Census concerning housing in the area of Boston, Massachusetts. The dataset includes information on 506 census housing tracts in the Boston area. The goal is to predict the median house price in new tracts based on information such as crime rate, pollution, and number of rooms. The dataset contains 14 predictors, and the response is the median house price (MEDV). Table 5.3 describes each of the predictors and the response.
a. Why should the data be partitioned into training and validation sets? For what will the training set be used? For what will the validation set be used? Fit a multiple linear regression model to the median house price (MEDV) as a function of CRIM, CHAS and RM.
b. Write the equation for predicting the median house price from the predictors in the model.
c. What median house price is predicted for a tract in the Boston area that does not bound the Charles River, has a crime rate of 0.1, and where the average number of rooms per house is 6? What is the prediction error?
Table 5.3: Description of Variables for Boston Housing Example
CRIM      Per capita crime rate by town
ZN        Proportion of residential land zoned for lots over 25,000 ft²
INDUS     Proportion of nonretail business acres per town
CHAS      Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX       Nitric oxide concentration (parts per 10 million)
RM        Average number of rooms per dwelling
AGE       Proportion of owner-occupied units built prior to 1940
DIS       Weighted distances to five Boston employment centers
RAD       Index of accessibility to radial highways
TAX       Full-value property-tax rate per $10,000
PTRATIO   Pupil/teacher ratio by town
B         1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT     % Lower status of the population
MEDV      Median value of owner-occupied homes in $1000s
Explanation / Answer
A)
i) Partitioning gives two separate sets, a training set and a validation set.
One is used to fit the relationship between the predictors and the response, and
the other is used to check how accurately the model predicts data it has not seen.
ii) The training set is used to build the model. The algorithm 'discovers' the
model using this data set.
iii) The validation set is used to validate the model. Measures of error are computed on it.
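The partitioning step described above can be sketched in a few lines of Python. This is only an illustration: the 60/40 split, the seed, and the helper name `partition` are assumptions, not something the question specifies, and the tract IDs stand in for the 506 rows of the spreadsheet.

```python
import random

def partition(records, train_fraction=0.6, seed=1):
    """Shuffle a copy of the records, then split into (training, validation)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Stand-in for the 506 census tracts; real rows would come from BostonHousing.xls.
tract_ids = list(range(506))
train, valid = partition(tract_ids)
print(len(train), len(valid))
```

The model would be fitted on `train` only, and its error measures computed on `valid`, so that the error reflects tracts the model never saw.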
-----------------------------------------------------------------------------------
B)
The fitted regression model output is
INPUT VARIABLES   COEFFICIENT   STD_ERROR   P-VALUE   SS
Constant term     -23.607       3.410       0         159255.8125
CRIM              -0.2611       0.0406      0         3756.92
CHAS              2.8866        1.4655      0.049     767.87
RM                7.508         0.5354      0         7997.099
So the equation will be
MEDV = -23.6071014 + (-0.2611129 * CRIM) + (2.8866 * CHAS) + (7.5081 * RM)
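As a rough sketch, the equation can be written as a small Python function. The function name `predict_medv` and the two example tracts are made up for illustration; the coefficients are the ones reported above, and MEDV is in $1000s.

```python
# Coefficients taken from the regression output above (MEDV is in $1000s).
INTERCEPT = -23.6071014
B_CRIM = -0.2611129
B_CHAS = 2.8866
B_RM = 7.5081

def predict_medv(crim, chas, rm):
    """Return the predicted MEDV (in $1000s) for one tract."""
    return INTERCEPT + B_CRIM * crim + B_CHAS * chas + B_RM * rm

# Two hypothetical tracts that differ only in the CHAS dummy.
away = predict_medv(crim=1.0, chas=0, rm=6.0)
river = predict_medv(crim=1.0, chas=1, rm=6.0)
print(round(river - away, 4))
```

Because CHAS is a 0/1 dummy, the gap between the two predictions is just the CHAS coefficient: holding CRIM and RM fixed, a tract bounding the river is predicted about $2,887 higher.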
------------------------------------------------------------------------------
C)
Regression equation is
MEDV = -23.6071014 + (-0.2611129 * CRIM) + (2.88669062 * CHAS) + (7.50815392 * RM)
MEDV = -23.6071014 + (-0.2611129 * 0.1) + (2.88669062 * 0) + (7.50815392 * 6)
MEDV = 21.41571
The predicted median house price is $21,415.71 (MEDV is measured in $1000s).
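Part (c) also asks for the prediction error. A common definition is error = actual MEDV - predicted MEDV. No observed MEDV is given here for this tract, so the sketch below uses a made-up placeholder (`hypothetical_actual`) purely to show the calculation; the helper names are assumptions as well.

```python
def predict_medv(crim, chas, rm):
    """Predicted MEDV in $1000s, using the coefficients from the equation above."""
    return -23.6071014 + (-0.2611129 * crim) + (2.88669062 * chas) + (7.50815392 * rm)

def prediction_error(actual, predicted):
    """Residual: observed value minus predicted value (both in $1000s)."""
    return actual - predicted

# Tract from part (c): not on the Charles River, crime rate 0.1, 6 rooms on average.
predicted = predict_medv(crim=0.1, chas=0, rm=6)

# Placeholder only: the real observed MEDV for such a tract is not given.
hypothetical_actual = 24.0
error = prediction_error(hypothetical_actual, predicted)
print(round(predicted, 5), round(error, 5))
```

With a real observed value from the validation set in place of the placeholder, the same subtraction gives the tract's prediction error.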