


Question

An analyst for the auto industry has asked for your help in modeling data on the prices of new cars. Interest centers on modeling suggested retail price as a function of the cost to the dealer for 234 new cars. The data set, which is available on the book website in the file cars04.csv, is a subset of the data from mstat.org/publications/jse/datasets/04cars.txt (Accessed March 12, 2007).

The first model fit to the data was

Suggested Retail Price = β0 + β1 Dealer Cost + e (3.10)

On the following pages is some output from fitting model (3.10) as well as some plots (Figure 3.46). Based on the output for model (3.10) the analyst concluded the following:

Since the model explains just more than 99.8% of the variability in Suggested Retail Price and the coefficient of Dealer Cost has a t-value greater than 412, model (1) is a highly effective model for producing prediction intervals for Suggested Retail Price.

Provide a detailed critique of this conclusion.

Explanation / Answer

a) In model 3.10 the p-value for the F statistic is very small, so this model fits significantly better than the null (intercept-only) model.
The p-value associated with the DealerCost variable is also highly significant.
The adjusted R-squared is 0.9986, which is very high.

But if we check the diagnostic plots produced for the model, we can clearly see that the assumptions of the linear model are violated.
The bottom-left plot, the square root of the absolute standardized residuals against Dealer Cost, shows a clear increasing trend, so the homoscedasticity (constant variance) assumption is violated. Ideally the points should form a random band around a horizontal line. The top-right plot shows the same thing: the size of the residuals increases as Dealer Cost increases, so the model does not capture the systematic variation present in the data. A high R-squared and a large t-value do not check this assumption. Prediction intervals from model 3.10 assume a constant error variance, so they will tend to be too wide for low-cost cars and too narrow for high-cost cars.
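The scale-location check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the data are synthetic (a made-up stand-in for cars04.csv with multiplicative errors), the helper names `fit_line` and `correlation` are hypothetical, and the residuals are standardized crudely by the residual standard error without a leverage correction.

```python
import math
import random


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


random.seed(0)
# ASSUMPTION: synthetic stand-in for cars04.csv, not the real data.
# Errors are multiplicative, so the spread of price grows with cost.
cost = [random.uniform(10_000, 80_000) for _ in range(234)]
price = [1.09 * c * math.exp(random.gauss(0, 0.02)) for c in cost]

b0, b1 = fit_line(cost, price)
resid = [p - (b0 + b1 * c) for c, p in zip(cost, price)]
resid_se = math.sqrt(sum(r * r for r in resid) / (len(resid) - 2))

# Rough standardization: divide by the residual standard error (leverage ignored).
scale_location = [math.sqrt(abs(r / resid_se)) for r in resid]

# A clearly positive value means the residual spread grows with cost.
trend = correlation(cost, scale_location)
print(f"correlation of sqrt|standardized residual| with cost: {trend:.2f}")
```

A correlation near zero would be consistent with constant variance. A clearly positive one, as this synthetic setup is built to produce, is the numerical counterpart of the upward trend in the plot.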

b) When homoscedasticity is violated, a common remedy is to transform the response variable. A log transformation often helps when the spread of the residuals grows with the level of the response.
So here we can log-transform Suggested Retail Price.
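To see why a log transformation can help, here is a small sketch that compares the residual spread before and after taking logs. It rests on assumptions: the data are synthetic with multiplicative errors (not the real cars04.csv), logs are taken of both variables as in the log-log slope discussed in part (d), and `spread_trend` is a hypothetical helper.

```python
import math
import random


def spread_trend(x, y):
    """Fit y = b0 + b1*x by least squares, then return the Pearson
    correlation between x and sqrt(|residual|)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    spread = [math.sqrt(abs(yi - (b0 + b1 * xi))) for xi, yi in zip(x, y)]
    mean_s = sum(spread) / n
    sxs = sum((xi - mean_x) * (si - mean_s) for xi, si in zip(x, spread))
    sss = sum((si - mean_s) ** 2 for si in spread)
    return sxs / math.sqrt(sxx * sss)


random.seed(1)
# ASSUMPTION: synthetic data, errors proportional to the price level.
cost = [random.uniform(10_000, 80_000) for _ in range(234)]
price = [1.09 * c * math.exp(random.gauss(0, 0.02)) for c in cost]

raw_trend = spread_trend(cost, price)
log_trend = spread_trend([math.log(c) for c in cost],
                         [math.log(p) for p in price])

print(f"raw scale trend: {raw_trend:.2f}")
print(f"log scale trend: {log_trend:.2f}")
```

On the raw scale the spread of the residuals rises with cost. On the log scale a multiplicative error becomes additive, so the trend should be close to zero.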

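c) In terms of fit, models 3.10 and 3.11 look almost the same: they have similar R-squared values and the predictor is significant at the same level in both. Note, however, that the two R-squared values are computed on different response scales, so they are not directly comparable.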

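d) For every 1 unit increase in log(DealerCost), log(SuggestedRetailPrice) increases by 1.014836 units. Because both variables are on the log scale, the slope is an elasticity: a 1% increase in Dealer Cost is associated with an increase of roughly 1.015% in Suggested Retail Price.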
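The slope in part (d) can be turned into percentage terms with a little arithmetic. The sketch below assumes a log-log model, so that multiplying Dealer Cost by a factor k multiplies the predicted price by k raised to the slope. The function name `pct_change_in_price` is made up for illustration.

```python
SLOPE = 1.014836  # coefficient of log(DealerCost) quoted in part (d)


def pct_change_in_price(pct_change_in_cost, slope=SLOPE):
    """Percent change in predicted price implied by a percent change in cost,
    assuming log(price) = b0 + slope * log(cost)."""
    factor = (1 + pct_change_in_cost / 100) ** slope
    return (factor - 1) * 100


one_pct = pct_change_in_price(1.0)
ten_pct = pct_change_in_price(10.0)

print(f"1% higher cost  -> {one_pct:.3f}% higher predicted price")
print(f"10% higher cost -> {ten_pct:.3f}% higher predicted price")
```

For small changes the percent change in price is close to the slope times the percent change in cost, which is why the slope is usually read directly as an elasticity.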

e) The Q-Q plot shows some extreme observations in the data. We can either remove these outliers or cap and floor them (winsorize) to improve the model's performance.
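Capping and flooring can be sketched as follows. This is one simple way to do it, not necessarily what the answer had in mind: the quantile cutoffs, the sample values, and the helper name `winsorize` are all illustrative assumptions, and whether to cap, remove, or keep extreme points remains a judgment call.

```python
def winsorize(values, lower=0.05, upper=0.95):
    """Clamp values below the lower quantile or above the upper quantile.

    Quantiles are taken as the nearest order statistic (no interpolation).
    """
    ordered = sorted(values)
    last = len(ordered) - 1
    floor = ordered[round(lower * last)]
    cap = ordered[round(upper * last)]
    return [min(max(v, floor), cap) for v in values]


# Hypothetical values with one extreme point at each end.
sample = [-9.0, -1.2, -0.8, -0.3, 0.0, 0.1, 0.4, 0.7, 1.1, 1.3, 12.0]
capped = winsorize(sample, lower=0.1, upper=0.9)

print(capped)
```

Only the two extreme points are pulled in to the nearest retained value, and the rest of the sample is left unchanged.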