[4 pts] Suppose that a data scientist has 200 observations, 50 input variables, and
Question
[4 pts] Suppose that a data scientist has 200 observations, 50 input variables, and an output variable. He fit a random forest of regression trees. Based on the variable importance plot from the random forest, there appeared to be 10 important input variables (i.e., the 10 inputs have high importance values, while the other inputs have relatively low values). To improve interpretability, he built a linear regression model with the 10 input variables selected from the variable importance plot. However, the test error of the linear regression model was much higher than that of the random forest. Give at least two possible reasons why the linear regression model predicted poorly, and justify your answer briefly.
Explanation / Answer
Two possibilities are:
1. Non-linear relationship – The relationship between the output and the inputs may be non-linear, or may involve interactions among the inputs. A random forest can capture such structure, and its importance measure rewards variables that matter in that way, whereas a linear regression on the same 10 inputs fits only additive straight-line effects, so its test error will be high.
2. Heteroscedasticity – The variance of the error term is not constant across the values of the inputs. Ordinary least squares assumes constant error variance; when that assumption fails, the coefficient estimates are less efficient, and predictions can be poor in regions where the error variance is large.
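The first possibility can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are synthetic (one input, a quadratic relationship, Gaussian noise), and a piecewise-constant "binned mean" predictor stands in for a tree-based method. It is not a random forest, and the helper names `fit_line`, `fit_bins`, `mse`, and `simulate` are hypothetical, not taken from the question.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares with one input: returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_bins(xs, ys, n_bins=8, lo=-3.0, hi=3.0):
    """Piecewise-constant fit: mean of y inside equal-width bins of x.

    A crude stand-in for a regression tree (assumption for illustration).
    """
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int((x - lo) / width), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)  # fallback for an empty bin
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int((x - lo) / width), n_bins - 1)]

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def simulate(seed=0, n=200):
    """Generate y = x^2 + noise, split in half, and compare test errors."""
    rng = random.Random(seed)
    xs = [rng.uniform(-3, 3) for _ in range(n)]
    ys = [x * x + rng.gauss(0, 0.5) for x in xs]
    half = n // 2
    x_tr, y_tr, x_te, y_te = xs[:half], ys[:half], xs[half:], ys[half:]
    intercept, slope = fit_line(x_tr, y_tr)
    linear = lambda x: intercept + slope * x
    binned = fit_bins(x_tr, y_tr)
    return mse(linear, x_te, y_te), mse(binned, x_te, y_te)

linear_mse, binned_mse = simulate()
print(f"linear test MSE: {linear_mse:.2f}")
print(f"binned test MSE: {binned_mse:.2f}")
```

In this setup the input clearly matters, since the output is almost entirely determined by it, yet a straight line cannot represent the symmetric quadratic shape, so the linear fit should show a much larger test error than the piecewise-constant one. The same mechanism can make inputs that rank highly in a random forest's importance plot perform poorly in a plain linear regression.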