A software quality engineer wants to develop a regression model for predicting t
Question
A software quality engineer wants to develop a regression model for predicting the number of faults in a program (y). The engineer decided that three software complexity metrics are potentially useful as independent variables: program size in lines of code (x_1), program control flow measured as the number of branches (x_2), and the number of tokens in the program (x_3). Data were collected on a set of 24 programs from a distributed application in a data communication software system. The ANOVA table for the model y_i = β_0 + β_1 x_{1i} + β_2 x_{2i} + β_3 x_{3i} + ε_i was found to be as follows:
Refer to problem (2). The software quality engineer decided to screen (select) the independent variables to determine the best set for predicting y. The regression sums of squares for all possible regression models were found to be as follows:
For the R^2 criterion, indicate which set of independent variables you would recommend as the best set for predicting y. An observer states: "There are only three variables, so why screen? You might as well use all three." Discuss.
Explanation / Answer
(a) By definition, R^2 = SSR/SST, and SST is the same for every candidate model because it depends only on y. The set with the largest regression sum of squares therefore has the largest R^2. Since adding a variable can never decrease SSR, that set is always the full set {x1, x2, x3}, so the R^2 criterion is applied by looking for the point where adding another variable gives only a small increase. Here the R^2 for {x1, x3} is not much different from that of {x1, x2, x3}, which means adding x2 contributes little, so we choose the set {x1, x3}.
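The reasoning in (a) can be sketched in a few lines of Python. This is only an illustration: the SSR and SST values below are hypothetical placeholders, not the values from the problem's table, and the `tolerance` cutoff for "not much different" is an assumed judgment call rather than part of the R^2 criterion itself.

```python
def r_squared_table(ssr_by_subset, sst):
    """Return {subset: R^2}, using R^2 = SSR / SST for each candidate model."""
    return {subset: ssr / sst for subset, ssr in ssr_by_subset.items()}


def smallest_adequate_subset(r2_by_subset, tolerance):
    """Pick the smallest subset whose R^2 is within `tolerance` of the maximum."""
    best_r2 = max(r2_by_subset.values())
    adequate = [s for s, r2 in r2_by_subset.items() if best_r2 - r2 <= tolerance]
    # Fewest variables first; ties broken by the higher R^2.
    return min(adequate, key=lambda s: (len(s), -r2_by_subset[s]))


# HYPOTHETICAL sums of squares, for illustration only.
sst = 1000.0
ssr = {
    ("x1",): 600.0,
    ("x2",): 550.0,
    ("x3",): 580.0,
    ("x1", "x2"): 700.0,
    ("x1", "x3"): 890.0,
    ("x2", "x3"): 720.0,
    ("x1", "x2", "x3"): 900.0,
}

r2 = r_squared_table(ssr, sst)
choice = smallest_adequate_subset(r2, tolerance=0.02)

# Print models from highest to lowest R^2, then the recommended subset.
for subset, value in sorted(r2.items(), key=lambda kv: -kv[1]):
    print(subset, round(value, 3))
print("recommended:", choice)
```

With a tolerance of zero the function always returns the full model, which is the observer's position. Any positive tolerance expresses how much R^2 one is willing to give up for a simpler model.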
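(b) Even with only three variables, screening is worthwhile. The independent variables may be multicollinear, which inflates the variance of the coefficient estimates and makes the fitted regression model less reliable. A variable that adds little explanatory power, as x2 does here, also uses up a degree of freedom without improving the predictions much. So it is better to screen, even with only three variables.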
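One common way to check the multicollinearity concern in (b) is the variance inflation factor (VIF), which the original answer does not use. It is shown here only as a possible diagnostic. The sketch below relies on the standard identity that VIF_j is the j-th diagonal entry of the inverse of the predictors' correlation matrix. The data are synthetic and generated under the assumption that branches and tokens both grow with program size. They are not the 24 programs from the problem.

```python
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns):
    """VIF for each predictor: diagonal of the inverse correlation matrix."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inv = invert(corr)
    return [inv[j][j] for j in range(len(columns))]


# SYNTHETIC predictors (assumption: x2 and x3 both track program size x1).
random.seed(0)
x1 = [random.uniform(100, 2000) for _ in range(24)]   # lines of code
x2 = [0.1 * v + random.gauss(0, 10) for v in x1]      # branches
x3 = [5.0 * v + random.gauss(0, 2000) for v in x1]    # tokens

for name, vif in zip(("x1", "x2", "x3"), vifs([x1, x2, x3])):
    print(name, round(vif, 2))
```

A VIF of 1 means a predictor is uncorrelated with the others. Large values indicate that the predictor is nearly a linear combination of the rest, which is the situation where dropping a variable such as x2 costs little R^2.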