Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

JJ Plains Roofing and siding Company, Inc., sells roofing and siding products to

ID: 3220363 • Letter: J

Question

JJ Plains Roofing and siding Company, Inc., sells roofing and siding products to home repair retailers, such as Lowe's and Home Depot, and commercial contractors. The owner is interested in studying the effects of several variables on the value of shingles sold (000$). The marketing manager is arguing that the company should spend more money on advertising, while a market researcher suggests it should focus more on making its brand and product more distinct from its competitors. The company has divided the United States into 26 marketing districts. In each district is collected information on the following variables: volume of sales (in thousands of dollars), advertising dollars (in thousands), number of active accounts, number of competitor brands, and a rating of district potential. i. Develop a correlation matrix. Do you see any problem with multicollinearity? ii. Use the AIC forward selction procedure to determine the variables to include in the regression model for these data, with Sales as the dependent variable. Name the variables to be included in the final model. iii. Determine a regression equation, using sales as the dependent variable and the predictors determined from part(ii). Conduct a global test to dtermine whether any independendent variables are different from 0. iv. Interpret the regression coefficients in the context of these data, using the regression equation obtained in part (iii).

Explanation / Answer

I am using R software to solve the problem.

First we need to get this data loaded into R environment. So let us create 5 vectors corresponding to the 5 variables given as below:

Sales <- c(79.3,200.1,163.2,200.1,146,177.7,30.9,291.9,160,339.4,159.6,86.3,237.5,
107.2,155,291.4,100.2,135.8,223.2,195,73.4,47.7,140.7,93.5,259,331.2)
AdvDollars <- c(5.5,2.5,8,3,3,2.9,8,9,4,6.5,5.5,5,6,5,3.5,8,6,4,7.5,7,6.7,6.1,3.6,
4.2,4.5,5.6)
NoOfAccounts <- c(31,55,67,50,38,71,30,56,42,73,60,44,50,39,55,70,40,50,62,59,53,
38,43,26,75,71)
NoOfCompetitors <- c(10,8,12,7,8,12,12,5,8,5,11,12,6,10,10,6,11,11,9,9,13,13,9,8,8,4)
MarketPotential <- c(8,6,9,16,15,17,8,10,4,16,7,12,6,4,4,14,6,8,13,11,5,10,17,3,19,9)

#Creating a data frame holding all these 5 variables

InputData <- data.frame(Sales,AdvDollars,NoOfAccounts,NoOfCompetitors,MarketPotential)

1) Computing correlation matrix using cor() function in R

corMatrix <- cor(InputData)

Sales AdvDollars NoOfAccounts NoOfCompetitors MarketPotential
Sales 1.0000000 0.15881908 0.7828336 -0.83300320 0.4072504
AdvDollars 0.1588191 1.00000000 0.1725506 -0.03825157 -0.0705647
NoOfAccounts 0.7828336 0.17255062 1.0000000 -0.32428751 0.4682129
NoOfCompetitors -0.8330032 -0.03825157 -0.3242875 1.00000000 -0.2020604
MarketPotential 0.4072504 -0.07056470 0.4682129 -0.20206035 1.0000000

We can see that Sales has a high negative correlation with no. of competitors and positive correaltion with no. of accounts. But as Sales will be our dependent variable, that is not an issue.

2) For performing stepwise regression based on AIC value, stepAIC function be used from the MASS package.

model <- lm(Sales ~ . , data = InputData)
library(MASS)
step <- stepAIC(model, direction = "forward")
step$anova

Stepwise Model Path
Analysis of Deviance Table

Initial Model:
Sales ~ AdvDollars + NoOfAccounts + NoOfCompetitors + MarketPotential

Final Model:
Sales ~ AdvDollars + NoOfAccounts + NoOfCompetitors + MarketPotential


Step Df Deviance Resid. Df Resid. Dev AIC
1 21 1934.712 122.0501

According to forward selection, all the 4 variables are included in the model.

3) We can check the coefficients and the associated p values for each of the predictors using summary function

Call:
lm(formula = Sales ~ ., data = InputData)

Residuals:
Min 1Q Median 3Q Max
-19.0848 -5.9771 0.8958 6.5712 14.7982

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 178.3428 12.9522 13.769 5.54e-12 ***
AdvDollars 1.8046 1.0804 1.670 0.110
NoOfAccounts 3.3178 0.1628 20.381 2.57e-15 ***
NoOfCompetitors -21.1853 0.7874 -26.904 < 2e-16 ***
MarketPotential 0.3240 0.4675 0.693 0.496
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.598 on 21 degrees of freedom
Multiple R-squared: 0.9892,   Adjusted R-squared: 0.9871
F-statistic: 479.7 on 4 and 21 DF, p-value: < 2.2e-16

F statistic is significant which means overall the model is better than a null model.

The variables AdvDollars and MarketPotential are not statistically significant. NoOfAccounts and NoOfCompetitors are significant variables.

The regression equation will be:

Sales = 178.3428 + 1.8046*AdvDollars +3.3178*NoOfAccounts - 21.1853*NoOfCompetitors + 0.3240*MarketPotential

4) Interpretation of cofficients is as follows:

For every added account, Sales increase by 3.3178 thousands of dollars

For every comeptitor added, Sales goes down by 21.18 thousands of dollars.