Question

The owner of a restaurant in Bloomington, Indiana, has recorded sales data for the past 19 years. He has also recorded data on potentially relevant variables. The data are listed in the file A4P1Data.xls.

A4P1Data.xls link: https://drive.google.com/open?id=0BwCycs32_rqAOGZqU2p4b3p6bms

a. Estimate a simple linear regression model involving annual sales (the response variable) and the size of the population (the explanatory variable) residing within 10 miles of the restaurant. Interpret R².

b. Add another explanatory variable, annual advertising expenditures, to the regression model in part a. Estimate and interpret this expanded model. How does the R² value for this multiple regression model compare to that of the simple regression model estimated in part a? Explain any difference between the two R² values. Compute and interpret the adjusted R² value for the revised model.

c. Add one more explanatory variable to the multiple regression model estimated in part b. In particular, estimate and interpret the coefficients of a multiple regression model that includes the previous year's advertising expenditure. How does the inclusion of this third explanatory variable affect the R² and adjusted R² values, in comparison to the corresponding values for the model of part b? Explain any changes in these values.

d. For the model of part c, which of the explanatory variables have a significant effect on sales at the 10% significance level? Do any of these results surprise you? Explain why or why not.

Explanation / Answer

I use R to solve this problem.

First, import the data into the R environment and clean it:

InputData <- read.table("Data1.txt", sep = " ", header = TRUE, stringsAsFactors = FALSE)

# Remove special characters such as $ and , so the columns can be converted to numbers.
# Inside an R string the dollar sign must be escaped as "\\$" ("\$" is rejected by the parser);
# commas are stripped from every column in case any value uses thousands separators.
InputData$Sales <- gsub("\\$|,", "", InputData$Sales)
InputData$Population <- gsub(",", "", InputData$Population)
InputData$Advertising <- gsub("\\$|,", "", InputData$Advertising)
InputData$Previous_Advertising <- gsub("\\$|,", "", InputData$Previous_Advertising)

# Convert the cleaned variables to numeric data types
InputData$Sales <- as.numeric(InputData$Sales)
InputData$Population <- as.numeric(InputData$Population)
InputData$Advertising <- as.numeric(InputData$Advertising)
InputData$Previous_Advertising <- as.numeric(InputData$Previous_Advertising)
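To see why the dollar sign needs a double backslash, here is a small check of the same cleaning step on a few made-up strings (the values are hypothetical, not taken from A4P1Data.xls). In R's regular expressions an unescaped `$` is an end-of-string anchor, so it must be written as `\\$` inside a string literal to match a literal dollar sign.

```r
# Hypothetical raw values in the style of the spreadsheet ($ signs and thousands separators)
raw_values <- c("$1,234", "$56", "7,890")

# "\\$|," is the regex \$|, : a literal dollar sign OR a comma
cleaned <- gsub("\\$|,", "", raw_values)
numeric_values <- as.numeric(cleaned)
print(numeric_values)

# Without the escape, "$" only anchors the end of the string, so the dollar signs
# survive and as.numeric() returns NA for those entries (with a warning, suppressed here)
unescaped <- gsub("$|,", "", raw_values)
print(suppressWarnings(as.numeric(unescaped)))
```

If any entry still comes back as `NA` after cleaning the real data, it contains some other non-numeric character that the pattern needs to cover.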

a) Fitting a simple linear regression model involving annual sales (the response variable) and the size of the population within 10 miles (the explanatory variable)

Model1 <- lm(Sales ~ Population, data = InputData)
summary(Model1)

Call:
lm(formula = Sales ~ Population, data = InputData)

Residuals:
    Min      1Q  Median      3Q     Max
-5883.1 -1936.0   187.8  1390.8  5244.3

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.038e+03  1.250e+04   0.643    0.529
Population  6.721e-02  1.121e-01   0.600    0.557

Residual standard error: 2780 on 17 degrees of freedom
Multiple R-squared: 0.02071,   Adjusted R-squared: -0.0369
F-statistic: 0.3595 on 1 and 17 DF, p-value: 0.5567

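R² for this model is 0.02071: the population within 10 miles of the restaurant explains only about 2.1% of the variation in annual sales, so on its own it is a poor predictor of sales. The slope estimate (about 0.0672 additional units of sales per additional resident) is not significantly different from zero (p = 0.557), and the adjusted R² is even negative (-0.0369).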
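R² and the overall F-statistic are tied together by the sample size and the number of predictors. As a consistency check that uses only numbers printed in the Model1 output (n = 19 years, k = 1 predictor), the identity F = (R²/k) / ((1 - R²)/(n - k - 1)) should reproduce F ≈ 0.3595, and in a simple regression F should also equal the square of the slope's t value. This is a sketch on the rounded summary values, not a refit of the data.

```r
# Summary values copied from the Model1 output above
n  <- 19        # years of sales data
k  <- 1         # explanatory variables in Model1 (Population)
r2 <- 0.02071   # Multiple R-squared
t_population <- 0.600  # t value of the Population slope

# Overall F-statistic implied by R-squared: F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
f_from_r2 <- (r2 / k) / ((1 - r2) / (n - k - 1))

# Upper-tail p-value of F on (k, n - k - 1) degrees of freedom
p_value <- pf(f_from_r2, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

cat(sprintf("R^2 = %.1f%% of variance explained\n", 100 * r2))
cat(sprintf("F = %.4f (t^2 = %.4f), p = %.4f\n", f_from_r2, t_population^2, p_value))
```

The tiny R² and the large p-value tell the same story: population alone explains almost none of the year-to-year variation in sales.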

b) Adding a second explanatory variable, annual advertising expenditures, to the regression model in part a

Model2 <- lm(Sales ~ Population + Advertising, data = InputData)
summary(Model2)

Call:
lm(formula = Sales ~ Population + Advertising, data = InputData)

Residuals:
   Min     1Q Median     3Q    Max
 -4058  -1571   -133   1056   5384

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.279e+03  1.111e+04   0.385   0.7051
Population  6.687e-02  9.866e-02   0.678   0.5076
Advertising 1.643e+02  6.740e+01   2.438   0.0268 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2447 on 16 degrees of freedom
Multiple R-squared: 0.286,   Adjusted R-squared: 0.1967
F-statistic: 3.204 on 2 and 16 DF, p-value: 0.06755

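R² rises from 0.02071 in Model 1 to 0.286 in Model 2. Adding a variable to a least-squares regression can never lower R², but this increase is substantial: population and advertising together explain about 28.6% of the variation in annual sales, compared with about 2.1% for population alone, so advertising accounts for most of the additional explained variation. Holding population fixed, each one-unit increase in advertising expenditure is associated with an estimated 164.3-unit increase in annual sales, and this effect is significant at the 5% level (p = 0.0268); population remains insignificant (p = 0.5076). The adjusted R² of 0.1967 is lower than R² because it penalizes the additional parameter (n = 19 observations, k = 2 explanatory variables), but it is well above Model 1's adjusted R² of -0.0369, so advertising improves the model even after accounting for the lost degree of freedom.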
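The adjusted R² values reported by `summary()` can be reproduced by hand from the formula 1 - (1 - R²)(n - 1)/(n - k - 1). The sketch below applies it to the R² values copied from the Model1 and Model2 outputs; `adjusted_r2` is a helper name introduced here for illustration, not a base R function.

```r
# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1),
# where n is the number of observations and k the number of explanatory variables
adjusted_r2 <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

n <- 19  # years of sales data

# R-squared values copied from the Model1 and Model2 outputs above
adj_model1 <- adjusted_r2(0.02071, n = n, k = 1)
adj_model2 <- adjusted_r2(0.286,   n = n, k = 2)

print(c(model1 = adj_model1, model2 = adj_model2))
```

Both results agree with the printed values (-0.0369 and 0.1967) up to the rounding of R² in the output. Because the penalty factor (n - 1)/(n - k - 1) grows with k, a weak predictor can push adjusted R² below zero, as happens for Model 1.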

c) Adding the previous year's advertising expenditure to the model in part b

Model3 <- lm(Sales ~ Population + Advertising + Previous_Advertising, data = InputData)
summary(Model3)

Call:
lm(formula = Sales ~ Population + Advertising + Previous_Advertising,
data = InputData)

Residuals:
    Min      1Q  Median      3Q     Max
-504.50 -123.89  -16.62  152.06  748.05

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)          -4770.2049  1580.4333  -3.018  0.00864 **
Population               0.1011     0.0138   7.327 2.50e-06 ***
Advertising            120.2392     9.5221  12.627 2.15e-09 ***
Previous_Advertising   269.3897     9.4738  28.435 1.83e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 341 on 15 degrees of freedom
Multiple R-squared: 0.987,   Adjusted R-squared: 0.9844
F-statistic: 379.5 on 3 and 15 DF, p-value: 2.312e-14

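R² and adjusted R² for this model are 0.987 and 0.9844, respectively, up from 0.286 and 0.1967 in Model 2. The model now explains about 98.7% of the variation in annual sales, and the residual standard error drops from 2447 to 341. This gain is far larger than the small mechanical increase in R² that any added variable produces, and adjusted R² rises almost as much as R², so the previous year's advertising genuinely adds explanatory power. All three explanatory variables are significant. Holding the other variables fixed, each additional resident within 10 miles is associated with about 0.101 more in annual sales, each unit of current-year advertising with about 120.2 more, and each unit of the previous year's advertising with about 269.4 more, which suggests that advertising affects sales with a lag.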
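Because Model2 is nested in Model3, the jump in R² can be tested formally with a partial F-test; with the data loaded, `anova(Model2, Model3)` runs this test directly. The sketch below approximates it from the rounded R² values printed above, and checks that for a single added variable the partial F equals the square of that variable's t value (28.435² ≈ 808.5).

```r
n <- 19
r2_reduced <- 0.286   # Model2: Population + Advertising
r2_full    <- 0.987   # Model3: adds Previous_Advertising
k_full     <- 3       # explanatory variables in Model3
q          <- 1       # number of variables added

# Partial F-statistic for H0: coefficient on Previous_Advertising = 0
partial_f <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / (n - k_full - 1))

t_prev_adv <- 28.435  # t value of Previous_Advertising in Model3
p_value <- pf(partial_f, df1 = q, df2 = n - k_full - 1, lower.tail = FALSE)

cat(sprintf("partial F = %.1f, t^2 = %.1f, p = %.2e\n",
            partial_f, t_prev_adv^2, p_value))
```

The two F values differ slightly only because R² is printed to three decimals; both imply an overwhelmingly significant improvement.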

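d) For the model in part c, all three explanatory variables have a significant effect on sales at the 10% significance level; in fact, every p-value is below 0.001. The result for population may be surprising at first: it was far from significant in parts a and b (p = 0.557 and p = 0.5076) but is highly significant here (p = 2.50e-06). Its coefficient rises from 0.0669 to 0.1011, but the bigger change is in precision: its standard error falls from 0.0987 to 0.0138, roughly in line with the drop in residual standard error. Without the previous year's advertising, much of the variation in sales was left unexplained, and that noise masked the effect of population. Adding this important variable therefore improved the model drastically.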
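The 10% check and the standard-error argument can be made explicit with the numbers from the outputs above. With Model3 in memory, `summary(Model3)$coefficients[, "Pr(>|t|)"] < 0.10` performs the same comparison directly; the sketch below uses copied values so it runs without the data file.

```r
# p-values of the explanatory variables in Model3, copied from the output above
p_model3 <- c(Population = 2.50e-06, Advertising = 2.15e-09,
              Previous_Advertising = 1.83e-14)
alpha <- 0.10
significant <- p_model3 < alpha
print(significant)

# How much Population's standard error shrank from Model2 to Model3,
# compared with how much the residual standard error shrank
se_ratio  <- 9.866e-02 / 0.0138  # SE of Population: Model2 / Model3
rse_ratio <- 2447 / 341          # residual standard error: Model2 / Model3
cat(sprintf("SE ratio = %.2f, RSE ratio = %.2f\n", se_ratio, rse_ratio))
```

The two ratios are nearly equal, which supports the reading that population became significant mainly because the unexplained noise shrank, not because its estimated effect changed dramatically.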
