
Question

A department store wants to study the relationship between its monthly market share (Y) of a particular product and the predictor variables below. The data are in the file problem02.csv (reproduced below).

X1 = Price of the product, in dollars (quantitative)

X2 = 1 if the product was discounted, 0 if not discounted (categorical)

X3 = 1 if a promotion was present, 0 if no promotion (categorical)

X4 = Product rating score (quantitative)

All one-, two-, and three-variable models were considered for subset selection, backward elimination, and forward selection.

(d) In R, fit the full multiple regression model.

Display the summary. Write down the fitted equation, multiple R², and adjusted R².

(e) Using the regsubsets() function, perform subset selection over all possible subset models, starting with the model in (d). Show the three best subsets for each model size (nbest = 3). Display multiple R², adjusted R² (R²a), and Mallows' Cp for each model. Which model is best according to the R²a and Cp criteria?

(f) Using the stepAIC() function, perform the backward elimination and forward selection algorithms to find the best model based on AIC. Do these algorithms pick the same model as in (e)?

(g) Fit the best model selected in part (f). Write down the fitted equation. Calculate the PRESSp statistic for this model. Calculate (PRESSp)/n and MSEp. Is this model effective?

(h) Obtain studentized deleted residuals, hat values, DFFITS, Cook's D, and DFBETAS for each observation, based on the model you selected in part (f). Do any observations (months) stand out as outliers or influential cases? Give the critical cut-off value you use for each measure.

problem02.csv

market.share,price,discount,promotion,rating
3.15,2.198,1,1,498
2.52,2.186,0,0,510
2.64,2.293,1,1,422
2.55,2.42,0,1,858
2.69,2.179,1,0,566
2.38,2.207,0,0,536
3.02,2.127,1,1,585
2.52,2.206,1,0,310
2.45,2.305,0,0,211
2.42,2.26,0,1,504
3.16,2.205,1,1,234
2.6,2.34,0,0,347
2.98,2.171,1,0,430
2.5,2.201,0,1,518
2.45,2.248,0,0,465
4.36,2.74,0,1,900
3.06,2.184,1,1,684
2.34,2.373,0,0,152
2.88,2.157,1,1,453
2.94,2.129,1,1,485
2.72,2.557,1,0,78
2.27,2.587,0,1,72
2.33,2.255,0,0,391
2.64,2.124,1,0,322
2.21,3.99,1,1,652
2.76,2.683,1,1,317
3.05,2.336,1,1,252
2.48,2.266,0,1,446
2.23,2.443,0,0,521
2.65,2.478,1,1,435
2.56,2.394,1,0,402
2.66,2.414,1,1,468
2.99,2.233,1,0,262
2.3,2.302,0,1,182
2.88,2.421,1,1,145
2.8,2.518,1,0,270
2.48,2.497,0,1,322
2.85,2.781,1,1,317

Possible R Outline

########################
####### Part (d) #######
########################

# Fit the full multiple linear regression model with X1, X2, X3, and X4
fit3 <-
summary(fit3)

########################
####### Part (e) #######
########################

library(leaps) # needed for the regsubsets() function; you may have to install it with install.packages()

# Use the regsubsets() function to perform subset selection
subs <-
with(summary(subs), round(cbind(which,rsq,adjr2,cp),3)) # Just run this line of code

########################
####### Part (f) #######
########################

library(MASS) # needed for the stepAIC() function; you may have to install it with install.packages()

# Use the stepAIC() function to perform backward and forward model selection based on AIC
# For forward selection you'll need to define the intercept-only fit (~ 1) as the starting model
fit0 <-

########################
####### Part (g) #######
########################

library(qpcR) # needed for the PRESS() function; you may have to install it with install.packages()

# Fit the model using only the variables selected in part (f)
fit4 <-
summary(fit4)

# Run the PRESS function on this model, fit4
pr <-
pr # Display the results

# Calculate PRESS/n and MSE of fit4
n <- nrow(problem02) # number of observations
pr$stat/n
sigma(fit4)^2

########################
####### Part (h) #######
########################

p <- # set number of parameters in selected model from (f)

# Find the studentized deleted residuals, hat values, DFFITS, and Cook's distance
rstu <-
hats <-
df.fit <-
cooksd <-
round(cbind(rstudent = rstu, hat = hats, dffits = df.fit, cooksd = cooksd), 3)

# Value for comparing hat values: 2p/n
2*p/n

# Value to compare Cook's D: F(0.50, p, n-p)
qf(0.50, p, n-p)

# Calculate DFBETAS
df.beta <-
round(df.beta, 3)

Explanation / Answer

Load the CSV data into a data frame named problem02.

problem02 <- read.csv("problem02.csv")

Convert X2 (discount) and X3 (promotion) to factors so that R treats them as categorical variables.

problem02$discount <- as.factor(problem02$discount)

problem02$promotion <- as.factor(problem02$promotion)
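Because discount and promotion are already coded 0/1, as.factor() does not change the fitted model. With R's default treatment contrasts, each factor becomes a single 0/1 dummy named after its non-baseline level, which is why the coefficients in the output appear as discount1 and promotion1. A small check on an illustrative vector (the values in d_num are made up, not taken from problem02):

```r
# Illustrative 0/1 indicator (made-up values, not the problem02 data)
d_num <- data.frame(discount = c(1, 0, 1, 1, 0))
d_fac <- d_num
d_fac$discount <- as.factor(d_fac$discount)   # levels "0" (baseline) and "1"

# Design matrices: numeric coding vs. factor coding
mm_num <- model.matrix(~ discount, data = d_num)
mm_fac <- model.matrix(~ discount, data = d_fac)

print(colnames(mm_fac))            # "(Intercept)" "discount1"
same_columns <- all(mm_num == mm_fac)
print(same_columns)                # TRUE: same intercept and dummy columns
```

Only the column label changes, so fitted values and coefficients are identical under either coding.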

####### Part (d) #######

# Fit the full multiple linear regression model with X1, X2, X3, and X4

fit3 <- lm(market.share ~ ., data = problem02)

summary(fit3)


Call:
lm(formula = market.share ~ ., data = problem02)

Residuals:
Min 1Q Median 3Q Max
-0.45663303 -0.19002049 -0.05333401 0.15586346 1.51755260

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.7262349557 0.4333347326 6.29129 4.104e-07 ***
price -0.2360561628 0.1805193714 -1.30765 0.200029
discount1 0.2886599841 0.1142393211 2.52680 0.016486 *
promotion1 0.1482331840 0.1184460546 1.25148 0.219558
rating 0.0006830813 0.0003087363 2.21251 0.033962 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3400639 on 33 degrees of freedom
Multiple R-squared: 0.3046799,   Adjusted R-squared: 0.2203986
F-statistic: 3.615038 on 4 and 33 DF, p-value: 0.01500259

The fitted equation is

Ŷ = 2.7262 - 0.2361 X1 + 0.2887 X2 + 0.1482 X3 + 0.000683 X4

with multiple R² = 0.3047 and adjusted R² = 0.2204.
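As a cross-check, both R² values can be recomputed from sums of squares printed elsewhere in this answer: SSE = 3.8162335 for the full model (the "<none>" row at the start of backward elimination) and SSTO = 5.4884553 (the RSS of the intercept-only model at the start of forward selection), with n = 38 and p = 5 parameters. This is a sketch using those copied numbers; tiny differences in the last digits come from rounding in the printed output.

```r
# Sums of squares copied from the R output in this answer
sse  <- 3.8162335   # full model
ssto <- 5.4884553   # intercept-only model (total sum of squares)
n <- 38             # observations
p <- 5              # parameters: intercept + 4 predictors

r2     <- 1 - sse / ssto                       # multiple R^2
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p)     # adjusted R^2
mse    <- sse / (n - p)                        # residual mean square

print(round(c(r2 = r2, adj_r2 = adj_r2, rse = sqrt(mse)), 4))
```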

####### Part (e) #######

# Use the regsubsets() function to perform subset selection

library(leaps) # provides regsubsets()
subs <- regsubsets(market.share ~ ., data = problem02, nbest = 3)
summary(subs)
Subset selection object
Call: regsubsets.formula(market.share ~ ., data = problem02, nbest = 3)
4 Variables (and intercept)
Forced in Forced out
price FALSE FALSE
discount1 FALSE FALSE
promotion1 FALSE FALSE
rating FALSE FALSE
3 subsets of each size up to 4
Selection Algorithm: exhaustive
price discount1 promotion1 rating
1 ( 1 ) " " "*" " " " "   
1 ( 2 ) " " " " " " "*"   
1 ( 3 ) " " " " "*" " "   
2 ( 1 ) " " "*" " " "*"   
2 ( 2 ) " " "*" "*" " "   
2 ( 3 ) " " " " "*" "*"   
3 ( 1 ) "*" "*" " " "*"   
3 ( 2 ) " " "*" "*" "*"   
3 ( 3 ) "*" "*" "*" " "   
4 ( 1 ) "*" "*" "*" "*"   

Models are listed in order of size (the first column of the summary) and, within each size, in order of fit (best model first). Included variables are marked with an asterisk in quotes; variables not in a model have empty quotes.
So the best one-variable model contains discount alone, the best two-variable model contains discount and rating, and the best three-variable model contains price, discount, and rating.

with(summary(subs), round(cbind(which,rsq,adjr2,cp),3)) # Just run this line of code

  (Intercept) price discount1 promotion1 rating   rsq adjr2    cp
1           1     0         1          0      0 0.121 0.097 7.694
1           1     0         0          0      1 0.097 0.072 8.842
1           1     0         0          1      0 0.074 0.049 9.928
2           1     0         1          0      1 0.247 0.204 3.739
2           1     0         1          1      0 0.173 0.126 7.240
2           1     0         0          1      1 0.141 0.092 8.757
3           1     1         1          0      1 0.272 0.207 4.566
3           1     0         1          1      1 0.269 0.204 4.710
3           1     1         1          1      0 0.202 0.131 7.895
4           1     1         1          1      1 0.305 0.220 5.000

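Adjusted R² is largest for the full model with all four predictors X1, X2, X3, and X4 (R²a = 0.220), although the best three-variable model (0.207) and the discount + rating model (0.204) are close behind. Mallows' Cp favors a model whose Cp is small and close to its number of parameters p. The full model always has Cp = p = 5 by construction, so that value is not informative; the smallest Cp belongs to the two-variable model with discount (X2) and rating (X4), Cp = 3.739 with p = 3. The adjusted R² criterion therefore picks the full model, while the Cp criterion picks the discount + rating model.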
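Mallows' Cp for a model with p parameters is Cp = SSE_p / MSE_full - (n - 2p), where MSE_full is the MSE of the full model. The sketch below reproduces several rows of the table from SSE values printed in the stepAIC() output of part (f), then applies both criteria (largest adjusted R², smallest Cp). The model labels are names chosen here for readability.

```r
# SSE (RSS) values copied from the stepAIC() output in part (f)
sse <- c("discount"                  = 4.8216705,
         "rating"                    = 4.9544046,
         "discount+rating"           = 4.1329236,
         "price+discount+rating"     = 3.9973552,
         "discount+promotion+rating" = 4.0139778,
         "full"                      = 3.8162335)
p <- c(2, 2, 3, 4, 4, 5)                 # parameters per model, intercept included
n <- 38
ssto <- 5.4884553                        # RSS of the intercept-only model
mse_full <- sse[["full"]] / (n - 5)      # MSE of the full model (p = 5)

# Mallows' Cp and adjusted R^2 for each candidate
cp     <- sse / mse_full - (n - 2 * p)
adj_r2 <- 1 - (sse / (n - p)) / (ssto / (n - 1))

print(round(cbind(p = p, adj_r2 = adj_r2, cp = cp), 3))
print(names(which.max(adj_r2)))   # model with the largest adjusted R^2
print(names(which.min(cp)))       # model with the smallest Cp
```

The two criteria can disagree because adjusted R² rises whenever an added variable's partial F statistic exceeds 1, whereas Cp falls only when the drop in SSE exceeds twice MSE_full.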

####### Part (f) #######

# Use the stepAIC() function to perform backward and forward model selection based on AIC

# You'll need to define this fit vs. intercept only (1) for the forward model selection

Using backward elimination (stepAIC() is in the MASS package):

> library(MASS)
> stepAIC(fit3, direction = "backward")
Start: AIC=-77.34
market.share ~ price + discount + promotion + rating

Df Sum of Sq RSS AIC
- promotion 1 0.18112178 3.9973552 -77.574222
- price 1 0.19774434 4.0139778 -77.416531
<none> 3.8162335 -77.336245
- rating 1 0.56609628 4.3823297 -74.080216
- discount 1 0.73835112 4.5545846 -72.615170

Step: AIC=-77.57
market.share ~ price + discount + rating

Df Sum of Sq RSS AIC
- price 1 0.13556831 4.1329236 -78.306843
<none> 3.9973552 -77.574222
- rating 1 0.75191338 4.7492686 -73.024630
- discount 1 0.87427972 4.8716350 -72.057949

Step: AIC=-78.31
market.share ~ discount + rating

Df Sum of Sq RSS AIC
<none> 4.1329236 -78.306843
- rating 1 0.68874690 4.8216705 -74.449698
- discount 1 0.82148101 4.9544046 -73.417748

Call:
lm(formula = market.share ~ discount + rating, data = problem02)

Coefficients:
(Intercept) discount1 rating
2.2248257090 0.2997740407 0.0007300344

Backward elimination therefore selects the model with discount (X2) and rating (X4).

Using forward selection,

fit0 <- lm(market.share ~ 1, data = problem02)

> stepAIC(fit0, direction = "forward", scope=list(upper=fit3,lower=fit0))
Start: AIC=-71.53
market.share ~ 1

Df Sum of Sq RSS AIC
+ discount 1 0.66678481 4.8216705 -74.449698
+ rating 1 0.53405070 4.9544046 -73.417748
+ promotion 1 0.40850526 5.0799500 -72.466820
<none> 5.4884553 -71.527694
+ price 1 0.04324826 5.4452070 -69.828315

Step: AIC=-74.45
market.share ~ discount

Df Sum of Sq RSS AIC
+ rating 1 0.68874690 4.1329236 -78.306843
+ promotion 1 0.28384445 4.5378260 -74.755248
<none> 4.8216705 -74.449698
+ price 1 0.07240182 4.7492686 -73.024630

Step: AIC=-78.31
market.share ~ discount + rating

Df Sum of Sq RSS AIC
<none> 4.1329236 -78.306843
+ price 1 0.13556831 3.9973552 -77.574222
+ promotion 1 0.11894575 4.0139778 -77.416531

Call:
lm(formula = market.share ~ discount + rating, data = problem02)

Coefficients:
(Intercept) discount1 rating
2.2248257090 0.2997740407 0.0007300344

Forward selection arrives at the same model: discount (X2) and rating (X4).

Both AIC-based searches therefore agree with the Cp choice in part (e) (discount and rating, Cp = 3.739), but not with the full model, which has the largest adjusted R².
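For lm fits, stepAIC() relies on extractAIC(), which computes AIC = n log(SSE/n) + 2p with additive constants dropped. A minimal check that reproduces the AIC column above from the printed RSS values (the labels are names chosen here):

```r
# RSS values and parameter counts taken from the stepAIC() output above
rss <- c("intercept only"  = 5.4884553,
         "discount"        = 4.8216705,
         "discount+rating" = 4.1329236,
         "full"            = 3.8162335)
p <- c(1, 2, 3, 5)   # parameters, intercept included
n <- 38

aic <- n * log(rss / n) + 2 * p
print(round(aic, 3))
print(names(which.min(aic)))   # lowest AIC among these candidates
```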

####### Part (g) #######

# Fit the model selected in part (f): discount (X2) and rating (X4)
fit4 <- lm(market.share ~ discount + rating, data = problem02)

summary(fit4)

Using the coefficients reported by stepAIC() in part (f), the fitted equation is

Ŷ = 2.2248 + 0.2998 X2 + 0.000730 X4

This model has SSE = 4.1329 on n - p = 38 - 3 = 35 degrees of freedom, multiple R² = 0.247, and adjusted R² = 0.204 (see the regsubsets() table in part (e)).

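# Run the PRESS function (from the qpcR package) on this model, fit4
library(qpcR)
pr <- PRESS(fit4)

pr # Display the results

# Calculate PRESS/n and MSE of fit4
n <- nrow(problem02) # 38 observations

pr$stat/n

sigma(fit4)^2

For fit4, MSE_p = SSE/(n - p) = 4.1329236/35 ≈ 0.1181. The model is judged effective if PRESS_p/n is close to MSE_p, because that indicates the fitted MSE is not seriously understating the error in predicting new observations. A PRESS_p/n much larger than MSE_p would point to poor predictive ability.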
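If the qpcR package is unavailable, PRESS can be computed in base R from the leave-one-out identity e_(i) = e_i / (1 - h_ii), so PRESS_p = Σ [e_i / (1 - h_ii)]². A self-contained sketch for the part (f) model; only the columns that model uses are typed in from problem02.csv.

```r
# Columns of problem02.csv used by the part (f) model, entered inline
market.share <- c(3.15, 2.52, 2.64, 2.55, 2.69, 2.38, 3.02, 2.52, 2.45, 2.42,
                  3.16, 2.60, 2.98, 2.50, 2.45, 4.36, 3.06, 2.34, 2.88, 2.94,
                  2.72, 2.27, 2.33, 2.64, 2.21, 2.76, 3.05, 2.48, 2.23, 2.65,
                  2.56, 2.66, 2.99, 2.30, 2.88, 2.80, 2.48, 2.85)
discount <- c(1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1,
              1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1)
rating <- c(498, 510, 422, 858, 566, 536, 585, 310, 211, 504,
            234, 347, 430, 518, 465, 900, 684, 152, 453, 485,
            78, 72, 391, 322, 652, 317, 252, 446, 521, 435,
            402, 468, 262, 182, 145, 270, 322, 317)

dat  <- data.frame(market.share, discount = factor(discount), rating)
fit4 <- lm(market.share ~ discount + rating, data = dat)

n <- nrow(dat)
e <- residuals(fit4)    # ordinary residuals
h <- hatvalues(fit4)    # leverages h_ii

# PRESS from the deleted residuals e_i / (1 - h_ii)
press <- sum((e / (1 - h))^2)
mse   <- sum(e^2) / df.residual(fit4)   # equals sigma(fit4)^2

print(c(PRESS = press, PRESS_over_n = press / n, MSE = mse))
```

Computing PRESS this way avoids refitting the model n times; the test below confirms it matches an explicit leave-one-out loop.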

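####### Part (h) #######

A minimal sketch for part (h), assuming the part (f) model (discount + rating, so p = 3) and common textbook cut-offs: a Bonferroni critical value t(1 - α/(2n); n - p - 1) for studentized deleted residuals with α = 0.05 (an assumed level), 2p/n for hat values and the 50th percentile of F(p, n - p) for Cook's D (both as in the outline), and 1 for |DFFITS| and |DFBETAS| (the small-to-medium data set guidelines; 2√(p/n) and 2/√n are the large-sample alternatives). The data are the same inline columns as in the PRESS sketch.

```r
# Columns of problem02.csv used by the part (f) model, entered inline
market.share <- c(3.15, 2.52, 2.64, 2.55, 2.69, 2.38, 3.02, 2.52, 2.45, 2.42,
                  3.16, 2.60, 2.98, 2.50, 2.45, 4.36, 3.06, 2.34, 2.88, 2.94,
                  2.72, 2.27, 2.33, 2.64, 2.21, 2.76, 3.05, 2.48, 2.23, 2.65,
                  2.56, 2.66, 2.99, 2.30, 2.88, 2.80, 2.48, 2.85)
discount <- c(1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1,
              1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1)
rating <- c(498, 510, 422, 858, 566, 536, 585, 310, 211, 504,
            234, 347, 430, 518, 465, 900, 684, 152, 453, 485,
            78, 72, 391, 322, 652, 317, 252, 446, 521, 435,
            402, 468, 262, 182, 145, 270, 322, 317)

dat  <- data.frame(market.share, discount = factor(discount), rating)
fit4 <- lm(market.share ~ discount + rating, data = dat)
n <- nrow(dat)
p <- length(coef(fit4))            # 3 parameters

rstu    <- rstudent(fit4)          # studentized deleted residuals
hats    <- hatvalues(fit4)         # leverages
df.fit  <- dffits(fit4)
cooksd  <- cooks.distance(fit4)
df.beta <- dfbetas(fit4)           # one column per coefficient

diag_tab <- round(cbind(rstudent = rstu, hat = hats, dffits = df.fit,
                        cooksd = cooksd), 3)

# Cut-off values (alpha = 0.05 is an assumption for the Bonferroni test)
alpha <- 0.05
cuts <- c(rstudent = qt(1 - alpha / (2 * n), n - p - 1),
          hat      = 2 * p / n,
          dffits   = 1,
          cooksd   = qf(0.50, p, n - p),
          dfbetas  = 1)

# Observations exceeding at least one cut-off
flag <- abs(rstu) > cuts[["rstudent"]] | hats > cuts[["hat"]] |
        abs(df.fit) > cuts[["dffits"]] | cooksd > cuts[["cooksd"]] |
        apply(abs(df.beta) > cuts[["dfbetas"]], 1, any)

print(round(cuts, 3))
print(diag_tab[flag, , drop = FALSE])
print(round(df.beta[flag, , drop = FALSE], 3))
```

Each flagged row should then be examined against the individual cut-offs: a large studentized deleted residual marks an outlier in Y, a large hat value marks an outlier in the X space, and DFFITS, Cook's D, and DFBETAS indicate whether that case actually changes the fit.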