Question

Data for the following question come from http://lib.stat.cmu.edu/DASL/Datafiles/teacherpaydat.html (n = 51).

PAY, Y : average annual public school teacher salary, in dollars.

SPEND, X1: Spending on public schools per student, in dollars.

AREA: Region (NE/NC, South, West).

We want to see if geographic region and spending on public schools affect the average public teacher pay. A model with interaction was fitted, i.e.

$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 I_2 + \beta_3 I_3 + \beta_4 X_1 I_2 + \beta_5 X_1 I_3$,

where $I_2$ and $I_3$ are the dummy codes for AREA: $I_2 = 1$ if AREA = South, 0 otherwise, and $I_3 = 1$ if AREA = West, 0 otherwise.

The R output for the extra sums of squares is shown below.

Analysis of Variance Table

Response: PAY

            Df     Sum Sq    Mean Sq   F value     Pr(>F)
SPEND        1  608555015  608555015  117.7856  3.764e-14 ***
AREA         2   22606468   11303234    2.1877     0.1240
SPEND:AREA   2    9720281    _______    ______     ______
Residuals   45  232498501    5166633

1. Carry out a hypothesis test to see if the interaction terms are significant.

2. Regardless of your answer from part 1, suppose the interaction terms are dropped. The following is output from the model with just the first-order terms.

Coefficients:

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept) 1.160e+04   1.334e+03    8.690  2.43e-11 ***
SPEND       3.289e+00   3.176e-01   10.354  1.03e-13 ***
AREASouth   5.294e+02   7.669e+02    0.690    0.4934
AREAWest    1.674e+03   8.012e+02    2.089    0.0422 *

############################################
##Variance-Covariance matrix for beta hats##
############################################

              (Intercept)         SPEND      AREASouth       AREAWest
(Intercept)  1780535.6980  -393.5597348  -491859.07243  -2.381145e+05
SPEND           -393.5597     0.1008967       63.18227  -1.870101e+00
AREASouth    -491859.0724    63.1822716   588126.71689   2.442380e+05
AREAWest     -238114.5499    -1.8701007   244238.02959   6.418738e+05

What is the reference class for this model?

3. What is the estimate of $\beta_2$? Give an interpretation of this value.

4. Using the Bonferroni procedure, compute the 95% family confidence intervals for

the difference in mean response for PAY between teachers in the

(a) NE/NC region and the South region;

(b) NE/NC region and the West region;

(c) South region and the West region.

5. What do your intervals from part 4 indicate about the effect of geographic region

on mean annual salary for teachers?

Explanation / Answer

To complete the ANOVA table, use

MS = SS / df

and F = MS / MSE, where MSE is the residual mean square.

The P-value can be found in Excel with

=FDIST(x, deg_freedom1, deg_freedom2)

where x is the value of the test statistic, deg_freedom1 is the degrees of freedom for the effect being tested (main effect or interaction), and deg_freedom2 is the degrees of freedom for error.

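From the completed ANOVA table, the test statistic for the interaction (SPEND:AREA) is F = 0.9407 on 2 and 45 degrees of freedom, with P-value 0.3979.

The hypotheses are H0: $\beta_4 = \beta_5 = 0$ against H1: at least one of $\beta_4$, $\beta_5$ is nonzero.

Since P-value = 0.3979 > alpha = 0.05, we fail to reject H0 at the 5% level of significance.

Conclusion: the interaction terms are not significant.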
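The arithmetic behind the interaction test can be checked with a short Python sketch. The sums of squares and degrees of freedom are copied from the ANOVA output in the question. The helper name `f_upper_tail_df1_2` is hypothetical, and it stands in for Excel's FDIST only in the special case of 2 numerator degrees of freedom, where the upper-tail probability of the F distribution has a simple closed form.

```python
# Sums of squares and degrees of freedom copied from the ANOVA output.
SS_INTERACTION = 9720281
DF_INTERACTION = 2
SS_RESIDUAL = 232498501
DF_RESIDUAL = 45


def f_upper_tail_df1_2(f_value, df2):
    """Upper-tail probability P(F > f_value) for an F(2, df2) distribution.

    Uses the closed form (1 + 2 f / df2) ** (-df2 / 2), which is valid
    only when the numerator degrees of freedom equal 2.
    """
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)


ms_interaction = SS_INTERACTION / DF_INTERACTION  # MS = SS / df
mse = SS_RESIDUAL / DF_RESIDUAL                   # residual mean square
f_stat = ms_interaction / mse                     # F = MS / MSE
p_value = f_upper_tail_df1_2(f_stat, DF_RESIDUAL)

print(f"MS = {ms_interaction:.1f}, MSE = {mse:.3f}")
print(f"F = {f_stat:.4f}, p-value = {p_value:.4f}")
```

For other numerator degrees of freedom the closed form does not apply, and a statistics library or the Excel function would be needed instead.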

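With the interaction terms dropped, the first-order model output shows that SPEND is significant (p = 1.03e-13). Of the region indicators, AREASouth is not significant (p = 0.4934), while AREAWest is significant at the 5% level (p = 0.0422). Because the indicators are defined for South and West, the reference class is NE/NC.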

What is the estimate of $\beta_2$? Give an interpretation of this value.

$\beta_2$ is the coefficient of the South indicator (AREASouth), and its estimate is 5.294e+02 = 529.4.

The fitted regression equation is

estimated PAY = 1.160e+04 + 3.289e+00*SPEND + 5.294e+02*AREASouth + 1.674e+03*AREAWest

Interpretation: holding SPEND fixed, the estimated mean teacher salary in the South is about $529.40 higher than in the NE/NC region, which is the reference class. This difference is not statistically significant (p = 0.4934).
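The role of the AREASouth coefficient can be illustrated with a minimal Python sketch that evaluates the fitted first-order equation for each region. The coefficients are copied from the R output. The function name `fitted_pay` and the spending level of 3500 dollars are hypothetical, chosen only for illustration.

```python
# Coefficient estimates copied from the first-order model output.
INTERCEPT = 1.160e+04
B_SPEND = 3.289
B_SOUTH = 5.294e+02
B_WEST = 1.674e+03


def fitted_pay(spend, area):
    """Estimated mean PAY for a given SPEND and AREA ('NE/NC', 'South', 'West')."""
    i_south = 1 if area == "South" else 0  # dummy I2
    i_west = 1 if area == "West" else 0    # dummy I3
    return INTERCEPT + B_SPEND * spend + B_SOUTH * i_south + B_WEST * i_west


spend = 3500  # hypothetical spending level, for illustration only
for area in ("NE/NC", "South", "West"):
    print(area, round(fitted_pay(spend, area), 1))

# At any fixed SPEND, South minus NE/NC equals the AREASouth coefficient.
south_minus_reference = fitted_pay(spend, "South") - fitted_pay(spend, "NE/NC")
print("South - NE/NC:", round(south_minus_reference, 1))
```

Because the model has no interaction terms, the gap between regions is the same at every spending level, so the choice of 3500 does not affect the difference.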

             df     sum sq      mean sq         F           Pr
spend         1  608555015    608555015  117.7856  3.76392E-14
area          2   22606468     11303234  2.187737  0.123958948
spend*area    2    9720281    4860140.5  0.940678  0.397903425
residuals    45  232498501  5166633.356