
The Major League Baseball Data Set (collected from 2005) on the next tab des


Question

The Major League Baseball Data Set (collected from 2005) on the next tab gives descriptive statistics for all MLB teams in 2005. It might be interesting to test whether wins (a potential dependent variable) or attendance (a potential dependent variable) can be explained linearly by some combination of the variables listed in the dataset. Please develop a multiple regression scenario (that makes sense) from the dataset below and test it by following the project directions.

Team                 Seating Capacity   Salary ($)  Salary ($M)  Batting   ERA   HR  Error  Stolen Bases   Wins  Attendance
Boston                         33,871  123,505,125        123.5    0.281  4.74  199    109            45   95.0   2,847,798
New York Yankees               57,746  208,306,817        208.3    0.276  4.52  229     95            84   95.0   4,090,440
Oakland                        43,662   55,425,762         55.4    0.262  3.69  155     88            31   88.0   2,108,818
Baltimore                      48,262   73,914,333         73.9    0.269  4.56  189    107            83   74.0   2,623,904
Los Angeles Angels             45,050   97,725,322         97.7    0.270  3.68  147     87           161   95.0   3,404,636
Cleveland                      43,368   41,502,500         41.5    0.271  3.61  207    106            62   93.0   2,014,220
Chicago White Sox              44,321   75,178,000         75.2    0.262  3.61  200     94           137   99.0   2,342,804
Toronto                        50,516   45,719,500         45.7    0.265  4.06  136     95            72   80.0   2,014,995
Minnesota                      48,678   56,186,000         56.2    0.259  3.71  134    102           102   83.0   2,034,243
Tampa Bay                      44,027   29,679,067         29.7    0.274  5.39  157    124           151   67.0   1,141,915
Texas                          52,000   55,849,000         55.8    0.267  4.96  260    108            67   79.0   2,525,259
Detroit                        40,000   69,092,000         69.1    0.272  4.51  168    110            66   71.0   2,024,505
Seattle                        45,611   87,754,334         87.8    0.256  4.49  130     86           102   69.0   2,724,859
Kansas City                    40,529   36,881,000         36.9    0.263  5.49  126    125            53   56.0   1,371,181
Atlanta                        50,062   86,457,302         86.5    0.265  3.98  184     86            92   90.0   2,520,904
Arizona                        49,075   62,329,166         62.3    0.256  4.84  191     94            67   77.0   2,059,327
Houston                        42,000   76,799,000         76.8    0.256  3.51  161     89           115   89.0   2,805,060
Cincinnati                     42,059   61,892,583         61.9    0.261  5.15  222    104            72   73.0   1,923,254
New York Mets                  55,775  101,305,821        101.3    0.258  3.76  175    106           153   83.0   2,827,549
Pittsburgh                     38,127   38,133,000         38.1    0.259  4.42  139    117            73   67.0   1,817,245
Los Angeles Dodgers            56,000   83,039,000         83.0    0.253  4.38  149    106            58   71.0   3,603,680
San Diego                      42,445   63,290,833         63.3    0.257  4.13  130    109            99   82.0   2,869,787
Washington                     56,000   48,581,500         48.6    0.252  3.87  117     92            45   81.0   2,730,352
San Francisco                  40,800   90,199,500         90.2    0.261  4.33  128     90            71   75.0   3,181,020
St Louis                       49,625   92,106,833         92.1    0.270  3.49  170    100            83  100.0   3,542,271
Florida                        42,531   60,408,834         60.4    0.272  4.16  128    103            96   83.0   1,852,608
Philadelphia                   43,500   95,522,000         95.5    0.270  4.21  167     90           116   88.0   2,665,304
Milwaukee                      42,400   39,934,833         39.9    0.259  3.97  175    119            79   81.0   2,211,323
Chicago Cubs                   38,957   87,032,933         87.0    0.270  4.19  194    101            65   79.0   3,100,092
Colorado                       50,381   48,155,000         48.2    0.267  5.13  150    118            65   67.0   1,914,385

Explanation / Answer

Here Wins or Attendance is the dependent variable, and the remaining variables are the independent variables.

We have to fit a multiple regression model.

We can do this in MINITAB.

Steps:

Enter the data into a MINITAB worksheet --> Stat --> Regression --> Regression --> Response: Wins --> Predictors: select all the independent variables --> Results: select the second option --> OK --> OK
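For readers without MINITAB, the same kind of least-squares fit of Wins can be sketched in plain Python. This is only an illustration under stated assumptions: the names ROWS, NAMES, column_stats, solve and fit_ols are made up for this sketch, Salary ($) is left out because it repeats Salary ($M) in different units, and the predictors are standardized before solving the normal equations purely for numerical stability. Because one predictor is dropped, the coefficients will not match the MINITAB output exactly.

```python
from math import sqrt

# Columns: seating capacity, salary ($M), batting average, ERA, HR, errors,
# stolen bases, wins (copied from the question's data table).
ROWS = [
    (33871, 123.5, 0.281, 4.74, 199, 109, 45, 95),
    (57746, 208.3, 0.276, 4.52, 229, 95, 84, 95),
    (43662, 55.4, 0.262, 3.69, 155, 88, 31, 88),
    (48262, 73.9, 0.269, 4.56, 189, 107, 83, 74),
    (45050, 97.7, 0.270, 3.68, 147, 87, 161, 95),
    (43368, 41.5, 0.271, 3.61, 207, 106, 62, 93),
    (44321, 75.2, 0.262, 3.61, 200, 94, 137, 99),
    (50516, 45.7, 0.265, 4.06, 136, 95, 72, 80),
    (48678, 56.2, 0.259, 3.71, 134, 102, 102, 83),
    (44027, 29.7, 0.274, 5.39, 157, 124, 151, 67),
    (52000, 55.8, 0.267, 4.96, 260, 108, 67, 79),
    (40000, 69.1, 0.272, 4.51, 168, 110, 66, 71),
    (45611, 87.8, 0.256, 4.49, 130, 86, 102, 69),
    (40529, 36.9, 0.263, 5.49, 126, 125, 53, 56),
    (50062, 86.5, 0.265, 3.98, 184, 86, 92, 90),
    (49075, 62.3, 0.256, 4.84, 191, 94, 67, 77),
    (42000, 76.8, 0.256, 3.51, 161, 89, 115, 89),
    (42059, 61.9, 0.261, 5.15, 222, 104, 72, 73),
    (55775, 101.3, 0.258, 3.76, 175, 106, 153, 83),
    (38127, 38.1, 0.259, 4.42, 139, 117, 73, 67),
    (56000, 83.0, 0.253, 4.38, 149, 106, 58, 71),
    (42445, 63.3, 0.257, 4.13, 130, 109, 99, 82),
    (56000, 48.6, 0.252, 3.87, 117, 92, 45, 81),
    (40800, 90.2, 0.261, 4.33, 128, 90, 71, 75),
    (49625, 92.1, 0.270, 3.49, 170, 100, 83, 100),
    (42531, 60.4, 0.272, 4.16, 128, 103, 96, 83),
    (43500, 95.5, 0.270, 4.21, 167, 90, 116, 88),
    (42400, 39.9, 0.259, 3.97, 175, 119, 79, 81),
    (38957, 87.0, 0.270, 4.19, 194, 101, 65, 79),
    (50381, 48.2, 0.267, 5.13, 150, 118, 65, 67),
]
NAMES = ["Seating Capacity", "Salary ($M)", "Batting", "ERA", "HR",
         "Error", "Stolen Bases"]


def column_stats(values):
    """Return the mean and sample standard deviation of one column."""
    mean = sum(values) / len(values)
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return mean, sd


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows):
    """Fit the last column on the others; return intercept, slopes, R-Sq."""
    cols = list(zip(*rows))
    stats = [column_stats(c) for c in cols]
    z = [[(v - mu) / sd for v in c] for c, (mu, sd) in zip(cols, stats)]
    zx, zy = z[:-1], z[-1]
    k = len(zx)
    xtx = [[sum(p * q for p, q in zip(zx[i], zx[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(zx[i], zy)) for i in range(k)]
    beta = solve(xtx, xty)  # slopes on the standardized scale
    fitted = [sum(beta[j] * zx[j][i] for j in range(k)) for i in range(len(zy))]
    sse = sum((y - f) ** 2 for y, f in zip(zy, fitted))
    r_sq = 1 - sse / sum(y * y for y in zy)
    y_mean, y_sd = stats[-1]
    slopes = [b * y_sd / sd for b, (_, sd) in zip(beta, stats[:-1])]
    intercept = y_mean - sum(s * mu for s, (mu, _) in zip(slopes, stats[:-1]))
    return intercept, slopes, r_sq


intercept, slopes, r_sq = fit_ols(ROWS)
print(f"Constant: {intercept:.3f}")
for name, slope in zip(NAMES, slopes):
    print(f"{name}: {slope:.6g}")
print(f"R-Sq = {100 * r_sq:.1f}%")
```

Standardizing first keeps the normal equations well conditioned even though seating capacity is in the tens of thousands and batting average is below 1; R-Sq is unaffected by this rescaling.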

output :

————— 4/7/2017 9:59:04 PM ————————————————————

Regression Analysis: Wins versus Seating Capacity, Salary ($), ...

The regression equation is
Wins = 17.9 - 0.000012 Seating Capacity - 0.000027 Salary ($) + 26.9 Salary ($M)
+ 443 Batting - 14.1 ERA + 0.0954 HR - 0.126 Error + 0.0064 Stolen Bases


Predictor                Coef     SE Coef      T      P
Constant                17.92       42.26   0.42  0.676
Seating Capacity   -0.0000116   0.0001743  -0.07  0.948
Salary ($)        -0.00002684  0.00003938  -0.68  0.503
Salary ($M)             26.88       39.39   0.68  0.502
Batting                 443.5       162.2   2.73  0.012
ERA                   -14.091       2.065  -6.82  0.000
HR                    0.09542     0.03114   3.06  0.006
Error                 -0.1258      0.1106  -1.14  0.268
Stolen Bases          0.00635     0.02921   0.22  0.830


S = 4.82934 R-Sq = 85.6% R-Sq(adj) = 80.1%


Analysis of Variance

Source          DF       SS      MS      F      P
Regression       8  2914.23  364.28  15.62  0.000
Residual Error  21   489.77   23.32
Total           29  3404.00

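The regression model is significant overall: the F test statistic is 15.62 and its P-value is 0.000 (that is, below 0.0005).

The F-test is the overall test for the regression. It tests the null hypothesis that all of the slope coefficients are zero.

R-Sq = 85.6%

R-Sq is the proportion of the variation in the response variable (Wins) that is explained by the independent variables in the model. R-Sq(adj) = 80.1% adjusts this figure for the number of predictors.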
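The summary figures in the MINITAB output can be checked by hand from the Analysis of Variance table. The following is a minimal sketch of that arithmetic; the function name anova_summary and the dictionary keys are made up for illustration, and the inputs are the rounded sums of squares printed above, so the last digit can differ slightly from MINITAB's.

```python
from math import sqrt


def anova_summary(ss_regression, ss_error, df_regression, df_error):
    """Recompute R-Sq, R-Sq(adj), F and S from the ANOVA sums of squares."""
    ss_total = ss_regression + ss_error
    df_total = df_regression + df_error
    ms_regression = ss_regression / df_regression
    ms_error = ss_error / df_error
    return {
        "R-Sq": ss_regression / ss_total,
        "R-Sq(adj)": 1 - ms_error / (ss_total / df_total),
        "F": ms_regression / ms_error,
        "S": sqrt(ms_error),
    }


# Values taken from the Regression and Residual Error rows of the ANOVA table.
summary = anova_summary(2914.23, 489.77, 8, 21)
for name, value in summary.items():
    print(f"{name} = {value:.4f}")
```

R-Sq is SS(Regression) / SS(Total), the F statistic is MS(Regression) / MS(Residual Error), and S is the square root of MS(Residual Error).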

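We can test each individual slope with a t-test.

The P-values for Batting (0.012), ERA (0.000), and HR (0.006) are less than alpha (0.05).

These three variables are significant predictors of Wins.

The remaining variables, including Error (P = 0.268), have P-values greater than alpha.

These variables are not significant.

Note that Salary ($) and Salary ($M) are the same quantity in different units, so only one of them should be included as a predictor.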
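The t statistics in the coefficient table are each coefficient divided by its standard error, and a slope is significant at the 5% level when the absolute t value exceeds the critical value for 21 residual degrees of freedom. Here is a small sketch of that check; the names ESTIMATES, T_CRITICAL and t_ratios are made up for illustration, and the critical value 2.080 is taken from a standard t table rather than computed.

```python
# Two-sided 5% critical value of the t distribution with 21 degrees of
# freedom, read from a standard t table (an assumption of this sketch).
T_CRITICAL = 2.080

# (Coef, SE Coef) pairs copied from the MINITAB coefficient table.
ESTIMATES = {
    "Seating Capacity": (-0.0000116, 0.0001743),
    "Salary ($)": (-0.00002684, 0.00003938),
    "Salary ($M)": (26.88, 39.39),
    "Batting": (443.5, 162.2),
    "ERA": (-14.091, 2.065),
    "HR": (0.09542, 0.03114),
    "Error": (-0.1258, 0.1106),
    "Stolen Bases": (0.00635, 0.02921),
}


def t_ratios(estimates):
    """Return the t statistic (Coef / SE Coef) for each predictor."""
    return {name: coef / se for name, (coef, se) in estimates.items()}


ratios = t_ratios(ESTIMATES)
significant = [name for name, t in ratios.items() if abs(t) > T_CRITICAL]

for name, t in ratios.items():
    flag = "significant" if name in significant else "not significant"
    print(f"{name}: t = {t:.2f} ({flag})")
```

Comparing |t| with the critical value is equivalent to comparing the P-value with alpha, so this check flags the same predictors as the P column.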
