5. Using Nursing Salaries Run a multiple regression analysis using Nursing Salar
Question
5. Using the Nursing Salaries data, run a multiple regression analysis with the Nursing Salaries column as the dependent variable and the other columns as independent variables, and answer the following questions:
A. Evaluate the performance of the regression analysis as a whole.
B. What variables significantly contribute to the changes in Nursing Salaries?
C. Is there any multicollinearity present among the independent variables? How would you deal with a multicollinearity problem?
D. Using the estimated regression equation, predict the Nursing Salary for the following values for independent variables: Number of Beds = 200, Annual medical in-patient days (100s) = 400, Annual total patient days (100s) = 300, Rural (1) and non-rural (0) homes = 1
Number of beds in home | Annual medical in-patient days (100s) | Annual total patient days (100s) | Rural (1) and non-rural (0) homes | Annual nursing salaries ($100s)
137 | 128 | 385 | 0 | 5230
59 | 155 | 203 | 1 | 2459
120 | 281 | 392 | 0 | 6304
120 | 291 | 419 | 0 | 6590
120 | 238 | 363 | 0 | 5362
65 | 180 | 234 | 1 | 3622
120 | 306 | 372 | 1 | 4406
90 | 214 | 305 | 1 | 4173
96 | 155 | 169 | 0 | 1955
120 | 133 | 188 | 1 | 3224
62 | 148 | 192 | 0 | 2409
120 | 274 | 300 | 1 | 2066
116 | 154 | 321 | 0 | 5946
59 | 120 | 164 | 1 | 1925
80 | 261 | 284 | 1 | 4166
120 | 338 | 375 | 1 | 5257
80 | 77 | 133 | 1 | 1988
100 | 204 | 318 | 1 | 4156
60 | 97 | 213 | 1 | 1914
110 | 178 | 280 | 1 | 5173
120 | 232 | 336 | 0 | 4630
135 | 316 | 442 | 0 | 7489
59 | 163 | 191 | 1 | 2051
60 | 96 | 202 | 0 | 3803
25 | 74 | 83 | 1 | 2008
75 | 225 | 250 | 1 | 1288
64 | 91 | 214 | 1 | 4729
62 | 146 | 204 | 0 | 2367
108 | 255 | 366 | 1 | 5933
62 | 144 | 220 | 1 | 2782
90 | 151 | 286 | 0 | 4651
146 | 100 | 375 | 0 | 6857
62 | 174 | 189 | 1 | 2143
30 | 54 | 88 | 1 | 3025
79 | 213 | 278 | 0 | 2905
44 | 127 | 158 | 1 | 1498
120 | 208 | 423 | 0 | 6236
100 | 255 | 300 | 1 | 3547
49 | 110 | 177 | 1 | 2810
123 | 208 | 336 | 1 | 6059
82 | 114 | 136 | 1 | 1995
58 | 166 | 205 | 1 | 2245
110 | 228 | 323 | 1 | 4029
62 | 183 | 222 | 1 | 2784
86 | 62 | 200 | 1 | 3720
102 | 326 | 355 | 1 | 3866
135 | 157 | 471 | 0 | 7485
78 | 154 | 203 | 1 | 3672
83 | 224 | 390 | 1 | 3995
60 | 48 | 213 | 0 | 2820
54 | 119 | 144 | 1 | 2088
120 | 217 | 327 | 0 | 4432
Explanation / Answer
The data from the problem are copied to the clipboard and read into R. The columns Number of beds in home, Annual medical in-patient days (100s), Annual total patient days (100s), Rural (1) and non-rural (0) homes, and Annual nursing salaries ($100s) are renamed Bed, Medical_Days, Patient_Days, Rural, and Salary, respectively.
> nurse <- read.csv("clipboard",header=TRUE,sep=" ")
> head(nurse)
Bed Medical_Days Patient_Days Rural Salary
1 137 128 385 0 5230
2 59 155 203 1 2459
3 120 281 392 0 6304
4 120 291 419 0 6590
5 120 238 363 0 5362
6 65 180 234 1 3622
> nurselm <- lm(Salary~.,data=nurse)
> summary(nurselm)
Call:
lm(formula = Salary ~ ., data = nurse)
Residuals:
Min 1Q Median 3Q Max
-1825.4 -376.9 -171.5 642.2 1714.3
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 113.500 495.465 0.229 0.81980
Bed 9.640 7.080 1.361 0.17985
Medical_Days -7.407 2.401 -3.085 0.00341 **
Patient_Days 15.767 2.755 5.723 7.05e-07 ***
Rural -79.580 288.186 -0.276 0.78365
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 820.2 on 47 degrees of freedom
Multiple R-squared: 0.7749, Adjusted R-squared: 0.7557
F-statistic: 40.44 on 4 and 47 DF, p-value: 1.166e-14
(A) The model as a whole is significant: the F-statistic is 40.44 on 4 and 47 degrees of freedom, with a p-value of 1.166e-14, which is effectively zero. The adjusted R-squared is 0.7557, so about 76% of the variation in salaries is explained by the four independent variables, after adjusting for the number of predictors.
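As a cross-check on part (A), the overall p-value and both R-squared figures can be recovered from the F-statistic and its degrees of freedom alone. The sketch below uses only numbers copied from the summary() output; the variable names are illustrative and not part of the original session.

```r
# Values copied from the summary() output above.
f_stat <- 40.44            # overall F-statistic
k      <- 4                # numerator df = number of predictors
df_res <- 47               # residual (denominator) df
n      <- df_res + k + 1   # implied sample size

# Upper-tail probability of the F distribution gives the model p-value.
p_value <- pf(f_stat, k, df_res, lower.tail = FALSE)

# R-squared and adjusted R-squared implied by the F-statistic.
r2     <- (f_stat * k) / (f_stat * k + df_res)
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res

print(signif(p_value, 4))
print(round(c(R2 = r2, Adj_R2 = adj_r2), 4))
```

Because the F-statistic is rounded to two decimals in the printout, the recomputed p-value should agree with the reported 1.166e-14 only approximately.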
(B) From the summary above, Annual medical in-patient days (p = 0.00341) and Annual total patient days (p = 7.05e-07) are the only variables significant at the 5% level. Number of beds (p = 0.180) and the Rural indicator (p = 0.784) are not significant.
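The significance calls in part (B) come from two-sided t-tests on each coefficient with 47 residual degrees of freedom. A minimal sketch that reproduces them from the printed estimates and standard errors follows; the 5% cutoff is an assumption here, since the question does not state a significance level.

```r
# Estimates and standard errors copied from the coefficient table above.
coefs <- data.frame(
  term     = c("(Intercept)", "Bed", "Medical_Days", "Patient_Days", "Rural"),
  estimate = c(113.500, 9.640, -7.407, 15.767, -79.580),
  se       = c(495.465, 7.080, 2.401, 2.755, 288.186),
  stringsAsFactors = FALSE
)
df_res <- 47   # residual degrees of freedom from the summary

# t value = estimate / standard error; two-sided p-value from the t distribution.
coefs$t <- coefs$estimate / coefs$se
coefs$p <- 2 * pt(abs(coefs$t), df_res, lower.tail = FALSE)

# Terms significant at the assumed 5% level.
significant <- coefs$term[coefs$p < 0.05]

print(coefs)
print(significant)
```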
(C) Multicollinearity is present among the independent variables. The variables Annual medical in-patient days and Annual total patient days are strongly correlated. To quantify the problem, we compute variance inflation factors with the vif function from the faraway package:
> vif(nurselm)
Bed Medical_Days Patient_Days Rural
3.509246 2.377022 5.311486 1.452795
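Patient_Days has the largest VIF (5.31), just above the commonly used cutoff of 5, so the simplest remedy is to drop it and refit the model. Because Patient_Days is also the strongest predictor in the fit above, dropping Medical_Days instead is an alternative worth comparing.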
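To read the VIF values in part (C): each VIF equals 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing predictor j on the other predictors. The sketch below inverts that relation for the reported VIFs. The cutoff of 5 is a rule of thumb assumed for illustration (10 is also widely used), not something stated in the question.

```r
# VIF values copied from the vif(nurselm) output above.
vifs <- c(Bed = 3.509246, Medical_Days = 2.377022,
          Patient_Days = 5.311486, Rural = 1.452795)

# Invert VIF_j = 1 / (1 - R_j^2) to get the auxiliary R-squared for each predictor.
r2_aux <- 1 - 1 / vifs

# Flag predictors above the assumed rule-of-thumb cutoff.
vif_cutoff <- 5
flagged <- names(vifs)[vifs > vif_cutoff]

print(round(r2_aux, 3))
print(flagged)
```

Read this way, roughly 81% of the variation in Patient_Days is explained by the other three predictors, which is what pushes its VIF past the cutoff.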
(D) The prediction is carried out with:
> newdata <- data.frame(Bed=200,Medical_Days=400,Patient_Days=300,Rural=1)
> predict(nurselm,newdata)
1
3729.258
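The prediction in part (D) is just the estimated regression equation evaluated at the new values. A minimal sketch of that arithmetic, using the rounded coefficients from the summary, is given below; the vector names are illustrative. Since Salary is recorded in $100s, the result corresponds to roughly $372,900.

```r
# Rounded coefficients copied from the summary() output above.
coefs <- c("(Intercept)" = 113.500, Bed = 9.640, Medical_Days = -7.407,
           Patient_Days = 15.767, Rural = -79.580)

# New observation from part D; the intercept gets a constant 1.
x_new <- c("(Intercept)" = 1, Bed = 200, Medical_Days = 400,
           Patient_Days = 300, Rural = 1)

# Salary = b0 + b1*Bed + b2*Medical_Days + b3*Patient_Days + b4*Rural
pred <- sum(coefs * x_new[names(coefs)])

print(pred)         # in $100s; differs slightly from predict() due to rounding
print(pred * 100)   # in dollars
```

Note that Bed = 200 and Medical_Days = 400 both lie above the largest values in the data (146 and 338), so this prediction is an extrapolation and should be treated with caution.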