Question
PLEASE INCLUDE A SUMMARY EXPLAINING AND DETAILING ALL THE DECISIONS!
A 10-year study conducted by the AHA provided data on how age, blood pressure, and smoking relate to the risk of strokes. Assume that the following data are from a portion of this study. Risk is interpreted as the probability (× 100) that the patient will have a stroke over the next 10-year period. For the smoking variable, define a dummy variable with 1 indicating a smoker and 0 indicating a nonsmoker.
a. Develop an estimated regression equation that relates risk of a stroke to the person's age, blood pressure, and whether the person is a smoker.
b. Is the model significant (or worth keeping)? What hypothesis test shows whether the model is significant? What is the deciding factor to keep or discard the model?
c. How much of the variability in risk of stroke can be explained by the model? What is this known as?
d. Is smoking a significant factor in the risk of a stroke? How do you know? Use alpha = 0.05.
e. What is the probability of a stroke over the next 10 years for Art Speen, a 68-year-old smoker who has blood pressure of 175? What action might the physician recommend for the patient?
Explanation / Answer
(a) I used R to fit the regression with the commands below.
The model is risk = B0 + B1 * age + B2 * pressure + B3 * smoker, where smoker is the dummy variable (1 = smoker, 0 = nonsmoker).
> model <- lm(risk ~ age+pressure+smoker)
> summary(model)
Call:
lm(formula = risk ~ age + pressure + smoker)
Residuals:
     Min       1Q   Median       3Q      Max
-13.1064  -1.5715   0.4225   3.4855   8.5561
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -91.75950   15.22276  -6.028 1.76e-05 ***
age           1.07674    0.16596   6.488 7.49e-06 ***
pressure      0.25181    0.04523   5.568 4.24e-05 ***
smoker1       8.73987    3.00082   2.912   0.0102 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.757 on 16 degrees of freedom
Multiple R-squared: 0.8735, Adjusted R-squared: 0.8498
F-statistic: 36.82 on 3 and 16 DF, p-value: 2.064e-07
The estimated regression equation is,
risk = -91.75950 + 1.07674 age + 0.25181 pressure + 8.73987 smoker
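The data table is not reproduced in this answer, so the fit cannot be rerun from the text alone. As a minimal sketch, assuming the study data sit in a data frame with columns risk, age, pressure, and smoker (the function name fit_risk_model and the data frame layout are assumptions for illustration, not part of the original commands), the dummy coding and fit might look like this:

```r
# Hypothetical helper: `stroke` is assumed to be a data frame with numeric
# columns risk, age, pressure and a smoker column coded 0/1.
fit_risk_model <- function(stroke) {
  # Treat smoker as a factor with 0 (nonsmoker) as the reference level,
  # so the coefficient is reported as "smoker1", as in the output above.
  stroke$smoker <- factor(stroke$smoker, levels = c(0, 1))
  lm(risk ~ age + pressure + smoker, data = stroke)
}
```

Calling summary(fit_risk_model(stroke)) on such a data frame would then print a coefficient table of the same shape as the one shown above.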
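(b) The test that shows whether the model is significant is the overall F test reported at the bottom of the summary output. The hypotheses are H0: B1 = B2 = B3 = 0 against Ha: at least one of B1, B2, B3 is not zero. The F-statistic is 36.82 on 3 and 16 DF with p-value 2.064e-07, which is far below the significance level 0.05, so we reject H0. The model is significant and worth keeping. The R-squared value of 0.8735 also suggests a good fit.
Running anova on the model gives the sequential F test for each variable:
Analysis of Variance Table
Response: risk
          Df  Sum Sq Mean Sq F value    Pr(>F)
age        1 1771.98 1771.98 53.4726 1.743e-06 ***
pressure   1 1607.66 1607.66 48.5138 3.185e-06 ***
smoker     1  281.10  281.10  8.4826   0.01017 *
Residuals 16  530.21   33.14
---
All three p-values are below the significance level 0.05, so age, pressure, and smoking are each significant variables in the model. The deciding factor for keeping or discarding the model as a whole is the p-value of the overall F test: keep the model if it is below 0.05, discard it otherwise. A single variable with a p-value above 0.05 would be a reason to consider dropping that variable, not the whole model.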
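The overall F test can be checked by hand from the ANOVA table. The sketch below uses only the sums of squares and degrees of freedom printed above; the variable names are chosen for illustration.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_terms <- c(age = 1771.98, pressure = 1607.66, smoker = 281.10)
ss_resid <- 530.21
df_model <- length(ss_terms)  # 3 predictors
df_resid <- 16

# Overall F statistic: model mean square over residual mean square.
ms_model <- sum(ss_terms) / df_model
ms_resid <- ss_resid / df_resid
f_stat <- ms_model / ms_resid

# Upper-tail p-value from the F distribution with (3, 16) degrees of freedom.
p_value <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

alpha <- 0.05
cat("F =", round(f_stat, 2), " p-value =", signif(p_value, 4), "\n")
cat("Model significant at alpha = 0.05:", p_value < alpha, "\n")
```

Because the inputs are rounded table values, the result should agree with the F-statistic of 36.82 in the summary output only up to rounding.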
(c) R-squared is 0.8735, so 87.35% of the variability in risk of stroke is explained by the model. This is known as the coefficient of determination.
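As a quick check, R-squared and adjusted R-squared can be recomputed from the sums of squares in the ANOVA table. The sample size n = 20 is not stated directly; it is inferred here from the 16 residual degrees of freedom plus the 4 estimated coefficients.

```r
# Sums of squares copied from the ANOVA table.
ss_model <- 1771.98 + 1607.66 + 281.10  # age + pressure + smoker
ss_resid <- 530.21
ss_total <- ss_model + ss_resid

n <- 20  # assumption: 16 residual df + 4 estimated coefficients
p <- 3   # number of predictors

# Share of total variability explained by the model.
r_squared <- ss_model / ss_total

# Adjusted R-squared penalizes for the number of predictors.
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

cat("R-squared =", round(r_squared, 4), "\n")
cat("Adjusted R-squared =", round(adj_r_squared, 4), "\n")
```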
(d) Yes. The t test for the smoker coefficient gives t = 2.912 with a p-value of 0.0102, which is below the significance level 0.05, so smoking is a significant factor in the risk of stroke.
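The t test for the smoking coefficient can be reproduced from the estimate and standard error in the coefficient table. The sketch below also computes a 95% confidence interval, which is an extra not asked for in the question; the variable names are illustrative.

```r
# Estimate and standard error for smoker1, copied from the coefficient table.
estimate <- 8.73987
std_error <- 3.00082
df_resid <- 16

# t statistic for H0: B3 = 0 and its two-sided p-value.
t_stat <- estimate / std_error
p_value <- 2 * pt(abs(t_stat), df_resid, lower.tail = FALSE)

# 95% confidence interval for the smoking coefficient.
ci <- estimate + c(-1, 1) * qt(0.975, df_resid) * std_error

cat("t =", round(t_stat, 3), " p-value =", round(p_value, 4), "\n")
cat("95% CI:", round(ci, 3), "\n")
```

An interval that excludes zero leads to the same conclusion as a p-value below 0.05.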
(e) The estimated regression equation is,
risk = -91.75950 + 1.07674 age + 0.25181 pressure + 8.73987 smoker
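So, risk = -91.75950 + 1.07674 * 68 + 0.25181 * 175 + 8.73987 * 1 = 34.26544
Since risk is the probability times 100, the estimated probability of a stroke over the next 10 years is about 0.34. In this model, being a smoker adds 8.74 points to the predicted risk and each additional unit of blood pressure adds about 0.25 points, so the physician might recommend that Art quit smoking and take steps to lower his blood pressure.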
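The arithmetic can be checked with a small helper built from the estimated coefficients. The function name predict_risk is an assumption for illustration, and the nonsmoker comparison is a what-if under the fitted model, not a result from the study.

```r
# Estimated coefficients copied from the regression output.
coefs <- c(intercept = -91.75950, age = 1.07674,
           pressure = 0.25181, smoker = 8.73987)

# Hypothetical helper: predicted risk (probability x 100) for one patient.
predict_risk <- function(age, pressure, smoker) {
  unname(coefs["intercept"] + coefs["age"] * age +
           coefs["pressure"] * pressure + coefs["smoker"] * smoker)
}

# Art Speen: 68 years old, blood pressure 175, smoker.
risk_smoker <- predict_risk(68, 175, 1)

# Same age and blood pressure, but as a nonsmoker.
risk_nonsmoker <- predict_risk(68, 175, 0)

cat("Predicted risk as a smoker:", risk_smoker, "\n")
cat("Predicted risk as a nonsmoker:", risk_nonsmoker, "\n")
cat("Estimated probability as a smoker:", risk_smoker / 100, "\n")
```

The gap between the two predictions equals the smoker coefficient, which is the model's estimate of the reduction in risk from not smoking at a fixed age and blood pressure.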