Statistical test of significance - Please give me SAS or R code. The months are
ID: 3340962 • Letter: S
Question
Statistical test of significance - Please give me SAS or R code.
The months are the birth months of members of Groups A, B, C. Thus, 178 is the number of people born in January that are in Group A.
Is there a statistical relationship between month of birth and membership in a certain group? Do a test of significance in SAS to support or reject this statement. Obviously, there is a relationship in which people born earlier in the year are more likely to be in a group, any group.
Group  Jan  Feb  Mar  Apr  May  June  Jul  Aug  Sep  Oct  Nov  Dec
A      178  190  164  176  169  137   112  119  138  122   98   97
B       79   71   81   86   79   67    61   56   45   50   56   45
C       31   31   23   34   25   36    17   21   22   17   22    9
Explanation / Answer
Here the number of people born is the dependent variable, and month and group are the independent variables.
So this is a multiple regression problem.
We can fit a multiple regression in R.
Let's denote groups A, B and C by 1, 2 and 3 respectively.
The months Jan, Feb, Mar, ..., Dec are denoted by 1, 2, 3, ..., 12 respectively.
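As a minimal sketch of this coding, the three vectors can be generated from the table in the question rather than typed by hand. The names `counts` and `dat` are hypothetical and used only for illustration; `month.abb` is R's built-in vector of month abbreviations.

```r
# Counts from the question's table: rows = groups A, B, C; columns = Jan..Dec.
counts <- matrix(
  c(178, 190, 164, 176, 169, 137, 112, 119, 138, 122, 98, 97,
     79,  71,  81,  86,  79,  67,  61,  56,  45,  50, 56, 45,
     31,  31,  23,  34,  25,  36,  17,  21,  22,  17, 22,  9),
  nrow = 3, byrow = TRUE,
  dimnames = list(group = c("A", "B", "C"), month = month.abb)
)

# as.vector() reads a matrix column by column, i.e. A, B, C within each month.
y  <- as.vector(counts)
x1 <- rep(1:3, times = 12)   # group code: 1 = A, 2 = B, 3 = C
x2 <- rep(1:12, each = 3)    # month code: 1 = Jan, ..., 12 = Dec

dat <- data.frame(y, x1, x2) # hypothetical name, used only in this sketch
head(dat)
```

The resulting `y`, `x1` and `x2` are in the same order as the hand-typed vectors in the R session.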
> y = c(178,79,31,190,71,31,164,81,23,176,86,34,169,79,25,137,67,36,112,61,17,119,56,21,138,45,22,122,50,17,98,56,22,97,45,9)
> x1= c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
> x2 = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8,9,9,9,10,10,10,11,11,11,12,12,12)
> model = lm(formula = y ~ x1 + x2)
> model
Call:
lm(formula = y ~ x1 + x2)

Coefficients:
(Intercept)           x1           x2
    223.535      -58.833       -4.476
> summary(model)
Call:
lm(formula = y ~ x1 + x2)

Residuals:
    Min      1Q  Median      3Q     Max
-25.918 -11.674  -1.408  13.863  34.249

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 223.5354     8.8686  25.205  < 2e-16 ***
x1          -58.8333     3.3448 -17.590  < 2e-16 ***
x2           -4.4755     0.7911  -5.657 2.65e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 16.39 on 33 degrees of freedom
Multiple R-squared:  0.9119, Adjusted R-squared:  0.9065
F-statistic: 170.7 on 2 and 33 DF,  p-value: < 2.2e-16
The regression equation is:
y = 223.535 - 58.833*x1 - 4.476*x2
Intercept = 223.535
Slopes = -58.833 (x1) and -4.476 (x2)
Interpretation of slopes: Holding x1 fixed, a one-unit increase in x2 is associated with a decrease of 4.476 units in y.
Holding x2 fixed, a one-unit increase in x1 is associated with a decrease of 58.833 units in y.
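The slope interpretation can be checked numerically. The sketch below refits the same model and compares predictions for two cells that differ only in x2; the data frame `new_cells` and the choice of group 1 with months 3 and 4 are arbitrary and only for illustration.

```r
# Same data and model as in the R session.
y  <- c(178,79,31,190,71,31,164,81,23,176,86,34,169,79,25,137,67,36,
        112,61,17,119,56,21,138,45,22,122,50,17,98,56,22,97,45,9)
x1 <- rep(1:3, times = 12)   # group code
x2 <- rep(1:12, each = 3)    # month code
model <- lm(y ~ x1 + x2)

# Two hypothetical cells: same group (x1 = 1), adjacent months (x2 = 3 and 4).
new_cells <- data.frame(x1 = c(1, 1), x2 = c(3, 4))
pred <- predict(model, newdata = new_cells)

# The difference between the two predictions equals the x2 slope (about -4.476).
slope_check <- unname(diff(pred))
slope_check
coef(model)["x2"]
```

Because the model is linear in x2, the same difference is obtained for any group and any pair of adjacent months.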
We can test two hypotheses here:
i) Overall significance:
Here we have to test the hypothesis that,
H0 : B1 = B2 = 0 Vs H1 : at least one Bj not= 0
where Bj is the population slope for the jth independent variable.
Assume alpha = level of significance = 5% = 0.05
Here the test statistic follows an F distribution with 2 and 33 degrees of freedom.
Test statistic = 170.7
P-value < 2.2e-16 (approximately 0)
P-value < alpha
Reject H0 at the 5% level of significance.
Conclusion: At least one of the slopes differs from 0.
The F-test gives a significant result.
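As a rough sketch, the overall F-test can be reproduced from the fitted model. `summary()` stores the F statistic and its degrees of freedom in the `fstatistic` component, and the p-value is the upper tail of the F distribution; the names `fstat` and `p_overall` are illustrative.

```r
# Same data and model as in the R session.
y  <- c(178,79,31,190,71,31,164,81,23,176,86,34,169,79,25,137,67,36,
        112,61,17,119,56,21,138,45,22,122,50,17,98,56,22,97,45,9)
x1 <- rep(1:3, times = 12)
x2 <- rep(1:12, each = 3)
model <- lm(y ~ x1 + x2)

# Named vector with elements "value", "numdf" and "dendf".
fstat <- summary(model)$fstatistic
fstat

# Upper-tail probability of the F distribution gives the overall p-value.
p_overall <- pf(fstat["value"], fstat["numdf"], fstat["dendf"],
                lower.tail = FALSE)

alpha <- 0.05
unname(p_overall < alpha)   # TRUE means H0 is rejected at the 5% level
```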
ii) Individual significance:
Here we have to test the hypothesis that,
H0 : B = 0 Vs H1 : B not= 0
where B is the population slope for a single independent variable.
Assume alpha = level of significance = 5% = 0.05
Here the test statistic follows a t distribution with 33 degrees of freedom.
For x1:
Test statistic = -17.590
P-value < 2e-16 (approximately 0)
P-value < alpha
Reject H0 at the 5% level of significance.
Conclusion: The population slope for x1 differs from 0.
In other words, there is some relationship between y and x1.
The result for x1 is significant.
Similarly, for x2:
Test statistic = -5.657
P-value = 2.65e-06 (approximately 0)
P-value < alpha
Reject H0 at the 5% level of significance.
Conclusion: The population slope for x2 differs from 0.
In other words, there is some relationship between y and x2.
The result for x2 is significant.
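The individual t-tests can be read off the coefficient table instead of being copied from the printed summary. The sketch below also recomputes the t statistics and two-sided p-values by hand; the names `coefs`, `t_manual`, `p_manual` and `reject` are illustrative.

```r
# Same data and model as in the R session.
y  <- c(178,79,31,190,71,31,164,81,23,176,86,34,169,79,25,137,67,36,
        112,61,17,119,56,21,138,45,22,122,50,17,98,56,22,97,45,9)
x1 <- rep(1:3, times = 12)
x2 <- rep(1:12, each = 3)
model <- lm(y ~ x1 + x2)

# Matrix with columns Estimate, Std. Error, t value, Pr(>|t|).
coefs <- coef(summary(model))

# t statistic = estimate / standard error; two-sided p-value from the
# t distribution with the residual degrees of freedom (33 here).
t_manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]
p_manual <- 2 * pt(abs(t_manual), df = df.residual(model), lower.tail = FALSE)

alpha <- 0.05
reject <- p_manual[c("x1", "x2")] < alpha   # TRUE means H0: B = 0 is rejected
reject
```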
R-sq = 0.9119
It expresses the proportion of variation in y that is explained by x1 and x2: about 91.2% here.
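To make this definition concrete, R-squared can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. This is a minimal sketch; `sse`, `sst` and `r2` are illustrative names.

```r
# Same data and model as in the R session.
y  <- c(178,79,31,190,71,31,164,81,23,176,86,34,169,79,25,137,67,36,
        112,61,17,119,56,21,138,45,22,122,50,17,98,56,22,97,45,9)
x1 <- rep(1:3, times = 12)
x2 <- rep(1:12, each = 3)
model <- lm(y ~ x1 + x2)

sse <- sum(residuals(model)^2)   # variation left unexplained by the model
sst <- sum((y - mean(y))^2)      # total variation in y
r2  <- 1 - sse / sst

r2                               # about 0.9119
summary(model)$r.squared         # value reported by summary()
```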