Find a data source online to use for a linear regression model. The data source
ID: 3375834 • Letter: F
Question
Find a data source online to use for a linear regression model. The data source must contain at least two quantitative variables, preferably both continuous in nature. This data set cannot be from another textbook or worked-out regression example. Think about topics that interest you, or use some data that's relevant to your work or personal life. If you are unsure whether your data set qualifies for this project, send it over to me and I'll review it for you.
To earn all of the points for this assignment, you must:
- Fit a linear model in R, and provide both the line of best fit and the R² value.
- Interpret what the R² value means in the context of the problem.
- Provide 95% confidence intervals for both the y-intercept and slope, and interpret these confidence intervals.
- Produce a 95% prediction interval for a reasonable x-value of your choice.
- Comment on how the linear regression model fits the three main modeling assumptions from the lecture notes.
- Discuss how you adjusted the model to meet the assumptions better, especially with regard to the assumption that the errors have constant variance.
Explanation / Answer
As per Chegg protocol, I solve only the first part.
> library(faraway)
Warning message:
package ‘faraway’ was built under R version 3.4.4
> str(divusa)
'data.frame': 77 obs. of 7 variables:
$ year : int 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 ...
$ divorce : num 8 7.2 6.6 7.1 7.2 7.2 7.5 7.8 7.8 8 ...
$ unemployed: num 5.2 11.7 6.7 2.4 5 3.2 1.8 3.3 4.2 3.2 ...
$ femlab : num 22.7 22.8 22.9 23 23.1 ...
$ marriage : num 92 83 79.7 85.2 80.3 79.2 78.7 77 74.1 75.5 ...
$ birth : num 118 120 111 110 111 ...
$ military : num 3.22 3.56 2.46 2.21 2.29 ...
> df <- data.frame(divusa[, 2:7])  ## keep the dependent variable (divorce) and the independent variables together; column 1 (year) is dropped
> head(df)
  divorce unemployed femlab marriage birth military
1     8.0        5.2  22.70     92.0 117.9   3.2247
2     7.2       11.7  22.79     83.0 119.8   3.5614
3     6.6        6.7  22.88     79.7 111.2   2.4553
4     7.1        2.4  22.97     85.2 110.5   2.2065
5     7.2        5.0  23.06     80.3 110.9   2.2889
6     7.2        3.2  23.15     79.2 106.6   2.1735
> model <- lm(divorce ~ ., data = df)  ## now we fit a multiple linear regression of divorce on all other columns
> summary(model)

Call:
lm(formula = divorce ~ ., data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-3.8611 -0.8916 -0.0496  0.8650  3.8300

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.48784    3.39378   0.733   0.4659
unemployed  -0.11125    0.05592  -1.989   0.0505 .
femlab       0.38365    0.03059  12.543  < 2e-16 ***
marriage     0.11867    0.02441   4.861 6.77e-06 ***
birth       -0.12996    0.01560  -8.333 4.03e-12 ***
military    -0.02673    0.01425  -1.876   0.0647 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.65 on 71 degrees of freedom
Multiple R-squared:  0.9208,	Adjusted R-squared:  0.9152
F-statistic: 165.1 on 5 and 71 DF,  p-value: < 2.2e-16
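The remaining parts of the question (confidence intervals, a prediction interval, and the assumption checks) are not worked in this answer. As a rough sketch only, they could be approached in R along the following lines using the same fitted model. This assumes the faraway package is installed. The choice of the predictor means as the "reasonable" x-values is my own illustrative assumption, as are the names ci, new_obs, and pred_int.

```r
library(faraway)  # assumed to be installed; provides the divusa data set

# Same fit as in the transcript: divorce regressed on the other five variables
df <- data.frame(divusa[, 2:7])
model <- lm(divorce ~ ., data = df)

# 95% confidence intervals for the intercept and each slope
ci <- confint(model, level = 0.95)
print(ci)

# Illustrative (assumed) predictor values: the sample mean of each predictor
new_obs <- as.data.frame(t(colMeans(df[, -1])))

# 95% prediction interval for one new observation at those predictor values
pred_int <- predict(model, newdata = new_obs,
                    interval = "prediction", level = 0.95)
print(pred_int)

# Quick visual checks of the error assumptions:
# residuals vs fitted values (constant variance) and a normal Q-Q plot
par(mfrow = c(1, 2))
plot(fitted(model), resid(model),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(resid(model))
qqline(resid(model))
```

A funnel shape in the residuals-versus-fitted plot would suggest non-constant variance, in which case refitting with a transformed response (for example log(divorce)) is one common adjustment to try. Because the years are consecutive, the residuals may also be correlated over time, which these two plots do not check.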