Question
Summarize the conclusion of the sample data provided below:
Were you surprised by the results of the project, or did the summary verify what you suspected? Why or why not?
Discuss any lurking variables that might influence the results.
Make recommendations for further studies.
What did you learn?
What would you have done differently next time?
We shall use the open-source statistical package R to perform the analysis; the complete R snippet is shown below. Please note that it is not stated which of the variables is the explanatory variable, so we are considering Age as the explanatory variable.
# read the data into R dataframe
data.df <- read.csv("C:/Users/586645/Downloads/Chegg/marathon.csv", header = TRUE)
str(data.df)
# summary statistics
summary(data.df)
# plots
colr<-c("tomato","turquoise","violetred2" ,"cornflowerblue" ,"gainsboro" ,
"whitesmoke","yellow3","slateblue1","sienna3" , "wheat1",
"salmon3" , "plum2","coral1","palegreen1" ,"orangered" ,"magenta4" )
boxplot(Age~ Gender, data=data.df,ylab="Gender",
main="Boxplots of Age ~ Gender",col=colr,horizontal=TRUE)
boxplot(Time~ Gender, data=data.df,ylab="Gender",
main="Boxplots of Time ~ Gender",col=colr,horizontal=TRUE)
# histograms
hist(data.df$Age , col="red")
hist(data.df$Time,col="blue")
# stem and leaf plot
stem(data.df$Age)
stem(data.df$Time)
# scatterplot
plot(data.df$Age,data.df$Time,col="darkgreen", main = "Scatterplot of Age and Time",pch = 16)
lm(Age~Time,data = data.df)
################
The results are as follows.
The stem-and-leaf plots are:
stem(data.df$Age)
The decimal point is at the |
18 | 0
20 | 000
22 | 000
24 | 00000
26 | 00
28 | 0000000000
30 | 000000000000000000
32 | 00000000000000
34 | 000000000
36 | 00000000
38 | 00000000000
40 | 00000000000
42 | 000000000
44 | 0000000
46 | 0000
48 | 0000
50 | 00000000000000
52 | 000000
54 | 0000
56 |
58 | 00
60 | 000
62 |
64 | 0
66 |
68 | 0
stem(data.df$Time)
The decimal point is 3 digit(s) to the right of the |
9 | 6
10 | 24679
11 | 1233478899
12 | 01133679
13 | 01233466778899
14 | 11111112234455666777788999
15 | 000001222333446778
16 | 0122344445688888999
17 | 0022233333446677889
18 | 015666789
19 | 227
20 | 12377889
21 | 069
22 | 011
23 | 5
24 | 4
25 | 49
The linear regression is:
lm(Age~Time,data = data.df)
Call:
lm(formula = Age ~ Time, data = data.df)
Coefficients:
(Intercept)         Time
  3.146e+01    4.662e-04
Does a linear relation exist? Determine by comparing the correlation coefficient (r) with the critical value (from Table II in Appendix A).
# read the data into R dataframe
data.df <- read.csv("C:/Users/586645/Downloads/Chegg/marathon.csv", header = TRUE)
str(data.df)
cor(data.df$Age,data.df$Time)
cor.test(data.df$Age,data.df$Time)
The results are:
> cor(data.df$Age,data.df$Time)
[1] 0.1445136
As this value is low, there is at most a weak linear relationship between the two variables.
> cor.test(data.df$Age,data.df$Time)
Pearson's product-moment correlation
data: data.df$Age and data.df$Time
t = 1.7767, df = 148, p-value = 0.07767
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.01612133 0.29787629
sample estimates:
cor
0.1445136
Since the p-value (0.07767) is greater than 0.05, the correlation is not statistically significant.
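As a cross-check on the table-lookup approach mentioned above, the critical value of r can also be computed directly in R from the t distribution. This is only a sketch, assuming a two-tailed test at the 0.05 level with n = 150 runners (df = 148); the cutoff is computed rather than read from Table II.
# critical value of r for a two-tailed test at alpha = 0.05, df = n - 2
n <- nrow(data.df)               # 150 observations in this sample
t.crit <- qt(0.975, df = n - 2)  # two-tailed t cutoff
r.crit <- t.crit / sqrt(t.crit^2 + (n - 2))
r.crit                           # approximately 0.16 for df = 148
# |r| = 0.1445 is below the critical value, so we do not conclude that
# a linear relation exists between Age and Time
abs(cor(data.df$Age, data.df$Time)) > r.crit
This agrees with the p-value above: 0.07767 > 0.05.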
Identify the least-squares regression line and use the line to predict a y-value for an x-value you select that is within the scope of your sample data.
The least-squares regression line is:
> model <- lm(Age~Time,data = data.df)
> summary(model)
Call:
lm(formula = Age ~ Time, data = data.df)
Residuals:
Min 1Q Median 3Q Max
-20.1101 -8.2708 -0.7707 7.6042 26.7966
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.146e+01 4.247e+00 7.408 9.03e-12 ***
Time 4.662e-04 2.624e-04 1.777 0.0777 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 10.07 on 148 degrees of freedom
Multiple R-squared: 0.02088, Adjusted R-squared: 0.01427
F-statistic: 3.157 on 1 and 148 DF, p-value: 0.07767
> model$coefficients
(Intercept) Time
3.146362e+01 4.662208e-04
The regression line is therefore:
Age = 31.4636 + 0.0004662 * Time
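To illustrate the prediction the prompt asks for, the fitted line can be evaluated at a Time value inside the observed range (the stem-and-leaf plot of Time runs from roughly 9,600 to 25,900). This is a sketch only; Time = 15000 is an arbitrary example value, not one specified in the original analysis.
# predict Age for an example Time value within the scope of the sample data
model <- lm(Age ~ Time, data = data.df)
new.obs <- data.frame(Time = 15000)   # hypothetical x-value in the units of the Time column
predict(model, newdata = new.obs)
# by hand: 31.46362 + 0.0004662208 * 15000 is approximately 38.5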
Identify and interpret the coefficient of determination. Comment on the adequacy of the linear model.
The coefficient of determination is the R² value reported in the model summary above:
Multiple R-squared: 0.02088, Adjusted R-squared: 0.01427
F-statistic: 3.157 on 1 and 148 DF, p-value: 0.07767
The R² is very low: the model explains only about 2.1% of the variation in the data. Hence the linear model fits the data poorly, and there appears to be no meaningful linear relationship between the two variables.
Also, the p-value of the model is 0.07767, which is greater than 0.05, so we fail to reject the null hypothesis and conclude that the model is not statistically significant.
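One quick way to support the adequacy comment is to inspect the standard residual diagnostics for the fitted model. A minimal sketch; these plots were not part of the original output.
# residual diagnostics for the Age ~ Time model
model <- lm(Age ~ Time, data = data.df)
par(mfrow = c(2, 2))
plot(model)            # residuals vs fitted, normal Q-Q, scale-location, leverage
par(mfrow = c(1, 1))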
[Figure: Boxplots of Age ~ Gender]
Explanation / Answer
You have produced a very good solution, but further analysis could be carried out on this data. We could fit a regression model with a dummy variable, for example Gender, with Time as the response. Both the plots and the results show that the relationship is not linear, so a more suitable model for this data set may be a non-linear one. A time-series plot could also be examined. A variety of empirical approaches can be applied to this data; since the results do not satisfy the linear model, it is worth extending the analysis to other models.
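A rough sketch of the suggested extension, treating Gender as a dummy (factor) variable with Time as the response, plus a simple non-linear alternative. Variable names follow the marathon.csv columns used above; nothing below is a claimed result, only a template for further study.
# further study: model finishing Time using Age and a Gender dummy
data.df$Gender <- factor(data.df$Gender)
model2 <- lm(Time ~ Age + Gender, data = data.df)
summary(model2)        # does Gender shift the fit once Age is accounted for?
# a simple non-linear alternative: add a quadratic term in Age
model3 <- lm(Time ~ Age + I(Age^2) + Gender, data = data.df)
anova(model2, model3)  # compare the two fits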