

Question

(1) Here is a list of 106 students. This simple data set includes students' grade (GPA) in the course, log-on minutes in the on-line course over a semester, and students' gender. Please use STATA for the following analysis. For this analysis, you will need to first specify a regression model.

(a) Please do a descriptive analysis on the data.
(b) Please conduct an ANOVA on grade and log-on minutes (exclude gender for now). Is log-on minutes a statistically significant variable for the variation of students' grade?
(c) Do a simple regression analysis of grade and log-on minutes (exclude gender for now; hence a simple regression). Does the regression result agree with ANOVA? How well does this simple regression explain students' grade?
(d) Now include gender in the model. Does gender hold explanatory power on students' grade?

Data:

Students  Total log-on Minutes in the course over a semester  Grade (GPA)  Gender
  1     857  4     M
  2     779  4     M
  3     484  3.67  M
  4     697  0     F
  5    1137  4     M
  6     598  4     M
  7     626  3     F
  8    3948  3.33  F
  9    5417  4     F
 10     208  4     F
 11   10076  3.67  M
 12     193  3.67  M
 13     455  3.67  M
 14     181  3     M
 15     382  2.67  F
 16     195  3.67  F
 17     839  3.67  M
 18     156  2.33  M
 19     164  3     F
 20     206  3.33  F
 21     223  4     M
 22     137  2.67  F
 23     414  3     M
 24     451  4     M
 25     553  2.67  M
 26     178  2.67  F
 27     115  2.67  F
 28     636  3.33  F
 29     111  2     F
 30    1154  4     F
 31     119  2     M
 32     121  2.33  M
 33     131  2     F
 34     112  2     M
 35     587  3     F
 36     102  4     M
 37     296  4     F
 38     466  4     M
 39     433  3.33  M
 40     620  4     F
 41     695  4     F
 42     120  4     F
 43     313  4     F
 44      61  3.67  M
 45     803  3.33  M
 46     738  3.67  F
 47     114  3.67  F
 48     174  3.67  F
 49     762  3.67  M
 50     449  4     M
 51    2322  4     F
 52    3048  4     F
 53    1596  4     M
 54     455  3.33  M
 55     832  3.67  F
 56    1291  4     F
 57    1087  4     M
 58     413  4     M
 59     336  3.67  M
 60    1132  3.67  M
 61    1026  4     M
 62     552  4     M
 63     798  4     F
 64     979  4     F
 65    1137  3.67  F
 66     767  4     M
 67    1650  4     F
 68      41  4     F
 69    1534  4     F
 70      96  3     M
 71     217  3.33  F
 72     117  4     M
 73    1300  4     F
 74    7329  3.67  F
 75     393  3     F
 76     106  3     M
 77    1123  3.33  M
 78    5971  4     F
 79    1262  3     M
 80     363  2     F
 81     371  4     M
 82      90  2     F
 83     199  2.33  M
 84      98  2.67  M
 85     126  2     M
 86     592  3     F
 87      86  2.33  F
 88     611  2.33  F
 89     164  2.33  F
 90     187  2.67  M
 91     107  2.33  M
 92      80  2     F
 93      64  2.33  F
 94     106  2.33  M
 95     301  3.33  M
 96     244  3.33  F
 97     115  2.33  F
 98      78  2.33  M
 99     617  3.67  M
100      86  2.67  M
101     349  2.33  M
102     238  2     F
103      85  2.33  M
104      79  2.33  F
105     462  2.33  F
106     300  2.33  F

Explanation / Answer

We cannot provide the solution in paid software such as Stata. However, we provide it using the open-source statistics package R; the concepts and the workings remain the same.

The complete R snippet is as follows

# read the data into an R data frame
# (forward slashes are used because "\U" is an invalid escape in an R string)
data.df <- read.csv("C:/Users/586645/Downloads/Chegg/students.csv", header = TRUE)
str(data.df)

# descriptive stats of the data
summary(data.df)


# perform anova analysis
# (as written, log-on minutes is the response and grade is the predictor)
a <- aov(lm(data.df$Total.log.on.Minutes.in.the.course.over.a.semester ~ data.df$Grade..GPA., data = data.df))

# summarise the results
summary(a)

# simple linear regression on the same two variables
fit <- lm(data.df$Total.log.on.Minutes.in.the.course.over.a.semester ~ data.df$Grade..GPA., data = data.df)
# summarise the results
summary(fit)


# perform anova analysis with gender and its interaction with grade added
a <- aov(lm(data.df$Total.log.on.Minutes.in.the.course.over.a.semester ~ data.df$Grade..GPA. * data.df$Gender, data = data.df))

# summarise the results
summary(a)

The results are

Descriptive stats are

> summary(data.df)
    Students      Total.log.on.Minutes.in.the.course.over.a.semester  Grade..GPA.
 Min.   :  1.00   Min.   :   41.0                                     Min.   :0.00
 1st Qu.: 27.25   1st Qu.:  132.5                                     1st Qu.:2.67
 Median : 53.50   Median :  387.5                                     Median :3.33
 Mean   : 53.50   Mean   :  791.5                                     Mean   :3.23
 3rd Qu.: 79.75   3rd Qu.:  776.0                                     3rd Qu.:4.00
 Max.   :106.00   Max.   :10076.0                                     Max.   :4.00
 Gender
 F:54
 M:52
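
The summary above omits standard deviations and any breakdown by gender, and the gap between the mean and median of log-on minutes (791.5 versus 387.5) points to strong right skew. The following is a minimal sketch of how those extra descriptives could be obtained. It assumes a small data frame named `students` with the hypothetical short column names `minutes`, `grade`, and `gender`, filled with only the first 12 rows of the data in the question, so its numbers will not match the full-data output.

```r
# First 12 rows of the data listed in the question (illustrative subset only;
# the data frame name and column names are assumptions, not the author's).
students <- data.frame(
  minutes = c(857, 779, 484, 697, 1137, 598, 626, 3948, 5417, 208, 10076, 193),
  grade   = c(4, 4, 3.67, 0, 4, 4, 3, 3.33, 4, 4, 3.67, 3.67),
  gender  = factor(c("M", "M", "M", "F", "M", "M", "F", "F", "F", "F", "M", "M"))
)

# Standard deviation of each numeric variable (summary() does not report it).
sds <- sapply(students[c("minutes", "grade")], sd)
print(sds)

# Mean log-on minutes and mean grade within each gender.
by_gender <- aggregate(cbind(minutes, grade) ~ gender, data = students, FUN = mean)
print(by_gender)

# A mean well above the median is a quick sign of right skew.
skew_gap <- mean(students$minutes) - median(students$minutes)
print(skew_gap)
```

With the real file, the same calls would be applied to `data.df` and its longer column names.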
  

> a<- aov(lm(data.df$Total.log.on.Minutes.in.the.course.over.a.semester~ data.df$Grade..GPA.,data=data.df))
> #summarise the results
> summary(a)
                     Df    Sum Sq  Mean Sq F value  Pr(>F)
data.df$Grade..GPA.   1  17193902 17193902   8.785 0.00376 **
Residuals           104 203537386  1957090
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

As the p value (0.00376) is less than 0.05, the association between grade and log-on minutes is statistically significant.

> fit <- lm(data.df$Total.log.on.Minutes.in.the.course.over.a.semester~ data.df$Grade..GPA.,data=data.df)
> summary(fit)

Call:
lm(formula = data.df$Total.log.on.Minutes.in.the.course.over.a.semester ~
    data.df$Grade..GPA., data = data.df)

Residuals:
    Min      1Q  Median      3Q     Max
-1149.2  -566.1  -243.9   -36.1  9056.6

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)           -880.0      580.1  -1.517  0.13227
data.df$Grade..GPA.    517.6      174.6   2.964  0.00376 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1399 on 104 degrees of freedom
Multiple R-squared: 0.0779,   Adjusted R-squared: 0.06903
F-statistic: 8.785 on 1 and 104 DF, p-value: 0.003765

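As the p value (0.003765) is less than 0.05, the linear regression is significant as well. This agrees with the ANOVA: with a single predictor, the F statistic is the square of the slope's t statistic (2.964^2 ≈ 8.785), so the two tests are identical. The fit is weak, however: an R-squared of 0.0779 means the model accounts for only about 7.8% of the variation. Note that the model was fitted with log-on minutes as the response and grade as the predictor, the reverse of what the question asks. In a simple regression the F test, p value, and R-squared are the same in either direction, but the intercept and slope are not.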
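
The question treats grade as the outcome, whereas the model above has log-on minutes on the left-hand side. A minimal sketch of the regression in the direction the question asks is given here, together with a check that ANOVA and regression agree. It assumes a data frame named `students` with the hypothetical short column names `minutes` and `grade`, holding only the first 12 rows of the data in the question, so the estimates will differ from the full-data results.

```r
# First 12 rows of the data listed in the question (illustrative subset only;
# the data frame name and column names are assumptions, not the author's).
students <- data.frame(
  minutes = c(857, 779, 484, 697, 1137, 598, 626, 3948, 5417, 208, 10076, 193),
  grade   = c(4, 4, 3.67, 0, 4, 4, 3, 3.33, 4, 4, 3.67, 3.67)
)

# Simple regression with grade as the response, as the question words it.
fit <- lm(grade ~ minutes, data = students)
fit_summary <- summary(fit)

# ANOVA table for the same model.
anova_table <- anova(fit)

# With one predictor, the ANOVA F statistic equals the squared slope t statistic.
f_value <- anova_table["minutes", "F value"]
t_value <- fit_summary$coefficients["minutes", "t value"]
agrees <- isTRUE(all.equal(f_value, t_value^2))
print(agrees)

# R-squared does not depend on which variable is the response.
reverse_summary <- summary(lm(minutes ~ grade, data = students))
same_r2 <- isTRUE(all.equal(fit_summary$r.squared, reverse_summary$r.squared))
print(same_r2)

# Share of the variation in grade explained by log-on minutes.
print(fit_summary$r.squared)
```

On the full data, only the coefficient estimates would change relative to the output shown earlier; the F statistic, p value, and R-squared would be the same.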

When gender and its interaction with grade are included as well:

> a<- aov(lm(data.df$Total.log.on.Minutes.in.the.course.over.a.semester~ data.df$Grade..GPA.*data.df$Gender,data=data.df))
> #summarise the results
> summary(a)
                                    Df    Sum Sq  Mean Sq F value  Pr(>F)
data.df$Grade..GPA.                  1  17193902 17193902   8.785 0.00378 **
data.df$Gender                       1   3425194  3425194   1.750 0.18884
data.df$Grade..GPA.:data.df$Gender   1    471077   471077   0.241 0.62477
Residuals                          102 199641115  1957266
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

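In these results grade is significant (p = 0.00378), but gender (p = 0.189) and the grade-by-gender interaction (p = 0.625) are not. Because this model again uses log-on minutes as the response, it strictly tests whether gender helps explain log-on minutes. Answering part (d) as asked requires refitting with grade as the response and log-on minutes and gender as predictors.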
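
A minimal sketch of part (d) with grade as the response is given below. It compares the simple model with one that adds gender, using a nested-model F test. The data frame `students` and the short column names `minutes`, `grade`, and `gender` are assumptions for illustration, and only the first 12 rows of the data in the question are used, so no conclusion about the full data should be read from its output.

```r
# First 12 rows of the data listed in the question (illustrative subset only;
# the data frame name and column names are assumptions, not the author's).
students <- data.frame(
  minutes = c(857, 779, 484, 697, 1137, 598, 626, 3948, 5417, 208, 10076, 193),
  grade   = c(4, 4, 3.67, 0, 4, 4, 3, 3.33, 4, 4, 3.67, 3.67),
  gender  = factor(c("M", "M", "M", "F", "M", "M", "F", "F", "F", "F", "M", "M"))
)

# Model without gender, then with gender added as a second predictor.
simple      <- lm(grade ~ minutes, data = students)
with_gender <- lm(grade ~ minutes + gender, data = students)

# Nested-model F test: does adding gender reduce the residual sum of squares
# by more than chance would suggest?
comparison <- anova(simple, with_gender)
print(comparison)

# The t test on the gender coefficient gives the same answer; "F" is the
# reference level, so the row is named "genderM".
coef_table <- summary(with_gender)$coefficients
print(coef_table)
```

A small p value in the second row of `comparison` would indicate that gender holds explanatory power for grade beyond log-on minutes.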