Question

The variables in the data set are:

attend: number of classes attended during the semester (out of 32 total)

termGPA: the student’s GPA for that semester

priGPA: the student’s cumulative GPA prior to the semester

ACT: the student’s ACT score

final: the student’s final exam score

hwrte: the student’s homework submission rate (%)

fresh: dummy variable, = 1 if freshman, 0 otherwise

soph: dummy variable, = 1 if sophomore, 0 otherwise

1. Suppose you would like to know what impact attendance has on the final score. A colleague proposes estimating the model FINAL = B0 + B1 ATTEND + u.

(a) Is this a reasonable model to address this research question? Explain why or why not. If you think there may be a problem with this model, propose a better alternative and explain why it is better.
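The colleague's model can be estimated by ordinary least squares. Below is a minimal sketch, assuming the observations are available as two plain Python lists; the function name `simple_ols` is hypothetical. In Stata the same estimates come from `regress final attend`.

```python
from statistics import fmean


def simple_ols(attend, final):
    """Return (b0, b1) for FINAL = B0 + B1*ATTEND + u by ordinary least squares."""
    if len(attend) != len(final) or len(attend) < 2:
        raise ValueError("need two equal-length lists with at least two observations")
    x_bar = fmean(attend)
    y_bar = fmean(final)
    # Slope: sum of cross-deviations divided by sum of squared deviations of attend
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(attend, final))
    sxx = sum((x - x_bar) ** 2 for x in attend)
    if sxx == 0:
        raise ValueError("attend has no variation, so B1 is not identified")
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    b0 = y_bar - b1 * x_bar
    return b0, b1
```

The estimated b1 is the change in the expected final score associated with attending one more class; on its own it does not show whether attendance causes that change.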

Do I need to look at the data in Stata, or can this be answered intuitively?

Explanation / Answer

The model FINAL = B0 + B1 ATTEND + u is a reasonable starting point for measuring the impact of attendance on the final score: B1 measures the change in the expected final score associated with attending one more class. However, the model can be improved by adding the dummy variables fresh and soph. A dummy variable is a numerical variable, equal to 0 or 1, used to represent subgroups of the sample in a study. Dummy variables are useful because they let a single regression equation represent multiple groups, so we do not need to write out a separate equation for each subgroup. Here they allow freshmen, sophomores, and all other students to have different average final scores at the same attendance level, so that if attendance also differs by class year, those class-year differences are not attributed to attendance.
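A minimal sketch of the model with the two class-year dummies, estimated by least squares with NumPy. The function name `ols_with_dummies` and the coefficient labels B2 (fresh) and B3 (soph) are hypothetical; in Stata the equivalent would be `regress final attend fresh soph`. Students with fresh = soph = 0 form the base group, which is why no third dummy is added: it would be perfectly collinear with the intercept.

```python
import numpy as np


def ols_with_dummies(attend, fresh, soph, final):
    """Estimate FINAL = B0 + B1*ATTEND + B2*fresh + B3*soph + u by least squares.

    Returns the array [B0, B1, B2, B3]. Students with fresh = soph = 0 are the base group.
    """
    attend = np.asarray(attend, dtype=float)
    fresh = np.asarray(fresh, dtype=float)
    soph = np.asarray(soph, dtype=float)
    y = np.asarray(final, dtype=float)
    # Design matrix: a column of ones for the intercept, then the regressors
    X = np.column_stack([np.ones_like(attend), attend, fresh, soph])
    coef, _residuals, rank, _sv = np.linalg.lstsq(X, y, rcond=None)
    if rank < X.shape[1]:
        raise ValueError("regressors are collinear; check the dummy coding")
    return coef
```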

Whenever a regression model contains dummy variables, you can see how they represent the separate subgroup equations by setting each dummy to 1 or 0 in turn and simplifying.
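For the model with both class-year dummies, setting the dummies to 1 or 0 gives the following subgroup equations (B2 and B3 are labels introduced here for the coefficients on fresh and soph):

```latex
\begin{aligned}
\text{FINAL} &= B_0 + B_1\,\text{ATTEND} + B_2\,\text{fresh} + B_3\,\text{soph} + u \\[4pt]
\text{freshmen } (\text{fresh}=1,\ \text{soph}=0):\quad \text{FINAL} &= (B_0 + B_2) + B_1\,\text{ATTEND} + u \\
\text{sophomores } (\text{fresh}=0,\ \text{soph}=1):\quad \text{FINAL} &= (B_0 + B_3) + B_1\,\text{ATTEND} + u \\
\text{all others } (\text{fresh}=0,\ \text{soph}=0):\quad \text{FINAL} &= B_0 + B_1\,\text{ATTEND} + u
\end{aligned}
```

All three equations share the slope B1 on ATTEND and differ only in their intercepts, so B2 is the expected difference in final score between freshmen and the base group at the same attendance.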

It is always good practice in regression to look at the data. Although you can tell intuitively that adding the dummy variables should improve the model, you need to look at the data (for example, a scatter plot of final against attend) to check whether the relationship between the dependent and independent variables is approximately linear.
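In Stata, `scatter final attend` draws the plot directly. As a rough text-only substitute, the sketch below groups students into attendance bins and reports the mean final score in each bin; the function name `binned_means` and the default bin width of 4 classes are arbitrary choices for illustration.

```python
from collections import defaultdict
from statistics import fmean


def binned_means(attend, final, width=4):
    """Group students into attendance bins of the given width and return
    {bin_start: mean final score}, ordered by bin, for inspection by eye."""
    if width <= 0:
        raise ValueError("width must be positive")
    groups = defaultdict(list)
    for a, f in zip(attend, final):
        # Integer division maps attend values 28-31 to the bin starting at 28 when width=4
        groups[(a // width) * width].append(f)
    return {start: fmean(scores) for start, scores in sorted(groups.items())}
```

If the bin means rise by roughly the same amount from one bin to the next, a straight line is plausible; if they level off or curve, a linear model in ATTEND may not fit well.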