Question
SOLVE USING RSTUDIO - DATASET IS CSV FILE
DATASET - CSV FILE:
subject gender age high_sch_GPA college_GPA distance_home distance_residence TV sports newspapers AIDS vegetarian political_affiliation political_ideology religiosity abortion_legalize affirmative_action_support life_after_death
1 m 32 2.2 3.5 0 5 3 5 0 0 n r 6 2 n n y
2 f 23 2.1 3.5 1200 0.3 15 7 5 6 y d 2 1 y y u
3 f 27 3.3 3 1300 1.5 0 4 3 0 y d 2 2 y y u
4 f 35 3.5 3.2 1500 8 5 5 6 3 n i 4 1 y y n
5 m 23 3.1 3.5 1600 10 6 6 3 0 n i 1 0 y n n
6 m 39 3.5 3.5 350 3 4 5 7 0 y d 2 1 y y u
7 m 24 3.6 3.7 0 0.2 5 12 4 2 n i 2 1 y y y
8 f 31 3 3 5000 1.5 5 3 3 1 n i 2 1 y y y
9 m 34 3 3 5000 2 7 5 3 0 n i 1 1 y y u
10 m 28 4 3.1 900 2 1 1 2 1 y i 3 0 n y y
11 m 23 2.3 2.6 253 1.5 10 15 1 1 n r 5 1 n y y
12 f 27 3.5 3.6 190 3 14 3 7 0 n d 2 1 y y u
13 m 36 3.3 3.5 245 1.5 6 15 12 5 n d 1 1 y y y
14 m 28 3.2 3.2 500 6 3 10 1 2 n i 4 1 y n y
15 f 28 3 3.5 3500 1 4 3 1 0 n d 1 0 y y y
16 f 25 3.8 3.3 210 10 7 6 1 0 y i 2 3 y y y
17 f 41 4 3 1000 15 6 7 3 10 n i 3 3 n u y
18 m 50 3.8 3.8 0 3 5 9 6 10 n d 2 0 y n n
19 m 71 4 3.5 5000 3 6 12 2 2 n i 2 0 y n n
20 f 28 3 3.8 120 1 25 0 0 2 y d 1 1 y y y
21 f 26 3.7 3.7 8000 8 4 4 4 1 n i 4 1 y y y
22 f 27 4 3.7 2 2.5 4 2 7 0 n i 2 1 y y y
23 m 31 2.7 3.5 1700 5 7 7 2 0 n r 7 3 n n y
24 f 23 3.7 3.7 2 2 7 4 2 0 n i 4 0 y y y
25 m 23 3.2 3.8 450 4 0 7 7 3 n i 1 0 y y y
26 f 44 3 3 0 2 2 3 2 3 y i 3 2 y y y
27 m 26 3.7 3 1000 3 8 2 7 0 n d 2 1 y y u
28 f 31 3.7 3.8 850 10 10 3 7 0 n r 5 2 y n y
29 m 24 3.3 3.1 420 2 10 6 5 0 n d 4 1 y y u
30 f 26 3.3 3.3 1200 0.75 10 0 3 0 n r 2 1 y y u
31 m 26 3.3 3.5 1000 1.5 0 3 3 3 y d 2 1 y y n
32 f 32 3.5 3.9 150 12 10 2 0 0 n d 2 1 n n y
33 m 26 3.4 3.4 2000 1.5 2 7 14 0 n d 2 0 y y n
34 f 22 3.2 2.8 316 2 10 3 5 2 n i 2 1 y y u
35 f 24 3.5 3.9 900 1.75 8 0 0 1 n d 1 1 y y u
36 m 24 3.6 3.3 250 2 4 6 3 1 n r 5 3 n y y
37 m 23 3.8 3.7 180 0.5 10 5 7 0 n i 2 0 y n u
38 m 33 3.4 3.4 6000 1.5 8 5 6 2 n i 2 0 y y n
39 m 23 2.8 3.2 950 2 37 1 0 5 0 n r 5 2 y n y
40 m 31 3.8 3.5 1100 0.75 0.5 3 5 2 n r 6 2 y n u
41 m 26 3.4 3.4 1300 1.2 0 8 2 0 n i 2 1 n y n
42 m 28 2 3 360 0.25 10 8 3 0 n d 3 0 y y u
43 f 24 3.8 3.9 1800 2 2 5 4 1 n r 6 3 n y y
44 m 23 3 3.6 900 15 12 0 5 0 n r 5 0 y n n
45 f 25 3 4 5000 5 1.5 0 4 0 n i 4 1 y y n
46 f 24 3 3.5 300 1 10 5 5 0 n d 2 0 y y n
47 f 27 3 3.8 2000 20 28 7 14 2 y r 3 1 y y y
48 m 24 3.3 3.8 630 1.3 2 3 5 0 n r 7 3 n n y
49 f 26 3.8 4 1200 1 0 4 3 1 n d 2 0 y y n
50 f 27 3 4 580 2 5 15 1 2 n d 1 1 y y n
51 m 32 3 3 2000 5 5 5 2 1 n r 5 3 n y y
52 f 41 4 4 0 8 8 4 2 2 n r 4 1 n n y
53 f 29 3 3.9 300 3.7 2 5 1 11 n d 2 1 y y y
54 f 50 3.5 3.8 6 6 7 3 7 0 n d 2 1 y y u
55 f 22 3.4 3.7 80 7 10 1 2 2 n i 2 0 y y u
56 f 23 3.6 3.2 375 1.5 5 1 0 5 0 n r 6 3 n n y
57 m 26 3.5 3.6 2000 0.3 16 8 3 0 n d 4 1 y y u
58 m 30 3 3 1 1.1 1 4 3 0 n i 3 3 y n y
59 f 23 3 3 112 0.5 15 3 3 0 n i 4 2 y y y
60 f 22 3.4 3 650 4 8 16 7 1 n i 4 1 y y y
The data set fl.student survey contains survey responses for a random sample of 60 students in Florida.
(a) Fit a multiple linear regression model to predict college.GPA using the following predictors:
o gender - gender of student (m/f)
o age - age of student
o high.sch.GPA - student's GPA in high school
o TV - average amount of time per week the student spends watching TV
o sports - average number of hours per week the student engaged in sports and other physical exercise
Also include an interaction between gender and age, and between gender and sports. Is the model significant overall in predicting college GPA?
(b) Provide an interpretation of each coefficient in the model you fit in (a).
(c) For which of the predictors in (a) can you reasonably reject the null hypothesis H0: β = 0? Justify your answer and explain what it means to reject H0.
(d) Comment on the results of (c). Do they make intuitive sense?
(e) On the basis of your response to (c), fit a smaller model that only uses the predictor(s) for which there is evidence of association with the response.
(f) How well do the models in (a) and (e) fit the data?
Explanation / Answer
data_23Jun <- read.csv(file.choose(), header = TRUE)
# to get first 6 rows of the dataset
head(data_23Jun)
# to Check if there are missing values
colSums(is.na(data_23Jun))
# Recode gender as a numeric variable (m = 1, f = 2) so it can be used in cor()
data_23Jun$gender <- as.numeric(factor(data_23Jun$gender, levels = c("m", "f")))
#Top 6 rows of our relevant variables
head(data_23Jun[,c(2:5,8,9)])
Output:
## Correlation Matrix of the relevant variables
round(cor(data_23Jun[,c(2:5,8,9)]),2)
Output:
             gender   age high_sch_GPA college_GPA    TV sports
gender         1.00 -0.08         0.13        0.23  0.11  -0.28
age           -0.08  1.00         0.25        0.05 -0.15   0.18
high_sch_GPA   0.13  0.25         1.00        0.28 -0.27  -0.13
college_GPA    0.23  0.05         0.28        1.00 -0.02  -0.13
TV             0.11 -0.15        -0.27       -0.02  1.00  -0.11
sports        -0.28  0.18        -0.13       -0.13 -0.11   1.00
No off-diagonal correlation exceeds 0.5 in absolute value, so no pair of these variables is strongly correlated.
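The 0.5 check can also be done programmatically instead of by eye. The sketch below re-enters the rounded correlation matrix printed above and looks for any pair above the threshold; the names `cor_mat`, `max_abs_cor` and `strong`, and the 0.5 cutoff as a rule of thumb, are illustrative choices rather than part of the original answer.

```r
# Rounded correlation matrix copied from the output above
vars <- c("gender", "age", "high_sch_GPA", "college_GPA", "TV", "sports")
cor_mat <- matrix(c(
   1.00, -0.08,  0.13,  0.23,  0.11, -0.28,
  -0.08,  1.00,  0.25,  0.05, -0.15,  0.18,
   0.13,  0.25,  1.00,  0.28, -0.27, -0.13,
   0.23,  0.05,  0.28,  1.00, -0.02, -0.13,
   0.11, -0.15, -0.27, -0.02,  1.00, -0.11,
  -0.28,  0.18, -0.13, -0.13, -0.11,  1.00
), nrow = 6, byrow = TRUE, dimnames = list(vars, vars))

# Use only the upper triangle so each pair of variables is counted once
max_abs_cor <- max(abs(cor_mat[upper.tri(cor_mat)]))

# Row/column positions of pairs whose absolute correlation exceeds 0.5
# (an empty result means no pair crosses the threshold)
strong <- which(abs(cor_mat) > 0.5 & upper.tri(cor_mat), arr.ind = TRUE)

print(max_abs_cor)
print(nrow(strong))
```

With the full data loaded, the same check could be run on `cor(data_23Jun[, c(2:5, 8, 9)])` directly rather than on the rounded values.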
a)
#Regression Model
regmodel <- lm(college_GPA ~ gender+age+high_sch_GPA+TV+sports, data = data_23Jun)
summary(regmodel)
Output:
The p-value of the overall F-test is greater than 0.05, so the model is not significant overall at the 5% level.
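The question also asks for gender-by-age and gender-by-sports interactions, which the model above does not include. A minimal sketch of that specification follows. To stay self-contained it uses only the first 15 rows of the data set, so its estimates will not match a fit on all 60 students; `survey_subset` and `regmodel_int` are illustrative names, and gender is kept as a factor here rather than recoded to 1/2.

```r
# First 15 rows of the survey data (illustration only, not the full sample)
survey_subset <- data.frame(
  gender = factor(c("m", "f", "f", "f", "m", "m", "m", "f",
                    "m", "m", "m", "f", "m", "m", "f"),
                  levels = c("m", "f")),
  age          = c(32, 23, 27, 35, 23, 39, 24, 31, 34, 28, 23, 27, 36, 28, 28),
  high_sch_GPA = c(2.2, 2.1, 3.3, 3.5, 3.1, 3.5, 3.6, 3, 3, 4, 2.3, 3.5, 3.3, 3.2, 3),
  college_GPA  = c(3.5, 3.5, 3, 3.2, 3.5, 3.5, 3.7, 3, 3, 3.1, 2.6, 3.6, 3.5, 3.2, 3.5),
  TV           = c(3, 15, 0, 5, 6, 4, 5, 5, 7, 1, 10, 14, 6, 3, 4),
  sports       = c(5, 7, 4, 5, 6, 5, 12, 3, 5, 1, 15, 3, 15, 10, 3)
)

# gender * age expands to gender + age + gender:age (likewise for sports)
regmodel_int <- lm(college_GPA ~ gender * age + gender * sports +
                     high_sch_GPA + TV, data = survey_subset)
fit_summary <- summary(regmodel_int)

# Overall F-test p-value, computed from the stored F statistic
f_stat <- fit_summary$fstatistic
overall_p <- unname(pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"],
                       lower.tail = FALSE))

# Per-coefficient p-values for the t-tests of H0: beta = 0
coef_p <- coef(fit_summary)[, "Pr(>|t|)"]

print(overall_p)
print(round(coef_p, 3))
```

On the full data, replacing `survey_subset` with the data frame read from the CSV (with gender left as a factor) should give the model the question describes.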
b)
Since the p-values of all the independent variables are greater than 0.05, no variable appears to be significant in the model.
c)
We cannot reject the null hypothesis for any of the variables, as the p-value for each of them is greater than 0.05.
Rejecting the null hypothesis H0: β = 0 for a predictor would mean there is evidence that its coefficient is not zero, i.e. that the predictor helps explain the variance of the dependent variable once the other predictors are in the model.
d)
No, the results do not make intuitive sense: one would expect age, high_sch_GPA, TV and sports all to help explain the variance of the dependent variable.
e)
#Model 2
regmodel2 <- lm(college_GPA ~ high_sch_GPA,data = data_23Jun)
summary(regmodel2)
Output:
f)
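R-squared for Model 1 is 12.07%, while for Model 2 it is 7.748%.
That is, Model 1 explains about 12% of the variance of the dependent variable and Model 2 explains only about 7.75%, so neither model fits the data well. Because R-squared never decreases when predictors are added, the higher value for Model 1 does not by itself show that it is the better model.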
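Adjusted R-squared and the overall F-test can be recovered from the reported R-squared values alone. The sketch below assumes that all 60 students were used in both fits (no rows dropped for missing values) and that Model 1 has 5 predictors and Model 2 has 1; the helper names `adj_r2` and `overall_f_p` are hypothetical.

```r
n <- 60            # sample size stated in the question (assumes no rows dropped)
r2_full  <- 0.1207   # reported R-squared, Model 1
r2_small <- 0.07748  # reported R-squared, Model 2
p_full  <- 5       # predictors in Model 1
p_small <- 1       # predictors in Model 2

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# p-value of the overall F-test, written in terms of R-squared
overall_f_p <- function(r2, n, p) {
  f <- (r2 / p) / ((1 - r2) / (n - p - 1))
  pf(f, p, n - p - 1, lower.tail = FALSE)
}

adj_full  <- adj_r2(r2_full, n, p_full)
adj_small <- adj_r2(r2_small, n, p_small)
p_overall_full  <- overall_f_p(r2_full, n, p_full)
p_overall_small <- overall_f_p(r2_small, n, p_small)

# Partial F-test: do the 4 extra predictors in Model 1 add anything to Model 2?
partial_f <- ((r2_full - r2_small) / (p_full - p_small)) /
  ((1 - r2_full) / (n - p_full - 1))
partial_p <- pf(partial_f, p_full - p_small, n - p_full - 1, lower.tail = FALSE)

print(round(c(adj_full = adj_full, adj_small = adj_small), 4))
print(c(p_overall_full = p_overall_full, p_overall_small = p_overall_small,
        partial_p = partial_p))
```

Under these assumptions the adjusted R-squared is roughly 3.9% for Model 1 and 6.2% for Model 2, so the smaller model looks at least as good once the number of predictors is taken into account. With the fitted objects available, `anova(regmodel2, regmodel)` performs the same partial F-test directly.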