


Question

Data: https://harlanhappydog.github.io/STAT306/docs/newbie.txt

Download the "newbie" data set from the website. It studies the relationship between whether internet users belong to the "Newbie" category (that is, those who have been on the Internet for less than a year) and a set of demographic indicators. These demographic indicators include age, gender, household income, sexual preference, education, occupation and marital status. The data set contains 1500 observations.
Read in this data and use the first 1200 observations to fit a logistic model for response "Newbie" with all the other variables. Then apply this model to the rest of observations and get the predicted probabilities. Classify a case as Newbie if the predicted probability exceeds 0.6 and otherwise classify it as a non-Newbie.

Question 1: How many parameters are there in this logistic model?
Question 2: What is the misclassification in the hold-out set?
Question 3: What is the AIC of this model on the training data set?

Explanation / Answer

Solution: All of the work is done in R and RStudio.

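library(dplyr)   # select()
library(caret)   # dummyVars()

# Read the data ('newbew.csv' is assumed to be a local copy of the downloaded file)
newbew <- read.csv('newbew.csv', header = TRUE, stringsAsFactors = TRUE)
View(newbew)
dim(newbew)
str(newbew)
colSums(is.na(newbew))   # number of missing values per column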

newbew_pre <- select(newbew, -Newbie)
head(newbew_pre)
# Create dummy variables with the caret package
new_dmy <- dummyVars(~.,data = newbew_pre,fullRank = T)
new_trans <- data.frame(predict(new_dmy,newdata =newbew_pre))
View(new_trans)
dim(new_trans)
new_trans$Newbie <- newbew$Newbie
names(new_trans)
# Train/test split: first 1200 rows for training, the remaining 300 held out
train <- new_trans[1:1200, ]
test <- new_trans[1201:1500, ]
# Fit a logistic model on the first 1200 observations

newlr <- glm(Newbie ~ ., data = train, family = binomial)
summary(newlr)

# Predict probabilities for the remaining observations

bie_pre <- predict(newlr, newdata = test, type = 'response')
head(bie_pre)
table(test$Newbie,bie_pre>0.6)

Q1: How many parameters are there in the logistic model?

31 + 1 (intercept) = 32
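
The count above can be cross-checked in R by counting the fitted coefficients. Below is a minimal sketch on synthetic stand-in data, not the newbie data set; the names toy, fit and n_params are hypothetical. With a numeric predictor and a three-level factor, the model has one intercept, one slope and two dummy coefficients.

```r
# Synthetic stand-in data (NOT the newbie data set); all names are hypothetical.
set.seed(1)
n <- 200
toy <- data.frame(
  y   = rbinom(n, 1, 0.4),
  age = rnorm(n, 35, 10),
  edu = factor(sample(c("HS", "College", "Grad"), n, replace = TRUE),
               levels = c("HS", "College", "Grad"))
)
fit <- glm(y ~ ., data = toy, family = binomial)

# One coefficient per column of the design matrix, intercept included.
n_params <- length(coef(fit))
n_params

# Coefficients that could not be estimated (aliased columns) appear as NA,
# so it is worth checking whether any are present before reporting the count.
sum(is.na(coef(fit)))
```

Applied to the model in this answer, the same call would be length(coef(newlr)).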

Q2: Misclassification in the hold-out set
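
A minimal sketch of how a misclassification count and rate can be computed from predicted probabilities and true labels with the 0.6 cut-off. The vectors truth and prob are made up for illustration and assume a 0/1 coding of the response, which may differ from the coding in the actual file.

```r
# Made-up labels and predicted probabilities, for illustration only.
truth <- c(1, 0, 0, 1, 0, 1, 0, 0)
prob  <- c(0.72, 0.10, 0.65, 0.40, 0.30, 0.61, 0.05, 0.60)

# Classify as 1 only when the probability strictly exceeds 0.6.
pred <- as.integer(prob > 0.6)

# Confusion table: off-diagonal cells are the misclassified cases.
conf_mat <- table(actual = truth, predicted = pred)
conf_mat

n_wrong <- sum(pred != truth)    # number of misclassified cases
rate    <- mean(pred != truth)   # misclassification rate
```

For the model in this answer, the same quantities would come from comparing test$Newbie with bie_pre > 0.6.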

Q3: What is the AIC of this model on the training data set?
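
The AIC of a fitted glm is printed at the bottom of its summary() output and can also be extracted with AIC(). Below is a minimal sketch on synthetic data, not the newbie data set, with hypothetical names x, y and fit. It also recomputes the value by hand as -2 * log-likelihood + 2 * (number of parameters).

```r
# Synthetic data for illustration only; names are hypothetical.
set.seed(2)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))
fit <- glm(y ~ x, family = binomial)

# Built-in extractor.
aic_builtin <- AIC(fit)

# Manual version: -2 * log-likelihood + 2 * number of estimated coefficients.
k <- length(coef(fit))
aic_manual <- -2 * as.numeric(logLik(fit)) + 2 * k

all.equal(aic_builtin, aic_manual)
```

For the model in this answer, the corresponding call would be AIC(newlr).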