Question:
Campaign organizers for both the Republican and Democrat parties are interested in identifying individual undecided voters who would consider voting for their party in an upcoming election. The file BlueOrRed contains data on a sample of voters with tracked variables including: whether or not they are undecided regarding their candidate preference, age, whether they own a home, gender, marital status, household size, income, years of education, and whether they attend church.
Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Use logistic regression to classify observations as undecided (or decided) using Age, HomeOwner, Female, Married, HouseholdSize, Income, and Education as input variables and Undecided as the output variable. Perform an exhaustive-search best subset selection with the number of subsets equal to 2.
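The partition and the exhaustive search in the instructions can be sketched in plain Python. This is a minimal illustration, not the procedure of any particular tool. The seed value, the `score` function, and the reading of "number of subsets equal to 2" as "keep the 2 best models of each size" are all assumptions. In practice `score` would be something like the validation error of a logistic regression fitted on the training rows using only that subset of inputs.

```python
import itertools
import random

# Input variables named in the problem statement.
INPUTS = ["Age", "HomeOwner", "Female", "Married", "HouseholdSize", "Income", "Education"]


def partition(n_rows, seed=12345):
    """Shuffle row indices and split them 50% training / 30% validation / 20% test.
    The seed is an arbitrary choice that only makes the split reproducible."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_train = round(0.5 * n_rows)
    n_valid = round(0.3 * n_rows)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]


def best_subsets(variables, score, keep=2):
    """Exhaustive search: score every non-empty subset of `variables` and, for each
    subset size, keep the `keep` subsets with the lowest score (lower = better).
    `score` is a hypothetical user-supplied function of a tuple of variable names."""
    best = {}
    for k in range(1, len(variables) + 1):
        ranked = sorted(itertools.combinations(variables, k), key=score)
        best[k] = ranked[:keep]
    return best
```

For example, `partition(100)` returns lists of 50, 30, and 20 row indices. With seven inputs, the exhaustive search compares 127 candidate subsets.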
a. From the generated set of logistic regression models, select one that you believe is a good fit. Express the model as a mathematical equation relating the output variable to the input variables.
b. Increases in which variables increase the chance of a voter being undecided? Increases in which variables decrease the chance of a voter being undecided?
c. Using the default cutoff value of 0.5 for your logistic regression model, what is the overall error rate on the test data?
d. Examine the decile-wise lift chart for your model on the test data. What is the first decile lift? Interpret this value.
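Parts c and d both start from the model's predicted probabilities on the test partition. A minimal sketch in plain Python, assuming hypothetical lists `probs` (predicted probability of Undecided = 1) and `actual` (observed 0/1 Undecided values), that a probability at or above the cutoff is classified as undecided, and that the first decile is the top `n // 10` records after sorting:

```python
def overall_error_rate(probs, actual, cutoff=0.5):
    """Share of records whose predicted class (1 if prob >= cutoff) differs
    from the actual 0/1 class."""
    wrong = sum((p >= cutoff) != bool(y) for p, y in zip(probs, actual))
    return wrong / len(actual)


def first_decile_lift(probs, actual):
    """Rate of undecided voters among the 10% of records with the highest
    predicted probabilities, divided by the overall rate of undecided voters."""
    ranked = sorted(zip(probs, actual), key=lambda t: t[0], reverse=True)
    top = ranked[:max(1, len(ranked) // 10)]
    top_rate = sum(y for _, y in top) / len(top)
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate
```

A first-decile lift of, say, 2 would mean that the 10% of test voters the model ranks as most likely to be undecided contain twice the share of undecided voters found in the test set as a whole.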
Data (link to the Excel spreadsheet, since the data set is too large to paste):
https://s3.amazonaws.com/tarynalyssa.com/chegg/BlueOrRed.xlsx
Explanation / Answer
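a) The fitted model, written for the log-odds of a voter being undecided (p = estimated probability that Undecided = 1), is:
ln(p/(1-p)) = -4.088 - 0.00977*Age + 0.4563*HomeOwner + 1.0231*Female + 0.1767*Married + 0.1598*HouseholdSize - 0.005427*Income + 0.18135*Education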
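Because the equation gives log-odds, a predicted probability comes from the logistic function p = 1/(1 + e^(-z)). The sketch below uses the coefficients reported above; the example voter's values are hypothetical, and the units of Income (for example, thousands of dollars) are an assumption since the problem does not state them.

```python
import math

# Coefficients from part (a); the linear predictor is the log-odds of Undecided = 1.
COEF = {"Intercept": -4.088, "Age": -0.00977, "HomeOwner": 0.4563, "Female": 1.0231,
        "Married": 0.1767, "HouseholdSize": 0.1598, "Income": -0.005427,
        "Education": 0.18135}


def prob_undecided(voter):
    """Estimated probability that a voter is undecided. `voter` maps input
    names to values; inputs left out are treated as 0."""
    z = COEF["Intercept"] + sum(COEF[name] * value for name, value in voter.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical voter, used only to show the call.
example = {"Age": 45, "HomeOwner": 1, "Female": 0, "Married": 1,
           "HouseholdSize": 3, "Income": 60, "Education": 14}
p = prob_undecided(example)
```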
To judge whether the fit is good: the p-values for the deviance, Pearson, and Hosmer-Lemeshow goodness-of-fit tests are all 0, which is less than alpha = 0.05.
There is therefore statistical evidence to reject the null hypothesis that the model fits the data adequately, so we conclude that the fit is inadequate.
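To show what the Hosmer-Lemeshow test compares, here is a minimal plain-Python sketch. It assumes 10 groups (a common choice) and uses the exact chi-square upper tail for even degrees of freedom (df = groups - 2), which avoids needing SciPy. `probs` and `outcomes` are hypothetical inputs: predicted probabilities and observed 0/1 Undecided values.

```python
import math


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Sort records by predicted probability, split them into `groups` bins,
    and sum (observed - expected)^2 / (expected * (1 - expected / size))
    over the bins, where observed/expected count undecided voters."""
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        if 0 < expected < size:  # skip degenerate bins to avoid dividing by zero
            stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return stat


def chi2_sf_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); closed form valid only for even df."""
    if df % 2:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
```

The p-value is then `chi2_sf_even_df(hosmer_lemeshow(probs, outcomes), 8)` for 10 groups; a very small value indicates the predicted probabilities do not match the observed rates of undecided voters across the bins.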
b) In the fitted model, the coefficients of HomeOwner, Female, Married, HouseholdSize, and Education are positive. Because the output variable is Undecided, increases in these variables increase the chance of a voter being undecided.
The coefficients of Age and Income are negative, so increases in these variables decrease the chance of a voter being undecided.
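The sign of each coefficient can also be read as an odds ratio, exp(b): values above 1 raise the odds of being undecided and values below 1 lower them. A short sketch using the coefficients from part (a); treating HomeOwner, Female, and Married as 0/1 indicators (so an "increase" means moving from 0 to 1) is an assumption about how the data are coded.

```python
import math

# Slope coefficients from part (a), intercept omitted.
COEF = {"Age": -0.00977, "HomeOwner": 0.4563, "Female": 1.0231, "Married": 0.1767,
        "HouseholdSize": 0.1598, "Income": -0.005427, "Education": 0.18135}


def odds_ratios(coef):
    """Multiplicative change in the odds of being undecided per one-unit increase."""
    return {name: math.exp(b) for name, b in coef.items()}


ors = odds_ratios(COEF)
raises_odds = sorted(name for name, r in ors.items() if r > 1)
lowers_odds = sorted(name for name, r in ors.items() if r < 1)
```

For instance, the Female odds ratio is exp(1.0231), roughly 2.8, holding the other inputs fixed.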
c)