Question

"... this does not mean that an explanatory variable can be considered in isolation from other explanatory variables. There is something else going on that makes it important to consider the explanatory variables not one at a time but simultaneously, at the same time.

"In many situations, the explanatory variables are themselves related to one another. An effect attributed to one variable might equally well be assigned to some other variable.

"Due to the relationships between explanatory variables, you need to untangle them from one another. The way this is done is to use the variables together in a model, rather than in isolation. The way the tangling shows up is in the way the coefficient on a variable will change when another variable is added to the model or taken away from the model. That is, model coefficients on a variable tend to depend on the context set by other variables in the model.

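One way to see why a coefficient shifts when another variable enters the model is the standard omitted-variable identity for least squares. The symbols below ($y$, $x_1$, $x_2$, $\beta$, $\delta$) are notation introduced here for illustration; they do not appear in the quoted text.

```latex
\begin{align*}
\text{full model:}\quad & y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \\
\text{auxiliary fit:}\quad & x_2 = \delta_0 + \delta_1 x_1 + u \\
\text{short model } y \sim x_1:\quad & \tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2\,\hat{\delta}_1
\end{align*}
```

If $x_1$ and $x_2$ are unrelated ($\hat{\delta}_1 = 0$), dropping $x_2$ leaves the coefficient on $x_1$ unchanged. If they are related, the short-model coefficient absorbs part of the effect of $x_2$, and it can even take the opposite sign when $\hat{\beta}_2\,\hat{\delta}_1$ outweighs $\hat{\beta}_1$.
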
"To illustrate, consider this criticism of spending on public education in the United States from a respected political essayist:

"The 10 states with the lowest per pupil spending included four -- North Dakota, South Dakota, Tennessee, Utah -- among the 10 states with the top SAT scores. Only one of the 10 states with the highest per pupil expenditures -- Wisconsin -- was among the 10 states with the highest SAT scores. New Jersey has the highest per pupil expenditures, an astonishing $10,561, which teachers' unions elsewhere try to use as a negotiating benchmark. New Jersey's rank regarding SAT scores? Thirty-ninth ... The fact that the quality of schools ...[fails to correlate] with education appropriations will have no effect on the teacher unions' insistence that money is the crucial variable.  -- George F. Will, The Washington Post, Sept. 12, 1993, C7.

"The response variable here is the score on a standardized test taken by many students finishing high school: the SAT. The explanatory variable -- there is only one -- is the level of school spending. But even though the essayist implies that spending is not the 'crucial variable,' he doesn't include any other variable in the analysis. In part, this is because the method of analysis -- pointing to individual cases to illustrate a trend -- doesn't allow the simultaneous consideration of multiple explanatory variables. (Another flaw with the informal method of comparing cases is that it fails to quantify the strength of the effect: Just how much negative influence does spending have on performance? Or is the claim that there is no connection between spending and performance?)

"You can confirm the claims in the essay by modeling. The SAT dataset contains state-by-state information from the mid-1990s on per capita yearly school expenditures in thousands of dollars, average statewide SAT scores, average teachers' salaries, and other variables.

"The analysis in the essay corresponds to a simple model: sat ~ expend."

1. Fit the model sat ~ expend to the state-by-state data in sat.csv. Find the coefficients. Is the coefficient on expend positive or negative? Is that consistent with the claim in the essay?
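
A minimal sketch of how the fit in item 1 might be done in Python, using only the standard library. It assumes sat.csv has a header row with numeric columns named sat and expend, as the exercise text suggests; the helper names load_columns and fit_simple are hypothetical. The same fit_simple call would apply to sat ~ salary and sat ~ ratio in items 2 and 3.

```python
import csv
import os


def load_columns(path, names):
    """Read the named numeric columns from a CSV file with a header row."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    return {name: [float(row[name]) for row in rows] for name in names}


def fit_simple(x, y):
    """Least-squares fit of y ~ x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Assumed file name and column names, taken from the exercise text.
# The fit only runs if the file is actually present.
if os.path.exists("sat.csv"):
    data = load_columns("sat.csv", ["sat", "expend"])
    intercept, slope = fit_simple(data["expend"], data["sat"])
    print(f"sat ~ expend: intercept = {intercept:.1f}, slope = {slope:.1f}")
```

The sign of the printed slope is what the question asks about: a negative slope means higher spending goes with lower average scores in this one-variable model.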

"According to the model, an increase in expenditures by $1000 per capita is associated with a 21 point decrease in the SAT score. That's not very good news for people who think society should be spending more on schools ... (A 21 point decrease in the SAT doesn't mean much for an individual student, but as an average over tens of thousands of students, it's a pretty big deal.)

Perhaps expenditures is the wrong thing to look at -- you might be studying administrative inefficiency or even corruption. Better to look at teachers' salaries."

2. Fit the model sat ~ salary. (Salary is the average annual salary of public school teachers in $1000s.) Is the coefficient on salary positive or negative? Is that consistent with the claim in the essay?

"Higher salaries are associated with lower average SAT scores! But maybe states with high salaries manage to pay well because they overcrowd classrooms."

3. So look at the average student/teacher ratio in each state by fitting the model sat ~ ratio. Is the coefficient on ratio positive or negative? What does that say about the effect of ratio on sat?

"At this point, many advocates for more spending, higher salaries, and smaller classes will explain that you can't measure the quality of an education with a standardized test, that the relationship between a student and a teacher is too complicated to be quantified, that students should be educated as complete human beings and not as test-taking machines, and so on. Perhaps.

"Whatever the criticisms of standardized tests, the tests have the great benefit of allowing comparisons across different conditions: different states, different curricula, etc. If there is a problem with the tests, it isn't standardization itself but the material that is on the test. Absent a criticism of that material, rejections of standardized testing ought to be treated with considerable skepticism.

"But there is something wrong with using the SAT as a test, even if the content of the test is good. What's wrong is that the test isn't required for all students. Depending on the state, a larger or smaller fraction of students will take the SAT. In states where very few students take the SAT, those students who do are the ones bound for out-of-state colleges, in other words, the high performers.

"What's more, the states which spend the least on education tend to have the fewest students who take the SAT. That is, the fraction of students taking the SAT is entangled with expenditures and other explanatory variables.

4. To verify Kaplan's claim, make a scatterplot matrix of expend, salary, ratio and frac. Is it true that the states which spend the least on education tend to have the fewest students who take the SAT?
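
A numeric companion to the scatterplot matrix in item 4 is a table of pairwise Pearson correlations. The sketch below uses only the standard library; the helper names are hypothetical, and the small demo columns are made-up numbers for illustration, not values from sat.csv.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map each pair of column names to their Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def format_matrix(matrix):
    """Render the matrix as aligned text, one row per variable."""
    names = list(matrix)
    lines = [" " * 8 + "".join(f"{name:>8}" for name in names)]
    for a in names:
        lines.append(f"{a:<8}" + "".join(f"{matrix[a][b]:8.2f}" for b in names))
    return "\n".join(lines)


# Made-up illustrative values (NOT taken from sat.csv).
demo = {
    "expend": [4.0, 5.0, 6.0, 7.0, 9.0],
    "frac": [5.0, 10.0, 30.0, 60.0, 70.0],
}
print(format_matrix(correlation_matrix(demo)))
```

With the real data, passing all four columns (expend, salary, ratio, frac) gives a 4 x 4 table. A clearly positive entry for the expend/frac pair would support the claim that low-spending states tend to have fewer SAT takers.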

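"To untangle the variables, they have to be included simultaneously in a model. That is, in addition to expend or salary or ratio, the model needs to take into account the fraction of students who take the SAT (variable frac)."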

5. Fit the three models sat ~ expend + frac, sat ~ salary + frac, and sat ~ ratio + frac. Are the coefficients on expend, salary, and ratio positive or negative? Do they agree with the signs you found earlier? Is the coefficient of frac positive or negative? Does it have the same sign in all three models?

"The situation seen here, where adding a new explanatory variable changes the sign of the coefficient on another variable, is called Simpson's paradox.

"Simpson's paradox is an extreme version of a common situation: that the coefficient on an explanatory variable can depend on what other explanatory variables have been included in the model. In other words, the role of an explanatory variable can depend, sometimes strongly, on the context set by other explanatory variables. You can't look at explanatory variables in isolation; you have to interpret them in context."

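The sign change described above can be reproduced on a tiny synthetic dataset. This is only a sketch: the numbers are invented to make the effect obvious and are not SAT data, and the names spend, high_frac, and score are hypothetical stand-ins for expend, frac, and sat. The coefficient on one variable in a two-variable model is computed by first removing the other variable's linear effect from both sides (the Frisch-Waugh-Lovell approach), which needs nothing beyond simple regression.

```python
def fit_simple(x, y):
    """Least-squares fit of y ~ x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Residuals of y after a least-squares fit of y ~ x."""
    intercept, slope = fit_simple(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def partial_slope(x, y, z):
    """Coefficient on x in the model y ~ x + z, found by residualizing on z."""
    return fit_simple(residuals(z, x), residuals(z, y))[1]


# Synthetic illustration (NOT real SAT data).
# high_frac marks a group where many students take the test.
spend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
high_frac = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
score = [11.0, 12.0, 13.0, 4.0, 5.0, 6.0]

marginal = fit_simple(spend, score)[1]
adjusted = partial_slope(spend, score, high_frac)
print(f"score ~ spend:             slope = {marginal:+.2f}")
print(f"score ~ spend + high_frac: slope = {adjusted:+.2f}")
```

Within each group, score rises with spend, but the high-participation group both spends more and scores lower overall, so the one-variable slope comes out negative while the adjusted slope is positive.
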
6. Fit the model sat ~ expend + salary + ratio + frac. Show the model output and interpret the coefficients.

"But which is the right model? What's the right context? Do expend, salary, and ratio have a positive role in school performance ..., or should you believe the first set of models? This is an important question and one that comes up often in statistical modeling. At one level, the answer is that you need to be aware that context matters. At another level, you should always check to see if your conclusions would be altered by including or excluding some other explanatory variable. At a still higher level, the choice of which variables to include or exclude needs to be related to the modeler's ideas about what causes what.


Explanation / Answer

I think stepwise regression is a good way to approach this problem.

In forward stepwise regression, we first fit a model with a single explanatory variable (here, each of the 4 variables is tried one at a time) and calculate how well the model fits, measured by the percentage of variation in the dependent variable that it explains.

Next, a second variable is included in the model (here there are 4C2 = 6 combinations), and we calculate the percentage of variation explained by the two variables together.

Then we include a third variable (4C3 = 4 combinations) and calculate the same quantity.

Finally, all four variables are included and the percentage of variation explained is calculated.

Now it is decision time.

Among these 4C1 + 4C2 + 4C3 + 4C4 = 15 models, we should choose the one that explains the most variation, measured by adjusted R-squared rather than plain R-squared, because adjusted R-squared also accounts for parsimony. For example, suppose 3 variables explain 90% of the variation and adding the 4th raises this only to 90.5%; then we should prefer the 3-variable model. (Strictly speaking, comparing all 15 subsets is best-subset selection; a stepwise search would examine only some of them.)

In this way we can choose the optimum model.

Since none of the explained-variation figures are given here, we need to fit the models to the actual data to decide which one is best in this case; the procedure for choosing the model is the one explained above.
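
The comparison of all 15 candidate models by adjusted R-squared could be sketched as follows, using only the standard library. The function names (solve, fit_ols, adjusted_r2, rank_subsets) are hypothetical, and the data argument is assumed to be a dict mapping column names such as sat, expend, salary, ratio, and frac to equal-length lists of numbers. Calling fit_ols on all four predictor columns also gives the coefficients asked for in item 6.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] for y ~ columns."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def adjusted_r2(columns, y):
    """Adjusted R-squared of the least-squares fit y ~ columns."""
    coefs = fit_ols(columns, y)
    n, p = len(y), len(columns)
    fitted = [
        coefs[0] + sum(b * v for b, v in zip(coefs[1:], vals))
        for vals in zip(*columns)
    ]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def rank_subsets(data, response):
    """Return (adjusted R-squared, predictors) for every non-empty subset, best first."""
    predictors = [name for name in data if name != response]
    results = []
    for size in range(1, len(predictors) + 1):
        for subset in combinations(predictors, size):
            cols = [data[name] for name in subset]
            results.append((adjusted_r2(cols, data[response]), subset))
    return sorted(results, reverse=True)
```

With the real data loaded into such a dict, rank_subsets(data, "sat") returns the 15 models ordered by adjusted R-squared. A high score alone does not settle which variables belong in the model; as the quoted text notes, that choice also depends on the modeler's ideas about what causes what.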
