Question
Suppose we have a set of data consisting of ordered pairs and we suspect the x and y coordinates are related. It is natural to try to find the best line that fits the data points. If we can find this line, then we can use it to make all sorts of other predictions. In this project, we're going to use several functions to find this line using a technique called least squares regression. The result will be what we call the least squares regression line (or LSRL for short).
In order to do this, you'll be able to reuse some code you've already written (improve it if necessary, of course), as the LSRL is more or less based on statistical calculations we've already automated. You'll need to program one new statistical computation called the correlation coefficient, denoted by r in statistical symbols:
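The formula did not survive in this copy of the question. As a hedged reconstruction, the standard sample definition, which matches the z-score sum in the answer's code, is shown here. It assumes n is the number of pairs, \bar{x} and \bar{y} are the sample means, and s_x and s_y are the sample standard deviations:

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```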
Once you have the correlation coefficient, you use it along with the sample means and sample standard deviations of the x and y-coordinates to compute the slope and y-intercept of your regression line via these formulas:
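The formulas are also missing from this copy. Assuming the usual textbook form, with the line written as y = a + bx to match the sample output, and with r, the sample means \bar{x}, \bar{y}, and the sample standard deviations s_x, s_y as described in the text, they would read:

```latex
b = r \, \frac{s_y}{s_x}, \qquad a = \bar{y} - b \, \bar{x}
```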
You will also need a main program to drive these methods. All computation should be done in the six methods; the main program should be extremely short. (I have fewer than a dozen lines of code.)
Notes and Hints
• Remember to think about passing by value vs. passing by reference. Forgetting this distinction could cost you hours of debugging time.
• Pay special attention to writing pre- and postconditions. I recommend that you begin by writing method headers and pre- and postconditions, THEN write the internal code.
• Please give 1 or 2 lines of whitespace between methods.
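To make the first hint concrete, here is a minimal sketch of the value/reference distinction. The function names (set_by_value, set_by_reference, fill) are hypothetical and chosen only for illustration; they are not part of the assignment.

```cpp
// Hypothetical illustration functions; not part of the assignment's six methods.

// Pass by value: the function receives a copy, so the caller's variable is unchanged.
void set_by_value(int size)
{
    size = 5;
}

// Pass by reference: the function operates on the caller's variable directly.
void set_by_reference(int &size)
{
    size = 5;
}

// An array parameter is really a pointer to the caller's array,
// so writes to its elements are visible to the caller.
void fill(double x[], int count)
{
    for (int i = 0; i < count; i++)
        x[i] = i * 1.5;
}
```

This is why a function that reports how many values it read needs an int & parameter for the count, while the arrays it fills can be passed as plain array parameters.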
Sample Screen Output
Regression line: y = 1166.93 + -0.586788x
Explanation / Answer
// Requires input.txt containing the data as whitespace-separated x y pairs.
#include <cmath>
#include <fstream>
#include <iostream>
using namespace std;

const int MAX_SIZE = 100;   // capacity of the data arrays

// Pre:  x and y each have room for MAX_SIZE values.
// Post: x[0..size-1] and y[0..size-1] hold the pairs read from input.txt;
//       size is the number of complete pairs read.
void Input(double x[], double y[], int &size)
{
    ifstream infile("input.txt");
    size = 0;
    // Testing the extraction itself avoids counting a phantom pair at end of file.
    while (size < MAX_SIZE && infile >> x[size] >> y[size])
        size++;
}

// Pre:  size >= 1.
// Post: returns the arithmetic mean of x[0..size-1].
double sample_mean(const double x[], int size)
{
    double sum = 0;
    for (int i = 0; i < size; i++)
        sum += x[i];
    return sum / size;
}

// Pre:  size >= 2; mean is the sample mean of x.
// Post: returns the sample standard deviation of x[0..size-1].
double deviation(const double x[], int size, double mean)
{
    double sum = 0;
    for (int i = 0; i < size; i++)
        sum += (x[i] - mean) * (x[i] - mean);
    return sqrt(sum / (size - 1));
}

// Pre:  size >= 2; neither the x values nor the y values are all equal.
// Post: returns the correlation coefficient r of the pairs.
double compute_correlation_coefficient(const double x[], const double y[], int size)
{
    double mean_x = sample_mean(x, size);
    double mean_y = sample_mean(y, size);
    double sd_x = deviation(x, size, mean_x);
    double sd_y = deviation(y, size, mean_y);
    double sum = 0;
    for (int i = 0; i < size; i++)
        sum += ((x[i] - mean_x) / sd_x) * ((y[i] - mean_y) / sd_y);
    return sum / (size - 1);
}

// Pre:  same as compute_correlation_coefficient.
// Post: slope = r * s_y / s_x and y_intercept = mean_y - slope * mean_x.
void compute_line(const double x[], const double y[], int size,
                  double &y_intercept, double &slope)
{
    double mean_x = sample_mean(x, size);
    double mean_y = sample_mean(y, size);
    double r = compute_correlation_coefficient(x, y, size);
    slope = r * deviation(y, size, mean_y) / deviation(x, size, mean_x);
    y_intercept = mean_y - slope * mean_x;
}

// Post: the regression line is printed in the form y = a + bx.
void print_line(double y_intercept, double slope)
{
    cout << "Regression line: y = " << y_intercept << " + " << slope << "x" << endl;
}

int main()
{
    double x[MAX_SIZE];
    double y[MAX_SIZE];
    int size;
    double y_intercept, slope;
    Input(x, y, size);
    if (size < 2)
    {
        cout << "Need at least two data points in input.txt." << endl;
        return 1;
    }
    compute_line(x, y, size, y_intercept, slope);
    print_line(y_intercept, slope);
    return 0;
}