Linear regression draws a straight line through a group of data points such that
Question
Linear regression draws a straight line through a group of data points such that the position and slope of the line minimize the sum of the squares of the vertical distances between the data points and the line. It fits the data in a way that is intuitively satisfying yet mathematically reproducible. For linear regression to be valid, all data points should vary in exactly the same random way, and that variation should have a normal, or “Gaussian”, distribution (the familiar bell-shaped distribution).
To illustrate the application of linear regression, this project uses it to generate a trend line for the effect of nitrogen fertilizer on the yield of a crop of corn (maize). To guarantee that the required assumptions are met, we have created the data artificially, by adding a normally distributed random variable to a sloping straight line, with the same variance for all data points. Specifically, we added a normal random number having a standard deviation of 25 to the straight line. Here’s the equation:
y = 50 + 100 * x + randomNumber
The following plot shows one set of 10 data points, and the linear-regression fit to those data points:
Sample session:
Enter number of data points or 0 for default: 0
Fert Yield
81 131
14 71
60 112
12 53
99 115
35 92
4 71
23 65
45 104
14 25
slope = 0.8486061764042895
yieldAt0 = 51.058940973154
yieldAtMax = 135.91955861358295
residual error = 18.87483162574109
Another sample session:
Enter number of data points or 0 for default: 10000
Fert Yield
64 139
1 52
86 121
31 97
95 126
86 166
67 118
26 95
89 179
39 95
slope = 1.0051707825618592
yieldAt0 = 50.025474774097034
yieldAtMax = 150.54255303028296
residual error = 25.0921873778027
The first sample session prints all of the data points used in the figure. The second sample session prints just the first 10 of 10,000 points used as the basis of the regression. Of course, your random number generator will not generate the same data values as those shown above, but the four values at the bottom of your output should be close to the four values we generated – which are close to the parameters used to generate the random data.
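As a rough check on how the last two printed values relate to the first two, the sketch below recomputes yieldAtMax and the residual error from a fitted slope and intercept. It rests on two assumptions that the text here does not state: the maximum fertilizer level is 100, and the residual error is the square root of the sum of squared vertical residuals divided by n - 2. Both choices reproduce the numbers in the first sample session. The class and method names are illustrative only.

```java
class FitSummary {
    // Assumed maximum fertilizer level (not stated in the assignment text).
    static final double MAX_FERT = 100.0;

    // Value of the fitted line at a given fertilizer level.
    static double yieldAt(double slope, double yieldAt0, double fert) {
        return yieldAt0 + slope * fert;
    }

    // Assumed definition: sqrt(sum of squared vertical residuals / (n - 2)).
    static double residualError(int[][] data, double slope, double yieldAt0) {
        double sumSq = 0.0;
        for (int[] point : data) {
            double residual = point[1] - yieldAt(slope, yieldAt0, point[0]);
            sumSq += residual * residual;
        }
        return Math.sqrt(sumSq / (data.length - 2));
    }

    public static void main(String[] args) {
        // The 10 {fertilizer, yield} pairs from the first sample session.
        int[][] data = {
            {81, 131}, {14, 71}, {60, 112}, {12, 53}, {99, 115},
            {35, 92}, {4, 71}, {23, 65}, {45, 104}, {14, 25}
        };
        double slope = 0.8486061764042895;   // from the sample output
        double yieldAt0 = 51.058940973154;   // from the sample output
        System.out.println("yieldAtMax = " + yieldAt(slope, yieldAt0, MAX_FERT));
        System.out.println("residual error = " + residualError(data, slope, yieldAt0));
    }
}
```

With 10,000 points the residual error should land near 25, the standard deviation of the noise that was added to the line.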
Your job is to write the program that produces these results. To generate the first sample session above, initialize a two-dimensional array with the 10 pairs of fertilizer and yield values shown. To generate the second sample session above, import the java.util.Random class, use the zero-parameter constructor to instantiate a random-number generator, and have that generator call its nextGaussian method to generate a random variable with a Gaussian distribution whose mean value is zero and whose standard deviation is 1.0. (See Section 5.8 for more information.)
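One possible way to generate the artificial data is sketched below. Several details are assumptions, since the text does not say how the fertilizer values are chosen: fertilizer levels are drawn as uniform integers from 0 to 99, the x in the equation is read as the fraction fert / 100, and yields are rounded to integers as in the sample sessions. The generator is passed in as a parameter so that a seeded Random can be used for repeatable checks; the assignment itself asks for the zero-parameter constructor. DataGenerator and generate are hypothetical names.

```java
import java.util.Random;

class DataGenerator {
    // Builds n {fertilizer, yield} pairs around the line y = 50 + 100 * x.
    static int[][] generate(int n, Random random) {
        int[][] data = new int[n][2];
        for (int i = 0; i < n; i++) {
            int fert = random.nextInt(100);               // assumed range 0..99
            double noise = 25.0 * random.nextGaussian();  // mean 0, std dev 25
            data[i][0] = fert;
            // x is taken to be fert / 100.0 (an assumption about the equation).
            data[i][1] = (int) Math.round(50.0 + 100.0 * (fert / 100.0) + noise);
        }
        return data;
    }

    public static void main(String[] args) {
        int[][] data = generate(10, new Random());
        System.out.println("Fert Yield");
        for (int[] point : data) {
            System.out.println(point[0] + " " + point[1]);
        }
    }
}
```

Scaling nextGaussian() by 25 is what turns the unit-variance value into noise with the stated standard deviation of 25.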
Here is the basic algorithm for linear regression:
1) Find the average x (avgX) and the average y (avgY).
2) Find the x_variance, which is the sum of the squares of (x[i] – avgX), divided by the number of data points.
3) Find the x_y_covariance, which is the sum of the product, (x[i] – avgX) * (y[i] – avgY), divided by the number of data points.
4) The slope of the desired regression line is slope = x_y_covariance / x_variance
5) The y-axis intercept (value of y at x=0) of the straight line is avgY – slope * avgX
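The steps above can be sketched as a single method. This is a minimal illustration, not the textbook's own code: the names RegressionSketch and fit are hypothetical. The slope is taken as the covariance divided by the variance, and the intercept as avgY - slope * avgX, which are the standard least-squares results and the ones that reproduce the slope and yieldAt0 of the first sample session.

```java
class RegressionSketch {
    // Returns {slope, intercept} of the least-squares line through (x[i], y[i]).
    static double[] fit(double[] x, double[] y) {
        int n = x.length;

        // Step 1: averages.
        double avgX = 0.0;
        double avgY = 0.0;
        for (int i = 0; i < n; i++) {
            avgX += x[i];
            avgY += y[i];
        }
        avgX /= n;
        avgY /= n;

        // Steps 2 and 3: variance of x and covariance of x and y.
        double xVariance = 0.0;
        double xyCovariance = 0.0;
        for (int i = 0; i < n; i++) {
            xVariance += (x[i] - avgX) * (x[i] - avgX);
            xyCovariance += (x[i] - avgX) * (y[i] - avgY);
        }
        xVariance /= n;
        xyCovariance /= n;

        // Steps 4 and 5: slope, then the value of y at x = 0.
        double slope = xyCovariance / xVariance;
        double intercept = avgY - slope * avgX;
        return new double[] {slope, intercept};
    }

    public static void main(String[] args) {
        // Data from the first sample session.
        double[] fert = {81, 14, 60, 12, 99, 35, 4, 23, 45, 14};
        double[] yield = {131, 71, 112, 53, 115, 92, 71, 65, 104, 25};
        double[] line = fit(fert, yield);
        System.out.println("slope = " + line[0]);
        System.out.println("yieldAt0 = " + line[1]);
    }
}
```

Accumulating in double avoids the truncation that integer division would introduce when computing the averages.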
Explanation / Answer
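import java.util.Random;
import java.util.Scanner;

public class LinearRegression {
    // Default data from the first sample session: {fertilizer, yield}.
    private static final int[][] DEFAULT_DATA = {
        {81, 131}, {14, 71}, {60, 112}, {12, 53}, {99, 115},
        {35, 92}, {4, 71}, {23, 65}, {45, 104}, {14, 25}
    };

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        System.out.print("Enter number of data points or 0 for default: ");
        int dataPoints = sc.nextInt();
        int[][] fertYield = (dataPoints <= 0) ? DEFAULT_DATA : generate(dataPoints);
        int n = fertYield.length;

        // Print at most the first 10 points, as the sample sessions do.
        System.out.println("Fert Yield");
        for (int i = 0; i < Math.min(n, 10); i++) {
            System.out.println(fertYield[i][0] + " " + fertYield[i][1]);
        }

        // Step 1: averages, in double arithmetic to avoid integer division.
        double sumF = 0.0;
        double sumY = 0.0;
        for (int[] point : fertYield) {
            sumF += point[0];
            sumY += point[1];
        }
        double avgF = sumF / n;
        double avgY = sumY / n;

        // Steps 2 and 3: variance of x and covariance of x and y.
        double xVariance = 0.0;
        double xyCovariance = 0.0;
        for (int[] point : fertYield) {
            xVariance += (point[0] - avgF) * (point[0] - avgF);
            xyCovariance += (point[0] - avgF) * (point[1] - avgY);
        }
        xVariance /= n;
        xyCovariance /= n;

        // Steps 4 and 5: slope is covariance over variance; intercept follows.
        double slope = xyCovariance / xVariance;
        double yieldAt0 = avgY - slope * avgF;

        // Residual error: dividing the squared residuals by n - 2 reproduces
        // the value shown in the first sample session.
        double sumSq = 0.0;
        for (int[] point : fertYield) {
            double residual = point[1] - (yieldAt0 + slope * point[0]);
            sumSq += residual * residual;
        }
        System.out.println("slope = " + slope);
        System.out.println("yieldAt0 = " + yieldAt0);
        // Assumes a maximum fertilizer level of 100, consistent with the sample output.
        System.out.println("yieldAtMax = " + (yieldAt0 + slope * 100));
        System.out.println("residual error = " + Math.sqrt(sumSq / (n - 2)));
    }

    // Assumption: fertilizer levels are uniform integers in 0..99 and the x in
    // "y = 50 + 100 * x + randomNumber" is the fraction fert / 100.
    private static int[][] generate(int n) {
        Random random = new Random();
        int[][] data = new int[n][2];
        for (int i = 0; i < n; i++) {
            int fert = random.nextInt(100);
            data[i][0] = fert;
            data[i][1] = (int) Math.round(50 + 100 * (fert / 100.0) + 25 * random.nextGaussian());
        }
        return data;
    }
}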