PYTHON Write a Python program that will read in a two-column comma-separated val
Question
PYTHON: Write a Python program that will read in a two-column comma-separated value (CSV) file and do a linear regression. You should do the regression calculation yourself. You should check your results with either np.polyfit or np.linalg.lstsq, but you should either (a) calculate the normal matrix and solve the system yourself, or (b) calculate the sample averages, standard deviations, and correlations yourself, and not use built-in functions for these operations. On a single figure, make a scatter plot of the original data points and a line plot of the linear regression of y on x. Have the program automatically write the solution equation y=mx+b somewhere on the plot.
Here are my values from the text file, since I can't attach a file:
1,0.9984489205
2,0.9752788428
3,0.8769402712
4,0.6284264418
5,0.1772490897
6,-0.4210076698
7,-0.9162875775
8,-0.9116652003
9,-0.1990087588
10,0.7565584253
11,0.8973685995
12,-0.1666463731
13,-0.9999420786
14,-0.0774674983
15,0.9994582237
Explanation / Answer
Program Code:
import math
import csv
#import numpy as np
#import matplotlib.pyplot as plt
# Return the arithmetic mean of a list of numbers.
def SampleAverages(avg):
    total = 0
    for value in avg:
        total = total + value
    return total / len(avg)
# Return the sample standard deviation (n - 1 denominator) of points, given their mean m.
def StandardDeviation(m, points):
    s = 0
    for i in points:
        s = s + math.pow(i - m, 2)
    return math.sqrt(s / (len(points) - 1))
# Return the least-squares slope of y on x, given the means xm and ym.
def slopeValue(xm, ym, xV, yV):
    xysum = 0
    xsq = 0
    for i in range(len(xV)):
        xysum = xysum + (xV[i] * yV[i])
        xsq = xsq + math.pow(xV[i], 2)
    n = len(xV)
    return (xysum - n * xm * ym) / (xsq - n * math.pow(xm, 2))
# Return the y-intercept; the regression line passes through (xmean, ymean).
def interceptValue(ymean, slope, xmean):
    return ymean - (slope * xmean)
# Return the sample correlation coefficient of x and y.
def correlationValue(xmean, ymean, xdeviation, ydeviation, xValues, yValues):
    xyco = 0
    for i in range(len(xValues)):
        xco = (xValues[i] - xmean) / xdeviation
        yco = (yValues[i] - ymean) / ydeviation
        xyco = xyco + (xco * yco)
    return xyco / (len(xValues) - 1)
# Read the x and y columns from the CSV file Input.csv
xValues = []
yValues = []
with open('Input.csv') as f:
    for row in csv.reader(f):
        if not row:
            continue  # skip blank lines
        # append the data as x coordinates and y coordinates
        # (x is parsed as an integer, as in the sample data)
        xValues.append(int(row[0]))
        yValues.append(float(row[1]))
# Print the x-y coordinates
for i in range(len(xValues)):
    print(xValues[i], " ", yValues[i])
# Find the mean of the x coordinates and y coordinates by calling the
# function SampleAverages and print the values
xmean=SampleAverages(xValues)
ymean=SampleAverages(yValues)
print("The mean of X= ", round(xmean,2))
print("The mean of Y= ", round(ymean,2))
# Find the standard deviation of the x coordinates and y coordinates by calling the
# function StandardDeviation and print the values
xdeviation=StandardDeviation(xmean, xValues)
print("The standard deviation of X= ", round(xdeviation,3))
ydeviation=StandardDeviation(ymean, yValues)
print("The standard deviation of Y= ", round(ydeviation,3))
#find the slope of the line by calling the slopeValue function and print the value
slope=slopeValue(xmean, ymean, xValues, yValues)
print("The slope of the line is: ", round(slope,3))
#find the y-intercept of the line by calling the interceptValue function and print the value
intercept=interceptValue(ymean, slope, xmean)
print("The y-intersept value = ", round(intercept,3))
print(" ")
equation=""
equation="y = "+str(round(slope,3))+"x + "+str(round(intercept,3))
# Find the correlation between x and y and print the value
correlation=correlationValue(xmean, ymean,xdeviation,ydeviation, xValues, yValues)
print("The correaltion value = ", round(correlation,3))
# Finally, print the equation of the line
print("Therefore, the equation of the line is ", equation)
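'''
# Optional check and plot (disabled; requires numpy and matplotlib,
# so uncomment the two imports at the top before enabling it)
# fit a degree-1 polynomial, i.e. a straight line
z = np.polyfit(xValues, yValues, 1)
f = np.poly1d(z)
# calculate new x's and y's
x_new = np.linspace(xValues[0], xValues[-1], 50)
y_new = f(x_new)
plt.plot(xValues, yValues, 'o', x_new, y_new)
plt.xlim([xValues[0] - 1, xValues[-1] + 1])
# write the fitted equation on the plot
plt.text(xValues[0], min(yValues), equation)
plt.show()'''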
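The question also asks for a scatter plot with the fitted line and the equation written on the figure, and the plotting section above is disabled. A minimal sketch of one way to do it follows. The helper names fit_line, equation_label and plot_fit, the output file name regression.png, and the use of the non-interactive "Agg" backend are all assumptions made for illustration; they are not part of the original answer.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept computed from the sample means."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def equation_label(slope, intercept):
    """Format the fitted line as text, handling a negative intercept."""
    sign = "+" if intercept >= 0 else "-"
    return f"y = {slope:.3f}x {sign} {abs(intercept):.3f}"


def plot_fit(xs, ys, out_path="regression.png"):
    """Scatter the data, draw the fitted line, and write the equation on the axes."""
    # Imported here so the two helpers above work without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # assumption: no display; remove for interactive use
    import matplotlib.pyplot as plt

    slope, intercept = fit_line(xs, ys)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys, label="data")
    x_line = [min(xs), max(xs)]
    ax.plot(x_line, [slope * x + intercept for x in x_line],
            color="red", label="regression of y on x")
    # Place the equation in the lower-left corner, in axes coordinates.
    ax.text(0.05, 0.05, equation_label(slope, intercept), transform=ax.transAxes)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)
    return out_path
```

With the lists read from Input.csv, calling plot_fit(xValues, yValues) would save the figure to regression.png; replacing fig.savefig with plt.show() and dropping the backend line would display it interactively instead.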
Sample Output (input file: Input.csv):
sh-4.2# python3 main.py
1 0.9984489205
2 0.9752788428
3 0.8769402712
4 0.6284264418
5 0.1772490897
6 -0.4210076698
7 -0.9162875775
8 -0.9116652003
9 -0.1990087588
10 0.7565584253
11 0.8973685995
12 -0.1666463731
13 -0.9999420786
14 -0.0774674983
15 0.9994582237
The mean of X= 8.0
The mean of Y= 0.17
The standard deviation of X= 4.472
The standard deviation of Y= 0.755
The slope of the line is: -0.049
The y-intersept value = 0.564
The correaltion value = -0.289
Therefore, the equation of the line is y = -0.049x + 0.564
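The program above follows option (b) from the question (means, standard deviations, correlation). As a cross-check, option (a) can be sketched by forming the 2x2 normal equations and solving them directly with Cramer's rule. The function names solve_normal_equations and numpy_check are hypothetical, and the NumPy comparison assumes NumPy is installed; it is defined but not called here.

```python
def solve_normal_equations(xs, ys):
    """Solve [[Sxx, Sx], [Sx, n]] @ [m, b] = [Sxy, Sy] by Cramer's rule."""
    n = len(xs)
    sx = sy = sxx = sxy = 0.0
    for x, y in zip(xs, ys):
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return m, b


def numpy_check(xs, ys):
    """Optional comparison against np.polyfit (assumes NumPy is available)."""
    import numpy as np
    m, b = np.polyfit(xs, ys, 1)  # degree 1 returns [slope, intercept]
    return float(m), float(b)


# The 15 data points listed in the question.
xs = list(range(1, 16))
ys = [0.9984489205, 0.9752788428, 0.8769402712, 0.6284264418, 0.1772490897,
      -0.4210076698, -0.9162875775, -0.9116652003, -0.1990087588, 0.7565584253,
      0.8973685995, -0.1666463731, -0.9999420786, -0.0774674983, 0.9994582237]

slope, intercept = solve_normal_equations(xs, ys)
print("slope =", round(slope, 3), "intercept =", round(intercept, 3))
# prints: slope = -0.049 intercept = 0.564
```

Both routes give the same line as the sample output above, which is expected because they are algebraically equivalent ways of minimizing the same sum of squared residuals.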