Python When analyzing data, we often have some idea that two quantities are rela

ID: 3664034 • Letter: P

Question

Python

When analyzing data, we often have some idea that two quantities are related and would like to quantify this relationship. We will eventually learn to use linear least-squares approximation to do so. For now, we want to see an example of correlated data and how a least-squares fit of that data can give us a good quantitative picture of the relationship. This problem has some similarities with the power law problem, but in this case we are using actual data available online.

The file [baseball-data.csv] contains the batting average (AVG) and RBIs from 2014 for each of the MLB teams. Each row represents data for a single team, with the first column containing the RBI count and the second column containing the batting average.

Write Python code to read in the baseball data using np.loadtxt(). Your code will see a variable called baseball_csv, which you can treat as the filename when calling np.loadtxt(). You will not need quotes around baseball_csv in this case.

Plot the data points (with AVG on the x-axis and RBI on the y-axis) using matplotlib. Additionally, plot the line y = mx + b, where m = 4074.611 and b = -398.302, on the same graph as the data points. This line is the line of best fit through the data.

Which data point is closest to the line (vertically)? Which data point is farthest from the line (vertically)? Your code should produce the following variables:

closest_pt: a 2-tuple of the form (x, y), where x is the AVG and y is the RBI of the closest data point.
closest_dist: the vertical distance between the closest point and the line through the data.
farthest_pt: a 2-tuple of the same form for the farthest data point.
farthest_dist: the vertical distance between the farthest point and the line through the data.

RBI  AVG
729  0.259
731  0.277
721  0.276
686  0.244
690  0.259
686  0.265
675  0.254
681  0.256
635  0.253
659  0.259
644  0.253
636  0.255
625  0.253
604  0.263
617  0.250
614  0.253
597  0.256
601  0.244
600  0.244
591  0.245
596  0.242
602  0.239
584  0.242
585  0.253
573  0.248
590  0.239
586  0.247
562  0.238
545  0.241
500  0.226
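
As a sanity check on the statement that the given line is the line of best fit, the fit can be recomputed from the rows above. This is only a minimal sketch and not part of the original problem: it assumes the 30 rows are pasted into a string (the name raw is made up for illustration) instead of being read from baseball-data.csv, and it uses np.polyfit for the degree-1 least-squares fit.

```python
import numpy as np

# The 30 (RBI, AVG) pairs from the question, pasted as text (assumption:
# this stands in for the contents of baseball-data.csv).
raw = """
729 0.259 731 0.277 721 0.276 686 0.244 690 0.259 686 0.265
675 0.254 681 0.256 635 0.253 659 0.259 644 0.253 636 0.255
625 0.253 604 0.263 617 0.25  614 0.253 597 0.256 601 0.244
600 0.244 591 0.245 596 0.242 602 0.239 584 0.242 585 0.253
573 0.248 590 0.239 586 0.247 562 0.238 545 0.241 500 0.226
"""

# Split on whitespace and regroup into rows of (RBI, AVG).
data = np.array(raw.split(), dtype=float).reshape(-1, 2)
rbi, avg = data[:, 0], data[:, 1]

# Degree-1 least-squares fit of RBI against AVG. np.polyfit returns the
# coefficients from highest degree to lowest: (slope, intercept).
m_fit, b_fit = np.polyfit(avg, rbi, 1)
print(f"slope = {m_fit:.3f}, intercept = {b_fit:.3f}")
```

The fitted slope should come out close to 4074.611 and the intercept close to -398.302. The intercept is negative, which is what places the line among the data points: at AVG = 0.25 the line gives an RBI count of about 620.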

Explanation / Answer

# Import libraries
import numpy as np
from matplotlib import pyplot as plt

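# baseball_csv is provided by the grading environment and holds the filename
a = np.loadtxt(baseball_csv, delimiter=",")
# Column 0 is the RBI count, column 1 is the batting average (AVG)
x = a[:,1]
y = a[:,0]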

fig = plt.figure()
coord = fig.add_subplot(1,1,1)
# Plot the data points as blue dot markers with no connecting line
fig1 = coord.plot(x, y, marker=".", linestyle="", color="blue")

m = 4074.611
b = -398.302

x_line = np.linspace(0.22,0.28,2)
y_line = m*x_line + b
fig2 = coord.plot(x_line, y_line, color="blue")

def distToLine(x, y):
    # Vertical distance between the point (x, y) and the line y = m*x + b
    dist = abs(y - m*x - b)
    return dist

first_point = tuple(a[0,:])
closest_pt = (first_point[1], first_point[0])
closest_dist = distToLine(first_point[1], first_point[0])
# Start the farthest-point search from the first point as well, so it is not skipped
farthest_pt = closest_pt
farthest_dist = closest_dist
for i in range(1, len(x)):
    point = tuple(a[i,:])
    currDist = distToLine(point[1], point[0])
    if currDist < closest_dist:
        closest_dist = currDist
        closest_pt = (point[1], point[0])
    if currDist > farthest_dist:
        farthest_dist = currDist
        farthest_pt = (point[1], point[0])

plt.show()
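
The loop above can also be written without explicit iteration. The following is a hedged sketch of an alternative, not the original answer's method: it computes all vertical distances at once and picks the extremes with np.argmin and np.argmax. The data is pasted inline (the name raw is an assumption for illustration) so the snippet runs without baseball-data.csv; in the graded setting the array would come from np.loadtxt(baseball_csv, delimiter=",") instead.

```python
import numpy as np

# (RBI, AVG) pairs from the question, pasted inline for a self-contained run.
raw = """
729 0.259 731 0.277 721 0.276 686 0.244 690 0.259 686 0.265
675 0.254 681 0.256 635 0.253 659 0.259 644 0.253 636 0.255
625 0.253 604 0.263 617 0.25  614 0.253 597 0.256 601 0.244
600 0.244 591 0.245 596 0.242 602 0.239 584 0.242 585 0.253
573 0.248 590 0.239 586 0.247 562 0.238 545 0.241 500 0.226
"""
a = np.array(raw.split(), dtype=float).reshape(-1, 2)
y, x = a[:, 0], a[:, 1]          # RBI on the y-axis, AVG on the x-axis

m, b = 4074.611, -398.302        # line of best fit given in the problem

# Vertical distance from every data point to the line, computed in one step.
dist = np.abs(y - (m * x + b))

# Indices of the smallest and largest vertical distances.
i_min, i_max = int(np.argmin(dist)), int(np.argmax(dist))

closest_pt = (float(x[i_min]), float(y[i_min]))
closest_dist = float(dist[i_min])
farthest_pt = (float(x[i_max]), float(y[i_max]))
farthest_dist = float(dist[i_max])

print(closest_pt, closest_dist)
print(farthest_pt, farthest_dist)
```

On the 30 rows listed in the question, this selects (0.277, 731) as the closest point, at a vertical distance of roughly 0.63, and (0.244, 686) as the farthest, at roughly 90.1. Because every point is considered, there is no special case for the first row.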
