
Question

Using Python:
Your research group developed two systems to measure the speed of a tennis ball. The local tennis pro suspects
that the systems are not consistent with each other. To test her suspicions, she sets up both systems and records
the speeds of 20 serves. The values are stored in the data file “../data/tennis.csv” in the columns speed1 and
speed2. The recorded speeds are in kilometers per hour.
a. What is the average difference of measurements between the two systems (speed1-speed2)?
b. Start the analysis by verifying the normality assumption. Perform Shapiro-Wilk test of normality on the
differences (speed1-speed2). What is the p-value of the test?
c. Perform the t-test (paired, 2-sided). What is the p-value?
d. What is the lower limit of a 95% confidence interval on the mean difference of measurements based on the
t-test?
e. What is the upper limit of a 95% confidence interval on the mean difference of measurements based on the
t-test?
NOTE: To compute the sample standard deviation of the differences (n - 1 denominator, which makes the variance estimate unbiased), use
np.std(speed1-speed2,ddof=1) # ddof=1 --> std = sqrt( sum( (xi-xbar)^2 ) / (n-ddof) )
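As a quick check of what `ddof=1` does, the sketch below compares `np.std(..., ddof=1)` with the formula from the note written out by hand and with Python's `statistics.stdev`, which also divides by n - 1. The sample values are made up for illustration and are not the tennis data.

```python
import statistics

import numpy as np

# Hypothetical paired differences (km/h), NOT taken from tennis.csv
d = np.array([1.5, -0.5, 2.0, 0.0, 1.0])

n = len(d)
xbar = d.mean()

# Sample standard deviation written out: sqrt( sum((xi - xbar)^2) / (n - 1) )
by_hand = np.sqrt(np.sum((d - xbar) ** 2) / (n - 1))

# NumPy's default is ddof=0 (divide by n); ddof=1 divides by n - 1
numpy_pop = np.std(d)
numpy_sample = np.std(d, ddof=1)

# statistics.stdev from the standard library also uses the n - 1 denominator
stdlib_sample = statistics.stdev(d.tolist())

print(by_hand, numpy_sample, stdlib_sample, numpy_pop)
```

Forgetting `ddof=1` silently gives the smaller, divide-by-n value, which makes the confidence interval too narrow.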

Explanation / Answer
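Because both systems timed the same serves, the analysis works on the per-serve differences. Writing d_i for the difference on serve i, the quantities asked for in (a), (c), (d) and (e) are the standard one-sample formulas applied to the d_i:

```latex
\begin{aligned}
d_i &= \text{speed1}_i - \text{speed2}_i, \qquad i = 1, \dots, n \quad (n = 20) \\
\bar d &= \frac{1}{n}\sum_{i=1}^{n} d_i \\
s_d &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar d)^2} \\
t &= \frac{\bar d}{s_d/\sqrt{n}}, \qquad \text{df} = n - 1 = 19 \\
\text{95\% CI} &= \bar d \pm t_{0.975,\,n-1}\,\frac{s_d}{\sqrt{n}}, \qquad t_{0.975,\,19} \approx 2.093
\end{aligned}
```

The standard deviation is divided by \sqrt{n} to get the standard error of the mean difference, and the degrees of freedom come from the number of pairs (20), not from the total number of measurements.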

import numpy as np
import pandas as pd
from scipy import stats

# Load the data; the first row of the CSV holds the column names speed1 and speed2 (km/h)
df = pd.read_csv('../data/tennis.csv')
speed1 = df['speed1'].to_numpy(dtype=float)
speed2 = df['speed2'].to_numpy(dtype=float)

# Paired design: both systems measured the same serve, so work with per-serve differences
diff = speed1 - speed2
n = len(diff)  # number of serves (20)

#a) average difference (speed1 - speed2)

avg_diff = np.mean(diff)
print(avg_diff)

#b) Shapiro-Wilk test of normality on the differences

w_stat, shapiro_p = stats.shapiro(diff)
print(shapiro_p)

#c) paired, two-sided t-test
# ttest_rel (not ttest_ind) because each speed1 value is paired with the speed2 value of the same serve

t_stat, ttest_p = stats.ttest_rel(speed1, speed2)
print(ttest_p)

#d) lower limit of the 95% confidence interval on the mean difference

# sample standard deviation of the differences (n - 1 denominator)
std = np.std(diff, ddof=1)
# standard error of the mean difference
se = std / np.sqrt(n)

## CI = avg_diff +/- T*se, where T is the 97.5th percentile of the t distribution
## with n - 1 degrees of freedom (20 paired differences --> 19 df, T is about 2.093)
T = stats.t.ppf(0.975, df=n - 1)

LL = avg_diff - T * se
print(LL)

#e) upper limit of the 95% confidence interval on the mean difference

UL = avg_diff + T * se
print(UL)
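To sanity-check the answer's approach, the sketch below uses made-up data (a hypothetical stand-in for tennis.csv, with an assumed 0.5 km/h offset between the systems) to show two things: a paired t-test is the same as a one-sample t-test of the differences against 0, and the hand-built interval matches SciPy's `stats.t.interval` centred on the mean difference.

```python
import numpy as np
from scipy import stats

# Hypothetical data standing in for tennis.csv: 20 serves, each timed by both systems
rng = np.random.default_rng(42)
true_speed = rng.uniform(150, 200, size=20)                # km/h
speed1 = true_speed + rng.normal(0.0, 1.0, size=20)
speed2 = true_speed + rng.normal(0.5, 1.0, size=20)        # assumed small offset

diff = speed1 - speed2
n = len(diff)

# Paired t-test and one-sample t-test of the differences against 0 are the same test
paired = stats.ttest_rel(speed1, speed2)
one_sample = stats.ttest_1samp(diff, 0.0)

# 95% CI written out: mean(d) +/- t_{0.975, n-1} * s_d / sqrt(n)
se = np.std(diff, ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
ll_manual = diff.mean() - t_crit * se
ul_manual = diff.mean() + t_crit * se

# Same interval from SciPy's t distribution, shifted and scaled by the mean and standard error
ll_scipy, ul_scipy = stats.t.interval(0.95, n - 1, loc=diff.mean(), scale=se)

print(paired.pvalue, one_sample.pvalue)
print(ll_manual, ul_manual)
print(ll_scipy, ul_scipy)
```

Running the same comparison on the real speed1 and speed2 columns is a simple way to confirm the printed LL and UL.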