Question
Using Python:
Your research group developed two systems to measure the speed of a tennis ball. The local tennis pro suspects
that the systems are not consistent with each other. To test her suspicions, she sets up both systems and records
the speeds of 20 serves. The values are stored in the data file “../data/tennis.csv” in the columns speed1 and
speed2. The recorded speeds are in kilometers per hour.
a. What is the average difference of measurements between the two systems (speed1-speed2)?
b. Start the analysis by verifying the normality assumption. Perform the Shapiro-Wilk test of normality on the
differences (speed1-speed2). What is the p-value of the test?
c. Perform the t-test (paired, 2-sided). What is the p-value?
d. What is the lower limit of a 95% confidence interval on the mean difference of measurements based on the
t-test?
e. What is the upper limit of a 95% confidence interval on the mean difference of measurements based on the
t-test?
NOTE: To compute the standard deviation from the unbiased (n - 1) variance estimator, use
np.std(speed1 - speed2, ddof=1)  # ddof=1 --> std = sqrt( sum((xi - xbar)^2) / (n - ddof) )
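The note's formula can be checked directly. The sketch below uses a handful of hypothetical differences (made-up values, not taken from tennis.csv) and compares the explicit sum-of-squares formula with np.std(..., ddof=1). It also prints np.std with its default ddof=0, which divides by n instead of n - 1 and therefore gives a smaller value.

```python
import numpy as np

# Hypothetical differences (km/h), used only to illustrate the formula
d = np.array([1.5, -0.5, 2.0, 0.0, 1.0])

n = d.size
dbar = d.mean()

# Explicit formula: sqrt( sum((xi - xbar)^2) / (n - ddof) ) with ddof = 1
manual_std = np.sqrt(np.sum((d - dbar) ** 2) / (n - 1))

# np.std with ddof=1 should agree; np.std's default (ddof=0) divides by n instead
print(manual_std, np.std(d, ddof=1), np.std(d))
```

Forgetting ddof=1 is an easy mistake here, because np.std silently uses ddof=0, while pandas' Series.std defaults to ddof=1.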
Explanation / Answer
import numpy as np
import pandas as pd
from scipy import stats

# Read the data; the header row provides the column names speed1 and speed2
df = pd.read_csv('../data/tennis.csv')
speed1 = df['speed1'].astype(float).to_numpy()
speed2 = df['speed2'].astype(float).to_numpy()
diff = speed1 - speed2   # paired differences, km/h
n = len(diff)            # number of serves (20)

#a) average difference (speed1 - speed2)
avg_diff = np.mean(diff)
print(avg_diff)

#b) Shapiro-Wilk test of normality on the differences; shapiro returns (statistic, p-value)
sw_stat, sw_p = stats.shapiro(diff)
print(sw_p)

#c) paired, two-sided t-test: ttest_rel (not ttest_ind), because both systems measured the same serves
t_stat, t_p = stats.ttest_rel(speed1, speed2)
print(t_p)

#d)
## 95% confidence interval on the mean difference: avg_diff +/- T * std / sqrt(n),
## where std is the sample standard deviation of the differences (ddof=1) and T is the
## 0.975 quantile of the t distribution with n - 1 degrees of freedom (n = 20 --> df = 19, T is about 2.093)
std = np.std(diff, ddof=1)
T = stats.t.ppf(0.975, df=n - 1)
LL = avg_diff - T * std / np.sqrt(n)
print(LL)

#e)
## upper limit of the same interval: UL = avg_diff + T * std / sqrt(n)
UL = avg_diff + T * std / np.sqrt(n)
print(UL)
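As a cross-check on parts a, c, d and e, the sketch below runs the same calculations on synthetic paired data (hypothetical values generated with a fixed seed, not the contents of tennis.csv; the 180 km/h mean, 15 km/h spread and 0.5 km/h offset are assumptions chosen for illustration). It shows that the mean of the differences equals the difference of the column means, that ttest_rel gives the same p-value as a one-sample t-test of the differences against 0, that the interval avg_diff +/- T * std / sqrt(n) matches scipy.stats.t.interval with the standard error from scipy.stats.sem, and that ttest_ind, which ignores the pairing, gives a larger p-value because serve-to-serve speed variation swamps the small offset between systems.

```python
import numpy as np
from scipy import stats

# Synthetic paired data (hypothetical, NOT the tennis.csv values): 20 serves whose
# true speeds vary a lot, measured by two systems with a small offset plus noise
rng = np.random.default_rng(0)
true_speed = rng.normal(180, 15, size=20)
speed1 = true_speed + rng.normal(0.5, 1.0, size=20)
speed2 = true_speed + rng.normal(0.0, 1.0, size=20)

diff = speed1 - speed2
n = diff.size

# Part a: mean of the differences equals the difference of the means
avg_diff = diff.mean()

# The paired t-test is a one-sample t-test of the differences against 0
paired = stats.ttest_rel(speed1, speed2)
one_sample = stats.ttest_1samp(diff, 0.0)

# 95% CI on the mean difference: manual formula vs scipy's t.interval
T = stats.t.ppf(0.975, n - 1)
half_width = T * np.std(diff, ddof=1) / np.sqrt(n)
ci_manual = (avg_diff - half_width, avg_diff + half_width)
ci_scipy = stats.t.interval(0.95, n - 1, loc=avg_diff, scale=stats.sem(diff))

# The unpaired test ignores the pairing, so its standard error is much larger
unpaired = stats.ttest_ind(speed1, speed2)

print(paired.pvalue, one_sample.pvalue, unpaired.pvalue)
print(ci_manual, ci_scipy)
```

Running the same checks on the real speed1 and speed2 columns is a quick way to confirm that the answers to parts c to e agree with each other: 0 lies outside the 95% interval exactly when the two-sided paired p-value is below 0.05.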