You are given three data files (data1.txt, data2.txt, and data3.txt) consisting
Question
You are given three data files (data1.txt, data2.txt, and data3.txt), each consisting of integers (one integer per line). Write Python code that reads each of these three files, computes the sample mean and standard deviation of each data set, and writes them to a file, output.txt. For this problem, you should import simple_ds.py and use its mean function rather than including the code directly in your script. Next, combine the data from the three files, draw a histogram of the combined data, and print the combined data on the screen. The output to the screen must be in a reader-friendly format. Use matplotlib (as demonstrated in hist4.py), not MATLAB, to plot the histogram of the combined data.
Data set 1:
1
2
3
4
5
8
9
9
10
0
12
1
1
1
2
0
2
3
3
Data set 2:
4
5
1
2
7
8
8
9
0
1
2
3
4
5
Data set 3:
1
2
3
1
1
6
10
23
11
11
24
55
simple_ds.py:
# mean() - compute the sample mean.
# Parameters:
#   N   a list of numbers
#
def mean( N ):
    # running total
    Total = 0
    # count of the number of items
    Count = len(N)
    # for each item in the list
    for Num in N:
        # increment the total
        Total = Total + Num
    # compute the sample average
    average = float(Total)/Count if Count > 0 else 0
    return(average)

#
# std_dev() - compute the sample standard deviation.
# Parameters:
#   N   a list of numbers
#
def std_dev( N ):
    Count = len(N)
    # Compute the average
    average = mean(N)
    if Count > 1:
        # Compute the std dev.
        Total = 0
        for Num in N:
            Total = Total + (float(Num) - average)**2
        std_dev = ((float(1)/(Count-1))*Total)**(float(1)/2)
    else:
        std_dev = 0
    return(std_dev)

# default main() for command-line execution.
def main():
    Numbers = [123, 87, 96, 24, 104, 16, 85, 55, 62, 109]
    # display output (print() with one argument works in Python 2 and 3)
    print('Numbers: {}'.format(Numbers))
    the_avg = mean(Numbers)
    print('Average: {:.3f}'.format(the_avg))
    the_std_dev = std_dev(Numbers)
    print('Standard deviation: {:.3f}'.format(the_std_dev))

# This line causes main() to be executed if this module
# is executed without an import (i.e., from the command line).
# If the module is imported, the condition will fail.
if __name__ == "__main__":
    main()
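As a quick sanity check on the module above, the following sketch compares a compact version of the same n - 1 formula against Python's standard statistics module. The helper name sample_std_dev is hypothetical and not part of simple_ds.py; the number list is the one used in main().

```python
import math
import statistics


def sample_std_dev(values):
    """Sample standard deviation with the n - 1 divisor (hypothetical helper,
    mirroring the formula used by std_dev() in simple_ds.py)."""
    count = len(values)
    if count < 2:
        return 0.0
    average = sum(values) / float(count)
    squared_diffs = sum((float(v) - average) ** 2 for v in values)
    return math.sqrt(squared_diffs / (count - 1))


# Same list that main() in simple_ds.py uses.
numbers = [123, 87, 96, 24, 104, 16, 85, 55, 62, 109]

ours = sample_std_dev(numbers)
reference = statistics.stdev(numbers)  # also uses the n - 1 divisor

# The two results should agree up to floating-point rounding.
print("ours:      {:.3f}".format(ours))
print("reference: {:.3f}".format(reference))
```

If the two printed values differ, the most likely cause is a divisor of n instead of n - 1, which gives the population rather than the sample standard deviation.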
Explanation / Answer
Part One
from simple_ds import mean, std_dev

# File names as given in the question.
file_names = ['data1.txt', 'data2.txt', 'data3.txt']

# Read each file into its own list of integers, skipping blank lines.
data_sets = []
for file_name in file_names:
    with open(file_name, "r") as f:
        data_sets.append([int(line) for line in f if line.strip()])

# Write one line per data set: sample mean, then sample standard deviation.
# Mode "w" starts a fresh file, so rerunning the script does not pile up old results.
with open("output.txt", "w") as out:
    for data in data_sets:
        out.write("{} {}\n".format(mean(data), std_dev(data)))
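To try the read-and-summarize step without the real data files, here is a minimal sketch that writes data set 1 from the question to a temporary file and runs the same steps on it. The helper names read_integers and write_summary are hypothetical, and statistics.mean and statistics.stdev stand in for the functions from simple_ds.py so that the sketch runs on its own.

```python
import os
import statistics
import tempfile


def read_integers(path):
    """Return the integers in a file that holds one integer per line."""
    with open(path, "r") as f:
        return [int(line) for line in f if line.strip()]


def write_summary(paths, out_path):
    """Write one 'mean std_dev' line per input file (hypothetical helper)."""
    with open(out_path, "w") as out:
        for path in paths:
            data = read_integers(path)
            out.write("{:.3f} {:.3f}\n".format(
                statistics.mean(data), statistics.stdev(data)))


# Data set 1 from the question, written to a temporary file for the demo.
data_set_1 = [1, 2, 3, 4, 5, 8, 9, 9, 10, 0, 12, 1, 1, 1, 2, 0, 2, 3, 3]

work_dir = tempfile.mkdtemp()
data_path = os.path.join(work_dir, "data1.txt")
out_path = os.path.join(work_dir, "output.txt")

with open(data_path, "w") as f:
    f.write("\n".join(str(n) for n in data_set_1) + "\n")

write_summary([data_path], out_path)

# Show what ended up in the summary file.
with open(out_path) as f:
    summary_lines = f.read().splitlines()
print(summary_lines)
```

In the actual assignment the two statistics calls would be replaced by mean and std_dev imported from simple_ds.py, as the question requires.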
Part Two
import matplotlib.pyplot as plt

# File names as given in the question.
file_names = ['data1.txt', 'data2.txt', 'data3.txt']

# Combine the integers from all three files into one list.
combined = []
for file_name in file_names:
    with open(file_name, "r") as f:
        combined.extend(int(line) for line in f if line.strip())

# One bin per integer value, so every data point (including the largest, 55) is counted.
bins = range(min(combined), max(combined) + 2)

# density=True normalizes the histogram; older matplotlib releases used normed=True instead.
plt.hist(combined, bins=bins, density=True, histtype='bar', facecolor='green')
plt.xlabel('Value')
plt.ylabel('Relative frequency')
plt.show()
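The question also asks for the combined data to be printed in a reader-friendly format, a step the answer above does not show. One possible sketch follows, assuming that ten right-aligned values per row is an acceptable layout; format_rows is a hypothetical helper, and the short list is only a stand-in for the combined data read from the three files.

```python
def format_rows(values, per_row=10, width=4):
    """Format values as right-aligned columns, per_row values on each line."""
    lines = []
    for start in range(0, len(values), per_row):
        row = values[start:start + per_row]
        lines.append("".join("{:>{w}}".format(v, w=width) for v in row))
    return "\n".join(lines)


# Stand-in for the combined list built from the three data files.
combined = [1, 2, 3, 4, 5, 8, 9, 9, 10, 0, 12, 1, 55]

print("Combined data ({} values):".format(len(combined)))
print(format_rows(combined))
```

Printing in fixed-width rows keeps the columns aligned even when the values have different numbers of digits.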