Python code for: In this project, you will be writing a program to merge and fil
Question
Python code for:
In this project, you will be writing a program to merge and filter one or more csv files. This project will test many of the skills you've acquired this semester, and will introduce you to the csv module.
Sample Program Execution
The user should be prompted for a comma separated list containing the csv files you want to merge, and a comma separated list of filters:
Enter the csv files: april.csv,june.csv,july.csv
Enter the filters: 1==40.7293,2==-73.992
Result:
Date/Time,Lat,Lon,Base
6/1/2014 0:00:00,40.7293,-73.992,B02512
The filters let the user control which rows in the csv files are merged. In the execution above, column 1 has to be equal to 40.7293 and column 2 has to be equal to -73.992, or the row is excluded from the final printout. You should make no assumptions about the number of filters, rows, or columns.
The format for a filter is the column number (starting at 0, as it is an index), a comparison operator, and a value. You must support all of the familiar comparison operators (==, !=, >, >=, <, <=); e.g. all of these are valid filters:
0!="6/1/2014 14:45:00"
1==40.7293
2==-73.992
0>"6/1/2014 14:45:00"
1>=40.7293
100<50
100<=55
Further, there are 3 data types you have to support: string, float, and int. If the filter value has quotes (e.g. "6/1/2014 14:45:00"), the row value and the filter value should be compared as strings. If the filter value has a period (e.g. 40.7293), the row value and the filter value should be compared as floats. Otherwise (e.g. 55), the row value and the filter value should be compared as integers. Your program will need to perform these conversions.
The sample execution below would compare the filter value and each row's column 0 as strings:
Enter the csv files: june.csv
Enter the filters: 0=="6/1/2014 0:00:00"
Result:
Date/Time,Lat,Lon,Base
6/1/2014 0:00:00,40.7293,-73.992,B02512
CS1064 Fall 2016
Input File Format
The files in the sample executions above, april.csv, etc., are in csv format (comma separated values), a common file type for distributing data.
If you open the file april.csv in a text editor it will look something like this:
Date/Time,Lat,Lon,Base
4/1/2014 0:11:00,40.769,-73.9549,B02512
4/1/2014 0:17:00,40.7267,-74.0345,B02512
4/1/2014 0:21:00,40.7316,-73.9873,B02512
4/1/2014 0:28:00,40.7588,-73.9776,B02512 ...
So a text editor will show you all of the data, but it's not the most readable way to look at the file. A better way to visualize what's in the file is to open it in Excel.
The first row in each file (april.csv, may.csv, june.csv, etc.) will always be a "header" line that you may assume is the same across files; it should be included only once in your printout! All other rows in the csv files will be data that could be merged.
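The "header only once" rule can be sketched as a small generator. This is only an illustration, not the required solution: merge_rows is a hypothetical helper name, and io.StringIO objects stand in for the real files so the example runs without april.csv or june.csv on disk. The two data rows are copied from the samples above.

```python
import csv
import io

def merge_rows(sources):
    """Yield the header once, then every data row from each source in order."""
    header_emitted = False
    for source in sources:
        reader = csv.reader(source)
        header = next(reader, None)  # None when the source has no rows at all
        if header is None:
            continue
        if not header_emitted:
            yield header
            header_emitted = True
        # Everything after the first row is data.
        yield from reader

# In-memory stand-ins for two csv files (assumption: same header in both).
april = io.StringIO("Date/Time,Lat,Lon,Base\n4/1/2014 0:11:00,40.769,-73.9549,B02512\n")
june = io.StringIO("Date/Time,Lat,Lon,Base\n6/1/2014 0:00:00,40.7293,-73.992,B02512\n")

merged = list(merge_rows([april, june]))
for row in merged:
    print(",".join(row))
```

With real files, the same function should work on objects returned by open(name, "r", newline="").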
Hints
For this project I'd recommend reading the file in a different way. Rather than reading the file manually, use the csv module:
import csv
f = open("april.csv", "r")
r = csv.reader(f)
for row in r:
    print(row)
Each row is a list containing the data from one row in the csv file. This code would print:
['Date/Time', 'Lat', 'Lon', 'Base']
['4/1/2014 0:11:00', '40.769', '-73.9549', 'B02512']
['4/1/2014 0:17:00', '40.7267', '-74.0345', 'B02512']
['4/1/2014 0:21:00', '40.7316', '-73.9873', 'B02512']
['4/1/2014 0:28:00', '40.7588', '-73.9776', 'B02512']
['4/1/2014 0:33:00', '40.7594', '-73.9722', 'B02512']
['4/1/2014 0:33:00', '40.7383', '-74.0403', 'B02512']
['4/1/2014 0:39:00', '40.7223', '-73.9887', 'B02512'] ...
Another useful function is next(). next() allows us to pull individual rows out of the csv file without using a for loop. The ordering is the same as the for loop though:
import csv
f = open("april.csv", "r")
r = csv.reader(f)
# Get just the first row (the row with the header line):
# ['Date/Time', 'Lat', 'Lon', 'Base']
header = next(r, [])
# for loop for the rest of the data
for row in r:
    print(row)
The for loop in this case would just print the "data" rows:
['4/1/2014 0:11:00', '40.769', '-73.9549', 'B02512']
['4/1/2014 0:17:00', '40.7267', '-74.0345', 'B02512'] ...
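The second argument to next() is a default that is returned when the reader has no more rows, which is why next(r, []) is safe even on an empty file. A small sketch of that behavior follows; io.StringIO is used here as a stand-in for real files, which is an assumption made only so the example is self-contained.

```python
import csv
import io

# An empty "file": there is no header row to pull out.
empty_reader = csv.reader(io.StringIO(""))
empty_header = next(empty_reader, [])  # the default [] comes back, no exception

# A "file" containing only the header line.
header_only = csv.reader(io.StringIO("Date/Time,Lat,Lon,Base\n"))
header = next(header_only, [])  # the header row as a list of strings
rest = list(header_only)        # no data rows remain after the header

print(empty_header, header, rest)
```

Without the default, calling next() on an exhausted reader raises StopIteration.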
You may use any function that's part of the csv module when working on this assignment. For more information, check out the Python documentation: https://docs.python.org/3/library/csv.html
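The filter format described earlier (column index, comparison operator, typed value) could be parsed along the following lines. This is a sketch under stated assumptions, not the assignment's reference solution: parse_filter, OPERATORS, and FILTER_PATTERN are hypothetical names, and the regular expression is just one way to split the three parts. It uses only the standard operator and re modules.

```python
import operator
import re

# Map each supported comparison symbol to the matching function.
OPERATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}

# Two-character operators come first in the alternation so that
# ">=" is not read as ">" followed by "=...".
FILTER_PATTERN = re.compile(r'^\s*(\d+)\s*(==|!=|>=|<=|>|<)\s*(.+?)\s*$')

def parse_filter(text):
    """Return (column index, comparison function, typed value) for one filter."""
    match = FILTER_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a valid filter: {text!r}")
    column, symbol, raw = match.groups()
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        value = raw[1:-1]      # quoted: compare as strings
    elif "." in raw:
        value = float(raw)     # has a period: compare as floats
    else:
        value = int(raw)       # otherwise: compare as integers
    return int(column), OPERATORS[symbol], value

print(parse_filter('2==-73.992'))
```

One caveat: splitting the whole filter line on commas would break a quoted string value that itself contains a comma. The sample filters do not, so the sketch leaves that case alone.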
Explanation / Answer
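# -*- coding: utf-8 -*-
"""
Created on Thu Dec 8 22:20:38 2016
@author: naresh
"""
import csv

# Comma separated inputs, e.g. "april.csv,june.csv" and "1==40.7293,2==-73.992".
file_names = [name.strip() for name in input("Enter the csv files: ").split(',')]
# Note: the filters are read here but are not yet applied to the rows below.
filter_list = [text.strip() for text in input("Enter the filters: ").split(',')]

header_printed = False
for file_name in file_names:
    # Files are opened relative to the current working directory.
    # newline="" is what the csv documentation recommends for csv files.
    with open(file_name, "r", newline="") as f:
        r = csv.reader(f)
        header = next(r, None)  # the first row of every file is the header
        if header is not None and not header_printed:
            # Print the shared header only once across all files.
            print(",".join(header))
            header_printed = True
        for row in r:
            print(",".join(row))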
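The answer's code reads the rows but does not apply the filters. A minimal sketch of that missing step is below. It assumes each filter has already been turned into a (column, comparison function, value) tuple whose value is a str, float, or int. The name row_matches is hypothetical. Excluding rows that lack the requested column, or whose cell cannot be converted, is an assumption and is not stated in the assignment.

```python
import operator

def row_matches(row, filters):
    """Return True only if the row satisfies every filter."""
    for column, compare, value in filters:
        if column >= len(row):
            return False  # assumption: a missing column excludes the row
        try:
            # Convert the cell to the filter value's type (str, float, or int).
            cell = type(value)(row[column])
        except ValueError:
            return False  # assumption: an unconvertible cell excludes the row
        if not compare(cell, value):
            return False
    return True

# Data rows copied from the samples in the question.
rows = [
    ["6/1/2014 0:00:00", "40.7293", "-73.992", "B02512"],
    ["4/1/2014 0:11:00", "40.769", "-73.9549", "B02512"],
]
# Equivalent to the sample filters 1==40.7293,2==-73.992.
filters = [(1, operator.eq, 40.7293), (2, operator.eq, -73.992)]

kept = [row for row in rows if row_matches(row, filters)]
for row in kept:
    print(",".join(row))
```

Exact float equality works here because the cell and the filter value are parsed from the same decimal text. Values produced by arithmetic would need a tolerance instead.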