Question
http://www.cse.msu.edu/~cse231/Online/Labs/Lab10/lab10a.py ======== lab10.py
http://www.cse.msu.edu/~cse231/Online/Labs/Lab10/document1.txt ======== doc1.txt
Part A: Programming with Sets

Consider the file named "lab10a.py". That file contains the skeleton of a Python program to do a simple analysis of two files: it will display the number of unique words which appear in the two files (the union of those two sets of words) as well as the number of unique words which are common to both files (the intersection of those two sets of words).

Case does not matter: the words "pumpkin", "Pumpkin" and "PUMPKIN" should be treated as the same word. Only unique words should be counted: if a word appears more than once in a file, it should only be counted once.

Note: remember to remove punctuation from words, e.g. "it," should be "it".

a. Replace the comments labeled "YOUR COMMENT" in function "build_word_set" with meaningful comments to describe the work being done in the next statement. Use more than one comment line, if necessary.
b. Revise function "compare_files" to accomplish the work described in the comments.
c. Test the revised program. There are two sample documents available: "document1.txt" (The Declaration of Independence) and "document2.txt" (The Gettysburg Address).

Explanation / Answer
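As a warm-up, here is a minimal sketch of the set behavior the assignment relies on, using two made-up lines of text rather than the lab's files. The helper name `normalize` and the sample strings are assumptions for illustration only; they are not part of lab10a.py.

```python
import string

def normalize(word):
    # Hypothetical helper: lower-case the word and trim punctuation
    # from both ends (characters inside the word are left alone).
    return word.lower().strip(string.punctuation)

# Made-up sample text, standing in for one line from each file.
line_a = "Pumpkin pie, PUMPKIN soup; it, it."
line_b = "It is a pumpkin!"

# A set keeps each word once; subtracting {""} drops tokens that
# were nothing but punctuation.
words_a = {normalize(w) for w in line_a.split()} - {""}
words_b = {normalize(w) for w in line_b.split()} - {""}

print(sorted(words_a))            # ['it', 'pie', 'pumpkin', 'soup']
print(sorted(words_a | words_b))  # union: words in either line
print(sorted(words_a & words_b))  # intersection: words in both lines
```

The operators `|` and `&` are equivalent to the `union()` and `intersection()` methods for two sets; either spelling works.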
import string

def build_word_set( input_file ):
    word_set = set()
    for line in input_file:
        # Strip leading and trailing whitespace from the line, then
        # split it on whitespace to get a list of words
        word_lst = line.strip().split()
        # Convert each word to lower case and strip any leading or
        # trailing punctuation
        word_lst = [w.lower().strip(string.punctuation) for w in word_lst]
        for word in word_lst:
            if word != "":
                # If the word is not empty, add it to the set.
                # A set stores each element only once, so a repeated
                # word is not counted again
                word_set.add( word )
    return word_set

def compare_files( file1, file2 ):
    # Build two sets:
    # all of the unique words in file1
    unique_word_file1_set = build_word_set(file1)
    # all of the unique words in file2
    unique_word_file2_set = build_word_set(file2)
    # Display the total number of unique words between the
    # two files. If a word appears in both files, it should
    # only be counted once.
    union_set = unique_word_file1_set.union(unique_word_file2_set)
    print("Total number of unique words between the two files " + str(len(union_set)))
    # Display the number of unique words which appear in both
    # files. A word should only be counted if it is present in
    # both files.
    intersection_set = unique_word_file1_set.intersection(unique_word_file2_set)
    print("The number of unique words which appear in both files " + str(len(intersection_set)))

######################################################################
f1 = open( "document1.txt" )
f2 = open( "document2.txt" )
compare_files( f1, f2 )
f1.close()
f2.close()
'''
Sample run
$ python lab10a.py
Total number of unique words between the two files 606
The number of unique words which appear in both files 60
# Link to the code in case of indentation issues: https://goo.gl/nRFnGr
'''
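To check the logic without the two sample documents, one option is to feed the functions in-memory text. The sketch below assumes Python 3 and uses `io.StringIO` objects in place of open files. `count_words` is a hypothetical variant of `compare_files` that returns the two counts instead of printing them, and the sample sentences are made up.

```python
import io
import string

def build_word_set(input_file):
    # Same idea as the answer's function, written as a set comprehension.
    # Subtracting {""} drops tokens that were only punctuation.
    return {
        w.lower().strip(string.punctuation)
        for line in input_file
        for w in line.split()
    } - {""}

def count_words(file1, file2):
    # Hypothetical variant of compare_files: returns
    # (size of union, size of intersection) so the result can be checked.
    set1 = build_word_set(file1)
    set2 = build_word_set(file2)
    return len(set1 | set2), len(set1 & set2)

# StringIO objects can be iterated line by line, like open files.
doc1 = io.StringIO("We hold these truths.\nThese TRUTHS -- we hold!\n")
doc2 = io.StringIO("Four score... and these truths?\n")

total, common = count_words(doc1, doc2)
print("Total number of unique words between the two files", total)
print("The number of unique words which appear in both files", common)
```

With this sample text the counts are 7 and 2. Note that `strip(string.punctuation)` only trims the ends of a word, so a word such as "don't" keeps its apostrophe. In the real program, opening the files with `with open("document1.txt") as f1, open("document2.txt") as f2:` would close them automatically.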