Question
In Python 3
The links to the files needed are at https://www.cse.msu.edu/~cse231/Labs/Lab10/
Thank you!
Consider the file named "lab10a.py". That file contains the skeleton of a Python program to do a simple analysis of two files: it will display the number of unique words which appear in the two files (the union of those two sets of words), as well as the number of unique words which are common to both files (the intersection of those two sets of words).

Case does not matter: the words "pumpkin", "Pumpkin" and "PUMPKIN" should be treated as the same word. Only unique words should be counted: if a word appears more than once in a file, it should only be counted once. Note: remember to remove punctuation from words, e.g., "it," should be "it".

a. Replace the comments labeled "YOUR COMMENT" in function "build_word_set" with meaningful comments to describe the work being done in the next statement. Use more than one comment line, if necessary.
b. Revise function "compare_files" to accomplish the work described in the comments.
c. Test the revised program. There are two sample documents available: "document1.txt" (The Declaration of Independence) and "document2.txt" (The Gettysburg Address).

Demonstrate your completed program to your TA. On-line students should submit the completed program (named "lab10a.py") for grading via the Mimir system.

Part B: Programming with Dictionaries and Sets

Consider the file named "lab10b.py". That file contains the skeleton of a Python program to display information about the words in a document.

Function "main" is complete. It handles the interaction with the user and calls other functions to perform the appropriate tasks.

Function "print_word_index" is complete. It receives a dictionary, where each element is a word and a set of line numbers where that word appears in a document. It displays all of the words (in alphabetic order), along with the line numbers for each word (in ascending order).

Explanation / Answer
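The normalization rules from the lab description (case does not matter, punctuation is removed) can be tried in isolation before touching the lab files. This is only a minimal sketch: the helper name `normalize` and the sample list are made up for illustration and are not part of the lab skeleton.

```python
import string

def normalize(word):
    """Lower-case a word and trim punctuation from both ends.

    Hypothetical helper for illustration; not part of lab10a.py.
    """
    return word.lower().strip(string.punctuation)

# Sample words taken from the lab description
samples = ["pumpkin", "Pumpkin", "PUMPKIN", "it,"]

# A set keeps each distinct value once, so the three spellings
# of "pumpkin" collapse into a single entry
unique = {normalize(w) for w in samples}
print(sorted(unique))
```

Note that `str.strip` only trims characters at the two ends of the string, so punctuation inside a word (for example the apostrophe in "don't") is left alone.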
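lab 10a
import string

def build_word_set( input_file ):

    word_set = set()

    for line in input_file:

        # Remove leading and trailing whitespace from the line, then
        # split it into a list of whitespace-separated words
        word_lst = line.strip().split()

        # Convert each word to lower case and strip punctuation from
        # both ends, so that "It," and "it" count as the same word
        word_lst = [w.lower().strip(string.punctuation) for w in word_lst]

        for word in word_lst:
            if word != "":
                # Add the non-empty word to the set; a set stores each
                # distinct value only once, so repeats are ignored
                word_set.add( word )

    return word_set
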
def compare_files( file1, file2 ):

    # Build two sets:
    #   all of the unique words in file1
    #   all of the unique words in file2
    word_set1 = build_word_set( file1 )
    word_set2 = build_word_set( file2 )

    # Display the total number of unique words between the
    # two files.  If a word appears in both files, it should
    # only be counted once.
    unique_word_count = len( word_set1 | word_set2 )
    print("Total unique words:", unique_word_count)

    # Display the number of unique words which appear in both
    # files.  A word should only be counted if it is present in
    # both files.
    unique_word_in_both_count = len( word_set1 & word_set2 )
    print("Unique words that appear in both files:", unique_word_in_both_count)

######################################################################
f1 = open( "document1.txt" )
f2 = open( "document2.txt" )
compare_files( f1, f2 )
f1.close()
f2.close()
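Because "compare_files" works on open file objects, the union and intersection counts can be checked on tiny in-memory documents instead of the two sample files. The sketch below assumes `io.StringIO` objects as stand-ins for files; the helper name `words_in` and the sample sentences are hypothetical and chosen only so the counts are easy to verify by hand.

```python
import io
import string

def words_in(stream):
    """Return the set of normalized words in a text stream.

    Hypothetical helper for illustration; it follows the same idea as
    build_word_set but is defined here so the example is self-contained.
    """
    words = set()
    for line in stream:
        for raw in line.split():
            word = raw.lower().strip(string.punctuation)
            if word:
                words.add(word)
    return words

# In-memory stand-ins for two small text files (made-up content)
doc_a = io.StringIO("The cat sat.\nThe cat ran!\n")
doc_b = io.StringIO("A dog sat; a cat slept.\n")

set_a = words_in(doc_a)   # the, cat, sat, ran
set_b = words_in(doc_b)   # a, dog, sat, cat, slept

union_count = len(set_a | set_b)    # words in either document
common_count = len(set_a & set_b)   # words in both documents

print("Total unique words:", union_count)
print("Unique words that appear in both files:", common_count)
```

Here the union has 7 words and the intersection has 2 ("cat" and "sat"), which is small enough to confirm by counting.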
lab 10b
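import string

def build_word_index( input_file ):

    word_map = {}
    line_no = 0

    for line in input_file:
        # Assumed completion of the "Missing code" step: count the line,
        # normalize each word, and record the line number in that word's set
        line_no += 1
        for word in line.strip().split():
            word = word.lower().strip(string.punctuation)
            if word != "":
                word_map.setdefault( word, set() ).add( line_no )

    return word_map
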
def print_word_index( word_map ):

    index_lst = sorted(list(word_map.items()))

    for word, line_set in index_lst:
        line_lst = sorted(list(line_set))
        line_str = str( line_lst[0] )
        for line_no in line_lst[1:]:
            line_str += ", {}".format( line_no )
        print("{:14s}:".format(word), line_str )

        ## Alternative way to create the line_str
        ## line_str = ", ".join([str(i) for i in line_lst])

def main():

    filename = input( "Name of file to be processed: " )

    try:
        file = open( filename, "r" )
        index = build_word_index( file )
        print_word_index( index )
        file.close()

    except IOError:
        print( "Halting -- unable to open", filename )

main()
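The dictionary described in Part B (each word mapped to the set of line numbers where it appears) can be illustrated on a two-line in-memory document. This is a sketch under stated assumptions, not the lab's official solution: it assumes line numbers start at 1 and that words are lower-cased and stripped of punctuation as in Part A. The function name `index_words` and the sample text are hypothetical.

```python
import io
import string

def index_words(stream):
    """Map each normalized word to the set of line numbers containing it.

    Hypothetical helper for illustration; assumes 1-based line numbers.
    """
    index = {}
    for line_no, line in enumerate(stream, start=1):
        for raw in line.split():
            word = raw.lower().strip(string.punctuation)
            if word:
                # setdefault creates an empty set the first time a word is seen
                index.setdefault(word, set()).add(line_no)
    return index

# In-memory stand-in for a small text file (made-up content)
text = io.StringIO("Four score and seven\nyears ago, and four more.\n")
index = index_words(text)

# Words in alphabetic order, line numbers in ascending order
for word in sorted(index):
    print(word, sorted(index[word]))
```

Using a set for the line numbers means a word that occurs twice on the same line is still recorded once for that line.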