


Question

document1.txt:https://www.cse.msu.edu/~cse231/Labs/Lab10/document1.txt

document2.txt:https://www.cse.msu.edu/~cse231/Labs/Lab10/document2.txt

lab10a.py:https://www.cse.msu.edu/~cse231/Labs/Lab10/lab10a.py

lab10b.py:https://www.cse.msu.edu/~cse231/Labs/Lab10/lab10b.py

This lab exercise provides practice with sets in Python. It has similarities to the previous lab on dictionaries, but this time with sets.

You will work with a partner on this exercise during your lab session. Two people should work at one computer. Occasionally switch the person who is typing. Talk to each other about what you are doing and why so that both of you understand each step.

Programming with Sets

Consider the file named "lab10a.py". That file contains the skeleton of a Python program to do a simple analysis of two files: it will display the number of unique words which appear in the two files (the union of those two sets of words), as well as the number of unique words which are common to both files (the intersection of those two sets of words).

Case does not matter: the words "pumpkin", "Pumpkin" and "PUMPKIN" should be treated as the same word. Only unique words should be counted: if a word appears more than once in a file, it should only be counted once. Note: remember to remove punctuation from words, e.g. "it," should be "it".

Replace the comments labeled "YOUR COMMENT" in function "build_word_set" with meaningful comments to describe the work being done in the next statement. Use more than one comment line, if necessary.

Revise function "compare_files" to accomplish the work described in the comments.

Test the revised program. There are two sample documents available: "document1.txt" (The Declaration of Independence) and "document2.txt" (The Gettysburg Address).

Demonstrate your completed program to your TA. On-line students should submit the completed program (named "lab10a.py") for grading via the Mirmir system.

Programming with Dictionaries and Sets

Consider the file named "lab10b.py". That file contains the skeleton of a Python program to display information about the words in a document.

Function "main" is complete. It handles the interaction with the user and calls other functions to perform the appropriate tasks.

Function "print_word_index" is complete. It receives a dictionary, where each element is a word and a set of line numbers where that word appears in a document. It displays all of the words (in alphabetic order), along with the line numbers for each word (in ascending order).

Function "build_word_index" is incomplete. It receives an input file and builds a dictionary containing the unique words which appear in the input file, along with the line numbers where each word appears. The first line of the input file should be considered to be line 1.

Revise function "build_word_index" to accomplish the specified work. You may wish to review function "build_word_set" (above) for ideas about how to handle upper and lower case letters, as well as punctuation.

Test the revised program using the sample documents.

Demonstrate your completed program to your TA. On-line students should submit the completed program (named "lab10b.py") for grading via the Mirmir system.

Explanation / Answer

Part A Code:
#!/usr/local/bin/python3

import string

def build_word_set(input_file):
    word_set = set()
    for line in input_file:
        # Strip the trailing newline and split the line into words on whitespace
        word_lst = line.strip().split()
        # Convert each word to lower case and strip leading/trailing
        # punctuation (e.g. "," or ".") so "It," and "it" compare equal
        word_lst = [w.lower().strip(string.punctuation) for w in word_lst]
        for word in word_lst:
            if word != "":
                # Add the word to the set; a set ignores duplicates
                word_set.add(word)

    return word_set

def compare_files(file1, file2):
    # Build two sets:
    #   all of the unique words in file1
    #   all of the unique words in file2
    file1_words = build_word_set(file1)
    file2_words = build_word_set(file2)

    # Union of the two sets: every word that appears in either file
    union_of_words = file1_words | file2_words
    unique_word_count = len(union_of_words)

    # Display the total number of unique words between the
    # two files. If a word appears in both files, it should
    # only be counted once.
    print("Total unique words:", unique_word_count)

    # Intersection of the two sets: only the words that appear in both files
    intersect_of_words = file1_words & file2_words
    unique_word_in_both_count = len(intersect_of_words)

    # Display the number of unique words which appear in both
    # files. A word should only be counted if it is present in
    # both files.
    print("Unique words that appear in both files:", unique_word_in_both_count)

def main():
    f1 = open('document1.txt')
    f2 = open('document2.txt')
    compare_files(f1, f2)
    f1.close()
    f2.close()

if __name__ == '__main__':
    main()


Execution and output:
Unix Terminal> python3 lab10a.py
Total unique words: 606
Unique words that appear in both files: 61
Unix Terminal>
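
The normalization and set operations in Part A can be checked on small in-memory inputs before running against the sample documents. The sketch below is only an illustration, not the lab's reference solution: `words_in` is a hypothetical helper name, the sample sentences are made up, and `io.StringIO` stands in for the opened files.

```python
import io
import string

def words_in(lines):
    """Return the set of lower-cased, punctuation-stripped words in an iterable of lines."""
    found = set()
    for line in lines:
        for raw in line.split():
            word = raw.lower().strip(string.punctuation)
            if word:  # skip tokens that consisted only of punctuation, such as "--"
                found.add(word)
    return found

# Hypothetical sample text; StringIO objects iterate line by line like files.
first = io.StringIO("Pumpkin pie, pumpkin soup.\nIt, too, is good!\n")
second = io.StringIO("PUMPKIN bread -- it is warm.\n")

a = words_in(first)
b = words_in(second)

print("Total unique words:", len(a | b))                      # union
print("Unique words that appear in both files:", len(a & b))  # intersection
```

With this sample input the union holds 9 words and the intersection holds 3 ("pumpkin", "it" and "is"). Note that `strip(string.punctuation)` only trims punctuation at the ends of a token, so an interior apostrophe or hyphen is left in place.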

Part B Code:
#!/usr/local/bin/python3

import string

def build_word_index(input_file):
    word_map = {}
    line_no = 0
    # Iterate through each line in the input file
    for line in input_file:
        # Strip the trailing newline and split the line into words on whitespace
        word_lst = line.strip().split()
        # Convert words to lower case and strip leading/trailing punctuation
        word_lst = [w.lower().strip(string.punctuation) for w in word_lst]
        # The first line of the file is line 1
        line_no += 1
        # Iterate through each word in the line
        for word in word_lst:
            if word != "":
                # Record the line number where the word was seen.
                # Note: a list keeps repeats, so a word that occurs
                # twice on one line is recorded twice.
                if word in word_map:
                    word_map[word].append(line_no)
                else:
                    word_map[word] = [line_no]
    return word_map

def print_word_index(word_map):
    index_lst = sorted(list(word_map.items()))

    for word, line_set in index_lst:
        line_lst = sorted(list(line_set))
        line_str = str(line_lst[0])
        for line_no in line_lst[1:]:
            line_str += ", {}".format(line_no)
        print("{:14s}: {}".format(word, line_str))

        ## Alternative way to create the line_str
        ## line_str = ", ".join([str(i) for i in line_lst])

def main():
    # input() replaces the Python 2 raw_input() to match the python3 shebang
    filename = input("Name of file to be processed: ")
    print(filename)
    try:
        file = open(filename, "r")
        index = build_word_index(file)
        print_word_index(index)
        file.close()
    except IOError:
        print("Halting -- unable to open", filename)

if __name__ == "__main__":
    main()
  


Execution and output:
Unix Terminal> python lab10b.py
Name of file to be processed: document1.txt
document1.txt
a : 2, 5, 7, 8, 8, 12, 24
above : 14
add : 15
advanced : 19
ago : 1
all : 3
altogether : 10
and : 1, 2, 6, 10, 14, 24
any : 6
are : 3, 5, 7
as : 8
battlefield : 7
be : 17, 19
before : 20
birth : 24
brave : 13
brought : 1
but : 12, 16
by : 25
can : 6, 12, 12, 13, 16
cause : 21
civil : 5
come : 8
conceived : 2, 6
consecrate : 13
consecrated : 14
continent : 2
created : 3
dead : 14, 20, 22
dedicate : 8, 12
dedicated : 2, 6, 17, 19
detract : 15
devotion : 21, 22
did : 16
died : 23
do : 10
earth : 25
endure : 7
engaged : 5
equal : 3
far : 14, 18
fathers : 1
field : 8
final : 8
fitting : 10
for : 9, 17, 19, 21, 25
forget : 16
forth : 1
fought : 18
four : 1
freedom : 24
from : 20, 25
full : 21
gave : 9, 21
god : 23
government : 24
great : 5, 7, 19
ground : 13
hallow : 13
have : 7, 14, 18, 23, 24
here : 9, 14, 16, 17, 17, 18, 19, 22
highly : 22
honored : 20
in : 2, 5, 12, 23
increased : 21
is : 10, 17, 19
it : 10, 14, 16, 17, 19
larger : 12
last : 21
liberty : 2
little : 15
live : 10
lives : 9
living : 13, 17
long : 6, 15
measure : 22
men : 3, 13
met : 7
might : 9
nation : 2, 6, 6, 9, 23
never : 16
new : 2, 24
nobly : 18
nor : 15
not : 12, 12, 13, 23, 25
note : 15
now : 5
of : 7, 8, 22, 24, 24
on : 1, 7
or : 6, 15
our : 1, 14
people : 24, 25, 25
perish : 25
place : 8
poor : 15
portion : 8
power : 15
proper : 10
proposition : 3
rather : 17, 19
remaining : 20
remember : 16
resolve : 22
resting : 8
say : 16
score : 1
sense : 12
seven : 1
shall : 23, 23, 25
should : 10
so : 6, 6, 18
struggled : 14
take : 20
task : 20
testing : 5
that : 3, 5, 7, 8, 9, 9, 10, 20, 21, 22, 22, 23, 24
the : 2, 13, 15, 17, 17, 19, 21, 24, 25, 25, 25
their : 9
these : 20, 22
they : 16, 18, 21
this : 1, 10, 13, 23
those : 9
thus : 18
to : 2, 8, 15, 17, 17, 19, 19, 21
under : 23
unfinished : 18
us : 17, 19, 20
vain : 23
war : 5, 7
we : 5, 7, 7, 10, 12, 12, 13, 16, 20, 22
what : 16, 16
whether : 5
which : 18, 21
who : 9, 14, 18
will : 15
work : 18
world : 15
years : 1
Unix Terminal>
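
The index above lists some line numbers more than once (for example "a : 2, 5, 7, 8, 8, 12, 24"), because the Part B code stores line numbers in a list. The lab text describes each dictionary value as a set of line numbers, so one possible variant is sketched below. It is an assumption about the intended behavior, not the course's reference solution: `build_word_index_sets` is a hypothetical name, and the two sample lines are made up for the demonstration.

```python
import io
import string

def build_word_index_sets(input_file):
    """Map each word to the set of line numbers on which it appears."""
    word_map = {}
    # enumerate(..., start=1) makes the first line of the file line 1
    for line_no, line in enumerate(input_file, start=1):
        for raw in line.split():
            word = raw.lower().strip(string.punctuation)
            if word:
                # setdefault creates an empty set the first time a word is seen;
                # adding the same line number again has no effect on a set
                word_map.setdefault(word, set()).add(line_no)
    return word_map

# Hypothetical two-line input; "we", "can" and "not" repeat within line 1.
sample = io.StringIO(
    "We can not dedicate, we can not consecrate.\n"
    "We can not hallow this ground.\n"
)
index = build_word_index_sets(sample)

for word in sorted(index):
    lines = ", ".join(str(n) for n in sorted(index[word]))
    print("{:14s}: {}".format(word, lines))
```

Because "print_word_index" already calls sorted(list(line_set)) on each value, it should accept a dictionary of sets without modification; each line number is then printed at most once per word.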