Question

document1.txt: www.cse.msu.edu/~cse231/Labs/Lab09/document1.txt

document2.txt: www.cse.msu.edu/~cse231/Labs/Lab09/document2.txt

Assignment overview

This lab exercise provides practice with dictionaries in Python.

A. Modify a program that uses dictionaries

Consider the file named "lab08a.py". That file contains the skeleton of a Python program that does a simple analysis of a text file: it displays the number of unique words that appear in the file, along with the number of times each word appears. Case does not matter: the words "pumpkin", "Pumpkin" and "PUMPKIN" should be treated as the same word. (The word "map" is used in identifiers because a dictionary is sometimes called a "map".)

Execute the program (which currently uses "document1.txt" as the data file) and inspect the output.

a. Replace each of the lines labeled "YOUR COMMENT" with meaningful comments that describe the work being done in the next block of statements. Use more than one comment line if necessary.
b. Add docstrings to each function to describe the work being done in the function.
c. The program currently processes the empty string as a word. Revise the program to exclude empty strings from the collection of words.
d. The program currently processes words such as "The" and "the" as different words. Revise the program to ignore case when processing words.
e. The program currently always uses "document1.txt" as the input file. Revise the program to prompt the user for the name of the input file.
f. Revise the program to display the collection of words sorted from greatest to least frequency of occurrence, and alphabetically for words with the same frequency count. Since the sorted function and the sort method are stable sorts, you can first sort the words alphabetically and then sort them by frequency (with reverse=True). You do the two sorts in that order because the primary key must be sorted last, and here frequency is the primary key. By default, a list of tuples is sorted on the first item of each tuple (later items only break ties). To sort on other items, use itemgetter from the operator module. See the documentation at https://docs.python.org/3/howto/sorting.html and focus on the student_tuples example.
g. Test the revised program. There are two sample documents available: "document1.txt" (the Declaration of Independence) and "document2.txt" (the Gettysburg Address).
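Part f's two-pass idea can be checked on a small list before wiring it into the program. This is a minimal sketch on made-up (word, count) pairs (the data is hypothetical, not taken from either document): the first sort orders by word, the second by count, and because sorted() is stable (and stays stable with reverse=True), words with equal counts keep their alphabetical order.

```python
from operator import itemgetter

# Hypothetical sample data: (word, count) pairs like the ones display_map builds
pairs = [("the", 3), ("and", 5), ("of", 3), ("we", 1), ("a", 5)]

# Pass 1: sort on the secondary key (the word), A to Z
by_word = sorted(pairs, key=itemgetter(0))

# Pass 2: sort on the primary key (the count), highest first.
# sorted() is stable, so equal counts keep the alphabetical order from pass 1.
by_freq = sorted(by_word, key=itemgetter(1), reverse=True)

# Equivalent single pass: negate the count so "highest first" becomes ascending
one_pass = sorted(pairs, key=lambda pair: (-pair[1], pair[0]))

for word, count in by_freq:
    print("{:15s}{:>5d}".format(word, count))
```

Here by_freq comes out as a, and (count 5), then of, the (count 3), then we. The single-pass key gives the same order; the two-pass version is the one the assignment asks for, and it generalizes to keys that cannot simply be negated.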

Explanation / Answer

lab08b.py

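def build_map(in_file, scores_dic):
    '''
    Add the scores in in_file to scores_dic, keyed by name. The first line
    of the file is treated as a header and skipped.
    Returns: scores_dic
    '''
    for line_counter, line in enumerate(in_file):
        line_list = line.split()
        # Skip the header line and any blank or incomplete lines
        if line_counter == 0 or len(line_list) < 2:
            continue
        name = line_list[0]
        # Start a new name at zero, then add this line's score to its total
        if name not in scores_dic:
            scores_dic[name] = 0
        scores_dic[name] += int(line_list[1])
    return scores_dic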

def dic_sort(scores_dic):
    '''Return a list of (name, total) tuples sorted alphabetically by name.'''
    # items() yields (name, total) pairs; tuples sort on their first item,
    # and each name appears only once because it is a dictionary key
    return sorted(scores_dic.items())

def display_dic(sorted_list):
    '''Print each name and its total score in two left-aligned columns.'''
    print("{:<10}{:<6}".format("Name", "Total"))
    for item in sorted_list:
        print("{:<10}{:<6}".format(item[0], item[1]))

scores_dic = {}
fp1 = open(input("Enter a file name: "), "r")
fp2 = open(input("Enter another file name: "), "r")

# Combine the scores from both files into one dictionary
scores_dic = build_map(fp1, scores_dic)
scores_dic = build_map(fp2, scores_dic)
fp1.close()
fp2.close()

sorted_list = dic_sort(scores_dic)

display_dic(sorted_list)
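To see lab08b's accumulate-across-two-files pattern without creating files, here is a minimal sketch that uses io.StringIO objects as stand-ins. It assumes the file layout implied by build_map above (one header line, then lines of the form "name score"); the names, scores and the helper build_totals are hypothetical. It also swaps in collections.defaultdict(int) in place of the explicit "if name not in scores_dic" check, which is a stylistic alternative rather than what the lab requires.

```python
import io
from collections import defaultdict


def build_totals(in_file, totals):
    """Hypothetical helper: add per-name scores from in_file into totals.

    Assumes a header line followed by 'name score' lines; blank or short
    lines are skipped. totals is a defaultdict(int), so a new name starts at 0.
    """
    for line_number, line in enumerate(in_file):
        fields = line.split()
        if line_number == 0 or len(fields) < 2:
            continue
        totals[fields[0]] += int(fields[1])
    return totals


# Hypothetical in-memory stand-ins for the two score files
file1 = io.StringIO("Name Score\njoe 10\nbill 8\n")
file2 = io.StringIO("Name Score\nbill 5\n\nsue 7\n")

totals = defaultdict(int)
build_totals(file1, totals)
build_totals(file2, totals)

# Same output layout as display_dic, sorted alphabetically by name
print("{:<10}{:<6}".format("Name", "Total"))
for name, total in sorted(totals.items()):
    print("{:<10}{:<6}".format(name, total))
```

Because bill appears in both stand-in files, the two scores are added into a single total of 13, which is the behavior the two build_map calls in lab08b rely on.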


lab08a.py


import string
from operator import itemgetter


def add_word( word_map, word ):
    '''
    Add one occurrence of word to word_map, first creating an entry with a
    count of zero if the word has not been seen before.
    Parameters: word_map (dictionary mapping each word to its count), word
    (the normalized word to count)
    Returns: None (word_map is updated in place)
    '''
    # If the word is not yet a key in the dictionary, add it with a count of zero
    if word not in word_map:
        word_map[ word ] = 0

    # Increment the word's count by one
    word_map[ word ] += 1


def build_map( in_file, word_map ):
    '''
    Read in_file line by line, split each line into words, normalize each
    word, and count it in word_map with add_word.
    Parameters: in_file (open file object), word_map (dictionary mapping
    each word to its count)
    Returns: None (word_map is updated in place)
    '''
    for line in in_file:

        # Split each line into a list of whitespace-separated words
        word_list = line.split()

        for word in word_list:

            # Skip any token that contains a hyphen (for example "--" used
            # as a dash between words)
            if "-" not in word:
                # Strip surrounding whitespace and punctuation, then convert
                # to lowercase so "The" and "the" count as the same word
                word = word.strip().strip(string.punctuation).lower()

                # Exclude empty strings (tokens that were only punctuation)
                if word != "":
                    add_word( word_map, word )


def display_map( word_map ):
    '''
    Print the number of unique words, then each word and its count, sorted
    from most to least frequent and alphabetically among equal counts.
    Parameters: word_map (dictionary mapping each word to its count)
    Returns: None
    '''
    # Build a list of (word, count) tuples from the dictionary
    word_list = list( word_map.items() )

    # Sort alphabetically first (secondary key), then by count, highest first
    # (primary key); sorted() is stable, so equal counts stay alphabetical
    temp_word_list = sorted( word_list, key=itemgetter(0) )
    freq_list = sorted( temp_word_list, key=itemgetter(1), reverse=True )

    print( "Number of unique words:", len( word_map ) )
    print( "{:15s}{:>5s}".format( "Word", "Count" ) )
    print( "-"*20 )
    for item in freq_list:
        print( "{:15s}{:>5d}".format( item[0], item[1] ) )


def open_file(filename):
    '''
    Try to open filename for reading. If the file does not exist, print an
    error message and prompt for another file name until one opens.
    Parameters: filename (str)
    Returns: the open file object
    '''
    while True:
        try:
            in_file = open( filename, "r" )
            return in_file
        except FileNotFoundError:
            print( " *** unable to open file *** " )
            filename = input("Input a file name: ")

word_map = dict()
in_file = open_file(input("Input a file name: "))

build_map( in_file, word_map )
display_map( word_map )
in_file.close()
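Parts c and d (dropping empty strings and ignoring case) can be checked on a short in-memory text before running lab08a on document1.txt. This is a minimal sketch: the sample sentence and the helper count_words are hypothetical, io.StringIO stands in for an open file, and dict.get replaces add_word's explicit membership test. Unlike lab08a's build_map, it does not skip hyphenated words; a pure-punctuation token such as "--" simply strips down to an empty string and is excluded.

```python
import io
import string


def count_words(in_file):
    """Hypothetical helper: map lower-cased, punctuation-stripped words to counts."""
    word_map = {}
    for line in in_file:
        for word in line.split():
            # Remove surrounding punctuation and fold case (part d)
            word = word.strip(string.punctuation).lower()
            # Skip tokens that were only punctuation, e.g. "--" (part c)
            if word:
                word_map[word] = word_map.get(word, 0) + 1
    return word_map


# Hypothetical sample text exercising case and punctuation handling
sample = io.StringIO("The people, the PEOPLE -- and The Union.\n")
counts = count_words(sample)
print("Number of unique words:", len(counts))
```

For this sample, "The", "the" and "The" collapse into one entry with a count of 3, "people," and "PEOPLE" into one entry with a count of 2, and no empty-string key appears.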