Question

In this problem you will write a function to parse a Shakespearean text (a short monologue) and glean information from the text file. There are a few parts to this problem; you will be required to print:

1. The number of words in the document.
2. The number of different (unique) words in the document.
3. The total number of words that contain apostrophes.
4. A list of the words in the document that have a frequency of 5 or greater, each paired with its frequency. Any words that contain apostrophes must keep them if they appear in the list! The list should be sorted in descending order by frequency (see the example below).

Additionally, words like "Mother" and "mother" should be only one key in your dictionary (use lower()). Note that words joined by a hyphen or dash are counted as separate words. For example, "five-year-old" is considered to contain three separate words: "five", "year", and "old".

You can assume that the file being mined (fname) exists in the same directory as the program. Your function should be named:

def mine_file(fname):

where fname is the file name (such as "shakespeare.txt"). You will need to use file I/O methods such as open(), close(), and readline() to accomplish this problem.

Here is a sample output of the program for the file macbeth.txt:

Word Count: 205
Unique Word Count: 127
Apostrophe Word Count: 8
to: 10
the: 8
of: 8
and: 7
my: 5
be: 5

We have provided you with one Shakespearean text file to use when writing this problem (macbeth.txt).
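The word-splitting rules above (lowercase everything, split on hyphens and dashes, keep apostrophes inside words) can also be expressed with a single regular expression. This is a minimal sketch, not the assignment's required method: it assumes a "word" is a run of ASCII letters and apostrophes containing at least one letter, so digits and non-ASCII letters are simply dropped. The names `tokenize` and `WORD_RE` are hypothetical.

```python
import re

# Hypothetical pattern: a word is a run of lowercase ASCII letters and
# apostrophes that contains at least one letter. Hyphens and dashes are not
# in the pattern, so they act as separators.
WORD_RE = re.compile(r"[a-z']*[a-z][a-z']*")


def tokenize(text):
    """Lowercase the text and return its words (hypothetical helper)."""
    return WORD_RE.findall(text.lower())


sample = "The five-year-old Mother's mother, and the mother-in-law; don't!"
words = tokenize(sample)
print(len(words))                    # 12 words
print(len(set(words)))               # 10 unique words
print(sum("'" in w for w in words))  # 2 words with an apostrophe
```

Because "Mother's" and "mother" are lowercased before matching, "mother" is counted as one key, while "mother's" keeps its apostrophe and stays a separate word.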

Explanation / Answer

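# remove_punctuation is a helper function. It replaces hyphens and dashes with
# spaces (so "five-year-old" becomes three words), keeps apostrophes, drops the
# other punctuation, and counts the words on the line that contain an apostrophe.

def remove_punctuation(line):
    punctuations = ["!", "(", ")", "[", "]", "{", "}", ";", ":", '"', "/", ",",
                    "<", ">", ".", "\\", "?", "@", "#", "$", "%", "^", "&",
                    "*", "~", "|", "+"]
    dashes = ["-", "\u2013", "\u2014"]  # hyphen, en dash, em dash

    no_punct = ""
    for char in line:
        if char in dashes:
            no_punct = no_punct + " "
        elif char not in punctuations:
            no_punct = no_punct + char

    # Count words, not apostrophe characters, so a word with two apostrophes
    # is still counted once
    ap_count = sum(1 for word in no_punct.split() if "'" in word)
    return no_punct, ap_count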

def mine_file(fname):
    ap_count = 0
    all_lines = []
    with open(fname) as f:
        for each_line in f:
            # strip() returns a new string, so the result must be reassigned
            each_line = each_line.strip()
            each_line, count = remove_punctuation(each_line)
            ap_count += count
            if len(each_line) > 0:
                all_lines.append(each_line)

    all_words = []
    for line in all_lines:
        # split() with no argument splits on any run of whitespace and never
        # produces empty strings, unlike split(" ")
        for each_word in line.split():
            all_words.append(each_word.lower())

    total_no_words = len(all_words)
    print("Word Count: {}".format(total_no_words))

    # Map each lowercase word to its number of occurrences
    word_dict = dict()
    for word in all_words:
        if word not in word_dict:
            word_dict[word] = 1
        else:
            word_dict[word] += 1

    print("Unique Word Count: {}".format(len(word_dict)))
    print("Apostrophe Word Count: {}".format(ap_count))

    # Keep only the words that occur 5 or more times
    freq5 = [(word, count) for word, count in word_dict.items() if count >= 5]

    # Sort by frequency, highest first; ties come out in reverse alphabetical
    # order, which matches the sample output ("the" before "of", "my" before "be")
    final = sorted(freq5, key=lambda x: (x[1], x[0]), reverse=True)
    for word, count in final:
        print("{}: {}".format(word, count))
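The counting half of mine_file can also be written with collections.Counter from the standard library in place of the hand-built dictionary. This is a minimal sketch, assuming the words have already been split and lowercased by whatever tokenizer you use; `report` and its `min_freq` parameter are hypothetical names, and the function returns the lines instead of printing them so it is easy to check.

```python
from collections import Counter


def report(words, min_freq=5):
    """Build the report lines from an already-tokenized, lowercased word list.

    `report` is a hypothetical helper, not part of the assignment.
    """
    counts = Counter(words)  # word -> number of occurrences
    lines = [
        "Word Count: {}".format(len(words)),
        "Unique Word Count: {}".format(len(counts)),
        # Occurrences of words that contain an apostrophe
        "Apostrophe Word Count: {}".format(sum("'" in w for w in words)),
    ]
    frequent = [(w, c) for w, c in counts.items() if c >= min_freq]
    # Descending by count; ties fall back to reverse alphabetical order,
    # which is what the sample output shows ("the" before "of").
    for word, count in sorted(frequent, key=lambda wc: (wc[1], wc[0]), reverse=True):
        lines.append("{}: {}".format(word, count))
    return lines


words = ["to", "be", "or", "not", "to", "be", "that's", "the", "question", "'tis", "to", "be"]
for line in report(words, min_freq=2):
    print(line)
# Word Count: 12
# Unique Word Count: 8
# Apostrophe Word Count: 2
# to: 3
# be: 3
```

Counter.most_common() is not used for the final list because it orders equal counts by first occurrence rather than alphabetically, so an explicit sorted() call is needed to reproduce the tie order in the sample output.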
