In this problem you will write a function to parse a Shakespearean text (a short


Question

In this problem you will write a function to parse a Shakespearean text (a short monologue) and glean information from the text file. There are a few parts to this problem; you will be required to print:

1. The number of words in the document.
2. The number of different (unique) words in the document.
3. The total number of words that contain apostrophes.
4. A list of words in the document that have a frequency of 5 or greater, paired with their corresponding frequency.

Any words that contain apostrophes must keep them if they appear in the list! The list should be sorted in descending order by frequency (see example below). Additionally, words like "Mother" and "mother" should be only one key in your dictionary (use lower()). Note that words paired by a hyphen or dash are counted as separate words. For example, "five-year-old" is considered to contain three separate words: "five", "year", and "old". You can assume that the file being mined (fname) exists in the same repository as the program.

Your function should be named:

def mine_file(fname):

where fname is the file name (such as "shakespeare.txt"). You will need to use file I/O methods such as open(), close(), and readline() to accomplish this problem.

Here is a sample output of the program for the file macbeth.txt:

Word Count: 205
Unique Word Count: 127
Apostrophe Word Count: 8
to: 10
the: 8
of: 8
and: 7
my: 5
be: 5

We have provided you with one Shakespearean text file to use when writing this problem (macbeth.txt).

Explanation / Answer

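# remove_punctuation is a helper function: it strips punctuation from one
# line, turns hyphens into spaces, and counts the words that keep an apostrophe.

def remove_punctuation(line):
    # Characters to drop. Whitespace is deliberately not listed here, so
    # word boundaries survive for the later split.
    punctuations = "!()[]{};:\"/,<>.\\?@#$%^&*~|+"

    no_punct = ""
    for char in line:
        if char == '-':
            # "five-year-old" becomes three separate words
            no_punct = no_punct + " "
        elif char not in punctuations:
            # letters, digits, whitespace, and apostrophes are kept
            no_punct = no_punct + char

    # Count words containing an apostrophe, not individual apostrophe characters
    ap_count = sum(1 for word in no_punct.split() if "'" in word)

    return no_punct, ap_count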
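
As a cross-check on the helper above, the same tokenizing rules (lowercase everything, split on hyphens, keep apostrophes) can be sketched with a regular expression. The name `tokenize` is hypothetical and not part of the assignment. The sketch also assumes apostrophes are the plain ASCII `'` character; a file using curly apostrophes would need extra handling.

```python
import re

# Hypothetical helper, not part of the original answer: a word is a run of
# letters, digits, or ASCII apostrophes; everything else (spaces, hyphens,
# punctuation) acts as a separator.
WORD_RE = re.compile(r"[a-z0-9']+")

def tokenize(line):
    """Return the lowercase words found in one line of text."""
    words = WORD_RE.findall(line.lower())
    # Drop tokens made only of apostrophes, such as a stray quote mark.
    return [w for w in words if w.strip("'")]

print(tokenize("My five-year-old Mother's tale, told by an idiot!"))
```

Running it on a short test line like this makes it easy to compare the word list by eye against what `remove_punctuation` followed by `split()` produces.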

def mine_file(fname):
    ap_count = 0
    all_words = []
    with open(fname) as f:
        for each_line in f.readlines():
            # the helper returns the cleaned line and its apostrophe count
            each_line, count = remove_punctuation(each_line)
            ap_count += count
            # split() with no argument ignores repeated spaces and the
            # trailing newline, so no empty "words" are counted
            for each_word in each_line.split():
                all_words.append(each_word.lower())

    total_no_words = len(all_words)
    print("Word Count: {}".format(total_no_words))

    # Map each lowercase word to the number of times it appears
    word_dict = dict()
    for word in all_words:
        if word not in word_dict:
            word_dict[word] = 1
        else:
            word_dict[word] += 1

    no_different_unique_words = len(word_dict)
    print("Unique Word Count: {}".format(no_different_unique_words))

    no_words_having_apostrophe = ap_count
    print("Apostrophe Word Count: {}".format(no_words_having_apostrophe))

    # Keep only words with a frequency of 5 or greater
    freq5 = [(i, j) for i, j in word_dict.items() if j >= 5]

    # Sort by frequency, then by word, both descending (matches the sample output)
    final = sorted(freq5, key=lambda x: (x[1], x[0]), reverse=True)
    for x in final:
        print("{}: {}".format(x[0], x[1]))
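
For comparison, the counting and sorting steps can also be sketched with `collections.Counter`. This is an alternative under stated assumptions, not the approach the assignment requires. `summarize` is a hypothetical name, and it takes an already-tokenized list of lowercase words rather than a file name. It counts every occurrence of a word containing an apostrophe. Ties are broken in reverse alphabetical order to match the sample output (`the` before `of`).

```python
from collections import Counter

def summarize(words, min_freq=5):
    """Return (word count, unique count, apostrophe word count, frequent pairs)."""
    counts = Counter(words)
    # Every occurrence of a word containing an apostrophe is counted.
    apostrophe_words = sum(1 for w in words if "'" in w)
    frequent = [(w, n) for w, n in counts.items() if n >= min_freq]
    # Sort by frequency, then by word, both descending.
    frequent.sort(key=lambda pair: (pair[1], pair[0]), reverse=True)
    return len(words), len(counts), apostrophe_words, frequent

# Made-up word list for illustration only; it is not taken from macbeth.txt.
sample = ["to"] * 6 + ["be"] * 5 + ["or", "not"] + ["'tis"] * 2
total, unique, apostrophes, frequent = summarize(sample)
print("Word Count: {}".format(total))
print("Unique Word Count: {}".format(unique))
print("Apostrophe Word Count: {}".format(apostrophes))
for word, n in frequent:
    print("{}: {}".format(word, n))
```

Separating the counting from the file reading this way also makes the logic easy to test on small hand-built lists before running it on a real text file.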
