


Question

(a) (4 marks) Complete the Python code below so that it does the following, in this order:
1. (1 mark) Use NLTK to split the text into sentences.
2. (1 mark) Use NLTK to split each sentence into words.
3. (1 mark) Use NLTK to find the part of speech of each word.
4. (1 mark) Use Python's Counter (from the collections module) to find and print the most common part of speech.

Your code does not need to convert the words to lowercase or uppercase. Do not use gutenberg.words() or gutenberg.sents() in your solution.

import nltk
emma = nltk.corpus.gutenberg.raw('austen-emma.txt')
# Use NLTK to split the text into sentences (1 mark)
# Use NLTK to split each sentence into words (1 mark)
# Use NLTK to find the parts of speech (1 mark)
# Print the most frequent part of speech (1 mark)

Explanation / Answer

The program is below, with comments explaining each step.

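import nltk
import collections

# Requires the NLTK data packages for the Gutenberg corpus, the Punkt
# sentence tokenizer and the averaged perceptron tagger (install them with
# nltk.download(); exact package names vary between NLTK versions).
emma = nltk.corpus.gutenberg.raw('austen-emma.txt')

# Flat list of (word, tag) pairs for the whole text
tag_list = []

# Text to sentences
sentences = nltk.sent_tokenize(emma)

# Sentences to words
for sentence in sentences:
    words = nltk.word_tokenize(sentence)
    # Give a part-of-speech tag to each word
    tagged_words = nltk.pos_tag(words)
    # extend (not append) adds each (word, tag) pair to tag_list
    # instead of nesting one list per sentence
    tag_list.extend(tagged_words)

# Count how often each tag occurs; pair[1] is the tag of each (word, tag) pair
pos_counts = collections.Counter(pair[1] for pair in tag_list)

# most_common(n) returns the n most frequent (tag, count) pairs;
# n = 1 gives the single most common part of speech
most_frequent_pos = pos_counts.most_common(1)

print(most_frequent_pos)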
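The counting step can be checked without downloading the Emma text. The sketch below uses a small, hypothetical list of tagged sentences shaped like `nltk.pos_tag` output (one list of `(word, tag)` tuples per sentence); the words and tags are made up for illustration, and `most_common_tag` is a hypothetical helper name. It shows why each sentence's tags must be flattened into one sequence before they reach `Counter`.

```python
from collections import Counter

# Hypothetical data in the shape nltk.pos_tag returns: one list of
# (word, tag) tuples per sentence. Words and tags are illustrative only.
tagged_sentences = [
    [("Emma", "NNP"), ("Woodhouse", "NNP"), ("was", "VBD"), ("handsome", "JJ")],
    [("She", "PRP"), ("was", "VBD"), ("clever", "JJ")],
    [("Mr.", "NNP"), ("Knightley", "NNP"), ("smiled", "VBD")],
]


def most_common_tag(tagged_sentences):
    """Return the (tag, count) pair for the most frequent tag.

    Flattens the per-sentence lists first, so every (word, tag) tuple
    in every sentence is counted exactly once.
    """
    counts = Counter(tag for sentence in tagged_sentences for _, tag in sentence)
    # most_common(1) returns a one-element list; take its only pair
    return counts.most_common(1)[0]


print(most_common_tag(tagged_sentences))  # prints ('NNP', 4)

# Without flattening, sentence[1] is the second (word, tag) tuple of each
# sentence, so the Counter counts whole tuples instead of tags.
unflattened = Counter(sentence[1] for sentence in tagged_sentences)
print(unflattened.most_common(1))
```

Passing a list of real `nltk.pos_tag(...)` results, one per sentence, in place of the hypothetical `tagged_sentences` would count the tags of the whole text the same way.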