Question
Problem 1 (Text Analysis)

Create a Python file called analysis.py that will perform text analysis on files. For this question, assume that each space in the document separates one word from the next, so any use of the term 'word' means a string that occurs between two spaces (or, in two special cases, between the start of the file and a space, or between a space and the end of the file). You can also assume there is no punctuation or other symbols present in the files, only words separated by spaces. If you want to see examples of the type of text, look in the testfile .txt files included on cuLearn.

You must implement and test the following functions inside of your analysis.py file:

1) load(str): Takes a single string argument, representing a filename. The program must open the file and parse the text inside. This function should initialize the variables (e.g., lists, dictionaries, other variables) you need to solve the remainder of the problem. This way, the file contents can be parsed once and the functions below can be executed many times without re-reading the file, which is a slow process. This function should also remove any information stored from a previous file when it is called (i.e., you start from nothing every time load is called).

2) commonword(list): Takes a single list-type argument which contains string values. The function should operate as follows:
   a. If the list is empty or none of the words specified in the list occur in the text that has been loaded, the function should return None.
   b. Otherwise, the function should return the word contained in the list that occurs most often in the loaded text, or any one of the most common in the case of a tie.

3) commonletter(list): Takes a single list-type argument which contains single-character strings (i.e., letters/characters). The function should operate as follows:
   a. If the list is empty or none of the letters specified in the list occur in the text that has been loaded, the function should return None.

Explanation / Answer
NOTE: Due to lack of time, I was not able to implement the function commonpair(). Sorry for that.
Code:
#!/usr/bin/env python3
# Program to analyze text files

# Module-level state filled in by load(); defined here so the other
# functions do not raise NameError if they are called before load()
word_dict = {}
char_dict = {}
word_list = []


# Main function which triggers all other functions
def main():
    # Asking user to enter the filename as input
    filename = input("Enter the filename: ")
    load(filename)
    #print(" Given file after transforming to dictionary is " + str(word_dict))
    #print(" Given file after transforming to char dictionary is " + str(char_dict))
    # sample list to test the commonword function
    words = ['how', 'dict', 'main', 'to', 'are']
    cword = commonword(words)
    print(" Most common word is: " + str(cword))
    # sample list of characters to test the commonletter function
    letters = ['a', 'b', 'c', 'd', 'e', 'f']
    cletter = commonletter(letters)
    print(" Most common letter is: " + str(cletter))
    # counting total words and unique word count
    word_cnt = countall()
    uniq_cnt = countunique()
    print(" Total word count is: " + str(word_cnt))
    print(" Unique word count is: " + str(uniq_cnt))


# function to return word count
def countall():
    if len(word_list) > 0:
        return len(word_list)
    else:
        return None


# function to return unique word count
def countunique():
    if len(word_dict) > 0:
        return len(word_dict)
    else:
        return None


# function to return most common letter of the list from loaded text
def commonletter(letters):
    # None is returned when the list is empty or no letter occurs in the text
    mcomm_char = None
    mcomm_freq = 0
    for char in letters:
        # checking if char fetched from list is available in char_dict and is it the most common character
        if char in char_dict and char_dict[char] > mcomm_freq:
            mcomm_freq = char_dict[char]
            mcomm_char = char
    return mcomm_char


# function to return most common word of the list from loaded text
def commonword(words):
    # None is returned when the list is empty or no word occurs in the text
    mcomm_word = None
    mcomm_freq = 0
    for word in words:
        # checking if word fetched from the list is available in word_dict and is it the most common word
        if word in word_dict and word_dict[word] > mcomm_freq:
            mcomm_freq = word_dict[word]
            mcomm_word = word
    return mcomm_word


# function to load contents from file into the word list and dictionaries
def load(filename):
    # resetting word_dict, word_list, char_dict as these are used across the functions
    global word_dict, word_list, char_dict
    word_dict = {}
    char_dict = {}
    word_list = []
    # opening the file in read mode; the with block closes it automatically
    with open(filename, "r") as fp:
        # iterating through each line in file
        for line in fp:
            # split() with no argument splits on any whitespace and drops the newline
            for word in line.split():
                # fetching each word and storing in list
                word_list.append(word)
                # creating a dictionary of words
                if word in word_dict:
                    word_dict[word] = word_dict[word] + 1
                else:
                    word_dict[word] = 1
                # fetching each character from word and creating dictionary of characters
                for char in word:
                    if char in char_dict:
                        char_dict[char] = char_dict[char] + 1
                    else:
                        char_dict[char] = 1


if __name__ == '__main__':
    main()
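
Alternative sketch: the word and letter counting, and the "most common item from a list" lookup, could also be written with collections.Counter from the standard library. This is not the code above, only a minimal illustration. The helper names count_words_and_letters and most_common_of are hypothetical, and the sketch takes a string instead of a filename so that it runs on its own.

```python
from collections import Counter


def count_words_and_letters(text):
    """Return (word_counts, letter_counts) for whitespace-separated text.

    Hypothetical helper: it takes the text itself rather than a filename.
    """
    words = text.split()
    word_counts = Counter(words)
    # Joining the words drops the separators, so only letters are counted
    letter_counts = Counter("".join(words))
    return word_counts, letter_counts


def most_common_of(counts, candidates):
    """Return the candidate with the highest count, or None if none occur."""
    # A Counter returns 0 for missing keys, so no membership test is needed
    present = [c for c in candidates if counts[c] > 0]
    if not present:
        return None
    # On a tie, max() returns the first of the tied candidates
    return max(present, key=lambda c: counts[c])


word_counts, letter_counts = count_words_and_letters(
    "hello how are you why are things"
)
print(most_common_of(word_counts, ["how", "dict", "main", "to", "are"]))  # are
print(most_common_of(letter_counts, list("abcdef")))                      # e
print(most_common_of(word_counts, []))                                    # None
```

The same most_common_of helper serves both words and letters, since the only difference between the two lookups is which table of counts is consulted.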
Execution and output:
Unix Terminal> cat testfile
hello how are you
why are things
Unix Terminal> python analysis.py
Enter the filename: testfile
Most common word is: are
Most common letter is: e
Total word count is: 7
Unique word count is: 6
Unix Terminal>
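
The numbers in this run can be cross-checked by hand. The short sketch below recomputes them directly from the two lines of testfile using only built-in string and list methods; it is an independent check, not part of analysis.py, and the variable names are chosen for illustration.

```python
# Contents of testfile as shown in the run above
sample = "hello how are you\nwhy are things\n"

words = sample.split()
total_words = len(words)          # every word, duplicates included
unique_words = len(set(words))    # duplicates collapsed

# Occurrences of each candidate word and letter used in main()
word_hits = {w: words.count(w) for w in ["how", "dict", "main", "to", "are"]}
letter_hits = {ch: sample.count(ch) for ch in "abcdef"}

print(total_words, unique_words)  # 7 6
print(word_hits)
print(letter_hits)
```

Here "are" appears twice and "how" once, and "e" appears three times against two for "a", which agrees with the reported most common word and letter.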