Python 1. Creating the word dictionary [Coding only: save code as problem1.py ]
Question
Python
1. Creating the word dictionary [Coding only: save code as problem1.py] The first step in building an n-gram model is to create a dictionary (a Java map or a Python dictionary) keyed by word, which we'll use to access the elements corresponding to that word in a vector or matrix of counts or probabilities. Create this dictionary from the given data files (select one file for training) so that it covers all unique words. Split each sentence (treat each line as one sentence) into a list of words and convert each word to lowercase before storing it in the dictionary.
For example, I have a text file abc.txt
Explanation / Answer
NOTE: The question does not make the exact requirement clear. After reading it several times, my understanding is that you want to build a dictionary of words from a file, which the code below does. If this is not the expected answer, please state your requirement clearly, with an example, and I will reply within 24 hours.
Code:
#!/usr/bin/python
import sys

# Program to count the unique words in a given file
# script name : uniq_word.py

def main():
    # Take the filename from the command line and validate the arguments
    if len(sys.argv) != 2:
        print("One argument must be passed and is a filename. Usage: " + sys.argv[0] + " <filename>")
        sys.exit(1)
    filename = sys.argv[1]
    word_dict = {}
    # Open the file in read mode; the with block closes it automatically
    with open(filename, "r") as fp:
        # Iterate through each line in the file
        for line in fp:
            # split() with no argument splits on any run of whitespace
            words = line.split()
            for word in words:
                word = word.lower()
                # First time we see a word, assign the value 1
                if word not in word_dict:
                    word_dict[word] = 1
                # Otherwise increment the frequency of the word
                else:
                    word_dict[word] += 1
    # Print each unique word with its frequency
    for word in word_dict:
        print(word + "=" + str(word_dict[word]))

if __name__ == '__main__':
    main()
Execution and output:
Unix Terminal> cat test
this is a program file to test word_length
hello how are you
hello world is the first program
Unix Terminal> python uniq_word.py test
a=1
the=1
how=1
this=1
is=2
to=1
program=2
word_length=1
are=1
file=1
test=1
world=1
you=1
hello=2
first=1
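The question also mentions using the dictionary to access the elements for a word in a vector or matrix of counts. The answer above stores frequencies directly, so the following is only a minimal sketch of the other reading, assuming the dictionary should map each word to an integer index. The names `build_word_index` and `count_vector` are hypothetical helpers introduced for illustration, and the sample lines are the contents of the `test` file shown above.

```python
def build_word_index(lines):
    """Map each unique lowercased word to the next free integer index."""
    word_to_index = {}
    for line in lines:
        for word in line.split():
            word = word.lower()
            if word not in word_to_index:
                # len() of the dict is the next unused index: 0, 1, 2, ...
                word_to_index[word] = len(word_to_index)
    return word_to_index


def count_vector(lines, word_to_index):
    """Build a list of counts, one slot per word, addressed through the index."""
    counts = [0] * len(word_to_index)
    for line in lines:
        for word in line.split():
            counts[word_to_index[word.lower()]] += 1
    return counts


# Same three lines as the "test" file in the answer above
sample = [
    "this is a program file to test word_length",
    "hello how are you",
    "hello world is the first program",
]

word_to_index = build_word_index(sample)
counts = count_vector(sample, word_to_index)

# Look up a word's count by going through its index
print("hello=" + str(counts[word_to_index["hello"]]))
```

Keeping the indices separate from the counts means the same `word_to_index` mapping could later address rows and columns of a bigram count matrix, if that is what the assignment intends.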