
For this assignment, you need to use Python to compute the term-frequency matrix


Question

For this assignment, you need to use Python to compute the term-frequency matrix for a set of documents.

A term-frequency matrix is a table in which rows represent documents and columns represent terms (words). The value in cell (i,j) is the number of times that word j occurs in document i.
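As a small illustration of this definition, here is a sketch that builds the matrix for two toy documents held in memory. The two strings are made up for the example and are not part of the assignment data; words are split on whitespace only, which is an assumption about what counts as a term.

```python
from collections import Counter

# Two toy documents, made up for illustration (not from the assignment data).
docs = ["good movie good plot", "bad movie"]

# Columns: every unique word, sorted so the column order is fixed.
terms = sorted({word for doc in docs for word in doc.split()})

# Count the words of each document once.
counts = [Counter(doc.split()) for doc in docs]

# Rows: one per document; cell (i, j) is the count of word j in document i.
# A Counter returns 0 for a word it has never seen.
matrix = [[c[t] for t in terms] for c in counts]

print(terms)   # ['bad', 'good', 'movie', 'plot']
print(matrix)  # [[0, 2, 1, 1], [1, 0, 1, 0]]
```

Each row has the same length as the list of terms, even when a document contains only a few of them.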

To do this, your Python program first needs to go through the files in the input folder, where each file is a separate document (thus, the number of documents is the number of files), and build a set of all unique terms across all the documents.

Let's call this list of terms T, which contains n terms.
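One possible way to build T is sketched below. The function name build_vocabulary is hypothetical, and the sketch assumes plain-text files and whitespace-separated words; punctuation and letter case are left untouched because the assignment does not say how to treat them.

```python
import os

def build_vocabulary(folder):
    """Return the sorted list of unique whitespace-separated words
    found in the regular files directly inside `folder`."""
    vocabulary = set()
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)
        if not os.path.isfile(full_path):
            continue  # skip subfolders
        with open(full_path, "r") as handle:
            vocabulary.update(handle.read().split())
    # Sorting turns the set into a list with a fixed column order.
    return sorted(vocabulary)
```

Note that os.listdir returns bare file names, so each one is joined with the folder path before it is opened or tested with os.path.isfile.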

Then you'll need to go through each file/document and compute the number of times that each of the n words occurs in that document. Doing this for every document produces the term-frequency matrix, one row per document.
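The counting step for a single document could look like the following sketch. The helper name count_terms is hypothetical; it takes the text of one document and the list of terms T, and returns one row of the matrix.

```python
from collections import Counter

def count_terms(text, terms):
    """Return one matrix row: the count of each term in `terms`
    within `text`, in the same order as `terms`."""
    counts = Counter(text.split())
    # Terms that do not occur in this document get a count of 0.
    return [counts[t] for t in terms]

row = count_terms("good movie good plot", ["bad", "good", "movie", "plot"])
print(row)  # [0, 2, 1, 1]
```

Counting the whole document once with Counter and then looking up each term avoids rescanning the text for every one of the n terms.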

The program should save this matrix in a file, where each row of the matrix appears on a separate line and the term frequencies within a row are separated by commas.
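A minimal sketch of the saving step, assuming the matrix is a list of lists of integers. The function name save_matrix and the output file name are hypothetical, since the assignment does not specify them.

```python
def save_matrix(matrix, out_path):
    """Write one matrix row per line, with values separated by commas."""
    with open(out_path, "w") as handle:
        for row in matrix:
            handle.write(",".join(str(value) for value in row) + "\n")

# Example with a made-up 2 x 4 matrix and a hypothetical file name.
save_matrix([[0, 2, 1, 1], [1, 0, 1, 0]], "matrix.csv")
```

The resulting file contains the two lines 0,2,1,1 and 1,0,1,0.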

The folder with the documents, representing movie reviews, is included in the assignment.

Explanation / Answer

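import os
from collections import Counter

def wordCount(fileName):
    # Return a mapping from each whitespace-separated word to its count in the file.
    with open(fileName, "r") as file:
        return Counter(file.read().split())

#give folder path here ("input" is only a placeholder)
path = "input"

# os.listdir returns bare names, so join each one with the folder path.
# Sorting keeps the row order reproducible.
files = sorted(f for f in os.listdir(path)
               if os.path.isfile(os.path.join(path, f)))

# First pass: count the words of every document.
counts = [wordCount(os.path.join(path, f)) for f in files]

# Build T, the sorted list of unique terms across all documents.
terms = sorted(set().union(*counts))

# Second pass: one row per document, one column per term (0 if the term is absent).
m = [[c[t] for t in terms] for c in counts]

# Save the matrix, one row per line, values separated by commas.
# The output file name is an assumption; the assignment does not specify one.
with open("matrix.csv", "w") as out:
    for row in m:
        out.write(",".join(str(v) for v in row) + "\n")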
