Question
The file “ap_docs.txt” contains several old newswire articles. We will use this as our document collection: when a user gives us a set of keywords, we will find the documents in this collection that match their search terms. Each article in the collection is separated by a line that contains only the “<NEW DOCUMENT>” token. Your program will read in the documents from the file and number each document starting with 1 (the first document in the file is document 1, the second is document 2, etc.).
In order to look up search terms, we will need to know which words appear in each document. We will use a dictionary for this purpose. Each entry in your dictionary should have a word as the key and, as that word's value, the set of documents that the word appears in. This arrangement allows you to look up a keyword in the dictionary and immediately get all the documents that it appears in, making it easy to figure out which documents might meet a search query.
Once your program has read the file, it will prompt the user to do one of three things: 1) search for documents that match the search words input by the user, 2) display a document, or 3) quit the program. If the user chooses to search, your program should prompt for a string of search words and find the documents that contain all of those keywords. It will then print out the document numbers of all of the relevant documents. If no document in the collection contains every keyword input by the user, your program should print a message saying that no relevant documents were found. If the user chooses to display a document, your program should prompt for a document number and print out the entire document that corresponds to that number. Your program should continue to prompt until the user chooses to quit. The program must be coded in Python.
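The dictionary described above is an inverted index, and a search for several keywords is the intersection of their document sets. As an illustration only, the sketch below builds such an index from a short list of made-up documents held in memory instead of ap_docs.txt; the function names `build_index` and `search` are hypothetical, not part of the assignment.

```python
import string


def build_index(documents):
    """Map each lowercase word to the set of 1-based document numbers containing it."""
    index = {}
    for doc_number, text in enumerate(documents, start=1):
        for raw in text.split():
            # Lowercase and drop punctuation at both ends of the token.
            word = raw.lower().strip(string.punctuation)
            if word:  # skip tokens that were only punctuation
                index.setdefault(word, set()).add(doc_number)
    return index


def search(index, query):
    """Return the sorted numbers of the documents that contain every word in query."""
    words = [w.lower().strip(string.punctuation) for w in query.split()]
    words = [w for w in words if w]
    if not words:
        return []
    # Copy the first keyword's set so the sets stored in the index are not modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        # A word missing from the index contributes an empty set, so nothing matches.
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical sample documents (numbered 1, 2, 3).
docs = [
    "The patrol weighed the trucks.",
    "A hearing before the Personnel Board.",
    "The board heard the patrol's appeal.",
]
index = build_index(docs)
print(search(index, "the board"))  # [2, 3]
matches = search(index, "scales trucks")
print(matches if matches else "No relevant documents were found.")
```

Copying the first set before intersecting matters: with `&=` applied directly to a stored set, every search would silently shrink the index.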
Explanation / Answer
import string

## This function reads the documents in ap_docs.txt and returns a dictionary
## whose keys are words and whose values are sets of the document numbers
## each word appears in.
## The user can thus look up a search word and immediately find every document that contains it.
def word_search():
    token = '<NEW DOCUMENT>'
    new_dict = {}
    doc_number = 0
    # "with" closes the file automatically when the loop is done.
    with open('ap_docs.txt') as file_obj:
        for line in file_obj:
            line = line.strip()
            if line == token:
                # Each token line starts the next document, numbered from 1.
                doc_number += 1
            else:
                for word in line.split():
                    # Normalize: lowercase and drop leading/trailing punctuation.
                    word = word.lower().strip(string.punctuation)
                    if word:  # skip tokens that were all punctuation, e.g. "..."
                        new_dict.setdefault(word, set()).add(doc_number)
    return new_dict

## This function takes an empty list as a parameter.
## It reads through the file and builds a list of strings
## with one article in each string. A token line marks the
## start of a new article. Index 0 holds any text before the
## first token, so document n is new_list[n].
def article_str(new_list):
    token = '<NEW DOCUMENT>'
    new_str = ''
    with open('ap_docs.txt') as file_obj:
        for line in file_obj:
            # Compare without the trailing newline; otherwise the token never matches.
            if line.strip() == token:
                new_list.append(new_str)
                new_str = ''
            else:
                new_str += line
    new_list.append(new_str)  # the last article has no token after it
    return new_list

## This function takes the dictionary from the word_search
## function and the user's search string as parameters.
## It collects the set of document numbers for each search word
## and returns the intersection of those sets, i.e. the documents
## that contain every search word. The result is an empty set if
## any search word does not appear in the collection.
def intersection(search_dict, search_word):
    my_list = []
    for word in search_word.split():
        word = word.strip(string.punctuation)
        if word:
            # A word missing from the collection matches no documents.
            my_list.append(search_dict.get(word, set()))
    if not my_list:
        return set()
    inter = my_list[0]
    for item in my_list[1:]:
        inter = inter & item
    return inter

## This function uses word_search to build the dictionary of words and their
## document numbers, and article_str to load the articles once. It then loops,
## prompting the user for a choice. If the input is 1, it prompts for search
## words and prints the numbers of the documents that contain all of them. If
## the input is 2, it prompts for a document number and prints that document.
## If the input is 3, the program quits.
def main():
    search_dict = word_search()
    new_list = article_str([])
    while True:
        q = input("What would you like to do? 1. Search for Documents 2. Read Document 3. Quit Program ")
        if q == "1":
            search_word = input('Enter search words: ').lower().strip(string.punctuation)
            inter = intersection(search_dict, search_word)
            if inter:
                print('Documents fitting search:', ' '.join(str(n) for n in sorted(inter)))
            else:
                print('No relevant documents were found.')
            print('----------------------------')
        elif q == "2":
            try:
                doc_num = int(input('Enter document number: '))
            except ValueError:
                doc_num = 0  # not a number; treated as invalid below
            print('----------------------------')
            # Valid document numbers run from 1 to len(new_list) - 1.
            if 1 <= doc_num < len(new_list):
                print(new_list[doc_num])
            else:
                print('There is no document with that number.')
            print('----------------------------')
        elif q == "3":
            print('Quitting')
            break

if __name__ == '__main__':
    main()
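The program above opens ap_docs.txt twice, once in word_search and once in article_str. One possible alternative, sketched here with a hypothetical `load_collection` function, builds both the document list and the word dictionary in a single pass. It assumes the same normalization rule as word_search and ignores any text that appears before the first token; the sample lines are made up for the demonstration.

```python
import string

TOKEN = "<NEW DOCUMENT>"


def load_collection(lines, token=TOKEN):
    """Read the collection in one pass.

    Returns (documents, index): documents[n] is the text of document n
    (documents[0] is an empty placeholder), and index maps each normalized
    word to the set of document numbers it appears in.
    """
    documents = [""]  # placeholder so that document n is documents[n]
    index = {}
    for line in lines:
        # Strip the newline before comparing, otherwise the token never matches.
        if line.strip() == token:
            documents.append("")
            continue
        doc_number = len(documents) - 1
        if doc_number == 0:
            continue  # ignore any text before the first token
        documents[doc_number] += line
        for raw in line.split():
            word = raw.lower().strip(string.punctuation)
            if word:
                index.setdefault(word, set()).add(doc_number)
    return documents, index


# Made-up sample lines standing in for the contents of ap_docs.txt.
sample = [
    "<NEW DOCUMENT>\n",
    "Highway Patrol policy requires employees to maintain\n",
    "<NEW DOCUMENT>\n",
    "The Highway Patrol said Friday.\n",
]
documents, index = load_collection(sample)
print(len(documents) - 1)        # 2
print(sorted(index["highway"]))  # [1, 2]
print(sorted(index["friday"]))   # [2]
```

Because it accepts any iterable of lines, an open file object can be passed in directly, so `with open('ap_docs.txt') as f: documents, index = load_collection(f)` could stand in for the two separate reads in main.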
Excerpt from ap_docs.txt:
<NEW DOCUMENT>
A state weight inspector whose boss says
he tipped the scales at 500 pounds himself and ``kept growing out
of the uniforms'' is appealing his firing by the North Dakota
Highway Patrol.
Melvin Hansen of Wahpeton, who is awaiting a hearing before the
state Personnel Board on Wednesday, was given five reasons in
writing for being fired, including his weight, his lawyer, Hal
Stutsman, said Friday.
Hansen was dismissed July 31 from his job in a scale house where
he weighed trucks and inspected cargo.
``To say we terminated the guy because he was overweight is
basically unfair,'' said Brian Berg, superintendent of the Highway
Patrol. ``I feel bad for the guy. We tried to work with him to
improve his situation. He left us no alternative, in our opinion.''
Berg said Hansen's ``weight probably contributed to other
things. ... Personal health habits would be a polite way of saying
it.''
Highway Patrol policy requires employees to maintain
``appropriate health levels, weight levels and physical fitness
levels'' as determined by the department's doctor, personnel
officer Richard Anagnost said.
Anagnost said Hansen was informed that reasons for his firing
included his failure to for
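The excerpt shows why word normalization matters: the wire text uses doubled backquotes and apostrophes as quotation marks (``kept, uniforms'') and contains possessives such as Hansen's. The sketch below, with a hypothetical helper `normalize`, applies the same lowercase-and-`strip(string.punctuation)` rule used in word_search. Note that `str.strip` removes punctuation only at the ends of a word, so the apostrophe inside "hansen's" survives, and a search for "hansen" will not match that particular occurrence.

```python
import string


def normalize(token):
    """Lowercase a whitespace-separated token and strip punctuation from both ends."""
    return token.lower().strip(string.punctuation)


# A line copied from the ap_docs.txt excerpt.
line = "Berg said Hansen's ``weight probably contributed to other"
print([normalize(t) for t in line.split()])
# ['berg', 'said', "hansen's", 'weight', 'probably', 'contributed', 'to', 'other']
print(normalize("uniforms''"))  # uniforms
print(repr(normalize("...")))   # '' (all punctuation, so word_search should skip it)
```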