Question
The objective of this assignment is to get you familiar with regular expressions, word tokenization and sentence segmentation. Given a text document, your goal is to segment the document into separate sentences using Python. An example input/output pair is provided below.
Input:
QUINCY HOWE, MODERATOR: I am Quincy Howe of ABC News saying good evening from New York where the two major candidates for president of the United States are about to engage in their fourth radio-television discussion of the present campaign. Tonight these men will confine that discussion to foreign policy. Good evening, Vice President Nixon.
MR. NIXON: Good evening, Mr. Howe.
MR. HOWE: And good evening, Senator Kennedy.
MR. KENNEDY: Good evening, Mr. Howe.
MR. HOWE: Now let me read the rules and conditions under which the candidates themselves have agreed to proceed. As they did in their first meeting, both men will make opening statements of about eight minutes each and closing statements of equal time running three to five minutes each. During the half hour between the opening and closing statements, the candidates will answer and comment upon questions from a panel of four correspondents chosen by the nationwide networks that carry the program. Each candidate will be questioned in turn with opportunity for comment by the other. Each answer will be limited to two and one-half minutes, each comment to one and one-half minutes. The correspondents are free to ask any questions they choose in the field of foreign affairs. Neither candidate knows what questions will be asked. Time alone will determine the final question. Reversing the order in their first meeting, Senator Kennedy will make the second opening statement and the first closing statement. For the first opening statement, here is Vice President Nixon.
Output:
1. QUINCY HOWE, MODERATOR: I am Quincy Howe of ABC News saying good evening from New York where the two major candidates for president of the United States are about to engage in their fourth radio-television discussion of the present campaign.
2. Tonight these men will confine that discussion to foreign policy.
3. Good evening, Vice President Nixon.
4. MR. NIXON: Good evening, Mr. Howe.
5. MR. HOWE: And good evening, Senator Kennedy.
6. MR. KENNEDY: Good evening, Mr. Howe.
7. MR. HOWE: Now let me read the rules and conditions under which the candidates themselves have agreed to proceed.
8. As they did in their first meeting, both men will make opening statements of about eight minutes each and closing statements of equal time running three to five minutes each.
(etc.)
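The expected output keeps the periods in "MR." and "Mr." inside sentences 4 and 6 instead of ending a sentence there. A minimal sketch using only Python's built-in `re` module shows what a naive rule (split after every '.', '!' or '?' followed by whitespace) does to one of those lines:

```python
import re

line = "MR. NIXON: Good evening, Mr. Howe."

# Naive rule: every '.', '!' or '?' followed by whitespace ends a sentence
naive = re.split(r'(?<=[.!?])\s+', line)
print(naive)  # ['MR.', 'NIXON: Good evening, Mr.', 'Howe.']
```

One line becomes three fragments, so a segmenter for this input needs some way to recognize abbreviations such as titles.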
--------
You must build your own sentence segmentation program; you can’t use third-party software to do that. Your program should process the input.txt file and generate an output.txt file. Each line of the output file should start with the sentence number followed by the sentence, as shown in the example above.
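The assignment also lists word tokenization among its goals. As a hedged illustration, a regular expression can split a sentence into word and punctuation tokens. The rule below (a word is a run of letters or digits, optionally joined by internal hyphens or apostrophes, and every other non-space character is its own token) is an assumption chosen for this transcript, not a requirement of the assignment:

```python
import re

# Assumed token rule: a word, possibly with internal hyphens/apostrophes
# ("radio-television", "can’t"), or any single non-space punctuation character
TOKEN = re.compile(r"\w+(?:[-'’]\w+)*|[^\w\s]")

def tokenize(sentence):
    """Return the word and punctuation tokens of one sentence, in order."""
    return TOKEN.findall(sentence)

print(tokenize("MR. NIXON: Good evening, Mr. Howe."))
# ['MR', '.', 'NIXON', ':', 'Good', 'evening', ',', 'Mr', '.', 'Howe', '.']
```

This rule separates the period from "Mr"; keeping "Mr." as a single token would need the same kind of abbreviation exception that sentence segmentation needs.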
Explanation / Answer
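import re

# A sentence ends at '.', '!' or '?' followed by whitespace, unless the period
# closes a title such as "Mr." (Python lookbehinds must be fixed-width, hence one per title)
SPLIT = re.compile(r'(?<!\bMr\.)(?<!\bMrs\.)(?<!\bMs\.)(?<!\bDr\.)(?<=[.!?])\s+',
                   re.IGNORECASE)

with open('input.txt', encoding='utf-8') as f1:
    text = f1.read()
# Each speaker turn is on its own line, so segment line by line and drop blanks
sentences = [s for line in text.splitlines() for s in SPLIT.split(line.strip()) if s]

with open('output.txt', 'w', encoding='utf-8') as f:
    for i, s in enumerate(sentences, 1):
        f.write(f'{i}. {s}\n')  # "1. <sentence>", one sentence per line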
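A standard-library-only segmenter can also be written as a pair of functions, which makes it easy to test without reading or writing files. This sketch assumes that each speaker turn is on its own line and that a word ending in '.', '!' or '?' closes a sentence unless it appears in a hypothetical `ABBREVIATIONS` set:

```python
# Hypothetical abbreviation list; extend it for your own input (e.g. "u.s.", "gen.")
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "st.", "jr.", "sr."}

def segment(text):
    """Split text into sentences using only the standard library."""
    sentences = []
    for line in text.splitlines():          # speaker turns are separate lines
        current = []
        for word in line.split():
            current.append(word)
            # End the sentence at ., ! or ? unless the word is a known abbreviation
            if word[-1] in ".!?" and word.lower() not in ABBREVIATIONS:
                sentences.append(" ".join(current))
                current = []
        if current:                         # keep any trailing fragment
            sentences.append(" ".join(current))
    return sentences

def numbered(sentences):
    """Format sentences as '1. ...', '2. ...', one per line."""
    return "\n".join(f"{i}. {s}" for i, s in enumerate(sentences, start=1))

sample = ("MR. HOWE: And good evening, Senator Kennedy.\n"
          "MR. KENNEDY: Good evening, Mr. Howe.")
print(numbered(segment(sample)))
# 1. MR. HOWE: And good evening, Senator Kennedy.
# 2. MR. KENNEDY: Good evening, Mr. Howe.
```

Keeping abbreviations in a set makes the list easy to extend, at the cost of collapsing runs of whitespace inside a sentence to single spaces.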