Given an article such as this one at nytimes.com ( http://www.nytimes.com/2015/0
ID: 671493 • Letter: G
Question
Given an article such as this one at nytimes.com ( http://www.nytimes.com/2015/03/01/books/review/kazuo-ishiguros-the-buried-giant.html?ref=books&_r=0 ), design an algorithm to find the most frequently co-occurring word-pair in this article. Two words are said to co-occur if they appear in the same sentence. You can assume you have access to a subroutine, sentenceSplitter(article), that can accurately segment an article into separate sentences. Please describe your algorithm unambiguously using pseudo code with necessary comments in English. Assume you start with an "article" variable that already contains the full text of an article.
Explanation / Answer
The following algorithm finds the most frequently co-occurring pair of words in the article.
Co-occur(article):
// 'article' contains the full text of the article.
// Tokenize 'article' and save the words that occur in it into
// an array called words[]. Make sure that words[] does not contain duplicates.
Sort(words) // optional: the array now holds the words in dictionary order
// Store the number of distinct words in a variable.
noWords = words.length
// Create a two-dimensional array of integers to store the co-occurrence count of each pair of words.
// Initialize every entry of the array to zero.
Counter[noWords][noWords] = 0
// Split the article into separate sentences and store them in a string array.
Sentences[] = sentenceSplitter(article)
// Count, for each pair of distinct words, the number of sentences that contain both.
// Starting j at i+1 counts each unordered pair once and skips the case i = j.
For i = 1 to noWords
    For j = i+1 to noWords
        For k = 1 to Sentences.length
            // Compare against the tokenized words of the sentence, not substrings of it.
            If words[i] and words[j] both occur as words in Sentences[k]
                // Increment Counter[i][j] by 1
                Counter[i][j] = Counter[i][j] + 1
            Endif
        Endfor
    Endfor
Endfor
// Highest holds the largest co-occurrence count found so far.
Highest = 0
// x1 and x2 hold the indexes of the most frequently co-occurring pair (0 means none found yet).
x1 = 0
x2 = 0
// Find the pair (i, j) whose Counter[i][j] is higher than that of any pair examined before it.
For i = 1 to noWords
    For j = i+1 to noWords
        If Counter[i][j] > Highest
            Highest = Counter[i][j]
            x1 = i
            x2 = j
        Endif
    Endfor
Endfor
If Highest > 0
    Print "words[x1] and words[x2] are the most frequently co-occurring word pair"
Else
    Print "no two distinct words occur in the same sentence"
Endif
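For readers who want something runnable, the pseudocode above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the answer's own method: instead of filling a words-by-words matrix, it counts pairs sentence by sentence, which gives the same counts without scanning every sentence for every pair. The names sentence_splitter and most_frequent_pair are hypothetical, the regex-based sentence_splitter is a crude stand-in for the sentenceSplitter(article) subroutine the question provides, and the lowercase tokenization is an assumed choice.

```python
import re
from collections import Counter
from itertools import combinations


def sentence_splitter(article):
    """Hypothetical stand-in for sentenceSplitter(article): splits after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", article.strip()) if s]


def most_frequent_pair(article):
    """Return ((word_a, word_b), count) for the pair that shares the most sentences,
    or None if no sentence contains two distinct words."""
    pair_counts = Counter()
    for sentence in sentence_splitter(article):
        # Assumed tokenization: lowercase runs of letters, digits and apostrophes.
        words = set(re.findall(r"[a-z0-9']+", sentence.lower()))
        # Each unordered pair of distinct words is counted once per sentence.
        pair_counts.update(combinations(sorted(words), 2))
    if not pair_counts:
        return None
    # Highest count wins; ties go to the alphabetically first pair.
    best = min(pair_counts, key=lambda pair: (-pair_counts[pair], pair))
    return best, pair_counts[best]


article = "The cat sat. The cat ran! A dog sat."
print(most_frequent_pair(article))  # prints (('cat', 'the'), 2)
```

Breaking ties by alphabetical order mirrors what the pseudocode does when words[] is sorted and the comparison is a strict "greater than". For a real article, common words such as "the" and "of" are likely to dominate the result, so filtering them out before counting may be worth considering.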