


Question

How do I write an algorithm in pseudo-code for this problem?

Given an article such as this one at nytimes.com, design an algorithm to find the top 150 most frequently co-occurring word pairs in the article. Two words are said to co-occur if they appear in the same sentence. For example, the last sentence of the article, "It's really a milestone in Chinese science fiction.", contains the following word pairs: ('It's', 'really'), ('It's', 'a'), ('It's', 'milestone'), ('It's', 'in'), ('It's', 'Chinese'), ('It's', 'science'), ('It's', 'fiction'), ('really', 'a'), ('really', 'milestone'), ('really', 'in'), ('really', 'Chinese'), ('really', 'science'), ('really', 'fiction'), ('a', 'milestone'), ('a', 'in'), ('a', 'Chinese'), ('a', 'science'), ('a', 'fiction'), ('milestone', 'in'), ('milestone', 'Chinese'), ('milestone', 'science'), ('milestone', 'fiction'), ('in', 'Chinese'), ('in', 'science'), ('in', 'fiction'), ('Chinese', 'science'), ('Chinese', 'fiction'), ('science', 'fiction').

You can assume you have access to a subroutine, sentenceSplitter(article), that can accurately segment an article into separate sentences and return them in an array-like structure. You can also assume you have access to another routine, tokenizer(sentence), that can accurately identify the individual words in the input sentence and return them in another array-like data structure.

Explanation / Answer

//Declare a hash map whose key is an unordered pair of words (wordPair) and whose value is a frequency count.
//Every time we see a new word pair, we insert it into the map with frequency 1.
//Every time we see a repeated word pair, we increment its frequency.
//At the end we sort the map entries by frequency, so we get a ranked list of co-occurring pairs.

declare map<wordPair, freq> wordPairMap

//assumed to be given
sentenceSplitter(article)
   return sentences_vector

tokenizer(sentence)
   return words_vector
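The problem states that these two subroutines are simply given. For experimenting, a minimal Python stand-in for each could look like the following; the regexes here are naive assumptions, not the accurate routines the problem promises:

```python
import re

def sentence_splitter(article):
    # Naive stand-in for the assumed sentenceSplitter(article) subroutine:
    # split on sentence-ending punctuation followed by whitespace.
    return [s for s in re.split(r'(?<=[.!?])\s+', article) if s]

def tokenizer(sentence):
    # Naive stand-in for the assumed tokenizer(sentence) subroutine:
    # a word is a run of letters, digits, and apostrophes.
    return re.findall(r"[A-Za-z0-9']+", sentence)
```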

main()
   string article

   sentence_vector = sentenceSplitter(article)

   //iterate through the sentence_vector

   for each sentence : sentence_vector
       word_vector = tokenizer(sentence)
       for i = 0 to word_vector.size - 1
           for j = i + 1 to word_vector.size - 1
               //order the two words so ('a','b') and ('b','a') map to the same key
               word_pair = min(word_vector[i], word_vector[j]) + "," + max(word_vector[i], word_vector[j])
              
               //if map contains the word_pair key just increment its frequency
               if wordPairMap contains word_pair
                   wordPairMap[word_pair].freq++
               else
               //otherwise insert a new entry in the map with frequency 1
                   wordPairMap.push(word_pair, 1)

   //sort the wordPairMap entries by frequency, highest first
   wordPairMap.sortByValue()

   //print the keys of the top 150 entries
   for each pair : first 150 entries of wordPairMap
       Print( pair.Key )
end
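The pseudocode above can be sketched as runnable Python. Here `tokenize` stands in for the given tokenizer routine (a whitespace split is used in the example purely as an assumption), a `Counter` plays the role of wordPairMap, and `most_common(k)` replaces the explicit sort-by-value step:

```python
from collections import Counter
from itertools import combinations

def top_cooccurring_pairs(sentences, tokenize, k=150):
    # sentences: the output of sentenceSplitter(article)
    # tokenize:  the given tokenizer(sentence) routine
    counts = Counter()
    for sentence in sentences:
        # Deduplicate and sort the tokens so each unordered pair gets one
        # canonical key and is counted once per sentence.
        words = sorted(set(tokenize(sentence)))
        counts.update(combinations(words, 2))
    # Return the k most frequent pairs with their counts.
    return counts.most_common(k)

# Example, with whitespace splitting standing in for tokenizer(sentence):
pairs = top_cooccurring_pairs(["a b", "a b c"], str.split, k=3)
```

Sorting the two words inside each pair is what makes the map key order-insensitive, which the prose pseudocode achieves with min/max when building word_pair.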