Given an article, design an algorithm in pseudo code to find the top 150 most fr
ID: 3844937 • Letter: G
Question
Given an article, design an algorithm in pseudo code to find the top 150 most frequently co-occurring word-pairs in the article. Two words are said to co-occur if they appear in the same sentence. For example, the sentence, “It’s really a milestone in Chinese science fiction.” contain the following word pairs:
('It's', 'really')
('It's', 'a')
('It's', 'milestone')
('It's', 'in')
('It's', 'Chinese')
('It's', 'science')
('It's', 'fiction')
('really', 'a')
('really', 'milestone')
('really', 'in')
('really', 'Chinese')
('really', 'science')
('really', 'fiction')
('a', 'milestone')
('a', 'in')
('a', 'Chinese')
('a', 'science')
('a', 'fiction')
('milestone', 'in')
('milestone', 'Chinese')
('milestone', 'science')
('milestone', 'fiction')
('in', 'Chinese')
('in', 'science')
('in', 'fiction')
('Chinese', 'science')
('Chinese', 'fiction')
('science', 'fiction')
you can assume you have access to a subroutine, sentenceSplitter(article), that can accurately segment an article into separate sentences and return these sentences in an array-like structure.
You can also assume that you have access to another routine tokenizer( sentence), that can accurately identify the individual words contained in the input sentence and return these words in another array-like data structure.
Please describe rhe algorithm unambiguously using pseudo code with necessary comments in English.
Don't implement the algorithm, only the pseudo code is required
Explanation / Answer
Map<String, Integer> hmap = new HashMap<String,Integer>();
//you have two lists with those strings, called list1 and list2.
// list1<String> and list2<String> have same size from Sentence Splitter
String key ;
for(int i=0;i<list1.size();i++)
{
for(int j=1;j<list2.size();j++)
{
key = list1.get(i) + list2.get(j);
if(map.containsKey(key))
Integer value = hmap.get(key)++;
else
map.put(key,1);
}
}
//now the map has been filled, you can go through the map,
//and check the value, if value == 1, then the key is unique.
//for those value >1, you know, which string pair is not unique,
// and how many times it shows.
//
//We are storing the hashmap by values
Map<Integer, String> map = sortByValues(hmap);
System.out.println("After Sorting:");
Set set2 = map.entrySet();
Iterator iterator2 = set2.iterator();
while(iterator2.hasNext()) {
Map.Entry me2 = (Map.Entry)iterator2.next();
System.out.print(me2.getKey() + ": ");
System.out.println(me2.getValue());
}
}
private static HashMap sortByValues(HashMap map) {
List list = new LinkedList(map.entrySet());
// Defined Custom Comparator here
Collections.sort(list, new Comparator() {
public int compare(Object o1, Object o2) {
return ((Comparable) ((Map.Entry) (o1)).getValue())
.compareTo(((Map.Entry) (o2)).getValue());
}
});
// Here I am copying the sorted list in HashMap
// using LinkedHashMap to preserve the insertion order
HashMap sortedHashMap = new LinkedHashMap();
for (Iterator it = list.iterator(); it.hasNext();) {
Map.Entry entry = (Map.Entry) it.next();
sortedHashMap.put(entry.getKey(), entry.getValue());
}
return sortedHashMap;
}
}
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.