I have an English dictionary (text file) and the frequency of 2-grams, 3-grams a
ID: 649760 • Letter: I
Question
I have an English dictionary (text file) and the frequency of 2-grams, 3-grams and 4-grams as the beginning of each word. I need to write an algorithm that, with a given word, calculates the possible anagrams (which have to be into the dictionary) based on transitions probability (my assignment requires that I use this method).
I thought to use the Viterbi Algorithm but I don't know if it is the right choise. What do you think about a simulated annealing algorithm? Can you give me any help or advice?
Explanation / Answer
There's a better way to list all anagrams.
Given a word, compute its "canonical form" by sorting the individual letters into alphabetical order. For instance, BASEBALL becomes AABBELLS. Do this for each word in the dictionary, and build a data structure that maps from a canonical form to the list of words with that canonical form (e.g., from AABBELLS to BASEBALL) -- for instance, you might use a hashtable or a sorted list.
Now, given one English word, if you want to find all anagrams of it, you can rapidly compute all anagrams by computing its canonical form and then looking that up in your data structure.
This doesn't require n-grams, transition probabilities, Viterbi algorithm, or anything fancy like that -- just sorting and simple data structures. Thus, it can be coded up quickly and elegantly; and it will be very efficient.
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.