Question
I am building a classifier for short texts in a chat system. My features are words and pairs of words.
Naturally, the sentences contain spelling mistakes. If a particular wrong spelling of a certain word hasn't appeared in the training corpus, the classifier has no chance to identify it.
I am considering taking an existing spelling corrector and integrating it with my current classifier, but I am not sure how to do it.
Do you know of a paper that integrates an automatic spelling correction tool with a short text classifier?
Explanation / Answer
Could you not just pre-process the text? Simply pass over the corpus and correct errors via edit-distance or lookup tables. Once this has finished you can then run the classifier.
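As a rough illustration of that pre-processing pass, here is a minimal Python sketch. It assumes you already have a set of correctly spelled words (`vocabulary`, a hypothetical name) and that whitespace tokenization is good enough for your chat texts; `correct_token`, `preprocess`, and the `max_distance` threshold are illustrative choices, not part of any particular spelling library.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def correct_token(token: str, vocabulary: set, max_distance: int = 1) -> str:
    """Replace token with its nearest vocabulary word if it is close enough."""
    if token in vocabulary:
        return token
    # Ties are broken alphabetically so the result is deterministic.
    best = min(vocabulary, key=lambda w: (edit_distance(token, w), w))
    return best if edit_distance(token, best) <= max_distance else token


def preprocess(text: str, vocabulary: set, max_distance: int = 1) -> str:
    """Lowercase, split on whitespace, and correct each token (assumed tokenization)."""
    tokens = text.lower().split()
    return " ".join(correct_token(t, vocabulary, max_distance) for t in tokens)


vocabulary = {"hello", "world", "please", "thanks"}
cleaned = preprocess("Helo wrld plese", vocabulary)
```

The corrected text can then be fed to the existing word and word-pair feature extractor unchanged. Scanning the whole vocabulary for every unknown token is slow for large vocabularies, so a lookup table of known misspellings may be the more practical option in production.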
If you really want to incorporate the correction into your model, you could redefine edit distance to compute the log-probability that a misspelled word matches another word, given some mismatch probability $\theta \sim \mathrm{Beta}(\alpha, \beta)$.
The resulting probabilities can then be incorporated into the classifier directly, which would then be parameterized by $\theta$. Depending on how complex your internal model is, you could find the ML or MAP estimate $\hat{\theta}$ with an EM algorithm or, if the model allows, analytically.
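One way to make this concrete is sketched below, under strong simplifying assumptions that are mine rather than the answer's: every character-level operation in an alignment is treated as an independent event that is an edit with the mismatch probability (called `theta` here) or a match with probability `1 - theta`, and the score is the best alignment's log-score, which is not a normalized distribution over strings. The function names are hypothetical. In this simplified case, if the edit and match counts were observed directly, the MAP estimate under a Beta prior has a closed form (the posterior mode); with unobserved alignments, the counts would instead come from the E-step of an EM procedure.

```python
import math


def log_prob_alignment(observed: str, intended: str, theta: float) -> float:
    """Best-alignment log-score: each edit adds log(theta), each match log(1 - theta)."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    log_edit, log_match = math.log(theta), math.log(1.0 - theta)
    # Aligning the empty prefix of `observed` with j intended characters takes j edits.
    prev = [j * log_edit for j in range(len(intended) + 1)]
    for i, co in enumerate(observed, start=1):
        curr = [i * log_edit]
        for j, ci in enumerate(intended, start=1):
            diagonal = prev[j - 1] + (log_match if co == ci else log_edit)
            curr.append(max(prev[j] + log_edit,      # extra character in observed
                            curr[j - 1] + log_edit,  # missing character in observed
                            diagonal))               # match or substitution
        prev = curr
    return prev[-1]


def map_theta(n_edits: float, n_matches: float, alpha: float, beta: float) -> float:
    """Posterior mode of theta for a Beta(alpha, beta) prior (assumes alpha, beta > 1)."""
    return (n_edits + alpha - 1.0) / (n_edits + n_matches + alpha + beta - 2.0)


score_typo = log_prob_alignment("helo", "hello", theta=0.1)
score_exact = log_prob_alignment("hello", "hello", theta=0.1)
theta_hat = map_theta(n_edits=1, n_matches=4, alpha=2.0, beta=2.0)
```

A classifier could then weight each candidate vocabulary word by such a score instead of committing to a single hard correction.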