Question

I am building a classifier for short texts in a chat system. My features are words and pairs of words.

Naturally, the sentences contain spelling mistakes. If a particular wrong spelling of a certain word hasn't appeared in the training corpus, the classifier has no chance to identify it.

I am considering taking an existing spelling corrector and integrating it with my current classifier, but I am not sure how to do it.

Do you know of a paper that integrates an automatic spelling correction tool with a short text classifier?

Explanation / Answer

Could you not just pre-process the text? Simply pass over the corpus and correct errors via edit-distance or lookup tables. Once this has finished you can then run the classifier.
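As a rough illustration of the pre-processing route, here is a minimal Python sketch that replaces each out-of-vocabulary token with its nearest vocabulary word by Levenshtein edit distance. The function names (`edit_distance`, `correct_token`, `preprocess`), the `max_distance` threshold, and the whitespace tokenization are all assumptions made for this example and do not come from any particular spelling-correction tool.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def correct_token(token: str, vocabulary: Iterable[str], max_distance: int = 1) -> str:
    """Return the closest vocabulary word, or the token itself if none is close enough."""
    vocab = set(vocabulary)
    if not vocab or token in vocab:
        return token
    # Ties are broken alphabetically so the result is deterministic.
    best = min(vocab, key=lambda w: (edit_distance(token, w), w))
    return best if edit_distance(token, best) <= max_distance else token


def preprocess(text: str, vocabulary: Iterable[str]) -> str:
    """Lowercase, split on whitespace, and correct each token (hypothetical pipeline step)."""
    vocab = set(vocabulary)
    return " ".join(correct_token(t, vocab) for t in text.lower().split())
```

The corrected text can then be passed to the existing word and word-pair feature extractor unchanged. Scanning the whole vocabulary for every token is slow for large vocabularies, so a lookup table of known misspellings would likely be faster in practice.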

If you really want to incorporate the correction into your model, you could redefine edit distance to compute the log-probability that a misspelled word matches a given vocabulary word, for some mismatch probability $\theta \sim \mathrm{Beta}(\alpha, \beta)$.

The resulting probabilities can then be incorporated into the classifier directly, which would be parameterized by $\theta$. Depending on how complex your internal model is, you could find the ML or MAP estimate $\hat{\theta}$ with an EM algorithm or, if the model allows, analytically.
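One possible reading of this idea, sketched in Python under strong simplifying assumptions: each character position is corrupted independently with the mismatch probability (called `theta` here), and the edit distance counts the corrupted positions. The helper names, the default prior hyperparameters, and the use of already-aligned (observed, intended) word pairs are all assumptions for illustration. With such pairs the MAP estimate has a closed form; if the intended words were latent, an EM procedure would be needed instead.

```python
import math
from typing import Iterable, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def match_log_prob(observed: str, intended: str, theta: float) -> float:
    """Log-probability that `observed` is a noisy copy of `intended`.

    Assumed model: n = max(len) positions, d = edit distance of them corrupted,
    each independently with probability theta (0 < theta < 1).
    """
    n = max(len(observed), len(intended))
    d = edit_distance(observed, intended)
    return d * math.log(theta) + (n - d) * math.log(1.0 - theta)


def map_theta(pairs: Iterable[Tuple[str, str]], alpha: float = 2.0, beta: float = 20.0) -> float:
    """MAP estimate of theta under a Beta(alpha, beta) prior (alpha, beta > 1).

    The posterior is Beta(alpha + errors, beta + positions - errors); its mode is
    (errors + alpha - 1) / (positions + alpha + beta - 2).
    """
    pairs = list(pairs)
    errors = sum(edit_distance(o, t) for o, t in pairs)
    positions = sum(max(len(o), len(t)) for o, t in pairs)
    return (errors + alpha - 1.0) / (positions + alpha + beta - 2.0)
```

The values returned by `match_log_prob` could then serve as soft evidence for each candidate vocabulary word, in place of a hard correction, when computing the classifier's word features.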