Spammers Using Essays To Break Bayesian Filters

I’ve noticed that a lot of the spam that makes it past ASSP, my Bayesian filter-based anti-spam application, uses text from sites selling school essays. These messages quote a large swath of text from the essays to fool spam filters into thinking it’s legitimate text.
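To see why the padding works, here is a toy illustration of how a naive Bayesian combiner can be diluted. It is only a sketch: the token probabilities, the `spam_score` function, and the 0.4 default for unknown words are all made up for the example and are not how ASSP actually scores mail.

```python
import math

# Hypothetical per-token spam probabilities, as a trained filter might store them.
# These numbers are invented for illustration only.
TOKEN_SPAM_PROB = {
    "viagra": 0.99, "cheap": 0.90, "pills": 0.95,
    "thesis": 0.10, "shakespeare": 0.05, "paragraph": 0.10,
    "analysis": 0.10, "character": 0.15,
}

def spam_score(tokens, default=0.4):
    """Combine per-token probabilities naive-Bayes style and return P(spam)."""
    log_odds = 0.0
    for token in tokens:
        p = TOKEN_SPAM_PROB.get(token, default)  # unknown words get a neutral-ish value
        log_odds += math.log(p) - math.log(1 - p)
    return 1 / (1 + math.exp(-log_odds))

pitch = ["cheap", "viagra", "pills"]
# Essay-style padding: many tokens the filter has mostly seen in legitimate mail.
padding = ["thesis", "shakespeare", "paragraph", "analysis", "character"] * 2

print(spam_score(pitch))            # close to 1: the pitch alone looks like spam
print(spam_score(pitch + padding))  # close to 0: the padding drags the score down
```

Each innocent-looking token pulls the combined odds toward "not spam", so a long enough quote can outweigh a short pitch.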

Now all I need to do is to figure out how to use that text to detect spam. Putting it into the spam-text database probably won’t work because the text is too generic. It’s easy for a human to find the source of the text (hint: Google), but how would I create an app to automatically identify the swiped text as essay text?
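One possible approach, sketched below, is to compare word shingles (overlapping runs of k words) in an incoming message against shingles taken from known essay text. This assumes you have already collected a corpus of essay samples, which is the hard part and is not solved here. The names `shingles`, `build_index`, and `essay_overlap`, and the choice of k=5, are hypothetical.

```python
import re

def shingles(text, k=5):
    """Return the set of overlapping k-word sequences in text (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def build_index(essays, k=5):
    """Collect shingles from a corpus of known essay text (assumed to exist)."""
    index = set()
    for essay in essays:
        index |= shingles(essay, k)
    return index

def essay_overlap(message, index, k=5):
    """Fraction of the message's shingles that also appear in the essay index."""
    msg = shingles(message, k)
    if not msg:
        return 0.0  # message shorter than k words: nothing to compare
    return len(msg & index) / len(msg)

# Made-up sample data for illustration.
essays = ["The theme of ambition in Macbeth drives the tragic hero "
          "toward his own destruction and ruin"]
index = build_index(essays)

padded_spam = ("Buy cheap pills now. The theme of ambition in Macbeth drives "
               "the tragic hero toward his own destruction")
normal_mail = "Hi Bob, are we still meeting for lunch on Thursday at noon?"

print(essay_overlap(padded_spam, index))  # high: most shingles come from the essay
print(essay_overlap(normal_mail, index))  # 0.0: no shared five-word runs
```

Because a five-word run rarely repeats by chance, generic vocabulary alone does not trigger a match, which sidesteps the "too generic" problem of feeding the text into the spam-word database. A high overlap could then be used as one more signal alongside the Bayesian score rather than as a verdict on its own.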