Paul Graham has a new article out, Filters that Fight Back. In the first part of the article he basically does a "State of the Union" on spam, focusing on the techniques spammers use to try to get around spam filters, and concludes that none of them is really effective at concealing the fact that a message is spam. None of this was new to me, since I've been using POPFile [1] for a while and have been watching what goes on, but it's a great summary with actual statistics.
The focus of the article is this:
As I mentioned in Will Filters Kill Spam?, following all the urls in a spam would have an amusing side-effect. If popular email clients did this in order to filter spam, the spammer's servers would take a serious pounding. The more I think about this, the better an idea it seems. This isn't just amusing; it would be hard to imagine a more perfectly targeted counterattack on spammers.
So I'd like to suggest an additional feature to those working on spam filters: a "punish" mode which, if turned on, would retrieve whatever's at the end of every url in a suspected spam n times, where n could be set by the user.
If widely used, auto-retrieving spam filters would make the email system rebound. The huge volume of the spam, which has so far worked in the spammer's favor, will now work against him, like a branch snapping back in his face. Auto-retrieving spam filters will drive the spammer's costs up, and his sales down: his bandwidth usage will go through the roof, and his servers will grind to a halt under the load, which will make them unavailable to the people who would have responded to the spam.
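To make the idea concrete, here is a rough sketch of what a "punish" mode might look like. This is my own illustration, not code from Graham's article or from any existing filter: the names extract_urls, fetch, and punish are hypothetical, the URL pattern is deliberately simplistic, and the is_spam flag is assumed to come from whatever filter you already run.

```python
import http.client
import re
from typing import Callable, List
from urllib.request import urlopen

# Simplistic pattern for http(s) URLs; real mail would need proper MIME/HTML parsing.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(message_body: str) -> List[str]:
    """Return the distinct URLs in a message body, in order of first appearance."""
    urls: List[str] = []
    for url in URL_PATTERN.findall(message_body):
        url = url.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        if url and url not in urls:
            urls.append(url)
    return urls


def fetch(url: str, timeout: float = 5.0) -> int:
    """Retrieve a URL and return the number of bytes read."""
    with urlopen(url, timeout=timeout) as response:
        return len(response.read())


def punish(message_body: str, is_spam: bool, n: int = 1,
           fetcher: Callable[[str], object] = fetch) -> int:
    """If the message is suspected spam, retrieve each URL in it n times.

    Returns the number of retrieval attempts made (successful or not).
    """
    if not is_spam or n < 1:
        return 0
    attempts = 0
    for url in extract_urls(message_body):
        for _ in range(n):
            try:
                fetcher(url)
            except (OSError, ValueError, http.client.HTTPException):
                pass  # server is down, slow, or misbehaving; keep going
            attempts += 1
    return attempts
```

The fetcher is passed in as a parameter so the logic can be tried out without touching the network, for example punish(body, is_spam=True, n=3, fetcher=print) just prints each URL three times.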
The whole point of spam fighting is to raise spammers' costs until it is no longer economically viable for anyone to spam. Unless you do that, spam will continue. Graham's "punish" mode sounds like it might be an effective way to do that, but I'm a little concerned about the potential for abuse. We'll see what happens...
What's great, however, is that we're beginning to win the war on spam, even if you don't realize it yet. Spammers are changing their behavior, which means Bayesian filtering is having an effect. If you're not using a Bayesian filter yet, why not? Graham has a huge list of them for you to check out, as well as a link to an article comparing POPFile and SpamBayes. Finally, check out this article about how SpamBayes is about to be rolled out at Cornell. The more widely installed anti-spam software is, the more we all benefit. Go Cornell.
Footnotes:
[1]: I'm now at 99.2% accuracy, though POPFile chokes on some messages containing Chinese, and I have to change my account settings to bypass POPFile in order to get my mail. I'm not sure whether that can be fixed easily, or whether it's a problem with how Perl (or just the version of Perl POPFile ships with) handles Unicode, for example.