Paul Graham: Better Bayesian Filtering
If people had been onto Bayesian filtering four years ago, why wasn't everyone using it? When I read the papers I found out why. Pantel and Lin's filter was the more effective of the two, but it only caught 92% of spam, with 1.16% false positives.
When I tried writing a Bayesian spam filter, it caught 99.5% of spam with less than .03% false positives.
So why did we get such different numbers? I haven't tried to reproduce Pantel and Lin's results, but from reading the paper I see five things that probably account for the difference.
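The excerpt does not show either filter's internals. For readers unfamiliar with how a Bayesian spam filter turns word statistics into a verdict, here is a minimal sketch of the naive-Bayes combination step such filters commonly use. It is not Graham's or Pantel and Lin's actual code. It assumes per-token spam probabilities have already been estimated from a training corpus and clamped strictly between 0 and 1. The names `message_score` and `is_spam`, the choice of 15 "interesting" tokens, and the 0.9 threshold are all illustrative assumptions.

```python
from math import prod  # Python 3.8+


def message_score(token_probs, n_interesting=15):
    """Combine per-token spam probabilities into one message score.

    Assumes each probability lies strictly between 0 and 1, and that
    tokens are independent (the naive-Bayes assumption).
    """
    # Keep the tokens with the strongest evidence either way:
    # those whose probability is farthest from a neutral 0.5.
    chosen = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:n_interesting]
    spam = prod(chosen)                    # product of P(spam | token)
    ham = prod(1.0 - p for p in chosen)    # product of P(not spam | token)
    return spam / (spam + ham)


def is_spam(token_probs, threshold=0.9):
    # Hypothetical cutoff; raising it trades missed spam for fewer false positives.
    return message_score(token_probs) > threshold


print(is_spam([0.99, 0.99, 0.2]))   # two strong spam tokens outweigh one weak ham token
print(is_spam([0.01, 0.05, 0.3]))
```

The threshold is where the trade-off the excerpt measures shows up. Moving it changes the balance between the share of spam caught and the false-positive rate, so the choice of tokens and cutoff is one plausible source of differences between filters.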
Jon Udell: Shipping the Prototype
"Let's promote scripting languages to the status they deserve."
Check out this neat interview with Dennis Ritchie.