Since everyone has a different idea of what counts as spam, a filter backed by a single global database will inevitably misclassify some legitimate emails. Flagging even 1% of good mail is not acceptable for paying customers. So we set up a separate spam database for each mailbox and let customers train the filter themselves.
We added three buttons to the webmail interface: “Delete as Spam,” “Recover from Spam,” and “Reset Spam Filter.” The workflow resembles the SpamBayes plug-in for Outlook: users mark misclassified messages, and the filter learns from their corrections over time.
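To make the per-mailbox idea concrete, here is a minimal sketch of how the three buttons could map onto a separate database for each mailbox. It is not the production code: the class `MailboxFilter`, the `filters` registry, the tokenizer, and the naive Bayes style scoring are all assumptions chosen for illustration, and the real filter may work quite differently.

```python
import math
import re
from collections import Counter, defaultdict


class MailboxFilter:
    """Hypothetical per-mailbox token database (illustration only)."""

    def __init__(self):
        self.reset()

    def reset(self):
        # "Reset Spam Filter": forget everything learned for this mailbox.
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def _tokenize(text):
        # Assumed tokenizer: unique lowercase words per message.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def _train(self, text, label):
        self.tokens[label].update(self._tokenize(text))
        self.messages[label] += 1

    def delete_as_spam(self, text):
        # "Delete as Spam": the user says this message is spam.
        self._train(text, "spam")

    def recover_from_spam(self, text):
        # "Recover from Spam": the user says this message is legitimate.
        self._train(text, "ham")

    def spam_score(self, text):
        # Sum of per-token log-odds with add-one smoothing.
        # Positive means "looks more like this user's spam".
        score = 0.0
        for tok in self._tokenize(text):
            p_spam = (self.tokens["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.tokens["ham"][tok] + 1) / (self.messages["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return score


# One independent database per mailbox, created on first use.
filters = defaultdict(MailboxFilter)

alice = filters["alice@example.com"]
alice.delete_as_spam("cheap pills buy now")
alice.recover_from_spam("meeting notes for monday")
print(alice.spam_score("buy cheap pills") > 0)               # True
print(filters["bob@example.com"].spam_score("cheap pills"))  # 0.0, untrained
```

Because each mailbox owns its counters, one customer's corrections never change how another customer's mail is scored, and a reset affects only that mailbox.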
I deployed this solution on a production server. Below is the summary from the email server’s log for September 25:
Grand Totals
------------
messages

  87812   received
  56351   delivered
   2012   forwarded
    606   deferred (5096 deferrals)
   1342   bounced
  54150   rejected (49%)
      0   reject warnings
      0   held
      0   discarded (0%)

   3486m  bytes received
   3816m  bytes delivered
  13591   senders
   4993   sending hosts/domains
   6937   recipients
   1750   recipient hosts/domains
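A note on reading the 49% figure: it does not match rejected divided by received. The arithmetic below suggests the percentage is taken over delivered + rejected + discarded instead. That denominator is an assumption inferred from the numbers, not something the log itself states.

```python
# Counts copied from the Grand Totals above.
received = 87812
delivered = 56351
rejected = 54150
discarded = 0

# Assumed denominator: messages that ended as delivered, rejected, or discarded.
handled = delivered + rejected + discarded

pct_of_handled = int(rejected / handled * 100)    # 54150 / 110501
pct_of_received = int(rejected / received * 100)  # 54150 / 87812

print(pct_of_handled)   # 49, matches the log
print(pct_of_received)  # 61, does not match
```

Read this way, roughly half of all delivery attempts that reached a final decision that day were refused.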