SpamBayes: Per-Mailbox Training via Webmail

Since everyone has a different idea of what counts as spam, a filter backed by a single global database will inevitably misclassify some legitimate email. Even a 1% false-positive rate on good mail is unacceptable for paying customers. So we set up a separate spam database for each mailbox and let customers train the filter themselves.

We added three buttons to the webmail interface: “Delete as Spam,” “Recover from Spam,” and “Reset Spam Filter.” The workflow is similar to the SpamBayes plug-in for Outlook: users mark misclassified messages, and the filter learns from their corrections over time.
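To make the per-mailbox idea concrete, here is a minimal sketch of how the three buttons could map onto training actions. It is not the SpamBayes API and not the code running on the server: the class `MailboxFilter`, the helper functions, and the mailbox addresses are hypothetical names, and the scoring is plain naive Bayes with add-one smoothing rather than the combining scheme SpamBayes itself uses.

```python
import math
import re
from collections import Counter


class MailboxFilter:
    """Toy per-mailbox Bayesian filter (hypothetical, not the SpamBayes API)."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Forget everything this mailbox has learned.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Each distinct word counts once per message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, label):
        self.counts[label].update(self.tokens(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        # Sum per-token log odds, using add-one smoothing so that
        # unseen tokens contribute nothing extreme.
        log_odds = 0.0
        for tok in self.tokens(text):
            p_spam = (self.counts["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow in exp()
        return 1.0 / (1.0 + math.exp(-log_odds))


filters = {}  # mailbox address -> MailboxFilter, one database per mailbox


def filter_for(mailbox):
    if mailbox not in filters:
        filters[mailbox] = MailboxFilter()
    return filters[mailbox]


def delete_as_spam(mailbox, text):      # "Delete as Spam" button
    filter_for(mailbox).train(text, "spam")


def recover_from_spam(mailbox, text):   # "Recover from Spam" button
    filter_for(mailbox).train(text, "ham")


def reset_spam_filter(mailbox):         # "Reset Spam Filter" button
    filter_for(mailbox).reset()


# One user's corrections affect only that user's database.
delete_as_spam("alice@example.com", "cheap pills buy now")
recover_from_spam("alice@example.com", "meeting notes for monday")
print(filter_for("alice@example.com").spam_probability("buy cheap pills"))
print(filter_for("bob@example.com").spam_probability("buy cheap pills"))
```

Because every mailbox has its own counters, the second mailbox in the example stays neutral about a message the first one has learned to treat as spam, which is the behavior a single global database cannot provide.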

I deployed this solution on a production server. Below is the summary from the email server’s log for September 25:

Grand Totals
------------
messages

  87812   received
  56351   delivered
   2012   forwarded
    606   deferred  (5096  deferrals)
   1342   bounced
  54150   rejected (49%)
      0   reject warnings
      0   held
      0   discarded (0%)

   3486m  bytes received
   3816m  bytes delivered
  13591   senders
   4993   sending hosts/domains
   6937   recipients
   1750   recipient hosts/domains
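
A note on reading the totals: the 49% next to "rejected" does not match rejected divided by received. It appears to be the share of rejected messages among those delivered, rejected, or discarded. That reading is inferred from the numbers above, not stated in the log, and the short check below only reproduces the arithmetic.

```python
# Figures copied from the Grand Totals above.
received = 87812
delivered = 56351
rejected = 54150
discarded = 0

# Assumed basis for the reported percentage: delivered + rejected + discarded.
share_of_handled = rejected / (delivered + rejected + discarded)

# For comparison: rejected as a share of everything received.
share_of_received = rejected / received

print(f"rejected / (delivered + rejected + discarded) = {share_of_handled:.1%}")
print(f"rejected / received = {share_of_received:.1%}")
```

Under that assumption the first figure comes out at about 49.0%, in line with the log, while the second is about 61.7%.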