A spam filter is a program that tries to predict whether someone will want to read an email message when it arrives in their inbox. To make accurate predictions, spam filters need samples of wanted and unwanted email. These samples are fed into classification algorithms to fine-tune their settings as the nature of spam changes over time. A spam filter’s accuracy derives from:
- The amount of high-quality training data (i.e. messages) that is available to fine-tune its classification algorithms; and,
- The quality and sophistication of these algorithms.
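The article doesn’t name a specific algorithm, but a common textbook choice for spam classification is naive Bayes. The sketch below assumes a multinomial naive Bayes model with add-one smoothing and a crude word tokenizer. The class and method names (`NaiveBayesFilter`, `train`, `spam_probability`) are hypothetical and not taken from Rspamd, SpamAssassin, or any other product. It shows why both factors in the list matter: the algorithm fits in a few dozen lines, but every number it produces comes from the labeled messages passed to `train()`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the message and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical multinomial naive Bayes filter with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Update word and message counts from one labeled message ('spam' or 'ham')."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Return an estimate of P(spam | text) under the naive Bayes assumption."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior: share of training messages carrying this label.
            log_score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                count = self.word_counts[label][token]
                # Add-one smoothing; the extra +1 reserves a slot for unseen words.
                log_score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = log_score
        # Convert log scores back to a normalized probability without underflow.
        max_log = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - max_log)
        ham = math.exp(log_scores["ham"] - max_log)
        return spam / (spam + ham)


if __name__ == "__main__":
    f = NaiveBayesFilter()
    f.train("win a free prize now", "spam")
    f.train("claim your free money", "spam")
    f.train("meeting notes for tomorrow", "ham")
    f.train("lunch tomorrow with the team", "ham")
    print(f.spam_probability("free prize money"))
    print(f.spam_probability("team meeting tomorrow"))
```

With only four training messages the estimates are fragile; adding or mislabeling a single message shifts them noticeably, which is the data-quantity and data-quality problem the article describes.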
Good training data comes from real email traffic and feedback from real users. For example, Gmail sources training data from over a billion end users by watching how they interact with billions of email messages every day. Is a message opened? Is it reported as spam? Does the recipient reply to it? Are any links clicked?
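To make the idea concrete, here is a minimal sketch of how interaction signals like the ones above might be turned into training labels. The `Interaction` record and the labeling rules are hypothetical assumptions for illustration only; they are not how Gmail or any other provider actually derives labels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interaction:
    """User behavior observed for one delivered message (hypothetical schema)."""
    opened: bool
    reported_spam: bool
    replied: bool
    clicked_link: bool


def label_from_interaction(event: Interaction) -> Optional[str]:
    """Map observed behavior to 'spam', 'ham', or None when the signal is ambiguous.

    These rules are illustrative assumptions, not any provider's real policy.
    """
    if event.reported_spam:
        return "spam"  # An explicit spam report is treated as the strongest signal.
    if event.replied or event.clicked_link:
        return "ham"   # Assumed: engaging with a message means the user wanted it.
    return None        # Unopened or opened-and-ignored mail is too noisy to label.


if __name__ == "__main__":
    events = [
        Interaction(opened=True, reported_spam=True, replied=False, clicked_link=False),
        Interaction(opened=True, reported_spam=False, replied=True, clicked_link=False),
        Interaction(opened=False, reported_spam=False, replied=False, clicked_link=False),
    ]
    labels = [label_from_interaction(e) for e in events]
    print([label for label in labels if label is not None])
```

Real systems would need to handle harder cases, such as a user clicking a link in a phishing message, which is one reason large volumes of feedback are so valuable.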
Open-source spam filters such as Rspamd and Apache SpamAssassin have great algorithms; however, they do not come with a large base of end users whose interactions with email can be tracked the way Gmail tracks its users’ interactions. At best, open-source filters have a community-based feedback system that is used to train their algorithms. It’s difficult to ensure this community-sourced data is of high quality, never mind gathering enough of it to train the algorithms adequately.
An open-source spam filter costs nothing to license, but that doesn’t make it free in practice. Because community-sourced training data is poor in both quality and quantity, open-source spam filters often require a large amount of manual tuning. On top of this effort, you also have to run the software on your own infrastructure. Maintenance, infrastructure, and other overhead costs can easily outweigh the license cost of a good commercial filter.
If you’re a hobbyist with a few domains and the desire to run a spam filter as an educational project, open-source filters offer a great tool for learning and exploration. If you’re a service provider trying to focus on activities that generate growth, running an open-source spam filter is a poor use of your time. You’ll never have enough high-quality training data to make the filter accurate anyhow.
Use a high-quality cloud-based spam filter. Let someone else do the tweaking. Chances are they have way better training data than you and far more expertise.