Spam is one of the most serious problems plaguing Internet users today. There's nothing quite as frustrating as arriving at work each morning to a mailbox full of unwanted ads. Actually, there is one thing more frustrating: wasting the next hour deleting those ads for drugs, refinancing, and other junk you don't want or need.
Fortunately there is a cure for the spam blues. It's called SpamAssassin, and it's possibly the best tool out there to combat spam. In this guide we'll show you how it works, and then how to install and configure it for your server.
CNET Networks, the parent company of Builder.com, uses SpamAssassin for spam filtering.
How SpamAssassin works
SpamAssassin works by "scoring" each e-mail message against a range of tests designed to determine whether that message is spam. A wide range of tests is provided, including checks on whether the sender and recipient addresses are valid, whether the message dates are valid, whether the body contains any of a list of forbidden words, whether any of the sending servers are blacklisted, and so on. Each test that matches adds to a message's overall spam score; messages over a certain user-defined threshold are treated as spam and can be either trashed or marked with a special spam header.
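As a rough illustration of the scoring idea described above, here is a minimal Python sketch. The rule names, scores, and checks are invented for this example and are not SpamAssassin's real tests, and the default threshold of 5.0 is simply an assumed value.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


def split_message(raw: str) -> Tuple[List[str], str]:
    """Split a raw message into its header lines and its body text."""
    head, _, body = raw.partition("\n\n")
    return head.splitlines(), body


def body_has_forbidden_word(raw: str) -> bool:
    # Hypothetical one-word "forbidden words" list.
    _, body = split_message(raw)
    return "refinance" in body.lower()


def date_header_missing(raw: str) -> bool:
    headers, _ = split_message(raw)
    return not any(h.lower().startswith("date:") for h in headers)


def subject_all_caps(raw: str) -> bool:
    headers, _ = split_message(raw)
    for h in headers:
        if h.lower().startswith("subject:"):
            return h[len("subject:"):].strip().isupper()
    return False


@dataclass
class Rule:
    name: str                       # hypothetical test name
    score: float                    # points added when the test matches
    matches: Callable[[str], bool]  # the test itself


# Illustrative rules only; real tests and scores are far more extensive.
RULES = [
    Rule("EXAMPLE_FORBIDDEN_WORD", 2.5, body_has_forbidden_word),
    Rule("EXAMPLE_MISSING_DATE", 1.5, date_header_missing),
    Rule("EXAMPLE_SUBJECT_ALL_CAPS", 1.5, subject_all_caps),
]


def score_message(raw: str, rules: List[Rule] = RULES) -> Tuple[float, List[str]]:
    """Return the total score and the names of the rules that matched."""
    hits = [rule for rule in rules if rule.matches(raw)]
    return sum(rule.score for rule in hits), [rule.name for rule in hits]


def is_spam(raw: str, threshold: float = 5.0) -> bool:
    """Treat a message as spam once its score reaches the threshold."""
    total, _ = score_message(raw)
    return total >= threshold
```

Raising `threshold` lets more borderline mail through, while lowering it flags more aggressively. That is the same trade-off you face when tuning the real threshold.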
In addition to these tests, SpamAssassin comes with a Bayes algorithm which "learns" to recognize new spam on the basis of old spam messages. This makes it possible for the software to automatically adapt and identify spam even in the absence of specific header or body tests. A white list system makes it easy to list e-mail addresses that you know are valid; messages from these senders are exempted from filtering and get routed directly to your mailbox. In true open source spirit, it is possible to add your own custom tests, or modify the scoring rules to your own specific requirements.
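To give a feel for how a Bayes filter "learns" from old messages, here is a deliberately tiny naive Bayes sketch in Python. It is a textbook version written for illustration, not SpamAssassin's actual implementation. The class name `TinyBayes`, the whitespace tokenizer, and the add-one smoothing are all assumptions of this example.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy naive Bayes classifier over whitespace-separated tokens."""

    def __init__(self) -> None:
        # For each label, how many training messages contained each token.
        self.tokens = {"spam": Counter(), "ham": Counter()}
        # How many messages of each label have been learned.
        self.messages = {"spam": 0, "ham": 0}

    def learn(self, text: str, label: str) -> None:
        """Record one message as 'spam' or 'ham'."""
        self.tokens[label].update(set(text.lower().split()))
        self.messages[label] += 1

    def spam_probability(self, text: str) -> float:
        """Estimate the probability that the text is spam."""
        # Start from the smoothed prior odds of spam versus ham.
        log_odds = math.log(
            (self.messages["spam"] + 1) / (self.messages["ham"] + 1)
        )
        for token in set(text.lower().split()):
            # Add-one smoothing keeps unseen tokens from producing zeros.
            p_spam = (self.tokens["spam"][token] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.tokens["ham"][token] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))
```

After a few calls such as `learn("cheap drugs online", "spam")` and `learn("meeting at noon", "ham")`, new text that shares words with the spam examples scores closer to 1, and text that resembles the genuine mail scores closer to 0, even though no hand-written test mentions those words.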
SpamAssassin comes in two main flavors: an on-demand scanner, which can be invoked every time a message comes in, or a daemon which continuously runs in memory and scans all messages. This article focuses on the latter approach.
Now let's get started by looking at how to install SpamAssassin.
SpamAssassin is licensed under the GPL and its own Artistic License (though it is in the process of moving to the Apache Software Foundation, and future versions will be covered under the Apache Software License). You can download the UNIX versions here and Windows versions here. Detailed installation instructions are provided in the download archive, but by far the simplest way to install it is to use the CPAN shell:

shell> perl -MCPAN -e shell
cpan> install Mail::SpamAssassin
Note that SpamAssassin requires procmail and a relatively recent version of Perl to be installed on the system. A number of other Perl module dependencies also exist, but if you use the CPAN shell, they will usually be downloaded and installed automatically as well. The exception is a CPAN shell that has been set to ignore dependencies, in which case you'll have to install each one manually.
Typically, SpamAssassin is installed to "/usr/bin/spamassassin", although you can specify another location as well during the compilation phase if you like. If you'd like to completely customize the SpamAssassin installation—say, if you're installing it for a specific user instead of the entire domain—you should consider downloading and installing the package manually. Refer to the online documentation for details.
Once installed, you can test SpamAssassin by using it to scan the two sample messages (one genuine and one spam) that ship with the distribution:

$ /usr/bin/spamassassin -t < sample-spam.txt
$ /usr/bin/spamassassin -t < sample-nonspam.txt

SpamAssassin will print a report for each message, indicating whether or not it is spam. For messages marked as spam, it will also tell you which tests the message triggered.
Activating the SpamAssassin daemon
Once SpamAssassin has been tested, the next step is to set it up to scan incoming e-mails automatically. The most efficient way to do this is to set up the spamd/spamc system—essentially SpamAssassin in daemon mode.
Procmail is used to pass the incoming messages to spamc, which then connects to the daemon and passes it the message for processing. The spamd daemon remains active at all times and, on receiving a message, scans it and flags it appropriately.
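The hand-off described above is an ordinary pipe: the message goes to spamc on standard input, and the processed message comes back on standard output. The Python sketch below mimics that step outside procmail. The function name `filter_through` is hypothetical, and the usage line assumes `spamc` is on your PATH and spamd is already running.

```python
import subprocess
from typing import Sequence


def filter_through(command: Sequence[str], raw_message: bytes) -> bytes:
    """Pipe a raw message through a filter command and return its output.

    The command must read the message on stdin and write the (possibly
    modified) message to stdout, which is how spamc behaves.
    """
    result = subprocess.run(
        command,
        input=raw_message,
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the filter exits non-zero
    )
    return result.stdout


# Example usage (requires a running spamd):
#   processed = filter_through(["spamc"], raw_message)
```

In production, procmail does this for you. The sketch is only meant to make the data flow explicit.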
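The first task, then, is to add procmail rules to redirect incoming messages through spamc. Open up your system procmailrc recipe file, and add the following lines to the top:

DROPPRIVS=yes
:0fw
* < 256000
| /usr/bin/spamc

The size condition passes only messages smaller than 256,000 bytes to spamc.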
Next, you need to start up spamd:

$ /usr/bin/spamd &
Try sending yourself a test message and, when you receive it, check the headers—you should see one or more SpamAssassin headers attached to it. This indicates that spamd is functioning and scanning your mail as it comes in.
Now that SpamAssassin is installed and running, it's time to tweak the system configuration and figure out how to filter on your local e-mail client.
After you've got the SpamAssassin daemon up and running, there are a number of options you can tweak to make it more efficient at filtering your mail:
1. Alter the minimum threshold for mail to be flagged as spam. A higher value allows more spam through; a lower value is more aggressive at filtering spam, but has a higher risk of genuine e-mail being wrongly flagged as spam.
2. Since some spam arrives in languages you never correspond in, reduce it by specifying which languages are allowed.
3. Visibly mark each message as spam by placing a special "SPAM" flag in the subject line. This allows users to filter out those messages on the client side.
4. Activate the Bayes learning system and real-time blacklisting, so that SpamAssassin "learns" from its mistakes and also draws on real-time data gathered by the community to identify known spammers.
5. Use white lists so that genuine mail from trusted contacts is never wrongly flagged as spam.
All these settings are handled through either a sitewide configuration file or a per-user preferences file in each user's home directory. As an illustration, consider the sample configuration file shown in Listing A. It activates all of the settings described above.
You can have a custom file like Listing A created automatically by using this online configuration tool. You can obtain more information on these settings by looking in the documentation for SpamAssassin.
Filtering on the client
Normally, every message designated as spam (that is, every message with a spam score above your threshold) will be modified by spamd to include the "X-Spam-Status: Yes" header. Mail clients like Mutt (UNIX) or Microsoft Outlook and Eudora (Windows) can be configured to look for this header and shunt these messages into a separate mailbox, which you can inspect at your leisure.
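On UNIX, you can use a procmail recipe for this:

:0:
* ^X-Spam-Status: Yes

The recipe must end with an action line naming the mailbox file into which the flagged messages are delivered.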
If you've activated the "rewrite_subject" variable in the SpamAssassin configuration, messages will be further marked with the text "*****SPAM*****" in the subject line (this is how SpamAssassin is configured here at CNET). This provides a very visible cue as to the nature of the message, and again it serves as a flag for your mail client's filtering engine.
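Both cues, the header and the subject tag, are easy to check programmatically. The following Python sketch shows the kind of test a client-side filter performs. The function names and the mailbox names `inbox` and `spam` are hypothetical, chosen only for this example.

```python
from email import message_from_string

# Subject marker added when subject rewriting is enabled.
SUBJECT_TAG = "*****SPAM*****"


def is_flagged(raw_message: str) -> bool:
    """Return True if the message carries either spam marker."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    subject = msg.get("Subject", "")
    return status.strip().lower().startswith("yes") or SUBJECT_TAG in subject


def choose_mailbox(raw_message: str, inbox: str = "inbox", spam_box: str = "spam") -> str:
    """Pick the destination mailbox name for a message."""
    return spam_box if is_flagged(raw_message) else inbox
```

A message with neither marker is left in the normal inbox, so mail that bypassed the daemon is never misfiled by this check.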
For more information, consider spending some quality time with the SpamAssassin documentation, especially the SpamAssassin wiki, the list of tests used by SpamAssassin, and auto-white-listing in SpamAssassin. And if you happen to use Mutt as your e-mail reader, here's a handy guide to using SpamAssassin with Mutt.
Have fun, and here's hoping you get a cleaner mailbox!