
Spam is one of the most serious problems plaguing Internet
users today. There’s nothing quite as frustrating as arriving at work each morning
to a mailbox full of unwanted ads. Actually, there is one thing more frustrating:
wasting the next hour deleting ads for drugs, refinancing, and other junk you
don’t want or need.

Fortunately, there is a cure for the spam blues. It’s called
SpamAssassin, and it’s possibly the best tool out there to combat spam. In this
guide we’ll show you how it works, and then how to install and configure it for
your server.


Editor’s Note
CNET Networks, the parent company of Builder.com, uses
SpamAssassin for spam filtering.


How SpamAssassin works

SpamAssassin works by “scoring” each e-mail
message against a set of tests designed to identify whether that message is spam.
A wide range of tests is provided, including checks to see whether the
sender and recipient addresses are valid, whether the message dates are valid, whether the
body contains any of a list of forbidden words, whether any of the sending servers
are blacklisted, and so on. Each matching test contributes to a message’s overall spam score;
messages whose score reaches a user-defined threshold are treated as spam and can be
either trashed or marked with a special spam header.
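To make the scoring idea concrete, here is a minimal sketch of threshold-based scoring. The test names and point values below are hypothetical placeholders, not SpamAssassin’s real rules or scores, and the sketch assumes a message counts as spam once its total reaches (not merely exceeds) the threshold.

```python
# Hypothetical per-test scores; real SpamAssassin rule names and scores differ.
RULE_SCORES = {
    "INVALID_DATE": 1.5,
    "FORBIDDEN_WORD": 2.0,
    "BLACKLISTED_RELAY": 3.0,
    "BAD_SENDER": 1.0,
}


def score_message(hits, threshold=5.0):
    """Sum the scores of the tests a message triggered and compare to the threshold.

    `hits` is an iterable of test names that matched the message.
    Returns a (total_score, is_spam) tuple.
    """
    # Unknown test names contribute nothing to the total.
    total = sum(RULE_SCORES.get(name, 0.0) for name in hits)
    # Treat the message as spam once its total reaches the threshold.
    return total, total >= threshold
```

For example, a message that trips both the forbidden-word and blacklisted-relay tests totals 5.0 points and is flagged at the default threshold of 5.0 used here, while one with only a bad date totals 1.5 and gets through.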

In addition to these tests, SpamAssassin comes with a Bayesian
classifier which “learns” to recognize new spam on the basis of spam it has
already seen. This makes it possible for the software to automatically adapt
and identify spam even in the absence of specific header or body tests. A whitelist
system makes it easy to list e-mail addresses that you know are valid;
messages from these senders are exempted from filtering and get routed directly
to your mailbox. In true open source spirit, you can also add your own
custom tests or modify the scoring rules to suit your specific requirements.
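The learning and whitelisting ideas can be illustrated with a toy classifier. This is a deliberately simplified naive Bayes sketch, not SpamAssassin’s actual Bayes implementation (which uses its own tokenizer and a different method of combining token probabilities); the `TinyBayes` class and its methods are hypothetical names invented for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; SpamAssassin's real tokenizer is far more elaborate.
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyBayes:
    """Toy naive Bayes learner with a sender whitelist (illustration only)."""

    def __init__(self, whitelist=()):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of genuine messages containing it
        self.n_spam = 0
        self.n_ham = 0
        self.whitelist = {addr.lower() for addr in whitelist}

    def learn(self, text, is_spam):
        counts = self.spam if is_spam else self.ham
        counts.update(set(tokenize(text)))  # count each token once per message
        if is_spam:
            self.n_spam += 1
        else:
            self.n_ham += 1

    def spam_probability(self, sender, text):
        # Whitelisted senders skip filtering entirely.
        if sender.lower() in self.whitelist:
            return 0.0
        # Start from the (smoothed) prior odds of spam versus ham.
        log_odds = math.log((self.n_spam + 1) / (self.n_ham + 1))
        # Only tokens present in the message contribute; Laplace smoothing
        # keeps unseen tokens from producing zero probabilities.
        for tok in set(tokenize(text)):
            p_tok_spam = (self.spam[tok] + 1) / (self.n_spam + 2)
            p_tok_ham = (self.ham[tok] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_tok_spam / p_tok_ham)
        return 1 / (1 + math.exp(-log_odds))
```

After learning from a couple of spam messages about cheap drugs and a couple of genuine messages about Monday meetings, the toy model rates “cheap drugs” as likely spam and “monday meeting” as likely genuine, while anything from a whitelisted address scores 0.0 regardless of content.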

SpamAssassin comes in two main flavors: an on-demand
scanner, which can be invoked every time a message comes in, and a daemon which
continuously runs in memory and scans all messages. This article focuses on the
latter approach.

Now let’s get started by looking at how to install SpamAssassin.

Installing SpamAssassin

SpamAssassin is licensed under the GPL and the Artistic
License, the same dual license used by Perl (though it is in the process of moving
to the Apache Software Foundation, and future versions will be covered under the
Apache Software License). Both UNIX and Windows versions are available for
download. Detailed installation instructions are
provided in the download archive, but by far the simplest way to install it is
to use the CPAN shell:

shell> perl -MCPAN -e shell
cpan> install Mail::SpamAssassin

Note that the setup described here requires procmail and a relatively recent version of
Perl. SpamAssassin also depends on a number of other Perl modules, but if you use
the CPAN shell, they will usually be downloaded and installed automatically as well
(unless your CPAN shell is configured to ignore dependencies, in which case
you’ll have to install each one manually).

Typically, SpamAssassin is installed to
“/usr/bin/spamassassin”, although you can specify another location as
well during the compilation phase if you like. If you’d like to completely
customize the SpamAssassin installation—say, if you’re installing it for a
specific user instead of the entire domain—you should consider downloading and
installing the package manually. Refer to the online documentation
for details.

Once installed, you can test SpamAssassin by using it to
scan two sample messages—one genuine and one spam—that ship with the
distribution:

$ /usr/bin/spamassassin -t < sample-spam.txt
$ /usr/bin/spamassassin -t < sample-nonspam.txt

SpamAssassin will print a report for each message,
indicating whether or not it is spam. For messages marked as spam, it will also
tell you which tests matched.

Activating the SpamAssassin daemon

Once SpamAssassin has been tested, the next step is to set
it up to scan incoming e-mails automatically. The most efficient way to do this
is to set up the spamd/spamc system—essentially SpamAssassin in daemon mode.

Procmail is used to pass the incoming messages to spamc,
which then connects to the daemon and passes it the message for processing. The
spamd daemon remains active at all times and, on receiving a message, scans it
and flags it appropriately.
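Under the hood, spamc and spamd talk over a simple text protocol on TCP port 783 by default. The sketch below shows what a single CHECK exchange might look like from a client’s point of view. The exact protocol version string (`SPAMC/1.2` here) and reply format are assumptions based on the spamd protocol notes for releases of this era and may differ in yours; in practice you would simply use spamc itself, and the function names here are hypothetical.

```python
import socket


def build_check_request(message: bytes, version: str = "1.2") -> bytes:
    """Build a CHECK request (protocol version string is an assumption)."""
    header = f"CHECK SPAMC/{version}\r\nContent-length: {len(message)}\r\n\r\n"
    return header.encode("ascii") + message


def parse_check_response(reply: bytes):
    """Parse a reply such as 'SPAMD/1.1 0 EX_OK' plus 'Spam: True ; 15.0 / 5.0'.

    Returns an (is_spam, score, threshold) tuple.
    """
    lines = reply.decode("ascii", "replace").split("\r\n")
    status = lines[0].split()
    # A non-zero status code means spamd reported an error.
    if len(status) < 2 or status[1] != "0":
        raise RuntimeError(f"spamd error: {lines[0]}")
    for line in lines[1:]:
        if line.startswith("Spam:"):
            verdict, _, numbers = line[len("Spam:"):].partition(";")
            score, _, threshold = numbers.partition("/")
            # Accept "True" and, defensively, "Yes" as the positive verdict.
            return verdict.strip() in ("True", "Yes"), float(score), float(threshold)
    raise ValueError("no Spam: header in spamd reply")


def check_with_spamd(message: bytes, host="127.0.0.1", port=783):
    """Send one message to a running spamd and return its verdict."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_check_request(message))
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # spamd closes the connection after replying
                break
            chunks.append(data)
    return parse_check_response(b"".join(chunks))
```

Keeping request building and reply parsing as separate functions makes it easy to see the protocol without a live daemon; only `check_with_spamd` actually needs spamd running.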

The first task, then, is to add procmail rules to redirect
incoming messages through spamc. Open your system-wide procmail recipe file
(usually /etc/procmailrc) and add the following lines to the top:

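# Drop any special privileges so spamc runs as the recipient
DROPPRIVS=yes

# Filter (f) each message through spamc and wait (w) for it to finish;
# only messages smaller than 256,000 bytes are scanned
:0fw
* < 256000
| /usr/bin/spamc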
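If procmail recipes are unfamiliar, the following sketch mirrors what the `:0fw` recipe does for each message: skip anything 256,000 bytes or larger, pipe the rest through spamc, and replace the message with spamc’s output only if the filter succeeds. The function name is hypothetical, and the sketch assumes spamc lives at /usr/bin/spamc as in the recipe.

```python
import subprocess

SIZE_LIMIT = 256000  # mirrors the "* < 256000" condition in the recipe


def filter_through_spamc(raw: bytes, command=("/usr/bin/spamc",)) -> bytes:
    """Mimic the :0fw recipe: pipe small messages through a filter command."""
    if len(raw) >= SIZE_LIMIT:
        return raw  # condition fails: message is delivered untouched
    result = subprocess.run(list(command), input=raw, stdout=subprocess.PIPE)
    if result.returncode != 0:
        return raw  # 'w' flag: a failed filter leaves the original message alone
    return result.stdout  # 'f' flag: the filter's output replaces the message
```

The size cap keeps very large messages (typically big attachments, which are rarely spam) from tying up the daemon.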

Next, you need to start up spamd:

$ /usr/bin/spamd &

Try sending yourself a test message and, when you receive
it, check the headers—you should see one or more SpamAssassin headers attached
to it. This indicates that spamd is functioning and scanning your mail as it
comes in.
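If you’d rather check those headers programmatically, a short script can pull the verdict out of the X-Spam-Status header. This sketch assumes the header looks like “Yes, hits=15.1 required=5.0 tests=...” (older releases) or uses “score=” in place of “hits=” (newer ones); the function name is hypothetical.

```python
import re
from email import message_from_string

# Matches e.g. "Yes, hits=15.1 required=5.0" or "No, score=-0.3 required=5.0".
STATUS_RE = re.compile(r"^(Yes|No),\s*(?:hits|score)=(-?[\d.]+)\s+required=(-?[\d.]+)")


def spam_status(raw_message: str):
    """Return (is_spam, score, required) from X-Spam-Status, or None if absent."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Spam-Status")
    if header is None:
        return None  # the message was never scanned by SpamAssassin
    # The header may be folded over several lines; collapse whitespace first.
    header = " ".join(str(header).split())
    match = STATUS_RE.match(header)
    if not match:
        return None
    verdict, score, required = match.groups()
    return verdict == "Yes", float(score), float(required)
```

Running this over a message you just sent yourself is a quick way to confirm that spamd is actually scoring mail, not just passing it through.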

Now that SpamAssassin is installed and running, it’s time to
tweak the system configuration and figure out how to filter on your
local e-mail client.

Tweaking SpamAssassin

After you’ve got the SpamAssassin daemon up and running, there
are a number of options you can tweak to make it more effective at filtering
your mail:

1. Alter the minimum threshold for mail to be flagged as
spam. A higher value allows more spam through; a lower value is more aggressive
at filtering spam, but has a higher risk of genuine e-mail being wrongly flagged
as spam.

2. Because some spam arrives in foreign languages, you can reduce
it by specifying which languages are acceptable.

3. Visibly mark each message as spam by placing a special
“SPAM” flag in the subject line. This allows users to filter out
those messages on the client side.

4. Activate the Bayes learning system and real-time
blacklist checks so that SpamAssassin “learns” from its mistakes and
also draws on real-time data gathered by the community to identify known
spammers.

5. Use whitelists so that genuine mail from trusted
contacts is never wrongly flagged as spam.
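The threshold trade-off in the first item is easy to measure if you have a batch of messages whose scores you know and which you have labeled by hand. The sketch below counts missed spam and false positives at a given threshold; the function name and the sample data are hypothetical.

```python
def evaluate_threshold(scored, threshold):
    """Count errors for a spam threshold.

    `scored` is a list of (score, is_really_spam) pairs, for example scores
    collected from hand-labeled mail. Returns (missed_spam, false_positives).
    """
    # Spam that scored below the threshold slips through.
    missed = sum(1 for score, spam in scored if spam and score < threshold)
    # Genuine mail that reached the threshold is wrongly flagged.
    false_pos = sum(1 for score, spam in scored if not spam and score >= threshold)
    return missed, false_pos


# Hypothetical labeled scores, for illustration only.
sample = [(12.0, True), (6.1, True), (4.2, True), (3.5, False), (0.4, False), (-1.0, False)]
for t in (3.0, 5.0, 8.0):
    print(t, evaluate_threshold(sample, t))
```

With this made-up data, raising the threshold from 3.0 to 8.0 trades one false positive for two missed spam messages, which is exactly the tension described above.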

All these settings are handled through either a site-wide
configuration file (typically /etc/mail/spamassassin/local.cf) or a per-user
preferences file in each user’s home directory (~/.spamassassin/user_prefs).
As an illustration, consider the sample configuration file shown in Listing A,
which activates all of the settings described above.

You can have a custom file like Listing A created automatically
by using an online configuration tool. You can obtain more information on
these settings by looking in the documentation for SpamAssassin.

Filtering on the client

Normally, every message designated as spam (that is,
messages with a spam score at or above your threshold) will be modified by spamd to
include the “X-Spam-Status: Yes” header. Mail clients like Mutt (UNIX)
or Microsoft Outlook/Eudora (Windows) can be configured to look for this header
and shunt these messages into a separate mailbox, which you can inspect at your
leisure.

On UNIX, you can use a procmail recipe for this:

# File messages flagged by SpamAssassin into the junk-mail folder
:0:
* ^X-Spam-Status: Yes
junk-mail
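The recipe above sorts mail at delivery time. If you already have an mbox full of scanned mail, a small script can apply the same rule after the fact. This is a sketch using Python’s standard mailbox module; the function name and file paths are hypothetical, and it assumes both files are in mbox format.

```python
import mailbox


def sort_spam(inbox_path, junk_path):
    """Move messages whose X-Spam-Status starts with "Yes" into another mbox.

    Returns the number of messages moved.
    """
    inbox = mailbox.mbox(inbox_path)
    junk = mailbox.mbox(junk_path)  # created if it does not exist yet
    inbox.lock()
    junk.lock()
    moved = 0
    try:
        for key, msg in list(inbox.items()):
            status = str(msg.get("X-Spam-Status", ""))
            if status.startswith("Yes"):  # same test as "^X-Spam-Status: Yes"
                junk.add(msg)
                inbox.remove(key)
                moved += 1
        # Write the junk folder first so no message is lost if something fails.
        junk.flush()
        inbox.flush()
    finally:
        junk.unlock()
        inbox.unlock()
        junk.close()
        inbox.close()
    return moved
```

Locking both files guards against a mail delivery agent writing to the inbox while the script rewrites it.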

If you’ve activated the “rewrite_subject” variable
in the SpamAssassin configuration, messages will be further marked with the
text “*****SPAM*****” in the subject line (this is how SpamAssassin
is configured here at CNET). This provides a very visible cue as to the nature
of the message, and again it serves as a flag for your mail client’s filtering
engine.

For more information, consider spending some quality time
with the SpamAssassin documentation, especially the SpamAssassin wiki,
the list of tests used by SpamAssassin, and the documentation on
auto-whitelisting in SpamAssassin. And if you happen to use Mutt as your
e-mail reader, there is also a handy guide to using SpamAssassin with Mutt.

Have fun, and here’s hoping you get a cleaner mailbox!