Spam continues to fill up mail servers. At one point, it was simply a hassle; now, it's becoming a severe problem. With spammers getting cleverer, prevention is a must for anyone running a mail server. Fortunately for the Linux community, an outstanding spam-filtering system is available: SpamAssassin. Here's how it works.
What's SpamAssassin?
SpamAssassin is a mail filter that attempts to identify spam using a variety of mechanisms, including text analysis, Bayesian filtering, DNS blocklists, and collaborative filtering databases. SpamAssassin itself does not delete spam, route spam and ham to separate mailboxes or folders, or send bounces when you receive spam; it scores and tags mail, and other tools act on the result.
SpamAssassin offers the following features:
- Wide-spectrum: SpamAssassin uses a wide variety of local and network tests to identify spam signatures. This makes it harder for spammers to identify one aspect they can craft their messages to work around.
- Free software: Distributed under the same terms and conditions as other popular open-source software packages, such as the Apache Web server.
- Easy to extend: Anti-spam tests and configuration are stored in plain text, making it easy to configure and add new rules.
- Flexible: SpamAssassin encapsulates its logic in a well-designed, abstract API, so it can be integrated anywhere in the e-mail stream. The Mail::SpamAssassin classes can be used on a wide variety of e-mail systems, including procmail, sendmail, Postfix, and qmail, among others.
- Easy configuration: SpamAssassin requires very little configuration. You won't need to continually update it with details of your mail accounts or mailing list memberships. Once mail is classified, site- and user-specific policies can be applied against spam. Policies can be applied both on mail servers and later in the user's own mail user-agent application.
SpamAssassin is generally thought of as one of the best spam filters available. This article will walk you through installing, configuring, and using this powerful tool.
Getting and installing SpamAssassin
As with any Linux application, there are numerous ways to install SpamAssassin. Here's the short list:
- Debian unstable:
apt-get install spamassassin
- Gentoo:
emerge mail-filter/spamassassin
- Fedora:
yum install spamassassin
If you prefer to install everything from source, download the archive file from the SpamAssassin Web site. With the file in place, enter the following commands at a console prompt:
untar/unzip the file
cd into the newly created directory
perl Makefile.PL
[OPTION: Add -DSPAMC_SSL to $CFLAGS to build an SSL-enabled spamc]
make
make install [as root]
There are quite a few distribution-specific and dependency-specific rules within the top-level INSTALL file. Make sure you read that file in its entirety before installing.
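If you would rather script the build than type each step, the sequence above can be sketched roughly as follows. This is only an outline under a few assumptions: the archive is a gzipped tarball that unpacks into a single directory, the function name build_spamassassin is my own invention, and the final install step is deliberately left for you to run by hand as root.

```shell
#!/bin/sh
# Sketch of a source build. The helper name is hypothetical, and the
# tarball path is whatever file you actually downloaded.
build_spamassassin() (
    tarball="$1"
    [ -f "$tarball" ] || { echo "no such file: $tarball" >&2; exit 1; }

    # Unpack into a scratch directory so nothing else is disturbed.
    workdir=$(mktemp -d) || exit 1
    tar -xzf "$tarball" -C "$workdir" || exit 1

    # Assumes the archive unpacks into exactly one top-level directory.
    cd "$workdir"/*/ || exit 1

    perl Makefile.PL || exit 1
    make || exit 1
    make test || exit 1

    echo "Build finished in $PWD; run 'make install' there as root."
)

# Usage: build_spamassassin /path/to/the/downloaded/archive.tar.gz
```

Because the function body runs in a subshell, a failed step stops the build without closing your terminal session or changing your working directory.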
Installing for system-wide use
One of the best reasons for installing system-wide is that you won't need to alter each user's .procmailrc file. That could become a major headache, depending on how many users you have. Since this process could theoretically destroy all users' e-mail, it's smart to get it working in a test-bed environment first.
Configuration
Like most Linux applications, SpamAssassin requires the editing of a configuration file. This file resides in /etc/mail/spamassassin/ and is named local.cf. Before you hand-edit your configuration file, note that Michael Moncur has written an outstanding tool to help create your local.cf file. This tool only works for version 3.x.
The tool is a web-based form whose options generate your configuration file. After selecting the few simple options Mr. Moncur provides, press Generate to produce a file like this:
# Generated by http://www.yrex.com/spam/spamconfig.php (version 1.50)
# How many hits before a message is considered spam.
required_score 7.5
# Change the subject of suspected spam
rewrite_header subject *****SPAM*****
# Encapsulate spam in an attachment (0=no, 1=yes, 2=safe)
report_safe 1
# Enable the Bayes system
use_bayes 1
# Enable Bayes auto-learning
bayes_auto_learn 1
# Enable or disable network checks
skip_rbl_checks 0
use_razor2 1
use_dcc 1
use_pyzor 1
# Mail using languages used in these country codes will not be marked
# as being possibly spam in a foreign language.
ok_languages all
# Mail using locales used in these country codes will not be marked
# as being possibly spam in a foreign language.
ok_locales all
Let's take a look at what we see.
- Score Threshold: The lower the threshold, the less spam slips through. Default is 6 in the tool (SpamAssassin's own built-in default for required_score is 5.0). Be warned: if you set it too low, legitimate e-mails will get flagged as spam.
- Rewrite Message Subjects: With this, you can configure SpamAssassin to prefix the subject line of suspected spam with whatever text you choose. Default is set to: *****SPAM*****
- Encapsulate Spam In Attachment: If you select this, the suspect e-mail will be added as an attachment instead of inline. Default is set to not attach.
- Enable Bayes System: This allows you to enable the "Bayesian" analysis system to determine whether each message is spam based on previous examples of spam and non-spam. Default is set to enable.
- Use Auto Learning: SpamAssassin can automatically train its Bayes database by analyzing messages whose scores strongly suggest they are spam or non-spam.
- Enable RBL Checks: Choose whether SpamAssassin should use RBLs (DNS Blacklists). These can help detect difficult spam, but they require some time, network bandwidth, and an available DNS server.
- Use Network Checksum Tests: Choose whether to use the services that compare message checksums to known spam: Vipul's Razor 2.x, DCC, and Pyzor. These will only work when the client software for each service is installed. (use_razor2, use_dcc, use_pyzor)
- Languages: The last two settings (ok_languages and ok_locales) cover languages and locales, the first being which languages should be accepted without being flagged as foreign. The default is all languages. I would leave this as is.
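Whether you generate local.cf or edit it by hand, it is worth checking it for mistakes before you start the daemon. The sketch below wraps SpamAssassin's --lint option, which parses the configuration and reports problems; the function name check_sa_config and its messages are my own and not part of SpamAssassin.

```shell
#!/bin/sh
# Check the SpamAssassin configuration for syntax errors before
# (re)starting spamd. The helper name and messages are illustrative.
check_sa_config() {
    if ! command -v spamassassin >/dev/null 2>&1; then
        echo "spamassassin not found in PATH" >&2
        return 2
    fi

    # --lint parses the site and user configuration and reports problems.
    if spamassassin --lint; then
        echo "configuration OK"
    else
        echo "configuration has errors; rerun with: spamassassin -D --lint" >&2
        return 1
    fi
}

# Usage: check_sa_config && echo "safe to start spamd"
```

Running the check as the same user that spamd runs as should give the most representative result, since user-level preferences are read as well.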
If you use Mr. Moncur's application to generate your .cf file, save that file in /etc/mail/spamassassin/, and start the SpamAssassin application. To get SpamAssassin running, issue this command (as root):
/etc/rc.d/init.d/spamassassin start
Note: Depending upon your distribution, the SpamAssassin init script might be located in /etc/init.d/ instead.
Once it is up and running, you will want to make sure spamd starts at each boot. You can use the system-config-services application and check the spamassassin option. If you don't have that application available, you can add one of the following lines to your /etc/rc.local file:
/etc/rc.d/init.d/spamassassin start
or
/etc/init.d/spamassassin start
depending on where your SpamAssassin init script is located.
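If you manage several machines and are not sure which of the two locations applies, a small helper can pick the right one for you. This is a minimal sketch; the function name find_init_script is hypothetical, and the actual start command is left commented out so nothing runs until you are ready.

```shell
#!/bin/sh
# Print the first path among the arguments that exists and is executable.
find_init_script() {
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage sketch (run as root): locate the script in either location.
if script=$(find_init_script /etc/rc.d/init.d/spamassassin /etc/init.d/spamassassin); then
    echo "Found init script: $script"
    # "$script" start    # uncomment to actually start spamd
fi
```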
Working with procmail
Now that your system is up and running, you have to set it up to work with your MDA (Mail Delivery Agent). I'm going to assume you are working with procmail, as this is the most widespread MDA in the Linux environment.
You are going to edit the /etc/procmailrc file and add the following:
DROPPRIVS=yes
:0fw
| /usr/bin/spamc
Now procmail is set up to pass incoming mail through SpamAssassin, which scores and tags it.
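A quick way to confirm that spamc and spamd are talking to each other is to feed them a message containing the GTUBE test string, which SpamAssassin is designed to score as spam. The sketch below builds such a message; the addresses are placeholders and the function name make_gtube_message is my own.

```shell
#!/bin/sh
# Build a minimal test message containing the GTUBE string.
# The From/To addresses are placeholders.
make_gtube_message() {
    cat <<'EOF'
From: sender@example.com
To: recipient@example.com
Subject: GTUBE test

XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X
EOF
}

# If spamc is installed and spamd is running, the filtered message should
# come back carrying X-Spam-* headers that show how it was scored.
if command -v spamc >/dev/null 2>&1; then
    make_gtube_message | spamc | grep -i '^X-Spam-' \
        || echo "no X-Spam headers found; is spamd running?"
fi
```

If no headers appear, spamc most likely could not reach spamd and passed the message through unchanged.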
Blacklisting with spamd
We all know there are certain domains/users that spam. Fortunately for us, SpamAssassin has a means to handle known spammers. With the help of blacklisting, SpamAssassin takes another step forward as the best and last line of defense.
Setting up a blacklist is simple. There are two configuration files you can add a blacklist to: /etc/mail/spamassassin/local.cf (for site-wide use) or each user's own ~/.spamassassin/user_prefs. Blacklist lines look like this:
blacklist_from sample_email@sampledomain.com
blacklist_from *@sampledomain.com
The above should be fairly obvious. You can either configure exact e-mail addresses (as in sample_email@sampledomain.com), or you can configure entire domains (as in *@sampledomain.com).
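If you add entries often, a small helper keeps the file tidy by skipping lines that are already there. This is a sketch only: the function name add_blacklist_entry is hypothetical, and it simply appends plain blacklist_from lines of the form shown above.

```shell
#!/bin/sh
# Append a blacklist_from entry to a SpamAssassin preferences file unless
# the exact line is already present. The helper name is illustrative.
add_blacklist_entry() {
    prefs_file="$1"
    pattern="$2"
    line="blacklist_from $pattern"

    mkdir -p "$(dirname "$prefs_file")" || return 1
    touch "$prefs_file" || return 1

    # -x matches whole lines only; -F treats the text literally,
    # so the '*' wildcard is not interpreted as a regular expression.
    if ! grep -qxF "$line" "$prefs_file"; then
        printf '%s\n' "$line" >> "$prefs_file"
    fi
}

# Usage: add_blacklist_entry "$HOME/.spamassassin/user_prefs" '*@sampledomain.com'
```

Quote the pattern as in the usage line so the shell does not try to expand the asterisk.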
If you don't want to take the time to edit a blacklist on your own, you can download an up-to-date blacklist from William Stearns. This list is huge, so beware when downloading and adding it to your blacklist.
Training SpamAssassin
There's a chance that SpamAssassin might be scoring e-mail incorrectly. If that's the case, you can train SpamAssassin with your own e-mail.
To do so, you will need an SSH client (like ssh or PuTTY) and SpamAssassin's sa-learn program. Your e-mail server must also be set to IMAP to train SpamAssassin.
To train SpamAssassin, follow these steps:
- Separate your spam from ham into separate mailboxes.
- Open your ssh application and connect to your mail server (for example, jupiter.gac.edu).
- Run these two commands:
sa-learn --ham --progress --mbox Mail/nameOfYourHamMailbox
sa-learn --spam --progress --mbox Mail/nameOfYourSpamMailbox
Once you have trained SpamAssassin with more than 200 spam and 200 ham messages, it will start to use that information to help it determine what is and is not spam.
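To see how close you are to that threshold, you can ask sa-learn for its database summary with --dump magic and pull out the two counters. The sketch below assumes the usual output layout, in which the count is the third column and the line ends in nspam or nham; the function name bayes_counts is my own.

```shell
#!/bin/sh
# Extract the learned spam/ham message counts from the output of
# `sa-learn --dump magic`, read on stdin. Assumes the count sits in the
# third column and the line's last word is "nspam" or "nham".
bayes_counts() {
    awk '$NF == "nspam" { spam = $3 }
         $NF == "nham"  { ham = $3 }
         END { printf "spam=%d ham=%d\n", spam, ham }'
}

# Only query the database when sa-learn is actually installed.
if command -v sa-learn >/dev/null 2>&1; then
    sa-learn --dump magic | bayes_counts
fi
```

Run it as the same user whose mail you trained with, since each user can have a separate Bayes database.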
Final thoughts
SpamAssassin is one of those programs you should not be without. Running a mail server in a company setting can be a nightmare when (not if) spam starts pouring through your pipes.
Considering how important a spam front line is and how easy SpamAssassin is to use, there is no reason why you shouldn't be deploying this gem on your Linux mail servers. Your users will thank you when they notice their mailboxes are spam-free.
Full Bio
Jack Wallen is an award-winning writer for Techrepublic and Linux.com. As an avid promoter/user of the Linux OS, Jack tries to convert as many users to open source as possible. His current favorite flavor of Linux is Bodhi Linux (a melding of Ubuntu and Enlightenment). When Jack isn't writing about Linux he is hard at work on his other writing career -- writing about zombies, various killers, super heroes, and just about everything else he can manipulate between the folds of reality. You can find Jack's books on Amazon, Barnes & Noble, and Smashwords. Outnumbered in his house one male to two females and three humans to six felines, Jack maintains his sanity by riding his mountain bike and working on his next books. For more news about Jack Wallen, visit his website Get Jack'd.
