
A simple email filter: getlessmail

TechRepublic writer Sterling Camden created a configurable email spam filter of his own. Here are the details on how you can get it and configure it for your own needs.


Fellow TechRepublic contributor Sterling "Chip" Camden has recently made the move from MS Windows to FreeBSD for his primary workstation OS. Pretty much every time I talk to him these days, he feels compelled to share how much he's enjoying the benefits of his new working environment, and every time it puts a smile on my face, hearing about the joy of discovery.

He has also sought my advice about software choices on the new platform, system configuration, and various other matters, knowing that I use FreeBSD as my own primary workstation OS. While his ultimate choices don't strictly mirror mine, I see from what he's selecting that many of his tastes are running in directions very similar to my own. He is using XMonad as his window manager where I'm using AHWM, but they are both primarily keyboard driven, and XMonad is something I've been meaning to try out myself (some day).

One area where he has ended up using the same software that I use is email. We are both using a mail setup that involves Mutt as the Mail User Agent (MUA), sSMTP (Simple SMTP) as the SMTP client, and getmail as the POP client, on our laptops.

Both of us are quite wary of the problem of false positives when dealing with spam email. It would be great to never get another piece of spam in my inbox, but the danger of false positives -- of legitimate email that I actually want to get being misidentified as spam by an email filter -- is enough to make us both shy away from most spam filtering software.

Mutt makes it so easy to deal with email en masse that, considering my IT Security writing commitment at TechRepublic, it actually makes sense for me to get some spam in my inbox just so I can see current spam trends. I skim through my email to see what kinds of spam and phishing email have started to appear, then use Mutt's powerful, vaguely vi-like sorting and management functionality to eliminate large numbers of spam emails very quickly. It still requires my direct intervention, though, so it comes as no surprise that Sterling chose another approach, one that minimizes the amount of spam he sees as much as is reasonably possible.

Sterling's approach was to write a spam filter of his own. In his own words, from the Chip's Tips article "Script email filtering with Ruby":

I've used all sorts of email filters since my very first internet email account in the early 90s -- and none of them have been quite right. I'd like to be able to block anything about Viagra, but not when a friend or family member uses the word. Pure Bayesian filters always seem to block something from someone I know, while letting a few of the real spam messages through. But whitelists and blacklists suffer from a "which rule comes first" problem.

The result of his decision to write his own spam filter is called "getlessmail", because it was originally designed to work with getmail. His approach involved creating an embedded domain-specific language (EDSL) that is used to configure a filter for his own purposes. This also means that others who want to use getlessmail can use this same EDSL to create simple configurations for email filtering that are easy to compose and read. The sample configuration he provides is:

  keep if from "mybestfriend@example.com"
  spam if from "@example.com"
  spam if subject "viagra|cialis"
  spam if body "(?m:\bnude\b.*\bpics\b)"
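
As a rough illustration of why rules like these can be plain Ruby, here is a minimal sketch of an embedded DSL that accepts rules of this shape. It is not Sterling's implementation: the `RuleSet` class, the hash-shaped message, the "first matching rule wins" ordering, and the default of keeping unmatched mail are all assumptions made for the example. One detail worth knowing: inside a double-quoted Ruby string, `\b` is a backspace character rather than a regex word boundary, so this sketch single-quotes the last pattern.

```ruby
# Hypothetical sketch of an embedded rule DSL; not getlessmail's actual code.
class RuleSet
  # message is a hash with :from, :subject and :body strings (an assumed shape).
  def initialize(message)
    @message = message
    @verdict = nil
  end

  # Predicates: each treats its argument as a case-insensitive regular expression.
  def from(pattern)
    match?(:from, pattern)
  end

  def subject(pattern)
    match?(:subject, pattern)
  end

  def body(pattern)
    match?(:body, pattern)
  end

  # Actions: the first action that fires decides the verdict (an assumption).
  def keep
    @verdict ||= :keep
  end

  def spam
    @verdict ||= :spam
  end

  # Evaluate the rules text as Ruby with this object as self, so that
  # `spam if from "..."` parses as `spam() if from("...")`.
  def evaluate(rules)
    instance_eval(rules)
    @verdict || :keep # unmatched mail is kept (an assumption)
  end

  private

  def match?(field, pattern)
    Regexp.new(pattern, Regexp::IGNORECASE).match?(@message[field].to_s)
  end
end

# The sample rules, with the last pattern single-quoted so \b stays a word boundary.
RULES = <<~'RULES'
  keep if from "mybestfriend@example.com"
  spam if from "@example.com"
  spam if subject "viagra|cialis"
  spam if body '(?m:\bnude\b.*\bpics\b)'
RULES

# Classify one message against the sample rules; returns :keep or :spam.
def classify(message)
  RuleSet.new(message).evaluate(RULES)
end
```

Under these assumptions, a message from mybestfriend@example.com is kept even if its subject mentions viagra, because the `keep` rule fires before any `spam` rule gets a chance.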

I have skimmed the README, and it looks like a quite capable little tool. I will probably even use it for one of the email accounts for which I use Mutt as my MUA. I do not need to get spam and phishing samples in multiple email accounts, after all. It appears to be better suited for some types of email accounts (a private account for which only known entities have the address) than for others (a public account where any random person on the Internet might have a legitimate email to send you), of course -- but that appears to be a problem that no email filter has yet solved. In addition to its other benefits, this email filter is even distributed under the terms of the Open Works License (OWL), a copyfree license, which my regular readers should recognize as my choice for the right licensing model for security software.

The Chip's Tips article about getlessmail offers, at this time, a bzipped tarball download that includes documentation in a README file, a license.txt file with the text of the OWL in it, a sample getmailrc file demonstrating how to configure getmail to use getlessmail as an email filter, and of course the getlessmail.rb program file itself.

With Sterling's blessing, I have created a BitBucket project for getlessmail where he can manage the project and anyone with the Mercurial (also known as hg) version control system installed can use it to mirror the repository to hack on it or use as he or she desires. He tells me that he "will have some supporting scripts and updates" for getlessmail in the future, and those should become available within the Mercurial repository at BitBucket as he releases them.

About

Chad Perrin is an IT consultant, developer, and freelance professional writer. He holds both Microsoft and CompTIA certifications and is a graduate of two IT industry trade schools.

21 comments
Rikker000

Can he come up with something for Outlook?

Elvis.GodZilla.777

Thanks Sterling "Chip" Camden! I love a good Mutt! What a great idea! Sorry for shouting...

joel

(Quick initial caveat: I'm using MS Outlook but the filtering concepts are/could be made mail client-agnostic.) I don't get loads of spam, per se, but I do get a lot of trivial, interruptive mail that I want to separate out from important email. To do this, I've created a macro-based filter that checks mail as it comes in. If it's from someone in my address book or a member of my work domain, then it's left in the inbox. Otherwise it's moved to a secondary location. This way I can deal with email from unrecognized senders in a batch as I have the time. So far, it seems to help me be more productive.

Sterling chip Camden

Yes, I do use sSMTP for outbound. And I hope to add those scripts to the repository today or tomorrow.

apotheon

He probably could, but I don't think he's going to. He wrote it to scratch his itch, and his itch is in the realm of a FreeBSD and Mutt user, not an MS Windows and Outlook user, these days.

michel

Thank you, thank you, thank you! This is great stuff!

ian

I think now is a good time to install FreeBSD on Virtual PC. Thanks for reminding me.

Sterling chip Camden

... and I built that capability into getlessmail as well (although it assumes you store mail in mbox format):

  move "ReadNow" if from "TechRepublic"

Note how the "from" can be anything in the "From:" header, not just the email address.

amgillard

This is a great idea, Sterling; however, you are really implementing it at the wrong end of the chain. By the time the e-mail has hit the Inbox on my PC, it has already been downloaded and hence:

- reduced my bandwidth capacity
- increased the risk of deploying any embedded malicious payloads
- identified my e-mail account as valid (i.e. no bounce-back)

While there are definite benefits for these tools in better managing the e-mails I want to receive at the End-Node in the chain (i.e. my PC), it would be much better to have this capability provided at the ISP level. What sort of interest can you generate for your tool at that level?

apotheon

I edited the article before publication to leave out the uncertainty about whether you were using sSMTP, but appear to have made some kind of error in the process of getting the edit to take effect. Now that it has been published with both that and another error, I made sure the edits got added properly this time, and all should be well. I'm looking forward to seeing your updates.

Sterling chip Camden

as well as the Outlook object model, both of which make me feel very weighed down. I have done a lot of work with Outlook automation in the past, but as apotheon notes I would need some special motivation to do more in the future.

Sterling chip Camden

As Nick Bradbury says, solving one's own problem often produces something that's useful to others.

apotheon

Sterling's getlessmail should work well in any Unix-like environment where the proper tools exist, including various Linux distributions, Solaris, MacOS X, and even Cygwin (on MS Windows). I won't swear it will work in all those environments, because I'm not sure of the availability of some of these tools on all those platforms, but in general it should be portable across such Unix-like platforms. Thus, for instance, you could make use of Sterling's getlessmail toolset on an Ubuntu system, if you happen to be using Ubuntu -- without having to start using FreeBSD.

Don't take this as any kind of discouragement from using FreeBSD, though. In fact, I think that picking up a BSD Unix system for general use is an awesome idea, and I encourage it wholeheartedly. Not only are OSes like FreeBSD, NetBSD, and OpenBSD very well-designed systems with a lot of technical benefits that do not exist elsewhere in quite the same measure, but they are also standardized on copyfree licensing, rather than copyright or copyleft. This, to me, is the most important reason to pick up an open source BSD Unix system: to pick an OS whose core licensing model is copyfree, and thus the most ethically sound licensing model of the three major options (where "public domain" is sort of like a special case of copyfree licensing policy).

It's just a happy accident that, for most purposes, I believe the various major BSD Unix systems are technically superior to the various more popular options, as well as being ethically superior. It probably helps that copyfree licensing helps improve the technical superiority of software if it's handled correctly.

Sterling chip Camden

However, I'm not running my own SMTP/POP server, so the point at which the mail is extracted from the POP server is my earliest opportunity to insert code. A hook like this would be nice at the server layer, though.

Sterling chip Camden

To replace my Windows server with a FreeBSD server. The only thing I haven't researched yet is support for domain logins. Once I figure that out, it's "goodbye, Windows 2008." Perhaps I'll set up the FBSD server in parallel with the 2008 server to make migration easier -- but that means buying some hardware, or else running it in a VM.

apotheon

It took a little bit to start noticing how the apparently minor differences in the system (from Linux-based OSes) add up in terms of the major differences in what it's like to admin the system. Once it started becoming clear, though, the end result was that it was increasingly obvious how FreeBSD enabled what I wanted to do and, while I was doing it, got the heck out of my way. That's available to various extents in some Linux distributions, of course, but I've seen none that achieve that level of enabling and elimination of obstacles that I get with FreeBSD. There's really no going back for me at this point.

I don't have nearly the depth of experience with other BSD Unix systems that I have with FreeBSD. What little experience I have with them, though, and what I know about them -- particularly the similarities in project management styles -- suggests that the situation is similar. One of these days, I'll probably find myself setting up and managing an OpenBSD firewall. I guess that'll probably be confirmation of what I already suspect -- that OpenBSD (and NetBSD) can provide the same, or even a better, sysadmin experience.

Considering my desire for a generally useful desktop system, though, I don't foresee giving up FreeBSD for my primary workstation OS any time soon. Of course, I'm sure OpenBSD and NetBSD are at least as well-suited to many common tasks for work and play, but some of those tasks that are less friendly to open source platforms are likely to be less easily accessible than on FreeBSD. Even FreeBSD, as mainstream-friendly as it is amongst BSD Unix systems, is short on a few minor points compared to some Linux distributions. They're not points that actually get in my way, but cut a few more points off to get to something like NetBSD or OpenBSD and suddenly I'm not so sure I'd have everything I wanted for a primary desktop OS. Maybe if I find a need for a suitable type of specialized secondary workstation, something like OpenBSD can fill that need. I'll still be doing most of my Web browsing (among other things) on FreeBSD though, I'm sure.

Sterling chip Camden

I knew I would like it, having worked on various *n*x systems off and on for more than 25 years. But FreeBSD seems to me the cleanest and most transparent one yet. The community provides awesome support. When I'm on FreeBSD, I feel like I can do anything.

Sterling chip Camden

For adding entries, and another one to pipe a message from mutt and add the sender. See the README for details.

AlexNagy

If I can ever figure out what's wrong with Big Blue, I have at least 10K email in one account alone to run it through (unfortunately it's already in Thunderbird folders) for some more testing. That's cool. Is there any way to adjust the rules aside from manually (not that it would be a problem)?

Sterling chip Camden

But my sample size is probably too small. The way I have it set up, anything labeled spam goes to a spam folder, where I can review it before deleting.

AlexNagy

How many false positives does your hook create? If it's even 1 out of a million, I'd still rather look at what's been labeled as spam before blacklisting it (if I do that at all). While the parent makes a good point about embedded execution of malicious code, even a well patched Windows box can mitigate those risks.