Fighting SPAM and Spam Assassin

It is pretty clear to anyone living on the same planet I do that spam has become a big enough problem that some kind of automated method is needed to deal with it. Since I live exclusively in a Linux world, my mail is handled by sendmail and procmail, and SpamAssassin is a readily available anti-spam solution.

This short note deals entirely with how to set up SpamAssassin (and procmail) to deal with spam effectively.

At the time of this writing, I am running Red Hat Fedora Core 3 and using sendmail (the stock setup) as my MTA. SpamAssassin comes as a regular package and can be started as a service. However, when I type "service spamassassin status" I get the message "spamd is stopped", and when I look in /etc/rc.d/rc5.d there is no start entry for spamassassin. To enable it at boot in runlevels 3 and 5 and start it right away, I use the commands:

chkconfig --level 35 spamassassin on
service spamassassin start

A peek at /etc/rc.d/init.d/spamassassin shows that this starts up /usr/bin/spamd and gets its configuration from /etc/sysconfig/spamassassin. Spamd is run with the options -d -c -m5 -H, which run it as a daemon (-d) with at most 5 child processes (-m5), or 6 processes in all counting the parent. The sysconfig file is just a way to override the options for spamd.
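
As a sketch of what such an override might look like, assuming the init script reads a shell variable named SPAMDOPTIONS from the sysconfig file (typical of Red Hat packages of this era, but worth confirming in /etc/rc.d/init.d/spamassassin), the file could hold the stock options with each flag annotated:

```shell
# /etc/sysconfig/spamassassin (sketch; the variable name SPAMDOPTIONS is an
# assumption, check the init script to confirm it)
#   -d   run as a daemon
#   -c   create per-user preference files if they are missing
#   -m5  allow at most 5 child processes
#   -H   give helper applications a home directory to work in
SPAMDOPTIONS="-d -c -m5 -H"
```

Raising the -m value lets more messages be scanned at once at the cost of memory. The service has to be restarted (service spamassassin restart) before a change takes effect.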

How does spamc get invoked to filter the mail, then? It used to be automagic, but the magic all stopped when I did a full install of Fedora Core 4. The missing magic is /etc/procmailrc, which looks like this:

DROPPRIVS=yes

:0fw
* < 256000
| /usr/bin/spamc
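
In this recipe the f flag marks the pipe as a filter, w makes procmail wait for spamc to finish, and the condition limits filtering to messages smaller than 256000 bytes. One way to confirm that mail really is passing through spamd is to look for the X-Spam-Status header that SpamAssassin adds. The following is only a minimal sketch: has_spam_status is a helper name made up for this note, and sample.eml stands for any saved message.

```shell
# Succeeds if the message on stdin carries an X-Spam-Status header,
# i.e. it has passed through SpamAssassin. Only the header block
# (everything up to the first blank line) is examined.
has_spam_status() {
    sed '/^$/q' | grep -qi '^X-Spam-Status:'
}

# Try it against the running spamd, if spamc is installed and a
# sample message (hypothetical file name) is available.
if command -v spamc >/dev/null 2>&1 && [ -r sample.eml ]; then
    if spamc < sample.eml | has_spam_status; then
        echo "spamd is tagging mail"
    else
        echo "no X-Spam-Status header; is spamd running?"
    fi
fi
```

The check is worth doing because spamc, by default, hands the message back unchanged when it cannot reach spamd, so a stopped daemon fails silently rather than losing mail.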

/usr/bin/spamassassin and /usr/bin/spamd are both perl scripts, and they get their configuration from files in /usr/share/spamassassin and /etc/mail/spamassassin. The files in /usr/share/spamassassin should not be modified (they get overwritten when you upgrade to a new version of spamassassin). The files in /etc/mail/spamassassin can be fiddled with to tailor SpamAssassin for a particular site. Also, ~/.spamassassin has files unique to each user, including a per-user white list, if desired. SpamAssassin has an auto-white-list algorithm that can be turned on via the use_auto_whitelist option.
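
As an illustration of the per-user white list, entries take the form of whitelist_from lines in ~/.spamassassin/user_prefs. The sketch below appends one without creating duplicates; add_whitelist_entry is a hypothetical helper written for this note, and the address is a placeholder.

```shell
# Append "whitelist_from ADDRESS" to a SpamAssassin prefs file unless the
# same line is already present. add_whitelist_entry is a made-up helper.
add_whitelist_entry() {
    prefs_file=$1
    address=$2
    mkdir -p "$(dirname "$prefs_file")"
    touch "$prefs_file"
    # -x matches whole lines only, -F treats the pattern as a fixed string
    if ! grep -qxF "whitelist_from $address" "$prefs_file"; then
        printf 'whitelist_from %s\n' "$address" >> "$prefs_file"
    fi
}

# Example (placeholder address):
# add_whitelist_entry "$HOME/.spamassassin/user_prefs" friend@example.com
```

The same whitelist_from line placed in /etc/mail/spamassassin/local.cf would apply site-wide instead of to one user.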

/usr/bin/spamc is a compiled client program. You have your choice of using the "old way", i.e. just invoking /usr/bin/spamassassin from the command line, or the "new way" of running spamd in the background and invoking spamc to move mail through the background spamd. The new way is a lot more efficient if you process a lot of mail, as it avoids starting a perl interpreter for every mail message.

The files in ~/.spamassassin include the Bayesian profile information for a particular user. These are databases built and maintained by /usr/bin/sa-learn (see man sa-learn).
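
As a rough sketch of how those databases get trained, assuming known spam and known good mail have been saved in two mbox files (the paths in the usage line are placeholders), sa-learn can be pointed at each in turn. train_bayes is a made-up helper name; --spam, --ham, and --mbox are standard sa-learn options.

```shell
# Train the Bayesian database from two mbox files. SA_LEARN can be
# overridden to point at a different sa-learn binary.
SA_LEARN=${SA_LEARN:-sa-learn}

train_bayes() {
    spam_mbox=$1
    ham_mbox=$2
    # Learn the spam first; only go on to the ham if that succeeded.
    "$SA_LEARN" --spam --mbox "$spam_mbox" &&
    "$SA_LEARN" --ham --mbox "$ham_mbox"
}

# Example (placeholder paths):
# train_bayes "$HOME/Mail/spam" "$HOME/Mail/inbox"
```

Afterwards, sa-learn --dump magic prints a summary of the database, including how many spam and ham messages it has learned from.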

SpamAssassin is also available as a perl module for inclusion in other perl code via Mail::SpamAssassin. See man 3 Mail::SpamAssassin for more information.

The files in /usr/share/spamassassin are more interesting; here we begin to see some of the rules that make SpamAssassin do what it does. All of them caution that local modifications should NOT be made to these files (which get overwritten whenever a new version of spamassassin comes along), and that local changes should go in /etc/mail/spamassassin/local.cf instead.

Have any comments? Questions? Drop me a line!

Adventures in Computing / tom@mmto.org