This is the fourth part of a five-part tutorial that will show you how to install a full-featured email server on your Raspberry Pi. This tutorial covers how to mark emails as spam with Spamassassin.
The parts are:
The Introduction & Contents Page (read first)
Raspberry Pi Email Server Part 1: Postfix
Raspberry Pi Email Server Part 2: Dovecot
Raspberry Pi Email Server Part 3: Squirrelmail
Raspberry Pi Email Server Part 4: Spam Detection with Spamassassin
Raspberry Pi Email Server Part 5: Spam Sorting with LMTP & Sieve
Intro
I don’t actually get very many spam emails (famous last words, right?) but the occasional email gets past my helo access restrictions list (discussed in Raspberry Pi Email Server Part 1: Postfix). So, I decided to set up Spamassassin, a program that will check incoming emails and mark them as spam if they look suspicious. Spamassassin is pretty clever: it uses Bayesian filtering to decide what’s spam and what’s not, and it learns from previous results, so it gets more accurate over time if you correct it when it gets things wrong. Spamassassin will only mark emails as spam; it will not sort them into folders for you. We’ll be doing the sorting with Dovecot’s Local Mail Transfer Protocol (LMTP) and the Sieve plugin in the next tutorial: Raspberry Pi Email Server Part 5: Spam Sorting with LMTP & Sieve. Let’s get started:
Installing & Configuring Spamassassin
First, install Spamassassin:
sudo apt-get update
sudo apt-get install spamassassin
Now we need to edit values in the file /etc/spamassassin/local.cf. Some of these may already be set, in which case you can leave them as they are; add or amend the others as necessary. This one will add the spam score to the subject line of emails that Spamassassin considers to be spam:
rewrite_header Subject [***** SPAM _SCORE_ *****]
Spamassassin will also flag spam emails with “X-Spam-Flag: YES” in the headers. This flag is what we will eventually use to sort emails with; the rewritten subject line is purely to make the score easier to see. This next setting will tell Spamassassin to modify headers only, without making any changes to the body of the email:
report_safe 0
This one lowers the threshold for mail to be considered spam from 5 to 2. You can change this later if you get lots of false positives, but it’s nice to have some emails set off the rules to begin with, just so you know it’s working:
required_score 2.0
This tells Spamassassin to use Bayesian filtering:
use_bayes 1
This turns on automatic learning:
bayes_auto_learn 1
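Once the file has been edited, it can help to confirm which values actually ended up active. The following is a minimal sketch and not part of Spamassassin: show_setting is a hypothetical helper that prints the last uncommented value of a key in a local.cf-style file, on the assumption that a later line overrides an earlier one. The demo runs on a throwaway copy of the settings above, so it does not touch /etc.

```shell
#!/bin/sh
# show_setting FILE KEY
# Hypothetical helper: print the value of the last uncommented line in FILE
# whose first word is KEY (assumes a later line overrides an earlier one).
show_setting() {
    awk -v key="$2" '
        $1 == key { $1 = ""; sub(/^[ \t]+/, ""); value = $0 }
        END { if (value != "") print value }
    ' "$1"
}

# Demo on a throwaway copy of the settings from this section.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# rewrite_header Subject *****SPAM*****
rewrite_header Subject [***** SPAM _SCORE_ *****]
report_safe 0
required_score 2.0
use_bayes 1
bayes_auto_learn 1
EOF

for key in rewrite_header report_safe required_score use_bayes bayes_auto_learn; do
    printf '%s = %s\n' "$key" "$(show_setting "$sample" "$key")"
done
rm -f "$sample"
```

On the Pi itself you would point the helper at /etc/spamassassin/local.cf. For a real syntax check, Spamassassin's own sudo spamassassin --lint is the tool to use; this sketch only lists values.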
Now edit /etc/default/spamassassin and set:
ENABLED=1
You can now start the spamassassin daemon:
sudo service spamassassin start
If you are using a modern Debian derivative (Jessie or later), the init system has changed to systemd. You need to run this additional command to enable spamassassin, which will cause it to automatically start when you boot:
sudo systemctl enable spamassassin
Instructing Postfix to use Spamassassin
At this stage, the Spamassassin daemon is running but none of your incoming emails are being passed through it. We need to edit this line in /etc/postfix/master.cf (just under the headers):
smtp      inet  n       -       -       -       -       smtpd
  -o content_filter=spamassassin
And append this to the bottom of that same file, which will pipe the output back to Postfix using the Postfix’s Sendmail compatibility interface:
spamassassin unix - n n - - pipe user=debian-spamd argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}
Note: this is all one line, even if it appears wrapped in your browser. Now restart postfix:
sudo service postfix restart
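Postfix re-reads master.cf on restart, so a formatting slip in the lines just edited is enough to stop it coming back up. As a rough pre-check, here is a sketch that only approximates Postfix's own parser. It assumes that a line starting with whitespace continues the previous entry and that every other non-comment line needs at least eight fields; check_master_cf is a hypothetical helper name.

```shell
#!/bin/sh
# check_master_cf FILE
# Hypothetical helper: print "line N: only M fields" for service lines that
# look too short. Lines beginning with whitespace are treated as
# continuations of the previous entry; blank lines and comments are skipped.
check_master_cf() {
    awk '
        /^[ \t]*$/ || /^[ \t]*#/ { next }   # blank line or comment
        /^[ \t]/ { next }                   # continuation line
        NF < 8 { printf "line %d: only %d fields\n", NR, NF }
    ' "$1"
}

# Demo: the -o option has lost its leading whitespace, so it is read as a
# new (and far too short) service entry.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# service type  private unpriv  chroot  wakeup  maxproc command + args
smtp      inet  n       -       -       -       -       smtpd
-o content_filter=spamassassin
EOF
check_master_cf "$sample"
rm -f "$sample"
```

Run against the real file with check_master_cf /etc/postfix/master.cf; no output means nothing looked suspicious to this simplified check.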
If you get an error like this:
[....] Stopping Postfix Mail Transport Agent: postfix/usr/sbin/postconf: fatal: file /etc/postfix/master.cf: line 22: bad field count
postfix/postfix-script: fatal: cannot execute /usr/sbin/postconf!
failed!
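…then check the whitespace before the -o content_filter=spamassassin option. If the option is on its own line, that line must begin with whitespace so that Postfix treats it as a continuation of the smtp entry. I can’t quite remember what I did, but I think I changed tabs to spaces or the other way round, and then restarted postfix. Now watch the mail log with this command: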
tail -f /var/log/mail.log
…and send a test email. You should see something like this:
Jan 8 22:21:18 samhobbs postfix/smtpd[952]: connect from blu0-omc2-s3.blu0.hotmail.com[65.55.111.78]
Jan 8 22:21:19 samhobbs postfix/smtpd[952]: 542E83F519: client=blu0-omc2-s3.blu0.hotmail.com[65.55.111.78]
Jan 8 22:21:19 samhobbs postfix/cleanup[957]: 542E83F519: message-id=
Jan 8 22:21:19 samhobbs postfix/qmgr[941]: 542E83F519: from= , size=1579, nrcpt=1 (queue active)
Jan 8 22:21:19 samhobbs spamd[445]: spamd: connection from localhost [127.0.0.1] at port 35680
Jan 8 22:21:19 samhobbs postfix/smtpd[952]: disconnect from blu0-omc2-s3.blu0.hotmail.com[65.55.111.78]
Jan 8 22:21:19 samhobbs spamd[445]: spamd: setuid to debian-spamd succeeded
Jan 8 22:21:19 samhobbs spamd[445]: spamd: creating default_prefs: /var/lib/spamassassin/.spamassassin/user_prefs
Jan 8 22:21:19 samhobbs spamd[445]: config: created user preferences file: /var/lib/spamassassin/.spamassassin/user_prefs
Jan 8 22:21:19 samhobbs spamd[445]: spamd: processing message for debian-spamd:111
Jan 8 22:21:24 samhobbs spamd[445]: spamd: clean message (0.0/2.0) for debian-spamd:111 in 5.0 seconds, 1541 bytes.
Jan 8 22:21:24 samhobbs spamd[445]: spamd: result: . 0 - HTML_MESSAGE,MSGID_FROM_MTA_HEADER scantime=5.0,size=1541,user=debian-spamd,uid=111,required_score=2.0,rhost=localhost,raddr=127.0.0.1,rport=35680,mid= ,autolearn=ham
Jan 8 22:21:24 samhobbs postfix/pickup[940]: D83DE3F521: uid=111 from=
Jan 8 22:21:24 samhobbs postfix/pipe[958]: 542E83F519: to= , relay=spamassassin, delay=5.7, delays=0.44/0.05/0/5.2, dsn=2.0.0, status=sent (delivered via spamassassin service)
Jan 8 22:21:24 samhobbs postfix/qmgr[941]: 542E83F519: removed
Jan 8 22:21:24 samhobbs postfix/cleanup[957]: D83DE3F521: message-id=
Jan 8 22:21:24 samhobbs postfix/qmgr[941]: D83DE3F521: from= , size=1890, nrcpt=1 (queue active)
Jan 8 22:21:25 samhobbs postfix/local[964]: D83DE3F521: to= , relay=local, delay=0.2, delays=0.06/0.1/0/0.03, dsn=2.0.0, status=sent (delivered to maildir)
Jan 8 22:21:25 samhobbs postfix/qmgr[941]: D83DE3F521: removed
Jan 8 22:21:25 samhobbs spamd[439]: prefork: child states: II
So the steps you can see here are:
- Outlook server connects to RasPi/Postfix on port 25
- Postfix accepts the message and hands it to Spamassassin to process
- Spamassassin decides the message is clean and marks it as HAM
- The email is passed back from Spamassassin to Postfix and delivered to the inbox
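The verdict lines in a log like the one above can be pulled out on their own, which is handy once real traffic starts flowing. This is a minimal sketch: spamd_verdicts is a hypothetical helper, and the "identified spam" wording for messages over the threshold is an assumption about spamd's log format, since only "clean message" appears in the excerpt above.

```shell
#!/bin/sh
# spamd_verdicts: read mail.log lines on stdin and print
# "<verdict> <score> <threshold>" for each spamd verdict line.
# The "identified spam" wording is an assumption; only "clean message"
# appears in the log excerpt in this tutorial.
spamd_verdicts() {
    sed -nE 's#.*spamd: (clean message|identified spam) \((-?[0-9.]+)/([0-9.]+)\).*#\1 \2 \3#p'
}

# Demo using two lines from the log above; only the second is a verdict.
spamd_verdicts <<'EOF'
Jan 8 22:21:19 samhobbs spamd[445]: spamd: processing message for debian-spamd:111
Jan 8 22:21:24 samhobbs spamd[445]: spamd: clean message (0.0/2.0) for debian-spamd:111 in 5.0 seconds, 1541 bytes.
EOF
```

With the function defined in your shell, something like sudo grep spamd /var/log/mail.log | spamd_verdicts gives a quick list of scores against the threshold.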
Training Spamassassin
We’ve deliberately set the score limit for spam to a low value. This inevitably means we’ll get some false positives, but we can use these to train Spamassassin and make it better. First, some things to understand about the Maildir format we’re using. Here’s what my structure looks like:
admin@samhobbs ~ $ sudo ls -al /home/sam/Maildir/
total 604
drwx------ 12 sam sam   4096 Mar  6 14:55 .
drwxr-xr-x  3 sam sam   4096 Mar  5 23:07 ..
drwx------  2 sam sam  36864 Mar  6 12:59 cur
-rw-------  1 sam sam  11920 Mar  6 04:14 dovecot.index
-rw-------  1 sam sam 415744 Mar  6 14:50 dovecot.index.cache
-rw-------  1 sam sam  10332 Mar  6 13:08 dovecot.index.log
-rw-------  1 sam sam  32784 Mar  5 16:22 dovecot.index.log.2
-rw-------  1 sam sam     30 Jan 13 22:30 dovecot-keywords
-rw-------  1 sam sam    144 Mar  3 17:49 dovecot.mailbox.log
-rw-------  1 sam sam  27138 Mar  6 09:27 dovecot-uidlist
-rw-------  1 sam sam      8 Mar  5 23:07 dovecot-uidvalidity
-r--r--r--  1 sam sam      0 Nov 23 22:55 dovecot-uidvalidity.52913278
drwx------  5 sam sam   4096 Mar  5 22:36 .Drafts
drwx------  5 sam sam   4096 Mar  4 21:53 .foo
drwx------  5 sam sam   4096 Mar  3 17:49 .INBOX.foo
drwx------  2 sam sam   4096 Mar  6 09:37 new
drwx------  5 sam sam   4096 Mar  5 22:36 .Sent
drwx------  5 sam sam   4096 Mar  6 14:37 .Spam
-rw-------  1 sam sam     37 Mar  3 17:49 subscriptions
drwx------  5 sam sam   4096 Nov 27 19:00 .Templates
drwx------  2 sam sam   4096 Mar  6 09:27 tmp
drwx------  5 sam sam   4096 Mar  6 04:08 .Trash
You can see I’ve created a couple of test folders here: one top level folder called “foo” and another subfolder in the inbox also called “foo” (.INBOX.foo). Each folder has three subdirectories: new for new (unread) emails, cur for emails that have been read, and tmp for temporary storage during delivery. You can read more about this on the Dovecot Wiki. So, the important thing to take away from this is that HAM emails are stored here: /home/username/Maildir/cur
…and SPAM emails will be stored here (after sieve has been configured): /home/username/Maildir/.Spam/cur
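Before training, it can be useful to see how many messages each of those folders actually holds. The following is a small sketch that assumes the Maildir layout shown above; count_maildir is a hypothetical helper name, and the demo uses a throwaway directory rather than a real mailbox.

```shell
#!/bin/sh
# count_maildir DIR
# Hypothetical helper: count message files in DIR/cur and DIR/new.
# A missing subdirectory counts as empty rather than raising an error.
count_maildir() {
    total=0
    for sub in cur new; do
        if [ -d "$1/$sub" ]; then
            n=$(find "$1/$sub" -type f | wc -l)
            total=$((total + n))
        fi
    done
    echo "$total"
}

# Demo on a throwaway Maildir with two read messages and one unread one.
demo=$(mktemp -d)
mkdir -p "$demo/cur" "$demo/new" "$demo/tmp"
touch "$demo/cur/msg1" "$demo/cur/msg2" "$demo/new/msg3"
count_maildir "$demo"    # prints 3
rm -rf "$demo"
```

On the Pi you would run it as root (the Maildir is only readable by its owner) against /home/username/Maildir for ham and /home/username/Maildir/.Spam for spam. A count of 0 for the spam folder is expected until that folder exists.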
Spamassassin has a commandline training tool that is invoked like this:
sa-learn --no-sync [--spam or --ham] [folder/{cur,new}]
Each user has its own spamassassin database, which is located in the user's home directory in a hidden folder (.spamassassin). By default, the sa-learn command trains the database in the home directory of the user running the command, and since the spamassassin pipe we set up processes email as the user debian-spamd, we need to make sure we train the database in debian-spamd's home directory (which is /var/lib/spamassassin - you can check by looking in /etc/passwd). Unfortunately, if you run the command as debian-spamd using sudo -u debian-spamd command, you won't have read permissions for your emails.
Here’s the plan: move any false positives back into the inbox with your email client, and move any missed spam into the spam folder. Then run these three commands using sudo, so you have permission to read your emails and write to the spamassassin database, and use the --dbpath option to specify which database to write to:
# Scan HAM
sudo sa-learn --dbpath /var/lib/spamassassin/.spamassassin/ --no-sync --ham /home/username/Maildir/{cur,new}
# Scan SPAM
sudo sa-learn --dbpath /var/lib/spamassassin/.spamassassin/ --no-sync --spam /home/username/Maildir/.Spam/{cur,new}
# sync the journal and databases
sudo sa-learn --dbpath /var/lib/spamassassin/.spamassassin/ --sync
On my Pi, running the HAM command took about 5 minutes to process ~500 messages, with WordPress running at the same time. If you’re sure you will always move emails into the correct folders, you could add these commands to a cron job so that they run regularly and keep everything up to date. Alternatively, you can just run the commands when you notice a few false positives or missed spam emails. Over time, your spam filter will get better and better.
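To see whether training is accumulating, the database counters can be inspected with sa-learn --dump magic. The sketch below only parses that output. bayes_counts is a hypothetical helper, the column layout it relies on (count in the third column, label ending in nspam or nham) is an assumption about the dump format, and the sample numbers are made up for illustration. The counts matter because, by default, Bayes scoring is not applied until the database has learned from 200 ham and 200 spam messages.

```shell
#!/bin/sh
# bayes_counts: read `sa-learn --dump magic` output on stdin and print
# "nspam=<n> nham=<n>". Assumes the count sits in the third column and the
# line ends with the label nspam or nham.
bayes_counts() {
    awk '
        $NF == "nspam" { spam = $3 }
        $NF == "nham"  { ham = $3 }
        END { printf "nspam=%d nham=%d\n", spam, ham }
    '
}

# Demo with made-up numbers in the assumed layout.
bayes_counts <<'EOF'
0.000          0          3          0  non-token data: bayes db version
0.000          0         12          0  non-token data: nspam
0.000          0        497          0  non-token data: nham
EOF
```

With the function defined in your shell, the real counters would come from sudo sa-learn --dbpath /var/lib/spamassassin/.spamassassin/ --dump magic | bayes_counts.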
Automated learning using a script
If you don't want to run the commands manually all the time, you can use this simple cron job I wrote. The cron job runs as root, so you don't need the sudo part we used earlier. Create the script like this:
sudo nano /etc/cron.daily/spamassassin-learn
Now copy and paste this into the file (ctrl + shift + v to paste in nano):
#!/bin/bash
# Script by Sam Hobbs, see the following URL for updates:
# https://samhobbs.co.uk/2014/03/raspberry-pi-email-server-part-4-spam-detection-with-spamassassin

# redirect output and errors to logfile
# (stdout is redirected first, so that 2>&1 sends stderr to the log as well)
exec >> /var/log/spamassassin.log 2>&1

NOW=$(date +"%Y-%m-%d")

# Headers for log
echo ""
echo "#================================ $NOW ================================#"
echo ""

# learn HAM
echo "Learning HAM from Inbox"
sa-learn --dbpath /var/lib/spamassassin/.spamassassin/ --no-sync --ham /home/sam/Maildir/{cur,new}

# learn SPAM
echo "Learning SPAM from Spam folder"
sa-learn --dbpath /var/lib/spamassassin/.spamassassin/ --no-sync --spam /home/sam/Maildir/.Spam/{cur,new}

# Synchronize the journal and databases.
echo "Syncing"
sa-learn --dbpath /var/lib/spamassassin/.spamassassin/ --sync
Important: edit the paths so that they match your username! If you want to scan ham and spam for all users (this only works if you trust all users to be sensible and move ham/spam to the right folder) then replace the username "sam" with a glob ("*"), i.e.:
sa-learn --dbpath /var/lib/spamassassin/.spamassassin/ --no-sync --ham /home/*/Maildir/{cur,new}
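One wrinkle with the glob is that not every user will necessarily have a Spam folder yet. As a sketch of one way around that (list_learn_dirs is a hypothetical helper, not something sa-learn provides), the function below prints only the cur and new directories that really exist, so that missing folders are skipped instead of being handed to sa-learn:

```shell
#!/bin/sh
# list_learn_dirs BASE KIND
# Hypothetical helper: for every BASE/<user>/Maildir, print the cur and new
# directories that actually exist. KIND is "ham" (inbox) or "spam" (.Spam).
list_learn_dirs() {
    case $2 in
        ham)  suffix="" ;;
        spam) suffix="/.Spam" ;;
        *)    echo "usage: list_learn_dirs BASE ham|spam" >&2; return 1 ;;
    esac
    for maildir in "$1"/*/Maildir; do
        for sub in cur new; do
            if [ -d "$maildir$suffix/$sub" ]; then
                echo "$maildir$suffix/$sub"
            fi
        done
    done
    return 0
}

# Demo: sam has a Spam folder, amy does not, so only sam's is listed.
demo=$(mktemp -d)
mkdir -p "$demo/sam/Maildir/cur" "$demo/sam/Maildir/.Spam/cur" "$demo/amy/Maildir/cur"
list_learn_dirs "$demo" spam
rm -rf "$demo"
```

In the cron script, each path printed by list_learn_dirs /home spam could then be passed to the --spam command in place of the glob, and likewise for ham.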
Now make the script executable:
sudo chmod +x /etc/cron.daily/spamassassin-learn
The script will learn from ham/spam daily, and write a log file at /var/log/spamassassin.log. Make sure you move any spam you find into your spam folder, and any false positives back into your inbox. Don't worry if ham is accidentally marked as spam one day and gets "learned": if you move the messages to their correct locations, then the next time the script runs spamassassin will correct itself.
What’s next?
We’re now done with Spamassassin. The only thing left to do is find a way to sort spam emails directly into the spam folder, which is covered in the next tutorial: Raspberry Pi Email Server Part 5: Spam Sorting with LMTP & Sieve. Feel free to leave a comment to let me know how you get on!
Comments
Spamassassin guide -- few questions
Hi,
This is very good manual!
Have installed Spamassassin 3.4.2 flawlessly. Though in fact it works a bit differently, than it's described here:
-- SPAM _SCORE_ is missing in the local.cf. There is only *****SPAM*****. Should I use it anyway? I have these lines in a SPAM email now:
Subject: *****SPAM***** [original subject goes here]
X-Spam-Level: ********
Should I use _SCORE_ somewhere?
-- You tell that report_safe 0 will just modify the header, leaving the body as is. Nevertheless AFAICS, the SPAM email is converted into a preview with the original message in attachment. What's wrong here?
-- You tell that required_score 2.0 is a good initial value. OK, I've set it. Now I see these lines in a SPAM email's header:
X-Spam-Status: Yes, score=8.8 required=5.0 tests=BASE64_LENGTH_79_INF,
RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_RP_RNBL,RCVD_IN_SBL_CSS,RCVD_IN_XBL,
URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.2
Where is my "2.0", which I've set? Also, would it be good to add "autolearn yes" and "autolearn_force=yes"?
-- Concerning training (the final part): it's not too clear whether should I create .Spam folder manually or not. At present there is no such subdirectory within mail account directory. Also, only the home/Maildir is described. It would be good to see at least few words about virtual users. This is just my case. I can suppose that everything is said here, also could be applied to the virtual account. It would be just good to add such phrase.
Thank you.
Incorrect link
I do not know if you are aware but the above link to LMTP does not take you there.
The link in question is ": Raspberry Pi Email Server Part 5: Spam Sorting with LMTP & Sieve".
If clicked one ends up at "Kodi server part 7: Firewall Rules "