This week’s topic is partly a natural follow-up to my previous post, “8 ways to use email header info and how to extract it”, and partly a reaction to one of the comments it received:
With the advent of Gmail and the ongoing market dominance of Outlook in the corporate arena, this tip is all but useless
I strongly disagree, for the reasons given in that thread and because, even if you use Gmail or a similar service, it is still absolutely necessary to keep a private, complete backup of all your email, either on a server of yours or on a local hard drive. As they say, I may be paranoid, but that doesn’t mean nobody is out to get me (in this case, by closing my webmail account). Therefore, this week I’ll explain a FOSS way to automatically copy a bunch of mailboxes to other IMAP/Maildir ones. Before that, however, I need to give an approximate answer to the question…
What are IMAP and Maildir?
The Internet Message Access Protocol (IMAP) gives efficient access to email that stays stored on a remote server: the client can fetch only the headers or a single part of a message, and can flag or delete messages on the server without downloading them. If you access your remote inbox through an IMAP server, for example, you won’t need to download a whole uninteresting message or a big attachment just to delete it.
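To make this concrete, here is a minimal sketch that uses curl’s IMAP support instead of a full email client. It assumes a curl built with IMAP/IMAPS support; IMAP_HOST, IMAP_USER and IMAP_PASS are placeholder variables, and imap_headers and imap_delete are hypothetical helper names. The first function asks the server only for the header section of one message (by UID); the second flags a message as deleted and expunges it, so its body never travels over the network.

```shell
#!/bin/bash
# Sketch: inspect or delete a message over IMAP without downloading it whole.
# IMAP_HOST, IMAP_USER and IMAP_PASS are placeholders: set them before use.

# Print only the header section of the INBOX message whose UID is $1
imap_headers() {
    curl --silent --user "$IMAP_USER:$IMAP_PASS" \
         --url "imaps://$IMAP_HOST/INBOX/;UID=$1/;SECTION=HEADER"
}

# Flag INBOX message number $1 (a sequence number, not a UID) as deleted,
# then expunge the mailbox; the message body is never transferred
imap_delete() {
    curl --silent --user "$IMAP_USER:$IMAP_PASS" \
         --url "imaps://$IMAP_HOST/INBOX" \
         --request "STORE $1 +FLAGS \\Deleted" &&
    curl --silent --user "$IMAP_USER:$IMAP_PASS" \
         --url "imaps://$IMAP_HOST/INBOX" \
         --request "EXPUNGE"
}

# Usage (needs a reachable server and real credentials):
#   IMAP_HOST=imap.example.com IMAP_USER=bob IMAP_PASS=secret
#   imap_headers 42
#   imap_delete 3
```

Each curl call opens its own IMAP session, which keeps the sketch short but is slower than a real client that reuses one connection.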
For the purpose of this post, mailbox formats can be divided into two categories. In the first one (mbox and its derivatives), all the messages of each mailbox are written one after another in one single file. If Bob’s “home” email directory is /home/bob/Mail and he has three mailboxes called Work, Family and Friends, a listing of that directory will show three files with the same names:
#> ls -l /home/bob/Mail
Family
Friends
Work
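Since an mbox is just one text file, the boundaries between messages are marked by lines beginning with “From ” (followed by a space, not a colon). A minimal sketch, assuming the program that wrote the file escaped body lines starting with “From ” as “>From ”, which is what mbox writers normally do; count_mbox_messages and the demo file are hypothetical:

```shell
#!/bin/bash
# Count the messages in an mbox file by counting its "From " separator lines.
# Assumes body lines starting with "From " were escaped as ">From ".
count_mbox_messages() {
    grep -c -e '^From ' -- "$1"
}

# Hypothetical demo mailbox with two messages
demo_dir=$(mktemp -d)
cat > "$demo_dir/Family" <<'EOF'
From alice@example.com Mon Jan  1 10:00:00 2024
From: alice@example.com
Subject: Hello

Hi Bob!
>From the kitchen, with love.

From carol@example.com Tue Jan  2 11:00:00 2024
From: carol@example.com
Subject: Dinner

See you at eight.
EOF
count_mbox_messages "$demo_dir/Family"   # prints 2
```

This is also why every change to an mbox rewrites (part of) one big file: all messages live inside it.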
The other class of formats, instead, uses directories as mailboxes, normally with subdirectories, and stores each email in a separate file. The most popular representative of this category, unsurprisingly called Maildir, uses three subfolders per mailbox directory, called new, cur (current) and tmp (temporary). If Bob used Maildir, he’d have this folder structure inside /home/bob/Mail:
.Family/new
.Family/cur
.Family/tmp
.Friends/new
.Friends/cur
.Friends/tmp
.Work/new
.Work/cur
.Work/tmp
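where newly delivered messages wait in “new” until an email client notices them, messages the client has already seen are kept in “cur” (whether they are read or unread is recorded as a flag in the file name), and “tmp” holds messages while they are still being written. Maildir (like the other directory-based formats) has a lot of advantages over mbox: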
- it is supported by most popular IMAP servers, such as Dovecot and Courier
- but you don’t need an IMAP server to use Maildir on your computer: many email clients, Mutt included, can read it directly from disk
- it is more robust than single-file formats: with Maildir, an email can be deleted from a mailbox at the very moment the server adds another one to it, without any data corruption
- it makes it much easier to apply all the tricks described in my previous post
- it is great for incremental backups: adding one email to a 1000-message mbox file changes that whole file, forcing you to back it up completely. Adding one email to a Maildir means backing up only that new file
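The robustness and backup advantages both come from the way Maildir stores mail. The sketch below creates a Maildir and delivers one message into it following the Maildir convention: write the complete file into tmp/ first, then move it into new/ with a single rename. make_maildir and deliver_to_maildir are hypothetical helper names, and the unique file name is a simplified version of the one real delivery agents build.

```shell
#!/bin/bash
# Sketch of a Maildir and of its "write to tmp/, then rename into new/" delivery.
# make_maildir and deliver_to_maildir are hypothetical helper names.

make_maildir() {
    # A Maildir is nothing more than a directory with these three subfolders
    mkdir -p -- "$1/tmp" "$1/new" "$1/cur"
}

deliver_to_maildir() {
    local maildir=$1 msgfile=$2 name
    # Simplified unique name: seconds, PID, random number and host name
    name="$(date +%s).${$}_${RANDOM}.${HOSTNAME:-localhost}"
    # 1) write the whole message into tmp/, where no reader looks for mail
    cp -- "$msgfile" "$maildir/tmp/$name" || return 1
    # 2) move it into new/: within one file system this is a single rename(),
    #    so a reader sees either no file or the complete message
    mv -- "$maildir/tmp/$name" "$maildir/new/$name" || return 1
    printf '%s\n' "$maildir/new/$name"
}

# Demo on a throwaway directory
base=$(mktemp -d)
make_maildir "$base/.Family"
printf 'From: alice@example.com\nSubject: Hi\n\nHello Bob\n' > "$base/msg.eml"
deliver_to_maildir "$base/.Family" "$base/msg.eml"
```

Because every message is its own file, deleting one message never touches the others, and an incremental backup tool (rsync, for instance) only has to copy the files that appeared since its last run.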
OK, how do I migrate to Maildir?
If this convinces you (as I hope) to convert all your mbox files to IMAP/Maildir, the question becomes how to do it automatically, especially if those files are scattered across several directories.
As a matter of fact, there’s no need to write complicated scripts or use esoteric libraries. All you need is an email client (that is, a program that already knows everything about mailboxes) capable of running inside a script and taking orders from it. Mutt is just such a client, and this is the second reason (the first is Mutt profiles) why I love it. Here are about 20 lines of code (please note the credits) that will find all your mbox files and copy their content to a Maildir:
1 #! /bin/bash
2 #CREDITS: inspired by: http://foolab.org/node/1737
3
4 for ORIG_MBOX in $(find "$1" -type f -exec file {} \; | egrep 'ASCII mail|ISO-8859 mail text|UTF-8 Unicode mail text' | cut -d: -f1)
5 do
6 echo "Found mbox: $ORIG_MBOX"
7 TARGET_MAILDIR="imap://USER@SERVER/temp_email_folder"
8 rm -f /tmp/MUTTCONF >& /dev/null
9 cat > /tmp/MUTTCONF <<ENDMUTTCONF
10 set folder=/dev/null
11 set move=no
12 set imap_pass=mypassword
13 macro index <F3> "<tag-pattern>~A<enter><tag-prefix><copy-message>$TARGET_MAILDIR<enter>y<quit>y"
14 folder-hook . push <F3>
15
16 ENDMUTTCONF
17
18 echo "Copying $ORIG_MBOX to $TARGET_MAILDIR"
19 mutt -F /tmp/MUTTCONF -m Maildir -R -f "$ORIG_MBOX"
20 done
21 exit
Line 4 finds all the files in the directory passed as first argument ($1), runs the “file” command on them and keeps (egrep) only those whose description shows they are mailboxes. Lines 9 to 16 save to /tmp/MUTTCONF the IMAP password, user name and location plus, above all, the Mutt macro that does the real work. Line 13 (check the Mutt manual for details) in fact means “dear Mutt, when I press the F3 key, tag all the messages in the current folder, copy them to $TARGET_MAILDIR and quit”. Line 14, instead, simulates pressing exactly that key every time a folder is opened.
Once the configuration file is available, line 19 of the script tells Mutt to use it (-F), to create new mailboxes in Maildir format (-m Maildir) and to open $ORIG_MBOX (-f) in read-only mode (-R). Cool, huh?
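Before letting Mutt loose on a whole directory tree, it can help to see which files would be picked up. The sketch below is a dry run that only prints candidate files. Note that it uses a different, simpler heuristic than line 4 (it assumes every mbox file starts with a “From ” separator line instead of relying on the “file” command), and list_mboxes is a hypothetical name. Unlike the for loop on line 4, the find -print0 / read -d '' pair also copes with file names containing spaces.

```shell
#!/bin/bash
# Dry run: print the files under directory $1 that look like mbox files,
# without converting anything. Heuristic (NOT the `file` test of line 4):
# an mbox file's very first line begins with "From ".
list_mboxes() {
    find "$1" -type f -print0 |
    while IFS= read -r -d '' f; do
        # look only at the first line of each file
        if head -n 1 -- "$f" | grep -q -e '^From '; then
            printf '%s\n' "$f"
        fi
    done
}

# Demo: one mbox and one plain text file
scan_dir=$(mktemp -d)
printf 'From bob@example.com Mon Jan  1 10:00:00 2024\nSubject: x\n\nbody\n' > "$scan_dir/Work"
printf 'just some notes\n' > "$scan_dir/notes.txt"
list_mboxes "$scan_dir"
```

I would not call Mutt from inside this while loop: there, standard input is the find pipe rather than the terminal Mutt expects to read keys from.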
Usage notes
The script above will copy the content of all the mailboxes it finds in the folder passed as its argument to $TARGET_MAILDIR. To take full advantage of it, note that:
- if (unlike me) you have email in less common character sets, you will have to add the corresponding “file” descriptions to the egrep pattern on line 4, or the script won’t recognize those mailboxes
- $TARGET_MAILDIR may point anywhere. You may, for example, replace USER@SERVER and mypassword with the credentials of your account on any remote IMAP server (including Gmail…) to upload all the email on your drive to that server. Using 127.0.0.1 as SERVER, instead, will copy the messages to an IMAP server running on your computer
- as a matter of fact, you don’t even need an IMAP server: setting TARGET_MAILDIR to “/email/mymail_archive” creates a perfectly usable Maildir in that location
- as written, the script has no need to create the /tmp/MUTTCONF file inside the loop. I put it there on purpose, to stress that you may even copy every mbox to a different Maildir just by setting a different $TARGET_MAILDIR at every iteration. You could even use a wholly different Mutt configuration file every time, if you wanted
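For example, giving every mbox its own Maildir only requires computing TARGET_MAILDIR from the name of the current mbox. A minimal sketch, assuming a local archive root (the /email/mymail_archive path from the notes above) and a hypothetical helper called target_for; for an IMAP target you would build an imap://USER@SERVER/… URL the same way.

```shell
#!/bin/bash
# Hypothetical helper: give every mbox its own target Maildir, e.g.
#   /home/bob/Mail/Work -> /email/mymail_archive/Work
ARCHIVE_ROOT=/email/mymail_archive   # assumed archive location

target_for() {
    local name=${1##*/}                  # file name without directories
    name=${name//[^A-Za-z0-9._-]/_}      # replace awkward characters with _
    printf '%s/%s\n' "$ARCHIVE_ROOT" "$name"
}

# Inside the loop, line 7 would then become:
#   TARGET_MAILDIR=$(target_for "$ORIG_MBOX")
target_for /home/bob/Mail/Work              # prints /email/mymail_archive/Work
target_for "/home/bob/Mail/Old Friends"     # prints /email/mymail_archive/Old_Friends
```

Replacing spaces and other unusual characters keeps the generated folder names safe to pass around unquoted inside the Mutt macro.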
If you have any questions, please don’t hesitate to ask in the comments!