It seems as though every administrator has her own solution for archiving users’ e-mail inboxes. For some, it’s an automated process in Exchange. For others, it’s writing Procmail rules to keep a copy of every incoming message and storing it as a file on a particular backup server. No matter what the system, archiving is all too often a cumbersome and complicated task.
In this Drill Down, I will take a look at MHonArc, a Perl-based program for archiving e-mail directories. Unlike Outlook (or Evolution) rules, where a user filters e-mail into various folders based on specific conditions, MHonArc shines when used to create user-friendly Web archives of mailing lists and mail folders. With this tool in place, end users can keep track of their vast collections of e-mail through an easy-to-use, threaded, Web-based archive.
Installing MHonArc
The first step is to download MHonArc from its home page. As of this writing, the current version of MHonArc is 2.5.4. Download whichever package you prefer (tar.bz2, tar.gz, or zip) and unpack it into /usr/local/src with the commands shown below.
For the tar.bz2 file, use:
cd /usr/local/src
tar xvjf MHonArc2.5.4.tar.bz2
cd MHonArc2.5.4
For the tar.gz file, use:
cd /usr/local/src
tar xvzf MHonArc2.5.4.tar.gz
cd MHonArc2.5.4
For the zip file, use:
cd /usr/local/src
unzip MHonArc2.5.4.zip
cd MHonArc2.5.4
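As a rough sketch (not something shipped with MHonArc), the three cases above could be folded into one helper that picks the right tool by file extension. The function name unpack is my own invention. Decompressing through a pipe sidesteps the question of which bzip2 flag your version of tar understands.

```shell
#!/bin/sh
# unpack: hypothetical helper that extracts an archive based on its extension.
unpack() {
    case "$1" in
        *.tar.bz2) bzip2 -dc "$1" | tar xvf - ;;   # decompress, then untar from stdin
        *.tar.gz)  gzip -dc "$1" | tar xvf - ;;
        *.zip)     unzip "$1" ;;
        *)         echo "unpack: unknown archive type: $1" >&2
                   return 1 ;;
    esac
}

# Example (run from /usr/local/src):
#   unpack MHonArc2.5.4.tar.gz && cd MHonArc2.5.4
```

The helper only defines a function, so nothing is extracted until you call it with a file name.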
The next step is to build MHonArc. First, verify that you have the prerequisite Perl modules: Getopt::Long and Time::Local. You can check whether these are installed with the perl -MGetopt::Long -MTime::Local -e ';' command, which prints nothing if both modules load successfully.
CPAN
If you get any errors, you will need to visit CPAN and download the appropriate modules. On most Linux systems, these will come preinstalled with your Perl or Perl Development packages.
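If the one-line check fails, it may not be obvious which module is at fault. The following sketch tests each module on its own and names the missing one. The check_module function is a hypothetical helper of mine, not part of Perl or MHonArc.

```shell
#!/bin/sh
# check_module: report whether a single Perl module can be loaded.
check_module() {
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
        return 1
    fi
}

missing=0
for mod in Getopt::Long Time::Local; do
    check_module "$mod" || missing=1
done

if [ "$missing" -ne 0 ]; then
    echo "Install the missing modules from CPAN before continuing."
fi
```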
The next step is the installation itself. In this scenario, I will install MHonArc into the /usr/local/mhonarc directory tree. You can install MHonArc wherever you like, but I suggest staying under the standard /usr/local hierarchy when installing from source. The command perl install.me -prefix /usr/local/mhonarc (run from within the MHonArc directory where the install.me file is located) will do the trick.
Where to install MHonArc?
Depending on where you want to install MHonArc, you may need to be root to install it. Another option, if you want to use MHonArc personally, is to install it into your home directory.
During the execution of install.me, you will be asked some straightforward questions such as where your Perl executable is located. Unless you have installed the requisite applications in non-standard directories, simply pressing enter to accept the defaults will be sufficient.
Using MHonArc
Now that MHonArc is installed, it’s time to configure it for use. There are a few files to deal with in the form of scripts and wrappers. The MHonArc process is something like this: the MTA receives an e-mail and triggers the mhonarc.sh wrapper script, which sends arguments to the MHonArc application, which then processes the mail to the archive.
There are a few ways to call the mhonarc.sh script, but the most typical would be via Procmail or the .forward mechanism of your Mail Transfer Agent (MTA) such as Sendmail, Postfix, Qmail, etc.
Calling the mhonarc.sh wrapper
The MHonArc wrapper must be called by the system’s MTA. If Qmail is used, the following entry has to reside in the ~/.qmail file, which is the equivalent of .forward in other MTAs:
|preline /home/user/scripts/mhonarc.sh
./Maildir/
The above entry would tell Qmail, on every incoming message, to first process the message with mhonarc.sh (the wrapper script for MHonArc), and then deliver the message to the user’s Maildir (Qmail mail directory), or inbox.
If Procmail is used, the following excerpt from a ~/.procmailrc file would be necessary:
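:0 c: $HOME/mail/mhonarc.lock
|/home/user/scripts/mhonarc.sh
:0
$HOME/mail/inbox
This Procmail recipe first pipes a copy of the message to the mhonarc.sh script, and then delivers the mail to the configured inbox (in the above instance: ~/mail/inbox). The c flag makes the first recipe work on a copy, so Procmail continues on to the second recipe instead of treating the message as delivered. The colon before the lockfile name tells Procmail to hold that lockfile while the wrapper runs, so two messages are never archived at the same time. Of course, if you already use Procmail, you could put your filtering rules just after the MHonArc call, so that every incoming message is archived before it is filtered into a mailbox.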
Procmail
For more information on Procmail take a look at Jack Wallen, Jr.’s articles on all the wonders of Procmail and lockfiles and nondelivering recipes.
Configuring the mhonarc.sh wrapper
The mhonarc.sh script is a wrapper script that is used to pass command line arguments to MHonArc. Using a shell script to do this makes things easier because a lot of arguments will be passed to the mhonarc program. Here is a sample mhonarc.sh (adjust the path to mhonarc to match the prefix you chose during installation):
#!/bin/sh
/usr/bin/mhonarc -add -quiet \
-rcfile /home/user/scripts/default.mhrc \
-outdir /home/user/web/archives \
-idxfname date.php -tidxfname index.php -spammode -multipg -idxsize 500 \
-umask 002 -htmlext php
The above script passes a number of arguments to MHonArc. The -add option tells MHonArc to add the message it reads from standard input to an existing archive, and -quiet suppresses status messages. Next, the script tells MHonArc to use the control file /home/user/scripts/default.mhrc and to output all HTML files to the /home/user/web/archives directory, with a date index named date.php and a threaded index named index.php. The script then enables spammode, which hides e-mail addresses, turns on multi-page indexes, and limits each index page to a maximum of 500 messages. Finally, the script sets a umask of 002 (so new files are created with 664 permissions) and uses the extension .php on every HTML file it creates.
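Before hooking the wrapper into your MTA, it can be worth feeding it a message by hand. The sketch below builds a minimal test message and pipes it to the wrapper if one is found. The make_test_message function, the example.com addresses, and the WRAPPER variable are all placeholders of mine; point WRAPPER at wherever you saved mhonarc.sh.

```shell
#!/bin/sh
# make_test_message: print a minimal mail message (headers, blank line, body).
make_test_message() {
    printf 'From: tester@example.com\n'
    printf 'To: user@example.com\n'
    printf 'Subject: MHonArc wrapper test\n'
    printf 'Date: %s\n' "$(LC_ALL=C date '+%a, %d %b %Y %H:%M:%S %z')"
    printf 'Message-Id: <test-%s@example.com>\n' "$$"
    printf '\n'
    printf 'This is a test message for the archive.\n'
}

# Assumed location of the wrapper; override with WRAPPER=/path/to/mhonarc.sh
WRAPPER=${WRAPPER:-/home/user/scripts/mhonarc.sh}

if [ -x "$WRAPPER" ]; then
    # Same hand-off the MTA performs: the message arrives on standard input.
    make_test_message | "$WRAPPER"
else
    echo "wrapper not found at $WRAPPER; printing the test message instead" >&2
    make_test_message
fi
```

If the wrapper is working, a new message page and updated index files should appear in the directory given to -outdir.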
The default.mhrc file (see Listing A) is the configuration file for MHonArc, which defines how each archived message will look when the pages come up in a browser. This configuration file can be modified to taste.
You will notice, by looking at Listing A, that there is some PHP code mixed in. These “includes” define the header and footer of each page, which can be useful if you want to make the archives part of an existing Web site or if you want to be able to have a little flexibility. MHonArc, by default, writes static HTML, which means that if you want to change things down the road, in terms of how it looks, you may find it a little difficult. This isn’t such a big deal for a personal mail archiving solution, but it could be a nuisance if you plan to use MHonArc to archive mailing list messages for a Web site or company. By using the PHP “includes,” you can work around the inflexible nature of static HTML.
At any rate, the include files define the header and footer information, and because the wrapper script above tells MHonArc (through -htmlext php) to create the files with a .php extension, all of these files, when viewed with a browser, will be parsed with the PHP engine. Of course, this assumes:
- That a Web server, like Apache, is running.
- That this is for personal use.
- That the user will connect to the server to view his/her archives.
Since Apache is so easy to configure, and the customization possibilities are made endless by doing so, it might be safe to assume that this would be a logical route to take. After all, mail archives don’t have to be static, just the messages themselves. The only disadvantage is that if a Web server is not running and the user wants to browse the files locally with just a Web browser, the -htmlext parameter will have to be removed and the PHP code will have to be replaced (in the default.mhrc file) with actual HTML code.
To provide a fully functional template, let’s look at some very basic HTML code that can be used in the includes referenced in default.mhrc. The header1.php file might look like:
<html>
<head>
<title>
The header2.php file might include:
</title>
</head>
<body>
<p><b>
The header3.php file might be:
</b></p>
Finally, for footer.php you could use:
</body>
</html>
Of course, this is extremely simplistic, creating a very basic, non-descript Web page. Feel free to be more creative than this example.
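If you would rather not create the four files by hand, a short script can write them for you. This is only a sketch: the write_includes function and the INCLUDE_DIR variable are names I made up, and the directory you actually need depends on the include paths used in your default.mhrc.

```shell
#!/bin/sh
# write_includes: create the four minimal include files in the given directory.
write_includes() {
    dir=$1
    mkdir -p "$dir" || return 1

    cat > "$dir/header1.php" <<'EOF'
<html>
<head>
<title>
EOF

    cat > "$dir/header2.php" <<'EOF'
</title>
</head>
<body>
<p><b>
EOF

    cat > "$dir/header3.php" <<'EOF'
</b></p>
EOF

    cat > "$dir/footer.php" <<'EOF'
</body>
</html>
EOF
}

# INCLUDE_DIR is an assumed location; set it to match your control file.
write_includes "${INCLUDE_DIR:-./includes}"
```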
Taking it one step further
There is another disadvantage to using the default mhonarc.sh. If you receive a lot of e-mail, you will find your directory filling up quickly with files. This will slow down access to that directory, so it might be prudent to configure MHonArc to use a different directory each month. This is very easy to do, but it requires something like a CRON script to automate the process. If you do not receive a high volume of mail, you may not wish to separate your archives in this manner.
Take the above mhonarc.sh and modify it slightly, so that it looks something like this:
#!/bin/sh
/usr/bin/mhonarc -add -quiet \
-rcfile /home/user/scripts/default.mhrc \
-outdir /home/user/web/archives/2002-05 \
-idxfname date.php -tidxfname index.php -spammode -multipg -idxsize 500 \
-umask 002 -htmlext php
The only difference is that the above script will add the current year and month (in the form of 2002-05) as a subdirectory to your ~/web/archives directory. You should now create this directory, as MHonArc requires that the directory exist in order for it to write the HTML files. The next step is to create another script, called newdate, which will run each month via CRON and will automatically create this date-based directory for you. This script will look like Listing B.
To get things rolling, create a file in your ~/scripts directory called lastmonth and include in it the current year-month (for example, 2002-05). By looking at the newdate script, you can see that each time it runs, it echoes the current year-month to the lastmonth file. At the beginning of the script, it uses the contents of the lastmonth file to determine what the last month was. The newdate script will also take the new date and create a directory based on that date. In other words, when the date becomes June 2002, newdate will automatically create ~/web/archives/2002-06, and modify the mhonarc.sh script to tell MHonArc that the 2002-06/ subdirectory is to be used instead of the 2002-05/ directory (by modifying the -outdir value using some sed magic).
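To make the sequence of steps concrete, here is a sketch of how such a rotation script could be written. It is my own reconstruction from the description above, not a copy of Listing B. The rotate_archive function and the SCRIPTS and ARCHIVES variables are assumptions.

```shell
#!/bin/sh
# Hypothetical stand-in for newdate; NOT the article's Listing B.
# SCRIPTS and ARCHIVES are assumed locations; override them to match your setup.
SCRIPTS=${SCRIPTS:-$HOME/scripts}
ARCHIVES=${ARCHIVES:-$HOME/web/archives}

rotate_archive() {
    new=$(date +%Y-%m)                            # current year-month
    old=$(cat "$SCRIPTS/lastmonth") || return 1   # month recorded by the last run
    if [ "$new" = "$old" ]; then
        return 0                                  # same month, nothing to do
    fi
    mkdir -p "$ARCHIVES/$new" || return 1
    # Point -outdir at the new directory. The result is written back with cat
    # rather than mv so the wrapper keeps its execute permission.
    sed "s|/archives/$old|/archives/$new|" "$SCRIPTS/mhonarc.sh" \
        > "$SCRIPTS/mhonarc.sh.new" || return 1
    cat "$SCRIPTS/mhonarc.sh.new" > "$SCRIPTS/mhonarc.sh"
    rm -f "$SCRIPTS/mhonarc.sh.new"
    echo "$new" > "$SCRIPTS/lastmonth"
}

# Only rotate when the bookkeeping file exists.
if [ -f "$SCRIPTS/lastmonth" ]; then
    rotate_archive
fi
```

Because the function compares the stored month with the current one, running it more than once in the same month is harmless.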
Finally, make a cron entry so newdate will automatically be run. Create a file called ~/crontab, which contains the single line:
01 0 1 * * /home/user/scripts/newdate
The above line tells CRON to run the newdate script at 12:01 A.M. on the first day of each month. Now install it as your own personal crontab, with the crontab ~/crontab command.
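One caution: crontab followed by a file name replaces your entire crontab with the contents of that file. If you already have other scheduled jobs, a merge along the following lines may be safer. The merge_entry function is a hypothetical helper of mine, and the final install command is left commented out so that nothing changes until you choose to run it.

```shell
#!/bin/sh
# The schedule line from the article.
ENTRY='01 0 1 * * /home/user/scripts/newdate'

# merge_entry: read an existing crontab on stdin and print it back,
# appending ENTRY only if an identical line is not already present.
merge_entry() {
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    if ! printf '%s\n' "$existing" | grep -qxF "$ENTRY"; then
        printf '%s\n' "$ENTRY"
    fi
}

# To install, uncomment the next line (crontab - reads the new table from stdin):
# crontab -l 2>/dev/null | merge_entry | crontab -
```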
Conclusion
At this point, you should have a fully functional, HTML-based, e-mail archiving system. You can extend this to work with archiving mailing lists, company e-mail, or basically any other e-mail you could possibly want to retain. MHonArc is a very versatile piece of software, and it works exceptionally well.
There is a whole slew of configuration options that you can enable, both on the command line and within the control file. Be sure to look at the documentation that comes with MHonArc for a list of the variables you can use in different sections of the control file to modify how each page looks and acts. MHonArc is great for homegrown solutions, or for archiving public mailing lists and making them visible as part of your Web page. With its high level of customization, rest assured that you can make MHonArc integrate very nicely within your existing Web structure.