In “A practical example of why HTML email is a bad idea”, I provided a very simple example of the kind of security danger that can be avoided by nothing more complex than viewing email only as plain text. Of course, it’s reasonable to expect that people will sometimes send you HTML emails whose contents you need to view, and that you will want to read them without having to visually sort through everything to find the parts that aren’t extraneous, unnecessary markup. The inconvenience of picking through the tangle of bad HTML in many emails is what drives many people to ignore security advice about viewing emails as plain text.

Much as I wish it were otherwise, it doesn’t seem likely that everybody in the world will forsake their unnecessarily HTML-laden ways when composing emails. Many modern email clients default to HTML-formatted email, and many people get deeply involved in choosing the right “stationery” background image, fancy fonts, and other pointless frippery when sending intensely interesting and highly informative emails like “yo, whats up? how r u?”. Luckily, a lot of those clients do “the right thing” when composing HTML emails: they include a plain text version as well, without the person composing the email even having to know about it.

Sometimes, however, we don’t get plain text versions along with the tangle of spaghetti HTML. This may be because, when certain things are done in HTML that cannot be reasonably well degraded to plain text, the email client just skips the plain text version; it may be because some Website developer doesn’t know any better when he designs the automated feedback form reply script; it may even be because someone stupidly turned off the plain text copy capability. It may also simply be that someone is using a particularly low-quality email client that offers only HTML formatting.

Regardless of how it happens, the fact remains that sometimes we just need to be able to sift through the contents of an HTML-formatted email, and would prefer to do it without getting a headache. The choices are usually limited to:

  1. give up on secure practices and just view the rendered HTML e-mail somehow — either within an HTML-capable email client or by viewing the email in an outside application that renders HTML
  2. live with the inconvenience of parsing all that markup by eye
  3. delete the offending email and request another copy without the unnecessary markup scattered through it

I used to run with number 2 all the time. After a while, I decided to write a simple script I call stripmark that would strip the HTML out of an email so I could just view the plain text. Over time, it has acquired a few more capabilities, including translating linebreak tags into actual newline characters in the plain text output. This sort of thing is relatively easy to do with tools like the Ruby programming language and a highly functional text-mode mail user agent such as Mutt. It is, however, far outside the range of what the average user is able to do, and isn’t exactly within the range of what most technically oriented people are willing to do, especially those whose tools are mostly limited to the MS Windows platform.
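To make the idea concrete, here is a rough sketch of that kind of markup stripping. It is not the stripmark script itself; it is a hypothetical Python illustration built on the standard library’s `html.parser`, and the names `MarkupStripper` and `strip_markup` are made up for this example. It drops tags, skips the contents of `script`, `style`, and `head` elements, and turns linebreak tags into newlines.

```python
from html.parser import HTMLParser


class MarkupStripper(HTMLParser):
    """Hypothetical helper: keep the text, drop the markup."""

    SKIP = {"script", "style", "head"}        # contents are never shown
    BLOCK = {"p", "div", "li", "tr", "h1", "h2", "h3"}  # end with a newline

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; and friends
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")           # linebreak tag -> real newline

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_markup(html):
    """Return the text content of an HTML string."""
    parser = MarkupStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(strip_markup("<p>Hello<br>world &amp; friends</p><script>alert(1)</script>"))
```

A filter like this could be hooked into a text-mode mail client as a display filter for HTML parts, but how that is wired up depends entirely on the client.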

The script I use is gradually changing into something akin to the opposite of what text parsing libraries like Markdown do. Such libraries define a simplified markup language that makes it much easier to specify text formatting from the keyboard; they then parse text written in that language and translate it into an HTML or XHTML document. In fact, I type these articles in plain text files using Vim, entering Markdown formatting signifiers by hand, then run a script that uses a Markdown library to transform the text formatting signifiers into Web markup before it gets published here at TechRepublic.

If it keeps evolving to that point, my stripmark script will eventually do the inverse: it will translate HTML or XHTML formatted documents into text with simple text formatting signifiers that make emphasis, linebreaks, and other simplified “rich text” characteristics obvious without making the text nigh-unreadable the way an actual Web markup language does.
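A small sketch of what that inverse translation might look like, again as a hypothetical Python illustration rather than anything taken from stripmark: emphasis tags become Markdown-style asterisks, linebreak tags become newlines, and everything else is dropped. The tag-to-signifier table and the names `SignifierWriter` and `to_signifiers` are assumptions chosen for the example.

```python
from html.parser import HTMLParser

# Assumed mapping from HTML tags to Markdown-style signifiers.
MARKERS = {"em": "*", "i": "*", "strong": "**", "b": "**"}


class SignifierWriter(HTMLParser):
    """Hypothetical helper: rewrite a few HTML tags as plain text signifiers."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in MARKERS:
            self.parts.append(MARKERS[tag])   # opening signifier
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in MARKERS:
            self.parts.append(MARKERS[tag])   # closing signifier
        elif tag == "p":
            self.parts.append("\n\n")         # blank line between paragraphs

    def handle_data(self, data):
        self.parts.append(data)


def to_signifiers(html):
    """Translate a fragment of HTML into text with simple signifiers."""
    writer = SignifierWriter()
    writer.feed(html)
    writer.close()
    return "".join(writer.parts).strip()


print(to_signifiers("<p>This is <b>very</b> <em>important</em>.<br>Really.</p>"))
# prints:
# This is **very** *important*.
# Really.
```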

At the moment, the script is just a dirty hack. Even if it were cleaned up, prettied up, and made worth distributing, though, its use would still be a dirty hack. What’s really needed is for mail user agents and email clients to incorporate such safety-related functionality directly, either via official plugins in the case of Unix MUAs or as integrated functionality in the case of mainstream GUI email clients.

By default, a “safety mode” for any e-mail client should perform a number of tasks for the user. A few examples include:

  1. Disallow embedded images, Flash objects, or anything else that isn’t actual text and markup in the rendered “rich text” email display without a warning and specific user intervention.
  2. Disallow hiding URLs behind link text so that even the most casual, security-ignorant user will not be fooled into thinking a link that says “PayPal” but has a URL with a non-PayPal domain is a legitimate PayPal link.
  3. Disallow execution of any dynamic content, especially including JavaScript, VBScript, and similar programming languages, without a warning and specific user intervention.
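The three behaviors above can be sketched in a few lines. This is only an illustration under stated assumptions, not how any existing mail client works: the class name `SafeModeRenderer`, the function `render_safely`, and the set of blocked tags are all hypothetical. It prints every link’s real URL next to its link text, refuses images and embedded objects, and never shows or runs script content, while recording what it blocked so that a client could warn the user.

```python
from html.parser import HTMLParser


class SafeModeRenderer(HTMLParser):
    """Hypothetical safety-mode text renderer for an HTML email body."""

    # Assumed list of elements refused without explicit user approval.
    BLOCKED = {"script", "img", "object", "embed", "iframe"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.blocked = []        # tags that were refused, for a warning
        self.href = None
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCKED:
            self.blocked.append(tag)
            if tag == "script":
                self.in_script = True
        elif tag == "a":
            self.href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        elif tag == "a" and self.href:
            # Show the real destination right after the link text.
            self.parts.append(f" [{self.href}]")
            self.href = None

    def handle_data(self, data):
        if not self.in_script:   # script bodies are never displayed
            self.parts.append(data)


def render_safely(html):
    """Return (display text, list of blocked tags)."""
    renderer = SafeModeRenderer()
    renderer.feed(html)
    renderer.close()
    return "".join(renderer.parts), renderer.blocked


text, blocked = render_safely(
    '<a href="http://paypal.example.net/login">PayPal</a>'
    '<img src="http://tracker.example/x.gif"><script>steal()</script>'
)
print(text)     # PayPal [http://paypal.example.net/login]
print(blocked)  # ['img', 'script']
```

A real client would also have to deal with event-handler attributes, CSS tricks, and malformed markup, none of which this sketch attempts.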

The list could go on at great length, but it would probably be easier to just list things that should be allowed: italicizing text, bolding text, underlining text, manipulating colors, and physically altering the visible location of content on the screen in a manner that doesn’t hide any content (such as via tables or CSS positioning). The safety mode should also clearly indicate when any text uses characters that are not part of the standard ASCII character set, just for good measure, in case someone wants to copy and paste a URL from an email to a browser.
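Flagging non-ASCII characters is the easiest of these checks to illustrate. The sketch below is a hypothetical helper (the name `non_ascii_chars` is an assumption), not a feature of any particular client. It reports the position and code point of every character outside the ASCII range, which is enough to expose a URL whose Latin “a” has been swapped for a look-alike Cyrillic letter.

```python
def non_ascii_chars(text):
    """Return (index, character, code point) for each non-ASCII character."""
    return [
        (index, char, f"U+{ord(char):04X}")
        for index, char in enumerate(text)
        if ord(char) > 127
    ]


# The second letter of "paypal" here is CYRILLIC SMALL LETTER A (U+0430),
# written as an escape so the substitution is visible in the source.
suspicious = "http://p\u0430ypal.example/"

for index, char, codepoint in non_ascii_chars(suspicious):
    print(f"position {index}: {codepoint}")   # position 8: U+0430

print(non_ascii_chars("http://paypal.example/"))  # []
```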

This, at least, would allow people like me, who are aware of the security dangers of normal HTML email rendering, to read the occasional marked-up email without either going to inconvenient lengths or exposing ourselves to those dangers.

Unless and until such a MUA or e-mail client that I want to use lands in my lap, though, I’d appreciate it if you’d all default to sending plain text e-mails only. Considering the overwhelming tendency of spammers and phishers to use HTML e-mail, and that most legitimate email users at least offer plain text along with the HTML formatted versions of their messages (whether they know it or not), my spam filtering identifies all HTML-only emails as high-risk targets. If I don’t expect your email, and it’s HTML formatted, you should be resigned to the expectation that I may never read it.
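For the curious, detecting an HTML-only message is simple in principle. The following is a minimal sketch using Python’s standard `email` package, not a description of my actual spam filtering setup; the function name `is_html_only` is hypothetical. It treats a message as HTML-only when it contains a `text/html` part and no `text/plain` part anywhere.

```python
from email import message_from_string


def is_html_only(raw_message):
    """True if the message has an HTML part but no plain text part."""
    message = message_from_string(raw_message)
    content_types = {
        part.get_content_type()
        for part in message.walk()
        if not part.is_multipart()    # look only at leaf parts
    }
    return "text/html" in content_types and "text/plain" not in content_types


html_only = "Content-Type: text/html\n\n<p>yo, whats up?</p>\n"

with_plain_text = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="b"\n'
    "\n"
    "--b\n"
    "Content-Type: text/plain\n"
    "\n"
    "yo, whats up?\n"
    "--b\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>yo, whats up?</p>\n"
    "--b--\n"
)

print(is_html_only(html_only))        # True
print(is_html_only(with_plain_text))  # False
```

One limitation of this simple check is that a plain text attachment would count as a plain text part; a stricter filter would look only inside the `multipart/alternative` section.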

I value my security more than unsolicited emails, and — contrary to my usual policy of avoiding false positives at any reasonable cost — I’m willing to accept a few false positives to protect myself.