
Txt2Tags: A great lightweight markup language for many tasks

Marco Fioretti explains why he favors the Txt2Tags lightweight markup language (LML) for many text management tasks. Here's what it can do for you.

Computers and the Internet have greatly increased the amount of text that many of us write, edit, reuse or simply archive. When pens and typewriters were the only tools available, they limited both how much text we could produce, and the number of occasions to process it. Today we live in an endless stream of reports, memos, email, websites and tweets that we can copy and paste with a click. Such a situation is a continuous stimulus to write, rearrange, and reuse text. Trying to do it efficiently, however, has some interesting practical implications.

Even when we are aware of how much text we need to manage, we cannot know in advance when we'll want to reuse or adapt some piece of it, or where, that is, on which medium or platform (paper, website, smartphone...). If we want to make the most of all the texts we write ourselves or keep stored on our computers, we need to be sure:

  • that their starting format is as simple, portable and future-proof as possible.
  • that we can quickly convert them to many other formats.

The OpenDocument format (ODF) satisfies the first requirement and is great for complex documents. However, it is too complicated for simple texts and is not the easiest solution when (automatic) conversion to many other formats is important. These considerations have produced a whole bunch of lightweight markup languages (LMLs): plain text formats, with very simple special characters or strings that mark up headings, lists, typefaces and so on. The workflow for all LMLs is the same:

  • write and store your text, with any editor, in the LML of your choice
  • whenever you need that text in another format (HTML, LaTeX, wiki, PDF...) generate a copy in that format, using the available conversion software for that LML

The LML I prefer, and have been using for most of my work for a few years now, is Txt2Tags. Here's why.

The main reason why I like Txt2Tags is the simplicity and high availability, now and in the future, of its conversion software. I use Txt2Tags because I am sure that I can run it everywhere, with the smallest possible setup effort, without compiling anything or fighting with dependencies.

The Txt2Tags converter is one small script that Just Works, without relying on any particular library, on every platform where Python runs. Its slogan, "download and run", is true, and it's a big part of being "as future-proof as possible": I can create, reuse, and convert the same *.t2t files on any computer I may encounter, from the VPS hosting my websites to my uncle's Windows box or any Android smartphone.

One simple input...

The second great advantage of Txt2Tags is the simplicity of the format itself. A .t2t file is divided into three sections called header, settings, and body. The body, which is the only mandatory part, corresponds to the actual text. The header, instead, contains metadata like document author, date and title. The settings section is the place where you pass instructions to the Python script (more on this in a moment).
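To make the three sections concrete, here is a minimal Python sketch that splits a .t2t source into header, settings, and body. It is not the converter's own code. It assumes a simplified model of the layout: the header is the first three lines (or absent when the first line is blank), and settings are the lines starting with %! that come before the first line of body text. The function name split_t2t and the sample document are made up for illustration.

```python
def split_t2t(text):
    """Split a .t2t source into (header, settings, body) line lists.

    Simplified model, assumed for illustration: the header is the first
    three lines unless the first line is blank (no header); settings are
    the '%!' lines that precede the first line of body text.
    """
    lines = text.splitlines()
    if lines and lines[0].strip():
        header, rest = lines[:3], lines[3:]
    else:
        header, rest = [], lines[1:]

    settings = []
    body_start = len(rest)
    for i, line in enumerate(rest):
        if line.startswith("%!"):
            settings.append(line)
        elif line.strip() and not line.startswith("%"):
            # First non-blank, non-comment line: the body starts here.
            body_start = i
            break
    return header, settings, rest[body_start:]


# Hypothetical sample document: title, author, date, one setting, body.
sample = """My Document
Jane Doe
2024-01-01
%!target: html

= Introduction =
Hello, world.
"""

header, settings, body = split_t2t(sample)
print(header)
print(settings)
print(body)
```

A file whose first line is blank comes back with an empty header, which matches the idea that only the body is mandatory.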

The markup rules are simple and keep the source text very readable, which is much more important than you may think. Learning those rules is a breeze, thanks to the online demo and converter. Besides, unlike what happens with other systems, all the marks are symbols, not strings or letters that may confuse spell checkers.

...for many great outputs

What next? Oh, yes, output formats. The features page currently lists 18 of them, from DocBook to HTML, several Wiki flavours, MagicPoint presentations and LaTeX. PDF and e-books, you say? No problem. Txt2Tags doesn't support them, because it doesn't need to. On most GNU/Linux distributions, once you have converted some text to LaTeX, PDF is just one more command away:

  $ txt2tags -t tex filename.t2t
  $ pdflatex filename.tex
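If you convert files often, the two commands above can be wrapped in a few lines of Python. This is only a sketch: the function names pdf_commands and build_pdf are my own, and it assumes that txt2tags and pdflatex are both on your PATH and that the .tex file is written next to the source.

```python
import subprocess
from pathlib import Path


def pdf_commands(source):
    """Return the two commands that turn a .t2t file into a PDF."""
    # Assumption: txt2tags writes <name>.tex next to <name>.t2t.
    tex_file = Path(source).with_suffix(".tex")
    return [
        ["txt2tags", "-t", "tex", str(source)],
        ["pdflatex", str(tex_file)],
    ]


def build_pdf(source):
    """Run both steps in order, stopping at the first failure."""
    for command in pdf_commands(source):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only print the commands here; call build_pdf() to actually run them.
    for command in pdf_commands("filename.t2t"):
        print(" ".join(command))
```

Keeping the command list separate from the code that runs it makes it easy to check what would be executed before anything touches your files.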

The same applies to e-books. You can convert .t2t sources to HTML, and then generate ePub versions from there in many ways. I've personally used Txt2Tags to generate OpenDocument slideshows automatically, as well as PDF books and other stuff. With a bit of hacking, you may even add footnotes to your documents!

The power of pre- and post-processing

I mentioned above that Txt2Tags documents have an optional section in which you can give instructions to the converter. The two most important ones are those called preproc and postproc. A command like this inside a *.t2t file:

%!preproc: _something_

means "do whatever is written after the colon NOW", that is, before converting the file to the desired target format. The postproc command, which has the same syntax, works in the opposite way:

%!postproc: _something_else_

thus defining commands that the Txt2Tags script must execute after it has finished the conversion. The most common usage of preproc and postproc is to find and replace specific strings, at whichever moment is more convenient for you. Some users, for example, use preproc to create lists of links and other abbreviations. Doing so saves typing and keeps the .t2t source file more readable. This line, for example:

%!preproc: url_tros

tells the script to replace all the occurrences of the url_tros string with the URL of the TechRepublic Open Source blogs.
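The find-and-replace idea behind preproc can be sketched in a few lines of Python. This is a simplified model, not the converter's actual code: it assumes each filter is a pair made of a regular expression pattern and its replacement, applied in order to the source text before conversion. The abbreviation url_example and the URL are placeholders, and the function name apply_filters is made up for this example.

```python
import re


def apply_filters(text, filters):
    """Apply (pattern, replacement) pairs in order, as find-and-replace."""
    for pattern, replacement in filters:
        text = re.sub(pattern, replacement, text)
    return text


# Hypothetical abbreviation and placeholder URL, for illustration only.
preproc = [(r"url_example", "https://example.com/blog")]

source = "Read more at url_example today."
print(apply_filters(source, preproc))
```

In this model, postproc would be the same function applied to the converted output rather than to the source.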

Txt2Tags, a format good for almost any task

Combined, preproc, postproc and the other Txt2Tags features can extend the functionality of this LML in all the ways described in the official wiki, and many more. Txt2Tags is not the best solution for works with lots of formulas, cross-references, or pictures with captions. In all other cases, however, Txt2Tags is so simple to use that it would be a shame not to try it!


Marco Fioretti is a freelance writer and teacher whose work focuses on the impact of open digital technologies on education, ethics, civil rights, and environmental issues.
