How to handle time-based data with Gnuplot

Marco Fioretti explains how you can use Gnuplot to make handy charts of time-based data. Here are some tips for making it generate what you want.

Gnuplot is, in my opinion, one of the best open source tools to plot charts on Linux. I like it because:

  • it is a command line tool
  • it recognizes commands that are plain text strings
  • it reads the numbers in plain text format, from plain text files

In other words, I like Gnuplot because it makes automatic chart creation as easy as possible: you can generate both data and Gnuplot commands on the fly with whatever program or programming language you like best, and then pass them to Gnuplot.

With the exception of pie charts (unless you adopt some tricks), Gnuplot can plot practically every kind of diagram... if you know which options to use. Here I will explain in detail only the little-known ones that let Gnuplot recognize certain strings as absolute dates or hours and, consequently, plot time-related data properly.

Let's assume that you have a data file ("datafile.dat") like this:

  20110101 30
  20110108 21
  20110115 28
  20110122 3
  20110129 6
  20110205 9
  20110212 12
  20110219 25
  20110226 22
  20110305 18
  20110312 37
  20110319 37
  20110326 32
  20110402 41
  20110409 35
  20110416 26
  20110423 27
  20110430 20

A human would understand immediately that the numbers in the first column are, very likely, dates. Gnuplot won't. Not by itself, at least. If you just tell it to plot the file with these instructions:

  set terminal png size 900, 300
  set output "chart_1.png"
  plot [:][:] 'datafile.dat' using 1:2 title "This is what you get when gnuplot doesn't recognize time values" with lines

that is, using the first column for the X axis and the second for the Y one, you'll get the chart of Figure A: "20110101" isn't recognized as "January 1st, 2011" but as "twenty million something..." Really ugly and unreadable, isn't it?

Figure A

The solution is to set the xdata and timefmt variables of Gnuplot, by adding these commands right before the plot instruction:

  set xdata time
  set timefmt "%Y%m%d"

The first one tells Gnuplot that the numbers that go on the X axis are time values. The second explains how they are formatted. In our example, the format is YYYYMMDD, but it could be many other things, all described in the documentation (more on this later). In practice, there are only two constraints here. The first is that only one time/date input format per plot is supported. The other concerns time values that contain spaces, e.g., "2011/06/02 11:18": to let Gnuplot handle them without problems, the columns in the data file must be separated by tabs instead of spaces.
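To make the second constraint concrete, here is a minimal sketch of the tab-separated case. It assumes a hypothetical file called "hourly.dat" (not part of the example above) in which each line holds a string like "2011/06/02 11:18", then a tab, then a value. The output file name is also made up.

```gnuplot
# Assumption: hourly.dat is a hypothetical file with lines such as
#   2011/06/02 11:18<TAB>42
set terminal png size 900, 300
set output "chart_hourly.png"        # hypothetical output name
set datafile separator "\t"          # split columns on tabs only, so the
                                     # space inside the time stays in column 1
set xdata time
set timefmt "%Y/%m/%d %H:%M"         # must match the strings in column 1
plot 'hourly.dat' using 1:2 title "Tab-separated time values" with lines
```

The "set datafile separator" line is what keeps date and hour together in one column. Without it, Gnuplot would split the line at every run of whitespace.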

Back to plotting now. Run Gnuplot with the latest settings and lo! the numbers of the first column will be recognized for what they are and printed accordingly, as in Figure B.

Figure B
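For reference, the whole script behind a chart like Figure B could look like the sketch below. The commands are the ones shown above. Only the output file name and the title are my own guesses.

```gnuplot
# Assumes datafile.dat (listed earlier) is in the current directory.
set terminal png size 900, 300
set output "chart_2.png"             # hypothetical output name
set xdata time                       # values on the X axis are times
set timefmt "%Y%m%d"                 # how to read column 1 of the file
# With time data the "using" part must be spelled out explicitly.
plot [:][:] 'datafile.dat' using 1:2 title "Time values recognized" with lines
```

Both "set" lines must come before the plot instruction, because Gnuplot applies whatever settings are active at the moment it plots.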

Much better now, isn't it? We can make it even better, however. Luckily, the way the time values are formatted in the data file and the way in which they will be printed in the plot are completely independent. The timefmt variable that we've already seen only specifies how to read the time column in the data file. It recognizes lots of formats: %j, for example, indicates the day of the year in the 1-365 format, and %B the month name (in English!). To read about all the capabilities of timefmt, type gnuplot in a terminal and then help set timefmt.
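A small sketch of that independence, under the assumption that you want day/month labels on the axis: the input format still matches the file, while the labels are set separately with "set format x", a standard Gnuplot command. The output file name is hypothetical.

```gnuplot
set terminal png size 900, 300
set output "chart_labels.png"        # hypothetical output name
set xdata time
set timefmt "%Y%m%d"                 # input: how dates look in datafile.dat
set format x "%d/%m"                 # output: how they are printed on the axis
plot 'datafile.dat' using 1:2 title "Same data, different labels" with lines
```

Changing the string given to "set format x" changes only the labels. The data file and timefmt stay untouched.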

The other thing you need to know in order to plot readable time-based charts is how to plot ranges and display tics on the time axis to show just what you, not Gnuplot, think is relevant.

If you only want to plot a certain range of values from the data file, specify it by setting the xrange with the same format used by timefmt, that is the one in the original data file:

  set xrange ["20110402":"20110430"]

This is how to plot a "restricted" chart like the one in Figure C.

Figure C
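Put together with the earlier settings, a script for a chart like Figure C might look like this sketch. Output file name and title are assumptions of mine.

```gnuplot
set terminal png size 900, 300
set output "chart_3.png"             # hypothetical output name
set xdata time
set timefmt "%Y%m%d"
# The range limits are written in the timefmt format, so this line
# has to come after "set timefmt".
set xrange ["20110402":"20110430"]
plot 'datafile.dat' using 1:2 title "April 2011 only" with lines
```

I left out the "[:][:]" part of the plot instruction here so that the only range definition in the script is the "set xrange" line.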

Setting the number, positions and names of the tics is a bit more complicated, but not by much. The names are set either by listing them explicitly, or by assigning a format to the xtics variable. This, for example:

  set xtics format "%b %d"

means that Gnuplot should use the abbreviated month name (%b) and the day of the month (%d) to print tics like "Jan 20", "Mar 11" and so on. Position and number of tics can be controlled in several ways. The most flexible one is to assign to xtics a start and end value, plus the number of seconds between two consecutive tics. Let's assume, for example, that we want to display one tic every two weeks, on Wednesdays. Since the first Wednesday of 2011 was January 5th and there are 60x60x24x7x2 = 1209600 seconds in two weeks, here's how to plot what you see in Figure D:

  set grid
  set xtics format "%b %d"
  set xtics "20110105", 1209600, "20110430"

Figure D

Cool, right? Please note that "set grid" wasn't necessary; it just makes it easier to see that the tics appear just where they should. The only gotcha here is to remember that you must specify start and end tics in the format in which they appear in the source file ("20110105"), not the one shown in the plot ("Jan 05").
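If you'd rather not do the arithmetic by hand, you can let Gnuplot compute the interval. The sketch below is one possible complete script for a chart like Figure D. The variable name two_weeks and the output file name are my own inventions, and the weekday check relies on the built-in strptime() and tm_wday() functions, which I'm assuming are available in your Gnuplot version.

```gnuplot
# Seconds between two consecutive tics: 60 s x 60 min x 24 h x 7 days x 2 weeks
two_weeks = 60 * 60 * 24 * 7 * 2     # hypothetical variable name
print two_weeks                      # 1209600

# Optional sanity check on the start date: tm_wday() counts from 0 = Sunday,
# so a Wednesday should give 3.
print tm_wday(strptime("%Y%m%d", "20110105"))

set terminal png size 900, 300
set output "chart_4.png"             # hypothetical output name
set xdata time
set timefmt "%Y%m%d"
set grid                             # optional, just makes the tics easier to spot
set xtics format "%b %d"
set xtics "20110105", two_weeks, "20110430"
plot 'datafile.dat' using 1:2 title "One tic every two weeks" with lines
```

Keeping the interval in a variable makes it easy to switch to weekly tics later: drop the final "* 2" and leave the rest alone.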

Summing up

Charts of time-based data are extremely useful, in many ways, for work and study. Here I have explained all you'll probably need to know to use Gnuplot to make sense of such data. In case it isn't enough, you can find even more information in the Gnuplot documentation by typing help time/date or help set xtics at the prompt.

About

Marco Fioretti is a freelance writer and teacher whose work focuses on the impact of open digital technologies on education, ethics, civil rights, and environmental issues.
