Add watermarks to all your ODF files automatically

Marco Fioretti shares a script that allows you to add watermarks to already-existing ODF files automatically.

Office documents like texts, presentations, and spreadsheets can have watermarks, that is, images or (much more often) semi-transparent text as the background of all their pages.

Normally, the purpose of a watermark is to declare, in a way that is impossible to miss, the current status of a document or who published it. "Draft", "Pending Approval", "Strictly Confidential", or the logo of a company are all common watermarks.

How do you add a watermark to the text you are editing? With LibreOffice or OpenOffice, you have to set an image of the watermark text as the background of the current page style. This procedure, explained in the official online documentation, is all you need to know to add watermarks to templates or single files.

Things change a lot, however, if you need to add watermarks to many existing files. This could be the case, for example, of a public administration that decides to put online all the transcripts of its past meetings, but with a huge watermark saying "unofficial."

Should they do it manually, one file at a time? Of course not, at least if they are using open formats and software! The OpenDocument Format (ODF), used by default in LibreOffice and OpenOffice, is very simple to hack. An .odt file is simply a ZIP archive with pictures and plain-text XML files inside it.
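To see this for yourself, you can list the content of any ODF file from the command line. The following is only a minimal sketch, assuming that the unzip utility is installed; the function name list_odf_entries and the file name in the usage comment are made up for illustration.

```shell
#!/bin/bash

# Print the names of all the entries stored inside an ODF file,
# one per line (unzip -Z1 is the "zipinfo" names-only listing).
list_odf_entries() {
  unzip -Z1 "$1"
}

# Usage, with a hypothetical file name:
#   list_odf_entries minutes.odt
```

On a typical text document, the listing should include mimetype, content.xml, styles.xml and META-INF/manifest.xml, plus a Pictures folder if the file embeds images.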

Figure A

Figure B

The only differences between the ZIP archives corresponding to the files in Figures A and B are that the second one has:
  • a Pictures subfolder, containing the background image in PNG format
  • one extra line in its manifest.xml, explaining what that image is
  • one extra statement in its styles.xml file, that sets that image as the background in the page style declaration

Here is what that statement looks like:

<style:background-image xlink:href="Pictures/10000201000009B000000DB4B15311A8.png" xlink:type="simple" xlink:actuate="onLoad"/>

The practical consequence of this ODF architecture is that a script able to add an image, and strings like that one, in the right places inside an .odt file will turn it into a watermarked copy. Here is what such a script could look like:

       1       #!/bin/bash
       2
       3       TMPDIR=/tmp/watermarking
       4       DOCUMENT=$1
       5       IMAGE=`basename "$2"`
       6
       7       XML='<style:background-image xlink:href="Pictures/pic.png" xlink:type="simple" xlink:actuate="onLoad"/>'
       8       MANIFEST='<manifest:file-entry manifest:media-type="image/png" manifest:full-path="Pictures/pic.png"/>'
       9
      10       FULLNAME=${DOCUMENT%.*}
      11       EXT=${DOCUMENT##*.}
      12       DIR=`dirname "$DOCUMENT"`
      13       BASE=`basename "$DOCUMENT"`
      14
      15       if [ -e "$FULLNAME.watermark.$EXT" ]
      16         then
      17           echo "Warning: $FULLNAME.watermark.$EXT already exists"
      18           exit 1
      19         fi
      20
      21       # Always start from an empty working directory: files left
      22       # over from a previous run would otherwise make unzip stop
      23       # and ask for confirmation, or end up in the new document
      24       rm -rf "$TMPDIR"
      25       mkdir -p "$TMPDIR"
      26
      27
      28       cp "$DOCUMENT" "$TMPDIR"
      29       cd "$TMPDIR" || exit 1
      30
      31       unzip "$BASE" > /dev/null
      32       rm    "$BASE"
      33
      34       XML=`echo "$XML" | sed -e "s/pic.png/$IMAGE/" | sed -e 's|\/|\\\/|g'`
      35       SEDCOMMAND="sed -i -e 's/<\/style:page-layout-properties>/$XML<\/style:page-layout-properties>/g' styles.xml"
      36       eval "$SEDCOMMAND"
      37
      38       MANIFEST=`echo "$MANIFEST" | sed -e "s/pic.png/$IMAGE/" | sed -e 's|\/|\\\/|g'`
      39       SEDCOMMAND="sed -i -e 's/<\/manifest:manifest>/$MANIFEST<\/manifest:manifest>/' META-INF/manifest.xml"
      40       eval "$SEDCOMMAND"
      41
      42       # Create the Pictures subfolder if the document has none,
      43       # then copy the watermark image into it
      44       if [ ! -d "Pictures" ]
      45         then
      46           mkdir Pictures
      47         fi
      48       cp "$2" Pictures
      49
      50       zip -0 -X "$FULLNAME.watermark.$EXT" mimetype > /dev/null
      51       zip -r -D -X "$FULLNAME.watermark.$EXT" . -x mimetype > /dev/null
      52       exit

The script takes as arguments the full paths of both the ODF file ($1) and the watermark image ($2). Lines 7 and 8 hold the two XML snippets that we need to insert (with the right name of the watermark picture, of course) in the styles.xml and manifest.xml files. From line 10 to line 33, the script:

  • finds the base names and extensions of the involved files
  • checks if a watermarked version already exists
  • creates a temporary working directory
  • copies the original file inside it
  • unzips the copy and removes it
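The name handling in lines 10 to 13 relies on bash parameter expansion. Here is a small sketch of what those expressions produce; the path /home/user/docs/minutes.odt and the OUTPUT variable are hypothetical, chosen only for illustration.

```shell
#!/bin/bash

# Hypothetical input path, used only for illustration.
DOCUMENT=/home/user/docs/minutes.odt

FULLNAME=${DOCUMENT%.*}        # strip the shortest ".something" suffix: /home/user/docs/minutes
EXT=${DOCUMENT##*.}            # strip everything up to the last dot:    odt
DIR=$(dirname "$DOCUMENT")     # /home/user/docs
BASE=$(basename "$DOCUMENT")   # minutes.odt

# Name of the watermarked copy that the script would create.
OUTPUT="$FULLNAME.watermark.$EXT"
echo "$OUTPUT"                 # prints /home/user/docs/minutes.watermark.odt
```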

The fun starts at line 34. First, we replace "pic.png" in the $XML variable with the actual base name of our watermark picture and escape its slashes. Line 35 then builds a sed command that will insert $XML at just the right position in the styles file, that is, at the end of the page-layout-properties section, and line 36 evaluates, that is executes, that command. Lines 38 to 40 perform the same trick to add the watermark file declaration to the XML manifest.
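To watch the insertion happen without touching a real document, you can try it on a single string. The sketch below is a variant, not the script's own method: it assumes a heavily shortened, made-up fragment of styles.xml and a hypothetical image name (draft.png), and it uses "|" as the sed delimiter, which avoids both the slash escaping and the eval step.

```shell
#!/bin/bash

# Made-up, shortened stand-in for the content of styles.xml.
SAMPLE='<style:page-layout-properties fo:margin-top="2cm"></style:page-layout-properties>'

# Hypothetical watermark file name.
IMAGE=draft.png
XML="<style:background-image xlink:href=\"Pictures/$IMAGE\" xlink:type=\"simple\" xlink:actuate=\"onLoad\"/>"

# "&" in the replacement stands for the matched closing tag, so the
# snippet ends up immediately before it.
RESULT=$(printf '%s\n' "$SAMPLE" | sed -e "s|</style:page-layout-properties>|$XML&|")
echo "$RESULT"
```

The printed line should show the background-image element sitting just before the closing page-layout-properties tag. This shortcut only works as long as the image name contains no "|", "&" or backslash characters.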

At this point, all we have to do is copy the watermark picture into the Pictures subfolder, creating it if needed (lines 42-48). The two zip commands at the end put everything into one ODF file. We must use two commands to make sure that the mimetype file is at the beginning of the ZIP archive and stored uncompressed (the -0 option), as required by the ODF specification.
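If you want to double check that a watermarked file respects that order, a quick test is to look at the first entry of the archive. This is only a sketch, assuming that unzip is installed; the function name mimetype_is_first and the file name in the usage comment are hypothetical.

```shell
#!/bin/bash

# Succeed (exit status 0) only when "mimetype" is the first
# entry stored in the given ZIP/ODF archive.
mimetype_is_first() {
  local first
  first=$(unzip -Z1 "$1" | head -n 1)
  [ "$first" = "mimetype" ]
}

# Usage, with a hypothetical file name:
#   mimetype_is_first minutes.watermark.odt && echo "archive order looks fine"
```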

How to use and extend this script

Of course, this script makes the most sense when used inside a loop, or with the "find" command, to find and watermark automatically all the files in a hierarchy of folders. Also keep in mind how sed decides which page styles get the watermark: the "g" modifier in line 34 only governs the escaping of slashes, while the substitution in line 35 changes every closing page-layout-properties tag on a line only when it carries a "g" modifier of its own; without one, sed stops at the first match on each line. Should you need a different watermark for each style, expand lines 34-36 accordingly.
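Here is one possible way to drive the script with find. It is a sketch under a few assumptions: the script above has been saved somewhere as an executable file, the folder and the image are given as full paths (as the script expects), and only .odt files need a watermark. The function name watermark_tree and the paths in the usage comment are made up.

```shell
#!/bin/bash

# Run the watermarking script on every .odt file below a folder,
# skipping the copies that were already watermarked.
#   $1 = top folder (full path)
#   $2 = watermark image (full path)
#   $3 = path of the watermarking script
watermark_tree() {
  local root=$1 image=$2 script=$3
  find "$root" -type f -name '*.odt' ! -name '*.watermark.odt' -print0 |
    while IFS= read -r -d '' doc; do
      # stdin comes from /dev/null so the script cannot eat the file list
      "$script" "$doc" "$image" < /dev/null
    done
}

# Usage, with hypothetical paths:
#   watermark_tree /home/user/transcripts /home/user/unofficial.png /home/user/bin/watermark.sh
```

The -print0 and read -d '' pair keeps file names with spaces in one piece, which matters in folders filled by many different people.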

About

Marco Fioretti is a freelance writer and teacher whose work focuses on the impact of open digital technologies on education, ethics, civil rights, and environmental issues.
