Seven photo-archiving tips and the Linux tools to help you

Linux expert Marco Fioretti shares seven tips for seamless archiving of your digital image files and recommends some tools to help you stay organized.

Photographs are wonderful. Digital photographs are even better. We can take, archive and share as many of them as we want, at the smallest possible cost... if we pay the right price. A digital archive of many thousands of images (which takes surprisingly little time to accumulate!) doesn't tolerate carelessness. If you don't build and manage it properly from the beginning, it will be worse than not having it at all. Here are a few quick tips to keep normal JPEG digital photographs under control, with GUI or command line tools that work in any GNU/Linux distribution. If you generate and store high quality raw pictures you'll have to follow some extra steps, but that's a subject for another article.

1. Use tool-independent, plain old file system based storage

Some picture managers store pictures and metadata inside a database to improve performance. If that's their only storage option, stay away from them. The first thing you want from your archive is the guarantee that both the pictures and the associated metadata will always be fully accessible from any computer, now and in the future. In practice, this means you can use any software you like, as long as it:

  • works without complaints with albums that are simple directories, maybe created by other programs
  • can dump all the metadata you use in the pictures themselves, in standard formats like EXIF or IPTC.

Why? Because only archives like these remain accessible and more or less ordered and immediately usable even when you change software or operating system, no matter where and how they were backed up!

These are the main reasons why I use digiKam (which is full of cool features anyway!). You only need to tell it where the pictures are in Settings | Configure DigiKam | Collections and start working. Whenever you want to save tags and other metadata in the pictures, select Image | Write metadata to image and they will remain available with the files even if you change software (but remember tip #7!).

2. Use a consistent naming scheme. Always

Always use one, and only one, file naming scheme, and make it one that allows neither spaces nor non-ASCII characters. Otherwise, sooner or later you'll encounter two problems. The first, at least in the short to medium term, is a less portable collection, which means fewer backup options: what if you must give a friend some pictures, or your main backup drive is full, and the only option available is a FAT USB key?

The other problem is coherence. Whatever naming scheme you devise and follow religiously, sooner or later you'll want to archive photographs that were scanned manually or named in some other way by a friend. The only way to keep a common order is to have a naming scheme that tools like Krename (or digiKam itself, albeit less flexibly) can apply automatically to rename all new pictures. Personally, I create folders with names like 20110508_trip_to_Brazil and picture names that are simply their own timestamp, e.g., YYYYMMDDHHMM.jpg.
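As a rough illustration of such a scheme, here is a minimal shell sketch. Both function names (safe_name and timestamp_name) are made up for this example. Note also that timestamp_name reads the file's modification time with GNU date, which is only a stand-in for the moment the picture was taken: the two can differ if the file was edited or copied without preserving its times.

```shell
#!/bin/sh
# Hypothetical helper: turn any label into a FAT-friendly name made only of
# ASCII letters, digits, dots, underscores and hyphens. Every run of other
# bytes (spaces, accented characters...) collapses into a single underscore.
safe_name() {
    printf '%s' "$1" | LC_ALL=C tr -cs 'A-Za-z0-9._-' '_'
}

# Hypothetical helper: print a YYYYMMDDHHMM.jpg name for a file.
# ASSUMPTION: uses the modification time (GNU "date -r FILE"),
# not the shooting time stored inside the picture.
timestamp_name() {
    printf '%s.jpg' "$(date -r "$1" +%Y%m%d%H%M)"
}
```

For example, safe_name "trip to Brazil" prints trip_to_Brazil, which can then be prefixed with the date to build a folder name.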

3. Get rid of similar photographs. Now!

During a vacation it makes a lot of sense to take as many shots as possible but... do you really need to keep 15 almost identical photographs of you in that pool? They'll bore you and your relatives to death and fill your drive much more quickly than you'd think! So don't hesitate. Open digiKam's Light Table (Ctrl-L), or a dedicated tool like Darktable, to look at similar pictures side by side and throw away the worst ones.

4. Remove duplicates

There's another category of pictures that should be automatically eliminated as soon as possible: actual duplicates. They may appear in your folders if, by mistake, you download the same pictures twice, or if you restore some folders from a backup (both things happened to me). Any good photo manager will be able to help you find duplicates, but why do by hand something that computers can and should do by themselves? I routinely find duplicate pictures with this great little shell script by J. Elonen.
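To be clear, what follows is not the script mentioned above. It is a minimal sketch of one common way to spot exact duplicates, assuming GNU coreutils: files with the same SHA-256 checksum are treated as copies of each other. The function name find_dupes is hypothetical, the sketch assumes file names without newlines, and it only lists candidates without deleting anything.

```shell
#!/bin/sh
# Hypothetical helper: list byte-identical files under a directory
# (default: the current one), one "checksum  path" line per file.
# Files sharing a checksum appear on consecutive lines.
find_dupes() {
    find "${1:-.}" -type f -exec sha256sum {} + |
        sort |
        awk '{
            if ($1 == prev_hash) {
                # Print the first member of a group only once.
                if (!shown) print prev_line
                print $0
                shown = 1
            } else {
                shown = 0
            }
            prev_hash = $1
            prev_line = $0
        }'
}
```

Running find_dupes /photo would then print every group of identical files, leaving to you the decision of which copy to keep.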

5. Geotag!

In our case, geotagging means writing geographical coordinates inside digital pictures. This makes it possible to search and display them by location in lots of neat ways. Geotagging is easy too! In digiKam, select all the pictures taken in the same location, click on Image | Geolocation | Edit coordinates, find that location on the map, and you're done! From now on, whenever you select a picture and click on the small Geolocation icon in the right pane you'll see where it was taken. Alternatively, a click on the world icon on the left will let you perform "Map Searches" that find all pictures taken within the area you select.
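If you prefer the command line, the same kind of tagging can be sketched with ExifTool, which writes the standard EXIF GPS tags. This is only an illustration, assuming ExifTool is installed. The function name geotag_dir and the sample coordinates are made up for the example. Coordinates are passed as positive decimal degrees, with the hemisphere given separately as N/S and E/W.

```shell
#!/bin/sh
# Hypothetical helper: write the same coordinates into every .jpg
# file of one folder.
# Arguments: folder, latitude, N or S, longitude, E or W.
geotag_dir() {
    dir=$1
    lat=$2
    lat_ref=$3
    lon=$4
    lon_ref=$5
    exiftool -GPSLatitude="$lat" -GPSLatitudeRef="$lat_ref" \
             -GPSLongitude="$lon" -GPSLongitudeRef="$lon_ref" \
             "$dir"/*.jpg
}

# Example call (illustrative values, roughly Rio de Janeiro):
# geotag_dir 20110508_trip_to_Brazil 22.9068 S 43.1729 W
```

By default ExifTool keeps an untouched copy of each file it modifies, with _original appended to the name, so a wrong coordinate is easy to undo.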

6. Back up as often as possible

As obvious as this may seem, it never hurts to say it clearly. When (not if, when) one of your drives breaks, you'll want to have a backup that is as recent as possible. The only way to make this happen for thousands of pictures is to make it really quick and simple. A very portable way to do it is to use the rsync utility as follows (DRIVE is the absolute path to your backup drive, something like /media/usb_drive):

rsync -rpvt --delete /photo/ DRIVE/photo

This command will not preserve symbolic links and other attributes that may not be supported on some file systems, but it will make sure that the photo folder on the backup drive is always a perfect copy of the main one (/photo/ in this example). rsync doesn't rewrite everything from scratch; it only changes what actually needs changing. Therefore, it's really fast after the first complete backup.
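Because --delete removes from the backup drive every file that is no longer in the main folder, it can be reassuring to preview a run before doing it for real. Here is a minimal sketch, with a hypothetical function name (preview_backup), that adds rsync's -n (--dry-run) option to the same set of options:

```shell
#!/bin/sh
# Hypothetical wrapper: show what the backup WOULD copy and delete,
# without touching any file. -n is rsync's --dry-run option; the other
# options are the ones used in the command above.
# Arguments: source folder (with trailing slash), destination folder.
preview_backup() {
    rsync -rptvn --delete "$1" "$2"
}

# Example call, with DRIVE standing for the path of your backup drive:
# preview_backup /photo/ DRIVE/photo
```

Keep the trailing slash on the source: it tells rsync to copy the contents of /photo/ into the destination folder, rather than creating another photo folder inside it.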

7. Anonymize before publishing online!

Tags and embedded comments are essential to organize pictures, but may not always be suitable for general consumption. If you love to use tags like "Relatives_I_Really_Hate" you probably don't want them to remain in the files that you publish on Flickr or email to your colleagues. The solution? Always run exiftool before handing out your tagged pictures! This command at a prompt will remove all the embedded metadata (EXIF tags included) from all the .jpg files in the current directory:

exiftool -all= *.jpg
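Keep in mind that exiftool rewrites the files it is given, saving each original next to it with _original appended to the name. To avoid touching the archive at all, you could strip only copies made for publication. The function below is a minimal sketch of that idea. The name publish_copy and both folder arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy the .jpg files of one album into a separate
# export folder, then strip all metadata from the copies only.
# Arguments: album folder, export folder.
publish_copy() {
    src=$1
    out=$2
    mkdir -p "$out" || return 1
    cp -p "$src"/*.jpg "$out"/ || return 1
    # -overwrite_original: no *_original backups in the export folder,
    # since the real originals are still safe in the album.
    exiftool -all= -overwrite_original "$out"/*.jpg
}

# Example call (folder names are illustrative):
# publish_copy 20110508_trip_to_Brazil /tmp/to_publish
```

The tagged pictures stay intact in the album, and only the files in the export folder are safe to hand out.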

About

Marco Fioretti is a freelance writer and teacher whose work focuses on the impact of open digital technologies on education, ethics, civil rights, and environmental issues.

4 comments
sailor009

Hi Mr Fioretti, You wrote an article in Linux Format about using ExifTool for renaming pic files. The scripts were supposed to be on the DVD, but I could not find them. Could you reference the scripts or send them to me? (drhall_009@yahoo.com) Thanks for your help, I am really interested in working with these scripts with my pics. Doug

TAPhilo

The more organized you are, which means following a very structured organizing method, the easier it becomes to find any photo, no matter how many years later you look for it. One thing to note: organizing for yourself is VASTLY different than organizing for someone else. If multiple people use the same system to find photos, then you have to ensure that, whatever method you use, they can find them easily too. So, hopefully, you organize in a way that makes it intuitive for others to find that one photo out of the tens of thousands you may have.

Neon Samurai

I've been using Jhead to rename image files for a while now. Give it a standard naming format and point it at your directory of images; whammoo.. nice and quick.

http://www.sentex.net/~mwandel/jhead/usage.html

  • %d Day of month as decimal number (01 - 31)
  • %H Hour in 24-hour format (00 - 23)
  • %j Day of year as decimal number (001 - 366)
  • %m Month as decimal number (01 - 12)
  • %M Minute as decimal number (00 - 59)
  • %S Second as decimal number (00 - 59)
  • %U Week of year as decimal number, with Sunday as first day of week (00 - 53)
  • %w Weekday as decimal number (0 - 6; Sunday is 0)
  • %y Year without century, as decimal number (00 - 99)
  • %Y Year with century, as decimal number

Example:

jhead -n%Y%m%d-%H%M%S *.jpg

This will rename files matched by *.jpg according to YYYYMMDD-HHMMSS. I like:

jhead -ncollection-%Y%m%d-%H%M%S *.jpg

Then you just need to use your favorite tool to remove duplicates if you're combining multiple copies of the same image set.

mfioretti

"organizing for yourself is VASTLY different than organizing for someone else" TAPhilo, well said. A good part of what I wrote in tip #2 comes just from this consideration: giving a picture a name that is its timestamp is probably the only thing that will always be self-documenting and meaningful, no matter who finds that photograph. In addition to this, here's another tip about (meta)-naming that I forgot to put in the post: if you put people's names in EXIF tags to index your pictures, put REAL, complete names. "Mom" or "Dad" are only appropriate and make sense to the sons and daughters of those people. What about all the other members of the family, or all the friends who will see those pictures? They won't know without ambiguity who "Mom" is. So if your mother's name is "Jane Smith", that's what you should use as a tag, if you want to organize your pictures in a way that is really useful and easy to search for everybody else with whom you may share your pictures. Yes, it's impersonal, but it works much better, and you can always add personal comments and thoughts about Mom in another field.