Photographs are wonderful. Digital photographs are even better. We can take, archive and share as many of them as we want, at the smallest possible cost… if we pay the right price. A digital archive of many thousands of images (which takes surprisingly little time to build up!) doesn’t tolerate carelessness. If you don’t build and manage it properly from the beginning, it will be worse than not having it at all. Here are a few quick tips to keep normal JPEG digital photographs under control, with GUI or command line tools that work in any GNU/Linux distribution. If you generate and store high quality raw pictures you’ll have to follow some extra steps, but that’s material for another article.
1. Use tool-independent, plain old file system based storage
Some picture managers store pictures and metadata inside a database to improve performance. If that’s their only storage option, stay away from them. The first thing you want from your archive is the guarantee that both the pictures and the associated metadata will always be fully accessible from any computer, now and in the future. In practice, this means you can use any software you like, as long as it:
- works without complaints with albums that are simple directories, maybe created by other programs
- can dump all the metadata you use in the pictures themselves, in standard formats like EXIF or IPTC.
Why? Because only archives like these remain accessible and more or less ordered and immediately usable even when you change software or operating system, no matter where and how they were backed up!
Those are the main reasons why I use digiKam (which is full of cool features anyway!). You only need to tell it where the pictures are in Settings | Configure DigiKam | Collections and start working. Whenever you want to save tags and other metadata in the pictures, select Image | Write metadata to image and they will remain available with the files even if you change software (but remember tip #7!).
2. Use a consistent naming scheme. Always
Always use one, and only one, file naming scheme, and make sure it doesn’t allow spaces or non-ASCII characters. Otherwise, sooner or later you’ll encounter two problems. The first, at least in the short to medium term, is a less portable collection, which means fewer backup options: what if you must give a friend some pictures, or your main backup drive is full, and the only option available is a FAT USB key?
The other problem is coherence. Whatever naming scheme you devise and follow religiously, sooner or later you’ll want to archive photographs that were scanned manually or named in some other way by a friend. The only way to keep a common order is to have a naming scheme that tools like Krename (or digiKam itself, albeit less flexibly) can apply automatically to rename all new pictures. Personally, I create folders with names like 20110508_trip_to_Brazil and picture names that are simply their own timestamp, e.g., YYYYMMDDHHMM.jpg.
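If you prefer a prompt to Krename, the same scheme can be applied with a few lines of shell. Take the function below as a sketch, not as the one right way to do it: rename_by_mtime is a made-up name, it relies on the -r option of GNU date, and it takes the timestamp from each file’s modification time. That is an assumption that only holds if the files were never edited or copied without preserving their times; the capture time stored in the EXIF data would be a more reliable source.

```shell
#!/bin/sh
# Sketch: rename every *.jpg / *.JPG file in one directory to YYYYMMDDHHMM.jpg.
# ASSUMPTION: the modification time is used as the picture's timestamp.
rename_by_mtime() {
    dir=$1
    for f in "$dir"/*.jpg "$dir"/*.JPG; do
        [ -f "$f" ] || continue                  # glob matched nothing
        stamp=$(date -r "$f" +%Y%m%d%H%M)        # GNU date: -r FILE reads the mtime
        target="$dir/$stamp.jpg"
        n=1
        # Two shots in the same minute: use _1, _2, ... instead of overwriting.
        while [ -e "$target" ] && [ "$target" != "$f" ]; do
            target="$dir/${stamp}_$n.jpg"
            n=$((n + 1))
        done
        # Leave files alone when they already carry the right name.
        [ "$target" = "$f" ] || mv -- "$f" "$target"
    done
}
```

Something like rename_by_mtime /photo/20110508_trip_to_Brazil renames the pictures of that folder in place, so try it on a copy first.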
3. Get rid of similar photographs. Now!
During a vacation it makes a lot of sense to take as many shots as possible but… do you really need to keep 15 almost identical photographs of you in that pool? They’ll bore you and your relatives to death and fill your drive much more quickly than you’d think! So don’t hesitate. Open digiKam’s Light Table (Ctrl-L), or a dedicated tool like Darktable, to look at similar pictures side by side and throw away the worst ones.
4. Remove duplicates
There’s another category of pictures that should be eliminated automatically, and as soon as possible: actual duplicates. They may appear in your folders if, by mistake, you download the same pictures twice, or if you restore some folders from a backup (both things happened to me). Any good photo manager will be able to help you find duplicates, but why do by hand something that computers can and should do by themselves? I routinely find duplicate pictures with this great little shell script by J. Elonen.
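To give an idea of how such a script can work, here is a minimal sketch. It is not J. Elonen’s script: find_duplicates is a made-up name, and the function assumes GNU find and uniq plus the sha256sum utility from coreutils. It computes a checksum of every JPEG file under a folder and prints the groups of files whose content is byte-for-byte identical. It only lists the duplicates; deciding which copy to delete is left to you.

```shell
#!/bin/sh
# Sketch: list JPEG files with identical content under the directory $1.
# Each output line is "<sha256 checksum>  <path>"; groups of identical
# files are separated by an empty line.
find_duplicates() {
    find "$1" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) \
            -exec sha256sum {} + \
        | LC_ALL=C sort \
        | uniq -w 64 --all-repeated=separate   # compare only the 64 checksum characters
}
```

For example, find_duplicates /photo prints nothing at all when the archive contains no duplicates.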
5. Geotag your pictures
In our case, geotagging means writing geographical coordinates inside digital pictures. This makes it possible to search and display them by location in lots of neat ways. Geotagging is easy too! In digiKam, select all the pictures taken in the same location, click on Image | Geolocation | Edit coordinates, find that location on the map, and you’re done! From now on, whenever you select a picture and click on the small Geolocation icon in the right pane you’ll see where it was taken. Alternatively, a click on the world icon on the left will let you perform “Map Searches” that find all pictures taken within the area you select.
6. Back up as often as possible
As obvious as this may seem, it never hurts to say it clearly. When (not if, when) one of your drives breaks, you’ll want to have a backup that is as recent as possible. The only way to make this happen for thousands of pictures is to make it really quick and simple. A very portable way to do it is to use the rsync utility as follows (DRIVE is the absolute path to your backup drive, something like /media/usb_drive):
rsync -rpvt --delete /photo/ DRIVE/photo
This command will not preserve symbolic links, ownership and other attributes that may not be supported on some file systems, but will make sure that the photo folder on the backup drive is always a perfect copy of the main one (/photo/ in this example). Be aware that --delete also removes from the backup every file that is no longer in the main folder. rsync doesn’t rewrite everything from scratch: it only changes what actually needs changing. Therefore, it’s really fast after the first complete backup.
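Since a quick and simple backup is one you don’t have to think about, it may be worth wrapping the rsync call in a small function. The one below is just a sketch: backup_photos is a made-up name, and the only thing it adds is a check that both folders exist, so that a backup drive that isn’t plugged in produces a clear message instead of an rsync error.

```shell
#!/bin/sh
# Sketch: mirror the photo folder $1 into the "photo" folder of the drive $2.
backup_photos() {
    src=$1      # main photo folder, e.g. /photo
    drive=$2    # mount point of the backup drive, e.g. /media/usb_drive
    if [ ! -d "$src" ] || [ ! -d "$drive" ]; then
        echo "backup_photos: '$src' or '$drive' is not a directory, nothing done" >&2
        return 1
    fi
    # Same options as the command in the text; the trailing slash after the
    # source makes rsync copy the folder's content, not the folder itself.
    rsync -rpvt --delete "${src%/}/" "${drive%/}/photo"
}
```

A call would look like backup_photos /photo /media/usb_drive. To see what rsync would copy or delete without touching anything, its --dry-run option can be added to the command.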
7. Anonymize before publishing online!
Tags and embedded comments are essential to organize pictures, but may not always be suitable for general consumption. If you love to use tags like “Relatives_I_Really_Hate” you probably don’t want them to remain in the files that you publish on Flickr or email to your colleagues. The solution? Always run exiftool before handing out your tagged pictures! This command at a prompt will remove the embedded metadata, EXIF and IPTC tags included, from all the .jpg files in the current directory. By default exiftool keeps an untouched copy of each picture next to it, with _original appended to the file name, so make sure you publish the cleaned files and not those copies:
exiftool -all= *.jpg
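Since the tags are what keeps the archive organized, you may prefer to strip them from copies and leave the archive alone. The function below is only a sketch of that idea: publish_copies and the output folder are made-up names, and it assumes that exiftool is installed and that the pictures end in .jpg.

```shell
#!/bin/sh
# Sketch: copy the pictures of an album to a separate folder and strip
# the metadata from the copies only.
publish_copies() {
    album=$1     # folder with the tagged originals
    outdir=$2    # hypothetical folder for the cleaned copies
    command -v exiftool >/dev/null 2>&1 || {
        echo "publish_copies: exiftool not found" >&2
        return 1
    }
    mkdir -p "$outdir" || return 1
    cp -p -- "$album"/*.jpg "$outdir"/ || return 1
    # -all= clears the metadata; -overwrite_original avoids leaving
    # *_original backup files (still tagged) next to the cleaned copies.
    exiftool -all= -overwrite_original "$outdir"/*.jpg
}
```

After something like publish_copies 20110508_trip_to_Brazil /tmp/to_publish, the files in /tmp/to_publish are the ones to upload, while the album keeps all its tags.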