Tuesday, September 27, 2016

Moving my archive of photos from Picasa to Digikam whilst preserving keywords and geotagging

I have a ridiculous number of images archived: over 24,000 files occupying 390 GB of space and extending back to when I first had a digital camera in 2003. I store them on a 1 TB external USB hard drive and use an online backup service (CrashPlan) to make sure they are continuously backed up both locally on a NAS box and offsite, to cloud storage.

Clearly, such a vast collection of images is useless unless you can find the ones you want, so they need to be organised and catalogued in some way. I have used Google Picasa to do this:
  1. Adding keywords, such as the identity of the species depicted - e.g. Syrphus ribesii, Syrphidae, Diptera, Fleabane.
  2. Geotagging the photos with the location where they were taken.
Unfortunately, Google announced it was no longer supporting the desktop version of Picasa as of March 2016 and has moved to the cloud-based Google Photos. This is useless to me because it doesn't support keywording or geotagging. The last version of Picasa released was 3.9 in October 2015. Of course, Picasa has not stopped working, but it gets steadily less useful because:
  1. I have recently bought a Canon 80D and, for that model, Canon have yet again changed their RAW format (images still get a .CR2 extension). Whilst much software has been modified to support the new format, Picasa will not be updated and doesn't display the files properly (the thumbnails come out pale and mauve).
     [Image: Picasa displaying Canon 80D RAW files as thumbnails.]
  2. The Google Maps API has moved on since support stopped and the geotagging functions in Picasa 3.9 no longer work properly, making it impossible to geotag new images.
So I looked for a new tool to replace Picasa and, after lots of searching, reading reviews and trying out several pieces of software, I decided on Digikam, which provides the facilities I want.

Up until 2010, I had a Nikon Coolpix 4500 and that did not support RAW format. The images it produced were stored as .JPG files. With JPEG images, Picasa stored keywords as IPTC format metadata and geotags as EXIF data directly in the image file. This is all available to Digikam and the keywords are picked up and stored in Digikam's database and images are shown or queried correctly on its maps. All good! However, Picasa did not modify RAW files. Keywords and geotags were not stored in the .CR2 files produced by my Canon DSLRs but in a ".picasa.ini" file placed in each directory of my archive. Here is what one of these files looks like:

[_MG_5324.CR2]
keywords=Brent Goose,Holkham NNR
backuphash=3966
geotag=52.965313,0.814490
[_MG_5325.CR2]
keywords=Holkham NNR,Pink-footed Goose
backuphash=3966
geotag=52.965313,0.814490
[_MG_5327.CR2]
keywords=Holkham NNR,Wigeon
backuphash=3966
geotag=52.965313,0.814490


Digikam does not know how to deal with stuff in this format. So I am left with an awful lot of images which I have geotagged and assigned keywords, but my chosen tool cannot use them!
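
The format itself is simple: a [section] header naming each image, followed by key=value lines. As a minimal sketch of how it can be read, here is a parser using only base R, so it runs without any packages installed. The function name parse_picasa_ini() is made up for this illustration and is not part of Picasa, Digikam or any R package; the sample lines are copied from the excerpt above.

```r
# Sample content copied from the .picasa.ini excerpt above.
ini_lines <- c(
  "[_MG_5324.CR2]",
  "keywords=Brent Goose,Holkham NNR",
  "backuphash=3966",
  "geotag=52.965313,0.814490",
  "[_MG_5325.CR2]",
  "keywords=Holkham NNR,Pink-footed Goose",
  "backuphash=3966",
  "geotag=52.965313,0.814490"
)

# parse_picasa_ini() is a hypothetical helper written for this example.
# It returns a named list with one element per [section]; each element
# is a named list of the key=value strings found in that section.
parse_picasa_ini <- function(lines) {
  out <- list()
  section <- NULL
  for (line in trimws(lines)) {
    if (grepl("^\\[.*\\]$", line)) {
      # A section header names the image file the following keys apply to.
      section <- substr(line, 2, nchar(line) - 1)
      out[[section]] <- list()
    } else if (!is.null(section)) {
      # Split on the first "=" only, so a value may itself contain "=".
      pos <- regexpr("=", line, fixed = TRUE)
      if (pos > 1) {
        key <- substr(line, 1, pos - 1)
        out[[section]][[key]] <- substr(line, pos + 1, nchar(line))
      }
    }
  }
  out
}

entries <- parse_picasa_ini(ini_lines)

# Keywords are a comma-separated list; the geotag is "latitude,longitude".
keywords <- strsplit(entries[["_MG_5324.CR2"]]$keywords, ",")[[1]]
latlon <- as.numeric(strsplit(entries[["_MG_5324.CR2"]]$geotag, ",")[[1]])
print(keywords)
print(latlon)
```

This only covers the three keys shown in my files; Picasa may well write other keys that I have not come across.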

I spent some time searching for solutions to this and got a hint from a blog post by a French photographer, Michaël Delorme, from which I was able to download a PHP script. This finds the .picasa.ini files in the directory structure it is pointed at, reads and parses each .ini file in turn and then uses the excellent command line tool Exiftool by Phil Harvey to write the required information directly into each image's metadata. I have borrowed this idea, but written an R script to do the same job. Here is the R script I came up with:

library(ini)
library(tools)

archivePath <- "F:/PhotoArchive"

inis <- list.files(path=archivePath, recursive=TRUE,
                   full.names=TRUE, pattern="picasa\\.ini$",
                   ignore.case=TRUE, all.files=TRUE)

pb <- txtProgressBar(min = 0, max = length(inis), style = 3)
for(i in seq_along(inis)){
    ini <- read.ini(inis[i])
    setTxtProgressBar(pb, i)
    for(f in seq_along(ini)){
        fname <- names(ini)[f]
        if(file_ext(fname)=="CR2"){
            doit <- FALSE
            args <- " -overwrite_original_in_place"
            if(!is.null(ini[[f]]$keywords)){
                kwds <- unlist(strsplit(ini[[f]]$keywords, ","))
                args <- paste(args, " -iptc:keywords=",
                              paste("\"", kwds, "\"",
                                    collapse=" -iptc:keywords+=", sep=""),
                              sep="")
                doit <- TRUE
            }
            if(!is.null(ini[[f]]$geotag)){
                xy <- unlist(strsplit(ini[[f]]$geotag, ","))
                lat <- as.numeric(xy[[1]])
                lon <- as.numeric(xy[[2]])
                if(lon>=0) lonref <- "E" else lonref <- "W"
                if(lat>=0) latref <- "N" else latref <- "S"
                args <- paste(args, " -GPSLatitude=", lat,
                                 " -GPSLatitudeRef=", latref,
                                 " -GPSLongitude=", lon,
                                 " -GPSLongitudeRef=", lonref,
                                 " -GPSAltitude=0 -GPSAltitudeRef=0", sep="")
                doit <- TRUE
            }
            if(doit){
              # Quote the path so directory names containing spaces survive
              args <- paste(args, shQuote(file.path(dirname(inis[i]), fname)))
              x <- system2("exiftool", args=args, stdout=TRUE)
            }
        }
    }
}
close(pb)


This is not necessarily written in good R style (it uses loops rather than being vectorised) and it is certainly not fast! But it did the job for me (taking some hours to run) and is only needed once.
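
If you would like to see what is going to be passed to Exiftool before letting it loose on your files, the argument-building part can be pulled out into a function and run on its own. The sketch below is one way to do that, under a few assumptions: build_exiftool_args() is a name I have made up, the folder "Holkham trip" in the example path is hypothetical, and I have left out the -overwrite_original_in_place and GPSAltitude options used in the script above to keep it short. It returns one element per argument rather than a single long string, so each one can be quoted separately.

```r
# build_exiftool_args() is a hypothetical helper for illustration only.
# path:     full path to the RAW file
# keywords: comma-separated string as stored by Picasa, or NULL
# geotag:   "latitude,longitude" string as stored by Picasa, or NULL
build_exiftool_args <- function(path, keywords = NULL, geotag = NULL) {
  args <- character(0)
  if (!is.null(keywords)) {
    kwds <- trimws(strsplit(keywords, ",")[[1]])
    for (k in seq_along(kwds)) {
      # As in the script above: "=" for the first keyword, "+=" for the rest.
      op <- if (k == 1) "=" else "+="
      args <- c(args, paste0("-iptc:keywords", op, kwds[k]))
    }
  }
  if (!is.null(geotag)) {
    xy <- as.numeric(strsplit(geotag, ",")[[1]])
    # The hemisphere goes in the Ref tags, so the values are written unsigned.
    args <- c(args,
              paste0("-GPSLatitude=", abs(xy[1])),
              paste0("-GPSLatitudeRef=", if (xy[1] >= 0) "N" else "S"),
              paste0("-GPSLongitude=", abs(xy[2])),
              paste0("-GPSLongitudeRef=", if (xy[2] >= 0) "E" else "W"))
  }
  c(args, path)
}

# Dry run: build and print the arguments without touching any file.
args <- build_exiftool_args("F:/PhotoArchive/Holkham trip/_MG_5324.CR2",
                            keywords = "Brent Goose,Holkham NNR",
                            geotag = "52.965313,0.814490")
cat(args, sep = "\n")

# To actually run it, quote each argument so embedded spaces survive:
# x <- system2("exiftool", args = shQuote(args), stdout = TRUE)
```

Printing the arguments for a handful of files first is a cheap way to spot a badly formed keyword or geotag before any RAW file is modified.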

So, here is the step-by-step how-to, if you find yourself in a similar position. Use it at your own risk!

  1. THIS MODIFIES THE ORIGINAL RAW FILES SO MAKE SURE YOU HAVE A GOOD BACKUP BEFORE YOU START.
  2. Download and install Exiftool. I downloaded the zip version for Windows (I am on version 10.27). There is no installation involved: you just unzip the executable and put it somewhere, but that somewhere does need to be in your path (so Windows can find it). The simplest way to achieve this is to put it in your C:\Windows directory. Otherwise, add the directory where you chose to put it to your PATH system variable as follows (Windows 10; earlier versions differ slightly):
    1. Right click on "This PC" - either the icon on your desktop or the item in your start menu - and choose "Properties". 
    2. Click "Advanced system settings". 
    3. Open the "Advanced" tab in the System Properties window and click the [Environment Variables...] button near the bottom.  
    4. Select the PATH item in the System variables list and click the [Edit] button. 
    5. Add your path (without a trailing \) to the bottom of the list in the Edit environment variables window. 
    6. Press OK about 3 times to close all these windows.
  3. The file that is unzipped is called "exiftool(-k).exe". I changed this file name to "exiftool.exe" so it can be executed just by giving the command "exiftool". 
  4. Test it is all working correctly by opening a command window (press Windows key-R, type "cmd" and click [OK]), type "exiftool" and press return. You should see a load of help information from the tool. If instead it says "'exiftool' is not recognized as an internal or external command", then it isn't set up correctly! Check you renamed it and it is in a directory in your path.
  5. I assume you already have R installed (I am using version 3.3.1) and I am using RStudio (I am on version 1.0.12) to provide a convenient GUI front-end. If you haven't used R before, there are plenty of good tutorials out there.
  6. Install the ini package for R. This will be used to parse the ini files. In RStudio, open the Packages pane in the lower right-hand window, click Install, type "ini" in the Packages prompt and click the [Install] button.
  7. Open a new script (File - New File - RScript) and paste in the above script.
  8. Modify line 4 archivePath <- "F:/PhotoArchive" to point to the root path of your archive. Note that R treats "\" inside a string as an escape character, so write file paths with "/" (or a doubled "\\") rather than the single "\" usual in Windows.
  9. As it stands it is looking for Canon RAW files with the extension .CR2. If your photos are from some other make of camera, you will need to change the extension it searches for in line 16 if(file_ext(fname)=="CR2"){. For example, if you use a Nikon, this should read if(file_ext(fname)=="NEF"){.
  10. As it stands this will OVERWRITE your RAW files with a new version containing keywords and/or GPS metadata. Without the -overwrite_original_in_place option, Exiftool would instead keep the original version of each file, adding "_original" to the end of the file name. If you want it to do this, remove the option from line 18, which will then read args <- "" (i.e. initialised to an empty string).
  11. Save the file and run it (click Source at the top-right corner of the script window).
  12. It shows a % done progress bar as it runs. Go and have a cuppa or mow the lawn whilst it does its thing...
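
Before setting off a run that takes hours, the setup from steps 2 to 8 can be checked from the R console. This is only a rough sketch: the archive path is the one from my script, and the file in sample_file is a hypothetical name that you would replace with one of your own images. The packages step can also be done from the console with install.packages("ini").

```r
archivePath <- "F:/PhotoArchive"   # same root path as in the script

# Steps 2-4: Sys.which() gives "" when the executable is not on the PATH.
have_exiftool <- nzchar(unname(Sys.which("exiftool")))

# Step 6: TRUE if the ini package is installed (it is not attached here).
have_ini <- requireNamespace("ini", quietly = TRUE)

# Step 8: does the archive root exist?
have_archive <- dir.exists(archivePath)

checks <- c(exiftool = have_exiftool, ini = have_ini, archive = have_archive)
print(checks)
if (!all(checks)) {
  message("Not ready yet, check: ",
          paste(names(checks)[!checks], collapse = ", "))
}

# After the run: read the tags back from one file as a spot check.
# sample_file is a hypothetical path; substitute a real image of yours.
sample_file <- file.path(archivePath, "_MG_5324.CR2")
if (have_exiftool && file.exists(sample_file)) {
  tags <- system2("exiftool",
                  args = c("-IPTC:Keywords", "-GPSLatitude", "-GPSLongitude",
                           shQuote(sample_file)),
                  stdout = TRUE)
  print(tags)
}
```

If the spot check prints the keywords and coordinates you expect, Digikam should be able to pick them up when it next scans the files.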