I recently found out that Google Photos makes it difficult to get metadata out of your photos. They were also using up all my cloud storage. So I decided to self-host my photo library. To get the original photos with metadata intact, I had to export them using Google Takeout. I had about 1 TB of Google Photos data, which I had to download in 50 GB chunks. If you’re downloading to a flash drive, there’s a good chance it’s formatted as FAT32, which caps individual files at 4 GB, so you’ll need to download 4 GB chunks instead. 4 GB * 250 downloads is quite a bit, and Google makes you download each archive manually. So yeah, don’t use Google Photos as a photo backup solution. It’s free and the search tools are pretty good, though.
Once I downloaded all the archives, I deleted any duplicates using rclone. I chose to dedupe by hash and automatically delete every file except the first result. The first result may not have the name I want, but I’m going to use exiftool to rename everything later anyway. I redirected output to a log file in case I need to analyze things later.
rclone dedupe --by-hash --dedupe-mode first --no-console --log-file ../rclone.log .
Next, I organized everything into directories by date using exiftool. If a picture had metadata with the date it was taken, I used that; if not, I fell back to the file’s modification time. In the command below, exiftool gives priority to the last valid assignment, so datetimeoriginal wins when it exists, then createdate, then filemodifydate. I redirected output to a log file so I could address things like duplicate files later.
exiftool -r -d %Y/%m/%d "-directory<filemodifydate" "-directory<createdate" "-directory<datetimeoriginal" -overwrite_original . 1> ../exiftool.log 2>&1
I’m trying to think of some other identifying information to add to the directory names, like the location, activity, or event. Location is pretty easy to get in a parseable format.
exiftool -r -gpslatitude -gpslongitude -n -json . | jq .
Then I just gotta reverse geocode it.
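Here’s a rough sketch of what that could look like. It reads the JSON that exiftool prints on stdin and looks each coordinate pair up with the reverse_geocoder package, which works offline against a bundled GeoNames dataset. The package choice and the script name are just my assumptions; any geocoder (or the Nominatim API) would do.

#!/usr/bin/env python3
# reverse_geocode.py -- sketch only: map exiftool GPS output to place names.
# Assumes `pip install reverse_geocoder`; swap in whatever geocoder you prefer.
import json
import sys

import reverse_geocoder as rg

records = json.load(sys.stdin)  # the array printed by exiftool -json

for rec in records:
    lat, lon = rec.get("GPSLatitude"), rec.get("GPSLongitude")
    if lat is None or lon is None:
        continue  # this file has no GPS tags
    place = rg.search((lat, lon))[0]  # nearest populated place
    print(f'{rec["SourceFile"]}: {place["name"]}, {place["admin1"]}, {place["cc"]}')

Something like exiftool -r -gpslatitude -gpslongitude -n -json . | python3 reverse_geocode.py would then print a candidate place name per photo, which could be folded into the directory names.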
I’d also like to do some deduplication using an image similarity detector. Facebook open-sourced a tool that looks like it might work.
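I haven’t tried it yet, so for now here’s a much simpler sketch of the idea using perceptual hashes. To be clear, this is not that tool: it uses the imagehash and Pillow packages, only looks at JPEGs, and the Hamming-distance threshold of 4 is a guess that would need tuning.

#!/usr/bin/env python3
# Sketch: flag visually similar images via perceptual hashing (pip install imagehash pillow).
import sys
from pathlib import Path

import imagehash
from PIL import Image

THRESHOLD = 4  # max Hamming distance between hashes to call two images "similar"

seen = []  # (hash, path) for every image kept so far
for path in Path(sys.argv[1]).rglob("*.jpg"):
    try:
        h = imagehash.phash(Image.open(path))
    except OSError:
        continue  # skip unreadable or corrupt files
    for other_hash, other_path in seen:
        if h - other_hash <= THRESHOLD:
            print(f"{path} looks like {other_path}")
            break
    else:
        seen.append((h, path))

The pairwise comparison is quadratic, so this would crawl on a full 1 TB library; it’s only meant to show the shape of the problem before reaching for the real tool.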