May 22, 2019

Grinder - Rsync the pilot directory

I now have the pilot directory (the home directory for the user "pilot") mounted via NFS as /crater_d0/pilot -- as discussed elsewhere. Now I would like to mirror this to a second Linux machine using rsync. I want to use rsync so I can update the mirror daily while a casting is in progress.

There are two complications. One is that the machine is behind a firewall and I need to use some special ssh options, as well as an alternate port number. The other is that the pilot directory contains lots of files that I don't want to mirror, and excluding them is tricky.

Some comments on rsync

I always find dealing with rsync to be a trial-and-error process. Much of its behavior is unique and non-intuitive, the documentation is lengthy and complex, and we are impatient. The --dry-run option is useful for checking that a command will do what you intend before it transfers lots of files. Note that --dry-run can be tacked onto the end of a command, which makes it easy to peel off once you want to do the real thing.
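
One way to make the rehearsal step routine is to hide the flag behind a switch. The following is only a sketch: the names RSYNC, DRY_RUN and run_rsync are made up for illustration and are not part of rsync itself.

```shell
#!/bin/sh
# Sketch only: RSYNC, DRY_RUN and run_rsync are illustrative names.
RSYNC=${RSYNC:-rsync}

# run_rsync ARGS...
# Passes ARGS to rsync unchanged, adding --dry-run at the end when DRY_RUN=1.
run_rsync() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Rehearsal: rsync reports what it would transfer but changes nothing.
        "$RSYNC" "$@" --dry-run
    else
        "$RSYNC" "$@"
    fi
}
```

With this in a script, setting DRY_RUN=1 in the environment rehearses every run_rsync call, and leaving it unset does the real transfer.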

I sneak up on this by trying to transfer just one file, namely the usual convenient file for such tests: /etc/passwd. I already have an account set up for myself on the remote machine, and have set up ssh keys to log in without a password. After some experimentation, I find that the following command works.

RHOST=tom@casting.as.arizona.edu
rsync -v -e "ssh -oKexAlgorithms=+diffie-hellman-group1-sha1 -p 9999" $RHOST:/etc/passwd ./zz

The main difficulty was setting this up as a shell script and getting the quotes to work right. Anything resembling shell programming is always a pain in the ass, but never mind that for now.
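
The quoting only has to be gotten right once if the ssh command lives in a variable. This is a sketch under that assumption, not the script actually used here; SSH_CMD, RSYNC and pull are names invented for the example.

```shell
#!/bin/sh
# Sketch only: SSH_CMD, RSYNC and pull are illustrative names.
RHOST=tom@casting.as.arizona.edu
SSH_CMD="ssh -oKexAlgorithms=+diffie-hellman-group1-sha1 -p 9999"
RSYNC=${RSYNC:-rsync}

# pull REMOTE_PATH LOCAL_DEST [extra rsync options...]
pull() {
    src=$1
    dest=$2
    shift 2
    # Quoting "$SSH_CMD" keeps the whole ssh command as the single
    # argument that rsync's -e option expects.
    "$RSYNC" -v -e "$SSH_CMD" "$@" "$RHOST:$src" "$dest"
}
```

The single-file test then becomes "pull /etc/passwd ./zz", and a recursive transfer can add its own options at the end, for example "pull /crater_d0/pilot/pixels/ ./pixels -a".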

Now for the real data

I want to transfer *.imh and the entire pixels directory. I could probably do this in a single clever rsync command, but I am quite content to do each separately.

The pixels directory is easy, because I want everything in it. First I set up my target directory, creating the empty pixels directory. Then I run this command (in a shell script):

RHOST=tom@casting.as.arizona.edu
rsync -av -e "ssh -oKexAlgorithms=+diffie-hellman-group1-sha1 -p 9999" $RHOST:/crater_d0/pilot/pixels/ ./pixels

This works great and does exactly what I want. Note the trailing slash on the source directory: it tells rsync to copy the contents of pixels into ./pixels rather than creating ./pixels/pixels. Running it a second time grabs only changed files, so rsync is doing what it should.
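
The effect of the trailing slash is easy to check locally, with no remote machine involved. The sketch below uses a throwaway temporary directory; the file a.pix and the directory names with_slash and without_slash are made up for the demonstration.

```shell
#!/bin/sh
# Local-only demonstration; all file and directory names here are made up.
slash_demo() {
    tmp=$(mktemp -d) || return 1
    mkdir -p "$tmp/src/pixels" "$tmp/with_slash" "$tmp/without_slash"
    echo data > "$tmp/src/pixels/a.pix"

    # Trailing slash: copy the contents of pixels/ into the destination.
    rsync -a "$tmp/src/pixels/" "$tmp/with_slash"
    # No trailing slash: copy the directory itself, so pixels/ appears inside.
    rsync -a "$tmp/src/pixels" "$tmp/without_slash"

    # Show where the file ended up in each case, then clean up.
    (cd "$tmp" && find with_slash -type f && find without_slash -type f)
    rm -rf "$tmp"
}

# Run the demonstration only if rsync is installed on this machine.
if command -v rsync >/dev/null 2>&1; then
    slash_demo
fi
```

The file should land at with_slash/a.pix in the first case and at without_slash/pixels/a.pix in the second.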

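Getting the *.imh files involves some rsync trickery to avoid transferring a myriad of unwanted (and bulky) subdirectories. There is also an odd issue that comes up. For some reason, IRAF keeps a bunch of hidden files that are actually hard links to the imh file. So for x.imh, there is a hidden link ..x.imh. Rsync does not treat these as links (it only preserves hard links when given -H, which -a does not include), so it finds and transfers them as ordinary files, which is as it should be I suppose. Getting rsync to ignore these is not as easy as it should or could be. The trick is that rsync checks each file against the include and exclude rules in the order they appear on the command line, and the first rule that matches decides its fate, so once something is included, it is included for keeps. Since ..x.imh matches both ".*" and "*.imh", excluding all dot files first does the trick. The final exclude of "*" also throws out the subdirectories, so rsync never descends into them.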
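
The first-match-wins behavior can be mimicked with a shell case statement, which also stops at the first pattern that matches. This is a toy model for simple file names only, not rsync's actual matcher; decide and decide_swapped are names invented for the example, and notes.txt is a made-up file.

```shell
#!/bin/sh
# Toy model only: a case statement standing in for rsync's rule list.

# Rules in the order used here: exclude dot files, include *.imh, exclude the rest.
decide() {
    case $1 in
        .*)    echo exclude ;;   # --exclude=".*"
        *.imh) echo include ;;   # --include="*.imh"
        *)     echo exclude ;;   # --exclude="*"
    esac
}

# Same rules with the first two swapped: the hidden links now slip through.
decide_swapped() {
    case $1 in
        *.imh) echo include ;;   # --include="*.imh"
        .*)    echo exclude ;;   # --exclude=".*"
        *)     echo exclude ;;   # --exclude="*"
    esac
}

for name in x.imh ..x.imh notes.txt pixels; do
    echo "$name: $(decide "$name") (swapped order: $(decide_swapped "$name"))"
done
```

Only ..x.imh changes its answer between the two orderings, which is why the dot-file exclude has to come first.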

My final shell script to do this is as follows:

#!/bin/sh

RHOST=tom@casting.as.arizona.edu

#rsync -av -e "ssh -oKexAlgorithms=+diffie-hellman-group1-sha1 -p 9999" $RHOST:/crater_d0/pilot/pixels/ ./pixels --dry-run
rsync -av -e "ssh -oKexAlgorithms=+diffie-hellman-group1-sha1 -p 9999" $RHOST:/crater_d0/pilot/pixels/ ./pixels

#rsync -av -e "ssh -oKexAlgorithms=+diffie-hellman-group1-sha1 -p 9999" --exclude=".*" --include="*.imh" --exclude="*" $RHOST:/crater_d0/pilot/ . --dry-run
rsync -av -e "ssh -oKexAlgorithms=+diffie-hellman-group1-sha1 -p 9999" --exclude=".*" --include="*.imh" --exclude="*" $RHOST:/crater_d0/pilot/ .

Have any comments? Questions? Drop me a line!

Tom's home page / tom@mmto.org