April 20, 2018

The Rsync command

Rsync can be a great tool. I use it when I want to maintain identical copies of files on two different computers (or media). I have used it for backups when I have two computers connected by a reasonably fast link. I have also used it with removable media (such as a big external USB hard drive) in much the same way. The BIG advantage of using rsync is that it figures out how to make the two copies the same in a minimal and highly efficient way (i.e. without just doing a brute force copy).

Rsync, however, is terribly overfeatured and has more options and modes than anyone can keep straight. These are simply my notes about the few things that I regularly need to use. I rarely use it across the network from one machine to another, although it is good to know that it can do such things and that it even supports its own protocol for this kind of thing (when I have done this, I just use an ssh connection).

Dry run

Using one of the following is a great way to ensure you have things right before copying millions of files into some directory where you did not want them. The two forms are equivalent (-n is just the short spelling of --dry-run):
rsync --dry-run -av /old/ /new
rsync -n -av /old/ /new
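
If you want to convince yourself that a dry run really touches nothing, a small sandbox like the following should do it. This is only a sketch: the scratch directory and the file names are made up for the demonstration, and only the rsync switches come from the notes above.

```shell
# Throwaway sandbox; every path here is hypothetical, created just for this demo.
demo=$(mktemp -d)
mkdir -p "$demo/old/sub" "$demo/new"
echo "hello" > "$demo/old/a.txt"
echo "world" > "$demo/old/sub/b.txt"

# Dry run: rsync lists what it would transfer but writes nothing.
rsync -n -av "$demo/old/" "$demo/new"
after_dry_run=$(ls -A "$demo/new")    # empty string, nothing was copied
echo "after dry run: [$after_dry_run]"

# The same command without -n actually performs the copy.
rsync -av "$demo/old/" "$demo/new"
ls -R "$demo/new"

# Remove the sandbox with: rm -rf "$demo"
```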

Excluding files and directories

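This is weirder than it should be. The trick here is that exclude patterns are always relative to the source path, never absolute filesystem paths. Also avoid any leading slash (a leading slash anchors the pattern to the top of the transfer, not to the root of the filesystem). So, if your source is /usr/xxx and you want to exclude the directory /usr/xxx/spam, then just put "spam" as the pattern. Be aware that a pattern without a leading slash matches at any depth, so "spam" will also exclude something like /usr/xxx/foo/spam. The following is a real example that uses a file full of exclude patterns: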
rsync -av -e "ssh -p 4223" --exclude-from=crater_exclude root@casting.as.arizona.edu:/d0/pilot /u9

Here is a simpler example of the --exclude-from option. Use your favorite editor to create a file that lists, one per line, the files and/or directories (as pathnames relative to the source directory) that should be excluded from the sync.

For example, if you were doing the following:

rsync -av --exclude-from=exfile /backuproot/ /

then the exfile might look like this (along with other things):
dev
proc

The slash at the end of the path thing

My typical use of rsync looks like one of the following commands:

rsync -av /home/wally /archive
rsync -av /home/wally/ /archive/wally
The difference between these two commands has bitten me many times and is one of my main motivations for writing this little note.
Let's look at these one at a time:
  1. rsync -av /home/wally /archive
    
    The above command will create (if it does not exist already) the directory /archive/wally and recursively copy everything into it. This is just the thing if you are doing a first-time copy, but it is usually not what I want, and it makes me angry by producing a new directory.
  2. rsync -av /home/wally/ /archive/wally
    
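    This command will NOT create an extra wally directory inside the destination. It treats the source and destination directories as "equivalent" and copies the contents of /home/wally straight into /archive/wally (rsync will create /archive/wally itself if it is missing). This is just the thing if you want to make a prior copy up to date, and usually matches how I think about things.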
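
The difference between the two forms is easy to see in a sandbox. The sketch below uses made-up scratch directories (archive1, archive2 and archive3 are just names for this demo) and also shows the mistake that produces the dreaded wally/wally.

```shell
# Throwaway sandbox; all paths are hypothetical stand-ins for /home and /archive.
demo=$(mktemp -d)
mkdir -p "$demo/home/wally" "$demo/archive1" "$demo/archive2" "$demo/archive3"
echo "notes" > "$demo/home/wally/notes.txt"

# 1. No trailing slash: the directory itself is copied.
#    Result: archive1/wally/notes.txt
rsync -av "$demo/home/wally" "$demo/archive1"

# 2. Trailing slash: only the contents are copied into the named destination.
#    Result: archive2/wally/notes.txt
rsync -av "$demo/home/wally/" "$demo/archive2/wally"

# The mistake: no trailing slash AND the full destination path.
#    Result: archive3/wally/wally/notes.txt
rsync -av "$demo/home/wally" "$demo/archive3/wally"

find "$demo/archive1" "$demo/archive2" "$demo/archive3" | sort
```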

Also take note of the -av switch. The v option just says to be verbose and tell about every file that gets transferred. The a option is a composite option that is the same as -rlptgoD, which is in general "just what I want". When I forget the -av switch (and thus the recursion that comes with -a), I get this confusing message:

skipping directory wally

You also may want to consider adding the -u (update) switch. This tells rsync to skip files that have newer timestamps at the destination.

Some useful options

Things may go faster (though I am not overwhelmed) if you give the --size-only option. By default rsync decides whether a file has changed with a quick check of its size and modification time; --size-only ignores the timestamps and compares sizes only. The price is that it will miss files whose contents have changed but whose sizes have not, which is what comparing checksums (the --checksum option) would detect. Certainly calculating such checksums is computationally intensive, but a large rsync is dominated by IO times, and for a small rsync, who cares? This switch could also be useful if timestamps between two machines simply cannot be trusted.
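
To make the tradeoff concrete, here is a sandbox sketch with a file that has the same size but different contents and timestamps on the two sides. The paths, file names and dates are invented for the demo. Two separate destinations are used so that one run cannot influence the other.

```shell
# Throwaway sandbox; all paths are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dst1" "$demo/dst2"
echo "AAAA" > "$demo/src/data.txt"
echo "BBBB" > "$demo/dst1/data.txt"    # same size, different contents
echo "BBBB" > "$demo/dst2/data.txt"

# Make the timestamps differ: source newer, both destinations older.
touch -t 201804200000 "$demo/src/data.txt"
touch -t 201801010000 "$demo/dst1/data.txt" "$demo/dst2/data.txt"

# Sizes match, so --size-only considers the file unchanged and skips it.
rsync -av --size-only "$demo/src/" "$demo/dst1"
cat "$demo/dst1/data.txt"    # prints BBBB

# The default check also looks at the timestamp, so the file is transferred.
rsync -av "$demo/src/" "$demo/dst2"
cat "$demo/dst2/data.txt"    # prints AAAA
```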

--bwlimit=2000 says to limit the bandwidth used, in units of kilobytes per second (so 2000 is roughly 2 megabytes per second), which is nice for gigantic transfers over shared links.

--delete says to delete files on the receiver that are not on the source. I do not routinely do this, but it is the thing to do when you do something like rearrange the directory structure on the source.
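
Since --delete removes things, it pairs well with a dry run first. The following is a sandbox sketch of that habit, with made-up paths and file names; it is not a recipe from my own backups.

```shell
# Throwaway sandbox; all paths are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dst"
echo "current" > "$demo/src/keep.txt"
echo "stale" > "$demo/dst/gone.txt"    # exists only on the receiving side

# Preview: with -n nothing is removed, but the output lists "deleting gone.txt".
preview=$(rsync -n -av --delete "$demo/src/" "$demo/dst")
echo "$preview"

# Happy with the preview? Run it for real.
rsync -av --delete "$demo/src/" "$demo/dst"
ls "$demo/dst"    # prints only keep.txt
```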

Consider the -u option (and thus using -auv instead of -av). This says to avoid overwriting a destination file that has a newer modification time than the source, the idea being that the files are different, but the destination file is more recent and we don't want to overwrite it with an old copy from another directory or machine. In general this is the right thing. As an example, I use the following command to keep a backup copy of all my photos on a good sized external USB hard drive:

rsync -auv /home/Camera/photos /media/disk
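
What -u buys you can be checked in a sandbox. In this sketch the paths, the file name and the dates are all made up; the timestamps are forced with touch so that the destination copy is the newer one.

```shell
# Throwaway sandbox; all paths are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dst"
echo "old copy from source" > "$demo/src/report.txt"
echo "newer edit at destination" > "$demo/dst/report.txt"

# Force the timestamps: source older, destination newer.
touch -t 201801010000 "$demo/src/report.txt"
touch -t 201804200000 "$demo/dst/report.txt"

# With -u the newer destination file is left alone.
rsync -auv "$demo/src/" "$demo/dst"
with_u=$(cat "$demo/dst/report.txt")
echo "$with_u"               # prints: newer edit at destination

# Without -u the older source copy overwrites it.
rsync -av "$demo/src/" "$demo/dst"
cat "$demo/dst/report.txt"   # prints: old copy from source
```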

Remote rsync

First, set up ssh keys so you can use ssh to go from one machine to the other without a password (or be prepared to supply a password when you are prompted for one). The game can be played in either of the following ways (fetching or putting):
rsync -auv -e ssh remotehost:/home/wally /local/archive

rsync -auv -e ssh /local/archive/wally remotehost:/home
All the same screwy business of the trailing slash and creating new directories mentioned above still applies.

Adventures in Computing / tom@mmto.org