December 10, 2018

Fedora - Backups and a corrupted disklabel

I decided to reboot my home system, expecting this to be a routine process. The boot was unusually slow, then dropped me to a single user shell. The bottom line is that my second disk (sdb) had a corrupted disklabel (more on that below). My primary disk was fine, so I could edit /etc/fstab, comment out the line that wanted to mount sdb onto /u1, and then reboot. Rebooting seemed the best way to get networking up and allow me to investigate how to recover from the disaster.

Backups

Nothing makes you regret not having a current backup like a situation such as this. As it turned out, I was able to recover without losing a bit of data, but things could have gone much differently. My most recent backup was over 8 months old, done back in April of 2018.

I have a pair of external USB drives I use to do backups. It is really no great inconvenience to perform backups. I get the drive (or both) out of the drawer, plug them in, then run (as root) a script that is a wrapper around rsync. Even after 8 months, a single backup took less than an hour.
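A wrapper of this kind might look roughly like the sketch below. The function name backup_tree and the example paths are made up for illustration, and the actual script is not shown here. The one idea worth copying is the check that the destination really is a mounted drive before rsync starts writing to it.

```shell
#!/bin/sh
# Sketch of an rsync wrapper. backup_tree is a hypothetical name, and the
# paths in the usage comment are examples only.
backup_tree() {
    src=$1     # tree to back up, e.g. /u1
    dest=$2    # mount point of the external drive, e.g. /mnt/backup1

    # Refuse to run unless the destination is a mounted directory, so a
    # forgotten mount does not quietly fill the root filesystem.
    if ! [ -d "$dest" ] || ! mountpoint -q "$dest"; then
        echo "backup_tree: $dest is not a mounted directory" >&2
        return 1
    fi

    # -a preserves permissions, owners and times; -x stays on one filesystem.
    rsync -ax "$src/" "$dest/$(basename "$src")/"
}

# Usage (as root, with the drive already mounted):
#   backup_tree /u1 /mnt/backup1
```

Adding --delete to the rsync line would make the copy an exact mirror, at the cost of also mirroring accidental deletions.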

The trick is twofold. One part is not to be complacent because you keep thinking "I know I need to do a backup, and will real soon". Therein lies the root of all self deception and the old saying about the road to hell being paved with good intentions. Intentions are useless, in fact worse than useless because they deceive you into thinking you are on the right track. Ask every smoker. Ask every out of shape and overweight person.

What I need is a reminder (along with a policy). My policy is to do backups every month. Maybe I should do them every week, but once a month would be a big improvement over once a year. In the good old days, I would use a cron job to send me mail as a reminder, but my home machine won't run the old command line "mail" program without setting up sendmail.cf and I want nothing to do with that. It would appear that Google Calendar is a viable option. I can set an Event or a Reminder and then specify that I want the notification via email. In fact an Event will hand notifications to Chrome if I allow it to (and I do). So I have a good chance of being pestered to do backups via either my browser or email, which sounds promising.

Another thing I did after I recovered from the whole mess is to make backup copies of my disklabels. This is easy to do by running "dd" with a count of 4 blocks (fewer would do). The "fdisk" command is happy to run on image files as well as devices.
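The dd step can be sketched as follows. The helper name save_label and the file locations are made up for illustration. The sketch assumes a classic MBR style disklabel, where the primary partition table sits in the first 512 byte block. Logical partitions inside an extended partition are described by records stored further into the disk, so a text dump from "sfdisk -d" is probably worth keeping alongside the image.

```shell
#!/bin/sh
# Sketch: copy the first 4 blocks of a disk into an image file.
# save_label is a hypothetical helper name.
save_label() {
    disk=$1     # e.g. /dev/sdb (any regular file works for a dry run)
    image=$2    # e.g. /root/labels/sdb.label

    # 4 blocks of 512 bytes, the count mentioned in the text.
    dd if="$disk" of="$image" bs=512 count=4 2>/dev/null
}

# Usage (as root for a real device):
#   save_label /dev/sdb /root/labels/sdb.label
#   fdisk -l /root/labels/sdb.label      # inspect the saved table
#   sfdisk -d /dev/sdb > /root/labels/sdb.sfdisk
```

It makes sense to keep these files on a different disk than the one they describe.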

That corrupted disklabel

What caused this was my work on a script to format an sdcard. I am not totally clear on what happened. The sdcard is /dev/sdg and the disk that got whacked is /dev/sdb. This is a pretty easy one finger typo, so that is probably what happened. It is truly unfortunate that removable devices like USB sticks share the same names as vital things like system disks. I am tempted to add my own aliases to the udev scheme like /dev/sdcard-a or some such. Maybe someone in the Linux kernel world will fix this gaping open manhole someday.
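Short of udev aliases, a formatting script can at least refuse to touch a device the kernel does not flag as removable. The sketch below is one possible guard, not something from the script in question. The helper name is_removable is hypothetical, and the SYSFS_ROOT variable exists only so the check can be tried out against a fake directory tree. It relies on the "removable" attribute under /sys/block, which is normally 1 for card readers and USB sticks and 0 for fixed disks; a USB hard drive typically reports 0 as well, so this is a sanity check rather than a guarantee.

```shell
#!/bin/sh
# Sketch: succeed only if the kernel marks the device as removable.
# is_removable and SYSFS_ROOT are hypothetical names for illustration.
is_removable() {
    dev=$(basename "$1")                            # /dev/sdg -> sdg
    flag="${SYSFS_ROOT:-/sys}/block/$dev/removable"

    # The attribute file contains 1 for removable media, 0 otherwise.
    [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
}

# Usage near the top of a formatting script:
#   is_removable "$target" || { echo "refusing: $target is not removable" >&2; exit 1; }
```

With a guard like this, typing sdb instead of sdg should stop the script before anything is written.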

The crucial thing when a disaster like this happens is not to panic. Don't do anything at all until you are absolutely sure of what you need to do. You can easily make a recoverable situation worse or impossible. That being said, what I did was to break out my tablet and start doing searches on "linux recover disklabel". It turns out there is a tool called testdisk that is just the thing for this. It is somewhat cryptic to use, but ultimately it did the job.

Note my policy above of now making backup copies of disklabels. If only I had known the offsets and sizes of the partitions, I could have run fdisk to regenerate the partition tables. For that matter, I could have simply used "dd" to write the saved partition table back onto the disk.
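Writing a saved label back with dd might look like the sketch below. The helper name restore_label and the paths are hypothetical. It assumes the image holds the first 4 blocks of the same disk, saved while the layout was still the one you want, and it overwrites those blocks wholesale. As with the save, this only covers what lives at the very start of the disk.

```shell
#!/bin/sh
# Sketch: write a saved 4 block label image back to the start of a disk.
# restore_label is a hypothetical helper name.
restore_label() {
    image=$1    # e.g. /root/labels/sdb.label
    disk=$2     # e.g. /dev/sdb -- double check this before running!

    # conv=notrunc keeps dd from truncating the target when it is a
    # regular file (useful for a dry run); it is harmless on a device.
    dd if="$image" of="$disk" bs=512 count=4 conv=notrunc 2>/dev/null
}

# Usage (as root, with nothing on the disk mounted):
#   restore_label /root/labels/sdb.label /dev/sdb
```

A dry run against a copy of the disk image, followed by "fdisk -l" on the result, is a cheap way to gain confidence before pointing this at a real device.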

Once I got brave enough to tell testdisk to write the recovered disklabel back to the disk, I had my old sdb5 partition available as sdb3. I have some intervening partitions that I was not actually using and did not bother to recover those -- and probably never will. After this I mounted the partition in question read-only as follows:

mount -o ro /dev/sdb3 /u1

Then I ran backups of /u1. Two hours later, I edited /etc/fstab to mount sdb3 on /u1 instead of the old sdb5. Then I did the following:

umount /dev/sdb3
fsck -f /dev/sdb3
mount /u1

And I am back in business.



Have any comments? Questions? Drop me a line!

Adventures in Computing / tom@mmto.org