darxus wrote on 2010-11-30 at 04:32 pm

[geek] rsync nightmare - overwriting the wrong directory

I had some data loss that caused me to seriously question my sanity.

I had a 500gb primary hard drive (/), and a 2tb drive (/media/2tb) containing backups of my primary drive and my server, plus about 1.5tb of non-backed-up data.

I was attempting some work on an X video driver (nouveau, the open source nVidia driver; I was porting a patch related to Wayland). There were a few times I got the machine to hang pretty good.

I didn't remember the magic-sysrq sequence (it's alt-sysrq reisub), and ctrl-alt-del wasn't working, or quite possibly was on a 60-second delay I was unaware of.

And I wasn't noticing any response to pressing the power button briefly (which should initiate a soft shutdown). That was probably on the same 60 second delay. Thanks gnome.

So I shut down by holding the power button in for 4 seconds. A bunch of times.

This resulted in my 500gb drive refusing to do anything useful. I could mount it, and fdisk showed me the partition table, but I couldn't read any data, and fsck gave me an interesting error about not being able to read anything. (dmesg)

I shut the machine down overnight, hoping that it would be more cooperative in the morning.

It wasn't. But I was able to re-format without problems, and successfully restored a backup from the previous day. All good.

This is where it gets weird.

Around that time I noticed that the directory containing the ~1.5tb of not backed-up data had disappeared. I tried a program that was supposed to be able to undelete files, and got nothing. I figured it was another drive crash or something. There also seemed to be extra files in /media/2tb/, the same ones as in /. I ignored that, fearing for my sanity.

The next day I checked my backups, and everything in /media/2tb/ was gone, replaced with a backup of / (my primary drive), except for the things I had --exclude'd: /dev/ hadn't been copied to /media/2tb/dev/, and /media/2tb/tmp/ contained what it previously had, not what /tmp/ contained.

It sure looked like rsync had overwritten it. And my rsync job was indeed using the --del flag, which deletes anything in the destination that isn't in the source being copied (I have since removed that).
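To make the symptoms concrete, here is a toy model (not rsync itself) of what a sync with --del and a couple of excludes does to a destination. The file names and the exclude patterns "dev" and "tmp" are made up for illustration; the real command isn't shown in this post. The one rsync behaviour it leans on is that, without --delete-excluded, excluded paths on the receiving side are left alone.

```python
from fnmatch import fnmatch


def plan_sync(source, dest, excludes=()):
    """Toy model of `rsync --del SRC/ DEST/` on sets of relative paths.

    Returns (to_copy, to_delete). Excluded paths are neither copied nor
    deleted, mirroring rsync's default of protecting excluded files on
    the receiving side (i.e. no --delete-excluded).
    """
    def excluded(path):
        # A path is excluded if it, or any parent directory, matches a pattern.
        parts = path.split("/")
        return any(
            fnmatch("/".join(parts[:i]), pattern)
            for i in range(1, len(parts) + 1)
            for pattern in excludes
        )

    to_copy = {p for p in source if not excluded(p)}
    to_delete = {p for p in dest if p not in source and not excluded(p)}
    return to_copy, to_delete


# Hypothetical contents standing in for / and /media/2tb/.
root = {"etc/fstab", "home/me/notes.txt", "dev/sda", "tmp/scratch"}
drive = {"bak/dancer-old/etc/fstab", "data/video.mkv", "tmp/old-junk"}

copy, delete = plan_sync(root, drive, excludes=("dev", "tmp"))
print(sorted(copy))    # ['etc/fstab', 'home/me/notes.txt']
print(sorted(delete))  # ['bak/dancer-old/etc/fstab', 'data/video.mkv']
```

In this model the old backups and the non-backed-up data are slated for deletion, dev/ never gets copied, and tmp/old-junk survives, which matches what I found on the drive.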

But the cron job still said to output to /media/2tb/bak/dancer-`date +\%F` (you need to escape % signs in cron jobs). I checked both of my (5,000 line) command histories for any evidence that I might have kicked off an rsync wrong and not remembered it, and found nothing unusual.
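For anyone wondering why the % needs a backslash: per crontab(5), an unescaped % in the command is turned into a newline, and everything after the first one is handed to the command as standard input. Here is a rough sketch of that rule. It assumes cron strips the backslash itself (some implementations leave it for the shell to deal with), and the "rsync -a /" part is a placeholder, not my real command.

```python
def split_cron_command(field):
    """Simplified model of how cron treats '%' in a crontab command field.

    An unescaped '%' acts as a line break, and everything after the first
    one becomes the command's standard input. A backslash-escaped '%' is
    kept as a literal '%'.
    """
    pieces, current, i = [], [], 0
    while i < len(field):
        if field.startswith("\\%", i):   # escaped percent: literal '%'
            current.append("%")
            i += 2
        elif field[i] == "%":            # bare percent: line break
            pieces.append("".join(current))
            current = []
            i += 1
        else:
            current.append(field[i])
            i += 1
    pieces.append("".join(current))
    return pieces[0], "\n".join(pieces[1:])


# Placeholder command; only the destination path comes from the cron job.
escaped = r"rsync -a / /media/2tb/bak/dancer-`date +\%F`"
bare = "rsync -a / /media/2tb/bak/dancer-`date +%F`"

print(split_cron_command(escaped))
# ('rsync -a / /media/2tb/bak/dancer-`date +%F`', '')
print(split_cron_command(bare))
# ('rsync -a / /media/2tb/bak/dancer-`date +', 'F`')
```

Without the backslash, the shell gets a command that stops in the middle of the backticks.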

Convinced I must have a split personality which did this to me out of malice and then covered its tracks, I posted to the rsync mailing list.

The next day I got an email from the rsync cron job that contained the line:

"created directory /media/2tb/bak/da"

The command in the cron job was too long and was getting truncated. rsync had in fact backed up / to /media/2tb/bak/da/.

The truncated path, "/media/2tb/bak/da", is exactly six characters longer than "/media/2tb/", and six characters is exactly the length of "--del ", which I had removed.

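So back when "--del " was still in the command, the truncation fell right at the end of /media/2tb/. rsync synced / onto the top level of the drive and deleted everything there that wasn't in /.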
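The character counting is easy to check. In this sketch everything before the destination is a hypothetical stand-in (the real flags and excludes aren't shown here), and it assumes --del came before the destination on the command line. The cutoff is whatever length reproduces the "/media/2tb/bak/da" from the email.

```python
# Hypothetical stand-in for everything before the destination.
PREFIX = "rsync -a --exclude=/dev --exclude=/tmp "
# Destination exactly as written in the cron job.
DEST = r"/media/2tb/bak/dancer-`date +\%F`"

with_del = PREFIX + "--del / " + DEST
without_del = PREFIX + "/ " + DEST

# Pick the cutoff that reproduces the observed "/media/2tb/bak/da".
seen = "/media/2tb/bak/da"
limit = without_del.index(seen) + len(seen)

print(without_del[:limit].split()[-1])  # /media/2tb/bak/da
print(with_del[:limit].split()[-1])     # /media/2tb/
print(len("--del "))                    # 6
```

Same cutoff, six more characters in front of the destination, and the last argument rsync sees is the root of the 2tb drive.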