[kwlug-disc] Backup and Recovery in the 21st Century

Tue Jan 12 12:01:35 EST 2010

Sorry if anyone gets two copies of this reply.  I got my mailing lists
mixed up again.

Khalid Baheyeldin wrote:
> On Tue, Jan 12, 2010 at 9:46 AM, Adam Glauser <adamglauser at gmail.com> wrote:
> 
>> Khalid Baheyeldin wrote:
>>
>>> The fact that it works on the file system level also has advantages. You
>>> can restore a partition EXACTLY to what it was before. This means
>>> timestamps, ownerships, access times, ...etc.
>>>
>> Is the main advantage over using --archive with rsync that the backup can
>> reside on a filesystem that does not support all the features of the source
>> filesystem?
> 
> 
> Rsync as most people use it is not real backup. You have only one copy on
> another machine (which is good), but you don't have versions. You just have
> the latest sync point in there.
> 
> Rsync here is like RAID, it protects from one thing only (disk failure in
> case of RAID, and machine failure in case of rsync). What they do not
> protect against, is user error, e.g. someone deletes a file by mistake then
> realizes it a week after, or an application bug corrupts data and you
> realize that after 4 days.
> 
> With Rsync and RAID, you are out of luck. If you have real versioned backup
> you can go back and retrieve your file from the older backup.
> 
> I have had a few instances of "oops!" and had to go to backup and found what
> I deleted in it. Rsync and RAID do not provide that.
> 

I realize that the caveat "as most people use it" is probably true, but
you are really underestimating rsync's ability.

In particular, rsync has a built-in ability to use Unix file system
hard links to create what I find to be the most awesome versioning
backup system ever.  Most people use rsnapshot, though I'm the kind of
masochist who rolls his own scripts.
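
For readers who have not seen the hard-link trick, here is a minimal
sketch of the idea in Python.  It is not rsync itself and not the
scripts mentioned above: the function name snapshot and its parameters
are made up for illustration, it handles only regular files and
directories, and it compares full file contents where rsync by default
compares size and modification time.  rsync offers the same effect
through its --link-dest option.

```python
import filecmp
import os
import shutil
from pathlib import Path


def snapshot(source, dest, previous=None):
    """Copy `source` into `dest`, hard-linking files unchanged since `previous`.

    Hypothetical helper for illustration; all three arguments are directory paths.
    """
    source, dest = Path(source), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    for src in source.rglob("*"):
        rel = src.relative_to(source)
        new = dest / rel
        if src.is_dir():
            new.mkdir(parents=True, exist_ok=True)
            continue
        new.parent.mkdir(parents=True, exist_ok=True)
        old = Path(previous) / rel if previous else None
        if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
            # Unchanged since the previous snapshot: share the existing inode,
            # so this version costs no extra data blocks.
            os.link(old, new)
        else:
            # New or changed file: store a complete copy.
            shutil.copy2(src, new)
```

Calling snapshot(src, "day2", previous="day1") leaves day2 looking like
a full copy of src, while files unchanged since day1 are just extra
names for the data already stored there.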

Pro:

Backups with versioning.  As many versions as you want, either just a
straightforward number of versions or some hierarchical time system,
such as 7 daily backups, 4 weekly and 12 monthly.
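
One way such a hierarchical scheme could be expressed, as a sketch
only: the function name snapshots_to_keep and the rule "keep the newest
snapshot in each ISO week and each calendar month" are assumptions for
illustration, not something rsync or rsnapshot prescribes.

```python
from datetime import date, timedelta


def snapshots_to_keep(dates, daily=7, weekly=4, monthly=12):
    """Return the set of snapshot dates retained under a tiered scheme."""
    newest_first = sorted(set(dates), reverse=True)
    keep = set(newest_first[:daily])  # the most recent daily snapshots

    def keep_newest_per_bucket(bucket_of, limit):
        # Walk from newest to oldest, keeping the first (newest) snapshot
        # seen in each bucket until `limit` buckets have been filled.
        seen = []
        for d in newest_first:
            key = bucket_of(d)
            if key in seen:
                continue
            if len(seen) == limit:
                break
            seen.append(key)
            keep.add(d)

    keep_newest_per_bucket(lambda d: tuple(d.isocalendar())[:2], weekly)  # (ISO year, week)
    keep_newest_per_bucket(lambda d: (d.year, d.month), monthly)
    return keep


# Example: one snapshot per day for 400 days, ending on 2010-01-12.
history = [date(2010, 1, 12) - timedelta(days=i) for i in range(400)]
expired = set(history) - snapshots_to_keep(history)  # candidates for deletion
```

Anything in expired can then be removed with an ordinary recursive
delete; because of the hard links, data still referenced by a kept
snapshot survives.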

Only the difference between the old file and the new file needs to go
over the wire, so the backup system works well for large data sets over
slow WAN links.

Each and every backup requires only the data transfer and storage space
of an incremental backup, but can be accessed on the filesystem like a
full backup.  No need to assemble backup sets or use a database scheme
to keep track of which file can be found where for which day.
Absolutely no software besides bash (or your shell of choice) is needed
to find and restore files.
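
To put numbers on the "incremental cost, full-backup appearance" point,
the following sketch adds up file sizes under a backup root twice: once
naively, and once counting each inode only once.  The function name
tree_usage is hypothetical, and the sketch assumes the snapshots were
built with hard links as described above.

```python
import os


def tree_usage(root):
    """Return (apparent_bytes, unique_bytes) for all files under `root`."""
    apparent = unique = 0
    seen = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size           # what each snapshot appears to hold
            inode = (st.st_dev, st.st_ino)
            if inode not in seen:            # hard links share one inode
                seen.add(inode)
                unique += st.st_size         # what is actually stored once
    return apparent, unique
```

With many snapshots of mostly unchanged data, the first number grows
with every snapshot while the second stays close to the size of one
full copy plus the changed files.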

Cons:

No compression on storage.  I believe, however, that for most people
this scheme will still require less space than any other, since there's
never more than one "full backup" insofar as storage space is concerned.

Even though rsync only transmits the differences between two files, a
complete copy of each changed file gets stored on the remote storage.
This would be undesirable if you are backing up large files that change
frequently, in which case you might prefer a backup system that stores
incremental diffs (rdiff?).

Users who reorganize the directory structure or change directory names
can completely gum up the WAN if rsync suddenly thinks it has to
transfer gigabytes of data because a top-level directory name has
changed.
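
A toy model of why a rename is so expensive, assuming files are matched
between sender and receiver purely by relative path, as the paragraph
above describes.  The function name and the example paths are invented
for illustration.

```python
def bytes_without_basis(receiver_paths, sender_files):
    """Sum the sizes of sender files that have no same-path file on the receiver.

    `receiver_paths` is a set of relative paths already on the backup side;
    `sender_files` maps relative path -> size in bytes.  Files with no
    same-path counterpart have nothing to diff against, so they go over
    the wire in full.
    """
    return sum(size for path, size in sender_files.items()
               if path not in receiver_paths)


backed_up = {"projects/a.bin", "projects/b.bin"}
unchanged = {"projects/a.bin": 5_000, "projects/b.bin": 7_000}
renamed = {"work/a.bin": 5_000, "work/b.bin": 7_000}  # "projects" renamed to "work"

cost_unchanged = bytes_without_basis(backed_up, unchanged)
cost_renamed = bytes_without_basis(backed_up, renamed)
```

The file contents are identical in both cases; only the top-level name
differs, yet in this model every byte under the renamed directory
counts as new.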