[kwlug-disc] btrfs/zfs for backups

Chris Irwin chris at chrisirwin.ca
Tue Dec 2 22:27:32 EST 2014


On 12/02/2014 08:44 PM, Paul Nijjar wrote:
> My inclination is to go with Debian/Ubuntu fileservers that
> synchronize backup files (one way). Once upon a time somebody (I think
> it was Chris Irwin) got me all excited by talking about btrfs send and
> receive, that would allow you to send snapshots of incremental changes
> to remote hosts via SSH.

Once upon a time somebody (possibly this "Chris Irwin" guy) said he'd do 
a talk about using btrfs, too.

What months are open at this point? Keep in mind that winter adds a 
"depending on the weather" caveat to my attendance, since I have to 
travel from (and back to) London.

> That sounds exciting, but from The Internet (or at
> least this StackExchange post:
> http://serverfault.com/questions/285878/is-btrfs-production-ready ) it
> seems that I should not consider this, and should go with ZFS instead
> (which also has this functionality).

I've been using btrfs on my machines for a few months now, and haven't 
had any issues "yet" (but now that I've publicly stated as much...).

Yes, there are things you should not be using it for (VMs in 
particular). That article references the "gotchas" page, which lists 
issues with the filesystem and certain use-cases. There are also some 
issues further up the stack with distro scripts, etc. I recently had an 
issue with `fedup` on one of my btrfs machines, but not on another, due 
to a slightly peculiar filesystem layout that I didn't think was a big 
deal at the time.

`df` is a little tricky. btrfs allocates space in chunks, so you can hit 
"out of space" errors while `df` still reports free space. It makes 
sense once you discover why, but it's annoying anyway.
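
For what it's worth, the btrfs-specific tools show where the space 
actually went (the mount point here is made up):

  # Plain `df` only gives one number, which can be misleading on btrfs:
  df -h /mnt/backups

  # These break usage down into data vs. metadata chunks, which is
  # where the "full but free" weirdness shows up:
  btrfs filesystem df /mnt/backups
  btrfs filesystem show /mnt/backups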

> Here is some information about the infrastructure:
> - The fileserver will consist of a bunch of Samba shares
> - Symantec BackupExec (yes, I know) will write big backup files to
>    these shares. Most of the files are 4GB large, but there are a few
>    files that are almost 100GB large.
> - The two servers are connected via a wireless link that effectively
>    runs at 100Mbit
> - The backup storage media are Western digital Green drives (sorry
>    Cedric)
> - The servers themselves are nothing special: 64-bit intel
>    workstations with 2-4GB of RAM.
> - These are backup files, so data integrity is important
> - We can assume there are 100s of GB of data backed up each week,
>    although I am not sure whether this means hundreds of files are
>    changing each week. (This could be the case; BackupExec has a habit
>    of doing things in the most inconvenient way possible.)
>
> I am interested in hearing about how well btrfs works for the btrfs
> send/receive scenario I am thinking about, [...]

Assuming BackupExec is creating new full backup files each run, you're 
not going to see much benefit from `btrfs send` versus just scp, xcopy, 
or rsync to your remote destination. Sending incremental changes only 
helps when the data actually changed incrementally. Sure, the source 
workstation is 99% unchanged, but a new full backup file is entirely new 
data on the fileserver.
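
Just so we're talking about the same thing: incremental send needs a 
snapshot that both machines already have, and it streams the delta 
against that parent. Very roughly (subvolume paths, snapshot names, and 
the "backup2" hostname are all made up):

  # One-time: seed the remote side with an initial read-only snapshot
  btrfs subvolume snapshot -r /srv/backups /srv/snapshots/base
  btrfs send /srv/snapshots/base | ssh backup2 btrfs receive /srv/snapshots

  # Later: take a new snapshot and send only the difference against
  # the parent both machines already have
  btrfs subvolume snapshot -r /srv/backups /srv/snapshots/2014-12-02
  btrfs send -p /srv/snapshots/base /srv/snapshots/2014-12-02 \
      | ssh backup2 btrfs receive /srv/snapshots

The catch is that the delta is computed at the block level, so a 
brand-new 100GB backup file is 100GB of delta no matter how you ship it.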

You would gain data checksums, so you could verify the data is "correct" 
on each server (at least, as correct as it was when written to disk). 
But if you've just got a pile of large files, you might as well store 
md5sums of each one. If the md5sum is wrong on one server, refresh that 
file from the other server. It's a pain, but it's probably easier than 
starting from scratch with a whole new, untested solution.
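
Something along these lines is all I mean (the .bkf names and paths are 
just placeholders):

  # On each server, record checksums next to the backup files
  cd /srv/backups && md5sum *.bkf > MD5SUMS

  # Verify on a schedule, or at least before restoring anything
  md5sum -c MD5SUMS

  # Anything that fails gets refreshed from the other server, e.g.:
  rsync -av backup2:/srv/backups/whatever.bkf /srv/backups/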

There may be some benefit to running deduplication if you're storing 
multiple versions of backups on disk... I don't have a lot of experience 
using dedup, but I've got a few days off during the holidays when I was 
already planning to play with it.
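
From what I've read so far (so take this as a guess, not experience), 
the offline route is something like duperemove pointed at the backup 
directory after the files land:

  # Find duplicate extents and actually dedupe them (-d); without -d
  # it only reports what it would have done
  duperemove -dr /srv/backups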

Regarding integrity, just using btrfs will tell you if a file is 
invalid, since all data is checksummed. If you actually want it to fix 
that data automagically, you need to be running in raid1 mode, which 
means you need twice the storage capacity. If you're not using raid1, 
then again, just check the md5sums before you restore a backup.
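
The self-healing piece looks roughly like this (device names made up): 
two copies of everything at mkfs time, then a periodic scrub so bad 
blocks get noticed and rewritten from the good copy:

  # Two copies of both data and metadata, across two disks
  mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc

  # Read everything back and verify checksums; with raid1, corrupted
  # blocks are repaired from the other copy
  btrfs scrub start /mnt/backups
  btrfs scrub status /mnt/backups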

If you're looking at rsync, btrfs send, etc., can I assume changes only 
occur on one server? Why not look at xcopy on your existing Windows 
server instead of DFS, at least as a first step?

> [...] and any advice
> strengthening/contradicting the StackExchange opinion. If people are
> using ZFS (in particular ZFS on Linux) with zfs send/receive in this
> manner then I am interested in that information as well. If people
> have other options (such as an effective rsync option for Windows 64
> bit) then feel free to chime in. I am more interested in experiences
> than speculation.

I have no experience with ZFS whatsoever, so I can't comment on it.

I only have one real piece of advice for Windows server... :)


-- 
Chris Irwin
e: <chris at chrisirwin.ca>
w: http://chrisirwin.ca





