[kwlug-disc] btrfs/zfs for backups
L.D. Paniak
ldpaniak at fourpisolutions.com
Tue Dec 2 22:57:02 EST 2014
On 12/02/2014 10:27 PM, Chris Irwin wrote:
> On 12/02/2014 08:44 PM, Paul Nijjar wrote:
>> My inclination is to go with Debian/Ubuntu fileservers that
>> synchronize backup files (one way). Once upon a time somebody (I think
>> it was Chris Irwin) got me all excited by talking about btrfs send and
>> receive, which would allow you to send snapshots of incremental changes
>> to remote hosts via SSH.
>
> Once upon a time somebody (possibly this "Chris Irwin" guy) said he'd
> do a talk about using btrfs, too.
>
> What months are open at this point? Keep in mind that winter adds a
> "depending on weather" for my attendance, since I need to come from
> (and back to) London.
>
Looks like March and onward are available. Updates on the state of
btrfs are always welcome.
> ...
>> Here is some information about the infrastructure:
>> - The fileserver will consist of a bunch of Samba shares
>> - Symantec BackupExec (yes, I know) will write big backup files to
>> these shares. Most of the files are about 4 GB, but there are a few
>> files that are almost 100 GB.
>> - The two servers are connected via a wireless link that effectively
>> runs at 100Mbit
>> - The backup storage media are Western Digital Green drives (sorry
>> Cedric)
>> - The servers themselves are nothing special: 64-bit Intel
>> workstations with 2-4 GB of RAM.
>> - These are backup files, so data integrity is important
>> - We can assume there are 100s of GB of data backed up each week,
>> although I am not sure whether this means hundreds of files are
>> changing each week. (This could be the case; BackupExec has a habit
>> of doing things in the most inconvenient way possible.)
>>
>> I am interested in hearing about how well btrfs works for the btrfs
>> send/receive scenario I am thinking about, [...]
>
> Assuming BackupExec is creating new full snapshots, you're not really
> going to get much benefit from `btrfs send` versus just scp, xcopy,
> or rsync to your remote destination. Sending incremental changes only
> helps when the data actually changed incrementally. Sure, the source
> workstation is 99% unchanged, but a new full backup is entirely new
> data.
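For reference, the send/receive workflow Paul mentions looks roughly
like this (host and path names here are made up):

    # first time: take a read-only snapshot and send it in full
    btrfs subvolume snapshot -r /srv/backups /srv/backups/.snaps/base
    btrfs send /srv/backups/.snaps/base | ssh backup2 "btrfs receive /srv/snaps"

    # afterwards: send only the delta against a previous snapshot
    btrfs subvolume snapshot -r /srv/backups /srv/backups/.snaps/week49
    btrfs send -p /srv/backups/.snaps/base /srv/backups/.snaps/week49 \
        | ssh backup2 "btrfs receive /srv/snaps"

The -p delta is only small if the files themselves changed in place,
which is exactly Chris's point about full backup files.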
>
> You would gain data checksums, so you could verify the data is
> "correct" on each server (at least as correct as it was when written
> to disk). But if you've just got large files, you might as well
> store md5sums of each file. If the md5sum is wrong on one server,
> refresh it from the other server. It's a pain, but it's probably
> easier than starting from scratch with a whole new untested solution.
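A sketch of that md5sum bookkeeping, assuming one directory per backup
set (the paths and the .bkf extension below are assumptions):

    # after each backup run, record checksums next to the files
    cd /srv/backups/week49 && md5sum *.bkf > MD5SUMS

    # before restoring (or on a schedule), verify them; any file that
    # fails gets re-copied from the other server
    md5sum -c MD5SUMS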
>
> There may be some benefit from running deduplication if you're storing
> multiple versions of backups on disk... I don't have a lot of
> experience with using dedup, but I've got a few days off during the
> holidays in which I was already planning on playing with it.
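For what it's worth, dedup on btrfs is out-of-band: a userspace tool
scans existing files and asks the kernel to share identical extents.
Something like duperemove would be the thing to play with (path made
up):

    # find duplicate extents under the backup tree and deduplicate them
    duperemove -dr /srv/backups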
>
> Regarding integrity, just using btrfs will tell you if a file is
> invalid as all data is checksummed. If you actually want it to fix
> that data automagically, you need to be running in raid1 mode, which
> means you need twice the storage capacity. If you're not using raid1,
> again, just check md5sums before you restore a backup.
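For the self-healing case, the raid1 setup and repair look something
like this (device names and mount point are examples):

    # mirror both data and metadata across two drives at mkfs time
    mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc

    # a scrub re-reads everything and rewrites any copy whose checksum
    # does not match, using the good mirror
    btrfs scrub start /mnt/backups
    btrfs scrub status /mnt/backups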
>
> If you're looking at rsync, btrfs send, etc., can I assume changes
> only occur on one server? Why not look at xcopy on your existing
> Windows server instead of DFS, at least as a first step?
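If changes really are one-way, even robocopy (shipped with Windows
since Vista, and more robust than xcopy) would do as a first step;
share names below are made up:

    rem copy new and changed files only, restartable, retry on errors
    robocopy D:\Backups \\backup2\backups /E /Z /R:3 /W:10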
>
>> [...] and any advice
>> strengthening/contradicting the StackExchange opinion. If people are
>> using ZFS (in particular ZFS on Linux) with zfs send/receive in this
>> manner then I am interested in that information as well. If people
>> have other options (such as an effective rsync option for Windows 64
>> bit) then feel free to chime in. I am more interested in experiences
>> than speculation.
>
> I have no experience with ZFS whatsoever, so can't comment on it.
>
> I only have one real piece of advice for Windows server... :)
>
>
All of the comments above on "send/receive" apply equally to ZFS. If
you are syncing large, static, compressed files, then rsync-like tools
would appear to be your best option. In that vein - and considering the
distinct possibility that your wireless link is lossy - I would suggest
taking a look at something like BTSync:
http://www.getsync.com/
This syncs files between two servers using BitTorrent technology. I
think it is a great idea. Unfortunately, experience with earlier
versions of BTSync led to disappointment with apparent file corruption.
It appears to have problems with files that are changing dynamically.
That may not be a problem for backup files, which are generally fixed
after being produced. In any case, BTSync may be useful, but caution is
required.
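If BTSync proves too risky, plain rsync copes tolerably with a lossy
link because interrupted transfers can resume; a sketch with made-up
paths:

    # keep partial files so an interrupted transfer resumes rather
    # than restarting from zero
    rsync -av --partial --timeout=60 /srv/backups/ backup2:/srv/backups/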
I would avoid deduplication in ZFS unless you have plenty of $$ for
server RAM (hundreds of GB). Performance is generally abysmal on dedup
systems if one skimps on RAM. Given the price of hard drives at the
moment, I find it difficult to recommend either deduplication or
compression for ZFS: more spindles will increase your performance anyway.
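If you want to check the cost before committing, zdb can simulate
dedup on an existing pool (pool name made up; each unique block in the
dedup table costs on the order of a few hundred bytes of RAM):

    # prints a block histogram and the dedup ratio you would get,
    # without changing anything on disk
    zdb -S tank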
In short, I do not think ZFS can solve any of the problems described
above. At best, you can have ZFS servers offer iSCSI/SRP targets to
Windows servers to use for your large storage volumes.
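That setup is just a zvol plus an iSCSI target; a sketch with made-up
names:

    # create a 2 TB block device backed by the pool
    zfs create -V 2T tank/bexec

    # the device shows up as /dev/zvol/tank/bexec; export it with the
    # iSCSI target of your choice (e.g. LIO via targetcli) and let
    # Windows format it NTFS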