[kwlug-disc] rcs (git?) integration/prevalence? [Was: CSS: varying colours based on inheritance]

Chris Irwin chris at chrisirwin.ca
Fri Jul 8 14:07:29 EDT 2011


On Fri, Jul 8, 2011 at 11:54, unsolicited <unsolicited at swiz.ca> wrote:

> So, clearly, my premise is understood. Joe user getting on with his daily
> use, on all file types, be it .odt or .jpg or even .raw, and the magic that
> is rcs sitting in the background just making this all transparent to him.
> Add a couple of generic utilities for reverting back and recovering disk
> space (by deleting intermediate versions).
>
> Isn't one of the long time pros of rcs, though, the minimal storage keeping
> revisions take? And doesn't Chris' point that full versions are kept (even
> if zipped) diminish this pro?
>
> From this thread, it seems that the most effective revision storage options
> happen when the file type is understood. e.g. .txt, therefore diff, and
> store changes only. There are a lot of file types out there.
>
> It almost feels like a filter to be passed through before being handed to
> the filesystem, save a file, the filter compresses / diffs / whatever is
> appropriate for that rcs, then passes the result on to the file system. Is
> this such a big thing, that only something like a Canonical, Debian, Red
> Hat, etc., could tackle it?
>

My knowledge on the subject is fairly high-level (are there any Solaris
folks who can chime in?), but the ZFS filesystem can do a certain amount of
automatic de-duplication at the filesystem level (per block of some size, I
believe). Say you have file A, and download file B, which is 90% similar. ZFS
would only store the blocks of B that differ, referencing the common on-disk
blocks for the rest. Because this happens at the filesystem level, the
advantage is that any similar files are deduplicated (think VM images), even
if they do not have a common origin (as revisions of a single file would).
Think of it as rsync for storage.
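
To make the block-level idea concrete, here is a minimal sketch in Python. It
is a toy model, not how ZFS is implemented: the DedupStore class, the 4096-byte
block size, and the use of SHA-256 as the block key are all assumptions made
for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


class DedupStore:
    """Toy block store: identical fixed-size blocks are kept only once."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # digest -> block bytes (each unique block stored once)
        self.files = {}   # file name -> ordered list of block digests

    def write(self, name, data):
        """Split data into fixed-size blocks and store only unseen blocks."""
        digests = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # no-op if already stored
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        """Reassemble a file from its block references."""
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        """Bytes actually held, counting each unique block once."""
        return sum(len(block) for block in self.blocks.values())


# File A is ten distinct blocks; file B shares the first nine with A.
file_a = b"".join(bytes([i]) * BLOCK_SIZE for i in range(10))
file_b = file_a[:9 * BLOCK_SIZE] + bytes([99]) * BLOCK_SIZE

store = DedupStore()
store.write("A", file_a)
store.write("B", file_b)
```

Here the store ends up holding 11 blocks rather than 20. Note that with
fixed-size blocks the sharing only happens when identical data lands on the
same block boundaries; inserting a few bytes near the start of B would shift
every later block and defeat the match.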

ZFS also has snapshots. Because ZFS handles both the storage and the
snapshotting, a snapshot is fairly space efficient. EXT4+LVM can snapshot,
for example, but because LVM does not understand EXT4, it also has to track
writes to blocks the filesystem considers free. ZFS, however, only has to
handle overlaying changes to actual data. Writing new data to free space
(such as downloading a file) requires no special logic, handling, or 'extra'
storage. So you could in theory tell ZFS to snapshot your filesystem every
hour, and run some maintenance scripts to remove older snapshots when free
space drops below a threshold. BTRFS may have these features; I'm not up to
date on it.
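
The maintenance script mentioned above could look roughly like the following
Python sketch. The function names and the oldest-first policy are assumptions
for illustration. The pruning loop takes its free-space check and its destroy
action as callables so it can be tried without a ZFS pool; the two helpers
below it show one possible wiring to the `zfs` command line and are untested
guesses at how one might hook it up.

```python
import shutil
import subprocess


def prune_snapshots(snapshots, free_bytes, destroy, min_free_bytes,
                    keep_at_least=1):
    """Destroy the oldest snapshots until enough space is free.

    snapshots      -- snapshot names, ordered oldest first
    free_bytes     -- callable returning the current free space in bytes
    destroy        -- callable that removes one snapshot, given its name
    min_free_bytes -- stop pruning once this much space is free
    keep_at_least  -- never go below this many snapshots
    Returns the names of the snapshots that were destroyed.
    """
    remaining = list(snapshots)
    destroyed = []
    while len(remaining) > keep_at_least and free_bytes() < min_free_bytes:
        oldest = remaining.pop(0)
        destroy(oldest)
        destroyed.append(oldest)
    return destroyed


def list_zfs_snapshots(dataset):
    """Snapshot names for a dataset, oldest first (assumes the zfs CLI)."""
    out = subprocess.run(
        ["zfs", "list", "-H", "-t", "snapshot", "-o", "name",
         "-s", "creation", "-r", dataset],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def destroy_zfs_snapshot(name):
    """Remove a single snapshot (assumes the zfs CLI and enough privilege)."""
    subprocess.run(["zfs", "destroy", name], check=True)


# Possible wiring; "tank/home" and "/home" are hypothetical names:
# prune_snapshots(
#     list_zfs_snapshots("tank/home"),
#     lambda: shutil.disk_usage("/home").free,
#     destroy_zfs_snapshot,
#     min_free_bytes=20 * 1024**3,
# )
```

Freed space may not show up immediately after a snapshot is destroyed, so a
real script would probably want a pause or a cap on deletions per run.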

The problem I foresee with versioning filesystems is that people might
confuse that with a backup, which it is not.

> IIRC, someone git's their home directory. cronjob? Does this get us 'close
> enough'?


This is what I do for now. I have several repositories that are snapshotted
hourly with git, then pushed to my server. I didn't make a git repository of
my whole $HOME since there is a lot of garbage in there I don't care about
(thumbnails, caches, etc). Instead, I moved all important "Home" files to
another directory and symlink them to where they should be. I can run a very
simple find command to list any files in $HOME that are not picked up by git
(basically anything that isn't a symlink or directory, excluding the path to
my actual git repositories). I get the convenience of on-device file
versions, plus off-device storage by pushing my repos elsewhere (e.g. laptop
dies, still have current & revisions on server). I also have this on multiple
machines, with each living in its own branch. This allows me to easily merge
changes between them.
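
The check for files that git is not picking up can be sketched in Python as
below. This is not the find command itself, only an illustration of the same
rule (anything that is not a symlink or a directory, skipping the repository
path); the function name and the example paths are made up.

```python
import os


def unmanaged_files(home, repo_dir):
    """Yield regular files under home that are not symlinks.

    The directory holding the git repositories (repo_dir) is skipped,
    since everything in there is already tracked.
    """
    repo_dir = os.path.abspath(repo_dir)
    for dirpath, dirnames, filenames in os.walk(os.path.abspath(home)):
        # Prune the repository directory so os.walk never descends into it.
        dirnames[:] = [d for d in dirnames
                       if os.path.join(dirpath, d) != repo_dir]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                yield path


# Example call; both paths are hypothetical:
# for path in unmanaged_files(os.path.expanduser("~"),
#                             os.path.expanduser("~/repos")):
#     print(path)
```

os.walk does not follow symlinked directories by default, so a symlink that
points into the repositories is neither listed nor descended into.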

The problem is that when programmers use a VCS, the commits are typically
very specific and the logs read "XYZ changed". With hourly snapshots,
unfortunately, you potentially get broad commits with generic messages.
Finding a list of versions of a specific file is a little troublesome.
Actually fetching one of those revisions, assuming you know which one, is a
little more work. If the file is not text-based, you must fetch it and view
it to actually determine if it is what you wanted in the first place. There
may be utilities to help with this, but I have not used any.
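
For what it is worth, plain git can do both steps: `git log -- <path>` lists
the commits that touched a file, and `git show <rev>:<path>` prints the file
as of that commit. Below is a rough Python wrapper around those two commands.
The function names are invented for this sketch, and it assumes `path` is
given relative to the top of the repository.

```python
import subprocess


def log_command(path):
    """git command listing the commits that touched path, newest first."""
    return ["git", "log", "--follow", "--date=short",
            "--format=%h %ad %s", "--", path]


def show_command(revision, path):
    """git command printing the contents of path as of revision."""
    return ["git", "show", f"{revision}:{path}"]


def file_revisions(repo, path):
    """Return one 'hash date subject' line per commit touching path."""
    out = subprocess.run(log_command(path), cwd=repo, check=True,
                         capture_output=True, text=True).stdout
    return out.splitlines()


def fetch_revision(repo, revision, path, dest):
    """Write the given revision of path to dest, leaving the work tree alone."""
    data = subprocess.run(show_command(revision, path), cwd=repo,
                          check=True, capture_output=True).stdout
    # Bytes are written unchanged so binary files (.odt, .jpg) survive.
    with open(dest, "wb") as fh:
        fh.write(data)
```

Writing the old revision to a separate file means a non-text file can be
opened and checked before anything in the working copy is replaced.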

If anybody is interested, the code for my snapshot/pusher/linker is on
gitorious. There is a very brief README file.

    https://gitorious.org/chrisirwin-utils/home-repos/trees/master

-- 
Chris Irwin
<chris at chrisirwin.ca>