[kwlug-disc] rcs (git?) integration/prevalence? [Was: CSS: varying colours based on inheritance]

unsolicited unsolicited at swiz.ca
Fri Jul 8 11:54:42 EDT 2011

Chris Frey wrote, On 07/08/2011 2:17 AM:
> On Fri, Jul 08, 2011 at 01:31:54AM -0400, Kyle Spaans wrote:
>> The problem here isn't a GUI or lack thereof. The problem is content.
>> Git (and most RCSes) are built around versioning and diffing plaintext
>> files (some are better/worse at versioning binary files). The unit of
>> currency in Git-land is the patch, generated by `diff`. OO.org files
>> are not plaintext, so trying to integrate git into OO.org runs you
>> into the problem of "Well, git isn't totally meant for binary files".
>> On the other hand, (not knowing anything about the versioning system
>> in OO.org) with binary diffs or a special diff that understands the
>> structure of OO.org documents, you can build a versioning system -- it
>> won't necessarily have the same interface as Git though: you commit
>> changes, see a version history, and roll-back when needed.
>> But now if your "word processor" was some plaintext editor, IDE or
>> OO.org in plaintext mode, then it's trivial to present a GUI interface
>> with buttons for each of the git commands.
> Sorry, but I gotta correct this. :-)
> While some SCMs are diff-based (such as CVS, and I think arch),
> git is not.  Git stores each revision of a file as a separate copy
> in a compressed file under the .git directory, using the SHA1 sum
> as the filename / directory name.
> Git only cares about diffs when it combines all these separate revision
> files into an optimized .pack file.  Then it does store diffs.  But
> that is just an optimization.
> As for OpenOffice documents, the crazy thing about them is that they *are*
> just text files.  If you run 'unzip -l' on an .odt file, you'll see a
> bunch of XML files.
> All that said, the argument is somewhat valid for files like JPG or
> even GPG documents, which may not diff very well.
> But I think this is beside the point of a filesystem that stores file
> revision history automatically.  A filesystem does not need to worry
> about compression, and could store the revisions the same way that
> git does: a separate file for each revision.  This would take care
> of the binary file issue.
> It would be nice if the filesystem also purged old revisions automatically
> when it started running out of space.
> The more I ponder this, the more I think this could be done by writing
> a FUSE filesystem driver with a git backend....
> *google searches*
> And what do you know?  gitfs and figfs already exist.  I wonder if
> they do what we want.  If not, maybe combining with something like
> git-annex or bup would work.
> This project needs programmers with free time. :-)
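
The loose-object storage Chris describes can be sketched in a few lines of Python. This is only an illustration, not git's own code: the function names are made up here. What it relies on is that git hashes a "blob <size>\0" header plus the file content with SHA-1, zlib-compresses the same bytes, and uses the first two hex digits of the hash as the directory name.

```python
import hashlib
import zlib


def _with_header(content: bytes) -> bytes:
    # git prefixes a blob with "blob <size in bytes>\0" before hashing/storing.
    return b"blob " + str(len(content)).encode("ascii") + b"\0" + content


def git_blob_id(content: bytes) -> str:
    """SHA-1 object id git assigns to this file content."""
    return hashlib.sha1(_with_header(content)).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Where a loose object with this id lives under .git (hypothetical helper)."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def loose_object_bytes(content: bytes) -> bytes:
    """The zlib-compressed bytes stored in that file."""
    return zlib.compress(_with_header(content))
```

Two revisions of a file get two different ids and therefore two separate compressed files, until a repack turns them into deltas inside a .pack file.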

So, clearly, my premise is understood: Joe User gets on with his
daily work, on all file types, be it .odt or .jpg or even .raw, while
the magic that is rcs sits in the background and makes versioning
transparent to him. Add a couple of generic utilities for reverting
to an earlier version and for recovering disk space (by deleting
intermediate versions).

Isn't one of the long-time pros of rcs, though, the minimal storage
that keeping revisions takes? And doesn't Chris' point that full
versions are kept (even if compressed) diminish this pro?

From this thread, it seems that the most efficient revision storage
is possible when the file type is understood, e.g. for .txt, run diff
and store only the changes. But there are a lot of file types out there.
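
For the plain-text case, "store changes only" might look like the sketch below. It uses Python's standard difflib rather than the diff utility itself, and text_delta is a made-up name. The point is only that the saved delta holds the changed lines, not the whole file.

```python
import difflib


def text_delta(old: str, new: str) -> list[str]:
    """Unified diff between two revisions, with no context lines (n=0)."""
    return list(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="old",
            tofile="new",
            n=0,  # keep only the lines that actually changed
        )
    )


if __name__ == "__main__":
    rev1 = "alpha\nbeta\ngamma\n"
    rev2 = "alpha\nBETA\ngamma\n"
    print("".join(text_delta(rev1, rev2)), end="")
```

The same call on a .jpg would be meaningless, which is the file-type problem in a nutshell.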

It almost feels like a filter that a file passes through before being
handed to the filesystem: on save, the filter compresses / diffs /
does whatever is appropriate for that rcs, then passes the result on to
the filesystem. Is this such a big thing that only something like a
Canonical, Debian, Red Hat, etc., could tackle it?
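
A toy version of such a filter, purely as a sketch: everything here (the names, the two strategy labels, the 8000-byte probe) is an assumption for illustration, not how any existing filesystem or rcs does it. It guesses "binary" from a NUL byte near the start of the data and picks a storage strategy accordingly.

```python
def looks_binary(data: bytes, probe: int = 8000) -> bool:
    """Heuristic: a NUL byte in the first `probe` bytes suggests binary data."""
    return b"\0" in data[:probe]


def choose_strategy(data: bytes) -> str:
    """Pick how a save-time filter would store this revision (labels are made up)."""
    if looks_binary(data):
        return "compressed-copy"  # keep the whole revision, just compress it
    return "text-diff"            # store only the changes against the last revision


if __name__ == "__main__":
    print(choose_strategy(b"Dear diary,\n"))
    print(choose_strategy(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
```

A real filter would need a per-type table (.odt, .raw, ...) rather than one heuristic, which is where the "lot of file types" cost comes in.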

Is this, in some sense, journaling?

IIRC, someone here keeps their home directory in git. Driven by a
cron job? Does this get us 'close enough'?
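
A rough sketch of what such a cron-driven snapshot could run, assuming the home directory is already a git repository. The script and function names are hypothetical. The git invocations themselves (git -C, add --all, commit -m) are standard.

```python
import subprocess
from datetime import datetime


def snapshot_commands(repo_dir: str, when: datetime) -> list[list[str]]:
    """Build the git commands for one automatic snapshot of repo_dir."""
    message = "auto-snapshot " + when.strftime("%Y-%m-%d %H:%M")
    return [
        ["git", "-C", repo_dir, "add", "--all"],
        ["git", "-C", repo_dir, "commit", "--quiet", "-m", message],
    ]


def take_snapshot(repo_dir: str) -> None:
    """Run the commands in order; stop at the first one that fails."""
    for argv in snapshot_commands(repo_dir, datetime.now()):
        # "git commit" exits non-zero when there is nothing new to commit.
        if subprocess.run(argv).returncode != 0:
            break
```

Calling take_snapshot from a small script run by cron every 15 minutes or so would give Joe User a coarse version history, though only at snapshot granularity, not on every save.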
