[kwlug-disc] How GIT stores stuff.

Chris Frey cdfrey at foursquare.net
Wed May 4 00:32:25 EDT 2022


On Tue, May 03, 2022 at 03:53:26PM -0400, Bob B wrote:
> Any idea how to interpret what it actually did from this:
> 
> tower:.password-store bob$ git gc
> Counting objects: 4031, done.
> Delta compression using up to 24 threads.
> Compressing objects: 100% (2261/2261), done.
> Writing objects: 100% (4031/4031), done.
> Total 4031 (delta 1839), reused 3683 (delta 1703)

These are the usual statistics you'll see when you fetch from or push
to remote repositories, and here with garbage collection.  You'll see
similar output from fsck.  It's basically reporting how many objects you
have, and its progress as it works through the data.

I had never looked into the last line in detail until now, but I did
find an explanation:

	https://stackoverflow.com/questions/9379714/what-do-the-numbers-in-the-total-line-of-git-gc-git-repack-output-mean

> Total 4031 (delta 1839), reused 3683 (delta 1703)

Taking an educated guess, I would read this as saying you have 4031 total
objects in your git repo.  I assume 1839 of them could be compressed in
a diff-like way.  You had 3683 objects in a pack already (perhaps
from a previous git-gc run, which sometimes happens automatically for you),
and of those, 1703 were able to be compressed diff-style.

If you run git-gc again, the total and reused numbers should come out
alike, since everything will already be in a pack.
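
If you want to watch the loose-versus-packed change for yourself,
git count-objects -v reports loose objects as "count" and packed ones as
"in-pack".  Here is a rough sketch, assuming a throwaway repository in a
temporary directory; the file name, identity and commit messages are made
up purely for illustration:

```shell
#!/bin/sh
# Scratch repository in a temp dir, so no real repo is touched.
# (notes.txt, the identity and the messages are hypothetical.)
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

# Three commits, each adding a new blob, tree and commit object.
for i in 1 2 3; do
    echo "revision $i" >> notes.txt
    git add notes.txt
    git commit -q -m "revision $i"
done

# Before gc: objects sit as individual (loose) files under .git/objects.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

git gc -q

# After gc: the same objects should have moved into a pack.
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose before gc:  $loose_before"
echo "loose after gc:   $loose_after"
echo "in pack after gc: $packed_after"
```

In a small repo like this I would expect the loose count to drop to zero
and the in-pack count to match the earlier loose count.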


> So many questions! Do you have a good learning resource for git or is
> it just a matter of using it more and learning nuances like this as you
> come across them?

The book I usually recommend to anyone who wants to understand git
is "Git from the bottom up" by John Wiegley:

	https://jwiegley.github.io/git-from-the-bottom-up/

In the early days of git, I used to follow the mailing list, but
that was a long time ago.  The bottom-up details tend to stick once
you understand them, though.

I also have found the manpages to be very good if you have the time
to crunch through them.  Read them like a novel, and they will
reward you later. :-)

Especially the git-rev-parse page, which documents how to specify
revisions and ranges.  That syntax can be used across many other git
commands, like git-log.

For example, if you know what these mean, you'll have a great handle on git:

	master^
	..master
	HEAD^^
	branchname:./path/to/file
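
A quick way to try these out is in a scratch repo.  This is only a sketch:
the branch name "topic", the file contents and the commit messages are my
own placeholders, and I force the first branch to be called master so the
examples above apply as written.

```shell
#!/bin/sh
# Scratch repository; everything here is illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master
git config user.email "you@example.com"
git config user.name "Example User"

mkdir -p path/to
for i in 1 2 3; do
    echo "revision $i" > path/to/file
    git add path/to/file
    git commit -q -m "commit $i"
done

# master^ : the first parent of the commit master points at.
parent_subject=$(git log -1 --format=%s master^)

# HEAD^^ : the parent of the parent of the current commit.
grandparent_subject=$(git log -1 --format=%s HEAD^^)

# Start a branch two commits back, so master is ahead of HEAD.
git checkout -q -b topic master^^

# ..master : shorthand for HEAD..master, i.e. commits on master
# that are not reachable from the current HEAD.
ahead=$(git rev-list --count ..master)

# branchname:./path/to/file : the file as it exists on that branch,
# with the path taken relative to the current directory.
file_on_master=$(git show master:./path/to/file)

echo "master^ is:        $parent_subject"
echo "HEAD^^ was:        $grandparent_subject"
echo "commits ..master:  $ahead"
echo "file on master:    $file_on_master"
```

Note that the last command reads the file from master without touching the
working tree, which still has the older copy checked out on topic.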

Git commands are split into porcelain (user-friendly high level commands),
and plumbing (low level data access commands).  The 'man git' manpage
lists which is which.  The plumbing commands have good manpages too,
but fewer safety mechanisms, so while you may not actually use a plumbing
command, reading its manpage can help you understand what the porcelain
is doing.
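
As a small illustration of plumbing sitting underneath porcelain, here is a
sketch that stages a file with git add (porcelain) and then checks it with
git hash-object, git ls-files and git cat-file (plumbing).  The file name
and contents are invented for the example, and it runs in a throwaway repo.

```shell
#!/bin/sh
# Scratch repository; greeting.txt is a hypothetical file.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'hello world\n' > greeting.txt

# Plumbing: compute the object id git would give this content,
# without writing anything into the repository.
expected_id=$(git hash-object greeting.txt)

# Porcelain: store the content as a blob and record it in the index.
git add greeting.txt

# Plumbing: read the id back out of the index, then inspect the object.
staged_id=$(git ls-files -s greeting.txt | awk '{print $2}')
obj_type=$(git cat-file -t "$staged_id")
obj_body=$(git cat-file -p "$staged_id")

echo "hash-object says: $expected_id"
echo "index says:       $staged_id"
echo "object type:      $obj_type"
echo "object contents:  $obj_body"
```

The two ids should match, and the stored object is the complete file
contents rather than a diff, which fits the complete-files point from
earlier in the thread.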

And reading through the list of all available commands (after you understand
git from the bottom up) is a good way to get a feel for what's available.

It's a fun fun fun fun world. :-)

- Chris


On Tue, May 03, 2022 at 03:53:26PM -0400, Bob B wrote:
> Chris,
> 
> Cool! Good to know.
> 
> I ran this on a copy of my password-store tree (all gpg encrypted files) and it reduced it by about 1M (~2%)
> 
> Any idea how to interpret what it actually did from this:
> 
> tower:.password-store bob$ git gc
> Counting objects: 4031, done.
> Delta compression using up to 24 threads.
> Compressing objects: 100% (2261/2261), done.
> Writing objects: 100% (4031/4031), done.
> Total 4031 (delta 1839), reused 3683 (delta 1703)
> 
> tower:.password-store bob$ du -sh . ~/.password-store/
>  41M	.
>  42M	/Users/bob/.password-store/
> 
> 
> So many questions! Do you have a good learning resource for git or is it just a matter of using it more and learning nuances like this as you come across them?
> 
> BB
> 
> 
> > -----Original Message-----
> > From: kwlug-disc <kwlug-disc-bounces at kwlug.org> On Behalf Of Chris Frey
> > Sent: May 3, 2022 2:24 PM
> > To: KWLUG discussion <kwlug-disc at kwlug.org>
> > Subject: Re: [kwlug-disc] How GIT stores stuff.
> > 
> > On Tue, May 03, 2022 at 10:36:23AM -0400, Bob B wrote:
> > > It has some references to deeper information that look enticing, but in
> > > summary I think it confirms what I said in that GIT stores complete files,
> > > not deltas. At least not deltas as 'diffs' of text files.
> > 
> > This is correct.  At least until you run 'git gc' which then turns
> > those individual files into packs, which do store things in diff-like
> > ways.
> > 
> > When you do a fresh clone from a remote repository, you will download
> > the pack, which you can see if you look inside the .git directory.  If
> > you start your own git repo, you will see many individual files in the
> > objects subdirectory until you run git-gc and git decides it is time to
> > pack things up for optimization purposes.
> > 
> > - Chris
> > 
> > 
> > _______________________________________________
> > kwlug-disc mailing list
> > kwlug-disc at kwlug.org
> > https://kwlug.org/mailman/listinfo/kwlug-disc_kwlug.org



