public inbox for gcc@gcc.gnu.org
* gcc git's performance problem Fwd: git gc expanding packed data?
@ 2009-08-07  2:17 Hin-Tak Leung
From: Hin-Tak Leung @ 2009-08-07  2:17 UTC
  To: dberlin, fche, gcc

I asked the git people, and here is the answer. Maybe somebody can
fix the gcc git repository?

---------- Forwarded message ----------
From: Nicolas Pitre <nico@cam.org>
Date: Wed, Aug 5, 2009 at 11:39 PM
Subject: Re: git gc expanding packed data?
To: Hin-Tak Leung <hintak.leung@gmail.com>
Cc: git@vger.kernel.org


On Tue, 4 Aug 2009, Hin-Tak Leung wrote:

> I cloned gcc's git about a week ago to work on some problems I have
> with gcc on minor platforms, just plain 'git clone
> git://gcc.gnu.org/git/gcc.git gcc', then ran 'git fetch' about daily and
> 'git rebase origin' from time to time. I don't have local changes; I'm
> just following and monitoring what's going on in gcc. So after a week,
> I thought I'd do a 'git gc'. Then things got very bizarre.
>
> Before I started 'git gc', the whole of .git was about 700MB and
> .git/objects/pack was a bit under 600MB, with a few other directories
> under .git/objects at tens of KB and a few at 30000-40000 KB, and the
> checkout was, well, the size of the gcc source code. But after I started
> git gc, the message stayed at 'counting objects' at about 900,000
> for a long time, while a lot of directories under .git/objects/ got a
> bit large, and .git blew up to at least 7GB with a lot of small files
> under .git/objects/*/. Seeing that I would run out of disk space,
> I killed the whole lot and ran git clone again, since I don't have any
> local changes and there is nothing to lose.
>
> I am running git version 1.6.2.5 (Fedora 11). Is there any reason why
> 'git gc' does that?

There is probably a reason, although a bad one for sure.

Well... OK.

It appears that the git installation serving clone requests for
git://gcc.gnu.org/git/gcc.git generates lots of unreferenced objects. I
just cloned it and the pack I was sent contains 1383356 objects (can be
determined with 'git show-index < .git/objects/pack/*.idx | wc -l').
However, there are only 978501 actually referenced objects in that
cloned repository ( 'git rev-list --all --objects | wc -l').  That makes
for 404855 useless objects in the cloned repository.
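The two counts above can also be compared object by object rather than only by total. A minimal Python sketch, assuming the text output of 'git show-index' (one "<offset> <sha1> [(<crc32>)]" line per packed object) and of 'git rev-list --all --objects' (one "<sha1> [<path>]" line per reachable object); the helper names are hypothetical:

```python
import subprocess
from pathlib import Path


def packed_ids(show_index_output: str) -> set[str]:
    """Object names listed by `git show-index` ("<offset> <sha1> [(<crc32>)]")."""
    return {line.split()[1] for line in show_index_output.splitlines() if line.strip()}


def referenced_ids(rev_list_output: str) -> set[str]:
    """Object names listed by `git rev-list --all --objects` ("<sha1> [<path>]")."""
    return {line.split()[0] for line in rev_list_output.splitlines() if line.strip()}


def unreferenced_ids(show_index_output: str, rev_list_output: str) -> set[str]:
    """Objects present in the pack but not reachable from any ref."""
    return packed_ids(show_index_output) - referenced_ids(rev_list_output)


def run_on_repo(repo: Path) -> set[str]:
    """Run both git commands in `repo`.

    Assumes a single pack file, as right after a fresh clone.
    """
    idx = next((repo / ".git" / "objects" / "pack").glob("*.idx"))
    with idx.open("rb") as f:
        # git show-index reads the .idx file from standard input
        show = subprocess.run(["git", "show-index"], stdin=f, capture_output=True,
                              text=True, check=True, cwd=repo).stdout
    revs = subprocess.run(["git", "rev-list", "--all", "--objects"],
                          capture_output=True, text=True, check=True, cwd=repo).stdout
    return unreferenced_ids(show, revs)
```

With the counts quoted above, the resulting set would hold 1383356 - 978501 = 404855 object names.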

Now git has a safety mechanism to _not_ delete unreferenced objects
right away when running 'git gc'.  By default unreferenced objects are
kept around for a period of 2 weeks.  This is to make it easy for you to
recover accidentally deleted branches or commits, or to avoid a race
where an object that has just been created, but is not yet referenced,
could be deleted by a 'git gc' process running in parallel.
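The grace period can be thought of as a cutoff time: an unreferenced object only becomes eligible for pruning once it is older than the cutoff. A simplified Python model of that rule, not git's actual code (the function and parameter names are hypothetical), using the two-week default described above:

```python
from datetime import datetime, timedelta


def prunable(unreferenced_mtimes: dict[str, datetime],
             now: datetime,
             grace: timedelta = timedelta(weeks=2)) -> set[str]:
    """Return the unreferenced objects old enough to be pruned (simplified model).

    grace=timedelta(weeks=2) mirrors the default two-week grace period;
    grace=timedelta(0) corresponds to `git gc --prune=now`.
    """
    cutoff = now - grace
    return {oid for oid, mtime in unreferenced_mtimes.items() if mtime <= cutoff}
```

In this model, objects that were only just pushed out of a pack have a recent timestamp, so they survive a run with the default grace period, while a zero grace period makes all of them eligible at once.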

So to give that grace period to packed but unreferenced objects, the
repack process pushes those unreferenced objects out of the pack into
their loose form so they can be aged and eventually pruned.  Usually,
though, only a few objects become unreferenced.  Having 404855
unreferenced objects is quite a lot, and being sent those objects in the
first place via a clone is stupid and a complete waste of network
bandwidth.
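Each object pushed out of the pack becomes its own file under .git/objects/, in a subdirectory named after the first two hex digits of its SHA-1, with the remaining 38 digits as the file name. That is where the "lot of small files under .git/objects/*/" in the original report come from. A small Python sketch of that mapping (the function names are hypothetical):

```python
from collections import Counter


def loose_path(sha1: str) -> str:
    """Path of a loose object relative to the repository root."""
    return f".git/objects/{sha1[:2]}/{sha1[2:]}"


def files_per_directory(sha1s: list[str]) -> Counter:
    """Count how many loose files land in each of the 256 fan-out directories."""
    return Counter(sha[:2] for sha in sha1s)
```

Since SHA-1 names are close to uniformly distributed, 404855 ejected objects would mean roughly 404855 / 256, about 1581 files per directory. Each loose object is stored whole (zlib-compressed) rather than as a delta against a similar object, which is why the same data takes far more space loose than it did in the pack.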

Anyone has an idea of the git version running on gcc.gnu.org?  It is
certainly buggy and needs fixing.

Anyway... To solve your problem, you simply need to run 'git gc' with
the --prune=now argument to disable that grace period and get rid of
those unreferenced objects right away (safe only if no other git
activity is taking place at the same time, which should be easy to
ensure on a workstation).  The resulting .git/objects directory size
will shrink to about 441 MB.  If the gcc.gnu.org git server was doing
its job properly, the size of the clone transfer would also be
significantly smaller, meaning around 414 MB instead of the current 600+
MB.

And BTW, using 'git gc --aggressive' with a later git version (or
'git repack -a -f -d --window=250 --depth=250') gives me a .git/objects
directory size of 310 MB, meaning that the actual repository with all
the trunk history is _smaller_ than the actual source checkout.  If that
repository was properly repacked on the server, the clone data transfer
would be 283 MB.  This is less than half the current clone transfer
size.
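The size figures in this message can be checked with a little arithmetic. A Python sketch using only the numbers above, taking 600 MB as the current clone transfer even though the message says "600+ MB", so the ratios it computes are upper bounds:

```python
# Figures from the message above, in MB.
current_transfer = 600    # "600+ MB", so this is a lower bound
pruned_transfer = 414     # server sending only referenced objects
repacked_transfer = 283   # server repacked with --window=250 --depth=250

for label, size in [("pruned", pruned_transfer), ("repacked", repacked_transfer)]:
    ratio = size / current_transfer
    print(f"{label}: {size} MB is at most {ratio:.0%} of the current transfer")
```

This gives about 69% and 47%, so the properly repacked clone would indeed be less than half the current transfer, and even more so since the real current figure is above 600 MB.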


Nicolas


* Re: gcc git's performance problem Fwd: git gc expanding packed data?
@ 2009-08-07 13:09 ` Frank Ch. Eigler
From: Frank Ch. Eigler @ 2009-08-07 13:09 UTC
  To: Hin-Tak Leung; +Cc: dberlin, gcc

Hi -

Nicolas wrote:

> Anyone has an idea of the git version running on gcc.gnu.org?  It is
> certainly buggy and needs fixing.

It was 1.6.3.2; now it's 1.6.4, practically spring chickens.

> Anyway... To solve your problem, you simply need to run 'git gc' with
> the --prune=now  [...]
> And BTW, using 'git gc --aggressive' with a later git version (or
> 'git repack -a -f -d --window=250 --depth=250') gives me a .git/objects
> directory size of 310 MB [...]

Unfortunately, git gc --aggressive / repack processes consume several
gigabytes of memory, which sometimes makes them fail on the 32-bit host.


- FChE


* Re: gcc git's performance problem Fwd: git gc expanding packed data?
@ 2009-08-08 21:15   ` Hin-Tak Leung
From: Hin-Tak Leung @ 2009-08-08 21:15 UTC
  To: Frank Ch. Eigler; +Cc: dberlin, gcc

On Fri, Aug 7, 2009 at 1:49 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
> Hi -
>
> Nicolas wrote:
>
>> Anyone has an idea of the git version running on gcc.gnu.org?  It is
>> certainly buggy and needs fixing.
>
> It was 1.6.3.2; now it's 1.6.4, practically spring chickens.
>
>> Anyway... To solve your problem, you simply need to run 'git gc' with
>> the --prune=now  [...]
>> And BTW, using 'git gc --aggressive' with a later git version (or
>> 'git repack -a -f -d --window=250 --depth=250') gives me a .git/objects
>> directory size of 310 MB [...]
>
> Unfortunately, git gc --aggressive / repack processes consume several
> gigabytes of memory, which sometimes makes them fail on the 32-bit host.
>
>
> - FChE
>

I had a bit more discussion with the git guys (see the git archive if
interested); apparently the loose objects all come from the user, feature
and misc svn branches. I guess they are of some use to somebody. I
think git users intending to casually play with gcc (e.g. following the
development for a few weeks to fix some bugs) need to know about this
problem with git gc (blowing up the packed objects to a few gigabytes),
though.

Thanks anyhow.

