* Re: Something is broken in repack [not found] ` <9e4733910712102125w56c70c0cxb8b00a060b62077@mail.gmail.com> @ 2007-12-11 7:01 ` Jon Smirl 2007-12-11 7:34 ` Jon Smirl 2007-12-11 13:49 ` Nicolas Pitre 0 siblings, 2 replies; 45+ messages in thread From: Jon Smirl @ 2007-12-11 7:01 UTC (permalink / raw) To: Nicolas Pitre, Junio C Hamano, gcc; +Cc: Git Mailing List I added the gcc people to the CC, it's their repository. Maybe they can help us sort this out. On 12/11/07, Jon Smirl <jonsmirl@gmail.com> wrote: > On 12/10/07, Nicolas Pitre <nico@cam.org> wrote: > > On Mon, 10 Dec 2007, Jon Smirl wrote: > > > > > New run using same configuration. With the addition of the more > > > efficient load balancing patches and delta cache accounting. > > > > > > Seconds are wall clock time. They are lower since the patch made > > > threading better at using all four cores. I am stuck at 380-390% CPU > > > utilization for the git process. > > > > > > complete seconds RAM > > > 10% 60 900M (includes counting) > > > 20% 15 900M > > > 30% 15 900M > > > 40% 50 1.2G > > > 50% 80 1.3G > > > 60% 70 1.7G > > > 70% 140 1.8G > > > 80% 180 2.0G > > > 90% 280 2.2G > > > 95% 530 2.8G - 1,420 total to here, previous was 1,983 > > > 100% 1390 2.85G > > > During the writing phase RAM fell to 1.6G > > > What is being freed in the writing phase?? > > > > The cached delta results, but you put a cap of 256MB for them. > > > > Could you try again with that cache disabled entirely, with > > pack.deltacachesize = 1 (don't use 0 as that means unbounded). > > > > And then, while still keeping the delta cache disabled, could you try > > with pack.threads = 2, and pack.threads = 1 ? > > > > I'm sorry to ask you to do this but I don't have enough ram to even > > complete a repack with threads=2 so I'm reattempting single threaded at > > the moment. But I really wonder if the threading has such an effect on > > memory usage. > > I already have a threads = 1 running with this config. 
Binary and > config were same from threads=4 run. > > 10% 28min 950M > 40% 135min 950M > 50% 157min 900M > 60% 160min 830M > 100% 170min 830M > > Something is hurting bad with threads. 170 CPU minutes with one > thread, versus 195 CPU minutes with four threads. > > Is there a different memory allocator that can be used when > multithreaded on gcc? This whole problem may be coming from the memory > allocation function. git is hardly interacting at all on the thread > level so it's likely a problem in the C run-time. > > [core] > repositoryformatversion = 0 > filemode = true > bare = false > logallrefupdates = true > [pack] > threads = 1 > deltacachesize = 256M > windowmemory = 256M > deltacachelimit = 0 > [remote "origin"] > url = git://git.infradead.org/gcc.git > fetch = +refs/heads/*:refs/remotes/origin/* > [branch "trunk"] > remote = origin > merge = refs/heads/trunk > > > > > > > > > > > > > > > > I have no explanation for the change in RAM usage. Two guesses come to > > > mind. Memory fragmentation. Or the change in the way the work was > > > split up altered RAM usage. > > > > > > Total CPU time was 195 minutes in 70 minutes clock time. About 70% > > > efficient. During the compress phase all four cores were active until > > > the last 90 seconds. Writing the objects took over 23 minutes CPU > > > bound on one core. > > > > > > New pack file is: 270,594,853 > > > Old one was: 344,543,752 > > > It still has 828,660 objects > > > > You mean the pack for the gcc repo is now less than 300MB? Wow. > > > > > > Nicolas > > > > > -- > Jon Smirl > jonsmirl@gmail.com > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 7:01 ` Something is broken in repack Jon Smirl @ 2007-12-11 7:34 ` Jon Smirl 2007-12-11 11:11 ` Andreas Ericsson ` (3 more replies) 2007-12-11 13:49 ` Nicolas Pitre 1 sibling, 4 replies; 45+ messages in thread From: Jon Smirl @ 2007-12-11 7:34 UTC (permalink / raw) To: Nicolas Pitre, Junio C Hamano, gcc; +Cc: Git Mailing List Switching to the Google perftools malloc http://goog-perftools.sourceforge.net/ 10% 30 828M 20% 15 831M 30% 10 834M 40% 50 1014M 50% 80 1086M 60% 80 1500M 70% 200 1.53G 80% 200 1.85G 90% 260 1.87G 95% 520 1.97G 100% 1335 2.24G Google allocator knocked 600MB off from memory use. Memory consumption did not fall during the write out phase like it did with gcc. Since all of this is with the same code except for changing the threading split, those runs where memory consumption went to 4.5GB with the gcc allocator must have triggered an extreme problem with fragmentation. Total CPU time 196 CPU minutes vs 190 for gcc. Google's claims of being faster are not true. So why does our threaded code take 20 CPU minutes longer (12%) to run than the same code with a single thread? Clock time is obviously faster. Are the threads working too close to each other in memory and bouncing cache lines between the cores? Q6600 is just two E6600s in the same package, the caches are not shared. Why does the threaded code need 2.24GB (google allocator, 2.85GB gcc) with 4 threads? But only need 950MB with one thread? Where's the extra gigabyte going? Is there another allocator to try? One that combines Google's efficiency with gcc's speed? On 12/11/07, Jon Smirl <jonsmirl@gmail.com> wrote: > I added the gcc people to the CC, it's their repository. Maybe they > can help up sort this out. > > On 12/11/07, Jon Smirl <jonsmirl@gmail.com> wrote: > > On 12/10/07, Nicolas Pitre <nico@cam.org> wrote: > > > On Mon, 10 Dec 2007, Jon Smirl wrote: > > > > > > > New run using same configuration. 
With the addition of the more > > > > efficient load balancing patches and delta cache accounting. > > > > > > > > Seconds are wall clock time. They are lower since the patch made > > > > threading better at using all four cores. I am stuck at 380-390% CPU > > > > utilization for the git process. > > > > > > > > complete seconds RAM > > > > 10% 60 900M (includes counting) > > > > 20% 15 900M > > > > 30% 15 900M > > > > 40% 50 1.2G > > > > 50% 80 1.3G > > > > 60% 70 1.7G > > > > 70% 140 1.8G > > > > 80% 180 2.0G > > > > 90% 280 2.2G > > > > 95% 530 2.8G - 1,420 total to here, previous was 1,983 > > > > 100% 1390 2.85G > > > > During the writing phase RAM fell to 1.6G > > > > What is being freed in the writing phase?? > > > > > > The cached delta results, but you put a cap of 256MB for them. > > > > > > Could you try again with that cache disabled entirely, with > > > pack.deltacachesize = 1 (don't use 0 as that means unbounded). > > > > > > And then, while still keeping the delta cache disabled, could you try > > > with pack.threads = 2, and pack.threads = 1 ? > > > > > > I'm sorry to ask you to do this but I don't have enough ram to even > > > complete a repack with threads=2 so I'm reattempting single threaded at > > > the moment. But I really wonder if the threading has such an effect on > > > memory usage. > > > > I already have a threads = 1 running with this config. Binary and > > config were same from threads=4 run. > > > > 10% 28min 950M > > 40% 135min 950M > > 50% 157min 900M > > 60% 160min 830M > > 100% 170min 830M > > > > Something is hurting bad with threads. 170 CPU minutes with one > > thread, versus 195 CPU minutes with four threads. > > > > Is there a different memory allocator that can be used when > > multithreaded on gcc? This whole problem may be coming from the memory > > allocation function. git is hardly interacting at all on the thread > > level so it's likely a problem in the C run-time. 
> > > > [core] > > repositoryformatversion = 0 > > filemode = true > > bare = false > > logallrefupdates = true > > [pack] > > threads = 1 > > deltacachesize = 256M > > windowmemory = 256M > > deltacachelimit = 0 > > [remote "origin"] > > url = git://git.infradead.org/gcc.git > > fetch = +refs/heads/*:refs/remotes/origin/* > > [branch "trunk"] > > remote = origin > > merge = refs/heads/trunk > > > > > > > > > > > > > > > > > > > > > > > > > I have no explanation for the change in RAM usage. Two guesses come to > > > > mind. Memory fragmentation. Or the change in the way the work was > > > > split up altered RAM usage. > > > > > > > > Total CPU time was 195 minutes in 70 minutes clock time. About 70% > > > > efficient. During the compress phase all four cores were active until > > > > the last 90 seconds. Writing the objects took over 23 minutes CPU > > > > bound on one core. > > > > > > > > New pack file is: 270,594,853 > > > > Old one was: 344,543,752 > > > > It still has 828,660 objects > > > > > > You mean the pack for the gcc repo is now less than 300MB? Wow. > > > > > > > > > Nicolas > > > > > > > > > -- > > Jon Smirl > > jonsmirl@gmail.com > > > > > -- > Jon Smirl > jonsmirl@gmail.com > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 45+ messages in thread
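For anyone wanting to reproduce the allocator swap described above, tcmalloc can be substituted without rebuilding git by preloading it at run time. The library path below is an assumption; adjust it for your distribution and tcmalloc build (full vs. `_minimal`):

```sh
# Preload Google's tcmalloc for a single repack run
# (the path to libtcmalloc is an example; check where your install put it)
LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so \
    git repack -a -f -d --window=250 --depth=250
```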
* Re: Something is broken in repack 2007-12-11 7:34 ` Jon Smirl @ 2007-12-11 11:11 ` Andreas Ericsson 2007-12-11 15:01 ` Nicolas Pitre ` (2 subsequent siblings) 3 siblings, 0 replies; 45+ messages in thread From: Andreas Ericsson @ 2007-12-11 11:11 UTC (permalink / raw) To: Jon Smirl; +Cc: Nicolas Pitre, Junio C Hamano, gcc, Git Mailing List Jon Smirl wrote: > Switching to the Google perftools malloc > http://goog-perftools.sourceforge.net/ > > Google allocator knocked 600MB off from memory use. > Memory consumption did not fall during the write out phase like it did with gcc. > > Since all of this is with the same code except for changing the > threading split, those runs where memory consumption went to 4.5GB > with the gcc allocator must have triggered an extreme problem with > fragmentation. > > Total CPU time 196 CPU minutes vs 190 for gcc. Google's claims of > being faster are not true. > Did you use the tcmalloc with heap checker/profiler, or tcmalloc_minimal? -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 7:34 ` Jon Smirl 2007-12-11 11:11 ` Andreas Ericsson @ 2007-12-11 15:01 ` Nicolas Pitre 2007-12-11 15:36 ` Nicolas Pitre 2007-12-11 17:19 ` Linus Torvalds 2007-12-11 17:44 ` Daniel Berlin 3 siblings, 1 reply; 45+ messages in thread From: Nicolas Pitre @ 2007-12-11 15:01 UTC (permalink / raw) To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Jon Smirl wrote: > Switching to the Google perftools malloc > http://goog-perftools.sourceforge.net/ > > 10% 30 828M > 20% 15 831M > 30% 10 834M > 40% 50 1014M > 50% 80 1086M > 60% 80 1500M > 70% 200 1.53G > 80% 200 1.85G > 90% 260 1.87G > 95% 520 1.97G > 100% 1335 2.24G > > Google allocator knocked 600MB off from memory use. > Memory consumption did not fall during the write out phase like it did with gcc. > > Since all of this is with the same code except for changing the > threading split, those runs where memory consumption went to 4.5GB > with the gcc allocator must have triggered an extreme problem with > fragmentation. Did you mean the glibc allocator? > Total CPU time 196 CPU minutes vs 190 for gcc. Google's claims of > being faster are not true. > > So why does our threaded code take 20 CPU minutes longer (12%) to run > than the same code with a single thread? Clock time is obviously > faster. Are the threads working too close to each other in memory and > bouncing cache lines between the cores? Q6600 is just two E6600s in > the same package, the caches are not shared. Of course there'll always be a certain amount of wasted cycles when threaded. The locking overhead, the extra contention for IO, etc. So 12% overhead (3% per thread) when using 4 threads is not that bad I would say. > Why does the threaded code need 2.24GB (google allocator, 2.85GB gcc) > with 4 threads? But only need 950MB with one thread? Where's the extra > gigabyte going? I really don't know. Did you try with pack.deltacachesize set to 1 ? 
And yet, this is still missing the actual issue. The issue being that the 2.1GB pack as a _source_ doesn't cause as much memory to be allocated even if the _result_ pack ends up being the same. I was able to repack the 2.1GB pack on my machine which has 1GB of ram. Now that it has been repacked, I can't repack it anymore, even when single threaded, as it starts crawling into swap fairly quickly. It is really non-intuitive and actually senseless that Git would require twice as much RAM to deal with a pack that is 7 times smaller. Nicolas (still puzzled) ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 15:01 ` Nicolas Pitre @ 2007-12-11 15:36 ` Nicolas Pitre 2007-12-11 16:20 ` Jon Smirl 2007-12-11 16:22 ` Nicolas Pitre 0 siblings, 2 replies; 45+ messages in thread From: Nicolas Pitre @ 2007-12-11 15:36 UTC (permalink / raw) To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Nicolas Pitre wrote: > And yet, this is still missing the actual issue. The issue being that > the 2.1GB pack as a _source_ doesn't cause as much memory to be > allocated even if the _result_ pack ends up being the same. > > I was able to repack the 2.1GB pack on my machine which has 1GB of ram. > Now that it has been repacked, I can't repack it anymore, even when > single threaded, as it start crowling into swap fairly quickly. It is > really non intuitive and actually senseless that Git would require twice > as much RAM to deal with a pack that is 7 times smaller. OK, here's something else for you to try: core.deltabasecachelimit=0 pack.threads=2 pack.deltacachesize=1 With that I'm able to repack the small gcc pack on my machine with 1GB of ram using: git repack -a -f -d --window=250 --depth=250 and top reports a ~700m virt and ~500m res without hitting swap at all. It is only at 25% so far, but I was unable to get that far before. Would be curious to know what you get with 4 threads on your machine. Nicolas ^ permalink raw reply [flat|nested] 45+ messages in thread
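In config-file form (matching the [core]/[pack] sections quoted earlier in the thread), the settings Nicolas suggests would look like:

```
[core]
	deltaBaseCacheLimit = 0
[pack]
	threads = 2
	deltaCacheSize = 1
```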
* Re: Something is broken in repack 2007-12-11 15:36 ` Nicolas Pitre @ 2007-12-11 16:20 ` Jon Smirl 2007-12-11 16:22 ` Nicolas Pitre 1 sibling, 0 replies; 45+ messages in thread From: Jon Smirl @ 2007-12-11 16:20 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Junio C Hamano, gcc, Git Mailing List On 12/11/07, Nicolas Pitre <nico@cam.org> wrote: > On Tue, 11 Dec 2007, Nicolas Pitre wrote: > > > And yet, this is still missing the actual issue. The issue being that > > the 2.1GB pack as a _source_ doesn't cause as much memory to be > > allocated even if the _result_ pack ends up being the same. > > > > I was able to repack the 2.1GB pack on my machine which has 1GB of ram. > > Now that it has been repacked, I can't repack it anymore, even when > > single threaded, as it start crowling into swap fairly quickly. It is > > really non intuitive and actually senseless that Git would require twice > > as much RAM to deal with a pack that is 7 times smaller. > > OK, here's something else for you to try: > > core.deltabasecachelimit=0 > pack.threads=2 > pack.deltacachesize=1 > > With that I'm able to repack the small gcc pack on my machine with 1GB > of ram using: > > git repack -a -f -d --window=250 --depth=250 > > and top reports a ~700m virt and ~500m res without hitting swap at all. > It is only at 25% so far, but I was unable to get that far before. > > Would be curious to know what you get with 4 threads on your machine. Changing those parameters really slowed down counting the objects. I used to be able to count in 45 seconds; now it takes 130 seconds. I still have the Google allocator linked in. 4 threads, cumulative clock time 25% 200 seconds, 820/627M 55% 510 seconds, 1240/1000M - little late recording 75% 15 minutes, 1658/1500M 90% 22 minutes, 1974/1800M it's still running but there is no significant change. Are two types of allocations being mixed? 
1) long term, global objects kept until the end of everything 2) volatile, private objects allocated only while the object is being compressed and then freed Separating these would make a big difference to the fragmentation problem. Single threading probably wouldn't see a fragmentation problem from mixing the allocation types. When a thread is created it could allocate a private 20MB (or whatever) pool. The volatile, private objects would come from that pool. Long term objects would stay in the global pool. Since they are long term they will just get laid down sequentially in memory. Separating these allocation types makes things way easier for malloc. CPU time would be helped by removing some of the locking if possible. -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 15:36 ` Nicolas Pitre 2007-12-11 16:20 ` Jon Smirl @ 2007-12-11 16:22 ` Nicolas Pitre 2007-12-11 16:34 ` Jon Smirl 1 sibling, 1 reply; 45+ messages in thread From: Nicolas Pitre @ 2007-12-11 16:22 UTC (permalink / raw) To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Nicolas Pitre wrote: > OK, here's something else for you to try: > > core.deltabasecachelimit=0 > pack.threads=2 > pack.deltacachesize=1 > > With that I'm able to repack the small gcc pack on my machine with 1GB > of ram using: > > git repack -a -f -d --window=250 --depth=250 > > and top reports a ~700m virt and ~500m res without hitting swap at all. > It is only at 25% so far, but I was unable to get that far before. Well, around 55% memory usage skyrocketed to 1.6GB and the system went deep into swap. So I restarted it with no threads. Nicolas (even more puzzled) ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 16:22 ` Nicolas Pitre @ 2007-12-11 16:34 ` Jon Smirl 2007-12-12 7:25 ` Nicolas Pitre 0 siblings, 1 reply; 45+ messages in thread From: Jon Smirl @ 2007-12-11 16:34 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Junio C Hamano, gcc, Git Mailing List On 12/11/07, Nicolas Pitre <nico@cam.org> wrote: > On Tue, 11 Dec 2007, Nicolas Pitre wrote: > > > OK, here's something else for you to try: > > > > core.deltabasecachelimit=0 > > pack.threads=2 > > pack.deltacachesize=1 > > > > With that I'm able to repack the small gcc pack on my machine with 1GB > > of ram using: > > > > git repack -a -f -d --window=250 --depth=250 > > > > and top reports a ~700m virt and ~500m res without hitting swap at all. > > It is only at 25% so far, but I was unable to get that far before. > > Well, around 55% memory usage skyrocketed to 1.6GB and the system went > deep into swap. So I restarted it with no threads. > > Nicolas (even more puzzled) On the plus side you are seeing what I see, so it proves I am not imagining it. -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 16:34 ` Jon Smirl @ 2007-12-12 7:25 ` Nicolas Pitre 2007-12-12 12:02 ` David Kastrup ` (2 more replies) 0 siblings, 3 replies; 45+ messages in thread From: Nicolas Pitre @ 2007-12-12 7:25 UTC (permalink / raw) To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Jon Smirl wrote: > On 12/11/07, Nicolas Pitre <nico@cam.org> wrote: > > On Tue, 11 Dec 2007, Nicolas Pitre wrote: > > > > > OK, here's something else for you to try: > > > > > > core.deltabasecachelimit=0 > > > pack.threads=2 > > > pack.deltacachesize=1 > > > > > > With that I'm able to repack the small gcc pack on my machine with 1GB > > > of ram using: > > > > > > git repack -a -f -d --window=250 --depth=250 > > > > > > and top reports a ~700m virt and ~500m res without hitting swap at all. > > > It is only at 25% so far, but I was unable to get that far before. > > > > Well, around 55% memory usage skyrocketed to 1.6GB and the system went > > deep into swap. So I restarted it with no threads. > > > > Nicolas (even more puzzled) > > On the plus side you are seeing what I see, so it proves I am not imagining it. Well... This is weird. It seems that memory fragmentation is really really killing us here. The fact that the Google allocator did manage to waste quite a bit less memory is a good indicator already. I did modify the progress display to show accounted memory that was allocated vs memory that was freed but still not released to the system. At least that gives you an idea of memory allocation and fragmentation with glibc in real time:

diff --git a/progress.c b/progress.c
index d19f80c..46ac9ef 100644
--- a/progress.c
+++ b/progress.c
@@ -8,6 +8,7 @@
  * published by the Free Software Foundation.
  */

+#include <malloc.h>
 #include "git-compat-util.h"
 #include "progress.h"

@@ -94,10 +95,12 @@ static int display(struct progress *progress, unsigned n, const char *done)
 	if (progress->total) {
 		unsigned percent = n * 100 / progress->total;
 		if (percent != progress->last_percent || progress_update) {
+			struct mallinfo m = mallinfo();
 			progress->last_percent = percent;
-			fprintf(stderr, "%s: %3u%% (%u/%u)%s%s",
-				progress->title, percent, n,
-				progress->total, tp, eol);
+			fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s",
+				progress->title, percent, n, progress->total,
+				m.uordblks >> 18, m.fordblks >> 18,
+				tp, eol);
 			fflush(stderr);
 			progress_update = 0;
 			return 1;

This shows that at some point the repack goes into a big memory surge. I don't have enough RAM to see how fragmented memory gets though, since it starts swapping around 50% done with 2 threads. With only 1 thread, memory usage grows significantly at around 11% with a pretty noticeable slowdown in the progress rate. So I think the theory goes like this: There is a block of big objects together in the list somewhere. Initially, all those big objects are assigned to thread #1 out of 4. Because those objects are big, they are really slow to delta compress, and storing them all in a window with 250 slots takes significant memory. Threads 2, 3, and 4 have "easy" work loads, so they complete fairly quickly compared to thread #1. But since the progress display is global, you won't notice that one thread is actually crawling slowly. To keep all threads busy until the end, those threads that are done with their work load will steal some work from another thread, choosing the one with the largest remaining work. That is most likely thread #1. So as threads 2, 3, and 4 complete, they will steal from thread 1 and populate their own window with those big objects too, and get slow too. 
And because all threads get to work on those big objects towards the end, the progress display will then show a significant slowdown, and memory usage will almost quadruple. Add memory fragmentation to that and you have a clogged system. Solution:

	pack.deltacachesize=1
	pack.windowmemory=16M

Limiting the window memory to 16MB will automatically shrink the window size when big objects are encountered, therefore keeping much fewer of those objects in memory at the same time, which in turn means they will be processed much more quickly. And somehow that must help with memory fragmentation as well. Setting pack.deltacachesize to 1 simply disables the caching of delta results entirely, which will only slow down the writing phase, but I wanted to keep it out of the picture for now. With the above settings, I'm currently repacking the gcc repo with 2 threads, and memory allocation never exceeded 700m virt and 400m res, while mallinfo shows about 350MB, and progress has reached 90%, which had never occurred on this machine with the 300MB source pack so far. Nicolas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 7:25 ` Nicolas Pitre @ 2007-12-12 12:02 ` David Kastrup 2007-12-14 16:44 ` Wolfram Gloger 2007-12-12 16:14 ` Nicolas Pitre 2007-12-12 16:19 ` Nicolas Pitre 2 siblings, 1 reply; 45+ messages in thread From: David Kastrup @ 2007-12-12 12:02 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Jon Smirl, Junio C Hamano, gcc, Git Mailing List Nicolas Pitre <nico@cam.org> writes: > Well... This is weird. > > It seems that memory fragmentation is really really killing us here. > The fact that the Google allocator did manage to waste quite less memory > is a good indicator already. Maybe an malloc/free/mmap wrapper that records the requested sizes and alloc/free order and dumps them to file so that one can make a compact git-free standalone test case for the glibc maintainers might be a good thing. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 12:02 ` David Kastrup @ 2007-12-14 16:44 ` Wolfram Gloger 0 siblings, 0 replies; 45+ messages in thread From: Wolfram Gloger @ 2007-12-14 16:44 UTC (permalink / raw) To: dak; +Cc: nico, jonsmirl, gitster, gcc, git Hi, > Maybe an malloc/free/mmap wrapper that records the requested sizes and > alloc/free order and dumps them to file so that one can make a compact > git-free standalone test case for the glibc maintainers might be a good > thing. I already have such a wrapper: http://malloc.de/malloc/mtrace-20060529.tar.gz But note that it does interfere with the thread scheduling, so it can't record the exact same allocation pattern as when not using the wrapper. Regards, Wolfram. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 7:25 ` Nicolas Pitre 2007-12-12 12:02 ` David Kastrup @ 2007-12-12 16:14 ` Nicolas Pitre 2007-12-12 16:37 ` Paolo Bonzini ` (2 more replies) 2007-12-12 16:19 ` Nicolas Pitre 2 siblings, 3 replies; 45+ messages in thread From: Nicolas Pitre @ 2007-12-12 16:14 UTC (permalink / raw) To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Wed, 12 Dec 2007, Nicolas Pitre wrote: > Add memory fragmentation to that and you have a clogged system. > > Solution: > > pack.deltacachesize=1 > pack.windowmemory=16M > > Limiting the window memory to 16MB will automatically shrink the window > size when big objects are encountered, therefore keeping much fewer of > those objects at the same time in memory, which in turn means they will > be processed much more quickly. And somehow that must help with memory > fragmentation as well. OK scrap that. When I returned to the computer this morning, the repack was completed... with a 1.3GB pack instead. So... The gcc repo apparently really needs a large window to efficiently compress those large objects. But when those large objects are already well deltified and you repack again with a large window, somehow the memory allocator is way more involved, probably even more so when there are several threads in parallel amplifying the issue, and things probably get to a point of no return with regard to memory fragmentation after a while. So... my conclusion is that the glibc allocator has fragmentation issues with this work load, given the notable difference with the Google allocator, which itself might not be completely immune to fragmentation issues of its own. And because the gcc repo requires a large window of big objects to get good compression, you're better off not using 4 threads to repack it with -a -f. 
The fact that the size of the source pack has such an influence is probably only because the increased usage of the delta base object cache is playing a role in the global memory allocation pattern, allowing for the bad fragmentation issue to occur. If you could run one last test with the mallinfo patch I posted, without the pack.windowmemory setting, and adding the reported values along with those from top, then we could formally conclude to memory fragmentation issues. So I don't think Git itself is actually bad. The gcc repo most certainly constitutes a nasty use case for memory allocators, but I don't think there is much we can do about it besides possibly implementing our own memory allocator with active defragmentation where possible (read memcpy) at some point to give glibc's allocator some chance to breathe a bit more. In the meantime you might have to use only one thread and lots of memory to repack the gcc repo, or find the perfect memory allocator to be used with Git. After all, packing the whole gcc history to around 230MB is quite a stunt but it requires sufficient resources to achieve it. Fortunately, like Linus said, such a wholesale repack is not something that most users have to do anyway. Nicolas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 16:14 ` Nicolas Pitre @ 2007-12-12 16:37 ` Paolo Bonzini 2007-12-12 16:42 ` Linus Torvalds 2007-12-13 14:09 ` Nguyen Thai Ngoc Duy 2 siblings, 0 replies; 45+ messages in thread From: Paolo Bonzini @ 2007-12-12 16:37 UTC (permalink / raw) To: gcc; +Cc: git > When I returned to the computer this morning, the repack was > completed... with a 1.3GB pack instead. > > So... The gcc repo apparently really needs a large window to efficiently > compress those large objects. So, am I right that if you have a very well-done pack (such as gcc's), you might want to repack in two phases: - first discarding the old deltas and using a small window, thus producing a bad pack that can be repacked without humongous amounts of memory... - ... then discarding the old deltas and producing another well-compressed pack? Paolo ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 16:14 ` Nicolas Pitre 2007-12-12 16:37 ` Paolo Bonzini @ 2007-12-12 16:42 ` Linus Torvalds 2007-12-12 16:54 ` David Miller ` (2 more replies) 2007-12-13 14:09 ` Nguyen Thai Ngoc Duy 2 siblings, 3 replies; 45+ messages in thread From: Linus Torvalds @ 2007-12-12 16:42 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Jon Smirl, Junio C Hamano, gcc, Git Mailing List On Wed, 12 Dec 2007, Nicolas Pitre wrote: > > So... my conclusion is that the glibc allocator has fragmentation issues > with this work load, given the notable difference with the Google > allocator, which itself might not be completely immune to fragmentation > issues of its own. Yes. Note that delta following involves patterns something like

	allocate (small) space for delta
	for i in (1..depth) {
		allocate large space for base
		allocate large space for result
		.. apply delta ..
		free large space for base
		free small space for delta
	}

so if you have some stupid heap algorithm that doesn't try to merge and re-use free'd spaces very aggressively (because that takes CPU time!), you might have memory usage be horribly inflated by the heap having all those holes for all the objects that got free'd in the chain that don't get aggressively re-used. Threaded memory allocators then make this worse by probably using totally different heaps for different threads (in order to avoid locking), so they will *all* have the fragmentation issue. 
And if you *really* want to cause trouble for a memory allocator, what you should try to do is to allocate the memory in one thread, and free it in another, and then things can really explode (the freeing thread notices that the allocation is not in its thread-local heap, so instead of really freeing it, it puts it on a separate list of areas to be freed later by the original thread when it needs memory - or worse, it adds it to the local thread list, and makes it effectively totally impossible to then ever merge different free'd allocations ever again because the freed things will be on different heap lists!). I'm not saying that particular case happens in git, I'm just saying that it's not unheard of. And with the delta cache and the object lookup, it's not at _all_ impossible that we hit the "allocate in one thread, free in another" case! Linus ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 16:42 ` Linus Torvalds @ 2007-12-12 16:54 ` David Miller 2007-12-12 17:12 ` Linus Torvalds 2007-12-12 17:29 ` Jon Smirl 2007-12-14 16:19 ` Wolfram Gloger 2 siblings, 1 reply; 45+ messages in thread From: David Miller @ 2007-12-12 16:54 UTC (permalink / raw) To: torvalds; +Cc: nico, jonsmirl, gitster, gcc, git From: Linus Torvalds <torvalds@linux-foundation.org> Date: Wed, 12 Dec 2007 08:37:10 -0800 (PST) > I'm not saying that particular case happens in git, I'm just saying that > it's not unheard of. And with the delta cache and the object lookup, it's > not at _all_ impossible that we hit the "allocate in one thread, free in > another" case! One thing that supports these theories is that, while running these large repacks, I notice that the RSS is roughly 2/3 of the amount of virtual address space allocated. I personally don't think it's unreasonable for GIT to have its own customized allocator at least for certain object types. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 16:54 ` David Miller @ 2007-12-12 17:12 ` Linus Torvalds 0 siblings, 0 replies; 45+ messages in thread From: Linus Torvalds @ 2007-12-12 17:12 UTC (permalink / raw) To: David Miller; +Cc: nico, jonsmirl, gitster, gcc, git On Wed, 12 Dec 2007, David Miller wrote: > > I personally don't think it's unreasonable for GIT to have its > own customized allocator at least for certain object types. Well, we actually already *do* have a customized allocator, but currently only for the actual core "object descriptor" that really just has the SHA1 and object flags in it (and a few extra words depending on object type). Those are critical for certain loads, and small too (so using the standard allocator wasted a _lot_ of memory). In addition, they're fixed-size and never free'd, so a specialized allocator really can do a lot better than any general-purpose memory allocator ever could. But the actual object *contents* are currently all allocated with whatever the standard libc malloc/free allocator is that you compile for (or load dynamically). Having a specialized allocator for them is a much more involved issue, exactly because we do have interesting allocation patterns etc. That said, at least those object allocations are all single-threaded (for right now, at least), so even when git does multi-threaded stuff, the core sha1_file.c stuff is always run under a single lock, and a simpler allocator that doesn't care about threads is likely to be much better than one that tries to have thread-local heaps etc. I suspect that is what the google allocator does. It probably doesn't have per-thread heaps, it just uses locking (and quite possibly things like per-*size* heaps, which is much more memory-efficient and helps avoid some of the fragmentation problems). 
Locking is much slower than per-thread accesses, but it doesn't have the issues with per-thread-fragmentation and all the problems with one thread allocating and another one freeing. Linus ^ permalink raw reply [flat|nested] 45+ messages in thread
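The fixed-size, never-freed allocator Linus describes reduces to a few lines. A hedged sketch — the slab size, struct layout, and names are illustrative, not git's actual `alloc.c`:

```c
#include <stdlib.h>

/* Carve fixed-size object descriptors out of large slabs.  Objects
 * are never freed individually, so there is no free list, no
 * per-object header, and no fragmentation at all. */
#define SLAB_OBJECTS 1024

struct obj {
    unsigned char sha1[20];
    unsigned flags;
};

static struct obj *slab;
static int slab_used = SLAB_OBJECTS;   /* force a new slab on first call */

struct obj *alloc_obj(void)
{
    if (slab_used == SLAB_OBJECTS) {
        /* One big allocation amortizes malloc overhead; the slab
         * pointer is intentionally abandoned when full. */
        slab = calloc(SLAB_OBJECTS, sizeof(struct obj));
        slab_used = 0;
    }
    return &slab[slab_used++];
}
```

Because nothing is ever returned, consecutive allocations are simply adjacent array slots — the design only works for objects with the lifetime-of-the-process property Linus mentions.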
* Re: Something is broken in repack 2007-12-12 16:42 ` Linus Torvalds 2007-12-12 16:54 ` David Miller @ 2007-12-12 17:29 ` Jon Smirl 2007-12-14 16:19 ` Wolfram Gloger 2 siblings, 0 replies; 45+ messages in thread From: Jon Smirl @ 2007-12-12 17:29 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nicolas Pitre, Junio C Hamano, gcc, Git Mailing List On 12/12/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Wed, 12 Dec 2007, Nicolas Pitre wrote: > > > > So... my conclusion is that the glibc allocator has fragmentation issues > > with this work load, given the notable difference with the Google > > allocator, which itself might not be completely immune to fragmentation > > issues of its own. > > Yes. > > Note that delta following involves patterns something like > > allocate (small) space for delta > for i in (1..depth) { > allocate large space for base > allocate large space for result > .. apply delta .. > free large space for base > free small space for delta > } Is it hard to hack up something that statically allocates a big block of memory per thread for these two and then just reuses it? allocate (small) space for delta allocate large space for base The alternating between long term and short term allocations definitely aggravates fragmentation. > > so if you have some stupid heap algorithm that doesn't try to merge and > re-use free'd spaces very aggressively (because that takes CPU time!), you > might have memory usage be horribly inflated by the heap having all those > holes for all the objects that got free'd in the chain that don't get > aggressively re-used. > > Threaded memory allocators then make this worse by probably using totally > different heaps for different threads (in order to avoid locking), so they > will *all* have the fragmentation issue. 
> > And if you *really* want to cause trouble for a memory allocator, what you > should try to do is to allocate the memory in one thread, and free it in > another, and then things can really explode (the freeing thread notices > that the allocation is not in its thread-local heap, so instead of really > freeing it, it puts it on a separate list of areas to be freed later by > the original thread when it needs memory - or worse, it adds it to the > local thread list, and makes it effectively totally impossible to then > ever merge different free'd allocations ever again because the freed > things will be on different heap lists!). > > I'm not saying that particular case happens in git, I'm just saying that > it's not unheard of. And with the delta cache and the object lookup, it's > not at _all_ impossible that we hit the "allocate in one thread, free in > another" case! > > Linus > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 45+ messages in thread
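Jon's idea of holding one big block per thread and reusing it could look something like the following. `scratch_get` is a hypothetical helper, not git code; the grow-only policy is one possible choice:

```c
#include <stdlib.h>

/* One reusable buffer per thread: grow it when a larger base or
 * result is needed, never shrink, never hand it back to malloc
 * between objects.  This removes the alternating long-term /
 * short-term allocations that aggravate fragmentation. */
struct scratch {
    char *buf;
    size_t cap;
};

char *scratch_get(struct scratch *s, size_t need)
{
    if (need > s->cap) {
        free(s->buf);
        s->cap = need * 2;        /* geometric growth limits realloc churn */
        s->buf = malloc(s->cap);
    }
    return s->buf;
}
```

With one `struct scratch` per worker thread there is no cross-thread free at all, which also sidesteps the thread-local-heap pathology Linus describes.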
* Re: Something is broken in repack 2007-12-12 16:42 ` Linus Torvalds 2007-12-12 16:54 ` David Miller 2007-12-12 17:29 ` Jon Smirl @ 2007-12-14 16:19 ` Wolfram Gloger 2007-12-14 16:59 ` David Kastrup 2 siblings, 1 reply; 45+ messages in thread From: Wolfram Gloger @ 2007-12-14 16:19 UTC (permalink / raw) To: torvalds; +Cc: nico, jonsmirl, gitster, gcc, git Hi, > Note that delta following involves patterns something like > > allocate (small) space for delta > for i in (1..depth) { > allocate large space for base > allocate large space for result > .. apply delta .. > free large space for base > free small space for delta > } > > so if you have some stupid heap algorithm that doesn't try to merge and > re-use free'd spaces very aggressively (because that takes CPU time!), ptmalloc2 (in glibc) _per arena_ is basically best-fit. This is the best known general strategy, but it certainly cannot be the best in every case. > you > might have memory usage be horribly inflated by the heap having all those > holes for all the objects that got free'd in the chain that don't get > aggressively re-used. It depends how large 'large' is -- if it exceeds the mmap() threshold (settable with mallopt(M_MMAP_THRESHOLD, ...)) the 'large' spaces will be allocated with mmap() and won't cause any internal fragmentation. It might pay to experiment with this parameter if it is hard to avoid the alloc/free large space sequence. > Threaded memory allocators then make this worse by probably using totally > different heaps for different threads (in order to avoid locking), so they > will *all* have the fragmentation issue. Indeed. Could someone perhaps try ptmalloc3 (http://malloc.de/malloc/ptmalloc3-current.tar.gz) on this case? Thanks, Wolfram. ^ permalink raw reply [flat|nested] 45+ messages in thread
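Wolfram's `mallopt` suggestion is a one-liner; a sketch assuming glibc, with an illustrative threshold value (128 KiB happens to be glibc's historical default, so a real experiment would lower it):

```c
#include <malloc.h>   /* glibc-specific; mallopt() is non-standard */

/* Route allocations at or above the threshold through mmap() so
 * that free() returns them straight to the kernel instead of
 * leaving holes in the heap.  Returns 1 on success in glibc. */
int tune_for_large_buffers(void)
{
    return mallopt(M_MMAP_THRESHOLD, 128 * 1024);
}
```

The trade-off is that mmap/munmap pairs are much slower than heap recycling, so this only pays off when the large buffers are genuinely short-lived, as in the delta-apply loop above.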
* Re: Something is broken in repack 2007-12-14 16:19 ` Wolfram Gloger @ 2007-12-14 16:59 ` David Kastrup 2007-12-14 18:49 ` Wolfram Gloger 0 siblings, 1 reply; 45+ messages in thread From: David Kastrup @ 2007-12-14 16:59 UTC (permalink / raw) To: Wolfram Gloger; +Cc: torvalds, nico, jonsmirl, gitster, gcc, git Wolfram Gloger <wmglo@dent.med.uni-muenchen.de> writes: > Hi, > >> Note that delta following involves patterns something like >> >> allocate (small) space for delta >> for i in (1..depth) { >> allocate large space for base >> allocate large space for result >> .. apply delta .. >> free large space for base >> free small space for delta >> } >> >> so if you have some stupid heap algorithm that doesn't try to merge and >> re-use free'd spaces very aggressively (because that takes CPU time!), > > ptmalloc2 (in glibc) _per arena_ is basically best-fit. This is the > best known general strategy, Uh what? Someone crank out his copy of "The Art of Computer Programming", I think volume 1. Best fit is known (analyzed and proven and documented decades ago) to be one of the worst strategies for memory allocation. Exactly because it leads to huge fragmentation problems. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-14 16:59 ` David Kastrup @ 2007-12-14 18:49 ` Wolfram Gloger 0 siblings, 0 replies; 45+ messages in thread From: Wolfram Gloger @ 2007-12-14 18:49 UTC (permalink / raw) To: dak; +Cc: wmglo, torvalds, nico, jonsmirl, gitster, gcc, git Hi, > Uh what? Someone crank out his copy of "The Art of Computer > Programming", I think volume 1. Best fit is known (analyzed and proven > and documented decades ago) to be one of the worst strategies for memory > allocation. Exactly because it leads to huge fragmentation problems. Well, quoting http://gee.cs.oswego.edu/dl/html/malloc.html: "As shown by Wilson et al, best-fit schemes (of various kinds and approximations) tend to produce the least fragmentation on real loads compared to other general approaches such as first-fit." See [Wilson 1995] ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for more details and references. Regards, Wolfram. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 16:14 ` Nicolas Pitre 2007-12-12 16:37 ` Paolo Bonzini 2007-12-12 16:42 ` Linus Torvalds @ 2007-12-13 14:09 ` Nguyen Thai Ngoc Duy [not found] ` <fjrj9k$n6k$1@ger.gmane.org> 2 siblings, 1 reply; 45+ messages in thread From: Nguyen Thai Ngoc Duy @ 2007-12-13 14:09 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Jon Smirl, Junio C Hamano, gcc, Git Mailing List On Dec 12, 2007 10:48 PM, Nicolas Pitre <nico@cam.org> wrote: > In the mean time you might have to use only one thread and lots of > memory to repack the gcc repo, or find the perfect memory allocator to > be used with Git. After all, packing the whole gcc history to around > 230MB is quite a stunt but it requires sufficient resources to > achieve it. Fortunately, like Linus said, such a wholesale repack is not > something that most users have to do anyway. Is there an alternative to "git repack -a -d" that repacks everything but the first pack? -- Duy ^ permalink raw reply [flat|nested] 45+ messages in thread
[parent not found: <fjrj9k$n6k$1@ger.gmane.org>]
* Re: Something is broken in repack [not found] ` <fjrj9k$n6k$1@ger.gmane.org> @ 2007-12-13 16:39 ` Paolo Bonzini 2007-12-13 17:40 ` Johannes Sixt 1 sibling, 0 replies; 45+ messages in thread From: Paolo Bonzini @ 2007-12-13 16:39 UTC (permalink / raw) Cc: git, gcc >> Is there an alternative to "git repack -a -d" that repacks everything >> but the first pack? > > That would be a pretty good idea for big repositories. If I were to > implement it, I would actually add a .git/config option like > pack.permanent so that more than one pack could be made permanent; then > to repack really really everything you'd need "git repack -a -a -d". Actually there is something like this, as seen from the source of git-repack: for e in `cd "$PACKDIR" && find . -type f -name '*.pack' \ | sed -e 's/^\.\///' -e 's/\.pack$//'` do if [ -e "$PACKDIR/$e.keep" ]; then : keep else args="$args --unpacked=$e.pack" existing="$existing $e" fi done So, just create a file named as the pack, but with extension ".keep". Paolo ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack [not found] ` <fjrj9k$n6k$1@ger.gmane.org> 2007-12-13 16:39 ` Paolo Bonzini @ 2007-12-13 17:40 ` Johannes Sixt 2007-12-14 2:31 ` Jakub Narebski 1 sibling, 1 reply; 45+ messages in thread From: Johannes Sixt @ 2007-12-13 17:40 UTC (permalink / raw) To: Paolo Bonzini; +Cc: git, gcc Paolo Bonzini schrieb: > Nguyen Thai Ngoc Duy wrote: >> On Dec 12, 2007 10:48 PM, Nicolas Pitre <nico@cam.org> wrote: >>> In the mean time you might have to use only one thread and lots of >>> memory to repack the gcc repo, or find the perfect memory allocator to >>> be used with Git. After all, packing the whole gcc history to around >>> 230MB is quite a stunt but it requires sufficient resources to >>> achieve it. Fortunately, like Linus said, such a wholesale repack is not >>> something that most users have to do anyway. >> >> Is there an alternative to "git repack -a -d" that repacks everything >> but the first pack? > > That would be a pretty good idea for big repositories. If I were to > implement it, I would actually add a .git/config option like > pack.permanent so that more than one pack could be made permanent; then > to repack really really everything you'd need "git repack -a -a -d". It's already there: If you have a pack .git/objects/pack/pack-foo.pack, then "touch .git/objects/pack/pack-foo.keep" marks the pack as precious. -- Hannes ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-13 17:40 ` Johannes Sixt @ 2007-12-14 2:31 ` Jakub Narebski [not found] ` <fjt6vm$n7d$1@ger.gmane.org> 0 siblings, 1 reply; 45+ messages in thread From: Jakub Narebski @ 2007-12-14 2:31 UTC (permalink / raw) To: gcc; +Cc: git Johannes Sixt wrote: > Paolo Bonzini schrieb: >> Nguyen Thai Ngoc Duy wrote: >>> >>> Is there an alternative to "git repack -a -d" that repacks everything >>> but the first pack? >> >> That would be a pretty good idea for big repositories. If I were to >> implement it, I would actually add a .git/config option like >> pack.permanent so that more than one pack could be made permanent; then >> to repack really really everything you'd need "git repack -a -a -d". > > It's already there: If you have a pack .git/objects/pack/pack-foo.pack, then > "touch .git/objects/pack/pack-foo.keep" marks the pack as precious. Actually you can (and probably should) put the one line with the _reason_ pack is to be kept in the *.keep file. Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of all things. -- Jakub Narebski Warsaw, Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 45+ messages in thread
[parent not found: <fjt6vm$n7d$1@ger.gmane.org>]
* Re: Something is broken in repack [not found] ` <fjt6vm$n7d$1@ger.gmane.org> @ 2007-12-14 6:39 ` Nguyen Thai Ngoc Duy 2007-12-14 9:02 ` Paolo Bonzini 2007-12-14 10:52 ` Jakub Narebski 2007-12-14 13:53 ` Nicolas Pitre 1 sibling, 2 replies; 45+ messages in thread From: Nguyen Thai Ngoc Duy @ 2007-12-14 6:39 UTC (permalink / raw) To: Paolo Bonzini; +Cc: git, gcc On Dec 14, 2007 1:14 PM, Paolo Bonzini <bonzini@gnu.org> wrote: > > Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of > > all things. > > I found that the .keep file is not transmitted over the network (at > least I tried with git+ssh:// and http:// protocols), however. I'm thinking about "git clone --keep" to mark initial packs precious. But 'git clone' is under rewrite to C. Let's wait until C rewrite is done. -- Duy ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-14 6:39 ` Nguyen Thai Ngoc Duy @ 2007-12-14 9:02 ` Paolo Bonzini 2007-12-14 9:39 ` Harvey Harrison 2007-12-14 10:52 ` Jakub Narebski 1 sibling, 1 reply; 45+ messages in thread From: Paolo Bonzini @ 2007-12-14 9:02 UTC (permalink / raw) To: gcc; +Cc: git > I'm thinking about "git clone --keep" to mark initial packs precious. > But 'git clone' is under rewrite to C. Let's wait until C rewrite is > done. It should be the default, IMHO. Paolo ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-14 9:02 ` Paolo Bonzini @ 2007-12-14 9:39 ` Harvey Harrison 0 siblings, 0 replies; 45+ messages in thread From: Harvey Harrison @ 2007-12-14 9:39 UTC (permalink / raw) To: Paolo Bonzini; +Cc: gcc, git On Fri, 2007-12-14 at 09:20 +0100, Paolo Bonzini wrote: > > I'm thinking about "git clone --keep" to mark initial packs precious. > > But 'git clone' is under rewrite to C. Let's wait until C rewrite is > > done. > > It should be the default, IMHO. > While it doesn't mark the packs as .keep, git will reuse all of the old deltas you got in the original clone, so you're not losing anything. Harvey ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-14 6:39 ` Nguyen Thai Ngoc Duy 2007-12-14 9:02 ` Paolo Bonzini @ 2007-12-14 10:52 ` Jakub Narebski 2007-12-14 13:25 ` Nguyen Thai Ngoc Duy 1 sibling, 1 reply; 45+ messages in thread From: Jakub Narebski @ 2007-12-14 10:52 UTC (permalink / raw) To: Nguyen Thai Ngoc Duy; +Cc: Paolo Bonzini, git, gcc "Nguyen Thai Ngoc Duy" <pclouds@gmail.com> writes: > On Dec 14, 2007 1:14 PM, Paolo Bonzini <bonzini@gnu.org> wrote: > > > Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of > > > all things. > > > > I found that the .keep file is not transmitted over the network (at > > least I tried with git+ssh:// and http:// protocols), however. > > I'm thinking about "git clone --keep" to mark initial packs precious. > But 'git clone' is under rewrite to C. Let's wait until C rewrite is > done. But if you clone via network, pack might be network optimized if you use "smart" transport, not disk optimized, at least with current git which regenerates pack also on clone AFAIK. -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-14 10:52 ` Jakub Narebski @ 2007-12-14 13:25 ` Nguyen Thai Ngoc Duy 0 siblings, 0 replies; 45+ messages in thread From: Nguyen Thai Ngoc Duy @ 2007-12-14 13:25 UTC (permalink / raw) To: Jakub Narebski, Harvey Harrison; +Cc: Paolo Bonzini, git, gcc On Dec 14, 2007 4:01 PM, Harvey Harrison <harvey.harrison@gmail.com> wrote: > While it doesn't mark the packs as .keep, git will reuse all of the old > deltas you got in the original clone, so you're not losing anything. There is another reason I want it. I have an ~800MB pack and I don't want git to rewrite the pack every time I repack my changes. So it's kind of disk-wise (don't require 800MB on disk to prepare new pack, and don't write too much). On Dec 14, 2007 5:40 PM, Jakub Narebski <jnareb@gmail.com> wrote: > But if you clone via network, pack might be network optimized if you > use "smart" transport, not disk optimized, at least with current git > which regenerates pack also on clone AFAIK. Um.. that's ok it just regenerate once. -- Duy ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack [not found] ` <fjt6vm$n7d$1@ger.gmane.org> 2007-12-14 6:39 ` Nguyen Thai Ngoc Duy @ 2007-12-14 13:53 ` Nicolas Pitre 1 sibling, 0 replies; 45+ messages in thread From: Nicolas Pitre @ 2007-12-14 13:53 UTC (permalink / raw) To: Paolo Bonzini; +Cc: git, gcc On Fri, 14 Dec 2007, Paolo Bonzini wrote: > > Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of > > all things. > > I found that the .keep file is not transmitted over the network (at least I > tried with git+ssh:// and http:// protocols), however. That is a local policy. Nicolas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 7:25 ` Nicolas Pitre 2007-12-12 12:02 ` David Kastrup 2007-12-12 16:14 ` Nicolas Pitre @ 2007-12-12 16:19 ` Nicolas Pitre 2007-12-13 7:39 ` Andreas Ericsson 2 siblings, 1 reply; 45+ messages in thread From: Nicolas Pitre @ 2007-12-12 16:19 UTC (permalink / raw) To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Wed, 12 Dec 2007, Nicolas Pitre wrote: > I did modify the progress display to show accounted memory that was > allocated vs memory that was freed but still not released to the system. > At least that gives you an idea of memory allocation and fragmentation > with glibc in real time: > > diff --git a/progress.c b/progress.c > index d19f80c..46ac9ef 100644 > --- a/progress.c > +++ b/progress.c > @@ -8,6 +8,7 @@ > * published by the Free Software Foundation. > */ > > +#include <malloc.h> > #include "git-compat-util.h" > #include "progress.h" > > @@ -94,10 +95,12 @@ static int display(struct progress *progress, unsigned n, const char *done) > if (progress->total) { > unsigned percent = n * 100 / progress->total; > if (percent != progress->last_percent || progress_update) { > + struct mallinfo m = mallinfo(); > progress->last_percent = percent; > - fprintf(stderr, "%s: %3u%% (%u/%u)%s%s", > - progress->title, percent, n, > - progress->total, tp, eol); > + fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s", > + progress->title, percent, n, progress->total, > + m.uordblks >> 18, m.fordblks >> 18, > + tp, eol); Note: I didn't know what unit of memory those blocks represents, so the shift is most probably wrong. Nicolas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 16:19 ` Nicolas Pitre @ 2007-12-13 7:39 ` Andreas Ericsson 2007-12-14 16:12 ` Wolfram Gloger 0 siblings, 1 reply; 45+ messages in thread From: Andreas Ericsson @ 2007-12-13 7:39 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Jon Smirl, Junio C Hamano, gcc, Git Mailing List Nicolas Pitre wrote: > On Wed, 12 Dec 2007, Nicolas Pitre wrote: > >> I did modify the progress display to show accounted memory that was >> allocated vs memory that was freed but still not released to the system. >> At least that gives you an idea of memory allocation and fragmentation >> with glibc in real time: >> >> diff --git a/progress.c b/progress.c >> index d19f80c..46ac9ef 100644 >> --- a/progress.c >> +++ b/progress.c >> @@ -8,6 +8,7 @@ >> * published by the Free Software Foundation. >> */ >> >> +#include <malloc.h> >> #include "git-compat-util.h" >> #include "progress.h" >> >> @@ -94,10 +95,12 @@ static int display(struct progress *progress, unsigned n, const char *done) >> if (progress->total) { >> unsigned percent = n * 100 / progress->total; >> if (percent != progress->last_percent || progress_update) { >> + struct mallinfo m = mallinfo(); >> progress->last_percent = percent; >> - fprintf(stderr, "%s: %3u%% (%u/%u)%s%s", >> - progress->title, percent, n, >> - progress->total, tp, eol); >> + fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s", >> + progress->title, percent, n, progress->total, >> + m.uordblks >> 18, m.fordblks >> 18, >> + tp, eol); > > Note: I didn't know what unit of memory those blocks represents, so the > shift is most probably wrong. > Me neither, but it appears to me as if hblkhd holds the actual memory consumed by the process. It seems to store the information in bytes, which I find a bit dubious unless glibc has some internal multiplier. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-13 7:39 ` Andreas Ericsson @ 2007-12-14 16:12 ` Wolfram Gloger 0 siblings, 0 replies; 45+ messages in thread From: Wolfram Gloger @ 2007-12-14 16:12 UTC (permalink / raw) To: ae; +Cc: nico, jonsmirl, gitster, gcc, git Hi, > >> if (progress->total) { > >> unsigned percent = n * 100 / progress->total; > >> if (percent != progress->last_percent || progress_update) { > >> + struct mallinfo m = mallinfo(); > >> progress->last_percent = percent; > >> - fprintf(stderr, "%s: %3u%% (%u/%u)%s%s", > >> - progress->title, percent, n, > >> - progress->total, tp, eol); > >> + fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s", > >> + progress->title, percent, n, progress->total, > >> + m.uordblks >> 18, m.fordblks >> 18, > >> + tp, eol); > > > > Note: I didn't know what unit of memory those blocks represents, so the > > shift is most probably wrong. > > > > Me neither, but it appears to me as if hblkhd holds the actual memory > consumed by the process. It seems to store the information in bytes, > which I find a bit dubious unless glibc has some internal multiplier. mallinfo() will only give you the used memory for the main arena. When you have separate arenas (likely when concurrent threads have been used), the only way to get the full picture is to call malloc_stats(), which prints to stderr. Regards, Wolfram. ^ permalink raw reply [flat|nested] 45+ messages in thread
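The distinction Wolfram draws can be seen in a short sketch. `mallinfo()` reports the main arena only, in bytes (newer glibc deprecates it in favor of `mallinfo2()`), while `malloc_stats()` prints every arena — including those created for other threads — to stderr:

```c
#include <malloc.h>   /* glibc-specific interfaces */
#include <stdio.h>
#include <stdlib.h>

/* Print both views of the heap: the main-arena byte counts from
 * mallinfo(), then the full per-arena breakdown from malloc_stats().
 * Only the latter accounts for memory held by thread arenas. */
void report_heap(void)
{
    struct mallinfo m = mallinfo();
    fprintf(stderr, "main arena: %d bytes in use, %d bytes free\n",
            m.uordblks, m.fordblks);
    malloc_stats();   /* per-arena totals, written to stderr */
}
```

This also explains why dividing `uordblks` by a fixed shift to get "MB" is fragile: the fields are plain bytes stored in `int`, so they wrap past 2GB — likely part of the confusion in the patch above.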
* Re: Something is broken in repack 2007-12-11 7:34 ` Jon Smirl 2007-12-11 11:11 ` Andreas Ericsson 2007-12-11 15:01 ` Nicolas Pitre @ 2007-12-11 17:19 ` Linus Torvalds 2007-12-11 17:24 ` Nicolas Pitre 2007-12-11 18:57 ` Jon Smirl 2007-12-11 17:44 ` Daniel Berlin 3 siblings, 2 replies; 45+ messages in thread From: Linus Torvalds @ 2007-12-11 17:19 UTC (permalink / raw) To: Jon Smirl; +Cc: Nicolas Pitre, Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Jon Smirl wrote: > > So why does our threaded code take 20 CPU minutes longer (12%) to run > than the same code with a single thread? Threaded code *always* takes more CPU time. The only thing you can hope for is a wall-clock reduction. You're seeing probably a combination of (a) more cache misses (b) bigger dataset active at a time and a probably fairly miniscule (c) threading itself tends to have some overheads. > Q6600 is just two E6600s in the same package, the caches are not shared. Sure they are shared. They're just not *entirely* shared. But they are shared between each two cores, so each thread essentially has only half the cache they had with the non-threaded version. Threading is *not* a magic solution to all problems. It gives you potentially twice the CPU power, but there are real downsides that you should keep in mind. > Why does the threaded code need 2.24GB (google allocator, 2.85GB gcc) > with 4 threads? But only need 950MB with one thread? Where's the extra > gigabyte going? I suspect that it's really simple: you have a few rather big files in the gcc history, with deep delta chains. And what happens when you have four threads running at the same time is that they all need to keep all those objects that they are working on - and their hash state - in memory at the same time! So if you want to use more threads, that _forces_ you to have a bigger memory footprint, simply because you have more "live" objects that you work on. 
Normally, that isn't much of a problem, since most source files are small, but if you have a few deep delta chains on big files, both the delta chain itself is going to use memory (you may have limited the size of the cache, but it's still needed for the actual delta generation, so it's not like the memory usage went away). That said, I suspect there are a few things fighting you: - threading is hard. I haven't looked a lot at the changes Nico did to do a threaded object packer, but what I've seen does not convince me it is correct. The "trg_entry" accesses are *mostly* protected with "cache_lock", but nothing else really seems to be, so quite frankly, I wouldn't trust the threaded version very much. It's off by default, and for a good reason, I think. For example: the packing code does this: if (!src->data) { read_lock(); src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz); read_unlock(); ... and that's racy. If two threads come in at roughly the same time and see a NULL src->data, they'll both get the lock, and they'll both (serially) try to fill it in. It will all *work*, but one of them will have done unnecessary work, and one of them will have their result thrown away and leaked. Are you hitting issues like this? I dunno. The object sorting means that different threads normally shouldn't look at the same objects (not even the sources), so probably not, but basically, I wouldn't trust the threading 100%. It needs work, and it needs to stay off by default. - you're working on a problem that isn't really even worth optimizing that much. The *normal* case is to re-use old deltas, which makes all of the issues you are fighting basically go away (because you only have a few _incremental_ objects that need deltaing). In other words: the _real_ optimizations have already been done, and are done elsewhere, and are much smarter (the best way to optimize X is not to make X run fast, but to avoid doing X in the first place!). 
The thing you are trying to work with is the one-time-only case where you explicitly disable that big and important optimization, and then you complain about the end result being slow! It's like saying that you're compiling with extreme debugging and no optimizations, and then complaining that the end result doesn't run as fast as if you used -O2. Except this is a hundred times worse, because you literally asked git to do the really expensive thing that it really really doesn't want to do ;) > Is there another allocator to try? One that combines Google's > efficiency with gcc's speed? See above: I'd look around at threading-related bugs and check the way we lock (or don't) accesses. Linus ^ permalink raw reply [flat|nested] 45+ messages in thread
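The race Linus points out is the classic lazy-initialization pattern, and the usual fix is to re-check under the lock. A sketch only — Nicolas replies elsewhere in the thread that git's threads never actually share entries, so git does not need this, and a strictly portable version would also use an atomic load for the unlocked check:

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t read_mutex = PTHREAD_MUTEX_INITIALIZER;

struct entry { void *data; };

/* Hypothetical stand-in for read_sha1_file(): any expensive,
 * non-reentrant loader. */
static void *load_object(void)
{
    return malloc(16);
}

void *get_data(struct entry *e)
{
    if (!e->data) {                   /* cheap unlocked fast path */
        pthread_mutex_lock(&read_mutex);
        if (!e->data)                 /* re-check under the lock: only
                                         one thread fills it in, so no
                                         duplicated work or leak */
            e->data = load_object();
        pthread_mutex_unlock(&read_mutex);
    }
    return e->data;
}
```

If two threads race past the first check, the second one finds the field already set once it acquires the mutex and does nothing, which removes both the wasted work and the leaked result from Linus's scenario.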
* Re: Something is broken in repack 2007-12-11 17:19 ` Linus Torvalds @ 2007-12-11 17:24 ` Nicolas Pitre 2007-12-11 17:28 ` David Miller 2007-12-11 18:57 ` Jon Smirl 1 sibling, 1 reply; 45+ messages in thread From: Nicolas Pitre @ 2007-12-11 17:24 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jon Smirl, Junio C Hamano, gcc, Git Mailing List [-- Attachment #1: Type: TEXT/PLAIN, Size: 4148 bytes --] On Tue, 11 Dec 2007, Linus Torvalds wrote: > That said, I suspect there are a few things fighting you: > > - threading is hard. I haven't looked a lot at the changes Nico did to do > a threaded object packer, but what I've seen does not convince me it is > correct. The "trg_entry" accesses are *mostly* protected with > "cache_lock", but nothing else really seems to be, so quite frankly, I > wouldn't trust the threaded version very much. It's off by default, and > for a good reason, I think. I beg to differ (of course, since I always know precisely what I do, and like you, my code never has bugs). Seriously though, the trg_entry has not to be protected at all. Why? Simply because each thread has its own exclusive set of objects which no other threads ever mess with. They never overlap. > For example: the packing code does this: > > if (!src->data) { > read_lock(); > src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz); > read_unlock(); > ... > > and that's racy. If two threads come in at roughly the same time and > see a NULL src->data, they'll both get the lock, and they'll both > (serially) try to fill it in. It will all *work*, but one of them will > have done unnecessary work, and one of them will have their result > thrown away and leaked. No. Once again, it is impossible for two threads to ever see the same src->data at all. The lock is there simply because read_sha1_file() is not reentrant. > Are you hitting issues like this? I dunno. 
The object sorting means > that different threads normally shouldn't look at the same objects (not > even the sources), so probably not, but basically, I wouldn't trust the > threading 100%. It needs work, and it needs to stay off by default. For now it is, but I wouldn't say it really needs significant work at this point. The latest thread patches were more about tuning than correctness. What the threading could be doing, though, is uncovering some other bugs, like in the pack mmap windowing code for example. Although that code is serialized by the read lock above, the fact that multiple threads are hammering on it in turns means that the mmap window is possibly seeking back and forth much more often than otherwise, possibly leaking something in the process. > - you're working on a problem that isn't really even worth optimizing > that much. The *normal* case is to re-use old deltas, which makes all > of the issues you are fighting basically go away (because you only have > a few _incremental_ objects that need deltaing). > > In other words: the _real_ optimizations have already been done, and > are done elsewhere, and are much smarter (the best way to optimize X is > not to make X run fast, but to avoid doing X in the first place!). The > thing you are trying to work with is the one-time-only case where you > explicitly disable that big and important optimization, and then you > complain about the end result being slow! > > It's like saying that you're compiling with extreme debugging and no > optimizations, and then complaining that the end result doesn't run as > fast as if you used -O2. Except this is a hundred times worse, because > you literally asked git to do the really expensive thing that it really > really doesn't want to do ;) Linus, please pay attention to the _actual_ important issue here. Sure I've been tuning the threading code in parallel to the attempt to debug this memory usage issue. BUT. 
The point is that repacking the gcc repo using "git repack -a -f --window=250" has a radically different memory usage profile depending on whether you do the repack on the earlier 2.1GB pack or the later 300MB pack. _That_ is the issue. Ironically, it is the 300MB pack that causes the repack to blow memory usage out of proportion. And in both cases, the threading code has to do the same work whether or not the original pack was densely packed, since -f throws away every existing delta anyway. So something is fishy somewhere other than in the packing code. Nicolas ^ permalink raw reply [flat|nested] 45+ messages in thread
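Nicolas's scheme — disjoint per-thread object sets, with a lock that exists only because read_sha1_file() is not reentrant — can be sketched as follows. This is a toy model in Python rather than git's actual C packer, and all names in it are made up for illustration:

```python
import threading

# Toy stand-in for the packer's object list: each worker thread gets a
# disjoint slice, so per-object state (git's "trg_entry") needs no lock.
objects = [{"id": i, "data": None} for i in range(100)]

read_mutex = threading.Lock()  # serializes only the non-reentrant loader

def load_object(obj_id):
    # stand-in for read_sha1_file(), which is not reentrant
    return b"contents-%d" % obj_id

def worker(chunk):
    for obj in chunk:
        if obj["data"] is None:
            with read_mutex:  # the lock protects the call, not the entry
                obj["data"] = load_object(obj["id"])

nthreads = 4
chunks = [objects[i::nthreads] for i in range(nthreads)]  # never overlap
threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(o["data"] is not None for o in objects))  # prints 100
```

Because the chunks never overlap, no two threads can ever observe the same entry's `data` field, which is exactly why the unsynchronized NULL check is safe in this arrangement.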
* Re: Something is broken in repack 2007-12-11 17:24 ` Nicolas Pitre @ 2007-12-11 17:28 ` David Miller 2007-12-11 18:43 ` Nicolas Pitre 0 siblings, 1 reply; 45+ messages in thread From: David Miller @ 2007-12-11 17:28 UTC (permalink / raw) To: nico; +Cc: torvalds, jonsmirl, gitster, gcc, git From: Nicolas Pitre <nico@cam.org> Date: Tue, 11 Dec 2007 12:21:11 -0500 (EST) > BUT. The point is that repacking the gcc repo using "git repack -a -f > --window=250" has a radically different memory usage profile whether you > do the repack on the earlier 2.1GB pack or the later 300MB pack. If you repack on the smaller pack file, git has to expand more stuff internally in order to search the deltas, whereas with the larger pack file I bet git has to undelta'ify less often to get base object blobs for delta search. In fact that behavior makes perfect sense to me, even though I don't understand GIT internals very well :-) ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 17:28 ` David Miller @ 2007-12-11 18:43 ` Nicolas Pitre 2007-12-11 20:34 ` Andreas Ericsson 0 siblings, 1 reply; 45+ messages in thread From: Nicolas Pitre @ 2007-12-11 18:43 UTC (permalink / raw) To: David Miller; +Cc: Linus Torvalds, jonsmirl, Junio C Hamano, gcc, git On Tue, 11 Dec 2007, David Miller wrote: > From: Nicolas Pitre <nico@cam.org> > Date: Tue, 11 Dec 2007 12:21:11 -0500 (EST) > > > BUT. The point is that repacking the gcc repo using "git repack -a -f > > --window=250" has a radically different memory usage profile whether you > > do the repack on the earlier 2.1GB pack or the later 300MB pack. > > If you repack on the smaller pack file, git has to expand more stuff > internally in order to search the deltas, whereas with the larger pack > file I bet git has to less often undelta'ify to get base objects blobs > for delta search. Of course. I came to that conclusion two days ago. And despite being pretty familiar with the involved code (I wrote part of it myself) I just can't spot anything wrong with it so far. But somehow the threading code keeps distracting people from that issue, since it gets to do the same work whether or not the source pack is densely packed. Nicolas (who wishes he had access to a much faster machine to investigate this issue) ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 18:43 ` Nicolas Pitre @ 2007-12-11 20:34 ` Andreas Ericsson 0 siblings, 0 replies; 45+ messages in thread From: Andreas Ericsson @ 2007-12-11 20:34 UTC (permalink / raw) To: Nicolas Pitre Cc: David Miller, Linus Torvalds, jonsmirl, Junio C Hamano, gcc, git Nicolas Pitre wrote: > On Tue, 11 Dec 2007, David Miller wrote: > >> From: Nicolas Pitre <nico@cam.org> >> Date: Tue, 11 Dec 2007 12:21:11 -0500 (EST) >> >>> BUT. The point is that repacking the gcc repo using "git repack -a -f >>> --window=250" has a radically different memory usage profile whether you >>> do the repack on the earlier 2.1GB pack or the later 300MB pack. >> If you repack on the smaller pack file, git has to expand more stuff >> internally in order to search the deltas, whereas with the larger pack >> file I bet git has to less often undelta'ify to get base objects blobs >> for delta search. > > Of course. I came to that conclusion two days ago. And despite being > pretty familiar with the involved code (I wrote part of it myself) I > just can't spot anything wrong with it so far. > > But somehow the threading code keep distracting people from that issue > since it gets to do the same work whether or not the source pack is > densely packed or not. > > Nicolas > (who wish he had access to a much faster machine to investigate this issue) If it's still an issue next week, we'll have a 16-core (8 dual-core CPUs) machine with some 32GB of RAM coming in that'll be free for about two days. You'll have to remind me about it though, as I've got a lot on my mind these days. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 17:19 ` Linus Torvalds 2007-12-11 17:24 ` Nicolas Pitre @ 2007-12-11 18:57 ` Jon Smirl 2007-12-11 19:17 ` Nicolas Pitre 2007-12-11 19:40 ` Linus Torvalds 1 sibling, 2 replies; 45+ messages in thread From: Jon Smirl @ 2007-12-11 18:57 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nicolas Pitre, Junio C Hamano, gcc, Git Mailing List On 12/11/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Tue, 11 Dec 2007, Jon Smirl wrote: > > > > So why does our threaded code take 20 CPU minutes longer (12%) to run > > than the same code with a single thread? > > Threaded code *always* takes more CPU time. The only thing you can hope > for is a wall-clock reduction. You're seeing probably a combination of > (a) more cache misses > (b) bigger dataset active at a time > and a probably fairly miniscule > (c) threading itself tends to have some overheads. > > > Q6600 is just two E6600s in the same package, the caches are not shared. > > Sure they are shared. They're just not *entirely* shared. But they are > shared between each two cores, so each thread essentially has only half > the cache they had with the non-threaded version. > > Threading is *not* a magic solution to all problems. It gives you > potentially twice the CPU power, but there are real downsides that you > should keep in mind. > > > Why does the threaded code need 2.24GB (google allocator, 2.85GB gcc) > > with 4 threads? But only need 950MB with one thread? Where's the extra > > gigabyte going? > > I suspect that it's really simple: you have a few rather big files in the > gcc history, with deep delta chains. And what happens when you have four > threads running at the same time is that they all need to keep all those > objects that they are working on - and their hash state - in memory at the > same time! 
> > So if you want to use more threads, that _forces_ you to have a bigger > memory footprint, simply because you have more "live" objects that you > work on. Normally, that isn't much of a problem, since most source files > are small, but if you have a few deep delta chains on big files, both the > delta chain itself is going to use memory (you may have limited the size > of the cache, but it's still needed for the actual delta generation, so > it's not like the memory usage went away). This makes sense. Those runs that blew up to 4.5GB were a combination of this effect and fragmentation in the gcc allocator. Google allocator appears to be much better at controlling fragmentation. Is there a reasonable scheme to force the chains to only be loaded once and then shared between worker threads? The memory blow up appears to be directly correlated with chain length. > > That said, I suspect there are a few things fighting you: > > - threading is hard. I haven't looked a lot at the changes Nico did to do > a threaded object packer, but what I've seen does not convince me it is > correct. The "trg_entry" accesses are *mostly* protected with > "cache_lock", but nothing else really seems to be, so quite frankly, I > wouldn't trust the threaded version very much. It's off by default, and > for a good reason, I think. > > For example: the packing code does this: > > if (!src->data) { > read_lock(); > src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz); > read_unlock(); > ... > > and that's racy. If two threads come in at roughly the same time and > see a NULL src->data, they'll both get the lock, and they'll both > (serially) try to fill it in. It will all *work*, but one of them will > have done unnecessary work, and one of them will have their result > thrown away and leaked. That may account for the threaded version needing an extra 20 minutes CPU time. An extra 12% of CPU seems like too much overhead for threading. 
Just letting a couple of those long chain compressions be done twice > > Are you hitting issues like this? I dunno. The object sorting means > that different threads normally shouldn't look at the same objects (not > even the sources), so probably not, but basically, I wouldn't trust the > threading 100%. It needs work, and it needs to stay off by default. > > - you're working on a problem that isn't really even worth optimizing > that much. The *normal* case is to re-use old deltas, which makes all > of the issues you are fighting basically go away (because you only have > a few _incremental_ objects that need deltaing). I agree, this problem only occurs when people import giant repositories. But every time someone hits these problems they declare git to be screwed up and proceed to thrash it in their blogs. > In other words: the _real_ optimizations have already been done, and > are done elsewhere, and are much smarter (the best way to optimize X is > not to make X run fast, but to avoid doing X in the first place!). The > thing you are trying to work with is the one-time-only case where you > explicitly disable that big and important optimization, and then you > complain about the end result being slow! > > It's like saying that you're compiling with extreme debugging and no > optimizations, and then complaining that the end result doesn't run as > fast as if you used -O2. Except this is a hundred times worse, because > you literally asked git to do the really expensive thing that it really > really doesn't want to do ;) > > > Is there another allocator to try? One that combines Google's > > efficiency with gcc's speed? > > See above: I'd look around at threading-related bugs and check the way we > lock (or don't) accesses. > > Linus > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 45+ messages in thread
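In code that really did share entries between threads, the standard fix for the benign race Linus describes is to re-check the pointer after taking the lock ("double-checked locking"), so exactly one thread does the expensive load. A minimal sketch in Python with hypothetical names — git's packer sidesteps the issue entirely by giving each thread its own objects, as Nicolas points out:

```python
import threading

lock = threading.Lock()
cache = {}
loads = []  # records every call to the expensive loader

def expensive_load(key):
    loads.append(key)
    return key * 2

def get(key):
    if key not in cache:          # unsynchronized fast path
        with lock:
            if key not in cache:  # re-check under the lock: one loader wins
                cache[key] = expensive_load(key)
    return cache[key]

threads = [threading.Thread(target=get, args=(7,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(loads))  # prints 1: no duplicated work, nothing thrown away
```

The second check is the whole trick: a thread that lost the race to the lock sees the filled-in entry and skips the load, so no result is ever computed twice or leaked.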
* Re: Something is broken in repack 2007-12-11 18:57 ` Jon Smirl @ 2007-12-11 19:17 ` Nicolas Pitre 2007-12-11 19:40 ` Linus Torvalds 1 sibling, 0 replies; 45+ messages in thread From: Nicolas Pitre @ 2007-12-11 19:17 UTC (permalink / raw) To: Jon Smirl; +Cc: Linus Torvalds, Junio C Hamano, gcc, Git Mailing List [-- Attachment #1: Type: TEXT/PLAIN, Size: 3042 bytes --] On Tue, 11 Dec 2007, Jon Smirl wrote: > This makes sense. Those runs that blew up to 4.5GB were a combination > of this effect and fragmentation in the gcc allocator. I disagree. This is insane. > Google allocator appears to be much better at controlling fragmentation. Indeed. And if fragmentation is indeed wasting half of Git's memory usage then we'll have to come up with a custom memory allocator. > Is there a reasonable scheme to force the chains to only be loaded > once and then shared between worker threads? The memory blow up > appears to be directly correlated with chain length. No. That would be the equivalent of holding each revision of all files uncompressed all at once in memory. > > That said, I suspect there are a few things fighting you: > > > > - threading is hard. I haven't looked a lot at the changes Nico did to do > > a threaded object packer, but what I've seen does not convince me it is > > correct. The "trg_entry" accesses are *mostly* protected with > > "cache_lock", but nothing else really seems to be, so quite frankly, I > > wouldn't trust the threaded version very much. It's off by default, and > > for a good reason, I think. > > > > For example: the packing code does this: > > > > if (!src->data) { > > read_lock(); > > src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz); > > read_unlock(); > > ... > > > > and that's racy. If two threads come in at roughly the same time and > > see a NULL src->data, they'll both get the lock, and they'll both > > (serially) try to fill it in. 
It will all *work*, but one of them will > > have done unnecessary work, and one of them will have their result > > thrown away and leaked. > > That may account for the threaded version needing an extra 20 minutes > CPU time. An extra 12% of CPU seems like too much overhead for > threading. Just letting a couple of those long chain compressions be > done twice No it may not. This theory is wrong as explained before. > > > > Are you hitting issues like this? I dunno. The object sorting means > > that different threads normally shouldn't look at the same objects (not > > even the sources), so probably not, but basically, I wouldn't trust the > > threading 100%. It needs work, and it needs to stay off by default. > > > > - you're working on a problem that isn't really even worth optimizing > > that much. The *normal* case is to re-use old deltas, which makes all > > of the issues you are fighting basically go away (because you only have > > a few _incremental_ objects that need deltaing). > > I agree, this problem only occurs when people import giant > repositories. But every time someone hits these problems they declare > git to be screwed up and proceed to thrash it in their blogs. It's not only for repack. Someone just reported git-blame being unusable too due to insane memory usage, which I suspect is due to the same issue. Nicolas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 18:57 ` Jon Smirl 2007-12-11 19:17 ` Nicolas Pitre @ 2007-12-11 19:40 ` Linus Torvalds 2007-12-11 20:26 ` Junio C Hamano 1 sibling, 1 reply; 45+ messages in thread From: Linus Torvalds @ 2007-12-11 19:40 UTC (permalink / raw) To: Jon Smirl; +Cc: Nicolas Pitre, Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Jon Smirl wrote: > > > > So if you want to use more threads, that _forces_ you to have a bigger > > memory footprint, simply because you have more "live" objects that you > > work on. Normally, that isn't much of a problem, since most source files > > are small, but if you have a few deep delta chains on big files, both the > > delta chain itself is going to use memory (you may have limited the size > > of the cache, but it's still needed for the actual delta generation, so > > it's not like the memory usage went away). > > This makes sense. Those runs that blew up to 4.5GB were a combination > of this effect and fragmentation in the gcc allocator. Google > allocator appears to be much better at controlling fragmentation. Yes. I think we do have some case where we simply keep a lot of objects around, and if we are talking reasonably large deltas, we'll have the whole delta-chain in memory just to unpack one single object. The delta cache size limits kick in only when we explicitly cache old delta results (in case they will be re-used, which is rather common), it doesn't affect the normal "I'm using this data right now" case at all. And then fragmentation makes it much much worse. Since the allocation patterns aren't nice (they are pretty random and depend on just the sizes of the objects), and the lifetimes aren't always nicely nested _either_ (they become more so when you disable the cache entirely, but that's just death for performance), I'm not surprised that there can be memory allocators that end up having some issues. 
> Is there a reasonable scheme to force the chains to only be loaded > once and then shared between worker threads? The memory blow up > appears to be directly correlated with chain length. The worker threads explicitly avoid touching the same objects, and no, you definitely don't want to explode the chains globally once, because the whole point is that we do fit 15 years worth of history into 300MB of pack-file thanks to having a very dense representation. The "loaded once" part is the mmap'ing of the pack-file into memory, but if you were to actually then try to expand the chains, you'd be talking about many *many* more gigabytes of memory than you already see used ;) So what you actually want to do is to just re-use already packed delta chains directly, which is what we normally do. But you are explicitly looking at the "--no-reuse-delta" (aka "git repack -f") case, which is why it then blows up. I'm sure we can find places to improve. But I would like to re-iterate the statement that you're kind of doing a "don't do that then" case which is really - by design - meant to be done once and never again, and is using resources - again, pretty much by design - wildly inappropriately just to get an initial packing done. > That may account for the threaded version needing an extra 20 minutes > CPU time. An extra 12% of CPU seems like too much overhead for > threading. Just letting a couple of those long chain compressions be > done twice Well, Nico pointed out that those things should all be thread-private data, so no, the race isn't there (unless there's some other bug there). > I agree, this problem only occurs when people import giant > repositories. But every time someone hits these problems they declare > git to be screwed up and proceed to thrash it in their blogs. Sure. I'd love to do global packing without paying the cost, but it really was a design decision. 
Thanks to doing off-line packing ("let it run overnight on some beefy machine") we can get better results. It's expensive, yes. But it was pretty much meant to be expensive. It's a very efficient compression algorithm, after all, and you're turning it up to eleven ;) I also suspect that the gcc archive makes things more interesting thanks to having some rather large files. The ChangeLog is probably the worst case (large file with *lots* of edits), but I suspect the *.po files aren't wonderful either. Linus ^ permalink raw reply [flat|nested] 45+ messages in thread
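Linus's distinction above — the delta cache limit applies only to *stored* delta results, not to the memory needed while a delta is being generated — can be illustrated with a toy byte-budgeted cache, loosely modeled on pack.deltacachesize. This is an illustrative sketch, not git's actual code:

```python
class DeltaCache:
    """Keep delta results only while the total stays under a byte budget.

    Anything over budget is simply not cached and will be recomputed
    when needed; the budget never constrains in-flight delta generation.
    """

    def __init__(self, limit):
        self.limit = limit
        self.used = 0
        self.store = {}

    def put(self, key, delta):
        if self.used + len(delta) > self.limit:
            return False  # over budget: drop it, recompute later
        self.store[key] = delta
        self.used += len(delta)
        return True

    def get(self, key):
        return self.store.get(key)

cache = DeltaCache(limit=10)
assert cache.put("a", b"12345")        # 5 bytes: fits
assert not cache.put("b", b"1234567")  # 7 more bytes would exceed 10
assert cache.get("a") == b"12345" and cache.get("b") is None
```

This is why setting pack.deltacachesize very low (as in the threads=1 experiment earlier in the thread) trades memory for CPU: uncached results are recomputed on demand, but peak memory during delta generation itself is unchanged.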
* Re: Something is broken in repack 2007-12-11 19:40 ` Linus Torvalds @ 2007-12-11 20:26 ` Junio C Hamano 2007-12-12 5:06 ` Andreas Ericsson 0 siblings, 1 reply; 45+ messages in thread From: Junio C Hamano @ 2007-12-11 20:26 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jon Smirl, Nicolas Pitre, gcc, Git Mailing List Linus Torvalds <torvalds@linux-foundation.org> writes: > On Tue, 11 Dec 2007, Jon Smirl wrote: >> > >> > So if you want to use more threads, that _forces_ you to have a bigger >> > memory footprint, simply because you have more "live" objects that you >> > work on. Normally, that isn't much of a problem, since most source files >> > are small, but if you have a few deep delta chains on big files, both the >> > delta chain itself is going to use memory (you may have limited the size >> > of the cache, but it's still needed for the actual delta generation, so >> > it's not like the memory usage went away). >> >> This makes sense. Those runs that blew up to 4.5GB were a combination >> of this effect and fragmentation in the gcc allocator. Google >> allocator appears to be much better at controlling fragmentation. > > Yes. I think we do have some case where we simply keep a lot of objects > around, and if we are talking reasonably large deltas, we'll have the > whole delta-chain in memory just to unpack one single object. Eh, excuse me. unpack_delta_entry():
- first unpacks the base object (this goes recursive);
- uncompresses the delta;
- applies the delta to the base to obtain the target object;
- frees the delta;
- frees (but allows it to be cached) the base object;
- returns the result.
So no matter how deep a chain is, you keep only one delta at a time in core, not the whole delta-chain in core. > So what you actually want to do is to just re-use already packed delta > chains directly, which is what we normally do. But you are explicitly > looking at the "--no-reuse-delta" (aka "git repack -f") case, which is why > it then blows up. 
While that does not explain, as Nico pointed out, the huge difference between the two repack runs that have different starting pack, I would say it is a fair thing to say. If you have a suboptimal pack (i.e. not enough reusable deltas, as in the 2.1GB pack case), do run "repack -f", but if you have a good pack (i.e. 300MB pack), don't. ^ permalink raw reply [flat|nested] 45+ messages in thread
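Junio's step list can be sketched with a toy delta format — here a "delta" merely names a base object and some bytes to append. Git's real delta encoding and unpack_delta_entry() are far richer; the names and format below are purely illustrative:

```python
# Toy object store: an object is either ("full", bytes) or
# ("delta", base_key, suffix) meaning "the base's bytes plus suffix".
store = {
    "v1": ("full", b"hello"),
    "v2": ("delta", "v1", b" world"),
    "v3": ("delta", "v2", b"!"),
}

def unpack(key):
    entry = store[key]
    if entry[0] == "full":
        return entry[1]
    _, base_key, suffix = entry
    base = unpack(base_key)  # recurse to materialize the base first
    result = base + suffix   # apply this delta to the base
    # base is dropped on return, and outer frames have not built theirs
    # yet, so at most one base plus one delta is live at any moment --
    # the chain is never held in memory all at once
    return result

print(unpack("v3"))  # b'hello world!'
```

During the recursive descent no bases exist yet; they are materialized one at a time on the way back up, each freed as soon as its delta is applied — which is Junio's point that chain depth does not multiply resident memory.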
* Re: Something is broken in repack 2007-12-11 20:26 ` Junio C Hamano @ 2007-12-12 5:06 ` Andreas Ericsson 0 siblings, 0 replies; 45+ messages in thread From: Andreas Ericsson @ 2007-12-12 5:06 UTC (permalink / raw) To: Junio C Hamano Cc: Linus Torvalds, Jon Smirl, Nicolas Pitre, gcc, Git Mailing List Junio C Hamano wrote: > Linus Torvalds <torvalds@linux-foundation.org> writes: > >> So what you actually want to do is to just re-use already packed delta >> chains directly, which is what we normally do. But you are explicitly >> looking at the "--no-reuse-delta" (aka "git repack -f") case, which is why >> it then blows up. > > While that does not explain, as Nico pointed out, the huge difference > between the two repack runs that have different starting pack, I would > say it is a fair thing to say. If you have a suboptimal pack (i.e. not > enough reusable deltas, as in the 2.1GB pack case), do run "repack -f", > but if you have a good pack (i.e. 300MB pack), don't. I think this is too much of a mystery for a lot of people to let it go. Even I started looking into it, and I've got so little spare time just now that I wouldn't stand much of a chance of making a contribution even if I had written the code originally. That being said, I think the fact that some git repositories really *can't* be repacked on some machines (because it eats ALL virtual memory) is really something that lowers git's reputation among huge projects. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 7:34 ` Jon Smirl ` (2 preceding siblings ...) 2007-12-11 17:19 ` Linus Torvalds @ 2007-12-11 17:44 ` Daniel Berlin 3 siblings, 0 replies; 45+ messages in thread From: Daniel Berlin @ 2007-12-11 17:44 UTC (permalink / raw) To: Jon Smirl; +Cc: Nicolas Pitre, Junio C Hamano, gcc, Git Mailing List On 12/11/07, Jon Smirl <jonsmirl@gmail.com> wrote: > > Total CPU time 196 CPU minutes vs 190 for gcc. Google's claims of > being faster are not true. Depends on your allocation patterns. For our apps, it certainly is :) Of course, I don't know if we've updated the external allocator in a while; I'll bug the people in charge of it. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 7:01 ` Something is broken in repack Jon Smirl 2007-12-11 7:34 ` Jon Smirl @ 2007-12-11 13:49 ` Nicolas Pitre 1 sibling, 0 replies; 45+ messages in thread From: Nicolas Pitre @ 2007-12-11 13:49 UTC (permalink / raw) To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Jon Smirl wrote: > I added the gcc people to the CC, it's their repository. Maybe they > can help us sort this out. Unless there is a Git expert amongst the gcc crowd, I somehow doubt it. And gcc people with an interest in Git internals are probably already on the Git mailing list. Nicolas ^ permalink raw reply [flat|nested] 45+ messages in thread