* Re: Something is broken in repack [not found] ` <9e4733910712102125w56c70c0cxb8b00a060b62077@mail.gmail.com> @ 2007-12-11 7:01 ` Jon Smirl 2007-12-11 7:34 ` Jon Smirl 2007-12-11 13:49 ` Nicolas Pitre 0 siblings, 2 replies; 45+ messages in thread From: Jon Smirl @ 2007-12-11 7:01 UTC (permalink / raw) To: Nicolas Pitre, Junio C Hamano, gcc; +Cc: Git Mailing List I added the gcc people to the CC, it's their repository. Maybe they can help us sort this out. On 12/11/07, Jon Smirl <jonsmirl@gmail.com> wrote: > On 12/10/07, Nicolas Pitre <nico@cam.org> wrote: > > On Mon, 10 Dec 2007, Jon Smirl wrote: > > > > > New run using same configuration. With the addition of the more > > > efficient load balancing patches and delta cache accounting. > > > > > > Seconds are wall clock time. They are lower since the patch made > > > threading better at using all four cores. I am stuck at 380-390% CPU > > > utilization for the git process. > > > > > > complete seconds RAM > > > 10% 60 900M (includes counting) > > > 20% 15 900M > > > 30% 15 900M > > > 40% 50 1.2G > > > 50% 80 1.3G > > > 60% 70 1.7G > > > 70% 140 1.8G > > > 80% 180 2.0G > > > 90% 280 2.2G > > > 95% 530 2.8G - 1,420 total to here, previous was 1,983 > > > 100% 1390 2.85G > > > During the writing phase RAM fell to 1.6G > > > What is being freed in the writing phase?? > > > > The cached delta results, but you put a cap of 256MB for them. > > > > Could you try again with that cache disabled entirely, with > > pack.deltacachesize = 1 (don't use 0 as that means unbounded). > > > > And then, while still keeping the delta cache disabled, could you try > > with pack.threads = 2, and pack.threads = 1 ? > > > > I'm sorry to ask you to do this but I don't have enough ram to even > > complete a repack with threads=2 so I'm reattempting single threaded at > > the moment. But I really wonder if the threading has such an effect on > > memory usage. > > I already have a threads = 1 running with this config. 
Binary and > config were same from threads=4 run. > > 10% 28min 950M > 40% 135min 950M > 50% 157min 900M > 60% 160min 830M > 100% 170min 830M > > Something is hurting bad with threads. 170 CPU minutes with one > thread, versus 195 CPU minutes with four threads. > > Is there a different memory allocator that can be used when > multithreaded on gcc? This whole problem may be coming from the memory > allocation function. git is hardly interacting at all on the thread > level so it's likely a problem in the C run-time. > > [core] > repositoryformatversion = 0 > filemode = true > bare = false > logallrefupdates = true > [pack] > threads = 1 > deltacachesize = 256M > windowmemory = 256M > deltacachelimit = 0 > [remote "origin"] > url = git://git.infradead.org/gcc.git > fetch = +refs/heads/*:refs/remotes/origin/* > [branch "trunk"] > remote = origin > merge = refs/heads/trunk > > > > > > > > > > > > > > > > I have no explanation for the change in RAM usage. Two guesses come to > > > mind. Memory fragmentation. Or the change in the way the work was > > > split up altered RAM usage. > > > > > > Total CPU time was 195 minutes in 70 minutes clock time. About 70% > > > efficient. During the compress phase all four cores were active until > > > the last 90 seconds. Writing the objects took over 23 minutes CPU > > > bound on one core. > > > > > > New pack file is: 270,594,853 > > > Old one was: 344,543,752 > > > It still has 828,660 objects > > > > You mean the pack for the gcc repo is now less than 300MB? Wow. > > > > > > Nicolas > > > > > -- > Jon Smirl > jonsmirl@gmail.com > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 7:01 ` Something is broken in repack Jon Smirl @ 2007-12-11 7:34 ` Jon Smirl 2007-12-11 11:11 ` Andreas Ericsson ` (3 more replies) 2007-12-11 13:49 ` Nicolas Pitre 1 sibling, 4 replies; 45+ messages in thread From: Jon Smirl @ 2007-12-11 7:34 UTC (permalink / raw) To: Nicolas Pitre, Junio C Hamano, gcc; +Cc: Git Mailing List Switching to the Google perftools malloc http://goog-perftools.sourceforge.net/ 10% 30 828M 20% 15 831M 30% 10 834M 40% 50 1014M 50% 80 1086M 60% 80 1500M 70% 200 1.53G 80% 200 1.85G 90% 260 1.87G 95% 520 1.97G 100% 1335 2.24G Google allocator knocked 600MB off from memory use. Memory consumption did not fall during the write out phase like it did with gcc. Since all of this is with the same code except for changing the threading split, those runs where memory consumption went to 4.5GB with the gcc allocator must have triggered an extreme problem with fragmentation. Total CPU time 196 CPU minutes vs 190 for gcc. Google's claims of being faster are not true. So why does our threaded code take 20 CPU minutes longer (12%) to run than the same code with a single thread? Clock time is obviously faster. Are the threads working too close to each other in memory and bouncing cache lines between the cores? Q6600 is just two E6600s in the same package, the caches are not shared. Why does the threaded code need 2.24GB (google allocator, 2.85GB gcc) with 4 threads? But only need 950MB with one thread? Where's the extra gigabyte going? Is there another allocator to try? One that combines Google's efficiency with gcc's speed? On 12/11/07, Jon Smirl <jonsmirl@gmail.com> wrote: > I added the gcc people to the CC, it's their repository. Maybe they > can help up sort this out. > > On 12/11/07, Jon Smirl <jonsmirl@gmail.com> wrote: > > On 12/10/07, Nicolas Pitre <nico@cam.org> wrote: > > > On Mon, 10 Dec 2007, Jon Smirl wrote: > > > > > > > New run using same configuration. 
With the addition of the more > > > > efficient load balancing patches and delta cache accounting. > > > > > > > > Seconds are wall clock time. They are lower since the patch made > > > > threading better at using all four cores. I am stuck at 380-390% CPU > > > > utilization for the git process. > > > > > > > > complete seconds RAM > > > > 10% 60 900M (includes counting) > > > > 20% 15 900M > > > > 30% 15 900M > > > > 40% 50 1.2G > > > > 50% 80 1.3G > > > > 60% 70 1.7G > > > > 70% 140 1.8G > > > > 80% 180 2.0G > > > > 90% 280 2.2G > > > > 95% 530 2.8G - 1,420 total to here, previous was 1,983 > > > > 100% 1390 2.85G > > > > During the writing phase RAM fell to 1.6G > > > > What is being freed in the writing phase?? > > > > > > The cached delta results, but you put a cap of 256MB for them. > > > > > > Could you try again with that cache disabled entirely, with > > > pack.deltacachesize = 1 (don't use 0 as that means unbounded). > > > > > > And then, while still keeping the delta cache disabled, could you try > > > with pack.threads = 2, and pack.threads = 1 ? > > > > > > I'm sorry to ask you to do this but I don't have enough ram to even > > > complete a repack with threads=2 so I'm reattempting single threaded at > > > the moment. But I really wonder if the threading has such an effect on > > > memory usage. > > > > I already have a threads = 1 running with this config. Binary and > > config were same from threads=4 run. > > > > 10% 28min 950M > > 40% 135min 950M > > 50% 157min 900M > > 60% 160min 830M > > 100% 170min 830M > > > > Something is hurting bad with threads. 170 CPU minutes with one > > thread, versus 195 CPU minutes with four threads. > > > > Is there a different memory allocator that can be used when > > multithreaded on gcc? This whole problem may be coming from the memory > > allocation function. git is hardly interacting at all on the thread > > level so it's likely a problem in the C run-time. 
> > > > [core] > > repositoryformatversion = 0 > > filemode = true > > bare = false > > logallrefupdates = true > > [pack] > > threads = 1 > > deltacachesize = 256M > > windowmemory = 256M > > deltacachelimit = 0 > > [remote "origin"] > > url = git://git.infradead.org/gcc.git > > fetch = +refs/heads/*:refs/remotes/origin/* > > [branch "trunk"] > > remote = origin > > merge = refs/heads/trunk > > > > > > > > > > > > > > > > > > > > > > > > > I have no explanation for the change in RAM usage. Two guesses come to > > > > mind. Memory fragmentation. Or the change in the way the work was > > > > split up altered RAM usage. > > > > > > > > Total CPU time was 195 minutes in 70 minutes clock time. About 70% > > > > efficient. During the compress phase all four cores were active until > > > > the last 90 seconds. Writing the objects took over 23 minutes CPU > > > > bound on one core. > > > > > > > > New pack file is: 270,594,853 > > > > Old one was: 344,543,752 > > > > It still has 828,660 objects > > > > > > You mean the pack for the gcc repo is now less than 300MB? Wow. > > > > > > > > > Nicolas > > > > > > > > > -- > > Jon Smirl > > jonsmirl@gmail.com > > > > > -- > Jon Smirl > jonsmirl@gmail.com > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 45+ messages in thread
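For anyone wanting to reproduce the allocator swap described above, tcmalloc can be substituted without rebuilding git by preloading it at run time. The library path below is an assumption; adjust it for your distribution and tcmalloc build (full vs. `_minimal`):

```sh
# Preload Google's tcmalloc for a single repack run
# (the path to libtcmalloc is an example; check where your install put it)
LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so \
    git repack -a -f -d --window=250 --depth=250
```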
* Re: Something is broken in repack 2007-12-11 7:34 ` Jon Smirl @ 2007-12-11 11:11 ` Andreas Ericsson 2007-12-11 15:01 ` Nicolas Pitre ` (2 subsequent siblings) 3 siblings, 0 replies; 45+ messages in thread From: Andreas Ericsson @ 2007-12-11 11:11 UTC (permalink / raw) To: Jon Smirl; +Cc: Nicolas Pitre, Junio C Hamano, gcc, Git Mailing List Jon Smirl wrote: > Switching to the Google perftools malloc > http://goog-perftools.sourceforge.net/ > > Google allocator knocked 600MB off from memory use. > Memory consumption did not fall during the write out phase like it did with gcc. > > Since all of this is with the same code except for changing the > threading split, those runs where memory consumption went to 4.5GB > with the gcc allocator must have triggered an extreme problem with > fragmentation. > > Total CPU time 196 CPU minutes vs 190 for gcc. Google's claims of > being faster are not true. > Did you use the tcmalloc with heap checker/profiler, or tcmalloc_minimal? -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 7:34 ` Jon Smirl 2007-12-11 11:11 ` Andreas Ericsson @ 2007-12-11 15:01 ` Nicolas Pitre 2007-12-11 15:36 ` Nicolas Pitre 2007-12-11 17:19 ` Linus Torvalds 2007-12-11 17:44 ` Daniel Berlin 3 siblings, 1 reply; 45+ messages in thread From: Nicolas Pitre @ 2007-12-11 15:01 UTC (permalink / raw) To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Jon Smirl wrote: > Switching to the Google perftools malloc > http://goog-perftools.sourceforge.net/ > > 10% 30 828M > 20% 15 831M > 30% 10 834M > 40% 50 1014M > 50% 80 1086M > 60% 80 1500M > 70% 200 1.53G > 80% 200 1.85G > 90% 260 1.87G > 95% 520 1.97G > 100% 1335 2.24G > > Google allocator knocked 600MB off from memory use. > Memory consumption did not fall during the write out phase like it did with gcc. > > Since all of this is with the same code except for changing the > threading split, those runs where memory consumption went to 4.5GB > with the gcc allocator must have triggered an extreme problem with > fragmentation. Did you mean the glibc allocator? > Total CPU time 196 CPU minutes vs 190 for gcc. Google's claims of > being faster are not true. > > So why does our threaded code take 20 CPU minutes longer (12%) to run > than the same code with a single thread? Clock time is obviously > faster. Are the threads working too close to each other in memory and > bouncing cache lines between the cores? Q6600 is just two E6600s in > the same package, the caches are not shared. Of course there'll always be a certain amount of wasted cycles when threaded. The locking overhead, the extra contention for IO, etc. So 12% overhead (3% per thread) when using 4 threads is not that bad I would say. > Why does the threaded code need 2.24GB (google allocator, 2.85GB gcc) > with 4 threads? But only need 950MB with one thread? Where's the extra > gigabyte going? I really don't know. Did you try with pack.deltacachesize set to 1 ? 
And yet, this is still missing the actual issue. The issue being that the 2.1GB pack as a _source_ doesn't cause as much memory to be allocated even if the _result_ pack ends up being the same. I was able to repack the 2.1GB pack on my machine which has 1GB of ram. Now that it has been repacked, I can't repack it anymore, even when single threaded, as it starts crawling into swap fairly quickly. It is really non-intuitive and actually senseless that Git would require twice as much RAM to deal with a pack that is 7 times smaller. Nicolas (still puzzled) ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 15:01 ` Nicolas Pitre @ 2007-12-11 15:36 ` Nicolas Pitre 2007-12-11 16:20 ` Jon Smirl 2007-12-11 16:22 ` Nicolas Pitre 0 siblings, 2 replies; 45+ messages in thread From: Nicolas Pitre @ 2007-12-11 15:36 UTC (permalink / raw) To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Nicolas Pitre wrote: > And yet, this is still missing the actual issue. The issue being that > the 2.1GB pack as a _source_ doesn't cause as much memory to be > allocated even if the _result_ pack ends up being the same. > > I was able to repack the 2.1GB pack on my machine which has 1GB of ram. > Now that it has been repacked, I can't repack it anymore, even when > single threaded, as it start crowling into swap fairly quickly. It is > really non intuitive and actually senseless that Git would require twice > as much RAM to deal with a pack that is 7 times smaller. OK, here's something else for you to try: core.deltabasecachelimit=0 pack.threads=2 pack.deltacachesize=1 With that I'm able to repack the small gcc pack on my machine with 1GB of ram using: git repack -a -f -d --window=250 --depth=250 and top reports a ~700m virt and ~500m res without hitting swap at all. It is only at 25% so far, but I was unable to get that far before. Would be curious to know what you get with 4 threads on your machine. Nicolas ^ permalink raw reply [flat|nested] 45+ messages in thread
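In config-file form (matching the [core]/[pack] sections quoted earlier in the thread), the settings Nicolas suggests would look like:

```
[core]
	deltaBaseCacheLimit = 0
[pack]
	threads = 2
	deltaCacheSize = 1
```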
* Re: Something is broken in repack 2007-12-11 15:36 ` Nicolas Pitre @ 2007-12-11 16:20 ` Jon Smirl 2007-12-11 16:22 ` Nicolas Pitre 1 sibling, 0 replies; 45+ messages in thread From: Jon Smirl @ 2007-12-11 16:20 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Junio C Hamano, gcc, Git Mailing List On 12/11/07, Nicolas Pitre <nico@cam.org> wrote: > On Tue, 11 Dec 2007, Nicolas Pitre wrote: > > > And yet, this is still missing the actual issue. The issue being that > > the 2.1GB pack as a _source_ doesn't cause as much memory to be > > allocated even if the _result_ pack ends up being the same. > > > > I was able to repack the 2.1GB pack on my machine which has 1GB of ram. > > Now that it has been repacked, I can't repack it anymore, even when > > single threaded, as it start crowling into swap fairly quickly. It is > > really non intuitive and actually senseless that Git would require twice > > as much RAM to deal with a pack that is 7 times smaller. > > OK, here's something else for you to try: > > core.deltabasecachelimit=0 > pack.threads=2 > pack.deltacachesize=1 > > With that I'm able to repack the small gcc pack on my machine with 1GB > of ram using: > > git repack -a -f -d --window=250 --depth=250 > > and top reports a ~700m virt and ~500m res without hitting swap at all. > It is only at 25% so far, but I was unable to get that far before. > > Would be curious to know what you get with 4 threads on your machine. Changing those parameters really slowed down counting the objects. I used to be able to count in 45 seconds; now it takes 130 seconds. I still have the Google allocator linked in. 4 threads, cumulative clock time 25% 200 seconds, 820/627M 55% 510 seconds, 1240/1000M - little late recording 75% 15 minutes, 1658/1500M 90% 22 minutes, 1974/1800M it's still running but there is no significant change. Are two types of allocations being mixed? 
1) long term, global objects kept until the end of everything 2) volatile, private objects allocated only while the object is being compressed and then freed Separating these would make a big difference to the fragmentation problem. Single threading probably wouldn't see a fragmentation problem from mixing the allocation types. When a thread is created it could allocate a private 20MB (or whatever) pool. The volatile, private objects would come from that pool. Long term objects would stay in the global pool. Since they are long term they will just get laid down sequentially in memory. Separating these allocation types makes things way easier for malloc. CPU time would be helped by removing some of the locking if possible. -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 15:36 ` Nicolas Pitre 2007-12-11 16:20 ` Jon Smirl @ 2007-12-11 16:22 ` Nicolas Pitre 2007-12-11 16:34 ` Jon Smirl 1 sibling, 1 reply; 45+ messages in thread From: Nicolas Pitre @ 2007-12-11 16:22 UTC (permalink / raw) To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Nicolas Pitre wrote: > OK, here's something else for you to try: > > core.deltabasecachelimit=0 > pack.threads=2 > pack.deltacachesize=1 > > With that I'm able to repack the small gcc pack on my machine with 1GB > of ram using: > > git repack -a -f -d --window=250 --depth=250 > > and top reports a ~700m virt and ~500m res without hitting swap at all. > It is only at 25% so far, but I was unable to get that far before. Well, around 55% memory usage skyrocketed to 1.6GB and the system went deep into swap. So I restarted it with no threads. Nicolas (even more puzzled) ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 16:22 ` Nicolas Pitre @ 2007-12-11 16:34 ` Jon Smirl 2007-12-12 7:25 ` Nicolas Pitre 0 siblings, 1 reply; 45+ messages in thread From: Jon Smirl @ 2007-12-11 16:34 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Junio C Hamano, gcc, Git Mailing List On 12/11/07, Nicolas Pitre <nico@cam.org> wrote: > On Tue, 11 Dec 2007, Nicolas Pitre wrote: > > > OK, here's something else for you to try: > > > > core.deltabasecachelimit=0 > > pack.threads=2 > > pack.deltacachesize=1 > > > > With that I'm able to repack the small gcc pack on my machine with 1GB > > of ram using: > > > > git repack -a -f -d --window=250 --depth=250 > > > > and top reports a ~700m virt and ~500m res without hitting swap at all. > > It is only at 25% so far, but I was unable to get that far before. > > Well, around 55% memory usage skyrocketed to 1.6GB and the system went > deep into swap. So I restarted it with no threads. > > Nicolas (even more puzzled) On the plus side you are seeing what I see, so it proves I am not imagining it. -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 16:34 ` Jon Smirl @ 2007-12-12 7:25 ` Nicolas Pitre 2007-12-12 12:02 ` David Kastrup ` (2 more replies) 0 siblings, 3 replies; 45+ messages in thread From: Nicolas Pitre @ 2007-12-12 7:25 UTC (permalink / raw) To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Jon Smirl wrote: > On 12/11/07, Nicolas Pitre <nico@cam.org> wrote: > > On Tue, 11 Dec 2007, Nicolas Pitre wrote: > > > > > OK, here's something else for you to try: > > > > > > core.deltabasecachelimit=0 > > > pack.threads=2 > > > pack.deltacachesize=1 > > > > > > With that I'm able to repack the small gcc pack on my machine with 1GB > > > of ram using: > > > > > > git repack -a -f -d --window=250 --depth=250 > > > > > > and top reports a ~700m virt and ~500m res without hitting swap at all. > > > It is only at 25% so far, but I was unable to get that far before. > > > > Well, around 55% memory usage skyrocketed to 1.6GB and the system went > > deep into swap. So I restarted it with no threads. > > > > Nicolas (even more puzzled) > > On the plus side you are seeing what I see, so it proves I am not imagining it. Well... This is weird. It seems that memory fragmentation is really really killing us here. The fact that the Google allocator did manage to waste quite a bit less memory is a good indicator already. I did modify the progress display to show accounted memory that was allocated vs memory that was freed but still not released to the system. At least that gives you an idea of memory allocation and fragmentation with glibc in real time:

diff --git a/progress.c b/progress.c
index d19f80c..46ac9ef 100644
--- a/progress.c
+++ b/progress.c
@@ -8,6 +8,7 @@
  * published by the Free Software Foundation.
  */

+#include <malloc.h>
 #include "git-compat-util.h"
 #include "progress.h"

@@ -94,10 +95,12 @@ static int display(struct progress *progress, unsigned n, const char *done)
 	if (progress->total) {
 		unsigned percent = n * 100 / progress->total;
 		if (percent != progress->last_percent || progress_update) {
+			struct mallinfo m = mallinfo();
 			progress->last_percent = percent;
-			fprintf(stderr, "%s: %3u%% (%u/%u)%s%s",
-				progress->title, percent, n,
-				progress->total, tp, eol);
+			fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s",
+				progress->title, percent, n, progress->total,
+				m.uordblks >> 18, m.fordblks >> 18,
+				tp, eol);
 			fflush(stderr);
 			progress_update = 0;
 			return 1;

This shows that at some point the repack goes into a big memory surge. I don't have enough RAM to see how fragmented memory gets though, since it starts swapping around 50% done with 2 threads. With only 1 thread, memory usage grows significantly at around 11% with a pretty noticeable slowdown in the progress rate. So I think the theory goes like this: There is a block of big objects together in the list somewhere. Initially, all those big objects are assigned to thread #1 out of 4. Because those objects are big, they are really slow to delta compress, and storing them all in a window with 250 slots takes significant memory. Threads 2, 3, and 4 have "easy" work loads, so they complete fairly quickly compared to thread #1. But since the progress display is global, you won't notice that one thread is actually crawling slowly. To keep all threads busy until the end, those threads that are done with their work load will steal some work from another thread, choosing the one with the largest remaining work. That is most likely thread #1. So as threads 2, 3, and 4 complete, they will steal from thread 1 and populate their own window with those big objects too, and get slow too. 
And because all threads get to work on those big objects towards the end, the progress display will then show a significant slowdown, and memory usage will almost quadruple. Add memory fragmentation to that and you have a clogged system. Solution:

	pack.deltacachesize=1
	pack.windowmemory=16M

Limiting the window memory to 16MB will automatically shrink the window size when big objects are encountered, therefore keeping much fewer of those objects in memory at the same time, which in turn means they will be processed much more quickly. And somehow that must help with memory fragmentation as well. Setting pack.deltacachesize to 1 simply disables the caching of delta results entirely, which will only slow down the writing phase, but I wanted to keep it out of the picture for now. With the above settings, I'm currently repacking the gcc repo with 2 threads, and memory allocation never exceeded 700m virt and 400m res, while mallinfo shows about 350MB, and progress has reached 90%, which had never occurred on this machine with the 300MB source pack so far. Nicolas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 7:25 ` Nicolas Pitre @ 2007-12-12 12:02 ` David Kastrup 2007-12-14 16:44 ` Wolfram Gloger 2007-12-12 16:14 ` Nicolas Pitre 2007-12-12 16:19 ` Nicolas Pitre 2 siblings, 1 reply; 45+ messages in thread From: David Kastrup @ 2007-12-12 12:02 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Jon Smirl, Junio C Hamano, gcc, Git Mailing List Nicolas Pitre <nico@cam.org> writes: > Well... This is weird. > > It seems that memory fragmentation is really really killing us here. > The fact that the Google allocator did manage to waste quite less memory > is a good indicator already. Maybe an malloc/free/mmap wrapper that records the requested sizes and alloc/free order and dumps them to file so that one can make a compact git-free standalone test case for the glibc maintainers might be a good thing. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 12:02 ` David Kastrup @ 2007-12-14 16:44 ` Wolfram Gloger 0 siblings, 0 replies; 45+ messages in thread From: Wolfram Gloger @ 2007-12-14 16:44 UTC (permalink / raw) To: dak; +Cc: nico, jonsmirl, gitster, gcc, git Hi, > Maybe an malloc/free/mmap wrapper that records the requested sizes and > alloc/free order and dumps them to file so that one can make a compact > git-free standalone test case for the glibc maintainers might be a good > thing. I already have such a wrapper: http://malloc.de/malloc/mtrace-20060529.tar.gz But note that it does interfere with the thread scheduling, so it can't record the exact same allocation pattern as when not using the wrapper. Regards, Wolfram. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 7:25 ` Nicolas Pitre 2007-12-12 12:02 ` David Kastrup @ 2007-12-12 16:14 ` Nicolas Pitre 2007-12-12 16:37 ` Paolo Bonzini ` (2 more replies) 2007-12-12 16:19 ` Nicolas Pitre 2 siblings, 3 replies; 45+ messages in thread From: Nicolas Pitre @ 2007-12-12 16:14 UTC (permalink / raw) To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Wed, 12 Dec 2007, Nicolas Pitre wrote: > Add memory fragmentation to that and you have a clogged system. > > Solution: > > pack.deltacachesize=1 > pack.windowmemory=16M > > Limiting the window memory to 16MB will automatically shrink the window > size when big objects are encountered, therefore keeping much fewer of > those objects at the same time in memory, which in turn means they will > be processed much more quickly. And somehow that must help with memory > fragmentation as well. OK scrap that. When I returned to the computer this morning, the repack was completed... with a 1.3GB pack instead. So... The gcc repo apparently really needs a large window to efficiently compress those large objects. But when those large objects are already well deltified and you repack again with a large window, somehow the memory allocator is way more involved, probably even more so when there are several threads in parallel amplifying the issue, and things probably get to a point of no return with regard to memory fragmentation after a while. So... my conclusion is that the glibc allocator has fragmentation issues with this work load, given the notable difference with the Google allocator, which itself might not be completely immune to fragmentation issues of its own. And because the gcc repo requires a large window of big objects to get good compression, you're better off not using 4 threads to repack it with -a -f. 
The fact that the size of the source pack has such an influence is probably only because the increased usage of the delta base object cache is playing a role in the global memory allocation pattern, allowing for the bad fragmentation issue to occur. If you could run one last test with the mallinfo patch I posted, without the pack.windowmemory setting, and adding the reported values along with those from top, then we could formally conclude to memory fragmentation issues. So I don't think Git itself is actually bad. The gcc repo most certainly constitutes a nasty use case for memory allocators, but I don't think there is much we can do about it besides possibly implementing our own memory allocator with active defragmentation where possible (read memcpy) at some point to give glibc's allocator some chance to breathe a bit more. In the meantime you might have to use only one thread and lots of memory to repack the gcc repo, or find the perfect memory allocator to be used with Git. After all, packing the whole gcc history to around 230MB is quite a stunt but it requires sufficient resources to achieve it. Fortunately, like Linus said, such a wholesale repack is not something that most users have to do anyway. Nicolas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 16:14 ` Nicolas Pitre @ 2007-12-12 16:37 ` Paolo Bonzini 2007-12-12 16:42 ` Linus Torvalds 2007-12-13 14:09 ` Nguyen Thai Ngoc Duy 2 siblings, 0 replies; 45+ messages in thread From: Paolo Bonzini @ 2007-12-12 16:37 UTC (permalink / raw) To: gcc; +Cc: git > When I returned to the computer this morning, the repack was > completed... with a 1.3GB pack instead. > > So... The gcc repo apparently really needs a large window to efficiently > compress those large objects. So, am I right that if you have a very well-done pack (such as gcc's), you might want to repack in two phases: - first discarding the old deltas and using a small window, thus producing a bad pack that can be repacked without humongous amounts of memory... - ... then discarding the old deltas and producing another well-compressed pack? Paolo ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 16:14 ` Nicolas Pitre 2007-12-12 16:37 ` Paolo Bonzini @ 2007-12-12 16:42 ` Linus Torvalds 2007-12-12 16:54 ` David Miller ` (2 more replies) 2007-12-13 14:09 ` Nguyen Thai Ngoc Duy 2 siblings, 3 replies; 45+ messages in thread From: Linus Torvalds @ 2007-12-12 16:42 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Jon Smirl, Junio C Hamano, gcc, Git Mailing List On Wed, 12 Dec 2007, Nicolas Pitre wrote: > > So... my conclusion is that the glibc allocator has fragmentation issues > with this work load, given the notable difference with the Google > allocator, which itself might not be completely immune to fragmentation > issues of its own. Yes. Note that delta following involves patterns something like

	allocate (small) space for delta
	for i in (1..depth) {
		allocate large space for base
		allocate large space for result
		.. apply delta ..
		free large space for base
		free small space for delta
	}

so if you have some stupid heap algorithm that doesn't try to merge and re-use free'd spaces very aggressively (because that takes CPU time!), you might have memory usage be horribly inflated by the heap having all those holes for all the objects that got free'd in the chain that don't get aggressively re-used. Threaded memory allocators then make this worse by probably using totally different heaps for different threads (in order to avoid locking), so they will *all* have the fragmentation issue. 
And if you *really* want to cause trouble for a memory allocator, what you should try to do is to allocate the memory in one thread, and free it in another, and then things can really explode (the freeing thread notices that the allocation is not in its thread-local heap, so instead of really freeing it, it puts it on a separate list of areas to be freed later by the original thread when it needs memory - or worse, it adds it to the local thread list, and makes it effectively totally impossible to then ever merge different free'd allocations ever again because the freed things will be on different heap lists!). I'm not saying that particular case happens in git, I'm just saying that it's not unheard of. And with the delta cache and the object lookup, it's not at _all_ impossible that we hit the "allocate in one thread, free in another" case! Linus ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 16:42 ` Linus Torvalds @ 2007-12-12 16:54 ` David Miller 2007-12-12 17:12 ` Linus Torvalds 2007-12-12 17:29 ` Jon Smirl 2007-12-14 16:19 ` Wolfram Gloger 2 siblings, 1 reply; 45+ messages in thread From: David Miller @ 2007-12-12 16:54 UTC (permalink / raw) To: torvalds; +Cc: nico, jonsmirl, gitster, gcc, git From: Linus Torvalds <torvalds@linux-foundation.org> Date: Wed, 12 Dec 2007 08:37:10 -0800 (PST) > I'm not saying that particular case happens in git, I'm just saying that > it's not unheard of. And with the delta cache and the object lookup, it's > not at _all_ impossible that we hit the "allocate in one thread, free in > another" case! One thing that supports these theories is that, while running these large repacks, I notice that the RSS is roughly 2/3 of the amount of virtual address space allocated. I personally don't think it's unreasonable for GIT to have its own customized allocator at least for certain object types. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 16:54 ` David Miller @ 2007-12-12 17:12 ` Linus Torvalds 0 siblings, 0 replies; 45+ messages in thread From: Linus Torvalds @ 2007-12-12 17:12 UTC (permalink / raw) To: David Miller; +Cc: nico, jonsmirl, gitster, gcc, git On Wed, 12 Dec 2007, David Miller wrote: > > I personally don't think it's unreasonable for GIT to have its > own customized allocator at least for certain object types. Well, we actually already *do* have a customized allocator, but currently only for the actual core "object descriptor" that really just has the SHA1 and object flags in it (and a few extra words depending on object type). Those are critical for certain loads, and small too (so using the standard allocator wasted a _lot_ of memory). In addition, they're fixed-size and never free'd, so a specialized allocator really can do a lot better than any general-purpose memory allocator ever could. But the actual object *contents* are currently all allocated with whatever the standard libc malloc/free allocator is that you compile for (or load dynamically). Having a specialized allocator for them is a much more involved issue, exactly because we do have interesting allocation patterns etc. That said, at least those object allocations are all single-threaded (for right now, at least), so even when git does multi-threaded stuff, the core sha1_file.c stuff is always run under a single lock, and a simpler allocator that doesn't care about threads is likely to be much better than one that tries to have thread-local heaps etc. I suspect that is what the google allocator does. It probably doesn't have per-thread heaps, it just uses locking (and quite possibly things like per-*size* heaps, which is much more memory-efficient and helps avoid some of the fragmentation problems). 
Locking is much slower than per-thread accesses, but it doesn't have the issues with per-thread-fragmentation and all the problems with one thread allocating and another one freeing. Linus ^ permalink raw reply [flat|nested] 45+ messages in thread
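The fixed-size, never-freed allocator Linus describes reduces to a few lines. A hedged sketch — the slab size, struct layout, and names are illustrative, not git's actual `alloc.c`:

```c
#include <stdlib.h>

/* Carve fixed-size object descriptors out of large slabs.  Objects
 * are never freed individually, so there is no free list, no
 * per-object header, and no fragmentation at all. */
#define SLAB_OBJECTS 1024

struct obj {
    unsigned char sha1[20];
    unsigned flags;
};

static struct obj *slab;
static int slab_used = SLAB_OBJECTS;   /* force a new slab on first call */

struct obj *alloc_obj(void)
{
    if (slab_used == SLAB_OBJECTS) {
        /* One big allocation amortizes malloc overhead; the slab
         * pointer is intentionally abandoned when full. */
        slab = calloc(SLAB_OBJECTS, sizeof(struct obj));
        slab_used = 0;
    }
    return &slab[slab_used++];
}
```

Because nothing is ever returned, consecutive allocations are simply adjacent array slots — the design only works for objects with the lifetime-of-the-process property Linus mentions.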
* Re: Something is broken in repack 2007-12-12 16:42 ` Linus Torvalds 2007-12-12 16:54 ` David Miller @ 2007-12-12 17:29 ` Jon Smirl 2007-12-14 16:19 ` Wolfram Gloger 2 siblings, 0 replies; 45+ messages in thread From: Jon Smirl @ 2007-12-12 17:29 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nicolas Pitre, Junio C Hamano, gcc, Git Mailing List On 12/12/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Wed, 12 Dec 2007, Nicolas Pitre wrote: > > > > So... my conclusion is that the glibc allocator has fragmentation issues > > with this work load, given the notable difference with the Google > > allocator, which itself might not be completely immune to fragmentation > > issues of its own. > > Yes. > > Note that delta following involves patterns something like > > allocate (small) space for delta > for i in (1..depth) { > allocate large space for base > allocate large space for result > .. apply delta .. > free large space for base > free small space for delta > } Is it hard to hack up something that statically allocates a big block of memory per thread for these two and then just reuses it? allocate (small) space for delta allocate large space for base The alternating between long term and short term allocations definitely aggravates fragmentation. > > so if you have some stupid heap algorithm that doesn't try to merge and > re-use free'd spaces very aggressively (because that takes CPU time!), you > might have memory usage be horribly inflated by the heap having all those > holes for all the objects that got free'd in the chain that don't get > aggressively re-used. > > Threaded memory allocators then make this worse by probably using totally > different heaps for different threads (in order to avoid locking), so they > will *all* have the fragmentation issue. 
> > And if you *really* want to cause trouble for a memory allocator, what you > should try to do is to allocate the memory in one thread, and free it in > another, and then things can really explode (the freeing thread notices > that the allocation is not in its thread-local heap, so instead of really > freeing it, it puts it on a separate list of areas to be freed later by > the original thread when it needs memory - or worse, it adds it to the > local thread list, and makes it effectively totally impossible to then > ever merge different free'd allocations ever again because the freed > things will be on different heap lists!). > > I'm not saying that particular case happens in git, I'm just saying that > it's not unheard of. And with the delta cache and the object lookup, it's > not at _all_ impossible that we hit the "allocate in one thread, free in > another" case! > > Linus > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 45+ messages in thread
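Jon's idea of holding one big block per thread and reusing it could look something like the following. `scratch_get` is a hypothetical helper, not git code; the grow-only policy is one possible choice:

```c
#include <stdlib.h>

/* One reusable buffer per thread: grow it when a larger base or
 * result is needed, never shrink, never hand it back to malloc
 * between objects.  This removes the alternating long-term /
 * short-term allocations that aggravate fragmentation. */
struct scratch {
    char *buf;
    size_t cap;
};

char *scratch_get(struct scratch *s, size_t need)
{
    if (need > s->cap) {
        free(s->buf);
        s->cap = need * 2;        /* geometric growth limits realloc churn */
        s->buf = malloc(s->cap);
    }
    return s->buf;
}
```

With one `struct scratch` per worker thread there is no cross-thread free at all, which also sidesteps the thread-local-heap pathology Linus describes.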
* Re: Something is broken in repack 2007-12-12 16:42 ` Linus Torvalds 2007-12-12 16:54 ` David Miller 2007-12-12 17:29 ` Jon Smirl @ 2007-12-14 16:19 ` Wolfram Gloger 2007-12-14 16:59 ` David Kastrup 2 siblings, 1 reply; 45+ messages in thread From: Wolfram Gloger @ 2007-12-14 16:19 UTC (permalink / raw) To: torvalds; +Cc: nico, jonsmirl, gitster, gcc, git Hi, > Note that delta following involves patterns something like > > allocate (small) space for delta > for i in (1..depth) { > allocate large space for base > allocate large space for result > .. apply delta .. > free large space for base > free small space for delta > } > > so if you have some stupid heap algorithm that doesn't try to merge and > re-use free'd spaces very aggressively (because that takes CPU time!), ptmalloc2 (in glibc) _per arena_ is basically best-fit. This is the best known general strategy, but it certainly cannot be the best in every case. > you > might have memory usage be horribly inflated by the heap having all those > holes for all the objects that got free'd in the chain that don't get > aggressively re-used. It depends how large 'large' is -- if it exceeds the mmap() threshold (settable with mallopt(M_MMAP_THRESHOLD, ...)) the 'large' spaces will be allocated with mmap() and won't cause any internal fragmentation. It might pay to experiment with this parameter if it is hard to avoid the alloc/free large space sequence. > Threaded memory allocators then make this worse by probably using totally > different heaps for different threads (in order to avoid locking), so they > will *all* have the fragmentation issue. Indeed. Could someone perhaps try ptmalloc3 (http://malloc.de/malloc/ptmalloc3-current.tar.gz) on this case? Thanks, Wolfram. ^ permalink raw reply [flat|nested] 45+ messages in thread
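Wolfram's `mallopt` suggestion is a one-liner; a sketch assuming glibc, with an illustrative threshold value (128 KiB happens to be glibc's historical default, so a real experiment would lower it):

```c
#include <malloc.h>   /* glibc-specific; mallopt() is non-standard */

/* Route allocations at or above the threshold through mmap() so
 * that free() returns them straight to the kernel instead of
 * leaving holes in the heap.  Returns 1 on success in glibc. */
int tune_for_large_buffers(void)
{
    return mallopt(M_MMAP_THRESHOLD, 128 * 1024);
}
```

The trade-off is that mmap/munmap pairs are much slower than heap recycling, so this only pays off when the large buffers are genuinely short-lived, as in the delta-apply loop above.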
* Re: Something is broken in repack 2007-12-14 16:19 ` Wolfram Gloger @ 2007-12-14 16:59 ` David Kastrup 2007-12-14 18:49 ` Wolfram Gloger 0 siblings, 1 reply; 45+ messages in thread From: David Kastrup @ 2007-12-14 16:59 UTC (permalink / raw) To: Wolfram Gloger; +Cc: torvalds, nico, jonsmirl, gitster, gcc, git Wolfram Gloger <wmglo@dent.med.uni-muenchen.de> writes: > Hi, > >> Note that delta following involves patterns something like >> >> allocate (small) space for delta >> for i in (1..depth) { >> allocate large space for base >> allocate large space for result >> .. apply delta .. >> free large space for base >> free small space for delta >> } >> >> so if you have some stupid heap algorithm that doesn't try to merge and >> re-use free'd spaces very aggressively (because that takes CPU time!), > > ptmalloc2 (in glibc) _per arena_ is basically best-fit. This is the > best known general strategy, Uh what? Someone crank out his copy of "The Art of Computer Programming", I think volume 1. Best fit is known (analyzed and proven and documented decades ago) to be one of the worst strategies for memory allocation. Exactly because it leads to huge fragmentation problems. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-14 16:59 ` David Kastrup @ 2007-12-14 18:49 ` Wolfram Gloger 0 siblings, 0 replies; 45+ messages in thread From: Wolfram Gloger @ 2007-12-14 18:49 UTC (permalink / raw) To: dak; +Cc: wmglo, torvalds, nico, jonsmirl, gitster, gcc, git Hi, > Uh what? Someone crank out his copy of "The Art of Computer > Programming", I think volume 1. Best fit is known (analyzed and proven > and documented decades ago) to be one of the worst strategies for memory > allocation. Exactly because it leads to huge fragmentation problems. Well, quoting http://gee.cs.oswego.edu/dl/html/malloc.html: "As shown by Wilson et al, best-fit schemes (of various kinds and approximations) tend to produce the least fragmentation on real loads compared to other general approaches such as first-fit." See [Wilson 1995] ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for more details and references. Regards, Wolfram. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 16:14 ` Nicolas Pitre 2007-12-12 16:37 ` Paolo Bonzini 2007-12-12 16:42 ` Linus Torvalds @ 2007-12-13 14:09 ` Nguyen Thai Ngoc Duy [not found] ` <fjrj9k$n6k$1@ger.gmane.org> 2 siblings, 1 reply; 45+ messages in thread From: Nguyen Thai Ngoc Duy @ 2007-12-13 14:09 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Jon Smirl, Junio C Hamano, gcc, Git Mailing List On Dec 12, 2007 10:48 PM, Nicolas Pitre <nico@cam.org> wrote: > In the mean time you might have to use only one thread and lots of > memory to repack the gcc repo, or find the perfect memory allocator to > be used with Git. After all, packing the whole gcc history to around > 230MB is quite a stunt but it requires sufficient resources to > achieve it. Fortunately, like Linus said, such a wholesale repack is not > something that most users have to do anyway. Is there an alternative to "git repack -a -d" that repacks everything but the first pack? -- Duy ^ permalink raw reply [flat|nested] 45+ messages in thread
[parent not found: <fjrj9k$n6k$1@ger.gmane.org>]
* Re: Something is broken in repack [not found] ` <fjrj9k$n6k$1@ger.gmane.org> @ 2007-12-13 16:39 ` Paolo Bonzini 2007-12-13 17:40 ` Johannes Sixt 1 sibling, 0 replies; 45+ messages in thread From: Paolo Bonzini @ 2007-12-13 16:39 UTC (permalink / raw) Cc: git, gcc >> Is there an alternative to "git repack -a -d" that repacks everything >> but the first pack? > > That would be a pretty good idea for big repositories. If I were to > implement it, I would actually add a .git/config option like > pack.permanent so that more than one pack could be made permanent; then > to repack really really everything you'd need "git repack -a -a -d". Actually there is something like this, as seen from the source of git-repack: for e in `cd "$PACKDIR" && find . -type f -name '*.pack' \ | sed -e 's/^\.\///' -e 's/\.pack$//'` do if [ -e "$PACKDIR/$e.keep" ]; then : keep else args="$args --unpacked=$e.pack" existing="$existing $e" fi done So, just create a file named as the pack, but with extension ".keep". Paolo ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack [not found] ` <fjrj9k$n6k$1@ger.gmane.org> 2007-12-13 16:39 ` Paolo Bonzini @ 2007-12-13 17:40 ` Johannes Sixt 2007-12-14 2:31 ` Jakub Narebski 1 sibling, 1 reply; 45+ messages in thread From: Johannes Sixt @ 2007-12-13 17:40 UTC (permalink / raw) To: Paolo Bonzini; +Cc: git, gcc Paolo Bonzini schrieb: > Nguyen Thai Ngoc Duy wrote: >> On Dec 12, 2007 10:48 PM, Nicolas Pitre <nico@cam.org> wrote: >>> In the mean time you might have to use only one thread and lots of >>> memory to repack the gcc repo, or find the perfect memory allocator to >>> be used with Git. After all, packing the whole gcc history to around >>> 230MB is quite a stunt but it requires sufficient resources to >>> achieve it. Fortunately, like Linus said, such a wholesale repack is not >>> something that most users have to do anyway. >> >> Is there an alternative to "git repack -a -d" that repacks everything >> but the first pack? > > That would be a pretty good idea for big repositories. If I were to > implement it, I would actually add a .git/config option like > pack.permanent so that more than one pack could be made permanent; then > to repack really really everything you'd need "git repack -a -a -d". It's already there: If you have a pack .git/objects/pack/pack-foo.pack, then "touch .git/objects/pack/pack-foo.keep" marks the pack as precious. -- Hannes ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-13 17:40 ` Johannes Sixt @ 2007-12-14 2:31 ` Jakub Narebski [not found] ` <fjt6vm$n7d$1@ger.gmane.org> 0 siblings, 1 reply; 45+ messages in thread From: Jakub Narebski @ 2007-12-14 2:31 UTC (permalink / raw) To: gcc; +Cc: git Johannes Sixt wrote: > Paolo Bonzini schrieb: >> Nguyen Thai Ngoc Duy wrote: >>> >>> Is there an alternative to "git repack -a -d" that repacks everything >>> but the first pack? >> >> That would be a pretty good idea for big repositories. If I were to >> implement it, I would actually add a .git/config option like >> pack.permanent so that more than one pack could be made permanent; then >> to repack really really everything you'd need "git repack -a -a -d". > > It's already there: If you have a pack .git/objects/pack/pack-foo.pack, then > "touch .git/objects/pack/pack-foo.keep" marks the pack as precious. Actually you can (and probably should) put the one line with the _reason_ pack is to be kept in the *.keep file. Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of all things. -- Jakub Narebski Warsaw, Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 45+ messages in thread
[parent not found: <fjt6vm$n7d$1@ger.gmane.org>]
* Re: Something is broken in repack [not found] ` <fjt6vm$n7d$1@ger.gmane.org> @ 2007-12-14 6:39 ` Nguyen Thai Ngoc Duy 2007-12-14 9:02 ` Paolo Bonzini 2007-12-14 10:52 ` Jakub Narebski 2007-12-14 13:53 ` Nicolas Pitre 1 sibling, 2 replies; 45+ messages in thread From: Nguyen Thai Ngoc Duy @ 2007-12-14 6:39 UTC (permalink / raw) To: Paolo Bonzini; +Cc: git, gcc On Dec 14, 2007 1:14 PM, Paolo Bonzini <bonzini@gnu.org> wrote: > > Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of > > all things. > > I found that the .keep file is not transmitted over the network (at > least I tried with git+ssh:// and http:// protocols), however. I'm thinking about "git clone --keep" to mark initial packs precious. But 'git clone' is under rewrite to C. Let's wait until C rewrite is done. -- Duy ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-14 6:39 ` Nguyen Thai Ngoc Duy @ 2007-12-14 9:02 ` Paolo Bonzini 2007-12-14 9:39 ` Harvey Harrison 2007-12-14 10:52 ` Jakub Narebski 1 sibling, 1 reply; 45+ messages in thread From: Paolo Bonzini @ 2007-12-14 9:02 UTC (permalink / raw) To: gcc; +Cc: git > I'm thinking about "git clone --keep" to mark initial packs precious. > But 'git clone' is under rewrite to C. Let's wait until C rewrite is > done. It should be the default, IMHO. Paolo ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-14 9:02 ` Paolo Bonzini @ 2007-12-14 9:39 ` Harvey Harrison 0 siblings, 0 replies; 45+ messages in thread From: Harvey Harrison @ 2007-12-14 9:39 UTC (permalink / raw) To: Paolo Bonzini; +Cc: gcc, git On Fri, 2007-12-14 at 09:20 +0100, Paolo Bonzini wrote: > > I'm thinking about "git clone --keep" to mark initial packs precious. > > But 'git clone' is under rewrite to C. Let's wait until C rewrite is > > done. > > It should be the default, IMHO. > While it doesn't mark the packs as .keep, git will reuse all of the old deltas you got in the original clone, so you're not losing anything. Harvey ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-14 6:39 ` Nguyen Thai Ngoc Duy 2007-12-14 9:02 ` Paolo Bonzini @ 2007-12-14 10:52 ` Jakub Narebski 2007-12-14 13:25 ` Nguyen Thai Ngoc Duy 1 sibling, 1 reply; 45+ messages in thread From: Jakub Narebski @ 2007-12-14 10:52 UTC (permalink / raw) To: Nguyen Thai Ngoc Duy; +Cc: Paolo Bonzini, git, gcc "Nguyen Thai Ngoc Duy" <pclouds@gmail.com> writes: > On Dec 14, 2007 1:14 PM, Paolo Bonzini <bonzini@gnu.org> wrote: > > > Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of > > > all things. > > > > I found that the .keep file is not transmitted over the network (at > > least I tried with git+ssh:// and http:// protocols), however. > > I'm thinking about "git clone --keep" to mark initial packs precious. > But 'git clone' is under rewrite to C. Let's wait until C rewrite is > done. But if you clone via network, pack might be network optimized if you use "smart" transport, not disk optimized, at least with current git which regenerates pack also on clone AFAIK. -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-14 10:52 ` Jakub Narebski @ 2007-12-14 13:25 ` Nguyen Thai Ngoc Duy 0 siblings, 0 replies; 45+ messages in thread From: Nguyen Thai Ngoc Duy @ 2007-12-14 13:25 UTC (permalink / raw) To: Jakub Narebski, Harvey Harrison; +Cc: Paolo Bonzini, git, gcc On Dec 14, 2007 4:01 PM, Harvey Harrison <harvey.harrison@gmail.com> wrote: > While it doesn't mark the packs as .keep, git will reuse all of the old > deltas you got in the original clone, so you're not losing anything. There is another reason I want it. I have an ~800MB pack and I don't want git to rewrite the pack every time I repack my changes. So it's kind of disk-wise (don't require 800MB on disk to prepare new pack, and don't write too much). On Dec 14, 2007 5:40 PM, Jakub Narebski <jnareb@gmail.com> wrote: > But if you clone via network, pack might be network optimized if you > use "smart" transport, not disk optimized, at least with current git > which regenerates pack also on clone AFAIK. Um.. that's ok it just regenerate once. -- Duy ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack [not found] ` <fjt6vm$n7d$1@ger.gmane.org> 2007-12-14 6:39 ` Nguyen Thai Ngoc Duy @ 2007-12-14 13:53 ` Nicolas Pitre 1 sibling, 0 replies; 45+ messages in thread From: Nicolas Pitre @ 2007-12-14 13:53 UTC (permalink / raw) To: Paolo Bonzini; +Cc: git, gcc On Fri, 14 Dec 2007, Paolo Bonzini wrote: > > Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of > > all things. > > I found that the .keep file is not transmitted over the network (at least I > tried with git+ssh:// and http:// protocols), however. That is a local policy. Nicolas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 7:25 ` Nicolas Pitre 2007-12-12 12:02 ` David Kastrup 2007-12-12 16:14 ` Nicolas Pitre @ 2007-12-12 16:19 ` Nicolas Pitre 2007-12-13 7:39 ` Andreas Ericsson 2 siblings, 1 reply; 45+ messages in thread From: Nicolas Pitre @ 2007-12-12 16:19 UTC (permalink / raw) To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Wed, 12 Dec 2007, Nicolas Pitre wrote: > I did modify the progress display to show accounted memory that was > allocated vs memory that was freed but still not released to the system. > At least that gives you an idea of memory allocation and fragmentation > with glibc in real time: > > diff --git a/progress.c b/progress.c > index d19f80c..46ac9ef 100644 > --- a/progress.c > +++ b/progress.c > @@ -8,6 +8,7 @@ > * published by the Free Software Foundation. > */ > > +#include <malloc.h> > #include "git-compat-util.h" > #include "progress.h" > > @@ -94,10 +95,12 @@ static int display(struct progress *progress, unsigned n, const char *done) > if (progress->total) { > unsigned percent = n * 100 / progress->total; > if (percent != progress->last_percent || progress_update) { > + struct mallinfo m = mallinfo(); > progress->last_percent = percent; > - fprintf(stderr, "%s: %3u%% (%u/%u)%s%s", > - progress->title, percent, n, > - progress->total, tp, eol); > + fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s", > + progress->title, percent, n, progress->total, > + m.uordblks >> 18, m.fordblks >> 18, > + tp, eol); Note: I didn't know what unit of memory those blocks represents, so the shift is most probably wrong. Nicolas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-12 16:19 ` Nicolas Pitre @ 2007-12-13 7:39 ` Andreas Ericsson 2007-12-14 16:12 ` Wolfram Gloger 0 siblings, 1 reply; 45+ messages in thread From: Andreas Ericsson @ 2007-12-13 7:39 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Jon Smirl, Junio C Hamano, gcc, Git Mailing List Nicolas Pitre wrote: > On Wed, 12 Dec 2007, Nicolas Pitre wrote: > >> I did modify the progress display to show accounted memory that was >> allocated vs memory that was freed but still not released to the system. >> At least that gives you an idea of memory allocation and fragmentation >> with glibc in real time: >> >> diff --git a/progress.c b/progress.c >> index d19f80c..46ac9ef 100644 >> --- a/progress.c >> +++ b/progress.c >> @@ -8,6 +8,7 @@ >> * published by the Free Software Foundation. >> */ >> >> +#include <malloc.h> >> #include "git-compat-util.h" >> #include "progress.h" >> >> @@ -94,10 +95,12 @@ static int display(struct progress *progress, unsigned n, const char *done) >> if (progress->total) { >> unsigned percent = n * 100 / progress->total; >> if (percent != progress->last_percent || progress_update) { >> + struct mallinfo m = mallinfo(); >> progress->last_percent = percent; >> - fprintf(stderr, "%s: %3u%% (%u/%u)%s%s", >> - progress->title, percent, n, >> - progress->total, tp, eol); >> + fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s", >> + progress->title, percent, n, progress->total, >> + m.uordblks >> 18, m.fordblks >> 18, >> + tp, eol); > > Note: I didn't know what unit of memory those blocks represents, so the > shift is most probably wrong. > Me neither, but it appears to me as if hblkhd holds the actual memory consumed by the process. It seems to store the information in bytes, which I find a bit dubious unless glibc has some internal multiplier. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-13 7:39 ` Andreas Ericsson @ 2007-12-14 16:12 ` Wolfram Gloger 0 siblings, 0 replies; 45+ messages in thread From: Wolfram Gloger @ 2007-12-14 16:12 UTC (permalink / raw) To: ae; +Cc: nico, jonsmirl, gitster, gcc, git Hi, > >> if (progress->total) { > >> unsigned percent = n * 100 / progress->total; > >> if (percent != progress->last_percent || progress_update) { > >> + struct mallinfo m = mallinfo(); > >> progress->last_percent = percent; > >> - fprintf(stderr, "%s: %3u%% (%u/%u)%s%s", > >> - progress->title, percent, n, > >> - progress->total, tp, eol); > >> + fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s", > >> + progress->title, percent, n, progress->total, > >> + m.uordblks >> 18, m.fordblks >> 18, > >> + tp, eol); > > > > Note: I didn't know what unit of memory those blocks represents, so the > > shift is most probably wrong. > > > > Me neither, but it appears to me as if hblkhd holds the actual memory > consumed by the process. It seems to store the information in bytes, > which I find a bit dubious unless glibc has some internal multiplier. mallinfo() will only give you the used memory for the main arena. When you have separate arenas (likely when concurrent threads have been used), the only way to get the full picture is to call malloc_stats(), which prints to stderr. Regards, Wolfram. ^ permalink raw reply [flat|nested] 45+ messages in thread
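The distinction Wolfram draws can be seen in a short sketch. `mallinfo()` reports the main arena only, in bytes (newer glibc deprecates it in favor of `mallinfo2()`), while `malloc_stats()` prints every arena — including those created for other threads — to stderr:

```c
#include <malloc.h>   /* glibc-specific interfaces */
#include <stdio.h>
#include <stdlib.h>

/* Print both views of the heap: the main-arena byte counts from
 * mallinfo(), then the full per-arena breakdown from malloc_stats().
 * Only the latter accounts for memory held by thread arenas. */
void report_heap(void)
{
    struct mallinfo m = mallinfo();
    fprintf(stderr, "main arena: %d bytes in use, %d bytes free\n",
            m.uordblks, m.fordblks);
    malloc_stats();   /* per-arena totals, written to stderr */
}
```

This also explains why dividing `uordblks` by a fixed shift to get "MB" is fragile: the fields are plain bytes stored in `int`, so they wrap past 2GB — likely part of the confusion in the patch above.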
* Re: Something is broken in repack 2007-12-11 7:34 ` Jon Smirl 2007-12-11 11:11 ` Andreas Ericsson 2007-12-11 15:01 ` Nicolas Pitre @ 2007-12-11 17:19 ` Linus Torvalds 2007-12-11 17:24 ` Nicolas Pitre 2007-12-11 18:57 ` Jon Smirl 2007-12-11 17:44 ` Daniel Berlin 3 siblings, 2 replies; 45+ messages in thread From: Linus Torvalds @ 2007-12-11 17:19 UTC (permalink / raw) To: Jon Smirl; +Cc: Nicolas Pitre, Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Jon Smirl wrote: > > So why does our threaded code take 20 CPU minutes longer (12%) to run > than the same code with a single thread? Threaded code *always* takes more CPU time. The only thing you can hope for is a wall-clock reduction. You're seeing probably a combination of (a) more cache misses (b) bigger dataset active at a time and a probably fairly miniscule (c) threading itself tends to have some overheads. > Q6600 is just two E6600s in the same package, the caches are not shared. Sure they are shared. They're just not *entirely* shared. But they are shared between each two cores, so each thread essentially has only half the cache they had with the non-threaded version. Threading is *not* a magic solution to all problems. It gives you potentially twice the CPU power, but there are real downsides that you should keep in mind. > Why does the threaded code need 2.24GB (google allocator, 2.85GB gcc) > with 4 threads? But only need 950MB with one thread? Where's the extra > gigabyte going? I suspect that it's really simple: you have a few rather big files in the gcc history, with deep delta chains. And what happens when you have four threads running at the same time is that they all need to keep all those objects that they are working on - and their hash state - in memory at the same time! So if you want to use more threads, that _forces_ you to have a bigger memory footprint, simply because you have more "live" objects that you work on. 
Normally, that isn't much of a problem, since most source files are small, but if you have a few deep delta chains on big files, both the delta chain itself is going to use memory (you may have limited the size of the cache, but it's still needed for the actual delta generation, so it's not like the memory usage went away). That said, I suspect there are a few things fighting you: - threading is hard. I haven't looked a lot at the changes Nico did to do a threaded object packer, but what I've seen does not convince me it is correct. The "trg_entry" accesses are *mostly* protected with "cache_lock", but nothing else really seems to be, so quite frankly, I wouldn't trust the threaded version very much. It's off by default, and for a good reason, I think. For example: the packing code does this: if (!src->data) { read_lock(); src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz); read_unlock(); ... and that's racy. If two threads come in at roughly the same time and see a NULL src->data, they'll both get the lock, and they'll both (serially) try to fill it in. It will all *work*, but one of them will have done unnecessary work, and one of them will have their result thrown away and leaked. Are you hitting issues like this? I dunno. The object sorting means that different threads normally shouldn't look at the same objects (not even the sources), so probably not, but basically, I wouldn't trust the threading 100%. It needs work, and it needs to stay off by default. - you're working on a problem that isn't really even worth optimizing that much. The *normal* case is to re-use old deltas, which makes all of the issues you are fighting basically go away (because you only have a few _incremental_ objects that need deltaing). In other words: the _real_ optimizations have already been done, and are done elsewhere, and are much smarter (the best way to optimize X is not to make X run fast, but to avoid doing X in the first place!). 
The thing you are trying to work with is the one-time-only case where you explicitly disable that big and important optimization, and then you complain about the end result being slow! It's like saying that you're compiling with extreme debugging and no optimizations, and then complaining that the end result doesn't run as fast as if you used -O2. Except this is a hundred times worse, because you literally asked git to do the really expensive thing that it really really doesn't want to do ;) > Is there another allocator to try? One that combines Google's > efficiency with gcc's speed? See above: I'd look around at threading-related bugs and check the way we lock (or don't) accesses. Linus ^ permalink raw reply [flat|nested] 45+ messages in thread
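The race Linus points out is the classic lazy-initialization pattern, and the usual fix is to re-check under the lock. A sketch only — Nicolas replies elsewhere in the thread that git's threads never actually share entries, so git does not need this, and a strictly portable version would also use an atomic load for the unlocked check:

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t read_mutex = PTHREAD_MUTEX_INITIALIZER;

struct entry { void *data; };

/* Hypothetical stand-in for read_sha1_file(): any expensive,
 * non-reentrant loader. */
static void *load_object(void)
{
    return malloc(16);
}

void *get_data(struct entry *e)
{
    if (!e->data) {                   /* cheap unlocked fast path */
        pthread_mutex_lock(&read_mutex);
        if (!e->data)                 /* re-check under the lock: only
                                         one thread fills it in, so no
                                         duplicated work or leak */
            e->data = load_object();
        pthread_mutex_unlock(&read_mutex);
    }
    return e->data;
}
```

If two threads race past the first check, the second one finds the field already set once it acquires the mutex and does nothing, which removes both the wasted work and the leaked result from Linus's scenario.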
* Re: Something is broken in repack 2007-12-11 17:19 ` Linus Torvalds @ 2007-12-11 17:24 ` Nicolas Pitre 2007-12-11 17:28 ` David Miller 2007-12-11 18:57 ` Jon Smirl 1 sibling, 1 reply; 45+ messages in thread From: Nicolas Pitre @ 2007-12-11 17:24 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jon Smirl, Junio C Hamano, gcc, Git Mailing List [-- Attachment #1: Type: TEXT/PLAIN, Size: 4148 bytes --] On Tue, 11 Dec 2007, Linus Torvalds wrote: > That said, I suspect there are a few things fighting you: > > - threading is hard. I haven't looked a lot at the changes Nico did to do > a threaded object packer, but what I've seen does not convince me it is > correct. The "trg_entry" accesses are *mostly* protected with > "cache_lock", but nothing else really seems to be, so quite frankly, I > wouldn't trust the threaded version very much. It's off by default, and > for a good reason, I think. I beg to differ (of course, since I always know precisely what I do, and like you, my code never has bugs). Seriously though, the trg_entry has not to be protected at all. Why? Simply because each thread has its own exclusive set of objects which no other threads ever mess with. They never overlap. > For example: the packing code does this: > > if (!src->data) { > read_lock(); > src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz); > read_unlock(); > ... > > and that's racy. If two threads come in at roughly the same time and > see a NULL src->data, they'll both get the lock, and they'll both > (serially) try to fill it in. It will all *work*, but one of them will > have done unnecessary work, and one of them will have their result > thrown away and leaked. No. Once again, it is impossible for two threads to ever see the same src->data at all. The lock is there simply because read_sha1_file() is not reentrant. > Are you hitting issues like this? I dunno. 
The object sorting means > that different threads normally shouldn't look at the same objects (not > even the sources), so probably not, but basically, I wouldn't trust the > threading 100%. It needs work, and it needs to stay off by default. For now it is, but I wouldn't say it really needs significant work at this point. The latest thread patches were more about tuning than correctness. What the threading could be doing, though, is uncovering some other bugs, like in the pack mmap windowing code for example. Although that code is serialized by the read lock above, the fact that multiple threads are hammering on it in turns means that the mmap window is possibly seeking back and forth much more often than otherwise, possibly leaking something in the process. > - you're working on a problem that isn't really even worth optimizing > that much. The *normal* case is to re-use old deltas, which makes all > of the issues you are fighting basically go away (because you only have > a few _incremental_ objects that need deltaing). > > In other words: the _real_ optimizations have already been done, and > are done elsewhere, and are much smarter (the best way to optimize X is > not to make X run fast, but to avoid doing X in the first place!). The > thing you are trying to work with is the one-time-only case where you > explicitly disable that big and important optimization, and then you > complain about the end result being slow! > > It's like saying that you're compiling with extreme debugging and no > optimizations, and then complaining that the end result doesn't run as > fast as if you used -O2. Except this is a hundred times worse, because > you literally asked git to do the really expensive thing that it really > really doesn't want to do ;) Linus, please pay attention to the _actual_ important issue here. Sure I've been tuning the threading code in parallel to the attempt to debug this memory usage issue. BUT. 
The point is that repacking the gcc repo using "git repack -a -f --window=250" has a radically different memory usage profile depending on whether you do the repack on the earlier 2.1GB pack or the later 300MB pack. _That_ is the issue. Ironically, it is the 300MB pack that causes the repack to blow memory usage out of proportion. And in both cases, the threading code has to do the same work whether or not the original pack was densely packed, since -f throws away every existing delta anyway. So something is fishy somewhere other than in the packing code. Nicolas ^ permalink raw reply [flat|nested] 45+ messages in thread
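Nicolas's scheme — disjoint per-thread object sets, with a lock that exists only because read_sha1_file() is not reentrant — can be sketched as follows. This is a toy model in Python rather than git's actual C packer, and all names in it are made up for illustration:

```python
import threading

# Toy stand-in for the packer's object list: each worker thread gets a
# disjoint slice, so per-object state (git's "trg_entry") needs no lock.
objects = [{"id": i, "data": None} for i in range(100)]

read_mutex = threading.Lock()  # serializes only the non-reentrant loader

def load_object(obj_id):
    # stand-in for read_sha1_file(), which is not reentrant
    return b"contents-%d" % obj_id

def worker(chunk):
    for obj in chunk:
        if obj["data"] is None:
            with read_mutex:  # the lock protects the call, not the entry
                obj["data"] = load_object(obj["id"])

nthreads = 4
chunks = [objects[i::nthreads] for i in range(nthreads)]  # never overlap
threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(o["data"] is not None for o in objects))  # prints 100
```

Because the chunks never overlap, no two threads can ever observe the same entry's `data` field, which is exactly why the unsynchronized NULL check is safe in this arrangement.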
* Re: Something is broken in repack 2007-12-11 17:24 ` Nicolas Pitre @ 2007-12-11 17:28 ` David Miller 2007-12-11 18:43 ` Nicolas Pitre 0 siblings, 1 reply; 45+ messages in thread From: David Miller @ 2007-12-11 17:28 UTC (permalink / raw) To: nico; +Cc: torvalds, jonsmirl, gitster, gcc, git From: Nicolas Pitre <nico@cam.org> Date: Tue, 11 Dec 2007 12:21:11 -0500 (EST) > BUT. The point is that repacking the gcc repo using "git repack -a -f > --window=250" has a radically different memory usage profile whether you > do the repack on the earlier 2.1GB pack or the later 300MB pack. If you repack on the smaller pack file, git has to expand more stuff internally in order to search the deltas, whereas with the larger pack file I bet git has to undelta'ify less often to get base object blobs for delta search. In fact that behavior makes perfect sense to me, even though I don't understand GIT internals very well :-) ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 17:28 ` David Miller @ 2007-12-11 18:43 ` Nicolas Pitre 2007-12-11 20:34 ` Andreas Ericsson 0 siblings, 1 reply; 45+ messages in thread From: Nicolas Pitre @ 2007-12-11 18:43 UTC (permalink / raw) To: David Miller; +Cc: Linus Torvalds, jonsmirl, Junio C Hamano, gcc, git On Tue, 11 Dec 2007, David Miller wrote: > From: Nicolas Pitre <nico@cam.org> > Date: Tue, 11 Dec 2007 12:21:11 -0500 (EST) > > > BUT. The point is that repacking the gcc repo using "git repack -a -f > > --window=250" has a radically different memory usage profile whether you > > do the repack on the earlier 2.1GB pack or the later 300MB pack. > > If you repack on the smaller pack file, git has to expand more stuff > internally in order to search the deltas, whereas with the larger pack > file I bet git has to less often undelta'ify to get base objects blobs > for delta search. Of course. I came to that conclusion two days ago. And despite being pretty familiar with the involved code (I wrote part of it myself) I just can't spot anything wrong with it so far. But somehow the threading code keeps distracting people from that issue, since it gets to do the same work whether or not the source pack is densely packed. Nicolas (who wishes he had access to a much faster machine to investigate this issue) ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 18:43 ` Nicolas Pitre @ 2007-12-11 20:34 ` Andreas Ericsson 0 siblings, 0 replies; 45+ messages in thread From: Andreas Ericsson @ 2007-12-11 20:34 UTC (permalink / raw) To: Nicolas Pitre Cc: David Miller, Linus Torvalds, jonsmirl, Junio C Hamano, gcc, git Nicolas Pitre wrote: > On Tue, 11 Dec 2007, David Miller wrote: > >> From: Nicolas Pitre <nico@cam.org> >> Date: Tue, 11 Dec 2007 12:21:11 -0500 (EST) >> >>> BUT. The point is that repacking the gcc repo using "git repack -a -f >>> --window=250" has a radically different memory usage profile whether you >>> do the repack on the earlier 2.1GB pack or the later 300MB pack. >> If you repack on the smaller pack file, git has to expand more stuff >> internally in order to search the deltas, whereas with the larger pack >> file I bet git has to less often undelta'ify to get base objects blobs >> for delta search. > > Of course. I came to that conclusion two days ago. And despite being > pretty familiar with the involved code (I wrote part of it myself) I > just can't spot anything wrong with it so far. > > But somehow the threading code keep distracting people from that issue > since it gets to do the same work whether or not the source pack is > densely packed or not. > > Nicolas > (who wish he had access to a much faster machine to investigate this issue) If it's still an issue next week, we'll have a 16-core (8 dual-core CPUs) machine with some 32GB of RAM coming in that'll be free for about two days. You'll have to remind me about it though, as I've got a lot on my mind these days. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 17:19 ` Linus Torvalds 2007-12-11 17:24 ` Nicolas Pitre @ 2007-12-11 18:57 ` Jon Smirl 2007-12-11 19:17 ` Nicolas Pitre 2007-12-11 19:40 ` Linus Torvalds 1 sibling, 2 replies; 45+ messages in thread From: Jon Smirl @ 2007-12-11 18:57 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nicolas Pitre, Junio C Hamano, gcc, Git Mailing List On 12/11/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Tue, 11 Dec 2007, Jon Smirl wrote: > > > > So why does our threaded code take 20 CPU minutes longer (12%) to run > > than the same code with a single thread? > > Threaded code *always* takes more CPU time. The only thing you can hope > for is a wall-clock reduction. You're seeing probably a combination of > (a) more cache misses > (b) bigger dataset active at a time > and a probably fairly miniscule > (c) threading itself tends to have some overheads. > > > Q6600 is just two E6600s in the same package, the caches are not shared. > > Sure they are shared. They're just not *entirely* shared. But they are > shared between each two cores, so each thread essentially has only half > the cache they had with the non-threaded version. > > Threading is *not* a magic solution to all problems. It gives you > potentially twice the CPU power, but there are real downsides that you > should keep in mind. > > > Why does the threaded code need 2.24GB (google allocator, 2.85GB gcc) > > with 4 threads? But only need 950MB with one thread? Where's the extra > > gigabyte going? > > I suspect that it's really simple: you have a few rather big files in the > gcc history, with deep delta chains. And what happens when you have four > threads running at the same time is that they all need to keep all those > objects that they are working on - and their hash state - in memory at the > same time! 
> > So if you want to use more threads, that _forces_ you to have a bigger > memory footprint, simply because you have more "live" objects that you > work on. Normally, that isn't much of a problem, since most source files > are small, but if you have a few deep delta chains on big files, both the > delta chain itself is going to use memory (you may have limited the size > of the cache, but it's still needed for the actual delta generation, so > it's not like the memory usage went away). This makes sense. Those runs that blew up to 4.5GB were a combination of this effect and fragmentation in the gcc allocator. Google allocator appears to be much better at controlling fragmentation. Is there a reasonable scheme to force the chains to only be loaded once and then shared between worker threads? The memory blow up appears to be directly correlated with chain length. > > That said, I suspect there are a few things fighting you: > > - threading is hard. I haven't looked a lot at the changes Nico did to do > a threaded object packer, but what I've seen does not convince me it is > correct. The "trg_entry" accesses are *mostly* protected with > "cache_lock", but nothing else really seems to be, so quite frankly, I > wouldn't trust the threaded version very much. It's off by default, and > for a good reason, I think. > > For example: the packing code does this: > > if (!src->data) { > read_lock(); > src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz); > read_unlock(); > ... > > and that's racy. If two threads come in at roughly the same time and > see a NULL src->data, they'll both get the lock, and they'll both > (serially) try to fill it in. It will all *work*, but one of them will > have done unnecessary work, and one of them will have their result > thrown away and leaked. That may account for the threaded version needing an extra 20 minutes CPU time. An extra 12% of CPU seems like too much overhead for threading. 
Just letting a couple of those long chain compressions be done twice > > Are you hitting issues like this? I dunno. The object sorting means > that different threads normally shouldn't look at the same objects (not > even the sources), so probably not, but basically, I wouldn't trust the > threading 100%. It needs work, and it needs to stay off by default. > > - you're working on a problem that isn't really even worth optimizing > that much. The *normal* case is to re-use old deltas, which makes all > of the issues you are fighting basically go away (because you only have > a few _incremental_ objects that need deltaing). I agree, this problem only occurs when people import giant repositories. But every time someone hits these problems they declare git to be screwed up and proceed to thrash it in their blogs. > In other words: the _real_ optimizations have already been done, and > are done elsewhere, and are much smarter (the best way to optimize X is > not to make X run fast, but to avoid doing X in the first place!). The > thing you are trying to work with is the one-time-only case where you > explicitly disable that big and important optimization, and then you > complain about the end result being slow! > > It's like saying that you're compiling with extreme debugging and no > optimizations, and then complaining that the end result doesn't run as > fast as if you used -O2. Except this is a hundred times worse, because > you literally asked git to do the really expensive thing that it really > really doesn't want to do ;) > > > Is there another allocator to try? One that combines Google's > > efficiency with gcc's speed? > > See above: I'd look around at threading-related bugs and check the way we > lock (or don't) accesses. > > Linus > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 45+ messages in thread
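In code that really did share entries between threads, the standard fix for the benign race Linus describes is to re-check the pointer after taking the lock ("double-checked locking"), so exactly one thread does the expensive load. A minimal sketch in Python with hypothetical names — git's packer sidesteps the issue entirely by giving each thread its own objects, as Nicolas points out:

```python
import threading

lock = threading.Lock()
cache = {}
loads = []  # records every call to the expensive loader

def expensive_load(key):
    loads.append(key)
    return key * 2

def get(key):
    if key not in cache:          # unsynchronized fast path
        with lock:
            if key not in cache:  # re-check under the lock: one loader wins
                cache[key] = expensive_load(key)
    return cache[key]

threads = [threading.Thread(target=get, args=(7,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(loads))  # prints 1: no duplicated work, nothing thrown away
```

The second check is the whole trick: a thread that lost the race to the lock sees the filled-in entry and skips the load, so no result is ever computed twice or leaked.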
* Re: Something is broken in repack 2007-12-11 18:57 ` Jon Smirl @ 2007-12-11 19:17 ` Nicolas Pitre 2007-12-11 19:40 ` Linus Torvalds 1 sibling, 0 replies; 45+ messages in thread From: Nicolas Pitre @ 2007-12-11 19:17 UTC (permalink / raw) To: Jon Smirl; +Cc: Linus Torvalds, Junio C Hamano, gcc, Git Mailing List [-- Attachment #1: Type: TEXT/PLAIN, Size: 3042 bytes --] On Tue, 11 Dec 2007, Jon Smirl wrote: > This makes sense. Those runs that blew up to 4.5GB were a combination > of this effect and fragmentation in the gcc allocator. I disagree. This is insane. > Google allocator appears to be much better at controlling fragmentation. Indeed. And if fragmentation is indeed wasting half of Git's memory usage then we'll have to come up with a custom memory allocator. > Is there a reasonable scheme to force the chains to only be loaded > once and then shared between worker threads? The memory blow up > appears to be directly correlated with chain length. No. That would be the equivalent of holding each revision of all files uncompressed all at once in memory. > > That said, I suspect there are a few things fighting you: > > > > - threading is hard. I haven't looked a lot at the changes Nico did to do > > a threaded object packer, but what I've seen does not convince me it is > > correct. The "trg_entry" accesses are *mostly* protected with > > "cache_lock", but nothing else really seems to be, so quite frankly, I > > wouldn't trust the threaded version very much. It's off by default, and > > for a good reason, I think. > > > > For example: the packing code does this: > > > > if (!src->data) { > > read_lock(); > > src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz); > > read_unlock(); > > ... > > > > and that's racy. If two threads come in at roughly the same time and > > see a NULL src->data, they'll both get the lock, and they'll both > > (serially) try to fill it in. 
It will all *work*, but one of them will > > have done unnecessary work, and one of them will have their result > > thrown away and leaked. > > That may account for the threaded version needing an extra 20 minutes > CPU time. An extra 12% of CPU seems like too much overhead for > threading. Just letting a couple of those long chain compressions be > done twice No it may not. This theory is wrong as explained before. > > > > Are you hitting issues like this? I dunno. The object sorting means > > that different threads normally shouldn't look at the same objects (not > > even the sources), so probably not, but basically, I wouldn't trust the > > threading 100%. It needs work, and it needs to stay off by default. > > > > - you're working on a problem that isn't really even worth optimizing > > that much. The *normal* case is to re-use old deltas, which makes all > > of the issues you are fighting basically go away (because you only have > > a few _incremental_ objects that need deltaing). > > I agree, this problem only occurs when people import giant > repositories. But every time someone hits these problems they declare > git to be screwed up and proceed to thrash it in their blogs. It's not only for repack. Someone just reported git-blame being unusable too due to insane memory usage, which I suspect is due to the same issue. Nicolas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 18:57 ` Jon Smirl 2007-12-11 19:17 ` Nicolas Pitre @ 2007-12-11 19:40 ` Linus Torvalds 2007-12-11 20:26 ` Junio C Hamano 1 sibling, 1 reply; 45+ messages in thread From: Linus Torvalds @ 2007-12-11 19:40 UTC (permalink / raw) To: Jon Smirl; +Cc: Nicolas Pitre, Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Jon Smirl wrote: > > > > So if you want to use more threads, that _forces_ you to have a bigger > > memory footprint, simply because you have more "live" objects that you > > work on. Normally, that isn't much of a problem, since most source files > > are small, but if you have a few deep delta chains on big files, both the > > delta chain itself is going to use memory (you may have limited the size > > of the cache, but it's still needed for the actual delta generation, so > > it's not like the memory usage went away). > > This makes sense. Those runs that blew up to 4.5GB were a combination > of this effect and fragmentation in the gcc allocator. Google > allocator appears to be much better at controlling fragmentation. Yes. I think we do have some case where we simply keep a lot of objects around, and if we are talking reasonably large deltas, we'll have the whole delta-chain in memory just to unpack one single object. The delta cache size limits kick in only when we explicitly cache old delta results (in case they will be re-used, which is rather common), it doesn't affect the normal "I'm using this data right now" case at all. And then fragmentation makes it much much worse. Since the allocation patterns aren't nice (they are pretty random and depend on just the sizes of the objects), and the lifetimes aren't always nicely nested _either_ (they become more so when you disable the cache entirely, but that's just death for performance), I'm not surprised that there can be memory allocators that end up having some issues. 
> Is there a reasonable scheme to force the chains to only be loaded > once and then shared between worker threads? The memory blow up > appears to be directly correlated with chain length. The worker threads explicitly avoid touching the same objects, and no, you definitely don't want to explode the chains globally once, because the whole point is that we do fit 15 years worth of history into 300MB of pack-file thanks to having a very dense representation. The "loaded once" part is the mmap'ing of the pack-file into memory, but if you were to actually then try to expand the chains, you'd be talking about many *many* more gigabytes of memory than you already see used ;) So what you actually want to do is to just re-use already packed delta chains directly, which is what we normally do. But you are explicitly looking at the "--no-reuse-delta" (aka "git repack -f") case, which is why it then blows up. I'm sure we can find places to improve. But I would like to re-iterate the statement that you're kind of doing a "don't do that then" case which is really - by design - meant to be done once and never again, and is using resources - again, pretty much by design - wildly inappropriately just to get an initial packing done. > That may account for the threaded version needing an extra 20 minutes > CPU time. An extra 12% of CPU seems like too much overhead for > threading. Just letting a couple of those long chain compressions be > done twice Well, Nico pointed out that those things should all be thread-private data, so no, the race isn't there (unless there's some other bug there). > I agree, this problem only occurs when people import giant > repositories. But every time someone hits these problems they declare > git to be screwed up and proceed to thrash it in their blogs. Sure. I'd love to do global packing without paying the cost, but it really was a design decision. 
Thanks to doing off-line packing ("let it run overnight on some beefy machine") we can get better results. It's expensive, yes. But it was pretty much meant to be expensive. It's a very efficient compression algorithm, after all, and you're turning it up to eleven ;) I also suspect that the gcc archive makes things more interesting thanks to having some rather large files. The ChangeLog is probably the worst case (large file with *lots* of edits), but I suspect the *.po files aren't wonderful either. Linus ^ permalink raw reply [flat|nested] 45+ messages in thread
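Linus's distinction above — the delta cache limit applies only to *stored* delta results, not to the memory needed while a delta is being generated — can be illustrated with a toy byte-budgeted cache, loosely modeled on pack.deltacachesize. This is an illustrative sketch, not git's actual code:

```python
class DeltaCache:
    """Keep delta results only while the total stays under a byte budget.

    Anything over budget is simply not cached and will be recomputed
    when needed; the budget never constrains in-flight delta generation.
    """

    def __init__(self, limit):
        self.limit = limit
        self.used = 0
        self.store = {}

    def put(self, key, delta):
        if self.used + len(delta) > self.limit:
            return False  # over budget: drop it, recompute later
        self.store[key] = delta
        self.used += len(delta)
        return True

    def get(self, key):
        return self.store.get(key)

cache = DeltaCache(limit=10)
assert cache.put("a", b"12345")        # 5 bytes: fits
assert not cache.put("b", b"1234567")  # 7 more bytes would exceed 10
assert cache.get("a") == b"12345" and cache.get("b") is None
```

This is why setting pack.deltacachesize very low (as in the threads=1 experiment earlier in the thread) trades memory for CPU: uncached results are recomputed on demand, but peak memory during delta generation itself is unchanged.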
* Re: Something is broken in repack 2007-12-11 19:40 ` Linus Torvalds @ 2007-12-11 20:26 ` Junio C Hamano 2007-12-12 5:06 ` Andreas Ericsson 0 siblings, 1 reply; 45+ messages in thread From: Junio C Hamano @ 2007-12-11 20:26 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jon Smirl, Nicolas Pitre, gcc, Git Mailing List Linus Torvalds <torvalds@linux-foundation.org> writes: > On Tue, 11 Dec 2007, Jon Smirl wrote: >> > >> > So if you want to use more threads, that _forces_ you to have a bigger >> > memory footprint, simply because you have more "live" objects that you >> > work on. Normally, that isn't much of a problem, since most source files >> > are small, but if you have a few deep delta chains on big files, both the >> > delta chain itself is going to use memory (you may have limited the size >> > of the cache, but it's still needed for the actual delta generation, so >> > it's not like the memory usage went away). >> >> This makes sense. Those runs that blew up to 4.5GB were a combination >> of this effect and fragmentation in the gcc allocator. Google >> allocator appears to be much better at controlling fragmentation. > > Yes. I think we do have some case where we simply keep a lot of objects > around, and if we are talking reasonably large deltas, we'll have the > whole delta-chain in memory just to unpack one single object. Eh, excuse me. unpack_delta_entry():
- first unpacks the base object (this goes recursive);
- uncompresses the delta;
- applies the delta to the base to obtain the target object;
- frees the delta;
- frees (but allows it to be cached) the base object;
- returns the result.
So no matter how deep a chain is, you keep only one delta at a time in core, not the whole delta-chain in core. > So what you actually want to do is to just re-use already packed delta > chains directly, which is what we normally do. But you are explicitly > looking at the "--no-reuse-delta" (aka "git repack -f") case, which is why > it then blows up. 
While that does not explain, as Nico pointed out, the huge difference between the two repack runs that have different starting pack, I would say it is a fair thing to say. If you have a suboptimal pack (i.e. not enough reusable deltas, as in the 2.1GB pack case), do run "repack -f", but if you have a good pack (i.e. 300MB pack), don't. ^ permalink raw reply [flat|nested] 45+ messages in thread
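Junio's step list can be sketched with a toy delta format — here a "delta" merely names a base object and some bytes to append. Git's real delta encoding and unpack_delta_entry() are far richer; the names and format below are purely illustrative:

```python
# Toy object store: an object is either ("full", bytes) or
# ("delta", base_key, suffix) meaning "the base's bytes plus suffix".
store = {
    "v1": ("full", b"hello"),
    "v2": ("delta", "v1", b" world"),
    "v3": ("delta", "v2", b"!"),
}

def unpack(key):
    entry = store[key]
    if entry[0] == "full":
        return entry[1]
    _, base_key, suffix = entry
    base = unpack(base_key)  # recurse to materialize the base first
    result = base + suffix   # apply this delta to the base
    # base is dropped on return, and outer frames have not built theirs
    # yet, so at most one base plus one delta is live at any moment --
    # the chain is never held in memory all at once
    return result

print(unpack("v3"))  # b'hello world!'
```

During the recursive descent no bases exist yet; they are materialized one at a time on the way back up, each freed as soon as its delta is applied — which is Junio's point that chain depth does not multiply resident memory.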
* Re: Something is broken in repack 2007-12-11 20:26 ` Junio C Hamano @ 2007-12-12 5:06 ` Andreas Ericsson 0 siblings, 0 replies; 45+ messages in thread From: Andreas Ericsson @ 2007-12-12 5:06 UTC (permalink / raw) To: Junio C Hamano Cc: Linus Torvalds, Jon Smirl, Nicolas Pitre, gcc, Git Mailing List Junio C Hamano wrote: > Linus Torvalds <torvalds@linux-foundation.org> writes: > >> So what you actually want to do is to just re-use already packed delta >> chains directly, which is what we normally do. But you are explicitly >> looking at the "--no-reuse-delta" (aka "git repack -f") case, which is why >> it then blows up. > > While that does not explain, as Nico pointed out, the huge difference > between the two repack runs that have different starting pack, I would > say it is a fair thing to say. If you have a suboptimal pack (i.e. not > enough reusable deltas, as in the 2.1GB pack case), do run "repack -f", > but if you have a good pack (i.e. 300MB pack), don't. I think this is too much of a mystery for a lot of people to let it go. Even I started looking into it, and I've got so little spare time just now that I wouldn't stand much of a chance of making a contribution even if I had written the code originally. That being said, I think the fact that some git repositories really *can't* be repacked on some machines (because it eats ALL virtual memory) is really something that lowers git's reputation among huge projects. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 7:34 ` Jon Smirl ` (2 preceding siblings ...) 2007-12-11 17:19 ` Linus Torvalds @ 2007-12-11 17:44 ` Daniel Berlin 3 siblings, 0 replies; 45+ messages in thread From: Daniel Berlin @ 2007-12-11 17:44 UTC (permalink / raw) To: Jon Smirl; +Cc: Nicolas Pitre, Junio C Hamano, gcc, Git Mailing List On 12/11/07, Jon Smirl <jonsmirl@gmail.com> wrote: > > Total CPU time 196 CPU minutes vs 190 for gcc. Google's claims of > being faster are not true. Depends on your allocation patterns. For our apps, it certainly is :) Of course, I don't know if we've updated the external allocator in a while; I'll bug the people in charge of it. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Something is broken in repack 2007-12-11 7:01 ` Something is broken in repack Jon Smirl 2007-12-11 7:34 ` Jon Smirl @ 2007-12-11 13:49 ` Nicolas Pitre 1 sibling, 0 replies; 45+ messages in thread From: Nicolas Pitre @ 2007-12-11 13:49 UTC (permalink / raw) To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Jon Smirl wrote: > I added the gcc people to the CC, it's their repository. Maybe they > can help us sort this out. Unless there is a Git expert amongst the gcc crowd, I somehow doubt it. And gcc people with an interest in Git internals are probably already on the Git mailing list. Nicolas ^ permalink raw reply [flat|nested] 45+ messages in thread