On Nov 21 01:21, Mark Geisert wrote:
> Kacper Michajlow wrote:
> >Thanks for reply. And sorry for being not specific enough before. 'git
> >gc' is a driver which runs various git command to do cleanup in
> >repository. Though I'm mostly concerned about the code I linked.
> >Instead of 'git gc' it is better to test directly 'git repack -a -f'
> >and possibly on repository where it takes some time.
> >'git://sourceware.org/git/newlib-cygwin.git' is good test case.
> >Although with bigger repositories performance hit is bigger, this is
> >good example to see what's going on.
>
> I appreciate that more specific info on how you experience the issue.
>
> >I'm well aware that forking on windows is problematic, but I
> >explicitly interested in parallelized part of execution. I don't care
> >about forks, while this slows things down too, they are not used in
> >compression process which is parallelized over the all cpu threads.
> >Each command is indeed forked, but I'm only interested about
> >pack-objects part hence the code I linked.
>
> OK, we're on the same page now :).
>
> >$ strace --mask=debug+syscall+thread -o git.strace git repack -a -f
> >Counting objects: 156690, done.
> >Delta compression using up to 12 threads.
> >Compressing objects: 100% (154730/154730), done.
> >Writing objects: 100% (156690/156690), done.
> >Total 156690 (delta 123449), reused 33146 (delta 0)
> >
> >$ grep "fork(" git.strace
> >  559   53728 [main] git 24340 fork: 24368 = fork()
> >  465   54022 [main] git 24368 fork: 0 = fork()
> >
> >Only two forks were created, while during compression only 25% cpu was
> >used (on big repo like linux kernel it doesn't exceed 8%). With native
> >git the same workload easily uses 95-100% cpu and therefor is a lot
> >faster.
>
> I was able to reproduce your issue using a cloned newlib-cygwin repo. On a
> 6-CPU machine I saw max 36% CPU utilization during the compression phase.
> ProcessExplorer showed all 6 threads were getting CPU time (to varying
> degrees) and when suspended they were always trying to acquire a mutex. I'd
> like to run some more straces and perhaps investigate with some other tools
> before saying more. This may take a while.
>
> What I've done so far is install the git-debuginfo and cygwin-debuginfo
> packages to that I can convert hex RIP addresses to line numbers. I've run
> the testcase under gdb so I can interrupt at random times and poke around.
> The straces from this testcase are ginormous so I hope I can figure out a
> better way to see why the compression threads aren't CPU-bound like they
> should be. If you don't already know, 'strace --help' shows the available
> mask values. The threads are each writing to disk, so I wonder if there's
> some unintentional serialization going on somewhere, but I don't know yet
> how I could verify that theory.

If I'm allowed to make an educated guess, the big serializers in Cygwin
are probably the calls to malloc, calloc, realloc, and free. We
desperately need a new malloc implementation better suited to
multi-threading.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat