Cygwin multithreading performance

public inbox for cygwin@cygwin.com

* Cygwin multithreading performance
@ 2015-11-14  0:24 Kacper Michajlow
  2015-11-19 20:24 ` Mark Geisert
  2015-12-18 15:06 ` Achim Gratz
  0 siblings, 2 replies; 21+ messages in thread
From: Kacper Michajlow @ 2015-11-14  0:24 UTC (permalink / raw)
  To: cygwin

Hello,

I recently noticed that Cygwin multithreading is very inefficient. I
was repacking a few git repositories with Cygwin's git. It spawns
threads, but they are so badly synchronized that there is no speed gain
over a single thread, and possibly a loss because of the overhead. On
my machine I got 7-10% CPU usage, while git built with MinGW easily
uses 100%.

You can find the code in question here
https://github.com/git/git/blob/master/builtin/pack-objects.c#L1967-L2094

Do you have any suggestions? Is there any chance of getting MT workloads
improved in Cygwin? Nowadays this is a really big problem, in my
opinion.

Best regards

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: Cygwin multithreading performance
  2015-11-14  0:24 Cygwin multithreading performance Kacper Michajlow
@ 2015-11-19 20:24 ` Mark Geisert
  2015-11-20 14:25   ` Kacper Michajlow
  2015-12-18 15:06 ` Achim Gratz
  1 sibling, 1 reply; 21+ messages in thread
From: Mark Geisert @ 2015-11-19 20:24 UTC (permalink / raw)
  To: cygwin

Kacper Michajlow wrote:
> I recently noticed that Cygwin multithreading is very inefficient. I
> was repacking few git repositories and with Cygwin's git, it spawns
> threads but they are so badly synchronized that there is no speed gain
> over one thread and possible loose because of the overhead. On my
> machine I got 7-10% CPU usage while with git build with mingw easily
> uses 100%.
>
> You can find the code in question here
> https://github.com/git/git/blob/master/builtin/pack-objects.c#L1967-L2094
>
> Do you have any suggestions? Is there any chance to get MT workloads
> improved in Cygwin? In present days it is really big problem in my
> opinion.

Although there have been some issues with Cygwin pthreads reported and 
resolved, I can't recall complaints about their performance.  You don't 
supply much specific info so I had to guess that you must be doing 
something like 'git gc' to provoke calls to the code you quote.  Please 
give more info if I was mistaken.

I did an strace of 'git gc' over a small source tree I have and found:

> ~/src/cygwin-cygutils strace --mask=debug+syscall+thread -o git.strace git gc
> Counting objects: 1691, done.
> Delta compression using up to 4 threads.
> Compressing objects: 100% (398/398), done.
> Writing objects: 100% (1691/1691), done.
> Total 1691 (delta 1250), reused 1691 (delta 1250)
>
> ~/src/cygwin-cygutils grep "fork(" git.strace
>   350  111164 [main] git 360 fork: 0 = fork()
>    59  113379 [main] git 4980 fork: 360 = fork()
>   496  242346 [main] git 4980 fork: 368 = fork()
>   513  242585 [main] git 368 fork: 0 = fork()
>   828  589040 [main] git 4980 fork: 4968 = fork()
>   685  589341 [main] git 4968 fork: 0 = fork()
>   591  126631 [main] git 4968 fork: 1784 = fork()
>   483  126866 [main] git 1784 fork: 0 = fork()
>   618 2320996 [main] git 4980 fork: 2912 = fork()
>   558 2321259 [main] git 2912 fork: 0 = fork()
>   555 3023781 [main] git 4980 fork: 1612 = fork()
>   500 3024002 [main] git 1612 fork: 0 = fork()
>   766 3112383 [main] git 4980 fork: 1756 = fork()
>   681 3112655 [main] git 1756 fork: 0 = fork()

There's your problem.  Git is for some reason fork()ing to do its 
parallel operations.  fork() is very complicated to emulate on Windows 
and Cygwin's fork() is already known to be slow compared to native OS 
implementations.

Why is mingw faster?  Inspection of run-command.c in the git source tree 
(BTW thanks for the github link) shows that start_command() has two code 
paths divided by "#ifndef GIT_WINDOWS_NATIVE".  The Windows native path 
(e.g. mingw) doesn't fork() but instead spawns subprocesses.  On Cygwin 
the fork() path is used.  Git probably ought to use the spawn code path 
on Cygwin too.

I don't know offhand if this is something Cygwin's git maintainer would
want to tackle or if it should be handled upstream but I'd guess the latter.

Hope this helps,

..mark


* Re: Cygwin multithreading performance
  2015-11-19 20:24 ` Mark Geisert
@ 2015-11-20 14:25   ` Kacper Michajlow
  2015-11-21  9:21     ` Mark Geisert
  0 siblings, 1 reply; 21+ messages in thread
From: Kacper Michajlow @ 2015-11-20 14:25 UTC (permalink / raw)
  To: cygwin

2015-11-19 21:24 GMT+01:00 Mark Geisert <mark@maxrnd.com>:
> Kacper Michajlow wrote:
>>
>> I recently noticed that Cygwin multithreading is very inefficient. I
>> was repacking few git repositories and with Cygwin's git, it spawns
>> threads but they are so badly synchronized that there is no speed gain
>> over one thread and possible loose because of the overhead. On my
>> machine I got 7-10% CPU usage while with git build with mingw easily
>> uses 100%.
>>
>> You can find the code in question here
>> https://github.com/git/git/blob/master/builtin/pack-objects.c#L1967-L2094
>>
>> Do you have any suggestions? Is there any chance to get MT workloads
>> improved in Cygwin? In present days it is really big problem in my
>> opinion.
>
>
> Although there have been some issues with Cygwin pthreads reported and
> resolved, I can't recall complaints about their performance.  You don't
> supply much specific info so I had to guess that you must be doing something
> like 'git gc' to provoke calls to the code you quote.  Please give more info
> if I was mistaken.
>
> I did an strace of 'git gc' over a small source tree I have and found:
>
>> ~/src/cygwin-cygutils strace --mask=debug+syscall+thread -o git.strace git
>> gc
>> Counting objects: 1691, done.
>> Delta compression using up to 4 threads.
>> Compressing objects: 100% (398/398), done.
>> Writing objects: 100% (1691/1691), done.
>> Total 1691 (delta 1250), reused 1691 (delta 1250)
>>
>> ~/src/cygwin-cygutils grep "fork(" git.strace
>>   350  111164 [main] git 360 fork: 0 = fork()
>>    59  113379 [main] git 4980 fork: 360 = fork()
>>   496  242346 [main] git 4980 fork: 368 = fork()
>>   513  242585 [main] git 368 fork: 0 = fork()
>>   828  589040 [main] git 4980 fork: 4968 = fork()
>>   685  589341 [main] git 4968 fork: 0 = fork()
>>   591  126631 [main] git 4968 fork: 1784 = fork()
>>   483  126866 [main] git 1784 fork: 0 = fork()
>>   618 2320996 [main] git 4980 fork: 2912 = fork()
>>   558 2321259 [main] git 2912 fork: 0 = fork()
>>   555 3023781 [main] git 4980 fork: 1612 = fork()
>>   500 3024002 [main] git 1612 fork: 0 = fork()
>>   766 3112383 [main] git 4980 fork: 1756 = fork()
>>   681 3112655 [main] git 1756 fork: 0 = fork()
>
>
> There's your problem.  Git is for some reason fork()ing to do its parallel
> operations.  fork() is very complicated to emulate on Windows and Cygwin's
> fork() is already known to be slow compared to native OS implementations.
>
> Why is mingw faster?  Inspection of run-command.c in the git source tree
> (BTW thanks for the github link) shows that start_command() has two code
> paths divided by "#ifndef GIT_WINDOWS_NATIVE".  The Windows native path
> (e.g. mingw) doesn't fork() but instead spawns subprocesses.  On Cygwin the
> fork() path is used.  Git probably ought to use the spawn code path on
> Cygwin too.
>
> I don't know offhand if this is something Cygwin's git maintainer would want
> to tackle or if it should be handled upstream but I'd guess the latter.
> Hope this helps,
>
> ..mark
>

Thanks for the reply, and sorry for not being specific enough before.
'git gc' is a driver which runs various git commands to clean up a
repository, but I'm mostly concerned about the code I linked. Instead
of 'git gc' it is better to test 'git repack -a -f' directly, ideally
on a repository where it takes some time.
'git://sourceware.org/git/newlib-cygwin.git' is a good test case.
Although the performance hit is bigger with bigger repositories, this
is a good example to see what's going on.

I'm well aware that forking on Windows is problematic, but I am
explicitly interested in the parallelized part of execution. I don't
care about the forks: while they slow things down too, they are not
used in the compression process, which is parallelized over all CPU
threads. Each command is indeed forked, but I'm only interested in the
pack-objects part, hence the code I linked.

Here is my result on the minimized test.

$ strace --mask=debug+syscall+thread -o git.strace git repack -a -f
Counting objects: 156690, done.
Delta compression using up to 12 threads.
Compressing objects: 100% (154730/154730), done.
Writing objects: 100% (156690/156690), done.
Total 156690 (delta 123449), reused 33146 (delta 0)

$ grep "fork(" git.strace
  559   53728 [main] git 24340 fork: 24368 = fork()
  465   54022 [main] git 24368 fork: 0 = fork()

Only two forks were created, while during compression only 25% CPU was
used (on a big repo like the Linux kernel it doesn't exceed 8%). With
native git the same workload easily uses 95-100% CPU and is therefore a
lot faster.

I know I'm not that specific, but I don't know what more to say here.
I could try to produce a sample app to illustrate the issue, but I think
git is already a good example: pure C with pthreads. I already linked
the code in my first email.

Tell me how I can help to diagnose it further.

-Kacper


* Re: Cygwin multithreading performance
  2015-11-20 14:25   ` Kacper Michajlow
@ 2015-11-21  9:21     ` Mark Geisert
  2015-11-21 10:53       ` Corinna Vinschen
  0 siblings, 1 reply; 21+ messages in thread
From: Mark Geisert @ 2015-11-21  9:21 UTC (permalink / raw)
  To: cygwin

Kacper Michajlow wrote:
> Thanks for reply. And sorry for being not specific enough before. 'git
> gc' is a driver which runs various git command to do cleanup in
> repository. Though I'm mostly concerned about the code I linked.
> Instead of 'git gc' it is better to test directly 'git repack -a -f'
> and possibly on repository where it takes some time.
> 'git://sourceware.org/git/newlib-cygwin.git' is good test case.
> Although with bigger repositories performance hit is bigger, this is
> good example to see what's going on.

I appreciate the more specific info on how you experience the issue.

> I'm well aware that forking on windows is problematic, but I
> explicitly interested in parallelized part of execution. I don't care
> about forks, while this slows things down too, they are not used in
> compression process which is parallelized over the all cpu threads.
> Each command is indeed forked, but I'm only interested about
> pack-objects part hence the code I linked.

OK, we're on the same page now :).

> $ strace --mask=debug+syscall+thread -o git.strace git repack -a -f
> Counting objects: 156690, done.
> Delta compression using up to 12 threads.
> Compressing objects: 100% (154730/154730), done.
> Writing objects: 100% (156690/156690), done.
> Total 156690 (delta 123449), reused 33146 (delta 0)
>
> $ grep "fork(" git.strace
>    559   53728 [main] git 24340 fork: 24368 = fork()
>    465   54022 [main] git 24368 fork: 0 = fork()
>
> Only two forks were created, while during compression only 25% cpu was
> used (on big repo like linux kernel it doesn't exceed 8%). With native
> git the same workload easily uses 95-100% cpu and therefor is a lot
> faster.

I was able to reproduce your issue using a cloned newlib-cygwin repo. 
On a 6-CPU machine I saw max 36% CPU utilization during the compression 
phase.  ProcessExplorer showed all 6 threads were getting CPU time (to 
varying degrees) and when suspended they were always trying to acquire a 
mutex.  I'd like to run some more straces and perhaps investigate with 
some other tools before saying more.  This may take a while.

What I've done so far is install the git-debuginfo and cygwin-debuginfo
packages so that I can convert hex RIP addresses to line numbers.  I've
run the testcase under gdb so I can interrupt at random times and poke 
around.  The straces from this testcase are ginormous so I hope I can 
figure out a better way to see why the compression threads aren't 
CPU-bound like they should be.  If you don't already know, 'strace 
--help' shows the available mask values.  The threads are each writing 
to disk, so I wonder if there's some unintentional serialization going 
on somewhere, but I don't know yet how I could verify that theory.

..mark



* Re: Cygwin multithreading performance
  2015-11-21  9:21     ` Mark Geisert
@ 2015-11-21 10:53       ` Corinna Vinschen
  2015-11-23  7:45         ` Mark Geisert
  0 siblings, 1 reply; 21+ messages in thread
From: Corinna Vinschen @ 2015-11-21 10:53 UTC (permalink / raw)
  To: cygwin

On Nov 21 01:21, Mark Geisert wrote:
> Kacper Michajlow wrote:
> >Thanks for reply. And sorry for being not specific enough before. 'git
> >gc' is a driver which runs various git command to do cleanup in
> >repository. Though I'm mostly concerned about the code I linked.
> >Instead of 'git gc' it is better to test directly 'git repack -a -f'
> >and possibly on repository where it takes some time.
> >'git://sourceware.org/git/newlib-cygwin.git' is good test case.
> >Although with bigger repositories performance hit is bigger, this is
> >good example to see what's going on.
> 
> I appreciate that more specific info on how you experience the issue.
> 
> >I'm well aware that forking on windows is problematic, but I
> >explicitly interested in parallelized part of execution. I don't care
> >about forks, while this slows things down too, they are not used in
> >compression process which is parallelized over the all cpu threads.
> >Each command is indeed forked, but I'm only interested about
> >pack-objects part hence the code I linked.
> 
> OK, we're on the same page now :).
> 
> >$ strace --mask=debug+syscall+thread -o git.strace git repack -a -f
> >Counting objects: 156690, done.
> >Delta compression using up to 12 threads.
> >Compressing objects: 100% (154730/154730), done.
> >Writing objects: 100% (156690/156690), done.
> >Total 156690 (delta 123449), reused 33146 (delta 0)
> >
> >$ grep "fork(" git.strace
> >   559   53728 [main] git 24340 fork: 24368 = fork()
> >   465   54022 [main] git 24368 fork: 0 = fork()
> >
> >Only two forks were created, while during compression only 25% cpu was
> >used (on big repo like linux kernel it doesn't exceed 8%). With native
> >git the same workload easily uses 95-100% cpu and therefor is a lot
> >faster.
> 
> I was able to reproduce your issue using a cloned newlib-cygwin repo. On a
> 6-CPU machine I saw max 36% CPU utilization during the compression phase.
> ProcessExplorer showed all 6 threads were getting CPU time (to varying
> degrees) and when suspended they were always trying to acquire a mutex.  I'd
> like to run some more straces and perhaps investigate with some other tools
> before saying more.  This may take a while.
> 
> What I've done so far is install the git-debuginfo and cygwin-debuginfo
> packages to that I can convert hex RIP addresses to line numbers.  I've run
> the testcase under gdb so I can interrupt at random times and poke around.
> The straces from this testcase are ginormous so I hope I can figure out a
> better way to see why the compression threads aren't CPU-bound like they
> should be.  If you don't already know, 'strace --help' shows the available
> mask values.  The threads are each writing to disk, so I wonder if there's
> some unintentional serialization going on somewhere, but I don't know yet
> how I could verify that theory.

If I'm allowed to make an educated guess, the big serializers in Cygwin
are probably the calls to malloc, calloc, realloc, free.  We desperately
need a new malloc implementation better suited to multi-threading.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat


* Re: Cygwin multithreading performance
  2015-11-21 10:53       ` Corinna Vinschen
@ 2015-11-23  7:45         ` Mark Geisert
  2015-11-23 10:27           ` John Hein
  0 siblings, 1 reply; 21+ messages in thread
From: Mark Geisert @ 2015-11-23  7:45 UTC (permalink / raw)
  To: cygwin

Corinna Vinschen wrote:
> On Nov 21 01:21, Mark Geisert wrote:
>> [...] so I wonder if there's
>> some unintentional serialization going on somewhere, but I don't know yet
>> how I could verify that theory.
>
> If I'm allowed to make an educated guess, the big serializer in Cygwin
> are probably the calls to malloc, calloc, realloc, free.  We desperately
> need a new malloc implementation better suited to multi-threading.

That's very helpful to know.  I'd want to first make sure the heavy lock 
activity I'm seeing in the traces really is due to malloc() and friends 
but I couldn't help a speculative search online for multithread-safe 
malloc().  These turned up:
     tcmalloc - part of google-perftools, requires libunwind, evidently
                not yet ported to Windows AFAICT
     nedmalloc - http://www.nedprod.com/programs/portable/nedmalloc/
     ptmalloc - http://www.malloc.de/

The latter two are based on Doug Lea's dlmalloc which is also the basis 
of Cygwin's malloc() functions.  As I understand it, ptmalloc in one 
form or another has been part of glibc on Linux for some time.

So there may be a solution in sight if we need to go that direction.  Of 
course, SHTDI as usual :).

..mark


* Re: Cygwin multithreading performance
  2015-11-23  7:45         ` Mark Geisert
@ 2015-11-23 10:27           ` John Hein
  2015-11-24  1:05             ` Mark Geisert
  0 siblings, 1 reply; 21+ messages in thread
From: John Hein @ 2015-11-23 10:27 UTC (permalink / raw)
  To: cygwin

Mark Geisert wrote at 23:45 -0800 on Nov 22, 2015:
 > Corinna Vinschen wrote:
 > > On Nov 21 01:21, Mark Geisert wrote:
 > [...] so I wonder if there's
 > >> some unintentional serialization going on somewhere, but I don't know yet
 > >> how I could verify that theory.
 > >
 > > If I'm allowed to make an educated guess, the big serializer in Cygwin
 > > are probably the calls to malloc, calloc, realloc, free.  We desperately
 > > need a new malloc implementation better suited to multi-threading.
 > 
 > That's very helpful to know.  I'd want to first make sure the heavy lock 
 > activity I'm seeing in the traces really is due to malloc() and friends 
 > but I couldn't help a speculative search online for multithread-safe 
 > malloc().  These turned up:
 >      tcmalloc - part of google-perftools, requires libunwind, evidently 
 > not yet ported to Windows AFAICT,
 >      nedmalloc - http://www.nedprod.com/programs/portable/nedmalloc/
 >      ptmalloc - http://www.malloc.de/
 > 
 > The latter two are based on Doug Lea's dlmalloc which is also the basis 
 > of Cygwin's malloc() functions.  As I understand it, ptmalloc in one 
 > form or another has been part of glibc on Linux for some time.
 > 
 > So there may be a solution in sight if we need to go that direction.  Of 
 > course, SHTDI as usual :).
 > 
 > ...mark

Someone recently mentioned on this list they were working on porting
jemalloc.  That would be a good choice.


* Re: Cygwin multithreading performance
  2015-11-23 10:27           ` John Hein
@ 2015-11-24  1:05             ` Mark Geisert
  2015-11-26  9:49               ` Corinna Vinschen
  0 siblings, 1 reply; 21+ messages in thread
From: Mark Geisert @ 2015-11-24  1:05 UTC (permalink / raw)
  To: cygwin

John Hein wrote:
> Mark Geisert wrote at 23:45 -0800 on Nov 22, 2015:
>   > Corinna Vinschen wrote:
>   > > On Nov 21 01:21, Mark Geisert wrote:
>   > [...] so I wonder if there's
>   > >> some unintentional serialization going on somewhere, but I don't know yet
>   > >> how I could verify that theory.
>   > >
>   > > If I'm allowed to make an educated guess, the big serializer in Cygwin
>   > > are probably the calls to malloc, calloc, realloc, free.  We desperately
>   > > need a new malloc implementation better suited to multi-threading.
>   >
>   > That's very helpful to know.  I'd want to first make sure the heavy lock
>   > activity I'm seeing in the traces really is due to malloc() and friends
>   > but I couldn't help a speculative search online for multithread-safe
>   > malloc().  These turned up:
>   >      tcmalloc - part of google-perftools, requires libunwind, evidently
>   > not yet ported to Windows AFAICT,
>   >      nedmalloc - http://www.nedprod.com/programs/portable/nedmalloc/
>   >      ptmalloc - http://www.malloc.de/
>   >
>   > The latter two are based on Doug Lea's dlmalloc which is also the basis
>   > of Cygwin's malloc() functions.  As I understand it, ptmalloc in one
>   > form or another has been part of glibc on Linux for some time.
>   >
>   > So there may be a solution in sight if we need to go that direction.  Of
>   > course, SHTDI as usual :).
>   >
>   > ...mark
>
> Someone recently mentioned on this list they were working on porting
> jemalloc.  That would be a good choice.

Indeed; thanks for the reminder.  Somehow I hadn't followed that thread.

..mark



* Re: Cygwin multithreading performance
  2015-11-24  1:05             ` Mark Geisert
@ 2015-11-26  9:49               ` Corinna Vinschen
  2015-11-26 10:49                 ` Mark Geisert
  0 siblings, 1 reply; 21+ messages in thread
From: Corinna Vinschen @ 2015-11-26  9:49 UTC (permalink / raw)
  To: cygwin

On Nov 23 16:54, Mark Geisert wrote:
> John Hein wrote:
> >Mark Geisert wrote at 23:45 -0800 on Nov 22, 2015:
> >  > Corinna Vinschen wrote:
> >  > > On Nov 21 01:21, Mark Geisert wrote:
> >  > [...] so I wonder if there's
> >  > >> some unintentional serialization going on somewhere, but I don't know yet
> >  > >> how I could verify that theory.
> >  > >
> >  > > If I'm allowed to make an educated guess, the big serializer in Cygwin
> >  > > are probably the calls to malloc, calloc, realloc, free.  We desperately
> >  > > need a new malloc implementation better suited to multi-threading.
> >  >
> >  > That's very helpful to know.  I'd want to first make sure the heavy lock
> >  > activity I'm seeing in the traces really is due to malloc() and friends
> >  > but I couldn't help a speculative search online for multithread-safe
> >  > malloc().  These turned up:
> >  >      tcmalloc - part of google-perftools, requires libunwind, evidently
> >  > not yet ported to Windows AFAICT,
> >  >      nedmalloc - http://www.nedprod.com/programs/portable/nedmalloc/
> >  >      ptmalloc - http://www.malloc.de/
> >  >
> >  > The latter two are based on Doug Lea's dlmalloc which is also the basis
> >  > of Cygwin's malloc() functions.  As I understand it, ptmalloc in one
> >  > form or another has been part of glibc on Linux for some time.
> >  >
> >  > So there may be a solution in sight if we need to go that direction.  Of
> >  > course, SHTDI as usual :).
> >  >
> >  > ...mark
> >
> >Someone recently mentioned on this list they were working on porting
> >jemalloc.  That would be a good choice.
> 
> Indeed; thanks for the reminder.  Somehow I hadn't followed that thread.

Indeed^2.  Did you look into the locking any further to see if there's
more than one culprit?  I guess we've a rather long way to a "lock-less
kernel"...


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat


* Re: Cygwin multithreading performance
  2015-11-26  9:49               ` Corinna Vinschen
@ 2015-11-26 10:49                 ` Mark Geisert
  2015-12-05 10:51                   ` Mark Geisert
  0 siblings, 1 reply; 21+ messages in thread
From: Mark Geisert @ 2015-11-26 10:49 UTC (permalink / raw)
  To: cygwin

Corinna Vinschen wrote:
> On Nov 23 16:54, Mark Geisert wrote:
>> John Hein wrote:
>>> Mark Geisert wrote at 23:45 -0800 on Nov 22, 2015:
>>>   > Corinna Vinschen wrote:
>>>   > > On Nov 21 01:21, Mark Geisert wrote:
>>>   > [...] so I wonder if there's
>>>   > >> some unintentional serialization going on somewhere, but I don't know yet
>>>   > >> how I could verify that theory.
>>>   > >
>>>   > > If I'm allowed to make an educated guess, the big serializer in Cygwin
>>>   > > are probably the calls to malloc, calloc, realloc, free.  We desperately
>>>   > > need a new malloc implementation better suited to multi-threading.
[...]
>>>
>>> Someone recently mentioned on this list they were working on porting
>>> jemalloc.  That would be a good choice.
>>
>> Indeed; thanks for the reminder.  Somehow I hadn't followed that thread.
>
> Indeed^2.  Did you look into the locking any further to see if there's
> more than one culprit?  I guess we've a rather long way to a "lock-less
> kernel"...

It took me a while to figure out what I wanted to see in the strace 
logs.  I ended up adding a small patch to pthread_mutex::lock() to 
record a timestamp on entry, and also log that in the pthread_printf() 
near the end of the method.  With that I'm able to see how long a thread 
has to wait for a lock before actually acquiring it.  That will allow me 
to unravel the sequence of locking and unlocking and give stats for all 
threads and/or locks.  That could be generally useful to evaluate 
different memory allocators or different locking strategies using the 
same allocator.

But that is just groundwork to identifying which locks are suffering the 
most contention.  To identify them at source level I think I'll also 
need to record the caller's RIP when they are being locked.

In the raw strace data I'm looking at for the OP's testcase, I can see a 
lot of cases where a thread wants a lock but is delayed for milliseconds 
before getting ahold of it.  I can't say ATM whether it's just one or a 
few locks suffering this way, or more.  Work continues :).

..mark


* Re: Cygwin multithreading performance
  2015-11-26 10:49                 ` Mark Geisert
@ 2015-12-05 10:51                   ` Mark Geisert
  2015-12-05 13:07                     ` Kacper Michajlow
  0 siblings, 1 reply; 21+ messages in thread
From: Mark Geisert @ 2015-12-05 10:51 UTC (permalink / raw)
  To: cygwin

Mark Geisert wrote:
> Corinna Vinschen wrote:
>> On Nov 23 16:54, Mark Geisert wrote:
>>> John Hein wrote:
>>>> Mark Geisert wrote at 23:45 -0800 on Nov 22, 2015:
>>>>   > Corinna Vinschen wrote:
>>>>   > > On Nov 21 01:21, Mark Geisert wrote:
>>>>   > [...] so I wonder if there's
>>>>   > >> some unintentional serialization going on somewhere, but I
>>>> don't know yet
>>>>   > >> how I could verify that theory.
>>>>   > >
>>>>   > > If I'm allowed to make an educated guess, the big serializer
>>>> in Cygwin
>>>>   > > are probably the calls to malloc, calloc, realloc, free.  We
>>>> desperately
>>>>   > > need a new malloc implementation better suited to
>>>> multi-threading.
> [...]
>>>>
>>>> Someone recently mentioned on this list they were working on porting
>>>> jemalloc.  That would be a good choice.
>>>
>>> Indeed; thanks for the reminder.  Somehow I hadn't followed that thread.
>>
>> Indeed^2.  Did you look into the locking any further to see if there's
>> more than one culprit?  I guess we've a rather long way to a "lock-less
>> kernel"...
[...]
> But that is just groundwork to identifying which locks are suffering the
> most contention.  To identify them at source level I think I'll also
> need to record the caller's RIP when they are being locked.

In the OP's very good testcase the most heavily contended locks, by far, 
are those internal to git's builtin/pack-objects.c.  I plan to show 
actual stats after some more cleanup, but I did notice something in that 
git source file that might explain the difference between Cygwin and 
MinGW when running this testcase...

#ifndef NO_PTHREADS

static pthread_mutex_t read_mutex;
#define read_lock()             pthread_mutex_lock(&read_mutex)
#define read_unlock()           pthread_mutex_unlock(&read_mutex)

static pthread_mutex_t cache_mutex;
#define cache_lock()            pthread_mutex_lock(&cache_mutex)
#define cache_unlock()          pthread_mutex_unlock(&cache_mutex)

static pthread_mutex_t progress_mutex;
#define progress_lock()         pthread_mutex_lock(&progress_mutex)
#define progress_unlock()       pthread_mutex_unlock(&progress_mutex)

#else

#define read_lock()             (void)0
#define read_unlock()           (void)0
#define cache_lock()            (void)0
#define cache_unlock()          (void)0
#define progress_lock()         (void)0
#define progress_unlock()       (void)0

#endif

Is it possible the MinGW version of git is compiled with NO_PTHREADS 
#defined?  If so, it would mean there's no locking being done at all and 
would explain the faster execution and near 100% CPU utilization when 
running under MinGW.
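
To illustrate what that hypothesis would mean in practice, here is a reduced sketch of the same macro pattern. It is not the git source: only read_lock()/read_unlock() are reproduced, the static PTHREAD_MUTEX_INITIALIZER is an assumption made to keep the example self-contained, and shared_counter and bump_counter() are hypothetical. The call site is identical in both builds; with NO_PTHREADS defined the lock calls expand to (void)0 and no mutex exists at all.

```c
#include <assert.h>

#ifndef NO_PTHREADS
#include <pthread.h>

/* Assumption: static initializer used here for brevity. */
static pthread_mutex_t read_mutex = PTHREAD_MUTEX_INITIALIZER;
#define read_lock()     pthread_mutex_lock(&read_mutex)
#define read_unlock()   pthread_mutex_unlock(&read_mutex)

#else

/* Without pthreads the same names compile to nothing. */
#define read_lock()     (void)0
#define read_unlock()   (void)0

#endif

static long shared_counter;

/* Identical source in both builds; only the macro expansion differs. */
long bump_counter(void)
{
    long v;

    read_lock();
    v = ++shared_counter;
    read_unlock();

    assert(v > 0);  /* sanity check: the counter only grows */
    return v;
}
```

Building this once normally and once with -DNO_PTHREADS and comparing the preprocessor output (for example with the compiler's -E option) shows directly whether any locking code survives in a given configuration.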

..mark



* Re: Cygwin multithreading performance
  2015-12-05 10:51                   ` Mark Geisert
@ 2015-12-05 13:07                     ` Kacper Michajlow
  2015-12-05 13:59                       ` Kacper Michajlow
  2015-12-05 22:40                       ` Mark Geisert
  0 siblings, 2 replies; 21+ messages in thread
From: Kacper Michajlow @ 2015-12-05 13:07 UTC (permalink / raw)
  To: cygwin

2015-12-05 11:51 GMT+01:00 Mark Geisert <mark@maxrnd.com>:
> Mark Geisert wrote:
>>
>> Corinna Vinschen wrote:
>>>
>>> On Nov 23 16:54, Mark Geisert wrote:
>>>>
>>>> John Hein wrote:
>>>>>
>>>>> Mark Geisert wrote at 23:45 -0800 on Nov 22, 2015:
>>>>>   > Corinna Vinschen wrote:
>>>>>   > > On Nov 21 01:21, Mark Geisert wrote:
>>>>>   > [...] so I wonder if there's
>>>>>   > >> some unintentional serialization going on somewhere, but I
>>>>> don't know yet
>>>>>   > >> how I could verify that theory.
>>>>>   > >
>>>>>   > > If I'm allowed to make an educated guess, the big serializer
>>>>> in Cygwin
>>>>>   > > are probably the calls to malloc, calloc, realloc, free.  We
>>>>> desperately
>>>>>   > > need a new malloc implementation better suited to
>>>>> multi-threading.
>>
>> [...]
>>>>>
>>>>>
>>>>> Someone recently mentioned on this list they were working on porting
>>>>> jemalloc.  That would be a good choice.
>>>>
>>>>
>>>> Indeed; thanks for the reminder.  Somehow I hadn't followed that thread.
>>>
>>>
>>> Indeed^2.  Did you look into the locking any further to see if there's
>>> more than one culprit?  I guess we've a rather long way to a "lock-less
>>> kernel"...
>
> [...]
>>
>> But that is just groundwork to identifying which locks are suffering the
>> most contention.  To identify them at source level I think I'll also
>> need to record the caller's RIP when they are being locked.
>
>
> In the OP's very good testcase the most heavily contended locks, by far, are
> those internal to git's builtin/pack-objects.c.  I plan to show actual stats
> after some more cleanup, but I did notice something in that git source file
> that might explain the difference between Cygwin and MinGW when running this
> testcase...
>
> #ifndef NO_PTHREADS
>
> static pthread_mutex_t read_mutex;
> #define read_lock()             pthread_mutex_lock(&read_mutex)
> #define read_unlock()           pthread_mutex_unlock(&read_mutex)
>
> static pthread_mutex_t cache_mutex;
> #define cache_lock()            pthread_mutex_lock(&cache_mutex)
> #define cache_unlock()          pthread_mutex_unlock(&cache_mutex)
>
> static pthread_mutex_t progress_mutex;
> #define progress_lock()         pthread_mutex_lock(&progress_mutex)
> #define progress_unlock()       pthread_mutex_unlock(&progress_mutex)
>
> #else
>
> #define read_lock()             (void)0
> #define read_unlock()           (void)0
> #define cache_lock()            (void)0
> #define cache_unlock()          (void)0
> #define progress_lock()         (void)0
> #define progress_unlock()       (void)0
>
> #endif
>
> Is it possible the MinGW version of git is compiled with NO_PTHREADS
> #defined?  If so, it would mean there's no locking being done at all and
> would explain the faster execution and near 100% CPU utilization when
> running under MinGW.

Nah, threading isn't enabled at all when pthreads isn't available. How
would that work? :D See thread-utils.h:

#ifndef NO_PTHREADS
#include <pthread.h>

extern int online_cpus(void);
extern int init_recursive_mutex(pthread_mutex_t*);

#else

#define online_cpus() 1

#endif


It looks like there is indeed a bug in git's code when "--threads" is
passed explicitly to "git pack-objects": it shows a warning about
threads being unsupported but doesn't overwrite the delta_search_threads
value. I will take it to git's ML. This is completely unrelated
to our issue.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Cygwin multithreading performance
  2015-12-05 13:07                     ` Kacper Michajlow
@ 2015-12-05 13:59                       ` Kacper Michajlow
  2015-12-05 22:40                       ` Mark Geisert
  1 sibling, 0 replies; 21+ messages in thread
From: Kacper Michajlow @ 2015-12-05 13:59 UTC (permalink / raw)
  To: cygwin

2015-12-05 14:07 GMT+01:00 Kacper Michajlow <kasper93@gmail.com>:
> 2015-12-05 11:51 GMT+01:00 Mark Geisert <mark@maxrnd.com>:
>> Mark Geisert wrote:
>>>
>>> Corinna Vinschen wrote:
>>>>
>>>> On Nov 23 16:54, Mark Geisert wrote:
>>>>>
>>>>> John Hein wrote:
>>>>>>
>>>>>> Mark Geisert wrote at 23:45 -0800 on Nov 22, 2015:
>>>>>>   > Corinna Vinschen wrote:
>>>>>>   > > On Nov 21 01:21, Mark Geisert wrote:
>>>>>>   > [...] so I wonder if there's
>>>>>>   > >> some unintentional serialization going on somewhere, but I
>>>>>> don't know yet
>>>>>>   > >> how I could verify that theory.
>>>>>>   > >
>>>>>>   > > If I'm allowed to make an educated guess, the big serializer
>>>>>> in Cygwin
>>>>>>   > > are probably the calls to malloc, calloc, realloc, free.  We
>>>>>> desperately
>>>>>>   > > need a new malloc implementation better suited to
>>>>>> multi-threading.
>>>
>>> [...]
>>>>>>
>>>>>>
>>>>>> Someone recently mentioned on this list they were working on porting
>>>>>> jemalloc.  That would be a good choice.
>>>>>
>>>>>
>>>>> Indeed; thanks for the reminder.  Somehow I hadn't followed that thread.
>>>>
>>>>
>>>> Indeed^2.  Did you look into the locking any further to see if there's
>>>> more than one culprit?  I guess we've a rather long way to a "lock-less
>>>> kernel"...
>>
>> [...]
>>>
>>> But that is just groundwork to identifying which locks are suffering the
>>> most contention.  To identify them at source level I think I'll also
>>> need to record the caller's RIP when they are being locked.
>>
>>
>> In the OP's very good testcase the most heavily contended locks, by far, are
>> those internal to git's builtin/pack-objects.c.  I plan to show actual stats
>> after some more cleanup, but I did notice something in that git source file
>> that might explain the difference between Cygwin and MinGW when running this
>> testcase...
>>
>> #ifndef NO_PTHREADS
>>
>> static pthread_mutex_t read_mutex;
>> #define read_lock()             pthread_mutex_lock(&read_mutex)
>> #define read_unlock()           pthread_mutex_unlock(&read_mutex)
>>
>> static pthread_mutex_t cache_mutex;
>> #define cache_lock()            pthread_mutex_lock(&cache_mutex)
>> #define cache_unlock()          pthread_mutex_unlock(&cache_mutex)
>>
>> static pthread_mutex_t progress_mutex;
>> #define progress_lock()         pthread_mutex_lock(&progress_mutex)
>> #define progress_unlock()       pthread_mutex_unlock(&progress_mutex)
>>
>> #else
>>
>> #define read_lock()             (void)0
>> #define read_unlock()           (void)0
>> #define cache_lock()            (void)0
>> #define cache_unlock()          (void)0
>> #define progress_lock()         (void)0
>> #define progress_unlock()       (void)0
>>
>> #endif
>>
>> Is it possible the MinGW version of git is compiled with NO_PTHREADS
>> #defined?  If so, it would mean there's no locking being done at all and
>> would explain the faster execution and near 100% CPU utilization when
>> running under MinGW.
>
> Nah, there is no threading enabled when there is no pthreads. How
> would that work? :D See thread-utils.h
>
> #ifndef NO_PTHREADS
> #include <pthread.h>
>
> extern int online_cpus(void);
> extern int init_recursive_mutex(pthread_mutex_t*);
>
> #else
>
> #define online_cpus() 1
>
> #endif
>
>
> Looks like there is indeed a bug in git code when passing "--threads"
> explicitly to "git pack-objects", because they show warning about
> threads being unsupported, but doesn't overwrite delta_search_threads
> value. I will go to git's ML about it. This is completely not related
> to our issue.

Obviously I was wrong. In the NO_PTHREADS case there is
#define ll_find_deltas(l, s, w, d, p)    find_deltas(l, &s, w, d, p)
so the 'delta_search_threads' value is never used. Still not related to
the Cygwin issue, though ;)
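
To illustrate why the thread count stops mattering, here is a minimal
sketch of the pattern. The names demo_find_deltas, demo_ll_find_deltas,
demo_threads and DEMO_HAVE_THREADS are made up for this example and are
not git's code; only the shape of the macro follows the #define quoted
above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; this is not git's code. */
int demo_threads = 8;   /* plays the role of a requested thread count */
int demo_workers_used;  /* how many workers the last call used */

/* Single-threaded worker: takes the list size by pointer. */
void demo_find_deltas(const int *list, size_t *list_size)
{
	assert(list != NULL && list_size != NULL);
	demo_workers_used = 1;  /* demo_threads is never consulted here */
	*list_size = 0;         /* pretend the whole list was consumed */
}

#ifdef DEMO_HAVE_THREADS
/* A threaded build would fan the list out to demo_threads workers. */
void demo_ll_find_deltas(const int *list, size_t list_size)
{
	size_t remaining = list_size;
	demo_find_deltas(list, &remaining);  /* placeholder for real fan-out */
	demo_workers_used = demo_threads;
}
#else
/* Without threads the "ll_" entry point is just the plain function. */
#define demo_ll_find_deltas(l, s) demo_find_deltas(l, &s)
#endif
```

With DEMO_HAVE_THREADS undefined, a call such as
demo_ll_find_deltas(list, n) expands to demo_find_deltas(list, &n), so
whatever value demo_threads holds has no effect on the work done.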


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Cygwin multithreading performance
  2015-12-05 13:07                     ` Kacper Michajlow
  2015-12-05 13:59                       ` Kacper Michajlow
@ 2015-12-05 22:40                       ` Mark Geisert
  2015-12-06  2:35                         ` Kacper Michajlow
  1 sibling, 1 reply; 21+ messages in thread
From: Mark Geisert @ 2015-12-05 22:40 UTC (permalink / raw)
  To: cygwin

Kacper Michajlow wrote:
> 2015-12-05 11:51 GMT+01:00 Mark Geisert <mark@maxrnd.com>:
>> Mark Geisert wrote:
>> In the OP's very good testcase the most heavily contended locks, by far, are
>> those internal to git's builtin/pack-objects.c.  I plan to show actual stats
>> after some more cleanup, but I did notice something in that git source file
>> that might explain the difference between Cygwin and MinGW when running this
>> testcase...
>>
>> #ifndef NO_PTHREADS
>>
>> static pthread_mutex_t read_mutex;
>> #define read_lock()             pthread_mutex_lock(&read_mutex)
>> #define read_unlock()           pthread_mutex_unlock(&read_mutex)
>>
>> static pthread_mutex_t cache_mutex;
>> #define cache_lock()            pthread_mutex_lock(&cache_mutex)
>> #define cache_unlock()          pthread_mutex_unlock(&cache_mutex)
>>
>> static pthread_mutex_t progress_mutex;
>> #define progress_lock()         pthread_mutex_lock(&progress_mutex)
>> #define progress_unlock()       pthread_mutex_unlock(&progress_mutex)
>>
>> #else
>>
>> #define read_lock()             (void)0
>> #define read_unlock()           (void)0
>> #define cache_lock()            (void)0
>> #define cache_unlock()          (void)0
>> #define progress_lock()         (void)0
>> #define progress_unlock()       (void)0
>>
>> #endif
>>
>> Is it possible the MinGW version of git is compiled with NO_PTHREADS
>> #defined?  If so, it would mean there's no locking being done at all and
>> would explain the faster execution and near 100% CPU utilization when
>> running under MinGW.
>
> Nah, there is no threading enabled when there is no pthreads. How
> would that work? :D See thread-utils.h
>
> #ifndef NO_PTHREADS
> #include <pthread.h>
>
> extern int online_cpus(void);
> extern int init_recursive_mutex(pthread_mutex_t*);
>
> #else
>
> #define online_cpus() 1
>
> #endif

We're not familiar at all with MinGW.  Could you locate the source for 
MinGW's pthread_mutex_lock() online and give us a link to it?  And BTW, 
which Windows are you running and on what kind of hardware (bitness and 
#CPUS/threads)?

It looks like we're going to have to compare actual pthread_mutex_lock() 
implementations.  Inspecting source is nice but I don't want to be 
chasing a mirage so I really hope there's a pthread_mutex_lock() 
function inside the MinGW git you are running.  gdb could easily answer 
that question.  Could you please do an 'info func pthread_mutex_lock' 
after starting MinGW git under MinGW gdb with a breakpoint at main() (so 
libraries are loaded)?

..mark



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Cygwin multithreading performance
  2015-12-05 22:40                       ` Mark Geisert
@ 2015-12-06  2:35                         ` Kacper Michajlow
  2015-12-06  8:02                           ` Mark Geisert
  0 siblings, 1 reply; 21+ messages in thread
From: Kacper Michajlow @ 2015-12-06  2:35 UTC (permalink / raw)
  To: cygwin

2015-12-05 23:40 GMT+01:00 Mark Geisert <mark@maxrnd.com>:
> Kacper Michajlow wrote:
>>
>> 2015-12-05 11:51 GMT+01:00 Mark Geisert <mark@maxrnd.com>:
>>>
>>> Mark Geisert wrote:
>>> In the OP's very good testcase the most heavily contended locks, by far,
>>> are
>>> those internal to git's builtin/pack-objects.c.  I plan to show actual
>>> stats
>>> after some more cleanup, but I did notice something in that git source
>>> file
>>> that might explain the difference between Cygwin and MinGW when running
>>> this
>>> testcase...
>>>
>>> #ifndef NO_PTHREADS
>>>
>>> static pthread_mutex_t read_mutex;
>>> #define read_lock()             pthread_mutex_lock(&read_mutex)
>>> #define read_unlock()           pthread_mutex_unlock(&read_mutex)
>>>
>>> static pthread_mutex_t cache_mutex;
>>> #define cache_lock()            pthread_mutex_lock(&cache_mutex)
>>> #define cache_unlock()          pthread_mutex_unlock(&cache_mutex)
>>>
>>> static pthread_mutex_t progress_mutex;
>>> #define progress_lock()         pthread_mutex_lock(&progress_mutex)
>>> #define progress_unlock()       pthread_mutex_unlock(&progress_mutex)
>>>
>>> #else
>>>
>>> #define read_lock()             (void)0
>>> #define read_unlock()           (void)0
>>> #define cache_lock()            (void)0
>>> #define cache_unlock()          (void)0
>>> #define progress_lock()         (void)0
>>> #define progress_unlock()       (void)0
>>>
>>> #endif
>>>
>>> Is it possible the MinGW version of git is compiled with NO_PTHREADS
>>> #defined?  If so, it would mean there's no locking being done at all and
>>> would explain the faster execution and near 100% CPU utilization when
>>> running under MinGW.
>>
>>
>> Nah, there is no threading enabled when there is no pthreads. How
>> would that work? :D See thread-utils.h
>>
>> #ifndef NO_PTHREADS
>> #include <pthread.h>
>>
>> extern int online_cpus(void);
>> extern int init_recursive_mutex(pthread_mutex_t*);
>>
>> #else
>>
>> #define online_cpus() 1
>>
>> #endif
>
>
> We're not familiar at all with MinGW.  Could you locate the source for
> MinGW's pthread_mutex_lock() online and give us a link to it?  And BTW,
> which Windows are you running and on what kind of hardware (bitness and
> #CPUS/threads)?
>
> It looks like we're going to have to compare actual pthread_mutex_lock()
> implementations.  Inspecting source is nice but I don't want to be chasing a
> mirage so I really hope there's a pthread_mutex_lock() function inside the
> MinGW git you are running.  gdb could easily answer that question.  Could
> you please do an 'info func pthread_mutex_lock' after starting MinGW git
> under MinGW gdb with a breakpoint at main() (so libraries are loaded).
>
>
> ..mark
>
>

Hmm, thinking about it, MinGW doesn't have a pthread implementation or
any wrapper for one. If someone needs pthreads they would probably go
for the pthreads-w32 implementation.

I started to wonder because I didn't recall git needing pthreads to
compile on Windows. And indeed they have a wrapper around the Windows API...
https://github.com/git/git/blob/master/compat/win32/pthread.h
https://github.com/git/git/blob/master/compat/win32/pthread.c

The point is not really that the "native" git build is fast, though, but
that Cygwin's build really struggles when it comes to MT workloads.

And unfortunately this is not the only issue with git. Download speeds are
also limited on Cygwin. I know POSIX compatibility layers come with a
price, but I would love to see improvements in those areas.
Cygwin:
Receiving objects: 100% (230458/230458), 78.41 MiB | 1.53 MiB/s, done.
"native" git:
Receiving objects: 100% (230458/230458), 78.41 MiB | 18.54 MiB/s, done.

I'm on Windows 10 x64 with an i7-5820K (6C/12T).

-Kacper


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Cygwin multithreading performance
  2015-12-06  2:35                         ` Kacper Michajlow
@ 2015-12-06  8:02                           ` Mark Geisert
  2015-12-06 20:56                             ` Kacper Michajlow
  0 siblings, 1 reply; 21+ messages in thread
From: Mark Geisert @ 2015-12-06  8:02 UTC (permalink / raw)
  To: cygwin

Kacper Michajlow wrote:
> 2015-12-05 23:40 GMT+01:00 Mark Geisert <mark@maxrnd.com>:
>> It looks like we're going to have to compare actual pthread_mutex_lock()
>> implementations.  Inspecting source is nice but I don't want to be chasing a
>> mirage so I really hope there's a pthread_mutex_lock() function inside the
>> MinGW git you are running.  gdb could easily answer that question.  Could
>> you please do an 'info func pthread_mutex_lock' after starting MinGW git
>> under MinGW gdb with a breakpoint at main() (so libraries are loaded).
[...]
> Hmm, thinking about it mingw doesn't have pthread implementation or
> any wrapper for it. If someone needs pthread they would probably go
> for pthreads-w32 implementation.
>
> I started to wonder because I don't recall git would need pthreads to
> compile on Windows. And indeed they have a wrapper for Windows API...
> https://github.com/git/git/blob/master/compat/win32/pthread.h
> https://github.com/git/git/blob/master/compat/win32/pthread.c

OK, so git has its own pthread_mutex_lock/unlock ops which map to very 
light-weight critical section operations.
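
For readers who don't know the Windows side, here is a minimal sketch of
what such a mapping can look like. The demo_mutex_* names are
hypothetical and this is not a copy of git's compat/win32/pthread.h; the
non-Windows branch is there only so the sketch builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
/* On Windows the "mutex" is simply a CRITICAL_SECTION. */
typedef CRITICAL_SECTION demo_mutex_t;
#define demo_mutex_init(m)    InitializeCriticalSection(m)
#define demo_mutex_lock(m)    EnterCriticalSection(m)
#define demo_mutex_unlock(m)  LeaveCriticalSection(m)
#define demo_mutex_destroy(m) DeleteCriticalSection(m)
#else
#include <pthread.h>
/* Elsewhere, fall back to the real pthread calls. */
typedef pthread_mutex_t demo_mutex_t;
#define demo_mutex_init(m)    pthread_mutex_init((m), NULL)
#define demo_mutex_lock(m)    pthread_mutex_lock(m)
#define demo_mutex_unlock(m)  pthread_mutex_unlock(m)
#define demo_mutex_destroy(m) pthread_mutex_destroy(m)
#endif

/* Bump a shared counter under the lock; returns the new value. */
int demo_locked_increment(demo_mutex_t *m, int *counter)
{
	int value;

	assert(m != NULL && counter != NULL);
	demo_mutex_lock(m);
	value = ++*counter;
	demo_mutex_unlock(m);
	return value;
}
```

A critical section is private to one process, which is part of why it
can be cheaper than a more general mutex object.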

> Though it is not really a matter that "native" git build is fast and
> all, but that Cygwin's one really struggles if it comes to MT workload.

In the worst cases I see using your testcase, about half the time the 
busiest locks are processed within 1 usec but there's a spectrum of 
longer latencies for the other half of the time.  I don't know (yet) if 
that can be improved in Cygwin's more general implementation but at 
least the matter has now been brought to our attention :).
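
For anyone who wants a rough look at lock latency from the application
side, a minimal sketch follows. It is not the instrumentation behind the
numbers in this thread: it times the pthread_mutex_lock() call from the
caller, and demo_now_usec and demo_timed_lock are hypothetical helper
names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in microseconds. */
int64_t demo_now_usec(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc;
	return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Lock m and return how long the call took, in microseconds. */
int64_t demo_timed_lock(pthread_mutex_t *m)
{
	int64_t start = demo_now_usec();
	int rc = pthread_mutex_lock(m);
	int64_t elapsed = demo_now_usec() - start;

	assert(rc == 0);
	(void)rc;
	return elapsed;
}
```

The caller still has to unlock the mutex afterwards. The two clock reads
add overhead of their own, so very small values should be read with care.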

> And this not only issue with git unfortunately. Download speeds are
> also limited on Cygwin. I know POSIX compatibility layers comes with a
> price but I would love to see improvements in those areas.
> Cygwin:
> Receiving objects: 100% (230458/230458), 78.41 MiB | 1.53 MiB/s, done.
> "native" git:
> Receiving objects: 100% (230458/230458), 78.41 MiB | 18.54 MiB/s, done.

You're asserting this additional testcase has the same cause.  What is 
telling you that?  And FTR what is the git command you are issuing?  I 
can then do the lock latency analysis on this new testcase if warranted.
Thanks,

..mark


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Cygwin multithreading performance
  2015-12-06  8:02                           ` Mark Geisert
@ 2015-12-06 20:56                             ` Kacper Michajlow
  2015-12-08 10:51                               ` Mark Geisert
  0 siblings, 1 reply; 21+ messages in thread
From: Kacper Michajlow @ 2015-12-06 20:56 UTC (permalink / raw)
  To: cygwin

2015-12-06 9:02 GMT+01:00 Mark Geisert <mark@maxrnd.com>:
> Kacper Michajlow wrote:
>>
>> 2015-12-05 23:40 GMT+01:00 Mark Geisert <mark@maxrnd.com>:
>>>
>>> It looks like we're going to have to compare actual pthread_mutex_lock()
>>> implementations.  Inspecting source is nice but I don't want to be
>>> chasing a
>>> mirage so I really hope there's a pthread_mutex_lock() function inside
>>> the
>>> MinGW git you are running.  gdb could easily answer that question.  Could
>>> you please do an 'info func pthread_mutex_lock' after starting MinGW git
>>> under MinGW gdb with a breakpoint at main() (so libraries are loaded).
>
> [...]
>>
>> Hmm, thinking about it mingw doesn't have pthread implementation or
>> any wrapper for it. If someone needs pthread they would probably go
>> for pthreads-w32 implementation.
>>
>> I started to wonder because I don't recall git would need pthreads to
>> compile on Windows. And indeed they have a wrapper for Windows API...
>> https://github.com/git/git/blob/master/compat/win32/pthread.h
>> https://github.com/git/git/blob/master/compat/win32/pthread.c
>
>
> OK, so git has its own pthread_mutex_lock/unlock ops which map to very
> light-weight critical section operations.
>
>> Though it is not really a matter that "native" git build is fast and
>> all, but that Cygwin's one really struggles if it comes to MT workload.
>
>
> In the worst cases I see using your testcase, about half the time the
> busiest locks are processed within 1 usec but there's a spectrum of longer
> latencies for the other half of the time.  I don't know (yet) if that can be
> improved in Cygwin's more general implementation but at least the matter has
> now been brought to our attention :).
Yes, I can imagine. Git's objects are very small, so the threading
overhead is very noticeable.

>> And this not only issue with git unfortunately. Download speeds are
>> also limited on Cygwin. I know POSIX compatibility layers comes with a
>> price but I would love to see improvements in those areas.
>> Cygwin:
>> Receiving objects: 100% (230458/230458), 78.41 MiB | 1.53 MiB/s, done.
>> "native" git:
>> Receiving objects: 100% (230458/230458), 78.41 MiB | 18.54 MiB/s, done.
>
>
> You're asserting this additional testcase has the same cause.  What is
> telling you that?  And FTR what is the git command you are issuing?  I can
> then do the lock latency analysis on this new testcase if warranted.

No, sorry, I mixed up two different things. It is just that I've been
running both git builds lately and I wanted to share another issue
before I forget about it.

This was a git clone command for some random repository from GitHub.
There are a lot of factors at play here, but the fact is that with Cygwin
the speed is capped at about 1.5 MB/s, and this is reproducible. This is
probably also related to the fact that git operates on a large number of
small objects. But this time it is a single-threaded workload. I tried to
strace this, but frankly I am not sure what to look for.

All in all, I just want to bring these issues to your attention.
Whether they are fixable or not is another story. But we will not know
unless someone with the required knowledge analyzes them.

-Kacper


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Cygwin multithreading performance
  2015-12-06 20:56                             ` Kacper Michajlow
@ 2015-12-08 10:51                               ` Mark Geisert
  2015-12-08 15:34                                 ` Corinna Vinschen
  0 siblings, 1 reply; 21+ messages in thread
From: Mark Geisert @ 2015-12-08 10:51 UTC (permalink / raw)
  To: cygwin

(Maybe cygwin-developers is a better list for this?  It's pretty obscure.)

Here are some mutex lock stats I've been talking about providing.  These are 
from the OP's original testcase 'git repack -a -f' running over a clone of the 
newlib-cygwin source tree.  Run on a 2-core, 4-HT machine under Windows 7 x64. 
I'm running a slightly modified cygwin1.dll that has 3 one-line mods to thread.cc.

I feed an strace output file through an awk script and a C program to produce 
the output below.  The first display is a summary showing all mutexes with 
latency buckets and counts for each thread and each mutex.  The second display 
shows just the last two mutexes but also shows the count of locks and unlocks 
from each source line.  You can see most mutexes have all their latencies <= 1 
usec, but there are some that have a spectrum of latencies reaching above 1000 
usecs == 1 msec.  I'm defining latency as the difference in usecs between a 
timestamp taken on entry to pthread_mutex::lock and the timestamp appearing in 
the strace output for that ::lock operation when '--mask=pthread' is specified.
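
As a rough illustration of the bucketing used in the displays below,
here is a minimal sketch. It is not the awk script or C program
mentioned above; demo_bucket_index and demo_tally are hypothetical
names, and the inclusive upper bounds are an assumption read off the
column headers.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NBUCKETS 5

/*
 * Map a latency in usecs to a bucket index:
 * 0: <=1, 1: <=10, 2: <=100, 3: <=1000, 4: >1000.
 */
size_t demo_bucket_index(long usec)
{
	static const long limits[DEMO_NBUCKETS - 1] = { 1, 10, 100, 1000 };
	size_t i;

	assert(usec >= 0);
	for (i = 0; i < DEMO_NBUCKETS - 1; i++)
		if (usec <= limits[i])
			return i;
	return DEMO_NBUCKETS - 1;
}

/* Tally n latencies into counts[]; returns the number tallied. */
size_t demo_tally(const long *usecs, size_t n, size_t counts[DEMO_NBUCKETS])
{
	size_t i;

	for (i = 0; i < DEMO_NBUCKETS; i++)
		counts[i] = 0;
	for (i = 0; i < n; i++)
		counts[demo_bucket_index(usecs[i])]++;
	return n;
}
```

Each lock falls into exactly one bucket, so the five bucket counts in a
row add up to its "lks" value. For tid 1216 below, for example,
84899 + 5045 + 91669 + 13914 + 1414 = 196941.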

I'm considering adding the tools that produced these displays to the cygutils 
package.  I'm unsure if the cygwin1.dll mods I've made locally should be shipped 
generally; I don't know how much extra CPU they use, if any.

..mark

======== first display ========
*** processes present ***
pid 4908: git
pid 7020: git

*** threads present ***
lock latency buckets:                            <=1   <=10  <=100 <=1000  >1000
tid main 0: lks 269960, ulks 269960,          269416     54    182    128    180
tid main 1: lks   6307, ulks   6307,            6304      1      2      0      0
tid   1216: lks 196941, ulks 196941,           84899   5045  91669  13914   1414
tid   4560: lks 197203, ulks 197203,           70033   4165 110333  11442   1230
tid   7840: lks  68984, ulks  68984,           34160   1389  25783   5685   1967
tid   9076: lks 166308, ulks 166308,           81715   2097  72009   8805   1682

*** mutexes present ***
lock latency buckets:                            <=1   <=10  <=100 <=1000  >1000
mtx  4908/01802F30E8 lks      0, ulks      0,      0      0      0      0      0
mtx  4908/0600000010 lks      9, ulks      9,      8      1      0      0      0
mtx  4908/0600000108 lks 179394, ulks 179394, 179361     18     14      0      1
mtx  4908/0600000160 lks      1, ulks      1,      1      0      0      0      0
mtx  4908/06000180E8 lks      0, ulks      0,      0      0      0      0      0
mtx  7020/06000180E8 lks   4182, ulks   4182,   4180      0      2      0      0
mtx  4908/0600018140 lks      0, ulks      0,      0      0      0      0      0
mtx  7020/0600018140 lks      1, ulks      1,      1      0      0      0      0
mtx  4908/0600028518 lks     18, ulks     18,     18      0      0      0      0
mtx  4908/0600038B60 lks  88002, ulks  88002,  87957     30     15      0      0
lock latency buckets:                            <=1   <=10  <=100 <=1000  >1000
mtx  4908/0600038EB0 lks    194, ulks    194,    194      0      0      0      0
mtx  4908/0600039010 lks      6, ulks      6,      6      0      0      0      0
mtx  4908/06000390A0 lks      6, ulks      6,      6      0      0      0      0
mtx  7020/0600039A20 lks      6, ulks      6,      6      0      0      0      0
mtx  4908/060003A280 lks      1, ulks      1,      1      0      0      0      0
mtx  7020/060003A280 lks      8, ulks      8,      8      0      0      0      0
mtx  4908/060003A308 lks      0, ulks      0,      0      0      0      0      0
mtx  7020/060003A308 lks      6, ulks      6,      6      0      0      0      0
mtx  4908/060003A370 lks      0, ulks      0,      0      0      0      0      0
mtx  4908/060003A3B0 lks      0, ulks      0,      0      0      0      0      0
lock latency buckets:                            <=1   <=10  <=100 <=1000  >1000
mtx  4908/060003A428 lks      0, ulks      0,      0      0      0      0      0
mtx  4908/060003A468 lks      0, ulks      0,      0      0      0      0      0
mtx  4908/060003A940 lks      0, ulks      0,      0      0      0      0      0
mtx  7020/060003A940 lks     26, ulks     26,     26      0      0      0      0
mtx  7020/060003AC90 lks    194, ulks    194,    194      0      0      0      0
mtx  7020/060003ADF0 lks      6, ulks      6,      6      0      0      0      0
mtx  4908/0600051B30 lks      1, ulks      1,      1      0      0      0      0
mtx  4908/0600051E20 lks      6, ulks      6,      6      0      0      0      0
mtx  7020/0600053A00 lks    920, ulks    920,    920      0      0      0      0
mtx  4908/0600053B20 lks    920, ulks    920,    920      0      0      0      0
lock latency buckets:                            <=1   <=10  <=100 <=1000  >1000
mtx  4908/0600062008 lks     14, ulks     14,     14      0      0      0      0
mtx  4908/06000621D0 lks      2, ulks      2,      2      0      0      0      0
mtx  4908/06000625B0 lks      6, ulks      6,      6      0      0      0      0
mtx  4908/0600063B90 lks      0, ulks      0,      0      0      0      0      0
mtx  7020/0600063B90 lks      2, ulks      2,      2      0      0      0      0
mtx  4908/0600063BE0 lks      0, ulks      0,      0      0      0      0      0
mtx  7020/0600063BE0 lks      5, ulks      5,      5      0      0      0      0
mtx  4908/0600063C30 lks      0, ulks      0,      0      0      0      0      0
mtx  7020/0600063C30 lks      2, ulks      2,      2      0      0      0      0
mtx  7020/0600063C80 lks      4, ulks      4,      4      0      0      0      0
lock latency buckets:                            <=1   <=10  <=100 <=1000  >1000
mtx  7020/0600076500 lks    920, ulks    920,    920      0      0      0      0
mtx  4908/0600114120 lks     15, ulks     15,      9      2      4      0      0
mtx  4908/060013EE78 lks    658, ulks    658,    446     17    189      6      0
mtx  4908/060026DE50 lks     12, ulks     12,      4      1      6      1      0
mtx  4908/06002A00F0 lks 155066, ulks 155066,  66359   4395  78895   4742    675
mtx  4908/06006628D0 lks      4, ulks      4,      4      0      0      0      0
mtx  4908/06007217B0 lks     23, ulks     23,     23      0      0      0      0
mtx  4908/0600784C70 lks   1529, ulks   1529,   1285     39    195     10      0
mtx  7020/0600837A80 lks     13, ulks     13,     13      0      0      0      0
mtx  4908/0600A081E8 lks     10, ulks     10,      9      1      0      0      0
lock latency buckets:                            <=1   <=10  <=100 <=1000  >1000
mtx  4908/0600A08228 lks     10, ulks     10,      5      3      2      0      0
mtx  4908/0600A082A8 lks      8, ulks      8,      6      0      2      0      0
mtx  4908/0600A082E8 lks      8, ulks      8,      3      0      5      0      0
mtx  4908/0600A08368 lks      8, ulks      8,      5      0      3      0      0
mtx  4908/0600A083A8 lks      8, ulks      8,      3      0      4      1      0
mtx  4908/0600D0A5B0 lks      2, ulks      2,      2      0      0      0      0
mtx  4908/0600F35670 lks      8, ulks      8,      8      0      0      0      0
mtx  4908/0600FA6860 lks 154745, ulks 154745,  56092   3217  64883  25435   5118
mtx  4908/060157A3B8 lks    580, ulks    580,    410     11    154      5      0
mtx  4908/060157E568 lks      4, ulks      4,      4      0      0      0      0
lock latency buckets:                            <=1   <=10  <=100 <=1000  >1000
mtx  4908/060157E5A8 lks      4, ulks      4,      2      0      2      0      0
mtx  4908/06015B1AD0 lks     12, ulks     12,      3      0      7      2      0
mtx  4908/06019741E8 lks    259, ulks    259,    186      2     54     16      1
mtx  4908/0601974228 lks    259, ulks    259,     27      0     45     63    124
mtx  4908/0602076490 lks      6, ulks      6,      2      0      3      1      0
mtx  7020/0602874000 lks     12, ulks     12,     11      1      0      0      0
mtx  4908/060345CAB0 lks      1, ulks      1,      1      0      0      0      0
mtx  4908/060347FE48 lks    316, ulks    316,    246     13     54      3      0
mtx  4908/0603498600 lks 316825, ulks 316825, 146254   4986 155345   9686    554
mtx  4908/06034C8E68 lks    436, ulks    436,    324     14     95      3      0

======== second display ========
lock latency buckets:                            <=1   <=10  <=100 <=1000  >1000
mtx  4908/0603498600 lks 316825, ulks 316825, 146254   4986 155345   9686    554
   caller 0x0100455269, count 196769, L, /usr/src/git/builtin/pack-objects.c:1695
   caller 0x01004552C4, count  15148, U, /usr/src/git/builtin/pack-objects.c:1705
   caller 0x0100455478, count 181621, U, /usr/src/git/builtin/pack-objects.c:1702
   caller 0x010045554C, count 120056, L, /usr/src/git/builtin/pack-objects.c:1834
   caller 0x010045556E, count 120056, U, /usr/src/git/builtin/pack-objects.c:1837
mtx  4908/06034C8E68 lks    436, ulks    436,    324     14     95      3      0
   caller 0x018014CC77, count      1, L, /oss/src/winsup/cygwin/thread.cc:475
   caller 0x018014CD00, count      1, U, /oss/src/winsup/cygwin/thread.cc:496
   caller 0x018014CDAF, count    432, L, /oss/src/winsup/cygwin/thread.cc:971
   caller 0x018014CDE6, count    432, U, /oss/src/winsup/cygwin/thread.cc:982
   caller 0x018014D07E, count      1, L, /oss/src/winsup/cygwin/thread.cc:1946
   caller 0x018014D090, count      1, U, /oss/src/winsup/cygwin/thread.cc:1951
   caller 0x018014D7E6, count      1, L, /oss/src/winsup/cygwin/thread.cc:525
   caller 0x018014D7FF, count      1, U, /oss/src/winsup/cygwin/thread.cc:533
   caller 0x018014EDD7, count      1, U, /oss/src/winsup/cygwin/thread.cc:2400
   caller 0x018014EE97, count      1, L, /oss/src/winsup/cygwin/thread.cc:2389
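
The per-mutex rows above can be cross-checked mechanically. What follows is only a minimal sketch, not the tool that produced the displays: the struct and the function names (mtx_row, parse_mtx_row, bucket_total, slow_share) are made up for illustration, and the row layout is assumed from the lines shown here. It parses one "mtx" row, sums the five latency buckets, and reports what fraction of lock operations fell into the slower buckets.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the mutex statistics display. Field names are illustrative. */
struct mtx_row {
    unsigned pid;
    unsigned long long addr;
    long locks;
    long unlocks;
    long bucket[5];          /* columns <=1, <=10, <=100, <=1000, >1000 */
};

/* Parse one "mtx ..." row; returns 1 on success, 0 otherwise. */
int parse_mtx_row(const char *line, struct mtx_row *r)
{
    assert(line != NULL && r != NULL);
    return sscanf(line, " mtx %u/%llx lks %ld, ulks %ld, %ld %ld %ld %ld %ld",
                  &r->pid, &r->addr, &r->locks, &r->unlocks,
                  &r->bucket[0], &r->bucket[1], &r->bucket[2],
                  &r->bucket[3], &r->bucket[4]) == 9;
}

/* Sum of all five latency buckets. */
long bucket_total(const struct mtx_row *r)
{
    long sum = 0;
    for (int i = 0; i < 5; i++)
        sum += r->bucket[i];
    return sum;
}

/* Fraction of lock operations that landed in bucket `first` or slower. */
double slow_share(const struct mtx_row *r, int first)
{
    long slow = 0;
    for (int i = first; i < 5; i++)
        slow += r->bucket[i];
    return r->locks > 0 ? (double)slow / (double)r->locks : 0.0;
}
```

For the 0603498600 row, the five buckets add up to 316825, which matches the lks column, and slow_share(&row, 3) gives roughly 3% of lock operations in the two slowest buckets.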



* Re: Cygwin multithreading performance
  2015-12-08 10:51                               ` Mark Geisert
@ 2015-12-08 15:34                                 ` Corinna Vinschen
  2015-12-08 17:02                                   ` Corinna Vinschen
  0 siblings, 1 reply; 21+ messages in thread
From: Corinna Vinschen @ 2015-12-08 15:34 UTC (permalink / raw)
  To: cygwin

On Dec  8 02:51, Mark Geisert wrote:
> (Maybe cygwin-developers is a better list for this?  It's pretty obscure.)

Yes, cygwin-developers is fine since it's gory implementation details.

> Here are some mutex lock stats I've been talking about providing.  These are
> from the OP's original testcase 'git repack -a -f' running over a clone of
> the newlib-cygwin source tree.  Run on a 2-core, 4-HT machine under Windows
> 7 x64. I'm running a slightly modified cygwin1.dll that has 3 one-line mods
> to thread.cc.

Which I'd like to see a patch of, just to know what you mean.

> I'm considering adding the tools that produced these displays to the
> cygutils package.  I'm unsure if the cygwin1.dll mods I've made locally
> should be shipped generally; I don't know how much extra CPU they use, if
> any.

Well, let's have a look.  This is open source after all :)

>   caller 0x018014CC77, count      1, L, /oss/src/winsup/cygwin/thread.cc:475
>   caller 0x018014CD00, count      1, U, /oss/src/winsup/cygwin/thread.cc:496
>   caller 0x018014CDAF, count    432, L, /oss/src/winsup/cygwin/thread.cc:971
>   caller 0x018014CDE6, count    432, U, /oss/src/winsup/cygwin/thread.cc:982
>   caller 0x018014D07E, count      1, L, /oss/src/winsup/cygwin/thread.cc:1946
>   caller 0x018014D090, count      1, U, /oss/src/winsup/cygwin/thread.cc:1951
>   caller 0x018014D7E6, count      1, L, /oss/src/winsup/cygwin/thread.cc:525
>   caller 0x018014D7FF, count      1, U, /oss/src/winsup/cygwin/thread.cc:533
>   caller 0x018014EDD7, count      1, U, /oss/src/winsup/cygwin/thread.cc:2400
>   caller 0x018014EE97, count      1, L, /oss/src/winsup/cygwin/thread.cc:2389

This is interesting.  I'm not sure whether anything in the rest of the
output shows how much time is wasted on the two call sites with count 432
above (thread.cc:971 and thread.cc:982), though.

thread.cc:971 and thread.cc:982 are pthread_setcancelstate, and it's
called pretty often as part of stdio functions.  Every stdio function
which has to lock the FILE structure also calls pthread_setcancelstate
to disable and reenable cancellation before and after locking.  That's
almost any stdio function.
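
As a rough illustration of the pattern just described (not the actual newlib code), a stdio-style function that locks the FILE might look like the sketch below. The helper name put_char_cancel_safe is hypothetical; pthread_setcancelstate, flockfile, funlockfile and putc_unlocked are the standard POSIX calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only: write one character while
 * cancellation is disabled and the FILE lock is held. */
int put_char_cancel_safe(FILE *fp, int c)
{
    int old_state, ignored, rc;

    assert(fp != NULL);

    /* Disable cancellation so the thread cannot be cancelled while it
     * holds the FILE lock. */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    flockfile(fp);

    rc = putc_unlocked(c, fp);

    funlockfile(fp);
    /* Restore whatever cancel state the caller had before. */
    pthread_setcancelstate(old_state, &ignored);

    return rc;
}
```

Each such call costs two pthread_setcancelstate invocations, so any mutex taken inside pthread_setcancelstate is paid twice per stdio operation.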

This may be one of the problems which lower performance, but there's no
easy or quick way around that, AFAICS.

There's also the fact that, even for tools using __fsetlocking to disable
stdio locking, pthread_setcancelstate will still be called unconditionally.
The question here is whether that's wrong, i.e. whether
pthread_setcancelstate should be skipped when the application sets
FSETLOCKING_BYCALLER.
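
For reference, an application opts out of internal stdio locking roughly as sketched below. __fsetlocking, FSETLOCKING_BYCALLER and FSETLOCKING_QUERY come from <stdio_ext.h>; the helper name take_over_locking is hypothetical, and the sketch assumes a libc that actually records the requested locking mode.

```c
#include <assert.h>
#include <stdio.h>
#include <stdio_ext.h>

/* Hypothetical helper, for illustration only: tell stdio that the caller
 * will do its own locking on this stream, then confirm the mode stuck.
 * Returns 1 if the stream now reports FSETLOCKING_BYCALLER, 0 otherwise. */
int take_over_locking(FILE *fp)
{
    assert(fp != NULL);

    /* From here on, stdio is not expected to lock fp internally. */
    __fsetlocking(fp, FSETLOCKING_BYCALLER);

    /* FSETLOCKING_QUERY returns the current mode without changing it. */
    return __fsetlocking(fp, FSETLOCKING_QUERY) == FSETLOCKING_BYCALLER;
}
```

After such a call the application is responsible for serializing access to the stream itself, for example with flockfile/funlockfile or its own mutex.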

Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat


* Re: Cygwin multithreading performance
  2015-12-08 15:34                                 ` Corinna Vinschen
@ 2015-12-08 17:02                                   ` Corinna Vinschen
  0 siblings, 0 replies; 21+ messages in thread
From: Corinna Vinschen @ 2015-12-08 17:02 UTC (permalink / raw)
  To: cygwin

On Dec  8 16:34, Corinna Vinschen wrote:
> On Dec  8 02:51, Mark Geisert wrote:
> > (Maybe cygwin-developers is a better list for this?  It's pretty obscure.)
> 
> Yes, cygwin-developers is fine since it's gory implementation details.
> 
> > Here are some mutex lock stats I've been talking about providing.  These are
> > from the OP's original testcase 'git repack -a -f' running over a clone of
> > the newlib-cygwin source tree.  Run on a 2-core, 4-HT machine under Windows
> > 7 x64. I'm running a slightly modified cygwin1.dll that has 3 one-line mods
> > to thread.cc.
> 
> Which I'd like to see a patch of, just to know what you mean.
> 
> > I'm considering adding the tools that produced these displays to the
> > cygutils package.  I'm unsure if the cygwin1.dll mods I've made locally
> > should be shipped generally; I don't know how much extra CPU they use, if
> > any.
> 
> Well, let's have a look.  This is open source after all :)
> 
> >   caller 0x018014CC77, count      1, L, /oss/src/winsup/cygwin/thread.cc:475
> >   caller 0x018014CD00, count      1, U, /oss/src/winsup/cygwin/thread.cc:496
> >   caller 0x018014CDAF, count    432, L, /oss/src/winsup/cygwin/thread.cc:971
> >   caller 0x018014CDE6, count    432, U, /oss/src/winsup/cygwin/thread.cc:982
> >   caller 0x018014D07E, count      1, L, /oss/src/winsup/cygwin/thread.cc:1946
> >   caller 0x018014D090, count      1, U, /oss/src/winsup/cygwin/thread.cc:1951
> >   caller 0x018014D7E6, count      1, L, /oss/src/winsup/cygwin/thread.cc:525
> >   caller 0x018014D7FF, count      1, U, /oss/src/winsup/cygwin/thread.cc:533
> >   caller 0x018014EDD7, count      1, U, /oss/src/winsup/cygwin/thread.cc:2400
> >   caller 0x018014EE97, count      1, L, /oss/src/winsup/cygwin/thread.cc:2389
> 
> This is interesting.  I'm not sure whether anything in the rest of the
> output shows how much time is wasted on the two call sites with count 432
> above (thread.cc:971 and thread.cc:982), though.
> 
> thread.cc:971 and thread.cc:982 are pthread_setcancelstate, and it's
> called pretty often as part of stdio functions.  Every stdio function
> which has to lock the FILE structure also calls pthread_setcancelstate
> to disable and reenable cancellation before and after locking.  That's
> almost any stdio function.
> 
> This may be one of the problems which lower performance, but there's no
> easy or quick way around that, AFAICS.
> 
> There's also the fact that, even for tools using __fsetlocking to disable
> stdio locking, pthread_setcancelstate will still be called unconditionally.
> The question here is whether that's wrong, i.e. whether
> pthread_setcancelstate should be skipped when the application sets
> FSETLOCKING_BYCALLER.

For a start, I simply removed the mutex lock/unlock in calls to
pthread_setcancelstate and pthread_setcanceltype.  These locks are
completely unnecessary, since these functions are only ever called for
the current thread anyway.

I'm creating a developer snapshot right now, which I'll upload to
https://cygwin.com/snapshots/ in half an hour at the latest.  Please
check whether your testcase behaves better with it.


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat


* Re: Cygwin multithreading performance
  2015-11-14  0:24 Cygwin multithreading performance Kacper Michajlow
  2015-11-19 20:24 ` Mark Geisert
@ 2015-12-18 15:06 ` Achim Gratz
  1 sibling, 0 replies; 21+ messages in thread
From: Achim Gratz @ 2015-12-18 15:06 UTC (permalink / raw)
  To: cygwin

Kacper Michajlow writes:
> I recently noticed that Cygwin multithreading is very inefficient. I
> was repacking a few git repositories, and Cygwin's git spawns
> threads, but they are so badly synchronized that there is no speed gain
> over one thread and possibly a loss because of the overhead. On my
> machine I got 7-10% CPU usage, while git built with mingw easily
> uses 100%.

I've been testing this again with my local copy of Emacs' Git repository,
and at least on this two-core system it works just fine (it was working
fine on a four-core system earlier).  The object count phase looks
serialized and doesn't go over 50%; however, a good deal of that is
system time anyway, so I assume it's file access.  The actual
compression uses whatever CPU it can get, with the occasional spike in
system time when it goes to disk (it's an SSD).

Is your repo local or on some remote filesystem?

Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptation for Waldorf microQ V2.22R2:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada


Thread overview: 21+ messages
2015-11-14  0:24 Cygwin multithreading performance Kacper Michajlow
2015-11-19 20:24 ` Mark Geisert
2015-11-20 14:25   ` Kacper Michajlow
2015-11-21  9:21     ` Mark Geisert
2015-11-21 10:53       ` Corinna Vinschen
2015-11-23  7:45         ` Mark Geisert
2015-11-23 10:27           ` John Hein
2015-11-24  1:05             ` Mark Geisert
2015-11-26  9:49               ` Corinna Vinschen
2015-11-26 10:49                 ` Mark Geisert
2015-12-05 10:51                   ` Mark Geisert
2015-12-05 13:07                     ` Kacper Michajlow
2015-12-05 13:59                       ` Kacper Michajlow
2015-12-05 22:40                       ` Mark Geisert
2015-12-06  2:35                         ` Kacper Michajlow
2015-12-06  8:02                           ` Mark Geisert
2015-12-06 20:56                             ` Kacper Michajlow
2015-12-08 10:51                               ` Mark Geisert
2015-12-08 15:34                                 ` Corinna Vinschen
2015-12-08 17:02                                   ` Corinna Vinschen
2015-12-18 15:06 ` Achim Gratz
