* Cygwin multithreading performance
@ 2015-11-14 0:24 Kacper Michajlow
2015-11-19 20:24 ` Mark Geisert
2015-12-18 15:06 ` Achim Gratz
0 siblings, 2 replies; 21+ messages in thread
From: Kacper Michajlow @ 2015-11-14 0:24 UTC (permalink / raw)
To: cygwin
Hello,
I recently noticed that Cygwin multithreading is very inefficient. I
was repacking a few git repositories, and Cygwin's git spawns
threads, but they are so badly synchronized that there is no speed gain
over one thread, and possibly a loss because of the overhead. On my
machine I got 7-10% CPU usage, while git built with mingw easily
uses 100%.
You can find the code in question here:
https://github.com/git/git/blob/master/builtin/pack-objects.c#L1967-L2094
Do you have any suggestions? Is there any chance of improving MT
workloads in Cygwin? These days it is a really big problem, in my
opinion.
Best regards
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
* Re: Cygwin multithreading performance
2015-11-14 0:24 Cygwin multithreading performance Kacper Michajlow
@ 2015-11-19 20:24 ` Mark Geisert
2015-11-20 14:25 ` Kacper Michajlow
2015-12-18 15:06 ` Achim Gratz
1 sibling, 1 reply; 21+ messages in thread
From: Mark Geisert @ 2015-11-19 20:24 UTC (permalink / raw)
To: cygwin
Kacper Michajlow wrote:
> I recently noticed that Cygwin multithreading is very inefficient. I
> was repacking few git repositories and with Cygwin's git, it spawns
> threads but they are so badly synchronized that there is no speed gain
> over one thread and possible loose because of the overhead. On my
> machine I got 7-10% CPU usage while with git build with mingw easily
> uses 100%.
>
> You can find the code in question here
> https://github.com/git/git/blob/master/builtin/pack-objects.c#L1967-L2094
>
> Do you have any suggestions? Is there any chance to get MT workloads
> improved in Cygwin? In present days it is really big problem in my
> opinion.
Although there have been some issues with Cygwin pthreads reported and
resolved, I can't recall complaints about their performance. You don't
supply much specific info so I had to guess that you must be doing
something like 'git gc' to provoke calls to the code you quote. Please
give more info if I was mistaken.
I did an strace of 'git gc' over a small source tree I have and found:
> ~/src/cygwin-cygutils strace --mask=debug+syscall+thread -o git.strace git gc
> Counting objects: 1691, done.
> Delta compression using up to 4 threads.
> Compressing objects: 100% (398/398), done.
> Writing objects: 100% (1691/1691), done.
> Total 1691 (delta 1250), reused 1691 (delta 1250)
>
> ~/src/cygwin-cygutils grep "fork(" git.strace
> 350 111164 [main] git 360 fork: 0 = fork()
> 59 113379 [main] git 4980 fork: 360 = fork()
> 496 242346 [main] git 4980 fork: 368 = fork()
> 513 242585 [main] git 368 fork: 0 = fork()
> 828 589040 [main] git 4980 fork: 4968 = fork()
> 685 589341 [main] git 4968 fork: 0 = fork()
> 591 126631 [main] git 4968 fork: 1784 = fork()
> 483 126866 [main] git 1784 fork: 0 = fork()
> 618 2320996 [main] git 4980 fork: 2912 = fork()
> 558 2321259 [main] git 2912 fork: 0 = fork()
> 555 3023781 [main] git 4980 fork: 1612 = fork()
> 500 3024002 [main] git 1612 fork: 0 = fork()
> 766 3112383 [main] git 4980 fork: 1756 = fork()
> 681 3112655 [main] git 1756 fork: 0 = fork()
There's your problem. Git is for some reason fork()ing to do its
parallel operations. fork() is very complicated to emulate on Windows
and Cygwin's fork() is already known to be slow compared to native OS
implementations.
Why is mingw faster? Inspection of run-command.c in the git source tree
(BTW thanks for the github link) shows that start_command() has two code
paths divided by "#ifndef GIT_WINDOWS_NATIVE". The Windows native path
(e.g. mingw) doesn't fork() but instead spawns subprocesses. On Cygwin
the fork() path is used. Git probably ought to use the spawn code path
on Cygwin too.
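For illustration only, here is a minimal sketch of what a spawn-style
launch looks like through the POSIX posix_spawnp() call. It is not git's
code: run_spawned() is a hypothetical helper, and git's Windows-native
path uses its own wrappers rather than this API. Whether posix_spawnp()
actually avoids fork() overhead on a given Cygwin release is an
assumption that would need measuring.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper (not from git): start argv[0] via posix_spawnp(),
 * wait for it, and return its exit status. Returns -1 if the child could
 * not be started or did not exit normally. */
static int run_spawned(char *const argv[])
{
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);

    /* One call asks for "a new process running this program", instead of
     * the caller doing fork() and then exec() in the child. */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would build a NULL-terminated argv array, e.g. { "git",
"pack-objects", ..., NULL }, and check the return value the same way it
would check the result of a fork()/exec()/waitpid() sequence.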
I don't know offhand if this is something Cygwin's git maintainer would
want to tackle or if it should be handled upstream but I'd guess the latter.
Hope this helps,
..mark
* Re: Cygwin multithreading performance
2015-11-19 20:24 ` Mark Geisert
@ 2015-11-20 14:25 ` Kacper Michajlow
2015-11-21 9:21 ` Mark Geisert
0 siblings, 1 reply; 21+ messages in thread
From: Kacper Michajlow @ 2015-11-20 14:25 UTC (permalink / raw)
To: cygwin
2015-11-19 21:24 GMT+01:00 Mark Geisert <mark@maxrnd.com>:
> Kacper Michajlow wrote:
>>
>> I recently noticed that Cygwin multithreading is very inefficient. I
>> was repacking few git repositories and with Cygwin's git, it spawns
>> threads but they are so badly synchronized that there is no speed gain
>> over one thread and possible loose because of the overhead. On my
>> machine I got 7-10% CPU usage while with git build with mingw easily
>> uses 100%.
>>
>> You can find the code in question here
>> https://github.com/git/git/blob/master/builtin/pack-objects.c#L1967-L2094
>>
>> Do you have any suggestions? Is there any chance to get MT workloads
>> improved in Cygwin? In present days it is really big problem in my
>> opinion.
>
>
> Although there have been some issues with Cygwin pthreads reported and
> resolved, I can't recall complaints about their performance. You don't
> supply much specific info so I had to guess that you must be doing something
> like 'git gc' to provoke calls to the code you quote. Please give more info
> if I was mistaken.
>
> I did an strace of 'git gc' over a small source tree I have and found:
>
>> ~/src/cygwin-cygutils strace --mask=debug+syscall+thread -o git.strace git
>> gc
>> Counting objects: 1691, done.
>> Delta compression using up to 4 threads.
>> Compressing objects: 100% (398/398), done.
>> Writing objects: 100% (1691/1691), done.
>> Total 1691 (delta 1250), reused 1691 (delta 1250)
>>
>> ~/src/cygwin-cygutils grep "fork(" git.strace
>> 350 111164 [main] git 360 fork: 0 = fork()
>> 59 113379 [main] git 4980 fork: 360 = fork()
>> 496 242346 [main] git 4980 fork: 368 = fork()
>> 513 242585 [main] git 368 fork: 0 = fork()
>> 828 589040 [main] git 4980 fork: 4968 = fork()
>> 685 589341 [main] git 4968 fork: 0 = fork()
>> 591 126631 [main] git 4968 fork: 1784 = fork()
>> 483 126866 [main] git 1784 fork: 0 = fork()
>> 618 2320996 [main] git 4980 fork: 2912 = fork()
>> 558 2321259 [main] git 2912 fork: 0 = fork()
>> 555 3023781 [main] git 4980 fork: 1612 = fork()
>> 500 3024002 [main] git 1612 fork: 0 = fork()
>> 766 3112383 [main] git 4980 fork: 1756 = fork()
>> 681 3112655 [main] git 1756 fork: 0 = fork()
>
>
> There's your problem. Git is for some reason fork()ing to do its parallel
> operations. fork() is very complicated to emulate on Windows and Cygwin's
> fork() is already known to be slow compared to native OS implementations.
>
> Why is mingw faster? Inspection of run-command.c in the git source tree
> (BTW thanks for the github link) shows that start_command() has two code
> paths divided by "#ifndef GIT_WINDOWS_NATIVE". The Windows native path
> (e.g. mingw) doesn't fork() but instead spawns subprocesses. On Cygwin the
> fork() path is used. Git probably ought to use the spawn code path on
> Cygwin too.
>
> I don't know offhand if this is something Cygwin's git maintainer would want
> to tackle or if it should be handled upstream but I'd guess the latter.
> Hope this helps,
>
> ..mark
Thanks for the reply, and sorry for not being specific enough before. 'git
gc' is a driver which runs various git commands to clean up a
repository, but I'm mostly concerned about the code I linked.
Instead of 'git gc' it is better to test 'git repack -a -f' directly,
preferably on a repository where it takes some time.
'git://sourceware.org/git/newlib-cygwin.git' is a good test case.
Although the performance hit is bigger with bigger repositories, this is
a good example to see what's going on.
I'm well aware that forking on Windows is problematic, but I am
explicitly interested in the parallelized part of execution. I don't care
about forks; while they slow things down too, they are not used in
the compression process, which is parallelized over all the CPU threads.
Each command is indeed forked, but I'm only interested in the
pack-objects part, hence the code I linked.
Here is my result on a minimal test.
$ strace --mask=debug+syscall+thread -o git.strace git repack -a -f
Counting objects: 156690, done.
Delta compression using up to 12 threads.
Compressing objects: 100% (154730/154730), done.
Writing objects: 100% (156690/156690), done.
Total 156690 (delta 123449), reused 33146 (delta 0)
$ grep "fork(" git.strace
559 53728 [main] git 24340 fork: 24368 = fork()
465 54022 [main] git 24368 fork: 0 = fork()
Only two forks were created, while during compression only 25% CPU was
used (on a big repo like the Linux kernel it doesn't exceed 8%). With native
git the same workload easily uses 95-100% CPU and is therefore a lot
faster.
I know I'm not that specific, but I don't know what more to say here.
I could try to produce a sample app to illustrate the issue, but git is
already a good example, I think: pure C with pthreads. I already linked
the code in my first email.
Tell me how I can help diagnose it further.
-Kacper
* Re: Cygwin multithreading performance
2015-11-20 14:25 ` Kacper Michajlow
@ 2015-11-21 9:21 ` Mark Geisert
2015-11-21 10:53 ` Corinna Vinschen
0 siblings, 1 reply; 21+ messages in thread
From: Mark Geisert @ 2015-11-21 9:21 UTC (permalink / raw)
To: cygwin
Kacper Michajlow wrote:
> Thanks for reply. And sorry for being not specific enough before. 'git
> gc' is a driver which runs various git command to do cleanup in
> repository. Though I'm mostly concerned about the code I linked.
> Instead of 'git gc' it is better to test directly 'git repack -a -f'
> and possibly on repository where it takes some time.
> 'git://sourceware.org/git/newlib-cygwin.git' is good test case.
> Although with bigger repositories performance hit is bigger, this is
> good example to see what's going on.
I appreciate the more specific info on how you experience the issue.
> I'm well aware that forking on windows is problematic, but I
> explicitly interested in parallelized part of execution. I don't care
> about forks, while this slows things down too, they are not used in
> compression process which is parallelized over the all cpu threads.
> Each command is indeed forked, but I'm only interested about
> pack-objects part hence the code I linked.
OK, we're on the same page now :).
> $ strace --mask=debug+syscall+thread -o git.strace git repack -a -f
> Counting objects: 156690, done.
> Delta compression using up to 12 threads.
> Compressing objects: 100% (154730/154730), done.
> Writing objects: 100% (156690/156690), done.
> Total 156690 (delta 123449), reused 33146 (delta 0)
>
> $ grep "fork(" git.strace
> 559 53728 [main] git 24340 fork: 24368 = fork()
> 465 54022 [main] git 24368 fork: 0 = fork()
>
> Only two forks were created, while during compression only 25% cpu was
> used (on big repo like linux kernel it doesn't exceed 8%). With native
> git the same workload easily uses 95-100% cpu and therefor is a lot
> faster.
I was able to reproduce your issue using a cloned newlib-cygwin repo.
On a 6-CPU machine I saw max 36% CPU utilization during the compression
phase. ProcessExplorer showed all 6 threads were getting CPU time (to
varying degrees) and when suspended they were always trying to acquire a
mutex. I'd like to run some more straces and perhaps investigate with
some other tools before saying more. This may take a while.
What I've done so far is install the git-debuginfo and cygwin-debuginfo
packages so that I can convert hex RIP addresses to line numbers. I've
run the testcase under gdb so I can interrupt at random times and poke
around. The straces from this testcase are ginormous so I hope I can
figure out a better way to see why the compression threads aren't
CPU-bound like they should be. If you don't already know, 'strace
--help' shows the available mask values. The threads are each writing
to disk, so I wonder if there's some unintentional serialization going
on somewhere, but I don't know yet how I could verify that theory.
..mark
* Re: Cygwin multithreading performance
2015-11-21 9:21 ` Mark Geisert
@ 2015-11-21 10:53 ` Corinna Vinschen
2015-11-23 7:45 ` Mark Geisert
0 siblings, 1 reply; 21+ messages in thread
From: Corinna Vinschen @ 2015-11-21 10:53 UTC (permalink / raw)
To: cygwin
On Nov 21 01:21, Mark Geisert wrote:
> Kacper Michajlow wrote:
> >Thanks for reply. And sorry for being not specific enough before. 'git
> >gc' is a driver which runs various git command to do cleanup in
> >repository. Though I'm mostly concerned about the code I linked.
> >Instead of 'git gc' it is better to test directly 'git repack -a -f'
> >and possibly on repository where it takes some time.
> >'git://sourceware.org/git/newlib-cygwin.git' is good test case.
> >Although with bigger repositories performance hit is bigger, this is
> >good example to see what's going on.
>
> I appreciate that more specific info on how you experience the issue.
>
> >I'm well aware that forking on windows is problematic, but I
> >explicitly interested in parallelized part of execution. I don't care
> >about forks, while this slows things down too, they are not used in
> >compression process which is parallelized over the all cpu threads.
> >Each command is indeed forked, but I'm only interested about
> >pack-objects part hence the code I linked.
>
> OK, we're on the same page now :).
>
> >$ strace --mask=debug+syscall+thread -o git.strace git repack -a -f
> >Counting objects: 156690, done.
> >Delta compression using up to 12 threads.
> >Compressing objects: 100% (154730/154730), done.
> >Writing objects: 100% (156690/156690), done.
> >Total 156690 (delta 123449), reused 33146 (delta 0)
> >
> >$ grep "fork(" git.strace
> > 559 53728 [main] git 24340 fork: 24368 = fork()
> > 465 54022 [main] git 24368 fork: 0 = fork()
> >
> >Only two forks were created, while during compression only 25% cpu was
> >used (on big repo like linux kernel it doesn't exceed 8%). With native
> >git the same workload easily uses 95-100% cpu and therefor is a lot
> >faster.
>
> I was able to reproduce your issue using a cloned newlib-cygwin repo. On a
> 6-CPU machine I saw max 36% CPU utilization during the compression phase.
> ProcessExplorer showed all 6 threads were getting CPU time (to varying
> degrees) and when suspended they were always trying to acquire a mutex. I'd
> like to run some more straces and perhaps investigate with some other tools
> before saying more. This may take a while.
>
> What I've done so far is install the git-debuginfo and cygwin-debuginfo
> packages to that I can convert hex RIP addresses to line numbers. I've run
> the testcase under gdb so I can interrupt at random times and poke around.
> The straces from this testcase are ginormous so I hope I can figure out a
> better way to see why the compression threads aren't CPU-bound like they
> should be. If you don't already know, 'strace --help' shows the available
> mask values. The threads are each writing to disk, so I wonder if there's
> some unintentional serialization going on somewhere, but I don't know yet
> how I could verify that theory.
If I'm allowed to make an educated guess, the big serializers in Cygwin
are probably the calls to malloc, calloc, realloc, and free. We desperately
need a new malloc implementation better suited to multi-threading.
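One rough way to test that guess outside of git is a small allocation
stress test. The sketch below is an assumption-laden illustration, not
part of Cygwin or git: run_alloc_workers(), alloc_worker() and
MAX_WORKERS are hypothetical names. Each thread does nothing but
malloc()/free() pairs, so if wall-clock time grows with the thread count
instead of staying flat, the allocator is a likely serialization point.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_WORKERS 64   /* hypothetical upper bound for this sketch */

struct alloc_job {
    long iters;   /* malloc/free pairs requested */
    long done;    /* malloc/free pairs completed */
};

/* Thread body: allocate, touch and free small blocks of varying size. */
static void *alloc_worker(void *arg)
{
    struct alloc_job *job = arg;

    for (long i = 0; i < job->iters; i++) {
        char *p = malloc(64 + (size_t)(i % 256));
        if (p == NULL)
            break;
        /* Volatile store so the compiler cannot drop the allocation. */
        *(volatile char *)p = (char)i;
        free(p);
        job->done++;
    }
    return NULL;
}

/* Run nthreads workers, each doing iters malloc/free pairs, and return
 * the total number of pairs completed across all threads. */
static long run_alloc_workers(int nthreads, long iters)
{
    pthread_t tid[MAX_WORKERS];
    struct alloc_job job[MAX_WORKERS];
    long total = 0;
    int started = 0;

    assert(nthreads > 0 && nthreads <= MAX_WORKERS);

    for (int i = 0; i < nthreads; i++) {
        job[i].iters = iters;
        job[i].done = 0;
        if (pthread_create(&tid[i], NULL, alloc_worker, &job[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        total += job[i].done;
    }
    return total;
}
```

Calling run_alloc_workers(1, N) and run_alloc_workers(T, N) from a small
main() and timing both runs on Cygwin and on a native build would give a
first indication; it cannot by itself prove where the lock is held.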
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
* Re: Cygwin multithreading performance
2015-11-21 10:53 ` Corinna Vinschen
@ 2015-11-23 7:45 ` Mark Geisert
2015-11-23 10:27 ` John Hein
0 siblings, 1 reply; 21+ messages in thread
From: Mark Geisert @ 2015-11-23 7:45 UTC (permalink / raw)
To: cygwin
Corinna Vinschen wrote:
> On Nov 21 01:21, Mark Geisert wrote:
[...] so I wonder if there's
>> some unintentional serialization going on somewhere, but I don't know yet
>> how I could verify that theory.
>
> If I'm allowed to make an educated guess, the big serializer in Cygwin
> are probably the calls to malloc, calloc, realloc, free. We desperately
> need a new malloc implementation better suited to multi-threading.
That's very helpful to know. I'd want to first make sure the heavy lock
activity I'm seeing in the traces really is due to malloc() and friends
but I couldn't help a speculative search online for multithread-safe
malloc(). These turned up:
tcmalloc - part of google-perftools, requires libunwind, evidently
not yet ported to Windows AFAICT,
nedmalloc - http://www.nedprod.com/programs/portable/nedmalloc/
ptmalloc - http://www.malloc.de/
The latter two are based on Doug Lea's dlmalloc which is also the basis
of Cygwin's malloc() functions. As I understand it, ptmalloc in one
form or another has been part of glibc on Linux for some time.
So there may be a solution in sight if we need to go that direction. Of
course, SHTDI as usual :).
..mark
* Re: Cygwin multithreading performance
2015-11-23 7:45 ` Mark Geisert
@ 2015-11-23 10:27 ` John Hein
2015-11-24 1:05 ` Mark Geisert
0 siblings, 1 reply; 21+ messages in thread
From: John Hein @ 2015-11-23 10:27 UTC (permalink / raw)
To: cygwin
Mark Geisert wrote at 23:45 -0800 on Nov 22, 2015:
> Corinna Vinschen wrote:
> > On Nov 21 01:21, Mark Geisert wrote:
> [...] so I wonder if there's
> >> some unintentional serialization going on somewhere, but I don't know yet
> >> how I could verify that theory.
> >
> > If I'm allowed to make an educated guess, the big serializer in Cygwin
> > are probably the calls to malloc, calloc, realloc, free. We desperately
> > need a new malloc implementation better suited to multi-threading.
>
> That's very helpful to know. I'd want to first make sure the heavy lock
> activity I'm seeing in the traces really is due to malloc() and friends
> but I couldn't help a speculative search online for multithread-safe
> malloc(). These turned up:
> tcmalloc - part of google-perftools, requires libunwind, evidently
> not yet ported to Windows AFAICT,
> nedmalloc - http://www.nedprod.com/programs/portable/nedmalloc/
> ptmalloc - http://www.malloc.de/
>
> The latter two are based on Doug Lea's dlmalloc which is also the basis
> of Cygwin's malloc() functions. As I understand it, ptmalloc in one
> form or another has been part of glibc on Linux for some time.
>
> So there may be a solution in sight if we need to go that direction. Of
> course, SHTDI as usual :).
>
> ...mark
Someone recently mentioned on this list they were working on porting
jemalloc. That would be a good choice.
* Re: Cygwin multithreading performance
2015-11-23 10:27 ` John Hein
@ 2015-11-24 1:05 ` Mark Geisert
2015-11-26 9:49 ` Corinna Vinschen
0 siblings, 1 reply; 21+ messages in thread
From: Mark Geisert @ 2015-11-24 1:05 UTC (permalink / raw)
To: cygwin
John Hein wrote:
> Mark Geisert wrote at 23:45 -0800 on Nov 22, 2015:
> > Corinna Vinschen wrote:
> > > On Nov 21 01:21, Mark Geisert wrote:
> > [...] so I wonder if there's
> > >> some unintentional serialization going on somewhere, but I don't know yet
> > >> how I could verify that theory.
> > >
> > > If I'm allowed to make an educated guess, the big serializer in Cygwin
> > > are probably the calls to malloc, calloc, realloc, free. We desperately
> > > need a new malloc implementation better suited to multi-threading.
> >
> > That's very helpful to know. I'd want to first make sure the heavy lock
> > activity I'm seeing in the traces really is due to malloc() and friends
> > but I couldn't help a speculative search online for multithread-safe
> > malloc(). These turned up:
> > tcmalloc - part of google-perftools, requires libunwind, evidently
> > not yet ported to Windows AFAICT,
> > nedmalloc - http://www.nedprod.com/programs/portable/nedmalloc/
> > ptmalloc - http://www.malloc.de/
> >
> > The latter two are based on Doug Lea's dlmalloc which is also the basis
> > of Cygwin's malloc() functions. As I understand it, ptmalloc in one
> > form or another has been part of glibc on Linux for some time.
> >
> > So there may be a solution in sight if we need to go that direction. Of
> > course, SHTDI as usual :).
> >
> > ...mark
>
> Someone recently mentioned on this list they were working on porting
> jemalloc. That would be a good choice.
Indeed; thanks for the reminder. Somehow I hadn't followed that thread.
..mark
* Re: Cygwin multithreading performance
2015-11-24 1:05 ` Mark Geisert
@ 2015-11-26 9:49 ` Corinna Vinschen
2015-11-26 10:49 ` Mark Geisert
0 siblings, 1 reply; 21+ messages in thread
From: Corinna Vinschen @ 2015-11-26 9:49 UTC (permalink / raw)
To: cygwin
On Nov 23 16:54, Mark Geisert wrote:
> John Hein wrote:
> >Mark Geisert wrote at 23:45 -0800 on Nov 22, 2015:
> > > Corinna Vinschen wrote:
> > > > On Nov 21 01:21, Mark Geisert wrote:
> > > [...] so I wonder if there's
> > > >> some unintentional serialization going on somewhere, but I don't know yet
> > > >> how I could verify that theory.
> > > >
> > > > If I'm allowed to make an educated guess, the big serializer in Cygwin
> > > > are probably the calls to malloc, calloc, realloc, free. We desperately
> > > > need a new malloc implementation better suited to multi-threading.
> > >
> > > That's very helpful to know. I'd want to first make sure the heavy lock
> > > activity I'm seeing in the traces really is due to malloc() and friends
> > > but I couldn't help a speculative search online for multithread-safe
> > > malloc(). These turned up:
> > > tcmalloc - part of google-perftools, requires libunwind, evidently
> > > not yet ported to Windows AFAICT,
> > > nedmalloc - http://www.nedprod.com/programs/portable/nedmalloc/
> > > ptmalloc - http://www.malloc.de/
> > >
> > > The latter two are based on Doug Lea's dlmalloc which is also the basis
> > > of Cygwin's malloc() functions. As I understand it, ptmalloc in one
> > > form or another has been part of glibc on Linux for some time.
> > >
> > > So there may be a solution in sight if we need to go that direction. Of
> > > course, SHTDI as usual :).
> > >
> > > ...mark
> >
> >Someone recently mentioned on this list they were working on porting
> >jemalloc. That would be a good choice.
>
> Indeed; thanks for the reminder. Somehow I hadn't followed that thread.
Indeed^2. Did you look into the locking any further to see if there's
more than one culprit? I guess we've a rather long way to a "lock-less
kernel"...
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
* Re: Cygwin multithreading performance
2015-11-26 9:49 ` Corinna Vinschen
@ 2015-11-26 10:49 ` Mark Geisert
2015-12-05 10:51 ` Mark Geisert
0 siblings, 1 reply; 21+ messages in thread
From: Mark Geisert @ 2015-11-26 10:49 UTC (permalink / raw)
To: cygwin
Corinna Vinschen wrote:
> On Nov 23 16:54, Mark Geisert wrote:
>> John Hein wrote:
>>> Mark Geisert wrote at 23:45 -0800 on Nov 22, 2015:
>>> > Corinna Vinschen wrote:
>>> > > On Nov 21 01:21, Mark Geisert wrote:
>>> > [...] so I wonder if there's
>>> > >> some unintentional serialization going on somewhere, but I don't know yet
>>> > >> how I could verify that theory.
>>> > >
>>> > > If I'm allowed to make an educated guess, the big serializer in Cygwin
>>> > > are probably the calls to malloc, calloc, realloc, free. We desperately
>>> > > need a new malloc implementation better suited to multi-threading.
[...]
>>>
>>> Someone recently mentioned on this list they were working on porting
>>> jemalloc. That would be a good choice.
>>
>> Indeed; thanks for the reminder. Somehow I hadn't followed that thread.
>
> Indeed^2. Did you look into the locking any further to see if there's
> more than one culprit? I guess we've a rather long way to a "lock-less
> kernel"...
It took me a while to figure out what I wanted to see in the strace
logs. I ended up adding a small patch to pthread_mutex::lock() to
record a timestamp on entry, and also log that in the pthread_printf()
near the end of the method. With that I'm able to see how long a thread
has to wait for a lock before actually acquiring it. That will allow me
to unravel the sequence of locking and unlocking and give stats for all
threads and/or locks. That could be generally useful to evaluate
different memory allocators or different locking strategies using the
same allocator.
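The same measurement idea can be sketched at application level, without
patching the Cygwin DLL. This is not the patch described above: it is a
hypothetical wrapper (timed_lock(), struct lock_stats and now_ns() are
made-up names) that timestamps entry, takes the lock, and charges the
elapsed time to a per-lock counter.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Per-lock statistics; only updated while the lock itself is held. */
struct lock_stats {
    long acquisitions;   /* successful lock calls */
    long long wait_ns;   /* total time between entry and acquisition */
};

/* Monotonic clock reading in nanoseconds. */
static long long now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Lock m and record how long the caller had to wait for it.
 * Returns the pthread_mutex_lock() result (0 on success). */
static int timed_lock(pthread_mutex_t *m, struct lock_stats *st)
{
    long long entered;
    int rc;

    assert(m != NULL && st != NULL);

    entered = now_ns();            /* timestamp on entry */
    rc = pthread_mutex_lock(m);
    if (rc == 0) {
        /* We own m here, so updating st is race-free. */
        st->wait_ns += now_ns() - entered;
        st->acquisitions++;
    }
    return rc;
}
```

Keeping one struct lock_stats per mutex and dividing wait_ns by
acquisitions gives an average wait per lock, which is enough to rank
locks by contention; attributing waits to call sites would additionally
need the caller's address, as noted below.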
But that is just groundwork to identifying which locks are suffering the
most contention. To identify them at source level I think I'll also
need to record the caller's RIP when they are being locked.
In the raw strace data I'm looking at for the OP's testcase, I can see a
lot of cases where a thread wants a lock but is delayed for milliseconds
before getting ahold of it. I can't say ATM whether it's just one or a
few locks suffering this way, or more. Work continues :).
..mark
* Re: Cygwin multithreading performance
2015-11-26 10:49 ` Mark Geisert
@ 2015-12-05 10:51 ` Mark Geisert
2015-12-05 13:07 ` Kacper Michajlow
0 siblings, 1 reply; 21+ messages in thread
From: Mark Geisert @ 2015-12-05 10:51 UTC (permalink / raw)
To: cygwin
Mark Geisert wrote:
> Corinna Vinschen wrote:
>> On Nov 23 16:54, Mark Geisert wrote:
>>> John Hein wrote:
>>>> Mark Geisert wrote at 23:45 -0800 on Nov 22, 2015:
>>>> > Corinna Vinschen wrote:
>>>> > > On Nov 21 01:21, Mark Geisert wrote:
>>>> > [...] so I wonder if there's
>>>> > >> some unintentional serialization going on somewhere, but I
>>>> don't know yet
>>>> > >> how I could verify that theory.
>>>> > >
>>>> > > If I'm allowed to make an educated guess, the big serializer
>>>> in Cygwin
>>>> > > are probably the calls to malloc, calloc, realloc, free. We
>>>> desperately
>>>> > > need a new malloc implementation better suited to
>>>> multi-threading.
> [...]
>>>>
>>>> Someone recently mentioned on this list they were working on porting
>>>> jemalloc. That would be a good choice.
>>>
>>> Indeed; thanks for the reminder. Somehow I hadn't followed that thread.
>>
>> Indeed^2. Did you look into the locking any further to see if there's
>> more than one culprit? I guess we've a rather long way to a "lock-less
>> kernel"...
[...]
> But that is just groundwork to identifying which locks are suffering the
> most contention. To identify them at source level I think I'll also
> need to record the caller's RIP when they are being locked.
In the OP's very good testcase the most heavily contended locks, by far,
are those internal to git's builtin/pack-objects.c. I plan to show
actual stats after some more cleanup, but I did notice something in that
git source file that might explain the difference between Cygwin and
MinGW when running this testcase...
#ifndef NO_PTHREADS
static pthread_mutex_t read_mutex;
#define read_lock() pthread_mutex_lock(&read_mutex)
#define read_unlock() pthread_mutex_unlock(&read_mutex)
static pthread_mutex_t cache_mutex;
#define cache_lock() pthread_mutex_lock(&cache_mutex)
#define cache_unlock() pthread_mutex_unlock(&cache_mutex)
static pthread_mutex_t progress_mutex;
#define progress_lock() pthread_mutex_lock(&progress_mutex)
#define progress_unlock() pthread_mutex_unlock(&progress_mutex)
#else
#define read_lock() (void)0
#define read_unlock() (void)0
#define cache_lock() (void)0
#define cache_unlock() (void)0
#define progress_lock() (void)0
#define progress_unlock() (void)0
#endif
Is it possible the MinGW version of git is compiled with NO_PTHREADS
#defined? If so, it would mean there's no locking being done at all and
would explain the faster execution and near 100% CPU utilization when
running under MinGW.
..mark
* Re: Cygwin multithreading performance
2015-12-05 10:51 ` Mark Geisert
@ 2015-12-05 13:07 ` Kacper Michajlow
2015-12-05 13:59 ` Kacper Michajlow
2015-12-05 22:40 ` Mark Geisert
0 siblings, 2 replies; 21+ messages in thread
From: Kacper Michajlow @ 2015-12-05 13:07 UTC (permalink / raw)
To: cygwin
2015-12-05 11:51 GMT+01:00 Mark Geisert <mark@maxrnd.com>:
> Mark Geisert wrote:
>>
>> Corinna Vinschen wrote:
>>>
>>> On Nov 23 16:54, Mark Geisert wrote:
>>>>
>>>> John Hein wrote:
>>>>>
>>>>> Mark Geisert wrote at 23:45 -0800 on Nov 22, 2015:
>>>>> > Corinna Vinschen wrote:
>>>>> > > On Nov 21 01:21, Mark Geisert wrote:
>>>>> > [...] so I wonder if there's
>>>>> > >> some unintentional serialization going on somewhere, but I
>>>>> don't know yet
>>>>> > >> how I could verify that theory.
>>>>> > >
>>>>> > > If I'm allowed to make an educated guess, the big serializer
>>>>> in Cygwin
>>>>> > > are probably the calls to malloc, calloc, realloc, free. We
>>>>> desperately
>>>>> > > need a new malloc implementation better suited to
>>>>> multi-threading.
>>
>> [...]
>>>>>
>>>>>
>>>>> Someone recently mentioned on this list they were working on porting
>>>>> jemalloc. That would be a good choice.
>>>>
>>>>
>>>> Indeed; thanks for the reminder. Somehow I hadn't followed that thread.
>>>
>>>
>>> Indeed^2. Did you look into the locking any further to see if there's
>>> more than one culprit? I guess we've a rather long way to a "lock-less
>>> kernel"...
>
> [...]
>>
>> But that is just groundwork to identifying which locks are suffering the
>> most contention. To identify them at source level I think I'll also
>> need to record the caller's RIP when they are being locked.
>
>
> In the OP's very good testcase the most heavily contended locks, by far, are
> those internal to git's builtin/pack-objects.c. I plan to show actual stats
> after some more cleanup, but I did notice something in that git source file
> that might explain the difference between Cygwin and MinGW when running this
> testcase...
>
> #ifndef NO_PTHREADS
>
> static pthread_mutex_t read_mutex;
> #define read_lock() pthread_mutex_lock(&read_mutex)
> #define read_unlock() pthread_mutex_unlock(&read_mutex)
>
> static pthread_mutex_t cache_mutex;
> #define cache_lock() pthread_mutex_lock(&cache_mutex)
> #define cache_unlock() pthread_mutex_unlock(&cache_mutex)
>
> static pthread_mutex_t progress_mutex;
> #define progress_lock() pthread_mutex_lock(&progress_mutex)
> #define progress_unlock() pthread_mutex_unlock(&progress_mutex)
>
> #else
>
> #define read_lock() (void)0
> #define read_unlock() (void)0
> #define cache_lock() (void)0
> #define cache_unlock() (void)0
> #define progress_lock() (void)0
> #define progress_unlock() (void)0
>
> #endif
>
> Is it possible the MinGW version of git is compiled with NO_PTHREADS
> #defined? If so, it would mean there's no locking being done at all and
> would explain the faster execution and near 100% CPU utilization when
> running under MinGW.
Nah, threading isn't enabled at all when pthreads is unavailable. How
would that work? :D See thread-utils.h:
#ifndef NO_PTHREADS
#include <pthread.h>
extern int online_cpus(void);
extern int init_recursive_mutex(pthread_mutex_t*);
#else
#define online_cpus() 1
#endif
It looks like there is indeed a bug in git's code when "--threads" is passed
explicitly to "git pack-objects": it shows a warning about threads being
unsupported but doesn't overwrite the delta_search_threads value. I will
raise it on git's ML. This is completely unrelated to our issue.
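For anyone following along, the pattern quoted from pack-objects.c boils
down to something like the sketch below. It is only an illustration, not
git's code: counter_mutex, counter_lock(), counter_unlock() and
bump_counter() are made-up names. The call sites look identical in both
builds, and with NO_PTHREADS defined the lock macros compile away to nothing.

```c
#ifndef NO_PTHREADS
#include <pthread.h>

/* Threaded build: the macros take and release a real mutex. */
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
#define counter_lock()   pthread_mutex_lock(&counter_mutex)
#define counter_unlock() pthread_mutex_unlock(&counter_mutex)

#else

/* Build without pthreads: the macros expand to no-ops. */
#define counter_lock()   (void)0
#define counter_unlock() (void)0

#endif

static long counter;

/* Hypothetical helper: the caller never needs to know which build this is. */
static long bump_counter(void)
{
    long value;

    counter_lock();
    value = ++counter;
    counter_unlock();
    return value;
}
```

That only makes sense if the no-op build never starts a second thread, which
is what the online_cpus() fallback to 1 is there for.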
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Cygwin multithreading performance
2015-12-05 13:07 ` Kacper Michajlow
@ 2015-12-05 13:59 ` Kacper Michajlow
2015-12-05 22:40 ` Mark Geisert
1 sibling, 0 replies; 21+ messages in thread
From: Kacper Michajlow @ 2015-12-05 13:59 UTC (permalink / raw)
To: cygwin
2015-12-05 14:07 GMT+01:00 Kacper Michajlow <kasper93@gmail.com>:
> 2015-12-05 11:51 GMT+01:00 Mark Geisert <mark@maxrnd.com>:
>> Mark Geisert wrote:
>>>
>>> Corinna Vinschen wrote:
>>>>
>>>> On Nov 23 16:54, Mark Geisert wrote:
>>>>>
>>>>> John Hein wrote:
>>>>>>
>>>>>> Mark Geisert wrote at 23:45 -0800 on Nov 22, 2015:
>>>>>> > Corinna Vinschen wrote:
>>>>>> > > On Nov 21 01:21, Mark Geisert wrote:
>>>>>> > [...] so I wonder if there's
>>>>>> > >> some unintentional serialization going on somewhere, but I
>>>>>> don't know yet
>>>>>> > >> how I could verify that theory.
>>>>>> > >
>>>>>> > > If I'm allowed to make an educated guess, the big serializer
>>>>>> in Cygwin
>>>>>> > > are probably the calls to malloc, calloc, realloc, free. We
>>>>>> desperately
>>>>>> > > need a new malloc implementation better suited to
>>>>>> multi-threading.
>>>
>>> [...]
>>>>>>
>>>>>>
>>>>>> Someone recently mentioned on this list they were working on porting
>>>>>> jemalloc. That would be a good choice.
>>>>>
>>>>>
>>>>> Indeed; thanks for the reminder. Somehow I hadn't followed that thread.
>>>>
>>>>
>>>> Indeed^2. Did you look into the locking any further to see if there's
>>>> more than one culprit? I guess we've a rather long way to a "lock-less
>>>> kernel"...
>>
>> [...]
>>>
>>> But that is just groundwork to identifying which locks are suffering the
>>> most contention. To identify them at source level I think I'll also
>>> need to record the caller's RIP when they are being locked.
>>
>>
>> In the OP's very good testcase the most heavily contended locks, by far, are
>> those internal to git's builtin/pack-objects.c. I plan to show actual stats
>> after some more cleanup, but I did notice something in that git source file
>> that might explain the difference between Cygwin and MinGW when running this
>> testcase...
>>
>> #ifndef NO_PTHREADS
>>
>> static pthread_mutex_t read_mutex;
>> #define read_lock() pthread_mutex_lock(&read_mutex)
>> #define read_unlock() pthread_mutex_unlock(&read_mutex)
>>
>> static pthread_mutex_t cache_mutex;
>> #define cache_lock() pthread_mutex_lock(&cache_mutex)
>> #define cache_unlock() pthread_mutex_unlock(&cache_mutex)
>>
>> static pthread_mutex_t progress_mutex;
>> #define progress_lock() pthread_mutex_lock(&progress_mutex)
>> #define progress_unlock() pthread_mutex_unlock(&progress_mutex)
>>
>> #else
>>
>> #define read_lock() (void)0
>> #define read_unlock() (void)0
>> #define cache_lock() (void)0
>> #define cache_unlock() (void)0
>> #define progress_lock() (void)0
>> #define progress_unlock() (void)0
>>
>> #endif
>>
>> Is it possible the MinGW version of git is compiled with NO_PTHREADS
>> #defined? If so, it would mean there's no locking being done at all and
>> would explain the faster execution and near 100% CPU utilization when
>> running under MinGW.
>
> Nah, there is no threading enabled when there is no pthreads. How
> would that work? :D See thread-utils.h
>
> #ifndef NO_PTHREADS
> #include <pthread.h>
>
> extern int online_cpus(void);
> extern int init_recursive_mutex(pthread_mutex_t*);
>
> #else
>
> #define online_cpus() 1
>
> #endif
>
>
> Looks like there is indeed a bug in git code when passing "--threads"
> explicitly to "git pack-objects", because they show warning about
> threads being unsupported, but doesn't overwrite delta_search_threads
> value. I will go to git's ML about it. This is completely not related
> to our issue.
Obviously I was wrong. There is
#define ll_find_deltas(l, s, w, d, p) find_deltas(l, &s, w, d, p)
So the 'delta_search_threads' value is never used. Still not related to
the Cygwin issue, though ;)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Cygwin multithreading performance
2015-12-05 13:07 ` Kacper Michajlow
2015-12-05 13:59 ` Kacper Michajlow
@ 2015-12-05 22:40 ` Mark Geisert
2015-12-06 2:35 ` Kacper Michajlow
1 sibling, 1 reply; 21+ messages in thread
From: Mark Geisert @ 2015-12-05 22:40 UTC (permalink / raw)
To: cygwin
Kacper Michajlow wrote:
> 2015-12-05 11:51 GMT+01:00 Mark Geisert <mark@maxrnd.com>:
>> Mark Geisert wrote:
>> In the OP's very good testcase the most heavily contended locks, by far, are
>> those internal to git's builtin/pack-objects.c. I plan to show actual stats
>> after some more cleanup, but I did notice something in that git source file
>> that might explain the difference between Cygwin and MinGW when running this
>> testcase...
>>
>> #ifndef NO_PTHREADS
>>
>> static pthread_mutex_t read_mutex;
>> #define read_lock() pthread_mutex_lock(&read_mutex)
>> #define read_unlock() pthread_mutex_unlock(&read_mutex)
>>
>> static pthread_mutex_t cache_mutex;
>> #define cache_lock() pthread_mutex_lock(&cache_mutex)
>> #define cache_unlock() pthread_mutex_unlock(&cache_mutex)
>>
>> static pthread_mutex_t progress_mutex;
>> #define progress_lock() pthread_mutex_lock(&progress_mutex)
>> #define progress_unlock() pthread_mutex_unlock(&progress_mutex)
>>
>> #else
>>
>> #define read_lock() (void)0
>> #define read_unlock() (void)0
>> #define cache_lock() (void)0
>> #define cache_unlock() (void)0
>> #define progress_lock() (void)0
>> #define progress_unlock() (void)0
>>
>> #endif
>>
>> Is it possible the MinGW version of git is compiled with NO_PTHREADS
>> #defined? If so, it would mean there's no locking being done at all and
>> would explain the faster execution and near 100% CPU utilization when
>> running under MinGW.
>
> Nah, there is no threading enabled when there is no pthreads. How
> would that work? :D See thread-utils.h
>
> #ifndef NO_PTHREADS
> #include <pthread.h>
>
> extern int online_cpus(void);
> extern int init_recursive_mutex(pthread_mutex_t*);
>
> #else
>
> #define online_cpus() 1
>
> #endif
We're not familiar at all with MinGW. Could you locate the source for
MinGW's pthread_mutex_lock() online and give us a link to it? And BTW,
which Windows are you running and on what kind of hardware (bitness and
#CPUS/threads)?
It looks like we're going to have to compare actual pthread_mutex_lock()
implementations. Inspecting source is nice but I don't want to be
chasing a mirage so I really hope there's a pthread_mutex_lock()
function inside the MinGW git you are running. gdb could easily answer
that question. Could you please do an 'info func pthread_mutex_lock'
after starting MinGW git under MinGW gdb with a breakpoint at main() (so
libraries are loaded).
..mark
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Cygwin multithreading performance
2015-12-05 22:40 ` Mark Geisert
@ 2015-12-06 2:35 ` Kacper Michajlow
2015-12-06 8:02 ` Mark Geisert
0 siblings, 1 reply; 21+ messages in thread
From: Kacper Michajlow @ 2015-12-06 2:35 UTC (permalink / raw)
To: cygwin
2015-12-05 23:40 GMT+01:00 Mark Geisert <mark@maxrnd.com>:
> Kacper Michajlow wrote:
>>
>> 2015-12-05 11:51 GMT+01:00 Mark Geisert <mark@maxrnd.com>:
>>>
>>> Mark Geisert wrote:
>>> In the OP's very good testcase the most heavily contended locks, by far,
>>> are
>>> those internal to git's builtin/pack-objects.c. I plan to show actual
>>> stats
>>> after some more cleanup, but I did notice something in that git source
>>> file
>>> that might explain the difference between Cygwin and MinGW when running
>>> this
>>> testcase...
>>>
>>> #ifndef NO_PTHREADS
>>>
>>> static pthread_mutex_t read_mutex;
>>> #define read_lock() pthread_mutex_lock(&read_mutex)
>>> #define read_unlock() pthread_mutex_unlock(&read_mutex)
>>>
>>> static pthread_mutex_t cache_mutex;
>>> #define cache_lock() pthread_mutex_lock(&cache_mutex)
>>> #define cache_unlock() pthread_mutex_unlock(&cache_mutex)
>>>
>>> static pthread_mutex_t progress_mutex;
>>> #define progress_lock() pthread_mutex_lock(&progress_mutex)
>>> #define progress_unlock() pthread_mutex_unlock(&progress_mutex)
>>>
>>> #else
>>>
>>> #define read_lock() (void)0
>>> #define read_unlock() (void)0
>>> #define cache_lock() (void)0
>>> #define cache_unlock() (void)0
>>> #define progress_lock() (void)0
>>> #define progress_unlock() (void)0
>>>
>>> #endif
>>>
>>> Is it possible the MinGW version of git is compiled with NO_PTHREADS
>>> #defined? If so, it would mean there's no locking being done at all and
>>> would explain the faster execution and near 100% CPU utilization when
>>> running under MinGW.
>>
>>
>> Nah, there is no threading enabled when there is no pthreads. How
>> would that work? :D See thread-utils.h
>>
>> #ifndef NO_PTHREADS
>> #include <pthread.h>
>>
>> extern int online_cpus(void);
>> extern int init_recursive_mutex(pthread_mutex_t*);
>>
>> #else
>>
>> #define online_cpus() 1
>>
>> #endif
>
>
> We're not familiar at all with MinGW. Could you locate the source for
> MinGW's pthread_mutex_lock() online and give us a link to it? And BTW,
> which Windows are you running and on what kind of hardware (bitness and
> #CPUS/threads)?
>
> It looks like we're going to have to compare actual pthread_mutex_lock()
> implementations. Inspecting source is nice but I don't want to be chasing a
> mirage so I really hope there's a pthread_mutex_lock() function inside the
> MinGW git you are running. gdb could easily answer that question. Could
> you please do an 'info func pthread_mutex_lock' after starting MinGW git
> under MinGW gdb with a breakpoint at main() (so libraries are loaded).
>
>
> ..mark
>
>
Hmm, thinking about it, MinGW doesn't have a pthread implementation or
any wrapper for one. If someone needs pthreads they would probably go
for the pthreads-w32 implementation.
I started to wonder because I don't recall git needing pthreads to
compile on Windows. And indeed they have a wrapper for the Windows API...
https://github.com/git/git/blob/master/compat/win32/pthread.h
https://github.com/git/git/blob/master/compat/win32/pthread.c
The point is not really that the "native" git build is fast, but that
Cygwin's build really struggles when it comes to MT workloads.
And this is not only an issue with git, unfortunately. Download speeds are
also limited on Cygwin. I know POSIX compatibility layers come with a
price but I would love to see improvements in those areas.
Cygwin:
Receiving objects: 100% (230458/230458), 78.41 MiB | 1.53 MiB/s, done.
"native" git:
Receiving objects: 100% (230458/230458), 78.41 MiB | 18.54 MiB/s, done.
I'm on Windows 10 x64 and i7 5820K (6C/12T).
-Kacper
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Cygwin multithreading performance
2015-12-06 2:35 ` Kacper Michajlow
@ 2015-12-06 8:02 ` Mark Geisert
2015-12-06 20:56 ` Kacper Michajlow
0 siblings, 1 reply; 21+ messages in thread
From: Mark Geisert @ 2015-12-06 8:02 UTC (permalink / raw)
To: cygwin
Kacper Michajlow wrote:
> 2015-12-05 23:40 GMT+01:00 Mark Geisert <mark@maxrnd.com>:
>> It looks like we're going to have to compare actual pthread_mutex_lock()
>> implementations. Inspecting source is nice but I don't want to be chasing a
>> mirage so I really hope there's a pthread_mutex_lock() function inside the
>> MinGW git you are running. gdb could easily answer that question. Could
>> you please do an 'info func pthread_mutex_lock' after starting MinGW git
>> under MinGW gdb with a breakpoint at main() (so libraries are loaded).
[...]
> Hmm, thinking about it mingw doesn't have pthread implementation or
> any wrapper for it. If someone needs pthread they would probably go
> for pthreads-w32 implementation.
>
> I started to wonder because I don't recall git would need pthreads to
> compile on Windows. And indeed they have a wrapper for Windows API...
> https://github.com/git/git/blob/master/compat/win32/pthread.h
> https://github.com/git/git/blob/master/compat/win32/pthread.c
OK, so git has its own pthread_mutex_lock/unlock ops which map to very
light-weight critical section operations.
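As a rough illustration of what such a wrapper amounts to, here is a minimal
sketch that maps a pthread-style mutex API onto Windows critical sections.
It is not git's compat/win32 code; the my_mutex_* names and guarded_add() are
hypothetical, and the non-Windows branch exists only so the sketch builds
elsewhere.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical wrapper type: the "mutex" is a Windows critical section. */
typedef CRITICAL_SECTION my_mutex_t;

static int my_mutex_init(my_mutex_t *m)    { InitializeCriticalSection(m); return 0; }
static int my_mutex_lock(my_mutex_t *m)    { EnterCriticalSection(m); return 0; }
static int my_mutex_unlock(my_mutex_t *m)  { LeaveCriticalSection(m); return 0; }
static int my_mutex_destroy(my_mutex_t *m) { DeleteCriticalSection(m); return 0; }

#else
#include <pthread.h>

/* Fallback so the sketch also compiles where pthreads is available. */
typedef pthread_mutex_t my_mutex_t;

static int my_mutex_init(my_mutex_t *m)    { return pthread_mutex_init(m, NULL); }
static int my_mutex_lock(my_mutex_t *m)    { return pthread_mutex_lock(m); }
static int my_mutex_unlock(my_mutex_t *m)  { return pthread_mutex_unlock(m); }
static int my_mutex_destroy(my_mutex_t *m) { return pthread_mutex_destroy(m); }

#endif

/* Add n to *total while holding the mutex; returns the new total. */
static int guarded_add(my_mutex_t *m, int *total, int n)
{
    int result;

    my_mutex_lock(m);
    *total += n;
    result = *total;
    my_mutex_unlock(m);
    return result;
}
```

An uncontended EnterCriticalSection() stays in user mode, which is a large
part of why this kind of mapping is cheap.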
> Though it is not really a matter that "native" git build is fast and
> all, but that Cygwin's one really struggles if it comes to MT workload.
In the worst cases I see using your testcase, about half the time the
busiest locks are processed within 1 usec but there's a spectrum of
longer latencies for the other half of the time. I don't know (yet) if
that can be improved in Cygwin's more general implementation but at
least the matter has now been brought to our attention :).
> And this not only issue with git unfortunately. Download speeds are
> also limited on Cygwin. I know POSIX compatibility layers comes with a
> price but I would love to see improvements in those areas.
> Cygwin:
> Receiving objects: 100% (230458/230458), 78.41 MiB | 1.53 MiB/s, done.
> "native" git:
> Receiving objects: 100% (230458/230458), 78.41 MiB | 18.54 MiB/s, done.
You're asserting this additional testcase has the same cause. What is
telling you that? And FTR what is the git command you are issuing? I
can then do the lock latency analysis on this new testcase if warranted.
Thanks,
..mark
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Cygwin multithreading performance
2015-12-06 8:02 ` Mark Geisert
@ 2015-12-06 20:56 ` Kacper Michajlow
2015-12-08 10:51 ` Mark Geisert
0 siblings, 1 reply; 21+ messages in thread
From: Kacper Michajlow @ 2015-12-06 20:56 UTC (permalink / raw)
To: cygwin
2015-12-06 9:02 GMT+01:00 Mark Geisert <mark@maxrnd.com>:
> Kacper Michajlow wrote:
>>
>> 2015-12-05 23:40 GMT+01:00 Mark Geisert <mark@maxrnd.com>:
>>>
>>> It looks like we're going to have to compare actual pthread_mutex_lock()
>>> implementations. Inspecting source is nice but I don't want to be
>>> chasing a
>>> mirage so I really hope there's a pthread_mutex_lock() function inside
>>> the
>>> MinGW git you are running. gdb could easily answer that question. Could
>>> you please do an 'info func pthread_mutex_lock' after starting MinGW git
>>> under MinGW gdb with a breakpoint at main() (so libraries are loaded).
>
> [...]
>>
>> Hmm, thinking about it mingw doesn't have pthread implementation or
>> any wrapper for it. If someone needs pthread they would probably go
>> for pthreads-w32 implementation.
>>
>> I started to wonder because I don't recall git would need pthreads to
>> compile on Windows. And indeed they have a wrapper for Windows API...
>> https://github.com/git/git/blob/master/compat/win32/pthread.h
>> https://github.com/git/git/blob/master/compat/win32/pthread.c
>
>
> OK, so git has its own pthread_mutex_lock/unlock ops which map to very
> light-weight critical section operations.
>
>> Though it is not really a matter that "native" git build is fast and
>> all, but that Cygwin's one really struggles if it comes to MT workload.
>
>
> In the worst cases I see using your testcase, about half the time the
> busiest locks are processed within 1 usec but there's a spectrum of longer
> latencies for the other half of the time. I don't know (yet) if that can be
> improved in Cygwin's more general implementation but at least the matter has
> now been brought to our attention :).
Yes, I can imagine. Git's objects are very small, so threading overhead
is very noticeable.
>> And this not only issue with git unfortunately. Download speeds are
>> also limited on Cygwin. I know POSIX compatibility layers comes with a
>> price but I would love to see improvements in those areas.
>> Cygwin:
>> Receiving objects: 100% (230458/230458), 78.41 MiB | 1.53 MiB/s, done.
>> "native" git:
>> Receiving objects: 100% (230458/230458), 78.41 MiB | 18.54 MiB/s, done.
>
>
> You're asserting this additional testcase has the same cause. What is
> telling you that? And FTR what is the git command you are issuing? I can
> then do the lock latency analysis on this new testcase if warranted.
No, sorry, I mixed up different things. It is just that I've been running
both git builds lately and I wanted to share another issue before I forget
about it.
This was a git clone command for some random repository from GitHub.
There are a lot of factors at play here, but the fact is that with Cygwin
the speed is capped at about 1.5 MiB/s and this is reproducible. This is
probably also related to the fact that git operates on a large number of
small objects. But this time it is a single-threaded workload. I tried to
strace this, but frankly I am not sure what to look for.
All in all, I just want to bring these issues to your attention.
Whether they are fixable or not is another story. But we will not know
unless someone with the required knowledge analyzes them.
-Kacper
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Cygwin multithreading performance
2015-12-06 20:56 ` Kacper Michajlow
@ 2015-12-08 10:51 ` Mark Geisert
2015-12-08 15:34 ` Corinna Vinschen
0 siblings, 1 reply; 21+ messages in thread
From: Mark Geisert @ 2015-12-08 10:51 UTC (permalink / raw)
To: cygwin
(Maybe cygwin-developers is a better list for this? It's pretty obscure.)
Here are some mutex lock stats I've been talking about providing. These are
from the OP's original testcase 'git repack -a -f' running over a clone of the
newlib-cygwin source tree. Run on a 2-core, 4-HT machine under Windows 7 x64.
I'm running a slightly modified cygwin1.dll that has 3 one-line mods to thread.cc.
I feed an strace output file through an awk script and a C program to produce
the output below. The first display is a summary showing all mutexes with
latency buckets and counts for each thread and each mutex. The second display
shows just the last two mutexes but also shows the count of locks and unlocks
from each source line. You can see most mutexes have all their latencies <= 1
usec, but there are some that have a spectrum of latencies reaching above 1000
usecs == 1 msec. I'm defining latency as the difference in usecs between a
timestamp taken on entry to pthread_mutex::lock and the timestamp appearing in
the strace output for that ::lock operation when '--mask=pthread' is specified.
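To make the bucket columns concrete, here is a minimal sketch of how a lock
latency could be measured and sorted into the same five buckets. It is not the
awk script or C program mentioned above, and it measures from the application
side rather than inside cygwin1.dll; latency_bucket(), now_usec() and
timed_lock() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <time.h>

#define NBUCKETS 5

/* Map a latency in microseconds to a bucket index:
   0: <=1, 1: <=10, 2: <=100, 3: <=1000, 4: >1000. */
static int latency_bucket(uint64_t usec)
{
    static const uint64_t limit[NBUCKETS - 1] = { 1, 10, 100, 1000 };
    int i;

    for (i = 0; i < NBUCKETS - 1; i++)
        if (usec <= limit[i])
            return i;
    return NBUCKETS - 1;
}

/* Monotonic timestamp in microseconds. */
static uint64_t now_usec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Lock the mutex and count how long the acquisition took in hist[]. */
static int timed_lock(pthread_mutex_t *m, unsigned long hist[NBUCKETS])
{
    uint64_t start = now_usec();
    int rc = pthread_mutex_lock(m);

    if (rc == 0)
        hist[latency_bucket(now_usec() - start)]++;
    return rc;
}
```

Each row in the displays is then one such histogram, kept per thread or per
mutex.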
I'm considering adding the tools that produced these displays to the cygutils
package. I'm unsure if the cygwin1.dll mods I've made locally should be shipped
generally; I don't know how much extra CPU they use, if any.
..mark
======== first display ========
*** processes present ***
pid 4908: git
pid 7020: git
*** threads present ***
lock latency buckets: <=1 <=10 <=100 <=1000 >1000
tid main 0: lks 269960, ulks 269960, 269416 54 182 128 180
tid main 1: lks 6307, ulks 6307, 6304 1 2 0 0
tid 1216: lks 196941, ulks 196941, 84899 5045 91669 13914 1414
tid 4560: lks 197203, ulks 197203, 70033 4165 110333 11442 1230
tid 7840: lks 68984, ulks 68984, 34160 1389 25783 5685 1967
tid 9076: lks 166308, ulks 166308, 81715 2097 72009 8805 1682
*** mutexes present ***
lock latency buckets: <=1 <=10 <=100 <=1000 >1000
mtx 4908/01802F30E8 lks 0, ulks 0, 0 0 0 0 0
mtx 4908/0600000010 lks 9, ulks 9, 8 1 0 0 0
mtx 4908/0600000108 lks 179394, ulks 179394, 179361 18 14 0 1
mtx 4908/0600000160 lks 1, ulks 1, 1 0 0 0 0
mtx 4908/06000180E8 lks 0, ulks 0, 0 0 0 0 0
mtx 7020/06000180E8 lks 4182, ulks 4182, 4180 0 2 0 0
mtx 4908/0600018140 lks 0, ulks 0, 0 0 0 0 0
mtx 7020/0600018140 lks 1, ulks 1, 1 0 0 0 0
mtx 4908/0600028518 lks 18, ulks 18, 18 0 0 0 0
mtx 4908/0600038B60 lks 88002, ulks 88002, 87957 30 15 0 0
lock latency buckets: <=1 <=10 <=100 <=1000 >1000
mtx 4908/0600038EB0 lks 194, ulks 194, 194 0 0 0 0
mtx 4908/0600039010 lks 6, ulks 6, 6 0 0 0 0
mtx 4908/06000390A0 lks 6, ulks 6, 6 0 0 0 0
mtx 7020/0600039A20 lks 6, ulks 6, 6 0 0 0 0
mtx 4908/060003A280 lks 1, ulks 1, 1 0 0 0 0
mtx 7020/060003A280 lks 8, ulks 8, 8 0 0 0 0
mtx 4908/060003A308 lks 0, ulks 0, 0 0 0 0 0
mtx 7020/060003A308 lks 6, ulks 6, 6 0 0 0 0
mtx 4908/060003A370 lks 0, ulks 0, 0 0 0 0 0
mtx 4908/060003A3B0 lks 0, ulks 0, 0 0 0 0 0
lock latency buckets: <=1 <=10 <=100 <=1000 >1000
mtx 4908/060003A428 lks 0, ulks 0, 0 0 0 0 0
mtx 4908/060003A468 lks 0, ulks 0, 0 0 0 0 0
mtx 4908/060003A940 lks 0, ulks 0, 0 0 0 0 0
mtx 7020/060003A940 lks 26, ulks 26, 26 0 0 0 0
mtx 7020/060003AC90 lks 194, ulks 194, 194 0 0 0 0
mtx 7020/060003ADF0 lks 6, ulks 6, 6 0 0 0 0
mtx 4908/0600051B30 lks 1, ulks 1, 1 0 0 0 0
mtx 4908/0600051E20 lks 6, ulks 6, 6 0 0 0 0
mtx 7020/0600053A00 lks 920, ulks 920, 920 0 0 0 0
mtx 4908/0600053B20 lks 920, ulks 920, 920 0 0 0 0
lock latency buckets: <=1 <=10 <=100 <=1000 >1000
mtx 4908/0600062008 lks 14, ulks 14, 14 0 0 0 0
mtx 4908/06000621D0 lks 2, ulks 2, 2 0 0 0 0
mtx 4908/06000625B0 lks 6, ulks 6, 6 0 0 0 0
mtx 4908/0600063B90 lks 0, ulks 0, 0 0 0 0 0
mtx 7020/0600063B90 lks 2, ulks 2, 2 0 0 0 0
mtx 4908/0600063BE0 lks 0, ulks 0, 0 0 0 0 0
mtx 7020/0600063BE0 lks 5, ulks 5, 5 0 0 0 0
mtx 4908/0600063C30 lks 0, ulks 0, 0 0 0 0 0
mtx 7020/0600063C30 lks 2, ulks 2, 2 0 0 0 0
mtx 7020/0600063C80 lks 4, ulks 4, 4 0 0 0 0
lock latency buckets: <=1 <=10 <=100 <=1000 >1000
mtx 7020/0600076500 lks 920, ulks 920, 920 0 0 0 0
mtx 4908/0600114120 lks 15, ulks 15, 9 2 4 0 0
mtx 4908/060013EE78 lks 658, ulks 658, 446 17 189 6 0
mtx 4908/060026DE50 lks 12, ulks 12, 4 1 6 1 0
mtx 4908/06002A00F0 lks 155066, ulks 155066, 66359 4395 78895 4742 675
mtx 4908/06006628D0 lks 4, ulks 4, 4 0 0 0 0
mtx 4908/06007217B0 lks 23, ulks 23, 23 0 0 0 0
mtx 4908/0600784C70 lks 1529, ulks 1529, 1285 39 195 10 0
mtx 7020/0600837A80 lks 13, ulks 13, 13 0 0 0 0
mtx 4908/0600A081E8 lks 10, ulks 10, 9 1 0 0 0
lock latency buckets: <=1 <=10 <=100 <=1000 >1000
mtx 4908/0600A08228 lks 10, ulks 10, 5 3 2 0 0
mtx 4908/0600A082A8 lks 8, ulks 8, 6 0 2 0 0
mtx 4908/0600A082E8 lks 8, ulks 8, 3 0 5 0 0
mtx 4908/0600A08368 lks 8, ulks 8, 5 0 3 0 0
mtx 4908/0600A083A8 lks 8, ulks 8, 3 0 4 1 0
mtx 4908/0600D0A5B0 lks 2, ulks 2, 2 0 0 0 0
mtx 4908/0600F35670 lks 8, ulks 8, 8 0 0 0 0
mtx 4908/0600FA6860 lks 154745, ulks 154745, 56092 3217 64883 25435 5118
mtx 4908/060157A3B8 lks 580, ulks 580, 410 11 154 5 0
mtx 4908/060157E568 lks 4, ulks 4, 4 0 0 0 0
lock latency buckets: <=1 <=10 <=100 <=1000 >1000
mtx 4908/060157E5A8 lks 4, ulks 4, 2 0 2 0 0
mtx 4908/06015B1AD0 lks 12, ulks 12, 3 0 7 2 0
mtx 4908/06019741E8 lks 259, ulks 259, 186 2 54 16 1
mtx 4908/0601974228 lks 259, ulks 259, 27 0 45 63 124
mtx 4908/0602076490 lks 6, ulks 6, 2 0 3 1 0
mtx 7020/0602874000 lks 12, ulks 12, 11 1 0 0 0
mtx 4908/060345CAB0 lks 1, ulks 1, 1 0 0 0 0
mtx 4908/060347FE48 lks 316, ulks 316, 246 13 54 3 0
mtx 4908/0603498600 lks 316825, ulks 316825, 146254 4986 155345 9686 554
mtx 4908/06034C8E68 lks 436, ulks 436, 324 14 95 3 0
======== second display ========
lock latency buckets: <=1 <=10 <=100 <=1000 >1000
mtx 4908/0603498600 lks 316825, ulks 316825, 146254 4986 155345 9686 554
caller 0x0100455269, count 196769, L, /usr/src/git/builtin/pack-objects.c:1695
caller 0x01004552C4, count 15148, U, /usr/src/git/builtin/pack-objects.c:1705
caller 0x0100455478, count 181621, U, /usr/src/git/builtin/pack-objects.c:1702
caller 0x010045554C, count 120056, L, /usr/src/git/builtin/pack-objects.c:1834
caller 0x010045556E, count 120056, U, /usr/src/git/builtin/pack-objects.c:1837
mtx 4908/06034C8E68 lks 436, ulks 436, 324 14 95 3 0
caller 0x018014CC77, count 1, L, /oss/src/winsup/cygwin/thread.cc:475
caller 0x018014CD00, count 1, U, /oss/src/winsup/cygwin/thread.cc:496
caller 0x018014CDAF, count 432, L, /oss/src/winsup/cygwin/thread.cc:971
caller 0x018014CDE6, count 432, U, /oss/src/winsup/cygwin/thread.cc:982
caller 0x018014D07E, count 1, L, /oss/src/winsup/cygwin/thread.cc:1946
caller 0x018014D090, count 1, U, /oss/src/winsup/cygwin/thread.cc:1951
caller 0x018014D7E6, count 1, L, /oss/src/winsup/cygwin/thread.cc:525
caller 0x018014D7FF, count 1, U, /oss/src/winsup/cygwin/thread.cc:533
caller 0x018014EDD7, count 1, U, /oss/src/winsup/cygwin/thread.cc:2400
caller 0x018014EE97, count 1, L, /oss/src/winsup/cygwin/thread.cc:2389
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Cygwin multithreading performance
2015-12-08 10:51 ` Mark Geisert
@ 2015-12-08 15:34 ` Corinna Vinschen
2015-12-08 17:02 ` Corinna Vinschen
0 siblings, 1 reply; 21+ messages in thread
From: Corinna Vinschen @ 2015-12-08 15:34 UTC (permalink / raw)
To: cygwin
On Dec 8 02:51, Mark Geisert wrote:
> (Maybe cygwin-developers is a better list for this? It's pretty obscure.)
Yes, cygwin-developers is fine since it's gory implementation details.
> Here are some mutex lock stats I've been talking about providing. These are
> from the OP's original testcase 'git repack -a -f' running over a clone of
> the newlib-cygwin source tree. Run on a 2-core, 4-HT machine under Windows
> 7 x64. I'm running a slightly modified cygwin1.dll that has 3 one-line mods
> to thread.cc.
Which I'd like to see a patch of, just to know what you mean.
> I'm considering adding the tools that produced these displays to the
> cygutils package. I'm unsure if the cygwin1.dll mods I've made locally
> should be shipped generally; I don't know how much extra CPU they use, if
> any.
Well, let's have a look. This is open source after all :)
> caller 0x018014CC77, count 1, L, /oss/src/winsup/cygwin/thread.cc:475
> caller 0x018014CD00, count 1, U, /oss/src/winsup/cygwin/thread.cc:496
> caller 0x018014CDAF, count 432, L, /oss/src/winsup/cygwin/thread.cc:971
> caller 0x018014CDE6, count 432, U, /oss/src/winsup/cygwin/thread.cc:982
> caller 0x018014D07E, count 1, L, /oss/src/winsup/cygwin/thread.cc:1946
> caller 0x018014D090, count 1, U, /oss/src/winsup/cygwin/thread.cc:1951
> caller 0x018014D7E6, count 1, L, /oss/src/winsup/cygwin/thread.cc:525
> caller 0x018014D7FF, count 1, U, /oss/src/winsup/cygwin/thread.cc:533
> caller 0x018014EDD7, count 1, U, /oss/src/winsup/cygwin/thread.cc:2400
> caller 0x018014EE97, count 1, L, /oss/src/winsup/cygwin/thread.cc:2389
This is interesting. I'm not sure if anything in the rest of the
output shows how much is wasted on the above two calls, though.
thread.cc:971 and thread.cc:982 are pthread_setcancelstate, and it's
called pretty often as part of stdio functions. Every stdio function
which has to lock the FILE structure also calls pthread_setcancelstate
to disable and reenable cancellation before and after locking. That's
almost any stdio function.
This may be one of the problems which lower performance, but there's no
easy or quick way around that, AFAICS.
There's also the fact that, even for tools using __fsetlocking to disable
stdio locking, pthread_setcancelstate will still be called unconditionally.
The question here is, if that's wrong and pthread_setcancelstate should be
skipped if the application sets FSETLOCKING_BYCALLER.
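For reference, this is roughly what an application does when it takes over
stdio locking. A minimal sketch, assuming the __fsetlocking() interface from
<stdio_ext.h>; take_over_stream_locking() is a hypothetical helper name.

```c
#include <stdio.h>
#include <stdio_ext.h>

/* Tell stdio that the caller is responsible for locking fp.
   Returns 0 if the stream now reports FSETLOCKING_BYCALLER, -1 otherwise. */
static int take_over_stream_locking(FILE *fp)
{
    /* Switch the stream to caller-managed locking. */
    __fsetlocking(fp, FSETLOCKING_BYCALLER);

    /* FSETLOCKING_QUERY reports the current mode without changing it. */
    if (__fsetlocking(fp, FSETLOCKING_QUERY) != FSETLOCKING_BYCALLER)
        return -1;
    return 0;
}
```

An application that has done this has already promised to serialize access to
the stream itself, which is the case the question above is about.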
Thanks,
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Cygwin multithreading performance
2015-12-08 15:34 ` Corinna Vinschen
@ 2015-12-08 17:02 ` Corinna Vinschen
0 siblings, 0 replies; 21+ messages in thread
From: Corinna Vinschen @ 2015-12-08 17:02 UTC (permalink / raw)
To: cygwin
On Dec 8 16:34, Corinna Vinschen wrote:
> On Dec 8 02:51, Mark Geisert wrote:
> > (Maybe cygwin-developers is a better list for this? It's pretty obscure.)
>
> Yes, cygwin-developers is fine since it's gory implementation details.
>
> > Here are some mutex lock stats I've been talking about providing. These are
> > from the OP's original testcase 'git repack -a -f' running over a clone of
> > the newlib-cygwin source tree. Run on a 2-core, 4-HT machine under Windows
> > 7 x64. I'm running a slightly modified cygwin1.dll that has 3 one-line mods
> > to thread.cc.
>
> Which I'd like to see a patch of, just to know what you mean.
>
> > I'm considering adding the tools that produced these displays to the
> > cygutils package. I'm unsure if the cygwin1.dll mods I've made locally
> > should be shipped generally; I don't know how much extra CPU they use, if
> > any.
>
> Well, let's have a look. This is open source after all :)
>
> > caller 0x018014CC77, count 1, L, /oss/src/winsup/cygwin/thread.cc:475
> > caller 0x018014CD00, count 1, U, /oss/src/winsup/cygwin/thread.cc:496
> > caller 0x018014CDAF, count 432, L, /oss/src/winsup/cygwin/thread.cc:971
> > caller 0x018014CDE6, count 432, U, /oss/src/winsup/cygwin/thread.cc:982
> > caller 0x018014D07E, count 1, L, /oss/src/winsup/cygwin/thread.cc:1946
> > caller 0x018014D090, count 1, U, /oss/src/winsup/cygwin/thread.cc:1951
> > caller 0x018014D7E6, count 1, L, /oss/src/winsup/cygwin/thread.cc:525
> > caller 0x018014D7FF, count 1, U, /oss/src/winsup/cygwin/thread.cc:533
> > caller 0x018014EDD7, count 1, U, /oss/src/winsup/cygwin/thread.cc:2400
> > caller 0x018014EE97, count 1, L, /oss/src/winsup/cygwin/thread.cc:2389
>
> This is interesting. I'm not sure if anything in the rest of the
> output shows how much is wasted on the above two calls, though.
>
> thread.cc:971 and thread.cc:982 are pthread_setcancelstate, and it's
> called pretty often as part of stdio functions. Every stdio function
> which has to lock the FILE structure also calls pthread_setcancelstate
> to disable and reenable cancellation before and after locking. That's
> almost any stdio function.
>
> This may be one of the problems which lower performance, but there's no
> easy or quick way around that, AFAICS.
>
> There's also the fact that, even for tools using __fsetlocking to disable
> stdio locking, pthread_setcancelstate will still be called unconditionally.
> The question here is, if that's wrong and pthread_setcancelstate should be
> skipped if the application sets FSETLOCKING_BYCALLER.
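For readers unfamiliar with the interface mentioned in the quote, here is a
minimal sketch of how an application takes over stdio locking with
__fsetlocking and FSETLOCKING_BYCALLER from <stdio_ext.h>. The helper names
take_over_locking and write_line_locked are hypothetical and exist only for
illustration. The sketch shows the application side only; it does not answer
whether the library still calls pthread_setcancelstate in this mode, which is
the open question above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdio_ext.h>   /* __fsetlocking, FSETLOCKING_* */

/* Hypothetical helper: switch a stream to caller-managed locking and
   return the locking mode that was in effect before the call. */
int take_over_locking(FILE *fp)
{
    int previous = __fsetlocking(fp, FSETLOCKING_BYCALLER);

    /* Querying does not change the mode; it must now report BYCALLER. */
    assert(__fsetlocking(fp, FSETLOCKING_QUERY) == FSETLOCKING_BYCALLER);
    return previous;
}

/* Hypothetical helper: write one line while holding the stream lock
   explicitly, as a BYCALLER application is expected to do itself.
   Returns 0 on success, -1 on a write error. */
int write_line_locked(FILE *fp, const char *line)
{
    int rc;

    flockfile(fp);
    rc = fputs(line, fp);
    if (rc != EOF)
        rc = putc_unlocked('\n', fp);
    funlockfile(fp);

    return rc == EOF ? -1 : 0;
}
```

A caller would invoke take_over_locking(stdout) once at startup and then use
write_line_locked (or the *_unlocked stdio functions) wherever several
threads share the stream.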
For a start, I simply removed the mutex lock/unlock in calls to
pthread_setcancelstate and pthread_setcanceltype. These locks are
completely unnecessary, since both functions only ever operate on the
calling thread.
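The reasoning can be checked from user space with a small sketch. It is not
the code in thread.cc, only an illustration using the standard POSIX calls:
a worker thread disables cancellation for itself, and the creating thread
then confirms that its own cancel state is untouched. The function names
worker and caller_state_after_worker are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Thread body: disable cancellation for this thread only and store the
   state it had before in the int that arg points to. */
static void *worker(void *arg)
{
    int *old = arg;
    int rc = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, old);

    assert(rc == 0);
    return NULL;
}

/* Run a worker that disables its own cancellation, then return the
   cancel state of the calling thread. */
int caller_state_after_worker(void)
{
    pthread_t t;
    int worker_old = -1, mine = -1, ignored = -1;
    int rc;

    rc = pthread_create(&t, NULL, worker, &worker_old);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);

    /* POSIX: new threads start with cancellation enabled. */
    assert(worker_old == PTHREAD_CANCEL_ENABLE);

    /* There is no getter, so read our own state by setting it and
       restoring it right away. */
    rc = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &mine);
    assert(rc == 0);
    rc = pthread_setcancelstate(mine, &ignored);
    assert(rc == 0);

    return mine;
}
```

If the state were shared between threads, the caller would see
PTHREAD_CANCEL_DISABLE here. Since each thread only touches its own state,
no other thread can race on it, which is why a lock around it buys nothing.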
I'm creating a developer snapshot right now, which I'll upload to
https://cygwin.com/snapshots/ within half an hour at the latest. Please
check whether your testcase behaves better with it.
Thanks,
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Cygwin multithreading performance
2015-11-14 0:24 Cygwin multithreading performance Kacper Michajlow
2015-11-19 20:24 ` Mark Geisert
@ 2015-12-18 15:06 ` Achim Gratz
1 sibling, 0 replies; 21+ messages in thread
From: Achim Gratz @ 2015-12-18 15:06 UTC (permalink / raw)
To: cygwin
Kacper Michajlow writes:
> I recently noticed that Cygwin multithreading is very inefficient. I
> was repacking a few git repositories, and Cygwin's git spawns
> threads, but they are so badly synchronized that there is no speed gain
> over one thread, and possibly a loss because of the overhead. On my
> machine I got 7-10% CPU usage, while git built with mingw easily
> uses 100%.
I've been testing this again with my local copy of Emacs' Git repository,
and at least on this two-core system it works just fine (it was working
fine on a four-core system earlier). The object count phase looks
serialized and doesn't go over 50% CPU; however, a good deal of that is
system time anyway, so I assume it's file access. The actual
compression uses whatever CPU it can get, with the occasional spike in
system time when it goes to disk (it's an SSD).
Is your repo local or on some remote filesystem?
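In case it helps with comparing numbers: the user/system split I mention can
be read programmatically with getrusage. The sketch below is only an
illustration, not the tool I used; cpu_split and tv_seconds are made-up
names. It reports the calling process itself; to measure a finished child
such as a git run, RUSAGE_CHILDREN would be used after waiting for it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to seconds. */
static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Store the CPU seconds this process has spent so far in user mode
   (*user) and in the kernel (*sys), summed over all its threads.
   Returns 0 on success, -1 if getrusage fails. */
int cpu_split(double *user, double *sys)
{
    struct rusage ru;

    assert(user != NULL && sys != NULL);
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;

    *user = tv_seconds(ru.ru_utime);
    *sys = tv_seconds(ru.ru_stime);
    return 0;
}
```

A high system share with low user time points at I/O or kernel-side waits
rather than at the compression work itself.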
Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+
SD adaptation for Waldorf microQ V2.22R2:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2015-12-18 15:06 UTC | newest]
Thread overview: 21+ messages
2015-11-14 0:24 Cygwin multithreading performance Kacper Michajlow
2015-11-19 20:24 ` Mark Geisert
2015-11-20 14:25 ` Kacper Michajlow
2015-11-21 9:21 ` Mark Geisert
2015-11-21 10:53 ` Corinna Vinschen
2015-11-23 7:45 ` Mark Geisert
2015-11-23 10:27 ` John Hein
2015-11-24 1:05 ` Mark Geisert
2015-11-26 9:49 ` Corinna Vinschen
2015-11-26 10:49 ` Mark Geisert
2015-12-05 10:51 ` Mark Geisert
2015-12-05 13:07 ` Kacper Michajlow
2015-12-05 13:59 ` Kacper Michajlow
2015-12-05 22:40 ` Mark Geisert
2015-12-06 2:35 ` Kacper Michajlow
2015-12-06 8:02 ` Mark Geisert
2015-12-06 20:56 ` Kacper Michajlow
2015-12-08 10:51 ` Mark Geisert
2015-12-08 15:34 ` Corinna Vinschen
2015-12-08 17:02 ` Corinna Vinschen
2015-12-18 15:06 ` Achim Gratz