public inbox for cygwin-developers@cygwin.com
* More (?) steps toward jemalloc within Cygwin DLL
@ 2020-06-16  9:16 Mark Geisert
  2020-06-30  9:24 ` Corinna Vinschen
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Geisert @ 2020-06-16  9:16 UTC
  To: cygwin-developers

I'm just putting a flag down on this new (to me) territory.  If somebody 
else has claimed this project already, let me know and I'll shove off.

It wasn't much trouble to build a jemalloc.lib and statically link it into 
the Cygwin DLL when the latter is built.  I'm still learning which 
jemalloc configure options are required in order to get complete test 
coverage and to initialize properly within cygwin1.dll.

I'm currently using the "supply your own malloc" mechanism provided by 
Cygwin's malloc_wrapper.cc to overlay the usual dlmalloc-sourced functions 
with replacements from jemalloc.  I suspect there will be allocation 
collisions ahead...

One question has popped up though.  I see from the docs that if one wants 
jemalloc to run thread-aware, it wraps pthread_create() to find out when 
the app has gone from single-threaded to multi-threaded.  But in Cygwin's 
case we'll additionally need to consider cygthreads, won't we?
Thanks,

..mark


* Re: More (?) steps toward jemalloc within Cygwin DLL
  2020-06-16  9:16 More (?) steps toward jemalloc within Cygwin DLL Mark Geisert
@ 2020-06-30  9:24 ` Corinna Vinschen
  2020-07-03  6:57   ` Mark Geisert
  0 siblings, 1 reply; 7+ messages in thread
From: Corinna Vinschen @ 2020-06-30  9:24 UTC
  To: cygwin-developers

Hi Mark,

On Jun 16 02:16, Mark Geisert wrote:
> I'm just putting a flag down on this new (to me) territory.  If somebody
> else has claimed this project already, let me know and I'll shove off.

No, please.  Just keep on working on that.  If you manage to get jemalloc
working and replacing dlmalloc, this would be really great.

> It wasn't much trouble to build a jemalloc.lib and statically link it into
> the Cygwin DLL when the latter is built.  I'm still learning which jemalloc
> configure options are required in order to get complete test coverage and to
> initialize properly within cygwin1.dll.
> 
> I'm currently using the "supply your own malloc" mechanism provided by
> Cygwin's malloc_wrapper.cc to overlay the usual dlmalloc-sourced functions
> with replacements from jemalloc.  I suspect there will be allocation
> collisions ahead...

The real problem here is this:

  __malloc_lock ();
  dl_foo_function ();
  __malloc_unlock ();

This locking is what makes our dlmalloc even slower in multi-threaded
scenarios because it disallows using malloc/free calls concurrently.

If you get jemalloc working, it would be nice in itself, but the main
improvement would be the ability to get rid of these __malloc_lock/
__malloc_unlock brackets.
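
To make the cost concrete, here is a minimal sketch of the bracket
pattern in C.  It is not Cygwin's actual malloc_wrapper.cc code:
big_lock, locked_malloc, locked_free and bracketed_call_count are
hypothetical names, and the system malloc stands in for dlmalloc.  Every
entry point takes the same process-wide mutex, so two threads can never
be inside the allocator at the same time.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the __malloc_lock/__malloc_unlock pair:
   one process-wide mutex shared by every allocator entry point. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Number of calls that went through the bracket (illustration only). */
static unsigned long bracketed_calls;

void *locked_malloc(size_t size)
{
    pthread_mutex_lock(&big_lock);
    void *p = malloc(size);          /* stands in for dlmalloc */
    bracketed_calls++;
    pthread_mutex_unlock(&big_lock);
    return p;
}

void locked_free(void *p)
{
    pthread_mutex_lock(&big_lock);
    free(p);                         /* stands in for dlfree */
    bracketed_calls++;
    pthread_mutex_unlock(&big_lock);
}

/* Read the counter under the same lock that protects it. */
unsigned long bracketed_call_count(void)
{
    pthread_mutex_lock(&big_lock);
    unsigned long n = bracketed_calls;
    pthread_mutex_unlock(&big_lock);
    return n;
}
```

An allocator with its own fine-grained or per-thread locking would not
need the outer mutex at all, which is the improvement described above.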

> One question has popped up though.  I see from the docs that if one wants
> jemalloc to run thread-aware, it wraps pthread_create() to find out when the
> app has gone from single-threaded to multi-threaded.  But in Cygwin's case
> we'll additionally need to consider cygthreads, won't we?

Yes, and the pthread_create call should not be performed in jemalloc
if possible.  The best solution is probably to let jemalloc always
work under the assumption of threading, right from the start.
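
A rough illustration of why this matters, as a sketch and not jemalloc's
real internals: an allocator that enables its locks lazily depends on a
hook firing when the first thread is created.  All names below
(alloc_state, hooked_thread_create_notice, alloc_takes_lock, ...) are
made up for the example.  A thread started through some other path, such
as a cygthread, never trips the hook, so a lazily locking allocator would
keep running unlocked.

```c
#include <stdbool.h>

/* Hypothetical allocator state; not jemalloc's actual variables. */
struct alloc_state {
    bool lazy_lock;    /* true: skip locking until a thread is seen */
    bool is_threaded;  /* set by the thread-creation hook */
};

/* What a pthread_create() wrapper would call before starting a thread. */
void hooked_thread_create_notice(struct alloc_state *st)
{
    st->is_threaded = true;
}

/* A thread created by another mechanism never reaches the hook above,
   so this function deliberately leaves the state untouched. */
void unhooked_thread_create_notice(struct alloc_state *st)
{
    (void)st;
}

/* Would the allocator take its locks on the next call? */
bool alloc_takes_lock(const struct alloc_state *st)
{
    return !st->lazy_lock || st->is_threaded;
}
```

With lazy_lock set to false the answer is always "yes", regardless of
how threads come into existence.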


Thanks,
Corinna

-- 
Corinna Vinschen
Cygwin Maintainer


* Re: More (?) steps toward jemalloc within Cygwin DLL
  2020-06-30  9:24 ` Corinna Vinschen
@ 2020-07-03  6:57   ` Mark Geisert
  2020-07-03 10:11     ` Corinna Vinschen
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Geisert @ 2020-07-03  6:57 UTC
  To: cygwin-developers

Hi Corinna,

Corinna Vinschen wrote:
> On Jun 16 02:16, Mark Geisert wrote:
>> I'm just putting a flag down on this new (to me) territory.  If somebody
>> else has claimed this project already, let me know and I'll shove off.
> 
> No, please.  Just keep on working on that.  If you manage to get jemalloc
> working and replacing dlmalloc, this would be really great.

Super.

>> It wasn't much trouble to build a jemalloc.lib and statically link it into
>> the Cygwin DLL when the latter is built.  I'm still learning which jemalloc
>> configure options are required in order to get complete test coverage and to
>> initialize properly within cygwin1.dll.
>>
>> I'm currently using the "supply your own malloc" mechanism provided by
>> Cygwin's malloc_wrapper.cc to overlay the usual dlmalloc-sourced functions
>> with replacements from jemalloc.  I suspect there will be allocation
>> collisions ahead...

I've had to rethink the above a bit.

> The real problem here is this:
> 
>    __malloc_lock ();
>    dl_foo_function ();
>    __malloc_unlock ();
> 
> This locking is what makes our dlmalloc even slower in multi-threaded
> scenarios because it disallows using malloc/free calls concurrently.
> 
> If you get jemalloc working, it would be nice in itself, but the main
> improvement would be the ability to get rid of these __malloc_lock/
> __malloc_unlock brackets.

Thanks for reminding me of that aspect of Cygwin's current malloc.  The malloc 
implementation has seemed to be bulletproof for many years so I guess the 
function-level locking is the only drawback of note?

I've found that jemalloc would add 500kB to cygwin1.dll and it also seems 
difficult to get working, at first blush at least.  I've switched to a plug-in 
sort of implementation that allows one to choose among several malloc packages: 
"original", dlmalloc (w/ internal locking), ptmalloc[23], nedalloc, jemalloc, 
and a Windows Heap wrapper.  Perhaps tcmalloc in the future.  One sets an 
environment variable CYGMALLOC=<name> before launching a program and that malloc 
implementation is used.  This should make testing and benchmarking the various 
choices possible.  I don't expect big improvements in individual programs 
(unless they are stress testing), but something like a large configure or build 
should give more useful data.
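
As a rough sketch of how such a selector might look (this is not the
actual patch; the table, select_backend, backend_from_env and the two
backends shown are hypothetical, with the system malloc standing in for
the real packages):

```c
#include <stdlib.h>
#include <string.h>

/* One entry per pluggable malloc package. */
struct malloc_backend {
    const char *name;
    void *(*malloc_fn)(size_t);
    void  (*free_fn)(void *);
};

/* Placeholder backends: both simply forward to the system allocator. */
static void *orig_malloc(size_t n)    { return malloc(n); }
static void  orig_free(void *p)       { free(p); }
static void *je_malloc_stub(size_t n) { return malloc(n); }
static void  je_free_stub(void *p)    { free(p); }

static const struct malloc_backend backends[] = {
    { "original", orig_malloc,    orig_free    },  /* default: keep first */
    { "jemalloc", je_malloc_stub, je_free_stub },
};

/* Pick a backend by name; NULL or unknown names fall back to the default. */
const struct malloc_backend *select_backend(const char *name)
{
    size_t count = sizeof backends / sizeof backends[0];

    if (name != NULL) {
        for (size_t i = 0; i < count; i++) {
            if (strcmp(name, backends[i].name) == 0)
                return &backends[i];
        }
    }
    return &backends[0];
}

/* Consult CYGMALLOC, as a start-up hook might do once per process. */
const struct malloc_backend *backend_from_env(void)
{
    return select_backend(getenv("CYGMALLOC"));
}
```

Falling back to the default on an unknown name keeps a typo in the
environment variable from breaking every program launched afterwards.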

..mark


* Re: More (?) steps toward jemalloc within Cygwin DLL
  2020-07-03  6:57   ` Mark Geisert
@ 2020-07-03 10:11     ` Corinna Vinschen
  2020-07-21  8:50       ` Mark Geisert
  0 siblings, 1 reply; 7+ messages in thread
From: Corinna Vinschen @ 2020-07-03 10:11 UTC
  To: cygwin-developers

Hi Mark,

On Jul  2 23:57, Mark Geisert wrote:
> Hi Corinna,
> 
> Corinna Vinschen wrote:
> > On Jun 16 02:16, Mark Geisert wrote:
> > > I'm just putting a flag down on this new (to me) territory.  If somebody
> > > else has claimed this project already, let me know and I'll shove off.
> > 
> > No, please.  Just keep on working on that.  If you manage to get jemalloc
> > working and replacing dlmalloc, this would be really great.
> 
> Super.
> 
> > > It wasn't much trouble to build a jemalloc.lib and statically link it into
> > > the Cygwin DLL when the latter is built.  I'm still learning which jemalloc
> > > configure options are required in order to get complete test coverage and to
> > > initialize properly within cygwin1.dll.
> > > 
> > > I'm currently using the "supply your own malloc" mechanism provided by
> > > Cygwin's malloc_wrapper.cc to overlay the usual dlmalloc-sourced functions
> > > with replacements from jemalloc.  I suspect there will be allocation
> > > collisions ahead...
> 
> I've had to rethink the above a bit.
> 
> > The real problem here is this:
> > 
> >    __malloc_lock ();
> >    dl_foo_function ();
> >    __malloc_unlock ();
> > 
> > This locking is what makes our dlmalloc even slower in multi-threaded
> > scenarios because it disallows using malloc/free calls concurrently.
> > 
> > If you get jemalloc working, it would be nice in itself, but the main
> > improvement would be the ability to get rid of these __malloc_lock/
> > __malloc_unlock brackets.
> 
> Thanks for reminding me of that aspect of Cygwin's current malloc.  The
> malloc implementation has seemed to be bulletproof for many years so I guess
> the function-level locking is the only drawback of note?

Not quite.  It's bad enough, given how much this slows down multi-threaded
executables, but...

...the big problem is the set of dependencies on malloc during Cygwin
startup, especially in fork/exec.  The real challenge is to get the new
malloc to still let Cygwin processes start up correctly the first time,
and especially in fork/exec situations, and to make sure the malloc
bookkeeping survives fork/exec.

These malloc dependencies sometimes crop up in the weirdest situations,
so that's something to look out for.  For instance, using pthread
functions may call malloc as well.  If a problem can be solved by
changing another part of Cygwin, don't hesitate to discuss this!

> I've found that jemalloc would add 500kB to cygwin1.dll and it also seems
> difficult to get working, at first blush at least.

OTOH you leave dlmalloc behind, so that's 280kB less again.

> I've switched to a
> plug-in sort of implementation that allows one to choose among several
> malloc packages: "original", dlmalloc (w/ internal locking), ptmalloc[23],
> nedalloc, jemalloc, and a Windows Heap wrapper.  Perhaps tcmalloc in the
> future.  One sets an environment variable CYGMALLOC=<name> before launching
> a program and that malloc implementation is used.  This should make testing
> and benchmarking the various choices possible.  I don't expect big
> improvements in individual programs (unless they are stress testing), but
> something like a large configure or build should give more useful data.

In the end, we should settle for a single malloc implementation, though.
It doesn't really matter if it's jemalloc, ptmalloc, xymalloc.  Almost
all other modern mallocs are faster and better suited for multi-threading
than dlmalloc, *especially* if the above locks can go away.

The only danger here is this: If you manage to get dlmalloc replaced
reliably, you *will* get a pink plush hippo!


Corinna

-- 
Corinna Vinschen
Cygwin Maintainer


* Re: More (?) steps toward jemalloc within Cygwin DLL
  2020-07-03 10:11     ` Corinna Vinschen
@ 2020-07-21  8:50       ` Mark Geisert
  2020-07-21 12:00         ` Corinna Vinschen
  2020-07-21 22:06         ` Ford, Brian
  0 siblings, 2 replies; 7+ messages in thread
From: Mark Geisert @ 2020-07-21  8:50 UTC
  To: cygwin-developers

Corinna Vinschen wrote:
>>> If you get jemalloc working, it would be nice in itself, but the main
>>> improvement would be the ability to get rid of these __malloc_lock/
>>> __malloc_unlock brackets.
>>
>> Thanks for reminding me of that aspect of Cygwin's current malloc.  The
>> malloc implementation has seemed to be bulletproof for many years so I guess
>> the function-level locking is the only drawback of note?
> 
> Not quite.  It's bad enough, given how much this slows down multi-threaded
> executables, but...
> 
> ...the big problem is the set of dependencies on malloc during Cygwin
> startup, especially in fork/exec.  The real challenge is to get the new
> malloc to still let Cygwin processes start up correctly the first time,
> and especially in fork/exec situations, and to make sure the malloc
> bookkeeping survives fork/exec.

O.K., understood.

> These malloc dependencies sometimes crop up in the weirdest situations,
> so that's something to look out for.  For instance, using pthread
> functions may call malloc as well.  If a problem can be solved by
> changing another part of Cygwin, don't hesitate to discuss this!

Yes, a couple of the malloc packages I'm testing want to allocate locks and TLS 
slots right off the bat so there's nasty recursion possible.
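
One common way to break that kind of recursion, sketched here under
assumptions and not taken from any of the packages named above: while
the allocator is still initializing, requests are served from a small
static bump arena.  guarded_malloc, allocator_init, is_boot_pointer and
the arena size are all hypothetical, the system malloc stands in for the
real package, and the sketch ignores thread safety.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BOOT_ARENA_BYTES 4096

/* Union whose alignment suits any basic type; the arena is built from it. */
typedef union { long long ll; long double ld; void *ptr; } boot_unit_t;

static boot_unit_t boot_arena[BOOT_ARENA_BYTES / sizeof(boot_unit_t)];
static size_t boot_used;      /* bytes handed out from boot_arena */
static int initializing;      /* nonzero while allocator_init() runs */
static int initialized;

void *guarded_malloc(size_t size);

/* Bump allocation from the static arena; blocks are never returned. */
static void *boot_alloc(size_t size)
{
    size_t unit = sizeof(boot_unit_t);

    if (size > sizeof boot_arena)
        return NULL;
    size_t rounded = (size + unit - 1) / unit * unit;
    if (rounded > sizeof boot_arena - boot_used)
        return NULL;
    void *p = (unsigned char *)boot_arena + boot_used;
    boot_used += rounded;
    return p;
}

/* Stand-in for package start-up that itself needs memory (locks, TLS). */
static void allocator_init(void)
{
    initializing = 1;
    void *lock_storage = guarded_malloc(64);  /* re-enters the allocator */
    (void)lock_storage;
    initializing = 0;
    initialized = 1;
}

void *guarded_malloc(size_t size)
{
    if (initializing)
        return boot_alloc(size);   /* the recursive call lands here */
    if (!initialized)
        allocator_init();
    return malloc(size);           /* stands in for the real package */
}

/* True if p came from the bootstrap arena; a free() wrapper must skip it. */
int is_boot_pointer(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)boot_arena;
    return a >= lo && a < lo + sizeof boot_arena;
}

size_t boot_bytes_used(void)
{
    return boot_used;
}
```

The price is that a matching free wrapper has to recognize bootstrap
pointers and leave them alone, and the arena must be large enough for
everything the package allocates while starting up.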

>> I've switched to a
>> plug-in sort of implementation that allows one to choose among several
>> malloc packages: "original", dlmalloc (w/ internal locking), ptmalloc[23],
>> nedalloc, jemalloc, and a Windows Heap wrapper.  Perhaps tcmalloc in the
>> future.  One sets an environment variable CYGMALLOC=<name> before launching
>> a program and that malloc implementation is used.  This should make testing
>> and benchmarking the various choices possible.  I don't expect big
>> improvements in individual programs (unless they are stress testing), but
>> something like a large configure or build should give more useful data.
> 
> In the end, we should settle for a single malloc implementation, though.
> It doesn't really matter if it's jemalloc, ptmalloc, xymalloc.  Almost
> all other modern mallocs are faster and better suited for multi-threading
> than dlmalloc, *especially* if the above locks can go away.

For sure; I didn't make it clear this CYGMALLOC setup is just for testing the 
different malloc packages.  When I stumble across some failing in one of them 
it's nice to be able to quickly re-run using a different malloc.

Here's a question I didn't expect to come up: If it turns out a home-grown 
wrapper on the Win32 HeapXXX functions performs better (hint: it does, 2.5 to 3 
times better) than any malloc package derived from dlmalloc, is there any reason 
why we ought not use it?  Assuming it can be made to work for all those cases 
you mentioned above, of course.

> The only danger here is this: If you manage to get dlmalloc replaced
> reliably, you *will* get a pink plush hippo!

Oh, gee, that sounds like a really nice reward... Wow, I'm gonna have to do this 
project now for sure!

..mark


* Re: More (?) steps toward jemalloc within Cygwin DLL
  2020-07-21  8:50       ` Mark Geisert
@ 2020-07-21 12:00         ` Corinna Vinschen
  2020-07-21 22:06         ` Ford, Brian
  1 sibling, 0 replies; 7+ messages in thread
From: Corinna Vinschen @ 2020-07-21 12:00 UTC
  To: cygwin-developers

Hi Mark,

On Jul 21 01:50, Mark Geisert wrote:
> Corinna Vinschen wrote:
> > [...]
> > ...the big problem is the set of dependencies on malloc during Cygwin
> > startup, especially in fork/exec.  The real challenge is to get the new
> > malloc to still let Cygwin processes start up correctly the first time,
> > and especially in fork/exec situations, and to make sure the malloc
> > bookkeeping survives fork/exec.
> 
> O.K., understood.
> 
> > These malloc dependencies sometimes crop up in the weirdest situations,
> > so that's something to look out for.  For instance, using pthread
> > functions may call malloc as well.  If a problem can be solved by
> > changing another part of Cygwin, don't hesitate to discuss this!
> 
> Yes, a couple of the malloc packages I'm testing want to allocate locks and
> TLS slots right off the bat so there's nasty recursion possible.

Given these locks are process-only, it's probably a good idea to overload
the functions with equivalent WinAPI function calls using, for instance,
Slim Reader/Writer (SRW) locks.  If these locks are stored as global NO_COPY
objects, they don't even have to be initialized explicitly at process
startup.  If they have to be created dynamically, they should probably go
into the cygheap, so they are duplicated automagically.
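
A minimal sketch of the statically initialized variant, assuming a
hypothetical lock named heap_lock that guards a counter.  On Windows it
uses the documented SRWLOCK_INIT, AcquireSRWLockExclusive and
ReleaseSRWLockExclusive; the pthread branch is only a stand-in so the
sketch also builds elsewhere.  Cygwin's NO_COPY marker is internal to the
Cygwin sources and is left out here.

```c
#ifdef _WIN32
#include <windows.h>

/* SRWLOCK_INIT is a constant initializer, so no start-up code has to
   run before the lock can be used. */
static SRWLOCK heap_lock = SRWLOCK_INIT;

static void heap_lock_acquire(void) { AcquireSRWLockExclusive(&heap_lock); }
static void heap_lock_release(void) { ReleaseSRWLockExclusive(&heap_lock); }
#else
#include <pthread.h>

/* Portable stand-in with the same "no explicit init" property. */
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

static void heap_lock_acquire(void) { pthread_mutex_lock(&heap_lock); }
static void heap_lock_release(void) { pthread_mutex_unlock(&heap_lock); }
#endif

/* Hypothetical shared state protected by heap_lock. */
static long protected_counter;

/* Increment the counter under the lock and return the new value. */
long bump_protected_counter(void)
{
    heap_lock_acquire();
    long value = ++protected_counter;
    heap_lock_release();
    return value;
}
```

Because the lock needs no allocation and no initialization call, taking
it cannot recurse back into malloc.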

> [...]
> Here's a question I didn't expect to come up: If it turns out a home-grown
> wrapper on the Win32 HeapXXX functions performs better (hint: it does, 2.5
> to 3 times better) than any malloc package derived from dlmalloc, is there
> any reason why we ought not use it?  Assuming it can be made to work for all
> those cases you mentioned above, of course.

It won't work with fork.  Malloc'd memory has to be duplicated in the exact
same spot during fork (think application pointers to allocated memory).
Windows heaps are subject to ASLR, and the mechanism by which the heap
allocates and frees memory is not built for reproducibility in another
process.

That's why we have our own process heap given away via sbrk, as well as
diligent bookkeeping of mmap'ed memory.
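
The requirement can be seen from the application's side with a small
POSIX sketch (heap_pointer_survives_fork is a hypothetical name).  The
child reads through a pointer whose numeric value it inherited from the
parent, so the block has to exist at that same address in the child.  A
native fork gets this for free by duplicating the address space; an
allocator used under Cygwin's fork has to make it hold when the child's
memory is rebuilt.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child found the parent's data at the same address,
   0 if it did not, and -1 if malloc, fork or waitpid failed. */
int heap_pointer_survives_fork(void)
{
    char *msg = malloc(16);
    if (msg == NULL)
        return -1;
    strcpy(msg, "hello");

    pid_t pid = fork();
    if (pid < 0) {
        free(msg);
        return -1;
    }
    if (pid == 0) {
        /* Child: msg still holds the parent's address for the block. */
        _exit(strcmp(msg, "hello") == 0 ? 0 : 1);
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    free(msg);
    if (waited < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```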

> > The only danger here is this: If you manage to get dlmalloc replaced
> > reliably, you *will* get a pink plush hippo!
> 
> Oh, gee, that sounds like a really nice reward... Wow, I'm gonna have to do
> this project now for sure!

I'm really looking forward to it!


Thanks,
Corinna

-- 
Corinna Vinschen
Cygwin Maintainer


* RE: More (?) steps toward jemalloc within Cygwin DLL
  2020-07-21  8:50       ` Mark Geisert
  2020-07-21 12:00         ` Corinna Vinschen
@ 2020-07-21 22:06         ` Ford, Brian
  1 sibling, 0 replies; 7+ messages in thread
From: Ford, Brian @ 2020-07-21 22:06 UTC
  To: cygwin-developers

FWIW, we found Intel TBB malloc
(https://software.intel.com/content/www/us/en/develop/tools/threading-building-blocks.html)
to be necessary for the performance of our native multi-threaded Windows
app, up until Windows 10-based OSes, when the native heap became competitive
(and maybe even slightly better).



