public inbox for glibc-bugs@sourceware.org
* [Bug malloc/26663] New: malloc() doesn't handle MALLOC_MMAP_THRESHOLD_=0 well
@ 2020-09-24 13:36 toiwoton at gmail dot com
  2020-09-24 14:53 ` [Bug malloc/26663] " carlos at redhat dot com
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: toiwoton at gmail dot com @ 2020-09-24 13:36 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=26663

            Bug ID: 26663
           Summary: malloc() doesn't handle MALLOC_MMAP_THRESHOLD_=0 well
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: malloc
          Assignee: unassigned at sourceware dot org
          Reporter: toiwoton at gmail dot com
  Target Milestone: ---

Using the environment variable MALLOC_MMAP_THRESHOLD_=0 may cause malloc() to
fail very easily. A case in point is the startup of systemd-cryptsetup, which
fails when the lvm2 library can't malloc() something
(https://github.com/lvmteam/lvm2/issues/39). This is probably a bug in the
kernel; there's plenty of RAM available, and I suppose the memory shouldn't
ever be too fragmented for page-sized items.

But perhaps malloc() should use a better allocation strategy when mmap()ing:
for example, mmap() larger areas at once, use them as arenas, and hand them
out to malloc() users as needed.

Maybe, when mmap() specifically fails with EAGAIN, it could also be retried a
few times. It looks like sysmalloc() only checks for MAP_FAILED
(https://sourceware.org/git?p=glibc.git;a=blob;f=malloc/malloc.c;h=cd9933b4e580a58a694ebf34e76ac6fecee29c14;hb=HEAD#l2330)
but not errno. Perhaps some unused memory could also be released by
munmap()ping unused parts of arenas etc. before retrying.
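
A minimal sketch of the kind of program that hits this (hypothetical repro;
the real failure was inside lvm2, and whether malloc() actually fails depends
on kernel and process limits):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Run as: MALLOC_MMAP_THRESHOLD_=0 ./repro
       With the threshold at zero, each allocation that misses the free
       list becomes its own mmap() call. */
    for (int i = 0; i < 100000; i++) {
        void *p = malloc(4096);           /* one page per request */
        if (p == NULL) {
            fprintf(stderr, "malloc failed after %d allocations\n", i);
            return 1;
        }
        /* deliberately leaked so the mappings stay alive */
    }
    return 0;
}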

-Topi


* [Bug malloc/26663] malloc() doesn't handle MALLOC_MMAP_THRESHOLD_=0 well
  2020-09-24 13:36 [Bug malloc/26663] New: malloc() doesn't handle MALLOC_MMAP_THRESHOLD_=0 well toiwoton at gmail dot com
@ 2020-09-24 14:53 ` carlos at redhat dot com
  2020-09-24 15:20 ` toiwoton at gmail dot com
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: carlos at redhat dot com @ 2020-09-24 14:53 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=26663

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2020-09-24
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Topi Miettinen from comment #0)
> Using the environment variable MALLOC_MMAP_THRESHOLD_=0 may cause malloc()
> to fail very easily. A case in point is the startup of systemd-cryptsetup,
> which fails when the lvm2 library can't malloc() something
> (https://github.com/lvmteam/lvm2/issues/39). This is probably a bug in the
> kernel; there's plenty of RAM available, and I suppose the memory shouldn't
> ever be too fragmented for page-sized items.

In the upstream ticket I've asked "Why?" It's important we understand your use
case so we can give better feedback about solutions.

In this bug we'll talk about the specific behaviour of malloc.

Firstly, you need to be aware that at some point you'll hit MALLOC_MMAP_MAX_,
which is 65,536 mappings, after which malloc stops allocating via mmap().

Secondly, the MALLOC_MMAP_THRESHOLD_ value is the threshold at which requests
to extend the heap will go directly to mmap() instead of growing the current
arena (also done via mmap()).

Thus even if you set MALLOC_MMAP_THRESHOLD_ to a low value, you will still use
the process heap; it is only when you run out of heap that you will switch to
servicing all subsequent allocations via individual mmap() calls (instead of
extending the main arena with another, larger mmap to service current and
future requests).

So the first thing to check is the value of MALLOC_MMAP_MAX_ and see if that is
set high enough for your expectations.
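
For completeness, both knobs are also reachable programmatically via
mallopt(); a minimal sketch (the values are illustrative, not
recommendations):

#include <malloc.h>
#include <stdlib.h>

int main(void) {
    /* Equivalent of MALLOC_MMAP_THRESHOLD_=0: service requests that miss
       the free list with individual mmap() calls. */
    mallopt(M_MMAP_THRESHOLD, 0);
    /* Raise the cap on concurrent mmap()ed allocations (default 65536). */
    mallopt(M_MMAP_MAX, 1 << 20);

    void *p = malloc(4096);   /* now likely serviced by its own mapping */
    free(p);
    return 0;
}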

I would also use `strace -ff -ttt` on the process to see if mmap() is actually
failing or not, and with what values.

If you want deeper insight into your application's behaviour you can use the
trace/dump tooling for the malloc API calls:
https://pagure.io/glibc-malloc-trace-utils
(not yet integrated upstream)

> But perhaps malloc() should use a better allocation strategy when mmap()ing,
> for example mmap() larger areas at once, use them as arenas and deal them to
> malloc() users as needed.

This suggestion is exactly what it does *normally*, but the setting of
MALLOC_MMAP_THRESHOLD_=0 bypasses that and attempts to *always* service
extension requirements with single mmap() calls.

> Maybe also when mmap() specifically fails with EAGAIN it could be retried a
> few times. It looks like sysmalloc() only checks for MAP_FAILED
> (https://sourceware.org/git?p=glibc.git;a=blob;f=malloc/malloc.c;
> h=cd9933b4e580a58a694ebf34e76ac6fecee29c14;hb=HEAD#l2330) but not errno.
> Perhaps also some unused memory could be released by munmap()ping unused
> parts of arenas etc. before retrying.

Calls to malloc() should not block retrying mmap(); they should fail
immediately and report that failure to higher software layers that have better
visibility into the overall allocation strategy. The retry should happen at
higher levels. From the allocator's perspective there is no difference between
EAGAIN and ENOMEM.
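
A minimal sketch of that division of responsibility (release_caches() is a
hypothetical application-level hook, not a glibc API):

#include <stddef.h>
#include <stdlib.h>

extern void release_caches(void);   /* hypothetical: app frees what it can */

void *alloc_with_retry(size_t n, int attempts)
{
    for (int i = 0; i < attempts; i++) {
        void *p = malloc(n);
        if (p != NULL)
            return p;               /* the allocator itself failed fast */
        release_caches();           /* higher layer decides what to give up */
    }
    return NULL;                    /* report the failure upward */
}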

There is one place where we can improve, though: tracking the amount of memory
the process currently has cached, and keeping it at a constant percentage by
walking the cached chunk lists and freeing down to an acceptable percentage,
e.g. calling malloc_trim(0) but terminating when a certain percentage has been
freed.
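
An application can approximate that today with malloc_trim(); a minimal sketch
of trimming once on failure (a caller-side workaround, not the in-allocator
improvement described above):

#include <malloc.h>
#include <stdlib.h>

void *alloc_or_trim(size_t n)
{
    void *p = malloc(n);
    if (p == NULL && malloc_trim(0) == 1)   /* 1 means memory was released */
        p = malloc(n);                      /* retry once after trimming */
    return p;
}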

Does this answer your questions?

If it does, then I'll mark this RESOLVED/NOTABUG.


* [Bug malloc/26663] malloc() doesn't handle MALLOC_MMAP_THRESHOLD_=0 well
  2020-09-24 13:36 [Bug malloc/26663] New: malloc() doesn't handle MALLOC_MMAP_THRESHOLD_=0 well toiwoton at gmail dot com
  2020-09-24 14:53 ` [Bug malloc/26663] " carlos at redhat dot com
@ 2020-09-24 15:20 ` toiwoton at gmail dot com
  2020-09-24 15:55 ` toiwoton at gmail dot com
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: toiwoton at gmail dot com @ 2020-09-24 15:20 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=26663

--- Comment #2 from Topi Miettinen <toiwoton at gmail dot com> ---
Created attachment 12863
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12863&action=edit
strace


* [Bug malloc/26663] malloc() doesn't handle MALLOC_MMAP_THRESHOLD_=0 well
  2020-09-24 13:36 [Bug malloc/26663] New: malloc() doesn't handle MALLOC_MMAP_THRESHOLD_=0 well toiwoton at gmail dot com
  2020-09-24 14:53 ` [Bug malloc/26663] " carlos at redhat dot com
  2020-09-24 15:20 ` toiwoton at gmail dot com
@ 2020-09-24 15:55 ` toiwoton at gmail dot com
  2020-09-24 16:42 ` carlos at redhat dot com
  2024-01-11  9:39 ` fweimer at redhat dot com
  4 siblings, 0 replies; 6+ messages in thread
From: toiwoton at gmail dot com @ 2020-09-24 15:55 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=26663

--- Comment #3 from Topi Miettinen <toiwoton at gmail dot com> ---
(In reply to Carlos O'Donell from comment #1)
> (In reply to Topi Miettinen from comment #0)
> > Using the environment variable MALLOC_MMAP_THRESHOLD_=0 may cause
> > malloc() to fail very easily. A case in point is the startup of
> > systemd-cryptsetup, which fails when the lvm2 library can't malloc()
> > something (https://github.com/lvmteam/lvm2/issues/39). This is probably
> > a bug in the kernel; there's plenty of RAM available, and I suppose the
> > memory shouldn't ever be too fragmented for page-sized items.
> 
> In the upstream ticket I've asked "Why?" It's important we understand your
> use case so we can give better feedback about solutions.
> 
> In this bug we'll talk about the specific behaviour of malloc.
> 
> Firstly, you need to be aware that at some point you'll hit
> MALLOC_MMAP_MAX_, which is 65,536 mappings, after which malloc stops
> allocating via mmap().

I suppose that this limit will not be an issue, since other services and user
applications worked just fine.

> Secondly, the MALLOC_MMAP_THRESHOLD_ value is the threshold at which
> requests to extend the heap will go directly to mmap() instead of growing
> the current arena (also done via mmap()).
> 
> Thus even if you set MALLOC_MMAP_THRESHOLD_ to a low value, you will still
> use the process heap; it is only when you run out of heap that you will
> switch to servicing all subsequent allocations via individual mmap() calls
> (instead of extending the main arena with another, larger mmap to service
> current and future requests).
> 
> So the first thing to check is the value of MALLOC_MMAP_MAX_ and see if that
> is set high enough for your expectations.

I specifically used zero, since I wanted malloc() to always use mmap() for
allocating memory. The original idea was based on my expectation that the
mmap()ed regions would then be randomly distributed in the process address
space ("ASLR for malloc()"), but sadly that does not seem to be the case, as
the addresses are pretty much consecutive. Performance also seems to be worse
(judging just from the noise of the CPU fan).
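
A minimal probe of the placement might look like this (allocations
deliberately leaked; just an illustration, not the exact test I ran):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Run as: MALLOC_MMAP_THRESHOLD_=0 ./probe
       Consecutive addresses here mean the mappings are not randomized
       relative to each other. */
    for (int i = 0; i < 8; i++)
        printf("%p\n", malloc(4096));
    return 0;
}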

> I would also use `strace -ff -ttt` on the process to see if mmap() is
> actually failing or not, and with what values.

I've attached a strace file. The problem happens here:

1600959989.943625 mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 EAGAIN (Resource temporarily
unavailable)

My expectation for the kernel is that this should only happen if the kernel was
really unable to allocate a single page, which should never be the case when
there are gigabytes of free RAM in the system.

> If you want deeper insight into your application's behaviour you can use the
> trace/dump tooling for the malloc API calls:
> https://pagure.io/glibc-malloc-trace-utils
> (not yet integrated upstream)
>  
> > But perhaps malloc() should use a better allocation strategy when
> > mmap()ing: for example, mmap() larger areas at once, use them as arenas,
> > and hand them out to malloc() users as needed.
> 
> This suggestion is exactly what it does *normally*, but the setting of
> MALLOC_MMAP_THRESHOLD_=0 bypasses that and attempts to *always* service
> extension requirements with single mmap() calls.

I meant that instead of one mmap() call per malloc() call, it would grab a
larger area with mmap() and then use that.

I thought that MALLOC_MMAP_THRESHOLD_=0 would just switch from sbrk() to
mmap() but that otherwise things would stay the same. If
MALLOC_MMAP_THRESHOLD_=0 is not so great a tool to achieve this, could there
perhaps be another option to just force mmap() use?

> > Maybe, when mmap() specifically fails with EAGAIN, it could also be
> > retried a few times. It looks like sysmalloc() only checks for MAP_FAILED
> > (https://sourceware.org/git?p=glibc.git;a=blob;f=malloc/malloc.c;
> > h=cd9933b4e580a58a694ebf34e76ac6fecee29c14;hb=HEAD#l2330) but not errno.
> > Perhaps some unused memory could also be released by munmap()ping unused
> > parts of arenas etc. before retrying.
> 
> Calls to malloc() should not block retrying mmap(); they should fail
> immediately and report that failure to higher software layers that have
> better visibility into the overall allocation strategy. The retry should
> happen at higher levels. From the allocator's perspective there is no
> difference between EAGAIN and ENOMEM.

OK. I think most callers just check for a return value of NULL and don't look
at the errno code.

> There is one place where we can improve, though: tracking the amount of
> memory the process currently has cached, and keeping it at a constant
> percentage by walking the cached chunk lists and freeing down to an
> acceptable percentage, e.g. calling malloc_trim(0) but terminating when a
> certain percentage has been freed.

That could also work. Though if the problem is on the kernel side, maybe
mmap() would eventually fail again later.

> 
> Does this answer your questions?
> 
> If it does, then I'll mark this RESOLVED/NOTABUG.

My only remaining question is: could a new flag be introduced to instruct
malloc() to never use sbrk() and the process heap, but otherwise behave
normally while using mmap() for allocating? That way, if one day mmap() were
randomized, the memory arenas would be at random locations.

I think the current feature would also work to a degree (minus the possible
performance issue) with improvements to the kernel (avoid failing, and
randomize the address).

Maybe I could also force the randomization with a special allocator, by
plugging into the malloc internals and using LD_PRELOAD to force this for
critical applications. I suppose then nothing would need changing, but it's a
hack. Perhaps glibc could also do this, for example as another malloc()
option?
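
A rough sketch of such an interposer (grossly simplified: a real one would
also have to interpose calloc(), realloc(), memalign() etc., or mixing with
the libc allocator will crash):

#include <stddef.h>
#include <sys/mman.h>

#define HDR 16   /* keep malloc's 16-byte alignment; stores mapping size */

void *malloc(size_t n)
{
    size_t total = (n + HDR + 4095) & ~(size_t)4095;  /* round up to pages */
    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *(size_t *)p = total;           /* remember the size for free() */
    return (char *)p + HDR;
}

void free(void *q)
{
    if (q != NULL) {
        void *p = (char *)q - HDR;
        munmap(p, *(size_t *)p);    /* unmap the whole original mapping */
    }
}

Built with gcc -shared -fPIC and injected with LD_PRELOAD, every allocation
gets its own mapping; actual randomization would still depend on the kernel's
mmap() placement.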

-Topi


* [Bug malloc/26663] malloc() doesn't handle MALLOC_MMAP_THRESHOLD_=0 well
  2020-09-24 13:36 [Bug malloc/26663] New: malloc() doesn't handle MALLOC_MMAP_THRESHOLD_=0 well toiwoton at gmail dot com
                   ` (2 preceding siblings ...)
  2020-09-24 15:55 ` toiwoton at gmail dot com
@ 2020-09-24 16:42 ` carlos at redhat dot com
  2024-01-11  9:39 ` fweimer at redhat dot com
  4 siblings, 0 replies; 6+ messages in thread
From: carlos at redhat dot com @ 2020-09-24 16:42 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=26663

--- Comment #4 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Topi Miettinen from comment #3)
> (In reply to Carlos O'Donell from comment #1)
> > (In reply to Topi Miettinen from comment #0)
> > Secondly, the MALLOC_MMAP_THRESHOLD_ value is the threshold at which
> > requests to extend the heap will go directly to mmap() instead of growing
> > the current arena (also done via mmap()).
> > 
> > Thus even if you set MALLOC_MMAP_THRESHOLD_ to a low value, you will still
> > use the process heap; it is only when you run out of heap that you will
> > switch to servicing all subsequent allocations via individual mmap() calls
> > (instead of extending the main arena with another, larger mmap to service
> > current and future requests).
> > 
> > So the first thing to check is the value of MALLOC_MMAP_MAX_ and see if that
> > is set high enough for your expectations.
> 
> I specifically used zero, since I wanted malloc() to always use mmap() for
> allocating memory. The original idea was based on my expectation that the
> mmap()ed regions would then be randomly distributed in the process address
> space ("ASLR for malloc()"), but sadly that does not seem to be the case, as
> the addresses are pretty much consecutive. Performance also seems to be
> worse (judging just from the noise of the CPU fan).

Setting MALLOC_MMAP_THRESHOLD_ to zero does not force all allocations to go
through mmap(). The man page qualifies this with "that can't be satisfied by
the free list", and the free list may come from pages in the heap. So it's
close to what you want semantically, but I just wanted to point out that the
behaviour is a little different. The algorithm will use existing pages if
available; otherwise it will start calling mmap() all the time, and that will
cause *terrible* performance.

> I've attached a strace file. The problem happens here:
> 
> 1600959989.943625 mmap(NULL, 4096, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 EAGAIN (Resource temporarily
> unavailable)
> 
> My expectation for the kernel is that this should only happen if the kernel
> was really unable to allocate a single page, which should never be the case
> when there are gigabytes of free RAM in the system.

I can't help you there. I don't know why you get EAGAIN. Perhaps you have some
process soft limits you are hitting with mappings?

cat /proc/sys/vm/max_map_count?
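
A small sketch for dumping the candidate process limits via getrlimit():

#include <stdio.h>
#include <sys/resource.h>

static void show(const char *name, int res)
{
    struct rlimit rl;
    if (getrlimit(res, &rl) == 0)
        printf("%-15s soft=%llu hard=%llu\n", name,
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
}

int main(void)
{
    show("RLIMIT_AS", RLIMIT_AS);           /* total address space */
    show("RLIMIT_DATA", RLIMIT_DATA);       /* data segment size */
    show("RLIMIT_MEMLOCK", RLIMIT_MEMLOCK); /* locked memory */
    return 0;
}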

> > This suggestion is exactly what it does *normally*, but the setting of
> > MALLOC_MMAP_THRESHOLD_=0 bypasses that and attempts to *always* service
> > extension requirements with single mmap() calls.
> 
> I meant that instead of one mmap() call per malloc() call, it would grab a
> larger area with mmap() and then use that.

That is exactly what the algorithm does if you *don't* set
MALLOC_MMAP_THRESHOLD_=0, but there is one case, the "main" arena, where
sbrk() is used instead of mmap().

> I thought that MALLOC_MMAP_THRESHOLD_=0 would just switch from sbrk() to
> mmap() but that otherwise things would stay the same. If
> MALLOC_MMAP_THRESHOLD_=0 is not so great a tool to achieve this, could there
> perhaps be another option to just force mmap() use?

Just to be totally clear again:

- The use of MALLOC_MMAP_THRESHOLD_=0 will cause all allocations that can't be
serviced from the existing free list (pages taken from sbrk() in the main
arena, or pages taken from mmap() in all other arenas) to be serviced by
individual mmap() calls.

- If mmap() fails under MALLOC_MMAP_THRESHOLD_=0, then the arena will be
extended by mmap() or sbrk() (main arena), and if that fails too, we fail the
allocation.

So it is close to the semantics you want.

Can you provide a compelling use case to avoid sbrk()?

> > > Maybe, when mmap() specifically fails with EAGAIN, it could also be
> > > retried a few times. It looks like sysmalloc() only checks for
> > > MAP_FAILED (https://sourceware.org/git?p=glibc.git;a=blob;f=malloc/
> > > malloc.c;h=cd9933b4e580a58a694ebf34e76ac6fecee29c14;hb=HEAD#l2330) but
> > > not errno. Perhaps some unused memory could also be released by
> > > munmap()ping unused parts of arenas etc. before retrying.
> > 
> > Calls to malloc() should not block retrying mmap(); they should fail
> > immediately and report that failure to higher software layers that have
> > better visibility into the overall allocation strategy. The retry should
> > happen at higher levels. From the allocator's perspective there is no
> > difference between EAGAIN and ENOMEM.
> 
> OK. I think most callers just check for a return value of NULL and don't
> look at the errno code.

The malloc API returns ENOMEM. It is the kernel that may return EAGAIN for
mmap() failures, and the algorithm doesn't care which it was; both are
translated into malloc API failures, i.e. a null pointer is returned and errno
is set to ENOMEM. We don't know when the memory will become available again;
only the application can know how to free what it doesn't need (though
ENOMEM/EAGAIN in glibc could trigger trimming, and today it doesn't).
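
A minimal sketch of what that looks like at the API boundary (the oversized
request is just a way to force a failure):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    void *p = malloc((size_t)-1);   /* can never be satisfied */
    if (p == NULL)                  /* errno is ENOMEM regardless of what
                                       the kernel told mmap() */
        printf("malloc: %s (errno=%d)\n", strerror(errno), errno);
    free(p);
    return 0;
}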

> > There is one place where we can improve, though: tracking the amount of
> > memory the process currently has cached, and keeping it at a constant
> > percentage by walking the cached chunk lists and freeing down to an
> > acceptable percentage, e.g. calling malloc_trim(0) but terminating when a
> > certain percentage has been freed.
> 
> That could also work. Though if the problem is on the kernel side, maybe
> mmap() would eventually fail again later.

Correct.

> > 
> > Does this answer your questions?
> > 
> > If it does, then I'll mark this RESOLVED/NOTABUG.
> 
> My only remaining question is: could a new flag be introduced to instruct
> malloc() to never use sbrk() and the process heap, but otherwise behave
> normally while using mmap() for allocating? That way, if one day mmap() were
> randomized, the memory arenas would be at random locations.

But the memory wouldn't be at random locations: as soon as we start caching
the result of an mmap(), subsequent malloc()s return consecutive locations.

What exactly are you trying to achieve here?

What is the attack vector if this is a security issue?

> Maybe I could also force the randomization with a special allocator, by
> plugging into the malloc internals and using LD_PRELOAD to force this for
> critical applications. I suppose then nothing would need changing, but it's
> a hack. Perhaps glibc could also do this, for example as another malloc()
> option?

We need to understand your use case if we're going to make improvements.
Perhaps there is a better way of doing this. Perhaps, as you suggest, you want
to use some other ultra-hardened malloc that you can interpose with LD_PRELOAD
to provide additional security guarantees at the cost of performance
(something the generic system allocator may not want to do by default).


* [Bug malloc/26663] malloc() doesn't handle MALLOC_MMAP_THRESHOLD_=0 well
  2020-09-24 13:36 [Bug malloc/26663] New: malloc() doesn't handle MALLOC_MMAP_THRESHOLD_=0 well toiwoton at gmail dot com
                   ` (3 preceding siblings ...)
  2020-09-24 16:42 ` carlos at redhat dot com
@ 2024-01-11  9:39 ` fweimer at redhat dot com
  4 siblings, 0 replies; 6+ messages in thread
From: fweimer at redhat dot com @ 2024-01-11  9:39 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=26663

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com
             Status|NEW                         |RESOLVED
         Resolution|---                         |NOTABUG

--- Comment #5 from Florian Weimer <fweimer at redhat dot com> ---
This is caused by the mlockall() call; see the strace in comment 2:

1600959989.934866 mlockall(MCL_FUTURE)  = 0

This makes all future mappings count against the locked-memory limit
(RLIMIT_MEMLOCK), which is fairly low by default; see ulimit -H -l.
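
A minimal sketch of the failure mode (hypothetical repro): with MCL_FUTURE in
effect, every new anonymous mapping is charged against RLIMIT_MEMLOCK, so
mmap() fails with EAGAIN long before free RAM runs out:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    if (mlockall(MCL_FUTURE) != 0) {     /* same call as in the strace */
        perror("mlockall");
        return 1;
    }
    /* Map one page at a time until the locked-memory limit is hit. */
    for (size_t total = 0;; total += 4096) {
        void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            fprintf(stderr, "mmap failed after %zu bytes: %s\n",
                    total, strerror(errno));    /* typically EAGAIN */
            return 0;
        }
    }
}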
