From: "carlos at redhat dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug malloc/26663] malloc() doesn't handle MALLOC_MMAP_THRESHOLD_=0 well
Date: Thu, 24 Sep 2020 16:42:43 +0000
X-Bugzilla-Product: glibc
X-Bugzilla-Component: malloc
X-Bugzilla-Version: unspecified
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org

https://sourceware.org/bugzilla/show_bug.cgi?id=26663

--- Comment #4 from Carlos O'Donell ---
(In reply to Topi Miettinen from comment #3)
> (In reply to Carlos O'Donell from comment #1)
> > (In reply to Topi Miettinen from comment #0)
> > Secondly, the MALLOC_MMAP_THRESHOLD_ value is the threshold at which
> > requests to extend the heap will go directly to mmap() instead of growing
> > the current arena (also done via mmap()).
> >
> > Thus even if you set MALLOC_MMAP_THRESHOLD_ to a low value, you will still
> > use the process heap, it is only when you run out of heap that you will
> > switch to trying to mmap() all subsequent allocations via single mmap()
> > calls (instead of trying to extend the main arena with another larger single
> > mmap to service current and future requests).
> >
> > So the first thing to check is the value of MALLOC_MMAP_MAX_ and see if that
> > is set high enough for your expectations.
>
> I specifically used zero, since I wanted malloc() to always use mmap() for
> allocating memory. The original idea was based on my expectation that then
> the mmap()ed regions would be randomly distributed in process address space
> ("ASLR for malloc()"), but sadly that does not seem to be the case as the
> addresses are pretty much consecutive. Also performance seems to be worse
> (judging just from noise of CPU fan).

Setting MALLOC_MMAP_THRESHOLD_ to zero does not force all allocations to go
through mmap. The man page makes that clear with the words "that can't be
satisfied by the free list", and the free list may come from pages in the
heap. So it's close to what you want semantically, but I just wanted to point
out that the behaviour is a little different. The algorithm will use existing
pages if available; otherwise it will start calling mmap() all the time, and
that will cause *terrible* performance.

> I've attached a strace file. The problem happens here:
>
> 1600959989.943625 mmap(NULL, 4096, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 EAGAIN (Resource temporarily
> unavailable)
>
> My expectation for the kernel is that this should only happen if the kernel
> was really unable to allocate a single page, which should never be the case
> when there are gigabytes of free RAM in the system.

I can't help you there. I don't know why you get EAGAIN. Perhaps you have some
process soft limits you are hitting with mappings?
cat /proc/sys/vm/max_map_count?

> > This suggestion is exactly what it does *normally*, but the setting of
> > MALLOC_MMAP_THRESHOLD_=0 bypasses that and attempts to *always* service
> > extension requirements with single mmap() calls.
>
> I meant that instead of single mmap() calls per malloc() call, it would grab
> a larger area with mmap() and then use that.

That is exactly what the algorithm does if you *don't* set
MALLOC_MMAP_THRESHOLD_=0, but there is one case, the "main" arena, where sbrk
is used instead of mmap.

> I thought that MALLOC_MMAP_THRESHOLD_=0 would just switch from sbrk() to
> mmap() but otherwise things would stay the same. If MALLOC_MMAP_THRESHOLD_=0
> is not so great tool to achieve this, could there perhaps be another option
> to just force mmap() use?

Just to be totally clear again:

- The use of MALLOC_MMAP_THRESHOLD_=0 will cause all allocations to become
single mmap() calls to service those allocations where they can't be serviced
from the existing free list (pages taken from sbrk in the main arena, or pages
taken from mmap in all other arenas).

- If mmap() fails under MALLOC_MMAP_THRESHOLD_=0 then the arena will be
extended by mmap or sbrk (main arena), and if that fails, then we fail the
allocation.

So it is close to the semantics you want.

Can you provide a compelling use case to avoid sbrk()?

> > > Maybe also when mmap() specifically fails with EAGAIN it could be retried a
> > > few times. It looks like sysmalloc() only checks for MAP_FAILED
> > > (https://sourceware.org/git?p=glibc.git;a=blob;f=malloc/malloc.c;
> > > h=cd9933b4e580a58a694ebf34e76ac6fecee29c14;hb=HEAD#l2330) but not errno.
> > > Perhaps also some unused memory could be released by munmap()ping unused
> > > parts of arenas etc. before retrying.
> >
> > Calls to malloc() should not block retrying mmap(), they should fail
> > immediately and report that failure to higher software layers that have
> > better visibility into overall allocation strategy. The retry should happen
> > at higher levels. From the allocator's perspective there is no difference
> > between EAGAIN and ENOMEM.
>
> OK. I think most callers just look at return value of NULL and don't look at
> errno code.

The malloc API reports ENOMEM. It is the kernel that may return EAGAIN for mmap
failures, and the algorithm doesn't care about those; they are translated into
malloc API failures, i.e. return a null pointer and set errno to ENOMEM. We
don't know when the memory will become available again; only the application
can know how to free what it doesn't need (though ENOMEM/EAGAIN in glibc could
trigger trimming, and it doesn't today).

> > There is one place where we can improve though, and it's in tracking the
> > currently cached amount of memory that the process is using, and keeping
> > that at a constant percentage by walking the cached chunk lists and freeing
> > down to an acceptable percentage e.g. calling malloc_trim(0) but terminating
> > when a certain percentage has been freed.
>
> That could also work. Though if the problem is at kernel side, maybe mmap()
> would eventually fail again later.

Correct.

> >
> > Does this answer your questions?
> >
> > If it does then I'll mark this RESOLVED/NOTABUG
>
> My only remaining question is that can a new flag could be introduced to
> instruct malloc() to never use sbrk() and the process heap but otherwise
> perform normally while using mmap() for allocating? That way, if one day
> mmap() could be randomized, the memory arenas would be at random locations.

But the memory wouldn't be at random locations? As soon as we start caching the
result of an mmap() then subsequent malloc()s return consecutive locations?

What exactly are you trying to achieve here?
What is the attack vector if this is a security issue?

> Maybe I could also force the randomization with a special allocator by
> plugging into malloc internals and using LD_PRELOAD to force this to
> critical applications. I suppose then nothing would need changing, but it's
> a hack. Perhaps also glibc could do this, for example another option to
> malloc()?

We need to understand your use case if we're going to make improvements.

Perhaps there is a better way of doing this. Perhaps as you suggest you want to
use some other ultra-hardened malloc that you can interpose with LD_PRELOAD to
provide you additional security guarantees at the cost of performance
(something that the generic system allocator may not want to do by default).
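
As a rough illustration of the point that retries belong in the caller rather
than inside malloc(), here is a minimal sketch in C. It assumes the application
holds some memory it can give back on demand (a cache, for example). The names
malloc_with_release, release_fn and count_calls are hypothetical and are not
part of glibc.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback type: the application releases memory it can spare
   (for example an internal cache) and returns nonzero if it freed anything. */
typedef int (*release_fn)(void *ctx);

/* Hypothetical helper, not a glibc API: try malloc(), and on failure ask the
   application to release memory before trying again, at most max_retries
   times. The allocator itself is never asked to block or retry. */
void *malloc_with_release(size_t size, release_fn release, void *ctx,
                          int max_retries)
{
    assert(max_retries >= 0);
    for (int attempt = 0; ; attempt++) {
        void *p = malloc(size);
        if (p != NULL)
            return p;
        /* The caller only sees NULL here; whether the kernel reported
           EAGAIN or ENOMEM for mmap() makes no difference at this level. */
        if (attempt >= max_retries || release == NULL || !release(ctx))
            break;
    }
    errno = ENOMEM;
    return NULL;
}

/* Example callback for illustration only: it counts how often it was called
   and claims to have freed something so that the helper retries. */
int count_calls(void *ctx)
{
    ++*(int *)ctx;
    return 1;
}
```

A real callback would drop caches or free buffers the application no longer
needs; the retry limit keeps the caller from looping forever when the
shortage is on the kernel side.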