Message-ID: <911272453cd5141dbe1ccbcc1e195296c5573ca0.camel@freedelity.be>
Subject: Re: [PATCH] realloc: Limit chunk reuse to only growing requests [BZ #30579]
From: Nicolas Dusart
To: Siddhesh Poyarekar, libc-alpha@sourceware.org
Cc: Aurelien Jarno
Date: Wed, 05 Jul 2023 09:08:22 +0200
In-Reply-To: <20230704182402.1040962-1-siddhesh@sourceware.org>
References: <20230704182402.1040962-1-siddhesh@sourceware.org>

Thanks for reconsidering the issue and for the quick fix. I can confirm
this patch resolves the issue we have in some of our processes.

My main concern was not that we had no way to mitigate this, but that it
was a transparent change that will very likely impact many projects and
teams in the coming years. I was not expecting it to affect desktop
environments, or to surface this soon. I expected it to cause problems in
server environments, where processes often cache large datasets and would
suddenly suffer a big rise in memory usage. As those environments tend to
stick to stable distributions, it may take some time before they run the
latest version of libc. They could switch to another allocator, but they
would still have had to debug the problem down to the point of identifying
libc as the source of the change. Thanks Aurélien for the report; it
confirms the impact this may have had :)

Siddhesh, if you happen to find a heuristic that is suitable and can save
reallocations for bigger shrinks, may I suggest not reusing an existing
option if its new behaviour does not match the expectations of how it
worked before? After trim_threshold was added to realloc, it was used both
as a way to tweak how much the top of the heap may grow before being
released to the system, and as a way to tweak how much internal chunk
memory may be left unused and still be acceptable. This prevented us from
enjoying one of these optimisations because the other is not applicable in
one of our use cases.
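To make that coupling concrete, here is a minimal sketch of the kind of
tuning involved (the 32 MiB thresholds and the 16 MiB allocation are
made-up illustration values, not our real workload). Before this patch,
raising the trim threshold to keep free space at the top of the heap also
widened the window in which a shrinking realloc kept the whole old block:

#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  /* Keep up to 32 MiB of free space at the top of the heap instead of
     returning it to the kernel, and keep the 16 MiB allocation below off
     the mmap path so it lives on the heap.  The trim setting can also be
     applied without a rebuild via
     GLIBC_TUNABLES=glibc.malloc.trim_threshold=33554432.  */
  mallopt (M_TRIM_THRESHOLD, 32 * 1024 * 1024);
  mallopt (M_MMAP_THRESHOLD, 32 * 1024 * 1024);

  char *p = malloc (16 * 1024 * 1024);
  if (p == NULL)
    return EXIT_FAILURE;

  /* Before this patch, the shrink below returned the old pointer as long
     as the unused tail stayed under trim_threshold, so the raised
     threshold also kept ~16 MiB of mostly-unused memory attached to a
     4 KiB object.  */
  char *q = realloc (p, 4096);
  if (q == NULL)
    {
      free (p);
      return EXIT_FAILURE;
    }
  printf ("usable size after shrink: %zu bytes\n", malloc_usable_size (q));
  free (q);
  return EXIT_SUCCESS;
}
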
I'd argue that if that change were appropriate, it would be because the
first optimisation (trimming only the top of the heap) could be considered
ineffective unless the second optimisation is applied as well. But I think
you'll agree that this is not the case; the first optimisation already had
an appreciable effect on its own. So why prevent libc users from opting
out of the new change while still enjoying the older behaviour that was
available to them? That was my main concern when reporting the issue.
Following the principle of least astonishment, in a way.

Still, the patch is greatly appreciated. Thanks for that and for all your
work on glibc.

Kind Regards,

On Tue, 2023-07-04 at 14:24 -0400, Siddhesh Poyarekar wrote:
> The trim_threshold is too aggressive a heuristic to decide if chunk
> reuse is OK for reallocated memory; for repeated small, shrinking
> allocations it leads to internal fragmentation and for repeated larger
> allocations that fragmentation may blow up even worse due to the
> dynamic nature of the threshold.
>
> Limit reuse only when it is within the alignment padding, which is
> 2 * size_t for heap allocations and a page size for mmapped allocations.
> There's the added wrinkle of THP, but this fix ignores it for now,
> pessimizing that case in favor of keeping fragmentation low.
>
> This resolves BZ #30579.
>
> Signed-off-by: Siddhesh Poyarekar
> Reported-by: Nicolas Dusart
> Reported-by: Aurelien Jarno
> ---
>
> The test case in the bz seems fixed with this, bringing VSZ and RSS back
> to ~40M from ~1G.  Aurelien, can you please test with plasma desktop?
>
> Thanks,
> Sid
>
>
>  malloc/malloc.c | 23 +++++++++++++++--------
>  1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index b8c0f4f580..e2f1a615a4 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -3417,16 +3417,23 @@ __libc_realloc (void *oldmem, size_t bytes)
>    if (__glibc_unlikely (mtag_enabled))
>      *(volatile char*) oldmem;
>
> -  /* Return the chunk as is whenever possible, i.e. there's enough usable space
> -     but not so much that we end up fragmenting the block.  We use the trim
> -     threshold as the heuristic to decide the latter.  */
> -  size_t usable = musable (oldmem);
> -  if (bytes <= usable
> -      && (unsigned long) (usable - bytes) <= mp_.trim_threshold)
> -    return oldmem;
> -
>    /* chunk corresponding to oldmem */
>    const mchunkptr oldp = mem2chunk (oldmem);
> +
> +  /* Return the chunk as is if the request grows within usable bytes, typically
> +     into the alignment padding.  We want to avoid reusing the block for
> +     shrinkages because it ends up unnecessarily fragmenting the address space.
> +     This is also why the heuristic misses alignment padding for THP for
> +     now.  */
> +  size_t usable = musable (oldmem);
> +  if (bytes <= usable)
> +    {
> +      size_t difference = usable - bytes;
> +      if ((unsigned long) difference < 2 * sizeof (INTERNAL_SIZE_T)
> +          || (chunk_is_mmapped (oldp) && difference <= GLRO (dl_pagesize)))
> +        return oldmem;
> +    }
> +
>    /* its size */
>    const INTERNAL_SIZE_T oldsize = chunksize (oldp);
>
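For what it's worth, a rough standalone way to observe the difference (not
the reproducer from the bz; the 100 KiB / 1 KiB sizes are arbitrary) might
look like this. With the trim_threshold-based heuristic the usable size
typically stays near 100 KiB after the shrink, while with this patch (or
on releases before the heuristic was added) the chunk is split and the
usable size drops close to the request:

#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  /* 100 KiB stays below the default mmap threshold, so with default
     malloc parameters this is a heap chunk.  */
  char *p = malloc (100 * 1024);
  if (p == NULL)
    return EXIT_FAILURE;

  printf ("usable before shrink: %zu bytes\n", malloc_usable_size (p));

  /* With the old heuristic the ~99 KiB unused tail is below the default
     128 KiB trim threshold, so the whole block is kept; with this patch
     the tail exceeds the alignment padding, the chunk is split and the
     usable size drops close to the 1 KiB request.  */
  char *q = realloc (p, 1024);
  if (q == NULL)
    {
      free (p);
      return EXIT_FAILURE;
    }
  printf ("usable after shrink:  %zu bytes\n", malloc_usable_size (q));

  free (q);
  return EXIT_SUCCESS;
}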