Message-ID: <911272453cd5141dbe1ccbcc1e195296c5573ca0.camel@freedelity.be>
Subject: Re: [PATCH] realloc: Limit chunk reuse to only growing requests [BZ #30579]
From: Nicolas Dusart
To: Siddhesh Poyarekar, libc-alpha@sourceware.org
Cc: Aurelien Jarno
Date: Wed, 05 Jul 2023 09:08:22 +0200
In-Reply-To: <20230704182402.1040962-1-siddhesh@sourceware.org>
References: <20230704182402.1040962-1-siddhesh@sourceware.org>

Thanks for reconsidering the issue and for the quick fix. I can confirm
this patch resolves the issue we have in some of our processes.

My main concern was not that we had no way to mitigate this, but that it
was a transparent change that will very likely impact many projects and
teams in the coming years. I was not expecting it to affect desktop
environments, or to surface this soon. I expected it to cause problems in
server environments, where processes often cache large datasets and would
suddenly suffer a big rise in memory usage. As those environments tend to
stick to stable distributions, it may take some time before they run the
latest version of libc. They could switch to another allocator, but they
would still have had to debug the problem down to the point of identifying
libc as the source of the change. Thanks Aurélien for the report; it
confirms the impact this may have had :)

Siddhesh, if you happen to find a heuristic that is suitable and can save
reallocations for bigger shrinks, may I suggest not reusing an existing
option if its new behaviour does not match the expectations of how it
worked before? After trim_threshold was added to realloc, it was used both
as a way to tweak how much the top of the heap may grow before being
released to the system, and as a way to tweak how much internal chunk
memory may be left unused and still be acceptable. This prevented us from
enjoying one of these optimisations because the other is not applicable in
one of our use cases.
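To make that coupling concrete, here is a minimal sketch of the kind of
tuning involved (the 32 MiB thresholds and the 16 MiB allocation are
made-up illustration values, not our real workload). Before this patch,
raising the trim threshold to keep free space at the top of the heap also
widened the window in which a shrinking realloc kept the whole old block:

#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  /* Keep up to 32 MiB of free space at the top of the heap instead of
     returning it to the kernel, and keep the 16 MiB allocation below off
     the mmap path so it lives on the heap.  The trim setting can also be
     applied without a rebuild via
     GLIBC_TUNABLES=glibc.malloc.trim_threshold=33554432.  */
  mallopt (M_TRIM_THRESHOLD, 32 * 1024 * 1024);
  mallopt (M_MMAP_THRESHOLD, 32 * 1024 * 1024);

  char *p = malloc (16 * 1024 * 1024);
  if (p == NULL)
    return EXIT_FAILURE;

  /* Before this patch, the shrink below returned the old pointer as long
     as the unused tail stayed under trim_threshold, so the raised
     threshold also kept ~16 MiB of mostly-unused memory attached to a
     4 KiB object.  */
  char *q = realloc (p, 4096);
  if (q == NULL)
    {
      free (p);
      return EXIT_FAILURE;
    }
  printf ("usable size after shrink: %zu bytes\n", malloc_usable_size (q));
  free (q);
  return EXIT_SUCCESS;
}
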
I'd argue that if that change were appropriate, it would be because the
first optimisation (trimming only the top of the heap) could be considered
ineffective unless the second optimisation is applied as well. But I think
you'll agree that this is not the case; the first optimisation already had
an appreciable effect on its own. So why prevent libc users from opting
out of the new change while still enjoying the older behaviour that was
available to them? That was my main concern when reporting the issue.
Following the principle of least astonishment, in a way.

Still, the patch is greatly appreciated. Thanks for that and for all your
work on glibc.

Kind Regards,

On Tue, 2023-07-04 at 14:24 -0400, Siddhesh Poyarekar wrote:
> The trim_threshold is too aggressive a heuristic to decide if chunk
> reuse is OK for reallocated memory; for repeated small, shrinking
> allocations it leads to internal fragmentation and for repeated larger
> allocations that fragmentation may blow up even worse due to the
> dynamic nature of the threshold.
>
> Limit reuse only when it is within the alignment padding, which is
> 2 * size_t for heap allocations and a page size for mmapped allocations.
> There's the added wrinkle of THP, but this fix ignores it for now,
> pessimizing that case in favor of keeping fragmentation low.
>
> This resolves BZ #30579.
>
> Signed-off-by: Siddhesh Poyarekar
> Reported-by: Nicolas Dusart
> Reported-by: Aurelien Jarno
> ---
>
> The test case in the bz seems fixed with this, bringing VSZ and RSS back
> to ~40M from ~1G.  Aurelien, can you please test with plasma desktop?
>
> Thanks,
> Sid
>
>
>  malloc/malloc.c | 23 +++++++++++++++--------
>  1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index b8c0f4f580..e2f1a615a4 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -3417,16 +3417,23 @@ __libc_realloc (void *oldmem, size_t bytes)
>    if (__glibc_unlikely (mtag_enabled))
>      *(volatile char*) oldmem;
>
> -  /* Return the chunk as is whenever possible, i.e. there's enough usable space
> -     but not so much that we end up fragmenting the block.  We use the trim
> -     threshold as the heuristic to decide the latter.  */
> -  size_t usable = musable (oldmem);
> -  if (bytes <= usable
> -      && (unsigned long) (usable - bytes) <= mp_.trim_threshold)
> -    return oldmem;
> -
>    /* chunk corresponding to oldmem */
>    const mchunkptr oldp = mem2chunk (oldmem);
> +
> +  /* Return the chunk as is if the request grows within usable bytes, typically
> +     into the alignment padding.  We want to avoid reusing the block for
> +     shrinkages because it ends up unnecessarily fragmenting the address space.
> +     This is also why the heuristic misses alignment padding for THP for
> +     now.  */
> +  size_t usable = musable (oldmem);
> +  if (bytes <= usable)
> +    {
> +      size_t difference = usable - bytes;
> +      if ((unsigned long) difference < 2 * sizeof (INTERNAL_SIZE_T)
> +          || (chunk_is_mmapped (oldp) && difference <= GLRO (dl_pagesize)))
> +        return oldmem;
> +    }
> +
>    /* its size */
>    const INTERNAL_SIZE_T oldsize = chunksize (oldp);
>
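For what it's worth, a rough standalone way to observe the difference (not
the reproducer from the bz; the 100 KiB / 1 KiB sizes are arbitrary) might
look like this. With the trim_threshold-based heuristic the usable size
typically stays near 100 KiB after the shrink, while with this patch (or
on releases before the heuristic was added) the chunk is split and the
usable size drops close to the request:

#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  /* 100 KiB stays below the default mmap threshold, so with default
     malloc parameters this is a heap chunk.  */
  char *p = malloc (100 * 1024);
  if (p == NULL)
    return EXIT_FAILURE;

  printf ("usable before shrink: %zu bytes\n", malloc_usable_size (p));

  /* With the old heuristic the ~99 KiB unused tail is below the default
     128 KiB trim threshold, so the whole block is kept; with this patch
     the tail exceeds the alignment padding, the chunk is split and the
     usable size drops close to the 1 KiB request.  */
  char *q = realloc (p, 1024);
  if (q == NULL)
    {
      free (p);
      return EXIT_FAILURE;
    }
  printf ("usable after shrink:  %zu bytes\n", malloc_usable_size (q));

  free (q);
  return EXIT_SUCCESS;
}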