Re: posix_memalign performance regression in 2.38?

public inbox for libc-alpha@sourceware.org

From: Florian Weimer <fweimer@redhat.com>
To: Xi Ruoyao via Libc-alpha <libc-alpha@sourceware.org>
Cc: DJ Delorie <dj@redhat.com>,  Sam James <sam@gentoo.org>,
	 Xi Ruoyao <xry111@xry111.site>,
	 adhemerval.zanella@linaro.org, dilfridge@gentoo.org,
	 timo@rothenpieler.org
Subject: Re: posix_memalign performance regression in 2.38?
Date: Wed, 09 Aug 2023 12:47:39 +0200
Message-ID: <871qgc1ohg.fsf@oldenburg.str.redhat.com>
In-Reply-To: <e6e8c4f3f0abee73966021f6f207eedd51b77049.camel@xry111.site> (Xi Ruoyao via Libc-alpha's message of "Tue, 08 Aug 2023 16:08:32 +0800")

* Xi Ruoyao via Libc-alpha:

> On Mon, 2023-08-07 at 23:38 -0400, DJ Delorie wrote:
>> 
>> Reproduced.
>> 
>> In the case where I reproduced it, the most common problematic case was
>> an allocation of 64-byte aligned chunks of 472 bytes, where 30 smallbin
>> chunks were tested without finding a match.
>> 
>> The most common non-problematic case was a 64-byte-aligned request for
>> 24 bytes.
>> 
>> There were a LOT of other size requests.  The smallest I saw was TWO
>> bytes.  WHY?  I'm tempted to not fix this, to teach developers to not
>> use posix_memalign() unless they REALLY need it ;-)
>
>
> Have you tested this?
>
> $ cat t.c
> #include <stdlib.h>
> int main()
> {
> 	void *buf;
> 	for (int i = 0; i < (1 << 16); i++)
> 		posix_memalign(&buf, 64, 64);
> }
>
> To me this is quite reasonable (if we just want many blocks, each of which
> fits into a cache line), but this costs 17.7 seconds on my system.  Do you
> think people should just avoid this?  If so, we at least need to document
> the issue in the manual.

This code doesn't work well for glibc malloc (and other dlmalloc-style
mallocs), and never has.  Even with glibc 2.37, it produces a heap
layout like this:

v: 64-byte alignment boundary (each character is 8 bytes wide)
U: available user data
u: unused user data tail
m: glibc metadata
-: data available for allocation

   v       v       v       v       v       v       v       v
   UUUUUUUUum--------------UUUUUUUUum--------------UUUUUUUUum

This can be seen from the 192-byte increments between consecutive
returned pointers.  The gaps are not wide enough to hold another suitably
aligned allocation, so the lack of reuse is expected.

However, we should not be producing these gaps at all: on a clean heap,
we split from the remainder, so we should get this more compact layout
instead:

   v       v       v       v       v       v       v       v
   UUUUUUUUum------UUUUUUUUum------UUUUUUUUum------UUUUUUUUum

It seems to me that this doesn't happen because we call _int_free to
give back the unused memory, and _int_free will put it into the tcache or
fastbins, so the memory does not become available for consolidation.
Eventually this memory is flushed to the low-level allocator, but by then
it is too late: there is already another allocation 112 bytes further on
that blocks further consolidation.  And of course none of these 112-byte
chunks is suitably aligned for reuse.

(Even the compact layout wastes 50% of memory, but at least it's better
than what any glibc version produces today.)

DJ, could you look at bypassing the start of _int_free (the tcache and
fastbin paths) for these deallocations?  Once we do that, I expect the
bin lists to remain short, at least for the synthetic reproducer, so
hunting for aligned chunks (which still will not exist) will be fast.

Thanks,
Florian
