public inbox for libc-alpha@sourceware.org
From: DJ Delorie <dj@redhat.com>
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH v6 3/4] Reduce CAS in malloc spinlocks
Date: Thu, 23 Feb 2023 00:48:52 -0500	[thread overview]
Message-ID: <xn7cw97wzf.fsf@greed.delorie.com> (raw)
In-Reply-To: <20211111162428.2286605-4-hjl.tools@gmail.com>


Sorry for letting this one slip...

"H.J. Lu via Libc-alpha" <libc-alpha@sourceware.org> writes:
>        size_t n = narenas;
>        if (__glibc_unlikely (n <= narenas_limit - 1))
>          {
> +          if (atomic_load_relaxed (&narenas) != n)
> +           {
> +              atomic_spin_nop ();
> +              goto repeat;
> +           }
>            if (catomic_compare_and_exchange_bool_acq (&narenas, n + 1, n))
>              goto repeat;

I understand that a congested spinloop will benefit from this kind of
change, but... we JUST loaded narenas into n, and adding arenas is rare.
We probably should have loaded it atomically, but still, we just loaded
it.  The odds of malloc being so congested that we miss the CAS is
essentially (but of course not exactly) zero.  Are we just adding an
unneeded atomic read here?  Do any benchmarks say this would be
beneficial?
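For reference, the pattern the patch introduces is the classic "test and
test-and-set" idiom: re-read the location with a cheap relaxed load and
only attempt the CAS when the value still matches.  A minimal standalone
sketch of the arena-count path using C11 atomics (names and the limit
check are simplified here, not the actual glibc code):

```c
#include <stdatomic.h>
#include <stddef.h>

static _Atomic size_t narenas_sketch;

/* Try to reserve a new arena slot; returns 1 on success, 0 when the
   arena limit is reached.  The relaxed pre-read before the CAS is the
   patch's addition: if the counter already moved, the CAS is doomed to
   fail, so retry on a plain load instead of issuing it.  */
int reserve_arena_slot(size_t limit)
{
    for (;;) {
        size_t n = atomic_load_explicit(&narenas_sketch,
                                        memory_order_relaxed);
        if (n > limit - 1)
            return 0;  /* at the limit; caller must reuse an arena */
        /* The contested extra check: n was loaded one instruction ago,
           so this relaxed re-read almost never observes a change.  */
        if (atomic_load_explicit(&narenas_sketch,
                                 memory_order_relaxed) != n)
            continue;
        if (atomic_compare_exchange_weak_explicit(
                &narenas_sketch, &n, n + 1,
                memory_order_acquire, memory_order_relaxed))
            return 1;
    }
}
```

As the sketch makes visible, the pre-read lands immediately after the
load that produced n, which is exactly why its value in this rarely
contended path is questionable.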

Also, the malloc code is already complicated enough.  Are the extra
lines of code and slight reduction in readability justified?

Also, we've been migrating to C11-like atomics; would this patch need
changing for that?

Should target-specific atomics optimizations be "hidden" somewhere in
the atomics implementation?  Just because x86 may benefit from a
pre-read doesn't mean that all targets will, and if x86 generally
benefits, it should update its implementation of the atomics to do that
at a lower level.
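To illustrate what "hidden in the atomics implementation" could look
like: a hypothetical wrapper (the name cas_bool_acq_preread and the x86
gate are invented for this sketch, not glibc API) that keeps the
pre-read behind one target-conditional check, so call sites in malloc
stay as plain CAS loops:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Returns 0 on success (value swapped in), 1 on failure, mirroring
   the catomic_compare_and_exchange_bool_acq convention.  On targets
   where a relaxed pre-read of a contended line is known to help, it
   skips a CAS that is guaranteed to fail; elsewhere it compiles to
   a plain CAS with identical observable behavior.  */
static inline int
cas_bool_acq_preread(_Atomic size_t *mem, size_t newval, size_t oldval)
{
#if defined(__x86_64__) || defined(__i386__)
    if (atomic_load_explicit(mem, memory_order_relaxed) != oldval)
        return 1;
#endif
    size_t expected = oldval;
    return !atomic_compare_exchange_strong_explicit(
                mem, &expected, newval,
                memory_order_acquire, memory_order_relaxed);
}

/* Demo location for exercising the wrapper.  */
static _Atomic size_t demo_counter = 5;
```

Because the pre-read only ever turns a failing CAS into an early
failure return, the wrapper's result is the same on every target; only
the bus traffic under contention differs.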

>            a = _int_new_arena (size);
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index 095d97a3be..403ffb84ef 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -3717,6 +3717,11 @@ _int_malloc (mstate av, size_t bytes)
>        pp = REVEAL_PTR (victim->fd);                                     \
>        if (__glibc_unlikely (pp != NULL && misaligned_chunk (pp)))       \
>  	malloc_printerr ("malloc(): unaligned fastbin chunk detected"); \
> +      if (atomic_load_relaxed (fb) != victim)		\
> +	{						\
> +	  atomic_spin_nop ();				\
> +	  continue;					\
> +	}						\
>      }							\
>    while ((pp = catomic_compare_and_exchange_val_acq (fb, pp, victim)) \
>  	 != victim);					\
> @@ -4435,6 +4440,11 @@ _int_free (mstate av, mchunkptr p, int have_lock)
>  	    malloc_printerr ("double free or corruption (fasttop)");
>  	  old2 = old;
>  	  p->fd = PROTECT_PTR (&p->fd, old);
> +	  if (atomic_load_relaxed (fb) != old2)
> +	    {
> +	      atomic_spin_nop ();
> +	      continue;
> +	    }
>  	}
>        while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2))
>  	     != old2);

Likewise.  These paths are less rare, but still not so common that I'd
expect a benefit from the extra code.
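For context, the fastbin paths above are lock-free stack pushes/pops,
so the same pre-read question applies to a pointer CAS rather than a
counter.  A simplified sketch of the _int_free-style push (pointer
mangling via PROTECT_PTR omitted; names are illustrative):

```c
#include <stdatomic.h>
#include <stddef.h>

/* Simplified fastbin chunk: only the forward link matters here.  */
struct chunk { struct chunk *fd; };

static _Atomic(struct chunk *) fastbin_head;

/* Link chunk p at the head of the fastbin, Treiber-stack style.
   The relaxed re-read of the head before the CAS is the patch's
   addition: when another thread already moved the head, skip the
   CAS (which would fail anyway) and retry from a fresh load.  */
static void fastbin_push(struct chunk *p)
{
    struct chunk *old = atomic_load_explicit(&fastbin_head,
                                             memory_order_relaxed);
    for (;;) {
        p->fd = old;
        if (atomic_load_explicit(&fastbin_head,
                                 memory_order_relaxed) != old) {
            old = atomic_load_explicit(&fastbin_head,
                                       memory_order_relaxed);
            continue;
        }
        if (atomic_compare_exchange_weak_explicit(
                &fastbin_head, &old, p,
                memory_order_release, memory_order_relaxed))
            return;
    }
}
```

Unlike the arena counter, frees do hammer the same fastbin head, so
contention here is plausible; whether the extra load pays off is still
a question for benchmarks.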


Thread overview: 28+ messages
2021-11-11 16:24 [PATCH v6 0/4] Optimize CAS [BZ #28537] H.J. Lu
2021-11-11 16:24 ` [PATCH v6 1/4] Add LLL_MUTEX_READ_LOCK " H.J. Lu
2021-11-12 17:23   ` Szabolcs Nagy
2021-11-17  2:24   ` Noah Goldstein
2021-11-17 23:54     ` H.J. Lu
2021-11-18  0:03       ` Noah Goldstein
2021-11-18  0:31         ` H.J. Lu
2021-11-18  1:16           ` Arjan van de Ven
2022-09-11 20:19             ` Sunil Pandey
2022-09-29  0:10               ` Noah Goldstein
2021-11-11 16:24 ` [PATCH v6 2/4] Avoid extra load with CAS in __pthread_mutex_lock_full " H.J. Lu
2021-11-12 16:31   ` Szabolcs Nagy
2021-11-12 18:50   ` Andreas Schwab
2022-09-11 20:16     ` Sunil Pandey
2022-09-29  0:10       ` Noah Goldstein
2021-11-11 16:24 ` [PATCH v6 3/4] Reduce CAS in malloc spinlocks H.J. Lu
2023-02-23  5:48   ` DJ Delorie [this message]
2021-11-11 16:24 ` [PATCH v6 4/4] Avoid extra load with CAS in __pthread_mutex_clocklock_common [BZ #28537] H.J. Lu
2021-11-12 16:32   ` Szabolcs Nagy
2021-11-12 18:51   ` Andreas Schwab
2022-09-11 20:12     ` Sunil Pandey
2022-09-11 20:15       ` Arjan van de Ven
2022-09-11 21:26         ` Florian Weimer
2022-09-29  0:09       ` Noah Goldstein
2021-11-15 13:01 [PATCH v6 3/4] Reduce CAS in malloc spinlocks Wilco Dijkstra
2023-02-23 18:27 Wilco Dijkstra
2023-02-23 19:53 ` H.J. Lu
2023-02-23 20:07   ` DJ Delorie
