From: DJ Delorie
To: "H.J. Lu"
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH v6 3/4] Reduce CAS in malloc spinlocks
In-Reply-To: <20211111162428.2286605-4-hjl.tools@gmail.com>
Date: Thu, 23 Feb 2023 00:48:52 -0500

Sorry for letting this one slip...

"H.J. Lu via Libc-alpha" writes:
>        size_t n = narenas;
>        if (__glibc_unlikely (n <= narenas_limit - 1))
>          {
> +          if (atomic_load_relaxed (&narenas) != n)
> +            {
> +              atomic_spin_nop ();
> +              goto repeat;
> +            }
>            if (catomic_compare_and_exchange_bool_acq (&narenas, n + 1, n))
>              goto repeat;

I understand that a congested spin loop will benefit from this kind of
change, but... we JUST loaded narenas into n, and adding arenas is
rare.  We probably should have loaded it atomically, but still, we just
loaded it.  The odds of malloc being so congested that we miss the CAS
are essentially (but of course not exactly) zero.  Are we just adding an
unneeded atomic read here?  Do any benchmarks say this would be
beneficial?

Also, the malloc code is already complicated enough.  Are the extra
lines of code and the slight reduction in readability justified?
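For context, the hunk above is the usual load-before-CAS ("test and
test-and-set") shape.  In plain C11 terms the pattern is roughly the
following; this is a sketch only, with invented names, not the
glibc-internal macros:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical stand-in for narenas, for illustration only.  */
    static _Atomic size_t counter;

    static bool
    try_bump (size_t expected)
    {
      /* The proposed extra step: a cheap relaxed re-read, so a losing
         thread backs off on a plain load instead of a contended CAS.  */
      if (atomic_load_explicit (&counter, memory_order_relaxed) != expected)
        return false;   /* caller retries, i.e. the "goto repeat" path  */

      /* The CAS itself, attempted only when the pre-read still matches.  */
      return atomic_compare_exchange_strong_explicit
        (&counter, &expected, expected + 1,
         memory_order_acquire, memory_order_relaxed);
    }

On a heavily contended cache line the pre-read keeps threads spinning
on a plain load; on an uncontended one it is just an extra load before
a CAS that was going to succeed anyway, which is the narenas case.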
Also, we've been migrating to C11-like atomics; would this patch need
changing for that?  Should target-specific atomics optimizations be
"hidden" somewhere in the atomics implementation?  Just because x86 may
benefit from a pre-read doesn't mean that all targets will, and if x86
generally benefits, it should update its implementation of the atomics
to do that at a lower level.

>        a = _int_new_arena (size);
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index 095d97a3be..403ffb84ef 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -3717,6 +3717,11 @@ _int_malloc (mstate av, size_t bytes)
>        pp = REVEAL_PTR (victim->fd);                                     \
>        if (__glibc_unlikely (pp != NULL && misaligned_chunk (pp)))       \
>          malloc_printerr ("malloc(): unaligned fastbin chunk detected"); \
> +      if (atomic_load_relaxed (fb) != victim)                           \
> +        {                                                               \
> +          atomic_spin_nop ();                                           \
> +          continue;                                                     \
> +        }                                                               \
>      }                                                                   \
>    while ((pp = catomic_compare_and_exchange_val_acq (fb, pp, victim))   \
>           != victim);                                                    \
> @@ -4435,6 +4440,11 @@ _int_free (mstate av, mchunkptr p, int have_lock)
>          malloc_printerr ("double free or corruption (fasttop)");
>        old2 = old;
>        p->fd = PROTECT_PTR (&p->fd, old);
> +      if (atomic_load_relaxed (fb) != old2)
> +        {
> +          atomic_spin_nop ();
> +          continue;
> +        }
>      }
>    while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2))
>           != old2);

Likewise here.  These paths are less rare, but still not so contended
that I'd expect a benefit from the extra code.
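To be concrete about hiding this in the atomics layer: a target that
measurably benefits could fold the pre-read into its own CAS helper, so
the generic malloc code never changes.  Again just a sketch; the helper
name and interface below are invented, not an existing glibc interface:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical per-target helper, e.g. something a target's
       atomic-machine.h could provide if (and only if) it wins there.  */
    static inline bool
    arch_cas_acquire (_Atomic uintptr_t *mem, uintptr_t expected,
                      uintptr_t desired)
    {
      /* The target-specific pre-read lives here, out of sight of the
         callers.  */
      if (atomic_load_explicit (mem, memory_order_relaxed) != expected)
        return false;

      return atomic_compare_exchange_strong_explicit
        (mem, &expected, desired,
         memory_order_acquire, memory_order_relaxed);
    }

Targets that don't benefit keep a plain CAS, and the call sites stay as
readable as they are today.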