From: DJ Delorie
To: "H.J. Lu"
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH v6 3/4] Reduce CAS in malloc spinlocks
In-Reply-To: <20211111162428.2286605-4-hjl.tools@gmail.com>
Date: Thu, 23 Feb 2023 00:48:52 -0500

Sorry for letting this one slip...

"H.J. Lu via Libc-alpha" writes:
>        size_t n = narenas;
>        if (__glibc_unlikely (n <= narenas_limit - 1))
>          {
> +          if (atomic_load_relaxed (&narenas) != n)
> +            {
> +              atomic_spin_nop ();
> +              goto repeat;
> +            }
>            if (catomic_compare_and_exchange_bool_acq (&narenas, n + 1, n))
>              goto repeat;

I understand that a congested spin loop will benefit from this kind of
change, but... we JUST loaded narenas into n, and adding arenas is
rare.  We probably should have loaded it atomically, but still, we just
loaded it.  The odds of malloc being so congested that we miss the CAS
are essentially (but of course not exactly) zero.  Are we just adding an
unneeded atomic read here?  Do any benchmarks say this would be
beneficial?

Also, the malloc code is already complicated enough.  Are the extra
lines of code and the slight reduction in readability justified?
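For context, the hunk above is the usual load-before-CAS ("test and
test-and-set") shape.  In plain C11 terms the pattern is roughly the
following; this is a sketch only, with invented names, not the
glibc-internal macros:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical stand-in for narenas, for illustration only.  */
    static _Atomic size_t counter;

    static bool
    try_bump (size_t expected)
    {
      /* The proposed extra step: a cheap relaxed re-read, so a losing
         thread backs off on a plain load instead of a contended CAS.  */
      if (atomic_load_explicit (&counter, memory_order_relaxed) != expected)
        return false;   /* caller retries, i.e. the "goto repeat" path  */

      /* The CAS itself, attempted only when the pre-read still matches.  */
      return atomic_compare_exchange_strong_explicit
        (&counter, &expected, expected + 1,
         memory_order_acquire, memory_order_relaxed);
    }

On a heavily contended cache line the pre-read keeps threads spinning
on a plain load; on an uncontended one it is just an extra load before
a CAS that was going to succeed anyway, which is the narenas case.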
Also, we've been migrating to C11-like atomics; would this patch need
changing for that?  Should target-specific atomics optimizations be
"hidden" somewhere in the atomics implementation?  Just because x86 may
benefit from a pre-read doesn't mean that all targets will, and if x86
generally benefits, it should update its implementation of the atomics
to do that at a lower level.

>        a = _int_new_arena (size);
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index 095d97a3be..403ffb84ef 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -3717,6 +3717,11 @@ _int_malloc (mstate av, size_t bytes)
>        pp = REVEAL_PTR (victim->fd);                                     \
>        if (__glibc_unlikely (pp != NULL && misaligned_chunk (pp)))       \
>          malloc_printerr ("malloc(): unaligned fastbin chunk detected"); \
> +      if (atomic_load_relaxed (fb) != victim)                           \
> +        {                                                               \
> +          atomic_spin_nop ();                                           \
> +          continue;                                                     \
> +        }                                                               \
>      }                                                                   \
>    while ((pp = catomic_compare_and_exchange_val_acq (fb, pp, victim))   \
>           != victim);                                                    \
> @@ -4435,6 +4440,11 @@ _int_free (mstate av, mchunkptr p, int have_lock)
>          malloc_printerr ("double free or corruption (fasttop)");
>        old2 = old;
>        p->fd = PROTECT_PTR (&p->fd, old);
> +      if (atomic_load_relaxed (fb) != old2)
> +        {
> +          atomic_spin_nop ();
> +          continue;
> +        }
>      }
>    while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2))
>           != old2);

Likewise here.  These paths are less rare, but still not so contended
that I'd expect a benefit from the extra code.
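To be concrete about hiding this in the atomics layer: a target that
measurably benefits could fold the pre-read into its own CAS helper, so
the generic malloc code never changes.  Again just a sketch; the helper
name and interface below are invented, not an existing glibc interface:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical per-target helper, e.g. something a target's
       atomic-machine.h could provide if (and only if) it wins there.  */
    static inline bool
    arch_cas_acquire (_Atomic uintptr_t *mem, uintptr_t expected,
                      uintptr_t desired)
    {
      /* The target-specific pre-read lives here, out of sight of the
         callers.  */
      if (atomic_load_explicit (mem, memory_order_relaxed) != expected)
        return false;

      return atomic_compare_exchange_strong_explicit
        (mem, &expected, desired,
         memory_order_acquire, memory_order_relaxed);
    }

Targets that don't benefit keep a plain CAS, and the call sites stay as
readable as they are today.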