From: "H.J. Lu"
Date: Thu, 23 Feb 2023 11:53:12 -0800
Subject: Re: [PATCH v6 3/4] Reduce CAS in malloc spinlocks
To: Wilco Dijkstra
Cc: "dj@redhat.com", GNU C Library

On Thu, Feb 23, 2023 at 10:28 AM Wilco Dijkstra wrote:
>
> Hi DJ,
>
> >       size_t n = narenas;
> >       if (__glibc_unlikely (n <= narenas_limit - 1))
> >         {
> > +          if (atomic_load_relaxed (&narenas) != n)
> > +            {
> > +              atomic_spin_nop ();
> > +              goto repeat;
> > +            }
> >           if (catomic_compare_and_exchange_bool_acq (&narenas, n + 1, n))
> >             goto repeat;
>
> Before we consider optimizing it, we should first simplify it. All this wants
> to do is a relaxed atomic add, then check the maximum arenas and
> treat the case of having too many arenas in the same way as failure to
> create another arena (ie. just atomically decrement again). Ie. no CAS
> loop required, and there is nothing to optimize either.
>
> > Should target-specific atomics optimizations be "hidden" somewhere in
> > the atomics implementation?  Just because x86 may benefit from a
> > pre-read doesn't mean that all targets will, and if x86 generally
> > benefits, it should update its implementation of the atomics to do that
> > at a lower level.
>
> We have been removing secret optimizations hidden behind atomics and just
> use standard atomics everywhere.
> These micro optimizations are often counterproductive - it's far better
> to do them at a higher level, similar to the single-thread path in
> malloc/free.
>
> For these there is no evidence they are heavily contended - if anything the
> extra code will just slow things down. The cost is in the CAS itself, we could
> remove it from the multithreaded malloc path by splitting the free list into a
> local one (only accessed when you have the malloc lock) and a concurrent one.

I didn't pursue it further since I couldn't show how much it would improve
performance.

--
H.J.
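
For reference, the simplification Wilco describes above (a relaxed atomic
add, a limit check, and an atomic decrement when the limit is exceeded) could
look roughly like the sketch below, next to a plain CAS loop for comparison.
This is only an illustration in standard C11 atomics, not glibc code:
arena_count, reserve_arena_cas and reserve_arena_add are hypothetical names
standing in for narenas and the surrounding logic, and glibc's internal
atomic macros are deliberately not used.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for glibc's narenas counter (assumption).  */
static _Atomic size_t arena_count = 1;

/* Variant A: CAS loop, comparable in spirit to the quoted code.
   Returns true if a slot below LIMIT was reserved.  */
static bool
reserve_arena_cas (size_t limit)
{
  assert (limit > 0);
  for (;;)
    {
      size_t n = atomic_load_explicit (&arena_count, memory_order_relaxed);
      if (n >= limit)
        return false;
      /* A weak CAS may fail spuriously or because another thread won the
         race; either way the loop reloads the counter and retries.  */
      if (atomic_compare_exchange_weak_explicit (&arena_count, &n, n + 1,
                                                 memory_order_acquire,
                                                 memory_order_relaxed))
        return true;
    }
}

/* Variant B: relaxed add with undo, no CAS loop.  Going over LIMIT is
   handled like a failed arena creation: decrement again and report
   failure.  */
static bool
reserve_arena_add (size_t limit)
{
  assert (limit > 0);
  size_t old = atomic_fetch_add_explicit (&arena_count, 1,
                                          memory_order_relaxed);
  if (old >= limit)
    {
      atomic_fetch_sub_explicit (&arena_count, 1, memory_order_relaxed);
      return false;
    }
  return true;
}
```

One thing to keep in mind with variant B is that the counter can briefly
exceed the limit between the add and the undo, so any other reader of the
counter has to tolerate a transiently too-large value.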