From: "H.J. Lu"
Date: Mon, 3 Oct 2022 10:08:45 -0700
Subject: Re: [PATCH v2 2/2] x86: Cleanup pthread_spin_{try}lock.S
To: Noah Goldstein
Cc: libc-alpha@sourceware.org, carlos@systemhalted.org
In-Reply-To: <20221001041327.1133757-2-goldstein.w.n@gmail.com>
References: <20221001023337.1127793-2-goldstein.w.n@gmail.com> <20221001041327.1133757-1-goldstein.w.n@gmail.com> <20221001041327.1133757-2-goldstein.w.n@gmail.com>

On Fri, Sep 30, 2022 at 9:13 PM Noah Goldstein wrote:
>
> Save a jmp on the lock path coming from an initial failure in
> pthread_spin_lock.S.  This costs 4 bytes of code, but since the
> function still fits in the same number of 16-byte blocks (default
> function alignment) it has no effect on the total binary size
> of libc.so (unchanged after this commit).
>
> pthread_spin_trylock was using a CAS, which is often more expensive,
> when a simple xchg works.
>
> Full check passes on x86-64.
> ---
>  sysdeps/x86_64/nptl/pthread_spin_lock.S    | 23 +++++++++++++++-------
>  sysdeps/x86_64/nptl/pthread_spin_trylock.S | 18 ++++++++++++-----
>  2 files changed, 29 insertions(+), 12 deletions(-)
>
> diff --git a/sysdeps/x86_64/nptl/pthread_spin_lock.S b/sysdeps/x86_64/nptl/pthread_spin_lock.S
> index 44b837d9db..1e09e59b10 100644
> --- a/sysdeps/x86_64/nptl/pthread_spin_lock.S
> +++ b/sysdeps/x86_64/nptl/pthread_spin_lock.S
> @@ -19,18 +19,27 @@
>  #include
>
>  ENTRY(__pthread_spin_lock)
> -1:     LOCK
> -       decl    0(%rdi)
> -       jne     2f
> +       /* Always return zero.  */
>         xor     %eax, %eax
> +       LOCK
> +       decl    0(%rdi)
> +       jne     1f
>         ret
>
>         .align  16
> -2:     rep
> +1:
> +       /* `rep nop` == `pause`.  */
> +       rep
>         nop
> -       cmpl    $0, 0(%rdi)
> -       jg      1b
> -       jmp     2b
> +       cmpl    %eax, 0(%rdi)
> +       jle     1b
> +       /* Just repeat the `lock decl` logic here.  The code size saved
> +          by jumping back to entry doesn't change how many 16-byte
> +          chunks (default function alignment) the code fits in.  */
> +       LOCK
> +       decl    0(%rdi)
> +       jne     1b
> +       ret
>  END(__pthread_spin_lock)
>  versioned_symbol (libc, __pthread_spin_lock, pthread_spin_lock, GLIBC_2_34)
>
> diff --git a/sysdeps/x86_64/nptl/pthread_spin_trylock.S b/sysdeps/x86_64/nptl/pthread_spin_trylock.S
> index fffdb27dd9..a1f97cb420 100644
> --- a/sysdeps/x86_64/nptl/pthread_spin_trylock.S
> +++ b/sysdeps/x86_64/nptl/pthread_spin_trylock.S
> @@ -20,13 +20,21 @@
>  #include
>
>  ENTRY(__pthread_spin_trylock)
> -       movl    $1, %eax
>         xorl    %ecx, %ecx
> -       lock
> -       cmpxchgl %ecx, (%rdi)
> +       /* xchg has implicit LOCK prefix.  */
> +       xchgl   %ecx, (%rdi)
> +
> +       /* Branch on result.  Expectation is the use of trylock will be
> +          branching on success/failure so this branch can be used
> +          to predict the coming branch.  It has the benefit of
> +          breaking the likely expensive memory dependency on (%rdi).  */
> +       cmpl    $1, %ecx
> +       jnz     1f
> +       xorl    %eax, %eax
> +       ret
> +1:
>         movl    $EBUSY, %eax
> -       cmovel  %ecx, %eax
> -       retq
> +       ret
>  END(__pthread_spin_trylock)
>  versioned_symbol (libc, __pthread_spin_trylock, pthread_spin_trylock,
>                   GLIBC_2_34)
> --
> 2.34.1
>

LGTM.

Thanks.

--
H.J.
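
For readers less used to the assembly, here is a rough C11 sketch of the two
trylock strategies and the lock loop discussed in the patch. It is only an
illustration, not the glibc code: every sketch_* name is hypothetical, and
the memory orders are my assumption. It follows the convention visible in the
quoted assembly, where 1 means unlocked and 0 or a negative value means
locked.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical lock word: 1 = unlocked, <= 0 = locked (same convention
   as the quoted assembly).  */
typedef atomic_int sketch_spinlock_t;

static void sketch_spin_init(sketch_spinlock_t *lock)
{
    atomic_init(lock, 1);
}

/* Old trylock shape: compare-and-swap 1 -> 0 (like `lock cmpxchgl`).  */
static int sketch_trylock_cas(sketch_spinlock_t *lock)
{
    int expected = 1;
    return atomic_compare_exchange_strong_explicit(
               lock, &expected, 0,
               memory_order_acquire, memory_order_relaxed) ? 0 : EBUSY;
}

/* New trylock shape: unconditional exchange with 0 (like `xchgl`).
   If the lock was already held (old value <= 0), writing 0 leaves it
   locked, so a failed attempt does not release anything.  */
static int sketch_trylock_xchg(sketch_spinlock_t *lock)
{
    int old = atomic_exchange_explicit(lock, 0, memory_order_acquire);
    return old == 1 ? 0 : EBUSY;
}

/* Lock loop: decrement, and if the previous value was not 1, spin on
   plain loads until the word is positive, then retry the decrement.  */
static int sketch_spin_lock(sketch_spinlock_t *lock)
{
    while (atomic_fetch_sub_explicit(lock, 1, memory_order_acquire) != 1) {
        while (atomic_load_explicit(lock, memory_order_relaxed) <= 0)
            ; /* The assembly issues `pause` here.  */
    }
    return 0; /* Always returns zero, as in the patch.  */
}

static void sketch_spin_unlock(sketch_spinlock_t *lock)
{
    atomic_store_explicit(lock, 1, memory_order_release);
}

/* Single-threaded self-check of the return values; returns 0 on success.  */
int sketch_selfcheck(void)
{
    sketch_spinlock_t lock;
    sketch_spin_init(&lock);

    assert(sketch_trylock_xchg(&lock) == 0);     /* free -> acquired */
    assert(sketch_trylock_xchg(&lock) == EBUSY); /* held -> busy */
    assert(sketch_trylock_cas(&lock) == EBUSY);  /* CAS agrees */
    sketch_spin_unlock(&lock);

    assert(sketch_trylock_cas(&lock) == 0);      /* free again */
    sketch_spin_unlock(&lock);

    assert(sketch_spin_lock(&lock) == 0);        /* uncontended lock */
    assert(sketch_trylock_xchg(&lock) == EBUSY);
    sketch_spin_unlock(&lock);
    return 0;
}
```

Both trylock variants return the same values in every state. The exchange
version simply skips the comparison inside the atomic operation and does it
on the register result afterwards.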