public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: "H.J. Lu" <hjl.tools@gmail.com>
To: Noah Goldstein <goldstein.w.n@gmail.com>
Cc: libc-alpha@sourceware.org, carlos@systemhalted.org
Subject: Re: [PATCH v2 2/2] x86: Cleanup pthread_spin_{try}lock.S
Date: Mon, 3 Oct 2022 10:08:45 -0700	[thread overview]
Message-ID: <CAMe9rOraC9s7++8xuC1bjHPjY5JjXNP_-TtCS+g6-SJJRGQTsA@mail.gmail.com> (raw)
In-Reply-To: <20221001041327.1133757-2-goldstein.w.n@gmail.com>

On Fri, Sep 30, 2022 at 9:13 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> Save a jmp on the lock path coming from an initial failure in
> pthread_spin_lock.S.  This costs 4-bytes of code but since the
> function still fits in the same number of 16-byte blocks (default
> function alignment) it does not have affect on the total binary size
> of libc.so (unchanged after this commit).
>
> pthread_spin_trylock was using a CAS when a simple xchg works which
> is often more expensive.
>
> Full check passes on x86-64.
> ---
>  sysdeps/x86_64/nptl/pthread_spin_lock.S    | 23 +++++++++++++++-------
>  sysdeps/x86_64/nptl/pthread_spin_trylock.S | 18 ++++++++++++-----
>  2 files changed, 29 insertions(+), 12 deletions(-)
>
> diff --git a/sysdeps/x86_64/nptl/pthread_spin_lock.S b/sysdeps/x86_64/nptl/pthread_spin_lock.S
> index 44b837d9db..1e09e59b10 100644
> --- a/sysdeps/x86_64/nptl/pthread_spin_lock.S
> +++ b/sysdeps/x86_64/nptl/pthread_spin_lock.S
> @@ -19,18 +19,27 @@
>  #include <shlib-compat.h>
>
>  ENTRY(__pthread_spin_lock)
> -1:     LOCK
> -       decl    0(%rdi)
> -       jne     2f
> +       /* Always return zero.  */
>         xor     %eax, %eax
> +       LOCK
> +       decl    0(%rdi)
> +       jne     1f
>         ret
>
>         .align  16
> -2:     rep
> +1:
> +       /* `rep nop` == `pause`.  */
> +       rep
>         nop
> -       cmpl    $0, 0(%rdi)
> -       jg      1b
> -       jmp     2b
> +       cmpl    %eax, 0(%rdi)
> +       jle     1b
> +       /* Just repeat the `lock decl` logic here.  The code size save
> +          of jumping back to entry doesn't change how many 16-byte
> +          chunks (default function alignment) that the code fits in.  */
> +       LOCK
> +       decl    0(%rdi)
> +       jne     1b
> +       ret
>  END(__pthread_spin_lock)
>  versioned_symbol (libc, __pthread_spin_lock, pthread_spin_lock, GLIBC_2_34)
>
> diff --git a/sysdeps/x86_64/nptl/pthread_spin_trylock.S b/sysdeps/x86_64/nptl/pthread_spin_trylock.S
> index fffdb27dd9..a1f97cb420 100644
> --- a/sysdeps/x86_64/nptl/pthread_spin_trylock.S
> +++ b/sysdeps/x86_64/nptl/pthread_spin_trylock.S
> @@ -20,13 +20,21 @@
>  #include <shlib-compat.h>
>
>  ENTRY(__pthread_spin_trylock)
> -       movl    $1, %eax
>         xorl    %ecx, %ecx
> -       lock
> -       cmpxchgl %ecx, (%rdi)
> +       /* xchg has implicit LOCK prefix.  */
> +       xchgl   %ecx, (%rdi)
> +
> +       /* Branch on result.  Expectation is the use of trylock will be
> +          branching on success/failure so this branch can be used to
> +          to predict the coming branch.  It has the benefit of
> +          breaking the likely expensive memory dependency on (%rdi).  */
> +       cmpl    $1, %ecx
> +       jnz     1f
> +       xorl    %eax, %eax
> +       ret
> +1:
>         movl    $EBUSY, %eax
> -       cmovel  %ecx, %eax
> -       retq
> +       ret
>  END(__pthread_spin_trylock)
>  versioned_symbol (libc, __pthread_spin_trylock, pthread_spin_trylock,
>                   GLIBC_2_34)
> --
> 2.34.1
>

LGTM.

Thanks.

-- 
H.J.

  reply	other threads:[~2022-10-03 17:09 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-01  2:33 [PATCH v1 1/2] Benchtests: Add bench for pthread_spin_{try}lock and mutex_trylock Noah Goldstein
2022-10-01  2:33 ` [PATCH v1 2/2] x86: Optimize pthread_spin_{try}lock.S Noah Goldstein
2022-10-01  4:13   ` [PATCH v2 1/2] Benchtests: Add bench for pthread_spin_{try}lock and mutex_trylock Noah Goldstein
2022-10-01  4:13     ` [PATCH v2 2/2] x86: Cleanup pthread_spin_{try}lock.S Noah Goldstein
2022-10-03 17:08       ` H.J. Lu [this message]
2022-10-03 17:10     ` [PATCH v2 1/2] Benchtests: Add bench for pthread_spin_{try}lock and mutex_trylock H.J. Lu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMe9rOraC9s7++8xuC1bjHPjY5JjXNP_-TtCS+g6-SJJRGQTsA@mail.gmail.com \
    --to=hjl.tools@gmail.com \
    --cc=carlos@systemhalted.org \
    --cc=goldstein.w.n@gmail.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).