Re: [RFC PATCH 3/3] RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

From: "Christoph Müllner" <christoph.muellner@vrull.eu>
To: Vineet Gupta <vineetg@rivosinc.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>,
	libc-alpha@sourceware.org,  adhemerval.zanella@linaro.org,
	Darius Rad <darius@bluespec.com>,
	 Andrew Waterman <andrew@sifive.com>,
	philipp.tomsich@vrull.eu,  Evan Green <evan@rivosinc.com>,
	kito.cheng@sifive.com, jeffreyalaw@gmail.com,
	 gnu-toolchain <gnu-toolchain@rivosinc.com>
Subject: Re: [RFC PATCH 3/3] RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs
Date: Thu, 18 Apr 2024 22:25:10 +0200	[thread overview]
Message-ID: <CAEg0e7h-3-2huZTgzJVdugycPW0n-5fSBiLQ-6g5L+m6E1O-Qw@mail.gmail.com> (raw)
In-Reply-To: <45e809bf-fc10-4cab-9190-48b21d8cf263@rivosinc.com>

On Thu, Apr 18, 2024 at 10:03 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
>
> Hi Christoph,
>
> My 2 cents.
>
> On 4/18/24 10:17, Palmer Dabbelt wrote:
> > On Thu, 18 Apr 2024 02:46:35 PDT (-0700), christoph.muellner@vrull.eu wrote:
> >> The macro atomic_spin_nop can be used to implement arch-specific
> >> CPU yielding that is used in busy loops (e.g. in pthread_spin_lock).
> >> This patch introduces an ifunc-based implementation for RISC-V,
> >> that uses Zihintpause's PAUSE instruction for that matter (as PAUSE
> >> is a HINT instruction there is not dependency to Zihintpause at
> >> runtime).  Further, we test for Zawrs via hwprobe() and if found
> >> we use WRS.STO instead of PAUSE.
>
> [snip]
>
> >> +ENTRY (__cpu_relax_generic)
> >> +    /* While we can use the `pause` instruction without
> >> +       the need of Zihintpause (because it is a HINT instruction),
> >> +       we still have to enable Zihintpause for the assembler.  */
> >> +    pause
> >> +    ret
> >> +END (__cpu_relax_generic)
>
> [snip]
>
> >> +.option push
> >> +.option arch, +zawrs
> >> +ENTRY (__cpu_relax_zawrs)
> >> +    wrs.sto
> > This has the same forward progress/eventual success violation as the
> > code you sent for GCC and Linux does.  It doesn't really matter if the
> > user of the reservation is in a builtin, an asm block, or a function.
> > The compiler just doesn't know about those reservation rules and isn't
> > going to generate code that follows them.
>
> So, this routine maps to atomic_spin_nop () called in generic code in a
> bunch of places.
>
> The easiest case is nptl/pthread_spin_lock.c which looks something like this
>
> __pthread_spin_lock (pthread_spinlock_t *lock)
> ...
>    do
>     {
>       atomic_spin_nop ();
>       val = atomic_load_relaxed (lock);
>     }
>   while (val != 0);
>
> This works fine for a PAUSE based implementation which doesn't need a
> prior reservation, but WRS does and even if both the load and spin are
> inlined, reservation happens after WRS which is wrong.
>
> So I think before we can implement this optimization for riscv, we need
> a generic glibc change to replace of atomic_spin_nop () with a variant
> which also takes the lock/memory under consideration.
> The fallback implementation of atomic_load_and_spin_if_cond() will be
> what the above loop does. For arches that do implement the API, based on
> ISA semantics, they can choose to ignore the mem arg completely (RISC-V
> PAUSE based implementation) or implement a little loop which explicitly
> takes reservation on the mem/lock and calls WRS.
>
> Does that make sense ?

I did something similar in the Linux patch a while ago.
Here I wanted to avoid changes in arch-independent code.
But you are right, wrs.sto does not make sense without a
valid reservation.

>
> We just need to sort out the 2nd use of atomic_spin_nop in
> nptl/pthread_mutex_lock.c where this conversion might not be straight fwd.
> The recent back off back spin looping is getting in the way of obvious
> solution, however for discussion sake if we ignore it, the code looks
> like this:
>
> PTHREAD_MUTEX_LOCK (pthread_mutex_t *mutex)
> ...
>        int cnt=0, max_cnt = something;
>        do
>         {
>             if (cnt ++>= max_cnt)
>             {
>                  LLL_MUTEX_LOCK (mutex);
>                  break;
>             }
>             atomic_spin_nop ();
>        }
>        while (LLL_MUTEX_READ_LOCK (mutex) != 0 || LLL_MUTEX_TRYLOCK
> (mutex) != 0);
>
> And this seems like it can be converted to a
> atomic_load_and_spin_if_cond(mutex, !=0 ) kind of API.
>
> Let me know if you want me to spin this - pun intended ;-)
>
> -Vineet
>
>
> > That said, as far as I can tell the same grey area in the WRS spec that
> > I pointed out during one of the Linux reviews still exists.   So if
> > there's some hardware that has a more generous set of LR->WRS rules than
> > the ISA-required LR->SC rules that's fine, we just need to know what the
> > hardware actually is so we can write down the rules and then stick to
> > them -- it's just a performance thing for WRS, so having weaker rules
> > seems entirely plausible.
> >
> > I don't know of any hardware with WRS, but sorry if I missed one.  Do
> > you have something publicly availiable that behaves this way?

next prev parent reply	other threads:[~2024-04-18 20:25 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-18  9:46 [RFC PATCH 0/3] RISC-V: Use WRS.STO for atomic_spin_nop Christoph Müllner
2024-04-18  9:46 ` [RFC PATCH 1/3] RISC-V: Sync hwprobe: Sync extension bits with Linux 6.8 Christoph Müllner
2024-04-18  9:46 ` [RFC PATCH 2/3] RISC-V: hwprobe: Add Zawrs test bit Christoph Müllner
2024-04-18  9:46 ` [RFC PATCH 3/3] RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs Christoph Müllner
2024-04-18 17:17   ` Palmer Dabbelt
2024-04-18 20:03     ` Vineet Gupta
2024-04-18 20:25       ` Christoph Müllner [this message]
2024-04-18 20:19     ` Christoph Müllner
2024-04-18 20:36       ` Vineet Gupta
2024-04-18 21:10         ` Palmer Dabbelt
2024-04-19 14:09           ` Andrew Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEg0e7h-3-2huZTgzJVdugycPW0n-5fSBiLQ-6g5L+m6E1O-Qw@mail.gmail.com \
    --to=christoph.muellner@vrull.eu \
    --cc=adhemerval.zanella@linaro.org \
    --cc=andrew@sifive.com \
    --cc=darius@bluespec.com \
    --cc=evan@rivosinc.com \
    --cc=gnu-toolchain@rivosinc.com \
    --cc=jeffreyalaw@gmail.com \
    --cc=kito.cheng@sifive.com \
    --cc=libc-alpha@sourceware.org \
    --cc=palmer@dabbelt.com \
    --cc=philipp.tomsich@vrull.eu \
    --cc=vineetg@rivosinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).