public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/110096] New: Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM
@ 2023-06-02 16:52 pdimov at gmail dot com
  2023-06-02 17:27 ` [Bug target/110096] " pinskia at gcc dot gnu.org
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: pdimov at gmail dot com @ 2023-06-02 16:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110096

            Bug ID: 110096
           Summary: Would be nice if __builtin_ia32_pause had a portable
                    equivalent as it's applicable to ARM
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pdimov at gmail dot com
  Target Milestone: ---

This is more of a feature request than a bug.

Currently `__builtin_ia32_pause()` only applies to Intel/AMD CPUs (hence the
`ia32` in the name), but it has a straightforward and equivalent meaning for
ARM (issue a YIELD instruction, which is the exact ARM equivalent to the PAUSE
x86 one.)

This forces us to do things like

```
#if __has_builtin(__builtin_ia32_pause)

__builtin_ia32_pause();

#elif defined(__GNUC__) && ( (defined(__ARM_ARCH) && __ARM_ARCH >= 8) ||
defined(__ARM_ARCH_8A__) || defined(__aarch64__) )

__asm__ __volatile__( "yield" : : : "memory" );

// ...
```

(E.g.
https://github.com/boostorg/core/blob/3b96d237c0e3ada30c9beca0f60062a2576dcafd/include/boost/core/detail/sp_thread_pause.hpp)

This can be solved in one of two ways; one, extend `__builtin_ia32_pause` to do
the right thing for ARM - unprincipled because of ia32 in the name, but will
automagically "fix" all code using `#if __has_builtin(__builtin_ia32_pause)`.

Or two, add a portable spelling for the intrinsic, either `__builtin_pause()`
or `__builtin_yield()`.

(Failing that, an ARM-specific `__builtin_arm_yield()` would still be an
improvement over the above because it at least will allow us to not hardcode
the ARM target detection, which we are probably getting wrong.)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/110096] Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM
  2023-06-02 16:52 [Bug target/110096] New: Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM pdimov at gmail dot com
@ 2023-06-02 17:27 ` pinskia at gcc dot gnu.org
  2023-06-02 17:30 ` pinskia at gcc dot gnu.org
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-06-02 17:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110096

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
https://developer.arm.com/documentation/101028/latest/

Defines __yield already if you include arm_acle.h .

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/110096] Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM
  2023-06-02 16:52 [Bug target/110096] New: Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM pdimov at gmail dot com
  2023-06-02 17:27 ` [Bug target/110096] " pinskia at gcc dot gnu.org
@ 2023-06-02 17:30 ` pinskia at gcc dot gnu.org
  2023-06-02 17:34 ` pdimov at gmail dot com
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-06-02 17:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110096

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Oh and the acle defines how to detect compiling on arm targets and gcc follows
that.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/110096] Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM
  2023-06-02 16:52 [Bug target/110096] New: Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM pdimov at gmail dot com
  2023-06-02 17:27 ` [Bug target/110096] " pinskia at gcc dot gnu.org
  2023-06-02 17:30 ` pinskia at gcc dot gnu.org
@ 2023-06-02 17:34 ` pdimov at gmail dot com
  2023-06-02 17:49 ` pinskia at gcc dot gnu.org
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pdimov at gmail dot com @ 2023-06-02 17:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110096

--- Comment #3 from Peter Dimov <pdimov at gmail dot com> ---
How does the user know when to include `arm_acle.h`?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/110096] Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM
  2023-06-02 16:52 [Bug target/110096] New: Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM pdimov at gmail dot com
                   ` (2 preceding siblings ...)
  2023-06-02 17:34 ` pdimov at gmail dot com
@ 2023-06-02 17:49 ` pinskia at gcc dot gnu.org
  2023-06-02 17:54 ` pdimov at gmail dot com
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-06-02 17:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110096

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Peter Dimov from comment #3)
> How does the user know when to include `arm_acle.h`?

__ARM_ARCH will be defined for both arm*-*-* and aarch64-*-* targets as defined
by the ACLE documentation.
You could also use __has_include too if needed.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/110096] Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM
  2023-06-02 16:52 [Bug target/110096] New: Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM pdimov at gmail dot com
                   ` (3 preceding siblings ...)
  2023-06-02 17:49 ` pinskia at gcc dot gnu.org
@ 2023-06-02 17:54 ` pdimov at gmail dot com
  2023-06-02 17:59 ` pinskia at gcc dot gnu.org
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pdimov at gmail dot com @ 2023-06-02 17:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110096

--- Comment #5 from Peter Dimov <pdimov at gmail dot com> ---
This works for the specific case of ARM, even though I don't find it
substantially better than just using `asm("yield")`, but the benefit of having
a portable intrinsic for this functionality is that as such instructions are
added to targets and GCC gains support of them (as has happened with ARM), code
would automatically take advantage of them, without having to acquire new
ifdefs for each supported target.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/110096] Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM
  2023-06-02 16:52 [Bug target/110096] New: Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM pdimov at gmail dot com
                   ` (4 preceding siblings ...)
  2023-06-02 17:54 ` pdimov at gmail dot com
@ 2023-06-02 17:59 ` pinskia at gcc dot gnu.org
  2023-06-02 18:05 ` pdimov at gmail dot com
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-06-02 17:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110096

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Peter Dimov from comment #5)
> This works for the specific case of ARM, even though I don't find it
> substantially better than just using `asm("yield")`, but the benefit of
> having a portable intrinsic for this functionality is that as such
> instructions are added to targets and GCC gains support of them (as has
> happened with ARM), code would automatically take advantage of them, without
> having to acquire new ifdefs for each supported target.

There is no benifit in having an intrinsic that is generic here really. ACLE
defines the ones for ARM. Intel has a definition of their own intrinsics too
(which gcc does follow too for __pause/_mm_pause but I notice you used the
builtin directly for GCC for some reason).

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/110096] Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM
  2023-06-02 16:52 [Bug target/110096] New: Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM pdimov at gmail dot com
                   ` (5 preceding siblings ...)
  2023-06-02 17:59 ` pinskia at gcc dot gnu.org
@ 2023-06-02 18:05 ` pdimov at gmail dot com
  2023-06-02 18:09 ` pinskia at gcc dot gnu.org
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pdimov at gmail dot com @ 2023-06-02 18:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110096

--- Comment #7 from Peter Dimov <pdimov at gmail dot com> ---
These intrinsics are typically used in spinlocks as in
```
while( flag_.test_and_set() )
{
    // issue a power-saving NOP here
}
```
(where `flag_` is `std::atomic_flag`) and this use is generic and not
target-dependent.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/110096] Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM
  2023-06-02 16:52 [Bug target/110096] New: Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM pdimov at gmail dot com
                   ` (6 preceding siblings ...)
  2023-06-02 18:05 ` pdimov at gmail dot com
@ 2023-06-02 18:09 ` pinskia at gcc dot gnu.org
  2023-06-02 18:24 ` pdimov at gmail dot com
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-06-02 18:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110096

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Peter Dimov from comment #7)
> These intrinsics are typically used in spinlocks as in
> ```
> while( flag_.test_and_set() )
> {
>     // issue a power-saving NOP here
> }
> ```
> (where `flag_` is `std::atomic_flag`) and this use is generic and not
> target-dependent.

But yield is not the right thing for spinlocks for ARMv8. You want to use wfe
instead. yield is only for multi-threaded cores and can be used anytime.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/110096] Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM
  2023-06-02 16:52 [Bug target/110096] New: Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM pdimov at gmail dot com
                   ` (7 preceding siblings ...)
  2023-06-02 18:09 ` pinskia at gcc dot gnu.org
@ 2023-06-02 18:24 ` pdimov at gmail dot com
  2023-06-02 19:12 ` pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pdimov at gmail dot com @ 2023-06-02 18:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110096

--- Comment #9 from Peter Dimov <pdimov at gmail dot com> ---
I don't think I want WFE here, based on what I read about it. Putting the core
to sleep seems like something to do in an embedded system where I have full
control of what cores do, not something to do on the application level, in a
portable C++ library.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/110096] Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM
  2023-06-02 16:52 [Bug target/110096] New: Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM pdimov at gmail dot com
                   ` (8 preceding siblings ...)
  2023-06-02 18:24 ` pdimov at gmail dot com
@ 2023-06-02 19:12 ` pinskia at gcc dot gnu.org
  2023-06-02 23:31 ` andysem at mail dot ru
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-06-02 19:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110096

--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Peter Dimov from comment #9)
> I don't think I want WFE here, based on what I read about it. Putting the
> core to sleep seems like something to do in an embedded system where I have
> full control of what cores do, not something to do on the application level,
> in a portable C++ library.

No, WFE can be used in userspace just fine and in fact it will be interrupted
every once in a while. yield only sleeps for a few cycles and then wakes up,
while wfe will sleep until an event happens (also WFE is very hypervisor
friendly too).

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/110096] Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM
  2023-06-02 16:52 [Bug target/110096] New: Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM pdimov at gmail dot com
                   ` (9 preceding siblings ...)
  2023-06-02 19:12 ` pinskia at gcc dot gnu.org
@ 2023-06-02 23:31 ` andysem at mail dot ru
  2023-06-02 23:47 ` pinskia at gcc dot gnu.org
  2023-06-03  1:57 ` pdimov at gmail dot com
  12 siblings, 0 replies; 14+ messages in thread
From: andysem at mail dot ru @ 2023-06-02 23:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110096

andysem at mail dot ru changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andysem at mail dot ru

--- Comment #11 from andysem at mail dot ru ---
(In reply to Andrew Pinski from comment #10)
> (In reply to Peter Dimov from comment #9)
> > I don't think I want WFE here, based on what I read about it. Putting the
> > core to sleep seems like something to do in an embedded system where I have
> > full control of what cores do, not something to do on the application level,
> > in a portable C++ library.
> 
> No, WFE can be used in userspace just fine and in fact it will be
> interrupted every once in a while. yield only sleeps for a few cycles and
> then wakes up, while wfe will sleep until an event happens (also WFE is very
> hypervisor friendly too).

Spin locks are used when latency is a concern and when the protected region is
extremely small (i.e. a few instructions). Putting the core to sleep until the
next interrupt does not seem appropriate for this purpose. x86 pause and ARM
yield are better suited exactly because they wait for a time in the order of
cycles (up to about a hundred on recent x86) rather than microseconds or more.

I can add that in most spin lock implementations I have seen, either yield, nop
or nothing is used for wasting CPU cycles. A few examples:

https://www.boost.org/doc/libs/1_82_0/boost/fiber/detail/cpu_relax.hpp
https://elixir.bootlin.com/linux/latest/source/arch/arm/include/asm/vdso/processor.h#L11
https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/vdso/processor.h#L12
https://chromium.googlesource.com/chromium/src/third_party/WebKit/Source/wtf/+/823d62cdecdbd5f161634177e130e5ac01eb7b48/SpinLock.cpp

The first link has instructions for a few other architectures besides ARM and
x86. I agree with Peter that an architecture-neutral intrinsic could be useful
to avoid this kind of code duplicated in various projects. Although I realize
that specifying the exact behavior of this intrinsic would be difficult, since
even the underlying instructions are defined rather vaguely. However, one thing
is certain: this intrinsic must be a full compiler fence.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/110096] Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM
  2023-06-02 16:52 [Bug target/110096] New: Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM pdimov at gmail dot com
                   ` (10 preceding siblings ...)
  2023-06-02 23:31 ` andysem at mail dot ru
@ 2023-06-02 23:47 ` pinskia at gcc dot gnu.org
  2023-06-03  1:57 ` pdimov at gmail dot com
  12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-06-02 23:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110096

--- Comment #12 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Wfe normally waits a hundreds of cycles before being woken up. It also wake up
when an event happens which is exactly when a write to that address happens.
Exactly when you want.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/110096] Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM
  2023-06-02 16:52 [Bug target/110096] New: Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM pdimov at gmail dot com
                   ` (11 preceding siblings ...)
  2023-06-02 23:47 ` pinskia at gcc dot gnu.org
@ 2023-06-03  1:57 ` pdimov at gmail dot com
  12 siblings, 0 replies; 14+ messages in thread
From: pdimov at gmail dot com @ 2023-06-03  1:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110096

--- Comment #13 from Peter Dimov <pdimov at gmail dot com> ---
Even if we assume that WFE on lock (and SEV on unlock) is the correct approach
on ARM instead of YIELD (though this seems very much domain-specific, depending
on the expected amount of contention and who knows what else), isn't the
existence of pause/yield instructions on MIPS, POWER, and apparently RISC-V (*)
enough further evidence in favor of having a portable intrinsic for emitting
such an instruction?

(*) https://doc.rust-lang.org/src/core/hint.rs.html#178-191 (implementation of
https://doc.rust-lang.org/std/hint/fn.spin_loop.html)

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-06-03  1:57 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-02 16:52 [Bug target/110096] New: Would be nice if __builtin_ia32_pause had a portable equivalent as it's applicable to ARM pdimov at gmail dot com
2023-06-02 17:27 ` [Bug target/110096] " pinskia at gcc dot gnu.org
2023-06-02 17:30 ` pinskia at gcc dot gnu.org
2023-06-02 17:34 ` pdimov at gmail dot com
2023-06-02 17:49 ` pinskia at gcc dot gnu.org
2023-06-02 17:54 ` pdimov at gmail dot com
2023-06-02 17:59 ` pinskia at gcc dot gnu.org
2023-06-02 18:05 ` pdimov at gmail dot com
2023-06-02 18:09 ` pinskia at gcc dot gnu.org
2023-06-02 18:24 ` pdimov at gmail dot com
2023-06-02 19:12 ` pinskia at gcc dot gnu.org
2023-06-02 23:31 ` andysem at mail dot ru
2023-06-02 23:47 ` pinskia at gcc dot gnu.org
2023-06-03  1:57 ` pdimov at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).