> When emitting a compare-and-swap loop for @ref{__sync Builtins}
> and @ref{__atomic Builtins} lacking a native instruction, optimize
> for the highly contended case by issuing an atomic load before the
> @code{CMPXCHG} instruction, and using the @code{PAUSE} instruction
> to save CPU power when restarting the loop.

Thanks for the correction; it reads quite clearly now. Here is the
updated patch. OK for trunk?
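
For anyone reading along, here is a rough C sketch of the loop shape the
text above describes. It is not the code from the patch and not what the
compiler literally emits. The names fetch_nand_sketch and cpu_relax_sketch
are made up for illustration, and NAND is only an example of an operation
assumed to have no native atomic instruction.

```c
#include <stdint.h>

/* Spin-wait hint: PAUSE on x86; a no-op elsewhere (assumption made only
   so the sketch also builds on other targets). */
static inline void cpu_relax_sketch(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __builtin_ia32_pause();
#endif
}

/* Hypothetical helper: atomic fetch-and-NAND built from a
   compare-and-swap loop. Returns the value seen before the update. */
static inline uint32_t fetch_nand_sketch(uint32_t *ptr, uint32_t val)
{
    uint32_t expected = __atomic_load_n(ptr, __ATOMIC_RELAXED);

    for (;;) {
        uint32_t desired = ~(expected & val);

        /* Atomic load before the compare-and-swap: if another thread has
           already changed *ptr, skip the expensive locked instruction. */
        uint32_t current = __atomic_load_n(ptr, __ATOMIC_RELAXED);
        if (current != expected) {
            expected = current;
            cpu_relax_sketch();   /* PAUSE when restarting the loop */
            continue;
        }

        /* Strong compare-and-swap (CMPXCHG on x86). On failure,
           `expected` is refreshed with the current contents of *ptr. */
        if (__atomic_compare_exchange_n(ptr, &expected, desired, 0,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
            return expected;

        cpu_relax_sketch();       /* PAUSE when restarting the loop */
    }
}
```

The point of the extra load is that, when the location is heavily
contended, most iterations fail the cheap check and never reach the
locked instruction.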

Alexander Monakov via Gcc-patches <gcc-patches@gcc.gnu.org>
wrote on Tue, Nov 15, 2022 at 21:59:
>
>
> On Tue, 15 Nov 2022, Jonathan Wakely wrote:
>
> > > How about the following:
> > >
> > > When emitting a compare-and-swap loop for @ref{__sync Builtins}
> > > and @ref{__atomic Builtins} lacking a native instruction, optimize
> > > for the highly contended case by issuing an atomic load before the
> > > @code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction
> > > when restarting the loop.
> >
> > That's much better, thanks. My only remaining quibble would be that
> > "invoking" an instruction seems only marginally better than running
> > one. Emitting? Issuing? Using? Adding?
>
> Right, it should be 'using'; let me also add 'to save CPU power':
>
> When emitting a compare-and-swap loop for @ref{__sync Builtins}
> and @ref{__atomic Builtins} lacking a native instruction, optimize
> for the highly contended case by issuing an atomic load before the
> @code{CMPXCHG} instruction, and using the @code{PAUSE} instruction
> to save CPU power when restarting the loop.
>
> Alexander