[PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]
@ 2022-11-15  3:35 Hongyu Wang
  2022-11-15 10:53 ` Jonathan Wakely
  0 siblings, 1 reply; 9+ messages in thread
From: Hongyu Wang @ 2022-11-15  3:35 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak

Hi,

According to PR 107676, the document of -mrelax-cmpxchg-loop is nonsensical.
Adjust the wording according to the comments.

Bootstrapped on x86_64-pc-linux-gnu, ok for trunk?

gcc/ChangeLog:

	PR target/107676
	* doc/invoke.texi: Reword the description of
	-mrelax-cmpxchg-loop.
---
 gcc/doc/invoke.texi | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 40f667a630a..bdd7c319aef 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -33805,10 +33805,12 @@ registers.
 
 @item -mrelax-cmpxchg-loop
 @opindex mrelax-cmpxchg-loop
-Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
-execute pause if load value is not expected. This reduces excessive
-cachline bouncing when and works for all atomic logic fetch builtins
-that generates compare and swap loop.
+For compare and swap loops that emitted by some __atomic_* builtins
+(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch
+counterparts), emit an atomic load before cmpxchg instruction. If the
+loaded value is not equal to expected, execute a pause instead of
+directly run the cmpxchg instruction. This might reduce excessive
+cacheline bouncing.
 
 @item -mindirect-branch=@var{choice}
 @opindex mindirect-branch
-- 
2.18.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]
  2022-11-15  3:35 [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676] Hongyu Wang
@ 2022-11-15 10:53 ` Jonathan Wakely
  2022-11-15 13:28   ` Alexander Monakov
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Wakely @ 2022-11-15 10:53 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: gcc-patches, hongtao.liu, ubizjak

On 15/11/22 11:35 +0800, Hongyu Wang wrote:
>Hi,
>
>According to PR 107676, the document of -mrelax-cmpxchg-loop is nonsensical.
>Adjust the wording according to the comments.
>
>Bootstrapped on x86_64-pc-linux-gnu, ok for trunk?
>
>gcc/ChangeLog:
>
>	PR target/107676
>	* doc/invoke.texi: Reword the description of
>	-mrelax-cmpxchg-loop.
>---
> gcc/doc/invoke.texi | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
>diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>index 40f667a630a..bdd7c319aef 100644
>--- a/gcc/doc/invoke.texi
>+++ b/gcc/doc/invoke.texi
>@@ -33805,10 +33805,12 @@ registers.
>
> @item -mrelax-cmpxchg-loop
> @opindex mrelax-cmpxchg-loop
>-Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
>-execute pause if load value is not expected. This reduces excessive
>-cachline bouncing when and works for all atomic logic fetch builtins
>-that generates compare and swap loop.
>+For compare and swap loops that emitted by some __atomic_* builtins

s/that emitted/that are emitted/

>+(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch
>+counterparts), emit an atomic load before cmpxchg instruction. If the

s/before cmpxchg/before the cmpxchg/

>+loaded value is not equal to expected, execute a pause instead of

s/not equal to expected/not equal to the expected/

>+directly run the cmpxchg instruction. This might reduce excessive

s/directly run/directly running/

>+cacheline bouncing.
>
> @item -mindirect-branch=@var{choice}
> @opindex mindirect-branch


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]
  2022-11-15 10:53 ` Jonathan Wakely
@ 2022-11-15 13:28   ` Alexander Monakov
  2022-11-15 13:44     ` Jonathan Wakely
  0 siblings, 1 reply; 9+ messages in thread
From: Alexander Monakov @ 2022-11-15 13:28 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Hongyu Wang, gcc-patches, hongtao.liu, ubizjak

On Tue, 15 Nov 2022, Jonathan Wakely via Gcc-patches wrote:

> > @item -mrelax-cmpxchg-loop
> > @opindex mrelax-cmpxchg-loop
> >-Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
> >-execute pause if load value is not expected. This reduces excessive
> >-cachline bouncing when and works for all atomic logic fetch builtins
> >-that generates compare and swap loop.
> >+For compare and swap loops that emitted by some __atomic_* builtins
> 
> s/that emitted/that are emitted/
> 
> >+(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch
> >+counterparts), emit an atomic load before cmpxchg instruction. If the
> 
> s/before cmpxchg/before the cmpxchg/
> 
> >+loaded value is not equal to expected, execute a pause instead of
> 
> s/not equal to expected/not equal to the expected/
> 
> >+directly run the cmpxchg instruction. This might reduce excessive
> 
> s/directly run/directly running/

This results in "... execute a pause instead of directly running the
cmpxchg instruction", which needs further TLC because:

* 'a pause' should be 'the PAUSE instruction';
* 'directly running [an instruction]' does not seem correct in context.

The option also applies to __sync builtins, not just __atomic.

How about the following:

When emitting a compare-and-swap loop for @ref{__sync Builtins}
and @ref{__atomic Builtins} lacking a native instruction, optimize
for the highly contended case by issuing an atomic load before the
@code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction
when restarting the loop.

Alexander

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]
  2022-11-15 13:28   ` Alexander Monakov
@ 2022-11-15 13:44     ` Jonathan Wakely
  2022-11-15 13:59       ` Alexander Monakov
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Wakely @ 2022-11-15 13:44 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Hongyu Wang, gcc-patches, hongtao.liu, ubizjak

On Tue, 15 Nov 2022 at 13:34, Alexander Monakov <amonakov@ispras.ru> wrote:
>
> On Tue, 15 Nov 2022, Jonathan Wakely via Gcc-patches wrote:
>
> > > @item -mrelax-cmpxchg-loop
> > > @opindex mrelax-cmpxchg-loop
> > >-Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
> > >-execute pause if load value is not expected. This reduces excessive
> > >-cachline bouncing when and works for all atomic logic fetch builtins
> > >-that generates compare and swap loop.
> > >+For compare and swap loops that emitted by some __atomic_* builtins
> >
> > s/that emitted/that are emitted/
> >
> > >+(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch
> > >+counterparts), emit an atomic load before cmpxchg instruction. If the
> >
> > s/before cmpxchg/before the cmpxchg/
> >
> > >+loaded value is not equal to expected, execute a pause instead of
> >
> > s/not equal to expected/not equal to the expected/
> >
> > >+directly run the cmpxchg instruction. This might reduce excessive
> >
> > s/directly run/directly running/
>
> This results in "... execute a pause instead of directly running the
> cmpxchg instruction", which needs further TLC because:
>
> * 'a pause' should be 'the PAUSE instruction';
> * 'directly running [an instruction]' does not seem correct in context.
>
> The option also applies to __sync builtins, not just __atomic.
>
>
> How about the following:
>
> When emitting a compare-and-swap loop for @ref{__sync Builtins}
> and @ref{__atomic Builtins} lacking a native instruction, optimize
> for the highly contended case by issuing an atomic load before the
> @code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction
> when restarting the loop.

That's much better, thanks. My only remaining quibble would be that
"invoking" an instruction seems only marginally better than running
one. Emitting? Issuing? Using? Adding?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]
  2022-11-15 13:44     ` Jonathan Wakely
@ 2022-11-15 13:59       ` Alexander Monakov
  2022-11-16  5:15         ` Hongyu Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Alexander Monakov @ 2022-11-15 13:59 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Hongyu Wang, gcc-patches, hongtao.liu, ubizjak


On Tue, 15 Nov 2022, Jonathan Wakely wrote:

> > How about the following:
> >
> > When emitting a compare-and-swap loop for @ref{__sync Builtins}
> > and @ref{__atomic Builtins} lacking a native instruction, optimize
> > for the highly contended case by issuing an atomic load before the
> > @code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction
> > when restarting the loop.
> 
> That's much better, thanks. My only remaining quibble would be that
> "invoking" an instruction seems only marginally better than running
> one. Emitting? Issuing? Using? Adding?

Right, it should be 'using'; let me also add 'to save CPU power':

When emitting a compare-and-swap loop for @ref{__sync Builtins}
and @ref{__atomic Builtins} lacking a native instruction, optimize
for the highly contended case by issuing an atomic load before the
@code{CMPXCHG} instruction, and using the @code{PAUSE} instruction
to save CPU power when restarting the loop.

Alexander

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]
  2022-11-15 13:59       ` Alexander Monakov
@ 2022-11-16  5:15         ` Hongyu Wang
  2022-11-16  7:51           ` Alexander Monakov
  0 siblings, 1 reply; 9+ messages in thread
From: Hongyu Wang @ 2022-11-16  5:15 UTC (permalink / raw)
  To: Alexander Monakov
  Cc: Jonathan Wakely, Hongyu Wang, gcc-patches, hongtao.liu, ubizjak

[-- Attachment #1: Type: text/plain, Size: 1560 bytes --]

> When emitting a compare-and-swap loop for @ref{__sync Builtins}
> and @ref{__atomic Builtins} lacking a native instruction, optimize
> for the highly contended case by issuing an atomic load before the
> @code{CMPXCHG} instruction, and using the @code{PAUSE} instruction
> to save CPU power when restarting the loop.

Thanks for the correction, it looks quite clear now! Here is the
updated patch, ok for trunk?

Alexander Monakov via Gcc-patches <gcc-patches@gcc.gnu.org>
于2022年11月15日周二 21:59写道：
>
>
> On Tue, 15 Nov 2022, Jonathan Wakely wrote:
>
> > > How about the following:
> > >
> > > When emitting a compare-and-swap loop for @ref{__sync Builtins}
> > > and @ref{__atomic Builtins} lacking a native instruction, optimize
> > > for the highly contended case by issuing an atomic load before the
> > > @code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction
> > > when restarting the loop.
> >
> > That's much better, thanks. My only remaining quibble would be that
> > "invoking" an instruction seems only marginally better than running
> > one. Emitting? Issuing? Using? Adding?
>
> Right, it should be 'using'; let me also add 'to save CPU power':
>
> When emitting a compare-and-swap loop for @ref{__sync Builtins}
> and @ref{__atomic Builtins} lacking a native instruction, optimize
> for the highly contended case by issuing an atomic load before the
> @code{CMPXCHG} instruction, and using the @code{PAUSE} instruction
> to save CPU power when restarting the loop.
>
> Alexander

[-- Attachment #2: 0001-doc-Reword-the-description-of-mrelax-cmpxchg-loop-PR.patch --]
[-- Type: text/x-patch, Size: 1313 bytes --]

From e82f3e03115480ac3d055819658a107249932c65 Mon Sep 17 00:00:00 2001
From: Hongyu Wang <hongyu.wang@intel.com>
Date: Tue, 15 Nov 2022 11:16:17 +0800
Subject: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR
 107676]

gcc/ChangeLog:

	PR target/107676
	* doc/invoke.texi: Reword the description of
	-mrelax-cmpxchg-loop.
---
 gcc/doc/invoke.texi | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 40f667a630a..bcbe3a7e420 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -33805,10 +33805,11 @@ registers.
 
 @item -mrelax-cmpxchg-loop
 @opindex mrelax-cmpxchg-loop
-Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
-execute pause if load value is not expected. This reduces excessive
-cachline bouncing when and works for all atomic logic fetch builtins
-that generates compare and swap loop.
+When emitting a compare-and-swap loop for @ref{__sync Builtins}
+and @ref{__atomic Builtins} lacking a native instruction, optimize
+for the highly contended case by issuing an atomic load before the
+@code{CMPXCHG} instruction, and using the @code{PAUSE} instruction
+to save CPU power when restarting the loop.
 
 @item -mindirect-branch=@var{choice}
 @opindex mindirect-branch
-- 
2.18.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]
  2022-11-16  5:15         ` Hongyu Wang
@ 2022-11-16  7:51           ` Alexander Monakov
  2022-11-16  8:29             ` Hongyu Wang
  2022-11-16 10:51             ` Jonathan Wakely
  0 siblings, 2 replies; 9+ messages in thread
From: Alexander Monakov @ 2022-11-16  7:51 UTC (permalink / raw)
  To: Hongyu Wang
  Cc: Jonathan Wakely, Hongyu Wang, gcc-patches, hongtao.liu, ubizjak


On Wed, 16 Nov 2022, Hongyu Wang wrote:

> > When emitting a compare-and-swap loop for @ref{__sync Builtins}
> > and @ref{__atomic Builtins} lacking a native instruction, optimize
> > for the highly contended case by issuing an atomic load before the
> > @code{CMPXCHG} instruction, and using the @code{PAUSE} instruction
> > to save CPU power when restarting the loop.
> 
> Thanks for the correction, it looks quite clear now! Here is the
> updated patch, ok for trunk?

Please use 'git commit --author' to indicate authorship of the patch
(or simply let me push it once approved).

Jonathan, please let us know if the above wording looks fine to you?
Mainly I'm asking if '... and using' or '... and use' is easier to read.

Alexander

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]
  2022-11-16  7:51           ` Alexander Monakov
@ 2022-11-16  8:29             ` Hongyu Wang
  2022-11-16 10:51             ` Jonathan Wakely
  1 sibling, 0 replies; 9+ messages in thread
From: Hongyu Wang @ 2022-11-16  8:29 UTC (permalink / raw)
  To: Alexander Monakov
  Cc: Jonathan Wakely, Hongyu Wang, gcc-patches, hongtao.liu, ubizjak

> Please use 'git commit --author' to indicate authorship of the patch
> (or simply let me push it once approved).

Yes, just change the author and push it.
Thanks for your help!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]
  2022-11-16  7:51           ` Alexander Monakov
  2022-11-16  8:29             ` Hongyu Wang
@ 2022-11-16 10:51             ` Jonathan Wakely
  1 sibling, 0 replies; 9+ messages in thread
From: Jonathan Wakely @ 2022-11-16 10:51 UTC (permalink / raw)
  To: Alexander Monakov
  Cc: Hongyu Wang, Hongyu Wang, gcc-patches, hongtao.liu, ubizjak

On Wed, 16 Nov 2022 at 07:51, Alexander Monakov <amonakov@ispras.ru> wrote:
>
>
> On Wed, 16 Nov 2022, Hongyu Wang wrote:
>
> > > When emitting a compare-and-swap loop for @ref{__sync Builtins}
> > > and @ref{__atomic Builtins} lacking a native instruction, optimize
> > > for the highly contended case by issuing an atomic load before the
> > > @code{CMPXCHG} instruction, and using the @code{PAUSE} instruction
> > > to save CPU power when restarting the loop.
> >
> > Thanks for the correction, it looks quite clear now! Here is the
> > updated patch, ok for trunk?
>
> Please use 'git commit --author' to indicate authorship of the patch
> (or simply let me push it once approved).
>
> Jonathan, please let us know if the above wording looks fine to you?
> Mainly I'm asking if '... and using' or '... and use' is easier to read.


Your wording above ("and using...") looks good, it reads naturally and clearly.

It's quite a long sentence, so I considered suggesting:

"... by issuing an atomic load before the CMPXCHG instruction. Also
use the PAUSE instruction to save CPU power when restarting the loop."

But I think your original is better. The sentence is long, but it
flows better as a single sentence. As two sentences, the second one
just seems tacked onto the end and it's less clear how it relates to
the first.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-11-16 10:52 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-15  3:35 [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676] Hongyu Wang
2022-11-15 10:53 ` Jonathan Wakely
2022-11-15 13:28   ` Alexander Monakov
2022-11-15 13:44     ` Jonathan Wakely
2022-11-15 13:59       ` Alexander Monakov
2022-11-16  5:15         ` Hongyu Wang
2022-11-16  7:51           ` Alexander Monakov
2022-11-16  8:29             ` Hongyu Wang
2022-11-16 10:51             ` Jonathan Wakely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).