public inbox for gcc-patches@gcc.gnu.org
* Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
@ 2019-09-05 14:36 Wilco Dijkstra
  2019-09-14 19:26 ` Richard Henderson
  0 siblings, 1 reply; 9+ messages in thread
From: Wilco Dijkstra @ 2019-09-05 14:36 UTC (permalink / raw)
  To: GCC Patches, Richard Henderson, Kyrylo Tkachov
  Cc: nd, Ramana Radhakrishnan, agraf, Marcus Shawcroft,
	James Greenhalgh, Richard Henderson

Hi Richard,

>    What I have not done, but is now a possibility, is to use a custom
>    calling convention for the out-of-line routines.  I now only clobber
>    2 (or 3, for TImode) temp regs and set a return value.

This would be a great feature to have since it reduces the overhead of
outlining considerably.

> I think this patch series would be great to have for GCC 10!

Agreed. I've got a couple of general comments:

* The option name -matomic-ool sounds too abbreviated. I think e.g.
  -moutline-atomics is more descriptive and more user-friendly.

* Similarly the exported __aa64_have_atomics variable could be named
  __aarch64_have_lse_atomics so it's clear that it is about LSE atomics.

+@item -matomic-ool
+@itemx -mno-atomic-ool
+Enable or disable calls to out-of-line helpers to implement atomic operations.
+These helpers will, at runtime, determine if ARMv8.1-Atomics instructions
+should be used; if not, they will use the load/store-exclusive instructions
+that are present in the base ARMv8.0 ISA.
+
+This option is only applicable when compiling for the base ARMv8.0
+instruction set.  If using a later revision, e.g. @option{-march=armv8.1-a}
+or @option{-march=armv8-a+lse}, the ARMv8.1-Atomics instructions will be
+used directly. 
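
[Editorial illustration, not part of the patch.] The runtime choice described
in the documentation hunk above can be sketched in portable C. This is only a
minimal model under stated assumptions: `have_lse_atomics`,
`init_have_lse_atomics` and `fetch_add_helper` are hypothetical names, the real
helpers in the series are written in assembly (libgcc/config/aarch64/lse.S),
and the detection shown here (Linux `getauxval` with `HWCAP_ATOMICS`) is one
plausible way to query the CPU, not necessarily what lse-init.c does. On
non-AArch64 hosts the flag simply stays false.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#ifndef HWCAP_ATOMICS
#define HWCAP_ATOMICS (1UL << 8)  /* Linux AArch64 hwcap bit for LSE atomics */
#endif
#endif

/* Hypothetical stand-in for the flag the series exports from libgcc. */
static bool have_lse_atomics;

/* Assumption: run once at startup, e.g. from a constructor. */
static void init_have_lse_atomics(void)
{
#if defined(__aarch64__) && defined(__linux__)
    have_lse_atomics = (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#else
    have_lse_atomics = false;     /* no LSE on other hosts */
#endif
}

/* Model of one out-of-line helper: returns the value before the add. */
static int fetch_add_helper(atomic_int *p, int v)
{
    if (have_lse_atomics) {
        /* Stands in for a single LSE instruction such as LDADD. */
        return atomic_fetch_add_explicit(p, v, memory_order_seq_cst);
    }
    /* Stands in for the load/store-exclusive retry loop of ARMv8.0. */
    int old = atomic_load_explicit(p, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               p, &old, old + v,
               memory_order_seq_cst, memory_order_relaxed)) {
        /* On failure, old has been reloaded with the current value. */
    }
    return old;
}
```

Both paths give the same result; the flag only selects how the update is
performed, which is why the branch is expected to go the same way every time
once initialization has run.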

So what is the behaviour when you explicitly select a specific CPU?

+/* Branch to LABEL if LSE is enabled.
+   The branch should be easily predicted, in that it will, after constructors,
+   always branch the same way.  The expectation is that systems that implement
+   ARMv8.1-Atomics are "beefier" than those that omit the extension.
+   By arranging for the fall-through path to use load-store-exclusive insns,
+   we aid the branch predictor of the smallest cpus.  */ 

I'd say that by the time GCC10 is released and used in distros, systems without
LSE atomics would be practically non-existent. So we should favour LSE atomics
by default.

Cheers,
Wilco


* Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
  2019-09-05 14:36 [PATCH, AArch64, v3 0/6] LSE atomics out-of-line Wilco Dijkstra
@ 2019-09-14 19:26 ` Richard Henderson
  2019-09-16 11:59   ` Wilco Dijkstra
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Henderson @ 2019-09-14 19:26 UTC (permalink / raw)
  To: Wilco Dijkstra, GCC Patches, Richard Henderson, Kyrylo Tkachov
  Cc: nd, Ramana Radhakrishnan, agraf, Marcus Shawcroft, James Greenhalgh

On 9/5/19 10:35 AM, Wilco Dijkstra wrote:
> Agreed. I've got a couple of general comments:
> 
> * The option name -matomic-ool sounds too abbreviated. I think e.g.
>   -moutline-atomics is more descriptive and more user-friendly.

Changed.

> * Similarly the exported __aa64_have_atomics variable could be named
>   __aarch64_have_lse_atomics so it's clear that it is about LSE atomics.

Changed.

> +@item -matomic-ool
> +@itemx -mno-atomic-ool
> +Enable or disable calls to out-of-line helpers to implement atomic operations.
> +These helpers will, at runtime, determine if ARMv8.1-Atomics instructions
> +should be used; if not, they will use the load/store-exclusive instructions
> +that are present in the base ARMv8.0 ISA.
> +
> +This option is only applicable when compiling for the base ARMv8.0
> +instruction set.  If using a later revision, e.g. @option{-march=armv8.1-a}
> +or @option{-march=armv8-a+lse}, the ARMv8.1-Atomics instructions will be
> +used directly. 
> 
> So what is the behaviour when you explicitly select a specific CPU?

Selecting a specific cpu selects the specific architecture that the cpu
supports, does it not?  Thus the architecture example above still applies.

Unless I don't understand the distinction you're making?

> +/* Branch to LABEL if LSE is enabled.
> +   The branch should be easily predicted, in that it will, after constructors,
> +   always branch the same way.  The expectation is that systems that implement
> +   ARMv8.1-Atomics are "beefier" than those that omit the extension.
> +   By arranging for the fall-through path to use load-store-exclusive insns,
> +   we aid the branch predictor of the smallest cpus.  */ 
> 
> I'd say that by the time GCC10 is released and used in distros, systems without
> LSE atomics would be practically non-existent. So we should favour LSE atomics
> by default.

I suppose.  Does it not continue to be true that an a53 is more impacted by the
branch prediction than an a76?


r~


* Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
  2019-09-14 19:26 ` Richard Henderson
@ 2019-09-16 11:59   ` Wilco Dijkstra
  2019-09-17  8:40     ` Kyrill Tkachov
  0 siblings, 1 reply; 9+ messages in thread
From: Wilco Dijkstra @ 2019-09-16 11:59 UTC (permalink / raw)
  To: Richard Henderson, GCC Patches, Kyrylo Tkachov
  Cc: nd, Ramana Radhakrishnan, agraf, Marcus Shawcroft, James Greenhalgh

Hi Richard,

>> So what is the behaviour when you explicitly select a specific CPU?
>
> Selecting a specific cpu selects the specific architecture that the cpu
> supports, does it not?  Thus the architecture example above still applies.
>
> Unless I don't understand the distinction you're making?

When you select a CPU the goal is that we optimize and schedule for that
specific microarchitecture. That implies using atomics that work best for
that core rather than outlining them.

>> I'd say that by the time GCC10 is released and used in distros, systems without
>> LSE atomics would be practically non-existent. So we should favour LSE atomics
>> by default.
>
> I suppose.  Does it not continue to be true that an a53 is more impacted by the
> branch prediction than an a76?

That's hard to say for sure - the cost of taken branches (3 in just a few instructions for
the outlined atomics) might well affect big/wide cores more. Also note Cortex-A55
(successor of Cortex-A53) has LSE atomics.

Wilco


* Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
  2019-09-16 11:59   ` Wilco Dijkstra
@ 2019-09-17  8:40     ` Kyrill Tkachov
  2019-09-17 10:55       ` Wilco Dijkstra
  0 siblings, 1 reply; 9+ messages in thread
From: Kyrill Tkachov @ 2019-09-17  8:40 UTC (permalink / raw)
  To: Wilco Dijkstra, Richard Henderson, GCC Patches
  Cc: nd, Ramana Radhakrishnan, agraf, Marcus Shawcroft, James Greenhalgh


On 9/16/19 12:58 PM, Wilco Dijkstra wrote:
> Hi Richard,
>
> >> So what is the behaviour when you explicitly select a specific CPU?
> >
> > Selecting a specific cpu selects the specific architecture that the cpu
> > supports, does it not?  Thus the architecture example above still applies.
> >
> > Unless I don't understand the distinction you're making?
>
> When you select a CPU the goal is that we optimize and schedule for that
> specific microarchitecture. That implies using atomics that work best for
> that core rather than outlining them.


I think we want to go ahead with this framework to enable the portable 
deployment of LSE atomics.

More CPU-specific fine-tuning can come later separately.

Thanks,

Kyrill


>
> >> I'd say that by the time GCC10 is released and used in distros, systems without
> >> LSE atomics would be practically non-existent. So we should favour LSE atomics
> >> by default.
> >
> > I suppose.  Does it not continue to be true that an a53 is more impacted by the
> > branch prediction than an a76?
>
> That's hard to say for sure - the cost of taken branches (3 in just a few instructions for
> the outlined atomics) might well affect big/wide cores more. Also note Cortex-A55
> (successor of Cortex-A53) has LSE atomics.
>
> Wilco


* Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
  2019-09-17  8:40     ` Kyrill Tkachov
@ 2019-09-17 10:55       ` Wilco Dijkstra
  2019-09-17 21:11         ` Richard Henderson
  0 siblings, 1 reply; 9+ messages in thread
From: Wilco Dijkstra @ 2019-09-17 10:55 UTC (permalink / raw)
  To: Kyrill Tkachov, Richard Henderson, GCC Patches
  Cc: nd, Ramana Radhakrishnan, agraf, Marcus Shawcroft, James Greenhalgh

Hi Kyrill,

>> When you select a CPU the goal is that we optimize and schedule for that
>> specific microarchitecture. That implies using atomics that work best for
>> that core rather than outlining them.
>
> I think we want to go ahead with this framework to enable the portable 
> deployment of LSE atomics.
>
> More CPU-specific fine-tuning can come later separately.

I'm not talking about CPU-specific fine-tuning, but ensuring we don't penalize
performance when a user selects the specific CPU their application will run on.
And in that case outlining is unnecessary.

Cheers,
Wilco


* Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
  2019-09-17 10:55       ` Wilco Dijkstra
@ 2019-09-17 21:11         ` Richard Henderson
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2019-09-17 21:11 UTC (permalink / raw)
  To: Wilco Dijkstra, Kyrill Tkachov, Richard Henderson, GCC Patches
  Cc: nd, Ramana Radhakrishnan, agraf, Marcus Shawcroft, James Greenhalgh

On 9/17/19 6:55 AM, Wilco Dijkstra wrote:
> Hi Kyrill,
> 
>>> When you select a CPU the goal is that we optimize and schedule for that
>>> specific microarchitecture. That implies using atomics that work best for
>>> that core rather than outlining them.
>>
>> I think we want to go ahead with this framework to enable the portable 
>> deployment of LSE atomics.
>>
>> More CPU-specific fine-tuning can come later separately.
> 
> I'm not talking about CPU-specific fine-tuning, but ensuring we don't penalize
> performance when a user selects the specific CPU their application will run on.
> And in that case outlining is unnecessary.

From aarch64_override_options:

Given both -march=foo and -mcpu=bar, the architecture will be foo; -mcpu will
be treated as -mtune=bar and will not use any insn not in foo.

Given only -mcpu=foo, then the architecture will be the one supported by foo.

So if foo supports LSE, then we will not outline the functions, no matter how
we arrive at foo.
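
[Editorial illustration, not code from aarch64_override_options.] The three
rules above can be restated as a small decision function. It is only a sketch
under stated assumptions: the enum and both function names are hypothetical,
and the case where neither option is given is assumed to mean the base ARMv8.0
ISA without LSE, which the message does not spell out.

```c
#include <stdbool.h>

/* Hypothetical encoding of what the user passed for -march or -mcpu. */
enum flag_state {
    FLAG_ABSENT,       /* option not given                    */
    FLAG_WITHOUT_LSE,  /* names an arch/cpu without LSE       */
    FLAG_WITH_LSE      /* names an arch/cpu that includes LSE */
};

/* Does the architecture that code is generated for include LSE? */
static bool effective_arch_has_lse(enum flag_state march,
                                   enum flag_state mcpu)
{
    if (march != FLAG_ABSENT) {
        /* -march wins; -mcpu then only acts as -mtune. */
        return march == FLAG_WITH_LSE;
    }
    if (mcpu != FLAG_ABSENT) {
        /* Only -mcpu given: use the architecture that cpu supports. */
        return mcpu == FLAG_WITH_LSE;
    }
    /* Assumption: the default is the base ARMv8.0 ISA, without LSE. */
    return false;
}

/* Calls to the out-of-line helpers are emitted only when the option is
   on and the effective architecture lacks LSE. */
static bool uses_outline_helpers(bool outline_option,
                                 enum flag_state march,
                                 enum flag_state mcpu)
{
    return outline_option && !effective_arch_has_lse(march, mcpu);
}
```

For example, `uses_outline_helpers(true, FLAG_ABSENT, FLAG_WITH_LSE)` is false
(an LSE-capable -mcpu alone selects inline LSE instructions), while
`uses_outline_helpers(true, FLAG_WITHOUT_LSE, FLAG_WITH_LSE)` is true, because
the explicit -march restricts the instruction set and -mcpu only tunes.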


r~


* Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
  2018-11-01 21:47 Richard Henderson
  2018-11-11 12:30 ` Richard Henderson
@ 2019-09-05  9:51 ` Kyrill Tkachov
  1 sibling, 0 replies; 9+ messages in thread
From: Kyrill Tkachov @ 2019-09-05  9:51 UTC (permalink / raw)
  To: Richard Henderson, gcc-patches
  Cc: Ramana Radhakrishnan, agraf, Marcus Shawcroft, James Greenhalgh,
	Richard Henderson

Hi Richard,

On 11/1/18 9:46 PM, Richard Henderson wrote:
> From: Richard Henderson <rth@twiddle.net>
>
> Changes since v2:
>   * Committed half of the patch set.
>   * Split inline TImode support from out-of-line patches.
>   * Removed the ST<OP> out-of-line functions, to match inline.
>   * Moved the out-of-line functions to assembly.
>
> What I have not done, but is now a possibility, is to use a custom
> calling convention for the out-of-line routines.  I now only clobber
> 2 (or 3, for TImode) temp regs and set a return value.
>
I think this patch series would be great to have for GCC 10!

I've rebased them on current trunk and fixed up a couple of minor 
conflicts in my local tree.

After that, I've encountered a couple of issues with building a compiler 
with these patches.

I'll respond to the individual patches that I think cause the trouble.

Thanks,

Kyrill


>
> r~
>
>
> Richard Henderson (6):
>   aarch64: Extend %R for integer registers
>   aarch64: Implement TImode compare-and-swap
>   aarch64: Tidy aarch64_split_compare_and_swap
>   aarch64: Add out-of-line functions for LSE atomics
>   aarch64: Implement -matomic-ool
>   Enable -matomic-ool by default
>
>  gcc/config/aarch64/aarch64-protos.h           |  13 +
>  gcc/common/config/aarch64/aarch64-common.c    |   6 +-
>  gcc/config/aarch64/aarch64.c                  | 211 ++++++++++++----
>  .../atomic-comp-swap-release-acquire.c        |   2 +-
>  .../gcc.target/aarch64/atomic-op-acq_rel.c    |   2 +-
>  .../gcc.target/aarch64/atomic-op-acquire.c    |   2 +-
>  .../gcc.target/aarch64/atomic-op-char.c       |   2 +-
>  .../gcc.target/aarch64/atomic-op-consume.c    |   2 +-
>  .../gcc.target/aarch64/atomic-op-imm.c        |   2 +-
>  .../gcc.target/aarch64/atomic-op-int.c        |   2 +-
>  .../gcc.target/aarch64/atomic-op-long.c       |   2 +-
>  .../gcc.target/aarch64/atomic-op-relaxed.c    |   2 +-
>  .../gcc.target/aarch64/atomic-op-release.c    |   2 +-
>  .../gcc.target/aarch64/atomic-op-seq_cst.c    |   2 +-
>  .../gcc.target/aarch64/atomic-op-short.c      |   2 +-
>  .../aarch64/atomic_cmp_exchange_zero_reg_1.c  |   2 +-
>  .../atomic_cmp_exchange_zero_strong_1.c       |   2 +-
>  .../gcc.target/aarch64/sync-comp-swap.c       |   2 +-
>  .../gcc.target/aarch64/sync-op-acquire.c      |   2 +-
>  .../gcc.target/aarch64/sync-op-full.c         |   2 +-
>  libgcc/config/aarch64/lse-init.c              |  45 ++++
>  gcc/config/aarch64/aarch64.opt                |   4 +
>  gcc/config/aarch64/atomics.md                 | 185 +++++++++++++-
>  gcc/config/aarch64/iterators.md               |   3 +
>  gcc/doc/invoke.texi                           |  14 +-
>  libgcc/config.host                            |   4 +
>  libgcc/config/aarch64/lse.S                   | 238 ++++++++++++++++++
>  libgcc/config/aarch64/t-lse                   |  44 ++++
>  28 files changed, 717 insertions(+), 84 deletions(-)
>  create mode 100644 libgcc/config/aarch64/lse-init.c
>  create mode 100644 libgcc/config/aarch64/lse.S
>  create mode 100644 libgcc/config/aarch64/t-lse
>
> -- 
> 2.17.2
>


* Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
  2018-11-01 21:47 Richard Henderson
@ 2018-11-11 12:30 ` Richard Henderson
  2019-09-05  9:51 ` Kyrill Tkachov
  1 sibling, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2018-11-11 12:30 UTC (permalink / raw)
  To: Richard Henderson, gcc-patches
  Cc: ramana.radhakrishnan, agraf, marcus.shawcroft, james.greenhalgh

Ping.

On 11/1/18 10:46 PM, Richard Henderson wrote:
> From: Richard Henderson <rth@twiddle.net>
> 
> Changes since v2:
>   * Committed half of the patch set.
>   * Split inline TImode support from out-of-line patches.
>   * Removed the ST<OP> out-of-line functions, to match inline.
>   * Moved the out-of-line functions to assembly.
> 
> What I have not done, but is now a possibility, is to use a custom
> calling convention for the out-of-line routines.  I now only clobber
> 2 (or 3, for TImode) temp regs and set a return value.
> 
> 
> r~
>   
> 
> Richard Henderson (6):
>   aarch64: Extend %R for integer registers
>   aarch64: Implement TImode compare-and-swap
>   aarch64: Tidy aarch64_split_compare_and_swap
>   aarch64: Add out-of-line functions for LSE atomics
>   aarch64: Implement -matomic-ool
>   Enable -matomic-ool by default
> 
>  gcc/config/aarch64/aarch64-protos.h           |  13 +
>  gcc/common/config/aarch64/aarch64-common.c    |   6 +-
>  gcc/config/aarch64/aarch64.c                  | 211 ++++++++++++----
>  .../atomic-comp-swap-release-acquire.c        |   2 +-
>  .../gcc.target/aarch64/atomic-op-acq_rel.c    |   2 +-
>  .../gcc.target/aarch64/atomic-op-acquire.c    |   2 +-
>  .../gcc.target/aarch64/atomic-op-char.c       |   2 +-
>  .../gcc.target/aarch64/atomic-op-consume.c    |   2 +-
>  .../gcc.target/aarch64/atomic-op-imm.c        |   2 +-
>  .../gcc.target/aarch64/atomic-op-int.c        |   2 +-
>  .../gcc.target/aarch64/atomic-op-long.c       |   2 +-
>  .../gcc.target/aarch64/atomic-op-relaxed.c    |   2 +-
>  .../gcc.target/aarch64/atomic-op-release.c    |   2 +-
>  .../gcc.target/aarch64/atomic-op-seq_cst.c    |   2 +-
>  .../gcc.target/aarch64/atomic-op-short.c      |   2 +-
>  .../aarch64/atomic_cmp_exchange_zero_reg_1.c  |   2 +-
>  .../atomic_cmp_exchange_zero_strong_1.c       |   2 +-
>  .../gcc.target/aarch64/sync-comp-swap.c       |   2 +-
>  .../gcc.target/aarch64/sync-op-acquire.c      |   2 +-
>  .../gcc.target/aarch64/sync-op-full.c         |   2 +-
>  libgcc/config/aarch64/lse-init.c              |  45 ++++
>  gcc/config/aarch64/aarch64.opt                |   4 +
>  gcc/config/aarch64/atomics.md                 | 185 +++++++++++++-
>  gcc/config/aarch64/iterators.md               |   3 +
>  gcc/doc/invoke.texi                           |  14 +-
>  libgcc/config.host                            |   4 +
>  libgcc/config/aarch64/lse.S                   | 238 ++++++++++++++++++
>  libgcc/config/aarch64/t-lse                   |  44 ++++
>  28 files changed, 717 insertions(+), 84 deletions(-)
>  create mode 100644 libgcc/config/aarch64/lse-init.c
>  create mode 100644 libgcc/config/aarch64/lse.S
>  create mode 100644 libgcc/config/aarch64/t-lse
> 


* [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
@ 2018-11-01 21:47 Richard Henderson
  2018-11-11 12:30 ` Richard Henderson
  2019-09-05  9:51 ` Kyrill Tkachov
  0 siblings, 2 replies; 9+ messages in thread
From: Richard Henderson @ 2018-11-01 21:47 UTC (permalink / raw)
  To: gcc-patches
  Cc: ramana.radhakrishnan, agraf, marcus.shawcroft, james.greenhalgh,
	Richard Henderson

From: Richard Henderson <rth@twiddle.net>

Changes since v2:
  * Committed half of the patch set.
  * Split inline TImode support from out-of-line patches.
  * Removed the ST<OP> out-of-line functions, to match inline.
  * Moved the out-of-line functions to assembly.

What I have not done, but is now a possibility, is to use a custom
calling convention for the out-of-line routines.  I now only clobber
2 (or 3, for TImode) temp regs and set a return value.


r~
  

Richard Henderson (6):
  aarch64: Extend %R for integer registers
  aarch64: Implement TImode compare-and-swap
  aarch64: Tidy aarch64_split_compare_and_swap
  aarch64: Add out-of-line functions for LSE atomics
  aarch64: Implement -matomic-ool
  Enable -matomic-ool by default

 gcc/config/aarch64/aarch64-protos.h           |  13 +
 gcc/common/config/aarch64/aarch64-common.c    |   6 +-
 gcc/config/aarch64/aarch64.c                  | 211 ++++++++++++----
 .../atomic-comp-swap-release-acquire.c        |   2 +-
 .../gcc.target/aarch64/atomic-op-acq_rel.c    |   2 +-
 .../gcc.target/aarch64/atomic-op-acquire.c    |   2 +-
 .../gcc.target/aarch64/atomic-op-char.c       |   2 +-
 .../gcc.target/aarch64/atomic-op-consume.c    |   2 +-
 .../gcc.target/aarch64/atomic-op-imm.c        |   2 +-
 .../gcc.target/aarch64/atomic-op-int.c        |   2 +-
 .../gcc.target/aarch64/atomic-op-long.c       |   2 +-
 .../gcc.target/aarch64/atomic-op-relaxed.c    |   2 +-
 .../gcc.target/aarch64/atomic-op-release.c    |   2 +-
 .../gcc.target/aarch64/atomic-op-seq_cst.c    |   2 +-
 .../gcc.target/aarch64/atomic-op-short.c      |   2 +-
 .../aarch64/atomic_cmp_exchange_zero_reg_1.c  |   2 +-
 .../atomic_cmp_exchange_zero_strong_1.c       |   2 +-
 .../gcc.target/aarch64/sync-comp-swap.c       |   2 +-
 .../gcc.target/aarch64/sync-op-acquire.c      |   2 +-
 .../gcc.target/aarch64/sync-op-full.c         |   2 +-
 libgcc/config/aarch64/lse-init.c              |  45 ++++
 gcc/config/aarch64/aarch64.opt                |   4 +
 gcc/config/aarch64/atomics.md                 | 185 +++++++++++++-
 gcc/config/aarch64/iterators.md               |   3 +
 gcc/doc/invoke.texi                           |  14 +-
 libgcc/config.host                            |   4 +
 libgcc/config/aarch64/lse.S                   | 238 ++++++++++++++++++
 libgcc/config/aarch64/t-lse                   |  44 ++++
 28 files changed, 717 insertions(+), 84 deletions(-)
 create mode 100644 libgcc/config/aarch64/lse-init.c
 create mode 100644 libgcc/config/aarch64/lse.S
 create mode 100644 libgcc/config/aarch64/t-lse

-- 
2.17.2

