* Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
@ 2019-09-05 14:36 Wilco Dijkstra
2019-09-14 19:26 ` Richard Henderson
0 siblings, 1 reply; 9+ messages in thread
From: Wilco Dijkstra @ 2019-09-05 14:36 UTC (permalink / raw)
To: GCC Patches, Richard Henderson, Kyrylo Tkachov
Cc: nd, Ramana Radhakrishnan, agraf, Marcus Shawcroft,
James Greenhalgh, Richard Henderson
Hi Richard,
> What I have not done, but is now a possibility, is to use a custom
> calling convention for the out-of-line routines. I now only clobber
> 2 (or 3, for TImode) temp regs and set a return value.
This would be a great feature to have since it reduces the overhead of
outlining considerably.
> I think this patch series would be great to have for GCC 10!
Agreed. I've got a couple of general comments:
* The option name -matomic-ool sounds too abbreviated. I think e.g.
-moutline-atomics is more descriptive and more user-friendly.
* Similarly the exported __aa64_have_atomics variable could be named
__aarch64_have_lse_atomics so it's clear that it is about LSE atomics.
+@item -matomic-ool
+@itemx -mno-atomic-ool
+Enable or disable calls to out-of-line helpers to implement atomic operations.
+These helpers will, at runtime, determine if ARMv8.1-Atomics instructions
+should be used; if not, they will use the load/store-exclusive instructions
+that are present in the base ARMv8.0 ISA.
+
+This option is only applicable when compiling for the base ARMv8.0
+instruction set. If using a later revision, e.g. @option{-march=armv8.1-a}
+or @option{-march=armv8-a+lse}, the ARMv8.1-Atomics instructions will be
+used directly.
So what is the behaviour when you explicitly select a specific CPU?
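[Editor's sketch] The runtime check described in the quoted documentation could look roughly like the following on Linux. This is only an illustration under stated assumptions: `have_lse_atomics` and `probe_lse_atomics` are hypothetical names (not the symbols from the patch), and on any non-AArch64 or non-Linux host the probe simply reports false.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#ifndef HWCAP_ATOMICS
#define HWCAP_ATOMICS (1UL << 8)  /* LSE bit in the Linux AArch64 hwcap ABI */
#endif
#endif

/* Hypothetical flag consulted by the out-of-line helpers. */
static bool have_lse_atomics;

/* Ask the kernel whether the CPU implements the ARMv8.1 atomics. */
static bool probe_lse_atomics(void)
{
#if defined(__linux__) && defined(__aarch64__)
  return (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#else
  return false;  /* assumption: no LSE anywhere else */
#endif
}

/* Runs before main, so the flag is stable "after constructors". */
__attribute__((constructor))
static void init_have_lse_atomics(void)
{
  have_lse_atomics = probe_lse_atomics();
}
```

Because the flag is written once at startup and never changes afterwards, every later test of it branches the same way.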
+/* Branch to LABEL if LSE is enabled.
+ The branch should be easily predicted, in that it will, after constructors,
+ always branch the same way. The expectation is that systems that implement
+ ARMv8.1-Atomics are "beefier" than those that omit the extension.
+ By arranging for the fall-through path to use load-store-exclusive insns,
+ we aid the branch predictor of the smallest cpus. */
I'd say that by the time GCC10 is released and used in distros, systems without
LSE atomics would be practically non-existent. So we should favour LSE atomics
by default.
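[Editor's sketch] To make the branch layout under discussion concrete, here is a portable C model of one out-of-line helper. It is not the assembly from the patch: the names are hypothetical, and both paths use C11 atomics as stand-ins (a single fetch-add for the LSE instruction, a compare-exchange retry loop for the load/store-exclusive sequence).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag, set once at startup by a constructor. */
static bool have_lse_flag;

/* Stand-in for the single-instruction LSE path. */
static uint32_t fetch_add_single_insn(_Atomic uint32_t *p, uint32_t v)
{
  return atomic_fetch_add_explicit(p, v, memory_order_relaxed);
}

/* Stand-in for the load/store-exclusive retry loop. */
static uint32_t fetch_add_retry_loop(_Atomic uint32_t *p, uint32_t v)
{
  uint32_t old = atomic_load_explicit(p, memory_order_relaxed);
  /* On failure, old is refreshed with the current value and we retry. */
  while (!atomic_compare_exchange_weak_explicit(p, &old, old + v,
                                                memory_order_relaxed,
                                                memory_order_relaxed))
    ;
  return old;
}

/* Out-of-line helper: returns the value held before the addition.
   The hint marks the LSE path as the unlikely one, so the retry loop
   is the fall-through path, as in the quoted comment.  Favouring LSE
   instead would mean flipping the expected value to 1.  */
uint32_t ool_fetch_add_relaxed(_Atomic uint32_t *p, uint32_t v)
{
  if (__builtin_expect(have_lse_flag, 0))
    return fetch_add_single_insn(p, v);
  return fetch_add_retry_loop(p, v);
}
```

Either way the result is the same; only which path is laid out as the straight-line code changes.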
Cheers,
Wilco
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
2019-09-05 14:36 [PATCH, AArch64, v3 0/6] LSE atomics out-of-line Wilco Dijkstra
@ 2019-09-14 19:26 ` Richard Henderson
2019-09-16 11:59 ` Wilco Dijkstra
0 siblings, 1 reply; 9+ messages in thread
From: Richard Henderson @ 2019-09-14 19:26 UTC (permalink / raw)
To: Wilco Dijkstra, GCC Patches, Richard Henderson, Kyrylo Tkachov
Cc: nd, Ramana Radhakrishnan, agraf, Marcus Shawcroft, James Greenhalgh
On 9/5/19 10:35 AM, Wilco Dijkstra wrote:
> Agreed. I've got a couple of general comments:
>
> * The option name -matomic-ool sounds too abbreviated. I think e.g.
> -moutline-atomics is more descriptive and more user-friendly.
Changed.
> * Similarly the exported __aa64_have_atomics variable could be named
> __aarch64_have_lse_atomics so it's clear that it is about LSE atomics.
Changed.
> +@item -matomic-ool
> +@itemx -mno-atomic-ool
> +Enable or disable calls to out-of-line helpers to implement atomic operations.
> +These helpers will, at runtime, determine if ARMv8.1-Atomics instructions
> +should be used; if not, they will use the load/store-exclusive instructions
> +that are present in the base ARMv8.0 ISA.
> +
> +This option is only applicable when compiling for the base ARMv8.0
> +instruction set. If using a later revision, e.g. @option{-march=armv8.1-a}
> +or @option{-march=armv8-a+lse}, the ARMv8.1-Atomics instructions will be
> +used directly.
>
> So what is the behaviour when you explicitly select a specific CPU?
Selecting a specific cpu selects the specific architecture that the cpu
supports, does it not? Thus the architecture example above still applies.
Unless I don't understand what distinction that you're making?
> +/* Branch to LABEL if LSE is enabled.
> + The branch should be easily predicted, in that it will, after constructors,
> + always branch the same way. The expectation is that systems that implement
> + ARMv8.1-Atomics are "beefier" than those that omit the extension.
> + By arranging for the fall-through path to use load-store-exclusive insns,
> + we aid the branch predictor of the smallest cpus. */
>
> I'd say that by the time GCC10 is released and used in distros, systems without
> LSE atomics would be practically non-existent. So we should favour LSE atomics
> by default.
I suppose. Does it not continue to be true that an a53 is more impacted by the
branch prediction than an a76?
r~
* Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
2019-09-14 19:26 ` Richard Henderson
@ 2019-09-16 11:59 ` Wilco Dijkstra
2019-09-17 8:40 ` Kyrill Tkachov
0 siblings, 1 reply; 9+ messages in thread
From: Wilco Dijkstra @ 2019-09-16 11:59 UTC (permalink / raw)
To: Richard Henderson, GCC Patches, Kyrylo Tkachov
Cc: nd, Ramana Radhakrishnan, agraf, Marcus Shawcroft, James Greenhalgh
Hi Richard,
>> So what is the behaviour when you explicitly select a specific CPU?
>
> Selecting a specific cpu selects the specific architecture that the cpu
> supports, does it not? Thus the architecture example above still applies.
>
> Unless I don't understand what distinction that you're making?
When you select a CPU the goal is that we optimize and schedule for that
specific microarchitecture. That implies using atomics that work best for
that core rather than outlining them.
>> I'd say that by the time GCC10 is released and used in distros, systems without
>> LSE atomics would be practically non-existent. So we should favour LSE atomics
>> by default.
>
> I suppose. Does it not continue to be true that an a53 is more impacted by the
> branch prediction than an a76?
That's hard to say for sure - the cost of taken branches (3 in just a few instructions for
the outlined atomics) might well affect big/wide cores more. Also note Cortex-A55
(successor of Cortex-A53) has LSE atomics.
Wilco
* Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
2019-09-16 11:59 ` Wilco Dijkstra
@ 2019-09-17 8:40 ` Kyrill Tkachov
2019-09-17 10:55 ` Wilco Dijkstra
0 siblings, 1 reply; 9+ messages in thread
From: Kyrill Tkachov @ 2019-09-17 8:40 UTC (permalink / raw)
To: Wilco Dijkstra, Richard Henderson, GCC Patches
Cc: nd, Ramana Radhakrishnan, agraf, Marcus Shawcroft, James Greenhalgh
On 9/16/19 12:58 PM, Wilco Dijkstra wrote:
> Hi Richard,
>
> >> So what is the behaviour when you explicitly select a specific CPU?
> >
> > Selecting a specific cpu selects the specific architecture that the cpu
> > supports, does it not? Thus the architecture example above still applies.
> >
> > Unless I don't understand what distinction that you're making?
>
> When you select a CPU the goal is that we optimize and schedule for that
> specific microarchitecture. That implies using atomics that work best for
> that core rather than outlining them.
I think we want to go ahead with this framework to enable the portable
deployment of LSE atomics.
More CPU-specific fine-tuning can come later separately.
Thanks,
Kyrill
>
> >> I'd say that by the time GCC10 is released and used in distros, systems without
> >> LSE atomics would be practically non-existent. So we should favour LSE atomics
> >> by default.
> >
> > I suppose. Does it not continue to be true that an a53 is more impacted by the
> > branch prediction than an a76?
>
> That's hard to say for sure - the cost of taken branches (3 in just a few instructions for
> the outlined atomics) might well affect big/wide cores more. Also note Cortex-A55
> (successor of Cortex-A53) has LSE atomics.
>
> Wilco
* Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
2019-09-17 8:40 ` Kyrill Tkachov
@ 2019-09-17 10:55 ` Wilco Dijkstra
2019-09-17 21:11 ` Richard Henderson
0 siblings, 1 reply; 9+ messages in thread
From: Wilco Dijkstra @ 2019-09-17 10:55 UTC (permalink / raw)
To: Kyrill Tkachov, Richard Henderson, GCC Patches
Cc: nd, Ramana Radhakrishnan, agraf, Marcus Shawcroft, James Greenhalgh
Hi Kyrill,
>> When you select a CPU the goal is that we optimize and schedule for that
>> specific microarchitecture. That implies using atomics that work best for
>> that core rather than outlining them.
>
> I think we want to go ahead with this framework to enable the portable
> deployment of LSE atomics.
>
> More CPU-specific fine-tuning can come later separately.
I'm not talking about CPU-specific fine-tuning, but ensuring we don't penalize
performance when a user selects the specific CPU their application will run on.
And in that case outlining is unnecessary.
Cheers,
Wilco
* Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
2019-09-17 10:55 ` Wilco Dijkstra
@ 2019-09-17 21:11 ` Richard Henderson
0 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2019-09-17 21:11 UTC (permalink / raw)
To: Wilco Dijkstra, Kyrill Tkachov, Richard Henderson, GCC Patches
Cc: nd, Ramana Radhakrishnan, agraf, Marcus Shawcroft, James Greenhalgh
On 9/17/19 6:55 AM, Wilco Dijkstra wrote:
> Hi Kyrill,
>
>>> When you select a CPU the goal is that we optimize and schedule for that
>>> specific microarchitecture. That implies using atomics that work best for
>>> that core rather than outlining them.
>>
>> I think we want to go ahead with this framework to enable the portable
>> deployment of LSE atomics.
>>
>> More CPU-specific fine-tuning can come later separately.
>
> I'm not talking about CPU-specific fine-tuning, but ensuring we don't penalize
> performance when a user selects the specific CPU their application will run on.
> And in that case outlining is unnecessary.
From aarch64_override_options:
Given both -march=foo -mcpu=bar, then the architecture will be foo and -mcpu
will be treated as -mtune=bar, but will not use any insn not in foo.
Given only -mcpu=foo, then the architecture will be the one supported by foo.
So if foo supports LSE, then we will not outline the functions, no matter how
we arrive at foo.
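
[Editor's sketch] The resolution rule described above can be modelled in a few lines of C. This is not the code in aarch64_override_options; the struct, enum, and function names are hypothetical, and treating "neither option given" as base ARMv8.0 is an assumption made for the illustration.

```c
#include <stdbool.h>

enum atomics_strategy { INLINE_LLSC, OUTLINE_HELPERS, INLINE_LSE };

/* Hypothetical summary of the relevant command-line state. */
struct target_opts {
  bool has_march;        /* -march=foo was given */
  bool march_has_lse;    /* foo includes the LSE atomics */
  bool has_mcpu;         /* -mcpu=bar was given */
  bool mcpu_has_lse;     /* bar's architecture includes LSE */
  bool outline_atomics;  /* out-of-line helpers requested */
};

/* With -march, the architecture is fixed and -mcpu only tunes.
   With only -mcpu, the architecture is whatever that cpu supports. */
static bool arch_has_lse(const struct target_opts *o)
{
  if (o->has_march)
    return o->march_has_lse;
  if (o->has_mcpu)
    return o->mcpu_has_lse;
  return false;  /* assumption: default is base ARMv8.0 */
}

/* LSE in the selected architecture always wins over outlining. */
static enum atomics_strategy choose_atomics(const struct target_opts *o)
{
  if (arch_has_lse(o))
    return INLINE_LSE;
  return o->outline_atomics ? OUTLINE_HELPERS : INLINE_LLSC;
}
```

In this model, -mcpu alone for an LSE-capable core yields inline LSE instructions, while the same -mcpu combined with an -march that lacks LSE falls back to the outlined helpers.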
r~
* Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
2018-11-01 21:47 Richard Henderson
2018-11-11 12:30 ` Richard Henderson
@ 2019-09-05 9:51 ` Kyrill Tkachov
1 sibling, 0 replies; 9+ messages in thread
From: Kyrill Tkachov @ 2019-09-05 9:51 UTC (permalink / raw)
To: Richard Henderson, gcc-patches
Cc: Ramana Radhakrishnan, agraf, Marcus Shawcroft, James Greenhalgh,
Richard Henderson
Hi Richard,
On 11/1/18 9:46 PM, Richard Henderson wrote:
> From: Richard Henderson <rth@twiddle.net>
>
> Changes since v2:
> * Committed half of the patch set.
> * Split inline TImode support from out-of-line patches.
> * Removed the ST<OP> out-of-line functions, to match inline.
> * Moved the out-of-line functions to assembly.
>
> What I have not done, but is now a possibility, is to use a custom
> calling convention for the out-of-line routines. I now only clobber
> 2 (or 3, for TImode) temp regs and set a return value.
>
I think this patch series would be great to have for GCC 10!
I've rebased them on current trunk and fixed up a couple of minor
conflicts in my local tree.
After that, I've encountered a couple of issues with building a compiler
with these patches.
I'll respond to the individual patches that I think cause the trouble.
Thanks,
Kyrill
>
> r~
>
>
> Richard Henderson (6):
> aarch64: Extend %R for integer registers
> aarch64: Implement TImode compare-and-swap
> aarch64: Tidy aarch64_split_compare_and_swap
> aarch64: Add out-of-line functions for LSE atomics
> aarch64: Implement -matomic-ool
> Enable -matomic-ool by default
>
> gcc/config/aarch64/aarch64-protos.h | 13 +
> gcc/common/config/aarch64/aarch64-common.c | 6 +-
> gcc/config/aarch64/aarch64.c | 211 ++++++++++++----
> .../atomic-comp-swap-release-acquire.c | 2 +-
> .../gcc.target/aarch64/atomic-op-acq_rel.c | 2 +-
> .../gcc.target/aarch64/atomic-op-acquire.c | 2 +-
> .../gcc.target/aarch64/atomic-op-char.c | 2 +-
> .../gcc.target/aarch64/atomic-op-consume.c | 2 +-
> .../gcc.target/aarch64/atomic-op-imm.c | 2 +-
> .../gcc.target/aarch64/atomic-op-int.c | 2 +-
> .../gcc.target/aarch64/atomic-op-long.c | 2 +-
> .../gcc.target/aarch64/atomic-op-relaxed.c | 2 +-
> .../gcc.target/aarch64/atomic-op-release.c | 2 +-
> .../gcc.target/aarch64/atomic-op-seq_cst.c | 2 +-
> .../gcc.target/aarch64/atomic-op-short.c | 2 +-
> .../aarch64/atomic_cmp_exchange_zero_reg_1.c | 2 +-
> .../atomic_cmp_exchange_zero_strong_1.c | 2 +-
> .../gcc.target/aarch64/sync-comp-swap.c | 2 +-
> .../gcc.target/aarch64/sync-op-acquire.c | 2 +-
> .../gcc.target/aarch64/sync-op-full.c | 2 +-
> libgcc/config/aarch64/lse-init.c | 45 ++++
> gcc/config/aarch64/aarch64.opt | 4 +
> gcc/config/aarch64/atomics.md | 185 +++++++++++++-
> gcc/config/aarch64/iterators.md | 3 +
> gcc/doc/invoke.texi | 14 +-
> libgcc/config.host | 4 +
> libgcc/config/aarch64/lse.S | 238 ++++++++++++++++++
> libgcc/config/aarch64/t-lse | 44 ++++
> 28 files changed, 717 insertions(+), 84 deletions(-)
> create mode 100644 libgcc/config/aarch64/lse-init.c
> create mode 100644 libgcc/config/aarch64/lse.S
> create mode 100644 libgcc/config/aarch64/t-lse
>
> --
> 2.17.2
>
* Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
2018-11-01 21:47 Richard Henderson
@ 2018-11-11 12:30 ` Richard Henderson
2019-09-05 9:51 ` Kyrill Tkachov
1 sibling, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2018-11-11 12:30 UTC (permalink / raw)
To: Richard Henderson, gcc-patches
Cc: ramana.radhakrishnan, agraf, marcus.shawcroft, james.greenhalgh
Ping.
On 11/1/18 10:46 PM, Richard Henderson wrote:
> From: Richard Henderson <rth@twiddle.net>
>
> Changes since v2:
> * Committed half of the patch set.
> * Split inline TImode support from out-of-line patches.
> * Removed the ST<OP> out-of-line functions, to match inline.
> * Moved the out-of-line functions to assembly.
>
> What I have not done, but is now a possibility, is to use a custom
> calling convention for the out-of-line routines. I now only clobber
> 2 (or 3, for TImode) temp regs and set a return value.
>
>
> r~
>
>
> Richard Henderson (6):
> aarch64: Extend %R for integer registers
> aarch64: Implement TImode compare-and-swap
> aarch64: Tidy aarch64_split_compare_and_swap
> aarch64: Add out-of-line functions for LSE atomics
> aarch64: Implement -matomic-ool
> Enable -matomic-ool by default
>
> gcc/config/aarch64/aarch64-protos.h | 13 +
> gcc/common/config/aarch64/aarch64-common.c | 6 +-
> gcc/config/aarch64/aarch64.c | 211 ++++++++++++----
> .../atomic-comp-swap-release-acquire.c | 2 +-
> .../gcc.target/aarch64/atomic-op-acq_rel.c | 2 +-
> .../gcc.target/aarch64/atomic-op-acquire.c | 2 +-
> .../gcc.target/aarch64/atomic-op-char.c | 2 +-
> .../gcc.target/aarch64/atomic-op-consume.c | 2 +-
> .../gcc.target/aarch64/atomic-op-imm.c | 2 +-
> .../gcc.target/aarch64/atomic-op-int.c | 2 +-
> .../gcc.target/aarch64/atomic-op-long.c | 2 +-
> .../gcc.target/aarch64/atomic-op-relaxed.c | 2 +-
> .../gcc.target/aarch64/atomic-op-release.c | 2 +-
> .../gcc.target/aarch64/atomic-op-seq_cst.c | 2 +-
> .../gcc.target/aarch64/atomic-op-short.c | 2 +-
> .../aarch64/atomic_cmp_exchange_zero_reg_1.c | 2 +-
> .../atomic_cmp_exchange_zero_strong_1.c | 2 +-
> .../gcc.target/aarch64/sync-comp-swap.c | 2 +-
> .../gcc.target/aarch64/sync-op-acquire.c | 2 +-
> .../gcc.target/aarch64/sync-op-full.c | 2 +-
> libgcc/config/aarch64/lse-init.c | 45 ++++
> gcc/config/aarch64/aarch64.opt | 4 +
> gcc/config/aarch64/atomics.md | 185 +++++++++++++-
> gcc/config/aarch64/iterators.md | 3 +
> gcc/doc/invoke.texi | 14 +-
> libgcc/config.host | 4 +
> libgcc/config/aarch64/lse.S | 238 ++++++++++++++++++
> libgcc/config/aarch64/t-lse | 44 ++++
> 28 files changed, 717 insertions(+), 84 deletions(-)
> create mode 100644 libgcc/config/aarch64/lse-init.c
> create mode 100644 libgcc/config/aarch64/lse.S
> create mode 100644 libgcc/config/aarch64/t-lse
>
* [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
@ 2018-11-01 21:47 Richard Henderson
2018-11-11 12:30 ` Richard Henderson
2019-09-05 9:51 ` Kyrill Tkachov
0 siblings, 2 replies; 9+ messages in thread
From: Richard Henderson @ 2018-11-01 21:47 UTC (permalink / raw)
To: gcc-patches
Cc: ramana.radhakrishnan, agraf, marcus.shawcroft, james.greenhalgh,
Richard Henderson
From: Richard Henderson <rth@twiddle.net>
Changes since v2:
* Committed half of the patch set.
* Split inline TImode support from out-of-line patches.
* Removed the ST<OP> out-of-line functions, to match inline.
* Moved the out-of-line functions to assembly.
What I have not done, but is now a possibility, is to use a custom
calling convention for the out-of-line routines. I now only clobber
2 (or 3, for TImode) temp regs and set a return value.
r~
Richard Henderson (6):
aarch64: Extend %R for integer registers
aarch64: Implement TImode compare-and-swap
aarch64: Tidy aarch64_split_compare_and_swap
aarch64: Add out-of-line functions for LSE atomics
aarch64: Implement -matomic-ool
Enable -matomic-ool by default
gcc/config/aarch64/aarch64-protos.h | 13 +
gcc/common/config/aarch64/aarch64-common.c | 6 +-
gcc/config/aarch64/aarch64.c | 211 ++++++++++++----
.../atomic-comp-swap-release-acquire.c | 2 +-
.../gcc.target/aarch64/atomic-op-acq_rel.c | 2 +-
.../gcc.target/aarch64/atomic-op-acquire.c | 2 +-
.../gcc.target/aarch64/atomic-op-char.c | 2 +-
.../gcc.target/aarch64/atomic-op-consume.c | 2 +-
.../gcc.target/aarch64/atomic-op-imm.c | 2 +-
.../gcc.target/aarch64/atomic-op-int.c | 2 +-
.../gcc.target/aarch64/atomic-op-long.c | 2 +-
.../gcc.target/aarch64/atomic-op-relaxed.c | 2 +-
.../gcc.target/aarch64/atomic-op-release.c | 2 +-
.../gcc.target/aarch64/atomic-op-seq_cst.c | 2 +-
.../gcc.target/aarch64/atomic-op-short.c | 2 +-
.../aarch64/atomic_cmp_exchange_zero_reg_1.c | 2 +-
.../atomic_cmp_exchange_zero_strong_1.c | 2 +-
.../gcc.target/aarch64/sync-comp-swap.c | 2 +-
.../gcc.target/aarch64/sync-op-acquire.c | 2 +-
.../gcc.target/aarch64/sync-op-full.c | 2 +-
libgcc/config/aarch64/lse-init.c | 45 ++++
gcc/config/aarch64/aarch64.opt | 4 +
gcc/config/aarch64/atomics.md | 185 +++++++++++++-
gcc/config/aarch64/iterators.md | 3 +
gcc/doc/invoke.texi | 14 +-
libgcc/config.host | 4 +
libgcc/config/aarch64/lse.S | 238 ++++++++++++++++++
libgcc/config/aarch64/t-lse | 44 ++++
28 files changed, 717 insertions(+), 84 deletions(-)
create mode 100644 libgcc/config/aarch64/lse-init.c
create mode 100644 libgcc/config/aarch64/lse.S
create mode 100644 libgcc/config/aarch64/t-lse
--
2.17.2
Thread overview: 9+ messages
2019-09-05 14:36 [PATCH, AArch64, v3 0/6] LSE atomics out-of-line Wilco Dijkstra
2019-09-14 19:26 ` Richard Henderson
2019-09-16 11:59 ` Wilco Dijkstra
2019-09-17 8:40 ` Kyrill Tkachov
2019-09-17 10:55 ` Wilco Dijkstra
2019-09-17 21:11 ` Richard Henderson
-- strict thread matches above, loose matches on Subject: below --
2018-11-01 21:47 Richard Henderson
2018-11-11 12:30 ` Richard Henderson
2019-09-05 9:51 ` Kyrill Tkachov