public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
To: "rth7680@gmail.com" <rth7680@gmail.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: "agraf@suse.de" <agraf@suse.de>, "matz@suse.de" <matz@suse.de>,
	Richard Henderson <richard.henderson@linaro.org>, nd <nd@arm.com>
Subject: Re: [PATCH, AArch64 00/11] LSE atomics out-of-line
Date: Thu, 27 Sep 2018 13:08:00 -0000	[thread overview]
Message-ID: <590e4768-0d3b-92e4-9103-ed36902125ff@arm.com> (raw)
In-Reply-To: <20180926050355.32746-1-richard.henderson@linaro.org>

On 26/09/2018 06:03, rth7680@gmail.com wrote:
> From: Richard Henderson <richard.henderson@linaro.org>
> 
> ARMv8.1 adds an (mandatory) Atomics extension, also known as the
> Large System Extension.  Deploying this extension at the OS level
> has proved challenging.
> 
> The following is the result of a conversation between myself,
> Alex Graf of SuSE, and Ramana Radhakrishnan of ARM, at last week's
> Linaro Connect in Vancouver.
> 
> The current state of the world is that one could distribute two
> different copies of a given shared library and place the LSE-enabled
> version in /lib64/atomics/ and it will be selected over the /lib64/
> version by ld.so when HWCAP_ATOMICS is present.
> 
> Alex's main concern with this is that (1) he doesn't want to
> distribute two copies of every library, or determine what a
> resonable subset would be and (2) this solution does not work
> for executables, e.g. mysql.
> 
> Ramana's main concern was to avoid the overhead of an indirect jump,
> especially in how that would affect the (non-)branch-prediction of
> the smallest implementations.
> 
> Therefore, I've created small out-of-line helpers that are directly
> linked into every library or executable that requires them.  There
> will be two direct branches, both of which will be well-predicted.
> 
> In the process, I discovered a number of places within the code
> where the existing implementation could be improved.  In particular:
> 
>   - the LSE patterns didn't use predicates or constraints that
>     match the actual instructions, requiring unnecessary splitting.
> 
>   - the non-LSE compare-and-swap can use an extending compare to
>     avoid requiring the input to have been previously extended.
> 
>   - TImode compare-and-swap was missing entirely.  This brings
>     aarch64 to parity with x86_64 wrt __sync_val_compare_and_swap.
> 
> There is a final patch that enables the new option by default.
> I am not necessarily expecting this to be merged upstream, but
> for the operating system to decide what the default should be.
> It might be that this should be a configure option, so as to
> make that OS choice easier, but I've just now thought of that.  ;-)
> 
> I'm going to have to rely on Alex and/or Ramana to perform
> testing on a system that supports LSE.
> 

Thanks for this patchset -

I'll give this a whirl in the next couple of days but don't expect 
results until Monday or so.

I do have an additional concern that I forgot to mention in Vancouver -

Thanks Wilco for reminding me that this now replaces a bunch of inline 
instructions with effectively a library call therefore clobbering a 
whole bunch of caller saved registers.

In which case I see 2 options.

-  maybe we should consider a private interface and restrict the 
registers that these files are compiled with to minimise the number of 
caller saved registers we trash.

- Alternatively we should consider an option to inline these at O2 or O3 
as we may just be trading the performance improvements we get with using 
the lse atomics for additional stacking and unstacking of caller saved 
registers in the main functions...

But anyway while we discuss that we'll have a look at testing and 
benchmarking this.


regards
Ramana

> 
> r~
> 
> 
> Richard Henderson (11):
>    aarch64: Simplify LSE cas generation
>    aarch64: Improve cas generation
>    aarch64: Improve swp generation
>    aarch64: Improve atomic-op lse generation
>    aarch64: Emit LSE st<op> instructions
>    Add visibility to libfunc constructors
>    Link static libgcc after shared libgcc for -shared-libgcc
>    aarch64: Add out-of-line functions for LSE atomics
>    aarch64: Implement -matomic-ool
>    aarch64: Implement TImode compare-and-swap
>    Enable -matomic-ool by default
> 
>   gcc/config/aarch64/aarch64-protos.h           |  20 +-
>   gcc/optabs-libfuncs.h                         |   2 +
>   gcc/common/config/aarch64/aarch64-common.c    |   6 +-
>   gcc/config/aarch64/aarch64.c                  | 480 ++++++--------
>   gcc/gcc.c                                     |   9 +-
>   gcc/optabs-libfuncs.c                         |  26 +-
>   .../atomic-comp-swap-release-acquire.c        |   2 +-
>   .../gcc.target/aarch64/atomic-inst-ldadd.c    |  18 +-
>   .../gcc.target/aarch64/atomic-inst-ldlogic.c  |  54 +-
>   .../gcc.target/aarch64/atomic-op-acq_rel.c    |   2 +-
>   .../gcc.target/aarch64/atomic-op-acquire.c    |   2 +-
>   .../gcc.target/aarch64/atomic-op-char.c       |   2 +-
>   .../gcc.target/aarch64/atomic-op-consume.c    |   2 +-
>   .../gcc.target/aarch64/atomic-op-imm.c        |   2 +-
>   .../gcc.target/aarch64/atomic-op-int.c        |   2 +-
>   .../gcc.target/aarch64/atomic-op-long.c       |   2 +-
>   .../gcc.target/aarch64/atomic-op-relaxed.c    |   2 +-
>   .../gcc.target/aarch64/atomic-op-release.c    |   2 +-
>   .../gcc.target/aarch64/atomic-op-seq_cst.c    |   2 +-
>   .../gcc.target/aarch64/atomic-op-short.c      |   2 +-
>   .../aarch64/atomic_cmp_exchange_zero_reg_1.c  |   2 +-
>   .../atomic_cmp_exchange_zero_strong_1.c       |   2 +-
>   .../gcc.target/aarch64/sync-comp-swap.c       |   2 +-
>   .../gcc.target/aarch64/sync-op-acquire.c      |   2 +-
>   .../gcc.target/aarch64/sync-op-full.c         |   2 +-
>   libgcc/config/aarch64/lse.c                   | 280 ++++++++
>   gcc/config/aarch64/aarch64.opt                |   4 +
>   gcc/config/aarch64/atomics.md                 | 608 ++++++++++--------
>   gcc/config/aarch64/iterators.md               |   8 +-
>   gcc/config/aarch64/predicates.md              |  12 +
>   gcc/doc/invoke.texi                           |  14 +-
>   libgcc/config.host                            |   4 +
>   libgcc/config/aarch64/t-lse                   |  48 ++
>   33 files changed, 1050 insertions(+), 577 deletions(-)
>   create mode 100644 libgcc/config/aarch64/lse.c
>   create mode 100644 libgcc/config/aarch64/t-lse
> 

  parent reply	other threads:[~2018-09-27 13:08 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-26  5:04 rth7680
2018-09-26  5:04 ` [PATCH, AArch64 04/11] aarch64: Improve atomic-op lse generation rth7680
2018-09-26  5:04 ` [PATCH, AArch64 05/11] aarch64: Emit LSE st<op> instructions rth7680
2018-09-26  5:04 ` [PATCH, AArch64 07/11] Link static libgcc after shared libgcc for -shared-libgcc rth7680
2018-09-26 16:55   ` Joseph Myers
2018-09-26 16:57     ` Richard Henderson
2018-09-26  5:04 ` [PATCH, AArch64 09/11] aarch64: Implement -matomic-ool rth7680
2018-09-26  5:04 ` [PATCH, AArch64 11/11] Enable -matomic-ool by default rth7680
2018-09-26  5:04 ` [PATCH, AArch64 06/11] Add visibility to libfunc constructors rth7680
2018-09-26  5:04 ` [PATCH, AArch64 01/11] aarch64: Simplify LSE cas generation rth7680
2018-09-26  5:04 ` [PATCH, AArch64 10/11] aarch64: Implement TImode compare-and-swap rth7680
2018-09-27 13:08   ` Matthew Malcomson
     [not found]   ` <3460dd10-4d9a-1def-3f9b-5f7a1afe5906@arm.com>
2018-09-27 16:39     ` Richard Henderson
2018-09-27 17:07       ` Matthew Malcomson
2018-10-01 13:51   ` Matthew Malcomson
2018-09-26  5:04 ` [PATCH, AArch64 03/11] aarch64: Improve swp generation rth7680
2018-09-26  5:04 ` [PATCH, AArch64 08/11] aarch64: Add out-of-line functions for LSE atomics rth7680
2018-09-26  9:01   ` Florian Weimer
2018-09-26 14:33     ` Richard Henderson
2018-09-26 14:36       ` Florian Weimer
2018-09-26 14:37         ` Richard Henderson
2018-09-28 16:29   ` Ramana Radhakrishnan
2018-09-26  7:40 ` [PATCH, AArch64 02/11] aarch64: Improve cas generation rth7680
2018-09-26  9:22 ` [PATCH, AArch64 00/11] LSE atomics out-of-line Florian Weimer
2018-09-26 13:05   ` Michael Matz
2018-09-27 13:08 ` Ramana Radhakrishnan [this message]
2018-09-27 15:19   ` Alexander Graf
2018-09-27 16:51   ` Richard Henderson
2018-09-28  8:48     ` Ramana Radhakrishnan
2019-02-04 11:14 ` __libc_single_threaded variable for optimizing std::shared_ptr (was: [PATCH, AArch64 00/11] LSE atomics out-of-line) Florian Weimer
2019-02-04 12:15   ` Jonathan Wakely

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=590e4768-0d3b-92e4-9103-ed36902125ff@arm.com \
    --to=ramana.radhakrishnan@arm.com \
    --cc=agraf@suse.de \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=matz@suse.de \
    --cc=nd@arm.com \
    --cc=richard.henderson@linaro.org \
    --cc=rth7680@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).