Re: [PATCH, AArch64 v2 05/11] aarch64: Emit LSE st<op> instructions

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Will Deacon <will.deacon@arm.com>
To: Richard Henderson <richard.henderson@linaro.org>
Cc: James Greenhalgh <james.greenhalgh@arm.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>,
	"agraf@suse.de" <agraf@suse.de>,
	Marcus Shawcroft <Marcus.Shawcroft@arm.com>,
	Richard Earnshaw <Richard.Earnshaw@arm.com>,
	nd@arm.com
Subject: Re: [PATCH, AArch64 v2 05/11] aarch64: Emit LSE st<op> instructions
Date: Wed, 31 Oct 2018 19:42:00 -0000	[thread overview]
Message-ID: <20181031175144.GB27871@arm.com> (raw)
In-Reply-To: <b065d2aa-aed8-cf32-12bb-2db0ddc62579@linaro.org>

On Wed, Oct 31, 2018 at 04:38:53PM +0000, Richard Henderson wrote:
> On 10/31/18 3:04 PM, Will Deacon wrote:
> > The example test above uses relaxed atomics in conjunction with an acquire
> > fence, so I don't think we can actually use ST<op> at all without a change
> > to the language specification. I previouslyyallocated P0861 for this purpose
> > but never got a chance to write it up...
> > 
> > Perhaps the issue is a bit clearer with an additional thread (not often I
> > say that!):
> > 
> > 
> > P0 (atomic_int* y,atomic_int* x) {
> >   atomic_store_explicit(x,1,memory_order_relaxed);
> >   atomic_thread_fence(memory_order_release);
> >   atomic_store_explicit(y,1,memory_order_relaxed);
> > }
> > 
> > P1 (atomic_int* y,atomic_int* x) {
> >   atomic_fetch_add_explicit(y,1,memory_order_relaxed);	// STADD
> >   atomic_thread_fence(memory_order_acquire);
> >   int r0 = atomic_load_explicit(x,memory_order_relaxed);
> > }
> > 
> > P2 (atomic_int* y) {
> >   int r1 = atomic_load_explicit(y,memory_order_relaxed);
> > }
> > 
> > 
> > My understanding is that it is forbidden for r0 == 0 and r1 == 2 after
> > this test has executed. However, if the relaxed add in P1 compiles to
> > STADD and the subsequent acquire fence is compiled as DMB LD, then we
> > don't have any ordering guarantees in P1 and the forbidden result could
> > be observed.
> 
> I suppose I don't understand exactly what you're saying.

Apologies, I'm probably not explaining things very well. I'm trying to
avoid getting into the C11 memory model relations if I can help it, hence
the example.

> I can see that, yes, if you split the fetch-add from the acquire in P1 you get
> the incorrect results you describe.  But isn't that a bug in the test itself?

Per the C11 memory model, the test above is well-defined and if r1 == 2
then it is required that r0 == 1. With your proposal, this is not guaranteed
for AArch64, and it would be possible to end up with r1 == 2 and r0 == 0.

> Why would not the only correct version have
> 
> P1 (atomic_int* y, atomic_int* x) {
>   atomic_fetch_add_explicit(y, 1, memory_order_acquire);
>   int r0 = atomic_load_explicit(x, memory_order_relaxed);
> }
> 
> at which point we won't use STADD for the fetch-add, but LDADDA.

That would indeed work correctly, but the problem is that the C11 memory
model doesn't rule out the previous test as something which isn't portable.

> If the problem is more fundamental than this, would you have another go at
> explaining?  In particular, I don't see the difference between
> 
> 	ldadd	val, scratch, [base]
>   vs
> 	stadd	val, [base]
> 
> and
> 
> 	ldaddl	val, scratch, [base]
>   vs
> 	staddl	val, [base]
> 
> where both pairs of instructions have the same memory ordering semantics.
> Currently we are always producing the ld version of each pair.

Aha, maybe this is the problem. An acquire fence on AArch64 is implemented
using a DMB LD instruction, which orders prior reads against subsequent
reads and writes. However, the architecture says:

  | The ST<OP> instructions, and LD<OP> instructions where the destination
  | register is WZR or XZR, are not regarded as doing a read for the purpose
  | of a DMB LD barrier.

and so therefore an ST atomic is not affected by a subsequent acquire fence,
whereas an LD atomic is.

Does that help at all?

Will

next prev parent reply	other threads:[~2018-10-31 17:51 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-02 16:19 [PATCH, AArch64 v2 00/11] LSE atomics out-of-line Richard Henderson
2018-10-02 16:19 ` [PATCH, AArch64 v2 02/11] aarch64: Improve cas generation Richard Henderson
2018-10-30 20:37   ` James Greenhalgh
2018-10-02 16:19 ` [PATCH, AArch64 v2 07/11] aarch64: Add out-of-line functions for LSE atomics Richard Henderson
2018-10-02 16:19 ` [PATCH, AArch64 v2 11/11] Enable -matomic-ool by default Richard Henderson
2018-10-02 16:19 ` [PATCH, AArch64 v2 08/11] aarch64: Implement -matomic-ool Richard Henderson
2018-10-02 16:19 ` [PATCH, AArch64 v2 03/11] aarch64: Improve swp generation Richard Henderson
2018-10-30 20:50   ` James Greenhalgh
2018-10-02 16:19 ` [PATCH, AArch64 v2 04/11] aarch64: Improve atomic-op lse generation Richard Henderson
2018-10-30 21:40   ` James Greenhalgh
2018-10-02 16:19 ` [PATCH, AArch64 v2 01/11] aarch64: Simplify LSE cas generation Richard Henderson
2018-10-30 20:32   ` James Greenhalgh
2018-10-31 10:35     ` Richard Henderson
2018-10-02 16:19 ` [PATCH, AArch64 v2 09/11] aarch64: Force TImode values into even registers Richard Henderson
2018-10-30 21:47   ` James Greenhalgh
2018-10-02 16:19 ` [PATCH, AArch64 v2 06/11] Add visibility to libfunc constructors Richard Henderson
2018-10-30 21:46   ` James Greenhalgh
2018-10-31 11:29   ` Richard Henderson
2018-10-31 17:46   ` Eric Botcazou
2018-10-31 19:04     ` Richard Henderson
2018-10-31 19:43       ` Eric Botcazou
2018-10-02 16:36 ` [PATCH, AArch64 v2 10/11] aarch64: Implement TImode compare-and-swap Richard Henderson
2018-10-02 16:37 ` [PATCH, AArch64 v2 05/11] aarch64: Emit LSE st<op> instructions Richard Henderson
2018-10-30 21:42   ` James Greenhalgh
2018-10-31 11:13     ` Richard Henderson
2018-10-31 11:25     ` Richard Henderson
2018-10-31 15:49       ` Will Deacon
2018-10-31 17:35         ` Richard Henderson
2018-10-31 19:42           ` Will Deacon [this message]
2018-10-31 22:39             ` Richard Henderson
2018-10-11 17:44 ` [PATCH, AArch64 v2 00/11] LSE atomics out-of-line Richard Henderson
2018-10-22 14:01   ` Richard Henderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181031175144.GB27871@arm.com \
    --to=will.deacon@arm.com \
    --cc=Marcus.Shawcroft@arm.com \
    --cc=Ramana.Radhakrishnan@arm.com \
    --cc=Richard.Earnshaw@arm.com \
    --cc=agraf@suse.de \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=james.greenhalgh@arm.com \
    --cc=nd@arm.com \
    --cc=richard.henderson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).