From: Will Deacon <will.deacon@arm.com>
To: Richard Henderson <richard.henderson@linaro.org>
Cc: James Greenhalgh <james.greenhalgh@arm.com>,
"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>,
"agraf@suse.de" <agraf@suse.de>,
Marcus Shawcroft <Marcus.Shawcroft@arm.com>,
Richard Earnshaw <Richard.Earnshaw@arm.com>,
nd@arm.com
Subject: Re: [PATCH, AArch64 v2 05/11] aarch64: Emit LSE st<op> instructions
Date: Wed, 31 Oct 2018 19:42:00 -0000
Message-ID: <20181031175144.GB27871@arm.com>
In-Reply-To: <b065d2aa-aed8-cf32-12bb-2db0ddc62579@linaro.org>

On Wed, Oct 31, 2018 at 04:38:53PM +0000, Richard Henderson wrote:
> On 10/31/18 3:04 PM, Will Deacon wrote:
> > The example test above uses relaxed atomics in conjunction with an acquire
> > fence, so I don't think we can actually use ST<op> at all without a change
> > to the language specification. I previously allocated P0861 for this purpose
> > but never got a chance to write it up...
> >
> > Perhaps the issue is a bit clearer with an additional thread (not often I
> > say that!):
> >
> >
> > P0 (atomic_int* y,atomic_int* x) {
> >   atomic_store_explicit(x,1,memory_order_relaxed);
> >   atomic_thread_fence(memory_order_release);
> >   atomic_store_explicit(y,1,memory_order_relaxed);
> > }
> >
> > P1 (atomic_int* y,atomic_int* x) {
> >   atomic_fetch_add_explicit(y,1,memory_order_relaxed);  // STADD
> >   atomic_thread_fence(memory_order_acquire);
> >   int r0 = atomic_load_explicit(x,memory_order_relaxed);
> > }
> >
> > P2 (atomic_int* y) {
> >   int r1 = atomic_load_explicit(y,memory_order_relaxed);
> > }
> >
> >
> > My understanding is that it is forbidden for r0 == 0 and r1 == 2 after
> > this test has executed. However, if the relaxed add in P1 compiles to
> > STADD and the subsequent acquire fence is compiled as DMB LD, then we
> > don't have any ordering guarantees in P1 and the forbidden result could
> > be observed.
>
> I suppose I don't understand exactly what you're saying.

Apologies, I'm probably not explaining things very well. I'm trying to
avoid getting into the C11 memory model relations if I can help it, hence
the example.

> I can see that, yes, if you split the fetch-add from the acquire in P1 you get
> the incorrect results you describe. But isn't that a bug in the test itself?

Per the C11 memory model, the test above is well-defined and if r1 == 2
then it is required that r0 == 1. With your proposal, this is not guaranteed
for AArch64, and it would be possible to end up with r1 == 2 and r0 == 0.
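
A minimal sketch of the step behind "r1 == 2 implies r0 == 1", assuming x and
y both start at 0 (the test above does not state initial values). y has only
two writers, P0's store of 1 and P1's increment, so there are only two
possible modification orders, and y can end up as 2 only if the increment
reads the value P0 stored. The names y_outcome and y_after are hypothetical
and exist only for this illustration; the code enumerates the two orders
sequentially and is not a concurrent test.

```c
#include <stdbool.h>

/* Result of applying y's two writes in one particular modification order. */
struct y_outcome {
    int  final_y;          /* last value in y's modification order */
    bool add_reads_store;  /* did P1's fetch_add read P0's store?  */
};

/* ASSUMPTION: y is initially 0; the original test does not say so. */
struct y_outcome y_after(bool store_first)
{
    struct y_outcome o;
    int y = 0;

    if (store_first) {
        y = 1;                      /* P0: store 1 to y              */
        o.add_reads_store = true;   /* P1: fetch_add reads that 1    */
        y += 1;
    } else {
        y += 1;                     /* P1: fetch_add reads initial 0 */
        o.add_reads_store = false;
        y = 1;                      /* P0: store overwrites with 1   */
    }
    o.final_y = y;
    return o;
}
```

Only the store-first order yields 2. In that order P1's read-modify-write
reads the store that follows P0's release fence, so the release fence
synchronizes with P1's acquire fence and P0's store to x must be visible to
P1's load of x.
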
> Why would not the only correct version have
>
> P1 (atomic_int* y, atomic_int* x) {
>   atomic_fetch_add_explicit(y, 1, memory_order_acquire);
>   int r0 = atomic_load_explicit(x, memory_order_relaxed);
> }
>
> at which point we won't use STADD for the fetch-add, but LDADDA.

That would indeed work correctly, but the problem is that the C11 memory
model doesn't rule the previous test out as non-portable.

> If the problem is more fundamental than this, would you have another go at
> explaining? In particular, I don't see the difference between
>
> ldadd val, scratch, [base]
> vs
> stadd val, [base]
>
> and
>
> ldaddl val, scratch, [base]
> vs
> staddl val, [base]
>
> where both pairs of instructions have the same memory ordering semantics.
> Currently we are always producing the ld version of each pair.

Aha, maybe this is the problem. An acquire fence on AArch64 is implemented
using a DMB LD instruction, which orders prior reads against subsequent
reads and writes. However, the architecture says:

| The ST<OP> instructions, and LD<OP> instructions where the destination
| register is WZR or XZR, are not regarded as doing a read for the purpose
| of a DMB LD barrier.

and therefore an ST atomic is not affected by a subsequent acquire fence,
whereas an LD atomic is.
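
As a rough illustration of the quoted rule, the sketch below classifies an
LSE atomic by whether a following DMB LD would treat it as a read. It is a
toy model written for this explanation, not code from GCC or from the
architecture documentation. The function names, the lowercase mnemonic
strings and the convention of passing NULL for "no destination register" are
all assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if reg names one of the AArch64 zero registers (lowercase assumed). */
bool is_zero_reg(const char *reg)
{
    return strcmp(reg, "wzr") == 0 || strcmp(reg, "xzr") == 0;
}

/*
 * Toy classifier (hypothetical helper): would a later DMB LD regard this
 * LSE atomic as a read? Only ld<op>/st<op> mnemonics are considered.
 */
bool counts_as_read_for_dmb_ld(const char *mnemonic, const char *dest)
{
    /* ST<OP> forms (stadd, staddl, ...) are never regarded as reads. */
    if (strncmp(mnemonic, "st", 2) == 0)
        return false;

    /* LD<OP> forms are reads unless the result goes to WZR/XZR. */
    if (strncmp(mnemonic, "ld", 2) == 0)
        return dest != NULL && !is_zero_reg(dest);

    return false;  /* anything else is outside the scope of this sketch */
}
```

Under this model "ldadd" with a real scratch register is ordered by a
following DMB LD, while "stadd" and "ldadd" targeting wzr are not, which is
the difference between the two instructions in each pair above.
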
Does that help at all?

Will