public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins
@ 2015-04-08 11:25 matthew.wahab at arm dot com
  2015-04-08 11:33 ` [Bug target/65697] " matthew.wahab at arm dot com
                   ` (68 more replies)
  0 siblings, 69 replies; 70+ messages in thread
From: matthew.wahab at arm dot com @ 2015-04-08 11:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

            Bug ID: 65697
           Summary: __atomic memory barriers not strong enough for __sync
                    builtins
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: matthew.wahab at arm dot com

The __sync builtins are implemented by expanding them to the equivalent
__atomic builtins, using MEMMODEL_SEQ_CST for those that are full barriers.
This is too weak since __atomic operations with MEMMODEL_SEQ_CST still allow
some data references to move across the operation (see
https://gcc.gnu.org/ml/gcc/2014-02/msg00058.html) while a __sync full barrier
should block all movement
(https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html#_005f_005fsync-Builtins).

This is a problem for AArch64 where data references after the barrier could be
speculated ahead of it.

For example, compiling with aarch64-none-linux-gnu at -O2,
-----
int foo = 0;
int bar = 0;

int T1(void)
{
  int x = __sync_fetch_and_add(&foo, 1);
  return bar;
}
----
produces
----
T1:
    adrp    x0, .LANCHOR0
    add    x0, x0, :lo12:.LANCHOR0
.L2:
    ldaxr    w1, [x0]       ; load-acquire (__sync_fetch_and_add)
    add    w1, w1, 1
    stlxr    w2, w1, [x0]   ; store-release  (__sync_fetch_and_add)
    cbnz    w2, .L2
    ldr    w0, [x0, 4]    ; load (return bar)
    ret
----
With this code, the load can be executed ahead of the store-release which ends
the __sync_fetch_and_add. A correct implementation should emit a dmb
instruction after the cbnz.
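As a sketch of the intended semantics at the source level (hedged: this is my own illustrative workaround, assuming the target's __sync_fetch_and_add expansion provides only acquire-release ordering), a trailing __sync_synchronize() restores the documented full-barrier behaviour, since __sync_synchronize expands to a dmb on AArch64. Variable and function names here are illustrative, not from the bug:

```c
int w_foo = 0;
int w_bar = 0;

int T1_fenced(void)
{
  int x = __sync_fetch_and_add(&w_foo, 1); /* acquire-release RMW on AArch64 */
  __sync_synchronize();                    /* full barrier (dmb ish on AArch64) */
  (void)x;
  return w_bar;                            /* can no longer be hoisted above */
}
```

This is the effect a correct expansion would need to achieve internally; it is not a fix for the compiler itself.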

GCC info:
gcc version 5.0.0 20150407 (experimental) 
Target: aarch64-none-linux-gnu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
@ 2015-04-08 11:33 ` matthew.wahab at arm dot com
  2015-04-08 16:16 ` jakub at gcc dot gnu.org
                   ` (67 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: matthew.wahab at arm dot com @ 2015-04-08 11:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #1 from Matthew Wahab <matthew.wahab at arm dot com> ---
I'm working on this, but it isn't obvious how to fix it. The stronger barriers
aren't needed for the __atomics and would cost performance, so simply
strengthening the MEMMODEL_SEQ_CST implementation in the backend isn't a good
solution.

It looks like the __sync builtins are always expanded out to the __atomics. 
This is done in the middle-end and there doesn't seem to be any way for a
backend to intervene. 

I'm currently preparing a patch that introduces a new MEMMODEL tag to
coretypes.h/memmodel to specify the __sync style full barriers. This can then
be used in the __sync to __atomic expansions and, because the MEMMODEL argument
to the __atomics is eventually passed to the backends, this is enough to enable
backends to know when the stronger barrier is needed.
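A minimal sketch of what such a change might look like (hedged: the enumerator and helper names below are hypothetical illustrations of the proposal, not necessarily what GCC ended up using):

```c
/* Hypothetical sketch of the proposed coretypes.h change: add a tag for
   __sync-style full barriers alongside the C11-style memory models.  */
enum memmodel {
  MEMMODEL_RELAXED,
  MEMMODEL_CONSUME,
  MEMMODEL_ACQUIRE,
  MEMMODEL_RELEASE,
  MEMMODEL_ACQ_REL,
  MEMMODEL_SEQ_CST,
  MEMMODEL_SYNC_SEQ_CST  /* new: __sync-style full barrier */
};

/* Minimum backend update: treat the new tag as a synonym for SEQ_CST.
   A backend wanting the stronger barrier can distinguish the two.  */
int needs_seq_cst_barrier(enum memmodel model)
{
  return model == MEMMODEL_SEQ_CST || model == MEMMODEL_SYNC_SEQ_CST;
}
```

The point of the design is that the tag rides along the existing MEMMODEL argument, so no new hooks are needed for backends to see it.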

The drawback with this is that a number of backends need to be updated to
recognize the new MEMMODEL tag. For each, the minimum change is to make the new
MEMMODEL tag a synonym for MEMMODEL_SEQ_CST. The backends that would need
updating are arm, aarch64, i386, ia64, pa, rs6000, s390, sparc. (A subsequent
patch would make the aarch64 backend generate the stronger barrier for the new
MEMMODEL tag.)

If there's a better way of doing it, let me know.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
  2015-04-08 11:33 ` [Bug target/65697] " matthew.wahab at arm dot com
@ 2015-04-08 16:16 ` jakub at gcc dot gnu.org
  2015-04-09  9:53 ` matthew.wahab at arm dot com
                   ` (66 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-04-08 16:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |rth at gcc dot gnu.org,
                   |                            |torvald at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I doubt the __sync_* builtins are meant to be any stronger than
__ATOMIC_SEQ_CST is.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
  2015-04-08 11:33 ` [Bug target/65697] " matthew.wahab at arm dot com
  2015-04-08 16:16 ` jakub at gcc dot gnu.org
@ 2015-04-09  9:53 ` matthew.wahab at arm dot com
  2015-04-09 10:08 ` torvald at gcc dot gnu.org
                   ` (65 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: matthew.wahab at arm dot com @ 2015-04-09  9:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #4 from Matthew Wahab <matthew.wahab at arm dot com> ---
There is a difference between __syncs and __atomics, assuming the __atomics
are meant to match the C11/C++11 memory model. With C11 atomics, a barrier for
an operation on an object only affects memory references related to that
object. So __atomic_add_fetch(&foo, 1, __ATOMIC_SEQ_CST) is only a full barrier
for references that can affect the value of foo, other references can be moved
around it. The documentation for the __sync builtins is clear that the
__sync_fetch_and_add is a barrier that affects all memory references.

There are C11 fences which make __ATOMIC_SEQ_CST act as a barrier for all
memory references but these are distinct from the operations on a memory
location.
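The operation-versus-fence distinction can be sketched in C11 terms (hedged: names are illustrative, and a single-threaded example can only show the forms, not the concurrent behaviour):

```c
#include <stdatomic.h>

atomic_int a_foo;
int plain_before, plain_after;

/* Seq-cst *operation*: totally ordered against other seq-cst atomics,
   but not, by itself, a two-way barrier for surrounding plain accesses. */
int add_seq_cst(void)
{
  plain_before = 1;
  int old = atomic_fetch_add_explicit(&a_foo, 1, memory_order_seq_cst);
  plain_after = 1;  /* the model allows this to move past the RMW */
  return old;
}

/* Seq-cst *fence*: orders all memory accesses across it, which is the
   guarantee the __sync documentation gives for its full barriers. */
int add_then_fence(void)
{
  int old = atomic_fetch_add_explicit(&a_foo, 1, memory_order_relaxed);
  atomic_thread_fence(memory_order_seq_cst);
  return old;
}
```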

Descriptions of C11 atomics are a little opaque; the relevant sections in the
C11 standard seem to be 5.1.2.4-9 Note 4 ("There is a separate order for each
atomic object") and 7.17.3-7 (limiting memory_order_seq_cst to affected
locations). Torvald Riegel's email, linked above, explains it better. It's from
a similar discussion about the AArch64 implementation of MEMMODEL_SEQ_CST for
the __atomic builtins.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (2 preceding siblings ...)
  2015-04-09  9:53 ` matthew.wahab at arm dot com
@ 2015-04-09 10:08 ` torvald at gcc dot gnu.org
  2015-04-09 10:11 ` torvald at gcc dot gnu.org
                   ` (64 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: torvald at gcc dot gnu.org @ 2015-04-09 10:08 UTC (permalink / raw)
  To: gcc-bugs

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="UTF-8", Size: 5943 bytes --]

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #5 from torvald at gcc dot gnu.org ---
(In reply to Matthew Wahab from comment #0)
> The __sync builtins are implemented by expanding them to the equivalent
> __atomic builtins, using MEMMODEL_SEQ_CST for those that are full barriers.
> This is too weak since __atomic operations with MEMMODEL_SEQ_CST still allow
> some data references to move across the operation (see
> https://gcc.gnu.org/ml/gcc/2014-02/msg00058.html)

I don't think that's a precise characterization.  The difference discussed in
that thread is that seq-cst fences have a stronger effect than seq-cst stores. 
This is not really a question of MEMMODEL_SEQ_CST but what you apply it to.

> while a __sync full
> barrier should block all movement
> (https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.
> html#_005f_005fsync-Builtins).

But in your code example below, you didn't use __sync_synchronize.  The other
__sync operations do not document that they are a full barrier, AFAICS.  (I'm
not sure whether there is a wide assumption that they do.)

> This is a problem for AArch64 where data references after the barrier could be
> speculated ahead of it.
> 
> For example, compiling with aarch64-none-linux-gnu at -O2,
> -----
> int foo = 0;
> int bar = 0;
> 
> int T1(void)
> {
>   int x = __sync_fetch_and_add(&foo, 1);
>   return bar;
> }

This test case is problematic in that it has a data race on bar unless bar is
never modified by another thread, in which case it would be fine to load bar
before the fetch_add.  It also loads bar irrespective of which value the
fetch_add actually returns.  I'm aware that the __sync builtins offer no
MEMMODEL_RELAXED, and so lots of code simply has a plain memory access.  

Nonetheless, I think you should also look at a test case that is properly
synchronized, data-race-free C11 code with the builtins, and see whether that
is compiled differently (ie, either use a relaxed MO load for bar or make the
load of bar conditional on the outcome of the fetch_add).

To be on the safe side, you can also use the cppmem tool to verify what the C11
model requires the compiled code to guarantee.  Here's what you probably want
to achieve:

int main() {
  atomic_int foo = 0; 
  atomic_int bar = 0; 
  {{{ {
        r1=cas_strong(foo, 0, 1);
        r2=bar.load(memory_order_relaxed);
      }
  ||| {
        bar.store(42, memory_order_relaxed);
        foo.store(1, memory_order_release);
      }
  }}};
  return 0; }

This tries to test whether "ordered" stores to bar and foo are observed in this
order in the first thread (it uses a CAS instead of fetch_add).  This gives me
3 consistent executions, of which one is the one in which the CAS succeeds (ie,
like fetch_add would do), and this one returns a value of 42 for the load of
bar.
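The cppmem snippet above can be rendered as data-race-free C11 code (hedged: this is my own translation using pthreads, with illustrative names; like cppmem, one real execution only samples one of the allowed outcomes rather than enumerating them all):

```c
#include <pthread.h>
#include <stdatomic.h>

atomic_int lt_foo, lt_bar;
int lt_cas_ok, lt_r2;

/* First thread: strong CAS on foo (standing in for fetch_add), then a
   relaxed load of bar, as in the cppmem test. */
void *lt_thread1(void *arg)
{
  (void)arg;
  int expected = 0;
  lt_cas_ok = atomic_compare_exchange_strong(&lt_foo, &expected, 1);
  lt_r2 = atomic_load_explicit(&lt_bar, memory_order_relaxed);
  return 0;
}

/* Second thread: relaxed store to bar, then release store to foo. */
void *lt_thread2(void *arg)
{
  (void)arg;
  atomic_store_explicit(&lt_bar, 42, memory_order_relaxed);
  atomic_store_explicit(&lt_foo, 1, memory_order_release);
  return 0;
}

/* Run one execution; C11 allows lt_r2 to be 0 or 42 in any one run. */
int run_litmus(void)
{
  atomic_store(&lt_foo, 0);
  atomic_store(&lt_bar, 0);
  pthread_t t1, t2;
  pthread_create(&t1, 0, lt_thread1, 0);
  pthread_create(&t2, 0, lt_thread2, 0);
  pthread_join(t1, 0);
  pthread_join(t2, 0);
  return lt_r2;
}
```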

> ----
> produces
> ----
> T1:
> 	adrp	x0, .LANCHOR0
> 	add	x0, x0, :lo12:.LANCHOR0
> .L2:
> 	ldaxr	w1, [x0]       ; load-acquire (__sync_fetch_and_add)
> 	add	w1, w1, 1
> 	stlxr	w2, w1, [x0]   ; store-release  (__sync_fetch_and_add)
> 	cbnz	w2, .L2
> 	ldr	w0, [x0, 4]    ; load (return bar)
> 	ret
> ----
> With this code, the load can be executed ahead of the store-release which
> ends the __sync_fetch_and_add. A correct implementation should emit a dmb
> instruction after the cbnz.

I'm not sufficiently familiar with ARM to assess whether that's correct.  Is
this ARMv8?  It seems to be consistent with the mapping to machine code that
GCC has followed (http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html), which
does not include a DMB.  However, it also shows no difference between an
acq_rel CAS and a seq_cst CAS.  Could it be that the CAS sequence itself (ie,
the machine code for it) prevents the reordering you are concerned about at the
HW level?
^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (3 preceding siblings ...)
  2015-04-09 10:08 ` torvald at gcc dot gnu.org
@ 2015-04-09 10:11 ` torvald at gcc dot gnu.org
  2015-04-09 10:47 ` matthew.wahab at arm dot com
                   ` (63 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: torvald at gcc dot gnu.org @ 2015-04-09 10:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #6 from torvald at gcc dot gnu.org ---
(In reply to torvald from comment #5)
> (In reply to Matthew Wahab from comment #0)
> > while a __sync full
> > barrier should block all movement
> > (https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.
> > html#_005f_005fsync-Builtins).
> 
> But in your code example below, you didn't use __sync_synchronize.  The
> other __sync operations do not document that they are a full barrier,
> AFAICS.  (I'm not sure whether there is a wide assumption that they do.)

No, you're right.  They are "in most cases" considered to be a full barrier.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (4 preceding siblings ...)
  2015-04-09 10:11 ` torvald at gcc dot gnu.org
@ 2015-04-09 10:47 ` matthew.wahab at arm dot com
  2015-04-09 11:10 ` redi at gcc dot gnu.org
                   ` (62 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: matthew.wahab at arm dot com @ 2015-04-09 10:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #7 from Matthew Wahab <matthew.wahab at arm dot com> ---
(In reply to torvald from comment #5)
> (In reply to Matthew Wahab from comment #0)
> > The __sync builtins are implemented by expanding them to the equivalent
> > __atomic builtins, using MEMMODEL_SEQ_CST for those that are full barriers.
> > This is too weak since __atomic operations with MEMMODEL_SEQ_CST still allow
> > some data references to move across the operation (see
> > https://gcc.gnu.org/ml/gcc/2014-02/msg00058.html)
> 
> I don't think that's a precise characterization.  The difference discussed
> in that thread is that seq-cst fences have a stronger effect than seq-cst
> stores.  This is not really a question of MEMMODEL_SEQ_CST but what you
> apply it to.

Ok, my point was just that a __sync operation has a stronger barrier than an
__atomic operation.

> This test case is problematic in that it has a data race on bar unless bar
> is never modified by another thread, in which case it would be fine to load
> bar before the fetch_add.  It also loads bar irrespective of which value the
> fetch_add actually returns.  I'm aware that the __sync builtins offer no
> MEMMODEL_RELAXED, and so lots of code simply has a plain memory access.  
> 
> Nonetheless, I think you should also look at a test case that is properly
> synchronized, data-race-free C11 code with the builtins, and see whether
> that is compiled differently (ie, either use a relaxed MO load for bar or
> make the load of bar conditional on the outcome of the fetch_add).
> 
> To be on the safe side, you can also use the cppmem tool to verify what the
> C11 model requires the compiled code to guarantee.  Here's what you probably
> want to achieve:

I agree that this wouldn't affect valid C11 code (because of data-races) but my
understanding is that __sync builtins don't require a C11 model. The problem is
that the __syncs are being implemented using atomic operations that do assume a
C11 model and its notion of program validity. This looks like it would lead to
differences in behaviour when code, using only the __sync builtins and knowing
nothing of C11, is moved between targets with different memory models.


> > ----
> > produces
> > ----
> > T1:
> > 	adrp	x0, .LANCHOR0
> > 	add	x0, x0, :lo12:.LANCHOR0
> > .L2:
> > 	ldaxr	w1, [x0]       ; load-acquire (__sync_fetch_and_add)
> > 	add	w1, w1, 1
> > 	stlxr	w2, w1, [x0]   ; store-release  (__sync_fetch_and_add)
> > 	cbnz	w2, .L2
> > 	ldr	w0, [x0, 4]    ; load (return bar)
> > 	ret
> > ----
> > With this code, the load can be executed ahead of the store-release which
> > ends the __sync_fetch_and_add. A correct implementation should emit a dmb
> > instruction after the cbnz.
> 
> I'm not sufficiently familiar with ARM to assess whether that's correct.  Is
> this ARMv8?  It seems to be consistent with the mapping to machine code that
> GCC has followed (http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html),
> which does not include a DMB.  However, it also shows no difference between
> an acq_rel CAS and a seq_cst CAS.  Could it be that the CAS sequence itself
> (ie, the machine code for it) prevents the reordering you are concerned
> about at the HW level?

Yes, it's ARMv8. It's consistent with the code expected for the atomics used to
implement the __sync_fetch_and_add. I believe that acquire-release will
normally be treated as seq-cst for the aarch64 back-end (I'd have to
double-check though).
^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (5 preceding siblings ...)
  2015-04-09 10:47 ` matthew.wahab at arm dot com
@ 2015-04-09 11:10 ` redi at gcc dot gnu.org
  2015-04-09 11:25 ` matthew.wahab at arm dot com
                   ` (61 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: redi at gcc dot gnu.org @ 2015-04-09 11:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #8 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Matthew Wahab from comment #7)
> I agree that this wouldn't affect valid C11 code (because of data-races) but
> my understanding is that __sync builtins don't require a C11 model. The

You say that like it's a good thing :-)

They don't require a memory model only because there wasn't a cross-platform
one that existed at the time.

> problem is that the __syncs are being implemented using atomic operations
> that do assume a C11 model and its notion of program validity. This looks
> like it would lead to differences in behaviour when code, using only the
> __sync builtins and knowing nothing of C11, is moved between targets with
> different memory models.

It seems unsurprising to me that you'll get different behaviour when trying to
use a program written with no formal memory model on platforms with different
memory models.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (6 preceding siblings ...)
  2015-04-09 11:10 ` redi at gcc dot gnu.org
@ 2015-04-09 11:25 ` matthew.wahab at arm dot com
  2015-04-09 11:51 ` amacleod at redhat dot com
                   ` (60 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: matthew.wahab at arm dot com @ 2015-04-09 11:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #9 from Matthew Wahab <matthew.wahab at arm dot com> ---
(In reply to Jonathan Wakely from comment #8)
> (In reply to Matthew Wahab from comment #7)
> > I agree that this wouldn't affect valid C11 code (because of data-races) but
> > my understanding is that __sync builtins don't require a C11 model. The
> 
> You say that like it's a good thing :-)
> 
> They don't require a memory model only because there wasn't a cross-platform
> one that existed at the time.

That's certainly changed; I'm not sure it's made things any simpler, though.

> It seems unsurprising to me that you'll get different behaviour when trying
> to use a program written with no formal memory model on platforms with
> different memory models.

I don't think it's the programs that are at fault here. I'd expect programs to
only rely on what the documentation tells them to expect, so, if the manual
says that something is a full barrier, it seems reasonable for code to assume a
full barrier will be provided.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (7 preceding siblings ...)
  2015-04-09 11:25 ` matthew.wahab at arm dot com
@ 2015-04-09 11:51 ` amacleod at redhat dot com
  2015-04-09 14:03 ` matthew.wahab at arm dot com
                   ` (59 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: amacleod at redhat dot com @ 2015-04-09 11:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #10 from Andrew Macleod <amacleod at redhat dot com> ---
(In reply to Matthew Wahab from comment #7)
> (In reply to torvald from comment #5)
> > (In reply to Matthew Wahab from comment #0)

> > 
> > I don't think that's a precise characterization.  The difference discussed
> > in that thread is that seq-cst fences have a stronger effect than seq-cst
> > stores.  This is not really a question of MEMMODEL_SEQ_CST but what you
> > apply it to.
> 
> Ok, my point was just that a __sync operation has a stronger barrier than
> an __atomic operation.
> 

I think that is just an interpretation problem or a flaw in my documentation. It
was never meant to indicate any difference between __sync and MEMMODEL_SEQ_CST.

Have you tried it on a GCC from before __atomic and __sync were merged?
Probably GCC 4.7 and earlier.  The code should be identical.  Of course,
changes to the machine description patterns could result in differences, I
suppose... but the intent is that SEQ_CST and __sync are the same... hence the
code base was merged.

The intention was also to deprecate __sync so I wouldn't put too much thought
into it.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (8 preceding siblings ...)
  2015-04-09 11:51 ` amacleod at redhat dot com
@ 2015-04-09 14:03 ` matthew.wahab at arm dot com
  2015-04-10 15:09 ` jgreenhalgh at gcc dot gnu.org
                   ` (58 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: matthew.wahab at arm dot com @ 2015-04-09 14:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #11 from Matthew Wahab <matthew.wahab at arm dot com> ---
(In reply to Andrew Macleod from comment #10)
> (In reply to Matthew Wahab from comment #7)
>
> > Ok, my point was just that a __sync operation has a stronger barrier than
> > an __atomic operation.
> > 
> 
> I think that is just an interpretation problem or flaw in my documentation.
> It was never meant to indicate any difference between __sync and
> MEMMODEL_SEQ_CST. 

The GCC manual page for the __atomics does suggest that __ATOMIC_SEQ_CST is a
full barrier, so it should be equivalent to the __sync full barrier. There is a
difference, though, and that means the semantics of the __sync builtins were
changed by the merge with the __atomics.

> Have you tried it on a gcc before __atomic and __sync were merged? probably
> GCC 4.7 and earlier.  The code should be identical.  Of course, changes to
> the machine description patterns could result in differences I suppose... 
> but the intent is that SEQ_CST and __sync are the same... hence the code
> base was merged. 

Aarch64 support first went into GCC 4.8 and the __atomic/__sync merge went into
GCC 4.7. 

I took a look at the __sync implementation in GCC 4.6: the builtins would
expand to the sync_* instructions if a backend provided them and fall back to a
compare-swap loop otherwise.

For backends that provided sync_* instructions, it looks like the operations
were treated as a full barrier (applying to all memory references). This is
still the case for the __atomics on some targets; e.g. i386 uses lock <inst>
which apparently acts as a full barrier. 

The compare-swap loop didn't emit any kind of barrier.
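That generic fallback can be sketched in source form (hedged: this is an illustrative reconstruction of the shape of the expansion, not the actual RTL GCC 4.6 emitted; the function name is my own):

```c
/* Sketch of the pre-4.7 generic fallback: without a sync_* pattern,
   __sync_fetch_and_add expanded to a compare-and-swap loop along these
   lines, with no barrier beyond what the CAS itself provides. */
int cas_fetch_and_add(int *p, int v)
{
  int old;
  do
    old = *p;
  while (!__sync_bool_compare_and_swap(p, old, old + v));
  return old;
}
```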

> The intention was also to deprecate __sync so I wouldn't put too much
> thought into it.

New code should certainly be pointed at the __atomics, but there's likely to be
a lot of existing code using __syncs. It should be possible to move it to a new
architecture without having to do potentially painful rewrites.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (9 preceding siblings ...)
  2015-04-09 14:03 ` matthew.wahab at arm dot com
@ 2015-04-10 15:09 ` jgreenhalgh at gcc dot gnu.org
  2015-04-15 10:16 ` aph at gcc dot gnu.org
                   ` (57 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: jgreenhalgh at gcc dot gnu.org @ 2015-04-10 15:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #12 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
There are two problems here, one of which concerns me more in the real world,
and both of which rely on races if you are in the C/C++11 model - there isn't
much I can do about that as the __sync primitives are legacy and give stronger
guarantees about "visible memory references" (which includes ALL global data).

Consider:

int x = 0;
int y = 0;
int z = 0;

void foo (void)
{
  x = 1;
  __sync_fetch_and_add (&y, 1);
  z = 1;
}

My reading of what a full barrier implies would suggest that we should never
have a modification order which is any of:

  z = 1, x = 1, y = 0+1
  z = 1, y = 0+1, x = 1
  x = 1, z = 1, y = 0+1

GCC 5.0-ish (recent trunk) will emit:

foo:
    adrp    x3, .LANCHOR0
    mov    w1, 1
    add    x0, x3, :lo12:.LANCHOR0
    add    x2, x0, 4
    str    w1, [x3, #:lo12:.LANCHOR0]
.L2:
    ldaxr    w3, [x2]
    add    w3, w3, w1
    stlxr    w4, w3, [x2]
    cbnz    w4, .L2
    str    w1, [x0, 8]
    ret

Dropping some of the context and switching to pseudo-asm we have:

    str    1, [x]
.L2:
    ldaxr    tmp, [y]
    add    tmp, tmp, 1
    stlxr    flag, tmp, [y]
    cbnz    flag, .L2
    str    1, [z]

As far as I understand it, the memory ordering guarantees of the half-barrier
LDAXR/STLXR instructions do not prevent this being reordered to an execution
which looks like:

    ldaxr    tmp, [y]
    str    1, [z]
    str    1, [x]
    add    tmp, tmp, 1
    stlxr    flag, tmp, [y]
    cbnz    flag, .L2

which gives one of the modification orders we wanted to forbid above. Similar
reordering can give the other undesirable modification orders.

As mentioned, this reordering is permitted under C11, as the stores to x and z
are racy - this permits the CPPMEM/Cambridge documentation of what an SEQ_CST
must do. However, my feeling is that it is at the very least *surprising* for
the __sync primitives to allow this ordering, and more likely points to a break
in the AArch64 implementation.

Fixing this requires a hard memory barrier - but I really don't want to see us
penalise C11 SEQ_CST to get the right behaviour out of __sync_fetch_and_add...

The second problem relies on a similar argument to show that in a
__sync_lock_test_and_set operation we can reorder a later access before the
"set" half of the operation. i.e. that

  __sync_lock_test_and_set (&y, 1)
  x = 1;

Can be seen as

  tmp = load_acquire (y)
  store (x, 1)
  store (y, 1)

Giving an incorrect modification order.

The AArch64 parts are something I can argue through with smarter people than
me, but I think my comments on the __sync primitives are correct?


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (10 preceding siblings ...)
  2015-04-10 15:09 ` jgreenhalgh at gcc dot gnu.org
@ 2015-04-15 10:16 ` aph at gcc dot gnu.org
  2015-04-15 11:11 ` mwahab at gcc dot gnu.org
                   ` (56 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: aph at gcc dot gnu.org @ 2015-04-15 10:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

Andrew Haley <aph at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aph at gcc dot gnu.org

--- Comment #13 from Andrew Haley <aph at gcc dot gnu.org> ---
There's surely a documentation problem here.

GCC defines this:

`__ATOMIC_SEQ_CST'
     Full barrier in both directions and synchronizes with acquire
     loads and release stores in all threads.

But LDAXR/STLXR doesn't do that, and there's no write barrier at all when the
compare fails.  If the intention really is to map onto c++11, this
specification is wrong.

But if this specification is correct, then

bool gcc_cas() {
  int expected = 1;
  int desired = 2;
  return __atomic_compare_exchange(&barf, &expected, &desired, false,
                                   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

gcc_cas():
    sub    sp, sp, #16
    mov    w1, 1
    adrp    x0, .LANCHOR0
    str    w1, [sp,12]
    add    x0, x0, :lo12:.LANCHOR0
    mov    w2, 2
.L10:
    ldaxr    w3, [x0]
    cmp    w3, w1
    bne    .L11
    stlxr    w4, w2, [x0]
    cbnz    w4, .L10
.L11:
    cset    w0, eq
    add    sp, sp, 16
    ret

is wrong.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (11 preceding siblings ...)
  2015-04-15 10:16 ` aph at gcc dot gnu.org
@ 2015-04-15 11:11 ` mwahab at gcc dot gnu.org
  2015-04-15 11:54 ` jakub at gcc dot gnu.org
                   ` (55 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-04-15 11:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #14 from mwahab at gcc dot gnu.org ---
(In reply to Andrew Haley from comment #13)
> There's surely a documentation problem here.
> 
> GCC defines this:
> 
> `__ATOMIC_SEQ_CST'
>      Full barrier in both directions and synchronizes with acquire
>      loads and release stores in all threads.

The documentation is a little misleading. Barriers in __atomic operations are
slightly weaker than barriers in __atomic fences; the differences are rather
subtle, though. For the __sync implementation a fence would be OK, but an
operation is too weak.

> But LDAXR/STLXR doesn't do that, and there's no write barrier at all when
> the compare fails.  If the intention really is to map onto c++11, this
> specification is wrong.

The LDAXR/STLXR sequences rely on the C11/C++11 prohibition of data races. That
the __atomic builtins assume this restriction is implied by the references to
C11/C++11 in the documentation but should probably be made clearer. I'll try to
write a patch, if nobody else gets there first.

I'll have to look at the compare-exchange code; it does look like it goes
wrong.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (12 preceding siblings ...)
  2015-04-15 11:11 ` mwahab at gcc dot gnu.org
@ 2015-04-15 11:54 ` jakub at gcc dot gnu.org
  2015-04-15 12:43 ` aph at gcc dot gnu.org
                   ` (54 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-04-15 11:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #15 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to mwahab from comment #14)
> The LDAXR/STLXR sequences rely on the C11/C++11 prohibition of data races.
> That the __atomic builtins assume this restriction is implied by the
> references to C11/C++11 in the documentation but should probably be made
> clearer. I'll try to write a patch, if nobody else gets there first.

The __atomic builtins are used far more widely than just as the underlying
implementation for C11/C++11 atomics.  And, even on very weakly ordered
architectures like PowerPC, __atomic_ CAS __ATOMIC_SEQ_CST emits a sync (full
barrier) before the LL/SC loop and isync after the loop, so it behaves as
documented currently.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (13 preceding siblings ...)
  2015-04-15 11:54 ` jakub at gcc dot gnu.org
@ 2015-04-15 12:43 ` aph at gcc dot gnu.org
  2015-04-15 13:01 ` mwahab at gcc dot gnu.org
                   ` (53 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: aph at gcc dot gnu.org @ 2015-04-15 12:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #16 from Andrew Haley <aph at gcc dot gnu.org> ---
(In reply to mwahab from comment #14)
> (In reply to Andrew Haley from comment #13)

> > But LDAXR/STLXR doesn't do that, and there's no write barrier at all when
> > the compare fails.  If the intention really is to map onto c++11, this
> > specification is wrong.
> 
> The LDAXR/STLXR sequences rely on the C11/C++11 prohibition of data races.
> That the __atomic builtins assume this restriction is implied by the
> references to C11/C++11 in the documentation but should probably be made
> clearer. I'll try to write a patch, if nobody else gets there first.

Right.

> I'll have to look at the compare-exchange code; it does look like it goes
> wrong.

Well... goes wrong how?  If it's intended to be a C++11 sequentially-consistent
CAS, it's just fine.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (14 preceding siblings ...)
  2015-04-15 12:43 ` aph at gcc dot gnu.org
@ 2015-04-15 13:01 ` mwahab at gcc dot gnu.org
  2015-04-15 14:10 ` aph at gcc dot gnu.org
                   ` (52 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-04-15 13:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #17 from mwahab at gcc dot gnu.org ---
According to the GCC documentation, __atomic_compare_exchange(ptr, exp, des,
..) is: if (*ptr == *exp) *ptr = *des; else *exp = *ptr;

On AArch64 the else (*ptr != *exp) branch is a store rather than a
store-release.

This doesn't appear in your example but with
----
int cas(int* barf, int* expected, int* desired)
{
  return __atomic_compare_exchange_n(barf, expected, desired, 0,
                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
----
cas:
    ldr    w3, [x1]
.L3:
    ldaxr    w4, [x0]
    cmp    w4, w3
    bne    .L4
    stlxr    w5, w2, [x0]  ; store-release
    cbnz    w5, .L3
.L4:
    cset    w0, eq
    cbz    w0, .L6
    ret
    .p2align 3
.L6:
    str    w4, [x1]  ; store, no barrier.
    ret
----

This looks odd to me but I'd need to look into it more.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (15 preceding siblings ...)
  2015-04-15 13:01 ` mwahab at gcc dot gnu.org
@ 2015-04-15 14:10 ` aph at gcc dot gnu.org
  2015-04-15 14:33 ` mwahab at gcc dot gnu.org
                   ` (51 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: aph at gcc dot gnu.org @ 2015-04-15 14:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #18 from Andrew Haley <aph at gcc dot gnu.org> ---
(In reply to mwahab from comment #17)

> ----
> int cas(int* barf, int* expected, int* desired)
> {
>   return __atomic_compare_exchange_n(barf, expected, desired, 0,
> 				     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
> }
> ----
> cas:
> 	ldr	w3, [x1]
> .L3:
> 	ldaxr	w4, [x0]
> 	cmp	w4, w3
> 	bne	.L4
> 	stlxr	w5, w2, [x0]  ; store-release
> 	cbnz	w5, .L3
> .L4:
> 	cset	w0, eq
> 	cbz	w0, .L6
> 	ret
> 	.p2align 3
> .L6:
> 	str	w4, [x1]  ; store, no barrier.
> 	ret
> ----
> 
> This looks odd to me but I'd need look into it more.

That looks fine to me: if the CAS fails, the previous value is written to *expected.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (16 preceding siblings ...)
  2015-04-15 14:10 ` aph at gcc dot gnu.org
@ 2015-04-15 14:33 ` mwahab at gcc dot gnu.org
  2015-04-15 14:49 ` aph at gcc dot gnu.org
                   ` (50 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-04-15 14:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #19 from mwahab at gcc dot gnu.org ---
(In reply to Andrew Haley from comment #18)
> (In reply to mwahab from comment #17)
> 
> > ----
> > int cas(int* barf, int* expected, int* desired)
> > {
> >   return __atomic_compare_exchange_n(barf, expected, desired, 0,
> > 				     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
> > }
> > ----
> > cas:
> > 	ldr	w3, [x1]
> > .L3:
> > 	ldaxr	w4, [x0]
> > 	cmp	w4, w3
> > 	bne	.L4
> > 	stlxr	w5, w2, [x0]  ; store-release
> > 	cbnz	w5, .L3
> > .L4:
> > 	cset	w0, eq
> > 	cbz	w0, .L6
> > 	ret
> > 	.p2align 3
> > .L6:
> > 	str	w4, [x1]  ; store, no barrier.
> > 	ret
> > ----
> > 
> > This looks odd to me but I'd need to look into it more.
> 
> That looks fine to me: if the CAS fails, the prev value -> *expected.

It looks inconsistent with C11 S7.17.7.4-2 (C++11 S29.6.4-21) "Further, if the
comparison is true, memory is affected according to the value of success, and
if the comparison is false, memory is affected according to the value of
failure." (where success and failure are the memory model arguments.) In this
case, the write to *exp should be memory_order_seq_cst.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (17 preceding siblings ...)
  2015-04-15 14:33 ` mwahab at gcc dot gnu.org
@ 2015-04-15 14:49 ` aph at gcc dot gnu.org
  2015-04-15 15:49 ` torvald at gcc dot gnu.org
                   ` (49 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: aph at gcc dot gnu.org @ 2015-04-15 14:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #20 from Andrew Haley <aph at gcc dot gnu.org> ---
(In reply to mwahab from comment #19)
> (In reply to Andrew Haley from comment #18)
> 
> It looks inconsistent with C11 S7.17.7.4-2 (C++11 S29.6.4-21) "Further, if
> the comparison is true, memory is affected according to the value of
> success, and if the comparison is false, memory is affected according to the
> value of failure." (where success and failure are the memory model
> arguments.) In this case, the write to *exp should be memory_order_seq_cst.

But no store actually takes place, so the only effect is that of the read. You
can't have a sequentially consistent store without a store.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (18 preceding siblings ...)
  2015-04-15 14:49 ` aph at gcc dot gnu.org
@ 2015-04-15 15:49 ` torvald at gcc dot gnu.org
  2015-04-15 20:05 ` torvald at gcc dot gnu.org
                   ` (48 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: torvald at gcc dot gnu.org @ 2015-04-15 15:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #21 from torvald at gcc dot gnu.org ---
(In reply to Andrew Haley from comment #20)
> (In reply to mwahab from comment #19)
> > (In reply to Andrew Haley from comment #18)
> > 
> > It looks inconsistent with C11 S7.17.7.4-2 (C++11 S29.6.4-21) "Further, if
> > the comparison is true, memory is affected according to the value of
> > success, and if the comparison is false, memory is affected according to the
> > value of failure." (where success and failure are the memory model
> > arguments.) In this case, the write to *exp should be memory_order_seq_cst.
> 
> But no store actually takes place, so the only effect is that of the read.
> You can't have a sequentially consistent store without a store.

I agree. If you continue reading in the C++11 paragraph that you cited, you'll
see that if just one MO is provided and the CAS fails, an acq_rel MO is
downgraded to acquire and a release MO to relaxed.  This is consistent with no
update of the atomic variable (note that expected is not atomic, so applying an
MO to accesses to it is not meaningful in any case).  However, if the provided
MO is seq_cst, I think a failed CAS still needs to be equivalent to a seq_cst
load.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (19 preceding siblings ...)
  2015-04-15 15:49 ` torvald at gcc dot gnu.org
@ 2015-04-15 20:05 ` torvald at gcc dot gnu.org
  2015-04-15 20:49 ` jgreenhalgh at gcc dot gnu.org
                   ` (47 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: torvald at gcc dot gnu.org @ 2015-04-15 20:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #22 from torvald at gcc dot gnu.org ---
(In reply to James Greenhalgh from comment #12)
> There are two problems here, one of which concerns me more in the real
> world, and both of which rely on races if you are in the C/C++11 model -
> there isn't much I can do about that as the __sync primitives are legacy and
> give stronger guarantees about "visible memory references" (which includes
> ALL global data).
> 
> Consider:
> 
> int x = 0;
> int y = 0;
> int z = 0;
> 
> void foo (void)
> {
>   x = 1;
>   __sync_fetch_and_add (&y, 1);
>   z = 1;
> }
> 
> My reading of what a full barrier implies would suggest that we should never
> have a modification order which is any of:
> 
>   z = 1, x = 1, y = 0+1
>   z = 1, y = 0+1, x = 1
>   x = 1, z = 1, y = 0+1

At least in C11/C++11, modification orders are per memory location.  If we want
to have something that orders accesses to x, y, and z, we need to model the
observer side too (which the C11 model does, for example).  I'm mentioning that
because at the very least, we have compiler reordering happening at the
observer side; if a programmer would need to use, for example, __sync builtins
to order the observers, then this means we "just" have to consider combinations
of those on the modification and the observation side.

(And because I'm not sure what you think "modification order" is, I can't
comment further.)

> GCC 5.0-ish (recent trunk) will emit:
> 
> foo:
> 	adrp	x3, .LANCHOR0
> 	mov	w1, 1
> 	add	x0, x3, :lo12:.LANCHOR0
> 	add	x2, x0, 4
> 	str	w1, [x3, #:lo12:.LANCHOR0]
> .L2:
> 	ldaxr	w3, [x2]
> 	add	w3, w3, w1
> 	stlxr	w4, w3, [x2]
> 	cbnz	w4, .L2
> 	str	w1, [x0, 8]
> 	ret
> 
> Dropping some of the context and switching to pseudo-asm we have:
> 
> 	str	1, [x]
> .L2:
> 	ldaxr	tmp, [y]
> 	add	tmp, tmp, 1
> 	stlxr	flag, tmp, [y]
> 	cbnz	flag, .L2
> 	str	1, [z]
> 
> As far as I understand it, the memory ordering guarantees of the
> half-barrier LDAXR/STLXR instructions do not prevent this being reordered to
> an execution which looks like:
> 
> 	ldaxr	tmp, [y]
> 	str	1, [z]
> 	str	1, [x]
> 	add	tmp, tmp, 1
> 	stlxr	flag, tmp, [y]
> 	cbnz	flag, .L2
>    
> which gives one of the modification orders we wanted to forbid above.
> Similar reordering can give the other undesirable modification orders.

(I'm not familiar with the ARM model, so please bear with me.)

This is reordering the HW can do?  Or are you concerned about the compiler
backend?
Would the reordered str always become visible atomically with the stlxr?
Would it become visible even if the LLSC fails, thus potentially storing more
than once to z?  This would be surprising in that the ldaxr would then have to
be reloaded too potentially after the store to z, I believe (at least for a
strong CAS) -- which would break the acquire MO on the load.

> As mentioned, this reordering is permitted under C11, as the stores to x and
> z are racy - this permits the CPPMEM/Cambridge documentation of what an
> SEQ_CST must do.

That's true in this example, but at the instruction level (assuming HW
reordering is what you're concerned about), an atomic relaxed-MO load isn't
distinguishable from a normal memory access, right?  So, this isn't strictly
about DRF, but about the difference between C11 seq_cst fences and seq_cst
RMW ops.

> However, my feeling is that it is at the very least
> *surprising* for the __sync primitives to allow this ordering, and more
> likely points to a break in the AArch64 implementation.

With the caveat that given that __sync isn't documented in great detail, a lot
of interpretations might happen in practice, so there might be a few surprises
to some people :)

> Fixing this requires a hard memory barrier - but I really don't want to see
> us penalise C11 SEQ_CST to get the right behaviour out of
> __sync_fetch_and_add...

I agree.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (20 preceding siblings ...)
  2015-04-15 20:05 ` torvald at gcc dot gnu.org
@ 2015-04-15 20:49 ` jgreenhalgh at gcc dot gnu.org
  2015-04-15 22:13 ` torvald at gcc dot gnu.org
                   ` (46 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: jgreenhalgh at gcc dot gnu.org @ 2015-04-15 20:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #23 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
(In reply to torvald from comment #22)
> (In reply to James Greenhalgh from comment #12)
> > There are two problems here, one of which concerns me more in the real
> > world, and both of which rely on races if you are in the C/C++11 model -
> > there isn't much I can do about that as the __sync primitives are legacy and
> > give stronger guarantees about "visible memory references" (which includes
> > ALL global data).
> > 
> > Consider:
> > 
> > int x = 0;
> > int y = 0;
> > int z = 0;
> > 
> > void foo (void)
> > {
> >   x = 1;
> >   __sync_fetch_and_add (&y, 1);
> >   z = 1;
> > }
> > 
> > My reading of what a full barrier implies would suggest that we should never
> > have a modification order which is any of:
> > 
> >   z = 1, x = 1, y = 0+1
> >   z = 1, y = 0+1, x = 1
> >   x = 1, z = 1, y = 0+1
> 
> At least in C11/C++11, modification orders are per memory location.  If we
> want to have something that orders accesses to x, y, and z, we need to model
> the observer side too (which the C11 model does, for example).  I'm
> mentioning that because at the very least, we have compiler reordering
> happening at the observer side; if a programmer would need to use, for
> example, __sync builtins to order the observers, then this means we "just"
> have to consider combinations of those on the modification and the
> observation side.
> 
> (And because I'm not sure what you think "modification order" is, I can't
> comment further.)

Yes. I'm sorry, I was imprecise in my language as I couldn't quite find the
right phrase in the specifications - what I am trying to get at is: the order
in which observers in the system are able to observe the writes to x, y, and z.
Later in this post I'll call this "observation order", until I can find better
terminology.

I believe the __sync builtins should guarantee that an observer could trust:

  assert (z == 1 && y == 1 && x == 1);

to hold.

That is, the write to z is observed if and only if the writes to y and x have
also been observed.
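Since the guarantee at issue is about what *other* observers may see, it cannot be exercised from a single thread; still, a minimal single-threaded sketch of the builtin's contract (mirroring foo() from comment #12; the observer function is a hypothetical illustration, not code from this thread) may help fix terms:

```c
#include <assert.h>

/* Globals mirroring the example discussed in this thread. */
static int x, y, z;

/* Mirrors foo() from comment #12: __sync_fetch_and_add is documented as
   a full barrier, so an observer that sees z == 1 must also be able to
   see x == 1 and y == 1. */
int writer(void)
{
    x = 1;
    int old = __sync_fetch_and_add(&y, 1); /* returns y's prior value */
    z = 1;
    return old;
}

/* The invariant a (hypothetical) observer should be able to rely on:
   if the write to z is visible, so are the writes to x and y. */
int observer_invariant_holds(void)
{
    return z == 1 ? (x == 1 && y >= 1) : 1;
}
```

Run in one thread the invariant is trivially true, which is why this only sketches the contract rather than testing the ordering.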

> > GCC 5.0-ish (recent trunk) will emit:
> > 
> > foo:
> > 	adrp	x3, .LANCHOR0
> > 	mov	w1, 1
> > 	add	x0, x3, :lo12:.LANCHOR0
> > 	add	x2, x0, 4
> > 	str	w1, [x3, #:lo12:.LANCHOR0]
> > .L2:
> > 	ldaxr	w3, [x2]
> > 	add	w3, w3, w1
> > 	stlxr	w4, w3, [x2]
> > 	cbnz	w4, .L2
> > 	str	w1, [x0, 8]
> > 	ret
> > 
> > Dropping some of the context and switching to pseudo-asm we have:
> > 
> > 	str	1, [x]
> > .L2:
> > 	ldaxr	tmp, [y]
> > 	add	tmp, tmp, 1
> > 	stlxr	flag, tmp, [y]
> > 	cbnz	flag, .L2
> > 	str	1, [z]
> > 
> > As far as I understand it, the memory ordering guarantees of the
> > half-barrier LDAXR/STLXR instructions do not prevent this being reordered to
> > an execution which looks like:
> > 
> > 	ldaxr	tmp, [y]
> > 	str	1, [z]
> > 	str	1, [x]
> > 	add	tmp, tmp, 1
> > 	stlxr	flag, tmp, [y]
> > 	cbnz	flag, .L2
> >    
> > which gives one of the modification orders we wanted to forbid above.
> > Similar reordering can give the other undesirable modification orders.
> 
> (I'm not familiar with the ARM model, so please bear with me.)
> 
> This is reordering the HW can do?  Or are you concerned about the compiler
> backend?

The architectural description of the memory model - this is reordering the HW
is permitted to do.

> Would the reordered str always become visible atomically with the stlxr?
> Would it become visible even if the LLSC fails, thus potentially storing
> more than once to z?  This would be surprising in that the ldaxr would then
> have to be reloaded too potentially after the store to z, I believe (at
> least for a strong CAS) -- which would break the acquire MO on the load.

The STR would not become visible until after the STLXR had resolved as
successful; however, it can take a position in the observation order ahead of
the STLXR, such that the ordering of writes seen by observers would not be
program order.

> > As mentioned, this reordering is permitted under C11, as the stores to x and
> > z are racy - this permits the CPPMEM/Cambridge documentation of what an
> > SEQ_CST must do.
> 
> That's true in this example, but at the instruction level (assuming HW
> reordering is what you're concerned about), a atomic relaxed-MO load isn't
> distinguishable from a normal memory access, right?  So, it's not DRF what
> this is strictly about, but the difference between C11 seq_cst fences and
> seq_cst RMW ops.

Absolutely! I was trying to preempt "That program is racy" responses :).

> > However, my feeling is that it is at the very least
> > *surprising* for the __sync primitives to allow this ordering, and more
> > likely points to a break in the AArch64 implementation.
> 
> With the caveat that given that __sync isn't documented in great detail, a
> lot of interpretations might happen in practice, so there might be a few
> surprises to some people :)

:) Agreed.

> > Fixing this requires a hard memory barrier - but I really don't want to see
> > us penalise C11 SEQ_CST to get the right behaviour out of
> > __sync_fetch_and_add...
> 
> I agree.
^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (21 preceding siblings ...)
  2015-04-15 20:49 ` jgreenhalgh at gcc dot gnu.org
@ 2015-04-15 22:13 ` torvald at gcc dot gnu.org
  2015-04-16  3:45 ` amacleod at redhat dot com
                   ` (45 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: torvald at gcc dot gnu.org @ 2015-04-15 22:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #24 from torvald at gcc dot gnu.org ---
I think we need to at least clarify the documentation of __atomic, probably
also of __sync; we might also have to change the implementation of __sync
builtins on some archs.

First, I think the __atomic builtins should strictly follow the C11 model --
one major reason being that it's not worthwhile for us to support a
GCC-specific memory model, which is complex.  Therefore, I think we should
clarify in the __atomic docs that C++11 is the reference semantics, and that
any additional text we provide (e.g., to describe what MEMMODEL_SEQ_CST does)
is just a hand-wavy attempt at making things look simpler.  I'd choose
C++11 over C11 because, although the models are supposed to be equal, C++11 has
more detailed docs in the area of atomics and synchronization in general, IMO. 
There are also more people involved in C++ standardization than C
standardization.

The only exception that I see is that we might want to give more guarantees wrt
data-race-freedom (DRF) as far as compiler reordering is concerned.  I'm
mentioning just the compiler because I'm not aware of GCC supporting C11/C++11
on any archs whose ISA distinguishes between atomic and nonatomic accesses (ie,
where DRF is visible at the ISA level); however, I've heard of upcoming archs
that may do that.  Trying to promise some support for non-DRF programs that
synchronize might make transition of legacy code to __atomic builtins easier,
but it's a slippery slope; I don't think it's worthwhile for us to try to give
really precise and hard guarantees, just because of the complexity involved. 
I'm not quite sure what's best here.

Second, in C11 there is a difference between a memory access or
read-modify-write op (RMW) that is seq-cst, and a seq-cst fence.  For
reference, try this example in cppmem:

int main() {
  atomic_int x = 0; 
  atomic_int y = 0; 
  atomic_int d1 = 0; 
  atomic_int d2 = 0; 
  {{{ {
        x.store(1, memory_order_relaxed);
        // atomic_thread_fence(memory_order_seq_cst);
        r3=cas_strong(d1, 0, 1);
        r1=y.load(memory_order_relaxed).readsvalue(0);
      }
  ||| {
        y.store(1, memory_order_relaxed);
        // atomic_thread_fence(memory_order_seq_cst);
        r4=cas_strong(d2, 0, 1);
        r2=x.load(memory_order_relaxed).readsvalue(0);
      }
  }}};
  return 0; }

This is Dekker synchronization: It doesn't work (ie, both threads can read
value 0) with seq-cst RMWs to separate locations (d1 and d2) even though those
RMWs will be totally ordered; however, it does work when using fences (see the
comments).  Therefore, for __atomic, it's not only the MEMMODEL_* value that
counts, but also which operation it is applied to.  Perhaps we need to point
this out in the docs.
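The same Dekker construction can be sketched in C11 <stdatomic.h> syntax (the function names are illustrative; executed sequentially as below it cannot exhibit the forbidden outcome, it only shows where the fences sit relative to the relaxed accesses):

```c
#include <stdatomic.h>

static atomic_int x, y;

/* Thread 1's half of Dekker: relaxed store, seq-cst *fence*, relaxed
   load.  Per the cppmem example above, the fences forbid both threads
   reading 0; seq-cst RMWs on separate locations d1/d2 would not. */
int dekker_thread1(void)
{
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&y, memory_order_relaxed);
}

/* Thread 2's half, symmetric in y and x. */
int dekker_thread2(void)
{
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&x, memory_order_relaxed);
}
```

With real concurrency the guarantee is only that the two results are not both 0; run back to back in one thread, thread 2 necessarily reads x == 1.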


However, our current docs for __sync indicate that most of the __sync RMWs are
"full barriers". The latter has a very TSO-like description, and my guess would
be that most people might understand it this way too (ie, that it would at
least be as strong as a seq-cst C11 fence).  Also, Andrew mentioned offline
that prior documentation seemed to have described the __sync RMWs as
  __sync_synchronize();
  operation...
  __sync_synchronize();

For x86 and similar, a C11-seq-cst RMW will be a full barrier; Jakub says the
same holds for PowerPC currently.  AFAIU, on ARM it doesn't (but it kind of
surprises me that HW would reorder into an LLSC region; my guess would be that
this complicates matters, and I can't see a benefit).

So, at least on ARM, there seems to be a difference between a __sync RMW that
is documented to be at least as strong as a C11 seq-cst fence, and how we
implement a C11 seq-cst fence.  This difference is observable by a program, but
doing so requires a non-DRF program (but there are no relaxed-MO __sync
variants, so programs that used __sync may well have relied on non-DRF
accesses).

I would guess that programs that actually contain synchronization code affected
by this difference are rather rare.  Normal locks and such are not affected,
and neither are many common synchronization algorithms, such as those for
linked lists (which work fine with acq_rel MO ops).  I'd say that the full
fences are really
necessary mostly if one wants to synchronize using separate locations and
without a global common order for those accesses (eg, using several mem
locations and asymmetric access patterns to tune contention favorably for
common cases (eg, Dekker-like stuff if made asymmetric; libitm uses that)).  I
guess it's much more common to "route" the synchronization through centralized
or hierarchical locations so that eventually there is a tie-breaker.  For
example, AFAIR there's no use of a seq-cst fence in current glibc.

Thus, we need to at least improve the __sync docs.  If we want to really make
them C11-like, we need to update the docs, and there might be legacy code
assuming different semantics that breaks.  If we don't, we need to update the
implementation of __sync RMWs.  I don't know what would be the easiest way to
do that:
1) We could put __sync_synchronize() around them on all archs, and just not
care about the performance overhead (thinking that we want to deprecate them
anyway).  But unless the backends optimize unnecessary sync ops (eg, on x86,
remove mfence adjacent to a lock'd RMW), there could be performance degradation
for __sync-using code.
2) We could make __atomic RMWs with MEMMODEL_SEQ_CST stronger than required by
C11.  This could result in other C11-conforming compilers producing more
efficient synchronization code, and is thus not a good option IMO.
3) We could do something just on ARM (and scan other archs for similar issues). 
That's perhaps the cleanest option.

Thoughts?



* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (22 preceding siblings ...)
  2015-04-15 22:13 ` torvald at gcc dot gnu.org
@ 2015-04-16  3:45 ` amacleod at redhat dot com
  2015-04-16  8:11 ` mwahab at gcc dot gnu.org
                   ` (44 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: amacleod at redhat dot com @ 2015-04-16  3:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #25 from Andrew Macleod <amacleod at redhat dot com> ---
My opinion:

1) is undesirable... even though it could possibly accelerate the conversion of
legacy sync to atomic calls... I fear it would instead just cause frustration,
annoyance and worse.  I don't think we need to actually penalize sync for no
reason better than a 1 in a somethillion shot in the dark.

2) Similar reasoning, I don't think we need to penalize SEQ_CST everywhere for
a legacy sync call that probably doesn't need stronger code either.

which leaves option 3)

There are 2 primary options I think

a) introduce an additional memory model... MEMMODEL_SYNC or something which is
even stronger than SEQ_CST.  This would involve fixing any places that assume
SEQ_CST is the highest.  And then we use this from the places that expand sync
calls as the memory model.  

or
b) When we are expanding sync calls, we first look for target __sync patterns
instead of atomic patterns.  If they exist, we use those. Otherwise we simply
expand like we do today.  


(b) may be easier to implement, but puts more onus on the target.  It probably
involves more complex patterns, since you need both atomic and sync patterns. My
guess is some duplication of code will occur here.  But the impact is only to
specific targets.

(a) is probably cleaner... just touches a lot more code.  Since we're in stage
1, maybe (a) is a better long term solution...?   



Documentation needs updating for sure... The rules have changed under us since
originally SEQ_CST and sync were intended to be the same thing... Guess I
shouldn't have tied our horse to the C++11 cart :-)



* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (23 preceding siblings ...)
  2015-04-16  3:45 ` amacleod at redhat dot com
@ 2015-04-16  8:11 ` mwahab at gcc dot gnu.org
  2015-04-16  8:50 ` mwahab at gcc dot gnu.org
                   ` (43 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-04-16  8:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #26 from mwahab at gcc dot gnu.org ---
(In reply to torvald from comment #21)
> (In reply to Andrew Haley from comment #20)
> > (In reply to mwahab from comment #19)
> > > (In reply to Andrew Haley from comment #18)
> > > 
> > > It looks inconsistent with C11 S7.17.7.4-2 (C++11 S29.6.4-21) "Further, if
> > > the comparison is true, memory is affected according to the value of
> > > success, and if the comparison is false, memory is affected according to the
> > > value of failure." (where success and failure are the memory model
> > > arguments.) In this case, the write to *exp should be memory_order_seq_cst.
> > 
> > But no store actually takes place, so the only effect is that of the read.
> > You can't have a sequentially consistent store without a store.
> 
> I agree. If you continue reading in the C++11 paragraph that you cited,
> you'll see that if just one MO is provided and the CAS fails, an acq_rel MO
> is downgraded to acquire and a release MO to relaxed.  This is consistent
> with no update of the atomic variable (note that expected is not atomic, so
> applying an MO to accesses to it is not meaningful in any case).  However,
> if the provided MO is seq_cst, I think a failed CAS still needs to be
> equivalent to a seq_cst load.

Yes, the last two sentences in the C++11 paragraph make it clear: "If the
operation returns true, these operations are atomic read-modify-write
operations (1.10). Otherwise, these operations are atomic load operations."  In
that case, the AArch64 code looks ok.
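That cited distinction is directly visible through C11's <stdatomic.h> (which the __atomic builtins underpin): a failed strong compare-exchange performs no store, acting only as an atomic load that writes the observed value back into `expected`. A small sketch (the function name is illustrative):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Exercises both the failing and succeeding paths of a seq-cst strong
   CAS; returns 0 if both behave as C11/C++11 specify. */
int cas_demo(void)
{
    atomic_int v = 5;
    int expected = 0;

    /* Comparison is false: no store happens, the operation is an
       atomic load, and 'expected' is updated to the current value. */
    bool ok = atomic_compare_exchange_strong(&v, &expected, 7);
    if (ok || expected != 5 || atomic_load(&v) != 5)
        return 1;

    /* Comparison is now true: an atomic read-modify-write occurs. */
    ok = atomic_compare_exchange_strong(&v, &expected, 7);
    if (!ok || atomic_load(&v) != 7)
        return 1;
    return 0;
}
```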



* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (24 preceding siblings ...)
  2015-04-16  8:11 ` mwahab at gcc dot gnu.org
@ 2015-04-16  8:50 ` mwahab at gcc dot gnu.org
  2015-04-16  9:01 ` jgreenhalgh at gcc dot gnu.org
                   ` (42 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-04-16  8:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #27 from mwahab at gcc dot gnu.org ---
(In reply to Andrew Macleod from comment #25)
> My opinion:
> 
> 1) is undesirable... even though it could possibly accelerate the conversion
> of legacy sync to atomic calls... I fear it would instead just cause
> frustration, annoyance and worse.  I don't think we need to actually
> penalize sync for no reason better than a 1 in a somethillion shot in the
> dark.
> 
> 2) Similar reasoning, I don't think we need to penalize SEQ_CST everywhere
> for a legacy sync call that probably doesn't need stronger code either.
> 
> which leaves option 3)
> 
> There are 2 primary options I think
> 
> a) introduce an additional memory model... MEMMODEL_SYNC or something which
> is even stronger than SEQ_CST.  This would involve fixing any places that
> assume SEQ_CST is the highest.  And then we use this from the places that
> expand sync calls as the memory model.  
>
> or
> b) When we are expanding sync calls, we first look for target __sync
> patterns instead of atomic patterns.  If they exist, we use those. Otherwise
> we simply expand like we do today.  
> 
> 
> (b) may be easier to implement, but puts more onus on the target.. probably
> involves more complex patterns since you need both atomic and sync patterns.
> My guess is some duplication of code will occur here.  But the impact is
> only to specific targets.
> 
> (a) Is probably cleaner... just touches a lot more code.  Since we're in
> stage 1, maybe (a) is a better long term solution...?   
> 

Adding a new barrier specifier to enum memmodel seems the simplest approach. It
would mean that all barriers used in gcc are represented in the same way and
that would make them more visible and easier to understand than splitting them
between the enum memmodel and the atomic/sync patterns.
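A sketch of what option (a) might look like; the MEMMODEL_SYNC name and the exact enumerator list are assumptions for illustration, modelled on the existing enum memmodel rather than taken from gcc source:

```c
/* Hypothetical extension of gcc's enum memmodel for option (a): a
   barrier strictly stronger than MEMMODEL_SEQ_CST, requested only when
   expanding the legacy __sync builtins.  Names are illustrative. */
enum memmodel
{
  MEMMODEL_RELAXED,
  MEMMODEL_CONSUME,
  MEMMODEL_ACQUIRE,
  MEMMODEL_RELEASE,
  MEMMODEL_ACQ_REL,
  MEMMODEL_SEQ_CST,
  MEMMODEL_SYNC   /* new: full __sync barrier; any code assuming
                     SEQ_CST is the strongest model must be audited. */
};

/* The __sync expanders would then request the stronger model. */
static inline enum memmodel sync_expansion_model(void)
{
    return MEMMODEL_SYNC;
}
```

For most targets MEMMODEL_SYNC would expand exactly like MEMMODEL_SEQ_CST, matching the claim above that the backend changes are simple.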

I'm testing a patch-set for option (a). It touches several backends but all the
changes are very simple since the new barrier is the same as MEMMODEL_SEQ_CST
for most targets.

One thing though is that the aarch64 code for __sync_lock_test_and_set may
have a similar problem with a too-weak barrier. It's not confirmed
that there is a problem but, since it may mean more work in this area, I
thought I'd mention it.
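For reference, the primitives under suspicion form an acquire/release pair per the GCC docs (__sync_lock_test_and_set is only an *acquire* barrier, __sync_lock_release only a *release* barrier, unlike the full-barrier __sync RMWs), which is why a too-weak mapping is plausible. A minimal spinlock sketch using them:

```c
/* Minimal spinlock built on the two __sync primitives in question.
   __sync_lock_test_and_set atomically stores 1 and returns the prior
   value (acquire barrier); __sync_lock_release stores 0 (release
   barrier).  Neither is documented as a full barrier. */
static int lock_word;

void spin_lock(void)
{
    while (__sync_lock_test_and_set(&lock_word, 1))
        ; /* spin until the prior value was 0, i.e. we took the lock */
}

void spin_unlock(void)
{
    __sync_lock_release(&lock_word); /* writes 0 with release semantics */
}
```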



* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (25 preceding siblings ...)
  2015-04-16  8:50 ` mwahab at gcc dot gnu.org
@ 2015-04-16  9:01 ` jgreenhalgh at gcc dot gnu.org
  2015-04-16  9:18 ` mwahab at gcc dot gnu.org
                   ` (41 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: jgreenhalgh at gcc dot gnu.org @ 2015-04-16  9:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #28 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
(In reply to torvald from comment #24)
> I think we need to at least clarify the documentation of __atomic, probably
> also of __sync; we might also have to change the implementation of __sync
> builtins on some archs.
> 
> First, I think the __atomic builtins should strictly follow the C11 model --
> one major reason being that it's not worthwhile for us to support a
> GCC-specific memory model, which is complex.  Therefore, I think we should
> clarify in the __atomic docs that C++11 is the reference semantics, and that
> any additional text we provide (e.g., to describe what MEMMODEL_SEQ_CST
> does) is just some hand-wavy attempt of makings things look simpler.  I'd
> choose C++11 over C11 because, although the models are supposed to be equal,
> C++11 has more detailed docs in the area of atomics and synchronization in
> general, IMO.  There are also more people involved in C++ standardization
> than C standardization.

Agreed.

> The only exception that I see is that we might want to give more guarantees
> wrt data-race-freedom (DRF) as far as compiler reordering is concerned.  I'm
> mentioning just the compiler because I'm not aware of GCC supporting
> C11/C++11 on any archs whose ISA distinguishes between atomic and nonatomic
> accesses (ie, where DRF is visible at the ISA level); however, I've heard of
> upcoming archs that may do that.  Trying to promise some support for non-DRF
> programs that synchronize might make transition of legacy code to __atomic
> builtins easier, but it's a slippery slope; I don't think it's worthwhile
> for us to try to give really precise and hard guarantees, just because of
> the complexity involved.  I'm not quite sure what's best here.

I use enough whiteboard space trying to work through the well researched,
formalized, and discussed C++11 semantics; I would not like to see a subtly
different GNU++11 semantics! Not least because I worry for the programmer who
thinks they've understood the guarantees of the specification and reverse
engineers GCC's output for confirmation, only to find GCC is giving stronger
guarantees than is strictly necessary.

> <snip>
>
> Thus, we need to at least improve the __sync docs.  If we want to really
> make them C11-like, we need to update the docs, and there might be legacy
> code assuming different semantics that breaks.  If we don't, we need to
> update the implementation of __sync RMWs.  I don't know what would be the
> easiest way to do that:
>
> 1) We could put __sync_synchronize() around them on all archs, and just don't
> care about the performance overhead (thinking that we want to deprecate them
> anyway).  But unless the backends optimize unnecessary sync ops (eg, on x86,
> remove mfence adjacent to a lock'd RMW), there could be performance
> degradation for __sync-using code.

I'm with Andrew, I don't see the need to penalise everyone to ensure
AArch64/ARM memory ordering (Though I agree it would be nice as a stick to
encourage people to move forwards).

> 2) We could make __atomic RMWs with MEMMODEL_SEQ_CST stronger than required
> by C11.  This could result in other C11-conforming compilers producing more
> efficient synchronization code, and is thus not a good option IMO.

I would not like to see this implemented.

> 3) We could do something just on ARM (and scan other archs for similar
> issues).  That's perhaps the cleanest option.

Which leaves 3). From Andrew's two proposed solutions:

> a) introduce an additional memory model... MEMMODEL_SYNC or something
> which is even stronger than SEQ_CST.  This would involve fixing any places
> that assume SEQ_CST is the highest.  And then we use this from the places
> that expand sync calls as the memory model.

This seems a sensible approach and leads to a nicer design, but I worry that it
might be overkill for primitives which we ultimately want to reduce support for
and eventually deprecate.

> b) When we are expanding sync calls, we first look for target __sync
> patterns instead of atomic patterns.  If they exist, we use those.
> Otherwise we simply expand like we do today.  
>
> (b) may be easier to implement, but puts more onus on the target..
> probably involves more complex patterns since you need both atomic and
> sync patterns. My guess is some duplication of code will occur here.  But
> the impact is only to specific targets.

When I looked at this problem internally a few weeks back, this is exactly how
I expected things to work (we can add the documentation for the pattern names
to the list of things which need fixing, as it took me a while to figure out
why my new patterns were never expanded!).

I don't think this is particularly onerous for a target. The tough part in all
of this is figuring out the minimal cost instructions at the ISA level to use
for the various __atomic primitives. Extending support to a stronger model,
should be straightforward given explicit documentation of the stronger ordering
requirements.

Certainly, a target would have to do the same legwork for a) regardless, and
would have to pollute their atomic patterns with checks and code paths for
MEMMODEL_SYNC.

This also gives us an easier route to fixing any issues with the
acquire/release __sync primitives (__sync_lock_test_and_set and
__sync_lock_release) if we decide that these also need to be stronger than
their C++11 equivalents.



* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (26 preceding siblings ...)
  2015-04-16  9:01 ` jgreenhalgh at gcc dot gnu.org
@ 2015-04-16  9:18 ` mwahab at gcc dot gnu.org
  2015-04-16 11:37 ` amacleod at redhat dot com
                   ` (40 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-04-16  9:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #29 from mwahab at gcc dot gnu.org ---
(In reply to mwahab from comment #27)
> (In reply to Andrew Macleod from comment #25)
> > My opinion:
> > 
> > 1) is undesirable... even though it could possibly accelerate the conversion
> > of legacy sync to atomic calls... I fear it would instead just cause
> > frustration, annoyance and worse.  I don't think we need to actually
> > penalize sync for no reason better than a 1 in a somethillion shot in the
> > dark.
> > 
> > 2) Similar reasoning, I don't think we need to penalize SEQ_CST everywhere
> > for a legacy sync call that probably doesn't need stronger code either.
> > 
> > which leaves option 3)
> > 
> > There are 2 primary options I think


There may be a third option, which is to set up a target hook to allow the sync
expansions in the middle end to be overridden. This limits the changes to the
backends that need the different semantics without having to continue with the
sync_ patterns (which aren't in aarch64 for example). There's space in the
memmodel values to support target specific barriers so, for aarch64, this would
allow the atomics code to be reused. I don't know much about the target hook
infrastructure though so I don't know how disruptive a new hook would be.



* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (27 preceding siblings ...)
  2015-04-16  9:18 ` mwahab at gcc dot gnu.org
@ 2015-04-16 11:37 ` amacleod at redhat dot com
  2015-04-16 11:54 ` torvald at gcc dot gnu.org
                   ` (39 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: amacleod at redhat dot com @ 2015-04-16 11:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #31 from Andrew Macleod <amacleod at redhat dot com> ---
(In reply to mwahab from comment #30)
> (In reply to James Greenhalgh from comment #28)
> 
> > I don't think this is particularly onerous for a target. The tough part in
> > all of this is figuring out the minimal cost instructions at the ISA level
> > to use for the various __atomic primitives. Extending support to a stronger
> > model, should be straightforward given explicit documentation of the
> > stronger ordering requirements.
> >
> 
> My objection to using sync patterns is that it does take more work, both for
> the initial implementation and for continuing maintenance. It also means
> adding sync patterns to targets that would not otherwise need them and
> preserving a part of the gcc infrastructure that is only needed for a legacy
> feature and could otherwise be targeted for removal.
> 

Targets that don't need special sync patterns (ie most of them) simply don't
provide them.   The expanders see no sync pattern and use SEQ_CST, exactly like
they do today.

sync patterns would only be provided by targets which do need to do something
different.   This means there is no potential bug introduction on unaffected
targets.

> > This also gives us an easier route to fixing any issues with the
> > acquire/release __sync primitives (__sync_lock_test_and_set and
> > __sync_lock_release) if we decide that these also need to be stronger than
> > their C++11 equivalents.
> 
> This seems like a chunky workaround to avoid having to add a barrier
> representation to gcc.  It's a not-necessarily-straightforward reworking
> of the middle-end to use patterns that would need to be added to
> architectures that don't currently have them and at least checked in the
> architectures that do.

Well, it actually returns to the situation before they were merged. We used
to look for sync patterns too... until I thought they were redundant.

> 
> This seems to come down to what enum memmodel is supposed to be for. If it
> is intended to represent the barriers provided by C11/C++11 atomics and
> nothing else, then a workaround seems unavoidable. If it is meant to
> represent the barriers needed by gcc to compile user code then, it seems to
> me, that it would be better to just add the barrier to the enum and update
> code where necessary.
> 
> Since memmodel is relatively recent, was merged from what looks like the C++
> memory model branch (cxx-mem-model), and doesn't seem to have changed since
> then, it's maybe not surprising that it doesn't already include everything
> needed by gcc. I don't see that adding to it should be prohibited, provided
> the additions can be shown to be strictly required.

The intention was to deprecate __sync and support just the C++ memory model. 
SEQ_CST == SYNC was the original intent and understanding, thus the code merge.
 If we decide we want/need to provide a stronger form of SEQ_CST and call it
SYNC, then we can do that as required... but I'm not envisioning a lot of
future additional memory models. Although never say never, I suppose.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (28 preceding siblings ...)
  2015-04-16 11:37 ` amacleod at redhat dot com
@ 2015-04-16 11:54 ` torvald at gcc dot gnu.org
  2015-04-16 12:13 ` mwahab at gcc dot gnu.org
                   ` (38 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: torvald at gcc dot gnu.org @ 2015-04-16 11:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #32 from torvald at gcc dot gnu.org ---
(In reply to James Greenhalgh from comment #28)
> (In reply to torvald from comment #24)
> > 3) We could do something just on ARM (and scan other arcs for similar
> > issues).  That's perhaps the cleanest option.
> 
> Which leaves 3). From Andrew's two proposed solutions:

3) Also seems best to me.  2) is worst, 1) is too much of a stick.

> This also gives us an easier route to fixing any issues with the
> acquire/release __sync primitives (__sync_lock_test_and_set and
> __sync_lock_release) if we decide that these also need to be stronger than
> their C++11 equivalents.

I don't think we have another case of different __sync vs. __atomics semantics
in the case of __sync_lock_test_and_set.  The current specification makes it
clear that this is an acquire barrier, and the way it describes the semantics
(ie, loads and stores that are program-order before the acquire op can move to
after it) seems to be consistent with the effects C11 specifies for acquire MO
(with perhaps the distinction that C11 is clear that acquire needs to be paired
with some release op to create an ordering constraint).

I'd say this basically also applies to __sync_lock_release, with the exception
that the current documentation does not mention that stores can be speculated
to before the barrier.  That seems to be an artefact of a TSO model.
However, I don't think this matters much, because what the release barrier
allows one to do is reason that if one sees the barrier to have taken place
(eg, observes that the lock has been released), then all ops before the
barrier will also be visible.
It does not guarantee that if one observes an effect that is after the barrier
in program order, the barrier itself will necessarily have taken effect.
To be able to make this observation, one would have to ensure, using __sync
ops, that the other effect is indeed after the barrier, which would mean using
a release op for the other effect -- and that would take care of things.

If everyone agrees with this reasoning, we probably should add documentation
explaining this.

However, I guess some people relying on data races in their programs could
(mis?)understand the __sync_lock_release semantics to mean that it is a means
to get the equivalent of a C11 release *fence* -- which it is not, because the
fence would apply to the (erroneously non-atomic) store after the barrier,
which could lead one to believe that if one observes the store after the
barrier, the fence must also be in effect.  Thoughts?


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (29 preceding siblings ...)
  2015-04-16 11:54 ` torvald at gcc dot gnu.org
@ 2015-04-16 12:13 ` mwahab at gcc dot gnu.org
  2015-04-16 13:27 ` jgreenhalgh at gcc dot gnu.org
                   ` (37 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-04-16 12:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #33 from mwahab at gcc dot gnu.org ---
(In reply to torvald from comment #32)
> (In reply to James Greenhalgh from comment #28)
> > (In reply to torvald from comment #24)
> > > 3) We could do something just on ARM (and scan other arcs for similar
> > > issues).  That's perhaps the cleanest option.
> > 
> > Which leaves 3). From Andrew's two proposed solutions:
> 
> 3) Also seems best to me.  2) is worst, 1) is too much of a stick.
> 
> > This also gives us an easier route to fixing any issues with the
> > acquire/release __sync primitives (__sync_lock_test_and_set and
> > __sync_lock_release) if we decide that these also need to be stronger than
> > their C++11 equivalents.
> 
> I don't think we have another case of different __sync vs. __atomics
> semantics in case of __sync_lock_test_and_set.  The current specification
> makes it clear that this is an acquire barrier, and how it describes the
> semantics (ie, loads and stores that are program-order before the acquire op
> can move to after it) , this seems to be consistent with the effects C11
> specifies for acquire MO (with perhaps the distinction that C11 is clear
> that acquire needs to be paired with some release op to create an ordering
> constraint).

Thanks, I suspect that the acquire barrier may not be as much of an issue as I
had remembered. (The issue came up while I was trying to understand the C11
semantics.)

The test case (aarch64) I have is:
----
int foo = 0;
int bar = 0;
int T5(void)
{
  int x = __sync_lock_test_and_set(&foo, 1);
  return bar;
}
----
.L11:
    ldaxr    w2, [x0]     ; load-acquire
    stxr    w3, w1, [x0] ; store
    cbnz    w3, .L11
    ldr    w0, [x0, 4]  ; load
    ret
----
My concern was that the load could be speculated ahead of the store. Since the
store marks the end of the barrier, that could make it appear as if the load
had completed before the acquire-barrier.

In retrospect, I don't think that there will be a problem because any load that
could be moved would have to end up with the same value as if it had not moved.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (30 preceding siblings ...)
  2015-04-16 12:13 ` mwahab at gcc dot gnu.org
@ 2015-04-16 13:27 ` jgreenhalgh at gcc dot gnu.org
  2015-04-17 15:38 ` torvald at gcc dot gnu.org
                   ` (36 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: jgreenhalgh at gcc dot gnu.org @ 2015-04-16 13:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #35 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
(In reply to torvald from comment #32)
> (In reply to James Greenhalgh from comment #28)
> > This also gives us an easier route to fixing any issues with the
> > acquire/release __sync primitives (__sync_lock_test_and_set and
> > __sync_lock_release) if we decide that these also need to be stronger than
> > their C++11 equivalents.
> 
> I don't think we have another case of different __sync vs. __atomics
> semantics in case of __sync_lock_test_and_set.  The current specification
> makes it clear that this is an acquire barrier, and how it describes the
> semantics (ie, loads and stores that are program-order before the acquire op
> can move to after it) , this seems to be consistent with the effects C11
> specifies for acquire MO (with perhaps the distinction that C11 is clear
> that acquire needs to be paired with some release op to create an ordering
> constraint).

I think that the question is which parts of an RMW operation with
MEMMODEL_ACQUIRE semantics are ordered. My understanding is that in C++11
MEMMODEL_ACQUIRE only applies to the "load" half of the operation. So an
observer of:

  atomic_flag_test_and_set_explicit(foo, memory_order_acquire)
  atomic_store_explicit (bar, 1, memory_order_relaxed)

is permitted to observe a write to bar before a write to foo (but not before
the read from foo).

My reading of the Itanium ABI is that the acquire barrier applies to the entire
operation (Andrew, I think you copied these over exactly backwards in comment
34 ;) ):

  "Disallows the movement of memory references to visible data from
   after the intrinsic (in program order) to before the intrinsic (this
   behavior is desirable at lock-acquire operations, hence the name)."

The definition of __sync_lock_test_and_set is:

  "Behavior:
   • Atomically store the supplied value in *ptr and return the old value
     of *ptr. (i.e.)
       { tmp = *ptr; *ptr = value; return tmp; }
   • Acquire barrier."

So by the strict letter of the specification, no memory references to visible
data should be allowed to move from after the entire body of the intrinsic to
before it. That is to say in:

  __sync_lock_test_and_set (foo, 1)
  bar = 1

an observer should not be able to observe the write to bar before the write to
foo. This is a difference from the C++11 semantics.

I'm not worried about __sync_lock_release, I think the documentation is strong
enough and unambiguous.
^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
@ 2015-04-16 13:29 ` mwahab at gcc dot gnu.org
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-04-16 13:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #36 from mwahab at gcc dot gnu.org ---
(In reply to Andrew Macleod from comment #31)
>
>
> Targets that don't need special sync patterns (ie most of them) simply don't
> provide them.   The expanders see no sync pattern and use SEQ_CST, exactly
> like they do today.

For current targets that may be true, but it assumes that no new target will
need the special sync patterns.

> sync patterns would only be provided by targets which do need to do
> something different.   This means there is no potential bug introduction on
> unaffected targets..

I was thinking of existing sync patterns in current backends which may have
been made redundant by the atomics builtins but are still there. There's a
danger that they'll get used even if they're not preferred for that target.
There's also the question of whether they're getting tested, assuming that they
were only ever generated for the __sync builtins.

> Well, it actually returns us to the situation before they were merged. We
> used to look for sync patterns too... until I thought they were redundant.

I believe that they are redundant (for __sync builtins anyway) since it looks
like everything could be done through the atomics + new barrier.

> The intention was to deprecate __sync and support just the c++ memory model.
> SEQ_CST == SYNC was the original intent and understanding, thus the code
> merge.  If we decide we want/need to provide a stronger form of SEQ_CST and
> call it SYNC, then we can do that as required... but I'm not envisioning a
> lot of future additional memory models. Although never say never I suppose.

I don't expect that many models will be needed, the set should certainly be
kept as small as possible. I think that extending it in this case is justified
because of the gap between what is needed and what is now available.

The choice seems to be between:
- continuing the move away from the syncs to the atomics, which makes the
__sync and the __atomic builtins rely on one infrastructure, or
- keeping both the atomics and the syncs infrastructures, with individual
targets choosing between them.

The first option seems better, not least because it reduces the number of
things that need to be understood when dealing with synchronization primitives.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (31 preceding siblings ...)
  2015-04-16 13:27 ` jgreenhalgh at gcc dot gnu.org
@ 2015-04-17 15:38 ` torvald at gcc dot gnu.org
  2015-04-17 15:48 ` torvald at gcc dot gnu.org
                   ` (35 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: torvald at gcc dot gnu.org @ 2015-04-17 15:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #37 from torvald at gcc dot gnu.org ---
(In reply to James Greenhalgh from comment #35)
> (In reply to torvald from comment #32)
> > (In reply to James Greenhalgh from comment #28)
> > > This also gives us an easier route to fixing any issues with the
> > > acquire/release __sync primitives (__sync_lock_test_and_set and
> > > __sync_lock_release) if we decide that these also need to be stronger than
> > > their C++11 equivalents.
> > 
> > I don't think we have another case of different __sync vs. __atomics
> > semantics in case of __sync_lock_test_and_set.  The current specification
> > makes it clear that this is an acquire barrier, and how it describes the
> > semantics (ie, loads and stores that are program-order before the acquire op
> > can move to after it) , this seems to be consistent with the effects C11
> > specifies for acquire MO (with perhaps the distinction that C11 is clear
> > that acquire needs to be paired with some release op to create an ordering
> > constraint).
> 
> I think that the question is which parts of a RMW operation with
> MEMMODEL_ACQUIRE semantics is ordered. My understanding is that in C++11
> MEMMODEL_ACQUIRE only applies to the "load" half of the operation. So an
> observer to:
> 
>   atomic_flag_test_and_set_explicit(foo, memory_order_acquire)
>   atomic_store_explicit (bar, 1, memory_order_relaxed)

That depends on your observer.  When reasoning about C11, please let's use
complete examples (and ideally, run them through cppmem).  For example, if the
observation is done by a pair of relaxed-MO loads, bar==1 && foo==0 can be
observed:

int main() {
  atomic_int foo = 0; 
  atomic_int bar = 0; 
  {{{ {
    r1 = cas_strong(foo, 0, 1);
    bar.store(1, memory_order_relaxed);
  } ||| {
    r2 = bar.load(memory_order_relaxed).readsvalue(1); 
    r3 = foo.load(memory_order_relaxed).readsvalue(0); 
  } }}};
  return 0; }

> Is permitted to observe a write to bar before a write to foo (but not before
> the read from foo).

How does one observe a load in C11 (ie, so that "before the read from foo" is
defined)?  You can constrain the reads-from of a load, but the load itself is
not observable as an effect.

I think I get what you're trying to say regarding the "load half", but I would
phrase it differently: if there could be another release store to foo that the
CAS/TAS reads-from, then moving the store to bar before the load would risk
creating a cycle between inter-thread happens-before and the intra-thread
happens-before created by sequenced-before.  (Note that the latter isn't
visible to other threads, though, except through additional inter-thread
synchronization.)

Perhaps one could also say that moving the store to bar before the store to foo
could result in the CAS/TAS not being atomic -- but this is only reliably
observable if the foo observation is a CAS/TAS itself (so it's totally
ordered with the other CAS), or if there is an inter-thread happens-before
between the observation of bar and the store to bar (which means using a
release store for bar).

I'm discussing these aspects because I think it matters which observations are
actually possible in a reliable way.  It matters for C11, and it may matter for
the __sync semantics as well because while the __sync functions promise TSO, we
don't really do so for all (nonatomic) loads or stores anywhere else in the
program...

IOW, can we actually come up with reliable and correct counter-examples (either
using the C11 or __sync interfaces) for the guarantees we think ARM might not
provide?

> My reading of the Itanium ABI is that the acquire barrier applies to the
> entire operation (Andrew, I think you copied these over exactly backwards in
> comment 34 ;) ):
> 
>   "Disallows the movement of memory references to visible data from
>    after the intrinsic (in program order) to before the intrinsic (this
>    behavior is desirable at lock-acquire operations, hence the name)."
> 
> The definition of __sync_lock_test_and_set is:
> 
>   "Behavior:
>    • Atomically store the supplied value in *ptr and return the old value
>      of *ptr. (i.e.)
>        { tmp = *ptr; *ptr = value; return tmp; }
>    • Acquire barrier."

Hmm.  I don't think this list is meant to be a sequence; __sync_lock_release
has two items, first the assignment to the memory location, and *second* a
release barrier.  If this were supposed to be a sequence, it wouldn't be
correct for lock releases.  It rather seems that, somehow, the whole intrinsic
is supposed to be a barrier, and be atomic.  Similarly, the __sync RMW ops just
have a single full barrier listed under behavior.

> So by the strict letter of the specification, no memory references to
> visible data should be allowed to move from after the entire body of the
> intrinsic to before it. That is to say in:
> 
>   __sync_lock_test_and_set (foo, 1)
>   bar = 1
> 
> an observer should not be able to observe the write to bar before the write
> to foo. This is a difference from the C++11 semantics.

Can you clarify how this observer would look like?  I think we should look at
both code examples that just use __sync for concurrent accesses, and examples
that also use normal memory accesses as if those would be guaranteed to be
atomic.  None of the __sync docs nor the psABI guarantee any of the latter
AFAIK.  I don't think it would be unreasonable to argue that __sync doesn't
make promises about non-DRF normal accesses, and so, strictly speaking, maybe
programs can't in fact rely on any of the behaviors that are complicated to
implement for ARM.  That's why I'd like to distinguish those two cases.

Currently, I couldn't say with confidence which semantics __sync really should
have, and which semantics we can actually promise in practice.  It feels as if
we need to have a more thorough model / understanding and more detailed info
about the trade-offs before we can make a decision.  It feels kind of risky to
just make a quick decision here that seems alright but which we don't truly
understand -- OTOH, we want to deprecate __sync eventually, so maybe just going
the super-conservative route is most practical.

> I'm not worried about __sync_lock_release, I think the documentation is
> strong enough and unambiguous.

Are you aware that GCC's __sync docs disallow store-store reordering across
__sync_lock_release, whereas the psABI docs don't?
^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (32 preceding siblings ...)
  2015-04-17 15:38 ` torvald at gcc dot gnu.org
@ 2015-04-17 15:48 ` torvald at gcc dot gnu.org
  2015-04-17 18:00 ` amacleod at redhat dot com
                   ` (34 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: torvald at gcc dot gnu.org @ 2015-04-17 15:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #38 from torvald at gcc dot gnu.org ---
(In reply to Andrew Macleod from comment #34)
> > However, I guess some people relying on data races in their programs could
> > (mis?)understand the __sync_lock_release semantics to mean that it is a
> > means to get the equivalent of a C11 release *fence* -- which it is not
> > because the fence would apply to the (erroneously non-atomic) store after
> > the barrier, which could one lead to believe that if one observes the store
> > after the barrier, the fence must also be in effect.  Thoughts?
> 
> before we get too carried away, maybe we should return to what we *think*
> __sync are suppose to do. It represents a specific definition by intel..
> From the original documentation for __sync "back in the day", and all legacy
> uses of sync should expect this behaviour:

The problem I see with that is that I don't think that just looking at the
psABI gives you enough information to really reason about what you are allowed
to do.  Strictly speaking, the psABI doesn't give you guarantees about
normal memory accesses that are not made data-race-free (through use of the
__sync builtins).  Nonetheless, legacy code does use them in combination with
the __sync builtins.

Also, if you look at the IA-64 __sync_lock_release vs. GCC docs'
__sync_lock_release, the latter is like x86/TSO.  Do you have any info on which
other semantics __sync was supposed to adhere to?

One potential way to solve it would be to just require code that uses __sync to
more or less implement an IA-64 or x86 memory model, modulo allowing
compiler-reordering and optimization between adjacent non-__sync memory
accesses.  This could be inefficient on ARM (see James' examples) and perhaps
Power too (or not -- see Jakub's comments).


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (33 preceding siblings ...)
  2015-04-17 15:48 ` torvald at gcc dot gnu.org
@ 2015-04-17 18:00 ` amacleod at redhat dot com
  2015-04-20 13:42 ` mwahab at gcc dot gnu.org
                   ` (33 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: amacleod at redhat dot com @ 2015-04-17 18:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #39 from Andrew Macleod <amacleod at redhat dot com> ---
(In reply to torvald from comment #38)
> (In reply to Andrew Macleod from comment #34)
> > > However, I guess some people relying on data races in their programs could
> > > (mis?)understand the __sync_lock_release semantics to mean that it is a
> > > means to get the equivalent of a C11 release *fence* -- which it is not
> > > because the fence would apply to the (erroneously non-atomic) store after
> > > the barrier, which could one lead to believe that if one observes the store
> > > after the barrier, the fence must also be in effect.  Thoughts?
> > 
> > before we get too carried away, maybe we should return to what we *think*
> > __sync are suppose to do. It represents a specific definition by intel..
> > From the original documentation for __sync "back in the day", and all legacy
> > uses of sync should expect this behaviour:
> 
> The problem I see with that is that I don't think that just looking at the
> psABI gives you enough information to really reason about what you are
> allowed to do or not.  Strictly speaking, the psABI doesn't give you
> guarantees about normal memory accesses that are not data-race-free (through
> use of the __sync builtins).  Nonetheless, legacy code does use them in a
> combination with the __sync builtins.
> 
> Also, if you look at the IA-64 __sync_lock_release vs. GCC docs'
> __sync_lock_release, the latter is like x86/TSO.  Do you have any info on
> which other semantics __sync was supposed to adhere to?
>

No, __sync was simply an implementation of the psABI back when it was new...
I'm not aware of any additions, enhancements, or guarantees that were added
when it was ported to other architectures.

Terminology was much looser 14 years ago :-)  That's one of the reasons we want
to migrate to __atomic... it is supposedly more precisely defined, whereas
__sync had some hand-waving.  We're now experiencing some different
interpretations of that.  Regardless of the documentation, we didn't think
we'd be supporting something stronger than SEQ_CST, since they were supposed to
be equivalent...

when rth gets back he may have some opinion as well.



> One potential way to solve it would be to just require code that uses __sync
> to more or less implement an IA-64 or x86 memory model, modulo allowing
> compiler-reordering and optimization between adjacent non-__sync memory
> accesses.  This could be inefficient on ARM (see James' examples) and
> perhaps Power too (or not -- see Jakub's comments).


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (34 preceding siblings ...)
  2015-04-17 18:00 ` amacleod at redhat dot com
@ 2015-04-20 13:42 ` mwahab at gcc dot gnu.org
  2015-04-20 15:17 ` mwahab at gcc dot gnu.org
                   ` (32 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-04-20 13:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #40 from mwahab at gcc dot gnu.org ---
(In reply to Andrew Macleod from comment #25)

> Documentation needs updating for sure... The rules have changed under us
> since originally SEQ_CST and sync were intended to be the same thing...
> Guess I shouldn't have tied our horse to the C++11 cart :-)

I've submitted a patch to try to update the __atomics documentation:
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01025.html.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (35 preceding siblings ...)
  2015-04-20 13:42 ` mwahab at gcc dot gnu.org
@ 2015-04-20 15:17 ` mwahab at gcc dot gnu.org
  2015-04-21 18:51 ` rth at gcc dot gnu.org
                   ` (31 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-04-20 15:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #41 from mwahab at gcc dot gnu.org ---
(In reply to torvald from comment #38)
> (In reply to Andrew Macleod from comment #34)
> 
> Also, if you look at the IA-64 __sync_lock_release vs. GCC docs'
> __sync_lock_release, the latter is like x86/TSO.  Do you have any info on
> which other semantics __sync was supposed to adhere to?
> 
> One potential way to solve it would be to just require code that uses __sync
> to more or less implement an IA-64 or x86 memory model, modulo allowing
> compiler-reordering and optimization between adjacent non-__sync memory
> accesses.  This could be inefficient on ARM (see James' examples) and
> perhaps Power too (or not -- see Jakub's comments).

If the __sync barriers are as described in the GCC manual, that is, a barrier
is atomic and its restrictions apply to all data references, then the AArch64
backend doesn't currently emit strong enough barriers.

For MEMMODEL_SEQ_CST, the problem was visible enough, and the solution I
suggested (extending the set of available memmodel types) was simple enough,
that the changes it would need could be justified. I don't think that's true
for the MEMMODEL_ACQUIRE case, which seems much less likely to be seen and
would be rather more disruptive.

I believe that AArch64 is the only current target where the code needs to be
strengthened. Since extending the set of memmodels is difficult to justify and
(IMO) so is resurrecting the __sync patterns, I suggest just adding a target
hook to allow the expansion of __sync calls to be overridden. That will allow
AArch64 to set a target-specific memmodel value, as is currently allowed, which
can then be passed through the existing __atomics mechanisms in the middle end
through to the AArch64 backend. No other backend will need to be touched.

If it happens that future architectures have a similar problem then we can
reconsider whether any changes need to be made in the target-independent
expansions.

Does that sound like a reasonable approach?


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (36 preceding siblings ...)
  2015-04-20 15:17 ` mwahab at gcc dot gnu.org
@ 2015-04-21 18:51 ` rth at gcc dot gnu.org
  2015-04-28 15:32 ` jgreenhalgh at gcc dot gnu.org
                   ` (30 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: rth at gcc dot gnu.org @ 2015-04-21 18:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #42 from Richard Henderson <rth at gcc dot gnu.org> ---
(In reply to Andrew Macleod from comment #39)
> no, __sync was simply an implementation of psABI back when it was new... I'm
> not aware of any additions, enhancements or guarantees that were added when
> it was ported to other arch's.  
> 
> Terminology was much looser 14 years ago :-)  That's one of the reasons we
> want to migrate to __atomic... it is supposedly more precisely defined,
> whereas __sync had some hand-waving.  We're now experiencing some different
> interpretations of that.    Regardless of the documentation, we didn't think
> we'd be supporting something stronger than SEQ_CST since they were supposed
> to be equivalent... 

Exactly right.  I don't believe there's anywhere we can look for 
more definitive semantics than the psABI.  And as already explored
here, that's not entirely helpful.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (37 preceding siblings ...)
  2015-04-21 18:51 ` rth at gcc dot gnu.org
@ 2015-04-28 15:32 ` jgreenhalgh at gcc dot gnu.org
  2015-04-29  8:47 ` mwahab at gcc dot gnu.org
                   ` (29 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: jgreenhalgh at gcc dot gnu.org @ 2015-04-28 15:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #43 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
(In reply to torvald from comment #37)
> (In reply to James Greenhalgh from comment #35)
> > (In reply to torvald from comment #32)
> > > (In reply to James Greenhalgh from comment #28)
> > > > This also gives us an easier route to fixing any issues with the
> > > > acquire/release __sync primitives (__sync_lock_test_and_set and
> > > > __sync_lock_release) if we decide that these also need to be stronger than
> > > > their C++11 equivalents.
> > > 
> > > I don't think we have another case of different __sync vs. __atomics
> > > semantics in case of __sync_lock_test_and_set.  The current specification
> > > makes it clear that this is an acquire barrier, and how it describes the
> > > semantics (ie, loads and stores that are program-order before the acquire op
> > > can move to after it) , this seems to be consistent with the effects C11
> > > specifies for acquire MO (with perhaps the distinction that C11 is clear
> > > that acquire needs to be paired with some release op to create an ordering
> > > constraint).
> > 
> > I think that the question is which parts of a RMW operation with
> > MEMMODEL_ACQUIRE semantics is ordered. My understanding is that in C++11
> > MEMMODEL_ACQUIRE only applies to the "load" half of the operation. So an
> > observer to:
> > 
> >   atomic_flag_test_and_set_explicit(foo, memory_order_acquire)
> >   atomic_store_exlicit (bar, 1, memory_model_relaxed)
>  
> That depends on your observer.  When reasoning about C11, please let's use
> complete examples (and ideally, run them through cppmem).  For example, if
> the observation is done by a pair of relaxed-MO loads, bar==1 && foo==0 can
> be observed:
> 
> int main() {
>   atomic_int foo = 0; 
>   atomic_int bar = 0; 
>   {{{ {
>     r1 = cas_strong(foo, 0, 1);
>     bar.store(1, memory_order_relaxed);
>   } ||| {
>     r2 = bar.load(memory_order_relaxed).readsvalue(1); 
>     r3 = foo.load(memory_order_relaxed).readsvalue(0); 
>   } }}};
>   return 0; }

Thanks for that, and sorry again for my sloppy terminology. I did try to write
a cppmem example, but I couldn't find the syntax for a CAS. If I could have,
this is what I would have written; it gets the results I had expected and was
trying to express.

> 
> > Is permitted to observe a write to bar before a write to foo (but not before
> > the read from foo).
> 
> How does one observe a load in C11 (ie, so that "before the read from foo"
> is defined)?  You can constrain the reads-from of a load, but the load
> itself is not observable as an effect.

Sorry, this is ARM memory model terminology leaking across to my C11 examples,
but even then what I was saying doesn't make sense. To observe a load in the
ARM model:

"A read of a location in memory is said to be observed by an observer when a
 subsequent write to the location by the same observer has no effect on the
 value returned by the read."

But you are right, this is not interesting to model.


> I think I get what you're trying to say regarding the "load half" but I
> would phrase it differently: If there could be another release store to foo
> that the CAS/TAS reads-from, then moving the store to bar before the load
> would risk creating a cycle between inter-thread happens-before and
> sequenced-before-created intra-thread happens-before.  (Note that the later
> isn't visible to other threads though, except through additional
> inter-thread synchronization.)
> 
> Perhaps one could also say that moving the store to bar before the store to
> foo could result in the CAS/TAS not being atomic -- but this is just
> reliably observable if the foo observation is a CAS/TAS by itself (so it's
> totally ordered with the other CAS), or if there is a inter-thread
> happens-before between the observation of bar and the store to bar (which
> means using a release store for bar).
> 
> I'm discussing these aspects because I think it matters which observations
> are actually possible in a reliable way.  It matters for C11, and it may
> matter for the __sync semantics as well because while the __sync functions
> promise TSO, we don't really do so for all (nonatomic) loads or stores
> anywhere else in the program...
> 
> IOW, can we actually come up with reliable and correct counter-examples
> (either using the C11 or __sync interfaces) for the guarantees we think ARM
> might not provide?
> 
> > My reading of the Itanium ABI is that the acquire barrier applies to the
> > entire operation (Andrew, I think you copied these over exactly backwards in
> > comment 34 ;) ):
> > 
> >   "Disallows the movement of memory references to visible data from
> >    after the intrinsic (in program order) to before the intrinsic (this
> >    behavior is desirable at lock-acquire operations, hence the name)."
> > 
> > The definition of __sync_lock_test_and_set is:
> > 
> >   "Behavior:
> >    • Atomically store the supplied value in *ptr and return the old value
> >      of *ptr. (i.e.)
> >        { tmp = *ptr; *ptr = value; return tmp; }
> >    • Acquire barrier."
> 
> Hmm.  I don't think this list is meant to be a sequence; __sync_lock_release
> has two items, first the assignment to the memory location, and *second* a
> release barrier.  If this were supposed to be a sequence, it wouldn't be
> correct for lock releases.  It rather seems that, somehow, the whole
> intrinsic is supposed to be a barrier, and be atomic.  Similarly, the __sync
> RMW ops just have a single full barrier listed under behavior.

Yes, I agree and that was the interpretation I was getting at. I believe that
this is a consequence of the Itanium "cmpxchg" instruction, which implements
the intrinsic in one instruction, and thus does not need to care about its
decomposition.

> > So by the strict letter of the specification, no memory references to
> > visible data should be allowed to move from after the entire body of the
> > intrinsic to before it. That is to say in:
> > 
> >   __sync_lock_test_and_set (foo, 1)
> >   bar = 1
> > 
> > an observer should not be able to observe the write to bar before the write
> > to foo. This is a difference from the C++11 semantics.
> 
> Can you clarify how this observer would look like?  I think we should look
> at both code examples that just use __sync for concurrent accesses, and
> examples that also use normal memory accesses as if those would be
> guaranteed to be atomic.  None of the __sync docs nor the psABI guarantee
> any of the latter AFAIK.  I don't think it would be unreasonable to argue
> that __sync doesn't make promises about non-DRF normal accesses, and so,
> strictly speaking, maybe programs can't in fact rely on any of the behaviors
> that are complicated to implement for ARM.  That's why I'd like to
> distinguish those two cases.

Sure, it would look much like your cppmem example above, though you can add
some barriers to the observer if you want to make a stronger example:

bar = 0, foo = 0;

thread_a {
  __sync_lock_test_and_set (foo, 1)
  bar = 1
}

thread_b {
  /* If we can see the write to bar, the write
     to foo must also have happened.  */
  if (bar) /* Reads 1.  */
   assert (foo) /* Should never read 0.  */
}

> Currently, I couldn't say with confidence which semantics __sync really
> should have, and which semantics we can actually promise in practice.  It
> feels as if we need to have a more thorough model / understanding and more
> detailed info about the trade-offs before we can make a decision.  It feels
> kind of risky to just make a quick decision here that seems alright but
> which we don't truly understand -- OTOH, we want to deprecate __sync
> eventually, so maybe just going the super-conservative route is most
> practical.

Yes agreed.

> > I'm not worried about __sync_lock_release, I think the documentation is
> > strong enough and unambiguous.
> 
> Are you aware that the GCC's __sync disallow store-store reordering across
> __sync_lock_release, whereas the psABI docs don't?

No I was not, and even looking exactly for the text you were referring to, it
took me three attempts to spot it. Yes, I understand now why you are concerned
about the GCC wording. Perhaps this is just an artefact of a mistake
transcribing the psABI?

AArch64 is certainly not providing that guarantee just now.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (38 preceding siblings ...)
  2015-04-28 15:32 ` jgreenhalgh at gcc dot gnu.org
@ 2015-04-29  8:47 ` mwahab at gcc dot gnu.org
  2015-04-29 12:20 ` jgreenhalgh at gcc dot gnu.org
                   ` (28 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-04-29  8:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #44 from mwahab at gcc dot gnu.org ---
(In reply to James Greenhalgh from comment #43)
> (In reply to torvald from comment #37)
> 
> > > I'm not worried about __sync_lock_release, I think the documentation is
> > > strong enough and unambiguous.
> > 
> > Are you aware that the GCC's __sync disallow store-store reordering across
> > __sync_lock_release, whereas the psABI docs don't?
> 
> No I was not, and even looking exactly for the text you were referring to,
> it took
> me three attempts to spot it. Yes, I understand now why you are concerned
> about
> the GCC wording. Perhaps this is just an artefact of a mistake transcribing
> the psABI?
> 
> AArch64 is certainly not providing that guarantee just now.

I wasn't able to find the text; could you copy it into a reply?
In the GCC manual, the only description of __sync_lock_release behaviour is in
the last paragraph. That description seems consistent with the function being
a release barrier and with the current AArch64 code generated for it.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (39 preceding siblings ...)
  2015-04-29  8:47 ` mwahab at gcc dot gnu.org
@ 2015-04-29 12:20 ` jgreenhalgh at gcc dot gnu.org
  2015-04-29 13:26 ` mwahab at gcc dot gnu.org
                   ` (27 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: jgreenhalgh at gcc dot gnu.org @ 2015-04-29 12:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #45 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
(In reply to mwahab from comment #44)
> (In reply to James Greenhalgh from comment #43)
> > (In reply to torvald from comment #37)
> > 
> > > > I'm not worried about __sync_lock_release, I think the documentation is
> > > > strong enough and unambiguous.
> > > 
> > > Are you aware that the GCC's __sync disallow store-store reordering across
> > > __sync_lock_release, whereas the psABI docs don't?
> > 
> > No I was not, and even looking exactly for the text you were referring to,
> > it took
> > me three attempts to spot it. Yes, I understand now why you are concerned
> > about
> > the GCC wording. Perhaps this is just an artefact of a mistake transcribing
> > the psABI?
> > 
> > AArch64 is certainly not providing that guarantee just now.
> 
> I wasn't able to find the text, could you copy it in a reply. 
> In the GCC manual, the only description of __sync_lock_release behaviour is
> in the last paragraph. That descriptions seems consistent with the function
> being a release barrier and with the current Aarch64 code generated for it.

Yes, so this part is fine...

  void __sync_lock_release (type *ptr, ...)

  This built-in function releases the lock acquired by
  __sync_lock_test_and_set. Normally this means writing the
  constant 0 to *ptr.

  This built-in function is not a full barrier, but rather a release barrier.
  This means that all previous memory stores are globally visible, and all
  previous memory loads have been satisfied,

And this final sentence is buggy by omission of a mention of memory writes:

  but following memory reads are not prevented from being speculated to
  before the barrier.

That can be read as forbidding store-store reordering across the barrier.

As I say though, I think this is an artefact of transcribing the documentation,
rather than an attempt to provide stronger ordering semantics in the GCC
implementation.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (40 preceding siblings ...)
  2015-04-29 12:20 ` jgreenhalgh at gcc dot gnu.org
@ 2015-04-29 13:26 ` mwahab at gcc dot gnu.org
  2015-04-29 16:04 ` amacleod at redhat dot com
                   ` (26 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-04-29 13:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #46 from mwahab at gcc dot gnu.org ---
(In reply to James Greenhalgh from comment #45)
> (In reply to mwahab from comment #44)
>
> And this final sentence is buggy by omission of a mention of memory writes:
> 
>   but following memory reads are not prevented from being speculated to
>   before the barrier.
> 
> Which can be read as forbidding store-store reordering across the barrier.

It doesn't say anything about store-store reordering, so it's trumped by the
stated intention to be compatible with the psABI.

> As I say though, I think this is an artefact of transcribing the
> documentation, rather than an attempt to provide stronger ordering semantics
> in the GCC implementation.

Agreed.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (41 preceding siblings ...)
  2015-04-29 13:26 ` mwahab at gcc dot gnu.org
@ 2015-04-29 16:04 ` amacleod at redhat dot com
  2015-04-30  8:20 ` mwahab at gcc dot gnu.org
                   ` (25 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: amacleod at redhat dot com @ 2015-04-29 16:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #47 from Andrew Macleod <amacleod at redhat dot com> ---
Created attachment 35425
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35425&action=edit
potential patch to add MEMMODEL_SYNC

I don't know where we've finally settled on this, but I did prototype out what
is involved in adding a new MEMMODEL_SYNC, which can be identified and made
stronger than MEMMODEL_SEQ_CST if so desired. Normally it will behave exactly
like MEMMODEL_SEQ_CST.   This bootstraps on x86-64 with no testsuite
regressions.

Other than using the new memmodel in the expanders for the __sync routines,
there were 3 common idioms that needed changing:
- a few places assumed SEQ_CST was the end of the enumeration; changed those to
check for MEMMODEL_LAST instead.
- if (model == MEMMODEL_SEQ_CST) needed changing to
if (model >= MEMMODEL_SEQ_CST).
- case MEMMODEL_SEQ_CST:  needed a case MEMMODEL_SYNC: added.

It's actually not that invasive, and should have no impact on any existing
target.  If a target feels they need to strengthen the __sync cases, then they
can look for MEMMODEL_SYNC in their rtl atomic patterns and do something
different.

I did a quick and dirty test on x86_64 to make the atomic_sub pattern issue an
extra lock if it sees MEMMODEL_SYNC, and that worked fine.

MEMMODEL_SYNC is not exposed in any way to the user, it is only generated from
the __sync expanders, and the compare_swap loop generator.

If someone wants to fool around with it and see if this resolves their issues,
go for it.  

If we decide we don't need to support a stronger __sync model and leave things
as originally intended, then we simply don't use the patch.

If we do decide to go ahead with it, I will also do a comparison run on the
__sync tests in the testsuite to verify that the code generated is in fact
identical and I didn't miss anything.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (42 preceding siblings ...)
  2015-04-29 16:04 ` amacleod at redhat dot com
@ 2015-04-30  8:20 ` mwahab at gcc dot gnu.org
  2015-05-01 12:53 ` torvald at gcc dot gnu.org
                   ` (24 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-04-30  8:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #48 from mwahab at gcc dot gnu.org ---
(In reply to Andrew Macleod from comment #47)
> Created attachment 35425 [details]
> potential patch to add MEMMODEL_SYNC
> 
> I don't know where we've finally settled on this, but I did prototype out
> what involved in adding a new MEMMODEL_SYNC which can be identified and made
> stronger than MEMMODEL_SEQ_CST if so desired. Normally it will behave
> exactly like MEMMODEL_SEQ_CST.   This bootstraps on x86-64 with no testsuite
> regressions.
> 

I've been working on patches for this. Adding a new MEMMODEL type was my first
approach because it means that gcc has one representation for all the memory
models it supports. I decided not to submit the patch because AArch64 also
needs a tag for the __sync acquire used in __sync_lock_test_and_set. Adding
that touches more backends than MEMMODEL_SYNC does, and some of those changes
are not obvious (for the mips backend in particular). The reasons why AArch64
needs a new acquire barrier are rather tenuous (they're in the earlier
comments) and don't seem to justify the possibly invasive changes that would
be needed. 

The approach I'm working on now is to add a target hook to allow a backend to
supply its own expansion of calls to the __sync builtins. That makes for
smaller changes in the target-independent code and lets the Aarch64 backend add
the new barriers as target-specific memory models. The actual code generation
goes through the infrastructure for the atomics.

Adding the __sync barriers to coretypes.h is the better approach if more than
one backend has this problem. Since it seems that only AArch64 is affected, I
think it's better to go with the target hook. If a new architecture gets added
with the same problem, it will be easy enough to switch to the coretypes.h
approach.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (43 preceding siblings ...)
  2015-04-30  8:20 ` mwahab at gcc dot gnu.org
@ 2015-05-01 12:53 ` torvald at gcc dot gnu.org
  2015-05-06 14:25 ` amacleod at redhat dot com
                   ` (23 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: torvald at gcc dot gnu.org @ 2015-05-01 12:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #49 from torvald at gcc dot gnu.org ---
(In reply to James Greenhalgh from comment #43)
> (In reply to torvald from comment #37)
> > (In reply to James Greenhalgh from comment #35)
> > > So by the strict letter of the specification, no memory references to
> > > visible data should be allowed to move from after the entire body of the
> > > intrinsic to before it. That is to say in:
> > > 
> > >   __sync_lock_test_and_set (foo, 1)
> > >   bar = 1
> > > 
> > > an observer should not be able to observe the write to bar before the write
> > > to foo. This is a difference from the C++11 semantics.
> > 
> > Can you clarify how this observer would look like?  I think we should look
> > at both code examples that just use __sync for concurrent accesses, and
> > examples that also use normal memory accesses as if those would be
> > guaranteed to be atomic.  None of the __sync docs nor the psABI guarantee
> > any of the latter AFAIK.  I don't think it would be unreasonable to argue
> > that __sync doesn't make promises about non-DRF normal accesses, and so,
> > strictly speaking, maybe programs can't in fact rely on any of the behaviors
> > that are complicated to implement for ARM.  That's why I'd like to
> > distinguish those two cases.
> 
> Sure, it would look much like your cppmem example above, though you can add
> some
> barriers to the observer if you want to make a stronger example
> 
> bar = 0, foo = 0;
> 
> thread_a {
>   __sync_lock_test_and_set (foo, 1)
>   bar = 1
> }
> 
> thread_b {
>   /* If we can see the write to bar, the write
>      to foo must also have happened.  */
>   if (bar) /* Reads 1.  */
>    assert (foo) /* Should never read 0.  */
> }

This is the case of allowing non-DRF normal accesses.  The *other* case I was
thinking about is how the test would have to look when *not* allowing them.
One way to do it would be:

thread_a {
  __sync_lock_test_and_set (foo, 1)
  __sync_lock_test_and_set (bar, 1) // or __sync_lock_release, or __sync RMW
}

thread_b {
  if (__sync_fetch_and_add (bar, 0))
    assert (foo)  // DRF if thread_a's write is the final one
}

In this case, would the current ARM implementation still produce insufficient
code?  If not, at least in this test case, we could argue that there's nothing
wrong with what ARM does.  (The question whether we want to require DRF
strictly for __sync usage is of course still open.)

> > > I'm not worried about __sync_lock_release, I think the documentation is
> > > strong enough and unambiguous.
> > 
> > Are you aware that the GCC's __sync disallow store-store reordering across
> > __sync_lock_release, whereas the psABI docs don't?
> 
> No I was not, and even looking exactly for the text you were referring to,
> it took
> me three attempts to spot it. Yes, I understand now why you are concerned
> about
> the GCC wording. Perhaps this is just an artefact of a mistake transcribing
> the psABI?

I suppose so, but can't say for sure.  Somebody might have had x86 TSO in mind
when writing that, and perhaps not just accidentally.

However, we say psABI is the reference spec, so I agree we should change the
GCC __sync_lock_release docs accordingly.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (44 preceding siblings ...)
  2015-05-01 12:53 ` torvald at gcc dot gnu.org
@ 2015-05-06 14:25 ` amacleod at redhat dot com
  2015-05-06 15:58 ` mwahab at gcc dot gnu.org
                   ` (22 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: amacleod at redhat dot com @ 2015-05-06 14:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

Andrew Macleod <amacleod at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #35425|0                           |1
        is obsolete|                            |

--- Comment #50 from Andrew Macleod <amacleod at redhat dot com> ---
Created attachment 35478
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35478&action=edit
implement SYNC flag for memory model

> I've been working on patches for this. Adding a new MEMMODEL type was my
> first approach because it means that gcc has one representation for all the
> memory models it supports. I decided not to submit the patch because Aarch64
> also needs an tag for the __sync acquire used in __sync_lock_test_and_set.
> Adding that touches more backends than the MEMMODEL_SYNC and some of those
> changes are not obvious (for the mips backend in particular). The reasons
> why Aarch64 needs a new acquire barrier are rather tenuous (they're in the
> earlier comments) and don't seem to justify the possibly invasive changes
> that would be needed. 
> 
> The approach I'm working on now is to add a target hook to allow a backend
> to supply its own expansion of calls to the __sync builtins. That makes for
> smaller changes in the target-independent code and lets the Aarch64 backend
> add the new barriers as target-specific memory models. The actual code
> generation goes through the infrastructure for the atomics.
> 
> Adding the __sync barriers to coretypes.h is the better approach if more
> than one backend has this problem. Since it seems that only Aarch64 is
> affected, I think its better to go with the target hook. If a new
> architecture gets added with the same problem, it will be easy enough to
> switch to the coretypes.h approach.

Well, if it affects some target now, it's likely to affect some target in the
future as well. Didn't someone also mention there is a potential issue with PPC
somewhere too?

In any case, I'm not a fan of adding target hooks for this... We ought to just
bite the bullet and integrate it cleanly into the memory model. 

I have a patch which adds a SYNC flag to the upper bit of the memory model. It
modifies all the generic code and target patterns to use access routines
(declared in tree.h) instead of hard coding masks and flags (which probably
should have been done in the first place). This makes carrying around the SYNC
flag transparent until you want to look for it.

There are new sync enum variants for SYNC_ACQUIRE, SYNC_SEQ_CST, and
SYNC_RELEASE. 

The new access routines ignore the sync flag, so "is_mm_acquire (model)" will
be true for MEMMODEL_ACQUIRE and MEMMODEL_SYNC_ACQUIRE. If a target doesn't
care about differentiating sync (which is most targets) it will be transparent.

It is also simple to check for the presence of the sync flag ("is_mm_sync
(model)") or for a particular MEMMODEL_SYNC variant ("model ==
MEMMODEL_SYNC_SEQ_CST"). A quick hack to a couple of x86 patterns indicates
that this works and that I can issue extra/different barriers for the sync
variants.

This compiles on all targets, but is only runtime tested on
x86_64-unknown-linux-gnu with no regressions.

With this you should be able to easily issue whatever different insn sequence
you need to within an atomic pattern.   Give it a try.  If it works as
required, then we'll see about executing on all targets for testing...


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (45 preceding siblings ...)
  2015-05-06 14:25 ` amacleod at redhat dot com
@ 2015-05-06 15:58 ` mwahab at gcc dot gnu.org
  2015-05-07 13:25 ` mwahab at gcc dot gnu.org
                   ` (21 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-05-06 15:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #51 from mwahab at gcc dot gnu.org ---
(In reply to Andrew Macleod from comment #50)
> Created attachment 35478 [details]
> implement SYNC flag for memory model
> 
> > Adding the __sync barriers to coretypes.h is the better approach if more
> > than one backend has this problem. Since it seems that only Aarch64 is
> > affected, I think it's better to go with the target hook. If a new
> > architecture gets added with the same problem, it will be easy enough to
> > switch to the coretypes.h approach.
> 
> Well, if it affects some target now, it's likely to affect some target in the
> future as well. Didn't someone also mention there is a potential issue with
> PPC somewhere too?

The earlier comment (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697#c24)
suggests that PPC generates full barriers so won't be affected by this. It
looks like Aarch64 is the only current target with the problem.

> In any case, I'm not a fan of adding target hooks for this... We ought to
> just bite the bullet and integrate it cleanly into the memory model. 

OK.

> I have a patch which adds a SYNC flag to the upper bit of the memory model.
> It modifies all the generic code and target patterns to use access routines
> (declared in tree.h) instead of hard coding masks and flags (which probably
> should have be done in the first place). This makes carrying around the SYNC
> flag transparent until you want to look for it.

From a quick look, this seems to do what's needed to allow the Aarch64 backend
to distinguish the sync and atomic code.

> This compiles on all targets, but is only runtime tested on
> x86_64-unknown-linux-gnu with no regressions.

The mips backend was the only one that stood out as needing some care, because
the way it uses the memory models (e.g. in function mips_process_sync_loop) is
a little different from the other backends.

> With this you should be able to easily issue whatever different insn
> sequence you need to within an atomic pattern.   Give it a try.  If it works
> as required, then we'll see about executing on all targets for testing...

I'll give it a go.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (46 preceding siblings ...)
  2015-05-06 15:58 ` mwahab at gcc dot gnu.org
@ 2015-05-07 13:25 ` mwahab at gcc dot gnu.org
  2015-05-08  8:01 ` mwahab at gcc dot gnu.org
                   ` (20 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-05-07 13:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #53 from mwahab at gcc dot gnu.org ---
(In reply to Andrew Macleod from comment #50)
> Created attachment 35478 [details]
> implement SYNC flag for memory model
> 
> This compiles on all targets, but is only runtime tested on
> x86_64-unknown-linux-gnu with no regressions.

The patch passes check-gcc for aarch64-none-linux-gnu with no new failures.
I'll also run the tests on arm.

> With this you should be able to easily issue whatever different insn
> sequence you need to within an atomic pattern.   Give it a try.  If it works
> as required, then we'll see about executing on all targets for testing...

I modified the Aarch64 backend to use the new is_mm_sync test to generate
stronger barriers when needed and it all works as required.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (47 preceding siblings ...)
  2015-05-07 13:25 ` mwahab at gcc dot gnu.org
@ 2015-05-08  8:01 ` mwahab at gcc dot gnu.org
  2015-05-11 13:44 ` jgreenhalgh at gcc dot gnu.org
                   ` (19 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-05-08  8:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #54 from mwahab at gcc dot gnu.org ---
(In reply to mwahab from comment #53)
> (In reply to Andrew Macleod from comment #50)
> > Created attachment 35478 [details]
> > implement SYNC flag for memory model
> > 
> > This compiles on all targets, but is only runtime tested on
> > x86_64-unknown-linux-gnu with no regressions.
> 
> The patch passes check-gcc for aarch64-none-linux-gnu with no new failures.
> I'll also run the tests on arm.
> 

Also passes check-gcc for arm-none-eabi.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (48 preceding siblings ...)
  2015-05-08  8:01 ` mwahab at gcc dot gnu.org
@ 2015-05-11 13:44 ` jgreenhalgh at gcc dot gnu.org
  2015-05-11 15:58 ` mwahab at gcc dot gnu.org
                   ` (18 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: jgreenhalgh at gcc dot gnu.org @ 2015-05-11 13:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #55 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
(In reply to torvald from comment #49)
> > bar = 0, foo = 0;
> > 
> > thread_a {
> >   __sync_lock_test_and_set (foo, 1)
> >   bar = 1
> > }
> > 
> > thread_b {
> >   /* If we can see the write to bar, the write
> >      to foo must also have happened.  */
> >   if (bar) /* Reads 1.  */
> >    assert (foo) /* Should never read 0.  */
> > }
> 
> This is the case of allowing non-DRF normal accesses.  The *other* case I
> was thinking about is how the test would have to look like when *not*
> allowing them.  One way to do it would be:
> 
> thread_a {
>   __sync_lock_test_and_set (foo, 1)
>   __sync_lock_test_and_set (bar, 1) // or __sync_lock_release, or __sync RMW
> }
> 
> thread_b {
>   if (__sync_fetch_and_add (bar, 0))
>     assert (foo)  // DRF if thread_a's write is the final one
> }
> 
> In this case, would the current ARM implementation still produce
> insufficient code?  If not, at least in this test case, we could argue that
> there's nothing wrong with what ARM does.  (The question whether we wan't to
> require DRF strictly for __sync usage is of course still open.)

In this case, the current implementation would be fine. Thread A looks like
this:

thread_a:
        adrp    x0, foo
        mov     w1, 1
        ldr     x0, [x0, #:lo12:foo]
.L2:
        ldaxr   w2, [x0] /* Load acquire foo.  */
        stxr    w3, w1, [x0] /* Store release foo.  */
        cbnz    w3, .L2 /* Branch if not exclusive access.  */
        adrp    x0, bar
        ldr     x0, [x0, #:lo12:bar]
.L3:
        ldaxr   w2, [x0] /* Load acquire bar.  */
        stxr    w3, w1, [x0] /* Store release bar.  */
        cbnz    w3, .L3
        ret

And the architecture gives a specific requirement on the ordering of
store-release and load-acquire:

A Store-Release followed by a Load-Acquire is observed in program order by any
observers that are in both:
  — The shareability domain of the address accessed by the Store-Release.
  — The shareability domain of the address accessed by the Load-Acquire.

So yes, I think in this case we could argue that there is nothing wrong with
what ARM does. However, I would expect the non-DRF code to be much more common
in the wild, so I think we still need to deal with this issue. (It is a shame
that the DRF code you provided will suffer from an extra barrier if
Matthew/Andrew's work is applied, but I think this is a corner case which we
probably don't want to put too much thought into working around.)

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (49 preceding siblings ...)
  2015-05-11 13:44 ` jgreenhalgh at gcc dot gnu.org
@ 2015-05-11 15:58 ` mwahab at gcc dot gnu.org
  2015-05-12 20:02 ` amacleod at redhat dot com
                   ` (17 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-05-11 15:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #56 from mwahab at gcc dot gnu.org ---
(In reply to James Greenhalgh from comment #55)
> (In reply to torvald from comment #49)
>
> > This is the case of allowing non-DRF normal accesses.  The *other* case I
> > was thinking about is how the test would have to look like when *not*
> > allowing them.  One way to do it would be:
> > 
> > thread_a {
> >   __sync_lock_test_and_set (foo, 1)
> >   __sync_lock_test_and_set (bar, 1) // or __sync_lock_release, or __sync RMW
> > }
>
> [..] (it is
> a shame that the DRF code you provided will suffer from an extra barrier if
> Matthew/Andrew's work is applied, but I think this is a corner case which we
> probably don't want to put too much thought into working around).

I'm not familiar with the code but I would have thought that it would be 
straightforward to optimize away the first dmb. It seems like it would be a
simple peephole to spot a sequence of two atomic operations, both with __sync
barriers, and replace the first with the equivalent __atomic barrier.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (50 preceding siblings ...)
  2015-05-11 15:58 ` mwahab at gcc dot gnu.org
@ 2015-05-12 20:02 ` amacleod at redhat dot com
  2015-06-01 15:19 ` mwahab at gcc dot gnu.org
                   ` (16 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: amacleod at redhat dot com @ 2015-05-12 20:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #57 from Andrew Macleod <amacleod at redhat dot com> ---
Author: amacleod
Date: Tue May 12 20:01:47 2015
New Revision: 223096

URL: https://gcc.gnu.org/viewcvs?rev=223096&root=gcc&view=rev
Log:
2015-05-12  Andrew MacLeod  <amacleod@redhat.com>

        PR target/65697
        * coretypes.h (MEMMODEL_SYNC, MEMMODEL_BASE_MASK): New macros.
        (enum memmodel): Add SYNC_{ACQUIRE,RELEASE,SEQ_CST}.
        * tree.h (memmodel_from_int, memmodel_base, is_mm_relaxed,
        is_mm_consume,is_mm_acquire, is_mm_release, is_mm_acq_rel,
        is_mm_seq_cst, is_mm_sync): New accessor functions.
        * builtins.c (expand_builtin_sync_operation,
        expand_builtin_compare_and_swap): Use MEMMODEL_SYNC_SEQ_CST.
        (expand_builtin_sync_lock_release): Use MEMMODEL_SYNC_RELEASE.
        (get_memmodel,  expand_builtin_atomic_compare_exchange,
        expand_builtin_atomic_load, expand_builtin_atomic_store,
        expand_builtin_atomic_clear): Use new accessor routines.
        (expand_builtin_sync_synchronize): Use MEMMODEL_SYNC_SEQ_CST.
        * optabs.c (expand_compare_and_swap_loop): Use MEMMODEL_SYNC_SEQ_CST.
        (maybe_emit_sync_lock_test_and_set): Use new accessors and
        MEMMODEL_SYNC_ACQUIRE.
        (expand_sync_lock_test_and_set): Use MEMMODEL_SYNC_ACQUIRE.
        (expand_mem_thread_fence, expand_mem_signal_fence, expand_atomic_load,
        expand_atomic_store): Use new accessors.
        * emit-rtl.c (need_atomic_barrier_p): Add additional enum cases.
        * tsan.c (instrument_builtin_call): Update check for memory model
beyond
        final enum to use MEMMODEL_LAST.
        * c-family/c-common.c: Use new accessor for memmodel_base.
        * config/aarch64/aarch64.c (aarch64_expand_compare_and_swap): Use new
        accessors.
        * config/aarch64/atomics.md (atomic_load<mode>,atomic_store<mode>,
        arch64_load_exclusive<mode>, aarch64_store_exclusive<mode>,
        mem_thread_fence, *dmb): Likewise.
        * config/alpha/alpha.c (alpha_split_compare_and_swap,
        alpha_split_compare_and_swap_12): Likewise.
        * config/arm/arm.c (arm_expand_compare_and_swap,
        arm_split_compare_and_swap, arm_split_atomic_op): Likewise.
        * config/arm/sync.md (atomic_load<mode>, atomic_store<mode>,
        atomic_loaddi): Likewise.
        * config/i386/i386.c (ix86_destroy_cost_data, ix86_memmodel_check):
        Likewise.
        * config/i386/sync.md (mem_thread_fence, atomic_store<mode>): Likewise.
        * config/ia64/ia64.c (ia64_expand_atomic_op): Add new memmodel cases
and
        use new accessors.
        * config/ia64/sync.md (mem_thread_fence, atomic_load<mode>,
        atomic_store<mode>, atomic_compare_and_swap<mode>,
        atomic_exchange<mode>): Use new accessors.
        * config/mips/mips.c (mips_process_sync_loop): Likewise.
        * config/pa/pa.md (atomic_loaddi, atomic_storedi): Likewise.
        * config/rs6000/rs6000.c (rs6000_pre_atomic_barrier,
        rs6000_post_atomic_barrier): Add new cases.
        (rs6000_expand_atomic_compare_and_swap): Use new accessors.
        * config/rs6000/sync.md (mem_thread_fence): Add new cases.
        (atomic_load<mode>): Add new cases and use new accessors.
        (store_quadpti): Add new cases.
        * config/s390/s390.md (mem_thread_fence, atomic_store<mode>): Use new
        accessors.
        * config/sparc/sparc.c (sparc_emit_membar_for_model): Use new
accessors.
        * doc/extend.texi: Update docs to indicate 16 bits are used for memory
        model, not 8.



Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/builtins.c
    trunk/gcc/c-family/c-common.c
    trunk/gcc/config/aarch64/aarch64.c
    trunk/gcc/config/aarch64/atomics.md
    trunk/gcc/config/alpha/alpha.c
    trunk/gcc/config/arm/arm.c
    trunk/gcc/config/arm/sync.md
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/sync.md
    trunk/gcc/config/ia64/ia64.c
    trunk/gcc/config/ia64/sync.md
    trunk/gcc/config/mips/mips.c
    trunk/gcc/config/pa/pa.md
    trunk/gcc/config/rs6000/rs6000.c
    trunk/gcc/config/rs6000/sync.md
    trunk/gcc/config/s390/s390.md
    trunk/gcc/config/sparc/sparc.c
    trunk/gcc/coretypes.h
    trunk/gcc/doc/extend.texi
    trunk/gcc/emit-rtl.c
    trunk/gcc/optabs.c
    trunk/gcc/tree.h
    trunk/gcc/tsan.c


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (51 preceding siblings ...)
  2015-05-12 20:02 ` amacleod at redhat dot com
@ 2015-06-01 15:19 ` mwahab at gcc dot gnu.org
  2015-06-01 15:21 ` mwahab at gcc dot gnu.org
                   ` (15 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-06-01 15:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #58 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Mon Jun  1 15:18:19 2015
New Revision: 223983

URL: https://gcc.gnu.org/viewcvs?rev=223983&root=gcc&view=rev
Log:
        PR target/65697
        * config/aarch64/aarch64.c (aarch64_emit_post_barrier):New.
        (aarch64_split_atomic_op): Check for __sync memory models, emit
        appropriate initial loads and final barriers.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/aarch64/aarch64.c


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (52 preceding siblings ...)
  2015-06-01 15:19 ` mwahab at gcc dot gnu.org
@ 2015-06-01 15:21 ` mwahab at gcc dot gnu.org
  2015-06-01 15:25 ` mwahab at gcc dot gnu.org
                   ` (14 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-06-01 15:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #59 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Mon Jun  1 15:21:02 2015
New Revision: 223984

URL: https://gcc.gnu.org/viewcvs?rev=223984&root=gcc&view=rev
Log:
        PR target/65697
        * config/aarch64/aarch64.c (aarch64_split_compare_and_swap): Check
        for __sync memory models, emit initial loads and final barriers as
        appropriate.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/aarch64/aarch64.c


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (53 preceding siblings ...)
  2015-06-01 15:21 ` mwahab at gcc dot gnu.org
@ 2015-06-01 15:25 ` mwahab at gcc dot gnu.org
  2015-06-11  7:35 ` ramana at gcc dot gnu.org
                   ` (13 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-06-01 15:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #60 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Mon Jun  1 15:24:37 2015
New Revision: 223986

URL: https://gcc.gnu.org/viewcvs?rev=223986&root=gcc&view=rev
Log:
        PR target/65697
        * gcc.target/aarch64/sync-comp-swap.c: New.
        * gcc.target/aarch64/sync-comp-swap.x: New.
        * gcc.target/aarch64/sync-op-acquire.c: New.
        * gcc.target/aarch64/sync-op-acquire.x: New.
        * gcc.target/aarch64/sync-op-full.c: New.
        * gcc.target/aarch64/sync-op-full.x: New.
        * gcc.target/aarch64/sync-op-release.c: New.
        * gcc.target/aarch64/sync-op-release.x: New.


Added:
    trunk/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
    trunk/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x
    trunk/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
    trunk/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x
    trunk/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
    trunk/gcc/testsuite/gcc.target/aarch64/sync-op-full.x
    trunk/gcc/testsuite/gcc.target/aarch64/sync-op-release.c
    trunk/gcc/testsuite/gcc.target/aarch64/sync-op-release.x
Modified:
    trunk/gcc/testsuite/ChangeLog


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (54 preceding siblings ...)
  2015-06-01 15:25 ` mwahab at gcc dot gnu.org
@ 2015-06-11  7:35 ` ramana at gcc dot gnu.org
  2015-06-29 16:04 ` mwahab at gcc dot gnu.org
                   ` (12 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: ramana at gcc dot gnu.org @ 2015-06-11  7:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-06-11
                 CC|                            |ramana at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #61 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
Well, confirmed at least. And at the minute it is fixed on trunk - not sure if
we are asking for backports for this?


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (55 preceding siblings ...)
  2015-06-11  7:35 ` ramana at gcc dot gnu.org
@ 2015-06-29 16:04 ` mwahab at gcc dot gnu.org
  2015-06-29 16:09 ` mwahab at gcc dot gnu.org
                   ` (11 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-06-29 16:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #63 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Mon Jun 29 16:03:34 2015
New Revision: 225132

URL: https://gcc.gnu.org/viewcvs?rev=225132&root=gcc&view=rev
Log:
2015-06-29  Matthew Wahab  <matthew.wahab@arm.com>

        PR target/65697
        * config/arm/arm.c (arm_split_atomic_op): For ARMv8, replace an
        initial acquire barrier with final barrier.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm.c


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (56 preceding siblings ...)
  2015-06-29 16:04 ` mwahab at gcc dot gnu.org
@ 2015-06-29 16:09 ` mwahab at gcc dot gnu.org
  2015-06-29 16:12 ` mwahab at gcc dot gnu.org
                   ` (10 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-06-29 16:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #64 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Mon Jun 29 16:09:10 2015
New Revision: 225133

URL: https://gcc.gnu.org/viewcvs?rev=225133&root=gcc&view=rev
Log:
2015-06-29  Matthew Wahab  <matthew.wahab@arm.com>

        PR target/65697
        * config/arm/arm.c (arm_split_compare_and_swap): For ARMv8, replace an
        initial acquire barrier with final barrier.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm.c


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (57 preceding siblings ...)
  2015-06-29 16:09 ` mwahab at gcc dot gnu.org
@ 2015-06-29 16:12 ` mwahab at gcc dot gnu.org
  2015-08-05 11:21 ` mwahab at gcc dot gnu.org
                   ` (9 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-06-29 16:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #65 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Mon Jun 29 16:12:12 2015
New Revision: 225134

URL: https://gcc.gnu.org/viewcvs?rev=225134&root=gcc&view=rev
Log:
2015-06-29  Matthew Wahab  <matthew.wahab@arm.com>

        PR target/65697
        * gcc.target/arm/armv8-sync-comp-swap.c: New.
        * gcc.target/arm/armv8-sync-op-acquire.c: New.
        * gcc.target/arm/armv8-sync-op-full.c: New.
        * gcc.target/arm/armv8-sync-op-release.c: New.


Added:
    trunk/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c   (with props)
    trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c   (with props)
    trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c   (with props)
    trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c   (with props)
Modified:
    trunk/gcc/ChangeLog

Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
            ('svn:eol-style' added)

Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
            ('svn:keywords' added)

Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
            ('svn:eol-style' added)

Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
            ('svn:keywords' added)

Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
            ('svn:eol-style' added)

Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
            ('svn:keywords' added)

Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
            ('svn:eol-style' added)

Propchange: trunk/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
            ('svn:keywords' added)



* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (58 preceding siblings ...)
  2015-06-29 16:12 ` mwahab at gcc dot gnu.org
@ 2015-08-05 11:21 ` mwahab at gcc dot gnu.org
  2015-08-05 11:30 ` mwahab at gcc dot gnu.org
                   ` (8 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-08-05 11:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #66 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Aug  5 11:20:59 2015
New Revision: 226618

URL: https://gcc.gnu.org/viewcvs?rev=226618&root=gcc&view=rev
Log:
        Backport from trunk
        2015-05-12  Andrew MacLeod  <amacleod@redhat.com>

        PR target/65697
        * coretypes.h (MEMMODEL_SYNC, MEMMODEL_BASE_MASK): New macros.
        (enum memmodel): Add SYNC_{ACQUIRE,RELEASE,SEQ_CST}.
        * tree.h (memmodel_from_int, memmodel_base, is_mm_relaxed)
        (is_mm_consume,is_mm_acquire, is_mm_release, is_mm_acq_rel)
        (is_mm_seq_cst, is_mm_sync): New accessor functions.
        * builtins.c (expand_builtin_sync_operation)
        (expand_builtin_compare_and_swap): Use MEMMODEL_SYNC_SEQ_CST.
        (expand_builtin_sync_lock_release): Use MEMMODEL_SYNC_RELEASE.
        (get_memmodel, expand_builtin_atomic_compare_exchange)
        (expand_builtin_atomic_load, expand_builtin_atomic_store)
        (expand_builtin_atomic_clear): Use new accessor routines.
        (expand_builtin_sync_synchronize): Use MEMMODEL_SYNC_SEQ_CST.
        * optabs.c (expand_compare_and_swap_loop): Use MEMMODEL_SYNC_SEQ_CST.
        (maybe_emit_sync_lock_test_and_set): Use new accessors and
        MEMMODEL_SYNC_ACQUIRE.
        (expand_sync_lock_test_and_set): Use MEMMODEL_SYNC_ACQUIRE.
        (expand_mem_thread_fence, expand_mem_signal_fence, expand_atomic_load)
        (expand_atomic_store): Use new accessors.
        * emit-rtl.c (need_atomic_barrier_p): Add additional enum cases.
        * tsan.c (instrument_builtin_call): Update check for memory model
        beyond final enum to use MEMMODEL_LAST.
        * c-family/c-common.c: Use new accessor for memmodel_base.
        * config/aarch64/aarch64.c (aarch64_expand_compare_and_swap): Use new
        accessors.
        * config/aarch64/atomics.md (atomic_load<mode>,atomic_store<mode>)
        (aarch64_load_exclusive<mode>, aarch64_store_exclusive<mode>)
        (mem_thread_fence, *dmb): Likewise.
        * config/alpha/alpha.c (alpha_split_compare_and_swap)
        (alpha_split_compare_and_swap_12): Likewise.
        * config/arm/arm.c (arm_expand_compare_and_swap)
        (arm_split_compare_and_swap, arm_split_atomic_op): Likewise.
        * config/arm/sync.md (atomic_load<mode>, atomic_store<mode>)
        (atomic_loaddi): Likewise.
        * config/i386/i386.c (ix86_destroy_cost_data, ix86_memmodel_check):
        Likewise.
        * config/i386/sync.md (mem_thread_fence, atomic_store<mode>): Likewise.
        * config/ia64/ia64.c (ia64_expand_atomic_op): Add new memmodel
        cases and use new accessors.
        * config/ia64/sync.md (mem_thread_fence, atomic_load<mode>)
        (atomic_store<mode>, atomic_compare_and_swap<mode>)
        (atomic_exchange<mode>): Use new accessors.
        * config/mips/mips.c (mips_process_sync_loop): Likewise.
        * config/pa/pa.md (atomic_loaddi, atomic_storedi): Likewise.
        * config/rs6000/rs6000.c (rs6000_pre_atomic_barrier)
        (rs6000_post_atomic_barrier): Add new cases.
        (rs6000_expand_atomic_compare_and_swap): Use new accessors.
        * config/rs6000/sync.md (mem_thread_fence): Add new cases.
        (atomic_load<mode>): Add new cases and use new accessors.
        (store_quadpti): Add new cases.
        * config/s390/s390.md (mem_thread_fence, atomic_store<mode>): Use new
        accessors.
        * config/sparc/sparc.c (sparc_emit_membar_for_model): Use new
        accessors.
        * doc/extend.texi: Update docs to indicate 16 bits are used for memory
        model, not 8.


Modified:
    branches/gcc-5-branch/gcc/ChangeLog
    branches/gcc-5-branch/gcc/builtins.c
    branches/gcc-5-branch/gcc/c-family/c-common.c
    branches/gcc-5-branch/gcc/config/aarch64/aarch64.c
    branches/gcc-5-branch/gcc/config/aarch64/atomics.md
    branches/gcc-5-branch/gcc/config/alpha/alpha.c
    branches/gcc-5-branch/gcc/config/arm/arm.c
    branches/gcc-5-branch/gcc/config/arm/sync.md
    branches/gcc-5-branch/gcc/config/i386/i386.c
    branches/gcc-5-branch/gcc/config/i386/sync.md
    branches/gcc-5-branch/gcc/config/ia64/ia64.c
    branches/gcc-5-branch/gcc/config/ia64/sync.md
    branches/gcc-5-branch/gcc/config/mips/mips.c
    branches/gcc-5-branch/gcc/config/pa/pa.md
    branches/gcc-5-branch/gcc/config/rs6000/rs6000.c
    branches/gcc-5-branch/gcc/config/rs6000/sync.md
    branches/gcc-5-branch/gcc/config/s390/s390.md
    branches/gcc-5-branch/gcc/config/sparc/sparc.c
    branches/gcc-5-branch/gcc/coretypes.h
    branches/gcc-5-branch/gcc/doc/extend.texi
    branches/gcc-5-branch/gcc/emit-rtl.c
    branches/gcc-5-branch/gcc/optabs.c
    branches/gcc-5-branch/gcc/tree.h
    branches/gcc-5-branch/gcc/tsan.c



* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (59 preceding siblings ...)
  2015-08-05 11:21 ` mwahab at gcc dot gnu.org
@ 2015-08-05 11:30 ` mwahab at gcc dot gnu.org
  2015-08-05 11:41 ` mwahab at gcc dot gnu.org
                   ` (7 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-08-05 11:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #67 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Aug  5 11:29:28 2015
New Revision: 226619

URL: https://gcc.gnu.org/viewcvs?rev=226619&root=gcc&view=rev
Log:
        Backport from trunk.
        2015-06-01  Matthew Wahab  <matthew.wahab@arm.com>

        PR target/65697
        * config/aarch64/aarch64.c (aarch64_emit_post_barrier): New.
        (aarch64_split_atomic_op): Check for __sync memory models, emit
        appropriate initial loads and final barriers.


Modified:
    branches/gcc-5-branch/gcc/ChangeLog
    branches/gcc-5-branch/gcc/config/aarch64/aarch64.c



* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (60 preceding siblings ...)
  2015-08-05 11:30 ` mwahab at gcc dot gnu.org
@ 2015-08-05 11:41 ` mwahab at gcc dot gnu.org
  2015-08-05 11:49 ` mwahab at gcc dot gnu.org
                   ` (6 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-08-05 11:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #68 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Aug  5 11:40:25 2015
New Revision: 226620

URL: https://gcc.gnu.org/viewcvs?rev=226620&root=gcc&view=rev
Log:
        Backport from trunk.
        2015-06-01  Matthew Wahab  <matthew.wahab@arm.com>

        PR target/65697
        * config/aarch64/aarch64.c (aarch64_split_compare_and_swap): Check
        for __sync memory models, emit initial loads and final barriers as
        appropriate.


Modified:
    branches/gcc-5-branch/gcc/ChangeLog
    branches/gcc-5-branch/gcc/config/aarch64/aarch64.c



* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (61 preceding siblings ...)
  2015-08-05 11:41 ` mwahab at gcc dot gnu.org
@ 2015-08-05 11:49 ` mwahab at gcc dot gnu.org
  2015-08-05 13:28 ` mwahab at gcc dot gnu.org
                   ` (5 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-08-05 11:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #69 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Aug  5 11:48:43 2015
New Revision: 226621

URL: https://gcc.gnu.org/viewcvs?rev=226621&root=gcc&view=rev
Log:
        Backport from trunk
        2015-06-01  Matthew Wahab  <matthew.wahab@arm.com>

        PR target/65697
        * gcc.target/aarch64/sync-comp-swap.c: New.
        * gcc.target/aarch64/sync-comp-swap.x: New.
        * gcc.target/aarch64/sync-op-acquire.c: New.
        * gcc.target/aarch64/sync-op-acquire.x: New.
        * gcc.target/aarch64/sync-op-full.c: New.
        * gcc.target/aarch64/sync-op-full.x: New.
        * gcc.target/aarch64/sync-op-release.c: New.
        * gcc.target/aarch64/sync-op-release.x: New.


Added:
    branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
    branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x
    branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
    branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x
    branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
    branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-op-full.x
    branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-op-release.c
    branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/sync-op-release.x
Modified:
    branches/gcc-5-branch/gcc/testsuite/ChangeLog



* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (62 preceding siblings ...)
  2015-08-05 11:49 ` mwahab at gcc dot gnu.org
@ 2015-08-05 13:28 ` mwahab at gcc dot gnu.org
  2015-08-05 13:40 ` mwahab at gcc dot gnu.org
                   ` (4 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-08-05 13:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #70 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Aug  5 13:27:41 2015
New Revision: 226625

URL: https://gcc.gnu.org/viewcvs?rev=226625&root=gcc&view=rev
Log:
        Backport from trunk:
        2015-06-29  Matthew Wahab  <matthew.wahab@arm.com>

        PR target/65697
        * config/arm/arm.c (arm_split_atomic_op): For ARMv8, replace an
        initial acquire barrier with a final barrier.


Modified:
    branches/gcc-5-branch/gcc/ChangeLog
    branches/gcc-5-branch/gcc/config/arm/arm.c



* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (63 preceding siblings ...)
  2015-08-05 13:28 ` mwahab at gcc dot gnu.org
@ 2015-08-05 13:40 ` mwahab at gcc dot gnu.org
  2015-08-05 13:43 ` mwahab at gcc dot gnu.org
                   ` (3 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-08-05 13:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #71 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Aug  5 13:40:14 2015
New Revision: 226627

URL: https://gcc.gnu.org/viewcvs?rev=226627&root=gcc&view=rev
Log:
        Backport from trunk:
        2015-06-29  Matthew Wahab  <matthew.wahab@arm.com>

        PR target/65697
        * config/arm/arm.c (arm_split_compare_and_swap): For ARMv8,
        replace an initial acquire barrier with a final barrier.


Modified:
    branches/gcc-5-branch/gcc/ChangeLog
    branches/gcc-5-branch/gcc/config/arm/arm.c



* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (64 preceding siblings ...)
  2015-08-05 13:40 ` mwahab at gcc dot gnu.org
@ 2015-08-05 13:43 ` mwahab at gcc dot gnu.org
  2015-09-30  3:07 ` ramana at gcc dot gnu.org
                   ` (2 subsequent siblings)
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-08-05 13:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #72 from mwahab at gcc dot gnu.org ---
Author: mwahab
Date: Wed Aug  5 13:43:04 2015
New Revision: 226628

URL: https://gcc.gnu.org/viewcvs?rev=226628&root=gcc&view=rev
Log:
        Backport from trunk:
        2015-06-29  Matthew Wahab  <matthew.wahab@arm.com>

        PR target/65697
        * gcc.target/arm/armv8-sync-comp-swap.c: New.
        * gcc.target/arm/armv8-sync-op-acquire.c: New.
        * gcc.target/arm/armv8-sync-op-full.c: New.
        * gcc.target/arm/armv8-sync-op-release.c: New.


Added:
    branches/gcc-5-branch/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
    branches/gcc-5-branch/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
    branches/gcc-5-branch/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
    branches/gcc-5-branch/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
Modified:
    branches/gcc-5-branch/gcc/testsuite/ChangeLog



* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (65 preceding siblings ...)
  2015-08-05 13:43 ` mwahab at gcc dot gnu.org
@ 2015-09-30  3:07 ` ramana at gcc dot gnu.org
  2015-10-05  8:08 ` mwahab at gcc dot gnu.org
  2015-10-05  8:30 ` ramana at gcc dot gnu.org
  68 siblings, 0 replies; 70+ messages in thread
From: ramana at gcc dot gnu.org @ 2015-09-30  3:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #73 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
Now fixed, Matthew?

Ramana



* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (66 preceding siblings ...)
  2015-09-30  3:07 ` ramana at gcc dot gnu.org
@ 2015-10-05  8:08 ` mwahab at gcc dot gnu.org
  2015-10-05  8:30 ` ramana at gcc dot gnu.org
  68 siblings, 0 replies; 70+ messages in thread
From: mwahab at gcc dot gnu.org @ 2015-10-05  8:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

mwahab at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |4.9.4

--- Comment #74 from mwahab at gcc dot gnu.org ---
(In reply to Ramana Radhakrishnan from comment #73)
> Now fixed ,Matthew ?

It's fixed in trunk and gcc-5, but not in gcc-4.9.



* [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
  2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
                   ` (67 preceding siblings ...)
  2015-10-05  8:08 ` mwahab at gcc dot gnu.org
@ 2015-10-05  8:30 ` ramana at gcc dot gnu.org
  68 siblings, 0 replies; 70+ messages in thread
From: ramana at gcc dot gnu.org @ 2015-10-05  8:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |5.3

--- Comment #75 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
Fixed on 5 branch and trunk.



end of thread, other threads:[~2015-10-05  8:30 UTC | newest]

Thread overview: 70+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-08 11:25 [Bug target/65697] New: __atomic memory barriers not strong enough for __sync builtins matthew.wahab at arm dot com
2015-04-08 11:33 ` [Bug target/65697] " matthew.wahab at arm dot com
2015-04-08 16:16 ` jakub at gcc dot gnu.org
2015-04-09  9:53 ` matthew.wahab at arm dot com
2015-04-09 10:08 ` torvald at gcc dot gnu.org
2015-04-09 10:11 ` torvald at gcc dot gnu.org
2015-04-09 10:47 ` matthew.wahab at arm dot com
2015-04-09 11:10 ` redi at gcc dot gnu.org
2015-04-09 11:25 ` matthew.wahab at arm dot com
2015-04-09 11:51 ` amacleod at redhat dot com
2015-04-09 14:03 ` matthew.wahab at arm dot com
2015-04-10 15:09 ` jgreenhalgh at gcc dot gnu.org
2015-04-15 10:16 ` aph at gcc dot gnu.org
2015-04-15 11:11 ` mwahab at gcc dot gnu.org
2015-04-15 11:54 ` jakub at gcc dot gnu.org
2015-04-15 12:43 ` aph at gcc dot gnu.org
2015-04-15 13:01 ` mwahab at gcc dot gnu.org
2015-04-15 14:10 ` aph at gcc dot gnu.org
2015-04-15 14:33 ` mwahab at gcc dot gnu.org
2015-04-15 14:49 ` aph at gcc dot gnu.org
2015-04-15 15:49 ` torvald at gcc dot gnu.org
2015-04-15 20:05 ` torvald at gcc dot gnu.org
2015-04-15 20:49 ` jgreenhalgh at gcc dot gnu.org
2015-04-15 22:13 ` torvald at gcc dot gnu.org
2015-04-16  3:45 ` amacleod at redhat dot com
2015-04-16  8:11 ` mwahab at gcc dot gnu.org
2015-04-16  8:50 ` mwahab at gcc dot gnu.org
2015-04-16  9:01 ` jgreenhalgh at gcc dot gnu.org
2015-04-16  9:18 ` mwahab at gcc dot gnu.org
2015-04-16 11:37 ` amacleod at redhat dot com
2015-04-16 11:54 ` torvald at gcc dot gnu.org
2015-04-16 12:13 ` mwahab at gcc dot gnu.org
2015-04-16 13:27 ` jgreenhalgh at gcc dot gnu.org
2015-04-17 15:38 ` torvald at gcc dot gnu.org
2015-04-17 15:48 ` torvald at gcc dot gnu.org
2015-04-17 18:00 ` amacleod at redhat dot com
2015-04-20 13:42 ` mwahab at gcc dot gnu.org
2015-04-20 15:17 ` mwahab at gcc dot gnu.org
2015-04-21 18:51 ` rth at gcc dot gnu.org
2015-04-28 15:32 ` jgreenhalgh at gcc dot gnu.org
2015-04-29  8:47 ` mwahab at gcc dot gnu.org
2015-04-29 12:20 ` jgreenhalgh at gcc dot gnu.org
2015-04-29 13:26 ` mwahab at gcc dot gnu.org
2015-04-29 16:04 ` amacleod at redhat dot com
2015-04-30  8:20 ` mwahab at gcc dot gnu.org
2015-05-01 12:53 ` torvald at gcc dot gnu.org
2015-05-06 14:25 ` amacleod at redhat dot com
2015-05-06 15:58 ` mwahab at gcc dot gnu.org
2015-05-07 13:25 ` mwahab at gcc dot gnu.org
2015-05-08  8:01 ` mwahab at gcc dot gnu.org
2015-05-11 13:44 ` jgreenhalgh at gcc dot gnu.org
2015-05-11 15:58 ` mwahab at gcc dot gnu.org
2015-05-12 20:02 ` amacleod at redhat dot com
2015-06-01 15:19 ` mwahab at gcc dot gnu.org
2015-06-01 15:21 ` mwahab at gcc dot gnu.org
2015-06-01 15:25 ` mwahab at gcc dot gnu.org
2015-06-11  7:35 ` ramana at gcc dot gnu.org
2015-06-29 16:04 ` mwahab at gcc dot gnu.org
2015-06-29 16:09 ` mwahab at gcc dot gnu.org
2015-06-29 16:12 ` mwahab at gcc dot gnu.org
2015-08-05 11:21 ` mwahab at gcc dot gnu.org
2015-08-05 11:30 ` mwahab at gcc dot gnu.org
2015-08-05 11:41 ` mwahab at gcc dot gnu.org
2015-08-05 11:49 ` mwahab at gcc dot gnu.org
2015-08-05 13:28 ` mwahab at gcc dot gnu.org
2015-08-05 13:40 ` mwahab at gcc dot gnu.org
2015-08-05 13:43 ` mwahab at gcc dot gnu.org
2015-09-30  3:07 ` ramana at gcc dot gnu.org
2015-10-05  8:08 ` mwahab at gcc dot gnu.org
2015-10-05  8:30 ` ramana at gcc dot gnu.org
