From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-483114-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 114212 invoked by alias); 9 Apr 2015 10:47:19 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 114133 invoked by uid 48); 9 Apr 2015 10:47:15 -0000
From: "matthew.wahab at arm dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
Date: Thu, 09 Apr 2015 10:47:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 5.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: matthew.wahab at arm dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID: <bug-65697-4-nxTR92wMHT@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-65697-4@http.gcc.gnu.org/bugzilla/>
References: <bug-65697-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-04/txt/msg00666.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D65697

--- Comment #7 from Matthew Wahab <matthew.wahab at arm dot com> ---
(In reply to torvald from comment #5)
> (In reply to Matthew Wahab from comment #0)
> > The __sync builtins are implemented by expanding them to the equivalent
> > __atomic builtins, using MEMMODEL_SEQ_CST for those that are full barri=
ers.
> > This is too weak since __atomic operations with MEMMODEL_SEQ_CST still =
allow
> > some data references to move across the operation (see
> > https://gcc.gnu.org/ml/gcc/2014-02/msg00058.html)
>=20
> I don't think that's a precise characterization.  The difference discussed
> in that thread is that seq-cst fences have a stronger effect than seq-cst
> stores.  This is not really a question of MEMMODEL_SEQ_CST but what you
> apply it to.

Ok, my point was just that an __sync operation has a stronger barrier that =
an
__atomic operation.

> This test case is problematic in that it has a data race on bar unless bar
> is never modified by another thread, in which case it would be fine to lo=
ad
> bar before the fetch_add.  It also loads bar irrespective of which value =
the
> fetch_add actually returns.  I'm aware that the __sync builtins offer no
> MEMMODEL_RELAXED, and so lots of code simply has a plain memory access.=
=20=20
>=20
> Nonetheless, I think you should also look at a test case that is properly
> synchronized, data-race-free C11 code with the builtins, and see whether
> that is compiled differently (ie, either use a relaxed MO load for bar or
> make the load of bar conditional on the outcome of the fetch_add).
>=20
> To be on the safe side, you can also use the cppmem tool to verify what t=
he
> C11 model requires the compiled code to guarantee.  Here's what you proba=
bly
> want to achieve:

I agree that this wouldn't affect valid C11 code (because of data-races) bu=
t my
understanding is that __sync builtins don't require a C11 model. The proble=
m is
that the __syncs are being implemented using atomic operations that do assu=
me a
C11 model and its notion of program validity. This looks like it would lead=
 to
differences in behaviour when code, using only the __sync builtins and know=
ing
nothing of C11, is moved between targets with different memory models.


> > ----
> > produces
> > ----
> > T1:
> > 	adrp	x0, .LANCHOR0
> > 	add	x0, x0, :lo12:.LANCHOR0
> > .L2:
> > 	ldaxr	w1, [x0]       ; load-acquire (__sync_fetch_and_add)
> > 	add	w1, w1, 1
> > 	stlxr	w2, w1, [x0]   ; store-release  (__sync_fetch_and_add)
> > 	cbnz	w2, .L2
> > 	ldr	w0, [x0, 4]    ; load (return bar)
> > 	ret
> > ----
> > With this code, the load can be executed ahead of the store-release whi=
ch
> > ends the __sync_fetch_and_add. A correct implementation should emit a d=
mb
> > instruction after the cbnz.
>=20
> I'm not sufficiently familiar with ARM to assess whether that's correct. =
 Is
> this ARMv8?  It seems to be consistent with the mapping to machine code t=
hat
> GCC has followed (http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html),
> which does not include a DMB.  However, it also shows no difference betwe=
en
> an acq_rel CAS and a seq_cst CAS.  Could it be that the CAS sequence itse=
lf
> (ie, the machine code for it) prevents the reordering you are concerned
> about at the HW level?

Yes, its ARMv8. It's consistent with the code expected for the atomics used=
 to
implement the __sync_fetch_and_add. I believe that acquire-release will
normally be treated as seq-cst for the aarch64 back-end (I'd have to
double-check though).
>>From gcc-bugs-return-483115-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org Thu Apr 09 10:49:14 2015
Return-Path: <gcc-bugs-return-483115-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Delivered-To: listarch-gcc-bugs@gcc.gnu.org
Received: (qmail 116000 invoked by alias); 9 Apr 2015 10:49:14 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Delivered-To: mailing list gcc-bugs@gcc.gnu.org
Received: (qmail 115950 invoked by uid 48); 9 Apr 2015 10:49:10 -0000
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/61971] [4.9 Regression] array subscript is above array bounds [-Werror=array-bounds]
Date: Thu, 09 Apr 2015 10:49:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c++
X-Bugzilla-Version: 4.8.2
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: RESOLVED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.9.3
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_status resolution target_milestone
Message-ID: <bug-61971-4-aQDA57azi3@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-61971-4@http.gcc.gnu.org/bugzilla/>
References: <bug-61971-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-04/txt/msg00667.txt.bz2
Content-length: 775

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61971

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |4.9.3

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
This is fixed with the backport of

        2015-01-27  Richard Biener  <rguenther@suse.de>

        PR tree-optimization/56273
        PR tree-optimization/59124
        PR tree-optimization/64277
        * tree-vrp.c (vrp_finalize): Emit array-bound warnings only
        from the first VRP pass.