From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 114212 invoked by alias); 9 Apr 2015 10:47:19 -0000 Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-bugs-owner@gcc.gnu.org Received: (qmail 114133 invoked by uid 48); 9 Apr 2015 10:47:15 -0000 From: "matthew.wahab at arm dot com" To: gcc-bugs@gcc.gnu.org Subject: [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins Date: Thu, 09 Apr 2015 10:47:00 -0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: target X-Bugzilla-Version: 5.0 X-Bugzilla-Keywords: wrong-code X-Bugzilla-Severity: normal X-Bugzilla-Who: matthew.wahab at arm dot com X-Bugzilla-Status: UNCONFIRMED X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-SW-Source: 2015-04/txt/msg00666.txt.bz2 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D65697 --- Comment #7 from Matthew Wahab --- (In reply to torvald from comment #5) > (In reply to Matthew Wahab from comment #0) > > The __sync builtins are implemented by expanding them to the equivalent > > __atomic builtins, using MEMMODEL_SEQ_CST for those that are full barri= ers. > > This is too weak since __atomic operations with MEMMODEL_SEQ_CST still = allow > > some data references to move across the operation (see > > https://gcc.gnu.org/ml/gcc/2014-02/msg00058.html) >=20 > I don't think that's a precise characterization. The difference discussed > in that thread is that seq-cst fences have a stronger effect than seq-cst > stores. This is not really a question of MEMMODEL_SEQ_CST but what you > apply it to. Ok, my point was just that an __sync operation has a stronger barrier that = an __atomic operation. > This test case is problematic in that it has a data race on bar unless bar > is never modified by another thread, in which case it would be fine to lo= ad > bar before the fetch_add. It also loads bar irrespective of which value = the > fetch_add actually returns. I'm aware that the __sync builtins offer no > MEMMODEL_RELAXED, and so lots of code simply has a plain memory access.= =20=20 >=20 > Nonetheless, I think you should also look at a test case that is properly > synchronized, data-race-free C11 code with the builtins, and see whether > that is compiled differently (ie, either use a relaxed MO load for bar or > make the load of bar conditional on the outcome of the fetch_add). >=20 > To be on the safe side, you can also use the cppmem tool to verify what t= he > C11 model requires the compiled code to guarantee. Here's what you proba= bly > want to achieve: I agree that this wouldn't affect valid C11 code (because of data-races) bu= t my understanding is that __sync builtins don't require a C11 model. The proble= m is that the __syncs are being implemented using atomic operations that do assu= me a C11 model and its notion of program validity. This looks like it would lead= to differences in behaviour when code, using only the __sync builtins and know= ing nothing of C11, is moved between targets with different memory models. > > ---- > > produces > > ---- > > T1: > > adrp x0, .LANCHOR0 > > add x0, x0, :lo12:.LANCHOR0 > > .L2: > > ldaxr w1, [x0] ; load-acquire (__sync_fetch_and_add) > > add w1, w1, 1 > > stlxr w2, w1, [x0] ; store-release (__sync_fetch_and_add) > > cbnz w2, .L2 > > ldr w0, [x0, 4] ; load (return bar) > > ret > > ---- > > With this code, the load can be executed ahead of the store-release whi= ch > > ends the __sync_fetch_and_add. A correct implementation should emit a d= mb > > instruction after the cbnz. >=20 > I'm not sufficiently familiar with ARM to assess whether that's correct. = Is > this ARMv8? It seems to be consistent with the mapping to machine code t= hat > GCC has followed (http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html), > which does not include a DMB. However, it also shows no difference betwe= en > an acq_rel CAS and a seq_cst CAS. Could it be that the CAS sequence itse= lf > (ie, the machine code for it) prevents the reordering you are concerned > about at the HW level? Yes, its ARMv8. It's consistent with the code expected for the atomics used= to implement the __sync_fetch_and_add. I believe that acquire-release will normally be treated as seq-cst for the aarch64 back-end (I'd have to double-check though). >>From gcc-bugs-return-483115-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org Thu Apr 09 10:49:14 2015 Return-Path: Delivered-To: listarch-gcc-bugs@gcc.gnu.org Received: (qmail 116000 invoked by alias); 9 Apr 2015 10:49:14 -0000 Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-bugs-owner@gcc.gnu.org Delivered-To: mailing list gcc-bugs@gcc.gnu.org Received: (qmail 115950 invoked by uid 48); 9 Apr 2015 10:49:10 -0000 From: "rguenth at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug c++/61971] [4.9 Regression] array subscript is above array bounds [-Werror=array-bounds] Date: Thu, 09 Apr 2015 10:49:00 -0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: c++ X-Bugzilla-Version: 4.8.2 X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: rguenth at gcc dot gnu.org X-Bugzilla-Status: RESOLVED X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: 4.9.3 X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_status resolution target_milestone Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-SW-Source: 2015-04/txt/msg00667.txt.bz2 Content-length: 775 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61971 Richard Biener changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution|--- |FIXED Target Milestone|--- |4.9.3 --- Comment #6 from Richard Biener --- This is fixed with the backport of 2015-01-27 Richard Biener PR tree-optimization/56273 PR tree-optimization/59124 PR tree-optimization/64277 * tree-vrp.c (vrp_finalize): Emit array-bound warnings only from the first VRP pass.