Date: Fri, 07 Feb 2014 18:02:00 -0000
From: "Paul E. McKenney"
To: Will Deacon
Cc: Peter Zijlstra, Torvald Riegel, Ramana Radhakrishnan, David Howells, "linux-arch@vger.kernel.org", "linux-kernel@vger.kernel.org", "torvalds@linux-foundation.org", "akpm@linux-foundation.org", "mingo@kernel.org", "gcc@gcc.gnu.org"
Subject: Re: [RFC][PATCH 0/5] arch: atomic rework
Message-ID: <20140207180216.GP4250@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
In-Reply-To: <20140207165548.GR5976@mudshark.cambridge.arm.com>

On Fri, Feb 07, 2014 at 04:55:48PM +0000, Will Deacon wrote:
> Hi Paul,
>
> On Fri, Feb 07, 2014 at 04:50:28PM +0000, Paul E. McKenney wrote:
> > On Fri, Feb 07, 2014 at 08:44:05AM +0100, Peter Zijlstra wrote:
> > > On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote:
> > > > Hopefully some discussion of out-of-thin-air values as well.
> > >
> > > Yes, absolutely shoot store speculation in the head already. Then drive
> > > a wooden stake through its hart.
> > >
> > > C11/C++11 should not be allowed to claim itself a memory model until that
> > > is sorted.
> >
> > There actually is a proposal being put forward, but it might not make ARM
> > and Power people happy because it involves adding a compare, a branch,
> > and an ISB/isync after every relaxed load...
> > Me, I agree with you, much preferring the no-store-speculation approach.
>
> Can you elaborate a bit on this please? We don't permit speculative stores
> in the ARM architecture, so it seems counter-intuitive that GCC needs to
> emit any additional instructions to prevent that from happening.

Requiring a compare/branch/ISB after each relaxed load enables a simple(r)
proof that out-of-thin-air values cannot be observed in the face of any
compiler optimization that refrains from reordering a prior relaxed load
with a subsequent relaxed store.

> Stores can, of course, be observed out-of-order but that's a lot more
> reasonable :)

So let me try an example.  I am sure that Torvald Riegel will jump in
with any needed corrections or amplifications:

Initial state: x == y == 0

T1:     r1 = atomic_load_explicit(&x, memory_order_relaxed);
        atomic_store_explicit(&y, r1, memory_order_relaxed);

T2:     r2 = atomic_load_explicit(&y, memory_order_relaxed);
        atomic_store_explicit(&x, r2, memory_order_relaxed);

One would intuitively expect r1 == r2 == 0 as the only possible outcome.
But suppose that the compiler used specialization optimizations, as it
might if there were a function with a very lightweight implementation
for some values and a very heavyweight one for others.  In particular,
suppose that the lightweight implementation was for the value 42.
Then the compiler might do something like the following:

Initial state: x == y == 0

T1:     r1 = atomic_load_explicit(&x, memory_order_relaxed);
        if (r1 == 42)
                atomic_store_explicit(&y, 42, memory_order_relaxed);
        else
                atomic_store_explicit(&y, r1, memory_order_relaxed);

T2:     r2 = atomic_load_explicit(&y, memory_order_relaxed);
        atomic_store_explicit(&x, r2, memory_order_relaxed);

Suddenly we have an explicit constant 42 showing up.
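To see why a compiler would regard the specialization step as harmless,
here is a minimal C11 sketch that runs T1 in isolation and checks that the
specialized form stores the same value to y as the original for a handful
of inputs.  The atomic_int globals and the helper names t1_original(),
t1_specialized(), run_t1() and variants_agree() are assumptions made for
illustration, not anything from GCC or the kernel.  The calls use the C11
library argument order (pointer to the atomic object first, then the value).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared variables standing in for x and y in the example. */
static atomic_int x, y;

/* T1 as originally written: copy x to y using relaxed accesses. */
static void t1_original(void)
{
    int r1 = atomic_load_explicit(&x, memory_order_relaxed);
    atomic_store_explicit(&y, r1, memory_order_relaxed);
}

/* T1 after specialization for the value 42 (no store has moved yet). */
static void t1_specialized(void)
{
    int r1 = atomic_load_explicit(&x, memory_order_relaxed);
    if (r1 == 42)
        atomic_store_explicit(&y, 42, memory_order_relaxed);
    else
        atomic_store_explicit(&y, r1, memory_order_relaxed);
}

/* Run one T1 variant alone from x == xval, y == 0; return the final y. */
static int run_t1(void (*t1)(void), int xval)
{
    assert(t1 != NULL);
    atomic_store_explicit(&x, xval, memory_order_relaxed);
    atomic_store_explicit(&y, 0, memory_order_relaxed);
    t1();
    return atomic_load_explicit(&y, memory_order_relaxed);
}

/* Returns 1 if both variants agree for every sampled value of x, else 0. */
static int variants_agree(void)
{
    static const int samples[] = { 0, 1, 41, 42, 43, -7 };
    size_t i;

    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        if (run_t1(t1_original, samples[i]) !=
            run_t1(t1_specialized, samples[i]))
            return 0;
    return 1;
}
```

Calling variants_agree() from a small test main() exercises both variants.
The check is single-threaded by design: it only shows that specialization
alone preserves T1's behavior, so the trouble has to come from what is done
to the stores afterwards.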
Of course, if the compiler carefully avoided speculative stores (as both
Peter and I believe that it should if its code generation is to be regarded
as anything other than an act of vandalism, the words in the standard
notwithstanding), there would be no problem.  But currently, a number
of compiler writers see absolutely nothing wrong with transforming
the optimized-for-42 version above into something like this:

Initial state: x == y == 0

T1:     r1 = atomic_load_explicit(&x, memory_order_relaxed);
        atomic_store_explicit(&y, 42, memory_order_relaxed);
        if (r1 != 42)
                atomic_store_explicit(&y, r1, memory_order_relaxed);

T2:     r2 = atomic_load_explicit(&y, memory_order_relaxed);
        atomic_store_explicit(&x, r2, memory_order_relaxed);

And then it is a short and uncontroversial step to the following:

Initial state: x == y == 0

T1:     atomic_store_explicit(&y, 42, memory_order_relaxed);
        r1 = atomic_load_explicit(&x, memory_order_relaxed);
        if (r1 != 42)
                atomic_store_explicit(&y, r1, memory_order_relaxed);

T2:     r2 = atomic_load_explicit(&y, memory_order_relaxed);
        atomic_store_explicit(&x, r2, memory_order_relaxed);

This can of course result in r1 == r2 == 42, even though the constant
42 never appeared in the original code.  This is one way to generate
an out-of-thin-air value.

As near as I can tell, compiler writers hate the idea of prohibiting
speculative-store optimizations because it requires them to introduce
both control and data dependency tracking into their compilers.  Many of
them seem to hate dependency tracking with a purple passion.  At least,
such a hatred would go a long way towards explaining the incomplete
and high-overhead implementations of memory_order_consume, the long
and successful use of idioms based on the memory_order_consume pattern
notwithstanding [*].
;-)

That said, the Java guys are talking about introducing something
vaguely resembling memory_order_consume (and thus resembling the
rcu_assign_pointer() and rcu_dereference() portions of RCU) to solve Java
out-of-thin-air issues involving initialization, so perhaps there is hope.

                                                        Thanx, Paul

[*] http://queue.acm.org/detail.cfm?id=2488549
    http://www.rdrop.com/users/paulmck/RCU/rclockpdcsproof.pdf
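The r1 == r2 == 42 outcome of the final transformed version can be
reproduced deterministically by replaying one particular interleaving in
a single thread.  The sketch below is a hand-scheduled replay under stated
assumptions (atomic_int globals, and the hypothetical names struct outcome,
reset_state(), replay_speculated_store() and replay_original()).  It is not
a real concurrent run and says nothing about what any particular compiler
actually emits.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared variables standing in for x and y in the example. */
static atomic_int x, y;

struct outcome {
    int r1;
    int r2;
};

/* Reset to the stated initial state: x == y == 0. */
static void reset_state(void)
{
    atomic_store_explicit(&x, 0, memory_order_relaxed);
    atomic_store_explicit(&y, 0, memory_order_relaxed);
}

/*
 * Hand-scheduled replay of the final transformed code:
 *   step 1 (T1): hoisted store of 42 to y
 *   step 2 (T2): r2 = y; x = r2
 *   step 3 (T1): r1 = x; if (r1 != 42) y = r1
 */
static struct outcome replay_speculated_store(void)
{
    struct outcome o;

    reset_state();

    /* Step 1: T1's store, moved ahead of its own load. */
    atomic_store_explicit(&y, 42, memory_order_relaxed);

    /* Step 2: all of T2, which now observes the speculated 42. */
    o.r2 = atomic_load_explicit(&y, memory_order_relaxed);
    atomic_store_explicit(&x, o.r2, memory_order_relaxed);

    /* Step 3: the rest of T1; the fix-up store is skipped. */
    o.r1 = atomic_load_explicit(&x, memory_order_relaxed);
    if (o.r1 != 42)
        atomic_store_explicit(&y, o.r1, memory_order_relaxed);

    /* The speculation confirmed itself, so y was never corrected. */
    assert(atomic_load_explicit(&y, memory_order_relaxed) == 42);
    return o;
}

/* The untransformed code, with T2 scheduled before T1 for comparison. */
static struct outcome replay_original(void)
{
    struct outcome o;

    reset_state();

    o.r2 = atomic_load_explicit(&y, memory_order_relaxed);
    atomic_store_explicit(&x, o.r2, memory_order_relaxed);

    o.r1 = atomic_load_explicit(&x, memory_order_relaxed);
    atomic_store_explicit(&y, o.r1, memory_order_relaxed);
    return o;
}
```

In this replay, replay_speculated_store() yields r1 == 42 and r2 == 42,
while replay_original() yields zeros.  The original code has no schedule
in which a store precedes its own thread's load, which is exactly the
property the hoisted store gives up.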