From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-181910-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 9297 invoked by alias); 7 Feb 2014 18:02:24 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 9288 invoked by uid 89); 7 Feb 2014 18:02:24 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD autolearn=ham version=3.3.2
X-HELO: e34.co.us.ibm.com
Received: from e34.co.us.ibm.com (HELO e34.co.us.ibm.com) (32.97.110.152) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-SHA encrypted) ESMTPS; Fri, 07 Feb 2014 18:02:23 +0000
Received: from /spool/local	by e34.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted	for <gcc@gcc.gnu.org> from <paulmck@linux.vnet.ibm.com>;	Fri, 7 Feb 2014 11:02:21 -0700
Received: from d03dlp03.boulder.ibm.com (9.17.202.179)	by e34.co.us.ibm.com (192.168.1.134) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted;	Fri, 7 Feb 2014 11:02:19 -0700
Received: from b03cxnp08027.gho.boulder.ibm.com (b03cxnp08027.gho.boulder.ibm.com [9.17.130.19])	by d03dlp03.boulder.ibm.com (Postfix) with ESMTP id 09CD319D8042	for <gcc@gcc.gnu.org>; Fri,  7 Feb 2014 11:02:19 -0700 (MST)
Received: from d03av06.boulder.ibm.com (d03av06.boulder.ibm.com [9.17.195.245])	by b03cxnp08027.gho.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id s17I1xai9109928	for <gcc@gcc.gnu.org>; Fri, 7 Feb 2014 19:01:59 +0100
Received: from d03av06.boulder.ibm.com (loopback [127.0.0.1])	by d03av06.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id s17I5ctY019411	for <gcc@gcc.gnu.org>; Fri, 7 Feb 2014 11:05:39 -0700
Received: from paulmck-ThinkPad-W500 (dyn9050016220.mts.ibm.com [9.50.16.220] (may be forged))	by d03av06.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id s17I5bcM019266;	Fri, 7 Feb 2014 11:05:38 -0700
Received: by paulmck-ThinkPad-W500 (Postfix, from userid 1000)	id 3DE41381B90; Fri,  7 Feb 2014 10:02:16 -0800 (PST)
Date: Fri, 07 Feb 2014 18:02:00 -0000
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>, Torvald Riegel <triegel@redhat.com>,        Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>,        David Howells <dhowells@redhat.com>,        "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,        "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,        "akpm@linux-foundation.org" <akpm@linux-foundation.org>,        "mingo@kernel.org" <mingo@kernel.org>,        "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>
Subject: Re: [RFC][PATCH 0/5] arch: atomic rework
Message-ID: <20140207180216.GP4250@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <52F3DA85.1060209@arm.com> <20140206185910.GE27276@mudshark.cambridge.arm.com> <20140206192743.GH4250@linux.vnet.ibm.com> <1391721423.23421.3898.camel@triegel.csb> <20140206221117.GJ4250@linux.vnet.ibm.com> <1391730288.23421.4102.camel@triegel.csb> <20140207042051.GL4250@linux.vnet.ibm.com> <20140207074405.GM5002@laptop.programming.kicks-ass.net> <20140207165028.GO4250@linux.vnet.ibm.com> <20140207165548.GR5976@mudshark.cambridge.arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20140207165548.GR5976@mudshark.cambridge.arm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 14020718-1542-0000-0000-000005DD4690
X-SW-Source: 2014-02/txt/msg00088.txt.bz2

On Fri, Feb 07, 2014 at 04:55:48PM +0000, Will Deacon wrote:
> Hi Paul,
> 
> On Fri, Feb 07, 2014 at 04:50:28PM +0000, Paul E. McKenney wrote:
> > On Fri, Feb 07, 2014 at 08:44:05AM +0100, Peter Zijlstra wrote:
> > > On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote:
> > > > Hopefully some discussion of out-of-thin-air values as well.
> > > 
> > > Yes, absolutely shoot store speculation in the head already. Then drive
> > > a wooden stake through its heart.
> > > 
> > > C11/C++11 should not be allowed to claim itself a memory model until that
> > > is sorted.
> > 
> > There actually is a proposal being put forward, but it might not make ARM
> > and Power people happy because it involves adding a compare, a branch,
> > and an ISB/isync after every relaxed load...  Me, I agree with you,
> > much preferring the no-store-speculation approach.
> 
> Can you elaborate a bit on this please? We don't permit speculative stores
> in the ARM architecture, so it seems counter-intuitive that GCC needs to
> emit any additional instructions to prevent that from happening.

Requiring a compare/branch/ISB after each relaxed load enables a simple(r)
proof that out-of-thin-air values cannot be observed in the face of any
compiler optimization that refrains from reordering a prior relaxed load
with a subsequent relaxed store.

> Stores can, of course, be observed out-of-order but that's a lot more
> reasonable :)

So let me try an example.  I am sure that Torvald Riegel will jump in
with any needed corrections or amplifications:

Initial state: x == y == 0

T1:	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	atomic_store_explicit(&y, r1, memory_order_relaxed);

T2:	r2 = atomic_load_explicit(&y, memory_order_relaxed);
	atomic_store_explicit(&x, r2, memory_order_relaxed);
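
For anyone who wants to check the intuition mechanically, here is a
minimal sketch that replays every interleaving of the four operations
above on a single thread.  The helper names replay() and
count_nonzero_outcomes() are made up for illustration, and replaying
interleavings only models what one would intuitively expect; it is not
a model of everything that relaxed atomics are permitted to do.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int x, y;

/*
 * Hypothetical helper: replay one interleaving of the four operations.
 * Bit i of sched picks the thread that takes step i (0 = T1, 1 = T2).
 * Returns 1 if r1 == r2 == 0 afterwards, 0 if not, and -1 if sched
 * does not give each thread exactly two steps.
 */
static int replay(unsigned sched)
{
	int r1 = 0, r2 = 0;
	int pc1 = 0, pc2 = 0;	/* steps each thread has taken so far */
	int i;

	assert(sched < 16u);	/* only four steps to schedule */

	/* Initial state: x == y == 0 */
	atomic_store_explicit(&x, 0, memory_order_relaxed);
	atomic_store_explicit(&y, 0, memory_order_relaxed);

	for (i = 0; i < 4; i++) {
		if (sched & (1u << i)) {
			/* T2 takes this step. */
			if (pc2 == 0)
				r2 = atomic_load_explicit(&y, memory_order_relaxed);
			else if (pc2 == 1)
				atomic_store_explicit(&x, r2, memory_order_relaxed);
			pc2++;
		} else {
			/* T1 takes this step. */
			if (pc1 == 0)
				r1 = atomic_load_explicit(&x, memory_order_relaxed);
			else if (pc1 == 1)
				atomic_store_explicit(&y, r1, memory_order_relaxed);
			pc1++;
		}
	}

	if (pc1 != 2 || pc2 != 2)
		return -1;	/* not a complete run of both threads */
	return r1 == 0 && r2 == 0;
}

/* Hypothetical helper: count valid interleavings with a nonzero result. */
static int count_nonzero_outcomes(void)
{
	unsigned sched;
	int bad = 0;

	for (sched = 0; sched < 16; sched++)
		if (replay(sched) == 0)
			bad++;
	return bad;
}
```

Of the 16 bit patterns, only the six that give each thread two steps
are complete runs, and count_nonzero_outcomes() reports how many of
those leave anything other than zero in r1 or r2.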

One would intuitively expect r1 == r2 == 0 as the only possible outcome.
But suppose that the compiler used specialization optimizations, as it
would if there were a function that has a very lightweight implementation
for some values and a very heavyweight one for others.  In particular,
suppose that the lightweight implementation was for the value 42.
Then the compiler might do something like the following:

Initial state: x == y == 0

T1:	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	if (r1 == 42)
		atomic_store_explicit(&y, 42, memory_order_relaxed);
	else
		atomic_store_explicit(&y, r1, memory_order_relaxed);

T2:	r2 = atomic_load_explicit(&y, memory_order_relaxed);
	atomic_store_explicit(&x, r2, memory_order_relaxed);

Suddenly we have an explicit constant 42 showing up.  Of course, if
the compiler carefully avoided speculative stores (as both Peter and
I believe that it should if its code generation is to be regarded as
anything other than an act of vandalism, the words in the standard
notwithstanding), there would be no problem.  But currently, a number
of compiler writers see absolutely nothing wrong with transforming
the optimized-for-42 version above into something like this:

Initial state: x == y == 0

T1:	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	atomic_store_explicit(&y, 42, memory_order_relaxed);
	if (r1 != 42)
		atomic_store_explicit(&y, r1, memory_order_relaxed);

T2:	r2 = atomic_load_explicit(&y, memory_order_relaxed);
	atomic_store_explicit(&x, r2, memory_order_relaxed);

And then it is a short and uncontroversial step to the following:

Initial state: x == y == 0

T1:	atomic_store_explicit(&y, 42, memory_order_relaxed);
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	if (r1 != 42)
		atomic_store_explicit(&y, r1, memory_order_relaxed);

T2:	r2 = atomic_load_explicit(&y, memory_order_relaxed);
	atomic_store_explicit(&x, r2, memory_order_relaxed);

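This can of course result in r1 == r2 == 42, even though the constant
42 never appeared in the original code:  T2 loads the speculatively
stored 42 from y and stores it to x, after which T1 loads 42 from x
and therefore skips its second store.  This is one way to generate
an out-of-thin-air value.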
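
To make the problematic execution concrete, here is a minimal sketch
that replays one interleaving of the last version on a single thread,
with T2 running entirely between T1's hoisted store and T1's load.
The struct outcome type and the replay_hoisted_store() helper are
made up for illustration; this replays one schedule by hand and does
not claim that any particular compiler actually emits this code.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int x, y;

struct outcome {
	int r1;
	int r2;
};

/* Hypothetical helper: replay one schedule of the hoisted-store version. */
static struct outcome replay_hoisted_store(void)
{
	struct outcome o;

	/* Initial state: x == y == 0 */
	atomic_store_explicit(&x, 0, memory_order_relaxed);
	atomic_store_explicit(&y, 0, memory_order_relaxed);

	/* T1, first step: the speculative store hoisted above the load. */
	atomic_store_explicit(&y, 42, memory_order_relaxed);

	/* T2 runs to completion here and copies the 42 from y to x. */
	o.r2 = atomic_load_explicit(&y, memory_order_relaxed);
	atomic_store_explicit(&x, o.r2, memory_order_relaxed);

	/* T1, remaining steps: the load now sees the 42 that T1 invented. */
	o.r1 = atomic_load_explicit(&x, memory_order_relaxed);
	if (o.r1 != 42)
		atomic_store_explicit(&y, o.r1, memory_order_relaxed);

	/* The "fixup" store was skipped, so y still holds the speculation. */
	assert(atomic_load_explicit(&y, memory_order_relaxed) == 42);
	return o;
}
```

In this schedule both fields of the returned struct come back as 42,
so the speculated value ends up justifying itself.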

As near as I can tell, compiler writers hate the idea of prohibiting
speculative-store optimizations because it requires them to introduce
both control and data dependency tracking into their compilers.  Many of
them seem to hate dependency tracking with a purple passion.  At least,
such a hatred would go a long way towards explaining the incomplete
and high-overhead implementations of memory_order_consume, the long
and successful use of idioms based on the memory_order_consume pattern
notwithstanding [*].  ;-)

That said, the Java guys are talking about introducing something
vaguely resembling memory_order_consume (and thus resembling the
rcu_assign_pointer() and rcu_dereference() portions of RCU) to solve Java
out-of-thin-air issues involving initialization, so perhaps there is hope.
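
For readers unfamiliar with the pattern, here is a rough C11 sketch
of the publish/subscribe idiom that memory_order_consume is meant to
support.  The names struct foo, gp, publish(), and read_a() are made
up for illustration; the release store only loosely corresponds to
rcu_assign_pointer() and the consume load to rcu_dereference(), and
this is not how the Linux kernel implements either of them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct foo {
	int a;
};

static _Atomic(struct foo *) gp;	/* NULL until something is published */

/* Hypothetical publisher: initialize *p, then make it reachable via gp. */
static void publish(struct foo *p, int a)
{
	assert(p != NULL);
	p->a = a;
	/* The release store orders the initialization before publication. */
	atomic_store_explicit(&gp, p, memory_order_release);
}

/* Hypothetical reader: returns fallback if nothing is published yet. */
static int read_a(int fallback)
{
	/* The load of p->a carries a data dependency from this load. */
	struct foo *p = atomic_load_explicit(&gp, memory_order_consume);

	return p ? p->a : fallback;
}
```

The reader relies only on the data dependency from the pointer load to
the dereference, which is exactly the sort of dependency that the
compiler would need to track.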

						Thanx, Paul

[*] http://queue.acm.org/detail.cfm?id=2488549
    http://www.rdrop.com/users/paulmck/RCU/rclockpdcsproof.pdf