Date: Wed, 26 Feb 2014 05:22:00 -0000
From: "Paul E. McKenney"
To: George Spelvin
Cc: akpm@linux-foundation.org, dhowells@redhat.com, gcc@gcc.gnu.org,
    linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
    mingo@kernel.org, peterz@infradead.org, Ramana.Radhakrishnan@arm.com,
    torvalds@linux-foundation.org, triegel@redhat.com, will.deacon@arm.com
Subject: Re: [RFC][PATCH 0/5] arch: atomic rework

On Tue, Feb 25, 2014 at 10:06:53PM -0500, George Spelvin wrote:
> wrote:
> > wrote:
> >> I have for the last several years been 100% convinced that the Intel
> >> memory ordering is the right thing, and that people who like weak
> >> memory ordering are wrong and should try to avoid reproducing if at
> >> all possible.
> >
> > Are ARM and Power really the bad boys here? Or are they instead playing
> > the role of the canary in the coal mine?
>
> To paraphrase some older threads, I think Linus's argument is that
> weak memory ordering is like branch delay slots: a way to make a simple
> implementation simpler, but ends up being no help to a more aggressive
> implementation.
>
> Branch delay slots give a one-cycle bonus to in-order cores, but
> once you go superscalar and add branch prediction, they stop helping,
> and once you go full out of order, they're just an annoyance.
>
> Likewise, I can see the point that weak ordering can help make a simple
> cache interface simpler, but once you start doing speculative loads,
> you've already bought and paid for all the hardware you need to do
> stronger coherency.
>
> Another thing that requires all the strong-coherency machinery is
> a high-performance implementation of the various memory barrier and
> synchronization operations. Yes, a low-performance (drain the pipeline)
> implementation is tolerable if the instructions aren't used frequently,
> but once you're really trying, it doesn't save complexity.
>
> Once you're there, strong coherency always doesn't actually cost you any
> time outside of critical synchronization code, and it both simplifies
> and speeds up the tricky synchronization software.
>
>
> So PPC and ARM's weak ordering are not the direction the future is going.
> Rather, weak ordering is something that's only useful in a limited
> technology window, which is rapidly passing.

That does indeed appear to be Intel's story. Might well be correct.
Time will tell.

> If you can find someone in IBM who's worked on the Z series cache
> coherency (extremely strong ordering), they probably have some useful
> insights. The big question is if strong ordering, once you've accepted
> the implementation complexity and area, actually costs anything in
> execution time. If there's an unavoidable cost which weak ordering saves,
> that's significant.

There has been a lot of ink spilled on this argument. ;-)

PPC has much larger CPU counts than does the mainframe. On the other
hand, there are large x86 systems. Some claim that there are differences
in latency due to the different approaches, and there could be a long
argument about whether all this is inherent in the memory ordering or
whether it is due to implementation issues.

I don't claim to know the answer. I do know that ARM and PPC are
here now, and that I need to deal with them.

							Thanx, Paul
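The ordering being argued about is easiest to see in the classic message-passing pattern. What follows is an illustrative sketch, not code from this thread. It assumes C11 `<stdatomic.h>` and POSIX threads, and the names `payload`, `ready`, and `message_passing_trial()` are hypothetical. On a strongly ordered (TSO) machine such as x86, the release store and acquire load typically compile to plain moves. On ARM and Power, the compiler typically has to emit ordering instructions. This is one concrete place where the strong/weak difference becomes visible to software.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared state for the message-passing pattern. */
static int payload;       /* plain data published by the writer */
static atomic_int ready;  /* flag that publishes the payload */

/* Writer: store the data, then publish it with a release store.
 * On x86 this is typically a plain MOV; on ARMv8 typically STLR,
 * and on Power typically an lwsync followed by the store. */
static void *writer(void *arg)
{
    (void)arg;
    payload = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Reader: spin until the flag is observed with an acquire load.
 * The acquire pairs with the writer's release, so the later plain
 * read of payload is guaranteed to see 42. */
static void *reader(void *arg)
{
    int *out = arg;

    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;  /* busy-wait for the writer */
    *out = payload;
    return NULL;
}

/* Runs one writer/reader pair and returns the value the reader saw. */
int message_passing_trial(void)
{
    pthread_t w, r;
    int seen = -1;
    int rc;

    /* Reset state; thread creation orders these stores before the
     * new threads start. */
    payload = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);

    rc = pthread_create(&r, NULL, reader, &seen);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, writer, NULL);
    assert(rc == 0);
    (void)rc;  /* silence unused-variable warnings under NDEBUG */

    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

A caller can simply check `message_passing_trial() == 42`. Weakening both orders to `memory_order_relaxed` would turn the accesses to `payload` into a data race. In that situation, weakly ordered hardware such as ARM and Power can visibly let the flag store overtake the payload store.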