From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: PI mutex support for pthread_cond_* now in nptl
From: Steven Munroe
Reply-To: munroesj@us.ibm.com
To: Torvald Riegel
Cc: munroesj@us.ibm.com, "Joseph S. Myers", Richard Henderson,
	Siddhesh Poyarekar, libc-ports@sourceware.org,
	libc-alpha@sourceware.org
Date: Wed, 20 Feb 2013 22:41:00 -0000
Message-Id: <1361396429.19573.173.camel@spokane1.rchland.ibm.com>
In-Reply-To: <1361391926.581.1774.camel@triegel.csb>
References: <20130218105637.GJ32163@spoyarek.pnq.redhat.com>
	<5123AB55.2070100@twiddle.net>
	<1361304381.581.80.camel@triegel.csb>
	<1361379598.19573.167.camel@spokane1.rchland.ibm.com>
	<1361391926.581.1774.camel@triegel.csb>
Content-Type: text/plain
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Mailing-List: contact libc-ports-help@sourceware.org; run by ezmlm
Sender: libc-ports-owner@sourceware.org
X-SW-Source: 2013-02/txt/msg00058.txt.bz2

On Wed, 2013-02-20 at 21:25 +0100, Torvald Riegel wrote:
> On Wed, 2013-02-20 at 10:59 -0600, Steven Munroe wrote:
> > On Tue, 2013-02-19 at 21:06 +0100, Torvald Riegel wrote:
> > > On Tue, 2013-02-19 at 17:18 +0000, Joseph S. Myers wrote:
> > > > On Tue, 19 Feb 2013, Richard Henderson wrote:
> > > > >
> > > > > Any chance we can move these macros into a generic linux
> > > > > header?  Given that we're using INTERNAL_SYSCALL macros, the
> > > > > definitions ought to be the same for all targets.
> > > >
> > > > Generally most of lowlevellock.h should probably be shared
> > > > between architectures.  (If some architectures don't implement a
> > > > particular feature as of a particular kernel version, that's a
> > > > matter for kernel-features.h and __ASSUME_* conditionals.)
> > >
> > > On a related note: What are the reasons to have arch-specific
> > > assembler versions of many of the synchronization operations?  I
> > > would be surprised if they'd provide a significant performance
> > > advantage; has anyone recent measurements for this?
> >
> > The introduction of GCC compiler builtins like __sync is fairly
> > recent and the new __atomic builtins start with GCC-4.7.  So until
> > recently we had no choice.
>
> Using assembler for the atomic operations is possible (e.g., as in
> Boehm's libatomic-ops, or in ./sysdeps/powerpc/bits/atomic.h and
> others).  It doesn't allow for the same level of compiler optimization
> across barriers, but it's unclear whether that has much benefit, and
> GCC doesn't do it yet anyway.
>
> There are some cases in which compilers that don't support the
> C11/C++11 memory model can generate code that wouldn't be correct in
> such a model, and which can theoretically interfere with other
> concurrent code (e.g., introduce data races due to accesses being too
> wide).  However, because we don't have custom assembler for
> everything, we should be already exposed to that.
>
> > For platforms (like PowerPC) that implement acquire/release the GCC
> > __sync builtins are not sufficient and GCC-4.7 __atomic builtins are
> > not pervasive enough to make that the default.
>
> I agree regarding the __sync builtins, but using assembler in place of
> the __atomic builtins should work, or not?
>
> > > It seems to me that it would be useful to consolidate the
> > > different versions that exist for the synchronization operations
> > > into shared C code as long as this doesn't make a significant
> > > performance difference.  They are all based on atomic operations
> > > and futex operations, both of which we have in C code (especially
> > > if we have compilers that support the C11 memory model).  Or are
> > > there other reasons for keeping different versions that I'm not
> > > aware of?
> >
> > I disagree.  The performance of lowlevellocks and associated
> > platform-specific optimizations are too important to move forward
> > with the consolidation you suggest.
>
> Which specific optimizations do you refer to?
> I didn't see any for powerpc, for example (i.e., the lock fast path is
> C up to the point of the atomic operation).  The ones that I saw are
> for x86, and I'm wondering whether they provide much benefit.
> Especially because this can mostly just matter for the execution path
> taken when a free lock is acquired; once you get any cache miss,
> you're to some extent on the slow path anyway.  Also, for the Linux
> platforms I looked at, the mutex algorithms are the same.
>
Like the lwarx MUTEX_HINT (EH field) hint.

> Do you have any recent measurements (or could point to them) that show
> the benefit of the optimizations you refer to?
>
No. I don't currently have access to a machine big enough to show this
effect and I can't tell you about the specific customer. So you will
have to trust me on this.