From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: PI mutex support for pthread_cond_* now in nptl
From: Steven Munroe
Reply-To: munroesj@us.ibm.com
To: Torvald Riegel
Cc: munroesj@us.ibm.com, "Joseph S. Myers", Richard Henderson,
	Siddhesh Poyarekar, libc-ports@sourceware.org,
	libc-alpha@sourceware.org
Date: Wed, 20 Feb 2013 22:41:00 -0000
Message-Id: <1361396429.19573.173.camel@spokane1.rchland.ibm.com>
In-Reply-To: <1361391926.581.1774.camel@triegel.csb>
References: <20130218105637.GJ32163@spoyarek.pnq.redhat.com>
	<5123AB55.2070100@twiddle.net>
	<1361304381.581.80.camel@triegel.csb>
	<1361379598.19573.167.camel@spokane1.rchland.ibm.com>
	<1361391926.581.1774.camel@triegel.csb>
Content-Type: text/plain
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Mailing-List: contact libc-ports-help@sourceware.org; run by ezmlm
Sender: libc-ports-owner@sourceware.org
X-SW-Source: 2013-02/txt/msg00058.txt.bz2

On Wed, 2013-02-20 at 21:25 +0100, Torvald Riegel wrote:
> On Wed, 2013-02-20 at 10:59 -0600, Steven Munroe wrote:
> > On Tue, 2013-02-19 at 21:06 +0100, Torvald Riegel wrote:
> > > On Tue, 2013-02-19 at 17:18 +0000, Joseph S. Myers wrote:
> > > > On Tue, 19 Feb 2013, Richard Henderson wrote:
> > > > >
> > > > > Any chance we can move these macros into a generic linux
> > > > > header?  Given that we're using INTERNAL_SYSCALL macros, the
> > > > > definitions ought to be the same for all targets.
> > > >
> > > > Generally most of lowlevellock.h should probably be shared
> > > > between architectures.  (If some architectures don't implement a
> > > > particular feature as of a particular kernel version, that's a
> > > > matter for kernel-features.h and __ASSUME_* conditionals.)
> > >
> > > On a related note: What are the reasons to have arch-specific
> > > assembler versions of many of the synchronization operations?  I
> > > would be surprised if they'd provide a significant performance
> > > advantage; has anyone recent measurements for this?
> >
> > The introduction of GCC compiler builtins like __sync is fairly
> > recent and the new __atomic builtins start with GCC-4.7.  So until
> > recently we had no choice.
>
> Using assembler for the atomic operations is possible (e.g., as in
> Boehm's libatomic-ops, or in ./sysdeps/powerpc/bits/atomic.h and
> others).  It doesn't allow for the same level of compiler optimization
> across barriers, but it's unclear whether that has much benefit, and
> GCC doesn't do it yet anyway.
>
> There are some cases in which compilers that don't support the
> C11/C++11 memory model can generate code that wouldn't be correct in
> such a model, and which can theoretically interfere with other
> concurrent code (e.g., introduce data races due to accesses being too
> wide).  However, because we don't have custom assembler for
> everything, we should be already exposed to that.
>
> > For platforms (like PowerPC) that implement acquire/release the GCC
> > __sync builtins are not sufficient and GCC-4.7 __atomic builtins are
> > not pervasive enough to make that the default.
>
> I agree regarding the __sync builtins, but using assembler in place of
> the __atomic builtins should work, or not?
>
> > > It seems to me that it would be useful to consolidate the
> > > different versions that exist for the synchronization operations
> > > into shared C code as long as this doesn't make a significant
> > > performance difference.  They are all based on atomic operations
> > > and futex operations, both of which we have in C code (especially
> > > if we have compilers that support the C11 memory model).  Or are
> > > there other reasons for keeping different versions that I'm not
> > > aware of?
> >
> > I disagree.  The performance of lowlevellocks and associated
> > platform-specific optimizations are too important to move forward
> > with the consolidation you suggest.
>
> Which specific optimizations do you refer to?
> I didn't see any for powerpc, for example (i.e., the lock fast path is
> C up to the point of the atomic operation).  The ones that I saw are
> for x86, and I'm wondering whether they provide much benefit.
> Especially because this can mostly just matter for the execution path
> taken when a free lock is acquired; once you get any cache miss,
> you're to some extent on the slow path anyway.  Also, for the Linux
> platforms I looked at, the mutex algorithms are the same.
>
Like the lwarx MUTEX_HINT (EH field) hint.

> Do you have any recent measurements (or could point to them) that show
> the benefit of the optimizations you refer to?
>
No. I don't currently have access to a machine big enough to show this
effect and I can't tell you about the specific customer. So you will
have to trust me on this.