Subject: Re: PI mutex support for pthread_cond_* now in nptl
From: Torvald Riegel
To: munroesj@us.ibm.com
Cc: "Joseph S. Myers", Richard Henderson, Siddhesh Poyarekar,
 libc-ports@sourceware.org, libc-alpha@sourceware.org
In-Reply-To: <1361379598.19573.167.camel@spokane1.rchland.ibm.com>
References: <20130218105637.GJ32163@spoyarek.pnq.redhat.com>
 <5123AB55.2070100@twiddle.net> <1361304381.581.80.camel@triegel.csb>
 <1361379598.19573.167.camel@spokane1.rchland.ibm.com>
Date: Wed, 20 Feb 2013 20:25:00 -0000
Message-ID: <1361391926.581.1774.camel@triegel.csb>

On Wed, 2013-02-20 at 10:59 -0600, Steven Munroe wrote:
> On Tue, 2013-02-19 at 21:06 +0100, Torvald Riegel wrote:
> > On Tue, 2013-02-19 at 17:18 +0000, Joseph S. Myers wrote:
> > > On Tue, 19 Feb 2013, Richard Henderson wrote:
> > >
> > > > Any chance we can move these macros into a generic linux
> > > > header?  Given that we're using INTERNAL_SYSCALL macros, the
> > > > definitions ought to be the same for all targets.
> > >
> > > Generally most of lowlevellock.h should probably be shared
> > > between architectures.  (If some architectures don't implement a
> > > particular feature as of a particular kernel version, that's a
> > > matter for kernel-features.h and __ASSUME_* conditionals.)
> >
> > On a related note: what are the reasons for having arch-specific
> > assembler versions of many of the synchronization operations?  I
> > would be surprised if they provided a significant performance
> > advantage; does anyone have recent measurements for this?
> >
> The introduction of GCC compiler builtins like __sync is fairly
> recent, and the new __atomic builtins start with GCC-4.7.  So until
> recently we had no choice.

Using assembler for the atomic operations is possible (e.g., as in
Boehm's libatomic-ops, or in ./sysdeps/powerpc/bits/atomic.h and
others).  It doesn't allow the same level of compiler optimization
across barriers, but it's unclear whether that has much benefit, and
GCC doesn't do it yet anyway.
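For illustration, an assembler-based atomic in that style might look
roughly like the following.  This is only a sketch modeled on the
powerpc atomic.h approach (a load-reserve/store-conditional loop with
an isync acquire barrier), not a verbatim copy of the glibc code, and
the function name is made up:

  /* Acquire-MO atomic exchange via lwarx/stwcx., roughly the shape
     used in sysdeps/powerpc/bits/atomic.h.  Sketch only.  */
  static inline int
  atomic_exchange_acq_sketch (int *mem, int value)
  {
    int result;
    __asm__ __volatile__ ("1: lwarx  %0,0,%2\n"
                          "   stwcx. %3,0,%2\n"
                          "   bne-   1b\n"  /* retry if reservation lost */
                          "   isync"        /* acquire barrier */
                          : "=&r" (result), "+m" (*mem)
                          : "b" (mem), "r" (value)
                          : "cr0", "memory");
    return result;
  }

The compiler has to treat the asm as an opaque, memory-clobbering
operation, which is precisely why it cannot optimize across it the way
a true C11-model builtin would permit.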
There are some cases in which compilers that don't support the
C11/C++11 memory model can generate code that wouldn't be correct in
such a model, and which can theoretically interfere with other
concurrent code (e.g., introduce data races due to accesses being too
wide).  However, because we don't have custom assembler for
everything, we should already be exposed to that.

> For platforms (like PowerPC) that implement acquire/release, the GCC
> __sync builtins are not sufficient, and the GCC-4.7 __atomic
> builtins are not pervasive enough to make that the default.

I agree regarding the __sync builtins, but using assembler in place of
the __atomic builtins should work, shouldn't it?

> > It seems to me that it would be useful to consolidate the
> > different versions that exist for the synchronization operations
> > into shared C code, as long as this doesn't make a significant
> > performance difference.  They are all based on atomic operations
> > and futex operations, both of which we have in C code (especially
> > if we have compilers that support the C11 memory model).  Or are
> > there other reasons for keeping different versions that I'm not
> > aware of?
> >
> I disagree.  The performance of lowlevellocks and the associated
> platform-specific optimizations is too important to move forward
> with the consolidation you suggest.

Which specific optimizations do you refer to?  I didn't see any for
powerpc, for example (i.e., the lock fast path is C up to the point of
the atomic operation; see the sketch below my signature [1]).  The
ones that I saw are for x86, and I'm wondering whether they provide
much benefit, especially because they can mostly only matter on the
execution path taken when a free lock is acquired; once you get any
cache miss, you're to some extent on the slow path anyway.  Also, for
the Linux platforms I looked at, the mutex algorithms are the same.

Do you have any recent measurements (or could you point to some) that
show the benefit of the optimizations you refer to?

For example, we've spent quite some time debugging a PI cond var
failure in the past, and this wasn't made any easier by having several
(different) versions of the cond var implementation.

Torvald
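[1] To make concrete what "the fast path is C up to the atomic
operation" means, here is a minimal sketch of a lowlevellock-style
mutex using C11 atomics and the classic futex state encoding
(0 = unlocked, 1 = locked, 2 = locked with waiters).  This is my
illustration, not glibc's actual lowlevellock code, and the names are
made up:

  #include <stdatomic.h>
  #include <linux/futex.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  typedef struct { atomic_int state; } lll_t;

  static void
  lll_lock (lll_t *l)
  {
    int zero = 0;
    /* Fast path: uncontended acquisition is a single CAS, all in C.  */
    if (atomic_compare_exchange_strong_explicit (&l->state, &zero, 1,
                                                 memory_order_acquire,
                                                 memory_order_relaxed))
      return;
    /* Slow path: mark the lock contended and sleep in the kernel.  */
    while (atomic_exchange_explicit (&l->state, 2,
                                     memory_order_acquire) != 0)
      syscall (SYS_futex, &l->state, FUTEX_WAIT, 2, NULL, NULL, 0);
  }

  static void
  lll_unlock (lll_t *l)
  {
    /* Wake one waiter only if the contended state was observed.  */
    if (atomic_exchange_explicit (&l->state, 0,
                                  memory_order_release) == 2)
      syscall (SYS_futex, &l->state, FUTEX_WAKE, 1, NULL, NULL, 0);
  }

Everything up to the compare-and-exchange is plain C, so consolidating
this across architectures would leave only the atomic operations
themselves (and the futex call) as per-arch concerns.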