From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 28426 invoked by alias); 8 May 2013 21:25:12 -0000
Mailing-List: contact libc-ports-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: libc-ports-owner@sourceware.org
Received: (qmail 28378 invoked by uid 89); 8 May 2013 21:25:08 -0000
X-Spam-SWARE-Status: No, score=0.1 required=5.0 tests=AWL,BAYES_00,KHOP_DYNAMIC2,RDNS_DYNAMIC,TVD_RCVD_IP autolearn=no version=3.3.1
X-Spam-User: qpsmtpd, 2 recipients
Received: from 216-12-86-13.cv.mvl.ntelos.net (HELO brightrain.aerifal.cx) (216.12.86.13) by sourceware.org (qpsmtpd/0.84/v0.84-167-ge50287c) with ESMTP; Wed, 08 May 2013 21:25:07 +0000
Received: from dalias by brightrain.aerifal.cx with local (Exim 3.15 #2) id 1UaBrC-00079z-00; Wed, 08 May 2013 21:25:02 +0000
Date: Wed, 08 May 2013 21:25:00 -0000
To: Torvald Riegel
Cc: GLIBC Devel , libc-ports
Subject: Re: [PATCH] Unify pthread_once (bug 15215)
Message-ID: <20130508212502.GF20323@brightrain.aerifal.cx>
References: <1368024237.7774.794.camel@triegel.csb> <20130508175132.GB20323@brightrain.aerifal.cx> <1368046046.7774.1441.camel@triegel.csb>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1368046046.7774.1441.camel@triegel.csb>
User-Agent: Mutt/1.5.21 (2010-09-15)
From: Rich Felker
X-SW-Source: 2013-05/txt/msg00040.txt.bz2

On Wed, May 08, 2013 at 10:47:26PM +0200, Torvald Riegel wrote:
> On Wed, 2013-05-08 at 13:51 -0400, Rich Felker wrote:
> > On Wed, May 08, 2013 at 04:43:57PM +0200, Torvald Riegel wrote:
> > > Note that this will make a call to pthread_once that doesn't need to
> > > actually run the init routine slightly slower due to the additional
> > > acquire barrier. If you're really concerned about this overhead, speak
> > > up. There are ways to avoid it, but it comes with additional complexity
> > > and bookkeeping.
> >
> > On the one hand, I think it should be avoided if at all possible.
> > pthread_once is the correct, canonical way to do initialization (as
> > opposed to hacks like library init functions or global ctors), and the
> > main doubt lots of people have about doing it the correct way is that
> > they're going to kill performance if they call pthread_once from every
> > point where initialization needs to have been completed. If every call
> > imposes memory synchronization, performance might become a real issue
> > discouraging people from following best practices for library
> > initialization.
>
> Well, what we precisely need is that the initialization happens-before
> (ie, the relation from the, say, C11 memory model) every call that does
> not in fact initialize. If initialization happened on another thread,
> you need to synchronize. But from there on, you are essentially free to
> establish this in any way you want. And there are ways, because
> happens-before is more-or-less transitive.
>
> > On the other hand, I don't think it's conforming to elide the barrier.
> > POSIX states (XSH 4.11 Memory Synchronization):
> >
> > "The pthread_once() function shall synchronize memory for the first
> > call in each thread for a given pthread_once_t object."
>
> No, it's not. You could see just parts of the effects of the
> initialization; potentially reading garbage can't be the intended
> semantics :)

The work of synchronizing memory should take place at the end of the
pthread_once call that actually does the initialization, rather than
in the other threads which synchronize. This is the way the x86 memory
model naturally works, but perhaps it's prohibitive to achieve on
other architectures. However, the idea is that pthread_once only runs
init routines a small finite number of times, so even if you had to do
some horrible hack that makes the synchronization on return 1000x
slower (e.g.
a syscall), it would still be better than incurring the cost of a full
acquire barrier in each subsequent call, which ideally should have the
same cost as a call to an empty function.

> > Since it's impossible to track whether a call is the first call in a
> > given thread
>
> Are you sure about this? :)

It's impossible with bounded memory requirements, and thus impossible
in general (allocating memory for the tracking might fail).

> > this means every call to pthread_once() is required to
> > be a full memory barrier.
>
> Note that we do not need a full memory barrier, just an acquire memory
> barrier. So this only matters on architectures with memory models that
> give weaker per-default ordering guarantees. For example, this doesn't
> add any hardware barrier instructions on x86 or Sparc TSO. But for
> Power and ARM it does.

Yes, I see that.

> > I suspect this is unintended, and we should
> > perhaps file a bug report with the Austin Group and see if the
> > requirement can be relaxed.
>
> I don't think that other semantics are intended. If you return from
> pthread_once(), initialization should have happened before that. If it
> doesn't, you don't really know whether initialization happened once, so
> programs would be forced to do their own synchronization.

I think my confusion is merely that POSIX does not define the phrase
"synchronize memory", and in the absence of a definition, "full memory
barrier" (both release and acquire semantics) is the only reasonable
interpretation I can find. In other words, it seems like a
pathological conforming program could attempt to use the language in
the specification to use pthread_once as a release barrier. I'm not
sure if there are ways this could be meaningfully arranged (i.e. with
well-defined ordering); off-hand, I would think tricks with cancelling
an in-progress invocation of pthread_once might make it possible.
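
To make the per-call cost under discussion concrete, here is a minimal
sketch of a once primitive written with C11 atomics. The names
sketch_once_t, SKETCH_ONCE_INIT, sketch_once, init_calls and count_init
are hypothetical, chosen only for illustration. This is not glibc's
pthread_once; it assumes a plain mutex for the slow path and ignores
cancellation and futex-based waiting entirely.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical once object: a completion flag plus a mutex that
 * serializes the threads racing to run the init routine. */
typedef struct {
    atomic_int done;
    pthread_mutex_t lock;
} sketch_once_t;

#define SKETCH_ONCE_INIT { 0, PTHREAD_MUTEX_INITIALIZER }

static void sketch_once(sketch_once_t *o, void (*init)(void))
{
    /* Fast path, taken by every call after initialization. The
     * acquire load pairs with the release store below, so the
     * effects of init() happen-before this return. */
    if (atomic_load_explicit(&o->done, memory_order_acquire))
        return;

    /* Slow path, taken only around the time of initialization. */
    pthread_mutex_lock(&o->lock);
    if (!atomic_load_explicit(&o->done, memory_order_relaxed)) {
        init();
        /* Publish completion; this is the release side. */
        atomic_store_explicit(&o->done, 1, memory_order_release);
    }
    pthread_mutex_unlock(&o->lock);
}

/* Illustrative init routine that counts how often it runs. */
static int init_calls;
static void count_init(void) { init_calls++; }
```

On x86 the acquire load in the fast path is an ordinary load, while on
Power and ARM it requires a barrier (or an acquire-load instruction)
on every call, which is the overhead in question.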

By the way, cancellation probably makes the above POSIX text incorrect
anyway; a thread could call pthread_once on the same pthread_once_t
object more than once, with the second call not being a no-op, if the
initialization routine for the first call is cancelled and the second
call takes place from a cancellation cleanup handler.

Rich