On Thu, 2013-05-23 at 00:15 -0400, Carlos O'Donell wrote: > On 05/08/2013 10:43 AM, Torvald Riegel wrote: > > See http://sourceware.org/bugzilla/show_bug.cgi?id=15215 for background. > > You've already hashed out the details of these changes with Rich > and he has no objection with this first phase of the patch which > is to unify the implementations. > > > I1 and I2 follow essentially the same algorithm, and we can replace it > > with a unified variant, as the bug suggests. See the attached patch for > > a modified version of the sparc instance. The differences between both > > are either cosmetic, or are unnecessary changes (ie, how the > > init-finished state is set (atomic_inc vs. store), or how the fork > > generations are compared). > > > > Both I1 and I2 were missing a release memory order (MO) when marking > > once_control as finished initialization. If the particular arch doesn't > > need a HW barrier for release, we at least need a compiler barrier; if > > it's needed, the original I1 and I2 are not guaranteed to work. > > > > Both I1 and I2 were missing acquire MO on the very first load of > > once_control. This needs to synchronize with the release MO on setting > > the state to init-finished, so without it it's not guaranteed to work > > either. > > Note that this will make a call to pthread_once that doesn't need to > > actually run the init routine slightly slower due to the additional > > acquire barrier. If you're really concerned about this overhead, speak > > up. There are ways to avoid it, but it comes with additional complexity > > and bookkeeping. > > We want correctness. This is a place where correctness is infinitely > more important than speed. We should be correct first and then we > should argue about how to make it fast. > > > I'm currently also using the existing atomic_{read/write}_barrier > > functions instead of not-yet-existing load_acq or store_rel functions. > > I'm not sure whether the latter can have somewhat more efficient > > implementations on Power and ARM; if so, and if you're concerned about > > the overhead, we can add load_acq and store_rel to atomic.h and start > > using it. This would be in line with C11, where we should eventually be > > heading to anyways, IMO. > > Agreed. > > > Both I1 and I2 have an ABA issue on __fork_generation, as explained in > > the comments that the patch adds. How do you all feel about this? > > I can't present a simple fix right now, but I believe it could be fixed > > with additional bookkeeping. > > > > If there's no objection to the essence of this patch, I'll post another > > patch that actually replaces I1 and I2 with the modified variant in the > > attached patch. > > Please repost. See attached patch. This has been tested on ppc64 but not on the other archs that are affected. Nonetheless, ppc has a weak memory model, so, for example, having an acquire barrier on a load or not having it does make a difference. OK?