From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 24493 invoked by alias); 21 Jul 2010 18:43:55 -0000 Received: (qmail 23188 invoked by uid 48); 21 Jul 2010 18:43:37 -0000 Date: Wed, 21 Jul 2010 18:43:00 -0000 Subject: [Bug libgomp/45025] New: Memory ordering issues with libgomp critical sections and __sync X-Bugzilla-Reason: CC Message-ID: Reply-To: gcc-bugzilla@gcc.gnu.org To: gcc-bugs@gcc.gnu.org From: "Hans dot Boehm at hp dot com" Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-bugs-owner@gcc.gnu.org X-SW-Source: 2010-07/txt/msg02258.txt.bz2 This is really a continuation of bug 42869, but that title mentions Itanium, and this is not Itanium-specific. Gomp_mutex_lock and as a result OpenMP critical sections and locks do not implement sane memory ordering semantics. This is largely due to __sync implementation problems. I looked at Itanium and PowerPC, but other weakly ordered architectures are probably similar. I was trying to understand the memory ordering semantics of OpenMP critical sections with gcc, since the 3.0 OpenMP spec requires full fence semantics, which seems to be overkill. I concluded that the current state of affairs in 4.4.4 doesn't make much sense, independent of the spec interpretation. Specific issues: 1. The Itanium version of gomp_mutex_lock still has the fence on the wrong side of the cmpxchg4.rel. This is a __sync_bool_compare_and_swap problem, and was fixed in the trunk (bug 42869, thanks Steve!) but still appears to be wrong in the 4.4 series. I think this needs a backport to 4.4, since we may effectively be generating critical sections that don't guarantee mutual exclusion. (I don't know whether current hardware provides stronger properties that mitigate this, but ...) 2. The PowerPC & Itanium version of gomp_mutex_lock explicitly calls __sync_bool_compare_and_swap. Nonetheless the PowerPC code ends up generating a code sequence along the lines of lwsync; lwarx; stwcx.; bne; isync As far as I can determine, this does not order a preceding store with respect to a later load. (It uses lwsync, not sync.) It's unclear whether this is a problem with respect to the OpenMP spec. (Technically it is for 3.0, but 3.1 was reworded, on my suggestion, to allow this, since it's a useless restriction.) But I think it clearly fails to satisfy the __sync_bool_compare_and_swap spec, which claims this should be a "full barrier". More generally, gomp_mutex_lock() appears inconsistent about whether it satisfies the OpenMP 3.0 and earlier rules that it include a flush or full barrier. Modulo the earlier bug, the Itanium versions do. The PowerPC version does for unlock, but not lock. (And given that it doesn't, I'm not sure there's a point behind the lwsync in the lock sequence either.) Since this choice has a significant performance impact on both architectures, this seems to be a way to get the worst of all possible worlds; we get both weak guarantees and, in many cases, bad performance. I would recommend following the OpenMP3.1 draft spec, and insisting on only acquire/release semantics for gomp_mutex_(un)lock. -- Summary: Memory ordering issues with libgomp critical sections and __sync Product: gcc Version: 4.4.4 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgomp AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: Hans dot Boehm at hp dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45025