From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 7436 invoked by alias); 4 Sep 2008 18:37:11 -0000 Received: (qmail 7427 invoked by uid 22791); 4 Sep 2008 18:37:10 -0000 X-Spam-Check-By: sourceware.org Received: from g1t0028.austin.hp.com (HELO g1t0028.austin.hp.com) (15.216.28.35) by sourceware.org (qpsmtpd/0.31) with ESMTP; Thu, 04 Sep 2008 18:36:18 +0000 Received: from G6W0640.americas.hpqcorp.net (g6w0640.atlanta.hp.com [16.230.34.76]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) by g1t0028.austin.hp.com (Postfix) with ESMTP id 53E2C1CA86; Thu, 4 Sep 2008 18:36:17 +0000 (UTC) Received: from G4W1853.americas.hpqcorp.net (16.234.97.231) by G6W0640.americas.hpqcorp.net (16.230.34.76) with Microsoft SMTP Server (TLS) id 8.1.263.0; Thu, 4 Sep 2008 18:35:12 +0000 Received: from GVW0436EXB.americas.hpqcorp.net ([16.234.32.153]) by G4W1853.americas.hpqcorp.net ([16.234.97.231]) with mapi; Thu, 4 Sep 2008 18:35:12 +0000 From: "Boehm, Hans" To: Andrew Haley , Java Patch List Date: Fri, 05 Sep 2008 09:39:00 -0000 Subject: RE: Remove data race in libgcj interpreter Message-ID: <238A96A773B3934685A7269CC8A8D0423152AF178E@GVW0436EXB.americas.hpqcorp.net> References: <48AD67B2.4040308@redhat.com> <48C0042C.5080804@redhat.com> In-Reply-To: <48C0042C.5080804@redhat.com> Accept-Language: en-US Content-Language: en-US acceptlanguage: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Mailing-List: contact java-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: java-patches-owner@gcc.gnu.org X-SW-Source: 2008-q3/txt/msg00074.txt.bz2 > > + > /************************************************************* > ********** > + Locking in the direct-threaded case is quite subtle. See PR > + libgcj/8995 for more details. > + > + We must ensure that no-one ever reads an inconsistent {insn,value} > + pair. Rather than trying to lock everything we detect when some > + other thread has modified the insn we are trying to > modify and back > + out. > + > + When we read a pair before rewriting it we do this: > + > + i1 =3D insn > + v =3D value > + lock > + i2 =3D insn > + unlock > + if (i2 !=3D i1) try again This is probably statistically likely to generate correct code on some mach= ines, but it doesn't look correct to me. I think you're counting on the th= ree loads to be performed in the order written. But the compiler gets to r= eshuffle the first two. And there is generally no prohibition against movi= ng loads INTO a critical section, so they can in fact also be arbitrarily r= eordered with the i2 assignment as well. (On IA64 and SPARC, the hardware = might conceivably do that, depending on the lock implementation. There is = a long story about when this is Posix compatible, but it doesn't really mat= ter here.) There is also another important issue: Posix and the upcoming C++ standard= (among others) give undefined behavior to any program involving a data rac= e. Clearly this program involves data races on the initial insn and value = reads. The problem is that there are several good reasons for the Posix an= d C++ positions: 1) Compilers, including I believe gcc, implicitly assume that there are no = asynchronous changes on non-volatile variables. If you invalidate this ass= umption, weird things may happen. These weird things don't always correspo= nd to reading a different value. For example, the compiler may reread a re= gister copy of the value from the shared variable when it has to spill the = register, making it appear that a local variable (which is a copy of the sh= ared variable) suddenly changes without an assignment to it. It's not hard= to come up with cases in which this would provoke a crash. Empirically, t= his sort of thing doesn't happen much, but we were told at some point that = gcc may behave this way. 2) I would guess that gcc supports some platforms on which things like poin= ter loads are not always atomic. 3) In the not-too-distant future, we'd really like to have reliable data-ra= ce detectors. This is much easier if code doesn't contain intentional data= races. 4) None of us has managed to come up with an entirely satisfactory definiti= on of what programs with data races mean. The JSR133 definition is reasona= bly close, but still has some issues that have come up recently. And it wo= uld probably have more serious performance implications on something like C= ++. I haven't thought though this through enough. Does it work to instead keep= a shared (possibly per method?) count of the number of running interpreter= s, together with a shared bit (tested on backward branches) asking them to = stop? You only modify the code if no interpreters are running (in that met= hod). Getting this right is still nontrivial, but the races are restricted= to well-defined atomic variables, which is fine. Hans > + > + and when we write a pair we do this: > + > + lock > + value =3D v > + write barrier > + insn =3D i > + unlock > + > + There is a read barrier in NEXT_INSN which ensures that we don't > + read a rewritten value with a stale insn. > + > +