From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <java-patches-return-19230-listarch-java-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 7436 invoked by alias); 4 Sep 2008 18:37:11 -0000
Received: (qmail 7427 invoked by uid 22791); 4 Sep 2008 18:37:10 -0000
X-Spam-Check-By: sourceware.org
Received: from g1t0028.austin.hp.com (HELO g1t0028.austin.hp.com) (15.216.28.35)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Thu, 04 Sep 2008 18:36:18 +0000
Received: from G6W0640.americas.hpqcorp.net (g6w0640.atlanta.hp.com [16.230.34.76]) 	(using TLSv1 with cipher RC4-MD5 (128/128 bits)) 	(No client certificate requested) 	by g1t0028.austin.hp.com (Postfix) with ESMTP id 53E2C1CA86; 	Thu,  4 Sep 2008 18:36:17 +0000 (UTC)
Received: from G4W1853.americas.hpqcorp.net (16.234.97.231) by  G6W0640.americas.hpqcorp.net (16.230.34.76) with Microsoft SMTP Server (TLS)  id 8.1.263.0; Thu, 4 Sep 2008 18:35:12 +0000
Received: from GVW0436EXB.americas.hpqcorp.net ([16.234.32.153]) by  G4W1853.americas.hpqcorp.net ([16.234.97.231]) with mapi; Thu, 4 Sep 2008  18:35:12 +0000
From: "Boehm, Hans" <hans.boehm@hp.com>
To: Andrew Haley <aph@redhat.com>, Java Patch List <java-patches@gcc.gnu.org>
Date: Fri, 05 Sep 2008 09:39:00 -0000
Subject: RE: Remove data race in libgcj interpreter
Message-ID: <238A96A773B3934685A7269CC8A8D0423152AF178E@GVW0436EXB.americas.hpqcorp.net>
References: <48AD67B2.4040308@redhat.com> <48C0042C.5080804@redhat.com>
In-Reply-To: <48C0042C.5080804@redhat.com>
Accept-Language: en-US
Content-Language: en-US
acceptlanguage: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Mailing-List: contact java-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <java-patches.gcc.gnu.org>
List-Subscribe: <mailto:java-patches-subscribe@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/java-patches/>
List-Post: <mailto:java-patches@gcc.gnu.org>
List-Help: <mailto:java-patches-help@gcc.gnu.org>, <http://gcc.gnu.org/ml/#faqs>
Sender: java-patches-owner@gcc.gnu.org
X-SW-Source: 2008-q3/txt/msg00074.txt.bz2

>
> +
> /*************************************************************
> **********
> +   Locking in the direct-threaded case is quite subtle.  See PR
> +   libgcj/8995 for more details.
> +
> +   We must ensure that no-one ever reads an inconsistent {insn,value}
> +   pair.  Rather than trying to lock everything we detect when some
> +   other thread has modified the insn we are trying to
> modify and back
> +   out.
> +
> +   When we read a pair before rewriting it we do this:
> +
> +     i1 =3D insn
> +     v =3D value
> +     lock
> +       i2 =3D insn
> +     unlock
> +     if (i2 !=3D i1) try again
This is probably statistically likely to generate correct code on some mach=
ines, but it doesn't look correct to me.  I think you're counting on the th=
ree loads to be performed in the order written.  But the compiler gets to r=
eshuffle the first two.  And there is generally no prohibition against movi=
ng loads INTO a critical section, so they can in fact also be arbitrarily r=
eordered with the i2 assignment as well.  (On IA64 and SPARC, the hardware =
might conceivably do that, depending on the lock implementation.  There is =
a long story about when this is Posix compatible, but it doesn't really mat=
ter here.)

There is also another important issue:  Posix and the upcoming C++ standard=
 (among others) give undefined behavior to any program involving a data rac=
e.  Clearly this program involves data races on the initial insn and value =
reads.  The problem is that there are several good reasons for the Posix an=
d C++ positions:

1) Compilers, including I believe gcc, implicitly assume that there are no =
asynchronous changes on non-volatile variables.  If you invalidate this ass=
umption, weird things may happen.  These weird things don't always correspo=
nd to reading a different value.  For example, the compiler may reread a re=
gister copy of the value from the shared variable when it has to spill the =
register, making it appear that a local variable (which is a copy of the sh=
ared variable) suddenly changes without an assignment to it.  It's not hard=
 to come up with cases in which this would provoke a crash.  Empirically, t=
his sort of thing doesn't happen much, but we were told at some point that =
gcc may behave this way.

2) I would guess that gcc supports some platforms on which things like poin=
ter loads are not always atomic.

3) In the not-too-distant future, we'd really like to have reliable data-ra=
ce detectors.  This is much easier if code doesn't contain intentional data=
 races.

4) None of us has managed to come up with an entirely satisfactory definiti=
on of what programs with data races mean.  The JSR133 definition is reasona=
bly close, but still has some issues that have come up recently.  And it wo=
uld probably have more serious performance implications on something like C=
++.

I haven't thought though this through enough.  Does it work to instead keep=
 a shared (possibly per method?) count of the number of running interpreter=
s, together with a shared bit (tested on backward branches) asking them to =
stop?  You only modify the code if no interpreters are running (in that met=
hod).  Getting this right is still nontrivial, but the races are restricted=
 to well-defined atomic variables, which is fine.

Hans

> +
> +   and when we write a pair we do this:
> +
> +      lock
> +        value =3D v
> +        write barrier
> +        insn =3D i
> +      unlock
> +
> +   There is a read barrier in NEXT_INSN which ensures that we don't
> +   read a rewritten value with a stale insn.
> +
> +