From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 9434 invoked by alias); 10 Jul 2015 09:14:11 -0000
Mailing-List: contact jit-help@gcc.gnu.org; run by ezmlm
In-Reply-To: <1436493224.31573.32.camel@surprise>
References: <559EF2F1.6000000@starynkevitch.net> <1436493224.31573.32.camel@surprise>
From: Armin Rigo
Date: Thu, 01 Jan 2015 00:00:00 -0000
Subject: Re: GCC/JIT and precise garbage collection support?
To: David Malcolm
Cc: Basile Starynkevitch, jit@gcc.gnu.org, GCC Development

Hi David, hi Basile,

On 10 July 2015 at 03:53, David Malcolm wrote:
> FWIW PyPy (an implementation of Python) defaults to using true GC, and
> could benefit from GC support in GCC; currently PyPy has a nasty hack
> for locating on-stack GC roots, by compiling to assembler, then carving
> up the assembler with regexes to build GC metadata.

A first note: write barriers, stack walking, and so on can all be
implemented manually.  The only thing that cannot be implemented
easily is stack maps.

Here is, in more detail, how the PyPy hacks work, in case there is
interest.  It might be possible to do it cleanly with minimal changes
in GCC (hopefully?).

The goal: when a garbage collection occurs, we need to locate and
possibly change the GC pointers in the stack.  (They may have been
originally in callee-saved registers, saved by some callee.)  So this
is about writing some "stack map" that describes where the values are
around all calls in the stack.

To do that, we put in the C sources "v = pypy_asm_gcroot(v);" for all
GC-pointer variables after each call (at least each call that can
recursively end up collecting):

/* The following pseudo-instruction is used by --gcrootfinder=asmgcc
   just after a call to tell gcc to put a GCROOT mark on each gc-pointer
   local variable.  All such local variables need to go through a "v =
   pypy_asm_gcroot(v)".  The old value should not be used any more by
   the C code; this prevents the following case from occurring: gcc
   could make two copies of the local variable (e.g. one in the stack
   and one in a register), pass one to GCROOT, and later use the other
   one.  In practice the pypy_asm_gcroot() is often a no-op in the final
   machine code and doesn't prevent most optimizations.  */

/* With gcc, getting the asm() right was tricky, though.
   The asm() is not volatile so that gcc is free to delete it if the
   output variable is not used at all.  We need to prevent gcc from
   moving the asm() *before* the call that could cause a collection;
   this is the purpose of the (unused) __gcnoreorderhack input argument.
   Any memory input argument would have this effect: as far as gcc knows
   the call instruction can modify arbitrary memory, thus creating the
   order dependency that we want.  */

#define pypy_asm_gcroot(p) ({void*_r; \
            asm ("/* GCROOT %0 */" : "=g" (_r) : \
                 "0" (p), "m" (__gcnoreorderhack)); \
            _r; })

This puts a comment in the .s file, which we post-process.  The goal
of this post-processing is to find the GCROOT comments, see what value
they mention, and track where this value comes from at the preceding
call.  This is the messy part, because the value can often move
around, sometimes across jumps.  We also track if and where the
callee-saved registers end up being saved.

At the end we generate some static data: a map from every CALL
location to a list of GC pointers which are live across this call,
written out as a list of callee-saved registers and stack locations.
This static data is read by custom platform-specific code in the stack
walker.

This works well enough because, from gcc's point of view, all GC
pointers after a CALL are only used as arguments to "v2 =
pypy_asm_gcroot(v)".  GCC is not allowed to do things like precompute
offsets inside GC objects, because v2 != v (which is true if the GC
moved the object) and v2 is only created by the pypy_asm_gcroot()
after the call.

The drawback of this "asm" statement (besides being detached from the
CALL) is that, even though we say "=g", a pointer that lives on the
stack will often be loaded into a register just before the "asm" and
spilled again to a (likely different) stack location afterwards.  This
creates some pointless data movements.  This seems to degrade
performance by at most a few percent, so it's fine for us.

So what would a GCC-supported solution look like?
Maybe a single builtin that does a call and at the same time "marks"
some local variables (for read/write).  It would be enough if a CALL
emitted from this builtin is immediately followed by an assembler
pseudo-instruction that describes the location of all the local
variables listed (plus context information: the current stack frame's
depth, and where callee-saved registers have been saved).  This would
mean the user of this builtin still needs to come up with custom tools
to post-process the assembler, but it is probably the simplest and
most flexible solution.

I may be wrong in thinking any of this would be easy, though...


A bientôt,

Armin.
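
For illustration only, here is a minimal sketch in C of what the
"static data" described above (a map from each CALL location to the GC
pointers live across it) and its lookup by a stack walker might look
like.  This is not PyPy's actual format: the type names, the location
encoding, the use of the return address as the key, the binary search
and the example addresses are all assumptions made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding of where one live GC pointer sits at a call. */
enum loc_kind { LOC_CALLEE_SAVED_REG, LOC_STACK_SLOT };

struct gcroot_loc {
    enum loc_kind kind;
    int32_t value;  /* register number, or byte offset from the frame base */
};

/* One entry per CALL, keyed by the address just after the CALL
   (the return address that a stack walker finds in the frame). */
struct callsite_entry {
    uintptr_t return_addr;
    size_t nroots;
    const struct gcroot_loc *roots;
};

/* Binary search over entries sorted by ascending return_addr.
   Returns NULL if the address is not a known call site. */
static const struct callsite_entry *
find_callsite(const struct callsite_entry *map, size_t n, uintptr_t ret)
{
    size_t lo = 0, hi = n;
    assert(map != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (map[mid].return_addr == ret)
            return &map[mid];
        if (map[mid].return_addr < ret)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/* Example table with made-up addresses and locations. */
static const struct gcroot_loc roots_a[] = {
    { LOC_CALLEE_SAVED_REG, 3 },
    { LOC_STACK_SLOT, -16 },
};
static const struct gcroot_loc roots_b[] = {
    { LOC_STACK_SLOT, -8 },
};
static const struct callsite_entry example_map[] = {
    { 0x1000, 2, roots_a },
    { 0x1040, 1, roots_b },
};
```

A stack walker would call find_callsite() with the return address of
each frame, then visit (and possibly update) every listed location.
Sorting the table by address keeps the lookup logarithmic in the
number of call sites; a hash table would work just as well.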