From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 20495 invoked by alias); 10 Jul 2015 14:19:52 -0000 Mailing-List: contact jit-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Post: List-Help: List-Subscribe: Sender: jit-owner@gcc.gnu.org Received: (qmail 20476 invoked by uid 89); 10 Jul 2015 14:19:51 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Checked: by ClamAV 0.98.7 on sourceware.org X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=no version=3.3.2 X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on sourceware.org X-Spam-Level: X-Spam-User: qpsmtpd, 2 recipients X-HELO: mx1.redhat.com Message-ID: <1436537517.31573.48.camel@surprise> Subject: Re: GCC/JIT and precise garbage collection support? From: David Malcolm To: Armin Rigo Cc: Basile Starynkevitch , jit@gcc.gnu.org, GCC Development Date: Thu, 01 Jan 2015 00:00:00 -0000 In-Reply-To: References: <559EF2F1.6000000@starynkevitch.net> <1436493224.31573.32.camel@surprise> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23 X-SW-Source: 2015-q3/txt/msg00078.txt.bz2 On Fri, 2015-07-10 at 11:13 +0200, Armin Rigo wrote: > Hi David, hi Basile, > > On 10 July 2015 at 03:53, David Malcolm wrote: > > FWIW PyPy (an implementation of Python) defaults to using true GC, and > > could benefit from GC support in GCC; currently PyPy has a nasty hack > > for locating on-stack GC roots, by compiling to assembler, then carving > > up the assembler with regexes to build GC metadata. > > A first note: write barriers, stack walking, and so on can all be > implemented manually. The only thing that cannot be implemented > easily is stack maps. > > Here's in more details how the PyPy hacks work, in case there is > interest. It might be possible to do it cleanly with minimal changes > in GCC (hopefully?). > > The goal: when a garbage collection occurs, we need to locate and > possibly change the GC pointers in the stack. (They may have been > originally in callee-saved registers, saved by some callee.) So this > is about writing some "stack map" that describes where the values are > around all calls in the stack. To do that, we put in the C sources "v > = pypy_asm_gcroot(v);" for all GC-pointer variables after each call > (at least each call that can recursively end up collecting): > > > /* The following pseudo-instruction is used by --gcrootfinder=asmgcc > just after a call to tell gcc to put a GCROOT mark on each gc-pointer > local variable. All such local variables need to go through a "v = > pypy_asm_gcroot(v)". The old value should not be used any more by > the C code; this prevents the following case from occurring: gcc > could make two copies of the local variable (e.g. one in the stack > and one in a register), pass one to GCROOT, and later use the other > one. In practice the pypy_asm_gcroot() is often a no-op in the final > machine code and doesn't prevent most optimizations. */ > > /* With gcc, getting the asm() right was tricky, though. The asm() is > not volatile so that gcc is free to delete it if the output variable > is not used at all. We need to prevent gcc from moving the asm() > *before* the call that could cause a collection; this is the purpose > of the (unused) __gcnoreorderhack input argument. Any memory input > argument would have this effect: as far as gcc knows the call > instruction can modify arbitrary memory, thus creating the order > dependency that we want. */ > > #define pypy_asm_gcroot(p) ({void*_r; \ > asm ("/* GCROOT %0 */" : "=g" (_r) : \ > "0" (p), "m" (__gcnoreorderhack)); \ > _r; }) > > > This puts a comment in the .s file, which we post-process. The goal > of this post-processing is to find the GCROOT comments, see what value > they mention, and track where this value comes from at the preceding > call. This is the messy part, because the value can often move > around, sometimes across jumps. > > We also track if and where the callee-saved registers end up being saved. > > At the end we generate some static data: a map from every CALL > location to a list of GC pointers which are live across this call, > written out as a list of callee-saved registers and stack locations. > This static data is read by custom platform-specific code in the stack > walker. > > This works well enough because, from gcc's point of view, all GC > pointers after a CALL are only used as arguments to "v2 = > pypy_asm_gcroot(v)". GCC is not allowed to do things like precompute > offsets inside GC objects---because v2 != v (which is true if the GC > moved the object) and v2 is only created by the pypy_asm_gcroot() > after the call. > > The drawback of this "asm" statement (besides being detached from the > CALL) is that, even though we say "=g", a stack pointer will often be > loaded into a register just before the "asm" and spilled again to a > (likely different) stack location afterwards. This creates some > pointless data movements. This seems to degrade performance by at > most a few percents, so it's fine for us. > > So how would a GCC-supported solution look like? Maybe a single > builtin that does a call and at the same time "marks" some local > variables (for read/write). It would be enough if a CALL emitted from > this built-in is immediately followed by an assembler > pseudo-instruction that describe the location of all the local > variables listed (plus context information: the current stack frame's > depth, and where callee-saved registers have been saved). This would > mean the user of this builtin still needs to come up with custom tools > to post-process the assembler, but it is probably the simplest and > most flexible solution. I may be wrong about thinking any of this > would be easy, though... Presumably you'd want some kind of: -fgenerate-stack-maps option (perhaps taking an argument describing the data format?) Thinking aloud here, maybe a way to implement this is to have some kind of annotation on each call describing the GC-vars that are live at the call. This somehow has to be created/maintained through the various IR formats and lowerings. I'm still relatively new to the GCC backend, so take the following with a large pinch of salt... AIUI, we lose precise type information when we expand from gimple to RTL (do we?), so presumably to keep GC-precision we'd want some kind of thing during the gimple-to-RTL expansion that annotates RTL expressions with GC information. Perhaps just a flag on RTL nodes to note those nodes that are GC-holding pointers? AIUI, we have CALL_INSN instructions all the way through the RTL phase of the backend, so we can identify which locations in the generated code are calls; presumably we'd need at each CALL_INSN to determine somehow which RTL expressions tagged as being GC-aware are live (perhaps a mixture of registers and fp-offset expressions?) So presumably we could use that information (maybe in the final pass) to write out some metadata describing for each %pc callsite the relevant GC roots. Armin: does this sound like what you need? RTL experts: does this sound (at least) vaguely sane? Dave