* GCC/JIT and precise garbage collection support? @ 2015-01-01 0:00 Basile Starynkevitch 2015-01-01 0:00 ` David Malcolm 2015-01-01 0:00 ` Andrew Haley 0 siblings, 2 replies; 9+ messages in thread From: Basile Starynkevitch @ 2015-01-01 0:00 UTC (permalink / raw) To: jit; +Cc: gcc Hello All, (this is triggered by a question on the Ocaml mailing list asking about SystemZ backend in Ocaml; SystemZ is today a backend for GCC & probably GCCJIT) We might want to support better good garbage collection schemes in GCC, particularily in GCCJIT. This is a thing that LLVM is known to be weak at, and we might aim to do much better. If we did, good frontends for good functional languages (e.g. F#, Ocaml, Haskell) might in the future profit of GCC technology. And even a Javascript engine based on GCCJIT could profit. A good GC is very probably a precise (sometimes generational copying) GC with write barriers (read the http://gchandbook.org/ for more, or at least the wikipage about garbage collection). So a good GC is changing pointers. So we need to know where, and provide a mechanism for, pointer values are located in the call stack (of the GCCJIT generated code), and probably provide some write barrier machinery. In my incomplete understanding, this requires cooperation between GCC backend and middle-end; it perhaps mean in the GIMPLE level that we mark some trees for local variables as been required to be spilled (by the backend) at some well defined location in the call frame, and be able to query that location (i.e. its offset). Perhaps a possible approach might be to add, at the C front-end level, an extra variable attribute telling that the variable should be spilled always at the same offset in the call frame, to have some machinery to query the value of that fixed offset, and to also have a GCC builtin which flushes all the registers into the call frame? This is just food for thoughts and still fuzzy in my head. Comments are welcome (including things like we should not care at all about GC). Notice that if we had such support for garbage collection, the (dying) Java front-end could be resurrected to provide a faster GC than Boehm GC. And GCC based compilers for languages like Go or D which have garbage collection could also profit. (even MELT might take advantage of that). Regards. -- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mine, sont seulement les miennes} *** ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: GCC/JIT and precise garbage collection support? 2015-01-01 0:00 GCC/JIT and precise garbage collection support? Basile Starynkevitch @ 2015-01-01 0:00 ` David Malcolm 2015-01-01 0:00 ` Armin Rigo 2015-01-01 0:00 ` Andrew Haley 1 sibling, 1 reply; 9+ messages in thread From: David Malcolm @ 2015-01-01 0:00 UTC (permalink / raw) To: Basile Starynkevitch; +Cc: jit, gcc On Fri, 2015-07-10 at 00:17 +0200, Basile Starynkevitch wrote: > Hello All, > > (this is triggered by a question on the Ocaml mailing list asking about > SystemZ backend in Ocaml; SystemZ is today a backend for GCC & probably > GCCJIT) > > We might want to support better good garbage collection schemes in GCC, > particularily in GCCJIT. This is a > thing that LLVM is known to be weak at, and we might aim to do much > better. If we did, good frontends for > good functional languages (e.g. F#, Ocaml, Haskell) might in the future > profit of GCC technology. And even a Javascript engine based on GCCJIT > could profit. FWIW PyPy (an implementation of Python) defaults to using true GC, and could benefit from GC support in GCC; currently PyPy has a nasty hack for locating on-stack GC roots, by compiling to assembler, then carving up the assembler with regexes to build GC metadata. (IIRC this is the --gcrootfinder=asmgcc option here: http://pypy.readthedocs.org/en/latest/config/translation.gcrootfinder.html ) > A good GC is very probably a precise (sometimes generational copying) GC > with write barriers > (read the http://gchandbook.org/ for more, or at least the wikipage > about garbage collection). So a good GC is changing pointers. > > So we need to know where, and provide a mechanism for, pointer values > are located in the call stack (of the GCCJIT generated code), and > probably provide some write barrier machinery. > > In my incomplete understanding, this requires cooperation between GCC > backend and middle-end; it perhaps mean in the GIMPLE level that we mark > some trees for local variables as been required to be spilled (by the > backend) at some well defined location in the call frame, and be able to > query that location (i.e. its offset). > > Perhaps a possible approach might be to add, at the C front-end level, > an extra variable attribute telling that the variable should be spilled > always at the same offset in the call frame, to have some machinery to > query the value of that fixed offset, and to also have a GCC builtin > which flushes all the registers into the call frame? > > This is just food for thoughts and still fuzzy in my head. Comments are > welcome (including things like we should not care at all about GC). > > Notice that if we had such support for garbage collection, the (dying) > Java front-end could be resurrected to provide a faster GC than Boehm > GC. And GCC based compilers for languages like Go or D which have > garbage collection could also profit. (even MELT might take advantage of > that). This all sounds like a lot of work. I think a simpler first step might be to have some kind of option to support tracking on-stack roots; presumably some kind of late RTL pass that writes out a stack map: const data describing what GC-pointers are live, at each %pc range, assuming we already have enough metadata to let a collector walk the stack frames of a thread (presumably we already have that for e.g. backtraces). This assumes we have enough type information at the RTL phase to be able to distinguish GC types at different places in the frame, or to punt it and be imprecise. Though that doesn't solve GC ptrs in registers. That said, fwiw I'm already fully tasked with things for GCC 6. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: GCC/JIT and precise garbage collection support? 2015-01-01 0:00 ` David Malcolm @ 2015-01-01 0:00 ` Armin Rigo 2015-01-01 0:00 ` David Malcolm 0 siblings, 1 reply; 9+ messages in thread From: Armin Rigo @ 2015-01-01 0:00 UTC (permalink / raw) To: David Malcolm; +Cc: Basile Starynkevitch, jit, GCC Development Hi David, hi Basile, On 10 July 2015 at 03:53, David Malcolm <dmalcolm@redhat.com> wrote: > FWIW PyPy (an implementation of Python) defaults to using true GC, and > could benefit from GC support in GCC; currently PyPy has a nasty hack > for locating on-stack GC roots, by compiling to assembler, then carving > up the assembler with regexes to build GC metadata. A first note: write barriers, stack walking, and so on can all be implemented manually. The only thing that cannot be implemented easily is stack maps. Here's in more details how the PyPy hacks work, in case there is interest. It might be possible to do it cleanly with minimal changes in GCC (hopefully?). The goal: when a garbage collection occurs, we need to locate and possibly change the GC pointers in the stack. (They may have been originally in callee-saved registers, saved by some callee.) So this is about writing some "stack map" that describes where the values are around all calls in the stack. To do that, we put in the C sources "v = pypy_asm_gcroot(v);" for all GC-pointer variables after each call (at least each call that can recursively end up collecting): /* The following pseudo-instruction is used by --gcrootfinder=asmgcc just after a call to tell gcc to put a GCROOT mark on each gc-pointer local variable. All such local variables need to go through a "v = pypy_asm_gcroot(v)". The old value should not be used any more by the C code; this prevents the following case from occurring: gcc could make two copies of the local variable (e.g. one in the stack and one in a register), pass one to GCROOT, and later use the other one. In practice the pypy_asm_gcroot() is often a no-op in the final machine code and doesn't prevent most optimizations. */ /* With gcc, getting the asm() right was tricky, though. The asm() is not volatile so that gcc is free to delete it if the output variable is not used at all. We need to prevent gcc from moving the asm() *before* the call that could cause a collection; this is the purpose of the (unused) __gcnoreorderhack input argument. Any memory input argument would have this effect: as far as gcc knows the call instruction can modify arbitrary memory, thus creating the order dependency that we want. */ #define pypy_asm_gcroot(p) ({void*_r; \ asm ("/* GCROOT %0 */" : "=g" (_r) : \ "0" (p), "m" (__gcnoreorderhack)); \ _r; }) This puts a comment in the .s file, which we post-process. The goal of this post-processing is to find the GCROOT comments, see what value they mention, and track where this value comes from at the preceding call. This is the messy part, because the value can often move around, sometimes across jumps. We also track if and where the callee-saved registers end up being saved. At the end we generate some static data: a map from every CALL location to a list of GC pointers which are live across this call, written out as a list of callee-saved registers and stack locations. This static data is read by custom platform-specific code in the stack walker. This works well enough because, from gcc's point of view, all GC pointers after a CALL are only used as arguments to "v2 = pypy_asm_gcroot(v)". GCC is not allowed to do things like precompute offsets inside GC objects---because v2 != v (which is true if the GC moved the object) and v2 is only created by the pypy_asm_gcroot() after the call. The drawback of this "asm" statement (besides being detached from the CALL) is that, even though we say "=g", a stack pointer will often be loaded into a register just before the "asm" and spilled again to a (likely different) stack location afterwards. This creates some pointless data movements. This seems to degrade performance by at most a few percents, so it's fine for us. So how would a GCC-supported solution look like? Maybe a single builtin that does a call and at the same time "marks" some local variables (for read/write). It would be enough if a CALL emitted from this built-in is immediately followed by an assembler pseudo-instruction that describe the location of all the local variables listed (plus context information: the current stack frame's depth, and where callee-saved registers have been saved). This would mean the user of this builtin still needs to come up with custom tools to post-process the assembler, but it is probably the simplest and most flexible solution. I may be wrong about thinking any of this would be easy, though... A bientôt, Armin. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: GCC/JIT and precise garbage collection support? 2015-01-01 0:00 ` Armin Rigo @ 2015-01-01 0:00 ` David Malcolm 2015-01-01 0:00 ` Armin Rigo 0 siblings, 1 reply; 9+ messages in thread From: David Malcolm @ 2015-01-01 0:00 UTC (permalink / raw) To: Armin Rigo; +Cc: Basile Starynkevitch, jit, GCC Development On Fri, 2015-07-10 at 11:13 +0200, Armin Rigo wrote: > Hi David, hi Basile, > > On 10 July 2015 at 03:53, David Malcolm <dmalcolm@redhat.com> wrote: > > FWIW PyPy (an implementation of Python) defaults to using true GC, and > > could benefit from GC support in GCC; currently PyPy has a nasty hack > > for locating on-stack GC roots, by compiling to assembler, then carving > > up the assembler with regexes to build GC metadata. > > A first note: write barriers, stack walking, and so on can all be > implemented manually. The only thing that cannot be implemented > easily is stack maps. > > Here's in more details how the PyPy hacks work, in case there is > interest. It might be possible to do it cleanly with minimal changes > in GCC (hopefully?). > > The goal: when a garbage collection occurs, we need to locate and > possibly change the GC pointers in the stack. (They may have been > originally in callee-saved registers, saved by some callee.) So this > is about writing some "stack map" that describes where the values are > around all calls in the stack. To do that, we put in the C sources "v > = pypy_asm_gcroot(v);" for all GC-pointer variables after each call > (at least each call that can recursively end up collecting): > > > /* The following pseudo-instruction is used by --gcrootfinder=asmgcc > just after a call to tell gcc to put a GCROOT mark on each gc-pointer > local variable. All such local variables need to go through a "v = > pypy_asm_gcroot(v)". The old value should not be used any more by > the C code; this prevents the following case from occurring: gcc > could make two copies of the local variable (e.g. one in the stack > and one in a register), pass one to GCROOT, and later use the other > one. In practice the pypy_asm_gcroot() is often a no-op in the final > machine code and doesn't prevent most optimizations. */ > > /* With gcc, getting the asm() right was tricky, though. The asm() is > not volatile so that gcc is free to delete it if the output variable > is not used at all. We need to prevent gcc from moving the asm() > *before* the call that could cause a collection; this is the purpose > of the (unused) __gcnoreorderhack input argument. Any memory input > argument would have this effect: as far as gcc knows the call > instruction can modify arbitrary memory, thus creating the order > dependency that we want. */ > > #define pypy_asm_gcroot(p) ({void*_r; \ > asm ("/* GCROOT %0 */" : "=g" (_r) : \ > "0" (p), "m" (__gcnoreorderhack)); \ > _r; }) > > > This puts a comment in the .s file, which we post-process. The goal > of this post-processing is to find the GCROOT comments, see what value > they mention, and track where this value comes from at the preceding > call. This is the messy part, because the value can often move > around, sometimes across jumps. > > We also track if and where the callee-saved registers end up being saved. > > At the end we generate some static data: a map from every CALL > location to a list of GC pointers which are live across this call, > written out as a list of callee-saved registers and stack locations. > This static data is read by custom platform-specific code in the stack > walker. > > This works well enough because, from gcc's point of view, all GC > pointers after a CALL are only used as arguments to "v2 = > pypy_asm_gcroot(v)". GCC is not allowed to do things like precompute > offsets inside GC objects---because v2 != v (which is true if the GC > moved the object) and v2 is only created by the pypy_asm_gcroot() > after the call. > > The drawback of this "asm" statement (besides being detached from the > CALL) is that, even though we say "=g", a stack pointer will often be > loaded into a register just before the "asm" and spilled again to a > (likely different) stack location afterwards. This creates some > pointless data movements. This seems to degrade performance by at > most a few percents, so it's fine for us. > > So how would a GCC-supported solution look like? Maybe a single > builtin that does a call and at the same time "marks" some local > variables (for read/write). It would be enough if a CALL emitted from > this built-in is immediately followed by an assembler > pseudo-instruction that describe the location of all the local > variables listed (plus context information: the current stack frame's > depth, and where callee-saved registers have been saved). This would > mean the user of this builtin still needs to come up with custom tools > to post-process the assembler, but it is probably the simplest and > most flexible solution. I may be wrong about thinking any of this > would be easy, though... Presumably you'd want some kind of: -fgenerate-stack-maps option (perhaps taking an argument describing the data format?) Thinking aloud here, maybe a way to implement this is to have some kind of annotation on each call describing the GC-vars that are live at the call. This somehow has to be created/maintained through the various IR formats and lowerings. I'm still relatively new to the GCC backend, so take the following with a large pinch of salt... AIUI, we lose precise type information when we expand from gimple to RTL (do we?), so presumably to keep GC-precision we'd want some kind of thing during the gimple-to-RTL expansion that annotates RTL expressions with GC information. Perhaps just a flag on RTL nodes to note those nodes that are GC-holding pointers? AIUI, we have CALL_INSN instructions all the way through the RTL phase of the backend, so we can identify which locations in the generated code are calls; presumably we'd need at each CALL_INSN to determine somehow which RTL expressions tagged as being GC-aware are live (perhaps a mixture of registers and fp-offset expressions?) So presumably we could use that information (maybe in the final pass) to write out some metadata describing for each %pc callsite the relevant GC roots. Armin: does this sound like what you need? RTL experts: does this sound (at least) vaguely sane? Dave ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: GCC/JIT and precise garbage collection support? 2015-01-01 0:00 ` David Malcolm @ 2015-01-01 0:00 ` Armin Rigo 2015-01-01 0:00 ` Jeff Law 0 siblings, 1 reply; 9+ messages in thread From: Armin Rigo @ 2015-01-01 0:00 UTC (permalink / raw) To: David Malcolm; +Cc: Basile Starynkevitch, jit, GCC Development Hi David, On 10 July 2015 at 16:11, David Malcolm <dmalcolm@redhat.com> wrote: > AIUI, we have CALL_INSN instructions all the way through the RTL phase > of the backend, so we can identify which locations in the generated code > are calls; presumably we'd need at each CALL_INSN to determine somehow > which RTL expressions tagged as being GC-aware are live (perhaps a > mixture of registers and fp-offset expressions?) > > So presumably we could use that information (maybe in the final pass) to > write out some metadata describing for each %pc callsite the relevant GC > roots. > > Armin: does this sound like what you need? Not quite. I can understand that you're trying to find some solution with automatic discovery of the live variables of a "GC pointer" type and so on. This is more than we need, and if we had that, then we'd need to work harder to remove the extra stuff. We only want the end result: attach to each CALL_INSN a list of variables which should be stored in the stack map for that call, and be ready to see these locations be modified from outside across the call if a GC occurs. A bientôt, Armin. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: GCC/JIT and precise garbage collection support? 2015-01-01 0:00 ` Armin Rigo @ 2015-01-01 0:00 ` Jeff Law 2015-01-01 0:00 ` Trevor Saunders 2015-01-01 0:00 ` Andrew Haley 0 siblings, 2 replies; 9+ messages in thread From: Jeff Law @ 2015-01-01 0:00 UTC (permalink / raw) To: Armin Rigo, David Malcolm; +Cc: Basile Starynkevitch, jit, GCC Development On 07/10/2015 09:04 AM, Armin Rigo wrote: > Hi David, > > On 10 July 2015 at 16:11, David Malcolm <dmalcolm@redhat.com> wrote: >> AIUI, we have CALL_INSN instructions all the way through the RTL phase >> of the backend, so we can identify which locations in the generated code >> are calls; presumably we'd need at each CALL_INSN to determine somehow >> which RTL expressions tagged as being GC-aware are live (perhaps a >> mixture of registers and fp-offset expressions?) >> >> So presumably we could use that information (maybe in the final pass) to >> write out some metadata describing for each %pc callsite the relevant GC >> roots. >> >> Armin: does this sound like what you need? > > Not quite. I can understand that you're trying to find some solution > with automatic discovery of the live variables of a "GC pointer" type > and so on. This is more than we need, and if > we had that, then we'd need to work harder to remove the extra stuff. > We only want the end result: attach to each CALL_INSN a list of > variables which should be stored in the stack map for that call, and > be ready to see these locations be modified from outside across the > call if a GC occurs. I wonder how much overlap there is between this need and what we're going to need to do for resumable functions which are being discussed in the ISO C++ standards meetings. jeff ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: GCC/JIT and precise garbage collection support? 2015-01-01 0:00 ` Jeff Law @ 2015-01-01 0:00 ` Trevor Saunders 2015-01-01 0:00 ` Andrew Haley 1 sibling, 0 replies; 9+ messages in thread From: Trevor Saunders @ 2015-01-01 0:00 UTC (permalink / raw) To: Jeff Law Cc: Armin Rigo, David Malcolm, Basile Starynkevitch, jit, GCC Development On Fri, Jul 10, 2015 at 12:35:14PM -0600, Jeff Law wrote: > On 07/10/2015 09:04 AM, Armin Rigo wrote: > >Hi David, > > > >On 10 July 2015 at 16:11, David Malcolm <dmalcolm@redhat.com> wrote: > >>AIUI, we have CALL_INSN instructions all the way through the RTL phase > >>of the backend, so we can identify which locations in the generated code > >>are calls; presumably we'd need at each CALL_INSN to determine somehow > >>which RTL expressions tagged as being GC-aware are live (perhaps a > >>mixture of registers and fp-offset expressions?) > >> > >>So presumably we could use that information (maybe in the final pass) to > >>write out some metadata describing for each %pc callsite the relevant GC > >>roots. > >> > >>Armin: does this sound like what you need? > > > >Not quite. I can understand that you're trying to find some solution > >with automatic discovery of the live variables of a "GC pointer" type > >and so on. This is more than we need, and if > >we had that, then we'd need to work harder to remove the extra stuff. > >We only want the end result: attach to each CALL_INSN a list of > >variables which should be stored in the stack map for that call, and > >be ready to see these locations be modified from outside across the > >call if a GC occurs. > I wonder how much overlap there is between this need and what we're going to > need to do for resumable functions which are being discussed in the ISO C++ I'm not sure about that, but there may also be some overlap with the machinary to handle exceptions. It sounded kind of crazy and I'm not sure what happened to it, but a while ago at Mozilla someone was proposing to use eh tables to implement rootting in the JS gc. Trev > standards meetings. > > jeff ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: GCC/JIT and precise garbage collection support? 2015-01-01 0:00 ` Jeff Law 2015-01-01 0:00 ` Trevor Saunders @ 2015-01-01 0:00 ` Andrew Haley 1 sibling, 0 replies; 9+ messages in thread From: Andrew Haley @ 2015-01-01 0:00 UTC (permalink / raw) To: Jeff Law, Armin Rigo, David Malcolm Cc: Basile Starynkevitch, jit, GCC Development On 07/10/2015 07:35 PM, Jeff Law wrote: > I wonder how much overlap there is between this need and what we're > going to need to do for resumable functions which are being discussed in > the ISO C++ standards meetings. A considerable amount, I'll be bound. Andrew. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: GCC/JIT and precise garbage collection support? 2015-01-01 0:00 GCC/JIT and precise garbage collection support? Basile Starynkevitch 2015-01-01 0:00 ` David Malcolm @ 2015-01-01 0:00 ` Andrew Haley 1 sibling, 0 replies; 9+ messages in thread From: Andrew Haley @ 2015-01-01 0:00 UTC (permalink / raw) To: Basile Starynkevitch, jit; +Cc: gcc On 09/07/15 23:17, Basile Starynkevitch wrote: > (this is triggered by a question on the Ocaml mailing list asking about > SystemZ backend in Ocaml; SystemZ is today a backend for GCC & probably > GCCJIT) > > We might want to support better good garbage collection schemes in GCC, > particularily in GCCJIT. This is a > thing that LLVM is known to be weak at, and we might aim to do much > better. If we did, good frontends for > good functional languages (e.g. F#, Ocaml, Haskell) might in the future > profit of GCC technology. And even a Javascript engine based on GCCJIT > could profit. > > A good GC is very probably a precise (sometimes generational copying) GC > with write barriers > (read the http://gchandbook.org/ for more, or at least the wikipage > about garbage collection). So a good GC is changing pointers. > > So we need to know where, and provide a mechanism for, pointer values > are located in the call stack (of the GCCJIT generated code), and > probably provide some write barrier machinery. It's going to be very hard. All our experience with GCJ shows that a true precise GC in GCC is going to require major surgery in many places. The HotSpot optimizing compilers are written with GC as a basic requirement and it touches many places. Andrew. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-07-12 20:16 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-01-01 0:00 GCC/JIT and precise garbage collection support? Basile Starynkevitch 2015-01-01 0:00 ` David Malcolm 2015-01-01 0:00 ` Armin Rigo 2015-01-01 0:00 ` David Malcolm 2015-01-01 0:00 ` Armin Rigo 2015-01-01 0:00 ` Jeff Law 2015-01-01 0:00 ` Trevor Saunders 2015-01-01 0:00 ` Andrew Haley 2015-01-01 0:00 ` Andrew Haley
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).