From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jit-return-778-listarch-jit=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 20495 invoked by alias); 10 Jul 2015 14:19:52 -0000
Mailing-List: contact jit-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Post: <mailto:jit@gcc.gnu.org>
List-Help: <mailto:jit-help@gcc.gnu.org>
List-Subscribe: <mailto:jit-subscribe@gcc.gnu.org>
Sender: jit-owner@gcc.gnu.org
Received: (qmail 20476 invoked by uid 89); 10 Jul 2015 14:19:51 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Checked: by ClamAV 0.98.7 on sourceware.org
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=no version=3.3.2
X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=no version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on sourceware.org
X-Spam-Level: 
X-Spam-User: qpsmtpd, 2 recipients
X-HELO: mx1.redhat.com
Message-ID: <1436537517.31573.48.camel@surprise>
Subject: Re: GCC/JIT and precise garbage collection support?
From: David Malcolm <dmalcolm@redhat.com>
To: Armin Rigo <arigo@tunes.org>
Cc: Basile Starynkevitch <basile@starynkevitch.net>, jit@gcc.gnu.org,
        GCC
	Development <gcc@gcc.gnu.org>
Date: Thu, 01 Jan 2015 00:00:00 -0000
In-Reply-To: <CAMSv6X2XjYZ5Q_zUEDY0-Tt_EAuow9WC7sO7Kh5PSk0xKheWGg@mail.gmail.com>
References: <559EF2F1.6000000@starynkevitch.net>
	 <1436493224.31573.32.camel@surprise>
	 <CAMSv6X2XjYZ5Q_zUEDY0-Tt_EAuow9WC7sO7Kh5PSk0xKheWGg@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23
X-SW-Source: 2015-q3/txt/msg00078.txt.bz2

On Fri, 2015-07-10 at 11:13 +0200, Armin Rigo wrote:
> Hi David, hi Basile,
> 
> On 10 July 2015 at 03:53, David Malcolm <dmalcolm@redhat.com> wrote:
> > FWIW PyPy (an implementation of Python) defaults to using true GC, and
> > could benefit from GC support in GCC; currently PyPy has a nasty hack
> > for locating on-stack GC roots, by compiling to assembler, then carving
> > up the assembler with regexes to build GC metadata.
> 
> A first note: write barriers, stack walking, and so on can all be
> implemented manually.  The only thing that cannot be implemented
> easily is stack maps.
> 
> Here is, in more detail, how the PyPy hacks work, in case there is
> interest.  It might be possible to do it cleanly with minimal changes
> in GCC (hopefully?).
> 
> The goal: when a garbage collection occurs, we need to locate and
> possibly change the GC pointers in the stack.  (They may have been
> originally in callee-saved registers, saved by some callee.)  So this
> is about writing some "stack map" that describes where the values are
> around all calls in the stack.  To do that, we put in the C sources "v
> = pypy_asm_gcroot(v);" for all GC-pointer variables after each call
> (at least each call that can recursively end up collecting):
> 
> 
> /* The following pseudo-instruction is used by --gcrootfinder=asmgcc
>    just after a call to tell gcc to put a GCROOT mark on each gc-pointer
>    local variable.  All such local variables need to go through a "v =
>    pypy_asm_gcroot(v)".  The old value should not be used any more by
>    the C code; this prevents the following case from occurring: gcc
>    could make two copies of the local variable (e.g. one in the stack
>    and one in a register), pass one to GCROOT, and later use the other
>    one.  In practice the pypy_asm_gcroot() is often a no-op in the final
>    machine code and doesn't prevent most optimizations. */
> 
> /* With gcc, getting the asm() right was tricky, though.  The asm() is
>    not volatile so that gcc is free to delete it if the output variable
>    is not used at all.  We need to prevent gcc from moving the asm()
>    *before* the call that could cause a collection; this is the purpose
>    of the (unused) __gcnoreorderhack input argument.  Any memory input
>    argument would have this effect: as far as gcc knows the call
>    instruction can modify arbitrary memory, thus creating the order
>    dependency that we want. */
> 
> #define pypy_asm_gcroot(p) ({void*_r; \
>         asm ("/* GCROOT %0 */" : "=g" (_r) :       \
>          "0" (p), "m" (__gcnoreorderhack));    \
>         _r; })
> 
> 
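For anyone following along, the "v = pypy_asm_gcroot(v)" pattern quoted above can be sketched roughly as below.  This is only an illustration under assumptions: struct obj, may_collect(), use_after_call() and demo() are made-up names, the dummy __gcnoreorderhack object is declared here just so the example links, and the macro is spelled with __asm__ so that it also compiles under strict -std= modes.  It relies on GCC's statement-expression and extended-asm extensions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: any addressable object can serve as the dummy memory input. */
static char __gcnoreorderhack;

/* Same shape as the quoted macro, spelled with __asm__. */
#define pypy_asm_gcroot(p) ({void*_r; \
        __asm__ ("/* GCROOT %0 */" : "=g" (_r) : \
         "0" (p), "m" (__gcnoreorderhack)); \
        _r; })

/* Hypothetical GC-managed object. */
struct obj { long value; };

static struct obj heap_slot = { 0 };

/* Hypothetical callee that could end up collecting.  Here it only acts as
   a compiler-level memory barrier; a real moving GC might relocate objects. */
__attribute__((noinline)) static void may_collect(void)
{
    __asm__ volatile ("" ::: "memory");
}

long use_after_call(struct obj *o)
{
    assert(o != NULL);
    o->value = 41;
    may_collect();
    /* After the call, only the value returned by the macro is used;
       the old copy of 'o' is dead from the compiler's point of view. */
    o = pypy_asm_gcroot(o);
    return o->value + 1;
}

long demo(void)
{
    return use_after_call(&heap_slot);
}
```

Compiling this with "gcc -S" should leave a "GCROOT" comment in the .s output right after the call to may_collect(), which is the marker the post-processing step looks for.
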
> This puts a comment in the .s file, which we post-process.  The goal
> of this post-processing is to find the GCROOT comments, see what value
> they mention, and track where this value comes from at the preceding
> call.  This is the messy part, because the value can often move
> around, sometimes across jumps.
> 
> We also track if and where the callee-saved registers end up being saved.
> 
> At the end we generate some static data: a map from every CALL
> location to a list of GC pointers which are live across this call,
> written out as a list of callee-saved registers and stack locations.
> This static data is read by custom platform-specific code in the stack
> walker.
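
To make the shape of that static data concrete, here is a minimal sketch of what such a table and its lookup might look like.  Everything here is an assumption for illustration, not PyPy's actual format: the struct and enum names, the choice to key entries by return address (which is what a stack walker sees in a frame), and the sample addresses and offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding of where one live GC pointer sits at a call. */
enum loc_kind { LOC_CALLEE_SAVED_REG, LOC_FRAME_OFFSET };

struct root_loc {
    enum loc_kind kind;
    int32_t value;   /* register number, or byte offset from the frame base */
};

/* One entry per CALL, keyed by the address just after the call. */
struct callsite_entry {
    uintptr_t return_addr;
    size_t nroots;
    const struct root_loc *roots;
};

/* Binary search over a table sorted by return_addr; NULL if not found. */
const struct callsite_entry *
stackmap_lookup(const struct callsite_entry *table, size_t n, uintptr_t ret)
{
    size_t lo = 0, hi = n;

    assert(table != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].return_addr == ret)
            return &table[mid];
        if (table[mid].return_addr < ret)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/* Made-up sample data: two call sites, sorted by return address. */
const struct root_loc roots_a[] = {
    { LOC_CALLEE_SAVED_REG, 3 }, { LOC_FRAME_OFFSET, -16 }
};
const struct root_loc roots_b[] = { { LOC_FRAME_OFFSET, -8 } };
const struct callsite_entry example_table[] = {
    { 0x1000, 2, roots_a },
    { 0x1040, 1, roots_b },
};
```

A stack walker would call stackmap_lookup() with each frame's return address and then visit (and possibly update) every location listed in the matching entry.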
> 
> This works well enough because, from gcc's point of view, all GC
> pointers after a CALL are only used as arguments to "v2 =
> pypy_asm_gcroot(v)".  GCC is not allowed to do things like precompute
> offsets inside GC objects---because v2 != v (which is true if the GC
> moved the object) and v2 is only created by the pypy_asm_gcroot()
> after the call.
> 
> The drawback of this "asm" statement (besides being detached from the
> CALL) is that, even though we say "=g", a pointer living on the stack
> will often be loaded into a register just before the "asm" and spilled
> again to a (likely different) stack location afterwards.  This creates
> some pointless data movements.  This seems to degrade performance by
> at most a few percent, so it's fine for us.
> 
> So what would a GCC-supported solution look like?  Maybe a single
> builtin that does a call and at the same time "marks" some local
> variables (for read/write).  It would be enough if a CALL emitted from
> this builtin is immediately followed by an assembler
> pseudo-instruction that describes the location of all the local
> variables listed (plus context information: the current stack frame's
> depth, and where callee-saved registers have been saved).  This would
> mean the user of this builtin still needs to come up with custom tools
> to post-process the assembler, but it is probably the simplest and
> most flexible solution.  I may be wrong in thinking any of this
> would be easy, though...

Presumably you'd want some kind of:

  -fgenerate-stack-maps

option (perhaps taking an argument describing the data format?)

Thinking aloud here, maybe a way to implement this is to have some kind
of annotation on each call describing the GC-vars that are live at the
call.

This somehow has to be created/maintained through the various IR formats
and lowerings.

I'm still relatively new to the GCC backend, so take the following with
a large pinch of salt...

AIUI, we lose precise type information when we expand from gimple to RTL
(do we?), so presumably to keep GC-precision we'd want some step
during the gimple-to-RTL expansion that annotates RTL expressions
with GC information.  Perhaps just a flag on RTL nodes to mark those
nodes that hold GC pointers?

AIUI, we have CALL_INSN instructions all the way through the RTL phase
of the backend, so we can identify which locations in the generated code
are calls; presumably we'd need at each CALL_INSN to determine somehow
which RTL expressions tagged as being GC-aware are live (perhaps a
mixture of registers and fp-offset expressions?)

So presumably we could use that information (maybe in the final pass) to
write out some metadata describing, for each callsite (identified by its
%pc), the relevant GC roots.
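
As a very rough sketch of the "write out some metadata" step, the following formats one textual record per callsite.  This is purely illustrative: the record syntax ("callsite LABEL: r3 fp-16"), the struct gc_root type and the function names are invented here, and none of it is something GCC emits today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one live GC root at a call. */
struct gc_root {
    int in_register;   /* nonzero: value is a register number */
    int value;         /* register number, or offset from the frame pointer */
};

/* Formats "callsite LABEL: r3 fp-16\n" into buf.
   Returns the number of characters written, or -1 if buf is too small. */
int format_callsite(char *buf, size_t size, const char *label,
                    const struct gc_root *roots, size_t n)
{
    int used;
    size_t i;

    assert(buf != NULL && size > 0 && label != NULL);
    used = snprintf(buf, size, "callsite %s:", label);
    if (used < 0 || (size_t)used >= size)
        return -1;
    for (i = 0; i < n; i++) {
        size_t left = size - (size_t)used;
        int w;
        if (roots[i].in_register)
            w = snprintf(buf + used, left, " r%d", roots[i].value);
        else
            w = snprintf(buf + used, left, " fp%+d", roots[i].value);
        if (w < 0 || (size_t)w >= left)
            return -1;
        used += w;
    }
    if ((size_t)used + 2 > size)   /* room for '\n' and the terminator */
        return -1;
    buf[used++] = '\n';
    buf[used] = '\0';
    return used;
}

/* Returns 1 if a made-up sample record formats as expected. */
int demo_format(void)
{
    const struct gc_root roots[] = { { 1, 3 }, { 0, -16 } };
    char buf[64];
    int n = format_callsite(buf, sizeof buf, ".L42", roots, 2);

    return n == 24 && strcmp(buf, "callsite .L42: r3 fp-16\n") == 0;
}
```

A text format like this would keep the consumer free to post-process the records however it likes; a real implementation might well prefer a binary section instead.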

Armin: does this sound like what you need?
RTL experts: does this sound (at least) vaguely sane?

Dave