public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Guenther <richard.guenther@gmail.com>
To: Michael Matz <matz@suse.de>
Cc: Andrew MacLeod <amacleod@redhat.com>, gcc-patches@gcc.gnu.org
Subject: Re: RFC: expand from SSA form (1/2)
Date: Wed, 22 Apr 2009 11:20:00 -0000	[thread overview]
Message-ID: <84fc9c000904220420m104826a7j1859aeea31a72d90@mail.gmail.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0904221129090.29566@wotan.suse.de>

On Wed, Apr 22, 2009 at 12:54 PM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Tue, 21 Apr 2009, Andrew MacLeod wrote:
>
>> > Anyway, here is how it works:
>> > (1) in SSA form: create and coalesce partitions, as before, but don't
>> >     rewrite anything.  Pass this info to cfgexpand.c.
>> > (2) cfgexpand.c: allocate some space (in pseudos or stack) for each
>> >     partition not containing a default def (these are the ones corresponding
>> >     to VAR_DECLs).  Set interesting attributes for that RTX according to the
>> >     underlying SSA_NAME_VAR.  I.e. we don't have to create artificial abc.24
>> >     variables for secondary partition.
>>
>> Do you still create stack space for the original SSA_NAME variable? It
>> doesn't look like it, but I'm not real familiar with that code.
>
> No, no space is allocated for the base variables itself.  Only for a
> partition which happens to be based on a PARM_DECL, but which doesn't
> contain the default def I have to allocate new space.  These partitions
> are the ones for which formerly new temporary variables would have been
> created (and hence new space allocated) as they weren't coalescable with
> the partition containing the default def.
>
>> It shouldn't be needed normally right? I'm just wondering which
>> partitions without default defs would require stack space.
>
> All partitions without default def require space.  If stack or pseudos is
> decided by use_register_for_decl().
>
>> yeah, but TER is basically just a large expression table indexed by ssa
>> version.  So you just carry that table through to expand and utilize it
>> as you encounter the single use during expansion right?
>
> Exactly.  So the effect of TER with those patches is (for now) only to
> reduce the lifetime of the LHS (most probably a pseudo).  It doesn't start
> at the original point of definition anymore, but only right before it's
> use (where it then also dies).  In the future we can make use of the table
> directly in expand, i.e. insert the RHS into the to-be-expanded
> expressions on the fly.
>
>> > As out-of-ssa and expand are now basically one pass there can't be any
>> > passes working on non-SSA trees anymore.  Currently that's only two
>> > passes: tree-nrv, which is easily fixed, and mudflap, which I
>> > deactivated for now.
>>
>> Has there been any thought to turning mudflap into a plugin now that we
>> have the machinery for that? Its seems like a ripe candidate.
>
> At least not from my side.  I went the sissy way and instead fixed mudflap
> to work on SSA form :-|
>
> Comments on the patch comments below :)  I'll soon send a new version of
> the patch that fixes all problems and testcases I encountered.
>
>
> Ciao,
> Michael.
>
>> > +     /* This is a PARM_DECL or RESULT_DECL.  For those partitions that
>> > +        contain the default def (representing the parm or result itself)
>> > +        we don't do anything here.  But those which don't contain the
>> > +        default def (representing a temporary based on the parm/result)
>> > +        we need to allocate space just like for normal VAR_DECLs.  */
>>
>> I presume the allocated space for these temps is normally in a register?
>
> Yes, when optimization is on.  With -O0 mostly everything will be comitted
> to stack.
>
>> > +     int j = i;
>> > +     struct partition_elem *start, *elem;
>> > +     int has_default = 0;
>> > +     if (SA.map->view_to_partition)
>> > +       j = SA.map->view_to_partition[j];
>> > +     j = partition_find (SA.map->var_partition, j);
>> > +     start = elem = SA.map->var_partition->elements + j;
>> > +     do
>>
>> Ugg, do you have to expose the internal workings of the partition
>> mechanism?  I tried to use only the published API in case I ever
>> wanted to change it for performance reasons, or whatever... Maybe add
>> a bitmap to the expansion structure which indicates which partitions
>> have a default_def in them, and initialize it at the end of the
>> out-of-ssa process once partitions don't change any more.
>
> Yeah, I'm not terribly fond of the iteration above either.  I somehow
> wanted to avoid walking all SSA names to set this to-be-invented flag,
> but I also didn't want to slow the partition unioning by merging the
> flags.  I probably bite the apple and create a new bitmap per partition
> as you suggested.
>
>> > !             /* Mark this stmt for removal if it is the list of replaceable
>> > !                expressions.  */
>>
>> The stmt isnt being marked for removal, it just isnt having any expansion
>> done on it...
>
> Oh, right.  copy&pasted comments tend to become stale :)
>
>> > +       /* Some RTL parts really want to look at DECL_RTL(x) when x
>> > +          was a decl marked in REG_ATTR or MEM_ATTR.  We could use
>> > +    SET_DECL_RTL here making this available, but that would mean
>> > +    to select one of the potentially many RTLs for one DECL.  Instead
>> > +    of doing that we simply reset the MEM_EXPR of the RTL in question,
>> > +    then nobody can get at it and hence nobody can call DECL_RTL on it.  */
>>
>> I presume that the MEM_EXPR is recreated somewhere? or it doesn't
>> matter for some other reason?
>
> MEM_ATTR (and hence MEM_EXPR) are created by
> set_decl_rtl (via set_mem_attr*).  If it isn't set explicitely it
> isn't recreated lazily anymore.  Which is a good thing.  It's used in
> the RTL alias analysis.  So given two RTL MEMs it looks
> up the MEM_EXPRs of both (getting at e.g. tree _DECL nodes), and
> compares those.  But it then _also_ looks at DECL_RTL() of those nodes,
> getting back to the RTL expression (either the original or one
> representing the base object, e.g. without offset).  So it first goes
> from RTL to tree and from there back to RTL.
>
> The problem with that is that DECL_RTL() lazily tries to create the RTL
> expression if it weren't set yet.  And we don't set DECL_RTL for
> variables split into SSA partitions (because there are multiple RTL
> expressions for each base var).  So this DECL_RTL lookup in alias.c
> would lazily try to create the RTL, which then ICEs because such lazy
> generation is not acceptable for VAR_DECLs.
>
> So, we have to break the RTL->tree->RTL lookup at one of the two steps.
> Either fix all RTL passes to not look up DECL_RTL() on some tree, or not
> even providing the tree to start with.  I chose the latter for the
> following reason: the problem only occurs with MEMs (not REGs).  We
> generate MEMs for SSA names only if not optimizing or in exceptional
> situations (impossible machine mode for registers for instance).  If not
> optimizing the alias machinery isn't active, so doesn't need the
> MEM_EXPR.  In the exceptional situations looking at the MEM RTL
> expression itself is enough for disambiguation (and it happens very
> seldomly).  So removing the MEM_EXPR from the MEM_ATTRs doesn't hinder
> optimization, and is a central point to ensure that the later RTL
> passes don't accidentally call DECL_RTL.
>
>> > +   currently_expanding_to_rtl = 1;
>> > +   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR->next_bb, EXIT_BLOCK_PTR, next_bb)
>>
>> Wasn't 'currently_expanding_to_rtl' already set earlier in
>> gimple_expand_cfg()?
>
> Yeah.  It's reset to 0 five lines above:
>  currently_expanding_to_rtl = 0;
>  convert_from_eh_region_ranges ();
>  set_eh_throw_stmt_table (cfun, NULL);
>  rebuild_jump_labels (get_insns ());
>  find_exception_handler_labels ();
>  currently_expanding_to_rtl = 1;
> The four routines in between look as if they might be affected by the
> setting of currently_expanding_to_rtl, but splitting edges needs to come
> afterwards and needs to have it set.  greping around now I see that
> before the patch currently_expanding_to_rtl is only used in two
> backends, so it's probably safe to remove the funny reset/set.
>
>> > +       /* ??? commit_one_edge_insertion might reorder the edges of the
>> > +          current block (when it needs to split an edge), so that we
>> > +    might miss some edges with instructions on them.  Pff.
>> > +    (see execute/ashldi-1.c).  */
>> > +       VEC(edge,gc) *edges = VEC_copy (edge,gc,bb->preds);
>> > +
>> > +       for (ei = ei_start (edges); (e = ei_safe_edge (ei)); )
>> > +   {
>> > +     ei_next (&ei);
>> > +     if (e->insns.r)
>> > +       commit_one_edge_insertion (e);
>> > +   }
>> > +       VEC_free (edge, gc, edges);
>> > +     }
>>
>> how annoying eh. Why doesn't commit_edge_insertions() run into this
>> problem as well?
>
> Yeah, I was also confused about this.
>
>> It just loops over the edges with a FOR... is it
>> because it does SUCC's instead? could we use SUCCs here instead?
>
> I tried using preds and succs, iterating from back or from front.
> Nothing helped.  If one thinks about it it also can't help (the way
> edges are removed/inserted to split them necessarily invalidates
> iterators).  The only theory I have is, that in all places where
> currently commit_edge_insertions() is called there either aren't that
> many critical edges with insns on them, or at least there's only one
> such edge per block.
>
> The nature of the problem is the following:  Suppose you have this edge
> list (preds or succs doesn't matter):  E2 E3* E4 E5*
> (the * edges have insns, suppose E3 is critical).  Now when inserting
> for E3 we need to split it, let's call it E3'.  For wiring this new edge
> into the above list, we first remove E3 and end up with:
>  E2 E5* E4  (that's the fast removal at work, fill up the empty slot
>              with the last element)
> and put back E3' :  E2 E5* E4 E3'.  Voila, our iterator (now pointing to
> the next element, E4) won't come by E5 anymore and miss an insertion.
>
> Thinking about it now I see a way to solve it.  Simply not advancing the
> iterator when we have something to insert should do the trick.  Still
> doesn't explain why commit_edge_insertions should work.

We probably should fix commit_edge_insertions the same way, likewise
the tree variant in gimple-iterator.  Note that FOR_EACH_EDGE
specifically says

  It must not be used when an element might be removed during the traversal,
  otherwise elements will be missed

A separate patch for this is welcome.

>> > !   result = (* pp1)->cost - (* pp2)->cost;
>> >     /* Since qsort does not guarantee stability we use the elements
>>        as a secondary key.  This provides us with independence from
>>        the host's implementation of the sorting algorithm.  */
>>
>> Hmm. how long has this been backwards...
>> its seems fine in 4.2, I rewrote it for 4.3, and jeez, its been backwards
>> since 4.3.0 was released it seems.  Zoinks.
>
> :-)  It's a bit non-obvious, because the predicate looks correct at
> first sight and only becomes incorrect considering that the sorted array
> then is walked from back to front ;)

Maybe fix this on trunk separately.

Richard.

  reply	other threads:[~2009-04-22 11:20 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-04-13 20:50 Michael Matz
2009-04-21 18:23 ` Andrew MacLeod
2009-04-22 10:12   ` Paolo Bonzini
2009-04-22 10:16     ` Richard Guenther
2009-04-22 10:54   ` Michael Matz
2009-04-22 11:20     ` Richard Guenther [this message]
2009-04-22 12:34     ` Andrew MacLeod
2009-04-22 16:45     ` [RFA] " Michael Matz
2009-04-23 15:10       ` Andrew MacLeod
2009-04-24  9:42         ` Richard Guenther
2009-04-26 20:27         ` Michael Matz
2010-01-19 15:48           ` H.J. Lu
2009-04-24 14:32       ` Richard Guenther
2009-04-24 14:46         ` Richard Guenther
2009-04-26 20:21           ` Michael Matz
2009-04-26 20:34             ` Richard Guenther
2009-04-26 20:53               ` Michael Matz
2009-04-26 21:14                 ` Richard Guenther
2009-04-26 21:15                   ` Michael Matz
2009-04-26 21:17                     ` Richard Guenther
2009-04-26 22:21                       ` Michael Matz
2009-04-26 21:42             ` Michael Matz
2009-04-26 22:15               ` Michael Matz
2009-04-27 12:34             ` Michael Matz
2009-04-27  5:47       ` H.J. Lu
2009-04-28 23:49         ` H.J. Lu
2009-04-29  0:21           ` Andrew Pinski
2009-04-30 13:47           ` H.J. Lu
2009-05-29  3:47             ` H.J. Lu
2010-10-20 18:02               ` H.J. Lu
2011-02-14 18:02                 ` H.J. Lu
2009-04-27  7:22       ` Hans-Peter Nilsson
2009-04-30 18:18       ` Steve Ellcey
2009-05-01 17:40         ` Michael Matz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=84fc9c000904220420m104826a7j1859aeea31a72d90@mail.gmail.com \
    --to=richard.guenther@gmail.com \
    --cc=amacleod@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=matz@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).