From: Richard Guenther <richard.guenther@gmail.com>
To: Michael Matz <matz@suse.de>
Cc: Andrew MacLeod <amacleod@redhat.com>, gcc-patches@gcc.gnu.org
Subject: Re: RFC: expand from SSA form (1/2)
Date: Wed, 22 Apr 2009 11:20:00 -0000 [thread overview]
Message-ID: <84fc9c000904220420m104826a7j1859aeea31a72d90@mail.gmail.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0904221129090.29566@wotan.suse.de>
On Wed, Apr 22, 2009 at 12:54 PM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Tue, 21 Apr 2009, Andrew MacLeod wrote:
>
>> > Anyway, here is how it works:
>> > (1) in SSA form: create and coalesce partitions, as before, but don't
>> > rewrite anything. Pass this info to cfgexpand.c.
>> > (2) cfgexpand.c: allocate some space (in pseudos or stack) for each
>> > partition not containing a default def (these are the ones corresponding
>> > to VAR_DECLs). Set interesting attributes for that RTX according to the
>> > underlying SSA_NAME_VAR. I.e. we don't have to create artificial abc.24
>> > variables for secondary partition.
>>
>> Do you still create stack space for the original SSA_NAME variable? It
>> doesn't look like it, but I'm not real familiar with that code.
>
> No, no space is allocated for the base variables itself. Only for a
> partition which happens to be based on a PARM_DECL, but which doesn't
> contain the default def I have to allocate new space. These partitions
> are the ones for which formerly new temporary variables would have been
> created (and hence new space allocated) as they weren't coalescable with
> the partition containing the default def.
>
>> It shouldn't be needed normally right? I'm just wondering which
>> partitions without default defs would require stack space.
>
> All partitions without default def require space. If stack or pseudos is
> decided by use_register_for_decl().
>
>> yeah, but TER is basically just a large expression table indexed by ssa
>> version. So you just carry that table through to expand and utilize it
>> as you encounter the single use during expansion right?
>
> Exactly. So the effect of TER with those patches is (for now) only to
> reduce the lifetime of the LHS (most probably a pseudo). It doesn't start
> at the original point of definition anymore, but only right before it's
> use (where it then also dies). In the future we can make use of the table
> directly in expand, i.e. insert the RHS into the to-be-expanded
> expressions on the fly.
>
>> > As out-of-ssa and expand are now basically one pass there can't be any
>> > passes working on non-SSA trees anymore. Currently that's only two
>> > passes: tree-nrv, which is easily fixed, and mudflap, which I
>> > deactivated for now.
>>
>> Has there been any thought to turning mudflap into a plugin now that we
>> have the machinery for that? Its seems like a ripe candidate.
>
> At least not from my side. I went the sissy way and instead fixed mudflap
> to work on SSA form :-|
>
> Comments on the patch comments below :) I'll soon send a new version of
> the patch that fixes all problems and testcases I encountered.
>
>
> Ciao,
> Michael.
>
>> > + /* This is a PARM_DECL or RESULT_DECL. For those partitions that
>> > + contain the default def (representing the parm or result itself)
>> > + we don't do anything here. But those which don't contain the
>> > + default def (representing a temporary based on the parm/result)
>> > + we need to allocate space just like for normal VAR_DECLs. */
>>
>> I presume the allocated space for these temps is normally in a register?
>
> Yes, when optimization is on. With -O0 mostly everything will be comitted
> to stack.
>
>> > + int j = i;
>> > + struct partition_elem *start, *elem;
>> > + int has_default = 0;
>> > + if (SA.map->view_to_partition)
>> > + j = SA.map->view_to_partition[j];
>> > + j = partition_find (SA.map->var_partition, j);
>> > + start = elem = SA.map->var_partition->elements + j;
>> > + do
>>
>> Ugg, do you have to expose the internal workings of the partition
>> mechanism? I tried to use only the published API in case I ever
>> wanted to change it for performance reasons, or whatever... Maybe add
>> a bitmap to the expansion structure which indicates which partitions
>> have a default_def in them, and initialize it at the end of the
>> out-of-ssa process once partitions don't change any more.
>
> Yeah, I'm not terribly fond of the iteration above either. I somehow
> wanted to avoid walking all SSA names to set this to-be-invented flag,
> but I also didn't want to slow the partition unioning by merging the
> flags. I probably bite the apple and create a new bitmap per partition
> as you suggested.
>
>> > ! /* Mark this stmt for removal if it is the list of replaceable
>> > ! expressions. */
>>
>> The stmt isnt being marked for removal, it just isnt having any expansion
>> done on it...
>
> Oh, right. copy&pasted comments tend to become stale :)
>
>> > + /* Some RTL parts really want to look at DECL_RTL(x) when x
>> > + was a decl marked in REG_ATTR or MEM_ATTR. We could use
>> > + SET_DECL_RTL here making this available, but that would mean
>> > + to select one of the potentially many RTLs for one DECL. Instead
>> > + of doing that we simply reset the MEM_EXPR of the RTL in question,
>> > + then nobody can get at it and hence nobody can call DECL_RTL on it. */
>>
>> I presume that the MEM_EXPR is recreated somewhere? or it doesn't
>> matter for some other reason?
>
> MEM_ATTR (and hence MEM_EXPR) are created by
> set_decl_rtl (via set_mem_attr*). If it isn't set explicitely it
> isn't recreated lazily anymore. Which is a good thing. It's used in
> the RTL alias analysis. So given two RTL MEMs it looks
> up the MEM_EXPRs of both (getting at e.g. tree _DECL nodes), and
> compares those. But it then _also_ looks at DECL_RTL() of those nodes,
> getting back to the RTL expression (either the original or one
> representing the base object, e.g. without offset). So it first goes
> from RTL to tree and from there back to RTL.
>
> The problem with that is that DECL_RTL() lazily tries to create the RTL
> expression if it weren't set yet. And we don't set DECL_RTL for
> variables split into SSA partitions (because there are multiple RTL
> expressions for each base var). So this DECL_RTL lookup in alias.c
> would lazily try to create the RTL, which then ICEs because such lazy
> generation is not acceptable for VAR_DECLs.
>
> So, we have to break the RTL->tree->RTL lookup at one of the two steps.
> Either fix all RTL passes to not look up DECL_RTL() on some tree, or not
> even providing the tree to start with. I chose the latter for the
> following reason: the problem only occurs with MEMs (not REGs). We
> generate MEMs for SSA names only if not optimizing or in exceptional
> situations (impossible machine mode for registers for instance). If not
> optimizing the alias machinery isn't active, so doesn't need the
> MEM_EXPR. In the exceptional situations looking at the MEM RTL
> expression itself is enough for disambiguation (and it happens very
> seldomly). So removing the MEM_EXPR from the MEM_ATTRs doesn't hinder
> optimization, and is a central point to ensure that the later RTL
> passes don't accidentally call DECL_RTL.
>
>> > + currently_expanding_to_rtl = 1;
>> > + FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR->next_bb, EXIT_BLOCK_PTR, next_bb)
>>
>> Wasn't 'currently_expanding_to_rtl' already set earlier in
>> gimple_expand_cfg()?
>
> Yeah. It's reset to 0 five lines above:
> currently_expanding_to_rtl = 0;
> convert_from_eh_region_ranges ();
> set_eh_throw_stmt_table (cfun, NULL);
> rebuild_jump_labels (get_insns ());
> find_exception_handler_labels ();
> currently_expanding_to_rtl = 1;
> The four routines in between look as if they might be affected by the
> setting of currently_expanding_to_rtl, but splitting edges needs to come
> afterwards and needs to have it set. greping around now I see that
> before the patch currently_expanding_to_rtl is only used in two
> backends, so it's probably safe to remove the funny reset/set.
>
>> > + /* ??? commit_one_edge_insertion might reorder the edges of the
>> > + current block (when it needs to split an edge), so that we
>> > + might miss some edges with instructions on them. Pff.
>> > + (see execute/ashldi-1.c). */
>> > + VEC(edge,gc) *edges = VEC_copy (edge,gc,bb->preds);
>> > +
>> > + for (ei = ei_start (edges); (e = ei_safe_edge (ei)); )
>> > + {
>> > + ei_next (&ei);
>> > + if (e->insns.r)
>> > + commit_one_edge_insertion (e);
>> > + }
>> > + VEC_free (edge, gc, edges);
>> > + }
>>
>> how annoying eh. Why doesn't commit_edge_insertions() run into this
>> problem as well?
>
> Yeah, I was also confused about this.
>
>> It just loops over the edges with a FOR... is it
>> because it does SUCC's instead? could we use SUCCs here instead?
>
> I tried using preds and succs, iterating from back or from front.
> Nothing helped. If one thinks about it it also can't help (the way
> edges are removed/inserted to split them necessarily invalidates
> iterators). The only theory I have is, that in all places where
> currently commit_edge_insertions() is called there either aren't that
> many critical edges with insns on them, or at least there's only one
> such edge per block.
>
> The nature of the problem is the following: Suppose you have this edge
> list (preds or succs doesn't matter): E2 E3* E4 E5*
> (the * edges have insns, suppose E3 is critical). Now when inserting
> for E3 we need to split it, let's call it E3'. For wiring this new edge
> into the above list, we first remove E3 and end up with:
> E2 E5* E4 (that's the fast removal at work, fill up the empty slot
> with the last element)
> and put back E3' : E2 E5* E4 E3'. Voila, our iterator (now pointing to
> the next element, E4) won't come by E5 anymore and miss an insertion.
>
> Thinking about it now I see a way to solve it. Simply not advancing the
> iterator when we have something to insert should do the trick. Still
> doesn't explain why commit_edge_insertions should work.
We probably should fix commit_edge_insertions the same way, likewise
the tree variant in gimple-iterator. Note that FOR_EACH_EDGE
specifically says
It must not be used when an element might be removed during the traversal,
otherwise elements will be missed
A separate patch for this is welcome.
>> > ! result = (* pp1)->cost - (* pp2)->cost;
>> > /* Since qsort does not guarantee stability we use the elements
>> as a secondary key. This provides us with independence from
>> the host's implementation of the sorting algorithm. */
>>
>> Hmm. how long has this been backwards...
>> its seems fine in 4.2, I rewrote it for 4.3, and jeez, its been backwards
>> since 4.3.0 was released it seems. Zoinks.
>
> :-) It's a bit non-obvious, because the predicate looks correct at
> first sight and only becomes incorrect considering that the sorted array
> then is walked from back to front ;)
Maybe fix this on trunk separately.
Richard.
next prev parent reply other threads:[~2009-04-22 11:20 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-04-13 20:50 Michael Matz
2009-04-21 18:23 ` Andrew MacLeod
2009-04-22 10:12 ` Paolo Bonzini
2009-04-22 10:16 ` Richard Guenther
2009-04-22 10:54 ` Michael Matz
2009-04-22 11:20 ` Richard Guenther [this message]
2009-04-22 12:34 ` Andrew MacLeod
2009-04-22 16:45 ` [RFA] " Michael Matz
2009-04-23 15:10 ` Andrew MacLeod
2009-04-24 9:42 ` Richard Guenther
2009-04-26 20:27 ` Michael Matz
2010-01-19 15:48 ` H.J. Lu
2009-04-24 14:32 ` Richard Guenther
2009-04-24 14:46 ` Richard Guenther
2009-04-26 20:21 ` Michael Matz
2009-04-26 20:34 ` Richard Guenther
2009-04-26 20:53 ` Michael Matz
2009-04-26 21:14 ` Richard Guenther
2009-04-26 21:15 ` Michael Matz
2009-04-26 21:17 ` Richard Guenther
2009-04-26 22:21 ` Michael Matz
2009-04-26 21:42 ` Michael Matz
2009-04-26 22:15 ` Michael Matz
2009-04-27 12:34 ` Michael Matz
2009-04-27 5:47 ` H.J. Lu
2009-04-28 23:49 ` H.J. Lu
2009-04-29 0:21 ` Andrew Pinski
2009-04-30 13:47 ` H.J. Lu
2009-05-29 3:47 ` H.J. Lu
2010-10-20 18:02 ` H.J. Lu
2011-02-14 18:02 ` H.J. Lu
2009-04-27 7:22 ` Hans-Peter Nilsson
2009-04-30 18:18 ` Steve Ellcey
2009-05-01 17:40 ` Michael Matz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=84fc9c000904220420m104826a7j1859aeea31a72d90@mail.gmail.com \
--to=richard.guenther@gmail.com \
--cc=amacleod@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=matz@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).