From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-239050-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 23577 invoked by alias); 21 Apr 2009 18:23:20 -0000
Received: (qmail 23567 invoked by uid 22791); 21 Apr 2009 18:23:19 -0000
X-SWARE-Spam-Status: No, hits=-2.3 required=5.0 	tests=AWL,BAYES_00,J_CHICKENPOX_34,SPF_HELO_PASS,SPF_PASS
X-Spam-Check-By: sourceware.org
Received: from mx2.redhat.com (HELO mx2.redhat.com) (66.187.237.31)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 21 Apr 2009 18:23:12 +0000
Received: from int-mx2.corp.redhat.com (int-mx2.corp.redhat.com [172.16.27.26]) 	by mx2.redhat.com (8.13.8/8.13.8) with ESMTP id n3LINAka016190; 	Tue, 21 Apr 2009 14:23:10 -0400
Received: from ns3.rdu.redhat.com (ns3.rdu.redhat.com [10.11.255.199]) 	by int-mx2.corp.redhat.com (8.13.1/8.13.1) with ESMTP id n3LIN9gR009889; 	Tue, 21 Apr 2009 14:23:10 -0400
Received: from [10.11.13.6] (vpn-13-6.rdu.redhat.com [10.11.13.6]) 	by ns3.rdu.redhat.com (8.13.8/8.13.8) with ESMTP id n3LIN8Gr018698; 	Tue, 21 Apr 2009 14:23:08 -0400
Message-ID: <49EE0F0C.1090101@redhat.com>
Date: Tue, 21 Apr 2009 18:23:00 -0000
From: Andrew MacLeod <amacleod@redhat.com>
User-Agent: Thunderbird 2.0.0.21 (X11/20090320)
MIME-Version: 1.0
To: Michael Matz <matz@suse.de>
CC: gcc-patches@gcc.gnu.org
Subject: Re: RFC: expand from SSA form (1/2)
References: <Pine.LNX.4.64.0904132144420.29566@wotan.suse.de>
In-Reply-To: <Pine.LNX.4.64.0904132144420.29566@wotan.suse.de>
Content-Type: multipart/mixed;  boundary="------------040406080501000904080901"
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2009-04/txt/msg01660.txt.bz2

This is a multi-part message in MIME format.
--------------040406080501000904080901
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-length: 2996

Michael Matz wrote:
> Hi,
>
> This patch implements expanding directly from SSA form, i.e. without first 
> going out and producing GENERIC.  It doesn't yet get rid of explicitely 
>   
Sorry I'm so slow getting to this :-)

This seems like a good approach, and is similar to what I figured was 
the most straightforward way of merging out-of-ssa and expand.  
Basically perform out-of-ssa and allow the ssa-names to pass through to 
to expand and deal with them there.

>
> Anyway, here is how it works:
> (1) in SSA form: create and coalesce partitions, as before, but don't
>     rewrite anything.  Pass this info to cfgexpand.c.
> (2) cfgexpand.c: allocate some space (in pseudos or stack) for each 
>     partition not containing a default def (these are the ones 
>     corresponding to VAR_DECLs).  Set interesting attributes for that RTX 
>     according to the underlying SSA_NAME_VAR.  I.e. we don't have to 
>     create artificial abc.24 variables for secondary partition.
>   
Do you still create stack space for the original SSA_NAME variable? It 
doesn't look like it, but I'm not real familiar with that code.  It 
shouldn't be needed normally right? I'm just wondering which partitions 
without default defs would require stack space.

> TER is still supported and implemented by deferring expanding an 
> assignment that is TERed to the place where it's actually used.  That's a 
> change from before: we don't insert the tree of the RHS directly into the 
> tree that is going to be expanded.
>   
yeah, but TER is basically just a large expression table indexed by ssa 
version.  So you just carry that table through to expand and utilize it 
as you encounter the single use during expansion right?

>
> I've also changed out-of-ssa to mostly work only with partition numbers 
> instead of partition variables (those are removed in the cleanup pass).  
> We also don't need to must-coalesce default defs with base variables or in 
> fact change the variables per partition in any way at all (i.e. a 
> partition always corresponds to exactly one SSA name, whose version is the 
> partition number).
>   

yes, this aspect is certainly simplified now.

> As out-of-ssa and expand are now basically one pass there can't be any 
> passes working on non-SSA trees anymore.  Currently that's only two 
> passes: tree-nrv, which is easily fixed, and mudflap, which I deactivated 
> for now.
>   

Has there been any thought to turning mudflap into a plugin now that we 
have the machinery for that? Its seems like a ripe candidate.
>
> But I'm interested in any comments you might have.  In particular I'm not 
> exceptionally fond of the four insert_*_on_edge() routines, though we 
> indeed need four signatures: partition to partition, tree to partition, 
> rtx to partition and partition to rtx.
>   
yeah, but I'm not sure what you can usefully do about that either, with 
tree, rtx and int types to deal with.

I think it looks pretty good.  Just a few comments.
Andrew


--------------040406080501000904080901
Content-Type: text/plain;
 name="outofssa-reply"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="outofssa-reply"
Content-length: 5201

> *************** expand_used_vars (void)
> *** 1451,1456 ****
> --- 1510,1557 ----
>   
>     init_vars_expansion ();
>   
> +   for (i = 0; i < SA.map->num_partitions; i++)
> +     {
> +       tree var = partition_to_var (SA.map, i);
> + 
> +       if (TREE_CODE (var) == SSA_NAME)
> + 	var = SSA_NAME_VAR (var);
> +       gcc_assert (is_gimple_reg (var));
> +       if (TREE_CODE (var) == VAR_DECL)
> + 	expand_one_var (partition_to_var (SA.map, i), true, true);
> +       else
> + 	{
> + 	  /* This is a PARM_DECL or RESULT_DECL.  For those partitions that
> + 	     contain the default def (representing the parm or result itself)
> + 	     we don't do anything here.  But those which don't contain the
> + 	     default def (representing a temporary based on the parm/result)
> + 	     we need to allocate space just like for normal VAR_DECLs.  */

I presume the allocated space for these temps is normally in a register?


> + 	  int j = i;
> + 	  struct partition_elem *start, *elem;
> + 	  int has_default = 0;
> + 	  if (SA.map->view_to_partition)
> + 	    j = SA.map->view_to_partition[j];
> + 	  j = partition_find (SA.map->var_partition, j);
> + 	  start = elem = SA.map->var_partition->elements + j;
> + 	  do

Ugg, do you have to expose the internal workings of the partition mechanism? I tried to use only the published API in case I ever wanted to change it for performance reasons, or whatever... Maybe add a bitmap to the expansion structure which indicates which partitions have a default_def in them, and initialize it at the end of the out-of-ssa process once partitions don't change any more.  It'll also make the code here much clearer, just check if the bit is on.


> ! 	      def_operand_p def_p;
> ! 	      tree stmt_tree;
> ! 	      def_p = SINGLE_SSA_DEF_OPERAND (stmt, SSA_OP_DEF);
> ! 
> ! 	      if (def_p != NULL)
> ! 		{
> ! 		  /* Mark this stmt for removal if it is the list of replaceable
> ! 		     expressions.  */

The stmt isnt being marked for removal, it just isnt having any expansion done on it...

>   
> +   /* Now that we also have the parameter RTXs, copy them over to our
> +      partitions.  */
> +   for (i = 0; i < SA.map->num_partitions; i++)
> +     {
> +       tree var = partition_to_var (SA.map, i);
> + 
> +       if (TREE_CODE (var) == SSA_NAME)
> + 	var = SSA_NAME_VAR (var);
> +       if (TREE_CODE (var) != VAR_DECL
> + 	  && !SA.partition_to_pseudo[i])
> + 	SA.partition_to_pseudo[i] = DECL_RTL_IF_SET (var);
> +       gcc_assert (SA.partition_to_pseudo[i]);
> +       /* Some RTL parts really want to look at DECL_RTL(x) when x
> +          was a decl marked in REG_ATTR or MEM_ATTR.  We could use
> + 	 SET_DECL_RTL here making this available, but that would mean
> + 	 to select one of the potentially many RTLs for one DECL.  Instead
> + 	 of doing that we simply reset the MEM_EXPR of the RTL in question,
> + 	 then nobody can get at it and hence nobody can call DECL_RTL on it.  */

I presume that the MEM_EXPR is recreated somewhere? or it doesn't matter for some other reason?
  

> +   currently_expanding_to_rtl = 1;
> +   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR->next_bb, EXIT_BLOCK_PTR, next_bb)

Wasn't 'currently_expanding_to_rtl' already set earlier in gimple_expand_cfg()? 

> +     {
> +       edge e;
> +       edge_iterator ei;
> +       /* ??? commit_one_edge_insertion might reorder the edges of the
> +          current block (when it needs to split an edge), so that we
> + 	 might miss some edges with instructions on them.  Pff.
> + 	 (see execute/ashldi-1.c).  */
> +       VEC(edge,gc) *edges = VEC_copy (edge,gc,bb->preds);
> + 
> +       for (ei = ei_start (edges); (e = ei_safe_edge (ei)); )
> + 	{
> + 	  ei_next (&ei);
> + 	  if (e->insns.r)
> + 	    commit_one_edge_insertion (e);
> + 	}
> +       VEC_free (edge, gc, edges);
> +     }

how annoying eh. Why doesn't commit_edge_insertions() run into this problem as well? It just loops over the edges with a FOR... is it because it does SUCC's instead? could we use SUCCs here instead?

> Index: tree-ssa-coalesce.c
> ===================================================================
> *** tree-ssa-coalesce.c.orig
> --- tree-ssa-coalesce.c
> *************** compare_pairs (const void *p1, const voi
> *** 314,320 ****
>     const_coalesce_pair_p const *const pp2 = (const_coalesce_pair_p const *) p2;
>     int result;
>   
> !   result = (* pp2)->cost - (* pp1)->cost;
>     /* Since qsort does not guarantee stability we use the elements
	 as a secondary key.  This provides us with independence from
	 the host's implementation of the sorting algorithm.  */
> --- 314,320 ----
>     const_coalesce_pair_p const *const pp2 = (const_coalesce_pair_p const *) p2;
>     int result;
>   
> !   result = (* pp1)->cost - (* pp2)->cost;
>     /* Since qsort does not guarantee stability we use the elements
	 as a secondary key.  This provides us with independence from
	 the host's implementation of the sorting algorithm.  */

Hmm. how long has this been backwards... 
its seems fine in 4.2, I rewrote it for 4.3, and jeez, its been backwards since 4.3.0 was released it seems.  Zoinks.  Wonder if there are any performance regressions due to that....  


--------------040406080501000904080901--