public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* [tree-ssa, RFC]  CFG transparent RTL expansion
@ 2003-12-19  5:37 Jan Hubicka
  2003-12-19  6:50 ` Andrew MacLeod
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Hubicka @ 2003-12-19  5:37 UTC (permalink / raw)
  To: gcc, gcc-patches, rth, dnovillo, amacleod

Hi,
I am attaching patch implementing the quite much discussed idea of expanding
RTL out of GIMPLE CFG transparently.  The patch do have some rough edges and
can be decomposed to about 5 incremental steps so it is not intended for
detailed review rather than a demonstration/proof of the concept in a hope that
it will make decision easier (the patch however bootstraps and works well).

The problem I am trying to solve is question on how to preserve profile
information out of SSA based passes.  It is clear that most effective consumers
of profile include inlining, loop optimization, register allocation and basic
block reordering so profile must be available in about whole compilation pass.
Actually measuring profile in multiple stages of compilation is possible, but
earlier optimizations invalidate later measured profile so it require user to
do multiple train runs that is unfortunate so I think it is a must to have
machinery to preserve profile as well as possible from before inlining up to
final.

From the work on RTL, the experience shows that CFG is the most natural place
to store information into and about only place where updating after control
flow transformations is manageable (still not easy), the alternatives like
remembering execution counts for each instructions or branch probabilities for
conditional branches does not work well for nontrivial updates.

In RTL world, we do not have way to represent profile outside CFG and it seems
to be quite natural to make optimizers CFG transparent to both conserve CPU
cycles needed for rebulilding and add some robustness.

I believe that these observations apply to the expansion too.  The patch bellow
demonstrate how existing machinery used for updating CFG after instruction splitting
can be reused for the expansion.  The machinery works as follows:

1)  Instruction in each basic blocks are expanded from trees to RTL.
    Non control flow statements are expanded using exisitng expand functions,
    some control flow statements needs special care so they are expanded
    by hand.  As followup cleanup I would like to unfity both places expansion
    is done.

this is only new code in the patch, see tree_expand_cfg
From now on find_sub_basic_blocks is used.

2)  New basic block boundaries are discovered (expansion of STMT may involve
    conditionals or loops, so original basic blocks are turned into single
    entry multiple exists regions)
3)  Missing edges are added
4)  Probabilities of new edges are discovered based on notes dropped by RTL expanders
    Not many RTL expanders drop the probability notes, I plan to solve this by
    modifying current branch outcome guessing code to work just after expansion
    so missing probabilities are estimated using educated guesses
5)  BB and Edge frequencies/counts are propagated starting from known values at
    entry point of each original basic block using the probabilities

At the moment GCC is not CFG transparent in early stages and thus we need more
work to really get the fruits, but this solve the major part (I do have patches
for the other offenders except for loop optimizer worked on by Zdenek)

Also during working on the patch I noticed that it is pretty robust in a way
that it finds latent problems elsewhere.  It insist on both CFG representations
(tree and RTL) to be the same or the differences to be explicitly present in
the code.  It also allows to add sanity checking code such as verifying that
conditionals does not fold into constant by RTL folding so the expansion
actually don't change control flow in some nonobvious way.

The patch also needs unnesting of nested functions that is done by somewhat
bruteforce approach.  This would be temporary requirement as I would like
to put CFG into object so nested functions may have separate CFGs, but
I believe that unnesting is needed anyway.  In longer run it shall be done
in more involved way as propsed by Richard, but I don't quite see how some
technical details shall be solved (such as nonlocal gotos) so I decided to
not go for it right now.

Any comments and ideas are welcome and I hope we will make decision on this
soon.  In the case you don't like the idea of making CFG part of interface in
between tree based middleend and RTL based middleend, I would welcome more
detailed proposal on how to maintain the profile over the pass and how the
updating documented above shall be done.

Honza

Index: builtins.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/builtins.c,v
retrieving revision 1.152.2.43
diff -c -3 -p -r1.152.2.43 builtins.c
*** builtins.c	11 Dec 2003 01:44:01 -0000	1.152.2.43
--- builtins.c	19 Dec 2003 00:19:10 -0000
*************** Software Foundation, 59 Temple Place - S
*** 44,49 ****
--- 44,50 ----
  #include "tm_p.h"
  #include "target.h"
  #include "langhooks.h"
+ #include "basic-block.h"
  
  #define CALLED_AS_BUILT_IN(NODE) \
     (!strncmp (IDENTIFIER_POINTER (DECL_NAME (NODE)), "__builtin_", 10))
*************** expand_builtin_apply_args_1 (void)
*** 1154,1159 ****
--- 1155,1167 ----
    return copy_addr_to_reg (XEXP (registers, 0));
  }
  
+ /* Return RTX to emit after when we want to emit code on the entry of function.  */
+ static rtx
+ entry_of_function (void)
+ {
+   return (n_basic_blocks ? ENTRY_BLOCK_PTR->next_bb->head : get_insns ());
+ }
+ 
  /* __builtin_apply_args returns block of memory allocated on
     the stack into which is stored the arg pointer, structure
     value address, static chain, and all the registers that might
*************** expand_builtin_apply_args (void)
*** 1187,1193 ****
         chain current, so the code is placed at the start of the
         function.  */
      push_topmost_sequence ();
!     emit_insn_before (seq, NEXT_INSN (get_insns ()));
      pop_topmost_sequence ();
      return temp;
    }
--- 1195,1201 ----
         chain current, so the code is placed at the start of the
         function.  */
      push_topmost_sequence ();
!     emit_insn_before (seq, NEXT_INSN (entry_of_function ()));
      pop_topmost_sequence ();
      return temp;
    }
*************** expand_builtin_saveregs (void)
*** 3806,3812 ****
       is inside a start_sequence, make the outer-level insn chain current, so
       the code is placed at the start of the function.  */
    push_topmost_sequence ();
!   emit_insn_after (seq, get_insns ());
    pop_topmost_sequence ();
  
    return val;
--- 3814,3820 ----
       is inside a start_sequence, make the outer-level insn chain current, so
       the code is placed at the start of the function.  */
    push_topmost_sequence ();
!   emit_insn_after (seq, entry_of_function ());
    pop_topmost_sequence ();
  
    return val;
Index: cfgbuild.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfgbuild.c,v
retrieving revision 1.25.2.11
diff -c -3 -p -r1.25.2.11 cfgbuild.c
*** cfgbuild.c	28 Sep 2003 06:06:07 -0000	1.25.2.11
--- cfgbuild.c	19 Dec 2003 00:19:29 -0000
*************** static rtx find_label_refs (rtx, rtx);
*** 54,60 ****
  static void make_edges (rtx, basic_block, basic_block, int);
  static void make_label_edge (sbitmap *, basic_block, rtx, int);
  static void make_eh_edge (sbitmap *, basic_block, rtx);
- static void find_bb_boundaries (basic_block);
  static void compute_outgoing_frequencies (basic_block);
  \f
  /* Return true if insn is something that should be contained inside basic
--- 54,59 ----
*************** control_flow_insn_p (rtx insn)
*** 109,114 ****
--- 108,115 ----
  	      && GET_CODE (PATTERN (insn)) != ADDR_DIFF_VEC);
  
      case CALL_INSN:
+       if (find_reg_note (insn, REG_NORETURN, 0))
+ 	return true;
        /* Call insn may return to the nonlocal goto handler.  */
        return ((nonlocal_goto_handler_labels
  	       && (0 == (note = find_reg_note (insn, REG_EH_REGION,
*************** control_flow_insn_p (rtx insn)
*** 118,123 ****
--- 119,129 ----
  	      || can_throw_internal (insn));
  
      case INSN:
+       /* We represent EH manipulation via unspec followed by barrier.
+          Such instruction is control flow instruction even when there is
+          no other clue specifying it.  */
+       if (NEXT_INSN (insn) && GET_CODE (NEXT_INSN (insn)) == BARRIER)
+ 	return true;
        return (flag_non_call_exceptions && can_throw_internal (insn));
  
      case BARRIER:
*************** enum state {BLOCK_NEW = 0, BLOCK_ORIGINA
*** 644,653 ****
  #define SET_STATE(BB, STATE) ((BB)->aux = (void *) (size_t) (STATE))
  
  /* Scan basic block BB for possible BB boundaries inside the block
!    and create new basic blocks in the progress.  */
  
! static void
! find_bb_boundaries (basic_block bb)
  {
    rtx insn = bb->head;
    rtx end = bb->end;
--- 650,662 ----
  #define SET_STATE(BB, STATE) ((BB)->aux = (void *) (size_t) (STATE))
  
  /* Scan basic block BB for possible BB boundaries inside the block
!    and create new basic blocks in the progress. 
  
!    Collect and return a list of labels whose addresses are taken.  This
!    will be used in make_edges for use with computed gotos.  */
! 
! static rtx
! find_bb_boundaries (rtx lvl, basic_block bb)
  {
    rtx insn = bb->head;
    rtx end = bb->end;
*************** find_bb_boundaries (basic_block bb)
*** 655,661 ****
    edge fallthru = NULL;
  
    if (insn == bb->end)
!     return;
  
    if (GET_CODE (insn) == CODE_LABEL)
      insn = NEXT_INSN (insn);
--- 664,670 ----
    edge fallthru = NULL;
  
    if (insn == bb->end)
!     return lvl;
  
    if (GET_CODE (insn) == CODE_LABEL)
      insn = NEXT_INSN (insn);
*************** find_bb_boundaries (basic_block bb)
*** 665,670 ****
--- 674,711 ----
      {
        enum rtx_code code = GET_CODE (insn);
  
+       if (GET_CODE (insn) == CALL_INSN
+ 	  && GET_CODE (PATTERN (insn)) == CALL_PLACEHOLDER)
+ 	{
+ 	  /* Scan each of the alternatives for label refs.  */
+ 	  lvl = find_label_refs (XEXP (PATTERN (insn), 0), lvl);
+ 	  lvl = find_label_refs (XEXP (PATTERN (insn), 1), lvl);
+ 	  lvl = find_label_refs (XEXP (PATTERN (insn), 2), lvl);
+ 	}
+       if (GET_CODE (insn) == INSN || GET_CODE (insn) == CALL_INSN)
+ 	{
+ 	  rtx note;
+ 
+ 	  for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
+ 	    if (REG_NOTE_KIND (note) == REG_LABEL)
+ 	      {
+ 		rtx lab = XEXP (note, 0), next;
+ 
+ 		if ((next = next_nonnote_insn (lab)) != NULL
+ 			 && GET_CODE (next) == JUMP_INSN
+ 			 && (GET_CODE (PATTERN (next)) == ADDR_VEC
+ 			     || GET_CODE (PATTERN (next)) == ADDR_DIFF_VEC))
+ 		  ;
+ 		else if (GET_CODE (lab) == NOTE)
+ 		  ;
+ 		else if (GET_CODE (NEXT_INSN (insn)) == JUMP_INSN
+ 			 && find_reg_note (NEXT_INSN (insn), REG_LABEL, lab))
+ 		  ;
+ 		else
+ 		  lvl = alloc_EXPR_LIST (0, XEXP (note, 0), lvl);
+ 	      }
+ 	}
+ 
        /* On code label, split current basic block.  */
        if (code == CODE_LABEL)
  	{
*************** find_bb_boundaries (basic_block bb)
*** 707,712 ****
--- 748,754 ----
       followed by cleanup at fallthru edge, so the outgoing edges may
       be dead.  */
    purge_dead_edges (bb);
+   return lvl;
  }
  
  /*  Assume that frequency of basic block B is known.  Compute frequencies
*************** void
*** 750,755 ****
--- 792,798 ----
  find_many_sub_basic_blocks (sbitmap blocks)
  {
    basic_block bb, min, max;
+   rtx label_value_list = NULL;
  
    FOR_EACH_BB (bb)
      SET_STATE (bb,
*************** find_many_sub_basic_blocks (sbitmap bloc
*** 757,763 ****
  
    FOR_EACH_BB (bb)
      if (STATE (bb) == BLOCK_TO_SPLIT)
!       find_bb_boundaries (bb);
  
    FOR_EACH_BB (bb)
      if (STATE (bb) != BLOCK_ORIGINAL)
--- 800,806 ----
  
    FOR_EACH_BB (bb)
      if (STATE (bb) == BLOCK_TO_SPLIT)
!       label_value_list = find_bb_boundaries (label_value_list, bb);
  
    FOR_EACH_BB (bb)
      if (STATE (bb) != BLOCK_ORIGINAL)
*************** find_many_sub_basic_blocks (sbitmap bloc
*** 770,776 ****
  
    /* Now re-scan and wire in all edges.  This expect simple (conditional)
       jumps at the end of each new basic blocks.  */
!   make_edges (NULL, min, max, 1);
  
    /* Update branch probabilities.  Expect only (un)conditional jumps
       to be created with only the forward edges.  */
--- 813,819 ----
  
    /* Now re-scan and wire in all edges.  This expect simple (conditional)
       jumps at the end of each new basic blocks.  */
!   make_edges (label_value_list, min, max, 1);
  
    /* Update branch probabilities.  Expect only (un)conditional jumps
       to be created with only the forward edges.  */
*************** find_sub_basic_blocks (basic_block bb)
*** 807,813 ****
    basic_block next = bb->next_bb;
  
    min = bb;
!   find_bb_boundaries (bb);
    max = next->prev_bb;
  
    /* Now re-scan and wire in all edges.  This expect simple (conditional)
--- 850,856 ----
    basic_block next = bb->next_bb;
  
    min = bb;
!   find_bb_boundaries (NULL, bb);
    max = next->prev_bb;
  
    /* Now re-scan and wire in all edges.  This expect simple (conditional)
Index: expr.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/expr.c,v
retrieving revision 1.467.2.70
diff -c -3 -p -r1.467.2.70 expr.c
*** expr.c	17 Dec 2003 20:25:55 -0000	1.467.2.70
--- expr.c	19 Dec 2003 00:20:06 -0000
*************** expand_expr_1 (tree exp, rtx target, enu
*** 8435,8442 ****
  	     to restore, and at least one arm of the COND_EXPR is a
  	     GOTO_EXPR to a local label, then we can emit more efficient
  	     code by using jumpif/jumpifnot instead of the 'if' machinery.  */
! 	  if (! optimize
! 	      || containing_blocks_have_cleanups_or_stack_level ())
  	    ;
  	  else if (TREE_CODE (then_) == GOTO_EXPR
  		   && TREE_CODE (GOTO_DESTINATION (then_)) == LABEL_DECL
--- 8435,8442 ----
  	     to restore, and at least one arm of the COND_EXPR is a
  	     GOTO_EXPR to a local label, then we can emit more efficient
  	     code by using jumpif/jumpifnot instead of the 'if' machinery.  */
! 	  if (/*! optimize
! 	      || */containing_blocks_have_cleanups_or_stack_level ())
  	    ;
  	  else if (TREE_CODE (then_) == GOTO_EXPR
  		   && TREE_CODE (GOTO_DESTINATION (then_)) == LABEL_DECL
Index: gimple-low.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Attic/gimple-low.c,v
retrieving revision 1.1.4.15
diff -c -3 -p -r1.1.4.15 gimple-low.c
*** gimple-low.c	7 Dec 2003 10:07:44 -0000	1.1.4.15
--- gimple-low.c	19 Dec 2003 00:20:33 -0000
*************** expand_used_vars (void)
*** 421,427 ****
    cfun->unexpanded_var_list = nreverse (cfun->unexpanded_var_list);
  
    for (cell = cfun->unexpanded_var_list; cell; cell = TREE_CHAIN (cell))
!     expand_var (TREE_VALUE (cell));
  
    cfun->unexpanded_var_list = NULL_TREE;
  }
--- 421,437 ----
    cfun->unexpanded_var_list = nreverse (cfun->unexpanded_var_list);
  
    for (cell = cfun->unexpanded_var_list; cell; cell = TREE_CHAIN (cell))
!     if (TREE_CODE (TREE_VALUE (cell)) != FUNCTION_DECL)
!       expand_var (TREE_VALUE (cell));
! }
! void
! expand_nested_functions (void)
! {
!   tree cell;
! 
!   for (cell = cfun->unexpanded_var_list; cell; cell = TREE_CHAIN (cell))
!     if (TREE_CODE (TREE_VALUE (cell)) == FUNCTION_DECL)
!       expand_var (TREE_VALUE (cell));
  
    cfun->unexpanded_var_list = NULL_TREE;
  }
Index: langhooks.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/langhooks.c,v
retrieving revision 1.31.2.19
diff -c -3 -p -r1.31.2.19 langhooks.c
*** langhooks.c	25 Nov 2003 02:09:48 -0000	1.31.2.19
--- langhooks.c	19 Dec 2003 00:20:35 -0000
*************** lhd_expand_decl (tree t ATTRIBUTE_UNUSED
*** 290,295 ****
--- 290,297 ----
  const char *
  lhd_decl_printable_name (tree decl, int verbosity ATTRIBUTE_UNUSED)
  {
+   if (!DECL_NAME (decl))
+     return "<null name>";
    return IDENTIFIER_POINTER (DECL_NAME (decl));
  }
  
Index: stmt.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/stmt.c,v
retrieving revision 1.267.2.43
diff -c -3 -p -r1.267.2.43 stmt.c
*** stmt.c	18 Dec 2003 00:15:42 -0000	1.267.2.43
--- stmt.c	19 Dec 2003 00:20:54 -0000
*************** declare_nonlocal_label (tree label)
*** 591,596 ****
--- 591,598 ----
      }
    nonlocal_goto_handler_slots
      = gen_rtx_EXPR_LIST (VOIDmode, slot, nonlocal_goto_handler_slots);
+   /* ??? We should walk nested functions to figure out whehter it is needed.  */
+   cfun->has_nonlocal_label = 1;
  }
  
  /* Generate RTL code for a `goto' statement with target label LABEL.
Index: tree-cfg.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Attic/tree-cfg.c,v
retrieving revision 1.1.4.238
diff -c -3 -p -r1.1.4.238 tree-cfg.c
*** tree-cfg.c	17 Dec 2003 23:19:03 -0000	1.1.4.238
--- tree-cfg.c	19 Dec 2003 00:21:17 -0000
*************** Boston, MA 02111-1307, USA.  */
*** 42,47 ****
--- 42,48 ----
  #include "toplev.h"
  #include "except.h"
  #include "cfgloop.h"
+ #include "rtl.h"
  
  /* This file contains functions for building the Control Flow Graph (CFG)
     for a function tree.  */
*************** cleanup_control_expr_graph (basic_block 
*** 1585,1590 ****
--- 1586,1593 ----
  	  next = e->succ_next;
  	  if (e != taken_edge)
  	    {
+ 	      taken_edge->probability += e->probability;
+ 	      taken_edge->count += e->count;
  	      ssa_remove_edge (e);
  	      retval = true;
  	    }
*************** find_taken_edge (basic_block bb, tree va
*** 1615,1620 ****
--- 1618,1626 ----
      abort ();
  #endif
  
+   if (val)
+     val = fold (val);
+ 
    /* If VAL is not a constant, we can't determine which edge might
       be taken.  */
    if (val == NULL || !really_constant_p (val))
*************** tree_redirect_edge_and_branch_force (edg
*** 3693,3698 ****
--- 3699,4024 ----
      abort ();
  
    return NULL;
+ }
+ 
+ static basic_block
+ expand_block (basic_block bb, FILE * dump_file)
+ {
+   block_stmt_iterator bsi = bsi = bsi_start (bb);
+   tree stmt = NULL;
+   rtx note;
+   edge e;
+ 
+   if (dump_file)
+     {
+       tree_register_cfg_hooks ();
+       dump_bb (bb, dump_file, 0);
+       rtl_register_cfg_hooks ();
+     }
+ 
+   if (!bsi_end_p (bsi))
+     stmt = bsi_stmt (bsi);
+   if (stmt && TREE_CODE (stmt) == LABEL_EXPR)
+     {
+       (*lang_hooks.rtl_expand.stmt) (stmt);
+       bb->head = get_last_insn ();
+       bsi_next (&bsi);
+       note = emit_note (NOTE_INSN_BASIC_BLOCK);
+     }
+   else
+     note = bb->head = emit_note (NOTE_INSN_BASIC_BLOCK);
+   NOTE_BASIC_BLOCK (note) = bb;
+ 
+   /* This flag is never used by RTL backend.  */
+   for (e = bb->succ; e; e = e->succ_next)
+     {
+       e->flags &= ~EDGE_EXECUTABLE;
+ 
+       /* At the moment not all abnormal edges match RTL representation.
+          It is safe to remove them here as find_sub_basic_blocks will
+          rediscover.  In future we should get this fixed properly.  */
+       if (e->flags & EDGE_ABNORMAL)
+ 	remove_edge (e);
+     }
+ 
+   for (; !bsi_end_p (bsi); bsi_next (&bsi))
+     {
+       tree stmt = bsi_stmt (bsi);
+       rtx last = get_last_insn ();
+ 
+       (*lang_hooks.rtl_expand.stmt) (stmt);
+       switch (TREE_CODE (stmt))
+ 	{
+ 	  /* Cond expr expands into conditional jump followed by unconditional
+ 	     jump.  While in theory find_sub_basic_block would be able to deal
+ 	     with this, we rather do this by hand to make maintaining of profile
+ 	     easier as well as to allow edges to be moved instead of
+ 	     purged & re-created.  */
+ 	case COND_EXPR:
+ 	  {
+ 	    rtx condjump = get_last_insn ();
+ 	    edge true_edge;
+ 	    edge false_edge;
+ 
+ 	    /* One of the succestors can be fallthru.  */
+ 	    true_edge = bb->succ;
+ 	    false_edge = bb->succ->succ_next;
+ 	    if ((false_edge->flags & EDGE_TRUE_VALUE)
+ 		|| (true_edge->flags & EDGE_FALSE_VALUE))
+ 	      {
+ 	        false_edge = bb->succ;
+ 	        true_edge = bb->succ->succ_next;
+ 	      }
+ 	    if (!(false_edge->flags & EDGE_FALSE_VALUE)
+ 		&& (true_edge->flags & EDGE_TRUE_VALUE))
+ 	      abort ();
+ 
+ 	    true_edge->flags &= ~EDGE_TRUE_VALUE;
+ 	    false_edge->flags &= ~EDGE_FALSE_VALUE;
+ 
+ 	    if (TREE_CODE (COND_EXPR_THEN (stmt)) != GOTO_EXPR)
+ 	      {
+ 		true_edge->flags |= EDGE_FALLTHRU;
+ 		break;
+ 	      }
+ 	    if (TREE_CODE (COND_EXPR_ELSE (stmt)) != GOTO_EXPR)
+ 	      {
+ 		false_edge->flags |= EDGE_FALLTHRU;
+ 		break;
+ 	      }
+ 
+ 	    /* Skip the unconditional jump.  */
+ 	    if (GET_CODE (condjump) != BARRIER)
+ 	      abort ();
+ 	    bb->end = condjump = PREV_INSN (condjump);
+ 	    if (GET_CODE (condjump) != JUMP_INSN)
+ 	      abort ();
+ 
+ 	    /* Look backward for the conditional jump.  */
+ 	    do
+ 	      {
+ 		/* Aborting here means that conditional jump expansion
+ 		   managed to optimize jump into unconditional jump.  fold_const
+ 		   should be extended to deal with this case then.  */
+ 		if (condjump == last)
+ 		  abort ();
+ 		condjump = PREV_INSN (condjump);
+ 	      }
+ 	    while (GET_CODE (condjump) != JUMP_INSN);
+ 
+ 	    update_bb_for_insn (bb);
+ 	    split_block (bb, condjump);
+ 	    redirect_edge_pred (true_edge, bb);
+ 
+ 	    if (dump_file)
+ 	      {
+ 		dump_bb (bb, dump_file, 0);
+ 		dump_bb (bb->next_bb, dump_file, 0);
+ 	      }
+ 	    return bb->next_bb;
+ 	  }
+ 	  break;
+ 	  /* Update after expansion of sibling call.  */
+ 	case CALL_EXPR:
+ 	case MODIFY_EXPR:
+ 	case RETURN_EXPR:
+ 	  for (last = NEXT_INSN (last); last; last = NEXT_INSN (last))
+ 	    {
+ 	      if (GET_CODE (last) == CALL_INSN && SIBLING_CALL_P (last))
+ 		{
+ 		  edge e;
+ 		  int probability = 0;
+ 		  gcov_type count = 0;
+ 
+ 		  do_pending_stack_adjust ();
+ 		  for (e = bb->succ; e; e = e->succ_next)
+ 		    if (!(e->flags & (EDGE_ABNORMAL | EDGE_EH)))
+ 		      {
+ 			count += e->count;
+ 			probability += e->probability;
+ 			remove_edge (e);
+ 		      }
+ 
+ 		  e = make_edge (bb, EXIT_BLOCK_PTR,
+ 				     EDGE_ABNORMAL | EDGE_SIBCALL);
+ 		  e->probability += probability;
+ 		  e->count += count;
+ 		  bb->end = last;
+ 
+ 		  last = NEXT_INSN (last);
+ 		  if (GET_CODE (last) != BARRIER)
+ 		    abort ();
+ 		  while (NEXT_INSN (last))
+ 		    delete_insn (NEXT_INSN (last));
+ 		  update_bb_for_insn (bb);
+ 		  if (dump_file)
+ 		    dump_bb (bb, dump_file, 0);
+ 		  return bb;
+ 		}
+ 	    }
+ 	  break;
+ 	default:
+ 	  break;
+ 	}
+     }
+   do_pending_stack_adjust ();
+   bb->end = get_last_insn ();
+   if (GET_CODE (bb->end) == BARRIER)
+     bb->end = PREV_INSN (bb->end);
+ 
+   if (GET_CODE (bb->end) == JUMP_INSN
+       && (GET_CODE (PATTERN (bb->end)) == ADDR_VEC
+ 	  || GET_CODE (PATTERN (bb->end)) == ADDR_DIFF_VEC))
+     bb->end = PREV_INSN (PREV_INSN (bb->end));
+ 
+   if (dump_file)
+     dump_bb (bb, dump_file, 0);
+   update_bb_for_insn (bb);
+   return bb;
+ }
+ 
+ /* Create basic block for initialization code.  */
+ static basic_block
+ construct_init_block (void)
+ {
+   basic_block init_block, first_block;
+   edge e;
+   tree bind_expr = DECL_SAVED_TREE (current_function_decl);
+   tree block;
+ 
+   /* Expand start of outermost BIND_EXPR.  */
+   if (TREE_CODE (bind_expr) != BIND_EXPR)
+     abort ();
+   block = BIND_EXPR_BLOCK (bind_expr);
+   expand_start_bindings_and_block (0, block);
+ 
+   for (e = ENTRY_BLOCK_PTR->succ; e; e = e->succ_next)
+     if (e->dest == ENTRY_BLOCK_PTR->next_bb)
+       break;
+   init_block = create_basic_block (NEXT_INSN (get_insns ()), get_last_insn (), ENTRY_BLOCK_PTR);
+   if (e)
+     {
+       first_block = e->dest;
+       redirect_edge_succ (e, init_block);
+       make_edge (init_block, first_block, EDGE_FALLTHRU);
+     }
+   else
+     make_edge (init_block, EXIT_BLOCK_PTR, EDGE_FALLTHRU);
+   update_bb_for_insn (init_block);
+ 
+   return init_block;
+ }
+ 
+ /* Create block containing landing pads and similar stuff.  */
+ static void
+ construct_exit_block (void)
+ {
+   rtx head = get_last_insn ();
+   rtx end;
+   basic_block exit_block;
+   tree bind_expr = DECL_SAVED_TREE (current_function_decl);
+   edge e, next;
+ 
+   /* We hard-wired immediate_size_expand to zero above.
+      expand_function_end will decrement this variable.  So, we set the
+      variable to one here, so that after the decrement it will remain
+      zero.  */
+   immediate_size_expand = 1;
+ 
+   /* Make sure the locus is set to the end of the function, so that 
+      epilogue line numbers and warnings are set properly.  */
+   if (cfun->function_end_locus.file)
+     input_location = cfun->function_end_locus;
+ 
+   /* The following insns belong to the top scope.  */
+   record_block_change (DECL_INITIAL (current_function_decl));
+ 
+   expand_end_bindings (BIND_EXPR_VARS (bind_expr), 1, 0);
+   
+   /* Allow language dialects to perform special processing.  */
+   (*lang_hooks.rtl_expand.end) ();
+ 
+   /* Generate rtl for function exit.  */
+   expand_function_end ();
+ 
+   end = get_last_insn ();
+   if (head == end)
+     return;
+   exit_block = create_basic_block (NEXT_INSN (head), end, EXIT_BLOCK_PTR->prev_bb);
+   for (e = EXIT_BLOCK_PTR->pred; e; e = next)
+     {
+       next = e->pred_next;
+       if (!(e->flags & EDGE_ABNORMAL))
+         redirect_edge_succ (e, exit_block);
+     }
+   make_edge (exit_block, EXIT_BLOCK_PTR, EDGE_FALLTHRU);
+   update_bb_for_insn (exit_block);
+ }
+ 
+ /* Convert IL representation from a GIMPLE to RTL.
+    We do conversion per basic block basis and preserve GIMPLE CFG.  This
+    imply some magic as CFG is partly consisting of RTL and partly of GIMPLE
+    basic blocks, so be curefull to not manipulate CFG during the expansion.  */
+ void
+ tree_expand_cfg (void)
+ {
+   basic_block bb, init_block;
+   rtx insn;
+   sbitmap blocks;
+   FILE *dump_file;
+ 
+   /* We get confused horribly by conditional jumps folding to constants.  */
+   cleanup_control_flow ();
+ 
+   /* Write the flowgraph to a dot file.  */
+   dump_file = dump_begin (TDI_expanded, &dump_flags);
+   rtl_register_cfg_hooks ();
+ 
+   init_block = construct_init_block ();
+ 
+   FOR_BB_BETWEEN (bb, init_block->next_bb, EXIT_BLOCK_PTR, next_bb)
+     bb = expand_block (bb, dump_file);
+ 
+   delete_tree_cfg ();
+ 
+   construct_exit_block ();
+ 
+   /* Convert from NOTE_INSN_EH_REGION style notes, and do other
+      sorts of eh initialization.  Delay this until after the
+      initial rtl dump so that we can see the original nesting.  */
+   convert_from_eh_region_ranges ();
+ 
+   rebuild_jump_labels (get_insns ());
+   find_exception_handler_labels ();
+ 
+   blocks = sbitmap_alloc (last_basic_block);
+   sbitmap_ones (blocks);
+   find_many_sub_basic_blocks (blocks);
+   purge_all_dead_edges (0);
+   sbitmap_free (blocks);
+ 
+   if (dump_file)
+     {
+       fprintf (dump_file, "\n\n\nExpanded body:\n\n\n");
+       print_rtl_with_bb (dump_file, get_insns ());
+       dump_end (TDI_expanded, dump_file);
+     }
+ #ifdef ENABLE_CHECKING
+   verify_flow_info();
+ #endif
+   cleanup_cfg (CLEANUP_PRE_SIBCALL | CLEANUP_PRE_LOOP);
+ #ifdef ENABLE_CHECKING
+   verify_flow_info();
+ #endif
+ 
+   free_basic_block_vars (0);
+   free_bb_for_insn ();
+ 
+   /* ??? Avoid confusion while rebuilding CFG.  */
+   for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+     if (GET_CODE (insn) == NOTE
+ 	&& NOTE_LINE_NUMBER (insn) == NOTE_INSN_BASIC_BLOCK)
+       delete_insn (insn);
  }
  
  /* Dump FUNCTION_DECL FN to file FILE using FLAGS (see TDF_* in tree.h)  */
Index: tree-dump.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-dump.c,v
retrieving revision 1.6.2.61
diff -c -3 -p -r1.6.2.61 tree-dump.c
*** tree-dump.c	17 Dec 2003 19:53:01 -0000	1.6.2.61
--- tree-dump.c	19 Dec 2003 00:21:24 -0000
*************** static struct dump_file_info dump_files[
*** 678,683 ****
--- 678,684 ----
    {".dce2", "tree-dce2", 0, 0},
    {".tail2", "tree-tail2", 0, 0},
    {".optimized", "tree-optimized", 0, 0},
+   {".expanded", "tree-expanded", 0, 0},
    {".mudflap2", "tree-mudflap2", 0, 0},
    {".xml", "call-graph", 0, 0},
    {NULL, "tree-all", 0, 0},
Index: tree-flow.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Attic/tree-flow.h,v
retrieving revision 1.1.4.173
diff -c -3 -p -r1.1.4.173 tree-flow.h
*** tree-flow.h	17 Dec 2003 19:53:01 -0000	1.1.4.173
--- tree-flow.h	19 Dec 2003 00:21:25 -0000
*************** extern void clear_special_calls (void);
*** 428,433 ****
--- 428,434 ----
  extern void compute_dominance_frontiers (bitmap *, dominance_info);
  extern bool verify_stmt (tree);
  extern void verify_stmts (void);
+ extern void tree_expand_cfg (void);
  
  /* In tree-pretty-print.c.  */
  extern void dump_generic_bb (FILE *, basic_block, int, int);
*************** extern void expand_used_vars (void);
*** 479,484 ****
--- 480,486 ----
  extern void remove_useless_vars (void);
  extern void record_vars (tree);
  extern bool block_may_fallthru (tree block);
+ void expand_nested_functions (void);
  
  /* In tree-ssa.c  */
  extern void init_tree_ssa (void);
Index: tree-optimize.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-optimize.c,v
retrieving revision 1.1.4.96
diff -c -3 -p -r1.1.4.96 tree-optimize.c
*** tree-optimize.c	17 Dec 2003 19:53:02 -0000	1.1.4.96
--- tree-optimize.c	19 Dec 2003 00:21:28 -0000
*************** optimize_function_tree (tree fndecl, tre
*** 253,259 ****
  
        /* Rewrite the function out of SSA form.  */
        rewrite_out_of_ssa (fndecl, TDI_optimized);
!       ggc_collect ();
  
        /* Flush out flow graph and SSA data.  */
        BITMAP_XFREE (vars_to_rename);
--- 253,269 ----
  
        /* Rewrite the function out of SSA form.  */
        rewrite_out_of_ssa (fndecl, TDI_optimized);
! #ifdef ENABLE_CHECKING
!       verify_flow_info ();
!       verify_stmts ();
! #endif
! 
!       /* Do some cleanups which reduce the amount of data the
! 	 tree->rtl expanders deal with.  */
!       cfg_remove_useless_stmts ();
! 
!       /* Remove unnecesary variables.  */
!       remove_useless_vars ();
  
        /* Flush out flow graph and SSA data.  */
        BITMAP_XFREE (vars_to_rename);
*************** tree_rest_of_compilation (tree fndecl, b
*** 451,478 ****
        && DECL_FILE_SCOPE_P (fndecl))
      expand_main_function ();
  
    /* Generate the RTL for this function.  */
!   (*lang_hooks.rtl_expand.stmt) (DECL_SAVED_TREE (fndecl));
  
!   /* We hard-wired immediate_size_expand to zero above.
!      expand_function_end will decrement this variable.  So, we set the
!      variable to one here, so that after the decrement it will remain
!      zero.  */
!   immediate_size_expand = 1;
! 
!   /* Make sure the locus is set to the end of the function, so that 
!      epilogue line numbers and warnings are set properly.  */
!   if (cfun->function_end_locus.file)
!     input_location = cfun->function_end_locus;
! 
!   /* The following insns belong to the top scope.  */
!   record_block_change (DECL_INITIAL (current_function_decl));
!   
!   /* Allow language dialects to perform special processing.  */
!   (*lang_hooks.rtl_expand.end) ();
  
!   /* Generate rtl for function exit.  */
!   expand_function_end ();
  
    /* If this is a nested function, protect the local variables in the stack
       above us from being collected while we're compiling this function.  */
--- 461,507 ----
        && DECL_FILE_SCOPE_P (fndecl))
      expand_main_function ();
  
+   /* It is possible to take address of nested function in another nested
+      function.  In that case we need to initialize trampoline_address in the
+      outer function.  We simply initialize all needed nested functions
+      to avoid need of walking their bodies.  */
+   node = cgraph_node (current_function_decl)->nested;
+   while (node)
+     {
+       if (node->needed)
+         trampoline_address (node->decl);
+       node = node->next_nested;
+     }
+ 
    /* Generate the RTL for this function.  */
!   if (/*optimize >= 1 && !flag_disable_tree_ssa &&*/ n_basic_blocks > 0)
!     tree_expand_cfg ();
!   else
!     {
!       (*lang_hooks.rtl_expand.stmt) (DECL_SAVED_TREE (fndecl));
  
!       /* We hard-wired immediate_size_expand to zero above.
! 	 expand_function_end will decrement this variable.  So, we set the
! 	 variable to one here, so that after the decrement it will remain
! 	 zero.  */
!       immediate_size_expand = 1;
! 
!       /* Make sure the locus is set to the end of the function, so that 
! 	 epilogue line numbers and warnings are set properly.  */
!       if (cfun->function_end_locus.file)
! 	input_location = cfun->function_end_locus;
! 
!       /* The following insns belong to the top scope.  */
!       record_block_change (DECL_INITIAL (current_function_decl));
!       
!       /* Allow language dialects to perform special processing.  */
!       (*lang_hooks.rtl_expand.end) ();
! 
!       /* Generate rtl for function exit.  */
!       expand_function_end ();
!     }
  
!   expand_nested_functions ();
  
    /* If this is a nested function, protect the local variables in the stack
       above us from being collected while we're compiling this function.  */
Index: tree-ssa.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Attic/tree-ssa.c,v
retrieving revision 1.1.4.178
diff -c -3 -p -r1.1.4.178 tree-ssa.c
*** tree-ssa.c	15 Dec 2003 22:58:36 -0000	1.1.4.178
--- tree-ssa.c	19 Dec 2003 00:21:39 -0000
*************** rewrite_out_of_ssa (tree fndecl, enum tr
*** 2745,2757 ****
    if (dump_file && (dump_flags & TDF_DETAILS))
      dump_tree_cfg (dump_file, dump_flags & ~TDF_DETAILS);
  
-   /* Do some cleanups which reduce the amount of data the
-      tree->rtl expanders deal with.  */
-   cfg_remove_useless_stmts ();
- 
-   /* Remove unnecesary variables.  */
-   remove_useless_vars ();
- 
    /* Debugging dumps.  */
    if (dump_file)
      {
--- 2745,2750 ----
Index: tree.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree.h,v
retrieving revision 1.342.2.153
diff -c -3 -p -r1.342.2.153 tree.h
*** tree.h	17 Dec 2003 19:53:02 -0000	1.342.2.153
--- tree.h	19 Dec 2003 00:22:05 -0000
*************** enum tree_dump_index
*** 3602,3607 ****
--- 3603,3609 ----
    TDI_dce_2,
    TDI_tail2,			/* dump after tail recursion/tail call */
    TDI_optimized,
+   TDI_expanded,
    TDI_mudflap2,
  
    TDI_xml,                      /* dump function call graph.   */

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC]  CFG transparent RTL expansion
  2003-12-19  5:37 [tree-ssa, RFC] CFG transparent RTL expansion Jan Hubicka
@ 2003-12-19  6:50 ` Andrew MacLeod
  2003-12-19 13:14   ` Jan Hubicka
                     ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Andrew MacLeod @ 2003-12-19  6:50 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: gcc mailing list, gcc-patches, Richard Henderson, Diego Novillo

On Thu, 2003-12-18 at 19:44, Jan Hubicka wrote:
> Hi,
> I am attaching patch implementing the quite much discussed idea of expanding
> RTL out of GIMPLE CFG transparently.  The patch do have some rough edges and
> can be decomposed to about 5 incremental steps so it is not intended for
> detailed review rather than a demonstration/proof of the concept in a hope that
> it will make decision easier (the patch however bootstraps and works well).
> 
> The problem I am trying to solve is question on how to preserve profile
> information out of SSA based passes.  It is clear that most effective consumers
> of profile include inlining, loop optimization, register allocation and basic
> block reordering so profile must be available in about whole compilation pass.
> Actually measuring profile in multiple stages of compilation is possible, but
> earlier optimizations invalidate later measured profile so it require user to
> do multiple train runs that is unfortunate so I think it is a must to have
> machinery to preserve profile as well as possible from before inlining up to
> final.
> 
> From the work on RTL, the experience shows that CFG is the most natural place
> to store information into and about only place where updating after control
> flow transformations is manageable (still not easy), the alternatives like
> remembering execution counts for each instructions or branch probabilities for
> conditional branches does not work well for nontrivial updates.

I need more details :-). 

This implements something which isnt actually used yet (ie, no
collection emitted/done). If I read it correctly, just a proof of
concept of a persistant CFG. So lets skip to how it would actually be
used. And excuse my lackof knowledge about how gcc collects profile info
:-)

1 - Where do you propose to emit the measurement code, and where do you
plan to read it in? Presumably they have to be the same location...  I
assume the measurement code is currently emitted in RTL land, probably
early in the rtl cycle? Would we be doing that in the front end, or in
SSA land? or just before SSA after the CFG is created?

2 - Does profiling work on abnormal edges?  Ie, are throw edges
annotated, or are they guessed/ignored/calculated when possible?

3-  So, assume the profile information on trees is going to be kept in
the CFG only right through into RTL. When I inline function 'a' into
function b:

  int a(int y)
  {
    if (y < 20)   
      return 1; 
    else
      return 0;  /* Profiling says this branch is taken 90% of the time.  */
  }

  void b()
  {
    <...>
    a(z)
    <...>
  }


simplistic example, but lets say profiling indicates that we ought to
inline a(z) when we are processing b. How does that profile information
from a() get propagated into the code which is inlined into b? There is
no CFG for a(), just the trees. Do you have to read the info for a() in
from the profile source each time its inlined or how does that work?

4- Does anyone other than me find the idea of inlining optimized trees
appealing? I understand thats not on the table right now because there
are difficulties with EH, at the very least. I would think we could
encapsulate that info somehow, but clearly thats a problem that would
need to be solved.

Im sure more questions will occur to me, but thats it for now :-)

Andrew

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC]  CFG transparent RTL expansion
  2003-12-19  6:50 ` Andrew MacLeod
@ 2003-12-19 13:14   ` Jan Hubicka
  2003-12-19 14:56     ` Jan Hubicka
  2003-12-19 18:08     ` Andrew MacLeod
  2003-12-19 15:01   ` Jan Hubicka
  2003-12-19 17:20   ` law
  2 siblings, 2 replies; 20+ messages in thread
From: Jan Hubicka @ 2003-12-19 13:14 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Jan Hubicka, gcc mailing list, gcc-patches, Richard Henderson,
	Diego Novillo

> On Thu, 2003-12-18 at 19:44, Jan Hubicka wrote:
> > Hi,
> > I am attaching patch implementing the quite much discussed idea of expanding
> > RTL out of GIMPLE CFG transparently.  The patch do have some rough edges and
> > can be decomposed to about 5 incremental steps so it is not intended for
> > detailed review rather than a demonstration/proof of the concept in a hope that
> > it will make decision easier (the patch however bootstraps and works well).
> > 
> > The problem I am trying to solve is question on how to preserve profile
> > information out of SSA based passes.  It is clear that most effective consumers
> > of profile include inlining, loop optimization, register allocation and basic
> > block reordering so profile must be available in about whole compilation pass.
> > Actually measuring profile in multiple stages of compilation is possible, but
> > earlier optimizations invalidate later measured profile so it require user to
> > do multiple train runs that is unfortunate so I think it is a must to have
> > machinery to preserve profile as well as possible from before inlining up to
> > final.
> > 
> > From the work on RTL, the experience shows that CFG is the most natural place
> > to store information into and about only place where updating after control
> > flow transformations is manageable (still not easy), the alternatives like
> > remembering execution counts for each instructions or branch probabilities for
> > conditional branches does not work well for nontrivial updates.
> 
> I need more details :-). 
> 
> This implements something which isnt actually used yet (ie, no
> collection emitted/done). If I read it correctly, just a proof of
> concept of a persistant CFG. So lets skip to how it would actually be
> used. And excuse my lackof knowledge about how gcc collects profile info
> :-)
I already do have patches to produce profile estimates early (no profile
feedback, but it is the same thing for updating)
> 
> 1 - Where do you propose to emit the measurement code, and where do you
> plan to read it in? Presumably they have to be the same location...  I
> assume the measurement code is currently emitted in RTL land, probably
> early in the rtl cycle? Would we be doing that in the front end, or in
> SSA land? or just before SSA after the CFG is created

Currently profiling is run just after loop unroller in RTL compilation
path.  This is because old loop unroller does mess up profile, otherwise
it would be done before that.

I plan to move it to very early SSA compilation path, just after early
cleanups (ideally DOM and DCE) is done so we don't end up profiling
unnecesairly compilicated CFG.
> 
> 2 - Does profiling work on abnormal edges?  Ie, are throw edges
> annotated, or are they guessed/ignored/calculated when possible?

It works on noncritical abnormal edges, for critical abnormal edges we
guess, but of course the idea is that once we make EH edges redirectable
(and thus not abnormal) abnormal edges will be real anomality and very
few of them will be critical (setjmp edges are not and computed jump
edges also won't create critical ones as long as we stay with one
computed jump per function trick we do at SSA now)
> 
> 3-  So, assume the profile information on trees is going to be kept in
> the CFG only right through into RTL. When I inline function 'a' into
> function b:
> 
>   int a(int y)
>   {
>     if (y < 20)   
>       return 1; 
>     else
>       return 0;  /* Profiling says this branch is taken 90% of the time.  */
>   }
> 
>   void b()
>   {
>     <...>
>     a(z)
>     <...>
>   }
> 
> 
> simplistic example, but lets say profiling indicates that we ought to
> inline a(z) when we are processing b. How does that profile information
> from a() get propagated into the code which is inlined into b? There is
> no CFG for a(), just the trees. Do you have to read the info for a() in
> from the profile source each time its inlined or how does that work?

The inlining problem is not dealt with this particular patch, it solves
just SSA<->RTL interface, but you know what is my longer term plan, but
lets try to not run into very deep details with the inlining interface
right now.

This is common problem with updating profile (not only after inlining
but after any code specialization - unrolling, tracing or whatever). 

The idea is when such conflict is noticed, the profile gets updates
somehow partly incorrectly (such that all basic block directly dominated
by edge being removed gets their frequencies subtracted).  The
optimizations expect such partially invalided profiles and must behave
sanely in presence of them.

The profile becomes less exact during optimization process, but overall
most of compilers (ORC, IMPACT for instance) are able to keep it by such
simplistics methods in good shape to bring benefits on BB reordering.
Most questions about the profile are easy, like "is this loop executed
insanily many times" or "is this basic block hot?"
these questions are relatively stable WRT slight degenerations.

Relearch compilers usually provide method to additionally profile in
later compilation stages but the benefits are very low. Another more
research topis are path profiles that allows profiling of such
situations.
> 
> 4- Does anyone other than me find the idea of inlining optimized trees
> appealing? I understand thats not on the table right now because there
> are difficulties with EH, at the very least. I would think we could
> encapsulate that info somehow, but clearly thats a problem that would
> need to be solved.

Sure it is appealing.  In order to do something sane, you need to do
some analysis that are not doable without CFG/early cleanups.
As pointed out already, the benefits of inlining estimate by time of
execution of the callee, while the costs is the size.  At the moment we
do have only the second information and our strategy is "do as many
inlinining as size constraint allows and lets hope that very fast
functions will get inlined too then"

For instance my current recursive inlining code would do much better job
if we are able to discover tail calls.  Another examples include partial
inlining if functions like
if (test)
   common fast path
else
   something large

can be inlined as

if (test)
  fast path
else
  do_something_large ()

Honza
> 
> Im sure more questions will occur to me, but thats it for now :-)
> 
> Andrew
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC]  CFG transparent RTL expansion
  2003-12-19 13:14   ` Jan Hubicka
@ 2003-12-19 14:56     ` Jan Hubicka
  2003-12-19 18:08     ` Andrew MacLeod
  1 sibling, 0 replies; 20+ messages in thread
From: Jan Hubicka @ 2003-12-19 14:56 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Andrew MacLeod, gcc mailing list, gcc-patches, Richard Henderson,
	Diego Novillo

> > On Thu, 2003-12-18 at 19:44, Jan Hubicka wrote:
> > > Hi,
> > > I am attaching patch implementing the quite much discussed idea of expanding
> > > RTL out of GIMPLE CFG transparently.  The patch do have some rough edges and
> > > can be decomposed to about 5 incremental steps so it is not intended for
> > > detailed review rather than a demonstration/proof of the concept in a hope that
> > > it will make decision easier (the patch however bootstraps and works well).
> > > 
> > > The problem I am trying to solve is question on how to preserve profile
> > > information out of SSA based passes.  It is clear that most effective consumers
> > > of profile include inlining, loop optimization, register allocation and basic
> > > block reordering so profile must be available in about whole compilation pass.
> > > Actually measuring profile in multiple stages of compilation is possible, but
> > > earlier optimizations invalidate later measured profile so it require user to
> > > do multiple train runs that is unfortunate so I think it is a must to have
> > > machinery to preserve profile as well as possible from before inlining up to
> > > final.
> > > 
> > > From the work on RTL, the experience shows that CFG is the most natural place
> > > to store information into and about only place where updating after control
> > > flow transformations is manageable (still not easy), the alternatives like
> > > remembering execution counts for each instructions or branch probabilities for
> > > conditional branches does not work well for nontrivial updates.
> > 
> > I need more details :-). 
> > 
> > This implements something which isnt actually used yet (ie, no
> > collection emitted/done). If I read it correctly, just a proof of
> > concept of a persistant CFG. So lets skip to how it would actually be
> > used. And excuse my lackof knowledge about how gcc collects profile info
> > :-)
> I already do have patches to produce profile estimates early (no profile
> feedback, but it is the same thing for updating)

Also note that updating current profiling code for trees is more or less
trivial issue.  I do plan to keep ability to profile RTL for
experimenting with reading the profile back later in the process to see
how much difference it makes.

Honza

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC]  CFG transparent RTL expansion
  2003-12-19  6:50 ` Andrew MacLeod
  2003-12-19 13:14   ` Jan Hubicka
@ 2003-12-19 15:01   ` Jan Hubicka
  2003-12-19 17:20   ` law
  2 siblings, 0 replies; 20+ messages in thread
From: Jan Hubicka @ 2003-12-19 15:01 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Jan Hubicka, gcc mailing list, gcc-patches, Richard Henderson,
	Diego Novillo

> On Thu, 2003-12-18 at 19:44, Jan Hubicka wrote:
> > Hi,
> > I am attaching patch implementing the quite much discussed idea of expanding
> > RTL out of GIMPLE CFG transparently.  The patch do have some rough edges and
> > can be decomposed to about 5 incremental steps so it is not intended for
> > detailed review rather than a demonstration/proof of the concept in a hope that
> > it will make decision easier (the patch however bootstraps and works well).
> > 
> > The problem I am trying to solve is question on how to preserve profile
> > information out of SSA based passes.  It is clear that most effective consumers
> > of profile include inlining, loop optimization, register allocation and basic
> > block reordering so profile must be available in about whole compilation pass.
> > Actually measuring profile in multiple stages of compilation is possible, but
> > earlier optimizations invalidate later measured profile so it require user to
> > do multiple train runs that is unfortunate so I think it is a must to have
> > machinery to preserve profile as well as possible from before inlining up to
> > final.
> > 
> > From the work on RTL, the experience shows that CFG is the most natural place
> > to store information into and about only place where updating after control
> > flow transformations is manageable (still not easy), the alternatives like
> > remembering execution counts for each instructions or branch probabilities for
> > conditional branches does not work well for nontrivial updates.
> 
> I need more details :-). 
> 
> This implements something which isnt actually used yet (ie, no
> collection emitted/done). If I read it correctly, just a proof of
> concept of a persistant CFG. So lets skip to how it would actually be
> used. And excuse my lackof knowledge about how gcc collects profile info
> :-)
> 
> 1 - Where do you propose to emit the measurement code, and where do you
> plan to read it in? Presumably they have to be the same location...  I
> assume the measurement code is currently emitted in RTL land, probably
> early in the rtl cycle? Would we be doing that in the front end, or in
> SSA land? or just before SSA after the CFG is created?
> 
> 2 - Does profiling work on abnormal edges?  Ie, are throw edges
> annotated, or are they guessed/ignored/calculated when possible?

Also about the annotation.  The algorithm works by identifying minimal
spanning tree in the CFG.  It is enought to annotate only the
non-spanning tree edges as the rest of profile can be computed from the
rule that sum of counts of edges incomming to the block are matching
sum of counts of edges leaving the block  (the CFG is first extended via
fake edges to represent instructions that may terminate executiona and
such).
You can manage the minimal spanning tree to contain majority of abnormal
edges.

Honza

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC] CFG transparent RTL expansion
  2003-12-19  6:50 ` Andrew MacLeod
  2003-12-19 13:14   ` Jan Hubicka
  2003-12-19 15:01   ` Jan Hubicka
@ 2003-12-19 17:20   ` law
  2003-12-19 17:44     ` Jan Hubicka
  2003-12-19 18:15     ` Andrew MacLeod
  2 siblings, 2 replies; 20+ messages in thread
From: law @ 2003-12-19 17:20 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Jan Hubicka, gcc mailing list, gcc-patches, Richard Henderson,
	Diego Novillo

In message <1071801428.21456.162.camel@p4>, Andrew MacLeod writes:
 >This implements something which isnt actually used yet (ie, no
 >collection emitted/done). If I read it correctly, just a proof of
 >concept of a persistant CFG. So lets skip to how it would actually be
 >used. And excuse my lackof knowledge about how gcc collects profile info
 >:-)
Another potential use would be to detect when the tree->rtl lowering
process creates new blocks.  This _may_ be interesting in that it may
be an indication that rather than a block-local CSE we may want to run
something a little more extensive.  

If it's not obvious, I'm thinking about how we start trimming down the
first stage RTL optimizers -- knowing if new blocks were created may be
useful in determining which early RTL optimizers to run or what mode to
run them in (block-local or extended-basic-blocks).

jeff


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC] CFG transparent RTL expansion
  2003-12-19 17:20   ` law
@ 2003-12-19 17:44     ` Jan Hubicka
  2003-12-19 18:15     ` Andrew MacLeod
  1 sibling, 0 replies; 20+ messages in thread
From: Jan Hubicka @ 2003-12-19 17:44 UTC (permalink / raw)
  To: law
  Cc: Andrew MacLeod, Jan Hubicka, gcc mailing list, gcc-patches,
	Richard Henderson, Diego Novillo

> In message <1071801428.21456.162.camel@p4>, Andrew MacLeod writes:
>  >This implements something which isnt actually used yet (ie, no
>  >collection emitted/done). If I read it correctly, just a proof of
>  >concept of a persistant CFG. So lets skip to how it would actually be
>  >used. And excuse my lackof knowledge about how gcc collects profile info
>  >:-)
> Another potential use would be to detect when the tree->rtl lowering
> process creates new blocks.  This _may_ be interesting in that it may
> be an indication that rather than a block-local CSE we may want to run
> something a little more extensive.  

Even tought it is not obviously clean to me how this information will
help tunning CSE, I also do have also plans to taking advantage of CFG
during expansion and early optimization.

It would be interesting to consider RTL expansion as kind of
"optimization pass" more similar to code generation passes discussed in
literature that are not supposed to do only correct job, but also do not
produce ugly code.  Our current idea of "solve dificult problem of
expansion somehow and cleanup everything later" is definitly expensive.

For instance once RTL expansion is made cleaner, we may catch redudant
expressions and such during RTL production as suggested by Morgan book
(expanding flow graph).  He claims that local GVN embedded in the
expansion code produce very good savings in amount of generated
intermediate code.  Even tought his results are dificult to apply to our
situation, similar idea would work for us by hooking into optabs
expand_bin*

Similarly we do face problems where RTL expansion need more information
than is readilly available in pure GIMPLE.  For instance memcpy expander
shall consider the frequency (optimize for size or speed) the average
size of block and similar info to decide whether emit libcall, loop or
sequence of moves or combination of these.  This is not doable at all in
the later stages of RTL optimization.

Honza
> 
> If it's not obvious, I'm thinking about how we start trimming down the
> first stage RTL optimizers -- knowing if new blocks were created may be
> useful in determining which early RTL optimizers to run or what mode to
> run them in (block-local or extended-basic-blocks).
> 
> jeff
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC]  CFG transparent RTL expansion
  2003-12-19 13:14   ` Jan Hubicka
  2003-12-19 14:56     ` Jan Hubicka
@ 2003-12-19 18:08     ` Andrew MacLeod
  2003-12-19 19:11       ` Jan Hubicka
                         ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Andrew MacLeod @ 2003-12-19 18:08 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: gcc mailing list, gcc-patches, Richard Henderson, Diego Novillo

On Fri, 2003-12-19 at 07:13, Jan Hubicka wrote:

> > 1 - Where do you propose to emit the measurement code, and where do you
> > plan to read it in? Presumably they have to be the same location...  I
> > assume the measurement code is currently emitted in RTL land, probably
> > early in the rtl cycle? Would we be doing that in the front end, or in
> > SSA land? or just before SSA after the CFG is created

> 
> I plan to move it to very early SSA compilation path, just after early
> cleanups (ideally DOM and DCE) is done so we don't end up profiling
> unnecesairly compilicated CFG.


> > 
> > 2 - Does profiling work on abnormal edges?  Ie, are throw edges
> > annotated, or are they guessed/ignored/calculated when possible?
> 
> It works on noncritical abnormal edges, for critical abnormal edges we
> guess, but of course the idea is that once we make EH edges redirectable
> (and thus not abnormal) abnormal edges will be real anomality and very
> few of them will be critical (setjmp edges are not and computed jump
> edges also won't create critical ones as long as we stay with one
> computed jump per function trick we do at SSA now)


> > 
> > simplistic example, but lets say profiling indicates that we ought to
> > inline a(z) when we are processing b. How does that profile information
> > from a() get propagated into the code which is inlined into b? There is
> > no CFG for a(), just the trees. Do you have to read the info for a() in
> > from the profile source each time its inlined or how does that work?
> 
> The inlining problem is not dealt with this particular patch, it solves
> just SSA<->RTL interface, but you know what is my longer term plan, but
> lets try to not run into very deep details with the inlining interface
> right now.
> 

I realize this. The main purpose of this patch is to enable other
things, so I think we need to discuss those other things in order to
make decisions on whether this is the best approach.


> This is common problem with updating profile (not only after inlining
> but after any code specialization - unrolling, tracing or whatever). 
> 
> The idea is when such conflict is noticed, the profile gets updates
> somehow partly incorrectly (such that all basic block directly dominated
> by edge being removed gets their frequencies subtracted).  The
> optimizations expect such partially invalided profiles and must behave
> sanely in presence of them.



Clearly, as long as the CFG exists, thats where the information ought to
be stored. The only real question I want to deal with is should the CFG
be kept after SSA right through to RTL, or when the CFG is destroyed,
should the information be attached to the trees somehow and then used
during expansion to annotate the new CFG rtl creates, and/or used to
annotate the CFG for trees when the function is inlined. I think thats
fundamentally where we have decisions to make, so I would like to work
through the various differences. Im also about to go on vacation, so the
more I have to think about the better :-)

If we annotate the trees with profiling info when we are done with them,
reasonably correct branch information can be propagated into the inlined
code. If the CFG is the only place the information is stored, we'd also
have to keep the CFG for a() around in order to put its information into
into b() when its inlined. So we wouldn't just be keeping the trees for
inlineable functions around, we'd have to keep objectified CFG's as
well. Dont read that as me saying Im religously opposed to keeping CFGs
right through to RTL. Its merely an observation.


> The profile becomes less exact during optimization process, but overall
> most of compilers (ORC, IMPACT for instance) are able to keep it by such
> simplistics methods in good shape to bring benefits on BB reordering.
> Most questions about the profile are easy, like "is this loop executed
> insanily many times" or "is this basic block hot?"
> these questions are relatively stable WRT slight degenerations.
> 

Sure, but if we don't bring an inlinable function's profile into the new
function when we inline it, we are throwing away reasonably correct
information right off the bat. If you have all the probilities for a()'s
branches, you ought to bring those into the code you create when you
inline a() into b(). I'd like to make sure we do that.

Yes, The degenerations dont affect loop stuff so much as it does the
lower level bits like register allocation and branch reversal/prediction
at the rtl level, which will be the farthest away consumers of this
information, the side effects of which we cant predict, just measure :-)

So we presumably have to read the information back in during the second
compilation at the same point the information was written out during the
first one, or we dont get a good mapping of block execution counts
right? And thats somewhere in the SSA optimizer.

> > 
> > 4- Does anyone other than me find the idea of inlining optimized trees
> > appealing? I understand thats not on the table right now because there
> > are difficulties with EH, at the very least. I would think we could
> > encapsulate that info somehow, but clearly thats a problem that would
> > need to be solved.
> 
> Sure it is appealing.  In order to do something sane, you need to do
> some analysis that are not doable without CFG/early cleanups.
> As pointed out already, the benefits of inlining estimate by time of
> execution of the callee, while the costs is the size.  At the moment we
> do have only the second information and our strategy is "do as many
> inlinining as size constraint allows and lets hope that very fast
> functions will get inlined too then"
> 
> For instance my current recursive inlining code would do much better job
> if we are able to discover tail calls.  Another examples include partial
> inlining if functions like
> if (test)
>    common fast path
> else
>    something large
> 
> can be inlined as
> 
> if (test)
>   fast path
> else
>   do_something_large ()
> 

OK, so we're on the same page there, someday we want to inline optimized
functions :-)


So where do you envision the inliner then? It would have to be done
after the profiling information is read is in order to make use of it.
Presumably immediately after.  You also plan to create a CFG aware
inliner and objectify the CFG, and do the inlining from that? And then
the inliner would work on SSA? right after DCE or somesuch place. SO the
inliner will be an SSA inliner, or would we go out of SSA and back into
SSA at that point?

Andrew



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC] CFG transparent RTL expansion
  2003-12-19 17:20   ` law
  2003-12-19 17:44     ` Jan Hubicka
@ 2003-12-19 18:15     ` Andrew MacLeod
  2003-12-19 19:33       ` Jan Hubicka
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew MacLeod @ 2003-12-19 18:15 UTC (permalink / raw)
  To: Jeff Law
  Cc: Jan Hubicka, gcc mailing list, gcc-patches, Richard Henderson,
	Diego Novillo

On Fri, 2003-12-19 at 11:23, law@redhat.com wrote:
> In message <1071801428.21456.162.camel@p4>, Andrew MacLeod writes:
>  >This implements something which isnt actually used yet (ie, no
>  >collection emitted/done). If I read it correctly, just a proof of
>  >concept of a persistant CFG. So lets skip to how it would actually be
>  >used. And excuse my lackof knowledge about how gcc collects profile info
>  >:-)
> Another potential use would be to detect when the tree->rtl lowering
> process creates new blocks.  This _may_ be interesting in that it may
> be an indication that rather than a block-local CSE we may want to run
> something a little more extensive.  
> 
> If it's not obvious, I'm thinking about how we start trimming down the
> first stage RTL optimizers -- knowing if new blocks were created may be
> useful in determining which early RTL optimizers to run or what mode to
> run them in (block-local or extended-basic-blocks).

Yeah, Since the driving force behind keeping the CFG live through rtl is
to keep profiling information, we should try to list other things that
would either benefit from this, or be impacted so that we have multiple
reasons.

Andrew


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC]  CFG transparent RTL expansion
  2003-12-19 18:08     ` Andrew MacLeod
@ 2003-12-19 19:11       ` Jan Hubicka
  2003-12-19 19:51       ` Steven Bosscher
  2003-12-31 22:57       ` Jan Hubicka
  2 siblings, 0 replies; 20+ messages in thread
From: Jan Hubicka @ 2003-12-19 19:11 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Jan Hubicka, gcc mailing list, gcc-patches, Richard Henderson,
	Diego Novillo

> > > simplistic example, but lets say profiling indicates that we ought to
> > > inline a(z) when we are processing b. How does that profile information
> > > from a() get propagated into the code which is inlined into b? There is
> > > no CFG for a(), just the trees. Do you have to read the info for a() in
> > > from the profile source each time its inlined or how does that work?
> > 
> > The inlining problem is not dealt with this particular patch, it solves
> > just SSA<->RTL interface, but you know what is my longer term plan, but
> > lets try to not run into very deep details with the inlining interface
> > right now.
> > 
> 
> I realize this. The main purpose of this patch is to enable other
> things, so I think we need to discuss those other things in order to
> make decisions on whether this is the best approach.

Sure, just wanted to point out that the scope of current patch is
limited, so we won't run into missunderstandings like with Zdenek's edge
redirection patch.
> 
> 
> > This is common problem with updating profile (not only after inlining
> > but after any code specialization - unrolling, tracing or whatever). 
> > 
> > The idea is when such conflict is noticed, the profile gets updates
> > somehow partly incorrectly (such that all basic block directly dominated
> > by edge being removed gets their frequencies subtracted).  The
> > optimizations expect such partially invalided profiles and must behave
> > sanely in presence of them.
> 
> 
> 
> Clearly, as long as the CFG exists, thats where the information ought to
> be stored. The only real question I want to deal with is should the CFG
> be kept after SSA right through to RTL, or when the CFG is destroyed,
> should the information be attached to the trees somehow and then used
> during expansion to annotate the new CFG rtl creates, and/or used to
> annotate the CFG for trees when the function is inlined. I think thats
> fundamentally where we have decisions to make, so I would like to work
> through the various differences. Im also about to go on vacation, so the

Yes, I got your plan.  I was considering this too, but it seemed like
serious effort as we need to develop alternate way to store profile
information in both tree and RTL form and to correctly update the
profile after expansion one needs to know almost everything present in
the CFG (original basic blocks, edges and their frequencies).

This is why I felt somewhat pleased when I run first into idea of simply
preserving the CFG.  Orignaly I didn't think about it and I din't
expected it to be easy.

> more I have to think about the better :-)
> 
> If we annotate the trees with profiling info when we are done with them,
> reasonably correct branch information can be propagated into the inlined
> code. If the CFG is the only place the information is stored, we'd also
> have to keep the CFG for a() around in order to put its information into
> into b() when its inlined. So we wouldn't just be keeping the trees for
> inlineable functions around, we'd have to keep objectified CFG's as
> well. Dont read that as me saying Im religously opposed to keeping CFGs
> right through to RTL. Its merely an observation.

I still tend to think about interface to inliner and interface to RTL
expansion as two different things (but would preffer to see common
answer to these :).  RTL expansion is definitly more complicated
operation when one comes to control flow changes and CFG has very great
utility here.

Basic inlining is easier.  If one thinks about what is needed, one needs
to be able to figure out count/frequency of call expression one is
inlining into and scale counts/frequencies in inlined function
accordingly.  For this one needs GIMPLE representation of basic block
and edge counts/frequencies/probabilities, but in other words, this is
CFG again.

I also believe that CFG will serve more utility once we get into more
complicated partial inlining.  For instance loop unrolling looks like
optimization that don't need CFG as well (you simply copy loop body
around) but you start need it later when you want to do more smart
tricks.  I think inlining is just the same.

Main issue here is the memory footprint.  Definitly CFG needs to be
built before inlining for early optimizations so we save some CPU cycles
but saving CFG for all inline functions can be expensive.
I did some testing once Steven put CFG into GGC and the footprint is
about 3-5% of overall GGC allocation.  This is not too disasterous
especially accounting the fact that profile is much more bloated than it
needs to be (I will work on this shortly) and that it is built 4 times
during compilation of single function.
> 
> 
> > The profile becomes less exact during optimization process, but overall
> > most of compilers (ORC, IMPACT for instance) are able to keep it by such
> > simplistics methods in good shape to bring benefits on BB reordering.
> > Most questions about the profile are easy, like "is this loop executed
> > insanily many times" or "is this basic block hot?"
> > these questions are relatively stable WRT slight degenerations.
> > 
> 
> Sure, but if we don't bring an inlinable function's profile into the new
> function when we inline it, we are throwing away reasonably correct
> information right off the bat. If you have all the probilities for a()'s
> branches, you ought to bring those into the code you create when you
> inline a() into b(). I'd like to make sure we do that.
> 
> Yes, The degenerations dont affect loop stuff so much as it does the
> lower level bits like register allocation and branch reversal/prediction
> at the rtl level, which will be the farthest away consumers of this
> information, the side effects of which we cant predict, just measure :-)
> 
> So we presumably have to read the information back in during the second
> compilation at the same point the information was written out during the
> first one, or we dont get a good mapping of block execution counts
> right? And thats somewhere in the SSA optimizer.

Yes.  While in theory you can read profile at different stages and
attempt to update it, in fact if you want to do that you need to do
exactly the same bookeeping you do to maintain the profile, just in
symbolic way so you know how to map read info to the CFG, so it does not
make things any better.
> 
> > > 
> > > 4- Does anyone other than me find the idea of inlining optimized trees
> > > appealing? I understand thats not on the table right now because there
> > > are difficulties with EH, at the very least. I would think we could
> > > encapsulate that info somehow, but clearly thats a problem that would
> > > need to be solved.
> > 
> > Sure it is appealing.  In order to do something sane, you need to do
> > some analysis that are not doable without CFG/early cleanups.
> > As pointed out already, the benefits of inlining estimate by time of
> > execution of the callee, while the costs is the size.  At the moment we
> > do have only the second information and our strategy is "do as many
> > inlinining as size constraint allows and lets hope that very fast
> > functions will get inlined too then"
> > 
> > For instance my current recursive inlining code would do much better job
> > if we are able to discover tail calls.  Another examples include partial
> > inlining if functions like
> > if (test)
> >    common fast path
> > else
> >    something large
> > 
> > can be inlined as
> > 
> > if (test)
> >   fast path
> > else
> >   do_something_large ()
> > 
> 
> OK, so we're on the same page there, someday we want to inline optimized
> functions :-)
> 
> 
> So where do you envision the inliner then? It would have to be done
> after the profiling information is read is in order to make use of it.
> Presumably immediately after.  You also plan to create a CFG aware
> inliner and objectify the CFG, and do the inlining from that? And then
> the inliner would work on SSA? right after DCE or somesuch place. SO the
> inliner will be an SSA inliner, or would we go out of SSA and back into
> SSA at that point?

While I don't have exact answer for all question about perfrect
construction of ideal compiler, the idea I do have in mind is:

1) parsing pass
   - parsing
   - production of function bodies
2) finalizing compilation unit pass
   - do reachability analysis and throw away unneeded functions, turn
     needed functions into generic and gimple in the progress.
   - read feedback
   - simple early optimization
   - callgraph production
   (all these steps needs to be done together)
   - collection of local properties
2b) possible saving info into file
3) global pass  (don't need function bodies, just local info collected
   in 2 so we fit in memory)
   - callgraph optimization
   - global dataflow
   - global optimization decisions  (inlining, cloning and such)
4) assembling pass
   - get function body
   - perform transformation worked out by global optimizers
     (expand_calls_inline and such)
   - SSA, tree optimization

This scheme is more or less implemented by cgraph.  I am missing
completely 2b of course :) and also early optimization pass.

In order to minimize code duplication it would be nice to share
framework in simple early optimization (that should subsume someting
like dom1+dce1+cfgcleanup) and it would be nice to avoid converting into
and out of SSA form.  I think it would be cool to go into SSA form right
in step 2 and stay in SSA until RTL expansion.  That would declare SSA
as the IL majority of passes are interested in.  It is more or less done
by Open64 but it also has some drawbacks (optimizations doing a lot of
CFG manipulation do have problems with updating SSA, jump threading is
first example of this, I hope to soon make tracer to work on trees, but
still these optimizations often can take advnatage of SSA so going out
and in SSA can happen in the subpass)

There are memory consumption implications of this, so I am not sure
about this plan.  It is probably impossible to save virtual operands for
all functions so perhaps we can do SSA only on registers, not on memory
locations at that form or do the fallback plan of either going out of
SSA at the end of 2) or doing just non-SSA optimizations in 2), but I
don't like these very much.

Honza
> 
> Andrew
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC] CFG transparent RTL expansion
  2003-12-19 18:15     ` Andrew MacLeod
@ 2003-12-19 19:33       ` Jan Hubicka
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Hubicka @ 2003-12-19 19:33 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Jeff Law, Jan Hubicka, gcc mailing list, gcc-patches,
	Richard Henderson, Diego Novillo

> On Fri, 2003-12-19 at 11:23, law@redhat.com wrote:
> > In message <1071801428.21456.162.camel@p4>, Andrew MacLeod writes:
> >  >This implements something which isnt actually used yet (ie, no
> >  >collection emitted/done). If I read it correctly, just a proof of
> >  >concept of a persistant CFG. So lets skip to how it would actually be
> >  >used. And excuse my lackof knowledge about how gcc collects profile info
> >  >:-)
> > Another potential use would be to detect when the tree->rtl lowering
> > process creates new blocks.  This _may_ be interesting in that it may
> > be an indication that rather than a block-local CSE we may want to run
> > something a little more extensive.  
> > 
> > If it's not obvious, I'm thinking about how we start trimming down the
> > first stage RTL optimizers -- knowing if new blocks were created may be
> > useful in determining which early RTL optimizers to run or what mode to
> > run them in (block-local or extended-basic-blocks).
> 
> Yeah, Since the driving force behind keeping the CFG live through rtl is
> to keep profiling information, we should try to list other things that
> would either benefit from this, or be impacted so that we have multiple
> reasons.

Well, profiling information has been always kind of excuse for rewriting
passes and cleaning up interfaces for me :)

I believe that there are multiple reasons for doing the changes.  Off my
head:
1) it allows to preserve control flow information of any kind easilly
(not only the control flow, for instance we don't represent whether
instruction may not trap because of nontrivial reasons in other way than
having or not having EH edge)
2) it allows optimizations to take place during expansion to limit
produced noise
3) more sanity checking is possible
(one can check that trapping instructions still trap, conditionals
didn't simplified into unconditionals, CFG representations are in sync
and more).
4) it is good start for overall cleanups of the expansion interface.  I
wanted to use profile information for a while in instruction splitting
and similar places but it has been bit dificult as these things were
generally designed around expand_expr interface that served an interface
in between backend and many frontends.  Now it is internal interface and
thus it is much more flexible.
5) it is stupid to rebuild CFG just after we thrown it away in last SSA
pass.  I tested that the patch in it's not very cool form saves 3
seconds out of 3 minute compilation of GCC componenets and of course
this is just very basic start - we throw away the profile very early
then.

Honza
> 
> Andrew
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC]  CFG transparent RTL expansion
  2003-12-19 18:08     ` Andrew MacLeod
  2003-12-19 19:11       ` Jan Hubicka
@ 2003-12-19 19:51       ` Steven Bosscher
  2003-12-31 22:57       ` Jan Hubicka
  2 siblings, 0 replies; 20+ messages in thread
From: Steven Bosscher @ 2003-12-19 19:51 UTC (permalink / raw)
  To: Andrew MacLeod, Jan Hubicka
  Cc: gcc mailing list, gcc-patches, Richard Henderson, Diego Novillo

On Friday 19 December 2003 18:00, Andrew MacLeod wrote:
> Clearly, as long as the CFG exists, thats where the information ought to
> be stored.

Of course.

> The only real question I want to deal with is should the CFG
> be kept after SSA right through to RTL, or when the CFG is destroyed,
> should the information be attached to the trees somehow and then used
> during expansion to annotate the new CFG rtl creates, and/or used to
> annotate the CFG for trees when the function is inlined. 

When we annotate trees, the profile information is not "update-able" as in 
just the CFG.  I'd expect that profile-annotated trees also are more fragile: 
Propagating counts/probabilities during expanding or inlining would have to 
be done in unintuitive ways if you don't simply have the CFG available.  To 
propagate such information without the CFG available doesn't sound very sane 
to me.

Also, you would have to first have to annotate trees, then RTL, and then 
finally the RTL CFG.  The RTL CFG is not constructed during tree->RTL 
expansion, so you'd need alternative storage for profile information in both 
trees and RTL.  Where do you think the profile information could be stored in 
the phases between expanding and RTL CFG construction?


> If we annotate the trees with profiling info when we are done with them,
> reasonably correct branch information can be propagated into the inlined
> code. If the CFG is the only place the information is stored, we'd also
> have to keep the CFG for a() around in order to put its information into
> into b() when its inlined.

Yes, we would need to remember the CFG for each useful function.  That's not 
very hard to do, and not very expensive in terms of memory usage, either.  It 
will be even less expensive after we've made basic_block_def a bit smaller by 
moving more things into (RTL-)bb annotations.

(It could be even less expensive if we'd lose GOTO_EXPRs hanging from 
COND_EXPRs, etc...  ;-)


> So we wouldn't just be keeping the trees for
> inlineable functions around, we'd have to keep objectified CFG's as
> well.

The CFG is not very hard to objectify.  You really only need to make the the 
entry and exit blocks reachable from, say, struct function.  We can do that 
now that the CFG is garbage collectable.  In fact, we could completely lose 
the function-as-tree representation as we have it now, and instead represent 
every function with just the CFG.


> So where do you envision the inliner then? It would have to be done
> after the profiling information is read is in order to make use of it.
> Presumably immediately after.

Yes, probably.


> You also plan to create a CFG aware
> inliner and objectify the CFG, and do the inlining from that?

Yes.


> And then
> the inliner would work on SSA? right after DCE or somesuch place. SO the
> inliner will be an SSA inliner, or would we go out of SSA and back into
> SSA at that point?

Inlining trees while in SSA form doesn't sound feasible at this point.  I 
think LLVM does this, but our representation is probably too heavy-weight, 
which makes having whole functions hanging around as trees in SSA form not a 
very attractive idea.  We could of course inline before going into SSA form. 
We have the CFG at that point, and we'd still be inlining optimized 
functions.

Gr.
Steven

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC]  CFG transparent RTL expansion
  2003-12-19 18:08     ` Andrew MacLeod
  2003-12-19 19:11       ` Jan Hubicka
  2003-12-19 19:51       ` Steven Bosscher
@ 2003-12-31 22:57       ` Jan Hubicka
  2004-01-20 18:32         ` Jan Hubicka
  2 siblings, 1 reply; 20+ messages in thread
From: Jan Hubicka @ 2003-12-31 22:57 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Jan Hubicka, gcc mailing list, gcc-patches, Richard Henderson,
	Diego Novillo

> Clearly, as long as the CFG exists, thats where the information ought to
> be stored. The only real question I want to deal with is should the CFG
> be kept after SSA right through to RTL, or when the CFG is destroyed,
> should the information be attached to the trees somehow and then used
> during expansion to annotate the new CFG rtl creates, and/or used to
> annotate the CFG for trees when the function is inlined. I think thats
> fundamentally where we have decisions to make, so I would like to work
> through the various differences. Im also about to go on vacation, so the
> more I have to think about the better :-)

I guess the vacation is about to over :)  What are your conclusions?

Honza

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC]  CFG transparent RTL expansion
  2003-12-31 22:57       ` Jan Hubicka
@ 2004-01-20 18:32         ` Jan Hubicka
  2004-01-21 15:58           ` Andrew MacLeod
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Hubicka @ 2004-01-20 18:32 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Andrew MacLeod, Jan Hubicka, gcc mailing list, gcc-patches,
	Richard Henderson, Diego Novillo, stuart, dalej

> > Clearly, as long as the CFG exists, thats where the information ought to
> > be stored. The only real question I want to deal with is should the CFG
> > be kept after SSA right through to RTL, or when the CFG is destroyed,
> > should the information be attached to the trees somehow and then used
> > during expansion to annotate the new CFG rtl creates, and/or used to
> > annotate the CFG for trees when the function is inlined. I think thats
> > fundamentally where we have decisions to make, so I would like to work
> > through the various differences. Im also about to go on vacation, so the
> > more I have to think about the better :-)
> 
> I guess the vacation is about to over :)  What are your conclusions?

Jeff and Andrew,
I would really appreciate some progress on this front.  This is blocker
not only for my plans but also for Stuart and Dale who is interested to
port some of Apple's profiling based inlining code into current
cgraphunit inlining implementation and help making tree-SSA profile
ready.

So in the case the CFG preserving idea still looks inferrior to you,
please explain alternate approach.

Thank you,
Honza
> 
> Honza

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC]  CFG transparent RTL expansion
  2004-01-20 18:32         ` Jan Hubicka
@ 2004-01-21 15:58           ` Andrew MacLeod
  2004-01-21 16:12             ` Diego Novillo
  2004-01-21 18:05             ` Jan Hubicka
  0 siblings, 2 replies; 20+ messages in thread
From: Andrew MacLeod @ 2004-01-21 15:58 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Jan Hubicka, gcc mailing list, gcc-patches, Richard Henderson,
	Diego Novillo, stuart, dalej

On Tue, 2004-01-20 at 13:32, Jan Hubicka wrote:
> > > Clearly, as long as the CFG exists, thats where the information ought to
> > > be stored. The only real question I want to deal with is should the CFG
> > > be kept after SSA right through to RTL, or when the CFG is destroyed,
> > > should the information be attached to the trees somehow and then used
> > > during expansion to annotate the new CFG rtl creates, and/or used to
> > > annotate the CFG for trees when the function is inlined. I think thats
> > > fundamentally where we have decisions to make, so I would like to work
> > > through the various differences. Im also about to go on vacation, so the
> > > more I have to think about the better :-)
> > 
> > I guess the vacation is about to over :)  What are your conclusions?
> 
> Jeff and Andrew,
> I would really appreciate some progress on this front.  This is blocker
> not only for my plans but also for Stuart and Dale who is interested to
> port some of Apple's profiling based inlining code into current
> cgraphunit inlining implementation and help making tree-SSA profile
> ready.
> 
> So in the case the CFG preserving idea still looks inferrior to you,
> please explain alternate approach.
> 

OK, so I am going to offer no alternative, just comments :-)

- A persistent CFG which crosses an IL translator concernes me a little,
but not enough to stonewall it. Its been pointed out that having the CFG
present during rtl expansion might discover a few things we weren't
aware of.

- A persistent CFG will require that anything between SSA and RTL
expansion (such as mudflap) be CFG aware. Other than the work involved
in making the conversion, this is also not necessarily a bad thing, and
may provide some benefits. Presumably this also means we have to make
all the bsi_* routines work on non-ssa trees as well. Otherwise mudflap
et al will have a hard time manipulating things. At the very least, we
dont want to create stmt annotations via the modify_stmt() call common
in the bsi_ routines.

- If we don't destroy the CFG, does this mean we are not going to put
the explicit GOTOs back in for fallthru edges which are not immediately
followed by their target label?  Again, a gnawing uncertainty about
that, but I have nothing concrete to contribute :-). Presumably that
means more work converting any passes between SSA and RTL. (or at least
is part of the conversion of things like mudflap and the expanders to
being CFG aware).

This also seem pretty invasive. If we decide in the next week or so that
an attempt to merge into mainline in the next couple of months is a
possibility, I would think we want to delay integrating this until such
time as we've straightened problems in the branch out, and then do this
work on mainline. I think we'll be pretty busy trying to satisfy
requirements to merge, and new hunks of large code like this might be
counter productive. If we dont merge with mainline, well then it would
be fair game in the branch.  Again, just my opinion :-) If we do end up
trying to merge with mainline, it would be in everyones best interest to
have as many contributing parties as possible helping move in that
direction. Hopefully within a week well have a resolution on whether we
are even going to make an attempt at 3.5 or not.

Since I am not offering you a viable alternative, you can take that as a
removal of my opposition to your plan :-). It may very well turn out to
have other positive benefits as well. You've clearly thought about it a
lot more than I have. 

Andrew



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC]  CFG transparent RTL expansion
  2004-01-21 15:58           ` Andrew MacLeod
@ 2004-01-21 16:12             ` Diego Novillo
  2004-01-21 18:05             ` Jan Hubicka
  1 sibling, 0 replies; 20+ messages in thread
From: Diego Novillo @ 2004-01-21 16:12 UTC (permalink / raw)
  To: Andrew Macleod
  Cc: Jan Hubicka, Jan Hubicka, gcc mailing list, gcc-patches,
	Richard Henderson, stuart, dalej

On Wed, 2004-01-21 at 10:57, Andrew MacLeod wrote:

> Since I am not offering you a viable alternative, you can take that as a
> removal of my opposition to your plan :-). It may very well turn out to
> have other positive benefits as well. You've clearly thought about it a
> lot more than I have. 
> 
I also have no major issues with this approach, but we should watch out
for breakage and, particularly, performance regressions.  I would very
interested in any benefits we could derive by doing RTL expansion using
a domwalk, for instance.


Diego.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC]  CFG transparent RTL expansion
  2004-01-21 15:58           ` Andrew MacLeod
  2004-01-21 16:12             ` Diego Novillo
@ 2004-01-21 18:05             ` Jan Hubicka
  2004-01-22 15:07               ` Andrew MacLeod
  1 sibling, 1 reply; 20+ messages in thread
From: Jan Hubicka @ 2004-01-21 18:05 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Jan Hubicka, Jan Hubicka, gcc mailing list, gcc-patches,
	Richard Henderson, Diego Novillo, stuart, dalej

> OK, so I am going to offer no alternative, just comments :-)
> 
> - A persistent CFG which crosses an IL translator concernes me a little,
> but not enough to stonewall it. Its been pointed out that having the CFG
> present during rtl expansion might discover a few things we weren't
> aware of.
> 
> - A persistent CFG will require that anything between SSA and RTL
> expansion (such as mudflap) be CFG aware. Other than the work involved
> in making the conversion, this is also not necessarily a bad thing, and
> may provide some benefits. Presumably this also means we have to make
> all the bsi_* routines work on non-ssa trees as well. Otherwise mudflap
> et al will have a hard time manipulating things. At the very least, we
> dont want to create stmt annotations via the modify_stmt() call common
> in the bsi_ routines.

I actually had no problem using the bsi iterators after the last
Richard's reorganization of how they are linked.
We no longer need statement anotations so iterators just work.  Perhaps
I've missed something, but mudflap also seemed to just work (as well as
TER), but I will look into this in more detail.
> 
> - If we don't destroy the CFG, does this mean we are not going to put
> the explicit GOTOs back in for fallthru edges which are not immediately
> followed by their target label?  Again, a gnawing uncertainty about

This is technical detail.  It seems to me useless to produce it just
before expansion, but basically we can go any way.
I think with first version I will shoot for keeping the goto like they
are now and we can consider removing/not removing it later.
> 
> This also seem pretty invasive. If we decide in the next week or so that
> an attempt to merge into mainline in the next couple of months is a
> possibility, I would think we want to delay integrating this until such
> time as we've straightened problems in the branch out, and then do this
> work on mainline. I think we'll be pretty busy trying to satisfy
> requirements to merge, and new hunks of large code like this might be
> counter productive. If we dont merge with mainline, well then it would
> be fair game in the branch.  Again, just my opinion :-) If we do end up


Yeah, the timing got more complicated, but the patch itself is very
small (it is design change that makes it dificult, not the amount of
code) but it hits some side issues I think I can just start solving the
and pushing the patches to tree-SSA branch as long as it will be
comfortable to do so.

If it will start affecting the merge/code quality I can switch to branch
and target for merging shortly after tree-SSA.
> 
> Since I am not offering you a viable alternative, you can take that as a
> removal of my opposition to your plan :-). It may very well turn out to
> have other positive benefits as well. You've clearly thought about it a
> lot more than I have. 

Thanks.  I will start pushing patches once I get out of bugfixing mode.

Honza
> 
> Andrew
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC]  CFG transparent RTL expansion
  2004-01-21 18:05             ` Jan Hubicka
@ 2004-01-22 15:07               ` Andrew MacLeod
  2004-01-22 15:54                 ` Jan Hubicka
  2004-01-22 19:33                 ` Zdenek Dvorak
  0 siblings, 2 replies; 20+ messages in thread
From: Andrew MacLeod @ 2004-01-22 15:07 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Jan Hubicka, gcc mailing list, gcc-patches, Richard Henderson,
	Diego Novillo, stuart, dalej

On Wed, 2004-01-21 at 12:58, Jan Hubicka wrote:
> > OK, so I am going to offer no alternative, just comments :-)
> > 
> > - A persistent CFG which crosses an IL translator concernes me a little,
> > but not enough to stonewall it. Its been pointed out that having the CFG
> > present during rtl expansion might discover a few things we weren't
> > aware of.
> > 
> > - A persistent CFG will require that anything between SSA and RTL
> > expansion (such as mudflap) be CFG aware. Other than the work involved
> > in making the conversion, this is also not necessarily a bad thing, and
> > may provide some benefits. Presumably this also means we have to make
> > all the bsi_* routines work on non-ssa trees as well. Otherwise mudflap
> > et al will have a hard time manipulating things. At the very least, we
> > dont want to create stmt annotations via the modify_stmt() call common
> > in the bsi_ routines.
> 
> I actually had no problem using the bsi iterators after the last
> Richard's reorganization of how they are linked.
> We no longer need statement anotations so iterators just work.  Perhaps
> I've missed something, but mudflap also seemed to just work (as well as
> TER), but I will look into this in more detail.

I beleive they should work fine, but you are inadvertanly creating a
useless stmt annotation every time modify_stmt() is called within
bsi_replace, or a routine like that. Presumably mudflap or anything
between SSA and RTL will manipulate stmts using those routines which
currently call modify_stmt()... At least I think :-)
> > 
> > - If we don't destroy the CFG, does this mean we are not going to put
> > the explicit GOTOs back in for fallthru edges which are not immediately
> > followed by their target label?  Again, a gnawing uncertainty about
> 
> This is technical detail.  It seems to me useless to produce it just
> before expansion, but basically we can go any way.
> I think with first version I will shoot for keeping the goto like they
> are now and we can consider removing/not removing it later.

Yeah, I *expect* that it will become unneccesary to put the GOTOs back
in, but you are probably right to leave it as is for now.

Andrew

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC]  CFG transparent RTL expansion
  2004-01-22 15:07               ` Andrew MacLeod
@ 2004-01-22 15:54                 ` Jan Hubicka
  2004-01-22 19:33                 ` Zdenek Dvorak
  1 sibling, 0 replies; 20+ messages in thread
From: Jan Hubicka @ 2004-01-22 15:54 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Jan Hubicka, Jan Hubicka, gcc mailing list, gcc-patches,
	Richard Henderson, Diego Novillo, stuart, dalej

> I beleive they should work fine, but you are inadvertanly creating a
> useless stmt annotation every time modify_stmt() is called within
> bsi_replace, or a routine like that. Presumably mudflap or anything
> between SSA and RTL will manipulate stmts using those routines which
> currently call modify_stmt()... At least I think :-)

I guess I will work this out.

> > > 
> > > - If we don't destroy the CFG, does this mean we are not going to put
> > > the explicit GOTOs back in for fallthru edges which are not immediately
> > > followed by their target label?  Again, a gnawing uncertainty about
> > 
> > This is technical detail.  It seems to me useless to produce it just
> > before expansion, but basically we can go any way.
> > I think with first version I will shoot for keeping the goto like they
> > are now and we can consider removing/not removing it later.
> 
> Yeah, I *expect* that it will become unneccesary to put the GOTOs back
> in, but you are probably right to leave it as is for now.

Yes, I will try to do small steps and describe why I am doing it.  It
would be helpful if you can check the patches and tell me what do you
think about them.

Honza
> 
> Andrew

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tree-ssa, RFC]  CFG transparent RTL expansion
  2004-01-22 15:07               ` Andrew MacLeod
  2004-01-22 15:54                 ` Jan Hubicka
@ 2004-01-22 19:33                 ` Zdenek Dvorak
  1 sibling, 0 replies; 20+ messages in thread
From: Zdenek Dvorak @ 2004-01-22 19:33 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Jan Hubicka, Jan Hubicka, gcc mailing list, gcc-patches,
	Richard Henderson, Diego Novillo, stuart, dalej

Hello,

> On Wed, 2004-01-21 at 12:58, Jan Hubicka wrote:
> > > OK, so I am going to offer no alternative, just comments :-)
> > > 
> > > - A persistent CFG which crosses an IL translator concernes me a little,
> > > but not enough to stonewall it. Its been pointed out that having the CFG
> > > present during rtl expansion might discover a few things we weren't
> > > aware of.
> > > 
> > > - A persistent CFG will require that anything between SSA and RTL
> > > expansion (such as mudflap) be CFG aware. Other than the work involved
> > > in making the conversion, this is also not necessarily a bad thing, and
> > > may provide some benefits. Presumably this also means we have to make
> > > all the bsi_* routines work on non-ssa trees as well. Otherwise mudflap
> > > et al will have a hard time manipulating things. At the very least, we
> > > dont want to create stmt annotations via the modify_stmt() call common
> > > in the bsi_ routines.
> > 
> > I actually had no problem using the bsi iterators after the last
> > Richard's reorganization of how they are linked.
> > We no longer need statement anotations so iterators just work.  Perhaps
> > I've missed something, but mudflap also seemed to just work (as well as
> > TER), but I will look into this in more detail.
> 
> I beleive they should work fine, but you are inadvertanly creating a
> useless stmt annotation every time modify_stmt() is called within
> bsi_replace, or a routine like that. Presumably mudflap or anything
> between SSA and RTL will manipulate stmts using those routines which
> currently call modify_stmt()... At least I think :-)

iirc this should not be neccessary -- we consider statements without
annotations modified anyway, so modify_stmt can be just modified to
leave the statement without annotations.

Zdenek

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2004-01-22 19:32 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-12-19  5:37 [tree-ssa, RFC] CFG transparent RTL expansion Jan Hubicka
2003-12-19  6:50 ` Andrew MacLeod
2003-12-19 13:14   ` Jan Hubicka
2003-12-19 14:56     ` Jan Hubicka
2003-12-19 18:08     ` Andrew MacLeod
2003-12-19 19:11       ` Jan Hubicka
2003-12-19 19:51       ` Steven Bosscher
2003-12-31 22:57       ` Jan Hubicka
2004-01-20 18:32         ` Jan Hubicka
2004-01-21 15:58           ` Andrew MacLeod
2004-01-21 16:12             ` Diego Novillo
2004-01-21 18:05             ` Jan Hubicka
2004-01-22 15:07               ` Andrew MacLeod
2004-01-22 15:54                 ` Jan Hubicka
2004-01-22 19:33                 ` Zdenek Dvorak
2003-12-19 15:01   ` Jan Hubicka
2003-12-19 17:20   ` law
2003-12-19 17:44     ` Jan Hubicka
2003-12-19 18:15     ` Andrew MacLeod
2003-12-19 19:33       ` Jan Hubicka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).