RFC: BB duplication code

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* RFC: BB duplication code
@ 2001-08-22 11:29 Jan Hubicka
  2001-08-22 12:14 ` Richard Henderson
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Hubicka @ 2001-08-22 11:29 UTC (permalink / raw)
  To: gcc-patches, rth, patches, gcc

Hi,
this patch contains BB duplication code I use in my bb-reorder code and tracer.
It is tied into bb-reorder infrastructure and allows you to simply call
reorder_duplicate_bb for basic block and place it anywhere in the reordered
chain.

While I am able to build bootstrapping and regtesting compiler with this
change (+tracer) I would like to discuss some issues first:

1) In case of copying tablejumps, the vector remains shared.  This is not
   good idea except very latest passes of compilation, but making tablejump
   unshared is dificult task, as code computing destination address using
   the table can be hoisted outside.
   What to do?  Prohibit duplication of tablejumps?
2) Pairable notes are splitted off - I should probably handle the
   paired notes inside single basic block by duplicating, but what to do
   with notes starting or ending in the BB?
   What about EH region notes?  Should I prohibit duplication if EH region
   starts or ends in the BB?
3) Can I update debug information using exiting bb-reorder infrastructure?
   It seems to work already somehow.
4) What about epilogues - I should probably copy the frame_related flag,
   but will the debug output happy to see multiple epilogues?
   It should probably, as it is able to see epilogue in the middle of code.
   What about EPILOGUE_BEGIN note? and FUNCTION_END?
5) Something else I've missed?

Well, thats all for now.  Hope that these issues will settle down somehow
so I will be able to contribute the code soon.

Honza

Wed Aug 22 20:27:43 CEST 2001  Jan Hubicka  <jh@suse.cz>
	* bb-reorder.c (reorder_duplicate_bb): New function.

*** bb-reorder.c1	Tue Aug 21 17:47:08 2001
--- bb-reorder.c	Wed Aug 22 20:22:35 2001
*************** static rtx get_next_bb_note		PARAMS ((rt
*** 196,201 ****
--- 196,202 ----
  static rtx get_prev_bb_note		PARAMS ((rtx));
  
  void verify_insn_chain			PARAMS ((void));
+ static basic_block reorder_duplicate_bb PARAMS ((basic_block, edge));
  \f
  /* Skip over inter-block insns occurring after BB which are typically
     associated with BB (e.g., barriers). If there are any such insns,
*************** record_effective_endpoints ()
*** 309,314 ****
--- 310,458 ----
      }
  }
  
+ /* Duplicate block BB and redirect edge E into it.  */
+ static basic_block
+ reorder_duplicate_bb (bb, e)
+      basic_block bb;
+      edge e;
+ {
+   rtx last = get_last_insn ();
+   rtx insn, new = NULL_RTX;
+   edge s, n;
+   basic_block new_bb;
+ 
+   if (!bb->pred || !bb->pred->pred_next)
+     abort ();
+ 
+   /* Create copy at the end of INSN chain.  The chain will
+      be reordered later.  */
+ 
+   new_bb = (basic_block) obstack_alloc (&flow_obstack, sizeof (*new_bb));
+   memset (new_bb, 0, sizeof (*new_bb));
+   if (new_bb->global_live_at_start)
+     {
+       new_bb->global_live_at_start = OBSTACK_ALLOC_REG_SET (&flow_obstack);
+       new_bb->global_live_at_end = OBSTACK_ALLOC_REG_SET (&flow_obstack);
+       COPY_REG_SET (new_bb->global_live_at_start, bb->global_live_at_start);
+       COPY_REG_SET (new_bb->global_live_at_end, bb->global_live_at_end);
+     }
+ 
+   /* Place the new block after the last block.  */
+   VARRAY_GROW (basic_block_info, ++n_basic_blocks);
+   new_bb = (basic_block) obstack_alloc (&flow_obstack, sizeof (*new_bb));
+   memset (new_bb, 0, sizeof (*new_bb));
+   BASIC_BLOCK (n_basic_blocks - 1) = new_bb;
+ 
+   new_bb = BASIC_BLOCK (n_basic_blocks - 1);
+   new_bb->loop_depth = bb->loop_depth;
+   for (s = bb->succ; s; s = s->succ_next)
+     {
+       n = (edge) xcalloc (1, sizeof (*n));
+       n->src = new_bb;
+       n->dest = s->dest;
+       n->flags = s->flags;
+       n->probability = s->probability;
+       if (bb->count)
+         n->count = s->count * e->count / bb->count;
+       else
+ 	n->count = 0;
+       n->succ_next = new_bb->succ;
+       new_bb->succ = n;
+       n->pred_next = n->dest->pred;
+       n->dest->pred = n;
+     }
+ 
+   new_bb->count = e->count;
+   new_bb->frequency = EDGE_FREQUENCY (e);
+   bb->count -= e->count;
+   bb->frequency -= EDGE_FREQUENCY (e);
+ 
+ 
+   for (insn = RBI (bb)->eff_head; insn != NEXT_INSN (RBI (bb)->eff_end);
+        insn = NEXT_INSN (insn))
+     {
+       switch (GET_CODE (insn))
+ 	{
+ 	  case INSN:
+ 	    new = emit_insn (copy_insn (PATTERN (insn)));
+ 	    if (REG_NOTES (insn))
+ 	      REG_NOTES (new) = copy_insn (REG_NOTES (insn));
+ 	    break;
+ 	  case JUMP_INSN:
+ 	    /* Share jumptables for both copies of tablejump.  */
+ 	    if (GET_CODE (PATTERN (insn)) == ADDR_VEC
+ 		|| GET_CODE (PATTERN (insn)) == ADDR_DIFF_VEC)
+ 	      continue;
+ 	    new = emit_jump_insn (copy_insn (PATTERN (insn)));
+ 	    if (REG_NOTES (insn))
+ 	      REG_NOTES (new) = copy_insn (REG_NOTES (insn));
+ 	    mark_jump_label (PATTERN (new), new, 0);
+ 	    if (JUMP_LABEL (new))
+ 	      LABEL_NUSES (JUMP_LABEL (new))++;
+ 	    break;
+ 	  case CALL_INSN:
+ 	    new = emit_call_insn (copy_insn (PATTERN (insn)));
+ 	    if (REG_NOTES (insn))
+ 	      REG_NOTES (new) = copy_insn (REG_NOTES (insn));
+ 	    if (CALL_INSN_FUNCTION_USAGE (insn))
+ 	      CALL_INSN_FUNCTION_USAGE (new) = copy_insn (CALL_INSN_FUNCTION_USAGE (insn));
+ 	  case CODE_LABEL:
+ 	    /*new = emit_label (gen_label_rtx ());*/
+ 	    break;
+ 	  case BARRIER:
+ 	    new = emit_barrier ();
+ 	    break;
+ 	  case NOTE:
+ 	    switch (NOTE_LINE_NUMBER (insn))
+ 	      {
+ 		case NOTE_INSN_DELETED:
+ 		case NOTE_INSN_DELETED_LABEL:
+ 		case NOTE_INSN_LOOP_VTOP:
+ 		case NOTE_INSN_LOOP_CONT:
+ 		  break;
+ 		case NOTE_INSN_BASIC_BLOCK:
+ 		  new = emit_note (NULL, NOTE_INSN_BASIC_BLOCK);
+ 		  NOTE_BASIC_BLOCK (new) = new_bb;
+ 		  new_bb->head = new;
+ 		  break;
+ 		case NOTE_INSN_LOOP_BEG:
+ 		case NOTE_INSN_LOOP_END:
+ 		case NOTE_INSN_RANGE_BEG:
+ 		case NOTE_INSN_RANGE_END:
+ 		case NOTE_INSN_EPILOGUE_BEG:
+ 		case NOTE_INSN_PROLOGUE_END:
+ 		case NOTE_INSN_BLOCK_BEG:
+ 		case NOTE_INSN_BLOCK_END:
+ 		case NOTE_INSN_FUNCTION_END:
+ 		case NOTE_INSN_FUNCTION_BEG:
+ 		case NOTE_INSN_EH_REGION_BEG:
+ 		case NOTE_INSN_EH_REGION_END:
+ 		  /* Strip down pairable notes and keep them only
+ 		     in the "master copy".  */
+ 		  break;
+ 		default:
+ 		  new = emit_note (NOTE_SOURCE_FILE (insn), NOTE_LINE_NUMBER (insn));
+ 	      }
+ 	}
+       if (bb->end == insn)
+ 	new_bb->end = new;
+       if (new && INSN_P (new) && basic_block_for_insn)
+ 	set_block_for_insn (new, new_bb);
+       /*debug_rtx (new);*/
+     }
+   new_bb->index = n_basic_blocks + 1;
+   if (e->flags & EDGE_FALLTHRU)
+     redirect_edge_succ (e, new_bb);
+   else
+     redirect_edge_and_branch (e, new_bb);
+   new_bb->index = n_basic_blocks - 1;
+   new_bb->aux = xmalloc (sizeof (struct reorder_block_def));
+   RBI (new_bb)->eff_head = NEXT_INSN (last);
+   RBI (new_bb)->eff_end = get_last_insn ();
+   RBI (new_bb)->scope = RBI (bb)->scope;
+   return new_bb;
+ }
+ 
  
  /* Compute an ordering for a subgraph beginning with block BB.  Record the
     ordering in RBI()->index and chained through RBI()->next.  */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: RFC: BB duplication code
  2001-08-22 11:29 RFC: BB duplication code Jan Hubicka
@ 2001-08-22 12:14 ` Richard Henderson
  2001-08-22 12:27   ` Jan Hubicka
  2001-08-22 16:04   ` RFC: BB duplication code Joern Rennecke
  0 siblings, 2 replies; 25+ messages in thread
From: Richard Henderson @ 2001-08-22 12:14 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, patches, gcc

On Wed, Aug 22, 2001 at 08:29:00PM +0200, Jan Hubicka wrote:
> 1) In case of copying tablejumps, the vector remains shared.  This is not
>    good idea except very latest passes of compilation, but making tablejump
>    unshared is dificult task, as code computing destination address using
>    the table can be hoisted outside.
>    What to do?  Prohibit duplication of tablejumps?

In the case of JUMP_TABLES_IN_TEXT_SECTION, yes, you absolutely cannot
share the table.  This is used by targets that use pc-relative addressing
from the jump instruction to the table.  VAX is the most extreme example
in that the entire table is an argument to the instruction.

Without JUMP_TABLES_IN_TEXT_SECTION, you might be able to get away with
sharing the tables.  I don't have a good feel for what sort of problems
you might encounter.

> 2) Pairable notes are splitted off - I should probably handle the
>    paired notes inside single basic block by duplicating, but what to do
>    with notes starting or ending in the BB?

Which ones?

>    What about EH region notes?  Should I prohibit duplication if EH region
>    starts or ends in the BB?

There are no NOTE_INSN_EH_REG_BE/END notes during optimization.  They
only exist during initial rtl generation and during final assemble output.

> 3) Can I update debug information using exiting bb-reorder infrastructure?
>    It seems to work already somehow.

Yes, that's already implemented.

> 4) What about epilogues - I should probably copy the frame_related flag,

The frame_related flag is never set for epilogues, only prologues.

> 	* bb-reorder.c (reorder_duplicate_bb): New function.

Not ok yet -- you'll have to figure out what to do about addr_vec.

> + 	      CALL_INSN_FUNCTION_USAGE (new) = copy_insn (CALL_INSN_FUNCTION_USAGE (insn));

Wrap.

> + 	  case CODE_LABEL:
> + 	    /*new = emit_label (gen_label_rtx ());*/
[...]
> +       /*debug_rtx (new);*/

Debugging?

r~

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: RFC: BB duplication code
  2001-08-22 12:14 ` Richard Henderson
@ 2001-08-22 12:27   ` Jan Hubicka
  2001-08-22 13:04     ` Richard Henderson
  2001-08-22 16:04   ` RFC: BB duplication code Joern Rennecke
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Hubicka @ 2001-08-22 12:27 UTC (permalink / raw)
  To: Richard Henderson, Jan Hubicka, gcc-patches, patches, gcc

> On Wed, Aug 22, 2001 at 08:29:00PM +0200, Jan Hubicka wrote:
> > 1) In case of copying tablejumps, the vector remains shared.  This is not
> >    good idea except very latest passes of compilation, but making tablejump
> >    unshared is dificult task, as code computing destination address using
> >    the table can be hoisted outside.
> >    What to do?  Prohibit duplication of tablejumps?
> 
> In the case of JUMP_TABLES_IN_TEXT_SECTION, yes, you absolutely cannot
> share the table.  This is used by targets that use pc-relative addressing
> from the jump instruction to the table.  VAX is the most extreme example
> in that the entire table is an argument to the instruction.
> 
> Without JUMP_TABLES_IN_TEXT_SECTION, you might be able to get away with
> sharing the tables.  I don't have a good feel for what sort of problems
> you might encounter.
In case I do duplication soon, in order to form superblock and do subsequent
optimizations on each superblock differently, I eventually will want
to redirect jumptable in one of superblock (lets say by edge redirection)
and due to sharing I will redirect both jumptables.

IMO we should absolutely avoid shared jumptables during compilation and
possibly make them shared at final pass, if this turns out to be
common enought.

But this has problem of tracking down instruction calculaitng the value.
> 
> > 2) Pairable notes are splitted off - I should probably handle the
> >    paired notes inside single basic block by duplicating, but what to do
> >    with notes starting or ending in the BB?
> 
> Which ones?
Any of them. I can imagine, if I do have LOOP_BEG/LOOP_END pair, I can
easilly preserve it, but if the loop cosist of multiple basic blocks,
I will get LOOP_BEG in one instance and while duplicating the basic block
i don't know if the basic block holding LOOP_END will get duplicated too.

Sane way is to strip them.  This works for superblock formation/loop peeling
as it usually don't want to duplicate whole loop.
But if I do, I lose the information.

LOOP notes are not problem as we should move away from them and
their absence does not harm, but what about BLOCK_BEG/BLOCK_END?
> 
> >    What about EH region notes?  Should I prohibit duplication if EH region
> >    starts or ends in the BB?
> 
> There are no NOTE_INSN_EH_REG_BE/END notes during optimization.  They
> only exist during initial rtl generation and during final assemble output.
Good for me :)
> 
> > 3) Can I update debug information using exiting bb-reorder infrastructure?
> >    It seems to work already somehow.
> 
> Yes, that's already implemented.
It is, but I need to understand how and see if it still works with presence
of duplicated BB - OK, I will try to investigate this futher.
> 
> > 4) What about epilogues - I should probably copy the frame_related flag,
> 
> The frame_related flag is never set for epilogues, only prologues.
> 
> > 	* bb-reorder.c (reorder_duplicate_bb): New function.
> 
> Not ok yet -- you'll have to figure out what to do about addr_vec.
> 
> > + 	      CALL_INSN_FUNCTION_USAGE (new) = copy_insn (CALL_INSN_FUNCTION_USAGE (insn));
> 
> Wrap.
> 
> > + 	  case CODE_LABEL:
> > + 	    /*new = emit_label (gen_label_rtx ());*/
> [...]
> > +       /*debug_rtx (new);*/
> 
> Debugging?
Oh yes - as commented, I didn't wanted to OKay it yet.
But it looks like there are fewer issues than I've expected.

So open is tablejumps - I suggest go initialy by prohibiting duplication.
Later we may want with some conservative way - in case we will find the
instruciton taking CODE_LABEL of ADDR_VEC as argument we will allow duplication
otherwise we won't.

For now I will add predicate can_diplicate_bb_p that will check this condition
and fail in presense of tablejump.

Other problem is BLOCK_BEG and debug info.  I will try to investigate it
overnight.

Honza

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: RFC: BB duplication code
  2001-08-22 12:27   ` Jan Hubicka
@ 2001-08-22 13:04     ` Richard Henderson
  2001-08-23  6:44       ` Jan Hubicka
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2001-08-22 13:04 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, patches, gcc

On Wed, Aug 22, 2001 at 09:27:56PM +0200, Jan Hubicka wrote:
> But this has problem of tracking down instruction calculaitng the value.

Yep.  I have no good idea how so solve that.

> Sane way is to strip them.  This works for superblock formation/loop peeling
> as it usually don't want to duplicate whole loop.
> But if I do, I lose the information.

I'd say strip the loop notes.  You don't really lose any information,
in that correct information can always be obtained from the shape of
the CFG, via flow_loop_nodes_find.

> LOOP notes are not problem as we should move away from them and
> their absence does not harm, but what about BLOCK_BEG/BLOCK_END?

BLOCK_BEG/END must be properly duplicated for debug info to be correct.

> It is, but I need to understand how and see if it still works with presence
> of duplicated BB - OK, I will try to investigate this futher.

It should.  The reorder_blocks (bad name) function is also used 
for loop unrolling, which does duplication.  Additionally, when
bb-reorder moves two blocks that shared a BLOCK_BEG/END, duplicates
are created.

> So open is tablejumps - I suggest go initialy by prohibiting duplication.

Seems good.

> Later we may want with some conservative way - in case we will find the
> instruciton taking CODE_LABEL of ADDR_VEC as argument we will allow
> duplication otherwise we won't.

Careful, if the target uses a pc-relative load, there may be
no such instruction.  Which is actually ideal for you.

The best way would be to search the dominator tree for the insn
that loads the label.  What to do when you find it is somewhat
complicated.  It's not impossible that the entire table index,
load, and decode could have been hoisted.

There are probably simple cases that can be handled easily, but
you'll probably always have to be able to cancel the duplication.

r~

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: RFC: BB duplication code
  2001-08-22 12:14 ` Richard Henderson
  2001-08-22 12:27   ` Jan Hubicka
@ 2001-08-22 16:04   ` Joern Rennecke
  2001-08-22 16:44     ` Richard Henderson
  1 sibling, 1 reply; 25+ messages in thread
From: Joern Rennecke @ 2001-08-22 16:04 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jan Hubicka, gcc-patches, patches, gcc

> In the case of JUMP_TABLES_IN_TEXT_SECTION, yes, you absolutely cannot
> share the table.  This is used by targets that use pc-relative addressing
> from the jump instruction to the table.  VAX is the most extreme example
> in that the entire table is an argument to the instruction.

This is target-dependent.  Some targets just put the jump table in the text
section because the linker can't handle cross-section differences, and
they use differences (either always or for pic / PIC).

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: RFC: BB duplication code
  2001-08-22 16:04   ` RFC: BB duplication code Joern Rennecke
@ 2001-08-22 16:44     ` Richard Henderson
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2001-08-22 16:44 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Jan Hubicka, gcc-patches, patches, gcc

On Thu, Aug 23, 2001 at 12:04:48AM +0100, Joern Rennecke wrote:
> This is target-dependent.  Some targets just put the jump table in the text
> section because the linker can't handle cross-section differences, and
> they use differences (either always or for pic / PIC).

True, JUMP_TABLES_IN_TEXT_SECTION is an overestimate.  However
it's fairly close to accurate.  Certainly it doesn't miss any
cases for which pc-relative loads *are* used.


r~

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: RFC: BB duplication code
  2001-08-22 13:04     ` Richard Henderson
@ 2001-08-23  6:44       ` Jan Hubicka
  2001-08-23 16:19         ` Richard Henderson
  2001-09-07 20:34         ` Loop optimiser upgrade (Was RFC: BB duplication code) Michael Hayes
  0 siblings, 2 replies; 25+ messages in thread
From: Jan Hubicka @ 2001-08-23  6:44 UTC (permalink / raw)
  To: Richard Henderson, Jan Hubicka, gcc-patches, patches, gcc

> On Wed, Aug 22, 2001 at 09:27:56PM +0200, Jan Hubicka wrote:
> > But this has problem of tracking down instruction calculaitng the value.
> 
> Yep.  I have no good idea how so solve that.
Neighter I do - so lets avoid duplication for now.  I believe it is not
critical for the optimizations to be effective.
> 
> > Sane way is to strip them.  This works for superblock formation/loop peeling
> > as it usually don't want to duplicate whole loop.
> > But if I do, I lose the information.
> 
> I'd say strip the loop notes.  You don't really lose any information,
> in that correct information can always be obtained from the shape of
> the CFG, via flow_loop_nodes_find.
Agreed - anyway an loop optimizer is still an barrier for my effort.
Until it is converted to use CFG, we can't make flow info survive.

BB duplication framework can be usefull step in rewriting loop optimizer,
as loop unrolling can be definitly done at the top of it.
Converting whole loop optimizer at once can be huge task. Additionally some
pieces should be already obsoletted (as code hoisting).
So I think that in mid term, we can make the BIV/GIV discovery code idedendent
on loop notes (it should not be that dificult) and start work on new loop
unroller done as separate pass.
I was looking at some compiler designs and loop unrolling seems to be done
mostly at pretty late levels of compilation (as strength reduction is),
while other loop transformations are done early.

Do we have plans about the loop transformations done at tree level?
> 
> > LOOP notes are not problem as we should move away from them and
> > their absence does not harm, but what about BLOCK_BEG/BLOCK_END?
> 
> BLOCK_BEG/END must be properly duplicated for debug info to be correct.
I see.
> 
> > It is, but I need to understand how and see if it still works with presence
> > of duplicated BB - OK, I will try to investigate this futher.
> 
> It should.  The reorder_blocks (bad name) function is also used 
> for loop unrolling, which does duplication.  Additionally, when
> bb-reorder moves two blocks that shared a BLOCK_BEG/END, duplicates
> are created.
This means that when I am duplicating BB that starts BLOCK_BEG in the middle, I
can easilly add an paired BLOCK_END at the end of basic block and get info
correct?

(symetrically att BLOCK_BEG to the begginign of block if needed)?
> Careful, if the target uses a pc-relative load, there may be
> no such instruction.  Which is actually ideal for you.
I see there are many side corners :(

Honza

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: RFC: BB duplication code
  2001-08-23  6:44       ` Jan Hubicka
@ 2001-08-23 16:19         ` Richard Henderson
  2001-08-24  6:35           ` Jan Hubicka
  2001-09-07 20:34         ` Loop optimiser upgrade (Was RFC: BB duplication code) Michael Hayes
  1 sibling, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2001-08-23 16:19 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, patches, gcc

On Thu, Aug 23, 2001 at 03:44:00PM +0200, Jan Hubicka wrote:
> This means that when I am duplicating BB that starts BLOCK_BEG in the
> middle, I can easilly add an paired BLOCK_END at the end of basic block
> and get info correct?

Not quite.  There are some data strucures that need updating.
See the end of relate_bbs_with_scopes.


r~

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: RFC: BB duplication code
  2001-08-23 16:19         ` Richard Henderson
@ 2001-08-24  6:35           ` Jan Hubicka
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Hubicka @ 2001-08-24  6:35 UTC (permalink / raw)
  To: Richard Henderson, Jan Hubicka, gcc-patches, patches, gcc

> On Thu, Aug 23, 2001 at 03:44:00PM +0200, Jan Hubicka wrote:
> > This means that when I am duplicating BB that starts BLOCK_BEG in the
> > middle, I can easilly add an paired BLOCK_END at the end of basic block
> > and get info correct?
> 
> Not quite.  There are some data strucures that need updating.
> See the end of relate_bbs_with_scopes.
Yes, I've went across the file last night and I believe I understand the issues
already.  It now appears to emit proper debug info, but I've introduced
bootstrap failure :(, so I will send updated patch once this is done.

Thanks a lot for your help!

Honza
> 
> 
> r~

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Loop optimiser upgrade (Was RFC: BB duplication code)
  2001-08-23  6:44       ` Jan Hubicka
  2001-08-23 16:19         ` Richard Henderson
@ 2001-09-07 20:34         ` Michael Hayes
  2001-09-07 22:23           ` Daniel Berlin
                             ` (5 more replies)
  1 sibling, 6 replies; 25+ messages in thread
From: Michael Hayes @ 2001-09-07 20:34 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc

[-- Attachment #1: Type: text/plain, Size: 2389 bytes --]

Jan Hubicka writes:

 > > I'd say strip the loop notes.  You don't really lose any information,
 > > in that correct information can always be obtained from the shape of
 > > the CFG, via flow_loop_nodes_find.
 > Agreed - anyway an loop optimizer is still an barrier for my effort.
 > Until it is converted to use CFG, we can't make flow info survive.

I agree that the loop notes should be weeded out.  The one useful loop
note is the VTOP note that indicates if the loop exit code has been
duplicated.  When this loop exists and we know that the loop body will
be executed at least once if the loop header is entered.  I'm not sure
of an easy means to determine this from the CFG.

 > Converting whole loop optimizer at once can be huge task. Additionally some
 > pieces should be already obsoletted (as code hoisting).

Yes, I agree.  Before I got snowed under with my real job I had a big
hack at converting the old loop optimiser to use the CFG.  The problem
was ensuring that I did not break anything.  I think a better approach
is to write another loop optimiser that can run after the current loop
optimiser and then one day replace it completely.  After the current
loop optimiser has run, I'd say that we should strip the loop notes
completely (or at least regenerate them from the CFG) and rely totally
on the CFG.

 > So I think that in mid term, we can make the BIV/GIV discovery code idedendent
 > on loop notes (it should not be that dificult) and start work on new loop
 > unroller done as separate pass.

Yes, this could be done standalone.  However, it would need loop
invariant discovery code first.  This is one of the reasons I wrote
the dataflow code that Dan Berlin has marvellously souped up for the
new register allocator.  It is also the reason I added routines to
loop.c for sinking and hoisting insns so that I could track the
changes to the dataflow.

What started to stump me was an efficient way to update the dataflow
information incrementally after each loop optimisation.  To mitigate
the dataflow computation needed, I structured the loop optimiser to
optimise all the innermost loops first, then all the next level loops,
and so on.  This way the loops could be optimised independently
without having to always recompute the dataflow information.

I've attached the basic structure of the data flow based loop
optimiser that I was working on.

Michael


[-- Attachment #2: loop-df.c --]
[-- Type: text/plain, Size: 11441 bytes --]

static void 
loop_notes_strip (f)
     rtx f;
{
  rtx insn;

  /* Strip old loop notes.  */
  for (insn = f; insn; insn = NEXT_INSN (insn))
    {
      if (GET_CODE (insn) == NOTE
	  && (NOTE_LINE_NUMBER (insn) == NOTE_INSN_LOOP_BEG 
	      || NOTE_LINE_NUMBER (insn) == NOTE_INSN_LOOP_CONT
	      || NOTE_LINE_NUMBER (insn) == NOTE_INSN_LOOP_END))
	delete_insn (insn);
    }
}


static void 
loop_note_stats_dump (f, stream)
     rtx f;
     FILE *stream;
{
  struct {
    int loops;
    int vtops;
    int conts;
  } loop_stats;

 
  /* Log old loop note stats.  */
  loop_stats.loops = 0;
  loop_stats.conts = 0;
  loop_stats.vtops = 0;
  
  for (insn = f; insn; insn = NEXT_INSN (insn))
    {
      if (GET_CODE (insn) == NOTE
	  && NOTE_LINE_NUMBER (insn) == NOTE_INSN_LOOP_BEG)
	loop_stats.loops++;
      else if (GET_CODE (insn) == NOTE
	       && NOTE_LINE_NUMBER (insn) == NOTE_INSN_LOOP_CONT)
	loop_stats.conts++;
      else if (GET_CODE (insn) == NOTE
	       && NOTE_LINE_NUMBER (insn) == NOTE_INSN_LOOP_VTOP)
	loop_stats.vtops++;
    }
  
  if (stream)
    fprintf (stream, 
	     "\n;; Loop stats: %d loops, %d conts, %d vtops\n",
	     loop_stats.loops, loop_stats.conts, loop_stats.vtops);
}  


/* Split an edge to create a new basic block and insert a
   NOTE_INSN_DELETED as a placeholder.  */

static void
loop_edge_split (e)
     edge e;
{
  rtx seq;

  start_sequence ();
  emit_note (NULL_PTR, NOTE_INSN_DELETED);
  seq = gen_sequence ();
  end_sequence ();
  
  /* If we don't need to split the edge then we still want to add a
     dummy note placeholder to the end of the block so that we can
     insert insns before it without having to update bb->end all the
     time.  */

  insert_insn_on_edge (seq, e);
}


static void
loop_pre_header_create (e)
     edge e;
{
  basic_block bb = e->src;

  /* We know that this edge is the only forward edge into
     the loop header.  */

  /* If the source of the edge has a single successor then the source
     is the pre_header and we do not have to split any edges.  */
  if (bb->succ->succ_next == NULL
      && bb != ENTRY_BLOCK_PTR)
    {
      /* insert_insn_on_edge cannot handle a block with a single
	 successor and with a single jump_insn at the end.  Note that
	 we know that the jump must be unconditional since the block
	 has a single successor.  We can use this jump insn to hoist
	 insns before.  */
      if (GET_CODE (bb->end) == JUMP_INSN)
	return;

      /* We need a dummy insn at the end of the block to insert the
	 insn before.  loop_edge_split will do this for us although no
	 actual edges get split.  */
    }
  
  loop_edge_split (e);
}


static void
loop_landing_pad_create (e)
     edge e;
{
  loop_edge_split (e);
}



/* Set the invalid flag for each loop we cannot process.  */

static void
loops_validate (loops)
     struct loops *loops;
{
  int i;
  
  for (i = 0; i < loops->num; i++)
    {
      int j;
      struct loop *loop = &loops->array[i];
      
      /* Invalidate natural loops with multiple forward edges into the
	 loop header or with shared headers for now.  These should be
	 transformed to have a single entry edge to simplify hoisting
	 or we could insert hoisted insns onto all entry edges.  
	 Also invalidate loops with abnormal entry or exit edges
	 since we cannot split these edges.  */
      if (loop->num_entries != 1 || loop->shared
	  || (loop->entry_edges[0]->flags & EDGE_ABNORMAL) != 0)
	{
	  loop->invalid = 1;
	  continue;
	}
      
      for (j = 0; j < loop->num_exits; j++)
	if ((loop->exit_edges[j]->flags & EDGE_ABNORMAL) != 0)
	  loop->invalid = 1;
    }
}


/* Entry point of this file.  Perform loop optimization
   on the current function.  F is the first insn of the function
   and DUMPFILE is a stream for output of a trace of actions taken
   (or 0 if none should be output).  */

void
loop_optimize (f, dumpfile, flags)
     /* f is the first instruction of a chain of insns for one function.  */
     rtx f;
     FILE *dumpfile;
     int flags;
{
  register rtx insn;
  register int i;
  struct loops loops_data;
  struct loops *loops = &loops_data;
  struct loop_info *loops_info;
  struct loop **loops_order;
  int loop_level;
  int n_basic_blocks_orig;

  loop_dump_stream = dumpfile;

  init_recog_no_volatile ();

  max_reg_before_loop = max_reg_num ();

  loop_note_stats_dump (f, loop_dump_stream);

  loop_notes_strip (f);

  /* Build CFG.  */
  find_basic_blocks (f, max_reg_before_loop, loop_dump_stream);
  cleanup_cfg (f);

  /* Find natural loops within the CFG.  Only build the loop tree
     and find the loop entry and exit edges.  */
  flow_loops_find (loops, LOOP_ALL);

  if (! loops->num)
    {
      if (loop_dump_stream)
	fprintf (loop_dump_stream, 
		 ";; Loop stats: 0 natural loops, %d marked loops\n",
		 loop_stats.loops);
      flow_loops_free (loops);
      return;
    }

  loops_validate (loops);
   
  loops_bbs_find (loops);

  /* Create pre-header and landing-pad nodes to hoist and sink insns
     into.  Use dummy note as a placeholder so that we can insert
     insns before it.  */
  for (i = loops->num - 1; i >= 0; i--)
    {
      int j;
      struct loop *loop = &loops->array[i];

      if (loop->invalid)
	continue;
      
      loop_pre_header_create (loop->entry_edges[0]);
      
      for (j = 0; j < loop->num_exits; j++)
	if ((loop->exit_edges[j]->flags & EDGE_FAKE) == 0)
	  loop_landing_pad_create (loop->exit_edges[j]);

      /* Try inverting the loop; i.e., converting a while loop
	 into a repeat loop.  */
      loop_invert (loop);
    }

  n_basic_blocks_orig = n_basic_blocks;

  commit_edge_insertions ();

  /* Add fake edges to the exit block for calls that may exit.
     This will introduce new basic blocks and new loop exits.
     We do not need landing pads for these new exits since
     they will never get executed.  */
  loop_call_edges_add (loops);

  if (n_basic_blocks != n_basic_blocks_orig)
    {
      if (loop_dump_stream)
	fprintf (loop_dump_stream, 
		 "\n;; Loop %d basic blocks created\n",
		 n_basic_blocks - n_basic_blocks_orig);

      /* Update the loop data.  This is only necessary if new basic
         blocks were created.  */
      flow_loops_update (loops, LOOP_ALL);

      loops_validate (loops);
    }

  /* Allocate and initialize auxiliary loop information.  */
  loops_info = xcalloc (loops->num, sizeof (struct loop_info));
  for (i = 0; i < loops->num; i++)
    loops->array[i].aux = loops_info + i;

  /* Now find all register lifetimes.  */
  reg_scan (f, max_reg_before_loop, 1);

  /* Initialise dataflow information.  */
  df = df_init ();

  /* Copy pointer to dominator data.  */
  df->dom = loops->cfg.dom;

  /* This must occur after reg_scan so that registers created by gcse
     will have entries in the register tables.

     We could have added a call to reg_scan after gcse_main in toplev.c,
     but moving this call to init_alias_analysis is more efficient.  */
  init_alias_analysis ();

  /* Link loops in terms of the loop level.  */
  loops_order = xcalloc (loops->levels + 1, sizeof (*loops_order));
  for (i = loops->num - 1; i >= 0; i--)
    {
      struct loop *loop = &loops->array[i];
      loop->next = loops_order[loop->level];
      loops_order[loop->level] = loop;
    }

  /* Now optimize the loops with all the innermost loops first (level
     1), followed by all the level 2 loops, etc.  Note that at each
     level the loops are independent and thus the transformations on a
     loop should not screw up the dataflow information for other loops
     at the same level.  */

  for (loop_level = 1; loop_level <= loops->levels; loop_level++)
    {
      struct loop *loop;

      if (loop_dump_stream)
	fprintf (loop_dump_stream,
		 "\n;; Processing loops at level %d.\n",
		 loop_level);

      /* Create data flow information.  */
      if (df_analyse (df, 0, DF_LOOP))
	df_dump (df, DF_LOOP_DUMP, loop_dump_stream);

      /* Loop optimizations should only affect the dataflow
	 information for the blocks within the loop and for the loop
	 pre-header and landing-pad blocks.  Newly created uses should
	 not be exposed above the loop pre-header unless an
	 optimization has gone wrong.  Thus the existing bits within
	 the IN bitmaps for reaching uses should not change.
	 Similarly, newly created defs within the loop should not be
	 used after the end of the loop landing-pads.  Thus there is
	 no need to propagate dataflow information before the loop
	 pre-header or after the loop landing-pads.  */
      
      /* Scan loops looking for valid ones to optimize.  */
      for (loop = loops_order[loop_level]; loop && !loop->invalid; 
	   loop = loop->next)
	{
	  struct loop_info *loop_info = LOOP_INFO (loop);
	  
	  if (loop->invalid)
	    continue;
	  
	  /* Set up variables describing this loop.  */
	  loop_scan (loop);

	  /* Give up on this loop if it has a setjmp.  */
	  if (loop_info->has_setjmp)
	    {
	      if (loop_dump_stream)
		fprintf (loop_dump_stream,
			 "\n;; Loop%d ignored due to setjmp.\n",
			 loop->num);
	      loop->invalid = 1;
	      continue;
	    }
	}
      
      /* Perform loop invariant code motion.  */
      for (loop = loops_order[loop_level]; loop && !loop->invalid; 
	   loop = loop->next)
	{
	  struct loop_regs *regs = LOOP_REGS (loop);

	  if (loop_dump_stream)
	    fprintf (loop_dump_stream,
		     "\n;; Loop%d invariant code motion.\n",
			 loop->num);

	  loop_invariants_move (loop, flags);
	}

      /* Update data flow information.  */
      if (df_analyse (df, (sbitmap)-1, DF_LOOP))
	df_dump (df, DF_LOOP_DUMP, loop_dump_stream);

      /* Perform shadowing of loop mems.  */
      for (loop = loops_order[loop_level]; loop && !loop->invalid; 
	   loop = loop->next)
	{
	  if (loop_dump_stream)
	    fprintf (loop_dump_stream,
		     "\n;; Loop%d mem loading.\n",
			 loop->num);
	  loop_mems_load (loop);
	}

      /* Update data flow information for blocks that were
	 modified.  */
      if (df_analyse (df, (sbitmap)-1, DF_LOOP))
	df_dump (df, DF_LOOP_DUMP, loop_dump_stream);

      /* Find induction variables.  */
      for (loop = loops_order[loop_level]; loop && !loop->invalid; 
	   loop = loop->next)
	{
	}

      /* Invert loop.  */
      for (loop = loops_order[loop_level]; loop && !loop->invalid; 
	   loop = loop->next)
	{
	}
	  
      /* Update data flow information for blocks that were
	 modified.  */
      if (df_analyse (df, (sbitmap)-1, DF_LOOP))
	df_dump (df, DF_LOOP_DUMP, loop_dump_stream);

      /* Unroll loop.  */
      for (loop = loops_order[loop_level]; loop && !loop->invalid; 
	   loop = loop->next)
	{
	}

      /* Update data flow information for blocks that were
	 modified.  */
      if (df_analyse (df, (sbitmap)-1, DF_LOOP))
	df_dump (df, DF_LOOP_DUMP, loop_dump_stream);

      /* Induction variable strength reduction.  */
      for (loop = loops_order[loop_level]; loop && !loop->invalid; 
	   loop = loop->next)
	{
	}
    }

  /* If there were lexical blocks inside the loop, they have been
     replicated.  We will now have more than one NOTE_INSN_BLOCK_BEG
     and NOTE_INSN_BLOCK_END for each such block.  We must duplicate
     the BLOCKs as well.  */
  if (write_symbols != NO_DEBUG)
    reorder_blocks ();

  end_alias_analysis ();

  if (loop_dump_stream)
    flow_loops_dump (loops, loop_dump_stream, loop_dump_aux, 1);

  /* Clean up.  */
  df_finish (df);
  free (loops_order);
  free (loops_info);
  flow_loops_free (loops);
}

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Loop optimiser upgrade (Was RFC: BB duplication code)
  2001-09-07 20:34         ` Loop optimiser upgrade (Was RFC: BB duplication code) Michael Hayes
@ 2001-09-07 22:23           ` Daniel Berlin
  2001-09-08  9:15             ` Jan Hubicka
  2001-09-08  8:57           ` Jan Hubicka
                             ` (4 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Daniel Berlin @ 2001-09-07 22:23 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Jan Hubicka, gcc

Michael Hayes <m.hayes@elec.canterbury.ac.nz> writes:

> Jan Hubicka writes:
>
>  > > I'd say strip the loop notes.  You don't really lose any information,
>  > > in that correct information can always be obtained from the shape of
>  > > the CFG, via flow_loop_nodes_find.
>  > Agreed - anyway an loop optimizer is still an barrier for my effort.
>  > Until it is converted to use CFG, we can't make flow info survive.
>
> I agree that the loop notes should be weeded out.  The one useful loop
> note is the VTOP note that indicates if the loop exit code has been
> duplicated.  When this loop exists and we know that the loop body will
> be executed at least once if the loop header is entered.  I'm not sure
> of an easy means to determine this from the CFG.
>
>  > Converting whole loop optimizer at once can be huge task. Additionally some
>  > pieces should be already obsoletted (as code hoisting).
>
> Yes, I agree.  Before I got snowed under with my real job I had a big
> hack at converting the old loop optimiser to use the CFG.  The problem
> was ensuring that I did not break anything.  I think a better approach
> is to write another loop optimiser that can run after the current loop
> optimiser and then one day replace it completely.  After the current
> loop optimiser has run, I'd say that we should strip the loop notes
> completely (or at least regenerate them from the CFG) and rely totally
> on the CFG.
>
>  > So I think that in mid term, we can make the BIV/GIV discovery code idedendent
>  > on loop notes (it should not be that dificult) and start work on new loop
>  > unroller done as separate pass.
>
> Yes, this could be done standalone.  However, it would need loop
> invariant discovery code first.  This is one of the reasons I wrote
> the dataflow code that Dan Berlin has marvellously souped up for the
> new register allocator.  It is also the reason I added routines to
> loop.c for sinking and hoisting insns so that I could track the
> changes to the dataflow.

Loop invariant discovery as it stands right now isn't doing all it
could, as i'm sure everyone knows.
Without a fixed store motion i'm about to send off to Andreas for some
more bootstraps (It passes bootstrap and make check on rs6000 and x86,
after a few more fixes tonight), for the following simple dead stores
in loop code:

int global;
        float r;
void f()
{
        int q;
        int i;
        r=6;
        for (q = 0; q < 50; q++)
        {
                i = 1;
                global = 1;
                global = 2;
                r = 4;
                r=global;
        }
        return;
        global = 3;
}

We get:
   .file   "dse.c"
        .text
        .align 2
        .p2align 4,,15
.globl f
        .type   f,@function
f:
        pushl   %ebp
        movl    $0x40000000, %edx
        movl    %esp, %ebp
        movl    $0x40c00000, r
        movl    $49, %eax
        .p2align 4,,15
.L5:
        movl    %edx, r
        decl    %eax
        jns     .L5
        popl    %ebp
        movl    $2, %eax
        movl    %eax, global
        ret

Which completely misses the fact that r is loop invariant, for some
reason.

However, a working store motion turns it into:

.globl f
        .type   f,@function
f:
        pushl   %ebp
        movl    %esp, %ebp
        movl    $49, %eax
        .p2align 4,,15
.L5:
        decl    %eax
        jns     .L5
        popl    %ebp
        movl    $2, %eax
        movl    %eax, global
        movl    $0x40000000, %eax
        movl    %eax, r
        ret

Load motion isn't as good as store motion yet, and leaves one r/o reg
for loop to hoist.

However, i'm curious as to why we don't notice that r was invariant in
the current loop code. 
Any ideas?

>
> What started to stump me was an efficient way to update the dataflow
> information incrementally after each loop optimisation.  To mitigate
> the dataflow computation needed, I structured the loop optimiser to
> optimise all the innermost loops first, then all the next level loops,
> and so on.  This way the loops could be optimised independently
> without having to always recompute the dataflow information.
>
> I've attached the basic structure of the data flow based loop
> optimiser that I was working on.
>
> Michael
>
>

-- 
"My girlfriend asked me how long I was going to be gone on this
tour.  I said, "the whole time."
"-Steven Wright

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Loop optimiser upgrade (Was RFC: BB duplication code)
  2001-09-07 20:34         ` Loop optimiser upgrade (Was RFC: BB duplication code) Michael Hayes
  2001-09-07 22:23           ` Daniel Berlin
@ 2001-09-08  8:57           ` Jan Hubicka
  2001-09-08 14:00             ` Daniel Berlin
  2001-09-08  9:09           ` Jan Hubicka
                             ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Jan Hubicka @ 2001-09-08  8:57 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Jan Hubicka, gcc

> Jan Hubicka writes:
> 
>  > > I'd say strip the loop notes.  You don't really lose any information,
>  > > in that correct information can always be obtained from the shape of
>  > > the CFG, via flow_loop_nodes_find.
>  > Agreed - anyway an loop optimizer is still an barrier for my effort.
>  > Until it is converted to use CFG, we can't make flow info survive.
> 
> I agree that the loop notes should be weeded out.  The one useful loop
> note is the VTOP note that indicates if the loop exit code has been
> duplicated.  When this loop exists and we know that the loop body will
> be executed at least once if the loop header is entered.  I'm not sure
> of an easy means to determine this from the CFG.
It should be possible to tie this into branch prediction scheme, as I do
currently with the knowledge of iteration count. Overall I believe that future
loop optimizer should be able to do loop peeling (and header duplication as
special case), instead letting previous passes doing it.  This way we will
probably not need to preserve the information too long.
> 
>  > Converting whole loop optimizer at once can be huge task. Additionally some
>  > pieces should be already obsoletted (as code hoisting).
> 
> Yes, I agree.  Before I got snowed under with my real job I had a big
> hack at converting the old loop optimiser to use the CFG.  The problem
It would be excellent to see your code.
> was ensuring that I did not break anything.  I think a better approach
> is to write another loop optimiser that can run after the current loop
> optimiser and then one day replace it completely.  After the current
100% agreed - actually I was suggesting that in separate email.
I am thinking about doing that at CFG branch.  I think we should follow
> loop optimiser has run, I'd say that we should strip the loop notes
> completely (or at least regenerate them from the CFG) and rely totally
> on the CFG.
> 
>  > So I think that in mid term, we can make the BIV/GIV discovery code idedendent
>  > on loop notes (it should not be that dificult) and start work on new loop
>  > unroller done as separate pass.
> 
> Yes, this could be done standalone.  However, it would need loop
> invariant discovery code first.  This is one of the reasons I wrote
> the dataflow code that Dan Berlin has marvellously souped up for the
> new register allocator.  It is also the reason I added routines to
> loop.c for sinking and hoisting insns so that I could track the
> changes to the dataflow.
Looks your plan is compatible.
I planned steps
1) create an "loop module" able to analuze loop and get invariants
2) create an BIV/GIV module
3) rewrite strength reduction and loop unrolling on top of it.
We should be able to drop invariant/memory hoisting code as it is
redundant with PRE/load/store motion.
> 
> What started to stump me was an efficient way to update the dataflow
> information incrementally after each loop optimisation.  To mitigate
> the dataflow computation needed, I structured the loop optimiser to
> optimise all the innermost loops first, then all the next level loops,
> and so on.  This way the loops could be optimised independently
> without having to always recompute the dataflow information.
You mean the DU/DF chains?
Thats problem - how it is handled by other compilers? Can't we get them
updated incrementaly?
> 
> I've attached the basic structure of the data flow based loop
> optimiser that I was working on.
Great.  It is probably time to start the CFG branch and get your work
into it.

Honza
> 
> Michael
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Loop optimiser upgrade (Was RFC: BB duplication code)
  2001-09-07 20:34         ` Loop optimiser upgrade (Was RFC: BB duplication code) Michael Hayes
  2001-09-07 22:23           ` Daniel Berlin
  2001-09-08  8:57           ` Jan Hubicka
@ 2001-09-08  9:09           ` Jan Hubicka
  2001-09-08 16:56             ` Michael Hayes
  2001-09-08  9:33           ` Jan Hubicka
                             ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Jan Hubicka @ 2001-09-08  9:09 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Jan Hubicka, gcc

Michael,
I've run across your code and I would love to see it all so we can work
together on getting it useable.

Basically it looks great, especially because the DU/DF and natural loop
framework is already used and tested in gcc, so it should not be so dificult
to continue.  I am happy you implemented the preheader and landing pad
construction as it will simplify my CFGcleanup code.

I have one major comment.
What do you think about restructuring it to kind of "loop library"
as I've mentioned previously?

As we already have code to discover loops, I would love to see everything
designed in similar way - ie function to create preheaders/landing pads,
function to discover invariants, function to discover BIVs and implement each
optimization as separate pass.

This should help us to get better ordering of passes and simplify maitance of
it.  Major problem of loop optimizer now is that it is such a huge beast so
dificult to replace. Also once midlevel RTL and SSA is at place, we definitly
want to move some optimizations there and keep other in late passes.

I understand there is problem with DU/DF updating. So perhaps the passes should
be called by driver per loop basis - that will save the idea of optimizing
several unnested loops and then re-update DU/DF.

Honza

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Loop optimiser upgrade (Was RFC: BB duplication code)
  2001-09-07 22:23           ` Daniel Berlin
@ 2001-09-08  9:15             ` Jan Hubicka
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Hubicka @ 2001-09-08  9:15 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: Michael Hayes, Jan Hubicka, gcc

> Load motion isn't as good as store motion yet, and leaves one r/o reg
> for loop to hoist.
> 
> However, i'm curious as to why we don't notice that r was invariant in
> the current loop code. 
> Any ideas?
It is because the invariant code simply "sees" that R is something
at loop entry and set to something else in the body.
It does not know that the value is used only with the new one as all analysis
it does it to look at variable's live range and see that variable is live
before the definition concluding it is used too.

Using DU/DF will solve it.

Anyway I think the loop invariant code, once PRE/store/load motion is at place
don't need to attempt to be smarter.  It should simply know that value is
invariant if it is not set in the loop, as other cases should be already put
away.

We only need invariant information to discover BIVs/GIVs incremented/multiplied
by invariant registers.

Honza

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Loop optimiser upgrade (Was RFC: BB duplication code)
  2001-09-07 20:34         ` Loop optimiser upgrade (Was RFC: BB duplication code) Michael Hayes
                             ` (2 preceding siblings ...)
  2001-09-08  9:09           ` Jan Hubicka
@ 2001-09-08  9:33           ` Jan Hubicka
  2001-09-19 22:23           ` Joern Rennecke
  2001-09-22 10:17           ` Jan Hubicka
  5 siblings, 0 replies; 25+ messages in thread
From: Jan Hubicka @ 2001-09-08  9:33 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Jan Hubicka, gcc

> Jan Hubicka writes:
> 
>  > > I'd say strip the loop notes.  You don't really lose any information,
>  > > in that correct information can always be obtained from the shape of
>  > > the CFG, via flow_loop_nodes_find.
>  > Agreed - anyway an loop optimizer is still an barrier for my effort.
>  > Until it is converted to use CFG, we can't make flow info survive.
> 
> I agree that the loop notes should be weeded out.  The one useful loop
> note is the VTOP note that indicates if the loop exit code has been
> duplicated.  When this loop exists and we know that the loop body will
Just to note, the case we were speaking about is the tracer and dropping
loop notes in the code duplicates.
This means that it will not affect current loop optimizer, as the BB
duplication is run afterwards
(but if loop optimizer were sane, it should be run before).

Honza

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Loop optimiser upgrade (Was RFC: BB duplication code)
  2001-09-08  8:57           ` Jan Hubicka
@ 2001-09-08 14:00             ` Daniel Berlin
  0 siblings, 0 replies; 25+ messages in thread
From: Daniel Berlin @ 2001-09-08 14:00 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Michael Hayes, gcc

Jan Hubicka <jh@suse.cz> writes:

> You mean the DU/DF chains?
> Thats problem - how it is handled by other compilers? Can't we get them
> updated incrementaly?
A few do elimination based dataflow or it's ilk.
A few just do loop unrolling/optimizations on SSA form.

Until we have a mid level RTL, the second isn't really an option.

I've got DJ-graph elimination based dataflow, with incremental
updating, working okay, but it needs some massive cleanup.

As it stands right now, df can update incrementally, just make sure
you don't depend on df->defs[x] and df->uses[x] being non-null (since
we may create holes when we unlink the now deleted defs/uses).
I only mention it because it's easy to assume you can just run through
all the defs in the defs array, and that none of them will be NULL.

> > 
>> I've attached the basic structure of the data flow based loop
>> optimiser that I was working on.
> Great.  It is probably time to start the CFG branch and get your work
> into it.
>
> Honza
> > 
>> Michael
>> 
>

-- 
"In Vegas, I got into a long argument with the man at the
roulette wheel over what I considered to be an odd number.
"-Steven Wright

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Loop optimiser upgrade (Was RFC: BB duplication code)
  2001-09-08  9:09           ` Jan Hubicka
@ 2001-09-08 16:56             ` Michael Hayes
  2001-09-09  1:23               ` Jan Hubicka
  2001-09-26 17:27               ` Jan Hubicka
  0 siblings, 2 replies; 25+ messages in thread
From: Michael Hayes @ 2001-09-08 16:56 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Michael Hayes, gcc

Jan Hubicka writes:
 > What do you think about restructuring it to kind of "loop library"
 > as I've mentioned previously?
 > 
 > As we already have code to discover loops, I would love to see everything
 > designed in similar way - ie function to create preheaders/landing pads,
 > function to discover invariants, function to discover BIVs and implement each
 > optimization as separate pass.

Yes, I agree.  Each of these modules should be decoupled as much as
possible to simplify the maintenance.

Maybe we should try the unthinkable and have a go at designing the
interfaces first ;-)

Michael.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Loop optimiser upgrade (Was RFC: BB duplication code)
  2001-09-08 16:56             ` Michael Hayes
@ 2001-09-09  1:23               ` Jan Hubicka
  2001-09-26 17:27               ` Jan Hubicka
  1 sibling, 0 replies; 25+ messages in thread
From: Jan Hubicka @ 2001-09-09  1:23 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Jan Hubicka, gcc

> 
> Yes, I agree.  Each of these modules should be decoupled as much as
> possible to simplify the maintenance.
> 
> Maybe we should try the unthinkable and have a go at designing the
> interfaces first ;-)

Sounds like good idea.  I still do have dificulties in thinking how all
those parts we plan fits together.

I have quite clean plans about CFG infrastructure, but concerning the loop
optimizer my little mind still goes crazy.

Maybe you noticed that some bits are already in. For instance the BB pre-header
code you have can be rewriten to use redirect_edge_and_branch.  Simply split
the header first and then redirect loopback edge.  That will work even with
multiple header predecesors.

First thing that comes into mind is how to update the "loops" datastructure
after updating the CFG?  It is probably needed to do so while unrolling loops
or peeling, as the loop nest will force too many rebuilds otherwise.

Main problem I see is in the bitmaps used to hold nodes of the CFG. Lets take
example of simple preheader/sink creation pass.  Any creating of basic block
would need shifting all these bitmaps.  If we use consistently create_block
function (I plan to do so), we can probably shift the bitmaps there, but it
is ugly.

One alternative I see is to not represent the bitmap explicitly, instead maitain
dominance tree and use it to always construct the entries at fly. (that should
be possible easilly or not?)

Another solution can be to use DJ graphs, that can be less sensitive to such
a local updates.  What do you think?

OK, another issues later :)

Honza
> 
> Michael.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Loop optimiser upgrade (Was RFC: BB duplication code)
  2001-09-07 20:34         ` Loop optimiser upgrade (Was RFC: BB duplication code) Michael Hayes
                             ` (3 preceding siblings ...)
  2001-09-08  9:33           ` Jan Hubicka
@ 2001-09-19 22:23           ` Joern Rennecke
  2001-09-21 19:11             ` Michael Hayes
  2001-09-22 10:17           ` Jan Hubicka
  5 siblings, 1 reply; 25+ messages in thread
From: Joern Rennecke @ 2001-09-19 22:23 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Jan Hubicka, gcc

> I agree that the loop notes should be weeded out.  The one useful loop
> note is the VTOP note that indicates if the loop exit code has been
> duplicated.  When this loop exists and we know that the loop body will
> be executed at least once if the loop header is entered.  I'm not sure
> of an easy means to determine this from the CFG.

Actually, that is the easy part.  You just have to look if the loop is
solely entered by fallthrough from the preceding block.  In the non-vtop
while loop case, you have a jump from outside to the loop exit test.

However, the VTOP note means more than just that the loop is executed
at least once when entered.  It also means that the loop exit condition
will be false at the start of the first iteration.  This knowledge
allows you to do some extra strength reduction and unroll loop
preconditioning transforations without emitting extra runtime checks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Loop optimiser upgrade (Was RFC: BB duplication code)
  2001-09-19 22:23           ` Joern Rennecke
@ 2001-09-21 19:11             ` Michael Hayes
  2001-09-21 21:16               ` Joern Rennecke
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Hayes @ 2001-09-21 19:11 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Michael Hayes, Jan Hubicka, gcc

Joern Rennecke writes:
 > > I agree that the loop notes should be weeded out.  The one useful loop
 > > note is the VTOP note that indicates if the loop exit code has been
 > > duplicated.  When this loop exists and we know that the loop body will
 > > be executed at least once if the loop header is entered.  I'm not sure
 > > of an easy means to determine this from the CFG.
 > 
 > Actually, that is the easy part.  You just have to look if the loop is
 > solely entered by fallthrough from the preceding block.  In the non-vtop
 > while loop case, you have a jump from outside to the loop exit
 > test.

Are you sure?  A simple while loop has a CFG:

    +---+
    v   |
--->H-->L
    |
    v

where H is the loop header and L is the loop latch node.

If we transform the while loop into a do-while loop and duplicate the
loop exit test we have something like:

       +---+
       v   |
-->B-->H-->L--+-->
   |          ^
   |          |
   +----------+   

where B is a node containing the duplicated exit test that is used
to bypass the loop.

My question is how do we tell this from a do-while loop with a goto
around it?

 > However, the VTOP note means more than just that the loop is executed
 > at least once when entered.  It also means that the loop exit condition
 > will be false at the start of the first iteration.  This knowledge
 > allows you to do some extra strength reduction and unroll loop
 > preconditioning transforations without emitting extra runtime checks.


I realise this thanks.

Michael.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Loop optimiser upgrade (Was RFC: BB duplication code)
  2001-09-21 19:11             ` Michael Hayes
@ 2001-09-21 21:16               ` Joern Rennecke
  2001-09-23  4:29                 ` Michael Hayes
  0 siblings, 1 reply; 25+ messages in thread
From: Joern Rennecke @ 2001-09-21 21:16 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Joern Rennecke, Michael Hayes, Jan Hubicka, gcc

> My question is how do we tell this from a do-while loop with a goto
> around it?

You don't - and it's immaterial for the purpose of spotting loops that are
executed at least once if entered.  That's a property all do-wile loops
have, by definition.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Loop optimiser upgrade (Was RFC: BB duplication code)
  2001-09-07 20:34         ` Loop optimiser upgrade (Was RFC: BB duplication code) Michael Hayes
                             ` (4 preceding siblings ...)
  2001-09-19 22:23           ` Joern Rennecke
@ 2001-09-22 10:17           ` Jan Hubicka
  5 siblings, 0 replies; 25+ messages in thread
From: Jan Hubicka @ 2001-09-22 10:17 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Jan Hubicka, gcc

Hi,
another simple question - what notion of loop we want to base our optimizer
on?  The natural loops itself are probably not best choice, as in case of
following loop

for (i=0; i<1000;i++)
  if (i&1)
    do something;
  else
    do something else;

will result in two natural loop each iterating exactly once.  Do you have
some plans about that?  We can merge the loops based on profile data,
probably - the compaq compiler unsplits each such shared header to common
and uncommon one causing two nested loops.

Will we work on multiple loops, or split the header to get common latch
and thus one natural loop from both?

Whats about the notion of superblock loops - the impact compiler is doing some
optimizations only on these and thus they are easier to implement.  With my
tracer code, hot paths of most of internal loops should convert to superblocks,
but still this can be somewhat limited, as too much can fall outside the loop
notion...

Just ideas...
Honza

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Loop optimiser upgrade (Was RFC: BB duplication code)
  2001-09-21 21:16               ` Joern Rennecke
@ 2001-09-23  4:29                 ` Michael Hayes
  2001-09-24  4:34                   ` Jan Hubicka
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Hayes @ 2001-09-23  4:29 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Michael Hayes, Joern Rennecke, Jan Hubicka, gcc

Joern Rennecke writes:
 > > My question is how do we tell this from a do-while loop with a goto
 > > around it?
 > 
 > You don't - and it's immaterial for the purpose of spotting loops that are
 > executed at least once if entered.  That's a property all do-wile loops
 > have, by definition.

Sure!   I guess I didn't explain myself well or you missed my point.  

The advantage of knowing that the loop exit test has been duplicated
is that when one of these transformed loops is entered we know that
the exit test will be true on the first iteration.  This allows us to
compute the number (or maximum number) of iterations of well behaved
loops without having do perform value range propagation.

The point I was trying to make is that I do not know of a simple,
general, way of looking at the CFG to see if the loop exit has been
duplicated.  I cannot see how your suggestion that there may be a
fall-through edge into the loop header will tell us this since
I can construct an equivalent CFG using a goto.

Michael.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Loop optimiser upgrade (Was RFC: BB duplication code)
  2001-09-23  4:29                 ` Michael Hayes
@ 2001-09-24  4:34                   ` Jan Hubicka
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Hubicka @ 2001-09-24  4:34 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Joern Rennecke, Joern Rennecke, Jan Hubicka, gcc

> The point I was trying to make is that I do not know of a simple,
> general, way of looking at the CFG to see if the loop exit has been
> duplicated.  I cannot see how your suggestion that there may be a
> fall-through edge into the loop header will tell us this since
> I can construct an equivalent CFG using a goto.
Surely that idea won't work.  Only way I see is to duplicate in the loop
optimizer and attach the information to "loop" structure that will hopefully
be updated untill all relevant loop optimizations passes.

BTW did I sent you the bugreport about the loop discovery code, or I did forgot
about that?

Honza
> 
> Michael.
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Loop optimiser upgrade (Was RFC: BB duplication code)
  2001-09-08 16:56             ` Michael Hayes
  2001-09-09  1:23               ` Jan Hubicka
@ 2001-09-26 17:27               ` Jan Hubicka
  1 sibling, 0 replies; 25+ messages in thread
From: Jan Hubicka @ 2001-09-26 17:27 UTC (permalink / raw)
  To: Michael Hayes; +Cc: Jan Hubicka, gcc

> Jan Hubicka writes:
>  > What do you think about restructuring it to kind of "loop library"
>  > as I've mentioned previously?
>  > 
>  > As we already have code to discover loops, I would love to see everything
>  > designed in similar way - ie function to create preheaders/landing pads,
>  > function to discover invariants, function to discover BIVs and implement each
>  > optimization as separate pass.
> 
> Yes, I agree.  Each of these modules should be decoupled as much as
> possible to simplify the maintenance.
> 
> Maybe we should try the unthinkable and have a go at designing the
> interfaces first ;-)
Hmm, this still sounds temping. Perhaps we can try to sumarize what
we want to accomplish at RTL level.  I following optimizations are IMO thinkable:

1) code hoisting/invariant code motion - this should be obsoletted by PRE/load/store motion
2) strength reduction
3) "prefetch" instruction genration and probably dropping the "nontemporal" hints to loads too
4) loop peeling
5) loop unrolling+preconditioning+variable expansion
6) induction variable removal - this should be obsoleted by proper dead code elimination
7) Something else?

About 4) - I already do have code that is able to peel loop based on the
bb-reorder pass.  Problem is that the representation a bit differs there
from the normal insn stream, as the CFG contains disjunt basic block and
the fallthru edges may point anywhere. This makes duplicating the basic
blocks very convenient, but I am not quite sure if it will work well with
rest of loop infrastructure.

Also the header duplication can be seen as special case of loop peeling.
There is still problem how to keep the "VTOP" infromation around..

About 5) - that one should be done IMO as the peeling is - just by duplicating.
How much do we want to integrate it with scheduling and software pipelining?
Also needs it to be done together with preconditioning?

I do have the code to turn pseudos into webs that, in combination with CSE,
should replace the variable expansion done by current unroller.

Honza
> 
> Michael.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2001-09-26 17:27 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-08-22 11:29 RFC: BB duplication code Jan Hubicka
2001-08-22 12:14 ` Richard Henderson
2001-08-22 12:27   ` Jan Hubicka
2001-08-22 13:04     ` Richard Henderson
2001-08-23  6:44       ` Jan Hubicka
2001-08-23 16:19         ` Richard Henderson
2001-08-24  6:35           ` Jan Hubicka
2001-09-07 20:34         ` Loop optimiser upgrade (Was RFC: BB duplication code) Michael Hayes
2001-09-07 22:23           ` Daniel Berlin
2001-09-08  9:15             ` Jan Hubicka
2001-09-08  8:57           ` Jan Hubicka
2001-09-08 14:00             ` Daniel Berlin
2001-09-08  9:09           ` Jan Hubicka
2001-09-08 16:56             ` Michael Hayes
2001-09-09  1:23               ` Jan Hubicka
2001-09-26 17:27               ` Jan Hubicka
2001-09-08  9:33           ` Jan Hubicka
2001-09-19 22:23           ` Joern Rennecke
2001-09-21 19:11             ` Michael Hayes
2001-09-21 21:16               ` Joern Rennecke
2001-09-23  4:29                 ` Michael Hayes
2001-09-24  4:34                   ` Jan Hubicka
2001-09-22 10:17           ` Jan Hubicka
2001-08-22 16:04   ` RFC: BB duplication code Joern Rennecke
2001-08-22 16:44     ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).