* RFC: BB duplication code @ 2001-08-22 11:29 Jan Hubicka 2001-08-22 12:14 ` Richard Henderson 0 siblings, 1 reply; 25+ messages in thread From: Jan Hubicka @ 2001-08-22 11:29 UTC (permalink / raw) To: gcc-patches, rth, patches, gcc Hi, this patch contains BB duplication code I use in my bb-reorder code and tracer. It is tied into bb-reorder infrastructure and allows you to simply call reorder_duplicate_bb for basic block and place it anywhere in the reordered chain. While I am able to build bootstrapping and regtesting compiler with this change (+tracer) I would like to discuss some issues first: 1) In case of copying tablejumps, the vector remains shared. This is not good idea except very latest passes of compilation, but making tablejump unshared is dificult task, as code computing destination address using the table can be hoisted outside. What to do? Prohibit duplication of tablejumps? 2) Pairable notes are splitted off - I should probably handle the paired notes inside single basic block by duplicating, but what to do with notes starting or ending in the BB? What about EH region notes? Should I prohibit duplication if EH region starts or ends in the BB? 3) Can I update debug information using exiting bb-reorder infrastructure? It seems to work already somehow. 4) What about epilogues - I should probably copy the frame_related flag, but will the debug output happy to see multiple epilogues? It should probably, as it is able to see epilogue in the middle of code. What about EPILOGUE_BEGIN note? and FUNCTION_END? 5) Something else I've missed? Well, thats all for now. Hope that these issues will settle down somehow so I will be able to contribute the code soon. Honza Wed Aug 22 20:27:43 CEST 2001 Jan Hubicka <jh@suse.cz> * bb-reorder.c (reorder_duplicate_bb): New function. *** bb-reorder.c1 Tue Aug 21 17:47:08 2001 --- bb-reorder.c Wed Aug 22 20:22:35 2001 *************** static rtx get_next_bb_note PARAMS ((rt *** 196,201 **** --- 196,202 ---- static rtx get_prev_bb_note PARAMS ((rtx)); void verify_insn_chain PARAMS ((void)); + static basic_block reorder_duplicate_bb PARAMS ((basic_block, edge)); \f /* Skip over inter-block insns occurring after BB which are typically associated with BB (e.g., barriers). If there are any such insns, *************** record_effective_endpoints () *** 309,314 **** --- 310,458 ---- } } + /* Duplicate block BB and redirect edge E into it. */ + static basic_block + reorder_duplicate_bb (bb, e) + basic_block bb; + edge e; + { + rtx last = get_last_insn (); + rtx insn, new = NULL_RTX; + edge s, n; + basic_block new_bb; + + if (!bb->pred || !bb->pred->pred_next) + abort (); + + /* Create copy at the end of INSN chain. The chain will + be reordered later. */ + + new_bb = (basic_block) obstack_alloc (&flow_obstack, sizeof (*new_bb)); + memset (new_bb, 0, sizeof (*new_bb)); + if (new_bb->global_live_at_start) + { + new_bb->global_live_at_start = OBSTACK_ALLOC_REG_SET (&flow_obstack); + new_bb->global_live_at_end = OBSTACK_ALLOC_REG_SET (&flow_obstack); + COPY_REG_SET (new_bb->global_live_at_start, bb->global_live_at_start); + COPY_REG_SET (new_bb->global_live_at_end, bb->global_live_at_end); + } + + /* Place the new block after the last block. */ + VARRAY_GROW (basic_block_info, ++n_basic_blocks); + new_bb = (basic_block) obstack_alloc (&flow_obstack, sizeof (*new_bb)); + memset (new_bb, 0, sizeof (*new_bb)); + BASIC_BLOCK (n_basic_blocks - 1) = new_bb; + + new_bb = BASIC_BLOCK (n_basic_blocks - 1); + new_bb->loop_depth = bb->loop_depth; + for (s = bb->succ; s; s = s->succ_next) + { + n = (edge) xcalloc (1, sizeof (*n)); + n->src = new_bb; + n->dest = s->dest; + n->flags = s->flags; + n->probability = s->probability; + if (bb->count) + n->count = s->count * e->count / bb->count; + else + n->count = 0; + n->succ_next = new_bb->succ; + new_bb->succ = n; + n->pred_next = n->dest->pred; + n->dest->pred = n; + } + + new_bb->count = e->count; + new_bb->frequency = EDGE_FREQUENCY (e); + bb->count -= e->count; + bb->frequency -= EDGE_FREQUENCY (e); + + + for (insn = RBI (bb)->eff_head; insn != NEXT_INSN (RBI (bb)->eff_end); + insn = NEXT_INSN (insn)) + { + switch (GET_CODE (insn)) + { + case INSN: + new = emit_insn (copy_insn (PATTERN (insn))); + if (REG_NOTES (insn)) + REG_NOTES (new) = copy_insn (REG_NOTES (insn)); + break; + case JUMP_INSN: + /* Share jumptables for both copies of tablejump. */ + if (GET_CODE (PATTERN (insn)) == ADDR_VEC + || GET_CODE (PATTERN (insn)) == ADDR_DIFF_VEC) + continue; + new = emit_jump_insn (copy_insn (PATTERN (insn))); + if (REG_NOTES (insn)) + REG_NOTES (new) = copy_insn (REG_NOTES (insn)); + mark_jump_label (PATTERN (new), new, 0); + if (JUMP_LABEL (new)) + LABEL_NUSES (JUMP_LABEL (new))++; + break; + case CALL_INSN: + new = emit_call_insn (copy_insn (PATTERN (insn))); + if (REG_NOTES (insn)) + REG_NOTES (new) = copy_insn (REG_NOTES (insn)); + if (CALL_INSN_FUNCTION_USAGE (insn)) + CALL_INSN_FUNCTION_USAGE (new) = copy_insn (CALL_INSN_FUNCTION_USAGE (insn)); + case CODE_LABEL: + /*new = emit_label (gen_label_rtx ());*/ + break; + case BARRIER: + new = emit_barrier (); + break; + case NOTE: + switch (NOTE_LINE_NUMBER (insn)) + { + case NOTE_INSN_DELETED: + case NOTE_INSN_DELETED_LABEL: + case NOTE_INSN_LOOP_VTOP: + case NOTE_INSN_LOOP_CONT: + break; + case NOTE_INSN_BASIC_BLOCK: + new = emit_note (NULL, NOTE_INSN_BASIC_BLOCK); + NOTE_BASIC_BLOCK (new) = new_bb; + new_bb->head = new; + break; + case NOTE_INSN_LOOP_BEG: + case NOTE_INSN_LOOP_END: + case NOTE_INSN_RANGE_BEG: + case NOTE_INSN_RANGE_END: + case NOTE_INSN_EPILOGUE_BEG: + case NOTE_INSN_PROLOGUE_END: + case NOTE_INSN_BLOCK_BEG: + case NOTE_INSN_BLOCK_END: + case NOTE_INSN_FUNCTION_END: + case NOTE_INSN_FUNCTION_BEG: + case NOTE_INSN_EH_REGION_BEG: + case NOTE_INSN_EH_REGION_END: + /* Strip down pairable notes and keep them only + in the "master copy". */ + break; + default: + new = emit_note (NOTE_SOURCE_FILE (insn), NOTE_LINE_NUMBER (insn)); + } + } + if (bb->end == insn) + new_bb->end = new; + if (new && INSN_P (new) && basic_block_for_insn) + set_block_for_insn (new, new_bb); + /*debug_rtx (new);*/ + } + new_bb->index = n_basic_blocks + 1; + if (e->flags & EDGE_FALLTHRU) + redirect_edge_succ (e, new_bb); + else + redirect_edge_and_branch (e, new_bb); + new_bb->index = n_basic_blocks - 1; + new_bb->aux = xmalloc (sizeof (struct reorder_block_def)); + RBI (new_bb)->eff_head = NEXT_INSN (last); + RBI (new_bb)->eff_end = get_last_insn (); + RBI (new_bb)->scope = RBI (bb)->scope; + return new_bb; + } + /* Compute an ordering for a subgraph beginning with block BB. Record the ordering in RBI()->index and chained through RBI()->next. */ ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: RFC: BB duplication code 2001-08-22 11:29 RFC: BB duplication code Jan Hubicka @ 2001-08-22 12:14 ` Richard Henderson 2001-08-22 12:27 ` Jan Hubicka 2001-08-22 16:04 ` RFC: BB duplication code Joern Rennecke 0 siblings, 2 replies; 25+ messages in thread From: Richard Henderson @ 2001-08-22 12:14 UTC (permalink / raw) To: Jan Hubicka; +Cc: gcc-patches, patches, gcc On Wed, Aug 22, 2001 at 08:29:00PM +0200, Jan Hubicka wrote: > 1) In case of copying tablejumps, the vector remains shared. This is not > good idea except very latest passes of compilation, but making tablejump > unshared is dificult task, as code computing destination address using > the table can be hoisted outside. > What to do? Prohibit duplication of tablejumps? In the case of JUMP_TABLES_IN_TEXT_SECTION, yes, you absolutely cannot share the table. This is used by targets that use pc-relative addressing from the jump instruction to the table. VAX is the most extreme example in that the entire table is an argument to the instruction. Without JUMP_TABLES_IN_TEXT_SECTION, you might be able to get away with sharing the tables. I don't have a good feel for what sort of problems you might encounter. > 2) Pairable notes are splitted off - I should probably handle the > paired notes inside single basic block by duplicating, but what to do > with notes starting or ending in the BB? Which ones? > What about EH region notes? Should I prohibit duplication if EH region > starts or ends in the BB? There are no NOTE_INSN_EH_REG_BE/END notes during optimization. They only exist during initial rtl generation and during final assemble output. > 3) Can I update debug information using exiting bb-reorder infrastructure? > It seems to work already somehow. Yes, that's already implemented. > 4) What about epilogues - I should probably copy the frame_related flag, The frame_related flag is never set for epilogues, only prologues. > * bb-reorder.c (reorder_duplicate_bb): New function. Not ok yet -- you'll have to figure out what to do about addr_vec. > + CALL_INSN_FUNCTION_USAGE (new) = copy_insn (CALL_INSN_FUNCTION_USAGE (insn)); Wrap. > + case CODE_LABEL: > + /*new = emit_label (gen_label_rtx ());*/ [...] > + /*debug_rtx (new);*/ Debugging? r~ ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: RFC: BB duplication code 2001-08-22 12:14 ` Richard Henderson @ 2001-08-22 12:27 ` Jan Hubicka 2001-08-22 13:04 ` Richard Henderson 2001-08-22 16:04 ` RFC: BB duplication code Joern Rennecke 1 sibling, 1 reply; 25+ messages in thread From: Jan Hubicka @ 2001-08-22 12:27 UTC (permalink / raw) To: Richard Henderson, Jan Hubicka, gcc-patches, patches, gcc > On Wed, Aug 22, 2001 at 08:29:00PM +0200, Jan Hubicka wrote: > > 1) In case of copying tablejumps, the vector remains shared. This is not > > good idea except very latest passes of compilation, but making tablejump > > unshared is dificult task, as code computing destination address using > > the table can be hoisted outside. > > What to do? Prohibit duplication of tablejumps? > > In the case of JUMP_TABLES_IN_TEXT_SECTION, yes, you absolutely cannot > share the table. This is used by targets that use pc-relative addressing > from the jump instruction to the table. VAX is the most extreme example > in that the entire table is an argument to the instruction. > > Without JUMP_TABLES_IN_TEXT_SECTION, you might be able to get away with > sharing the tables. I don't have a good feel for what sort of problems > you might encounter. In case I do duplication soon, in order to form superblock and do subsequent optimizations on each superblock differently, I eventually will want to redirect jumptable in one of superblock (lets say by edge redirection) and due to sharing I will redirect both jumptables. IMO we should absolutely avoid shared jumptables during compilation and possibly make them shared at final pass, if this turns out to be common enought. But this has problem of tracking down instruction calculaitng the value. > > > 2) Pairable notes are splitted off - I should probably handle the > > paired notes inside single basic block by duplicating, but what to do > > with notes starting or ending in the BB? > > Which ones? Any of them. I can imagine, if I do have LOOP_BEG/LOOP_END pair, I can easilly preserve it, but if the loop cosist of multiple basic blocks, I will get LOOP_BEG in one instance and while duplicating the basic block i don't know if the basic block holding LOOP_END will get duplicated too. Sane way is to strip them. This works for superblock formation/loop peeling as it usually don't want to duplicate whole loop. But if I do, I lose the information. LOOP notes are not problem as we should move away from them and their absence does not harm, but what about BLOCK_BEG/BLOCK_END? > > > What about EH region notes? Should I prohibit duplication if EH region > > starts or ends in the BB? > > There are no NOTE_INSN_EH_REG_BE/END notes during optimization. They > only exist during initial rtl generation and during final assemble output. Good for me :) > > > 3) Can I update debug information using exiting bb-reorder infrastructure? > > It seems to work already somehow. > > Yes, that's already implemented. It is, but I need to understand how and see if it still works with presence of duplicated BB - OK, I will try to investigate this futher. > > > 4) What about epilogues - I should probably copy the frame_related flag, > > The frame_related flag is never set for epilogues, only prologues. > > > * bb-reorder.c (reorder_duplicate_bb): New function. > > Not ok yet -- you'll have to figure out what to do about addr_vec. > > > + CALL_INSN_FUNCTION_USAGE (new) = copy_insn (CALL_INSN_FUNCTION_USAGE (insn)); > > Wrap. > > > + case CODE_LABEL: > > + /*new = emit_label (gen_label_rtx ());*/ > [...] > > + /*debug_rtx (new);*/ > > Debugging? Oh yes - as commented, I didn't wanted to OKay it yet. But it looks like there are fewer issues than I've expected. So open is tablejumps - I suggest go initialy by prohibiting duplication. Later we may want with some conservative way - in case we will find the instruciton taking CODE_LABEL of ADDR_VEC as argument we will allow duplication otherwise we won't. For now I will add predicate can_diplicate_bb_p that will check this condition and fail in presense of tablejump. Other problem is BLOCK_BEG and debug info. I will try to investigate it overnight. Honza ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: RFC: BB duplication code 2001-08-22 12:27 ` Jan Hubicka @ 2001-08-22 13:04 ` Richard Henderson 2001-08-23 6:44 ` Jan Hubicka 0 siblings, 1 reply; 25+ messages in thread From: Richard Henderson @ 2001-08-22 13:04 UTC (permalink / raw) To: Jan Hubicka; +Cc: gcc-patches, patches, gcc On Wed, Aug 22, 2001 at 09:27:56PM +0200, Jan Hubicka wrote: > But this has problem of tracking down instruction calculaitng the value. Yep. I have no good idea how so solve that. > Sane way is to strip them. This works for superblock formation/loop peeling > as it usually don't want to duplicate whole loop. > But if I do, I lose the information. I'd say strip the loop notes. You don't really lose any information, in that correct information can always be obtained from the shape of the CFG, via flow_loop_nodes_find. > LOOP notes are not problem as we should move away from them and > their absence does not harm, but what about BLOCK_BEG/BLOCK_END? BLOCK_BEG/END must be properly duplicated for debug info to be correct. > It is, but I need to understand how and see if it still works with presence > of duplicated BB - OK, I will try to investigate this futher. It should. The reorder_blocks (bad name) function is also used for loop unrolling, which does duplication. Additionally, when bb-reorder moves two blocks that shared a BLOCK_BEG/END, duplicates are created. > So open is tablejumps - I suggest go initialy by prohibiting duplication. Seems good. > Later we may want with some conservative way - in case we will find the > instruciton taking CODE_LABEL of ADDR_VEC as argument we will allow > duplication otherwise we won't. Careful, if the target uses a pc-relative load, there may be no such instruction. Which is actually ideal for you. The best way would be to search the dominator tree for the insn that loads the label. What to do when you find it is somewhat complicated. It's not impossible that the entire table index, load, and decode could have been hoisted. There are probably simple cases that can be handled easily, but you'll probably always have to be able to cancel the duplication. r~ ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: RFC: BB duplication code 2001-08-22 13:04 ` Richard Henderson @ 2001-08-23 6:44 ` Jan Hubicka 2001-08-23 16:19 ` Richard Henderson 2001-09-07 20:34 ` Loop optimiser upgrade (Was RFC: BB duplication code) Michael Hayes 0 siblings, 2 replies; 25+ messages in thread From: Jan Hubicka @ 2001-08-23 6:44 UTC (permalink / raw) To: Richard Henderson, Jan Hubicka, gcc-patches, patches, gcc > On Wed, Aug 22, 2001 at 09:27:56PM +0200, Jan Hubicka wrote: > > But this has problem of tracking down instruction calculaitng the value. > > Yep. I have no good idea how so solve that. Neighter I do - so lets avoid duplication for now. I believe it is not critical for the optimizations to be effective. > > > Sane way is to strip them. This works for superblock formation/loop peeling > > as it usually don't want to duplicate whole loop. > > But if I do, I lose the information. > > I'd say strip the loop notes. You don't really lose any information, > in that correct information can always be obtained from the shape of > the CFG, via flow_loop_nodes_find. Agreed - anyway an loop optimizer is still an barrier for my effort. Until it is converted to use CFG, we can't make flow info survive. BB duplication framework can be usefull step in rewriting loop optimizer, as loop unrolling can be definitly done at the top of it. Converting whole loop optimizer at once can be huge task. Additionally some pieces should be already obsoletted (as code hoisting). So I think that in mid term, we can make the BIV/GIV discovery code idedendent on loop notes (it should not be that dificult) and start work on new loop unroller done as separate pass. I was looking at some compiler designs and loop unrolling seems to be done mostly at pretty late levels of compilation (as strength reduction is), while other loop transformations are done early. Do we have plans about the loop transformations done at tree level? > > > LOOP notes are not problem as we should move away from them and > > their absence does not harm, but what about BLOCK_BEG/BLOCK_END? > > BLOCK_BEG/END must be properly duplicated for debug info to be correct. I see. > > > It is, but I need to understand how and see if it still works with presence > > of duplicated BB - OK, I will try to investigate this futher. > > It should. The reorder_blocks (bad name) function is also used > for loop unrolling, which does duplication. Additionally, when > bb-reorder moves two blocks that shared a BLOCK_BEG/END, duplicates > are created. This means that when I am duplicating BB that starts BLOCK_BEG in the middle, I can easilly add an paired BLOCK_END at the end of basic block and get info correct? (symetrically att BLOCK_BEG to the begginign of block if needed)? > Careful, if the target uses a pc-relative load, there may be > no such instruction. Which is actually ideal for you. I see there are many side corners :( Honza ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: RFC: BB duplication code 2001-08-23 6:44 ` Jan Hubicka @ 2001-08-23 16:19 ` Richard Henderson 2001-08-24 6:35 ` Jan Hubicka 2001-09-07 20:34 ` Loop optimiser upgrade (Was RFC: BB duplication code) Michael Hayes 1 sibling, 1 reply; 25+ messages in thread From: Richard Henderson @ 2001-08-23 16:19 UTC (permalink / raw) To: Jan Hubicka; +Cc: gcc-patches, patches, gcc On Thu, Aug 23, 2001 at 03:44:00PM +0200, Jan Hubicka wrote: > This means that when I am duplicating BB that starts BLOCK_BEG in the > middle, I can easilly add an paired BLOCK_END at the end of basic block > and get info correct? Not quite. There are some data strucures that need updating. See the end of relate_bbs_with_scopes. r~ ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: RFC: BB duplication code 2001-08-23 16:19 ` Richard Henderson @ 2001-08-24 6:35 ` Jan Hubicka 0 siblings, 0 replies; 25+ messages in thread From: Jan Hubicka @ 2001-08-24 6:35 UTC (permalink / raw) To: Richard Henderson, Jan Hubicka, gcc-patches, patches, gcc > On Thu, Aug 23, 2001 at 03:44:00PM +0200, Jan Hubicka wrote: > > This means that when I am duplicating BB that starts BLOCK_BEG in the > > middle, I can easilly add an paired BLOCK_END at the end of basic block > > and get info correct? > > Not quite. There are some data strucures that need updating. > See the end of relate_bbs_with_scopes. Yes, I've went across the file last night and I believe I understand the issues already. It now appears to emit proper debug info, but I've introduced bootstrap failure :(, so I will send updated patch once this is done. Thanks a lot for your help! Honza > > > r~ ^ permalink raw reply [flat|nested] 25+ messages in thread
* Loop optimiser upgrade (Was RFC: BB duplication code) 2001-08-23 6:44 ` Jan Hubicka 2001-08-23 16:19 ` Richard Henderson @ 2001-09-07 20:34 ` Michael Hayes 2001-09-07 22:23 ` Daniel Berlin ` (5 more replies) 1 sibling, 6 replies; 25+ messages in thread From: Michael Hayes @ 2001-09-07 20:34 UTC (permalink / raw) To: Jan Hubicka; +Cc: gcc [-- Attachment #1: Type: text/plain, Size: 2389 bytes --] Jan Hubicka writes: > > I'd say strip the loop notes. You don't really lose any information, > > in that correct information can always be obtained from the shape of > > the CFG, via flow_loop_nodes_find. > Agreed - anyway an loop optimizer is still an barrier for my effort. > Until it is converted to use CFG, we can't make flow info survive. I agree that the loop notes should be weeded out. The one useful loop note is the VTOP note that indicates if the loop exit code has been duplicated. When this loop exists and we know that the loop body will be executed at least once if the loop header is entered. I'm not sure of an easy means to determine this from the CFG. > Converting whole loop optimizer at once can be huge task. Additionally some > pieces should be already obsoletted (as code hoisting). Yes, I agree. Before I got snowed under with my real job I had a big hack at converting the old loop optimiser to use the CFG. The problem was ensuring that I did not break anything. I think a better approach is to write another loop optimiser that can run after the current loop optimiser and then one day replace it completely. After the current loop optimiser has run, I'd say that we should strip the loop notes completely (or at least regenerate them from the CFG) and rely totally on the CFG. > So I think that in mid term, we can make the BIV/GIV discovery code idedendent > on loop notes (it should not be that dificult) and start work on new loop > unroller done as separate pass. Yes, this could be done standalone. However, it would need loop invariant discovery code first. This is one of the reasons I wrote the dataflow code that Dan Berlin has marvellously souped up for the new register allocator. It is also the reason I added routines to loop.c for sinking and hoisting insns so that I could track the changes to the dataflow. What started to stump me was an efficient way to update the dataflow information incrementally after each loop optimisation. To mitigate the dataflow computation needed, I structured the loop optimiser to optimise all the innermost loops first, then all the next level loops, and so on. This way the loops could be optimised independently without having to always recompute the dataflow information. I've attached the basic structure of the data flow based loop optimiser that I was working on. Michael [-- Attachment #2: loop-df.c --] [-- Type: text/plain, Size: 11441 bytes --] static void loop_notes_strip (f) rtx f; { rtx insn; /* Strip old loop notes. */ for (insn = f; insn; insn = NEXT_INSN (insn)) { if (GET_CODE (insn) == NOTE && (NOTE_LINE_NUMBER (insn) == NOTE_INSN_LOOP_BEG || NOTE_LINE_NUMBER (insn) == NOTE_INSN_LOOP_CONT || NOTE_LINE_NUMBER (insn) == NOTE_INSN_LOOP_END)) delete_insn (insn); } } static void loop_note_stats_dump (f, stream) rtx f; FILE *stream; { struct { int loops; int vtops; int conts; } loop_stats; /* Log old loop note stats. */ loop_stats.loops = 0; loop_stats.conts = 0; loop_stats.vtops = 0; for (insn = f; insn; insn = NEXT_INSN (insn)) { if (GET_CODE (insn) == NOTE && NOTE_LINE_NUMBER (insn) == NOTE_INSN_LOOP_BEG) loop_stats.loops++; else if (GET_CODE (insn) == NOTE && NOTE_LINE_NUMBER (insn) == NOTE_INSN_LOOP_CONT) loop_stats.conts++; else if (GET_CODE (insn) == NOTE && NOTE_LINE_NUMBER (insn) == NOTE_INSN_LOOP_VTOP) loop_stats.vtops++; } if (stream) fprintf (stream, "\n;; Loop stats: %d loops, %d conts, %d vtops\n", loop_stats.loops, loop_stats.conts, loop_stats.vtops); } /* Split an edge to create a new basic block and insert a NOTE_INSN_DELETED as a placeholder. */ static void loop_edge_split (e) edge e; { rtx seq; start_sequence (); emit_note (NULL_PTR, NOTE_INSN_DELETED); seq = gen_sequence (); end_sequence (); /* If we don't need to split the edge then we still want to add a dummy note placeholder to the end of the block so that we can insert insns before it without having to update bb->end all the time. */ insert_insn_on_edge (seq, e); } static void loop_pre_header_create (e) edge e; { basic_block bb = e->src; /* We know that this edge is the only forward edge into the loop header. */ /* If the source of the edge has a single successor then the source is the pre_header and we do not have to split any edges. */ if (bb->succ->succ_next == NULL && bb != ENTRY_BLOCK_PTR) { /* insert_insn_on_edge cannot handle a block with a single successor and with a single jump_insn at the end. Note that we know that the jump must be unconditional since the block has a single successor. We can use this jump insn to hoist insns before. */ if (GET_CODE (bb->end) == JUMP_INSN) return; /* We need a dummy insn at the end of the block to insert the insn before. loop_edge_split will do this for us although no actual edges get split. */ } loop_edge_split (e); } static void loop_landing_pad_create (e) edge e; { loop_edge_split (e); } /* Set the invalid flag for each loop we cannot process. */ static void loops_validate (loops) struct loops *loops; { int i; for (i = 0; i < loops->num; i++) { int j; struct loop *loop = &loops->array[i]; /* Invalidate natural loops with multiple forward edges into the loop header or with shared headers for now. These should be transformed to have a single entry edge to simplify hoisting or we could insert hoisted insns onto all entry edges. Also invalidate loops with abnormal entry or exit edges since we cannot split these edges. */ if (loop->num_entries != 1 || loop->shared || (loop->entry_edges[0]->flags & EDGE_ABNORMAL) != 0) { loop->invalid = 1; continue; } for (j = 0; j < loop->num_exits; j++) if ((loop->exit_edges[j]->flags & EDGE_ABNORMAL) != 0) loop->invalid = 1; } } /* Entry point of this file. Perform loop optimization on the current function. F is the first insn of the function and DUMPFILE is a stream for output of a trace of actions taken (or 0 if none should be output). */ void loop_optimize (f, dumpfile, flags) /* f is the first instruction of a chain of insns for one function. */ rtx f; FILE *dumpfile; int flags; { register rtx insn; register int i; struct loops loops_data; struct loops *loops = &loops_data; struct loop_info *loops_info; struct loop **loops_order; int loop_level; int n_basic_blocks_orig; loop_dump_stream = dumpfile; init_recog_no_volatile (); max_reg_before_loop = max_reg_num (); loop_note_stats_dump (f, loop_dump_stream); loop_notes_strip (f); /* Build CFG. */ find_basic_blocks (f, max_reg_before_loop, loop_dump_stream); cleanup_cfg (f); /* Find natural loops within the CFG. Only build the loop tree and find the loop entry and exit edges. */ flow_loops_find (loops, LOOP_ALL); if (! loops->num) { if (loop_dump_stream) fprintf (loop_dump_stream, ";; Loop stats: 0 natural loops, %d marked loops\n", loop_stats.loops); flow_loops_free (loops); return; } loops_validate (loops); loops_bbs_find (loops); /* Create pre-header and landing-pad nodes to hoist and sink insns into. Use dummy note as a placeholder so that we can insert insns before it. */ for (i = loops->num - 1; i >= 0; i--) { int j; struct loop *loop = &loops->array[i]; if (loop->invalid) continue; loop_pre_header_create (loop->entry_edges[0]); for (j = 0; j < loop->num_exits; j++) if ((loop->exit_edges[j]->flags & EDGE_FAKE) == 0) loop_landing_pad_create (loop->exit_edges[j]); /* Try inverting the loop; i.e., converting a while loop into a repeat loop. */ loop_invert (loop); } n_basic_blocks_orig = n_basic_blocks; commit_edge_insertions (); /* Add fake edges to the exit block for calls that may exit. This will introduce new basic blocks and new loop exits. We do not need landing pads for these new exits since they will never get executed. */ loop_call_edges_add (loops); if (n_basic_blocks != n_basic_blocks_orig) { if (loop_dump_stream) fprintf (loop_dump_stream, "\n;; Loop %d basic blocks created\n", n_basic_blocks - n_basic_blocks_orig); /* Update the loop data. This is only necessary if new basic blocks were created. */ flow_loops_update (loops, LOOP_ALL); loops_validate (loops); } /* Allocate and initialize auxiliary loop information. */ loops_info = xcalloc (loops->num, sizeof (struct loop_info)); for (i = 0; i < loops->num; i++) loops->array[i].aux = loops_info + i; /* Now find all register lifetimes. */ reg_scan (f, max_reg_before_loop, 1); /* Initialise dataflow information. */ df = df_init (); /* Copy pointer to dominator data. */ df->dom = loops->cfg.dom; /* This must occur after reg_scan so that registers created by gcse will have entries in the register tables. We could have added a call to reg_scan after gcse_main in toplev.c, but moving this call to init_alias_analysis is more efficient. */ init_alias_analysis (); /* Link loops in terms of the loop level. */ loops_order = xcalloc (loops->levels + 1, sizeof (*loops_order)); for (i = loops->num - 1; i >= 0; i--) { struct loop *loop = &loops->array[i]; loop->next = loops_order[loop->level]; loops_order[loop->level] = loop; } /* Now optimize the loops with all the innermost loops first (level 1), followed by all the level 2 loops, etc. Note that at each level the loops are independent and thus the transformations on a loop should not screw up the dataflow information for other loops at the same level. */ for (loop_level = 1; loop_level <= loops->levels; loop_level++) { struct loop *loop; if (loop_dump_stream) fprintf (loop_dump_stream, "\n;; Processing loops at level %d.\n", loop_level); /* Create data flow information. */ if (df_analyse (df, 0, DF_LOOP)) df_dump (df, DF_LOOP_DUMP, loop_dump_stream); /* Loop optimizations should only affect the dataflow information for the blocks within the loop and for the loop pre-header and landing-pad blocks. Newly created uses should not be exposed above the loop pre-header unless an optimization has gone wrong. Thus the existing bits within the IN bitmaps for reaching uses should not change. Similarly, newly created defs within the loop should not be used after the end of the loop landing-pads. Thus there is no need to propagate dataflow information before the loop pre-header or after the loop landing-pads. */ /* Scan loops looking for valid ones to optimize. */ for (loop = loops_order[loop_level]; loop && !loop->invalid; loop = loop->next) { struct loop_info *loop_info = LOOP_INFO (loop); if (loop->invalid) continue; /* Set up variables describing this loop. */ loop_scan (loop); /* Give up on this loop if it has a setjmp. */ if (loop_info->has_setjmp) { if (loop_dump_stream) fprintf (loop_dump_stream, "\n;; Loop%d ignored due to setjmp.\n", loop->num); loop->invalid = 1; continue; } } /* Perform loop invariant code motion. */ for (loop = loops_order[loop_level]; loop && !loop->invalid; loop = loop->next) { struct loop_regs *regs = LOOP_REGS (loop); if (loop_dump_stream) fprintf (loop_dump_stream, "\n;; Loop%d invariant code motion.\n", loop->num); loop_invariants_move (loop, flags); } /* Update data flow information. */ if (df_analyse (df, (sbitmap)-1, DF_LOOP)) df_dump (df, DF_LOOP_DUMP, loop_dump_stream); /* Perform shadowing of loop mems. */ for (loop = loops_order[loop_level]; loop && !loop->invalid; loop = loop->next) { if (loop_dump_stream) fprintf (loop_dump_stream, "\n;; Loop%d mem loading.\n", loop->num); loop_mems_load (loop); } /* Update data flow information for blocks that were modified. */ if (df_analyse (df, (sbitmap)-1, DF_LOOP)) df_dump (df, DF_LOOP_DUMP, loop_dump_stream); /* Find induction variables. */ for (loop = loops_order[loop_level]; loop && !loop->invalid; loop = loop->next) { } /* Invert loop. */ for (loop = loops_order[loop_level]; loop && !loop->invalid; loop = loop->next) { } /* Update data flow information for blocks that were modified. */ if (df_analyse (df, (sbitmap)-1, DF_LOOP)) df_dump (df, DF_LOOP_DUMP, loop_dump_stream); /* Unroll loop. */ for (loop = loops_order[loop_level]; loop && !loop->invalid; loop = loop->next) { } /* Update data flow information for blocks that were modified. */ if (df_analyse (df, (sbitmap)-1, DF_LOOP)) df_dump (df, DF_LOOP_DUMP, loop_dump_stream); /* Induction variable strength reduction. */ for (loop = loops_order[loop_level]; loop && !loop->invalid; loop = loop->next) { } } /* If there were lexical blocks inside the loop, they have been replicated. We will now have more than one NOTE_INSN_BLOCK_BEG and NOTE_INSN_BLOCK_END for each such block. We must duplicate the BLOCKs as well. */ if (write_symbols != NO_DEBUG) reorder_blocks (); end_alias_analysis (); if (loop_dump_stream) flow_loops_dump (loops, loop_dump_stream, loop_dump_aux, 1); /* Clean up. */ df_finish (df); free (loops_order); free (loops_info); flow_loops_free (loops); } ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Loop optimiser upgrade (Was RFC: BB duplication code) 2001-09-07 20:34 ` Loop optimiser upgrade (Was RFC: BB duplication code) Michael Hayes @ 2001-09-07 22:23 ` Daniel Berlin 2001-09-08 9:15 ` Jan Hubicka 2001-09-08 8:57 ` Jan Hubicka ` (4 subsequent siblings) 5 siblings, 1 reply; 25+ messages in thread From: Daniel Berlin @ 2001-09-07 22:23 UTC (permalink / raw) To: Michael Hayes; +Cc: Jan Hubicka, gcc Michael Hayes <m.hayes@elec.canterbury.ac.nz> writes: > Jan Hubicka writes: > > > > I'd say strip the loop notes. You don't really lose any information, > > > in that correct information can always be obtained from the shape of > > > the CFG, via flow_loop_nodes_find. > > Agreed - anyway an loop optimizer is still an barrier for my effort. > > Until it is converted to use CFG, we can't make flow info survive. > > I agree that the loop notes should be weeded out. The one useful loop > note is the VTOP note that indicates if the loop exit code has been > duplicated. When this loop exists and we know that the loop body will > be executed at least once if the loop header is entered. I'm not sure > of an easy means to determine this from the CFG. > > > Converting whole loop optimizer at once can be huge task. Additionally some > > pieces should be already obsoletted (as code hoisting). > > Yes, I agree. Before I got snowed under with my real job I had a big > hack at converting the old loop optimiser to use the CFG. The problem > was ensuring that I did not break anything. I think a better approach > is to write another loop optimiser that can run after the current loop > optimiser and then one day replace it completely. After the current > loop optimiser has run, I'd say that we should strip the loop notes > completely (or at least regenerate them from the CFG) and rely totally > on the CFG. > > > So I think that in mid term, we can make the BIV/GIV discovery code idedendent > > on loop notes (it should not be that dificult) and start work on new loop > > unroller done as separate pass. > > Yes, this could be done standalone. However, it would need loop > invariant discovery code first. This is one of the reasons I wrote > the dataflow code that Dan Berlin has marvellously souped up for the > new register allocator. It is also the reason I added routines to > loop.c for sinking and hoisting insns so that I could track the > changes to the dataflow. Loop invariant discovery as it stands right now isn't doing all it could, as i'm sure everyone knows. Without a fixed store motion i'm about to send off to Andreas for some more bootstraps (It passes bootstrap and make check on rs6000 and x86, after a few more fixes tonight), for the following simple dead stores in loop code: int global; float r; void f() { int q; int i; r=6; for (q = 0; q < 50; q++) { i = 1; global = 1; global = 2; r = 4; r=global; } return; global = 3; } We get: .file "dse.c" .text .align 2 .p2align 4,,15 .globl f .type f,@function f: pushl %ebp movl $0x40000000, %edx movl %esp, %ebp movl $0x40c00000, r movl $49, %eax .p2align 4,,15 .L5: movl %edx, r decl %eax jns .L5 popl %ebp movl $2, %eax movl %eax, global ret Which completely misses the fact that r is loop invariant, for some reason. However, a working store motion turns it into: .globl f .type f,@function f: pushl %ebp movl %esp, %ebp movl $49, %eax .p2align 4,,15 .L5: decl %eax jns .L5 popl %ebp movl $2, %eax movl %eax, global movl $0x40000000, %eax movl %eax, r ret Load motion isn't as good as store motion yet, and leaves one r/o reg for loop to hoist. However, i'm curious as to why we don't notice that r was invariant in the current loop code. Any ideas? > > What started to stump me was an efficient way to update the dataflow > information incrementally after each loop optimisation. To mitigate > the dataflow computation needed, I structured the loop optimiser to > optimise all the innermost loops first, then all the next level loops, > and so on. This way the loops could be optimised independently > without having to always recompute the dataflow information. > > I've attached the basic structure of the data flow based loop > optimiser that I was working on. > > Michael > > -- "My girlfriend asked me how long I was going to be gone on this tour. I said, "the whole time." "-Steven Wright ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Loop optimiser upgrade (Was RFC: BB duplication code) 2001-09-07 22:23 ` Daniel Berlin @ 2001-09-08 9:15 ` Jan Hubicka 0 siblings, 0 replies; 25+ messages in thread From: Jan Hubicka @ 2001-09-08 9:15 UTC (permalink / raw) To: Daniel Berlin; +Cc: Michael Hayes, Jan Hubicka, gcc > Load motion isn't as good as store motion yet, and leaves one r/o reg > for loop to hoist. > > However, i'm curious as to why we don't notice that r was invariant in > the current loop code. > Any ideas? It is because the invariant code simply "sees" that R is something at loop entry and set to something else in the body. It does not know that the value is used only with the new one as all analysis it does it to look at variable's live range and see that variable is live before the definition concluding it is used too. Using DU/DF will solve it. Anyway I think the loop invariant code, once PRE/store/load motion is at place don't need to attempt to be smarter. It should simply know that value is invariant if it is not set in the loop, as other cases should be already put away. We only need invariant information to discover BIVs/GIVs incremented/multiplied by invariant registers. Honza ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Loop optimiser upgrade (Was RFC: BB duplication code) 2001-09-07 20:34 ` Loop optimiser upgrade (Was RFC: BB duplication code) Michael Hayes 2001-09-07 22:23 ` Daniel Berlin @ 2001-09-08 8:57 ` Jan Hubicka 2001-09-08 14:00 ` Daniel Berlin 2001-09-08 9:09 ` Jan Hubicka ` (3 subsequent siblings) 5 siblings, 1 reply; 25+ messages in thread From: Jan Hubicka @ 2001-09-08 8:57 UTC (permalink / raw) To: Michael Hayes; +Cc: Jan Hubicka, gcc > Jan Hubicka writes: > > > > I'd say strip the loop notes. You don't really lose any information, > > > in that correct information can always be obtained from the shape of > > > the CFG, via flow_loop_nodes_find. > > Agreed - anyway an loop optimizer is still an barrier for my effort. > > Until it is converted to use CFG, we can't make flow info survive. > > I agree that the loop notes should be weeded out. The one useful loop > note is the VTOP note that indicates if the loop exit code has been > duplicated. When this loop exists and we know that the loop body will > be executed at least once if the loop header is entered. I'm not sure > of an easy means to determine this from the CFG. It should be possible to tie this into branch prediction scheme, as I do currently with the knowledge of iteration count. Overall I believe that future loop optimizer should be able to do loop peeling (and header duplication as special case), instead letting previous passes doing it. This way we will probably not need to preserve the information too long. > > > Converting whole loop optimizer at once can be huge task. Additionally some > > pieces should be already obsoletted (as code hoisting). > > Yes, I agree. Before I got snowed under with my real job I had a big > hack at converting the old loop optimiser to use the CFG. The problem It would be excellent to see your code. > was ensuring that I did not break anything. I think a better approach > is to write another loop optimiser that can run after the current loop > optimiser and then one day replace it completely. After the current 100% agreed - actually I was suggesting that in separate email. I am thinking about doing that at CFG branch. I think we should follow > loop optimiser has run, I'd say that we should strip the loop notes > completely (or at least regenerate them from the CFG) and rely totally > on the CFG. > > > So I think that in mid term, we can make the BIV/GIV discovery code idedendent > > on loop notes (it should not be that dificult) and start work on new loop > > unroller done as separate pass. > > Yes, this could be done standalone. However, it would need loop > invariant discovery code first. This is one of the reasons I wrote > the dataflow code that Dan Berlin has marvellously souped up for the > new register allocator. It is also the reason I added routines to > loop.c for sinking and hoisting insns so that I could track the > changes to the dataflow. Looks your plan is compatible. I planned steps 1) create an "loop module" able to analuze loop and get invariants 2) create an BIV/GIV module 3) rewrite strength reduction and loop unrolling on top of it. We should be able to drop invariant/memory hoisting code as it is redundant with PRE/load/store motion. > > What started to stump me was an efficient way to update the dataflow > information incrementally after each loop optimisation. To mitigate > the dataflow computation needed, I structured the loop optimiser to > optimise all the innermost loops first, then all the next level loops, > and so on. This way the loops could be optimised independently > without having to always recompute the dataflow information. You mean the DU/DF chains? Thats problem - how it is handled by other compilers? Can't we get them updated incrementaly? > > I've attached the basic structure of the data flow based loop > optimiser that I was working on. Great. It is probably time to start the CFG branch and get your work into it. Honza > > Michael > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Loop optimiser upgrade (Was RFC: BB duplication code) 2001-09-08 8:57 ` Jan Hubicka @ 2001-09-08 14:00 ` Daniel Berlin 0 siblings, 0 replies; 25+ messages in thread From: Daniel Berlin @ 2001-09-08 14:00 UTC (permalink / raw) To: Jan Hubicka; +Cc: Michael Hayes, gcc Jan Hubicka <jh@suse.cz> writes: > You mean the DU/DF chains? > Thats problem - how it is handled by other compilers? Can't we get them > updated incrementaly? A few do elimination based dataflow or it's ilk. A few just do loop unrolling/optimizations on SSA form. Until we have a mid level RTL, the second isn't really an option. I've got DJ-graph elimination based dataflow, with incremental updating, working okay, but it needs some massive cleanup. As it stands right now, df can update incrementally, just make sure you don't depend on df->defs[x] and df->uses[x] being non-null (since we may create holes when we unlink the now deleted defs/uses). I only mention it because it's easy to assume you can just run through all the defs in the defs array, and that none of them will be NULL. > > >> I've attached the basic structure of the data flow based loop >> optimiser that I was working on. > Great. It is probably time to start the CFG branch and get your work > into it. > > Honza > > >> Michael >> > -- "In Vegas, I got into a long argument with the man at the roulette wheel over what I considered to be an odd number. "-Steven Wright ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Loop optimiser upgrade (Was RFC: BB duplication code) 2001-09-07 20:34 ` Loop optimiser upgrade (Was RFC: BB duplication code) Michael Hayes 2001-09-07 22:23 ` Daniel Berlin 2001-09-08 8:57 ` Jan Hubicka @ 2001-09-08 9:09 ` Jan Hubicka 2001-09-08 16:56 ` Michael Hayes 2001-09-08 9:33 ` Jan Hubicka ` (2 subsequent siblings) 5 siblings, 1 reply; 25+ messages in thread From: Jan Hubicka @ 2001-09-08 9:09 UTC (permalink / raw) To: Michael Hayes; +Cc: Jan Hubicka, gcc Michael, I've run across your code and I would love to see it all so we can work together on getting it useable. Basically it looks great, especially because the DU/DF and natural loop framework is already used and tested in gcc, so it should not be so dificult to continue. I am happy you implemented the preheader and landing pad construction as it will simplify my CFGcleanup code. I have one major comment. What do you think about restructuring it to kind of "loop library" as I've mentioned previously? As we already have code to discover loops, I would love to see everything designed in similar way - ie function to create preheaders/landing pads, function to discover invariants, function to discover BIVs and implement each optimization as separate pass. This should help us to get better ordering of passes and simplify maitance of it. Major problem of loop optimizer now is that it is such a huge beast so dificult to replace. Also once midlevel RTL and SSA is at place, we definitly want to move some optimizations there and keep other in late passes. I understand there is problem with DU/DF updating. So perhaps the passes should be called by driver per loop basis - that will save the idea of optimizing several unnested loops and then re-update DU/DF. Honza ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Loop optimiser upgrade (Was RFC: BB duplication code) 2001-09-08 9:09 ` Jan Hubicka @ 2001-09-08 16:56 ` Michael Hayes 2001-09-09 1:23 ` Jan Hubicka 2001-09-26 17:27 ` Jan Hubicka 0 siblings, 2 replies; 25+ messages in thread From: Michael Hayes @ 2001-09-08 16:56 UTC (permalink / raw) To: Jan Hubicka; +Cc: Michael Hayes, gcc Jan Hubicka writes: > What do you think about restructuring it to kind of "loop library" > as I've mentioned previously? > > As we already have code to discover loops, I would love to see everything > designed in similar way - ie function to create preheaders/landing pads, > function to discover invariants, function to discover BIVs and implement each > optimization as separate pass. Yes, I agree. Each of these modules should be decoupled as much as possible to simplify the maintenance. Maybe we should try the unthinkable and have a go at designing the interfaces first ;-) Michael. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Loop optimiser upgrade (Was RFC: BB duplication code) 2001-09-08 16:56 ` Michael Hayes @ 2001-09-09 1:23 ` Jan Hubicka 2001-09-26 17:27 ` Jan Hubicka 1 sibling, 0 replies; 25+ messages in thread From: Jan Hubicka @ 2001-09-09 1:23 UTC (permalink / raw) To: Michael Hayes; +Cc: Jan Hubicka, gcc > > Yes, I agree. Each of these modules should be decoupled as much as > possible to simplify the maintenance. > > Maybe we should try the unthinkable and have a go at designing the > interfaces first ;-) Sounds like good idea. I still do have dificulties in thinking how all those parts we plan fits together. I have quite clean plans about CFG infrastructure, but concerning the loop optimizer my little mind still goes crazy. Maybe you noticed that some bits are already in. For instance the BB pre-header code you have can be rewriten to use redirect_edge_and_branch. Simply split the header first and then redirect loopback edge. That will work even with multiple header predecesors. First thing that comes into mind is how to update the "loops" datastructure after updating the CFG? It is probably needed to do so while unrolling loops or peeling, as the loop nest will force too many rebuilds otherwise. Main problem I see is in the bitmaps used to hold nodes of the CFG. Lets take example of simple preheader/sink creation pass. Any creating of basic block would need shifting all these bitmaps. If we use consistently create_block function (I plan to do so), we can probably shift the bitmaps there, but it is ugly. One alternative I see is to not represent the bitmap explicitly, instead maitain dominance tree and use it to always construct the entries at fly. (that should be possible easilly or not?) Another solution can be to use DJ graphs, that can be less sensitive to such a local updates. What do you think? OK, another issues later :) Honza > > Michael. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Loop optimiser upgrade (Was RFC: BB duplication code) 2001-09-08 16:56 ` Michael Hayes 2001-09-09 1:23 ` Jan Hubicka @ 2001-09-26 17:27 ` Jan Hubicka 1 sibling, 0 replies; 25+ messages in thread From: Jan Hubicka @ 2001-09-26 17:27 UTC (permalink / raw) To: Michael Hayes; +Cc: Jan Hubicka, gcc > Jan Hubicka writes: > > What do you think about restructuring it to kind of "loop library" > > as I've mentioned previously? > > > > As we already have code to discover loops, I would love to see everything > > designed in similar way - ie function to create preheaders/landing pads, > > function to discover invariants, function to discover BIVs and implement each > > optimization as separate pass. > > Yes, I agree. Each of these modules should be decoupled as much as > possible to simplify the maintenance. > > Maybe we should try the unthinkable and have a go at designing the > interfaces first ;-) Hmm, this still sounds temping. Perhaps we can try to sumarize what we want to accomplish at RTL level. I following optimizations are IMO thinkable: 1) code hoisting/invariant code motion - this should be obsoletted by PRE/load/store motion 2) strength reduction 3) "prefetch" instruction genration and probably dropping the "nontemporal" hints to loads too 4) loop peeling 5) loop unrolling+preconditioning+variable expansion 6) induction variable removal - this should be obsoleted by proper dead code elimination 7) Something else? About 4) - I already do have code that is able to peel loop based on the bb-reorder pass. Problem is that the representation a bit differs there from the normal insn stream, as the CFG contains disjunt basic block and the fallthru edges may point anywhere. This makes duplicating the basic blocks very convenient, but I am not quite sure if it will work well with rest of loop infrastructure. Also the header duplication can be seen as special case of loop peeling. There is still problem how to keep the "VTOP" infromation around.. About 5) - that one should be done IMO as the peeling is - just by duplicating. How much do we want to integrate it with scheduling and software pipelining? Also needs it to be done together with preconditioning? I do have the code to turn pseudos into webs that, in combination with CSE, should replace the variable expansion done by current unroller. Honza > > Michael. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Loop optimiser upgrade (Was RFC: BB duplication code) 2001-09-07 20:34 ` Loop optimiser upgrade (Was RFC: BB duplication code) Michael Hayes ` (2 preceding siblings ...) 2001-09-08 9:09 ` Jan Hubicka @ 2001-09-08 9:33 ` Jan Hubicka 2001-09-19 22:23 ` Joern Rennecke 2001-09-22 10:17 ` Jan Hubicka 5 siblings, 0 replies; 25+ messages in thread From: Jan Hubicka @ 2001-09-08 9:33 UTC (permalink / raw) To: Michael Hayes; +Cc: Jan Hubicka, gcc > Jan Hubicka writes: > > > > I'd say strip the loop notes. You don't really lose any information, > > > in that correct information can always be obtained from the shape of > > > the CFG, via flow_loop_nodes_find. > > Agreed - anyway an loop optimizer is still an barrier for my effort. > > Until it is converted to use CFG, we can't make flow info survive. > > I agree that the loop notes should be weeded out. The one useful loop > note is the VTOP note that indicates if the loop exit code has been > duplicated. When this loop exists and we know that the loop body will Just to note, the case we were speaking about is the tracer and dropping loop notes in the code duplicates. This means that it will not affect current loop optimizer, as the BB duplication is run afterwards (but if loop optimizer were sane, it should be run before). Honza ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Loop optimiser upgrade (Was RFC: BB duplication code) 2001-09-07 20:34 ` Loop optimiser upgrade (Was RFC: BB duplication code) Michael Hayes ` (3 preceding siblings ...) 2001-09-08 9:33 ` Jan Hubicka @ 2001-09-19 22:23 ` Joern Rennecke 2001-09-21 19:11 ` Michael Hayes 2001-09-22 10:17 ` Jan Hubicka 5 siblings, 1 reply; 25+ messages in thread From: Joern Rennecke @ 2001-09-19 22:23 UTC (permalink / raw) To: Michael Hayes; +Cc: Jan Hubicka, gcc > I agree that the loop notes should be weeded out. The one useful loop > note is the VTOP note that indicates if the loop exit code has been > duplicated. When this loop exists and we know that the loop body will > be executed at least once if the loop header is entered. I'm not sure > of an easy means to determine this from the CFG. Actually, that is the easy part. You just have to look if the loop is solely entered by fallthrough from the preceding block. In the non-vtop while loop case, you have a jump from outside to the loop exit test. However, the VTOP note means more than just that the loop is executed at least once when entered. It also means that the loop exit condition will be false at the start of the first iteration. This knowledge allows you to do some extra strength reduction and unroll loop preconditioning transforations without emitting extra runtime checks. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Loop optimiser upgrade (Was RFC: BB duplication code) 2001-09-19 22:23 ` Joern Rennecke @ 2001-09-21 19:11 ` Michael Hayes 2001-09-21 21:16 ` Joern Rennecke 0 siblings, 1 reply; 25+ messages in thread From: Michael Hayes @ 2001-09-21 19:11 UTC (permalink / raw) To: Joern Rennecke; +Cc: Michael Hayes, Jan Hubicka, gcc Joern Rennecke writes: > > I agree that the loop notes should be weeded out. The one useful loop > > note is the VTOP note that indicates if the loop exit code has been > > duplicated. When this loop exists and we know that the loop body will > > be executed at least once if the loop header is entered. I'm not sure > > of an easy means to determine this from the CFG. > > Actually, that is the easy part. You just have to look if the loop is > solely entered by fallthrough from the preceding block. In the non-vtop > while loop case, you have a jump from outside to the loop exit > test. Are you sure? A simple while loop has a CFG: +---+ v | --->H-->L | v where H is the loop header and L is the loop latch node. If we transform the while loop into a do-while loop and duplicate the loop exit test we have something like: +---+ v | -->B-->H-->L--+--> | ^ | | +----------+ where B is a node containing the duplicated exit test that is used to bypass the loop. My question is how do we tell this from a do-while loop with a goto around it? > However, the VTOP note means more than just that the loop is executed > at least once when entered. It also means that the loop exit condition > will be false at the start of the first iteration. This knowledge > allows you to do some extra strength reduction and unroll loop > preconditioning transforations without emitting extra runtime checks. I realise this thanks. Michael. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Loop optimiser upgrade (Was RFC: BB duplication code) 2001-09-21 19:11 ` Michael Hayes @ 2001-09-21 21:16 ` Joern Rennecke 2001-09-23 4:29 ` Michael Hayes 0 siblings, 1 reply; 25+ messages in thread From: Joern Rennecke @ 2001-09-21 21:16 UTC (permalink / raw) To: Michael Hayes; +Cc: Joern Rennecke, Michael Hayes, Jan Hubicka, gcc > My question is how do we tell this from a do-while loop with a goto > around it? You don't - and it's immaterial for the purpose of spotting loops that are executed at least once if entered. That's a property all do-wile loops have, by definition. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Loop optimiser upgrade (Was RFC: BB duplication code) 2001-09-21 21:16 ` Joern Rennecke @ 2001-09-23 4:29 ` Michael Hayes 2001-09-24 4:34 ` Jan Hubicka 0 siblings, 1 reply; 25+ messages in thread From: Michael Hayes @ 2001-09-23 4:29 UTC (permalink / raw) To: Joern Rennecke; +Cc: Michael Hayes, Joern Rennecke, Jan Hubicka, gcc Joern Rennecke writes: > > My question is how do we tell this from a do-while loop with a goto > > around it? > > You don't - and it's immaterial for the purpose of spotting loops that are > executed at least once if entered. That's a property all do-wile loops > have, by definition. Sure! I guess I didn't explain myself well or you missed my point. The advantage of knowing that the loop exit test has been duplicated is that when one of these transformed loops is entered we know that the exit test will be true on the first iteration. This allows us to compute the number (or maximum number) of iterations of well behaved loops without having do perform value range propagation. The point I was trying to make is that I do not know of a simple, general, way of looking at the CFG to see if the loop exit has been duplicated. I cannot see how your suggestion that there may be a fall-through edge into the loop header will tell us this since I can construct an equivalent CFG using a goto. Michael. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Loop optimiser upgrade (Was RFC: BB duplication code) 2001-09-23 4:29 ` Michael Hayes @ 2001-09-24 4:34 ` Jan Hubicka 0 siblings, 0 replies; 25+ messages in thread From: Jan Hubicka @ 2001-09-24 4:34 UTC (permalink / raw) To: Michael Hayes; +Cc: Joern Rennecke, Joern Rennecke, Jan Hubicka, gcc > The point I was trying to make is that I do not know of a simple, > general, way of looking at the CFG to see if the loop exit has been > duplicated. I cannot see how your suggestion that there may be a > fall-through edge into the loop header will tell us this since > I can construct an equivalent CFG using a goto. Surely that idea won't work. Only way I see is to duplicate in the loop optimizer and attach the information to "loop" structure that will hopefully be updated untill all relevant loop optimizations passes. BTW did I sent you the bugreport about the loop discovery code, or I did forgot about that? Honza > > Michael. > > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Loop optimiser upgrade (Was RFC: BB duplication code) 2001-09-07 20:34 ` Loop optimiser upgrade (Was RFC: BB duplication code) Michael Hayes ` (4 preceding siblings ...) 2001-09-19 22:23 ` Joern Rennecke @ 2001-09-22 10:17 ` Jan Hubicka 5 siblings, 0 replies; 25+ messages in thread From: Jan Hubicka @ 2001-09-22 10:17 UTC (permalink / raw) To: Michael Hayes; +Cc: Jan Hubicka, gcc Hi, another simple question - what notion of loop we want to base our optimizer on? The natural loops itself are probably not best choice, as in case of following loop for (i=0; i<1000;i++) if (i&1) do something; else do something else; will result in two natural loop each iterating exactly once. Do you have some plans about that? We can merge the loops based on profile data, probably - the compaq compiler unsplits each such shared header to common and uncommon one causing two nested loops. Will we work on multiple loops, or split the header to get common latch and thus one natural loop from both? Whats about the notion of superblock loops - the impact compiler is doing some optimizations only on these and thus they are easier to implement. With my tracer code, hot paths of most of internal loops should convert to superblocks, but still this can be somewhat limited, as too much can fall outside the loop notion... Just ideas... Honza ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: RFC: BB duplication code 2001-08-22 12:14 ` Richard Henderson 2001-08-22 12:27 ` Jan Hubicka @ 2001-08-22 16:04 ` Joern Rennecke 2001-08-22 16:44 ` Richard Henderson 1 sibling, 1 reply; 25+ messages in thread From: Joern Rennecke @ 2001-08-22 16:04 UTC (permalink / raw) To: Richard Henderson; +Cc: Jan Hubicka, gcc-patches, patches, gcc > In the case of JUMP_TABLES_IN_TEXT_SECTION, yes, you absolutely cannot > share the table. This is used by targets that use pc-relative addressing > from the jump instruction to the table. VAX is the most extreme example > in that the entire table is an argument to the instruction. This is target-dependent. Some targets just put the jump table in the text section because the linker can't handle cross-section differences, and they use differences (either always or for pic / PIC). ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: RFC: BB duplication code 2001-08-22 16:04 ` RFC: BB duplication code Joern Rennecke @ 2001-08-22 16:44 ` Richard Henderson 0 siblings, 0 replies; 25+ messages in thread From: Richard Henderson @ 2001-08-22 16:44 UTC (permalink / raw) To: Joern Rennecke; +Cc: Jan Hubicka, gcc-patches, patches, gcc On Thu, Aug 23, 2001 at 12:04:48AM +0100, Joern Rennecke wrote: > This is target-dependent. Some targets just put the jump table in the text > section because the linker can't handle cross-section differences, and > they use differences (either always or for pic / PIC). True, JUMP_TABLES_IN_TEXT_SECTION is an overestimate. However it's fairly close to accurate. Certainly it doesn't miss any cases for which pc-relative loads *are* used. r~ ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2001-09-26 17:27 UTC | newest] Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2001-08-22 11:29 RFC: BB duplication code Jan Hubicka 2001-08-22 12:14 ` Richard Henderson 2001-08-22 12:27 ` Jan Hubicka 2001-08-22 13:04 ` Richard Henderson 2001-08-23 6:44 ` Jan Hubicka 2001-08-23 16:19 ` Richard Henderson 2001-08-24 6:35 ` Jan Hubicka 2001-09-07 20:34 ` Loop optimiser upgrade (Was RFC: BB duplication code) Michael Hayes 2001-09-07 22:23 ` Daniel Berlin 2001-09-08 9:15 ` Jan Hubicka 2001-09-08 8:57 ` Jan Hubicka 2001-09-08 14:00 ` Daniel Berlin 2001-09-08 9:09 ` Jan Hubicka 2001-09-08 16:56 ` Michael Hayes 2001-09-09 1:23 ` Jan Hubicka 2001-09-26 17:27 ` Jan Hubicka 2001-09-08 9:33 ` Jan Hubicka 2001-09-19 22:23 ` Joern Rennecke 2001-09-21 19:11 ` Michael Hayes 2001-09-21 21:16 ` Joern Rennecke 2001-09-23 4:29 ` Michael Hayes 2001-09-24 4:34 ` Jan Hubicka 2001-09-22 10:17 ` Jan Hubicka 2001-08-22 16:04 ` RFC: BB duplication code Joern Rennecke 2001-08-22 16:44 ` Richard Henderson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).