* PowerPC shrink-wrap support 0 of 3
From: Alan Modra @ 2011-09-17 7:59 UTC
To: gcc-patches, Bernd Schmidt, David Edelsohn

This patch series adds shrink-wrap support for PowerPC.  The patches
are on top of Bernd's "Initial shrink-wrapping patch",
http://gcc.gnu.org/ml/gcc-patches/2011-08/msg02557.html, but with the
tm.texi patch applied to tm.texi.in.  Bootstrapped and regression
tested on powerpc64-linux, all languages except Ada, and SPEC CPU2006
tested.  The SPEC results were a little disappointing, as I expected to
see some gains, but my baseline was a -O3 run and I suppose most of the
shrink-wrap opportunities were lost to inlining.

I deliberately omitted defining RETURN_ADDR_REGNUM for PowerPC as I
don't see why it is necessary to treat LR any differently from other
registers saved by the prologue.  I believe this code in
requires_stack_frame_p

  CLEAR_HARD_REG_SET (hardregs);
  note_stores (PATTERN (insn), record_hard_reg_sets, &hardregs);
  if (hard_reg_set_intersect_p (hardregs, prologue_used))
    return true;

will correctly pick up any set of LR.  If code in a function body sets
LR without saving and restoring around the use, then the prologue must
save LR, so LR will be in prologue_used.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread
* PowerPC shrink-wrap support 1 of 3
From: Alan Modra @ 2011-09-17 8:22 UTC
To: gcc-patches, Bernd Schmidt

This obvious patch extends my 2011-08-03 fix to simple_return.  It is
necessary for the same reason as the original patch, namely that
rs6000.md has a peephole2 that matches and recreates a conditional
jump.  With the third patch in this series applied, rs6000 will start
to use simple_return in conditional jumps.  Even though I think this
patch meets the "obvious" requirement, I won't apply it without the
other two, since the assert may trigger on mips without the second
patch in this series.

	PR rtl-optimization/49941
	* jump.c (mark_jump_label_1): Set JUMP_LABEL for simple_return
	jumps.

diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/jump.c gcc-current/gcc/jump.c
--- gcc-bernd/gcc/jump.c	2011-09-16 11:52:14.000000000 +0930
+++ gcc-current/gcc/jump.c	2011-09-16 09:59:39.000000000 +0930
@@ -1086,6 +1086,7 @@ mark_jump_label_1 (rtx x, rtx insn, bool
       return;
 
     case RETURN:
+    case SIMPLE_RETURN:
       if (is_target)
 	{
 	  gcc_assert (JUMP_LABEL (insn) == NULL || JUMP_LABEL (insn) == x);
@@ -1408,7 +1409,7 @@ redirect_exp_1 (rtx *loc, rtx olabel, rt
   int i;
   const char *fmt;
 
-  if ((code == LABEL_REF && XEXP (x, 0) == olabel) 
+  if ((code == LABEL_REF && XEXP (x, 0) == olabel)
       || x == olabel)
     {
       x = redirect_target (nlabel);

-- 
Alan Modra
Australia Development Lab, IBM
* PowerPC shrink-wrap support 2 of 3
From: Alan Modra @ 2011-09-17 9:13 UTC
To: gcc-patches, Bernd Schmidt

This patch corrects some cases where a return jump's JUMP_LABEL wrongly
uses ret_rtx for a simple_return, thus triggering the assert in
mark_jump_label_1.  PowerPC doesn't need this patch with the final
version of my shrink-wrap support, but at one stage I was using

(define_expand "return"
  [(simple_return)]
  "direct_return ()")

(define_insn "simple_return"
  [(simple_return)]
  ""
  "{br|blr}"
  [(set_attr "type" "jmpreg")])

in rs6000.md, similar to mips.md.  So gen_return () would create an
instruction using simple_return, and the code patched here would put
ret_rtx into JUMP_LABEL.

	* rtl.h (set_return_jump_label): Declare.
	* function.c (set_return_jump_label): New function, extracted..
	(thread_prologue_and_epilogue_insns): ..from here.  Use it in
	another instance to set return jump_label.
	* cfgrtl.c (force_nonfallthru_and_redirect): Use
	set_return_jump_label.
	* reorg.c (find_end_label): Likewise.
diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/rtl.h gcc-current/gcc/rtl.h
--- gcc-bernd/gcc/rtl.h	2011-09-16 11:52:14.000000000 +0930
+++ gcc-current/gcc/rtl.h	2011-09-16 00:24:55.000000000 +0930
@@ -2502,6 +2502,7 @@ extern int sibcall_epilogue_contains (co
 extern void mark_temp_addr_taken (rtx);
 extern void update_temp_slot_address (rtx, rtx);
 extern void maybe_copy_prologue_epilogue_insn (rtx, rtx);
+extern void set_return_jump_label (rtx);
 
 /* In stmt.c */
 extern void expand_null_return (void);
diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/function.c gcc-current/gcc/function.c
--- gcc-bernd/gcc/function.c	2011-09-16 11:53:40.000000000 +0930
+++ gcc-current/gcc/function.c	2011-09-16 00:24:55.000000000 +0930
@@ -5395,6 +5395,20 @@ emit_return_into_block (bool simple_p, b
 }
 #endif
 
+/* Set JUMP_LABEL for a return insn.  */
+
+void
+set_return_jump_label (rtx returnjump)
+{
+  rtx pat = PATTERN (returnjump);
+  if (GET_CODE (pat) == PARALLEL)
+    pat = XVECEXP (pat, 0, 0);
+  if (ANY_RETURN_P (pat))
+    JUMP_LABEL (returnjump) = pat;
+  else
+    JUMP_LABEL (returnjump) = ret_rtx;
+}
+
 /* Generate the prologue and epilogue RTL if the machine supports it.  Thread
    this into place with notes indicating where the prologue ends and where
    the epilogue begins.  Update the basic block information when possible.
@@ -5883,7 +5897,7 @@ thread_prologue_and_epilogue_insns (void
 	  emit_return_into_block (false, last_bb);
 	  epilogue_end = BB_END (last_bb);
 	  if (JUMP_P (epilogue_end))
-	    JUMP_LABEL (epilogue_end) = ret_rtx;
+	    set_return_jump_label (epilogue_end);
 	  single_succ_edge (last_bb)->flags &= ~EDGE_FALLTHRU;
 	  goto epilogue_done;
 	}
@@ -5948,15 +5962,7 @@ thread_prologue_and_epilogue_insns (void
       inserted = true;
 
       if (JUMP_P (returnjump))
-	{
-	  rtx pat = PATTERN (returnjump);
-	  if (GET_CODE (pat) == PARALLEL)
-	    pat = XVECEXP (pat, 0, 0);
-	  if (ANY_RETURN_P (pat))
-	    JUMP_LABEL (returnjump) = pat;
-	  else
-	    JUMP_LABEL (returnjump) = ret_rtx;
-	}
+	set_return_jump_label (returnjump);
     }
   else
 #endif
diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/cfgrtl.c gcc-current/gcc/cfgrtl.c
--- gcc-bernd/gcc/cfgrtl.c	2011-09-16 11:52:15.000000000 +0930
+++ gcc-current/gcc/cfgrtl.c	2011-09-16 09:55:38.000000000 +0930
@@ -1273,7 +1273,7 @@ force_nonfallthru_and_redirect (edge e,
 	  gcc_unreachable ();
 #endif
 	}
-      JUMP_LABEL (BB_END (jump_block)) = jump_label;
+      set_return_jump_label (BB_END (jump_block));
     }
   else
     {
diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/reorg.c gcc-current/gcc/reorg.c
--- gcc-bernd/gcc/reorg.c	2011-09-16 11:52:12.000000000 +0930
+++ gcc-current/gcc/reorg.c	2011-09-16 00:07:46.000000000 +0930
@@ -467,7 +467,7 @@ find_end_label (rtx kind)
 	  /* The return we make may have delay slots too.  */
 	  rtx insn = gen_return ();
 	  insn = emit_jump_insn (insn);
-	  JUMP_LABEL (insn) = ret_rtx;
+	  set_return_jump_label (insn);
 	  emit_barrier ();
 	  if (num_delay_slots (insn) > 0)
 	    obstack_ptr_grow (&unfilled_slots_obstack, insn);

-- 
Alan Modra
Australia Development Lab, IBM
* PowerPC shrink-wrap support 3 of 3
From: Alan Modra @ 2011-09-17 9:26 UTC
To: gcc-patches, Bernd Schmidt, David Edelsohn

Finally, the powerpc backend changes.  These are mostly just
mechanical.  I'll note that we need both "simple_return" and "return"
variants of the conditional returns, because a plain "return" can only
be used when no epilogue is required.  The "return" variant must use
direct_return () as its predicate to check this requirement.  The
"simple_return" variant is used by the shrink-wrap optimization to
enable a conditional return before the function prologue, and should
*not* use direct_return (), because the optimization is valid for
functions that do require an epilogue.  All other instructions in the
rs6000 backend that currently use "return" are generated from the
epilogue or are sibling calls, so "simple_return" is appropriate as per
md.texi.

	* config/rs6000/rs6000.c (rs6000_make_savres_rtx): Delete
	unneeded declaration.
	(rs6000_make_savres_rtx): Rename to rs6000_emit_savres_rtx.
	Use simple_return in pattern, emit instruction, and set
	jump_label.
	(rs6000_emit_prologue): Update for rs6000_emit_savres_rtx.
	Use simple_return rather than return.
	(rs6000_output_mi_thunk): Use simple_return rather than return.
	* config/rs6000/altivec.md (restore_world): Likewise.
	* config/rs6000/spe.md (return_and_restore_gpregs_spe): Likewise.
	* config/rs6000/rs6000.md (sibcall*, sibcall_value*): Likewise.
	(return_internal*, return_and_restore*): Likewise.
	(any_return, return_pred, return_str): New iterators.
	(return, conditional return insns): Provide both return and
	simple_return variants.
diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/config/rs6000/rs6000.c gcc-current/gcc/config/rs6000/rs6000.c
--- gcc-bernd/gcc/config/rs6000/rs6000.c	2011-09-16 11:52:15.000000000 +0930
+++ gcc-current/gcc/config/rs6000/rs6000.c	2011-09-15 22:25:50.000000000 +0930
@@ -899,8 +900,6 @@ static const char *rs6000_mangle_type (c
 static void rs6000_set_default_type_attributes (tree);
 static rtx rs6000_savres_routine_sym (rs6000_stack_t *, bool, bool, bool);
 static rtx rs6000_emit_stack_reset (rs6000_stack_t *, rtx, rtx, int, bool);
-static rtx rs6000_make_savres_rtx (rs6000_stack_t *, rtx, int,
-				   enum machine_mode, bool, bool, bool);
 static bool rs6000_reg_live_or_pic_offset_p (int);
 static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
 static tree rs6000_builtin_vectorized_function (tree, tree, tree);
@@ -19729,10 +19764,11 @@ rs6000_emit_stack_reset (rs6000_stack_t
 }
 
 /* Construct a parallel rtx describing the effect of a call to an
-   out-of-line register save/restore routine.  */
+   out-of-line register save/restore routine, and emit the insn
+   or jump_insn as appropriate.  */
 
 static rtx
-rs6000_make_savres_rtx (rs6000_stack_t *info,
+rs6000_emit_savres_rtx (rs6000_stack_t *info,
 			rtx frame_reg_rtx,
 			int save_area_offset,
 			enum machine_mode reg_mode,
 			bool savep, bool gpr, bool lr)
@@ -19742,6 +19778,7 @@ rs6000_make_savres_rtx (rs6000_stack_t *
   int reg_size = GET_MODE_SIZE (reg_mode);
   rtx sym;
   rtvec p;
+  rtx par, insn;
 
   offset = 0;
   start_reg = (gpr
@@ -19752,7 +19789,7 @@ rs6000_make_savres_rtx (rs6000_stack_t *
   p = rtvec_alloc ((lr ? 4 : 3) + n_regs);
 
   if (!savep && lr)
-    RTVEC_ELT (p, offset++) = ret_rtx;
+    RTVEC_ELT (p, offset++) = simple_return_rtx;
 
   RTVEC_ELT (p, offset++)
     = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, 65));
@@ -19788,7 +19825,16 @@ rs6000_make_savres_rtx (rs6000_stack_t *
       RTVEC_ELT (p, i + offset) = gen_rtx_SET (VOIDmode, mem, reg);
     }
 
-  return gen_rtx_PARALLEL (VOIDmode, p);
+  par = gen_rtx_PARALLEL (VOIDmode, p);
+
+  if (!savep && lr)
+    {
+      insn = emit_jump_insn (par);
+      JUMP_LABEL (insn) = simple_return_rtx;
+    }
+  else
+    insn = emit_insn (par);
+  return insn;
 }
 
 /* Determine whether the gp REG is really used.  */
@@ -20087,16 +20133,13 @@ rs6000_emit_prologue (void)
     }
   else if (!WORLD_SAVE_P (info) && info->first_fp_reg_save != 64)
     {
-      rtx par;
-
-      par = rs6000_make_savres_rtx (info, frame_reg_rtx,
-				    info->fp_save_offset + sp_offset,
-				    DFmode,
-				    /*savep=*/true, /*gpr=*/false,
-				    /*lr=*/(strategy
-					    & SAVE_NOINLINE_FPRS_SAVES_LR)
-					   != 0);
-      insn = emit_insn (par);
+      insn = rs6000_emit_savres_rtx (info, frame_reg_rtx,
+				     info->fp_save_offset + sp_offset,
+				     DFmode,
+				     /*savep=*/true, /*gpr=*/false,
+				     /*lr=*/((strategy
+					      & SAVE_NOINLINE_FPRS_SAVES_LR)
+					     != 0));
       rs6000_frame_related (insn, frame_ptr_rtx, info->total_size,
 			    NULL_RTX, NULL_RTX);
     }
@@ -20186,13 +20229,10 @@ rs6000_emit_prologue (void)
 	}
       else
 	{
-	  rtx par;
-
-	  par = rs6000_make_savres_rtx (info, gen_rtx_REG (Pmode, 11),
-					0, reg_mode,
-					/*savep=*/true, /*gpr=*/true,
-					/*lr=*/false);
-	  insn = emit_insn (par);
+	  insn = rs6000_emit_savres_rtx (info, gen_rtx_REG (Pmode, 11),
+					 0, reg_mode,
+					 /*savep=*/true, /*gpr=*/true,
+					 /*lr=*/false);
 	  rs6000_frame_related (insn, frame_ptr_rtx, info->total_size,
 				NULL_RTX, NULL_RTX);
 	}
@@ -20204,8 +20244,6 @@ rs6000_emit_prologue (void)
     }
   else if (!WORLD_SAVE_P (info) && !saving_GPRs_inline)
     {
-      rtx par;
-
       /* Need to adjust r11 (r12) if we saved any FPRs.  */
       if (info->first_fp_reg_save != 64)
 	{
@@ -20216,14 +20254,13 @@ rs6000_emit_prologue (void)
 	  emit_insn (gen_add3_insn (dest_reg, frame_reg_rtx, offset));
 	}
 
-      par = rs6000_make_savres_rtx (info, frame_reg_rtx,
-				    info->gp_save_offset + sp_offset,
-				    reg_mode,
-				    /*savep=*/true, /*gpr=*/true,
-				    /*lr=*/(strategy
-					    & SAVE_NOINLINE_GPRS_SAVES_LR)
-					   != 0);
-      insn = emit_insn (par);
+      insn = rs6000_emit_savres_rtx (info, frame_reg_rtx,
+				     info->gp_save_offset + sp_offset,
+				     reg_mode,
+				     /*savep=*/true, /*gpr=*/true,
+				     /*lr=*/((strategy
+					      & SAVE_NOINLINE_GPRS_SAVES_LR)
+					     != 0));
       rs6000_frame_related (insn, frame_ptr_rtx, info->total_size,
 			    NULL_RTX, NULL_RTX);
     }
@@ -20750,7 +20805,7 @@ rs6000_emit_epilogue (int sibcall)
       alloc_rname = ggc_strdup (rname);
 
       j = 0;
-      RTVEC_ELT (p, j++) = ret_rtx;
+      RTVEC_ELT (p, j++) = simple_return_rtx;
       RTVEC_ELT (p, j++) = gen_rtx_USE (VOIDmode,
					gen_rtx_REG (Pmode, LR_REGNO));
@@ -21165,13 +21214,11 @@ rs6000_emit_epilogue (int sibcall)
 	}
       else
 	{
-	  rtx par;
+	  rs6000_emit_savres_rtx (info, gen_rtx_REG (Pmode, 11),
+				  0, reg_mode,
+				  /*savep=*/false, /*gpr=*/true,
+				  /*lr=*/true);
 
-	  par = rs6000_make_savres_rtx (info, gen_rtx_REG (Pmode, 11),
-					0, reg_mode,
-					/*savep=*/false, /*gpr=*/true,
-					/*lr=*/true);
-	  emit_jump_insn (par);
 	  /* We don't want anybody else emitting things after we jumped
 	     back.  */
 	  return;
@@ -21181,12 +21228,22 @@ rs6000_emit_epilogue (int sibcall)
     {
       /* We are jumping to an out-of-line function.  */
       bool can_use_exit = info->first_fp_reg_save == 64;
-      rtx par;
 
       /* Emit stack reset code if we need it.  */
       if (can_use_exit)
-	rs6000_emit_stack_reset (info, sp_reg_rtx, frame_reg_rtx,
-				 sp_offset, can_use_exit);
+	{
+	  rs6000_emit_stack_reset (info, sp_reg_rtx, frame_reg_rtx,
+				   sp_offset, can_use_exit);
+	  if (info->cr_save_p)
+	    {
+	      rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple);
+	      if (DEFAULT_ABI == ABI_V4)
+		cfa_restores
+		  = alloc_reg_note (REG_CFA_RESTORE,
+				    gen_rtx_REG (SImode, CR2_REGNO),
+				    cfa_restores);
+	    }
+	}
       else
 	{
 	  emit_insn (gen_add3_insn (gen_rtx_REG (Pmode, DEFAULT_ABI == ABI_AIX
@@ -21197,31 +21254,18 @@ rs6000_emit_epilogue (int sibcall)
 	  sp_offset += info->fp_size;
 	}
 
-      par = rs6000_make_savres_rtx (info, frame_reg_rtx,
-				    info->gp_save_offset, reg_mode,
-				    /*savep=*/false, /*gpr=*/true,
-				    /*lr=*/can_use_exit);
+      insn = rs6000_emit_savres_rtx (info, frame_reg_rtx,
+				     info->gp_save_offset, reg_mode,
+				     /*savep=*/false, /*gpr=*/true,
+				     /*lr=*/can_use_exit);
 
       if (can_use_exit)
 	{
-	  if (info->cr_save_p)
-	    {
-	      rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple);
-	      if (DEFAULT_ABI == ABI_V4)
-		cfa_restores
-		  = alloc_reg_note (REG_CFA_RESTORE,
-				    gen_rtx_REG (SImode, CR2_REGNO),
-				    cfa_restores);
-	    }
-
-	  emit_jump_insn (par);
-
 	  /* We don't want anybody else emitting things after we jumped
 	     back.  */
 	  return;
 	}
 
-      insn = emit_insn (par);
       if (DEFAULT_ABI == ABI_V4)
 	{
 	  if (frame_pointer_needed)
@@ -21366,7 +21410,7 @@ rs6000_emit_epilogue (int sibcall)
       else
 	p = rtvec_alloc (2);
 
-      RTVEC_ELT (p, 0) = ret_rtx;
+      RTVEC_ELT (p, 0) = simple_return_rtx;
       RTVEC_ELT (p, 1) = ((restoring_FPRs_inline || !lr)
			  ? gen_rtx_USE (VOIDmode, gen_rtx_REG (Pmode, 65))
			  : gen_rtx_CLOBBER (VOIDmode,
@@ -21768,7 +21812,7 @@ rs6000_output_mi_thunk (FILE *file, tree
				 gen_rtx_USE (VOIDmode,
					      gen_rtx_REG (SImode,
							   LR_REGNO)),
-				 ret_rtx)));
+				 simple_return_rtx)));
   SIBLING_CALL_P (insn) = 1;
   emit_barrier ();
diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/config/rs6000/altivec.md gcc-current/gcc/config/rs6000/altivec.md
--- gcc-bernd/gcc/config/rs6000/altivec.md	2011-09-16 11:52:15.000000000 +0930
+++ gcc-current/gcc/config/rs6000/altivec.md	2011-09-15 18:28:43.000000000 +0930
@@ -306,7 +306,7 @@
 
 (define_insn "*restore_world"
  [(match_parallel 0 "restore_world_operation"
-                  [(return)
+                  [(simple_return)
		   (use (reg:SI 65))
		   (use (match_operand:SI 1 "call_operand" "s"))
		   (clobber (match_operand:SI 2 "gpc_reg_operand" "=r"))])]
diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/config/rs6000/rs6000.md gcc-current/gcc/config/rs6000/rs6000.md
--- gcc-bernd/gcc/config/rs6000/rs6000.md	2011-09-16 11:52:15.000000000 +0930
+++ gcc-current/gcc/config/rs6000/rs6000.md	2011-09-16 11:32:31.000000000 +0930
@@ -264,6 +265,12 @@
 ; Iterator for just SF/DF
 (define_mode_iterator SFDF [SF DF])
 
+; Conditional returns.
+(define_code_iterator any_return [return simple_return])
+(define_code_attr return_pred [(return "direct_return ()")
+			       (simple_return "")])
+(define_code_attr return_str [(return "") (simple_return "simple_")])
+
 ; Various instructions that come in SI and DI forms.
 ; A generic w/d attribute, for things like cmpw/cmpd.
 (define_mode_attr wd [(QI "b") (HI "h") (SI "w") (DI "d")])
@@ -12814,7 +12831,7 @@
 		    (match_operand 1 "" ""))
	      (use (match_operand 2 "" ""))
	      (use (reg:SI LR_REGNO))
-	      (return)])]
+	      (simple_return)])]
  ""
  "
{
@@ -12838,7 +12855,7 @@
 		     (match_operand 1 "" "g,g"))
	       (use (match_operand:SI 2 "immediate_operand" "O,n"))
	       (use (reg:SI LR_REGNO))
-	       (return)]
+	       (simple_return)]
  "(INTVAL (operands[2]) & CALL_LONG) == 0"
  "*
{
@@ -12858,7 +12875,7 @@
 		     (match_operand 1 "" "g,g"))
	       (use (match_operand:SI 2 "immediate_operand" "O,n"))
	       (use (reg:SI LR_REGNO))
-	       (return)]
+	       (simple_return)]
  "TARGET_64BIT
   && (INTVAL (operands[2]) & CALL_LONG) == 0"
  "*
{
@@ -12879,7 +12896,7 @@
 		    (match_operand 2 "" "g,g")))
	       (use (match_operand:SI 3 "immediate_operand" "O,n"))
	       (use (reg:SI LR_REGNO))
-	       (return)]
+	       (simple_return)]
  "(INTVAL (operands[3]) & CALL_LONG) == 0"
  "*
{
@@ -12901,7 +12918,7 @@
 		    (match_operand 2 "" "g,g")))
	       (use (match_operand:SI 3 "immediate_operand" "O,n"))
	       (use (reg:SI LR_REGNO))
-	       (return)]
+	       (simple_return)]
  "TARGET_64BIT
   && (INTVAL (operands[3]) & CALL_LONG) == 0"
  "*
{
@@ -12921,7 +12938,7 @@
 		     (match_operand 1 "" "g,g"))
	       (use (match_operand:SI 2 "immediate_operand" "O,O"))
	       (use (reg:SI LR_REGNO))
-	       (return)]
+	       (simple_return)]
  "DEFAULT_ABI == ABI_AIX
   && (INTVAL (operands[2]) & CALL_LONG) == 0"
  "@
@@ -12936,7 +12953,7 @@
 		    (match_operand 2 "" "g,g")))
	       (use (match_operand:SI 3 "immediate_operand" "O,O"))
	       (use (reg:SI LR_REGNO))
-	       (return)]
+	       (simple_return)]
  "DEFAULT_ABI == ABI_AIX
   && (INTVAL (operands[3]) & CALL_LONG) == 0"
  "@
@@ -12950,7 +12967,7 @@
 		    (match_operand 1 "" ""))
	       (use (match_operand 2 "immediate_operand" "O,n,O,n"))
	       (use (reg:SI LR_REGNO))
-	       (return)]
+	       (simple_return)]
  "(DEFAULT_ABI == ABI_DARWIN
    || DEFAULT_ABI == ABI_V4)
   && (INTVAL (operands[2]) & CALL_LONG) == 0"
@@ -12981,7 +12998,7 @@
 		    (match_operand 2 "" "")))
	      (use (match_operand 3 "" ""))
	      (use (reg:SI LR_REGNO))
-	      (return)])]
+	      (simple_return)])]
  ""
  "
{
@@ -13002,7 +13019,7 @@
 		    (match_operand 2 "" "")))
	       (use (match_operand:SI 3 "immediate_operand" "O,n,O,n"))
	       (use (reg:SI LR_REGNO))
-	       (return)]
+	       (simple_return)]
  "(DEFAULT_ABI == ABI_DARWIN
    || DEFAULT_ABI == ABI_V4)
   && (INTVAL (operands[3]) & CALL_LONG) == 0"
@@ -15328,9 +15345,9 @@
 			[(match_operand 1 "cc_reg_operand" "y")
			 (const_int 0)])
-		       (return)
+		       (any_return)
		       (pc)))]
-  "direct_return ()"
+  "<return_pred>"
  "*
{
  return output_cbranch (operands[0], NULL, 0, insn);
@@ -15360,8 +15377,8 @@
 			 "cc_reg_operand" "y")
			 (const_int 0)])
		       (pc)
-		       (return)))]
-  "direct_return ()"
+		       (any_return)))]
+  "<return_pred>"
  "*
{
  return output_cbranch (operands[0], NULL, 1, insn);
@@ -15491,9 +15508,9 @@
  "b %l0"
  [(set_attr "type" "branch")])
 
-(define_insn "return"
-  [(return)]
-  "direct_return ()"
+(define_insn "<return_str>return"
+  [(any_return)]
+  "<return_pred>"
  "{br|blr}"
  [(set_attr "type" "jmpreg")])
@@ -16015,7 +16032,7 @@
   (set_attr "cell_micro" "always")])
 
 (define_insn "*return_internal_<mode>"
-  [(return)
+  [(simple_return)
   (use (match_operand:P 0 "register_operand" "lc"))]
  ""
  "b%T0"
@@ -16077,7 +16094,7 @@
 (define_insn "*return_and_restore_gpregs_<mode>_r11"
  [(match_parallel 0 "any_parallel_operand"
-		  [(return)
+		  [(simple_return)
		   (clobber (match_operand:P 1 "register_operand" "=l"))
		   (use (match_operand:P 2 "symbol_ref_operand" "s"))
		   (use (reg:P 11))
@@ -16090,7 +16107,7 @@
 (define_insn "*return_and_restore_gpregs_<mode>_r12"
  [(match_parallel 0 "any_parallel_operand"
-		  [(return)
+		  [(simple_return)
		   (clobber (match_operand:P 1 "register_operand" "=l"))
		   (use (match_operand:P 2 "symbol_ref_operand" "s"))
		   (use (reg:P 12))
@@ -16103,7 +16120,7 @@
 (define_insn "*return_and_restore_gpregs_<mode>_r1"
  [(match_parallel 0 "any_parallel_operand"
-		  [(return)
+		  [(simple_return)
		   (clobber (match_operand:P 1 "register_operand" "=l"))
		   (use (match_operand:P 2 "symbol_ref_operand" "s"))
		   (use (reg:P 1))
@@ -16116,7 +16133,7 @@
 (define_insn "*return_and_restore_fpregs_<mode>_r11"
  [(match_parallel 0 "any_parallel_operand"
-		  [(return)
+		  [(simple_return)
		   (clobber (match_operand:P 1 "register_operand" "=l"))
		   (use (match_operand:P 2 "symbol_ref_operand" "s"))
		   (use (reg:P 11))
@@ -16129,7 +16146,7 @@
 (define_insn "*return_and_restore_fpregs_<mode>_r12"
  [(match_parallel 0 "any_parallel_operand"
-		  [(return)
+		  [(simple_return)
		   (clobber (match_operand:P 1 "register_operand" "=l"))
		   (use (match_operand:P 2 "symbol_ref_operand" "s"))
		   (use (reg:P 12))
@@ -16142,7 +16159,7 @@
 (define_insn "*return_and_restore_fpregs_<mode>_r1"
  [(match_parallel 0 "any_parallel_operand"
-		  [(return)
+		  [(simple_return)
		   (clobber (match_operand:P 1 "register_operand" "=l"))
		   (use (match_operand:P 2 "symbol_ref_operand" "s"))
		   (use (reg:P 1))
@@ -16155,7 +16172,7 @@
 (define_insn "*return_and_restore_fpregs_aix_<mode>_r11"
  [(match_parallel 0 "any_parallel_operand"
-		  [(return)
+		  [(simple_return)
		   (use (match_operand:P 1 "register_operand" "l"))
		   (use (match_operand:P 2 "symbol_ref_operand" "s"))
		   (use (reg:P 11))
@@ -16168,7 +16185,7 @@
 (define_insn "*return_and_restore_fpregs_aix_<mode>_r1"
  [(match_parallel 0 "any_parallel_operand"
-		  [(return)
+		  [(simple_return)
		   (use (match_operand:P 1 "register_operand" "l"))
		   (use (match_operand:P 2 "symbol_ref_operand" "s"))
		   (use (reg:P 1))
diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/config/rs6000/spe.md gcc-current/gcc/config/rs6000/spe.md
--- gcc-bernd/gcc/config/rs6000/spe.md	2011-09-16 11:52:15.000000000 +0930
+++ gcc-current/gcc/config/rs6000/spe.md	2011-09-15 18:28:43.000000000 +0930
@@ -3178,7 +3178,7 @@
 
 (define_insn "*return_and_restore_gpregs_spe"
  [(match_parallel 0 "any_parallel_operand"
-		  [(return)
+		  [(simple_return)
		   (clobber (reg:P 65))
		   (use (match_operand:P 1 "symbol_ref_operand" "s"))
		   (use (reg:P 11))

-- 
Alan Modra
Australia Development Lab, IBM
* Re: PowerPC shrink-wrap support 3 of 3
From: Alan Modra @ 2011-09-26 14:25 UTC
To: gcc-patches, Bernd Schmidt, David Edelsohn

This patch fixes an issue that limits opportunities for shrink-wrapping
on PowerPC.  The rs6000 REG_ALLOC_ORDER chooses r0 as the very first
gpr to use in code, with r11 also having high priority.  This means it
is quite likely that r0 or r11 is live on the edge chosen for
shrink-wrapping.  That's unfortunate, since rs6000 uses r0, r11, and
r12 as temporaries in the prologue, and thus can't insert the prologue
on that edge.  I avoid this situation in most cases by simply changing
the register allocation order, making r0 and r11 the last of the
volatile gprs to be used.

I also noticed that r12 is not used until after the non-volatile regs
are allocated.  According to the comment, the reason this is done is to
avoid putting a DI into the r12,r13 pair (and since r13 is
non-volatile, that would mean saving and restoring all the non-volatile
gprs when not inlining the saves/restores).  However, when r13 is a
fixed reg this cannot occur, so moving r12 before the non-volatile regs
makes another reg available before we need to start saving regs.

After making the register allocation order changes I found gcc
bootstrap failed.  This turned out to be caused by shrink-wrapping and
register renaming moving some pre-prologue code after the epilogue.  To
support this we need to ensure no prologue unwind info is left active
after the epilogue.

Bootstrapped and regression tested powerpc-linux.  Two regressions
appeared due to a problem in the shrink-wrap code.  More on that later.

	* config/rs6000/rs6000.c (rs6000_emit_epilogue): Emit
	cfa_restores when flag_shrink_wrap.
	* config/rs6000/rs6000.h (EARLY_R12, LATE_R12): Define.
	(REG_ALLOC_ORDER): Move r12 before call-saved regs when FIXED_R13.
	Move r11 and r0 later to suit shrink-wrapping.

diff -urp -x'*~' -x'*.orig' -x'*.rej' -x.svn gcc-alanshrink1/gcc/config/rs6000/rs6000.c gcc-current/gcc/config/rs6000/rs6000.c
--- gcc-alanshrink1/gcc/config/rs6000/rs6000.c	2011-09-26 15:51:19.000000000 +0930
+++ gcc-current/gcc/config/rs6000/rs6000.c	2011-09-23 11:54:00.000000000 +0930
@@ -21001,7 +21045,7 @@ rs6000_emit_epilogue (int sibcall)
 
	  reg = gen_rtx_REG (V4SImode, i);
	  emit_move_insn (reg, mem);
-	  if (DEFAULT_ABI == ABI_V4)
+	  if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
	    cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg,
					   cfa_restores);
	}
@@ -21052,10 +21096,16 @@ rs6000_emit_epilogue (int sibcall)
     }
 
   /* Set LR here to try to overlap restores below.  LR is always saved
-     above incoming stack, so it never needs REG_CFA_RESTORE.  */
+     above incoming stack, so it never needs REG_CFA_RESTORE except
+     when shrink-wrapping as in that case we may have blocks from
+     before the prologue being moved after the epilogue.  */
   if (restore_lr && restoring_GPRs_inline)
-    emit_move_insn (gen_rtx_REG (Pmode, LR_REGNO),
-		    gen_rtx_REG (Pmode, 0));
+    {
+      rtx lr = gen_rtx_REG (Pmode, LR_REGNO);
+      emit_move_insn (lr, gen_rtx_REG (Pmode, 0));
+      if (flag_shrink_wrap)
+	cfa_restores = alloc_reg_note (REG_CFA_RESTORE, lr, cfa_restores);
+    }
 
   /* Load exception handler data registers, if needed.  */
   if (crtl->calls_eh_return)
@@ -21146,7 +21196,7 @@ rs6000_emit_epilogue (int sibcall)
 
	  reg = gen_rtx_REG (reg_mode, info->first_gp_reg_save + i);
	  insn = emit_move_insn (reg, mem);
-	  if (DEFAULT_ABI == ABI_V4)
+	  if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
	    {
	      if (frame_pointer_needed
		  && info->first_gp_reg_save + i
@@ -21188,7 +21238,7 @@ rs6000_emit_epilogue (int sibcall)
       if (info->cr_save_p)
	{
	  rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple);
-	  if (DEFAULT_ABI == ABI_V4)
+	  if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
	    cfa_restores
	      = alloc_reg_note (REG_CFA_RESTORE,
				gen_rtx_REG (SImode, CR2_REGNO),
@@ -21217,7 +21267,7 @@ rs6000_emit_epilogue (int sibcall)
	  return;
	}
 
-      if (DEFAULT_ABI == ABI_V4)
+      if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
	{
	  if (frame_pointer_needed)
	    {
@@ -21246,7 +21296,7 @@ rs6000_emit_epilogue (int sibcall)
	  rtx reg = gen_rtx_REG (reg_mode, info->first_gp_reg_save + i);
 
	  RTVEC_ELT (p, i) = gen_rtx_SET (VOIDmode, reg, mem);
-	  if (DEFAULT_ABI == ABI_V4)
+	  if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
	    cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg,
					   cfa_restores);
	}
@@ -21271,7 +21321,7 @@ rs6000_emit_epilogue (int sibcall)
	  rtx reg = gen_rtx_REG (reg_mode, info->first_gp_reg_save + i);
 
	  insn = emit_move_insn (reg, mem);
-	  if (DEFAULT_ABI == ABI_V4)
+	  if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
	    {
	      if (frame_pointer_needed
		  && info->first_gp_reg_save + i
@@ -21292,10 +21342,12 @@ rs6000_emit_epilogue (int sibcall)
     {
       rtx mem = gen_frame_mem_offset (Pmode, frame_reg_rtx,
				      info->lr_save_offset + sp_offset);
+      rtx lr = gen_rtx_REG (Pmode, LR_REGNO);
 
       emit_move_insn (gen_rtx_REG (Pmode, 0), mem);
-      emit_move_insn (gen_rtx_REG (Pmode, LR_REGNO),
-		      gen_rtx_REG (Pmode, 0));
+      emit_move_insn (lr, gen_rtx_REG (Pmode, 0));
+      if (flag_shrink_wrap)
+	cfa_restores = alloc_reg_note (REG_CFA_RESTORE, lr, cfa_restores);
     }
 
   /* Restore fpr's if we need to do it without calling a function.  */
@@ -21316,7 +21368,7 @@ rs6000_emit_epilogue (int sibcall)
				   info->first_fp_reg_save + i);
	  emit_move_insn (reg, mem);
-	  if (DEFAULT_ABI == ABI_V4)
+	  if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
	    cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg,
					   cfa_restores);
	}
@@ -21325,7 +21377,7 @@ rs6000_emit_epilogue (int sibcall)
   if (info->cr_save_p)
     {
       rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple);
-      if (DEFAULT_ABI == ABI_V4)
+      if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
	cfa_restores
	  = alloc_reg_note (REG_CFA_RESTORE,
			    gen_rtx_REG (SImode, CR2_REGNO),
			    cfa_restores);
diff -urp -x'*~' -x'*.orig' -x'*.rej' -x.svn gcc-alanshrink1/gcc/config/rs6000/rs6000.h gcc-current/gcc/config/rs6000/rs6000.h
--- gcc-alanshrink1/gcc/config/rs6000/rs6000.h	2011-09-26 15:29:15.000000000 +0930
+++ gcc-current/gcc/config/rs6000/rs6000.h	2011-09-22 02:30:15.000000000 +0930
@@ -894,10 +894,11 @@ extern unsigned rs6000_pointer_size;
	cr1		(not saved, but used for FP operations)
	cr0		(not saved, but used for arithmetic operations)
	cr4, cr3, cr2	(saved)
-	r0		(not saved; cannot be base reg)
	r9		(not saved; best for TImode)
-	r11, r10, r8-r4	(not saved; highest used first to make less conflict)
+	r10, r8-r4	(not saved; highest first for less conflict with params)
	r3		(not saved; return value register)
+	r11		(not saved; later alloc to help shrink-wrap)
+	r0		(not saved; cannot be base reg)
	r31 - r13	(saved; order given to save least number)
	r12		(not saved; if used for DImode or DFmode would use r13)
	mq		(not saved; best to use it if we can)
@@ -922,6 +923,14 @@ extern unsigned rs6000_pointer_size;
 #define MAYBE_R2_FIXED
 #endif
 
+#if FIXED_R13 == 1
+#define EARLY_R12 12,
+#define LATE_R12
+#else
+#define EARLY_R12
+#define LATE_R12 12,
+#endif
+
 #define REG_ALLOC_ORDER						\
   {32,								\
    45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34,		\
@@ -929,11 +938,11 @@ extern unsigned rs6000_pointer_size;
    63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51,		\
    50, 49, 48, 47, 46,						\
    75, 74, 69, 68, 72, 71, 70,					\
-   0, MAYBE_R2_AVAILABLE					\
-   9, 11, 10, 8, 7, 6, 5, 4,					\
-   3,								\
+   MAYBE_R2_AVAILABLE						\
+   9, 10, 8, 7, 6, 5, 4,					\
+   3, EARLY_R12 11, 0,						\
    31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19,		\
-   18, 17, 16, 15, 14, 13, 12,					\
+   18, 17, 16, 15, 14, 13, LATE_R12				\
    64, 66, 65,							\
    73, 1, MAYBE_R2_FIXED 67, 76,				\
   /* AltiVec registers.  */					\

-- 
Alan Modra
Australia Development Lab, IBM
* Re: PowerPC shrink-wrap support 3 of 3
From: Alan Modra @ 2011-09-27 0:15 UTC
To: gcc-patches, Bernd Schmidt, Richard Henderson

On Mon, Sep 26, 2011 at 11:22:54PM +0930, Alan Modra wrote:
> Two regressions appeared due to a problem in the shrink-wrap code.

These two.

+FAIL: g++.dg/torture/pr46111.C  -O1  (internal compiler error)
+FAIL: gcc.dg/autopar/pr46099.c (internal compiler error)

Both "internal compiler error: in maybe_record_trace_start, at
dwarf2cfi.c:2243", caused by disjoint blocks in the set of blocks that
need no prologue, i.e. thread_prologue_and_epilogue_insns gives

====
code needing no prologue
====   <- prologue inserted here
code needing prologue
====
more no prologue code
====
more code needing prologue
====
epilogue

That second block needing no prologue now wrongly tries to use unwind
info from the prologue.

-- 
Alan Modra
Australia Development Lab, IBM
* Re: PowerPC shrink-wrap support 3 of 3 2011-09-27 0:15 ` Alan Modra @ 2011-09-27 0:19 ` Bernd Schmidt 2011-09-27 0:49 ` Alan Modra 0 siblings, 1 reply; 48+ messages in thread From: Bernd Schmidt @ 2011-09-27 0:19 UTC (permalink / raw) To: gcc-patches, Richard Henderson On 09/27/11 00:32, Alan Modra wrote: > On Mon, Sep 26, 2011 at 11:22:54PM +0930, Alan Modra wrote: >> Two regressions appeared due to a problem in the shrink-wrap code. > > These two. > +FAIL: g++.dg/torture/pr46111.C -O1 (internal compiler error) > +FAIL: gcc.dg/autopar/pr46099.c (internal compiler error) > > Both "internal compiler error: in maybe_record_trace_start, at > dwarf2cfi.c:2243", caused by disjoint blocks in the set of blocks that > needs no prologue. ie. thread_prologue_and_epilogue_insns gives > > ==== > code needing no prologue > ==== <- prologue inserted here > code needing prologue > ==== > more no prologue code > ==== > more code needing prologue > ==== > epilogue > > That second block needing no prologue now wrongly tries to use unwind > info from the prologue. What's the actual CFG structure here? Bernd ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: PowerPC shrink-wrap support 3 of 3 2011-09-27 0:19 ` Bernd Schmidt @ 2011-09-27 0:49 ` Alan Modra 2011-09-27 1:08 ` Bernd Schmidt 0 siblings, 1 reply; 48+ messages in thread From: Alan Modra @ 2011-09-27 0:49 UTC (permalink / raw) To: Bernd Schmidt; +Cc: gcc-patches, Richard Henderson On Tue, Sep 27, 2011 at 12:39:36AM +0200, Bernd Schmidt wrote: > On 09/27/11 00:32, Alan Modra wrote: > > On Mon, Sep 26, 2011 at 11:22:54PM +0930, Alan Modra wrote: > >> Two regressions appeared due to a problem in the shrink-wrap code. > > > > These two. > > +FAIL: g++.dg/torture/pr46111.C -O1 (internal compiler error) > > +FAIL: gcc.dg/autopar/pr46099.c (internal compiler error) > > > > Both "internal compiler error: in maybe_record_trace_start, at > > dwarf2cfi.c:2243", caused by disjoint blocks in the set of blocks that > > needs no prologue. ie. thread_prologue_and_epilogue_insns gives > > > > ==== > > code needing no prologue > > ==== <- prologue inserted here > > code needing prologue > > ==== > > more no prologue code > > ==== > > more code needing prologue > > ==== > > epilogue > > > > That second block needing no prologue now wrongly tries to use unwind > > info from the prologue. > > What's the actual CFG structure here? Immediately after thread_prologue_and_epilogue_insns bb2 -> simple_ret | bb3 /\ / \ bb4 bb5 (pro) | | |<--| bb7 | | (epi) bb6--| | | bb8<-| | (s ret)| | |-bb9 -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: PowerPC shrink-wrap support 3 of 3 2011-09-27 0:49 ` Alan Modra @ 2011-09-27 1:08 ` Bernd Schmidt 2011-09-27 2:16 ` Alan Modra 0 siblings, 1 reply; 48+ messages in thread From: Bernd Schmidt @ 2011-09-27 1:08 UTC (permalink / raw) To: gcc-patches, Richard Henderson On 09/27/11 02:11, Alan Modra wrote: > On Tue, Sep 27, 2011 at 12:39:36AM +0200, Bernd Schmidt wrote: >> On 09/27/11 00:32, Alan Modra wrote: >>> On Mon, Sep 26, 2011 at 11:22:54PM +0930, Alan Modra wrote: >>>> Two regressions appeared due to a problem in the shrink-wrap code. >>> >>> These two. >>> +FAIL: g++.dg/torture/pr46111.C -O1 (internal compiler error) >>> +FAIL: gcc.dg/autopar/pr46099.c (internal compiler error) >>> >>> Both "internal compiler error: in maybe_record_trace_start, at >>> dwarf2cfi.c:2243", caused by disjoint blocks in the set of blocks that >>> needs no prologue. ie. thread_prologue_and_epilogue_insns gives >>> >>> ==== >>> code needing no prologue >>> ==== <- prologue inserted here >>> code needing prologue >>> ==== >>> more no prologue code >>> ==== >>> more code needing prologue >>> ==== >>> epilogue >>> >>> That second block needing no prologue now wrongly tries to use unwind >>> info from the prologue. >> >> What's the actual CFG structure here? > > Immediately after thread_prologue_and_epilogue_insns > > bb2 -> simple_ret > | > bb3 > /\ > / \ > bb4 bb5 > (pro) | > | |<--| > bb7 | | > (epi) bb6--| > | | > bb8<-| | > (s ret)| | > |-bb9 That looks perfectly reasonable, and even if it is ordered in the way shown above, dwarf2cfi still should be able to deal with it if the prologue and epilogue are annotated correctly. Again, at the crash site, examine the start insn to find out which basic block is the problem, and use dump_cfi_row on the two rows it tried to compare (and found an unexpected difference). Bernd ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: PowerPC shrink-wrap support 3 of 3 2011-09-27 1:08 ` Bernd Schmidt @ 2011-09-27 2:16 ` Alan Modra 2011-09-28 16:35 ` Alan Modra 0 siblings, 1 reply; 48+ messages in thread From: Alan Modra @ 2011-09-27 2:16 UTC (permalink / raw) To: Bernd Schmidt; +Cc: gcc-patches, Richard Henderson On Tue, Sep 27, 2011 at 02:15:24AM +0200, Bernd Schmidt wrote: > That looks perfectly reasonable, and even if it is ordered in the way > shown above, dwarf2cfi still should be able to deal with it if the > prologue and epilogue are annotated correctly. Here I was thinking that it was someone else's problem. :-) I looked at this before and saw some epilogue cfa_restores, but I see now we're missing some for -m64. More work needed in rs6000_emit_epilogue.. Sorry for the noise. -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: PowerPC shrink-wrap support 3 of 3 2011-09-27 2:16 ` Alan Modra @ 2011-09-28 16:35 ` Alan Modra 2011-10-16 20:19 ` David Edelsohn 0 siblings, 1 reply; 48+ messages in thread From: Alan Modra @ 2011-09-28 16:35 UTC (permalink / raw) To: gcc-patches, David Edelsohn; +Cc: Bernd Schmidt This supersedes http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01004.html and http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01593.html, fixing the two regressions introduced by those patches. The first patch is unchanged except to leave all the out-of-line restore functions using "return" rather than "simple_return". We don't want these being confused with a plain "simple_return" and perhaps used by the shrink-wrapping to return from pre-prologue code. The second of these two patches was way too simplistic. It was a real pain getting the cfa_restores correct. A lot were missing, or emitted at the wrong place (due to a bug in rs6000_emit_stack_reset). I also had the real restore insn move past the cfa_restores ("mtlr 0" insn scheduled over loads from stack). * config/rs6000/rs6000.c (rs6000_make_savres_rtx): Delete unneeded declaration. (rs6000_emit_stack_reset): Only return insn emitted when it adjusts sp. (rs6000_make_savres_rtx): Rename to rs6000_emit_savres_rtx. Use simple_return in pattern, emit instruction, and set jump_label. (rs6000_emit_prologue): Update for rs6000_emit_savres_rtx. Use simple_return rather than return. (emit_cfa_restores): New function. (rs6000_emit_epilogue): Emit cfa_restores when flag_shrink_wrap. Add missing cfa_restores for SAVE_WORLD. Add missing LR cfa_restore when using out-of-line gpr restore. Add missing LR and FP regs cfa_restores for out-of-line fpr restore. Consolidate code setting up cfa_restores. Formatting. Use LR_REGNO define. (rs6000_output_mi_thunk): Use simple_return rather than return. * config/rs6000/rs6000.md (sibcall*, sibcall_value*): Likewise. (return_internal*): Likewise. (any_return, return_pred, return_str): New iterators. 
(return, conditional return insns): Provide both return and simple_return variants. * gcc/config/rs6000/rs6000.h (EARLY_R12, LATE_R12): Define. (REG_ALLOC_ORDER): Move r12 before call-saved regs when FIXED_R13. Move r11 and r0 later to suit shrink-wrapping. Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 178876) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -899,8 +900,6 @@ static const char *rs6000_mangle_type (c static void rs6000_set_default_type_attributes (tree); static rtx rs6000_savres_routine_sym (rs6000_stack_t *, bool, bool, bool); static rtx rs6000_emit_stack_reset (rs6000_stack_t *, rtx, rtx, int, bool); -static rtx rs6000_make_savres_rtx (rs6000_stack_t *, rtx, int, - enum machine_mode, bool, bool, bool); static bool rs6000_reg_live_or_pic_offset_p (int); static tree rs6000_builtin_vectorized_libmass (tree, tree, tree); static tree rs6000_builtin_vectorized_function (tree, tree, tree); @@ -19704,8 +19728,10 @@ rs6000_emit_stack_reset (rs6000_stack_t if (sp_offset != 0) { rtx dest_reg = savres ? gen_rtx_REG (Pmode, 11) : sp_reg_rtx; - return emit_insn (gen_add3_insn (dest_reg, frame_reg_rtx, - GEN_INT (sp_offset))); + rtx insn = emit_insn (gen_add3_insn (dest_reg, frame_reg_rtx, + GEN_INT (sp_offset))); + if (!savres) + return insn; } else if (!savres) return emit_move_insn (sp_reg_rtx, frame_reg_rtx); @@ -19729,10 +19755,11 @@ rs6000_emit_stack_reset (rs6000_stack_t } /* Construct a parallel rtx describing the effect of a call to an - out-of-line register save/restore routine. */ + out-of-line register save/restore routine, and emit the insn + or jump_insn as appropriate. 
*/ static rtx -rs6000_make_savres_rtx (rs6000_stack_t *info, +rs6000_emit_savres_rtx (rs6000_stack_t *info, rtx frame_reg_rtx, int save_area_offset, enum machine_mode reg_mode, bool savep, bool gpr, bool lr) @@ -19742,6 +19769,7 @@ rs6000_make_savres_rtx (rs6000_stack_t * int reg_size = GET_MODE_SIZE (reg_mode); rtx sym; rtvec p; + rtx par, insn; offset = 0; start_reg = (gpr @@ -19755,7 +19783,7 @@ rs6000_make_savres_rtx (rs6000_stack_t * RTVEC_ELT (p, offset++) = ret_rtx; RTVEC_ELT (p, offset++) - = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, 65)); + = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, LR_REGNO)); sym = rs6000_savres_routine_sym (info, savep, gpr, lr); RTVEC_ELT (p, offset++) = gen_rtx_USE (VOIDmode, sym); @@ -19788,7 +19816,16 @@ rs6000_make_savres_rtx (rs6000_stack_t * RTVEC_ELT (p, i + offset) = gen_rtx_SET (VOIDmode, mem, reg); } - return gen_rtx_PARALLEL (VOIDmode, p); + par = gen_rtx_PARALLEL (VOIDmode, p); + + if (!savep && lr) + { + insn = emit_jump_insn (par); + JUMP_LABEL (insn) = ret_rtx; + } + else + insn = emit_insn (par); + return insn; } /* Determine whether the gp REG is really used. 
*/ @@ -20087,16 +20124,13 @@ rs6000_emit_prologue (void) } else if (!WORLD_SAVE_P (info) && info->first_fp_reg_save != 64) { - rtx par; - - par = rs6000_make_savres_rtx (info, frame_reg_rtx, - info->fp_save_offset + sp_offset, - DFmode, - /*savep=*/true, /*gpr=*/false, - /*lr=*/(strategy - & SAVE_NOINLINE_FPRS_SAVES_LR) - != 0); - insn = emit_insn (par); + insn = rs6000_emit_savres_rtx (info, frame_reg_rtx, + info->fp_save_offset + sp_offset, + DFmode, + /*savep=*/true, /*gpr=*/false, + /*lr=*/((strategy + & SAVE_NOINLINE_FPRS_SAVES_LR) + != 0)); rs6000_frame_related (insn, frame_ptr_rtx, info->total_size, NULL_RTX, NULL_RTX); } @@ -20186,13 +20220,10 @@ rs6000_emit_prologue (void) } else { - rtx par; - - par = rs6000_make_savres_rtx (info, gen_rtx_REG (Pmode, 11), - 0, reg_mode, - /*savep=*/true, /*gpr=*/true, - /*lr=*/false); - insn = emit_insn (par); + insn = rs6000_emit_savres_rtx (info, gen_rtx_REG (Pmode, 11), + 0, reg_mode, + /*savep=*/true, /*gpr=*/true, + /*lr=*/false); rs6000_frame_related (insn, frame_ptr_rtx, info->total_size, NULL_RTX, NULL_RTX); } @@ -20204,8 +20235,6 @@ rs6000_emit_prologue (void) } else if (!WORLD_SAVE_P (info) && !saving_GPRs_inline) { - rtx par; - /* Need to adjust r11 (r12) if we saved any FPRs. */ if (info->first_fp_reg_save != 64) { @@ -20216,14 +20245,13 @@ rs6000_emit_prologue (void) emit_insn (gen_add3_insn (dest_reg, frame_reg_rtx, offset)); } - par = rs6000_make_savres_rtx (info, frame_reg_rtx, - info->gp_save_offset + sp_offset, - reg_mode, - /*savep=*/true, /*gpr=*/true, - /*lr=*/(strategy - & SAVE_NOINLINE_GPRS_SAVES_LR) - != 0); - insn = emit_insn (par); + insn = rs6000_emit_savres_rtx (info, frame_reg_rtx, + info->gp_save_offset + sp_offset, + reg_mode, + /*savep=*/true, /*gpr=*/true, + /*lr=*/((strategy + & SAVE_NOINLINE_GPRS_SAVES_LR) + != 0)); rs6000_frame_related (insn, frame_ptr_rtx, info->total_size, NULL_RTX, NULL_RTX); } @@ -20672,6 +20718,20 @@ offset_below_red_zone_p (HOST_WIDE_INT o : TARGET_32BIT ? 
-220 : -288); } +/* Append CFA_RESTORES to any existing REG_NOTES on the last insn. */ + +static void +emit_cfa_restores (rtx cfa_restores) +{ + rtx insn = get_last_insn (); + rtx *loc = ®_NOTES (insn); + + while (*loc) + loc = &XEXP (*loc, 1); + *loc = cfa_restores; + RTX_FRAME_RELATED_P (insn) = 1; +} + /* Emit function epilogue as insns. */ void @@ -20769,6 +20829,14 @@ rs6000_emit_epilogue (int sibcall) rtx mem = gen_frame_mem (reg_mode, addr); RTVEC_ELT (p, j++) = gen_rtx_SET (VOIDmode, reg, mem); + + if (flag_shrink_wrap) + { + cfa_restores = alloc_reg_note (REG_CFA_RESTORE, + gen_rtx_REG (Pmode, LR_REGNO), + cfa_restores); + cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, cfa_restores); + } } for (i = 0; i < 32 - info->first_gp_reg_save; i++) @@ -20780,6 +20848,8 @@ rs6000_emit_epilogue (int sibcall) rtx mem = gen_frame_mem (reg_mode, addr); RTVEC_ELT (p, j++) = gen_rtx_SET (VOIDmode, reg, mem); + if (flag_shrink_wrap) + cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, cfa_restores); } for (i = 0; info->first_altivec_reg_save + i <= LAST_ALTIVEC_REGNO; i++) { @@ -20790,6 +20860,8 @@ rs6000_emit_epilogue (int sibcall) rtx mem = gen_frame_mem (V4SImode, addr); RTVEC_ELT (p, j++) = gen_rtx_SET (VOIDmode, reg, mem); + if (flag_shrink_wrap) + cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, cfa_restores); } for (i = 0; info->first_fp_reg_save + i <= 63; i++) { @@ -20803,6 +20875,8 @@ rs6000_emit_epilogue (int sibcall) ? 
DFmode : SFmode), addr); RTVEC_ELT (p, j++) = gen_rtx_SET (VOIDmode, reg, mem); + if (flag_shrink_wrap) + cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, cfa_restores); } RTVEC_ELT (p, j++) = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, 0)); @@ -20814,8 +20888,14 @@ rs6000_emit_epilogue (int sibcall) = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (SImode, 8)); RTVEC_ELT (p, j++) = gen_rtx_USE (VOIDmode, gen_rtx_REG (SImode, 10)); - emit_jump_insn (gen_rtx_PARALLEL (VOIDmode, p)); + insn = emit_jump_insn (gen_rtx_PARALLEL (VOIDmode, p)); + if (flag_shrink_wrap) + { + REG_NOTES (insn) = cfa_restores; + add_reg_note (insn, REG_CFA_DEF_CFA, sp_reg_rtx); + RTX_FRAME_RELATED_P (insn) = 1; + } return; } @@ -20860,9 +20940,10 @@ rs6000_emit_epilogue (int sibcall) reg = gen_rtx_REG (V4SImode, i); emit_move_insn (reg, mem); - if (offset_below_red_zone_p (info->altivec_save_offset - + (i - info->first_altivec_reg_save) - * 16)) + if (flag_shrink_wrap + || offset_below_red_zone_p (info->altivec_save_offset + + (i - info->first_altivec_reg_save) + * 16)) cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, cfa_restores); } @@ -21001,7 +21082,7 @@ rs6000_emit_epilogue (int sibcall) reg = gen_rtx_REG (V4SImode, i); emit_move_insn (reg, mem); - if (DEFAULT_ABI == ABI_V4) + if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap) cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, cfa_restores); } @@ -21051,8 +21132,7 @@ rs6000_emit_epilogue (int sibcall) emit_move_insn (cr_save_reg, mem); } - /* Set LR here to try to overlap restores below. LR is always saved - above incoming stack, so it never needs REG_CFA_RESTORE. */ + /* Set LR here to try to overlap restores below. */ if (restore_lr && restoring_GPRs_inline) emit_move_insn (gen_rtx_REG (Pmode, LR_REGNO), gen_rtx_REG (Pmode, 0)); @@ -21090,7 +21170,7 @@ rs6000_emit_epilogue (int sibcall) /* Restore GPRs. This is done as a PARALLEL if we are using the load-multiple instructions. 
*/ if (TARGET_SPE_ABI - && info->spe_64bit_regs_used != 0 + && info->spe_64bit_regs_used && info->first_gp_reg_save != 32) { /* Determine whether we can address all of the registers that need @@ -21114,7 +21194,7 @@ rs6000_emit_epilogue (int sibcall) int ool_adjust = (restoring_GPRs_inline ? 0 : (info->first_gp_reg_save - - (FIRST_SAVRES_REGISTER+1))*8); + - (FIRST_SAVRES_REGISTER + 1)) * 8); if (frame_reg_rtx == sp_reg_rtx) frame_reg_rtx = gen_rtx_REG (Pmode, 11); @@ -21145,48 +21225,28 @@ rs6000_emit_epilogue (int sibcall) mem = gen_rtx_MEM (V2SImode, addr); reg = gen_rtx_REG (reg_mode, info->first_gp_reg_save + i); - insn = emit_move_insn (reg, mem); - if (DEFAULT_ABI == ABI_V4) - { - if (frame_pointer_needed - && info->first_gp_reg_save + i - == HARD_FRAME_POINTER_REGNUM) - { - add_reg_note (insn, REG_CFA_DEF_CFA, - plus_constant (frame_reg_rtx, - sp_offset)); - RTX_FRAME_RELATED_P (insn) = 1; - } - - cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, - cfa_restores); - } + emit_move_insn (reg, mem); } } else - { - rtx par; - - par = rs6000_make_savres_rtx (info, gen_rtx_REG (Pmode, 11), - 0, reg_mode, - /*savep=*/false, /*gpr=*/true, - /*lr=*/true); - emit_jump_insn (par); - /* We don't want anybody else emitting things after we jumped - back. */ - return; - } + rs6000_emit_savres_rtx (info, gen_rtx_REG (Pmode, 11), + 0, reg_mode, + /*savep=*/false, /*gpr=*/true, + /*lr=*/true); } else if (!restoring_GPRs_inline) { /* We are jumping to an out-of-line function. */ bool can_use_exit = info->first_fp_reg_save == 64; - rtx par; /* Emit stack reset code if we need it. 
*/ if (can_use_exit) - rs6000_emit_stack_reset (info, sp_reg_rtx, frame_reg_rtx, - sp_offset, can_use_exit); + { + rs6000_emit_stack_reset (info, sp_reg_rtx, frame_reg_rtx, + sp_offset, can_use_exit); + if (info->cr_save_p) + rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple); + } else { emit_insn (gen_add3_insn (gen_rtx_REG (Pmode, DEFAULT_ABI == ABI_AIX @@ -21197,45 +21257,10 @@ rs6000_emit_epilogue (int sibcall) sp_offset += info->fp_size; } - par = rs6000_make_savres_rtx (info, frame_reg_rtx, - info->gp_save_offset, reg_mode, - /*savep=*/false, /*gpr=*/true, - /*lr=*/can_use_exit); - - if (can_use_exit) - { - if (info->cr_save_p) - { - rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple); - if (DEFAULT_ABI == ABI_V4) - cfa_restores - = alloc_reg_note (REG_CFA_RESTORE, - gen_rtx_REG (SImode, CR2_REGNO), - cfa_restores); - } - - emit_jump_insn (par); - - /* We don't want anybody else emitting things after we jumped - back. */ - return; - } - - insn = emit_insn (par); - if (DEFAULT_ABI == ABI_V4) - { - if (frame_pointer_needed) - { - add_reg_note (insn, REG_CFA_DEF_CFA, - plus_constant (frame_reg_rtx, sp_offset)); - RTX_FRAME_RELATED_P (insn) = 1; - } - - for (i = info->first_gp_reg_save; i < 32; i++) - cfa_restores - = alloc_reg_note (REG_CFA_RESTORE, - gen_rtx_REG (reg_mode, i), cfa_restores); - } + rs6000_emit_savres_rtx (info, frame_reg_rtx, + info->gp_save_offset, reg_mode, + /*savep=*/false, /*gpr=*/true, + /*lr=*/can_use_exit); } else if (using_load_multiple) { @@ -21251,17 +21276,8 @@ rs6000_emit_epilogue (int sibcall) rtx reg = gen_rtx_REG (reg_mode, info->first_gp_reg_save + i); RTVEC_ELT (p, i) = gen_rtx_SET (VOIDmode, reg, mem); - if (DEFAULT_ABI == ABI_V4) - cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, - cfa_restores); - } - insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p)); - if (DEFAULT_ABI == ABI_V4 && frame_pointer_needed) - { - add_reg_note (insn, REG_CFA_DEF_CFA, - plus_constant (frame_reg_rtx, sp_offset)); - 
RTX_FRAME_RELATED_P (insn) = 1; } + emit_insn (gen_rtx_PARALLEL (VOIDmode, p)); } else { @@ -21275,24 +21291,70 @@ rs6000_emit_epilogue (int sibcall) rtx mem = gen_frame_mem (reg_mode, addr); rtx reg = gen_rtx_REG (reg_mode, info->first_gp_reg_save + i); - insn = emit_move_insn (reg, mem); - if (DEFAULT_ABI == ABI_V4) - { - if (frame_pointer_needed - && info->first_gp_reg_save + i - == HARD_FRAME_POINTER_REGNUM) - { - add_reg_note (insn, REG_CFA_DEF_CFA, - plus_constant (frame_reg_rtx, sp_offset)); - RTX_FRAME_RELATED_P (insn) = 1; - } - - cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, - cfa_restores); - } + emit_move_insn (reg, mem); } } + if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap) + { + /* If the frame pointer was used then we can't delay emitting + a REG_CFA_DEF_CFA note. This must happen on the insn that + restores the frame pointer, r31. We may have already emitted + a REG_CFA_DEF_CFA note, but that's OK; A duplicate is + discarded by dwarf2cfi.c/dwarf2out.c, and in any case would + be harmless if emitted. */ + if (frame_pointer_needed) + { + insn = get_last_insn (); + add_reg_note (insn, REG_CFA_DEF_CFA, + plus_constant (frame_reg_rtx, sp_offset)); + RTX_FRAME_RELATED_P (insn) = 1; + } + + /* Set up cfa_restores. We always need these when + shrink-wrapping. If not shrink-wrapping then we only need + the cfa_restore when the stack location is no longer valid. + The cfa_restores must be emitted on or before the insn that + invalidates the stack, and of course must not be emitted + before the insn that actually does the restore. The latter + is why the LR cfa_restore condition below is a little + complicated. It's also why it is a bad idea to emit the + cfa_restores as a group on the last instruction here that + actually does a restore: That insn may be reordered with + respect to others doing restores. 
*/ + if (info->cr_save_p) + cfa_restores = alloc_reg_note (REG_CFA_RESTORE, + gen_rtx_REG (SImode, CR2_REGNO), + cfa_restores); + if (flag_shrink_wrap + && (restore_lr + || (info->lr_save_p + && !restoring_GPRs_inline + && info->first_fp_reg_save == 64))) + cfa_restores = alloc_reg_note (REG_CFA_RESTORE, + gen_rtx_REG (Pmode, LR_REGNO), + cfa_restores); + + for (i = info->first_gp_reg_save; i < 32; i++) + if (!restoring_GPRs_inline + || using_load_multiple + || rs6000_reg_live_or_pic_offset_p (i)) + { + rtx reg = gen_rtx_REG (reg_mode, i); + + cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, cfa_restores); + } + } + + if (!restoring_GPRs_inline + && info->first_fp_reg_save == 64) + { + /* We are jumping to an out-of-line function. */ + if (cfa_restores) + emit_cfa_restores (cfa_restores); + return; + } + if (restore_lr && !restoring_GPRs_inline) { rtx mem = gen_frame_mem_offset (Pmode, frame_reg_rtx, @@ -21306,8 +21368,8 @@ rs6000_emit_epilogue (int sibcall) /* Restore fpr's if we need to do it without calling a function. */ if (restoring_FPRs_inline) for (i = 0; i < 64 - info->first_fp_reg_save; i++) - if ((df_regs_ever_live_p (info->first_fp_reg_save+i) - && ! call_used_regs[info->first_fp_reg_save+i])) + if ((df_regs_ever_live_p (info->first_fp_reg_save + i) + && !call_used_regs[info->first_fp_reg_save + i])) { rtx addr, mem, reg; addr = gen_rtx_PLUS (Pmode, frame_reg_rtx, @@ -21321,20 +21383,13 @@ rs6000_emit_epilogue (int sibcall) info->first_fp_reg_save + i); emit_move_insn (reg, mem); - if (DEFAULT_ABI == ABI_V4) - cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, - cfa_restores); + if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap) + cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, cfa_restores); } /* If we saved cr, restore it here. Just those that were used. 
*/ if (info->cr_save_p) - { - rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple); - if (DEFAULT_ABI == ABI_V4) - cfa_restores - = alloc_reg_note (REG_CFA_RESTORE, gen_rtx_REG (SImode, CR2_REGNO), - cfa_restores); - } + rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple); /* If this is V.4, unwind the stack pointer after all of the loads have been done. */ @@ -21362,15 +21417,40 @@ rs6000_emit_epilogue (int sibcall) rtvec p; bool lr = (strategy & REST_NOINLINE_FPRS_DOESNT_RESTORE_LR) == 0; if (! restoring_FPRs_inline) - p = rtvec_alloc (4 + 64 - info->first_fp_reg_save); + { + p = rtvec_alloc (4 + 64 - info->first_fp_reg_save); + RTVEC_ELT (p, 0) = ret_rtx; + } else - p = rtvec_alloc (2); + { + if (cfa_restores) + { + /* We can't hang the cfa_restores off a simple return, + since the shrink-wrap code sometimes uses an existing + return. This means there might be a path from + pre-prologue code to this return, and dwarf2cfi code + wants the eh_frame unwinder state to be the same on + all paths to any point. So we need to emit the + cfa_restores before the return. For -m64 we really + don't need epilogue cfa_restores at all, except for + this irritating dwarf2cfi with shrink-wrap + requirement; The stack red-zone means eh_frame info + from the prologue telling the unwinder to restore + from the stack is perfectly good right to the end of + the function. */ + emit_insn (gen_blockage ()); + emit_cfa_restores (cfa_restores); + cfa_restores = NULL_RTX; + } + p = rtvec_alloc (2); + RTVEC_ELT (p, 0) = simple_return_rtx; + } - RTVEC_ELT (p, 0) = ret_rtx; RTVEC_ELT (p, 1) = ((restoring_FPRs_inline || !lr) - ? gen_rtx_USE (VOIDmode, gen_rtx_REG (Pmode, 65)) + ? gen_rtx_USE (VOIDmode, + gen_rtx_REG (Pmode, LR_REGNO)) : gen_rtx_CLOBBER (VOIDmode, - gen_rtx_REG (Pmode, 65))); + gen_rtx_REG (Pmode, LR_REGNO))); /* If we have to restore more than two FP registers, branch to the restore function. It will return to our caller. 
*/ @@ -21379,6 +21459,12 @@ rs6000_emit_epilogue (int sibcall) int i; rtx sym; + if ((DEFAULT_ABI == ABI_V4 || flag_shrink_wrap) + && lr) + cfa_restores = alloc_reg_note (REG_CFA_RESTORE, + gen_rtx_REG (Pmode, LR_REGNO), + cfa_restores); + sym = rs6000_savres_routine_sym (info, /*savep=*/false, /*gpr=*/false, @@ -21390,20 +21476,32 @@ rs6000_emit_epilogue (int sibcall) ? 1 : 11)); for (i = 0; i < 64 - info->first_fp_reg_save; i++) { - rtx addr, mem; + rtx addr, mem, reg; + addr = gen_rtx_PLUS (Pmode, sp_reg_rtx, - GEN_INT (info->fp_save_offset + 8*i)); + GEN_INT (info->fp_save_offset + 8 * i)); mem = gen_frame_mem (DFmode, addr); + reg = gen_rtx_REG (DFmode, info->first_fp_reg_save + i); - RTVEC_ELT (p, i+4) = - gen_rtx_SET (VOIDmode, - gen_rtx_REG (DFmode, info->first_fp_reg_save + i), - mem); + RTVEC_ELT (p, i + 4) = gen_rtx_SET (VOIDmode, reg, mem); + if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap) + cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, + cfa_restores); } } emit_jump_insn (gen_rtx_PARALLEL (VOIDmode, p)); } + + if (cfa_restores) + { + if (sibcall) + /* Ensure the cfa_restores are hung off an insn that won't + be reordered above other restores. */ + emit_insn (gen_blockage ()); + + emit_cfa_restores (cfa_restores); + } } /* Write function epilogue. 
*/ @@ -21768,7 +21866,7 @@ rs6000_output_mi_thunk (FILE *file, tree gen_rtx_USE (VOIDmode, gen_rtx_REG (SImode, LR_REGNO)), - ret_rtx))); + simple_return_rtx))); SIBLING_CALL_P (insn) = 1; emit_barrier (); Index: gcc/config/rs6000/rs6000.h =================================================================== --- gcc/config/rs6000/rs6000.h (revision 178876) +++ gcc/config/rs6000/rs6000.h (working copy) @@ -894,10 +894,11 @@ extern unsigned rs6000_pointer_size; cr1 (not saved, but used for FP operations) cr0 (not saved, but used for arithmetic operations) cr4, cr3, cr2 (saved) - r0 (not saved; cannot be base reg) r9 (not saved; best for TImode) - r11, r10, r8-r4 (not saved; highest used first to make less conflict) + r10, r8-r4 (not saved; highest first for less conflict with params) r3 (not saved; return value register) + r11 (not saved; later alloc to help shrink-wrap) + r0 (not saved; cannot be base reg) r31 - r13 (saved; order given to save least number) r12 (not saved; if used for DImode or DFmode would use r13) mq (not saved; best to use it if we can) @@ -922,6 +923,14 @@ extern unsigned rs6000_pointer_size; #define MAYBE_R2_FIXED #endif +#if FIXED_R13 == 1 +#define EARLY_R12 12, +#define LATE_R12 +#else +#define EARLY_R12 +#define LATE_R12 12, +#endif + #define REG_ALLOC_ORDER \ {32, \ 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, \ @@ -929,11 +938,11 @@ extern unsigned rs6000_pointer_size; 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, \ 50, 49, 48, 47, 46, \ 75, 74, 69, 68, 72, 71, 70, \ - 0, MAYBE_R2_AVAILABLE \ - 9, 11, 10, 8, 7, 6, 5, 4, \ - 3, \ + MAYBE_R2_AVAILABLE \ + 9, 10, 8, 7, 6, 5, 4, \ + 3, EARLY_R12 11, 0, \ 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, \ - 18, 17, 16, 15, 14, 13, 12, \ + 18, 17, 16, 15, 14, 13, LATE_R12 \ 64, 66, 65, \ 73, 1, MAYBE_R2_FIXED 67, 76, \ /* AltiVec registers. 
*/ \ Index: gcc/config/rs6000/rs6000.md =================================================================== --- gcc/config/rs6000/rs6000.md (revision 178876) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -264,6 +265,12 @@ (define_mode_iterator RECIPF [SF DF V4SF ; Iterator for just SF/DF (define_mode_iterator SFDF [SF DF]) +; Conditional returns. +(define_code_iterator any_return [return simple_return]) +(define_code_attr return_pred [(return "direct_return ()") + (simple_return "")]) +(define_code_attr return_str [(return "") (simple_return "simple_")]) + ; Various instructions that come in SI and DI forms. ; A generic w/d attribute, for things like cmpw/cmpd. (define_mode_attr wd [(QI "b") (HI "h") (SI "w") (DI "d")]) @@ -12814,7 +12831,7 @@ (define_expand "sibcall" (match_operand 1 "" "")) (use (match_operand 2 "" "")) (use (reg:SI LR_REGNO)) - (return)])] + (simple_return)])] "" " { @@ -12838,7 +12855,7 @@ (define_insn "*sibcall_local32" (match_operand 1 "" "g,g")) (use (match_operand:SI 2 "immediate_operand" "O,n")) (use (reg:SI LR_REGNO)) - (return)] + (simple_return)] "(INTVAL (operands[2]) & CALL_LONG) == 0" "* { @@ -12858,7 +12875,7 @@ (define_insn "*sibcall_local64" (match_operand 1 "" "g,g")) (use (match_operand:SI 2 "immediate_operand" "O,n")) (use (reg:SI LR_REGNO)) - (return)] + (simple_return)] "TARGET_64BIT && (INTVAL (operands[2]) & CALL_LONG) == 0" "* { @@ -12879,7 +12896,7 @@ (define_insn "*sibcall_value_local32" (match_operand 2 "" "g,g"))) (use (match_operand:SI 3 "immediate_operand" "O,n")) (use (reg:SI LR_REGNO)) - (return)] + (simple_return)] "(INTVAL (operands[3]) & CALL_LONG) == 0" "* { @@ -12901,7 +12918,7 @@ (define_insn "*sibcall_value_local64" (match_operand 2 "" "g,g"))) (use (match_operand:SI 3 "immediate_operand" "O,n")) (use (reg:SI LR_REGNO)) - (return)] + (simple_return)] "TARGET_64BIT && (INTVAL (operands[3]) & CALL_LONG) == 0" "* { @@ -12921,7 +12938,7 @@ (define_insn "*sibcall_nonlocal_aix<mode (match_operand 1 "" "g,g")) 
(use (match_operand:SI 2 "immediate_operand" "O,O")) (use (reg:SI LR_REGNO)) - (return)] + (simple_return)] "DEFAULT_ABI == ABI_AIX && (INTVAL (operands[2]) & CALL_LONG) == 0" "@ @@ -12936,7 +12953,7 @@ (define_insn "*sibcall_value_nonlocal_ai (match_operand 2 "" "g,g"))) (use (match_operand:SI 3 "immediate_operand" "O,O")) (use (reg:SI LR_REGNO)) - (return)] + (simple_return)] "DEFAULT_ABI == ABI_AIX && (INTVAL (operands[3]) & CALL_LONG) == 0" "@ @@ -12950,7 +12967,7 @@ (define_insn "*sibcall_nonlocal_sysv<mod (match_operand 1 "" "")) (use (match_operand 2 "immediate_operand" "O,n,O,n")) (use (reg:SI LR_REGNO)) - (return)] + (simple_return)] "(DEFAULT_ABI == ABI_DARWIN || DEFAULT_ABI == ABI_V4) && (INTVAL (operands[2]) & CALL_LONG) == 0" @@ -12981,7 +12998,7 @@ (define_expand "sibcall_value" (match_operand 2 "" ""))) (use (match_operand 3 "" "")) (use (reg:SI LR_REGNO)) - (return)])] + (simple_return)])] "" " { @@ -13002,7 +13019,7 @@ (define_insn "*sibcall_value_nonlocal_sy (match_operand 2 "" ""))) (use (match_operand:SI 3 "immediate_operand" "O,n,O,n")) (use (reg:SI LR_REGNO)) - (return)] + (simple_return)] "(DEFAULT_ABI == ABI_DARWIN || DEFAULT_ABI == ABI_V4) && (INTVAL (operands[3]) & CALL_LONG) == 0" @@ -15328,9 +15345,9 @@ (define_insn "" [(match_operand 1 "cc_reg_operand" "y") (const_int 0)]) - (return) + (any_return) (pc)))] - "direct_return ()" + "<return_pred>" "* { return output_cbranch (operands[0], NULL, 0, insn); @@ -15360,8 +15377,8 @@ (define_insn "" "cc_reg_operand" "y") (const_int 0)]) (pc) - (return)))] - "direct_return ()" + (any_return)))] + "<return_pred>" "* { return output_cbranch (operands[0], NULL, 1, insn); @@ -15491,9 +15508,9 @@ (define_insn "jump" "b %l0" [(set_attr "type" "branch")]) -(define_insn "return" - [(return)] - "direct_return ()" +(define_insn "<return_str>return" + [(any_return)] + "<return_pred>" "{br|blr}" [(set_attr "type" "jmpreg")]) @@ -16015,7 +16032,7 @@ (define_insn "*lmw" (set_attr "cell_micro" "always")]) 
(define_insn "*return_internal_<mode>" - [(return) + [(simple_return) (use (match_operand:P 0 "register_operand" "lc"))] "" "b%T0" -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: PowerPC shrink-wrap support 3 of 3 2011-09-28 16:35 ` Alan Modra @ 2011-10-16 20:19 ` David Edelsohn 2011-10-26 13:03 ` Alan Modra 0 siblings, 1 reply; 48+ messages in thread From: David Edelsohn @ 2011-10-16 20:19 UTC (permalink / raw) To: gcc-patches, Bernd Schmidt, Alan Modra On Wed, Sep 28, 2011 at 11:47 AM, Alan Modra <amodra@gmail.com> wrote: > * config/rs6000/rs6000.c (rs6000_make_savres_rtx): Delete unneeded > declaration. > (rs6000_emit_stack_reset): Only return insn emitted when it adjusts sp. > (rs6000_make_savres_rtx): Rename to rs6000_emit_savres_rtx. Use > simple_return in pattern, emit instruction, and set jump_label. > (rs6000_emit_prologue): Update for rs6000_emit_savres_rtx. Use > simple_return rather than return. > (emit_cfa_restores): New function. > (rs6000_emit_epilogue): Emit cfa_restores when flag_shrink_wrap. > Add missing cfa_restores for SAVE_WORLD. Add missing LR cfa_restore > when using out-of-line gpr restore. Add missing LR and FP regs > cfa_restores for out-of-line fpr restore. Consolidate code setting > up cfa_restores. Formatting. Use LR_REGNO define. > (rs6000_output_mi_thunk): Use simple_return rather than return. > * config/rs6000/rs6000.md (sibcall*, sibcall_value*): Likewise. > (return_internal*): Likewise. > (any_return, return_pred, return_str): New iterators. > (return, conditional return insns): Provide both return and > simple_return variants. > * gcc/config/rs6000/rs6000.h (EARLY_R12, LATE_R12): Define. > (REG_ALLOC_ORDER): Move r12 before call-saved regs when FIXED_R13. > Move r11 and r0 later to suit shrink-wrapping. Alan, The patch is okay, although I am not thrilled about the need to change the register allocation order. Thanks, David ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: PowerPC shrink-wrap support 3 of 3 2011-10-16 20:19 ` David Edelsohn @ 2011-10-26 13:03 ` Alan Modra 2011-10-26 13:42 ` Bernd Schmidt 2011-10-31 15:14 ` Alan Modra 0 siblings, 2 replies; 48+ messages in thread From: Alan Modra @ 2011-10-26 13:03 UTC (permalink / raw) To: gcc-patches; +Cc: Bernd Schmidt On Sun, Oct 16, 2011 at 02:51:01PM -0400, David Edelsohn wrote: > The patch is okay, although I am not thrilled about the need to change > the register allocation order. Committed revision 180522. It turns out that shrink-wrapping isn't as effective as it used to be with the 20110915-based sources I was using originally. povray Ray_In_Bound no longer gets the benefit of shrink wrap, likely due to some cfg optimization. We end up with a simple block that just does r3=1 and then jumps to last_bb, reached both from blocks that need a prologue and from blocks that don't. That's enough to kill our current shrink wrap implementation. What we need is something to duplicate these tail blocks. Patch here for comment. I haven't yet run benchmarks, but this passes bootstrap and regression test (on rev 180286, current virgin mainline fails bootstrap on powerpc-linux). * function.c (thread_prologue_and_epilogue_insns): Delete shadowing variables. Don't do prologue register clobber tests when shrink wrapping already failed. Compute tail block candidates for duplicating exit path. Remove these from antic set. Duplicate tails when reached from both blocks needing a prologue/epilogue and blocks not needing such. 
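[Editor's note: the "compute tail block candidates" step in the changelog above is a backward worklist walk from the exit block. A minimal standalone sketch of that walk follows; the toy CFG, its array representation, and all names here are hypothetical illustrations, not GCC internals.]

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBB 6
#define EXIT_BB 5

/* Toy CFG: succs[b] lists the successors of block b, -1 terminated.
   Block 0 is the entry; blocks 2 and 3 are assumed to need a frame.  */
static const int succs[NBB][3] = {
  {1, 2, -1},   /* 0: entry, branches to cheap or heavy path */
  {4, -1, -1},  /* 1: cheap path, no frame needed */
  {3, -1, -1},  /* 2: heavy path, needs a frame */
  {4, -1, -1},  /* 3: more heavy path */
  {5, -1, -1},  /* 4: shared tail (e.g. "r3=1" then exit) */
  {-1, -1, -1}, /* 5: exit */
};

static int nsuccs (int b)
{
  int n = 0;
  while (n < 3 && succs[b][n] >= 0)
    n++;
  return n;
}

/* Compute tail blocks: single-successor blocks that do not themselves
   need a prologue (ANTIC) and lead only to the exit block.  This mirrors
   the backward worklist in the patch, with a plain stack in place of a
   VEC and bitmaps.  */
static void compute_tails (const bool *antic, bool *tail)
{
  int stack[NBB], sp = 0;
  memset (tail, 0, NBB * sizeof *tail);
  tail[EXIT_BB] = true;
  stack[sp++] = EXIT_BB;
  while (sp > 0)
    {
      int b = stack[--sp];
      /* Every predecessor whose single successor is already in the
	 tail set joins it, unless it needs a prologue itself.  */
      for (int p = 0; p < NBB; p++)
	for (int i = 0; i < 3 && succs[p][i] >= 0; i++)
	  if (succs[p][i] == b
	      && nsuccs (p) == 1
	      && !antic[p]
	      && !tail[p])
	    {
	      tail[p] = true;
	      stack[sp++] = p;
	    }
    }
}
```

On this toy CFG the walk marks block 4 (the shared tail) and the no-prologue path feeding it, while the blocks that need a frame stay out of the tail set; block 4 is then exactly the kind of block the patch duplicates, since it is reached both from block 1 (no prologue) and block 3 (prologue).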
Index: gcc/function.c =================================================================== --- gcc/function.c (revision 180467) +++ gcc/function.c (working copy) @@ -5697,11 +5697,11 @@ thread_prologue_and_epilogue_insns (void HARD_REG_SET prologue_clobbered, prologue_used, live_on_edge; HARD_REG_SET set_up_by_prologue; rtx p_insn; - VEC(basic_block, heap) *vec; basic_block bb; bitmap_head bb_antic_flags; bitmap_head bb_on_list; + bitmap_head bb_tail; if (dump_file) fprintf (dump_file, "Attempting shrink-wrapping optimization.\n"); @@ -5732,6 +5732,7 @@ thread_prologue_and_epilogue_insns (void bitmap_initialize (&bb_antic_flags, &bitmap_default_obstack); bitmap_initialize (&bb_on_list, &bitmap_default_obstack); + bitmap_initialize (&bb_tail, &bitmap_default_obstack); /* Find the set of basic blocks that require a stack frame. */ @@ -5774,19 +5775,22 @@ thread_prologue_and_epilogue_insns (void } } + /* Save a copy of blocks that really need a prologue. */ + bitmap_copy (&bb_antic_flags, &bb_flags); + /* For every basic block that needs a prologue, mark all blocks reachable from it, so as to ensure they are also seen as requiring a prologue. */ while (!VEC_empty (basic_block, vec)) { basic_block tmp_bb = VEC_pop (basic_block, vec); - edge e; - edge_iterator ei; + FOR_EACH_EDGE (e, ei, tmp_bb->succs) if (e->dest != EXIT_BLOCK_PTR && bitmap_set_bit (&bb_flags, e->dest->index)) VEC_quick_push (basic_block, vec, e->dest); } + /* If the last basic block contains only a label, we'll be able to convert jumps to it to (potentially conditional) return insns later. This means we don't necessarily need a prologue @@ -5799,14 +5803,29 @@ thread_prologue_and_epilogue_insns (void goto fail_shrinkwrap; } + /* Find the set of basic blocks that need no prologue and only + go to the exit. 
*/ + bitmap_set_bit (&bb_tail, EXIT_BLOCK_PTR->index); + VEC_quick_push (basic_block, vec, EXIT_BLOCK_PTR); + while (!VEC_empty (basic_block, vec)) + { + basic_block tmp_bb = VEC_pop (basic_block, vec); + + FOR_EACH_EDGE (e, ei, tmp_bb->preds) + if (single_succ_p (e->src) + && !bitmap_bit_p (&bb_antic_flags, e->src->index) + && bitmap_set_bit (&bb_tail, e->src->index)) + VEC_quick_push (basic_block, vec, e->src); + } + /* Now walk backwards from every block that is marked as needing - a prologue to compute the bb_antic_flags bitmap. */ - bitmap_copy (&bb_antic_flags, &bb_flags); + a prologue to compute the bb_antic_flags bitmap. Exclude + tail blocks; They can be duplicated to be used on paths not + needing a prologue. */ + bitmap_and_compl (&bb_antic_flags, &bb_flags, &bb_tail); FOR_EACH_BB (bb) { - edge e; - edge_iterator ei; - if (!bitmap_bit_p (&bb_flags, bb->index)) + if (!bitmap_bit_p (&bb_antic_flags, bb->index)) continue; FOR_EACH_EDGE (e, ei, bb->preds) if (!bitmap_bit_p (&bb_antic_flags, e->src->index) @@ -5816,8 +5835,6 @@ thread_prologue_and_epilogue_insns (void while (!VEC_empty (basic_block, vec)) { basic_block tmp_bb = VEC_pop (basic_block, vec); - edge e; - edge_iterator ei; bool all_set = true; bitmap_clear_bit (&bb_on_list, tmp_bb->index); @@ -5862,28 +5879,172 @@ thread_prologue_and_epilogue_insns (void } } - /* Test whether the prologue is known to clobber any register - (other than FP or SP) which are live on the edge. 
*/ - CLEAR_HARD_REG_BIT (prologue_clobbered, STACK_POINTER_REGNUM); - if (frame_pointer_needed) - CLEAR_HARD_REG_BIT (prologue_clobbered, HARD_FRAME_POINTER_REGNUM); - CLEAR_HARD_REG_SET (live_on_edge); - reg_set_to_hard_reg_set (&live_on_edge, - df_get_live_in (entry_edge->dest)); - if (hard_reg_set_intersect_p (live_on_edge, prologue_clobbered)) + if (entry_edge != orig_entry_edge) { - entry_edge = orig_entry_edge; - if (dump_file) - fprintf (dump_file, "Shrink-wrapping aborted due to clobber.\n"); + /* Test whether the prologue is known to clobber any register + (other than FP or SP) which are live on the edge. */ + CLEAR_HARD_REG_BIT (prologue_clobbered, STACK_POINTER_REGNUM); + if (frame_pointer_needed) + CLEAR_HARD_REG_BIT (prologue_clobbered, HARD_FRAME_POINTER_REGNUM); + CLEAR_HARD_REG_SET (live_on_edge); + reg_set_to_hard_reg_set (&live_on_edge, + df_get_live_in (entry_edge->dest)); + if (hard_reg_set_intersect_p (live_on_edge, prologue_clobbered)) + { + entry_edge = orig_entry_edge; + if (dump_file) + fprintf (dump_file, + "Shrink-wrapping aborted due to clobber.\n"); + } } - else if (entry_edge != orig_entry_edge) + if (entry_edge != orig_entry_edge) { crtl->shrink_wrapped = true; if (dump_file) fprintf (dump_file, "Performing shrink-wrapping.\n"); + + /* Find tail blocks reachable from both blocks needing a + prologue and blocks not needing a prologue. */ + bitmap_clear_bit (&bb_tail, EXIT_BLOCK_PTR->index); + if (!bitmap_empty_p (&bb_tail)) + FOR_EACH_BB (bb) + { + bool some_pro, some_no_pro; + if (!bitmap_bit_p (&bb_tail, bb->index)) + continue; + some_pro = some_no_pro = false; + FOR_EACH_EDGE (e, ei, bb->preds) + { + if (bitmap_bit_p (&bb_flags, e->src->index)) + some_pro = true; + else + some_no_pro = true; + } + if (some_pro && some_no_pro) + VEC_quick_push (basic_block, vec, bb); + else + bitmap_clear_bit (&bb_tail, bb->index); + } + /* Find the head of each tail. 
*/ + while (!VEC_empty (basic_block, vec)) + { + basic_block tbb = VEC_pop (basic_block, vec); + + if (!bitmap_bit_p (&bb_tail, tbb->index)) + continue; + + while (single_succ_p (tbb)) + { + tbb = single_succ (tbb); + bitmap_clear_bit (&bb_tail, tbb->index); + } + } + /* Now duplicate the tails. */ + if (!bitmap_empty_p (&bb_tail)) + FOR_EACH_BB_REVERSE (bb) + { + basic_block copy_bb, next_bb, fall_bb; + edge fall_into; + rtx insert_point, last; + + if (!bitmap_clear_bit (&bb_tail, bb->index)) + continue; + + copy_bb = NULL; + next_bb = single_succ (bb); + fall_into = find_fallthru_edge (bb->preds); + fall_bb = (fall_into ? fall_into->src : NULL); + + if (fall_into + && !bitmap_bit_p (&bb_flags, fall_bb->index)) + e = fall_into; + else + { + FOR_EACH_EDGE (e, ei, bb->preds) + if (!bitmap_bit_p (&bb_flags, e->src->index)) + break; + gcc_assert (e); + } + + /* Create a copy of BB, instructions and all, for + use on paths that don't need a prologue. We know + BB has a single successor, so there is no need to + copy a jump at the end of BB. */ + start_sequence (); + last = BB_END (bb); + if (simplejump_p (last)) + last = PREV_INSN (last); + duplicate_insn_chain (BB_HEAD (bb), last); + copy_bb = split_edge (e); + emit_insn_after (get_insns (), BB_END (copy_bb)); + end_sequence (); + df_set_bb_dirty (copy_bb); + force_nonfallthru_and_redirect (single_succ_edge (copy_bb), + EXIT_BLOCK_PTR, + simple_return_rtx); + insert_point = BB_END (copy_bb); + gcc_assert (JUMP_P (insert_point)); + + /* If there was a fall-thru edge into bb, then + creating copy_bb may have required inserting a + jump block. Set bb_flags for the jump block. */ + if (fall_bb + && bitmap_bit_p (&bb_flags, fall_bb->index)) + FOR_EACH_EDGE (e, ei, fall_bb->succs) + if (e->dest != EXIT_BLOCK_PTR) + bitmap_set_bit (&bb_flags, e->dest->index); + + /* Redirect all the paths that need no prologue into + copy_bb. 
*/ + for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); ) + if (!bitmap_bit_p (&bb_flags, e->src->index)) + { + redirect_edge_and_branch_force (e, copy_bb); + continue; + } + else + ei_next (&ei); + + while (next_bb != EXIT_BLOCK_PTR) + { + basic_block tbb = next_bb; + next_bb = single_succ (tbb); + e = split_block (copy_bb, PREV_INSN (insert_point)); + copy_bb = e->dest; + start_sequence (); + last = BB_END (tbb); + if (simplejump_p (last)) + last = PREV_INSN (last); + duplicate_insn_chain (BB_HEAD (tbb), last); + emit_insn_before (get_insns (), insert_point); + end_sequence (); + + for (ei = ei_start (tbb->preds); (e = ei_safe_edge (ei)); ) + if (!bitmap_bit_p (&bb_flags, e->src->index)) + { + redirect_edge_and_branch_force (e, copy_bb); + continue; + } + else + ei_next (&ei); + } + /* Our tail may just end in a sibling call, in which + case we don't want the simple_return jump added by + force_nonfallthru_and_redirect. */ + if (CALL_P (PREV_INSN (insert_point)) + && SIBLING_CALL_P (PREV_INSN (insert_point))) + { + delete_insn (insert_point); + e = single_succ_edge (copy_bb); + e->flags = EDGE_SIBCALL | EDGE_ABNORMAL; + } + if (bitmap_empty_p (&bb_tail)) + break; + } } fail_shrinkwrap: + bitmap_clear (&bb_tail); bitmap_clear (&bb_antic_flags); bitmap_clear (&bb_on_list); VEC_free (basic_block, heap, vec); -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: PowerPC shrink-wrap support 3 of 3 2011-10-26 13:03 ` Alan Modra @ 2011-10-26 13:42 ` Bernd Schmidt 2011-10-26 14:40 ` Alan Modra 2011-10-31 15:14 ` Alan Modra 1 sibling, 1 reply; 48+ messages in thread From: Bernd Schmidt @ 2011-10-26 13:42 UTC (permalink / raw) To: gcc-patches On 10/26/11 14:27, Alan Modra wrote: > Committed revision 180522. It turns out that shrink-wrapping isn't as > effective as it used to be with the 20110915 based sources I was using > originally. povray Ray_In_Bound no longer gets the benefit of shrink > wrap, likely due to some cfg optimization. We end up with a simple > block that just does r3=1 then jumps to last_bb being reached from > blocks that need a prologue as well as blocks that don't. That's > enough to kill our current shrink wrap implementation. What we need > is something to duplicate these tail blocks.. Would it work to insert the epilogue on some edges to this R3=1 block, and not on the others? (How many edges of each kind are there?) Now that we have an initial patch in the tree and it mostly seems to work, we can think about making it a little stronger - the initial implementation is really quite conservative. Bernd ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: PowerPC shrink-wrap support 3 of 3 2011-10-26 13:42 ` Bernd Schmidt @ 2011-10-26 14:40 ` Alan Modra 2011-10-26 14:44 ` Bernd Schmidt 2011-10-28 0:41 ` Alan Modra 0 siblings, 2 replies; 48+ messages in thread From: Alan Modra @ 2011-10-26 14:40 UTC (permalink / raw) To: Bernd Schmidt; +Cc: gcc-patches On Wed, Oct 26, 2011 at 03:01:01PM +0200, Bernd Schmidt wrote: > On 10/26/11 14:27, Alan Modra wrote: > > Committed revision 180522. It turns out that shrink-wrapping isn't as > > effective as it used to be with the 20110915 based sources I was using > > originally. povray Ray_In_Bound no longer gets the benefit of shrink > > wrap, likely due to some cfg optimization. We end up with a simple > > block that just does r3=1 then jumps to last_bb being reached from > > blocks that need a prologue as well as blocks that don't. That's > > enough to kill our current shrink wrap implementation. What we need > > is something to duplicate these tail blocks.. > > Would it work to insert the epilogue on some edges to this R3=1 block, > and not on the others? Wouldn't you need to modify all the target epilogue code? Our epilogues return. > (How many edges of each kind are there?) In the povray case there was one edge of each kind, but I have seen other cases where there were 4 edges from blocks needing no prologue and 2 edges from blocks needing a prologue. I can't tell you what the testcase was now; it was something I looked at when ironing out bugs in my code. You wouldn't believe how many ways it is possible to write buggy cfg manipulation code. I guess the tradeoff between the classic shrink-wrap epilogue scheme and my duplicate tail idea is whether duplicating tail blocks adds more code than duplicating epilogues. From what I've seen, the duplicate tails are generally very small. I guess I should dump out some info so we can get a better idea. -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 48+ messages in thread
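[Editor's note: the "edges of each kind" question above corresponds to the some_pro/some_no_pro test in the committed patch: a tail block is only a duplication candidate when it has at least one predecessor of each kind. A hedged, standalone restatement of that check; the flat arrays here stand in for GCC's edge lists and bitmaps.]

```c
#include <assert.h>
#include <stdbool.h>

/* Count a tail block's predecessor edges of each kind.  NEEDS_PRO
   flags blocks lying on paths that require a prologue; PREDS/NPREDS
   list the tail block's predecessor blocks.  Duplicating the tail only
   pays off when both counts are nonzero, which is what the patch's
   some_pro/some_no_pro test decides.  */
static bool
worth_duplicating (const bool *needs_pro, const int *preds, int npreds,
		   int *n_pro, int *n_no_pro)
{
  *n_pro = *n_no_pro = 0;
  for (int i = 0; i < npreds; i++)
    {
      if (needs_pro[preds[i]])
	++*n_pro;
      else
	++*n_no_pro;
    }
  return *n_pro > 0 && *n_no_pro > 0;
}
```

With Alan's observed case of 4 no-prologue edges and 2 prologue edges, both counts are nonzero and the tail qualifies; a tail reached only from prologue paths (or only from prologue-free paths) is left alone.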
* Re: PowerPC shrink-wrap support 3 of 3 2011-10-26 14:40 ` Alan Modra @ 2011-10-26 14:44 ` Bernd Schmidt 2011-10-26 15:40 ` Alan Modra 2011-10-28 0:41 ` Alan Modra 1 sibling, 1 reply; 48+ messages in thread From: Bernd Schmidt @ 2011-10-26 14:44 UTC (permalink / raw) To: gcc-patches On 10/26/11 15:54, Alan Modra wrote: > On Wed, Oct 26, 2011 at 03:01:01PM +0200, Bernd Schmidt wrote: >> On 10/26/11 14:27, Alan Modra wrote: >>> Committed revision 180522. It turns out that shrink-wrapping isn't as >>> effective as it used to be with the 20110915 based sources I was using >>> originally. povray Ray_In_Bound no longer gets the benefit of shrink >>> wrap, likely due to some cfg optimization. We end up with a simple >>> block that just does r3=1 then jumps to last_bb being reached from >>> blocks that need a prologue as well as blocks that don't. That's >>> enough to kill our current shrink wrap implementation. What we need >>> is something to duplicate these tail blocks.. >> >> Would it work to insert the epilogue on some edges to this R3=1 block, >> and not on the others? > > Wouldn't you need to modify all the target epilogue code? Our > epilogues return. Not all of them at once - you could require that if a target has a simple_return pattern, the epilogue does not return. But yes, these kinds of complications are a reason why I went for a simple variant first. > I guess the tradeoff between the classic shrink-wrap epilogue scheme > and my duplicate tail idea is whether duplicating tail blocks adds > more code than duplicating epilogues. From what I've seen, the > duplicate tails are generally very small. I guess I should dump out > some info so we can get a better idea. I suppose if one wanted to avoid inserting more than one epilogue for code-size reasons, one could make a new basic block containing the epilogue, and redirect edges as appropriate. Bernd ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: PowerPC shrink-wrap support 3 of 3 2011-10-26 14:44 ` Bernd Schmidt @ 2011-10-26 15:40 ` Alan Modra 0 siblings, 0 replies; 48+ messages in thread From: Alan Modra @ 2011-10-26 15:40 UTC (permalink / raw) To: Bernd Schmidt; +Cc: gcc-patches On Wed, Oct 26, 2011 at 03:59:36PM +0200, Bernd Schmidt wrote: > On 10/26/11 15:54, Alan Modra wrote: > > I guess the tradeoff between the classic shrink-wrap epilogue scheme > > and my duplicate tail idea is whether duplicating tail blocks adds > > more code than duplicating epilogues. From what I've seen, the > > duplicate tails are generally very small. I guess I should dump out > > some info so we can get a better idea. > > I suppose if one wanted to avoid inserting more than one epilogue for > code-size reasons, one could make a new basic block containing the > epilogue, and redirect edges as appropriate. Suppose you have a function that returns r3=0 in one tail block and r3=1 in another, and these blocks are reached both by paths needing a prologue and by paths not needing a prologue. Which seems a likely common case. I'm fairly certain that would require two copies of the normal epilogue, or duplicating the tail blocks. (But it's late here and I'm ready to nod off so may not be thinking straight.) -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 48+ messages in thread
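[Editor's note: the two-tail scenario Alan describes is easy to picture at the source level. In a hypothetical function shaped like the one below, both the `return 0` and `return 1` tails are reachable from paths that need no frame and from the path through the call that does. `heavy` is given a trivial body here only so the sketch compiles and can be tested; it stands in for a real external call that forces LR to be saved.]

```c
#include <assert.h>

/* Stand-in for an external call that would force a stack frame.  */
static int heavy (int x) { return x - 1; }

/* Hypothetical function where shrink-wrapping wants to duplicate two
   tail blocks: each return value (r3 = 0 and r3 = 1 on PowerPC) is
   produced both on the cheap early-out paths, which need no prologue,
   and after the call to heavy (), which does.  */
int classify (int x)
{
  if (x == 0)
    return 0;          /* r3 = 0, reachable without a prologue */
  if (x == 1)
    return 1;          /* r3 = 1, reachable without a prologue */
  if (heavy (x) > 0)
    return 1;          /* r3 = 1 again, but now a prologue ran */
  return 0;            /* r3 = 0 again, after the prologue */
}
```

Sharing either epilogue between the two kinds of paths would need two epilogue copies under the classic scheme, whereas duplicating the tiny tail blocks (just the r3 sets) costs almost nothing.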
* Re: PowerPC shrink-wrap support 3 of 3 2011-10-26 14:40 ` Alan Modra 2011-10-26 14:44 ` Bernd Schmidt @ 2011-10-28 0:41 ` Alan Modra 1 sibling, 0 replies; 48+ messages in thread From: Alan Modra @ 2011-10-28 0:41 UTC (permalink / raw) To: Bernd Schmidt, gcc-patches On Thu, Oct 27, 2011 at 12:24:46AM +1030, Alan Modra wrote: > more code than duplicating epilogues. From what I've seen, the > duplicate tails are generally very small. I guess I should dump out > some info so we can get a better idea. There were 545 occurrences of shrink-wrap in the gcc/ dir for a -O2 powerpc-linux bootstrap. I counted active insns in duplicated blocks. 244 had zero insns (all the ones I looked at were just uses of r3), 96 had one insn (all the ones I looked at were setting r3), with the remainder being:
cfgrtl.c.199r.pro_and_epilogue:Duplicating bb 6, 2 insns.
  tail call
function.c.199r.pro_and_epilogue:Duplicating bb 17, 2 insns.
  setting a value via pointer (*poffset in instantiate_new_reg)
insn-automata.c.199r.pro_and_epilogue:Duplicating bb 229, 2 insns.
  tail call
varasm.c.199r.pro_and_epilogue:Duplicating bb 48, 2 insns.
  setting a global var
rs6000.c.199r.pro_and_epilogue:Duplicating bb 300, 3 insns.
  tail call
rs6000.c.199r.pro_and_epilogue:Duplicating bb 8, 3 insns.
  tail call
toplev.c.199r.pro_and_epilogue:Duplicating bb 5, 4 insns.
  loading two word return value from "random_seed" var.
reginfo.c.199r.pro_and_epilogue:Duplicating bb 4, 7 insns.
  setting a two word global var and r3
dbxout.c.199r.pro_and_epilogue:Duplicating bb 4, 8 insns.
  tail call
dbxout.c.199r.pro_and_epilogue:Duplicating bb 4, 8 insns.
  tail call
Note that having 350 duplicated blocks doesn't mean we had 350 extra cases of shrink-wrapping, because some tails were two blocks, and I recall seeing a case when developing my code where one function had two duplicated tails. Certainly many more shrink-wrapped functions though. 
:-) -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: PowerPC shrink-wrap support 3 of 3 2011-10-26 13:03 ` Alan Modra 2011-10-26 13:42 ` Bernd Schmidt @ 2011-10-31 15:14 ` Alan Modra 2011-11-01 15:34 ` Alan Modra 1 sibling, 1 reply; 48+ messages in thread From: Alan Modra @ 2011-10-31 15:14 UTC (permalink / raw) To: gcc-patches, Bernd Schmidt So I'm at the point where I'm reasonably happy with this work. This patch doesn't do anything particularly clever regarding our shrink-wrap implementation. We still only insert one copy of the prologue, and one of the epilogue, in thread_prologue_and_epilogue. All it really does is replace Bernd's !last_bb_active code (allowing one tail block with no active insns to be shared by paths needing a prologue and paths not needing a prologue) with what I think is conceptually simpler: duplicating a shared tail block. Then I extend this to duplicating a chain of tail blocks. That leads to some simplification, as all the special cases and restrictions of !last_bb_active disappear. For example, convert_jumps_to_returns looks much like the code in gcc-4.6. We also get many more functions being shrink-wrapped. Some numbers from my latest gcc bootstraps:
powerpc-linux
.../gcc-virgin/gcc> grep 'Performing shrink' *.pro_and_epilogue | wc -l
453
.../gcc-curr/gcc> grep 'Performing shrink' *.pro_and_epilogue | wc -l
648
i686-linux
.../gcc-virgin/gcc$ grep 'Performing shrink' *pro_and_epilogue | wc -l
329
.../gcc-curr/gcc$ grep 'Performing shrink' *.pro_and_epilogue | wc -l
416
Bits left to do:
- limit size of duplicated tails
- don't duplicate sibling call blocks, but instead split the block after the sibling call epilogue has been added, redirecting non-prologue paths past the epilogue.
Is this OK to apply as is? * function.c (bb_active_p): Delete. (dup_block_and_redirect, active_insn_between): New functions. (convert_jumps_to_returns, emit_return_for_exit): New functions, split out from.. (thread_prologue_and_epilogue_insns): ..here. Delete shadowing variables. 
Don't do prologue register clobber tests when shrink wrapping already failed. Delete all last_bb_active code. Instead compute tail block candidates for duplicating exit path. Remove these from antic set. Duplicate tails when reached from both blocks needing a prologue/epilogue and blocks not needing such. Index: gcc/function.c =================================================================== *** gcc/function.c (revision 180588) --- gcc/function.c (working copy) *************** set_return_jump_label (rtx returnjump) *** 5514,5535 **** JUMP_LABEL (returnjump) = ret_rtx; } ! /* Return true if BB has any active insns. */ static bool ! bb_active_p (basic_block bb) { rtx label; ! /* Test whether there are active instructions in BB. */ ! label = BB_END (bb); ! while (label && !LABEL_P (label)) { ! if (active_insn_p (label)) ! break; ! label = PREV_INSN (label); } ! return BB_HEAD (bb) != label || !LABEL_P (label); } /* Generate the prologue and epilogue RTL if the machine supports it. Thread this into place with notes indicating where the prologue ends and where --- 5514,5698 ---- JUMP_LABEL (returnjump) = ret_rtx; } ! #ifdef HAVE_simple_return ! /* Create a copy of BB instructions and insert at BEFORE. Redirect ! preds of BB to COPY_BB if they don't appear in NEED_PROLOGUE. */ ! static void ! dup_block_and_redirect (basic_block bb, basic_block copy_bb, rtx before, ! bitmap_head *need_prologue) ! { ! edge_iterator ei; ! edge e; ! rtx insn = BB_END (bb); ! ! /* We know BB has a single successor, so there is no need to copy a ! simple jump at the end of BB. */ ! if (simplejump_p (insn)) ! insn = PREV_INSN (insn); ! ! start_sequence (); ! duplicate_insn_chain (BB_HEAD (bb), insn); ! if (dump_file) ! { ! unsigned count = 0; ! for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) ! if (active_insn_p (insn)) ! ++count; ! fprintf (dump_file, "Duplicating bb %d to bb %d, %u active insns.\n", ! bb->index, copy_bb->index, count); ! } ! insn = get_insns (); ! end_sequence (); ! 
emit_insn_before (insn, before); ! ! /* Redirect all the paths that need no prologue into copy_bb. */ ! for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); ) ! if (!bitmap_bit_p (need_prologue, e->src->index)) ! { ! redirect_edge_and_branch_force (e, copy_bb); ! continue; ! } ! else ! ei_next (&ei); ! } ! #endif ! ! #if defined (HAVE_return) || defined (HAVE_simple_return) ! /* Return true if there are any active insns between HEAD and TAIL. */ static bool ! active_insn_between (rtx head, rtx tail) ! { ! while (tail) ! { ! if (active_insn_p (tail)) ! return true; ! if (tail == head) ! return false; ! tail = PREV_INSN (tail); ! } ! return false; ! } ! ! /* LAST_BB is a block that exits, and empty of active instructions. ! Examine its predecessors for jumps that can be converted to ! (conditional) returns. */ ! static VEC (edge, heap) * ! convert_jumps_to_returns (basic_block last_bb, bool simple_p, ! VEC (edge, heap) *unconverted ATTRIBUTE_UNUSED) { + int i; + basic_block bb; rtx label; + edge_iterator ei; + edge e; + VEC(basic_block,heap) *src_bbs; + + src_bbs = VEC_alloc (basic_block, heap, EDGE_COUNT (last_bb->preds)); + FOR_EACH_EDGE (e, ei, last_bb->preds) + if (e->src != ENTRY_BLOCK_PTR) + VEC_quick_push (basic_block, src_bbs, e->src); ! label = BB_HEAD (last_bb); ! ! FOR_EACH_VEC_ELT (basic_block, src_bbs, i, bb) { ! rtx jump = BB_END (bb); ! ! if (!JUMP_P (jump) || JUMP_LABEL (jump) != label) ! continue; ! ! e = find_edge (bb, last_bb); ! ! /* If we have an unconditional jump, we can replace that ! with a simple return instruction. */ ! if (simplejump_p (jump)) ! { ! /* The use of the return register might be present in the exit ! fallthru block. Either: ! - removing the use is safe, and we should remove the use in ! the exit fallthru block, or ! - removing the use is not safe, and we should add it here. ! For now, we conservatively choose the latter. Either of the ! 2 helps in crossjumping. */ ! emit_use_return_register_into_block (bb); ! ! 
emit_return_into_block (simple_p, bb); ! delete_insn (jump); ! } ! ! /* If we have a conditional jump branching to the last ! block, we can try to replace that with a conditional ! return instruction. */ ! else if (condjump_p (jump)) ! { ! rtx dest; ! ! if (simple_p) ! dest = simple_return_rtx; ! else ! dest = ret_rtx; ! if (!redirect_jump (jump, dest, 0)) ! { ! #ifdef HAVE_simple_return ! if (simple_p) ! { ! if (dump_file) ! fprintf (dump_file, ! "Failed to redirect bb %d branch.\n", bb->index); ! VEC_safe_push (edge, heap, unconverted, e); ! } ! #endif ! continue; ! } ! ! /* See comment in simplejump_p case above. */ ! emit_use_return_register_into_block (bb); ! ! /* If this block has only one successor, it both jumps ! and falls through to the fallthru block, so we can't ! delete the edge. */ ! if (single_succ_p (bb)) ! continue; ! } ! else ! { ! #ifdef HAVE_simple_return ! if (simple_p) ! { ! if (dump_file) ! fprintf (dump_file, ! "Failed to redirect bb %d branch.\n", bb->index); ! VEC_safe_push (edge, heap, unconverted, e); ! } ! #endif ! continue; ! } ! ! /* Fix up the CFG for the successful change we just made. */ ! redirect_edge_succ (e, EXIT_BLOCK_PTR); ! } ! VEC_free (basic_block, heap, src_bbs); ! return unconverted; ! } ! ! /* Emit a return insn for the exit fallthru block. */ ! static basic_block ! emit_return_for_exit (edge exit_fallthru_edge, bool simple_p) ! { ! basic_block last_bb = exit_fallthru_edge->src; ! ! if (JUMP_P (BB_END (last_bb))) ! { ! last_bb = split_edge (exit_fallthru_edge); ! exit_fallthru_edge = single_succ_edge (last_bb); } ! emit_barrier_after (BB_END (last_bb)); ! emit_return_into_block (simple_p, last_bb); ! exit_fallthru_edge->flags &= ~EDGE_FALLTHRU; ! return last_bb; } + #endif + /* Generate the prologue and epilogue RTL if the machine supports it. 
Thread this into place with notes indicating where the prologue ends and where *************** static void *** 5583,5602 **** thread_prologue_and_epilogue_insns (void) { bool inserted; - basic_block last_bb; - bool last_bb_active ATTRIBUTE_UNUSED; #ifdef HAVE_simple_return ! VEC (rtx, heap) *unconverted_simple_returns = NULL; ! basic_block simple_return_block_hot = NULL; ! basic_block simple_return_block_cold = NULL; bool nonempty_prologue; #endif ! rtx returnjump ATTRIBUTE_UNUSED; rtx seq ATTRIBUTE_UNUSED, epilogue_end ATTRIBUTE_UNUSED; rtx prologue_seq ATTRIBUTE_UNUSED, split_prologue_seq ATTRIBUTE_UNUSED; edge e, entry_edge, orig_entry_edge, exit_fallthru_edge; edge_iterator ei; - bitmap_head bb_flags; df_analyze (); --- 5746,5761 ---- thread_prologue_and_epilogue_insns (void) { bool inserted; #ifdef HAVE_simple_return ! VEC (edge, heap) *unconverted_simple_returns = NULL; bool nonempty_prologue; + bitmap_head bb_flags; #endif ! rtx returnjump; rtx seq ATTRIBUTE_UNUSED, epilogue_end ATTRIBUTE_UNUSED; rtx prologue_seq ATTRIBUTE_UNUSED, split_prologue_seq ATTRIBUTE_UNUSED; edge e, entry_edge, orig_entry_edge, exit_fallthru_edge; edge_iterator ei; df_analyze (); *************** thread_prologue_and_epilogue_insns (void *** 5614,5631 **** entry_edge = single_succ_edge (ENTRY_BLOCK_PTR); orig_entry_edge = entry_edge; - exit_fallthru_edge = find_fallthru_edge (EXIT_BLOCK_PTR->preds); - if (exit_fallthru_edge != NULL) - { - last_bb = exit_fallthru_edge->src; - last_bb_active = bb_active_p (last_bb); - } - else - { - last_bb = NULL; - last_bb_active = false; - } - split_prologue_seq = NULL_RTX; if (flag_split_stack && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl)) --- 5773,5778 ---- *************** thread_prologue_and_epilogue_insns (void *** 5675,5683 **** } #endif bitmap_initialize (&bb_flags, &bitmap_default_obstack); - #ifdef HAVE_simple_return /* Try to perform a kind of shrink-wrapping, making sure the prologue/epilogue is emitted only around 
those parts of the function that require it. */ --- 5822,5830 ---- } #endif + #ifdef HAVE_simple_return bitmap_initialize (&bb_flags, &bitmap_default_obstack); /* Try to perform a kind of shrink-wrapping, making sure the prologue/epilogue is emitted only around those parts of the function that require it. */ *************** thread_prologue_and_epilogue_insns (void *** 5697,5707 **** HARD_REG_SET prologue_clobbered, prologue_used, live_on_edge; HARD_REG_SET set_up_by_prologue; rtx p_insn; - VEC(basic_block, heap) *vec; basic_block bb; bitmap_head bb_antic_flags; bitmap_head bb_on_list; if (dump_file) fprintf (dump_file, "Attempting shrink-wrapping optimization.\n"); --- 5844,5854 ---- HARD_REG_SET prologue_clobbered, prologue_used, live_on_edge; HARD_REG_SET set_up_by_prologue; rtx p_insn; VEC(basic_block, heap) *vec; basic_block bb; bitmap_head bb_antic_flags; bitmap_head bb_on_list; + bitmap_head bb_tail; if (dump_file) fprintf (dump_file, "Attempting shrink-wrapping optimization.\n"); *************** thread_prologue_and_epilogue_insns (void *** 5726,5737 **** prepare_shrink_wrap (entry_edge->dest); - /* That may have inserted instructions into the last block. */ - if (last_bb && !last_bb_active) - last_bb_active = bb_active_p (last_bb); - bitmap_initialize (&bb_antic_flags, &bitmap_default_obstack); bitmap_initialize (&bb_on_list, &bitmap_default_obstack); /* Find the set of basic blocks that require a stack frame. */ --- 5873,5881 ---- prepare_shrink_wrap (entry_edge->dest); bitmap_initialize (&bb_antic_flags, &bitmap_default_obstack); bitmap_initialize (&bb_on_list, &bitmap_default_obstack); + bitmap_initialize (&bb_tail, &bitmap_default_obstack); /* Find the set of basic blocks that require a stack frame. */ *************** thread_prologue_and_epilogue_insns (void *** 5750,5812 **** FOR_EACH_BB (bb) { rtx insn; ! /* As a special case, check for jumps to the last bb that ! cannot successfully be converted to simple_returns later ! 
on, and mark them as requiring a frame. These are ! conditional jumps that jump to their fallthru block, so ! it's not a case that is expected to occur often. */ ! if (JUMP_P (BB_END (bb)) && any_condjump_p (BB_END (bb)) ! && single_succ_p (bb) ! && !last_bb_active ! && single_succ (bb) == last_bb) ! { ! bitmap_set_bit (&bb_flags, bb->index); ! VEC_quick_push (basic_block, vec, bb); ! } ! else ! FOR_BB_INSNS (bb, insn) ! if (requires_stack_frame_p (insn, prologue_used, ! set_up_by_prologue)) ! { ! bitmap_set_bit (&bb_flags, bb->index); ! VEC_quick_push (basic_block, vec, bb); ! break; ! } } /* For every basic block that needs a prologue, mark all blocks reachable from it, so as to ensure they are also seen as requiring a prologue. */ while (!VEC_empty (basic_block, vec)) { basic_block tmp_bb = VEC_pop (basic_block, vec); ! edge e; ! edge_iterator ei; FOR_EACH_EDGE (e, ei, tmp_bb->succs) if (e->dest != EXIT_BLOCK_PTR && bitmap_set_bit (&bb_flags, e->dest->index)) VEC_quick_push (basic_block, vec, e->dest); } ! /* If the last basic block contains only a label, we'll be able ! to convert jumps to it to (potentially conditional) return ! insns later. This means we don't necessarily need a prologue ! for paths reaching it. */ ! if (last_bb && optimize) ! { ! if (!last_bb_active) ! bitmap_clear_bit (&bb_flags, last_bb->index); ! else if (!bitmap_bit_p (&bb_flags, last_bb->index)) ! goto fail_shrinkwrap; } /* Now walk backwards from every block that is marked as needing ! a prologue to compute the bb_antic_flags bitmap. */ ! bitmap_copy (&bb_antic_flags, &bb_flags); FOR_EACH_BB (bb) { ! edge e; ! edge_iterator ei; ! if (!bitmap_bit_p (&bb_flags, bb->index)) continue; FOR_EACH_EDGE (e, ei, bb->preds) if (!bitmap_bit_p (&bb_antic_flags, e->src->index) --- 5894,5947 ---- FOR_EACH_BB (bb) { rtx insn; ! FOR_BB_INSNS (bb, insn) ! if (requires_stack_frame_p (insn, prologue_used, ! set_up_by_prologue)) ! { ! bitmap_set_bit (&bb_flags, bb->index); ! 
VEC_quick_push (basic_block, vec, bb); ! break; ! } } + /* Save a copy of blocks that really need a prologue. */ + bitmap_copy (&bb_antic_flags, &bb_flags); + /* For every basic block that needs a prologue, mark all blocks reachable from it, so as to ensure they are also seen as requiring a prologue. */ while (!VEC_empty (basic_block, vec)) { basic_block tmp_bb = VEC_pop (basic_block, vec); ! FOR_EACH_EDGE (e, ei, tmp_bb->succs) if (e->dest != EXIT_BLOCK_PTR && bitmap_set_bit (&bb_flags, e->dest->index)) VEC_quick_push (basic_block, vec, e->dest); } ! ! /* Find the set of basic blocks that need no prologue, have a ! single successor, and only go to the exit. */ ! VEC_quick_push (basic_block, vec, EXIT_BLOCK_PTR); ! while (!VEC_empty (basic_block, vec)) ! { ! basic_block tmp_bb = VEC_pop (basic_block, vec); ! ! FOR_EACH_EDGE (e, ei, tmp_bb->preds) ! if (single_succ_p (e->src) ! && !bitmap_bit_p (&bb_antic_flags, e->src->index) ! && bitmap_set_bit (&bb_tail, e->src->index)) ! VEC_quick_push (basic_block, vec, e->src); } /* Now walk backwards from every block that is marked as needing ! a prologue to compute the bb_antic_flags bitmap. Exclude ! tail blocks; They can be duplicated to be used on paths not ! needing a prologue. */ ! bitmap_and_compl (&bb_antic_flags, &bb_flags, &bb_tail); FOR_EACH_BB (bb) { ! if (!bitmap_bit_p (&bb_antic_flags, bb->index)) continue; FOR_EACH_EDGE (e, ei, bb->preds) if (!bitmap_bit_p (&bb_antic_flags, e->src->index) *************** thread_prologue_and_epilogue_insns (void *** 5816,5823 **** while (!VEC_empty (basic_block, vec)) { basic_block tmp_bb = VEC_pop (basic_block, vec); - edge e; - edge_iterator ei; bool all_set = true; bitmap_clear_bit (&bb_on_list, tmp_bb->index); --- 5951,5956 ---- *************** thread_prologue_and_epilogue_insns (void *** 5862,5889 **** } } ! /* Test whether the prologue is known to clobber any register ! (other than FP or SP) which are live on the edge. */ ! 
CLEAR_HARD_REG_BIT (prologue_clobbered, STACK_POINTER_REGNUM); ! if (frame_pointer_needed) ! CLEAR_HARD_REG_BIT (prologue_clobbered, HARD_FRAME_POINTER_REGNUM); ! CLEAR_HARD_REG_SET (live_on_edge); ! reg_set_to_hard_reg_set (&live_on_edge, ! df_get_live_in (entry_edge->dest)); ! if (hard_reg_set_intersect_p (live_on_edge, prologue_clobbered)) { ! entry_edge = orig_entry_edge; ! if (dump_file) ! fprintf (dump_file, "Shrink-wrapping aborted due to clobber.\n"); } ! else if (entry_edge != orig_entry_edge) { crtl->shrink_wrapped = true; if (dump_file) fprintf (dump_file, "Performing shrink-wrapping.\n"); } fail_shrinkwrap: bitmap_clear (&bb_antic_flags); bitmap_clear (&bb_on_list); VEC_free (basic_block, heap, vec); --- 5995,6132 ---- } } ! if (entry_edge != orig_entry_edge) { ! /* Test whether the prologue is known to clobber any register ! (other than FP or SP) which are live on the edge. */ ! CLEAR_HARD_REG_BIT (prologue_clobbered, STACK_POINTER_REGNUM); ! if (frame_pointer_needed) ! CLEAR_HARD_REG_BIT (prologue_clobbered, HARD_FRAME_POINTER_REGNUM); ! CLEAR_HARD_REG_SET (live_on_edge); ! reg_set_to_hard_reg_set (&live_on_edge, ! df_get_live_in (entry_edge->dest)); ! if (hard_reg_set_intersect_p (live_on_edge, prologue_clobbered)) ! { ! entry_edge = orig_entry_edge; ! if (dump_file) ! fprintf (dump_file, ! "Shrink-wrapping aborted due to clobber.\n"); ! } } ! if (entry_edge != orig_entry_edge) { crtl->shrink_wrapped = true; if (dump_file) fprintf (dump_file, "Performing shrink-wrapping.\n"); + + /* Find tail blocks reachable from both blocks needing a + prologue and blocks not needing a prologue. 
*/ + if (!bitmap_empty_p (&bb_tail)) + FOR_EACH_BB (bb) + { + bool some_pro, some_no_pro; + if (!bitmap_bit_p (&bb_tail, bb->index)) + continue; + some_pro = some_no_pro = false; + FOR_EACH_EDGE (e, ei, bb->preds) + { + if (bitmap_bit_p (&bb_flags, e->src->index)) + some_pro = true; + else + some_no_pro = true; + } + if (some_pro && some_no_pro) + VEC_quick_push (basic_block, vec, bb); + else + bitmap_clear_bit (&bb_tail, bb->index); + } + /* Find the head of each tail. */ + while (!VEC_empty (basic_block, vec)) + { + basic_block tbb = VEC_pop (basic_block, vec); + + if (!bitmap_bit_p (&bb_tail, tbb->index)) + continue; + + while (single_succ_p (tbb)) + { + tbb = single_succ (tbb); + bitmap_clear_bit (&bb_tail, tbb->index); + } + } + /* Now duplicate the tails. */ + if (!bitmap_empty_p (&bb_tail)) + FOR_EACH_BB_REVERSE (bb) + { + basic_block copy_bb, next_bb; + rtx insert_point; + + if (!bitmap_clear_bit (&bb_tail, bb->index)) + continue; + + next_bb = single_succ (bb); + + /* Create a copy of BB, instructions and all, for + use on paths that don't need a prologue. + Ideal placement of the copy is on a fall-thru edge + or after a block that would jump to the copy. */ + FOR_EACH_EDGE (e, ei, bb->preds) + if (!bitmap_bit_p (&bb_flags, e->src->index) + && single_succ_p (e->src)) + break; + if (e) + { + copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), + NULL_RTX, e->src); + BB_COPY_PARTITION (copy_bb, e->src); + } + else + { + /* Otherwise put the copy at the end of the function. 
*/ + copy_bb = create_basic_block (NULL_RTX, NULL_RTX, + EXIT_BLOCK_PTR->prev_bb); + BB_COPY_PARTITION (copy_bb, bb); + } + + insert_point = emit_note_after (NOTE_INSN_DELETED, + BB_END (copy_bb)); + emit_barrier_after (BB_END (copy_bb)); + + make_single_succ_edge (copy_bb, EXIT_BLOCK_PTR, 0); + + dup_block_and_redirect (bb, copy_bb, insert_point, + &bb_flags); + + while (next_bb != EXIT_BLOCK_PTR) + { + basic_block tbb = next_bb; + next_bb = single_succ (tbb); + e = split_block (copy_bb, PREV_INSN (insert_point)); + copy_bb = e->dest; + dup_block_and_redirect (tbb, copy_bb, insert_point, + &bb_flags); + } + + /* Fiddle with edge flags to quiet verify_flow_info. + We have yet to add a simple_return to the tails, + as we'd like to first convert_jumps_to_returns in + case the block is no longer used after that. */ + e = single_succ_edge (copy_bb); + if (CALL_P (PREV_INSN (insert_point)) + && SIBLING_CALL_P (PREV_INSN (insert_point))) + e->flags = EDGE_SIBCALL | EDGE_ABNORMAL; + else + e->flags = EDGE_FAKE; + /* verify_flow_info doesn't like a note after a + sibling call. */ + delete_insn (insert_point); + if (bitmap_empty_p (&bb_tail)) + break; + } } fail_shrinkwrap: + bitmap_clear (&bb_tail); bitmap_clear (&bb_antic_flags); bitmap_clear (&bb_on_list); VEC_free (basic_block, heap, vec); *************** thread_prologue_and_epilogue_insns (void *** 5911,6057 **** rtl_profile_for_bb (EXIT_BLOCK_PTR); ! #ifdef HAVE_return /* If we're allowed to generate a simple return instruction, then by definition we don't need a full epilogue. If the last basic block before the exit block does not contain active instructions, examine its predecessors and try to emit (conditional) return instructions. */ ! if (optimize && !last_bb_active ! && (HAVE_return || entry_edge != orig_entry_edge)) { ! edge_iterator ei2; ! int i; ! basic_block bb; ! rtx label; ! VEC(basic_block,heap) *src_bbs; ! ! if (exit_fallthru_edge == NULL) ! goto epilogue_done; ! label = BB_HEAD (last_bb); ! ! 
src_bbs = VEC_alloc (basic_block, heap, EDGE_COUNT (last_bb->preds)); ! FOR_EACH_EDGE (e, ei2, last_bb->preds) ! if (e->src != ENTRY_BLOCK_PTR) ! VEC_quick_push (basic_block, src_bbs, e->src); ! ! FOR_EACH_VEC_ELT (basic_block, src_bbs, i, bb) { ! bool simple_p; ! rtx jump; ! e = find_edge (bb, last_bb); ! jump = BB_END (bb); ! ! #ifdef HAVE_simple_return ! simple_p = (entry_edge != orig_entry_edge ! && !bitmap_bit_p (&bb_flags, bb->index)); ! #else ! simple_p = false; ! #endif ! ! if (!simple_p ! && (!HAVE_return || !JUMP_P (jump) ! || JUMP_LABEL (jump) != label)) ! continue; ! ! /* If we have an unconditional jump, we can replace that ! with a simple return instruction. */ ! if (!JUMP_P (jump)) ! { ! emit_barrier_after (BB_END (bb)); ! emit_return_into_block (simple_p, bb); ! } ! else if (simplejump_p (jump)) { ! /* The use of the return register might be present in the exit ! fallthru block. Either: ! - removing the use is safe, and we should remove the use in ! the exit fallthru block, or ! - removing the use is not safe, and we should add it here. ! For now, we conservatively choose the latter. Either of the ! 2 helps in crossjumping. */ ! emit_use_return_register_into_block (bb); ! ! emit_return_into_block (simple_p, bb); ! delete_insn (jump); } ! else if (condjump_p (jump) && JUMP_LABEL (jump) != label) ! { ! basic_block new_bb; ! edge new_e; ! gcc_assert (simple_p); ! new_bb = split_edge (e); ! emit_barrier_after (BB_END (new_bb)); ! emit_return_into_block (simple_p, new_bb); ! #ifdef HAVE_simple_return ! if (BB_PARTITION (new_bb) == BB_HOT_PARTITION) ! simple_return_block_hot = new_bb; ! else ! simple_return_block_cold = new_bb; #endif ! new_e = single_succ_edge (new_bb); ! redirect_edge_succ (new_e, EXIT_BLOCK_PTR); ! continue; ! } ! /* If we have a conditional jump branching to the last ! block, we can try to replace that with a conditional ! return instruction. */ ! else if (condjump_p (jump)) ! { ! rtx dest; ! if (simple_p) ! 
dest = simple_return_rtx; ! else ! dest = ret_rtx; ! if (! redirect_jump (jump, dest, 0)) ! { ! #ifdef HAVE_simple_return ! if (simple_p) ! VEC_safe_push (rtx, heap, ! unconverted_simple_returns, jump); ! #endif ! continue; ! } ! /* See comment in simple_jump_p case above. */ ! emit_use_return_register_into_block (bb); ! /* If this block has only one successor, it both jumps ! and falls through to the fallthru block, so we can't ! delete the edge. */ ! if (single_succ_p (bb)) ! continue; ! } ! else { #ifdef HAVE_simple_return ! if (simple_p) ! VEC_safe_push (rtx, heap, ! unconverted_simple_returns, jump); #endif ! continue; } - - /* Fix up the CFG for the successful change we just made. */ - redirect_edge_succ (e, EXIT_BLOCK_PTR); - } - VEC_free (basic_block, heap, src_bbs); - - if (HAVE_return) - { - /* Emit a return insn for the exit fallthru block. Whether - this is still reachable will be determined later. */ - - emit_barrier_after (BB_END (last_bb)); - emit_return_into_block (false, last_bb); - epilogue_end = BB_END (last_bb); - if (JUMP_P (epilogue_end)) - set_return_jump_label (epilogue_end); - single_succ_edge (last_bb)->flags &= ~EDGE_FALLTHRU; - goto epilogue_done; } } #endif --- 6154,6226 ---- rtl_profile_for_bb (EXIT_BLOCK_PTR); ! exit_fallthru_edge = find_fallthru_edge (EXIT_BLOCK_PTR->preds); ! /* If we're allowed to generate a simple return instruction, then by definition we don't need a full epilogue. If the last basic block before the exit block does not contain active instructions, examine its predecessors and try to emit (conditional) return instructions. */ ! #ifdef HAVE_simple_return ! if (entry_edge != orig_entry_edge) { ! if (optimize) { ! unsigned i, last; ! /* convert_jumps_to_returns may add to EXIT_BLOCK_PTR->preds ! (but won't remove). Stop at end of current preds. */ ! last = EDGE_COUNT (EXIT_BLOCK_PTR->preds); ! for (i = 0; i < last; i++) { ! e = EDGE_I (EXIT_BLOCK_PTR->preds, i); ! if (LABEL_P (BB_HEAD (e->src)) ! 
&& !bitmap_bit_p (&bb_flags, e->src->index) ! && !active_insn_between (BB_HEAD (e->src), BB_END (e->src))) ! unconverted_simple_returns ! = convert_jumps_to_returns (e->src, true, ! unconverted_simple_returns); } ! } ! if (exit_fallthru_edge != NULL ! && EDGE_COUNT (exit_fallthru_edge->src->preds) != 0 ! && !bitmap_bit_p (&bb_flags, exit_fallthru_edge->src->index)) ! { ! basic_block last_bb; ! ! last_bb = emit_return_for_exit (exit_fallthru_edge, true); ! returnjump = BB_END (last_bb); ! exit_fallthru_edge = NULL; ! } ! } #endif ! #ifdef HAVE_return ! if (HAVE_return) ! { ! if (exit_fallthru_edge == NULL) ! goto epilogue_done; ! if (optimize) ! { ! basic_block last_bb = exit_fallthru_edge->src; ! if (LABEL_P (BB_HEAD (last_bb)) ! && !active_insn_between (BB_HEAD (last_bb), BB_END (last_bb))) ! convert_jumps_to_returns (last_bb, false, NULL); ! if (EDGE_COUNT (exit_fallthru_edge->src->preds) != 0) { + last_bb = emit_return_for_exit (exit_fallthru_edge, false); + epilogue_end = returnjump = BB_END (last_bb); #ifdef HAVE_simple_return ! /* Emitting the return may add a basic block. ! Fix bb_flags for the added block. */ ! if (last_bb != exit_fallthru_edge->src) ! bitmap_set_bit (&bb_flags, last_bb->index); #endif ! goto epilogue_done; } } } #endif *************** epilogue_done: *** 6171,6180 **** convert to conditional simple_returns, but couldn't for some reason, create a block to hold a simple_return insn and redirect those remaining edges. */ ! if (!VEC_empty (rtx, unconverted_simple_returns)) { basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; - rtx jump; int i; gcc_assert (entry_edge != orig_entry_edge); --- 6340,6352 ---- convert to conditional simple_returns, but couldn't for some reason, create a block to hold a simple_return insn and redirect those remaining edges. */ ! 
if (!VEC_empty (edge, unconverted_simple_returns)) { + basic_block simple_return_block_hot = NULL; + basic_block simple_return_block_cold = NULL; + edge pending_edge_hot = NULL; + edge pending_edge_cold = NULL; basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; int i; gcc_assert (entry_edge != orig_entry_edge); *************** epilogue_done: *** 6184,6208 **** if (returnjump != NULL_RTX && JUMP_LABEL (returnjump) == simple_return_rtx) { ! edge e = split_block (exit_fallthru_edge->src, ! PREV_INSN (returnjump)); if (BB_PARTITION (e->src) == BB_HOT_PARTITION) simple_return_block_hot = e->dest; else simple_return_block_cold = e->dest; } ! FOR_EACH_VEC_ELT (rtx, unconverted_simple_returns, i, jump) { - basic_block src_bb = BLOCK_FOR_INSN (jump); - edge e = find_edge (src_bb, last_bb); basic_block *pdest_bb; ! if (BB_PARTITION (src_bb) == BB_HOT_PARTITION) ! pdest_bb = &simple_return_block_hot; else ! pdest_bb = &simple_return_block_cold; ! if (*pdest_bb == NULL) { basic_block bb; rtx start; --- 6356,6403 ---- if (returnjump != NULL_RTX && JUMP_LABEL (returnjump) == simple_return_rtx) { ! e = split_block (BLOCK_FOR_INSN (returnjump), PREV_INSN (returnjump)); if (BB_PARTITION (e->src) == BB_HOT_PARTITION) simple_return_block_hot = e->dest; else simple_return_block_cold = e->dest; } ! /* Also check returns we might need to add to tail blocks. */ ! FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds) ! if (EDGE_COUNT (e->src->preds) != 0 ! && (e->flags & EDGE_FAKE) != 0 ! && !bitmap_bit_p (&bb_flags, e->src->index)) ! { ! if (BB_PARTITION (e->src) == BB_HOT_PARTITION) ! pending_edge_hot = e; ! else ! pending_edge_cold = e; ! } ! ! FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) { basic_block *pdest_bb; + edge pending; ! if (BB_PARTITION (e->src) == BB_HOT_PARTITION) ! { ! pdest_bb = &simple_return_block_hot; ! pending = pending_edge_hot; ! } else ! { ! pdest_bb = &simple_return_block_cold; ! pending = pending_edge_cold; ! } ! ! if (*pdest_bb == NULL && pending != NULL) ! 
{ ! emit_return_into_block (true, pending->src); ! pending->flags &= ~(EDGE_FALLTHRU | EDGE_FAKE); ! *pdest_bb = pending->src; ! } ! else if (*pdest_bb == NULL) { basic_block bb; rtx start; *************** epilogue_done: *** 6219,6225 **** } redirect_edge_and_branch_force (e, *pdest_bb); } ! VEC_free (rtx, heap, unconverted_simple_returns); } #endif --- 6414,6432 ---- } redirect_edge_and_branch_force (e, *pdest_bb); } ! VEC_free (edge, heap, unconverted_simple_returns); ! } ! ! if (entry_edge != orig_entry_edge) ! { ! FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds) ! if (EDGE_COUNT (e->src->preds) != 0 ! && (e->flags & EDGE_FAKE) != 0 ! && !bitmap_bit_p (&bb_flags, e->src->index)) ! { ! emit_return_into_block (true, e->src); ! e->flags &= ~(EDGE_FALLTHRU | EDGE_FAKE); ! } } #endif *************** epilogue_done: *** 6233,6240 **** if (!CALL_P (insn) || ! SIBLING_CALL_P (insn) || (entry_edge != orig_entry_edge ! && !bitmap_bit_p (&bb_flags, bb->index))) { ei_next (&ei); continue; --- 6440,6450 ---- if (!CALL_P (insn) || ! SIBLING_CALL_P (insn) + #ifdef HAVE_simple_return || (entry_edge != orig_entry_edge ! && !bitmap_bit_p (&bb_flags, bb->index)) ! #endif ! ) { ei_next (&ei); continue; *************** epilogue_done: *** 6281,6287 **** --- 6491,6499 ---- } #endif + #ifdef HAVE_simple_return bitmap_clear (&bb_flags); + #endif /* Threading the prologue and epilogue changes the artificial refs in the entry and exit blocks. */ -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: PowerPC shrink-wrap support 3 of 3 2011-10-31 15:14 ` Alan Modra @ 2011-11-01 15:34 ` Alan Modra 2011-11-07 17:27 ` Jakub Jelinek 2011-11-09 9:48 ` Hans-Peter Nilsson 0 siblings, 2 replies; 48+ messages in thread From: Alan Modra @ 2011-11-01 15:34 UTC (permalink / raw) To: gcc-patches, Bernd Schmidt On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote: > Bits left to do > - limit size of duplicated tails Done here. It also fixes a hole: previously I took no notice of targetm.cannot_copy_insn_p when duplicating tails. One interesting result is that the tail duplication actually reduces the text size of libstdc++.so from 1074042 to 1073478 bytes on powerpc-linux. The reason is that a shrink-wrapped function that needs a prologue only on paths ending in a sibling call will lose one copy of the epilogue. That must happen often enough to more than make up for the duplicated tails. Bootstrapped and regression tested powerpc-linux. OK to apply? (And I won't be posting any more versions of the patch until this is reviewed. Please excuse me for spamming the list.) * function.c (bb_active_p): Delete. (dup_block_and_redirect, active_insn_between): New functions. (convert_jumps_to_returns, emit_return_for_exit): New functions, split out from.. (thread_prologue_and_epilogue_insns): ..here. Delete shadowing variables. Don't do prologue register clobber tests when shrink wrapping already failed. Delete all last_bb_active code. Instead compute tail block candidates for duplicating exit path. Remove these from antic set. Duplicate tails when reached from both blocks needing a prologue/epilogue and blocks not needing such. * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and HAVE_simple_return. * bb-reorder.c (get_uncond_jump_length): Make global. * bb-reorder.h (get_uncond_jump_length): Declare. * cfgrtl.c (rtl_create_basic_block): Comment typo fix. (rtl_split_edge): Likewise. Warning fix. (rtl_duplicate_bb): New function. 
(rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block. Index: gcc/function.c =================================================================== --- gcc/function.c (revision 180588) +++ gcc/function.c (working copy) @@ -65,6 +65,8 @@ along with GCC; see the file COPYING3. #include "df.h" #include "timevar.h" #include "vecprim.h" +#include "params.h" +#include "bb-reorder.h" /* So we can assign to cfun in this file. */ #undef cfun @@ -5290,8 +5292,6 @@ requires_stack_frame_p (rtx insn, HARD_R HARD_REG_SET hardregs; unsigned regno; - if (!INSN_P (insn) || DEBUG_INSN_P (insn)) - return false; if (CALL_P (insn)) return !SIBLING_CALL_P (insn); @@ -5514,23 +5514,186 @@ set_return_jump_label (rtx returnjump) JUMP_LABEL (returnjump) = ret_rtx; } -/* Return true if BB has any active insns. */ +#ifdef HAVE_simple_return +/* Create a copy of BB instructions and insert at BEFORE. Redirect + preds of BB to COPY_BB if they don't appear in NEED_PROLOGUE. */ +static void +dup_block_and_redirect (basic_block bb, basic_block copy_bb, rtx before, + bitmap_head *need_prologue) +{ + edge_iterator ei; + edge e; + rtx insn = BB_END (bb); + + /* We know BB has a single successor, so there is no need to copy a + simple jump at the end of BB. */ + if (simplejump_p (insn)) + insn = PREV_INSN (insn); + + start_sequence (); + duplicate_insn_chain (BB_HEAD (bb), insn); + if (dump_file) + { + unsigned count = 0; + for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) + if (active_insn_p (insn)) + ++count; + fprintf (dump_file, "Duplicating bb %d to bb %d, %u active insns.\n", + bb->index, copy_bb->index, count); + } + insn = get_insns (); + end_sequence (); + emit_insn_before (insn, before); + + /* Redirect all the paths that need no prologue into copy_bb. 
*/ + for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); ) + if (!bitmap_bit_p (need_prologue, e->src->index)) + { + redirect_edge_and_branch_force (e, copy_bb); + continue; + } + else + ei_next (&ei); +} +#endif + +#if defined (HAVE_return) || defined (HAVE_simple_return) +/* Return true if there are any active insns between HEAD and TAIL. */ static bool -bb_active_p (basic_block bb) +active_insn_between (rtx head, rtx tail) { + while (tail) + { + if (active_insn_p (tail)) + return true; + if (tail == head) + return false; + tail = PREV_INSN (tail); + } + return false; +} + +/* LAST_BB is a block that exits, and empty of active instructions. + Examine its predecessors for jumps that can be converted to + (conditional) returns. */ +static VEC (edge, heap) * +convert_jumps_to_returns (basic_block last_bb, bool simple_p, + VEC (edge, heap) *unconverted ATTRIBUTE_UNUSED) +{ + int i; + basic_block bb; rtx label; + edge_iterator ei; + edge e; + VEC(basic_block,heap) *src_bbs; + + src_bbs = VEC_alloc (basic_block, heap, EDGE_COUNT (last_bb->preds)); + FOR_EACH_EDGE (e, ei, last_bb->preds) + if (e->src != ENTRY_BLOCK_PTR) + VEC_quick_push (basic_block, src_bbs, e->src); - /* Test whether there are active instructions in BB. */ - label = BB_END (bb); - while (label && !LABEL_P (label)) + label = BB_HEAD (last_bb); + + FOR_EACH_VEC_ELT (basic_block, src_bbs, i, bb) { - if (active_insn_p (label)) - break; - label = PREV_INSN (label); + rtx jump = BB_END (bb); + + if (!JUMP_P (jump) || JUMP_LABEL (jump) != label) + continue; + + e = find_edge (bb, last_bb); + + /* If we have an unconditional jump, we can replace that + with a simple return instruction. */ + if (simplejump_p (jump)) + { + /* The use of the return register might be present in the exit + fallthru block. Either: + - removing the use is safe, and we should remove the use in + the exit fallthru block, or + - removing the use is not safe, and we should add it here. 
+ For now, we conservatively choose the latter. Either of the + 2 helps in crossjumping. */ + emit_use_return_register_into_block (bb); + + emit_return_into_block (simple_p, bb); + delete_insn (jump); + } + + /* If we have a conditional jump branching to the last + block, we can try to replace that with a conditional + return instruction. */ + else if (condjump_p (jump)) + { + rtx dest; + + if (simple_p) + dest = simple_return_rtx; + else + dest = ret_rtx; + if (!redirect_jump (jump, dest, 0)) + { +#ifdef HAVE_simple_return + if (simple_p) + { + if (dump_file) + fprintf (dump_file, + "Failed to redirect bb %d branch.\n", bb->index); + VEC_safe_push (edge, heap, unconverted, e); + } +#endif + continue; + } + + /* See comment in simplejump_p case above. */ + emit_use_return_register_into_block (bb); + + /* If this block has only one successor, it both jumps + and falls through to the fallthru block, so we can't + delete the edge. */ + if (single_succ_p (bb)) + continue; + } + else + { +#ifdef HAVE_simple_return + if (simple_p) + { + if (dump_file) + fprintf (dump_file, + "Failed to redirect bb %d branch.\n", bb->index); + VEC_safe_push (edge, heap, unconverted, e); + } +#endif + continue; + } + + /* Fix up the CFG for the successful change we just made. */ + redirect_edge_succ (e, EXIT_BLOCK_PTR); } - return BB_HEAD (bb) != label || !LABEL_P (label); + VEC_free (basic_block, heap, src_bbs); + return unconverted; } +/* Emit a return insn for the exit fallthru block. 
*/ +static basic_block +emit_return_for_exit (edge exit_fallthru_edge, bool simple_p) +{ + basic_block last_bb = exit_fallthru_edge->src; + + if (JUMP_P (BB_END (last_bb))) + { + last_bb = split_edge (exit_fallthru_edge); + exit_fallthru_edge = single_succ_edge (last_bb); + } + emit_barrier_after (BB_END (last_bb)); + emit_return_into_block (simple_p, last_bb); + exit_fallthru_edge->flags &= ~EDGE_FALLTHRU; + return last_bb; +} +#endif + + /* Generate the prologue and epilogue RTL if the machine supports it. Thread this into place with notes indicating where the prologue ends and where the epilogue begins. Update the basic block information when possible. @@ -5583,20 +5746,17 @@ static void thread_prologue_and_epilogue_insns (void) { bool inserted; - basic_block last_bb; - bool last_bb_active ATTRIBUTE_UNUSED; #ifdef HAVE_simple_return - VEC (rtx, heap) *unconverted_simple_returns = NULL; - basic_block simple_return_block_hot = NULL; - basic_block simple_return_block_cold = NULL; + VEC (edge, heap) *unconverted_simple_returns = NULL; bool nonempty_prologue; + bitmap_head bb_flags; + unsigned max_grow_size; #endif - rtx returnjump ATTRIBUTE_UNUSED; + rtx returnjump; rtx seq ATTRIBUTE_UNUSED, epilogue_end ATTRIBUTE_UNUSED; rtx prologue_seq ATTRIBUTE_UNUSED, split_prologue_seq ATTRIBUTE_UNUSED; edge e, entry_edge, orig_entry_edge, exit_fallthru_edge; edge_iterator ei; - bitmap_head bb_flags; df_analyze (); @@ -5614,18 +5774,6 @@ thread_prologue_and_epilogue_insns (void entry_edge = single_succ_edge (ENTRY_BLOCK_PTR); orig_entry_edge = entry_edge; - exit_fallthru_edge = find_fallthru_edge (EXIT_BLOCK_PTR->preds); - if (exit_fallthru_edge != NULL) - { - last_bb = exit_fallthru_edge->src; - last_bb_active = bb_active_p (last_bb); - } - else - { - last_bb = NULL; - last_bb_active = false; - } - split_prologue_seq = NULL_RTX; if (flag_split_stack && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl)) @@ -5675,9 +5823,9 @@ thread_prologue_and_epilogue_insns 
(void } #endif +#ifdef HAVE_simple_return bitmap_initialize (&bb_flags, &bitmap_default_obstack); -#ifdef HAVE_simple_return /* Try to perform a kind of shrink-wrapping, making sure the prologue/epilogue is emitted only around those parts of the function that require it. */ @@ -5697,11 +5845,11 @@ thread_prologue_and_epilogue_insns (void HARD_REG_SET prologue_clobbered, prologue_used, live_on_edge; HARD_REG_SET set_up_by_prologue; rtx p_insn; - VEC(basic_block, heap) *vec; basic_block bb; bitmap_head bb_antic_flags; bitmap_head bb_on_list; + bitmap_head bb_tail; if (dump_file) fprintf (dump_file, "Attempting shrink-wrapping optimization.\n"); @@ -5726,14 +5874,12 @@ thread_prologue_and_epilogue_insns (void prepare_shrink_wrap (entry_edge->dest); - /* That may have inserted instructions into the last block. */ - if (last_bb && !last_bb_active) - last_bb_active = bb_active_p (last_bb); - bitmap_initialize (&bb_antic_flags, &bitmap_default_obstack); bitmap_initialize (&bb_on_list, &bitmap_default_obstack); + bitmap_initialize (&bb_tail, &bitmap_default_obstack); - /* Find the set of basic blocks that require a stack frame. */ + /* Find the set of basic blocks that require a stack frame, + and blocks that are too big to be duplicated. */ vec = VEC_alloc (basic_block, heap, n_basic_blocks); @@ -5747,66 +5893,80 @@ thread_prologue_and_epilogue_insns (void add_to_hard_reg_set (&set_up_by_prologue, Pmode, PIC_OFFSET_TABLE_REGNUM); + /* We don't use a different max size depending on + optimize_bb_for_speed_p because increasing shrink-wrapping + opportunities by duplicating tail blocks can actually result + in an overall decrease in code size. */ + max_grow_size = get_uncond_jump_length (); + max_grow_size *= PARAM_VALUE (PARAM_MAX_GROW_COPY_BB_INSNS); + FOR_EACH_BB (bb) { rtx insn; - /* As a special case, check for jumps to the last bb that - cannot successfully be converted to simple_returns later - on, and mark them as requiring a frame. 
These are - conditional jumps that jump to their fallthru block, so - it's not a case that is expected to occur often. */ - if (JUMP_P (BB_END (bb)) && any_condjump_p (BB_END (bb)) - && single_succ_p (bb) - && !last_bb_active - && single_succ (bb) == last_bb) - { - bitmap_set_bit (&bb_flags, bb->index); - VEC_quick_push (basic_block, vec, bb); - } - else - FOR_BB_INSNS (bb, insn) - if (requires_stack_frame_p (insn, prologue_used, - set_up_by_prologue)) - { - bitmap_set_bit (&bb_flags, bb->index); - VEC_quick_push (basic_block, vec, bb); - break; - } + unsigned size = 0; + + FOR_BB_INSNS (bb, insn) + if (NONDEBUG_INSN_P (insn)) + { + if (requires_stack_frame_p (insn, prologue_used, + set_up_by_prologue)) + { + if (bb == entry_edge->dest) + goto fail_shrinkwrap; + bitmap_set_bit (&bb_flags, bb->index); + VEC_quick_push (basic_block, vec, bb); + break; + } + else if (size <= max_grow_size) + { + size += get_attr_min_length (insn); + if (size > max_grow_size) + bitmap_set_bit (&bb_on_list, bb->index); + } + } } + /* Blocks that really need a prologue, or are too big for tails. */ + bitmap_ior_into (&bb_on_list, &bb_flags); + /* For every basic block that needs a prologue, mark all blocks reachable from it, so as to ensure they are also seen as requiring a prologue. */ while (!VEC_empty (basic_block, vec)) { basic_block tmp_bb = VEC_pop (basic_block, vec); - edge e; - edge_iterator ei; + FOR_EACH_EDGE (e, ei, tmp_bb->succs) if (e->dest != EXIT_BLOCK_PTR && bitmap_set_bit (&bb_flags, e->dest->index)) VEC_quick_push (basic_block, vec, e->dest); } - /* If the last basic block contains only a label, we'll be able - to convert jumps to it to (potentially conditional) return - insns later. This means we don't necessarily need a prologue - for paths reaching it. 
*/ - if (last_bb && optimize) - { - if (!last_bb_active) - bitmap_clear_bit (&bb_flags, last_bb->index); - else if (!bitmap_bit_p (&bb_flags, last_bb->index)) - goto fail_shrinkwrap; + + /* Find the set of basic blocks that need no prologue, have a + single successor, can be duplicated, meet a max size + requirement, and go to the exit via like blocks. */ + VEC_quick_push (basic_block, vec, EXIT_BLOCK_PTR); + while (!VEC_empty (basic_block, vec)) + { + basic_block tmp_bb = VEC_pop (basic_block, vec); + + FOR_EACH_EDGE (e, ei, tmp_bb->preds) + if (single_succ_p (e->src) + && !bitmap_bit_p (&bb_on_list, e->src->index) + && can_duplicate_block_p (e->src) + && bitmap_set_bit (&bb_tail, e->src->index)) + VEC_quick_push (basic_block, vec, e->src); } /* Now walk backwards from every block that is marked as needing - a prologue to compute the bb_antic_flags bitmap. */ - bitmap_copy (&bb_antic_flags, &bb_flags); + a prologue to compute the bb_antic_flags bitmap. Exclude + tail blocks; They can be duplicated to be used on paths not + needing a prologue. */ + bitmap_clear (&bb_on_list); + bitmap_and_compl (&bb_antic_flags, &bb_flags, &bb_tail); FOR_EACH_BB (bb) { - edge e; - edge_iterator ei; - if (!bitmap_bit_p (&bb_flags, bb->index)) + if (!bitmap_bit_p (&bb_antic_flags, bb->index)) continue; FOR_EACH_EDGE (e, ei, bb->preds) if (!bitmap_bit_p (&bb_antic_flags, e->src->index) @@ -5816,8 +5976,6 @@ thread_prologue_and_epilogue_insns (void while (!VEC_empty (basic_block, vec)) { basic_block tmp_bb = VEC_pop (basic_block, vec); - edge e; - edge_iterator ei; bool all_set = true; bitmap_clear_bit (&bb_on_list, tmp_bb->index); @@ -5862,28 +6020,134 @@ thread_prologue_and_epilogue_insns (void } } - /* Test whether the prologue is known to clobber any register - (other than FP or SP) which are live on the edge. 
*/ - CLEAR_HARD_REG_BIT (prologue_clobbered, STACK_POINTER_REGNUM); - if (frame_pointer_needed) - CLEAR_HARD_REG_BIT (prologue_clobbered, HARD_FRAME_POINTER_REGNUM); - CLEAR_HARD_REG_SET (live_on_edge); - reg_set_to_hard_reg_set (&live_on_edge, - df_get_live_in (entry_edge->dest)); - if (hard_reg_set_intersect_p (live_on_edge, prologue_clobbered)) + if (entry_edge != orig_entry_edge) { - entry_edge = orig_entry_edge; - if (dump_file) - fprintf (dump_file, "Shrink-wrapping aborted due to clobber.\n"); + /* Test whether the prologue is known to clobber any register + (other than FP or SP) which are live on the edge. */ + CLEAR_HARD_REG_BIT (prologue_clobbered, STACK_POINTER_REGNUM); + if (frame_pointer_needed) + CLEAR_HARD_REG_BIT (prologue_clobbered, HARD_FRAME_POINTER_REGNUM); + CLEAR_HARD_REG_SET (live_on_edge); + reg_set_to_hard_reg_set (&live_on_edge, + df_get_live_in (entry_edge->dest)); + if (hard_reg_set_intersect_p (live_on_edge, prologue_clobbered)) + { + entry_edge = orig_entry_edge; + if (dump_file) + fprintf (dump_file, + "Shrink-wrapping aborted due to clobber.\n"); + } } - else if (entry_edge != orig_entry_edge) + if (entry_edge != orig_entry_edge) { crtl->shrink_wrapped = true; if (dump_file) fprintf (dump_file, "Performing shrink-wrapping.\n"); + + /* Find tail blocks reachable from both blocks needing a + prologue and blocks not needing a prologue. */ + if (!bitmap_empty_p (&bb_tail)) + FOR_EACH_BB (bb) + { + bool some_pro, some_no_pro; + if (!bitmap_bit_p (&bb_tail, bb->index)) + continue; + some_pro = some_no_pro = false; + FOR_EACH_EDGE (e, ei, bb->preds) + { + if (bitmap_bit_p (&bb_flags, e->src->index)) + some_pro = true; + else + some_no_pro = true; + } + if (some_pro && some_no_pro) + VEC_quick_push (basic_block, vec, bb); + else + bitmap_clear_bit (&bb_tail, bb->index); + } + /* Find the head of each tail. 
*/ + while (!VEC_empty (basic_block, vec)) + { + basic_block tbb = VEC_pop (basic_block, vec); + + if (!bitmap_bit_p (&bb_tail, tbb->index)) + continue; + + while (single_succ_p (tbb)) + { + tbb = single_succ (tbb); + bitmap_clear_bit (&bb_tail, tbb->index); + } + } + /* Now duplicate the tails. */ + if (!bitmap_empty_p (&bb_tail)) + FOR_EACH_BB_REVERSE (bb) + { + basic_block copy_bb, tbb; + rtx insert_point; + int eflags; + + if (!bitmap_clear_bit (&bb_tail, bb->index)) + continue; + + /* Create a copy of BB, instructions and all, for + use on paths that don't need a prologue. + Ideal placement of the copy is on a fall-thru edge + or after a block that would jump to the copy. */ + FOR_EACH_EDGE (e, ei, bb->preds) + if (!bitmap_bit_p (&bb_flags, e->src->index) + && single_succ_p (e->src)) + break; + if (e) + { + copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), + NULL_RTX, e->src); + BB_COPY_PARTITION (copy_bb, e->src); + } + else + { + /* Otherwise put the copy at the end of the function. */ + copy_bb = create_basic_block (NULL_RTX, NULL_RTX, + EXIT_BLOCK_PTR->prev_bb); + BB_COPY_PARTITION (copy_bb, bb); + } + + insert_point = emit_note_after (NOTE_INSN_DELETED, + BB_END (copy_bb)); + emit_barrier_after (BB_END (copy_bb)); + + tbb = bb; + while (1) + { + dup_block_and_redirect (tbb, copy_bb, insert_point, + &bb_flags); + tbb = single_succ (tbb); + if (tbb == EXIT_BLOCK_PTR) + break; + e = split_block (copy_bb, PREV_INSN (insert_point)); + copy_bb = e->dest; + } + + /* Quiet verify_flow_info by (ab)using EDGE_FAKE. + We have yet to add a simple_return to the tails, + as we'd like to first convert_jumps_to_returns in + case the block is no longer used after that. */ + eflags = EDGE_FAKE; + if (CALL_P (PREV_INSN (insert_point)) + && SIBLING_CALL_P (PREV_INSN (insert_point))) + eflags = EDGE_SIBCALL | EDGE_ABNORMAL; + make_single_succ_edge (copy_bb, EXIT_BLOCK_PTR, eflags); + + /* verify_flow_info doesn't like a note after a + sibling call. 
*/ + delete_insn (insert_point); + if (bitmap_empty_p (&bb_tail)) + break; + } } fail_shrinkwrap: + bitmap_clear (&bb_tail); bitmap_clear (&bb_antic_flags); bitmap_clear (&bb_on_list); VEC_free (basic_block, heap, vec); @@ -5911,147 +6175,73 @@ thread_prologue_and_epilogue_insns (void rtl_profile_for_bb (EXIT_BLOCK_PTR); -#ifdef HAVE_return + exit_fallthru_edge = find_fallthru_edge (EXIT_BLOCK_PTR->preds); + /* If we're allowed to generate a simple return instruction, then by definition we don't need a full epilogue. If the last basic block before the exit block does not contain active instructions, examine its predecessors and try to emit (conditional) return instructions. */ - if (optimize && !last_bb_active - && (HAVE_return || entry_edge != orig_entry_edge)) +#ifdef HAVE_simple_return + if (entry_edge != orig_entry_edge) { - edge_iterator ei2; - int i; - basic_block bb; - rtx label; - VEC(basic_block,heap) *src_bbs; - - if (exit_fallthru_edge == NULL) - goto epilogue_done; - label = BB_HEAD (last_bb); - - src_bbs = VEC_alloc (basic_block, heap, EDGE_COUNT (last_bb->preds)); - FOR_EACH_EDGE (e, ei2, last_bb->preds) - if (e->src != ENTRY_BLOCK_PTR) - VEC_quick_push (basic_block, src_bbs, e->src); - - FOR_EACH_VEC_ELT (basic_block, src_bbs, i, bb) + if (optimize) { - bool simple_p; - rtx jump; - e = find_edge (bb, last_bb); - - jump = BB_END (bb); + unsigned i, last; -#ifdef HAVE_simple_return - simple_p = (entry_edge != orig_entry_edge - && !bitmap_bit_p (&bb_flags, bb->index)); -#else - simple_p = false; -#endif + /* convert_jumps_to_returns may add to EXIT_BLOCK_PTR->preds + (but won't remove). Stop at end of current preds. 
*/ + last = EDGE_COUNT (EXIT_BLOCK_PTR->preds); + for (i = 0; i < last; i++) + { + e = EDGE_I (EXIT_BLOCK_PTR->preds, i); + if (LABEL_P (BB_HEAD (e->src)) + && !bitmap_bit_p (&bb_flags, e->src->index) + && !active_insn_between (BB_HEAD (e->src), BB_END (e->src))) + unconverted_simple_returns + = convert_jumps_to_returns (e->src, true, + unconverted_simple_returns); + } + } - if (!simple_p - && (!HAVE_return || !JUMP_P (jump) - || JUMP_LABEL (jump) != label)) - continue; + if (exit_fallthru_edge != NULL + && EDGE_COUNT (exit_fallthru_edge->src->preds) != 0 + && !bitmap_bit_p (&bb_flags, exit_fallthru_edge->src->index)) + { + basic_block last_bb; - /* If we have an unconditional jump, we can replace that - with a simple return instruction. */ - if (!JUMP_P (jump)) - { - emit_barrier_after (BB_END (bb)); - emit_return_into_block (simple_p, bb); - } - else if (simplejump_p (jump)) - { - /* The use of the return register might be present in the exit - fallthru block. Either: - - removing the use is safe, and we should remove the use in - the exit fallthru block, or - - removing the use is not safe, and we should add it here. - For now, we conservatively choose the latter. Either of the - 2 helps in crossjumping. 
*/ - emit_use_return_register_into_block (bb); - - emit_return_into_block (simple_p, bb); - delete_insn (jump); - } - else if (condjump_p (jump) && JUMP_LABEL (jump) != label) - { - basic_block new_bb; - edge new_e; - - gcc_assert (simple_p); - new_bb = split_edge (e); - emit_barrier_after (BB_END (new_bb)); - emit_return_into_block (simple_p, new_bb); -#ifdef HAVE_simple_return - if (BB_PARTITION (new_bb) == BB_HOT_PARTITION) - simple_return_block_hot = new_bb; - else - simple_return_block_cold = new_bb; + last_bb = emit_return_for_exit (exit_fallthru_edge, true); + returnjump = BB_END (last_bb); + exit_fallthru_edge = NULL; + } + } #endif - new_e = single_succ_edge (new_bb); - redirect_edge_succ (new_e, EXIT_BLOCK_PTR); +#ifdef HAVE_return + if (HAVE_return) + { + if (exit_fallthru_edge == NULL) + goto epilogue_done; - continue; - } - /* If we have a conditional jump branching to the last - block, we can try to replace that with a conditional - return instruction. */ - else if (condjump_p (jump)) - { - rtx dest; - if (simple_p) - dest = simple_return_rtx; - else - dest = ret_rtx; - if (! redirect_jump (jump, dest, 0)) - { -#ifdef HAVE_simple_return - if (simple_p) - VEC_safe_push (rtx, heap, - unconverted_simple_returns, jump); -#endif - continue; - } + if (optimize) + { + basic_block last_bb = exit_fallthru_edge->src; - /* See comment in simple_jump_p case above. */ - emit_use_return_register_into_block (bb); + if (LABEL_P (BB_HEAD (last_bb)) + && !active_insn_between (BB_HEAD (last_bb), BB_END (last_bb))) + convert_jumps_to_returns (last_bb, false, NULL); - /* If this block has only one successor, it both jumps - and falls through to the fallthru block, so we can't - delete the edge. 
*/ - if (single_succ_p (bb)) - continue; - } - else + if (EDGE_COUNT (exit_fallthru_edge->src->preds) != 0) { + last_bb = emit_return_for_exit (exit_fallthru_edge, false); + epilogue_end = returnjump = BB_END (last_bb); #ifdef HAVE_simple_return - if (simple_p) - VEC_safe_push (rtx, heap, - unconverted_simple_returns, jump); + /* Emitting the return may add a basic block. + Fix bb_flags for the added block. */ + if (last_bb != exit_fallthru_edge->src) + bitmap_set_bit (&bb_flags, last_bb->index); #endif - continue; + goto epilogue_done; } - - /* Fix up the CFG for the successful change we just made. */ - redirect_edge_succ (e, EXIT_BLOCK_PTR); - } - VEC_free (basic_block, heap, src_bbs); - - if (HAVE_return) - { - /* Emit a return insn for the exit fallthru block. Whether - this is still reachable will be determined later. */ - - emit_barrier_after (BB_END (last_bb)); - emit_return_into_block (false, last_bb); - epilogue_end = BB_END (last_bb); - if (JUMP_P (epilogue_end)) - set_return_jump_label (epilogue_end); - single_succ_edge (last_bb)->flags &= ~EDGE_FALLTHRU; - goto epilogue_done; } } #endif @@ -6171,10 +6361,13 @@ epilogue_done: convert to conditional simple_returns, but couldn't for some reason, create a block to hold a simple_return insn and redirect those remaining edges. 
*/ - if (!VEC_empty (rtx, unconverted_simple_returns)) + if (!VEC_empty (edge, unconverted_simple_returns)) { + basic_block simple_return_block_hot = NULL; + basic_block simple_return_block_cold = NULL; + edge pending_edge_hot = NULL; + edge pending_edge_cold = NULL; basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; - rtx jump; int i; gcc_assert (entry_edge != orig_entry_edge); @@ -6184,25 +6377,48 @@ epilogue_done: if (returnjump != NULL_RTX && JUMP_LABEL (returnjump) == simple_return_rtx) { - edge e = split_block (exit_fallthru_edge->src, - PREV_INSN (returnjump)); + e = split_block (BLOCK_FOR_INSN (returnjump), PREV_INSN (returnjump)); if (BB_PARTITION (e->src) == BB_HOT_PARTITION) simple_return_block_hot = e->dest; else simple_return_block_cold = e->dest; } - FOR_EACH_VEC_ELT (rtx, unconverted_simple_returns, i, jump) + /* Also check returns we might need to add to tail blocks. */ + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds) + if (EDGE_COUNT (e->src->preds) != 0 + && (e->flags & EDGE_FAKE) != 0 + && !bitmap_bit_p (&bb_flags, e->src->index)) + { + if (BB_PARTITION (e->src) == BB_HOT_PARTITION) + pending_edge_hot = e; + else + pending_edge_cold = e; + } + + FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) { - basic_block src_bb = BLOCK_FOR_INSN (jump); - edge e = find_edge (src_bb, last_bb); basic_block *pdest_bb; + edge pending; - if (BB_PARTITION (src_bb) == BB_HOT_PARTITION) - pdest_bb = &simple_return_block_hot; + if (BB_PARTITION (e->src) == BB_HOT_PARTITION) + { + pdest_bb = &simple_return_block_hot; + pending = pending_edge_hot; + } else - pdest_bb = &simple_return_block_cold; - if (*pdest_bb == NULL) + { + pdest_bb = &simple_return_block_cold; + pending = pending_edge_cold; + } + + if (*pdest_bb == NULL && pending != NULL) + { + emit_return_into_block (true, pending->src); + pending->flags &= ~(EDGE_FALLTHRU | EDGE_FAKE); + *pdest_bb = pending->src; + } + else if (*pdest_bb == NULL) { basic_block bb; rtx start; @@ -6219,7 +6435,19 @@ 
epilogue_done: } redirect_edge_and_branch_force (e, *pdest_bb); } - VEC_free (rtx, heap, unconverted_simple_returns); + VEC_free (edge, heap, unconverted_simple_returns); + } + + if (entry_edge != orig_entry_edge) + { + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds) + if (EDGE_COUNT (e->src->preds) != 0 + && (e->flags & EDGE_FAKE) != 0 + && !bitmap_bit_p (&bb_flags, e->src->index)) + { + emit_return_into_block (true, e->src); + e->flags &= ~(EDGE_FALLTHRU | EDGE_FAKE); + } } #endif @@ -6233,8 +6461,11 @@ epilogue_done: if (!CALL_P (insn) || ! SIBLING_CALL_P (insn) +#ifdef HAVE_simple_return || (entry_edge != orig_entry_edge - && !bitmap_bit_p (&bb_flags, bb->index))) + && !bitmap_bit_p (&bb_flags, bb->index)) +#endif + ) { ei_next (&ei); continue; @@ -6281,7 +6512,9 @@ epilogue_done: } #endif +#ifdef HAVE_simple_return bitmap_clear (&bb_flags); +#endif /* Threading the prologue and epilogue changes the artificial refs in the entry and exit blocks. */ Index: gcc/ifcvt.c =================================================================== --- gcc/ifcvt.c (revision 180588) +++ gcc/ifcvt.c (working copy) @@ -4167,13 +4167,14 @@ dead_or_predicable (basic_block test_bb, if (NONDEBUG_INSN_P (insn)) df_simulate_find_defs (insn, merge_set); +#ifdef HAVE_simple_return /* If shrink-wrapping, disable this optimization when test_bb is the first basic block and merge_bb exits. The idea is to not move code setting up a return register as that may clobber a register used to pass function parameters, which then must be saved in caller-saved regs. A caller-saved reg requires the prologue, killing a shrink-wrap opportunity. 
*/ - if ((flag_shrink_wrap && !epilogue_completed) + if ((flag_shrink_wrap && HAVE_simple_return && !epilogue_completed) && ENTRY_BLOCK_PTR->next_bb == test_bb && single_succ_p (new_dest) && single_succ (new_dest) == EXIT_BLOCK_PTR @@ -4224,6 +4225,7 @@ dead_or_predicable (basic_block test_bb, } BITMAP_FREE (return_regs); } +#endif } no_body: Index: gcc/bb-reorder.c =================================================================== --- gcc/bb-reorder.c (revision 180588) +++ gcc/bb-reorder.c (working copy) @@ -181,7 +181,6 @@ static fibheapkey_t bb_to_key (basic_blo static bool better_edge_p (const_basic_block, const_edge, int, int, int, int, const_edge); static void connect_traces (int, struct trace *); static bool copy_bb_p (const_basic_block, int); -static int get_uncond_jump_length (void); static bool push_to_next_round_p (const_basic_block, int, int, int, gcov_type); \f /* Check to see if bb should be pushed into the next round of trace @@ -1193,7 +1192,7 @@ copy_bb_p (const_basic_block bb, int cod /* Return the length of unconditional jump instruction. */ -static int +int get_uncond_jump_length (void) { rtx label, jump; Index: gcc/bb-reorder.h =================================================================== --- gcc/bb-reorder.h (revision 180588) +++ gcc/bb-reorder.h (working copy) @@ -34,4 +34,6 @@ extern struct target_bb_reorder *this_ta #define this_target_bb_reorder (&default_target_bb_reorder) #endif +extern int get_uncond_jump_length (void); + #endif Index: gcc/cfgrtl.c =================================================================== --- gcc/cfgrtl.c (revision 180588) +++ gcc/cfgrtl.c (working copy) @@ -322,9 +322,9 @@ create_basic_block_structure (rtx head, } /* Create new basic block consisting of instructions in between HEAD and END - and place it to the BB chain after block AFTER. END can be NULL in to - create new empty basic block before HEAD. Both END and HEAD can be NULL to - create basic block at the end of INSN chain. 
*/ + and place it to the BB chain after block AFTER. END can be NULL to + create a new empty basic block before HEAD. Both END and HEAD can be + NULL to create basic block at the end of INSN chain. */ static basic_block rtl_create_basic_block (void *headp, void *endp, basic_block after) @@ -1411,8 +1411,8 @@ rtl_split_edge (edge edge_in) before = NULL_RTX; /* If this is a fall through edge to the exit block, the blocks might be - not adjacent, and the right place is the after the source. */ - if (edge_in->flags & EDGE_FALLTHRU && edge_in->dest == EXIT_BLOCK_PTR) + not adjacent, and the right place is after the source. */ + if ((edge_in->flags & EDGE_FALLTHRU) && edge_in->dest == EXIT_BLOCK_PTR) { before = NEXT_INSN (BB_END (edge_in->src)); bb = create_basic_block (before, NULL, edge_in->src); @@ -3175,6 +3175,21 @@ rtl_can_remove_branch_p (const_edge e) return true; } +/* We do not want to declare these functions in a header file, since they + should only be used through the cfghooks interface, and we do not want to + move them here since it would require also moving quite a lot of related + code. They are in cfglayout.c. */ +extern bool cfg_layout_can_duplicate_bb_p (const_basic_block); +extern basic_block cfg_layout_duplicate_bb (basic_block); + +static basic_block +rtl_duplicate_bb (basic_block bb) +{ + bb = cfg_layout_duplicate_bb (bb); + bb->aux = NULL; + return bb; +} + /* Implementation of CFG manipulation for linearized RTL. */ struct cfg_hooks rtl_cfg_hooks = { "rtl", @@ -3191,8 +3206,8 @@ struct cfg_hooks rtl_cfg_hooks = { rtl_merge_blocks, rtl_predict_edge, rtl_predicted_by_p, - NULL, /* can_duplicate_block_p */ - NULL, /* duplicate_block */ + cfg_layout_can_duplicate_bb_p, + rtl_duplicate_bb, rtl_split_edge, rtl_make_forwarder_block, rtl_tidy_fallthru_edge, @@ -3214,13 +3229,6 @@ struct cfg_hooks rtl_cfg_hooks = { This representation will hopefully become the default one in future version of the compiler. 
*/ -/* We do not want to declare these functions in a header file, since they - should only be used through the cfghooks interface, and we do not want to - move them here since it would require also moving quite a lot of related - code. They are in cfglayout.c. */ -extern bool cfg_layout_can_duplicate_bb_p (const_basic_block); -extern basic_block cfg_layout_duplicate_bb (basic_block); - struct cfg_hooks cfg_layout_rtl_cfg_hooks = { "cfglayout mode", rtl_verify_flow_info_1, -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: PowerPC shrink-wrap support 3 of 3 2011-11-01 15:34 ` Alan Modra @ 2011-11-07 17:27 ` Jakub Jelinek 2011-11-09 9:48 ` Hans-Peter Nilsson 1 sibling, 0 replies; 48+ messages in thread From: Jakub Jelinek @ 2011-11-07 17:27 UTC (permalink / raw) To: Alan Modra, Bernd Schmidt, Richard Henderson; +Cc: gcc-patches On Wed, Nov 02, 2011 at 02:03:40AM +1030, Alan Modra wrote: > Bootstrapped and regression tested powerpc-linux. OK to apply? > (And I won't be posting any more versions of the patch until this is > reviewed. Please excuse me for spamming the list.) Looks reasonable to me, apart from > * function.c (bb_active_p): Delete. > (dup_block_and_redirect, active_insn_between): New functions. > (convert_jumps_to_returns, emit_return_for_exit): New functions, > split out from.. > (thread_prologue_and_epilogue_insns): ..here. Delete > shadowing variables. Don't do prologue register clobber tests > when shrink wrapping already failed. Delete all last_bb_active > code. Instead compute tail block candidates for duplicating > exit path. Remove these from antic set. Duplicate tails when > reached from both blocks needing a prologue/epilogue and > blocks not needing such. > * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and > HAVE_simple_return. > * bb-reorder.c (get_uncond_jump_length): Make global. > * bb-reorder.h (get_uncond_jump_length): Declare. > * cfgrtl.c (rtl_create_basic_block): Comment typo fix. > (rtl_split_edge): Likewise. Warning fix. > (rtl_duplicate_bb): New function. > (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block. > > Index: gcc/function.c > =================================================================== > --- gcc/function.c (revision 180588) > +++ gcc/function.c (working copy) > @@ -65,6 +65,8 @@ along with GCC; see the file COPYING3.

> #include "df.h" > #include "timevar.h" > #include "vecprim.h" > +#include "params.h" > +#include "bb-reorder.h" This needs corresponding change in Makefile.in, function.o needs $(PARAMS_H) bb-reorder.h dependencies added. Ok for trunk with that change if no maintainer objects within 24 hours. Jakub ^ permalink raw reply [flat|nested] 48+ messages in thread
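The Makefile.in hunk Jakub asks for would look roughly as below. GCC's gcc/Makefile.in of that era listed each object's header dependencies by hand; the existing prerequisite list of function.o is long and is abbreviated here by `$(FUNCTION_O_DEPS)`, a placeholder invented for this sketch (not a real GCC variable) — the actual change is simply appending the two new prerequisites.

```make
# Hypothetical sketch of the requested gcc/Makefile.in change:
# append $(PARAMS_H) and bb-reorder.h so make tracks the new includes.
function.o : function.c $(FUNCTION_O_DEPS) $(PARAMS_H) bb-reorder.h
```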
* Re: PowerPC shrink-wrap support 3 of 3 2011-11-01 15:34 ` Alan Modra 2011-11-07 17:27 ` Jakub Jelinek @ 2011-11-09 9:48 ` Hans-Peter Nilsson 2011-11-10 11:25 ` Revert "PowerPC shrink-wrap support 3 of 3" Hans-Peter Nilsson 1 sibling, 1 reply; 48+ messages in thread From: Hans-Peter Nilsson @ 2011-11-09 9:48 UTC (permalink / raw) To: amodra; +Cc: gcc-patches, bernds > From: Alan Modra <amodra@gmail.com> > Date: Tue, 1 Nov 2011 16:33:40 +0100 > On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote: > * function.c (bb_active_p): Delete. > (dup_block_and_redirect, active_insn_between): New functions. > (convert_jumps_to_returns, emit_return_for_exit): New functions, > split out from.. > (thread_prologue_and_epilogue_insns): ..here. Delete > shadowing variables. Don't do prologue register clobber tests > when shrink wrapping already failed. Delete all last_bb_active > code. Instead compute tail block candidates for duplicating > exit path. Remove these from antic set. Duplicate tails when > reached from both blocks needing a prologue/epilogue and > blocks not needing such. > * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and > HAVE_simple_return. > * bb-reorder.c (get_uncond_jump_length): Make global. > * bb-reorder.h (get_uncond_jump_length): Declare. > * cfgrtl.c (rtl_create_basic_block): Comment typo fix. > (rtl_split_edge): Likewise. Warning fix. > (rtl_duplicate_bb): New function. > (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block. 
This (a revision in the range 181187:181189) broke build for cris-elf like so: libtool: compile: /tmp/hpautotest-gcc1/cris-elf/gccobj/./gcc/xgcc -shared-libgcc -B/tmp/hpautotest-gcc1/cris-elf/gccobj/./gcc -nostdinc++ -L/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libstdc++-v3/src -L/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libstdc++-v3/src/.libs -nostdinc -B/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/newlib/ -isystem /tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/newlib/targ-include -isystem /tmp/hpautotest-gcc1/gcc/newlib/libc/include -B/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libgloss/cris -L/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libgloss/libnosys -L/tmp/hpautotest-gcc1/gcc/libgloss/cris -B/tmp/hpautotest-gcc1/cris-elf/pre/cris-elf/bin/ -B/tmp/hpautotest-gcc1/cris-elf/pre/cris-elf/lib/ -isystem /tmp/hpautotest-gcc1/cris-elf/pre/cris-elf/include -isystem /tmp/hpautotest-gcc1/cris-elf/pre/cris-elf/sys-include -I/tmp/hpautotest-gcc1/gcc/libstdc++-v3/../libgcc -I/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libstdc++-v3/include/cris-elf -I/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libstdc++-v3/include -I/tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++ -fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual -fdiagnostics-show-location=once -ffunction-sections -fdata-sections -frandom-seed=new_opv.lo -g -O2 -c /tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++/new_opv.cc -o new_opv.o /tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++/new_opv.cc: In function 'void* operator new [](std::size_t)': /tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++/new_opv.cc:34:1: error: missing REG_EH_REGION note in the end of bb 2 /tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++/new_opv.cc:34:1: error: call edges for non-call insn in bb 2 /tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++/new_opv.cc:34:1: error: in basic block 2: /tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++/new_opv.cc:34:1: error: flow control insn inside a basic block (call_insn 8 6 50 2 (parallel [ 
(set (reg:SI 10 r10) (call (mem:QI (symbol_ref:SI ("_Znwm") [flags 0x41] <function_decl 0x7f94a90bf300 operator new>) [0 operator new S1 A8]) (const_int 0 [0]))) (clobber (reg:SI 16 srp)) ]) /tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++/new_opv.cc:33 222 {*expanded_call_value_non_v32} (expr_list:REG_EH_REGION (const_int 1 [0x1]) (nil)) (expr_list:REG_UNUSED (use (reg:SI 10 r10)) (nil))) /tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++/new_opv.cc:34:1: internal compiler error: in rtl_verify_flow_info_1, at cfgrtl.c:2001 See PR51051. brgds, H-P ^ permalink raw reply [flat|nested] 48+ messages in thread
* Revert "PowerPC shrink-wrap support 3 of 3" 2011-11-09 9:48 ` Hans-Peter Nilsson @ 2011-11-10 11:25 ` Hans-Peter Nilsson 2011-11-10 12:10 ` Richard Guenther 2015-04-08 11:11 ` Gerald Pfeifer 0 siblings, 2 replies; 48+ messages in thread From: Hans-Peter Nilsson @ 2011-11-10 11:25 UTC (permalink / raw) To: amodra; +Cc: gcc-patches, bernds > From: Hans-Peter Nilsson <hp@axis.com> > Date: Wed, 9 Nov 2011 09:55:59 +0100 > > From: Alan Modra <amodra@gmail.com> > > Date: Tue, 1 Nov 2011 16:33:40 +0100 > > > On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote: > > > * function.c (bb_active_p): Delete. > > (dup_block_and_redirect, active_insn_between): New functions. > > (convert_jumps_to_returns, emit_return_for_exit): New functions, > > split out from.. > > (thread_prologue_and_epilogue_insns): ..here. Delete > > shadowing variables. Don't do prologue register clobber tests > > when shrink wrapping already failed. Delete all last_bb_active > > code. Instead compute tail block candidates for duplicating > > exit path. Remove these from antic set. Duplicate tails when > > reached from both blocks needing a prologue/epilogue and > > blocks not needing such. > > * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and > > HAVE_simple_return. > > * bb-reorder.c (get_uncond_jump_length): Make global. > > * bb-reorder.h (get_uncond_jump_length): Declare. > > * cfgrtl.c (rtl_create_basic_block): Comment typo fix. > > (rtl_split_edge): Likewise. Warning fix. > > (rtl_duplicate_bb): New function. > > (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block. > > This (a revision in the range 181187:181189) broke build for > cris-elf like so: > See PR51051. Given that this also broke arm-linux-gnueabi, a primary platform, and Alan being absent until the 15th according to a message on IRC, I move to revert r181188. I think I need someone with appropriate write privileges to agree with that, and to also give 48h for someone to fix the problem. 
Sorry for not forthcoming on the second point. brgds, H-P PS. where is the policy written down, besides the mailing list archives? ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Revert "PowerPC shrink-wrap support 3 of 3" 2011-11-10 11:25 ` Revert "PowerPC shrink-wrap support 3 of 3" Hans-Peter Nilsson @ 2011-11-10 12:10 ` Richard Guenther 2011-11-10 13:29 ` Hans-Peter Nilsson 2015-04-08 11:11 ` Gerald Pfeifer 1 sibling, 1 reply; 48+ messages in thread From: Richard Guenther @ 2011-11-10 12:10 UTC (permalink / raw) To: Hans-Peter Nilsson; +Cc: amodra, gcc-patches, bernds On Thu, Nov 10, 2011 at 11:38 AM, Hans-Peter Nilsson <hans-peter.nilsson@axis.com> wrote: >> From: Hans-Peter Nilsson <hp@axis.com> >> Date: Wed, 9 Nov 2011 09:55:59 +0100 > >> > From: Alan Modra <amodra@gmail.com> >> > Date: Tue, 1 Nov 2011 16:33:40 +0100 >> >> > On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote: >> >> > * function.c (bb_active_p): Delete. >> > (dup_block_and_redirect, active_insn_between): New functions. >> > (convert_jumps_to_returns, emit_return_for_exit): New functions, >> > split out from.. >> > (thread_prologue_and_epilogue_insns): ..here. Delete >> > shadowing variables. Don't do prologue register clobber tests >> > when shrink wrapping already failed. Delete all last_bb_active >> > code. Instead compute tail block candidates for duplicating >> > exit path. Remove these from antic set. Duplicate tails when >> > reached from both blocks needing a prologue/epilogue and >> > blocks not needing such. >> > * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and >> > HAVE_simple_return. >> > * bb-reorder.c (get_uncond_jump_length): Make global. >> > * bb-reorder.h (get_uncond_jump_length): Declare. >> > * cfgrtl.c (rtl_create_basic_block): Comment typo fix. >> > (rtl_split_edge): Likewise. Warning fix. >> > (rtl_duplicate_bb): New function. >> > (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block. >> >> This (a revision in the range 181187:181189) broke build for >> cris-elf like so: >> See PR51051. 
> > Given that this also broke arm-linux-gnueabi, a primary > platform, and Alan being absent until the 15th according to a > message on IRC, I move to revert r181188. Is there a PR for the arm issue? > I think I need someone with appropriate write privileges to > agree with that, and to also give 48h for someone to fix the > problem. Sorry for not forthcoming on the second point. Did you or somebody else try to look into the problem? To decide whether it's the "best course of action" it would be nice to know if it's a simple error in the patch that is easy to fix. > brgds, H-P > PS. where is the policy written down, besides the mailing list archives? http://gcc.gnu.org/develop.html ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Revert "PowerPC shrink-wrap support 3 of 3" 2011-11-10 12:10 ` Richard Guenther @ 2011-11-10 13:29 ` Hans-Peter Nilsson 2011-11-10 13:44 ` Richard Guenther 0 siblings, 1 reply; 48+ messages in thread From: Hans-Peter Nilsson @ 2011-11-10 13:29 UTC (permalink / raw) To: richard.guenther; +Cc: hp, amodra, gcc-patches, bernds > From: Richard Guenther <richard.guenther@gmail.com> > Date: Thu, 10 Nov 2011 12:22:56 +0100 > On Thu, Nov 10, 2011 at 11:38 AM, Hans-Peter Nilsson > <hans-peter.nilsson@axis.com> wrote: > >> From: Hans-Peter Nilsson <hp@axis.com> > >> Date: Wed, 9 Nov 2011 09:55:59 +0100 > > > >> > From: Alan Modra <amodra@gmail.com> > >> > Date: Tue, 1 Nov 2011 16:33:40 +0100 > >> > >> > On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote: > >> > >> > * function.c (bb_active_p): Delete. > >> > (dup_block_and_redirect, active_insn_between): New functions. > >> > (convert_jumps_to_returns, emit_return_for_exit): New functions, > >> > split out from.. > >> > (thread_prologue_and_epilogue_insns): ..here. Delete > >> > shadowing variables. Don't do prologue register clobber tests > >> > when shrink wrapping already failed. Delete all last_bb_active > >> > code. Instead compute tail block candidates for duplicating > >> > exit path. Remove these from antic set. Duplicate tails when > >> > reached from both blocks needing a prologue/epilogue and > >> > blocks not needing such. > >> > * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and > >> > HAVE_simple_return. > >> > * bb-reorder.c (get_uncond_jump_length): Make global. > >> > * bb-reorder.h (get_uncond_jump_length): Declare. > >> > * cfgrtl.c (rtl_create_basic_block): Comment typo fix. > >> > (rtl_split_edge): Likewise. Warning fix. > >> > (rtl_duplicate_bb): New function.
> >> > (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block. > >> > >> This (a revision in the range 181187:181189) broke build for > >> cris-elf like so: > >> See PR51051. > > > > Given that this also broke arm-linux-gnueabi, a primary > > platform, and Alan being absent until the 15th according to a > > message on IRC, I move to revert r181188. > > Is there a PR for the arm issue? It's covered by the same PR, see comment #1. I've now updated the target field. > > I think I need someone with appropriate write privileges to > > agree with that, and to also give 48h for someone to fix the > > problem. Sorry for not forthcoming on the second point. > > Did you or somebody else try to look into the problem? To decide > whether it's the "best course of action" it would be nice to know if > it's a simple error in the patch that is easy to fix. Nope, not really. Wouldn't FWIW, de jure matter, me not having write privileges to the affected area. Though, I had a quick look at the patch and nothing stood out except its intrusiveness, and it seems the patch wasn't tested on a !simple_return target (just "powerpc-linux" according to the replied-to message). brgds, H-P ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Revert "PowerPC shrink-wrap support 3 of 3" 2011-11-10 13:29 ` Hans-Peter Nilsson @ 2011-11-10 13:44 ` Richard Guenther 2011-11-10 14:13 ` Bernd Schmidt 0 siblings, 1 reply; 48+ messages in thread From: Richard Guenther @ 2011-11-10 13:44 UTC (permalink / raw) To: Hans-Peter Nilsson; +Cc: hp, amodra, gcc-patches, bernds On Thu, Nov 10, 2011 at 12:43 PM, Hans-Peter Nilsson <hans-peter.nilsson@axis.com> wrote: >> From: Richard Guenther <richard.guenther@gmail.com> >> Date: Thu, 10 Nov 2011 12:22:56 +0100 > >> On Thu, Nov 10, 2011 at 11:38 AM, Hans-Peter Nilsson >> <hans-peter.nilsson@axis.com> wrote: >> >> From: Hans-Peter Nilsson <hp@axis.com> >> >> Date: Wed, 9 Nov 2011 09:55:59 +0100 >> > >> >> > From: Alan Modra <amodra@gmail.com> >> >> > Date: Tue, 1 Nov 2011 16:33:40 +0100 >> >> >> >> > On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote: >> >> >> >> > * function.c (bb_active_p): Delete. >> >> > (dup_block_and_redirect, active_insn_between): New functions. >> >> > (convert_jumps_to_returns, emit_return_for_exit): New functions, >> >> > split out from.. >> >> > (thread_prologue_and_epilogue_insns): ..here. Delete >> >> > shadowing variables. Don't do prologue register clobber tests >> >> > when shrink wrapping already failed. Delete all last_bb_active >> >> > code. Instead compute tail block candidates for duplicating >> >> > exit path. Remove these from antic set. Duplicate tails when >> >> > reached from both blocks needing a prologue/epilogue and >> >> > blocks not needing such. >> >> > * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and >> >> > HAVE_simple_return. >> >> > * bb-reorder.c (get_uncond_jump_length): Make global. >> >> > * bb-reorder.h (get_uncond_jump_length): Declare. >> >> > * cfgrtl.c (rtl_create_basic_block): Comment typo fix. >> >> > (rtl_split_edge): Likewise. Warning fix. >> >> > (rtl_duplicate_bb): New function. >> >> > (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block. 
>> >> >> >> This (a revision in the range 181187:181189) broke build for >> >> cris-elf like so: >> >> See PR51051. >> > >> > Given that this also broke arm-linux-gnueabi, a primary >> > platform, and Alan being absent until the 15th according to a >> > message on IRC, I move to revert r181188. >> >> Is there a PR for the arm issue? > > It's covered by the same PR, see comment #1. > I've now updated the target field. > >> > I think I need someone with appropriate write privileges to >> > agree with that, and to also give 48h for someone to fix the >> > problem. Sorry for not forthcoming on the second point. >> >> Did you or somebody else try to look into the problem? To decide >> whether it's the "best course of action" it would be nice to know if >> it's a simple error in the patch that is easy to fix. > > Nope, not really. Wouldn't FWIW, de jure matter, me not having > write privileges to the affected area. Though, I had a quick > look at the patch and nothing stood out except its > intrusiveness, and it seems the patch wasn't tested on a > !simple_return target (just "powerpc-linux" according to the > replied-to message). Fair enough. You can count me as "one" then, and I'll defer to Bernd to either provide a fix or ack the revert. Thanks, Richard. > brgds, H-P > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Revert "PowerPC shrink-wrap support 3 of 3" 2011-11-10 13:44 ` Richard Guenther @ 2011-11-10 14:13 ` Bernd Schmidt 2011-11-10 15:23 ` Hans-Peter Nilsson 2011-11-11 0:22 ` Revert " Michael Meissner 0 siblings, 2 replies; 48+ messages in thread From: Bernd Schmidt @ 2011-11-10 14:13 UTC (permalink / raw) To: Richard Guenther; +Cc: Hans-Peter Nilsson, hp, amodra, gcc-patches [-- Attachment #1: Type: text/plain, Size: 1032 bytes --] On 11/10/11 13:14, Richard Guenther wrote: > Fair enough. You can count me as "one" then, and I'll defer to Bernd > to either provide a fix or ack the revert. I'm trying to track it down. In 189r.outof_cfglayout, we have (insn 31 33 35 3 (use (reg/i:SI 0 r0)) ../../../../baseline-trunk/libstdc++-v3/libsupc++/new_opv.cc:34 -1 (nil)) ;; Successors: EXIT [100.0%] (fallthru) ;; lr out 0 [r0] 11 [fp] 13 [sp] 14 [lr] 25 [sfp] 26 [afp] ;; live out 0 [r0] 11 [fp] 13 [sp] 25 [sfp] 26 [afp] followed by a number of other basic blocks, so that looks wrong to me. outof_cfglayout seems to assume that fallthrough edges to the exit block are OK and don't need fixing up, and changing that seems nontrivial at first glance. The situation is first created during cfgcleanup in into_cfglayout. The following patch makes the testcase compile by stopping the compiler from moving the exit fallthru block around, but I've not checked whether it has a negative effect on code quality. HP, can you run full tests? 
Bernd [-- Attachment #2: cfgl-exit.diff --] [-- Type: text/plain, Size: 856 bytes --] Index: ../baseline-trunk/gcc/cfgrtl.c =================================================================== --- ../baseline-trunk/gcc/cfgrtl.c (revision 181252) +++ ../baseline-trunk/gcc/cfgrtl.c (working copy) @@ -2735,6 +2735,16 @@ cfg_layout_can_merge_blocks_p (basic_blo if (BB_PARTITION (a) != BB_PARTITION (b)) return false; + /* If we would end up moving B's instructions, make sure it doesn't fall + through into the exit block, since we cannot recover from a fallthrough + edge into the exit block occurring in the middle of a function. */ + if (NEXT_INSN (BB_END (a)) != BB_HEAD (b)) + { + edge e = find_fallthru_edge (b->succs); + if (e && e->dest == EXIT_BLOCK_PTR) + return false; + } + /* There must be exactly one edge in between the blocks. */ return (single_succ_p (a) && single_succ (a) == b ^ permalink raw reply [flat|nested] 48+ messages in thread
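As an aside for readers tracing the logic: the guard this patch adds can be modeled with a toy CFG. Everything below is an invented illustration (the `merge_bb` struct and `toy_can_merge` are not GCC's API); it only mirrors the shape of the new test in cfg_layout_can_merge_blocks_p.

```c
#include <stdbool.h>

/* Invented, minimal stand-in for GCC's basic_block: only the bits the
   new guard consults are modeled.  */
struct merge_bb {
  int head_uid;            /* UID of the block's first insn */
  int end_uid;             /* UID of the block's last insn */
  bool fallthru_to_exit;   /* does a fallthru edge reach EXIT? */
};

/* Mirror of the added check: if B's insns would have to move (they do
   not already sit right after A), refuse the merge when B falls
   through into the exit block, because a fallthru edge into EXIT in
   the middle of a function cannot be fixed up later.  */
bool toy_can_merge (const struct merge_bb *a, const struct merge_bb *b)
{
  bool would_move = (a->end_uid + 1 != b->head_uid);
  if (would_move && b->fallthru_to_exit)
    return false;
  return true;             /* the remaining real checks are omitted here */
}
```

Note that with the two blocks already laid out back to back the merge stays allowed even when B falls through to EXIT, matching the patch, which only rejects the case where instructions would move.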
* Re: Revert "PowerPC shrink-wrap support 3 of 3" 2011-11-10 14:13 ` Bernd Schmidt @ 2011-11-10 15:23 ` Hans-Peter Nilsson 2011-11-10 18:06 ` Hans-Peter Nilsson 2011-11-11 0:22 ` Revert " Michael Meissner 1 sibling, 1 reply; 48+ messages in thread From: Hans-Peter Nilsson @ 2011-11-10 15:23 UTC (permalink / raw) To: bernds; +Cc: richard.guenther, amodra, gcc-patches > From: Bernd Schmidt <bernds@codesourcery.com> > Date: Thu, 10 Nov 2011 14:29:04 +0100 > HP, can you run full tests? Cross-test to cris-elf in progress. Thanks! brgds, H-P ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Revert "PowerPC shrink-wrap support 3 of 3" 2011-11-10 15:23 ` Hans-Peter Nilsson @ 2011-11-10 18:06 ` Hans-Peter Nilsson 2011-11-11 22:09 ` Hans-Peter Nilsson 0 siblings, 1 reply; 48+ messages in thread From: Hans-Peter Nilsson @ 2011-11-10 18:06 UTC (permalink / raw) To: bernds; +Cc: richard.guenther, amodra, gcc-patches > From: Hans-Peter Nilsson <hp@axis.com> > Date: Thu, 10 Nov 2011 15:12:54 +0100 > > From: Bernd Schmidt <bernds@codesourcery.com> > > Date: Thu, 10 Nov 2011 14:29:04 +0100 > > > HP, can you run full tests? > > Cross-test to cris-elf in progress. > Thanks! Works, no regressions compared to before the breakage (r181187). Thanks! According to <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51051#c3> it fixes building for arm-unknown-linux-gnueabi too. brgds, H-P ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Revert "PowerPC shrink-wrap support 3 of 3" 2011-11-10 18:06 ` Hans-Peter Nilsson @ 2011-11-11 22:09 ` Hans-Peter Nilsson 2011-11-14 14:59 ` Bernd Schmidt 0 siblings, 1 reply; 48+ messages in thread From: Hans-Peter Nilsson @ 2011-11-11 22:09 UTC (permalink / raw) To: bernds; +Cc: gcc-patches > From: Hans-Peter Nilsson <hp@axis.com> > Date: Thu, 10 Nov 2011 18:52:39 +0100 > > From: Hans-Peter Nilsson <hp@axis.com> > > Date: Thu, 10 Nov 2011 15:12:54 +0100 > > > > From: Bernd Schmidt <bernds@codesourcery.com> > > > Date: Thu, 10 Nov 2011 14:29:04 +0100 > > > > > HP, can you run full tests? > > > > Cross-test to cris-elf in progress. > > Thanks! > > Works, no regressions compared to before the breakage (r181187). > Thanks! According to > <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51051#c3> it fixes > building for arm-unknown-linux-gnueabi too. AFAICT, your patch has got sufficient testing now (on three targets to boot) to be considered safe to check in. Or is something amiss? (If it's the unchecked code quality you mentioned, that can be just as well dealt with having the tree in a working state; having the tree broken for some targets accomplishes nothing.) brgds, H-P ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Revert "PowerPC shrink-wrap support 3 of 3" 2011-11-11 22:09 ` Hans-Peter Nilsson @ 2011-11-14 14:59 ` Bernd Schmidt 2011-11-14 16:49 ` CFG review needed for fix of " Hans-Peter Nilsson 0 siblings, 1 reply; 48+ messages in thread From: Bernd Schmidt @ 2011-11-14 14:59 UTC (permalink / raw) To: Hans-Peter Nilsson; +Cc: gcc-patches On 11/11/11 20:13, Hans-Peter Nilsson wrote: > AFAICT, your patch has got sufficient testing now (on three > targets to boot) to be considered safe to check in. Or is > something amiss? > > (If it's the unchecked code quality you mentioned, that can be > just as well dealt with having the tree in a working state; > having the tree broken for some targets accomplishes nothing.) I briefly looked at that, and while it does tend to move stuff around compared to an unpatched compiler, the effects don't seem to be all that bad. If we find a testcase where it's a real problem, we could do more aggressive reordering after epilogue generation, when we no longer have exit fallthrough edges. So I'd like to ask for an OK here. Bernd ^ permalink raw reply [flat|nested] 48+ messages in thread
* CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3" 2011-11-14 14:59 ` Bernd Schmidt @ 2011-11-14 16:49 ` Hans-Peter Nilsson 2011-11-14 17:06 ` Ramana Radhakrishnan 2011-11-14 18:21 ` Richard Henderson 0 siblings, 2 replies; 48+ messages in thread From: Hans-Peter Nilsson @ 2011-11-14 16:49 UTC (permalink / raw) To: bernds; +Cc: hp, gcc-patches > From: Bernd Schmidt <bernds@codesourcery.com> > Date: Mon, 14 Nov 2011 10:51:56 +0100 > On 11/11/11 20:13, Hans-Peter Nilsson wrote: > > AFAICT, your patch has got sufficiently testing now (on three > > targets to boot) to be considered safe to check in. Or is > > something amiss? > > > > (If it's the unchecked code quality you mentioned, that can be > > just as well dealt with having the tree in a working state; > > having the tree broken for some targets accomplishes nothing.) > > I briefly looked at that, and while it does tend to move stuff around > compared to an unpatched compiler, the effects don't seem to be all that > bad. If we find a testcase where it's a real problem, we could do more > aggressive reordering after epilogue generation, when we no longer have > exit fallthroguh edges. So I'd like to ask for an OK here. Looks like all we need is a positive review of <http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html> and a ChangeLog entry to unbreak three or more targets. Someone with approval rights: pretty please? brgds, H-P ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3" 2011-11-14 16:49 ` CFG review needed for fix of " Hans-Peter Nilsson @ 2011-11-14 17:06 ` Ramana Radhakrishnan 2011-11-14 17:15 ` Rainer Orth 2011-11-14 18:21 ` Richard Henderson 1 sibling, 1 reply; 48+ messages in thread From: Ramana Radhakrishnan @ 2011-11-14 17:06 UTC (permalink / raw) To: gcc-patches; +Cc: bernds, hp > > Someone with approval rights: pretty please? Can I add my +1 "pretty please" as well here :) ? According to #c3 this fixes arm-linux-gnueabi cross-builds for C++ as well and potentially allows this to bootstrap again. I have kicked off a bootstrap and test run on arm-linux-gnueabi . cheers Ramana ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3" 2011-11-14 17:06 ` Ramana Radhakrishnan @ 2011-11-14 17:15 ` Rainer Orth 0 siblings, 0 replies; 48+ messages in thread From: Rainer Orth @ 2011-11-14 17:15 UTC (permalink / raw) To: Ramana Radhakrishnan; +Cc: gcc-patches, bernds, hp Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org> writes: >> Someone with approval rights: pretty please? > > Can I add my +1 "pretty please" as well here :) ? According to #c3 > this fixes arm-linux-gnueabi cross-builds for C++ as well and > potentially allows this to bootstrap again. I have kicked off a > bootstrap and test run on arm-linux-gnueabi . It's better than before, but Ada bootstrap (on Solaris/SPARC, cf. PR bootstrap/51086) still fails with the same ICE. Rainer -- ----------------------------------------------------------------------------- Rainer Orth, Center for Biotechnology, Bielefeld University ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3" 2011-11-14 16:49 ` CFG review needed for fix of " Hans-Peter Nilsson 2011-11-14 17:06 ` Ramana Radhakrishnan @ 2011-11-14 18:21 ` Richard Henderson 2011-11-14 22:44 ` Alan Modra 2011-11-15 0:45 ` Hans-Peter Nilsson 1 sibling, 2 replies; 48+ messages in thread From: Richard Henderson @ 2011-11-14 18:21 UTC (permalink / raw) To: Hans-Peter Nilsson; +Cc: bernds, hp, gcc-patches On 11/14/2011 04:10 AM, Hans-Peter Nilsson wrote: > Looks like all we need is a positive review of > <http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html> and a > ChangeLog entry to unbreak three or more targets. > > Someone with approval rights: pretty please? That patch is ok. r~ ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3" 2011-11-14 18:21 ` Richard Henderson @ 2011-11-14 22:44 ` Alan Modra 2011-11-15 2:50 ` Hans-Peter Nilsson ` (2 more replies) 2011-11-15 0:45 ` Hans-Peter Nilsson 1 sibling, 3 replies; 48+ messages in thread From: Alan Modra @ 2011-11-14 22:44 UTC (permalink / raw) To: Richard Henderson; +Cc: Hans-Peter Nilsson, bernds, hp, gcc-patches On Mon, Nov 14, 2011 at 07:48:03AM -1000, Richard Henderson wrote: > On 11/14/2011 04:10 AM, Hans-Peter Nilsson wrote: > > Looks like all we need is a positive review of > > <http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html> and a > > ChangeLog entry to unbreak three or more targets. > > > > Someone with approval rights: pretty please? > > That patch is ok. Sorry for the bootstrap problems. I believe the cause is my extraction of convert_jumps_to_returns and emit_return_for_exit. The new HAVE_return code misses a single_succ_p test (covered by the bitmap_bit_p (&bb_flags, ...) test in the HAVE_simple_return case). I haven't really looked into what Bernd's fix does. I know this one fixes what I broke.. * function.c (thread_prologue_and_epilogue_insns): Guard emitting return with single_succ_p test. Index: gcc/function.c =================================================================== --- gcc/function.c (revision 181188) +++ gcc/function.c (working copy) @@ -6230,7 +6230,8 @@ thread_prologue_and_epilogue_insns (void && !active_insn_between (BB_HEAD (last_bb), BB_END (last_bb))) convert_jumps_to_returns (last_bb, false, NULL); - if (EDGE_COUNT (exit_fallthru_edge->src->preds) != 0) + if (EDGE_COUNT (last_bb->preds) != 0 + && single_succ_p (last_bb)) { last_bb = emit_return_for_exit (exit_fallthru_edge, false); epilogue_end = returnjump = BB_END (last_bb); -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 48+ messages in thread
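The intent of the extra test can be sketched with a minimal model of the two edge counts involved; `ret_bb` and `toy_should_emit_return` below are invented names, not GCC functions, standing in for the `EDGE_COUNT (last_bb->preds)` and `single_succ_p (last_bb)` conditions in the patch.

```c
#include <stdbool.h>

/* Invented toy block carrying just the two edge counts the guard
   reads.  */
struct ret_bb {
  int n_preds;   /* incoming edges */
  int n_succs;   /* outgoing edges */
};

/* Sketch of the corrected condition: a return insn may replace the
   fallthru into EXIT only while last_bb is still reachable and has
   exactly one successor.  A block with two successors ends in a
   conditional jump and must not be rewritten into an unconditional
   return.  */
bool toy_should_emit_return (const struct ret_bb *last_bb)
{
  return last_bb->n_preds != 0 && last_bb->n_succs == 1;
}
```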
* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3" 2011-11-14 22:44 ` Alan Modra @ 2011-11-15 2:50 ` Hans-Peter Nilsson 2011-11-15 5:03 ` Richard Henderson 2011-11-15 18:54 ` Richard Henderson 2 siblings, 0 replies; 48+ messages in thread From: Hans-Peter Nilsson @ 2011-11-15 2:50 UTC (permalink / raw) To: amodra; +Cc: gcc-patches > From: Alan Modra <amodra@gmail.com> > Date: Mon, 14 Nov 2011 22:56:48 +0100 > I haven't really looked into what Bernd's fix does. I know this one > fixes what I broke.. Hm... Oh well, I'm trusting RTH and Bernd that it fixed a real issue. Thanks for looking (and a belated thanks to RTH for the review). brgds, H-P ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3" 2011-11-14 22:44 ` Alan Modra 2011-11-15 2:50 ` Hans-Peter Nilsson @ 2011-11-15 5:03 ` Richard Henderson 2011-11-15 6:11 ` Bernd Schmidt 2011-11-15 18:54 ` Richard Henderson 2 siblings, 1 reply; 48+ messages in thread From: Richard Henderson @ 2011-11-15 5:03 UTC (permalink / raw) To: Hans-Peter Nilsson, bernds, hp, gcc-patches On 11/14/2011 11:56 AM, Alan Modra wrote: > On Mon, Nov 14, 2011 at 07:48:03AM -1000, Richard Henderson wrote: >> On 11/14/2011 04:10 AM, Hans-Peter Nilsson wrote: >>> Looks like all we need is a positive review of >>> <http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html> and a >>> ChangeLog entry to unbreak three or more targets. >>> >>> Someone with approval rights: pretty please? >> >> That patch is ok. > > Sorry for the bootstrap problems. I believe the cause is my > extraction of convert_jumps_to_returns and emit_return_for_exit. The > new HAVE_return code misses a single_succ_p test (covered by the > bitmap_bit_p (&bb_flags, ...) test in the HAVE_simple_return case). > > I haven't really looked into what Bernd's fix does. I know this one > fixes what I broke.. > > * function.c (thread_prologue_and_epilogue_insns): Guard > emitting return with single_succ_p test. Hmm. This looks plausible too. Bernd's patch made sure that cfglayout didn't do something impossible. Of course, it's possible that his patch should merely be an assert, and the correct fix goes here. Thoughts? r~ ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3" 2011-11-15 5:03 ` Richard Henderson @ 2011-11-15 6:11 ` Bernd Schmidt 2011-11-15 7:09 ` David Miller 0 siblings, 1 reply; 48+ messages in thread From: Bernd Schmidt @ 2011-11-15 6:11 UTC (permalink / raw) To: Richard Henderson; +Cc: Hans-Peter Nilsson, hp, gcc-patches On 11/15/11 01:43, Richard Henderson wrote: > On 11/14/2011 11:56 AM, Alan Modra wrote: >> * function.c (thread_prologue_and_epilogue_insns): Guard >> emitting return with single_succ_p test. > > Hmm. This looks plausible too. > > Bernd's patch made sure that cfglayout didn't do something impossible. > Of course, it's possible that his patch should merely be an assert, > and the correct fix goes here. I haven't really looked at Alan's patch; I'm thinking there were probably two bugs - the one I found being latent before. Note that the problematic CFG exists way before thread_prologue_and_epilogue even runs, so an assert would still trigger. Bernd ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3" 2011-11-15 6:11 ` Bernd Schmidt @ 2011-11-15 7:09 ` David Miller 0 siblings, 0 replies; 48+ messages in thread From: David Miller @ 2011-11-15 7:09 UTC (permalink / raw) To: bernds; +Cc: rth, hans-peter.nilsson, hp, gcc-patches From: Bernd Schmidt <bernds@codesourcery.com> Date: Tue, 15 Nov 2011 01:54:34 +0100 > On 11/15/11 01:43, Richard Henderson wrote: >> On 11/14/2011 11:56 AM, Alan Modra wrote: >>> * function.c (thread_prologue_and_epilogue_insns): Guard >>> emitting return with single_succ_p test. >> >> Hmm. This looks plausible too. >> >> Bernd's patch made sure that cfglayout didn't do something impossible. >> Of course, it's possible that his patch should merely be an assert, >> and the correct fix goes here. > > I haven't really looked at Alan's patch; I'm thinking there were > probably two bugs - the one I found being latent before. Note that the > problematic CFG exists way before thread_prologue_and_epilogue even > runs, so an assert would still trigger. I suspect that Alan's fix here will take care of the sparc ada bootstrap regression reported, and regardless if it's correct it should be installed. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3" 2011-11-14 22:44 ` Alan Modra 2011-11-15 2:50 ` Hans-Peter Nilsson 2011-11-15 5:03 ` Richard Henderson @ 2011-11-15 18:54 ` Richard Henderson 2 siblings, 0 replies; 48+ messages in thread From: Richard Henderson @ 2011-11-15 18:54 UTC (permalink / raw) To: Hans-Peter Nilsson, bernds, hp, gcc-patches On 11/14/2011 11:56 AM, Alan Modra wrote: > * function.c (thread_prologue_and_epilogue_insns): Guard > emitting return with single_succ_p test. Ok. r~ ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3" 2011-11-14 18:21 ` Richard Henderson 2011-11-14 22:44 ` Alan Modra @ 2011-11-15 0:45 ` Hans-Peter Nilsson 1 sibling, 0 replies; 48+ messages in thread From: Hans-Peter Nilsson @ 2011-11-15 0:45 UTC (permalink / raw) To: gcc-patches; +Cc: bernds > From: Richard Henderson <rth@redhat.com> > Date: Mon, 14 Nov 2011 18:48:03 +0100 > On 11/14/2011 04:10 AM, Hans-Peter Nilsson wrote: > > Looks like all we need is a positive review of > > <http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html> and a > > ChangeLog entry to unbreak three or more targets. > > > > Someone with approval rights: pretty please? > > That patch is ok. Committed with this ChangeLog entry, obvious from the function header and the added comment. 2011-11-15 Bernd Schmidt <bernds@codesourcery.com> PR rtl-optimization/51051 * cfgrtl.c (cfg_layout_can_merge_blocks_p): Return FALSE if the move would cause fallthrough into the exit block. brgds, H-P ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Revert "PowerPC shrink-wrap support 3 of 3" 2011-11-10 14:13 ` Bernd Schmidt 2011-11-10 15:23 ` Hans-Peter Nilsson @ 2011-11-11 0:22 ` Michael Meissner 1 sibling, 0 replies; 48+ messages in thread From: Michael Meissner @ 2011-11-11 0:22 UTC (permalink / raw) To: Bernd Schmidt Cc: Richard Guenther, Hans-Peter Nilsson, hp, amodra, gcc-patches On Thu, Nov 10, 2011 at 02:29:04PM +0100, Bernd Schmidt wrote: > On 11/10/11 13:14, Richard Guenther wrote: > > Fair enough. You can count me as "one" then, and I'll defer to Bernd > > to either provide a fix or ack the revert. > > I'm trying to track it down. > > In 189r.outof_cfglayout, we have > > (insn 31 33 35 3 (use (reg/i:SI 0 r0)) > ../../../../baseline-trunk/libstdc++-v3/libsupc++/new_opv.cc:34 -1 > (nil)) > > ;; Successors: EXIT [100.0%] (fallthru) > ;; lr out 0 [r0] 11 [fp] 13 [sp] 14 [lr] 25 [sfp] 26 [afp] > ;; live out 0 [r0] 11 [fp] 13 [sp] 25 [sfp] 26 [afp] > > followed by a number of other basic blocks, so that looks wrong to me. > outof_cfglayout seems to assume that fallthrough edges to the exit block > are OK and don't need fixing up, and changing that seems nontrivial at > first glance. > > The situation is first created during cfgcleanup in into_cfglayout. The > following patch makes the testcase compile by stopping the compiler from > moving the exit fallthru block around, but I've not checked whether it > has a negative effect on code quality. HP, can you run full tests? FWIW, I did bootstrap and make check with/without this patch, and it introduces no regressions in the PowerPC, but I haven't look at the code generated. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meissner@linux.vnet.ibm.com fax +1 (978) 399-6899 ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Revert "PowerPC shrink-wrap support 3 of 3" 2011-11-10 11:25 ` Revert "PowerPC shrink-wrap support 3 of 3" Hans-Peter Nilsson 2011-11-10 12:10 ` Richard Guenther @ 2015-04-08 11:11 ` Gerald Pfeifer 1 sibling, 0 replies; 48+ messages in thread From: Gerald Pfeifer @ 2015-04-08 11:11 UTC (permalink / raw) To: Hans-Peter Nilsson; +Cc: Alan Modra, gcc-patches, Bernd Schmidt On Thu, 10 Nov 2011, Hans-Peter Nilsson wrote: > I think I need someone with appropriate write privileges to > agree with that, and to also give 48h for someone to fix the > problem. Sorry for not forthcoming on the second point. > > brgds, H-P > PS. where is the policy written down, besides the mailing list archives? https://gcc.gnu.org/develop.html has had the following since 2003 or so: Patch Reversion If a patch is committed which introduces a regression on any target which the Steering Committee considers to be important and if: * the problem is reported to the original poster; * 48 hours pass without the original poster or any other party indicating that a fix will be forthcoming in the very near future; * two people with write privileges to the affected area of the compiler determine that the best course of action is to revert the patch; then they may revert the patch. (The list of important targets will be revised at the beginning of each release cycle, if necessary, and is part of the release criteria.) After the patch has been reverted, the poster may appeal the decision to the Steering Committee. Note that no distinction is made between patches which are themselves buggy and patches that expose latent bugs elsewhere in the compiler. This is there as part of the overall (release) methodology, though svnwrite.html would be at least as natural. Perhaps a reference there? Thoughts? Gerald PS: Yes, I am catching up with some older mails. ;-) ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: PowerPC shrink-wrap support 0 of 3 2011-09-17 7:59 PowerPC shrink-wrap support 0 of 3 Alan Modra ` (2 preceding siblings ...) 2011-09-17 9:26 ` PowerPC shrink-wrap support 3 " Alan Modra @ 2011-09-17 18:16 ` Bernd Schmidt 2011-09-19 5:39 ` Alan Modra 3 siblings, 1 reply; 48+ messages in thread From: Bernd Schmidt @ 2011-09-17 18:16 UTC (permalink / raw) To: gcc-patches, David Edelsohn On 09/17/11 09:16, Alan Modra wrote: > This patch series adds shrink-wrap support for PowerPC. The patches > are on top of Bernd's "Initial shrink-wrapping patch": > http://gcc.gnu.org/ml/gcc-patches/2011-08/msg02557.html, but with the > tm.texi patch applied to tm.texi.in. Bootstrapped and regression > tested powerpc64-linux all langs except ada, and spec CPU2006 tested. > The spec results were a little disappointing as I expected to see some > gains, but my baseline was a -O3 run and I suppose most of the > shrink-wrap opportunities were lost to inlining. The last posted version had a bug that crept in during the review cycle, and which made it quite ineffective. Find this block /* As a special case, check for jumps to the last bb that cannot successfully be converted to simple_returns later on, and mark them as requiring a frame. These are conditional jumps that jump to their fallthru block, so it's not a case that is expected to occur often. */ and replace "condjump_p" in the if statement with "any_condjump_p". There'll be further patches which increase the number of shrink-wrapping opportunities we get to see. One of them was submitted earlier: http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01499.html http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02191.html Bernd ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: PowerPC shrink-wrap support 0 of 3 2011-09-17 18:16 ` PowerPC shrink-wrap support 0 of 3 Bernd Schmidt @ 2011-09-19 5:39 ` Alan Modra 2011-09-19 13:36 ` Bernd Schmidt 0 siblings, 1 reply; 48+ messages in thread From: Alan Modra @ 2011-09-19 5:39 UTC (permalink / raw) To: Bernd Schmidt; +Cc: gcc-patches, David Edelsohn On Sat, Sep 17, 2011 at 03:26:21PM +0200, Bernd Schmidt wrote: > On 09/17/11 09:16, Alan Modra wrote: > > This patch series adds shrink-wrap support for PowerPC. The patches > > are on top of Bernd's "Initial shrink-wrapping patch": > > http://gcc.gnu.org/ml/gcc-patches/2011-08/msg02557.html, but with the > > tm.texi patch applied to tm.texi.in. Bootstrapped and regression > > tested powerpc64-linux all langs except ada, and spec CPU2006 tested. > > The spec results were a little disappointing as I expected to see some > > gains, but my baseline was a -O3 run and I suppose most of the > > shrink-wrap opportunities were lost to inlining. > > The last posted version had a bug that crept in during the review cycle, > and which made it quite ineffective. I wasn't complaining! My disappointment really stemmed from having unrealistically high expectations. I still think this optimization is a great feature. Thanks for contributing it! -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: PowerPC shrink-wrap support 0 of 3 2011-09-19 5:39 ` Alan Modra @ 2011-09-19 13:36 ` Bernd Schmidt [not found] ` <20110921152851.GE10321@bubble.grove.modra.org> 0 siblings, 1 reply; 48+ messages in thread From: Bernd Schmidt @ 2011-09-19 13:36 UTC (permalink / raw) To: gcc-patches, David Edelsohn On 09/19/11 02:16, Alan Modra wrote: > On Sat, Sep 17, 2011 at 03:26:21PM +0200, Bernd Schmidt wrote: >> On 09/17/11 09:16, Alan Modra wrote: >>> This patch series adds shrink-wrap support for PowerPC. The patches >>> are on top of Bernd's "Initial shrink-wrapping patch": >>> http://gcc.gnu.org/ml/gcc-patches/2011-08/msg02557.html, but with the >>> tm.texi patch applied to tm.texi.in. Bootstrapped and regression >>> tested powerpc64-linux all langs except ada, and spec CPU2006 tested. >>> The spec results were a little disappointing as I expected to see some >>> gains, but my baseline was a -O3 run and I suppose most of the >>> shrink-wrap opportunities were lost to inlining. >> >> The last posted version had a bug that crept in during the review cycle, >> and which made it quite ineffective. > > I wasn't complaining! I wasn't implying you were :) I was just hoping to see some good benchmark results too, so that I can feel that all the pain was actually worth it... Bernd ^ permalink raw reply [flat|nested] 48+ messages in thread
[parent not found: <20110921152851.GE10321@bubble.grove.modra.org>]
[parent not found: <20110922144017.GF10321@bubble.grove.modra.org>]
* [PATCH] PowerPC shrink-wrap support benchmark gains [not found] ` <20110922144017.GF10321@bubble.grove.modra.org> @ 2011-09-26 14:35 ` Alan Modra 0 siblings, 0 replies; 48+ messages in thread From: Alan Modra @ 2011-09-26 14:35 UTC (permalink / raw) To: Bernd Schmidt, Peter Bergner, gcc-patches This patch increases opportunities for shrink-wrapping. With this applied, http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00754.html and http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01499.html plus the fix mentioned in http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01016.html, along with my powerpc changes, I finally see some gains in CPU2006. 453.povray gives a massive 12% gain in one particular setup, with other benchmarks in the suite mostly showing small improvements. Some of this gain is likely due to having an extra gpr to play with before we start saving regs. * gcc/ifcvt.c (dead_or_predicable): Disable if-conversion when doing so is likely to kill a shrink-wrapping opportunity. diff -urp -x'*~' -x'*.orig' -x'*.rej' -x.svn gcc-alanshrink1/gcc/ifcvt.c gcc-current/gcc/ifcvt.c --- gcc-alanshrink1/gcc/ifcvt.c 2011-09-26 15:29:14.000000000 +0930 +++ gcc-current/gcc/ifcvt.c 2011-09-26 14:24:53.000000000 +0930 @@ -4138,6 +4138,64 @@ dead_or_predicable (basic_block test_bb, FOR_BB_INSNS (merge_bb, insn) if (NONDEBUG_INSN_P (insn)) df_simulate_find_defs (insn, merge_set); + + /* If shrink-wrapping, disable this optimization when test_bb is + the first basic block and merge_bb exits. The idea is to not + move code setting up a return register as that may clobber a + register used to pass function parameters, which then must be + saved in caller-saved regs. A caller-saved reg requires the + prologue, killing a shrink-wrap opportunity. 
*/ + if ((flag_shrink_wrap && !epilogue_completed) + && ENTRY_BLOCK_PTR->next_bb == test_bb + && single_succ_p (new_dest) + && single_succ (new_dest) == EXIT_BLOCK_PTR + && bitmap_intersect_p (df_get_live_in (new_dest), merge_set)) + { + regset return_regs; + unsigned int i; + + return_regs = BITMAP_ALLOC (®_obstack); + + /* Start off with the intersection of regs used to pass + params and regs used to return values. */ + for (i = 0; i < FIRST_PSEUDO_REGISTER; i++) + if (FUNCTION_ARG_REGNO_P (i) + && FUNCTION_VALUE_REGNO_P (i)) + bitmap_set_bit (return_regs, INCOMING_REGNO (i)); + + bitmap_and_into (return_regs, df_get_live_out (ENTRY_BLOCK_PTR)); + bitmap_and_into (return_regs, df_get_live_in (EXIT_BLOCK_PTR)); + if (!bitmap_empty_p (return_regs)) + { + FOR_BB_INSNS_REVERSE (new_dest, insn) + if (NONDEBUG_INSN_P (insn)) + { + df_ref *def_rec; + unsigned int uid = INSN_UID (insn); + + /* If this insn sets any reg in return_regs.. */ + for (def_rec = DF_INSN_UID_DEFS (uid); *def_rec; def_rec++) + { + df_ref def = *def_rec; + unsigned r = DF_REF_REGNO (def); + + if (bitmap_bit_p (return_regs, r)) + break; + } + /* ..then add all reg uses to the set of regs + we're interested in. */ + if (*def_rec) + df_simulate_uses (insn, return_regs); + } + if (bitmap_intersect_p (merge_set, return_regs)) + { + BITMAP_FREE (return_regs); + BITMAP_FREE (merge_set); + return FALSE; + } + } + BITMAP_FREE (return_regs); + } } no_body: -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 48+ messages in thread
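The backward scan in this patch can be sketched with plain bitmasks. This is a simplification with invented names (`toy_insn`, `toy_ifcvt_allowed`); the real code works on GCC's df references and register sets, but the flow of the tracked register set is the same.

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented toy insn: the hard regs it sets and uses, as bitmasks.  */
struct toy_insn {
  uint32_t sets;
  uint32_t uses;
};

/* Sketch of the walk: start from the regs used both to pass arguments
   and to return values, scan the final block in reverse, and whenever
   an insn sets a tracked reg, add that insn's inputs to the tracked
   set.  If the code if-conversion wants to move (merge_set) writes a
   tracked reg, the conversion is refused so the shrink-wrap
   opportunity survives.  */
bool toy_ifcvt_allowed (const struct toy_insn *bb, int n_insns,
                        uint32_t return_regs, uint32_t merge_set)
{
  for (int i = n_insns - 1; i >= 0; i--)
    if (bb[i].sets & return_regs)
      return_regs |= bb[i].uses;
  return (merge_set & return_regs) == 0;
}
```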
end of thread, other threads:[~2015-04-08 11:11 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-17  7:59 PowerPC shrink-wrap support 0 of 3 Alan Modra
2011-09-17  8:22 ` PowerPC shrink-wrap support 1 " Alan Modra
2011-09-17  9:13 ` PowerPC shrink-wrap support 2 " Alan Modra
2011-09-17  9:26 ` PowerPC shrink-wrap support 3 " Alan Modra
2011-09-26 14:25   ` Alan Modra
2011-09-27  0:15     ` Alan Modra
2011-09-27  0:19       ` Bernd Schmidt
2011-09-27  0:49         ` Alan Modra
2011-09-27  1:08           ` Bernd Schmidt
2011-09-27  2:16             ` Alan Modra
2011-09-28 16:35               ` Alan Modra
2011-10-16 20:19                 ` David Edelsohn
2011-10-26 13:03                   ` Alan Modra
2011-10-26 13:42                     ` Bernd Schmidt
2011-10-26 14:40                       ` Alan Modra
2011-10-26 14:44                         ` Bernd Schmidt
2011-10-26 15:40                           ` Alan Modra
2011-10-28  0:41                             ` Alan Modra
2011-10-31 15:14                               ` Alan Modra
2011-11-01 15:34                                 ` Alan Modra
2011-11-07 17:27                                   ` Jakub Jelinek
2011-11-09  9:48                                   ` Hans-Peter Nilsson
2011-11-10 11:25                                     ` Revert "PowerPC shrink-wrap support 3 of 3" Hans-Peter Nilsson
2011-11-10 12:10                                       ` Richard Guenther
2011-11-10 13:29                                         ` Hans-Peter Nilsson
2011-11-10 13:44                                           ` Richard Guenther
2011-11-10 14:13                                             ` Bernd Schmidt
2011-11-10 15:23                                               ` Hans-Peter Nilsson
2011-11-10 18:06                                                 ` Hans-Peter Nilsson
2011-11-11 22:09                                                   ` Hans-Peter Nilsson
2011-11-14 14:59                                                     ` Bernd Schmidt
2011-11-14 16:49                                                       ` CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3" Hans-Peter Nilsson
2011-11-14 17:06                                                         ` Ramana Radhakrishnan
2011-11-14 17:15                                                         ` Rainer Orth
2011-11-14 18:21                                                         ` Richard Henderson
2011-11-14 22:44                                                           ` Alan Modra
2011-11-15  2:50                                                             ` Hans-Peter Nilsson
2011-11-15  5:03                                                               ` Richard Henderson
2011-11-15  6:11                                                                 ` Bernd Schmidt
2011-11-15  7:09                                                                   ` David Miller
2011-11-15 18:54                                                                     ` Richard Henderson
2011-11-15  0:45                                                             ` Hans-Peter Nilsson
2011-11-11  0:22                                       ` Revert "PowerPC shrink-wrap support 3 of 3" Michael Meissner
2015-04-08 11:11                                         ` Gerald Pfeifer
2011-09-17 18:16 ` PowerPC shrink-wrap support 0 of 3 Bernd Schmidt
2011-09-19  5:39   ` Alan Modra
2011-09-19 13:36     ` Bernd Schmidt
     [not found] ` <20110921152851.GE10321@bubble.grove.modra.org>
     [not found]   ` <20110922144017.GF10321@bubble.grove.modra.org>
2011-09-26 14:35     ` [PATCH] PowerPC shrink-wrap support benchmark gains Alan Modra