* [PATCH] Don't combine param and return value copies @ 2015-05-23 11:20 Alan Modra 2015-05-23 11:20 ` Andrew Pinski 2015-05-23 19:40 ` Segher Boessenkool 0 siblings, 2 replies; 8+ messages in thread From: Alan Modra @ 2015-05-23 11:20 UTC (permalink / raw) To: gcc-patches; +Cc: Segher Boessenkool This stops combine messing with parameter and return value copies from/to hard registers. Bootstrapped and regression tested powerpc64le-linux, powerpc64-linux and x86_64-linux. In looking at a number of different powerpc64le gcc/*.o files, I noticed a few code generation improvements. There were cases where a register copy was no longer needed, cmp used in place of mr., and rlwinm instead of rldicl. x86_64 gcc/*.o showed no changes (apart from combine.o of course), but should see a small improvement in compile time with this change. The "clear next_use when !can_combine_p" change is to fix a non-bug. Given 1) insn defining rn ... 2) insn defining rn but !can_combine_p ... 3) insn using rn then create_log_links might create a link from (3) to (1), I thought. However, can_combine_p doesn't currently allow this to happen. Obviously, any can_combine_p result depending on regno shouldn't give a different result at (1) and (2), but there is also at test of DF_REF_PRE_POST_MODIFY that can. The saving grace is that pre/post modify insns also use the register, which means next_use[rn] will point at (2), not (3), when (1) is processed. I came across this because at one stage I considered modifying can_combine_p. Someone who does so in the future might trigger the above problem, so I thought it worth posting the change. OK for mainline? * combine.c (set_return_regs): New function. (twiddle_first_block, twiddle_last_block): New functions. (create_log_links): Exclude instructions copying parameter values from hard regs to pseudos, and instructions copying return value pseudos to hard regs. Clear next_use when !can_combine_p. Index: gcc/combine.c =================================================================== --- gcc/combine.c (revision 223463) +++ gcc/combine.c (working copy) @@ -1048,9 +1048,79 @@ can_combine_use_p (df_ref use) return true; } -/* Fill in log links field for all insns. */ +/* Used to build set of return value regs. Add X to the set. */ static void +set_return_regs (rtx x, void *arg) +{ + HARD_REG_SET *regs = (HARD_REG_SET *) arg; + + add_to_hard_reg_set (regs, GET_MODE (x), REGNO (x)); +} + +/* Twiddle BLOCK_FOR_INSN to TO for instructions in the first block BB + that we don't want to combine with other instructions. */ + +static void +twiddle_first_block (basic_block bb, basic_block to) +{ + rtx_insn *insn; + + FOR_BB_INSNS (bb, insn) + { + if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_FUNCTION_BEG) + break; + if (!NONDEBUG_INSN_P (insn)) + continue; + + /* reg,reg copies from parameter hard regs. */ + rtx set = single_set (insn); + if (set + && REG_P (SET_DEST (set)) + && REG_P (SET_SRC (set)) + && HARD_REGISTER_P (SET_SRC (set))) + set_block_for_insn (insn, to); + } +} + +/* Twiddle BLOCK_FOR_INSN to TO for instructions in the last block BB + that we don't want to combine with other instructions. */ + +static void +twiddle_last_block (basic_block bb, basic_block to, HARD_REG_SET return_regs) +{ + rtx_insn *insn; + + FOR_BB_INSNS_REVERSE (bb, insn) + { + if (CALL_P (insn)) + break; + if (!NONDEBUG_INSN_P (insn)) + continue; + + rtx reg = NULL_RTX; + /* use_return_regster added USEs. */ + if (GET_CODE (PATTERN (insn)) == USE) + reg = XEXP (PATTERN (insn), 0); + else + { + /* reg,reg copies that set return value hard regs. */ + rtx set = single_set (insn); + if (set && REG_P (SET_SRC (set))) + reg = SET_DEST (set); + } + if (reg + && REG_P (reg) + && HARD_REGISTER_P (reg) + && overlaps_hard_reg_set_p (return_regs, + GET_MODE (reg), REGNO (reg))) + set_block_for_insn (insn, to); + } +} + +/* Fill in log links field for all insns that we wish to combine. */ + +static void create_log_links (void) { basic_block bb; @@ -1057,9 +1127,28 @@ create_log_links (void) rtx_insn **next_use; rtx_insn *insn; df_ref def, use; + HARD_REG_SET return_regs; next_use = XCNEWVEC (rtx_insn *, max_reg_num ()); + /* Don't combine instructions copying parameter values from hard + regs to pseudos. Exclude such instructions from LOG_LINKS by + temporarily zapping BLOCK_FOR_INSN. */ + + bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb; + twiddle_first_block (bb, 0); + + /* Similarly, don't combine instructions copying return values + from pseudos to hard regs. */ + + CLEAR_HARD_REG_SET (return_regs); + diddle_return_value (set_return_regs, &return_regs); + if (!hard_reg_set_empty_p (return_regs)) + { + bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb; + twiddle_last_block (bb, 0, return_regs); + } + /* Pass through each block from the end, recording the uses of each register and establishing log links when def is encountered. Note that we do not clear next_use array in order to save time, @@ -1087,12 +1176,12 @@ create_log_links (void) if (!next_use[regno]) continue; + use_insn = next_use[regno]; + next_use[regno] = NULL; + if (!can_combine_def_p (def)) continue; - use_insn = next_use[regno]; - next_use[regno] = NULL; - if (BLOCK_FOR_INSN (use_insn) != bb) continue; @@ -1103,7 +1192,7 @@ create_log_links (void) we might wind up changing the semantics of the insn, even if reload can make what appear to be valid assignments later. */ - if (regno < FIRST_PSEUDO_REGISTER + if (HARD_REGISTER_NUM_P (regno) && asm_noperands (PATTERN (use_insn)) >= 0) continue; @@ -1124,6 +1213,16 @@ create_log_links (void) } } + /* Repair BLOCK_FOR_INSN. */ + + bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb; + twiddle_first_block (bb, bb); + if (!hard_reg_set_empty_p (return_regs)) + { + bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb; + twiddle_last_block (bb, bb, return_regs); + } + free (next_use); } -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Don't combine param and return value copies 2015-05-23 11:20 [PATCH] Don't combine param and return value copies Alan Modra @ 2015-05-23 11:20 ` Andrew Pinski 2015-05-23 19:40 ` Segher Boessenkool 1 sibling, 0 replies; 8+ messages in thread From: Andrew Pinski @ 2015-05-23 11:20 UTC (permalink / raw) To: GCC Patches, Segher Boessenkool On Sat, May 23, 2015 at 1:33 AM, Alan Modra <amodra@gmail.com> wrote: > This stops combine messing with parameter and return value copies > from/to hard registers. Bootstrapped and regression tested > powerpc64le-linux, powerpc64-linux and x86_64-linux. In looking at a > number of different powerpc64le gcc/*.o files, I noticed a few code > generation improvements. There were cases where a register copy was > no longer needed, cmp used in place of mr., and rlwinm instead of > rldicl. x86_64 gcc/*.o showed no changes (apart from combine.o of > course), but should see a small improvement in compile time with this > change. I noticed this a while back but I never got around to fixing this. This should also improve other RISC targets like AARCH64 and MIPS where I had saw issues like this too. Thanks, Andrew > > The "clear next_use when !can_combine_p" change is to fix a non-bug. > Given > > 1) insn defining rn > ... > 2) insn defining rn but !can_combine_p > ... > 3) insn using rn > > then create_log_links might create a link from (3) to (1), I thought. > However, can_combine_p doesn't currently allow this to happen. > Obviously, any can_combine_p result depending on regno shouldn't give > a different result at (1) and (2), but there is also at test of > DF_REF_PRE_POST_MODIFY that can. The saving grace is that pre/post > modify insns also use the register, which means next_use[rn] will > point at (2), not (3), when (1) is processed. > > I came across this because at one stage I considered modifying > can_combine_p. Someone who does so in the future might trigger the > above problem, so I thought it worth posting the change. > > OK for mainline? > > * combine.c (set_return_regs): New function. > (twiddle_first_block, twiddle_last_block): New functions. > (create_log_links): Exclude instructions copying parameter > values from hard regs to pseudos, and instructions copying > return value pseudos to hard regs. Clear next_use when > !can_combine_p. > > Index: gcc/combine.c > =================================================================== > --- gcc/combine.c (revision 223463) > +++ gcc/combine.c (working copy) > @@ -1048,9 +1048,79 @@ can_combine_use_p (df_ref use) > return true; > } > > -/* Fill in log links field for all insns. */ > +/* Used to build set of return value regs. Add X to the set. */ > > static void > +set_return_regs (rtx x, void *arg) > +{ > + HARD_REG_SET *regs = (HARD_REG_SET *) arg; > + > + add_to_hard_reg_set (regs, GET_MODE (x), REGNO (x)); > +} > + > +/* Twiddle BLOCK_FOR_INSN to TO for instructions in the first block BB > + that we don't want to combine with other instructions. */ > + > +static void > +twiddle_first_block (basic_block bb, basic_block to) > +{ > + rtx_insn *insn; > + > + FOR_BB_INSNS (bb, insn) > + { > + if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_FUNCTION_BEG) > + break; > + if (!NONDEBUG_INSN_P (insn)) > + continue; > + > + /* reg,reg copies from parameter hard regs. */ > + rtx set = single_set (insn); > + if (set > + && REG_P (SET_DEST (set)) > + && REG_P (SET_SRC (set)) > + && HARD_REGISTER_P (SET_SRC (set))) > + set_block_for_insn (insn, to); > + } > +} > + > +/* Twiddle BLOCK_FOR_INSN to TO for instructions in the last block BB > + that we don't want to combine with other instructions. */ > + > +static void > +twiddle_last_block (basic_block bb, basic_block to, HARD_REG_SET return_regs) > +{ > + rtx_insn *insn; > + > + FOR_BB_INSNS_REVERSE (bb, insn) > + { > + if (CALL_P (insn)) > + break; > + if (!NONDEBUG_INSN_P (insn)) > + continue; > + > + rtx reg = NULL_RTX; > + /* use_return_regster added USEs. */ > + if (GET_CODE (PATTERN (insn)) == USE) > + reg = XEXP (PATTERN (insn), 0); > + else > + { > + /* reg,reg copies that set return value hard regs. */ > + rtx set = single_set (insn); > + if (set && REG_P (SET_SRC (set))) > + reg = SET_DEST (set); > + } > + if (reg > + && REG_P (reg) > + && HARD_REGISTER_P (reg) > + && overlaps_hard_reg_set_p (return_regs, > + GET_MODE (reg), REGNO (reg))) > + set_block_for_insn (insn, to); > + } > +} > + > +/* Fill in log links field for all insns that we wish to combine. */ > + > +static void > create_log_links (void) > { > basic_block bb; > @@ -1057,9 +1127,28 @@ create_log_links (void) > rtx_insn **next_use; > rtx_insn *insn; > df_ref def, use; > + HARD_REG_SET return_regs; > > next_use = XCNEWVEC (rtx_insn *, max_reg_num ()); > > + /* Don't combine instructions copying parameter values from hard > + regs to pseudos. Exclude such instructions from LOG_LINKS by > + temporarily zapping BLOCK_FOR_INSN. */ > + > + bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb; > + twiddle_first_block (bb, 0); > + > + /* Similarly, don't combine instructions copying return values > + from pseudos to hard regs. */ > + > + CLEAR_HARD_REG_SET (return_regs); > + diddle_return_value (set_return_regs, &return_regs); > + if (!hard_reg_set_empty_p (return_regs)) > + { > + bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb; > + twiddle_last_block (bb, 0, return_regs); > + } > + > /* Pass through each block from the end, recording the uses of each > register and establishing log links when def is encountered. > Note that we do not clear next_use array in order to save time, > @@ -1087,12 +1176,12 @@ create_log_links (void) > if (!next_use[regno]) > continue; > > + use_insn = next_use[regno]; > + next_use[regno] = NULL; > + > if (!can_combine_def_p (def)) > continue; > > - use_insn = next_use[regno]; > - next_use[regno] = NULL; > - > if (BLOCK_FOR_INSN (use_insn) != bb) > continue; > > @@ -1103,7 +1192,7 @@ create_log_links (void) > we might wind up changing the semantics of the insn, > even if reload can make what appear to be valid > assignments later. */ > - if (regno < FIRST_PSEUDO_REGISTER > + if (HARD_REGISTER_NUM_P (regno) > && asm_noperands (PATTERN (use_insn)) >= 0) > continue; > > @@ -1124,6 +1213,16 @@ create_log_links (void) > } > } > > + /* Repair BLOCK_FOR_INSN. */ > + > + bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb; > + twiddle_first_block (bb, bb); > + if (!hard_reg_set_empty_p (return_regs)) > + { > + bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb; > + twiddle_last_block (bb, bb, return_regs); > + } > + > free (next_use); > } > > > -- > Alan Modra > Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Don't combine param and return value copies 2015-05-23 11:20 [PATCH] Don't combine param and return value copies Alan Modra 2015-05-23 11:20 ` Andrew Pinski @ 2015-05-23 19:40 ` Segher Boessenkool 2015-05-25 5:24 ` Alan Modra 1 sibling, 1 reply; 8+ messages in thread From: Segher Boessenkool @ 2015-05-23 19:40 UTC (permalink / raw) To: gcc-patches Hi Alan, On Sat, May 23, 2015 at 06:03:05PM +0930, Alan Modra wrote: > This stops combine messing with parameter and return value copies > from/to hard registers. Bootstrapped and regression tested > powerpc64le-linux, powerpc64-linux and x86_64-linux. In looking at a > number of different powerpc64le gcc/*.o files, I noticed a few code > generation improvements. There were cases where a register copy was > no longer needed, cmp used in place of mr., and rlwinm instead of > rldicl. x86_64 gcc/*.o showed no changes (apart from combine.o of > course), but should see a small improvement in compile time with this > change. Thanks. Did you see improvements around return as well, or mostly / only related to the function start? > The "clear next_use when !can_combine_p" change is to fix a non-bug. > Given > > 1) insn defining rn > ... > 2) insn defining rn but !can_combine_p > ... > 3) insn using rn > > then create_log_links might create a link from (3) to (1), I thought. > However, can_combine_p doesn't currently allow this to happen. > Obviously, any can_combine_p result depending on regno shouldn't give > a different result at (1) and (2), but there is also at test of > DF_REF_PRE_POST_MODIFY that can. The saving grace is that pre/post > modify insns also use the register, which means next_use[rn] will > point at (2), not (3), when (1) is processed. > > I came across this because at one stage I considered modifying > can_combine_p. Someone who does so in the future might trigger the > above problem, so I thought it worth posting the change. I agree it looks like a bug waiting to happen. Please post it as a separate patch though. > +/* Twiddle BLOCK_FOR_INSN to TO for instructions in the first block BB > + that we don't want to combine with other instructions. */ > + > +static void > +twiddle_first_block (basic_block bb, basic_block to) I don't like this much. Messing with global state makes things harder to change later. If it is hard to come up with a good name for a factor, it usually means it is not such a good factor. You can do these inside can_combine_{def,use}_p as far as I can see? Probably need to give those an extra parameter then: for _def, a bool that says "don't allow moves from hard regs", set when the scan has encountered a NOTE_INSN_FUNCTION_BEG in this bb; and for _use, a regset of those hard regs we don't want to allow moves to (those seen in USEs later in that same block). Or do it in the main loop itself, _{def,use} are each called in one place only; whatever works best. What makes return values different from CALL args here, btw? Why do you not want to combine return values, but you leave alone call args? > +/* Fill in log links field for all insns that we wish to combine. */ Please don't change this comment; it was more correct before. > @@ -1103,7 +1192,7 @@ create_log_links (void) > we might wind up changing the semantics of the insn, > even if reload can make what appear to be valid > assignments later. */ > - if (regno < FIRST_PSEUDO_REGISTER > + if (HARD_REGISTER_NUM_P (regno) > && asm_noperands (PATTERN (use_insn)) >= 0) > continue; An independent change. Segher ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Don't combine param and return value copies 2015-05-23 19:40 ` Segher Boessenkool @ 2015-05-25 5:24 ` Alan Modra 2015-05-25 7:30 ` Alan Modra 2015-05-25 20:19 ` Segher Boessenkool 0 siblings, 2 replies; 8+ messages in thread From: Alan Modra @ 2015-05-25 5:24 UTC (permalink / raw) To: Segher Boessenkool; +Cc: gcc-patches On Sat, May 23, 2015 at 08:45:23AM -0500, Segher Boessenkool wrote: > Thanks. Did you see improvements around return as well, or mostly / > only related to the function start? The rlwinm vs. rldicl change was in dwarf2out.c add_ranges_num for r3 one insn before a blr. I'm sure that was due to not combining return value copies. In fact, I'm sure all the improvements I saw were due to changing the exit.. See below. [snip] > I agree it looks like a bug waiting to happen. Please post it as a > separate patch though. OK. > > +/* Twiddle BLOCK_FOR_INSN to TO for instructions in the first block BB > > + that we don't want to combine with other instructions. */ > > + > > +static void > > +twiddle_first_block (basic_block bb, basic_block to) > > I don't like this much. Messing with global state makes things harder > to change later. If it is hard to come up with a good name for a > factor, it usually means it is not such a good factor. > > You can do these inside can_combine_{def,use}_p as far as I can see? > Probably need to give those an extra parameter then: for _def, a bool > that says "don't allow moves from hard regs", set when the scan has > encountered a NOTE_INSN_FUNCTION_BEG in this bb; and for _use, a regset > of those hard regs we don't want to allow moves to (those seen in USEs > later in that same block). Or do it in the main loop itself, _{def,use} > are each called in one place only; whatever works best. Huh, that's the way I started writing this patch.. The reason I went with modifying BLOCK_FOR_INSN is that the loop setting up LOG_LINKS needs to test that anyway. Changing code in the main loop means every insn in a function will need to be tested for additional conditions. I thought that might be slower. I can see your point though, especially if someone wanted to wean combine off LOG_LINKS. I'll rewrite the patch, which I realized was buggy anyway, and didn't prevent param copies from being combined. :-( > What makes return values different from CALL args here, btw? Why do > you not want to combine return values, but you leave alone call args? I don't think there is much difference between SETs for return values and SETs for call args. The reason I deliberately left them out is that in the original discussion we were talking about parameters and return values. So I thought I'd better restrict the change to just those SETs. It would be a much simpler patch to make combine ignore all SETs that are a reg,reg copy with one of them a hard reg. I was a little worried I'd regress some target if I tried that. (Results on powerpc64le-linux for such a change look good. A lot more cases where code is better, due to catching the parameter reg,reg copies. In looking at gcc/*.o I haven't yet seen any regressions in code quality.) > > +/* Fill in log links field for all insns that we wish to combine. */ > > Please don't change this comment; it was more correct before. But it wasn't correct! LOG_LINKS are not set up for *all* insns. Perhaps I should add "that we might wish to combine" rather than "that we wish to combine"? > > @@ -1103,7 +1192,7 @@ create_log_links (void) > > we might wind up changing the semantics of the insn, > > even if reload can make what appear to be valid > > assignments later. */ > > - if (regno < FIRST_PSEUDO_REGISTER > > + if (HARD_REGISTER_NUM_P (regno) > > && asm_noperands (PATTERN (use_insn)) >= 0) > > continue; > > An independent change. Yeah, I guess. I tidy these if I'm working on a piece of code. Here's the whole file done. Boostrapped and regression tested powerpc64le-linux and x86_64-linux. * combine.c: Use HARD_REGISTER_P and HARD_REGISTER_NUM_P as appropriate throughout file in place of comparing regno against FIRST_PSEUDO_REGISTER. Index: combine.c =================================================================== --- combine.c (revision 223463) +++ combine.c (working copy) @@ -1103,7 +1103,7 @@ create_log_links (void) we might wind up changing the semantics of the insn, even if reload can make what appear to be valid assignments later. */ - if (regno < FIRST_PSEUDO_REGISTER + if (HARD_REGISTER_NUM_P (regno) && asm_noperands (PATTERN (use_insn)) >= 0) continue; @@ -1730,7 +1730,7 @@ set_nonzero_bits_and_sign_copies (rtx x, const_rtx rtx_insn *insn = (rtx_insn *) data; if (REG_P (x) - && REGNO (x) >= FIRST_PSEUDO_REGISTER + && !HARD_REGISTER_P (x) /* If this register is undefined at the start of the file, we can't say what its contents were. */ && ! REGNO_REG_SET_P @@ -1897,7 +1897,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i && (REGNO (XEXP (i3elt, 0)) == regno ? reg_set_between_p (XEXP (elt, 0), PREV_INSN (insn), i3) - : regno >= FIRST_PSEUDO_REGISTER)) + : !HARD_REGISTER_NUM_P (regno))) return 0; } while (--i >= 0); @@ -1971,7 +1971,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i || (CALL_P (i3) && (find_reg_fusage (i3, USE, dest) || (REG_P (dest) - && REGNO (dest) < FIRST_PSEUDO_REGISTER + && HARD_REGISTER_P (dest) && global_regs[REGNO (dest)]))) /* Don't substitute into an incremented register. */ || FIND_REG_INC_NOTE (i3, dest) @@ -2021,7 +2021,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i register. */ if (REG_P (src) - && ((REGNO (dest) < FIRST_PSEUDO_REGISTER + && ((HARD_REGISTER_P (dest) && ! HARD_REGNO_MODE_OK (REGNO (dest), GET_MODE (dest))) /* Don't extend the life of a hard register unless it is user variable (if we have few registers) or it can't @@ -2030,7 +2030,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i Also avoid substituting a return register into I3, because reload can't handle a conflict with constraints of other inputs. */ - || (REGNO (src) < FIRST_PSEUDO_REGISTER + || (HARD_REGISTER_P (src) && ! HARD_REGNO_MODE_OK (REGNO (src), GET_MODE (src))))) return 0; } @@ -2052,7 +2052,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i we leave it up to the machine description to either accept or reject use-and-clobber patterns. */ if (!REG_P (reg) - || REGNO (reg) >= FIRST_PSEUDO_REGISTER + || !HARD_REGISTER_P (reg) || !fixed_regs[REGNO (reg)]) if (reg_overlap_mentioned_p (reg, src)) return 0; @@ -2075,7 +2075,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i to be an explicit register variable, and was chosen for a reason. */ if (GET_CODE (src) == ASM_OPERANDS - && REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER) + && REG_P (dest) && HARD_REGISTER_P (dest)) return 0; /* If INSN contains volatile references (specifically volatile MEMs), @@ -2221,7 +2221,7 @@ combinable_i3pat (rtx_insn *i3, rtx *loc, rtx i2de checks this; here, we do a more specific test for this case. */ || (REG_P (inner_dest) - && REGNO (inner_dest) < FIRST_PSEUDO_REGISTER + && HARD_REGISTER_P (inner_dest) && (! HARD_REGNO_MODE_OK (REGNO (inner_dest), GET_MODE (inner_dest)))) || (i1_not_in_src && reg_overlap_mentioned_p (i1dest, src)) @@ -2469,7 +2469,7 @@ can_change_dest_mode (rtx x, int added_sets, machi regno = REGNO (x); /* Allow hard registers if the new mode is legal, and occupies no more registers than the old mode. */ - if (regno < FIRST_PSEUDO_REGISTER) + if (HARD_REGISTER_NUM_P (regno)) return (HARD_REGNO_MODE_OK (regno, mode) && REG_NREGS (x) >= hard_regno_nregs[regno][mode]); @@ -2790,7 +2790,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn if (i1 == 0 && NONJUMP_INSN_P (i3) && GET_CODE (PATTERN (i3)) == SET && REG_P (SET_SRC (PATTERN (i3))) - && REGNO (SET_SRC (PATTERN (i3))) >= FIRST_PSEUDO_REGISTER + && !HARD_REGISTER_P (SET_SRC (PATTERN (i3))) && find_reg_note (i3, REG_DEAD, SET_SRC (PATTERN (i3))) && GET_CODE (PATTERN (i2)) == PARALLEL && ! side_effects_p (SET_DEST (PATTERN (i3))) @@ -3219,7 +3219,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn { unsigned int regno = REGNO (newpat_dest); compare_mode = new_mode; - if (regno < FIRST_PSEUDO_REGISTER) + if (HARD_REGISTER_NUM_P (regno)) newpat_dest = gen_rtx_REG (compare_mode, regno); else { @@ -3588,7 +3588,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn machine_mode old_mode = GET_MODE (i2dest); rtx ni2dest; - if (REGNO (i2dest) < FIRST_PSEUDO_REGISTER) + if (HARD_REGISTER_P (i2dest)) ni2dest = gen_rtx_REG (new_mode, REGNO (i2dest)); else { @@ -3604,7 +3604,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn m_split_insn = combine_split_insns (parallel, i3); if (m_split_insn == 0 - && REGNO (i2dest) >= FIRST_PSEUDO_REGISTER) + && !HARD_REGISTER_P (i2dest)) { struct undo *buf; @@ -3725,7 +3725,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn validated that we can do this. */ if (GET_MODE (i2dest) != split_mode && split_mode != VOIDmode) { - if (REGNO (i2dest) < FIRST_PSEUDO_REGISTER) + if (HARD_REGISTER_P (i2dest)) newdest = gen_rtx_REG (split_mode, REGNO (i2dest)); else { @@ -5390,7 +5390,7 @@ subst (rtx x, rtx from, rtx to, int in_dest, int i if (code == SUBREG && REG_P (to) - && REGNO (to) < FIRST_PSEUDO_REGISTER + && HARD_REGISTER_P (to) && simplify_subreg_regno (REGNO (to), GET_MODE (to), SUBREG_BYTE (x), GET_MODE (x)) < 0) @@ -6645,7 +6645,7 @@ simplify_set (rtx x) unsigned int regno = REGNO (dest); rtx new_dest; - if (regno < FIRST_PSEUDO_REGISTER) + if (HARD_REGISTER_NUM_P (regno)) new_dest = gen_rtx_REG (compare_mode, regno); else { @@ -6750,7 +6750,7 @@ simplify_set (rtx x) < GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))) #endif #ifdef CANNOT_CHANGE_MODE_CLASS - && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER + && ! (REG_P (dest) && HARD_REGISTER_P (dest) && REG_CANNOT_CHANGE_MODE_P (REGNO (dest), GET_MODE (SUBREG_REG (src)), GET_MODE (src))) @@ -9815,7 +9815,7 @@ reg_nonzero_bits_for_combine (const_rtx x, machine && rsp->last_set_label < label_tick) || (rsp->last_set_label == label_tick && DF_INSN_LUID (rsp->last_set) < subst_low_luid) - || (REGNO (x) >= FIRST_PSEUDO_REGISTER + || (!HARD_REGISTER_P (x) && REGNO (x) < reg_n_sets_max && REG_N_SETS (REGNO (x)) == 1 && !REGNO_REG_SET_P @@ -9879,7 +9879,7 @@ reg_num_sign_bit_copies_for_combine (const_rtx x, && rsp->last_set_label < label_tick) || (rsp->last_set_label == label_tick && DF_INSN_LUID (rsp->last_set) < subst_low_luid) - || (REGNO (x) >= FIRST_PSEUDO_REGISTER + || (!HARD_REGISTER_P (x) && REGNO (x) < reg_n_sets_max && REG_N_SETS (REGNO (x)) == 1 && !REGNO_REG_SET_P @@ -12921,7 +12921,7 @@ record_truncated_value (rtx x) } /* ??? For hard-regs we now record everything. We might be able to optimize this using last_set_mode. */ - else if (REG_P (x) && REGNO (x) < FIRST_PSEUDO_REGISTER) + else if (REG_P (x) && HARD_REGISTER_P (x)) truncated_mode = GET_MODE (x); else return false; @@ -13012,7 +13012,7 @@ get_last_value_validate (rtx *loc, rtx_insn *insn, if (rsp->last_set_invalid /* If this is a pseudo-register that was only set once and not live at the beginning of the function, it is always valid. */ - || (! (regno >= FIRST_PSEUDO_REGISTER + || (! (!HARD_REGISTER_NUM_P (regno) && regno < reg_n_sets_max && REG_N_SETS (regno) == 1 && (!REGNO_REG_SET_P @@ -13129,7 +13129,7 @@ get_last_value (const_rtx x) if (value == 0 || (rsp->last_set_label < label_tick_ebb_start - && (regno < FIRST_PSEUDO_REGISTER + && (HARD_REGISTER_NUM_P (regno) || regno >= reg_n_sets_max || REG_N_SETS (regno) != 1 || REGNO_REG_SET_P @@ -13257,7 +13257,7 @@ reg_dead_at_p (rtx reg, rtx_insn *insn) /* Check that reg isn't mentioned in NEWPAT_USED_REGS. For fixed registers we allow the machine description to decide whether use-and-clobber patterns are OK. */ - if (reg_dead_regno < FIRST_PSEUDO_REGISTER) + if (HARD_REGISTER_NUM_P (reg_dead_regno)) { for (i = reg_dead_regno; i < reg_dead_endregno; i++) if (!fixed_regs[i] && TEST_HARD_REG_BIT (newpat_used_regs, i)) @@ -13331,7 +13331,7 @@ mark_used_regs_combine (rtx x) regno = REGNO (x); /* A hard reg in a wide mode may really be multiple registers. If so, mark all of them just like the first. */ - if (regno < FIRST_PSEUDO_REGISTER) + if (HARD_REGISTER_NUM_P (regno)) { /* None of this applies to the stack, frame or arg pointers. */ if (regno == STACK_POINTER_REGNUM @@ -13449,7 +13449,7 @@ move_deaths (rtx x, rtx maybe_kill_insn, int from_ including X. In that case, we must put REG_DEAD notes for the remaining registers in place of NOTE. */ - if (note != 0 && regno < FIRST_PSEUDO_REGISTER + if (note != 0 && HARD_REGISTER_NUM_P (regno) && (GET_MODE_SIZE (GET_MODE (XEXP (note, 0))) > GET_MODE_SIZE (GET_MODE (x)))) { @@ -13472,7 +13472,7 @@ move_deaths (rtx x, rtx maybe_kill_insn, int from_ || (note != 0 && (GET_MODE_SIZE (GET_MODE (XEXP (note, 0))) < GET_MODE_SIZE (GET_MODE (x))))) - && regno < FIRST_PSEUDO_REGISTER + && HARD_REGISTER_NUM_P (regno) && REG_NREGS (x) > 1) { unsigned int ourend = END_REGNO (x); @@ -13588,7 +13588,7 @@ reg_bitfield_target_p (rtx x, rtx body) return 0; tregno = REGNO (target), regno = REGNO (x); - if (tregno >= FIRST_PSEUDO_REGISTER || regno >= FIRST_PSEUDO_REGISTER) + if (!HARD_REGISTER_NUM_P (tregno) || !HARD_REGISTER_NUM_P (regno)) return target == x; endtregno = end_hard_regno (GET_MODE (target), tregno); @@ -13922,7 +13922,7 @@ distribute_notes (rtx notes, rtx_insn *from_insn, TEM_INSN is doing. If so, delete TEM_INSN. Otherwise, make this into a REG_UNUSED note instead. Don't delete sets to global register vars. */ - if ((REGNO (XEXP (note, 0)) >= FIRST_PSEUDO_REGISTER + if ((!HARD_REGISTER_P (XEXP (note, 0)) || !global_regs[REGNO (XEXP (note, 0))]) && reg_set_p (XEXP (note, 0), PATTERN (tem_insn))) { -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Don't combine param and return value copies 2015-05-25 5:24 ` Alan Modra @ 2015-05-25 7:30 ` Alan Modra 2015-05-25 20:19 ` Segher Boessenkool 1 sibling, 0 replies; 8+ messages in thread From: Alan Modra @ 2015-05-25 7:30 UTC (permalink / raw) To: Segher Boessenkool, gcc-patches On Mon, May 25, 2015 at 10:26:35AM +0930, Alan Modra wrote: > looking at gcc/*.o I haven't yet seen any regressions in code quality.) Well that didn't last very long. There are regressions, and just from looking at disassembled object files it would appear to be frame pointer related. So the simple patch won't work. -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Don't combine param and return value copies 2015-05-25 5:24 ` Alan Modra 2015-05-25 7:30 ` Alan Modra @ 2015-05-25 20:19 ` Segher Boessenkool 2015-05-26 8:15 ` Alan Modra 1 sibling, 1 reply; 8+ messages in thread From: Segher Boessenkool @ 2015-05-25 20:19 UTC (permalink / raw) To: gcc-patches On Mon, May 25, 2015 at 10:26:35AM +0930, Alan Modra wrote: > On Sat, May 23, 2015 at 08:45:23AM -0500, Segher Boessenkool wrote: > > Thanks. Did you see improvements around return as well, or mostly / > > only related to the function start? > > The rlwinm vs. rldicl change was in dwarf2out.c add_ranges_num for r3 > one insn before a blr. I'm sure that was due to not combining return > value copies. In fact, I'm sure all the improvements I saw were due > to changing the exit.. See below. > > > > +/* Twiddle BLOCK_FOR_INSN to TO for instructions in the first block BB > > > + that we don't want to combine with other instructions. */ > > > + > > > +static void > > > +twiddle_first_block (basic_block bb, basic_block to) > > > > I don't like this much. Messing with global state makes things harder > > to change later. If it is hard to come up with a good name for a > > factor, it usually means it is not such a good factor. > > > > You can do these inside can_combine_{def,use}_p as far as I can see? > > Probably need to give those an extra parameter then: for _def, a bool > > that says "don't allow moves from hard regs", set when the scan has > > encountered a NOTE_INSN_FUNCTION_BEG in this bb; and for _use, a regset > > of those hard regs we don't want to allow moves to (those seen in USEs > > later in that same block). Or do it in the main loop itself, _{def,use} > > are each called in one place only; whatever works best. > > Huh, that's the way I started writing this patch.. The reason I went > with modifying BLOCK_FOR_INSN is that the loop setting up LOG_LINKS > needs to test that anyway. Changing code in the main loop means > every insn in a function will need to be tested for additional > conditions. I thought that might be slower. I can see your point > though, especially if someone wanted to wean combine off LOG_LINKS. The setup of the LOG_LINKS is a simple fast linear loop, much less work than the rest of combine. So don't worry about performance too much :-) > > What makes return values different from CALL args here, btw? Why do > > you not want to combine return values, but you leave alone call args? > > I don't think there is much difference between SETs for return values > and SETs for call args. The reason I deliberately left them out is > that in the original discussion we were talking about parameters and > return values. So I thought I'd better restrict the change to just > those SETs. > > It would be a much simpler patch to make combine ignore all SETs > that are a reg,reg copy with one of them a hard reg. I was a little > worried I'd regress some target if I tried that. _All_ moves to/from hard regs includes much more (register asm, fixed registers, maybe some targets do weird things as well?). So yes I share those worries. Since we do not want any other pass (before RA) to foul up these register moves either, maybe a better solution would be to mark them some way at expand time? > (Results on > powerpc64le-linux for such a change look good. A lot more cases where > code is better, due to catching the parameter reg,reg copies. In > looking at gcc/*.o I haven't yet seen any regressions in code quality.) That is good news :-) > > > +/* Fill in log links field for all insns that we wish to combine. */ > > > > Please don't change this comment; it was more correct before. > > But it wasn't correct! LOG_LINKS are not set up for *all* insns. It "sets" all LOG_LINKS to some value ("sets", it doesn't actually zero them at the start, it has an assert for that though). > > > @@ -1103,7 +1192,7 @@ create_log_links (void) > > > we might wind up changing the semantics of the insn, > > > even if reload can make what appear to be valid > > > assignments later. */ > > > - if (regno < FIRST_PSEUDO_REGISTER > > > + if (HARD_REGISTER_NUM_P (regno) > > > && asm_noperands (PATTERN (use_insn)) >= 0) > > > continue; > > > > An independent change. > > Yeah, I guess. I tidy these if I'm working on a piece of code. Pretty far away in this case ;-) > Here's the whole file done. Boostrapped and regression tested > powerpc64le-linux and x86_64-linux. > > * combine.c: Use HARD_REGISTER_P and HARD_REGISTER_NUM_P as > appropriate throughout file in place of comparing regno > against FIRST_PSEUDO_REGISTER. Have we decided we want to convert the whole compiler to this? It is a pretty lousy interface IMO: HARD_REGISTER_P does not check if its arg is a register; it does not check if its arg is a hard register either (it checks if it is not a pseudo); it cannot be used in all places in all passes because of that (which means, not in all macros and hooks either). Segher ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Don't combine param and return value copies 2015-05-25 20:19 ` Segher Boessenkool @ 2015-05-26 8:15 ` Alan Modra 2015-05-28 15:33 ` Segher Boessenkool 0 siblings, 1 reply; 8+ messages in thread From: Alan Modra @ 2015-05-26 8:15 UTC (permalink / raw) To: Segher Boessenkool; +Cc: gcc-patches On powerpc64le, modifying the way combine treats function parameters and call arguments results in some regressions. For instance, this testcase from varasm.c extern int foo3 (void *, ...); extern void foo4 (void *, const char *); int emit_tls_common (void *decl, const char *name, unsigned long size) { foo3 (0, "\t%s\t", ".."); foo4 (0, name); foo3 (0, ",%lu,%u\n", size, ((unsigned int *)decl)[88] / 8); return 1; } at -O2 produces for the prologue and first call old new mflr 0 mflr 0 std 29,-24(1) std 29,-24(1) std 30,-16(1) std 30,-16(1) mr 29,4 addis 9,2,.LC0@toc@ha std 31,-8(1) std 31,-8(1) addis 4,2,.LC1@toc@ha addis 10,2,.LC1@toc@ha mr 31,5 addi 9,9,.LC0@toc@l addis 5,2,.LC0@toc@ha addi 10,10,.LC1@toc@l mr 30,3 mr 30,3 addi 5,5,.LC0@toc@l mr 29,4 addi 4,4,.LC1@toc@l mr 31,5 li 3,0 mr 4,10 std 0,16(1) mr 5,9 stdu 1,-128(1) std 0,16(1) bl foo3 stdu 1,-128(1) nop li 3,0 bl foo3 nop As you can see, we have some extra register shuffling from keeping a pseudo for arg setup insns. I guess the pseudos allow sched more freedom to mess around.. On the positive side, I saw cases where keeping parameter pseudos allowed shrink-wrap to occur. varasm.c:decode_reg_name_and_count is one of them. More shrink-wrapping is a big win. Here's a case where changes at the return result in poorer code int decl_readonly_section_1 (int category) { switch (category) { case 1: case 2: case 3: case 4: case 5: return 1; default: return 0; } } old new addi 9,3,-6 addi 9,3,-6 neg 3,3 neg 3,3 and 3,9,3 and 3,9,3 rldicl 3,3,33,63 srwi 3,3,31 blr rldicl 3,3,0,32 blr Previously this: (insn 35 34 36 2 (set (reg:SI 161) (lshiftrt:SI (reg:SI 164) (const_int 31 [0x1f]))) {lshrsi3}) (insn 36 35 23 2 (set (reg:DI 155 [ D.2441 ]) (zero_extend:DI (reg:SI 161))) {zero_extendsidi2}) (insn 23 36 24 2 (set (reg/i:DI 3 3) (reg:DI 155 [ D.2441 ])) {*movdi_internal64}) is first combined to (insn 35 34 36 2 (set (reg:SI 161) (lshiftrt:SI (reg:SI 164) (const_int 31 [0x1f]))) {lshrsi3}) (insn 23 35 24 2 (set (reg/i:DI 3 3) (and:DI (subreg:DI (reg:SI 161) 0) (const_int 1 [0x1])))) which is somewhat surprising, but from my previous forays into combine.c I'd say happens due to known zero bits. (Just looking at dumps here, rather than in gdb.) Then the above is further combined to (insn 23 34 24 2 (set (reg/i:DI 3 3) (zero_extract:DI (subreg:DI (reg:SI 164) 0) (const_int 1 [0x1]) (const_int 32 [0x20]))) Looks to me like a missed optimization opportunity that insns 35 and 36 aren't combined without first going through the intermediate step. Anyway, here's the rewritten patch. I've left in some knobs I used when testing in case you want to see for yourself what happens with various options. Bootstrapped etc. powerpc64le-linux and x86_64-linux. One testsuite regression appears on powerpc64 which should go away if I push on getting an old patch of mine committed https://gcc.gnu.org/ml/gcc-patches/2013-08/msg00971.html * combine.c (set_return_regs): New function. (create_log_links): Exclude instructions copying parameter values from hard regs to pseudos, and instructions copying call argument and return value pseudos to hard regs. Index: gcc/combine.c =================================================================== --- gcc/combine.c (revision 223463) +++ gcc/combine.c (working copy) @@ -1048,6 +1048,19 @@ can_combine_use_p (df_ref use) return true; } +/* Used to build set of return value regs. Add X to the set. */ + +static void +set_return_regs (rtx x, void *arg) +{ + HARD_REG_SET *regs = (HARD_REG_SET *) arg; + + add_to_hard_reg_set (regs, GET_MODE (x), REGNO (x)); +} + +#define DONT_COMBINE_PARAMS 1 +#define DONT_COMBINE_CALL_ARGS 1 + /* Fill in log links field for all insns. */ static void @@ -1057,9 +1070,13 @@ create_log_links (void) rtx_insn **next_use; rtx_insn *insn; df_ref def, use; + HARD_REG_SET hard_regs; next_use = XCNEWVEC (rtx_insn *, max_reg_num ()); + CLEAR_HARD_REG_SET (hard_regs); + diddle_return_value (set_return_regs, &hard_regs); + /* Pass through each block from the end, recording the uses of each register and establishing log links when def is encountered. Note that we do not clear next_use array in order to save time, @@ -1069,10 +1086,37 @@ create_log_links (void) usage -- these are taken from original flow.c did. Don't ask me why it is done this way; I don't know and if it works, I don't want to know. */ - FOR_EACH_BB_FN (bb, cfun) + bool in_parameters = false; + FOR_EACH_BB_REVERSE_FN (bb, cfun) { FOR_BB_INSNS_REVERSE (bb, insn) { + if (DONT_COMBINE_PARAMS + && NOTE_P (insn) + && NOTE_KIND (insn) == NOTE_INSN_FUNCTION_BEG) + { + in_parameters = true; + continue; + } + + if (CALL_P (insn)) + { + /* Add function args passed in hard regs to HARD_REGS. */ + CLEAR_HARD_REG_SET (hard_regs); + if (DONT_COMBINE_CALL_ARGS) + for (rtx link = CALL_INSN_FUNCTION_USAGE (insn); + link; + link = XEXP (link, 1)) + if (GET_CODE (XEXP (link, 0)) == USE) + { + rtx reg = XEXP (XEXP (link, 0), 0); + if (REG_P (reg) + && HARD_REGISTER_P (reg)) + add_to_hard_reg_set (&hard_regs, + GET_MODE (reg), REGNO (reg)); + } + } + if (!NONDEBUG_INSN_P (insn)) continue; @@ -1079,13 +1123,27 @@ create_log_links (void) /* Log links are created only once. */ gcc_assert (!LOG_LINKS (insn)); + /* Exclude certain reg,reg copies when one is a hard reg. + Leaving these insns alone provides the register allocator + useful information. */ + rtx reg_reg = 0; + int hard = 0; + if (GET_CODE (PATTERN (insn)) == SET) + { + reg_reg = PATTERN (insn); + if (REG_P (SET_DEST (reg_reg)) + && REG_P (SET_SRC (reg_reg))) + hard = (HARD_REGISTER_P (SET_DEST (reg_reg)) + - HARD_REGISTER_P (SET_SRC (reg_reg))); + } + FOR_EACH_INSN_DEF (def, insn) - { - unsigned int regno = DF_REF_REGNO (def); - rtx_insn *use_insn; + { + unsigned int regno = DF_REF_REGNO (def); + rtx_insn *use_insn; - if (!next_use[regno]) - continue; + if (!next_use[regno]) + continue; if (!can_combine_def_p (def)) continue; @@ -1096,6 +1154,14 @@ create_log_links (void) if (BLOCK_FOR_INSN (use_insn) != bb) continue; + if (in_parameters && hard < 0) + { + /* This is a reg,reg copy from a hard reg to a pseudo, + such as those copying parameter registers to + pseudos. Don't set up LOG_LINKS to this insn. */ + continue; + } + /* flow.c claimed: We don't build a LOG_LINK for hard registers contained @@ -1110,18 +1176,36 @@ create_log_links (void) /* Don't add duplicate links between instructions. */ struct insn_link *links; FOR_EACH_LOG_LINK (links, use_insn) - if (insn == links->insn && regno == links->regno) + if (insn == links->insn && regno == links->regno) break; if (!links) LOG_LINKS (use_insn) = alloc_insn_link (insn, regno, LOG_LINKS (use_insn)); - } + } FOR_EACH_INSN_USE (use, insn) - if (can_combine_use_p (use)) - next_use[DF_REF_REGNO (use)] = insn; + { + if (hard > 0 + && overlaps_hard_reg_set_p (hard_regs, + GET_MODE (SET_DEST (reg_reg)), + REGNO (SET_DEST (reg_reg)))) + { + /* This is a reg,reg copy to a hard register, such + as those setting the function return value, or + setting up arguments for a function call. Don't + provide LOG_LINKS from this insn. This prevents + the insn defining the pseudo from being combined + into the reg,reg copy insn. */ + remove_from_hard_reg_set (&hard_regs, + GET_MODE (SET_DEST (reg_reg)), + REGNO (SET_DEST (reg_reg))); + } + else if (can_combine_use_p (use)) + next_use[DF_REF_REGNO (use)] = insn; + } } + CLEAR_HARD_REG_SET (hard_regs); } free (next_use); -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Don't combine param and return value copies 2015-05-26 8:15 ` Alan Modra @ 2015-05-28 15:33 ` Segher Boessenkool 0 siblings, 0 replies; 8+ messages in thread From: Segher Boessenkool @ 2015-05-28 15:33 UTC (permalink / raw) To: gcc-patches On Tue, May 26, 2015 at 04:37:46PM +0930, Alan Modra wrote: > On powerpc64le, modifying the way combine treats function parameters > and call arguments results in some regressions. > > For instance, this testcase from varasm.c > > extern int foo3 (void *, ...); > extern void foo4 (void *, const char *); > int > emit_tls_common (void *decl, > const char *name, > unsigned long size) > { > foo3 (0, "\t%s\t", ".."); > foo4 (0, name); > foo3 (0, ",%lu,%u\n", size, ((unsigned int *)decl)[88] / 8); > return 1; > } > > at -O2 produces for the prologue and first call > > old new > mflr 0 mflr 0 > std 29,-24(1) std 29,-24(1) > std 30,-16(1) std 30,-16(1) > mr 29,4 addis 9,2,.LC0@toc@ha > std 31,-8(1) std 31,-8(1) > addis 4,2,.LC1@toc@ha addis 10,2,.LC1@toc@ha > mr 31,5 addi 9,9,.LC0@toc@l > addis 5,2,.LC0@toc@ha addi 10,10,.LC1@toc@l > mr 30,3 mr 30,3 > addi 5,5,.LC0@toc@l mr 29,4 > addi 4,4,.LC1@toc@l mr 31,5 > li 3,0 mr 4,10 > std 0,16(1) mr 5,9 > stdu 1,-128(1) std 0,16(1) > bl foo3 stdu 1,-128(1) > nop li 3,0 > bl foo3 > nop > > As you can see, we have some extra register shuffling from keeping a > pseudo for arg setup insns. I guess the pseudos allow sched more > freedom to mess around.. ... and then RA isn't able to move things back. I see this happening with all three changes (return value, incoming args, outgoing args); the changes to combine give sched1 and RA more freedom, but those then end up generating lots of unnecessary register moves. > On the positive side, I saw cases where keeping parameter pseudos > allowed shrink-wrap to occur. varasm.c:decode_reg_name_and_count is > one of them. More shrink-wrapping is a big win. > > Here's a case where changes at the return result in poorer code > int > decl_readonly_section_1 (int category) > { > switch (category) > { > case 1: > case 2: > case 3: > case 4: > case 5: > return 1; > default: > return 0; > } > } > old new > addi 9,3,-6 addi 9,3,-6 > neg 3,3 neg 3,3 > and 3,9,3 and 3,9,3 > rldicl 3,3,33,63 srwi 3,3,31 > blr rldicl 3,3,0,32 > blr > > Previously this: > (insn 35 34 36 2 (set (reg:SI 161) > (lshiftrt:SI (reg:SI 164) > (const_int 31 [0x1f]))) {lshrsi3}) > (insn 36 35 23 2 (set (reg:DI 155 [ D.2441 ]) > (zero_extend:DI (reg:SI 161))) {zero_extendsidi2}) > (insn 23 36 24 2 (set (reg/i:DI 3 3) > (reg:DI 155 [ D.2441 ])) {*movdi_internal64}) > > is first combined to > (insn 35 34 36 2 (set (reg:SI 161) > (lshiftrt:SI (reg:SI 164) > (const_int 31 [0x1f]))) {lshrsi3}) > (insn 23 35 24 2 (set (reg/i:DI 3 3) > (and:DI (subreg:DI (reg:SI 161) 0) > (const_int 1 [0x1])))) > which is somewhat surprising, but from my previous forays into > combine.c I'd say happens due to known zero bits. (Just looking at > dumps here, rather than in gdb.) > > Then the above is further combined to > (insn 23 34 24 2 (set (reg/i:DI 3 3) > (zero_extract:DI (subreg:DI (reg:SI 164) 0) > (const_int 1 [0x1]) > (const_int 32 [0x20]))) > > Looks to me like a missed optimization opportunity that insns 35 and > 36 aren't combined without first going through the intermediate step. The rs6000 backend doesn't have zero_extend variants of many of its patterns, only some. Well-known problem, long-term project. > Anyway, here's the rewritten patch. I've left in some knobs I used > when testing in case you want to see for yourself what happens with > various options. Bootstrapped etc. powerpc64le-linux and > x86_64-linux. > +#define DONT_COMBINE_PARAMS 1 > +#define DONT_COMBINE_CALL_ARGS 1 I tested with all combinations of those knob settings, building Linux kernels (mostly defconfigs); these are the resulting text sizes: master alan00 alan10 alan01 alan11 5432728 5432728 5433848 5435472 5436080 alpha 3851131 3851391 3852495 3852567 3853755 arm 2190716 2190716 2190716 2190708 2190708 blackfin 2191439 2191503 2191983 2192335 2192751 c6x 2213186 2213250 2213154 2213482 2213546 cris 3322420 3322420 3322500 3322564 3322692 frv 10898664 10898664 10898664 10898664 10898664 i386 3253459 3253539 3253599 3255235 3255331 m32r 4708528 4708532 4709772 4708660 4709656 microblaze 3949689 3949745 3950269 3952401 3952857 mips 5693823 5693975 5694443 5697971 5698227 mips64 2374664 2374708 2375126 2375485 2375841 mn10300 7488219 7488419 7492299 7489743 7493431 parisc 6195727 6195935 6196367 6221647 6221687 parisc64 8688975 8689111 8691931 8692619 8695279 powerpc 20252077 20252337 20255185 20266897 20269477 powerpc64 11423734 11421858 11423946 11422726 11424838 s390 6488342 6488406 6488534 6489110 6489302 sh64 1545652 1545652 1545776 1545812 1545944 shnommu 3737005 3736973 3737357 3737585 3737945 sparc 6075342 6074426 6074762 6075570 6075834 sparc64 12301607 12301607 12301543 12301787 12301723 x86_64 1954029 1954061 1954441 1948841 1949305 xtensa As you see, almost everything regresses code size. s390 and sparc and xtensa like some of it; everything else generates more register moves. A typical one for powerpc, with the "00" setting: before after rlwinm 3,9,N,N,N rlwinm 9,9,N,N,N mr 3,9 (and then blr etc., no further uses of r3 or r9; nothing special about rlwinm here, there also are "and" and "addi" cases, etc.) I think we should at least ameliorate this regression before we can apply any variant of this :-( Segher ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2015-05-28 15:22 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-05-23 11:20 [PATCH] Don't combine param and return value copies Alan Modra 2015-05-23 11:20 ` Andrew Pinski 2015-05-23 19:40 ` Segher Boessenkool 2015-05-25 5:24 ` Alan Modra 2015-05-25 7:30 ` Alan Modra 2015-05-25 20:19 ` Segher Boessenkool 2015-05-26 8:15 ` Alan Modra 2015-05-28 15:33 ` Segher Boessenkool
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).