* [PR81611] improve auto-inc @ 2018-01-24 5:28 Alexandre Oliva 2018-01-24 14:40 ` Richard Biener 2018-02-14 5:06 ` Jeff Law 0 siblings, 2 replies; 7+ messages in thread From: Alexandre Oliva @ 2018-01-24 5:28 UTC (permalink / raw) To: gcc-patches These two patches fix PR81611. The first one improves forwprop so that we avoid adding SSA conflicting by forwpropping the iv increment, which may cause both the incremented and the original value to be live, even when the iv is copied between the PHI node and the increment. We already handled the case in which there aren't any such copies. Alas, this is not enough to address the problem on avr, even though it fixes it on e.g. ppc. The reason is that avr rejects var+offset addresses, and this prevents the memory access in a post-inc code sequence from being adjusted to an address that auto-inc-dec can recognize. The second patch adjusts auto-inc-dec to recognize a code sequence in which the original, unincremented pseudo is used in an address after it's incremented into another pseudo, and turn that into a post-inc address, leaving the copying for subsequent passes to eliminate. Regstrapped on x86_64-linux-gnu, i686-linux-gnu, ppc64-linux-gnu and aarch64-linux-gnu. Ok to install? I'd appreciate suggestions on how to turn the submitted testcase into a regression test; I suppose an avr-specific test that requires the auto-inc transformation is a possibility, but that feels a bit too limited, doesn't it? Thoughts? Thanks in advance, [PR81611] accept copies in simple_iv_increment_p If there are copies between the GIMPLE_PHI at the loop body and the increment that reaches it (presumably through a back edge), still regard it as a simple_iv_increment, so that we won't consider the value in the back edge eligible for forwprop. Doing so would risk making the phi node and the incremented conflicting value live within the loop, and the phi node to be preserved for propagated uses after the loop. for gcc/ChangeLog PR tree-optimization/81611 * tree-ssa-dom.c (simple_iv_increment_p): Skip intervening copies. --- gcc/tree-ssa-dom.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c index 2b371667253a..3c0ff9458342 100644 --- a/gcc/tree-ssa-dom.c +++ b/gcc/tree-ssa-dom.c @@ -1276,8 +1276,11 @@ record_equality (tree x, tree y, class const_and_copies *const_and_copies) /* Returns true when STMT is a simple iv increment. It detects the following situation: - i_1 = phi (..., i_2) - i_2 = i_1 +/- ... */ + i_1 = phi (..., i_k) + [...] + i_j = i_{j-1} for each j : 2 <= j <= k-1 + [...] + i_k = i_{k-1} +/- ... */ bool simple_iv_increment_p (gimple *stmt) @@ -1305,8 +1308,18 @@ simple_iv_increment_p (gimple *stmt) return false; phi = SSA_NAME_DEF_STMT (preinc); - if (gimple_code (phi) != GIMPLE_PHI) - return false; + while (gimple_code (phi) != GIMPLE_PHI) + { + /* Follow trivial copies, but not the DEF used in a back edge, + so that we don't prevent coalescing. */ + if (gimple_code (phi) != GIMPLE_ASSIGN + || gimple_assign_lhs (phi) != preinc + || !gimple_assign_ssa_name_copy_p (phi)) + return false; + + preinc = gimple_assign_rhs1 (phi); + phi = SSA_NAME_DEF_STMT (preinc); + } for (i = 0; i < gimple_phi_num_args (phi); i++) if (gimple_phi_arg_def (phi, i) == lhs) [PR81611] turn inc-and-use-of-dead-orig into auto-inc When the addressing modes available on the machine don't allow offsets in addresses, odds are that post-increments will be represented in trees and RTL as: y <= x + 1 ... *(x) ... x <= y so deal with this form so as to create auto-inc addresses that we'd otherwise miss. for gcc/ChangeLog PR rtl-optimization/81611 * auto-inc-dec.c (attempt_change): Move dead note from mem_insn if it's the next use of regno (find_address): Take address use of reg holding non-incremented value. (merge_in_block): Attempt to use a mem insn that is the next use of the original regno. --- gcc/auto-inc-dec.c | 46 ++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 44 insertions(+), 2 deletions(-) diff --git a/gcc/auto-inc-dec.c b/gcc/auto-inc-dec.c index d02fa9d081c7..4ffbcf56a456 100644 --- a/gcc/auto-inc-dec.c +++ b/gcc/auto-inc-dec.c @@ -508,7 +508,11 @@ attempt_change (rtx new_addr, rtx inc_reg) before the memory reference. */ gcc_assert (mov_insn); emit_insn_before (mov_insn, inc_insn.insn); - move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0); + regno = REGNO (inc_insn.reg0); + if (reg_next_use[regno] == mem_insn.insn) + move_dead_notes (mov_insn, mem_insn.insn, inc_insn.reg0); + else + move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0); regno = REGNO (inc_insn.reg_res); reg_next_def[regno] = mov_insn; @@ -842,7 +846,7 @@ find_address (rtx *address_of_x) if (code == MEM && rtx_equal_p (XEXP (x, 0), inc_insn.reg_res)) { - /* Match with *reg0. */ + /* Match with *reg_res. */ mem_insn.mem_loc = address_of_x; mem_insn.reg0 = inc_insn.reg_res; mem_insn.reg1_is_const = true; @@ -850,6 +854,17 @@ find_address (rtx *address_of_x) mem_insn.reg1 = GEN_INT (0); return -1; } + if (code == MEM && inc_insn.reg1_is_const && inc_insn.reg0 + && rtx_equal_p (XEXP (x, 0), inc_insn.reg0)) + { + /* Match with *reg0 AKA *(reg_res - reg1_val). */ + mem_insn.mem_loc = address_of_x; + mem_insn.reg0 = inc_insn.reg_res; + mem_insn.reg1_is_const = true; + mem_insn.reg1_val = -inc_insn.reg1_val; + mem_insn.reg1 = GEN_INT (mem_insn.reg1_val); + return -1; + } if (code == MEM && GET_CODE (XEXP (x, 0)) == PLUS && rtx_equal_p (XEXP (XEXP (x, 0), 0), inc_insn.reg_res)) { @@ -1370,6 +1385,33 @@ merge_in_block (int max_reg, basic_block bb) insn_is_add_or_inc = false; } } + + if (ok && insn_is_add_or_inc + && inc_insn.reg0 != inc_insn.reg_res) + { + regno = REGNO (inc_insn.reg0); + int luid = DF_INSN_LUID (mem_insn.insn); + mem_insn.insn = get_next_ref (regno, bb, reg_next_use); + + /* Try a mem use of reg0 before that of reg_res + too. If there's no further use of reg_res, + there's no point in trying an auto-inc, and + if the use of reg0 is after that of reg_res, + it will be too late for the auto-inc to + compute reg_res's correct value. */ + if (mem_insn.insn + && luid > DF_INSN_LUID (mem_insn.insn) + && find_address (&PATTERN (mem_insn.insn)) == -1) + { + if (dump_file) + dump_mem_insn (dump_file); + if (try_merge ()) + { + success_in_block++; + insn_is_add_or_inc = false; + } + } + } } } } -- Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PR81611] improve auto-inc 2018-01-24 5:28 [PR81611] improve auto-inc Alexandre Oliva @ 2018-01-24 14:40 ` Richard Biener 2018-01-25 20:32 ` Alexandre Oliva 2018-02-14 5:06 ` Jeff Law 1 sibling, 1 reply; 7+ messages in thread From: Richard Biener @ 2018-01-24 14:40 UTC (permalink / raw) To: Alexandre Oliva; +Cc: GCC Patches On Wed, Jan 24, 2018 at 4:42 AM, Alexandre Oliva <aoliva@redhat.com> wrote: > These two patches fix PR81611. > > The first one improves forwprop so that we avoid adding SSA conflicting > by forwpropping the iv increment, which may cause both the incremented > and the original value to be live, even when the iv is copied between > the PHI node and the increment. We already handled the case in which > there aren't any such copies. > > Alas, this is not enough to address the problem on avr, even though it > fixes it on e.g. ppc. The reason is that avr rejects var+offset > addresses, and this prevents the memory access in a post-inc code > sequence from being adjusted to an address that auto-inc-dec can > recognize. > > The second patch adjusts auto-inc-dec to recognize a code sequence in > which the original, unincremented pseudo is used in an address after > it's incremented into another pseudo, and turn that into a post-inc > address, leaving the copying for subsequent passes to eliminate. > > Regstrapped on x86_64-linux-gnu, i686-linux-gnu, ppc64-linux-gnu and > aarch64-linux-gnu. Ok to install? > > > I'd appreciate suggestions on how to turn the submitted testcase into a > regression test; I suppose an avr-specific test that requires the > auto-inc transformation is a possibility, but that feels a bit too > limited, doesn't it? Thoughts? Thanks in advance, > > > [PR81611] accept copies in simple_iv_increment_p > > If there are copies between the GIMPLE_PHI at the loop body and the > increment that reaches it (presumably through a back edge), still > regard it as a simple_iv_increment, so that we won't consider the > value in the back edge eligible for forwprop. Doing so would risk > making the phi node and the incremented conflicting value live > within the loop, and the phi node to be preserved for propagated > uses after the loop. > > for gcc/ChangeLog > > PR tree-optimization/81611 > * tree-ssa-dom.c (simple_iv_increment_p): Skip intervening > copies. > --- > gcc/tree-ssa-dom.c | 21 +++++++++++++++++---- > 1 file changed, 17 insertions(+), 4 deletions(-) > > diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c > index 2b371667253a..3c0ff9458342 100644 > --- a/gcc/tree-ssa-dom.c > +++ b/gcc/tree-ssa-dom.c > @@ -1276,8 +1276,11 @@ record_equality (tree x, tree y, class const_and_copies *const_and_copies) > /* Returns true when STMT is a simple iv increment. It detects the > following situation: > > - i_1 = phi (..., i_2) > - i_2 = i_1 +/- ... */ > + i_1 = phi (..., i_k) > + [...] > + i_j = i_{j-1} for each j : 2 <= j <= k-1 > + [...] > + i_k = i_{k-1} +/- ... */ > > bool > simple_iv_increment_p (gimple *stmt) > @@ -1305,8 +1308,18 @@ simple_iv_increment_p (gimple *stmt) > return false; > > phi = SSA_NAME_DEF_STMT (preinc); > - if (gimple_code (phi) != GIMPLE_PHI) > - return false; > + while (gimple_code (phi) != GIMPLE_PHI) > + { > + /* Follow trivial copies, but not the DEF used in a back edge, > + so that we don't prevent coalescing. */ > + if (gimple_code (phi) != GIMPLE_ASSIGN > + || gimple_assign_lhs (phi) != preinc > + || !gimple_assign_ssa_name_copy_p (phi)) given gimple_assign_ssa_name_copy checks it is an assign just do if (!gimple_assign_ssa-anme_Copy_p (phi)) the lhs != preinc check is always false given you got to phi via SSA_NAME_DEF_STMT of preinc. The simple_iv_increment_p change is ok with that change. The other change is RTL which I defer to somebody else. Richard. > + return false; > + > + preinc = gimple_assign_rhs1 (phi); > + phi = SSA_NAME_DEF_STMT (preinc); > + } > > for (i = 0; i < gimple_phi_num_args (phi); i++) > if (gimple_phi_arg_def (phi, i) == lhs) > > > [PR81611] turn inc-and-use-of-dead-orig into auto-inc > > When the addressing modes available on the machine don't allow offsets > in addresses, odds are that post-increments will be represented in > trees and RTL as: > > y <= x + 1 > ... *(x) ... > x <= y > > so deal with this form so as to create auto-inc addresses that we'd > otherwise miss. > > > for gcc/ChangeLog > > PR rtl-optimization/81611 > * auto-inc-dec.c (attempt_change): Move dead note from > mem_insn if it's the next use of regno > (find_address): Take address use of reg holding > non-incremented value. > (merge_in_block): Attempt to use a mem insn that is the next > use of the original regno. > --- > gcc/auto-inc-dec.c | 46 ++++++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 44 insertions(+), 2 deletions(-) > > diff --git a/gcc/auto-inc-dec.c b/gcc/auto-inc-dec.c > index d02fa9d081c7..4ffbcf56a456 100644 > --- a/gcc/auto-inc-dec.c > +++ b/gcc/auto-inc-dec.c > @@ -508,7 +508,11 @@ attempt_change (rtx new_addr, rtx inc_reg) > before the memory reference. */ > gcc_assert (mov_insn); > emit_insn_before (mov_insn, inc_insn.insn); > - move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0); > + regno = REGNO (inc_insn.reg0); > + if (reg_next_use[regno] == mem_insn.insn) > + move_dead_notes (mov_insn, mem_insn.insn, inc_insn.reg0); > + else > + move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0); > > regno = REGNO (inc_insn.reg_res); > reg_next_def[regno] = mov_insn; > @@ -842,7 +846,7 @@ find_address (rtx *address_of_x) > > if (code == MEM && rtx_equal_p (XEXP (x, 0), inc_insn.reg_res)) > { > - /* Match with *reg0. */ > + /* Match with *reg_res. */ > mem_insn.mem_loc = address_of_x; > mem_insn.reg0 = inc_insn.reg_res; > mem_insn.reg1_is_const = true; > @@ -850,6 +854,17 @@ find_address (rtx *address_of_x) > mem_insn.reg1 = GEN_INT (0); > return -1; > } > + if (code == MEM && inc_insn.reg1_is_const && inc_insn.reg0 > + && rtx_equal_p (XEXP (x, 0), inc_insn.reg0)) > + { > + /* Match with *reg0 AKA *(reg_res - reg1_val). */ > + mem_insn.mem_loc = address_of_x; > + mem_insn.reg0 = inc_insn.reg_res; > + mem_insn.reg1_is_const = true; > + mem_insn.reg1_val = -inc_insn.reg1_val; > + mem_insn.reg1 = GEN_INT (mem_insn.reg1_val); > + return -1; > + } > if (code == MEM && GET_CODE (XEXP (x, 0)) == PLUS > && rtx_equal_p (XEXP (XEXP (x, 0), 0), inc_insn.reg_res)) > { > @@ -1370,6 +1385,33 @@ merge_in_block (int max_reg, basic_block bb) > insn_is_add_or_inc = false; > } > } > + > + if (ok && insn_is_add_or_inc > + && inc_insn.reg0 != inc_insn.reg_res) > + { > + regno = REGNO (inc_insn.reg0); > + int luid = DF_INSN_LUID (mem_insn.insn); > + mem_insn.insn = get_next_ref (regno, bb, reg_next_use); > + > + /* Try a mem use of reg0 before that of reg_res > + too. If there's no further use of reg_res, > + there's no point in trying an auto-inc, and > + if the use of reg0 is after that of reg_res, > + it will be too late for the auto-inc to > + compute reg_res's correct value. */ > + if (mem_insn.insn > + && luid > DF_INSN_LUID (mem_insn.insn) > + && find_address (&PATTERN (mem_insn.insn)) == -1) > + { > + if (dump_file) > + dump_mem_insn (dump_file); > + if (try_merge ()) > + { > + success_in_block++; > + insn_is_add_or_inc = false; > + } > + } > + } > } > } > } > > > -- > Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ > You must be the change you wish to see in the world. -- Gandhi > Be Free! -- http://FSFLA.org/ FSF Latin America board member > Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PR81611] improve auto-inc 2018-01-24 14:40 ` Richard Biener @ 2018-01-25 20:32 ` Alexandre Oliva 2018-01-30 18:36 ` Alexandre Oliva 0 siblings, 1 reply; 7+ messages in thread From: Alexandre Oliva @ 2018-01-25 20:32 UTC (permalink / raw) To: Richard Biener; +Cc: GCC Patches On Jan 24, 2018, Richard Biener <richard.guenther@gmail.com> wrote: > given gimple_assign_ssa_name_copy checks it is an assign *nod* > the lhs != preinc check is always false given you got to phi via > SSA_NAME_DEF_STMT of preinc. *nod* > The simple_iv_increment_p change is ok with that change. Thanks, I'll retest with the simplified test (just in case; for I can't recall why I ended up with all those redundant conditions), repost and install. -- Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PR81611] improve auto-inc 2018-01-25 20:32 ` Alexandre Oliva @ 2018-01-30 18:36 ` Alexandre Oliva 0 siblings, 0 replies; 7+ messages in thread From: Alexandre Oliva @ 2018-01-30 18:36 UTC (permalink / raw) To: Richard Biener; +Cc: GCC Patches On Jan 25, 2018, Alexandre Oliva <aoliva@redhat.com> wrote: > Thanks, I'll retest with the simplified test (just in case; for I can't > recall why I ended up with all those redundant conditions), As suspected, removing the redundant tests didn't regress anything. I suppose they mattered in some earlier experimental version of the patch (I vaguely recall having a long sequence of tests within the loop at some point), and I just failed to further simplify the final form. Thanks for catching the further simplification opportunities! FTR, here's the patch I'm installing, while awaiting a review from someone else on the rtl auto-inc patch (the second and last patch in https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01994.html). [PR81611] accept copies in simple_iv_increment_p If there are copies between the GIMPLE_PHI at the loop body and the increment that reaches it (presumably through a back edge), still regard it as a simple_iv_increment, so that we won't consider the value in the back edge eligible for forwprop. Doing so would risk making the phi node and the incremented conflicting value live within the loop, and the phi node to be preserved for propagated uses after the loop. for gcc/ChangeLog PR tree-optimization/81611 * tree-ssa-dom.c (simple_iv_increment_p): Skip intervening copies. --- gcc/tree-ssa-dom.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c index 2b371667253a..a6f176c5def0 100644 --- a/gcc/tree-ssa-dom.c +++ b/gcc/tree-ssa-dom.c @@ -1276,8 +1276,11 @@ record_equality (tree x, tree y, class const_and_copies *const_and_copies) /* Returns true when STMT is a simple iv increment. It detects the following situation: - i_1 = phi (..., i_2) - i_2 = i_1 +/- ... */ + i_1 = phi (..., i_k) + [...] + i_j = i_{j-1} for each j : 2 <= j <= k-1 + [...] + i_k = i_{k-1} +/- ... */ bool simple_iv_increment_p (gimple *stmt) @@ -1305,8 +1308,15 @@ simple_iv_increment_p (gimple *stmt) return false; phi = SSA_NAME_DEF_STMT (preinc); - if (gimple_code (phi) != GIMPLE_PHI) - return false; + while (gimple_code (phi) != GIMPLE_PHI) + { + /* Follow trivial copies, but not the DEF used in a back edge, + so that we don't prevent coalescing. */ + if (!gimple_assign_ssa_name_copy_p (phi)) + return false; + preinc = gimple_assign_rhs1 (phi); + phi = SSA_NAME_DEF_STMT (preinc); + } for (i = 0; i < gimple_phi_num_args (phi); i++) if (gimple_phi_arg_def (phi, i) == lhs) -- Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PR81611] improve auto-inc 2018-01-24 5:28 [PR81611] improve auto-inc Alexandre Oliva 2018-01-24 14:40 ` Richard Biener @ 2018-02-14 5:06 ` Jeff Law 2018-02-27 23:18 ` Alexandre Oliva 1 sibling, 1 reply; 7+ messages in thread From: Jeff Law @ 2018-02-14 5:06 UTC (permalink / raw) To: Alexandre Oliva, gcc-patches On 01/23/2018 08:42 PM, Alexandre Oliva wrote: > These two patches fix PR81611. > > The first one improves forwprop so that we avoid adding SSA conflicting > by forwpropping the iv increment, which may cause both the incremented > and the original value to be live, even when the iv is copied between > the PHI node and the increment. We already handled the case in which > there aren't any such copies. > > Alas, this is not enough to address the problem on avr, even though it > fixes it on e.g. ppc. The reason is that avr rejects var+offset > addresses, and this prevents the memory access in a post-inc code > sequence from being adjusted to an address that auto-inc-dec can > recognize. > > The second patch adjusts auto-inc-dec to recognize a code sequence in > which the original, unincremented pseudo is used in an address after > it's incremented into another pseudo, and turn that into a post-inc > address, leaving the copying for subsequent passes to eliminate. > > Regstrapped on x86_64-linux-gnu, i686-linux-gnu, ppc64-linux-gnu and > aarch64-linux-gnu. Ok to install? > > > I'd appreciate suggestions on how to turn the submitted testcase into a > regression test; I suppose an avr-specific test that requires the > auto-inc transformation is a possibility, but that feels a bit too > limited, doesn't it? Thoughts? Thanks in advance, > > [ snip the DOM change which was approved and installed ] > [PR81611] turn inc-and-use-of-dead-orig into auto-inc > > When the addressing modes available on the machine don't allow offsets > in addresses, odds are that post-increments will be represented in > trees and RTL as: > > y <= x + 1 > ... *(x) ... > x <= y > > so deal with this form so as to create auto-inc addresses that we'd > otherwise miss. > > > for gcc/ChangeLog > > PR rtl-optimization/81611 > * auto-inc-dec.c (attempt_change): Move dead note from > mem_insn if it's the next use of regno > (find_address): Take address use of reg holding > non-incremented value. > (merge_in_block): Attempt to use a mem insn that is the next > use of the original regno. > --- > gcc/auto-inc-dec.c | 46 ++++++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 44 insertions(+), 2 deletions(-) > > diff --git a/gcc/auto-inc-dec.c b/gcc/auto-inc-dec.c > index d02fa9d081c7..4ffbcf56a456 100644 > --- a/gcc/auto-inc-dec.c > +++ b/gcc/auto-inc-dec.c > @@ -508,7 +508,11 @@ attempt_change (rtx new_addr, rtx inc_reg) > before the memory reference. */ > gcc_assert (mov_insn); > emit_insn_before (mov_insn, inc_insn.insn); > - move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0); > + regno = REGNO (inc_insn.reg0); > + if (reg_next_use[regno] == mem_insn.insn) > + move_dead_notes (mov_insn, mem_insn.insn, inc_insn.reg0); > + else > + move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0); > > regno = REGNO (inc_insn.reg_res); > reg_next_def[regno] = mov_insn; > @@ -1370,6 +1385,33 @@ merge_in_block (int max_reg, basic_block bb) > insn_is_add_or_inc = false; > } > } > + > + if (ok && insn_is_add_or_inc > + && inc_insn.reg0 != inc_insn.reg_res) > + { > + regno = REGNO (inc_insn.reg0); > + int luid = DF_INSN_LUID (mem_insn.insn); > + mem_insn.insn = get_next_ref (regno, bb, reg_next_use); So I think a comment is warranted right as we enter the TRUE arm. At that point INC_INSN is an inc/dec. But MEM_INSN is not necessarily a memory reference. It could be a memory reference, it could be a copy, it could be something completely different (it's just the next insn that references the result of the increment). In the case we care about we want it to be a copy of INC_INSN's REG_RES back to REG0. ISTM that verifying MEM_INSN is a reg->reg copy (reg_res -> reg0) before we call get_next_ref for reg0 is advisable and probably good from a compile-time standpoint by avoiding calls into find_address. After we call get_next_ref to get the next reference of the source of the increment, then we're hoping to find a memory reference that uses REG0. But it's not guaranteed it's a memory reference insn. I was having an awful time understanding how this code could work from the comments until I put it under a debugger and got a sense of the state as we entered that IF block. Then it was much clearer :-) I believe Georg had other testcases in subsequent comments in the BZ, but I don't believe they were flagged as regressions. So I think once you check in your patch we note in the BZ that the original testcase (and thus the regression) is fixed, but that the later tests are not. We then remove the regression marker. If Georg (or anyone else) does the analysis to show those later tests are also regressions, then we'll add the regression marker back. Seem reasonable? Jeff ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PR81611] improve auto-inc 2018-02-14 5:06 ` Jeff Law @ 2018-02-27 23:18 ` Alexandre Oliva 2018-02-28 2:39 ` Jeff Law 0 siblings, 1 reply; 7+ messages in thread From: Alexandre Oliva @ 2018-02-27 23:18 UTC (permalink / raw) To: Jeff Law; +Cc: gcc-patches On Feb 14, 2018, Jeff Law <law@redhat.com> wrote: >> + regno = REGNO (inc_insn.reg0); >> + int luid = DF_INSN_LUID (mem_insn.insn); >> + mem_insn.insn = get_next_ref (regno, bb, reg_next_use); > So I think a comment is warranted right as we enter the TRUE arm. > At that point INC_INSN is an inc/dec. But MEM_INSN is not necessarily a > memory reference. It could be a memory reference, it could be a copy, > it could be something completely different (it's just the next insn that > references the result of the increment). In the case we care about we > want it to be a copy of INC_INSN's REG_RES back to REG0. > ISTM that verifying MEM_INSN is a reg->reg copy (reg_res -> reg0) before > we call get_next_ref for reg0 is advisable and probably good from a > compile-time standpoint by avoiding calls into find_address. But we don't need it to be a copy. The transformation is just as legitimate if the regs go independent ways after that point. We have reg_res set to reg0+reg1, and then a use of reg0 in a MEM before any other use of reg_res. We turn that into a copy of reg0 to reg_res, and the MEM addr into a post_add of reg_res with reg1 (possibly a post_inc), so that the MEM dereferences reg_res while it's still equal to reg0, and after the MEM, reg_res becomes reg0+reg1, as it should for any subsequent uses, and reg0 is unmodified. Whether or not a subsequent copy from reg_res to reg0 is to be found won't make the transformation any more or less legitimate. > After we call get_next_ref to get the next reference of the source of > the increment, then we're hoping to find a memory reference that uses > REG0. But it's not guaranteed it's a memory reference insn. Yeah, find_address will determine if it contains any of the MEM patterns we might be interested in, but it could be anything whatsoever. The MEM pattern might appear virtually anywhere in the insn. > I was having an awful time understanding how this code could work from > the comments until I put it under a debugger and got a sense of the > state as we entered that IF block. Then it was much clearer :-) Sorry, I realize the comments were written based on a lot of context about the overall behavior of the pass, that I had learned while trying to figure it out. At the risk of making it redundant, I've expanded the comments, and added further tests that won't affect current behavior in any significant way, but that might speed things up a bit and will save us trouble should find_address be extended to catch additional patterns. > I believe Georg had other testcases in subsequent comments in the BZ, > but I don't believe they were flagged as regressions. However, with the testcases I realized the incremented register could still be live, even if we didn't find a subsequent use for it. Adjusting for that made those testcases use post_inc too. Here's the improved patch, regstrapped on aarch64-, ppc64-, and ppc64el-linux-gnu. Ok to install? [PR81611] turn inc-and-use-of-dead-orig into auto-inc When the addressing modes available on the machine don't allow offsets in addresses, odds are that post-increments will be represented in trees and RTL as: y <= x + 1 ... *(x) ... x <= y so deal with it by turning such RTL as: (set y (plus x n)) ... (mem x) ... without intervening uses of y into (set y x) ... (mem (post_add y n)) ... so as to create auto-inc addresses that we'd otherwise miss. for gcc/ChangeLog PR rtl-optimization/81611 * auto-inc-dec.c (attempt_change): Move dead note from mem_insn if it's the next use of regno (find_address): Take address use of reg holding non-incremented value. Add parm to limit search to the named reg only. (merge_in_block): Attempt to use a mem insn that is the next use of the original regno. --- gcc/auto-inc-dec.c | 140 ++++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 130 insertions(+), 10 deletions(-) diff --git a/gcc/auto-inc-dec.c b/gcc/auto-inc-dec.c index d02fa9d081c7..e6dc1c30d716 100644 --- a/gcc/auto-inc-dec.c +++ b/gcc/auto-inc-dec.c @@ -508,7 +508,11 @@ attempt_change (rtx new_addr, rtx inc_reg) before the memory reference. */ gcc_assert (mov_insn); emit_insn_before (mov_insn, inc_insn.insn); - move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0); + regno = REGNO (inc_insn.reg0); + if (reg_next_use[regno] == mem_insn.insn) + move_dead_notes (mov_insn, mem_insn.insn, inc_insn.reg0); + else + move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0); regno = REGNO (inc_insn.reg_res); reg_next_def[regno] = mov_insn; @@ -825,13 +829,15 @@ parse_add_or_inc (rtx_insn *insn, bool before_mem) /* A recursive function that checks all of the mem uses in ADDRESS_OF_X to see if any single one of them is compatible with - what has been found in inc_insn. + what has been found in inc_insn. To avoid accidental matches, we + will only find MEMs with FINDREG, be it inc_insn.reg_res, be it + inc_insn.reg0. -1 is returned for success. 0 is returned if nothing was found and 1 is returned for failure. */ static int -find_address (rtx *address_of_x) +find_address (rtx *address_of_x, rtx findreg) { rtx x = *address_of_x; enum rtx_code code = GET_CODE (x); @@ -840,9 +846,10 @@ find_address (rtx *address_of_x) int value = 0; int tem; - if (code == MEM && rtx_equal_p (XEXP (x, 0), inc_insn.reg_res)) + if (code == MEM && findreg == inc_insn.reg_res + && rtx_equal_p (XEXP (x, 0), inc_insn.reg_res)) { - /* Match with *reg0. */ + /* Match with *reg_res. */ mem_insn.mem_loc = address_of_x; mem_insn.reg0 = inc_insn.reg_res; mem_insn.reg1_is_const = true; @@ -850,7 +857,21 @@ find_address (rtx *address_of_x) mem_insn.reg1 = GEN_INT (0); return -1; } - if (code == MEM && GET_CODE (XEXP (x, 0)) == PLUS + if (code == MEM && inc_insn.reg1_is_const && inc_insn.reg0 + && findreg == inc_insn.reg0 + && rtx_equal_p (XEXP (x, 0), inc_insn.reg0)) + { + /* Match with *reg0, assumed to be equivalent to + *(reg_res - reg1_val); callers must check whether this is the case. */ + mem_insn.mem_loc = address_of_x; + mem_insn.reg0 = inc_insn.reg_res; + mem_insn.reg1_is_const = true; + mem_insn.reg1_val = -inc_insn.reg1_val; + mem_insn.reg1 = GEN_INT (mem_insn.reg1_val); + return -1; + } + if (code == MEM && findreg == inc_insn.reg_res + && GET_CODE (XEXP (x, 0)) == PLUS && rtx_equal_p (XEXP (XEXP (x, 0), 0), inc_insn.reg_res)) { rtx b = XEXP (XEXP (x, 0), 1); @@ -879,7 +900,7 @@ find_address (rtx *address_of_x) { /* If REG occurs inside a MEM used in a bit-field reference, that is unacceptable. */ - if (find_address (&XEXP (x, 0))) + if (find_address (&XEXP (x, 0), findreg)) return 1; } @@ -891,7 +912,7 @@ find_address (rtx *address_of_x) { if (fmt[i] == 'e') { - tem = find_address (&XEXP (x, i)); + tem = find_address (&XEXP (x, i), findreg); /* If this is the first use, let it go so the rest of the insn can be checked. */ if (value == 0) @@ -905,7 +926,7 @@ find_address (rtx *address_of_x) int j; for (j = XVECLEN (x, i) - 1; j >= 0; j--) { - tem = find_address (&XVECEXP (x, i, j)); + tem = find_address (&XVECEXP (x, i, j), findreg); /* If this is the first use, let it go so the rest of the insn can be checked. */ if (value == 0) @@ -1360,7 +1381,106 @@ merge_in_block (int max_reg, basic_block bb) if (dump_file) dump_inc_insn (dump_file); - if (ok && find_address (&PATTERN (mem_insn.insn)) == -1) + if (ok && find_address (&PATTERN (mem_insn.insn), + inc_insn.reg_res) == -1) + { + if (dump_file) + dump_mem_insn (dump_file); + if (try_merge ()) + { + success_in_block++; + insn_is_add_or_inc = false; + } + } + } + + if (insn_is_add_or_inc + /* find_address will only recognize an address + with a reg0 that's not reg_res when + reg1_is_const, so cut it off early if we + already know it won't match. */ + && inc_insn.reg1_is_const + && inc_insn.reg0 + && inc_insn.reg0 != inc_insn.reg_res) + { + /* If we identified an inc_insn that uses two + different pseudos, it's of the form + + (set reg_res (plus reg0 reg1)) + + where reg1 is a constant (*). + + The next use of reg_res was not idenfied by + find_address as a mem_insn that we could turn + into auto-inc, so see if we find a suitable + MEM in the next use of reg0, as long as it's + before any subsequent use of reg_res: + + ... (mem (... reg0 ...)) ... + + ... reg_res ... + + In this case, we can turn the plus into a + copy, and the reg0 in the MEM address into a + post_inc of reg_res: + + (set reg_res reg0) + + ... (mem (... (post_add reg_res reg1) ...)) ... + + reg_res will then have the correct value at + subsequent uses, and reg0 will remain + unchanged. + + (*) We could support non-const reg1, but then + we'd have to check that reg1 remains + unchanged all the way to the modified MEM, + and we'd have to extend find_address to + represent a non-const negated reg1. */ + regno = REGNO (inc_insn.reg0); + rtx_insn *reg0_use = get_next_ref (regno, bb, + reg_next_use); + + /* Give up if the next use of reg0 is after the next + use of reg_res (same insn is ok; we might have + found a MEM with reg_res before, and that failed, + but now we try reg0, which might work), or defs + of reg_res (same insn is not ok, we'd introduce + another def in the same insn) or reg0. */ + if (reg0_use) + { + int luid = DF_INSN_LUID (reg0_use); + + /* It might seem pointless to introduce an + auto-inc if there's no subsequent use of + reg_res (i.e., mem_insn.insn == NULL), but + the next use might be in the next iteration + of a loop, and it won't hurt if we make the + change even if it's not needed. */ + if (mem_insn.insn + && luid > DF_INSN_LUID (mem_insn.insn)) + reg0_use = NULL; + + rtx_insn *other_insn + = get_next_ref (REGNO (inc_insn.reg_res), bb, + reg_next_def); + + if (other_insn && luid >= DF_INSN_LUID (other_insn)) + reg0_use = NULL; + + other_insn + = get_next_ref (REGNO (inc_insn.reg0), bb, + reg_next_def); + + if (other_insn && luid > DF_INSN_LUID (other_insn)) + reg0_use = NULL; + } + + mem_insn.insn = reg0_use; + + if (mem_insn.insn + && find_address (&PATTERN (mem_insn.insn), + inc_insn.reg0) == -1) { if (dump_file) dump_mem_insn (dump_file); -- Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PR81611] improve auto-inc 2018-02-27 23:18 ` Alexandre Oliva @ 2018-02-28 2:39 ` Jeff Law 0 siblings, 0 replies; 7+ messages in thread From: Jeff Law @ 2018-02-28 2:39 UTC (permalink / raw) To: Alexandre Oliva; +Cc: gcc-patches On 02/27/2018 04:18 PM, Alexandre Oliva wrote: > On Feb 14, 2018, Jeff Law <law@redhat.com> wrote: > >>> + regno = REGNO (inc_insn.reg0); >>> + int luid = DF_INSN_LUID (mem_insn.insn); >>> + mem_insn.insn = get_next_ref (regno, bb, reg_next_use); >> So I think a comment is warranted right as we enter the TRUE arm. > >> At that point INC_INSN is an inc/dec. But MEM_INSN is not necessarily a >> memory reference. It could be a memory reference, it could be a copy, >> it could be something completely different (it's just the next insn that >> references the result of the increment). In the case we care about we >> want it to be a copy of INC_INSN's REG_RES back to REG0. > >> ISTM that verifying MEM_INSN is a reg->reg copy (reg_res -> reg0) before >> we call get_next_ref for reg0 is advisable and probably good from a >> compile-time standpoint by avoiding calls into find_address. > > But we don't need it to be a copy. The transformation is just as > legitimate if the regs go independent ways after that point. We have > reg_res set to reg0+reg1, and then a use of reg0 in a MEM before any > other use of reg_res. We turn that into a copy of reg0 to reg_res, and > the MEM addr into a post_add of reg_res with reg1 (possibly a post_inc), > so that the MEM dereferences reg_res while it's still equal to reg0, and > after the MEM, reg_res becomes reg0+reg1, as it should for any > subsequent uses, and reg0 is unmodified. Whether or not a subsequent > copy from reg_res to reg0 is to be found won't make the transformation > any more or less legitimate. > >> After we call get_next_ref to get the next reference of the source of >> the increment, then we're hoping to find a memory reference that uses >> REG0. But it's not guaranteed it's a memory reference insn. > > Yeah, find_address will determine if it contains any of the MEM patterns > we might be interested in, but it could be anything whatsoever. The MEM > pattern might appear virtually anywhere in the insn. > >> I was having an awful time understanding how this code could work from >> the comments until I put it under a debugger and got a sense of the >> state as we entered that IF block. Then it was much clearer :-) > > Sorry, I realize the comments were written based on a lot of context > about the overall behavior of the pass, that I had learned while trying > to figure it out. At the risk of making it redundant, I've expanded the > comments, and added further tests that won't affect current behavior in > any significant way, but that might speed things up a bit and will save > us trouble should find_address be extended to catch additional patterns. > > >> I believe Georg had other testcases in subsequent comments in the BZ, >> but I don't believe they were flagged as regressions. > > However, with the testcases I realized the incremented register could > still be live, even if we didn't find a subsequent use for it. > Adjusting for that made those testcases use post_inc too. > > Here's the improved patch, regstrapped on aarch64-, ppc64-, and > ppc64el-linux-gnu. Ok to install? > > > [PR81611] turn inc-and-use-of-dead-orig into auto-inc > > When the addressing modes available on the machine don't allow offsets > in addresses, odds are that post-increments will be represented in > trees and RTL as: > > y <= x + 1 > ... *(x) ... > x <= y > > so deal with it by turning such RTL as: > > (set y (plus x n)) > ... (mem x) ... > > without intervening uses of y into > > (set y x) > ... (mem (post_add y n)) ... > > so as to create auto-inc addresses that we'd otherwise miss. > > > for gcc/ChangeLog > > PR rtl-optimization/81611 > * auto-inc-dec.c (attempt_change): Move dead note from > mem_insn if it's the next use of regno > (find_address): Take address use of reg holding > non-incremented value. Add parm to limit search to the named > reg only. > (merge_in_block): Attempt to use a mem insn that is the next > use of the original regno. OK. Thanks! Jeff ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-02-28 2:39 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-01-24 5:28 [PR81611] improve auto-inc Alexandre Oliva 2018-01-24 14:40 ` Richard Biener 2018-01-25 20:32 ` Alexandre Oliva 2018-01-30 18:36 ` Alexandre Oliva 2018-02-14 5:06 ` Jeff Law 2018-02-27 23:18 ` Alexandre Oliva 2018-02-28 2:39 ` Jeff Law
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).