From: Jiong Wang <jiong.wang@arm.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting
Date: Mon, 15 Dec 2014 15:36:00 -0000 [thread overview]
Message-ID: <548EFE55.6090901@arm.com> (raw)
In-Reply-To: <548EFE0D.1070808@arm.com>
[-- Attachment #1: Type: text/plain, Size: 7445 bytes --]
On 15/12/14 15:28, Jiong Wang wrote:
> On 04/12/14 19:32, Jiong Wang wrote:
>> On 04/12/14 11:07, Richard Biener wrote:
>>
>>> On Thu, Dec 4, 2014 at 12:07 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Thu, Dec 4, 2014 at 12:00 PM, Jiong Wang <jiong.wang@arm.com> wrote:
>>>>
>>>>
>>>> which means re-associate the constant imm with the virtual frame pointer.
>>>>
>>>> transform
>>>>
>>>> RA <- fixed_reg + RC
>>>> RD <- MEM (RA + const_offset)
>>>>
>>>> into:
>>>>
>>>> RA <- fixed_reg + const_offset
>>>> RD <- MEM (RA + RC)
>>>>
>>>> then RA <- fixed_reg + const_offset is actually loop invariant, so the later
>>>> RTL GCSE PRE pass could catch it and do the hoisting, and thus ameliorate
>>>> what tree
>>>> level ivopts could not sort out.
>>>> There is a LIM pass after gimple ivopts - if the invariantness is already
>>>> visible there why not handle it there similar to the special-cases in
>>>> rewrite_bittest and rewrite_reciprocal?
>> maybe, needs further check.
>>
>>>> And of course similar tricks could be applied on the RTL level to
>>>> RTL invariant motion?
>> Thanks. I just checked the code, yes, loop invariant motion pass
>> is the natural place to integrate such multi-insns invariant analysis trick.
>>
>> those code could be integrated into loop-invariant.c cleanly, but
>> then I found although the invariant detected successfully but it's not moved
>> out of loop because of cost issue, and looks like the patch committed to fix PR33928
>> is trying to prevent such cheap address be hoisted to reduce register pressure.
>>
>>
>> 805 /* ??? Try to determine cheapness of address computation. Unfortunately
>> 806 the address cost is only a relative measure, we can't really compare
>> 807 it with any absolute number, but only with other address costs.
>> 808 But here we don't have any other addresses, so compare with a magic
>> 809 number anyway. It has to be large enough to not regress PR33928
>> 810 (by avoiding to move reg+8,reg+16,reg+24 invariants), but small
>> 811 enough to not regress 410.bwaves either (by still moving reg+reg
>> 812 invariants).
>> 813 See http://gcc.gnu.org/ml/gcc-patches/2009-10/msg01210.html . */
>> 814 inv->cheap_address = address_cost (SET_SRC (set), word_mode,
>> 815 ADDR_SPACE_GENERIC, speed) < 3;
>>
>>
>> I think that maybe necessary for x86 which is short of register, while for RISC,
>> it may not be that necessary, especially the whole register pressure is not big.
>>
>> currently, what I could think of is for this transformation below, we should
>> increase the costs:
>>
>> A
>> ==
>> RA <- virtual_stack_var + RC
>> RD <- MEM (RA + const_offset)
>>
>> into:
>>
>> B
>> ==
>> RA <- virtual_stack_var + const_offset <--B
>> RD <- MEM (RA + RC)
>>
>> because the cost is not that cheap, if there is not re-assocation of virtual_stack_var
>> with const_offset, then lra elimination will create another instruction to hold the
>> elimination result, so format A will actually be
>>
>> RT <- real_stack_pointer + elimination_offset
>> RA <- RT + RC
>> RD <- MEM (RA + const_offset)
>>
>> so, the re-assocation and later hoisting of invariant B could actually save two instructions
>> in the loop, this is why there are 15% perf gap for bzip2 under some situation.
> updated patch.
>
> moved this instruction shuffling trick to rtl loop invariant pass.
> as described above, this patch tries to transform A to B, so that
> after the transformation:
>
> * RA <- virtual_stack_var + const_offset could be hoisted out of the loop
> * easy the work of lra elimination as virtual_stack_var is associated with const_offset
> that the elimination offset could be combined with const_offset automatically.
>
> current rtl loop invariant pass treat "reg <- reg + off" as cheap address, while although
> "reg <- virtual_stack_var + offset" fall into the same format, but it's not that cheap as
> we could also save one lra elimination instruction. so this patch will mark
> "reg <- virtual_stack_var + offset" transformed from A to be expensive, so that it could be
> hoisted later.
>
> after patch, pr62173 is fixed on powerpc64, while *still not on aarch64*. because there are one
> glitch in aarch64_legitimize_address which cause unnecessary complex instructions sequences
> generated when legitimize some addresses. and if we fix that, we will get cheaper address for
> those cases which is generally good, and the cheaper address will cause tree level IVOPT do
> more IVOPT optimization which is generally good also, but from the speck2k result, there
> are actually around 1% code size regression on two cases, the reason is for target support
> post-index address, doing IVOPT may not always be the best choice because we lost the advantage
> of using post-index addressing.
>
> on aarch64, for the following testcase, the ivopted version is complexer than not ivopted version.
>
> while (oargc--) *(nargv++) = *(oargv++);
>
> so, I sent the generic fix here only, as it's an independent patch, and could benefit other targets
> like powerpc64 although the issue on aarch64 is still not resolved before we figure out how to let
> the correct fix on aarch64_legitimize_address do not cause code size regression on benchmark caused
> by sub-optimal tree level IVOPT.
>
> and the testcase is marked to run on powerpc64 only at current time.
>
> bootstrap OK on x86-64, no regression.
> bootstrap OK on AArch64, and from speck2k compile dump, there do have a few more RTL loop
> invariants get hoisted.
>
> ok for trunk?
>
> gcc/
> PR62173
> loop-invariant.c.c (expensive_addr): New hash_table.
> (need_expensive_addr_check_p): New bool.
> (find_exits): Rename to "find_exists_and_reshuffle.
> Support re-shuffle instructions for better loop invariant hoisting.
> (create_new_invariant): Mark address as expensive if it's generated by re-shuffle.
> (init_inv_motion_data): Initialize expensive_addr and need_expensive_addr_check_p.
> (free_inv_motion_data): Release expensive_addr.
>
> gcc/testssuite/
> PR62173
> gcc.dg/pr62173.c: New test.
sorry, patch uploaded.
>
>>> Oh, and the patch misses a testcase.
>> It's at the start of the email, will include in the patch.
>>
>> void bar(int i) {
>> char A[10];
>> int d = 0;
>> while (i > 0)
>> A[d++] = i--;
>>
>> while (d > 0)
>> foo(A[d--]);
>> }
>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> and this patch only tries to re-shuffle instructions within single basic
>>>>> block which
>>>>> is a inner loop which is perf critical.
>>>>>
>>>>> I am reusing the loop info in fwprop because there is loop info and it's run
>>>>> before
>>>>> GCSE.
>>>>>
>>>>> verified on aarch64 and mips64, the array base address hoisted out of loop.
>>>>>
>>>>> bootstrap ok on x86-64 and aarch64.
>>>>>
>>>>> comments?
>>>>>
>>>>> thanks.
>>>>>
>>>>> gcc/
>>>>> PR62173
>>>>> fwprop.c (prepare_for_gcse_pre): New function.
>>>>> (fwprop_done): Call it.
>>
>
>
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: rtl-iv.patch --]
[-- Type: text/x-patch; name=rtl-iv.patch, Size: 6874 bytes --]
diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 19e536f..7a72f10 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -186,6 +186,8 @@ typedef struct invariant *invariant_p;
/* The invariants. */
static vec<invariant_p> invariants;
+static hash_table <pointer_hash <rtx_insn> > *expensive_addr;
+static bool need_expensive_addr_check_p;
/* Check the size of the invariant table and realloc if necessary. */
@@ -576,18 +578,37 @@ compute_always_reached (struct loop *loop, basic_block *body,
/* Finds exits out of the LOOP with body BODY. Marks blocks in that we may
exit the loop by cfg edge to HAS_EXIT and MAY_EXIT. In MAY_EXIT
- additionally mark blocks that may exit due to a call. */
+ additionally mark blocks that may exit due to a call.
+
+ This function also try to transform
+
+ RA <- fixed_reg + RC
+ RD <- MEM (RA + const_offset)
+
+ into:
+
+ RA <- fixed_reg + const_offset
+ RD <- MEM (RA + RC) <- pos0
+
+ If RA is DEAD after pos0.
+
+ After this change, the first instruction is loop invariant. */
static void
-find_exits (struct loop *loop, basic_block *body,
- bitmap may_exit, bitmap has_exit)
+find_exits_and_reshuffle (struct loop *loop, basic_block *body,
+ bitmap may_exit, bitmap has_exit)
{
unsigned i;
edge_iterator ei;
edge e;
struct loop *outermost_exit = loop, *aexit;
bool has_call = false;
- rtx_insn *insn;
+ rtx_insn *insn, *next_insn;
+ rtx single_set1, single_set2, old_dest;
+ rtx op0, op0_;
+ rtx old_op1, op1, op1_;
+ rtx inner;
+ rtx *mem_plus_loc;
for (i = 0; i < loop->num_nodes; i++)
{
@@ -597,12 +618,116 @@ find_exits (struct loop *loop, basic_block *body,
{
if (CALL_P (insn)
&& (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
- || !RTL_CONST_OR_PURE_CALL_P (insn)))
+ || !RTL_CONST_OR_PURE_CALL_P (insn))
+ && ! has_call)
{
has_call = true;
bitmap_set_bit (may_exit, i);
- break;
}
+
+ if (! NONDEBUG_INSN_P (insn))
+ continue;
+
+ single_set1 = single_set (insn);
+
+ if (! single_set1
+ || GET_CODE (SET_SRC (single_set1)) != PLUS)
+ continue;
+
+ old_dest = SET_DEST (single_set1);
+ op0 = XEXP (SET_SRC (single_set1), 0);
+ old_op1 = op1 = XEXP (SET_SRC (single_set1), 1);
+
+ if (op1 == frame_pointer_rtx
+ || op1 == stack_pointer_rtx
+ || op1 == virtual_stack_vars_rtx)
+ {
+ old_op1 = op0;
+ std::swap (op0, op1);
+ }
+
+ if (GET_CODE (op1) == SIGN_EXTEND
+ || GET_CODE (op1) == ZERO_EXTEND)
+ op1 = XEXP (op1, 0);
+
+ if (! (REG_P (old_dest) && REG_P (op0) && REG_P (op1)
+ && (op0 == frame_pointer_rtx
+ || op0 == stack_pointer_rtx
+ || op0 == virtual_stack_vars_rtx)))
+ continue;
+
+ if (! (next_insn = next_real_insn (insn)))
+ break;
+
+ do
+ {
+ if (DEBUG_INSN_P (next_insn))
+ continue;
+
+ single_set2 = single_set (next_insn);
+
+ if (!single_set2
+ || (! REG_P (SET_DEST (single_set2))
+ && ! MEM_P (SET_DEST (single_set2))))
+ continue;
+
+ if (REG_P (SET_DEST (single_set2)))
+ inner = SET_SRC (single_set2);
+ else
+ inner = SET_DEST (single_set2);
+
+ if (GET_CODE (inner) == ZERO_EXTEND
+ || GET_CODE (inner) == SIGN_EXTEND
+ || GET_CODE (inner) == TRUNCATE)
+ inner = XEXP (inner, 0);
+
+ if (! MEM_P (inner)
+ || GET_CODE (XEXP (inner, 0)) != PLUS)
+ continue;
+
+ mem_plus_loc = &XEXP (inner, 0);
+ op0_ = XEXP (XEXP (inner, 0), 0);
+ op1_ = XEXP (XEXP (inner, 0), 1);
+
+ if (REG_P (op0_) && CONST_INT_P (op1_)
+ && rtx_equal_p (op0_, old_dest)
+ && GET_MODE (op0_) == GET_MODE (old_op1))
+ {
+ rtx new_src;
+
+ if (find_regno_note (next_insn, REG_DEAD,
+ REGNO (old_dest)))
+ {
+ new_src = plus_constant (GET_MODE (op0), op0,
+ INTVAL (op1_));
+ validate_change (insn, &SET_SRC (single_set1),
+ new_src, 1);
+ new_src = gen_rtx_PLUS (GET_MODE (op0_), op0_,
+ old_op1);
+ validate_change (next_insn, mem_plus_loc, new_src, 1);
+ if (apply_change_group ())
+ {
+ rtx_insn **slot =
+ expensive_addr->find_slot (insn, INSERT);
+
+ if (*slot)
+ gcc_unreachable ();
+ else
+ *slot = insn;
+
+ need_expensive_addr_check_p = true;
+
+ if (dump_file)
+ fprintf (dump_file,
+ "\nRe-associate insn %d and %d for later"
+ " RTL loop invariant hoisting.\n",
+ INSN_UID (insn), INSN_UID (next_insn));
+ }
+ }
+ break;
+ }
+ } while ((next_insn = next_real_insn (next_insn))
+ && body[i] == BLOCK_FOR_INSN (next_insn));
}
FOR_EACH_EDGE (e, ei, body[i]->succs)
@@ -725,6 +850,9 @@ create_new_invariant (struct def *def, rtx_insn *insn, bitmap depends_on,
See http://gcc.gnu.org/ml/gcc-patches/2009-10/msg01210.html . */
inv->cheap_address = address_cost (SET_SRC (set), word_mode,
ADDR_SPACE_GENERIC, speed) < 3;
+ if (need_expensive_addr_check_p
+ && expensive_addr->find (insn))
+ inv->cheap_address = false;
}
else
{
@@ -1043,7 +1171,7 @@ find_invariants (struct loop *loop)
bitmap always_executed = BITMAP_ALLOC (NULL);
basic_block *body = get_loop_body_in_dom_order (loop);
- find_exits (loop, body, may_exit, has_exit);
+ find_exits_and_reshuffle (loop, body, may_exit, has_exit);
compute_always_reached (loop, body, may_exit, always_reached);
compute_always_reached (loop, body, has_exit, always_executed);
@@ -1628,6 +1756,9 @@ move_invariants (struct loop *loop)
static void
init_inv_motion_data (void)
{
+ need_expensive_addr_check_p = false;
+ expensive_addr = new hash_table <pointer_hash <rtx_insn> > (4);
+
actual_stamp = 1;
invariants.create (100);
@@ -1663,6 +1794,8 @@ free_inv_motion_data (void)
free (inv);
}
invariants.release ();
+
+ delete expensive_addr;
}
/* Move the invariants out of the LOOP. */
diff --git a/gcc/testsuite/gcc.dg/pr62173.c b/gcc/testsuite/gcc.dg/pr62173.c
new file mode 100644
index 0000000..a5a1a86
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr62173.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target { powerpc64-*-* } } } */
+/* { dg-options "-O2 -fdump-rtl-loop2_invariant" } */
+
+void foo (char);
+
+void
+bar(int i)
+{
+ char A[10];
+ int d = 0;
+ while (i > 0)
+ A[d++] = i--;
+
+ while (d > 0)
+ foo (A[d--]);
+}
+
+/* { dg-final { scan-rtl-dump "Re-associate insn .* loop invariant hoisting" "loop2_invariant" } } */
+/* { dg-final { cleanup-rtl-dump "loop2_invariant" } } */
next prev parent reply other threads:[~2014-12-15 15:29 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-04 11:00 Jiong Wang
2014-12-04 11:07 ` Richard Biener
2014-12-04 11:07 ` Richard Biener
2014-12-04 19:32 ` Jiong Wang
2014-12-15 15:29 ` Jiong Wang
2014-12-15 15:36 ` Jiong Wang [this message]
2014-12-17 16:19 ` Richard Biener
2014-12-18 17:08 ` Jiong Wang
2014-12-18 21:16 ` Jiong Wang
2014-12-18 22:19 ` Segher Boessenkool
2014-12-19 4:06 ` Bin.Cheng
2014-12-19 10:29 ` Jiong Wang
2014-12-19 11:45 ` Richard Biener
2014-12-19 15:31 ` Kenneth Zadeck
2015-02-11 11:20 ` Jiong Wang
2015-02-11 14:22 ` Kenneth Zadeck
2015-02-11 18:18 ` Jiong Wang
2015-04-14 15:06 ` Jiong Wang
2015-04-14 16:49 ` Steven Bosscher
2015-04-14 17:24 ` Jeff Law
2015-04-14 21:49 ` Jiong Wang
2015-04-21 14:43 ` Jiong Wang
2015-04-24 1:55 ` Jeff Law
2015-04-24 17:05 ` Jiong Wang
2015-05-14 20:04 ` Jeff Law
2015-05-14 22:07 ` Jiong Wang
2015-05-14 22:24 ` Jeff Law
2015-05-21 21:51 ` Jiong Wang
2015-05-27 16:11 ` Jeff Law
2015-09-02 13:49 ` Jiong Wang
2015-09-02 20:52 ` Jeff Law
2015-04-28 12:16 ` Jiong Wang
2015-04-28 14:00 ` Matthew Fortune
2015-04-28 14:31 ` Jiong Wang
2015-04-19 16:20 ` Jiong Wang
2014-12-19 12:09 ` Eric Botcazou
2014-12-19 15:21 ` Segher Boessenkool
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=548EFE55.6090901@arm.com \
--to=jiong.wang@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=richard.guenther@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).