Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jiong Wang <jiong.wang@arm.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting
Date: Thu, 18 Dec 2014 17:08:00 -0000	[thread overview]
Message-ID: <54930811.1020003@arm.com> (raw)
In-Reply-To: <CAFiYyc3oYRsYkQwivE+T4A4mysDBe0gjZqjroQ8B2p1J6sakQg@mail.gmail.com>


On 17/12/14 15:54, Richard Biener wrote:
> On Mon, Dec 15, 2014 at 4:29 PM, Jiong Wang <jiong.wang@arm.com> wrote:
>> On 15/12/14 15:28, Jiong Wang wrote:
>>> On 04/12/14 19:32, Jiong Wang wrote:
>>>> On 04/12/14 11:07, Richard Biener wrote:
>>>>
>>>>> On Thu, Dec 4, 2014 at 12:07 PM, Richard Biener
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>> On Thu, Dec 4, 2014 at 12:00 PM, Jiong Wang <jiong.wang@arm.com> wrote:
>>>>>>
>>>>>>
>>>>>> which means re-associate the constant imm with the virtual frame
>>>>>> pointer.
>>>>>>
>>>>>> transform
>>>>>>
>>>>>>          RA <- fixed_reg + RC
>>>>>>          RD <- MEM (RA + const_offset)
>>>>>>
>>>>>>       into:
>>>>>>
>>>>>>          RA <- fixed_reg + const_offset
>>>>>>          RD <- MEM (RA + RC)
>>>>>>
>>>>>> then RA <- fixed_reg + const_offset is actually loop invariant, so the
>>>>>> later
>>>>>> RTL GCSE PRE pass could catch it and do the hoisting, and thus
>>>>>> ameliorate
>>>>>> what tree
>>>>>> level ivopts could not sort out.
>>>>>> There is a LIM pass after gimple ivopts - if the invariantness is
>>>>>> already
>>>>>> visible there why not handle it there similar to the special-cases in
>>>>>> rewrite_bittest and rewrite_reciprocal?
>>>> maybe, needs further check.
>>>>
>>>>>> And of course similar tricks could be applied on the RTL level to
>>>>>> RTL invariant motion?
>>>> Thanks. I just checked the code, yes, loop invariant motion pass
>>>> is the natural place to integrate such multi-insns invariant analysis
>>>> trick.
>>>>
>>>> those code could be integrated into loop-invariant.c cleanly, but
>>>> then I found although the invariant detected successfully but it's not
>>>> moved
>>>> out of loop because of cost issue, and looks like the patch committed to
>>>> fix PR33928
>>>> is trying to prevent such cheap address be hoisted to reduce register
>>>> pressure.
>>>>
>>>>
>>>>      805       /* ??? Try to determine cheapness of address computation.
>>>> Unfortunately
>>>>      806          the address cost is only a relative measure, we can't
>>>> really compare
>>>>      807          it with any absolute number, but only with other address
>>>> costs.
>>>>      808          But here we don't have any other addresses, so compare
>>>> with a magic
>>>>      809          number anyway.  It has to be large enough to not regress
>>>> PR33928
>>>>      810          (by avoiding to move reg+8,reg+16,reg+24 invariants),
>>>> but small
>>>>      811          enough to not regress 410.bwaves either (by still moving
>>>> reg+reg
>>>>      812          invariants).
>>>>      813          See
>>>> http://gcc.gnu.org/ml/gcc-patches/2009-10/msg01210.html .  */
>>>>      814       inv->cheap_address = address_cost (SET_SRC (set),
>>>> word_mode,
>>>>      815                                          ADDR_SPACE_GENERIC,
>>>> speed) < 3;
>>>>
>>>>
>>>> I think that maybe necessary for x86 which is short of register, while
>>>> for RISC,
>>>> it may not be that necessary, especially the whole register pressure is
>>>> not big.
>>>>
>>>> currently, what I could think of is for this transformation below, we
>>>> should
>>>> increase the costs:
>>>>      A
>>>> ==
>>>>          RA <- virtual_stack_var + RC
>>>>          RD <- MEM (RA + const_offset)
>>>>
>>>>       into:
>>>>
>>>> B
>>>> ==
>>>>          RA <- virtual_stack_var + const_offset  <--B
>>>>          RD <- MEM (RA + RC)
>>>>
>>>> because the cost is not that cheap, if there is not re-assocation of
>>>> virtual_stack_var
>>>> with const_offset, then lra elimination will create another instruction
>>>> to hold the
>>>> elimination result, so format A will actually be
>>>>
>>>>          RT <- real_stack_pointer + elimination_offset
>>>>          RA <- RT + RC
>>>>          RD <- MEM (RA + const_offset)
>>>>
>>>> so, the re-assocation and later hoisting of invariant B could actually
>>>> save two instructions
>>>> in the loop, this is why there are 15% perf gap for bzip2 under some
>>>> situation.
>>> updated patch.
>>>
>>> moved this instruction shuffling trick to rtl loop invariant pass.
>>> as described above, this patch tries to transform A to B, so that
>>> after the transformation:
>>>
>>>      * RA <- virtual_stack_var + const_offset  could be hoisted out of the
>>> loop
>>>      * easy the work of lra elimination as virtual_stack_var is associated
>>> with const_offset
>>>        that the elimination offset could be combined with const_offset
>>> automatically.
>>>
>>> current rtl loop invariant pass treat "reg <- reg + off" as cheap address,
>>> while although
>>> "reg <- virtual_stack_var + offset" fall into the same format, but it's
>>> not that cheap as
>>> we could also save one lra elimination instruction. so this patch will
>>> mark
>>> "reg <- virtual_stack_var + offset" transformed from A to be expensive, so
>>> that it could be
>>> hoisted later.
>>>
>>> after patch, pr62173 is fixed on powerpc64, while *still not on aarch64*.
>>> because there are one
>>> glitch in aarch64_legitimize_address which cause unnecessary complex
>>> instructions sequences
>>> generated when legitimize some addresses. and if we fix that, we will get
>>> cheaper address for
>>> those cases which is generally good, and the cheaper address will cause
>>> tree level IVOPT do
>>> more IVOPT optimization which is generally good also, but from the speck2k
>>> result, there
>>> are actually around 1% code size regression on two cases, the reason is
>>> for target support
>>> post-index address, doing IVOPT may not always be the best choice because
>>> we lost the advantage
>>> of using post-index addressing.
>>>
>>> on aarch64, for the following testcase, the ivopted version is complexer
>>> than not ivopted version.
>>>
>>>        while (oargc--) *(nargv++) = *(oargv++);
>>>     so, I sent the generic fix here only, as it's an independent patch, and
>>> could benefit other targets
>>> like powerpc64 although the issue on aarch64 is still not resolved before
>>> we figure out how to let
>>> the correct fix on aarch64_legitimize_address do not cause code size
>>> regression on benchmark caused
>>> by sub-optimal tree level IVOPT.
>>>
>>> and the testcase is marked to run on powerpc64 only at current time.
>>>
>>> bootstrap OK on x86-64, no regression.
>>> bootstrap OK on AArch64, and from speck2k compile dump, there do have a
>>> few more RTL loop
>>> invariants get hoisted.
>>>
>>> ok for trunk?
>>>
>>> gcc/
>>> PR62173
>>>      loop-invariant.c.c (expensive_addr): New hash_table.
>>>      (need_expensive_addr_check_p): New bool.
>>>      (find_exits): Rename to "find_exists_and_reshuffle.
>>>      Support re-shuffle instructions for better loop invariant hoisting.
>>>      (create_new_invariant): Mark address as expensive if it's generated by
>>> re-shuffle.
>>>      (init_inv_motion_data): Initialize expensive_addr and
>>> need_expensive_addr_check_p.
>>>      (free_inv_motion_data): Release expensive_addr.
>>>
>>> gcc/testssuite/
>>>      PR62173
>>>      gcc.dg/pr62173.c: New test.
>>
>> sorry, patch uploaded.
> +             do
> +               {
> ...
> +               } while ((next_insn = next_real_insn (next_insn))
> +                        && body[i] == BLOCK_FOR_INSN (next_insn));
>
> ick.  I realize we don't have SSA form on RTL but doesn't DF provide
> at least some help in looking up definition statements for pseudos?
> In fact we want to restrict the transform to single-use pseudos, thus
> hopefully it can at least tell us that... (maybe not and this is what
> LOG_LINKS are for in combine...?)  At least loop-invariant alreadly
> computes df_chain with DF_UD_CHAIN which seems exactly what
> is needed (apart from maybe getting at single-use info).

thanks very much for these inspiring questions.

yes, we want to restrict the transformation on single-use pseudo only,
and it's better the transformation could re-use existed info and helper
function to avoid increase compile time. but I haven't found anything I
can reuse at the stage the transformation happen.

the info similar as LOG_LINKS is what I want, but maybe simpler. I'd study
the code about build LOG_LINKS, and try to see if we can do some factor out.

thanks.

Regards,
Jiong

>
> The meat of this should be factored out to a separate function I guess.
>
> Otherwise my RTL fu is too weak to review this properly.
>
> Thanks,
> Richard.
>
>>>>> Oh, and the patch misses a testcase.
>>>> It's at the start of the email, will include in the patch.
>>>>
>>>> void bar(int i) {
>>>>       char A[10];
>>>>       int d = 0;
>>>>       while (i > 0)
>>>>       A[d++] = i--;
>>>>
>>>>       while (d > 0)
>>>>       foo(A[d--]);
>>>> }
>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>> and this patch only tries to re-shuffle instructions within single
>>>>>>> basic
>>>>>>> block which
>>>>>>> is a inner loop which is perf critical.
>>>>>>>
>>>>>>> I am reusing the loop info in fwprop because there is loop info and
>>>>>>> it's run
>>>>>>> before
>>>>>>> GCSE.
>>>>>>>
>>>>>>> verified on aarch64 and mips64, the array base address hoisted out of
>>>>>>> loop.
>>>>>>>
>>>>>>> bootstrap ok on x86-64 and aarch64.
>>>>>>>
>>>>>>> comments?
>>>>>>>
>>>>>>> thanks.
>>>>>>>
>>>>>>> gcc/
>>>>>>>       PR62173
>>>>>>>       fwprop.c (prepare_for_gcse_pre): New function.
>>>>>>>       (fwprop_done): Call it.
>>>>
>>>

next prev parent reply	other threads:[~2014-12-18 17:00 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-04 11:00 Jiong Wang
2014-12-04 11:07 ` Richard Biener
2014-12-04 11:07   ` Richard Biener
2014-12-04 19:32     ` Jiong Wang
2014-12-15 15:29       ` Jiong Wang
2014-12-15 15:36         ` Jiong Wang
2014-12-17 16:19           ` Richard Biener
2014-12-18 17:08             ` Jiong Wang [this message]
2014-12-18 21:16               ` Jiong Wang
2014-12-18 22:19               ` Segher Boessenkool
2014-12-19  4:06                 ` Bin.Cheng
2014-12-19 10:29                   ` Jiong Wang
2014-12-19 11:45                     ` Richard Biener
2014-12-19 15:31                       ` Kenneth Zadeck
2015-02-11 11:20                         ` Jiong Wang
2015-02-11 14:22                           ` Kenneth Zadeck
2015-02-11 18:18                             ` Jiong Wang
2015-04-14 15:06                               ` Jiong Wang
2015-04-14 16:49                                 ` Steven Bosscher
2015-04-14 17:24                                   ` Jeff Law
2015-04-14 21:49                                     ` Jiong Wang
2015-04-21 14:43                                       ` Jiong Wang
2015-04-24  1:55                                         ` Jeff Law
2015-04-24 17:05                                           ` Jiong Wang
2015-05-14 20:04                                             ` Jeff Law
2015-05-14 22:07                                               ` Jiong Wang
2015-05-14 22:24                                                 ` Jeff Law
2015-05-21 21:51                                                   ` Jiong Wang
2015-05-27 16:11                                                     ` Jeff Law
2015-09-02 13:49                                                       ` Jiong Wang
2015-09-02 20:52                                                         ` Jeff Law
2015-04-28 12:16                                         ` Jiong Wang
2015-04-28 14:00                                           ` Matthew Fortune
2015-04-28 14:31                                             ` Jiong Wang
2015-04-19 16:20                               ` Jiong Wang
2014-12-19 12:09                     ` Eric Botcazou
2014-12-19 15:21                   ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54930811.1020003@arm.com \
    --to=jiong.wang@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).