Re: [RFC]rs6000: split complicated constant to memory

public inbox for gcc-patches@gcc.gnu.org

From: Jiufu Guo <guojiufu@linux.ibm.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
	David Edelsohn <dje.gcc@gmail.com>,
	 Segher Boessenkool <segher@kernel.crashing.org>,
	linkw@gcc.gnu.org
Subject: Re: [RFC]rs6000: split complicated constant to memory
Date: Tue, 16 Aug 2022 11:50:17 +0800
Message-ID: <7ek078ludi.fsf@pike.rch.stglabs.ibm.com>
In-Reply-To: <CAFiYyc0vqQyLzzov8ghFXiz1VrLRBDfGfddws1BxQXAJJXKg0Q@mail.gmail.com> (Richard Biener's message of "Mon, 15 Aug 2022 10:07:52 +0200")


Hi,

Richard Biener <richard.guenther@gmail.com> writes:

> On Mon, Aug 15, 2022 at 7:26 AM Jiufu Guo via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> Hi,
>>
>> This patch tries to put the constant into the constant pool if building
>> the constant requires 3 or more instructions.
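To make the "3 or more instructions" threshold concrete, here is a minimal C sketch that counts instructions for the generic lis/ori/sldi/oris/ori sequence seen in the examples in this mail. It is a simplified model under stated assumptions, not GCC's num_insns_constant: the real rs6000 code may handle more cases (for example prefixed instructions on Power10), so this count is only an upper bound for the generic path. The names generic_insn_count and would_use_const_pool are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Can v be loaded with a single li (sign-extended 16-bit immediate)? */
static bool fits_s16(int64_t v) { return v >= -32768 && v <= 32767; }

/* Is v a sign-extended 32-bit value (reachable with lis/ori only)? */
static bool fits_s32(int64_t v) { return v >= INT32_MIN && v <= INT32_MAX; }

/* li, lis, or lis+ori for a sign-extended 32-bit value. */
static int count_s32(int64_t v) {
  if (fits_s16(v) || (v & 0xffff) == 0)
    return 1;
  return 2;
}

/* Hypothetical, simplified count for the generic 64-bit sequence:
   build the upper word, sldi 32, then oris/ori for the lower halves. */
int generic_insn_count(int64_t v) {
  if (fits_s32(v))
    return count_s32(v);

  uint64_t u = (uint64_t) v;
  int64_t hi = (int64_t) (u >> 32);   /* 0 .. 0xffffffff */
  if (hi >= 0x80000000)               /* sign-extend the upper word */
    hi -= 0x100000000;

  int n = count_s32(hi) + 1;          /* upper word, then sldi 32 */
  if ((u >> 16) & 0xffff)
    n++;                              /* oris */
  if (u & 0xffff)
    n++;                              /* ori */
  return n;
}

/* Mirrors the spirit of the patch's "num_insns_constant (...) > 2" test. */
bool would_use_const_pool(int64_t v) { return generic_insn_count(v) > 2; }
```

With this model, f1.c's constant counts as 3 and f3.c/f4.c's constants count as 5, matching the sequences listed below.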
>>
>> But there is a concern: I'm wondering whether this patch is really
>> profitable.
>>
>> As I tested: 1. in a simple case, where the instructions do not run in
>> parallel, loading the constant from memory may be faster; but 2. when
>> some instructions can run in parallel, loading the constant from memory
>> does not win compared with building it.  See the examples below.
>>
>> For f1.c and f3.c, 'loading' the constant is acceptable in terms of
>> runtime; for f2.c and f4.c, 'loading' the constant is visibly slower.
>>
>> For real-world cases, both kinds of code sequences exist.
>>
>> So, I'm not sure if we need to push this patch.
>>
>> I ran each function below 1000000000 times to check the runtime.
>> f1.c:
>> void foo (long *arg, long *arg2, long *arg3)
>> {
>>   *arg = 0x1234567800000000;
>> }
>> asm building constant:
>>         lis 10,0x1234
>>         ori 10,10,0x5678
>>         sldi 10,10,32
>> vs. asm loading:
>>         addis 10,2,.LC0@toc@ha
>>         ld 10,.LC0@toc@l(10)
>> The runtimes of 'building' and 'loading' are similar: sometimes
>> 'building' is faster, sometimes 'loading' is faster, and the difference
>> is slight.
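The measurement loop itself is not shown in the mail. One plausible shape for it, as a hedged sketch rather than the harness actually used, calls f1.c's store through a function pointer and times it with the standard clock(). It assumes LP64 (so the constant fits in long) and a GCC/Clang-style noinline attribute; f1_foo and time_calls are hypothetical names.

```c
#include <time.h>

/* f1.c's function; noinline keeps each call, and therefore the constant
   materialization, inside the timed loop. */
__attribute__((noinline)) void f1_foo(long *arg, long *arg2, long *arg3) {
  (void) arg2;
  (void) arg3;
  *arg = 0x1234567800000000;
}

/* Call fn `iters` times and return the elapsed processor time in seconds. */
double time_calls(void (*fn)(long *, long *, long *), long iters,
                  long *a, long *b, long *c) {
  clock_t start = clock();
  for (long i = 0; i < iters; i++)
    fn(a, b, c);
  return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

In practice the timed function would live in its own translation unit, compiled once with and once without the patch, so that only the constant materialization differs between runs.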
>
> I wonder if it is possible to decide this during scheduling - choose the
> variant that, when the result is needed, is cheaper?  Post-RA might
> be a bit difficult (I see the load from memory needs the TOC, but then
> when the TOC is not available we could just always emit the build form),
> and pre-reload precision might not be good enough to make this worth
> the experiment?

Thanks a lot for your comments!

Yes, post-RA may not handle all cases.
If no TOC is available, we cannot load the constant through the TOC.
As Segher pointed out, crtl->uses_const_pool may serve as an
approximation.
The sched2 pass could handle some cases (e.g. f2.c and f4.c), but in
other cases it may not spread out the 'building' instructions.

So, maybe we could add a peephole after sched2: if the five instructions
that build the constant are still consecutive, replace them with a
'load' (after checking that the TOC is available).
But I'm not sure it is worthwhile.
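The post-sched2 check proposed above can be sketched roughly as follows. This is a toy over a hypothetical, heavily simplified instruction representation, not GCC RTL or its peephole2 machinery: it looks for lis/ori/sldi/oris/ori on one register occurring back to back, and reports nothing when the TOC cannot be used. The type and function names are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes; only what the check needs. */
typedef enum { OP_LIS, OP_ORI, OP_SLDI, OP_ORIS, OP_OTHER } opcode;

/* rt = destination register, ra = source register (unused for lis). */
typedef struct { opcode op; int rt; int ra; } insn;

static const opcode build_seq[5] = { OP_LIS, OP_ORI, OP_SLDI, OP_ORIS, OP_ORI };

/* Return the index of the first run of five consecutive insns that build
   a constant in one register, or -1 if there is none or no TOC load can
   replace it. */
long find_build_window(const insn *v, size_t n, bool toc_available) {
  if (!toc_available)
    return -1;
  for (size_t i = 0; i + 5 <= n; i++) {
    int reg = v[i].rt;
    bool match = true;
    for (size_t k = 0; k < 5 && match; k++)
      match = v[i + k].op == build_seq[k] && v[i + k].rt == reg
              && (k == 0 || v[i + k].ra == reg);
    if (match)
      return (long) i;
  }
  return -1;
}
```

With f3.c's sequence the window is found; with f4.c's sequence, which sched2 has already interleaved, it is not, which matches the intent of only replacing chains that did not get overlapped.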

>
> Of course the scheduler might lack on the technical side as well.


BR,
Jeff(Jiufu)

>
>>
>> f2.c
>> void foo (long *arg, long *arg2, long *arg3)
>> {
>>   *arg = 0x1234567800000000;
>>   *arg2 = 0x7965234700000000;
>>   *arg3 = 0x4689123700000000;
>> }
>> asm building constant:
>>         lis 7,0x1234
>>         lis 10,0x7965
>>         lis 9,0x4689
>>         ori 7,7,0x5678
>>         ori 10,10,0x2347
>>         ori 9,9,0x1237
>>         sldi 7,7,32
>>         sldi 10,10,32
>>         sldi 9,9,32
>> vs. loading:
>>         addis 7,2,.LC0@toc@ha
>>         addis 10,2,.LC1@toc@ha
>>         addis 9,2,.LC2@toc@ha
>>         ld 7,.LC0@toc@l(7)
>>         ld 10,.LC1@toc@l(10)
>>         ld 9,.LC2@toc@l(9)
>> For this case, 'loading' is always slower than 'building' (by >15%).
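To illustrate the parallelism argument, here is a toy list-scheduling model in C. Everything in it is an assumption: 1-cycle ALU ops, a single placeholder load latency, in-order issue within each chain, greedy priority by chain index, and fixed per-cycle ALU and load issue widths. It ignores the stores, the loop, caches and front-end limits, so it cannot reproduce the measured percentages; it only shows why three independent 3-op build chains can finish as early as one chain does, while the loads compete for load slots.

```c
#include <stddef.h>

/* Toy functional units: every op runs on one of them. */
typedef enum { UNIT_ALU, UNIT_LSU } unit;
typedef struct { unit u; int latency; } op;   /* latency >= 1 assumed */

/* A dependent chain: ops[i] may issue only after ops[i-1] has completed. */
typedef struct { const op *ops; size_t len; } chain;

#define MAX_CHAINS 16

/* Greedy cycle-by-cycle scheduler.  Each cycle, walk the chains in index
   order and issue the next op of each chain whose predecessor is done,
   while its unit still has a free slot that cycle.  Returns the cycle at
   which the last result is available, or -1 for bad arguments. */
int schedule(const chain *chains, size_t n, int alu_width, int lsu_width) {
  size_t next[MAX_CHAINS] = {0};
  int ready[MAX_CHAINS] = {0};
  size_t remaining = 0;
  int finish = 0;

  if (n > MAX_CHAINS || alu_width < 1 || lsu_width < 1)
    return -1;
  for (size_t i = 0; i < n; i++)
    remaining += chains[i].len;

  for (int cycle = 0; remaining > 0; cycle++) {
    int alu_used = 0, lsu_used = 0;
    for (size_t i = 0; i < n; i++) {
      if (next[i] == chains[i].len || ready[i] > cycle)
        continue;
      const op *o = &chains[i].ops[next[i]];
      int *used = o->u == UNIT_ALU ? &alu_used : &lsu_used;
      int width = o->u == UNIT_ALU ? alu_width : lsu_width;
      if (*used == width)
        continue;
      (*used)++;
      ready[i] = cycle + o->latency;
      next[i]++;
      remaining--;
      if (ready[i] > finish)
        finish = ready[i];
    }
  }
  return finish;
}
```

With, say, 4 ALU slots, 1 load slot and a load latency of 4, three lis/ori/sldi chains finish at cycle 3 in this model while three addis/ld pairs finish at cycle 7. These parameters are placeholders, not POWER9 or Power10 figures.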
>>
>> f3.c
>> void foo (long *arg, long *arg2, long *arg3)
>> {
>>   *arg = 384307168202282325;
>> }
>>         lis 10,0x555
>>         ori 10,10,0x5555
>>         sldi 10,10,32
>>         oris 10,10,0x5555
>>         ori 10,10,0x5555
>> For this case, 'building' (through 5 instructions) is slower, and
>> 'loading' is ~5% faster.
>>
>> f4.c
>> void foo (long *arg, long *arg2, long *arg3)
>> {
>>   *arg = 384307168202282325;
>>   *arg2 = -6148914691236517205;
>>   *arg3 = 768614336404564651;
>> }
>>         lis 7,0x555
>>         lis 10,0xaaaa
>>         lis 9,0xaaa
>>         ori 7,7,0x5555
>>         ori 10,10,0xaaaa
>>         ori 9,9,0xaaaa
>>         sldi 7,7,32
>>         sldi 10,10,32
>>         sldi 9,9,32
>>         oris 7,7,0x5555
>>         oris 10,10,0xaaaa
>>         oris 9,9,0xaaaa
>>         ori 7,7,0x5555
>>         ori 10,10,0xaaab
>>         ori 9,9,0xaaab
>> For this case, since the 'building' instructions run in parallel,
>> 'loading' is ~8% slower.  On p10, 'loading' (through 'pld') is also
>> slower, by >4%.
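The build sequences in f1.c to f4.c can be checked with a small C model of what lis, ori, sldi and oris do to a 64-bit GPR. This is a simplified sketch (helper names are hypothetical and only these mnemonics are modelled). It also shows why f4.c's `lis 10,0xaaaa` is harmless: lis sign-extends, but the following sldi 32 shifts those sign bits out.

```c
#include <stdint.h>

/* lis rt,imm (addis rt,0,imm): imm << 16, sign-extended to 64 bits. */
static uint64_t lis(uint16_t imm) {
  uint64_t v = (uint64_t) imm << 16;
  if (imm & 0x8000)
    v |= 0xffffffff00000000ULL;
  return v;
}

/* ori rt,ra,imm: OR with the zero-extended 16-bit immediate. */
static uint64_t ori(uint64_t ra, uint16_t imm) { return ra | imm; }

/* oris rt,ra,imm: OR with the immediate shifted left by 16. */
static uint64_t oris(uint64_t ra, uint16_t imm) {
  return ra | ((uint64_t) imm << 16);
}

/* sldi rt,ra,n: shift left doubleword by n (0 <= n < 64). */
static uint64_t sldi(uint64_t ra, unsigned n) { return ra << n; }

/* Three-instruction chain from f1.c/f2.c (low 32 bits are zero). */
uint64_t build3(uint16_t h, uint16_t hl) {
  return sldi(ori(lis(h), hl), 32);
}

/* Five-instruction chain from f3.c/f4.c; each step depends on the last. */
uint64_t build5(uint16_t h, uint16_t hl, uint16_t lh, uint16_t ll) {
  uint64_t r = lis(h);
  r = ori(r, hl);
  r = sldi(r, 32);
  r = oris(r, lh);
  return ori(r, ll);
}
```

The strict dependency from one step to the next is what makes a single chain slow (f3.c) and why independent chains interleave well (f4.c).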
>>
>>
>> BR,
>> Jeff(Jiufu)
>>
>> ---
>>  gcc/config/rs6000/rs6000.cc                | 14 ++++++++++++++
>>  gcc/testsuite/gcc.target/powerpc/pr63281.c | 11 +++++++++++
>>  2 files changed, 25 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c
>>
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index 4b727d2a500..3798e11bdbc 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -10098,6 +10098,20 @@ rs6000_emit_set_const (rtx dest, rtx source)
>>           c = ((c & 0xffffffff) ^ 0x80000000) - 0x80000000;
>>           emit_move_insn (lo, GEN_INT (c));
>>         }
>> +      else if (base_reg_operand (dest, mode)
>> +              && num_insns_constant (source, mode) > 2)
>> +       {
>> +         rtx sym = force_const_mem (mode, source);
>> +         if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0))
>> +             && use_toc_relative_ref (XEXP (sym, 0), mode))
>> +           {
>> +             rtx toc = create_TOC_reference (XEXP (sym, 0), copy_rtx (dest));
>> +             sym = gen_const_mem (mode, toc);
>> +             set_mem_alias_set (sym, get_TOC_alias_set ());
>> +           }
>> +
>> +         emit_insn (gen_rtx_SET (dest, sym));
>> +       }
>>        else
>>         rs6000_emit_set_long_const (dest, c);
>>        break;
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr63281.c b/gcc/testsuite/gcc.target/powerpc/pr63281.c
>> new file mode 100644
>> index 00000000000..469a8f64400
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c
>> @@ -0,0 +1,11 @@
>> +/* PR target/63281 */
>> +/* { dg-do compile { target lp64 } } */
>> +/* { dg-options "-O2 -std=c99" } */
>> +
>> +void
>> +foo (unsigned long long *a)
>> +{
>> +  *a = 0x020805006106003;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {\mp?ld\M} 1 } } */
>> --
>> 2.17.1
>>

Thread overview: 6+ messages
2022-08-15  5:25 Jiufu Guo
2022-08-15  8:07 ` Richard Biener
2022-08-16  3:50   ` Jiufu Guo [this message]
2022-08-16  6:45     ` Jiufu Guo
2022-08-15 21:12 ` Segher Boessenkool
2022-08-17  2:32   ` Jiufu Guo
