Re: [PATCH V2]rs6000: Store complicated constant into pool

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Segher Boessenkool <segher@kernel.crashing.org>
To: Jiufu Guo <guojiufu@linux.ibm.com>
Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, linkw@gcc.gnu.org,
	amodra@gmail.com
Subject: Re: [PATCH V2]rs6000: Store complicated constant into pool
Date: Wed, 15 Jun 2022 12:44:41 -0500	[thread overview]
Message-ID: <20220615174441.GD25951@gate.crashing.org> (raw)
In-Reply-To: <h48tu8me3tu.fsf@genoa.aus.stglabs.ibm.com>

Hi!

On Wed, Jun 15, 2022 at 04:19:41PM +0800, Jiufu Guo wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > Have you tried with different limits?
> I drafted cases(C code, and updated asm code) to test runtime costs for
> different code sequences:  building constants with 5 insns; with 3
> insns and 2 insns; and loading from rodata.

I'm not asking about trivial examples here.  Have you tried the
resulting compiler, on some big real-life code, with various choices for
these essentially random choices, on different cpus?

Huge changes like this need quite a bit of experimental data to back it
up, or some convincing other evidence.  That is the only way to move
forward: anything else will resemble Brownian motion :-(

> >  And, p10 is a red herring, you
> > actually test if prefixed insns are used.
> 
> 'pld' is the instruction which we care about instead target p10 :-). So
> in patch, 'TARGET_PREFIXED' is tested.

Huh.  I would think "pli" is the most important thing here!  Which is
not a load instruction at all, notwithstanding its name.

> >> --- a/gcc/config/rs6000/rs6000.cc
> >> +++ b/gcc/config/rs6000/rs6000.cc
> >> @@ -9706,8 +9706,9 @@ rs6000_init_stack_protect_guard (void)
> >>  static bool
> >>  rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
> >>  {
> >> -  if (GET_CODE (x) == HIGH
> >> -      && GET_CODE (XEXP (x, 0)) == UNSPEC)
> >> +  /* Exclude CONSTANT HIGH part.  e.g.
> >> +     (high:DI (symbol_ref:DI ("var") [flags 0xc0] <var_decl>)).  */
> >> +  if (GET_CODE (x) == HIGH)
> >>      return true;
> >
> > So, why is this?  You didn't explain.
> 
> In the function rs6000_cannot_force_const_mem, we already excluded
> "HIGH with UNSPEC" like: "high:P (unspec:P [symbol_ref..".
> I find, "high:DI (symbol_ref:DI" is similar, which may also need to
> exclude.  Half high part of address would be invalid for constant pool.
> 
> Or maybe, we could use below code to exclude it.
>   /* high part of a symbol_ref could not be put into constant pool.  */
>   if (GET_CODE (x) == HIGH
>       && (GET_CODE (XEXP (x, 0)) == UNSPEC || SYMBOL_REF_P (XEXP (x, 0))))
> 
> One interesting thing is how the rtx "(high:DI (symbol_ref:DI (var" is
> generated.  It seems in the middle of optimization, it is checked to
> see if ok to store in constant memory.

It probably is fine to not allow the HIGH of *anything* to be put in a
constant pool, sure.  But again, that is a change independent of other
things, and it could use some supporting evidence.

> >>      case CONST_DOUBLE:
> >>      case CONST_WIDE_INT:
> >> +      /* It may needs a few insns for const to SET. -1 for outer SET code.  */
> >> +      if (outer_code == SET)
> >> +	{
> >> +	  *total = COSTS_N_INSNS (num_insns_constant (x, mode)) - 1;
> >> +	  return true;
> >> +	}
> >> +      /* FALLTHRU */
> >> +
> >>      case CONST:
> >>      case HIGH:
> >>      case SYMBOL_REF:
> >
> > But this again isn't an obvious improvement at all, and needs
> > performance testing -- separately of the other changes.
> 
> This code updates the cost of constant assigning, we need this to avoid
> CSE to replace the constant loading with constant assigning.

No.  This is rtx_cost, which makes a cost estimate of random RTL, not
typically corresponding to existing RTL insns at all.  We do not use it
a lot anymore thankfully (we use insn_cost in most places), but things
like CSE still use it.

> The above change (the threshold for storing const in the pool) depends
> on this code.

Yes, and a gazillion other places as well still (or about 50 really :-) )

> > And this is completely independent of the rest as well.  Cost 5 or 7 are
> > completely random numbers, why did you pick these?  Does it work better
> > than 8, or 4, etc.?
> For 'pld', asm code would be "pld %r9,.LC0@pcrel"; as tests, it is
> faster than 2insns (like, "li+oris"), but still slower than one
> instruction (e.g. li, pli).  So, "COSTS_N_INSNS (2) - 1" is selected.
> And with this, cse will not replace 'pld' with 'li + oris'. 

(That is WinNT assembler syntax btw.  We never use that.  We write
either "pld 9,.LC0@pcrel" or "pld r9,.LC0@pcrel".)

COSTS_N_INSNS(1) means 4, always.  The costs are not additive at all in
reality of course, so you cannot simply add them and get any sensible
result :-(

Why did you choose 7?  Your explanation makes 5 and 6 good candidates
as well.  My money is on 6 performing best here, but there is no way to
know without actually trying out things.

> >> --- a/gcc/testsuite/gcc.target/powerpc/medium_offset.c
> >> +++ b/gcc/testsuite/gcc.target/powerpc/medium_offset.c
> >> @@ -1,7 +1,7 @@
> >>  /* { dg-do compile { target { powerpc*-*-* } } } */
> >>  /* { dg-require-effective-target lp64 } */
> >>  /* { dg-options "-O" } */
> >> -/* { dg-final { scan-assembler-not "\\+4611686018427387904" } } */
> >> +/* { dg-final { scan-assembler-times {\msldi|pld\M} 1 } } */
> >
> > Why?  This is still better generated in code, no?  It should never be
> > loaded from a constant pool (it is hex 4000_0000_0000_0000, easy to
> > construct with just one or two insns).
> 
> For p8/9, two insns "lis 0x4000+sldi 32" are used:
>         addis %r3,%r2,.LANCHOR0@toc@ha
>         addi %r3,%r3,.LANCHOR0@toc@l
>         lis %r9,0x4000
>         sldi %r9,%r9,32
>         add %r3,%r3,%r9
> 	blr

That does not mean putting this constant in the constant pool is a good
idea at all, of course.

> On p10, as expected, 'pld' would be better than 'lis+sldi'.

Is it?

> And for this case, it also saves other instructions(addis/addi):
>         pld %r3,.LC1@pcrel
>         blr

This explanation then belongs in your commit message :-)

> > Btw, you need to write
> >   \m(?:rldimi|ld|pld)\M
> Oh, thanks!  I did not aware of this.

(?:xxx) is just the same as (xxx), but the parentheses in the latter are
capturing, which messes up scan-assembler-times, so you need
non-capturing grouping.

> Maybe I should separate out two cases one for "-mdejagnu-cpu=power10",
> and one for "-mdejagnu-cpu=power7", because the load instructions
> the number would be different.

Yeah, it is probably best to never allow both a load and non-load insns
in the same check.

> > So no doubt this will improve things, but we need testing of each part
> > separately.  Also look at code size, or differences in the generated
> > code in general: this is much more sensitive to detect than performance,
> > and is not itself sensitive to things like system load, so a) is easier
> > to measure, and b) has more useful outputs, outputs that tell more of
> > the whole story.
> Thanks for your suggestions!
> I also feel it is very sensitive when I test performance.

Yeah, there are constants *everywhere* in typical compiler output, so a
small change will easily move the needle already.

> The change
> (e.g. cost of constant) may affect other optimization passes indirectly.

That too, yes.

> I will also check the object changes (object size or differences). I may
> use objects from GCC bootstrap and GCC test suite.

Good choice :-)

Looking forward to it,


Segher

next prev parent reply	other threads:[~2022-06-15 17:45 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-14 13:23 Jiufu Guo
2022-06-14 22:34 ` Segher Boessenkool
2022-06-15  8:19   ` Jiufu Guo
2022-06-15 17:44     ` Segher Boessenkool [this message]
2022-06-16  7:47       ` Jiufu Guo
2022-06-21 18:10         ` Segher Boessenkool
  -- strict thread matches above, loose matches on Subject: below --
2022-05-10  9:28 [PATCH] Store complicated const " Jiufu Guo
2022-05-23  1:39 ` [PATCH V2]rs6000: Store complicated constant " Jiufu Guo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220615174441.GD25951@gate.crashing.org \
    --to=segher@kernel.crashing.org \
    --cc=amodra@gmail.com \
    --cc=dje.gcc@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=guojiufu@linux.ibm.com \
    --cc=linkw@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).