public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <rguenther@suse.de>
To: Jiufu Guo <guojiufu@linux.ibm.com>
Cc: Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org>,
	 Segher Boessenkool <segher@kernel.crashing.org>,
	wschmidt@linux.ibm.com,  jlaw@tachyum.com, dje.gcc@gmail.com
Subject: Re: [PATCH] Check if loading const from mem is faster
Date: Fri, 25 Feb 2022 09:45:45 +0100 (CET)	[thread overview]
Message-ID: <1q51o2n4-r721-qqo6-o186-7r04o4463n@fhfr.qr> (raw)
In-Reply-To: <h48lexzr2bb.fsf@genoa.aus.stglabs.ibm.com>

On Fri, 25 Feb 2022, Jiufu Guo wrote:

> Richard Biener <rguenther@suse.de> writes:
> 
> > On Thu, 24 Feb 2022, Jiufu Guo wrote:
> >
> >> Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> 
> >> > Segher Boessenkool <segher@kernel.crashing.org> writes:
> >> >
> >> >> On Wed, Feb 23, 2022 at 02:02:59PM +0100, Richard Biener wrote:
> >> >>> I'm assuming we're always dealing with
> >> >>> 
> >> >>>   (set (reg:MODE ..) <src_folded>)
> >> >>> 
> >> >>> here and CSE is not substituting into random places of an
> >> >>> instruction(?).  I don't know what 'rtx_cost' should evaluate
> >> >>> to for a constant, if it should implicitely evaluate the cost
> >> >>> of putting the result into a register for example.
> >> >>
> >> >> rtx_cost is no good here (and in most places).  rtx_cost should be 0
> >> >> for anything that is used as input in a machine instruction -- but you
> >> >> need much more context to determine that.  insn_cost is much simpler and
> >> >> much easier to use.
> >> >>
> >> >>> Using RTX_COST with SET and 1 at least looks no worse than using
> >> >>> your proposed new target hook and comparing it with the original
> >> >>> unfolded src (again with SET and 1).
> >> >>
> >> >> It is required to generate valid instructions no matter what, before
> >> >> the pass has finished that is.  On all more modern architectures it is
> >> >> futile to think you can usefully consider the cost of an RTL expression
> >> >> and derive a real-world cost of the generated code from that.
> >> >
> >> > Thanks Segher for pointing out these!  Here is  another reason that I
> >> > did not use rtx_cost: in a few passes, there are codes to check the
> >> > constants and store them in constant pool.  I'm thinking to integerate
> >> > those codes in a consistent way.
> >> 
> >> Hi Segher, Richard!
> >> 
> >> I'm thinking the way like: For a constant,
> >> 1. if the constant could be used as an immediate for the
> >> instruction, then retreated as an operand;
> >> 2. otherwise if the constant can not be stored into a
> >> constant pool, then handle through instructions;
> >> 3. if it is faster to access constant from pool, then emit
> >> constant as data(.rodata);
> >> 4. otherwise, handle the constant by instructions.
> >> 
> >> And to store the constant into a pool, besides force_const_mem,
> >> create reference (toc) may be needed on some platforms.
> >> 
> >> For this particular issue in CSE, there is already code that
> >> tries to put constant into a pool (invoke force_const_mem).
> >> While the code is too late.  So, we may check the constant
> >> earlier and store it into constant pool if profitable.
> >> 
> >> And another thing as Segher pointed out, CSE is doing too
> >> much work.  It may be ok to separate the constant handling
> >> logic from CSE.
> >
> > Not sure - CSE just is value numbering, I don't see that it does
> > more than that.  Yes, it might have developed "heuristics" over
> > the years what to CSE and to what and where to substitute and
> > where not.  But in the end it does just value numbering.
> >
> >> 
> >> I update a new version patch as follow (did not seprate CSE):
> >
> > How is the new target hook better in any way compared to rtx_cost
> > or insn_cost?  It looks like a total hack.
> >
> > I suppose the actual way of materializing a constant is done
> > behind GCCs back and not exposed anywhere?  But instead you
> > claim the constants are valid when they actually are not?
> > Isn't the problem then that the rs6000 backend lies?
> 
> Hi Richard,
> 
> Thanks for your comments and sugguestions!
> 
> Materializing a constant should be done behind GCC.
> On rs6000, in expand pass, during emit_move, the constant is
> checked and store into constant pool if necessary.
> Some other platforms are doing a similar thing, e.g.
> ix86_expand_vector_move, alpha_expand_mov,...
> mips_legitimize_const_move.
> 
> But, it does not as we expect, force_const_mem is also 
> exposed other places (not only ira/reload for stack reference).
> 
> CSE is one place, for example, CSE first retrieve the constant
> from insn's equal, but it also tries to put constant into
> pool for some condition (the condition was introduced at
> early age of cse.c, and it is rare to run into in trunk).
> In some aspects, IMHO, this seems not a great work of CSE.
> 
> And this is how the 'invalid(or say slow)' constant occurs.
> e.g.  before cse:
>     7: r119:DI=[unspec[`*.LC0',%r2:DI] 47]
>       REG_EQUAL 0x100803004101001
> after cse: 
>     7: r119:DI=0x100803004101001
>       REG_EQUAL 0x100803004101001
> 
> As you pointed out: we can also avoid this transformation through
> rtx_cost/insn_cost by estimating the COST more accurately for
>  "r119:DI=0x100803004101001". (r119:DI=0x100803004101001 will not
> be a real final instruction.)

At which point does this become the final instruction?  I suppose
CSE belives this constant is legitimate and the insn is recognized
just fine in CSE?  (that's what I mean with "behind GCCs back")

> Is it necessary to refine this constant handling for CSE, or even
> to eliminate the logic on constant extracting for an insn, beside
> updating rtx_cost/insn_cost?

So if CSE really just transforms

>     7: r119:DI=[unspec[`*.LC0',%r2:DI] 47]
>       REG_EQUAL 0x100803004101001

to

>     7: r119:DI=0x100803004101001
>       REG_EQUAL 0x100803004101001

then why can't rtx_cost (SET, 1) or insn_cost () be used to
accurately tell it that the immediate is going to be a lot
more expensive?

That is, rtx_cost (CONST_INT-rtx, DI, SET, 1, ...) is accurate
enough to be treated as an actual insn (it _might_ be part of
a parallel I guess, but that's unlikely) and thus the backend
should have a very good idea of the many instruction it needs
for this.  Likewise rtx_cost ([unspec[`*.LC0',%r2:DI] 47], DI, SET, 1, 
...)
should receive accurate cost that can be compared to other
rtx_cost (..., DI, SET, 1, ...)

And CSE doesn't even need to create fake insns here since IMHO
rtx_cost is good enough to capture the full insn.  Targets can
choose to split out a set_cost from their rtx_cost/insn_cost
hooks for this case for example.

Richard.

> Any sugguestions?
> 
> >
> > Btw, all of this is of course not appropriate for stage4 and changes
> > to CSE need testing on more than one target.
> I would do more evaluation, thanks!
> 
> Jiufu
> 
> >
> > Richard.
> >
> >> Thanks for the comments and suggestions again!
> >> 
> >> 
> >> BR,
> >> Jiufu
> >> 
> >> ---
> >>  gcc/config/rs6000/rs6000.cc                   | 39 ++++++++++++++-----
> >>  gcc/cse.cc                                    | 36 ++++++++---------
> >>  gcc/doc/tm.texi                               |  5 +++
> >>  gcc/doc/tm.texi.in                            |  2 +
> >>  gcc/target.def                                |  8 ++++
> >>  gcc/targhooks.cc                              |  6 +++
> >>  gcc/targhooks.h                               |  1 +
> >>  .../gcc.target/powerpc/medium_offset.c        |  2 +-
> >>  gcc/testsuite/gcc.target/powerpc/pr63281.c    | 14 +++++++
> >>  gcc/testsuite/gcc.target/powerpc/pr93012.c    |  2 +-
> >>  10 files changed, 84 insertions(+), 31 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c
> >> 
> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> >> index d7a7cfe860f..0a8f487d516 100644
> >> --- a/gcc/config/rs6000/rs6000.cc
> >> +++ b/gcc/config/rs6000/rs6000.cc
> >> @@ -1361,6 +1361,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
> >>  #undef TARGET_CANNOT_FORCE_CONST_MEM
> >>  #define TARGET_CANNOT_FORCE_CONST_MEM rs6000_cannot_force_const_mem
> >>  
> >> +#undef TARGET_FASTER_LOADING_CONSTANT
> >> +#define TARGET_FASTER_LOADING_CONSTANT rs6000_faster_loading_const
> >> +
> >>  #undef TARGET_DELEGITIMIZE_ADDRESS
> >>  #define TARGET_DELEGITIMIZE_ADDRESS rs6000_delegitimize_address
> >>  
> >> @@ -9684,8 +9687,8 @@ rs6000_init_stack_protect_guard (void)
> >>  static bool
> >>  rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
> >>  {
> >> -  if (GET_CODE (x) == HIGH
> >> -      && GET_CODE (XEXP (x, 0)) == UNSPEC)
> >> +  /* Exclude CONSTANT HIGH part.  */
> >> +  if (GET_CODE (x) == HIGH)
> >>      return true;
> >>  
> >>    /* A TLS symbol in the TOC cannot contain a sum.  */
> >> @@ -10483,6 +10486,30 @@ rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode)
> >>    return false;
> >>  }
> >>  
> >> +
> >> +/* Implement TARGET_FASTER_LOADING_CONSTANT.  */
> >> +
> >> +static bool
> >> +rs6000_faster_loading_const (machine_mode mode, rtx dest, rtx src)
> >> +{
> >> +  gcc_assert (CONSTANT_P (src));
> >> +
> >> +  if (GET_MODE_CLASS (mode) == MODE_CC || mode == VOIDmode)
> >> +    return false;
> >> +  if (GET_CODE (src) == HIGH)
> >> +    return false;
> >> +  if (toc_relative_expr_p (src, false, NULL, NULL))
> >> +    return false;
> >> +  if (rs6000_cannot_force_const_mem (mode, src))
> >> +    return false;
> >> +
> >> +  if (REG_P (dest) && FP_REGNO_P (REGNO (dest)))
> >> +    return true;
> >> +  if (!CONST_INT_P (src))
> >> +    return true;
> >> +  return num_insns_constant (src, mode) > (TARGET_PCREL ? 1 : 2);
> >> +}
> >> +
> >>  /* Emit a move from SOURCE to DEST in mode MODE.  */
> >>  void
> >>  rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
> >> @@ -10860,13 +10887,7 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
> >>  	operands[1] = create_TOC_reference (operands[1], operands[0]);
> >>        else if (mode == Pmode
> >>  	       && CONSTANT_P (operands[1])
> >> -	       && GET_CODE (operands[1]) != HIGH
> >> -	       && ((REG_P (operands[0])
> >> -		    && FP_REGNO_P (REGNO (operands[0])))
> >> -		   || !CONST_INT_P (operands[1])
> >> -		   || (num_insns_constant (operands[1], mode)
> >> -		       > (TARGET_CMODEL != CMODEL_SMALL ? 3 : 2)))
> >> -	       && !toc_relative_expr_p (operands[1], false, NULL, NULL)
> >> +	       && rs6000_faster_loading_const (mode, operands[0], operands[1])
> >>  	       && (TARGET_CMODEL == CMODEL_SMALL
> >>  		   || can_create_pseudo_p ()
> >>  		   || (REG_P (operands[0])
> >> diff --git a/gcc/cse.cc b/gcc/cse.cc
> >> index a18b599d324..53a7b3556b3 100644
> >> --- a/gcc/cse.cc
> >> +++ b/gcc/cse.cc
> >> @@ -5192,6 +5192,22 @@ cse_insn (rtx_insn *insn)
> >>  	      src_elt_regcost = elt->regcost;
> >>  	    }
> >>  
> >> +	  /* If the current function uses a constant pool and this is a
> >> +	     constant, try making a pool entry. Put it in src_folded
> >> +	     unless we already have done this since that is where it
> >> +	     likely came from.  */
> >> +
> >> +	  if (crtl->uses_const_pool && src_folded && CONSTANT_P (src_folded)
> >> +	      && targetm.faster_loading_constant (mode, dest, src_folded))
> >> +	    {
> >> +	      src_folded = force_const_mem (mode, src_folded);
> >> +	      if (src_folded)
> >> +		{
> >> +		  src_folded_cost = COST (src_folded, mode);
> >> +		  src_folded_regcost = approx_reg_cost (src_folded);
> >> +		}
> >> +	    }
> >> +
> >>  	  /* Find cheapest and skip it for the next time.   For items
> >>  	     of equal cost, use this order:
> >>  	     src_folded, src, src_eqv, src_related and hash table entry.  */
> >> @@ -5390,26 +5406,6 @@ cse_insn (rtx_insn *insn)
> >>  
> >>  	      break;
> >>  	    }
> >> -
> >> -	  /* If the current function uses a constant pool and this is a
> >> -	     constant, try making a pool entry. Put it in src_folded
> >> -	     unless we already have done this since that is where it
> >> -	     likely came from.  */
> >> -
> >> -	  else if (crtl->uses_const_pool
> >> -		   && CONSTANT_P (trial)
> >> -		   && !CONST_INT_P (trial)
> >> -		   && (src_folded == 0 || !MEM_P (src_folded))
> >> -		   && GET_MODE_CLASS (mode) != MODE_CC
> >> -		   && mode != VOIDmode)
> >> -	    {
> >> -	      src_folded = force_const_mem (mode, trial);
> >> -	      if (src_folded)
> >> -		{
> >> -		  src_folded_cost = COST (src_folded, mode);
> >> -		  src_folded_regcost = approx_reg_cost (src_folded);
> >> -		}
> >> -	    }
> >>  	}
> >>  
> >>        /* If we changed the insn too much, handle this set from scratch.  */
> >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> >> index 962bbb8caaf..65685311249 100644
> >> --- a/gcc/doc/tm.texi
> >> +++ b/gcc/doc/tm.texi
> >> @@ -6014,6 +6014,11 @@ holding the constant.  This restriction is often true of addresses
> >>  of TLS symbols for various targets.
> >>  @end deftypefn
> >>  
> >> +@deftypefn {Target Hook} bool TARGET_FASTER_LOADING_CONSTANT (machine_mode @var{mode}, rtx @var{y}, rtx @var{x})
> >> +It returns true if loading contant @var{x} is faster than building
> >> +from scratch, and set to @var{y}.  @var{mode} is the mode of @var{x}.
> >> +@end deftypefn
> >> +
> >>  @deftypefn {Target Hook} bool TARGET_USE_BLOCKS_FOR_CONSTANT_P (machine_mode @var{mode}, const_rtx @var{x})
> >>  This hook should return true if pool entries for constant @var{x} can
> >>  be placed in an @code{object_block} structure.  @var{mode} is the mode
> >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> >> index 394b59e70a6..918914f0e30 100644
> >> --- a/gcc/doc/tm.texi.in
> >> +++ b/gcc/doc/tm.texi.in
> >> @@ -4148,6 +4148,8 @@ address;  but often a machine-dependent strategy can generate better code.
> >>  
> >>  @hook TARGET_CANNOT_FORCE_CONST_MEM
> >>  
> >> +@hook TARGET_FASTER_LOADING_CONSTANT
> >> +
> >>  @hook TARGET_USE_BLOCKS_FOR_CONSTANT_P
> >>  
> >>  @hook TARGET_USE_BLOCKS_FOR_DECL_P
> >> diff --git a/gcc/target.def b/gcc/target.def
> >> index 57e64b20eef..8b007aca9dc 100644
> >> --- a/gcc/target.def
> >> +++ b/gcc/target.def
> >> @@ -2987,6 +2987,14 @@ locally.  The default is to return false.",
> >>   bool, (void),
> >>   hook_bool_void_false)
> >>  
> >> +/* Check if it is profitable to load the constant from constant pool.  */
> >> +DEFHOOK
> >> +(faster_loading_constant,
> >> + "It returns true if loading contant @var{x} is faster than building\n\
> >> +from scratch, and set to @var{y}.  @var{mode} is the mode of @var{x}.",
> >> + bool, (machine_mode mode, rtx y, rtx x),
> >> + default_faster_loading_constant)
> >> +
> >>  /* True if it is OK to do sibling call optimization for the specified
> >>     call expression EXP.  DECL will be the called function, or NULL if
> >>     this is an indirect call.  */
> >> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
> >> index fc49235eb38..12de61a7ed8 100644
> >> --- a/gcc/targhooks.cc
> >> +++ b/gcc/targhooks.cc
> >> @@ -173,6 +173,12 @@ default_return_in_memory (const_tree type,
> >>    return (TYPE_MODE (type) == BLKmode);
> >>  }
> >>  
> >> +bool
> >> +default_faster_loading_constant (machine_mode, rtx, rtx)
> >> +{
> >> +  return false;
> >> +}
> >> +
> >>  rtx
> >>  default_legitimize_address (rtx x, rtx orig_x ATTRIBUTE_UNUSED,
> >>  			    machine_mode mode ATTRIBUTE_UNUSED)
> >> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> >> index ecfa11287ef..4db21cd59f6 100644
> >> --- a/gcc/targhooks.h
> >> +++ b/gcc/targhooks.h
> >> @@ -295,5 +295,6 @@ extern rtx default_memtag_extract_tag (rtx, rtx);
> >>  extern rtx default_memtag_untagged_pointer (rtx, rtx);
> >>  
> >>  extern HOST_WIDE_INT default_gcov_type_size (void);
> >> +extern bool default_faster_loading_constant (machine_mode, rtx, rtx);
> >>  
> >>  #endif /* GCC_TARGHOOKS_H */
> >> diff --git a/gcc/testsuite/gcc.target/powerpc/medium_offset.c b/gcc/testsuite/gcc.target/powerpc/medium_offset.c
> >> index f29eba08c38..4889e8fa8ec 100644
> >> --- a/gcc/testsuite/gcc.target/powerpc/medium_offset.c
> >> +++ b/gcc/testsuite/gcc.target/powerpc/medium_offset.c
> >> @@ -1,7 +1,7 @@
> >>  /* { dg-do compile { target { powerpc*-*-* } } } */
> >>  /* { dg-require-effective-target lp64 } */
> >>  /* { dg-options "-O" } */
> >> -/* { dg-final { scan-assembler-not "\\+4611686018427387904" } } */
> >> +/* { dg-final { scan-assembler-times {\msldi|pld\M} 1 } } */
> >>  
> >>  static int x;
> >>  
> >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr63281.c b/gcc/testsuite/gcc.target/powerpc/pr63281.c
> >> new file mode 100644
> >> index 00000000000..58ba074d427
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c
> >> @@ -0,0 +1,14 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2" } */
> >> +
> >> +#define CONST1 0x100803004101001
> >> +#define CONST2 0x000406002105007
> >> +
> >> +void __attribute__ ((noinline))
> >> +foo (unsigned long long *a, unsigned long long *b)
> >> +{
> >> +  *a = CONST1;
> >> +  *b = CONST2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {\mld|pld\M} 2 } } */
> >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr93012.c b/gcc/testsuite/gcc.target/powerpc/pr93012.c
> >> index 4f764d0576f..5afb4f79c45 100644
> >> --- a/gcc/testsuite/gcc.target/powerpc/pr93012.c
> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr93012.c
> >> @@ -10,4 +10,4 @@ unsigned long long mskh1() { return 0xffff9234ffff9234ULL; }
> >>  unsigned long long mskl1() { return 0x2bcdffff2bcdffffULL; }
> >>  unsigned long long mskse() { return 0xffff1234ffff1234ULL; }
> >>  
> >> -/* { dg-final { scan-assembler-times {\mrldimi\M} 7 } } */
> >> +/* { dg-final { scan-assembler-times {\mrldimi|ld|pld\M} 7 } } */
> >> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)

  reply	other threads:[~2022-02-25  8:45 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-22  6:53 Jiufu Guo
2022-02-22  7:26 ` Richard Biener
2022-02-23 11:32   ` guojiufu
2022-02-23 13:02     ` Richard Biener
2022-02-23 21:14       ` Segher Boessenkool
2022-02-24  5:56         ` Jiufu Guo
2022-02-24  6:33           ` Jiufu Guo
2022-02-24  8:50             ` Richard Biener
2022-02-25  4:35               ` Jiufu Guo
2022-02-25  8:45                 ` Richard Biener [this message]
2022-02-25 13:32                   ` Jiufu Guo
2022-02-25 13:57                     ` Richard Biener
2022-02-28  9:15                       ` Jiufu Guo
2022-02-28 17:03               ` Segher Boessenkool
2022-03-01  2:59                 ` Jiufu Guo
2022-03-01  7:47                   ` Richard Biener
2022-03-01 13:47                     ` Jiufu Guo
2022-03-02 19:15                     ` Jeff Law
2022-03-03 10:08                       ` Jiufu Guo
2022-02-23 21:27     ` Segher Boessenkool
2022-02-24  7:48       ` Jiufu Guo
2022-02-28 16:45         ` Segher Boessenkool
2022-03-01 14:28           ` Jiufu Guo
2022-03-02 20:24             ` Segher Boessenkool
2022-03-03 10:09               ` Jiufu Guo
2022-03-08 11:25                 ` Jiufu Guo
2022-03-08 11:50                   ` Richard Biener
2022-03-09  4:37                     ` Jiufu Guo
2022-03-09  7:41                       ` Richard Biener
2022-03-10  2:09                         ` Jiufu Guo
2022-03-10  7:09                           ` Richard Biener
2022-03-11  6:33                             ` Jiufu Guo
2022-02-22 17:30 ` Segher Boessenkool
2022-02-23  3:31   ` guojiufu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1q51o2n4-r721-qqo6-o186-7r04o4463n@fhfr.qr \
    --to=rguenther@suse.de \
    --cc=dje.gcc@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=guojiufu@linux.ibm.com \
    --cc=jlaw@tachyum.com \
    --cc=segher@kernel.crashing.org \
    --cc=wschmidt@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).