public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <rguenther@suse.de>
To: Jiufu Guo <guojiufu@linux.ibm.com>
Cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org,
	dje.gcc@gmail.com,  jlaw@tachyum.com, wschmidt@linux.ibm.com
Subject: Re: [PATCH] Check if loading const from mem is faster
Date: Tue, 22 Feb 2022 08:26:59 +0100 (CET)	[thread overview]
Message-ID: <70r5oq10-988r-3rns-356-o3s79o292nn0@fhfr.qr> (raw)
In-Reply-To: <20220222065313.2040127-1-guojiufu@linux.ibm.com>

On Tue, 22 Feb 2022, Jiufu Guo wrote:

> Hi,
> 
> For constants, there are some codes to check: if it is able to put
> to instruction as an immediate operand or it is profitable to load from
> mem.  There are still some places that could be improved for platforms.
> 
> This patch could handle PR63281/57836.  This patch does not change
> too much on the code like force_const_mem and legitimate_constant_p.
> We may integrate these APIs for passes like expand/cse/combine
> as a whole solution in the future (maybe better for stage1?).
> 
> Bootstrap and regtest pass on ppc64le and x86_64. Is this ok for trunk?
> Thanks for comments!

I'm not sure whether we need a new hook here, but iff, then I think
whether loading a constant (from memory?) is faster or not depends
on the context.  So what's the exact situation and the two variants
you are costing against each other?  I assume (since you are
touching CSE) you are costing

  (set (...) (mem))  (before CSE)

against

  (set (...) (immediate))  (what CSE does now)

vs.

  (set (...) (mem))  (original, no CSE)

?  With the new hook you are skipping _all_ of the following loops
logic which does look like a quite bad design and hack (not that
I am very familiar with the candidate / costing logic in there).

We already have TARGET_INSN_COST which you could ask for a cost.
Like if we'd have a single_set then just temporarily substitute
the RHS with the candidate and cost the insns and compare against
the original insn cost.  So why exactly do you need a new hook
for this particular situation?

Thanks,
Richard.


> 
> BR,
> Jiufu
> 
> gcc/ChangeLog:
> 
> 	PR target/94393
> 	PR rtl-optimization/63281
> 	* config/rs6000/rs6000.cc (TARGET_FASTER_LOADING_CONSTANT):
> 	New hook.
> 	(rs6000_faster_loading_const): New hook.
> 	(rs6000_cannot_force_const_mem): Update.
> 	(rs6000_emit_move): Use rs6000_faster_loading_const.
> 	* cse.cc (cse_insn): Use new hook.
> 	* doc/tm.texi.in: New hook TARGET_FASTER_LOADING_CONSTANT.
> 	* doc/tm.texi: Regenerate.
> 	* target.def: New hook faster_loading_constant.
> 	* targhooks.cc (default_faster_loading_constant): New.
> 	* targhooks.h (default_faster_loading_constant): Declare it.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/powerpc/medium_offset.c: Update.
> 	* gcc.target/powerpc/pr93012.c: Update.
> 	* gcc.target/powerpc/pr63281.c: New test.
> 
> ---
>  gcc/config/rs6000/rs6000.cc                   | 38 ++++++++++++++-----
>  gcc/cse.cc                                    |  5 +++
>  gcc/doc/tm.texi                               |  5 +++
>  gcc/doc/tm.texi.in                            |  2 +
>  gcc/target.def                                |  8 ++++
>  gcc/targhooks.cc                              |  6 +++
>  .../gcc.target/powerpc/medium_offset.c        |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr63281.c    | 14 +++++++
>  gcc/testsuite/gcc.target/powerpc/pr93012.c    |  2 +-
>  9 files changed, 71 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index d7a7cfe860f..beb21a0bb2b 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1361,6 +1361,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
>  #undef TARGET_CANNOT_FORCE_CONST_MEM
>  #define TARGET_CANNOT_FORCE_CONST_MEM rs6000_cannot_force_const_mem
>  
> +#undef TARGET_FASTER_LOADING_CONSTANT
> +#define TARGET_FASTER_LOADING_CONSTANT rs6000_faster_loading_const
> +
>  #undef TARGET_DELEGITIMIZE_ADDRESS
>  #define TARGET_DELEGITIMIZE_ADDRESS rs6000_delegitimize_address
>  
> @@ -9684,8 +9687,7 @@ rs6000_init_stack_protect_guard (void)
>  static bool
>  rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>  {
> -  if (GET_CODE (x) == HIGH
> -      && GET_CODE (XEXP (x, 0)) == UNSPEC)
> +  if (GET_CODE (x) == HIGH)
>      return true;
>  
>    /* A TLS symbol in the TOC cannot contain a sum.  */
> @@ -10483,6 +10485,30 @@ rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode)
>    return false;
>  }
>  
> +
> +/* Implement TARGET_FASTER_LOADING_CONSTANT.  */
> +
> +static bool
> +rs6000_faster_loading_const (machine_mode mode, rtx dest, rtx src)
> +{
> +  gcc_assert (CONSTANT_P (src));
> +
> +  if (GET_MODE_CLASS (mode) == MODE_CC || mode == VOIDmode)
> +    return false;
> +  if (GET_CODE (src) == HIGH)
> +    return false;
> +  if (toc_relative_expr_p (src, false, NULL, NULL))
> +    return false;
> +  if (rs6000_cannot_force_const_mem (mode, src))
> +    return false;
> +
> +  if (REG_P (dest) && FP_REGNO_P (REGNO (dest)))
> +    return true;
> +  if (!CONST_INT_P (src))
> +    return true;
> +  return num_insns_constant (src, mode) > (TARGET_PCREL ? 1 : 2);
> +}
> +
>  /* Emit a move from SOURCE to DEST in mode MODE.  */
>  void
>  rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
> @@ -10860,13 +10886,7 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
>  	operands[1] = create_TOC_reference (operands[1], operands[0]);
>        else if (mode == Pmode
>  	       && CONSTANT_P (operands[1])
> -	       && GET_CODE (operands[1]) != HIGH
> -	       && ((REG_P (operands[0])
> -		    && FP_REGNO_P (REGNO (operands[0])))
> -		   || !CONST_INT_P (operands[1])
> -		   || (num_insns_constant (operands[1], mode)
> -		       > (TARGET_CMODEL != CMODEL_SMALL ? 3 : 2)))
> -	       && !toc_relative_expr_p (operands[1], false, NULL, NULL)
> +	       && rs6000_faster_loading_const (mode, operands[0], operands[1])
>  	       && (TARGET_CMODEL == CMODEL_SMALL
>  		   || can_create_pseudo_p ()
>  		   || (REG_P (operands[0])
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index a18b599d324..c33d3c7e911 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -5158,6 +5158,11 @@ cse_insn (rtx_insn *insn)
>        if (dest == pc_rtx && src_const && GET_CODE (src_const) == LABEL_REF)
>  	src_folded = src_const, src_folded_cost = src_folded_regcost = -1;
>  
> +      /* Check if src_foled is contant which is fastster to load pool.  */
> +      if (src_folded && CONSTANT_P (src_folded) && src && MEM_P (src)
> +	  && targetm.faster_loading_constant (mode, dest, src_folded))
> +	src_folded = NULL_RTX, src_folded_cost = src_folded_regcost = -1;
> +
>        /* Terminate loop when replacement made.  This must terminate since
>           the current contents will be tested and will always be valid.  */
>        while (1)
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 962bbb8caaf..65685311249 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -6014,6 +6014,11 @@ holding the constant.  This restriction is often true of addresses
>  of TLS symbols for various targets.
>  @end deftypefn
>  
> +@deftypefn {Target Hook} bool TARGET_FASTER_LOADING_CONSTANT (machine_mode @var{mode}, rtx @var{y}, rtx @var{x})
> +It returns true if loading contant @var{x} is faster than building
> +from scratch, and set to @var{y}.  @var{mode} is the mode of @var{x}.
> +@end deftypefn
> +
>  @deftypefn {Target Hook} bool TARGET_USE_BLOCKS_FOR_CONSTANT_P (machine_mode @var{mode}, const_rtx @var{x})
>  This hook should return true if pool entries for constant @var{x} can
>  be placed in an @code{object_block} structure.  @var{mode} is the mode
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 394b59e70a6..918914f0e30 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -4148,6 +4148,8 @@ address;  but often a machine-dependent strategy can generate better code.
>  
>  @hook TARGET_CANNOT_FORCE_CONST_MEM
>  
> +@hook TARGET_FASTER_LOADING_CONSTANT
> +
>  @hook TARGET_USE_BLOCKS_FOR_CONSTANT_P
>  
>  @hook TARGET_USE_BLOCKS_FOR_DECL_P
> diff --git a/gcc/target.def b/gcc/target.def
> index 57e64b20eef..8b007aca9dc 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -2987,6 +2987,14 @@ locally.  The default is to return false.",
>   bool, (void),
>   hook_bool_void_false)
>  
> +/* Check if it is profitable to load the constant from constant pool.  */
> +DEFHOOK
> +(faster_loading_constant,
> + "It returns true if loading contant @var{x} is faster than building\n\
> +from scratch, and set to @var{y}.  @var{mode} is the mode of @var{x}.",
> + bool, (machine_mode mode, rtx y, rtx x),
> + default_faster_loading_constant)
> +
>  /* True if it is OK to do sibling call optimization for the specified
>     call expression EXP.  DECL will be the called function, or NULL if
>     this is an indirect call.  */
> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
> index fc49235eb38..1d9340d1c14 100644
> --- a/gcc/targhooks.cc
> +++ b/gcc/targhooks.cc
> @@ -173,6 +173,12 @@ default_return_in_memory (const_tree type,
>    return (TYPE_MODE (type) == BLKmode);
>  }
>  
> +bool
> +default_faster_loading_constant (machine_mode mode, rtx dest, rtx src)
> +{
> +  return false;
> +}
> +
>  rtx
>  default_legitimize_address (rtx x, rtx orig_x ATTRIBUTE_UNUSED,
>  			    machine_mode mode ATTRIBUTE_UNUSED)
> diff --git a/gcc/testsuite/gcc.target/powerpc/medium_offset.c b/gcc/testsuite/gcc.target/powerpc/medium_offset.c
> index f29eba08c38..4889e8fa8ec 100644
> --- a/gcc/testsuite/gcc.target/powerpc/medium_offset.c
> +++ b/gcc/testsuite/gcc.target/powerpc/medium_offset.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
>  /* { dg-require-effective-target lp64 } */
>  /* { dg-options "-O" } */
> -/* { dg-final { scan-assembler-not "\\+4611686018427387904" } } */
> +/* { dg-final { scan-assembler-times {\msldi|pld\M} 1 } } */
>  
>  static int x;
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr63281.c b/gcc/testsuite/gcc.target/powerpc/pr63281.c
> new file mode 100644
> index 00000000000..58ba074d427
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#define CONST1 0x100803004101001
> +#define CONST2 0x000406002105007
> +
> +void __attribute__ ((noinline))
> +foo (unsigned long long *a, unsigned long long *b)
> +{
> +  *a = CONST1;
> +  *b = CONST2;
> +}
> +
> +/* { dg-final { scan-assembler-times {\mld|pld\M} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr93012.c b/gcc/testsuite/gcc.target/powerpc/pr93012.c
> index 4f764d0576f..5afb4f79c45 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr93012.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr93012.c
> @@ -10,4 +10,4 @@ unsigned long long mskh1() { return 0xffff9234ffff9234ULL; }
>  unsigned long long mskl1() { return 0x2bcdffff2bcdffffULL; }
>  unsigned long long mskse() { return 0xffff1234ffff1234ULL; }
>  
> -/* { dg-final { scan-assembler-times {\mrldimi\M} 7 } } */
> +/* { dg-final { scan-assembler-times {\mrldimi|ld|pld\M} 7 } } */
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)

  reply	other threads:[~2022-02-22  7:27 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-22  6:53 Jiufu Guo
2022-02-22  7:26 ` Richard Biener [this message]
2022-02-23 11:32   ` guojiufu
2022-02-23 13:02     ` Richard Biener
2022-02-23 21:14       ` Segher Boessenkool
2022-02-24  5:56         ` Jiufu Guo
2022-02-24  6:33           ` Jiufu Guo
2022-02-24  8:50             ` Richard Biener
2022-02-25  4:35               ` Jiufu Guo
2022-02-25  8:45                 ` Richard Biener
2022-02-25 13:32                   ` Jiufu Guo
2022-02-25 13:57                     ` Richard Biener
2022-02-28  9:15                       ` Jiufu Guo
2022-02-28 17:03               ` Segher Boessenkool
2022-03-01  2:59                 ` Jiufu Guo
2022-03-01  7:47                   ` Richard Biener
2022-03-01 13:47                     ` Jiufu Guo
2022-03-02 19:15                     ` Jeff Law
2022-03-03 10:08                       ` Jiufu Guo
2022-02-23 21:27     ` Segher Boessenkool
2022-02-24  7:48       ` Jiufu Guo
2022-02-28 16:45         ` Segher Boessenkool
2022-03-01 14:28           ` Jiufu Guo
2022-03-02 20:24             ` Segher Boessenkool
2022-03-03 10:09               ` Jiufu Guo
2022-03-08 11:25                 ` Jiufu Guo
2022-03-08 11:50                   ` Richard Biener
2022-03-09  4:37                     ` Jiufu Guo
2022-03-09  7:41                       ` Richard Biener
2022-03-10  2:09                         ` Jiufu Guo
2022-03-10  7:09                           ` Richard Biener
2022-03-11  6:33                             ` Jiufu Guo
2022-02-22 17:30 ` Segher Boessenkool
2022-02-23  3:31   ` guojiufu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=70r5oq10-988r-3rns-356-o3s79o292nn0@fhfr.qr \
    --to=rguenther@suse.de \
    --cc=dje.gcc@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=guojiufu@linux.ibm.com \
    --cc=jlaw@tachyum.com \
    --cc=segher@kernel.crashing.org \
    --cc=wschmidt@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).