public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Palmer Dabbelt <palmer@dabbelt.com>
To: manolis.tsamis@vrull.eu
Cc: gcc-patches@gcc.gnu.org, Vineet Gupta <vineetg@rivosinc.com>,
	Kito Cheng <kito.cheng@gmail.com>,
	philipp.tomsich@vrull.eu
Subject: Re: [PATCH] Enable shrink wrapping for the RISC-V target.
Date: Sun, 02 Oct 2022 13:32:29 -0700 (PDT)	[thread overview]
Message-ID: <mhng-a82e0def-6d5a-4983-98ba-b527e20a9781@palmer-ri-x1c9> (raw)
In-Reply-To: <20220906103902.2719585-1-manolis.tsamis@vrull.eu>

On Tue, 06 Sep 2022 03:39:02 PDT (-0700), manolis.tsamis@vrull.eu wrote:
> This commit implements the target macros (TARGET_SHRINK_WRAP_*) that
> enable separate shrink wrapping for function prologues/epilogues in
> RISC-V.
>
> Tested against SPEC CPU 2017, this change always has a net-positive
> effect on the dynamic instruction count.  See the following table for
> the breakdown on how this reduces the number of dynamic instructions
> per workload on a like-for-like (i.e., same config file; suppressing
> shrink-wrapping with -fno-shrink-wrap):

Does this also pass the regression tests?

(there's also some comments on the code in-line)

>
>                              # dynamic instructions
>                 w/o shrink-wrap   w/ shrink-wrap      reduction
> 500.perlbench_r   1265716786593    1262156218578     3560568015   0.28%
> 500.perlbench_r    779224795689     765337009025    13887786664   1.78%
> 500.perlbench_r    724087331471     711307152522    12780178949   1.77%
> 502.gcc_r          204259864844     194517006339     9742858505   4.77%
> 502.gcc_r          244047794302     231555834722    12491959580   5.12%
> 502.gcc_r          230896069400     221877703011     9018366389   3.91%
> 502.gcc_r          192130616624     183856450605     8274166019   4.31%
> 502.gcc_r          258875074079     247756203226    11118870853   4.30%
> 505.mcf_r          662653430325     660678680547     1974749778   0.30%
> 520.omnetpp_r      985114167068     934191310154    50922856914   5.17%
> 523.xalancbmk_r    927037633578     921688937650     5348695928   0.58%
> 525.x264_r         490953958454     490565583447      388375007   0.08%
> 525.x264_r        1994662294421    1993171932425     1490361996   0.07%
> 525.x264_r        1897617120450    1896062750609     1554369841   0.08%
> 531.deepsjeng_r   1695189878907    1669304130411    25885748496   1.53%
> 541.leela_r       1925941222222    1897900861198    28040361024   1.46%
> 548.exchange2_r   2073816227944    2073816226729           1215   0.00%
> 557.xz_r           379572090003     379057409041      514680962   0.14%
> 557.xz_r           953117469352     952680431430      437037922   0.05%
> 557.xz_r           536859579650     536456690164      402889486   0.08%
>                  18421773405376   18223938521833   197834883543   1.07%  totals
>
> Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
>
> gcc/ChangeLog:
>
> 	* config/riscv/riscv.cc (struct machine_function): Add array to store
> 	register wrapping information.
> 	(riscv_for_each_saved_reg): Skip registers that are wrapped separetely.
> 	(riscv_get_separate_components): New function.
> 	(riscv_components_for_bb): Likewise.
> 	(riscv_disqualify_components): Likewise.
> 	(riscv_process_components): Likewise.
> 	(riscv_emit_prologue_components): Likewise.
> 	(riscv_emit_epilogue_components): Likewise.
> 	(riscv_set_handled_components): Likewise.
> 	(TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS): Define.
> 	(TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB): Likewise.
> 	(TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS): Likewise.
> 	(TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS): Likewise.
> 	(TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS): Likewise.
> 	(TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/riscv/shrink-wrap-1.c: New test.
>
> ---
>
>  gcc/config/riscv/riscv.cc                     | 187 +++++++++++++++++-
>  .../gcc.target/riscv/shrink-wrap-1.c          |  25 +++
>  2 files changed, 210 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/shrink-wrap-1.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 5a0adffb5ce..3b633149a9a 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "config.h"
>  #include "system.h"
>  #include "coretypes.h"
> +#include "backend.h"
>  #include "tm.h"
>  #include "rtl.h"
>  #include "regs.h"
> @@ -52,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "optabs.h"
>  #include "bitmap.h"
>  #include "df.h"
> +#include "function-abi.h"
>  #include "diagnostic.h"
>  #include "builtins.h"
>  #include "predict.h"
> @@ -147,6 +149,11 @@ struct GTY(())  machine_function {
>
>    /* The current frame information, calculated by riscv_compute_frame_info.  */
>    struct riscv_frame_info frame;
> +
> +  /* The components already handled by separate shrink-wrapping, which should
> +     not be considered by the prologue and epilogue.  */
> +  bool reg_is_wrapped_separately[FIRST_PSEUDO_REGISTER];
> +
>  };
>
>  /* Information about a single argument.  */
> @@ -4209,7 +4216,7 @@ riscv_for_each_saved_reg (HOST_WIDE_INT sp_offset, riscv_save_restore_fn fn,
>    for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
>      if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
>        {
> -	bool handle_reg = TRUE;
> +	bool handle_reg = !cfun->machine->reg_is_wrapped_separately[regno];
>
>  	/* If this is a normal return in a function that calls the eh_return
>  	   builtin, then do not restore the eh return data registers as that
> @@ -4240,9 +4247,11 @@ riscv_for_each_saved_reg (HOST_WIDE_INT sp_offset, riscv_save_restore_fn fn,
>    for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
>      if (BITSET_P (cfun->machine->frame.fmask, regno - FP_REG_FIRST))
>        {
> +	bool handle_reg = !cfun->machine->reg_is_wrapped_separately[regno];
>  	machine_mode mode = TARGET_DOUBLE_FLOAT ? DFmode : SFmode;
>
> -	riscv_save_restore_reg (mode, regno, offset, fn);
> +	if (handle_reg)
> +	  riscv_save_restore_reg (mode, regno, offset, fn);
>  	offset -= GET_MODE_SIZE (mode);
>        }
>  }
> @@ -4667,6 +4676,156 @@ riscv_epilogue_uses (unsigned int regno)
>    return false;
>  }
>
> +/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
> +
> +static sbitmap
> +riscv_get_separate_components (void)
> +{
> +  HOST_WIDE_INT offset;
> +  sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
> +  bitmap_clear (components);
> +
> +  if (riscv_use_save_libcall (&cfun->machine->frame)
> +      || cfun->machine->interrupt_handler_p)

riscv_use_save_libcall() already checks interrupt_handler_p, so that's 
redundant.  That said, I'm not sure riscv_use_save_libcall() is the 
right check here as unless I'm missing something we don't have all those 
other constraints when shrink-wrapping.

Either way, we'll also need to avoid saving for naked functions.

> +    return components;
> +
> +  offset = cfun->machine->frame.gp_sp_offset;
> +  for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
> +    if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
> +      {
> +	if (SMALL_OPERAND (offset))
> +	  bitmap_set_bit (components, regno);
> +
> +	offset -= UNITS_PER_WORD;
> +      }
> +
> +  offset = cfun->machine->frame.fp_sp_offset;
> +  for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
> +    if (BITSET_P (cfun->machine->frame.fmask, regno - FP_REG_FIRST))
> +      {
> +	machine_mode mode = TARGET_DOUBLE_FLOAT ? DFmode : SFmode;
> +
> +	if (SMALL_OPERAND (offset))
> +	  bitmap_set_bit (components, regno);
> +
> +	offset -= GET_MODE_SIZE (mode);
> +      }

It seems kind of clunky to have two copies of all these loops (and we'll 
need a third to make this work with the V stuff), but we've got that 
issue elsewhere in the port so I don't think you need to fix it here 
(though the V stuff will be there for the v2, so you'll need the third 
copy of each loop).

> +
> +  /* Don't mess with the hard frame pointer.  */
> +  if (frame_pointer_needed)
> +    bitmap_clear_bit (components, HARD_FRAME_POINTER_REGNUM);
> +
> +  bitmap_clear_bit (components, RETURN_ADDR_REGNUM);
> +
> +  return components;
> +}
> +
> +/* Implement TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB.  */
> +
> +static sbitmap
> +riscv_components_for_bb (basic_block bb)
> +{
> +  bitmap in = DF_LIVE_IN (bb);
> +  bitmap gen = &DF_LIVE_BB_INFO (bb)->gen;
> +  bitmap kill = &DF_LIVE_BB_INFO (bb)->kill;
> +
> +  sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
> +  bitmap_clear (components);
> +
> +  function_abi_aggregator callee_abis;
> +  rtx_insn *insn;
> +  FOR_BB_INSNS (bb, insn)
> +    if (CALL_P (insn))
> +      callee_abis.note_callee_abi (insn_callee_abi (insn));
> +  HARD_REG_SET extra_caller_saves = callee_abis.caller_save_regs (*crtl->abi);
> +
> +  /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
> +  for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
> +    if (!fixed_regs[regno]
> +       && !crtl->abi->clobbers_full_reg_p (regno)
> +       && (TEST_HARD_REG_BIT (extra_caller_saves, regno)
> +	   || bitmap_bit_p (in, regno)
> +	   || bitmap_bit_p (gen, regno)
> +	   || bitmap_bit_p (kill, regno)))
> +      bitmap_set_bit (components, regno);
> +
> +  for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
> +    if (!fixed_regs[regno]
> +       && !crtl->abi->clobbers_full_reg_p (regno)
> +       && (TEST_HARD_REG_BIT (extra_caller_saves, regno)
> +	   || bitmap_bit_p (in, regno)
> +	   || bitmap_bit_p (gen, regno)
> +	   || bitmap_bit_p (kill, regno)))
> +      bitmap_set_bit (components, regno);
> +
> +  return components;
> +}
> +
> +/* Implement TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS.  */
> +
> +static void
> +riscv_disqualify_components (sbitmap, edge, sbitmap, bool)
> +{
> +  /* Nothing to do for riscv.  */

I think that's true, but when poking around I couldn't figure out how 
saving registers to long stack frames works: we need a temporary register 
to generate the long stack offset, which is easy when we're in the 
pro/epilogue but I haven't figured out if it's handled properly when 
shrink wrapping.

It looks like the aarch64 port side steps the issue by only allowing 
registers that fall within a short offset to be shrink wrapped.  That 
seems like a pretty straight-forward way to handle the problem, but I'm 
not sure how it'd impact performance -- we've already got a lot of 
issues with large stack frames, though, so maybe it's OK to just punt on 
them for now?

Either way, this deserves a test case.  I think it should be possible to 
write one by introducing some register pressure around a 
shrink-wrappable block that needs a long stack offset and making sure 
in-flight registers don't get trashed.

> +}
> +
> +static void
> +riscv_process_components (sbitmap components, bool prologue_p)
> +{
> +  HOST_WIDE_INT offset;
> +  riscv_save_restore_fn fn = prologue_p? riscv_save_reg : riscv_restore_reg;
> +
> +  offset = cfun->machine->frame.gp_sp_offset;
> +  for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
> +    if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
> +      {
> +	if (bitmap_bit_p (components, regno))
> +	  riscv_save_restore_reg (word_mode, regno, offset, fn);
> +
> +	offset -= UNITS_PER_WORD;
> +      }
> +
> +  offset = cfun->machine->frame.fp_sp_offset;
> +  for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
> +    if (BITSET_P (cfun->machine->frame.fmask, regno - FP_REG_FIRST))
> +      {
> +	machine_mode mode = TARGET_DOUBLE_FLOAT ? DFmode : SFmode;
> +
> +	if (bitmap_bit_p (components, regno))
> +	  riscv_save_restore_reg (mode, regno, offset, fn);
> +
> +	offset -= GET_MODE_SIZE (mode);
> +      }
> +}
> +
> +/* Implement TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS.  */
> +
> +static void
> +riscv_emit_prologue_components (sbitmap components)
> +{
> +  riscv_process_components (components, true);
> +}
> +
> +/* Implement TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS.  */
> +
> +static void
> +riscv_emit_epilogue_components (sbitmap components)
> +{
> +  riscv_process_components (components, false);
> +}
> +
> +static void
> +riscv_set_handled_components (sbitmap components)
> +{
> +  for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
> +    if (bitmap_bit_p (components, regno))
> +      cfun->machine->reg_is_wrapped_separately[regno] = true;
> +
> +  for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
> +    if (bitmap_bit_p (components, regno))
> +      cfun->machine->reg_is_wrapped_separately[regno] = true;
> +}
> +
>  /* Return nonzero if this function is known to have a null epilogue.
>     This allows the optimizer to omit jumps to jumps if no stack
>     was created.  */
> @@ -5713,6 +5872,30 @@ riscv_asan_shadow_offset (void)
>  #undef TARGET_FUNCTION_ARG_BOUNDARY
>  #define TARGET_FUNCTION_ARG_BOUNDARY riscv_function_arg_boundary
>
> +#undef TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS
> +#define TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS \
> +  riscv_get_separate_components
> +
> +#undef TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB
> +#define TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB \
> +  riscv_components_for_bb
> +
> +#undef TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS
> +#define TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS \
> +  riscv_disqualify_components
> +
> +#undef TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS
> +#define TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS \
> +  riscv_emit_prologue_components
> +
> +#undef TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS
> +#define TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS \
> +  riscv_emit_epilogue_components
> +
> +#undef TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS
> +#define TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS \
> +  riscv_set_handled_components
> +
>  /* The generic ELF target does not always have TLS support.  */
>  #ifdef HAVE_AS_TLS
>  #undef TARGET_HAVE_TLS
> diff --git a/gcc/testsuite/gcc.target/riscv/shrink-wrap-1.c b/gcc/testsuite/gcc.target/riscv/shrink-wrap-1.c
> new file mode 100644
> index 00000000000..b05745968e6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/shrink-wrap-1.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fshrink-wrap" } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Oz" } } */
> +
> +void g(void);
> +
> +void f(int x)
> +{
> +	if (x)
> +    {
> +      // Those only need to be saved if x is non-zero; without
> +      // separate shrink-wrapping it would however be saved in all cases.
> +      // Force saving of callee-saved registers
> +      register int s2 asm("18") = x;
> +      register int s3 asm("19") = x;
> +      register int s4 asm("20") = x;
> +      asm("" : : "r"(s2));
> +      asm("" : : "r"(s3));
> +      asm("" : : "r"(s4));
> +      g();
> +    }
> +}
> +
> +// The resulting code should do nothing if x == 0
> +/* { dg-final { scan-assembler "bne\ta0,zero,.*\n.*ret" } } */

  reply	other threads:[~2022-10-02 20:32 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-06 10:39 mtsamis
2022-10-02 20:32 ` Palmer Dabbelt [this message]
2022-10-18 15:18   ` Manolis Tsamis
2022-10-18 15:57     ` Jeff Law
2022-10-18 17:35       ` Palmer Dabbelt
2022-10-19 17:15         ` Jeff Law
2022-10-20  7:42           ` Manolis Tsamis
2022-11-02 14:12           ` Manolis Tsamis
2022-11-02 15:02             ` Jeff Law
2022-10-20  7:35         ` Manolis Tsamis
2022-11-02 13:54         ` Manolis Tsamis
2022-11-02 15:06           ` Jeff Law
2022-11-03  0:26             ` Palmer Dabbelt
2022-11-03 22:23               ` Jeff Law
2022-11-07 22:07                 ` Palmer Dabbelt
2022-11-13  1:32                   ` Jeff Law
2022-11-16 10:26                     ` Manolis Tsamis
2022-11-17  2:09                       ` Jeff Law
2022-11-17 10:54                         ` Manolis Tsamis
2022-11-17 11:59                           ` Philipp Tomsich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mhng-a82e0def-6d5a-4983-98ba-b527e20a9781@palmer-ri-x1c9 \
    --to=palmer@dabbelt.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=kito.cheng@gmail.com \
    --cc=manolis.tsamis@vrull.eu \
    --cc=philipp.tomsich@vrull.eu \
    --cc=vineetg@rivosinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).