From: Palmer Dabbelt <palmer@dabbelt.com>
To: manolis.tsamis@vrull.eu
Cc: gcc-patches@gcc.gnu.org, Vineet Gupta <vineetg@rivosinc.com>,
Kito Cheng <kito.cheng@gmail.com>,
philipp.tomsich@vrull.eu
Subject: Re: [PATCH] Enable shrink wrapping for the RISC-V target.
Date: Sun, 02 Oct 2022 13:32:29 -0700 (PDT) [thread overview]
Message-ID: <mhng-a82e0def-6d5a-4983-98ba-b527e20a9781@palmer-ri-x1c9> (raw)
In-Reply-To: <20220906103902.2719585-1-manolis.tsamis@vrull.eu>
On Tue, 06 Sep 2022 03:39:02 PDT (-0700), manolis.tsamis@vrull.eu wrote:
> This commit implements the target macros (TARGET_SHRINK_WRAP_*) that
> enable separate shrink wrapping for function prologues/epilogues in
> RISC-V.
>
> Tested against SPEC CPU 2017, this change always has a net-positive
> effect on the dynamic instruction count. See the following table for
> the breakdown on how this reduces the number of dynamic instructions
> per workload on a like-for-like (i.e., same config file; suppressing
> shrink-wrapping with -fno-shrink-wrap):
Does this also pass the regression tests?
(there's also some comments on the code in-line)
>
> # dynamic instructions
> w/o shrink-wrap w/ shrink-wrap reduction
> 500.perlbench_r 1265716786593 1262156218578 3560568015 0.28%
> 500.perlbench_r 779224795689 765337009025 13887786664 1.78%
> 500.perlbench_r 724087331471 711307152522 12780178949 1.77%
> 502.gcc_r 204259864844 194517006339 9742858505 4.77%
> 502.gcc_r 244047794302 231555834722 12491959580 5.12%
> 502.gcc_r 230896069400 221877703011 9018366389 3.91%
> 502.gcc_r 192130616624 183856450605 8274166019 4.31%
> 502.gcc_r 258875074079 247756203226 11118870853 4.30%
> 505.mcf_r 662653430325 660678680547 1974749778 0.30%
> 520.omnetpp_r 985114167068 934191310154 50922856914 5.17%
> 523.xalancbmk_r 927037633578 921688937650 5348695928 0.58%
> 525.x264_r 490953958454 490565583447 388375007 0.08%
> 525.x264_r 1994662294421 1993171932425 1490361996 0.07%
> 525.x264_r 1897617120450 1896062750609 1554369841 0.08%
> 531.deepsjeng_r 1695189878907 1669304130411 25885748496 1.53%
> 541.leela_r 1925941222222 1897900861198 28040361024 1.46%
> 548.exchange2_r 2073816227944 2073816226729 1215 0.00%
> 557.xz_r 379572090003 379057409041 514680962 0.14%
> 557.xz_r 953117469352 952680431430 437037922 0.05%
> 557.xz_r 536859579650 536456690164 402889486 0.08%
> 18421773405376 18223938521833 197834883543 1.07% totals
>
> Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (struct machine_function): Add array to store
> register wrapping information.
> (riscv_for_each_saved_reg): Skip registers that are wrapped separetely.
> (riscv_get_separate_components): New function.
> (riscv_components_for_bb): Likewise.
> (riscv_disqualify_components): Likewise.
> (riscv_process_components): Likewise.
> (riscv_emit_prologue_components): Likewise.
> (riscv_emit_epilogue_components): Likewise.
> (riscv_set_handled_components): Likewise.
> (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS): Define.
> (TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB): Likewise.
> (TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS): Likewise.
> (TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS): Likewise.
> (TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS): Likewise.
> (TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/shrink-wrap-1.c: New test.
>
> ---
>
> gcc/config/riscv/riscv.cc | 187 +++++++++++++++++-
> .../gcc.target/riscv/shrink-wrap-1.c | 25 +++
> 2 files changed, 210 insertions(+), 2 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/shrink-wrap-1.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 5a0adffb5ce..3b633149a9a 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -25,6 +25,7 @@ along with GCC; see the file COPYING3. If not see
> #include "config.h"
> #include "system.h"
> #include "coretypes.h"
> +#include "backend.h"
> #include "tm.h"
> #include "rtl.h"
> #include "regs.h"
> @@ -52,6 +53,7 @@ along with GCC; see the file COPYING3. If not see
> #include "optabs.h"
> #include "bitmap.h"
> #include "df.h"
> +#include "function-abi.h"
> #include "diagnostic.h"
> #include "builtins.h"
> #include "predict.h"
> @@ -147,6 +149,11 @@ struct GTY(()) machine_function {
>
> /* The current frame information, calculated by riscv_compute_frame_info. */
> struct riscv_frame_info frame;
> +
> + /* The components already handled by separate shrink-wrapping, which should
> + not be considered by the prologue and epilogue. */
> + bool reg_is_wrapped_separately[FIRST_PSEUDO_REGISTER];
> +
> };
>
> /* Information about a single argument. */
> @@ -4209,7 +4216,7 @@ riscv_for_each_saved_reg (HOST_WIDE_INT sp_offset, riscv_save_restore_fn fn,
> for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
> if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
> {
> - bool handle_reg = TRUE;
> + bool handle_reg = !cfun->machine->reg_is_wrapped_separately[regno];
>
> /* If this is a normal return in a function that calls the eh_return
> builtin, then do not restore the eh return data registers as that
> @@ -4240,9 +4247,11 @@ riscv_for_each_saved_reg (HOST_WIDE_INT sp_offset, riscv_save_restore_fn fn,
> for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
> if (BITSET_P (cfun->machine->frame.fmask, regno - FP_REG_FIRST))
> {
> + bool handle_reg = !cfun->machine->reg_is_wrapped_separately[regno];
> machine_mode mode = TARGET_DOUBLE_FLOAT ? DFmode : SFmode;
>
> - riscv_save_restore_reg (mode, regno, offset, fn);
> + if (handle_reg)
> + riscv_save_restore_reg (mode, regno, offset, fn);
> offset -= GET_MODE_SIZE (mode);
> }
> }
> @@ -4667,6 +4676,156 @@ riscv_epilogue_uses (unsigned int regno)
> return false;
> }
>
> +/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS. */
> +
> +static sbitmap
> +riscv_get_separate_components (void)
> +{
> + HOST_WIDE_INT offset;
> + sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
> + bitmap_clear (components);
> +
> + if (riscv_use_save_libcall (&cfun->machine->frame)
> + || cfun->machine->interrupt_handler_p)
riscv_use_save_libcall() already checks interrupt_handler_p, so that's
redundant. That said, I'm not sure riscv_use_save_libcall() is the
right check here as unless I'm missing something we don't have all those
other constraints when shrink-wrapping.
Either way, we'll also need to avoid saving for naked functions.
> + return components;
> +
> + offset = cfun->machine->frame.gp_sp_offset;
> + for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
> + if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
> + {
> + if (SMALL_OPERAND (offset))
> + bitmap_set_bit (components, regno);
> +
> + offset -= UNITS_PER_WORD;
> + }
> +
> + offset = cfun->machine->frame.fp_sp_offset;
> + for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
> + if (BITSET_P (cfun->machine->frame.fmask, regno - FP_REG_FIRST))
> + {
> + machine_mode mode = TARGET_DOUBLE_FLOAT ? DFmode : SFmode;
> +
> + if (SMALL_OPERAND (offset))
> + bitmap_set_bit (components, regno);
> +
> + offset -= GET_MODE_SIZE (mode);
> + }
It seems kind of clunky to have two copies of all these loops (and we'll
need a third to make this work with the V stuff), but we've got that
issue elsewhere in the port so I don't think you need to fix it here
(though the V stuff will be there for the v2, so you'll need the third
copy of each loop).
> +
> + /* Don't mess with the hard frame pointer. */
> + if (frame_pointer_needed)
> + bitmap_clear_bit (components, HARD_FRAME_POINTER_REGNUM);
> +
> + bitmap_clear_bit (components, RETURN_ADDR_REGNUM);
> +
> + return components;
> +}
> +
> +/* Implement TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB. */
> +
> +static sbitmap
> +riscv_components_for_bb (basic_block bb)
> +{
> + bitmap in = DF_LIVE_IN (bb);
> + bitmap gen = &DF_LIVE_BB_INFO (bb)->gen;
> + bitmap kill = &DF_LIVE_BB_INFO (bb)->kill;
> +
> + sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
> + bitmap_clear (components);
> +
> + function_abi_aggregator callee_abis;
> + rtx_insn *insn;
> + FOR_BB_INSNS (bb, insn)
> + if (CALL_P (insn))
> + callee_abis.note_callee_abi (insn_callee_abi (insn));
> + HARD_REG_SET extra_caller_saves = callee_abis.caller_save_regs (*crtl->abi);
> +
> + /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets. */
> + for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
> + if (!fixed_regs[regno]
> + && !crtl->abi->clobbers_full_reg_p (regno)
> + && (TEST_HARD_REG_BIT (extra_caller_saves, regno)
> + || bitmap_bit_p (in, regno)
> + || bitmap_bit_p (gen, regno)
> + || bitmap_bit_p (kill, regno)))
> + bitmap_set_bit (components, regno);
> +
> + for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
> + if (!fixed_regs[regno]
> + && !crtl->abi->clobbers_full_reg_p (regno)
> + && (TEST_HARD_REG_BIT (extra_caller_saves, regno)
> + || bitmap_bit_p (in, regno)
> + || bitmap_bit_p (gen, regno)
> + || bitmap_bit_p (kill, regno)))
> + bitmap_set_bit (components, regno);
> +
> + return components;
> +}
> +
> +/* Implement TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS. */
> +
> +static void
> +riscv_disqualify_components (sbitmap, edge, sbitmap, bool)
> +{
> + /* Nothing to do for riscv. */
I think that's true, but when poking around I couldn't figure out how
saving registers to long stack frames works: we need a temporary register
to generate the long stack offset, which is easy when we're in the
pro/epilogue but I haven't figured out if it's handled properly when
shrink wrapping.
It looks like the aarch64 port side steps the issue by only allowing
registers that fall within a short offset to be shrink wrapped. That
seems like a pretty straight-forward way to handle the problem, but I'm
not sure how it'd impact performance -- we've already got a lot of
issues with large stack frames, though, so maybe it's OK to just punt on
them for now?
Either way, this deserves a test case. I think it should be possible to
write one by introducing some register pressure around a
shrink-wrappable block that needs a long stack offset and making sure
in-flight registers don't get trashed.
> +}
> +
> +static void
> +riscv_process_components (sbitmap components, bool prologue_p)
> +{
> + HOST_WIDE_INT offset;
> + riscv_save_restore_fn fn = prologue_p? riscv_save_reg : riscv_restore_reg;
> +
> + offset = cfun->machine->frame.gp_sp_offset;
> + for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
> + if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
> + {
> + if (bitmap_bit_p (components, regno))
> + riscv_save_restore_reg (word_mode, regno, offset, fn);
> +
> + offset -= UNITS_PER_WORD;
> + }
> +
> + offset = cfun->machine->frame.fp_sp_offset;
> + for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
> + if (BITSET_P (cfun->machine->frame.fmask, regno - FP_REG_FIRST))
> + {
> + machine_mode mode = TARGET_DOUBLE_FLOAT ? DFmode : SFmode;
> +
> + if (bitmap_bit_p (components, regno))
> + riscv_save_restore_reg (mode, regno, offset, fn);
> +
> + offset -= GET_MODE_SIZE (mode);
> + }
> +}
> +
> +/* Implement TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS. */
> +
> +static void
> +riscv_emit_prologue_components (sbitmap components)
> +{
> + riscv_process_components (components, true);
> +}
> +
> +/* Implement TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS. */
> +
> +static void
> +riscv_emit_epilogue_components (sbitmap components)
> +{
> + riscv_process_components (components, false);
> +}
> +
> +static void
> +riscv_set_handled_components (sbitmap components)
> +{
> + for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
> + if (bitmap_bit_p (components, regno))
> + cfun->machine->reg_is_wrapped_separately[regno] = true;
> +
> + for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
> + if (bitmap_bit_p (components, regno))
> + cfun->machine->reg_is_wrapped_separately[regno] = true;
> +}
> +
> /* Return nonzero if this function is known to have a null epilogue.
> This allows the optimizer to omit jumps to jumps if no stack
> was created. */
> @@ -5713,6 +5872,30 @@ riscv_asan_shadow_offset (void)
> #undef TARGET_FUNCTION_ARG_BOUNDARY
> #define TARGET_FUNCTION_ARG_BOUNDARY riscv_function_arg_boundary
>
> +#undef TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS
> +#define TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS \
> + riscv_get_separate_components
> +
> +#undef TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB
> +#define TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB \
> + riscv_components_for_bb
> +
> +#undef TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS
> +#define TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS \
> + riscv_disqualify_components
> +
> +#undef TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS
> +#define TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS \
> + riscv_emit_prologue_components
> +
> +#undef TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS
> +#define TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS \
> + riscv_emit_epilogue_components
> +
> +#undef TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS
> +#define TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS \
> + riscv_set_handled_components
> +
> /* The generic ELF target does not always have TLS support. */
> #ifdef HAVE_AS_TLS
> #undef TARGET_HAVE_TLS
> diff --git a/gcc/testsuite/gcc.target/riscv/shrink-wrap-1.c b/gcc/testsuite/gcc.target/riscv/shrink-wrap-1.c
> new file mode 100644
> index 00000000000..b05745968e6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/shrink-wrap-1.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fshrink-wrap" } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Oz" } } */
> +
> +void g(void);
> +
> +void f(int x)
> +{
> + if (x)
> + {
> + // Those only need to be saved if x is non-zero; without
> + // separate shrink-wrapping it would however be saved in all cases.
> + // Force saving of callee-saved registers
> + register int s2 asm("18") = x;
> + register int s3 asm("19") = x;
> + register int s4 asm("20") = x;
> + asm("" : : "r"(s2));
> + asm("" : : "r"(s3));
> + asm("" : : "r"(s4));
> + g();
> + }
> +}
> +
> +// The resulting code should do nothing if x == 0
> +/* { dg-final { scan-assembler "bne\ta0,zero,.*\n.*ret" } } */
next prev parent reply other threads:[~2022-10-02 20:32 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-06 10:39 mtsamis
2022-10-02 20:32 ` Palmer Dabbelt [this message]
2022-10-18 15:18 ` Manolis Tsamis
2022-10-18 15:57 ` Jeff Law
2022-10-18 17:35 ` Palmer Dabbelt
2022-10-19 17:15 ` Jeff Law
2022-10-20 7:42 ` Manolis Tsamis
2022-11-02 14:12 ` Manolis Tsamis
2022-11-02 15:02 ` Jeff Law
2022-10-20 7:35 ` Manolis Tsamis
2022-11-02 13:54 ` Manolis Tsamis
2022-11-02 15:06 ` Jeff Law
2022-11-03 0:26 ` Palmer Dabbelt
2022-11-03 22:23 ` Jeff Law
2022-11-07 22:07 ` Palmer Dabbelt
2022-11-13 1:32 ` Jeff Law
2022-11-16 10:26 ` Manolis Tsamis
2022-11-17 2:09 ` Jeff Law
2022-11-17 10:54 ` Manolis Tsamis
2022-11-17 11:59 ` Philipp Tomsich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=mhng-a82e0def-6d5a-4983-98ba-b527e20a9781@palmer-ri-x1c9 \
--to=palmer@dabbelt.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=kito.cheng@gmail.com \
--cc=manolis.tsamis@vrull.eu \
--cc=philipp.tomsich@vrull.eu \
--cc=vineetg@rivosinc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).