From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pj1-x102e.google.com (mail-pj1-x102e.google.com [IPv6:2607:f8b0:4864:20::102e]) by sourceware.org (Postfix) with ESMTPS id E27333858D32 for ; Sun, 2 Oct 2022 20:32:31 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E27333858D32 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=dabbelt.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=dabbelt.com Received: by mail-pj1-x102e.google.com with SMTP id 70so8320286pjo.4 for ; Sun, 02 Oct 2022 13:32:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dabbelt-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:message-id:to:from:cc :in-reply-to:subject:date:from:to:cc:subject:date; bh=ALZRbS0ci8HTENN3nOfr9bo6xzNm3/FVyBxTChtWnUs=; b=C4aAZP/+kTbq4GlhO+ZwdaHAU7n9WBRfqgKGLGSiKWnmI1N4g2M0kNXI2VuiFkRHfV n2hIqPNwElcVegVmXBdT43yk66OHagrbL2pnPjMJej98PLClpjOIRFSD/KVBnxjEbDJl 8z7fjqLLgQNH1O4bXUvkuQdEsnwiOsqmRNmRltHM9LAPEdPu4bQHi+CY7x3M4jCgSzJX wnswp6D7lZm11UmbsIg3ZUirQugc/oG2ak6mK6KfbUqb4/4QAXwMfvGAzp2R67+joJ2+ zkJK/IDmALJQzh1t0J+xd9tW0JH/k3ubKatv8XkMg6t5iu8ryV8zT6qxUeSls9Z0+zJD 8v0w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:to:from:cc :in-reply-to:subject:date:x-gm-message-state:from:to:cc:subject:date; bh=ALZRbS0ci8HTENN3nOfr9bo6xzNm3/FVyBxTChtWnUs=; b=F31mshJZSzYCCediHow2mw6Wklxb1iyjiwnPjBYvWk9PI5gDW9BnzBic99/x35fGFF IgnIG3Vm3rbZRyBUUwkrKqYwvDbyp0m+Mg/RlAi4I7AoXC5j8nba2dvGtxKmbHYAB3FE fSYbV50icM15qV0ncjrJL4koBk7bxzCPau6r9Zlof2PFK2NZK/AcUp4MA9gloXlvNBCy 1NiU84dfP9Zs85j4lf6VQync5yjjt6TTdMd/93AyqF0GCXdSQFd9HUW0efyGZfgM4zMJ gsjW6VQZWkQbw1TioQONBw0ZCmsmPuEajNDlywxerBDSiXfuYBHOUgXb4+b/GFvOuf9w FCqg== X-Gm-Message-State: ACrzQf3O5RyMLMaMdWKYdwI4qodxORHj9GGfJX3fPV0If19cP1n/D8hs WrgoJj5NKGXdDY6zYnYZvvbYcieDN4jFWTBj X-Google-Smtp-Source: AMsMyM4eU6HckuNjJCXq1fc+ouCBkhxcp+cVHpQauBRxe5nTzZ9gxj783n8eRkxC52MxlbexhH+nDg== X-Received: by 2002:a17:902:6907:b0:179:c9bc:dd73 with SMTP id j7-20020a170902690700b00179c9bcdd73mr18852547plk.159.1664742750064; Sun, 02 Oct 2022 13:32:30 -0700 (PDT) Received: from localhost (76-210-143-223.lightspeed.sntcca.sbcglobal.net. [76.210.143.223]) by smtp.gmail.com with ESMTPSA id y7-20020a17090a2b4700b00205f2a54815sm5109379pjc.25.2022.10.02.13.32.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 02 Oct 2022 13:32:29 -0700 (PDT) Date: Sun, 02 Oct 2022 13:32:29 -0700 (PDT) X-Google-Original-Date: Sun, 02 Oct 2022 13:32:25 PDT (-0700) Subject: Re: [PATCH] Enable shrink wrapping for the RISC-V target. In-Reply-To: <20220906103902.2719585-1-manolis.tsamis@vrull.eu> CC: gcc-patches@gcc.gnu.org, Vineet Gupta , Kito Cheng , philipp.tomsich@vrull.eu From: Palmer Dabbelt To: manolis.tsamis@vrull.eu Message-ID: Mime-Version: 1.0 (MHng) Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,GIT_PATCH_0,KAM_SHORT,LIKELY_SPAM_BODY,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, 06 Sep 2022 03:39:02 PDT (-0700), manolis.tsamis@vrull.eu wrote: > This commit implements the target macros (TARGET_SHRINK_WRAP_*) that > enable separate shrink wrapping for function prologues/epilogues in > RISC-V. > > Tested against SPEC CPU 2017, this change always has a net-positive > effect on the dynamic instruction count. See the following table for > the breakdown on how this reduces the number of dynamic instructions > per workload on a like-for-like (i.e., same config file; suppressing > shrink-wrapping with -fno-shrink-wrap): Does this also pass the regression tests? (there's also some comments on the code in-line) > > # dynamic instructions > w/o shrink-wrap w/ shrink-wrap reduction > 500.perlbench_r 1265716786593 1262156218578 3560568015 0.28% > 500.perlbench_r 779224795689 765337009025 13887786664 1.78% > 500.perlbench_r 724087331471 711307152522 12780178949 1.77% > 502.gcc_r 204259864844 194517006339 9742858505 4.77% > 502.gcc_r 244047794302 231555834722 12491959580 5.12% > 502.gcc_r 230896069400 221877703011 9018366389 3.91% > 502.gcc_r 192130616624 183856450605 8274166019 4.31% > 502.gcc_r 258875074079 247756203226 11118870853 4.30% > 505.mcf_r 662653430325 660678680547 1974749778 0.30% > 520.omnetpp_r 985114167068 934191310154 50922856914 5.17% > 523.xalancbmk_r 927037633578 921688937650 5348695928 0.58% > 525.x264_r 490953958454 490565583447 388375007 0.08% > 525.x264_r 1994662294421 1993171932425 1490361996 0.07% > 525.x264_r 1897617120450 1896062750609 1554369841 0.08% > 531.deepsjeng_r 1695189878907 1669304130411 25885748496 1.53% > 541.leela_r 1925941222222 1897900861198 28040361024 1.46% > 548.exchange2_r 2073816227944 2073816226729 1215 0.00% > 557.xz_r 379572090003 379057409041 514680962 0.14% > 557.xz_r 953117469352 952680431430 437037922 0.05% > 557.xz_r 536859579650 536456690164 402889486 0.08% > 18421773405376 18223938521833 197834883543 1.07% totals > > Signed-off-by: Manolis Tsamis > > gcc/ChangeLog: > > * config/riscv/riscv.cc (struct machine_function): Add array to store > register wrapping information. > (riscv_for_each_saved_reg): Skip registers that are wrapped separetely. > (riscv_get_separate_components): New function. > (riscv_components_for_bb): Likewise. > (riscv_disqualify_components): Likewise. > (riscv_process_components): Likewise. > (riscv_emit_prologue_components): Likewise. > (riscv_emit_epilogue_components): Likewise. > (riscv_set_handled_components): Likewise. > (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS): Define. > (TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB): Likewise. > (TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS): Likewise. > (TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS): Likewise. > (TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS): Likewise. > (TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Likewise. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/shrink-wrap-1.c: New test. > > --- > > gcc/config/riscv/riscv.cc | 187 +++++++++++++++++- > .../gcc.target/riscv/shrink-wrap-1.c | 25 +++ > 2 files changed, 210 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/shrink-wrap-1.c > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc > index 5a0adffb5ce..3b633149a9a 100644 > --- a/gcc/config/riscv/riscv.cc > +++ b/gcc/config/riscv/riscv.cc > @@ -25,6 +25,7 @@ along with GCC; see the file COPYING3. If not see > #include "config.h" > #include "system.h" > #include "coretypes.h" > +#include "backend.h" > #include "tm.h" > #include "rtl.h" > #include "regs.h" > @@ -52,6 +53,7 @@ along with GCC; see the file COPYING3. If not see > #include "optabs.h" > #include "bitmap.h" > #include "df.h" > +#include "function-abi.h" > #include "diagnostic.h" > #include "builtins.h" > #include "predict.h" > @@ -147,6 +149,11 @@ struct GTY(()) machine_function { > > /* The current frame information, calculated by riscv_compute_frame_info. */ > struct riscv_frame_info frame; > + > + /* The components already handled by separate shrink-wrapping, which should > + not be considered by the prologue and epilogue. */ > + bool reg_is_wrapped_separately[FIRST_PSEUDO_REGISTER]; > + > }; > > /* Information about a single argument. */ > @@ -4209,7 +4216,7 @@ riscv_for_each_saved_reg (HOST_WIDE_INT sp_offset, riscv_save_restore_fn fn, > for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++) > if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST)) > { > - bool handle_reg = TRUE; > + bool handle_reg = !cfun->machine->reg_is_wrapped_separately[regno]; > > /* If this is a normal return in a function that calls the eh_return > builtin, then do not restore the eh return data registers as that > @@ -4240,9 +4247,11 @@ riscv_for_each_saved_reg (HOST_WIDE_INT sp_offset, riscv_save_restore_fn fn, > for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++) > if (BITSET_P (cfun->machine->frame.fmask, regno - FP_REG_FIRST)) > { > + bool handle_reg = !cfun->machine->reg_is_wrapped_separately[regno]; > machine_mode mode = TARGET_DOUBLE_FLOAT ? DFmode : SFmode; > > - riscv_save_restore_reg (mode, regno, offset, fn); > + if (handle_reg) > + riscv_save_restore_reg (mode, regno, offset, fn); > offset -= GET_MODE_SIZE (mode); > } > } > @@ -4667,6 +4676,156 @@ riscv_epilogue_uses (unsigned int regno) > return false; > } > > +/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS. */ > + > +static sbitmap > +riscv_get_separate_components (void) > +{ > + HOST_WIDE_INT offset; > + sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER); > + bitmap_clear (components); > + > + if (riscv_use_save_libcall (&cfun->machine->frame) > + || cfun->machine->interrupt_handler_p) riscv_use_save_libcall() already checks interrupt_handler_p, so that's redundant. That said, I'm not sure riscv_use_save_libcall() is the right check here as unless I'm missing something we don't have all those other constraints when shrink-wrapping. Either way, we'll also need to avoid saving for naked functions. > + return components; > + > + offset = cfun->machine->frame.gp_sp_offset; > + for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++) > + if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST)) > + { > + if (SMALL_OPERAND (offset)) > + bitmap_set_bit (components, regno); > + > + offset -= UNITS_PER_WORD; > + } > + > + offset = cfun->machine->frame.fp_sp_offset; > + for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++) > + if (BITSET_P (cfun->machine->frame.fmask, regno - FP_REG_FIRST)) > + { > + machine_mode mode = TARGET_DOUBLE_FLOAT ? DFmode : SFmode; > + > + if (SMALL_OPERAND (offset)) > + bitmap_set_bit (components, regno); > + > + offset -= GET_MODE_SIZE (mode); > + } It seems kind of clunky to have two copies of all these loops (and we'll need a third to make this work with the V stuff), but we've got that issue elsewhere in the port so I don't think you need to fix it here (though the V stuff will be there for the v2, so you'll need the third copy of each loop). > + > + /* Don't mess with the hard frame pointer. */ > + if (frame_pointer_needed) > + bitmap_clear_bit (components, HARD_FRAME_POINTER_REGNUM); > + > + bitmap_clear_bit (components, RETURN_ADDR_REGNUM); > + > + return components; > +} > + > +/* Implement TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB. */ > + > +static sbitmap > +riscv_components_for_bb (basic_block bb) > +{ > + bitmap in = DF_LIVE_IN (bb); > + bitmap gen = &DF_LIVE_BB_INFO (bb)->gen; > + bitmap kill = &DF_LIVE_BB_INFO (bb)->kill; > + > + sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER); > + bitmap_clear (components); > + > + function_abi_aggregator callee_abis; > + rtx_insn *insn; > + FOR_BB_INSNS (bb, insn) > + if (CALL_P (insn)) > + callee_abis.note_callee_abi (insn_callee_abi (insn)); > + HARD_REG_SET extra_caller_saves = callee_abis.caller_save_regs (*crtl->abi); > + > + /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets. */ > + for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++) > + if (!fixed_regs[regno] > + && !crtl->abi->clobbers_full_reg_p (regno) > + && (TEST_HARD_REG_BIT (extra_caller_saves, regno) > + || bitmap_bit_p (in, regno) > + || bitmap_bit_p (gen, regno) > + || bitmap_bit_p (kill, regno))) > + bitmap_set_bit (components, regno); > + > + for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++) > + if (!fixed_regs[regno] > + && !crtl->abi->clobbers_full_reg_p (regno) > + && (TEST_HARD_REG_BIT (extra_caller_saves, regno) > + || bitmap_bit_p (in, regno) > + || bitmap_bit_p (gen, regno) > + || bitmap_bit_p (kill, regno))) > + bitmap_set_bit (components, regno); > + > + return components; > +} > + > +/* Implement TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS. */ > + > +static void > +riscv_disqualify_components (sbitmap, edge, sbitmap, bool) > +{ > + /* Nothing to do for riscv. */ I think that's true, but when poking around I couldn't figure out how saving registers to long stack frames works: we need a temporary register to generate the long stack offset, which is easy when we're in the pro/epilogue but I haven't figured out if it's handled properly when shrink wrapping. It looks like the aarch64 port side steps the issue by only allowing registers that fall within a short offset to be shrink wrapped. That seems like a pretty straight-forward way to handle the problem, but I'm not sure how it'd impact performance -- we've already got a lot of issues with large stack frames, though, so maybe it's OK to just punt on them for now? Either way, this deserves a test case. I think it should be possible to write one by introducing some register pressure around a shrink-wrappable block that needs a long stack offset and making sure in-flight registers don't get trashed. > +} > + > +static void > +riscv_process_components (sbitmap components, bool prologue_p) > +{ > + HOST_WIDE_INT offset; > + riscv_save_restore_fn fn = prologue_p? riscv_save_reg : riscv_restore_reg; > + > + offset = cfun->machine->frame.gp_sp_offset; > + for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++) > + if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST)) > + { > + if (bitmap_bit_p (components, regno)) > + riscv_save_restore_reg (word_mode, regno, offset, fn); > + > + offset -= UNITS_PER_WORD; > + } > + > + offset = cfun->machine->frame.fp_sp_offset; > + for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++) > + if (BITSET_P (cfun->machine->frame.fmask, regno - FP_REG_FIRST)) > + { > + machine_mode mode = TARGET_DOUBLE_FLOAT ? DFmode : SFmode; > + > + if (bitmap_bit_p (components, regno)) > + riscv_save_restore_reg (mode, regno, offset, fn); > + > + offset -= GET_MODE_SIZE (mode); > + } > +} > + > +/* Implement TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS. */ > + > +static void > +riscv_emit_prologue_components (sbitmap components) > +{ > + riscv_process_components (components, true); > +} > + > +/* Implement TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS. */ > + > +static void > +riscv_emit_epilogue_components (sbitmap components) > +{ > + riscv_process_components (components, false); > +} > + > +static void > +riscv_set_handled_components (sbitmap components) > +{ > + for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++) > + if (bitmap_bit_p (components, regno)) > + cfun->machine->reg_is_wrapped_separately[regno] = true; > + > + for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++) > + if (bitmap_bit_p (components, regno)) > + cfun->machine->reg_is_wrapped_separately[regno] = true; > +} > + > /* Return nonzero if this function is known to have a null epilogue. > This allows the optimizer to omit jumps to jumps if no stack > was created. */ > @@ -5713,6 +5872,30 @@ riscv_asan_shadow_offset (void) > #undef TARGET_FUNCTION_ARG_BOUNDARY > #define TARGET_FUNCTION_ARG_BOUNDARY riscv_function_arg_boundary > > +#undef TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS > +#define TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS \ > + riscv_get_separate_components > + > +#undef TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB > +#define TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB \ > + riscv_components_for_bb > + > +#undef TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS > +#define TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS \ > + riscv_disqualify_components > + > +#undef TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS > +#define TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS \ > + riscv_emit_prologue_components > + > +#undef TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS > +#define TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS \ > + riscv_emit_epilogue_components > + > +#undef TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS > +#define TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS \ > + riscv_set_handled_components > + > /* The generic ELF target does not always have TLS support. */ > #ifdef HAVE_AS_TLS > #undef TARGET_HAVE_TLS > diff --git a/gcc/testsuite/gcc.target/riscv/shrink-wrap-1.c b/gcc/testsuite/gcc.target/riscv/shrink-wrap-1.c > new file mode 100644 > index 00000000000..b05745968e6 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/shrink-wrap-1.c > @@ -0,0 +1,25 @@ > +/* { dg-do compile } */ > +/* { dg-options "-fshrink-wrap" } */ > +/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Oz" } } */ > + > +void g(void); > + > +void f(int x) > +{ > + if (x) > + { > + // Those only need to be saved if x is non-zero; without > + // separate shrink-wrapping it would however be saved in all cases. > + // Force saving of callee-saved registers > + register int s2 asm("18") = x; > + register int s3 asm("19") = x; > + register int s4 asm("20") = x; > + asm("" : : "r"(s2)); > + asm("" : : "r"(s3)); > + asm("" : : "r"(s4)); > + g(); > + } > +} > + > +// The resulting code should do nothing if x == 0 > +/* { dg-final { scan-assembler "bne\ta0,zero,.*\n.*ret" } } */