From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH] LoongArch: Enable shrink wrapping
To: Xi Ruoyao , gcc-patches@gcc.gnu.org
Cc: WANG Xuerui , Chenghua Xu
References: <20230423131903.155998-1-xry111@xry111.site>
From: Lulu Cheng
Message-ID: <0ccf3771-af4f-cee7-d660-7aea17c6cffd@loongson.cn>
Date: Mon, 24 Apr 2023 16:51:33 +0800
In-Reply-To: <20230423131903.155998-1-xry111@xry111.site>

OK, I will run a SPEC performance comparison as soon as possible.

Thanks!

On 2023/4/23 at 9:19 PM, Xi Ruoyao wrote:
> This commit implements the target macros for shrink wrapping of function
> prologues/epilogues shrink wrapping on LoongArch.
>
> Bootstrapped and regtested on loongarch64-linux-gnu.  I don't have an
> access to SPEC CPU so I hope the reviewer can perform a benchmark to see
> if there is real benefit.
>
> gcc/ChangeLog:
>
>         * config/loongarch/loongarch.h (struct machine_function): Add
>         reg_is_wrapped_separately array for register wrapping
>         information.
>         * config/loongarch/loongarch.cc
>         (loongarch_get_separate_components): New function.
>         (loongarch_components_for_bb): Likewise.
>         (loongarch_disqualify_components): Likewise.
>         (loongarch_process_components): Likewise.
>         (loongarch_emit_prologue_components): Likewise.
>         (loongarch_emit_epilogue_components): Likewise.
>         (loongarch_set_handled_components): Likewise.
>         (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS): Define.
>         (TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB): Likewise.
>         (TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS): Likewise.
>         (TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS): Likewise.
>         (TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS): Likewise.
>         (TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Likewise.
>         (loongarch_for_each_saved_reg): Skip registers that are wrapped
>         separately.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/loongarch/shrink-wrap.c: New test.
> ---
>  gcc/config/loongarch/loongarch.cc             | 179 +++++++++++++++++-
>  gcc/config/loongarch/loongarch.h              |   2 +
>  .../gcc.target/loongarch/shrink-wrap.c        |  22 +++
>  3 files changed, 200 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/shrink-wrap.c
>
> diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
> index e523fcb6b7f..d0024237a6a 100644
> --- a/gcc/config/loongarch/loongarch.cc
> +++ b/gcc/config/loongarch/loongarch.cc
> @@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "builtins.h"
>  #include "rtl-iter.h"
>  #include "opts.h"
> +#include "function-abi.h"
>
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -1017,19 +1018,23 @@ loongarch_for_each_saved_reg (HOST_WIDE_INT sp_offset,
>    for (int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
>      if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
>        {
> -        loongarch_save_restore_reg (word_mode, regno, offset, fn);
> +        if (!cfun->machine->reg_is_wrapped_separately[regno])
> +          loongarch_save_restore_reg (word_mode, regno, offset, fn);
> +
>          offset -= UNITS_PER_WORD;
>        }
>
>    /* This loop must iterate over the same space as its companion in
>       loongarch_compute_frame_info.  */
>    offset = cfun->machine->frame.fp_sp_offset - sp_offset;
> +  machine_mode mode = TARGET_DOUBLE_FLOAT ? DFmode : SFmode;
> +
>    for (int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
>      if (BITSET_P (cfun->machine->frame.fmask, regno - FP_REG_FIRST))
>        {
> -        machine_mode mode = TARGET_DOUBLE_FLOAT ? DFmode : SFmode;
> +        if (!cfun->machine->reg_is_wrapped_separately[regno])
> +          loongarch_save_restore_reg (word_mode, regno, offset, fn);
>
> -        loongarch_save_restore_reg (mode, regno, offset, fn);
>          offset -= GET_MODE_SIZE (mode);
>        }
>  }
> @@ -6644,6 +6649,151 @@ loongarch_asan_shadow_offset (void)
>    return TARGET_64BIT ? (HOST_WIDE_INT_1 << 46) : 0;
>  }
>
> +static sbitmap
> +loongarch_get_separate_components (void)
> +{
> +  HOST_WIDE_INT offset;
> +  sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
> +  bitmap_clear (components);
> +  offset = cfun->machine->frame.gp_sp_offset;
> +
> +  /* The stack should be aligned to 16-bytes boundary, so we can make the use
> +     of ldptr instructions.  */
> +  gcc_assert (offset % UNITS_PER_WORD == 0);
> +
> +  for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
> +    if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
> +      {
> +        /* We can wrap general registers saved at [sp, sp + 32768) using the
> +           ldptr/stptr instructions.  For large offsets a pseudo register
> +           might be needed which cannot be created during the shrink
> +           wrapping pass.
> +
> +           TODO: This may need a revise when we add LA32 as ldptr.w is not
> +           guaranteed available by the manual.  */
> +        if (offset < 32768)
> +          bitmap_set_bit (components, regno);
> +
> +        offset -= UNITS_PER_WORD;
> +      }
> +
> +  offset = cfun->machine->frame.fp_sp_offset;
> +  for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
> +    if (BITSET_P (cfun->machine->frame.fmask, regno - FP_REG_FIRST))
> +      {
> +        /* We can only wrap FP registers with imm12 offsets.  For large
> +           offsets a pseudo register might be needed which cannot be
> +           created during the shrink wrapping pass.  */
> +        if (IMM12_OPERAND (offset))
> +          bitmap_set_bit (components, regno);
> +
> +        offset -= UNITS_PER_FPREG;
> +      }
> +
> +  /* Don't mess with the hard frame pointer.  */
> +  if (frame_pointer_needed)
> +    bitmap_clear_bit (components, HARD_FRAME_POINTER_REGNUM);
> +
> +  bitmap_clear_bit (components, RETURN_ADDR_REGNUM);
> +
> +  return components;
> +}
> +
> +static sbitmap
> +loongarch_components_for_bb (basic_block bb)
> +{
> +  /* Registers are used in a bb if they are in the IN, GEN, or KILL sets.  */
> +  auto_bitmap used;
> +  bitmap_copy (used, DF_LIVE_IN (bb));
> +  bitmap_ior_into (used, &DF_LIVE_BB_INFO (bb)->gen);
> +  bitmap_ior_into (used, &DF_LIVE_BB_INFO (bb)->kill);
> +
> +  sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
> +  bitmap_clear (components);
> +
> +  function_abi_aggregator callee_abis;
> +  rtx_insn *insn;
> +  FOR_BB_INSNS (bb, insn)
> +    if (CALL_P (insn))
> +      callee_abis.note_callee_abi (insn_callee_abi (insn));
> +
> +  HARD_REG_SET extra_caller_saves =
> +    callee_abis.caller_save_regs (*crtl->abi);
> +
> +  for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
> +    if (!fixed_regs[regno]
> +        && !crtl->abi->clobbers_full_reg_p (regno)
> +        && (TEST_HARD_REG_BIT (extra_caller_saves, regno) ||
> +            bitmap_bit_p (used, regno)))
> +      bitmap_set_bit (components, regno);
> +
> +  for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
> +    if (!fixed_regs[regno]
> +        && !crtl->abi->clobbers_full_reg_p (regno)
> +        && (TEST_HARD_REG_BIT (extra_caller_saves, regno) ||
> +            bitmap_bit_p (used, regno)))
> +      bitmap_set_bit (components, regno);
> +
> +  return components;
> +}
> +
> +static void
> +loongarch_disqualify_components (sbitmap, edge, sbitmap, bool)
> +{
> +  /* Do nothing.  */
> +}
> +
> +static void
> +loongarch_process_components (sbitmap components, loongarch_save_restore_fn fn)
> +{
> +  HOST_WIDE_INT offset = cfun->machine->frame.gp_sp_offset;
> +
> +  for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
> +    if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
> +      {
> +        if (bitmap_bit_p (components, regno))
> +          loongarch_save_restore_reg (word_mode, regno, offset, fn);
> +
> +        offset -= UNITS_PER_WORD;
> +      }
> +
> +  offset = cfun->machine->frame.fp_sp_offset;
> +  machine_mode mode = TARGET_DOUBLE_FLOAT ? DFmode : SFmode;
> +
> +  for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
> +    if (BITSET_P (cfun->machine->frame.fmask, regno - FP_REG_FIRST))
> +      {
> +        if (bitmap_bit_p (components, regno))
> +          loongarch_save_restore_reg (mode, regno, offset, fn);
> +
> +        offset -= UNITS_PER_FPREG;
> +      }
> +}
> +
> +static void
> +loongarch_emit_prologue_components (sbitmap components)
> +{
> +  loongarch_process_components (components, loongarch_save_reg);
> +}
> +
> +static void
> +loongarch_emit_epilogue_components (sbitmap components)
> +{
> +  loongarch_process_components (components, loongarch_restore_reg);
> +}
> +
> +static void
> +loongarch_set_handled_components (sbitmap components)
> +{
> +  for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
> +    if (bitmap_bit_p (components, regno))
> +      cfun->machine->reg_is_wrapped_separately[regno] = true;
> +
> +  for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
> +    if (bitmap_bit_p (components, regno))
> +      cfun->machine->reg_is_wrapped_separately[regno] = true;
> +}
> +
>  /* Initialize the GCC target structure.  */
>  #undef TARGET_ASM_ALIGNED_HI_OP
>  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> @@ -6841,6 +6991,29 @@ loongarch_asan_shadow_offset (void)
>  #undef TARGET_ASAN_SHADOW_OFFSET
>  #define TARGET_ASAN_SHADOW_OFFSET loongarch_asan_shadow_offset
>
> +#undef TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS
> +#define TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS \
> +  loongarch_get_separate_components
> +
> +#undef TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB
> +#define TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB loongarch_components_for_bb
> +
> +#undef TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS
> +#define TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS \
> +  loongarch_disqualify_components
> +
> +#undef TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS
> +#define TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS \
> +  loongarch_emit_prologue_components
> +
> +#undef TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS
> +#define TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS \
> +  loongarch_emit_epilogue_components
> +
> +#undef TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS
> +#define TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS \
> +  loongarch_set_handled_components
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>
>  #include "gt-loongarch.h"
> diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
> index a9eff6a81bd..829acdaa9be 100644
> --- a/gcc/config/loongarch/loongarch.h
> +++ b/gcc/config/loongarch/loongarch.h
> @@ -1147,6 +1147,8 @@ struct GTY (()) machine_function
>    /* The current frame information, calculated by loongarch_compute_frame_info.
>     */
>    struct loongarch_frame_info frame;
> +
> +  bool reg_is_wrapped_separately[FIRST_PSEUDO_REGISTER];
>  };
>  #endif
>
> diff --git a/gcc/testsuite/gcc.target/loongarch/shrink-wrap.c b/gcc/testsuite/gcc.target/loongarch/shrink-wrap.c
> new file mode 100644
> index 00000000000..f2c867a2769
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/shrink-wrap.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fshrink-wrap" } */
> +
> +/* f(x) should do nothing if x is 0.  */
> +/* { dg-final { scan-assembler "bnez\t\\\$r4,\[^\n\]*\n\tjr\t\\\$r1" } } */
> +
> +void g(void);
> +
> +void
> +f(int x)
> +{
> +  if (x)
> +    {
> +      register int s0 asm("s0") = x;
> +      register int s1 asm("s1") = x;
> +      register int s2 asm("s2") = x;
> +      asm("" : : "r"(s0));
> +      asm("" : : "r"(s1));
> +      asm("" : : "r"(s2));
> +      g();
> +    }
> +}
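
For anyone reading the offset checks in loongarch_get_separate_components in the quoted patch, here is a minimal standalone sketch of the same walk over the save slots. It is not GCC code. The names wrappable_gprs, wrappable_fprs, fits_imm12, SLOT_BYTES and RA_REGNO are hypothetical, the 8-byte slot size assumes LA64 with double-float registers, and the frame pointer case is left out. It only illustrates which registers would remain candidates for separate wrapping under the limits stated in the patch comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLOT_BYTES 8   /* assumption: LA64, one 8-byte slot per saved register */
#define RA_REGNO 1     /* $r1 holds the return address */

/* True if X fits a signed 12-bit immediate.  */
static bool
fits_imm12 (int64_t x)
{
  return x >= -2048 && x <= 2047;
}

/* Walk the saved general registers downwards from TOP_OFFSET, one slot
   per set bit in SAVED_MASK, and keep those whose slot offset is below
   32768 (the limit the quoted patch gives for ldptr/stptr).  The return
   address register is never wrapped separately.  */
static uint32_t
wrappable_gprs (uint32_t saved_mask, int64_t top_offset)
{
  assert (top_offset % SLOT_BYTES == 0);

  uint32_t out = 0;
  int64_t offset = top_offset;
  for (unsigned regno = 0; regno < 32; regno++)
    if (saved_mask & (UINT32_C (1) << regno))
      {
        if (offset < 32768)
          out |= UINT32_C (1) << regno;
        offset -= SLOT_BYTES;
      }

  return out & ~(UINT32_C (1) << RA_REGNO);
}

/* Same walk for floating-point registers, which the quoted patch only
   wraps when the slot offset fits an imm12.  */
static uint32_t
wrappable_fprs (uint32_t saved_mask, int64_t top_offset)
{
  uint32_t out = 0;
  int64_t offset = top_offset;
  for (unsigned regno = 0; regno < 32; regno++)
    if (saved_mask & (UINT32_C (1) << regno))
      {
        if (fits_imm12 (offset))
          out |= UINT32_C (1) << regno;
        offset -= SLOT_BYTES;
      }

  return out;
}
```

With $r23..$r25 (s0..s2) saved and the first slot at offset 32776, only $r25 lands below the 32768 limit, so only that register would stay a candidate in this sketch.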