Message-ID: <23119c5d-75a4-af2d-ad6e-8e125b0891f9@yahoo.co.jp>
Date: Fri, 27 Jan 2023 12:17:33 +0900
To: GCC Patches
Cc: Max Filippov
From: Takayuki 'January June' Suwa
Subject: [PATCH v6] xtensa: Eliminate the use of callee-saved register
 that saves and restores only once

In the CALL0 ABI, values that must survive a function call are placed
in the callee-saved registers (A12 through A15) and read back later.

Often, however, such a register is written exactly once and read
exactly once, each time by a simple register-register move.  In that
case the register can be eliminated and its stack slot used directly,
with two exceptions:

  i.  the register saved to/restored from is the stack pointer;
  ii. the function needs an additional stack pointer adjustment to
      grow the stack.

For example, if there are no other occurrences of register A14:

    ;; before
        ; prologue {
          ...
        s32i.n  a14, sp, 16
          ...                   ;; no frame pointer needed
                                ;; no additional stack growth
        ; } prologue
          ...
        mov.n   a14, a6         ;; A6 is not SP
          ...
        call0   foo
          ...
        mov.n   a8, a14         ;; A8 is not SP
          ...
        ; epilogue {
          ...
        l32i.n  a14, sp, 16
          ...
        ; } epilogue

the code can be rewritten as:

    ;; after
        ; prologue {
          ...
        (no save needed)
          ...
        ; } prologue
          ...
        s32i.n  a6, sp, 16      ;; replaced with A14's slot
          ...
        call0   foo
          ...
        l32i.n  a8, sp, 16      ;; through SP
          ...
        ; epilogue {
          ...
        (no restoration needed)
          ...
        ; } epilogue

This patch adds the above logic to the function prologue/epilogue RTL
expander code.

gcc/ChangeLog:

        * config/xtensa/xtensa.cc (machine_function): Add new member
        'eliminated_callee_saved_bmp'.
        (xtensa_can_eliminate_callee_saved_reg_p): New function to
        determine whether the register can be eliminated or not.
        (xtensa_expand_prologue): Call the above function and eliminate
        the use of a callee-saved register by using its stack slot
        directly through the stack pointer (or the frame pointer if
        needed).
        (xtensa_expand_epilogue): Modify to not emit the register
        restoration insn from its stack slot if the register is already
        eliminated.

gcc/testsuite/ChangeLog:

        * gcc.target/xtensa/elim_callee_saved.c: New.
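
The eligibility rule described above can be modeled outside of GCC as a small, self-contained C sketch. This is only an illustration under simplifying assumptions, not the code in the patch: `struct reg_ref`, `enum ref_kind`, `SP_REGNO` and `can_eliminate` are hypothetical names, each reference to the candidate register is reduced to three fields, and the separate "no additional stack growth" condition is assumed to be checked by the caller.

```c
#include <stdbool.h>
#include <stddef.h>

#define SP_REGNO 1  /* assumption: a1 serves as the stack pointer */

enum ref_kind { REF_DEF, REF_USE };

/* Hypothetical, simplified stand-in for one reference (definition or
   use) of the candidate callee-saved register.  */
struct reg_ref
{
  enum ref_kind kind;
  bool is_reg_move;   /* insn is a plain single-register move */
  int other_regno;    /* register on the other side of the move */
};

/* Return true if the register is written exactly once and read
   exactly once, both times by a register-register move whose other
   operand is not the stack pointer.  */
static bool
can_eliminate (const struct reg_ref *refs, size_t n)
{
  size_t defs = 0, uses = 0;

  for (size_t i = 0; i < n; i++)
    {
      /* Any reference that is not a simple move, or that moves from/to
         the stack pointer, disqualifies the register outright.  */
      if (!refs[i].is_reg_move || refs[i].other_regno == SP_REGNO)
        return false;
      if (refs[i].kind == REF_DEF)
        defs++;
      else
        uses++;
    }

  return defs == 1 && uses == 1;
}
```

With the "before" example above, A14 has one definition (`mov.n a14, a6`) and one use (`mov.n a8, a14`), so an array holding `{ REF_DEF, true, 6 }` and `{ REF_USE, true, 8 }` would be accepted, while adding a second use or replacing A6 with the stack pointer would be rejected.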
---
 gcc/config/xtensa/xtensa.cc                   | 132 ++++++++++++++----
 .../gcc.target/xtensa/elim_callee_saved.c     |  38 +++++
 2 files changed, 145 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c

diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 3e2e22d4cbe..ff59c933d4d 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -105,6 +105,7 @@ struct GTY(()) machine_function
   bool epilogue_done;
   bool inhibit_logues_a1_adjusts;
   rtx last_logues_a9_content;
+  HOST_WIDE_INT eliminated_callee_saved_bmp;
 };
 
 static void xtensa_option_override (void);
@@ -3343,6 +3344,66 @@ xtensa_emit_adjust_stack_ptr (HOST_WIDE_INT offset, int flags)
   cfun->machine->last_logues_a9_content = GEN_INT (offset);
 }
 
+static bool
+xtensa_can_eliminate_callee_saved_reg_p (unsigned int regno,
+                                         rtx_insn **p_insnS,
+                                         rtx_insn **p_insnR)
+{
+  df_ref ref;
+  rtx_insn *insn, *insnS = NULL, *insnR = NULL;
+  rtx pattern;
+
+  if (!optimize || !df || call_used_or_fixed_reg_p (regno))
+    return false;
+
+  for (ref = DF_REG_DEF_CHAIN (regno);
+       ref; ref = DF_REF_NEXT_REG (ref))
+    if (DF_REF_CLASS (ref) != DF_REF_REGULAR
+        || DEBUG_INSN_P (insn = DF_REF_INSN (ref)))
+      continue;
+    else if (GET_CODE (pattern = PATTERN (insn)) == SET
+             && REG_P (SET_DEST (pattern))
+             && REGNO (SET_DEST (pattern)) == regno
+             && REG_NREGS (SET_DEST (pattern)) == 1
+             && REG_P (SET_SRC (pattern))
+             && REGNO (SET_SRC (pattern)) != A1_REG)
+      {
+        if (insnS)
+          return false;
+        insnS = insn;
+        continue;
+      }
+    else
+      return false;
+
+  for (ref = DF_REG_USE_CHAIN (regno);
+       ref; ref = DF_REF_NEXT_REG (ref))
+    if (DF_REF_CLASS (ref) != DF_REF_REGULAR
+        || DEBUG_INSN_P (insn = DF_REF_INSN (ref)))
+      continue;
+    else if (GET_CODE (pattern = PATTERN (insn)) == SET
+             && REG_P (SET_SRC (pattern))
+             && REGNO (SET_SRC (pattern)) == regno
+             && REG_NREGS (SET_SRC (pattern)) == 1
+             && REG_P (SET_DEST (pattern))
+             && REGNO (SET_DEST (pattern)) != A1_REG)
+      {
+        if (insnR)
+          return false;
+        insnR = insn;
+        continue;
+      }
+    else
+      return false;
+
+  if (!insnS || !insnR)
+    return false;
+
+  *p_insnS = insnS, *p_insnR = insnR;
+
+  return true;
+}
+
 /* minimum frame = reg save area (4 words) plus static chain (1 word)
    and the total number of words must be a multiple of 128 bits.  */
 #define MIN_FRAME_SIZE (8 * UNITS_PER_WORD)
@@ -3382,6 +3443,7 @@ xtensa_expand_prologue (void)
       df_ref ref;
       bool stack_pointer_needed = frame_pointer_needed
                                   || crtl->calls_eh_return;
+      bool large_stack_needed;
 
       /* Check if the function body really needs the stack pointer.  */
       if (!stack_pointer_needed && df)
@@ -3430,23 +3492,41 @@ xtensa_expand_prologue (void)
             }
         }
 
+      large_stack_needed = total_size > 1024
+                           || (!callee_save_size && total_size > 128);
       for (regno = 0; regno < FIRST_PSEUDO_REGISTER; ++regno)
-        {
-          if (xtensa_call_save_reg(regno))
-            {
-              rtx x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, GEN_INT (offset));
-              rtx mem = gen_frame_mem (SImode, x);
-              rtx reg = gen_rtx_REG (SImode, regno);
+        if (xtensa_call_save_reg(regno))
+          {
+            rtx x = gen_rtx_PLUS (Pmode,
+                                  stack_pointer_rtx, GEN_INT (offset));
+            rtx mem = gen_frame_mem (SImode, x);
+            rtx_insn *insnS, *insnR;
+
+            if (!large_stack_needed
+                && xtensa_can_eliminate_callee_saved_reg_p (regno,
+                                                            &insnS, &insnR))
+              {
+                if (frame_pointer_needed)
+                  mem = replace_rtx (mem, stack_pointer_rtx,
+                                     hard_frame_pointer_rtx);
+                SET_DEST (PATTERN (insnS)) = mem;
+                df_insn_rescan (insnS);
+                SET_SRC (PATTERN (insnR)) = copy_rtx (mem);
+                df_insn_rescan (insnR);
+                cfun->machine->eliminated_callee_saved_bmp |= 1 << regno;
+              }
+            else
+              {
+                rtx reg = gen_rtx_REG (SImode, regno);
 
-              offset -= UNITS_PER_WORD;
-              insn = emit_move_insn (mem, reg);
-              RTX_FRAME_RELATED_P (insn) = 1;
-              add_reg_note (insn, REG_FRAME_RELATED_EXPR,
-                            gen_rtx_SET (mem, reg));
-            }
-        }
-      if (total_size > 1024
-          || (!callee_save_size && total_size > 128))
+                insn = emit_move_insn (mem, reg);
+                RTX_FRAME_RELATED_P (insn) = 1;
+                add_reg_note (insn, REG_FRAME_RELATED_EXPR,
+                              gen_rtx_SET (mem, reg));
+              }
+            offset -= UNITS_PER_WORD;
+          }
+      if (large_stack_needed)
         xtensa_emit_adjust_stack_ptr (callee_save_size - total_size,
                                       ADJUST_SP_NEED_NOTE);
     }
@@ -3535,16 +3615,18 @@ xtensa_expand_epilogue (bool sibcall_p)
         emit_insn (gen_blockage ());
 
       for (regno = 0; regno < FIRST_PSEUDO_REGISTER; ++regno)
-        {
-          if (xtensa_call_save_reg(regno))
-            {
-              rtx x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, GEN_INT (offset));
-
-              offset -= UNITS_PER_WORD;
-              emit_move_insn (gen_rtx_REG (SImode, regno),
-                              gen_frame_mem (SImode, x));
-            }
-        }
+        if (xtensa_call_save_reg(regno))
+          {
+            if (! (cfun->machine->eliminated_callee_saved_bmp
+                   & (1 << regno)))
+              {
+                rtx x = gen_rtx_PLUS (Pmode,
+                                      stack_pointer_rtx, GEN_INT (offset));
+                emit_move_insn (gen_rtx_REG (SImode, regno),
+                                gen_frame_mem (SImode, x));
+              }
+            offset -= UNITS_PER_WORD;
+          }
 
       if (sibcall_p)
         emit_use (gen_rtx_REG (SImode, A0_REG));
diff --git a/gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c b/gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c
new file mode 100644
index 00000000000..cd3d6b9f249
--- /dev/null
+++ b/gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mabi=call0" } */
+
+extern void foo(void);
+
+/* eliminated one register (the reservoir of variable 'a') by its stack slot through the stack pointer.  */
+int test0(int a) {
+  int array[252];  /* the maximum bound of non-large stack.  */
+  foo();
+  asm volatile("" : : "m"(array));
+  return a;
+}
+
+/* cannot eliminate if large stack is needed, because the offset from TOS cannot fit into single L32I/S32I instruction.  */
+int test1(int a) {
+  int array[10000];  /* requires large stack.  */
+  foo();
+  asm volatile("" : : "m"(array));
+  return a;
+}
+
+/* register A15 is the reservoir of the stack pointer and cannot be eliminated if the frame pointer is needed.
+   other registers still can be, but through the frame pointer rather than the stack pointer.  */
+int test2(int a) {
+  int* p = __builtin_alloca(16);
+  foo();
+  asm volatile("" : : "r"(p));
+  return a;
+}
+
+/* in -O0 the composite hard registers may still remain unsplit at pro_and_epilogue and must be excluded.  */
+extern double bar(void);
+int __attribute__((optimize(0))) test3(int a) {
+  return bar() + a;
+}
+
+/* { dg-final { scan-assembler-times "mov\t|mov.n\t" 21 } } */
+/* { dg-final { scan-assembler-times "a15, 8" 2 } } */
-- 
2.30.2