From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from omggw7022-vm1.mail.djm.yahoo.co.jp (omggw7022-vm1.mail.djm.yahoo.co.jp [183.79.55.72]) by sourceware.org (Postfix) with ESMTPS id 2AB9C3858D33 for ; Fri, 17 Feb 2023 04:28:26 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2AB9C3858D33 Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=yahoo.co.jp Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=yahoo.co.jp X-YMail-OSG: sH_zTUcVM1nF1VLIol9SEStjr5lgtLTdunnwqsYCcFosKyMdNHo1WTY8ao9cUcl 4JV0jHk.N4BoI.uk7ZWuct2J9QuFjB4G7QTkoylK.i74v4oTkEcA9I_qdBAfEYzjSlYqHAWtyEP3 8Y8GmYtzmrfUBAtlcAe7HJl6TH5ZptFSzu.rQ3bSRHc68e4uHrvC3AEQIi314vrhCmk5vft.WQqf egxIQ1CAVGsES_fsxxP5Iv3.uAIAKeADj_7oaP020GHP3QtW90C2e8o3oFgo5TKcMMIaxR6sTvr3 TD4uGmPaud_huoI3DqpZv66ulw6lsw5jvsTtAX1WKTBZZs6PrMQ0CuplB8d4eCqmoKtTrR4uGIUD Z9ecc4spzYE65W6JRW9kBVcIDv8O1Ae3DK3LS_RlesLsotw1D8Wm8myUS6Nm7RyzFzxU2pQnRORE 0P2rOl.6vt73heMSfXgQgKAWO4Yjr.EJB.GVEXMPpe3lZXLsIT978dBmVNzqjhrRiaHRSQNhqdI9 GxGpwXWmi92bMrHLzaKCQhspaggR20_3huz1..QL0qnB1cmq_Dl4cGeKSIo411L732B7GN_Ay0NP R4UMcpPTllG8R.Na.bGkMPCl87g54xYaEQxRHRMAUE0XJ7P0.Gd6nLp2.56J9.fL6HIr2e1pgI1Z 4mbhrso3IZGtH25stulaNxeQv02oMpEx_akmwb9TE56Kpv5GrgtmthYU7whGnsl9p52EDmH84AoF CmEYw_AmjjRLDqdyPNotA_ST3k67g7SCH57bGKG4_aUCIFBHL_pJQK87sayprj_LEApxHDWOF.zu tyXiTylX067iBnhXxRrLB.2pIB2Bh.qo6Lj.MYLKAl.y_tc53ImObryd1nfgOuofH74Oa4TM8.6e wfXG2Ylz2iSwhbHolpTle4joZQGBioJdlYbPwYcNxU8kxqwruview4Sl5ji0GDTzlQo.T8I2hrD3 1Pls- Received: from sonicgw.mail.yahoo.co.jp by sonicconh5001.mail.kks.yahoo.co.jp with HTTP; Fri, 17 Feb 2023 04:28:22 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1676608102; s=yj20110701; d=yahoo.co.jp; h=Message-ID:Date:MIME-Version:Subject:To:References:Cc:From:In-Reply-To:Content-Type:Content-Transfer-Encoding; bh=VN+LOJ0GR7D3JtTD8BVRZune5T/AV/EejAWUb9Ns8a8=; b=Hud/qqqoT7zaQkD71m/CjwjJjgw1bQxXOxBkVKRCvMUlQmaOOQI/nll8PdjuftBr 02+mC/IySzNWVcEaG6Gv9sMsWQ4HqIkf7EcN+PTwfDbhnlyTLB/QnWVXvZgekqRb9EM fu4rHV84x0YP3dq7q0Y7mlHyjRWI45k4qRwIOk4k= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=yj20110701; d=yahoo.co.jp; h=Message-ID:Date:MIME-Version:References:Cc:From:In-Reply-To:Content-Type:Content-Transfer-Encoding; b=BCWO4HV1Z62lDvBcn19aajc1aDLkcMlnIIl23wuFjQBgYyn9SVPkTvS1h2FA0XEZ KdJsuELt2VI1yOXXZQ0neWp2OMVJKZTy0XvbQodiAZ1Vcy/C5QZ0Oiys8G2Std+VYQ/ Q2KYvYhLBqDmR/51jb3+iGw1wZE5aRRFQbZWl9hE=; Received: by smtphe5005.mail.kks.ynwp.yahoo.co.jp (YJ Hermes SMTP Server) with ESMTPA ID d67e129100f3451f380381e1550df6a9; Fri, 17 Feb 2023 13:28:19 +0900 (JST) Message-ID: Date: Fri, 17 Feb 2023 13:28:15 +0900 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.8.0 Subject: Re: [PATCH v6] xtensa: Eliminate the use of callee-saved register that saves and restores only once To: Max Filippov References: <23119c5d-75a4-af2d-ad6e-8e125b0891f9.ref@yahoo.co.jp> <23119c5d-75a4-af2d-ad6e-8e125b0891f9@yahoo.co.jp> Cc: GCC Patches From: Takayuki 'January June' Suwa In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-13.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 2023/02/16 7:18, Max Filippov wrote: > Hi Suwa-san, Hi! > > On Thu, Jan 26, 2023 at 7:17 PM Takayuki 'January June' Suwa > wrote: >> >> In the case of the CALL0 ABI, values that must be retained before and >> after function calls are placed in the callee-saved registers (A12 >> through A15) and referenced later. However, it is often the case that >> the save and the reference are each only once and a simple register- >> register move (with two exceptions; i. the register saved to/restored >> from is the stack pointer, ii. the function needs an additional stack >> pointer adjustment to grow the stack). >> >> e.g. in the following example, if there are no other occurrences of >> register A14: >> >> ;; before >> ; prologue { >> ... >> s32i.n a14, sp, 16 >> ... ;; no frame pointer needed >> ;; no additional stack growth >> ; } prologue >> ... >> mov.n a14, a6 ;; A6 is not SP >> ... >> call0 foo >> ... >> mov.n a8, a14 ;; A8 is not SP >> ... >> ; epilogue { >> ... >> l32i.n a14, sp, 16 >> ... >> ; } epilogue >> >> It can be possible like this: >> >> ;; after >> ; prologue { >> ... >> (no save needed) >> ... >> ; } prologue >> ... >> s32i.n a6, sp, 16 ;; replaced with A14's slot >> ... >> call0 foo >> ... >> l32i.n a8, sp, 16 ;; through SP >> ... >> ; epilogue { >> ... >> (no restoration needed) >> ... >> ; } epilogue >> >> This patch adds the abovementioned logic to the function prologue/epilogue >> RTL expander code. >> >> gcc/ChangeLog: >> >> * config/xtensa/xtensa.cc (machine_function): Add new member >> 'eliminated_callee_saved_bmp'. >> (xtensa_can_eliminate_callee_saved_reg_p): New function to >> determine whether the register can be eliminated or not. >> (xtensa_expand_prologue): Add invoking the above function and >> elimination the use of callee-saved register by using its stack >> slot through the stack pointer (or the frame pointer if needed) >> directly. >> (xtensa_expand_prologue): Modify to not emit register restoration >> insn from its stack slot if the register is already eliminated. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/xtensa/elim_callee_saved.c: New. >> --- >> gcc/config/xtensa/xtensa.cc | 132 ++++++++++++++---- >> .../gcc.target/xtensa/elim_callee_saved.c | 38 +++++ >> 2 files changed, 145 insertions(+), 25 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c > > This version passes regression tests, but I still have a couple questions. > >> diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc >> index 3e2e22d4cbe..ff59c933d4d 100644 >> --- a/gcc/config/xtensa/xtensa.cc >> +++ b/gcc/config/xtensa/xtensa.cc >> @@ -105,6 +105,7 @@ struct GTY(()) machine_function >> bool epilogue_done; >> bool inhibit_logues_a1_adjusts; >> rtx last_logues_a9_content; >> + HOST_WIDE_INT eliminated_callee_saved_bmp; >> }; >> >> static void xtensa_option_override (void); >> @@ -3343,6 +3344,66 @@ xtensa_emit_adjust_stack_ptr (HOST_WIDE_INT offset, int flags) >> cfun->machine->last_logues_a9_content = GEN_INT (offset); >> } >> >> +static bool >> +xtensa_can_eliminate_callee_saved_reg_p (unsigned int regno, >> + rtx_insn **p_insnS, >> + rtx_insn **p_insnR) >> +{ >> + df_ref ref; >> + rtx_insn *insn, *insnS = NULL, *insnR = NULL; >> + rtx pattern; >> + >> + if (!optimize || !df || call_used_or_fixed_reg_p (regno)) >> + return false; >> + >> + for (ref = DF_REG_DEF_CHAIN (regno); >> + ref; ref = DF_REF_NEXT_REG (ref)) >> + if (DF_REF_CLASS (ref) != DF_REF_REGULAR >> + || DEBUG_INSN_P (insn = DF_REF_INSN (ref))) >> + continue; >> + else if (GET_CODE (pattern = PATTERN (insn)) == SET >> + && REG_P (SET_DEST (pattern)) >> + && REGNO (SET_DEST (pattern)) == regno >> + && REG_NREGS (SET_DEST (pattern)) == 1 >> + && REG_P (SET_SRC (pattern)) >> + && REGNO (SET_SRC (pattern)) != A1_REG) > > Do I understand correctly that the check for A1 here and below is > for the case when regno is a hard frame pointer and the function > needs the frame pointer? If so, wouldn't it be better to check > for it explicitly in the beginning? I see. But I can't be sure that the body of the function never saves and restores the stack pointer to another register if the function doesn't need the frame pointer. Therefore, I think that the validity depends on the regtest. > >> + { >> + if (insnS) >> + return false; >> + insnS = insn; >> + continue; >> + } >> + else >> + return false; >> + >> + for (ref = DF_REG_USE_CHAIN (regno); >> + ref; ref = DF_REF_NEXT_REG (ref)) >> + if (DF_REF_CLASS (ref) != DF_REF_REGULAR >> + || DEBUG_INSN_P (insn = DF_REF_INSN (ref))) >> + continue; >> + else if (GET_CODE (pattern = PATTERN (insn)) == SET >> + && REG_P (SET_SRC (pattern)) >> + && REGNO (SET_SRC (pattern)) == regno >> + && REG_NREGS (SET_SRC (pattern)) == 1 >> + && REG_P (SET_DEST (pattern)) >> + && REGNO (SET_DEST (pattern)) != A1_REG) >> + { >> + if (insnR) >> + return false; >> + insnR = insn; >> + continue; >> + } >> + else >> + return false; >> + >> + if (!insnS || !insnR) >> + return false; >> + >> + *p_insnS = insnS, *p_insnR = insnR; >> + >> + return true; >> +} > > [...] > >> diff --git a/gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c b/gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c >> new file mode 100644 >> index 00000000000..cd3d6b9f249 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c >> @@ -0,0 +1,38 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -mabi=call0" } */ >> + >> +extern void foo(void); >> + >> +/* eliminated one register (the reservoir of variable 'a') by its stack slot through the stack pointer. */ >> +int test0(int a) { >> + int array[252]; /* the maximum bound of non-large stack. */ >> + foo(); >> + asm volatile("" : : "m"(array)); >> + return a; >> +} >> + >> +/* cannot eliminate if large stack is needed, because the offset from TOS cannot fit into single L32I/S32I instruction. */ >> +int test1(int a) { >> + int array[10000]; /* requires large stack. */ >> + foo(); >> + asm volatile("" : : "m"(array)); >> + return a; >> +} >> + >> +/* register A15 is the reservoir of the stack pointer and cannot be eliminated if the frame pointer is needed. >> + other registers still can be, but through the frame pointer rather the stack pointer. */ >> +int test2(int a) { >> + int* p = __builtin_alloca(16); >> + foo(); >> + asm volatile("" : : "r"(p)); >> + return a; >> +} >> + >> +/* in -O0 the composite hard registers may still remain unsplitted at pro_and_epilogue and must be excluded. */ >> +extern double bar(void); >> +int __attribute__((optimize(0))) test3(int a) { >> + return bar() + a; >> +} >> + >> +/* { dg-final { scan-assembler-times "mov\t|mov.n\t" 21 } } */ > > This test looks quite fragile as the number of movs would vary > when the testsuite is run with additional options. I totally agree with your point. > >> +/* { dg-final { scan-assembler-times "a15, 8" 2 } } */ >> -- >> 2.30.2