From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pj1-x102a.google.com (mail-pj1-x102a.google.com [IPv6:2607:f8b0:4864:20::102a]) by sourceware.org (Postfix) with ESMTPS id D8F8A3858D33 for ; Wed, 15 Feb 2023 22:18:26 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D8F8A3858D33 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-pj1-x102a.google.com with SMTP id w14-20020a17090a5e0e00b00233d3b9650eso3816218pjf.4 for ; Wed, 15 Feb 2023 14:18:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=W0MWwY3APbVNI1q/D6I6xCm1Yexw3PmOPCOkboFW3+o=; b=SmGqn6O3EZv7GgAMJvA8TC+7M8m9q52zE9ec5leT4+zzTTAS9D1kHlBHTKsG44xlKQ 6ChXAQpTlDQLHob6McS95WdgE9t6yVQ3xND5OqcUL+nQ5o/vAHT6K/vrtVr60zlgtf/J jTKiUOH3ost092NzIPdxSkWoyqqsWuHWgUh/IhbkTp5O3UW9d5hqU+1gGQwEl5bPtB+C tigMRn1CzqwlyK0xlWNj9QJl1Q3C5+aRAYEm33hOuzaQZuynysxEwEw6p4yjQ8Z0sj2n x3FARi5lyIL5yaghfpuoW3tHxPFANDfFGzKr3GGqwGUck6Uteg+ax8j3OVmHHexWAhzZ B93g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=W0MWwY3APbVNI1q/D6I6xCm1Yexw3PmOPCOkboFW3+o=; b=q9nFD6MDrSr5hfJf1QzmCxK4+kePAdKRvHUsKKDPCPpQK9mwhR4uMPxtbPRVJMdrZr MPv+9mQuUDjJUoFGJTHsNz7jzRPjRz8L1hcqv10XZo3zAiAAfxOvG3IftwAcP34Mzuhs idpsu9rRxxhaQlfujdvv0oS6oYGJu2Lk3teRhWQo88QDEYEviFju6BWIPe7CxA1Php/o onSYMLHAQ0WCwBzm5224s9QIpGPpRW1mZctotmJrbCuBorJfDcDKbL0ddiM8rmNG6ros 9qJpAz7Sw0Ln5H++dCCDnjBfR/YYUbqJJH9rlzOxBxbY/MD8ZP+ELoeblogv8r/LdsZV Zjrg== X-Gm-Message-State: AO0yUKUJwtEKvHK1QeY4B4HjEZT8AOQO+5iysCc/1e2jOIpqRNTMYP1E AAslSJ0jERREnfvyksCjAilGUFz8qipT4Fe+8bw= X-Google-Smtp-Source: AK7set9C3OA7jlilDIFfs917jk/wSACl9A57SZWnd9854JrYskQvwqXdcV3bd+N6qsgtZmxoumxw62qKPnt6ZwWG1hM= X-Received: by 2002:a17:90b:2789:b0:234:80e4:81c0 with SMTP id pw9-20020a17090b278900b0023480e481c0mr5947pjb.94.1676499502582; Wed, 15 Feb 2023 14:18:22 -0800 (PST) MIME-Version: 1.0 References: <23119c5d-75a4-af2d-ad6e-8e125b0891f9.ref@yahoo.co.jp> <23119c5d-75a4-af2d-ad6e-8e125b0891f9@yahoo.co.jp> In-Reply-To: <23119c5d-75a4-af2d-ad6e-8e125b0891f9@yahoo.co.jp> From: Max Filippov Date: Wed, 15 Feb 2023 14:18:11 -0800 Message-ID: Subject: Re: [PATCH v6] xtensa: Eliminate the use of callee-saved register that saves and restores only once To: "Takayuki 'January June' Suwa" Cc: GCC Patches Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-6.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,FROM_LOCAL_NOVOWEL,GIT_PATCH_0,HK_RANDOM_ENVFROM,HK_RANDOM_FROM,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Suwa-san, On Thu, Jan 26, 2023 at 7:17 PM Takayuki 'January June' Suwa wrote: > > In the case of the CALL0 ABI, values that must be retained before and > after function calls are placed in the callee-saved registers (A12 > through A15) and referenced later. However, it is often the case that > the save and the reference are each only once and a simple register- > register move (with two exceptions; i. the register saved to/restored > from is the stack pointer, ii. the function needs an additional stack > pointer adjustment to grow the stack). > > e.g. in the following example, if there are no other occurrences of > register A14: > > ;; before > ; prologue { > ... > s32i.n a14, sp, 16 > ... ;; no frame pointer needed > ;; no additional stack growth > ; } prologue > ... > mov.n a14, a6 ;; A6 is not SP > ... > call0 foo > ... > mov.n a8, a14 ;; A8 is not SP > ... > ; epilogue { > ... > l32i.n a14, sp, 16 > ... > ; } epilogue > > It can be possible like this: > > ;; after > ; prologue { > ... > (no save needed) > ... > ; } prologue > ... > s32i.n a6, sp, 16 ;; replaced with A14's slot > ... > call0 foo > ... > l32i.n a8, sp, 16 ;; through SP > ... > ; epilogue { > ... > (no restoration needed) > ... > ; } epilogue > > This patch adds the abovementioned logic to the function prologue/epilogue > RTL expander code. > > gcc/ChangeLog: > > * config/xtensa/xtensa.cc (machine_function): Add new member > 'eliminated_callee_saved_bmp'. > (xtensa_can_eliminate_callee_saved_reg_p): New function to > determine whether the register can be eliminated or not. > (xtensa_expand_prologue): Add invoking the above function and > elimination the use of callee-saved register by using its stack > slot through the stack pointer (or the frame pointer if needed) > directly. > (xtensa_expand_prologue): Modify to not emit register restoration > insn from its stack slot if the register is already eliminated. > > gcc/testsuite/ChangeLog: > > * gcc.target/xtensa/elim_callee_saved.c: New. > --- > gcc/config/xtensa/xtensa.cc | 132 ++++++++++++++---- > .../gcc.target/xtensa/elim_callee_saved.c | 38 +++++ > 2 files changed, 145 insertions(+), 25 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c This version passes regression tests, but I still have a couple questions. > diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc > index 3e2e22d4cbe..ff59c933d4d 100644 > --- a/gcc/config/xtensa/xtensa.cc > +++ b/gcc/config/xtensa/xtensa.cc > @@ -105,6 +105,7 @@ struct GTY(()) machine_function > bool epilogue_done; > bool inhibit_logues_a1_adjusts; > rtx last_logues_a9_content; > + HOST_WIDE_INT eliminated_callee_saved_bmp; > }; > > static void xtensa_option_override (void); > @@ -3343,6 +3344,66 @@ xtensa_emit_adjust_stack_ptr (HOST_WIDE_INT offset, int flags) > cfun->machine->last_logues_a9_content = GEN_INT (offset); > } > > +static bool > +xtensa_can_eliminate_callee_saved_reg_p (unsigned int regno, > + rtx_insn **p_insnS, > + rtx_insn **p_insnR) > +{ > + df_ref ref; > + rtx_insn *insn, *insnS = NULL, *insnR = NULL; > + rtx pattern; > + > + if (!optimize || !df || call_used_or_fixed_reg_p (regno)) > + return false; > + > + for (ref = DF_REG_DEF_CHAIN (regno); > + ref; ref = DF_REF_NEXT_REG (ref)) > + if (DF_REF_CLASS (ref) != DF_REF_REGULAR > + || DEBUG_INSN_P (insn = DF_REF_INSN (ref))) > + continue; > + else if (GET_CODE (pattern = PATTERN (insn)) == SET > + && REG_P (SET_DEST (pattern)) > + && REGNO (SET_DEST (pattern)) == regno > + && REG_NREGS (SET_DEST (pattern)) == 1 > + && REG_P (SET_SRC (pattern)) > + && REGNO (SET_SRC (pattern)) != A1_REG) Do I understand correctly that the check for A1 here and below is for the case when regno is a hard frame pointer and the function needs the frame pointer? If so, wouldn't it be better to check for it explicitly in the beginning? > + { > + if (insnS) > + return false; > + insnS = insn; > + continue; > + } > + else > + return false; > + > + for (ref = DF_REG_USE_CHAIN (regno); > + ref; ref = DF_REF_NEXT_REG (ref)) > + if (DF_REF_CLASS (ref) != DF_REF_REGULAR > + || DEBUG_INSN_P (insn = DF_REF_INSN (ref))) > + continue; > + else if (GET_CODE (pattern = PATTERN (insn)) == SET > + && REG_P (SET_SRC (pattern)) > + && REGNO (SET_SRC (pattern)) == regno > + && REG_NREGS (SET_SRC (pattern)) == 1 > + && REG_P (SET_DEST (pattern)) > + && REGNO (SET_DEST (pattern)) != A1_REG) > + { > + if (insnR) > + return false; > + insnR = insn; > + continue; > + } > + else > + return false; > + > + if (!insnS || !insnR) > + return false; > + > + *p_insnS = insnS, *p_insnR = insnR; > + > + return true; > +} [...] > diff --git a/gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c b/gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c > new file mode 100644 > index 00000000000..cd3d6b9f249 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c > @@ -0,0 +1,38 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mabi=call0" } */ > + > +extern void foo(void); > + > +/* eliminated one register (the reservoir of variable 'a') by its stack slot through the stack pointer. */ > +int test0(int a) { > + int array[252]; /* the maximum bound of non-large stack. */ > + foo(); > + asm volatile("" : : "m"(array)); > + return a; > +} > + > +/* cannot eliminate if large stack is needed, because the offset from TOS cannot fit into single L32I/S32I instruction. */ > +int test1(int a) { > + int array[10000]; /* requires large stack. */ > + foo(); > + asm volatile("" : : "m"(array)); > + return a; > +} > + > +/* register A15 is the reservoir of the stack pointer and cannot be eliminated if the frame pointer is needed. > + other registers still can be, but through the frame pointer rather the stack pointer. */ > +int test2(int a) { > + int* p = __builtin_alloca(16); > + foo(); > + asm volatile("" : : "r"(p)); > + return a; > +} > + > +/* in -O0 the composite hard registers may still remain unsplitted at pro_and_epilogue and must be excluded. */ > +extern double bar(void); > +int __attribute__((optimize(0))) test3(int a) { > + return bar() + a; > +} > + > +/* { dg-final { scan-assembler-times "mov\t|mov.n\t" 21 } } */ This test looks quite fragile as the number of movs would vary when the testsuite is run with additional options. > +/* { dg-final { scan-assembler-times "a15, 8" 2 } } */ > -- > 2.30.2 -- Thanks. -- Max