From: Max Filippov <jcmvbkbc@gmail.com>
To: "Takayuki 'January June' Suwa" <jjsuwa_sys3175@yahoo.co.jp>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH v6] xtensa: Eliminate the use of callee-saved register that saves and restores only once
Date: Wed, 15 Feb 2023 14:18:11 -0800
Message-ID: <CAMo8BfLO7rDFyfBtTaCaZYPtTzct+bKa_VBpShak5yZuD6-UCA@mail.gmail.com>
In-Reply-To: <23119c5d-75a4-af2d-ad6e-8e125b0891f9@yahoo.co.jp>

Hi Suwa-san,

On Thu, Jan 26, 2023 at 7:17 PM Takayuki 'January June' Suwa
<jjsuwa_sys3175@yahoo.co.jp> wrote:
>
> In the case of the CALL0 ABI, values that must be retained across
> function calls are placed in the callee-saved registers (A12 through
> A15) and referenced later.  However, it is often the case that the save
> and the reference each occur only once, and each is a simple
> register-register move (with two exceptions: i. the register saved
> to/restored from is the stack pointer, ii. the function needs an
> additional stack pointer adjustment to grow the stack).
>
> e.g. in the following example, if there are no other occurrences of
> register A14:
>
> ;; before
>         ; prologue {
>   ...
>         s32i.n  a14, sp, 16
>   ...                           ;; no frame pointer needed
>                                 ;; no additional stack growth
>         ; } prologue
>   ...
>         mov.n   a14, a6         ;; A6 is not SP
>   ...
>         call0   foo
>   ...
>         mov.n   a8, a14         ;; A8 is not SP
>   ...
>         ; epilogue {
>   ...
>         l32i.n  a14, sp, 16
>   ...
>         ; } epilogue
>
> It can be rewritten like this:
>
> ;; after
>         ; prologue {
>   ...
>         (no save needed)
>   ...
>         ; } prologue
>   ...
>         s32i.n  a6, sp, 16      ;; replaced with A14's slot
>   ...
>         call0   foo
>   ...
>         l32i.n  a8, sp, 16      ;; through SP
>   ...
>         ; epilogue {
>   ...
>         (no restoration needed)
>   ...
>         ; } epilogue
>
> This patch adds the abovementioned logic to the function prologue/epilogue
> RTL expander code.
>
> gcc/ChangeLog:
>
>         * config/xtensa/xtensa.cc (machine_function): Add new member
>         'eliminated_callee_saved_bmp'.
>         (xtensa_can_eliminate_callee_saved_reg_p): New function to
>         determine whether the register can be eliminated or not.
>         (xtensa_expand_prologue): Invoke the above function and
>         eliminate the use of the callee-saved register by using its
>         stack slot through the stack pointer (or the frame pointer if
>         needed) directly.
>         (xtensa_expand_epilogue): Do not emit the register restoration
>         insn from its stack slot if the register is already eliminated.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/xtensa/elim_callee_saved.c: New.
> ---
>  gcc/config/xtensa/xtensa.cc                   | 132 ++++++++++++++----
>  .../gcc.target/xtensa/elim_callee_saved.c     |  38 +++++
>  2 files changed, 145 insertions(+), 25 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c

This version passes regression tests, but I still have a couple of questions.

> diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
> index 3e2e22d4cbe..ff59c933d4d 100644
> --- a/gcc/config/xtensa/xtensa.cc
> +++ b/gcc/config/xtensa/xtensa.cc
> @@ -105,6 +105,7 @@ struct GTY(()) machine_function
>    bool epilogue_done;
>    bool inhibit_logues_a1_adjusts;
>    rtx last_logues_a9_content;
> +  HOST_WIDE_INT eliminated_callee_saved_bmp;
>  };
>
>  static void xtensa_option_override (void);
> @@ -3343,6 +3344,66 @@ xtensa_emit_adjust_stack_ptr (HOST_WIDE_INT offset, int flags)
>      cfun->machine->last_logues_a9_content = GEN_INT (offset);
>  }
>
> +static bool
> +xtensa_can_eliminate_callee_saved_reg_p (unsigned int regno,
> +                                        rtx_insn **p_insnS,
> +                                        rtx_insn **p_insnR)
> +{
> +  df_ref ref;
> +  rtx_insn *insn, *insnS = NULL, *insnR = NULL;
> +  rtx pattern;
> +
> +  if (!optimize || !df || call_used_or_fixed_reg_p (regno))
> +    return false;
> +
> +  for (ref = DF_REG_DEF_CHAIN (regno);
> +       ref; ref = DF_REF_NEXT_REG (ref))
> +    if (DF_REF_CLASS (ref) != DF_REF_REGULAR
> +       || DEBUG_INSN_P (insn = DF_REF_INSN (ref)))
> +      continue;
> +    else if (GET_CODE (pattern = PATTERN (insn)) == SET
> +            && REG_P (SET_DEST (pattern))
> +            && REGNO (SET_DEST (pattern)) == regno
> +            && REG_NREGS (SET_DEST (pattern)) == 1
> +            && REG_P (SET_SRC (pattern))
> +            && REGNO (SET_SRC (pattern)) != A1_REG)

Do I understand correctly that the check for A1 here and below is
for the case when regno is a hard frame pointer and the function
needs the frame pointer? If so, wouldn't it be better to check
for it explicitly in the beginning?
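
A rough sketch of what such an explicit early check could look like,
written as a standalone model rather than as real GCC code.  Every name
here (fn_model, may_try_eliminate, HARD_FP_REGNO) is a hypothetical
stand-in; in xtensa.cc the corresponding pieces would presumably be
frame_pointer_needed and HARD_FRAME_POINTER_REGNUM.  It assumes A15 is
the hard frame pointer under CALL0, as the test comment in this patch
suggests, and it only applies if the reading of the A1 check above is
right.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC internals; not the real definitions.  */
enum { A12_REGNO = 12, A15_REGNO = 15 };
#define HARD_FP_REGNO A15_REGNO  /* assumed: A15 is the hard FP under CALL0 */

struct fn_model
{
  bool optimize;              /* models the 'optimize' flag */
  bool frame_pointer_needed;  /* models GCC's frame_pointer_needed */
};

/* Early rejections only; the def/use chain walk would follow these.  */
static bool
may_try_eliminate (const struct fn_model *fn, unsigned int regno)
{
  if (!fn->optimize)
    return false;

  /* Only the callee-saved registers A12..A15 are candidates.  */
  if (regno < A12_REGNO || regno > A15_REGNO)
    return false;

  /* Explicit check: when a frame pointer is needed, the hard frame
     pointer holds a copy of SP and must stay in its register.  */
  if (fn->frame_pointer_needed && regno == HARD_FP_REGNO)
    return false;

  return true;
}
```

With a check of this shape up front, the two A1_REG comparisons in the
chain walks might become unnecessary, but that depends on whether they
also guard against other cases.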

> +      {
> +       if (insnS)
> +         return false;
> +       insnS = insn;
> +       continue;
> +      }
> +    else
> +      return false;
> +
> +  for (ref = DF_REG_USE_CHAIN (regno);
> +       ref; ref = DF_REF_NEXT_REG (ref))
> +    if (DF_REF_CLASS (ref) != DF_REF_REGULAR
> +       || DEBUG_INSN_P (insn = DF_REF_INSN (ref)))
> +      continue;
> +    else if (GET_CODE (pattern = PATTERN (insn)) == SET
> +            && REG_P (SET_SRC (pattern))
> +            && REGNO (SET_SRC (pattern)) == regno
> +            && REG_NREGS (SET_SRC (pattern)) == 1
> +            && REG_P (SET_DEST (pattern))
> +            && REGNO (SET_DEST (pattern)) != A1_REG)
> +      {
> +       if (insnR)
> +         return false;
> +       insnR = insn;
> +       continue;
> +      }
> +    else
> +      return false;
> +
> +  if (!insnS || !insnR)
> +    return false;
> +
> +  *p_insnS = insnS, *p_insnR = insnR;
> +
> +  return true;
> +}
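
For reference, the condition the quoted function tests can be modelled
outside of GCC as below.  This is only a simplified sketch under stated
assumptions: move_model and single_save_restore_p are hypothetical
names, insns are reduced to plain register-to-register moves, and the
DF chains are replaced by a linear scan over an array.

```c
#include <stdbool.h>
#include <stddef.h>

enum { SP_REGNO = 1 };  /* A1 is the stack pointer on Xtensa */

/* Hypothetical, simplified insn.  is_simple_move is false for anything
   other than a single-register SET of one register from another.  */
struct move_model
{
  bool is_simple_move;
  unsigned int dest;
  unsigned int src;
};

/* Return true when REGNO is written by exactly one simple move and read
   by exactly one simple move, neither of which involves SP.  */
static bool
single_save_restore_p (const struct move_model *insns, size_t n,
                       unsigned int regno,
                       size_t *save_idx, size_t *restore_idx)
{
  bool have_save = false, have_restore = false;
  size_t s = 0, r = 0;

  for (size_t i = 0; i < n; i++)
    {
      bool defs = insns[i].dest == regno;
      bool uses = insns[i].src == regno;

      if (!defs && !uses)
        continue;
      /* Any other kind of reference disqualifies the register.  */
      if (!insns[i].is_simple_move)
        return false;
      if (defs)
        {
          if (have_save || insns[i].src == SP_REGNO)
            return false;
          have_save = true;
          s = i;
        }
      if (uses)
        {
          if (have_restore || insns[i].dest == SP_REGNO)
            return false;
          have_restore = true;
          r = i;
        }
    }

  if (!have_save || !have_restore)
    return false;

  *save_idx = s;
  *restore_idx = r;
  return true;
}
```

In the commit message's example, the pair "mov.n a14, a6" and
"mov.n a8, a14" satisfies this model for A14, while a second read of
A14 or a move from/to SP would not.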

[...]

> diff --git a/gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c b/gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c
> new file mode 100644
> index 00000000000..cd3d6b9f249
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mabi=call0" } */
> +
> +extern void foo(void);
> +
> +/* eliminated one register (the reservoir of variable 'a') by its stack slot through the stack pointer.  */
> +int test0(int a) {
> +  int array[252];  /* the maximum bound of non-large stack.  */
> +  foo();
> +  asm volatile("" : : "m"(array));
> +  return a;
> +}
> +
> +/* cannot eliminate if large stack is needed, because the offset from TOS cannot fit into single L32I/S32I instruction.  */
> +int test1(int a) {
> +  int array[10000];  /* requires large stack.  */
> +  foo();
> +  asm volatile("" : : "m"(array));
> +  return a;
> +}
> +
> +/* register A15 is the reservoir of the stack pointer and cannot be eliminated if the frame pointer is needed.
> +   other registers still can be, but through the frame pointer rather the stack pointer.  */
> +int test2(int a) {
> +  int* p = __builtin_alloca(16);
> +  foo();
> +  asm volatile("" : : "r"(p));
> +  return a;
> +}
> +
> +/* in -O0 the composite hard registers may still remain unsplitted at pro_and_epilogue and must be excluded.  */
> +extern double bar(void);
> +int __attribute__((optimize(0))) test3(int a) {
> +  return bar() + a;
> +}
> +
> +/* { dg-final { scan-assembler-times "mov\t|mov.n\t" 21 } } */

This test looks quite fragile as the number of movs would vary
when the testsuite is run with additional options.

> +/* { dg-final { scan-assembler-times "a15, 8" 2 } } */
> --
> 2.30.2

-- 
Thanks.
-- Max
