Re: [PATCH v4] xtensa: Eliminate the use of callee-saved register that saves and restores only once

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp>
To: Max Filippov <jcmvbkbc@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH v4] xtensa: Eliminate the use of callee-saved register that saves and restores only once
Date: Sat, 21 Jan 2023 13:39:43 +0900	[thread overview]
Message-ID: <d03c75c4-25e8-c505-d957-f62742c5b642@yahoo.co.jp> (raw)
In-Reply-To: <CAMo8BfLFFSHHUxPJ90Z=w8GnWYezfkPK1okQtdTmwtGgmqCNEw@mail.gmail.com>

On 2023/01/21 0:14, Max Filippov wrote:
> Hi Suwa-san,
Hi!

> 
> On Wed, Jan 18, 2023 at 7:50 PM Takayuki 'January June' Suwa
> <jjsuwa_sys3175@yahoo.co.jp> wrote:
>>
>> In the previous patch, if insn is JUMP_INSN or CALL_INSN, it bypasses the reg check (possibly FAIL).
>>
>> =====
>> In the case of the CALL0 ABI, values that must be retained before and
>> after function calls are placed in the callee-saved registers (A12
>> through A15) and referenced later.  However, it is often the case that
>> the save and the reference are each only once and a simple register-
>> register move (the frame pointer is needed to recover the stack pointer
>> and must be excluded).
>>
>> e.g. in the following example, if there are no other occurrences of
>> register A14:
>>
>> ;; before
>>         ; prologue {
>>   ...
>>         s32i.n  a14, sp, 16
>>   ...
>>         ; } prologue
>>   ...
>>         mov.n   a14, a6
>>   ...
>>         call0   foo
>>   ...
>>         mov.n   a8, a14
>>   ...
>>         ; epilogue {
>>   ...
>>         l32i.n  a14, sp, 16
>>   ...
>>         ; } epilogue
>>
>> It can be possible like this:
>>
>> ;; after
>>         ; prologue {
>>   ...
>>         (deleted)
>>   ...
>>         ; } prologue
>>   ...
>>         s32i.n  a6, sp, 16
>>   ...
>>         call0   foo
>>   ...
>>         l32i.n  a8, sp, 16
>>   ...
>>         ; epilogue {
>>   ...
>>         (deleted)
>>   ...
>>         ; } epilogue
>>
>> This patch introduces a new peephole2 pattern that implements the above.
>>
>> gcc/ChangeLog:
>>
>>         * config/xtensa/xtensa.md: New peephole2 pattern that eliminates
>>         the use of callee-saved register that saves and restores only once
>>         for other register, by using its stack slot directly.
>> ---
>>  gcc/config/xtensa/xtensa.md | 62 +++++++++++++++++++++++++++++++++++++
>>  1 file changed, 62 insertions(+)
> 
> There are still issues with this change in the libgomp:
> 
> FAIL: libgomp.c/examples-4/target-1.c execution test
> FAIL: libgomp.c/examples-4/target-2.c execution test
> 
> They come from the following function:
> 
> code produced before the change:
>        .literal_position
>        .literal .LC8, init@PLT
>        .literal .LC9, 400000
>        .literal .LC10, 100000
>        .literal .LC11, -800000
>        .literal .LC12, 800000
>        .align  4
>        .global vec_mult_ref
>        .type   vec_mult_ref, @function
> vec_mult_ref:
>        l32r    a9, .LC11
>        addi    sp, sp, -16
>        l32r    a10, .LC9
>        s32i.n  a12, sp, 8
>        s32i.n  a13, sp, 4
>        s32i.n  a0, sp, 12
>        add.n   sp, sp, a9
>        add.n   a12, sp, a10
>        l32r    a9, .LC8
>        mov.n   a13, a2
>        mov.n   a3, sp
>        mov.n   a2, a12
>        callx0  a9
>        l32r    a7, .LC10
>        mov.n   a10, a12
>        mov.n   a11, sp
>        mov.n   a2, a13
>        loop    a7, .L17_LEND
> .L17:
>        l32i.n  a9, a10, 0
>        l32i.n  a6, a11, 0
>        addi.n  a10, a10, 4
>        mull    a9, a9, a6
>        addi.n  a11, a11, 4
>        s32i.n  a9, a2, 0
>        addi.n  a2, a2, 4
>        .L17_LEND:
>        l32r    a9, .LC12
>        add.n   sp, sp, a9
>        l32i.n  a0, sp, 12
>        l32i.n  a12, sp, 8
>        l32i.n  a13, sp, 4
>        addi    sp, sp, 16
>        ret.n
> 
> --------------------
> 
> with the change:
>        .literal_position
>        .literal .LC8, init@PLT
>        .literal .LC9, 400000
>        .literal .LC10, 100000
>        .literal .LC11, -800000
>        .literal .LC12, 800000
>        .align  4
>        .global vec_mult_ref
>        .type   vec_mult_ref, @function
> vec_mult_ref:
>        l32r    a9, .LC11
>        l32r    a10, .LC9
>        addi    sp, sp, -16
>        s32i.n  a12, sp, 8
>        s32i.n  a0, sp, 12
>        add.n   sp, sp, a9
>        add.n   a12, sp, a10
>        l32r    a9, .LC8
>        s32i.n  a2, sp, 4
>        mov.n   a3, sp
>        mov.n   a2, a12
>        callx0  a9
>        l32r    a7, .LC10
>        l32i.n  a2, sp, 4
>        mov.n   a10, a12
>        mov.n   a11, sp
>        loop    a7, .L17_LEND
> .L17:
>        l32i.n  a9, a10, 0
>        l32i.n  a6, a11, 0
>        addi.n  a10, a10, 4
>        mull    a9, a9, a6
>        addi.n  a11, a11, 4
>        s32i.n  a9, a2, 0
>        addi.n  a2, a2, 4
>        .L17_LEND:
>        l32r    a9, .LC12
>        add.n   sp, sp, a9
>        l32i.n  a0, sp, 12
>        l32i.n  a12, sp, 8
>        addi    sp, sp, 16
>        ret.n
> 
> the stack pointer is modified after saving callee-saved registers,
> but the stack offset where a2 is stored and reloaded does not take
> this into an account.
> 
> After having this many attempts and getting to the issues that are
> really hard to detect I wonder if the target backend is the right place
> for this optimization?
> 
I guess they are not hard to detect but just issues I didn't anticipate (and I just need a little more work).
And where else should it be done?  What about implementing a target-specific pass just for one-point optimization?

next prev parent reply	other threads:[~2023-01-21  4:39 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <465b0cbe-73ca-f5a0-661d-d34217e29b4d.ref@yahoo.co.jp>
2023-01-19  3:50 ` Takayuki 'January June' Suwa
2023-01-20 15:14   ` Max Filippov
2023-01-21  4:39     ` Takayuki 'January June' Suwa [this message]
2023-01-22 15:45       ` Max Filippov
2023-01-23  3:33         ` Takayuki 'January June' Suwa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d03c75c4-25e8-c505-d957-f62742c5b642@yahoo.co.jp \
    --to=jjsuwa_sys3175@yahoo.co.jp \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jcmvbkbc@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).