From: Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp>
To: Max Filippov <jcmvbkbc@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH v4] xtensa: Eliminate the use of callee-saved register that saves and restores only once
Date: Sat, 21 Jan 2023 13:39:43 +0900 [thread overview]
Message-ID: <d03c75c4-25e8-c505-d957-f62742c5b642@yahoo.co.jp> (raw)
In-Reply-To: <CAMo8BfLFFSHHUxPJ90Z=w8GnWYezfkPK1okQtdTmwtGgmqCNEw@mail.gmail.com>
On 2023/01/21 0:14, Max Filippov wrote:
> Hi Suwa-san,
Hi!
>
> On Wed, Jan 18, 2023 at 7:50 PM Takayuki 'January June' Suwa
> <jjsuwa_sys3175@yahoo.co.jp> wrote:
>>
>> In the previous patch, if insn is JUMP_INSN or CALL_INSN, it bypasses the reg check (possibly FAIL).
>>
>> =====
>> In the case of the CALL0 ABI, values that must be retained before and
>> after function calls are placed in the callee-saved registers (A12
>> through A15) and referenced later. However, it is often the case that
>> the save and the reference are each only once and a simple register-
>> register move (the frame pointer is needed to recover the stack pointer
>> and must be excluded).
>>
>> e.g. in the following example, if there are no other occurrences of
>> register A14:
>>
>> ;; before
>> ; prologue {
>> ...
>> s32i.n a14, sp, 16
>> ...
>> ; } prologue
>> ...
>> mov.n a14, a6
>> ...
>> call0 foo
>> ...
>> mov.n a8, a14
>> ...
>> ; epilogue {
>> ...
>> l32i.n a14, sp, 16
>> ...
>> ; } epilogue
>>
>> It can be possible like this:
>>
>> ;; after
>> ; prologue {
>> ...
>> (deleted)
>> ...
>> ; } prologue
>> ...
>> s32i.n a6, sp, 16
>> ...
>> call0 foo
>> ...
>> l32i.n a8, sp, 16
>> ...
>> ; epilogue {
>> ...
>> (deleted)
>> ...
>> ; } epilogue
>>
>> This patch introduces a new peephole2 pattern that implements the above.
>>
>> gcc/ChangeLog:
>>
>> * config/xtensa/xtensa.md: New peephole2 pattern that eliminates
>> the use of callee-saved register that saves and restores only once
>> for other register, by using its stack slot directly.
>> ---
>> gcc/config/xtensa/xtensa.md | 62 +++++++++++++++++++++++++++++++++++++
>> 1 file changed, 62 insertions(+)
>
> There are still issues with this change in the libgomp:
>
> FAIL: libgomp.c/examples-4/target-1.c execution test
> FAIL: libgomp.c/examples-4/target-2.c execution test
>
> They come from the following function:
>
> code produced before the change:
> .literal_position
> .literal .LC8, init@PLT
> .literal .LC9, 400000
> .literal .LC10, 100000
> .literal .LC11, -800000
> .literal .LC12, 800000
> .align 4
> .global vec_mult_ref
> .type vec_mult_ref, @function
> vec_mult_ref:
> l32r a9, .LC11
> addi sp, sp, -16
> l32r a10, .LC9
> s32i.n a12, sp, 8
> s32i.n a13, sp, 4
> s32i.n a0, sp, 12
> add.n sp, sp, a9
> add.n a12, sp, a10
> l32r a9, .LC8
> mov.n a13, a2
> mov.n a3, sp
> mov.n a2, a12
> callx0 a9
> l32r a7, .LC10
> mov.n a10, a12
> mov.n a11, sp
> mov.n a2, a13
> loop a7, .L17_LEND
> .L17:
> l32i.n a9, a10, 0
> l32i.n a6, a11, 0
> addi.n a10, a10, 4
> mull a9, a9, a6
> addi.n a11, a11, 4
> s32i.n a9, a2, 0
> addi.n a2, a2, 4
> .L17_LEND:
> l32r a9, .LC12
> add.n sp, sp, a9
> l32i.n a0, sp, 12
> l32i.n a12, sp, 8
> l32i.n a13, sp, 4
> addi sp, sp, 16
> ret.n
>
> --------------------
>
> with the change:
> .literal_position
> .literal .LC8, init@PLT
> .literal .LC9, 400000
> .literal .LC10, 100000
> .literal .LC11, -800000
> .literal .LC12, 800000
> .align 4
> .global vec_mult_ref
> .type vec_mult_ref, @function
> vec_mult_ref:
> l32r a9, .LC11
> l32r a10, .LC9
> addi sp, sp, -16
> s32i.n a12, sp, 8
> s32i.n a0, sp, 12
> add.n sp, sp, a9
> add.n a12, sp, a10
> l32r a9, .LC8
> s32i.n a2, sp, 4
> mov.n a3, sp
> mov.n a2, a12
> callx0 a9
> l32r a7, .LC10
> l32i.n a2, sp, 4
> mov.n a10, a12
> mov.n a11, sp
> loop a7, .L17_LEND
> .L17:
> l32i.n a9, a10, 0
> l32i.n a6, a11, 0
> addi.n a10, a10, 4
> mull a9, a9, a6
> addi.n a11, a11, 4
> s32i.n a9, a2, 0
> addi.n a2, a2, 4
> .L17_LEND:
> l32r a9, .LC12
> add.n sp, sp, a9
> l32i.n a0, sp, 12
> l32i.n a12, sp, 8
> addi sp, sp, 16
> ret.n
>
> the stack pointer is modified after saving callee-saved registers,
> but the stack offset where a2 is stored and reloaded does not take
> this into an account.
>
> After having this many attempts and getting to the issues that are
> really hard to detect I wonder if the target backend is the right place
> for this optimization?
>
I guess they are not hard to detect but just issues I didn't anticipate (and I just need a little more work).
And where else should it be done? What about implementing a target-specific pass just for one-point optimization?
next prev parent reply other threads:[~2023-01-21 4:39 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <465b0cbe-73ca-f5a0-661d-d34217e29b4d.ref@yahoo.co.jp>
2023-01-19 3:50 ` Takayuki 'January June' Suwa
2023-01-20 15:14 ` Max Filippov
2023-01-21 4:39 ` Takayuki 'January June' Suwa [this message]
2023-01-22 15:45 ` Max Filippov
2023-01-23 3:33 ` Takayuki 'January June' Suwa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d03c75c4-25e8-c505-d957-f62742c5b642@yahoo.co.jp \
--to=jjsuwa_sys3175@yahoo.co.jp \
--cc=gcc-patches@gcc.gnu.org \
--cc=jcmvbkbc@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).