* [PATCH v4] xtensa: Eliminate the use of callee-saved register that saves and restores only once [not found] <465b0cbe-73ca-f5a0-661d-d34217e29b4d.ref@yahoo.co.jp> @ 2023-01-19 3:50 ` Takayuki 'January June' Suwa 2023-01-20 15:14 ` Max Filippov 0 siblings, 1 reply; 5+ messages in thread From: Takayuki 'January June' Suwa @ 2023-01-19 3:50 UTC (permalink / raw) To: GCC Patches; +Cc: Max Filippov In the previous patch, if insn is JUMP_INSN or CALL_INSN, it bypasses the reg check (possibly FAIL). ===== In the case of the CALL0 ABI, values that must be retained before and after function calls are placed in the callee-saved registers (A12 through A15) and referenced later. However, it is often the case that the save and the reference are each only once and a simple register- register move (the frame pointer is needed to recover the stack pointer and must be excluded). e.g. in the following example, if there are no other occurrences of register A14: ;; before ; prologue { ... s32i.n a14, sp, 16 ... ; } prologue ... mov.n a14, a6 ... call0 foo ... mov.n a8, a14 ... ; epilogue { ... l32i.n a14, sp, 16 ... ; } epilogue It can be possible like this: ;; after ; prologue { ... (deleted) ... ; } prologue ... s32i.n a6, sp, 16 ... call0 foo ... l32i.n a8, sp, 16 ... ; epilogue { ... (deleted) ... ; } epilogue This patch introduces a new peephole2 pattern that implements the above. gcc/ChangeLog: * config/xtensa/xtensa.md: New peephole2 pattern that eliminates the use of callee-saved register that saves and restores only once for other register, by using its stack slot directly. --- gcc/config/xtensa/xtensa.md | 62 +++++++++++++++++++++++++++++++++++++ 1 file changed, 62 insertions(+) diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md index 4f1e8fd13..ac04ef6f0 100644 --- a/gcc/config/xtensa/xtensa.md +++ b/gcc/config/xtensa/xtensa.md @@ -3029,3 +3029,65 @@ FALLTHRU:; operands[1] = GEN_INT (imm0); operands[2] = GEN_INT (imm1); }) + +(define_peephole2 + [(set (match_operand:SI 0 "register_operand") + (match_operand:SI 1 "reload_operand"))] + "!TARGET_WINDOWED_ABI && df + && epilogue_contains (insn) + && ! call_used_or_fixed_reg_p (REGNO (operands[0])) + && (!frame_pointer_needed + || REGNO (operands[0]) != HARD_FRAME_POINTER_REGNUM)" + [(const_int 0)] +{ + rtx reg = operands[0], pattern; + rtx_insn *insnP = NULL, *insnS = NULL, *insnR = NULL; + df_ref ref; + rtx_insn *insn; + for (ref = DF_REG_DEF_CHAIN (REGNO (reg)); + ref; ref = DF_REF_NEXT_REG (ref)) + if (DF_REF_CLASS (ref) != DF_REF_REGULAR + || DEBUG_INSN_P (insn = DF_REF_INSN (ref))) + continue; + else if (insn == curr_insn) + continue; + else if (GET_CODE (pattern = PATTERN (insn)) == SET + && rtx_equal_p (SET_DEST (pattern), reg) + && REG_P (SET_SRC (pattern))) + { + if (insnS) + FAIL; + insnS = insn; + continue; + } + else + FAIL; + for (ref = DF_REG_USE_CHAIN (REGNO (reg)); + ref; ref = DF_REF_NEXT_REG (ref)) + if (DF_REF_CLASS (ref) != DF_REF_REGULAR + || DEBUG_INSN_P (insn = DF_REF_INSN (ref))) + continue; + else if (prologue_contains (insn)) + { + insnP = insn; + continue; + } + else if (GET_CODE (pattern = PATTERN (insn)) == SET + && rtx_equal_p (SET_SRC (pattern), reg) + && REG_P (SET_DEST (pattern))) + { + if (insnR) + FAIL; + insnR = insn; + continue; + } + else + FAIL; + if (!insnP || !insnS || !insnR) + FAIL; + SET_DEST (PATTERN (insnS)) = copy_rtx (operands[1]); + df_insn_rescan (insnS); + SET_SRC (PATTERN (insnR)) = copy_rtx (operands[1]); + df_insn_rescan (insnR); + set_insn_deleted (insnP); +}) -- 2.30.2 ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v4] xtensa: Eliminate the use of callee-saved register that saves and restores only once 2023-01-19 3:50 ` [PATCH v4] xtensa: Eliminate the use of callee-saved register that saves and restores only once Takayuki 'January June' Suwa @ 2023-01-20 15:14 ` Max Filippov 2023-01-21 4:39 ` Takayuki 'January June' Suwa 0 siblings, 1 reply; 5+ messages in thread From: Max Filippov @ 2023-01-20 15:14 UTC (permalink / raw) To: Takayuki 'January June' Suwa; +Cc: GCC Patches Hi Suwa-san, On Wed, Jan 18, 2023 at 7:50 PM Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp> wrote: > > In the previous patch, if insn is JUMP_INSN or CALL_INSN, it bypasses the reg check (possibly FAIL). > > ===== > In the case of the CALL0 ABI, values that must be retained before and > after function calls are placed in the callee-saved registers (A12 > through A15) and referenced later. However, it is often the case that > the save and the reference are each only once and a simple register- > register move (the frame pointer is needed to recover the stack pointer > and must be excluded). > > e.g. in the following example, if there are no other occurrences of > register A14: > > ;; before > ; prologue { > ... > s32i.n a14, sp, 16 > ... > ; } prologue > ... > mov.n a14, a6 > ... > call0 foo > ... > mov.n a8, a14 > ... > ; epilogue { > ... > l32i.n a14, sp, 16 > ... > ; } epilogue > > It can be possible like this: > > ;; after > ; prologue { > ... > (deleted) > ... > ; } prologue > ... > s32i.n a6, sp, 16 > ... > call0 foo > ... > l32i.n a8, sp, 16 > ... > ; epilogue { > ... > (deleted) > ... > ; } epilogue > > This patch introduces a new peephole2 pattern that implements the above. > > gcc/ChangeLog: > > * config/xtensa/xtensa.md: New peephole2 pattern that eliminates > the use of callee-saved register that saves and restores only once > for other register, by using its stack slot directly. > --- > gcc/config/xtensa/xtensa.md | 62 +++++++++++++++++++++++++++++++++++++ > 1 file changed, 62 insertions(+) There are still issues with this change in the libgomp: FAIL: libgomp.c/examples-4/target-1.c execution test FAIL: libgomp.c/examples-4/target-2.c execution test They come from the following function: code produced before the change: .literal_position .literal .LC8, init@PLT .literal .LC9, 400000 .literal .LC10, 100000 .literal .LC11, -800000 .literal .LC12, 800000 .align 4 .global vec_mult_ref .type vec_mult_ref, @function vec_mult_ref: l32r a9, .LC11 addi sp, sp, -16 l32r a10, .LC9 s32i.n a12, sp, 8 s32i.n a13, sp, 4 s32i.n a0, sp, 12 add.n sp, sp, a9 add.n a12, sp, a10 l32r a9, .LC8 mov.n a13, a2 mov.n a3, sp mov.n a2, a12 callx0 a9 l32r a7, .LC10 mov.n a10, a12 mov.n a11, sp mov.n a2, a13 loop a7, .L17_LEND .L17: l32i.n a9, a10, 0 l32i.n a6, a11, 0 addi.n a10, a10, 4 mull a9, a9, a6 addi.n a11, a11, 4 s32i.n a9, a2, 0 addi.n a2, a2, 4 .L17_LEND: l32r a9, .LC12 add.n sp, sp, a9 l32i.n a0, sp, 12 l32i.n a12, sp, 8 l32i.n a13, sp, 4 addi sp, sp, 16 ret.n -------------------- with the change: .literal_position .literal .LC8, init@PLT .literal .LC9, 400000 .literal .LC10, 100000 .literal .LC11, -800000 .literal .LC12, 800000 .align 4 .global vec_mult_ref .type vec_mult_ref, @function vec_mult_ref: l32r a9, .LC11 l32r a10, .LC9 addi sp, sp, -16 s32i.n a12, sp, 8 s32i.n a0, sp, 12 add.n sp, sp, a9 add.n a12, sp, a10 l32r a9, .LC8 s32i.n a2, sp, 4 mov.n a3, sp mov.n a2, a12 callx0 a9 l32r a7, .LC10 l32i.n a2, sp, 4 mov.n a10, a12 mov.n a11, sp loop a7, .L17_LEND .L17: l32i.n a9, a10, 0 l32i.n a6, a11, 0 addi.n a10, a10, 4 mull a9, a9, a6 addi.n a11, a11, 4 s32i.n a9, a2, 0 addi.n a2, a2, 4 .L17_LEND: l32r a9, .LC12 add.n sp, sp, a9 l32i.n a0, sp, 12 l32i.n a12, sp, 8 addi sp, sp, 16 ret.n the stack pointer is modified after saving callee-saved registers, but the stack offset where a2 is stored and reloaded does not take this into an account. After having this many attempts and getting to the issues that are really hard to detect I wonder if the target backend is the right place for this optimization? -- Thanks. -- Max ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v4] xtensa: Eliminate the use of callee-saved register that saves and restores only once 2023-01-20 15:14 ` Max Filippov @ 2023-01-21 4:39 ` Takayuki 'January June' Suwa 2023-01-22 15:45 ` Max Filippov 0 siblings, 1 reply; 5+ messages in thread From: Takayuki 'January June' Suwa @ 2023-01-21 4:39 UTC (permalink / raw) To: Max Filippov; +Cc: GCC Patches On 2023/01/21 0:14, Max Filippov wrote: > Hi Suwa-san, Hi! > > On Wed, Jan 18, 2023 at 7:50 PM Takayuki 'January June' Suwa > <jjsuwa_sys3175@yahoo.co.jp> wrote: >> >> In the previous patch, if insn is JUMP_INSN or CALL_INSN, it bypasses the reg check (possibly FAIL). >> >> ===== >> In the case of the CALL0 ABI, values that must be retained before and >> after function calls are placed in the callee-saved registers (A12 >> through A15) and referenced later. However, it is often the case that >> the save and the reference are each only once and a simple register- >> register move (the frame pointer is needed to recover the stack pointer >> and must be excluded). >> >> e.g. in the following example, if there are no other occurrences of >> register A14: >> >> ;; before >> ; prologue { >> ... >> s32i.n a14, sp, 16 >> ... >> ; } prologue >> ... >> mov.n a14, a6 >> ... >> call0 foo >> ... >> mov.n a8, a14 >> ... >> ; epilogue { >> ... >> l32i.n a14, sp, 16 >> ... >> ; } epilogue >> >> It can be possible like this: >> >> ;; after >> ; prologue { >> ... >> (deleted) >> ... >> ; } prologue >> ... >> s32i.n a6, sp, 16 >> ... >> call0 foo >> ... >> l32i.n a8, sp, 16 >> ... >> ; epilogue { >> ... >> (deleted) >> ... >> ; } epilogue >> >> This patch introduces a new peephole2 pattern that implements the above. >> >> gcc/ChangeLog: >> >> * config/xtensa/xtensa.md: New peephole2 pattern that eliminates >> the use of callee-saved register that saves and restores only once >> for other register, by using its stack slot directly. >> --- >> gcc/config/xtensa/xtensa.md | 62 +++++++++++++++++++++++++++++++++++++ >> 1 file changed, 62 insertions(+) > > There are still issues with this change in the libgomp: > > FAIL: libgomp.c/examples-4/target-1.c execution test > FAIL: libgomp.c/examples-4/target-2.c execution test > > They come from the following function: > > code produced before the change: > .literal_position > .literal .LC8, init@PLT > .literal .LC9, 400000 > .literal .LC10, 100000 > .literal .LC11, -800000 > .literal .LC12, 800000 > .align 4 > .global vec_mult_ref > .type vec_mult_ref, @function > vec_mult_ref: > l32r a9, .LC11 > addi sp, sp, -16 > l32r a10, .LC9 > s32i.n a12, sp, 8 > s32i.n a13, sp, 4 > s32i.n a0, sp, 12 > add.n sp, sp, a9 > add.n a12, sp, a10 > l32r a9, .LC8 > mov.n a13, a2 > mov.n a3, sp > mov.n a2, a12 > callx0 a9 > l32r a7, .LC10 > mov.n a10, a12 > mov.n a11, sp > mov.n a2, a13 > loop a7, .L17_LEND > .L17: > l32i.n a9, a10, 0 > l32i.n a6, a11, 0 > addi.n a10, a10, 4 > mull a9, a9, a6 > addi.n a11, a11, 4 > s32i.n a9, a2, 0 > addi.n a2, a2, 4 > .L17_LEND: > l32r a9, .LC12 > add.n sp, sp, a9 > l32i.n a0, sp, 12 > l32i.n a12, sp, 8 > l32i.n a13, sp, 4 > addi sp, sp, 16 > ret.n > > -------------------- > > with the change: > .literal_position > .literal .LC8, init@PLT > .literal .LC9, 400000 > .literal .LC10, 100000 > .literal .LC11, -800000 > .literal .LC12, 800000 > .align 4 > .global vec_mult_ref > .type vec_mult_ref, @function > vec_mult_ref: > l32r a9, .LC11 > l32r a10, .LC9 > addi sp, sp, -16 > s32i.n a12, sp, 8 > s32i.n a0, sp, 12 > add.n sp, sp, a9 > add.n a12, sp, a10 > l32r a9, .LC8 > s32i.n a2, sp, 4 > mov.n a3, sp > mov.n a2, a12 > callx0 a9 > l32r a7, .LC10 > l32i.n a2, sp, 4 > mov.n a10, a12 > mov.n a11, sp > loop a7, .L17_LEND > .L17: > l32i.n a9, a10, 0 > l32i.n a6, a11, 0 > addi.n a10, a10, 4 > mull a9, a9, a6 > addi.n a11, a11, 4 > s32i.n a9, a2, 0 > addi.n a2, a2, 4 > .L17_LEND: > l32r a9, .LC12 > add.n sp, sp, a9 > l32i.n a0, sp, 12 > l32i.n a12, sp, 8 > addi sp, sp, 16 > ret.n > > the stack pointer is modified after saving callee-saved registers, > but the stack offset where a2 is stored and reloaded does not take > this into an account. > > After having this many attempts and getting to the issues that are > really hard to detect I wonder if the target backend is the right place > for this optimization? > I guess they are not hard to detect but just issues I didn't anticipate (and I just need a little more work). And where else should it be done? What about implementing a target-specific pass just for one-point optimization? ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v4] xtensa: Eliminate the use of callee-saved register that saves and restores only once 2023-01-21 4:39 ` Takayuki 'January June' Suwa @ 2023-01-22 15:45 ` Max Filippov 2023-01-23 3:33 ` Takayuki 'January June' Suwa 0 siblings, 1 reply; 5+ messages in thread From: Max Filippov @ 2023-01-22 15:45 UTC (permalink / raw) To: Takayuki 'January June' Suwa; +Cc: GCC Patches On Fri, Jan 20, 2023 at 8:39 PM Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp> wrote: > On 2023/01/21 0:14, Max Filippov wrote: > > After having this many attempts and getting to the issues that are > > really hard to detect I wonder if the target backend is the right place > > for this optimization? > > > I guess they are not hard to detect I mean, on the testing side. check-gcc testsuite passed without new regressions with this change, linux kernel smoke test passed, I was almost convinced that it's ok to commit. > but just issues I didn't anticipate (and I just need a little more work). Looking at other peephole2 patterns I see that their code transformations are much more compact and they don't need to track additional properties of unrelated instructions. > And where else should it be done? What about implementing a > target-specific pass just for one-point optimization? I don't even understand what's target-specific in this optimization? It looks very generic to me. -- Thanks. -- Max ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v4] xtensa: Eliminate the use of callee-saved register that saves and restores only once 2023-01-22 15:45 ` Max Filippov @ 2023-01-23 3:33 ` Takayuki 'January June' Suwa 0 siblings, 0 replies; 5+ messages in thread From: Takayuki 'January June' Suwa @ 2023-01-23 3:33 UTC (permalink / raw) To: Max Filippov; +Cc: GCC Patches On 2023/01/23 0:45, Max Filippov wrote: > On Fri, Jan 20, 2023 at 8:39 PM Takayuki 'January June' Suwa > <jjsuwa_sys3175@yahoo.co.jp> wrote: >> On 2023/01/21 0:14, Max Filippov wrote: >>> After having this many attempts and getting to the issues that are >>> really hard to detect I wonder if the target backend is the right place >>> for this optimization? >>> >> I guess they are not hard to detect > > I mean, on the testing side. check-gcc testsuite passed without new > regressions with this change, linux kernel smoke test passed, I was > almost convinced that it's ok to commit. > >> but just issues I didn't anticipate (and I just need a little more work). > > Looking at other peephole2 patterns I see that their code transformations > are much more compact and they don't need to track additional properties > of unrelated instructions. > >> And where else should it be done? What about implementing a >> target-specific pass just for one-point optimization? > > I don't even understand what's target-specific in this optimization? > It looks very generic to me. > Ah, I seem to have misunderstood what you meant, sorry. Now, what this patch is trying to do depends on whether register moves can be converted to stack pointer indirect loads/stores with offsets, and whether there is any benefit in doing so, but they are not target dependent. Is it? If we want the target-independent part to do something like this, we will need a mechanism (macro, hook, etc.) to write appropriate information in the machine description and pass it on. For example, offset ranges for register indirect loads and stores, or whether the ABI requires that callee-saved registers always be associated with stack slots, or even the need for stack frame construction... I totally agree that the peephole2 pattern is not the best implementation location. ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-01-23 3:33 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <465b0cbe-73ca-f5a0-661d-d34217e29b4d.ref@yahoo.co.jp> 2023-01-19 3:50 ` [PATCH v4] xtensa: Eliminate the use of callee-saved register that saves and restores only once Takayuki 'January June' Suwa 2023-01-20 15:14 ` Max Filippov 2023-01-21 4:39 ` Takayuki 'January June' Suwa 2023-01-22 15:45 ` Max Filippov 2023-01-23 3:33 ` Takayuki 'January June' Suwa
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).