* [PATCH] Replace lra-spill.cc's return_regno_p with return_reg_p.
@ 2023-07-22 16:44 Roger Sayle
2023-07-25 11:18 ` Richard Biener
0 siblings, 1 reply; 2+ messages in thread
From: Roger Sayle @ 2023-07-22 16:44 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 1714 bytes --]
This patch is my attempt to address the compile-time hog issue
in PR rtl-optimization/110587. Richard Biener's analysis shows that
compilation of pr28071.c with -O0 currently spends ~70% in timer
"LRA non-specific" due to return_regno_p failing to filter a large
number of calls to regno_in_use_p, resulting in quadratic behaviour.
For this pathological test case, things can be improved significantly.
Although the return register (%rax) is indeed mentioned a large
number of times in this function, due to inlining, the inlined functions
access the returned register in TImode, whereas the current function
returns a DImode. Hence the check to see if we're the last SET of the
return register, which should be followed by a USE, can be improved
by also testing the mode. Implementation-wise, rather than pass an
additional mode parameter to LRA's local return_regno_p function, which
only has a single caller, it's more convenient to pass the rtx REG_P,
and from this extract both the REGNO and the mode in the callee, and
rename this function to return_reg_p.
The good news is that with this change "LRA non-specific" drops from
70% to 13%.
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, with no new failures. Ok for mainline?
2023-07-22 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR middle-end/28071
PR rtl-optimization/110587
* lra-spills.cc (return_regno_p): Change argument and rename to...
(return_reg_p): Check if the given register RTX has the same
REGNO and machine mode as the function's return value.
(lra_final_code_change): Update call to return_reg_p.
Thanks in advance,
Roger
--
[-- Attachment #2: patchrr.txt --]
[-- Type: text/plain, Size: 1554 bytes --]
diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index 3a7bb7e..ae147ad 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -705,10 +705,10 @@ alter_subregs (rtx *loc, bool final_p)
return res;
}
-/* Return true if REGNO is used for return in the current
- function. */
+/* Return true if register REG, known to be REG_P, is used for return
+ in the current function. */
static bool
-return_regno_p (unsigned int regno)
+return_reg_p (rtx reg)
{
rtx outgoing = crtl->return_rtx;
@@ -716,7 +716,8 @@ return_regno_p (unsigned int regno)
return false;
if (REG_P (outgoing))
- return REGNO (outgoing) == regno;
+ return REGNO (outgoing) == REGNO (reg)
+ && GET_MODE (outgoing) == GET_MODE (reg);
else if (GET_CODE (outgoing) == PARALLEL)
{
int i;
@@ -725,7 +726,9 @@ return_regno_p (unsigned int regno)
{
rtx x = XEXP (XVECEXP (outgoing, 0, i), 0);
- if (REG_P (x) && REGNO (x) == regno)
+ if (REG_P (x)
+ && REGNO (x) == REGNO (reg)
+ && GET_MODE (x) == GET_MODE (reg))
return true;
}
}
@@ -821,7 +824,7 @@ lra_final_code_change (void)
if (NONJUMP_INSN_P (insn) && GET_CODE (pat) == SET
&& REG_P (SET_SRC (pat)) && REG_P (SET_DEST (pat))
&& REGNO (SET_SRC (pat)) == REGNO (SET_DEST (pat))
- && (! return_regno_p (REGNO (SET_SRC (pat)))
+ && (! return_reg_p (SET_SRC (pat))
|| ! regno_in_use_p (insn, REGNO (SET_SRC (pat)))))
{
lra_invalidate_insn_data (insn);
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [PATCH] Replace lra-spill.cc's return_regno_p with return_reg_p.
2023-07-22 16:44 [PATCH] Replace lra-spill.cc's return_regno_p with return_reg_p Roger Sayle
@ 2023-07-25 11:18 ` Richard Biener
0 siblings, 0 replies; 2+ messages in thread
From: Richard Biener @ 2023-07-25 11:18 UTC (permalink / raw)
To: Roger Sayle; +Cc: gcc-patches
On Sat, Jul 22, 2023 at 6:45 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch is my attempt to address the compile-time hog issue
> in PR rtl-optimization/110587. Richard Biener's analysis shows that
> compilation of pr28071.c with -O0 currently spends ~70% in timer
> "LRA non-specific" due to return_regno_p failing to filter a large
> number of calls to regno_in_use_p, resulting in quadratic behaviour.
>
> For this pathological test case, things can be improved significantly.
> Although the return register (%rax) is indeed mentioned a large
> number of times in this function, due to inlining, the inlined functions
> access the returned register in TImode, whereas the current function
> returns a DImode. Hence the check to see if we're the last SET of the
> return register, which should be followed by a USE, can be improved
> by also testing the mode. Implementation-wise, rather than pass an
> additional mode parameter to LRA's local return_regno_p function, which
> only has a single caller, it's more convenient to pass the rtx REG_P,
> and from this extract both the REGNO and the mode in the callee, and
> rename this function to return_reg_p.
>
> The good news is that with this change "LRA non-specific" drops from
> 70% to 13%.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, with no new failures. Ok for mainline?
I think this is a good heuristic improvement, but I don't feel competent
on assessing the constraints this implies on the structure of the
return value setting instructions. Previously we'd preserve
noop copies mentioning the return hardreg but now just if that noop
copy has the same mode as the return_rtx. I don't even understand
why we need to preserve this noop copy in the first place.
Looking at the history of return_regno_p and the surrounding of the
single caller tells this is a place of "interestingness" ...
Can somebody summarize why we need to preserve a noop-move
between the return_rtx (hard?)regs?
The comment
/* IRA can generate move insns involving pseudos. It is
better remove them earlier to speed up compiler a bit.
It is also better to do it here as they might not pass
final RTL check in LRA, (e.g. insn moving a control
register into itself). So remove an useless move insn
unless next insn is USE marking the return reg (we should
save this as some subsequent optimizations assume that
such original insns are saved). */
says this is about removing noop copies of _pseudos_ for correctness
reasons. So, can't we simply scrap the return_regno_p and regno_in_use_p
checks and always preserve noop moves between hardregs here, leaving
that to other passes?
I'm going to bootstrap & test that for the fun of it.
Thanks,
Richard.
>
>
> 2023-07-22 Roger Sayle <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> PR middle-end/28071
> PR rtl-optimization/110587
> * lra-spills.cc (return_regno_p): Change argument and rename to...
> (return_reg_p): Check if the given register RTX has the same
> REGNO and machine mode as the function's return value.
> (lra_final_code_change): Update call to return_reg_p.
>
>
> Thanks in advance,
> Roger
> --
>
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2023-07-25 11:19 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-22 16:44 [PATCH] Replace lra-spill.cc's return_regno_p with return_reg_p Roger Sayle
2023-07-25 11:18 ` Richard Biener
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).