This patch is my attempt to address the compile-time hog issue
in PR rtl-optimization/110587.  Richard Biener's analysis shows that
compilation of pr28071.c with -O0 currently spends ~70% of its time in
the "LRA non-specific" timer, because return_regno_p fails to filter out
a large number of calls to regno_in_use_p, resulting in quadratic behaviour.

For this pathological test case, things can be improved significantly.
Although the return register (%rax) is indeed mentioned a large
number of times in this function, due to inlining, the inlined functions
access the returned register in TImode, whereas the current function
returns a DImode value.  Hence the check to see if we're the last SET of the
return register, which should be followed by a USE, can be improved
by also testing the mode.  Implementation-wise, rather than pass an
additional mode parameter to LRA's local return_regno_p function, which
only has a single caller, it's more convenient to pass the REG rtx itself,
extract both the REGNO and the mode from it in the callee, and rename
the function to return_reg_p.
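
To illustrate the idea outside of GCC, here is a minimal standalone sketch.
It is not the code in the patch: Mode, Reg and the std::optional return
value are hypothetical stand-ins for machine_mode, a REG rtx and the
function's return rtx, and it ignores any case where the return value is
not a single register.  It only shows how comparing the mode as well as
the register number rejects the TImode accesses that the old check accepts.

```cpp
#include <optional>

// Hypothetical subset of machine modes, for illustration only.
enum class Mode { DI, TI };

// Simplified stand-in for a REG rtx: a register number plus a mode.
struct Reg {
  unsigned regno;
  Mode mode;
};

// Old-style filter: compares only the register number, so every
// mention of the return register passes, whatever its mode.
bool return_regno_p(unsigned regno, const std::optional<Reg>& ret) {
  return ret && ret->regno == regno;
}

// New-style filter: takes the whole register and requires both the
// register number and the mode to match the function's return value.
bool return_reg_p(const Reg& reg, const std::optional<Reg>& ret) {
  return ret && ret->regno == reg.regno && ret->mode == reg.mode;
}
```

With a DImode return value in register 0, a TImode access to register 0
passes return_regno_p but is rejected by return_reg_p, so in this model
the more expensive follow-up check is never reached for it.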

The good news is that with this change "LRA non-specific" drops from
70% to 13%.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, with no new failures.  Ok for mainline?


2023-07-22  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR middle-end/28071
        PR rtl-optimization/110587
        * lra-spills.cc (return_regno_p): Change argument and rename to...
        (return_reg_p): Check if the given register RTX has the same
        REGNO and machine mode as the function's return value.
        (lra_final_code_change): Update call to return_reg_p.


Thanks in advance,
Roger