public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/110587] New: 96% pr28071.c compile time regression between g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d
From: hubicka at gcc dot gnu.org @ 2023-07-07 9:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
Bug ID: 110587
Summary: 96% pr28071.c compile time regression between
g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and
g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d
Product: gcc
Version: 13.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
Seen here:
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=288.597.8
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=468.597.8
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=172.597.8
* [Bug middle-end/110587] 96% pr28071.c compile time regression between g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d
From: jamborm at gcc dot gnu.org @ 2023-07-15 14:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
Martin Jambor <jamborm at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jamborm at gcc dot gnu.org,
| |liuhongt at gcc dot gnu.org
Last reconfirmed| |2023-07-15
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
--- Comment #1 from Martin Jambor <jamborm at gcc dot gnu.org> ---
I have bisected this to
37a231cc7594d12ba0822077018aad751a6fb94e is the first bad commit
commit 37a231cc7594d12ba0822077018aad751a6fb94e
Author: liuhongt <hongtao.liu@intel.com>
Date: Wed Jul 5 13:45:11 2023 +0800
Disparage slightly for the alternative which move DFmode between SSE_REGS
and GENERAL_REGS.
* [Bug middle-end/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: pinskia at gcc dot gnu.org @ 2023-07-15 17:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What            |Removed                     |Added
----------------------------------------------------------------------------
Summary         |96% pr28071.c compile time  |[14 regression] 96%
                |regression since            |pr28071.c compile time
                |r14-2337-g37a231cc7594d1    |regression since
                |                            |r14-2337-g37a231cc7594d1
Target Milestone|---                         |14.0
Keywords        |                            |compile-time-hog, ra
Version         |13.1.0                      |14.0
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: pinskia at gcc dot gnu.org @ 2023-07-15 17:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target| |x86_64-linux-gnu
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Would be interesting to see if it is the register allocator and where (which
function) in GCC the compile time slow down happens.
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: crazylht at gmail dot com @ 2023-07-17 6:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
Hongtao.liu <crazylht at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |crazylht at gmail dot com
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
I can't find pr28071.c in the GCC testsuite, but I found an attached source file in
the PR #c1. Is that the pr28071.c you mean?
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: crazylht at gmail dot com @ 2023-07-17 6:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jan Hubicka from comment #0)
> Seen here:
> https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=288.597.8
> https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=468.597.8
> https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=172.597.8
Also, does O0_g mean that the compile flags are -O0 -g?
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: jamborm at gcc dot gnu.org @ 2023-07-17 8:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
--- Comment #5 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #3)
> I can't find pr28071.c in the GCC testsuite, but I found an attached source file in
> the PR #c1. Is that the pr28071.c you mean?
Yes.
(In reply to Hongtao.liu from comment #4)
> (In reply to Jan Hubicka from comment #0)
> > Seen here:
> > https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=288.597.8
> > https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=468.597.8
> > https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=172.597.8
>
> Also, does O0_g mean that the compile flags are -O0 -g?
That is what I used to bisect, although I *think* that -g is not
necessary.
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: rguenth at gcc dot gnu.org @ 2023-07-17 9:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu.org
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
That doesn't seem to be the larger jump at Jul 16/17? Can we bisect that as
well?
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: jamborm at gcc dot gnu.org @ 2023-07-17 10:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
Martin Jambor <jamborm at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |sayle at gcc dot gnu.org
--- Comment #7 from Martin Jambor <jamborm at gcc dot gnu.org> ---
Oops sorry, indeed, the much bigger regression is because of:
commit 8911879415d6c2a7baad88235554a912887a1c5c
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Fri Jul 14 18:10:05 2023 +0100
i386: Improved insv of DImode/DFmode {high,low}parts into TImode.
This is the next piece towards a fix for (the x86_64 ABI issues affecting)
PR 88873. This patch generalizes the recent tweak to ix86_expand_move
for setting the highpart of a TImode reg from a DImode source using
*insvti_highpart_1, to handle both DImode and DFmode sources, and also
use the recently added *insvti_lowpart_1 for setting the lowpart.
Although this is another intermediate step (not yet a fix), towards
enabling *insvti and *concat* patterns to be candidates for TImode STV
(by using V2DI/V2DF instructions), it already improves things a little.
[...]
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: rguenth at gcc dot gnu.org @ 2023-07-17 11:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |vmakarov at gcc dot gnu.org
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, with GCC 13.1 this is already an LRA hog:
LRA non-specific        :   3.31 ( 73%)   0.01 (  9%)   3.33 ( 72%)  3876k (  3%)
TOTAL                   :   4.53          0.11          4.65          126M
GCC 8 and before were worse. On trunk:
LRA non-specific        :   6.22 ( 69%)   0.02 ( 20%)   6.22 ( 69%)  8922k (  6%)
LRA hard reg assignment :   1.00 ( 11%)   0.02 ( 20%)   1.02 ( 11%)      0 (  0%)
TOTAL                   :   8.97          0.10          9.08          149M
the above is with just -O0.
Profile:
Samples: 37K of event 'cycles:u', Event count (approx.): 49984847870
Overhead Samples Command Shared Object Symbol
51.58% 19087 cc1 cc1 [.] lra_final_code_change
11.10% 4106 cc1 cc1 [.] next_nondebug_insn
7.61% 2879 cc1 cc1 [.] bitmap_set_bit
6.42% 2425 cc1 cc1 [.] find_hard_regno_for_1
2.28% 842 cc1 cc1 [.] bitmap_bit_p
0.99% 365 cc1 cc1 [.] lra_create_live_ranges_1
This possibly means we now spill more, at -O0 at least. We have a 10%
regression in assembly line count between GCC 13 and trunk.
The main hog in lra_final_code_change is calls to regno_in_use_p and
the loop within that. The BB in this function is _huge_ so the whole
process quickly becomes quadratic. Maybe the whole thing should work
backwards on a BB and this info collected on-the-fly as some "liveness"
problem?
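
The quadratic pattern described above, and the suggested backward walk, can be illustrated with a toy model. This is only a sketch under stated assumptions: `Insn`, `used_before_next_set_slow` and `used_before_next_set_fast` are hypothetical names, an instruction is reduced to at most one register read and one register write, and none of this is the actual GCC code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction: reads at most one register and writes at most one
// register; -1 means "none".  Hypothetical model, not GCC's rtx_insn.
struct Insn {
  int reads;
  int writes;
};

// For every instruction i, answer: "is regno read by a later instruction
// before it is written again?"  A forward scan per instruction costs O(n)
// each, so O(n^2) for a basic block of n instructions.
std::vector<bool> used_before_next_set_slow(const std::vector<Insn>& bb,
                                            int regno) {
  std::vector<bool> result(bb.size(), false);
  for (std::size_t i = 0; i < bb.size(); ++i) {
    for (std::size_t j = i + 1; j < bb.size(); ++j) {
      if (bb[j].reads == regno) { result[i] = true; break; }
      if (bb[j].writes == regno) break;  // redefined before any use
    }
  }
  return result;
}

// Same answers from a single backward walk that carries one "live" flag,
// i.e. the information is collected on the fly in O(n).
std::vector<bool> used_before_next_set_fast(const std::vector<Insn>& bb,
                                            int regno) {
  std::vector<bool> result(bb.size(), false);
  bool live = false;  // nothing follows the last instruction
  for (std::size_t k = bb.size(); k-- > 0;) {
    result[k] = live;                        // state just after instruction k
    if (bb[k].writes == regno) live = false; // a write kills the old value
    if (bb[k].reads == regno) live = true;   // a read wins over a write
  }
  return result;
}
```

Both functions give the same answer for every instruction; only the second stays linear when the basic block is huge, which is the situation reported for this testcase.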
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: roger at nextmovesoftware dot com @ 2023-07-17 11:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
Roger Sayle <roger at nextmovesoftware dot com> changed:
What     |Removed |Added
----------------------------------------------------------------------------
CC       |        |roger at nextmovesoftware dot com
See Also |        |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88873
--- Comment #9 from Roger Sayle <roger at nextmovesoftware dot com> ---
I'll check whether turning off the insvti_{low,high}part transformations during
lra_in_progress helps compile-time. I believe every time reload encounters a
TI<->SSE SUBREG, the spill/reload generates two or three additional
instructions. I'm thinking that perhaps this should ideally be an UNSPEC, that
we can split after reload. As shown in PR 88873, we'd like SSE->TI->SSE to
avoid going via memory [where currently this happens twice]. It looks like
"interval" in pr28071.c suffers from the same x86 ABI issues [i.e. is placed in
scalar TImode, where ideally we'd like V2DI].
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: rguenth at gcc dot gnu.org @ 2023-07-17 11:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
I wonder what the following does anyway. We delete the noop move
only when either the reg isn't used for return or it isn't in
use in later insns between 'insn' and the next set of it.
That seems to detect the hardreg = X; USE (hardreg); return sequence
and wants to protect that despite X being the same as 'hardreg'.
  /* IRA can generate move insns involving pseudos. It is
     better remove them earlier to speed up compiler a bit.
     It is also better to do it here as they might not pass
     final RTL check in LRA, (e.g. insn moving a control
     register into itself). So remove an useless move insn
     unless next insn is USE marking the return reg (we should
     save this as some subsequent optimizations assume that
     such original insns are saved). */
  if (NONJUMP_INSN_P (insn) && GET_CODE (pat) == SET
      && REG_P (SET_SRC (pat)) && REG_P (SET_DEST (pat))
      && REGNO (SET_SRC (pat)) == REGNO (SET_DEST (pat))
      && (! return_regno_p (REGNO (SET_SRC (pat)))
          || ! regno_in_use_p (insn, REGNO (SET_SRC (pat)))))
What's odd is of course that return_regno_p returns true so often for this
testcase.
The return sequence to protect should be easily discoverable by walking
from the function exit and thus could be marked instead of trying to
match it to each insn like above.
But I don't understand why we want to preserve this noop copy anyway ...
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: roger at nextmovesoftware dot com @ 2023-07-17 16:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
Roger Sayle <roger at nextmovesoftware dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|unassigned at gcc dot gnu.org |roger at nextmovesoftware dot com
--- Comment #11 from Roger Sayle <roger at nextmovesoftware dot com> ---
My (upcoming) patch for PR88873 dramatically reduces the compile-time (with
-O0) for this test case (by reducing the number of pseudos and reducing the
number of reloads). But don't let that stop anyone from speeding up
lra_final_code_change.
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: rguenth at gcc dot gnu.org @ 2023-07-18 8:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
This code block has a rich history with many fixes for many issues :/ (I
thought of just scrapping it ...). Still, regno_in_use_p is badly engineered in
this context. Of course we're quite unlucky that the return REG is in use that
much for this large BB.
In the end the reason why this code exists and also some of the fallout
observed in the history point at issues that might be worth fixing elsewhere as
well.
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: cvs-commit at gcc dot gnu.org @ 2023-07-22 20:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
--- Comment #13 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:8125b12f846b41f26e58c0fe3b218d654f65d1c8
commit r14-2730-g8125b12f846b41f26e58c0fe3b218d654f65d1c8
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Sat Jul 22 21:52:55 2023 +0100
i386: Don't use insvti_{high,low}part with -O0 (for compile-time).
This patch attempts to help with PR rtl-optimization/110587, a regression
of -O0 compile time for the pathological pr28071.c. My recent patch helps
a bit, but hasn't returned -O0 compile-time to where it was before my
ix86_expand_move changes. The obvious solution/workaround is to guard
these new TImode parameter passing optimizations with "&& optimize", so
they don't trigger when compiling with -O0. The very minor complication
is that "&& optimize" alone leads to the regression of pr110533.c, where
our improved TImode parameter passing fixes a wrong-code issue with naked
functions, importantly, when compiling with -O0. This should explain
the one line fix below "&& (optimize || ix86_function_naked (cfun))".
I've an additional fix/tweak or two for this compile-time issue, but
this change eliminates the part of the regression that I've caused.
2023-07-22 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_move): Disable the
64-bit insertions into TImode optimizations with -O0, unless
the function has the "naked" attribute (for PR target/110533).
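
The guard quoted in the commit message, "&& (optimize || ix86_function_naked (cfun))", boils down to a two-input predicate. The sketch below models only that truth table; `use_timode_insertion` and its parameters are hypothetical names chosen for illustration and are not part of i386-expand.cc.

```cpp
#include <cassert>

// Hypothetical model of the guard: the TImode insertion optimization is
// used when optimizing, or when the current function is "naked" (the
// pr110533.c case, which must keep working at -O0).
constexpr bool use_timode_insertion(int optimize_level, bool function_is_naked) {
  return optimize_level > 0 || function_is_naked;
}

// -O0, ordinary function: skipped, which is the compile-time saving.
static_assert(!use_timode_insertion(0, false), "plain -O0 skips it");
// -O0, naked function: still enabled.
static_assert(use_timode_insertion(0, true), "naked functions keep it");
// Any optimizing level: enabled as before.
static_assert(use_timode_insertion(2, false), "optimizing builds keep it");
```

With "&& optimize" alone, the second case would evaluate to false, which matches the pr110533.c regression the commit message describes.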
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: rguenth at gcc dot gnu.org @ 2023-07-25 8:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
Compile-time is back to the level of the first jump caused by r14-2337-g37a231cc7594d1,
thanks Roger. We still have
LRA non-specific : 3.53 ( 75%)
at -O0 here, which Roger's follow-up patch will improve (but not generally
solve the issue).
At -O1 combine dominates, at -O2 we see other parts of RA being slow:
integrated RA : 7.10 ( 23%)
LRA non-specific : 1.56 ( 5%)
LRA virtuals elimination : 0.07 ( 0%)
LRA reload inheritance : 1.02 ( 3%)
LRA create live ranges : 0.88 ( 3%)
LRA hard reg assignment : 8.22 ( 27%)
LRA coalesce pseudo regs : 0.00 ( 0%)
LRA rematerialization : 0.18 ( 1%)
Samples: 124K of event 'cycles:u', Event count (approx.): 164730867020
Overhead Samples Command Shared Object Symbol
16.60% 20660 cc1 cc1 [.] find_hard_regno_for_1
11.90% 14742 cc1 cc1 [.] bitmap_set_bit
6.47% 7973 cc1 cc1 [.] color_allocnos
3.31% 4023 cc1 cc1 [.] bitmap_bit_p
3.07% 3791 cc1 cc1 [.] remove_allocno_from_bucket_and_push
2.77% 3435 cc1 cc1 [.] assign_hard_reg
2.54% 3138 cc1 cc1 [.] ira_build_conflicts
In find_hard_regno_for_1 the loop over live ranges is what's costly, especially
because it seems the conditionals in the loops depend on (indirect) memory
and that no longer fits nicely into caches.
Maybe regno_allocno_class_array can be shrunk from 'enum reg_class'
(unsigned int) to something smaller. It looks like this array is a
memory optimization since reg_allocno_class would perform a much sparser
access.
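
The size argument behind shrinking the element type can be shown with a small, self-contained sketch. The enumerations and `table_bytes` below are hypothetical stand-ins, not GCC declarations, and whether a narrower element type would actually help cache behaviour in find_hard_regno_for_1 is not measured here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical register-class enumerations that differ only in their
// underlying type; the enumerator names are illustrative.
enum WideClass : unsigned int { WIDE_NO_REGS, WIDE_GENERAL_REGS, WIDE_SSE_REGS };
enum NarrowClass : std::uint8_t { NARROW_NO_REGS, NARROW_GENERAL_REGS, NARROW_SSE_REGS };

// Bytes needed for a per-register class table with n entries.
template <typename Class>
constexpr std::size_t table_bytes(std::size_t n) {
  return n * sizeof(Class);
}

// A fixed underlying type determines the storage size of each element.
static_assert(sizeof(WideClass) == sizeof(unsigned int), "wide element");
static_assert(sizeof(NarrowClass) == 1, "narrow element is one byte");
```

A table indexed by register number therefore covers `sizeof(unsigned int)` times as many registers per cache line with the one-byte variant, provided the number of classes fits in eight bits.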
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: roger at nextmovesoftware dot com @ 2023-07-25 8:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
--- Comment #15 from Roger Sayle <roger at nextmovesoftware dot com> ---
Hi Richard,
There's another patch awaiting review at
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625282.html
and I've another follow-up after that currently regression testing...
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: roger at nextmovesoftware dot com @ 2023-07-27 18:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
Roger Sayle <roger at nextmovesoftware dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|roger at nextmovesoftware dot com |unassigned at gcc dot gnu.org
--- Comment #16 from Roger Sayle <roger at nextmovesoftware dot com> ---
My patch (in comment #15) is obsoleted by Richard Biener's much better
solution(s):
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625416.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625417.html
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: cvs-commit at gcc dot gnu.org @ 2023-07-28 8:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
--- Comment #17 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:095eb138f736d94dabf9a07a6671bd351be0e66a
commit r14-2851-g095eb138f736d94dabf9a07a6671bd351be0e66a
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Fri Jul 28 09:39:46 2023 +0100
PR rtl-optimization/110587: Reduce useless moves in compile-time hog.
This patch is one of a series of fixes for PR rtl-optimization/110587,
a compile-time regression with -O0, that attempts to address the underlying
cause. As noted previously, the pathological test case pr28071.c contains
a large number of useless register-to-register moves that can produce
quadratic behaviour (in LRA). These moves are generated during RTL
expansion in emit_group_load_1, where the middle-end attempts to simplify
the source before calling extract_bit_field. This is reasonable if the
source is a complex expression (from before the tree-ssa optimizers), or
a SUBREG, or a hard register, but it's not particularly useful to copy
a pseudo register into a new pseudo register. This patch eliminates that
redundancy.
The -fdump-tree-expand for pr28071.c compiled with -O0 currently contains
777K lines, with this patch it contains 717K lines, i.e. saving about 60K
lines (admittedly of debugging text output, but it makes the point).
2023-07-28 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
PR middle-end/28071
PR rtl-optimization/110587
* expr.cc (emit_group_load_1): Simplify logic for calling
force_reg on ORIG_SRC, to avoid making a copy if the source
is already in a pseudo register.
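
The idea in this ChangeLog entry, skipping the copy when the source already lives in a pseudo register, can be sketched with a toy model. `OperandKind`, `Operand`, `Emitter` and `into_pseudo` are hypothetical names for illustration; this is not the code in expr.cc and it does not call GCC's force_reg.

```cpp
#include <cassert>

// Hypothetical classification of a source operand, following the cases
// named in the commit message.
enum class OperandKind { Pseudo, HardReg, Subreg, Expression };

struct Operand {
  OperandKind kind;
  int id;  // register number or an arbitrary tag
};

// Toy emitter that counts how many register-to-register moves it creates.
struct Emitter {
  int next_pseudo = 100;
  int moves_emitted = 0;

  // Return an operand that is guaranteed to be a pseudo.  A source that is
  // already a pseudo is returned unchanged, so no useless move is emitted.
  Operand into_pseudo(Operand src) {
    if (src.kind == OperandKind::Pseudo)
      return src;
    ++moves_emitted;  // copy a hard reg, SUBREG or expression into a new pseudo
    return Operand{OperandKind::Pseudo, next_pseudo++};
  }
};
```

In this model, a block whose sources are almost all pseudos emits almost no extra moves, which is the kind of saving the commit message attributes to the patch.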
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: cvs-commit at gcc dot gnu.org @ 2023-08-02 7:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
--- Comment #18 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:07b7cd70399d22c113ad8bb1eff5cc2d12973d33
commit r14-2920-g07b7cd70399d22c113ad8bb1eff5cc2d12973d33
Author: Richard Biener <rguenther@suse.de>
Date: Tue Jul 25 15:32:11 2023 +0200
rtl-optimization/110587 - remove quadratic regno_in_use_p
The following removes the code checking whether a noop copy
is between something involved in the return sequence composed
of a SET and USE. Instead of checking for this special-case
the following makes us only ever remove noop copies between
pseudos - which is the case that is necessary for IRA/LRA
interfacing to function according to the comment. That makes
looking for the return reg special case unnecessary, reducing
the compile-time in LRA non-specific to zero for the testcase.
PR rtl-optimization/110587
* lra-spills.cc (return_regno_p): Remove.
(regno_in_use_p): Likewise.
(lra_final_code_change): Do not remove noop moves
between hard registers.
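
As a rough illustration of the simplified rule described in this commit message (only noop copies between pseudos are removed, so no scan for the return sequence is needed), consider the toy predicate below. `Move`, `is_pseudo`, `removable_noop_move` and the value of `kFirstPseudoRegister` are assumptions made for the sketch; the real boundary is target-defined and the real check lives in lra-spills.cc.

```cpp
#include <cassert>

// Hypothetical boundary: register numbers below it are hard registers,
// numbers at or above it are pseudos.  The value 64 is arbitrary.
constexpr int kFirstPseudoRegister = 64;

// Toy register-to-register move.
struct Move {
  int dest;
  int src;
};

constexpr bool is_pseudo(int regno) {
  return regno >= kFirstPseudoRegister;
}

// A move is removable when it copies a pseudo onto itself.  Noop moves
// between hard registers are kept, so the decision depends only on the
// move itself and costs O(1) per instruction.
constexpr bool removable_noop_move(Move m) {
  return m.dest == m.src && is_pseudo(m.src);
}
```

Because the predicate no longer looks at any other instruction, the per-instruction cost is constant regardless of the size of the basic block.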
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: rguenth at gcc dot gnu.org @ 2023-08-02 7:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
The tester shows the issue is fixed now (we're faster than before the
regression). At -O0 compile-time is still dominated by RA
(r14-2920-g07b7cd70399d22, release checking):
integrated RA : 0.29 ( 32%)
LRA non-specific : 0.15 ( 16%)
TOTAL : 0.91
Samples: 3K of event 'cycles:u', Event count (approx.): 5038659855
Overhead Samples Command Shared Object Symbol
6.15% 233 cc1 cc1 [.] process_alt_operands
4.29% 163 cc1 cc1 [.] process_bb_node_lives
3.72% 142 cc1 cc1 [.] record_reg_classes
3.01% 114 cc1 cc1 [.] mark_ref_dead
2.87% 109 cc1 cc1 [.] constrain_operands
2.71% 114 cc1 cc1 [.] df_ref_create_structure
2.47% 94 cc1 cc1 [.] ira_setup_alts
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: ubizjak at gmail dot com @ 2023-08-02 9:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
--- Comment #20 from Uroš Bizjak <ubizjak at gmail dot com> ---
Can we revert the Comment #13 kludge now?
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: rguenth at gcc dot gnu.org @ 2023-08-02 9:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #20)
> Can we revert the Comment #13 kludge now?
When we revert it we get
 integrated RA              :   0.42 ( 17%)   0.00 (  0%)   0.43 ( 17%)    19M ( 16%)
 LRA non-specific           :   0.39 ( 16%)   0.00 (  0%)   0.39 ( 15%)  6304k (  5%)
 LRA virtuals elimination   :   0.03 (  1%)   0.00 (  0%)   0.02 (  1%)  3729k (  3%)
 LRA reload inheritance     :   0.17 (  7%)   0.01 ( 10%)   0.18 (  7%)  5109k (  4%)
 LRA create live ranges     :   0.27 ( 11%)   0.00 (  0%)   0.28 ( 11%)   984k (  1%)
 LRA hard reg assignment    :   0.72 ( 30%)   0.01 ( 10%)   0.74 ( 29%)     0  (  0%)
 TOTAL                      :   2.43          0.10          2.54          123M
so the regression is back and also code size increases significantly.
* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
From: cvs-commit at gcc dot gnu.org @ 2023-08-09 6:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
--- Comment #22 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:b66e613a1a8d5b8fc9d8b03f7b60260700acf833
commit r14-3095-gb66e613a1a8d5b8fc9d8b03f7b60260700acf833
Author: Richard Biener <rguenther@suse.de>
Date: Tue Jul 25 15:36:30 2023 +0200
rtl-optimization/110587 - speedup find_hard_regno_for_1
The following applies a micro-optimization to find_hard_regno_for_1,
re-ordering the check so we can easily jump-thread by using an else.
This reduces the time spent in this function by 15% for the testcase
in the PR.
PR rtl-optimization/110587
* lra-assigns.cc (find_hard_regno_for_1): Re-order checks.
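The exact reordering is in the commit itself. The sketch below is only a generic illustration of why an else can help, and every name in it (`toy_same_value`, `toy_classify_two_tests`, `toy_classify_else`, `toy_check_agree`) is hypothetical. When two complementary conditions are tested in separate if statements, the compiler has to prove the outcome of the second test from the first before it can thread the jump; with an else the control flow is explicit and the predicate is evaluated once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate standing in for a check that the compiler
   may not be able to see through.  */
static bool toy_same_value(int a, int b)
{
    return a == b;
}

/* Two separate tests of complementary conditions: the predicate is
   written (and possibly evaluated) twice.  */
static int toy_classify_two_tests(int a, int b)
{
    int result = 0;

    if (!toy_same_value(a, b))
        result += 1;    /* "conflict" path */
    if (toy_same_value(a, b))
        result += 2;    /* "shared value" path */
    return result;
}

/* Same behavior with the checks re-ordered into if/else: one evaluation,
   and the two paths are visibly exclusive.  */
static int toy_classify_else(int a, int b)
{
    int result = 0;

    if (toy_same_value(a, b))
        result += 2;    /* "shared value" path */
    else
        result += 1;    /* "conflict" path */
    return result;
}

/* Self-check helper: both forms must agree for the given inputs.  */
static void toy_check_agree(int a, int b)
{
    assert(toy_classify_two_tests(a, b) == toy_classify_else(a, b));
}
```

Both forms compute the same result; the difference is only in how easily the generated code can skip the redundant test.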
Thread overview: 24+ messages
2023-07-07 9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
2023-07-15 14:11 ` [Bug middle-end/110587] " jamborm at gcc dot gnu.org
2023-07-15 17:20 ` [Bug middle-end/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1 pinskia at gcc dot gnu.org
2023-07-15 17:21 ` [Bug rtl-optimization/110587] " pinskia at gcc dot gnu.org
2023-07-17 6:27 ` crazylht at gmail dot com
2023-07-17 6:28 ` crazylht at gmail dot com
2023-07-17 8:56 ` jamborm at gcc dot gnu.org
2023-07-17 9:13 ` rguenth at gcc dot gnu.org
2023-07-17 10:52 ` jamborm at gcc dot gnu.org
2023-07-17 11:09 ` rguenth at gcc dot gnu.org
2023-07-17 11:26 ` roger at nextmovesoftware dot com
2023-07-17 11:42 ` rguenth at gcc dot gnu.org
2023-07-17 16:04 ` roger at nextmovesoftware dot com
2023-07-18 8:25 ` rguenth at gcc dot gnu.org
2023-07-22 20:55 ` cvs-commit at gcc dot gnu.org
2023-07-25 8:38 ` rguenth at gcc dot gnu.org
2023-07-25 8:44 ` roger at nextmovesoftware dot com
2023-07-27 18:28 ` roger at nextmovesoftware dot com
2023-07-28 8:40 ` cvs-commit at gcc dot gnu.org
2023-08-02 7:04 ` cvs-commit at gcc dot gnu.org
2023-08-02 7:47 ` rguenth at gcc dot gnu.org
2023-08-02 9:11 ` ubizjak at gmail dot com
2023-08-02 9:45 ` rguenth at gcc dot gnu.org
2023-08-09 6:48 ` cvs-commit at gcc dot gnu.org