public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d
@ 2023-07-07  9:52 hubicka at gcc dot gnu.org
  2023-07-15 14:11 ` [Bug middle-end/110587] " jamborm at gcc dot gnu.org
                   ` (22 more replies)
  0 siblings, 23 replies; 24+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-07-07  9:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

            Bug ID: 110587
           Summary: 96% pr28071.c compile time regression betwen
                    g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and
                    g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

Seen here:
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=288.597.8
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=468.597.8
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=172.597.8

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug middle-end/110587] 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
@ 2023-07-15 14:11 ` jamborm at gcc dot gnu.org
  2023-07-15 17:20 ` [Bug middle-end/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1 pinskia at gcc dot gnu.org
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: jamborm at gcc dot gnu.org @ 2023-07-15 14:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org,
                   |                            |liuhongt at gcc dot gnu.org
   Last reconfirmed|                            |2023-07-15
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Martin Jambor <jamborm at gcc dot gnu.org> ---
I have bisected this to

37a231cc7594d12ba0822077018aad751a6fb94e is the first bad commit
commit 37a231cc7594d12ba0822077018aad751a6fb94e
Author: liuhongt <hongtao.liu@intel.com>
Date:   Wed Jul 5 13:45:11 2023 +0800

    Disparage slightly for the alternative which move DFmode between SSE_REGS and GENERAL_REGS.

* [Bug middle-end/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
  2023-07-15 14:11 ` [Bug middle-end/110587] " jamborm at gcc dot gnu.org
@ 2023-07-15 17:20 ` pinskia at gcc dot gnu.org
  2023-07-15 17:21 ` [Bug rtl-optimization/110587] " pinskia at gcc dot gnu.org
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-15 17:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|96% pr28071.c compile time  |[14 regression] 96%
                   |regression since            |pr28071.c compile time
                   |r14-2337-g37a231cc7594d1    |regression since
                   |                            |r14-2337-g37a231cc7594d1
   Target Milestone|---                         |14.0
           Keywords|                            |compile-time-hog, ra
            Version|13.1.0                      |14.0

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
  2023-07-15 14:11 ` [Bug middle-end/110587] " jamborm at gcc dot gnu.org
  2023-07-15 17:20 ` [Bug middle-end/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1 pinskia at gcc dot gnu.org
@ 2023-07-15 17:21 ` pinskia at gcc dot gnu.org
  2023-07-17  6:27 ` crazylht at gmail dot com
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-15 17:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-linux-gnu

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
It would be interesting to see whether it is the register allocator, and where
(which function) in GCC the compile-time slowdown happens.

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-07-15 17:21 ` [Bug rtl-optimization/110587] " pinskia at gcc dot gnu.org
@ 2023-07-17  6:27 ` crazylht at gmail dot com
  2023-07-17  6:28 ` crazylht at gmail dot com
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: crazylht at gmail dot com @ 2023-07-17  6:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
I can't find pr28071.c in the GCC testsuite, but I found a source file attached
to the PR (#c1). Is that the pr28071.c you mean?

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-07-17  6:27 ` crazylht at gmail dot com
@ 2023-07-17  6:28 ` crazylht at gmail dot com
  2023-07-17  8:56 ` jamborm at gcc dot gnu.org
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: crazylht at gmail dot com @ 2023-07-17  6:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jan Hubicka from comment #0)
> Seen here:
> https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=288.597.8
> https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=468.597.8
> https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=172.597.8

Also, does O0_g mean the compile flags are -O0 -g?

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2023-07-17  6:28 ` crazylht at gmail dot com
@ 2023-07-17  8:56 ` jamborm at gcc dot gnu.org
  2023-07-17  9:13 ` rguenth at gcc dot gnu.org
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: jamborm at gcc dot gnu.org @ 2023-07-17  8:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

--- Comment #5 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #3)
> I can't find pr28071.c in the GCC testsuite, but I found a source file attached
> to the PR (#c1). Is that the pr28071.c you mean?

Yes.


(In reply to Hongtao.liu from comment #4)
> (In reply to Jan Hubicka from comment #0)
> > Seen here:
> > https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=288.597.8
> > https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=468.597.8
> > https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=172.597.8
> 
> Also, does O0_g mean the compile flags are -O0 -g?

That is what I used to bisect, although I *think* that -g is not
necessary.

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2023-07-17  8:56 ` jamborm at gcc dot gnu.org
@ 2023-07-17  9:13 ` rguenth at gcc dot gnu.org
  2023-07-17 10:52 ` jamborm at gcc dot gnu.org
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-17  9:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
That doesn't seem to be the larger jump at Jul 16/17?  Can we bisect that as
well?

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2023-07-17  9:13 ` rguenth at gcc dot gnu.org
@ 2023-07-17 10:52 ` jamborm at gcc dot gnu.org
  2023-07-17 11:09 ` rguenth at gcc dot gnu.org
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: jamborm at gcc dot gnu.org @ 2023-07-17 10:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sayle at gcc dot gnu.org

--- Comment #7 from Martin Jambor <jamborm at gcc dot gnu.org> ---
Oops sorry, indeed, the much bigger regression is because of:

commit 8911879415d6c2a7baad88235554a912887a1c5c
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Fri Jul 14 18:10:05 2023 +0100

    i386: Improved insv of DImode/DFmode {high,low}parts into TImode.

    This is the next piece towards a fix for (the x86_64 ABI issues affecting)
    PR 88873.  This patch generalizes the recent tweak to ix86_expand_move
    for setting the highpart of a TImode reg from a DImode source using
    *insvti_highpart_1, to handle both DImode and DFmode sources, and also
    use the recently added *insvti_lowpart_1 for setting the lowpart.

    Although this is another intermediate step (not yet a fix), towards
    enabling *insvti and *concat* patterns to be candidates for TImode STV
    (by using V2DI/V2DF instructions), it already improves things a little.
    [...]

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2023-07-17 10:52 ` jamborm at gcc dot gnu.org
@ 2023-07-17 11:09 ` rguenth at gcc dot gnu.org
  2023-07-17 11:26 ` roger at nextmovesoftware dot com
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-17 11:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, with GCC 13.1 this is already an LRA hog:

 LRA non-specific                   :   3.31 ( 73%)   0.01 (  9%)   3.33 ( 72%)  3876k (  3%)
 TOTAL                              :   4.53          0.11          4.65          126M

GCC 8 and before were worse.  On trunk:

 LRA non-specific                   :   6.22 ( 69%)   0.02 ( 20%)   6.22 ( 69%)  8922k (  6%)
 LRA hard reg assignment            :   1.00 ( 11%)   0.02 ( 20%)   1.02 ( 11%)     0  (  0%)
 TOTAL                              :   8.97          0.10          9.08          149M

the above is with just -O0.

Profile:

Samples: 37K of event 'cycles:u', Event count (approx.): 49984847870            
Overhead       Samples  Command  Shared Object       Symbol                     
  51.58%         19087  cc1      cc1                 [.] lra_final_code_change
  11.10%          4106  cc1      cc1                 [.] next_nondebug_insn
   7.61%          2879  cc1      cc1                 [.] bitmap_set_bit
   6.42%          2425  cc1      cc1                 [.] find_hard_regno_for_1
   2.28%           842  cc1      cc1                 [.] bitmap_bit_p
   0.99%           365  cc1      cc1                 [.] lra_create_live_ranges_1

This possibly means we now spill more, at least at -O0.  We have a 10%
regression in assembly line count between 13 and trunk.

The main hog in lra_final_code_change is calls to regno_in_use_p and
the loop within that.  The BB in this function is _huge_ so the whole
process quickly becomes quadratic.  Maybe the whole thing should work
backwards on a BB and this info collected on-the-fly as some "liveness"
problem?
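
A toy model of the suggestion above, not GCC code: the names toy_insn,
toy_in_use_forward and toy_in_use_backward are hypothetical, and real RTL
insns are far richer than a (kind, regno) pair.  The sketch only contrasts a
forward scan per insn, which costs O(n) per query and so O(n^2) over a huge
BB, with one backward walk that records the same answer for every insn in
O(n) total.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NREGS 8  /* hypothetical register count for the toy model */

enum toy_kind { TOY_SET, TOY_USE, TOY_OTHER };

struct toy_insn {
  enum toy_kind kind;
  unsigned regno;  /* register the insn sets, uses, or merely mentions */
};

/* Forward scan: is REGNO used after insn I and before its next set?
   Each call walks the rest of the block, so n calls cost O(n^2).  */
bool toy_in_use_forward(const struct toy_insn *bb, size_t n,
                        size_t i, unsigned regno)
{
  for (size_t j = i + 1; j < n; j++) {
    if (bb[j].regno != regno)
      continue;
    if (bb[j].kind == TOY_USE)
      return true;
    if (bb[j].kind == TOY_SET)
      return false;
  }
  return false;
}

/* Backward pass: OUT[I] receives the forward-scan answer for insn I's
   own register, computed for the whole block in a single O(n) walk.  */
void toy_in_use_backward(const struct toy_insn *bb, size_t n, bool *out)
{
  bool used_later[TOY_NREGS] = { false };

  for (size_t i = n; i-- > 0; ) {
    assert(bb[i].regno < TOY_NREGS);
    out[i] = used_later[bb[i].regno];
    if (bb[i].kind == TOY_USE)
      used_later[bb[i].regno] = true;
    else if (bb[i].kind == TOY_SET)
      used_later[bb[i].regno] = false;
  }
}
```

The backward walk is a tiny per-block liveness computation: a use makes the
register "in use" for everything above it, and a set ends that range.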

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2023-07-17 11:09 ` rguenth at gcc dot gnu.org
@ 2023-07-17 11:26 ` roger at nextmovesoftware dot com
  2023-07-17 11:42 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-07-17 11:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88873

--- Comment #9 from Roger Sayle <roger at nextmovesoftware dot com> ---
I'll check whether turning off the insvti_{low,high}part transformations during
lra_in_progress helps compile-time.  I believe every time reload encounters a
TI<->SSE SUBREG, the spill/reload generates two or three additional
instructions.  I'm thinking that perhaps this should ideally be an UNSPEC, that
we can split after reload. As shown in PR 88873, we'd like SSE->TI->SSE to
avoid going via memory [where currently this happens twice]. It looks like
"interval" in pr28071.c suffers from the same x86 ABI issues [i.e. is placed in
scalar TImode, where ideally we'd like V2DI].

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2023-07-17 11:26 ` roger at nextmovesoftware dot com
@ 2023-07-17 11:42 ` rguenth at gcc dot gnu.org
  2023-07-17 16:04 ` roger at nextmovesoftware dot com
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-17 11:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
I wonder what the following does anyway.  We delete the noop move
only when either the reg isn't used for return or it isn't in
use in later insns between 'insn' and the next set of it.
That seems to detect the hardreg = X; USE (hardreg); return sequence
and wants to protect that despite X being the same as 'hardreg'.

          /* IRA can generate move insns involving pseudos.  It is
             better remove them earlier to speed up compiler a bit.
             It is also better to do it here as they might not pass
             final RTL check in LRA, (e.g. insn moving a control
             register into itself).  So remove an useless move insn
             unless next insn is USE marking the return reg (we should
             save this as some subsequent optimizations assume that
             such original insns are saved).  */
          if (NONJUMP_INSN_P (insn) && GET_CODE (pat) == SET
              && REG_P (SET_SRC (pat)) && REG_P (SET_DEST (pat))
              && REGNO (SET_SRC (pat)) == REGNO (SET_DEST (pat))
              && (! return_regno_p (REGNO (SET_SRC (pat)))
                  || ! regno_in_use_p (insn, REGNO (SET_SRC (pat)))))

what's odd is of course that return_regno_p returns true so much for this
testcase.

The return sequence to protect should be easily discoverable by walking
from the function exit and thus could be marked instead of trying to
match it to each insn like above.

But I don't understand why we want to preserve this noop copy anyway ...
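
A rough sketch of the "walk from the function exit" idea, under a strong
simplifying assumption: the sequence to protect is exactly the last two
insns (a set of the return register followed by a USE of it).  The types
and helpers (ret_insn, ret_find_protected_move, ret_can_delete_noop_move)
are hypothetical and do not exist in GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ret_code { RET_MOVE, RET_USE, RET_OTHER };

struct ret_insn {
  enum ret_code code;
  unsigned dest;  /* RET_MOVE: destination register; RET_USE: used register */
  unsigned src;   /* RET_MOVE: source register; ignored otherwise */
};

/* Look at the end of the insn array once.  If the last insn is a USE of
   RET_REGNO and the insn before it sets RET_REGNO, return the index of
   that setter; otherwise return N, meaning nothing needs protecting.  */
size_t ret_find_protected_move(const struct ret_insn *insns, size_t n,
                               unsigned ret_regno)
{
  if (n < 2)
    return n;

  const struct ret_insn *use = &insns[n - 1];
  const struct ret_insn *set = &insns[n - 2];

  if (use->code == RET_USE && use->dest == ret_regno
      && set->code == RET_MOVE && set->dest == ret_regno)
    return n - 2;
  return n;
}

/* A no-op move (same source and destination) may be deleted unless it
   is the one marked as feeding the final USE.  This is an O(1) check
   per insn instead of a scan of the remaining block.  */
bool ret_can_delete_noop_move(const struct ret_insn *insns, size_t n,
                              size_t i, size_t protected_idx)
{
  assert(i < n);
  return insns[i].code == RET_MOVE
         && insns[i].src == insns[i].dest
         && i != protected_idx;
}
```

The point is only that the protected insn is found once, up front, so the
per-insn test no longer depends on the size of the BB.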

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2023-07-17 11:42 ` rguenth at gcc dot gnu.org
@ 2023-07-17 16:04 ` roger at nextmovesoftware dot com
  2023-07-18  8:25 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-07-17 16:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |roger at nextmovesoftware dot com

--- Comment #11 from Roger Sayle <roger at nextmovesoftware dot com> ---
My (upcoming) patch for PR88873 dramatically reduces the compile-time (with
-O0) for this test case (by reducing the number of pseudos and reducing the
number of reloads).  But don't let that stop anyone from speeding up
lra_final_code_change.

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2023-07-17 16:04 ` roger at nextmovesoftware dot com
@ 2023-07-18  8:25 ` rguenth at gcc dot gnu.org
  2023-07-22 20:55 ` cvs-commit at gcc dot gnu.org
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-18  8:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
This code block has a rich history with many fixes for many issues :/  (I
thought of just scrapping it ...), still regno_in_use_p is badly engineered in
this context.  Of course we're quite unlucky that the return REG is in use that
much for this large BB.

In the end the reason why this code exists and also some of the fallout
observed in the history point at issues that might be worth fixing elsewhere as
well.

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2023-07-18  8:25 ` rguenth at gcc dot gnu.org
@ 2023-07-22 20:55 ` cvs-commit at gcc dot gnu.org
  2023-07-25  8:38 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-07-22 20:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

--- Comment #13 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:8125b12f846b41f26e58c0fe3b218d654f65d1c8

commit r14-2730-g8125b12f846b41f26e58c0fe3b218d654f65d1c8
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Sat Jul 22 21:52:55 2023 +0100

    i386: Don't use insvti_{high,low}part with -O0 (for compile-time).

    This patch attempts to help with PR rtl-optimization/110587, a regression
    of -O0 compile time for the pathological pr28071.c.  My recent patch helps
    a bit, but hasn't returned -O0 compile-time to where it was before my
    ix86_expand_move changes.  The obvious solution/workaround is to guard
    these new TImode parameter passing optimizations with "&& optimize", so
    they don't trigger when compiling with -O0.  The very minor complication
    is that "&& optimize" alone leads to the regression of pr110533.c, where
    our improved TImode parameter passing fixes a wrong-code issue with naked
    functions, importantly, when compiling with -O0.  This should explain
    the one line fix below "&& (optimize || ix86_function_naked (cfun))".

    I've an additional fix/tweak or two for this compile-time issue, but
    this change eliminates the part of the regression that I've caused.

    2023-07-22  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            * config/i386/i386-expand.cc (ix86_expand_move): Disable the
            64-bit insertions into TImode optimizations with -O0, unless
            the function has the "naked" attribute (for PR target/110533).
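
The guard described in the commit message can be read as a small predicate.
The sketch below is illustrative only: toy_allow_timode_insertion is a
hypothetical name, optimize_level stands in for GCC's "optimize", and
function_is_naked stands in for the result of ix86_function_naked (cfun).

```c
#include <stdbool.h>

/* Returns true when the TImode insertion optimizations may be used:
   either we are optimizing, or the function is "naked" (where the
   optimization is needed for correctness even at -O0).  */
bool toy_allow_timode_insertion(int optimize_level, bool function_is_naked)
{
  return optimize_level != 0 || function_is_naked;
}
```

With "&& optimize" alone the second case would be lost, which is the
pr110533.c regression the message mentions.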

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2023-07-22 20:55 ` cvs-commit at gcc dot gnu.org
@ 2023-07-25  8:38 ` rguenth at gcc dot gnu.org
  2023-07-25  8:44 ` roger at nextmovesoftware dot com
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-25  8:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
compile-time is back to the first jump caused by r14-2337-g37a231cc7594d1,
thanks Roger.  We still have

 LRA non-specific                   :   3.53 ( 75%)

at -O0 here, which Roger's follow-up patch will improve (but not generally
solve the issue).

At -O1 combine dominates, at -O2 we see other parts of RA being slow:

 integrated RA                      :   7.10 ( 23%) 
 LRA non-specific                   :   1.56 (  5%)
 LRA virtuals elimination           :   0.07 (  0%)
 LRA reload inheritance             :   1.02 (  3%)
 LRA create live ranges             :   0.88 (  3%)
 LRA hard reg assignment            :   8.22 ( 27%)
 LRA coalesce pseudo regs           :   0.00 (  0%)
 LRA rematerialization              :   0.18 (  1%)

Samples: 124K of event 'cycles:u', Event count (approx.): 164730867020          
Overhead       Samples  Command  Shared Object       Symbol                     
  16.60%         20660  cc1      cc1                 [.] find_hard_regno_for_1
  11.90%         14742  cc1      cc1                 [.] bitmap_set_bit
   6.47%          7973  cc1      cc1                 [.] color_allocnos
   3.31%          4023  cc1      cc1                 [.] bitmap_bit_p
   3.07%          3791  cc1      cc1                 [.] remove_allocno_from_bucket_and_push
   2.77%          3435  cc1      cc1                 [.] assign_hard_reg
   2.54%          3138  cc1      cc1                 [.] ira_build_conflicts

in find_hard_regno_for_1 the loop over live ranges is what's costly, esp.
because it seems the conditionals in the loops depend on (indirect) memory
and that no longer fits nicely into caches.

Maybe regno_allocno_class_array can be shrunk from 'enum reg_class'
(unsigned int) to something smaller.  It looks like this array is a
memory optimization since reg_allocno_class would perform a much sparser
access.
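
To illustrate the shrinking idea in isolation (this is not the GCC data
structure): if the number of register classes fits in a byte, an array
indexed by regno can store one unsigned char per entry instead of one enum,
so more entries fit in each cache line.  The enum toy_reg_class and the
helper toy_pack_classes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum toy_reg_class {
  TOY_NO_REGS, TOY_GENERAL_REGS, TOY_SSE_REGS, TOY_ALL_REGS,
  TOY_LIM_REG_CLASSES
};

/* One byte per entry; only valid while every class value fits.  */
typedef unsigned char toy_reg_class_small;

_Static_assert(TOY_LIM_REG_CLASSES <= 256,
               "register classes must fit in one byte");

/* Copy N enum-sized entries into a freshly allocated byte array.
   Returns NULL on allocation failure; the caller frees the result.  */
toy_reg_class_small *toy_pack_classes(const enum toy_reg_class *wide, size_t n)
{
  toy_reg_class_small *packed = malloc(n ? n : 1);

  if (!packed)
    return NULL;
  for (size_t i = 0; i < n; i++) {
    assert(wide[i] < TOY_LIM_REG_CLASSES);
    packed[i] = (toy_reg_class_small) wide[i];
  }
  return packed;
}
```

Whether this helps in practice would have to be measured; the size of an
enum is implementation-defined, so the saving per entry is an assumption.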

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2023-07-25  8:38 ` rguenth at gcc dot gnu.org
@ 2023-07-25  8:44 ` roger at nextmovesoftware dot com
  2023-07-27 18:28 ` roger at nextmovesoftware dot com
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-07-25  8:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

--- Comment #15 from Roger Sayle <roger at nextmovesoftware dot com> ---
Hi Richard,
There's another patch awaiting review at
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625282.html
and I've another follow-up after that currently regression testing...

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2023-07-25  8:44 ` roger at nextmovesoftware dot com
@ 2023-07-27 18:28 ` roger at nextmovesoftware dot com
  2023-07-28  8:40 ` cvs-commit at gcc dot gnu.org
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-07-27 18:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|roger at nextmovesoftware dot com  |unassigned at gcc dot gnu.org

--- Comment #16 from Roger Sayle <roger at nextmovesoftware dot com> ---
My patch (in comment #15) is obsoleted by Richard Biener's much better
solution(s):
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625416.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625417.html

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2023-07-27 18:28 ` roger at nextmovesoftware dot com
@ 2023-07-28  8:40 ` cvs-commit at gcc dot gnu.org
  2023-08-02  7:04 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-07-28  8:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

--- Comment #17 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:095eb138f736d94dabf9a07a6671bd351be0e66a

commit r14-2851-g095eb138f736d94dabf9a07a6671bd351be0e66a
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Fri Jul 28 09:39:46 2023 +0100

    PR rtl-optimization/110587: Reduce useless moves in compile-time hog.

    This patch is one of a series of fixes for PR rtl-optimization/110587,
    a compile-time regression with -O0, that attempts to address the underlying
    cause.  As noted previously, the pathological test case pr28071.c contains
    a large number of useless register-to-register moves that can produce
    quadratic behaviour (in LRA).  These moves are generated during RTL
    expansion in emit_group_load_1, where the middle-end attempts to simplify
    the source before calling extract_bit_field.  This is reasonable if the
    source is a complex expression (from before the tree-ssa optimizers), or
    a SUBREG, or a hard register, but it's not particularly useful to copy
    a pseudo register into a new pseudo register.  This patch eliminates that
    redundancy.

    The -fdump-tree-expand output for pr28071.c compiled with -O0 currently
    contains 777K lines; with this patch it contains 717K lines, saving about
    60K lines (admittedly of debugging text output, but it makes the point).

    2023-07-28  Roger Sayle  <roger@nextmovesoftware.com>
                Richard Biener  <rguenther@suse.de>

    gcc/ChangeLog
            PR middle-end/28071
            PR rtl-optimization/110587
            * expr.cc (emit_group_load_1): Simplify logic for calling
            force_reg on ORIG_SRC, to avoid making a copy if the source
            is already in a pseudo register.
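
A rough sketch of the idea described in the commit message above, not the actual expr.cc code: the source is copied into a fresh pseudo only when it is not already a pseudo. The `Operand`, `Kind` and `Emitter` types and the `maybe_force_reg` helper are hypothetical names invented for this illustration; GCC's real rtx representation differs.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy model only: these types are hypothetical and do not mirror GCC's rtx.
enum class Kind { Pseudo, HardReg, Subreg, Complex };

struct Operand {
  Kind kind;
  int regno;  // register number, or -1 when the operand is not a plain register
};

struct Emitter {
  int next_pseudo = 100;                   // next unused pseudo number (assumed start)
  std::vector<std::pair<int, int>> moves;  // (dest, src) copies emitted so far

  // Copy SRC into a new pseudo unless it already lives in one.
  Operand maybe_force_reg(const Operand &src) {
    if (src.kind == Kind::Pseudo)
      return src;  // already a pseudo: a pseudo-to-pseudo copy would be useless
    Operand tmp{Kind::Pseudo, next_pseudo++};
    moves.emplace_back(tmp.regno, src.regno);
    return tmp;
  }
};
```

In this model, passing a pseudo through `maybe_force_reg` emits no move at all, while a hard register, SUBREG or complex expression still gets copied into a fresh pseudo.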

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2023-07-28  8:40 ` cvs-commit at gcc dot gnu.org
@ 2023-08-02  7:04 ` cvs-commit at gcc dot gnu.org
  2023-08-02  7:47 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-08-02  7:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

--- Comment #18 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:07b7cd70399d22c113ad8bb1eff5cc2d12973d33

commit r14-2920-g07b7cd70399d22c113ad8bb1eff5cc2d12973d33
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Jul 25 15:32:11 2023 +0200

    rtl-optimization/110587 - remove quadratic regno_in_use_p

    The following removes the code checking whether a noop copy
    is between something involved in the return sequence composed
    of a SET and USE.  Instead of checking for this special case,
    the following makes us only ever remove noop copies between
    pseudos, which is the case that is necessary for IRA/LRA
    interfacing to function according to the comment.  That makes
    looking for the return reg special case unnecessary, reducing
    the compile-time in LRA non-specific to zero for the testcase.

            PR rtl-optimization/110587
            * lra-spills.cc (return_regno_p): Remove.
            (regno_in_use_p): Likewise.
            (lra_final_code_change): Do not remove noop moves
            between hard registers.
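
A minimal sketch of the approach this commit message describes, under stated assumptions: a noop move is dropped only when both operands are pseudos, so each decision looks at the move itself and never scans the rest of the instruction stream. The `Move` struct, the `kFirstPseudo` boundary and the `drop_noop_pseudo_moves` function are hypothetical and are not the code in lra-spills.cc.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model; not the code in lra-spills.cc.
constexpr int kFirstPseudo = 64;  // assumed boundary between hard regs and pseudos

struct Move {
  int dest;
  int src;
};

inline bool is_pseudo(int regno) { return regno >= kFirstPseudo; }

// hard_reg_of[r - kFirstPseudo] is the hard register assigned to pseudo r.
inline int resolve(int regno, const std::vector<int> &hard_reg_of) {
  if (!is_pseudo(regno))
    return regno;
  return hard_reg_of.at(regno - kFirstPseudo);
}

// Keep every move except noop copies between two pseudos.  Noop moves between
// hard registers are deliberately kept, so no "is this register used by the
// return sequence?" scan is needed.
std::vector<Move> drop_noop_pseudo_moves(const std::vector<Move> &moves,
                                         const std::vector<int> &hard_reg_of) {
  std::vector<Move> kept;
  for (const Move &m : moves) {
    bool both_pseudos = is_pseudo(m.dest) && is_pseudo(m.src);
    bool noop = resolve(m.dest, hard_reg_of) == resolve(m.src, hard_reg_of);
    if (both_pseudos && noop)
      continue;  // constant-time decision per move
    kept.push_back(m);
  }
  return kept;
}
```

The whole pass is linear in the number of moves in this model, which is the property the commit is after when it removes the per-move scan.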

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (18 preceding siblings ...)
  2023-08-02  7:04 ` cvs-commit at gcc dot gnu.org
@ 2023-08-02  7:47 ` rguenth at gcc dot gnu.org
  2023-08-02  9:11 ` ubizjak at gmail dot com
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-08-02  7:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
The tester shows the issue is fixed now (we're faster than before the
regression).  At -O0 compile-time is still dominated by RA
(r14-2920-g07b7cd70399d22, release checking):

 integrated RA                      :   0.29 ( 32%)
 LRA non-specific                   :   0.15 ( 16%)
 TOTAL                              :   0.91

Samples: 3K of event 'cycles:u', Event count (approx.): 5038659855              
Overhead       Samples  Command  Shared Object       Symbol                     
   6.15%           233  cc1      cc1                 [.] process_alt_operands
   4.29%           163  cc1      cc1                 [.] process_bb_node_lives
   3.72%           142  cc1      cc1                 [.] record_reg_classes
   3.01%           114  cc1      cc1                 [.] mark_ref_dead
   2.87%           109  cc1      cc1                 [.] constrain_operands
   2.71%           114  cc1      cc1                 [.] df_ref_create_structure
   2.47%            94  cc1      cc1                 [.] ira_setup_alts

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (19 preceding siblings ...)
  2023-08-02  7:47 ` rguenth at gcc dot gnu.org
@ 2023-08-02  9:11 ` ubizjak at gmail dot com
  2023-08-02  9:45 ` rguenth at gcc dot gnu.org
  2023-08-09  6:48 ` cvs-commit at gcc dot gnu.org
  22 siblings, 0 replies; 24+ messages in thread
From: ubizjak at gmail dot com @ 2023-08-02  9:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

--- Comment #20 from Uroš Bizjak <ubizjak at gmail dot com> ---
Can we revert the Comment #13 kludge now?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (20 preceding siblings ...)
  2023-08-02  9:11 ` ubizjak at gmail dot com
@ 2023-08-02  9:45 ` rguenth at gcc dot gnu.org
  2023-08-09  6:48 ` cvs-commit at gcc dot gnu.org
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-08-02  9:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #20)
> Can we revert the Comment #13 kludge now?

When we revert it we get

 integrated RA                      :   0.42 ( 17%)   0.00 (  0%)   0.43 ( 17%)    19M ( 16%)
 LRA non-specific                   :   0.39 ( 16%)   0.00 (  0%)   0.39 ( 15%)  6304k (  5%)
 LRA virtuals elimination           :   0.03 (  1%)   0.00 (  0%)   0.02 (  1%)  3729k (  3%)
 LRA reload inheritance             :   0.17 (  7%)   0.01 ( 10%)   0.18 (  7%)  5109k (  4%)
 LRA create live ranges             :   0.27 ( 11%)   0.00 (  0%)   0.28 ( 11%)   984k (  1%)
 LRA hard reg assignment            :   0.72 ( 30%)   0.01 ( 10%)   0.74 ( 29%)     0  (  0%)
 TOTAL                              :   2.43          0.10          2.54          123M

so the regression is back and also code size increases significantly.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
  2023-07-07  9:52 [Bug middle-end/110587] New: 96% pr28071.c compile time regression betwen g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
                   ` (21 preceding siblings ...)
  2023-08-02  9:45 ` rguenth at gcc dot gnu.org
@ 2023-08-09  6:48 ` cvs-commit at gcc dot gnu.org
  22 siblings, 0 replies; 24+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-08-09  6:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

--- Comment #22 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:b66e613a1a8d5b8fc9d8b03f7b60260700acf833

commit r14-3095-gb66e613a1a8d5b8fc9d8b03f7b60260700acf833
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Jul 25 15:36:30 2023 +0200

    rtl-optimization/110587 - speedup find_hard_regno_for_1

    The following applies a micro-optimization to find_hard_regno_for_1,
    re-ordering the check so we can easily jump-thread by using an else.
    This reduces the time spent in this function by 15% for the testcase
    in the PR.

            PR rtl-optimization/110587
            * lra-assigns.cc (find_hard_regno_for_1): Re-order checks.
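
The exact shape of the checks in find_hard_regno_for_1 is not shown in this thread, so the following is only a generic illustration of the kind of re-ordering the commit message mentions: two separate tests of the same condition are replaced by one test with an else branch. All names here (`Counter`, `same_value`, `classify_two_ifs`, `classify_if_else`) are invented for the sketch and are unrelated to lra-assigns.cc.

```cpp
#include <cassert>

// Hypothetical illustration; not taken from lra-assigns.cc.
struct Counter {
  int predicate_calls = 0;  // how often the condition was evaluated
};

inline bool same_value(int a, int b, Counter &c) {
  ++c.predicate_calls;
  return a == b;
}

// Before (assumed layout): the condition is tested twice in sequence.
inline int classify_two_ifs(int a, int b, Counter &c) {
  int result = 0;
  if (!same_value(a, b, c))
    result = 1;
  if (same_value(a, b, c))
    result = 2;
  return result;
}

// After: one test, with the complementary case handled in the else branch.
inline int classify_if_else(int a, int b, Counter &c) {
  if (same_value(a, b, c))
    return 2;
  else
    return 1;
}
```

Both variants return the same result for a side-effect-free condition, but the if/else form evaluates it once per call and gives the compiler a single branch to work with.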

^ permalink raw reply	[flat|nested] 24+ messages in thread

