public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/101523] New: Huge number of combine attempts
@ 2021-07-20  9:35 krebbel at gcc dot gnu.org
  2021-07-20  9:41 ` [Bug rtl-optimization/101523] " krebbel at gcc dot gnu.org
                   ` (66 more replies)
  0 siblings, 67 replies; 68+ messages in thread
From: krebbel at gcc dot gnu.org @ 2021-07-20  9:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

            Bug ID: 101523
           Summary: Huge number of combine attempts
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

Compiling the attached testcase on s390x with:

cc1plus -fpreprocessed t.ii -quiet -march=z196 -g -O2 -std=c++11

produces a huge amount of combine attempts compared to x86 consuming more than
11GB of memory:

x86:      27264 combine attempts for 170631 insns
s390x: 40009540 combine attempts for 164327 insns

gcc g:6d4da4aeef5b20f7f9693ddc27d26740d0dbe36c

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
@ 2021-07-20  9:41 ` krebbel at gcc dot gnu.org
  2021-07-20  9:42 ` krebbel at gcc dot gnu.org
                   ` (65 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: krebbel at gcc dot gnu.org @ 2021-07-20  9:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #1 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
This appears to be triggered by try_combine unnecessarily setting back the
position by returning the i2 insn.

When 866 is inserted into 973 866 still needs to be kept around for other
users. So try_combine first merges the two sets into a parallel and immediately
notices that this can't be recognized. Because none of the sets is a trivial
move it is split again into two separate insns. Although the new i2 pattern
exactly matches the input i2 combine considers this to be a new insn and
triggers all the scanning log link creation and eventually returns it what
let's the combine start all over at 866.

Due to that combine tries many of the substitutions more than 400x.

Trying 866 -> 973:
  866: r22393:DI=r22391:DI+r22392:DI
  973: r22499:DF=r22498:DF*[r22393:DI]
      REG_DEAD r22498:DF
Failed to match this instruction:
(parallel [
        (set (reg:DF 22499)
            (mult:DF (reg:DF 22498)
                (mem:DF (plus:DI (reg/f:DI 22391 [ _85085 ])
                        (reg:DI 22392 [ _85086 ])) [17 *_85087+0 S8 A64])))
        (set (reg/f:DI 22393 [ _85087 ])
            (plus:DI (reg/f:DI 22391 [ _85085 ])
                (reg:DI 22392 [ _85086 ])))
    ])
Failed to match this instruction:
(parallel [
        (set (reg:DF 22499)
            (mult:DF (reg:DF 22498)
                (mem:DF (plus:DI (reg/f:DI 22391 [ _85085 ])
                        (reg:DI 22392 [ _85086 ])) [17 *_85087+0 S8 A64])))
        (set (reg/f:DI 22393 [ _85087 ])
            (plus:DI (reg/f:DI 22391 [ _85085 ])
                (reg:DI 22392 [ _85086 ])))
    ])
Successfully matched this instruction:
(set (reg/f:DI 22393 [ _85087 ])
    (plus:DI (reg/f:DI 22391 [ _85085 ])
        (reg:DI 22392 [ _85086 ])))
Successfully matched this instruction:
(set (reg:DF 22499)
    (mult:DF (reg:DF 22498)
        (mem:DF (plus:DI (reg/f:DI 22391 [ _85085 ])
                (reg:DI 22392 [ _85086 ])) [17 *_85087+0 S8 A64])))
allowing combination of insns 866 and 973
original costs 4 + 4 = 8
replacement costs 4 + 4 = 8
modifying insn i2   866: r22393:DI=r22391:DI+r22392:DI
deferring rescan insn with uid = 866.
modifying insn i3   973: r22499:DF=r22498:DF*[r22391:DI+r22392:DI]
      REG_DEAD r22498:DF
deferring rescan insn with uid = 973.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
  2021-07-20  9:41 ` [Bug rtl-optimization/101523] " krebbel at gcc dot gnu.org
@ 2021-07-20  9:42 ` krebbel at gcc dot gnu.org
  2021-07-20 21:27 ` segher at gcc dot gnu.org
                   ` (64 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: krebbel at gcc dot gnu.org @ 2021-07-20  9:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #2 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
Created attachment 51174
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51174&action=edit
Experimental Fix

With that patch the number of combine attempts goes back to normal.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
  2021-07-20  9:41 ` [Bug rtl-optimization/101523] " krebbel at gcc dot gnu.org
  2021-07-20  9:42 ` krebbel at gcc dot gnu.org
@ 2021-07-20 21:27 ` segher at gcc dot gnu.org
  2024-02-24  5:45 ` krebbel at gcc dot gnu.org
                   ` (63 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2021-07-20 21:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |segher at gcc dot gnu.org

--- Comment #3 from Segher Boessenkool <segher at gcc dot gnu.org> ---
The "newi2pat = NULL_RTX;" you do there cannot be correct.  But the general
idea is fine, sure.  I'll work on it.

(nN

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2021-07-20 21:27 ` segher at gcc dot gnu.org
@ 2024-02-24  5:45 ` krebbel at gcc dot gnu.org
  2024-02-25 10:53 ` segher at gcc dot gnu.org
                   ` (62 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: krebbel at gcc dot gnu.org @ 2024-02-24  5:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

Andreas Krebbel <krebbel at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stefansf at linux dot ibm.com

--- Comment #4 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
Hi Segher, any guidance on how to proceed with that? This recently was brought
up by distro people again because it is causing actual problems in their build
setups.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2024-02-24  5:45 ` krebbel at gcc dot gnu.org
@ 2024-02-25 10:53 ` segher at gcc dot gnu.org
  2024-02-25 15:06 ` segher at gcc dot gnu.org
                   ` (61 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-02-25 10:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #5 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Hrm.  When did this start, can you tell?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2024-02-25 10:53 ` segher at gcc dot gnu.org
@ 2024-02-25 15:06 ` segher at gcc dot gnu.org
  2024-03-02 18:43 ` sarah.kriesch at opensuse dot org
                   ` (60 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-02-25 15:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #6 from Segher Boessenkool <segher at gcc dot gnu.org> ---
There is no attached testcase, btw.  This makes investigating this kind of
tricky ;-)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2024-02-25 15:06 ` segher at gcc dot gnu.org
@ 2024-03-02 18:43 ` sarah.kriesch at opensuse dot org
  2024-03-02 21:04 ` sjames at gcc dot gnu.org
                   ` (59 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: sarah.kriesch at opensuse dot org @ 2024-03-02 18:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

Sarah Julia Kriesch <sarah.kriesch at opensuse dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sarah.kriesch at opensuse dot org

--- Comment #7 from Sarah Julia Kriesch <sarah.kriesch at opensuse dot org> ---
That started more than some years ago.
I have initiated the debugging with this openSUSE bug report:
https://bugzilla.opensuse.org/show_bug.cgi?id=1188441

IBM Bugzilla:
https://bugzilla.linux.ibm.com/show_bug.cgi?id=193674

The problem with the memory has been already available before my time at IBM.
That is reproducible on most Linux distributions for IBM Z & LinuxONE.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2024-03-02 18:43 ` sarah.kriesch at opensuse dot org
@ 2024-03-02 21:04 ` sjames at gcc dot gnu.org
  2024-03-03 19:32 ` segher at gcc dot gnu.org
                   ` (58 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: sjames at gcc dot gnu.org @ 2024-03-02 21:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

Sam James <sjames at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sjames at gcc dot gnu.org

--- Comment #8 from Sam James <sjames at gcc dot gnu.org> ---
I think Andreas meant to attach a testcase but hadn't?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2024-03-02 21:04 ` sjames at gcc dot gnu.org
@ 2024-03-03 19:32 ` segher at gcc dot gnu.org
  2024-03-04  8:18 ` krebbel at gcc dot gnu.org
                   ` (57 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-03-03 19:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #9 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Yeah.

Without a testcase we do not know what is going on.  Likely it is a testcase
with some very big basic block, which naturally gives very many combination
opportunities: the problem by nature is at least quadratic.  There are various
ways to limit the work done for this, all amounting to "just give up if the
problem is too big", just like we do in many other places.

It also is interesting to see when this started happening.  One of the external
PRs indicated this has happened for some years already -- so notably this is
not a regression -- but what change caused this then?  It can even be the 2-2
thing, if it started far enough back.  Or, the real reason why we need to know
when it started: possibly a bug was introduced.

In all cases, we need the testcase.

(The reason this does not happen on x86 is that so many things on x86 are
stored
in memory, and on less register-poor archs like 390 not.  Combine never does
dependencies via memory).

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2024-03-03 19:32 ` segher at gcc dot gnu.org
@ 2024-03-04  8:18 ` krebbel at gcc dot gnu.org
  2024-03-04 16:31 ` segher at gcc dot gnu.org
                   ` (56 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: krebbel at gcc dot gnu.org @ 2024-03-04  8:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #10 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
Created attachment 57599
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57599&action=edit
Testcase - somewhat reduced from libecpint

Verified with rev 146f16c97f6

cc1plus -O2 t.cc

try_combine invocations:
x86:
3
27262
27603

s390x:
8
40439657
40440339

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2024-03-04  8:18 ` krebbel at gcc dot gnu.org
@ 2024-03-04 16:31 ` segher at gcc dot gnu.org
  2024-03-04 16:49 ` sarah.kriesch at opensuse dot org
                   ` (55 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-03-04 16:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #11 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Okay, so it is a function with a huge BB, so this is not a regression at all,
there will have been incredibly many combination attempts since the day combine
has existed.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2024-03-04 16:31 ` segher at gcc dot gnu.org
@ 2024-03-04 16:49 ` sarah.kriesch at opensuse dot org
  2024-03-04 21:26 ` segher at gcc dot gnu.org
                   ` (54 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: sarah.kriesch at opensuse dot org @ 2024-03-04 16:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #12 from Sarah Julia Kriesch <sarah.kriesch at opensuse dot org> ---
Raise your hand if you need anything new from my side.
We have got enough use cases in our build system and upstream open source
projects gave warnings to remove the s390x support because of long building
time and the required resources.

I expect also, that this bug is a bigger case.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2024-03-04 16:49 ` sarah.kriesch at opensuse dot org
@ 2024-03-04 21:26 ` segher at gcc dot gnu.org
  2024-03-05  7:31 ` krebbel at gcc dot gnu.org
                   ` (53 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-03-04 21:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #13 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Sarah Julia Kriesch from comment #12)
> I expect also, that this bug is a bigger case.

A bigger case of what?  What do you mean?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2024-03-04 21:26 ` segher at gcc dot gnu.org
@ 2024-03-05  7:31 ` krebbel at gcc dot gnu.org
  2024-03-05 10:36 ` sarah.kriesch at opensuse dot org
                   ` (52 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: krebbel at gcc dot gnu.org @ 2024-03-05  7:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #14 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
If my analysis from comment #1 is correct, combine does superfluous steps here.
Getting rid of this should not cause any harm, but should be beneficial for
other targets as well. I agree that the patch I've proposed is kind of a hack.
Do you think this could be turned into a proper fix?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2024-03-05  7:31 ` krebbel at gcc dot gnu.org
@ 2024-03-05 10:36 ` sarah.kriesch at opensuse dot org
  2024-03-06 22:49 ` segher at gcc dot gnu.org
                   ` (51 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: sarah.kriesch at opensuse dot org @ 2024-03-05 10:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #15 from Sarah Julia Kriesch <sarah.kriesch at opensuse dot org> ---
(In reply to Segher Boessenkool from comment #13)
> (In reply to Sarah Julia Kriesch from comment #12)
> A bigger case of what?  What do you mean?
Not only one software package is affected by this bug. "Most" software builds
are affected. As Andreas mentioned correctly, the fix is also beneficial for
other projects/target software.

There are so many packages with customized memory settings for s390x at
openSUSE, and the Maintainers can only shake their head about this behaviour.
But let's progress step by step.

I have luck, that also IBM customers are raising questions regarding that now.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2024-03-05 10:36 ` sarah.kriesch at opensuse dot org
@ 2024-03-06 22:49 ` segher at gcc dot gnu.org
  2024-03-06 22:53 ` segher at gcc dot gnu.org
                   ` (50 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-03-06 22:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #16 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Andreas Krebbel from comment #14)
> If my analysis from comment #1 is correct, combine does superfluous steps
> here. Getting rid of this should not cause any harm, but should be
> beneficial for other targets as well. I agree that the patch I've proposed
> is kind of a hack. Do you think this could be turned into a proper fix?

When some insns have changed (or might have changed, combine does not always
know
the details), combinations of the insn with later insns are tried again. 
Sometimes
this finds new combination opportunities.

Not retrying combinations after one of the insns has changed would be a
regression.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2024-03-06 22:49 ` segher at gcc dot gnu.org
@ 2024-03-06 22:53 ` segher at gcc dot gnu.org
  2024-03-06 23:15 ` pinskia at gcc dot gnu.org
                   ` (49 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-03-06 22:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #17 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Why does this happen so extremely often for s390x customers?  It should from
first principles happen way more often for e.g. powerpc, but we never see such
big problems, let alone "all of the time"!

So what is really happening?  And, when did this start, anyway, because
apparently
at some point in time all was fine?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2024-03-06 22:53 ` segher at gcc dot gnu.org
@ 2024-03-06 23:15 ` pinskia at gcc dot gnu.org
  2024-03-07  9:26 ` krebbel at gcc dot gnu.org
                   ` (48 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-03-06 23:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #18 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Hmm, looking at what is done in combine, I wonder why forwprop didn't do the
address add into the memory. That would definitely decrease the # of combines
being done.

Maybe it is because it is used more than once. But still seems like another
pass should have done the a=b+c; d=e*[a] into a=b+c; d=e*[b+c] before hand.

Maybe there is some address cost going wrong here that forwprop is causing
issues. And it just happens combine does not take that into account due to rtl
cost not taking address cost into account.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2024-03-06 23:15 ` pinskia at gcc dot gnu.org
@ 2024-03-07  9:26 ` krebbel at gcc dot gnu.org
  2024-03-07  9:28 ` krebbel at gcc dot gnu.org
                   ` (47 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: krebbel at gcc dot gnu.org @ 2024-03-07  9:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #19 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
(In reply to Sarah Julia Kriesch from comment #15)
> (In reply to Segher Boessenkool from comment #13)
> > (In reply to Sarah Julia Kriesch from comment #12)
> > A bigger case of what?  What do you mean?
> Not only one software package is affected by this bug. "Most" software
> builds are affected. As Andreas mentioned correctly, the fix is also
> beneficial for other projects/target software.

I don't think we have any evidence yet that this is the problem which also hits
us with other packages builds. If you have other cases please open separate BZs
for that and we will try to figure out whether it is actually a DUP of this
one.

With "targets" I meant other GCC build targets. This pattern doesn't look
s390x-specific to me, although I haven't tried to reproduce it somewhere else.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (18 preceding siblings ...)
  2024-03-07  9:26 ` krebbel at gcc dot gnu.org
@ 2024-03-07  9:28 ` krebbel at gcc dot gnu.org
  2024-03-07  9:34 ` krebbel at gcc dot gnu.org
                   ` (46 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: krebbel at gcc dot gnu.org @ 2024-03-07  9:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #20 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #17)
...
> So what is really happening?  And, when did this start, anyway, because
> apparently at some point in time all was fine?

Due to the C++ constructs used the testcase doesn't compile with much older
GCCs. However, I can confirm that the problem can already be reproduced with
GCC 11.1.0.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (19 preceding siblings ...)
  2024-03-07  9:28 ` krebbel at gcc dot gnu.org
@ 2024-03-07  9:34 ` krebbel at gcc dot gnu.org
  2024-03-07 15:51 ` krebbel at gcc dot gnu.org
                   ` (45 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: krebbel at gcc dot gnu.org @ 2024-03-07  9:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #21 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #16)
...
> When some insns have changed (or might have changed, combine does not always
> know
> the details), combinations of the insn with later insns are tried again. 
> Sometimes
> this finds new combination opportunities.
> 
> Not retrying combinations after one of the insns has changed would be a
> regression.

Wouldn't it in this particular case be possible to recognize already in
try_combine that separating the move out of the parallel cannot lead to
additional optimization opportunities? To me it looks like we are just
recreating the situation we had before merging the INSNs into a parallel. Is
there a situation where this could lead to any improvement in the end?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (20 preceding siblings ...)
  2024-03-07  9:34 ` krebbel at gcc dot gnu.org
@ 2024-03-07 15:51 ` krebbel at gcc dot gnu.org
  2024-03-07 15:52 ` krebbel at gcc dot gnu.org
                   ` (44 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: krebbel at gcc dot gnu.org @ 2024-03-07 15:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #22 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
I did a git bisect which ended up pointing at this commit, somewhere between
GCC 8 and 9:

commit c4c5ad1d6d1e1e1fe7a1c2b3bb097cc269dc7306 (bad)
Author: Segher Boessenkool <segher@kernel.crashing.org>
Date:   Mon Jul 30 15:18:17 2018 +0200

    combine: Allow combining two insns to two insns

    This patch allows combine to combine two insns into two.  This helps
    in many cases, by reducing instruction path length, and also allowing
    further combinations to happen.  PR85160 is a typical example of code
    that it can improve.

    This patch does not allow such combinations if either of the original
    instructions was a simple move instruction.  In those cases combining
    the two instructions increases register pressure without improving the
    code.  With this move test register pressure does no longer increase
    noticably as far as I can tell.

    (At first I also didn't allow either of the resulting insns to be a
    move instruction.  But that is actually a very good thing to have, as
    should have been obvious).

With this command line:
cc1plus -O2 -march=z196 -fpreprocessed Q111-8.ii -quiet

before:   20s compile-time and    21846 total combine attempts
after: > 5min compile-time and 43175686 total combine attempts

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (21 preceding siblings ...)
  2024-03-07 15:51 ` krebbel at gcc dot gnu.org
@ 2024-03-07 15:52 ` krebbel at gcc dot gnu.org
  2024-03-07 16:11 ` segher at gcc dot gnu.org
                   ` (43 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: krebbel at gcc dot gnu.org @ 2024-03-07 15:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #23 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
Created attachment 57646
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57646&action=edit
Testcase for comment #22

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (22 preceding siblings ...)
  2024-03-07 15:52 ` krebbel at gcc dot gnu.org
@ 2024-03-07 16:11 ` segher at gcc dot gnu.org
  2024-03-07 16:19 ` segher at gcc dot gnu.org
                   ` (42 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-03-07 16:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #24 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Andreas Krebbel from comment #21)
> Wouldn't it in this particular case be possible to recognize already in
> try_combine that separating the move out of the parallel cannot lead to
> additional optimization opportunities? To me it looks like we are just
> recreating the situation we had before merging the INSNs into a parallel. Is
> there a situation where this could lead to any improvement in the end?

It might be possible.  It's not trivial at all though, esp. if you consider
other patterns, other targets, everything.

Anything that grossly reduces what we try will not fly.

This testcase is very degenerate, if we can recognise something about that
and make combine handle that better, that could be done.  Or I'll do my
proposed "do not try more than 40 billion things" patch.

As it is now, combine only ever reconsiders anything if it *did* make changes.
So, if you see it reconsidering things a lot, you also see it making a lot of
changes.  And all those changes make for materially better generated code (that
is tested by combine always, before making changes).

Changing things so combine makes fewer changes directly means you want it to
optimise less well.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (23 preceding siblings ...)
  2024-03-07 16:11 ` segher at gcc dot gnu.org
@ 2024-03-07 16:19 ` segher at gcc dot gnu.org
  2024-03-07 16:53 ` pinskia at gcc dot gnu.org
                   ` (41 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-03-07 16:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #25 from Segher Boessenkool <segher at gcc dot gnu.org> ---
So this testcase compiles on powerpc64-linux (-O2) in about 34s.  Is s390x
way worse, or is this in lie what you are seeing?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (24 preceding siblings ...)
  2024-03-07 16:19 ` segher at gcc dot gnu.org
@ 2024-03-07 16:53 ` pinskia at gcc dot gnu.org
  2024-03-07 16:57 ` pinskia at gcc dot gnu.org
                   ` (40 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-03-07 16:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #26 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So looking into the s390 backend, I notice that s390_address_cost says the
addressing mode `base+index` is slightly more expensive than just `base`:
from s390_address_cost :
  return ad.indx? COSTS_N_INSNS (1) + 1 : COSTS_N_INSNS (1);

BUT then s390_rtx_costs when looking at MEM does not take into account the
addressing for the cost there (it does take into account on the LHS though):
```
    case MEM:
      *total = 0;
      return true;
```

This mismatch does cause some issues. Basically fwprop uses address_cost to
figure out when it is replacing into MEM but combine just uses
insn_cost/rtx_cost .  So while fwprop rejects it as being worse and then
combine comes along and does it.

I suspect if we change the s390 backend just slightly to set the cost when
there is an index to the address to 1 for the MEM, combine won't be acting up
here.
Basically putting in sync the 2 cost methods.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (25 preceding siblings ...)
  2024-03-07 16:53 ` pinskia at gcc dot gnu.org
@ 2024-03-07 16:57 ` pinskia at gcc dot gnu.org
  2024-03-07 18:15 ` segher at gcc dot gnu.org
                   ` (39 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-03-07 16:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #27 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #25)
> So this testcase compiles on powerpc64-linux (-O2) in about 34s.  Is s390x
> way worse, or is this in lie what you are seeing?

I should note that in the powerpc backend, address_cost is always 0 so I
suspect it won't run into this issue where fwprop rejects the transformation
but combine accepts it.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (26 preceding siblings ...)
  2024-03-07 16:57 ` pinskia at gcc dot gnu.org
@ 2024-03-07 18:15 ` segher at gcc dot gnu.org
  2024-03-07 19:42 ` segher at gcc dot gnu.org
                   ` (38 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-03-07 18:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #28 from Segher Boessenkool <segher at gcc dot gnu.org> ---
For Q111: on rs6000:
;; Combiner totals: 53059 attempts, 52289 substitutions (7135 requiring new
space),
;; 229 successes.

I don't have C++ cross-compilers built (those are not trivial to do, hrm). 
I'll
try to build a s390x one.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (27 preceding siblings ...)
  2024-03-07 18:15 ` segher at gcc dot gnu.org
@ 2024-03-07 19:42 ` segher at gcc dot gnu.org
  2024-03-07 19:45 ` jakub at gcc dot gnu.org
                   ` (37 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-03-07 19:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #29 from Segher Boessenkool <segher at gcc dot gnu.org> ---
I did manage to build one, but it does not know _Float64x and stuff.

Do you have a basic C-only testcase, maybe?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (28 preceding siblings ...)
  2024-03-07 19:42 ` segher at gcc dot gnu.org
@ 2024-03-07 19:45 ` jakub at gcc dot gnu.org
  2024-03-07 20:10 ` segher at gcc dot gnu.org
                   ` (36 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-03-07 19:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #30 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #29)
> I did manage to build one, but it does not know _Float64x and stuff.

Use -mlong-double-128 ?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (29 preceding siblings ...)
  2024-03-07 19:45 ` jakub at gcc dot gnu.org
@ 2024-03-07 20:10 ` segher at gcc dot gnu.org
  2024-03-07 20:17 ` krebbel at gcc dot gnu.org
                   ` (35 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-03-07 20:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #31 from Segher Boessenkool <segher at gcc dot gnu.org> ---
I need a configure flag, hrm.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (30 preceding siblings ...)
  2024-03-07 20:10 ` segher at gcc dot gnu.org
@ 2024-03-07 20:17 ` krebbel at gcc dot gnu.org
  2024-03-07 20:53 ` krebbel at gcc dot gnu.org
                   ` (34 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: krebbel at gcc dot gnu.org @ 2024-03-07 20:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #32 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #25)
> So this testcase compiles on powerpc64-linux (-O2) in about 34s.  Is s390x
> way worse, or is this in lie what you are seeing?

Way worse. See #c22 : 20s before your commit and 5min with it.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (31 preceding siblings ...)
  2024-03-07 20:17 ` krebbel at gcc dot gnu.org
@ 2024-03-07 20:53 ` krebbel at gcc dot gnu.org
  2024-03-20 11:53 ` [Bug target/101523] " rguenth at gcc dot gnu.org
                   ` (33 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: krebbel at gcc dot gnu.org @ 2024-03-07 20:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #33 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #26)
...
> I suspect if we change the s390 backend just slightly to set the cost when
> there is an index to the address to 1 for the MEM, combine won't be acting
> up here.
> Basically putting in sync the 2 cost methods.

I've tried that but this didn't change anything. As you have expected the
problem goes away when letting s390_address_cost always return 0.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug target/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (32 preceding siblings ...)
  2024-03-07 20:53 ` krebbel at gcc dot gnu.org
@ 2024-03-20 11:53 ` rguenth at gcc dot gnu.org
  2024-03-21  7:56 ` segher at gcc dot gnu.org
                   ` (32 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-20 11:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #34 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andreas Krebbel from comment #1)
> This appears to be triggered by try_combine unnecessarily setting back the
> position by returning the i2 insn.
> 
> When 866 is inserted into 973 866 still needs to be kept around for other
> users. So try_combine first merges the two sets into a parallel and
> immediately notices that this can't be recognized. Because none of the sets
> is a trivial move it is split again into two separate insns. Although the
> new i2 pattern exactly matches the input i2 combine considers this to be a
> new insn and triggers all the scanning log link creation and eventually
> returns it what let's the combine start all over at 866.
> 
> Due to that combine tries many of the substitutions more than 400x.
> 
> Trying 866 -> 973:
>   866: r22393:DI=r22391:DI+r22392:DI
>   973: r22499:DF=r22498:DF*[r22393:DI]
>       REG_DEAD r22498:DF
> Failed to match this instruction:
> (parallel [
>         (set (reg:DF 22499)
>             (mult:DF (reg:DF 22498)
>                 (mem:DF (plus:DI (reg/f:DI 22391 [ _85085 ])
>                         (reg:DI 22392 [ _85086 ])) [17 *_85087+0 S8 A64])))
>         (set (reg/f:DI 22393 [ _85087 ])
>             (plus:DI (reg/f:DI 22391 [ _85085 ])
>                 (reg:DI 22392 [ _85086 ])))
>     ])
> Failed to match this instruction:
> (parallel [
>         (set (reg:DF 22499)
>             (mult:DF (reg:DF 22498)
>                 (mem:DF (plus:DI (reg/f:DI 22391 [ _85085 ])
>                         (reg:DI 22392 [ _85086 ])) [17 *_85087+0 S8 A64])))
>         (set (reg/f:DI 22393 [ _85087 ])
>             (plus:DI (reg/f:DI 22391 [ _85085 ])
>                 (reg:DI 22392 [ _85086 ])))
>     ])
> Successfully matched this instruction:
> (set (reg/f:DI 22393 [ _85087 ])
>     (plus:DI (reg/f:DI 22391 [ _85085 ])
>         (reg:DI 22392 [ _85086 ])))

So this is "unchanged", do we re-combine into it?

> Successfully matched this instruction:
> (set (reg:DF 22499)
>     (mult:DF (reg:DF 22498)
>         (mem:DF (plus:DI (reg/f:DI 22391 [ _85085 ])
>                 (reg:DI 22392 [ _85086 ])) [17 *_85087+0 S8 A64])))

This one is changed.

> allowing combination of insns 866 and 973
> original costs 4 + 4 = 8
> replacement costs 4 + 4 = 8
> modifying insn i2   866: r22393:DI=r22391:DI+r22392:DI
> deferring rescan insn with uid = 866.
> modifying insn i3   973: r22499:DF=r22498:DF*[r22391:DI+r22392:DI]
>       REG_DEAD r22498:DF
> deferring rescan insn with uid = 973.

The change itself looks reasonable given costs, though maybe 2 -> 2
combinations should not trigger when the cost remains the same?  In
this case it definitely doesn't look profitable, does it?  Well,
in theory it might hide latency and the 2nd instruction can issue
at the same time as the first.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug target/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (33 preceding siblings ...)
  2024-03-20 11:53 ` [Bug target/101523] " rguenth at gcc dot gnu.org
@ 2024-03-21  7:56 ` segher at gcc dot gnu.org
  2024-03-21  8:15 ` rguenth at gcc dot gnu.org
                   ` (31 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-03-21  7:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #35 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #34)
> The change itself looks reasonable given costs, though maybe 2 -> 2
> combinations should not trigger when the cost remains the same?  In
> this case it definitely doesn't look profitable, does it?  Well,
> in theory it might hide latency and the 2nd instruction can issue
> at the same time as the first.

No, it definitely should be done.  As I showed back then, it costs less than 1%
extra compile time on *any platform* on average, and it reduced code size by
1%-2%
everywhere.

It also cannot get stuck, any combination is attempted only once, any
combination
that succeeds eats up a loglink.  It is finite, (almost) linear in fact.

Any backend is free to say certain insns shouldn't combine at all.  This will
lead to reduced performance though.

- ~ - ~ -

Something that is the real complaint here: it seems we do not GC often enough,
only after processing a BB (or EBB)?  That adds up for artificial code like
this, sure.

And the "param to give an upper limit to how many combination attempts are done
(per BB)" offer is on the table still, too.  I don't think it would ever be
useful (if you want your code to compile faster just write better code!), but
:-)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug target/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (34 preceding siblings ...)
  2024-03-21  7:56 ` segher at gcc dot gnu.org
@ 2024-03-21  8:15 ` rguenth at gcc dot gnu.org
  2024-03-21  8:27 ` rguenth at gcc dot gnu.org
                   ` (30 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-21  8:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #36 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #35)
> (In reply to Richard Biener from comment #34)
> > The change itself looks reasonable given costs, though maybe 2 -> 2
> > combinations should not trigger when the cost remains the same?  In
> > this case it definitely doesn't look profitable, does it?  Well,
> > in theory it might hide latency and the 2nd instruction can issue
> > at the same time as the first.
> 
> No, it definitely should be done.  As I showed back then, it costs less than
> 1%
> extra compile time on *any platform* on average, and it reduced code size by
> 1%-2%
> everywhere.
> 
> It also cannot get stuck, any combination is attempted only once, any
> combination
> that succeeds eats up a loglink.  It is finite, (almost) linear in fact.

So the slowness for the testcase comes from failed attempts.

> Any backend is free to say certain insns shouldn't combine at all.  This will
> lead to reduced performance though.
> 
> - ~ - ~ -
> 
> Something that is the real complaint here: it seems we do not GC often
> enough,
> only after processing a BB (or EBB)?  That adds up for artificial code like
> this, sure.

For memory use if you know combine doesn't have "dangling" links to GC memory
you can call ggc_collect at any point you like.  Or, when you create
throw-away RTL, ggc_free it explicitly (yeah, that only frees the "toplevel").

> And the "param to give an upper limit to how many combination attempts are
> done
> (per BB)" offer is on the table still, too.  I don't think it would ever be
> useful (if you want your code to compile faster just write better code!),
> but :-)

Well, while you say the number of successful combinations is linear the
number of combine attempts appearantly isn't (well, of course, if we ever
combine from multi-use defs).  So yeah, a param might be useful here but
instead of some constant limit on the number of combine attempts per
function or per BB it might make sense to instead limit it on the number
of DEFs?  I understand we work on the uses so it'll be a bit hard to
apply this in a way to, say, combine a DEF only with the N nearest uses
(but not any ones farther out), and maintaining such a count per DEF would
cost.  So more practical might be to limit the number of attempts to combine
into an (unchanged?) insns?

Basically I would hope with a hard limit in place we'd not stop after the
first half of a BB leaving trivial combinations in the second half
unhandled but instead somehow throttle the "expensive" cases?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug target/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (35 preceding siblings ...)
  2024-03-21  8:15 ` rguenth at gcc dot gnu.org
@ 2024-03-21  8:27 ` rguenth at gcc dot gnu.org
  2024-03-21 12:57 ` segher at gcc dot gnu.org
                   ` (29 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-21  8:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #37 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 57753
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57753&action=edit
quick attempt at a limit

So like this?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug target/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (36 preceding siblings ...)
  2024-03-21  8:27 ` rguenth at gcc dot gnu.org
@ 2024-03-21 12:57 ` segher at gcc dot gnu.org
  2024-03-21 12:58 ` segher at gcc dot gnu.org
                   ` (28 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-03-21 12:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #38 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #36)
> > No, it definitely should be done.  As I showed back then, it costs less than
> > 1%
> > extra compile time on *any platform* on average, and it reduced code size by
> > 1%-2%
> > everywhere.
> > 
> > It also cannot get stuck, any combination is attempted only once, any
> > combination
> > that succeeds eats up a loglink.  It is finite, (almost) linear in fact.
> 
> So the slowness for the testcase comes from failed attempts.

Of course.  Most attempts do not succeed, there aren't instructions for most
"random" combinations of instructions feeding each other.  But combine blindly
tries everything, that is its strength!  It ends up finding many more thing
than
any recognition automaton does.

> > Something that is the real complaint here: it seems we do not GC often
> > enough,
> > only after processing a BB (or EBB)?  That adds up for artificial code like
> > this, sure.
> 
> For memory use if you know combine doesn't have "dangling" links to GC memory
> you can call ggc_collect at any point you like.  Or, when you create
> throw-away RTL, ggc_free it explicitly (yeah, that only frees the
> "toplevel").

A lot of it *is* toplevel (well, completely disconnected RTX), just
temporaries,
things we can just throw away.  At every try_combine call even, kinda.  There
might be some more RTX that needs some protection.  We'll see.

> > And the "param to give an upper limit to how many combination attempts are
> > done
> > (per BB)" offer is on the table still, too.  I don't think it would ever be
> > useful (if you want your code to compile faster just write better code!),
> > but :-)
> 
> Well, while you say the number of successful combinations is linear the
> number of combine attempts appearantly isn't

It is, and that is pretty easy to show even.  With retries it stays linear, but
with a hefty constant.  And on some targets (with more than three inputs for
some instructions, say) it can be a big constant anyway.

But linear is linear, and stays linear, for way too big code it is just as
acceptable as for "normal" code.  Just slow.  If you don't want the compiler to
take a long time compiling your way too big code, use -O0, or preferably do not
write insane code in the first place :-)


> (well, of course, if we ever
> combine from multi-use defs).  So yeah, a param might be useful here but
> instead of some constant limit on the number of combine attempts per
> function or per BB it might make sense to instead limit it on the number
> of DEFs?

We still use loglinks in combine.  These are nice to prove that things stay
linear, even (every time combine succeeds a loglink is used up).

The number of loglinks and insns (insns combine can do anything with) differs
by a small constant factor.

> I understand we work on the uses

We work on the loglinks, a def-use pair if you want.

> so it'll be a bit hard to
> apply this in a way to, say, combine a DEF only with the N nearest uses
> (but not any ones farther out),

There is only a loglink from a def to the very next use.  If that combines, the
insn that does the def is retained as well, if there is any other use.  But
there never is a combination of a def with a later use tried, if the earliest
use does not combine.

> and maintaining such a count per DEF would
> cost.  So more practical might be to limit the number of attempts to combine
> into an (unchanged?) insns?
> 
> Basically I would hope with a hard limit in place we'd not stop after the
> first half of a BB leaving trivial combinations in the second half
> unhandled but instead somehow throttle the "expensive" cases?

Ideally we'll not do *any* artificial limitations.  But GCC just throws its hat
in the ring in other cases as well, say, too big RA problems.  You do get not
as high quality code as wanted, but at least you get something compiled in
an acceptable timeframe :-)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug target/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (37 preceding siblings ...)
  2024-03-21 12:57 ` segher at gcc dot gnu.org
@ 2024-03-21 12:58 ` segher at gcc dot gnu.org
  2024-03-21 13:15 ` rguenth at gcc dot gnu.org
                   ` (27 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-03-21 12:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #39 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #37)
> Created attachment 57753 [details]
> quick attempt at a limit
> 
> So like this?

Hrm.  It should be possible to not have the same test 28 times.  Just at one
spot!

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug target/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (38 preceding siblings ...)
  2024-03-21 12:58 ` segher at gcc dot gnu.org
@ 2024-03-21 13:15 ` rguenth at gcc dot gnu.org
  2024-03-21 13:22 ` rguenth at gcc dot gnu.org
                   ` (26 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-21 13:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #40 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #39)
> (In reply to Richard Biener from comment #37)
> > Created attachment 57753 [details]
> > quick attempt at a limit
> > 
> > So like this?
> 
> Hrm.  It should be possible to not have the same test 28 times.  Just at one
> spot!

Not sure.  We loop over log-links multiple times.  You could argue we
should only call try_combine once ;)  But yeah, it's not very pretty,
agreed.  We could pass the counter to try_combine and have a special
return value, (void *)-1 for the failed-and-exhausted case, handling
that in retry: like

retry:
  if (next == NULL || next == (void *)-1)
    attempts = 0;

then only the last case where we mangle 'set' and have to restore it on
failure would need special casing (and of course try_combine itself).
But in a way that's also ugly so I guess I stay with my proposal.

At least until somebody actually tried if and how much it helps (and for which
values of the --param).

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug target/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (39 preceding siblings ...)
  2024-03-21 13:15 ` rguenth at gcc dot gnu.org
@ 2024-03-21 13:22 ` rguenth at gcc dot gnu.org
  2024-03-22  8:07 ` rguenth at gcc dot gnu.org
                   ` (25 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-21 13:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #41 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #38)
> (In reply to Richard Biener from comment #36)
[...]
> But linear is linear, and stays linear, for way too big code it is just as
> acceptable as for "normal" code.  Just slow.  If you don't want the compiler
> to
> take a long time compiling your way too big code, use -O0, or preferably do
> not
> write insane code in the first place :-)

;)  We promise to try to behave reasonably with insane code, but
technically we tell people to use at most -O1 for that.  That will
at least avoid trying three and four insn combinations.

[...]

> Ideally we'll not do *any* artificial limitations.

I agree.  And we should try hard to fix actual algorithmic problems if
they exist before resorting to limits.

>  But GCC just throws its hat
> in the ring in other cases as well, say, too big RA problems.  You do get not
> as high quality code as wanted, but at least you get something compiled in
> an acceptable timeframe :-)

Yep.  See above for my comment about -O1.  I think it's fine to take
time (and memory) to optimize high quality code at -O2.  And if you
throw insane code to GCC then also an insane amount of time and memory ;)

So I do wonder whether with -O1 the issue is gone anyway already?

If not then for the sake of -O1 and insane we want such limit.  It can
be more crude aka just count all attempts and stop alltogether, or like
PRE, just not PRE when the number of pseudos/blocks crosses a magic barrier.
I just thought combine is a bit a too core part of our instruction selection
so disabling it completely (after some point) would be too bad even for
insane code ...

Andreas - can you try --param max-combine-insns=2 please?  That is I think
what -O1 uses and only then does two-insn combinations.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug target/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (40 preceding siblings ...)
  2024-03-21 13:22 ` rguenth at gcc dot gnu.org
@ 2024-03-22  8:07 ` rguenth at gcc dot gnu.org
  2024-03-22  9:41 ` rguenth at gcc dot gnu.org
                   ` (24 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-22  8:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |14.0
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-03-22
             Status|UNCONFIRMED                 |NEW

--- Comment #42 from Richard Biener <rguenth at gcc dot gnu.org> ---
I checked with a cross btw, and with -O1 we use 10s and 900MB memory for the
testcase for comment #22.  With -O2 it's 160s and 11GB as reported.

It might of course that with -O1 we simply do not confront combine with
the opportunity to blow up.

So IMHO this is a non-issue and the reporter should use -O1 for such a TU.

Still confirmed as a s390x specific problem and confirmed on trunk.

Statistics with the -O2 combines:

305 combine "successes" 9425 
305 combine "three-insn combine" 1
305 combine "four-insn combine" 1
305 combine "merges" 40418007 
305 combine "extras" 9110287
305 combine "two-insn combine" 9423
305 combine "attempts" 40440004

With -O1:

305 combine "successes" 1861
305 combine "three-insn combine" 1732
305 combine "merges" 191051
305 combine "extras" 9274
305 combine "two-insn combine" 129
305 combine "attempts" 192434

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug target/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (41 preceding siblings ...)
  2024-03-22  8:07 ` rguenth at gcc dot gnu.org
@ 2024-03-22  9:41 ` rguenth at gcc dot gnu.org
  2024-03-22  9:42 ` sjames at gcc dot gnu.org
                   ` (23 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-22  9:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #43 from Richard Biener <rguenth at gcc dot gnu.org> ---
The interesting bit is that there are only 12026 overall loglinks, the
number of combine attempts is way higher than that would suggest also
given the few successful combinations.  So something is odd here.

There's one interesting machinery in try_combine through added_links_insn
and added_notes_insn we can end up re-processing a large swat of insns
(even though we should need to only re-process the link target insns, not
all insns inbetween).  There might be the opportunity, for the "reprocessing",
to use a worklist instead of resetting the insn walk.

I added statistics to note the "distance" we travel there by taking
DF_INSN_LUID (ret) - DF_INSN_LUID (added_{notes,links}_insn) as that.
This shows 48 such jumps with seemingly large distances:

305 combine "restart earlier == 143" 3
305 combine "restart earlier == 254" 1
305 combine "restart earlier == 684" 1
305 combine "restart earlier == 726" 1
305 combine "restart earlier == 777" 1
305 combine "restart earlier == 1158" 1
305 combine "restart earlier == 1421" 1
305 combine "restart earlier == 2073" 1
305 combine "restart earlier == 2130" 1
...
305 combine "restart earlier == 49717" 1
305 combine "restart earlier == 49763" 1
305 combine "restart earlier == 49866" 1
305 combine "restart earlier == 50010" 1
305 combine "restart earlier == 50286" 1
305 combine "restart earlier == 50754" 1
305 combine "restart earlier == 50809" 1

killing this feature doesn't improve things to -O1 levels though so it's
more likely the fact that we also do

  rtx_insn *ret = newi2pat ? i2 : i3;

thus re-start at i2 when we altered i2.  We re-start through this 6910
times.  Always re-starting at i3 helps a lot and gets us -O1 performance
back.  From comment#1 it suggests that newi2pat might in fact be equal
to the old, so I tried to count how many times this happens with a stupid

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d836..acd176d3185 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -4435,6 +4435,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1,
rtx_insn *i0,
          propagate_for_debug (i2, last_combined_insn, i2dest, i2src,
                               this_basic_block);
        INSN_CODE (i2) = i2_code_number;
+       if (rtx_equal_p (PATTERN (i2), newi2pat))
+         statistics_counter_event (cfun, "equal newi2pat", 1);
        PATTERN (i2) = newi2pat;
       }
     else

and indeed this shows this to be the case 9211 times.

The following improves compile-time to 20s and 460MB memory use.  In general
the algorithmic deficiency with the "restarting" remains and a proper fix
is to use a worklist for those that you'd drain before advancing in the
instruction chain (so not have a single 'ret' insn to reprocess but add
to the worklist).

I'm not sure whether identifying a not changed "new" i2 can be done better.
I'll leave it all to Segher of course - he'll be fastest to produce something
he likes.

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d836..0c61dcedaa1 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -4276,6 +4276,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1,
rtx
_insn *i0,
        }
     }

+  bool newi2pat_not_new = false;
   {
     rtx i3notes, i2notes, i1notes = 0, i0notes = 0;
     struct insn_link *i3links, *i2links, *i1links = 0, *i0links = 0;
@@ -4435,6 +4436,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1,
rtx_insn *i0,
          propagate_for_debug (i2, last_combined_insn, i2dest, i2src,
                               this_basic_block);
        INSN_CODE (i2) = i2_code_number;
+       if (rtx_equal_p (PATTERN (i2), newi2pat))
+         newi2pat_not_new = true;
        PATTERN (i2) = newi2pat;
       }
     else
@@ -4752,7 +4755,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1,
rtx_insn *i0,
   combine_successes++;
   undo_commit ();

-  rtx_insn *ret = newi2pat ? i2 : i3;
+  rtx_insn *ret = newi2pat && !newi2pat_not_new ? i2 : i3;
   if (added_links_insn && DF_INSN_LUID (added_links_insn) < DF_INSN_LUID
(ret))
     ret = added_links_insn;
   if (added_notes_insn && DF_INSN_LUID (added_notes_insn) < DF_INSN_LUID
(ret))

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug target/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (42 preceding siblings ...)
  2024-03-22  9:41 ` rguenth at gcc dot gnu.org
@ 2024-03-22  9:42 ` sjames at gcc dot gnu.org
  2024-03-22  9:42 ` sjames at gcc dot gnu.org
                   ` (22 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: sjames at gcc dot gnu.org @ 2024-03-22  9:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #44 from Sam James <sjames at gcc dot gnu.org> ---
I'm really curious as to if there's other test cases which could be shared, as
Andreas mentioned distributions were complaining about this even. That's
unlikely if it's a single degenerate case.

Even listing some example package names could help.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug target/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (43 preceding siblings ...)
  2024-03-22  9:42 ` sjames at gcc dot gnu.org
@ 2024-03-22  9:42 ` sjames at gcc dot gnu.org
  2024-03-22  9:52 ` rguenth at gcc dot gnu.org
                   ` (21 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: sjames at gcc dot gnu.org @ 2024-03-22  9:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #45 from Sam James <sjames at gcc dot gnu.org> ---
(ah, not andreas, but sarah)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug target/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (44 preceding siblings ...)
  2024-03-22  9:42 ` sjames at gcc dot gnu.org
@ 2024-03-22  9:52 ` rguenth at gcc dot gnu.org
  2024-03-22 12:17 ` rguenth at gcc dot gnu.org
                   ` (20 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-22  9:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #46 from Richard Biener <rguenth at gcc dot gnu.org> ---
Maybe combine already knows that it just "keeps i2" rather than replacing it?
When !newi2pat we seem to delete i2.  Anyway, somebody more familiar with
combine should produce a good(TM) patch.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug target/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (45 preceding siblings ...)
  2024-03-22  9:52 ` rguenth at gcc dot gnu.org
@ 2024-03-22 12:17 ` rguenth at gcc dot gnu.org
  2024-03-22 12:38 ` rguenth at gcc dot gnu.org
                   ` (19 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-22 12:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #47 from Richard Biener <rguenth at gcc dot gnu.org> ---
The rtx_equal_p change gets us 50% improvement only, it's necessary to also
disable the added_{links,notes}_insn extra re-processing to get us all the
way to -O1 speed.  We'd need the worklist to avoid combine regressions there
(though for the actual testcase it doesn't make a difference).

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug target/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (46 preceding siblings ...)
  2024-03-22 12:17 ` rguenth at gcc dot gnu.org
@ 2024-03-22 12:38 ` rguenth at gcc dot gnu.org
  2024-03-27 14:35 ` sarah.kriesch at opensuse dot org
                   ` (18 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-22 12:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #48 from Richard Biener <rguenth at gcc dot gnu.org> ---
So another "simple" way is to keep the redundant insn walking ("it's O(1)") but
remember processsed insns and only re-process those we mark as such.

There might be a free "visited" bit on rtx_insn, who knows, the following uses
a bitmap to track this.  Likely where we set/update added_links_insn we
should mark insns for re-processing.

A worklist, if it were to be processed in instruction order, would need to
be kept ordered and DF docs say DF_INSN_LUID isn't to be trusted after
adding/removing insns.

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d836..c2f04e6b86e 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -1106,6 +1106,8 @@ insn_a_feeds_b (rtx_insn *a, rtx_insn *b)
   return false;
 }
 ^L
+static bitmap processed;
+
 /* Main entry point for combiner.  F is the first insn of the function.
    NREGS is the first unused pseudo-reg number.

@@ -1211,6 +1213,8 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
   setup_incoming_promotions (first);
   last_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
   int max_combine = param_max_combine_insns;
+  processed = BITMAP_ALLOC (NULL);
+  bitmap_tree_view (processed);

   FOR_EACH_BB_FN (this_basic_block, cfun)
     {
@@ -1231,6 +1235,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
        label_tick_ebb_start = label_tick;
       last_bb = this_basic_block;

+      bitmap_clear (processed);
       rtl_profile_for_bb (this_basic_block);
       for (insn = BB_HEAD (this_basic_block);
           insn != NEXT_INSN (BB_END (this_basic_block));
@@ -1240,6 +1245,9 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
          if (!NONDEBUG_INSN_P (insn))
            continue;

+         if (!bitmap_set_bit (processed, INSN_UID (insn)))
+           continue;
+
          while (last_combined_insn
                 && (!NONDEBUG_INSN_P (last_combined_insn)
                     || last_combined_insn->deleted ()))
@@ -1427,6 +1435,7 @@ retry:
          ;
        }
     }
+  BITMAP_FREE (processed);

   default_rtl_profile ();
   clear_bb_flags ();
@@ -4758,6 +4767,14 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1,
rtx_insn *i0,
   if (added_notes_insn && DF_INSN_LUID (added_notes_insn) < DF_INSN_LUID
(ret))
     ret = added_notes_insn;

+  bitmap_clear_bit (processed, INSN_UID (i3));
+  if (newi2pat)
+    bitmap_clear_bit (processed, INSN_UID (newi2pat));
+  if (added_links_insn)
+    bitmap_clear_bit (processed, INSN_UID (added_links_insn));
+  if (added_notes_insn)
+    bitmap_clear_bit (processed, INSN_UID (added_notes_insn));
+
   return ret;
 }
 ^L

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug target/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (47 preceding siblings ...)
  2024-03-22 12:38 ` rguenth at gcc dot gnu.org
@ 2024-03-27 14:35 ` sarah.kriesch at opensuse dot org
  2024-03-27 16:10 ` [Bug rtl-optimization/101523] " cvs-commit at gcc dot gnu.org
                   ` (17 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: sarah.kriesch at opensuse dot org @ 2024-03-27 14:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #49 from Sarah Julia Kriesch <sarah.kriesch at opensuse dot org> ---
(In reply to Sam James from comment #44)
> I'm really curious as to if there's other test cases which could be shared,
> as Andreas mentioned distributions were complaining about this even. That's
> unlikely if it's a single degenerate case.
> 
> Even listing some example package names could help.

Sorry for the late response! I am a volunteer and went through all constraints
files from the last few years (I added to multiple packages). Most
memory-related issues have been already resolved.
But I found some easter eggs for you today:

1) nodejs21 with 11,5GB on s390x, 2,5GB on x86, 3,7GB on PPCle, 2,5GB on
aarch64 and 2,4GB on armv7:
https://build.opensuse.org/package/show/devel:languages:nodejs/nodejs21

2) PDAL with 9,7GB on s390x, 2,2GB on x86 and 2,2GB on aarch64:
https://build.opensuse.org/package/show/openSUSE:Factory:zSystems/PDAL

3) python-numpy with 15,2GB on s390x, 8,6GB on PPCle, 9,3GB on x86,1,9 on
armv7, 9,3GB on aarch64:
https://build.opensuse.org/package/show/devel:languages:python:numeric/python-numpy

I wish you a happy Eastern!

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (48 preceding siblings ...)
  2024-03-27 14:35 ` sarah.kriesch at opensuse dot org
@ 2024-03-27 16:10 ` cvs-commit at gcc dot gnu.org
  2024-03-27 16:53 ` segher at gcc dot gnu.org
                   ` (16 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-03-27 16:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #50 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Segher Boessenkool <segher@gcc.gnu.org>:

https://gcc.gnu.org/g:839bc42772ba7af66af3bd16efed4a69511312ae

commit r14-9692-g839bc42772ba7af66af3bd16efed4a69511312ae
Author: Segher Boessenkool <segher@kernel.crashing.org>
Date:   Wed Mar 27 14:09:52 2024 +0000

    combine: Don't combine if I2 does not change

    In some cases combine will "combine" an I2 and I3, but end up putting
    exactly the same thing back as I2 as was there before.  This is never
    progress, so we shouldn't do it, it will lead to oscillating behaviour
    and the like.

    If we want to canonicalise things, that's fine, but this is not the
    way to do it.

    2024-03-27  Segher Boessenkool  <segher@kernel.crashing.org>

            PR rtl-optimization/101523
            * combine.cc (try_combine): Don't do a 2-insn combination if
            it does not in fact change I2.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (49 preceding siblings ...)
  2024-03-27 16:10 ` [Bug rtl-optimization/101523] " cvs-commit at gcc dot gnu.org
@ 2024-03-27 16:53 ` segher at gcc dot gnu.org
  2024-03-27 16:55 ` segher at gcc dot gnu.org
                   ` (15 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-03-27 16:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #51 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #46)
> Maybe combine already knows that it just "keeps i2" rather than replacing it?

It never does that.  Instead, it thinks it is making a new I2, but it ends up
to be exactly the same instruction.  This is not a good thing to do, combine
can change the whole thing back to the previous shape for example, when it
feels like it (combine does not make canonical forms ever!)

> When !newi2pat we seem to delete i2.  Anyway, somebody more familiar with
> combine should produce a good(TM) patch.

Yes, the most common combinations delete I2, they combine 2->1 or 3->1 or 4->1.
When this isn't possible combine tries to combine to two instructions, it has
various strategies for this: the backend can do it explicitly (via a
define_split),
or it can break apart the expression that was the src in the one set that was
the ->1 result, hoping that the two instructions it gets that way are valid
insns.  It tries only one way to do this, and it isn't very smart about it,
just very heuristic.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (50 preceding siblings ...)
  2024-03-27 16:53 ` segher at gcc dot gnu.org
@ 2024-03-27 16:55 ` segher at gcc dot gnu.org
  2024-04-03 10:58 ` rguenth at gcc dot gnu.org
                   ` (14 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-03-27 16:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #52 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Fixed.  (On trunk only, no backports planned, this goes back aaaaaaages).

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (51 preceding siblings ...)
  2024-03-27 16:55 ` segher at gcc dot gnu.org
@ 2024-04-03 10:58 ` rguenth at gcc dot gnu.org
  2024-04-05 11:48 ` segher at gcc dot gnu.org
                   ` (13 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-04-03 10:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #53 from Richard Biener <rguenth at gcc dot gnu.org> ---
So just to recap, with reverting the change and instead doing

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d836..ff25752cac4 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -4186,6 +4186,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1,
rtx_insn *i0,
       adjust_for_new_dest (i3);
     }

+  bool i2_unchanged = false;
+  if (rtx_equal_p (newi2pat, PATTERN (i2)))
+    i2_unchanged = true;
+
   /* We now know that we can do this combination.  Merge the insns and
      update the status of registers and LOG_LINKS.  */

@@ -4752,6 +4756,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1,
rtx_insn *i0,
   combine_successes++;
   undo_commit ();

+  if (i2_unchanged)
+    return i3;
+
   rtx_insn *ret = newi2pat ? i2 : i3;
   if (added_links_insn && DF_INSN_LUID (added_links_insn) < DF_INSN_LUID
(ret))
     ret = added_links_insn;

combine time is down from 79s (93%) to 3.5s (37%), quite a bit more than
with the currently installed patch which has combine down to 0.02s (0%).
But notably peak memory use is down from 9GB to 400MB (installed patch 340MB).

That was with a cross from x86_64-linux and a release checking build.

This change should avoid any code generation changes, I do think if the
pattern doesn't change what distribute_notes/links does should be a no-op
even to I2 so we can ignore added_{links,notes}_insn (not ignoring them
only provides a 50% speedup).

I like the 0% combine result of the installed patch but the regressions
observed probably mean this needs to be defered to stage1.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (52 preceding siblings ...)
  2024-04-03 10:58 ` rguenth at gcc dot gnu.org
@ 2024-04-05 11:48 ` segher at gcc dot gnu.org
  2024-04-05 12:20 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-04-05 11:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #54 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Propose a patch, then?  With justification.  It should also work for 100000x
bigger testcases.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (53 preceding siblings ...)
  2024-04-05 11:48 ` segher at gcc dot gnu.org
@ 2024-04-05 12:20 ` rguenth at gcc dot gnu.org
  2024-04-10  6:02 ` rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-04-05 12:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #55 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #54)
> Propose a patch, then?  With justification.  It should also work for 100000x
> bigger testcases.

https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648725.html

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (54 preceding siblings ...)
  2024-04-05 12:20 ` rguenth at gcc dot gnu.org
@ 2024-04-10  6:02 ` rguenth at gcc dot gnu.org
  2024-04-10 15:51 ` segher at gcc dot gnu.org
                   ` (10 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-04-10  6:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|FIXED                       |---
             Status|RESOLVED                    |ASSIGNED
           Assignee|segher at gcc dot gnu.org          |rguenth at gcc dot gnu.org

--- Comment #56 from Richard Biener <rguenth at gcc dot gnu.org> ---
The fix was reverted but will be re-instantiated for GCC 15 by me.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (55 preceding siblings ...)
  2024-04-10  6:02 ` rguenth at gcc dot gnu.org
@ 2024-04-10 15:51 ` segher at gcc dot gnu.org
  2024-04-10 15:57 ` jakub at gcc dot gnu.org
                   ` (9 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-04-10 15:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #57 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #56)
> The fix was reverted but will be re-instantiated for GCC 15 by me.

And I still protest.

PR101523 is a very serious problem, way way way more "P1" than any of the
"my target was inconvenienced by some bad testcases failing now" "P1"s there
are now.  Please undo this!

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (56 preceding siblings ...)
  2024-04-10 15:51 ` segher at gcc dot gnu.org
@ 2024-04-10 15:57 ` jakub at gcc dot gnu.org
  2024-04-10 17:03 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-04-10 15:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #58 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #57)
> (In reply to Richard Biener from comment #56)
> > The fix was reverted but will be re-instantiated for GCC 15 by me.
> 
> And I still protest.
> 
> PR101523 is a very serious problem, way way way more "P1" than any of the
> "my target was inconvenienced by some bad testcases failing now" "P1"s there
> are now.  Please undo this!

It isn't just the case where some unimportant testcase would need to be
xfailed, there are 5% slowdowns on SPEC and similar bugs as well.
Yes, PR101523 is important to get fixed, but we've lived with it for years and
don't have time in GCC 14 cycle to deal with the fallouts from the change,
which there are clearly many.  If the fix was done in stage1, there could be
time to deal with that, but we want to release in 2 weeks or so.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (57 preceding siblings ...)
  2024-04-10 15:57 ` jakub at gcc dot gnu.org
@ 2024-04-10 17:03 ` rguenth at gcc dot gnu.org
  2024-04-10 17:45 ` sarah.kriesch at opensuse dot org
                   ` (7 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-04-10 17:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #59 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #57)
> (In reply to Richard Biener from comment #56)
> > The fix was reverted but will be re-instantiated for GCC 15 by me.
> 
> And I still protest.
> 
> PR101523 is a very serious problem, way way way more "P1" than any of the
> "my target was inconvenienced by some bad testcases failing now" "P1"s there
> are now.  Please undo this!

It was a very serious problem in 2021, too.  But since we shipped with that
very serious problem it is by definition not a ship-stopper.  I'll also
point out that your previous assessment of this bug was more as an obscure
corner case.

If there's a solution avoiding the code generation regressions that can
be considered for backporting to GCC 14.2 or even earlier branches.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (58 preceding siblings ...)
  2024-04-10 17:03 ` rguenth at gcc dot gnu.org
@ 2024-04-10 17:45 ` sarah.kriesch at opensuse dot org
  2024-05-04  6:00 ` segher at gcc dot gnu.org
                   ` (6 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: sarah.kriesch at opensuse dot org @ 2024-04-10 17:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #60 from Sarah Julia Kriesch <sarah.kriesch at opensuse dot org> ---
I have to agree with Richard. This problem has been serious for a long time but
has been ignored by IBM based on distribution choices.

Anyway, we want to live within the open source community without any Linux
distribution priorities (especially in upstream projects like here).

Segher, can you specify the failed test cases? Then, it should be possible to
reproduce and improve that all. In such a collaborative way, we can also
achieve a solution.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (59 preceding siblings ...)
  2024-04-10 17:45 ` sarah.kriesch at opensuse dot org
@ 2024-05-04  6:00 ` segher at gcc dot gnu.org
  2024-05-04 13:14 ` sarah.kriesch at opensuse dot org
                   ` (5 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-05-04  6:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #61 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Sarah Julia Kriesch from comment #60)
> I have to agree with Richard. This problem has been serious for a long time
> but has been ignored by IBM based on distribution choices.

What?  What does IBM have to do with this?  Yes, they are my employer, but
what I decide is best for combine to do is not influenced by them *at all*
(except some times they want me to spend time doing paid work, distracting
me from things that really matter, like combine!)

> Anyway, we want to live within the open source community without any Linux
> distribution priorities (especially in upstream projects like here).

No idea what that means either.

> Segher, can you specify the failed test cases? Then, it should be possible
> to reproduce and improve that all. In such a collaborative way, we can also
> achieve a solution.

What failed test cases?  You completely lost me.

We used to do the wrong thing in combine.  Now that my fix was reverted, we
still do.  This should be undone soonish, so that I can commit an actual UNCSE
implementation, which fixes all "regressions" (quotes, because they are not!)
caused by my previous patch, and does a lot more too.  It also will allow us
to remove a bunch of other code from combine, speeding up things a lot more
(things that keep a copy of a set if the dest is used more than once).  There
has been talk of doing an UNCSE for over twenty years now, so annoying me
enough to get this done is a good result of this whole thing :-)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (60 preceding siblings ...)
  2024-05-04  6:00 ` segher at gcc dot gnu.org
@ 2024-05-04 13:14 ` sarah.kriesch at opensuse dot org
  2024-05-04 17:30 ` segher at gcc dot gnu.org
                   ` (4 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: sarah.kriesch at opensuse dot org @ 2024-05-04 13:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #62 from Sarah Julia Kriesch <sarah.kriesch at opensuse dot org> ---
(In reply to Segher Boessenkool from comment #61)
> (In reply to Sarah Julia Kriesch from comment #60)
> > I have to agree with Richard. This problem has been serious for a long time
> > but has been ignored by IBM based on distribution choices.
> 
> What?  What does IBM have to do with this?  Yes, they are my employer, but
> what I decide is best for combine to do is not influenced by them *at all*
> (except some times they want me to spend time doing paid work, distracting
> me from things that really matter, like combine!)
> 
Then, tell other reasons why my requests in the openSUSE bug report have been
rejected in the past, and this bug report has been open for 3 years.
Perhaps it is helpful to know that IBM has fixed memory issues in PostgreSQL
(for openSUSE/upstream) with higher quality via my request with the support for
Red Hat (and faster).

> > Anyway, we want to live within the open source community without any Linux
> > distribution priorities (especially in upstream projects like here).
> 
> No idea what that means either.
> 
There is a reason for founding the Linux Distributions Working Group at the
Open Mainframe Project (equality for all Linux Distributions on s390x).
SUSE, Red Hat and Canonical have been supporting this idea also (especially
based on my own work experience at IBM and the priorities inside).

> > Segher, can you specify the failed test cases? Then, it should be possible
> > to reproduce and improve that all. In such a collaborative way, we can also
> > achieve a solution.
> 
> What failed test cases?  You completely lost me.
> 
This one:
(In reply to Segher Boessenkool from comment #57)
> (In reply to Richard Biener from comment #56)
> PR101523 is a very serious problem, way way way more "P1" than any of the
> "my target was inconvenienced by some bad testcases failing now" "P1"s there
> are now.  Please undo this!

(In reply to Segher Boessenkool from comment #61)
> We used to do the wrong thing in combine.  Now that my fix was reverted, we
> still do.  This should be undone soonish, so that I can commit an actual
> UNCSE
> implementation, which fixes all "regressions" (quotes, because they are not!)
> caused by my previous patch, and does a lot more too.  It also will allow us
> to remove a bunch of other code from combine, speeding up things a lot more
> (things that keep a copy of a set if the dest is used more than once).  There
> has been talk of doing an UNCSE for over twenty years now, so annoying me
> enough to get this done is a good result of this whole thing :-)
Your fixes should also work with upstream code and the used gcc versions in
our/all Linux distributions. I recommend applying tests and merging your fixes
to at least one gcc version.


If you want to watch something about our reasons for creating a collaboration
between Linux distributions (and upstream projects), you should watch my first
presentation "Collaboration instead of Competition":
https://av.tib.eu/media/57010

Hint: The IBM statement came from my former IBM Manager (now your CPO).

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (61 preceding siblings ...)
  2024-05-04 13:14 ` sarah.kriesch at opensuse dot org
@ 2024-05-04 17:30 ` segher at gcc dot gnu.org
  2024-05-06  9:21 ` rguenther at suse dot de
                   ` (3 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-05-04 17:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #63 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Sarah Julia Kriesch from comment #62)
> (In reply to Segher Boessenkool from comment #61)
> > (In reply to Sarah Julia Kriesch from comment #60)
> > > I have to agree with Richard. This problem has been serious for a long time
> > > but has been ignored by IBM based on distribution choices.
> > 
> > What?  What does IBM have to do with this?  Yes, they are my employer, but
> > what I decide is best for combine to do is not influenced by them *at all*
> > (except some times they want me to spend time doing paid work, distracting
> > me from things that really matter, like combine!)
> > 
> Then, tell other reasons why my requests in the openSUSE bug report have
> been rejected in the past, and this bug report has been open for 3 years.
> Perhaps it is helpful to know that IBM has fixed memory issues in PostgreSQL
> (for openSUSE/upstream) with higher quality via my request with the support
> for Red Hat (and faster).

Once again, I have no idea what you are talking about.  It sounds like some
complot theory?  Exciting!

I really have no idea what you are talking about.  I recognise some of the
words, but not enough to give me a handle on what you are on about.

> > > Anyway, we want to live within the open source community without any Linux
> > > distribution priorities (especially in upstream projects like here).
> > 
> > No idea what that means either.
> > 
> There is a reason for founding the Linux Distributions Working Group at the
> Open Mainframe Project (equality for all Linux Distributions on s390x).
> SUSE, Red Hat and Canonical have been supporting this idea also (especially
> based on my own work experience at IBM and the priorities inside).

And here I don't have any context either.

> > > Segher, can you specify the failed test cases? Then, it should be possible
> > > to reproduce and improve that all. In such a collaborative way, we can also
> > > achieve a solution.
> > 
> > What failed test cases?  You completely lost me.
> > 
> This one:
> (In reply to Segher Boessenkool from comment #57)
> > (In reply to Richard Biener from comment #56)
> > PR101523 is a very serious problem, way way way more "P1" than any of the
> > "my target was inconvenienced by some bad testcases failing now" "P1"s there
> > are now.  Please undo this!

They are in this PR.  "See Also", top right corner in the headings.

> (In reply to Segher Boessenkool from comment #61)
> > We used to do the wrong thing in combine.  Now that my fix was reverted, we
> > still do.  This should be undone soonish, so that I can commit an actual
> > UNCSE
> > implementation, which fixes all "regressions" (quotes, because they are not!)
> > caused by my previous patch, and does a lot more too.  It also will allow us
> > to remove a bunch of other code from combine, speeding up things a lot more
> > (things that keep a copy of a set if the dest is used more than once).  There
> > has been talk of doing an UNCSE for over twenty years now, so annoying me
> > enough to get this done is a good result of this whole thing :-)
> Your fixes should also work with upstream code and the used gcc versions in
> our/all Linux distributions. I recommend applying tests and merging your
> fixes to at least one gcc version.

Lol.  No.  Distributions have to sort out their own problems.  I don't have
a copy of an old version of most distros even; I haven't *heard* about the
*existence* of most distros!

I don't use a Linux distro on any of my own machines.  And I care about some
other OSes at least as much, btw.  And not just because my employer cares about
some of those.

> If you want to watch something about our reasons for creating a
> collaboration between Linux distributions (and upstream projects), you
> should watch my first presentation "Collaboration instead of Competition":
> https://av.tib.eu/media/57010
> 
> Hint: The IBM statement came from my former IBM Manager (now your CPO).

CPO?  What is a CPO?  I don't think I have any?  I do have an R2 somewhere,
does that help?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (62 preceding siblings ...)
  2024-05-04 17:30 ` segher at gcc dot gnu.org
@ 2024-05-06  9:21 ` rguenther at suse dot de
  2024-05-06  9:40 ` segher at kernel dot crashing.org
                   ` (2 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: rguenther at suse dot de @ 2024-05-06  9:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #64 from rguenther at suse dot de <rguenther at suse dot de> ---
On Sat, 4 May 2024, segher at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523
> 
> --- Comment #61 from Segher Boessenkool <segher at gcc dot gnu.org> ---
> We used to do the wrong thing in combine.  Now that my fix was reverted, we
> still do.  This should be undone soonish,

As promised I'm going to revert the revert after 14.1 is released 
(hopefully tomorrow).  As for distros I have decided to include my
hack posted in 
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648725.html
for SUSE based distros in GCC 13 and 14 as that seems to improve
the problematical memory uses in our build farm.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (63 preceding siblings ...)
  2024-05-06  9:21 ` rguenther at suse dot de
@ 2024-05-06  9:40 ` segher at kernel dot crashing.org
  2024-05-06  9:42 ` segher at gcc dot gnu.org
  2024-05-06  9:46 ` rguenth at gcc dot gnu.org
  66 siblings, 0 replies; 68+ messages in thread
From: segher at kernel dot crashing.org @ 2024-05-06  9:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #65 from Segher Boessenkool <segher at kernel dot crashing.org> ---
On Sat, May 04, 2024 at 01:14:18PM +0000, sarah.kriesch at opensuse dot org
wrote:
Do not reply to a PR comment in private mail.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (64 preceding siblings ...)
  2024-05-06  9:40 ` segher at kernel dot crashing.org
@ 2024-05-06  9:42 ` segher at gcc dot gnu.org
  2024-05-06  9:46 ` rguenth at gcc dot gnu.org
  66 siblings, 0 replies; 68+ messages in thread
From: segher at gcc dot gnu.org @ 2024-05-06  9:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #66 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #64)
> As promised I'm going to revert the revert after 14.1 is released 
> (hopefully tomorrow).

Thank you!  beer++

> As for distros I have decided to include my
> hack posted in 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648725.html
> for SUSE based distros in GCC 13 and 14 as that seems to improve
> the problematical memory uses in our build farm.

I think this patch may well show some actual regressions :-(  We'll see.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Bug rtl-optimization/101523] Huge number of combine attempts
  2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
                   ` (65 preceding siblings ...)
  2024-05-06  9:42 ` segher at gcc dot gnu.org
@ 2024-05-06  9:46 ` rguenth at gcc dot gnu.org
  66 siblings, 0 replies; 68+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-05-06  9:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #67 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #66)
> (In reply to rguenther@suse.de from comment #64)
> > As promised I'm going to revert the revert after 14.1 is released 
> > (hopefully tomorrow).
> 
> Thank you!  beer++
> 
> > As for distros I have decided to include my
> > hack posted in 
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648725.html
> > for SUSE based distros in GCC 13 and 14 as that seems to improve
> > the problematical memory uses in our build farm.
> 
> I think this patch may well show some actual regressions :-(  We'll see.

I'm probably not going to notice - at least I think it should be fine by
design, but as we see combine doesn't adhere to it's design, so my milage
may vary ;)  But yeah, I didn't do any extensive before/after code
differences (there should be no difference - fingers crossing ;))

^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2024-05-06  9:46 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-20  9:35 [Bug rtl-optimization/101523] New: Huge number of combine attempts krebbel at gcc dot gnu.org
2021-07-20  9:41 ` [Bug rtl-optimization/101523] " krebbel at gcc dot gnu.org
2021-07-20  9:42 ` krebbel at gcc dot gnu.org
2021-07-20 21:27 ` segher at gcc dot gnu.org
2024-02-24  5:45 ` krebbel at gcc dot gnu.org
2024-02-25 10:53 ` segher at gcc dot gnu.org
2024-02-25 15:06 ` segher at gcc dot gnu.org
2024-03-02 18:43 ` sarah.kriesch at opensuse dot org
2024-03-02 21:04 ` sjames at gcc dot gnu.org
2024-03-03 19:32 ` segher at gcc dot gnu.org
2024-03-04  8:18 ` krebbel at gcc dot gnu.org
2024-03-04 16:31 ` segher at gcc dot gnu.org
2024-03-04 16:49 ` sarah.kriesch at opensuse dot org
2024-03-04 21:26 ` segher at gcc dot gnu.org
2024-03-05  7:31 ` krebbel at gcc dot gnu.org
2024-03-05 10:36 ` sarah.kriesch at opensuse dot org
2024-03-06 22:49 ` segher at gcc dot gnu.org
2024-03-06 22:53 ` segher at gcc dot gnu.org
2024-03-06 23:15 ` pinskia at gcc dot gnu.org
2024-03-07  9:26 ` krebbel at gcc dot gnu.org
2024-03-07  9:28 ` krebbel at gcc dot gnu.org
2024-03-07  9:34 ` krebbel at gcc dot gnu.org
2024-03-07 15:51 ` krebbel at gcc dot gnu.org
2024-03-07 15:52 ` krebbel at gcc dot gnu.org
2024-03-07 16:11 ` segher at gcc dot gnu.org
2024-03-07 16:19 ` segher at gcc dot gnu.org
2024-03-07 16:53 ` pinskia at gcc dot gnu.org
2024-03-07 16:57 ` pinskia at gcc dot gnu.org
2024-03-07 18:15 ` segher at gcc dot gnu.org
2024-03-07 19:42 ` segher at gcc dot gnu.org
2024-03-07 19:45 ` jakub at gcc dot gnu.org
2024-03-07 20:10 ` segher at gcc dot gnu.org
2024-03-07 20:17 ` krebbel at gcc dot gnu.org
2024-03-07 20:53 ` krebbel at gcc dot gnu.org
2024-03-20 11:53 ` [Bug target/101523] " rguenth at gcc dot gnu.org
2024-03-21  7:56 ` segher at gcc dot gnu.org
2024-03-21  8:15 ` rguenth at gcc dot gnu.org
2024-03-21  8:27 ` rguenth at gcc dot gnu.org
2024-03-21 12:57 ` segher at gcc dot gnu.org
2024-03-21 12:58 ` segher at gcc dot gnu.org
2024-03-21 13:15 ` rguenth at gcc dot gnu.org
2024-03-21 13:22 ` rguenth at gcc dot gnu.org
2024-03-22  8:07 ` rguenth at gcc dot gnu.org
2024-03-22  9:41 ` rguenth at gcc dot gnu.org
2024-03-22  9:42 ` sjames at gcc dot gnu.org
2024-03-22  9:42 ` sjames at gcc dot gnu.org
2024-03-22  9:52 ` rguenth at gcc dot gnu.org
2024-03-22 12:17 ` rguenth at gcc dot gnu.org
2024-03-22 12:38 ` rguenth at gcc dot gnu.org
2024-03-27 14:35 ` sarah.kriesch at opensuse dot org
2024-03-27 16:10 ` [Bug rtl-optimization/101523] " cvs-commit at gcc dot gnu.org
2024-03-27 16:53 ` segher at gcc dot gnu.org
2024-03-27 16:55 ` segher at gcc dot gnu.org
2024-04-03 10:58 ` rguenth at gcc dot gnu.org
2024-04-05 11:48 ` segher at gcc dot gnu.org
2024-04-05 12:20 ` rguenth at gcc dot gnu.org
2024-04-10  6:02 ` rguenth at gcc dot gnu.org
2024-04-10 15:51 ` segher at gcc dot gnu.org
2024-04-10 15:57 ` jakub at gcc dot gnu.org
2024-04-10 17:03 ` rguenth at gcc dot gnu.org
2024-04-10 17:45 ` sarah.kriesch at opensuse dot org
2024-05-04  6:00 ` segher at gcc dot gnu.org
2024-05-04 13:14 ` sarah.kriesch at opensuse dot org
2024-05-04 17:30 ` segher at gcc dot gnu.org
2024-05-06  9:21 ` rguenther at suse dot de
2024-05-06  9:40 ` segher at kernel dot crashing.org
2024-05-06  9:42 ` segher at gcc dot gnu.org
2024-05-06  9:46 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).