public inbox for gcc-bugs@sourceware.org
* [Bug target/113847] New: [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X
@ 2024-02-09  9:24 fkastl at suse dot cz
  2024-02-09 11:13 ` [Bug target/113847] " rguenth at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: fkastl at suse dot cz @ 2024-02-09  9:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113847

            Bug ID: 113847
           Summary: [14 Regression] 10% slowdown of 462.libquantum on AMD
                    Ryzen 7700X and Ryzen 7900X
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization, needs-bisection
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fkastl at suse dot cz
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

As seen here

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=956.210.0

between commits

g:d826596acb02edf4

and

g:23cd2961bd2ff635

there is about a 10% slowdown in execution time of the SPEC CPU2006
462.libquantum benchmark.

The test is run with -O2 and LTO on an AMD Ryzen 7700X.

I also reproduced the slowdown on an AMD Ryzen 7900X machine. However, I wasn't
able to reproduce the slowdown on an AMD EPYC machine, which is also a Zen4
microarchitecture. So I suppose this slowdown occurs only on Zen4 Ryzen CPUs or
is maybe even more specific.

I'm not sure if we want to do anything about this. The same slowdown on the
same machine has already happened once, see pr112547. The benchmark results
eventually returned to the original values.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/113847] [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X
  2024-02-09  9:24 [Bug target/113847] New: [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X fkastl at suse dot cz
@ 2024-02-09 11:13 ` rguenth at gcc dot gnu.org
  2024-02-10 17:04 ` fkastl at suse dot cz
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-09 11:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113847

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0


* [Bug target/113847] [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X
  2024-02-09  9:24 [Bug target/113847] New: [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X fkastl at suse dot cz
  2024-02-09 11:13 ` [Bug target/113847] " rguenth at gcc dot gnu.org
@ 2024-02-10 17:04 ` fkastl at suse dot cz
  2024-02-12  9:45 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: fkastl at suse dot cz @ 2024-02-10 17:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113847

Filip Kastl <fkastl at suse dot cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-bisection             |
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #1 from Filip Kastl <fkastl at suse dot cz> ---
Bisected to g:724b64304ff5c8ac08a913509afd6fde38d7b767 (I did the bisection on
Ryzen 7900X)


* [Bug target/113847] [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X
  2024-02-09  9:24 [Bug target/113847] New: [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X fkastl at suse dot cz
  2024-02-09 11:13 ` [Bug target/113847] " rguenth at gcc dot gnu.org
  2024-02-10 17:04 ` fkastl at suse dot cz
@ 2024-02-12  9:45 ` rguenth at gcc dot gnu.org
  2024-02-12 13:18 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-12  9:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113847

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2024-02-12
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
I will try to investigate.  Note this was a correctness fix; it could be
relaxed a tiny bit, but behavior will then depend on the order of processing of
blocks not ordered by RPO.
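
For readers unfamiliar with the term, RPO (reverse post-order) is a block ordering in which, for an acyclic CFG, every block comes after all of its predecessors. The following is a minimal sketch on a hypothetical four-block diamond CFG. The `struct cfg` type and the function names are invented for illustration and are not GCC's implementation.

```c
#include <assert.h>

#define MAX_BLOCKS 8

/* Hypothetical tiny CFG: succ[b] lists the successors of block b,
   terminated by -1.  Block 0 is the entry block. */
struct cfg {
    int n;
    int succ[MAX_BLOCKS][3];
};

/* Depth-first walk that records each block after all of its successors
   have been visited (post-order). */
static void dfs(const struct cfg *g, int b, int *seen, int *post, int *count)
{
    seen[b] = 1;
    for (int i = 0; g->succ[b][i] >= 0; i++)
        if (!seen[g->succ[b][i]])
            dfs(g, g->succ[b][i], seen, post, count);
    post[(*count)++] = b;
}

/* Fills rpo[] with the blocks reachable from block 0 in reverse
   post-order and returns how many blocks were written. */
static int compute_rpo(const struct cfg *g, int *rpo)
{
    int seen[MAX_BLOCKS] = {0};
    int post[MAX_BLOCKS];
    int count = 0;

    assert(g->n > 0 && g->n <= MAX_BLOCKS);
    dfs(g, 0, seen, post, &count);
    for (int i = 0; i < count; i++)
        rpo[i] = post[count - 1 - i];
    return count;
}
```

On a diamond (0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3) the entry block comes first and the join block last; the relative order of the two arms depends on the successor order.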


* [Bug target/113847] [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X
  2024-02-09  9:24 [Bug target/113847] New: [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X fkastl at suse dot cz
                   ` (2 preceding siblings ...)
  2024-02-12  9:45 ` rguenth at gcc dot gnu.org
@ 2024-02-12 13:18 ` rguenth at gcc dot gnu.org
  2024-02-12 14:41 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-12 13:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113847

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
I can't confirm a regression (testing r14-8925-g1e3f78dbb328a2 with the
offending rev reverted vs bare).

Benchmark       Ref. time  Run time  Ratio      Ref. time  Run time  Ratio
462.libquantum      20720      61.9    335 S        20720      62.6    331 *
462.libquantum      20720      62.2    333 *        20720      61.9    335 S
462.libquantum      20720      62.4    332 S        20720      62.7    330 S

so the "best" run with the change is faster than the best run with it reverted
while the worst runs are the same.
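
As a sanity check on the numbers above, here is a small sketch assuming the columns are the usual SPEC-style reference time, run time in seconds, and ratio (reference time divided by run time, rounded). The helper names are made up for illustration.

```c
#include <assert.h>

/* Reference time for 462.libquantum as it appears in the table above. */
#define REF_TIME 20720.0

/* Ratio = reference time / measured run time, rounded to the nearest
   integer (assumed convention; it reproduces the ratios quoted above). */
static long spec_ratio(double run_time)
{
    assert(run_time > 0.0);
    return (long)(REF_TIME / run_time + 0.5);
}

/* Relative spread between the slowest and fastest run, in percent. */
static double spread_percent(const double *t, int n)
{
    double lo = t[0], hi = t[0];
    for (int i = 1; i < n; i++) {
        if (t[i] < lo) lo = t[i];
        if (t[i] > hi) hi = t[i];
    }
    return 100.0 * (hi - lo) / lo;
}
```

Across all six run times the spread is a little over 1%, which is far below the 10% originally reported.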

There are only code-gen changes in quantum_bmeasure.part.0 and we can see
it's likely

{component_ref<node>,mem_ref<0B>,reg_3(D)}@.MEM_166 (0030)

vs

{component_ref<hash>,mem_ref<0B>,reg_3(D)}@.MEM_9 (0022)

where once the size is 256 and once 64.  The types are

 <record_type 0x7ffff6a753f0 quantum_reg BLK
    size <integer_cst 0x7ffff6c29138 type <integer_type 0x7ffff6c250a8 bitsizetype> constant 256>
    unit-size <integer_cst 0x7ffff6c29228 type <integer_type 0x7ffff6c25000 sizetype> constant 32>

vs.

 <pointer_type 0x7ffff6a813f0
    type <record_type 0x7ffff6a81348 quantum_reg_node TI
        size <integer_cst 0x7ffff6c0be10 constant 128>
        unit-size <integer_cst 0x7ffff6c0be28 constant 16>

the former is subsetted by a COMPONENT_REF to eventually

 <pointer_type 0x7ffff6e752a0
    type <record_type 0x7ffff6e751f8 quantum_reg_node VOID
        align:8 warn_if_not_align:0 symtab:0 alias-set -1 structural-equality
        pointer_to_this <pointer_type 0x7ffff6e752a0>>
    unsigned DI

so we have basically MEM<ptr + off> vs. MEM<ptr>.member-with-off.

That's indeed a case where we might like to avoid applying this fix, but
maybe only when strict-aliasing is in effect.
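
To make the MEM<ptr + off> vs. MEM<ptr>.member-with-off distinction concrete, here is a rough C-level sketch. The struct definitions are stand-ins: only the sizes (32 and 16 bytes) and the `node` and `hash` field names come from the dumps in this thread, while the other field names, their order, and the x86_64 LP64 layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for quantum_reg_node: 16 bytes on x86_64. */
struct node {
    float re, im;
    unsigned long long state;
};

/* Hypothetical stand-in for quantum_reg: 32 bytes on x86_64. */
struct reg {
    int width;          /* offset 0  (name assumed) */
    int size;           /* offset 4  (position assumed) */
    int hashw;          /* offset 8  (name assumed) */
    struct node *node;  /* offset 16 */
    int *hash;          /* offset 24 */
};

/* These hold on x86_64 LP64, which is the layout assumed here. */
_Static_assert(sizeof(struct node) == 16, "unexpected node size");
_Static_assert(sizeof(struct reg) == 32, "unexpected reg size");
_Static_assert(offsetof(struct reg, node) == 16, "unexpected node offset");

/* MEM<ptr>.member-with-off: the access names the field, so the access
   path through struct reg is visible. */
static struct node *load_via_component(const struct reg *r)
{
    return r->node;
}

/* MEM<ptr + off>: the same bytes reached through a constant byte offset,
   with no field named in the access. */
static struct node *load_via_offset(const struct reg *r)
{
    const char *base = (const char *)r;
    return *(struct node *const *)(base + offsetof(struct reg, node));
}
```

Both functions read the same 8 bytes; what differs is how much of the access path the compiler's internal representation retains.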


* [Bug target/113847] [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X
  2024-02-09  9:24 [Bug target/113847] New: [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X fkastl at suse dot cz
                   ` (3 preceding siblings ...)
  2024-02-12 13:18 ` rguenth at gcc dot gnu.org
@ 2024-02-12 14:41 ` rguenth at gcc dot gnu.org
  2024-02-12 14:43 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-12 14:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113847

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, the important one is actually MEM[ptr + CST] vs MEM[ptr].component.  But
those are not semantically equivalent, even when the same TBAA type is in
effect.

  _31 = MEM <int> [(struct quantum_reg *)reg_3(D)];
  _33 = MEM <int> [(struct quantum_reg *)reg_3(D) + 8B];
  _34 = MEM <struct quantum_reg_node *> [(struct quantum_reg *)reg_3(D) + 16B];
  _35 = MEM <int *> [(struct quantum_reg *)reg_3(D) + 24B];
  out = quantum_state_collapse.isra (pos_1(D), result_22, _31, _32, _33, _34, _35); [return slot optimization]

this is from inlined quantum_state_collapse where IPA SRA is eventually
applied producing the above.
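
For context, the effect of IPA SRA on a call like this can be imitated by hand in C. This is only an illustrative sketch, not GCC output: the struct layout, the function names, and the choice to pass just two fields are assumptions made for brevity (the real clone above receives all five loaded values).

```c
#include <assert.h>

/* Hypothetical stand-ins for quantum_reg_node and quantum_reg. */
struct qnode {
    float re, im;
    unsigned long long state;
};

struct qreg {
    int width;
    int size;
    int hashw;
    struct qnode *node;
    int *hash;
};

/* Before: the callee receives the aggregate and reads the fields itself. */
static unsigned long long sum_states(const struct qreg *reg)
{
    unsigned long long sum = 0;
    for (int i = 0; i < reg->size; i++)
        sum += reg->node[i].state;
    return sum;
}

/* After: a hand-written analogue of an ".isra" clone that takes the
   used pieces of the aggregate as separate scalar parameters. */
static unsigned long long sum_states_isra(int size, const struct qnode *node)
{
    unsigned long long sum = 0;
    for (int i = 0; i < size; i++)
        sum += node[i].state;
    return sum;
}

/* The caller now performs the loads.  In the GIMPLE above these loads
   appear as MEM <type> [ptr + offset] rather than as reg->field. */
static unsigned long long call_split(const struct qreg *reg)
{
    int size = reg->size;
    const struct qnode *node = reg->node;
    return sum_states_isra(size, node);
}
```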

The fact that we produce those might hint that we can't really assume the
dynamic type quantum_reg is at offset 8, but that was the original intent.
What we are left with is the special-case where typeof (MEM[ptr + CST])
== typeof (alias-pointed-to-type) (with CST == 0).  For any other case
what we know is only that the access MEM[ptr + CST] is to somewhere
inside an object of dynamic type quantum_reg?

I'm not sure that's not less than we make use of in the alias-oracle,
esp. aliasing_component_refs_walk and friends?  We might be fine in
practice for "bare" MEM_REFs like the above, but if we ever fold only
part of the access path into the constant offset funny things may happen?

So I think IPA SRA does the wrong thing here (and maybe GCC in other places
as well), possibly only pessimizing and possibly creating latent wrong-code.
Note quantum_state_collapse has

  reg$size_62 = reg.size;
  reg$node_75 = reg.node;
...

pre-IPA.

Honza, any opinion?


* [Bug target/113847] [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X
  2024-02-09  9:24 [Bug target/113847] New: [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X fkastl at suse dot cz
                   ` (4 preceding siblings ...)
  2024-02-12 14:41 ` rguenth at gcc dot gnu.org
@ 2024-02-12 14:43 ` rguenth at gcc dot gnu.org
  2024-02-12 15:30 ` jamborm at gcc dot gnu.org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-12 14:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113847

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
CCing also Martin who should know how/why IPA SRA doesn't reconstruct the
component ref chain here or why it chooses the dynamic type as it does
(possibly local SRA when fully scalarizing an aggregate copy does the same).


* [Bug target/113847] [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X
  2024-02-09  9:24 [Bug target/113847] New: [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X fkastl at suse dot cz
                   ` (5 preceding siblings ...)
  2024-02-12 14:43 ` rguenth at gcc dot gnu.org
@ 2024-02-12 15:30 ` jamborm at gcc dot gnu.org
  2024-03-07 20:40 ` law at gcc dot gnu.org
  2024-05-07  7:45 ` [Bug target/113847] [14/15 " rguenth at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: jamborm at gcc dot gnu.org @ 2024-02-12 15:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113847

--- Comment #6 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #5)
> CCing also Martin who should know how/why IPA SRA doesn't reconstruct the
> component ref chain here 

I have not had a look at this specific case (yet), but IPA-SRA just
doesn't (unlike intraprocedural SRA) and always creates MEM_REFs (in
callers).  I guess we could stream field offsets and/or array_ref
indices and attempt to reconstruct it for simple (non-union,
non-otherwise-overlapping) types, even if it would make the
ipa_adjusted_param type (and thus ipa_param_adjustments) slightly
bigger and add another vector.
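
The reconstruction idea can be sketched outside of GCC as a lookup from a byte offset and access size back to a field of a simple, non-overlapping record. Everything below is hypothetical: `struct field_desc`, `find_field`, and the field table (which assumes an x86_64 layout for a quantum_reg-like struct) are invented for illustration and do not correspond to ipa_adjusted_param or any other GCC data structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one field of a record type. */
struct field_desc {
    const char *name;
    size_t offset;  /* byte offset within the record */
    size_t size;    /* size of the field in bytes */
};

/* Assumed layout of a quantum_reg-like struct on x86_64; bytes 12..15
   are padding and therefore have no entry. */
static const struct field_desc reg_fields[] = {
    {"width", 0, 4},
    {"size", 4, 4},
    {"hashw", 8, 4},
    {"node", 16, 8},
    {"hash", 24, 8},
};

/* Returns the field that exactly matches the access at the given offset
   and size, or NULL if no field matches; in that case the access would
   have to stay a plain offset-based memory reference. */
static const struct field_desc *find_field(const struct field_desc *fields,
                                           size_t n, size_t offset,
                                           size_t size)
{
    for (size_t i = 0; i < n; i++)
        if (fields[i].offset == offset && fields[i].size == size)
            return &fields[i];
    return NULL;
}
```

An exact match on both offset and size keeps the sketch conservative: accesses that straddle fields or land in padding are simply not reconstructed.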

> or why it chooses the dynamic type as it does
> (possibly local SRA when fully scalarizing an aggregate copy does the same).

That is unlikely.  Total scalarization in intraprocedural SRA just
follows the type of the decl whereas IPA-SRA (and intra-SRA too when
not totally scalarizing) takes all types from existing memory
accesses.


* [Bug target/113847] [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X
  2024-02-09  9:24 [Bug target/113847] New: [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X fkastl at suse dot cz
                   ` (6 preceding siblings ...)
  2024-02-12 15:30 ` jamborm at gcc dot gnu.org
@ 2024-03-07 20:40 ` law at gcc dot gnu.org
  2024-05-07  7:45 ` [Bug target/113847] [14/15 " rguenth at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: law at gcc dot gnu.org @ 2024-03-07 20:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113847

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org
           Priority|P3                          |P2


* [Bug target/113847] [14/15 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X
  2024-02-09  9:24 [Bug target/113847] New: [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X fkastl at suse dot cz
                   ` (7 preceding siblings ...)
  2024-03-07 20:40 ` law at gcc dot gnu.org
@ 2024-05-07  7:45 ` rguenth at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-05-07  7:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113847

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|14.0                        |14.2

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 14.1 is being released, retargeting bugs to GCC 14.2.


end of thread, other threads:[~2024-05-07  7:45 UTC | newest]

