public inbox for gcc-bugs@sourceware.org
* [Bug target/101296] New: Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto
@ 2021-07-02 12:17 jamborm at gcc dot gnu.org
  2021-07-02 12:26 ` [Bug target/101296] " rguenth at gcc dot gnu.org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: jamborm at gcc dot gnu.org @ 2021-07-02 12:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101296

            Bug ID: 101296
            Summary: Addition of x86 addsub SLP pattern slowed down
                     433.milc by 12% on znver2 with -Ofast -flto
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

Commit g:7a6c31f0f84 (Add x86 addsub SLP pattern) has slowed down
433.milc from SPECFP 2006 by 12% on a znver2-based machine when
compiled with -Ofast -flto -march=native.

Note, however, that the master branch has since recovered some of
these losses and today's bc8f0ed7042 is only 6% slower than it was
before the addsub addition.  Nevertheless, the effect of this
particular change may be worth looking at.

LNT tracking graph is available for example here:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=289.70.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/101296] Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto
  2021-07-02 12:17 [Bug target/101296] New: Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto jamborm at gcc dot gnu.org
@ 2021-07-02 12:26 ` rguenth at gcc dot gnu.org
  2021-07-05  9:20 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-07-02 12:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101296

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2021-07-02
             Status|UNCONFIRMED                 |ASSIGNED

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I will have a look next week.  A quick look shows FMAs being used, and the
addsub pattern can break FMA detection until we get general optab support
for fmaddsub and friends.  So it might be { fma, fms } + blend compared to
addsub + mul, where the former may have lower latency, though Agner's tables
give FMA (5c) + blend (1c) vs. ADDSUB (3c) + MUL (3c), i.e. 6 cycles for
either dependency chain.  As said, I have to look into this in more detail.

double a[4], b[4], c[4];

void foo ()
{
  c[0] = a[0] - b[0] * c[0];
  c[1] = a[1] + b[1] * c[1];
  c[2] = a[2] - b[2] * c[2];
  c[3] = a[3] + b[3] * c[3];
}

        vmovapd a(%rip), %ymm2
        vmovapd b(%rip), %ymm1
        vmovapd b(%rip), %ymm0
        vfmadd132pd     c(%rip), %ymm2, %ymm1
        vfnmadd132pd    c(%rip), %ymm2, %ymm0
        vshufpd $10, %ymm1, %ymm0, %ymm0
        vmovapd %ymm0, c(%rip)

vs.

        vmovapd b(%rip), %ymm1
        vmovapd a(%rip), %ymm2
        vmulpd  c(%rip), %ymm1, %ymm0
        vaddsubpd       %ymm0, %ymm2, %ymm0
        vmovapd %ymm0, c(%rip)
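To make the two variants easier to compare, here is a scalar model of the
addsub semantics (a sketch purely for illustration; addsub_pd and foo_model
are made-up helper names, not GCC or libc APIs): vaddsubpd subtracts in even
lanes and adds in odd lanes, so the second sequence computes addsub(a, b*c),
which is exactly what foo() above does.

```c
#include <assert.h>

/* Scalar model (illustration only) of what vaddsubpd computes:
   even lanes subtract, odd lanes add.  */
static void
addsub_pd (const double *x, const double *y, double *out, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = (i & 1) ? x[i] + y[i] : x[i] - y[i];
}

/* The addsub variant of foo(): out = addsub (a, b * c).  */
static void
foo_model (const double *a, const double *b, const double *c, double *out)
{
  double prod[4];
  for (int i = 0; i < 4; i++)
    prod[i] = b[i] * c[i];
  addsub_pd (a, prod, out, 4);
}
```

This reproduces c[0] = a[0] - b[0]*c[0], c[1] = a[1] + b[1]*c[1], and so on,
matching the vmulpd + vaddsubpd sequence.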


* [Bug target/101296] Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto
  2021-07-02 12:17 [Bug target/101296] New: Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto jamborm at gcc dot gnu.org
  2021-07-02 12:26 ` [Bug target/101296] " rguenth at gcc dot gnu.org
@ 2021-07-05  9:20 ` rguenth at gcc dot gnu.org
  2021-07-05  9:30 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-07-05  9:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101296

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Samples: 884K of event 'cycles:u', Event count (approx.): 967510000841
Overhead       Samples  Command          Shared Object             Symbol
  13.76%        119196  milc_peak.amd64  milc_peak.amd64-m64-mine  [.] u_shift_fermion
  10.08%         87085  milc_base.amd64  milc_base.amd64-m64-mine  [.] add_force_to_mom
   9.93%         85891  milc_base.amd64  milc_base.amd64-m64-mine  [.] u_shift_fermion
   9.38%         81331  milc_peak.amd64  milc_peak.amd64-m64-mine  [.] add_force_to_mom
   9.03%         82570  milc_peak.amd64  milc_peak.amd64-m64-mine  [.] mult_su3_na
   8.55%         77803  milc_base.amd64  milc_base.amd64-m64-mine  [.] mult_su3_na
   7.41%         65641  milc_peak.amd64  milc_peak.amd64-m64-mine  [.] mult_su3_nn
   6.26%         55314  milc_base.amd64  milc_base.amd64-m64-mine  [.] mult_su3_nn
   1.48%         12876  milc_peak.amd64  milc_peak.amd64-m64-mine  [.] mult_su3_an
   1.42%         12625  milc_base.amd64  milc_base.amd64-m64-mine  [.] imp_gauge_force.constprop.0
   1.18%         10602  milc_peak.amd64  milc_peak.amd64-m64-mine  [.] imp_gauge_force.constprop.0
   1.00%          8853  milc_base.amd64  milc_base.amd64-m64-mine  [.] mult_su3_mat_vec_sum_4dir
   0.94%          8343  milc_peak.amd64  milc_peak.amd64-m64-mine  [.] mult_su3_mat_vec_sum_4dir
   0.94%          8156  milc_base.amd64  milc_base.amd64-m64-mine  [.] mult_su3_an

The odd thing is that, for example, mult_su3_an reports vastly different
numbers of cycles although the assembly is 1:1 identical.

There are in total 16 vaddsubpd instructions in the new variant, in the
symbols add_force_to_mom (1) and mult_su3_nn (15), but that doesn't
explain the difference seen above.

There are more detected ADDSUB patterns, but they do not materialize in the
end.  Still, there is some effect on RA and scheduling in functions like
u_shift_fermion, but the vectorizer dumps do not reveal anything interesting
for this example either.

I was using the following to disable the added pattern:

diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 2671f91972d..388b185dc7b 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -1510,7 +1510,7 @@ addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
 {
   slp_tree node = *node_;
   if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
-      || SLP_TREE_CHILDREN (node).length () != 2)
+      || SLP_TREE_CHILDREN (node).length () != 2 || 1)
     return NULL;

   /* Match a blend of a plus and a minus op with the same number of plus and


To sum up - I have no idea why performance has regressed.


* [Bug target/101296] Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto
  2021-07-02 12:17 [Bug target/101296] New: Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto jamborm at gcc dot gnu.org
  2021-07-02 12:26 ` [Bug target/101296] " rguenth at gcc dot gnu.org
  2021-07-05  9:20 ` rguenth at gcc dot gnu.org
@ 2021-07-05  9:30 ` rguenth at gcc dot gnu.org
  2021-07-05  9:36 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-07-05  9:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101296

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 51104
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51104&action=edit
mult_su3_nn testcase

This is the function that contains (nearly) all of the vaddsubpd
instructions, and many of them.

With the addsub pattern we have 15 addsub, 33 fma, 51 mul, 14 add and 3 sub,
while without the pattern we have zero addsub, 54 fma, 54 mul, 32 add and 9
sub.  Detecting fmaddsub directly in the vectorizer might be worthwhile.


* [Bug target/101296] Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto
  2021-07-02 12:17 [Bug target/101296] New: Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto jamborm at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2021-07-05  9:30 ` rguenth at gcc dot gnu.org
@ 2021-07-05  9:36 ` rguenth at gcc dot gnu.org
  2021-07-06 13:02 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-07-05  9:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101296

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Disabling vectorization for mult_su3_nn (the one with the vaddsubpd
instructions) still reproduces the regression:

433.milc         9180        126       73.1 *    9180        133       69.2 *  

and thus a 5% slowdown (a SPEC ratio of 69.2 vs. 73.1).


* [Bug target/101296] Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto
  2021-07-02 12:17 [Bug target/101296] New: Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto jamborm at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2021-07-05  9:36 ` rguenth at gcc dot gnu.org
@ 2021-07-06 13:02 ` rguenth at gcc dot gnu.org
  2021-07-07  8:31 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-07-06 13:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101296

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so one interesting difference is the following (these are all of the
-fopt-info-vec differences):

-s_m_a_mat.c:18:18: optimized: basic block part vectorized using 32 byte vectors
-s_m_a_mat.c:18:18: optimized: basic block part vectorized using 32 byte vectors
-s_m_a_mat.c:18:18: optimized: basic block part vectorized using 32 byte vectors
-s_m_a_mat.c:18:18: optimized: basic block part vectorized using 32 byte vectors
-s_m_a_mat.c:18:18: optimized: basic block part vectorized using 32 byte vectors
+m_mat_nn.c:90:17: optimized: basic block part vectorized using 16 byte vectors

The +m_mat_nn.c:90:17 location is mult_su3_nn, while -s_m_a_mat.c:18:18 is
scalar_mult_add_su3_matrix, which is inlined at all call sites.  The missing
cases are all inlined into the function update_u.

The odd thing is that we're seeing changes in .vect of update_u like

@@ -3426,46 +3334,40 @@
   # DEBUG j => 0
   # DEBUG BEGIN_STMT
   # DEBUG BEGIN_STMT
-  _918 = MEM <struct site> [(struct su3_matrix *)s_103].link[dir_67].e[0][0].real;
   _919 = temp1.e[0][0].real;
   _920 = t5_12 * _919;
-  _921 = _918 + _920;
+  _921 = _920 + _1023;
   temp2.e[0][0].real = _921;
   # DEBUG BEGIN_STMT
-  _923 = MEM <struct site> [(struct su3_matrix *)s_103].link[dir_67].e[0][0].imag;
   _924 = temp1.e[0][0].imag;
   _925 = t5_12 * _924;
-  _926 = _923 + _925;
+  _926 = _925 + _1028;
...

which in the end result in fewer DRs going into SLP and thus a different
outcome there.  This difference already starts in the cunrolli dump!?  Dump
differences are like

+ipa-modref: call stmt mult_su3_nn (&htemp, link_24, &temp1);
+ipa-modref: call to mult_su3_nn/1705 does not clobber base: temp2 alias sets: 6->5
...
 Value numbering stmt = _938 = link_24->e[i_915][2].real;
-Setting value number of _938 to _938 (changed)
-Making available beyond BB152 _938 for value _938
+ipa-modref: call stmt mult_su3_nn (&htemp, &temp2, &temp1);
+ipa-modref: call to mult_su3_nn/1705 does not clobber base: MEM <struct site> [(struct su3_matrix *)s_5] alias sets: 6->5
+ipa-modref: call stmt mult_su3_nn (&htemp, link_24, &temp1);
+ipa-modref: call to mult_su3_nn/1705 does not clobber base: MEM <struct site> [(struct su3_matrix *)s_5] alias sets: 6->5
+Setting value number of _938 to _1043 (changed)
+_1043 is available for _1043
+Replaced link_24->e[i_915][2].real with _1043 in all uses of _938 = link_24->e[i_915][2].real;

It's really odd: the WPA and LTRANS modref dumps do not show any difference,
but the above looks like the IPA summary is once available and once not.  Ah,
the late modref pass results spill over, and it looks like we "improve" here:

   loads:
     Limits: 32 bases, 16 refs
-      Base 0: alias set 6
+      Base 0: alias set 5
+        Ref 0: alias set 5
+          Every access
+      Base 1: alias set 6
         Ref 0: alias set 5
           Every access
   stores:
     Limits: 32 bases, 16 refs
-      Base 0: alias set 6
+      Base 0: alias set 5
         Ref 0: alias set 5
-          Every access
+          access: Parm 2 param offset:0 offset:0 size:128 max_size:128
+          access: Parm 2 param offset:16 offset:0 size:128 max_size:128
+          access: Parm 2 param offset:48 offset:0 size:128 max_size:128
+          access: Parm 2 param offset:64 offset:0 size:128 max_size:128
+          access: Parm 2 param offset:112 offset:0 size:128 max_size:128
+      Base 1: alias set 6
+        Ref 0: alias set 5
+          access: Parm 2 param offset:0 offset:256 size:64 max_size:64
+          access: Parm 2 param offset:0 offset:320 size:64 max_size:64
+          access: Parm 2 param offset:0 offset:640 size:64 max_size:64
+          access: Parm 2 param offset:0 offset:704 size:64 max_size:64
+          access: Parm 2 param offset:0 offset:768 size:64 max_size:64
+          access: Parm 2 param offset:0 offset:832 size:64 max_size:64
+          access: Parm 2 param offset:0 offset:1024 size:64 max_size:64
+          access: Parm 2 param offset:0 offset:1088 size:64 max_size:64
   parm 0 flags: nodirectescape
   parm 1 flags: nodirectescape
   parm 2 flags: direct noescape nodirectescape
 void mult_su3_nn (struct su3_matrix * a, struct su3_matrix * b, struct su3_matrix * c)

I'm not sure what "Every access" means but I suppose it's "bad" here.  Maybe
it's

  - Analyzing load: b_10(D)->e[2][1].real
    - Recording base_set=6 ref_set=5 parm=1
---param param=modref-max-accesses limit reached
  - Analyzing load: b_10(D)->e[2][1].imag
    - Recording base_set=6 ref_set=5 parm=1
... (a lot) ...
+--param param=modref-max-accesses limit reached
  - Analyzing load: a_7(D)->e[1][1].imag
    - Recording base_set=6 ref_set=5 parm=0
  - ECF_CONST | ECF_NOVOPS, ignoring all stores and all loads except for args.

so eventually vectorizing helps reduce the number of accesses and thus
avoid running into this case?  Using --param modref-max-accesses=64 avoids
the differences in vectorizing besides the expected

+m_mat_nn.c:90:17: optimized: basic block part vectorized using 16 byte vectors

-fno-ipa-modref does the trick as well.  But unfortunately neither manages
to produce binaries that fix the runtime difference or make the perf
report any clearer :/


* [Bug target/101296] Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto
  2021-07-02 12:17 [Bug target/101296] New: Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto jamborm at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2021-07-06 13:02 ` rguenth at gcc dot gnu.org
@ 2021-07-07  8:31 ` rguenth at gcc dot gnu.org
  2021-08-22 19:26 ` hubicka at gcc dot gnu.org
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-07-07  8:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101296

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, there's no effect of the change visible on Haswell.


* [Bug target/101296] Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto
  2021-07-02 12:17 [Bug target/101296] New: Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto jamborm at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2021-07-07  8:31 ` rguenth at gcc dot gnu.org
@ 2021-08-22 19:26 ` hubicka at gcc dot gnu.org
  2021-10-07 15:42 ` hubicka at gcc dot gnu.org
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: hubicka at gcc dot gnu.org @ 2021-08-22 19:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101296

--- Comment #7 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
"Every access" means that we no longer track individual bases+offsets+sizes,
and everything matching the base/ref alias set will be considered conflicting.

I planned to implement smarter merging of accesses so we do not run out of
the limits for such sequential cases.  Will look into it.
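The merging idea could look something like the following sketch (purely
illustrative: the `access` struct and `merge_accesses` are hypothetical names,
not GCC's actual modref data structures), which collapses overlapping or
adjacent parameter accesses before the per-parameter limit is hit:

```c
#include <assert.h>

/* Hypothetical sketch of merging overlapping/adjacent accesses so a long
   sequential access pattern collapses into one entry instead of exhausting
   a limit like --param modref-max-accesses.  Not GCC's implementation.  */
typedef struct { long offset; long size; } access;

/* Merge in place; 'a' must be sorted by offset.  Returns the new count.  */
static int
merge_accesses (access *a, int n)
{
  if (n == 0)
    return 0;
  int out = 0;
  for (int i = 1; i < n; i++)
    {
      long end = a[out].offset + a[out].size;
      if (a[i].offset <= end)
        {
          /* Overlapping or adjacent: extend the current entry.  */
          long new_end = a[i].offset + a[i].size;
          if (new_end > end)
            a[out].size = new_end - a[out].offset;
        }
      else
        a[++out] = a[i];  /* Disjoint: start a new entry.  */
    }
  return out + 1;
}
```

With the Parm 2 store offsets from the dump in comment #5 (0, 16, 48, 64 and
112, each of size 128), all five entries collapse into a single [0, 240)
access, well under any reasonable limit.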


* [Bug target/101296] Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto
  2021-07-02 12:17 [Bug target/101296] New: Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto jamborm at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2021-08-22 19:26 ` hubicka at gcc dot gnu.org
@ 2021-10-07 15:42 ` hubicka at gcc dot gnu.org
  2021-10-08  6:56 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: hubicka at gcc dot gnu.org @ 2021-10-07 15:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101296

--- Comment #8 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
So smarter merging in modref is now implemented ;)


* [Bug target/101296] Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto
  2021-07-02 12:17 [Bug target/101296] New: Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto jamborm at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2021-10-07 15:42 ` hubicka at gcc dot gnu.org
@ 2021-10-08  6:56 ` rguenth at gcc dot gnu.org
  2021-10-14 16:42 ` jamborm at gcc dot gnu.org
  2023-01-31 11:26 ` jamborm at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-10-08  6:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101296

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|rguenth at gcc dot gnu.org         |unassigned at gcc dot gnu.org
             Status|ASSIGNED                    |NEW

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
433.milc on that specific LNT instance seems to jump up and down; it had
recovered from the originally reported regression but is now worse than
ever, regressing between Sep. 27 and 28.

But as said, on Zen2, while the changes are reproducible, perf is almost
useless there, pointing to code that's exactly the same :/


* [Bug target/101296] Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto
  2021-07-02 12:17 [Bug target/101296] New: Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto jamborm at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2021-10-08  6:56 ` rguenth at gcc dot gnu.org
@ 2021-10-14 16:42 ` jamborm at gcc dot gnu.org
  2023-01-31 11:26 ` jamborm at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: jamborm at gcc dot gnu.org @ 2021-10-14 16:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101296

--- Comment #10 from Martin Jambor <jamborm at gcc dot gnu.org> ---
Looking at the LNT graph, I guess this bug should be either closed or suspended
(not sure what the suspended state means for the blocked metabug, so probably
closed).

Yeah, it's weird.


* [Bug target/101296] Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto
  2021-07-02 12:17 [Bug target/101296] New: Addition of x86 addsub SLP pattern slowed down 433.milc by 12% on znver2 with -Ofast -flto jamborm at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2021-10-14 16:42 ` jamborm at gcc dot gnu.org
@ 2023-01-31 11:26 ` jamborm at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: jamborm at gcc dot gnu.org @ 2023-01-31 11:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101296

Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID

--- Comment #11 from Martin Jambor <jamborm at gcc dot gnu.org> ---
Probably just weirdness of the universe we live in rather than a bug.  At
least the LNT graph looks good now too.

