[Bug tree-optimization/116463] New: [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5

* [Bug tree-optimization/116463] New: [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5
@ 2024-08-22 19:58 pinskia at gcc dot gnu.org
  2024-08-22 19:59 ` [Bug tree-optimization/116463] " pinskia at gcc dot gnu.org
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-08-22 19:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463

            Bug ID: 116463
           Summary: [15 Regression] fast-math-complex-mls-{double,float}.c
                    fail after r15-3087-gb07f8a301158e5
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization, testsuite-fail
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64-*-*

make check-gcc RUNTESTFLAGS="complex.exp=fast-math-complex-mls-*.c"
FAIL: gcc.dg/vect/complex/fast-math-complex-mls-double.c scan-tree-dump vect "Found COMPLEX_FMA"
FAIL: gcc.dg/vect/complex/fast-math-complex-mls-float.c scan-tree-dump vect "Found COMPLEX_FMA"


* [Bug tree-optimization/116463] [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5
  2024-08-22 19:58 [Bug tree-optimization/116463] New: [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5 pinskia at gcc dot gnu.org
@ 2024-08-22 19:59 ` pinskia at gcc dot gnu.org
  2024-08-22 20:09 ` pinskia at gcc dot gnu.org
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-08-22 19:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |15.0


* [Bug tree-optimization/116463] [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5
  2024-08-22 19:58 [Bug tree-optimization/116463] New: [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5 pinskia at gcc dot gnu.org
  2024-08-22 19:59 ` [Bug tree-optimization/116463] " pinskia at gcc dot gnu.org
@ 2024-08-22 20:09 ` pinskia at gcc dot gnu.org
  2024-08-22 20:18 ` pinskia at gcc dot gnu.org
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-08-22 20:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 58977
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58977&action=edit
Reduced testcase

options: `-ftree-vectorize -fno-tree-loop-distribute-patterns
-fno-vect-cost-model -fno-common -O2 -ffast-math -march=armv8.3-a`


* [Bug tree-optimization/116463] [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5
  2024-08-22 19:58 [Bug tree-optimization/116463] New: [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5 pinskia at gcc dot gnu.org
  2024-08-22 19:59 ` [Bug tree-optimization/116463] " pinskia at gcc dot gnu.org
  2024-08-22 20:09 ` pinskia at gcc dot gnu.org
@ 2024-08-22 20:18 ` pinskia at gcc dot gnu.org
  2024-08-22 20:20 ` pinskia at gcc dot gnu.org
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-08-22 20:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #58977|0                           |1
        is obsolete|                            |

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 58978
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58978&action=edit
Better testcase

Before the patch, fms180snd could be detected but fms180snd_1 could not, even
though both are the same function; they differ only in when the multiply by i
happens.

fms180snd_1 represents what fms180snd looks like after the patch.
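
To make "the same function, differing only in when the multiply by i happens"
concrete, here is a minimal sketch. It is not the attachment, which is not
reproduced in this thread; the function names, the trip count N and the
`c[i] -= a[i] * ...` shape are assumptions chosen for illustration.

```c
#include <complex.h>

#define N 16  /* Hypothetical trip count.  */

/* Rotate the second operand by 180 degrees (two multiplies by I)
   before the complex multiply.  */
void mls_rot_operand (const double complex *a, const double complex *b,
                      double complex *c)
{
  for (int i = 0; i < N; i++)
    c[i] -= a[i] * (b[i] * I * I);
}

/* Same arithmetic, but the rotation is applied to the product.  */
void mls_rot_product (const double complex *a, const double complex *b,
                      double complex *c)
{
  for (int i = 0; i < N; i++)
    c[i] -= (a[i] * b[i]) * I * I;
}
```

Both loops compute c[i] + a[i] * b[i], so a pattern matcher that accepts one
but not the other is sensitive to the statement order rather than to the
arithmetic.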


* [Bug tree-optimization/116463] [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5
  2024-08-22 19:58 [Bug tree-optimization/116463] New: [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5 pinskia at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2024-08-22 20:18 ` pinskia at gcc dot gnu.org
@ 2024-08-22 20:20 ` pinskia at gcc dot gnu.org
  2024-08-22 20:43 ` pinskia at gcc dot gnu.org
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-08-22 20:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tnfchris at gcc dot gnu.org
             Blocks|                            |53947
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-08-22
     Ever confirmed|0                           |1

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
+Tamar, who wrote the original complex vectorization support.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations


* [Bug tree-optimization/116463] [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5
  2024-08-22 19:58 [Bug tree-optimization/116463] New: [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5 pinskia at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2024-08-22 20:20 ` pinskia at gcc dot gnu.org
@ 2024-08-22 20:43 ` pinskia at gcc dot gnu.org
  2024-08-22 23:03 ` tnfchris at gcc dot gnu.org
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-08-22 20:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #58978|0                           |1
        is obsolete|                            |

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 58979
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58979&action=edit
Full testcase

Before the change, fms180snd_2a and fms180snd_1 could not be detected, even
though all three functions (including fms180snd) compute the same thing.

Note I think fms180snd_2a is more representative of what is done after the
patch for fms180snd rather than fms180snd_1.


* [Bug tree-optimization/116463] [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5
  2024-08-22 19:58 [Bug tree-optimization/116463] New: [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5 pinskia at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2024-08-22 20:43 ` pinskia at gcc dot gnu.org
@ 2024-08-22 23:03 ` tnfchris at gcc dot gnu.org
  2024-08-23 11:43 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-08-22 23:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463

--- Comment #5 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Yeah, this is because they generate different gimple sequences and thus
different SLP trees.
The core of the problem is there's no canonical form here, and a missing gimple
simplification rule:

  _33 = IMAGPART_EXPR <*_3> + ((REALPART_EXPR <*_5> * IMAGPART_EXPR <*_7>) +
(IMAGPART_EXPR <*_5> * REALPART_EXPR <*_7>));
vs
  _37 = IMAGPART_EXPR <*_3> - ((REALPART_EXPR <*_5> * -IMAGPART_EXPR <*_7>) +
(IMAGPART_EXPR <*_5> * -REALPART_EXPR <*_7>));

i.e. a - ((b * -c) + (d * -e)) == a + (b * c) + (d * e)

So probably in match.pd we should fold _37 into _33 which is a simpler form of
the same thing and it's better on scalar as well.
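
As a scalar illustration of that identity, the two shapes can be written as
plain C functions. This is only a sketch: the function names are made up, and
the operands a..e stand in for the REALPART/IMAGPART loads in the dumps above.

```c
/* Shape of _37: a - ((b * -c) + (d * -e)).  */
double negated_form (double a, double b, double c, double d, double e)
{
  return a - ((b * -c) + (d * -e));
}

/* Shape of _33: a + ((b * c) + (d * e)).  */
double simplified_form (double a, double b, double c, double d, double e)
{
  return a + ((b * c) + (d * e));
}
```

For inputs such as (1, 2, 3, 4, 5) both functions return 27; the simplified
form just gets there without the three negations.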

It would be better to finally introduce a vectorizer canonical form, for
instance the real part generates:

  _36 = (_31 - _30) + REALPART_EXPR <*_3>;
vs
  _32 = REALPART_EXPR <*_3> + (_26 - _27);

and this already is an additional thing to check, so it would be better if slp
build always puts complex parts consistently on one side of commutative
operations so we don't have to swap operands to check.
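
A toy model of what "consistently on one side" could mean is sketched here.
None of these types exist in GCC; toy_operand, toy_add and the choice of the
first operand as the canonical slot are all assumptions for illustration.

```c
#include <stdbool.h>

/* Hypothetical operand classification.  */
enum toy_kind { TOY_COMPLEX_PART_LOAD, TOY_OTHER };

struct toy_operand { enum toy_kind kind; int id; };

/* A commutative addition op0 + op1.  */
struct toy_add { struct toy_operand op0, op1; };

/* Put a REALPART/IMAGPART load, if there is exactly one, into op0.
   Returns true if the operands were swapped.  */
bool toy_canonicalize_add (struct toy_add *add)
{
  if (add->op0.kind != TOY_COMPLEX_PART_LOAD
      && add->op1.kind == TOY_COMPLEX_PART_LOAD)
    {
      struct toy_operand tmp = add->op0;
      add->op0 = add->op1;
      add->op1 = tmp;
      return true;
    }
  return false;
}
```

With such a rule both `(_31 - _30) + REALPART_EXPR <*_3>` and
`REALPART_EXPR <*_3> + (_26 - _27)` end up with the load first, so a matcher
only has to check one operand order.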

In any case, I have some patches in this area and can take a look when I'm
back, but I think the new expression should be simplified back into the old one.


* [Bug tree-optimization/116463] [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5
  2024-08-22 19:58 [Bug tree-optimization/116463] New: [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5 pinskia at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2024-08-22 23:03 ` tnfchris at gcc dot gnu.org
@ 2024-08-23 11:43 ` rguenth at gcc dot gnu.org
  2024-08-23 12:37 ` cvs-commit at gcc dot gnu.org
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-08-23 11:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think

  a - ((b * -c) + (d * -e))  ->  a + (b * c) + (d * e)

is a good simplification to be made, but it's difficult to do this with
canonicalization only.  Take a * -b -> -(a * b): the negate might
combine with other negates both downstream and upstream.  But for
a*-b + c * -d it might be more obvious to turn that into
-a*b - c*d.

Maybe reassoc can be of help here - IIRC it turns b * -c into
b * c * -1, undistribute_ops_list might get that.

Note one issue is that complex lowering leaves around dead stmts,
confusing reassoc and forwprop, in particular

-  _10 = COMPLEX_EXPR <_18, _6>;

stays around until reassoc.  Scheduling DCE for testing shows reassoc
does something.

It's update_complex_assignment that replaces existing complex
stmts with COMPLEX_EXPRs; we should possibly resort to simple_dce_from_worklist
to clean those up.  Let me try to do that.
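
For readers unfamiliar with the idea, worklist-driven DCE can be sketched on a
toy IR as follows. This is not GCC's simple_dce_from_worklist or its data
structures; toy_stmt, its fields and toy_dce_from_worklist are hypothetical
and only model "remove an unused def, then revisit its operands".

```c
#include <stdbool.h>

#define NO_OPERAND (-1)

/* Toy SSA statement: statement i defines the name _i.  */
struct toy_stmt
{
  int ops[2];             /* Indices of the stmts defining the operands.  */
  int num_uses;           /* Number of remaining uses of this def.  */
  bool has_side_effects;  /* E.g. a store; never removed.  */
  bool removed;
};

/* Process the N candidate defs in WORKLIST.  A candidate with no uses and
   no side effects is removed, and any operand whose use count drops to
   zero becomes a new candidate.  WORKLIST must have room for N plus the
   total number of stmts.  Returns the number of stmts removed.  */
int toy_dce_from_worklist (struct toy_stmt *stmts, int *worklist, int n)
{
  int num_removed = 0;
  while (n > 0)
    {
      struct toy_stmt *s = &stmts[worklist[--n]];
      if (s->removed || s->has_side_effects || s->num_uses != 0)
        continue;
      s->removed = true;
      num_removed++;
      for (int k = 0; k < 2; k++)
        {
          int op = s->ops[k];
          if (op != NO_OPERAND && --stmts[op].num_uses == 0)
            worklist[n++] = op;
        }
    }
  return num_removed;
}
```

Seeding the worklist with the def of a dead `COMPLEX_EXPR <_18, _6>` removes
that statement and drops one use from each of its operands, which is the
property that matters for passes counting uses afterwards.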


* [Bug tree-optimization/116463] [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5
  2024-08-22 19:58 [Bug tree-optimization/116463] New: [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5 pinskia at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2024-08-23 11:43 ` rguenth at gcc dot gnu.org
@ 2024-08-23 12:37 ` cvs-commit at gcc dot gnu.org
  2024-08-23 12:46 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-08-23 12:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463

--- Comment #7 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:de1923f9f4d5344694c22ca883aeb15caf635734

commit r15-3128-gde1923f9f4d5344694c22ca883aeb15caf635734
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Aug 23 13:44:29 2024 +0200

    tree-optimization/116463 - complex lowering leaves around dead stmts

    Complex lowering generally replaces existing complex defs with
    COMPLEX_EXPRs but those might be dead when it can always refer to
    components from the lattice.  This in turn can pessimize followup
    transforms like forwprop and reassoc, the following makes sure to
    get rid of dead COMPLEX_EXPRs generated by using
    simple_dce_from_worklist.

            PR tree-optimization/116463
            * tree-complex.cc: Include tree-ssa-dce.h.
            (dce_worklist): New global.
            (update_complex_assignment): Add SSA def to the DCE worklist.
            (tree_lower_complex): Perform DCE.


* [Bug tree-optimization/116463] [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5
  2024-08-22 19:58 [Bug tree-optimization/116463] New: [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5 pinskia at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2024-08-23 12:37 ` cvs-commit at gcc dot gnu.org
@ 2024-08-23 12:46 ` rguenth at gcc dot gnu.org
  2024-08-23 23:03 ` pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-08-23 12:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
As of r15-3128-gde1923f9f4d534 now

FAIL: gcc.target/i386/avx512fp16-vector-complex-float.c scan-assembler-not vfmadd[123]*ph[ \\\\t]
FAIL: gcc.target/i386/avx512fp16-vector-complex-float.c scan-assembler-times vfmaddcph[ \\\\t] 1
FAIL: gcc.target/i386/part-vect-complexhf.c scan-assembler-times vfmaddcph[ \\\\t] 1

fail which look similar to the aarch64 fails (I have no idea if the patch
helped for those).

For the first test it's fma0 which is no longer vectorized as

        vmovdqu16       (%rdx), %zmm0
        vmovdqu16       (%rsi), %zmm1
        vfmaddcph       (%rdi), %zmm1, %zmm0
        vmovdqu16       %zmm0, (%rdx)

but

        vmovdqu16       (%rsi), %zmm0
        vmovdqu16       (%rdi), %zmm2
        movl    $1431655765, %eax
        kmovd   %eax, %k1
        vpshufb .LC1(%rip), %zmm0, %zmm1
        vfmadd213ph     (%rdx), %zmm2, %zmm1
        vpshufb .LC2(%rip), %zmm0, %zmm0
        vpshufb .LC0(%rip), %zmm2, %zmm3
        vmovdqa64       %zmm0, %zmm2
        vfmadd132ph     %zmm3, %zmm1, %zmm2
        vfnmadd132ph    %zmm3, %zmm1, %zmm0
        vpblendmw       %zmm0, %zmm2, %zmm0{%k1}
        vmovdqu16       %zmm0, (%rdx)

where instead of

note:    Found COMPLEX_FMA pattern in SLP tree

we have

note:    Found VEC_ADDSUB pattern in SLP tree
note:    Target does not support VEC_ADDSUB for vector type vector(32) _Float16 

with the IL difference being (- is good, + is bad)

  _12 = REALPART_EXPR <*_3>;
  _11 = IMAGPART_EXPR <*_3>;
...
@@ -46,10 +46,10 @@
   _27 = _19 * _25;
   _28 = _20 * _25;
   _29 = _19 * _24;
-  _30 = _26 - _27;
-  _31 = _28 + _29;
-  _32 = _12 + _30;
-  _33 = _11 + _31;
+  _9 = _12 + _26;
+  _10 = _11 + _28;
+  _32 = _9 - _27;
+  _33 = _10 + _29;
   REALPART_EXPR <*_3> = _32;
   IMAGPART_EXPR <*_3> = _33;
   i_18 = i_21 + 1;

which is a different association, enabled by deleting the dead uses that
confused reassoc.
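
The two associations in the diff can be restated in plain C as a sketch. The
mapping of the SSA names onto a/b components, and the definition of _26
(elided in the diff, assumed here to be the real*real product), are
assumptions; the struct and function names are made up.

```c
struct cplx { double re, im; };

/* "-" lines: the complex product is formed first (_30, _31) and then
   added to the accumulator.  */
struct cplx fma_product_first (struct cplx acc, struct cplx a, struct cplx b)
{
  double t26 = a.re * b.re, t27 = a.im * b.im;
  double t28 = a.re * b.im, t29 = a.im * b.re;
  double t30 = t26 - t27;
  double t31 = t28 + t29;
  struct cplx r = { acc.re + t30, acc.im + t31 };
  return r;
}

/* "+" lines: the accumulator is folded into the chain early (_9, _10),
   so no statement holds the complete product any more.  */
struct cplx fma_accumulator_first (struct cplx acc, struct cplx a,
                                   struct cplx b)
{
  double t26 = a.re * b.re, t27 = a.im * b.im;
  double t28 = a.re * b.im, t29 = a.im * b.re;
  double t9 = acc.re + t26;
  double t10 = acc.im + t28;
  struct cplx r = { t9 - t27, t10 + t29 };
  return r;
}
```

Both compute acc + a * b; only the first keeps the `_26 - _27` / `_28 + _29`
pair that reads as a complex multiply feeding an add.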


* [Bug tree-optimization/116463] [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5
  2024-08-22 19:58 [Bug tree-optimization/116463] New: [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5 pinskia at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2024-08-23 12:46 ` rguenth at gcc dot gnu.org
@ 2024-08-23 23:03 ` pinskia at gcc dot gnu.org
  2024-08-25 20:21 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-08-23 23:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #8)
> fail which look similar to the aarch64 fails (I have no idea if the patch
> helped for those).

The aarch64 ones still fail. And yes they look very similar.


* [Bug tree-optimization/116463] [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5
  2024-08-22 19:58 [Bug tree-optimization/116463] New: [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5 pinskia at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2024-08-23 23:03 ` pinskia at gcc dot gnu.org
@ 2024-08-25 20:21 ` pinskia at gcc dot gnu.org
  2024-08-28  8:06 ` tnfchris at gcc dot gnu.org
  2024-08-28  8:13 ` rguenth at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-08-25 20:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105095

--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The current failures/XPASSes for aarch64 are:
XPASS: gcc.dg/vect/complex/fast-math-complex-add-half-float.c scan-tree-dump-times vect "stmt.*COMPLEX_ADD_ROT270" 1
XPASS: gcc.dg/vect/complex/fast-math-complex-add-half-float.c scan-tree-dump-times vect "stmt.*COMPLEX_ADD_ROT90" 1
FAIL: gcc.dg/vect/complex/fast-math-complex-mls-double.c scan-tree-dump vect "Found COMPLEX_ADD_ROT270"
FAIL: gcc.dg/vect/complex/fast-math-complex-mls-float.c scan-tree-dump vect "Found COMPLEX_ADD_ROT270"
FAIL: gcc.dg/vect/complex/fast-math-complex-mls-half-float.c scan-tree-dump vect "Found COMPLEX_ADD_ROT270"


* [Bug tree-optimization/116463] [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5
  2024-08-22 19:58 [Bug tree-optimization/116463] New: [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5 pinskia at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2024-08-25 20:21 ` pinskia at gcc dot gnu.org
@ 2024-08-28  8:06 ` tnfchris at gcc dot gnu.org
  2024-08-28  8:13 ` rguenth at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-08-28  8:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463

--- Comment #11 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #6)
> I think
> 
>   a - ((b * -c) + (d * -e))  ->  a + (b * c) + (d * e)
> 
> is a good simplification to be made, but it's difficult to do this with
> canonicalization only.  Like a * -b -> -(a * b) as the negate might
> combine with both other negates down and upstream.  But for
> a*-b + c * -d it might be more obvious to turn that into
> -a*b - c*d.

Yeah, my expectation was that this would be an easier transform to avoid
the sharing problem we discussed before and that indeed the transform
looks at the entire chain not just transforming a * -b.

a*-b + c * -d -> -a*b - c*d

has the property of still maintaining the FMS and FMNS chains and can
get further simplified in the above case.

> 
> Maybe reassoc can be of help here - IIRC it turns b * -c into
> b * c * -1, undistribute_ops_list might get that.

Hmm, I see, but don't we have a higher chance that folding will just
fold it back into the multiply?

For this to work we'd have to do

  (b * -c) + (d * -e) -> -(b * c + d * e)

in one transformation, no? Since I'd imagine

  (b * c * -1) + (d * e * -1)

would just be undone by match.pd?

> 
> Note one issue is that complex lowering leaves around dead stmts,
> confusing reassoc and forwprop, in particular
> 
> -  _10 = COMPLEX_EXPR <_18, _6>;
> 
> stay around until reassoc.  scheduling dce for testing shows reassoc
> does something.
> 
> It's update_complex_assignment who replaces existing complex
> stmts with COMPLEX_EXPRs, we should possibly resort do
> simple_dce_from_worklist
> to clean those.  Let me try to do that.

Thanks!


* [Bug tree-optimization/116463] [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5
  2024-08-22 19:58 [Bug tree-optimization/116463] New: [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5 pinskia at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2024-08-28  8:06 ` tnfchris at gcc dot gnu.org
@ 2024-08-28  8:13 ` rguenth at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-08-28  8:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #11)
> (In reply to Richard Biener from comment #6)
> > I think
> > 
> >   a - ((b * -c) + (d * -e))  ->  a + (b * c) + (d * e)
> > 
> > is a good simplification to be made, but it's difficult to do this with
> > canonicalization only.  Like a * -b -> -(a * b) as the negate might
> > combine with both other negates down and upstream.  But for
> > a*-b + c * -d it might be more obvious to turn that into
> > -a*b - c*d.
> 
> Yeah, my expectation was that this would be an easier transform to avoid
> the sharing problem we discussed before and that indeed the transform
> looks at the entire chain not just transforming a * -b.
> 
> a*-b + c * -d -> -a*b - c*d
> 
> has the property of still maintaining the FMS and FMNS chains and can
> get further simplified in the above case.
> 
> > 
> > Maybe reassoc can be of help here - IIRC it turns b * -c into
> > b * c * -1, undistribute_ops_list might get that.
> 
> hmm I see, but don't we have a higher chance that folding will just
> fold it back into the multiply?
> 
> For this to work we'd have to do
> 
>   (b * -c) + (d * -e) -> -(b * c + d * e)
> 
> in one transformation no? since I'd imagine
> 
>   (b * c * -1) + (d * e * -1)
> 
> would just be undone by match.pd?

The * -1 is something reassoc does only internally; it then distributes
that back to generate an outer plus or minus.
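
The bookkeeping can be pictured with a small sketch: strip the negates from
each product, track the sign separately, and apply it as an outer add or
subtract. This is only a model of the idea, not reassoc's implementation; the
types and eval_with_outer_sign are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* A multiplication operand as written: VALUE, optionally negated.  */
struct operand { double value; bool negated; };

/* One addend of an addition chain: lhs * rhs.  */
struct product { struct operand lhs, rhs; };

/* Evaluate ACC + (t0 + t1 + ...), or ACC - (t0 + t1 + ...) when
   SUBTRACT_SUM is set.  Each product is computed without its negates;
   its sign (the internal "* -1") decides whether the magnitude is
   added to or subtracted from the running result.  */
double eval_with_outer_sign (double acc, const struct product *terms,
                             size_t n, bool subtract_sum)
{
  for (size_t i = 0; i < n; i++)
    {
      bool negative = (terms[i].lhs.negated != terms[i].rhs.negated)
                      != subtract_sum;
      double magnitude = terms[i].lhs.value * terms[i].rhs.value;
      acc = negative ? acc - magnitude : acc + magnitude;
    }
  return acc;
}
```

For a - ((b * -c) + (d * -e)) every term carries one negate and the sum is
subtracted, so both signs cancel and the evaluation is a + b*c + d*e with no
negation left anywhere.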

Note for the x86 testcases there isn't any such simplification opportunity,
but the reassoc heuristics correctly mangle the expression to no longer
match the expected SLP complex patterns.  There's also the re-association
of chains done by SLP discovery itself which could be a problem.

I'd say fixing this fallout is quite low priority at the moment.  The
simple cases could be re-associated by reassoc into a recognizable
complex op order, but even there it's a bit difficult as the operations
span two "chains" (a multiplication and an addition chain), which reassoc
looks at separately (apart from undistribution).

