* [Bug tree-optimization/116463] [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5
2024-08-22 19:58 [Bug tree-optimization/116463] New: [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5 pinskia at gcc dot gnu.org
@ 2024-08-22 19:59 ` pinskia at gcc dot gnu.org
2024-08-22 20:09 ` pinskia at gcc dot gnu.org
` (11 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-08-22 19:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |15.0
From: pinskia at gcc dot gnu.org @ 2024-08-22 20:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 58977
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58977&action=edit
Reduced testcase
options: `-ftree-vectorize -fno-tree-loop-distribute-patterns
-fno-vect-cost-model -fno-common -O2 -ffast-math -march=armv8.3-a`
From: pinskia at gcc dot gnu.org @ 2024-08-22 20:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #58977|0 |1
is obsolete| |
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 58978
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58978&action=edit
Better testcase
Before the patch, fms180snd could be detected but fms180snd_1 could not,
even though both are the same function; the only difference is when the
multiply by i happens.
fms180snd_1 represents what happens to fms180snd after the patch.
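A minimal sketch of why the position of the multiply by i does not change the result. The attachment is not reproduced in this thread, so the two functions below are hypothetical stand-ins: they assume, purely for illustration, that the "180" variant multiplies by i twice and that the variants differ only in whether that happens before or after the complex multiply.

```python
# Hypothetical stand-ins for the two testcase variants; the real attachment
# is not shown in this thread, so the exact expressions are an assumption.

def fms_rotate_first(a, b, c):
    # Rotate b by 180 degrees (multiply by i twice) before the multiply.
    return c - a * (b * 1j * 1j)

def fms_rotate_last(a, b, c):
    # Multiply first, then apply the two multiplications by i to the product.
    return c - (a * b) * 1j * 1j

# Small integer-valued inputs keep the floating-point arithmetic exact.
a, b, c = complex(2, 3), complex(-1, 4), complex(5, -6)
first = fms_rotate_first(a, b, c)
last = fms_rotate_last(a, b, c)
```

Both variants compute the same value as `c + a * b`, so a pattern detector that recognizes only one of them is missing an equivalent form.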
From: pinskia at gcc dot gnu.org @ 2024-08-22 20:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |tnfchris at gcc dot gnu.org
Blocks| |53947
Status|UNCONFIRMED |NEW
Last reconfirmed| |2024-08-22
Ever confirmed|0 |1
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
CCing Tamar, who wrote the original complex vectorization support.
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
From: pinskia at gcc dot gnu.org @ 2024-08-22 20:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #58978|0 |1
is obsolete| |
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 58979
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58979&action=edit
Full testcase
Before the change, fms180snd_2a and fms180snd_1 could not be detected even
though all of these functions are equivalent.
Note that I think fms180snd_2a is more representative than fms180snd_1 of
what is done to fms180snd after the patch.
From: tnfchris at gcc dot gnu.org @ 2024-08-22 23:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
--- Comment #5 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Yeah, this is because they generate different gimple sequences and thus
different SLP trees.
The core of the problem is there's no canonical form here, and a missing gimple
simplification rule:
_33 = IMAGPART_EXPR <*_3> + ((REALPART_EXPR <*_5> * IMAGPART_EXPR <*_7>) +
(IMAGPART_EXPR <*_5> * REALPART_EXPR <*_7>));
vs
_37 = IMAGPART_EXPR <*_3> - ((REALPART_EXPR <*_5> * -IMAGPART_EXPR <*_7>) +
(IMAGPART_EXPR <*_5> * -REALPART_EXPR <*_7>));
i.e. a - ((b * -c) + (d * -e)) == a + (b * c) + (d * e)
So probably in match.pd we should fold _37 into _33 which is a simpler form of
the same thing and it's better on scalar as well.
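A quick sanity check of the identity, as a sketch only: the two functions mirror the shapes of _37 and _33, with a, b, c, d, e standing for the five operands. Integers are used so that the arithmetic is exact; this says nothing about floating-point corner cases such as signed zeros.

```python
from itertools import product

def new_form(a, b, c, d, e):
    # Shape of _37: a - ((b * -c) + (d * -e))
    return a - ((b * -c) + (d * -e))

def old_form(a, b, c, d, e):
    # Shape of _33: a + ((b * c) + (d * e))
    return a + ((b * c) + (d * e))

# Exhaustively compare on a small integer grid, where arithmetic is exact.
values = range(-2, 3)
all_equal = all(new_form(*v) == old_form(*v) for v in product(values, repeat=5))
```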
It would be better to finally introduce a vectorizer canonical form, for
instance the real part generates:
_36 = (_31 - _30) + REALPART_EXPR <*_3>;
vs
_32 = REALPART_EXPR <*_3> + (_26 - _27);
and this already is an additional thing to check, so it would be better if slp
build always puts complex parts consistently on one side of commutative
operations so we don't have to swap operands to check.
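The idea of a canonical operand order can be sketched on a toy expression tree. This is not how SLP build is implemented; the tuple representation and the helper names `is_complex_part` and `canonicalize` are assumptions made for illustration only.

```python
COMMUTATIVE = {"+", "*"}

def is_complex_part(node):
    # Leaves are strings; a complex-part load is spelled as in the dumps above.
    return isinstance(node, str) and node.startswith(("REALPART_EXPR", "IMAGPART_EXPR"))

def canonicalize(node):
    # Recursively place complex-part loads on the left of commutative operations.
    if isinstance(node, str):
        return node
    op, lhs, rhs = node
    lhs, rhs = canonicalize(lhs), canonicalize(rhs)
    if op in COMMUTATIVE and is_complex_part(rhs) and not is_complex_part(lhs):
        lhs, rhs = rhs, lhs
    return (op, lhs, rhs)

# The two real-part shapes quoted above.
expr_36 = ("+", ("-", "_31", "_30"), "REALPART_EXPR <*_3>")
expr_32 = ("+", "REALPART_EXPR <*_3>", ("-", "_26", "_27"))
```

After canonicalization both expressions have the complex part as the first operand of the addition, so a matcher would only need to check one operand order.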
In any case, I have some patches in this area and can take a look when I'm
back, but think the new expression should be simplified back into the old one.
From: rguenth at gcc dot gnu.org @ 2024-08-23 11:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think
a - ((b * -c) + (d * -e)) -> a + (b * c) + (d * e)
is a good simplification to be made, but it's difficult to do this with
canonicalization only. Like a * -b -> -(a * b) as the negate might
combine with both other negates down and upstream. But for
a*-b + c * -d it might be more obvious to turn that into
-a*b - c*d.
Maybe reassoc can be of help here - IIRC it turns b * -c into
b * c * -1, undistribute_ops_list might get that.
Note one issue is that complex lowering leaves around dead stmts,
confusing reassoc and forwprop, in particular
- _10 = COMPLEX_EXPR <_18, _6>;
stays around until reassoc. Scheduling DCE for testing shows that reassoc
does something.
It is update_complex_assignment that replaces existing complex
stmts with COMPLEX_EXPRs; we should possibly resort to simple_dce_from_worklist
to clean those up. Let me try to do that.
From: cvs-commit at gcc dot gnu.org @ 2024-08-23 12:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
--- Comment #7 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:de1923f9f4d5344694c22ca883aeb15caf635734
commit r15-3128-gde1923f9f4d5344694c22ca883aeb15caf635734
Author: Richard Biener <rguenther@suse.de>
Date: Fri Aug 23 13:44:29 2024 +0200
tree-optimization/116463 - complex lowering leaves around dead stmts
Complex lowering generally replaces existing complex defs with
COMPLEX_EXPRs but those might be dead when it can always refer to
components from the lattice. This in turn can pessimize followup
transforms like forwprop and reassoc, the following makes sure to
get rid of dead COMPLEX_EXPRs generated by using
simple_dce_from_worklist.
PR tree-optimization/116463
* tree-complex.cc: Include tree-ssa-dce.h.
(dce_worklist): New global.
(update_complex_assignment): Add SSA def to the DCE worklist.
(tree_lower_complex): Perform DCE.
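The effect of the change can be sketched with a toy worklist-driven dead code removal. This is a simplified model, not the code in tree-ssa-dce.cc: the dictionary-based statement table and the `simple_dce` helper are assumptions for illustration, and only `_10 = COMPLEX_EXPR <_18, _6>` is taken from the discussion above.

```python
def simple_dce(stmts, worklist):
    """Remove worklist definitions that have no remaining uses.

    stmts maps an SSA name to (opcode, operand names); names absent from the
    mapping are treated as live-in values. Side effects are not modelled.
    """
    stmts = dict(stmts)
    worklist = list(worklist)
    while worklist:
        name = worklist.pop()
        if name not in stmts:
            continue
        if any(name in ops for _, ops in stmts.values()):
            continue  # still used somewhere, keep it
        _, ops = stmts.pop(name)
        # Operands of a removed statement may have become dead as well.
        worklist.extend(ops)
    return stmts

stmts = {
    "_18": ("MULT_EXPR", ("_1", "_2")),
    "_6": ("MULT_EXPR", ("_3", "_4")),
    "_10": ("COMPLEX_EXPR", ("_18", "_6")),  # dead: nothing reads _10
    "_20": ("PLUS_EXPR", ("_18", "_6")),     # hypothetical remaining user
}
result = simple_dce(stmts, ["_10"])
```

In this model the dead COMPLEX_EXPR disappears, so `_18` and `_6` each keep only their real use, which is the kind of single-use situation that later passes such as forwprop and reassoc tend to rely on.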
From: rguenth at gcc dot gnu.org @ 2024-08-23 12:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
As of r15-3128-gde1923f9f4d534, the following now fail:
FAIL: gcc.target/i386/avx512fp16-vector-complex-float.c scan-assembler-not vfmadd[123]*ph[ \\\\t]
FAIL: gcc.target/i386/avx512fp16-vector-complex-float.c scan-assembler-times vfmaddcph[ \\\\t] 1
FAIL: gcc.target/i386/part-vect-complexhf.c scan-assembler-times vfmaddcph[ \\\\t] 1
These look similar to the aarch64 failures (I have no idea if the patch
helped for those).
For the first test it's fma0 which is no longer vectorized as
vmovdqu16 (%rdx), %zmm0
vmovdqu16 (%rsi), %zmm1
vfmaddcph (%rdi), %zmm1, %zmm0
vmovdqu16 %zmm0, (%rdx)
but
vmovdqu16 (%rsi), %zmm0
vmovdqu16 (%rdi), %zmm2
movl $1431655765, %eax
kmovd %eax, %k1
vpshufb .LC1(%rip), %zmm0, %zmm1
vfmadd213ph (%rdx), %zmm2, %zmm1
vpshufb .LC2(%rip), %zmm0, %zmm0
vpshufb .LC0(%rip), %zmm2, %zmm3
vmovdqa64 %zmm0, %zmm2
vfmadd132ph %zmm3, %zmm1, %zmm2
vfnmadd132ph %zmm3, %zmm1, %zmm0
vpblendmw %zmm0, %zmm2, %zmm0{%k1}
vmovdqu16 %zmm0, (%rdx)
where instead of
note: Found COMPLEX_FMA pattern in SLP tree
we have
note: Found VEC_ADDSUB pattern in SLP tree
note: Target does not support VEC_ADDSUB for vector type vector(32) _Float16
with the IL difference being (- is good, + is bad)
_12 = REALPART_EXPR <*_3>;
_11 = IMAGPART_EXPR <*_3>;
...
@@ -46,10 +46,10 @@
_27 = _19 * _25;
_28 = _20 * _25;
_29 = _19 * _24;
- _30 = _26 - _27;
- _31 = _28 + _29;
- _32 = _12 + _30;
- _33 = _11 + _31;
+ _9 = _12 + _26;
+ _10 = _11 + _28;
+ _32 = _9 - _27;
+ _33 = _10 + _29;
REALPART_EXPR <*_3> = _32;
IMAGPART_EXPR <*_3> = _33;
i_18 = i_21 + 1;
which is a different association, enabled by deleting the dead uses that
confused reassoc.
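The real-part half of the IL difference above can be sketched as two expression trees that compute the same value but have different shapes. The matcher below is a hypothetical stand-in for a shape-based pattern check, not the actual SLP pattern code.

```python
def evaluate(node, env):
    # Leaves are SSA names looked up in env; inner nodes are (op, lhs, rhs).
    if isinstance(node, str):
        return env[node]
    op, lhs, rhs = node
    left, right = evaluate(lhs, env), evaluate(rhs, env)
    return left + right if op == "+" else left - right

# Before: _30 = _26 - _27; _32 = _12 + _30
good = ("+", "_12", ("-", "_26", "_27"))
# After:  _9 = _12 + _26;  _32 = _9 - _27
bad = ("-", ("+", "_12", "_26"), "_27")

def matches_expected_shape(node):
    # Hypothetical matcher: accumulator plus a difference of two products.
    return (isinstance(node, tuple) and node[0] == "+"
            and isinstance(node[2], tuple) and node[2][0] == "-")

env = {"_12": 7, "_26": 10, "_27": 4}
```

Both trees evaluate to the same number for these inputs, but only the first has the shape the matcher looks for, which mirrors how the reassociated form stops being recognized as a complex FMA.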
From: pinskia at gcc dot gnu.org @ 2024-08-23 23:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #8)
> fail which look similar to the aarch64 fails (I have no idea if the patch
> helped for those).
The aarch64 ones still fail. And yes they look very similar.
From: pinskia at gcc dot gnu.org @ 2024-08-25 20:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
See Also| |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105095
--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The current FAILs/XPASSes for aarch64 are:
XPASS: gcc.dg/vect/complex/fast-math-complex-add-half-float.c scan-tree-dump-times vect "stmt.*COMPLEX_ADD_ROT270" 1
XPASS: gcc.dg/vect/complex/fast-math-complex-add-half-float.c scan-tree-dump-times vect "stmt.*COMPLEX_ADD_ROT90" 1
FAIL: gcc.dg/vect/complex/fast-math-complex-mls-double.c scan-tree-dump vect "Found COMPLEX_ADD_ROT270"
FAIL: gcc.dg/vect/complex/fast-math-complex-mls-float.c scan-tree-dump vect "Found COMPLEX_ADD_ROT270"
FAIL: gcc.dg/vect/complex/fast-math-complex-mls-half-float.c scan-tree-dump vect "Found COMPLEX_ADD_ROT270"
From: tnfchris at gcc dot gnu.org @ 2024-08-28 8:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
--- Comment #11 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #6)
> I think
>
> a - ((b * -c) + (d * -e)) -> a + (b * c) + (d * e)
>
> is a good simplification to be made, but it's difficult to do this with
> canonicalization only. Like a * -b -> -(a * b) as the negate might
> combine with both other negates down and upstream. But for
> a*-b + c * -d it might be more obvious to turn that into
> -a*b - c*d.
Yeah, my expectation was that this would be an easier transform to avoid
the sharing problem we discussed before and that indeed the transform
looks at the entire chain not just transforming a * -b.
a*-b + c * -d -> -a*b - c*d
has the property of still maintaining the FMS and FMNS chains and can
get further simplified in the above case.
>
> Maybe reassoc can be of help here - IIRC it turns b * -c into
> b * c * -1, undistribute_ops_list might get that.
Hmm, I see, but don't we have a higher chance that folding will just
fold it back into the multiply?
For this to work we'd have to do
(b * -c) + (d * -e) -> -(b * c + d * e)
in one transformation, no? I'd imagine that
(b * c * -1) + (d * e * -1)
would just be undone by match.pd.
>
> Note one issue is that complex lowering leaves around dead stmts,
> confusing reassoc and forwprop, in particular
>
> - _10 = COMPLEX_EXPR <_18, _6>;
>
> stay around until reassoc. scheduling dce for testing shows reassoc
> does something.
>
> It's update_complex_assignment who replaces existing complex
> stmts with COMPLEX_EXPRs, we should possibly resort to
> simple_dce_from_worklist
> to clean those. Let me try to do that.
Thanks!
From: rguenth at gcc dot gnu.org @ 2024-08-28 8:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #11)
> (In reply to Richard Biener from comment #6)
> > I think
> >
> > a - ((b * -c) + (d * -e)) -> a + (b * c) + (d * e)
> >
> > is a good simplification to be made, but it's difficult to do this with
> > canonicalization only. Like a * -b -> -(a * b) as the negate might
> > combine with both other negates down and upstream. But for
> > a*-b + c * -d it might be more obvious to turn that into
> > -a*b - c*d.
>
> Yeah, my expectation was that this would be an easier transform to avoid
> the sharing problem we discussed before and that indeed the transform
> looks at the entire chain not just transforming a * -b.
>
> a*-b + c * -d -> -a*b - c*d
>
> has the property of still maintaining the FMS and FMNS chains and can
> get further simplified in the above case.
>
> >
> > Maybe reassoc can be of help here - IIRC it turns b * -c into
> > b * c * -1, undistribute_ops_list might get that.
>
> hmm I see, but don't we have a higher chance that folding will just
> fold it back into the multiply?
>
> For this to work we'd have to do
>
> (b * -c) + (d * -e) -> -(b * c + d * e)
>
> in one transformation no? since I'd imagine
>
> (b * c * -1) + (d * e * -1)
>
> would just be undone by match.pd?
The * -1 is something reassoc does only internally; it then distributes
that back to generate an outer plus or minus.
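As a rough sketch of that idea (a toy model, not reassoc's actual data structures or undistribute_ops_list), negations can be pulled out of each product as a sign, and a sign shared by every term can then be emitted once on the outside. The helper names `linearize` and `undistribute` and the list-based term encoding are assumptions for illustration.

```python
def linearize(terms):
    """Each term is a list of factors; a factor is a name or ('neg', name).

    Returns (sign, names) pairs, with every negation folded into the sign.
    """
    out = []
    for factors in terms:
        sign, names = 1, []
        for factor in factors:
            if isinstance(factor, tuple):
                sign, factor = -sign, factor[1]
            names.append(factor)
        out.append((sign, names))
    return out

def undistribute(linear):
    # If every term carries -1, factor it out as a single outer negation.
    if linear and all(sign == -1 for sign, _ in linear):
        return -1, [(1, names) for _, names in linear]
    return 1, linear

# (b * -c) + (d * -e)
terms = [["b", ("neg", "c")], ["d", ("neg", "e")]]
outer_sign, inner = undistribute(linearize(terms))
```

Here the result models -(b * c + d * e), so an enclosing `a - (...)` can become `a + (...)` without any explicit multiplication by -1 surviving in the output.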
Note for the x86 testcases there isn't any such simplification opportunity,
but the reassoc heuristics correctly mangle the expression to no longer
match the expected SLP complex patterns. There's also the re-association
of chains done by SLP discovery itself which could be a problem.
I'd say fixing this fallout is quite low priority at the moment. The
simple cases could be re-associated by reassoc into a recognizable
complex op order, but even there it's a bit difficult as the operations
span two "chains" (a multiplication and an addition chain), which reassoc
looks at separately (apart from undistribution).