public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/99149] New: [11 Regression] ICE during vectorization when shared trees contain different complex patterns
@ 2021-02-18 13:15 tnfchris at gcc dot gnu.org
2021-02-18 13:24 ` [Bug tree-optimization/99149] " rguenth at gcc dot gnu.org
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2021-02-18 13:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99149
Bug ID: 99149
Summary: [11 Regression] ICE during vectorization when shared
trees contain different complex patterns
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: tnfchris at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Target Milestone: ---
Target: aarch64-*
The following testcase
class a {
float b;
float c;
public:
a(float d, float e) : b(d), c(e) {}
a operator+(a d) { return a(b + d.b, c + d.c); }
a operator*(a d) { return a(b * b - c * c, b * c + c * d.b); }
};
int f, g;
class {
a *h;
a *i;
public:
void j() {
a k = h[0], l = i[g], m = k * i[f];
i[g] = l + m;
i[f] = m;
}
} n;
main() { n.j(); }
crashes with aarch64-none-elf-g++ -w -march=armv8.3-a -O3 -S main.cpp
At -O3 there are two SLP trees created, but they share partially part of a
complex pattern:
Final SLP tree for instance 0x559a7e0:
node 0x56f5520 (max_nunits=2, refcnt=2)
op template: MEM <float> [(struct a *)_16] = _20;
stmt 0 MEM <float> [(struct a *)_16] = _20;
stmt 1 MEM <float> [(struct a *)_16 + 4B] = _23;
children 0x56f5058
node 0x56f5058 (max_nunits=2, refcnt=3)
op: VEC_PERM_EXPR
stmt 0 _20 = _18 - _19;
stmt 1 _23 = _21 + _22;
lane permutation { 0[0] 1[1] }
children 0x56f5410 0x56f5498
node 0x56f5410 (max_nunits=1, refcnt=1)
op template: _20 = _18 - _19;
{ }
children 0x56f50e0 0x56f5278
node 0x56f50e0 (max_nunits=2, refcnt=3)
op template: _18 = k$b_4 * k$b_4;
stmt 0 _18 = k$b_4 * k$b_4;
stmt 1 _21 = k$b_4 * k$c_5;
children 0x56f5168 0x56f51f0
node 0x56f5168 (max_nunits=2, refcnt=2)
op template: k$b_4 = MEM[(const struct a &)_3].b;
stmt 0 k$b_4 = MEM[(const struct a &)_3].b;
stmt 1 k$b_4 = MEM[(const struct a &)_3].b;
load permutation { 0 0 }
node 0x56f51f0 (max_nunits=2, refcnt=2)
op template: k$b_4 = MEM[(const struct a &)_3].b;
stmt 0 k$b_4 = MEM[(const struct a &)_3].b;
stmt 1 k$c_5 = MEM[(const struct a &)_3].c;
load permutation { 0 1 }
node 0x56f5278 (max_nunits=2, refcnt=3)
op template: _19 = k$c_5 * k$c_5;
stmt 0 _19 = k$c_5 * k$c_5;
stmt 1 _22 = k$c_5 * d$b_17;
children 0x56f5300 0x56f5388
node 0x56f5300 (max_nunits=2, refcnt=2)
op template: k$c_5 = MEM[(const struct a &)_3].c;
stmt 0 k$c_5 = MEM[(const struct a &)_3].c;
stmt 1 k$c_5 = MEM[(const struct a &)_3].c;
load permutation { 1 1 }
node (external) 0x56f5388 (max_nunits=1, refcnt=1)
{ k$c_5, d$b_17 }
node 0x56f5498 (max_nunits=1, refcnt=1)
op template: _23 = _21 + _22;
{ }
children 0x56f50e0 0x56f5278
These dump contains two roots, one at 0x56f5520 and one at 0x5763380:
Final SLP tree for instance 0x5763380:
node 0x56f4ec0 (max_nunits=2, refcnt=2)
op template: MEM <float> [(struct a *)_10] = _24;
stmt 0 MEM <float> [(struct a *)_10] = _24;
stmt 1 MEM <float> [(struct a *)_10 + 4B] = _25;
children 0x56f4f48
node 0x56f4f48 (max_nunits=2, refcnt=2)
op template: _24 = l$b_11 + _20;
stmt 0 _24 = l$b_11 + _20;
stmt 1 _25 = l$c_12 + _23;
children 0x56f4fd0 0x56f5058
node 0x56f4fd0 (max_nunits=2, refcnt=2)
op template: l$b_11 = MEM[(const struct a &)_10].b;
stmt 0 l$b_11 = MEM[(const struct a &)_10].b;
stmt 1 l$c_12 = MEM[(const struct a &)_10].c;
load permutation { 0 1 }
node 0x56f5058 (max_nunits=2, refcnt=2)
op: VEC_PERM_EXPR
stmt 0 _20 = _18 - _19;
stmt 1 _23 = _21 + _22;
lane permutation { 0[0] 1[1] }
children 0x56f5410 0x56f5498
node 0x56f5410 (max_nunits=1, refcnt=1)
op template: _20 = _18 - _19;
{ }
children 0x56f50e0 0x56f5278
node 0x56f50e0 (max_nunits=2, refcnt=3)
op template: _18 = k$b_4 * k$b_4;
stmt 0 _18 = k$b_4 * k$b_4;
stmt 1 _21 = k$b_4 * k$c_5;
children 0x56f5168 0x56f51f0
node 0x56f5168 (max_nunits=2, refcnt=2)
op template: k$b_4 = MEM[(const struct a &)_3].b;
stmt 0 k$b_4 = MEM[(const struct a &)_3].b;
stmt 1 k$b_4 = MEM[(const struct a &)_3].b;
load permutation { 0 0 }
node 0x56f51f0 (max_nunits=2, refcnt=2)
op template: k$b_4 = MEM[(const struct a &)_3].b;
stmt 0 k$b_4 = MEM[(const struct a &)_3].b;
stmt 1 k$c_5 = MEM[(const struct a &)_3].c;
load permutation { 0 1 }
node 0x56f5278 (max_nunits=2, refcnt=3)
op template: _19 = k$c_5 * k$c_5;
stmt 0 _19 = k$c_5 * k$c_5;
stmt 1 _22 = k$c_5 * d$b_17;
children 0x56f5300 0x56f5388
node 0x56f5300 (max_nunits=2, refcnt=2)
op template: k$c_5 = MEM[(const struct a &)_3].c;
stmt 0 k$c_5 = MEM[(const struct a &)_3].c;
stmt 1 k$c_5 = MEM[(const struct a &)_3].c;
load permutation { 1 1 }
node (external) 0x56f5388 (max_nunits=1, refcnt=1)
{ k$c_5, d$b_17 }
node 0x56f5498 (max_nunits=1, refcnt=1)
op template: _23 = _21 + _22;
{ }
children 0x56f50e0 0x56f5278
It's a bit had to see so let's render the graphs
https://sketchviz.com/@Mistuke/3f10160785531d484f9eba0de186bf65/5cedb98ad6ee1e083c3bf9574dba1c679851412e
The graph rooted to the store in *_10 matches COMPLEX_FMA but that rooted in
the store to *_16 only matches COMPLEX_MUL.
In this case the COMPLEX_FMA shouldn't be recognized as they both share the
VEC_PERM_EXPR.
Disabling the COMPLEX_FMA pattern shows that this does then generate the right
code:
main:
adrp x1, .LANCHOR0
add x2, x1, :lo12:.LANCHOR0
movi v0.2s, 0
mov w0, 0
ldr x4, [x1, #:lo12:.LANCHOR0]
ldrsw x3, [x2, 16]
ldr x1, [x2, 8]
ldr d1, [x4]
fcmla v0.2s, v1.2s, v1.2s, #0
fcmla v0.2s, v1.2s, v1.2s, #90
ldrsw x2, [x2, 20]
ldr d1, [x1, x3, lsl 3]
fadd v1.2s, v1.2s, v0.2s
str d1, [x1, x3, lsl 3]
str d0, [x1, x2, lsl 3]
ret
So the pattern matcher should probably check that all children it's going to
replace belong to the same instance. But I am not sure how.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug tree-optimization/99149] [11 Regression] ICE during vectorization when shared trees contain different complex patterns
2021-02-18 13:15 [Bug tree-optimization/99149] New: [11 Regression] ICE during vectorization when shared trees contain different complex patterns tnfchris at gcc dot gnu.org
@ 2021-02-18 13:24 ` rguenth at gcc dot gnu.org
2021-02-24 9:44 ` cvs-commit at gcc dot gnu.org
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-02-18 13:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99149
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |ice-on-valid-code
Target Milestone|--- |11.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug tree-optimization/99149] [11 Regression] ICE during vectorization when shared trees contain different complex patterns
2021-02-18 13:15 [Bug tree-optimization/99149] New: [11 Regression] ICE during vectorization when shared trees contain different complex patterns tnfchris at gcc dot gnu.org
2021-02-18 13:24 ` [Bug tree-optimization/99149] " rguenth at gcc dot gnu.org
@ 2021-02-24 9:44 ` cvs-commit at gcc dot gnu.org
2021-02-24 9:46 ` tnfchris at gcc dot gnu.org
2021-02-24 14:57 ` cvs-commit at gcc dot gnu.org
3 siblings, 0 replies; 5+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-02-24 9:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99149
--- Comment #1 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:
https://gcc.gnu.org/g:5296bd57d0605d1fec900d85e3ab3875197e609a
commit r11-7355-g5296bd57d0605d1fec900d85e3ab3875197e609a
Author: Tamar Christina <tamar.christina@arm.com>
Date: Wed Feb 24 09:43:22 2021 +0000
slp: fix sharing of SLP only patterns.
The attached testcase ICEs due to a couple of issues.
In the testcase you have two SLP instances that share the majority of their
definition with each other. One tree defines a COMPLEX_MUL sequence and
the
other tree a COMPLEX_FMA.
The ice happens because:
1. the refcounts are wrong, in particular the FMA case doesn't correctly
count
the references for the COMPLEX_MUL that it consumes.
2. when the FMA is created it incorrectly assumes it can just tear apart
the MUL
node that it's consuming. This is wrong and should only be done when there
is
no more uses of the node, in which case the vector only pattern is no
longer
relevant.
To fix the last part the SLP only pattern reset code was moved into
vect_free_slp_tree which results in cleaner code. I also think it does
belong
there since that function knows when there are no more uses of the node and
so
the pattern should be unmarked, so when the the vectorizer is inspecting
the BB
it doesn't find the now invalid vector only patterns.
The patch also clears the SLP_TREE_REPRESENTATIVE when stores are removed
such
that we don't hit an error later trying to free the stmt_vec_info again.
Lastly it also tweaks the results of whether a pattern has been detected or
not
to return true when another SLP instance has created a pattern that is only
used
by a different instance (due to the trees being unshared).
Instead of ICEing this code now produces
adrp x1, .LANCHOR0
add x2, x1, :lo12:.LANCHOR0
movi v1.2s, 0
mov w0, 0
ldr x4, [x1, #:lo12:.LANCHOR0]
ldrsw x3, [x2, 16]
ldr x1, [x2, 8]
ldrsw x2, [x2, 20]
ldr d0, [x4]
ldr d2, [x1, x3, lsl 3]
fcmla v2.2s, v0.2s, v0.2s, #0
fcmla v2.2s, v0.2s, v0.2s, #90
str d2, [x1, x3, lsl 3]
fcmla v1.2s, v0.2s, v0.2s, #0
fcmla v1.2s, v0.2s, v0.2s, #90
str d1, [x1, x2, lsl 3]
ret
PS. This testcase actually shows that the codegen we get in these cases is
not
optimal. It should generate a MUL + ADD instead MUL + FMA.
But that's for GCC 12.
gcc/ChangeLog:
PR tree-optimization/99149
* tree-vect-slp-patterns.c (vect_detect_pair_op): Don't recreate
the
buffer.
(vect_slp_reset_pattern): Remove.
(complex_fma_pattern::matches): Remove call to
vect_slp_reset_pattern.
(complex_mul_pattern::build, complex_fma_pattern::build,
complex_fms_pattern::build): Fix ref counts.
* tree-vect-slp.c (vect_free_slp_tree): Undo SLP only pattern
relevancy
when node is being deleted.
(vect_match_slp_patterns_2): Correct result of cache hit on
patterns.
(vect_schedule_slp): Invalidate SLP_TREE_REPRESENTATIVE of removed
stores.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize
value.
gcc/testsuite/ChangeLog:
PR tree-optimization/99149
* g++.dg/vect/pr99149.cc: New test.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug tree-optimization/99149] [11 Regression] ICE during vectorization when shared trees contain different complex patterns
2021-02-18 13:15 [Bug tree-optimization/99149] New: [11 Regression] ICE during vectorization when shared trees contain different complex patterns tnfchris at gcc dot gnu.org
2021-02-18 13:24 ` [Bug tree-optimization/99149] " rguenth at gcc dot gnu.org
2021-02-24 9:44 ` cvs-commit at gcc dot gnu.org
@ 2021-02-24 9:46 ` tnfchris at gcc dot gnu.org
2021-02-24 14:57 ` cvs-commit at gcc dot gnu.org
3 siblings, 0 replies; 5+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2021-02-24 9:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99149
Tamar Christina <tnfchris at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|UNCONFIRMED |RESOLVED
--- Comment #2 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Fixed in master
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug tree-optimization/99149] [11 Regression] ICE during vectorization when shared trees contain different complex patterns
2021-02-18 13:15 [Bug tree-optimization/99149] New: [11 Regression] ICE during vectorization when shared trees contain different complex patterns tnfchris at gcc dot gnu.org
` (2 preceding siblings ...)
2021-02-24 9:46 ` tnfchris at gcc dot gnu.org
@ 2021-02-24 14:57 ` cvs-commit at gcc dot gnu.org
3 siblings, 0 replies; 5+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-02-24 14:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99149
--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:
https://gcc.gnu.org/g:96c5a8589e0dd22819bae34cac3381ad381d3989
commit r11-7358-g96c5a8589e0dd22819bae34cac3381ad381d3989
Author: Tamar Christina <tamar.christina@arm.com>
Date: Wed Feb 24 14:57:08 2021 +0000
[comitted] Testsuite: Disable PR99149 test on big-endian
This patch disables the test for PR99149 on Big-endian
where for standard AArch64 the patterns are disabled.
gcc/testsuite/ChangeLog:
PR tree-optimization/99149
* g++.dg/vect/pr99149.cc: Disabled on BE.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-02-24 14:57 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-18 13:15 [Bug tree-optimization/99149] New: [11 Regression] ICE during vectorization when shared trees contain different complex patterns tnfchris at gcc dot gnu.org
2021-02-18 13:24 ` [Bug tree-optimization/99149] " rguenth at gcc dot gnu.org
2021-02-24 9:44 ` cvs-commit at gcc dot gnu.org
2021-02-24 9:46 ` tnfchris at gcc dot gnu.org
2021-02-24 14:57 ` cvs-commit at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).