* [SVE][match.pd] Fix ICE observed in PR110280
From: Prathamesh Kulkarni @ 2023-06-20  9:54 UTC
  To: gcc Patches, Richard Sandiford

Hi Richard,
For the following reduced test-case taken from PR:

#include "arm_sve.h"
svuint32_t l() {
  alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
  return svld1rq_u32(svptrue_b8(), lanes);
}

compiling with -O3 -mcpu=generic+sve results in following ICE:
during GIMPLE pass: fre
pr110280.c: In function 'l':
pr110280.c:5:1: internal compiler error: in eliminate_stmt, at tree-ssa-sccvn.cc:6890
    5 | }
      | ^
0x865fb1 eliminate_dom_walker::eliminate_stmt(basic_block_def*, gimple_stmt_iterator*)
        ../../gcc/gcc/tree-ssa-sccvn.cc:6890
0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
        ../../gcc/gcc/tree-ssa-sccvn.cc:7324
0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
        ../../gcc/gcc/tree-ssa-sccvn.cc:7257
0x1aeec77 dom_walker::walk(basic_block_def*)
        ../../gcc/gcc/domwalk.cc:311
0x11fd924 eliminate_with_rpo_vn(bitmap_head*)
        ../../gcc/gcc/tree-ssa-sccvn.cc:7504
0x1214664 do_rpo_vn_1
        ../../gcc/gcc/tree-ssa-sccvn.cc:8616
0x1215ba5 execute
        ../../gcc/gcc/tree-ssa-sccvn.cc:8702

cc1 simplifies:
  lanes[0] = 0;
  lanes[1] = 0;
  lanes[2] = 0;
  lanes[3] = 0;
  _1 = { -1, ... };
  _7 = svld1rq_u32 (_1, &lanes);

to:
  _9 = MEM <vector(4) unsigned int> [(unsigned int * {ref-all})&lanes];
  _7 = VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }>;

and then fre1 dump shows:
Applying pattern match.pd:8675, generic-match-5.cc:9025
Match-and-simplified VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> to { 0, 0, 0, 0 }
RHS VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> simplified to { 0, 0, 0, 0 }

The issue seems to be with the following pattern:
(simplify
 (vec_perm vec_same_elem_p@0 @0 @1)
 @0)

which simplifies above VEC_PERM_EXPR to:
_7 = {0, 0, 0, 0}
which is incorrect since _9 and mask have different vector lengths.

The attached patch amends the pattern to simplify above VEC_PERM_EXPR
only if operand and mask have same number of elements, which seems to fix
the issue, and we're left with the following in .optimized dump:
  <bb 2> [local count: 1073741824]:
  _2 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 0, 0, 0, 0 }, { 0, 1, 2, 3, ... }>;
  return _2;

code-gen:
l:
        mov     z0.b, #0
        ret

Patch is bootstrapped+tested on aarch64-linux-gnu.
OK to commit ?

Thanks,
Prathamesh

[-- Attachment #2: pr110280-1.txt --]

[SVE][match.pd] Fix ICE observed in PR110280.

gcc/ChangeLog:
	PR tree-optimization/110280
	* match.pd (vec_perm_expr(v, v, mask) -> v): Simplify the pattern
	only if operand and mask of VEC_PERM_EXPR have same number of elements.

gcc/testsuite/ChangeLog:
	* gcc.target/aarch64/sve/pr110280.c: New test.

diff --git a/gcc/match.pd b/gcc/match.pd
index 2dd23826034..0eb5f8f0af6 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8669,10 +8669,11 @@ and,
   @0
   (if (uniform_vector_p (@0))))
 
-
 (simplify
  (vec_perm vec_same_elem_p@0 @0 @1)
- @0)
+ (if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)),
+               TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1))))
+  @0))
 
 /* Push VEC_PERM earlier if that may help FMA perception (PR101895).  */
 (simplify
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c b/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c
new file mode 100644
index 00000000000..453c9cbcf9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+#include "arm_sve.h"
+
+svuint32_t l()
+{
+  _Alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
+  return svld1rq_u32(svptrue_b8(), lanes);
+}
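To make the length mismatch described in the report concrete, here is a toy
model in plain C++. It is only a sketch and uses no GCC code: `Vec`,
`vec_perm` and `old_fold` are hypothetical names, the variable-length mask
{ 0, 1, 2, 3, ... } is written out for an assumed 8-lane result, and the
modulo wrap of selectors is an assumption of the model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a vector value; not a GCC data structure.
using Vec = std::vector<unsigned>;

// Model of VEC_PERM_EXPR <a, b, mask>: the result has one lane per mask
// lane, each selecting from the concatenation of a and b.
Vec vec_perm(const Vec &a, const Vec &b, const std::vector<std::size_t> &mask)
{
  assert(a.size() == b.size() && !a.empty());
  Vec result;
  result.reserve(mask.size());
  for (std::size_t sel : mask) {
    std::size_t i = sel % (2 * a.size());  // assumption: selectors wrap
    result.push_back(i < a.size() ? a[i] : b[i - a.size()]);
  }
  return result;
}

// Model of the pre-patch fold "vec_perm (v, v, mask) -> v" for uniform v:
// it hands back the operand and ignores how many lanes the mask has.
Vec old_fold(const Vec &v, const std::vector<std::size_t> &)
{
  return v;
}
```

With a 4-lane operand and an 8-lane mask, `vec_perm` yields 8 lanes while
`old_fold` yields 4, which is the kind of mismatch the report describes
between _9 and the mask.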
* Re: [SVE][match.pd] Fix ICE observed in PR110280
From: Richard Biener @ 2023-06-20 11:15 UTC
  To: Prathamesh Kulkarni; +Cc: gcc Patches, Richard Sandiford

On Tue, Jun 20, 2023 at 11:56 AM Prathamesh Kulkarni via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
> [...]
> The attached patch amends the pattern to simplify above VEC_PERM_EXPR
> only if operand and mask have same number of elements, which seems to fix
> the issue, and we're left with the following in .optimized dump:
>   <bb 2> [local count: 1073741824]:
>   _2 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 0, 0, 0, 0 }, { 0, 1, 2, 3, ... }>;

it would be nice to have this optimized.

-
 (simplify
  (vec_perm vec_same_elem_p@0 @0 @1)
- @0)
+ (if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)),
+               TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1))))
+  @0))

that looks good I think.  Maybe even better use 'type' instead of TREE_TYPE (@1)
since that's more obviously the return type in which case

 (if (types_match (type, TREE_TYPE (@0)))

would be more to the point.

But to simplify this in the !known_eq case, can't you do a simple

  { build_vector_from_val (type, the-element); }

?  The 'vec_same_elem_p' predicate doesn't get you at the element,

 (with { tree el = uniform_vector_p (@0); }
  (if (el)
   { build_vector_from_val (type, el); })))

would be the cheapest workaround.

>   return _2;
>
> code-gen:
> l:
>         mov     z0.b, #0
>         ret
>
> Patch is bootstrapped+tested on aarch64-linux-gnu.
> OK to commit ?
>
> Thanks,
> Prathamesh
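The workaround suggested above (extract the repeated element, then build a
vector of the result type from it) can be sketched outside the compiler. This
is only an illustration in plain C++: `uniform_element`, `splat` and
`fold_uniform_perm` are hypothetical helpers that loosely mirror
uniform_vector_p and build_vector_from_val, and a lane count stands in for
the result type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a vector value; not a GCC data structure.
using Vec = std::vector<unsigned>;

// Loose analogue of uniform_vector_p: pointer to the repeated element, or
// nullptr when the lanes differ (uniform_vector_p returns NULL_TREE).
const unsigned *uniform_element(const Vec &v)
{
  if (v.empty())
    return nullptr;
  for (unsigned x : v)
    if (x != v.front())
      return nullptr;
  return &v.front();
}

// Loose analogue of build_vector_from_val (type, el): the lane count comes
// from the result type, never from the operand.
Vec splat(std::size_t result_lanes, unsigned el)
{
  assert(result_lanes > 0);
  return Vec(result_lanes, el);
}

// Returns true and fills `out` when a permute of a uniform operand folds.
bool fold_uniform_perm(const Vec &operand, std::size_t result_lanes, Vec &out)
{
  const unsigned *el = uniform_element(operand);
  if (!el)
    return false;  // no element to splat: leave the permute alone
  out = splat(result_lanes, *el);
  return true;
}
```

For a 4-lane all-zero operand and an 8-lane result, `fold_uniform_perm`
produces eight zeros rather than reusing the 4-lane operand.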
* Re: [SVE][match.pd] Fix ICE observed in PR110280
From: Prathamesh Kulkarni @ 2023-06-22  9:07 UTC
  To: Richard Biener; +Cc: gcc Patches, Richard Sandiford

On Tue, 20 Jun 2023 at 16:47, Richard Biener <richard.guenther@gmail.com> wrote:
> [...]
>  (with { tree el = uniform_vector_p (@0); }
>   (if (el)
>    { build_vector_from_val (type, el); })))
>
> would be the cheapest workaround.

Hi Richard,
Thanks for the suggestions. Using build_vector_from_val simplifies it to:
  <bb 2> [local count: 1073741824]:
  return { 0, ... };

Patch is bootstrapped+tested on aarch64-linux-gnu, in progress on
x86_64-linux-gnu.
OK to commit ?

Thanks,
Prathamesh

[-- Attachment #2: pr110280-2.txt --]

[aarch64/match.pd] Fix ICE observed in PR110280.

gcc/ChangeLog:
	PR tree-optimization/110280
	* match.pd (vec_perm_expr(v, v, mask) -> v): Explicitly build vector
	using build_vector_from_val with the element of input operand, and
	mask's type.

gcc/testsuite/ChangeLog:
	* gcc.target/aarch64/sve/pr110280.c: New test.

diff --git a/gcc/match.pd b/gcc/match.pd
index 2dd23826034..76a37297d3c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8672,7 +8672,12 @@ and,
 
 (simplify
  (vec_perm vec_same_elem_p@0 @0 @1)
- @0)
+ (with
+  {
+    tree elem = uniform_vector_p (@0);
+  }
+  (if (elem)
+   { build_vector_from_val (type, elem); })))
 
 /* Push VEC_PERM earlier if that may help FMA perception (PR101895).  */
 (simplify
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c b/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c
new file mode 100644
index 00000000000..d3279f38362
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+#include "arm_sve.h"
+
+svuint32_t l()
+{
+  _Alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
+  return svld1rq_u32(svptrue_b8(), lanes);
+}
+
+/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */
* Re: [SVE][match.pd] Fix ICE observed in PR110280
From: Richard Biener @ 2023-06-22 12:33 UTC
  To: Prathamesh Kulkarni; +Cc: gcc Patches, Richard Sandiford

On Thu, Jun 22, 2023 at 11:08 AM Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> [...]
> Hi Richard,
> Thanks for the suggestions. Using build_vector_from_val simplifies it to:
>   <bb 2> [local count: 1073741824]:
>   return { 0, ... };
>
> Patch is bootstrapped+tested on aarch64-linux-gnu, in progress on
> x86_64-linux-gnu.
> OK to commit ?

Can you retain the case of matching type?  Like

 (if (types_match (type, TREE_TYPE (@0)))
  @0
  (with
    {
      tree elem = uniform_vector_p (@0);
    }
    (if (elem)
     { build_vector_from_val (type, elem); }))))

?  Because uniform_vector_p is strictly less powerful than (vec_same_elem_p ...)

OK with that change.

Richard.
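The order of checks requested above can be summarised as a small decision
function. This is a sketch in plain C++, not GCC code: `Operand`, `Fold` and
`choose_fold` are hypothetical, equal lane counts stand in for types_match,
and the `has_element` flag models the point that an operand matched by
vec_same_elem_p may not yield an element through uniform_vector_p.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of the permute operand; not a GCC type.
struct Operand {
  std::size_t lanes;   // number of elements in the operand's type
  bool has_element;    // true if the repeated element could be extracted
  unsigned element;    // meaningful only when has_element is true
};

enum class Fold { ReuseOperand, BuildSplat, NoChange };

// Decide how to simplify a permute of a uniform operand whose result type
// has `result_lanes` lanes.
Fold choose_fold(const Operand &op, std::size_t result_lanes)
{
  assert(op.lanes > 0 && result_lanes > 0);
  if (op.lanes == result_lanes)
    return Fold::ReuseOperand;  // types match: the operand is the result
  if (op.has_element)
    return Fold::BuildSplat;    // rebuild a vector of the result type
  return Fold::NoChange;        // nothing safe to do: keep the permute
}
```

Keeping the matching-type case first means an operand that is known to be
uniform, but whose element cannot be extracted, is still simplified when the
types agree.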
* Re: [SVE][match.pd] Fix ICE observed in PR110280
From: Prathamesh Kulkarni @ 2023-06-23  9:09 UTC
  To: Richard Biener; +Cc: gcc Patches, Richard Sandiford

On Thu, 22 Jun 2023 at 18:06, Richard Biener <richard.guenther@gmail.com> wrote:
> Can you retain the case of matching type?  Like
> [...]
> OK with that change.

Thanks, does the attached patch look OK ?
Bootstrapped+tested on aarch64-linux-gnu and x86_64-linux-gnu.

Thanks,
Prathamesh

[-- Attachment #2: pr110280-3.txt --]

[aarch64/match.pd] Fix ICE observed in PR110280.

gcc/ChangeLog:
	PR tree-optimization/110280
	* match.pd (vec_perm_expr(v, v, mask) -> v): Explicitly build vector
	using build_vector_from_val with the element of input operand, and
	mask's type if operand and mask's types don't match.

gcc/testsuite/ChangeLog:
	* gcc.target/aarch64/sve/pr110280.c: New test.

diff --git a/gcc/match.pd b/gcc/match.pd
index 2dd23826034..5cbf74c9a06 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8672,7 +8672,14 @@ and,
 
 (simplify
  (vec_perm vec_same_elem_p@0 @0 @1)
- @0)
+ (if (types_match (type, TREE_TYPE (@0)))
+  @0
+  (with
+   {
+     tree elem = uniform_vector_p (@0);
+   }
+   (if (elem)
+    { build_vector_from_val (type, elem); }))))
 
 /* Push VEC_PERM earlier if that may help FMA perception (PR101895).  */
 (simplify
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c b/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c
new file mode 100644
index 00000000000..d3279f38362
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+#include "arm_sve.h"
+
+svuint32_t l()
+{
+  _Alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
+  return svld1rq_u32(svptrue_b8(), lanes);
+}
+
+/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */
* Re: [SVE][match.pd] Fix ICE observed in PR110280
From: Richard Biener @ 2023-06-23  9:28 UTC
  To: Prathamesh Kulkarni; +Cc: gcc Patches, Richard Sandiford

On Fri, Jun 23, 2023 at 11:09 AM Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> [...]
> Thanks, does the attached patch look OK ?

OK.

> Bootstrapped+tested on aarch64-linux-gnu and x86_64-linux-gnu.
>
> Thanks,
> Prathamesh
* Re: [SVE][match.pd] Fix ICE observed in PR110280
  2023-06-23  9:28           ` Richard Biener
@ 2023-06-23 10:03             ` Prathamesh Kulkarni
  0 siblings, 0 replies; 7+ messages in thread
From: Prathamesh Kulkarni @ 2023-06-23 10:03 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc Patches, Richard Sandiford

On Fri, 23 Jun 2023 at 14:58, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Fri, Jun 23, 2023 at 11:09 AM Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > On Thu, 22 Jun 2023 at 18:06, Richard Biener <richard.guenther@gmail.com> wrote:
> > >
> > > On Thu, Jun 22, 2023 at 11:08 AM Prathamesh Kulkarni
> > > <prathamesh.kulkarni@linaro.org> wrote:
> > > >
> > > > On Tue, 20 Jun 2023 at 16:47, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > >
> > > > > On Tue, Jun 20, 2023 at 11:56 AM Prathamesh Kulkarni via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > >
> > > > > > Hi Richard,
> > > > > > For the following reduced test-case taken from PR:
> > > > > >
> > > > > > #include "arm_sve.h"
> > > > > > svuint32_t l() {
> > > > > >   alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
> > > > > >   return svld1rq_u32(svptrue_b8(), lanes);
> > > > > > }
> > > > > >
> > > > > > compiling with -O3 -mcpu=generic+sve results in following ICE:
> > > > > > during GIMPLE pass: fre
> > > > > > pr110280.c: In function 'l':
> > > > > > pr110280.c:5:1: internal compiler error: in eliminate_stmt, at
> > > > > > tree-ssa-sccvn.cc:6890
> > > > > >     5 | }
> > > > > >       | ^
> > > > > > 0x865fb1 eliminate_dom_walker::eliminate_stmt(basic_block_def*,
> > > > > > gimple_stmt_iterator*)
> > > > > >         ../../gcc/gcc/tree-ssa-sccvn.cc:6890
> > > > > > 0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
> > > > > >         ../../gcc/gcc/tree-ssa-sccvn.cc:7324
> > > > > > 0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
> > > > > >         ../../gcc/gcc/tree-ssa-sccvn.cc:7257
> > > > > > 0x1aeec77 dom_walker::walk(basic_block_def*)
> > > > > >         ../../gcc/gcc/domwalk.cc:311
> > > > > > 0x11fd924 eliminate_with_rpo_vn(bitmap_head*)
> > > > > >         ../../gcc/gcc/tree-ssa-sccvn.cc:7504
> > > > > > 0x1214664 do_rpo_vn_1
> > > > > >         ../../gcc/gcc/tree-ssa-sccvn.cc:8616
> > > > > > 0x1215ba5 execute
> > > > > >         ../../gcc/gcc/tree-ssa-sccvn.cc:8702
> > > > > >
> > > > > > cc1 simplifies:
> > > > > >   lanes[0] = 0;
> > > > > >   lanes[1] = 0;
> > > > > >   lanes[2] = 0;
> > > > > >   lanes[3] = 0;
> > > > > >   _1 = { -1, ... };
> > > > > >   _7 = svld1rq_u32 (_1, &lanes);
> > > > > >
> > > > > > to:
> > > > > >   _9 = MEM <vector(4) unsigned int> [(unsigned int * {ref-all})&lanes];
> > > > > >   _7 = VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }>;
> > > > > >
> > > > > > and then fre1 dump shows:
> > > > > > Applying pattern match.pd:8675, generic-match-5.cc:9025
> > > > > > Match-and-simplified VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> to {
> > > > > > 0, 0, 0, 0 }
> > > > > > RHS VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> simplified to { 0, 0, 0, 0 }
> > > > > >
> > > > > > The issue seems to be with the following pattern:
> > > > > > (simplify
> > > > > >  (vec_perm vec_same_elem_p@0 @0 @1)
> > > > > >  @0)
> > > > > >
> > > > > > which simplifies above VEC_PERM_EXPR to:
> > > > > > _7 = {0, 0, 0, 0}
> > > > > > which is incorrect since _9 and mask have different vector lengths.
> > > > > >
> > > > > > The attached patch amends the pattern to simplify above VEC_PERM_EXPR
> > > > > > only if operand and mask have same number of elements, which seems to fix
> > > > > > the issue, and we're left with the following in .optimized dump:
> > > > > >   <bb 2> [local count: 1073741824]:
> > > > > >   _2 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 0, 0, 0, 0 }, { 0, 1, 2, 3, ... }>;
> > > > >
> > > > > it would be nice to have this optimized.
> > > > >
> > > > > -
> > > > >  (simplify
> > > > >   (vec_perm vec_same_elem_p@0 @0 @1)
> > > > > - @0)
> > > > > + (if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)),
> > > > > +               TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1))))
> > > > > +  @0))
> > > > >
> > > > > that looks good I think.  Maybe even better use 'type' instead of TREE_TYPE (@1)
> > > > > since that's more obviously the return type in which case
> > > > >
> > > > >   (if (types_match (type, TREE_TYPE (@0))
> > > > >
> > > > > would be more to the point.
> > > > >
> > > > > But can't you to simplify this in the !known_eq case do a simple
> > > > >
> > > > >   { build_vector_from_val (type, the-element); }
> > > > >
> > > > > ?  The 'vec_same_elem_p' predicate doesn't get you at the element,
> > > > >
> > > > >  (with { tree el = uniform_vector_p (@0); }
> > > > >   (if (el)
> > > > >    { build_vector_from_val (type, el); })))
> > > > >
> > > > > would be the cheapest workaround.
> > > > Hi Richard,
> > > > Thanks for the suggestions. Using build_vector_from_val simplifies it to:
> > > >   <bb 2> [local count: 1073741824]:
> > > >   return { 0, ... };
> > > >
> > > > Patch is bootstrapped+tested on aarch64-linux-gnu, in progress on
> > > > x86_64-linux-gnu.
> > > > OK to commit ?
> > >
> > > Can you retain the case of matching type?  Like
> > >
> > >  (if (types_match (type, TREE_TYPE (@0))
> > >   @0
> > >   (with
> > >    {
> > >      tree elem = uniform_vector_p (@0);
> > >    }
> > >    (if (elem)
> > >     { build_vector_from_val (type, elem); }))))
> > >
> > > ?  Because uniform_vector_p is strictly less powerful than (vec_same_elem_p ...)
> > >
> > > OK with that change.
> > Thanks, does the attached patch look OK ?
>
> OK.

Thanks, pushed to trunk in 85d8e0d8d5342ec8b4e6a54e22741c30b33c6f04.

Thanks,
Prathamesh
>
> > Bootstrapped+tested on aarch64-linux-gnu and x86_64-linux-gnu.
> >
> > Thanks,
> > Prathamesh
> > >
> > > Richard.
> > >
> > >
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > >   return _2;
> > > > > >
> > > > > > code-gen:
> > > > > > l:
> > > > > >         mov     z0.b, #0
> > > > > >         ret
> > > > > >
> > > > > > Patch is bootstrapped+tested on aarch64-linux-gnu.
> > > > > > OK to commit ?
> > > > > >
> > > > > > Thanks,
> > > > > > Prathamesh

^ permalink raw reply	[flat|nested] 7+ messages in thread
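The shape of the pattern agreed on in the review can be illustrated outside of GCC with a small model. The following is only a sketch under stated assumptions: `Vec`, `uniform_element` and `simplify_uniform_perm` are hypothetical names invented for this illustration, plain `std::vector` with fixed lengths stands in for GCC trees and SVE's variable-length vector types, and a single uniformity check stands in for both `vec_same_elem_p` and `uniform_vector_p` (which the thread notes are not equally powerful).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-in for a vector value; the real pattern operates on GCC trees.
using Vec = std::vector<unsigned>;

// Hypothetical analogue of uniform_vector_p: yields the repeated element,
// or nothing if the vector is empty or its elements differ.
std::optional<unsigned> uniform_element(const Vec &v) {
    if (v.empty()) return std::nullopt;
    for (unsigned x : v)
        if (x != v.front()) return std::nullopt;
    return v.front();
}

// Hypothetical model of simplifying vec_perm (v, v, mask) for a uniform v.
// result_len plays the role of the element count of the result type.
std::optional<Vec> simplify_uniform_perm(const Vec &v, std::size_t result_len) {
    assert(result_len > 0);                 // a result type with no elements is not modelled
    auto elem = uniform_element(v);
    if (!elem) return std::nullopt;         // operand not uniform: leave the permute alone
    if (v.size() == result_len) return v;   // types match: the operand itself is the result
    return Vec(result_len, *elem);          // lengths differ: splat the element at the result length
}
```

In this model, returning `v` unconditionally would hand back a 4-element value where a longer result is expected, which corresponds to the type mismatch behind the ICE; building a fresh splat of the result length avoids that while still folding the permute away.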
end of thread, other threads:[~2023-06-23 10:03 UTC | newest]

Thread overview: 7+ messages
2023-06-20  9:54 [SVE][match.pd] Fix ICE observed in PR110280 Prathamesh Kulkarni
2023-06-20 11:15 ` Richard Biener
2023-06-22  9:07   ` Prathamesh Kulkarni
2023-06-22 12:33     ` Richard Biener
2023-06-23  9:09       ` Prathamesh Kulkarni
2023-06-23  9:28         ` Richard Biener
2023-06-23 10:03           ` Prathamesh Kulkarni