On Tue, 20 Jun 2023 at 16:47, Richard Biener wrote:
>
> On Tue, Jun 20, 2023 at 11:56 AM Prathamesh Kulkarni via Gcc-patches
> wrote:
> >
> > Hi Richard,
> > For the following reduced test-case taken from PR:
> >
> > #include "arm_sve.h"
> > svuint32_t l() {
> >   alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
> >   return svld1rq_u32(svptrue_b8(), lanes);
> > }
> >
> > compiling with -O3 -mcpu=generic+sve results in following ICE:
> > during GIMPLE pass: fre
> > pr110280.c: In function 'l':
> > pr110280.c:5:1: internal compiler error: in eliminate_stmt, at
> > tree-ssa-sccvn.cc:6890
> >     5 | }
> >       | ^
> > 0x865fb1 eliminate_dom_walker::eliminate_stmt(basic_block_def*,
> > gimple_stmt_iterator*)
> >         ../../gcc/gcc/tree-ssa-sccvn.cc:6890
> > 0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
> >         ../../gcc/gcc/tree-ssa-sccvn.cc:7324
> > 0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
> >         ../../gcc/gcc/tree-ssa-sccvn.cc:7257
> > 0x1aeec77 dom_walker::walk(basic_block_def*)
> >         ../../gcc/gcc/domwalk.cc:311
> > 0x11fd924 eliminate_with_rpo_vn(bitmap_head*)
> >         ../../gcc/gcc/tree-ssa-sccvn.cc:7504
> > 0x1214664 do_rpo_vn_1
> >         ../../gcc/gcc/tree-ssa-sccvn.cc:8616
> > 0x1215ba5 execute
> >         ../../gcc/gcc/tree-ssa-sccvn.cc:8702
> >
> > cc1 simplifies:
> >   lanes[0] = 0;
> >   lanes[1] = 0;
> >   lanes[2] = 0;
> >   lanes[3] = 0;
> >   _1 = { -1, ... };
> >   _7 = svld1rq_u32 (_1, &lanes);
> >
> > to:
> >   _9 = MEM [(unsigned int * {ref-all})&lanes];
> >   _7 = VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }>;
> >
> > and then fre1 dump shows:
> > Applying pattern match.pd:8675, generic-match-5.cc:9025
> > Match-and-simplified VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> to {
> > 0, 0, 0, 0 }
> > RHS VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> simplified to { 0, 0, 0, 0 }
> >
> > The issue seems to be with the following pattern:
> > (simplify
> >  (vec_perm vec_same_elem_p@0 @0 @1)
> >  @0)
> >
> > which simplifies above VEC_PERM_EXPR to:
> > _7 = {0, 0, 0, 0}
> > which is incorrect since _9 and mask have different vector lengths.
> >
> > The attached patch amends the pattern to simplify above VEC_PERM_EXPR
> > only if operand and mask have same number of elements, which seems to fix
> > the issue, and we're left with the following in .optimized dump:
> >   [local count: 1073741824]:
> >   _2 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 0, 0, 0, 0 }, { 0, 1, 2, 3, ... }>;
>
> it would be nice to have this optimized.
>
> -
>  (simplify
>   (vec_perm vec_same_elem_p@0 @0 @1)
> -  @0)
> +  (if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)),
> +                TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1))))
> +   @0))
>
> that looks good I think.  Maybe even better use 'type' instead of TREE_TYPE (@1)
> since that's more obviously the return type in which case
>
>   (if (types_match (type, TREE_TYPE (@0))
>
> would be more to the point.
>
> But can't you to simplify this in the !known_eq case do a simple
>
>   { build_vector_from_val (type, the-element); }
>
> ?  The 'vec_same_elem_p' predicate doesn't get you at the element,
>
>  (with { tree el = uniform_vector_p (@0); }
>   (if (el)
>    { build_vector_from_val (type, el); })))
>
> would be the cheapest workaround.

Hi Richard,
Thanks for the suggestions. Using build_vector_from_val simplifies it to:
  [local count: 1073741824]:
  return { 0, ... };

Patch is bootstrapped+tested on aarch64-linux-gnu,
in progress on x86_64-linux-gnu.
OK to commit ?

Thanks,
Prathamesh
>
> >   return _2;
> >
> > code-gen:
> > l:
> >         mov     z0.b, #0
> >         ret
> >
> > Patch is bootstrapped+tested on aarch64-linux-gnu.
> > OK to commit ?
> >
> > Thanks,
> > Prathamesh
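
For anyone following the thread, here is a toy C++ model (not GCC code) of why returning @0 is wrong when the selector has a different number of lanes than the input, and why rebuilding a splat in the result type avoids the problem. The names vec_perm, uniform_element and simplify_uniform_perm are hypothetical stand-ins for VEC_PERM_EXPR, uniform_vector_p and the build_vector_from_val workaround. As an assumption for illustration, the variable-length selector { 0, 1, 2, 3, ... } is modelled by a fixed 8-lane selector that repeats 0..3.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Vec = std::vector<int>;
using Sel = std::vector<std::size_t>;

// Toy model of VEC_PERM_EXPR <a, b, sel>: the result has one lane per
// selector lane, and each index is taken modulo the combined input length.
Vec vec_perm(const Vec &a, const Vec &b, const Sel &sel) {
  assert(!a.empty() && a.size() == b.size());
  Vec out;
  out.reserve(sel.size());
  for (std::size_t idx : sel) {
    std::size_t i = idx % (2 * a.size());
    out.push_back(i < a.size() ? a[i] : b[i - a.size()]);
  }
  return out;
}

// Hypothetical stand-in for uniform_vector_p: returns the repeated element
// if every lane of v is the same, otherwise nothing.
std::optional<int> uniform_element(const Vec &v) {
  if (v.empty())
    return std::nullopt;
  for (int x : v)
    if (x != v.front())
      return std::nullopt;
  return v.front();
}

// Hypothetical stand-in for the suggested workaround: when the input is
// uniform, build a splat with the lane count of the result (the selector),
// rather than returning the input vector unchanged.
std::optional<Vec> simplify_uniform_perm(const Vec &a, const Sel &sel) {
  if (auto el = uniform_element(a))
    return Vec(sel.size(), *el);
  return std::nullopt;
}
```

With a 4-lane input of zeros and the 8-lane selector, vec_perm produces 8 lanes, and simplify_uniform_perm returns the same 8-lane value; returning the input itself would produce only 4 lanes, which is the mismatch the original pattern ran into.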