On Fri, Sep 30, 2022 at 10:00 AM wrote:
>
> From: Ju-Zhe Zhong
>
> Hi, After fixing previous ICE.
> I add full implementation (insert permutation to get correct result.)
>
> The gimple IR is correct now I think:
>   # t_21 = PHI <_4(6), t_12(9)>
>   # i_22 = PHI
>   # vectp_a.6_26 = PHI
>   # vect_vec_recur_.9_9 = PHI
>   # vectp_b.11_7 = PHI
>   # curr_cnt_36 = PHI
>   # loop_len_20 = PHI
>   _38 = .WHILE_LEN (loop_len_20, 32, POLY_INT_CST [4, 4]);
>   while_len_37 = _38;
>   _1 = (long unsigned int) i_22;
>   _2 = _1 * 4;
>   _3 = a_14(D) + _2;
>   vect__4.8_19 = .LEN_LOAD (vectp_a.6_26, 32B, loop_len_20, 0);
>   _4 = *_3;
>   _5 = b_15(D) + _2;
>   vect_vec_recur_.9_9 = VEC_PERM_EXPR ;
>
> But I encounter another ICE:
> 0x169e0e7 process_bb
>         ../../../riscv-gcc/gcc/tree-ssa-sccvn.cc:7498
> 0x16a09af do_rpo_vn(function*, edge_def*, bitmap_head*, bool, bool, vn_lookup_kind)
>         ../../../riscv-gcc/gcc/tree-ssa-sccvn.cc:8109
> 0x16a0fe7 do_rpo_vn(function*, edge_def*, bitmap_head*)
>         ../../../riscv-gcc/gcc/tree-ssa-sccvn.cc:8205
> 0x179b7db execute
>         ../../../riscv-gcc/gcc/tree-vectorizer.cc:1365
>
> Could you help me with this? After fixing this ICE, I think the loop vectorizer
> can run correctly. Maybe you can test is in X86 or ARM after fixing this ICE.

Sorry for the late reply, the issue is that we have

  vect_vec_recur_.7_7 = VEC_PERM_EXPR ;

thus

+  for (unsigned i = 0; i < ncopies; ++i)
+    {
+      gphi *phi = as_a (STMT_VINFO_VEC_STMTS (def_stmt_info)[i]);
+      tree latch = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
+      tree recur = gimple_phi_result (phi);
+      gassign *assign
+        = gimple_build_assign (recur, VEC_PERM_EXPR, recur, latch, perm);
+      gimple_assign_set_lhs (assign, recur);

needs to create a new SSA name for each LHS.  You shouldn't create code in
vect_get_vec_defs_for_operand either.

Let me mangle the patch a bit.

The attached is what I came up with, the permutes need to be generated when
the backedge PHI values are filled in.
Still missing is handling of ncopies > 1; we'd need to think about how the
initial value and the permutes would work there.  Also missing are SLP
support and, more importantly, handling in the epilogue (so on x86 this
currently requires a constant loop bound).

I've added a testcase that triggers on x86_64.

Richard.
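
For readers following along, here is a rough scalar emulation of what the permute is meant to compute for a first-order recurrence. It is only a sketch under several assumptions: the loop body (b[i] = a[i] - t; t = a[i]) is a hypothetical example and not the attached testcase, VF = 4 is picked arbitrarily, and the selector { VF-1, VF, ..., 2*VF-2 } is my assumption of the permute mask, since the operands did not survive in the dump above. Like the patch in its current state, it has no epilogue, so it asserts that n is a multiple of VF.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4 /* hypothetical vectorization factor, for illustration only */

/* Scalar reference: each iteration uses the value loaded by the
   previous iteration (a first-order recurrence on t).  */
void
recur_scalar (const int *a, int *b, size_t n)
{
  int t = 0;
  for (size_t i = 0; i < n; i++)
    {
      b[i] = a[i] - t;
      t = a[i];
    }
}

/* Emulation of the vectorized loop, one "vector" of VF lanes per
   iteration.  recur[] plays the role of the vector PHI result and
   cur[] the value coming from the latch edge.  */
void
recur_vector (const int *a, int *b, size_t n)
{
  assert (n % VF == 0); /* no epilogue handling in this sketch */

  int recur[VF] = { 0 }; /* only the last lane (initial t) is ever used */
  for (size_t i = 0; i < n; i += VF)
    {
      int cur[VF];
      int shifted[VF]; /* permute result gets its own name, distinct from recur */

      for (size_t lane = 0; lane < VF; lane++)
        cur[lane] = a[i + lane];

      /* Assumed selector { VF-1, VF, ..., 2*VF-2 } over the
         concatenation of recur and cur.  */
      for (size_t lane = 0; lane < VF; lane++)
        {
          size_t sel = VF - 1 + lane;
          shifted[lane] = sel < VF ? recur[sel] : cur[sel - VF];
        }

      for (size_t lane = 0; lane < VF; lane++)
        b[i + lane] = cur[lane] - shifted[lane];

      /* Backedge: the PHI takes the current vector for the next iteration.  */
      for (size_t lane = 0; lane < VF; lane++)
        recur[lane] = cur[lane];
    }
}
```

Calling both functions on the same input whose length is a multiple of VF should give identical b[] arrays. Note that shifted[] is kept separate from recur[]; that loosely mirrors the point above that the permute result needs its own SSA name rather than reusing the PHI result.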