On Thu, Oct 6, 2022 at 2:13 PM Richard Biener wrote:
>
> On Fri, Sep 30, 2022 at 10:00 AM wrote:
> >
> > From: Ju-Zhe Zhong
> >
> > Hi, After fixing previous ICE.
> > I add full implementation (insert permutation to get correct result.)
> >
> > The gimple IR is correct now I think:
> >   # t_21 = PHI <_4(6), t_12(9)>
> >   # i_22 = PHI
> >   # vectp_a.6_26 = PHI
> >   # vect_vec_recur_.9_9 = PHI
> >   # vectp_b.11_7 = PHI
> >   # curr_cnt_36 = PHI
> >   # loop_len_20 = PHI
> >   _38 = .WHILE_LEN (loop_len_20, 32, POLY_INT_CST [4, 4]);
> >   while_len_37 = _38;
> >   _1 = (long unsigned int) i_22;
> >   _2 = _1 * 4;
> >   _3 = a_14(D) + _2;
> >   vect__4.8_19 = .LEN_LOAD (vectp_a.6_26, 32B, loop_len_20, 0);
> >   _4 = *_3;
> >   _5 = b_15(D) + _2;
> >   vect_vec_recur_.9_9 = VEC_PERM_EXPR ;
> >
> > But I encounter another ICE:
> > 0x169e0e7 process_bb
> >         ../../../riscv-gcc/gcc/tree-ssa-sccvn.cc:7498
> > 0x16a09af do_rpo_vn(function*, edge_def*, bitmap_head*, bool, bool, vn_lookup_kind)
> >         ../../../riscv-gcc/gcc/tree-ssa-sccvn.cc:8109
> > 0x16a0fe7 do_rpo_vn(function*, edge_def*, bitmap_head*)
> >         ../../../riscv-gcc/gcc/tree-ssa-sccvn.cc:8205
> > 0x179b7db execute
> >         ../../../riscv-gcc/gcc/tree-vectorizer.cc:1365
> >
> > Could you help me with this? After fixing this ICE, I think the loop vectorizer
> > can run correctly. Maybe you can test is in X86 or ARM after fixing this ICE.
>
> Sorry for the late reply, the issue is that we have
>
>   vect_vec_recur_.7_7 = VEC_PERM_EXPR { 7, 8, 9, 10, 11, 12, 13, 14 }>;
>
> thus
>
> +  for (unsigned i = 0; i < ncopies; ++i)
> +    {
> +      gphi *phi = as_a <gphi *> (STMT_VINFO_VEC_STMTS (def_stmt_info)[i]);
> +      tree latch = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
> +      tree recur = gimple_phi_result (phi);
> +      gassign *assign
> +       = gimple_build_assign (recur, VEC_PERM_EXPR, recur, latch, perm);
> +      gimple_assign_set_lhs (assign, recur);
>
> needs to create a new SSA name for each LHS.  You shouldn't create code in
> vect_get_vec_defs_for_operand either.
>
> Let me mangle the patch a bit.
>
> The attached is what I came up with, the permutes need to be generated when
> the backedge PHI values are filled in.  Missing are ncopies > 1 handling, we'd
> need to think of how the initial value and the permutes would work here, missing
> is SLP support but more importantly handling in the epilogue (so on x86 requires
> constant loop bound)
>
> I've added a testcase that triggers on x86_64.

Actually I broke it, the following is more correct.

Richard.

> Richard.
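
For readers following the thread, here is a minimal standalone sketch of what the permute with selector { 7, 8, ..., 14 } computes for an 8-lane first-order recurrence. It is a plain C++ model, not GCC code: vec_perm and recurrence_selector are hypothetical helper names, and the model assumes every selector index lies in [0, 2N). The selector picks the last lane of the previous iteration's vector followed by the first N-1 lanes of the current one, which is the "value from one scalar iteration earlier" for each lane.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Model of a two-input vector permute: result[i] is element sel[i] of the
// 2*N-element concatenation of v0 and v1.  (Hypothetical helper, assumes
// every selector index is in [0, 2*N).)
template <typename T, std::size_t N>
std::array<T, N>
vec_perm (const std::array<T, N> &v0, const std::array<T, N> &v1,
          const std::array<std::size_t, N> &sel)
{
  std::array<T, N> res{};
  for (std::size_t i = 0; i < N; ++i)
    {
      std::size_t idx = sel[i];
      assert (idx < 2 * N);
      res[i] = idx < N ? v0[idx] : v1[idx - N];
    }
  return res;
}

// Hypothetical helper building the selector { N-1, N, ..., 2N-2 }, i.e.
// { 7, 8, ..., 14 } for N == 8 as in the IR quoted above.
template <std::size_t N>
std::array<std::size_t, N>
recurrence_selector ()
{
  std::array<std::size_t, N> sel{};
  for (std::size_t i = 0; i < N; ++i)
    sel[i] = N - 1 + i;
  return sel;
}
```

Calling vec_perm (prev, cur, recurrence_selector<8> ()) with prev holding the previous iteration's lanes and cur the latch value gives the shifted vector. In the real vectorizer each such result has to be a fresh SSA name rather than reusing the PHI result, which is the point made above.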