From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ej1-x62c.google.com (mail-ej1-x62c.google.com [IPv6:2a00:1450:4864:20::62c]) by sourceware.org (Postfix) with ESMTPS id 4B7233858D39 for ; Thu, 29 Sep 2022 14:20:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4B7233858D39 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-ej1-x62c.google.com with SMTP id rk17so3154432ejb.1 for ; Thu, 29 Sep 2022 07:20:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date; bh=4BH1F2z18F4r2BokpVj6fEp3xlSFG5+AnLmxd77hrD0=; b=eQ+W+9V0ZM91SRVR8T6B/myckXfwcDhH6v6Cgxe+JhQXGI7xEmAHwv4QzAPWdsHJDy t/+ShR44cgqgWih8JVabZ7PD4phA2XTxCuxfTFk4ZFE9YMWs2PPx0t2zIergmnNpQziB sXiS6BV4JiVUIL8MMMfv+rXEpFPNbvDAWY8Dgh9FgS9jZoGo09IwkljR6Inot7GMvRzv 6CVo9hGPjJuhIq0murVzL5g2bj2z1mLGm0n41ONSQTjfuphKNNYT6QCL3aSPv0mnnof8 AhxU1AU5Fszve2UsYEu4ErSjnlDUwhb5Md6ptlLgERcd9WehThoK5rzpZK3hPo7hnUYI IRQg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date; bh=4BH1F2z18F4r2BokpVj6fEp3xlSFG5+AnLmxd77hrD0=; b=VeIVZW5A1aLU3C5cGF/SG/cDN2jf+6rfPJaL8KZopwVh+koK4TERT5+QDSBEMyj/1d geAway1Wde9CTyNNhvWeHVaxEoIujpV8243t7EaN3oJSpYSCwpv6vufNmi9T/UGEimR+ GreDwIyLcqJjkFuaXjDYfB0yfclgBdzyWKowcRDYNKLm/1ZlwhIrnMapJM5Aa9OiR8yn lfDZoddVD3XviLmySMxnlRB9GZ6PTEO9C72+rtbAuqs218hqGm2z+/w43P9TR3vuMmsE TV88bxNkCa0lhAlAnjKIbhQyAHOjfHNYPdo1tflysL4lSnGfeJ4mAu400Y4BNC5yAjfu uZzA== X-Gm-Message-State: ACrzQf1hYE5wsNBmj7O8zlpkM7J4g5IRSODKbqgs/9uRwETWlMyt0adN OP0SGKxf9XgO6xU5t6fn8TMEsey3nSODvZySjAE= X-Google-Smtp-Source: AMsMyM4aE6CvBvjnxRiX5j+SlQxgvBDYAIhIWzCemc2T/mjsvXLtoPV9ib/kTS4nHsAj2G/od4wb6YYtbqW62B6c73A= X-Received: by 2002:a17:907:7632:b0:76f:f6e7:36cf with SMTP id jy18-20020a170907763200b0076ff6e736cfmr2977464ejc.442.1664461221728; Thu, 29 Sep 2022 07:20:21 -0700 (PDT) MIME-Version: 1.0 References: <20220929110723.277330-1-juzhe.zhong@rivai.ai> In-Reply-To: <20220929110723.277330-1-juzhe.zhong@rivai.ai> From: Richard Biener Date: Thu, 29 Sep 2022 16:20:09 +0200 Message-ID: Subject: Re: [Unfinished PATCH] Add first-order recurrence autovectorization To: juzhe.zhong@rivai.ai Cc: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-8.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, Sep 29, 2022 at 1:07 PM wrote: > > From: Ju-Zhe Zhong > > gcc/ChangeLog: > > * tree-vect-loop.cc (vect_phi_first_order_recurrence_p): New function. > (vect_analyze_scalar_cycles_1): Classify first-order recurrence phi. > (vect_analyze_loop_operations): Add first-order recurrence autovectorization support. > (vectorizable_dep_phi): New function. > (vect_use_first_order_phi_result_p): New function. > (vect_transform_loop): Add first-order recurrence autovectorization support. > * tree-vect-stmts.cc (vect_transform_stmt): Ditto. > (vect_is_simple_use): Ditto. > * tree-vectorizer.h (enum vect_def_type): New enum. > (enum stmt_vec_info_type): Ditto. > (vectorizable_dep_phi): New function. > > Hi, since Richard said I can post unfinished for help, I post it. > This patch is for fix issue:https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99409. > LLVM can vectorize this case using first-order recurrence loop-vectorizer. > This patch is inspired by first-order recurrence autovectorization support in LLVM: > https://reviews.llvm.org/D16197 > There is a link that I can show you several cases that GCC fails vectorization > because no support of firs-order recurrence vectorization: https://godbolt.org/z/nzf1Wrd6T > > Let's consider a simple case that I simplify: > void foo (int32_t * __restrict__ a, int32_t * __restrict__ b, int32_t * __restrict__ c, int n) > { > int32_t t = *c; > for (int i = 0; i < n; ++i) > { > b[i] = a[i] - t; > t = a[i]; > } > } > > Applying this patch, my downstream RVV GCC can vectorize with -fdump-tree-vect: > > note: vect_is_simple_use: operand t_21 = PHI <_4(6), t_12(5)>, type of def: first order recurrence > > However, it ICE in "dce6" when removing PHI node "t_21 = PHI <_4(6), t_12(5)>": > 0x143c174 crash_signal > ../../../riscv-gcc/gcc/toplev.cc:322 > 0x170d4fd delink_imm_use > ../../../riscv-gcc/gcc/ssa-iterators.h:257 > I was stuck by this issue. Besides, this patch has more 2 more things to do that I didn't implement: The issue is you're using gimple *first_use = first_imm_use_stmt (&imm, phi_result); but that's an internal function, you need to use gimple *first_use; use_operand_p use_p; single_imm_use (phi_result, &use_p, &first_use); Otherwise you are corrupting the immediate use list. When that's fixed it seems to work (in the partial way it's implemented). > 1. insert VEC_PERM before the vector subtraction statement (Because I was stuck, I didn't continue > implementing this patch and miss this.) > 2. Support this vectorization in SLP autovectorizaiton. > > To understand this patch, 2 functions are important: > > 1. vect_phi_first_order_recurrence_p, this function is used to forbid the cases that can not be vectorized > by this vectorizer. The constraints there are strictly the same as LLVM. > 2. vectorizable_dep_phi, the implementation of first-order recurrence vectorizer. > > I hope someone can help me fix && finish && test && refine this patch. > > Thanks. > > --- > gcc/tree-vect-loop.cc | 239 ++++++++++++++++++++++++++++++++++++++++- > gcc/tree-vect-stmts.cc | 12 ++- > gcc/tree-vectorizer.h | 4 + > 3 files changed, 252 insertions(+), 3 deletions(-) > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index 2536cc3cf49..adb48356c23 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tree-vect-loop.cc > @@ -529,6 +529,57 @@ vect_inner_phi_in_double_reduction_p (loop_vec_info loop_vinfo, gphi *phi) > return false; > } > > +/* Returns true if Phi is a first-order recurrence. A first-order > + recurrence is a non-reduction recurrence relation in which the value of > + the recurrence in the current loop iteration equals a value defined in > + the previous iteration. */ > + > +static bool > +vect_phi_first_order_recurrence_p (loop_vec_info loop_vinfo, class loop *loop, > + gphi *phi) > +{ > + /* Ensure the phi node is in the loop header and has two incoming values. */ > + if (gimple_bb (phi) != loop->header || gimple_phi_num_args (phi) != 2) > + return false; > + > + /* Ensure the loop has a preheader and a single latch block. The loop > + vectorizer will need the latch to set up the next iteration of the loop. */ > + edge preheader = loop_preheader_edge (loop); > + edge latch = loop_latch_edge (loop); > + if (!preheader || !latch) > + return false; > + > + /* Ensure the phi node's incoming blocks are the loop preheader and latch. */ > + if (!PHI_ARG_DEF_FROM_EDGE (phi, preheader) > + || !PHI_ARG_DEF_FROM_EDGE (phi, latch)) > + return false; > + > + /* Get the previous value. The previous value comes from the latch edge while > + the initial value comes form the preheader edge. */ > + gimple *previous = SSA_NAME_DEF_STMT (PHI_ARG_DEF_FROM_EDGE (phi, latch)); > + if (!previous) > + return false; > + > + /* Ensure every use_stmt of the phi node is dominated by the previous value. > + The dominance requirement ensures the loop vectorizer will not need to > + vectorize the initial value prior to the first iteration of the loop. */ > + gimple *use_stmt; > + imm_use_iterator imm_iter; > + FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, gimple_phi_result (phi)) > + if (use_stmt) > + if (!dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt), > + gimple_bb (previous))) > + return false; > + > + /* First-order recurrence autovectorization needs shuffle vector. */ > + tree scalar_type = TREE_TYPE (PHI_RESULT (phi)); > + tree vectype = get_vectype_for_scalar_type (loop_vinfo, scalar_type); > + if (!can_vec_perm_var_p (TYPE_MODE (vectype))) > + return false; > + > + return true; > +} > + > /* Function vect_analyze_scalar_cycles_1. > > Examine the cross iteration def-use cycles of scalar variables > @@ -666,6 +717,8 @@ vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, class loop *loop, > } > } > } > + else if (vect_phi_first_order_recurrence_p (loop_vinfo, loop, phi)) > + STMT_VINFO_DEF_TYPE (stmt_vinfo) = vect_first_order_recurrence; > else > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > @@ -1808,9 +1861,13 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo) > > gcc_assert (stmt_info); > > + if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_first_order_recurrence) > + ok = vectorizable_dep_phi (loop_vinfo, stmt_info, NULL, NULL); > + > if ((STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_scope > || STMT_VINFO_LIVE_P (stmt_info)) > - && STMT_VINFO_DEF_TYPE (stmt_info) != vect_induction_def) > + && STMT_VINFO_DEF_TYPE (stmt_info) != vect_induction_def > + && STMT_VINFO_DEF_TYPE (stmt_info) != vect_first_order_recurrence) > /* A scalar-dependence cycle that we don't support. */ > return opt_result::failure_at (phi, > "not vectorized:" > @@ -8267,6 +8324,120 @@ vectorizable_phi (vec_info *, > return true; > } > > +/* Vectorizes Dep PHIs. */ > + > +bool > +vectorizable_dep_phi (loop_vec_info loop_vinfo, stmt_vec_info stmt_info, > + gimple **vec_stmt, slp_tree slp_node) > +{ > + if (!loop_vinfo || !is_a (stmt_info->stmt)) > + return false; > + > + gphi *phi = as_a (stmt_info->stmt); > + > + /* So far we only support first-order recurrence auto-vectorization. */ > + if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_first_order_recurrence) > + return false; > + > + if (!vec_stmt) /* transformation not required. */ > + { > + /* So far we don't support first-order recurrence for SLP > + * auto-vectorization. */ > + if (slp_node) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "do not support first-order recurrence" > + "autovectorization for SLP node\n"); > + return false; > + } > + STMT_VINFO_TYPE (stmt_info) = dep_phi_info_type; > + return true; > + } > + > + /* This is the second phase of vectorizing first-order rececurrences. An > + overview of the transformation is described below. Suppose we have the > + following loop. > + > + int32_t t = 0; > + for (int i = 0; i < n; ++i) > + { > + b[i] = a[i] - t; > + t = a[i]; > + } > + > + There is a first-order recurrence on "a". For this loop, the shorthand > + scalar IR looks like: > + > + scalar.preheader: > + init = a[-1] > + br loop.body > + > + scalar.body: > + i = PHI <0(scalar.preheader), i+1(scalar.body)> > + _2 = PHI <(init(scalar.preheader), <_1(scalar.body)> > + _1 = a[i] > + b[i] = _1 - _2 > + br cond, scalar.body, ... > + > + In this example, _2 is a recurrence because it's value depends on the > + previous iteration. In the first phase of vectorization, we created a > + temporary value for _2. We now complete the vectorization and produce the > + shorthand vector IR shown below (VF = 4). > + > + vector.preheader: > + vect_init = vect_cst(..., ..., ..., a[-1]) > + br vector.body > + > + vector.body > + i = PHI <0(vector.preheader), i+4(vector.body)> > + vect_1 = PHI > + vect_2 = a[i, i+1, i+2, i+3]; > + vect_3 = vector(vect_1(3), vect_2(0, 1, 2)) > + b[i, i+1, i+2, i+3] = vect_2 - vect_3 > + br cond, vector.body, middle.block > + > + middle.block: > + x = vect_2(3) > + br scalar.preheader > + > + scalar.ph: > + s_init = PHI > + br scalar.body > + > + After execution completes the vector loop, we extract the next value of > + the recurrence (x) to use as the initial value in the scalar loop. */ > + > + class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); > + tree vectype = STMT_VINFO_VECTYPE (stmt_info); > + tree scalar_dest = gimple_phi_result (stmt_info->stmt); > + basic_block bb = gimple_bb (stmt_info->stmt); > + tree vec_dest = vect_create_destination_var (scalar_dest, vectype); > + auto_vec vec_oprnds; > + tree preheader = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop)); > + tree latch = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop)); > + gimple_seq stmts = NULL; > + > + tree vec_init = build_vector_from_val (vectype, preheader); > + vec_init = vect_init_vector (loop_vinfo, stmt_info, > + vec_init, vectype, NULL); > + vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, > + !slp_node ? vect_get_num_copies (loop_vinfo, vectype) : 1, > + gimple_phi_arg_def (stmt_info->stmt, 0), &vec_oprnds); > + /* Create the vectorized first-order PHI node. */ > + gphi *new_phi = create_phi_node (vec_dest, bb); > + add_phi_arg (new_phi, vec_oprnds[0], loop_latch_edge (loop), > + UNKNOWN_LOCATION); > + add_phi_arg (new_phi, vec_init, loop_preheader_edge (loop), UNKNOWN_LOCATION); > + if (slp_node) > + SLP_TREE_VEC_STMTS (slp_node).quick_push (new_phi); > + else > + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_phi); > + if (!slp_node) > + *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0]; > + return true; > +} > + > /* Return true if VECTYPE represents a vector that requires lowering > by the vector lowering pass. */ > > @@ -10476,6 +10647,26 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance) > epilogue_vinfo->shared->save_datarefs (); > } > > +/* Return true if the stmt is the first statement that uses the result > + of a first-order recurrence phi node. */ > + > +static gphi * > +vect_use_first_order_phi_result_p (auto_vec &first_order_phi_vec, > + gimple *stmt) > +{ > + for (unsigned int i = 0; i < first_order_phi_vec.length (); ++i) > + { > + imm_use_iterator imm; > + gphi *phi = first_order_phi_vec[i]; > + tree phi_result = PHI_RESULT (phi); > + gimple *first_use = first_imm_use_stmt (&imm, phi_result); > + if (first_use && first_use == stmt) > + return phi; > + } > + > + return NULL; > +} > + > /* Function vect_transform_loop. > > The analysis phase has determined that the loop is vectorizable. > @@ -10624,6 +10815,31 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call) > basic_block bb = bbs[i]; > stmt_vec_info stmt_info; > > + /* It's used to save the first-order recurrence phi node. > + Consider the following case: > + # t_19 = PHI <_4(6), 0(5)> > + ... > + _4 = *_3; > + ... > + _6 = _4 - t_19; > + > + The vectorization sequence should be: > + 1. Vectorize _4 = *_3; > + 2. Vectorize # t_19 = PHI <_4(6), 0(5)> > + 3, Vectorize _6 = _4 - t_19; > + > + Flow: > + 1. So we save the first-order recurrence PHI in first_order_phi_vec > + and skip vectorizing it tentatively when iterating phi statement > + (save t_19 = PHI <_4(6), 0(5)>). > + 2. Vectorize all statements when iterating all statements > + until we reach the statement who is using the result of > + first-order recurrence PHI (vectorize _4 = *_3;). > + 3. Stop iterating and return to vectorize the PHI saved > + in first_order_phi_vec (# t_19 = PHI <_4(6), 0(5)>). > + 4. Conintue iterating and vectorize the residual statements. */ > + auto_vec first_order_phi_vec; > + > for (gphi_iterator si = gsi_start_phis (bb); !gsi_end_p (si); > gsi_next (&si)) > { > @@ -10677,9 +10893,16 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call) > || STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def > || STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def > || STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle > - || STMT_VINFO_DEF_TYPE (stmt_info) == vect_internal_def) > + || STMT_VINFO_DEF_TYPE (stmt_info) == vect_internal_def > + || STMT_VINFO_DEF_TYPE (stmt_info) == vect_first_order_recurrence) > && ! PURE_SLP_STMT (stmt_info)) > maybe_set_vectorized_backedge_value (loop_vinfo, stmt_info); > + > + if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_first_order_recurrence) > + { > + first_order_phi_vec.safe_push (phi); > + continue; > + } > } > > for (gimple_stmt_iterator si = gsi_start_bb (bb); > @@ -10698,6 +10921,18 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call) > /* Ignore vector stmts created in the outer loop. */ > stmt_info = loop_vinfo->lookup_stmt (stmt); > > + gphi *first_order_phi > + = vect_use_first_order_phi_result_p (first_order_phi_vec, stmt); > + if (first_order_phi) > + { > + stmt_vec_info first_order_stmt_info > + = loop_vinfo->lookup_stmt (first_order_phi); > + gimple_stmt_iterator first_order_si > + = gsi_for_stmt (first_order_phi); > + vect_transform_loop_stmt (loop_vinfo, first_order_stmt_info, > + &first_order_si, NULL); > + } > + > /* vector stmts created in the outer-loop during vectorization of > stmts in an inner-loop may not have a stmt_info, and do not > need to be vectorized. */ > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > index c8d1efc45e5..a8c0a8a4151 100644 > --- a/gcc/tree-vect-stmts.cc > +++ b/gcc/tree-vect-stmts.cc > @@ -11404,6 +11404,12 @@ vect_transform_stmt (vec_info *vinfo, > gcc_assert (done); > break; > > + case dep_phi_info_type: > + done = vectorizable_dep_phi (as_a (vinfo), > + stmt_info, &vec_stmt, slp_node); > + gcc_assert (done); > + break; > + > case phi_info_type: > done = vectorizable_phi (vinfo, stmt_info, &vec_stmt, slp_node, NULL); > gcc_assert (done); > @@ -11804,6 +11810,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt, > case vect_nested_cycle: > dump_printf (MSG_NOTE, "nested cycle\n"); > break; > + case vect_first_order_recurrence: > + dump_printf (MSG_NOTE, "first order recurrence\n"); > + break; > case vect_unknown_def_type: > dump_printf (MSG_NOTE, "unknown\n"); > break; > @@ -11852,7 +11861,8 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt, > || *dt == vect_induction_def > || *dt == vect_reduction_def > || *dt == vect_double_reduction_def > - || *dt == vect_nested_cycle) > + || *dt == vect_nested_cycle > + || *dt == vect_first_order_recurrence) > { > *vectype = STMT_VINFO_VECTYPE (def_stmt_info); > gcc_assert (*vectype != NULL_TREE); > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > index 4870c754499..2c8e7660b4e 100644 > --- a/gcc/tree-vectorizer.h > +++ b/gcc/tree-vectorizer.h > @@ -65,6 +65,7 @@ enum vect_def_type { > vect_reduction_def, > vect_double_reduction_def, > vect_nested_cycle, > + vect_first_order_recurrence, > vect_unknown_def_type > }; > > @@ -1027,6 +1028,7 @@ enum stmt_vec_info_type { > cycle_phi_info_type, > lc_phi_info_type, > phi_info_type, > + dep_phi_info_type, > loop_exit_ctrl_vec_info_type > }; > > @@ -2331,6 +2333,8 @@ extern bool vectorizable_lc_phi (loop_vec_info, stmt_vec_info, > gimple **, slp_tree); > extern bool vectorizable_phi (vec_info *, stmt_vec_info, gimple **, slp_tree, > stmt_vector_for_cost *); > +extern bool vectorizable_dep_phi (loop_vec_info, stmt_vec_info, > + gimple **, slp_tree); > extern bool vect_emulated_vector_p (tree); > extern bool vect_can_vectorize_without_simd_p (tree_code); > extern bool vect_can_vectorize_without_simd_p (code_helper); > -- > 2.36.1 >