From: Richard Sandiford
To: "juzhe.zhong@rivai.ai"
Cc: gcc-patches, rguenther
Subject: Re: [PATCH] VECT: Add decrement IV iteration loop control by variable amount support
References: <20230425134229.181115-1-juzhe.zhong@rivai.ai> <8D207B8D88895F0E+20230426121547140175172@rivai.ai> <07D5046915CF0505+20230426180326365636207@rivai.ai>
In-Reply-To: <07D5046915CF0505+20230426180326365636207@rivai.ai> (juzhe's message of "Wed, 26 Apr 2023 18:03:27 +0800")
Date: Wed, 26 Apr 2023 12:49:03 +0100

"juzhe.zhong@rivai.ai" writes:
> Hi, Richard.
> Would you mind take a look at the loop control part again:
>
> static gcond *
> vect_set_loop_condition_partial_vectors (class loop *loop,
>                                          loop_vec_info loop_vinfo, tree niters,
>                                          tree final_iv, bool niters_maybe_zero,
>                                          gimple_stmt_iterator loop_cond_gsi)
> ...
>   tree loop_len_x = NULL_TREE;
>   FOR_EACH_VEC_ELT (*controls, i, rgc)
>     if (!rgc->controls.is_empty ())
>       {
>         ...
>
>         /* Set up all controls for this group.  */
>         if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
>                                             OPTIMIZE_FOR_SPEED))
>           test_ctrl
>             = vect_set_loop_controls_by_select_vl (loop, loop_vinfo,
>                                                    &preheader_seq, &header_seq,
>                                                    rgc, niters, &loop_len_x);
>         ...
>       }
>
> static tree
> vect_set_loop_controls_by_select_vl (class loop *loop, loop_vec_info loop_vinfo,
>                                      gimple_seq *preheader_seq,
>                                      gimple_seq *header_seq,
>                                      rgroup_controls *rgc, tree niters, tree *x)
> {
>   tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
>   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
>   /* We are not allowing masked approach in SELECT_VL.  */
>   gcc_assert (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo));
>
>   tree ctrl_type = rgc->type;
>   unsigned int nitems_per_iter = rgc->max_nscalars_per_iter * rgc->factor;
>   poly_uint64 nitems_per_ctrl = TYPE_VECTOR_SUBPARTS (ctrl_type) * rgc->factor;
>   poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>
>   /* Calculate the maximum number of item values that the rgroup
>      handles in total, the number that it handles for each iteration
>      of the vector loop.  */
>   tree nitems_total = niters;
>   if (nitems_per_iter != 1)
>     {
>       /* We checked before setting LOOP_VINFO_USING_PARTIAL_VECTORS_P that
>          these multiplications don't overflow.  */
>       tree compare_factor = build_int_cst (compare_type, nitems_per_iter);
>       nitems_total = gimple_build (preheader_seq, MULT_EXPR, compare_type,
>                                    nitems_total, compare_factor);
>     }
>
>   /* Convert the comparison value to the IV type (either a no-op or
>      a promotion).  */
>   nitems_total = gimple_convert (preheader_seq, iv_type, nitems_total);
>
>   /* Create an induction variable that counts the number of items
>      processed.  */
>   tree index_before_incr, index_after_incr;
>   gimple_stmt_iterator incr_gsi;
>   bool insert_after;
>   standard_iv_increment_position (loop, &incr_gsi, &insert_after);
>
>   /* Test the decremented IV, which will never underflow 0 since we have
>      IFN_SELECT_VL to guarantee that.  */
>   tree test_limit = nitems_total;
>
>   /* Provide a definition of each control in the group.  */
>   tree ctrl;
>   unsigned int i;
>   FOR_EACH_VEC_ELT_REVERSE (rgc->controls, i, ctrl)
>     {
>       /* Previous controls will cover BIAS items.  This control covers the
>          next batch.  */
>       poly_uint64 bias = nitems_per_ctrl * i;
>       tree bias_tree = build_int_cst (iv_type, bias);
>
>       /* Rather than have a new IV that starts at TEST_LIMIT and goes down to
>          BIAS, prefer to use the same TEST_LIMIT - BIAS based IV for each
>          control and adjust the bound down by BIAS.  */
>       tree this_test_limit = test_limit;
>       if (i != 0)
>         {
>           this_test_limit = gimple_build (preheader_seq, MAX_EXPR, iv_type,
>                                           this_test_limit, bias_tree);
>           this_test_limit = gimple_build (preheader_seq, MINUS_EXPR, iv_type,
>                                           this_test_limit, bias_tree);
>         }
>
>       /* Create decrement IV.  */
>       create_iv (this_test_limit, MINUS_EXPR, ctrl, NULL_TREE, loop, &incr_gsi,
>                  insert_after, &index_before_incr, &index_after_incr);
>
>       tree res_len;
>       if (rgc->controls.length () != 1)
>         {
>           if (nitems_per_iter == 1)
>             {
>               /* Generate length = (X - VF*I/N) capped to the range [0, VF/N].  */
>               /* step = VF * I / N.  */
>               tree step
>                 = build_int_cst (iv_type,
>                                  exact_div (vf * i, rgc->controls.length ()));
>               /* Make sure (X - VF*I/N) never underflow zero.  */
>               tree max = gimple_build (header_seq, MAX_EXPR, iv_type, *x, step);
>               res_len
>                 = gimple_build (header_seq, MIN_EXPR, iv_type,
>                                 index_before_incr,
>                                 build_int_cst (iv_type, vf * nitems_per_iter));
>             }
>           else
>             {
>               /* For SLP, we can't allow non-VF number of elements to be
>                  processed in non-final iteration.  We force the number of
>                  elements to be processed in each non-final iteration is VF
>                  elements.  If we allow non-VF elements processing in non-final
>                  iteration will make SLP too complicated and produce inferior
>                  codegen.
>
>                  For example:
>
>                  If non-final iteration process VF elements.
>
>                    ...
>                    .LEN_STORE (vectp_f.8_51, 128B, _71, { 1, 2, 1, 2 }, 0);
>                    .LEN_STORE (vectp_f.8_56, 128B, _72, { 1, 2, 1, 2 }, 0);
>                    ...
>
>                  If non-final iteration process non-VF elements.
>
>                    ...
>                    .LEN_STORE (vectp_f.8_51, 128B, _71, { 1, 2, 1, 2 }, 0);
>                    if (_71 % 2 == 0)
>                      .LEN_STORE (vectp_f.8_56, 128B, _72, { 1, 2, 1, 2 }, 0);
>                    else
>                      .LEN_STORE (vectp_f.8_56, 128B, _72, { 2, 1, 2, 1 }, 0);
>                    ...
>
>                  This is the simple case of 2-elements interleaved vector SLP.
>                  We consider other interleave vector, the situation will become
>                  more complicated.  */
>               res_len
>                 = gimple_build (header_seq, MIN_EXPR, iv_type,
>                                 index_before_incr,
>                                 build_int_cst (iv_type, vf * nitems_per_iter));
>             }
>         }
>       else
>         {
>           res_len
>             = gimple_build (header_seq, IFN_SELECT_VL, iv_type,
>                             index_before_incr, build_int_cst (iv_type, vf));
>         }
>       gassign *assign = gimple_build_assign (ctrl, res_len);
>       gimple_seq_add_stmt (header_seq, assign);
>       if (rgc->controls.length () == 1)
>         *x = ctrl;
>     }
>
>   return index_after_incr;
> }
>
> Am I understand correctly ? I think I may need to implement VEC_PACK_TRUNC to test your idea.

The formatting didn't reach me intact, so TBH it's a bit difficult to
follow.  But in this code VF/N should be equivalent to nitems_per_ctrl.
There shouldn't be any need to build an exact_div.

Also, only the single-control rgroup is/needs an IV.
The other cases just go at the start of the loop body, using the
single-control IV as input.

Unless I'm missing something, the same length > 1 code could be used for
both SLP and non-SLP.  (The length == 1 handling would still be
different for SLP and non-SLP.)

Thanks,
Richard
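
For readers following the thread, the arithmetic discussed above can be
sketched outside the vectorizer roughly as follows.  This is only an
illustration in plain C++, not GCC code: split_len and its parameter
names are made up here, and it assumes that X is the length produced by
the single-control IV for the current iteration and that each of the N
controls covers at most nitems_per_ctrl (that is, VF/N) items.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of GCC: given the number of items X that
// the single-control IV allows for this iteration, derive the length for
// each of the N controls of a multi-control rgroup.
std::vector<uint64_t>
split_len (uint64_t x, uint64_t nitems_per_ctrl, unsigned n)
{
  std::vector<uint64_t> lens (n);
  for (unsigned i = 0; i < n; ++i)
    {
      // Earlier controls cover the first BIAS items.
      uint64_t bias = nitems_per_ctrl * i;
      // MAX followed by MINUS keeps X - BIAS from wrapping below zero.
      uint64_t remaining = std::max (x, bias) - bias;
      // MIN caps the result at what a single control can hold.
      lens[i] = std::min (remaining, nitems_per_ctrl);
    }
  return lens;
}
```

With two controls of four items each, an X of 5 would give lengths 4
and 1, and an X of 3 would give 3 and 0.  No separate IV and no
exact_div are needed for the extra controls in this model; each length
is derived from X at the start of the loop body.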