From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 98EA93858D28 for ; Fri, 19 May 2023 10:23:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 98EA93858D28 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 2AE791FB; Fri, 19 May 2023 03:23:51 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id B18523F762; Fri, 19 May 2023 03:23:05 -0700 (PDT) From: Richard Sandiford To: juzhe.zhong@rivai.ai Mail-Followup-To: juzhe.zhong@rivai.ai,gcc-patches@gcc.gnu.org, rguenther@suse.de, richard.sandiford@arm.com Cc: gcc-patches@gcc.gnu.org, rguenther@suse.de Subject: Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer References: <20230516102319.163752-1-juzhe.zhong@rivai.ai> Date: Fri, 19 May 2023 11:23:04 +0100 In-Reply-To: <20230516102319.163752-1-juzhe.zhong@rivai.ai> (juzhe zhong's message of "Tue, 16 May 2023 18:23:19 +0800") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-29.0 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Thanks for the update. I'll split this review into two pieces. Second piece to follow (not sure when, but hopefully soon). juzhe.zhong@rivai.ai writes: > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index ed0166fedab..6f49bdee009 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tree-vect-loop.cc > @@ -10364,12 +10375,14 @@ vect_record_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens, > rgroup that operates on NVECTORS vectors, where 0 <= INDEX < NVECTORS. */ > > tree > -vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens, > - unsigned int nvectors, unsigned int index) > +vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi, > + vec_loop_lens *lens, unsigned int nvectors, tree vectype, > + unsigned int index) > { > rgroup_controls *rgl = &(*lens)[nvectors - 1]; > bool use_bias_adjusted_len = > LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) != 0; > + tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo); > > /* Populate the rgroup's len array, if this is the first time we've > used it. */ > @@ -10400,6 +10413,26 @@ vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens, > > if (use_bias_adjusted_len) > return rgl->bias_adjusted_ctrl; > + else if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)) > + { > + tree loop_len = rgl->controls[index]; > + poly_int64 nunits1 = TYPE_VECTOR_SUBPARTS (rgl->type); > + poly_int64 nunits2 = TYPE_VECTOR_SUBPARTS (vectype); > + if (maybe_ne (nunits1, nunits2)) > + { > + /* A loop len for data type X can be reused for data type Y > + if X has N times more elements than Y and if Y's elements > + are N times bigger than X's. */ > + gcc_assert (multiple_p (nunits1, nunits2)); > + unsigned int factor = exact_div (nunits1, nunits2).to_constant (); > + gimple_seq seq = NULL; > + loop_len = gimple_build (&seq, RDIV_EXPR, iv_type, loop_len, > + build_int_cst (iv_type, factor)); > + if (seq) > + gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT); > + } > + return loop_len; > + } I don't think this is a property of decrementing IVs. IIUC it's really a property of rgl->factor == 1 && factor == 1, where factor would need to be passed in by the caller. Because of that, it should probably be a separate patch. That is, current LOAD_LEN targets have two properties (IIRC): (1) all vectors used in a given piece of vector code have the same byte size (2) lengths are measured in bytes rather than elements For all cases, including SVE, the number of controls needed for a scalar statement is equal to the number of vectors needed for that scalar statement. Because of (1), on current LOADL_LEN targets, the number of controls needed for a scalar statement is also proportional to the total number of bytes occupied by the vectors generated for that scalar statement. And because of (2), the total number of bytes is the only thing that matters, so all users of a particular control can use the same control value. E.g. on current LOAD_LEN targets, 2xV16QI and 2xV8HI would use the same control (with no adjustment). 2xV16QI means 32 elements, while 2xV8HI means 16 elements. V16QI's nscalars_per_iter would therefore be double V8HI's, but V8HI's factor would be double V16QI's (2 vs 1), so things even out. The code structurally supports targets that count in elements rather than bytes, so that factor==1 for all element types. See the "rgl->factor == 1 && factor == 1" case in: if (rgl->max_nscalars_per_iter < nscalars_per_iter) { /* For now, we only support cases in which all loads and stores fall back to VnQI or none do. */ gcc_assert (!rgl->max_nscalars_per_iter || (rgl->factor == 1 && factor == 1) || (rgl->max_nscalars_per_iter * rgl->factor == nscalars_per_iter * factor)); rgl->max_nscalars_per_iter = nscalars_per_iter; rgl->type = vectype; rgl->factor = factor; } But it hasn't been tested, since no current target uses it. I think the above part of the patch shows that the current "factor is always 1" path is in fact broken, and the patch is a correctness fix on targets that measure in elements rather than bytes. So I think the above part of the patch should go in ahead of the IV changes. But the test should be based on factor rather than TYPE_VECTOR_SUBPARTS. Thanks, Richard