From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=rrce=BI=arm.com=richard.sandiford@sourceware.org>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id 98EA93858D28
	for <gcc-patches@gcc.gnu.org>; Fri, 19 May 2023 10:23:06 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 98EA93858D28
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 2AE791FB;
	Fri, 19 May 2023 03:23:51 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id B18523F762;
	Fri, 19 May 2023 03:23:05 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: juzhe.zhong@rivai.ai
Mail-Followup-To: juzhe.zhong@rivai.ai,gcc-patches@gcc.gnu.org,  rguenther@suse.de, richard.sandiford@arm.com
Cc: gcc-patches@gcc.gnu.org,  rguenther@suse.de
Subject: Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer
References: <20230516102319.163752-1-juzhe.zhong@rivai.ai>
Date: Fri, 19 May 2023 11:23:04 +0100
In-Reply-To: <20230516102319.163752-1-juzhe.zhong@rivai.ai> (juzhe zhong's
	message of "Tue, 16 May 2023 18:23:19 +0800")
Message-ID: <mpt5y8otybb.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-29.0 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Thanks for the update.  I'll split this review into two pieces.
Second piece to follow (not sure when, but hopefully soon).

juzhe.zhong@rivai.ai writes:
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index ed0166fedab..6f49bdee009 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10364,12 +10375,14 @@ vect_record_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
>     rgroup that operates on NVECTORS vectors, where 0 <= INDEX < NVECTORS.  */
>  
>  tree
> -vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
> -		   unsigned int nvectors, unsigned int index)
> +vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
> +		   vec_loop_lens *lens, unsigned int nvectors, tree vectype,
> +		   unsigned int index)
>  {
>    rgroup_controls *rgl = &(*lens)[nvectors - 1];
>    bool use_bias_adjusted_len =
>      LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) != 0;
> +  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
>  
>    /* Populate the rgroup's len array, if this is the first time we've
>       used it.  */
> @@ -10400,6 +10413,26 @@ vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
>  
>    if (use_bias_adjusted_len)
>      return rgl->bias_adjusted_ctrl;
> +  else if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> +    {
> +      tree loop_len = rgl->controls[index];
> +      poly_int64 nunits1 = TYPE_VECTOR_SUBPARTS (rgl->type);
> +      poly_int64 nunits2 = TYPE_VECTOR_SUBPARTS (vectype);
> +      if (maybe_ne (nunits1, nunits2))
> +	{
> +	  /* A loop len for data type X can be reused for data type Y
> +	     if X has N times more elements than Y and if Y's elements
> +	     are N times bigger than X's.  */
> +	  gcc_assert (multiple_p (nunits1, nunits2));
> +	  unsigned int factor = exact_div (nunits1, nunits2).to_constant ();
> +	  gimple_seq seq = NULL;
> +	  loop_len = gimple_build (&seq, RDIV_EXPR, iv_type, loop_len,
> +				   build_int_cst (iv_type, factor));
> +	  if (seq)
> +	    gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
> +	}
> +      return loop_len;
> +    }

I don't think this is a property of decrementing IVs.  IIUC it's really
a property of rgl->factor == 1 && factor == 1, where factor would need
to be passed in by the caller.  Because of that, it should probably be
a separate patch.

That is, current LOAD_LEN targets have two properties (IIRC):

(1) all vectors used in a given piece of vector code have the same byte size
(2) lengths are measured in bytes rather than elements

For all cases, including SVE, the number of controls needed for a scalar
statement is equal to the number of vectors needed for that scalar
statement.

Because of (1), on current LOADL_LEN targets, the number of controls
needed for a scalar statement is also proportional to the total number
of bytes occupied by the vectors generated for that scalar statement.
And because of (2), the total number of bytes is the only thing that
matters, so all users of a particular control can use the same control
value.

E.g. on current LOAD_LEN targets, 2xV16QI and 2xV8HI would use the same
control (with no adjustment).  2xV16QI means 32 elements, while 2xV8HI
means 16 elements.  V16QI's nscalars_per_iter would therefore be double
V8HI's, but V8HI's factor would be double V16QI's (2 vs 1), so things
even out.

The code structurally supports targets that count in elements rather
than bytes, so that factor==1 for all element types.  See the
"rgl->factor == 1 && factor == 1" case in:

  if (rgl->max_nscalars_per_iter < nscalars_per_iter)
    {
      /* For now, we only support cases in which all loads and stores fall back
	 to VnQI or none do.  */
      gcc_assert (!rgl->max_nscalars_per_iter
		  || (rgl->factor == 1 && factor == 1)
		  || (rgl->max_nscalars_per_iter * rgl->factor
		      == nscalars_per_iter * factor));
      rgl->max_nscalars_per_iter = nscalars_per_iter;
      rgl->type = vectype;
      rgl->factor = factor;
    }

But it hasn't been tested, since no current target uses it.

I think the above part of the patch shows that the current "factor is
always 1" path is in fact broken, and the patch is a correctness fix on
targets that measure in elements rather than bytes.

So I think the above part of the patch should go in ahead of the IV changes.
But the test should be based on factor rather than TYPE_VECTOR_SUBPARTS.

Thanks,
Richard