From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=rj3P=BF=arm.com=richard.sandiford@sourceware.org>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id 1621C385734D
	for <gcc-patches@gcc.gnu.org>; Tue, 16 May 2023 08:16:20 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1621C385734D
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4FDAB2F4;
	Tue, 16 May 2023 01:17:04 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 069073F7BD;
	Tue, 16 May 2023 01:16:18 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: "juzhe.zhong\@rivai.ai" <juzhe.zhong@rivai.ai>
Mail-Followup-To: "juzhe.zhong\@rivai.ai" <juzhe.zhong@rivai.ai>,gcc-patches <gcc-patches@gcc.gnu.org>,  rguenther <rguenther@suse.de>, richard.sandiford@arm.com
Cc: gcc-patches <gcc-patches@gcc.gnu.org>,  rguenther <rguenther@suse.de>
Subject: Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer
References: <20230515012844.183599-1-juzhe.zhong@rivai.ai>
	<mptsfbx4by1.fsf@arm.com>
	<F99B5327DA7207C5+202305161232591637457@rivai.ai>
	<mpth6sc4vcn.fsf@arm.com>
	<2BEA9A36F71A96BC+2023051615392250169239@rivai.ai>
Date: Tue, 16 May 2023 09:16:17 +0100
In-Reply-To: <2BEA9A36F71A96BC+2023051615392250169239@rivai.ai> (juzhe's
	message of "Tue, 16 May 2023 15:39:23 +0800")
Message-ID: <mpto7mk3d4e.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-23.2 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> Oh, I am sorry for the typos in my last email.  Here is the corrected version:
>
> Hi, Richard.
> For case 2, I came up with this idea:
> +	     Case 2 (SLP multiple rgroup):
> +		...
> +		_38 = (unsigned long) n_12(D);
> +		_39 = _38 * 2;
> +		_40 = MAX_EXPR <_39, 16>;   ----------------->remove
> +		_41 = _40 - 16; ----------------->remove
>
> +		...
> +		# ivtmp_42 = PHI <ivtmp_43(4), _41(3)>  ----------------->remove
>
> +		# ivtmp_45 = PHI <ivtmp_46(4), _39(3)>
> +		...
> +		_44 = MIN_EXPR <ivtmp_42, 32>;  ----------------->remove
>
> +		_47 = MIN_EXPR <ivtmp_45, 32>;
> +		_47_2 = MIN_EXPR <_47, 16>;  ----------------->add
> +		_47_3 = _47 - _47_2;  ----------------->add
> +		...
> +		.LEN_STORE (_6, 8B, _47_2, ...);
> +		...
> +		.LEN_STORE (_25, 8B, _47_3, ...);
> +		_33 = _47_2 / 2;
> +		...
> +		.LEN_STORE (_8, 16B, _33, ...);
> +		_36 = _47_3 / 2;
> +		...
> +		.LEN_STORE (_15, 16B, _36, ...);
> +		ivtmp_46 = ivtmp_45 - _47;
> +		ivtmp_43 = ivtmp_42 - _44;  ----------------->remove
>
> +		...
> +		if (ivtmp_46 != 0)
> +		  goto <bb 4>; [83.33%]
> +		else
> +		  goto <bb 5>; [16.67%]
> Is this reasonable?  Or do you have a better idea for it?

Yeah, this makes sense, and I think it makes case 2 very similar
(equivalent?) to case 3.  If so, it would be nice if they could be
combined.

Of course, this loses the nice property that the original had: that each
IV was independent, and so the dependency chains were shorter.  With the
above approach, the second length parameter instead depends on a
three-instruction chain.  But that might be OK (up to you).
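
To make that chain concrete, here is a minimal C sketch of the length
computation in the scheme quoted above.  It is only an illustration, not
GCC code: the names min_ul, split_lengths and count_iterations are
hypothetical, and the constants 32 and 16 are simply the ones that appear
in the quoted dump.

```c
#include <assert.h>

/* Hypothetical helper standing in for MIN_EXPR.  */
static unsigned long
min_ul (unsigned long a, unsigned long b)
{
  return a < b ? a : b;
}

/* Split the work of one iteration between the two length controls,
   mirroring the three statements in the quoted dump.  */
static void
split_lengths (unsigned long remaining,
	       unsigned long *len0, unsigned long *len1)
{
  unsigned long total = min_ul (remaining, 32);	/* _47   = MIN_EXPR <ivtmp_45, 32> */
  *len0 = min_ul (total, 16);			/* _47_2 = MIN_EXPR <_47, 16> */
  *len1 = total - *len0;			/* _47_3 = _47 - _47_2 */

  /* The two lengths together always cover exactly TOTAL.  */
  assert (*len0 + *len1 == total);
}

/* Model the single decrementing IV: ivtmp_46 = ivtmp_45 - _47,
   looping while the IV is nonzero.  Returns the iteration count.  */
static unsigned long
count_iterations (unsigned long remaining)
{
  unsigned long iterations = 0;
  while (remaining != 0)
    {
      unsigned long len0, len1;
      split_lengths (remaining, &len0, &len1);
      remaining -= len0 + len1;
      iterations++;
    }
  return iterations;
}
```

Here len1 is only available after the MIN, MIN and subtraction have all
completed, which is the three-instruction chain mentioned above.  With 42
items remaining, for example, the first iteration would use lengths 16
and 16 and the second would use 10 and 0.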

How much of the riscv backend infrastructure is in place now?  The reason
I ask is that it would be good if the patch had some tests.  AIUI, the
patch is an optimisation on top of what the current len_load/store code does,
rather than something that is needed for correctness.  So it seems like
the necessary patterns could be added and tested using the current approach,
then this patch could be applied on top, with its own tests for the new
approach.

Thanks,
Richard