From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 14 Apr 2023 06:52:06 +0000 (UTC)
From: Richard Biener <rguenther@suse.de>
To: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
cc: linkw <linkw@linux.ibm.com>, gcc-patches <gcc-patches@gcc.gnu.org>, 
    jeffreyalaw <jeffreyalaw@gmail.com>, rdapp <rdapp@linux.ibm.com>, 
    "richard.sandiford" <richard.sandiford@arm.com>
Subject: Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support
 for auto-vectorization
In-Reply-To: <2E7B1DB75F2F78AE+2023041411394350100020@rivai.ai>
Message-ID: <nycvar.YFH.7.77.849.2304140647430.4466@jbgna.fhfr.qr>
References: <20230407014741.139387-1-juzhe.zhong@rivai.ai>,  <63723855B0BF2130+2023041120125573846623@rivai.ai>,  <mptjzyik2ql.fsf@arm.com>,  <nycvar.YFH.7.77.849.2304120657420.4466@jbgna.fhfr.qr>,  <139DA38AFC9CA5B5+2023041216004591287739@rivai.ai>, 
 <nycvar.YFH.7.77.849.2304120836450.4466@jbgna.fhfr.qr>,  <B6AA85AC56454A66+2023041217154958074655@rivai.ai>,  <nycvar.YFH.7.77.849.2304120923280.4466@jbgna.fhfr.qr>,  <mpt3555jqn1.fsf@arm.com>,  <E8B085826C635D01+2023041219371911535174@rivai.ai>, 
 <d431841e-8f8e-3c11-e348-87f3014f5d8e@linux.ibm.com>,  <8D9731A1540E082A+202304122122129793085@rivai.ai>,  <0fb64ee1-1bf6-2a5f-4214-7567af5facbd@linux.ibm.com>,  <9D802FBA3E3A18CE+2023041321440689154856@rivai.ai>,  <a860f987-e76f-089a-bbd6-b03df21ca212@linux.ibm.com>
 <2E7B1DB75F2F78AE+2023041411394350100020@rivai.ai>
User-Agent: Alpine 2.22 (LSU 394 2020-01-19)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
List-Id: <gcc-patches.gcc.gnu.org>

On Fri, 14 Apr 2023, juzhe.zhong@rivai.ai wrote:

> I have also decided to remove the WHILE_LEN pattern, since it seems to be unnecessary.
> As Richard said, it's just simple arithmetic, so it's not worthwhile.
> 
> So I plan to replace WHILE_LEN with MIN_EXPR and do everything RVV-specific in the RISC-V port.
> I think that is more reasonable for IBM and for other targets in the future.
> 
> So this patch would need to change to "introduce a new flow for vectorized loop control": a new loop control flow
> that saturating-subtracts n down to zero, with a target hook so that we can switch to this flow?
> 
> Is that more reasonable?

I think we want to change the various IVs the vectorizer uses to
control the exit condition of prologue/vect/epilogue loops to a single
one counting the remaining _scalar_ iterations to zero.  Currently
it's somewhat of a mess, which also leads to expressions that are
difficult to CSE when they are based on values derived from such an IV.

But yes, whether for example the vector loop control stmt should
be a test for zero mask (while-ult) or zero scalar iterations
(or (signed) <= zero) could be subject to a new target hook if it
isn't an obvious choice based on HW capability checks we can already
do.
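
A minimal sketch of what such a single IV could look like, written as
plain C rather than GIMPLE.  The names add_one, VF and remaining are
hypothetical, and the inner scalar loop only stands in for a
length-controlled vector operation; this is an illustration under those
assumptions, not what the vectorizer emits:

```c
#include <stddef.h>
#include <stdint.h>

enum { VF = 4 };  /* hypothetical vectorization factor, in elements */

/* Add 1 to each of the n elements of a.  The only loop-control IV is
   `remaining`, the number of scalar iterations still to do; the exit
   test is simply "remaining == 0".  Returns the number of loop trips. */
size_t add_one(int32_t *a, size_t n)
{
    size_t remaining = n;
    size_t trips = 0;
    while (remaining != 0) {
        /* Elements handled this trip: MIN (remaining, VF). */
        size_t len = remaining < VF ? remaining : VF;
        /* Stand-in for one length-controlled vector operation. */
        for (size_t i = 0; i < len; i++)
            a[i] += 1;
        a += len;
        remaining -= len;  /* never underflows because len <= remaining */
        trips++;
    }
    return trips;
}
```

Whether the exit test is phrased as "remaining is zero" or as "the
mask/length for the next trip is empty" is the part that could be left
to a target hook.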

Richard.

> Thanks.
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Kewen.Lin
> Date: 2023-04-14 10:54
> To: ???
> CC: gcc-patches; Jeff Law; rdapp; richard.sandiford; rguenther
> Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization
> Hi Juzhe,
>  
> on 2023/4/13 21:44, ??? wrote:
> > Thanks Kewen.
> > 
> > The current flow in this patch is, as you said:
> > ....
> > len = WHILE_LEN (n,vf);
> > ...
> > v = len_load (addr,len);
> > ..
> > addr = addr + vf (in byte align);
> > ....
> > 
> > This patch just keeps advancing the address by the vectorization factor (scaled to bytes).
> > For example, if your vector length is 512 bits, then this patch just updates the address as
> > addr = addr + 64;
> > 
> > However, after reading the RVV ISA more closely today, I think it would be more appropriate
> > to update the address as: addr = addr + (len * 4), if len is the number of INT32 elements.
> > Here len is the result computed by WHILE_LEN.
>  
> I just read your detailed explanation of how the vsetvli insn is used (I really appreciate that).
> It looks like this WHILE_LEN wants more semantics than MIN, so I assume you still want
> to introduce this WHILE_LEN.
>  
> > 
> > I assume that for the IBM target it's better to just update the address IV by adding the whole register size in bytes.
> > I think the second way (addr = addr + (len * 4)) is too RVV-specific and won't be suitable for IBM. Is that right?
>  
> Yes, we just want to add the whole vector register length in bytes.
>  
> > If so, I will keep this patch's flow (and not change to addr = addr + (len * 4)) and see what else I need to do for IBM.
> > I would rather do that in the RISC-V backend port.
>  
> IMHO, you don't need to push this down to the RV backend; just query whether ports with len_{load,store}
> support need this, via a target hook or a special operand in the while_len optab (see internal_len_load_store_bias),
> and generate different code accordingly.  IIUC, for WHILE_LEN you want the semantics
> of what vsetvli performs, but for IBM ports it would be just like MIN_EXPR; maybe we
> can also generate MIN or WHILE_LEN based on this kind of target information.
>  
> If the above assumption holds, I wonder whether you also want WHILE_LEN to have the implicit effect
> of updating the vector length register?  If so, the code with multiple rgroups looks unexpected:
>  
> + _76 = .WHILE_LEN (ivtmp_74, vf * nitems_per_ctrl);
> + _79 = .WHILE_LEN (ivtmp_77, vf * nitems_per_ctrl);
>  
> as the latter seems to override the former.  Besides, if the given operands are known constants,
> it can't be directly folded into constants for further propagation.  From this perspective, Richi's
> suggestion of "tying the scalar result with the uses" looks better IMHO.
>  
> > 
> >>> I tried
> >>>to compile the above source files on Power, the former can adopt doloop
> >>>optimization but the latter fails to. 
> > You mean GCC cannot do hardware loop optimization when the loop-control IV is variable?
>  
> No, in both cases the IV is variable.  The dump at loop2_doloop for the proposed sequence says
> "Doloop: Possible infinite iteration case.", which seems to show that for the proposed sequence the compiler
> isn't able to figure out that the loop is finite.  It may be missing range information on n, or it may not be
> able to analyze how the invariant is involved, but I didn't look into it; these are all guesses.
>  
> BR,
> Kewen
>  
> 
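
To make the two address-update schemes from the quoted discussion
concrete, here is a rough C model.  Everything in it is an assumption
for illustration: len_min models a MIN_EXPR-style length, len_rvv_like
models one choice that the RVV specification permits for vsetvli (when
the remaining count is between VLMAX and 2*VLMAX the hardware may pick
any length from about half the remainder up to VLMAX), and memcpy
stands in for a len_load/len_store pair.  None of these names come
from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* MIN_EXPR-style length: only the final trip can be partial. */
static size_t len_min(size_t remaining, size_t vlmax)
{
    return remaining < vlmax ? remaining : vlmax;
}

/* Hypothetical vsetvli-like length: a trip before the last one may
   also be shorter than vlmax (here: split the tail evenly). */
static size_t len_rvv_like(size_t remaining, size_t vlmax)
{
    if (remaining <= vlmax)
        return remaining;
    if (remaining < 2 * vlmax)
        return (remaining + 1) / 2;
    return vlmax;
}

/* Scheme 1: bump the byte offset by the whole register size.  This is
   only correct because len_min is partial on the last trip alone. */
size_t copy_whole_reg_bump(int32_t *dst, const int32_t *src,
                           size_t n, size_t vlmax)
{
    size_t off = 0, trips = 0, remaining = n;
    while (remaining != 0) {
        size_t len = len_min(remaining, vlmax);
        memcpy((char *)dst + off, (const char *)src + off,
               len * sizeof(int32_t));
        off += vlmax * sizeof(int32_t);  /* addr = addr + vf in bytes */
        remaining -= len;
        trips++;
    }
    return trips;
}

/* Scheme 2: bump the byte offset by len * 4, which stays correct even
   when a non-final trip handles fewer than vlmax elements. */
size_t copy_len_bump(int32_t *dst, const int32_t *src,
                     size_t n, size_t vlmax)
{
    size_t off = 0, trips = 0, remaining = n;
    while (remaining != 0) {
        size_t len = len_rvv_like(remaining, vlmax);
        memcpy((char *)dst + off, (const char *)src + off,
               len * sizeof(int32_t));
        off += len * sizeof(int32_t);    /* addr = addr + (len * 4) */
        remaining -= len;
        trips++;
    }
    return trips;
}
```

With n = 10 and vlmax = 4, scheme 1 processes 4, 4, 2 elements and
scheme 2 processes 4, 3, 3; both copy the same data, but only scheme 2
would remain correct if it were given the vsetvli-like lengths.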

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)