From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 14 Apr 2023 06:52:06 +0000 (UTC)
From: Richard Biener
To: "juzhe.zhong@rivai.ai"
Cc: linkw, gcc-patches, jeffreyalaw, rdapp, "richard.sandiford"
Subject: Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization
In-Reply-To: <2E7B1DB75F2F78AE+2023041411394350100020@rivai.ai>

On Fri, 14 Apr 2023, juzhe.zhong@rivai.ai wrote:

> Also, I have already decided to remove the WHILE_LEN pattern since it seems to be unnecessary.
> As Richard said, it's just simple arithmetic and it's not worthwhile to do that.
>
> So, I plan to replace WHILE_LEN with MIN_EXPR and do everything RVV-specific in the RISC-V port.
> I think that's more reasonable for IBM's use and for more targets in the future.
>
> So, this patch will need to be changed to "introduce a new flow to do vectorization loop control", which is a new loop control flow
> that saturating-subtracts n down to zero, and add a target hook for it so that we can switch to this flow?
>
> Is that more reasonable?

I think we want to change the various IVs the vectorizer uses to
control the exit condition of prologue/vect/epilogue loops to a single
one counting the remaining _scalar_ iterations to zero.  Currently
it's somewhat of a mess, which also leads to difficult-to-CSE expressions
based on derived values of such an IV.
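
To make the single-IV idea concrete, here is a minimal scalar sketch in C,
not vectorizer output.  The names VF and add_arrays are hypothetical and the
inner loop only stands in for a len_load/len_store pair.  The assumption is
that `remaining` is the only loop-control IV, that the per-iteration length
is MIN (remaining, VF), and that every other value (here the element offset)
is derived from that one IV.

```c
#include <assert.h>
#include <stddef.h>

#define VF 8  /* hypothetical vectorization factor, in elements */

/* Scalar model of a length-controlled vector loop.  Returns the number
   of "vector" iterations that were executed.  */
static size_t add_arrays(int *dst, const int *a, const int *b, size_t n)
{
    size_t remaining = n;      /* the single IV: scalar iterations left */
    size_t vector_iters = 0;

    while (remaining > 0) {
        /* MIN_EXPR <remaining, VF>: active elements in this iteration.  */
        size_t len = remaining < VF ? remaining : VF;
        /* Offset derived from the same IV, no second counter needed.  */
        size_t base = n - remaining;

        /* Guard against underflow of the unsigned counter.  */
        assert(len > 0 && len <= remaining);

        /* Stands in for len_load, add, len_store on `len` elements.  */
        for (size_t i = 0; i < len; i++)
            dst[base + i] = a[base + i] + b[base + i];

        remaining -= len;      /* counts down to exactly zero */
        vector_iters++;
    }
    return vector_iters;
}
```

With n = 19 and VF = 8 this model runs three iterations with lengths 8, 8
and 3, and the exit test is simply "remaining == 0".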

But yes, whether for example the vector loop control stmt should be
a test for zero mask (while-ult) or zero scalar iterations (or (signed) <=
zero) could be subject to a new target hook if it isn't an obvious
choice based on HW capability checks we can already do.

Richard.

> Thanks.
>
>
> juzhe.zhong@rivai.ai
>
> From: Kewen.Lin
> Date: 2023-04-14 10:54
> To: ???
> CC: gcc-patches; Jeff Law; rdapp; richard.sandiford; rguenther
> Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization
> Hi Juzhe,
>
> on 2023/4/13 21:44, ??? wrote:
> > Thanks Kewen.
> >
> > The current flow in this patch is, as you said:
> > ....
> > len = WHILE_LEN (n,vf);
> > ...
> > v = len_load (addr,len);
> > ..
> > addr = addr + vf (in byte align);
> > ....
> >
> > This patch just keeps adding the vectorization factor (adjusted to bytes) to the address.
> > For example, if your vector length is 512 bits, then this patch just updates the address as
> > addr = addr + 64;
> >
> > However, today, after reading the RVV ISA more deeply, I think it would be more appropriate
> > to update the address as: addr = addr + (len * 4), if len is the number of INT32 elements.
> > Here len is the result calculated by WHILE_LEN.
>
> I just read your detailed explanation on the usage of the vsetvli insn (really appreciate that),
> it looks like this WHILE_LEN wants some more semantics than MIN, so I assume you still want
> to introduce this WHILE_LEN.
>
> >
> > I assume for IBM targets, it's better to just update the address by directly adding the whole register byte size
> > in the address IV. Since I think the second way (address = addr + (len * 4)) is too RVV specific, and won't be suitable for IBM. Is that right?
>
> Yes, we just want to add the whole vector register length in bytes.
>
> > If that is true, I will keep this patch's flow (won't change to address = addr + (len * 4)) to see what else I need to do for IBM.
> > I would rather do that in the RISC-V backend port.
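
The difference between the two address updates discussed above can be
sketched in C.  This is a model under stated assumptions, not code from the
patch: VF, len_min, len_split and address_stays_in_sync are hypothetical
names, and len_split is only an assumed vsetvli-like policy that may return
fewer than VF elements before the final iteration.  The model does no memory
access; it only checks whether the byte offset still points at the next
unprocessed INT32 element at the start of each iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VF 16  /* hypothetical: 512-bit vectors of 32-bit elements */

typedef size_t (*len_fn)(size_t remaining);

/* MIN-style length: full vectors until the tail.  */
static size_t len_min(size_t remaining)
{
    return remaining < VF ? remaining : VF;
}

/* Assumed vsetvli-like length: splits the last two iterations evenly.  */
static size_t len_split(size_t remaining)
{
    if (remaining <= VF)
        return remaining;
    if (remaining < 2 * VF)
        return (remaining + 1) / 2;
    return VF;
}

/* Returns 1 if the address IV stays in sync with the elements processed.
   bump_by_len != 0: addr = addr + (len * 4); otherwise addr = addr + 64.  */
static int address_stays_in_sync(size_t n, len_fn get_len, int bump_by_len)
{
    size_t remaining = n;
    size_t byte_off = 0;

    while (remaining > 0) {
        /* The next unprocessed element has index n - remaining.  */
        if (byte_off != (n - remaining) * sizeof(int32_t))
            return 0;

        size_t len = get_len(remaining);
        assert(len > 0 && len <= remaining);

        byte_off += (bump_by_len ? len : (size_t) VF) * sizeof(int32_t);
        remaining -= len;
    }
    return 1;
}
```

In this model, with n = 24, adding the whole register size works with
len_min but drifts with len_split (the second iteration would start at byte
64 instead of 48), while adding len * 4 stays in sync with both.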
>
> IMHO, you don't need to push this down to the RV backend, just query these ports having len_{load,store}
> support with a target hook or special operand in optab while_len (see internal_len_load_store_bias)
> for this need, and generate different code accordingly.  IIUC, for WHILE_LEN, you want it to have
> the same semantics as what vsetvli performs, but for IBM ports, it would be just like MIN_EXPR, maybe we
> can also generate MIN or WHILE_LEN based on this kind of target information.
>
> If the above assumption holds, I wonder if you also want WHILE_LEN to have the implicit effect
> of updating the vector length register?  If yes, the code with multiple rgroups looks unexpected:
>
> + _76 = .WHILE_LEN (ivtmp_74, vf * nitems_per_ctrl);
> + _79 = .WHILE_LEN (ivtmp_77, vf * nitems_per_ctrl);
>
> as the latter one seems to override the former.  Besides, if the given operands are known constants,
> it can't directly be folded into constants and do further propagation.  From this perspective, Richi's
> suggestion on "tieing the scalar result with the uses" looks better IMHO.
>
> >
> >>> I tried
> >>>to compile the above source files on Power, the former can adopt doloop
> >>>optimization but the latter fails to.
> > You mean GCC cannot do hardware loop optimization when the IV loop control is variable?
>
> No, for both cases, the IV is variable, the dump at loop2_doloop for the proposed sequence says
> "Doloop: Possible infinite iteration case.", which seems to show that for the proposed sequence the compiler
> isn't able to figure out that the loop is finite, it may miss the range information on n, or it isn't
> able to analyze how the invariant is involved, but I didn't look into it, these are all my guesses.
>
> BR,
> Kewen
>
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)