From: Richard Sandiford
To: Richard Biener
Cc: "Kewen.Lin", GCC Patches, Bill Schmidt, David Edelsohn, Segher Boessenkool, wilson@tuliptree.org
Subject: Re: [PATCH 1/7 v5] ifn/optabs: Support vector load/store with length
Date: Tue, 23 Jun 2020 13:20:53 +0100
In-Reply-To: (Richard Biener's message of "Tue, 23 Jun 2020 13:25:25 +0200")

Richard Biener writes:
> On Tue, Jun 23, 2020 at 11:53 AM Richard Sandiford wrote:
>>
>> Things have moved on due to the IRC conversation, but…
>>
>> "Kewen.Lin" writes:
>> > on 2020/6/23 3:59 AM, Richard Sandiford wrote:
>> >> "Kewen.Lin" writes:
>> >>> @@ -5167,6 +5167,24 @@ mode @var{n}.
>> >>>
>> >>> This pattern is not allowed to @code{FAIL}.
>> >>>
>> >>> +@cindex @code{lenload@var{m}} instruction pattern
>> >>> +@item @samp{lenload@var{m}}
>> >>> +Perform a vector load with length from memory operand 1 of mode @var{m}
>> >>> +into register operand 0. Length is provided in register operand 2 with
>> >>> +appropriate mode which should afford the maximal required precision of
>> >>> +any available lengths.
>> >>
>> >> I think we need to say in more detail what “load with length” actually
>> >> means. How about:
>> >>
>> >> Load the number of bytes specified by operand 2 from memory operand 1
>> >> into register operand 0, setting the other bytes of operand 0 to
>> >> undefined values. Operands 0 and 1 have mode @var{m}. Operand 2 has
>> >> whichever integer mode the target prefers.
>> >>
>> >
>> > Thanks for nice wordings! Updated, for "... to undefined values" I changed it
>> > to "... to undefined values or zeros" as Segher's comments to match the behavior
>> > on Power.
>>
>> “set … to undefined values” means that the values are not defined by
>> the optab interface. In other words, the target can set the bytes
>> to whatever it wants, and gimple code can't make any assumptions about
>> what the values of the bytes are.
>>
>> So setting the bytes to zero (as Power does) would conform to the
>> interface. So would leaving the bytes in operand 0 untouched.
>> So would using an instruction that really does leave the other
>> bytes with undefined values, etc.
>>
>> So I think we should keep it as just “… to undefined values”.
>>
>> The alternative would be to define the interface so that targets *must*
>> ensure that the other bytes are zeros. But at the moment, the only
>> intended use of the optabs and ifns is for autovectorisation, and the
>> vectoriser won't care about the values of “inactive” bytes/lanes.
>> Forcing the target to set them to a specific value like zero would be
>> unnecessarily restrictive.
>
> Actually it _does_ care.

I'd argue it doesn't, but for essentially the same reasons :-)

> This is supposed to be used for fully masked
> loops and 'unspecified values' would require us to explicitly zero
> them for any FP op because of possible sNaN representations. It
> also precludes us from bitwise ORing in an appropriately masked
> vector of 1s to make integer division happy (OK, no vector ISA supports
> integer division).

Zeros would be a problem for FP division too.

And even if we require loads to set inactive lanes to zero, we couldn't
infer from that that any given FP addition (say) won't raise an exception.
E.g. the inputs could be the result of converting integers and adding
them could trigger an inexact exception. Or the values could be the
result of simple bitcasts, giving arbitrary FP values. (AIUI, current
bfloat code works this way.)

The vectoriser currently only allows potentially-trapping FP operations
on partial vectors if the target provides an appropriate IFN_COND_*
function. (That's one of the main use cases for those functions.)
In other cases it requires the loop to operate on full vectors.
This should be relaxed in future to support inbranch partial
vectorisation of simd calls.
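
To make the point about trapping operations concrete, here is a toy model in Python. It is only a sketch under stated assumptions: lists stand in for vector registers, Python's ZeroDivisionError stands in for a trapping FP operation, and the names unconditional_div, cond_div and else_vals are hypothetical, not GCC functions. It shows why zeroed inactive lanes do not make an unconditional FP division safe, while a conditional operation in the spirit of IFN_COND_* never evaluates the inactive lanes at all.

```python
# Toy model only: Python lists stand in for vector registers, and Python's
# ZeroDivisionError stands in for a trapping FP operation. The names below
# are hypothetical and are not GCC functions.


def unconditional_div(a, b):
    """Divide every lane, including the inactive ones."""
    return [x / y for x, y in zip(a, b)]


def cond_div(mask, a, b, else_vals):
    """Divide only the active lanes; inactive lanes take the else value.

    The division is never evaluated for a lane whose mask entry is False,
    so the contents of inactive lanes cannot cause a trap.
    """
    return [x / y if m else e for m, x, y, e in zip(mask, a, b, else_vals)]


# Two active lanes; assume the loads happened to zero the two inactive lanes.
mask = [True, True, False, False]
a = [1.0, 2.0, 0.0, 0.0]
b = [4.0, 8.0, 0.0, 0.0]

try:
    unconditional_div(a, b)
    trapped = False
except ZeroDivisionError:
    # 0.0 / 0.0 in an inactive lane "traps" even though no active lane does.
    trapped = True

# The conditional form is safe whatever the inactive lanes contain.
safe = cond_div(mask, a, b, a)
```

In this model the unconditional division traps on an inactive lane, while the conditional form only ever divides the two active lanes and passes the else value through for the rest.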
This means that the current patch series will/should simply punt
for “length”-based loop control if the loop contains FP operations
that (as far as gimple is concerned) might trap.

If we're thinking about how to relax that, then IMO it will need
to be done either at the level of each FP operation or by some kind
of “global” vectorisation subpass that introduces known-safe values
for inactive lanes. The first would be easier, the second would be
more optimal.

I don't think that's specific to “length” vectorisation though.
The same concerns apply to if-converted loops that operate on
full vectors. I think the approach would be essentially the same
for both.

In that scenario, removing zeroing of an IFN_LEN_LOAD would “just”
be an optimisation, and could potentially be left to RTL code if
necessary. (But see my main point below.)

SVE supports integer division btw. :-)

> So unless we have evidence that there exists an ISA that does _not_
> zero the excess bits I'd rather specify it does.

I think the known architectures that might use this are:

- MVE
- Power
- RVV

MVE and Power both set inactive lanes to zero. But I'm not sure about RVV.
AIUI, for RVV the approach instead would be to reduce the effective vector
length for the final iteration of the vector loop, and I'm not sure whether
in that situation it makes sense to say that the other elements still exist
and are guaranteed to be zero.

I'm the last person who should be speculating on that though.
Let's see whether Jim has any comments.

In summary, I'm not saying we should never define the inactive values
to be zero. I just think that we should leave it until it matters.
And I don't think it does/should matter for the current patch series.
IFN_MASK_LOAD has been around for quite a long time now and we've
never had to define the values of inactive lanes there.

Thanks,
Richard
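
As an illustration of the “undefined values” wording discussed in this message, the following Python sketch models a load with length under three target behaviours that would all conform to it. Everything here is an assumption made for illustration: VECTOR_BYTES, len_load, the policy names and the old register contents are hypothetical and do not correspond to GCC code or to any particular ISA.

```python
import random

VECTOR_BYTES = 16  # hypothetical vector register size, for illustration only


def len_load(memory, length, policy, old):
    """Model a load with length.

    The first `length` bytes of the result come from `memory`. What ends up
    in the remaining bytes depends on the (hypothetical) target policy:
      "zero"    - the target zeroes them
      "keep"    - the target leaves the old register contents untouched
      "garbage" - the target leaves arbitrary values
    """
    if not 0 <= length <= VECTOR_BYTES or len(memory) < length:
        raise ValueError("length out of range")
    active = bytes(memory[:length])
    remaining = VECTOR_BYTES - length
    if policy == "zero":
        tail = bytes(remaining)
    elif policy == "keep":
        tail = bytes(old[length:VECTOR_BYTES])
    elif policy == "garbage":
        tail = bytes(random.randrange(256) for _ in range(remaining))
    else:
        raise ValueError("unknown policy")
    return active + tail


memory = bytes(range(1, 17))
old = bytes([0xAA]) * VECTOR_BYTES
results = {p: len_load(memory, 5, p, old) for p in ("zero", "keep", "garbage")}

# A consumer written against the "undefined values" wording may rely only on
# the first 5 bytes, which agree for every policy.
active_parts = {r[:5] for r in results.values()}
```

All three policies return the same five active bytes, which is the only part of the result that code written against the interface can depend on; the remaining bytes differ from target to target.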