From mboxrd@z Thu Jan  1 00:00:00 1970
From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Biener <richard.guenther@gmail.com>
Mail-Followup-To: Richard Biener <richard.guenther@gmail.com>,
 "Kewen.Lin" <linkw@linux.ibm.com>, GCC Patches <gcc-patches@gcc.gnu.org>,
 Bill Schmidt <wschmidt@linux.ibm.com>, David Edelsohn <dje.gcc@gmail.com>,
 Segher Boessenkool <segher@kernel.crashing.org>, wilson@tuliptree.org,
 richard.sandiford@arm.com
Cc: "Kewen.Lin" <linkw@linux.ibm.com>, GCC Patches <gcc-patches@gcc.gnu.org>,
 Bill Schmidt <wschmidt@linux.ibm.com>, David Edelsohn <dje.gcc@gmail.com>,
 Segher Boessenkool <segher@kernel.crashing.org>, wilson@tuliptree.org
Subject: Re: [PATCH 1/7 v5] ifn/optabs: Support vector load/store with length
References: <30906c0d-3b9f-e1e6-156f-c01fcf229cb9@linux.ibm.com>
 <93f9e4cf-351d-5f89-65b5-7dbc97ce13b9@linux.ibm.com>
 <8f64b58b-050c-2e1c-36c6-049a07eceee7@linux.ibm.com>
 <mpt5zbzmg0n.fsf@arm.com>
 <99427c42-113d-7fbc-5617-f64c2317250d@linux.ibm.com>
 <0fccef6b-2885-d91d-ff05-b23c2315fc7d@linux.ibm.com>
 <mpttuz2yiq3.fsf@arm.com>
 <380e1418-f262-7a09-e24a-2eb14817662f@linux.ibm.com>
 <mptftamxg5g.fsf@arm.com>
 <CAFiYyc1twYzcW9VK1hui+sEGAH_Js7djtVjeFL+wYd3oaRCpvw@mail.gmail.com>
Date: Tue, 23 Jun 2020 13:20:53 +0100
In-Reply-To: <CAFiYyc1twYzcW9VK1hui+sEGAH_Js7djtVjeFL+wYd3oaRCpvw@mail.gmail.com>
 (Richard Biener's message of "Tue, 23 Jun 2020 13:25:25 +0200")
Message-ID: <mptsgemug62.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Jun 23, 2020 at 11:53 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Things have moved on due to the IRC conversation, but…
>>
>> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>> > on 2020/6/23 3:59 AM, Richard Sandiford wrote:
>> >> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>> >>> @@ -5167,6 +5167,24 @@ mode @var{n}.
>> >>>
>> >>>  This pattern is not allowed to @code{FAIL}.
>> >>>
>> >>> +@cindex @code{lenload@var{m}} instruction pattern
>> >>> +@item @samp{lenload@var{m}}
>> >>> +Perform a vector load with length from memory operand 1 of mode @var{m}
>> >>> +into register operand 0.  Length is provided in register operand 2 with
>> >>> +appropriate mode which should afford the maximal required precision of
>> >>> +any available lengths.
>> >>
>> >> I think we need to say in more detail what “load with length” actually
>> >> means.  How about:
>> >>
>> >>   Load the number of bytes specified by operand 2 from memory operand 1
>> >>   into register operand 0, setting the other bytes of operand 0 to
>> >>   undefined values.  Operands 0 and 1 have mode @var{m}.  Operand 2 has
>> >>   whichever integer mode the target prefers.
>> >>
>> >
>> > Thanks for nice wordings!  Updated, for "... to undefined values" I changed it
>> > to "... to undefined values or zeros" as Segher's comments to match the behavior
>> > on Power.
>>
>> “set … to undefined values” means that the values are not defined by
>> the optab interface.  In other words, the target can set the bytes
>> to whatever it wants, and gimple code can't make any assumptions about
>> what the values of the bytes are.
>>
>> So setting the bytes to zero (as Power does) would conform to the
>> interface.  So would leaving the bytes in operand 0 untouched.
>> So would using an instruction that really does leave the other
>> bytes with undefined values, etc.
>>
>> So I think we should keep it as just “… to undefined values”.
>>
>> The alternative would be to define the interface so that targets *must*
>> ensure that the other bytes are zeros.  But at the moment, the only
>> intended use of the optabs and ifns is for autovectorisation, and the
>> vectoriser won't care about the values of “inactive” bytes/lanes.
>> Forcing the target to set them to a specific value like zero would be
>> unnecessarily restrictive.
>
> Actually it _does_ care.

I'd argue it doesn't, but for essentially the same reasons :-)

> This is supposed to be used for fully masked
> loops and 'unspecified values' would require us to explicitly zero
> them for any FP op because of possible sNaN representations.  It
> also precludes us from bitwise ORing in an appropriately masked
> vector of 1s to make integer division happy (OK, no vector ISA supports
> integer division).

Zeros would be a problem for FP division too.  And even if we require
loads to set inactive lanes to zero, we couldn't infer from that that
any given FP addition (say) won't raise an exception.  E.g. the inputs
could be the result of converting integers and adding them could trigger
an inexact exception.  Or the values could be the result of simple
bitcasts, giving arbitrary FP values.  (AIUI, current bfloat code
works this way.)

The vectoriser currently only allows potentially-trapping FP operations
on partial vectors if the target provides an appropriate IFN_COND_*
function.  (That's one of the main use cases for those functions.)
In other cases it requires the loop to operate on full vectors.
This should be relaxed in future to support inbranch partial
vectorisation of simd calls.
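
To make the IFN_COND_* point concrete, here is a rough scalar model in C
of what a conditional operation provides.  It is only a sketch: the name
cond_div_model and its signature are made up for illustration and are not
GCC code, and integer division stands in for a potentially-trapping FP
operation because its trap is easy to observe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a conditional vector division, in the
   spirit of the IFN_COND_* functions.  Illustration only, not GCC code.  */
void cond_div_model(const uint8_t *mask, const int32_t *a, const int32_t *b,
                    const int32_t *fallback, int32_t *out, size_t nlanes)
{
    for (size_t i = 0; i < nlanes; i++) {
        if (mask[i]) {
            /* Only active lanes need operands that are safe to divide.  */
            assert(b[i] != 0);
            out[i] = a[i] / b[i];
        } else {
            /* The division is never evaluated for an inactive lane, so
               whatever a[i] and b[i] contain cannot cause a trap.  */
            out[i] = fallback[i];
        }
    }
}
```

In this model the contents of the inactive input lanes never reach the
operation, which is why the values that a load puts in those lanes do not
matter.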

This means that the current patch series will/should simply punt
for “length”-based loop control if the loop contains FP operations
that (as far as gimple is concerned) might trap.

If we're thinking about how to relax that, then IMO it will need
to be done either at the level of each FP operation or by some
kind of “global” vectorisation subpass that introduces known-safe
values for inactive lanes.  The first would be easier, the second
would be more optimal.
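
As a rough illustration of the per-operation variant, the scalar C model
below substitutes a known-safe divisor into inactive lanes and then does
the operation unconditionally on every lane.  The function name is
hypothetical, and integer division is again only a stand-in for an
operation that might trap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: make inactive lanes safe just before the
   operation, then operate on the full vector.  Not GCC code.  */
void div_with_safe_inactive_lanes(const uint8_t *mask, const int32_t *a,
                                  const int32_t *b, int32_t *out,
                                  size_t nlanes)
{
    for (size_t i = 0; i < nlanes; i++) {
        /* Select step: an inactive lane gets a divisor of 1, whatever
           the load happened to leave in b[i].  */
        int32_t safe_b = mask[i] ? b[i] : 1;

        /* Active lanes are still required to hold a valid divisor.  */
        assert(safe_b != 0);

        /* Unconditional operation.  The result in an inactive lane is
           not meaningful and must not be used.  */
        out[i] = a[i] / safe_b;
    }
}
```

The select step is needed here regardless of what the load wrote to the
inactive lanes: a zero divisor would be as much of a problem as any other
unsafe value.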

I don't think that's specific to “length” vectorisation though.
The same concerns apply to if-converted loops that operate on full
vectors.  I think the approach would be essentially the same for both.

In that scenario, removing zeroing of an IFN_LEN_LOAD would “just” be
an optimisation, and could potentially be left to RTL code if necessary.
(But see my main point below.)

SVE supports integer division btw. :-)

> So unless we have evidence that there exists an ISA that does _not_
> zero the excess bits I'd rather specify it does.

I think the known architectures that might use this are:

- MVE
- Power
- RVV

MVE and Power both set inactive lanes to zero.  But I'm not sure about RVV.
AIUI, for RVV the approach instead would be to reduce the effective vector
length for the final iteration of the vector loop, and I'm not sure
whether in that situation it makes sense to say that the other elements
still exist and are guaranteed to be zero.
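
For what it's worth, the kind of loop I have in mind looks something like
the scalar C model below.  This is a sketch under my own assumptions, not
RVV code: VLMAX and add_one_vl are made-up names, and the inner loop
stands in for a single vector instruction executed with vector length vl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VLMAX = 4 };  /* hypothetical maximum number of elements per vector */

/* Add 1 to each of the N elements of DATA, reducing the effective vector
   length for the final iteration.  Returns the number of vector
   iterations.  */
size_t add_one_vl(int32_t *data, size_t n)
{
    size_t iterations = 0;
    for (size_t i = 0; i < n; ) {
        /* Effective vector length: VLMAX until the tail, then whatever
           is left.  */
        size_t vl = n - i < VLMAX ? n - i : VLMAX;
        assert(vl > 0 && vl <= VLMAX);

        /* Only lanes [0, vl) take part.  Lanes at or beyond vl are not
           touched at all in this model.  */
        for (size_t lane = 0; lane < vl; lane++)
            data[i + lane] += 1;

        i += vl;
        iterations++;
    }
    return iterations;
}
```

With n = 10 and VLMAX = 4 the loop runs three times with vl = 4, 4 and 2.
In the last iteration there is no obvious sense in which lanes 2 and 3
hold zeros.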

I'm the last person who should be speculating on that though.  Let's see
whether Jim has any comments.

In summary, I'm not saying we should never define the inactive values
to be zero.  I just think that we should leave it until it matters.
And I don't think it does/should matter for the current patch series.

IFN_MASK_LOAD has been around for quite a long time now and we've never
had to define the values of inactive lanes there.
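
To illustrate why leaving the inactive bytes undefined is workable, here
is a scalar C model with two different loads that would both conform to
the proposed wording, plus a matching store.  All names and the 16-byte
vector size are assumptions made for the example; none of this is the
actual optab or ifn implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16 };  /* hypothetical vector size in bytes */
typedef struct { uint8_t b[VEC_BYTES]; } vec_t;

/* Conforming choice 1: zero the inactive bytes.  */
void len_load_zeroing(vec_t *dst, const uint8_t *src, size_t len)
{
    assert(len <= VEC_BYTES);
    memset(dst->b, 0, VEC_BYTES);
    memcpy(dst->b, src, len);
}

/* Conforming choice 2: leave the inactive bytes of DST untouched.  */
void len_load_preserving(vec_t *dst, const uint8_t *src, size_t len)
{
    assert(len <= VEC_BYTES);
    memcpy(dst->b, src, len);
}

/* Store counterpart: only the first LEN bytes reach memory.  */
void len_store(uint8_t *dst, const vec_t *src, size_t len)
{
    assert(len <= VEC_BYTES);
    memcpy(dst, src->b, len);
}
```

A consumer that only stores (or otherwise only observes) the first len
bytes gets the same memory contents from either load, so code written
against the interface cannot tell the two apart.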

Thanks,
Richard