From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.guenther@gmail.com>
Received: from mail-ed1-x532.google.com (mail-ed1-x532.google.com [IPv6:2a00:1450:4864:20::532])
	by sourceware.org (Postfix) with ESMTPS id 6017C3858415
	for <gcc-patches@gcc.gnu.org>; Thu, 29 Sep 2022 09:37:22 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6017C3858415
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
Received: by mail-ed1-x532.google.com with SMTP id z2so1195285edi.1
        for <gcc-patches@gcc.gnu.org>; Thu, 29 Sep 2022 02:37:22 -0700 (PDT)
MIME-Version: 1.0
References: <87180de9-d0d4-b92f-405f-100aca3d5cf8@codesourcery.com>
 <CAFiYyc0YG=kcSHkG7E4eV8HC0A8SRiDh6ScQndDd6By7ExoE+g@mail.gmail.com> <mptfsgav8gw.fsf@arm.com>
In-Reply-To: <mptfsgav8gw.fsf@arm.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Thu, 29 Sep 2022 11:37:07 +0200
Message-ID: <CAFiYyc0appd3xKp6BVcF2u4quxcApoi+PtcCV6dK0qQ53VURNQ@mail.gmail.com>
Subject: Re: [PATCH] vect: while_ult for integer mask
To: Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org>, Andrew Stubbs <ams@codesourcery.com>, 
	Richard Biener <richard.guenther@gmail.com>, juzhe.zhong@rivai.ai, 
	richard.sandiford@arm.com
Content-Type: text/plain; charset="UTF-8"
List-Id: <gcc-patches.gcc.gnu.org>

On Thu, Sep 29, 2022 at 11:24 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > On Wed, Sep 28, 2022 at 5:06 PM Andrew Stubbs <ams@codesourcery.com> wrote:
> >>
> >> This patch is a prerequisite for some amdgcn patches I'm working on to
> >> support shorter vector lengths (having fixed 64 lanes tends to miss
> >> optimizations, and masking is not supported everywhere yet).
> >>
> >> The problem is that, unlike AArch64, I'm not using different mask modes
> >> for different sized vectors, so all loops end up using the while_ultsidi
> >> pattern, regardless of vector length.  In theory I could use SImode for
> >> V32, HImode for V16, etc., but there's no mode to fit V4 or V2 so
> >> something else is needed.  Moving to using vector masks in the backend
> >> is not a natural fit for GCN, and would be a huge task in any case.
> >>
> >> This patch adds an additional length operand so that we can distinguish
> >> the different uses in the back end and don't end up with more lanes
> >> enabled than there ought to be.
> >>
> >> I've made the extra operand conditional on the mode so that I don't have
> >> to modify the AArch64 backend; that backend uses the while_<cond> family
> >> of patterns in a lot of places, via iterators, so changing it would touch
> >> a lot of code just to add an inactive operand, and I don't
> >> have a way to test it properly.  I've confirmed that AArch64 builds and
> >> expands while_ult correctly in a simple example.
> >>
> >> OK for mainline?
> >
> > Hmm, but you could introduce BI4mode and BI2mode for V4 and V2, no?
> > Not sure if it is possible to have two partial integer modes and use those.
>
> Might be difficult to do cleanly, since BI is very much a special case.
> But I agree that that would better fit the existing scheme.
>
> Otherwise:
>
>   operand0[0] = operand1 < operand2;
>   for (i = 1; i < operand3; i++)
>     operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>
> looks like a "length and mask" operation, which IIUC is also what
> RVV wanted?  (Wasn't at the Cauldron, so not entirely sure.)
>
> Perhaps the difference is that in this case the length must be constant.
> (Or is that true for RVV as well?)
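For reference, the quoted semantics can be modelled in plain C. This is only a sketch: the function name while_ult_len and the MAX_LANES constant are hypothetical, op1 + i is assumed not to wrap, and lanes at or beyond the length operand are assumed to be cleared, which is the "no more lanes enabled than there ought to be" behaviour the extra operand is meant to guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical upper bound on the lane count (GCN vectors have 64 lanes).  */
#define MAX_LANES 64

/* Model of while_ult with an explicit length operand LEN.
   Lanes [0, LEN) follow the while_ult recurrence; lanes [LEN, MAX_LANES)
   are assumed to be cleared.  Assumes OP1 + I does not wrap.  */
static void
while_ult_len (bool mask[MAX_LANES], uint64_t op1, uint64_t op2, unsigned len)
{
  assert (len >= 1 && len <= MAX_LANES);
  mask[0] = op1 < op2;
  for (unsigned i = 1; i < len; i++)
    mask[i] = mask[i - 1] && (op1 + i < op2);
  /* Lanes past the length operand must never be enabled.  */
  for (unsigned i = len; i < MAX_LANES; i++)
    mask[i] = false;
}
```

With len = 2, for example, only lanes 0 and 1 can be set even when op2 - op1 is much larger than 2, which is the distinction a V2 operation needs but a plain DImode while_ultsidi cannot express.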

I think the length is variable and queried at runtime, but it might also be
used when compiling with a fixed-length vector size.
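To make the runtime-length case concrete, here is a scalar model of a fully masked loop in which the vector length is an ordinary run-time argument. It is a sketch under stated assumptions: masked_add and its parameters are hypothetical, and the per-lane test base + lane < n stands in for one bit of the mask that while_ult would produce for that vector iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of a fully masked loop computing DST[i] += SRC[i].
   VF (lanes per vector iteration) is only known at run time here.
   Lane LANE of the iteration starting at BASE is active iff
   BASE + LANE < N, i.e. the while_ult (BASE, N) condition.  */
static void
masked_add (int *dst, const int *src, size_t n, size_t vf)
{
  assert (vf > 0);
  for (size_t base = 0; base < n; base += vf)
    for (size_t lane = 0; lane < vf; lane++)
      if (base + lane < n)   /* mask bit for this lane */
        dst[base + lane] += src[base + lane];
}
```

With n = 5 and vf = 4, the second iteration has only lane 0 active; no scalar epilogue is needed because the mask disables the remaining lanes.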

Note that x86, with its integer-mode AVX512 masks, runs into similar issues
but just uses QImode to DImode (it doesn't exercise this particular pattern,
though).  It basically relies on the actual machine instructions not enabling
the excess lanes, as when a V2DFmode compare produces a mask.
For while_ult that is of course harder to achieve (sadly AVX512 doesn't
have any such capability, and my attempts to emulate it have been either
unsuccessful or slow).
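As an illustration of the integer-mask representation, the sketch below computes the while_ult result for nlanes lanes directly as a bit mask, one bit per lane, with every bit at or above nlanes cleared. It is plain scalar C, not AVX512 intrinsics or a GCN expansion, and the function name while_ult_mask is hypothetical. It relies on the recurrence producing a prefix: once one lane is inactive, all later lanes are too, so the result is always a contiguous run of low bits.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the while_ult (BASE, LIMIT) result for NLANES lanes into an
   integer mask, bit I standing for lane I.  Bits at and above NLANES
   are forced to zero so that a V2 or V4 operation held in a wider
   integer mode never sees extra lanes enabled.  */
static uint64_t
while_ult_mask (uint64_t base, uint64_t limit, unsigned nlanes)
{
  assert (nlanes >= 1 && nlanes <= 64);
  /* Active lanes: min (LIMIT - BASE, NLANES), or 0 if BASE >= LIMIT.  */
  uint64_t active = base < limit ? limit - base : 0;
  if (active > nlanes)
    active = nlanes;
  /* Avoid the undefined shift by 64 when all 64 lanes are active.  */
  return active == 64 ? UINT64_MAX : (UINT64_C (1) << active) - 1;
}
```

For example, while_ult_mask (0, 100, 4) gives 0xf: even though 100 elements remain, only the four low bits are set, which is what a V4 operation in a wider integer mask mode needs.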

Richard.

>
> Thanks,
> Richard