Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Biener <richard.guenther@gmail.com>
To: Richard Sandiford <richard.sandiford@arm.com>,
	Richard Biener <rguenther@suse.de>,
	 "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>,
	gcc-patches <gcc-patches@gcc.gnu.org>,
	 jeffreyalaw <jeffreyalaw@gmail.com>,
	rdapp@linux.ibm.com, linkw@linux.ibm.com
Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization
Date: Wed, 12 Apr 2023 13:42:04 +0200	[thread overview]
Message-ID: <CAFiYyc3+6aV0q-U-vFgECAKj2reQ4MmZm7UbAHWnywFaziYMsQ@mail.gmail.com> (raw)
In-Reply-To: <mpt3555jqn1.fsf@arm.com>

On Wed, Apr 12, 2023 at 1:18 PM Richard Sandiford via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Richard Biener <rguenther@suse.de> writes:
> > On Wed, 12 Apr 2023, juzhe.zhong@rivai.ai wrote:
> >
> >>
> >> >> Thanks for the detailed explanation.  Just to clarify - with RVV
> >> >> there's only a single mask register, v0.t, or did you want to
> >> >> say an instruction can only specify a single mask register?
> >>
> >> RVV has 32 (v0~v31) vector register in total.
> >> We can store vector data value or mask value in any of them.
> >> We also have mask-logic instruction for example mask-and between any vector register.
> >>
> >> However, any vector operation for example like vadd.vv can only  predicated by v0 (in asm is v0.t) which is the first vector register.
> >> We can predicate vadd.vv with v1 - v31.
> >>
> >> So, you can image every time we want to use a mask to predicate a vector operation, we should always first store the mask value
> >> into v0.
> >>
> >> So, we can write intrinsic sequence like this:
> >>
> >> vmseq v0,v8,v9 (store mask value to v0)
> >> vmslt v1,v10,v11 (store mask value to v1)
> >> vmand v0,v0,v1
> >> vadd.vv ...v0.t (predicate mask should always be mask).
> >
> > Ah, I see - that explains it well.
> >
> >> >> ARM SVE would have a loop control mask and a separate mask
> >> >> for the if (cond[i]) which would be combined with a mask-and
> >> >> instruction to a third mask which is then used on the
> >> >> predicated instructions.
> >>
> >> Yeah, I know it. ARM SVE way is a more elegant way than RVV do.
> >> However, for RVV, we can't follow this flow.
> >> We don't have a  "whilelo" instruction to generate loop control mask.
> >
> > Yep.  Similar for AVX512 where I have to use a vector compare.  I'm
> > currently using
> >
> >  { 0, 1, 2 ... } < { remaining_len, remaining_len, ... }
> >
> > and careful updating of remaining_len (we know it will either
> > be adjusted by the full constant vector length or updated to zero).
> >
> >> We only can do loop control with length generated by vsetvl.
> >> And we can only use "v0" to mask predicate vadd.vv, and mask value can only generated by comparison or mask logical instructions.
> >>
> >> >> PowerPC and s390x might be able to use WHILE_LEN as well (though
> >> >> they only have LEN variants of loads and stores) - of course
> >> >> only "simulating it".  For the fixed-vector-length ISAs the
> >> >> predicated vector loop IMHO makes most sense for the epilogue to
> >> >> handle low-trip loops better.
> >>
> >> Yeah, I wonder how they do the flow control (if (cond[i])).
> >> For RVV, you can image I will need to add a pattern LEN_MASK_LOAD/LEN_MASK_STORE (length generated by WHILE_LEN and mask generated by comparison)
> >>
> >> I think we can CC IBM folks to see whether we can make WHILE_LEN works
> >> for both IBM and RVV ?
> >
> > I've CCed them.  Adding WHILE_LEN support to rs6000/s390x would be
> > mainly the "easy" way to get len-masked (epilog) loop support.
>
> I think that already works for them (could be misremembering).
> However, IIUC, they have no special instruction to calculate the
> length (unlike for RVV), and so it's open-coded using vect_get_len.
>
> I suppose my two questions are:
>
> (1) How easy would it be to express WHILE_LEN in normal gimple?
>     I haven't thought about this at all, so the answer might be
>     "very hard".  But it reminds me a little of UQDEC on AArch64,
>     which we open-code using MAX_EXPR and MINUS_EXPR (see
>     vect_set_loop_controls_directly).
>
>     I'm not saying WHILE_LEN is the same operation, just that it seems
>     like it might be open-codeable in a similar way.

I think WHILE_LEN is saturate-to-zero subtraction.  So when the IV
can be expressed signed

   remain = MAX (0, remain - vf);

the details are more complicated then you need an unsigned IV.

It might be that WHILE_LEN for RVV computes remain % VL
so another MIN around (not sure).  For the AVX512 work I
also have a scalar 'remain' like above but currently I'm adding
a branch

do {
 if (remain < vf)
   mask = ... vector compare ..;
 else
   mask = all-ones;
} while (mask-not-all-zeros);

so I'm using the mask as control "IV".  But that's because I do
open-code WHILE_ULT at RTL expansion time and this is how
the vectorizer works for SVE.

When manually creating a loop mask in the vectorizer tracking
'remain' is easier.  Note the extra control flow complicates the
fully masked variant, for the epilog we know remain < vf and
that we'll immediately exit the loop.

>     Even if we can open-code it, we'd still need some way for the
>     target to select the "RVV way" from the "s390/PowerPC way".
>
> (2) What effect does using a variable IV step (the result of
>     the WHILE_LEN) have on ivopts?  I remember experimenting with
>     something similar once (can't remember the context) and not
>     having a constant step prevented ivopts from making good
>     addresing-mode choices.

Any kind of variable length stuff (WHILE_ULT or WHILE_LEN) will probably
make niter analysis fail.  All IV uses that are not SCEV analyzable will
just remain as-is as IVOPTs cannot deal with them either - but usually
that should be only the control IV.

Richard.

>
> Thanks,
> Richard
>
>

next prev parent reply	other threads:[~2023-04-12 11:43 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-07  1:47 juzhe.zhong
2023-04-07  3:23 ` Li, Pan2
2023-04-11 12:12 ` juzhe.zhong
2023-04-11 12:44   ` Richard Sandiford
2023-04-12  7:00     ` Richard Biener
2023-04-12  8:00       ` juzhe.zhong
2023-04-12  8:42         ` Richard Biener
2023-04-12  9:15           ` juzhe.zhong
2023-04-12  9:29             ` Richard Biener
2023-04-12  9:42               ` Robin Dapp
2023-04-12 11:17               ` Richard Sandiford
2023-04-12 11:37                 ` juzhe.zhong
2023-04-12 12:24                   ` Richard Sandiford
2023-04-12 14:18                     ` 钟居哲
2023-04-13  6:47                       ` Richard Biener
2023-04-13  9:54                         ` juzhe.zhong
2023-04-18  9:32                           ` Richard Sandiford
2023-04-12 12:56                   ` Kewen.Lin
2023-04-12 13:22                     ` 钟居哲
2023-04-13  7:29                       ` Kewen.Lin
2023-04-13 13:44                         ` 钟居哲
2023-04-14  2:54                           ` Kewen.Lin
2023-04-14  3:09                             ` juzhe.zhong
2023-04-14  5:40                               ` Kewen.Lin
2023-04-14  3:39                             ` juzhe.zhong
2023-04-14  6:31                               ` Kewen.Lin
2023-04-14  6:39                                 ` juzhe.zhong
2023-04-14  7:41                                   ` Kewen.Lin
2023-04-14  6:52                               ` Richard Biener
2023-04-12 11:42                 ` Richard Biener [this message]
     [not found]           ` <2023041217154958074655@rivai.ai>
2023-04-12  9:20             ` juzhe.zhong
2023-04-19 21:53 ` 钟居哲
2023-04-20  8:52   ` Richard Sandiford
2023-04-20  8:57     ` juzhe.zhong
2023-04-20  9:11       ` Richard Sandiford
2023-04-20  9:19         ` juzhe.zhong
2023-04-20  9:22           ` Richard Sandiford
2023-04-20  9:50             ` Richard Biener
2023-04-20  9:54               ` Richard Sandiford
2023-04-20 10:38                 ` juzhe.zhong
2023-04-20 12:05                   ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFiYyc3+6aV0q-U-vFgECAKj2reQ4MmZm7UbAHWnywFaziYMsQ@mail.gmail.com \
    --to=richard.guenther@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=juzhe.zhong@rivai.ai \
    --cc=linkw@linux.ibm.com \
    --cc=rdapp@linux.ibm.com \
    --cc=rguenther@suse.de \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).