From: Richard Biener <rguenther@suse.de>
To: 钟居哲 <juzhe.zhong@rivai.ai>
Cc: "richard.sandiford" <richard.sandiford@arm.com>,
gcc-patches <gcc-patches@gcc.gnu.org>,
Jeff Law <jeffreyalaw@gmail.com>, rdapp <rdapp@linux.ibm.com>,
linkw <linkw@linux.ibm.com>, "kito.cheng" <kito.cheng@gmail.com>
Subject: Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization
Date: Thu, 13 Apr 2023 06:47:52 +0000 (UTC) [thread overview]
Message-ID: <nycvar.YFH.7.77.849.2304130643410.4466@jbgna.fhfr.qr> (raw)
In-Reply-To: <3340A6BE437A5F32+2023041222184139393725@rivai.ai>
On Wed, 12 Apr 2023, ??? wrote:
> >> It's not so much that we need to do that. But normally it's only worth
> >> adding internal functions if they do something that is too complicated
> >> to express in simple gimple arithmetic. The UQDEC case I mentioned:
>
> >> z = MAX (x, y) - y
>
> >> fell into the "simple arithmetic" category for me. We could have added
> >> an ifn for unsigned saturating decrement, but it didn't seem complicated
> >> enough to merit its own ifn.
>
> Ah, I known your concern. I should admit that WHILE_LEN is a simple arithmetic operation
> which is just taking result from
>
> min (remain,vf).
>
> The possible solution is to just use MIN_EXPR (remain,vf).
> Then, add speciall handling in umin_optab pattern to recognize "vf" in the backend.
> Finally generate vsetvl in RISC-V backend.
>
> The "vf" should be recognized as the operand of umin should be const_int/const_poly_int operand.
> Otherwise, just generate umin scalar instruction..
>
> However, there is a case that I can't recognize umin should generate vsetvl or umin. Is this following case:
> void foo (int32_t a)
> {
> return min (a, 4);
> }
>
> In this case I should generate:
> li a1,4
> umin a1,a0,a1
>
> instead of generating vsetvl
>
> However, in this case:
>
> void foo (int32_t *a...)
> for (int i = 0; i < n; i++)
> a[i] = b[i] + c[i];
>
> with -mriscv-vector-bits=128 (which means each vector can handle 4 INT32)
> Then the VF will be 4 too. If we also MIN_EXPR instead WHILE_LEN:
>
> ...
> len = MIN_EXPR (n,4)
> v = len_load (len)
> ....
> ...
>
> In this case, MIN_EXPR should emit vsetvl.
>
> It's hard for me to tell the difference between these 2 cases...
But the issue is the same in the reverse with WHILE_LEN, no?
WHILE_LEN just computes a scalar value - you seem to suggest
there's a hidden side-effect of "coalescing" the result with
a hardware vector length register? I don't think that's good design.
IMHO tieing the scalar result with the uses has to be done where
you emit the other vsetvl instructions.
One convenient thing we have with WHILE_LEN is that it is a key
for the vectorizer to query target capabilities (and preferences).
But of course collecting whether stmts can be vectorized
with length and/or with mask would be better.
Richard.
> CC RISC-V port backend maintainer: Kito.
>
>
>
> juzhe.zhong@rivai.ai
>
> From: Richard Sandiford
> Date: 2023-04-12 20:24
> To: juzhe.zhong\@rivai.ai
> CC: rguenther; gcc-patches; jeffreyalaw; rdapp; linkw
> Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization
> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> >>> I think that already works for them (could be misremembering).
> >>> However, IIUC, they have no special instruction to calculate the
> >>> length (unlike for RVV), and so it's open-coded using vect_get_len.
> >
> > Yeah, the current flow using min, sub, and then min in vect_get_len
> > is working for IBM. But I wonder whether switching the current flow of
> > length-loop-control into the WHILE_LEN pattern that this patch can improve
> > their performance.
> >
> >>> (1) How easy would it be to express WHILE_LEN in normal gimple?
> >>> I haven't thought about this at all, so the answer might be
> >>> "very hard". But it reminds me a little of UQDEC on AArch64,
> >>> which we open-code using MAX_EXPR and MINUS_EXPR (see
> > >> vect_set_loop_controls_directly).
> >
> > >> I'm not saying WHILE_LEN is the same operation, just that it seems
> > >> like it might be open-codeable in a similar way.
> >
> > >> Even if we can open-code it, we'd still need some way for the
> > >> target to select the "RVV way" from the "s390/PowerPC way".
> >
> > WHILE_LEN in doc I define is
> > operand0 = MIN (operand1, operand2)operand1 is the residual number of scalar elements need to be updated.operand2 is vectorization factor (vf) for single rgroup. if multiple rgroup operan2 = vf * nitems_per_ctrl.You mean such pattern is not well expressed so we need to replace it with normaltree code (MIN OR MAX). And let RISC-V backend to optimize them into vsetvl ?Sorry, maybe I am not on the same page.
>
> It's not so much that we need to do that. But normally it's only worth
> adding internal functions if they do something that is too complicated
> to express in simple gimple arithmetic. The UQDEC case I mentioned:
>
> z = MAX (x, y) - y
>
> fell into the "simple arithmetic" category for me. We could have added
> an ifn for unsigned saturating decrement, but it didn't seem complicated
> enough to merit its own ifn.
>
> >>> (2) What effect does using a variable IV step (the result of
> >>> the WHILE_LEN) have on ivopts? I remember experimenting with
> >>> something similar once (can't remember the context) and not
> >>> having a constant step prevented ivopts from making good
> >>> addresing-mode choices.
> >
> > Thank you so much for pointing out this. Currently, varialble IV step and decreasing n down to 0
> > works fine for RISC-V downstream GCC and we didn't find issues related addressing-mode choosing.
>
> OK, that's good. Sounds like it isn't a problem then.
>
> > I think I must missed something, would you mind giving me some hints so that I can study on ivopts
> > to find out which case may generate inferior codegens for varialble IV step?
>
> I think AArch64 was sensitive to this because (a) the vectoriser creates
> separate IVs for each base address and (b) for SVE, we instead want
> invariant base addresses that are indexed by the loop control IV.
> Like Richard says, if the loop control IV isn't a SCEV, ivopts isn't
> able to use it and so (b) fails.
>
> Thanks,
> Richard
>
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
next prev parent reply other threads:[~2023-04-13 6:47 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-07 1:47 juzhe.zhong
2023-04-07 3:23 ` Li, Pan2
2023-04-11 12:12 ` juzhe.zhong
2023-04-11 12:44 ` Richard Sandiford
2023-04-12 7:00 ` Richard Biener
2023-04-12 8:00 ` juzhe.zhong
2023-04-12 8:42 ` Richard Biener
2023-04-12 9:15 ` juzhe.zhong
2023-04-12 9:29 ` Richard Biener
2023-04-12 9:42 ` Robin Dapp
2023-04-12 11:17 ` Richard Sandiford
2023-04-12 11:37 ` juzhe.zhong
2023-04-12 12:24 ` Richard Sandiford
2023-04-12 14:18 ` 钟居哲
2023-04-13 6:47 ` Richard Biener [this message]
2023-04-13 9:54 ` juzhe.zhong
2023-04-18 9:32 ` Richard Sandiford
2023-04-12 12:56 ` Kewen.Lin
2023-04-12 13:22 ` 钟居哲
2023-04-13 7:29 ` Kewen.Lin
2023-04-13 13:44 ` 钟居哲
2023-04-14 2:54 ` Kewen.Lin
2023-04-14 3:09 ` juzhe.zhong
2023-04-14 5:40 ` Kewen.Lin
2023-04-14 3:39 ` juzhe.zhong
2023-04-14 6:31 ` Kewen.Lin
2023-04-14 6:39 ` juzhe.zhong
2023-04-14 7:41 ` Kewen.Lin
2023-04-14 6:52 ` Richard Biener
2023-04-12 11:42 ` Richard Biener
[not found] ` <2023041217154958074655@rivai.ai>
2023-04-12 9:20 ` juzhe.zhong
2023-04-19 21:53 ` 钟居哲
2023-04-20 8:52 ` Richard Sandiford
2023-04-20 8:57 ` juzhe.zhong
2023-04-20 9:11 ` Richard Sandiford
2023-04-20 9:19 ` juzhe.zhong
2023-04-20 9:22 ` Richard Sandiford
2023-04-20 9:50 ` Richard Biener
2023-04-20 9:54 ` Richard Sandiford
2023-04-20 10:38 ` juzhe.zhong
2023-04-20 12:05 ` Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=nycvar.YFH.7.77.849.2304130643410.4466@jbgna.fhfr.qr \
--to=rguenther@suse.de \
--cc=gcc-patches@gcc.gnu.org \
--cc=jeffreyalaw@gmail.com \
--cc=juzhe.zhong@rivai.ai \
--cc=kito.cheng@gmail.com \
--cc=linkw@linux.ibm.com \
--cc=rdapp@linux.ibm.com \
--cc=richard.sandiford@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).