Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Biener <rguenther@suse.de>
To: 钟居哲 <juzhe.zhong@rivai.ai>
Cc: "richard.sandiford" <richard.sandiford@arm.com>,
	 gcc-patches <gcc-patches@gcc.gnu.org>,
	Jeff Law <jeffreyalaw@gmail.com>,  rdapp <rdapp@linux.ibm.com>,
	linkw <linkw@linux.ibm.com>,  "kito.cheng" <kito.cheng@gmail.com>
Subject: Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization
Date: Thu, 13 Apr 2023 06:47:52 +0000 (UTC)	[thread overview]
Message-ID: <nycvar.YFH.7.77.849.2304130643410.4466@jbgna.fhfr.qr> (raw)
In-Reply-To: <3340A6BE437A5F32+2023041222184139393725@rivai.ai>

On Wed, 12 Apr 2023, ??? wrote:

> >> It's not so much that we need to do that.  But normally it's only worth
> >> adding internal functions if they do something that is too complicated
> >> to express in simple gimple arithmetic.  The UQDEC case I mentioned:
> 
> >>    z = MAX (x, y) - y
> 
> >> fell into the "simple arithmetic" category for me.  We could have added
> >> an ifn for unsigned saturating decrement, but it didn't seem complicated
> >> enough to merit its own ifn.
> 
> Ah, I known your concern. I should admit that WHILE_LEN is a simple arithmetic operation
> which is just taking result from
> 
> min (remain,vf).
> 
> The possible solution is to just use MIN_EXPR (remain,vf).
> Then, add speciall handling in umin_optab pattern to recognize "vf" in the backend.
> Finally generate vsetvl in RISC-V backend.
> 
> The "vf" should be recognized as the operand of umin should be const_int/const_poly_int operand.
> Otherwise, just generate umin scalar instruction..
> 
> However, there is a case that I can't recognize umin should generate vsetvl or umin. Is this following case:
> void foo (int32_t a)
> {
>   return min (a, 4);
> }
> 
> In this case I should generate:
> li a1,4
> umin a1,a0,a1
> 
> instead of generating vsetvl
> 
> However, in this case:
> 
> void foo (int32_t *a...)
> for (int i = 0; i < n; i++)
>   a[i] = b[i] + c[i];
> 
> with -mriscv-vector-bits=128 (which means each vector can handle 4 INT32)
> Then the VF will be 4 too. If we also MIN_EXPR instead WHILE_LEN:
> 
> ...
> len = MIN_EXPR (n,4)
> v = len_load (len)
> ....
> ...
> 
> In this case, MIN_EXPR should emit vsetvl.
> 
> It's hard for me to tell the difference between these 2 cases...

But the issue is the same in the reverse with WHILE_LEN, no?
WHILE_LEN just computes a scalar value - you seem to suggest
there's a hidden side-effect of "coalescing" the result with
a hardware vector length register?  I don't think that's good design.

IMHO tieing the scalar result with the uses has to be done where
you emit the other vsetvl instructions.

One convenient thing we have with WHILE_LEN is that it is a key
for the vectorizer to query target capabilities (and preferences).
But of course collecting whether stmts can be vectorized
with length and/or with mask would be better.

Richard.

> CC RISC-V port backend maintainer: Kito.
> 
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Sandiford
> Date: 2023-04-12 20:24
> To: juzhe.zhong\@rivai.ai
> CC: rguenther; gcc-patches; jeffreyalaw; rdapp; linkw
> Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization
> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> >>> I think that already works for them (could be misremembering).
> >>> However, IIUC, they have no special instruction to calculate the
> >>> length (unlike for RVV), and so it's open-coded using vect_get_len.
> >
> > Yeah, the current flow using min, sub, and then min in vect_get_len
> > is working for IBM. But I wonder whether switching the current flow of
> > length-loop-control into the WHILE_LEN pattern that this patch can improve
> > their performance.
> >
> >>> (1) How easy would it be to express WHILE_LEN in normal gimple?
> >>>     I haven't thought about this at all, so the answer might be
> >>>     "very hard".  But it reminds me a little of UQDEC on AArch64,
> >>>     which we open-code using MAX_EXPR and MINUS_EXPR (see
> >  >>    vect_set_loop_controls_directly).
> >
> >   >>   I'm not saying WHILE_LEN is the same operation, just that it seems
> >   >>   like it might be open-codeable in a similar way.
> >
> >  >>    Even if we can open-code it, we'd still need some way for the
> >   >>   target to select the "RVV way" from the "s390/PowerPC way".
> >
> > WHILE_LEN in doc I define is
> > operand0 = MIN (operand1, operand2)operand1 is the residual number of scalar elements need to be updated.operand2 is vectorization factor (vf) for single rgroup.         if multiple rgroup operan2 = vf * nitems_per_ctrl.You mean such pattern is not well expressed so we need to replace it with normaltree code (MIN OR MAX). And let RISC-V backend to optimize them into vsetvl ?Sorry, maybe I am not on the same page.
>  
> It's not so much that we need to do that.  But normally it's only worth
> adding internal functions if they do something that is too complicated
> to express in simple gimple arithmetic.  The UQDEC case I mentioned:
>  
>    z = MAX (x, y) - y
>  
> fell into the "simple arithmetic" category for me.  We could have added
> an ifn for unsigned saturating decrement, but it didn't seem complicated
> enough to merit its own ifn.
>  
> >>> (2) What effect does using a variable IV step (the result of
> >>> the WHILE_LEN) have on ivopts?  I remember experimenting with
> >>> something similar once (can't remember the context) and not
> >>> having a constant step prevented ivopts from making good
> >>> addresing-mode choices.
> >
> > Thank you so much for pointing out this. Currently, varialble IV step and decreasing n down to 0 
> > works fine for RISC-V downstream GCC and we didn't find issues related addressing-mode choosing.
>  
> OK, that's good.  Sounds like it isn't a problem then.
>  
> > I think I must missed something, would you mind giving me some hints so that I can study on ivopts
> > to find out which case may generate inferior codegens for varialble IV step?
>  
> I think AArch64 was sensitive to this because (a) the vectoriser creates
> separate IVs for each base address and (b) for SVE, we instead want
> invariant base addresses that are indexed by the loop control IV.
> Like Richard says, if the loop control IV isn't a SCEV, ivopts isn't
> able to use it and so (b) fails.
>  
> Thanks,
> Richard
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

next prev parent reply	other threads:[~2023-04-13  6:47 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-07  1:47 juzhe.zhong
2023-04-07  3:23 ` Li, Pan2
2023-04-11 12:12 ` juzhe.zhong
2023-04-11 12:44   ` Richard Sandiford
2023-04-12  7:00     ` Richard Biener
2023-04-12  8:00       ` juzhe.zhong
2023-04-12  8:42         ` Richard Biener
2023-04-12  9:15           ` juzhe.zhong
2023-04-12  9:29             ` Richard Biener
2023-04-12  9:42               ` Robin Dapp
2023-04-12 11:17               ` Richard Sandiford
2023-04-12 11:37                 ` juzhe.zhong
2023-04-12 12:24                   ` Richard Sandiford
2023-04-12 14:18                     ` 钟居哲
2023-04-13  6:47                       ` Richard Biener [this message]
2023-04-13  9:54                         ` juzhe.zhong
2023-04-18  9:32                           ` Richard Sandiford
2023-04-12 12:56                   ` Kewen.Lin
2023-04-12 13:22                     ` 钟居哲
2023-04-13  7:29                       ` Kewen.Lin
2023-04-13 13:44                         ` 钟居哲
2023-04-14  2:54                           ` Kewen.Lin
2023-04-14  3:09                             ` juzhe.zhong
2023-04-14  5:40                               ` Kewen.Lin
2023-04-14  3:39                             ` juzhe.zhong
2023-04-14  6:31                               ` Kewen.Lin
2023-04-14  6:39                                 ` juzhe.zhong
2023-04-14  7:41                                   ` Kewen.Lin
2023-04-14  6:52                               ` Richard Biener
2023-04-12 11:42                 ` Richard Biener
     [not found]           ` <2023041217154958074655@rivai.ai>
2023-04-12  9:20             ` juzhe.zhong
2023-04-19 21:53 ` 钟居哲
2023-04-20  8:52   ` Richard Sandiford
2023-04-20  8:57     ` juzhe.zhong
2023-04-20  9:11       ` Richard Sandiford
2023-04-20  9:19         ` juzhe.zhong
2023-04-20  9:22           ` Richard Sandiford
2023-04-20  9:50             ` Richard Biener
2023-04-20  9:54               ` Richard Sandiford
2023-04-20 10:38                 ` juzhe.zhong
2023-04-20 12:05                   ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.YFH.7.77.849.2304130643410.4466@jbgna.fhfr.qr \
    --to=rguenther@suse.de \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=juzhe.zhong@rivai.ai \
    --cc=kito.cheng@gmail.com \
    --cc=linkw@linux.ibm.com \
    --cc=rdapp@linux.ibm.com \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).