public inbox for gcc-patches@gcc.gnu.org
From: 钟居哲 <juzhe.zhong@rivai.ai>
To: rguenther <rguenther@suse.de>
Cc: richard.sandiford <richard.sandiford@arm.com>,
	 gcc-patches <gcc-patches@gcc.gnu.org>,
	 linkw <linkw@linux.ibm.com>
Subject: Re: Re: [PATCH] VECT: Change flow of decrement IV
Date: Tue, 30 May 2023 22:13:14 +0800
Message-ID: <4F6B3237B487B968+202305302213139285908@rivai.ai>
In-Reply-To: <nycvar.YFH.7.77.849.2305301233210.4723@jbgna.fhfr.qr>

Hi all. After several investigations, here are my experiments:
void
single_rgroup (int32_t *__restrict a, int32_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = b[i] + a[i];
}

void
mutiple_rgroup (float *__restrict f, double *__restrict d, int n)
{
  for (int i = 0; i < n; ++i)
    {
      f[i * 2 + 0] = 1;
      f[i * 2 + 1] = 2;
      d[i] = 3;
    }
} 


single_rgroup:
ble a2,zero,.L5
li a4,4
.L3:
minu a5,a2,a4
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v1,0(a0)
vle32.v v2,0(a1)
vsetivli zero,4,e32,m1,ta,ma
mv a3,a2                                       ---------> 1 more "mv" instruction
vadd.vv v1,v1,v2
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
addi a1,a1,16
addi a0,a0,16
addi a2,a2,-4
bgtu a3,a4,.L3
.L5:
ret
.size single_rgroup, .-single_rgroup
.align 1
.globl foo5
.type foo5, @function
mutiple_rgroup :
ble a2,zero,.L11
lui a5,%hi(.LANCHOR0)
addi a5,a5,%lo(.LANCHOR0)
vl1re32.v v2,0(a5)
lui a5,%hi(.LANCHOR0+16)
addi a5,a5,%lo(.LANCHOR0+16)
slli a2,a2,1
li a3,8
li a7,4
vl1re64.v v1,0(a5)
.L9:
minu a5,a2,a3
minu a4,a5,a7
sub a5,a5,a4
addi a6,a0,16
vsetvli zero,a4,e32,m1,ta,ma
vse32.v v2,0(a0)
srli a4,a4,1
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v2,0(a6)
srli a5,a5,1
vsetvli zero,a4,e64,m1,ta,ma
addi a6,a1,16
vse64.v v1,0(a1)
mv a4,a2                                ---------> 1 more "mv" instruction
vsetvli zero,a5,e64,m1,ta,ma
vse64.v v1,0(a6)
addi a0,a0,32
addi a1,a1,32
addi a2,a2,-8
bgtu a4,a3,.L9
.L11:
ret

These are the examples; I have tried a fair number of cases. This is the worst case after this patch for RVV:
whether single-rgroup or multiple-rgroup, we end up with 1 more "mv" instruction inside the loop.
There are also some examples I have tried that show no extra instructions (it seems IVOPTS optimizes the copy away in some cases).

From my side (RVV), I think one more "mv" instruction is not a big deal if this patch (apply a VF step and check the exit condition by remain > VF)
can help IBM.
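
To make the two loop-control flows concrete, here is a minimal scalar C sketch. It is only a model written for this mail, not what the vectorizer emits: the names add_old_flow, add_new_flow, min_u32 and the fixed VF of 4 (taken from "li a4,4" above) are assumptions for illustration. The old flow decrements by the actual length and exits when nothing remains; the new flow decrements by VF and tests the pre-decrement value against VF, which is where the extra copy comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VF 4u /* assumed vectorization factor, matching "li a4,4" */

static inline uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Old flow: remain -= len; loop while remain != 0. */
void add_old_flow(int32_t *restrict a, const int32_t *restrict b, uint32_t n)
{
  if (n == 0)
    return;
  uint32_t remain = n;
  size_t off = 0;
  do {
    uint32_t len = min_u32(remain, VF);   /* models minu + vsetvli */
    for (uint32_t i = 0; i < len; i++)    /* models the length-controlled ops */
      a[off + i] += b[off + i];
    off += len;
    remain -= len;
  } while (remain != 0);
  assert(remain == 0);
}

/* New flow: remain -= VF; loop while the value before the decrement > VF. */
void add_new_flow(int32_t *restrict a, const int32_t *restrict b, uint32_t n)
{
  if (n == 0)
    return;
  uint32_t remain = n;
  size_t off = 0;
  uint32_t prev;
  do {
    uint32_t len = min_u32(remain, VF);
    for (uint32_t i = 0; i < len; i++)
      a[off + i] += b[off + i];
    off += VF;
    prev = remain;   /* the extra "mv": keep the value used by the exit test */
    remain -= VF;    /* may wrap on the last iteration; never read afterwards */
  } while (prev > VF);
}
```

Both functions process exactly n elements; the only difference is that the new flow needs the old value of remain alive across the decrement.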

For single-rgroup, this "mv" instruction will go away once we use SELECT_VL. For multiple-rgroup, the "mv" instruction remains,
but as I said, it is not a big deal.
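
As a rough illustration of why the multiple-rgroup loop still needs only one copy, here is a scalar C model of the mutiple_rgroup assembly above. It is a sketch under assumptions, not compiler output: the function name multiple_rgroup_model, the helper umin32, and the constants 8 and 4 (read off "li a3,8" and "li a7,4") are mine. All four per-store lengths are derived from a single remaining-count IV, so only that one IV feeds the exit test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static inline uint32_t umin32(uint32_t a, uint32_t b) { return a < b ? a : b; }

void multiple_rgroup_model(float *restrict f, double *restrict d, uint32_t n)
{
  if (n == 0)
    return;
  assert(n <= UINT32_MAX / 2);  /* assumption: 2 * n must not overflow */

  uint32_t remain = 2 * n;      /* float elements left (slli a2,a2,1) */
  size_t f_off = 0, d_off = 0;
  uint32_t prev;

  do {
    uint32_t total = umin32(remain, 8); /* minu a5,a2,a3 */
    uint32_t len0 = umin32(total, 4);   /* minu a4,a5,a7: first float store */
    uint32_t len1 = total - len0;       /* sub a5,a5,a4: second float store */

    for (uint32_t i = 0; i < len0; i++)
      f[f_off + i] = (i & 1) ? 2.0f : 1.0f;
    for (uint32_t i = 0; i < len1; i++)
      f[f_off + 4 + i] = (i & 1) ? 2.0f : 1.0f;

    /* The double stores use half the float lengths (srli ...,1). */
    for (uint32_t i = 0; i < len0 / 2; i++)
      d[d_off + i] = 3.0;
    for (uint32_t i = 0; i < len1 / 2; i++)
      d[d_off + 2 + i] = 3.0;

    f_off += 8;
    d_off += 4;
    prev = remain;  /* the single extra "mv" */
    remain -= 8;    /* may wrap on the last iteration; never read afterwards */
  } while (prev > 8);
}
```

In this model the number of rgroups does not change the number of copies: only the IV compared in the exit test has to be kept alive across the decrement.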

If this patch's approach is approved, I will rebase the SELECT_VL patch on top of this patch and send it again.

Looking forward to your suggestions.

Thanks.


juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 20:33
To: juzhe.zhong
CC: Richard Sandiford; gcc-patches; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV
On Tue, 30 May 2023, juzhe.zhong wrote:
 
> This patch will generate the number of rgroup "mov" instructions inside the
> loop. This is unacceptable. For example, if number of rgroups=3, will be 3 more
> instruction in loop. If this patch is necessary, I think I should find a way
> to fix it.
 
That's odd, you only need to adjust the IV which is used in the exit test,
not all the others.
 
> ---- Replied Message ----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Date: 05/30/2023 19:41
> To: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> Cc: gcc-patches <gcc-patches@gcc.gnu.org>,
>     rguenther <rguenther@suse.de>,
>     linkw <linkw@linux.ibm.com>
> Subject: Re: [PATCH] VECT: Change flow of decrement IV
> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> > Before this patch:
> > foo:
> > ble a2,zero,.L5
> > csrr a3,vlenb
> > srli a4,a3,2
> > .L3:
> > minu a5,a2,a4
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a0)
> > vsetvli t1,zero,e32,m1,ta,ma
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a0)
> > add a1,a1,a3
> > add a0,a0,a3
> > sub a2,a2,a5
> > bne a2,zero,.L3
> > .L5:
> > ret
> >
> > After this patch:
> >
> > foo:
> > ble a2,zero,.L5
> > csrr a3,vlenb
> > srli a4,a3,2
> > neg a7,a4   -->>>additional instruction
> > .L3:
> > minu a5,a2,a4
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a0)
> > vsetvli t1,zero,e32,m1,ta,ma
> > mv a6,a2  -->>>additional instruction
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a0)
> > add a1,a1,a3
> > add a0,a0,a3
> > add a2,a2,a7
> > bgtu a6,a4,.L3
> > .L5:
> > ret
> >
> > There is 1 more instruction in preheader and 1 more instruction in loop.
> > But I think it's OK for RVV since we will definitely be using SELECT_VL so
> > this issue will gone.
> 
> But what about cases where you won't be using SELECT_VL, such as SLP?
> 
> Richard
> 
> 
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
