public inbox for gcc-patches@gcc.gnu.org
From: 钟居哲 <juzhe.zhong@rivai.ai>
To: linkw <linkw@linux.ibm.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>,
	 "Jeff Law" <jeffreyalaw@gmail.com>,  rdapp <rdapp@linux.ibm.com>,
	 richard.sandiford <richard.sandiford@arm.com>,
	 rguenther <rguenther@suse.de>
Subject: Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization
Date: Thu, 13 Apr 2023 21:44:07 +0800	[thread overview]
Message-ID: <9D802FBA3E3A18CE+2023041321440689154856@rivai.ai> (raw)
In-Reply-To: <0fb64ee1-1bf6-2a5f-4214-7567af5facbd@linux.ibm.com>


Thanks Kewen.

The current flow in this patch, as you said, is:
....
len = WHILE_LEN (n, vf);
...
v = len_load (addr, len);
..
addr = addr + vf (converted to bytes);
....

That is, this patch just keeps advancing the address by the vectorization factor (converted to bytes).
For example, if the vector length is 512 bits, this patch updates the address as
addr = addr + 64;

However, after reading the RVV ISA more carefully today, I think it would be more appropriate to update the address as
addr = addr + (len * 4), where len is the element count produced by WHILE_LEN (assuming INT32 elements, i.e. 4 bytes each).

I assume that for the IBM target it is better to keep advancing the address IV by the whole register byte size,
since the second scheme (addr = addr + (len * 4)) is quite RVV-specific and probably not suitable for IBM. Is that right?
If so, I will keep the flow in this patch (i.e. not change it to addr = addr + (len * 4)) and see what else is needed for IBM;
I would rather do the len-based update in the RISC-V backend port.
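
To make the difference concrete, here is a rough scalar C model of the two address-update schemes (purely illustrative: VF, the function names and the 4-byte INT32 element size are my assumptions, and WHILE_LEN / len_load are modelled with plain scalar code):

#define VF 16   /* assumed number of INT32 elements per vector */

/* Scheme 1 (this patch): always advance the address by the whole vector.  */
int
sum_scheme1 (int *a, int n)
{
  int sum = 0;
  int *p = a;
  while (n > 0)
    {
      int len = n < VF ? n : VF;        /* models len = WHILE_LEN (n, VF) */
      for (int i = 0; i < len; i++)     /* models v = len_load (p, len) + use */
        sum += p[i];
      p += VF;                          /* addr = addr + VF * 4 bytes */
      n -= len;
    }
  return sum;
}

/* Scheme 2 (RVV-style): advance only by the elements actually processed.  */
int
sum_scheme2 (int *a, int n)
{
  int sum = 0;
  int *p = a;
  while (n > 0)
    {
      int len = n < VF ? n : VF;
      for (int i = 0; i < len; i++)
        sum += p[i];
      p += len;                         /* addr = addr + len * 4 bytes */
      n -= len;
    }
  return sum;
}

Both give the same result here, because len < VF can only happen in the last iteration, where the over-advanced address in scheme 1 is never used again.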

>> I tried
>>to compile the above source files on Power, the former can adopt doloop
>>optimization but the latter fails to. 
Do you mean that GCC cannot do the hardware loop (doloop) optimization when the loop-control IV is a variable?


juzhe.zhong@rivai.ai
 
From: Kewen.Lin
Date: 2023-04-13 15:29
To: 钟居哲
CC: gcc-patches; Jeff Law; rdapp; richard.sandiford; rguenther
Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization
Hi Juzhe,
 
on 2023/4/12 21:22, 钟居哲 wrote:
> Thanks Kewen. 
> 
> It seems that this WHILE_LEN proposal can help s390 when using the --param vect-partial-vector-usage=2 compile option.
> 
 
Yeah, IMHO, the previous sequence vs. the proposed sequence look like:
 
int
foo (int *__restrict a, int *__restrict b, int n)
{
  if (n <= 0)
    return 0;
 
  int iv = 0;
  int len = MIN (n, 16);
  int sum = 0;
  do
    {
      sum += a[len] + b[len];
      iv += 16;
      int n1 = MIN (n, iv);                   // line A
      int n2 = n - n1;
      len = MIN (n2, 16);
    }
  while (n > iv);
 
  return sum;
}
 
vs.
 
int
foo (int *__restrict a, int *__restrict b, int n)
{
  if (n <= 0)
    return 0;
 
  int len;
  int sum = 0;
  do
    {
      len = MIN (n, 16);
      sum += a[len] + b[len];
      n -= len;
    }
  while (n > 0);
 
  return sum;
}
 
it at least saves one MIN (at line A) and one length preparation in the
last iteration (which is useless since the loop ends there).  But I think the
concern that this proposed IV isn't recognized as a simple IV may stay.  I tried
to compile the above source files on Power: the former can adopt the doloop
optimization but the latter fails to.
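
(IIUC the doloop pass wants the total iteration count available as a simple expression before entering the loop, so it can be put into the count register; roughly the counted form sketched below, where 16 is just the assumed VF.  In the latter form above, the exit test depends on the len recomputed inside every iteration, so no such count is recognized.)

  /* Counted form that doloop can handle: trip count known up front.  */
  int count = (n + 15) / 16;            /* number of iterations */
  for (int it = 0; it < count; it++)    /* can become a CTR/bdnz loop on Power */
    {
      int len = n < 16 ? n : 16;        /* MIN (n, 16) */
      /* ... process len elements ... */
      n -= len;
    }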
 
> Would you mind applying this patch && supporting WHILE_LEN in the s390 backend, and testing it to see the overall benefits for s390
> as well as the correctness of this sequence? 
 
Sure, if all of you think this approach and this revision are good enough to go forward for this kind of evaluation,
I'm happy to give it a shot, but only for rs6000. ;-)  I noticed that there are some discussions about withdrawing this
WHILE_LEN in favor of MIN_EXPR instead; I'll stay tuned.
 
btw, for now we only adopt vector-with-length on the epilogues rather than on the main vectorized loops, because of the
non-trivial extra cost of length preparation compared to just using normal vector load/store (all lanes), so we don't
care much about the performance with --param vect-partial-vector-usage=2.  Even if this new proposal can optimize
the length preparation for --param vect-partial-vector-usage=2, the extra cost of length preparation is still
unavoidable (MIN, shifting, one more GPR used), so we would still stay with the default --param vect-partial-vector-usage=1
(which can't benefit from this new proposal).
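
(For reference, this --param accepts 0 = never use partial vectors, 1 = only where vectorization removes the need to iterate (e.g. epilogues), and 2 = also for the main loop, so something along the lines of

  gcc -O3 -mcpu=power10 --param vect-partial-vector-usage=2 foo.c

would be needed to exercise the main-loop length-based path on Power; exact target options may differ.)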
 
BR,
Kewen
 


Thread overview: 41+ messages
2023-04-07  1:47 juzhe.zhong
2023-04-07  3:23 ` Li, Pan2
2023-04-11 12:12 ` juzhe.zhong
2023-04-11 12:44   ` Richard Sandiford
2023-04-12  7:00     ` Richard Biener
2023-04-12  8:00       ` juzhe.zhong
2023-04-12  8:42         ` Richard Biener
2023-04-12  9:15           ` juzhe.zhong
2023-04-12  9:29             ` Richard Biener
2023-04-12  9:42               ` Robin Dapp
2023-04-12 11:17               ` Richard Sandiford
2023-04-12 11:37                 ` juzhe.zhong
2023-04-12 12:24                   ` Richard Sandiford
2023-04-12 14:18                     ` 钟居哲
2023-04-13  6:47                       ` Richard Biener
2023-04-13  9:54                         ` juzhe.zhong
2023-04-18  9:32                           ` Richard Sandiford
2023-04-12 12:56                   ` Kewen.Lin
2023-04-12 13:22                     ` 钟居哲
2023-04-13  7:29                       ` Kewen.Lin
2023-04-13 13:44                         ` 钟居哲 [this message]
2023-04-14  2:54                           ` Kewen.Lin
2023-04-14  3:09                             ` juzhe.zhong
2023-04-14  5:40                               ` Kewen.Lin
2023-04-14  3:39                             ` juzhe.zhong
2023-04-14  6:31                               ` Kewen.Lin
2023-04-14  6:39                                 ` juzhe.zhong
2023-04-14  7:41                                   ` Kewen.Lin
2023-04-14  6:52                               ` Richard Biener
2023-04-12 11:42                 ` Richard Biener
     [not found]           ` <2023041217154958074655@rivai.ai>
2023-04-12  9:20             ` juzhe.zhong
2023-04-19 21:53 ` 钟居哲
2023-04-20  8:52   ` Richard Sandiford
2023-04-20  8:57     ` juzhe.zhong
2023-04-20  9:11       ` Richard Sandiford
2023-04-20  9:19         ` juzhe.zhong
2023-04-20  9:22           ` Richard Sandiford
2023-04-20  9:50             ` Richard Biener
2023-04-20  9:54               ` Richard Sandiford
2023-04-20 10:38                 ` juzhe.zhong
2023-04-20 12:05                   ` Richard Biener
