From: Richard Sandiford <richard.sandiford@arm.com>
To: "juzhe.zhong\@rivai.ai" <juzhe.zhong@rivai.ai>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>,  rguenther <rguenther@suse.de>
Subject: Re: [PATCH] VECT: Add decrement IV iteration loop control by variable amount support
Date: Wed, 26 Apr 2023 09:06:12 +0100
Message-ID: <mptttx3130r.fsf@arm.com>
In-Reply-To: <8D207B8D88895F0E+20230426121547140175172@rivai.ai> (juzhe's message of "Wed, 26 Apr 2023 12:15:47 +0800")

"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> Thanks so much, Richard.
>
>>> I don't think that's guaranteed by the proposed definition of WHILE_LEN.
>>> The first int64_t WHILE_LEN could come up short, and return something
>>> less than VF/2.
>
> I am sorry: the comments on vect_set_loop_controls_by_while_len
> were misleading and incorrect, and I have sent a V3 patch to fix them.
> Actually, I don't use WHILE_LEN in the multi-rgroup situation; instead,
> I use MIN_EXPR to force VF elements for each non-final iteration, to
> make sure the result is correct.
>
> Yes, I agree with you that WHILE_LEN will produce issues for the SLP
> situation with multiple rgroups, since the WHILE_LEN definition allows
> the target to produce a non-VF outcome in a non-final iteration.

Yeah, I'd read that you weren't using WHILE_LEN for SLP.  I was talking
specifically about non-SLP though (nitems_per_iter == 1).  Consider:

void f(short *x, long *y) {
  for (int i = 0; i < 100; ++i)
    x[i] = y[i];
}

compiled at -O3 -fno-vect-cost-model for SVE:

        whilelo p4.d, wzr, w6
        whilelo p3.d, wzr, w5
        whilelo p2.h, wzr, w3
        whilelo p1.d, wzr, w3
        whilelo p0.d, wzr, w4
.L2:
        ld1d    z2.d, p0/z, [x1, #1, mul vl]
        ld1d    z0.d, p1/z, [x1]
        ld1d    z1.d, p3/z, [x1, #2, mul vl]
        uzp1    z0.s, z0.s, z2.s
        ld1d    z2.d, p4/z, [x1, #3, mul vl]
        uzp1    z1.s, z1.s, z2.s
        uzp1    z0.h, z0.h, z1.h
        st1h    z0.h, p2, [x0, x2, lsl 1]
        add     x2, x2, x8
        whilelo p2.h, w2, w3
        whilelo p4.d, w2, w6
        whilelo p3.d, w2, w5
        whilelo p0.d, w2, w4
        add     x1, x1, x7
        whilelo p1.d, w2, w3
        b.any   .L2

This is a non-SLP loop.  We have two rgroups: a single-mask/control
rgroup for the short vector, and a 4-mask/control rgroup for the long
vector.  And the loop converts the Nth long scalar (selected from 4
concatenated vectors) to the Nth short scalar (in a single vector).
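
(For reference, reading off the loads above which slice each control
covers, and writing VF for the number of long elements per vector,
the layout is roughly:

    p1: scalar iterations    0 ..   VF-1   (ld1d [x1])
    p0: scalar iterations   VF .. 2*VF-1   (ld1d [x1, #1, mul vl])
    p3: scalar iterations 2*VF .. 3*VF-1   (ld1d [x1, #2, mul vl])
    p4: scalar iterations 3*VF .. 4*VF-1   (ld1d [x1, #3, mul vl])
    p2: scalar iterations    0 .. 4*VF-1   (st1h)

though the exact register assignment is incidental.)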

It's therefore important that the 4-mask/control rgroup and the
single-mask/control rgroup treat the same lanes/scalar iterations
as active and the same lanes/scalar iterations as inactive.

But if I read the code correctly, the patch would generate 5 WHILE_LENs
for this case, since nitems_per_iter==1 for all 5 controls.  And I don't
think the documentation of WHILE_LEN guarantees that that will work
correctly, given that WHILE_LEN isn't a simple MIN operation.
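
To make that concrete, with made-up numbers: say each long vector
holds 4 elements (so the short vector holds 16) and 10 scalar
iterations remain.  A simple MIN would give the short control length
10 and the four long controls lengths 4, 4, 2 and 0.  But since
WHILE_LEN is allowed to come up short of the cap, nothing guarantees
that the short control's WHILE_LEN returns 10 rather than, say, 8,
while the long controls' results still cover iterations 0..9; at that
point the store and the loads disagree about which scalar iterations
are active.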

It might be that it works correctly on RVV, given the later
backend-specific processing.  But I'm looking at this as a purely
gimple thing.  If something guarantees that the above works then
I think the WHILE_LEN documentation needs to be updated.

From the current documentation of WHILE_LEN, I think the safe
approach would be to use WHILE_LEN for a single-control rgroup
and then "expand" that to larger control rgroups using arithmetic.
Specifically, if the length computed by the single-control rgroup
is X, then control I in an N-control rgroup would need to be:

   (X - VF*I/N) capped to the range [0, VF/N]
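
In C-like pseudocode (a sketch only; control_len, x, vf, i and n are
illustrative names, not anything the vectorizer actually uses):

   /* Length of control I in an N-control rgroup, given the length X
      computed by the single-control rgroup and vectorization factor
      VF: (X - VF*I/N) capped to the range [0, VF/N].  */
   static unsigned int
   control_len (unsigned int x, unsigned int vf,
                unsigned int i, unsigned int n)
   {
     unsigned int start = vf * i / n;  /* covered by controls 0..I-1 */
     unsigned int cap = vf / n;        /* elements per control */
     if (x <= start)
       return 0;
     return x - start < cap ? x - start : cap;
   }

Since every control is then derived from the same X, the controls in
different rgroups agree on the set of active scalar iterations by
construction.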

SVE does something similar for:

void f(short *x, int *y) {
  for (int i = 0; i < 100; ++i)
    x[i] = y[i];
}

Here we use a single WHILELO and then unpack it, rather than using
three independent WHILELOs.  Since the two .s predicates are just the
low and high halves of the .h predicate, all three controls agree by
construction about which scalar iterations are active:

        whilelo p0.h, wzr, w3
.L2:
        punpklo p2.h, p0.b
        punpkhi p1.h, p0.b
        ld1w    z0.s, p2/z, [x1, x2, lsl 2]
        ld1w    z1.s, p1/z, [x5, x2, lsl 2]
        uzp1    z0.h, z0.h, z1.h
        st1h    z0.h, p0, [x0, x2, lsl 1]
        add     x2, x2, x4
        whilelo p0.h, w2, w3
        b.any   .L2

This is dreadful code (hence the -fno-vect-cost-model).  And I'm sure
it's equally bad for RVV.  But the point is that we can generate it,
and in more exotic cases it might even be worthwhile.

Thanks,
Richard

