public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
From: "juzhe.zhong at rivai dot ai" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/112092] RISC-V: Wrong RVV code produced for vsetvl-11.c and vsetvlmax-8.c
Date: Fri, 27 Oct 2023 00:57:27 +0000	[thread overview]
Message-ID: <bug-112092-4-cTTM1I4IQy@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-112092-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092

--- Comment #8 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Maciej W. Rozycki from comment #7)
> Thank you for all your explanations.  I think I'm still missing something
> here, so I'll write it differently (and let's ignore the tail-agnostic vs
> tail-undisturbed choice for the purpose of this consideration).
> 
> Let me paste the whole assembly code produced here (sans decorations):
> 
> 	beq	a5,zero,.L2
> 	vsetvli	zero,a6,e32,m1,tu,ma
> .L3:
> 	beq	a4,zero,.L7
> 	li	a5,0
> .L5:
> 	vle32.v	v1,0(a0)
> 	vle32.v	v1,0(a1)
> 	vle32.v	v1,0(a2)
> 	vse32.v	v1,0(a3)
> 	addi	a5,a5,1
> 	bne	a4,a5,.L5
> .L7:
> 	ret
> .L2:
> 	vsetvli	zero,a6,e32,m1,tu,ma
> 	j	.L3
> 
> This seems to me to correspond to this source code:
> 
>   if (cond)
>     __riscv_vsetvl_e32m1(avl);
>   else
>     __riscv_vsetvl_e16mf2(avl);
>   for (size_t i = 0; i < n; i += 1) {
>     vint32m1_t a = __riscv_vle32_v_i32m1(in1, avl);
>     vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, avl);
>     vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, avl);
>     __riscv_vse32_v_i32m1(out, c, avl);
>   }
> 
> And in that case I'd expect the conditional to be optimised away, as its
> result is ignored (along with the intrinsics) and does not affect actual
> code executed except for the different execution path, i.e.:
> 
> 	beq	a4,zero,.L7
> 	vsetvli	zero,a6,e32,m1,tu,ma
> 	li	a5,0
> .L5:
> 	vle32.v	v1,0(a0)
> 	vle32.v	v1,0(a1)
> 	vle32.v	v1,0(a2)
> 	vse32.v	v1,0(a3)
> 	addi	a5,a5,1
> 	bne	a4,a5,.L5
> .L7:
> 	ret
> 

Good catch ! I think we have a missed-optimization here and I agree this code
is correct and optimal codegen for this case.
We have a close-to-optimal (not optimal enough) codegen for now.

And this optimization should not be done by VSETVL PASS.

After VSETVL PASS fusion, both e16mf2 and e32m1 user vsetvl instrinsic are
fused into e32m1, tu. They are totally the same so it's meaningless seperate
them into different blocks (They should be the same single block).

The reason why we missed an optimization here is because we expand user
vsetvl __riscv_vsetvl_e32m1 and __riscv_vsetvl_e16mf2 into 2 different
RTL expressions. The before PASSes (before VSETVL) don't known they are
equivalent, so separate them into different blocks.

If you change codes as follows:
  if (cond)
    vl = __riscv_vsetvl_e32m1(avl);
  else
    vl = __riscv_vsetvl_e32m1(avl);

I am sure the codegen will be as you said above. (A single vsetvl e32m1 tu in
a single block).

To optimize it, a alternative approach is that we expand all user vsetvl
instrinscs into same RTL expression (as long as they are having same ratio).


Meaning, expand 

__riscv_vsetvl_e64m1
__riscv_vsetvl_e32m1
__riscv_vsetvl_e16mf2
__riscv_vsetvl_e8mf8

into same RTL expression since their VL outputs are definitely the same.

I don't see it will cause any problems here.

But different ratio like 32m1 and e32mf2 should be different RLT expression.

I am not sure kito agree with this idea.


Another alternative approach is that we enhance bb_reorder PASS.
The VSETVL PASS is run before bb_reorder PASS and current bb_reorder PASS
is unable to fuse these 2 vsetvls e32m1 Tu into same block because we split
it into "real" vsetvls which is the RTL pattern has side effects.

The "real" vsetvl patterns which generate assembly should have side effects
since vsetvl does change global VL/VTYPE status and also set a general
register.

No matter which approach to optimize it, I won't do it in GCC-14 since stage 1
is soon to close.  We have a few more features (which are much more imporant)
that we are planning and working to support in GCC-14.
I have confidence that our RVV GCC current VSETVL PASS is really optimal and
fancy enough.

After stage 1 close, we won't do any optimizations, we will only run full
coverage testing (for example, using different LMUL different -march to run the
whole gcc testsuite) and fix bugs.

  parent reply	other threads:[~2023-10-27  0:57 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-26  1:00 [Bug target/112092] New: " macro at orcam dot me.uk
2023-10-26  1:46 ` [Bug target/112092] " juzhe.zhong at rivai dot ai
2023-10-26  1:57 ` juzhe.zhong at rivai dot ai
2023-10-26  4:01 ` macro at orcam dot me.uk
2023-10-26  6:38 ` kito at gcc dot gnu.org
2023-10-26  6:51 ` juzhe.zhong at rivai dot ai
2023-10-26  7:08 ` juzhe.zhong at rivai dot ai
2023-10-26 23:31 ` macro at orcam dot me.uk
2023-10-27  0:57 ` juzhe.zhong at rivai dot ai [this message]
2023-10-27  1:03 ` juzhe.zhong at rivai dot ai
2023-10-31 13:58 ` [Bug target/112092] RISC-V: Suboptimal " macro at orcam dot me.uk
2023-11-08  6:38 ` cvs-commit at gcc dot gnu.org
2023-11-08  6:39 ` juzhe.zhong at rivai dot ai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-112092-4-cTTM1I4IQy@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).