From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 5C8433858C41; Thu, 26 Oct 2023 06:38:55 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 5C8433858C41
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1698302335;
	bh=7i3im0kEXovzdBGGYAJVe9yIIvWEga2BriZ3WmkRy+c=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=bOFdYuxBJ6Yahonf6x2SnMcWvIw03/a2igq6Bh6mcB+bsluzFKAWl9oYfkaDEo487
	 JmWr5rcpPK3pIjCJM+UeopOimVQLqMHIQREE7ebEMzgmfQb59no2YxlaomzmNfD/5K
	 qhwq7TQUcH4Hoctg9/1e2+gCll9akdrTH9Im1mYo=
From: "kito at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/112092] RISC-V: Wrong RVV code produced for vsetvl-11.c
 and vsetvlmax-8.c
Date: Thu, 26 Oct 2023 06:38:54 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: kito at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: cc
Message-ID: <bug-112092-4-koqmq0oNx4@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-112092-4@http.gcc.gnu.org/bugzilla/>
References: <bug-112092-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092

Kito Cheng <kito at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kito at gcc dot gnu.org
--- Comment #4 from Kito Cheng <kito at gcc dot gnu.org> ---
The testcase itself looks tricky, but it is correct; this pattern is typically
used to optimize mixed-width (mixed-SEW) operations.

See the EEW discussion in the V spec [1]: most loads and stores encode a static
EEW, so this kind of vsetvli fusion optimization can be applied to them.

[1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#52-vector-operands

Here is a more practical example:

```c
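#include <riscv_vector.h>

/* e16,mf2 and e32,m1 have the same SEW/LMUL ratio (32), so they share
   VLMAX and a single vl is valid for every operation below. */
void foo(int32_t *in1, int16_t *in2, int16_t *in3, int32_t *out, size_t n,
         int cond, int avl) {
    size_t vl = __riscv_vsetvl_e16mf2(avl);
    vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);   /* 32-bit accumulator */
    vint16mf2_t b = __riscv_vle16_v_i16mf2(in2, vl); /* 16-bit inputs */
    vint16mf2_t c = __riscv_vle16_v_i16mf2(in3, vl);
    /* Widening multiply-accumulate: x[i] = a[i] + b[i] * c[i] in 32 bits. */
    vint32m1_t x = __riscv_vwmacc_vv_i32m1(a, b, c, vl);
    __riscv_vse32_v_i32m1(out, x, vl);
}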
```

> Is is guaranteed by the RVV specification that the value of `vl' produced
> (which is then supplied as an argument to `__riscv_vle32_v_i32m1', etc.;
> I presume implicitly via the VL CSR as I can't see it in actual assembly
> produced) is going to be the same for all microarchitectures for both:
>
>	vsetvli	zero,a6,e32,m1,tu,ma
>
> and:
>
>	vsetvli	zero,a6,e16,mf2,ta,ma

The other trick in this case is tail agnostic vs. tail undisturbed.

Tail undisturbed has stronger semantics than tail agnostic, so using tail
undisturbed where tail agnostic was requested is always safe and still
satisfies the semantics; the same holds for mask agnostic vs. mask undisturbed.

Performance is another story, though. As far as I know, some microarchitectures
implement agnostic as undisturbed, so there is little difference between the
two policies there, and fusing those two vsetvli instructions becomes an
optimization.

However, as you can imagine, other microarchitectures may implement agnostic
differently, in which case agnostic MAY perform better than undisturbed, and we
should not fuse those vsetvli instructions IF we are targeting such a
microarchitecture. Anyway, our cost model for RVV is still at an early stage, so
personally I am fine with this for now, but I think we need to add more to
-mtune to handle those differences.