From: "kito at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/112092] RISC-V: Wrong RVV code produced for vsetvl-11.c and vsetvlmax-8.c
Date: Thu, 26 Oct 2023 06:38:54 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092

Kito Cheng changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kito at gcc dot gnu.org

--- Comment #4 from Kito Cheng ---
The testcase itself looks tricky but is correct. This pattern is typically
used to optimize mixed-width (mixed-SEW) operations. You can refer to the EEW
material in the v-spec[1]: most loads and stores encode a static EEW in the
instruction, so this kind of vsetvli fusion optimization can be applied to
them.
[1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#52-vector-operands

Here is a (more) practical example:

```c
#include "riscv_vector.h"

void foo(int32_t *in1, int16_t *in2, int16_t *in3, int32_t *out, size_t n,
         int cond, int avl)
{
    /* Request vl for SEW=16, LMUL=1/2. */
    size_t vl = __riscv_vsetvl_e16mf2(avl);
    /* EEW=32 load: EMUL = (32/16) * 1/2 = 1. */
    vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
    vint16mf2_t b = __riscv_vle16_v_i16mf2(in2, vl);
    vint16mf2_t c = __riscv_vle16_v_i16mf2(in3, vl);
    /* Widening multiply-accumulate: 16-bit sources, 32-bit accumulator. */
    vint32m1_t x = __riscv_vwmacc_vv_i32m1(a, b, c, vl);
    __riscv_vse32_v_i32m1(out, x, vl);
}
```

> Is it guaranteed by the RVV specification that the value of `vl' produced
> (which is then supplied as an argument to `__riscv_vle32_v_i32m1', etc.;
> I presume implicitly via the VL CSR as I can't see it in actual assembly
> produced) is going to be the same for all microarchitectures for both:
>
>         vsetvli zero,a6,e32,m1,tu,ma
>
> and:
>
>         vsetvli zero,a6,e16,mf2,ta,ma

This is another trick in this case: tail agnostic vs. tail undisturbed.
Tail undisturbed has stronger semantics than tail agnostic, so using tail
undisturbed where agnostic was requested is always safe and satisfies the
semantics. The same applies to mask agnostic vs. mask undisturbed.

Performance is another story. As far as I know, some uArchs implement
agnostic as undisturbed, which means there is not much difference between the
two, so fusing those two vsetvli instructions becomes a kind of optimization.

However, as you can imagine, that also means some uArchs implement agnostic
in another way: agnostic MAY perform better than undisturbed, and we should
not fuse those vsetvli instructions IF we are targeting such a uArch. Anyway,
our cost model for RVV is still in an initial state, so personally I am fine
with that for now, but I guess we need to add some more stuff to -mtune to
handle those differences.