From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
 id B669D385AC3B; Wed, 29 Nov 2023 09:37:18 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B669D385AC3B
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
 s=default; t=1701250638;
 bh=w8cEhmDkEzqmsmpmUVJT40ickJzsbrnsLU4J8rssHkg=;
 h=From:To:Subject:Date:In-Reply-To:References:From;
 b=ZpxdNBoHowBRajmH6N8QlX6lo4UFVziow9Pd8gEj4bYuj+/gqpM0BChVoC0vqEurJ
 z65i5+0OBUyGlxMemgbyYaDthlDh9nbFwCGC44+UddC7CtP7/iNCrPty9NIRYWxQmE
 DiFs0VWhWvoGLHdUVxlq9+RuG0ifeBU/koDodfn0=
From: "cvs-commit at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
Date: Wed, 29 Nov 2023 09:37:13 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #4 from GCC Commits ---
The master branch has been updated by Pan Li:

https://gcc.gnu.org/g:bdad036da32f72b84a96070518e7d75c21706dc2

commit r14-5960-gbdad036da32f72b84a96070518e7d75c21706dc2
Author: Juzhe-Zhong
Date: Wed Nov 29 16:34:10 2023 +0800

RISC-V: Support highpart register overlap for vwcvt

Since Richard recently added support for register filters, we are now able
to support highpart register overlap for widening RVV
instructions. This patch supports it for the vwcvt intrinsics.

I leverage real application user code for vwcvt:

https://github.com/riscv/riscv-v-spec/issues/929
https://godbolt.org/z/xoeGnzd8q

This is real application code that uses LMUL = 8 with unrolling to gain
optimal performance for a specific library.

You can see in the codegen that GCC already generates optimal code for it,
since we support register lowpart overlap for narrowing instructions
(dest EEW < source EEW).

Starting with this patch, we support highpart register overlap for widening
instructions (dest EEW > source EEW).

Leverage the intrinsic code above, but for vwcvt:

https://godbolt.org/z/1TMPE5Wfr

size_t
foo (char const *buf, size_t len)
{
  size_t sum = 0;
  size_t vl = __riscv_vsetvlmax_e8m8 ();
  size_t step = vl * 4;
  const char *it = buf, *end = buf + len;
  for (; it + step <= end;)
    {
      vint8m4_t v0 = __riscv_vle8_v_i8m4 ((void *) it, vl);
      it += vl;
      vint8m4_t v1 = __riscv_vle8_v_i8m4 ((void *) it, vl);
      it += vl;
      vint8m4_t v2 = __riscv_vle8_v_i8m4 ((void *) it, vl);
      it += vl;
      vint8m4_t v3 = __riscv_vle8_v_i8m4 ((void *) it, vl);
      it += vl;

      asm volatile("nop" ::: "memory");
      vint16m8_t vw0 = __riscv_vwcvt_x_x_v_i16m8 (v0, vl);
      vint16m8_t vw1 = __riscv_vwcvt_x_x_v_i16m8 (v1, vl);
      vint16m8_t vw2 = __riscv_vwcvt_x_x_v_i16m8 (v2, vl);
      vint16m8_t vw3 = __riscv_vwcvt_x_x_v_i16m8 (v3, vl);

      asm volatile("nop" ::: "memory");
      size_t sum0 = __riscv_vmv_x_s_i16m8_i16 (vw0);
      size_t sum1 = __riscv_vmv_x_s_i16m8_i16 (vw1);
      size_t sum2 = __riscv_vmv_x_s_i16m8_i16 (vw2);
      size_t sum3 = __riscv_vmv_x_s_i16m8_i16 (vw3);

      sum += sumation (sum0, sum1, sum2, sum3);
    }
  return sum;
}

Before this patch:

...
csrr t0,vlenb
...
vwcvt.x.x.v v16,v8
vwcvt.x.x.v v8,v28
vs8r.v v16,0(sp) ---> spill
vwcvt.x.x.v v16,v24
vwcvt.x.x.v v24,v4
nop
vsetvli zero,zero,e16,m8,ta,ma
vmv.x.s a2,v16
vl8re16.v v16,0(sp) ---> reload
...
csrr t0,vlenb
...

You can see heavy spilling and reloading inside the loop body.

After this patch:

...
vwcvt.x.x.v v8,v12
vwcvt.x.x.v v16,v20
vwcvt.x.x.v v24,v28
vwcvt.x.x.v v0,v4
...

The codegen is optimal after this patch: no spill or reload is emitted.

Tested on zvl128b with no regressions.

I am going to test zve64d/zvl256b/zvl512b/zvl1024b.

OK for trunk if there are no regressions in the testing above?

Co-authored-by: kito-cheng
Co-authored-by: kito-cheng

PR target/112431

gcc/ChangeLog:

* config/riscv/constraints.md (TARGET_VECTOR ? V_REGS : NO_REGS):
  New register filters.
* config/riscv/riscv.md (no,W21,W42,W84,W41,W81,W82): Ditto.
  (no,yes): Ditto.
* config/riscv/vector.md: Support highpart register overlap for vwcvt.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-1.c: New test.
* gcc.target/riscv/rvv/base/pr112431-2.c: New test.
* gcc.target/riscv/rvv/base/pr112431-3.c: New test.
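
For readers unfamiliar with the overlap rules the commit message relies on, here is a minimal sketch of the two cases it names: lowpart overlap for narrowing (dest EEW < source EEW) and highpart overlap for widening (dest EEW > source EEW). It is not GCC code and does not reflect how the register filters are implemented. The function names and the modelling of a register group as a base register number plus a whole-number EMUL are assumptions made for illustration. Fractional EMUL is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of GCC.  A register group is the
   half-open range [base, base + emul) of vector register numbers.  */
bool
groups_overlap (unsigned a_base, unsigned a_emul,
                unsigned b_base, unsigned b_emul)
{
  return a_base < b_base + b_emul && b_base < a_base + a_emul;
}

/* Widening (dest EEW > source EEW): an overlap is acceptable only when
   the source group is the highest-numbered part of the destination
   group.  Assumes whole-number EMUL with src_emul >= 1.  */
bool
widen_overlap_ok (unsigned dest_base, unsigned dest_emul,
                  unsigned src_base, unsigned src_emul)
{
  assert (src_emul >= 1 && dest_emul > src_emul);
  if (!groups_overlap (dest_base, dest_emul, src_base, src_emul))
    return true;
  return src_base >= dest_base
         && src_base + src_emul == dest_base + dest_emul;
}

/* Narrowing (dest EEW < source EEW): an overlap is acceptable only when
   the destination group is the lowest-numbered part of the source
   group.  Assumes whole-number EMUL with dest_emul >= 1.  */
bool
narrow_overlap_ok (unsigned dest_base, unsigned dest_emul,
                   unsigned src_base, unsigned src_emul)
{
  assert (dest_emul >= 1 && src_emul > dest_emul);
  if (!groups_overlap (dest_base, dest_emul, src_base, src_emul))
    return true;
  return dest_base == src_base;
}
```

Under this model, each instruction in the "After this patch" listing passes: for `vwcvt.x.x.v v8,v12`, `widen_overlap_ok (8, 8, 12, 4)` holds because v12-v15 is the top half of v8-v15. A source placed in the low half of the destination, such as `widen_overlap_ok (8, 8, 8, 4)`, is rejected.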