From: "juzhe.zhong at rivai dot ai"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/112431] New: RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
Date: Wed, 08 Nov 2023 00:15:18 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

            Bug ID: 112431
           Summary: RISC-V GCC-15 feature: Support register overlap on widen
                    RVV instructions
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

According to RVV ISA:

"The destination EEW is smaller than the source EEW and the overlap is in the
lowest-numbered part of the source register group (e.g., when LMUL=1,
vnsrl.wi v0, v0, 3 is legal, but a destination of v1 is not)."

It's nice that we already support register overlap for narrowing operations.

Consider the following example:

#include "riscv_vector.h"
void f20 (int16_t *base,int8_t *out,size_t vl, size_t shift)
{
    vuint16m2_t src = __riscv_vle16_v_u16m2 (base, vl);
    /* Only allow load v30,v31.  */
    asm volatile("#" ::
                 : "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9",
                   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17",
                   "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25",
                   "v26", "v27", "v28", "v29");

    vuint8m1_t v = __riscv_vnclipu_wx_u8m1(src,shift,0,vl);
    /* Only allow vnclipu with SRC == DEST v30.  */
    asm volatile("#" ::
                 : "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9",
                   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17",
                   "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25",
                   "v26", "v27", "v28", "v29", "v31");

    __riscv_vse8_v_u8m1 (out,v,vl);
}

https://gcc.godbolt.org/z/j98xejKh5

GCC generates no register spills here, whereas LLVM does.

However, we fail to support register overlap for RVV widening operations,
even though the RVV ISA permits it:

"The destination EEW is greater than the source EEW, the source EMUL is at least
1, and the overlap is in the highest-numbered part of the destination register
group (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or
v4 is not)."

Consider the following case:

#include "riscv_vector.h"
void f20 (void *base,void *out,size_t vl, size_t shift)
{
    vuint16m1_t src = __riscv_vle16_v_u16m1 (base, vl);
    /* Only allow load v31.  */
    asm volatile("#" ::
                 : "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9",
                   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17",
                   "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25",
                   "v26", "v27", "v28", "v29", "v30");

    vuint32m2_t v = __riscv_vwaddu_vv_u32m2(src,src,vl);
    /* Only allow vwaddu DEST v30,v31, overlapping SRC v31.  */
    asm volatile("#" ::
                 : "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9",
                   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17",
                   "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25",
                   "v26", "v27", "v28", "v29");

    __riscv_vse32_v_u32m2 (out,v,vl);
}

https://gcc.godbolt.org/z/h3cM9vhnY

Because we mark the destination of RVV widening instructions as early clobber,
the same as LLVM does, both LLVM and GCC fail to overlap the registers.

GCC ASM:

f20:
        vsetvli zero,a2,e16,m1,ta,ma
        vle16.v v31,0(a0)
        vwaddu.vv       v2,v31,v31
        vmv2r.v v30,v2   ----> Redundant move instruction.
        vse32.v v30,0(a1)
        ret

We should be able to generate:

        vwaddu.vv       v30,v31,v31

which would eliminate the redundant move instruction.

This issue will be fixed in GCC-15, since we don't have enough time in GCC-14.
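
For reference, the two overlap rules quoted above can be written down as a small
standalone check. This is only a sketch for illustration and is not GCC code:
the names vreg_group, groups_overlap and overlap_is_legal are hypothetical, only
integer EMUL values (1, 2, 4, 8) are modelled, and the "same EEW" case comes from
the same section of the RVV spec rather than from the text quoted in this report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a vector register group: lowest-numbered register
   plus EMUL.  Fractional EMUL is not modelled in this sketch.  */
struct vreg_group { int base; int emul; };

/* True if the two register groups share at least one register.  */
static bool groups_overlap(struct vreg_group a, struct vreg_group b)
{
    assert(a.emul >= 1 && b.emul >= 1);
    return a.base < b.base + b.emul && b.base < a.base + a.emul;
}

/* True if DEST may be allocated next to SRC under the quoted rules.  */
static bool overlap_is_legal(struct vreg_group dest, int dest_eew,
                             struct vreg_group src, int src_eew)
{
    if (!groups_overlap(dest, src))
        return true;               /* Disjoint groups are always fine.  */
    if (dest_eew == src_eew)
        return true;               /* Same EEW: overlap is allowed.  */
    if (dest_eew < src_eew)
        /* Narrowing: overlap must be in the lowest-numbered part of SRC.  */
        return dest.base == src.base;
    /* Widening: source EMUL at least 1 and overlap in the highest-numbered
       part of DEST.  */
    return src.emul >= 1 && src.base + src.emul == dest.base + dest.emul;
}
```

With this model, the allocation requested in this report (destination group
v30-v31 with EEW 32, source v31 with EEW 16) is accepted, and the vzext.vf4
example from the ISA text is accepted for a source of v6 and rejected for v4.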