From: "juzhe.zhong at rivai dot ai"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/113162] RISC-V: Unexpected register spillings in vectorized codes and intrinsic codes that have subregs.
Date: Thu, 28 Dec 2023 09:32:27 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c
X-Bugzilla-Version: 14.0
X-Bugzilla-Severity: normal
X-Bugzilla-Who: juzhe.zhong at rivai dot ai
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113162

--- Comment #1 from JuzheZhong ---
The following reduced intrinsic code reproduces the subreg issue:

https://godbolt.org/z/KfvT7hjnz

#include "riscv_vector.h"

void
foo (void *in, void *out, int x)
{
  vint32m2_t dup = __riscv_vmv_v_x_i32m2 (x, 8);
  vint32m2x4_t tuple1 = __riscv_vlseg4e32_v_i32m2x4 (in, 8);
  vint32m2x4_t tuple2 = __riscv_vlseg4e32_v_i32m2x4 (in + 16, 8);
  vint32m2_t tmp1 = __riscv_vadd_vv_i32m2 (dup,
      __riscv_vget_v_i32m2x4_i32m2 (tuple2, 0), 8);
  vint32m2_t tmp2 = __riscv_vmul_vv_i32m2 (tmp1,
      __riscv_vget_v_i32m2x4_i32m2 (tuple1, 0), 8);
  tmp1 = __riscv_vmul_vv_i32m2 (tmp1,
      __riscv_vget_v_i32m2x4_i32m2 (tuple1, 1), 8);
  vint32m2_t tmp3 = __riscv_vadd_vv_i32m2 (dup,
      __riscv_vget_v_i32m2x4_i32m2 (tuple2, 2), 8);
  vint32m2_t tmp4 = __riscv_vmul_vv_i32m2 (tmp3,
      __riscv_vget_v_i32m2x4_i32m2 (tuple1, 2), 8);
  vint32m2_t tmp9 = __riscv_vmul_vv_i32m2 (tmp3,
      __riscv_vget_v_i32m2x4_i32m2 (tuple1, 3), 8);

  vint32m2_t tmp5 = __riscv_vnmsub_vv_i32m2 (tmp9,
      __riscv_vget_v_i32m2x4_i32m2 (tuple2, 1), tmp2, 8);
  vint32m2_t tmp6 = __riscv_vmacc_vv_i32m2 (tmp1,
      __riscv_vget_v_i32m2x4_i32m2 (tuple1, 0), tmp3, 8);
  vint32m2_t tmp7 = __riscv_vnmsac_vv_i32m2 (tmp4,
      __riscv_vget_v_i32m2x4_i32m2 (tuple2, 3), tmp4, 8);
  vint32m2_t tmp8 = __riscv_vmacc_vv_i32m2 (tmp3,
      __riscv_vget_v_i32m2x4_i32m2 (tuple2, 3),
      __riscv_vget_v_i32m2x4_i32m2 (tuple1, 2), 8);

  vint32m2x4_t create = __riscv_vcreate_v_i32m2x4 (tmp5, tmp6, tmp7, tmp8);
  __riscv_vsseg4e32_v_i32m2x4 (out, create, 8);
}

GCC:

foo:
        csrr    t0,vlenb
        slli    t1,t0,1
        addi    a5,a0,16
        vsetivli        zero,8,e32,m2,ta,ma
        sub     sp,sp,t1
        vlseg4e32.v     v8,(a0)
        vlseg4e32.v     v24,(a5)
        vmv.v.x v2,a2
        csrr    t0,vlenb
        vadd.vv v4,v2,v24
        vadd.vv v2,v2,v28
        vmul.vv v0,v4,v10
        vmul.vv v16,v2,v12
        vmul.vv v4,v4,v8
        vs2r.v  v16,0(sp)
        slli    t1,t0,1
        vmul.vv v6,v2,v14
        vmv.v.v v18,v0
        vmacc.vv        v18,v2,v8
        vnmsub.vv       v6,v26,v4
        vmv.v.v v22,v2
        vmacc.vv        v22,v30,v12
        vl2re32.v       v4,0(sp)
        vmv2r.v v16,v6
        vnmsub.vv       v4,v30,v4
        vmv2r.v v20,v4
        vsseg4e32.v     v16,(a1)
        add     sp,sp,t1
        jr      ra

Clang:

foo:                                    # @foo
        vsetivli        zero, 8, e32, m2, ta, ma
        addi    a3, a0, 16
        vlseg4e32.v     v8, (a3)
        vlseg4e32.v     v16, (a0)
        vmv.v.x v24, a2
        vadd.vv v8, v24, v8
        vmul.vv v26, v8, v16
        vmul.vv v4, v8, v18
        vadd.vv v8, v24, v12
        vmul.vv v6, v8, v20
        vmul.vv v2, v8, v22
        vnmsub.vv       v2, v10, v26
        vmacc.vv        v4, v8, v16
        vnmsub.vv       v6, v14, v6
        vmacc.vv        v8, v20, v14
        vsseg4e32.v     v2, (a1)
        ret

With lehua's patch:

foo:
        addi    a5,a0,16
        vsetivli        zero,8,e32,m2,ta,ma
        vlseg4e32.v     v8,(a0)
        vlseg4e32.v     v24,(a5)
        vmv.v.x v2,a2
        vadd.vv v4,v2,v24
        vadd.vv v2,v2,v28
        vmul.vv v10,v4,v10
        vmul.vv v14,v2,v14
        vmul.vv v4,v4,v8
        vmul.vv v6,v2,v12
        vmv.v.v v16,v14
        vnmsub.vv       v16,v26,v4
        vmv.v.v v18,v10
        vmacc.vv        v18,v2,v8
        vmv.v.v v20,v6
        vnmsub.vv       v20,v30,v6
        vmv.v.v v22,v2
        vmacc.vv        v22,v30,v12
        vsseg4e32.v     v16,(a1)
        ret

With the patch there are no spills left, but some redundant vmv.v.v
instructions remain, which should be a separate issue.
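
For comparing listings like the ones above without reading them line by line, a small script can tally the relevant mnemonics. The following is only a rough sketch: the helper name summarize, the sample GCC_EXCERPT (a fragment copied from the GCC listing in this comment), and the choice to count every whole-register store/load (vs<N>r.v, vl<N>re<W>.v) as spill traffic are assumptions made for illustration, not anything GCC itself defines.

```python
import re
from collections import Counter

# Assumption for this sketch: whole-register stores/loads are spill traffic,
# and vmv.v.v / vmv<N>r.v are register-to-register copies.
SPILL_RE = re.compile(r"^(vs\d+r\.v|vl\d+re\d+\.v)$")
COPY_RE = re.compile(r"^(vmv\.v\.v|vmv\d+r\.v)$")


def summarize(listing):
    """Return counts of spill/reload and vector copy instructions."""
    counts = Counter()
    for line in listing.splitlines():
        # Drop trailing comments such as "# @foo", then skip blanks and labels.
        line = line.split("#", 1)[0].strip()
        if not line or line.endswith(":"):
            continue
        mnemonic = line.split()[0]
        if SPILL_RE.match(mnemonic):
            counts["spill"] += 1
        elif COPY_RE.match(mnemonic):
            counts["copy"] += 1
    return {"spill": counts["spill"], "copy": counts["copy"]}


# Fragment of the GCC output quoted in this comment (hypothetical variable name).
GCC_EXCERPT = """
        vmul.vv v16,v2,v12
        vmul.vv v4,v4,v8
        vs2r.v  v16,0(sp)
        slli    t1,t0,1
        vmul.vv v6,v2,v14
        vmv.v.v v18,v0
        vmacc.vv        v18,v2,v8
        vnmsub.vv       v6,v26,v4
        vmv.v.v v22,v2
        vmacc.vv        v22,v30,v12
        vl2re32.v       v4,0(sp)
        vmv2r.v v16,v6
        vnmsub.vv       v4,v30,v4
        vmv2r.v v20,v4
"""

if __name__ == "__main__":
    print(summarize(GCC_EXCERPT))
```

Segment loads and stores such as vlseg4e32.v and vsseg4e32.v are deliberately not matched, since they are the test case's own memory accesses rather than spills.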