From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id B7A233858418; Wed, 1 Nov 2023 06:17:09 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B7A233858418
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1698819429;
	bh=3KeQ8y5N79d1QpF7mTfvcL74pnMOpHXkp+Df3tZQZUw=;
	h=From:To:Subject:Date:From;
	b=Tc5mIC6edkvqP0AKRO2Po4fpC1+Tsqmg5ea+HAgzvKcL/jHb1Xy0H68a2Yr1fWtnv
	 fYzmxBcJ89QBvSGNb8oHaamT2+Hu1nKCn/eT6aPs4rCe4f5tNHmQpmR27UGo2RbaTg
	 SE/K6Q7XP2pAfYJTudK8/6VXuiJUihouzI+Vgpl8=
From: "juzhe.zhong at rivai dot ai"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/112327] New: RVV: Redundant vmv1r for widen reduction
Date: Wed, 01 Nov 2023 06:17:09 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: juzhe.zhong at rivai dot ai
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter target_milestone
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112327

            Bug ID: 112327
           Summary: RVV: Redundant vmv1r for widen reduction
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

#include "riscv_vector.h"

void rvv_dot_prod(int16_t *pSrcA, int16_t *pSrcB, uint32_t n, int64_t *result)
{
  size_t vl;
  vint16m4_t vSrcA, vSrcB;
  vint64m1_t vSum = __riscv_vmv_s_x_i64m1(0, 1);
  while (n > 0) {
    vl = __riscv_vsetvl_e16m4(n);
    vSrcA = __riscv_vle16_v_i16m4(pSrcA, vl);
    vSrcB = __riscv_vle16_v_i16m4(pSrcB, vl);
    vSum = __riscv_vwredsum_vs_i32m8_i64m1(__riscv_vwmul_vv_i32m8(vSrcA, vSrcB, vl), vSum, vl);
    pSrcA += vl;
    pSrcB += vl;
    n -= vl;
  }
  *result = __riscv_vmv_x_s_i64m1_i64(vSum);
}

https://godbolt.org/z/sb8G7ExKP

GCC:

        ...
        vmv1r.v v2,v1
        ...
        vwredsum.vs     v1,v8,v2

Clang:

        vwredsum.vs     v8, v24, v8

The root cause is that, for vwredsum.vs vd,vs2,vs1, we don't allow vs1 to
overlap vd.
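
For anyone checking a fix, a scalar reference of what the testcase above computes may be handy. This is only a sketch: scalar_dot_prod is a hypothetical name that does not appear in the testcase, and it assumes the intended semantics are a 16x16->32-bit widening multiply (vwmul.vv) followed by a 32->64-bit widening sum (vwredsum.vs).

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for the rvv_dot_prod testcase.
   scalar_dot_prod is a hypothetical helper name, not part of the report. */
void scalar_dot_prod(const int16_t *pSrcA, const int16_t *pSrcB,
                     uint32_t n, int64_t *result)
{
  int64_t sum = 0; /* plays the role of element 0 of vSum */

  for (uint32_t i = 0; i < n; i++) {
    /* Widening multiply: the product of two int16_t values always fits
       in int32_t (the largest magnitude is 32768 * 32768 = 2^30). */
    int32_t prod = (int32_t)pSrcA[i] * (int32_t)pSrcB[i];

    /* Widening reduction: each 32-bit product is sign-extended to 64 bits
       before being added to the accumulator. */
    sum += (int64_t)prod;
  }

  *result = sum;
}
```

The output of rvv_dot_prod should match this for any input, whether or not the extra vmv1r.v is emitted; the redundant move only affects code quality, not the result.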