From mboxrd@z Thu Jan 1 00:00:00 1970
From: "juzhe.zhong at rivai dot ai"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/112776] New: RISC-V Regression: Missed optimization of VSETVL PASS
Date: Thu, 30 Nov 2023 10:08:57 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112776

            Bug ID: 112776
           Summary: RISC-V Regression: Missed optimization of VSETVL PASS
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

#include "riscv_vector.h"

void foo_vec(float *r, const float *x)
{
    int i, k;

    vfloat32m4_t x_vec;
    vfloat32m4_t x_forward_vec;
    vfloat32m4_t temp_vec;
    /**
     * I have to use m1 to complicate the intrinsic
     */
    vfloat32m1_t dst_vec;
    vfloat32m1_t src_vec;

    float result = 0.0f;
    float shift_prev = 0.0f;

    size_t n = 64;
    for(size_t vl; n>0; n -=vl){
        vl = __riscv_vsetvl_e32m4(n); //LMUL=4
        x_vec = __riscv_vle32_v_f32m4(&x[0], vl);
        x_forward_vec = __riscv_vle32_v_f32m4(&x[0], vl);
        temp_vec = __riscv_vfmul_vv_f32m4(x_vec, x_forward_vec, vl);
        /**
         * I have to use m1 to complicate the intrinsic
         */
        //vfloat32m1_t __riscv_vfmv_s_tu(vfloat32m1_t vd, float rs1, size_t vl);
        src_vec = __riscv_vfmv_s_tu(src_vec, 0.0f, vl); //initial src_vec
        //dst_vec = __riscv_vfmv_s_f_f32m1_tu(dst_vec, 0.0f, vl); //clean for vfredosum
        dst_vec = __riscv_vfmv_s_tu(dst_vec, 0.0f, vl); //clean for vfredosum
        dst_vec = __riscv_vfredosum_tu(dst_vec, temp_vec, src_vec, vl);
        r[0] = __riscv_vfmv_f_s_f32m1_f32(dst_vec);
    }
}

ASM:

GCC-14
foo_vec:
        li      a4,64
.L2:
        vsetvli a5,a4,e8,m1,ta,ma       --->
        vsetvli zero,a5,e32,m1,tu,ma
        vmv.s.x v2,zero
        vmv.s.x v1,zero
        vsetvli zero,a5,e32,m4,tu,ma
        vle32.v v4,0(a1)
        vfmul.vv        v4,v4,v4
        vfredosum.vs    v1,v4,v2
        vfmv.f.s        fa5,v1
        fsw     fa5,0(a0)
        sub     a4,a4,a5
        bne     a4,zero,.L2
        ret

GCC-13:
foo_vec(float*, float const*):
        fmv.s.x fa5,zero
        li      a4,64
.L2:
        vsetvli a5,a4,e32,m4,ta,ma
        vle32.v v28,0(a1)
        vfmv.s.f        v25,fa5
        vfmul.vv        v28,v28,v28
        vfmv.s.f        v24,fa5
        sub     a4,a4,a5
        vfredosum.vs    v24,v28,v25
        vfmv.f.s        fa4,v24
        fsw     fa4,0(a0)
        bne     a4,zero,.L2
        ret