From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
    id B82F53858C83; Wed, 8 Mar 2023 22:32:22 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B82F53858C83
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
    s=default; t=1678314742;
    bh=Iy9eHg89Vp9UZ+rhG/GkxHlhFT94LVoPliDiiqO13Do=;
    h=From:To:Subject:Date:From;
    b=sppzYxmj2mmXwHGsAyKCL/Zmea6t5fB7Nt/tCl/GE6Si/t1o51rRnpGmYcdVg4QuU
     77y+GP4l2lzTyoBXu+ttlBnyfiD++oXBnL+iXTAhvZV6+8uX3klQpg1brYi2XdENiD
     g8y9mo58QSpergS/4xTZl/WupQGPpguzo8e9x0hQ=
From: "tnfchris at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/109072] New: [12/13 Regression] SLP costs for vec
    duplicate too high since g:4963079769c99c4073adfd799885410ad484cbbe
Date: Wed, 08 Mar 2023 22:32:22 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: tnfchris at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
    keywords bug_severity priority component assigned_to reporter cc
    target_milestone cf_gcctarget
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109072

            Bug ID: 109072
           Summary: [12/13 Regression] SLP costs for vec duplicate too high
                    since g:4963079769c99c4073adfd799885410ad484cbbe
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
                CC: rsandifo at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*

The following example

---
#include <arm_neon.h>

float32x4_t f (float32x4_t v, float res)
{
  float data[4];
  data[0] = res;
  data[1] = res;
  data[2] = res;
  data[3] = res;
  return vld1q_f32 (&data[0]);
}
---

compiled with -Ofast fails to SLP starting with GCC 12.

This used to generate:

f:
        dup     v0.4s, v1.s[0]
        ret

and now generates:

f:
        fmov    w5, s1
        fmov    w1, s1
        fmov    w4, s1
        fmov    w0, s1
        mov     x2, 0
        mov     x3, 0
        bfi     x2, x5, 0, 32
        bfi     x3, x1, 0, 32
        bfi     x2, x4, 32, 32
        bfi     x3, x0, 32, 32
        fmov    d0, x2
        ins     v0.d[1], x3
        ret

The SLP costs went from:

  Vector cost: 2
  Scalar cost: 4

to:

  Vector cost: 12
  Scalar cost: 4

it looks like it's no longer costing it as a duplicate but instead 4 vec
inserts.

bisected to:

commit g:4963079769c99c4073adfd799885410ad484cbbe
Author: Richard Sandiford
Date:   Tue Feb 15 18:09:33 2022 +0000

    vect+aarch64: Fix ldp_stp_* regressions

    ldp_stp_1.c, ldp_stp_4.c and ldp_stp_5.c have been failing since
    vectorisation was enabled at -O2.  In all three cases SLP is
    generating vector code when scalar code would be better.

    ....
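
For illustration only, the effect of the cost change reported above can be sketched as a simple comparison. This is a deliberately simplified model, assuming that SLP is applied only when the estimated vector cost is lower than the scalar cost; the struct and function names are hypothetical and are not GCC internals, and the real cost model is more involved. Only the totals quoted in the report are used.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the SLP profitability check.
   Assumption: vectorise only when the vector cost is strictly lower
   than the scalar cost.  These names do not exist in GCC.  */
struct slp_costs
{
  unsigned vector_cost;
  unsigned scalar_cost;
};

static bool
slp_is_profitable (struct slp_costs c)
{
  return c.vector_cost < c.scalar_cost;
}

/* Totals quoted in the report: before and after the bisected commit.  */
static const struct slp_costs costs_before = { 2, 4 };
static const struct slp_costs costs_after = { 12, 4 };
```

Under this assumption, `slp_is_profitable (costs_before)` holds and `slp_is_profitable (costs_after)` does not, which matches the observed change from a single `dup` to scalar code.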