From: "tnfchris at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop
Date: Tue, 23 Jan 2024 13:05:21 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization, needs-bisection
X-Bugzilla-Severity: normal
X-Bugzilla-Who: tnfchris at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.0
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #22 from Tamar Christina ---
For me, with `-fno-vect-cost-model` on, without this commit we generate
https://gist.github.com/Mistuke/d9252bfcb2aa766327c5f377e162f5b7 for the loop,
and with the commit, well..
it doesn't fit on the screen, but the codegen is pretty horrible, with

        smlal2  v24.4s, v13.8h, v5.8h
        smull   v31.4s, v30.4h, v17.4h
        add     v20.4s, v20.4s, v11.4s
        smlal2  v29.4s, v3.8h, v6.8h
        smull2  v25.4s, v25.8h, v15.8h
        add     v22.4s, v28.4s, v22.4s
        shrn    v21.4h, v21.4s, 15
        add     v20.4s, v20.4s, v26.4s
        add     v29.4s, v29.4s, v24.4s
        smlal2  v25.4s, v16.8h, v7.8h
        smlal   v31.4s, v18.4h, v8.4h
        smull2  v27.4s, v27.8h, v17.8h
        shrn2   v21.8h, v22.4s, 15
        add     v29.4s, v29.4s, v25.4s
        add     v31.4s, v31.4s, v20.4s
        smlal2  v27.4s, v18.8h, v8.8h
        str     h21, [x5, x9]
        add     x9, x9, 32
        add     x9, x5, x9
        shrn    v31.4h, v31.4s, 15
        st1     {v21.h}[1], [x10]
        add     v27.4s, v27.4s, v29.4s
        st1     {v21.h}[2], [x6]
        add     x6, x7, 20
        add     x10, x1, x21
        st1     {v21.h}[3], [x2]
        add     x2, x7, 24
        add     x7, x7, 28
        st1     {v21.h}[4], [x8]
        shrn2   v31.8h, v27.4s, 15
        st1     {v21.h}[5], [x6]
        lsl     x6, x10, 1
        add     x10, x5, x10, lsl 1
        st1     {v21.h}[6], [x2]
        add     x2, x10, 4
        st1     {v21.h}[7], [x7]
        add     x7, x10, 8
        str     h31, [x5, x6]
        add     x8, x10, 12
        lsl     x1, x1, 1
        add     x6, x6, 32
        st1     {v31.h}[1], [x2]
        add     x2, x10, 16
        st1     {v31.h}[2], [x7]
        add     x7, x10, 20
        st1     {v31.h}[3], [x8]
        add     x8, x10, 24
        add     x10, x10, 28
        st1     {v31.h}[4], [x2]
        st1     {v31.h}[5], [x7]
        add     x11, x1, 32
        st1     {v31.h}[6], [x8]
        add     x11, x0, x11
        st1     {v31.h}[7], [x10]
        add     x10, x1, x25
        ld1h    z31.s, p5/z, [x11]

going on for a while, i.e. single-element lane stores.

So with the cost model disabled, it definitely does get worse with that commit.
With the cost model on, there's no difference.
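For readers who want to see the pattern in isolation: the lane stores above write halfwords 4 bytes apart (offsets 4, 8, 12, ... from the base), which is what a store with a stride of two 16-bit elements looks like once each vector lane is stored separately. The C loop below is a minimal sketch of a kernel with that shape. It is not the testcase from the PR; the function name `q15_mac_strided`, its parameters, and the two-term multiply-accumulate are assumptions chosen only to resemble the smull/smlal, shrn #15, and strided-store sequence in the dump.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel, NOT the testcase from PR 113441.
 * Widening 16x16->32 multiply-accumulate, narrowing shift by 15,
 * and a store to every second 16-bit element of `out`.
 * Assumes arithmetic right shift of negative values, as GCC provides. */
void q15_mac_strided(int16_t *restrict out,
                     const int16_t *restrict a,
                     const int16_t *restrict b,
                     int16_t ca, int16_t cb, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Widen to 32 bits before multiplying (cf. smull/smlal). */
        int32_t acc = (int32_t)a[i] * ca + (int32_t)b[i] * cb;
        /* Narrow back to 16 bits (cf. shrn ..., 15); stride-2 store. */
        out[2 * i] = (int16_t)(acc >> 15);
    }
}
```

Building something like this for AArch64 with `-O3 -fno-vect-cost-model`, before and after the commit, is one way to compare how the strided store is lowered. Whether this particular sketch reproduces the per-lane `st1` sequence is not verified here and will depend on the GCC revision and target options.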