From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 6E7BB385841A; Thu, 22 Feb 2024 16:18:17 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6E7BB385841A
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1708618697;
	bh=x3F1Dw0SlOPNey/7AEIygtE5FrLpvN2z+ZCdIQMlvN4=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=RrggjwFZVqKv7e0H4cBpOD7qFWBe572knXuxwkecN23mwanlnYVMJR7L2eiBV1Vv+
	 m5zvsTXyiox20cH+aat7XL8tN8GkJpbXHn1CQXevsgoYJi22M42o5TYtDaTkUTwNtD
	 8paFEboUVIdHzo99ke9pvpwNKWX9TRB5wW0ydo2g=
From: "tnfchris at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7
Date: Thu, 22 Feb 2024 16:18:13 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: tnfchris at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: everconfirmed short_desc cf_reconfirmed_on bug_status keywords
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

Tamar Christina changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
            Summary|[14 Regression] Fail to     |[14 Regression] Fail to
                   |fold the last element with  |fold the last element with
                   |multiple loop               |multiple loop since
                   |                            |g:2efe3a7de0107618397264017
                   |                            |fb045f237764cc7
   Last reconfirmed|                            |2024-02-22
             Status|UNCONFIRMED                 |NEW
           Keywords|needs-bisection             |

--- Comment #26 from Tamar Christina ---
(In reply to Richard Biener from comment #18)
> (In reply to Tamar Christina from comment #17)
> > Ok, bisected to
> >
> > g:2efe3a7de0107618397264017fb045f237764cc7 is the first bad commit
> > commit 2efe3a7de0107618397264017fb045f237764cc7
> > Author: Hao Liu
> > Date:   Wed Dec 6 14:52:19 2023 +0800
> >
> >     tree-optimization/112774: extend the SCEV CHREC tree with a nonwrapping
> >     flag
> >
> > Before this commit we were unable to analyse the stride of the access.
> > After this niters seems to estimate the loop trip count at 4 and after that
> > the logs diverge enormously.
>
> Hum, but that's backward and would match to what I said in comment#2 - we
> should get better code with that.
>

Ok, so I've dug more into this today.  It's definitely this commit that's
causing it.  The reason is we no longer consider masked gather/scatters.

Before this commit the gather/scatter pattern would trigger:

tresg.i:3:275: note:   gather/scatter pattern: detected: a[_2] = b.3_3;
tresg.i:3:275: note:   gather_scatter pattern recognized: .SCATTER_STORE
((sizetype) &a, _2, 4, b.3_3);

and the use of the masked scatter is what's causing the epilogue to not be
required and why it generates better code.  It's not the loads.

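As a rough illustration of why a masked scatter removes the need for an
epilogue, here is a scalar C model of a fully predicated scatter loop.  It is
only a sketch: masked_scatter_store, scatter_all and VL are names made up for
this example and are not GCC internals, and the argument order (base, offsets,
scale, values, mask) is assumed to mirror the .SCATTER_STORE call in the dump
above, with the mask added.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VL 4 /* hypothetical number of lanes per vector */

/* Scalar model of one masked scatter (assumed semantics): every active
   lane stores values[lane] at base + offsets[lane] * scale bytes.  */
static void
masked_scatter_store (char *base, const size_t *offsets, size_t scale,
                      const int *values, const unsigned char *mask)
{
  for (size_t lane = 0; lane < VL; lane++)
    if (mask[lane])
      memcpy (base + offsets[lane] * scale, &values[lane], sizeof (int));
}

/* Fully predicated loop: dst[idx[i]] = src[i] for i in [0, n).  The last,
   partial vector is handled by switching lanes off in the mask, so no
   scalar epilogue loop is needed.  */
static void
scatter_all (int *dst, const size_t *idx, const int *src, size_t n)
{
  assert (dst != NULL && idx != NULL && src != NULL);
  for (size_t i = 0; i < n; i += VL)
    {
      unsigned char mask[VL];
      size_t offsets[VL] = { 0 };
      int values[VL] = { 0 };
      for (size_t lane = 0; lane < VL; lane++)
        {
          /* A lane is active only while it is still inside the trip count.  */
          mask[lane] = (i + lane) < n;
          if (mask[lane])
            {
              offsets[lane] = idx[i + lane];
              values[lane] = src[i + lane];
            }
        }
      masked_scatter_store ((char *) dst, offsets, sizeof (int), values, mask);
    }
}
```

With n = 6 and VL = 4 the second iteration runs with only two active lanes.
Without the mask, those two remaining elements would have to be handled by a
separate epilogue.
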
The issue is that vect_analyze_data_refs only considers gather/scatters IF DR
analysis fails, which it did before:

tresg.c:31:29: missed:  failed: evolution of offset is not affine.
        base_address:
        offset from base address:
        constant offset from base address:
        step:
        base alignment: 0
        base misalignment: 0
        offset alignment: 0
        step alignment: 0
        base_object: array1
        Access function 0: {{m_112 * 2, +, 24}_3, +, 2}_4
        Access function 1: 0
Creating dr for array1[0][_8]

this now succeeds after the quoted commit:

        success.
        base_address: &array1
        offset from base address: (ssizetype) ((sizetype) (m_111 * 2) * 2)
        constant offset from base address: 0
        step: 4
        base alignment: 8
        base misalignment: 0
        offset alignment: 4
        step alignment: 4
        base_object: array1
        Access function 0: {{m_112 * 2, +, 24}_3, +, 2}_4
        Access function 1: 0
Creating dr for array1[0][_8]

so we never enter

      /* Check that analysis of the data-ref succeeded.  */
      if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_INIT (dr)
          || !DR_STEP (dr))
        {

and without the IFN scatters it tries deinterleaving scalar stores to scatters:

tresg.c:29:22: note:   Detected single element interleaving array1[0][_8] step 4
tresg.c:29:22: note:   Detected single element interleaving array1[1][_8] step 4
tresg.c:29:22: note:   Detected single element interleaving array1[2][_8] step 4
tresg.c:29:22: note:   Detected single element interleaving array1[3][_8] step 4
tresg.c:29:22: note:   Detected single element interleaving array1[0][_1] step 4
tresg.c:29:22: note:   Detected single element interleaving array1[1][_1] step 4
tresg.c:29:22: note:   Detected single element interleaving array1[2][_1] step 4
tresg.c:29:22: note:   Detected single element interleaving array1[3][_1] step 4
tresg.c:29:22: missed:   not consecutive access array2[_4][_8] = _70;
tresg.c:29:22: note:   using strided accesses
tresg.c:29:22: missed:   not consecutive access array2[_4][_1] = _68;
tresg.c:29:22: note:   using strided accesses
...
tresg.c:29:22: note:   using gather/scatter for strided/grouped access, scale = 2

but without the SCATTER_STORE IFN it never tries masking the scatter, so we
lose MASK_SCATTER_STORE and hence generate worse code, because the whole loop
can no longer be predicated.

However, trying to force it generates an ICE, so I guess it's not that simple.
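
To make the control flow described above concrete, here is a toy C model of
the decision.  It is not GCC code: struct toy_dr, enum toy_path and
toy_classify are hypothetical names, and the four booleans only stand in for
whether DR_BASE_ADDRESS, DR_OFFSET, DR_INIT and DR_STEP are set.  It assumes,
as described above, that the gather/scatter pattern is only considered when
at least one of them is missing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the four data-ref fields that the quoted
   check inspects; this is not GCC's data_reference structure.  */
struct toy_dr
{
  bool has_base_address;
  bool has_offset;
  bool has_init;
  bool has_step;
};

enum toy_path
{
  TOY_CONSIDER_GATHER_SCATTER, /* analysis failed: IFN pattern is tried */
  TOY_REGULAR_ACCESS           /* analysis succeeded: IFN pattern skipped */
};

/* Model of the check: the gather/scatter path is reachable only when
   data-ref analysis left at least one field unset.  */
static enum toy_path
toy_classify (const struct toy_dr *dr)
{
  assert (dr != NULL);
  if (!dr->has_base_address || !dr->has_offset
      || !dr->has_init || !dr->has_step)
    return TOY_CONSIDER_GATHER_SCATTER;
  return TOY_REGULAR_ACCESS;
}
```

In this model the pre-commit dump (all four fields empty) takes the
TOY_CONSIDER_GATHER_SCATTER path, while the post-commit dump (all four fields
filled in) takes TOY_REGULAR_ACCESS.  That matches the observation that the
SCATTER_STORE IFN is no longer tried.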