From: "tnfchris at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/109154] [13 regression] jump threading de-optimizes nested floating point comparisons
Date: Tue, 28 Mar 2023 10:07:21 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 13.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154

--- Comment #24 from Tamar Christina ---
(In reply to Jakub Jelinek from comment #12)
> (In reply to Richard Biener from comment #11)
> > _1 shoud be [-Inf, nextafter (0.0, -Inf)], not [-Inf, -0.0]
> The reduced testcase is invalid because it uses uninitialized l.

Sure, let's fix that, it was reduced a bit too far:
https://godbolt.org/z/he3rT5Exq has the extracted codegen part.
Note how GCC 14 does at least 2x the number of floating point comparisons in
the hot loops.

The scalar code doesn't look (off the top of my head) that bad, but the
additional entries in the phi nodes are still causing major headaches for
vector code.

  # iftmp.2_36 = PHI <1(10), _95(11), 0(9)>
  # iftmp.0_97 = PHI <2.0e+0(10), 2.0e+0(11), 4.0e+0(9)>
  # iftmp.1_101 = PHI <5.0e-1(10), 5.0e-1(11), 2.5e-1(9)>

vs before

  # iftmp.2_38 = PHI <1(11), _95(12)>
  # iftmp.0_96 = PHI <2.0e+0(11), iftmp.0_94(12)>
  # iftmp.1_100 = PHI <5.0e-1(11), iftmp.1_98(12)>

which causes it to generate:

        fcmge   p3.s, p0/z, z0.s, z6.s
        fcmlt   p1.s, p0/z, z0.s, z6.s
        fcmge   p1.s, p1/z, z0.s, #0.0
        fcmge   p1.s, p3/z, z0.s, #0.0
        fcmlt   p3.s, p0/z, z0.s, #0.0

vs

        fcmge   p3.s, p0/z, z0.s, #0.0
        fcmlt   p2.s, p0/z, z0.s, z16.s

The split in threading is causing it to miss that it can do the comparison
with 0 just once on all the elements.
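
To make the last point concrete, here is a rough C sketch of the two shapes. It is a hypothetical illustration only and not the testcase behind the godbolt link: the function names, the struct and the exact conditions are my assumptions, with the constants loosely borrowed from the PHI arguments above. classify_once compares against 0 a single time and reuses the result; classify_split repeats that comparison on both sides of the limit test, which is roughly the control flow left behind when the paths are split.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration; none of these names come from the bug. */
struct result { int flag; float scale; float inv; };

/* Shape A: one comparison against 0, reused for every output. */
static struct result classify_once(float v, float limit)
{
    int nonneg = v >= 0.0f;
    struct result r;
    r.flag  = nonneg && v < limit;
    r.scale = nonneg ? 2.0f : 4.0f;
    r.inv   = nonneg ? 0.5f : 0.25f;
    return r;
}

/* Shape B: same results, but the comparison against 0 is duplicated
   under each arm of the limit test. */
static struct result classify_split(float v, float limit)
{
    struct result r;
    if (v < limit) {
        if (v >= 0.0f)
            r = (struct result){ 1, 2.0f, 0.5f };
        else
            r = (struct result){ 0, 4.0f, 0.25f };
    } else {
        if (v >= 0.0f)
            r = (struct result){ 0, 2.0f, 0.5f };
        else
            r = (struct result){ 0, 4.0f, 0.25f };
    }
    return r;
}

/* Asserts that both shapes agree on every element of x. */
static void check_equivalent(const float *x, size_t n, float limit)
{
    for (size_t i = 0; i < n; i++) {
        struct result a = classify_once(x[i], limit);
        struct result b = classify_split(x[i], limit);
        assert(a.flag == b.flag);
        assert(a.scale == b.scale);
        assert(a.inv == b.inv);
    }
}
```

Both functions compute the same values, so the difference is purely in how many comparisons a vectorizer has to turn into predicates. Whether a given compiler version produces one form or the other from real source is not something this sketch demonstrates.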