From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id B018E3858D39; Tue, 28 Mar 2023 10:07:23 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B018E3858D39
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1679998043;
	bh=i+1tNdq2AIle+MaiY+PSTiMeIdXbY1wGMxI7bBnYwRg=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=qDTd/jMJXwRkhUK0DwsBiFJRkuFg/6juPypb6jcYBL/h4hguBA3VcVmdbGxhazwYF
	 e+hCwwHkXU6Ymxi+SUyMwIztDxbPcgg3SdXaVR+GY4F6luLH4+pntXmkqoVcS510Al
	 R94x8CB9bZGukp8tC7x3dT9977T/RDbKXZj1Mtac=
From: "tnfchris at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/109154] [13 regression] jump threading
 de-optimizes nested floating point comparisons
Date: Tue, 28 Mar 2023 10:07:21 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: tnfchris at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 13.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-109154-4-DxoW4Af4Wm@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-109154-4@http.gcc.gnu.org/bugzilla/>
References: <bug-109154-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154
--- Comment #24 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #12)
> (In reply to Richard Biener from comment #11)
> > _1 should be [-Inf, nextafter (0.0, -Inf)], not [-Inf, -0.0]
> The reduced testcase is invalid because it uses uninitialized l.
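A minimal C sketch of the range point in the quoted comment, assuming single-precision values (the vector code below works on `.s` lanes) and IEEE 754 semantics with subnormals. Because -0.0 compares equal to 0.0, it never satisfies `x < 0.0`, so the tightest upper bound for such values is the largest negative float, `-FLT_TRUE_MIN` (C11), which is the same value as `nextafterf (0.0f, -INFINITY)`. The helper names are hypothetical.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical helper: does x satisfy x < 0.0f?  Under IEEE 754,
   -0.0f compares equal to 0.0f, so -0.0f is NOT in this set.  */
static bool in_negative_range (float x)
{
  return x < 0.0f;
}

/* Hypothetical helper: upper bound of { x : x < 0.0f }.
   -FLT_TRUE_MIN is the negated smallest subnormal, i.e. the same
   value nextafterf (0.0f, -INFINITY) returns, not -0.0f.  */
static float negative_range_upper_bound (void)
{
  return -FLT_TRUE_MIN;
}
```

Using `FLT_TRUE_MIN` instead of calling `nextafterf` keeps the sketch free of a libm dependency; both spell the same bound.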

Sure, let's fix that; it was reduced a bit too far:

https://godbolt.org/z/he3rT5Exq

It has the extracted codegen part.

Note how GCC 13 (trunk) performs at least twice as many floating-point
comparisons in the hot loops.

Off the top of my head, the scalar code doesn't look that bad, but the
additional arguments in the PHI nodes still cause major headaches for the
vector code.

  # iftmp.2_36 = PHI <1(10), _95(11), 0(9)>
  # iftmp.0_97 = PHI <2.0e+0(10), 2.0e+0(11), 4.0e+0(9)>
  # iftmp.1_101 = PHI <5.0e-1(10), 5.0e-1(11), 2.5e-1(9)>

vs before

  # iftmp.2_38 = PHI <1(11), _95(12)>
  # iftmp.0_96 = PHI <2.0e+0(11), iftmp.0_94(12)>
  # iftmp.1_100 = PHI <5.0e-1(11), iftmp.1_98(12)>
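A hypothetical C model of the three-argument PHIs above, written as the selects they turn into after if-conversion. `p9` and `p10` are assumed, mutually exclusive predicates meaning "control arrived from bb 9" and "from bb 10" (otherwise it came from bb 11), and `v95` stands in for `_95`. Since bb 10 and bb 11 feed identical constants into `iftmp.0` and `iftmp.1`, those two PHIs depend only on `p9`; a literal lowering instead needs both predicates for every PHI.

```c
#include <stdbool.h>

/* Hypothetical model of the joined values.  p9 / p10 are assumed,
   mutually exclusive edge predicates; if neither holds, control came
   from bb 11.  v95 stands in for the SSA value _95.  */
typedef struct
{
  int iftmp2;
  float iftmp0;
  float iftmp1;
} phi_results;

/* Literal lowering: each PHI becomes a nested select on both predicates.  */
static phi_results lower_naive (bool p9, bool p10, int v95)
{
  phi_results r;
  r.iftmp2 = p9 ? 0 : (p10 ? 1 : v95);
  r.iftmp0 = p9 ? 4.0f : (p10 ? 2.0f : 2.0f);
  r.iftmp1 = p9 ? 0.25f : (p10 ? 0.5f : 0.5f);
  return r;
}

/* bb 10 and bb 11 agree on iftmp.0 and iftmp.1, so only p9 matters
   for them; only iftmp.2 still needs p10.  */
static phi_results lower_merged (bool p9, bool p10, int v95)
{
  phi_results r;
  r.iftmp2 = p9 ? 0 : (p10 ? 1 : v95);
  r.iftmp0 = p9 ? 4.0f : 2.0f;
  r.iftmp1 = p9 ? 0.25f : 0.5f;
  return r;
}
```

Both functions compute the same results for every valid predicate combination; the merged form simply needs fewer predicates per value.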

which causes it to generate:

        fcmge   p3.s, p0/z, z0.s, z6.s
        fcmlt   p1.s, p0/z, z0.s, z6.s
        fcmge   p1.s, p1/z, z0.s, #0.0
        fcmge   p1.s, p3/z, z0.s, #0.0
        fcmlt   p3.s, p0/z, z0.s, #0.0

        vs

        fcmge   p3.s, p0/z, z0.s, #0.0
        fcmlt   p2.s, p0/z, z0.s, z16.s

The split introduced by jump threading causes GCC to miss that it can do the
comparison with 0 just once for all the elements.
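A sketch of the shape the old codegen had, assuming a hypothetical hot loop: the real testcase is behind the godbolt link, and the function name, the threshold `t`, and the arithmetic here are made up for illustration. Each element is compared with 0 once and with `t` once, and those two masks drive every select, matching the two-comparison sequence (`fcmge ... #0.0`, `fcmlt ... z16.s`) shown earlier.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hot loop: one comparison with 0 and one with t per
   element; both masks are reused by all the selects below.  */
static void scale_elements (float *restrict out, const float *restrict x,
                            float t, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      bool nonneg = x[i] >= 0.0f;          /* the single compare with 0 */
      bool below = x[i] < t;               /* the single compare with t */
      float scale = nonneg ? 2.0f : 4.0f;  /* constants mirror the PHIs */
      float inv = nonneg ? 0.5f : 0.25f;
      out[i] = below ? x[i] * scale : x[i] * inv;
    }
}
```

Because `nonneg` is computed once and shared by both selects, a vectorizer only needs one predicate for it; splitting the paths duplicates that comparison.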