From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 0410D385842E; Tue, 28 Mar 2023 15:31:24 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0410D385842E
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1680017485;
	bh=ucSeizYlBEJG56Vyq9ep8RS+R6TF1oZTKehhVxfA/6I=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=q3x3IDCmI5wPV3nDJ+T7jzf2ShGXI97fxQ9u9SX6qAbApBFn9gb9aEl9pAVETauT1
	 YZPkaaHt3BEfgPx961Pm9+ci03ciQUUN0u5i1vR1SQL9dsUVjl5VYJLBfEUjNaGGif
	 OWGqYH8UjoN2p+mZlxuh/qZa1VhMgey4CsCR1o4A=
From: "amacleod at redhat dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/109154] [13 regression] jump threading
 de-optimizes nested floating point comparisons
Date: Tue, 28 Mar 2023 15:31:23 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: amacleod at redhat dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 13.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-109154-4-H1mEUOXWIW@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-109154-4@http.gcc.gnu.org/bugzilla/>
References: <bug-109154-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154
--- Comment #32 from Andrew Macleod <amacleod at redhat dot com> ---

The issue here is pruning to avoid significant time growth.

  _1 = (float) l_11(D);
  _2 = _1 < 0.0;
  zone1_12 = (int) _2;
  if (_1 < 0.0)
    goto <bb 3>; [INV]

_1 is an export from the block. In theory, if there were a proper range-op
entry for a cast from float to int, l_11 could also be an export.

We can recompute anything which directly uses an export from the block. _2
uses _1, so we can recompute _2. We currently support only one level of
recomputation, because recognition and computation grow between
quadratically and exponentially with the number of recomputations required
and with identifying/evaluating the levels of indirection...

zone1_12 does not directly use an export, so GORI does not see it as
something which it can evaluate. To evaluate it, we have to see that _2 is
recomputable, recompute it, then recompute zone1_12.

This could in theory be an arbitrarily long chain, and for performance
reasons, we have limited it to 1 up until this point.

Note that if we had used _2:
if (_2 != 0)
   goto <bb 3>
then _2 would be an export, and zone1_12 would be a recomputation and have
the appropriate value.
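
As a rough illustration of the depth limit described above, here is a toy
Python model of the block. It is only a sketch: recompute_depth, OPERANDS and
the export sets are hypothetical names made up for this example and are not
GORI's actual data structures or API. It assumes an export costs zero levels
and each definition that has to be re-evaluated costs one.

```python
# Toy model only: the names and structure here are hypothetical and are
# not taken from GCC's GORI implementation.

# Operands used by each definition in the block from this comment.
OPERANDS = {
    "_1": ["l_11"],        # _1 = (float) l_11(D)
    "_2": ["_1"],          # _2 = _1 < 0.0
    "zone1_12": ["_2"],    # zone1_12 = (int) _2
}


def recompute_depth(name, exports, operands, limit):
    """Return how many recomputation levels `name` needs on the outgoing
    edge, or None if it cannot be reached within `limit` levels."""
    if name in exports:
        return 0                      # exports are computed directly
    if limit == 0:
        return None                   # recomputation budget exhausted
    best = None
    for op in operands.get(name, ()):
        depth = recompute_depth(op, exports, operands, limit - 1)
        if depth is not None and (best is None or depth + 1 < best):
            best = depth + 1
    return best


# Branch on "_1 < 0.0": _1 is the export.
exports_on_1 = {"_1"}
print(recompute_depth("_2", exports_on_1, OPERANDS, 1))        # 1
print(recompute_depth("zone1_12", exports_on_1, OPERANDS, 1))  # None
print(recompute_depth("zone1_12", exports_on_1, OPERANDS, 2))  # 2

# Branch on "_2 != 0": _2 is an export (only that fact is modelled here).
exports_on_2 = {"_2"}
print(recompute_depth("zone1_12", exports_on_2, OPERANDS, 1))  # 1
```

With a limit of 1, zone1_12 is out of reach when the branch tests _1 but
within reach when it tests _2, which matches the behaviour described above.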

I have plans to eventually rejig GORI to cache outgoing ranges on edges.
This would allow us to recompute chains without the quadratic growth, and we
would have all the recomputations we want, but at this point we are only
doing one level.

We could in theory expand it to look at 2 levels if it's a single operand...
which would help with some of these cases where there are casts, and keep the
performance degradation from being too bad.  I'm sure there will be cases
where a third would be handy :-P