From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 6168D3858D35; Mon, 10 Jul 2023 10:46:45 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6168D3858D35
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1688986005;
	bh=/XknOUTANu4teQs6fHxzkqz4Tjej3sH6caxCo6X9CEY=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=nY3WSWRgf9uReHkOtpsz3Nk/d3Fu7gb3v/uqwebcXMOZnw8C3iqVjmsAGp4WbVUhC
	 j2u25ga/VdF7OiUuOgef7Q1OcUOvavrspHlkT8byRc8wlG5dWfnFECfaz8GZWnfs9A
	 kdMNCAX36+LRppT2zeE/L/5w4aUtHe/fxMEbYkD0=
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/109154] [13/14 regression] jump threading
 de-optimizes nested floating point comparisons
Date: Mon, 10 Jul 2023 10:46:45 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: tnfchris at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 13.2
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-109154-4-AwMUWawd0O@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-109154-4@http.gcc.gnu.org/bugzilla/>
References: <bug-109154-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154
--- Comment #62 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #61)
> (In reply to Richard Biener from comment #60)
> > (In reply to Tamar Christina from comment #59)
> > > after ifcvt we end up with:
> > >
> > >   _162 = chrg_init_70 * iftmp.8_76;
> > >   _164 = ABS_EXPR <_162>;
> > >   _167 = -_164;
> > >   _ifc__166 = distbb_74 < iftmp.0_97 ? _167 : 0.0;
> > >   prephitmp_169 = distbb_74 >= 0.0 ? _ifc__166 : _168;
> > >
> > > instead of
> > >
> > >   _160 = chrg_init_75 * iftmp.8_80;
> > >   prephitmp_161 = distbb_79 < 0.0 ? chrg_init_75 : _160;
> > >   _164 = ABS_EXPR <prephitmp_161>;
> > >   _166 = -_164;
> > >   prephitmp_167 = distbb_79 < iftmp.0_96 ? _166 : 0.0;
> > >
> > > previously we'd make COND_MUL and COND_NEG and so don't need a VCOND in the
> > > end,
> > > now we select after the multiplication, so we only have a COND_NEG followed
> > > by a VCOND.
> > >
> > > This is obviously worse, but I have no idea how to recover it.  Any ideas?
> >
> > None.  This is with -O3, right?  Can you try selectively disabling parts
> > of PRE with -fno-tree-partial-pre -fno-code-hoisting?  But I suspect it's
> > the improvement for general PRE that we hit here.
> >
>
> Those don't seem to make a difference sadly.
>
> > One idea that was always floating around was to move PRE after loop opts
> > like we did with predcom.  But the no PRE before loop will likely hurt as
> > well
> > so we might instead want to limit PRE when it involves generating
> > constants in PHIs and schedule another PRE after loop opts (at some cost
> > then).  It's something to experiment with ...
>
> It looks like `-fno-tree-pre` does the trick, but then of course, messes up
> elsewhere.  The conditional statement seems to stay in the most complicated
> form possible in scalar code.
>
> I'll try to track down what to turn off and experiment with a pre2 after
> vect.
> Is before predcom a good place?

I would avoid putting it into the loop pipeline.  Instead I'd turn the
FRE pass that runs after tracer into PRE.  Maybe conditional on whether
there are any loops.

Note it's not so easy to "tame" PRE; the existing restrictions are applied
at elimination time in eliminate_dom_walker::eliminate_stmt.  I would
experiment with restricting the use of inserted PHIs in innermost(!)
loops when their arguments include invariants, maybe only if the number
of PHI args is more than two ... (but that's somewhat artificial).

That said, I'm not really convinced this is a good idea.
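
For readers without the dumps at hand, the two post-ifcvt sequences quoted
from comment #59 can be read as the scalar C below.  This is only a
transcription sketch: the function and parameter names are made up, `other`
stands in for _168 (its definition is not part of the quoted dump), and
nothing here claims to be the original source of the testcase.

```c
#include <math.h>

/* Earlier shape: select between chrg_init and the product first, then
   ABS/NEG, then one final select.  Comments give the SSA names from the
   quoted dump that each line corresponds to.  */
double select_then_negate(double distbb, double chrg_init,
                          double iftmp_8, double iftmp_0)
{
    double prod = chrg_init * iftmp_8;              /* _160 */
    double sel  = distbb < 0.0 ? chrg_init : prod;  /* prephitmp_161 */
    double neg  = -fabs(sel);                       /* _164, _166 */
    return distbb < iftmp_0 ? neg : 0.0;            /* prephitmp_167 */
}

/* Current shape: ABS/NEG are applied to the product unconditionally and
   two selects follow.  'other' is a placeholder for _168, whose
   definition is not shown in the dump (assumption for illustration).  */
double negate_then_select(double distbb, double chrg_init,
                          double iftmp_8, double iftmp_0, double other)
{
    double neg   = -fabs(chrg_init * iftmp_8);      /* _162, _164, _167 */
    double inner = distbb < iftmp_0 ? neg : 0.0;    /* _ifc__166 */
    return distbb >= 0.0 ? inner : other;           /* prephitmp_169 */
}
```

In the first function the select on `distbb < 0.0` feeds the ABS/NEG chain;
in the second it is the last operation, which matches the description in
comment #59 of selecting after the multiplication.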