From: "amacleod at redhat dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/107822] [13 Regression] Dead Code Elimination Regression at -Os (trunk vs. 12.2.0)
Date: Tue, 03 Jan 2023 17:46:55 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107822

--- Comment #2 from Andrew Macleod ---

Sorry, I've been out for a few weeks.

This isn't an on-demand issue. All versions of VRP do a full walk and resolve
ranges during the walk. This issue is a combination of the lack of iteration
and non-optimistic evaluations on back edges.

We do use the on-demand mechanism to query the back edge, which alleviates a
number of issues caused by the lack of iteration. The problem is that without
iteration we can't take the fully optimistic approach: assume that the value
of c_10 is [1, 1] for a full iteration of the loop, and then determine during
the second iteration that it is always either 1 or 2.

  <bb 3> [local count: 955630225]:
    _1 = c_10 ^ 3;

  <bb 4> [local count: 1073741824]:
    # c_10 = PHI <1(2), _1(3)>
    b.1_3 = b;
    if (b.1_3 <= 8)
      goto <bb 3>; [89.00%]
    else
      goto <bb 5>; [11.00%]

  <bb 5> [local count: 118111600]:
    # c_13 = PHI <c_10(4)>
    if (c_13 != 0)

When we first try to evaluate

    # c_10 = PHI <1(2), _1(3)>

the back edge 3->4 has not been processed, so it is evaluated on demand and _1
needs to be calculated. Unfortunately, the value of c_10 has not been resolved
yet, so it defaults to something pessimistically safe, assumes it is VARYING,
and we get the results you see.
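For context, the shape involved boils down to something like the following (my
own minimal sketch, not the reduced testcase from the report; the names are
only meant to line up with the SSA names above):

  /* Sketch only: c starts at 1 and is only ever XORed with 3, so it
     toggles between 1 and 2 and can never be 0.  The report is that
     12.2 folds the final test away at -Os while trunk no longer does.  */
  int b;

  int
  f (void)
  {
    int c = 1;               /* c_10 = PHI <1(2), _1(3)>            */
    while (b <= 8)
      {
        c ^= 3;              /* _1 = c_10 ^ 3, so c stays in {1, 2} */
        b++;
      }
    return c != 0;           /* always true once c_10 is [1, 2]     */
  }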
We have a few possible approaches. In an ideal world, we could use the path
ranger to answer questions on the back edge instead of fully evaluating it.
This would allow us to do a "pretend" first iteration and use the value of
[1,1] for c_10, producing a result of [1,2] for _1, which in turn would cause
us to evaluate c_10 as [1,2]. I think it is too late in the cycle to
experiment with that approach.

I have also experimented (and still am) with changing that initial value to
something more optimistic. When we first evaluate a statement, we store an
initial value so that if it is encountered again during evaluation, we can
avoid a cycle. That initial value is what gets used by any calculations along
a back edge. When we return from resolving all the inputs, we do a final
evaluation of the statement and store the real global value.

We have existing infrastructure which uses timestamps that should allow us to
calculate one value when doing the back edge, and then recalculate it for real
when we actually process that block. I have not convinced myself that this is
100% safe yet.

A third option would be to recognize that a 2-input PHI node like this is a
fairly common pattern. Before setting the initial value, we could walk the
use-def chains to see whether the PHI is feeding itself, make some other
determinations about the range (i.e., increasing, decreasing, specific
values, etc.), and use that for the initial estimate instead. This would give
us similar results to what we did before, and is likely safer than depending
on timestamps to do recalculations; it is just not quite as general.

I'm experimenting with the latter two approaches this week to determine what
seems safest at this point in the cycle.
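To illustrate what I mean by the third option, here is a rough standalone toy
(not GCC code and not the ranger API; every name in it is made up) showing the
kind of initial estimate we could derive for a self-feeding 2-input PHI: take
the range from the non-back-edge argument, run the back-edge statement over it
once, and union the two.

  /* Toy illustration only.  For  c = PHI <1(entry), (c ^ 3)(latch)>,
     seed with the entry value [1,1], evaluate the back-edge expression
     once over that seed, and union the results to get [1,2].  */
  #include <stdio.h>

  struct range { int lo, hi; };

  /* The back-edge statement in this example is x ^ 3; evaluating the two
     bounds is enough here because the seed is a single value.  */
  static struct range
  eval_backedge_once (struct range seed)
  {
    int a = seed.lo ^ 3, b = seed.hi ^ 3;
    struct range r = { a < b ? a : b, a < b ? b : a };
    return r;
  }

  static struct range
  range_union (struct range a, struct range b)
  {
    struct range r = { a.lo < b.lo ? a.lo : b.lo,
                       a.hi > b.hi ? a.hi : b.hi };
    return r;
  }

  int
  main (void)
  {
    struct range entry = { 1, 1 };                  /* edge 2->4: c_10 = 1 */
    struct range once = eval_backedge_once (entry); /* [2, 2]              */
    struct range init = range_union (entry, once);  /* [1, 2]              */
    printf ("initial estimate for c_10: [%d, %d]\n", init.lo, init.hi);
    return 0;
  }

With [1,2] as the value used on the back edge, the final evaluations of c_10
and c_13 would end up matching what we computed before, and the c_13 != 0 test
could again be folded.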