From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 6E9F43857C4F; Fri, 12 Feb 2021 14:40:50 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6E9F43857C4F
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/38474] compile time explosion in
 dataflow_set_preserve_mem_locs at -O3
Date: Fri, 12 Feb 2021 14:40:50 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 4.4.0
X-Bugzilla-Keywords: compile-time-hog, memory-hog, patch
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-38474-4-9IAoJZN2tt@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-38474-4@http.gcc.gnu.org/bugzilla/>
References: <bug-38474-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 12 Feb 2021 14:40:50 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D38474
--- Comment #99 from Richard Biener <rguenth at gcc dot gnu.org> ---
Just a short brain-dump for the PTA issue:

--param max-fields-for-field-sensitive=3D1 helps, so some magic limit and
auto-degrading might be a good idea.

Solver stats are not so bad:

 Total vars:               21507
Non-pointer vars:          16
Statically unified vars:  6800
Dynamically unified vars: 0
Iterations:               4
Number of edges:          43380
Number of implicit edges: 23056

but varmap "compression" happens before unifying those 6800 vars which
means bitmaps are less dense than possible.  That there's nothing
dynamically unified also says that likely iteration order is sub-optimal.
We don't have entries of the forward graph and so likely do multile
DFSs starting from somewhere inside for example.  Given we add both
succ and pred edges during solving itself makes itegrating the DFS
into the iteration itself look attractive eventually.

More stats are needed to judge iteration order tweaks.

We have IL like

  D.335748 =3D __result_mpfr_division_mp_mp;
  __result_mpfr_division_mp_mp =3D{v} {CLOBBER};
  D.76250 =3D D.335748;
  D.335748 =3D{v} {CLOBBER};
...
  mpfr_add (&__result_mpfr_addition_mp_mp, &D.76250, &D.76256, 0);

that just generates a lot of initial constraints and variables.  D.335748
becomes live here, so does D.76250.  This happens early also, like with

__attribute__((fn spec (". r r ")))
struct mpfr_type mpfr_division_mp_mp (struct mpfr_type & restrict a1, struct
mpfr_type & restrict a2)
{
  struct mpfr_type __result_mpfr_division_mp_mp;
  integer(kind=3D4) retval;

  <bb 2> :
  mpfr_init (&__result_mpfr_division_mp_mp);
  retval_6 =3D mpfr_div (&__result_mpfr_division_mp_mp, a1_3(D), a2_4(D), 0=
);
  <retval> =3D __result_mpfr_division_mp_mp;
  __result_mpfr_division_mp_mp =3D{v} {CLOBBER};
  return <retval>;

having some early pass after inlining clean up the result would be nice
[simply renaming and eliding 1:1 copies here].  It takes until SRA to fix t=
his
and the PTA pass after it (run as part of PRE) is fast:

 tree PTA                           :  20.26 (  7%)

another thing to notice would be not splitting vars that just occur
"bare" in the IL but that would require some pre-scanning of the IL
to note interesting (and uninteresting) variables.  That's something
we might need anyway though for improved allocated array handling for
example.=