From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 6E9F43857C4F; Fri, 12 Feb 2021 14:40:50 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6E9F43857C4F From: "rguenth at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3 Date: Fri, 12 Feb 2021 14:40:50 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: middle-end X-Bugzilla-Version: 4.4.0 X-Bugzilla-Keywords: compile-time-hog, memory-hog, patch X-Bugzilla-Severity: normal X-Bugzilla-Who: rguenth at gcc dot gnu.org X-Bugzilla-Status: ASSIGNED X-Bugzilla-Resolution: X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: gcc-bugs@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-bugs mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Feb 2021 14:40:50 -0000 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D38474 --- Comment #99 from Richard Biener --- Just a short brain-dump for the PTA issue: --param max-fields-for-field-sensitive=3D1 helps, so some magic limit and auto-degrading might be a good idea. Solver stats are not so bad: Total vars: 21507 Non-pointer vars: 16 Statically unified vars: 6800 Dynamically unified vars: 0 Iterations: 4 Number of edges: 43380 Number of implicit edges: 23056 but varmap "compression" happens before unifying those 6800 vars which means bitmaps are less dense than possible. That there's nothing dynamically unified also says that likely iteration order is sub-optimal. We don't have entries of the forward graph and so likely do multile DFSs starting from somewhere inside for example. Given we add both succ and pred edges during solving itself makes itegrating the DFS into the iteration itself look attractive eventually. More stats are needed to judge iteration order tweaks. We have IL like D.335748 =3D __result_mpfr_division_mp_mp; __result_mpfr_division_mp_mp =3D{v} {CLOBBER}; D.76250 =3D D.335748; D.335748 =3D{v} {CLOBBER}; ... mpfr_add (&__result_mpfr_addition_mp_mp, &D.76250, &D.76256, 0); that just generates a lot of initial constraints and variables. D.335748 becomes live here, so does D.76250. This happens early also, like with __attribute__((fn spec (". r r "))) struct mpfr_type mpfr_division_mp_mp (struct mpfr_type & restrict a1, struct mpfr_type & restrict a2) { struct mpfr_type __result_mpfr_division_mp_mp; integer(kind=3D4) retval; : mpfr_init (&__result_mpfr_division_mp_mp); retval_6 =3D mpfr_div (&__result_mpfr_division_mp_mp, a1_3(D), a2_4(D), 0= ); =3D __result_mpfr_division_mp_mp; __result_mpfr_division_mp_mp =3D{v} {CLOBBER}; return ; having some early pass after inlining clean up the result would be nice [simply renaming and eliding 1:1 copies here]. It takes until SRA to fix t= his and the PTA pass after it (run as part of PRE) is fast: tree PTA : 20.26 ( 7%) another thing to notice would be not splitting vars that just occur "bare" in the IL but that would require some pre-scanning of the IL to note interesting (and uninteresting) variables. That's something we might need anyway though for improved allocated array handling for example.=