From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id C2DD2385840D; Wed, 3 Nov 2021 10:57:27 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org C2DD2385840D
From: "aldyh at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/102943] [12 Regression] Jump threader
 compile-time hog with 521.wrf_r
Date: Wed, 03 Nov 2021 10:57:27 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: compile-time-hog
X-Bugzilla-Severity: normal
X-Bugzilla-Who: aldyh at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 12.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cc
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
List-Subscribe:
X-List-Received-Date: Wed, 03 Nov 2021 10:57:27 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102943

Aldy Hernandez changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amacleod at redhat dot com

--- Comment #10 from Aldy Hernandez ---
I tried all sorts of knobs: limiting the behavior for large BBs (one function
has over 20k blocks), capping the number of imports (dependencies on the
final conditional), and even capping the max number of blocks to look back.
None of them made a difference.
Then I realized that this PR was originally reported against the hybrid VRP
threader, which used a different path discovery engine altogether (the old
forward threader).  So the problem can't be in the backward threader's path
discovery bits, but in something the solver is doing.

I timed all the threaders using the solver, by functionality (simple versus
fully resolving mode):

 backwards simple  :   4.85 (  2%)   0.00 (  0%)   4.84 (  2%)   932k (  0%)
 backwards full    :  54.60 ( 17%)   0.01 (  1%)  54.70 ( 17%)   664k (  0%)

This confirms my hypothesis that it's not the backward threader discovery
bits, since the above two entries use the same engine.  So clearly, it's
something the fully resolving threader does that was common with the hybrid
threader, i.e. our use of the ranger.

A callgrind session shows that the majority of the back threader's time is
being spent in:

  path_range_query::range_on_path_entry (irange &r, tree name)

...which is understandable, because when we can't resolve an SSA name within
the path, we ask the ranger what the range on entry to the path is.

Curiously though, most of the time is spent in propagate_cache, especially
in add_to_update, which accounts for 37.5% of the threader's time:

-  if (!bitmap_bit_p (m_propfail, bb->index) && !m_update_list.contains (bb))
-    m_update_list.quick_push (bb);

This is a large CFG, so a linear search of the update list for a BB is bound
to be slow.  Just replacing the membership test with an sbitmap shaves off a
good 15 seconds:

 backwards jump threading  :  48.40 ( 28%)   0.02 (  1%)  48.57 ( 27%)  1597k (  0%)
 backwards jump threading  :  32.96 ( 22%)   0.09 (  4%)  33.12 ( 22%)  1499k (  0%)

Not ideal, but a good improvement IMO.  I'll post my proposed patch, but I
suspect Andrew may have other tricks up his sleeve.