From mboxrd@z Thu Jan  1 00:00:00 1970
From: "amacleod at redhat dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/102943] [12 Regression] Jump threader compile-time hog with 521.wrf_r
Date: Thu, 10 Mar 2022 14:01:20 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102943

--- Comment #41 from Andrew Macleod ---
>
> so it's still by far jump-threading/VRP dominating compile-times (I wonder
> if we should separate "old" and "new" [E]VRP timevars).  Given that VRP
> shows up as well it's more likely the underlying ranger infrastructure?

Yeah, I'd be tempted to just label them vrp1 (evrp), vrp2 (current vrp1) and
vrp3 (current vrp2) and track them separately.  I have noticed significant
behaviour differences between the code we see at VRP2 time and the code we
see at EVRP time.

>
> perf thrown on ltrans22 shows
>
> Samples: 302K of event 'cycles', Event count (approx.): 331301505627
>
>   Overhead  Samples  Command      Shared Object  Symbol
>    10.34%     31299  lto1-ltrans  lto1           [.] bitmap_get_aligned_chunk
>     7.44%     22540  lto1-ltrans  lto1           [.] bitmap_bit_p
>     3.17%      9593  lto1-ltrans  lto1           [.]
>
> callgraph info in perf is a mixed bag, but maybe it helps to pinpoint things:
>
> -   10.20%  10.18%  30364  lto1-ltrans  lto1  [.] bitmap_get_aligned_chunk
>    - 10.18% 0xffffffffffffffff
>       + 9.16% ranger_cache::propagate_cache
>       + 1.01% ranger_cache::fill_block_cache
>

I am currently looking at reworking the cache again so that the propagation
is limited only to actual changes.  It can still get out of hand in massive
CFGs, and that is already using the sparse representation.  There may be
some minor tweaks that can make a big difference here; I'll have a look over
the next couple of days.

It's probably safe to assume the threading performance is directly related
to this as well.
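
[Editorial note: the following is a minimal sketch of the kind of
change-limited cache propagation described above, assuming a plain worklist
over basic blocks.  All names and types here are illustrative only, not
ranger's actual cache code or API.]

  // Sketch: propagate cached ranges across a CFG, but only re-queue a
  // block's successors when the value cached for that block actually
  // changes, so quiescent regions of a large CFG are never re-walked.
  #include <vector>
  #include <deque>
  #include <algorithm>
  #include <cstdio>

  struct simple_range
  {
    long lo, hi;
    bool operator== (const simple_range &o) const
      { return lo == o.lo && hi == o.hi; }
  };

  // Union of two ranges, standing in for whatever merge the cache performs.
  static simple_range
  range_union (const simple_range &a, const simple_range &b)
  {
    return { std::min (a.lo, b.lo), std::max (a.hi, b.hi) };
  }

  struct block
  {
    std::vector<int> preds, succs;
    simple_range cache;
  };

  static void
  propagate_cache (std::vector<block> &cfg, int start_bb)
  {
    std::deque<int> worklist;
    std::vector<bool> on_list (cfg.size (), false);
    worklist.push_back (start_bb);
    on_list[start_bb] = true;

    while (!worklist.empty ())
      {
        int bb = worklist.front ();
        worklist.pop_front ();
        on_list[bb] = false;

        // Recompute this block's cached range from its predecessors.
        simple_range merged = cfg[bb].cache;
        for (int pred : cfg[bb].preds)
          merged = range_union (merged, cfg[pred].cache);

        // Only propagate further if the cached value actually changed.
        if (merged == cfg[bb].cache)
          continue;
        cfg[bb].cache = merged;

        for (int succ : cfg[bb].succs)
          if (!on_list[succ])
            {
              worklist.push_back (succ);
              on_list[succ] = true;
            }
      }
  }

  int
  main ()
  {
    // Tiny diamond CFG: 0 -> {1,2} -> 3.
    std::vector<block> cfg (4);
    cfg[0] = { {}, {1, 2}, { 0, 10 } };
    cfg[1] = { {0}, {3}, { 5, 5 } };
    cfg[2] = { {0}, {3}, { 7, 7 } };
    cfg[3] = { {1, 2}, {}, { 9, 9 } };

    propagate_cache (cfg, 1);
    printf ("bb3 cache: [%ld, %ld]\n", cfg[3].cache.lo, cfg[3].cache.hi);
    return 0;
  }

The point of the sketch is the early "continue": a block whose cached value
did not change never pushes its successors back onto the worklist, which is
what keeps the cost proportional to the number of real changes rather than
to the size of the CFG.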