From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id 2825639DC4CA; Fri, 12 Feb 2021 10:29:13 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 2825639DC4CA
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3
Date: Fri, 12 Feb 2021 10:29:12 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 4.4.0
X-Bugzilla-Keywords: compile-time-hog, memory-hog, patch
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 12 Feb 2021 10:29:13 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #96 from Richard Biener ---
The full testcase on trunk (g:95d94b52ea8478334fb92cca545f0bd904bd0034) at
-O0 -g now takes 9s to compile and uses 1GB ram.
With -O1 -g we have

Time variable                                   usr           sys          wall           GGC
 callgraph functions expansion      :  13.41 ( 12%)   0.21 ( 60%)  13.63 ( 12%)   439M ( 73%)
 callgraph ipa passes               :  94.79 ( 86%)   0.13 ( 37%)  94.95 ( 86%)    75M ( 13%)
 ipa function summary               :  91.46 ( 83%)   0.02 (  6%)  91.53 ( 83%)    17M (  3%)
 tree PTA                           :   5.78 (  5%)   0.05 ( 14%)   5.85 (  5%)    23M (  4%)
 TOTAL                              : 109.96          0.35        110.37          597M
109.97user 0.37system 1:50.38elapsed 99%CPU (0avgtext+0avgdata 1110568maxresident)k
0inputs+0outputs (0major+350549minor)pagefaults 0swaps

where perf shows

Samples: 448K of event 'cycles:u', Event count (approx.): 483237005145
Overhead  Samples  Command  Shared Object  Symbol
  17.26%    77187  f951     f951           [.] get_ref_base_and_extent
   8.36%    37385  f951     f951           [.] stmt_may_clobber_ref_p_1
   7.16%    32045  f951     f951           [.] default_binds_local_p_3
   6.40%    28628  f951     f951           [.] bitmap_bit_p
   6.39%    28557  f951     f951           [.] determine_known_aggregate_parts
   5.92%    26464  f951     f951           [.] pt_solution_includes_1
   4.66%    20834  f951     f951           [.] call_may_clobber_ref_p_1
   3.44%    15406  f951     f951           [.] flags_from_decl_or_type
   3.35%    14971  f951     f951           [.] refs_may_alias_p_1
   3.05%    13667  f951     f951           [.] gimple_call_flags
   2.55%    11387  f951     f951           [.] cgraph_node::get_availability
   2.40%    10739  f951     libc-2.26.so   [.] __strncmp_sse42
   2.32%    10372  f951     f951           [.] check_fnspec
   1.89%     8411  f951     f951           [.] bitmap_set_bit
   1.71%     7635  f951     f951           [.] private_lookup_attribute
   1.68%     7512  f951     f951           [.] get_modref_function_summary
   1.52%     6805  f951     f951           [.] decl_binds_to_current_def_p
   1.46%     6512  f951     f951           [.] gimple_call_fnspec
   1.26%     5582  f951     f951           [.] bitmap_clear_bit
   0.94%     4212  f951     f951           [.] cgraph_node::function_or_virtual_thunk_symbol

We need to do something about the IPA fnsummary cost; it looks unreasonable
compared to all the rest, at least for -O1. Cutting down
--param ipa-max-aa-steps doesn't seem to help, but it looks like the
accounting is simply broken.

And with -O2 or -O3 we have

Time variable                                   usr           sys          wall           GGC
 callgraph functions expansion      : 201.23 ( 20%)   0.77 ( 46%) 202.05 ( 20%)  1230M ( 82%)
 callgraph ipa passes               : 807.58 ( 80%)   0.86 ( 52%) 808.75 ( 80%)   201M ( 13%)
 ipa inlining heuristics            :  40.25 (  4%)   0.01 (  1%)  40.24 (  4%)    41M (  3%)
 alias stmt walking                 :  21.48 (  2%)   0.20 ( 12%)  21.72 (  2%)   601k (  0%)
 tree PTA                           : 788.36 ( 78%)   0.76 ( 46%) 789.43 ( 78%)   101M (  7%)
 tree slp vectorization             :  13.97 (  1%)   0.04 (  2%)  14.01 (  1%)   225M ( 15%)
 expand vars                        :  92.66 (  9%)   0.00 (  0%)  92.72 (  9%)    63M (  4%)
 TOTAL                              :1010.42          1.66       1012.46         1509M
1010.42user 1.73system 16:52.53elapsed 99%CPU (0avgtext+0avgdata 4764428maxresident)k
0inputs+0outputs (0major+1199966minor)pagefaults 0swaps

Surprisingly, the IPA fnsummary issue is -O1 only, but maybe it's an
accounting issue. perf with callgraph points to (if I interpret correctly)
the determine_known_aggregate_parts function which, while accounting for
alias queries done via get_continuation_for_phi, does not account for those
done by walking the VDEF chain itself. I'm testing a fix.
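
To illustrate the kind of accounting gap described above, here is a toy model in Python. It is not GCC code: the names Def, Budget, walk and make_chain are hypothetical, and the "limit" only loosely plays the role of a step limit such as --param ipa-max-aa-steps. The sketch assumes a walker that is supposed to stop after a fixed number of alias queries, and compares charging the budget only at PHI-like nodes with charging it for every query made along the linear chain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Def:
    """One virtual definition in a toy VDEF chain (hypothetical model)."""
    clobbers: bool = False        # does it clobber the reference we ask about?
    is_phi: bool = False          # does this node merge several paths?
    prev: Optional["Def"] = None  # previous definition on the chain


class Budget:
    """Tracks alias queries performed versus queries charged to the limit."""
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.queries = 0  # alias queries actually performed
        self.charged = 0  # alias queries counted against the limit


def walk(start: Optional[Def], budget: Budget, charge_linear: bool) -> Optional[Def]:
    """Walk the chain backwards until a clobber is found or the budget runs out.

    With charge_linear=False only PHI nodes are charged, so a long chain
    without PHIs is walked in full regardless of the limit.
    """
    node = start
    while node is not None:
        if budget.charged >= budget.limit:
            return None  # budget exhausted, give up
        budget.queries += 1  # one alias query for this definition
        if node.is_phi or charge_linear:
            budget.charged += 1
        if node.clobbers:
            return node
        node = node.prev
    return None


def make_chain(n: int) -> Optional[Def]:
    """Build a linear chain of n non-clobbering, non-PHI definitions."""
    node = None
    for _ in range(n):
        node = Def(prev=node)
    return node


if __name__ == "__main__":
    chain = make_chain(10000)
    unaccounted = Budget(limit=100)
    walk(chain, unaccounted, charge_linear=False)
    accounted = Budget(limit=100)
    walk(chain, accounted, charge_linear=True)
    print("PHI-only accounting:", unaccounted.queries, "queries")
    print("full accounting:    ", accounted.queries, "queries")
```

In this model, lowering the limit has no effect on the PHI-only variant for a chain without PHIs, which would be consistent with the observation that cutting down the step limit does not help.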