From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from nikam.ms.mff.cuni.cz (nikam.ms.mff.cuni.cz [195.113.20.16]) by sourceware.org (Postfix) with ESMTPS id 5D9AA3858D20 for ; Tue, 11 Jul 2023 09:28:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5D9AA3858D20 Authentication-Results: sourceware.org; dmarc=fail (p=none dis=none) header.from=ucw.cz Authentication-Results: sourceware.org; spf=none smtp.mailfrom=kam.mff.cuni.cz Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202) id C14CE281D37; Tue, 11 Jul 2023 11:28:16 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ucw.cz; s=gen1; t=1689067696; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=3WQw72Sn6e+gPULj18XyVW1enQ4+vmENHqkNXpxeieM=; b=J1uojo/Btm73GLYVESsVDBh9QDflWnWHZt5u+qBOD2xXQV+y3RuDdZR6W7KRKRZerYT2vb Cat/NBJ3NqhNS/A04UJJqI3OIyhc7EEMiZR497GZcyZ77xevILZVwg23SDfvJa0qBT+6EB SxraiEwWy3RwrUsub3HBaDWpebUhAM4= Date: Tue, 11 Jul 2023 11:28:16 +0200 From: Jan Hubicka To: Richard Biener Cc: Tamar Christina , gcc-patches@gcc.gnu.org, nd@arm.com, jlaw@ventanamicro.com Subject: Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-5.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: > > What I saw most wrecking the profile is when passes turn > if (cond) into if (0/1) leaving the CFG adjustment to CFG cleanup > which then simply deletes one of the outgoing edges without doing > anything to the (guessed) profile. Yep, I agree that this is disturbing. At the cfg cleanup time one can hardly do anything useful, since the knowledge of transform that caused profile inconsistency is forgotten. however I think it is not a complete disaster. With profile feedback the most common case of this happening is a situation where we duplicated code (by inlining, unrolling etc.) into a context where it behaves differently then the typical behaviour represented by the profile. So if one ends up zapping edge with large probability, one also knows that the code being optimized does not exhibit typical behaviour from the train run and thus is not very hot. So profile inconsistency should not affect performance that much. So doing nothing is IMO may end up being safter than trying to get the in/out counts right without really known what is going on. This is mostly about the scenario "constant propagated this conditional and profile disagrees with me". There are other cases where update is IMO important. i.e. vectorizer forgetting to cap #of iterations of epilogue may cause issue since the epilogue loop looks more frequent than the main vectorized loop and it may cause IRA to insert spilling into it or so. When we duplicate we have chance to figure out profile updates. Also we may try to get as much as possible done early. I think we should again do loop header copying that does not expand code at early opts again. I have some more plans on cleaning up loop-ch and then we can give it a try. With guessed profile we always have option to re-do the propagation. There is TODO_rebuild_frequencies for that which we do after inlining. This is mostly to handle possible overflows on large loops nests constructed by inliner. We can re-propagate once again after late cleanup passes. Looking at the queue, we have: NEXT_PASS (pass_remove_cgraph_callee_edges); /* Initial scalar cleanups before alias computation. They ensure memory accesses are not indirect wherever possible. */ NEXT_PASS (pass_strip_predict_hints, false /* early_p */); NEXT_PASS (pass_ccp, true /* nonzero_p */); /* After CCP we rewrite no longer addressed locals into SSA form if possible. */ NEXT_PASS (pass_object_sizes); NEXT_PASS (pass_post_ipa_warn); /* Must run before loop unrolling. */ NEXT_PASS (pass_warn_access, /*early=*/true); NEXT_PASS (pass_complete_unrolli); ^^^^ here we care about profile NEXT_PASS (pass_backprop); NEXT_PASS (pass_phiprop); NEXT_PASS (pass_forwprop); /* pass_build_alias is a dummy pass that ensures that we execute TODO_rebuild_alias at this point. */ NEXT_PASS (pass_build_alias); NEXT_PASS (pass_return_slot); NEXT_PASS (pass_fre, true /* may_iterate */); NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_thread_jumps_full, /*first=*/true); ^^^^ here By now we did CCP and FRE so we likely optimized out most of constant conditionals exposed by inline. Honza