From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [IPv6:2001:67c:2178:6::1c]) by sourceware.org (Postfix) with ESMTPS id 03B4E3857704 for ; Tue, 11 Jul 2023 10:31:12 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 03B4E3857704 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 3C43621ECD; Tue, 11 Jul 2023 10:31:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1689071471; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=ft+5/F1r43Zu8NWNiSM3wXt7H+iABFURctoVkjXwu18=; b=0foYFbRw+d76O/wdaUxPzOpFNWA5vPgI6IuWEL346BHfENt0Dr8vIgnG5LxpeRGVUKhH6W lrs1Z0f8vvFMzZFLQxNG97DhT1igVFb1Ol0nPkhdIbpRj+zFM27UMJmfdXXiKP6ByI08/B bif3i8qVZR1h/2AY7uwV9hYCZsAyRtg= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1689071471; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=ft+5/F1r43Zu8NWNiSM3wXt7H+iABFURctoVkjXwu18=; b=Wc4S5jWEN7uPgm2SE1m+Nx4sJHpUCGMxp2gs3NURqEBnOgoVP5Dwo4H6ll2QcJDhfUVAuQ PZcNXrW7RFYTWuDw== Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 278612C142; Tue, 11 Jul 2023 10:31:11 +0000 (UTC) Date: Tue, 11 Jul 2023 10:31:11 +0000 (UTC) From: Richard Biener To: Jan Hubicka cc: Tamar Christina , gcc-patches@gcc.gnu.org, nd@arm.com, jlaw@ventanamicro.com Subject: Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits In-Reply-To: Message-ID: References: User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-5.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, 11 Jul 2023, Jan Hubicka wrote: > > > > What I saw most wrecking the profile is when passes turn > > if (cond) into if (0/1) leaving the CFG adjustment to CFG cleanup > > which then simply deletes one of the outgoing edges without doing > > anything to the (guessed) profile. > > Yep, I agree that this is disturbing. At the cfg cleanup time one can > hardly do anything useful, since the knowledge of transform that caused > profile inconsistency is forgotten. however I think it is not a complete > disaster. > > With profile feedback the most common case of this happening is a > situation where we duplicated code (by inlining, unrolling etc.) into a > context where it behaves differently then the typical behaviour > represented by the profile. > > So if one ends up zapping edge with large probability, one also knows > that the code being optimized does not exhibit typical behaviour from > the train run and thus is not very hot. So profile inconsistency should > not affect performance that much. > > So doing nothing is IMO may end up being safter than trying to get the > in/out counts right without really known what is going on. > > This is mostly about the scenario "constant propagated this conditional > and profile disagrees with me". There are other cases where update is > IMO important. i.e. vectorizer forgetting to cap #of iterations of > epilogue may cause issue since the epilogue loop looks more frequent > than the main vectorized loop and it may cause IRA to insert spilling > into it or so. > > When we duplicate we have chance to figure out profile updates. > Also we may try to get as much as possible done early. > I think we should again do loop header copying that does not expand code > at early opts again. I have some more plans on cleaning up loop-ch and > then we can give it a try. > > With guessed profile we always have option to re-do the propagation. > There is TODO_rebuild_frequencies for that which we do after inlining. > This is mostly to handle possible overflows on large loops nests > constructed by inliner. > > We can re-propagate once again after late cleanup passes. Looking at the > queue, we have: > > NEXT_PASS (pass_remove_cgraph_callee_edges); > /* Initial scalar cleanups before alias computation. > They ensure memory accesses are not indirect wherever possible. */ > NEXT_PASS (pass_strip_predict_hints, false /* early_p */); > NEXT_PASS (pass_ccp, true /* nonzero_p */); > /* After CCP we rewrite no longer addressed locals into SSA > form if possible. */ > NEXT_PASS (pass_object_sizes); > NEXT_PASS (pass_post_ipa_warn); > /* Must run before loop unrolling. */ > NEXT_PASS (pass_warn_access, /*early=*/true); > NEXT_PASS (pass_complete_unrolli); > ^^^^ here we care about profile > NEXT_PASS (pass_backprop); > NEXT_PASS (pass_phiprop); > NEXT_PASS (pass_forwprop); > /* pass_build_alias is a dummy pass that ensures that we > execute TODO_rebuild_alias at this point. */ > NEXT_PASS (pass_build_alias); > NEXT_PASS (pass_return_slot); > NEXT_PASS (pass_fre, true /* may_iterate */); > NEXT_PASS (pass_merge_phi); > NEXT_PASS (pass_thread_jumps_full, /*first=*/true); > ^^^^ here > > By now we did CCP and FRE so we likely optimized out most of constant > conditionals exposed by inline. So maybe we should simply delay re-propagation of the profile? I think cunrolli doesn't so much care about the profile - cunrolli is (was) about abstraction removal. Jump threading should be the first pass to care. Richard.