From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=BjL+=C5=suse.de=rguenther@sourceware.org>
Received: from smtp-out1.suse.de (smtp-out1.suse.de [IPv6:2001:67c:2178:6::1c])
	by sourceware.org (Postfix) with ESMTPS id 03B4E3857704
	for <gcc-patches@gcc.gnu.org>; Tue, 11 Jul 2023 10:31:12 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 03B4E3857704
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
	by smtp-out1.suse.de (Postfix) with ESMTP id 3C43621ECD;
	Tue, 11 Jul 2023 10:31:11 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1689071471; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=ft+5/F1r43Zu8NWNiSM3wXt7H+iABFURctoVkjXwu18=;
	b=0foYFbRw+d76O/wdaUxPzOpFNWA5vPgI6IuWEL346BHfENt0Dr8vIgnG5LxpeRGVUKhH6W
	lrs1Z0f8vvFMzZFLQxNG97DhT1igVFb1Ol0nPkhdIbpRj+zFM27UMJmfdXXiKP6ByI08/B
	bif3i8qVZR1h/2AY7uwV9hYCZsAyRtg=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1689071471;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=ft+5/F1r43Zu8NWNiSM3wXt7H+iABFURctoVkjXwu18=;
	b=Wc4S5jWEN7uPgm2SE1m+Nx4sJHpUCGMxp2gs3NURqEBnOgoVP5Dwo4H6ll2QcJDhfUVAuQ
	PZcNXrW7RFYTWuDw==
Received: from wotan.suse.de (wotan.suse.de [10.160.0.1])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by relay2.suse.de (Postfix) with ESMTPS id 278612C142;
	Tue, 11 Jul 2023 10:31:11 +0000 (UTC)
Date: Tue, 11 Jul 2023 10:31:11 +0000 (UTC)
From: Richard Biener <rguenther@suse.de>
To: Jan Hubicka <hubicka@ucw.cz>
cc: Tamar Christina <tamar.christina@arm.com>, gcc-patches@gcc.gnu.org, 
    nd@arm.com, jlaw@ventanamicro.com
Subject: Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on
 multiple-exits
In-Reply-To: <ZK0gsKhkcKUMlmtU@kam.mff.cuni.cz>
Message-ID: <nycvar.YFH.7.77.849.2307111029430.4723@jbgna.fhfr.qr>
References: <ZJw5BcegL6zob2rC@arm.com> <nycvar.YFH.7.77.849.2307041137250.4723@jbgna.fhfr.qr> <ZKQzSJcEUL5SH0KG@kam.mff.cuni.cz> <ZKbQ7DE2cxU1hA96@kam.mff.cuni.cz> <nycvar.YFH.7.77.849.2307070557080.4723@jbgna.fhfr.qr> <ZKgC/9WHxvuD3qdA@kam.mff.cuni.cz>
 <nycvar.YFH.7.77.849.2307100654500.4723@jbgna.fhfr.qr> <ZKvOJZD83UUXTSnE@kam.mff.cuni.cz> <nycvar.YFH.7.77.849.2307100925260.4723@jbgna.fhfr.qr> <ZK0gsKhkcKUMlmtU@kam.mff.cuni.cz>
User-Agent: Alpine 2.22 (LSU 394 2020-01-19)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-5.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Tue, 11 Jul 2023, Jan Hubicka wrote:

> > 
> > What I saw most wrecking the profile is when passes turn
> > if (cond) into if (0/1) leaving the CFG adjustment to CFG cleanup
> > which then simply deletes one of the outgoing edges without doing
> > anything to the (guessed) profile.
> 
> Yep, I agree that this is disturbing.  At the cfg cleanup time one can
> hardly do anything useful, since the knowledge of transform that caused
> profile inconsistency is forgotten.  however I think it is not a complete
> disaster.
> 
> With profile feedback the most common case of this happening is a
> situation where we duplicated code (by inlining, unrolling etc.) into a
> context where it behaves differently then the typical behaviour
> represented by the profile.
> 
> So if one ends up zapping edge with large probability, one also knows
> that the code being optimized does not exhibit typical behaviour from
> the train run and thus is not very hot.  So profile inconsistency should
> not affect performance that much.
> 
> So doing nothing is IMO may end up being safter than trying to get the
> in/out counts right without really known what is going on.
> 
> This is mostly about the scenario "constant propagated this conditional
> and profile disagrees with me".  There are other cases where update is
> IMO important.  i.e. vectorizer forgetting to cap #of iterations of
> epilogue may cause issue since the epilogue loop looks more frequent
> than the main vectorized loop and it may cause IRA to insert spilling
> into it or so.
> 
> When we duplicate we have chance to figure out profile updates.
> Also we may try to get as much as possible done early.
> I think we should again do loop header copying that does not expand code
> at early opts again.  I have some more plans on cleaning up loop-ch and
> then we can give it a try.
> 
> With guessed profile we always have option to re-do the propagation.
> There is TODO_rebuild_frequencies for that which we do after inlining.
> This is mostly to handle possible overflows on large loops nests
> constructed by inliner.  
> 
> We can re-propagate once again after late cleanup passes. Looking at the
> queue, we have:
> 
>       NEXT_PASS (pass_remove_cgraph_callee_edges);
>       /* Initial scalar cleanups before alias computation.
>          They ensure memory accesses are not indirect wherever possible.  */
>       NEXT_PASS (pass_strip_predict_hints, false /* early_p */);
>       NEXT_PASS (pass_ccp, true /* nonzero_p */);
>       /* After CCP we rewrite no longer addressed locals into SSA
>          form if possible.  */
>       NEXT_PASS (pass_object_sizes);
>       NEXT_PASS (pass_post_ipa_warn);
>       /* Must run before loop unrolling.  */
>       NEXT_PASS (pass_warn_access, /*early=*/true);
>       NEXT_PASS (pass_complete_unrolli);
> ^^^^ here we care about profile
>       NEXT_PASS (pass_backprop);
>       NEXT_PASS (pass_phiprop);
>       NEXT_PASS (pass_forwprop);
>       /* pass_build_alias is a dummy pass that ensures that we
>          execute TODO_rebuild_alias at this point.  */
>       NEXT_PASS (pass_build_alias);
>       NEXT_PASS (pass_return_slot);
>       NEXT_PASS (pass_fre, true /* may_iterate */);
>       NEXT_PASS (pass_merge_phi);
>       NEXT_PASS (pass_thread_jumps_full, /*first=*/true);
> ^^^^ here
> 
> By now we did CCP and FRE so we likely optimized out most of constant
> conditionals exposed by inline.

So maybe we should simply delay re-propagation of the profile?  I
think cunrolli doesn't so much care about the profile - cunrolli
is (was) about abstraction removal.  Jump threading should be
the first pass to care.

Richard.