From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=PhnD=C5=kam.mff.cuni.cz=hubicka@sourceware.org>
Received: from nikam.ms.mff.cuni.cz (nikam.ms.mff.cuni.cz [195.113.20.16])
	by sourceware.org (Postfix) with ESMTPS id 5D9AA3858D20
	for <gcc-patches@gcc.gnu.org>; Tue, 11 Jul 2023 09:28:18 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5D9AA3858D20
Authentication-Results: sourceware.org; dmarc=fail (p=none dis=none) header.from=ucw.cz
Authentication-Results: sourceware.org; spf=none smtp.mailfrom=kam.mff.cuni.cz
Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202)
	id C14CE281D37; Tue, 11 Jul 2023 11:28:16 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ucw.cz; s=gen1;
	t=1689067696;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=3WQw72Sn6e+gPULj18XyVW1enQ4+vmENHqkNXpxeieM=;
	b=J1uojo/Btm73GLYVESsVDBh9QDflWnWHZt5u+qBOD2xXQV+y3RuDdZR6W7KRKRZerYT2vb
	Cat/NBJ3NqhNS/A04UJJqI3OIyhc7EEMiZR497GZcyZ77xevILZVwg23SDfvJa0qBT+6EB
	SxraiEwWy3RwrUsub3HBaDWpebUhAM4=
Date: Tue, 11 Jul 2023 11:28:16 +0200
From: Jan Hubicka <hubicka@ucw.cz>
To: Richard Biener <rguenther@suse.de>
Cc: Tamar Christina <tamar.christina@arm.com>, gcc-patches@gcc.gnu.org,
	nd@arm.com, jlaw@ventanamicro.com
Subject: Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on
 multiple-exits
Message-ID: <ZK0gsKhkcKUMlmtU@kam.mff.cuni.cz>
References: <ZJw5BcegL6zob2rC@arm.com>
 <nycvar.YFH.7.77.849.2307041137250.4723@jbgna.fhfr.qr>
 <ZKQzSJcEUL5SH0KG@kam.mff.cuni.cz>
 <ZKbQ7DE2cxU1hA96@kam.mff.cuni.cz>
 <nycvar.YFH.7.77.849.2307070557080.4723@jbgna.fhfr.qr>
 <ZKgC/9WHxvuD3qdA@kam.mff.cuni.cz>
 <nycvar.YFH.7.77.849.2307100654500.4723@jbgna.fhfr.qr>
 <ZKvOJZD83UUXTSnE@kam.mff.cuni.cz>
 <nycvar.YFH.7.77.849.2307100925260.4723@jbgna.fhfr.qr>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <nycvar.YFH.7.77.849.2307100925260.4723@jbgna.fhfr.qr>
X-Spam-Status: No, score=-5.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

> 
> What I saw most wrecking the profile is when passes turn
> if (cond) into if (0/1) leaving the CFG adjustment to CFG cleanup
> which then simply deletes one of the outgoing edges without doing
> anything to the (guessed) profile.

Yep, I agree that this is disturbing.  At the cfg cleanup time one can
hardly do anything useful, since the knowledge of transform that caused
profile inconsistency is forgotten.  however I think it is not a complete
disaster.

With profile feedback the most common case of this happening is a
situation where we duplicated code (by inlining, unrolling etc.) into a
context where it behaves differently then the typical behaviour
represented by the profile.

So if one ends up zapping edge with large probability, one also knows
that the code being optimized does not exhibit typical behaviour from
the train run and thus is not very hot.  So profile inconsistency should
not affect performance that much.

So doing nothing is IMO may end up being safter than trying to get the
in/out counts right without really known what is going on.

This is mostly about the scenario "constant propagated this conditional
and profile disagrees with me".  There are other cases where update is
IMO important.  i.e. vectorizer forgetting to cap #of iterations of
epilogue may cause issue since the epilogue loop looks more frequent
than the main vectorized loop and it may cause IRA to insert spilling
into it or so.

When we duplicate we have chance to figure out profile updates.
Also we may try to get as much as possible done early.
I think we should again do loop header copying that does not expand code
at early opts again.  I have some more plans on cleaning up loop-ch and
then we can give it a try.

With guessed profile we always have option to re-do the propagation.
There is TODO_rebuild_frequencies for that which we do after inlining.
This is mostly to handle possible overflows on large loops nests
constructed by inliner.  

We can re-propagate once again after late cleanup passes. Looking at the
queue, we have:

      NEXT_PASS (pass_remove_cgraph_callee_edges);
      /* Initial scalar cleanups before alias computation.
         They ensure memory accesses are not indirect wherever possible.  */
      NEXT_PASS (pass_strip_predict_hints, false /* early_p */);
      NEXT_PASS (pass_ccp, true /* nonzero_p */);
      /* After CCP we rewrite no longer addressed locals into SSA
         form if possible.  */
      NEXT_PASS (pass_object_sizes);
      NEXT_PASS (pass_post_ipa_warn);
      /* Must run before loop unrolling.  */
      NEXT_PASS (pass_warn_access, /*early=*/true);
      NEXT_PASS (pass_complete_unrolli);
^^^^ here we care about profile
      NEXT_PASS (pass_backprop);
      NEXT_PASS (pass_phiprop);
      NEXT_PASS (pass_forwprop);
      /* pass_build_alias is a dummy pass that ensures that we
         execute TODO_rebuild_alias at this point.  */
      NEXT_PASS (pass_build_alias);
      NEXT_PASS (pass_return_slot);
      NEXT_PASS (pass_fre, true /* may_iterate */);
      NEXT_PASS (pass_merge_phi);
      NEXT_PASS (pass_thread_jumps_full, /*first=*/true);
^^^^ here

By now we did CCP and FRE so we likely optimized out most of constant
conditionals exposed by inline.
Honza