From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=cKl7=C4=suse.de=rguenther@sourceware.org>
Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28])
	by sourceware.org (Postfix) with ESMTPS id 101613858D32
	for <gcc-patches@gcc.gnu.org>; Mon, 10 Jul 2023 09:29:30 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 101613858D32
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
	by smtp-out1.suse.de (Postfix) with ESMTP id 3FCA521B88;
	Mon, 10 Jul 2023 09:29:29 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1688981369; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=SNQdRQH+MbvSIfU1I5Qt4i7z5zEZx0JfdqJvOeJyCkc=;
	b=HPxXJz42Mp+mVa1nrZon18NbT2WGAmbd1i3//KQHDRbW6OBBcFH3j5enBclKORVIAr80Mr
	dptoT6eBa26TSC9u9Obo0EffZscfOTEiQaL0tqzQYrmTucFazWDjBNQiPHvFzt4+dsq7nY
	fvDL1xtBv0BpUmpLAZBPDCj0iVoUxIE=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1688981369;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=SNQdRQH+MbvSIfU1I5Qt4i7z5zEZx0JfdqJvOeJyCkc=;
	b=XhgBzp9KBejZncWuT0jh7O9T5NrrQ9eMq36xF8ZN29lMCA7p3igMw0P3RJx/VhGa88btAX
	GzkXwRg1oegAeeCA==
Received: from wotan.suse.de (wotan.suse.de [10.160.0.1])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by relay2.suse.de (Postfix) with ESMTPS id 286332C142;
	Mon, 10 Jul 2023 09:29:29 +0000 (UTC)
Date: Mon, 10 Jul 2023 09:29:29 +0000 (UTC)
From: Richard Biener <rguenther@suse.de>
To: Jan Hubicka <hubicka@ucw.cz>
cc: Tamar Christina <tamar.christina@arm.com>, gcc-patches@gcc.gnu.org, 
    nd@arm.com, jlaw@ventanamicro.com
Subject: Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on
 multiple-exits
In-Reply-To: <ZKvOJZD83UUXTSnE@kam.mff.cuni.cz>
Message-ID: <nycvar.YFH.7.77.849.2307100925260.4723@jbgna.fhfr.qr>
References: <ZJw5BcegL6zob2rC@arm.com> <nycvar.YFH.7.77.849.2307041137250.4723@jbgna.fhfr.qr> <ZKQzSJcEUL5SH0KG@kam.mff.cuni.cz> <ZKbQ7DE2cxU1hA96@kam.mff.cuni.cz> <nycvar.YFH.7.77.849.2307070557080.4723@jbgna.fhfr.qr> <ZKgC/9WHxvuD3qdA@kam.mff.cuni.cz>
 <nycvar.YFH.7.77.849.2307100654500.4723@jbgna.fhfr.qr> <ZKvOJZD83UUXTSnE@kam.mff.cuni.cz>
User-Agent: Alpine 2.22 (LSU 394 2020-01-19)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-5.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Mon, 10 Jul 2023, Jan Hubicka wrote:

> > On Fri, 7 Jul 2023, Jan Hubicka wrote:
> > 
> > > > 
> > > > Looks good, but I wonder what we can do to at least make the
> > > > multiple exit case behave reasonably?  The vectorizer keeps track
> > > 
> > > > of a "canonical" exit, would it be possible to pass in the main
> > > > exit edge and use that instead of single_exit (), would other
> > > > exits then behave somewhat reasonable or would we totally screw
> > > > things up here?  That is, the "canonical" exit would be the
> > > > counting exit while the other exits are on data driven conditions
> > > > and thus wouldn't change probability when we reduce the number
> > > > of iterations(?)
> > > 
> > > I can add canonical_exit parameter and make the function to direct flow
> > > to it if possible.  However overall I think fixup depends on what
> > > transformation led to the change.
> > 
> > I think the vectorizer knows there's a single counting IV and all
> > other exits are dependent on data processed, so the scaling the
> > vectorizer just changes the counting IV.  So I think it makes
> > sense to pass that exit to the function in all cases.
> 
> It really seems to me that vectorized loop is like N loops happening 
> in parallel, so the probabilities of alternative exits grows as well.
> But canonical exit is right thing to do for prologues - here we really
> add extra conditions to the iteration counting exit.
> > 
> > > Assuming that vectorizer did no prologues and apilogues and we
> > > vectorized with factor N, then I think the update could be done more
> > > specifically as follows.
> > > 
> > > We know that header block count dropped by 4. So we can start from that
> > > and each time we reach basic block with exit edge, we know the original
> > > count of the edge.  This count is unchanged, so one can rescale
> > > probabilities out of that BB accordingly.  If loop has no inner loops,
> > > we can just walk the body in RPO and propagate scales downwards and we
> > > sould arrive to right result
> > 
> > That should work for alternate exits as well, no?
> Yes, i think it could omstly work for acyclic bodies. I ended up
> implementing a special case of this for loop-ch in order to handle
> corectly loop invariant conditionals.  Will send patch after some
> cleanups. (There seems to be more loop invariant conditionals in real
> code than I would tought)
> 
> Tampering only with loop exit probabilities is not always enought.
> If you have:
>   while (1)
>     if (test1)
>       {
>         if (test2)
> 	  break;
>       }
> increasing count of exit may require increasing probablity of the outer
> conditional.   Do we support this in vectorization at all and if so, do
> we know something here?

Tamar would need to answer this but without early break vectorization
the if-conversion pass will flatten everything and I think even early
breaks will be in the end a non-nested sequence of BBs with
exit conds at the end (or a loopback branch).

Note the (scalar) epilogue is copied from the original scalar loop
body so it doesn't see any if-conversion.

> For example if the test1 is triggered if test1 is true in one of
> iterations packed togehter, its probability also increases by
> vectorization factor.  
> 
> We run into this in peeling i.e. when we prove that test1 will trigger
> undefined behaviour after one or two iterations but the orignal
> esimtated profile believes in higher iteration count.  I added special
> case for this yesterday to avoid turning if (test2) to 100% in this case
> as that triggers strange codegen in some of fortran testcases.
> 
> We also can have
>   while (1)
>     while (test1)
>       {
>         if (test2)
> 	  break;
>       }
> Which is harder because changing probability of test2 affects number
> of iteraitons of the inner loop.  So I am giving up on this.
> I think currently it happens mostly with unlooping.

What I saw most wrecking the profile is when passes turn
if (cond) into if (0/1) leaving the CFG adjustment to CFG cleanup
which then simply deletes one of the outgoing edges without doing
anything to the (guessed) profile.

> > > I originally added the bound parameter to handle prologues/epilogues
> > > which gets new artificial bound.  In prologue I think you are right that
> > > the flow will be probably directed to the conditional counting
> > > iterations.
> > 
> > I suppose we'd need to scale both main and epilogue together since
> > the epilogue "steals" from the main loop counts.  Likewise if there's
> > a skip edge around the vector loop.  I think currently we simply
> > set the edge probability of those skip conds rather than basing
> > this off the niter values they work on.  Aka if (niter < VF) goto
> > epilogue; do {} while (niter / VF); epilogue: do {} while (niter);
> > 
> > There's also the cost model which might require niter > VF to enter
> > the main loop body.
> 
> I think I mostly understand this since we was playing with it with Ondra's
> histograms (that can be used to get some of the unknowns in the
> transformation right). The unknowns (how many times we end up jumpig to
> epilogue, for instance, probably can't be reasonably well guessed if we
> do not know the loop histogram which currently we know only if we prove
> that loop has constant number of iterations.  So I am trying to get
> right at least this case first.
> 
> Theoretically correct approach would be to first determine entry counts
> of prologue and epilogue, then produce what we believe to be correct
> profile of those and subtract it from the main loop profile updating
> also probabilities in basic blocks where we did nontrivial changes while
> updating prologs/epilogs. Finally scale down the main loop profile and
> increase exit probabilities.