From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id 101613858D32 for ; Mon, 10 Jul 2023 09:29:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 101613858D32 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 3FCA521B88; Mon, 10 Jul 2023 09:29:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1688981369; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=SNQdRQH+MbvSIfU1I5Qt4i7z5zEZx0JfdqJvOeJyCkc=; b=HPxXJz42Mp+mVa1nrZon18NbT2WGAmbd1i3//KQHDRbW6OBBcFH3j5enBclKORVIAr80Mr dptoT6eBa26TSC9u9Obo0EffZscfOTEiQaL0tqzQYrmTucFazWDjBNQiPHvFzt4+dsq7nY fvDL1xtBv0BpUmpLAZBPDCj0iVoUxIE= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1688981369; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=SNQdRQH+MbvSIfU1I5Qt4i7z5zEZx0JfdqJvOeJyCkc=; b=XhgBzp9KBejZncWuT0jh7O9T5NrrQ9eMq36xF8ZN29lMCA7p3igMw0P3RJx/VhGa88btAX GzkXwRg1oegAeeCA== Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 286332C142; Mon, 10 Jul 2023 09:29:29 +0000 (UTC) Date: Mon, 10 Jul 2023 09:29:29 +0000 (UTC) From: Richard Biener To: Jan Hubicka cc: Tamar Christina , gcc-patches@gcc.gnu.org, nd@arm.com, jlaw@ventanamicro.com Subject: Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits In-Reply-To: Message-ID: References: User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-5.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, 10 Jul 2023, Jan Hubicka wrote: > > On Fri, 7 Jul 2023, Jan Hubicka wrote: > > > > > > > > > > Looks good, but I wonder what we can do to at least make the > > > > multiple exit case behave reasonably? The vectorizer keeps track > > > > > > > of a "canonical" exit, would it be possible to pass in the main > > > > exit edge and use that instead of single_exit (), would other > > > > exits then behave somewhat reasonable or would we totally screw > > > > things up here? That is, the "canonical" exit would be the > > > > counting exit while the other exits are on data driven conditions > > > > and thus wouldn't change probability when we reduce the number > > > > of iterations(?) > > > > > > I can add canonical_exit parameter and make the function to direct flow > > > to it if possible. However overall I think fixup depends on what > > > transformation led to the change. > > > > I think the vectorizer knows there's a single counting IV and all > > other exits are dependent on data processed, so the scaling the > > vectorizer just changes the counting IV. So I think it makes > > sense to pass that exit to the function in all cases. > > It really seems to me that vectorized loop is like N loops happening > in parallel, so the probabilities of alternative exits grows as well. > But canonical exit is right thing to do for prologues - here we really > add extra conditions to the iteration counting exit. > > > > > Assuming that vectorizer did no prologues and apilogues and we > > > vectorized with factor N, then I think the update could be done more > > > specifically as follows. > > > > > > We know that header block count dropped by 4. So we can start from that > > > and each time we reach basic block with exit edge, we know the original > > > count of the edge. This count is unchanged, so one can rescale > > > probabilities out of that BB accordingly. If loop has no inner loops, > > > we can just walk the body in RPO and propagate scales downwards and we > > > sould arrive to right result > > > > That should work for alternate exits as well, no? > Yes, i think it could omstly work for acyclic bodies. I ended up > implementing a special case of this for loop-ch in order to handle > corectly loop invariant conditionals. Will send patch after some > cleanups. (There seems to be more loop invariant conditionals in real > code than I would tought) > > Tampering only with loop exit probabilities is not always enought. > If you have: > while (1) > if (test1) > { > if (test2) > break; > } > increasing count of exit may require increasing probablity of the outer > conditional. Do we support this in vectorization at all and if so, do > we know something here? Tamar would need to answer this but without early break vectorization the if-conversion pass will flatten everything and I think even early breaks will be in the end a non-nested sequence of BBs with exit conds at the end (or a loopback branch). Note the (scalar) epilogue is copied from the original scalar loop body so it doesn't see any if-conversion. > For example if the test1 is triggered if test1 is true in one of > iterations packed togehter, its probability also increases by > vectorization factor. > > We run into this in peeling i.e. when we prove that test1 will trigger > undefined behaviour after one or two iterations but the orignal > esimtated profile believes in higher iteration count. I added special > case for this yesterday to avoid turning if (test2) to 100% in this case > as that triggers strange codegen in some of fortran testcases. > > We also can have > while (1) > while (test1) > { > if (test2) > break; > } > Which is harder because changing probability of test2 affects number > of iteraitons of the inner loop. So I am giving up on this. > I think currently it happens mostly with unlooping. What I saw most wrecking the profile is when passes turn if (cond) into if (0/1) leaving the CFG adjustment to CFG cleanup which then simply deletes one of the outgoing edges without doing anything to the (guessed) profile. > > > I originally added the bound parameter to handle prologues/epilogues > > > which gets new artificial bound. In prologue I think you are right that > > > the flow will be probably directed to the conditional counting > > > iterations. > > > > I suppose we'd need to scale both main and epilogue together since > > the epilogue "steals" from the main loop counts. Likewise if there's > > a skip edge around the vector loop. I think currently we simply > > set the edge probability of those skip conds rather than basing > > this off the niter values they work on. Aka if (niter < VF) goto > > epilogue; do {} while (niter / VF); epilogue: do {} while (niter); > > > > There's also the cost model which might require niter > VF to enter > > the main loop body. > > I think I mostly understand this since we was playing with it with Ondra's > histograms (that can be used to get some of the unknowns in the > transformation right). The unknowns (how many times we end up jumpig to > epilogue, for instance, probably can't be reasonably well guessed if we > do not know the loop histogram which currently we know only if we prove > that loop has constant number of iterations. So I am trying to get > right at least this case first. > > Theoretically correct approach would be to first determine entry counts > of prologue and epilogue, then produce what we believe to be correct > profile of those and subtract it from the main loop profile updating > also probabilities in basic blocks where we did nontrivial changes while > updating prologs/epilogs. Finally scale down the main loop profile and > increase exit probabilities.