From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from nikam.ms.mff.cuni.cz (nikam.ms.mff.cuni.cz [195.113.20.16]) by sourceware.org (Postfix) with ESMTPS id 0A05F3858D32 for ; Mon, 10 Jul 2023 09:23:51 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0A05F3858D32 Authentication-Results: sourceware.org; dmarc=fail (p=none dis=none) header.from=ucw.cz Authentication-Results: sourceware.org; spf=none smtp.mailfrom=kam.mff.cuni.cz Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202) id E3175281D2F; Mon, 10 Jul 2023 11:23:49 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ucw.cz; s=gen1; t=1688981029; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=aCDlIgPq6pQTsRv1TS5dAGgZ2giQ2gfu0UJu7DDC0LY=; b=cimY1uD4Ycq8h/Aa70uvCFULbG844fCBnmFU/EQnvjqn219TBc6bGEOLq+Izrv0SsewVJV hU0bIOzOgLJ2NcymBJ97QKM9yCcaQ/wAfxio1drC7oOl8hfI14dVcgcgAJ56QeLMege1fV S6515dqCvF+uX+DzqEmHT8UKhjo9LE8= Date: Mon, 10 Jul 2023 11:23:49 +0200 From: Jan Hubicka To: Richard Biener Cc: Tamar Christina , gcc-patches@gcc.gnu.org, nd@arm.com, jlaw@ventanamicro.com Subject: Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-5.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: > On Fri, 7 Jul 2023, Jan Hubicka wrote: > > > > > > > Looks good, but I wonder what we can do to at least make the > > > multiple exit case behave reasonably? The vectorizer keeps track > > > > > of a "canonical" exit, would it be possible to pass in the main > > > exit edge and use that instead of single_exit (), would other > > > exits then behave somewhat reasonable or would we totally screw > > > things up here? That is, the "canonical" exit would be the > > > counting exit while the other exits are on data driven conditions > > > and thus wouldn't change probability when we reduce the number > > > of iterations(?) > > > > I can add canonical_exit parameter and make the function to direct flow > > to it if possible. However overall I think fixup depends on what > > transformation led to the change. > > I think the vectorizer knows there's a single counting IV and all > other exits are dependent on data processed, so the scaling the > vectorizer just changes the counting IV. So I think it makes > sense to pass that exit to the function in all cases. It really seems to me that vectorized loop is like N loops happening in parallel, so the probabilities of alternative exits grows as well. But canonical exit is right thing to do for prologues - here we really add extra conditions to the iteration counting exit. > > > Assuming that vectorizer did no prologues and apilogues and we > > vectorized with factor N, then I think the update could be done more > > specifically as follows. > > > > We know that header block count dropped by 4. So we can start from that > > and each time we reach basic block with exit edge, we know the original > > count of the edge. This count is unchanged, so one can rescale > > probabilities out of that BB accordingly. If loop has no inner loops, > > we can just walk the body in RPO and propagate scales downwards and we > > sould arrive to right result > > That should work for alternate exits as well, no? Yes, i think it could omstly work for acyclic bodies. I ended up implementing a special case of this for loop-ch in order to handle corectly loop invariant conditionals. Will send patch after some cleanups. (There seems to be more loop invariant conditionals in real code than I would tought) Tampering only with loop exit probabilities is not always enought. If you have: while (1) if (test1) { if (test2) break; } increasing count of exit may require increasing probablity of the outer conditional. Do we support this in vectorization at all and if so, do we know something here? For example if the test1 is triggered if test1 is true in one of iterations packed togehter, its probability also increases by vectorization factor. We run into this in peeling i.e. when we prove that test1 will trigger undefined behaviour after one or two iterations but the orignal esimtated profile believes in higher iteration count. I added special case for this yesterday to avoid turning if (test2) to 100% in this case as that triggers strange codegen in some of fortran testcases. We also can have while (1) while (test1) { if (test2) break; } Which is harder because changing probability of test2 affects number of iteraitons of the inner loop. So I am giving up on this. I think currently it happens mostly with unlooping. > > > I originally added the bound parameter to handle prologues/epilogues > > which gets new artificial bound. In prologue I think you are right that > > the flow will be probably directed to the conditional counting > > iterations. > > I suppose we'd need to scale both main and epilogue together since > the epilogue "steals" from the main loop counts. Likewise if there's > a skip edge around the vector loop. I think currently we simply > set the edge probability of those skip conds rather than basing > this off the niter values they work on. Aka if (niter < VF) goto > epilogue; do {} while (niter / VF); epilogue: do {} while (niter); > > There's also the cost model which might require niter > VF to enter > the main loop body. I think I mostly understand this since we was playing with it with Ondra's histograms (that can be used to get some of the unknowns in the transformation right). The unknowns (how many times we end up jumpig to epilogue, for instance, probably can't be reasonably well guessed if we do not know the loop histogram which currently we know only if we prove that loop has constant number of iterations. So I am trying to get right at least this case first. Theoretically correct approach would be to first determine entry counts of prologue and epilogue, then produce what we believe to be correct profile of those and subtract it from the main loop profile updating also probabilities in basic blocks where we did nontrivial changes while updating prologs/epilogs. Finally scale down the main loop profile and increase exit probabilities. Honza