From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by sourceware.org (Postfix) with ESMTPS id 8EAF03858404 for ; Wed, 10 Nov 2021 07:52:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8EAF03858404 Received: from mail-lj1-f200.google.com (mail-lj1-f200.google.com [209.85.208.200]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-541-kfm_esKnMNiJDQSyUpuLBw-1; Wed, 10 Nov 2021 02:52:54 -0500 X-MC-Unique: kfm_esKnMNiJDQSyUpuLBw-1 Received: by mail-lj1-f200.google.com with SMTP id h21-20020a05651c159500b00219132ab503so148747ljq.12 for ; Tue, 09 Nov 2021 23:52:53 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=R67w4K+zavbPrQgTviN/6jxJvTlAtovD6A2GE7VzjIM=; b=YFUQm+EvbfcAcwgDjNc4qdjBxlE9EGBMzC7/jaJoh8htDHgzu1TwF/CP2MY/yE6yZj vdNEO13lSGc67XRKzwIne3BrL9N7kZuxUs6i32I/4mvbpUrMDs2n6uIO9tFqKvXgpCNC vVB4ua6y+sFyBQFVRAcu8PE9feiIMWc/kggF2EKhJoAhMw97vANYSX36Dpa4LHmQsAVK KQ23sLv4qwq0bT8b9hR7AIos8c9Zlk25hpqP8fMV5SEujW8D6yeRo9aPr29FURhYhHw7 Gej2hhPAllJZlnUpPlP6m2jbQ2NIrSVkxyoB/mRj5z77gZtznMD+zUetI+YsLjMExsQq Uj0A== X-Gm-Message-State: AOAM532rqx99sVPcXoVpE5Ax8Uhe1FjBzwTuGpBj/ACqNe8VwodM98Z7 wR4WtqEyXYgGGrYnUyuobwGTYiFULeIYS3gCQTj8xGFjFGWBDwfs+HJ8LHuvUUGjmWOwxyQGjXC 70RvnHdyarH37u6ca8a4Kgpaxh7GKpPAIOg== X-Received: by 2002:a05:6512:3763:: with SMTP id z3mr12443602lft.315.1636530772381; Tue, 09 Nov 2021 23:52:52 -0800 (PST) X-Google-Smtp-Source: ABdhPJwSpswU7BHEwVt/8naXiFWqRn193J4i70P8D7pGVCjStTpzlU1fUU4mLdQEri9uBsDgCiCAhvbXWVLAjG5ZJEA= X-Received: by 2002:a05:6512:3763:: with SMTP id z3mr12443568lft.315.1636530772010; Tue, 09 Nov 2021 23:52:52 -0800 (PST) MIME-Version: 1.0 References: <7791e850-f74d-8c1c-f67c-e02f3f6e007d@redhat.com> <5c6c91d4-ed8b-8d98-2cd9-bafc84e6f2a4@suse.cz> <8da24825-19ec-56a6-a68c-5c37c7acc3e1@redhat.com> <082337e8-cb78-f7b8-fee8-3f94eaf2a5f9@redhat.com> In-Reply-To: <082337e8-cb78-f7b8-fee8-3f94eaf2a5f9@redhat.com> From: Aldy Hernandez Date: Wed, 10 Nov 2021 08:52:40 +0100 Message-ID: Subject: Re: [PATCH] Loop unswitching: support gswitch statements. To: Andrew MacLeod Cc: Richard Biener , =?UTF-8?Q?Martin_Li=C5=A1ka?= , GCC Patches X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-6.4 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Nov 2021 07:52:58 -0000 On Tue, Nov 9, 2021 at 5:41 PM Andrew MacLeod wrote: > > On 11/9/21 8:37 AM, Richard Biener wrote: > > On Mon, Nov 8, 2021 at 8:45 PM Andrew MacLeod wro= te: > >> On 11/8/21 10:05 AM, Martin Li=C5=A1ka wrote: > >>> On 9/28/21 22:39, Andrew MacLeod wrote: > >>>> In Theory, modifying the IL should be fine, it happens already in > >>>> places, but its not extensively tested under those conditions yet. > >>> Hello Andrew. > >>> > >>> I've just tried using a global gimple_ranger and it crashes when loop > >>> unswitching duplicates > >>> some BBs. > >>> > >>> Please try the attached patch for: > >> hey Martin, > >> > >> try using this in your tree. Since nothing else is using a growing BB > >> right now, I'll let you work with it and see if everything works as > >> expected before checking it in, just in case we need more tweaking. > >> With this, > >> > >> make RUNTESTFLAGS=3Ddg.exp=3Dloop-unswitch*.c check-gcc > >> > >> runs clean. > >> > >> > >> basically, I tried to grow it by either a factor of 10% for the curren= t > >> BB size when the grow is requested, or some double the needed extra > >> size, or 128... whichever value is "maximum" That means it shoudnt = be > >> asking for tooo much each time, but also not a minimum amount. > >> > >> Im certainly open to suggestion on how much to grow it each time. > >> Note the vector being grown is ONLY fo the SSA_NAme being asked for.. = so > >> it really an on-demand thing just for specific names, in your case, > >> mostly just the switch index. > >> > >> Let me know how this works for you, and if you have any other issues. > > So I think in the end we shouldn't need the growing. Ideally we'd do a= ll > > the analysis before the first transform, but for that we'd need ranger = to > > be able to "simplify" conditions based on a known true/false predicate > > that's not yet in the IL. Consider > > > > for (;;) > > { > > if (invariant < 3) // A > > { > > ... > > } > > if (invariant < 5) // B > > { > > ... > > } > > } > > > > unswitch analysis will run into the condition 'A' and determine the loo= p > > can be unswitched with the condition 'invariant < 3'. To be able to > > perform cost assessment and to avoid redundant unswitching we > > want to determine that if we unswitch with 'invariant < 3' being > > true then the condition at 'B' is true as well before actually insertin= g > > the if (invariant < 3) outside of the loop. > > > > So I'm thinking of assigning a gimple_uid to each condition we want to > > unswitch on and have an array indexed by the uid with meta-data on > > the unswitch opportunity, the "related" conditions could be marked with > > the same uid (or some other), and the folding result recorded so that > > at transform time we can just do the appropriate replacement without > > invoking ranger again. > > > > Now, but how do we arrange for the ranger analysis here? > > well, i think there are multiple ways we could do this. are you > always doing this on > > if (invariant < constant) or might it be another ssa-name? > > because if its always a constant, you could ask for the range of > invariant at if (invariant < 3) when you unswitch it and it will > provide you with [MIN, 2]. > > when you look at if (invariant < 5), you can try folding that stmt > using the range you know already from above.. theres an API to > fold_stmt() independent from ranger (in gimple-range-fold.h) which lets > you supply ranges for ssa_names in the order they are found in the stmt, > > bool fold_range (irange &r, gimple *s, irange &r1); > > so putting it together, you can do something like: > > // decide to unswitch this, as for the range of invariant on the TRUE edg= e: > s1 =3D first_stmt : if (invariant < 3) > range_of_expr (&ivrange, invariant, TRUE_EDGE) // This will set > ivrange to [MIN, 2].. its value on the TRUE edge > > // Now we come to the second if, we try to fold it using the range from > the first stmt. if fold_stmt returns true, it mean stmt2_range will > have the result of folding the stmt. only one range is supplied, so it > will apply ivrange [MIN, 2] to the first ssa-name it encounters in the > stmt, and [MIN, 2] < 5 so it will return bool [1,1] for the range of > the stmt. > > s2 =3D second_stmt : if (invariant < 5) > if (fold_range (&stmt2_range, second_stmt, ivrange) && > stmt2_range.singleton_p () > { > if (!stmt2_range.zero_p ()) > // result is not zero, so that means this stmt will always > be true given the range in ivrange substituted for "invariant", > > There is a fair amount of flexibility on exactly what you can do, > depending on how complex you want to get. > > In some ways, you might be able to utilize Aldy's path-ranger that he > uses in the threader.. and then it wouldn't matter how complex the > conditions are, if you decide to unswitch the first condition, then > you'd create a path thru the loop using the true edge and it would > automatically pick up *all* the ranges that are true on that edge and > apply them to the other conditions in the path you create. In theory, > without doing anything else, the the second condition should > automatically fold to [1,1] . My guess is its too late in the cycle to > be fooling around with that, because Im not sure if the path ranger is > dynamically adjustable enough to build paths on the fly like that, or > whether its more focused on its thread specific bits.. It may also work > bottom to top, Im not sure. But it also includes relations that turn > out to be true and false as well as it goes. If I understand correctly, this could be easily done by the solver. The API is simple, you feed it a sequence of blocks and a bitmap of SSAs you're interested in and just ask it things with range_of_stmt / range_of_expr. The returned values would be the ranges on exit from the last block. For example, if you have a sequence of BB2->BB3->BB4->?? and want to know the answer to the final conditional in BB4, you would do gimple_ranger ranger; path_range_query path (ranger, /*resolve=3D*/true); // Populate blocks along the path in reverse order. auto_vec bbs; bbs.safe_push(bb4); bbs.safe_push(bb3); bss.safe_push (bb2); // Precompute ranges along the path. path.compute_ranges (bbs, ranger.gori ().imports (bb4)); // Ask for ranges at the end of the path. path.range_of_stmt (r, last_stmt (bb4)); or in case the last statement is a switch: tree index =3D gimple_switch_index (last_stmt (bb4)); path.range_of_expr (r, index, last_stmt (bb4)); We may have to clean up some things for you: a) The call to precompute_ranges starts with the list of SSAs that are of interest to the BB4 (call the gor().imports). We should probably hide all that inside the path solver. For that matter, there are some other items that get put in that list by the backward threader. We could move that as well to the solver. b) The blocks are in reverse order. That's just a quirk of its use from the backward threader. c) Right now, you'd have to call compute_ranges() with a new set of blocks if they change. Since the solver works top to bottom, we could probably add blocks to the bottom of the sequence without having to recalculate things. The functionality is all there. It just requires a little tweaking to separate it a bit more from its only user, the backward threader. As Andrew says, it may be too late to start playing with the solver, especially if the ranger can do what you want. If you it's too cumbersome, I could clean up the above 3 things. Aldy > > Regardless, we could work on enhancements that would function in this way= . > > > > > We might also somehow want to remember that on the > > 'invariant < 3' =3D=3D false copy of the loop there's still the > > unswitching opportunity on 'invariant < 5', but not on the > > 'invariant < 5' =3D=3D true copy. > > tracking the same thing for the false edge would give you a range of [3, > MAX] when thats applied to invariant < 5, it'll return a range of bool > [0,1], and singleton_p() will be false so you would know it doesnt > resolve in that case > > Instead of tracking the ranges, if you are stashing the meta data which > includes stmt, you can always pick up the ranges you want as you need > them by making the range_of_expr() query. on the appropriate outgoing > edge from the block with the condition > > ie, to get the range from the first stmt, you can simply ask for the > range-of-expr() on whichever edge you happen to care about (true or > false) just when you want it. You could stash it in the meta data or > just re-ask when you need it depending on which decision was made. > > So you would just need to be able to key off the ssa-name in the > condition that you unswitched I guess? > > Andrew >