From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id BA3F5385800E for ; Mon, 2 May 2022 07:17:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BA3F5385800E Received: from mail-oo1-f70.google.com (mail-oo1-f70.google.com [209.85.161.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-442-sxyzZUQyOBSP3FDin8yOWw-1; Mon, 02 May 2022 03:17:24 -0400 X-MC-Unique: sxyzZUQyOBSP3FDin8yOWw-1 Received: by mail-oo1-f70.google.com with SMTP id h7-20020a4aa287000000b0035ea2f18e54so7188716ool.10 for ; Mon, 02 May 2022 00:17:24 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=DldqjVEx9dD9Fo3X7hRSN7jFxcFZB4QHIA14VkDZrTo=; b=Cj6bJB7AsHZcBQkLNdrkS765CcVgbmsDpXYErfr4XommP34AP8CIXH7He0V7/rULTa ISBHjna87c7JBm1ipBeDfzCmByOsuptmmz81AdIV0LP9DOKQx2YagIE34VuTdMRNlP2y wiuo0wUNDTuh4+TybcbJrZvOdmSfilbGBs7b+9Z8QY9MQU2ZzrMgB8rPqdayMOf+xuo2 21atyRFjFIh6My0ulvfGZGYM5WU/u/DQC+69lZJFYNgDvtVwWM2bRCG9MKihs6fAsuHh +6vnZRuQI7nJrJmBRGK70UNDQoA8He+NfsgPotSiXnpmPKrC6Jr62yLbQP+f965BhFXh EJrg== X-Gm-Message-State: AOAM533ECRHhP1+W9Ndu56Etj/rFoiMPCLIevstbIxyE4awPvu1j/i15 lvPekBKHSxFfZ1Aa9IllLyZkInSPu/9+KMTiwBi73wJd3S3Y/dMBfS2YjWCA3akU0XIHsCPCk9T a7kqF+GFhyGoGrEfDZIN9OPgAuvwvkR5gBQ== X-Received: by 2002:a05:6870:c1cd:b0:e6:757f:9a04 with SMTP id i13-20020a056870c1cd00b000e6757f9a04mr4237869oad.265.1651475843581; Mon, 02 May 2022 00:17:23 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwvQQfbdplqjqJr2C11YnlC0VvoO2Ssfjfzf5dhWk/FiadxKXN/0TB7B8tIzvTJXXCsdDl3F0JEhaQVRA+H9b4= X-Received: by 2002:a05:6870:c1cd:b0:e6:757f:9a04 with SMTP id i13-20020a056870c1cd00b000e6757f9a04mr4237863oad.265.1651475843246; Mon, 02 May 2022 00:17:23 -0700 (PDT) MIME-Version: 1.0 References: <20220428163015.595263-1-aldyh@redhat.com> In-Reply-To: From: Aldy Hernandez Date: Mon, 2 May 2022 09:17:12 +0200 Message-ID: Subject: Re: [PATCH] Replace EVRP in DOM with ranger. To: Richard Biener Cc: Jeff Law , GCC patches , "MacLeod, Andrew" X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-5.9 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 May 2022 07:17:28 -0000 On Mon, May 2, 2022 at 8:30 AM Richard Biener wrote: > > On Fri, Apr 29, 2022 at 6:22 PM Aldy Hernandez wrote: > > > > On Fri, Apr 29, 2022 at 12:13 PM Richard Biener > > wrote: > > > > > > On Fri, Apr 29, 2022 at 11:53 AM Aldy Hernandez wrote: > > > > > > > > On Fri, Apr 29, 2022 at 9:09 AM Richard Biener > > > > wrote: > > > > > > > > > > On Thu, Apr 28, 2022 at 8:44 PM Jeff Law via Gcc-patches > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > On 4/28/2022 10:30 AM, Aldy Hernandez wrote: > > > > > > > [Jeff, this is the same patch I sent you last week with minor tweaks > > > > > > > to the commit message.] > > > > > > And I just dropped it into the tester. We should have the vast majority > > > > > > of targets tested by this time tomorrow. > > > > > > > > > > > > > > > > > > > > [Despite the verbosity of the message, this is actually a pretty > > > > > > > straightforward patch. It should've gone in last cycle, but there > > > > > > > was a nagging regression I couldn't get to until after stage1 > > > > > > > had closed.] > > > > > > You had other, non-work/gcc priorities. No worries at missing gcc-12. > > > > > > > > > > > > > > > > > > > > > > > > > > There are 3 uses of EVRP in DOM that must be converted. > > > > > > > Unfortunately, they need to be converted in one go, so further > > > > > > > splitting of this patch would be problematic. > > > > > > ACK > > > > > > > > > > > > > > > > > > > There's nothing here earth shattering. It's all pretty obvious in > > > > > > > retrospect, but I've added a short description of each use to aid in > > > > > > > reviewing: > > > > > > > > > > > > > > * Convert evrp use in cprop to ranger. > > > > > > > > > > > > > > This is easy, as cprop in DOM was converted to the ranger API last > > > > > > > cycle, so this is just a matter of using a ranger instead of an > > > > > > > evrp_range_analyzer. > > > > > > Yea. We may have covered this before, but what does ranger do with a > > > > > > conditional equivalence? > > > > > > > > > > > > ie if we have something like > > > > > > > > > > > > if (a == b) > > > > > > { > > > > > > blah blah > > > > > > } > > > > > > > > > > > > Within the true arm we know there's an equivalence. *But* we don't want > > > > > > to just blindly replace a with b or vice-versa. The equivalence is > > > > > > primarily useful for simplification rather than propagation. > > > > > > > > > > > > In fact, we every much do not want to cprop in these cases. It rarely > > > > > > helps in a meaningful way and there's no good way to know which is the > > > > > > better value to use. Canonicalization on SSA_NAMEs doesn't really help > > > > > > due to SSA_NAME recycling. > > > > > > > > > > > > > > > > > > > > > > > > > > * Convert evrp use in threader to ranger. > > > > > > > > > > > > > > The idea here is to use the hybrid approach we used for the initial > > > > > > > VRP threader conversion last cycle. The DOM threader will continue > > > > > > > using the forward threader infrastructure while continuing to query > > > > > > > DOM data structures, and only if the conditional does not relsolve, > > > > > > > using the ranger. This gives us the best of both worlds, and is a > > > > > > > proven approach. > > > > > > > > > > > > > > Furthermore, as frange and prange come live in the next cycle, we > > > > > > > can move away from the forward threader altogether, and just add > > > > > > > another backward threader. This will not only remove the last use > > > > > > > of the forward threader, but will allow us to remove at least 1 or 2 > > > > > > > threader instances. > > > > > > Excellent. > > > > > > > > > > > > > > > > > > > > * Convert conditional folding to use the method used by the ranger and > > > > > > > evrp. Previously DOM was calling into the guts of > > > > > > > simplify_using_ranges::vrp_visit_cond_stmt. The blessed way now is > > > > > > > using fold_cond() which rewrites the conditional and edges > > > > > > > automatically. > > > > > > > > > > > > > > When legacy is removed, simplify_using_ranges will be further > > > > > > > cleaned up, and there will only be one entry point into simplifying > > > > > > > a statement. > > > > > > Sure. > > > > > > > > > > > > > > > > > > > > * DOM was setting global ranges determined from unreachable edges as a > > > > > > > side-effect of using the evrp engine. We must handle these cases > > > > > > > before nuking evrp, and DOM seems like a good fit. I've just moved > > > > > > > the snippet to DOM, but it could live anywhere else we do a DOM > > > > > > > walk. > > > > > > It was a semi-desirable to record/refine global ranges in DOM, but I'd > > > > > > be surprised if too much code depended on that. Presumably there's a > > > > > > testcase in the suite which we don't handle well if we don't let DOM > > > > > > refine/record global ranges? > > > > > > > > > > > > > > > > > > > > For the record, this is the case *vrp handled: > > > > > > > > > > > > > > : > > > > > > > ... > > > > > > > if (c_5(D) != 5) > > > > > > > goto ; > > > > > > > else > > > > > > > goto ; > > > > > > > : > > > > > > > __builtin_unreachable (); > > > > > > > : > > > > > > > > > > > > > > If M dominates all uses of c_5, we can set the global range of c_5 > > > > > > > to [5,5]. > > > > > > And that will only happen if M eventually loops back to the conditional, > > > > > > right? Otherwise all the uses wouldn't be dominated. I suspect this is > > > > > > exceedingly rare in practice an it looks really familiar. This is > > > > > > coming from a testcase, right? > > > > > > > > > > This is how we make use of assert() when it uses __builtin_unreachable. > > > > > > > > > > Note the difficulty here is that we have to do this at a very specific point > > > > > in the pass pipeline (setting the global range), since the first time we'll > > > > > fold the if (c_5(D) != 5) it will be folded to false and the IL representation > > > > > of the assert() is gone. > > > > > > > > > > So the idea was that with the IL present value-range analysis can determine > > > > > the range omf c_5(D) on the false path but once we remove it (and we do > > > > > want to remove it) we want to put a global range on c_5(D). > > > > > > > > > > That means the best of both worlds would be to detect this pattern at > > > > > a specific point in the pass pipeline (somewhen before loop opts for sure) > > > > > and do both - remove the IL _and_ set the global range. The fear of > > > > > doing this too early is of course that we lose the range by copy propagation > > > > > or similar transforms. The fear of doing this change at a single place only > > > > > is that we miss it because there's a stray use before the conditional. > > > > > > > > > > Note DOM is also run in pass_ipa_oacc_kernels which may be too early. > > > > > > > > > > In the end we want to perform dead code elimination here so maybe > > > > > DCE is the best fit to do this (but as said we want to avoid doing this too > > > > > early). > > > > > > > > I agree that doing both would be the ideal scenario. However, I would > > > > prefer this be done in a follow-up patch to minimize the amount of > > > > stuff being changed in one go. > > > > > > > > To provide additional background, this was being done in evrp which > > > > runs at -O2, but was also being done at -O1 as a side-effect of DOM > > > > using the evrp engine and exporting global ranges. For -O2, I don't > > > > know how much DOM was originally (prior to ranger) contributing to > > > > these assert global ranges, since evrp was probably picking them up > > > > first. However, since evrp now runs in ranger mode, this > > > > functionality has silently moved to DOM (without us realizing) for all > > > > compilation levels. > > > > > > I see. > > > > > > > Do we care about this functionality at -O1? ISTR at least one -O1 > > > > testcase failing in RTL land if exporting global ranges isn't done in > > > > DOM. I don't remember if it was a global range due to a > > > > __builtin_unreachable or otherwise. > > > > > > Well, at -O1 we definitely didn't set the global range (did we?) but > > > eventually pruned paths running into __builtin_unreachable () > > > anyway (would have to check at which point). > > > > I think we always have: > > > > evrp_range_analyzer analyzer (/*update_global_ranges=*/true); > > > > ?? > > > > DOM has been running as an incognito evrp even at -O1. I don't know > > if this has been by design, or as an unintended side-effect. I mean, > > it has been visiting all conditionals in the IL as well as calling > > evrp_range_analyzer::record_ranges_from_stmt() on each statement it > > tries to optimize. And of course, updating global ranges along the > > way. > > As said, it wasn't intended to be that way originally. But yes, as we > separated out evrp_range_analyzer and made use of it in places not > guarded by -ftree-vrp this might have happened. > > We might want to reconsider (based on now long-standing facts), > and maybe alter the -ftree-vrp documentation indicating that even > when not enabled effecive value-range propagation and optimization > might happen. > > > > > > > > > > > > One more thing, we have batted around the idea of running a fast evrp > > > > even at -O1 (ranger without looking at back edges, IIRC). This would > > > > mean that when the DOM threader becomes disentangled from DOM (when we > > > > have frange and prange), there's very little DOM will be doing that > > > > for instance PRE can't get. So perhaps, if we remove DOM, the place > > > > to put this optimization is in the fast evrp, and fold the conditional > > > > as has been suggested? Anyways... I'm digressing. > > > > > > The idea is to get rid of DOM once its threading is separated out. > > > > Excellent. There's no reason why we couldn't separate threading out > > in this release. > > > > > > > > I think doing O (n * log n) value range analysis at -O1 would be welcome. > > > We didn't get around doing that for EVRP which I think had such > > > bound - the iterating VRP definitely didn't. I'm not sure ranger really > > > qualifies given gori and all it relies on. The advantage of the simple > > > whole-IL walk way was to reasonably easily guarantee such bound. > > > That's unfortunately lost now :/ > > > > Andrew was mumbling something about a fast ranger mode for this > > release that should be on par with legacy evrp. IIRC it would be > > purely DOM based, won't visit back edges, and there's no caching. But > > he'll have to expand on it when he returns from vacation. I don't > > know the details. > > I would guess the stmt analysis building blocks (whatever API part of > ranger that is) can be used to produce something like that. But in the > end it would be the old EVRP pass with the VRP stmt analysis it > re-used replaced with the appropriate ranger parts. > > But yes, I'd welcome that. I'd also like to revisit integration of some > of this with value-numbering which doesn't do a DOM walk but instead > a RPO walk. > > > > > > > > Doing this optimization in DOM at least keeps with GCC 12, but I'm > > > > happy to move it to whatever is recommended. > > > > > > As said the intent of this trick is to preserve range information from > > > asserts even when we removed the IL carrying it. When we care > > > about range info, which means when some VRP pass runs, which > > > is why it was done in such a pass. That is, we probably want to > > > avoid altering the global range for -fno-tree-vrp. > > > > Not just -fno-tree-vrp then. There are other range consumers. For > > example, CCP, threading, etc. Threading will look at global ranges > > even in dumb mode, and CCP uses global ranges in a round about way-- > > by looking at global nonzero bits which the evrp code diligently sets: > > > > { > > set_ssa_range_info (vrs[i].first, vrs[i].second); > > maybe_set_nonzero_bits (pred_e, vrs[i].first); > > } > > > > For example, in this particular testcase, DOM will set the global > > range, and CCP will remove at least one of the conditionals by looking > > at nonzero bits. > > Yes, as said we might want to reconsider and adjust documentation. > In theory we could gate the global range/nonzero setters on > flag_tree_vrp so they become no-ops when not enabled > (there is also flag_tree_bit_ccp which is documented to track > alignment, it fails to mention non-zero bits on integers). Well, we could just not update global ranges for !flag_tree_vrp. In the old world: evrp_range_analyzer (/*update_global_ranges=*/ flag_tree_vrp) and in the new world: if (flag_tree_vrp) ranger->export_global_ranges(). This should fit in nicely with my plan to integrate nonzero bits into the irange, because we won't have to worry about them independently > > But let's say the plan is to more conciously enable some range > propagation at -O1. Sounds like a plan. Aldy