From: Aldy Hernandez
Date: Mon, 15 Aug 2022 21:09:55 +0200
Subject: Re: [PATCH] Support threading of just the exit edge
To: Richard Biener
Cc: Jeff Law, gcc-patches, "MacLeod, Andrew"
References: <20220812120145.91C6513AAE@imap2.suse-dmz.suse.de>

On Mon, Aug 15, 2022 at 11:39 AM Richard Biener wrote:
>
> On Fri, 12 Aug 2022, Aldy Hernandez wrote:
>
> > On Fri, Aug 12, 2022 at 2:01 PM Richard Biener wrote:
> > >
> > > This started with noticing we add ENTRY_BLOCK to our threads
> > > just for the sake of simplifying the conditional at the end of
> > > the first block in a function.
> > > That's not really threading anything, but it ends up duplicating
> > > the entry block and re-writing the result instead of statically
> > > folding the jump.
> >
> > Hmmm, but threading 2 blocks is not really threading at all??  Unless
> > I'm misunderstanding something, this was even documented in the
> > backwards threader:
> >
> > [snip]
> >      That's not really a jump threading opportunity, but instead is
> >      simple cprop & simplification.  We could handle it here if we
> >      wanted by wiring up all the incoming edges.  If we run this
> >      early in IPA, that might be worth doing.  For now we just
> >      reject that case.  */
> >   if (m_path.length () <= 1)
> >     return false;
> >
> > Which you undoubtedly ran into because you're specifically eliding
> > the check:
> >
> > > -  if (m_profit.profitable_path_p (m_path, m_name, taken_edge,
> > > -                                  &irreducible)
> > > +  if ((m_path.length () == 1
> > > +       || m_profit.profitable_path_p (m_path, m_name, taken_edge,
> > > +                                      &irreducible))
>
> Correct.  But currently the threader just "cheats", picks up one more
> block and then "threads" the case anyway, doing this simple
> simplification in the most expensive way possible ...

Ah.

> > > The following tries to handle those by recording simplifications
> > > of the exit conditional as a thread of length one.  That requires
> > > special-casing them in the backward copier, since if we do not
> > > have any block to copy but modify the jump in place and remove
> > > not-taken edges, this confuses the hell out of remaining threads.
> > >
> > > So back_jt_path_registry::update_cfg now first marks all
> > > edges we know are never taken and then prunes the threading
> > > candidates when they include such an edge.  Then it makes sure
> > > to first perform unreachable edge removal (so we avoid
> > > copying them when other thread paths contain the prevailing
> > > edge) before continuing to apply the remaining threads.
> >
> > This is all beyond my pay grade.  I'm not very well versed in the
> > threader per se, so if y'all think it's a good idea, by all means.
> > Perhaps Jeff can chime in, or remembers the above comment?
> >
> > > In the statistics you can see this avoids quite a bunch of useless
> > > threads (I've investigated 3 random files from cc1files with
> > > dropped stats in any of the thread passes).
> > >
> > > Still thinking about it, it would be nice to avoid the work of
> > > discovering those candidates we have to throw away later, which
> > > could eventually be done by having the backward threader perform
> > > an RPO walk over the CFG, skipping edges that can be statically
> > > determined as not being executed.  Below I'm abusing the path
> > > range query to statically analyze the exit branch, but I assume
> > > there's a simpler way of folding this stmt which could then better
> > > integrate with such a walk.
> >
> > Unreachable paths can be queried with
> > path_range_query::unreachable_path_p ().  Could you leverage this?
> > The idea was that if we ever resolved any SSA name to UNDEFINED, the
> > path itself was unreachable.
>
> The situation is that we end up with paths where an intermediate
> branch on the path can be simplified to false -- but of course only
> if we put all intermediate branch dependences into the list of
> imports to consider.
>
> I don't much like using the "threading" code to perform the
> simplification, but I couldn't figure out a cheap way to perform the
> simplification without invoking a full EVRP.
> That said, the backwards threader simply does
>
>   basic_block bb;
>   FOR_EACH_BB_FN (bb, m_fun)
>     if (EDGE_COUNT (bb->succs) > 1)
>       maybe_thread_block (bb);
>
>   bool changed = m_registry.thread_through_all_blocks (true);
>
> Instead of that, we should only consider edges that may be
> executable, walking the CFG along such edges and simplifying BB exit
> conditionals.  Unfortunately EVRP is now a C++ maze, so I couldn't
> find how to actually do such simplification, not knowing how
> interacting with ranger influences the path query use either.
> If you or Andrew have any suggestions on how to essentially do a
>
>   if (edge e = find_taken_edge (bb))
>     {
>       ...
>     }
>
> where find_taken_edge should be at least as powerful as using the
> path query for a single bb, then I'd be all ears.  As said, I tried
> to find the code to cut&paste in EVRP but failed to ...

Interesting...

If what you need is find_taken_edge(bb), I think we can do this quite
cleanly.  What you're asking for is to fold the conditional at the end
of a basic block, and return the edge corresponding to the folded
value.  We have all the smarts to do the folding (fold_range), and
tree-cfg.cc has find_taken_edge(basic_block, tree val), which AFAICT
does everything except the folding of non-trivial statements.

If you like, I could tweak find_taken_edge() to use a ranger if
available (that is, if the pass has called enable_ranger()), or
otherwise use the global range query in cfun (SSA_NAME_RANGE_INFO, but
through the range_query API).  This should be as fast as poking at
SSA_NAME_RANGE_INFO for the common case, while using an active ranger
if one is available.  See the attached proof of concept.

With this approach we could handle everything find_taken_edge(bb)
currently handles, plus anything we can fold in ranger (without the
penalty of a full ranger if you don't want one -- we can still fold
using global ranges for SSA names if available).  Or ultimately, if
you have an active ranger, you can leverage the full ranger
functionality.  Either way is a lot better than the "1 != 0", etc.
folding the code currently does ;-).

If you want to fold these blocks from the threader, we could expose
the ranger in the path_query with enable_ranger(), and then
find_taken_edge(basic_block) can pick the ranger up (a rough caller
sketch is below).  Up to you if you want to run this within the
threader, or elsewhere.

Does this make sense?  I'm happy to cobble it up if you find it
useful.
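To make that concrete, a client pass could then do something along
these lines (a rough, untested sketch that assumes the attached
find_taken_edge tweak; the loop body is only a placeholder):

  /* Enable a ranger so find_taken_edge's folding can use it; without
     this the folding would fall back to the global ranges returned by
     get_range_query (cfun).  */
  enable_ranger (cfun);

  basic_block bb;
  FOR_EACH_BB_FN (bb, cfun)
    if (EDGE_COUNT (bb->succs) > 1)
      if (edge taken = find_taken_edge (bb, NULL_TREE))
        {
          /* Every other successor edge of BB is statically known to
             be not taken here.  */
          ...
        }

  disable_ranger (cfun);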
Aldy

[Attachment: curr.patch (text/x-patch)]

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index 5bcf78198e7..5009b7d3411 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "asan.h"
 #include "profile.h"
 #include "sreal.h"
+#include "gimple-range.h"
 
 /* This file contains functions for building the Control Flow Graph (CFG)
    for a function tree.  */
@@ -2436,12 +2437,10 @@ find_taken_edge_cond_expr (const gcond *cond_stmt, tree val)
 
   if (val == NULL_TREE)
     {
-      /* Use the current value of the predicate.  */
-      if (gimple_cond_true_p (cond_stmt))
-	val = integer_one_node;
-      else if (gimple_cond_false_p (cond_stmt))
-	val = integer_zero_node;
-      else
+      range_query *q = get_range_query (cfun);
+      int_range<2> r;
+      if (!fold_range (r, const_cast <gcond *> (cond_stmt), q)
+	  || !r.singleton_p (&val))
 	return NULL;
     }
   else if (TREE_CODE (val) != INTEGER_CST)
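Read on its own, the new code path in the hunk is roughly the helper
below (hypothetical name, not part of the patch): it asks whatever
range query is current for cfun -- an active ranger if one was
enabled, global SSA_NAME_RANGE_INFO ranges otherwise -- to fold the
GIMPLE_COND and only accepts a constant result.

  /* Fold COND_STMT with the current range query for cfun.  Returns
     the constant (0 or 1) the condition folds to, or NULL_TREE if it
     does not fold to a single value.  */
  static tree
  fold_cond_with_ranges (const gcond *cond_stmt)
  {
    range_query *q = get_range_query (cfun);
    int_range<2> r;
    tree val = NULL_TREE;
    if (fold_range (r, const_cast <gcond *> (cond_stmt), q)
        && r.singleton_p (&val))
      return val;
    return NULL_TREE;
  }

find_taken_edge_cond_expr then maps that constant onto the true or
false outgoing edge of the block, exactly as it already does for
trivially true/false conditionals.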