From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qt1-x82a.google.com (mail-qt1-x82a.google.com [IPv6:2607:f8b0:4864:20::82a]) by sourceware.org (Postfix) with ESMTPS id 2F8A4385840B for ; Tue, 5 Oct 2021 15:10:31 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2F8A4385840B Received: by mail-qt1-x82a.google.com with SMTP id r16so19365146qtw.11 for ; Tue, 05 Oct 2021 08:10:31 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language; bh=ytI0RUr3J3FVVmPI+IHUQ3zdinfvk/yYwyu5Thz8SVw=; b=uEogIBCoyX5iCfHbZ5JsJ2+/sHVKuVzhmQmHlqwydH8RLTRlrDfXgnKcDS1/Gp+Ay+ 2ffE/+LBGBy2nyvdx4quyfEXEiENxAHBVSn6ROq6kRtpus/sUFhNemaLyvrZ21mk7hot X50HBIcbJB0ZIynqfJr/Fj1XAHM3JhzUlhpsJX4luHDQ/RUZb7mQelNHNnl7qxAXd+zv PUlzBY1NQDiFD5NuImrTePQ5mXYwkt109gvG4v0AM3vo4qyutlH48X/fLOYWh4E76iMt bXSdyo6dcpKonjiie85khP9VFRDKLUzyg/ml143pACFNhFkndYoEg6dRYLaHULeEc7o3 uC3Q== X-Gm-Message-State: AOAM5337ILrrUyt3LvZoQKVEu1NfVV89V3zdB/b1BYi6yH/xvu8lmVnq xt7QRYCz2KyNtnpeSS1sAkQ+MfF2vO4= X-Google-Smtp-Source: ABdhPJy0P4kjLU9tUFiaXtBIPsNiF+BgznmIwLYrZvuq1aOdETC80y8FUqKuHINjbGNwyWIMclhoBQ== X-Received: by 2002:a05:622a:11ce:: with SMTP id n14mr20641809qtk.379.1633446630392; Tue, 05 Oct 2021 08:10:30 -0700 (PDT) Received: from [172.31.0.175] (c-98-202-48-222.hsd1.ut.comcast.net. [98.202.48.222]) by smtp.gmail.com with ESMTPSA id v5sm9582777qkh.17.2021.10.05.08.10.28 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 05 Oct 2021 08:10:29 -0700 (PDT) Subject: Re: [RFC] More jump threading restrictions in the presence of loops. To: Aldy Hernandez , Michael Matz Cc: Richard Biener , Andrew MacLeod , GCC patches References: <20211004094313.1596556-1-aldyh@redhat.com> From: Jeff Law Message-ID: <88ae12c8-1254-9c57-7290-e46d6d4c4f8b@gmail.com> Date: Tue, 5 Oct 2021 09:10:28 -0600 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US X-Spam-Status: No, score=-2.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, HTML_MESSAGE, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Oct 2021 15:10:33 -0000 On 10/5/2021 7:33 AM, Aldy Hernandez wrote: > On Mon, Oct 4, 2021 at 6:29 PM Michael Matz wrote: >> Hello, >> >> On Mon, 4 Oct 2021, Jeff Law wrote: >> >>> And just in case it got lost. Here's the analysis on 960218-1 on visium: >>> >>> We've got this going into ethread: >>> >>> int f (int x) >>> { >>> int D.1420; >>> int a; >>> >>> ;; basic block 2, loop depth 0, maybe hot >>> ;; prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED) >>> ;; pred: ENTRY (FALLTHRU,EXECUTABLE) >>> a_4 = ~x_3(D); >>> goto ; [INV] >>> ;; succ: 4 (FALLTHRU,EXECUTABLE) >>> >>> ;; basic block 3, loop depth 1, maybe hot >>> ;; prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED) >>> ;; pred: 4 (TRUE_VALUE,EXECUTABLE) >>> gl = a_1; >>> ;; succ: 4 (FALLTHRU,DFS_BACK,EXECUTABLE) >>> >>> ;; basic block 4, loop depth 1, maybe hot >>> ;; prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED) >>> ;; pred: 2 (FALLTHRU,EXECUTABLE) >>> ;; 3 (FALLTHRU,DFS_BACK,EXECUTABLE) >>> # a_1 = PHI >>> if (a_1 != 0) >>> goto ; [INV] >>> else >>> goto ; [INV] >>> ;; succ: 3 (TRUE_VALUE,EXECUTABLE) >>> ;; 5 (FALSE_VALUE,EXECUTABLE) >>> >>> ;; basic block 5, loop depth 0, maybe hot >>> ;; prev block 4, next block 1, flags: (NEW, REACHABLE, VISITED) >>> ;; pred: 4 (FALSE_VALUE,EXECUTABLE) >>> return; >>> ;; succ: EXIT >>> >>> } >> First notice that this doesn't have an empty latch block to start with >> (in fact, there is no separate latch block at all), so the loop optimizers >> haven't been initialized for simple latches at this point. Still, I would >> say that even then we want to be careful with the latch edge, as putting >> code onto it will most probably create a problem downstream once we _do_ >> want to intialize the loop optimizers for simple latches. So, that we are >> careful here is okay, we are just too careful in this specific case. >> >>> There's a pretty obvious jump thread in there. 3->4->5. >>> >>> We used to allow this, but it's currently being rejected and I'm not >>> sure it should. >>> >>> In simplest terms we're threading from inside the loop, through the >>> latch to a point outside the loop. At first glance, that seems safe. >> Is at least the unrestricted jump threader (the one after loop optimizers) >> picking this up? >> >> Independend of that, I agree, that this specific instance of threading >> through the latch block, even though followed by to-be-copied >> instructions, is okay. We need to determine precisely when that is okay, >> though. In this case there are several reasons I could see why this is >> so: >> (a) while the thread is through the latch block, it's _not_ through the >> latch edge (which is 4->3). That would create the problem, so here >> we're fine. > It seems we all agree Jeff's finding should be allowed, so let's > attack this one first, since it gets almost all of his visium > failures. I can submit the rest of the cases separately. > > The attached patch catches the IL discussed, and adds a relevant > gimple FE test others can use for experimenting :). > > Tested on x86-64 and by visual inspection on visium-elf on the > regressions Jeff pointed me at. > > OK? > > BTW Jeff, this fixes all the regressions you mention except: > > 1. pr68911.c > > The path not being threaded here is 7->10->11->12. It crosses loops > multiple times, so I believe the restriction code is correct. > > 7, 10, 12 are in loop1. > 11 is in loop 2. > > So we have a path going from loop1 -> loop2 -> loop1. I can't > conceive any scenario where this is ok, but I can attach the entire IL > if I'm missing something. > > 2. 961125-1.c > > This one is a bit trickier. Here we're trying to thread the following > conditional, which I mentioned when I contributed this work, we don't > handle (and happens infrequently enough not to matter): > > + // Loop 4 > + [local count: 114863531]: > + # ptr_8 = PHI > + if (ptr_8 < &MEM [(void *)":ab" + 3B]) > + goto ; [50.00%] > + else > + goto ; [50.00%] > > The hybrid threader doesn't handle &MEM in the final conditional. As > I mentioned earlier, if this becomes an issue, we can adapt class > pointer_equiv_analyzer like we did for evrp. I have a gimple FE test > I will contribute as an XFAIL with an associated PR to keep us honest. > > That being said... in this particular test, this is all irrelevant > because the path will be disallowed for two reasons: > > a) The path crosses loops, and the reason we didn't realize it in the > old code was because the ASSERT_EXPR had pulled the SSA outside the > loop, so it looks like the entire path is l in the same loop. If you > look at the original IL, it's not. > > b) Now the path actually fits the pattern being discussed in this > patch, where there's an early exit out of a loop, so it looks like we > should handle it. But...in this case, we would fill a presently empty > latch. Interestingly, the old code didn't catch it, because....you > guessed it...there was an ASSERT_EXPR in the latch. > > So I argue that even in the absence of MEM_REF handling, we shouldn't > thread this path. > > 0001-Loosen-loop-crossing-restriction-in-threader.patch > > From 5abe65668f602d53b8f3dc515508dc4616beb048 Mon Sep 17 00:00:00 2001 > From: Aldy Hernandez > Date: Tue, 5 Oct 2021 15:03:34 +0200 > Subject: [PATCH] Loosen loop crossing restriction in threader. > > Crossing loops is generally discouraged from the threader, but we can > make an exception when we don't cross the latch or enter another loop, > since this is just an early exit out of the loop. > > In fact, the whole threaded path is logically outside the loop. This > has nice secondary effects. For example, objects on the threaded path > will no longer necessarily be live throughout the loop, so we can get > register allocation improvements. The threaded path can physically > move outside the loop resulting in better icache efficiency, etc. > > Tested on x86-64 Linux, and on a visium-elf cross making sure that the > following tests do not have an abort in the final assembly: > > gcc.c-torture/execute/960218-1.c > gcc.c-torture/execute/visium-pending-4.c > gcc.c-torture/execute/pr58209.c > > gcc/ChangeLog: > > * tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths): > > gcc/testsuite/ChangeLog: > > * gcc.dg/tree-ssa/ssa-thread-valid.c: New test. I'll throw it in and see what we get :-) jeff