From: Richard Biener
Date: Mon, 28 Jun 2021 10:22:17 +0200
Subject: Re: replacing the backwards threader and more
To: Aldy Hernandez
Cc: Jeff Law, GCC Mailing List, Andrew MacLeod, Martin Sebor

On Fri, Jun 25, 2021 at 6:20 PM Aldy Hernandez wrote:
>
> Hi folks.
>
> I'm done with benchmarking, testing and cleanups, so I'd like to post my
> patchset for review. However, before doing so, I'd like to address a
> handful of meta-issues that may affect how I post these patches.

First of all thanks for the detailed analysis and report!

> Trapping on differences
> =======================
>
> Originally I wanted to contribute verification code that would trap if
> the legacy code threaded any edges the new code couldn't (to be removed
> after a week). However, after having tested on various architectures
> and only running once into a missing thread, I'm leaning towards
> omitting the verification code, since it's fragile, time consuming, and
> quite hacky.

Agreed.

> For the record, I have tested on x86-64, aarch64, ppc64 and ppc64le.
> There is only one case, across bootstrap and regression tests where the
> verification code is ever tripped (discussed below).
>
> Performance
> ===========
>
> I re-ran benchmarks as per our callgrind suite, and the penalty with the
> current pipeline is 1.55% of overall compilation time. As is being
> discussed, we should be able to mitigate this significantly by removing
> other threading passes.

1.55% for the overall compilation looks quite large - is this suite
particularly demanding on VRP/ranger? Is the figure for 'cc1files' similar?
(collect preprocessed sources of gcc/*.{c,cc} compiles, e.g. by appending
-save-temps to CFLAGS in a non-bootstrap build)

I'm curious if you tried to look at the worst offenders and if there's
anything new that is threading specific. More threading might result in
larger code (how is cc1 size affected by the change?) and more code to
compile by later passes can easily explain >1% differences.

> Failing testcases
> =================
>
> I have yet to run into incorrect code being generated, but I have had to
> tweak a considerable number of tests. I have verified every single
> discrepancy and documented my changes in the testsuite when it merited
> doing so. However, there are a couple tests that trigger regressions
> and I'd like to ask for guidance on how to address them.
>
> 1. gcc.c-torture/compile/pr83510.c
>
> I would like to XFAIL this.
>
> What happens here is that thread1 threads a switch statement such that
> the various cases have been split into different independent blocks.
> One of these blocks exposes an arr[i_27] access which is later
> propagated by VRP to be arr[10]. This is an invalid access, but the
> array bounds code doesn't know it is an unreachable path.
>
> However, it is not until dom2 that we "know" that the value of the
> switch index is such that the path to arr[10] is unreachable. For that
> matter, it is not until dom3 that we remove the unreachable path.

Sounds reasonable.

> 2. -Wfree-nonheap-object
>
> This warning is triggered while cleaning up an auto_vec. I see that the
> va_heap::release() inline is wrapped with a pragma ignore
> "-Wfree-nonheap-object", but this is not sufficient because jump
> threading may alter uses in such a way that maybe_emit_free_warning() will
> warn on the *inlined* location, thus bypassing the pragma.
>
> I worked around this with a mere:
>
> > @@ -13839,6 +13839,7 @@ maybe_emit_free_warning (tree exp)
> >    location_t loc = tree_inlined_location (exp);
> > +  loc = EXPR_LOCATION (exp);
>
> but this causes a ton of Wfree-nonheap* tests to fail. I think someone
> more knowledgeable should address this (msebor??).

That looks wrong, but see msebor's response for maybe a better answer.

> 3. uninit-pred-9_b.c
>
> The uninit code is getting confused with the threading and the bogus
> warning in line 24 is back. I looked at the thread, and it is correct.
>
> I'm afraid all these warnings are quite fragile in the presence of more
> aggressive optimizations, and I suspect it will only get worse.

Yep. Shouldn't be a blocker, just XFAIL ...

> 4. libphobos/src/std/net/isemail.d
>
> This is a D test where we don't actually fail, but we trigger the
> verification code. It is the only jump threading edge that the new code
> fails to get over the old code, and it only happens on ppc64.
>
> It triggers because a BB4 -> BB5 is too expensive to thread, but a BBn
> -> BB3 -> BB4 -> BB5 is considered safe to thread because BB3 is a latch
> and it alters the profitability equation. The reason we don't get it,
> is that we assume that if a X->Y is unprofitable, it is not worth
> looking at W->X->Y and so forth.
>
> Jeff had some fancy ideas on how to attack this. One such idea was to
> stop looking back, but only for things we were absolutely sure would
> never yield a profitable path. I tried a subset of this, by allowing
> further looks on this latch test, but my 1.55% overall performance
> penalty turned into an 8.33% penalty. Personally it looks way too
> expensive for this one isolated case. Besides, the test where this
> clamping code originally came from still succeeds (commit
> eab2541b860c48203115ac6dca3284e982015d2c).

Ick - I definitely expect some corner-case optimization "regressions"
caused by the change, if only because it might trigger pass ordering
issues with later passes not expecting "optimized" code. So I'd disregard
a single "missed" case for the moment. It might be interesting to try to
distill a C testcase from it and file it in bugzilla for reference
purposes.

> CONCLUSION
> ==========
>
> That's basically it.
>
> If we agree the above things are not big issues, or can be addressed as
> follow-ups, I'd like to start the ball rolling on the new threader.
> This would allow more extensive testing of the code, and separate it a
> bit from the other big changes coming up :).

I'd say go ahead (with the bogus change somehow resolved). I'd still like
to know something about the compile-time issue, but we can deal with that
as a followup as well. If size is the reason, we can look at taming the
threader via some --param adjustments, for example. I'm just trying to
make sure we're not introducing some quadratic algorithmic behavior that
was not there before.

Richard.

> Aldy
>
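
For reference, the transformation discussed in this thread can be sketched
with a small hand-written C example. It is not taken from the patchset or
the testsuite, and the function names are made up for illustration. The
first function tests the same condition twice, so the outcome of the second
test is already known on every path leaving the first one. The second
function shows roughly what the code looks like once those paths are
threaded past the join block.

```c
#include <assert.h>

/* Before threading: both arms of the first `if` meet in a join block
   that tests `flag` again, although its value is known on each
   incoming edge.  (Hypothetical example, not from GCC.)  */
int unthreaded (int flag, int x)
{
  int y;
  if (flag)
    y = x + 1;
  else
    y = x - 1;

  /* Join block: redundant re-test of `flag`.  */
  if (flag)
    return y * 2;
  return y * 3;
}

/* After threading, written by hand: each incoming edge jumps straight
   to the successor it is known to reach, so the second test is gone.
   The cost is duplicated code, which is why threading can grow code
   size.  */
int threaded (int flag, int x)
{
  if (flag)
    return (x + 1) * 2;
  return (x - 1) * 3;
}
```

Both functions return the same value for every input; only the shape of the
control flow differs. The duplicated tails in the second version are the
sort of growth behind the code-size question raised above.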