From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
From: Richard Biener
Date: Tue, 24 Jan 2017 10:49:00 -0000
Subject: Re: A + B CMP A -> A CMP' CST' match.pd patterns [was [PATCH] avoid calling memset et al. with excessively large sizes (PR 79095)]
To: Jeff Law
Cc: Marc Glisse, Martin Sebor, Jakub Jelinek, Gcc Patch List
In-Reply-To: <45f3ffa4-99fb-96ea-ec41-8f083bf33156@redhat.com>
References: <497e7848-5690-2c4e-7277-cab674a60a35@gmail.com>
 <20170117073804.GU1867@tucnak>
 <49d6004c-6a5d-0ef3-cf15-a39016b09847@gmail.com>
 <0ec47b4d-c5bf-d967-a103-1ff73dbda1ec@redhat.com>
 <20170118081059.GI1867@tucnak>
 <6a029a72-2f43-c999-f1e7-809913c75b6a@redhat.com>
 <62b68dac-5de8-008c-8df9-1c22732aa25e@redhat.com>
 <45f3ffa4-99fb-96ea-ec41-8f083bf33156@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8

On Tue, Jan 24, 2017 at 1:03 AM, Jeff Law wrote:
> On 01/21/2017 01:00 AM, Marc Glisse wrote:
>>
>> On Fri, 20 Jan 2017, Jeff Law wrote:
>>
>>> On 01/20/2017 04:30 PM, Jeff Law wrote:
>>>>
>>>> Going to work from the self-contained test...
>>>>
>>>>
>>>>>
>>>>> Here's a test case that's closer to the one from the bug.  It also
>>>>> ends up with the out of bounds memset even at -O1, during PRE.
>>>>>
>>>>> typedef __SIZE_TYPE__ size_t;
>>>>>
>>>>> struct S
>>>>> {
>>>>>   int *p0, *p1, *p2;
>>>>>
>>>>>   size_t size () const { return p1 - p0; }
>>>>>
>>>>>   void f (size_t n) {
>>>>>     if (n > size ())        // can't happen because
>>>>>       foo (n - size ());    //   n is in [1, MIN(size() - 1, 3)]
>>>>>     else if (n < size ())
>>>>>       bar (p0 + n);
>>>>>   }
>>>>>
>>>>>   void foo (size_t n)
>>>>>   {
>>>>>     size_t left = (size_t)(p2 - p1);
>>>>>     if (left >= n)
>>>>>       __builtin_memset (p2, 0, n * sizeof *p2);
>>>>>   }
>>>>>
>>>>>   void bar (int*);
>>>>> };
>>>>>
>>>>> void f (S &s)
>>>>> {
>>>>>   size_t n = s.size ();
>>>>>   if (n > 1 && n < 5)
>>>>>     s.f (n - 1);
>>>>> }
>>>>
>>>>
>>>> Looking at the .vrp1 dump for this test we find:
>>>>
>>>> ;; basic block 3, loop depth 0, count 0, freq 3664, maybe hot
>>>> ;;  prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
>>>> ;;  pred:       2 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>>>>   _18 = ASSERT_EXPR <_13, _13 + 18446744073709551614 <= 2>;
>>>>   _2 = _18 + 18446744073709551615;
>>>>   if (_2 > _18)
>>>>     goto ; [50.00%]
>>>>   else
>>>>     goto ; [50.00%]
>>>>
>>>> _2 > _18 is the same as
>>>>
>>>>   (_18 - 1) > _18
>>>>
>>>> which is only true if _18 is zero.  And since _18 has a computed range
>>>> of [2..4], that can never happen.
>>>>
>>>> VRP doesn't currently handle this case, though we do have some code to
>>>> turn relational comparisons into equality comparisons.  If I manually
>>>> turn the relational into _18 == 0 at that point, then the next VRP pass
>>>> is able to optimize the conditional away, which in turn eliminates the
>>>> problematic memset call & warning.
>>>>
>>>> So one possible approach would be to teach simplify_cond_using_ranges
>>>> to simplify that kind of condition.  I don't know if that would fix the
>>>> reported testcase as I haven't looked at it directly under the debugger
>>>> yet.
>>>
>>> In fact, match.pd already has a pattern to handle a relational like
>>>
>>>   (X + -1) > X
>>>
>>> The pattern doesn't fire because of a single_use guard.  This was
>>> discussed back in 2015, with Jakub advocating for a single_use guard
>>> and Marc leaning the other direction, though not strongly enough to
>>> argue much about it.
>>>
>>> In early 2016 Marc actually submitted the match.pd patch with the
>>> single use guard per Jakub's preference.
>>>
>>> The single use guard checks the SSA_NAME holding the (X + CST)
>>> expression for a single use.  It's not entirely clear why we check for
>>> a single use.  If it's because of object lifetimes, then that's a total
>>> red herring here.
>>>
>>> Given something like:
>>>
>>>   x = a + c
>>>   if (a > x)
>>>
>>> the transformation into
>>>
>>>   x = a + c;
>>>   if (a < c')
>>>
>>> where x has multiple uses will not inherently make the lifetime of "x"
>>> any worse (if the conditional was the last use, then the lifetime of x
>>> may shorten somewhat).  If there were uses after the condition, the
>>> lifetime of "x" remains unchanged.
>>
>> I don't remember the conversation and I don't have experience with
>> register pressure.  My assumption would be that the issue is with the new
>> constant c'.  In the first case, during the comparison, only a and x are
>> live, while in the second case c' is as well, which may use an extra
>> register.  Does that make sense?
>>
>> (I don't know if that's bad enough to inhibit the simplification.)
>>
>> Or are we talking about the transformation that has this comment in
>> match.pd:
>>
>>   Currently restricted to single use so as not to interfere too much with
>>   ADD_OVERFLOW detection in tree-ssa-math-opts.c.
>>
>> That detection logic needs to be improved anyway (see also a discussion
>> with Vincent Lefevre yesterday on gcc-help; users do write the
>> simplified form directly); the single_use is supposedly only there until
>> that is done (really depends on how well the improved logic works...).
>>
>>> "a" was already live from the assignment to at least the conditional.
>>> But again, we haven't inherently changed its lifetime.
>>>
>>> However, what can happen is that after the transformation the assignment
>>> might sink to a later use point in the CFG, which might lengthen the
>>> lifetime of "a", but it's also shortening the lifetime of "x" by the
>>> same amount.
>>>
>>> So, ultimately I don't buy the argument that we should inhibit this
>>> based on register lifetime issues.
>>>
>>> Jakub, perhaps you remember why you wanted this restricted to cases
>>> where "x" has a single use?
>
> So I went and looked at the match_uaddsub_overflow bits for a while.  AFAICT
> it appears to be trying to allow us to do the arithmetic and overflow status
> computation once and CSE the result.  At the single instance we would
> generate a uaddv4 insn which does the arithmetic and jump.
>
> ISTM the real value here is in the initial RTL generation.  DOM ought to be
> simplifying the arithmetic CSEs and dominated tests.  Threading should pick
> up the most common non-dominated tests.
>
> Converting into an EQ/NE test against zero for the special case should still
> preserve those optimizations (and in fact makes the threading easier to
> discover).
>
> So it's really a matter of whether we do it directly in VRP or in match.pd.
> The former is more effective, the latter is cleaner.
>>
>> I believe we are supposed to call match.pd from VRP eventually (though
>> that may not have happened yet), so the test could probably also be done
>> there, if that looks more convenient and we decide not to remove
>> single_use completely for this transformation.
>
> That was my thought as well, but AFAICT we only call into match.pd from VRP
> if we changed the insn.

Yes - there were thoughts to change that (but it comes at an expense).
Basically we'd like to re-fold stmts that indirectly use stmts we changed.
We certainly don't want to re-fold everything all the time.

> There's no systematic calling into match.pd from
> most passes.  Thus, we don't do the transformation until fairly late in the
> pipeline when it's in match.pd.

Passes should fold stmts they change, but with match.pd we now have that
"indirect" issue indeed.  Currently only forwprop unconditionally folds
all stmts.  I'm not sure there is any heuristic for deciding which stmts
to fold that is cheaper than simply folding everything...  The simplest
thing would be to fold stmts in the same BB after we've changed a stmt in
it.  A "full" solution would be to populate a bitmap with the defs of all
stmts that use the changed stmts' defs, folding their definitions (and
clearing the bits).  Exclude virtual operands and PHIs.  Something to
experiment with, I guess.

The above could apply to the SSA propagator's substitute-and-fold phase as
well as to FRE elimination and DOM stmt optimization.

Richard.

>
> Jeff
>
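
For reference, a small standalone illustration of the identity the thread
revolves around (the helper names below are made up for the example and are
not taken from any patch in this thread): for an unsigned x, "x - 1 > x" can
only hold when the subtraction wraps, i.e. when x == 0, so the relational
test is equivalent to an equality test against zero:

  /* Illustration only: the wrapped relational form and the equality form
     agree for every unsigned input.  */
  typedef __SIZE_TYPE__ size_t;

  static int relational_form (size_t x) { return x - 1 > x; }  /* what VRP sees as _2 > _18 */
  static int equality_form (size_t x)   { return x == 0; }     /* the simplified test */

  int main (void)
  {
    size_t tests[] = { 0, 1, 2, 4, (size_t) -1 };
    for (unsigned i = 0; i < sizeof tests / sizeof tests[0]; ++i)
      if (relational_form (tests[i]) != equality_form (tests[i]))
        __builtin_abort ();
    return 0;
  }

With the existing match.pd pattern (when the single_use guard lets it fire)
the relational form should already be folded to the equality form; the open
question above is only whether to do the same when the (X + CST) value has
other uses, and whether that happens in match.pd or directly in VRP.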