From: Richard Biener
Date: Fri, 18 Jun 2021 11:43:56 +0200
Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
To: Victor Tong
Cc: "gcc-patches@gcc.gnu.org", Marc Glisse
List-Id: Gcc-patches mailing list

On Wed, Jun 16, 2021 at 8:49 PM Victor Tong wrote:
>
> Hi Richard,
>
> Thanks for the feedback. From what you said, I can think of two possible solutions (though I'm not sure if either is feasible/fully correct):
>
> Option 1: Have the new X * (Y / X) --> Y - (Y % X) optimization only run in scenarios that don't interfere with the existing X - (X / Y) * Y --> X % Y optimization.
>
> This would involve checking the expression one level up to see if there's a subtraction that would trigger the existing optimization. I looked through the match.pd file and couldn't find a bail condition like this. It doesn't seem like there's a link from an expression to its parent expression one level up. This also feels a bit counter-intuitive since it would be doing the opposite of the bottom-up expression matching where the compiler would like to match a larger expression rather than a smaller one.

Yes, that option is not really possible from match.pd.

> Option 2: Add a new pattern to support scenarios that the existing nop_convert pattern bails out on.
>
> Existing pattern:
>
> (simplify
>  (minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3? @@0) @1)))
>  (view_convert @1))
>
> New pattern to add:
>
> /* X - (X - Y) --> Y */
> (simplify
>  (minus @0 (convert? (minus @@0 @1)))
>  (if (INTEGRAL_TYPE_P (type)
>       && TYPE_OVERFLOW_UNDEFINED(type)
>       && INTEGRAL_TYPE_P (TREE_TYPE(@1))
>       && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@1))
>       && !TYPE_UNSIGNED (TREE_TYPE (@1))
>       && !TYPE_UNSIGNED (type)
>       && TYPE_PRECISION (TREE_TYPE (@1)) <= TYPE_PRECISION (type))
>   (convert @1)))
>
> I think the truncation concerns that you brought up should be covered if the external expression type precision is greater than or equal to the internal expression type. There may be a sign extension operation (which is why the nop_convert check fails) but that shouldn't affect the value of the expression. And if the types involved are signed integers where overflow/underflow results in undefined behavior, the X - (X - Y) --> Y optimization should be legal.
>
> Please correct me if I'm wrong with either one of these options, or if you can think of a better option to fix the regression.

So to recap, we're looking to simplify 42 - (long int) (42 - 42 % x)
(simplified from gcc.dg/fold-minus-6.c), or simply (new testcase):

long fn1 (int x)
{
  return 42L - (long)(42 - x);
}

where the existing pattern does not apply because the conversion is not
a NOP one:

  (simplify
   (minus (nop_convert1? (minus (nop_convert2? @0) @1)) @0)
   (if (!ANY_INTEGRAL_TYPE_P (type)
        || TYPE_OVERFLOW_WRAPS (type))
    (negate (view_convert @1))
    (view_convert (negate @1))))

so let's consider replacing nop_convert1? with convert1? and thus obtain

  (simplify
   (minus (convert1? (minus (nop_convert2? @0) @1)) @0)
   (if (!ANY_INTEGRAL_TYPE_P (type)
        || TYPE_OVERFLOW_WRAPS (type))
    (negate (view_convert @1))
    (view_convert (negate @1))))

given we still require a matching @0 (as in operand_equal_p) it looks
like a convert1 that is not the inverse of the nop_convert2, and thus
also a nop_convert is only possible for constants (because
operand_equal_p does not verify type equality).  Now - can we construct
any testcase for which this conversion would be wrong?

Richard.
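
A minimal sketch of the arithmetic behind the question above, not taken from the patch. fn1 is the new testcase rewritten with fixed-width types (assuming int32_t/int64_t correspond to int/long, as on LP64 targets); fn1_wrapping and check_fn1 are hypothetical names added only for illustration. It suggests that folding to x is fine while the inner subtraction is signed and does not overflow, but that the same shape with a wrapping inner subtraction followed by a widening conversion does not fold to x.

```c
#include <assert.h>
#include <stdint.h>

/* The new testcase from this mail, with int32_t/int64_t standing in
   for int/long (an assumption that holds on LP64 targets).  */
int64_t fn1 (int32_t x)
{
  return INT64_C (42) - (int64_t) (42 - x);
}

/* Hypothetical variant: same shape, but the inner subtraction is
   unsigned and therefore wraps before it is widened.  */
int64_t fn1_wrapping (uint32_t x)
{
  return INT64_C (42) - (int64_t) (UINT32_C (42) - x);
}

/* Hypothetical helper that spot-checks both variants.  */
void check_fn1 (void)
{
  /* Signed inner minus, no overflow: the result equals x.  */
  assert (fn1 (0) == 0);
  assert (fn1 (-7) == -7);
  assert (fn1 (100) == 100);

  /* 42u - 43u wraps to 4294967295 before the widening conversion,
     so the result is not 43.  */
  assert (fn1_wrapping (43) == INT64_C (42) - INT64_C (4294967295));
  assert (fn1_wrapping (43) != 43);
}
```

Calling check_fn1 () from a test driver runs the assertions. This only shows the values involved; it says nothing about which trees match.pd would actually match.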
> Thanks,
> Victor
>
>
> From: Richard Biener
> Sent: Monday, June 7, 2021 1:25 AM
> To: Victor Tong
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
>
> On Wed, Jun 2, 2021 at 10:55 PM Victor Tong wrote:
> >
> > Hi Richard,
> >
> > Thanks for reviewing my patch. I did a search online and you're right -- there isn't a vector modulo instruction. I'll remove the X * (Y / X) --> Y - (Y % X) pattern and the existing X - (X / Y) * Y --> X % Y from triggering on vector types.
> >
> > I looked into why the following pattern isn't triggering:
> >
> >   (simplify
> >    (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
> >    (view_convert @1))
> >
> > The nop_converts expand into tree_nop_conversion_p checks. In fn2() of the testsuite/gcc.dg/fold-minus-6.c, the expression during generic matching looks like:
> >
> > 42 - (long int) (42 - 42 % x)
> >
> > When looking at the right-hand side of the expression (the (long int) (42 - 42 % x)), the tree_nop_conversion_p check fails because of the type precision difference. The expression inside of the cast has a 32-bit precision and the outer expression has a 64-bit precision.
> >
> > I looked around at other patterns and it seems like nop_convert and view_convert are used because of underflow/overflow concerns. I'm not familiar with the two constructs. What's the difference between using them and checking TYPE_OVERFLOW_UNDEFINED? In the scenario above, since TYPE_OVERFLOW_UNDEFINED is true, the second pattern that I added (X - (X - Y) --> Y) gets triggered.
>
> But TYPE_OVERFLOW_UNDEFINED is not a good condition here since the
> conversion is the problematic one and
> conversions have implementation defined behavior.  Now, the above does
> not match because it wasn't designed to,
> and for non-constant '42' it would have needed a (convert ...) around
> the first @0 as well (matching of constants is
> by value, not by value + type).
>
> That said, your
>
> +/* X - (X - Y) --> Y */
> +(simplify
> + (minus (convert1? @0) (convert2? (minus @@0 @1)))
> + (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) && TYPE_OVERFLOW_UNDEFINED(type))
> +  (convert @1)))
>
> would match (int)x - (int)(x - y) where you assert the outer subtract
> has undefined behavior
> on overflow but the inner subtract could wrap and the (int) conversion
> can be truncating
> or widening.  Is that really always a valid transform then?
>
> Richard.
>
> > Thanks,
> > Victor
> >
> >
> > From: Richard Biener
> > Sent: Tuesday, April 27, 2021 1:29 AM
> > To: Victor Tong
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
> >
> > On Thu, Apr 1, 2021 at 1:03 AM Victor Tong via Gcc-patches
> > wrote:
> > >
> > > Hello,
> > >
> > > This patch fixes PR tree-optimization/95176. A new pattern in match.pd was added to transform "a * (b / a)" --> "b - (b % a)". A new test case was also added to cover this scenario.
> > >
> > > The new pattern interfered with the existing pattern of "X - (X / Y) * Y". In some cases (such as in fn4() in gcc/testsuite/gcc.dg/fold-minus-6.c), the new pattern is applied causing the existing pattern to no longer apply. This results in worse code generation because the expression is left as "X - (X - Y)". An additional subtraction pattern of "X - (X - Y) --> Y" was added to this patch to avoid this regression.
> > >
> > > I also didn't remove the existing pattern because it triggered in more cases than the new pattern because of a tree_invariant_p check that's inserted by genmatch for the new pattern.
> >
> > Yes, we do not handle using Y multiple times when it might contain
> > side-effects in GENERIC folding
> > (comments in genmatch suggest we can use save_expr but we don't
> > implement this [anymore]).
> >
> > On GIMPLE there's also the issue that your new pattern creates a
> > complex expression which
> > makes it fail to be used by value-numbering for example where the
> > old pattern was OK
> > (eventually, if no conversion was required).
> >
> > So indeed it looks OK to preserve both.
> >
> > I wonder why you needed the
> >
> > +/* X - (X - Y) --> Y */
> > +(simplify
> > + (minus (convert1? @0) (convert2? (minus @@0 @1)))
> > + (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) && TYPE_OVERFLOW_UNDEFINED(type))
> > +  (convert @1)))
> >
> > pattern since it should be handled by
> >
> >  /* Match patterns that allow contracting a plus-minus pair
> >     irrespective of overflow issues.  */
> >  /* (A +- B) - A       ->  +- B */
> >  /* (A +- B) -+ B      ->  A */
> >  /* A - (A +- B)       -> -+ B */
> >  /* A +- (B -+ A)      ->  +- B */
> >
> > in particular
> >
> >   (simplify
> >    (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
> >    (view_convert @1))
> >
> > if there's supported cases missing I'd rather extend this pattern than
> > replicating it.
> >
> > +/* X * (Y / X) is the same as Y - (Y % X).  */
> > +(simplify
> > + (mult:c (convert1? @0) (convert2? (trunc_div @1 @@0)))
> > + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
> > +  (minus (convert @1) (convert (trunc_mod @1 @0)))))
> >
> > note that if you're allowing vector types you have to use
> > (view_convert ...) in the
> > transform and you also need to make sure that the target can expand
> > the modulo - I suspect that's an issue with the existing pattern as well.
> > I don't know of any vector ISA that supports modulo (or integer
> > division, that is).
> > Restricting the patterns to integer types is probably the most
> > sensible solution.
> >
> > Thanks,
> > Richard.
> >
> > > I verified that all "make -k check" tests pass when targeting x86_64-pc-linux-gnu.
> > >
> > > 2021-03-31 Victor Tong
> > >
> > > gcc/ChangeLog:
> > >
> > >         * match.pd: Two new patterns: One to optimize division followed by multiply and the other to avoid a regression as explained above
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.dg/tree-ssa/20030807-10.c: Update existing test to look for a subtraction because a shift is no longer emitted
> > >         * gcc.dg/pr95176.c: New test to cover optimizing division followed by multiply
> > >
> > > I don't have write access to the GCC repo but I've completed the FSF paperwork as I plan to make more contributions in the future. I'm looking for a sponsorship from an existing GCC maintainer before applying for write access.
> > >
> > > Thanks,
> > > Victor
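
For reference, a small sketch of the source-level identity that the original patch relies on, "a * (b / a)" --> "b - (b % a)". It is not code from the patch or from gcc.dg/pr95176.c; mul_of_div, sub_of_mod and check_identity are hypothetical names. It assumes a != 0 and avoids the INT_MIN / -1 case, where the division itself is undefined.

```c
#include <assert.h>

/* Left-hand side of the transform: a * (b / a), truncating division.  */
int mul_of_div (int a, int b)
{
  return a * (b / a);
}

/* Right-hand side of the transform: b - (b % a).  */
int sub_of_mod (int a, int b)
{
  return b - (b % a);
}

/* Hypothetical helper: compare both forms over a small range of
   operands, skipping a == 0 where division is undefined.  */
void check_identity (void)
{
  for (int a = -9; a <= 9; a++)
    {
      if (a == 0)
        continue;
      for (int b = -50; b <= 50; b++)
        assert (mul_of_div (a, b) == sub_of_mod (a, b));
    }
}
```

The two forms agree because C defines (b / a) * a + b % a == b whenever b / a is representable; the sketch checks values only and does not look at the code GCC generates for either form.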