From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.guenther@gmail.com>
MIME-Version: 1.0
References: <CY4PR2101MB0801A8FFC0954BA013219E76CC7C9@CY4PR2101MB0801.namprd21.prod.outlook.com>
 <CAFiYyc1NJXgedUqSzDquv0odhZdcfP1zepSnOP27DSEg3eDy3g@mail.gmail.com>
 <MWHPR2101MB0811F770CA794056CA891FD5CC3D9@MWHPR2101MB0811.namprd21.prod.outlook.com>
 <CAFiYyc3DV6XrnQuU3LywpasS6BWoxC9-+4CDoEFXWSynmF92ww@mail.gmail.com>
 <MWHPR2101MB08118007CBC33CDFA4BA09FBCC0F9@MWHPR2101MB0811.namprd21.prod.outlook.com>
In-Reply-To: <MWHPR2101MB08118007CBC33CDFA4BA09FBCC0F9@MWHPR2101MB0811.namprd21.prod.outlook.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Fri, 18 Jun 2021 11:43:56 +0200
Message-ID: <CAFiYyc3BVkKOMiVGPFQVOqp+YEKPZKeN68nfe72gmbj6o_M25Q@mail.gmail.com>
Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division
 followed by multiply [PR95176]
To: Victor Tong <vitong@microsoft.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
 Marc Glisse <marc.glisse@inria.fr>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Jun 2021 09:44:10 -0000

On Wed, Jun 16, 2021 at 8:49 PM Victor Tong <vitong@microsoft.com> wrote:
>
> Hi Richard,
>
> Thanks for the feedback. From what you said, I can think of two possible solutions (though I'm not sure if either is feasible/fully correct):
>
> Option 1: Have the new X * (Y / X) --> Y - (Y % X) optimization only run in scenarios that don't interfere with the existing X - (X / Y) * Y --> X % Y optimization.
>
> This would involve checking the expression one level up to see if there's a subtraction that would trigger the existing optimization. I looked through the match.pd file and couldn't find a bail condition like this. It doesn't seem like there's a link from an expression to its parent expression one level up. This also feels a bit counter-intuitive since it would be doing the opposite of the bottom-up expression matching where the compiler would like to match a larger expression rather than a smaller one.

Yes, that option is not really possible from match.pd.

> Option 2: Add a new pattern to support scenarios that the existing nop_convert pattern bails out on.
>
> Existing pattern:
>
> (simplify
>    (minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3? @@0) @1)))
>    (view_convert @1))
>
> New pattern to add:
>
>   /* X - (X - Y) --> Y */
>   (simplify
>   (minus @0 (convert? (minus @@0 @1)))
>   (if (INTEGRAL_TYPE_P (type)
>         && TYPE_OVERFLOW_UNDEFINED(type)
>         && INTEGRAL_TYPE_P (TREE_TYPE(@1))
>         && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@1))
>         && !TYPE_UNSIGNED (TREE_TYPE (@1))
>         && !TYPE_UNSIGNED (type)
>         && TYPE_PRECISION (TREE_TYPE (@1)) <= TYPE_PRECISION (type))
>     (convert @1)))
>
> I think the truncation concerns that you brought up should be covered if the external expression type precision is greater than or equal to the internal expression type. There may be a sign extension operation (which is why the nop_convert check fails) but that shouldn't affect the value of the expression. And if the types involved are signed integers where overflow/underflow results in undefined behavior, the X - (X - Y) --> Y optimization should be legal.
>
> Please correct me if I'm wrong with either one of these options, or if you can think of a better option to fix the regression.

So to recap, we're looking to simplify 42 - (long int) (42 - 42 % x)
(simplified from gcc.dg/fold-minus-6.c), or
simply (new testcase):

long
fn1 (int x)
{
  return 42L - (long)(42 - x);
}

where the existing pattern does not apply because the conversion is
not a NOP one:

  (simplify
   (minus (nop_convert1? (minus (nop_convert2? @0) @1)) @0)
   (if (!ANY_INTEGRAL_TYPE_P (type)
        || TYPE_OVERFLOW_WRAPS (type))
   (negate (view_convert @1))
   (view_convert (negate @1))))

so let's consider replacing nop_convert1? with convert1? and thus obtain

  (simplify
   (minus (convert1? (minus (nop_convert2? @0) @1)) @0)
   (if (!ANY_INTEGRAL_TYPE_P (type)
        || TYPE_OVERFLOW_WRAPS (type))
   (negate (view_convert @1))
   (view_convert (negate @1))))

Given we still require a matching @0 (as in operand_equal_p), it looks like
a convert1 that is not the inverse of the nop_convert2, and thus also
a nop_convert, is only possible for constants (because operand_equal_p does
not verify type equality).  Now - can we construct any testcase for which
this conversion would be wrong?
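
A minimal C sketch of the arithmetic behind that question; it is not taken
from the thread.  fn1 is the testcase above, with int64_t standing in for
long on an LP64 target.  fn1_wrapping and folded are hypothetical names:
the first does the inner subtraction in a wrapping unsigned 32-bit type
before widening, the second is what an unconditional X - (X - Y) --> Y
rewrite would return.  Whether the proposed pattern would actually match
such an input in GCC is not checked here; the sketch only shows that the
two values can differ once the inner subtraction wraps.

```c
#include <assert.h>
#include <stdint.h>

/* The new testcase: signed 32-bit inner subtraction, widened to
   64 bits before the outer subtraction.  */
int64_t
fn1 (int32_t x)
{
  return INT64_C (42) - (int64_t) (42 - x);
}

/* Hypothetical variant: the inner subtraction wraps modulo 2^32
   before it is zero-extended to 64 bits.  */
int64_t
fn1_wrapping (uint32_t x)
{
  return INT64_C (42) - (int64_t) (uint32_t) (UINT32_C (42) - x);
}

/* What an unconditional X - (X - Y) --> Y rewrite would produce.  */
int64_t
folded (uint32_t x)
{
  return (int64_t) x;
}

/* Illustrative checks for a few chosen inputs.  */
void
check_examples (void)
{
  /* Signed case, no inner overflow: the rewrite gives the same value.  */
  assert (fn1 (5) == 5);
  assert (fn1 (-7) == -7);

  /* Unsigned case without wrap-around still agrees.  */
  assert (fn1_wrapping (42) == folded (42));

  /* 42u - 43u wraps to 4294967295, so the widened result differs.  */
  assert (fn1_wrapping (43) == INT64_C (42) - INT64_C (4294967295));
  assert (fn1_wrapping (43) != folded (43));
}
```

So for a widening conversion the inner type's overflow behavior matters: as
long as the inner subtraction cannot wrap, the values agree in these checks.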

Richard.

> Thanks,
> Victor
>
>
>
>
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Monday, June 7, 2021 1:25 AM
> To: Victor Tong <vitong@microsoft.com>
> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
>
> On Wed, Jun 2, 2021 at 10:55 PM Victor Tong <vitong@microsoft.com> wrote:
> >
> > Hi Richard,
> >
> > Thanks for reviewing my patch. I did a search online and you're right -- there isn't a vector modulo instruction. I'll stop the X * (Y / X) --> Y - (Y % X) pattern and the existing X - (X / Y) * Y --> X % Y pattern from triggering on vector types.
> >
> > I looked into why the following pattern isn't triggering:
> >
> >   (simplify
> >    (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
> >    (view_convert @1))
> >
> > The nop_converts expand into tree_nop_conversion_p checks. In fn2() of the testsuite/gcc.dg/fold-minus-6.c, the expression during generic matching looks like:
> >
> > 42 - (long int) (42 - 42 % x)
> >
> > When looking at the right-hand side of the expression (the (long int) (42 - 42 % x)), the tree_nop_conversion_p check fails because of the type precision difference. The expression inside of the cast has a 32-bit precision and the outer expression has a 64-bit precision.
> >
> > I looked around at other patterns and it seems like nop_convert and view_convert are used because of underflow/overflow concerns. I'm not familiar with the two constructs. What's the difference between using them and checking TYPE_OVERFLOW_UNDEFINED? In the scenario above, since TYPE_OVERFLOW_UNDEFINED is true, the second pattern that I added (X - (X - Y) --> Y) gets triggered.
>
> But TYPE_OVERFLOW_UNDEFINED is not a good condition here since the
> conversion is the problematic one and
> conversions have implementation defined behavior.  Now, the above does
> not match because it wasn't designed to,
> and for non-constant '42' it would have needed a (convert ...) around
> the first @0 as well (matching of constants is
> by value, not by value + type).
>
> That said, your
>
> +/* X - (X - Y) --> Y */
> +(simplify
> + (minus (convert1? @0) (convert2? (minus @@0 @1)))
> + (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) &&
> TYPE_OVERFLOW_UNDEFINED(type))
> +  (convert @1)))
>
> would match (int)x - (int)(x - y) where you assert the outer subtract
> has undefined behavior
> on overflow but the inner subtract could wrap and the (int) conversion
> can be truncating
> or widening.  Is that really always a valid transform then?
>
> Richard.
>
> > Thanks,
> > Victor
> >
> >
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: Tuesday, April 27, 2021 1:29 AM
> > To: Victor Tong <vitong@microsoft.com>
> > Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> > Subject: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
> >
> > On Thu, Apr 1, 2021 at 1:03 AM Victor Tong via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Hello,
> > >
> > > This patch fixes PR tree-optimization/95176. A new pattern in match.pd was added to transform "a * (b / a)" --> "b - (b % a)". A new test case was also added to cover this scenario.
> > >
> > > The new pattern interfered with the existing pattern of "X - (X / Y) * Y". In some cases (such as in fn4() in gcc/testsuite/gcc.dg/fold-minus-6.c), the new pattern is applied causing the existing pattern to no longer apply. This results in worse code generation because the expression is left as "X - (X - Y)". An additional subtraction pattern of "X - (X - Y) --> Y" was added to this patch to avoid this regression.
> > >
> > > I also didn't remove the existing pattern because it triggered in more cases than the new pattern because of a tree_invariant_p check that's inserted by genmatch for the new pattern.
> >
> > Yes, we do not handle using Y multiple times when it might contain
> > side-effects in GENERIC folding
> > (comments in genmatch suggest we can use save_expr but we don't
> > implement this [anymore]).
> >
> > On GIMPLE there's also the issue that your new pattern creates a
> > complex expression, which
> > makes it fail to be used by value-numbering, for example, where the
> > old pattern was OK
> > (eventually, if no conversion was required).
> >
> > So indeed it looks OK to preserve both.
> >
> > I wonder why you needed the
> >
> > +/* X - (X - Y) --> Y */
> > +(simplify
> > + (minus (convert1? @0) (convert2? (minus @@0 @1)))
> > + (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) &&
> > TYPE_OVERFLOW_UNDEFINED(type))
> > +  (convert @1)))
> >
> > pattern since it should be handled by
> >
> >   /* Match patterns that allow contracting a plus-minus pair
> >      irrespective of overflow issues.  */
> >   /* (A +- B) - A       ->  +- B */
> >   /* (A +- B) -+ B      ->  A */
> >   /* A - (A +- B)       -> -+ B */
> >   /* A +- (B -+ A)      ->  +- B */
> >
> > in particular
> >
> >   (simplify
> >    (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
> >    (view_convert @1))
> >
> > if there are supported cases missing I'd rather extend this pattern than
> > replicate it.
> >
> > +/* X * (Y / X) is the same as Y - (Y % X).  */
> > +(simplify
> > + (mult:c (convert1? @0) (convert2? (trunc_div @1 @@0)))
> > + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
> > +  (minus (convert @1) (convert (trunc_mod @1 @0)))))
> >
> > note that if you're allowing vector types you have to use
> > (view_convert ...) in the
> > transform and you also need to make sure that the target can expand
> > the modulo - I suspect that's an issue with the existing pattern as well.
> > I don't know of any vector ISA that supports modulo (or integer
> > division, that is).
> > Restricting the patterns to integer types is probably the most
> > sensible solution.
> >
> > Thanks,
> > Richard.
> >
> > > I verified that all "make -k check" tests pass when targeting x86_64-pc-linux-gnu.
> > >
> > > 2021-03-31  Victor Tong  <vitong@microsoft.com>
> > >
> > > gcc/ChangeLog:
> > >
> > >         * match.pd: Two new patterns: One to optimize division followed by multiply and the other to avoid a regression as explained above
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.dg/tree-ssa/20030807-10.c: Update existing test to look for a subtraction because a shift is no longer emitted
> > >         * gcc.dg/pr95176.c: New test to cover optimizing division followed by multiply
> > >
> > > I don't have write access to the GCC repo but I've completed the FSF paperwork as I plan to make more contributions in the future. I'm looking for a sponsorship from an existing GCC maintainer before applying for write access.
> > >
> > > Thanks,
> > > Victor