From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.guenther@gmail.com>
Received: from mail-qk1-x732.google.com (mail-qk1-x732.google.com
 [IPv6:2607:f8b0:4864:20::732])
 by sourceware.org (Postfix) with ESMTPS id 6527E38582B4
 for <gcc-patches@gcc.gnu.org>; Wed, 15 Jun 2022 08:25:11 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6527E38582B4
Received: by mail-qk1-x732.google.com with SMTP id p63so8181860qkd.10
 for <gcc-patches@gcc.gnu.org>; Wed, 15 Jun 2022 01:25:11 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=niIxVkMUC/vrkrU9NgbViks+SCMXrE6nh4iaTLvpR1k=;
 b=o8fRLgV7qYz+/jso+hsI34Z/oqj0lAlSwVU7tHCmIHW3A1G65Q5H4WkfSA1yq6WFRB
 fUrn+POWGyOm5RH9gjstQrAa/iUx4CLUCSLTh2tYaowT+lnxba2/cVLlJVbF2XjScZzD
 xZ1q9g0+NeX91ZpYINxveejVRcro5BUScQiOkDZschWQEf1ezMuXpLGY2/LwyyCDsG26
 dMYd14p7Dgx/pYIM0ANOSi0wndMiwLyLCQw2ourUXvn9/BGR9KqAPCDdMDaaIAQFocS0
 0MLSXduKaD4BVV/u7w7Svt1oQbLqEs4E1FCqTEZYbGybBJUUzDWFxgPPKdPY67/NZfHW
 1iSA==
X-Gm-Message-State: AOAM531TvEXeN/hHa1VpcWG4eUxx/4u2n9iSpDb7IujSmUHI0GQ2OVC4
 yUFvxTv2qgRK3WW4IM/qhxZ1UhWKOY8HtpJSakY=
X-Google-Smtp-Source: ABdhPJxGU9Ukt+ivijm4r0v16BJeXgf8vx+EKpH2cupIqbyRTNvgj08Z32Yzflx7ceyfays8tcTezgIoXx9JlFXoYcg=
X-Received: by 2002:a37:a252:0:b0:6a9:756d:d16d with SMTP id
 l79-20020a37a252000000b006a9756dd16dmr3951656qke.350.1655281510605; Wed, 15
 Jun 2022 01:25:10 -0700 (PDT)
MIME-Version: 1.0
References: <CAFiYyc1bphpwz4dQ91pE44MEohC=n+Ju0ffG9Cvu2pmcrxj+ow@mail.gmail.com>
 <20220607075431.24764-1-hongtao.liu@intel.com>
In-Reply-To: <20220607075431.24764-1-hongtao.liu@intel.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Wed, 15 Jun 2022 10:24:59 +0200
Message-ID: <CAFiYyc3tJrK4wRjKwW-aSaALMDOY1p74oWtb6CMvg8b5PWWs9g@mail.gmail.com>
Subject: Re: [PATCH] Simplify (B * v + C) * D -> BD* v + CD when B, C,
 D are all INTEGER_CST.
To: liuhongt <hongtao.liu@intel.com>, Andrew MacLeod <amacleod@redhat.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
X-Spam-Status: No, score=-8.1 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0,
 KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 15 Jun 2022 08:25:14 -0000

On Tue, Jun 7, 2022 at 9:54 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> >> + (mult:c (plus:c@4 (mult:c@5 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>
> >since canonicalization puts INTEGER_CSTs last the :c should not be necessary.
>
> Changed.
>
> >> + (if (single_use (@4)
> >> +      && single_use (@5))
>
> >since the resulting expression is not simple using :s instead of
> >single_use (..) should
> >work as well.
>
> Changed.
>
> > when we go from (a + CST1) * CST2 to a * CST2 + CST1*CST2 we have
> > to worry about CST1 == -a which would make (a+CST1) * INT_MAX
> > not overflow but a * INT_MAX + CST1 * INT_MAX might.  Is the
> > overflow check for CST1 * INT_MAX sufficient to rule out
> > that a * CST2 does not overflow when (a + CST1) * CST2 does not
> > overflow?  Consider a == 2, CST1 == -1, CST2 == INT_MAX,
> > here 1 * INT_MAX does not overflow, nor does -1 * INT_MAX, but
> > 2 * INT_MAX overflows and thus the resulting expression invokes
> > undefined behavior.
> >
> > The same issue probably arises for the first pattern outer half
> > which looks like (a' + CST2) * CST3 with a' = a * CST1?
> >
> > The appropriate solution might be to perform the arithmetic
> > in an unsigned type with the implication that has on value-range
> > analysis.
>
> Yes, the patch patched based on value-range analysis.
>
> Update the patch.
>
> Similar for (v + B) * C + D -> C * v + BCD.
> Don't simplify it when there's overflow and overflow is UB for type v.
>
> gcc/ChangeLog:
>
>         PR tree-optimization/53533
>         * match.pd: Simplify (B * v + C) * D -> BD * v + CD and
>         (v + B) * C + D -> C * v + BCD when B,C,D are all INTEGER_CST,
>         and there's no overflow or !TYPE_OVERFLOW_UNDEFINED.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/pr53533-1.c: New test.
>         * gcc.target/i386/pr53533-2.c: New test.
>         * gcc.target/i386/pr53533-3.c: New test.
>         * gcc.target/i386/pr53533-4.c: New test.
>         * gcc.target/i386/pr53533-5.c: New test.
>         * gcc.dg/vect/slp-11a.c: Adjust testcase.
> ---
>  gcc/match.pd                              | 82 +++++++++++++++++++++++
>  gcc/testsuite/gcc.dg/vect/slp-11a.c       | 10 +--
>  gcc/testsuite/gcc.target/i386/pr53533-1.c | 23 +++++++
>  gcc/testsuite/gcc.target/i386/pr53533-2.c | 46 +++++++++++++
>  gcc/testsuite/gcc.target/i386/pr53533-3.c | 24 +++++++
>  gcc/testsuite/gcc.target/i386/pr53533-4.c | 46 +++++++++++++
>  gcc/testsuite/gcc.target/i386/pr53533-5.c | 22 ++++++
>  7 files changed, 248 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr53533-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr53533-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr53533-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr53533-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr53533-5.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 44a385b912d..54f53a1f988 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -489,6 +489,88 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (if (!overflow || TYPE_OVERFLOW_WRAPS (type))
>     (mult @0 { wide_int_to_tree (type, mul); }))))
>
> +/* Similar to above, but there could be an extra add/sub between
> +   successive multuiplications.  */
> +(simplify
> + (mult (plus:s (mult:s@4 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
> + (with {
> +   bool overflowed = true;
> +   wi::overflow_type ovf1, ovf2;
> +   wide_int mul = wi::mul (wi::to_wide (@1), wi::to_wide (@3),
> +                          TYPE_SIGN (type), &ovf1);
> +   wide_int add = wi::mul (wi::to_wide (@2), wi::to_wide (@3),
> +                          TYPE_SIGN (type), &ovf2);
> +  if (TYPE_OVERFLOW_UNDEFINED (type))
> +    {
> +#if GIMPLE
> +      value_range vr0;
> +      if (ovf1 == wi::OVF_NONE && ovf2 == wi::OVF_NONE
> +         && get_global_range_query ()->range_of_expr (vr0, @4)
> +         && vr0.kind () == VR_RANGE)
> +       {
> +         wide_int wmin0 = vr0.lower_bound ();
> +         wide_int wmax0 = vr0.upper_bound ();
> +         wmin0 = wi::mul (wmin0, wi::to_wide (@3), TYPE_SIGN (type), &ovf1);
> +         wmax0 = wi::mul (wmax0, wi::to_wide (@3), TYPE_SIGN (type), &ovf2);
> +         if (ovf1 == wi::OVF_NONE && ovf2 == wi::OVF_NONE)
> +           {
> +             wi::add (wmin0, add, TYPE_SIGN (type), &ovf1);
> +             wi::add (wmax0, add, TYPE_SIGN (type), &ovf2);
> +             if (ovf1 == wi::OVF_NONE && ovf2 == wi::OVF_NONE)
> +               overflowed = false;

I wonder if we have an API available for these kind of queries already?
Possibly value_range expr_range (MULT_EXPR, vr0, vr1, &ovf),
computing the result range of the operation on ranges vr0 and vr1
indicating overflow behavior in 'ovf' (similar as to the wide_int APIs)?

Andrew?

I see match.pd already has similar code in a few places.

> +           }
> +       }
> +#endif
> +    }
> +  else
> +   overflowed = false;
> + }
> +  /* Skip folding on overflow.  */
> +  (if (!overflowed)
> +   (plus (mult @0 { wide_int_to_tree (type, mul); })
> +        { wide_int_to_tree (type, add); }))))
> +
> +/* Similar to above, but a multiplication between successive additions.  */
> +(simplify
> + (plus (mult:s (plus:s @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
> + (with {
> +   bool overflowed = true;
> +   wi::overflow_type ovf1;
> +   wi::overflow_type ovf2;
> +   wide_int mul = wi::mul (wi::to_wide (@1), wi::to_wide (@2),
> +                          TYPE_SIGN (type), &ovf1);
> +   wide_int add = wi::add (mul, wi::to_wide (@3),
> +                          TYPE_SIGN (type), &ovf2);
> +  if (TYPE_OVERFLOW_UNDEFINED (type))
> +    {
> +#if GIMPLE
> +      value_range vr0;
> +      if (ovf1 == wi::OVF_NONE && ovf2 == wi::OVF_NONE
> +         && get_global_range_query ()->range_of_expr (vr0, @0)
> +         && vr0.kind () == VR_RANGE)
> +       {
> +         wide_int wmin0 = vr0.lower_bound ();
> +         wide_int wmax0 = vr0.upper_bound ();
> +         wmin0 = wi::mul (wmin0, wi::to_wide (@2), TYPE_SIGN (type), &ovf1);
> +         wmax0 = wi::mul (wmax0, wi::to_wide (@2), TYPE_SIGN (type), &ovf2);
> +         if (ovf1 == wi::OVF_NONE && ovf2 == wi::OVF_NONE)
> +           {
> +             wi::add (wmin0, mul, TYPE_SIGN (type), &ovf1);
> +             wi::add (wmax0, mul, TYPE_SIGN (type), &ovf2);
> +             if (ovf1 == wi::OVF_NONE && ovf2 == wi::OVF_NONE)
> +               overflowed = false;
> +           }
> +       }
> +#endif
> +    }
> +  else
> +   overflowed = false;
> +

unnecessary vertical space

OK with that fixed.

Thanks,
Richard.

> + }
> +  /* Skip folding on overflow.  */
> +  (if (!overflowed)
> +   (plus (mult @0 @2) { wide_int_to_tree (type, add); }))))
> +
>  /* Optimize A / A to 1.0 if we don't care about
>     NaNs or Infinities.  */
>  (simplify
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-11a.c b/gcc/testsuite/gcc.dg/vect/slp-11a.c
> index bcd3c861ca4..e6632fa77be 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-11a.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-11a.c
> @@ -9,14 +9,14 @@ int
>  main1 ()
>  {
>    int i;
> -  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
> -  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
> +  int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
> +  int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
>
>    /* Different operations - not SLPable.  */
>    for (i = 0; i < N; i++)
>      {
>        a0 = in[i*8] + 5;
> -      a1 = in[i*8 + 1] * 6;
> +      a1 = in[i*8 + 1] * 51072;
>        a2 = in[i*8 + 2] + 7;
>        a3 = in[i*8 + 3] + 8;
>        a4 = in[i*8 + 4] + 9;
> @@ -25,7 +25,7 @@ main1 ()
>        a7 = in[i*8 + 7] + 12;
>
>        b0 = a0 * 3;
> -      b1 = a1 * 2;
> +      b1 = a1 * 51072;
>        b2 = a2 * 12;
>        b3 = a3 * 5;
>        b4 = a4 * 8;
> @@ -47,7 +47,7 @@ main1 ()
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
> -         || out[i*8 + 1] != (in[i*8 + 1] * 6) * 2 - 3
> +         || out[i*8 + 1] != (in[i*8 + 1] * 51072) * 51072 - 3
>           || out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
>           || out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
>           || out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
> diff --git a/gcc/testsuite/gcc.target/i386/pr53533-1.c b/gcc/testsuite/gcc.target/i386/pr53533-1.c
> new file mode 100644
> index 00000000000..095de665366
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr53533-1.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1" } */
> +/* { dg-final { scan-assembler-times "imull\[ \t\]" "1" } } */
> +/* { dg-final { scan-assembler-times "(?:addl|subl)\[ \t\]" "1" { target { ! ia32 } } } } */
> +
> +void
> +__attribute__((noipa))
> +foo (unsigned a[256], unsigned b[256])
> +{
> +  int i;
> +  for (i = 0; i < 256; ++i)
> +    {
> +      unsigned tmp = a[i] + 12345U;
> +      tmp *= 914237U;
> +      tmp += 12332U;
> +      tmp *= 914237U;
> +      tmp += 12332U;
> +      tmp *= 914237U;
> +      tmp -= 13U;
> +      tmp *= 8000U;
> +      b[i] = tmp;
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr53533-2.c b/gcc/testsuite/gcc.target/i386/pr53533-2.c
> new file mode 100644
> index 00000000000..c31b6ff4dec
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr53533-2.c
> @@ -0,0 +1,46 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +#include "pr53533-1.c"
> +
> +void
> +__attribute__((optimize("-O0")))
> +foo1 (unsigned a[256], unsigned b[256])
> +{
> +  int i;
> +  for (i = 0; i < 256; ++i)
> +    {
> +      unsigned tmp = a[i] + 12345U;
> +      tmp *= 914237U;
> +      tmp += 12332U;
> +      tmp *= 914237U;
> +      tmp += 12332U;
> +      tmp *= 914237U;
> +      tmp -= 13U;
> +      tmp *= 8000U;
> +      b[i] = tmp;
> +    }
> +}
> +
> +int main()
> +{
> +  unsigned int a[256];
> +  unsigned int b[256];
> +  unsigned int c[256];
> +  for (unsigned int i = 0; i != 256; i++)
> +    {
> +      b[i] = 0;
> +      c[i] = 1;
> +      a[i] = i * i - 10 * i + 33;
> +    }
> +  foo (a, b);
> +  foo1 (a, c);
> +
> +  for (unsigned int i = 0; i != 256; i++)
> +    {
> +      if (b[i] != c[i])
> +       __builtin_abort ();
> +    }
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr53533-3.c b/gcc/testsuite/gcc.target/i386/pr53533-3.c
> new file mode 100644
> index 00000000000..3b260d134e9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr53533-3.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fwrapv" } */
> +/* { dg-final { scan-assembler-times "imull\[ \t\]" "1" } } */
> +/* { dg-final { scan-assembler-times "(?:addl|subl)\[ \t\]" "1" { target { ! ia32 } } } } */
> +
> +void
> +__attribute__((noipa))
> +foo (int a[256], int b[256])
> +{
> +  int i;
> +  for (i = 0; i < 256; ++i)
> +    {
> +      int tmp = a[i] + 12345;
> +      tmp *= 914237;
> +      tmp += 12332;
> +      tmp *= 914237;
> +      tmp += 12332;
> +      tmp *= 914237;
> +      tmp -= 13;
> +      tmp *= 8000;
> +      b[i] = tmp;
> +    }
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/i386/pr53533-4.c b/gcc/testsuite/gcc.target/i386/pr53533-4.c
> new file mode 100644
> index 00000000000..c29f90a44dc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr53533-4.c
> @@ -0,0 +1,46 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fwrapv" } */
> +
> +#include "pr53533-3.c"
> +
> +void
> +__attribute__((optimize("-O0")))
> +foo1 (int a[256], int b[256])
> +{
> +  int i;
> +  for (i = 0; i < 256; ++i)
> +    {
> +      int tmp = a[i] + 12345;
> +      tmp *= 914237;
> +      tmp += 12332;
> +      tmp *= 914237;
> +      tmp += 12332;
> +      tmp *= 914237;
> +      tmp -= 13;
> +      tmp *= 8000;
> +      b[i] = tmp;
> +    }
> +}
> +
> +int main()
> +{
> +  int a[256];
> +  int b[256];
> +  int c[256];
> +  for (int i = 0; i != 256; i++)
> +    {
> +      b[i] = 0;
> +      c[i] = 1;
> +      a[i] = i * i - 10 * i + 33;
> +    }
> +  foo (a, b);
> +  foo1 (a, c);
> +
> +  for (unsigned int i = 0; i != 256; i++)
> +    {
> +      if (b[i] != c[i])
> +       __builtin_abort ();
> +    }
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr53533-5.c b/gcc/testsuite/gcc.target/i386/pr53533-5.c
> new file mode 100644
> index 00000000000..56fe4f44064
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr53533-5.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times {(?n)\* 2147483647} 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)\* 1073741823} 1 "optimized" } } */
> +
> +#define INT_MAX 2147483647
> +int
> +foo (int a)
> +{
> +  /* When a == -2, there's no overflow for (a + 1) * INT_MAX - 1.
> +     but overflow for a * INT_MAX + (INT_MAX - 1).
> +     Don't simpify it.  */
> +  return (a + 1) * INT_MAX - 1;
> +}
> +
> +int
> +foo1 (int a)
> +{
> +  /* Be conservative here, don't simplify this as long as
> +     a * 2147483646 may overflow.  */
> +  return 1073741823 * (a * 2 + 1);
> +}
> --
> 2.18.1
>