From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <20240612123753.201660-1-pan2.li@intel.com> <CAFiYyc33kHuCLnc5jw-9gQCfivZ1UtN9PMRjhq9TvX4y-RGN9g@mail.gmail.com>
 <MW5PR11MB5908D3243062E304526A51DDA9CF2@MW5PR11MB5908.namprd11.prod.outlook.com>
In-Reply-To: <MW5PR11MB5908D3243062E304526A51DDA9CF2@MW5PR11MB5908.namprd11.prod.outlook.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Wed, 19 Jun 2024 10:00:08 +0200
Message-ID: <CAFiYyc1L8R=Kh7RYwh3ExLbfTMvC-o2Bhks3ZMg2jiWhDGx47w@mail.gmail.com>
Subject: Re: [PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB
To: "Li, Pan2" <pan2.li@intel.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>, 
	"kito.cheng@gmail.com" <kito.cheng@gmail.com>, "jeffreyalaw@gmail.com" <jeffreyalaw@gmail.com>, 
	"rdapp.gcc@gmail.com" <rdapp.gcc@gmail.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
List-Id: <gcc-patches.gcc.gnu.org>

On Wed, Jun 19, 2024 at 9:37 AM Li, Pan2 <pan2.li@intel.com> wrote:
>
> Hi Richard,
>
> Given almost all unsigned SAT_ADD/SAT_SUB patches are merged, I revisit the original code pattern aka zip benchmark.
> It may look like below:
>
> void test (uint16_t *x, uint16_t *y, unsigned wsize, unsigned count)
> {
>   unsigned m = 0, n = count;
>   register uint16_t *p;
>
>   p = x;
>
>   do {
>     m = *--p;
>
>     *p = (uint16_t)(m >= wsize ? m-wsize : 0); // There will be a conversion here.
>   } while (--n);
> }
>
> And we can have 179 tree pass as below:
>
>   <bb 3> [local count: 1073741824]:
>   # n_3 = PHI <n_15(7), count_7(D)(15)>
>   # p_4 = PHI <p_10(7), x_8(D)(15)>
>   p_10 = p_4 + 18446744073709551614;
>   _1 = *p_10;
>   m_11 = (unsigned int) _1;
>   _2 = m_11 - wsize_12(D);
>   iftmp.0_13 = (short unsigned int) _2;
>   _18 = m_11 >= wsize_12(D);
>   iftmp.0_5 = _18 ? iftmp.0_13 : 0;
>   *p_10 = iftmp.0_5;
>
> The above form doesn't hit any form we have supported in match.pd. Then I have one idea that to convert
>
> uint16 d, tmp;
> uint32 a, b, m;
>
> m = a - b;
> tmp = (uint16)m;
> d = a >= b ? tmp : 0;
>
> to
>
> d = (uint16)(.SAT_SUB (a, b));

The key here is to turn this into

 m = a - b;
 tmp = a >= b ? m : 0;
 d = (uint16) tmp;

I guess?  We probably have the reverse transform, turning
(uint16) (a ? b : c) into a ? (uint16) b : (uint16) c if either arm
simplifies.

OTOH, if you figure out the correct rules for the allowed conversions,
adjusting the pattern matching to allow a conversion on the subtract
would also work.
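
As a sanity check on the reordering discussed above, here is a minimal C
sketch (plain C, not GCC internals) comparing the "truncate, then select"
form from the loop with the "select, then truncate" form. All helper names
(sat_sub_u32, truncate_then_select, select_then_truncate,
check_forms_agree) are hypothetical and exist only for illustration; the
sample values are arbitrary, so this is a spot check rather than a proof.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helper (not a GCC API): unsigned saturating
   subtraction in the wide type.  */
static uint32_t sat_sub_u32 (uint32_t a, uint32_t b)
{
  return a >= b ? a - b : 0;
}

/* Form seen in the loop: truncate the difference first, then select.  */
static uint16_t truncate_then_select (uint32_t a, uint32_t b)
{
  uint32_t m = a - b;
  uint16_t tmp = (uint16_t) m;
  return a >= b ? tmp : 0;
}

/* Reordered form: select in the wide type, then truncate once.  */
static uint16_t select_then_truncate (uint32_t a, uint32_t b)
{
  uint32_t m = a - b;
  uint32_t tmp = a >= b ? m : 0;
  return (uint16_t) tmp;
}

/* Compare both forms against (uint16_t) sat_sub_u32 (a, b) on a few
   boundary values; returns the number of (a, b) pairs checked.  */
static int check_forms_agree (void)
{
  static const uint32_t samples[] = {
    0u, 1u, 2u, 0x7fffu, 0xffffu, 0x10000u, 0x12345u, 0xffffffffu
  };
  const int n = (int) (sizeof samples / sizeof samples[0]);
  int checked = 0;

  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      {
        uint32_t a = samples[i], b = samples[j];
        uint16_t expected = (uint16_t) sat_sub_u32 (a, b);
        assert (truncate_then_select (a, b) == expected);
        assert (select_then_truncate (a, b) == expected);
        checked++;
      }
  return checked;
}
```

Calling check_forms_agree () from a small test driver exercises both forms;
the zero arm is unaffected by the truncation, which is why moving the
conversion outside the select does not change the result.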

> I am not very sure it is reasonable to make it work, it may have gimple assignment with convert similar as below (may require the help of vectorize_conversion?).
> Would like to get some hint from you before the next step, thanks a lot.
>
> patt_34 = .SAT_SUB (m_11, wsize_12(D));
> patt_35 = (vector([8,8]) short unsigned int) patt_34;
>
> Pan
>
> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Friday, June 14, 2024 4:05 PM
> To: Li, Pan2 <pan2.li@intel.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; jeffreyalaw@gmail.com; rdapp.gcc@gmail.com
> Subject: Re: [PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB
>
> On Wed, Jun 12, 2024 at 2:38 PM <pan2.li@intel.com> wrote:
> >
> > From: Pan Li <pan2.li@intel.com>
> >
> > After we support the scalar unsigned form 1 and 2,  we would like
> > to introduce more forms include the branch and branchless.  There
> > are forms 3-10 list as below:
> >
> > Form 3:
> >   #define SAT_SUB_U_3(T) \
> >   T sat_sub_u_3_##T (T x, T y) \
> >   { \
> >     return x > y ? x - y : 0; \
> >   }
> >
> > Form 4:
> >   #define SAT_SUB_U_4(T) \
> >   T sat_sub_u_4_##T (T x, T y) \
> >   { \
> >     return x >= y ? x - y : 0; \
> >   }
> >
> > Form 5:
> >   #define SAT_SUB_U_5(T) \
> >   T sat_sub_u_5_##T (T x, T y) \
> >   { \
> >     return x < y ? 0 : x - y; \
> >   }
> >
> > Form 6:
> >   #define SAT_SUB_U_6(T) \
> >   T sat_sub_u_6_##T (T x, T y) \
> >   { \
> >     return x <= y ? 0 : x - y; \
> >   }
> >
> > Form 7:
> >   #define SAT_SUB_U_7(T) \
> >   T sat_sub_u_7_##T (T x, T y) \
> >   { \
> >     T ret; \
> >     T overflow = __builtin_sub_overflow (x, y, &ret); \
> >     return ret & (T)(overflow - 1); \
> >   }
> >
> > Form 8:
> >   #define SAT_SUB_U_8(T) \
> >   T sat_sub_u_8_##T (T x, T y) \
> >   { \
> >     T ret; \
> >     T overflow = __builtin_sub_overflow (x, y, &ret); \
> >     return ret & (T)-(!overflow); \
> >   }
> >
> > Form 9:
> >   #define SAT_SUB_U_9(T) \
> >   T sat_sub_u_9_##T (T x, T y) \
> >   { \
> >     T ret; \
> >     T overflow = __builtin_sub_overflow (x, y, &ret); \
> >     return overflow ? 0 : ret; \
> >   }
> >
> > Form 10:
> >   #define SAT_SUB_U_10(T) \
> >   T sat_sub_u_10_##T (T x, T y) \
> >   { \
> >     T ret; \
> >     T overflow = __builtin_sub_overflow (x, y, &ret); \
> >     return !overflow ? ret : 0; \
> >   }
> >
> > Take form 10 as example:
> >
> > SAT_SUB_U_10(uint8_t);
> >
> > Before this patch:
> > uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y)
> > {
> >   unsigned char _1;
> >   unsigned char _2;
> >   uint8_t _3;
> >   __complex__ unsigned char _6;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _6 = .SUB_OVERFLOW (x_4(D), y_5(D));
> >   _2 = IMAGPART_EXPR <_6>;
> >   if (_2 == 0)
> >     goto <bb 3>; [50.00%]
> >   else
> >     goto <bb 4>; [50.00%]
> > ;;    succ:       3
> > ;;                4
> >
> > ;;   basic block 3, loop depth 0
> > ;;    pred:       2
> >   _1 = REALPART_EXPR <_6>;
> > ;;    succ:       4
> >
> > ;;   basic block 4, loop depth 0
> > ;;    pred:       2
> > ;;                3
> >   # _3 = PHI <0(2), _1(3)>
> >   return _3;
> > ;;    succ:       EXIT
> >
> > }
> >
> > After this patch:
> > uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y)
> > {
> >   uint8_t _3;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _3 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
> >   return _3;
> > ;;    succ:       EXIT
> >
> > }
> >
> > The below test suites are passed for this patch:
> > 1. The rv64gcv fully regression test with newlib.
> > 2. The rv64gcv build with glibc.
> > 3. The x86 bootstrap test.
> > 4. The x86 fully regression test.
> >
> > gcc/ChangeLog:
> >
> >         * match.pd: Add more match for unsigned sat_sub.
> >         * tree-ssa-math-opts.cc (match_unsigned_saturation_sub): Add new
> >         func impl to match phi node for .SAT_SUB.
> >         (math_opts_dom_walker::after_dom_children): Try match .SAT_SUB
> >         for the phi node, MULT_EXPR, BIT_XOR_EXPR and BIT_AND_EXPR.
> >
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> >  gcc/match.pd              | 25 +++++++++++++++++++++++--
> >  gcc/tree-ssa-math-opts.cc | 33 +++++++++++++++++++++++++++++++++
> >  2 files changed, 56 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 5cfe81e80b3..66e411b3359 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3140,14 +3140,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  /* Unsigned saturation sub, case 1 (branch with gt):
> >     SAT_U_SUB = X > Y ? X - Y : 0  */
> >  (match (unsigned_integer_sat_sub @0 @1)
> > - (cond (gt @0 @1) (minus @0 @1) integer_zerop)
> > + (cond^ (gt @0 @1) (minus @0 @1) integer_zerop)
> >   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> >        && types_match (type, @0, @1))))
> >
> >  /* Unsigned saturation sub, case 2 (branch with ge):
> >     SAT_U_SUB = X >= Y ? X - Y : 0.  */
> >  (match (unsigned_integer_sat_sub @0 @1)
> > - (cond (ge @0 @1) (minus @0 @1) integer_zerop)
> > + (cond^ (ge @0 @1) (minus @0 @1) integer_zerop)
> >   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> >        && types_match (type, @0, @1))))
> >
> > @@ -3165,6 +3165,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> >        && types_match (type, @0, @1))))
> >
> > +/* Unsigned saturation sub, case 5 (branchless bit_and with .SUB_OVERFLOW.  */
> > +(match (unsigned_integer_sat_sub @0 @1)
> > + (bit_and:c (realpart (IFN_SUB_OVERFLOW@2 @0 @1))
> > +  (plus:c (imagpart @2) integer_minus_onep))
>
> :c shouldn't be necessary on the plus
>
> > + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> > +      && types_match (type, @0, @1))))
> > +
> > +/* Unsigned saturation sub, case 6 (branchless mult with .SUB_OVERFLOW.  */
> > +(match (unsigned_integer_sat_sub @0 @1)
> > + (mult:c (realpart (IFN_SUB_OVERFLOW@2 @0 @1))
> > +  (bit_xor:c (imagpart @2) integer_onep))
>
> or on the bit_xor
>
> OK with those changes.
>
> Richard.
>
> > + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> > +      && types_match (type, @0, @1))))
> > +
> > +/* Unsigned saturation sub, case 7 (branch with .SUB_OVERFLOW.  */
> > +(match (unsigned_integer_sat_sub @0 @1)
> > + (cond^ (eq (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop)
> > +  (realpart @2) integer_zerop)
> > + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> > +      && types_match (type, @0, @1))))
> > +
> >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> >     x >  y  &&  x == XXX_MIN  -->  false . */
> >  (for eqne (eq ne)
> > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> > index fbb8e0ea306..05aa157611b 100644
> > --- a/gcc/tree-ssa-math-opts.cc
> > +++ b/gcc/tree-ssa-math-opts.cc
> > @@ -4186,6 +4186,36 @@ match_unsigned_saturation_sub (gimple_stmt_iterator *gsi, gassign *stmt)
> >      build_saturation_binary_arith_call (gsi, IFN_SAT_SUB, lhs, ops[0], ops[1]);
> >  }
> >
> > +/*
> > + * Try to match saturation unsigned sub.
> > + *  <bb 2> [local count: 1073741824]:
> > + *  if (x_2(D) > y_3(D))
> > + *    goto <bb 3>; [50.00%]
> > + *  else
> > + *    goto <bb 4>; [50.00%]
> > + *
> > + *  <bb 3> [local count: 536870912]:
> > + *  _4 = x_2(D) - y_3(D);
> > + *
> > + *  <bb 4> [local count: 1073741824]:
> > + *  # _1 = PHI <0(2), _4(3)>
> > + *  =>
> > + *  <bb 4> [local count: 1073741824]:
> > + *  _1 = .SAT_SUB (x_2(D), y_3(D));  */
> > +static void
> > +match_unsigned_saturation_sub (gimple_stmt_iterator *gsi, gphi *phi)
> > +{
> > +  if (gimple_phi_num_args (phi) != 2)
> > +    return;
> > +
> > +  tree ops[2];
> > +  tree phi_result = gimple_phi_result (phi);
> > +
> > +  if (gimple_unsigned_integer_sat_sub (phi_result, ops, NULL))
> > +    build_saturation_binary_arith_call (gsi, phi, IFN_SAT_SUB, phi_result,
> > +                                       ops[0], ops[1]);
> > +}
> > +
> >  /* Recognize for unsigned x
> >     x = y - z;
> >     if (x > y)
> > @@ -6104,6 +6134,7 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
> >      {
> >        gimple_stmt_iterator gsi = gsi_start_bb (bb);
> >        match_unsigned_saturation_add (&gsi, psi.phi ());
> > +      match_unsigned_saturation_sub (&gsi, psi.phi ());
> >      }
> >
> >    for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);)
> > @@ -6129,6 +6160,7 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
> >                   continue;
> >                 }
> >               match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p);
> > +             match_unsigned_saturation_sub (&gsi, as_a<gassign *> (stmt));
> >               break;
> >
> >             case PLUS_EXPR:
> > @@ -6167,6 +6199,7 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
> >               break;
> >
> >             case COND_EXPR:
> > +           case BIT_AND_EXPR:
> >               match_unsigned_saturation_sub (&gsi, as_a<gassign *> (stmt));
> >               break;
> >
> > --
> > 2.34.1
> >