From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20240612123753.201660-1-pan2.li@intel.com>
From: Andrew Pinski
Date: Thu, 27 Jun 2024 00:31:36 -0700
Subject: Re: [PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB
To: "Li, Pan2"
Cc: Richard Biener, "gcc-patches@gcc.gnu.org", "juzhe.zhong@rivai.ai", "kito.cheng@gmail.com", "jeffreyalaw@gmail.com", "rdapp.gcc@gmail.com"
Content-Type: text/plain; charset="UTF-8"

On Wed, Jun 19, 2024 at 12:37 AM Li, Pan2 wrote:
>
> Hi Richard,
>
> Given almost all unsigned SAT_ADD/SAT_SUB patches are merged, I revisit the original code pattern aka zip benchmark.
> It may look like below:
>
> void test (uint16_t *x, uint16_t *y, unsigned wsize, unsigned count)
> {
>   unsigned m = 0, n = count;
>   register uint16_t *p;
>
>   p = x;
>
>   do {
>     m = *--p;
>
>     *p = (uint16_t)(m >= wsize ? m-wsize : 0); // There will be a conversion here.
>   } while (--n);
> }
>
> And we can have 179 tree pass as below:
>
> [local count: 1073741824]:
>   # n_3 = PHI
>   # p_4 = PHI
>   p_10 = p_4 + 18446744073709551614;
>   _1 = *p_10;
>   m_11 = (unsigned int) _1;
>   _2 = m_11 - wsize_12(D);
>   iftmp.0_13 = (short unsigned int) _2;
>   _18 = m_11 >= wsize_12(D);
>   iftmp.0_5 = _18 ? iftmp.0_13 : 0;
>   *p_10 = iftmp.0_5;
>
> The above form doesn't hit any form we have supported in match.pd. Then I have one idea that to convert
>
> uint16 d, tmp;
> uint32 a, b, m;
>
> m = a - b;
> tmp = (uint16)m;
> d = a >= b ? tmp : 0;
>
> to
>
> d = (uint16)(.SAT_SUB (a, b));
>
> I am not very sure it is reasonable to make it work, it may have gimple assignment with convert similar as below (may require the help of vectorize_conversion?).
> Would like to get some hint from you before the next step, thanks a lot.
>
> patt_34 = .SAT_SUB (m_11, wsize_12(D));
> patt_35 = (vector([8,8]) short unsigned int) patt_34;

I am not sure if this is related to the above but we also miss:
```
uint32_t saturation_add(uint32_t a, uint32_t b)
{
    const uint64_t tmp = (uint64_t)a + b;
    if (tmp > UINT32_MAX)
    {
        return UINT32_MAX;
    }
    return tmp;
}
```
This comes from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88603 .
I thought you might be interested in that form too.

Thanks,
Andrew

>
> Pan
>
> -----Original Message-----
> From: Richard Biener
> Sent: Friday, June 14, 2024 4:05 PM
> To: Li, Pan2
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; jeffreyalaw@gmail.com; rdapp.gcc@gmail.com
> Subject: Re: [PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB
>
> On Wed, Jun 12, 2024 at 2:38 PM wrote:
> >
> > From: Pan Li
> >
> > After we support the scalar unsigned form 1 and 2, we would like
> > to introduce more forms include the branch and branchless. There
> > are forms 3-10 list as below:
> >
> > Form 3:
> >   #define SAT_SUB_U_3(T) \
> >   T sat_sub_u_3_##T (T x, T y) \
> >   { \
> >     return x > y ? x - y : 0; \
> >   }
> >
> > Form 4:
> >   #define SAT_SUB_U_4(T) \
> >   T sat_sub_u_4_##T (T x, T y) \
> >   { \
> >     return x >= y ? x - y : 0; \
> >   }
> >
> > Form 5:
> >   #define SAT_SUB_U_5(T) \
> >   T sat_sub_u_5_##T (T x, T y) \
> >   { \
> >     return x < y ? 0 : x - y; \
> >   }
> >
> > Form 6:
> >   #define SAT_SUB_U_6(T) \
> >   T sat_sub_u_6_##T (T x, T y) \
> >   { \
> >     return x <= y ? 0 : x - y; \
> >   }
> >
> > Form 7:
> >   #define SAT_SUB_U_7(T) \
> >   T sat_sub_u_7_##T (T x, T y) \
> >   { \
> >     T ret; \
> >     T overflow = __builtin_sub_overflow (x, y, &ret); \
> >     return ret & (T)(overflow - 1); \
> >   }
> >
> > Form 8:
> >   #define SAT_SUB_U_8(T) \
> >   T sat_sub_u_8_##T (T x, T y) \
> >   { \
> >     T ret; \
> >     T overflow = __builtin_sub_overflow (x, y, &ret); \
> >     return ret & (T)-(!overflow); \
> >   }
> >
> > Form 9:
> >   #define SAT_SUB_U_9(T) \
> >   T sat_sub_u_9_##T (T x, T y) \
> >   { \
> >     T ret; \
> >     T overflow = __builtin_sub_overflow (x, y, &ret); \
> >     return overflow ? 0 : ret; \
> >   }
> >
> > Form 10:
> >   #define SAT_SUB_U_10(T) \
> >   T sat_sub_u_10_##T (T x, T y) \
> >   { \
> >     T ret; \
> >     T overflow = __builtin_sub_overflow (x, y, &ret); \
> >     return !overflow ? ret : 0; \
> >   }
> >
> > Take form 10 as example:
> >
> > SAT_SUB_U_10(uint64_t);
> >
> > Before this patch:
> > uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y)
> > {
> >   unsigned char _1;
> >   unsigned char _2;
> >   uint8_t _3;
> >   __complex__ unsigned char _6;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _6 = .SUB_OVERFLOW (x_4(D), y_5(D));
> >   _2 = IMAGPART_EXPR <_6>;
> >   if (_2 == 0)
> >     goto ; [50.00%]
> >   else
> >     goto ; [50.00%]
> > ;;    succ:       3
> > ;;                4
> >
> > ;;   basic block 3, loop depth 0
> > ;;    pred:       2
> >   _1 = REALPART_EXPR <_6>;
> > ;;    succ:       4
> >
> > ;;   basic block 4, loop depth 0
> > ;;    pred:       2
> > ;;                3
> >   # _3 = PHI <0(2), _1(3)>
> >   return _3;
> > ;;    succ:       EXIT
> >
> > }
> >
> > After this patch:
> > uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y)
> > {
> >   uint8_t _3;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _3 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
> >   return _3;
> > ;;    succ:       EXIT
> >
> > }
> >
> > The below test suites are passed for this patch:
> > 1. The rv64gcv fully regression test with newlib.
> > 2. The rv64gcv build with glibc.
> > 3. The x86 bootstrap test.
> > 4. The x86 fully regression test.
> >
> > gcc/ChangeLog:
> >
> >         * match.pd: Add more match for unsigned sat_sub.
> >         * tree-ssa-math-opts.cc (match_unsigned_saturation_sub): Add new
> >         func impl to match phi node for .SAT_SUB.
> >         (math_opts_dom_walker::after_dom_children): Try match .SAT_SUB
> >         for the phi node, MULT_EXPR, BIT_XOR_EXPR and BIT_AND_EXPR.
> >
> > Signed-off-by: Pan Li
> > ---
> >  gcc/match.pd              | 25 +++++++++++++++++++++++--
> >  gcc/tree-ssa-math-opts.cc | 33 +++++++++++++++++++++++++++++++++
> >  2 files changed, 56 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 5cfe81e80b3..66e411b3359 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3140,14 +3140,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  /* Unsigned saturation sub, case 1 (branch with gt):
> >     SAT_U_SUB = X > Y ? X - Y : 0  */
> >  (match (unsigned_integer_sat_sub @0 @1)
> > - (cond (gt @0 @1) (minus @0 @1) integer_zerop)
> > + (cond^ (gt @0 @1) (minus @0 @1) integer_zerop)
> >   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> >        && types_match (type, @0, @1))))
> >
> >  /* Unsigned saturation sub, case 2 (branch with ge):
> >     SAT_U_SUB = X >= Y ? X - Y : 0.  */
> >  (match (unsigned_integer_sat_sub @0 @1)
> > - (cond (ge @0 @1) (minus @0 @1) integer_zerop)
> > + (cond^ (ge @0 @1) (minus @0 @1) integer_zerop)
> >   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> >        && types_match (type, @0, @1))))
> >
> > @@ -3165,6 +3165,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> >        && types_match (type, @0, @1))))
> >
> > +/* Unsigned saturation sub, case 5 (branchless bit_and with .SUB_OVERFLOW.  */
> > +(match (unsigned_integer_sat_sub @0 @1)
> > + (bit_and:c (realpart (IFN_SUB_OVERFLOW@2 @0 @1))
> > +  (plus:c (imagpart @2) integer_minus_onep))
>
> :c shouldn't be necessary on the plus
>
> > + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> > +      && types_match (type, @0, @1))))
> > +
> > +/* Unsigned saturation sub, case 6 (branchless mult with .SUB_OVERFLOW.  */
> > +(match (unsigned_integer_sat_sub @0 @1)
> > + (mult:c (realpart (IFN_SUB_OVERFLOW@2 @0 @1))
> > +  (bit_xor:c (imagpart @2) integer_onep))
>
> or on the bit_xor
>
> OK with those changes.
>
> Richard.
>
> > + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> > +      && types_match (type, @0, @1))))
> > +
> > +/* Unsigned saturation sub, case 7 (branch with .SUB_OVERFLOW.  */
> > +(match (unsigned_integer_sat_sub @0 @1)
> > + (cond^ (eq (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop)
> > +  (realpart @2) integer_zerop)
> > + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> > +      && types_match (type, @0, @1))))
> > +
> >  /* x > y && x != XXX_MIN --> x > y
> >     x > y && x == XXX_MIN --> false .  */
> >  (for eqne (eq ne)
> > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> > index fbb8e0ea306..05aa157611b 100644
> > --- a/gcc/tree-ssa-math-opts.cc
> > +++ b/gcc/tree-ssa-math-opts.cc
> > @@ -4186,6 +4186,36 @@ match_unsigned_saturation_sub (gimple_stmt_iterator *gsi, gassign *stmt)
> >      build_saturation_binary_arith_call (gsi, IFN_SAT_SUB, lhs, ops[0], ops[1]);
> >  }
> >
> > +/*
> > + * Try to match saturation unsigned sub.
> > + *   [local count: 1073741824]:
> > + *   if (x_2(D) > y_3(D))
> > + *     goto ; [50.00%]
> > + *   else
> > + *     goto ; [50.00%]
> > + *
> > + *   [local count: 536870912]:
> > + *   _4 = x_2(D) - y_3(D);
> > + *
> > + *   [local count: 1073741824]:
> > + *   # _1 = PHI <0(2), _4(3)>
> > + *   =>
> > + *   [local count: 1073741824]:
> > + *   _1 = .SAT_SUB (x_2(D), y_3(D));  */
> > +static void
> > +match_unsigned_saturation_sub (gimple_stmt_iterator *gsi, gphi *phi)
> > +{
> > +  if (gimple_phi_num_args (phi) != 2)
> > +    return;
> > +
> > +  tree ops[2];
> > +  tree phi_result = gimple_phi_result (phi);
> > +
> > +  if (gimple_unsigned_integer_sat_sub (phi_result, ops, NULL))
> > +    build_saturation_binary_arith_call (gsi, phi, IFN_SAT_SUB, phi_result,
> > +                                        ops[0], ops[1]);
> > +}
> > +
> >  /* Recognize for unsigned x
> >     x = y - z;
> >     if (x > y)
> > @@ -6104,6 +6134,7 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
> >      {
> >        gimple_stmt_iterator gsi = gsi_start_bb (bb);
> >        match_unsigned_saturation_add (&gsi, psi.phi ());
> > +      match_unsigned_saturation_sub (&gsi, psi.phi ());
> >      }
> >
> >    for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);)
> > @@ -6129,6 +6160,7 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
> >                continue;
> >              }
> >            match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p);
> > +          match_unsigned_saturation_sub (&gsi, as_a (stmt));
> >            break;
> >
> >          case PLUS_EXPR:
> > @@ -6167,6 +6199,7 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
> >            break;
> >
> >          case COND_EXPR:
> > +        case BIT_AND_EXPR:
> >            match_unsigned_saturation_sub (&gsi, as_a (stmt));
> >            break;
> >
> > --
> > 2.34.1
> >
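
For reference, a minimal standalone C sketch of the shapes discussed in this thread. It is not part of the patch or of anyone's reply. It puts one branching form (form 4) next to one branchless form (form 7), the narrowing variant from the zip loop, and the widening add from PR 88603. The names sat_sub_branch, sat_sub_mask, sat_sub_narrow and forms_agree_u8 are made up for illustration, and the code assumes a compiler that provides __builtin_sub_overflow (GCC or Clang).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Form 4 from the patch description: branch on x >= y.  */
uint8_t
sat_sub_branch (uint8_t x, uint8_t y)
{
  return x >= y ? x - y : 0;
}

/* Form 7: branchless.  overflow is 1 when x < y, so (overflow - 1)
   is an all-zeros mask on overflow and an all-ones mask otherwise.  */
uint8_t
sat_sub_mask (uint8_t x, uint8_t y)
{
  uint8_t ret;
  uint8_t overflow = __builtin_sub_overflow (x, y, &ret);
  return ret & (uint8_t) (overflow - 1);
}

/* The zip-loop shape: saturating subtract in 32 bits, then a
   truncating (not saturating) conversion to 16 bits.  */
uint16_t
sat_sub_narrow (uint32_t m, uint32_t wsize)
{
  return (uint16_t) (m >= wsize ? m - wsize : 0);
}

/* The widening form quoted from PR 88603.  */
uint32_t
saturation_add (uint32_t a, uint32_t b)
{
  const uint64_t tmp = (uint64_t) a + b;
  if (tmp > UINT32_MAX)
    return UINT32_MAX;
  return (uint32_t) tmp;
}

/* Hypothetical helper: exhaustively compare the two 8-bit forms.  */
bool
forms_agree_u8 (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      if (sat_sub_branch ((uint8_t) x, (uint8_t) y)
          != sat_sub_mask ((uint8_t) x, (uint8_t) y))
        return false;
  return true;
}
```

Calling forms_agree_u8 () checks all 65536 input pairs, which is a cheap sanity test that a source form really computes the same value as the others before expecting it to be recognized as the same saturating operation.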