public inbox for gcc-patches@gcc.gnu.org
From: Uros Bizjak <ubizjak@gmail.com>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Richard Biener <rguenther@suse.de>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]
Date: Wed, 14 Jun 2023 16:34:27 +0200
Message-ID: <CAFULd4bsEnQHQEJZOa1=7Xpee7g_YzWEsSV5P9QEdaA9qJvCSA@mail.gmail.com>
In-Reply-To: <ZInH2L1MHxI13xgK@tucnak>

On Wed, Jun 14, 2023 at 4:00 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> Hi!
>
> On Wed, Jun 14, 2023 at 12:35:42PM +0000, Richard Biener wrote:
> > At this point two pages of code without a comment - can you introduce
> > some vertical spacing and comments as to what is matched now?  The
> > split out functions help somewhat but the code is far from obvious :/
> >
> > Maybe I'm confused by the loops and instead of those sth like
> >
> >  if (match_x_y_z (op0)
> >      || match_x_y_z (op1))
> >    ...
> >
> > would be easier to follow with the loop bodies split out?
> > Maybe just put them in lambdas even?
> >
> > I guess you'll be around as long as myself so we can go with
> > this code under the premise you're going to maintain it - it's
> > not that I'm writing trivial-to-understand code myself ...
>
> As I said on IRC, I don't really know how to split that into further
> functions; the problem is that we need to pattern match a lot of
> statements, and it is hard to come up with names for each of them.
> And we need quite a lot of variables for checking their interactions.
>
> The code isn't that much different from say match_arith_overflow or
> optimize_spaceship or other larger pattern recognizers.  And the
> intent is that all the code paths in the recognizer are actually covered
> by the testcases in the testsuite.
>
> That said, I've added 18 new comments to the function, and rebased it
> on top of the
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621717.html
> patch with all constant arguments handling moved to fold-const-call.cc
> even for the new ifns.
>
> Ok for trunk like this if it passes bootstrap/regtest?
>
> 2023-06-13  Jakub Jelinek  <jakub@redhat.com>
>
>         PR middle-end/79173
>         * internal-fn.def (UADDC, USUBC): New internal functions.
>         * internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
>         (commutative_ternary_fn_p): Return true also for IFN_UADDC.
>         * optabs.def (uaddc5_optab, usubc5_optab): New optabs.
>         * tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
>         match_uaddc_usubc): New functions.
>         (math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
>         for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
>         other optimizations have been successful for those.
>         * gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC.
>         * fold-const-call.cc (fold_const_call): Likewise.
>         * gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
>         * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
>         * doc/md.texi (uaddc<mode>5, usubc<mode>5): Document new named
>         patterns.
>         * config/i386/i386.md (subborrow<mode>): Add alternative with
>         memory destination.
>         (uaddc<mode>5, usubc<mode>5): New define_expand patterns.
>         (*sub<mode>_3, @add<mode>3_carry, addcarry<mode>, @sub<mode>3_carry,
>         subborrow<mode>, *add<mode>3_cc_overflow_1): Add define_peephole2
>         TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
>         destination in these patterns.
>
>         * gcc.target/i386/pr79173-1.c: New test.
>         * gcc.target/i386/pr79173-2.c: New test.
>         * gcc.target/i386/pr79173-3.c: New test.
>         * gcc.target/i386/pr79173-4.c: New test.
>         * gcc.target/i386/pr79173-5.c: New test.
>         * gcc.target/i386/pr79173-6.c: New test.
>         * gcc.target/i386/pr79173-7.c: New test.
>         * gcc.target/i386/pr79173-8.c: New test.
>         * gcc.target/i386/pr79173-9.c: New test.
>         * gcc.target/i386/pr79173-10.c: New test.

LGTM for the x86 part. I did my best, but those peephole2 patterns are
a real PITA to review thoroughly.

Maybe split the peephole2 pack out into a separate patch, followed by a
testcase patch. This way, bisection would be able to point out whether
the generic part or the target-dependent part caused an eventual
regression.
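
For readers of the archive, the basic shape the new recognizer targets
is roughly the following (a sketch in the spirit of the pr79173-*.c
testcases, not their exact contents; with the patch, add2 should become
add + adc on x86_64 at -O2):

  static unsigned long
  uaddc (unsigned long x, unsigned long y, unsigned long carry_in,
         unsigned long *carry_out)
  {
    unsigned long r;
    unsigned long c1 = __builtin_add_overflow (x, y, &r);
    unsigned long c2 = __builtin_add_overflow (r, carry_in, &r);
    *carry_out = c1 + c2;
    return r;
  }

  void
  add2 (unsigned long *p, const unsigned long *q)
  {
    unsigned long carry;
    p[0] = uaddc (p[0], q[0], 0, &carry);
    p[1] = uaddc (p[1], q[1], carry, &carry);
  }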

Thanks,
Uros.

>
> --- gcc/internal-fn.def.jj      2023-06-13 18:23:37.208793152 +0200
> +++ gcc/internal-fn.def 2023-06-14 12:21:38.650657857 +0200
> @@ -416,6 +416,8 @@ DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LE
>  DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (UADDC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (USUBC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
>  DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> --- gcc/internal-fn.cc.jj       2023-06-13 18:23:37.206793179 +0200
> +++ gcc/internal-fn.cc  2023-06-14 12:21:38.652657829 +0200
> @@ -2776,6 +2776,44 @@ expand_MUL_OVERFLOW (internal_fn, gcall
>    expand_arith_overflow (MULT_EXPR, stmt);
>  }
>
> +/* Expand UADDC STMT.  */
> +
> +static void
> +expand_UADDC (internal_fn ifn, gcall *stmt)
> +{
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree arg1 = gimple_call_arg (stmt, 0);
> +  tree arg2 = gimple_call_arg (stmt, 1);
> +  tree arg3 = gimple_call_arg (stmt, 2);
> +  tree type = TREE_TYPE (arg1);
> +  machine_mode mode = TYPE_MODE (type);
> +  insn_code icode = optab_handler (ifn == IFN_UADDC
> +                                  ? uaddc5_optab : usubc5_optab, mode);
> +  rtx op1 = expand_normal (arg1);
> +  rtx op2 = expand_normal (arg2);
> +  rtx op3 = expand_normal (arg3);
> +  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> +  rtx re = gen_reg_rtx (mode);
> +  rtx im = gen_reg_rtx (mode);
> +  class expand_operand ops[5];
> +  create_output_operand (&ops[0], re, mode);
> +  create_output_operand (&ops[1], im, mode);
> +  create_input_operand (&ops[2], op1, mode);
> +  create_input_operand (&ops[3], op2, mode);
> +  create_input_operand (&ops[4], op3, mode);
> +  expand_insn (icode, 5, ops);
> +  write_complex_part (target, re, false, false);
> +  write_complex_part (target, im, true, false);
> +}
> +
> +/* Expand USUBC STMT.  */
> +
> +static void
> +expand_USUBC (internal_fn ifn, gcall *stmt)
> +{
> +  expand_UADDC (ifn, stmt);
> +}
> +
>  /* This should get folded in tree-vectorizer.cc.  */
>
>  static void
> @@ -4049,6 +4087,7 @@ commutative_ternary_fn_p (internal_fn fn
>      case IFN_FMS:
>      case IFN_FNMA:
>      case IFN_FNMS:
> +    case IFN_UADDC:
>        return true;
>
>      default:
> --- gcc/optabs.def.jj   2023-06-13 18:23:37.210793124 +0200
> +++ gcc/optabs.def      2023-06-14 12:21:38.652657829 +0200
> @@ -260,6 +260,8 @@ OPTAB_D (uaddv4_optab, "uaddv$I$a4")
>  OPTAB_D (usubv4_optab, "usubv$I$a4")
>  OPTAB_D (umulv4_optab, "umulv$I$a4")
>  OPTAB_D (negv3_optab, "negv$I$a3")
> +OPTAB_D (uaddc5_optab, "uaddc$I$a5")
> +OPTAB_D (usubc5_optab, "usubc$I$a5")
>  OPTAB_D (addptr3_optab, "addptr$a3")
>  OPTAB_D (spaceship_optab, "spaceship$a3")
>
> --- gcc/tree-ssa-math-opts.cc.jj        2023-06-13 18:23:37.217793028 +0200
> +++ gcc/tree-ssa-math-opts.cc   2023-06-14 15:39:44.853759892 +0200
> @@ -4441,6 +4441,496 @@ match_arith_overflow (gimple_stmt_iterat
>    return false;
>  }
>
> +/* Helper of match_uaddc_usubc.  Look through an integral cast
> +   which should preserve the [0, 1] value range (unless the source
> +   has a 1-bit signed type) and which has a single use.  */
> +
> +static gimple *
> +uaddc_cast (gimple *g)
> +{
> +  if (!gimple_assign_cast_p (g))
> +    return g;
> +  tree op = gimple_assign_rhs1 (g);
> +  if (TREE_CODE (op) == SSA_NAME
> +      && INTEGRAL_TYPE_P (TREE_TYPE (op))
> +      && (TYPE_PRECISION (TREE_TYPE (op)) > 1
> +         || TYPE_UNSIGNED (TREE_TYPE (op)))
> +      && has_single_use (gimple_assign_lhs (g)))
> +    return SSA_NAME_DEF_STMT (op);
> +  return g;
> +}
> +
> +/* Helper of match_uaddc_usubc.  Look through a NE_EXPR
> +   comparison with 0 which also preserves [0, 1] value range.  */
> +
> +static gimple *
> +uaddc_ne0 (gimple *g)
> +{
> +  if (is_gimple_assign (g)
> +      && gimple_assign_rhs_code (g) == NE_EXPR
> +      && integer_zerop (gimple_assign_rhs2 (g))
> +      && TREE_CODE (gimple_assign_rhs1 (g)) == SSA_NAME
> +      && has_single_use (gimple_assign_lhs (g)))
> +    return SSA_NAME_DEF_STMT (gimple_assign_rhs1 (g));
> +  return g;
> +}
> +
> +/* Return true if G is {REAL,IMAG}PART_EXPR PART with SSA_NAME
> +   operand.  */
> +
> +static bool
> +uaddc_is_cplxpart (gimple *g, tree_code part)
> +{
> +  return (is_gimple_assign (g)
> +         && gimple_assign_rhs_code (g) == part
> +         && TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (g), 0)) == SSA_NAME);
> +}
> +
> +/* Try to match e.g.
> +   _29 = .ADD_OVERFLOW (_3, _4);
> +   _30 = REALPART_EXPR <_29>;
> +   _31 = IMAGPART_EXPR <_29>;
> +   _32 = .ADD_OVERFLOW (_30, _38);
> +   _33 = REALPART_EXPR <_32>;
> +   _34 = IMAGPART_EXPR <_32>;
> +   _35 = _31 + _34;
> +   as
> +   _36 = .UADDC (_3, _4, _38);
> +   _33 = REALPART_EXPR <_36>;
> +   _35 = IMAGPART_EXPR <_36>;
> +   or
> +   _22 = .SUB_OVERFLOW (_6, _5);
> +   _23 = REALPART_EXPR <_22>;
> +   _24 = IMAGPART_EXPR <_22>;
> +   _25 = .SUB_OVERFLOW (_23, _37);
> +   _26 = REALPART_EXPR <_25>;
> +   _27 = IMAGPART_EXPR <_25>;
> +   _28 = _24 | _27;
> +   as
> +   _29 = .USUBC (_6, _5, _37);
> +   _26 = REALPART_EXPR <_29>;
> +   _28 = IMAGPART_EXPR <_29>;
> +   provided _38 or _37 above have [0, 1] range
> +   and _3, _4 and _30 or _6, _5 and _23 are unsigned
> +   integral types with the same precision.  Whether + or | or ^ is
> +   used on the IMAGPART_EXPR results doesn't matter; with one of the
> +   added or subtracted operands in [0, 1] range, at most one
> +   .ADD_OVERFLOW or .SUB_OVERFLOW will indicate overflow.  */
> +
> +static bool
> +match_uaddc_usubc (gimple_stmt_iterator *gsi, gimple *stmt, tree_code code)
> +{
> +  tree rhs[4];
> +  rhs[0] = gimple_assign_rhs1 (stmt);
> +  rhs[1] = gimple_assign_rhs2 (stmt);
> +  rhs[2] = NULL_TREE;
> +  rhs[3] = NULL_TREE;
> +  tree type = TREE_TYPE (rhs[0]);
> +  if (!INTEGRAL_TYPE_P (type) || !TYPE_UNSIGNED (type))
> +    return false;
> +
> +  if (code != BIT_IOR_EXPR && code != BIT_XOR_EXPR)
> +    {
> +      /* If the overflow flag is ignored on the MSB limb, we can end up with
> +        the most significant limb handled as r = op1 + op2 + ovf1 + ovf2;
> +        or r = op1 - op2 - ovf1 - ovf2; or various equivalent expressions
> +        thereof.  Handle those like the ovf = ovf1 + ovf2; case to recognize
> +        the limb below the MSB, but also create another .UADDC/.USUBC call
> +        for the last limb.
> +
> +        First look through assignments with the same rhs code as CODE,
> +        with the exception that subtraction of a constant is canonicalized
> +        into addition of its negation.  rhs[0] will be minuend for
> +        subtractions and one of addends for addition, all other assigned
> +        rhs[i] operands will be subtrahends or other addends.  */
> +      while (TREE_CODE (rhs[0]) == SSA_NAME && !rhs[3])
> +       {
> +         gimple *g = SSA_NAME_DEF_STMT (rhs[0]);
> +         if (has_single_use (rhs[0])
> +             && is_gimple_assign (g)
> +             && (gimple_assign_rhs_code (g) == code
> +                 || (code == MINUS_EXPR
> +                     && gimple_assign_rhs_code (g) == PLUS_EXPR
> +                     && TREE_CODE (gimple_assign_rhs2 (g)) == INTEGER_CST)))
> +           {
> +             tree r2 = gimple_assign_rhs2 (g);
> +             if (gimple_assign_rhs_code (g) != code)
> +               {
> +                 r2 = const_unop (NEGATE_EXPR, TREE_TYPE (r2), r2);
> +                 if (!r2)
> +                   break;
> +               }
> +             rhs[0] = gimple_assign_rhs1 (g);
> +             tree &r = rhs[2] ? rhs[3] : rhs[2];
> +             r = r2;
> +           }
> +         else
> +           break;
> +       }
> +      while (TREE_CODE (rhs[1]) == SSA_NAME && !rhs[3])
> +       {
> +         gimple *g = SSA_NAME_DEF_STMT (rhs[1]);
> +         if (has_single_use (rhs[1])
> +             && is_gimple_assign (g)
> +             && gimple_assign_rhs_code (g) == PLUS_EXPR)
> +           {
> +             rhs[1] = gimple_assign_rhs1 (g);
> +             if (rhs[2])
> +               rhs[3] = gimple_assign_rhs2 (g);
> +             else
> +               rhs[2] = gimple_assign_rhs2 (g);
> +           }
> +         else
> +           break;
> +       }
> +      /* If there are just 3 addends or one minuend and two subtrahends,
> +        check for UADDC or USUBC being pattern recognized earlier.
> +        Say r = op1 + op2 + ovf1 + ovf2; where the (ovf1 + ovf2) part
> +        got pattern matched earlier as __imag__ .UADDC (arg1, arg2, arg3)
> +        etc.  */
> +      if (rhs[2] && !rhs[3])
> +       {
> +         for (int i = (code == MINUS_EXPR ? 1 : 0); i < 3; ++i)
> +           if (TREE_CODE (rhs[i]) == SSA_NAME)
> +             {
> +               gimple *im = uaddc_cast (SSA_NAME_DEF_STMT (rhs[i]));
> +               im = uaddc_ne0 (im);
> +               if (uaddc_is_cplxpart (im, IMAGPART_EXPR))
> +                 {
> +                   /* We found one of the 3 addends or 2 subtrahends to be
> +                      __imag__ of something, verify it is .UADDC/.USUBC.  */
> +                   tree rhs1 = gimple_assign_rhs1 (im);
> +                   gimple *ovf = SSA_NAME_DEF_STMT (TREE_OPERAND (rhs1, 0));
> +                   if (gimple_call_internal_p (ovf, code == PLUS_EXPR
> +                                                    ? IFN_UADDC : IFN_USUBC)
> +                       && (optab_handler (code == PLUS_EXPR
> +                                          ? uaddc5_optab : usubc5_optab,
> +                                          TYPE_MODE (type))
> +                           != CODE_FOR_nothing))
> +                     {
> +                       /* And in that case build another .UADDC/.USUBC
> +                          call for the most significant limb addition.
> +                          The overflow bit is ignored here.  */
> +                       if (i != 2)
> +                         std::swap (rhs[i], rhs[2]);
> +                       gimple *g
> +                         = gimple_build_call_internal (code == PLUS_EXPR
> +                                                       ? IFN_UADDC
> +                                                       : IFN_USUBC,
> +                                                       3, rhs[0], rhs[1],
> +                                                       rhs[2]);
> +                       tree nlhs = make_ssa_name (build_complex_type (type));
> +                       gimple_call_set_lhs (g, nlhs);
> +                       gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +                       tree ilhs = gimple_assign_lhs (stmt);
> +                       g = gimple_build_assign (ilhs, REALPART_EXPR,
> +                                                build1 (REALPART_EXPR,
> +                                                        TREE_TYPE (ilhs),
> +                                                        nlhs));
> +                       gsi_replace (gsi, g, true);
> +                       return true;
> +                     }
> +                 }
> +             }
> +         return false;
> +       }
> +      if (code == MINUS_EXPR && !rhs[2])
> +       return false;
> +      if (code == MINUS_EXPR)
> +       /* Code below expects rhs[0] and rhs[1] to have the IMAGPART_EXPRs.
> +          So, for MINUS_EXPR swap the single added rhs operand (others are
> +          subtracted) to rhs[3].  */
> +       std::swap (rhs[0], rhs[3]);
> +    }
> +  /* Walk from both operands of STMT (for +/- sometimes even from
> +     all the 4 addends or 3 subtrahends), look through casts and != 0
> +     statements which preserve the [0, 1] range of values, and
> +     check which is initialized from __imag__.  */
> +  gimple *im1 = NULL, *im2 = NULL;
> +  for (int i = 0; i < (code == MINUS_EXPR ? 3 : 4); i++)
> +    if (rhs[i] && TREE_CODE (rhs[i]) == SSA_NAME)
> +      {
> +       gimple *im = uaddc_cast (SSA_NAME_DEF_STMT (rhs[i]));
> +       im = uaddc_ne0 (im);
> +       if (uaddc_is_cplxpart (im, IMAGPART_EXPR))
> +         {
> +           if (im1 == NULL)
> +             {
> +               im1 = im;
> +               if (i != 0)
> +                 std::swap (rhs[0], rhs[i]);
> +             }
> +           else
> +             {
> +               im2 = im;
> +               if (i != 1)
> +                 std::swap (rhs[1], rhs[i]);
> +               break;
> +             }
> +         }
> +      }
> +  /* If we don't find at least two, punt.  */
> +  if (!im2)
> +    return false;
> +  /* Check they are __imag__ of .ADD_OVERFLOW or .SUB_OVERFLOW call results,
> +     either both .ADD_OVERFLOW or both .SUB_OVERFLOW and that we have
> +     uaddc5/usubc5 named pattern for the corresponding mode.  */
> +  gimple *ovf1
> +    = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im1), 0));
> +  gimple *ovf2
> +    = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im2), 0));
> +  internal_fn ifn;
> +  if (!is_gimple_call (ovf1)
> +      || !gimple_call_internal_p (ovf1)
> +      || ((ifn = gimple_call_internal_fn (ovf1)) != IFN_ADD_OVERFLOW
> +         && ifn != IFN_SUB_OVERFLOW)
> +      || !gimple_call_internal_p (ovf2, ifn)
> +      || optab_handler (ifn == IFN_ADD_OVERFLOW ? uaddc5_optab : usubc5_optab,
> +                       TYPE_MODE (type)) == CODE_FOR_nothing
> +      || (rhs[2]
> +         && optab_handler (code == PLUS_EXPR ? uaddc5_optab : usubc5_optab,
> +                           TYPE_MODE (type)) == CODE_FOR_nothing))
> +    return false;
> +  tree arg1, arg2, arg3 = NULL_TREE;
> +  gimple *re1 = NULL, *re2 = NULL;
> +  /* On one of the two calls, one of the .ADD_OVERFLOW/.SUB_OVERFLOW arguments
> +     should be initialized from __real__ of the other of the two calls.
> +     Though, for .SUB_OVERFLOW, it has to be the first argument, not the
> +     second one.  */
> +  for (int i = (ifn == IFN_ADD_OVERFLOW ? 1 : 0); i >= 0; --i)
> +    for (gimple *ovf = ovf1; ovf; ovf = (ovf == ovf1 ? ovf2 : NULL))
> +      {
> +       tree arg = gimple_call_arg (ovf, i);
> +       if (TREE_CODE (arg) != SSA_NAME)
> +         continue;
> +       re1 = SSA_NAME_DEF_STMT (arg);
> +       if (uaddc_is_cplxpart (re1, REALPART_EXPR)
> +           && (SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (re1), 0))
> +               == (ovf == ovf1 ? ovf2 : ovf1)))
> +         {
> +           if (ovf == ovf1)
> +             {
> +               /* Make sure ovf2 is the .*_OVERFLOW call with argument
> +                  initialized from __real__ of ovf1.  */
> +               std::swap (rhs[0], rhs[1]);
> +               std::swap (im1, im2);
> +               std::swap (ovf1, ovf2);
> +             }
> +           arg3 = gimple_call_arg (ovf, 1 - i);
> +           i = -1;
> +           break;
> +         }
> +      }
> +  if (!arg3)
> +    return false;
> +  arg1 = gimple_call_arg (ovf1, 0);
> +  arg2 = gimple_call_arg (ovf1, 1);
> +  if (!types_compatible_p (type, TREE_TYPE (arg1)))
> +    return false;
> +  int kind[2] = { 0, 0 };
> +  /* At least one of arg2 and arg3 should have type compatible
> +     with arg1/rhs[0], and the other one should have value in [0, 1]
> +     range.  If both are in [0, 1] range and type compatible with
> +     arg1/rhs[0], try harder to determine, after looking through casts
> +     and != 0 comparisons, which one is initialized to __imag__ of
> +     .{ADD,SUB}_OVERFLOW or .U{ADD,SUB}C call results.  */
> +  for (int i = 0; i < 2; ++i)
> +    {
> +      tree arg = i == 0 ? arg2 : arg3;
> +      if (types_compatible_p (type, TREE_TYPE (arg)))
> +       kind[i] = 1;
> +      if (!INTEGRAL_TYPE_P (TREE_TYPE (arg))
> +         || (TYPE_PRECISION (TREE_TYPE (arg)) == 1
> +             && !TYPE_UNSIGNED (TREE_TYPE (arg))))
> +       continue;
> +      if (tree_zero_one_valued_p (arg))
> +       kind[i] |= 2;
> +      if (TREE_CODE (arg) == SSA_NAME)
> +       {
> +         gimple *g = SSA_NAME_DEF_STMT (arg);
> +         if (gimple_assign_cast_p (g))
> +           {
> +             tree op = gimple_assign_rhs1 (g);
> +             if (TREE_CODE (op) == SSA_NAME
> +                 && INTEGRAL_TYPE_P (TREE_TYPE (op)))
> +               g = SSA_NAME_DEF_STMT (op);
> +           }
> +         g = uaddc_ne0 (g);
> +         if (!uaddc_is_cplxpart (g, IMAGPART_EXPR))
> +           continue;
> +         g = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (g), 0));
> +         if (!is_gimple_call (g) || !gimple_call_internal_p (g))
> +           continue;
> +         switch (gimple_call_internal_fn (g))
> +           {
> +           case IFN_ADD_OVERFLOW:
> +           case IFN_SUB_OVERFLOW:
> +           case IFN_UADDC:
> +           case IFN_USUBC:
> +             break;
> +           default:
> +             continue;
> +           }
> +         kind[i] |= 4;
> +       }
> +    }
> +  /* Make arg2 the one with compatible type and arg3 the one
> +     with [0, 1] range.  If both are true for both operands,
> +     prefer as arg3 the result of __imag__ of some ifn.  */
> +  if ((kind[0] & 1) == 0 || ((kind[1] & 1) != 0 && kind[0] > kind[1]))
> +    {
> +      std::swap (arg2, arg3);
> +      std::swap (kind[0], kind[1]);
> +    }
> +  if ((kind[0] & 1) == 0 || (kind[1] & 6) == 0)
> +    return false;
> +  if (!has_single_use (gimple_assign_lhs (im1))
> +      || !has_single_use (gimple_assign_lhs (im2))
> +      || !has_single_use (gimple_assign_lhs (re1))
> +      || num_imm_uses (gimple_call_lhs (ovf1)) != 2)
> +    return false;
> +  /* Check that ovf2's result is used in __real__ and set re2
> +     to that statement.  */
> +  use_operand_p use_p;
> +  imm_use_iterator iter;
> +  tree lhs = gimple_call_lhs (ovf2);
> +  FOR_EACH_IMM_USE_FAST (use_p, iter, lhs)
> +    {
> +      gimple *use_stmt = USE_STMT (use_p);
> +      if (is_gimple_debug (use_stmt))
> +       continue;
> +      if (use_stmt == im2)
> +       continue;
> +      if (re2)
> +       return false;
> +      if (!uaddc_is_cplxpart (use_stmt, REALPART_EXPR))
> +       return false;
> +      re2 = use_stmt;
> +    }
> +  /* Build .UADDC/.USUBC call which will be placed before the stmt.  */
> +  gimple_stmt_iterator gsi2 = gsi_for_stmt (ovf2);
> +  gimple *g;
> +  if ((kind[1] & 1) == 0)
> +    {
> +      if (TREE_CODE (arg3) == INTEGER_CST)
> +       arg3 = fold_convert (type, arg3);
> +      else
> +       {
> +         g = gimple_build_assign (make_ssa_name (type), NOP_EXPR, arg3);
> +         gsi_insert_before (&gsi2, g, GSI_SAME_STMT);
> +         arg3 = gimple_assign_lhs (g);
> +       }
> +    }
> +  g = gimple_build_call_internal (ifn == IFN_ADD_OVERFLOW
> +                                 ? IFN_UADDC : IFN_USUBC,
> +                                 3, arg1, arg2, arg3);
> +  tree nlhs = make_ssa_name (TREE_TYPE (lhs));
> +  gimple_call_set_lhs (g, nlhs);
> +  gsi_insert_before (&gsi2, g, GSI_SAME_STMT);
> +  /* In the case where stmt is | or ^ of two overflow flags
> +     or addition of those, replace stmt with __imag__ of the above
> +     added call.  In case of arg1 + arg2 + (ovf1 + ovf2) or
> +     arg1 - arg2 - (ovf1 + ovf2) just emit it before stmt.  */
> +  tree ilhs = rhs[2] ? make_ssa_name (type) : gimple_assign_lhs (stmt);
> +  g = gimple_build_assign (ilhs, IMAGPART_EXPR,
> +                          build1 (IMAGPART_EXPR, TREE_TYPE (ilhs), nlhs));
> +  if (rhs[2])
> +    gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +  else
> +    gsi_replace (gsi, g, true);
> +  /* Remove some statements which can't be kept in the IL because they
> +     use an SSA_NAME whose setter is going to be removed too.  */
> +  tree rhs1 = rhs[1];
> +  for (int i = 0; i < 2; i++)
> +    if (rhs1 == gimple_assign_lhs (im2))
> +      break;
> +    else
> +      {
> +       g = SSA_NAME_DEF_STMT (rhs1);
> +       rhs1 = gimple_assign_rhs1 (g);
> +       gsi2 = gsi_for_stmt (g);
> +       gsi_remove (&gsi2, true);
> +      }
> +  gcc_checking_assert (rhs1 == gimple_assign_lhs (im2));
> +  gsi2 = gsi_for_stmt (im2);
> +  gsi_remove (&gsi2, true);
> +  /* Replace the re2 statement with __real__ of the newly added
> +     .UADDC/.USUBC call.  */
> +  gsi2 = gsi_for_stmt (re2);
> +  tree rlhs = gimple_assign_lhs (re2);
> +  g = gimple_build_assign (rlhs, REALPART_EXPR,
> +                          build1 (REALPART_EXPR, TREE_TYPE (rlhs), nlhs));
> +  gsi_replace (&gsi2, g, true);
> +  if (rhs[2])
> +    {
> +      /* If this is the arg1 + arg2 + (ovf1 + ovf2) or
> +        arg1 - arg2 - (ovf1 + ovf2) case for the most significant limb,
> +        replace stmt with __real__ of another .UADDC/.USUBC call which
> +        handles the most significant limb.  Overflow flag from this is
> +        ignored.  */
> +      g = gimple_build_call_internal (code == PLUS_EXPR
> +                                     ? IFN_UADDC : IFN_USUBC,
> +                                     3, rhs[3], rhs[2], ilhs);
> +      nlhs = make_ssa_name (TREE_TYPE (lhs));
> +      gimple_call_set_lhs (g, nlhs);
> +      gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +      ilhs = gimple_assign_lhs (stmt);
> +      g = gimple_build_assign (ilhs, REALPART_EXPR,
> +                              build1 (REALPART_EXPR, TREE_TYPE (ilhs), nlhs));
> +      gsi_replace (gsi, g, true);
> +    }
> +  if (TREE_CODE (arg3) == SSA_NAME)
> +    {
> +      /* When pattern recognizing the second least significant limb
> +        above (i.e. first pair of .{ADD,SUB}_OVERFLOW calls for one limb),
> +        check if the [0, 1] range argument (i.e. carry in) isn't the
> +        result of another .{ADD,SUB}_OVERFLOW call (one handling the
> +        least significant limb).  Again look through casts and != 0.  */
> +      gimple *im3 = SSA_NAME_DEF_STMT (arg3);
> +      for (int i = 0; i < 2; ++i)
> +       {
> +         gimple *im4 = uaddc_cast (im3);
> +         if (im4 == im3)
> +           break;
> +         else
> +           im3 = im4;
> +       }
> +      im3 = uaddc_ne0 (im3);
> +      if (uaddc_is_cplxpart (im3, IMAGPART_EXPR))
> +       {
> +         gimple *ovf3
> +           = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im3), 0));
> +         if (gimple_call_internal_p (ovf3, ifn))
> +           {
> +             lhs = gimple_call_lhs (ovf3);
> +             arg1 = gimple_call_arg (ovf3, 0);
> +             arg2 = gimple_call_arg (ovf3, 1);
> +             if (types_compatible_p (type, TREE_TYPE (TREE_TYPE (lhs)))
> +                 && types_compatible_p (type, TREE_TYPE (arg1))
> +                 && types_compatible_p (type, TREE_TYPE (arg2)))
> +               {
> +                 /* And if it is initialized from the result of __imag__
> +                    of a .{ADD,SUB}_OVERFLOW call, replace that call with
> +                    a .U{ADD,SUB}C call with the same arguments, just with
> +                    0 added as the third argument.  This isn't strictly
> +                    necessary; .ADD_OVERFLOW (x, y) and .UADDC (x, y, 0)
> +                    produce the same result, but the latter may result in
> +                    better generated code on some targets where the backend
> +                    can better prepare for how the result will be used.  */
> +                 g = gimple_build_call_internal (ifn == IFN_ADD_OVERFLOW
> +                                                 ? IFN_UADDC : IFN_USUBC,
> +                                                 3, arg1, arg2,
> +                                                 build_zero_cst (type));
> +                 gimple_call_set_lhs (g, lhs);
> +                 gsi2 = gsi_for_stmt (ovf3);
> +                 gsi_replace (&gsi2, g, true);
> +               }
> +           }
> +       }
> +    }
> +  return true;
> +}
> +
>  /* Return true if target has support for divmod.  */
>
>  static bool
> @@ -5068,8 +5558,9 @@ math_opts_dom_walker::after_dom_children
>
>             case PLUS_EXPR:
>             case MINUS_EXPR:
> -             if (!convert_plusminus_to_widen (&gsi, stmt, code))
> -               match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p);
> +             if (!convert_plusminus_to_widen (&gsi, stmt, code)
> +                 && !match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p))
> +               match_uaddc_usubc (&gsi, stmt, code);
>               break;
>
>             case BIT_NOT_EXPR:
> @@ -5085,6 +5576,11 @@ math_opts_dom_walker::after_dom_children
>               convert_mult_to_highpart (as_a<gassign *> (stmt), &gsi);
>               break;
>
> +           case BIT_IOR_EXPR:
> +           case BIT_XOR_EXPR:
> +             match_uaddc_usubc (&gsi, stmt, code);
> +             break;
> +
>             default:;
>             }
>         }
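>
> [An illustrative C shape for the MSB-limb case described in the
> match_uaddc_usubc comments above; a sketch, not one of the actual
> testcases.  On the last limb the overflow flag is dead, so the
> addition of the two carries is matched into one more .UADDC whose
> __imag__ result is simply unused.]
>
>   void
>   add3 (unsigned long *p, const unsigned long *q)
>   {
>     unsigned long r0, r1, c0, c1a, c1b;
>     c0 = __builtin_add_overflow (p[0], q[0], &r0);
>     c1a = __builtin_add_overflow (p[1], q[1], &r1);
>     c1b = __builtin_add_overflow (r1, c0, &r1);
>     p[0] = r0;
>     p[1] = r1;
>     p[2] = p[2] + q[2] + c1a + c1b; /* overflow ignored on the MSB limb */
>   }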
> --- gcc/gimple-fold.cc.jj       2023-06-14 15:41:51.090987708 +0200
> +++ gcc/gimple-fold.cc  2023-06-14 12:52:04.335054958 +0200
> @@ -5585,6 +5585,7 @@ gimple_fold_call (gimple_stmt_iterator *
>        enum tree_code subcode = ERROR_MARK;
>        tree result = NULL_TREE;
>        bool cplx_result = false;
> +      bool uaddc_usubc = false;
>        tree overflow = NULL_TREE;
>        switch (gimple_call_internal_fn (stmt))
>         {
> @@ -5658,6 +5659,16 @@ gimple_fold_call (gimple_stmt_iterator *
>           subcode = MULT_EXPR;
>           cplx_result = true;
>           break;
> +       case IFN_UADDC:
> +         subcode = PLUS_EXPR;
> +         cplx_result = true;
> +         uaddc_usubc = true;
> +         break;
> +       case IFN_USUBC:
> +         subcode = MINUS_EXPR;
> +         cplx_result = true;
> +         uaddc_usubc = true;
> +         break;
>         case IFN_MASK_LOAD:
>           changed |= gimple_fold_partial_load (gsi, stmt, true);
>           break;
> @@ -5677,6 +5688,7 @@ gimple_fold_call (gimple_stmt_iterator *
>         {
>           tree arg0 = gimple_call_arg (stmt, 0);
>           tree arg1 = gimple_call_arg (stmt, 1);
> +         tree arg2 = NULL_TREE;
>           tree type = TREE_TYPE (arg0);
>           if (cplx_result)
>             {
> @@ -5685,9 +5697,26 @@ gimple_fold_call (gimple_stmt_iterator *
>                 type = NULL_TREE;
>               else
>                 type = TREE_TYPE (TREE_TYPE (lhs));
> +             if (uaddc_usubc)
> +               arg2 = gimple_call_arg (stmt, 2);
>             }
>           if (type == NULL_TREE)
>             ;
> +         else if (uaddc_usubc)
> +           {
> +             if (!integer_zerop (arg2))
> +               ;
> +             /* x = y + 0 + 0; x = y - 0 - 0; */
> +             else if (integer_zerop (arg1))
> +               result = arg0;
> +             /* x = 0 + y + 0; */
> +             else if (subcode != MINUS_EXPR && integer_zerop (arg0))
> +               result = arg1;
> +             /* x = y - y - 0; */
> +             else if (subcode == MINUS_EXPR
> +                      && operand_equal_p (arg0, arg1, 0))
> +               result = integer_zero_node;
> +           }
>           /* x = y + 0; x = y - 0; x = y * 0; */
>           else if (integer_zerop (arg1))
>             result = subcode == MULT_EXPR ? integer_zero_node : arg0;
> --- gcc/fold-const-call.cc.jj   2023-06-14 15:40:34.388064498 +0200
> +++ gcc/fold-const-call.cc      2023-06-14 12:56:08.195631214 +0200
> @@ -1937,6 +1937,30 @@ fold_const_call (combined_fn fn, tree ty
>         return NULL_TREE;
>        }
>
> +    case CFN_UADDC:
> +    case CFN_USUBC:
> +      if (integer_cst_p (arg0) && integer_cst_p (arg1) && integer_cst_p (arg2))
> +       {
> +         tree itype = TREE_TYPE (type);
> +         bool ovf = false;
> +         tree_code subcode = fn == CFN_UADDC ? PLUS_EXPR : MINUS_EXPR;
> +         tree r = int_const_binop (subcode, fold_convert (itype, arg0),
> +                                   fold_convert (itype, arg1));
> +         if (!r)
> +           return NULL_TREE;
> +         if (arith_overflowed_p (subcode, itype, arg0, arg1))
> +           ovf = true;
> +         tree r2 = int_const_binop (subcode, r, fold_convert (itype, arg2));
> +         if (!r2 || TREE_CODE (r2) != INTEGER_CST)
> +           return NULL_TREE;
> +         if (arith_overflowed_p (subcode, itype, r, arg2))
> +           ovf = true;
> +         if (TREE_OVERFLOW (r2))
> +           r2 = drop_tree_overflow (r2);
> +         return build_complex (type, r2, build_int_cst (itype, ovf));
> +       }
> +      return NULL_TREE;
> +
>      default:
>        return fold_const_call_1 (fn, type, arg0, arg1, arg2);
>      }
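>
> [A worked example of the constant folding above, spelled out in plain
> C and assuming a 32-bit unsigned int: .UADDC (~0U, 0U, 1U) folds to
> __real__ 0 and __imag__ 1, because only the second of the two
> additions overflows.]
>
>   #include <assert.h>
>
>   int
>   main (void)
>   {
>     unsigned int r;
>     unsigned int c1 = __builtin_add_overflow (~0U, 0U, &r); /* r = ~0U */
>     unsigned int c2 = __builtin_add_overflow (r, 1U, &r);   /* r = 0 */
>     assert (r == 0 && c1 == 0 && c2 == 1);
>     return 0;
>   }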
> --- gcc/gimple-range-fold.cc.jj 2023-06-13 18:23:37.202793234 +0200
> +++ gcc/gimple-range-fold.cc    2023-06-14 12:21:38.658657745 +0200
> @@ -489,6 +489,8 @@ adjust_imagpart_expr (vrange &res, const
>         case IFN_ADD_OVERFLOW:
>         case IFN_SUB_OVERFLOW:
>         case IFN_MUL_OVERFLOW:
> +       case IFN_UADDC:
> +       case IFN_USUBC:
>         case IFN_ATOMIC_COMPARE_EXCHANGE:
>           {
>             int_range<2> r;
> --- gcc/tree-ssa-dce.cc.jj      2023-06-13 18:23:37.214793069 +0200
> +++ gcc/tree-ssa-dce.cc 2023-06-14 12:21:38.659657731 +0200
> @@ -1481,6 +1481,14 @@ eliminate_unnecessary_stmts (bool aggres
>                   case IFN_MUL_OVERFLOW:
>                     maybe_optimize_arith_overflow (&gsi, MULT_EXPR);
>                     break;
> +                 case IFN_UADDC:
> +                   if (integer_zerop (gimple_call_arg (stmt, 2)))
> +                     maybe_optimize_arith_overflow (&gsi, PLUS_EXPR);
> +                   break;
> +                 case IFN_USUBC:
> +                   if (integer_zerop (gimple_call_arg (stmt, 2)))
> +                     maybe_optimize_arith_overflow (&gsi, MINUS_EXPR);
> +                   break;
>                   default:
>                     break;
>                   }
> --- gcc/doc/md.texi.jj  2023-06-13 18:23:37.192793371 +0200
> +++ gcc/doc/md.texi     2023-06-14 12:21:38.662657689 +0200
> @@ -5224,6 +5224,22 @@ is taken only on unsigned overflow.
>  @item @samp{usubv@var{m}4}, @samp{umulv@var{m}4}
>  Similar, for other unsigned arithmetic operations.
>
> +@cindex @code{uaddc@var{m}5} instruction pattern
> +@item @samp{uaddc@var{m}5}
> +Adds unsigned operands 2, 3 and 4 (where the last operand is guaranteed to
> +have only values 0 or 1) together, sets operand 0 to the result of the
> +addition of the 3 operands and sets operand 1 to 1 iff there was
> +overflow on the unsigned additions, and to 0 otherwise.  So, it is
> +an addition with carry in (operand 4) and carry out (operand 1).
> +All operands have the same mode.
> +
> +@cindex @code{usubc@var{m}5} instruction pattern
> +@item @samp{usubc@var{m}5}
> +Similar to @samp{uaddc@var{m}5}, except it subtracts unsigned operands 3
> +and 4 from operand 2 instead of adding them.  So, it is
> +a subtraction with carry/borrow in (operand 4) and carry/borrow out
> +(operand 1).  All operands have the same mode.
> +
>  @cindex @code{addptr@var{m}3} instruction pattern
>  @item @samp{addptr@var{m}3}
>  Like @code{add@var{m}3} but is guaranteed to only be used for address
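>
> [The @samp{uaddc@var{m}5} semantics above restated as a C sketch,
> with the operand numbering from the text and op4, the carry in,
> assumed to be 0 or 1:]
>
>   void
>   uaddc5 (unsigned int *op0 /* sum */, unsigned int *op1 /* carry out */,
>           unsigned int op2, unsigned int op3, unsigned int op4)
>   {
>     unsigned int s = op2 + op3;
>     unsigned int c1 = s < op2;  /* overflow of the first addition */
>     unsigned int r = s + op4;
>     unsigned int c2 = r < s;    /* overflow of the second addition */
>     *op0 = r;
>     *op1 = c1 | c2;             /* at most one of c1/c2 can be set */
>   }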
> --- gcc/config/i386/i386.md.jj  2023-06-13 18:23:37.164793756 +0200
> +++ gcc/config/i386/i386.md     2023-06-14 12:21:38.668657604 +0200
> @@ -7733,6 +7733,25 @@ (define_peephole2
>    [(set (reg:CC FLAGS_REG)
>         (compare:CC (match_dup 0) (match_dup 1)))])
>
> +(define_peephole2
> +  [(set (match_operand:SWI 0 "general_reg_operand")
> +       (match_operand:SWI 1 "memory_operand"))
> +   (parallel [(set (reg:CC FLAGS_REG)
> +                  (compare:CC (match_dup 0)
> +                              (match_operand:SWI 2 "memory_operand")))
> +             (set (match_dup 0)
> +                  (minus:SWI (match_dup 0) (match_dup 2)))])
> +   (set (match_dup 1) (match_dup 0))]
> +  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> +   && peep2_reg_dead_p (3, operands[0])
> +   && !reg_overlap_mentioned_p (operands[0], operands[1])
> +   && !reg_overlap_mentioned_p (operands[0], operands[2])"
> +  [(set (match_dup 0) (match_dup 2))
> +   (parallel [(set (reg:CC FLAGS_REG)
> +                  (compare:CC (match_dup 1) (match_dup 0)))
> +             (set (match_dup 1)
> +                  (minus:SWI (match_dup 1) (match_dup 0)))])])
> +
>  ;; decl %eax; cmpl $-1, %eax; jne .Lxx; can be optimized into
>  ;; subl $1, %eax; jnc .Lxx;
>  (define_peephole2
> @@ -7818,6 +7837,59 @@ (define_insn "@add<mode>3_carry"
>     (set_attr "pent_pair" "pu")
>     (set_attr "mode" "<MODE>")])
>
> +(define_peephole2
> +  [(set (match_operand:SWI 0 "general_reg_operand")
> +       (match_operand:SWI 1 "memory_operand"))
> +   (parallel [(set (match_dup 0)
> +                  (plus:SWI
> +                    (plus:SWI
> +                      (match_operator:SWI 4 "ix86_carry_flag_operator"
> +                        [(match_operand 3 "flags_reg_operand")
> +                         (const_int 0)])
> +                      (match_dup 0))
> +                    (match_operand:SWI 2 "memory_operand")))
> +             (clobber (reg:CC FLAGS_REG))])
> +   (set (match_dup 1) (match_dup 0))]
> +  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> +   && peep2_reg_dead_p (3, operands[0])
> +   && !reg_overlap_mentioned_p (operands[0], operands[1])
> +   && !reg_overlap_mentioned_p (operands[0], operands[2])"
> +  [(set (match_dup 0) (match_dup 2))
> +   (parallel [(set (match_dup 1)
> +                  (plus:SWI (plus:SWI (match_op_dup 4
> +                                        [(match_dup 3) (const_int 0)])
> +                                      (match_dup 1))
> +                            (match_dup 0)))
> +             (clobber (reg:CC FLAGS_REG))])])
> +
> +(define_peephole2
> +  [(set (match_operand:SWI 0 "general_reg_operand")
> +       (match_operand:SWI 1 "memory_operand"))
> +   (parallel [(set (match_dup 0)
> +                  (plus:SWI
> +                    (plus:SWI
> +                      (match_operator:SWI 4 "ix86_carry_flag_operator"
> +                        [(match_operand 3 "flags_reg_operand")
> +                         (const_int 0)])
> +                      (match_dup 0))
> +                    (match_operand:SWI 2 "memory_operand")))
> +             (clobber (reg:CC FLAGS_REG))])
> +   (set (match_operand:SWI 5 "general_reg_operand") (match_dup 0))
> +   (set (match_dup 1) (match_dup 5))]
> +  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> +   && peep2_reg_dead_p (3, operands[0])
> +   && peep2_reg_dead_p (4, operands[5])
> +   && !reg_overlap_mentioned_p (operands[0], operands[1])
> +   && !reg_overlap_mentioned_p (operands[0], operands[2])
> +   && !reg_overlap_mentioned_p (operands[5], operands[1])"
> +  [(set (match_dup 0) (match_dup 2))
> +   (parallel [(set (match_dup 1)
> +                  (plus:SWI (plus:SWI (match_op_dup 4
> +                                        [(match_dup 3) (const_int 0)])
> +                                      (match_dup 1))
> +                            (match_dup 0)))
> +             (clobber (reg:CC FLAGS_REG))])])
> +
>  (define_insn "*add<mode>3_carry_0"
>    [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
>         (plus:SWI
> @@ -7918,6 +7990,159 @@ (define_insn "addcarry<mode>"
>     (set_attr "pent_pair" "pu")
>     (set_attr "mode" "<MODE>")])
>
> +;; Helper peephole2 for the addcarry<mode> and subborrow<mode>
> +;; peephole2s, to optimize away a nop which resulted from the
> +;; uaddc/usubc expansion optimization.
> +(define_peephole2
> +  [(set (match_operand:SWI48 0 "general_reg_operand")
> +       (match_operand:SWI48 1 "memory_operand"))
> +   (const_int 0)]
> +  ""
> +  [(set (match_dup 0) (match_dup 1))])
> +
> +(define_peephole2
> +  [(parallel [(set (reg:CCC FLAGS_REG)
> +                  (compare:CCC
> +                    (zero_extend:<DWI>
> +                      (plus:SWI48
> +                        (plus:SWI48
> +                          (match_operator:SWI48 4 "ix86_carry_flag_operator"
> +                            [(match_operand 2 "flags_reg_operand")
> +                             (const_int 0)])
> +                          (match_operand:SWI48 0 "general_reg_operand"))
> +                        (match_operand:SWI48 1 "memory_operand")))
> +                    (plus:<DWI>
> +                      (zero_extend:<DWI> (match_dup 1))
> +                      (match_operator:<DWI> 3 "ix86_carry_flag_operator"
> +                        [(match_dup 2) (const_int 0)]))))
> +             (set (match_dup 0)
> +                  (plus:SWI48 (plus:SWI48 (match_op_dup 4
> +                                            [(match_dup 2) (const_int 0)])
> +                                          (match_dup 0))
> +                              (match_dup 1)))])
> +   (set (match_dup 1) (match_dup 0))]
> +  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> +   && peep2_reg_dead_p (2, operands[0])
> +   && !reg_overlap_mentioned_p (operands[0], operands[1])"
> +  [(parallel [(set (reg:CCC FLAGS_REG)
> +                  (compare:CCC
> +                    (zero_extend:<DWI>
> +                      (plus:SWI48
> +                        (plus:SWI48
> +                          (match_op_dup 4
> +                            [(match_dup 2) (const_int 0)])
> +                          (match_dup 1))
> +                        (match_dup 0)))
> +                    (plus:<DWI>
> +                      (zero_extend:<DWI> (match_dup 0))
> +                      (match_op_dup 3
> +                        [(match_dup 2) (const_int 0)]))))
> +             (set (match_dup 1)
> +                  (plus:SWI48 (plus:SWI48 (match_op_dup 4
> +                                            [(match_dup 2) (const_int 0)])
> +                                          (match_dup 1))
> +                              (match_dup 0)))])])
> +
> +(define_peephole2
> +  [(set (match_operand:SWI48 0 "general_reg_operand")
> +       (match_operand:SWI48 1 "memory_operand"))
> +   (parallel [(set (reg:CCC FLAGS_REG)
> +                  (compare:CCC
> +                    (zero_extend:<DWI>
> +                      (plus:SWI48
> +                        (plus:SWI48
> +                          (match_operator:SWI48 5 "ix86_carry_flag_operator"
> +                            [(match_operand 3 "flags_reg_operand")
> +                             (const_int 0)])
> +                          (match_dup 0))
> +                        (match_operand:SWI48 2 "memory_operand")))
> +                    (plus:<DWI>
> +                      (zero_extend:<DWI> (match_dup 2))
> +                      (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> +                        [(match_dup 3) (const_int 0)]))))
> +             (set (match_dup 0)
> +                  (plus:SWI48 (plus:SWI48 (match_op_dup 5
> +                                            [(match_dup 3) (const_int 0)])
> +                                          (match_dup 0))
> +                              (match_dup 2)))])
> +   (set (match_dup 1) (match_dup 0))]
> +  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> +   && peep2_reg_dead_p (3, operands[0])
> +   && !reg_overlap_mentioned_p (operands[0], operands[1])
> +   && !reg_overlap_mentioned_p (operands[0], operands[2])"
> +  [(set (match_dup 0) (match_dup 2))
> +   (parallel [(set (reg:CCC FLAGS_REG)
> +                  (compare:CCC
> +                    (zero_extend:<DWI>
> +                      (plus:SWI48
> +                        (plus:SWI48
> +                          (match_op_dup 5
> +                            [(match_dup 3) (const_int 0)])
> +                          (match_dup 1))
> +                        (match_dup 0)))
> +                    (plus:<DWI>
> +                      (zero_extend:<DWI> (match_dup 0))
> +                      (match_op_dup 4
> +                        [(match_dup 3) (const_int 0)]))))
> +             (set (match_dup 1)
> +                  (plus:SWI48 (plus:SWI48 (match_op_dup 5
> +                                            [(match_dup 3) (const_int 0)])
> +                                          (match_dup 1))
> +                              (match_dup 0)))])])
> +
> +(define_peephole2
> +  [(parallel [(set (reg:CCC FLAGS_REG)
> +                  (compare:CCC
> +                    (zero_extend:<DWI>
> +                      (plus:SWI48
> +                        (plus:SWI48
> +                          (match_operator:SWI48 4 "ix86_carry_flag_operator"
> +                            [(match_operand 2 "flags_reg_operand")
> +                             (const_int 0)])
> +                          (match_operand:SWI48 0 "general_reg_operand"))
> +                        (match_operand:SWI48 1 "memory_operand")))
> +                    (plus:<DWI>
> +                      (zero_extend:<DWI> (match_dup 1))
> +                      (match_operator:<DWI> 3 "ix86_carry_flag_operator"
> +                        [(match_dup 2) (const_int 0)]))))
> +             (set (match_dup 0)
> +                  (plus:SWI48 (plus:SWI48 (match_op_dup 4
> +                                            [(match_dup 2) (const_int 0)])
> +                                          (match_dup 0))
> +                              (match_dup 1)))])
> +   (set (match_operand:QI 5 "general_reg_operand")
> +       (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
> +   (set (match_operand:SWI48 6 "general_reg_operand")
> +       (zero_extend:SWI48 (match_dup 5)))
> +   (set (match_dup 1) (match_dup 0))]
> +  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> +   && peep2_reg_dead_p (4, operands[0])
> +   && !reg_overlap_mentioned_p (operands[0], operands[1])
> +   && !reg_overlap_mentioned_p (operands[0], operands[5])
> +   && !reg_overlap_mentioned_p (operands[5], operands[1])
> +   && !reg_overlap_mentioned_p (operands[0], operands[6])
> +   && !reg_overlap_mentioned_p (operands[6], operands[1])"
> +  [(parallel [(set (reg:CCC FLAGS_REG)
> +                  (compare:CCC
> +                    (zero_extend:<DWI>
> +                      (plus:SWI48
> +                        (plus:SWI48
> +                          (match_op_dup 4
> +                            [(match_dup 2) (const_int 0)])
> +                          (match_dup 1))
> +                        (match_dup 0)))
> +                    (plus:<DWI>
> +                      (zero_extend:<DWI> (match_dup 0))
> +                      (match_op_dup 3
> +                        [(match_dup 2) (const_int 0)]))))
> +             (set (match_dup 1)
> +                  (plus:SWI48 (plus:SWI48 (match_op_dup 4
> +                                            [(match_dup 2) (const_int 0)])
> +                                          (match_dup 1))
> +                              (match_dup 0)))])
> +   (set (match_dup 5) (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
> +   (set (match_dup 6) (zero_extend:SWI48 (match_dup 5)))])
> +
>  (define_expand "addcarry<mode>_0"
>    [(parallel
>       [(set (reg:CCC FLAGS_REG)
> @@ -7988,6 +8213,59 @@ (define_insn "@sub<mode>3_carry"
>     (set_attr "pent_pair" "pu")
>     (set_attr "mode" "<MODE>")])
>
> +(define_peephole2
> +  [(set (match_operand:SWI 0 "general_reg_operand")
> +       (match_operand:SWI 1 "memory_operand"))
> +   (parallel [(set (match_dup 0)
> +                  (minus:SWI
> +                    (minus:SWI
> +                      (match_dup 0)
> +                      (match_operator:SWI 4 "ix86_carry_flag_operator"
> +                        [(match_operand 3 "flags_reg_operand")
> +                         (const_int 0)]))
> +                    (match_operand:SWI 2 "memory_operand")))
> +             (clobber (reg:CC FLAGS_REG))])
> +   (set (match_dup 1) (match_dup 0))]
> +  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> +   && peep2_reg_dead_p (3, operands[0])
> +   && !reg_overlap_mentioned_p (operands[0], operands[1])
> +   && !reg_overlap_mentioned_p (operands[0], operands[2])"
> +  [(set (match_dup 0) (match_dup 2))
> +   (parallel [(set (match_dup 1)
> +                  (minus:SWI (minus:SWI (match_dup 1)
> +                                        (match_op_dup 4
> +                                          [(match_dup 3) (const_int 0)]))
> +                             (match_dup 0)))
> +             (clobber (reg:CC FLAGS_REG))])])
> +
> +(define_peephole2
> +  [(set (match_operand:SWI 0 "general_reg_operand")
> +       (match_operand:SWI 1 "memory_operand"))
> +   (parallel [(set (match_dup 0)
> +                  (minus:SWI
> +                    (minus:SWI
> +                      (match_dup 0)
> +                      (match_operator:SWI 4 "ix86_carry_flag_operator"
> +                        [(match_operand 3 "flags_reg_operand")
> +                         (const_int 0)]))
> +                    (match_operand:SWI 2 "memory_operand")))
> +             (clobber (reg:CC FLAGS_REG))])
> +   (set (match_operand:SWI 5 "general_reg_operand") (match_dup 0))
> +   (set (match_dup 1) (match_dup 5))]
> +  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> +   && peep2_reg_dead_p (3, operands[0])
> +   && peep2_reg_dead_p (4, operands[5])
> +   && !reg_overlap_mentioned_p (operands[0], operands[1])
> +   && !reg_overlap_mentioned_p (operands[0], operands[2])
> +   && !reg_overlap_mentioned_p (operands[5], operands[1])"
> +  [(set (match_dup 0) (match_dup 2))
> +   (parallel [(set (match_dup 1)
> +                  (minus:SWI (minus:SWI (match_dup 1)
> +                                        (match_op_dup 4
> +                                          [(match_dup 3) (const_int 0)]))
> +                             (match_dup 0)))
> +             (clobber (reg:CC FLAGS_REG))])])
> +
>  (define_insn "*sub<mode>3_carry_0"
>    [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
>         (minus:SWI
> @@ -8113,13 +8391,13 @@ (define_insn "subborrow<mode>"
>    [(set (reg:CCC FLAGS_REG)
>         (compare:CCC
>           (zero_extend:<DWI>
> -           (match_operand:SWI48 1 "nonimmediate_operand" "0"))
> +           (match_operand:SWI48 1 "nonimmediate_operand" "0,0"))
>           (plus:<DWI>
>             (match_operator:<DWI> 4 "ix86_carry_flag_operator"
>               [(match_operand 3 "flags_reg_operand") (const_int 0)])
>             (zero_extend:<DWI>
> -             (match_operand:SWI48 2 "nonimmediate_operand" "rm")))))
> -   (set (match_operand:SWI48 0 "register_operand" "=r")
> +             (match_operand:SWI48 2 "nonimmediate_operand" "r,rm")))))
> +   (set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
>         (minus:SWI48 (minus:SWI48
>                        (match_dup 1)
>                        (match_operator:SWI48 5 "ix86_carry_flag_operator"
> @@ -8132,6 +8410,154 @@ (define_insn "subborrow<mode>"
>     (set_attr "pent_pair" "pu")
>     (set_attr "mode" "<MODE>")])
>
> +(define_peephole2
> +  [(set (match_operand:SWI48 0 "general_reg_operand")
> +       (match_operand:SWI48 1 "memory_operand"))
> +   (parallel [(set (reg:CCC FLAGS_REG)
> +                  (compare:CCC
> +                    (zero_extend:<DWI> (match_dup 0))
> +                    (plus:<DWI>
> +                      (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> +                        [(match_operand 3 "flags_reg_operand") (const_int 0)])
> +                      (zero_extend:<DWI>
> +                        (match_operand:SWI48 2 "memory_operand")))))
> +             (set (match_dup 0)
> +                  (minus:SWI48
> +                    (minus:SWI48
> +                      (match_dup 0)
> +                      (match_operator:SWI48 5 "ix86_carry_flag_operator"
> +                        [(match_dup 3) (const_int 0)]))
> +                    (match_dup 2)))])
> +   (set (match_dup 1) (match_dup 0))]
> +  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> +   && peep2_reg_dead_p (3, operands[0])
> +   && !reg_overlap_mentioned_p (operands[0], operands[1])
> +   && !reg_overlap_mentioned_p (operands[0], operands[2])"
> +  [(set (match_dup 0) (match_dup 2))
> +   (parallel [(set (reg:CCC FLAGS_REG)
> +                  (compare:CCC
> +                    (zero_extend:<DWI> (match_dup 1))
> +                    (plus:<DWI> (match_op_dup 4
> +                                  [(match_dup 3) (const_int 0)])
> +                                (zero_extend:<DWI> (match_dup 0)))))
> +             (set (match_dup 1)
> +                  (minus:SWI48 (minus:SWI48 (match_dup 1)
> +                                            (match_op_dup 5
> +                                              [(match_dup 3) (const_int 0)]))
> +                               (match_dup 0)))])])
> +
> +(define_peephole2
> +  [(set (match_operand:SWI48 6 "general_reg_operand")
> +       (match_operand:SWI48 7 "memory_operand"))
> +   (set (match_operand:SWI48 8 "general_reg_operand")
> +       (match_operand:SWI48 9 "memory_operand"))
> +   (parallel [(set (reg:CCC FLAGS_REG)
> +                  (compare:CCC
> +                    (zero_extend:<DWI>
> +                      (match_operand:SWI48 0 "general_reg_operand"))
> +                    (plus:<DWI>
> +                      (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> +                        [(match_operand 3 "flags_reg_operand") (const_int 0)])
> +                      (zero_extend:<DWI>
> +                        (match_operand:SWI48 2 "general_reg_operand")))))
> +             (set (match_dup 0)
> +                  (minus:SWI48
> +                    (minus:SWI48
> +                      (match_dup 0)
> +                      (match_operator:SWI48 5 "ix86_carry_flag_operator"
> +                        [(match_dup 3) (const_int 0)]))
> +                    (match_dup 2)))])
> +   (set (match_operand:SWI48 1 "memory_operand") (match_dup 0))]
> +  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> +   && peep2_reg_dead_p (4, operands[0])
> +   && peep2_reg_dead_p (3, operands[2])
> +   && !reg_overlap_mentioned_p (operands[0], operands[1])
> +   && !reg_overlap_mentioned_p (operands[2], operands[1])
> +   && !reg_overlap_mentioned_p (operands[6], operands[9])
> +   && (rtx_equal_p (operands[6], operands[0])
> +       ? (rtx_equal_p (operands[7], operands[1])
> +         && rtx_equal_p (operands[8], operands[2]))
> +       : (rtx_equal_p (operands[8], operands[0])
> +         && rtx_equal_p (operands[9], operands[1])
> +         && rtx_equal_p (operands[6], operands[2])))"
> +  [(set (match_dup 0) (match_dup 9))
> +   (parallel [(set (reg:CCC FLAGS_REG)
> +                  (compare:CCC
> +                    (zero_extend:<DWI> (match_dup 1))
> +                    (plus:<DWI> (match_op_dup 4
> +                                  [(match_dup 3) (const_int 0)])
> +                                (zero_extend:<DWI> (match_dup 0)))))
> +             (set (match_dup 1)
> +                  (minus:SWI48 (minus:SWI48 (match_dup 1)
> +                                            (match_op_dup 5
> +                                              [(match_dup 3) (const_int 0)]))
> +                               (match_dup 0)))])]
> +{
> +  if (!rtx_equal_p (operands[6], operands[0]))
> +    operands[9] = operands[7];
> +})
> +
> +(define_peephole2
> +  [(set (match_operand:SWI48 6 "general_reg_operand")
> +       (match_operand:SWI48 7 "memory_operand"))
> +   (set (match_operand:SWI48 8 "general_reg_operand")
> +       (match_operand:SWI48 9 "memory_operand"))
> +   (parallel [(set (reg:CCC FLAGS_REG)
> +                  (compare:CCC
> +                    (zero_extend:<DWI>
> +                      (match_operand:SWI48 0 "general_reg_operand"))
> +                    (plus:<DWI>
> +                      (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> +                        [(match_operand 3 "flags_reg_operand") (const_int 0)])
> +                      (zero_extend:<DWI>
> +                        (match_operand:SWI48 2 "general_reg_operand")))))
> +             (set (match_dup 0)
> +                  (minus:SWI48
> +                    (minus:SWI48
> +                      (match_dup 0)
> +                      (match_operator:SWI48 5 "ix86_carry_flag_operator"
> +                        [(match_dup 3) (const_int 0)]))
> +                    (match_dup 2)))])
> +   (set (match_operand:QI 10 "general_reg_operand")
> +       (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
> +   (set (match_operand:SWI48 11 "general_reg_operand")
> +       (zero_extend:SWI48 (match_dup 10)))
> +   (set (match_operand:SWI48 1 "memory_operand") (match_dup 0))]
> +  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> +   && peep2_reg_dead_p (6, operands[0])
> +   && peep2_reg_dead_p (3, operands[2])
> +   && !reg_overlap_mentioned_p (operands[0], operands[1])
> +   && !reg_overlap_mentioned_p (operands[2], operands[1])
> +   && !reg_overlap_mentioned_p (operands[6], operands[9])
> +   && !reg_overlap_mentioned_p (operands[0], operands[10])
> +   && !reg_overlap_mentioned_p (operands[10], operands[1])
> +   && !reg_overlap_mentioned_p (operands[0], operands[11])
> +   && !reg_overlap_mentioned_p (operands[11], operands[1])
> +   && (rtx_equal_p (operands[6], operands[0])
> +       ? (rtx_equal_p (operands[7], operands[1])
> +         && rtx_equal_p (operands[8], operands[2]))
> +       : (rtx_equal_p (operands[8], operands[0])
> +         && rtx_equal_p (operands[9], operands[1])
> +         && rtx_equal_p (operands[6], operands[2])))"
> +  [(set (match_dup 0) (match_dup 9))
> +   (parallel [(set (reg:CCC FLAGS_REG)
> +                  (compare:CCC
> +                    (zero_extend:<DWI> (match_dup 1))
> +                    (plus:<DWI> (match_op_dup 4
> +                                  [(match_dup 3) (const_int 0)])
> +                                (zero_extend:<DWI> (match_dup 0)))))
> +             (set (match_dup 1)
> +                  (minus:SWI48 (minus:SWI48 (match_dup 1)
> +                                            (match_op_dup 5
> +                                              [(match_dup 3) (const_int 0)]))
> +                               (match_dup 0)))])
> +   (set (match_dup 10) (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
> +   (set (match_dup 11) (zero_extend:SWI48 (match_dup 10)))]
> +{
> +  if (!rtx_equal_p (operands[6], operands[0]))
> +    operands[9] = operands[7];
> +})
> +
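For context, the two peephole2s above target the .USUBC expansion on
TARGET_READ_MODIFY_WRITE tuning or when optimizing for size: they match
the register-only subborrow sandwiched between two loads and a store back
to one of the loaded locations (the second variant additionally matches
the setcc/zero_extend pair that materializes the borrow-out), and re-emit
the group as a memory-destination sbb.  A minimal C shape that should
exercise them (illustrative only; the testcases further below are the
authoritative ones):

  /* Sketch: two-limb in-place subtraction.  The chained
     __builtin_sub_overflow calls are recognized as .USUBC, and the
     peephole2s then fuse the load/sbb/store of the second limb into
     a single sbb with a memory destination.  */
  static unsigned long
  usubc (unsigned long x, unsigned long y, unsigned long carry_in,
         unsigned long *carry_out)
  {
    unsigned long r;
    unsigned long c1 = __builtin_sub_overflow (x, y, &r);
    unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r);
    *carry_out = c1 + c2;
    return r;
  }

  void
  sub2 (unsigned long *p, const unsigned long *q)
  {
    unsigned long c;
    p[0] = usubc (p[0], q[0], 0, &c);
    p[1] = usubc (p[1], q[1], c, &c);
  }
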
>  (define_expand "subborrow<mode>_0"
>    [(parallel
>       [(set (reg:CC FLAGS_REG)
> @@ -8142,6 +8568,67 @@ (define_expand "subborrow<mode>_0"
>            (minus:SWI48 (match_dup 1) (match_dup 2)))])]
>    "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)")
>
> +(define_expand "uaddc<mode>5"
> +  [(match_operand:SWI48 0 "register_operand")
> +   (match_operand:SWI48 1 "register_operand")
> +   (match_operand:SWI48 2 "register_operand")
> +   (match_operand:SWI48 3 "register_operand")
> +   (match_operand:SWI48 4 "nonmemory_operand")]
> +  ""
> +{
> +  rtx cf = gen_rtx_REG (CCCmode, FLAGS_REG), pat, pat2;
> +  if (operands[4] == const0_rtx)
> +    emit_insn (gen_addcarry<mode>_0 (operands[0], operands[2], operands[3]));
> +  else
> +    {
> +      rtx op4 = copy_to_mode_reg (QImode,
> +                                 convert_to_mode (QImode, operands[4], 1));
> +      emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx));
> +      pat = gen_rtx_LTU (<DWI>mode, cf, const0_rtx);
> +      pat2 = gen_rtx_LTU (<MODE>mode, cf, const0_rtx);
> +      emit_insn (gen_addcarry<mode> (operands[0], operands[2], operands[3],
> +                                    cf, pat, pat2));
> +    }
> +  rtx cc = gen_reg_rtx (QImode);
> +  pat = gen_rtx_LTU (QImode, cf, const0_rtx);
> +  emit_insn (gen_rtx_SET (cc, pat));
> +  emit_insn (gen_zero_extendqi<mode>2 (operands[1], cc));
> +  DONE;
> +})
> +
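The uaddc<mode>5 expander above implements the new optab: operand 0
receives the sum, operand 1 the zero-extended carry-out, operands 2 and 3
are the addends and operand 4 the carry-in.  A known-zero carry-in
degenerates to the plain addcarry<mode>_0 pattern; otherwise the
addqi3_cconly_overflow (op4, constm1_rtx) insn adds 0xff to the QImode
carry-in, which sets CF exactly when operand 4 is nonzero, and the
addcarry<mode> insn then consumes CF.  A hedged C model of what is
computed, assuming a 0/1 carry-in (a sketch of the semantics, not of the
expansion):

  /* Sketch: op0 = op2 + op3 + (op4 != 0); *op1 = carry out.
     At most one of c1/c2 can be set, so their sum is 0 or 1.  */
  unsigned long
  uaddc_model (unsigned long op2, unsigned long op3,
               unsigned long op4, unsigned long *op1)
  {
    unsigned long r;
    unsigned long c1 = __builtin_add_overflow (op2, op3, &r);
    unsigned long c2 = __builtin_add_overflow (r, op4 != 0, &r);
    *op1 = c1 + c2;
    return r;
  }
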
> +(define_expand "usubc<mode>5"
> +  [(match_operand:SWI48 0 "register_operand")
> +   (match_operand:SWI48 1 "register_operand")
> +   (match_operand:SWI48 2 "register_operand")
> +   (match_operand:SWI48 3 "register_operand")
> +   (match_operand:SWI48 4 "nonmemory_operand")]
> +  ""
> +{
> +  rtx cf, pat, pat2;
> +  if (operands[4] == const0_rtx)
> +    {
> +      cf = gen_rtx_REG (CCmode, FLAGS_REG);
> +      emit_insn (gen_subborrow<mode>_0 (operands[0], operands[2],
> +                                       operands[3]));
> +    }
> +  else
> +    {
> +      cf = gen_rtx_REG (CCCmode, FLAGS_REG);
> +      rtx op4 = copy_to_mode_reg (QImode,
> +                                 convert_to_mode (QImode, operands[4], 1));
> +      emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx));
> +      pat = gen_rtx_LTU (<DWI>mode, cf, const0_rtx);
> +      pat2 = gen_rtx_LTU (<MODE>mode, cf, const0_rtx);
> +      emit_insn (gen_subborrow<mode> (operands[0], operands[2], operands[3],
> +                                     cf, pat, pat2));
> +    }
> +  rtx cc = gen_reg_rtx (QImode);
> +  pat = gen_rtx_LTU (QImode, cf, const0_rtx);
> +  emit_insn (gen_rtx_SET (cc, pat));
> +  emit_insn (gen_zero_extendqi<mode>2 (operands[1], cc));
> +  DONE;
> +})
> +
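usubc<mode>5 is the mirror image, with one subtlety: for a zero carry-in
it emits subborrow<mode>_0, which sets plain CCmode rather than CCCmode,
so the borrow-out must be read back via LTU on whichever flags mode the
chosen pattern actually produced; that is why cf is initialized in each
branch rather than once up front as in the uaddc<mode>5 expander.
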
>  (define_mode_iterator CC_CCC [CC CCC])
>
>  ;; Pre-reload splitter to optimize
> @@ -8239,6 +8726,27 @@ (define_peephole2
>                    (compare:CCC
>                      (plus:SWI (match_dup 1) (match_dup 0))
>                      (match_dup 1)))
> +             (set (match_dup 1) (plus:SWI (match_dup 1) (match_dup 0)))])])
> +
> +(define_peephole2
> +  [(set (match_operand:SWI 0 "general_reg_operand")
> +       (match_operand:SWI 1 "memory_operand"))
> +   (parallel [(set (reg:CCC FLAGS_REG)
> +                  (compare:CCC
> +                    (plus:SWI (match_dup 0)
> +                              (match_operand:SWI 2 "memory_operand"))
> +                    (match_dup 0)))
> +             (set (match_dup 0) (plus:SWI (match_dup 0) (match_dup 2)))])
> +   (set (match_dup 1) (match_dup 0))]
> +  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> +   && peep2_reg_dead_p (3, operands[0])
> +   && !reg_overlap_mentioned_p (operands[0], operands[1])
> +   && !reg_overlap_mentioned_p (operands[0], operands[2])"
> +  [(set (match_dup 0) (match_dup 2))
> +   (parallel [(set (reg:CCC FLAGS_REG)
> +                  (compare:CCC
> +                    (plus:SWI (match_dup 1) (match_dup 0))
> +                    (match_dup 1)))
>               (set (match_dup 1) (plus:SWI (match_dup 1) (match_dup 0)))])])
>
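This last peephole2 handles the plain overflow-add case: a load from
mem1, a CCC-setting add of a second memory operand into the loaded
register, and a store back to mem1 become a load of the second operand
plus a read-modify-write add into mem1.  The CCC result is unchanged by
the rewrite, so a following adc still sees the right carry; roughly,
`mov (%rdi),%r; add (%rsi),%r; mov %r,(%rdi)' turns into
`mov (%rsi),%r; add %r,(%rdi)' (illustrative AT&T operands).
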
>  (define_insn "*addsi3_zext_cc_overflow_1"
> --- gcc/testsuite/gcc.target/i386/pr79173-1.c.jj        2023-06-14 12:21:38.668657604 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-1.c   2023-06-14 12:21:38.668657604 +0200
> @@ -0,0 +1,59 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> +  unsigned long r;
> +  unsigned long c1 = __builtin_add_overflow (x, y, &r);
> +  unsigned long c2 = __builtin_add_overflow (r, carry_in, &r);
> +  *carry_out = c1 + c2;
> +  return r;
> +}
> +
> +static unsigned long
> +usubc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> +  unsigned long r;
> +  unsigned long c1 = __builtin_sub_overflow (x, y, &r);
> +  unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r);
> +  *carry_out = c1 + c2;
> +  return r;
> +}
> +
> +void
> +foo (unsigned long *p, unsigned long *q)
> +{
> +  unsigned long c;
> +  p[0] = uaddc (p[0], q[0], 0, &c);
> +  p[1] = uaddc (p[1], q[1], c, &c);
> +  p[2] = uaddc (p[2], q[2], c, &c);
> +  p[3] = uaddc (p[3], q[3], c, &c);
> +}
> +
> +void
> +bar (unsigned long *p, unsigned long *q)
> +{
> +  unsigned long c;
> +  p[0] = usubc (p[0], q[0], 0, &c);
> +  p[1] = usubc (p[1], q[1], c, &c);
> +  p[2] = usubc (p[2], q[2], c, &c);
> +  p[3] = usubc (p[3], q[3], c, &c);
> +}
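
(The scan-assembler-times directives above pin down the intended final
code shape: per four-limb function, exactly one add/sub with a memory
destination followed by three adc/sbb with memory destinations, which
only happens if both the .UADDC/.USUBC recognition and the new
read-modify-write peephole2s fire.)
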
> --- gcc/testsuite/gcc.target/i386/pr79173-2.c.jj        2023-06-14 12:21:38.669657591 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-2.c   2023-06-14 12:21:38.669657591 +0200
> @@ -0,0 +1,59 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +uaddc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
> +{
> +  unsigned long r;
> +  _Bool c1 = __builtin_add_overflow (x, y, &r);
> +  _Bool c2 = __builtin_add_overflow (r, carry_in, &r);
> +  *carry_out = c1 | c2;
> +  return r;
> +}
> +
> +static unsigned long
> +usubc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
> +{
> +  unsigned long r;
> +  _Bool c1 = __builtin_sub_overflow (x, y, &r);
> +  _Bool c2 = __builtin_sub_overflow (r, carry_in, &r);
> +  *carry_out = c1 | c2;
> +  return r;
> +}
> +
> +void
> +foo (unsigned long *p, unsigned long *q)
> +{
> +  _Bool c;
> +  p[0] = uaddc (p[0], q[0], 0, &c);
> +  p[1] = uaddc (p[1], q[1], c, &c);
> +  p[2] = uaddc (p[2], q[2], c, &c);
> +  p[3] = uaddc (p[3], q[3], c, &c);
> +}
> +
> +void
> +bar (unsigned long *p, unsigned long *q)
> +{
> +  _Bool c;
> +  p[0] = usubc (p[0], q[0], 0, &c);
> +  p[1] = usubc (p[1], q[1], c, &c);
> +  p[2] = usubc (p[2], q[2], c, &c);
> +  p[3] = usubc (p[3], q[3], c, &c);
> +}
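
(pr79173-2.c differs from pr79173-1.c only in using _Bool carries and
combining them with | instead of +; pr79173-4.c below uses ^.  Since at
most one of the two partial carries can be set, +, | and ^ all compute
the same value, and the recognizer is expected to accept each spelling.)
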
> --- gcc/testsuite/gcc.target/i386/pr79173-3.c.jj        2023-06-14 12:21:38.669657591 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-3.c   2023-06-14 12:21:38.669657591 +0200
> @@ -0,0 +1,61 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> +  unsigned long r;
> +  unsigned long c1 = __builtin_add_overflow (x, y, &r);
> +  unsigned long c2 = __builtin_add_overflow (r, carry_in, &r);
> +  *carry_out = c1 + c2;
> +  return r;
> +}
> +
> +static unsigned long
> +usubc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> +  unsigned long r;
> +  unsigned long c1 = __builtin_sub_overflow (x, y, &r);
> +  unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r);
> +  *carry_out = c1 + c2;
> +  return r;
> +}
> +
> +unsigned long
> +foo (unsigned long *p, unsigned long *q)
> +{
> +  unsigned long c;
> +  p[0] = uaddc (p[0], q[0], 0, &c);
> +  p[1] = uaddc (p[1], q[1], c, &c);
> +  p[2] = uaddc (p[2], q[2], c, &c);
> +  p[3] = uaddc (p[3], q[3], c, &c);
> +  return c;
> +}
> +
> +unsigned long
> +bar (unsigned long *p, unsigned long *q)
> +{
> +  unsigned long c;
> +  p[0] = usubc (p[0], q[0], 0, &c);
> +  p[1] = usubc (p[1], q[1], c, &c);
> +  p[2] = usubc (p[2], q[2], c, &c);
> +  p[3] = usubc (p[3], q[3], c, &c);
> +  return c;
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-4.c.jj        2023-06-14 12:21:38.669657591 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-4.c   2023-06-14 12:21:38.669657591 +0200
> @@ -0,0 +1,61 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +uaddc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
> +{
> +  unsigned long r;
> +  _Bool c1 = __builtin_add_overflow (x, y, &r);
> +  _Bool c2 = __builtin_add_overflow (r, carry_in, &r);
> +  *carry_out = c1 ^ c2;
> +  return r;
> +}
> +
> +static unsigned long
> +usubc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
> +{
> +  unsigned long r;
> +  _Bool c1 = __builtin_sub_overflow (x, y, &r);
> +  _Bool c2 = __builtin_sub_overflow (r, carry_in, &r);
> +  *carry_out = c1 ^ c2;
> +  return r;
> +}
> +
> +_Bool
> +foo (unsigned long *p, unsigned long *q)
> +{
> +  _Bool c;
> +  p[0] = uaddc (p[0], q[0], 0, &c);
> +  p[1] = uaddc (p[1], q[1], c, &c);
> +  p[2] = uaddc (p[2], q[2], c, &c);
> +  p[3] = uaddc (p[3], q[3], c, &c);
> +  return c;
> +}
> +
> +_Bool
> +bar (unsigned long *p, unsigned long *q)
> +{
> +  _Bool c;
> +  p[0] = usubc (p[0], q[0], 0, &c);
> +  p[1] = usubc (p[1], q[1], c, &c);
> +  p[2] = usubc (p[2], q[2], c, &c);
> +  p[3] = usubc (p[3], q[3], c, &c);
> +  return c;
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-5.c.jj        2023-06-14 12:21:38.669657591 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-5.c   2023-06-14 12:21:38.669657591 +0200
> @@ -0,0 +1,32 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> +  unsigned long r = x + y;
> +  unsigned long c1 = r < x;
> +  r += carry_in;
> +  unsigned long c2 = r < carry_in;
> +  *carry_out = c1 + c2;
> +  return r;
> +}
> +
> +void
> +foo (unsigned long *p, unsigned long *q)
> +{
> +  unsigned long c;
> +  p[0] = uaddc (p[0], q[0], 0, &c);
> +  p[1] = uaddc (p[1], q[1], c, &c);
> +  p[2] = uaddc (p[2], q[2], c, &c);
> +  p[3] = uaddc (p[3], q[3], c, &c);
> +}
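
(pr79173-5.c and pr79173-6.c spell the overflow checks as `r < x'
comparisons instead of __builtin_add_overflow, so they exercise the part
of the recognizer that matches open-coded carry computations rather than
the .ADD_OVERFLOW form.)
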
> --- gcc/testsuite/gcc.target/i386/pr79173-6.c.jj        2023-06-14 12:21:38.669657591 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-6.c   2023-06-14 12:21:38.669657591 +0200
> @@ -0,0 +1,33 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> +  unsigned long r = x + y;
> +  unsigned long c1 = r < x;
> +  r += carry_in;
> +  unsigned long c2 = r < carry_in;
> +  *carry_out = c1 + c2;
> +  return r;
> +}
> +
> +unsigned long
> +foo (unsigned long *p, unsigned long *q)
> +{
> +  unsigned long c;
> +  p[0] = uaddc (p[0], q[0], 0, &c);
> +  p[1] = uaddc (p[1], q[1], c, &c);
> +  p[2] = uaddc (p[2], q[2], c, &c);
> +  p[3] = uaddc (p[3], q[3], c, &c);
> +  return c;
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-7.c.jj        2023-06-14 12:21:38.669657591 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-7.c   2023-06-14 12:21:38.669657591 +0200
> @@ -0,0 +1,31 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
> +
> +#include <x86intrin.h>
> +
> +void
> +foo (unsigned long long *p, unsigned long long *q)
> +{
> +  unsigned char c = _addcarry_u64 (0, p[0], q[0], &p[0]);
> +  c = _addcarry_u64 (c, p[1], q[1], &p[1]);
> +  c = _addcarry_u64 (c, p[2], q[2], &p[2]);
> +  _addcarry_u64 (c, p[3], q[3], &p[3]);
> +}
> +
> +void
> +bar (unsigned long long *p, unsigned long long *q)
> +{
> +  unsigned char c = _subborrow_u64 (0, p[0], q[0], &p[0]);
> +  c = _subborrow_u64 (c, p[1], q[1], &p[1]);
> +  c = _subborrow_u64 (c, p[2], q[2], &p[2]);
> +  _subborrow_u64 (c, p[3], q[3], &p[3]);
> +}
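
(pr79173-7.c through pr79173-10.c start from the <x86intrin.h>
_addcarry_u64/_addcarry_u32/_subborrow_* intrinsics, which already expand
to the addcarry/subborrow insns, so these mainly test the new
memory-destination peephole2s rather than the GIMPLE pattern
recognition.)
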
> --- gcc/testsuite/gcc.target/i386/pr79173-8.c.jj        2023-06-14 12:21:38.669657591 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-8.c   2023-06-14 12:21:38.669657591 +0200
> @@ -0,0 +1,31 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
> +
> +#include <x86intrin.h>
> +
> +void
> +foo (unsigned int *p, unsigned int *q)
> +{
> +  unsigned char c = _addcarry_u32 (0, p[0], q[0], &p[0]);
> +  c = _addcarry_u32 (c, p[1], q[1], &p[1]);
> +  c = _addcarry_u32 (c, p[2], q[2], &p[2]);
> +  _addcarry_u32 (c, p[3], q[3], &p[3]);
> +}
> +
> +void
> +bar (unsigned int *p, unsigned int *q)
> +{
> +  unsigned char c = _subborrow_u32 (0, p[0], q[0], &p[0]);
> +  c = _subborrow_u32 (c, p[1], q[1], &p[1]);
> +  c = _subborrow_u32 (c, p[2], q[2], &p[2]);
> +  _subborrow_u32 (c, p[3], q[3], &p[3]);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-9.c.jj        2023-06-14 12:21:38.669657591 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-9.c   2023-06-14 12:21:38.669657591 +0200
> @@ -0,0 +1,31 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
> +
> +#include <x86intrin.h>
> +
> +unsigned long long
> +foo (unsigned long long *p, unsigned long long *q)
> +{
> +  unsigned char c = _addcarry_u64 (0, p[0], q[0], &p[0]);
> +  c = _addcarry_u64 (c, p[1], q[1], &p[1]);
> +  c = _addcarry_u64 (c, p[2], q[2], &p[2]);
> +  return _addcarry_u64 (c, p[3], q[3], &p[3]);
> +}
> +
> +unsigned long long
> +bar (unsigned long long *p, unsigned long long *q)
> +{
> +  unsigned char c = _subborrow_u64 (0, p[0], q[0], &p[0]);
> +  c = _subborrow_u64 (c, p[1], q[1], &p[1]);
> +  c = _subborrow_u64 (c, p[2], q[2], &p[2]);
> +  return _subborrow_u64 (c, p[3], q[3], &p[3]);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-10.c.jj       2023-06-14 12:21:38.669657591 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-10.c  2023-06-14 12:21:38.669657591 +0200
> @@ -0,0 +1,31 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
> +
> +#include <x86intrin.h>
> +
> +unsigned int
> +foo (unsigned int *p, unsigned int *q)
> +{
> +  unsigned char c = _addcarry_u32 (0, p[0], q[0], &p[0]);
> +  c = _addcarry_u32 (c, p[1], q[1], &p[1]);
> +  c = _addcarry_u32 (c, p[2], q[2], &p[2]);
> +  return _addcarry_u32 (c, p[3], q[3], &p[3]);
> +}
> +
> +unsigned int
> +bar (unsigned int *p, unsigned int *q)
> +{
> +  unsigned char c = _subborrow_u32 (0, p[0], q[0], &p[0]);
> +  c = _subborrow_u32 (c, p[1], q[1], &p[1]);
> +  c = _subborrow_u32 (c, p[2], q[2], &p[2]);
> +  return _subborrow_u32 (c, p[3], q[3], &p[3]);
> +}
>
>
>         Jakub
>

Thread overview: 17+ messages
2023-06-06 21:42 [PATCH] middle-end, i386: " Jakub Jelinek
2023-06-13  7:06 ` Patch ping (Re: [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173]) Jakub Jelinek
2023-06-13  8:32   ` Uros Bizjak
2023-06-13  8:40 ` [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173] Richard Biener
2023-06-13 11:29   ` Jakub Jelinek
2023-06-14 11:17     ` Jakub Jelinek
2023-06-14 12:25       ` Richard Biener
2023-06-14 13:52         ` [PATCH] middle-end: Move constant args folding of .UBSAN_CHECK_* and .*_OVERFLOW into fold-const-call.cc Jakub Jelinek
2023-06-14 13:54           ` Richard Biener
2023-06-14 12:35     ` [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173] Richard Biener
2023-06-14 13:59       ` [PATCH] middle-end, i386, v3: " Jakub Jelinek
2023-06-14 14:28         ` Richard Biener
2023-06-14 14:34         ` Uros Bizjak [this message]
2023-06-14 14:56           ` Jakub Jelinek
2023-06-14 15:01             ` Uros Bizjak
2023-06-14 14:45         ` Uros Bizjak
2023-06-14 15:19           ` Jakub Jelinek
