From: Richard Biener <rguenther@suse.de>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173]
Date: Tue, 13 Jun 2023 08:40:36 +0000 (UTC) [thread overview]
Message-ID: <nycvar.YFH.7.77.849.2306130810310.4723@jbgna.fhfr.qr> (raw)
In-Reply-To: <ZH+oL657y6sfy08/@tucnak>
On Tue, 6 Jun 2023, Jakub Jelinek wrote:
> Hi!
>
> The following patch introduces {add,sub}c5_optab and pattern recognizes
> various forms of add with carry and subtract with carry/borrow, see
> pr79173-{1,2,3,4,5,6}.c tests on what is matched.
> Primarily forms with 2 __builtin_add_overflow or __builtin_sub_overflow
> calls per limb (with just one for the least significant one), for
> add with carry even when it is hand-written in C (for subtraction
> reassoc seems to change it too much so that the pattern recognition
> doesn't work). __builtin_{add,sub}_overflow are standardized in C23
> under ckd_{add,sub} names, so they are no longer a GNU-only extension.
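To make the recognized shape concrete, here is a minimal sketch (in the style of the pr79173 tests mentioned above, function and variable names are mine, not the testsuite's) of a two-limb add using two __builtin_add_overflow calls for the high limb, with the carries combined by '+':

```c
#include <assert.h>

/* Two-limb addition: one __builtin_add_overflow for the least
   significant limb, two for the next limb (value + carry in),
   with the carry out being the sum of the two overflow flags
   (at most one of them can be set).  */
void
add2 (unsigned long *r, const unsigned long *a, const unsigned long *b)
{
  unsigned long lo, hi1;
  unsigned long c1 = __builtin_add_overflow (a[0], b[0], &lo);
  unsigned long c2 = __builtin_add_overflow (a[1], b[1], &hi1);
  unsigned long c3 = __builtin_add_overflow (hi1, c1, &r[1]);
  r[0] = lo;
  /* Carry out of the high limb would be c2 + c3.  */
  (void) (c2 + c3);
}
```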
>
> Note, for these clang has (IMHO badly designed)
> __builtin_{add,sub}c{b,s,,l,ll} builtins which don't add/subtract just
> a single bit of carry, but basically add 3 unsigned values or
> subtract 2 unsigned values from one, and result in carry out of 0, 1, or 2
> because of that. If we wanted to introduce those for clang compatibility,
> we could and lower them early to just two __builtin_{add,sub}_overflow
> calls and let the pattern matching in this patch recognize it later.
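The lowering described above could look roughly like this in C terms. This is a hedged sketch, not clang's actual builtin: addc_lowered is a hypothetical reimplementation of the three-input add, expressed as exactly the two __builtin_add_overflow calls the pattern matching recognizes:

```c
#include <assert.h>

/* Hypothetical lowering of a clang-style three-input add with carry:
   two chained __builtin_add_overflow calls, carry out is the sum of
   the two overflow flags (0 or 1 when carry_in is 0 or 1).  */
unsigned long
addc_lowered (unsigned long x, unsigned long y, unsigned long carry_in,
              unsigned long *carry_out)
{
  unsigned long s, r;
  unsigned long c1 = __builtin_add_overflow (x, y, &s);
  unsigned long c2 = __builtin_add_overflow (s, carry_in, &r);
  *carry_out = c1 + c2;
  return r;
}
```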
>
> I've added expanders for this on ix86 and in addition to that
> added various peephole2s to make sure we get nice (and small) code
> for the common cases. I think there are other PRs which request that
> e.g. for the _{addcarry,subborrow}_u{32,64} intrinsics, which the patch
> also improves.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Would be nice if support for these optabs was added to many other targets,
> arm/aarch64 and powerpc* certainly have such instructions; I'd expect
> in fact that most targets do.
>
> The _BitInt support I'm working on will also need this to emit reasonable
> code.
>
> 2023-06-06 Jakub Jelinek <jakub@redhat.com>
>
> PR middle-end/79173
> * internal-fn.def (ADDC, SUBC): New internal functions.
> * internal-fn.cc (expand_ADDC, expand_SUBC): New functions.
> (commutative_ternary_fn_p): Return true also for IFN_ADDC.
> * optabs.def (addc5_optab, subc5_optab): New optabs.
> * tree-ssa-math-opts.cc (match_addc_subc): New function.
> (math_opts_dom_walker::after_dom_children): Call match_addc_subc
> for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
> other optimizations have been successful for those.
> * gimple-fold.cc (gimple_fold_call): Handle IFN_ADDC and IFN_SUBC.
> * gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
> * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
> * doc/md.texi (addc<mode>5, subc<mode>5): Document new named
> patterns.
> * config/i386/i386.md (subborrow<mode>): Add alternative with
> memory destination.
> (addc<mode>5, subc<mode>5): New define_expand patterns.
> (*sub<mode>_3, @add<mode>3_carry, addcarry<mode>, @sub<mode>3_carry,
> subborrow<mode>, *add<mode>3_cc_overflow_1): Add define_peephole2
> TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
> destination in these patterns.
>
> * gcc.target/i386/pr79173-1.c: New test.
> * gcc.target/i386/pr79173-2.c: New test.
> * gcc.target/i386/pr79173-3.c: New test.
> * gcc.target/i386/pr79173-4.c: New test.
> * gcc.target/i386/pr79173-5.c: New test.
> * gcc.target/i386/pr79173-6.c: New test.
> * gcc.target/i386/pr79173-7.c: New test.
> * gcc.target/i386/pr79173-8.c: New test.
> * gcc.target/i386/pr79173-9.c: New test.
> * gcc.target/i386/pr79173-10.c: New test.
>
> --- gcc/internal-fn.def.jj 2023-06-05 10:38:06.670333685 +0200
> +++ gcc/internal-fn.def 2023-06-05 11:40:50.672212265 +0200
> @@ -381,6 +381,8 @@ DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LE
> DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (ADDC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (SUBC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
> DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> --- gcc/internal-fn.cc.jj 2023-05-15 19:12:24.080780016 +0200
> +++ gcc/internal-fn.cc 2023-06-06 09:38:46.333871169 +0200
> @@ -2722,6 +2722,44 @@ expand_MUL_OVERFLOW (internal_fn, gcall
> expand_arith_overflow (MULT_EXPR, stmt);
> }
>
> +/* Expand ADDC STMT. */
> +
> +static void
> +expand_ADDC (internal_fn ifn, gcall *stmt)
> +{
> + tree lhs = gimple_call_lhs (stmt);
> + tree arg1 = gimple_call_arg (stmt, 0);
> + tree arg2 = gimple_call_arg (stmt, 1);
> + tree arg3 = gimple_call_arg (stmt, 2);
> + tree type = TREE_TYPE (arg1);
> + machine_mode mode = TYPE_MODE (type);
> + insn_code icode = optab_handler (ifn == IFN_ADDC
> + ? addc5_optab : subc5_optab, mode);
> + rtx op1 = expand_normal (arg1);
> + rtx op2 = expand_normal (arg2);
> + rtx op3 = expand_normal (arg3);
> + rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> + rtx re = gen_reg_rtx (mode);
> + rtx im = gen_reg_rtx (mode);
> + class expand_operand ops[5];
> + create_output_operand (&ops[0], re, mode);
> + create_output_operand (&ops[1], im, mode);
> + create_input_operand (&ops[2], op1, mode);
> + create_input_operand (&ops[3], op2, mode);
> + create_input_operand (&ops[4], op3, mode);
> + expand_insn (icode, 5, ops);
> + write_complex_part (target, re, false, false);
> + write_complex_part (target, im, true, false);
> +}
> +
> +/* Expand SUBC STMT. */
> +
> +static void
> +expand_SUBC (internal_fn ifn, gcall *stmt)
> +{
> + expand_ADDC (ifn, stmt);
> +}
> +
> /* This should get folded in tree-vectorizer.cc. */
>
> static void
> @@ -3990,6 +4028,7 @@ commutative_ternary_fn_p (internal_fn fn
> case IFN_FMS:
> case IFN_FNMA:
> case IFN_FNMS:
> + case IFN_ADDC:
> return true;
>
> default:
> --- gcc/optabs.def.jj 2023-01-02 09:32:43.984973197 +0100
> +++ gcc/optabs.def 2023-06-05 19:03:33.858210753 +0200
> @@ -260,6 +260,8 @@ OPTAB_D (uaddv4_optab, "uaddv$I$a4")
> OPTAB_D (usubv4_optab, "usubv$I$a4")
> OPTAB_D (umulv4_optab, "umulv$I$a4")
> OPTAB_D (negv3_optab, "negv$I$a3")
> +OPTAB_D (addc5_optab, "addc$I$a5")
> +OPTAB_D (subc5_optab, "subc$I$a5")
> OPTAB_D (addptr3_optab, "addptr$a3")
> OPTAB_D (spaceship_optab, "spaceship$a3")
>
> --- gcc/tree-ssa-math-opts.cc.jj 2023-05-19 12:58:25.246844019 +0200
> +++ gcc/tree-ssa-math-opts.cc 2023-06-06 17:22:24.833455259 +0200
> @@ -4441,6 +4441,438 @@ match_arith_overflow (gimple_stmt_iterat
> return false;
> }
>
> +/* Try to match e.g.
> + _29 = .ADD_OVERFLOW (_3, _4);
> + _30 = REALPART_EXPR <_29>;
> + _31 = IMAGPART_EXPR <_29>;
> + _32 = .ADD_OVERFLOW (_30, _38);
> + _33 = REALPART_EXPR <_32>;
> + _34 = IMAGPART_EXPR <_32>;
> + _35 = _31 + _34;
> + as
> + _36 = .ADDC (_3, _4, _38);
> + _33 = REALPART_EXPR <_36>;
> + _35 = IMAGPART_EXPR <_36>;
> + or
> + _22 = .SUB_OVERFLOW (_6, _5);
> + _23 = REALPART_EXPR <_22>;
> + _24 = IMAGPART_EXPR <_22>;
> + _25 = .SUB_OVERFLOW (_23, _37);
> + _26 = REALPART_EXPR <_25>;
> + _27 = IMAGPART_EXPR <_25>;
> + _28 = _24 | _27;
> + as
> + _29 = .SUBC (_6, _5, _37);
> + _26 = REALPART_EXPR <_29>;
> +       _28 = IMAGPART_EXPR <_29>;
> + provided _38 or _37 above have [0, 1] range
> + and _3, _4 and _30 or _6, _5 and _23 are unsigned
> + integral types with the same precision. Whether + or | or ^ is
> + used on the IMAGPART_EXPR results doesn't matter, with one of
> + added or subtracted operands in [0, 1] range at most one
> + .ADD_OVERFLOW or .SUB_OVERFLOW will indicate overflow. */
> +
> +static bool
> +match_addc_subc (gimple_stmt_iterator *gsi, gimple *stmt, tree_code code)
> +{
> + tree rhs[4];
> + rhs[0] = gimple_assign_rhs1 (stmt);
> + rhs[1] = gimple_assign_rhs2 (stmt);
> + rhs[2] = NULL_TREE;
> + rhs[3] = NULL_TREE;
> + tree type = TREE_TYPE (rhs[0]);
> + if (!INTEGRAL_TYPE_P (type) || !TYPE_UNSIGNED (type))
> + return false;
> +
> + if (code != BIT_IOR_EXPR && code != BIT_XOR_EXPR)
> + {
> + /* If overflow flag is ignored on the MSB limb, we can end up with
> + the most significant limb handled as r = op1 + op2 + ovf1 + ovf2;
> + or r = op1 - op2 - ovf1 - ovf2; or various equivalent expressions
> + thereof. Handle those like the ovf = ovf1 + ovf2; case to recognize
> + the limb below the MSB, but also create another .ADDC/.SUBC call for
> + the last limb. */
I suspect re-association can wreck things even more here.  I have
to say the matching code is very hard to follow - would it help to
split out a helper function matching
 _22 = .{ADD,SUB}_OVERFLOW (_6, _5);
 _23 = REALPART_EXPR <_22>;
 _24 = IMAGPART_EXPR <_22>;
starting from _23 and _24?
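For reference, the "overflow ignored on the MSB limb" shape the comment above describes corresponds to C along these lines (hypothetical three-limb code, names are mine):

```c
#include <assert.h>

/* Three-limb addition where the top limb just folds both carries in
   as r = op1 + op2 + ovf1 + ovf2 and drops its own overflow.  */
void
add3 (unsigned long *r, const unsigned long *a, const unsigned long *b)
{
  unsigned long s;
  unsigned long c1 = __builtin_add_overflow (a[0], b[0], &r[0]);
  unsigned long c2 = __builtin_add_overflow (a[1], b[1], &s);
  unsigned long c3 = __builtin_add_overflow (s, c1, &r[1]);
  r[2] = a[2] + b[2] + c2 + c3;   /* MSB limb: overflow ignored */
}
```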
> + while (TREE_CODE (rhs[0]) == SSA_NAME && !rhs[3])
> + {
> + gimple *g = SSA_NAME_DEF_STMT (rhs[0]);
> + if (has_single_use (rhs[0])
> + && is_gimple_assign (g)
> + && (gimple_assign_rhs_code (g) == code
> + || (code == MINUS_EXPR
> + && gimple_assign_rhs_code (g) == PLUS_EXPR
> + && TREE_CODE (gimple_assign_rhs2 (g)) == INTEGER_CST)))
> + {
> + rhs[0] = gimple_assign_rhs1 (g);
> + tree &r = rhs[2] ? rhs[3] : rhs[2];
> + r = gimple_assign_rhs2 (g);
> + if (gimple_assign_rhs_code (g) != code)
> + r = fold_build1 (NEGATE_EXPR, TREE_TYPE (r), r);
Can you use const_unop here? In fact both will not reliably
negate all constants (ick), so maybe we want a force_const_negate ()?
> + }
> + else
> + break;
> + }
> + while (TREE_CODE (rhs[1]) == SSA_NAME && !rhs[3])
> + {
> + gimple *g = SSA_NAME_DEF_STMT (rhs[1]);
> + if (has_single_use (rhs[1])
> + && is_gimple_assign (g)
> + && gimple_assign_rhs_code (g) == PLUS_EXPR)
> + {
> + rhs[1] = gimple_assign_rhs1 (g);
> + if (rhs[2])
> + rhs[3] = gimple_assign_rhs2 (g);
> + else
> + rhs[2] = gimple_assign_rhs2 (g);
> + }
> + else
> + break;
> + }
> + if (rhs[2] && !rhs[3])
> + {
> + for (int i = (code == MINUS_EXPR ? 1 : 0); i < 3; ++i)
> + if (TREE_CODE (rhs[i]) == SSA_NAME)
> + {
> + gimple *im = SSA_NAME_DEF_STMT (rhs[i]);
> + if (gimple_assign_cast_p (im))
> + {
> + tree op = gimple_assign_rhs1 (im);
> + if (TREE_CODE (op) == SSA_NAME
> + && INTEGRAL_TYPE_P (TREE_TYPE (op))
> + && (TYPE_PRECISION (TREE_TYPE (op)) > 1
> + || TYPE_UNSIGNED (TREE_TYPE (op)))
> + && has_single_use (rhs[i]))
> + im = SSA_NAME_DEF_STMT (op);
> + }
> + if (is_gimple_assign (im)
> + && gimple_assign_rhs_code (im) == NE_EXPR
> + && integer_zerop (gimple_assign_rhs2 (im))
> + && TREE_CODE (gimple_assign_rhs1 (im)) == SSA_NAME
> + && has_single_use (gimple_assign_lhs (im)))
> + im = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (im));
> + if (is_gimple_assign (im)
> + && gimple_assign_rhs_code (im) == IMAGPART_EXPR
> + && (TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (im), 0))
> + == SSA_NAME))
> + {
> + tree rhs1 = gimple_assign_rhs1 (im);
> + gimple *ovf = SSA_NAME_DEF_STMT (TREE_OPERAND (rhs1, 0));
> + if (gimple_call_internal_p (ovf, code == PLUS_EXPR
> + ? IFN_ADDC : IFN_SUBC)
> + && (optab_handler (code == PLUS_EXPR
> + ? addc5_optab : subc5_optab,
> + TYPE_MODE (type))
> + != CODE_FOR_nothing))
> + {
> + if (i != 2)
> + std::swap (rhs[i], rhs[2]);
> + gimple *g
> + = gimple_build_call_internal (code == PLUS_EXPR
> + ? IFN_ADDC : IFN_SUBC,
> + 3, rhs[0], rhs[1],
> + rhs[2]);
> + tree nlhs = make_ssa_name (build_complex_type (type));
> + gimple_call_set_lhs (g, nlhs);
> + gsi_insert_before (gsi, g, GSI_SAME_STMT);
> + tree ilhs = gimple_assign_lhs (stmt);
> + g = gimple_build_assign (ilhs, REALPART_EXPR,
> + build1 (REALPART_EXPR,
> + TREE_TYPE (ilhs),
> + nlhs));
> + gsi_replace (gsi, g, true);
> + return true;
> + }
> + }
> + }
> + return false;
> + }
> + if (code == MINUS_EXPR && !rhs[2])
> + return false;
> + if (code == MINUS_EXPR)
> + /* Code below expects rhs[0] and rhs[1] to have the IMAGPART_EXPRs.
> + So, for MINUS_EXPR swap the single added rhs operand (others are
> + subtracted) to rhs[3]. */
> + std::swap (rhs[0], rhs[3]);
> + }
> + gimple *im1 = NULL, *im2 = NULL;
> + for (int i = 0; i < (code == MINUS_EXPR ? 3 : 4); i++)
> + if (rhs[i] && TREE_CODE (rhs[i]) == SSA_NAME)
> + {
> + gimple *im = SSA_NAME_DEF_STMT (rhs[i]);
> + if (gimple_assign_cast_p (im))
> + {
> + tree op = gimple_assign_rhs1 (im);
> + if (TREE_CODE (op) == SSA_NAME
> + && INTEGRAL_TYPE_P (TREE_TYPE (op))
> + && (TYPE_PRECISION (TREE_TYPE (op)) > 1
> + || TYPE_UNSIGNED (TREE_TYPE (op)))
> + && has_single_use (rhs[i]))
> + im = SSA_NAME_DEF_STMT (op);
> + }
> + if (is_gimple_assign (im)
> + && gimple_assign_rhs_code (im) == NE_EXPR
> + && integer_zerop (gimple_assign_rhs2 (im))
> + && TREE_CODE (gimple_assign_rhs1 (im)) == SSA_NAME
> + && has_single_use (gimple_assign_lhs (im)))
> + im = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (im));
> + if (is_gimple_assign (im)
> + && gimple_assign_rhs_code (im) == IMAGPART_EXPR
> + && TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (im), 0)) == SSA_NAME)
> + {
> + if (im1 == NULL)
> + {
> + im1 = im;
> + if (i != 0)
> + std::swap (rhs[0], rhs[i]);
> + }
> + else
> + {
> + im2 = im;
> + if (i != 1)
> + std::swap (rhs[1], rhs[i]);
> + break;
> + }
> + }
> + }
> + if (!im2)
> + return false;
> + gimple *ovf1
> + = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im1), 0));
> + gimple *ovf2
> + = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im2), 0));
> + internal_fn ifn;
> + if (!is_gimple_call (ovf1)
> + || !gimple_call_internal_p (ovf1)
> + || ((ifn = gimple_call_internal_fn (ovf1)) != IFN_ADD_OVERFLOW
> + && ifn != IFN_SUB_OVERFLOW)
> + || !gimple_call_internal_p (ovf2, ifn)
> + || optab_handler (ifn == IFN_ADD_OVERFLOW ? addc5_optab : subc5_optab,
> + TYPE_MODE (type)) == CODE_FOR_nothing
> + || (rhs[2]
> + && optab_handler (code == PLUS_EXPR ? addc5_optab : subc5_optab,
> + TYPE_MODE (type)) == CODE_FOR_nothing))
> + return false;
> + tree arg1, arg2, arg3 = NULL_TREE;
> + gimple *re1 = NULL, *re2 = NULL;
> + for (int i = (ifn == IFN_ADD_OVERFLOW ? 1 : 0); i >= 0; --i)
> + for (gimple *ovf = ovf1; ovf; ovf = (ovf == ovf1 ? ovf2 : NULL))
> + {
> + tree arg = gimple_call_arg (ovf, i);
> + if (TREE_CODE (arg) != SSA_NAME)
> + continue;
> + re1 = SSA_NAME_DEF_STMT (arg);
> + if (is_gimple_assign (re1)
> + && gimple_assign_rhs_code (re1) == REALPART_EXPR
> + && (TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (re1), 0))
> + == SSA_NAME)
> + && (SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (re1), 0))
> + == (ovf == ovf1 ? ovf2 : ovf1)))
> + {
> + if (ovf == ovf1)
> + {
> + std::swap (rhs[0], rhs[1]);
> + std::swap (im1, im2);
> + std::swap (ovf1, ovf2);
> + }
> + arg3 = gimple_call_arg (ovf, 1 - i);
> + i = -1;
> + break;
> + }
> + }
> + if (!arg3)
> + return false;
> + arg1 = gimple_call_arg (ovf1, 0);
> + arg2 = gimple_call_arg (ovf1, 1);
> + if (!types_compatible_p (type, TREE_TYPE (arg1)))
> + return false;
> + int kind[2] = { 0, 0 };
> + /* At least one of arg2 and arg3 should have type compatible
> + with arg1/rhs[0], and the other one should have value in [0, 1]
> + range. */
> + for (int i = 0; i < 2; ++i)
> + {
> + tree arg = i == 0 ? arg2 : arg3;
> + if (types_compatible_p (type, TREE_TYPE (arg)))
> + kind[i] = 1;
> + if (!INTEGRAL_TYPE_P (TREE_TYPE (arg))
> + || (TYPE_PRECISION (TREE_TYPE (arg)) == 1
> + && !TYPE_UNSIGNED (TREE_TYPE (arg))))
> + continue;
> + if (tree_zero_one_valued_p (arg))
> + kind[i] |= 2;
> + if (TREE_CODE (arg) == SSA_NAME)
> + {
> + gimple *g = SSA_NAME_DEF_STMT (arg);
> + if (gimple_assign_cast_p (g))
> + {
> + tree op = gimple_assign_rhs1 (g);
> + if (TREE_CODE (op) == SSA_NAME
> + && INTEGRAL_TYPE_P (TREE_TYPE (op)))
> + g = SSA_NAME_DEF_STMT (op);
> + }
> + if (is_gimple_assign (g)
> + && gimple_assign_rhs_code (g) == NE_EXPR
> + && integer_zerop (gimple_assign_rhs2 (g))
> + && TREE_CODE (gimple_assign_rhs1 (g)) == SSA_NAME)
> + g = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (g));
> + if (!is_gimple_assign (g)
> + || gimple_assign_rhs_code (g) != IMAGPART_EXPR
> + || (TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (g), 0))
> + != SSA_NAME))
> + continue;
> + g = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (g), 0));
> + if (!is_gimple_call (g) || !gimple_call_internal_p (g))
> + continue;
> + switch (gimple_call_internal_fn (g))
> + {
> + case IFN_ADD_OVERFLOW:
> + case IFN_SUB_OVERFLOW:
> + case IFN_ADDC:
> + case IFN_SUBC:
> + break;
> + default:
> + continue;
> + }
> + kind[i] |= 4;
> + }
> + }
> + /* Make arg2 the one with compatible type and arg3 the one
> +     with [0, 1] range.  If both are true for both operands,
> + prefer as arg3 result of __imag__ of some ifn. */
> + if ((kind[0] & 1) == 0 || ((kind[1] & 1) != 0 && kind[0] > kind[1]))
> + {
> + std::swap (arg2, arg3);
> + std::swap (kind[0], kind[1]);
> + }
> + if ((kind[0] & 1) == 0 || (kind[1] & 6) == 0)
> + return false;
> + if (!has_single_use (gimple_assign_lhs (im1))
> + || !has_single_use (gimple_assign_lhs (im2))
> + || !has_single_use (gimple_assign_lhs (re1))
> + || num_imm_uses (gimple_call_lhs (ovf1)) != 2)
> + return false;
> + use_operand_p use_p;
> + imm_use_iterator iter;
> + tree lhs = gimple_call_lhs (ovf2);
> + FOR_EACH_IMM_USE_FAST (use_p, iter, lhs)
> + {
> + gimple *use_stmt = USE_STMT (use_p);
> + if (is_gimple_debug (use_stmt))
> + continue;
> + if (use_stmt == im2)
> + continue;
> + if (re2)
> + return false;
> +      if (!is_gimple_assign (use_stmt)
> +	  || gimple_assign_rhs_code (use_stmt) != REALPART_EXPR)
> + return false;
> + re2 = use_stmt;
> + }
> + gimple_stmt_iterator gsi2 = gsi_for_stmt (ovf2);
> + gimple *g;
> + if ((kind[1] & 1) == 0)
> + {
> + if (TREE_CODE (arg3) == INTEGER_CST)
> + arg3 = fold_convert (type, arg3);
> + else
> + {
> + g = gimple_build_assign (make_ssa_name (type), NOP_EXPR, arg3);
> + gsi_insert_before (&gsi2, g, GSI_SAME_STMT);
> + arg3 = gimple_assign_lhs (g);
> + }
> + }
> + g = gimple_build_call_internal (ifn == IFN_ADD_OVERFLOW
> + ? IFN_ADDC : IFN_SUBC, 3, arg1, arg2, arg3);
> + tree nlhs = make_ssa_name (TREE_TYPE (lhs));
> + gimple_call_set_lhs (g, nlhs);
> + gsi_insert_before (&gsi2, g, GSI_SAME_STMT);
> + tree ilhs = rhs[2] ? make_ssa_name (type) : gimple_assign_lhs (stmt);
> + g = gimple_build_assign (ilhs, IMAGPART_EXPR,
> + build1 (IMAGPART_EXPR, TREE_TYPE (ilhs), nlhs));
> + if (rhs[2])
> + gsi_insert_before (gsi, g, GSI_SAME_STMT);
> + else
> + gsi_replace (gsi, g, true);
> + tree rhs1 = rhs[1];
> + for (int i = 0; i < 2; i++)
> + if (rhs1 == gimple_assign_lhs (im2))
> + break;
> + else
> + {
> + g = SSA_NAME_DEF_STMT (rhs1);
> + rhs1 = gimple_assign_rhs1 (g);
> + gsi2 = gsi_for_stmt (g);
> + gsi_remove (&gsi2, true);
> + }
> + gcc_checking_assert (rhs1 == gimple_assign_lhs (im2));
> + gsi2 = gsi_for_stmt (im2);
> + gsi_remove (&gsi2, true);
> + gsi2 = gsi_for_stmt (re2);
> + tree rlhs = gimple_assign_lhs (re2);
> + g = gimple_build_assign (rlhs, REALPART_EXPR,
> + build1 (REALPART_EXPR, TREE_TYPE (rlhs), nlhs));
> + gsi_replace (&gsi2, g, true);
> + if (rhs[2])
> + {
> + g = gimple_build_call_internal (code == PLUS_EXPR ? IFN_ADDC : IFN_SUBC,
> + 3, rhs[3], rhs[2], ilhs);
> + nlhs = make_ssa_name (TREE_TYPE (lhs));
> + gimple_call_set_lhs (g, nlhs);
> + gsi_insert_before (gsi, g, GSI_SAME_STMT);
> + ilhs = gimple_assign_lhs (stmt);
> + g = gimple_build_assign (ilhs, REALPART_EXPR,
> + build1 (REALPART_EXPR, TREE_TYPE (ilhs), nlhs));
> + gsi_replace (gsi, g, true);
> + }
> + if (TREE_CODE (arg3) == SSA_NAME)
> + {
> + gimple *im3 = SSA_NAME_DEF_STMT (arg3);
> + for (int i = 0; gimple_assign_cast_p (im3) && i < 2; ++i)
> + {
> + tree op = gimple_assign_rhs1 (im3);
> + if (TREE_CODE (op) == SSA_NAME
> + && INTEGRAL_TYPE_P (TREE_TYPE (op))
> + && (TYPE_PRECISION (TREE_TYPE (op)) > 1
> + || TYPE_UNSIGNED (TREE_TYPE (op))))
> + im3 = SSA_NAME_DEF_STMT (op);
> + else
> + break;
> + }
> + if (is_gimple_assign (im3)
> + && gimple_assign_rhs_code (im3) == NE_EXPR
> + && integer_zerop (gimple_assign_rhs2 (im3))
> + && TREE_CODE (gimple_assign_rhs1 (im3)) == SSA_NAME
> + && has_single_use (gimple_assign_lhs (im3)))
> + im3 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (im3));
> + if (is_gimple_assign (im3)
> + && gimple_assign_rhs_code (im3) == IMAGPART_EXPR
> + && (TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (im3), 0))
> + == SSA_NAME))
> + {
> + gimple *ovf3
> + = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im3), 0));
> + if (gimple_call_internal_p (ovf3, ifn))
> + {
> + lhs = gimple_call_lhs (ovf3);
> + arg1 = gimple_call_arg (ovf3, 0);
> + arg2 = gimple_call_arg (ovf3, 1);
> + if (types_compatible_p (type, TREE_TYPE (TREE_TYPE (lhs)))
> + && types_compatible_p (type, TREE_TYPE (arg1))
> + && types_compatible_p (type, TREE_TYPE (arg2)))
> + {
> + g = gimple_build_call_internal (ifn == IFN_ADD_OVERFLOW
> + ? IFN_ADDC : IFN_SUBC,
> + 3, arg1, arg2,
> + build_zero_cst (type));
> + gimple_call_set_lhs (g, lhs);
> + gsi2 = gsi_for_stmt (ovf3);
> + gsi_replace (&gsi2, g, true);
> + }
> + }
> + }
> + }
> + return true;
> +}
> +
> /* Return true if target has support for divmod. */
>
> static bool
> @@ -5068,8 +5500,9 @@ math_opts_dom_walker::after_dom_children
>
> case PLUS_EXPR:
> case MINUS_EXPR:
> - if (!convert_plusminus_to_widen (&gsi, stmt, code))
> - match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p);
> + if (!convert_plusminus_to_widen (&gsi, stmt, code)
> + && !match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p))
> + match_addc_subc (&gsi, stmt, code);
> break;
>
> case BIT_NOT_EXPR:
> @@ -5085,6 +5518,11 @@ math_opts_dom_walker::after_dom_children
> convert_mult_to_highpart (as_a<gassign *> (stmt), &gsi);
> break;
>
> + case BIT_IOR_EXPR:
> + case BIT_XOR_EXPR:
> + match_addc_subc (&gsi, stmt, code);
> + break;
> +
> default:;
> }
> }
> --- gcc/gimple-fold.cc.jj 2023-05-01 09:59:46.434297471 +0200
> +++ gcc/gimple-fold.cc 2023-06-06 13:35:15.463010972 +0200
> @@ -5585,6 +5585,7 @@ gimple_fold_call (gimple_stmt_iterator *
> enum tree_code subcode = ERROR_MARK;
> tree result = NULL_TREE;
> bool cplx_result = false;
> + bool addc_subc = false;
> tree overflow = NULL_TREE;
> switch (gimple_call_internal_fn (stmt))
> {
> @@ -5658,6 +5659,16 @@ gimple_fold_call (gimple_stmt_iterator *
> subcode = MULT_EXPR;
> cplx_result = true;
> break;
> + case IFN_ADDC:
> + subcode = PLUS_EXPR;
> + cplx_result = true;
> + addc_subc = true;
> + break;
> + case IFN_SUBC:
> + subcode = MINUS_EXPR;
> + cplx_result = true;
> + addc_subc = true;
> + break;
> case IFN_MASK_LOAD:
> changed |= gimple_fold_partial_load (gsi, stmt, true);
> break;
> @@ -5677,6 +5688,7 @@ gimple_fold_call (gimple_stmt_iterator *
> {
> tree arg0 = gimple_call_arg (stmt, 0);
> tree arg1 = gimple_call_arg (stmt, 1);
> + tree arg2 = NULL_TREE;
> tree type = TREE_TYPE (arg0);
> if (cplx_result)
> {
> @@ -5685,9 +5697,26 @@ gimple_fold_call (gimple_stmt_iterator *
> type = NULL_TREE;
> else
> type = TREE_TYPE (TREE_TYPE (lhs));
> + if (addc_subc)
> + arg2 = gimple_call_arg (stmt, 2);
> }
> if (type == NULL_TREE)
> ;
> + else if (addc_subc)
> + {
> + if (!integer_zerop (arg2))
> + ;
> + /* x = y + 0 + 0; x = y - 0 - 0; */
> + else if (integer_zerop (arg1))
> + result = arg0;
> + /* x = 0 + y + 0; */
> + else if (subcode != MINUS_EXPR && integer_zerop (arg0))
> + result = arg1;
> + /* x = y - y - 0; */
> + else if (subcode == MINUS_EXPR
> + && operand_equal_p (arg0, arg1, 0))
> + result = integer_zero_node;
> + }
So this all performs simplifications but also constant folding. In
particular the match.pd re-simplification will invoke fold_const_call
on all-constant-argument calls, but it does no extra folding for
partially constant args and instead relies on the patterns here.
Can you add all-constant arg handling to fold_const_call and
consider moving cases like y + 0 + 0 to match.pd?
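To state the identities being folded above in plain C terms, here is a trivial reference model of the SUBC semantics (a sketch of my own, not the internal function): y - 0 - 0 folds to y, and y - y - 0 folds to 0, in both cases without borrow.

```c
#include <assert.h>

/* Reference model of .SUBC: subtract y and a borrow in [0, 1] from x,
   borrow out is the sum of the two underflow flags.  */
unsigned int
ref_subc (unsigned int x, unsigned int y, unsigned int bin,
          unsigned int *bout)
{
  unsigned int d, r;
  unsigned int b1 = __builtin_sub_overflow (x, y, &d);
  unsigned int b2 = __builtin_sub_overflow (d, bin, &r);
  *bout = b1 + b2;
  return r;
}
```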
> /* x = y + 0; x = y - 0; x = y * 0; */
> else if (integer_zerop (arg1))
> result = subcode == MULT_EXPR ? integer_zero_node : arg0;
> @@ -5702,8 +5731,11 @@ gimple_fold_call (gimple_stmt_iterator *
> result = arg0;
> else if (subcode == MULT_EXPR && integer_onep (arg0))
> result = arg1;
> - else if (TREE_CODE (arg0) == INTEGER_CST
> - && TREE_CODE (arg1) == INTEGER_CST)
> + if (type
> + && result == NULL_TREE
> + && TREE_CODE (arg0) == INTEGER_CST
> + && TREE_CODE (arg1) == INTEGER_CST
> + && (!addc_subc || TREE_CODE (arg2) == INTEGER_CST))
> {
> if (cplx_result)
> result = int_const_binop (subcode, fold_convert (type, arg0),
> @@ -5717,6 +5749,15 @@ gimple_fold_call (gimple_stmt_iterator *
> else
> result = NULL_TREE;
> }
> + if (addc_subc && result)
> + {
> + tree r = int_const_binop (subcode, result,
> + fold_convert (type, arg2));
> + if (r == NULL_TREE)
> + result = NULL_TREE;
> + else if (arith_overflowed_p (subcode, type, result, arg2))
> + overflow = build_one_cst (type);
> + }
> }
> if (result)
> {
> --- gcc/gimple-range-fold.cc.jj 2023-05-25 09:42:28.034696783 +0200
> +++ gcc/gimple-range-fold.cc 2023-06-06 09:41:06.716896505 +0200
> @@ -489,6 +489,8 @@ adjust_imagpart_expr (vrange &res, const
> case IFN_ADD_OVERFLOW:
> case IFN_SUB_OVERFLOW:
> case IFN_MUL_OVERFLOW:
> + case IFN_ADDC:
> + case IFN_SUBC:
> case IFN_ATOMIC_COMPARE_EXCHANGE:
> {
> int_range<2> r;
> --- gcc/tree-ssa-dce.cc.jj 2023-05-15 19:12:35.012626408 +0200
> +++ gcc/tree-ssa-dce.cc 2023-06-06 13:35:30.271802380 +0200
> @@ -1481,6 +1481,14 @@ eliminate_unnecessary_stmts (bool aggres
> case IFN_MUL_OVERFLOW:
> maybe_optimize_arith_overflow (&gsi, MULT_EXPR);
> break;
> + case IFN_ADDC:
> + if (integer_zerop (gimple_call_arg (stmt, 2)))
> + maybe_optimize_arith_overflow (&gsi, PLUS_EXPR);
> + break;
> + case IFN_SUBC:
> + if (integer_zerop (gimple_call_arg (stmt, 2)))
> + maybe_optimize_arith_overflow (&gsi, MINUS_EXPR);
> + break;
> default:
> break;
> }
> --- gcc/doc/md.texi.jj 2023-05-25 09:42:28.009697144 +0200
> +++ gcc/doc/md.texi 2023-06-06 13:33:56.565122304 +0200
> @@ -5202,6 +5202,22 @@ is taken only on unsigned overflow.
> @item @samp{usubv@var{m}4}, @samp{umulv@var{m}4}
> Similar, for other unsigned arithmetic operations.
>
> +@cindex @code{addc@var{m}5} instruction pattern
> +@item @samp{addc@var{m}5}
> +Adds operands 2, 3 and 4 (where the last operand is guaranteed to have
> +only values 0 or 1) together, sets operand 0 to the result of the
> +addition of the 3 operands and sets operand 1 to 1 iff there was no
> +overflow on the unsigned additions, and to 0 otherwise. So, it is
> +an addition with carry in (operand 4) and carry out (operand 1).
> +All operands have the same mode.
operand 1 set to 1 for no overflow sounds weird when specifying it
as carry out - can you double check?
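For what it's worth, a minimal C check of the carry/overflow relation (carry_out is a hypothetical helper name): for unsigned addition the carry out is 1 exactly when the addition overflows, which is why "set to 1 iff there was no overflow" reads inverted.

```c
#include <assert.h>

/* Returns the carry out of an unsigned addition: 1 iff the
   addition wraps, 0 otherwise.  */
unsigned int
carry_out (unsigned int x, unsigned int y)
{
  unsigned int s;
  return __builtin_add_overflow (x, y, &s);
}
```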
> +
> +@cindex @code{subc@var{m}5} instruction pattern
> +@item @samp{subc@var{m}5}
> +Similarly to @samp{addc@var{m}5}, except subtracts operands 3 and 4
> +from operand 2 instead of adding them. So, it is
> +a subtraction with carry/borrow in (operand 4) and carry/borrow out
> +(operand 1). All operands have the same mode.
> +
I wonder if we want to name them uaddc and usubc? Or is this supposed
to be simply the twos-complement "carry"? I think the docs should
say so then (note we do have uaddv and addv).
Otherwise the middle-end parts look reasonable - as mentioned, the
pattern matching borders on unmaintainable (guess
we have other bits in forwprop in similar category though).
I'll obviously leave the x86 patterns to Uros.
Thanks,
Richard.
> @cindex @code{addptr@var{m}3} instruction pattern
> @item @samp{addptr@var{m}3}
> Like @code{add@var{m}3} but is guaranteed to only be used for address
> --- gcc/config/i386/i386.md.jj 2023-05-11 11:54:42.906956432 +0200
> +++ gcc/config/i386/i386.md 2023-06-06 16:27:38.300455824 +0200
> @@ -7685,6 +7685,25 @@ (define_peephole2
> [(set (reg:CC FLAGS_REG)
> (compare:CC (match_dup 0) (match_dup 1)))])
>
> +(define_peephole2
> + [(set (match_operand:SWI 0 "general_reg_operand")
> + (match_operand:SWI 1 "memory_operand"))
> + (parallel [(set (reg:CC FLAGS_REG)
> + (compare:CC (match_dup 0)
> + (match_operand:SWI 2 "memory_operand")))
> + (set (match_dup 0)
> + (minus:SWI (match_dup 0) (match_dup 2)))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (reg:CC FLAGS_REG)
> + (compare:CC (match_dup 1) (match_dup 0)))
> + (set (match_dup 1)
> + (minus:SWI (match_dup 1) (match_dup 0)))])])
> +
> ;; decl %eax; cmpl $-1, %eax; jne .Lxx; can be optimized into
> ;; subl $1, %eax; jnc .Lxx;
> (define_peephole2
> @@ -7770,6 +7789,59 @@ (define_insn "@add<mode>3_carry"
> (set_attr "pent_pair" "pu")
> (set_attr "mode" "<MODE>")])
>
> +(define_peephole2
> + [(set (match_operand:SWI 0 "general_reg_operand")
> + (match_operand:SWI 1 "memory_operand"))
> + (parallel [(set (match_dup 0)
> + (plus:SWI
> + (plus:SWI
> + (match_operator:SWI 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand")
> + (const_int 0)])
> + (match_dup 0))
> + (match_operand:SWI 2 "memory_operand")))
> + (clobber (reg:CC FLAGS_REG))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (match_dup 1)
> + (plus:SWI (plus:SWI (match_op_dup 4
> + [(match_dup 3) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))
> + (clobber (reg:CC FLAGS_REG))])])
> +
> +(define_peephole2
> + [(set (match_operand:SWI 0 "general_reg_operand")
> + (match_operand:SWI 1 "memory_operand"))
> + (parallel [(set (match_dup 0)
> + (plus:SWI
> + (plus:SWI
> + (match_operator:SWI 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand")
> + (const_int 0)])
> + (match_dup 0))
> + (match_operand:SWI 2 "memory_operand")))
> + (clobber (reg:CC FLAGS_REG))])
> + (set (match_operand:SWI 5 "general_reg_operand") (match_dup 0))
> + (set (match_dup 1) (match_dup 5))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && peep2_reg_dead_p (4, operands[5])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])
> + && !reg_overlap_mentioned_p (operands[5], operands[1])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (match_dup 1)
> + (plus:SWI (plus:SWI (match_op_dup 4
> + [(match_dup 3) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))
> + (clobber (reg:CC FLAGS_REG))])])
> +
> (define_insn "*add<mode>3_carry_0"
> [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
> (plus:SWI
> @@ -7870,6 +7942,159 @@ (define_insn "addcarry<mode>"
> (set_attr "pent_pair" "pu")
> (set_attr "mode" "<MODE>")])
>
> +;; Helper peephole2 for the addcarry<mode> and subborrow<mode>
> +;; peephole2s, to optimize away nop which resulted from addc/subc
> +;; expansion optimization.
> +(define_peephole2
> + [(set (match_operand:SWI48 0 "general_reg_operand")
> + (match_operand:SWI48 1 "memory_operand"))
> + (const_int 0)]
> + ""
> + [(set (match_dup 0) (match_dup 1))])
> +
> +(define_peephole2
> + [(parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (plus:SWI48
> + (plus:SWI48
> + (match_operator:SWI48 4 "ix86_carry_flag_operator"
> + [(match_operand 2 "flags_reg_operand")
> + (const_int 0)])
> + (match_operand:SWI48 0 "general_reg_operand"))
> + (match_operand:SWI48 1 "memory_operand")))
> + (plus:<DWI>
> + (zero_extend:<DWI> (match_dup 1))
> + (match_operator:<DWI> 3 "ix86_carry_flag_operator"
> + [(match_dup 2) (const_int 0)]))))
> + (set (match_dup 0)
> + (plus:SWI48 (plus:SWI48 (match_op_dup 4
> + [(match_dup 2) (const_int 0)])
> + (match_dup 0))
> + (match_dup 1)))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (2, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])"
> + [(parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (plus:SWI48
> + (plus:SWI48
> + (match_op_dup 4
> + [(match_dup 2) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))
> + (plus:<DWI>
> + (zero_extend:<DWI> (match_dup 0))
> + (match_op_dup 3
> + [(match_dup 2) (const_int 0)]))))
> + (set (match_dup 1)
> + (plus:SWI48 (plus:SWI48 (match_op_dup 4
> + [(match_dup 2) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))])])
> +
> +(define_peephole2
> + [(set (match_operand:SWI48 0 "general_reg_operand")
> + (match_operand:SWI48 1 "memory_operand"))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (plus:SWI48
> + (plus:SWI48
> + (match_operator:SWI48 5 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand")
> + (const_int 0)])
> + (match_dup 0))
> + (match_operand:SWI48 2 "memory_operand")))
> + (plus:<DWI>
> + (zero_extend:<DWI> (match_dup 2))
> + (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> + [(match_dup 3) (const_int 0)]))))
> + (set (match_dup 0)
> + (plus:SWI48 (plus:SWI48 (match_op_dup 5
> + [(match_dup 3) (const_int 0)])
> + (match_dup 0))
> + (match_dup 2)))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (plus:SWI48
> + (plus:SWI48
> + (match_op_dup 5
> + [(match_dup 3) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))
> + (plus:<DWI>
> + (zero_extend:<DWI> (match_dup 0))
> + (match_op_dup 4
> + [(match_dup 3) (const_int 0)]))))
> + (set (match_dup 1)
> + (plus:SWI48 (plus:SWI48 (match_op_dup 5
> + [(match_dup 3) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))])])
> +
> +(define_peephole2
> + [(parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (plus:SWI48
> + (plus:SWI48
> + (match_operator:SWI48 4 "ix86_carry_flag_operator"
> + [(match_operand 2 "flags_reg_operand")
> + (const_int 0)])
> + (match_operand:SWI48 0 "general_reg_operand"))
> + (match_operand:SWI48 1 "memory_operand")))
> + (plus:<DWI>
> + (zero_extend:<DWI> (match_dup 1))
> + (match_operator:<DWI> 3 "ix86_carry_flag_operator"
> + [(match_dup 2) (const_int 0)]))))
> + (set (match_dup 0)
> + (plus:SWI48 (plus:SWI48 (match_op_dup 4
> + [(match_dup 2) (const_int 0)])
> + (match_dup 0))
> + (match_dup 1)))])
> + (set (match_operand:QI 5 "general_reg_operand")
> + (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
> + (set (match_operand:SWI48 6 "general_reg_operand")
> + (zero_extend:SWI48 (match_dup 5)))
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (4, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[5])
> + && !reg_overlap_mentioned_p (operands[5], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[6])
> + && !reg_overlap_mentioned_p (operands[6], operands[1])"
> + [(parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (plus:SWI48
> + (plus:SWI48
> + (match_op_dup 4
> + [(match_dup 2) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))
> + (plus:<DWI>
> + (zero_extend:<DWI> (match_dup 0))
> + (match_op_dup 3
> + [(match_dup 2) (const_int 0)]))))
> + (set (match_dup 1)
> + (plus:SWI48 (plus:SWI48 (match_op_dup 4
> + [(match_dup 2) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))])
> + (set (match_dup 5) (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
> + (set (match_dup 6) (zero_extend:SWI48 (match_dup 5)))])
> +
> (define_expand "addcarry<mode>_0"
> [(parallel
> [(set (reg:CCC FLAGS_REG)
> @@ -7940,6 +8165,59 @@ (define_insn "@sub<mode>3_carry"
> (set_attr "pent_pair" "pu")
> (set_attr "mode" "<MODE>")])
>
> +(define_peephole2
> + [(set (match_operand:SWI 0 "general_reg_operand")
> + (match_operand:SWI 1 "memory_operand"))
> + (parallel [(set (match_dup 0)
> + (minus:SWI
> + (minus:SWI
> + (match_dup 0)
> + (match_operator:SWI 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand")
> + (const_int 0)]))
> + (match_operand:SWI 2 "memory_operand")))
> + (clobber (reg:CC FLAGS_REG))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (match_dup 1)
> + (minus:SWI (minus:SWI (match_dup 1)
> + (match_op_dup 4
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 0)))
> + (clobber (reg:CC FLAGS_REG))])])
> +
> +(define_peephole2
> + [(set (match_operand:SWI 0 "general_reg_operand")
> + (match_operand:SWI 1 "memory_operand"))
> + (parallel [(set (match_dup 0)
> + (minus:SWI
> + (minus:SWI
> + (match_dup 0)
> + (match_operator:SWI 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand")
> + (const_int 0)]))
> + (match_operand:SWI 2 "memory_operand")))
> + (clobber (reg:CC FLAGS_REG))])
> + (set (match_operand:SWI 5 "general_reg_operand") (match_dup 0))
> + (set (match_dup 1) (match_dup 5))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && peep2_reg_dead_p (4, operands[5])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])
> + && !reg_overlap_mentioned_p (operands[5], operands[1])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (match_dup 1)
> + (minus:SWI (minus:SWI (match_dup 1)
> + (match_op_dup 4
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 0)))
> + (clobber (reg:CC FLAGS_REG))])])
> +
> (define_insn "*sub<mode>3_carry_0"
> [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
> (minus:SWI
> @@ -8065,13 +8343,13 @@ (define_insn "subborrow<mode>"
> [(set (reg:CCC FLAGS_REG)
> (compare:CCC
> (zero_extend:<DWI>
> - (match_operand:SWI48 1 "nonimmediate_operand" "0"))
> + (match_operand:SWI48 1 "nonimmediate_operand" "0,0"))
> (plus:<DWI>
> (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> [(match_operand 3 "flags_reg_operand") (const_int 0)])
> (zero_extend:<DWI>
> - (match_operand:SWI48 2 "nonimmediate_operand" "rm")))))
> - (set (match_operand:SWI48 0 "register_operand" "=r")
> + (match_operand:SWI48 2 "nonimmediate_operand" "r,rm")))))
> + (set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
> (minus:SWI48 (minus:SWI48
> (match_dup 1)
> (match_operator:SWI48 5 "ix86_carry_flag_operator"
> @@ -8084,6 +8362,154 @@ (define_insn "subborrow<mode>"
> (set_attr "pent_pair" "pu")
> (set_attr "mode" "<MODE>")])
>
> +(define_peephole2
> + [(set (match_operand:SWI48 0 "general_reg_operand")
> + (match_operand:SWI48 1 "memory_operand"))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI> (match_dup 0))
> + (plus:<DWI>
> + (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand") (const_int 0)])
> + (zero_extend:<DWI>
> + (match_operand:SWI48 2 "memory_operand")))))
> + (set (match_dup 0)
> + (minus:SWI48
> + (minus:SWI48
> + (match_dup 0)
> + (match_operator:SWI48 5 "ix86_carry_flag_operator"
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 2)))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI> (match_dup 1))
> + (plus:<DWI> (match_op_dup 4
> + [(match_dup 3) (const_int 0)])
> + (zero_extend:<DWI> (match_dup 0)))))
> + (set (match_dup 1)
> + (minus:SWI48 (minus:SWI48 (match_dup 1)
> + (match_op_dup 5
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 0)))])])
> +
> +(define_peephole2
> + [(set (match_operand:SWI48 6 "general_reg_operand")
> + (match_operand:SWI48 7 "memory_operand"))
> + (set (match_operand:SWI48 8 "general_reg_operand")
> + (match_operand:SWI48 9 "memory_operand"))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (match_operand:SWI48 0 "general_reg_operand"))
> + (plus:<DWI>
> + (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand") (const_int 0)])
> + (zero_extend:<DWI>
> + (match_operand:SWI48 2 "general_reg_operand")))))
> + (set (match_dup 0)
> + (minus:SWI48
> + (minus:SWI48
> + (match_dup 0)
> + (match_operator:SWI48 5 "ix86_carry_flag_operator"
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 2)))])
> + (set (match_operand:SWI48 1 "memory_operand") (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (4, operands[0])
> + && peep2_reg_dead_p (3, operands[2])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[2], operands[1])
> + && !reg_overlap_mentioned_p (operands[6], operands[9])
> + && (rtx_equal_p (operands[6], operands[0])
> + ? (rtx_equal_p (operands[7], operands[1])
> + && rtx_equal_p (operands[8], operands[2]))
> + : (rtx_equal_p (operands[8], operands[0])
> + && rtx_equal_p (operands[9], operands[1])
> + && rtx_equal_p (operands[6], operands[2])))"
> + [(set (match_dup 0) (match_dup 9))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI> (match_dup 1))
> + (plus:<DWI> (match_op_dup 4
> + [(match_dup 3) (const_int 0)])
> + (zero_extend:<DWI> (match_dup 0)))))
> + (set (match_dup 1)
> + (minus:SWI48 (minus:SWI48 (match_dup 1)
> + (match_op_dup 5
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 0)))])]
> +{
> + if (!rtx_equal_p (operands[6], operands[0]))
> + operands[9] = operands[7];
> +})
> +
> +(define_peephole2
> + [(set (match_operand:SWI48 6 "general_reg_operand")
> + (match_operand:SWI48 7 "memory_operand"))
> + (set (match_operand:SWI48 8 "general_reg_operand")
> + (match_operand:SWI48 9 "memory_operand"))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (match_operand:SWI48 0 "general_reg_operand"))
> + (plus:<DWI>
> + (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand") (const_int 0)])
> + (zero_extend:<DWI>
> + (match_operand:SWI48 2 "general_reg_operand")))))
> + (set (match_dup 0)
> + (minus:SWI48
> + (minus:SWI48
> + (match_dup 0)
> + (match_operator:SWI48 5 "ix86_carry_flag_operator"
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 2)))])
> + (set (match_operand:QI 10 "general_reg_operand")
> + (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
> + (set (match_operand:SWI48 11 "general_reg_operand")
> + (zero_extend:SWI48 (match_dup 10)))
> + (set (match_operand:SWI48 1 "memory_operand") (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (6, operands[0])
> + && peep2_reg_dead_p (3, operands[2])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[2], operands[1])
> + && !reg_overlap_mentioned_p (operands[6], operands[9])
> + && !reg_overlap_mentioned_p (operands[0], operands[10])
> + && !reg_overlap_mentioned_p (operands[10], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[11])
> + && !reg_overlap_mentioned_p (operands[11], operands[1])
> + && (rtx_equal_p (operands[6], operands[0])
> + ? (rtx_equal_p (operands[7], operands[1])
> + && rtx_equal_p (operands[8], operands[2]))
> + : (rtx_equal_p (operands[8], operands[0])
> + && rtx_equal_p (operands[9], operands[1])
> + && rtx_equal_p (operands[6], operands[2])))"
> + [(set (match_dup 0) (match_dup 9))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI> (match_dup 1))
> + (plus:<DWI> (match_op_dup 4
> + [(match_dup 3) (const_int 0)])
> + (zero_extend:<DWI> (match_dup 0)))))
> + (set (match_dup 1)
> + (minus:SWI48 (minus:SWI48 (match_dup 1)
> + (match_op_dup 5
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 0)))])
> + (set (match_dup 10) (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
> + (set (match_dup 11) (zero_extend:SWI48 (match_dup 10)))]
> +{
> + if (!rtx_equal_p (operands[6], operands[0]))
> + operands[9] = operands[7];
> +})
> +
> (define_expand "subborrow<mode>_0"
> [(parallel
> [(set (reg:CC FLAGS_REG)
> @@ -8094,6 +8520,67 @@ (define_expand "subborrow<mode>_0"
> (minus:SWI48 (match_dup 1) (match_dup 2)))])]
> "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)")
>
> +(define_expand "addc<mode>5"
> + [(match_operand:SWI48 0 "register_operand")
> + (match_operand:SWI48 1 "register_operand")
> + (match_operand:SWI48 2 "register_operand")
> + (match_operand:SWI48 3 "register_operand")
> + (match_operand:SWI48 4 "nonmemory_operand")]
> + ""
> +{
> + rtx cf = gen_rtx_REG (CCCmode, FLAGS_REG), pat, pat2;
> + if (operands[4] == const0_rtx)
> + emit_insn (gen_addcarry<mode>_0 (operands[0], operands[2], operands[3]));
> + else
> + {
> + rtx op4 = copy_to_mode_reg (QImode,
> + convert_to_mode (QImode, operands[4], 1));
> + emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx));
> + pat = gen_rtx_LTU (<DWI>mode, cf, const0_rtx);
> + pat2 = gen_rtx_LTU (<MODE>mode, cf, const0_rtx);
> + emit_insn (gen_addcarry<mode> (operands[0], operands[2], operands[3],
> + cf, pat, pat2));
> + }
> + rtx cc = gen_reg_rtx (QImode);
> + pat = gen_rtx_LTU (QImode, cf, const0_rtx);
> + emit_insn (gen_rtx_SET (cc, pat));
> + emit_insn (gen_zero_extendqi<mode>2 (operands[1], cc));
> + DONE;
> +})
> +
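As I read the addc<mode>5 expander, the optab contract is: low result in operands[0], carry out (0 or 1) in operands[1], computing operands[2] + operands[3] + (operands[4] != 0). A C model of that contract, purely as a sketch (addc5_model is my name, not anything in the tree):

```c
/* C model of the addc<mode>5 contract: r = op2 + op3 + (cin != 0),
   carry out in *cout.  The expander itself goes through
   addcarry<mode> / addcarry<mode>_0; this only models the semantics.  */
static unsigned long long
addc5_model (unsigned long long op2, unsigned long long op3,
             unsigned long long cin, unsigned long long *cout)
{
  unsigned long long r;
  unsigned long long c1 = __builtin_add_overflow (op2, op3, &r);
  unsigned long long c2
    = __builtin_add_overflow (r, (unsigned long long) (cin != 0), &r);
  *cout = c1 + c2;
  return r;
}
```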
> +(define_expand "subc<mode>5"
> + [(match_operand:SWI48 0 "register_operand")
> + (match_operand:SWI48 1 "register_operand")
> + (match_operand:SWI48 2 "register_operand")
> + (match_operand:SWI48 3 "register_operand")
> + (match_operand:SWI48 4 "nonmemory_operand")]
> + ""
> +{
> + rtx cf, pat, pat2;
> + if (operands[4] == const0_rtx)
> + {
> + cf = gen_rtx_REG (CCmode, FLAGS_REG);
> + emit_insn (gen_subborrow<mode>_0 (operands[0], operands[2],
> + operands[3]));
> + }
> + else
> + {
> + cf = gen_rtx_REG (CCCmode, FLAGS_REG);
> + rtx op4 = copy_to_mode_reg (QImode,
> + convert_to_mode (QImode, operands[4], 1));
> + emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx));
> + pat = gen_rtx_LTU (<DWI>mode, cf, const0_rtx);
> + pat2 = gen_rtx_LTU (<MODE>mode, cf, const0_rtx);
> + emit_insn (gen_subborrow<mode> (operands[0], operands[2], operands[3],
> + cf, pat, pat2));
> + }
> + rtx cc = gen_reg_rtx (QImode);
> + pat = gen_rtx_LTU (QImode, cf, const0_rtx);
> + emit_insn (gen_rtx_SET (cc, pat));
> + emit_insn (gen_zero_extendqi<mode>2 (operands[1], cc));
> + DONE;
> +})
> +
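Likewise for subc<mode>5: operands[2] - operands[3] - (operands[4] != 0), borrow out in operands[1]. The same kind of hedged C model (subc5_model is again an invented name):

```c
/* C model of the subc<mode>5 contract: r = op2 - op3 - (bin != 0),
   borrow out (0 or 1) in *bout.  */
static unsigned long long
subc5_model (unsigned long long op2, unsigned long long op3,
             unsigned long long bin, unsigned long long *bout)
{
  unsigned long long r;
  unsigned long long c1 = __builtin_sub_overflow (op2, op3, &r);
  unsigned long long c2
    = __builtin_sub_overflow (r, (unsigned long long) (bin != 0), &r);
  *bout = c1 + c2;
  return r;
}
```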
> (define_mode_iterator CC_CCC [CC CCC])
>
> ;; Pre-reload splitter to optimize
> @@ -8163,6 +8650,27 @@ (define_peephole2
> (compare:CCC
> (plus:SWI (match_dup 1) (match_dup 0))
> (match_dup 1)))
> + (set (match_dup 1) (plus:SWI (match_dup 1) (match_dup 0)))])])
> +
> +(define_peephole2
> + [(set (match_operand:SWI 0 "general_reg_operand")
> + (match_operand:SWI 1 "memory_operand"))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (plus:SWI (match_dup 0)
> + (match_operand:SWI 2 "memory_operand"))
> + (match_dup 0)))
> + (set (match_dup 0) (plus:SWI (match_dup 0) (match_dup 2)))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (plus:SWI (match_dup 1) (match_dup 0))
> + (match_dup 1)))
> (set (match_dup 1) (plus:SWI (match_dup 1) (match_dup 0)))])])
>
> (define_insn "*addsi3_zext_cc_overflow_1"
> --- gcc/testsuite/gcc.target/i386/pr79173-1.c.jj 2023-06-06 13:23:03.667319915 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-1.c 2023-06-06 13:53:04.087958943 +0200
> @@ -0,0 +1,59 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +addc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> + unsigned long r;
> + unsigned long c1 = __builtin_add_overflow (x, y, &r);
> + unsigned long c2 = __builtin_add_overflow (r, carry_in, &r);
> + *carry_out = c1 + c2;
> + return r;
> +}
> +
> +static unsigned long
> +subc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> + unsigned long r;
> + unsigned long c1 = __builtin_sub_overflow (x, y, &r);
> + unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r);
> + *carry_out = c1 + c2;
> + return r;
> +}
> +
> +void
> +foo (unsigned long *p, unsigned long *q)
> +{
> + unsigned long c;
> + p[0] = addc (p[0], q[0], 0, &c);
> + p[1] = addc (p[1], q[1], c, &c);
> + p[2] = addc (p[2], q[2], c, &c);
> + p[3] = addc (p[3], q[3], c, &c);
> +}
> +
> +void
> +bar (unsigned long *p, unsigned long *q)
> +{
> + unsigned long c;
> + p[0] = subc (p[0], q[0], 0, &c);
> + p[1] = subc (p[1], q[1], c, &c);
> + p[2] = subc (p[2], q[2], c, &c);
> + p[3] = subc (p[3], q[3], c, &c);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-2.c.jj 2023-06-06 13:23:49.482674416 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-2.c 2023-06-06 13:53:04.088958929 +0200
> @@ -0,0 +1,59 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +addc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
> +{
> + unsigned long r;
> + _Bool c1 = __builtin_add_overflow (x, y, &r);
> + _Bool c2 = __builtin_add_overflow (r, carry_in, &r);
> + *carry_out = c1 | c2;
> + return r;
> +}
> +
> +static unsigned long
> +subc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
> +{
> + unsigned long r;
> + _Bool c1 = __builtin_sub_overflow (x, y, &r);
> + _Bool c2 = __builtin_sub_overflow (r, carry_in, &r);
> + *carry_out = c1 | c2;
> + return r;
> +}
> +
> +void
> +foo (unsigned long *p, unsigned long *q)
> +{
> + _Bool c;
> + p[0] = addc (p[0], q[0], 0, &c);
> + p[1] = addc (p[1], q[1], c, &c);
> + p[2] = addc (p[2], q[2], c, &c);
> + p[3] = addc (p[3], q[3], c, &c);
> +}
> +
> +void
> +bar (unsigned long *p, unsigned long *q)
> +{
> + _Bool c;
> + p[0] = subc (p[0], q[0], 0, &c);
> + p[1] = subc (p[1], q[1], c, &c);
> + p[2] = subc (p[2], q[2], c, &c);
> + p[3] = subc (p[3], q[3], c, &c);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-3.c.jj 2023-06-06 13:23:52.680629360 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-3.c 2023-06-06 13:53:04.088958929 +0200
> @@ -0,0 +1,61 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +addc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> + unsigned long r;
> + unsigned long c1 = __builtin_add_overflow (x, y, &r);
> + unsigned long c2 = __builtin_add_overflow (r, carry_in, &r);
> + *carry_out = c1 + c2;
> + return r;
> +}
> +
> +static unsigned long
> +subc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> + unsigned long r;
> + unsigned long c1 = __builtin_sub_overflow (x, y, &r);
> + unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r);
> + *carry_out = c1 + c2;
> + return r;
> +}
> +
> +unsigned long
> +foo (unsigned long *p, unsigned long *q)
> +{
> + unsigned long c;
> + p[0] = addc (p[0], q[0], 0, &c);
> + p[1] = addc (p[1], q[1], c, &c);
> + p[2] = addc (p[2], q[2], c, &c);
> + p[3] = addc (p[3], q[3], c, &c);
> + return c;
> +}
> +
> +unsigned long
> +bar (unsigned long *p, unsigned long *q)
> +{
> + unsigned long c;
> + p[0] = subc (p[0], q[0], 0, &c);
> + p[1] = subc (p[1], q[1], c, &c);
> + p[2] = subc (p[2], q[2], c, &c);
> + p[3] = subc (p[3], q[3], c, &c);
> + return c;
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-4.c.jj 2023-06-06 13:23:55.895584064 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-4.c 2023-06-06 13:53:04.088958929 +0200
> @@ -0,0 +1,61 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +addc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
> +{
> + unsigned long r;
> + _Bool c1 = __builtin_add_overflow (x, y, &r);
> + _Bool c2 = __builtin_add_overflow (r, carry_in, &r);
> + *carry_out = c1 ^ c2;
> + return r;
> +}
> +
> +static unsigned long
> +subc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
> +{
> + unsigned long r;
> + _Bool c1 = __builtin_sub_overflow (x, y, &r);
> + _Bool c2 = __builtin_sub_overflow (r, carry_in, &r);
> + *carry_out = c1 ^ c2;
> + return r;
> +}
> +
> +_Bool
> +foo (unsigned long *p, unsigned long *q)
> +{
> + _Bool c;
> + p[0] = addc (p[0], q[0], 0, &c);
> + p[1] = addc (p[1], q[1], c, &c);
> + p[2] = addc (p[2], q[2], c, &c);
> + p[3] = addc (p[3], q[3], c, &c);
> + return c;
> +}
> +
> +_Bool
> +bar (unsigned long *p, unsigned long *q)
> +{
> + _Bool c;
> + p[0] = subc (p[0], q[0], 0, &c);
> + p[1] = subc (p[1], q[1], c, &c);
> + p[2] = subc (p[2], q[2], c, &c);
> + p[3] = subc (p[3], q[3], c, &c);
> + return c;
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-5.c.jj 2023-06-06 13:39:52.283111764 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-5.c 2023-06-06 17:33:36.370088539 +0200
> @@ -0,0 +1,32 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +addc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> + unsigned long r = x + y;
> + unsigned long c1 = r < x;
> + r += carry_in;
> + unsigned long c2 = r < carry_in;
> + *carry_out = c1 + c2;
> + return r;
> +}
> +
> +void
> +foo (unsigned long *p, unsigned long *q)
> +{
> + unsigned long c;
> + p[0] = addc (p[0], q[0], 0, &c);
> + p[1] = addc (p[1], q[1], c, &c);
> + p[2] = addc (p[2], q[2], c, &c);
> + p[3] = addc (p[3], q[3], c, &c);
> +}
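Nice that the hand-written `r < x` carry idiom in pr79173-5.c is recognized too. For the record, that idiom agrees with the builtin form on all inputs (the two partial carries are mutually exclusive either way); a quick self-check, with my own helper names:

```c
/* Hand-written carry idiom, as in pr79173-5.c.  */
static unsigned long long
addc_hand (unsigned long long x, unsigned long long y,
           unsigned long long cin, unsigned long long *cout)
{
  unsigned long long r = x + y;
  unsigned long long c1 = r < x;   /* carry out of x + y */
  r += cin;
  unsigned long long c2 = r < cin; /* carry out of adding cin */
  *cout = c1 + c2;
  return r;
}

/* Builtin form, as in pr79173-1.c.  */
static unsigned long long
addc_builtin (unsigned long long x, unsigned long long y,
              unsigned long long cin, unsigned long long *cout)
{
  unsigned long long r;
  unsigned long long c1 = __builtin_add_overflow (x, y, &r);
  unsigned long long c2 = __builtin_add_overflow (r, cin, &r);
  *cout = c1 + c2;
  return r;
}
```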
> --- gcc/testsuite/gcc.target/i386/pr79173-6.c.jj 2023-06-06 17:34:25.618401505 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-6.c 2023-06-06 17:36:11.248927942 +0200
> @@ -0,0 +1,33 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +addc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> + unsigned long r = x + y;
> + unsigned long c1 = r < x;
> + r += carry_in;
> + unsigned long c2 = r < carry_in;
> + *carry_out = c1 + c2;
> + return r;
> +}
> +
> +unsigned long
> +foo (unsigned long *p, unsigned long *q)
> +{
> + unsigned long c;
> + p[0] = addc (p[0], q[0], 0, &c);
> + p[1] = addc (p[1], q[1], c, &c);
> + p[2] = addc (p[2], q[2], c, &c);
> + p[3] = addc (p[3], q[3], c, &c);
> + return c;
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-7.c.jj 2023-06-06 17:49:46.702561308 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-7.c 2023-06-06 17:50:36.364871245 +0200
> @@ -0,0 +1,31 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
> +
> +#include <x86intrin.h>
> +
> +void
> +foo (unsigned long long *p, unsigned long long *q)
> +{
> + unsigned char c = _addcarry_u64 (0, p[0], q[0], &p[0]);
> + c = _addcarry_u64 (c, p[1], q[1], &p[1]);
> + c = _addcarry_u64 (c, p[2], q[2], &p[2]);
> + _addcarry_u64 (c, p[3], q[3], &p[3]);
> +}
> +
> +void
> +bar (unsigned long long *p, unsigned long long *q)
> +{
> + unsigned char c = _subborrow_u64 (0, p[0], q[0], &p[0]);
> + c = _subborrow_u64 (c, p[1], q[1], &p[1]);
> + c = _subborrow_u64 (c, p[2], q[2], &p[2]);
> + _subborrow_u64 (c, p[3], q[3], &p[3]);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-8.c.jj 2023-06-06 17:50:45.970737772 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-8.c 2023-06-06 17:52:19.564437290 +0200
> @@ -0,0 +1,31 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
> +
> +#include <x86intrin.h>
> +
> +void
> +foo (unsigned int *p, unsigned int *q)
> +{
> + unsigned char c = _addcarry_u32 (0, p[0], q[0], &p[0]);
> + c = _addcarry_u32 (c, p[1], q[1], &p[1]);
> + c = _addcarry_u32 (c, p[2], q[2], &p[2]);
> + _addcarry_u32 (c, p[3], q[3], &p[3]);
> +}
> +
> +void
> +bar (unsigned int *p, unsigned int *q)
> +{
> + unsigned char c = _subborrow_u32 (0, p[0], q[0], &p[0]);
> + c = _subborrow_u32 (c, p[1], q[1], &p[1]);
> + c = _subborrow_u32 (c, p[2], q[2], &p[2]);
> + _subborrow_u32 (c, p[3], q[3], &p[3]);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-9.c.jj 2023-06-06 17:52:35.869210734 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-9.c 2023-06-06 17:53:00.076874369 +0200
> @@ -0,0 +1,31 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
> +
> +#include <x86intrin.h>
> +
> +unsigned long long
> +foo (unsigned long long *p, unsigned long long *q)
> +{
> + unsigned char c = _addcarry_u64 (0, p[0], q[0], &p[0]);
> + c = _addcarry_u64 (c, p[1], q[1], &p[1]);
> + c = _addcarry_u64 (c, p[2], q[2], &p[2]);
> + return _addcarry_u64 (c, p[3], q[3], &p[3]);
> +}
> +
> +unsigned long long
> +bar (unsigned long long *p, unsigned long long *q)
> +{
> + unsigned char c = _subborrow_u64 (0, p[0], q[0], &p[0]);
> + c = _subborrow_u64 (c, p[1], q[1], &p[1]);
> + c = _subborrow_u64 (c, p[2], q[2], &p[2]);
> + return _subborrow_u64 (c, p[3], q[3], &p[3]);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-10.c.jj 2023-06-06 17:53:29.576464475 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-10.c 2023-06-06 17:53:25.021527762 +0200
> @@ -0,0 +1,31 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
> +
> +#include <x86intrin.h>
> +
> +unsigned int
> +foo (unsigned int *p, unsigned int *q)
> +{
> + unsigned char c = _addcarry_u32 (0, p[0], q[0], &p[0]);
> + c = _addcarry_u32 (c, p[1], q[1], &p[1]);
> + c = _addcarry_u32 (c, p[2], q[2], &p[2]);
> + return _addcarry_u32 (c, p[3], q[3], &p[3]);
> +}
> +
> +unsigned int
> +bar (unsigned int *p, unsigned int *q)
> +{
> + unsigned char c = _subborrow_u32 (0, p[0], q[0], &p[0]);
> + c = _subborrow_u32 (c, p[1], q[1], &p[1]);
> + c = _subborrow_u32 (c, p[2], q[2], &p[2]);
> + return _subborrow_u32 (c, p[3], q[3], &p[3]);
> +}
>
> Jakub
>
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
Thread overview: 17+ messages
2023-06-06 21:42 Jakub Jelinek
2023-06-13 7:06 ` Patch ping (Re: [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173]) Jakub Jelinek
2023-06-13 8:32 ` Uros Bizjak
2023-06-13 8:40 ` Richard Biener [this message]
2023-06-13 11:29 ` [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173] Jakub Jelinek
2023-06-14 11:17 ` Jakub Jelinek
2023-06-14 12:25 ` Richard Biener
2023-06-14 13:52 ` [PATCH] middle-end: Move constant args folding of .UBSAN_CHECK_* and .*_OVERFLOW into fold-const-call.cc Jakub Jelinek
2023-06-14 13:54 ` Richard Biener
2023-06-14 12:35 ` [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173] Richard Biener
2023-06-14 13:59 ` [PATCH] middle-end, i386, v3: " Jakub Jelinek
2023-06-14 14:28 ` Richard Biener
2023-06-14 14:34 ` Uros Bizjak
2023-06-14 14:56 ` Jakub Jelinek
2023-06-14 15:01 ` Uros Bizjak
2023-06-14 14:45 ` Uros Bizjak
2023-06-14 15:19 ` Jakub Jelinek