Date: Wed, 14 Jun 2023 12:35:42 +0000 (UTC) From: Richard Biener To: Jakub Jelinek Cc: Uros Bizjak, gcc-patches@gcc.gnu.org Subject: Re: [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173]
On Tue, 13 Jun 2023, Jakub Jelinek wrote: > On Tue, Jun 13, 2023 at 08:40:36AM +0000, Richard Biener wrote: > > I suspect re-association can wreck things even more here. I have > > to say the matching code is very hard to follow, not sure if > > splitting out a function matching > > > > _22 = .{ADD,SUB}_OVERFLOW (_6, _5); > > _23 = REALPART_EXPR <_22>; > > _24 = IMAGPART_EXPR <_22>; > > > > from _23 and _24 would help? > > I've outlined 3 most often used sequences of statements or checks > into 3 helper functions, hope that helps. > > > > + while (TREE_CODE (rhs[0]) == SSA_NAME && !rhs[3]) > > > + { > > > + gimple *g = SSA_NAME_DEF_STMT (rhs[0]); > > > + if (has_single_use (rhs[0]) > > > + && is_gimple_assign (g) > > > + && (gimple_assign_rhs_code (g) == code > > > + || (code == MINUS_EXPR > > > + && gimple_assign_rhs_code (g) == PLUS_EXPR > > > + && TREE_CODE (gimple_assign_rhs2 (g)) == INTEGER_CST))) > > > + { > > > + rhs[0] = gimple_assign_rhs1 (g); > > > + tree &r = rhs[2] ? rhs[3] : rhs[2]; > > > + r = gimple_assign_rhs2 (g); > > > + if (gimple_assign_rhs_code (g) != code) > > > + r = fold_build1 (NEGATE_EXPR, TREE_TYPE (r), r); > > > > Can you use const_unop here? In fact both will not reliably > > negate all constants (ick), so maybe we want a force_const_negate ()? > > It is unsigned type NEGATE_EXPR of INTEGER_CST, so I think it should > work. That said, changed it to const_unop and am just giving up on it > as if it wasn't a PLUS_EXPR with INTEGER_CST addend if const_unop doesn't > simplify. 
> > > > + else if (addc_subc) > > > + { > > > + if (!integer_zerop (arg2)) > > > + ; > > > + /* x = y + 0 + 0; x = y - 0 - 0; */ > > > + else if (integer_zerop (arg1)) > > > + result = arg0; > > > + /* x = 0 + y + 0; */ > > > + else if (subcode != MINUS_EXPR && integer_zerop (arg0)) > > > + result = arg1; > > > + /* x = y - y - 0; */ > > > + else if (subcode == MINUS_EXPR > > > + && operand_equal_p (arg0, arg1, 0)) > > > + result = integer_zero_node; > > > + } > > > > So this all performs simplifications but also constant folding. In > > particular the match.pd re-simplification will invoke fold_const_call > > on all-constant argument function calls but does not do extra folding > > on partially constant arg cases but instead relies on patterns here. > > > > Can you add all-constant arg handling to fold_const_call and > > consider moving cases like y + 0 + 0 to match.pd? > > The reason I've done this here is that this is the spot where all other > similar internal functions are handled, be it the ubsan ones > - IFN_UBSAN_CHECK_{ADD,SUB,MUL}, or __builtin_*_overflow ones > - IFN_{ADD,SUB,MUL}_OVERFLOW, or these 2 new ones. The code handles > there 2 constant arguments as well as various patterns that can be > simplified and has code to clean it up later, build a COMPLEX_CST, > or COMPLEX_EXPR etc. as needed. So, I think we want to handle those > elsewhere, we should do it for all of those functions, but then > probably incrementally. > > > > +@cindex @code{addc@var{m}5} instruction pattern > > > +@item @samp{addc@var{m}5} > > > +Adds operands 2, 3 and 4 (where the last operand is guaranteed to have > > > +only values 0 or 1) together, sets operand 0 to the result of the > > > +addition of the 3 operands and sets operand 1 to 1 iff there was no > > > +overflow on the unsigned additions, and to 0 otherwise. So, it is > > > +an addition with carry in (operand 4) and carry out (operand 1). > > > +All operands have the same mode. 
> > > > operand 1 set to 1 for no overflow sounds weird when specifying it > > as carry out - can you double check? > > Fixed. > > > > +@cindex @code{subc@var{m}5} instruction pattern > > > +@item @samp{subc@var{m}5} > > > +Similarly to @samp{addc@var{m}5}, except subtracts operands 3 and 4 > > > +from operand 2 instead of adding them. So, it is > > > +a subtraction with carry/borrow in (operand 4) and carry/borrow out > > > +(operand 1). All operands have the same mode. > > > + > > > > I wonder if we want to name them uaddc and usubc? Or is this supposed > > to be simply the twos-complement "carry"? I think the docs should > > say so then (note we do have uaddv and addv). > > Makes sense, I've actually renamed even the internal functions etc. > > Here is only lightly tested patch with everything but gimple-fold.cc > changed. > > 2023-06-13 Jakub Jelinek > > PR middle-end/79173 > * internal-fn.def (UADDC, USUBC): New internal functions. > * internal-fn.cc (expand_UADDC, expand_USUBC): New functions. > (commutative_ternary_fn_p): Return true also for IFN_UADDC. > * optabs.def (uaddc5_optab, usubc5_optab): New optabs. > * tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart, > match_uaddc_usubc): New functions. > (math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc > for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless > other optimizations have been successful for those. > * gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC. > * gimple-range-fold.cc (adjust_imagpart_expr): Likewise. > * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise. > * doc/md.texi (uaddc5, usubc5): Document new named > patterns. > * config/i386/i386.md (subborrow): Add alternative with > memory destination. > (uaddc5, usubc5): New define_expand patterns. 
> (*sub_3, @add3_carry, addcarry, @sub3_carry, > subborrow, *add3_cc_overflow_1): Add define_peephole2 > TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory > destination in these patterns. > > * gcc.target/i386/pr79173-1.c: New test. > * gcc.target/i386/pr79173-2.c: New test. > * gcc.target/i386/pr79173-3.c: New test. > * gcc.target/i386/pr79173-4.c: New test. > * gcc.target/i386/pr79173-5.c: New test. > * gcc.target/i386/pr79173-6.c: New test. > * gcc.target/i386/pr79173-7.c: New test. > * gcc.target/i386/pr79173-8.c: New test. > * gcc.target/i386/pr79173-9.c: New test. > * gcc.target/i386/pr79173-10.c: New test. > > --- gcc/internal-fn.def.jj 2023-06-12 15:47:22.190506569 +0200 > +++ gcc/internal-fn.def 2023-06-13 12:30:22.951974357 +0200 > @@ -416,6 +416,8 @@ DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LE > DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL) > DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL) > DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL) > +DEF_INTERNAL_FN (UADDC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL) > +DEF_INTERNAL_FN (USUBC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL) > DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) > DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL) > DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL) > --- gcc/internal-fn.cc.jj 2023-06-07 09:42:14.680130597 +0200 > +++ gcc/internal-fn.cc 2023-06-13 12:30:23.361968621 +0200 > @@ -2776,6 +2776,44 @@ expand_MUL_OVERFLOW (internal_fn, gcall > expand_arith_overflow (MULT_EXPR, stmt); > } > > +/* Expand UADDC STMT. 
*/ > + > +static void > +expand_UADDC (internal_fn ifn, gcall *stmt) > +{ > + tree lhs = gimple_call_lhs (stmt); > + tree arg1 = gimple_call_arg (stmt, 0); > + tree arg2 = gimple_call_arg (stmt, 1); > + tree arg3 = gimple_call_arg (stmt, 2); > + tree type = TREE_TYPE (arg1); > + machine_mode mode = TYPE_MODE (type); > + insn_code icode = optab_handler (ifn == IFN_UADDC > + ? uaddc5_optab : usubc5_optab, mode); > + rtx op1 = expand_normal (arg1); > + rtx op2 = expand_normal (arg2); > + rtx op3 = expand_normal (arg3); > + rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); > + rtx re = gen_reg_rtx (mode); > + rtx im = gen_reg_rtx (mode); > + class expand_operand ops[5]; > + create_output_operand (&ops[0], re, mode); > + create_output_operand (&ops[1], im, mode); > + create_input_operand (&ops[2], op1, mode); > + create_input_operand (&ops[3], op2, mode); > + create_input_operand (&ops[4], op3, mode); > + expand_insn (icode, 5, ops); > + write_complex_part (target, re, false, false); > + write_complex_part (target, im, true, false); > +} > + > +/* Expand USUBC STMT. */ > + > +static void > +expand_USUBC (internal_fn ifn, gcall *stmt) > +{ > + expand_UADDC (ifn, stmt); > +} > + > /* This should get folded in tree-vectorizer.cc. 
*/ > > static void > @@ -4049,6 +4087,7 @@ commutative_ternary_fn_p (internal_fn fn > case IFN_FMS: > case IFN_FNMA: > case IFN_FNMS: > + case IFN_UADDC: > return true; > > default: > --- gcc/optabs.def.jj 2023-06-12 15:47:22.261505587 +0200 > +++ gcc/optabs.def 2023-06-13 12:30:23.372968467 +0200 > @@ -260,6 +260,8 @@ OPTAB_D (uaddv4_optab, "uaddv$I$a4") > OPTAB_D (usubv4_optab, "usubv$I$a4") > OPTAB_D (umulv4_optab, "umulv$I$a4") > OPTAB_D (negv3_optab, "negv$I$a3") > +OPTAB_D (uaddc5_optab, "uaddc$I$a5") > +OPTAB_D (usubc5_optab, "usubc$I$a5") > OPTAB_D (addptr3_optab, "addptr$a3") > OPTAB_D (spaceship_optab, "spaceship$a3") > > --- gcc/tree-ssa-math-opts.cc.jj 2023-06-07 09:41:49.573479611 +0200 > +++ gcc/tree-ssa-math-opts.cc 2023-06-13 13:04:43.699152339 +0200 > @@ -4441,6 +4441,434 @@ match_arith_overflow (gimple_stmt_iterat > return false; > } > > +/* Helper of match_uaddc_usubc. Look through an integral cast > + which should preserve [0, 1] range value (unless source has > + 1-bit signed type) and the cast has single use. */ > + > +static gimple * > +uaddc_cast (gimple *g) > +{ > + if (!gimple_assign_cast_p (g)) > + return g; > + tree op = gimple_assign_rhs1 (g); > + if (TREE_CODE (op) == SSA_NAME > + && INTEGRAL_TYPE_P (TREE_TYPE (op)) > + && (TYPE_PRECISION (TREE_TYPE (op)) > 1 > + || TYPE_UNSIGNED (TREE_TYPE (op))) > + && has_single_use (gimple_assign_lhs (g))) > + return SSA_NAME_DEF_STMT (op); > + return g; > +} > + > +/* Helper of match_uaddc_usubc. Look through a NE_EXPR > + comparison with 0 which also preserves [0, 1] value range. 
*/ > + > +static gimple * > +uaddc_ne0 (gimple *g) > +{ > + if (is_gimple_assign (g) > + && gimple_assign_rhs_code (g) == NE_EXPR > + && integer_zerop (gimple_assign_rhs2 (g)) > + && TREE_CODE (gimple_assign_rhs1 (g)) == SSA_NAME > + && has_single_use (gimple_assign_lhs (g))) > + return SSA_NAME_DEF_STMT (gimple_assign_rhs1 (g)); > + return g; > +} > + > +/* Return true if G is {REAL,IMAG}PART_EXPR PART with SSA_NAME > + operand. */ > + > +static bool > +uaddc_is_cplxpart (gimple *g, tree_code part) > +{ > + return (is_gimple_assign (g) > + && gimple_assign_rhs_code (g) == part > + && TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (g), 0)) == SSA_NAME); > +} > + > +/* Try to match e.g. > + _29 = .ADD_OVERFLOW (_3, _4); > + _30 = REALPART_EXPR <_29>; > + _31 = IMAGPART_EXPR <_29>; > + _32 = .ADD_OVERFLOW (_30, _38); > + _33 = REALPART_EXPR <_32>; > + _34 = IMAGPART_EXPR <_32>; > + _35 = _31 + _34; > + as > + _36 = .UADDC (_3, _4, _38); > + _33 = REALPART_EXPR <_36>; > + _35 = IMAGPART_EXPR <_36>; > + or > + _22 = .SUB_OVERFLOW (_6, _5); > + _23 = REALPART_EXPR <_22>; > + _24 = IMAGPART_EXPR <_22>; > + _25 = .SUB_OVERFLOW (_23, _37); > + _26 = REALPART_EXPR <_25>; > + _27 = IMAGPART_EXPR <_25>; > + _28 = _24 | _27; > + as > + _29 = .USUBC (_6, _5, _37); > + _26 = REALPART_EXPR <_29>; > + _288 = IMAGPART_EXPR <_29>; > + provided _38 or _37 above have [0, 1] range > + and _3, _4 and _30 or _6, _5 and _23 are unsigned > + integral types with the same precision. Whether + or | or ^ is > + used on the IMAGPART_EXPR results doesn't matter, with one of > + added or subtracted operands in [0, 1] range at most one > + .ADD_OVERFLOW or .SUB_OVERFLOW will indicate overflow. 
*/ > + > +static bool > +match_uaddc_usubc (gimple_stmt_iterator *gsi, gimple *stmt, tree_code code) > +{ > + tree rhs[4]; > + rhs[0] = gimple_assign_rhs1 (stmt); > + rhs[1] = gimple_assign_rhs2 (stmt); > + rhs[2] = NULL_TREE; > + rhs[3] = NULL_TREE; > + tree type = TREE_TYPE (rhs[0]); > + if (!INTEGRAL_TYPE_P (type) || !TYPE_UNSIGNED (type)) > + return false; > + > + if (code != BIT_IOR_EXPR && code != BIT_XOR_EXPR) > + { > + /* If overflow flag is ignored on the MSB limb, we can end up with > + the most significant limb handled as r = op1 + op2 + ovf1 + ovf2; > + or r = op1 - op2 - ovf1 - ovf2; or various equivalent expressions > + thereof. Handle those like the ovf = ovf1 + ovf2; case to recognize > + the limb below the MSB, but also create another .UADDC/.USUBC call > + for the last limb. */ > + while (TREE_CODE (rhs[0]) == SSA_NAME && !rhs[3]) > + { > + gimple *g = SSA_NAME_DEF_STMT (rhs[0]); > + if (has_single_use (rhs[0]) > + && is_gimple_assign (g) > + && (gimple_assign_rhs_code (g) == code > + || (code == MINUS_EXPR > + && gimple_assign_rhs_code (g) == PLUS_EXPR > + && TREE_CODE (gimple_assign_rhs2 (g)) == INTEGER_CST))) > + { > + tree r2 = gimple_assign_rhs2 (g); > + if (gimple_assign_rhs_code (g) != code) > + { > + r2 = const_unop (NEGATE_EXPR, TREE_TYPE (r2), r2); > + if (!r2) > + break; > + } > + rhs[0] = gimple_assign_rhs1 (g); > + tree &r = rhs[2] ? rhs[3] : rhs[2]; > + r = r2; > + } > + else > + break; > + } > + while (TREE_CODE (rhs[1]) == SSA_NAME && !rhs[3]) > + { > + gimple *g = SSA_NAME_DEF_STMT (rhs[1]); > + if (has_single_use (rhs[1]) > + && is_gimple_assign (g) > + && gimple_assign_rhs_code (g) == PLUS_EXPR) > + { > + rhs[1] = gimple_assign_rhs1 (g); > + if (rhs[2]) > + rhs[3] = gimple_assign_rhs2 (g); > + else > + rhs[2] = gimple_assign_rhs2 (g); > + } > + else > + break; > + } > + if (rhs[2] && !rhs[3]) > + { > + for (int i = (code == MINUS_EXPR ? 
1 : 0); i < 3; ++i) > + if (TREE_CODE (rhs[i]) == SSA_NAME) > + { > + gimple *im = uaddc_cast (SSA_NAME_DEF_STMT (rhs[i])); > + im = uaddc_ne0 (im); > + if (uaddc_is_cplxpart (im, IMAGPART_EXPR)) > + { > + tree rhs1 = gimple_assign_rhs1 (im); > + gimple *ovf = SSA_NAME_DEF_STMT (TREE_OPERAND (rhs1, 0)); > + if (gimple_call_internal_p (ovf, code == PLUS_EXPR > + ? IFN_UADDC : IFN_USUBC) > + && (optab_handler (code == PLUS_EXPR > + ? uaddc5_optab : usubc5_optab, > + TYPE_MODE (type)) > + != CODE_FOR_nothing)) > + { > + if (i != 2) > + std::swap (rhs[i], rhs[2]); > + gimple *g > + = gimple_build_call_internal (code == PLUS_EXPR > + ? IFN_UADDC > + : IFN_USUBC, > + 3, rhs[0], rhs[1], > + rhs[2]); > + tree nlhs = make_ssa_name (build_complex_type (type)); > + gimple_call_set_lhs (g, nlhs); > + gsi_insert_before (gsi, g, GSI_SAME_STMT); > + tree ilhs = gimple_assign_lhs (stmt); > + g = gimple_build_assign (ilhs, REALPART_EXPR, > + build1 (REALPART_EXPR, > + TREE_TYPE (ilhs), > + nlhs)); > + gsi_replace (gsi, g, true); > + return true; > + } > + } > + } > + return false; > + } > + if (code == MINUS_EXPR && !rhs[2]) > + return false; > + if (code == MINUS_EXPR) > + /* Code below expects rhs[0] and rhs[1] to have the IMAGPART_EXPRs. > + So, for MINUS_EXPR swap the single added rhs operand (others are > + subtracted) to rhs[3]. */ > + std::swap (rhs[0], rhs[3]); > + } > + gimple *im1 = NULL, *im2 = NULL; > + for (int i = 0; i < (code == MINUS_EXPR ? 
3 : 4); i++) > + if (rhs[i] && TREE_CODE (rhs[i]) == SSA_NAME) > + { > + gimple *im = uaddc_cast (SSA_NAME_DEF_STMT (rhs[i])); > + im = uaddc_ne0 (im); > + if (uaddc_is_cplxpart (im, IMAGPART_EXPR)) > + { > + if (im1 == NULL) > + { > + im1 = im; > + if (i != 0) > + std::swap (rhs[0], rhs[i]); > + } > + else > + { > + im2 = im; > + if (i != 1) > + std::swap (rhs[1], rhs[i]); > + break; > + } > + } > + } > + if (!im2) > + return false; > + gimple *ovf1 > + = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im1), 0)); > + gimple *ovf2 > + = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im2), 0)); > + internal_fn ifn; > + if (!is_gimple_call (ovf1) > + || !gimple_call_internal_p (ovf1) > + || ((ifn = gimple_call_internal_fn (ovf1)) != IFN_ADD_OVERFLOW > + && ifn != IFN_SUB_OVERFLOW) > + || !gimple_call_internal_p (ovf2, ifn) > + || optab_handler (ifn == IFN_ADD_OVERFLOW ? uaddc5_optab : usubc5_optab, > + TYPE_MODE (type)) == CODE_FOR_nothing > + || (rhs[2] > + && optab_handler (code == PLUS_EXPR ? uaddc5_optab : usubc5_optab, > + TYPE_MODE (type)) == CODE_FOR_nothing)) > + return false; > + tree arg1, arg2, arg3 = NULL_TREE; > + gimple *re1 = NULL, *re2 = NULL; > + for (int i = (ifn == IFN_ADD_OVERFLOW ? 1 : 0); i >= 0; --i) > + for (gimple *ovf = ovf1; ovf; ovf = (ovf == ovf1 ? ovf2 : NULL)) > + { > + tree arg = gimple_call_arg (ovf, i); > + if (TREE_CODE (arg) != SSA_NAME) > + continue; > + re1 = SSA_NAME_DEF_STMT (arg); > + if (uaddc_is_cplxpart (re1, REALPART_EXPR) > + && (SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (re1), 0)) > + == (ovf == ovf1 ? ovf2 : ovf1))) > + { > + if (ovf == ovf1) > + { > + std::swap (rhs[0], rhs[1]); > + std::swap (im1, im2); > + std::swap (ovf1, ovf2); > + } > + arg3 = gimple_call_arg (ovf, 1 - i); > + i = -1; > + break; > + } > + } At this point two pages of code without a comment - can you introduce some vertical spacing and comments as to what is matched now? 
The split out functions help somewhat but the code is far from obvious :/ Maybe I'm confused by the loops and instead of those sth like if (match_x_y_z (op0) || match_x_y_z (op1)) ... would be easier to follow with the loop bodies split out? Maybe put just put them in lambdas even? I guess you'll be around as long as myself so we can go with this code under the premise you're going to maintain it - it's not that I'm writing trivially to understand code myself ... Thanks, Richard. > + if (!arg3) > + return false; > + arg1 = gimple_call_arg (ovf1, 0); > + arg2 = gimple_call_arg (ovf1, 1); > + if (!types_compatible_p (type, TREE_TYPE (arg1))) > + return false; > + int kind[2] = { 0, 0 }; > + /* At least one of arg2 and arg3 should have type compatible > + with arg1/rhs[0], and the other one should have value in [0, 1] > + range. */ > + for (int i = 0; i < 2; ++i) > + { > + tree arg = i == 0 ? arg2 : arg3; > + if (types_compatible_p (type, TREE_TYPE (arg))) > + kind[i] = 1; > + if (!INTEGRAL_TYPE_P (TREE_TYPE (arg)) > + || (TYPE_PRECISION (TREE_TYPE (arg)) == 1 > + && !TYPE_UNSIGNED (TREE_TYPE (arg)))) > + continue; > + if (tree_zero_one_valued_p (arg)) > + kind[i] |= 2; > + if (TREE_CODE (arg) == SSA_NAME) > + { > + gimple *g = SSA_NAME_DEF_STMT (arg); > + if (gimple_assign_cast_p (g)) > + { > + tree op = gimple_assign_rhs1 (g); > + if (TREE_CODE (op) == SSA_NAME > + && INTEGRAL_TYPE_P (TREE_TYPE (op))) > + g = SSA_NAME_DEF_STMT (op); > + } > + g = uaddc_ne0 (g); > + if (!uaddc_is_cplxpart (g, IMAGPART_EXPR)) > + continue; > + g = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (g), 0)); > + if (!is_gimple_call (g) || !gimple_call_internal_p (g)) > + continue; > + switch (gimple_call_internal_fn (g)) > + { > + case IFN_ADD_OVERFLOW: > + case IFN_SUB_OVERFLOW: > + case IFN_UADDC: > + case IFN_USUBC: > + break; > + default: > + continue; > + } > + kind[i] |= 4; > + } > + } > + /* Make arg2 the one with compatible type and arg3 the one > + with [0, 1] range. 
If both is true for both operands, > + prefer as arg3 result of __imag__ of some ifn. */ > + if ((kind[0] & 1) == 0 || ((kind[1] & 1) != 0 && kind[0] > kind[1])) > + { > + std::swap (arg2, arg3); > + std::swap (kind[0], kind[1]); > + } > + if ((kind[0] & 1) == 0 || (kind[1] & 6) == 0) > + return false; > + if (!has_single_use (gimple_assign_lhs (im1)) > + || !has_single_use (gimple_assign_lhs (im2)) > + || !has_single_use (gimple_assign_lhs (re1)) > + || num_imm_uses (gimple_call_lhs (ovf1)) != 2) > + return false; > + use_operand_p use_p; > + imm_use_iterator iter; > + tree lhs = gimple_call_lhs (ovf2); > + FOR_EACH_IMM_USE_FAST (use_p, iter, lhs) > + { > + gimple *use_stmt = USE_STMT (use_p); > + if (is_gimple_debug (use_stmt)) > + continue; > + if (use_stmt == im2) > + continue; > + if (re2) > + return false; > + if (!uaddc_is_cplxpart (use_stmt, REALPART_EXPR)) > + return false; > + re2 = use_stmt; > + } > + gimple_stmt_iterator gsi2 = gsi_for_stmt (ovf2); > + gimple *g; > + if ((kind[1] & 1) == 0) > + { > + if (TREE_CODE (arg3) == INTEGER_CST) > + arg3 = fold_convert (type, arg3); > + else > + { > + g = gimple_build_assign (make_ssa_name (type), NOP_EXPR, arg3); > + gsi_insert_before (&gsi2, g, GSI_SAME_STMT); > + arg3 = gimple_assign_lhs (g); > + } > + } > + g = gimple_build_call_internal (ifn == IFN_ADD_OVERFLOW > + ? IFN_UADDC : IFN_USUBC, > + 3, arg1, arg2, arg3); > + tree nlhs = make_ssa_name (TREE_TYPE (lhs)); > + gimple_call_set_lhs (g, nlhs); > + gsi_insert_before (&gsi2, g, GSI_SAME_STMT); > + tree ilhs = rhs[2] ? 
make_ssa_name (type) : gimple_assign_lhs (stmt); > + g = gimple_build_assign (ilhs, IMAGPART_EXPR, > + build1 (IMAGPART_EXPR, TREE_TYPE (ilhs), nlhs)); > + if (rhs[2]) > + gsi_insert_before (gsi, g, GSI_SAME_STMT); > + else > + gsi_replace (gsi, g, true); > + tree rhs1 = rhs[1]; > + for (int i = 0; i < 2; i++) > + if (rhs1 == gimple_assign_lhs (im2)) > + break; > + else > + { > + g = SSA_NAME_DEF_STMT (rhs1); > + rhs1 = gimple_assign_rhs1 (g); > + gsi2 = gsi_for_stmt (g); > + gsi_remove (&gsi2, true); > + } > + gcc_checking_assert (rhs1 == gimple_assign_lhs (im2)); > + gsi2 = gsi_for_stmt (im2); > + gsi_remove (&gsi2, true); > + gsi2 = gsi_for_stmt (re2); > + tree rlhs = gimple_assign_lhs (re2); > + g = gimple_build_assign (rlhs, REALPART_EXPR, > + build1 (REALPART_EXPR, TREE_TYPE (rlhs), nlhs)); > + gsi_replace (&gsi2, g, true); > + if (rhs[2]) > + { > + g = gimple_build_call_internal (code == PLUS_EXPR > + ? IFN_UADDC : IFN_USUBC, > + 3, rhs[3], rhs[2], ilhs); > + nlhs = make_ssa_name (TREE_TYPE (lhs)); > + gimple_call_set_lhs (g, nlhs); > + gsi_insert_before (gsi, g, GSI_SAME_STMT); > + ilhs = gimple_assign_lhs (stmt); > + g = gimple_build_assign (ilhs, REALPART_EXPR, > + build1 (REALPART_EXPR, TREE_TYPE (ilhs), nlhs)); > + gsi_replace (gsi, g, true); > + } > + if (TREE_CODE (arg3) == SSA_NAME) > + { > + gimple *im3 = SSA_NAME_DEF_STMT (arg3); > + for (int i = 0; i < 2; ++i) > + { > + gimple *im4 = uaddc_cast (im3); > + if (im4 == im3) > + break; > + else > + im3 = im4; > + } > + im3 = uaddc_ne0 (im3); > + if (uaddc_is_cplxpart (im3, IMAGPART_EXPR)) > + { > + gimple *ovf3 > + = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im3), 0)); > + if (gimple_call_internal_p (ovf3, ifn)) > + { > + lhs = gimple_call_lhs (ovf3); > + arg1 = gimple_call_arg (ovf3, 0); > + arg2 = gimple_call_arg (ovf3, 1); > + if (types_compatible_p (type, TREE_TYPE (TREE_TYPE (lhs))) > + && types_compatible_p (type, TREE_TYPE (arg1)) > + && types_compatible_p (type, TREE_TYPE (arg2))) > 
+ { > + g = gimple_build_call_internal (ifn == IFN_ADD_OVERFLOW > + ? IFN_UADDC : IFN_USUBC, > + 3, arg1, arg2, > + build_zero_cst (type)); > + gimple_call_set_lhs (g, lhs); > + gsi2 = gsi_for_stmt (ovf3); > + gsi_replace (&gsi2, g, true); > + } > + } > + } > + } > + return true; > +} > + > /* Return true if target has support for divmod. */ > > static bool > @@ -5068,8 +5496,9 @@ math_opts_dom_walker::after_dom_children > > case PLUS_EXPR: > case MINUS_EXPR: > - if (!convert_plusminus_to_widen (&gsi, stmt, code)) > - match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p); > + if (!convert_plusminus_to_widen (&gsi, stmt, code) > + && !match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p)) > + match_uaddc_usubc (&gsi, stmt, code); > break; > > case BIT_NOT_EXPR: > @@ -5085,6 +5514,11 @@ math_opts_dom_walker::after_dom_children > convert_mult_to_highpart (as_a (stmt), &gsi); > break; > > + case BIT_IOR_EXPR: > + case BIT_XOR_EXPR: > + match_uaddc_usubc (&gsi, stmt, code); > + break; > + > default:; > } > } > --- gcc/gimple-fold.cc.jj 2023-06-07 09:41:49.117485950 +0200 > +++ gcc/gimple-fold.cc 2023-06-13 12:30:23.392968187 +0200 > @@ -5585,6 +5585,7 @@ gimple_fold_call (gimple_stmt_iterator * > enum tree_code subcode = ERROR_MARK; > tree result = NULL_TREE; > bool cplx_result = false; > + bool uaddc_usubc = false; > tree overflow = NULL_TREE; > switch (gimple_call_internal_fn (stmt)) > { > @@ -5658,6 +5659,16 @@ gimple_fold_call (gimple_stmt_iterator * > subcode = MULT_EXPR; > cplx_result = true; > break; > + case IFN_UADDC: > + subcode = PLUS_EXPR; > + cplx_result = true; > + uaddc_usubc = true; > + break; > + case IFN_USUBC: > + subcode = MINUS_EXPR; > + cplx_result = true; > + uaddc_usubc = true; > + break; > case IFN_MASK_LOAD: > changed |= gimple_fold_partial_load (gsi, stmt, true); > break; > @@ -5677,6 +5688,7 @@ gimple_fold_call (gimple_stmt_iterator * > { > tree arg0 = gimple_call_arg (stmt, 0); > tree arg1 = gimple_call_arg (stmt, 1); > + tree arg2 = 
NULL_TREE; > tree type = TREE_TYPE (arg0); > if (cplx_result) > { > @@ -5685,9 +5697,26 @@ gimple_fold_call (gimple_stmt_iterator * > type = NULL_TREE; > else > type = TREE_TYPE (TREE_TYPE (lhs)); > + if (uaddc_usubc) > + arg2 = gimple_call_arg (stmt, 2); > } > if (type == NULL_TREE) > ; > + else if (uaddc_usubc) > + { > + if (!integer_zerop (arg2)) > + ; > + /* x = y + 0 + 0; x = y - 0 - 0; */ > + else if (integer_zerop (arg1)) > + result = arg0; > + /* x = 0 + y + 0; */ > + else if (subcode != MINUS_EXPR && integer_zerop (arg0)) > + result = arg1; > + /* x = y - y - 0; */ > + else if (subcode == MINUS_EXPR > + && operand_equal_p (arg0, arg1, 0)) > + result = integer_zero_node; > + } > /* x = y + 0; x = y - 0; x = y * 0; */ > else if (integer_zerop (arg1)) > result = subcode == MULT_EXPR ? integer_zero_node : arg0; > @@ -5702,8 +5731,11 @@ gimple_fold_call (gimple_stmt_iterator * > result = arg0; > else if (subcode == MULT_EXPR && integer_onep (arg0)) > result = arg1; > - else if (TREE_CODE (arg0) == INTEGER_CST > - && TREE_CODE (arg1) == INTEGER_CST) > + if (type > + && result == NULL_TREE > + && TREE_CODE (arg0) == INTEGER_CST > + && TREE_CODE (arg1) == INTEGER_CST > + && (!uaddc_usubc || TREE_CODE (arg2) == INTEGER_CST)) > { > if (cplx_result) > result = int_const_binop (subcode, fold_convert (type, arg0), > @@ -5717,6 +5749,15 @@ gimple_fold_call (gimple_stmt_iterator * > else > result = NULL_TREE; > } > + if (uaddc_usubc && result) > + { > + tree r = int_const_binop (subcode, result, > + fold_convert (type, arg2)); > + if (r == NULL_TREE) > + result = NULL_TREE; > + else if (arith_overflowed_p (subcode, type, result, arg2)) > + overflow = build_one_cst (type); > + } > } > if (result) > { > --- gcc/gimple-range-fold.cc.jj 2023-06-07 09:41:49.125485839 +0200 > +++ gcc/gimple-range-fold.cc 2023-06-13 12:30:23.405968006 +0200 > @@ -489,6 +489,8 @@ adjust_imagpart_expr (vrange &res, const > case IFN_ADD_OVERFLOW: > case IFN_SUB_OVERFLOW: > case IFN_MUL_OVERFLOW: > 
+ case IFN_UADDC: > + case IFN_USUBC: > case IFN_ATOMIC_COMPARE_EXCHANGE: > { > int_range<2> r; > --- gcc/tree-ssa-dce.cc.jj 2023-06-07 09:41:49.272483796 +0200 > +++ gcc/tree-ssa-dce.cc 2023-06-13 12:30:23.415967865 +0200 > @@ -1481,6 +1481,14 @@ eliminate_unnecessary_stmts (bool aggres > case IFN_MUL_OVERFLOW: > maybe_optimize_arith_overflow (&gsi, MULT_EXPR); > break; > + case IFN_UADDC: > + if (integer_zerop (gimple_call_arg (stmt, 2))) > + maybe_optimize_arith_overflow (&gsi, PLUS_EXPR); > + break; > + case IFN_USUBC: > + if (integer_zerop (gimple_call_arg (stmt, 2))) > + maybe_optimize_arith_overflow (&gsi, MINUS_EXPR); > + break; > default: > break; > } > --- gcc/doc/md.texi.jj 2023-06-12 15:47:22.145507192 +0200 > +++ gcc/doc/md.texi 2023-06-13 13:09:50.699868708 +0200 > @@ -5224,6 +5224,22 @@ is taken only on unsigned overflow. > @item @samp{usubv@var{m}4}, @samp{umulv@var{m}4} > Similar, for other unsigned arithmetic operations. > > +@cindex @code{uaddc@var{m}5} instruction pattern > +@item @samp{uaddc@var{m}5} > +Adds unsigned operands 2, 3 and 4 (where the last operand is guaranteed to > +have only values 0 or 1) together, sets operand 0 to the result of the > +addition of the 3 operands and sets operand 1 to 1 iff there was > +overflow on the unsigned additions, and to 0 otherwise. So, it is > +an addition with carry in (operand 4) and carry out (operand 1). > +All operands have the same mode. > + > +@cindex @code{usubc@var{m}5} instruction pattern > +@item @samp{usubc@var{m}5} > +Similarly to @samp{uaddc@var{m}5}, except subtracts unsigned operands 3 > +and 4 from operand 2 instead of adding them. So, it is > +a subtraction with carry/borrow in (operand 4) and carry/borrow out > +(operand 1). All operands have the same mode. 
> + > @cindex @code{addptr@var{m}3} instruction pattern > @item @samp{addptr@var{m}3} > Like @code{add@var{m}3} but is guaranteed to only be used for address > --- gcc/config/i386/i386.md.jj 2023-06-12 15:47:21.894510663 +0200 > +++ gcc/config/i386/i386.md 2023-06-13 12:30:23.465967165 +0200 > @@ -7733,6 +7733,25 @@ (define_peephole2 > [(set (reg:CC FLAGS_REG) > (compare:CC (match_dup 0) (match_dup 1)))]) > > +(define_peephole2 > + [(set (match_operand:SWI 0 "general_reg_operand") > + (match_operand:SWI 1 "memory_operand")) > + (parallel [(set (reg:CC FLAGS_REG) > + (compare:CC (match_dup 0) > + (match_operand:SWI 2 "memory_operand"))) > + (set (match_dup 0) > + (minus:SWI (match_dup 0) (match_dup 2)))]) > + (set (match_dup 1) (match_dup 0))] > + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) > + && peep2_reg_dead_p (3, operands[0]) > + && !reg_overlap_mentioned_p (operands[0], operands[1]) > + && !reg_overlap_mentioned_p (operands[0], operands[2])" > + [(set (match_dup 0) (match_dup 2)) > + (parallel [(set (reg:CC FLAGS_REG) > + (compare:CC (match_dup 1) (match_dup 0))) > + (set (match_dup 1) > + (minus:SWI (match_dup 1) (match_dup 0)))])]) > + > ;; decl %eax; cmpl $-1, %eax; jne .Lxx; can be optimized into > ;; subl $1, %eax; jnc .Lxx; > (define_peephole2 > @@ -7818,6 +7837,59 @@ (define_insn "@add3_carry" > (set_attr "pent_pair" "pu") > (set_attr "mode" "")]) > > +(define_peephole2 > + [(set (match_operand:SWI 0 "general_reg_operand") > + (match_operand:SWI 1 "memory_operand")) > + (parallel [(set (match_dup 0) > + (plus:SWI > + (plus:SWI > + (match_operator:SWI 4 "ix86_carry_flag_operator" > + [(match_operand 3 "flags_reg_operand") > + (const_int 0)]) > + (match_dup 0)) > + (match_operand:SWI 2 "memory_operand"))) > + (clobber (reg:CC FLAGS_REG))]) > + (set (match_dup 1) (match_dup 0))] > + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) > + && peep2_reg_dead_p (3, operands[0]) > + && !reg_overlap_mentioned_p (operands[0], operands[1]) > 
+ && !reg_overlap_mentioned_p (operands[0], operands[2])" > + [(set (match_dup 0) (match_dup 2)) > + (parallel [(set (match_dup 1) > + (plus:SWI (plus:SWI (match_op_dup 4 > + [(match_dup 3) (const_int 0)]) > + (match_dup 1)) > + (match_dup 0))) > + (clobber (reg:CC FLAGS_REG))])]) > + > +(define_peephole2 > + [(set (match_operand:SWI 0 "general_reg_operand") > + (match_operand:SWI 1 "memory_operand")) > + (parallel [(set (match_dup 0) > + (plus:SWI > + (plus:SWI > + (match_operator:SWI 4 "ix86_carry_flag_operator" > + [(match_operand 3 "flags_reg_operand") > + (const_int 0)]) > + (match_dup 0)) > + (match_operand:SWI 2 "memory_operand"))) > + (clobber (reg:CC FLAGS_REG))]) > + (set (match_operand:SWI 5 "general_reg_operand") (match_dup 0)) > + (set (match_dup 1) (match_dup 5))] > + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) > + && peep2_reg_dead_p (3, operands[0]) > + && peep2_reg_dead_p (4, operands[5]) > + && !reg_overlap_mentioned_p (operands[0], operands[1]) > + && !reg_overlap_mentioned_p (operands[0], operands[2]) > + && !reg_overlap_mentioned_p (operands[5], operands[1])" > + [(set (match_dup 0) (match_dup 2)) > + (parallel [(set (match_dup 1) > + (plus:SWI (plus:SWI (match_op_dup 4 > + [(match_dup 3) (const_int 0)]) > + (match_dup 1)) > + (match_dup 0))) > + (clobber (reg:CC FLAGS_REG))])]) > + > (define_insn "*add3_carry_0" > [(set (match_operand:SWI 0 "nonimmediate_operand" "=m") > (plus:SWI > @@ -7918,6 +7990,159 @@ (define_insn "addcarry" > (set_attr "pent_pair" "pu") > (set_attr "mode" "")]) > > +;; Helper peephole2 for the addcarry and subborrow > +;; peephole2s, to optimize away nop which resulted from uaddc/usubc > +;; expansion optimization. 
> +(define_peephole2 > + [(set (match_operand:SWI48 0 "general_reg_operand") > + (match_operand:SWI48 1 "memory_operand")) > + (const_int 0)] > + "" > + [(set (match_dup 0) (match_dup 1))]) > + > +(define_peephole2 > + [(parallel [(set (reg:CCC FLAGS_REG) > + (compare:CCC > + (zero_extend: > + (plus:SWI48 > + (plus:SWI48 > + (match_operator:SWI48 4 "ix86_carry_flag_operator" > + [(match_operand 2 "flags_reg_operand") > + (const_int 0)]) > + (match_operand:SWI48 0 "general_reg_operand")) > + (match_operand:SWI48 1 "memory_operand"))) > + (plus: > + (zero_extend: (match_dup 1)) > + (match_operator: 3 "ix86_carry_flag_operator" > + [(match_dup 2) (const_int 0)])))) > + (set (match_dup 0) > + (plus:SWI48 (plus:SWI48 (match_op_dup 4 > + [(match_dup 2) (const_int 0)]) > + (match_dup 0)) > + (match_dup 1)))]) > + (set (match_dup 1) (match_dup 0))] > + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) > + && peep2_reg_dead_p (2, operands[0]) > + && !reg_overlap_mentioned_p (operands[0], operands[1])" > + [(parallel [(set (reg:CCC FLAGS_REG) > + (compare:CCC > + (zero_extend: > + (plus:SWI48 > + (plus:SWI48 > + (match_op_dup 4 > + [(match_dup 2) (const_int 0)]) > + (match_dup 1)) > + (match_dup 0))) > + (plus: > + (zero_extend: (match_dup 0)) > + (match_op_dup 3 > + [(match_dup 2) (const_int 0)])))) > + (set (match_dup 1) > + (plus:SWI48 (plus:SWI48 (match_op_dup 4 > + [(match_dup 2) (const_int 0)]) > + (match_dup 1)) > + (match_dup 0)))])]) > + > +(define_peephole2 > + [(set (match_operand:SWI48 0 "general_reg_operand") > + (match_operand:SWI48 1 "memory_operand")) > + (parallel [(set (reg:CCC FLAGS_REG) > + (compare:CCC > + (zero_extend: > + (plus:SWI48 > + (plus:SWI48 > + (match_operator:SWI48 5 "ix86_carry_flag_operator" > + [(match_operand 3 "flags_reg_operand") > + (const_int 0)]) > + (match_dup 0)) > + (match_operand:SWI48 2 "memory_operand"))) > + (plus: > + (zero_extend: (match_dup 2)) > + (match_operator: 4 "ix86_carry_flag_operator" > + [(match_dup 3) 
(const_int 0)])))) > + (set (match_dup 0) > + (plus:SWI48 (plus:SWI48 (match_op_dup 5 > + [(match_dup 3) (const_int 0)]) > + (match_dup 0)) > + (match_dup 2)))]) > + (set (match_dup 1) (match_dup 0))] > + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) > + && peep2_reg_dead_p (3, operands[0]) > + && !reg_overlap_mentioned_p (operands[0], operands[1]) > + && !reg_overlap_mentioned_p (operands[0], operands[2])" > + [(set (match_dup 0) (match_dup 2)) > + (parallel [(set (reg:CCC FLAGS_REG) > + (compare:CCC > + (zero_extend: > + (plus:SWI48 > + (plus:SWI48 > + (match_op_dup 5 > + [(match_dup 3) (const_int 0)]) > + (match_dup 1)) > + (match_dup 0))) > + (plus: > + (zero_extend: (match_dup 0)) > + (match_op_dup 4 > + [(match_dup 3) (const_int 0)])))) > + (set (match_dup 1) > + (plus:SWI48 (plus:SWI48 (match_op_dup 5 > + [(match_dup 3) (const_int 0)]) > + (match_dup 1)) > + (match_dup 0)))])]) > + > +(define_peephole2 > + [(parallel [(set (reg:CCC FLAGS_REG) > + (compare:CCC > + (zero_extend: > + (plus:SWI48 > + (plus:SWI48 > + (match_operator:SWI48 4 "ix86_carry_flag_operator" > + [(match_operand 2 "flags_reg_operand") > + (const_int 0)]) > + (match_operand:SWI48 0 "general_reg_operand")) > + (match_operand:SWI48 1 "memory_operand"))) > + (plus: > + (zero_extend: (match_dup 1)) > + (match_operator: 3 "ix86_carry_flag_operator" > + [(match_dup 2) (const_int 0)])))) > + (set (match_dup 0) > + (plus:SWI48 (plus:SWI48 (match_op_dup 4 > + [(match_dup 2) (const_int 0)]) > + (match_dup 0)) > + (match_dup 1)))]) > + (set (match_operand:QI 5 "general_reg_operand") > + (ltu:QI (reg:CCC FLAGS_REG) (const_int 0))) > + (set (match_operand:SWI48 6 "general_reg_operand") > + (zero_extend:SWI48 (match_dup 5))) > + (set (match_dup 1) (match_dup 0))] > + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) > + && peep2_reg_dead_p (4, operands[0]) > + && !reg_overlap_mentioned_p (operands[0], operands[1]) > + && !reg_overlap_mentioned_p (operands[0], operands[5]) > + && 
!reg_overlap_mentioned_p (operands[5], operands[1]) > + && !reg_overlap_mentioned_p (operands[0], operands[6]) > + && !reg_overlap_mentioned_p (operands[6], operands[1])" > + [(parallel [(set (reg:CCC FLAGS_REG) > + (compare:CCC > + (zero_extend: > + (plus:SWI48 > + (plus:SWI48 > + (match_op_dup 4 > + [(match_dup 2) (const_int 0)]) > + (match_dup 1)) > + (match_dup 0))) > + (plus: > + (zero_extend: (match_dup 0)) > + (match_op_dup 3 > + [(match_dup 2) (const_int 0)])))) > + (set (match_dup 1) > + (plus:SWI48 (plus:SWI48 (match_op_dup 4 > + [(match_dup 2) (const_int 0)]) > + (match_dup 1)) > + (match_dup 0)))]) > + (set (match_dup 5) (ltu:QI (reg:CCC FLAGS_REG) (const_int 0))) > + (set (match_dup 6) (zero_extend:SWI48 (match_dup 5)))]) > + > (define_expand "addcarry_0" > [(parallel > [(set (reg:CCC FLAGS_REG) > @@ -7988,6 +8213,59 @@ (define_insn "@sub3_carry" > (set_attr "pent_pair" "pu") > (set_attr "mode" "")]) > > +(define_peephole2 > + [(set (match_operand:SWI 0 "general_reg_operand") > + (match_operand:SWI 1 "memory_operand")) > + (parallel [(set (match_dup 0) > + (minus:SWI > + (minus:SWI > + (match_dup 0) > + (match_operator:SWI 4 "ix86_carry_flag_operator" > + [(match_operand 3 "flags_reg_operand") > + (const_int 0)])) > + (match_operand:SWI 2 "memory_operand"))) > + (clobber (reg:CC FLAGS_REG))]) > + (set (match_dup 1) (match_dup 0))] > + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) > + && peep2_reg_dead_p (3, operands[0]) > + && !reg_overlap_mentioned_p (operands[0], operands[1]) > + && !reg_overlap_mentioned_p (operands[0], operands[2])" > + [(set (match_dup 0) (match_dup 2)) > + (parallel [(set (match_dup 1) > + (minus:SWI (minus:SWI (match_dup 1) > + (match_op_dup 4 > + [(match_dup 3) (const_int 0)])) > + (match_dup 0))) > + (clobber (reg:CC FLAGS_REG))])]) > + > +(define_peephole2 > + [(set (match_operand:SWI 0 "general_reg_operand") > + (match_operand:SWI 1 "memory_operand")) > + (parallel [(set (match_dup 0) > + (minus:SWI > + 
(minus:SWI > + (match_dup 0) > + (match_operator:SWI 4 "ix86_carry_flag_operator" > + [(match_operand 3 "flags_reg_operand") > + (const_int 0)])) > + (match_operand:SWI 2 "memory_operand"))) > + (clobber (reg:CC FLAGS_REG))]) > + (set (match_operand:SWI 5 "general_reg_operand") (match_dup 0)) > + (set (match_dup 1) (match_dup 5))] > + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) > + && peep2_reg_dead_p (3, operands[0]) > + && peep2_reg_dead_p (4, operands[5]) > + && !reg_overlap_mentioned_p (operands[0], operands[1]) > + && !reg_overlap_mentioned_p (operands[0], operands[2]) > + && !reg_overlap_mentioned_p (operands[5], operands[1])" > + [(set (match_dup 0) (match_dup 2)) > + (parallel [(set (match_dup 1) > + (minus:SWI (minus:SWI (match_dup 1) > + (match_op_dup 4 > + [(match_dup 3) (const_int 0)])) > + (match_dup 0))) > + (clobber (reg:CC FLAGS_REG))])]) > + > (define_insn "*sub3_carry_0" > [(set (match_operand:SWI 0 "nonimmediate_operand" "=m") > (minus:SWI > @@ -8113,13 +8391,13 @@ (define_insn "subborrow" > [(set (reg:CCC FLAGS_REG) > (compare:CCC > (zero_extend: > - (match_operand:SWI48 1 "nonimmediate_operand" "0")) > + (match_operand:SWI48 1 "nonimmediate_operand" "0,0")) > (plus: > (match_operator: 4 "ix86_carry_flag_operator" > [(match_operand 3 "flags_reg_operand") (const_int 0)]) > (zero_extend: > - (match_operand:SWI48 2 "nonimmediate_operand" "rm"))))) > - (set (match_operand:SWI48 0 "register_operand" "=r") > + (match_operand:SWI48 2 "nonimmediate_operand" "r,rm"))))) > + (set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r") > (minus:SWI48 (minus:SWI48 > (match_dup 1) > (match_operator:SWI48 5 "ix86_carry_flag_operator" > @@ -8132,6 +8410,154 @@ (define_insn "subborrow" > (set_attr "pent_pair" "pu") > (set_attr "mode" "")]) > > +(define_peephole2 > + [(set (match_operand:SWI48 0 "general_reg_operand") > + (match_operand:SWI48 1 "memory_operand")) > + (parallel [(set (reg:CCC FLAGS_REG) > + (compare:CCC > + (zero_extend: (match_dup 
0)) > + (plus: > + (match_operator: 4 "ix86_carry_flag_operator" > + [(match_operand 3 "flags_reg_operand") (const_int 0)]) > + (zero_extend: > + (match_operand:SWI48 2 "memory_operand"))))) > + (set (match_dup 0) > + (minus:SWI48 > + (minus:SWI48 > + (match_dup 0) > + (match_operator:SWI48 5 "ix86_carry_flag_operator" > + [(match_dup 3) (const_int 0)])) > + (match_dup 2)))]) > + (set (match_dup 1) (match_dup 0))] > + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) > + && peep2_reg_dead_p (3, operands[0]) > + && !reg_overlap_mentioned_p (operands[0], operands[1]) > + && !reg_overlap_mentioned_p (operands[0], operands[2])" > + [(set (match_dup 0) (match_dup 2)) > + (parallel [(set (reg:CCC FLAGS_REG) > + (compare:CCC > + (zero_extend: (match_dup 1)) > + (plus: (match_op_dup 4 > + [(match_dup 3) (const_int 0)]) > + (zero_extend: (match_dup 0))))) > + (set (match_dup 1) > + (minus:SWI48 (minus:SWI48 (match_dup 1) > + (match_op_dup 5 > + [(match_dup 3) (const_int 0)])) > + (match_dup 0)))])]) > + > +(define_peephole2 > + [(set (match_operand:SWI48 6 "general_reg_operand") > + (match_operand:SWI48 7 "memory_operand")) > + (set (match_operand:SWI48 8 "general_reg_operand") > + (match_operand:SWI48 9 "memory_operand")) > + (parallel [(set (reg:CCC FLAGS_REG) > + (compare:CCC > + (zero_extend: > + (match_operand:SWI48 0 "general_reg_operand")) > + (plus: > + (match_operator: 4 "ix86_carry_flag_operator" > + [(match_operand 3 "flags_reg_operand") (const_int 0)]) > + (zero_extend: > + (match_operand:SWI48 2 "general_reg_operand"))))) > + (set (match_dup 0) > + (minus:SWI48 > + (minus:SWI48 > + (match_dup 0) > + (match_operator:SWI48 5 "ix86_carry_flag_operator" > + [(match_dup 3) (const_int 0)])) > + (match_dup 2)))]) > + (set (match_operand:SWI48 1 "memory_operand") (match_dup 0))] > + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) > + && peep2_reg_dead_p (4, operands[0]) > + && peep2_reg_dead_p (3, operands[2]) > + && !reg_overlap_mentioned_p 
(operands[0], operands[1]) > + && !reg_overlap_mentioned_p (operands[2], operands[1]) > + && !reg_overlap_mentioned_p (operands[6], operands[9]) > + && (rtx_equal_p (operands[6], operands[0]) > + ? (rtx_equal_p (operands[7], operands[1]) > + && rtx_equal_p (operands[8], operands[2])) > + : (rtx_equal_p (operands[8], operands[0]) > + && rtx_equal_p (operands[9], operands[1]) > + && rtx_equal_p (operands[6], operands[2])))" > + [(set (match_dup 0) (match_dup 9)) > + (parallel [(set (reg:CCC FLAGS_REG) > + (compare:CCC > + (zero_extend: (match_dup 1)) > + (plus: (match_op_dup 4 > + [(match_dup 3) (const_int 0)]) > + (zero_extend: (match_dup 0))))) > + (set (match_dup 1) > + (minus:SWI48 (minus:SWI48 (match_dup 1) > + (match_op_dup 5 > + [(match_dup 3) (const_int 0)])) > + (match_dup 0)))])] > +{ > + if (!rtx_equal_p (operands[6], operands[0])) > + operands[9] = operands[7]; > +}) > + > +(define_peephole2 > + [(set (match_operand:SWI48 6 "general_reg_operand") > + (match_operand:SWI48 7 "memory_operand")) > + (set (match_operand:SWI48 8 "general_reg_operand") > + (match_operand:SWI48 9 "memory_operand")) > + (parallel [(set (reg:CCC FLAGS_REG) > + (compare:CCC > + (zero_extend: > + (match_operand:SWI48 0 "general_reg_operand")) > + (plus: > + (match_operator: 4 "ix86_carry_flag_operator" > + [(match_operand 3 "flags_reg_operand") (const_int 0)]) > + (zero_extend: > + (match_operand:SWI48 2 "general_reg_operand"))))) > + (set (match_dup 0) > + (minus:SWI48 > + (minus:SWI48 > + (match_dup 0) > + (match_operator:SWI48 5 "ix86_carry_flag_operator" > + [(match_dup 3) (const_int 0)])) > + (match_dup 2)))]) > + (set (match_operand:QI 10 "general_reg_operand") > + (ltu:QI (reg:CCC FLAGS_REG) (const_int 0))) > + (set (match_operand:SWI48 11 "general_reg_operand") > + (zero_extend:SWI48 (match_dup 10))) > + (set (match_operand:SWI48 1 "memory_operand") (match_dup 0))] > + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) > + && peep2_reg_dead_p (6, operands[0]) > + && 
peep2_reg_dead_p (3, operands[2])
> +   && !reg_overlap_mentioned_p (operands[0], operands[1])
> +   && !reg_overlap_mentioned_p (operands[2], operands[1])
> +   && !reg_overlap_mentioned_p (operands[6], operands[9])
> +   && !reg_overlap_mentioned_p (operands[0], operands[10])
> +   && !reg_overlap_mentioned_p (operands[10], operands[1])
> +   && !reg_overlap_mentioned_p (operands[0], operands[11])
> +   && !reg_overlap_mentioned_p (operands[11], operands[1])
> +   && (rtx_equal_p (operands[6], operands[0])
> +       ? (rtx_equal_p (operands[7], operands[1])
> +          && rtx_equal_p (operands[8], operands[2]))
> +       : (rtx_equal_p (operands[8], operands[0])
> +          && rtx_equal_p (operands[9], operands[1])
> +          && rtx_equal_p (operands[6], operands[2])))"
> +  [(set (match_dup 0) (match_dup 9))
> +   (parallel [(set (reg:CCC FLAGS_REG)
> +                   (compare:CCC
> +                     (zero_extend:<DWI> (match_dup 1))
> +                     (plus:<DWI> (match_op_dup 4
> +                                    [(match_dup 3) (const_int 0)])
> +                                 (zero_extend:<DWI> (match_dup 0)))))
> +              (set (match_dup 1)
> +                   (minus:SWI48 (minus:SWI48 (match_dup 1)
> +                                             (match_op_dup 5
> +                                               [(match_dup 3) (const_int 0)]))
> +                                (match_dup 0)))])
> +   (set (match_dup 10) (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
> +   (set (match_dup 11) (zero_extend:SWI48 (match_dup 10)))]
> +{
> +  if (!rtx_equal_p (operands[6], operands[0]))
> +    operands[9] = operands[7];
> +})
> +
>  (define_expand "subborrow<mode>_0"
>    [(parallel
>       [(set (reg:CC FLAGS_REG)
> @@ -8142,6 +8568,67 @@ (define_expand "subborrow<mode>_0"
>           (minus:SWI48 (match_dup 1) (match_dup 2)))])]
>    "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)")
> +
> +(define_expand "uaddc<mode>5"
> +  [(match_operand:SWI48 0 "register_operand")
> +   (match_operand:SWI48 1 "register_operand")
> +   (match_operand:SWI48 2 "register_operand")
> +   (match_operand:SWI48 3 "register_operand")
> +   (match_operand:SWI48 4 "nonmemory_operand")]
> +  ""
> +{
> +  rtx cf = gen_rtx_REG (CCCmode, FLAGS_REG), pat, pat2;
> +  if (operands[4] == const0_rtx)
> +    emit_insn (gen_addcarry<mode>_0 (operands[0], operands[2], operands[3]));
> +  else
> +    {
> +      rtx op4 = copy_to_mode_reg (QImode,
> +                                  convert_to_mode (QImode, operands[4], 1));
> +      emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx));
> +      pat = gen_rtx_LTU (<DWI>mode, cf, const0_rtx);
> +      pat2 = gen_rtx_LTU (<MODE>mode, cf, const0_rtx);
> +      emit_insn (gen_addcarry<mode> (operands[0], operands[2], operands[3],
> +                                     cf, pat, pat2));
> +    }
> +  rtx cc = gen_reg_rtx (QImode);
> +  pat = gen_rtx_LTU (QImode, cf, const0_rtx);
> +  emit_insn (gen_rtx_SET (cc, pat));
> +  emit_insn (gen_zero_extendqi<mode>2 (operands[1], cc));
> +  DONE;
> +})
> +
> +(define_expand "usubc<mode>5"
> +  [(match_operand:SWI48 0 "register_operand")
> +   (match_operand:SWI48 1 "register_operand")
> +   (match_operand:SWI48 2 "register_operand")
> +   (match_operand:SWI48 3 "register_operand")
> +   (match_operand:SWI48 4 "nonmemory_operand")]
> +  ""
> +{
> +  rtx cf, pat, pat2;
> +  if (operands[4] == const0_rtx)
> +    {
> +      cf = gen_rtx_REG (CCmode, FLAGS_REG);
> +      emit_insn (gen_subborrow<mode>_0 (operands[0], operands[2],
> +                                        operands[3]));
> +    }
> +  else
> +    {
> +      cf = gen_rtx_REG (CCCmode, FLAGS_REG);
> +      rtx op4 = copy_to_mode_reg (QImode,
> +                                  convert_to_mode (QImode, operands[4], 1));
> +      emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx));
> +      pat = gen_rtx_LTU (<DWI>mode, cf, const0_rtx);
> +      pat2 = gen_rtx_LTU (<MODE>mode, cf, const0_rtx);
> +      emit_insn (gen_subborrow<mode> (operands[0], operands[2], operands[3],
> +                                      cf, pat, pat2));
> +    }
> +  rtx cc = gen_reg_rtx (QImode);
> +  pat = gen_rtx_LTU (QImode, cf, const0_rtx);
> +  emit_insn (gen_rtx_SET (cc, pat));
> +  emit_insn (gen_zero_extendqi<mode>2 (operands[1], cc));
> +  DONE;
> +})
> +
>  (define_mode_iterator CC_CCC [CC CCC])
> 
>  ;; Pre-reload splitter to optimize
> @@ -8239,6 +8726,27 @@ (define_peephole2
>              (compare:CCC
>                (plus:SWI (match_dup 1) (match_dup 0))
>                (match_dup 1)))
> +        (set (match_dup 1) (plus:SWI (match_dup 1) (match_dup 0)))])])
> +
> +(define_peephole2
> +  [(set (match_operand:SWI 0 "general_reg_operand")
> +        (match_operand:SWI 1 "memory_operand"))
> +
(parallel [(set (reg:CCC FLAGS_REG) > + (compare:CCC > + (plus:SWI (match_dup 0) > + (match_operand:SWI 2 "memory_operand")) > + (match_dup 0))) > + (set (match_dup 0) (plus:SWI (match_dup 0) (match_dup 2)))]) > + (set (match_dup 1) (match_dup 0))] > + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) > + && peep2_reg_dead_p (3, operands[0]) > + && !reg_overlap_mentioned_p (operands[0], operands[1]) > + && !reg_overlap_mentioned_p (operands[0], operands[2])" > + [(set (match_dup 0) (match_dup 2)) > + (parallel [(set (reg:CCC FLAGS_REG) > + (compare:CCC > + (plus:SWI (match_dup 1) (match_dup 0)) > + (match_dup 1))) > (set (match_dup 1) (plus:SWI (match_dup 1) (match_dup 0)))])]) > > (define_insn "*addsi3_zext_cc_overflow_1" > --- gcc/testsuite/gcc.target/i386/pr79173-1.c.jj 2023-06-13 12:30:23.466967151 +0200 > +++ gcc/testsuite/gcc.target/i386/pr79173-1.c 2023-06-13 12:30:23.466967151 +0200 > @@ -0,0 +1,59 @@ > +/* PR middle-end/79173 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */ > +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times 
"adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > + > +static unsigned long > +uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out) > +{ > + unsigned long r; > + unsigned long c1 = __builtin_add_overflow (x, y, &r); > + unsigned long c2 = __builtin_add_overflow (r, carry_in, &r); > + *carry_out = c1 + c2; > + return r; > +} > + > +static unsigned long > +usubc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out) > +{ > + unsigned long r; > + unsigned long c1 = __builtin_sub_overflow (x, y, &r); > + unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r); > + *carry_out = c1 + c2; > + return r; > +} > + > +void > +foo (unsigned long *p, unsigned long *q) > +{ > + unsigned long c; > + p[0] = uaddc (p[0], q[0], 0, &c); > + p[1] = uaddc (p[1], q[1], c, &c); > + p[2] = uaddc (p[2], q[2], c, &c); > + p[3] = uaddc (p[3], q[3], c, &c); > +} > + > +void > +bar (unsigned long *p, unsigned long *q) > +{ > + unsigned long c; > + p[0] = usubc (p[0], q[0], 0, &c); > + p[1] = usubc (p[1], q[1], c, &c); > + p[2] = usubc (p[2], q[2], c, &c); > + p[3] = usubc (p[3], q[3], c, &c); > +} > --- gcc/testsuite/gcc.target/i386/pr79173-2.c.jj 2023-06-13 12:30:23.466967151 +0200 > +++ gcc/testsuite/gcc.target/i386/pr79173-2.c 2023-06-13 12:30:23.466967151 +0200 > @@ 
-0,0 +1,59 @@ > +/* PR middle-end/79173 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */ > +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > + > +static unsigned long > +uaddc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out) > +{ > + unsigned long r; > + _Bool c1 = __builtin_add_overflow (x, y, &r); > + _Bool c2 = 
__builtin_add_overflow (r, carry_in, &r); > + *carry_out = c1 | c2; > + return r; > +} > + > +static unsigned long > +usubc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out) > +{ > + unsigned long r; > + _Bool c1 = __builtin_sub_overflow (x, y, &r); > + _Bool c2 = __builtin_sub_overflow (r, carry_in, &r); > + *carry_out = c1 | c2; > + return r; > +} > + > +void > +foo (unsigned long *p, unsigned long *q) > +{ > + _Bool c; > + p[0] = uaddc (p[0], q[0], 0, &c); > + p[1] = uaddc (p[1], q[1], c, &c); > + p[2] = uaddc (p[2], q[2], c, &c); > + p[3] = uaddc (p[3], q[3], c, &c); > +} > + > +void > +bar (unsigned long *p, unsigned long *q) > +{ > + _Bool c; > + p[0] = usubc (p[0], q[0], 0, &c); > + p[1] = usubc (p[1], q[1], c, &c); > + p[2] = usubc (p[2], q[2], c, &c); > + p[3] = usubc (p[3], q[3], c, &c); > +} > --- gcc/testsuite/gcc.target/i386/pr79173-3.c.jj 2023-06-13 12:30:23.467967137 +0200 > +++ gcc/testsuite/gcc.target/i386/pr79173-3.c 2023-06-13 12:30:23.467967137 +0200 > @@ -0,0 +1,61 @@ > +/* PR middle-end/79173 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */ > +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times 
"addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > + > +static unsigned long > +uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out) > +{ > + unsigned long r; > + unsigned long c1 = __builtin_add_overflow (x, y, &r); > + unsigned long c2 = __builtin_add_overflow (r, carry_in, &r); > + *carry_out = c1 + c2; > + return r; > +} > + > +static unsigned long > +usubc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out) > +{ > + unsigned long r; > + unsigned long c1 = __builtin_sub_overflow (x, y, &r); > + unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r); > + *carry_out = c1 + c2; > + return r; > +} > + > +unsigned long > +foo (unsigned long *p, unsigned long *q) > +{ > + unsigned long c; > + p[0] = uaddc (p[0], q[0], 0, &c); > + p[1] = uaddc (p[1], q[1], c, &c); > + p[2] = uaddc (p[2], q[2], c, &c); > + p[3] = uaddc (p[3], q[3], c, &c); > + return c; > +} > + > +unsigned long > +bar (unsigned long *p, unsigned long *q) > +{ > + unsigned long c; > + p[0] = usubc (p[0], q[0], 0, &c); > + p[1] = usubc (p[1], q[1], c, &c); > + p[2] = usubc (p[2], q[2], c, &c); > + p[3] = usubc (p[3], q[3], c, &c); > + return c; > +} > --- 
gcc/testsuite/gcc.target/i386/pr79173-4.c.jj 2023-06-13 12:30:23.467967137 +0200 > +++ gcc/testsuite/gcc.target/i386/pr79173-4.c 2023-06-13 12:30:23.467967137 +0200 > @@ -0,0 +1,61 @@ > +/* PR middle-end/79173 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */ > +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > + > +static unsigned long > +uaddc 
(unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out) > +{ > + unsigned long r; > + _Bool c1 = __builtin_add_overflow (x, y, &r); > + _Bool c2 = __builtin_add_overflow (r, carry_in, &r); > + *carry_out = c1 ^ c2; > + return r; > +} > + > +static unsigned long > +usubc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out) > +{ > + unsigned long r; > + _Bool c1 = __builtin_sub_overflow (x, y, &r); > + _Bool c2 = __builtin_sub_overflow (r, carry_in, &r); > + *carry_out = c1 ^ c2; > + return r; > +} > + > +_Bool > +foo (unsigned long *p, unsigned long *q) > +{ > + _Bool c; > + p[0] = uaddc (p[0], q[0], 0, &c); > + p[1] = uaddc (p[1], q[1], c, &c); > + p[2] = uaddc (p[2], q[2], c, &c); > + p[3] = uaddc (p[3], q[3], c, &c); > + return c; > +} > + > +_Bool > +bar (unsigned long *p, unsigned long *q) > +{ > + _Bool c; > + p[0] = usubc (p[0], q[0], 0, &c); > + p[1] = usubc (p[1], q[1], c, &c); > + p[2] = usubc (p[2], q[2], c, &c); > + p[3] = usubc (p[3], q[3], c, &c); > + return c; > +} > --- gcc/testsuite/gcc.target/i386/pr79173-5.c.jj 2023-06-13 12:30:23.467967137 +0200 > +++ gcc/testsuite/gcc.target/i386/pr79173-5.c 2023-06-13 12:30:23.467967137 +0200 > @@ -0,0 +1,32 @@ > +/* PR middle-end/79173 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */ > +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times 
"adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > + > +static unsigned long > +uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out) > +{ > + unsigned long r = x + y; > + unsigned long c1 = r < x; > + r += carry_in; > + unsigned long c2 = r < carry_in; > + *carry_out = c1 + c2; > + return r; > +} > + > +void > +foo (unsigned long *p, unsigned long *q) > +{ > + unsigned long c; > + p[0] = uaddc (p[0], q[0], 0, &c); > + p[1] = uaddc (p[1], q[1], c, &c); > + p[2] = uaddc (p[2], q[2], c, &c); > + p[3] = uaddc (p[3], q[3], c, &c); > +} > --- gcc/testsuite/gcc.target/i386/pr79173-6.c.jj 2023-06-13 12:30:23.467967137 +0200 > +++ gcc/testsuite/gcc.target/i386/pr79173-6.c 2023-06-13 12:30:23.467967137 +0200 > @@ -0,0 +1,33 @@ > +/* PR middle-end/79173 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */ > +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ > + > +static unsigned long > +uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out) > +{ > + unsigned long r = x + y; > + 
unsigned long c1 = r < x; > + r += carry_in; > + unsigned long c2 = r < carry_in; > + *carry_out = c1 + c2; > + return r; > +} > + > +unsigned long > +foo (unsigned long *p, unsigned long *q) > +{ > + unsigned long c; > + p[0] = uaddc (p[0], q[0], 0, &c); > + p[1] = uaddc (p[1], q[1], c, &c); > + p[2] = uaddc (p[2], q[2], c, &c); > + p[3] = uaddc (p[3], q[3], c, &c); > + return c; > +} > --- gcc/testsuite/gcc.target/i386/pr79173-7.c.jj 2023-06-13 12:30:23.468967123 +0200 > +++ gcc/testsuite/gcc.target/i386/pr79173-7.c 2023-06-13 12:30:23.468967123 +0200 > @@ -0,0 +1,31 @@ > +/* PR middle-end/79173 */ > +/* { dg-do compile { target lp64 } } */ > +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */ > +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */ > + > +#include <x86intrin.h> > + > +void > +foo (unsigned long long *p, unsigned long long *q) > +{ > + unsigned char c = _addcarry_u64 (0, p[0], q[0], &p[0]); > + c = _addcarry_u64 (c, p[1], q[1], &p[1]); > + c = _addcarry_u64 (c, p[2], q[2], &p[2]); > + _addcarry_u64 (c, p[3], q[3], &p[3]); > +} > + > +void > +bar (unsigned long long *p, unsigned long long *q) > +{ > + unsigned char c = _subborrow_u64 (0, p[0], q[0], &p[0]); > + c = _subborrow_u64 (c, p[1], q[1], &p[1]); > + c = _subborrow_u64 (c, p[2], q[2], &p[2]); > + _subborrow_u64 (c, p[3], q[3], &p[3]); > +} > --- gcc/testsuite/gcc.target/i386/pr79173-8.c.jj
2023-06-13 12:30:23.468967123 +0200 > +++ gcc/testsuite/gcc.target/i386/pr79173-8.c 2023-06-13 12:30:23.468967123 +0200 > @@ -0,0 +1,31 @@ > +/* PR middle-end/79173 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */ > +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */ > + > +#include <x86intrin.h> > + > +void > +foo (unsigned int *p, unsigned int *q) > +{ > + unsigned char c = _addcarry_u32 (0, p[0], q[0], &p[0]); > + c = _addcarry_u32 (c, p[1], q[1], &p[1]); > + c = _addcarry_u32 (c, p[2], q[2], &p[2]); > + _addcarry_u32 (c, p[3], q[3], &p[3]); > +} > + > +void > +bar (unsigned int *p, unsigned int *q) > +{ > + unsigned char c = _subborrow_u32 (0, p[0], q[0], &p[0]); > + c = _subborrow_u32 (c, p[1], q[1], &p[1]); > + c = _subborrow_u32 (c, p[2], q[2], &p[2]); > + _subborrow_u32 (c, p[3], q[3], &p[3]); > +} > --- gcc/testsuite/gcc.target/i386/pr79173-9.c.jj 2023-06-13 12:30:23.468967123 +0200 > +++ gcc/testsuite/gcc.target/i386/pr79173-9.c 2023-06-13 12:30:23.468967123 +0200 > @@ -0,0 +1,31 @@ > +/* PR middle-end/79173 */ > +/* { dg-do compile { target lp64 } } */ > +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */ > +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } }
*/ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */ > + > +#include <x86intrin.h> > + > +unsigned long long > +foo (unsigned long long *p, unsigned long long *q) > +{ > + unsigned char c = _addcarry_u64 (0, p[0], q[0], &p[0]); > + c = _addcarry_u64 (c, p[1], q[1], &p[1]); > + c = _addcarry_u64 (c, p[2], q[2], &p[2]); > + return _addcarry_u64 (c, p[3], q[3], &p[3]); > +} > + > +unsigned long long > +bar (unsigned long long *p, unsigned long long *q) > +{ > + unsigned char c = _subborrow_u64 (0, p[0], q[0], &p[0]); > + c = _subborrow_u64 (c, p[1], q[1], &p[1]); > + c = _subborrow_u64 (c, p[2], q[2], &p[2]); > + return _subborrow_u64 (c, p[3], q[3], &p[3]); > +} > --- gcc/testsuite/gcc.target/i386/pr79173-10.c.jj 2023-06-13 12:30:23.468967123 +0200 > +++ gcc/testsuite/gcc.target/i386/pr79173-10.c 2023-06-13 12:30:23.468967123 +0200 > @@ -0,0 +1,31 @@ > +/* PR middle-end/79173 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */ > +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times
"sbbl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */ > +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */ > + > +#include <x86intrin.h> > + > +unsigned int > +foo (unsigned int *p, unsigned int *q) > +{ > + unsigned char c = _addcarry_u32 (0, p[0], q[0], &p[0]); > + c = _addcarry_u32 (c, p[1], q[1], &p[1]); > + c = _addcarry_u32 (c, p[2], q[2], &p[2]); > + return _addcarry_u32 (c, p[3], q[3], &p[3]); > +} > + > +unsigned int > +bar (unsigned int *p, unsigned int *q) > +{ > + unsigned char c = _subborrow_u32 (0, p[0], q[0], &p[0]); > + c = _subborrow_u32 (c, p[1], q[1], &p[1]); > + c = _subborrow_u32 (c, p[2], q[2], &p[2]); > + return _subborrow_u32 (c, p[3], q[3], &p[3]); > +} > > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)