Date: Tue, 13 Jun 2023 13:29:04 +0200
From: Jakub Jelinek
To: Richard Biener
Cc: Uros Bizjak, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173]

On Tue, Jun 13, 2023 at 08:40:36AM +0000, Richard Biener wrote:
> I suspect re-association can wreck things even more here.  I have
> to say the matching code is very hard to follow, not sure if
> splitting out a function matching
>
>    _22 = .{ADD,SUB}_OVERFLOW (_6, _5);
>    _23 = REALPART_EXPR <_22>;
>    _24 = IMAGPART_EXPR <_22>;
>
> from _23 and _24 would help?

I've outlined the 3 most often used sequences of statements or checks
into 3 helper functions, hope that helps.

> > +      while (TREE_CODE (rhs[0]) == SSA_NAME && !rhs[3])
> > +        {
> > +          gimple *g = SSA_NAME_DEF_STMT (rhs[0]);
> > +          if (has_single_use (rhs[0])
> > +              && is_gimple_assign (g)
> > +              && (gimple_assign_rhs_code (g) == code
> > +                  || (code == MINUS_EXPR
> > +                      && gimple_assign_rhs_code (g) == PLUS_EXPR
> > +                      && TREE_CODE (gimple_assign_rhs2 (g)) == INTEGER_CST)))
> > +            {
> > +              rhs[0] = gimple_assign_rhs1 (g);
> > +              tree &r = rhs[2] ? rhs[3] : rhs[2];
> > +              r = gimple_assign_rhs2 (g);
> > +              if (gimple_assign_rhs_code (g) != code)
> > +                r = fold_build1 (NEGATE_EXPR, TREE_TYPE (r), r);
>
> Can you use const_unop here?  In fact both will not reliably
> negate all constants (ick), so maybe we want a force_const_negate ()?

It is an unsigned type NEGATE_EXPR of INTEGER_CST, so I think it should
work.  That said, I've changed it to const_unop, and if const_unop doesn't
simplify, I just give up on it as if it wasn't a PLUS_EXPR with an
INTEGER_CST addend.

> > +      else if (addc_subc)
> > +        {
> > +          if (!integer_zerop (arg2))
> > +            ;
> > +          /* x = y + 0 + 0; x = y - 0 - 0; */
> > +          else if (integer_zerop (arg1))
> > +            result = arg0;
> > +          /* x = 0 + y + 0; */
> > +          else if (subcode != MINUS_EXPR && integer_zerop (arg0))
> > +            result = arg1;
> > +          /* x = y - y - 0; */
> > +          else if (subcode == MINUS_EXPR
> > +                   && operand_equal_p (arg0, arg1, 0))
> > +            result = integer_zero_node;
> > +        }
>
> So this all performs simplifications but also constant folding.  In
> particular the match.pd re-simplification will invoke fold_const_call
> on all-constant argument function calls but does not do extra folding
> on partially constant arg cases but instead relies on patterns here.
>
> Can you add all-constant arg handling to fold_const_call and
> consider moving cases like y + 0 + 0 to match.pd?

The reason I've done this here is that this is the spot where all other
similar internal functions are handled, be it the ubsan ones
- IFN_UBSAN_CHECK_{ADD,SUB,MUL}, or __builtin_*_overflow ones
- IFN_{ADD,SUB,MUL}_OVERFLOW, or these 2 new ones.  The code there handles
2 constant arguments as well as various patterns that can be simplified
and has code to clean it up later, build a COMPLEX_CST, or COMPLEX_EXPR
etc. as needed.  So, I think if we want to handle those elsewhere, we
should do it for all of those functions, but then probably incrementally.
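
For reference, the identities handled by the quoted gimple_fold_call
block can be checked against a small C model.  This is only a sketch: it
assumes the semantics given in the md.texi text of this patch (a result
plus a 0/1 carry or borrow out), and the names model_uaddc, model_usubc
and fold_identities_hold are hypothetical and not part of the patch.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model of .UADDC: returns a + b + carry_in and stores 1 in
   *carry_out iff one of the two unsigned additions wrapped around.
   carry_in is assumed to be 0 or 1.  */
unsigned long
model_uaddc (unsigned long a, unsigned long b, unsigned long carry_in,
             unsigned long *carry_out)
{
  unsigned long t, r;
  bool c1 = __builtin_add_overflow (a, b, &t);
  bool c2 = __builtin_add_overflow (t, carry_in, &r);
  /* With carry_in in [0, 1] at most one of c1 and c2 can be set,
     so |, ^ and + would all give the same carry out.  */
  *carry_out = c1 | c2;
  return r;
}

/* Hypothetical model of .USUBC: returns a - b - borrow_in and stores 1 in
   *borrow_out iff one of the two unsigned subtractions wrapped around.
   borrow_in is assumed to be 0 or 1.  */
unsigned long
model_usubc (unsigned long a, unsigned long b, unsigned long borrow_in,
             unsigned long *borrow_out)
{
  unsigned long t, r;
  bool b1 = __builtin_sub_overflow (a, b, &t);
  bool b2 = __builtin_sub_overflow (t, borrow_in, &r);
  *borrow_out = b1 | b2;
  return r;
}

/* Check the three simplifications from the quoted block for one value Y:
   y + 0 + 0 and y - 0 - 0 give y, 0 + y + 0 gives y, y - y - 0 gives 0,
   each time with no carry or borrow out.  */
bool
fold_identities_hold (unsigned long y)
{
  unsigned long c;
  if (model_uaddc (y, 0, 0, &c) != y || c != 0)
    return false;
  if (model_usubc (y, 0, 0, &c) != y || c != 0)
    return false;
  if (model_uaddc (0, y, 0, &c) != y || c != 0)
    return false;
  if (model_usubc (y, y, 0, &c) != 0 || c != 0)
    return false;
  return true;
}
```

The model says nothing about where such folding should live
(gimple_fold_call, fold_const_call or match.pd); it only shows that the
identities hold when the carry-in argument is zero.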

> > +@cindex @code{addc@var{m}5} instruction pattern
> > +@item @samp{addc@var{m}5}
> > +Adds operands 2, 3 and 4 (where the last operand is guaranteed to have
> > +only values 0 or 1) together, sets operand 0 to the result of the
> > +addition of the 3 operands and sets operand 1 to 1 iff there was no
> > +overflow on the unsigned additions, and to 0 otherwise.  So, it is
> > +an addition with carry in (operand 4) and carry out (operand 1).
> > +All operands have the same mode.
>
> operand 1 set to 1 for no overflow sounds weird when specifying it
> as carry out - can you double check?

Fixed.

> > +@cindex @code{subc@var{m}5} instruction pattern
> > +@item @samp{subc@var{m}5}
> > +Similarly to @samp{addc@var{m}5}, except subtracts operands 3 and 4
> > +from operand 2 instead of adding them.  So, it is
> > +a subtraction with carry/borrow in (operand 4) and carry/borrow out
> > +(operand 1).  All operands have the same mode.
> > +
>
> I wonder if we want to name them uaddc and usubc?  Or is this supposed
> to be simply the twos-complement "carry"?  I think the docs should
> say so then (note we do have uaddv and addv).

Makes sense, I've actually renamed even the internal functions etc.

Here is an only lightly tested patch with everything but gimple-fold.cc
changed.

2023-06-13  Jakub Jelinek

	PR middle-end/79173
	* internal-fn.def (UADDC, USUBC): New internal functions.
	* internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
	(commutative_ternary_fn_p): Return true also for IFN_UADDC.
	* optabs.def (uaddc5_optab, usubc5_optab): New optabs.
	* tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
	match_uaddc_usubc): New functions.
	(math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
	for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
	other optimizations have been successful for those.
	* gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC.
	* gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
	* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
	* doc/md.texi (uaddc5, usubc5): Document new named patterns.
	* config/i386/i386.md (subborrow): Add alternative with
	memory destination.
	(uaddc5, usubc5): New define_expand patterns.
	(*sub_3, @add3_carry, addcarry, @sub3_carry, subborrow,
	*add3_cc_overflow_1): Add define_peephole2
	TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
	destination in these patterns.

	* gcc.target/i386/pr79173-1.c: New test.
	* gcc.target/i386/pr79173-2.c: New test.
	* gcc.target/i386/pr79173-3.c: New test.
	* gcc.target/i386/pr79173-4.c: New test.
	* gcc.target/i386/pr79173-5.c: New test.
	* gcc.target/i386/pr79173-6.c: New test.
	* gcc.target/i386/pr79173-7.c: New test.
	* gcc.target/i386/pr79173-8.c: New test.
	* gcc.target/i386/pr79173-9.c: New test.
	* gcc.target/i386/pr79173-10.c: New test.

--- gcc/internal-fn.def.jj	2023-06-12 15:47:22.190506569 +0200
+++ gcc/internal-fn.def	2023-06-13 12:30:22.951974357 +0200
@@ -416,6 +416,8 @@ DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LE
 DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (UADDC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (USUBC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
 DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
--- gcc/internal-fn.cc.jj	2023-06-07 09:42:14.680130597 +0200
+++ gcc/internal-fn.cc	2023-06-13 12:30:23.361968621 +0200
@@ -2776,6 +2776,44 @@ expand_MUL_OVERFLOW (internal_fn, gcall
   expand_arith_overflow (MULT_EXPR, stmt);
 }
 
+/* Expand UADDC STMT.  */
+
+static void
+expand_UADDC (internal_fn ifn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg1 = gimple_call_arg (stmt, 0);
+  tree arg2 = gimple_call_arg (stmt, 1);
+  tree arg3 = gimple_call_arg (stmt, 2);
+  tree type = TREE_TYPE (arg1);
+  machine_mode mode = TYPE_MODE (type);
+  insn_code icode = optab_handler (ifn == IFN_UADDC
+                                   ? uaddc5_optab : usubc5_optab, mode);
+  rtx op1 = expand_normal (arg1);
+  rtx op2 = expand_normal (arg2);
+  rtx op3 = expand_normal (arg3);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx re = gen_reg_rtx (mode);
+  rtx im = gen_reg_rtx (mode);
+  class expand_operand ops[5];
+  create_output_operand (&ops[0], re, mode);
+  create_output_operand (&ops[1], im, mode);
+  create_input_operand (&ops[2], op1, mode);
+  create_input_operand (&ops[3], op2, mode);
+  create_input_operand (&ops[4], op3, mode);
+  expand_insn (icode, 5, ops);
+  write_complex_part (target, re, false, false);
+  write_complex_part (target, im, true, false);
+}
+
+/* Expand USUBC STMT.  */
+
+static void
+expand_USUBC (internal_fn ifn, gcall *stmt)
+{
+  expand_UADDC (ifn, stmt);
+}
+
 /* This should get folded in tree-vectorizer.cc.  */
 
 static void
@@ -4049,6 +4087,7 @@ commutative_ternary_fn_p (internal_fn fn
     case IFN_FMS:
     case IFN_FNMA:
     case IFN_FNMS:
+    case IFN_UADDC:
       return true;
 
     default:
--- gcc/optabs.def.jj	2023-06-12 15:47:22.261505587 +0200
+++ gcc/optabs.def	2023-06-13 12:30:23.372968467 +0200
@@ -260,6 +260,8 @@ OPTAB_D (uaddv4_optab, "uaddv$I$a4")
 OPTAB_D (usubv4_optab, "usubv$I$a4")
 OPTAB_D (umulv4_optab, "umulv$I$a4")
 OPTAB_D (negv3_optab, "negv$I$a3")
+OPTAB_D (uaddc5_optab, "uaddc$I$a5")
+OPTAB_D (usubc5_optab, "usubc$I$a5")
 OPTAB_D (addptr3_optab, "addptr$a3")
 OPTAB_D (spaceship_optab, "spaceship$a3")
 
--- gcc/tree-ssa-math-opts.cc.jj	2023-06-07 09:41:49.573479611 +0200
+++ gcc/tree-ssa-math-opts.cc	2023-06-13 13:04:43.699152339 +0200
@@ -4441,6 +4441,434 @@ match_arith_overflow (gimple_stmt_iterat
   return false;
 }
 
+/* Helper of match_uaddc_usubc.  Look through an integral cast
+   which should preserve [0, 1] range value (unless source has
+   1-bit signed type) and the cast has single use.  */
+
+static gimple *
+uaddc_cast (gimple *g)
+{
+  if (!gimple_assign_cast_p (g))
+    return g;
+  tree op = gimple_assign_rhs1 (g);
+  if (TREE_CODE (op) == SSA_NAME
+      && INTEGRAL_TYPE_P (TREE_TYPE (op))
+      && (TYPE_PRECISION (TREE_TYPE (op)) > 1
+          || TYPE_UNSIGNED (TREE_TYPE (op)))
+      && has_single_use (gimple_assign_lhs (g)))
+    return SSA_NAME_DEF_STMT (op);
+  return g;
+}
+
+/* Helper of match_uaddc_usubc.  Look through a NE_EXPR
+   comparison with 0 which also preserves [0, 1] value range.  */
+
+static gimple *
+uaddc_ne0 (gimple *g)
+{
+  if (is_gimple_assign (g)
+      && gimple_assign_rhs_code (g) == NE_EXPR
+      && integer_zerop (gimple_assign_rhs2 (g))
+      && TREE_CODE (gimple_assign_rhs1 (g)) == SSA_NAME
+      && has_single_use (gimple_assign_lhs (g)))
+    return SSA_NAME_DEF_STMT (gimple_assign_rhs1 (g));
+  return g;
+}
+
+/* Return true if G is {REAL,IMAG}PART_EXPR PART with SSA_NAME
+   operand.  */
+
+static bool
+uaddc_is_cplxpart (gimple *g, tree_code part)
+{
+  return (is_gimple_assign (g)
+          && gimple_assign_rhs_code (g) == part
+          && TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (g), 0)) == SSA_NAME);
+}
+
+/* Try to match e.g.
+   _29 = .ADD_OVERFLOW (_3, _4);
+   _30 = REALPART_EXPR <_29>;
+   _31 = IMAGPART_EXPR <_29>;
+   _32 = .ADD_OVERFLOW (_30, _38);
+   _33 = REALPART_EXPR <_32>;
+   _34 = IMAGPART_EXPR <_32>;
+   _35 = _31 + _34;
+   as
+   _36 = .UADDC (_3, _4, _38);
+   _33 = REALPART_EXPR <_36>;
+   _35 = IMAGPART_EXPR <_36>;
+   or
+   _22 = .SUB_OVERFLOW (_6, _5);
+   _23 = REALPART_EXPR <_22>;
+   _24 = IMAGPART_EXPR <_22>;
+   _25 = .SUB_OVERFLOW (_23, _37);
+   _26 = REALPART_EXPR <_25>;
+   _27 = IMAGPART_EXPR <_25>;
+   _28 = _24 | _27;
+   as
+   _29 = .USUBC (_6, _5, _37);
+   _26 = REALPART_EXPR <_29>;
+   _28 = IMAGPART_EXPR <_29>;
+   provided _38 or _37 above have [0, 1] range
+   and _3, _4 and _30 or _6, _5 and _23 are unsigned
+   integral types with the same precision.  Whether + or | or ^ is
+   used on the IMAGPART_EXPR results doesn't matter, with one of
+   added or subtracted operands in [0, 1] range at most one
+   .ADD_OVERFLOW or .SUB_OVERFLOW will indicate overflow.  */
+
+static bool
+match_uaddc_usubc (gimple_stmt_iterator *gsi, gimple *stmt, tree_code code)
+{
+  tree rhs[4];
+  rhs[0] = gimple_assign_rhs1 (stmt);
+  rhs[1] = gimple_assign_rhs2 (stmt);
+  rhs[2] = NULL_TREE;
+  rhs[3] = NULL_TREE;
+  tree type = TREE_TYPE (rhs[0]);
+  if (!INTEGRAL_TYPE_P (type) || !TYPE_UNSIGNED (type))
+    return false;
+
+  if (code != BIT_IOR_EXPR && code != BIT_XOR_EXPR)
+    {
+      /* If overflow flag is ignored on the MSB limb, we can end up with
+         the most significant limb handled as r = op1 + op2 + ovf1 + ovf2;
+         or r = op1 - op2 - ovf1 - ovf2; or various equivalent expressions
+         thereof.  Handle those like the ovf = ovf1 + ovf2; case to recognize
+         the limb below the MSB, but also create another .UADDC/.USUBC call
+         for the last limb.
*/ + while (TREE_CODE (rhs[0]) == SSA_NAME && !rhs[3]) + { + gimple *g = SSA_NAME_DEF_STMT (rhs[0]); + if (has_single_use (rhs[0]) + && is_gimple_assign (g) + && (gimple_assign_rhs_code (g) == code + || (code == MINUS_EXPR + && gimple_assign_rhs_code (g) == PLUS_EXPR + && TREE_CODE (gimple_assign_rhs2 (g)) == INTEGER_CST))) + { + tree r2 = gimple_assign_rhs2 (g); + if (gimple_assign_rhs_code (g) != code) + { + r2 = const_unop (NEGATE_EXPR, TREE_TYPE (r2), r2); + if (!r2) + break; + } + rhs[0] = gimple_assign_rhs1 (g); + tree &r = rhs[2] ? rhs[3] : rhs[2]; + r = r2; + } + else + break; + } + while (TREE_CODE (rhs[1]) == SSA_NAME && !rhs[3]) + { + gimple *g = SSA_NAME_DEF_STMT (rhs[1]); + if (has_single_use (rhs[1]) + && is_gimple_assign (g) + && gimple_assign_rhs_code (g) == PLUS_EXPR) + { + rhs[1] = gimple_assign_rhs1 (g); + if (rhs[2]) + rhs[3] = gimple_assign_rhs2 (g); + else + rhs[2] = gimple_assign_rhs2 (g); + } + else + break; + } + if (rhs[2] && !rhs[3]) + { + for (int i = (code == MINUS_EXPR ? 1 : 0); i < 3; ++i) + if (TREE_CODE (rhs[i]) == SSA_NAME) + { + gimple *im = uaddc_cast (SSA_NAME_DEF_STMT (rhs[i])); + im = uaddc_ne0 (im); + if (uaddc_is_cplxpart (im, IMAGPART_EXPR)) + { + tree rhs1 = gimple_assign_rhs1 (im); + gimple *ovf = SSA_NAME_DEF_STMT (TREE_OPERAND (rhs1, 0)); + if (gimple_call_internal_p (ovf, code == PLUS_EXPR + ? IFN_UADDC : IFN_USUBC) + && (optab_handler (code == PLUS_EXPR + ? uaddc5_optab : usubc5_optab, + TYPE_MODE (type)) + != CODE_FOR_nothing)) + { + if (i != 2) + std::swap (rhs[i], rhs[2]); + gimple *g + = gimple_build_call_internal (code == PLUS_EXPR + ? 
IFN_UADDC + : IFN_USUBC, + 3, rhs[0], rhs[1], + rhs[2]); + tree nlhs = make_ssa_name (build_complex_type (type)); + gimple_call_set_lhs (g, nlhs); + gsi_insert_before (gsi, g, GSI_SAME_STMT); + tree ilhs = gimple_assign_lhs (stmt); + g = gimple_build_assign (ilhs, REALPART_EXPR, + build1 (REALPART_EXPR, + TREE_TYPE (ilhs), + nlhs)); + gsi_replace (gsi, g, true); + return true; + } + } + } + return false; + } + if (code == MINUS_EXPR && !rhs[2]) + return false; + if (code == MINUS_EXPR) + /* Code below expects rhs[0] and rhs[1] to have the IMAGPART_EXPRs. + So, for MINUS_EXPR swap the single added rhs operand (others are + subtracted) to rhs[3]. */ + std::swap (rhs[0], rhs[3]); + } + gimple *im1 = NULL, *im2 = NULL; + for (int i = 0; i < (code == MINUS_EXPR ? 3 : 4); i++) + if (rhs[i] && TREE_CODE (rhs[i]) == SSA_NAME) + { + gimple *im = uaddc_cast (SSA_NAME_DEF_STMT (rhs[i])); + im = uaddc_ne0 (im); + if (uaddc_is_cplxpart (im, IMAGPART_EXPR)) + { + if (im1 == NULL) + { + im1 = im; + if (i != 0) + std::swap (rhs[0], rhs[i]); + } + else + { + im2 = im; + if (i != 1) + std::swap (rhs[1], rhs[i]); + break; + } + } + } + if (!im2) + return false; + gimple *ovf1 + = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im1), 0)); + gimple *ovf2 + = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im2), 0)); + internal_fn ifn; + if (!is_gimple_call (ovf1) + || !gimple_call_internal_p (ovf1) + || ((ifn = gimple_call_internal_fn (ovf1)) != IFN_ADD_OVERFLOW + && ifn != IFN_SUB_OVERFLOW) + || !gimple_call_internal_p (ovf2, ifn) + || optab_handler (ifn == IFN_ADD_OVERFLOW ? uaddc5_optab : usubc5_optab, + TYPE_MODE (type)) == CODE_FOR_nothing + || (rhs[2] + && optab_handler (code == PLUS_EXPR ? uaddc5_optab : usubc5_optab, + TYPE_MODE (type)) == CODE_FOR_nothing)) + return false; + tree arg1, arg2, arg3 = NULL_TREE; + gimple *re1 = NULL, *re2 = NULL; + for (int i = (ifn == IFN_ADD_OVERFLOW ? 1 : 0); i >= 0; --i) + for (gimple *ovf = ovf1; ovf; ovf = (ovf == ovf1 ? 
ovf2 : NULL)) + { + tree arg = gimple_call_arg (ovf, i); + if (TREE_CODE (arg) != SSA_NAME) + continue; + re1 = SSA_NAME_DEF_STMT (arg); + if (uaddc_is_cplxpart (re1, REALPART_EXPR) + && (SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (re1), 0)) + == (ovf == ovf1 ? ovf2 : ovf1))) + { + if (ovf == ovf1) + { + std::swap (rhs[0], rhs[1]); + std::swap (im1, im2); + std::swap (ovf1, ovf2); + } + arg3 = gimple_call_arg (ovf, 1 - i); + i = -1; + break; + } + } + if (!arg3) + return false; + arg1 = gimple_call_arg (ovf1, 0); + arg2 = gimple_call_arg (ovf1, 1); + if (!types_compatible_p (type, TREE_TYPE (arg1))) + return false; + int kind[2] = { 0, 0 }; + /* At least one of arg2 and arg3 should have type compatible + with arg1/rhs[0], and the other one should have value in [0, 1] + range. */ + for (int i = 0; i < 2; ++i) + { + tree arg = i == 0 ? arg2 : arg3; + if (types_compatible_p (type, TREE_TYPE (arg))) + kind[i] = 1; + if (!INTEGRAL_TYPE_P (TREE_TYPE (arg)) + || (TYPE_PRECISION (TREE_TYPE (arg)) == 1 + && !TYPE_UNSIGNED (TREE_TYPE (arg)))) + continue; + if (tree_zero_one_valued_p (arg)) + kind[i] |= 2; + if (TREE_CODE (arg) == SSA_NAME) + { + gimple *g = SSA_NAME_DEF_STMT (arg); + if (gimple_assign_cast_p (g)) + { + tree op = gimple_assign_rhs1 (g); + if (TREE_CODE (op) == SSA_NAME + && INTEGRAL_TYPE_P (TREE_TYPE (op))) + g = SSA_NAME_DEF_STMT (op); + } + g = uaddc_ne0 (g); + if (!uaddc_is_cplxpart (g, IMAGPART_EXPR)) + continue; + g = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (g), 0)); + if (!is_gimple_call (g) || !gimple_call_internal_p (g)) + continue; + switch (gimple_call_internal_fn (g)) + { + case IFN_ADD_OVERFLOW: + case IFN_SUB_OVERFLOW: + case IFN_UADDC: + case IFN_USUBC: + break; + default: + continue; + } + kind[i] |= 4; + } + } + /* Make arg2 the one with compatible type and arg3 the one + with [0, 1] range. If both is true for both operands, + prefer as arg3 result of __imag__ of some ifn. 
*/ + if ((kind[0] & 1) == 0 || ((kind[1] & 1) != 0 && kind[0] > kind[1])) + { + std::swap (arg2, arg3); + std::swap (kind[0], kind[1]); + } + if ((kind[0] & 1) == 0 || (kind[1] & 6) == 0) + return false; + if (!has_single_use (gimple_assign_lhs (im1)) + || !has_single_use (gimple_assign_lhs (im2)) + || !has_single_use (gimple_assign_lhs (re1)) + || num_imm_uses (gimple_call_lhs (ovf1)) != 2) + return false; + use_operand_p use_p; + imm_use_iterator iter; + tree lhs = gimple_call_lhs (ovf2); + FOR_EACH_IMM_USE_FAST (use_p, iter, lhs) + { + gimple *use_stmt = USE_STMT (use_p); + if (is_gimple_debug (use_stmt)) + continue; + if (use_stmt == im2) + continue; + if (re2) + return false; + if (!uaddc_is_cplxpart (use_stmt, REALPART_EXPR)) + return false; + re2 = use_stmt; + } + gimple_stmt_iterator gsi2 = gsi_for_stmt (ovf2); + gimple *g; + if ((kind[1] & 1) == 0) + { + if (TREE_CODE (arg3) == INTEGER_CST) + arg3 = fold_convert (type, arg3); + else + { + g = gimple_build_assign (make_ssa_name (type), NOP_EXPR, arg3); + gsi_insert_before (&gsi2, g, GSI_SAME_STMT); + arg3 = gimple_assign_lhs (g); + } + } + g = gimple_build_call_internal (ifn == IFN_ADD_OVERFLOW + ? IFN_UADDC : IFN_USUBC, + 3, arg1, arg2, arg3); + tree nlhs = make_ssa_name (TREE_TYPE (lhs)); + gimple_call_set_lhs (g, nlhs); + gsi_insert_before (&gsi2, g, GSI_SAME_STMT); + tree ilhs = rhs[2] ? 
make_ssa_name (type) : gimple_assign_lhs (stmt); + g = gimple_build_assign (ilhs, IMAGPART_EXPR, + build1 (IMAGPART_EXPR, TREE_TYPE (ilhs), nlhs)); + if (rhs[2]) + gsi_insert_before (gsi, g, GSI_SAME_STMT); + else + gsi_replace (gsi, g, true); + tree rhs1 = rhs[1]; + for (int i = 0; i < 2; i++) + if (rhs1 == gimple_assign_lhs (im2)) + break; + else + { + g = SSA_NAME_DEF_STMT (rhs1); + rhs1 = gimple_assign_rhs1 (g); + gsi2 = gsi_for_stmt (g); + gsi_remove (&gsi2, true); + } + gcc_checking_assert (rhs1 == gimple_assign_lhs (im2)); + gsi2 = gsi_for_stmt (im2); + gsi_remove (&gsi2, true); + gsi2 = gsi_for_stmt (re2); + tree rlhs = gimple_assign_lhs (re2); + g = gimple_build_assign (rlhs, REALPART_EXPR, + build1 (REALPART_EXPR, TREE_TYPE (rlhs), nlhs)); + gsi_replace (&gsi2, g, true); + if (rhs[2]) + { + g = gimple_build_call_internal (code == PLUS_EXPR + ? IFN_UADDC : IFN_USUBC, + 3, rhs[3], rhs[2], ilhs); + nlhs = make_ssa_name (TREE_TYPE (lhs)); + gimple_call_set_lhs (g, nlhs); + gsi_insert_before (gsi, g, GSI_SAME_STMT); + ilhs = gimple_assign_lhs (stmt); + g = gimple_build_assign (ilhs, REALPART_EXPR, + build1 (REALPART_EXPR, TREE_TYPE (ilhs), nlhs)); + gsi_replace (gsi, g, true); + } + if (TREE_CODE (arg3) == SSA_NAME) + { + gimple *im3 = SSA_NAME_DEF_STMT (arg3); + for (int i = 0; i < 2; ++i) + { + gimple *im4 = uaddc_cast (im3); + if (im4 == im3) + break; + else + im3 = im4; + } + im3 = uaddc_ne0 (im3); + if (uaddc_is_cplxpart (im3, IMAGPART_EXPR)) + { + gimple *ovf3 + = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im3), 0)); + if (gimple_call_internal_p (ovf3, ifn)) + { + lhs = gimple_call_lhs (ovf3); + arg1 = gimple_call_arg (ovf3, 0); + arg2 = gimple_call_arg (ovf3, 1); + if (types_compatible_p (type, TREE_TYPE (TREE_TYPE (lhs))) + && types_compatible_p (type, TREE_TYPE (arg1)) + && types_compatible_p (type, TREE_TYPE (arg2))) + { + g = gimple_build_call_internal (ifn == IFN_ADD_OVERFLOW + ? 
IFN_UADDC : IFN_USUBC, + 3, arg1, arg2, + build_zero_cst (type)); + gimple_call_set_lhs (g, lhs); + gsi2 = gsi_for_stmt (ovf3); + gsi_replace (&gsi2, g, true); + } + } + } + } + return true; +} + /* Return true if target has support for divmod. */ static bool @@ -5068,8 +5496,9 @@ math_opts_dom_walker::after_dom_children case PLUS_EXPR: case MINUS_EXPR: - if (!convert_plusminus_to_widen (&gsi, stmt, code)) - match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p); + if (!convert_plusminus_to_widen (&gsi, stmt, code) + && !match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p)) + match_uaddc_usubc (&gsi, stmt, code); break; case BIT_NOT_EXPR: @@ -5085,6 +5514,11 @@ math_opts_dom_walker::after_dom_children convert_mult_to_highpart (as_a (stmt), &gsi); break; + case BIT_IOR_EXPR: + case BIT_XOR_EXPR: + match_uaddc_usubc (&gsi, stmt, code); + break; + default:; } } --- gcc/gimple-fold.cc.jj 2023-06-07 09:41:49.117485950 +0200 +++ gcc/gimple-fold.cc 2023-06-13 12:30:23.392968187 +0200 @@ -5585,6 +5585,7 @@ gimple_fold_call (gimple_stmt_iterator * enum tree_code subcode = ERROR_MARK; tree result = NULL_TREE; bool cplx_result = false; + bool uaddc_usubc = false; tree overflow = NULL_TREE; switch (gimple_call_internal_fn (stmt)) { @@ -5658,6 +5659,16 @@ gimple_fold_call (gimple_stmt_iterator * subcode = MULT_EXPR; cplx_result = true; break; + case IFN_UADDC: + subcode = PLUS_EXPR; + cplx_result = true; + uaddc_usubc = true; + break; + case IFN_USUBC: + subcode = MINUS_EXPR; + cplx_result = true; + uaddc_usubc = true; + break; case IFN_MASK_LOAD: changed |= gimple_fold_partial_load (gsi, stmt, true); break; @@ -5677,6 +5688,7 @@ gimple_fold_call (gimple_stmt_iterator * { tree arg0 = gimple_call_arg (stmt, 0); tree arg1 = gimple_call_arg (stmt, 1); + tree arg2 = NULL_TREE; tree type = TREE_TYPE (arg0); if (cplx_result) { @@ -5685,9 +5697,26 @@ gimple_fold_call (gimple_stmt_iterator * type = NULL_TREE; else type = TREE_TYPE (TREE_TYPE (lhs)); + if (uaddc_usubc) + arg2 = 
gimple_call_arg (stmt, 2); } if (type == NULL_TREE) ; + else if (uaddc_usubc) + { + if (!integer_zerop (arg2)) + ; + /* x = y + 0 + 0; x = y - 0 - 0; */ + else if (integer_zerop (arg1)) + result = arg0; + /* x = 0 + y + 0; */ + else if (subcode != MINUS_EXPR && integer_zerop (arg0)) + result = arg1; + /* x = y - y - 0; */ + else if (subcode == MINUS_EXPR + && operand_equal_p (arg0, arg1, 0)) + result = integer_zero_node; + } /* x = y + 0; x = y - 0; x = y * 0; */ else if (integer_zerop (arg1)) result = subcode == MULT_EXPR ? integer_zero_node : arg0; @@ -5702,8 +5731,11 @@ gimple_fold_call (gimple_stmt_iterator * result = arg0; else if (subcode == MULT_EXPR && integer_onep (arg0)) result = arg1; - else if (TREE_CODE (arg0) == INTEGER_CST - && TREE_CODE (arg1) == INTEGER_CST) + if (type + && result == NULL_TREE + && TREE_CODE (arg0) == INTEGER_CST + && TREE_CODE (arg1) == INTEGER_CST + && (!uaddc_usubc || TREE_CODE (arg2) == INTEGER_CST)) { if (cplx_result) result = int_const_binop (subcode, fold_convert (type, arg0), @@ -5717,6 +5749,15 @@ gimple_fold_call (gimple_stmt_iterator * else result = NULL_TREE; } + if (uaddc_usubc && result) + { + tree r = int_const_binop (subcode, result, + fold_convert (type, arg2)); + if (r == NULL_TREE) + result = NULL_TREE; + else if (arith_overflowed_p (subcode, type, result, arg2)) + overflow = build_one_cst (type); + } } if (result) { --- gcc/gimple-range-fold.cc.jj 2023-06-07 09:41:49.125485839 +0200 +++ gcc/gimple-range-fold.cc 2023-06-13 12:30:23.405968006 +0200 @@ -489,6 +489,8 @@ adjust_imagpart_expr (vrange &res, const case IFN_ADD_OVERFLOW: case IFN_SUB_OVERFLOW: case IFN_MUL_OVERFLOW: + case IFN_UADDC: + case IFN_USUBC: case IFN_ATOMIC_COMPARE_EXCHANGE: { int_range<2> r; --- gcc/tree-ssa-dce.cc.jj 2023-06-07 09:41:49.272483796 +0200 +++ gcc/tree-ssa-dce.cc 2023-06-13 12:30:23.415967865 +0200 @@ -1481,6 +1481,14 @@ eliminate_unnecessary_stmts (bool aggres case IFN_MUL_OVERFLOW: maybe_optimize_arith_overflow (&gsi, 
MULT_EXPR); break; + case IFN_UADDC: + if (integer_zerop (gimple_call_arg (stmt, 2))) + maybe_optimize_arith_overflow (&gsi, PLUS_EXPR); + break; + case IFN_USUBC: + if (integer_zerop (gimple_call_arg (stmt, 2))) + maybe_optimize_arith_overflow (&gsi, MINUS_EXPR); + break; default: break; } --- gcc/doc/md.texi.jj 2023-06-12 15:47:22.145507192 +0200 +++ gcc/doc/md.texi 2023-06-13 13:09:50.699868708 +0200 @@ -5224,6 +5224,22 @@ is taken only on unsigned overflow. @item @samp{usubv@var{m}4}, @samp{umulv@var{m}4} Similar, for other unsigned arithmetic operations. +@cindex @code{uaddc@var{m}5} instruction pattern +@item @samp{uaddc@var{m}5} +Adds unsigned operands 2, 3 and 4 (where the last operand is guaranteed to +have only values 0 or 1) together, sets operand 0 to the result of the +addition of the 3 operands and sets operand 1 to 1 iff there was +overflow on the unsigned additions, and to 0 otherwise. So, it is +an addition with carry in (operand 4) and carry out (operand 1). +All operands have the same mode. + +@cindex @code{usubc@var{m}5} instruction pattern +@item @samp{usubc@var{m}5} +Similarly to @samp{uaddc@var{m}5}, except subtracts unsigned operands 3 +and 4 from operand 2 instead of adding them. So, it is +a subtraction with carry/borrow in (operand 4) and carry/borrow out +(operand 1). All operands have the same mode. 
+ @cindex @code{addptr@var{m}3} instruction pattern @item @samp{addptr@var{m}3} Like @code{add@var{m}3} but is guaranteed to only be used for address --- gcc/config/i386/i386.md.jj 2023-06-12 15:47:21.894510663 +0200 +++ gcc/config/i386/i386.md 2023-06-13 12:30:23.465967165 +0200 @@ -7733,6 +7733,25 @@ (define_peephole2 [(set (reg:CC FLAGS_REG) (compare:CC (match_dup 0) (match_dup 1)))]) +(define_peephole2 + [(set (match_operand:SWI 0 "general_reg_operand") + (match_operand:SWI 1 "memory_operand")) + (parallel [(set (reg:CC FLAGS_REG) + (compare:CC (match_dup 0) + (match_operand:SWI 2 "memory_operand"))) + (set (match_dup 0) + (minus:SWI (match_dup 0) (match_dup 2)))]) + (set (match_dup 1) (match_dup 0))] + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) + && peep2_reg_dead_p (3, operands[0]) + && !reg_overlap_mentioned_p (operands[0], operands[1]) + && !reg_overlap_mentioned_p (operands[0], operands[2])" + [(set (match_dup 0) (match_dup 2)) + (parallel [(set (reg:CC FLAGS_REG) + (compare:CC (match_dup 1) (match_dup 0))) + (set (match_dup 1) + (minus:SWI (match_dup 1) (match_dup 0)))])]) + ;; decl %eax; cmpl $-1, %eax; jne .Lxx; can be optimized into ;; subl $1, %eax; jnc .Lxx; (define_peephole2 @@ -7818,6 +7837,59 @@ (define_insn "@add3_carry" (set_attr "pent_pair" "pu") (set_attr "mode" "")]) +(define_peephole2 + [(set (match_operand:SWI 0 "general_reg_operand") + (match_operand:SWI 1 "memory_operand")) + (parallel [(set (match_dup 0) + (plus:SWI + (plus:SWI + (match_operator:SWI 4 "ix86_carry_flag_operator" + [(match_operand 3 "flags_reg_operand") + (const_int 0)]) + (match_dup 0)) + (match_operand:SWI 2 "memory_operand"))) + (clobber (reg:CC FLAGS_REG))]) + (set (match_dup 1) (match_dup 0))] + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) + && peep2_reg_dead_p (3, operands[0]) + && !reg_overlap_mentioned_p (operands[0], operands[1]) + && !reg_overlap_mentioned_p (operands[0], operands[2])" + [(set (match_dup 0) (match_dup 2)) + (parallel 
[(set (match_dup 1) + (plus:SWI (plus:SWI (match_op_dup 4 + [(match_dup 3) (const_int 0)]) + (match_dup 1)) + (match_dup 0))) + (clobber (reg:CC FLAGS_REG))])]) + +(define_peephole2 + [(set (match_operand:SWI 0 "general_reg_operand") + (match_operand:SWI 1 "memory_operand")) + (parallel [(set (match_dup 0) + (plus:SWI + (plus:SWI + (match_operator:SWI 4 "ix86_carry_flag_operator" + [(match_operand 3 "flags_reg_operand") + (const_int 0)]) + (match_dup 0)) + (match_operand:SWI 2 "memory_operand"))) + (clobber (reg:CC FLAGS_REG))]) + (set (match_operand:SWI 5 "general_reg_operand") (match_dup 0)) + (set (match_dup 1) (match_dup 5))] + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) + && peep2_reg_dead_p (3, operands[0]) + && peep2_reg_dead_p (4, operands[5]) + && !reg_overlap_mentioned_p (operands[0], operands[1]) + && !reg_overlap_mentioned_p (operands[0], operands[2]) + && !reg_overlap_mentioned_p (operands[5], operands[1])" + [(set (match_dup 0) (match_dup 2)) + (parallel [(set (match_dup 1) + (plus:SWI (plus:SWI (match_op_dup 4 + [(match_dup 3) (const_int 0)]) + (match_dup 1)) + (match_dup 0))) + (clobber (reg:CC FLAGS_REG))])]) + (define_insn "*add3_carry_0" [(set (match_operand:SWI 0 "nonimmediate_operand" "=m") (plus:SWI @@ -7918,6 +7990,159 @@ (define_insn "addcarry" (set_attr "pent_pair" "pu") (set_attr "mode" "")]) +;; Helper peephole2 for the addcarry and subborrow +;; peephole2s, to optimize away nop which resulted from uaddc/usubc +;; expansion optimization. 
+(define_peephole2 + [(set (match_operand:SWI48 0 "general_reg_operand") + (match_operand:SWI48 1 "memory_operand")) + (const_int 0)] + "" + [(set (match_dup 0) (match_dup 1))]) + +(define_peephole2 + [(parallel [(set (reg:CCC FLAGS_REG) + (compare:CCC + (zero_extend: + (plus:SWI48 + (plus:SWI48 + (match_operator:SWI48 4 "ix86_carry_flag_operator" + [(match_operand 2 "flags_reg_operand") + (const_int 0)]) + (match_operand:SWI48 0 "general_reg_operand")) + (match_operand:SWI48 1 "memory_operand"))) + (plus: + (zero_extend: (match_dup 1)) + (match_operator: 3 "ix86_carry_flag_operator" + [(match_dup 2) (const_int 0)])))) + (set (match_dup 0) + (plus:SWI48 (plus:SWI48 (match_op_dup 4 + [(match_dup 2) (const_int 0)]) + (match_dup 0)) + (match_dup 1)))]) + (set (match_dup 1) (match_dup 0))] + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) + && peep2_reg_dead_p (2, operands[0]) + && !reg_overlap_mentioned_p (operands[0], operands[1])" + [(parallel [(set (reg:CCC FLAGS_REG) + (compare:CCC + (zero_extend: + (plus:SWI48 + (plus:SWI48 + (match_op_dup 4 + [(match_dup 2) (const_int 0)]) + (match_dup 1)) + (match_dup 0))) + (plus: + (zero_extend: (match_dup 0)) + (match_op_dup 3 + [(match_dup 2) (const_int 0)])))) + (set (match_dup 1) + (plus:SWI48 (plus:SWI48 (match_op_dup 4 + [(match_dup 2) (const_int 0)]) + (match_dup 1)) + (match_dup 0)))])]) + +(define_peephole2 + [(set (match_operand:SWI48 0 "general_reg_operand") + (match_operand:SWI48 1 "memory_operand")) + (parallel [(set (reg:CCC FLAGS_REG) + (compare:CCC + (zero_extend: + (plus:SWI48 + (plus:SWI48 + (match_operator:SWI48 5 "ix86_carry_flag_operator" + [(match_operand 3 "flags_reg_operand") + (const_int 0)]) + (match_dup 0)) + (match_operand:SWI48 2 "memory_operand"))) + (plus: + (zero_extend: (match_dup 2)) + (match_operator: 4 "ix86_carry_flag_operator" + [(match_dup 3) (const_int 0)])))) + (set (match_dup 0) + (plus:SWI48 (plus:SWI48 (match_op_dup 5 + [(match_dup 3) (const_int 0)]) + (match_dup 0)) + 
(match_dup 2)))]) + (set (match_dup 1) (match_dup 0))] + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) + && peep2_reg_dead_p (3, operands[0]) + && !reg_overlap_mentioned_p (operands[0], operands[1]) + && !reg_overlap_mentioned_p (operands[0], operands[2])" + [(set (match_dup 0) (match_dup 2)) + (parallel [(set (reg:CCC FLAGS_REG) + (compare:CCC + (zero_extend: + (plus:SWI48 + (plus:SWI48 + (match_op_dup 5 + [(match_dup 3) (const_int 0)]) + (match_dup 1)) + (match_dup 0))) + (plus: + (zero_extend: (match_dup 0)) + (match_op_dup 4 + [(match_dup 3) (const_int 0)])))) + (set (match_dup 1) + (plus:SWI48 (plus:SWI48 (match_op_dup 5 + [(match_dup 3) (const_int 0)]) + (match_dup 1)) + (match_dup 0)))])]) + +(define_peephole2 + [(parallel [(set (reg:CCC FLAGS_REG) + (compare:CCC + (zero_extend: + (plus:SWI48 + (plus:SWI48 + (match_operator:SWI48 4 "ix86_carry_flag_operator" + [(match_operand 2 "flags_reg_operand") + (const_int 0)]) + (match_operand:SWI48 0 "general_reg_operand")) + (match_operand:SWI48 1 "memory_operand"))) + (plus: + (zero_extend: (match_dup 1)) + (match_operator: 3 "ix86_carry_flag_operator" + [(match_dup 2) (const_int 0)])))) + (set (match_dup 0) + (plus:SWI48 (plus:SWI48 (match_op_dup 4 + [(match_dup 2) (const_int 0)]) + (match_dup 0)) + (match_dup 1)))]) + (set (match_operand:QI 5 "general_reg_operand") + (ltu:QI (reg:CCC FLAGS_REG) (const_int 0))) + (set (match_operand:SWI48 6 "general_reg_operand") + (zero_extend:SWI48 (match_dup 5))) + (set (match_dup 1) (match_dup 0))] + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) + && peep2_reg_dead_p (4, operands[0]) + && !reg_overlap_mentioned_p (operands[0], operands[1]) + && !reg_overlap_mentioned_p (operands[0], operands[5]) + && !reg_overlap_mentioned_p (operands[5], operands[1]) + && !reg_overlap_mentioned_p (operands[0], operands[6]) + && !reg_overlap_mentioned_p (operands[6], operands[1])" + [(parallel [(set (reg:CCC FLAGS_REG) + (compare:CCC + (zero_extend: + (plus:SWI48 + 
(plus:SWI48 + (match_op_dup 4 + [(match_dup 2) (const_int 0)]) + (match_dup 1)) + (match_dup 0))) + (plus: + (zero_extend: (match_dup 0)) + (match_op_dup 3 + [(match_dup 2) (const_int 0)])))) + (set (match_dup 1) + (plus:SWI48 (plus:SWI48 (match_op_dup 4 + [(match_dup 2) (const_int 0)]) + (match_dup 1)) + (match_dup 0)))]) + (set (match_dup 5) (ltu:QI (reg:CCC FLAGS_REG) (const_int 0))) + (set (match_dup 6) (zero_extend:SWI48 (match_dup 5)))]) + (define_expand "addcarry_0" [(parallel [(set (reg:CCC FLAGS_REG) @@ -7988,6 +8213,59 @@ (define_insn "@sub3_carry" (set_attr "pent_pair" "pu") (set_attr "mode" "")]) +(define_peephole2 + [(set (match_operand:SWI 0 "general_reg_operand") + (match_operand:SWI 1 "memory_operand")) + (parallel [(set (match_dup 0) + (minus:SWI + (minus:SWI + (match_dup 0) + (match_operator:SWI 4 "ix86_carry_flag_operator" + [(match_operand 3 "flags_reg_operand") + (const_int 0)])) + (match_operand:SWI 2 "memory_operand"))) + (clobber (reg:CC FLAGS_REG))]) + (set (match_dup 1) (match_dup 0))] + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) + && peep2_reg_dead_p (3, operands[0]) + && !reg_overlap_mentioned_p (operands[0], operands[1]) + && !reg_overlap_mentioned_p (operands[0], operands[2])" + [(set (match_dup 0) (match_dup 2)) + (parallel [(set (match_dup 1) + (minus:SWI (minus:SWI (match_dup 1) + (match_op_dup 4 + [(match_dup 3) (const_int 0)])) + (match_dup 0))) + (clobber (reg:CC FLAGS_REG))])]) + +(define_peephole2 + [(set (match_operand:SWI 0 "general_reg_operand") + (match_operand:SWI 1 "memory_operand")) + (parallel [(set (match_dup 0) + (minus:SWI + (minus:SWI + (match_dup 0) + (match_operator:SWI 4 "ix86_carry_flag_operator" + [(match_operand 3 "flags_reg_operand") + (const_int 0)])) + (match_operand:SWI 2 "memory_operand"))) + (clobber (reg:CC FLAGS_REG))]) + (set (match_operand:SWI 5 "general_reg_operand") (match_dup 0)) + (set (match_dup 1) (match_dup 5))] + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) + && 
peep2_reg_dead_p (3, operands[0]) + && peep2_reg_dead_p (4, operands[5]) + && !reg_overlap_mentioned_p (operands[0], operands[1]) + && !reg_overlap_mentioned_p (operands[0], operands[2]) + && !reg_overlap_mentioned_p (operands[5], operands[1])" + [(set (match_dup 0) (match_dup 2)) + (parallel [(set (match_dup 1) + (minus:SWI (minus:SWI (match_dup 1) + (match_op_dup 4 + [(match_dup 3) (const_int 0)])) + (match_dup 0))) + (clobber (reg:CC FLAGS_REG))])]) + (define_insn "*sub3_carry_0" [(set (match_operand:SWI 0 "nonimmediate_operand" "=m") (minus:SWI @@ -8113,13 +8391,13 @@ (define_insn "subborrow" [(set (reg:CCC FLAGS_REG) (compare:CCC (zero_extend: - (match_operand:SWI48 1 "nonimmediate_operand" "0")) + (match_operand:SWI48 1 "nonimmediate_operand" "0,0")) (plus: (match_operator: 4 "ix86_carry_flag_operator" [(match_operand 3 "flags_reg_operand") (const_int 0)]) (zero_extend: - (match_operand:SWI48 2 "nonimmediate_operand" "rm"))))) - (set (match_operand:SWI48 0 "register_operand" "=r") + (match_operand:SWI48 2 "nonimmediate_operand" "r,rm"))))) + (set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r") (minus:SWI48 (minus:SWI48 (match_dup 1) (match_operator:SWI48 5 "ix86_carry_flag_operator" @@ -8132,6 +8410,154 @@ (define_insn "subborrow" (set_attr "pent_pair" "pu") (set_attr "mode" "")]) +(define_peephole2 + [(set (match_operand:SWI48 0 "general_reg_operand") + (match_operand:SWI48 1 "memory_operand")) + (parallel [(set (reg:CCC FLAGS_REG) + (compare:CCC + (zero_extend: (match_dup 0)) + (plus: + (match_operator: 4 "ix86_carry_flag_operator" + [(match_operand 3 "flags_reg_operand") (const_int 0)]) + (zero_extend: + (match_operand:SWI48 2 "memory_operand"))))) + (set (match_dup 0) + (minus:SWI48 + (minus:SWI48 + (match_dup 0) + (match_operator:SWI48 5 "ix86_carry_flag_operator" + [(match_dup 3) (const_int 0)])) + (match_dup 2)))]) + (set (match_dup 1) (match_dup 0))] + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) + && peep2_reg_dead_p (3, 
operands[0]) + && !reg_overlap_mentioned_p (operands[0], operands[1]) + && !reg_overlap_mentioned_p (operands[0], operands[2])" + [(set (match_dup 0) (match_dup 2)) + (parallel [(set (reg:CCC FLAGS_REG) + (compare:CCC + (zero_extend: (match_dup 1)) + (plus: (match_op_dup 4 + [(match_dup 3) (const_int 0)]) + (zero_extend: (match_dup 0))))) + (set (match_dup 1) + (minus:SWI48 (minus:SWI48 (match_dup 1) + (match_op_dup 5 + [(match_dup 3) (const_int 0)])) + (match_dup 0)))])]) + +(define_peephole2 + [(set (match_operand:SWI48 6 "general_reg_operand") + (match_operand:SWI48 7 "memory_operand")) + (set (match_operand:SWI48 8 "general_reg_operand") + (match_operand:SWI48 9 "memory_operand")) + (parallel [(set (reg:CCC FLAGS_REG) + (compare:CCC + (zero_extend: + (match_operand:SWI48 0 "general_reg_operand")) + (plus: + (match_operator: 4 "ix86_carry_flag_operator" + [(match_operand 3 "flags_reg_operand") (const_int 0)]) + (zero_extend: + (match_operand:SWI48 2 "general_reg_operand"))))) + (set (match_dup 0) + (minus:SWI48 + (minus:SWI48 + (match_dup 0) + (match_operator:SWI48 5 "ix86_carry_flag_operator" + [(match_dup 3) (const_int 0)])) + (match_dup 2)))]) + (set (match_operand:SWI48 1 "memory_operand") (match_dup 0))] + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) + && peep2_reg_dead_p (4, operands[0]) + && peep2_reg_dead_p (3, operands[2]) + && !reg_overlap_mentioned_p (operands[0], operands[1]) + && !reg_overlap_mentioned_p (operands[2], operands[1]) + && !reg_overlap_mentioned_p (operands[6], operands[9]) + && (rtx_equal_p (operands[6], operands[0]) + ? 
(rtx_equal_p (operands[7], operands[1]) + && rtx_equal_p (operands[8], operands[2])) + : (rtx_equal_p (operands[8], operands[0]) + && rtx_equal_p (operands[9], operands[1]) + && rtx_equal_p (operands[6], operands[2])))" + [(set (match_dup 0) (match_dup 9)) + (parallel [(set (reg:CCC FLAGS_REG) + (compare:CCC + (zero_extend: (match_dup 1)) + (plus: (match_op_dup 4 + [(match_dup 3) (const_int 0)]) + (zero_extend: (match_dup 0))))) + (set (match_dup 1) + (minus:SWI48 (minus:SWI48 (match_dup 1) + (match_op_dup 5 + [(match_dup 3) (const_int 0)])) + (match_dup 0)))])] +{ + if (!rtx_equal_p (operands[6], operands[0])) + operands[9] = operands[7]; +}) + +(define_peephole2 + [(set (match_operand:SWI48 6 "general_reg_operand") + (match_operand:SWI48 7 "memory_operand")) + (set (match_operand:SWI48 8 "general_reg_operand") + (match_operand:SWI48 9 "memory_operand")) + (parallel [(set (reg:CCC FLAGS_REG) + (compare:CCC + (zero_extend: + (match_operand:SWI48 0 "general_reg_operand")) + (plus: + (match_operator: 4 "ix86_carry_flag_operator" + [(match_operand 3 "flags_reg_operand") (const_int 0)]) + (zero_extend: + (match_operand:SWI48 2 "general_reg_operand"))))) + (set (match_dup 0) + (minus:SWI48 + (minus:SWI48 + (match_dup 0) + (match_operator:SWI48 5 "ix86_carry_flag_operator" + [(match_dup 3) (const_int 0)])) + (match_dup 2)))]) + (set (match_operand:QI 10 "general_reg_operand") + (ltu:QI (reg:CCC FLAGS_REG) (const_int 0))) + (set (match_operand:SWI48 11 "general_reg_operand") + (zero_extend:SWI48 (match_dup 10))) + (set (match_operand:SWI48 1 "memory_operand") (match_dup 0))] + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) + && peep2_reg_dead_p (6, operands[0]) + && peep2_reg_dead_p (3, operands[2]) + && !reg_overlap_mentioned_p (operands[0], operands[1]) + && !reg_overlap_mentioned_p (operands[2], operands[1]) + && !reg_overlap_mentioned_p (operands[6], operands[9]) + && !reg_overlap_mentioned_p (operands[0], operands[10]) + && !reg_overlap_mentioned_p 
(operands[10], operands[1]) + && !reg_overlap_mentioned_p (operands[0], operands[11]) + && !reg_overlap_mentioned_p (operands[11], operands[1]) + && (rtx_equal_p (operands[6], operands[0]) + ? (rtx_equal_p (operands[7], operands[1]) + && rtx_equal_p (operands[8], operands[2])) + : (rtx_equal_p (operands[8], operands[0]) + && rtx_equal_p (operands[9], operands[1]) + && rtx_equal_p (operands[6], operands[2])))" + [(set (match_dup 0) (match_dup 9)) + (parallel [(set (reg:CCC FLAGS_REG) + (compare:CCC + (zero_extend: (match_dup 1)) + (plus: (match_op_dup 4 + [(match_dup 3) (const_int 0)]) + (zero_extend: (match_dup 0))))) + (set (match_dup 1) + (minus:SWI48 (minus:SWI48 (match_dup 1) + (match_op_dup 5 + [(match_dup 3) (const_int 0)])) + (match_dup 0)))]) + (set (match_dup 10) (ltu:QI (reg:CCC FLAGS_REG) (const_int 0))) + (set (match_dup 11) (zero_extend:SWI48 (match_dup 10)))] +{ + if (!rtx_equal_p (operands[6], operands[0])) + operands[9] = operands[7]; +}) + (define_expand "subborrow_0" [(parallel [(set (reg:CC FLAGS_REG) @@ -8142,6 +8568,67 @@ (define_expand "subborrow_0" (minus:SWI48 (match_dup 1) (match_dup 2)))])] "ix86_binary_operator_ok (MINUS, mode, operands)") +(define_expand "uaddc5" + [(match_operand:SWI48 0 "register_operand") + (match_operand:SWI48 1 "register_operand") + (match_operand:SWI48 2 "register_operand") + (match_operand:SWI48 3 "register_operand") + (match_operand:SWI48 4 "nonmemory_operand")] + "" +{ + rtx cf = gen_rtx_REG (CCCmode, FLAGS_REG), pat, pat2; + if (operands[4] == const0_rtx) + emit_insn (gen_addcarry_0 (operands[0], operands[2], operands[3])); + else + { + rtx op4 = copy_to_mode_reg (QImode, + convert_to_mode (QImode, operands[4], 1)); + emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx)); + pat = gen_rtx_LTU (mode, cf, const0_rtx); + pat2 = gen_rtx_LTU (mode, cf, const0_rtx); + emit_insn (gen_addcarry (operands[0], operands[2], operands[3], + cf, pat, pat2)); + } + rtx cc = gen_reg_rtx (QImode); + pat = gen_rtx_LTU (QImode, 
cf, const0_rtx); + emit_insn (gen_rtx_SET (cc, pat)); + emit_insn (gen_zero_extendqi2 (operands[1], cc)); + DONE; +}) + +(define_expand "usubc5" + [(match_operand:SWI48 0 "register_operand") + (match_operand:SWI48 1 "register_operand") + (match_operand:SWI48 2 "register_operand") + (match_operand:SWI48 3 "register_operand") + (match_operand:SWI48 4 "nonmemory_operand")] + "" +{ + rtx cf, pat, pat2; + if (operands[4] == const0_rtx) + { + cf = gen_rtx_REG (CCmode, FLAGS_REG); + emit_insn (gen_subborrow_0 (operands[0], operands[2], + operands[3])); + } + else + { + cf = gen_rtx_REG (CCCmode, FLAGS_REG); + rtx op4 = copy_to_mode_reg (QImode, + convert_to_mode (QImode, operands[4], 1)); + emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx)); + pat = gen_rtx_LTU (mode, cf, const0_rtx); + pat2 = gen_rtx_LTU (mode, cf, const0_rtx); + emit_insn (gen_subborrow (operands[0], operands[2], operands[3], + cf, pat, pat2)); + } + rtx cc = gen_reg_rtx (QImode); + pat = gen_rtx_LTU (QImode, cf, const0_rtx); + emit_insn (gen_rtx_SET (cc, pat)); + emit_insn (gen_zero_extendqi2 (operands[1], cc)); + DONE; +}) + (define_mode_iterator CC_CCC [CC CCC]) ;; Pre-reload splitter to optimize @@ -8239,6 +8726,27 @@ (define_peephole2 (compare:CCC (plus:SWI (match_dup 1) (match_dup 0)) (match_dup 1))) + (set (match_dup 1) (plus:SWI (match_dup 1) (match_dup 0)))])]) + +(define_peephole2 + [(set (match_operand:SWI 0 "general_reg_operand") + (match_operand:SWI 1 "memory_operand")) + (parallel [(set (reg:CCC FLAGS_REG) + (compare:CCC + (plus:SWI (match_dup 0) + (match_operand:SWI 2 "memory_operand")) + (match_dup 0))) + (set (match_dup 0) (plus:SWI (match_dup 0) (match_dup 2)))]) + (set (match_dup 1) (match_dup 0))] + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) + && peep2_reg_dead_p (3, operands[0]) + && !reg_overlap_mentioned_p (operands[0], operands[1]) + && !reg_overlap_mentioned_p (operands[0], operands[2])" + [(set (match_dup 0) (match_dup 2)) + (parallel [(set (reg:CCC 
FLAGS_REG) + (compare:CCC + (plus:SWI (match_dup 1) (match_dup 0)) + (match_dup 1))) (set (match_dup 1) (plus:SWI (match_dup 1) (match_dup 0)))])]) (define_insn "*addsi3_zext_cc_overflow_1" --- gcc/testsuite/gcc.target/i386/pr79173-1.c.jj 2023-06-13 12:30:23.466967151 +0200 +++ gcc/testsuite/gcc.target/i386/pr79173-1.c 2023-06-13 12:30:23.466967151 +0200 @@ -0,0 +1,59 @@ +/* PR middle-end/79173 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */ +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } 
} */ +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ + +static unsigned long +uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out) +{ + unsigned long r; + unsigned long c1 = __builtin_add_overflow (x, y, &r); + unsigned long c2 = __builtin_add_overflow (r, carry_in, &r); + *carry_out = c1 + c2; + return r; +} + +static unsigned long +usubc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out) +{ + unsigned long r; + unsigned long c1 = __builtin_sub_overflow (x, y, &r); + unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r); + *carry_out = c1 + c2; + return r; +} + +void +foo (unsigned long *p, unsigned long *q) +{ + unsigned long c; + p[0] = uaddc (p[0], q[0], 0, &c); + p[1] = uaddc (p[1], q[1], c, &c); + p[2] = uaddc (p[2], q[2], c, &c); + p[3] = uaddc (p[3], q[3], c, &c); +} + +void +bar (unsigned long *p, unsigned long *q) +{ + unsigned long c; + p[0] = usubc (p[0], q[0], 0, &c); + p[1] = usubc (p[1], q[1], c, &c); + p[2] = usubc (p[2], q[2], c, &c); + p[3] = usubc (p[3], q[3], c, &c); +} --- gcc/testsuite/gcc.target/i386/pr79173-2.c.jj 2023-06-13 12:30:23.466967151 +0200 +++ gcc/testsuite/gcc.target/i386/pr79173-2.c 2023-06-13 12:30:23.466967151 +0200 @@ -0,0 +1,59 @@ +/* PR middle-end/79173 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */ +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 
1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ + +static unsigned long +uaddc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out) +{ + unsigned long r; + _Bool c1 = __builtin_add_overflow (x, y, &r); + _Bool c2 = __builtin_add_overflow (r, carry_in, &r); + *carry_out = c1 | c2; + return r; +} + +static unsigned long +usubc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out) +{ + unsigned long r; + _Bool c1 = __builtin_sub_overflow (x, y, &r); + _Bool c2 = __builtin_sub_overflow (r, carry_in, &r); + *carry_out = c1 | c2; + return r; +} + +void +foo (unsigned long *p, unsigned long *q) +{ + _Bool c; + p[0] = uaddc (p[0], q[0], 0, &c); + p[1] = uaddc (p[1], q[1], c, &c); + p[2] = uaddc (p[2], q[2], c, &c); + p[3] = uaddc (p[3], q[3], c, &c); +} + +void +bar (unsigned long *p, unsigned long *q) +{ + _Bool c; + p[0] = usubc (p[0], q[0], 0, &c); + p[1] = usubc (p[1], q[1], c, &c); + p[2] = usubc (p[2], q[2], c, &c); + p[3] = usubc (p[3], 
q[3], c, &c); +} --- gcc/testsuite/gcc.target/i386/pr79173-3.c.jj 2023-06-13 12:30:23.467967137 +0200 +++ gcc/testsuite/gcc.target/i386/pr79173-3.c 2023-06-13 12:30:23.467967137 +0200 @@ -0,0 +1,61 @@ +/* PR middle-end/79173 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */ +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ + +static unsigned long +uaddc (unsigned long x, unsigned 
long y, unsigned long carry_in, unsigned long *carry_out) +{ + unsigned long r; + unsigned long c1 = __builtin_add_overflow (x, y, &r); + unsigned long c2 = __builtin_add_overflow (r, carry_in, &r); + *carry_out = c1 + c2; + return r; +} + +static unsigned long +usubc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out) +{ + unsigned long r; + unsigned long c1 = __builtin_sub_overflow (x, y, &r); + unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r); + *carry_out = c1 + c2; + return r; +} + +unsigned long +foo (unsigned long *p, unsigned long *q) +{ + unsigned long c; + p[0] = uaddc (p[0], q[0], 0, &c); + p[1] = uaddc (p[1], q[1], c, &c); + p[2] = uaddc (p[2], q[2], c, &c); + p[3] = uaddc (p[3], q[3], c, &c); + return c; +} + +unsigned long +bar (unsigned long *p, unsigned long *q) +{ + unsigned long c; + p[0] = usubc (p[0], q[0], 0, &c); + p[1] = usubc (p[1], q[1], c, &c); + p[2] = usubc (p[2], q[2], c, &c); + p[3] = usubc (p[3], q[3], c, &c); + return c; +} --- gcc/testsuite/gcc.target/i386/pr79173-4.c.jj 2023-06-13 12:30:23.467967137 +0200 +++ gcc/testsuite/gcc.target/i386/pr79173-4.c 2023-06-13 12:30:23.467967137 +0200 @@ -0,0 +1,61 @@ +/* PR middle-end/79173 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */ +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* 
{ dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ + +static unsigned long +uaddc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out) +{ + unsigned long r; + _Bool c1 = __builtin_add_overflow (x, y, &r); + _Bool c2 = __builtin_add_overflow (r, carry_in, &r); + *carry_out = c1 ^ c2; + return r; +} + +static unsigned long +usubc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out) +{ + unsigned long r; + _Bool c1 = __builtin_sub_overflow (x, y, &r); + _Bool c2 = __builtin_sub_overflow (r, carry_in, &r); + *carry_out = c1 ^ c2; + return r; +} + +_Bool +foo (unsigned long *p, unsigned long *q) +{ + _Bool c; + p[0] = uaddc (p[0], q[0], 0, &c); + p[1] = uaddc (p[1], q[1], c, &c); + p[2] = uaddc (p[2], q[2], c, &c); + p[3] = uaddc (p[3], q[3], c, &c); + return c; +} + +_Bool +bar (unsigned long *p, unsigned long *q) +{ + _Bool c; + p[0] = usubc (p[0], q[0], 0, &c); + p[1] = usubc (p[1], q[1], c, &c); + p[2] = usubc (p[2], q[2], c, &c); + p[3] = usubc (p[3], q[3], c, &c); + return c; +} --- gcc/testsuite/gcc.target/i386/pr79173-5.c.jj 2023-06-13 12:30:23.467967137 +0200 +++ 
gcc/testsuite/gcc.target/i386/pr79173-5.c 2023-06-13 12:30:23.467967137 +0200 @@ -0,0 +1,32 @@ +/* PR middle-end/79173 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */ +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ + +static unsigned long +uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out) +{ + unsigned long r = x + y; + unsigned long c1 = r < x; + r += carry_in; + unsigned long c2 = r < carry_in; + *carry_out = c1 + c2; + return r; +} + +void +foo (unsigned long *p, unsigned long *q) +{ + unsigned long c; + p[0] = uaddc (p[0], q[0], 0, &c); + p[1] = uaddc (p[1], q[1], c, &c); + p[2] = uaddc (p[2], q[2], c, &c); + p[3] = uaddc (p[3], q[3], c, &c); +} --- gcc/testsuite/gcc.target/i386/pr79173-6.c.jj 2023-06-13 12:30:23.467967137 +0200 +++ gcc/testsuite/gcc.target/i386/pr79173-6.c 2023-06-13 12:30:23.467967137 +0200 @@ -0,0 +1,33 @@ +/* PR middle-end/79173 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */ +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { 
dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */ + +static unsigned long +uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out) +{ + unsigned long r = x + y; + unsigned long c1 = r < x; + r += carry_in; + unsigned long c2 = r < carry_in; + *carry_out = c1 + c2; + return r; +} + +unsigned long +foo (unsigned long *p, unsigned long *q) +{ + unsigned long c; + p[0] = uaddc (p[0], q[0], 0, &c); + p[1] = uaddc (p[1], q[1], c, &c); + p[2] = uaddc (p[2], q[2], c, &c); + p[3] = uaddc (p[3], q[3], c, &c); + return c; +} --- gcc/testsuite/gcc.target/i386/pr79173-7.c.jj 2023-06-13 12:30:23.468967123 +0200 +++ gcc/testsuite/gcc.target/i386/pr79173-7.c 2023-06-13 12:30:23.468967123 +0200 @@ -0,0 +1,31 @@ +/* PR middle-end/79173 */ +/* { dg-do compile { target lp64 } } */ +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */ +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */ +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */ +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */ +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */ +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } 
*/
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
+
+#include <x86intrin.h>
+
+void
+foo (unsigned long long *p, unsigned long long *q)
+{
+  unsigned char c = _addcarry_u64 (0, p[0], q[0], &p[0]);
+  c = _addcarry_u64 (c, p[1], q[1], &p[1]);
+  c = _addcarry_u64 (c, p[2], q[2], &p[2]);
+  _addcarry_u64 (c, p[3], q[3], &p[3]);
+}
+
+void
+bar (unsigned long long *p, unsigned long long *q)
+{
+  unsigned char c = _subborrow_u64 (0, p[0], q[0], &p[0]);
+  c = _subborrow_u64 (c, p[1], q[1], &p[1]);
+  c = _subborrow_u64 (c, p[2], q[2], &p[2]);
+  _subborrow_u64 (c, p[3], q[3], &p[3]);
+}
--- gcc/testsuite/gcc.target/i386/pr79173-8.c.jj 2023-06-13 12:30:23.468967123 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-8.c 2023-06-13 12:30:23.468967123 +0200
@@ -0,0 +1,31 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
+
+#include <x86intrin.h>
+
+void
+foo (unsigned int *p, unsigned int *q)
+{
+  unsigned char c = _addcarry_u32 (0, p[0], q[0], &p[0]);
+  c = _addcarry_u32 (c, p[1], q[1], &p[1]);
+  c = _addcarry_u32 (c, p[2], q[2], &p[2]);
+  _addcarry_u32 (c, p[3], q[3], &p[3]);
+}
+
+void
+bar (unsigned int *p, unsigned int *q)
+{
+  unsigned char c = _subborrow_u32 (0, p[0], q[0], &p[0]);
+  c = _subborrow_u32 (c, p[1], q[1], &p[1]);
+  c = _subborrow_u32 (c, p[2], q[2], &p[2]);
+  _subborrow_u32 (c, p[3], q[3], &p[3]);
+}
--- gcc/testsuite/gcc.target/i386/pr79173-9.c.jj 2023-06-13 12:30:23.468967123 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-9.c 2023-06-13 12:30:23.468967123 +0200
@@ -0,0 +1,31 @@
+/* PR middle-end/79173 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
+
+#include <x86intrin.h>
+
+unsigned long long
+foo (unsigned long long *p, unsigned long long *q)
+{
+  unsigned char c = _addcarry_u64 (0, p[0], q[0], &p[0]);
+  c = _addcarry_u64 (c, p[1], q[1], &p[1]);
+  c = _addcarry_u64 (c, p[2], q[2], &p[2]);
+  return _addcarry_u64 (c, p[3], q[3], &p[3]);
+}
+
+unsigned long long
+bar (unsigned long long *p, unsigned long long *q)
+{
+  unsigned char c = _subborrow_u64 (0, p[0], q[0], &p[0]);
+  c = _subborrow_u64 (c, p[1], q[1], &p[1]);
+  c = _subborrow_u64 (c, p[2], q[2], &p[2]);
+  return _subborrow_u64 (c, p[3], q[3], &p[3]);
+}
--- gcc/testsuite/gcc.target/i386/pr79173-10.c.jj 2023-06-13 12:30:23.468967123 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-10.c 2023-06-13 12:30:23.468967123 +0200
@@ -0,0 +1,31 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
+
+#include <x86intrin.h>
+
+unsigned int
+foo (unsigned int *p, unsigned int *q)
+{
+  unsigned char c = _addcarry_u32 (0, p[0], q[0], &p[0]);
+  c = _addcarry_u32 (c, p[1], q[1], &p[1]);
+  c = _addcarry_u32 (c, p[2], q[2], &p[2]);
+  return _addcarry_u32 (c, p[3], q[3], &p[3]);
+}
+
+unsigned int
+bar (unsigned int *p, unsigned int *q)
+{
+  unsigned char c = _subborrow_u32 (0, p[0], q[0], &p[0]);
+  c = _subborrow_u32 (c, p[1], q[1], &p[1]);
+  c = _subborrow_u32 (c, p[2], q[2], &p[2]);
+  return _subborrow_u32 (c, p[3], q[3], &p[3]);
+}

Jakub