From: Jakub Jelinek
To: Richard Biener, Uros Bizjak
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173]
Date: Tue, 6 Jun 2023 23:42:07 +0200

Hi!

The following patch introduces {add,sub}c5_optab and pattern recognizes
various forms of add with carry and subtract with carry/borrow; see the
pr79173-{1,2,3,4,5,6}.c tests for what is matched.  It primarily matches
forms with two __builtin_add_overflow or __builtin_sub_overflow calls per
limb (with just one for the least significant limb), and for add with carry
it matches even when the pattern is hand-written in C (for subtraction,
reassoc seems to change the IL too much for the pattern recognition to
work).  __builtin_{add,sub}_overflow are standardized in C23 under the
ckd_{add,sub} names, so they are no longer a GNU-only extension.

Note, clang has (IMHO badly designed) __builtin_{add,sub}c{b,s,,l,ll}
builtins for these, which don't add/subtract just a single bit of carry,
but basically add 3 unsigned values (or subtract 2 unsigned values from
one) and can therefore produce a carry out of 0, 1, or 2.  If we wanted to
introduce those for clang compatibility, we could lower them early to just
two __builtin_{add,sub}_overflow calls and let the pattern matching in this
patch recognize them later.
I've added expanders for this on ix86 and, in addition to that, various
peephole2s to make sure we get nice (and small) code for the common cases.
I think there are other PRs which request this, e.g. for the
_{addcarry,subborrow}_u{32,64} intrinsics, which the patch also improves.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Would be nice if support for these optabs was added to many other targets;
arm/aarch64 and powerpc* certainly have such instructions, and I'd expect
in fact that most targets do.  The _BitInt support I'm working on will also
need this to emit reasonable code.

2023-06-06  Jakub Jelinek

	PR middle-end/79173
	* internal-fn.def (ADDC, SUBC): New internal functions.
	* internal-fn.cc (expand_ADDC, expand_SUBC): New functions.
	(commutative_ternary_fn_p): Return true also for IFN_ADDC.
	* optabs.def (addc5_optab, subc5_optab): New optabs.
	* tree-ssa-math-opts.cc (match_addc_subc): New function.
	(math_opts_dom_walker::after_dom_children): Call match_addc_subc
	for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
	other optimizations have been successful for those.
	* gimple-fold.cc (gimple_fold_call): Handle IFN_ADDC and IFN_SUBC.
	* gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
	* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
	* doc/md.texi (addc<mode>5, subc<mode>5): Document new named
	patterns.
	* config/i386/i386.md (subborrow<mode>): Add alternative with
	memory destination.
	(addc<mode>5, subc<mode>5): New define_expand patterns.
	(*sub<mode>_3, @add<mode>3_carry, addcarry<mode>,
	@sub<mode>3_carry, subborrow<mode>, *add<mode>3_cc_overflow_1):
	Add define_peephole2 TARGET_READ_MODIFY_WRITE/-Os patterns to
	prefer using memory destination in these patterns.
	* gcc.target/i386/pr79173-1.c: New test.
	* gcc.target/i386/pr79173-2.c: New test.
	* gcc.target/i386/pr79173-3.c: New test.
	* gcc.target/i386/pr79173-4.c: New test.
	* gcc.target/i386/pr79173-5.c: New test.
	* gcc.target/i386/pr79173-6.c: New test.
	* gcc.target/i386/pr79173-7.c: New test.
	* gcc.target/i386/pr79173-8.c: New test.
	* gcc.target/i386/pr79173-9.c: New test.
	* gcc.target/i386/pr79173-10.c: New test.

--- gcc/internal-fn.def.jj	2023-06-05 10:38:06.670333685 +0200
+++ gcc/internal-fn.def	2023-06-05 11:40:50.672212265 +0200
@@ -381,6 +381,8 @@ DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LE
 DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (ADDC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (SUBC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
 DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
--- gcc/internal-fn.cc.jj	2023-05-15 19:12:24.080780016 +0200
+++ gcc/internal-fn.cc	2023-06-06 09:38:46.333871169 +0200
@@ -2722,6 +2722,44 @@ expand_MUL_OVERFLOW (internal_fn, gcall
   expand_arith_overflow (MULT_EXPR, stmt);
 }
 
+/* Expand ADDC STMT.  */
+
+static void
+expand_ADDC (internal_fn ifn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg1 = gimple_call_arg (stmt, 0);
+  tree arg2 = gimple_call_arg (stmt, 1);
+  tree arg3 = gimple_call_arg (stmt, 2);
+  tree type = TREE_TYPE (arg1);
+  machine_mode mode = TYPE_MODE (type);
+  insn_code icode = optab_handler (ifn == IFN_ADDC
+				   ? addc5_optab : subc5_optab, mode);
+  rtx op1 = expand_normal (arg1);
+  rtx op2 = expand_normal (arg2);
+  rtx op3 = expand_normal (arg3);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx re = gen_reg_rtx (mode);
+  rtx im = gen_reg_rtx (mode);
+  class expand_operand ops[5];
+  create_output_operand (&ops[0], re, mode);
+  create_output_operand (&ops[1], im, mode);
+  create_input_operand (&ops[2], op1, mode);
+  create_input_operand (&ops[3], op2, mode);
+  create_input_operand (&ops[4], op3, mode);
+  expand_insn (icode, 5, ops);
+  write_complex_part (target, re, false, false);
+  write_complex_part (target, im, true, false);
+}
+
+/* Expand SUBC STMT.  */
+
+static void
+expand_SUBC (internal_fn ifn, gcall *stmt)
+{
+  expand_ADDC (ifn, stmt);
+}
+
 /* This should get folded in tree-vectorizer.cc.  */
 
 static void
@@ -3990,6 +4028,7 @@ commutative_ternary_fn_p (internal_fn fn
     case IFN_FMS:
     case IFN_FNMA:
     case IFN_FNMS:
+    case IFN_ADDC:
       return true;
 
     default:
--- gcc/optabs.def.jj	2023-01-02 09:32:43.984973197 +0100
+++ gcc/optabs.def	2023-06-05 19:03:33.858210753 +0200
@@ -260,6 +260,8 @@ OPTAB_D (uaddv4_optab, "uaddv$I$a4")
 OPTAB_D (usubv4_optab, "usubv$I$a4")
 OPTAB_D (umulv4_optab, "umulv$I$a4")
 OPTAB_D (negv3_optab, "negv$I$a3")
+OPTAB_D (addc5_optab, "addc$I$a5")
+OPTAB_D (subc5_optab, "subc$I$a5")
 OPTAB_D (addptr3_optab, "addptr$a3")
 OPTAB_D (spaceship_optab, "spaceship$a3")
 
--- gcc/tree-ssa-math-opts.cc.jj	2023-05-19 12:58:25.246844019 +0200
+++ gcc/tree-ssa-math-opts.cc	2023-06-06 17:22:24.833455259 +0200
@@ -4441,6 +4441,438 @@ match_arith_overflow (gimple_stmt_iterat
   return false;
 }
 
+/* Try to match e.g.
+   _29 = .ADD_OVERFLOW (_3, _4);
+   _30 = REALPART_EXPR <_29>;
+   _31 = IMAGPART_EXPR <_29>;
+   _32 = .ADD_OVERFLOW (_30, _38);
+   _33 = REALPART_EXPR <_32>;
+   _34 = IMAGPART_EXPR <_32>;
+   _35 = _31 + _34;
+   as
+   _36 = .ADDC (_3, _4, _38);
+   _33 = REALPART_EXPR <_36>;
+   _35 = IMAGPART_EXPR <_36>;
+   or
+   _22 = .SUB_OVERFLOW (_6, _5);
+   _23 = REALPART_EXPR <_22>;
+   _24 = IMAGPART_EXPR <_22>;
+   _25 = .SUB_OVERFLOW (_23, _37);
+   _26 = REALPART_EXPR <_25>;
+   _27 = IMAGPART_EXPR <_25>;
+   _28 = _24 | _27;
+   as
+   _29 = .SUBC (_6, _5, _37);
+   _26 = REALPART_EXPR <_29>;
+   _28 = IMAGPART_EXPR <_29>;
+   provided _38 or _37 above have [0, 1] range
+   and _3, _4 and _30 or _6, _5 and _23 are unsigned
+   integral types with the same precision.  Whether + or | or ^ is
+   used on the IMAGPART_EXPR results doesn't matter, with one of the
+   added or subtracted operands in [0, 1] range at most one
+   .ADD_OVERFLOW or .SUB_OVERFLOW will indicate overflow.  */
+
+static bool
+match_addc_subc (gimple_stmt_iterator *gsi, gimple *stmt, tree_code code)
+{
+  tree rhs[4];
+  rhs[0] = gimple_assign_rhs1 (stmt);
+  rhs[1] = gimple_assign_rhs2 (stmt);
+  rhs[2] = NULL_TREE;
+  rhs[3] = NULL_TREE;
+  tree type = TREE_TYPE (rhs[0]);
+  if (!INTEGRAL_TYPE_P (type) || !TYPE_UNSIGNED (type))
+    return false;
+
+  if (code != BIT_IOR_EXPR && code != BIT_XOR_EXPR)
+    {
+      /* If overflow flag is ignored on the MSB limb, we can end up with
+	 the most significant limb handled as r = op1 + op2 + ovf1 + ovf2;
+	 or r = op1 - op2 - ovf1 - ovf2; or various equivalent expressions
+	 thereof.  Handle those like the ovf = ovf1 + ovf2; case to
+	 recognize the limb below the MSB, but also create another
+	 .ADDC/.SUBC call for the last limb.
*/ + while (TREE_CODE (rhs[0]) == SSA_NAME && !rhs[3]) + { + gimple *g = SSA_NAME_DEF_STMT (rhs[0]); + if (has_single_use (rhs[0]) + && is_gimple_assign (g) + && (gimple_assign_rhs_code (g) == code + || (code == MINUS_EXPR + && gimple_assign_rhs_code (g) == PLUS_EXPR + && TREE_CODE (gimple_assign_rhs2 (g)) == INTEGER_CST))) + { + rhs[0] = gimple_assign_rhs1 (g); + tree &r = rhs[2] ? rhs[3] : rhs[2]; + r = gimple_assign_rhs2 (g); + if (gimple_assign_rhs_code (g) != code) + r = fold_build1 (NEGATE_EXPR, TREE_TYPE (r), r); + } + else + break; + } + while (TREE_CODE (rhs[1]) == SSA_NAME && !rhs[3]) + { + gimple *g = SSA_NAME_DEF_STMT (rhs[1]); + if (has_single_use (rhs[1]) + && is_gimple_assign (g) + && gimple_assign_rhs_code (g) == PLUS_EXPR) + { + rhs[1] = gimple_assign_rhs1 (g); + if (rhs[2]) + rhs[3] = gimple_assign_rhs2 (g); + else + rhs[2] = gimple_assign_rhs2 (g); + } + else + break; + } + if (rhs[2] && !rhs[3]) + { + for (int i = (code == MINUS_EXPR ? 1 : 0); i < 3; ++i) + if (TREE_CODE (rhs[i]) == SSA_NAME) + { + gimple *im = SSA_NAME_DEF_STMT (rhs[i]); + if (gimple_assign_cast_p (im)) + { + tree op = gimple_assign_rhs1 (im); + if (TREE_CODE (op) == SSA_NAME + && INTEGRAL_TYPE_P (TREE_TYPE (op)) + && (TYPE_PRECISION (TREE_TYPE (op)) > 1 + || TYPE_UNSIGNED (TREE_TYPE (op))) + && has_single_use (rhs[i])) + im = SSA_NAME_DEF_STMT (op); + } + if (is_gimple_assign (im) + && gimple_assign_rhs_code (im) == NE_EXPR + && integer_zerop (gimple_assign_rhs2 (im)) + && TREE_CODE (gimple_assign_rhs1 (im)) == SSA_NAME + && has_single_use (gimple_assign_lhs (im))) + im = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (im)); + if (is_gimple_assign (im) + && gimple_assign_rhs_code (im) == IMAGPART_EXPR + && (TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (im), 0)) + == SSA_NAME)) + { + tree rhs1 = gimple_assign_rhs1 (im); + gimple *ovf = SSA_NAME_DEF_STMT (TREE_OPERAND (rhs1, 0)); + if (gimple_call_internal_p (ovf, code == PLUS_EXPR + ? 
IFN_ADDC : IFN_SUBC) + && (optab_handler (code == PLUS_EXPR + ? addc5_optab : subc5_optab, + TYPE_MODE (type)) + != CODE_FOR_nothing)) + { + if (i != 2) + std::swap (rhs[i], rhs[2]); + gimple *g + = gimple_build_call_internal (code == PLUS_EXPR + ? IFN_ADDC : IFN_SUBC, + 3, rhs[0], rhs[1], + rhs[2]); + tree nlhs = make_ssa_name (build_complex_type (type)); + gimple_call_set_lhs (g, nlhs); + gsi_insert_before (gsi, g, GSI_SAME_STMT); + tree ilhs = gimple_assign_lhs (stmt); + g = gimple_build_assign (ilhs, REALPART_EXPR, + build1 (REALPART_EXPR, + TREE_TYPE (ilhs), + nlhs)); + gsi_replace (gsi, g, true); + return true; + } + } + } + return false; + } + if (code == MINUS_EXPR && !rhs[2]) + return false; + if (code == MINUS_EXPR) + /* Code below expects rhs[0] and rhs[1] to have the IMAGPART_EXPRs. + So, for MINUS_EXPR swap the single added rhs operand (others are + subtracted) to rhs[3]. */ + std::swap (rhs[0], rhs[3]); + } + gimple *im1 = NULL, *im2 = NULL; + for (int i = 0; i < (code == MINUS_EXPR ? 
3 : 4); i++) + if (rhs[i] && TREE_CODE (rhs[i]) == SSA_NAME) + { + gimple *im = SSA_NAME_DEF_STMT (rhs[i]); + if (gimple_assign_cast_p (im)) + { + tree op = gimple_assign_rhs1 (im); + if (TREE_CODE (op) == SSA_NAME + && INTEGRAL_TYPE_P (TREE_TYPE (op)) + && (TYPE_PRECISION (TREE_TYPE (op)) > 1 + || TYPE_UNSIGNED (TREE_TYPE (op))) + && has_single_use (rhs[i])) + im = SSA_NAME_DEF_STMT (op); + } + if (is_gimple_assign (im) + && gimple_assign_rhs_code (im) == NE_EXPR + && integer_zerop (gimple_assign_rhs2 (im)) + && TREE_CODE (gimple_assign_rhs1 (im)) == SSA_NAME + && has_single_use (gimple_assign_lhs (im))) + im = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (im)); + if (is_gimple_assign (im) + && gimple_assign_rhs_code (im) == IMAGPART_EXPR + && TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (im), 0)) == SSA_NAME) + { + if (im1 == NULL) + { + im1 = im; + if (i != 0) + std::swap (rhs[0], rhs[i]); + } + else + { + im2 = im; + if (i != 1) + std::swap (rhs[1], rhs[i]); + break; + } + } + } + if (!im2) + return false; + gimple *ovf1 + = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im1), 0)); + gimple *ovf2 + = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im2), 0)); + internal_fn ifn; + if (!is_gimple_call (ovf1) + || !gimple_call_internal_p (ovf1) + || ((ifn = gimple_call_internal_fn (ovf1)) != IFN_ADD_OVERFLOW + && ifn != IFN_SUB_OVERFLOW) + || !gimple_call_internal_p (ovf2, ifn) + || optab_handler (ifn == IFN_ADD_OVERFLOW ? addc5_optab : subc5_optab, + TYPE_MODE (type)) == CODE_FOR_nothing + || (rhs[2] + && optab_handler (code == PLUS_EXPR ? addc5_optab : subc5_optab, + TYPE_MODE (type)) == CODE_FOR_nothing)) + return false; + tree arg1, arg2, arg3 = NULL_TREE; + gimple *re1 = NULL, *re2 = NULL; + for (int i = (ifn == IFN_ADD_OVERFLOW ? 1 : 0); i >= 0; --i) + for (gimple *ovf = ovf1; ovf; ovf = (ovf == ovf1 ? 
ovf2 : NULL)) + { + tree arg = gimple_call_arg (ovf, i); + if (TREE_CODE (arg) != SSA_NAME) + continue; + re1 = SSA_NAME_DEF_STMT (arg); + if (is_gimple_assign (re1) + && gimple_assign_rhs_code (re1) == REALPART_EXPR + && (TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (re1), 0)) + == SSA_NAME) + && (SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (re1), 0)) + == (ovf == ovf1 ? ovf2 : ovf1))) + { + if (ovf == ovf1) + { + std::swap (rhs[0], rhs[1]); + std::swap (im1, im2); + std::swap (ovf1, ovf2); + } + arg3 = gimple_call_arg (ovf, 1 - i); + i = -1; + break; + } + } + if (!arg3) + return false; + arg1 = gimple_call_arg (ovf1, 0); + arg2 = gimple_call_arg (ovf1, 1); + if (!types_compatible_p (type, TREE_TYPE (arg1))) + return false; + int kind[2] = { 0, 0 }; + /* At least one of arg2 and arg3 should have type compatible + with arg1/rhs[0], and the other one should have value in [0, 1] + range. */ + for (int i = 0; i < 2; ++i) + { + tree arg = i == 0 ? arg2 : arg3; + if (types_compatible_p (type, TREE_TYPE (arg))) + kind[i] = 1; + if (!INTEGRAL_TYPE_P (TREE_TYPE (arg)) + || (TYPE_PRECISION (TREE_TYPE (arg)) == 1 + && !TYPE_UNSIGNED (TREE_TYPE (arg)))) + continue; + if (tree_zero_one_valued_p (arg)) + kind[i] |= 2; + if (TREE_CODE (arg) == SSA_NAME) + { + gimple *g = SSA_NAME_DEF_STMT (arg); + if (gimple_assign_cast_p (g)) + { + tree op = gimple_assign_rhs1 (g); + if (TREE_CODE (op) == SSA_NAME + && INTEGRAL_TYPE_P (TREE_TYPE (op))) + g = SSA_NAME_DEF_STMT (op); + } + if (is_gimple_assign (g) + && gimple_assign_rhs_code (g) == NE_EXPR + && integer_zerop (gimple_assign_rhs2 (g)) + && TREE_CODE (gimple_assign_rhs1 (g)) == SSA_NAME) + g = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (g)); + if (!is_gimple_assign (g) + || gimple_assign_rhs_code (g) != IMAGPART_EXPR + || (TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (g), 0)) + != SSA_NAME)) + continue; + g = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (g), 0)); + if (!is_gimple_call (g) || !gimple_call_internal_p (g)) + 
continue; + switch (gimple_call_internal_fn (g)) + { + case IFN_ADD_OVERFLOW: + case IFN_SUB_OVERFLOW: + case IFN_ADDC: + case IFN_SUBC: + break; + default: + continue; + } + kind[i] |= 4; + } + } + /* Make arg2 the one with compatible type and arg3 the one + with [0, 1] range. If both is true for both operands, + prefer as arg3 result of __imag__ of some ifn. */ + if ((kind[0] & 1) == 0 || ((kind[1] & 1) != 0 && kind[0] > kind[1])) + { + std::swap (arg2, arg3); + std::swap (kind[0], kind[1]); + } + if ((kind[0] & 1) == 0 || (kind[1] & 6) == 0) + return false; + if (!has_single_use (gimple_assign_lhs (im1)) + || !has_single_use (gimple_assign_lhs (im2)) + || !has_single_use (gimple_assign_lhs (re1)) + || num_imm_uses (gimple_call_lhs (ovf1)) != 2) + return false; + use_operand_p use_p; + imm_use_iterator iter; + tree lhs = gimple_call_lhs (ovf2); + FOR_EACH_IMM_USE_FAST (use_p, iter, lhs) + { + gimple *use_stmt = USE_STMT (use_p); + if (is_gimple_debug (use_stmt)) + continue; + if (use_stmt == im2) + continue; + if (re2) + return false; + if (!is_gimple_assign (use_stmt) + && gimple_assign_rhs_code (use_stmt) != REALPART_EXPR) + return false; + re2 = use_stmt; + } + gimple_stmt_iterator gsi2 = gsi_for_stmt (ovf2); + gimple *g; + if ((kind[1] & 1) == 0) + { + if (TREE_CODE (arg3) == INTEGER_CST) + arg3 = fold_convert (type, arg3); + else + { + g = gimple_build_assign (make_ssa_name (type), NOP_EXPR, arg3); + gsi_insert_before (&gsi2, g, GSI_SAME_STMT); + arg3 = gimple_assign_lhs (g); + } + } + g = gimple_build_call_internal (ifn == IFN_ADD_OVERFLOW + ? IFN_ADDC : IFN_SUBC, 3, arg1, arg2, arg3); + tree nlhs = make_ssa_name (TREE_TYPE (lhs)); + gimple_call_set_lhs (g, nlhs); + gsi_insert_before (&gsi2, g, GSI_SAME_STMT); + tree ilhs = rhs[2] ? 
make_ssa_name (type) : gimple_assign_lhs (stmt); + g = gimple_build_assign (ilhs, IMAGPART_EXPR, + build1 (IMAGPART_EXPR, TREE_TYPE (ilhs), nlhs)); + if (rhs[2]) + gsi_insert_before (gsi, g, GSI_SAME_STMT); + else + gsi_replace (gsi, g, true); + tree rhs1 = rhs[1]; + for (int i = 0; i < 2; i++) + if (rhs1 == gimple_assign_lhs (im2)) + break; + else + { + g = SSA_NAME_DEF_STMT (rhs1); + rhs1 = gimple_assign_rhs1 (g); + gsi2 = gsi_for_stmt (g); + gsi_remove (&gsi2, true); + } + gcc_checking_assert (rhs1 == gimple_assign_lhs (im2)); + gsi2 = gsi_for_stmt (im2); + gsi_remove (&gsi2, true); + gsi2 = gsi_for_stmt (re2); + tree rlhs = gimple_assign_lhs (re2); + g = gimple_build_assign (rlhs, REALPART_EXPR, + build1 (REALPART_EXPR, TREE_TYPE (rlhs), nlhs)); + gsi_replace (&gsi2, g, true); + if (rhs[2]) + { + g = gimple_build_call_internal (code == PLUS_EXPR ? IFN_ADDC : IFN_SUBC, + 3, rhs[3], rhs[2], ilhs); + nlhs = make_ssa_name (TREE_TYPE (lhs)); + gimple_call_set_lhs (g, nlhs); + gsi_insert_before (gsi, g, GSI_SAME_STMT); + ilhs = gimple_assign_lhs (stmt); + g = gimple_build_assign (ilhs, REALPART_EXPR, + build1 (REALPART_EXPR, TREE_TYPE (ilhs), nlhs)); + gsi_replace (gsi, g, true); + } + if (TREE_CODE (arg3) == SSA_NAME) + { + gimple *im3 = SSA_NAME_DEF_STMT (arg3); + for (int i = 0; gimple_assign_cast_p (im3) && i < 2; ++i) + { + tree op = gimple_assign_rhs1 (im3); + if (TREE_CODE (op) == SSA_NAME + && INTEGRAL_TYPE_P (TREE_TYPE (op)) + && (TYPE_PRECISION (TREE_TYPE (op)) > 1 + || TYPE_UNSIGNED (TREE_TYPE (op)))) + im3 = SSA_NAME_DEF_STMT (op); + else + break; + } + if (is_gimple_assign (im3) + && gimple_assign_rhs_code (im3) == NE_EXPR + && integer_zerop (gimple_assign_rhs2 (im3)) + && TREE_CODE (gimple_assign_rhs1 (im3)) == SSA_NAME + && has_single_use (gimple_assign_lhs (im3))) + im3 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (im3)); + if (is_gimple_assign (im3) + && gimple_assign_rhs_code (im3) == IMAGPART_EXPR + && (TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (im3), 
0)) + == SSA_NAME)) + { + gimple *ovf3 + = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im3), 0)); + if (gimple_call_internal_p (ovf3, ifn)) + { + lhs = gimple_call_lhs (ovf3); + arg1 = gimple_call_arg (ovf3, 0); + arg2 = gimple_call_arg (ovf3, 1); + if (types_compatible_p (type, TREE_TYPE (TREE_TYPE (lhs))) + && types_compatible_p (type, TREE_TYPE (arg1)) + && types_compatible_p (type, TREE_TYPE (arg2))) + { + g = gimple_build_call_internal (ifn == IFN_ADD_OVERFLOW + ? IFN_ADDC : IFN_SUBC, + 3, arg1, arg2, + build_zero_cst (type)); + gimple_call_set_lhs (g, lhs); + gsi2 = gsi_for_stmt (ovf3); + gsi_replace (&gsi2, g, true); + } + } + } + } + return true; +} + /* Return true if target has support for divmod. */ static bool @@ -5068,8 +5500,9 @@ math_opts_dom_walker::after_dom_children case PLUS_EXPR: case MINUS_EXPR: - if (!convert_plusminus_to_widen (&gsi, stmt, code)) - match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p); + if (!convert_plusminus_to_widen (&gsi, stmt, code) + && !match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p)) + match_addc_subc (&gsi, stmt, code); break; case BIT_NOT_EXPR: @@ -5085,6 +5518,11 @@ math_opts_dom_walker::after_dom_children convert_mult_to_highpart (as_a (stmt), &gsi); break; + case BIT_IOR_EXPR: + case BIT_XOR_EXPR: + match_addc_subc (&gsi, stmt, code); + break; + default:; } } --- gcc/gimple-fold.cc.jj 2023-05-01 09:59:46.434297471 +0200 +++ gcc/gimple-fold.cc 2023-06-06 13:35:15.463010972 +0200 @@ -5585,6 +5585,7 @@ gimple_fold_call (gimple_stmt_iterator * enum tree_code subcode = ERROR_MARK; tree result = NULL_TREE; bool cplx_result = false; + bool addc_subc = false; tree overflow = NULL_TREE; switch (gimple_call_internal_fn (stmt)) { @@ -5658,6 +5659,16 @@ gimple_fold_call (gimple_stmt_iterator * subcode = MULT_EXPR; cplx_result = true; break; + case IFN_ADDC: + subcode = PLUS_EXPR; + cplx_result = true; + addc_subc = true; + break; + case IFN_SUBC: + subcode = MINUS_EXPR; + cplx_result = true; + addc_subc 
= true; + break; case IFN_MASK_LOAD: changed |= gimple_fold_partial_load (gsi, stmt, true); break; @@ -5677,6 +5688,7 @@ gimple_fold_call (gimple_stmt_iterator * { tree arg0 = gimple_call_arg (stmt, 0); tree arg1 = gimple_call_arg (stmt, 1); + tree arg2 = NULL_TREE; tree type = TREE_TYPE (arg0); if (cplx_result) { @@ -5685,9 +5697,26 @@ gimple_fold_call (gimple_stmt_iterator * type = NULL_TREE; else type = TREE_TYPE (TREE_TYPE (lhs)); + if (addc_subc) + arg2 = gimple_call_arg (stmt, 2); } if (type == NULL_TREE) ; + else if (addc_subc) + { + if (!integer_zerop (arg2)) + ; + /* x = y + 0 + 0; x = y - 0 - 0; */ + else if (integer_zerop (arg1)) + result = arg0; + /* x = 0 + y + 0; */ + else if (subcode != MINUS_EXPR && integer_zerop (arg0)) + result = arg1; + /* x = y - y - 0; */ + else if (subcode == MINUS_EXPR + && operand_equal_p (arg0, arg1, 0)) + result = integer_zero_node; + } /* x = y + 0; x = y - 0; x = y * 0; */ else if (integer_zerop (arg1)) result = subcode == MULT_EXPR ? integer_zero_node : arg0; @@ -5702,8 +5731,11 @@ gimple_fold_call (gimple_stmt_iterator * result = arg0; else if (subcode == MULT_EXPR && integer_onep (arg0)) result = arg1; - else if (TREE_CODE (arg0) == INTEGER_CST - && TREE_CODE (arg1) == INTEGER_CST) + if (type + && result == NULL_TREE + && TREE_CODE (arg0) == INTEGER_CST + && TREE_CODE (arg1) == INTEGER_CST + && (!addc_subc || TREE_CODE (arg2) == INTEGER_CST)) { if (cplx_result) result = int_const_binop (subcode, fold_convert (type, arg0), @@ -5717,6 +5749,15 @@ gimple_fold_call (gimple_stmt_iterator * else result = NULL_TREE; } + if (addc_subc && result) + { + tree r = int_const_binop (subcode, result, + fold_convert (type, arg2)); + if (r == NULL_TREE) + result = NULL_TREE; + else if (arith_overflowed_p (subcode, type, result, arg2)) + overflow = build_one_cst (type); + } } if (result) { --- gcc/gimple-range-fold.cc.jj 2023-05-25 09:42:28.034696783 +0200 +++ gcc/gimple-range-fold.cc 2023-06-06 09:41:06.716896505 +0200 @@ -489,6 
+489,8 @@ adjust_imagpart_expr (vrange &res, const
       case IFN_ADD_OVERFLOW:
       case IFN_SUB_OVERFLOW:
       case IFN_MUL_OVERFLOW:
+      case IFN_ADDC:
+      case IFN_SUBC:
       case IFN_ATOMIC_COMPARE_EXCHANGE:
 	{
 	  int_range<2> r;
--- gcc/tree-ssa-dce.cc.jj	2023-05-15 19:12:35.012626408 +0200
+++ gcc/tree-ssa-dce.cc	2023-06-06 13:35:30.271802380 +0200
@@ -1481,6 +1481,14 @@ eliminate_unnecessary_stmts (bool aggres
 	    case IFN_MUL_OVERFLOW:
 	      maybe_optimize_arith_overflow (&gsi, MULT_EXPR);
 	      break;
+	    case IFN_ADDC:
+	      if (integer_zerop (gimple_call_arg (stmt, 2)))
+		maybe_optimize_arith_overflow (&gsi, PLUS_EXPR);
+	      break;
+	    case IFN_SUBC:
+	      if (integer_zerop (gimple_call_arg (stmt, 2)))
+		maybe_optimize_arith_overflow (&gsi, MINUS_EXPR);
+	      break;
 	    default:
 	      break;
 	    }
--- gcc/doc/md.texi.jj	2023-05-25 09:42:28.009697144 +0200
+++ gcc/doc/md.texi	2023-06-06 13:33:56.565122304 +0200
@@ -5202,6 +5202,22 @@ is taken only on unsigned overflow.
 @item @samp{usubv@var{m}4}, @samp{umulv@var{m}4}
 Similar, for other unsigned arithmetic operations.
 
+@cindex @code{addc@var{m}5} instruction pattern
+@item @samp{addc@var{m}5}
+Adds operands 2, 3 and 4 (where the last operand is guaranteed to have
+only values 0 or 1) together, sets operand 0 to the result of the
+addition of the 3 operands and sets operand 1 to 1 iff there was
+overflow on the unsigned additions, and to 0 otherwise.  So, it is
+an addition with carry in (operand 4) and carry out (operand 1).
+All operands have the same mode.
+
+@cindex @code{subc@var{m}5} instruction pattern
+@item @samp{subc@var{m}5}
+Similarly to @samp{addc@var{m}5}, except subtracts operands 3 and 4
+from operand 2 instead of adding them.  So, it is
+a subtraction with carry/borrow in (operand 4) and carry/borrow out
+(operand 1).  All operands have the same mode.
+ @cindex @code{addptr@var{m}3} instruction pattern @item @samp{addptr@var{m}3} Like @code{add@var{m}3} but is guaranteed to only be used for address --- gcc/config/i386/i386.md.jj 2023-05-11 11:54:42.906956432 +0200 +++ gcc/config/i386/i386.md 2023-06-06 16:27:38.300455824 +0200 @@ -7685,6 +7685,25 @@ (define_peephole2 [(set (reg:CC FLAGS_REG) (compare:CC (match_dup 0) (match_dup 1)))]) +(define_peephole2 + [(set (match_operand:SWI 0 "general_reg_operand") + (match_operand:SWI 1 "memory_operand")) + (parallel [(set (reg:CC FLAGS_REG) + (compare:CC (match_dup 0) + (match_operand:SWI 2 "memory_operand"))) + (set (match_dup 0) + (minus:SWI (match_dup 0) (match_dup 2)))]) + (set (match_dup 1) (match_dup 0))] + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) + && peep2_reg_dead_p (3, operands[0]) + && !reg_overlap_mentioned_p (operands[0], operands[1]) + && !reg_overlap_mentioned_p (operands[0], operands[2])" + [(set (match_dup 0) (match_dup 2)) + (parallel [(set (reg:CC FLAGS_REG) + (compare:CC (match_dup 1) (match_dup 0))) + (set (match_dup 1) + (minus:SWI (match_dup 1) (match_dup 0)))])]) + ;; decl %eax; cmpl $-1, %eax; jne .Lxx; can be optimized into ;; subl $1, %eax; jnc .Lxx; (define_peephole2 @@ -7770,6 +7789,59 @@ (define_insn "@add3_carry" (set_attr "pent_pair" "pu") (set_attr "mode" "")]) +(define_peephole2 + [(set (match_operand:SWI 0 "general_reg_operand") + (match_operand:SWI 1 "memory_operand")) + (parallel [(set (match_dup 0) + (plus:SWI + (plus:SWI + (match_operator:SWI 4 "ix86_carry_flag_operator" + [(match_operand 3 "flags_reg_operand") + (const_int 0)]) + (match_dup 0)) + (match_operand:SWI 2 "memory_operand"))) + (clobber (reg:CC FLAGS_REG))]) + (set (match_dup 1) (match_dup 0))] + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) + && peep2_reg_dead_p (3, operands[0]) + && !reg_overlap_mentioned_p (operands[0], operands[1]) + && !reg_overlap_mentioned_p (operands[0], operands[2])" + [(set (match_dup 0) (match_dup 2)) + (parallel 
[(set (match_dup 1)
+		   (plus:SWI (plus:SWI (match_op_dup 4
+					 [(match_dup 3) (const_int 0)])
+				       (match_dup 1))
+			     (match_dup 0)))
+	      (clobber (reg:CC FLAGS_REG))])])
+
+(define_peephole2
+  [(set (match_operand:SWI 0 "general_reg_operand")
+	(match_operand:SWI 1 "memory_operand"))
+   (parallel [(set (match_dup 0)
+		   (plus:SWI
+		     (plus:SWI
+		       (match_operator:SWI 4 "ix86_carry_flag_operator"
+			 [(match_operand 3 "flags_reg_operand")
+			  (const_int 0)])
+		       (match_dup 0))
+		     (match_operand:SWI 2 "memory_operand")))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (match_operand:SWI 5 "general_reg_operand") (match_dup 0))
+   (set (match_dup 1) (match_dup 5))]
+  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (3, operands[0])
+   && peep2_reg_dead_p (4, operands[5])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])
+   && !reg_overlap_mentioned_p (operands[5], operands[1])"
+  [(set (match_dup 0) (match_dup 2))
+   (parallel [(set (match_dup 1)
+		   (plus:SWI (plus:SWI (match_op_dup 4
+					 [(match_dup 3) (const_int 0)])
+				       (match_dup 1))
+			     (match_dup 0)))
+	      (clobber (reg:CC FLAGS_REG))])])
+
 (define_insn "*add<mode>3_carry_0"
   [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
	(plus:SWI
@@ -7870,6 +7942,159 @@ (define_insn "addcarry<mode>"
    (set_attr "pent_pair" "pu")
    (set_attr "mode" "<MODE>")])
 
+;; Helper peephole2 for the addcarry<mode> and subborrow<mode>
+;; peephole2s, to optimize away nop which resulted from addc/subc
+;; expansion optimization.
+(define_peephole2
+  [(set (match_operand:SWI48 0 "general_reg_operand")
+	(match_operand:SWI48 1 "memory_operand"))
+   (const_int 0)]
+  ""
+  [(set (match_dup 0) (match_dup 1))])
+
+(define_peephole2
+  [(parallel [(set (reg:CCC FLAGS_REG)
+		   (compare:CCC
+		     (zero_extend:<DWI>
+		       (plus:SWI48
+			 (plus:SWI48
+			   (match_operator:SWI48 4 "ix86_carry_flag_operator"
+			     [(match_operand 2 "flags_reg_operand")
+			      (const_int 0)])
+			   (match_operand:SWI48 0 "general_reg_operand"))
+			 (match_operand:SWI48 1 "memory_operand")))
+		     (plus:<DWI>
+		       (zero_extend:<DWI> (match_dup 1))
+		       (match_operator:<DWI> 3 "ix86_carry_flag_operator"
+			 [(match_dup 2) (const_int 0)]))))
+	      (set (match_dup 0)
+		   (plus:SWI48 (plus:SWI48 (match_op_dup 4
+					     [(match_dup 2) (const_int 0)])
+					   (match_dup 0))
+			       (match_dup 1)))])
+   (set (match_dup 1) (match_dup 0))]
+  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (2, operands[0])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])"
+  [(parallel [(set (reg:CCC FLAGS_REG)
+		   (compare:CCC
+		     (zero_extend:<DWI>
+		       (plus:SWI48
+			 (plus:SWI48
+			   (match_op_dup 4
+			     [(match_dup 2) (const_int 0)])
+			   (match_dup 1))
+			 (match_dup 0)))
+		     (plus:<DWI>
+		       (zero_extend:<DWI> (match_dup 0))
+		       (match_op_dup 3
+			 [(match_dup 2) (const_int 0)]))))
+	      (set (match_dup 1)
+		   (plus:SWI48 (plus:SWI48 (match_op_dup 4
+					     [(match_dup 2) (const_int 0)])
+					   (match_dup 1))
+			       (match_dup 0)))])])
+
+(define_peephole2
+  [(set (match_operand:SWI48 0 "general_reg_operand")
+	(match_operand:SWI48 1 "memory_operand"))
+   (parallel [(set (reg:CCC FLAGS_REG)
+		   (compare:CCC
+		     (zero_extend:<DWI>
+		       (plus:SWI48
+			 (plus:SWI48
+			   (match_operator:SWI48 5 "ix86_carry_flag_operator"
+			     [(match_operand 3 "flags_reg_operand")
+			      (const_int 0)])
+			   (match_dup 0))
+			 (match_operand:SWI48 2 "memory_operand")))
+		     (plus:<DWI>
+		       (zero_extend:<DWI> (match_dup 2))
+		       (match_operator:<DWI> 4 "ix86_carry_flag_operator"
+			 [(match_dup 3) (const_int 0)]))))
+	      (set (match_dup 0)
+		   (plus:SWI48 (plus:SWI48 (match_op_dup 5
+					     [(match_dup 3) (const_int 0)])
+					   (match_dup 0))
+			       (match_dup 2)))])
+   (set (match_dup 1) (match_dup 0))]
+  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (3, operands[0])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])"
+  [(set (match_dup 0) (match_dup 2))
+   (parallel [(set (reg:CCC FLAGS_REG)
+		   (compare:CCC
+		     (zero_extend:<DWI>
+		       (plus:SWI48
+			 (plus:SWI48
+			   (match_op_dup 5
+			     [(match_dup 3) (const_int 0)])
+			   (match_dup 1))
+			 (match_dup 0)))
+		     (plus:<DWI>
+		       (zero_extend:<DWI> (match_dup 0))
+		       (match_op_dup 4
+			 [(match_dup 3) (const_int 0)]))))
+	      (set (match_dup 1)
+		   (plus:SWI48 (plus:SWI48 (match_op_dup 5
+					     [(match_dup 3) (const_int 0)])
+					   (match_dup 1))
+			       (match_dup 0)))])])
+
+(define_peephole2
+  [(parallel [(set (reg:CCC FLAGS_REG)
+		   (compare:CCC
+		     (zero_extend:<DWI>
+		       (plus:SWI48
+			 (plus:SWI48
+			   (match_operator:SWI48 4 "ix86_carry_flag_operator"
+			     [(match_operand 2 "flags_reg_operand")
+			      (const_int 0)])
+			   (match_operand:SWI48 0 "general_reg_operand"))
+			 (match_operand:SWI48 1 "memory_operand")))
+		     (plus:<DWI>
+		       (zero_extend:<DWI> (match_dup 1))
+		       (match_operator:<DWI> 3 "ix86_carry_flag_operator"
+			 [(match_dup 2) (const_int 0)]))))
+	      (set (match_dup 0)
+		   (plus:SWI48 (plus:SWI48 (match_op_dup 4
+					     [(match_dup 2) (const_int 0)])
+					   (match_dup 0))
+			       (match_dup 1)))])
+   (set (match_operand:QI 5 "general_reg_operand")
+	(ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
+   (set (match_operand:SWI48 6 "general_reg_operand")
+	(zero_extend:SWI48 (match_dup 5)))
+   (set (match_dup 1) (match_dup 0))]
+  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (4, operands[0])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])
+   && !reg_overlap_mentioned_p (operands[0], operands[5])
+   && !reg_overlap_mentioned_p (operands[5], operands[1])
+   && !reg_overlap_mentioned_p (operands[0], operands[6])
+   && !reg_overlap_mentioned_p (operands[6], operands[1])"
+  [(parallel [(set (reg:CCC FLAGS_REG)
+		   (compare:CCC
+		     (zero_extend:<DWI>
+		       (plus:SWI48
+			 (plus:SWI48
+			   (match_op_dup 4
+			     [(match_dup 2) (const_int 0)])
+			   (match_dup 1))
+			 (match_dup 0)))
+		     (plus:<DWI>
+		       (zero_extend:<DWI> (match_dup 0))
+		       (match_op_dup 3
+			 [(match_dup 2) (const_int 0)]))))
+	      (set (match_dup 1)
+		   (plus:SWI48 (plus:SWI48 (match_op_dup 4
+					     [(match_dup 2) (const_int 0)])
+					   (match_dup 1))
+			       (match_dup 0)))])
+   (set (match_dup 5) (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
+   (set (match_dup 6) (zero_extend:SWI48 (match_dup 5)))])
+
 (define_expand "addcarry<mode>_0"
   [(parallel
      [(set (reg:CCC FLAGS_REG)
@@ -7940,6 +8165,59 @@ (define_insn "@sub<mode>3_carry"
    (set_attr "pent_pair" "pu")
    (set_attr "mode" "<MODE>")])
 
+(define_peephole2
+  [(set (match_operand:SWI 0 "general_reg_operand")
+	(match_operand:SWI 1 "memory_operand"))
+   (parallel [(set (match_dup 0)
+		   (minus:SWI
+		     (minus:SWI
+		       (match_dup 0)
+		       (match_operator:SWI 4 "ix86_carry_flag_operator"
+			 [(match_operand 3 "flags_reg_operand")
+			  (const_int 0)]))
+		     (match_operand:SWI 2 "memory_operand")))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (match_dup 1) (match_dup 0))]
+  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (3, operands[0])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])"
+  [(set (match_dup 0) (match_dup 2))
+   (parallel [(set (match_dup 1)
+		   (minus:SWI (minus:SWI (match_dup 1)
+					 (match_op_dup 4
+					   [(match_dup 3) (const_int 0)]))
+			      (match_dup 0)))
+	      (clobber (reg:CC FLAGS_REG))])])
+
+(define_peephole2
+  [(set (match_operand:SWI 0 "general_reg_operand")
+	(match_operand:SWI 1 "memory_operand"))
+   (parallel [(set (match_dup 0)
+		   (minus:SWI
+		     (minus:SWI
+		       (match_dup 0)
+		       (match_operator:SWI 4 "ix86_carry_flag_operator"
+			 [(match_operand 3 "flags_reg_operand")
+			  (const_int 0)]))
+		     (match_operand:SWI 2 "memory_operand")))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (match_operand:SWI 5 "general_reg_operand") (match_dup 0))
+   (set (match_dup 1) (match_dup 5))]
+  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (3, operands[0])
+   && peep2_reg_dead_p (4, operands[5])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])
+   && !reg_overlap_mentioned_p (operands[5], operands[1])"
+  [(set (match_dup 0) (match_dup 2))
+   (parallel [(set (match_dup 1)
+		   (minus:SWI (minus:SWI (match_dup 1)
+					 (match_op_dup 4
+					   [(match_dup 3) (const_int 0)]))
+			      (match_dup 0)))
+	      (clobber (reg:CC FLAGS_REG))])])
+
 (define_insn "*sub<mode>3_carry_0"
   [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
	(minus:SWI
@@ -8065,13 +8343,13 @@ (define_insn "subborrow<mode>"
   [(set (reg:CCC FLAGS_REG)
	(compare:CCC
	  (zero_extend:<DWI>
-	    (match_operand:SWI48 1 "nonimmediate_operand" "0"))
+	    (match_operand:SWI48 1 "nonimmediate_operand" "0,0"))
	  (plus:<DWI>
	    (match_operator:<DWI> 4 "ix86_carry_flag_operator"
	      [(match_operand 3 "flags_reg_operand") (const_int 0)])
	    (zero_extend:<DWI>
-	      (match_operand:SWI48 2 "nonimmediate_operand" "rm")))))
-   (set (match_operand:SWI48 0 "register_operand" "=r")
+	      (match_operand:SWI48 2 "nonimmediate_operand" "r,rm")))))
+   (set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
	(minus:SWI48
	  (minus:SWI48
	    (match_dup 1)
	    (match_operator:SWI48 5 "ix86_carry_flag_operator"
@@ -8084,6 +8362,154 @@ (define_insn "subborrow<mode>"
    (set_attr "pent_pair" "pu")
    (set_attr "mode" "<MODE>")])
 
+(define_peephole2
+  [(set (match_operand:SWI48 0 "general_reg_operand")
+	(match_operand:SWI48 1 "memory_operand"))
+   (parallel [(set (reg:CCC FLAGS_REG)
+		   (compare:CCC
+		     (zero_extend:<DWI> (match_dup 0))
+		     (plus:<DWI>
+		       (match_operator:<DWI> 4 "ix86_carry_flag_operator"
+			 [(match_operand 3 "flags_reg_operand") (const_int 0)])
+		       (zero_extend:<DWI>
+			 (match_operand:SWI48 2 "memory_operand")))))
+	      (set (match_dup 0)
+		   (minus:SWI48
+		     (minus:SWI48
+		       (match_dup 0)
+		       (match_operator:SWI48 5 "ix86_carry_flag_operator"
+			 [(match_dup 3) (const_int 0)]))
+		     (match_dup 2)))])
+   (set (match_dup 1) (match_dup 0))]
+  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (3, operands[0])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])"
+  [(set (match_dup 0) (match_dup 2))
+   (parallel [(set (reg:CCC FLAGS_REG)
+		   (compare:CCC
+		     (zero_extend:<DWI> (match_dup 1))
+		     (plus:<DWI> (match_op_dup 4
+				   [(match_dup 3) (const_int 0)])
+				 (zero_extend:<DWI> (match_dup 0)))))
+	      (set (match_dup 1)
+		   (minus:SWI48 (minus:SWI48 (match_dup 1)
+					     (match_op_dup 5
+					       [(match_dup 3) (const_int 0)]))
+				(match_dup 0)))])])
+
+(define_peephole2
+  [(set (match_operand:SWI48 6 "general_reg_operand")
+	(match_operand:SWI48 7 "memory_operand"))
+   (set (match_operand:SWI48 8 "general_reg_operand")
+	(match_operand:SWI48 9 "memory_operand"))
+   (parallel [(set (reg:CCC FLAGS_REG)
+		   (compare:CCC
+		     (zero_extend:<DWI>
+		       (match_operand:SWI48 0 "general_reg_operand"))
+		     (plus:<DWI>
+		       (match_operator:<DWI> 4 "ix86_carry_flag_operator"
+			 [(match_operand 3 "flags_reg_operand") (const_int 0)])
+		       (zero_extend:<DWI>
+			 (match_operand:SWI48 2 "general_reg_operand")))))
+	      (set (match_dup 0)
+		   (minus:SWI48
+		     (minus:SWI48
+		       (match_dup 0)
+		       (match_operator:SWI48 5 "ix86_carry_flag_operator"
+			 [(match_dup 3) (const_int 0)]))
+		     (match_dup 2)))])
+   (set (match_operand:SWI48 1 "memory_operand") (match_dup 0))]
+  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (4, operands[0])
+   && peep2_reg_dead_p (3, operands[2])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])
+   && !reg_overlap_mentioned_p (operands[2], operands[1])
+   && !reg_overlap_mentioned_p (operands[6], operands[9])
+   && (rtx_equal_p (operands[6], operands[0])
+       ? (rtx_equal_p (operands[7], operands[1])
+	  && rtx_equal_p (operands[8], operands[2]))
+       : (rtx_equal_p (operands[8], operands[0])
+	  && rtx_equal_p (operands[9], operands[1])
+	  && rtx_equal_p (operands[6], operands[2])))"
+  [(set (match_dup 0) (match_dup 9))
+   (parallel [(set (reg:CCC FLAGS_REG)
+		   (compare:CCC
+		     (zero_extend:<DWI> (match_dup 1))
+		     (plus:<DWI> (match_op_dup 4
+				   [(match_dup 3) (const_int 0)])
+				 (zero_extend:<DWI> (match_dup 0)))))
+	      (set (match_dup 1)
+		   (minus:SWI48 (minus:SWI48 (match_dup 1)
+					     (match_op_dup 5
+					       [(match_dup 3) (const_int 0)]))
+				(match_dup 0)))])]
+{
+  if (!rtx_equal_p (operands[6], operands[0]))
+    operands[9] = operands[7];
+})
+
+(define_peephole2
+  [(set (match_operand:SWI48 6 "general_reg_operand")
+	(match_operand:SWI48 7 "memory_operand"))
+   (set (match_operand:SWI48 8 "general_reg_operand")
+	(match_operand:SWI48 9 "memory_operand"))
+   (parallel [(set (reg:CCC FLAGS_REG)
+		   (compare:CCC
+		     (zero_extend:<DWI>
+		       (match_operand:SWI48 0 "general_reg_operand"))
+		     (plus:<DWI>
+		       (match_operator:<DWI> 4 "ix86_carry_flag_operator"
+			 [(match_operand 3 "flags_reg_operand") (const_int 0)])
+		       (zero_extend:<DWI>
+			 (match_operand:SWI48 2 "general_reg_operand")))))
+	      (set (match_dup 0)
+		   (minus:SWI48
+		     (minus:SWI48
+		       (match_dup 0)
+		       (match_operator:SWI48 5 "ix86_carry_flag_operator"
+			 [(match_dup 3) (const_int 0)]))
+		     (match_dup 2)))])
+   (set (match_operand:QI 10 "general_reg_operand")
+	(ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
+   (set (match_operand:SWI48 11 "general_reg_operand")
+	(zero_extend:SWI48 (match_dup 10)))
+   (set (match_operand:SWI48 1 "memory_operand") (match_dup 0))]
+  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (6, operands[0])
+   && peep2_reg_dead_p (3, operands[2])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])
+   && !reg_overlap_mentioned_p (operands[2], operands[1])
+   && !reg_overlap_mentioned_p (operands[6], operands[9])
+   && !reg_overlap_mentioned_p (operands[0], operands[10])
+   && !reg_overlap_mentioned_p (operands[10], operands[1])
+   && !reg_overlap_mentioned_p (operands[0], operands[11])
+   && !reg_overlap_mentioned_p (operands[11], operands[1])
+   && (rtx_equal_p (operands[6], operands[0])
+       ? (rtx_equal_p (operands[7], operands[1])
+	  && rtx_equal_p (operands[8], operands[2]))
+       : (rtx_equal_p (operands[8], operands[0])
+	  && rtx_equal_p (operands[9], operands[1])
+	  && rtx_equal_p (operands[6], operands[2])))"
+  [(set (match_dup 0) (match_dup 9))
+   (parallel [(set (reg:CCC FLAGS_REG)
+		   (compare:CCC
+		     (zero_extend:<DWI> (match_dup 1))
+		     (plus:<DWI> (match_op_dup 4
+				   [(match_dup 3) (const_int 0)])
+				 (zero_extend:<DWI> (match_dup 0)))))
+	      (set (match_dup 1)
+		   (minus:SWI48 (minus:SWI48 (match_dup 1)
+					     (match_op_dup 5
+					       [(match_dup 3) (const_int 0)]))
+				(match_dup 0)))])
+   (set (match_dup 10) (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
+   (set (match_dup 11) (zero_extend:SWI48 (match_dup 10)))]
+{
+  if (!rtx_equal_p (operands[6], operands[0]))
+    operands[9] = operands[7];
+})
+
 (define_expand "subborrow<mode>_0"
   [(parallel
     [(set (reg:CC FLAGS_REG)
@@ -8094,6 +8520,67 @@ (define_expand "subborrow<mode>_0"
      (minus:SWI48 (match_dup 1) (match_dup 2)))])]
  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)")
 
+(define_expand "addc<mode>5"
+  [(match_operand:SWI48 0 "register_operand")
+   (match_operand:SWI48 1 "register_operand")
+   (match_operand:SWI48 2 "register_operand")
+   (match_operand:SWI48 3 "register_operand")
+   (match_operand:SWI48 4 "nonmemory_operand")]
+  ""
+{
+  rtx cf = gen_rtx_REG (CCCmode, FLAGS_REG), pat, pat2;
+  if (operands[4] == const0_rtx)
+    emit_insn (gen_addcarry<mode>_0 (operands[0], operands[2], operands[3]));
+  else
+    {
+      rtx op4 = copy_to_mode_reg (QImode,
+				  convert_to_mode (QImode, operands[4], 1));
+      emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx));
+      pat = gen_rtx_LTU (<DWI>mode, cf, const0_rtx);
+      pat2 = gen_rtx_LTU (<MODE>mode, cf, const0_rtx);
+      emit_insn (gen_addcarry<mode> (operands[0], operands[2], operands[3],
+				     cf, pat, pat2));
+    }
+  rtx cc = gen_reg_rtx (QImode);
+  pat = gen_rtx_LTU (QImode, cf, const0_rtx);
+  emit_insn (gen_rtx_SET (cc, pat));
+  emit_insn (gen_zero_extendqi<mode>2 (operands[1], cc));
+  DONE;
+})
+
+(define_expand "subc<mode>5"
+  [(match_operand:SWI48 0 "register_operand")
+   (match_operand:SWI48 1 "register_operand")
+   (match_operand:SWI48 2 "register_operand")
+   (match_operand:SWI48 3 "register_operand")
+   (match_operand:SWI48 4 "nonmemory_operand")]
+  ""
+{
+  rtx cf, pat, pat2;
+  if (operands[4] == const0_rtx)
+    {
+      cf = gen_rtx_REG (CCmode, FLAGS_REG);
+      emit_insn (gen_subborrow<mode>_0 (operands[0], operands[2],
+					operands[3]));
+    }
+  else
+    {
+      cf = gen_rtx_REG (CCCmode, FLAGS_REG);
+      rtx op4 = copy_to_mode_reg (QImode,
+				  convert_to_mode (QImode, operands[4], 1));
+      emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx));
+      pat = gen_rtx_LTU (<DWI>mode, cf, const0_rtx);
+      pat2 = gen_rtx_LTU (<MODE>mode, cf, const0_rtx);
+      emit_insn (gen_subborrow<mode> (operands[0], operands[2], operands[3],
+				      cf, pat, pat2));
+    }
+  rtx cc = gen_reg_rtx (QImode);
+  pat = gen_rtx_LTU (QImode, cf, const0_rtx);
+  emit_insn (gen_rtx_SET (cc, pat));
+  emit_insn (gen_zero_extendqi<mode>2 (operands[1], cc));
+  DONE;
+})
+
 (define_mode_iterator CC_CCC [CC CCC])
 
 ;; Pre-reload splitter to optimize
@@ -8163,6 +8650,27 @@ (define_peephole2
		   (compare:CCC (plus:SWI (match_dup 1) (match_dup 0))
				(match_dup 1)))
+	      (set (match_dup 1) (plus:SWI (match_dup 1) (match_dup 0)))])])
+
+(define_peephole2
+  [(set (match_operand:SWI 0 "general_reg_operand")
+	(match_operand:SWI 1 "memory_operand"))
+   (parallel [(set (reg:CCC FLAGS_REG)
+		   (compare:CCC
+		     (plus:SWI (match_dup 0)
+			       (match_operand:SWI 2 "memory_operand"))
+		     (match_dup 0)))
+	      (set (match_dup 0) (plus:SWI (match_dup 0) (match_dup 2)))])
+   (set (match_dup 1) (match_dup 0))]
+  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (3, operands[0])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])"
+  [(set (match_dup 0) (match_dup 2))
+   (parallel [(set (reg:CCC FLAGS_REG)
+		   (compare:CCC
+		     (plus:SWI (match_dup 1) (match_dup 0))
+		     (match_dup 1)))
	      (set (match_dup 1) (plus:SWI (match_dup 1) (match_dup 0)))])])
 
 (define_insn "*addsi3_zext_cc_overflow_1"
--- gcc/testsuite/gcc.target/i386/pr79173-1.c.jj	2023-06-06 13:23:03.667319915 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-1.c	2023-06-06 13:53:04.087958943 +0200
@@ -0,0 +1,59 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+addc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+  unsigned long r;
+  unsigned long c1 = __builtin_add_overflow (x, y, &r);
+  unsigned long c2 = __builtin_add_overflow (r, carry_in, &r);
+  *carry_out = c1 + c2;
+  return r;
+}
+
+static unsigned long
+subc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+  unsigned long r;
+  unsigned long c1 = __builtin_sub_overflow (x, y, &r);
+  unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r);
+  *carry_out = c1 + c2;
+  return r;
+}
+
+void
+foo (unsigned long *p, unsigned long *q)
+{
+  unsigned long c;
+  p[0] = addc (p[0], q[0], 0, &c);
+  p[1] = addc (p[1], q[1], c, &c);
+  p[2] = addc (p[2], q[2], c, &c);
+  p[3] = addc (p[3], q[3], c, &c);
+}
+
+void
+bar (unsigned long *p, unsigned long *q)
+{
+  unsigned long c;
+  p[0] = subc (p[0], q[0], 0, &c);
+  p[1] = subc (p[1], q[1], c, &c);
+  p[2] = subc (p[2], q[2], c, &c);
+  p[3] = subc (p[3], q[3], c, &c);
+}
--- gcc/testsuite/gcc.target/i386/pr79173-2.c.jj	2023-06-06 13:23:49.482674416 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-2.c	2023-06-06 13:53:04.088958929 +0200
@@ -0,0 +1,59 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+addc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
+{
+  unsigned long r;
+  _Bool c1 = __builtin_add_overflow (x, y, &r);
+  _Bool c2 = __builtin_add_overflow (r, carry_in, &r);
+  *carry_out = c1 | c2;
+  return r;
+}
+
+static unsigned long
+subc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
+{
+  unsigned long r;
+  _Bool c1 = __builtin_sub_overflow (x, y, &r);
+  _Bool c2 = __builtin_sub_overflow (r, carry_in, &r);
+  *carry_out = c1 | c2;
+  return r;
+}
+
+void
+foo (unsigned long *p, unsigned long *q)
+{
+  _Bool c;
+  p[0] = addc (p[0], q[0], 0, &c);
+  p[1] = addc (p[1], q[1], c, &c);
+  p[2] = addc (p[2], q[2], c, &c);
+  p[3] = addc (p[3], q[3], c, &c);
+}
+
+void
+bar (unsigned long *p, unsigned long *q)
+{
+  _Bool c;
+  p[0] = subc (p[0], q[0], 0, &c);
+  p[1] = subc (p[1], q[1], c, &c);
+  p[2] = subc (p[2], q[2], c, &c);
+  p[3] = subc (p[3], q[3], c, &c);
+}
--- gcc/testsuite/gcc.target/i386/pr79173-3.c.jj	2023-06-06 13:23:52.680629360 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-3.c	2023-06-06 13:53:04.088958929 +0200
@@ -0,0 +1,61 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+addc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+  unsigned long r;
+  unsigned long c1 = __builtin_add_overflow (x, y, &r);
+  unsigned long c2 = __builtin_add_overflow (r, carry_in, &r);
+  *carry_out = c1 + c2;
+  return r;
+}
+
+static unsigned long
+subc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+  unsigned long r;
+  unsigned long c1 = __builtin_sub_overflow (x, y, &r);
+  unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r);
+  *carry_out = c1 + c2;
+  return r;
+}
+
+unsigned long
+foo (unsigned long *p, unsigned long *q)
+{
+  unsigned long c;
+  p[0] = addc (p[0], q[0], 0, &c);
+  p[1] = addc (p[1], q[1], c, &c);
+  p[2] = addc (p[2], q[2], c, &c);
+  p[3] = addc (p[3], q[3], c, &c);
+  return c;
+}
+
+unsigned long
+bar (unsigned long *p, unsigned long *q)
+{
+  unsigned long c;
+  p[0] = subc (p[0], q[0], 0, &c);
+  p[1] = subc (p[1], q[1], c, &c);
+  p[2] = subc (p[2], q[2], c, &c);
+  p[3] = subc (p[3], q[3], c, &c);
+  return c;
+}
--- gcc/testsuite/gcc.target/i386/pr79173-4.c.jj	2023-06-06 13:23:55.895584064 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-4.c	2023-06-06 13:53:04.088958929 +0200
@@ -0,0 +1,61 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+addc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
+{
+  unsigned long r;
+  _Bool c1 = __builtin_add_overflow (x, y, &r);
+  _Bool c2 = __builtin_add_overflow (r, carry_in, &r);
+  *carry_out = c1 ^ c2;
+  return r;
+}
+
+static unsigned long
+subc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
+{
+  unsigned long r;
+  _Bool c1 = __builtin_sub_overflow (x, y, &r);
+  _Bool c2 = __builtin_sub_overflow (r, carry_in, &r);
+  *carry_out = c1 ^ c2;
+  return r;
+}
+
+_Bool
+foo (unsigned long *p, unsigned long *q)
+{
+  _Bool c;
+  p[0] = addc (p[0], q[0], 0, &c);
+  p[1] = addc (p[1], q[1], c, &c);
+  p[2] = addc (p[2], q[2], c, &c);
+  p[3] = addc (p[3], q[3], c, &c);
+  return c;
+}
+
+_Bool
+bar (unsigned long *p, unsigned long *q)
+{
+  _Bool c;
+  p[0] = subc (p[0], q[0], 0, &c);
+  p[1] = subc (p[1], q[1], c, &c);
+  p[2] = subc (p[2], q[2], c, &c);
+  p[3] = subc (p[3], q[3], c, &c);
+  return c;
+}
--- gcc/testsuite/gcc.target/i386/pr79173-5.c.jj	2023-06-06 13:39:52.283111764 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-5.c	2023-06-06 17:33:36.370088539 +0200
@@ -0,0 +1,32 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+addc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+  unsigned long r = x + y;
+  unsigned long c1 = r < x;
+  r += carry_in;
+  unsigned long c2 = r < carry_in;
+  *carry_out = c1 + c2;
+  return r;
+}
+
+void
+foo (unsigned long *p, unsigned long *q)
+{
+  unsigned long c;
+  p[0] = addc (p[0], q[0], 0, &c);
+  p[1] = addc (p[1], q[1], c, &c);
+  p[2] = addc (p[2], q[2], c, &c);
+  p[3] = addc (p[3], q[3], c, &c);
+}
--- gcc/testsuite/gcc.target/i386/pr79173-6.c.jj	2023-06-06 17:34:25.618401505 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-6.c	2023-06-06 17:36:11.248927942 +0200
@@ -0,0 +1,33 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+addc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+  unsigned long r = x + y;
+  unsigned long c1 = r < x;
+  r += carry_in;
+  unsigned long c2 = r < carry_in;
+  *carry_out = c1 + c2;
+  return r;
+}
+
+unsigned long
+foo (unsigned long *p, unsigned long *q)
+{
+  unsigned long c;
+  p[0] = addc (p[0], q[0], 0, &c);
+  p[1] = addc (p[1], q[1], c, &c);
+  p[2] = addc (p[2], q[2], c, &c);
+  p[3] = addc (p[3], q[3], c, &c);
+  return c;
+}
--- gcc/testsuite/gcc.target/i386/pr79173-7.c.jj	2023-06-06 17:49:46.702561308 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-7.c	2023-06-06 17:50:36.364871245 +0200
@@ -0,0 +1,31 @@
+/* PR middle-end/79173 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
+
+#include <x86intrin.h>
+
+void
+foo (unsigned long long *p, unsigned long long *q)
+{
+  unsigned char c = _addcarry_u64 (0, p[0], q[0], &p[0]);
+  c = _addcarry_u64 (c, p[1], q[1], &p[1]);
+  c = _addcarry_u64 (c, p[2], q[2], &p[2]);
+  _addcarry_u64 (c, p[3], q[3], &p[3]);
+}
+
+void
+bar (unsigned long long *p, unsigned long long *q)
+{
+  unsigned char c = _subborrow_u64 (0, p[0], q[0], &p[0]);
+  c = _subborrow_u64 (c, p[1], q[1], &p[1]);
+  c = _subborrow_u64 (c, p[2], q[2], &p[2]);
+  _subborrow_u64 (c, p[3], q[3], &p[3]);
+}
--- gcc/testsuite/gcc.target/i386/pr79173-8.c.jj	2023-06-06 17:50:45.970737772 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-8.c	2023-06-06 17:52:19.564437290 +0200
@@ -0,0 +1,31 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
+
+#include <x86intrin.h>
+
+void
+foo (unsigned int *p, unsigned int *q)
+{
+  unsigned char c = _addcarry_u32 (0, p[0], q[0], &p[0]);
+  c = _addcarry_u32 (c, p[1], q[1], &p[1]);
+  c = _addcarry_u32 (c, p[2], q[2], &p[2]);
+  _addcarry_u32 (c, p[3], q[3], &p[3]);
+}
+
+void
+bar (unsigned int *p, unsigned int *q)
+{
+  unsigned char c = _subborrow_u32 (0, p[0], q[0], &p[0]);
+  c = _subborrow_u32 (c, p[1], q[1], &p[1]);
+  c = _subborrow_u32 (c, p[2], q[2], &p[2]);
+  _subborrow_u32 (c, p[3], q[3], &p[3]);
+}
--- gcc/testsuite/gcc.target/i386/pr79173-9.c.jj	2023-06-06 17:52:35.869210734 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-9.c	2023-06-06 17:53:00.076874369 +0200
@@ -0,0 +1,31 @@
+/* PR middle-end/79173 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
+
+#include <x86intrin.h>
+
+unsigned long long
+foo (unsigned long long *p, unsigned long long *q)
+{
+  unsigned char c = _addcarry_u64 (0, p[0], q[0], &p[0]);
+  c = _addcarry_u64 (c, p[1], q[1], &p[1]);
+  c = _addcarry_u64 (c, p[2], q[2], &p[2]);
+  return _addcarry_u64 (c, p[3], q[3], &p[3]);
+}
+
+unsigned long long
+bar (unsigned long long *p, unsigned long long *q)
+{
+  unsigned char c = _subborrow_u64 (0, p[0], q[0], &p[0]);
+  c = _subborrow_u64 (c, p[1], q[1], &p[1]);
+  c = _subborrow_u64 (c, p[2], q[2], &p[2]);
+  return _subborrow_u64 (c, p[3], q[3], &p[3]);
+}
--- gcc/testsuite/gcc.target/i386/pr79173-10.c.jj	2023-06-06 17:53:29.576464475 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-10.c	2023-06-06 17:53:25.021527762 +0200
@@ -0,0 +1,31 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
+
+#include <x86intrin.h>
+
+unsigned int
+foo (unsigned int *p, unsigned int *q)
+{
+  unsigned char c = _addcarry_u32 (0, p[0], q[0], &p[0]);
+  c = _addcarry_u32 (c, p[1], q[1], &p[1]);
+  c = _addcarry_u32 (c, p[2], q[2], &p[2]);
+  return _addcarry_u32 (c, p[3], q[3], &p[3]);
+}
+
+unsigned int
+bar (unsigned int *p, unsigned int *q)
+{
+  unsigned char c = _subborrow_u32 (0, p[0], q[0], &p[0]);
+  c = _subborrow_u32 (c, p[1], q[1], &p[1]);
+  c = _subborrow_u32 (c, p[2], q[2], &p[2]);
+  return _subborrow_u32 (c, p[3], q[3], &p[3]);
+}

	Jakub