Subject: Re: [PATCH][Aarch64] Add support for overflow add and sub operations
To: Michael Collison, Christophe Lyon
Cc: "gcc-patches@gcc.gnu.org", nd
From: "Richard Earnshaw (lists)"
Date: Wed, 05 Jul 2017 09:38:00 -0000

On 19/05/17 22:11, Michael Collison wrote: > Christophe, > > I had a typo in the two test cases: "addcs" should have been "adcs". I caught this previously but submitted the previous patch incorrectly. Updated patch attached. > > Okay for trunk? > Apologies for the delay responding; I've been procrastinating over this one. 
In part it's due to the size of the patch, with very little top-level description of the motivation and overall approach to the problem. It would really help review if this could be split into multiple patches, with a description of what each stage achieves. Anyway, there are a couple of obvious formatting issues to deal with first, before we get into the details of the patch. > -----Original Message----- > From: Christophe Lyon [mailto:christophe.lyon@linaro.org] > Sent: Friday, May 19, 2017 3:59 AM > To: Michael Collison > Cc: gcc-patches@gcc.gnu.org; nd > Subject: Re: [PATCH][Aarch64] Add support for overflow add and sub operations > > Hi Michael, > > > On 19 May 2017 at 07:12, Michael Collison wrote: >> Hi, >> >> This patch improves code generation for builtin arithmetic overflow operations for the aarch64 backend. For a simple test case such as: >> >> int >> f (int x, int y, int *ovf) >> { >> int res; >> *ovf = __builtin_sadd_overflow (x, y, &res); >> return res; >> } >> >> Current trunk at -O2 generates >> >> f: >> mov w3, w0 >> mov w4, 0 >> add w0, w0, w1 >> tbnz w1, #31, .L4 >> cmp w0, w3 >> blt .L3 >> .L2: >> str w4, [x2] >> ret >> .p2align 3 >> .L4: >> cmp w0, w3 >> ble .L2 >> .L3: >> mov w4, 1 >> b .L2 >> >> >> With the patch this now generates: >> >> f: >> adds w0, w0, w1 >> cset w1, vs >> str w1, [x2] >> ret >> >> >> Original patch from Richard Henderson: >> >> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01903.html >> >> >> Okay for trunk? >> >> 2017-05-17 Michael Collison >> Richard Henderson >> >> * config/aarch64/aarch64-modes.def (CC_V): New. >> * config/aarch64/aarch64-protos.h >> (aarch64_add_128bit_scratch_regs): Declare. >> (aarch64_subv_128bit_scratch_regs): Declare. >> (aarch64_expand_subvti): Declare. >> (aarch64_gen_unlikely_cbranch): Declare. >> * config/aarch64/aarch64.c (aarch64_select_cc_mode): Test >> for signed overflow using CC_Vmode. 
>> (aarch64_get_condition_code_1): Handle CC_Vmode. >> (aarch64_gen_unlikely_cbranch): New function. >> (aarch64_add_128bit_scratch_regs): New function. >> (aarch64_subv_128bit_scratch_regs): New function. >> (aarch64_expand_subvti): New function. >> * config/aarch64/aarch64.md (addv4, uaddv4): New. >> (addti3): Create simpler code if low part is already known to be 0. >> (addvti4, uaddvti4): New. >> (*add3_compareC_cconly_imm): New. >> (*add3_compareC_cconly): New. >> (*add3_compareC_imm): New. >> (*add3_compareC): Rename from add3_compare1; do not >> handle constants within this pattern. >> (*add3_compareV_cconly_imm): New. >> (*add3_compareV_cconly): New. >> (*add3_compareV_imm): New. >> (add3_compareV): New. >> (add3_carryinC, add3_carryinV): New. >> (*add3_carryinC_zero, *add3_carryinV_zero): New. >> (*add3_carryinC, *add3_carryinV): New. >> (subv4, usubv4): New. >> (subti3): Handle op1 zero. >> (subvti4, usubvti4): New. >> (*sub3_compare1_imm): New. >> (sub3_carryinCV): New. >> (*sub3_carryinCV_z1_z2, *sub3_carryinCV_z1): New. >> (*sub3_carryinCV_z2, *sub3_carryinCV): New. >> * testsuite/gcc.target/aarch64/builtin_sadd_128.c: New testcase. >> * testsuite/gcc.target/aarch64/builtin_saddl.c: New testcase. >> * testsuite/gcc.target/aarch64/builtin_saddll.c: New testcase. >> * testsuite/gcc.target/aarch64/builtin_uadd_128.c: New testcase. >> * testsuite/gcc.target/aarch64/builtin_uaddl.c: New testcase. >> * testsuite/gcc.target/aarch64/builtin_uaddll.c: New testcase. >> * testsuite/gcc.target/aarch64/builtin_ssub_128.c: New testcase. >> * testsuite/gcc.target/aarch64/builtin_ssubl.c: New testcase. >> * testsuite/gcc.target/aarch64/builtin_ssubll.c: New testcase. >> * testsuite/gcc.target/aarch64/builtin_usub_128.c: New testcase. >> * testsuite/gcc.target/aarch64/builtin_usubl.c: New testcase. >> * testsuite/gcc.target/aarch64/builtin_usubll.c: New testcase. 
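[For readers following along: the semantics that the quoted f() example relies on can be sketched in plain C. This is a hand-rolled illustration of what __builtin_sadd_overflow computes, not GCC's implementation; note the conversion of an out-of-range unsigned value back to int is implementation-defined in ISO C, though modular with GCC.]

```c
#include <assert.h>
#include <limits.h>

/* Sketch of __builtin_sadd_overflow for int: compute the wrapped sum
   and report whether the mathematical sum overflowed.  Signed overflow
   occurs iff both operands have the same sign and the wrapped result
   has the opposite sign.  */
static int
sadd_overflow_sketch (int x, int y, int *res)
{
  unsigned int ur = (unsigned int) x + (unsigned int) y;
  *res = (int) ur;   /* implementation-defined if out of range; modular on GCC */
  return (x >= 0) == (y >= 0) && (*res >= 0) != (x >= 0);
}
```
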
> > I've tried your patch, and 2 of the new tests FAIL: > gcc.target/aarch64/builtin_sadd_128.c scan-assembler addcs > gcc.target/aarch64/builtin_uadd_128.c scan-assembler addcs > > Am I missing something? > > Thanks, > > Christophe > > > pr6308v2.patch > > > diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def > index 45f7a44..244e490 100644 > --- a/gcc/config/aarch64/aarch64-modes.def > +++ b/gcc/config/aarch64/aarch64-modes.def > @@ -24,6 +24,7 @@ CC_MODE (CC_SWP); > CC_MODE (CC_NZ); /* Only N and Z bits of condition flags are valid. */ > CC_MODE (CC_Z); /* Only Z bit of condition flags is valid. */ > CC_MODE (CC_C); /* Only C bit of condition flags is valid. */ > +CC_MODE (CC_V); /* Only V bit of condition flags is valid. */ > > /* Half-precision floating point for __fp16. */ > FLOAT_MODE (HF, 2, 0); > diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h > index f55d4ba..f38b2b8 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -388,6 +388,18 @@ void aarch64_relayout_simd_types (void); > void aarch64_reset_previous_fndecl (void); > bool aarch64_return_address_signing_enabled (void); > void aarch64_save_restore_target_globals (tree); > +void aarch64_add_128bit_scratch_regs (rtx op1, rtx op2, rtx *low_dest, > + rtx *low_in1, rtx *low_in2, > + rtx *high_dest, rtx *high_in1, > + rtx *high_in2); > +void aarch64_subv_128bit_scratch_regs (rtx op1, rtx op2, rtx *low_dest, > + rtx *low_in1, rtx *low_in2, > + rtx *high_dest, rtx *high_in1, > + rtx *high_in2); > +void aarch64_expand_subvti (rtx op0, rtx low_dest, rtx low_in1, > + rtx low_in2, rtx high_dest, rtx high_in1, > + rtx high_in2); > + It's a little bit inconsistent, but the general style in aarch64-protos.h is not to include parameter names in prototypes, just their types. > > /* Initialize builtins for SIMD intrinsics. 
*/ > void init_aarch64_simd_builtins (void); > @@ -412,6 +424,8 @@ bool aarch64_float_const_representable_p (rtx); > > #if defined (RTX_CODE) > > +void aarch64_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode, > + rtx label_ref); > bool aarch64_legitimate_address_p (machine_mode, rtx, RTX_CODE, bool); > machine_mode aarch64_select_cc_mode (RTX_CODE, rtx, rtx); > rtx aarch64_gen_compare_reg (RTX_CODE, rtx, rtx); > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index f343d92..71a651c 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -4716,6 +4716,13 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y) > && GET_CODE (y) == ZERO_EXTEND) > return CC_Cmode; > > + /* A test for signed overflow. */ > + if ((GET_MODE (x) == DImode || GET_MODE (x) == TImode) > + && code == NE > + && GET_CODE (x) == PLUS > + && GET_CODE (y) == SIGN_EXTEND) > + return CC_Vmode; > + > /* For everything else, return CCmode. */ > return CCmode; > } > @@ -4822,6 +4829,15 @@ aarch64_get_condition_code_1 (enum machine_mode mode, enum rtx_code comp_code) > } > break; > > + case CC_Vmode: > + switch (comp_code) > + { > + case NE: return AARCH64_VS; > + case EQ: return AARCH64_VC; > + default: return -1; > + } > + break; > + > default: > return -1; > } > @@ -13630,6 +13646,88 @@ aarch64_split_dimode_const_store (rtx dst, rtx src) > return true; > } > > +/* Generate RTL for a conditional branch with rtx comparison CODE in > + mode CC_MODE. The destination of the unlikely conditional branch > + is LABEL_REF. 
*/ > + > +void > +aarch64_gen_unlikely_cbranch (enum rtx_code code, machine_mode cc_mode, > + rtx label_ref) > +{ > + rtx x; > + x = gen_rtx_fmt_ee (code, VOIDmode, > + gen_rtx_REG (cc_mode, CC_REGNUM), > + const0_rtx); > + > + x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, > + gen_rtx_LABEL_REF (VOIDmode, label_ref), > + pc_rtx); > + aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x)); > +} > + > +void aarch64_add_128bit_scratch_regs (rtx op1, rtx op2, rtx *low_dest, Function names must start in column 1, with the return type on the preceding line. All functions should have a top-level comment describing what they do (their contract with the caller). > + rtx *low_in1, rtx *low_in2, > + rtx *high_dest, rtx *high_in1, > + rtx *high_in2) > +{ > + *low_dest = gen_reg_rtx (DImode); > + *low_in1 = gen_lowpart (DImode, op1); > + *low_in2 = simplify_gen_subreg (DImode, op2, TImode, > + subreg_lowpart_offset (DImode, TImode)); > + *high_dest = gen_reg_rtx (DImode); > + *high_in1 = gen_highpart (DImode, op1); > + *high_in2 = simplify_gen_subreg (DImode, op2, TImode, > + subreg_highpart_offset (DImode, TImode)); > +} > + > +void aarch64_subv_128bit_scratch_regs (rtx op1, rtx op2, rtx *low_dest, Same here. > + rtx *low_in1, rtx *low_in2, > + rtx *high_dest, rtx *high_in1, > + rtx *high_in2) > +{ > + *low_dest = gen_reg_rtx (DImode); > + *low_in1 = simplify_gen_subreg (DImode, op1, TImode, > + subreg_lowpart_offset (DImode, TImode)); > + *low_in2 = simplify_gen_subreg (DImode, op2, TImode, > + subreg_lowpart_offset (DImode, TImode)); > + *high_dest = gen_reg_rtx (DImode); > + *high_in1 = simplify_gen_subreg (DImode, op1, TImode, > + subreg_highpart_offset (DImode, TImode)); > + *high_in2 = simplify_gen_subreg (DImode, op2, TImode, > + subreg_highpart_offset (DImode, TImode)); > + > +} > + > +void aarch64_expand_subvti (rtx op0, rtx low_dest, rtx low_in1, And here. 
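[To make the contract of aarch64_expand_subvti concrete: the subs/sbcs sequence it emits in the general case corresponds to this limb-wise subtraction. This is an illustrative C sketch using GCC's unsigned __int128 extension, not the backend code itself.]

```c
/* 128-bit subtraction decomposed into 64-bit limbs: subtract the low
   halves, then subtract the high halves together with the borrow.
   (The AArch64 C flag after "subs" is the inverted borrow: C=1 means
   no borrow occurred, which is why "sbcs" subtracts NOT(C).)  */
typedef unsigned __int128 u128;
typedef unsigned long long u64;

static u128
sub128_sketch (u128 a, u128 b)
{
  u64 al = (u64) a, ah = (u64) (a >> 64);
  u64 bl = (u64) b, bh = (u64) (b >> 64);
  u64 borrow = al < bl;          /* borrow out of the low limb        */
  u64 rl = al - bl;              /* subs: low-half difference         */
  u64 rh = ah - bh - borrow;     /* sbcs: high half minus the borrow  */
  return ((u128) rh << 64) | rl;
}
```
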
> + rtx low_in2, rtx high_dest, rtx high_in1, > + rtx high_in2) > +{ > + if (low_in2 == const0_rtx) > + { > + low_dest = low_in1; > + emit_insn (gen_subdi3_compare1 (high_dest, high_in1, > + force_reg (DImode, high_in2))); > + } > + else > + { > + if (CONST_INT_P (low_in2)) > + { > + low_in2 = force_reg (DImode, GEN_INT (-UINTVAL (low_in2))); > + high_in2 = force_reg (DImode, high_in2); > + emit_insn (gen_adddi3_compareC (low_dest, low_in1, low_in2)); > + } > + else > + emit_insn (gen_subdi3_compare1 (low_dest, low_in1, low_in2)); > + emit_insn (gen_subdi3_carryinCV (high_dest, > + force_reg (DImode, high_in1), > + high_in2)); > + } > + > + emit_move_insn (gen_lowpart (DImode, op0), low_dest); > + emit_move_insn (gen_highpart (DImode, op0), high_dest); > + > +} > + > /* Implement the TARGET_ASAN_SHADOW_OFFSET hook. */ > > static unsigned HOST_WIDE_INT > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index a693a3b..3976ecb 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -1711,25 +1711,123 @@ > } > ) > > +(define_expand "addv4" > + [(match_operand:GPI 0 "register_operand") > + (match_operand:GPI 1 "register_operand") > + (match_operand:GPI 2 "register_operand") > + (match_operand 3 "")] > + "" > +{ > + emit_insn (gen_add3_compareV (operands[0], operands[1], operands[2])); > + aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]); > + > + DONE; > +}) > + > +(define_expand "uaddv4" > + [(match_operand:GPI 0 "register_operand") > + (match_operand:GPI 1 "register_operand") > + (match_operand:GPI 2 "register_operand") > + (match_operand 3 "")] With no rtl in the expand to describe this pattern, it really should have a top-level comment explaining the arguments (reference to the manual is probably OK in this case). 
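[For reference, the condition the uaddv expander tests reduces to the usual wraparound check: after "adds", the C flag is set exactly when the unsigned result wrapped, i.e. when it compares below either operand. A sketch, not the generated code:]

```c
/* Unsigned add overflow: the sum wraps modulo 2^64 iff the result is
   smaller than (either) operand -- the same condition the C flag
   reports after an "adds" instruction.  */
static int
uadd_overflow_sketch (unsigned long long x, unsigned long long y,
                      unsigned long long *res)
{
  *res = x + y;        /* wraps modulo 2^64          */
  return *res < x;     /* carry out iff it wrapped   */
}
```
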
> + "" > +{ > + emit_insn (gen_add3_compareC (operands[0], operands[1], operands[2])); > + aarch64_gen_unlikely_cbranch (NE, CC_Cmode, operands[3]); > + > + DONE; > +}) > + > + > (define_expand "addti3" > [(set (match_operand:TI 0 "register_operand" "") > (plus:TI (match_operand:TI 1 "register_operand" "") > - (match_operand:TI 2 "register_operand" "")))] > + (match_operand:TI 2 "aarch64_reg_or_imm" "")))] > "" > { > - rtx low = gen_reg_rtx (DImode); > - emit_insn (gen_adddi3_compareC (low, gen_lowpart (DImode, operands[1]), > - gen_lowpart (DImode, operands[2]))); > + rtx l0,l1,l2,h0,h1,h2; > > - rtx high = gen_reg_rtx (DImode); > - emit_insn (gen_adddi3_carryin (high, gen_highpart (DImode, operands[1]), > - gen_highpart (DImode, operands[2]))); > + aarch64_add_128bit_scratch_regs (operands[1], operands[2], > + &l0, &l1, &l2, &h0, &h1, &h2); > + > + if (l2 == const0_rtx) > + { > + l0 = l1; > + if (!aarch64_pluslong_operand (h2, DImode)) > + h2 = force_reg (DImode, h2); > + emit_insn (gen_adddi3 (h0, h1, h2)); > + } > + else > + { > + emit_insn (gen_adddi3_compareC (l0, l1, force_reg (DImode, l2))); > + emit_insn (gen_adddi3_carryin (h0, h1, force_reg (DImode, h2))); > + } > + > + emit_move_insn (gen_lowpart (DImode, operands[0]), l0); > + emit_move_insn (gen_highpart (DImode, operands[0]), h0); > > - emit_move_insn (gen_lowpart (DImode, operands[0]), low); > - emit_move_insn (gen_highpart (DImode, operands[0]), high); > DONE; > }) > > +(define_expand "addvti4" > + [(match_operand:TI 0 "register_operand" "") > + (match_operand:TI 1 "register_operand" "") > + (match_operand:TI 2 "aarch64_reg_or_imm" "") > + (match_operand 3 "")] Same here. 
> + "" > +{ > + rtx l0,l1,l2,h0,h1,h2; > + > + aarch64_add_128bit_scratch_regs (operands[1], operands[2], > + &l0, &l1, &l2, &h0, &h1, &h2); > + > + if (l2 == const0_rtx) > + { > + l0 = l1; > + emit_insn (gen_adddi3_compareV (h0, h1, force_reg (DImode, h2))); > + } > + else > + { > + emit_insn (gen_adddi3_compareC (l0, l1, force_reg (DImode, l2))); > + emit_insn (gen_adddi3_carryinV (h0, h1, force_reg (DImode, h2))); > + } > + > + emit_move_insn (gen_lowpart (DImode, operands[0]), l0); > + emit_move_insn (gen_highpart (DImode, operands[0]), h0); > + > + aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]); > + DONE; > +}) > + > +(define_expand "uaddvti4" > + [(match_operand:TI 0 "register_operand" "") > + (match_operand:TI 1 "register_operand" "") > + (match_operand:TI 2 "aarch64_reg_or_imm" "") > + (match_operand 3 "")] > + "" > +{ > + rtx l0,l1,l2,h0,h1,h2; > + > + aarch64_add_128bit_scratch_regs (operands[1], operands[2], > + &l0, &l1, &l2, &h0, &h1, &h2); > + > + if (l2 == const0_rtx) > + { > + l0 = l1; > + emit_insn (gen_adddi3_compareC (h0, h1, force_reg (DImode, h2))); > + } > + else > + { > + emit_insn (gen_adddi3_compareC (l0, l1, force_reg (DImode, l2))); > + emit_insn (gen_adddi3_carryinC (h0, h1, force_reg (DImode, h2))); > + } > + > + emit_move_insn (gen_lowpart (DImode, operands[0]), l0); > + emit_move_insn (gen_highpart (DImode, operands[0]), h0); > + > + aarch64_gen_unlikely_cbranch (NE, CC_Cmode, operands[3]); > + DONE; > + }) > + > (define_insn "add3_compare0" > [(set (reg:CC_NZ CC_REGNUM) > (compare:CC_NZ > @@ -1828,10 +1926,70 @@ > [(set_attr "type" "alus_sreg")] > ) > > +;; Note that since we're sign-extending, match the immediate in GPI > +;; rather than in DWI. Since CONST_INT is modeless, this works fine. 
> +(define_insn "*add3_compareV_cconly_imm" > + [(set (reg:CC_V CC_REGNUM) > + (ne:CC_V > + (plus: > + (sign_extend: (match_operand:GPI 0 "register_operand" "r,r")) > + (match_operand:GPI 1 "aarch64_plus_immediate" "I,J")) > + (sign_extend: (plus:GPI (match_dup 0) (match_dup 1)))))] > + "" > + "@ > + cmn\\t%0, %1 > + cmp\\t%0, #%n1" > + [(set_attr "type" "alus_imm")] > +) > + > +(define_insn "*add3_compareV_cconly" > + [(set (reg:CC_V CC_REGNUM) > + (ne:CC_V Use of ne is wrong here. The condition register should be set to the result of a compare rtl construct. The same applies elsewhere within this patch. NE is then used on the result of the comparison. The mode of the compare then indicates what might or might not be valid in the way the comparison is finally constructed. Note that this issue may go back to the earlier patches that this is based on, but those are equally incorrect and will need fixing as well at some point. We shouldn't perpetuate the issue. > + (plus: > + (sign_extend: (match_operand:GPI 0 "register_operand" "r")) > + (sign_extend: (match_operand:GPI 1 "register_operand" "r"))) > + (sign_extend: (plus:GPI (match_dup 0) (match_dup 1)))))] > + "" > + "cmn\\t%0, %1" > + [(set_attr "type" "alus_sreg")] > +) > + > +(define_insn "*add3_compareV_imm" > + [(set (reg:CC_V CC_REGNUM) > + (ne:CC_V > + (plus: > + (sign_extend: > + (match_operand:GPI 1 "register_operand" "r,r")) > + (match_operand:GPI 2 "aarch64_plus_immediate" "I,J")) > + (sign_extend: > + (plus:GPI (match_dup 1) (match_dup 2))))) > + (set (match_operand:GPI 0 "register_operand" "=r,r") > + (plus:GPI (match_dup 1) (match_dup 2)))] > + "" > + "@ > + adds\\t%0, %1, %2 > + subs\\t%0, %1, #%n2" > + [(set_attr "type" "alus_imm,alus_imm")] > +) > + > +(define_insn "add3_compareV" > + [(set (reg:CC_V CC_REGNUM) > + (ne:CC_V > + (plus: > + (sign_extend: (match_operand:GPI 1 "register_operand" "r")) > + (sign_extend: (match_operand:GPI 2 "register_operand" "r"))) > + (sign_extend: (plus:GPI (match_dup 
1) (match_dup 2))))) > + (set (match_operand:GPI 0 "register_operand" "=r") > + (plus:GPI (match_dup 1) (match_dup 2)))] > + "" > + "adds\\t%0, %1, %2" > + [(set_attr "type" "alus_sreg")] > +) > + > (define_insn "*adds_shift_imm_" > [(set (reg:CC_NZ CC_REGNUM) > (compare:CC_NZ > - (plus:GPI (ASHIFT:GPI > + (plus:GPI (ASHIFT:GPI > (match_operand:GPI 1 "register_operand" "r") > (match_operand:QI 2 "aarch64_shift_imm_" "n")) > (match_operand:GPI 3 "register_operand" "r")) > @@ -2187,6 +2345,138 @@ > [(set_attr "type" "adc_reg")] > ) > > +(define_expand "add3_carryinC" > + [(parallel > + [(set (match_dup 3) > + (ne:CC_C > + (plus: > + (plus: > + (match_dup 4) > + (zero_extend: > + (match_operand:GPI 1 "register_operand" "r"))) > + (zero_extend: > + (match_operand:GPI 2 "register_operand" "r"))) > + (zero_extend: > + (plus:GPI > + (plus:GPI (match_dup 5) (match_dup 1)) > + (match_dup 2))))) > + (set (match_operand:GPI 0 "register_operand") > + (plus:GPI > + (plus:GPI (match_dup 5) (match_dup 1)) > + (match_dup 2)))])] > + "" > +{ > + operands[3] = gen_rtx_REG (CC_Cmode, CC_REGNUM); > + operands[4] = gen_rtx_NE (mode, operands[3], const0_rtx); > + operands[5] = gen_rtx_NE (mode, operands[3], const0_rtx); > +}) > + > +(define_insn "*add3_carryinC_zero" > + [(set (reg:CC_C CC_REGNUM) > + (ne:CC_C > + (plus: > + (match_operand: 2 "aarch64_carry_operation" "") > + (zero_extend: (match_operand:GPI 1 "register_operand" "r"))) > + (zero_extend: > + (plus:GPI > + (match_operand:GPI 3 "aarch64_carry_operation" "") > + (match_dup 1))))) > + (set (match_operand:GPI 0 "register_operand") > + (plus:GPI (match_dup 3) (match_dup 1)))] > + "" > + "adcs\\t%0, %1, zr" > + [(set_attr "type" "adc_reg")] > +) > + > +(define_insn "*add3_carryinC" > + [(set (reg:CC_C CC_REGNUM) > + (ne:CC_C > + (plus: > + (plus: > + (match_operand: 3 "aarch64_carry_operation" "") > + (zero_extend: (match_operand:GPI 1 "register_operand" "r"))) > + (zero_extend: (match_operand:GPI 2 "register_operand" "r"))) > 
+ (zero_extend: > + (plus:GPI > + (plus:GPI > + (match_operand:GPI 4 "aarch64_carry_operation" "") > + (match_dup 1)) > + (match_dup 2))))) > + (set (match_operand:GPI 0 "register_operand") > + (plus:GPI > + (plus:GPI (match_dup 4) (match_dup 1)) > + (match_dup 2)))] > + "" > + "adcs\\t%0, %1, %2" > + [(set_attr "type" "adc_reg")] > +) > + > +(define_expand "add3_carryinV" > + [(parallel > + [(set (reg:CC_V CC_REGNUM) > + (ne:CC_V > + (plus: > + (plus: > + (match_dup 3) > + (sign_extend: > + (match_operand:GPI 1 "register_operand" "r"))) > + (sign_extend: > + (match_operand:GPI 2 "register_operand" "r"))) > + (sign_extend: > + (plus:GPI > + (plus:GPI (match_dup 4) (match_dup 1)) > + (match_dup 2))))) > + (set (match_operand:GPI 0 "register_operand") > + (plus:GPI > + (plus:GPI (match_dup 4) (match_dup 1)) > + (match_dup 2)))])] > + "" > +{ > + rtx cc = gen_rtx_REG (CC_Cmode, CC_REGNUM); > + operands[3] = gen_rtx_NE (mode, cc, const0_rtx); > + operands[4] = gen_rtx_NE (mode, cc, const0_rtx); > +}) > + > +(define_insn "*add3_carryinV_zero" > + [(set (reg:CC_V CC_REGNUM) > + (ne:CC_V > + (plus: > + (match_operand: 2 "aarch64_carry_operation" "") > + (sign_extend: (match_operand:GPI 1 "register_operand" "r"))) > + (sign_extend: > + (plus:GPI > + (match_operand:GPI 3 "aarch64_carry_operation" "") > + (match_dup 1))))) > + (set (match_operand:GPI 0 "register_operand") > + (plus:GPI (match_dup 3) (match_dup 1)))] > + "" > + "adcs\\t%0, %1, zr" > + [(set_attr "type" "adc_reg")] > +) > + > +(define_insn "*add3_carryinV" > + [(set (reg:CC_V CC_REGNUM) > + (ne:CC_V > + (plus: > + (plus: > + (match_operand: 3 "aarch64_carry_operation" "") > + (sign_extend: (match_operand:GPI 1 "register_operand" "r"))) > + (sign_extend: (match_operand:GPI 2 "register_operand" "r"))) > + (sign_extend: > + (plus:GPI > + (plus:GPI > + (match_operand:GPI 4 "aarch64_carry_operation" "") > + (match_dup 1)) > + (match_dup 2))))) > + (set (match_operand:GPI 0 "register_operand") > + (plus:GPI > + 
(plus:GPI (match_dup 4) (match_dup 1)) > + (match_dup 2)))] > + "" > + "adcs\\t%0, %1, %2" > + [(set_attr "type" "adc_reg")] > +) > + > (define_insn "*add_uxt_shift2" > [(set (match_operand:GPI 0 "register_operand" "=rk") > (plus:GPI (and:GPI > @@ -2283,22 +2573,86 @@ > (set_attr "simd" "*,yes")] > ) > > +(define_expand "subv4" > + [(match_operand:GPI 0 "register_operand") > + (match_operand:GPI 1 "aarch64_reg_or_zero") > + (match_operand:GPI 2 "aarch64_reg_or_zero") > + (match_operand 3 "")] > + "" > +{ > + emit_insn (gen_sub3_compare1 (operands[0], operands[1], operands[2])); > + aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]); > + > + DONE; > +}) > + > +(define_expand "usubv4" > + [(match_operand:GPI 0 "register_operand") > + (match_operand:GPI 1 "aarch64_reg_or_zero") > + (match_operand:GPI 2 "aarch64_reg_or_zero") > + (match_operand 3 "")] > + "" > +{ > + emit_insn (gen_sub3_compare1 (operands[0], operands[1], operands[2])); > + aarch64_gen_unlikely_cbranch (LTU, CCmode, operands[3]); > + > + DONE; > +}) > + > (define_expand "subti3" > [(set (match_operand:TI 0 "register_operand" "") > - (minus:TI (match_operand:TI 1 "register_operand" "") > + (minus:TI (match_operand:TI 1 "aarch64_reg_or_zero" "") > (match_operand:TI 2 "register_operand" "")))] > "" > { > - rtx low = gen_reg_rtx (DImode); > - emit_insn (gen_subdi3_compare1 (low, gen_lowpart (DImode, operands[1]), > - gen_lowpart (DImode, operands[2]))); > + rtx l0 = gen_reg_rtx (DImode); > + rtx l1 = simplify_gen_subreg (DImode, operands[1], TImode, > + subreg_lowpart_offset (DImode, TImode)); > + rtx l2 = gen_lowpart (DImode, operands[2]); > + rtx h0 = gen_reg_rtx (DImode); > + rtx h1 = simplify_gen_subreg (DImode, operands[1], TImode, > + subreg_highpart_offset (DImode, TImode)); > + rtx h2 = gen_highpart (DImode, operands[2]); > > - rtx high = gen_reg_rtx (DImode); > - emit_insn (gen_subdi3_carryin (high, gen_highpart (DImode, operands[1]), > - gen_highpart (DImode, operands[2]))); > + emit_insn 
(gen_subdi3_compare1 (l0, l1, l2)); > + emit_insn (gen_subdi3_carryin (h0, h1, h2)); > > - emit_move_insn (gen_lowpart (DImode, operands[0]), low); > - emit_move_insn (gen_highpart (DImode, operands[0]), high); > + emit_move_insn (gen_lowpart (DImode, operands[0]), l0); > + emit_move_insn (gen_highpart (DImode, operands[0]), h0); > + DONE; > +}) > + > +(define_expand "subvti4" > + [(match_operand:TI 0 "register_operand") > + (match_operand:TI 1 "aarch64_reg_or_zero") > + (match_operand:TI 2 "aarch64_reg_or_imm") > + (match_operand 3 "")] > + "" > +{ > + rtx l0,l1,l2,h0,h1,h2; > + > + aarch64_subv_128bit_scratch_regs (operands[1], operands[2], > + &l0, &l1, &l2, &h0, &h1, &h2); > + aarch64_expand_subvti (operands[0], l0, l1, l2, h0, h1, h2); > + > + aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]); > + DONE; > +}) > + > +(define_expand "usubvti4" > + [(match_operand:TI 0 "register_operand") > + (match_operand:TI 1 "aarch64_reg_or_zero") > + (match_operand:TI 2 "aarch64_reg_or_imm") > + (match_operand 3 "")] > + "" > +{ > + rtx l0,l1,l2,h0,h1,h2; > + > + aarch64_subv_128bit_scratch_regs (operands[1], operands[2], > + &l0, &l1, &l2, &h0, &h1, &h2); > + aarch64_expand_subvti (operands[0], l0, l1, l2, h0, h1, h2); > + > + aarch64_gen_unlikely_cbranch (LTU, CCmode, operands[3]); > DONE; > }) > > @@ -2327,6 +2681,22 @@ > [(set_attr "type" "alus_sreg")] > ) > > +(define_insn "*sub3_compare1_imm" > + [(set (reg:CC CC_REGNUM) > + (compare:CC > + (match_operand:GPI 1 "aarch64_reg_or_zero" "rZ,rZ") > + (match_operand:GPI 2 "aarch64_plus_immediate" "I,J"))) > + (set (match_operand:GPI 0 "register_operand" "=r,r") > + (plus:GPI > + (match_dup 1) > + (match_operand:GPI 3 "aarch64_plus_immediate" "J,I")))] > + "UINTVAL (operands[2]) == -UINTVAL (operands[3])" > + "@ > + subs\\t%0, %1, %2 > + adds\\t%0, %1, %3" > + [(set_attr "type" "alus_imm")] > +) > + > (define_insn "sub3_compare1" > [(set (reg:CC CC_REGNUM) > (compare:CC > @@ -2554,6 +2924,85 @@ > [(set_attr "type" 
"adc_reg")] > ) > > +(define_expand "sub3_carryinCV" > + [(parallel > + [(set (reg:CC CC_REGNUM) > + (compare:CC > + (sign_extend: > + (match_operand:GPI 1 "aarch64_reg_or_zero" "rZ")) > + (plus: > + (sign_extend: > + (match_operand:GPI 2 "register_operand" "r")) > + (ltu: (reg:CC CC_REGNUM) (const_int 0))))) > + (set (match_operand:GPI 0 "register_operand" "=r") > + (minus:GPI > + (minus:GPI (match_dup 1) (match_dup 2)) > + (ltu:GPI (reg:CC CC_REGNUM) (const_int 0))))])] > + "" > +) > + > +(define_insn "*sub3_carryinCV_z1_z2" > + [(set (reg:CC CC_REGNUM) > + (compare:CC > + (const_int 0) > + (match_operand: 2 "aarch64_borrow_operation" ""))) > + (set (match_operand:GPI 0 "register_operand" "=r") > + (neg:GPI (match_operand:GPI 1 "aarch64_borrow_operation" "")))] > + "" > + "sbcs\\t%0, zr, zr" > + [(set_attr "type" "adc_reg")] > +) > + > +(define_insn "*sub3_carryinCV_z1" > + [(set (reg:CC CC_REGNUM) > + (compare:CC > + (const_int 0) > + (plus: > + (sign_extend: > + (match_operand:GPI 1 "register_operand" "r")) > + (match_operand: 2 "aarch64_borrow_operation" "")))) > + (set (match_operand:GPI 0 "register_operand" "=r") > + (minus:GPI > + (neg:GPI (match_dup 1)) > + (match_operand:GPI 3 "aarch64_borrow_operation" "")))] > + "" > + "sbcs\\t%0, zr, %1" > + [(set_attr "type" "adc_reg")] > +) > + > +(define_insn "*sub3_carryinCV_z2" > + [(set (reg:CC CC_REGNUM) > + (compare:CC > + (sign_extend: > + (match_operand:GPI 1 "register_operand" "r")) > + (match_operand: 2 "aarch64_borrow_operation" ""))) > + (set (match_operand:GPI 0 "register_operand" "=r") > + (minus:GPI > + (match_dup 1) > + (match_operand:GPI 3 "aarch64_borrow_operation" "")))] > + "" > + "sbcs\\t%0, %1, zr" > + [(set_attr "type" "adc_reg")] > +) > + > +(define_insn "*sub3_carryinCV" > + [(set (reg:CC CC_REGNUM) > + (compare:CC > + (sign_extend: > + (match_operand:GPI 1 "register_operand" "r")) > + (plus: > + (sign_extend: > + (match_operand:GPI 2 "register_operand" "r")) > + (match_operand: 3 
"aarch64_borrow_operation" "")))) > + (set (match_operand:GPI 0 "register_operand" "=r") > + (minus:GPI > + (minus:GPI (match_dup 1) (match_dup 2)) > + (match_operand:GPI 4 "aarch64_borrow_operation" "")))] > + "" > + "sbcs\\t%0, %1, %2" > + [(set_attr "type" "adc_reg")] > +) > + > (define_insn "*sub_uxt_shift2" > [(set (match_operand:GPI 0 "register_operand" "=rk") > (minus:GPI (match_operand:GPI 4 "register_operand" "rk") > diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_sadd_128.c b/gcc/testsuite/gcc.target/aarch64/builtin_sadd_128.c > new file mode 100644 > index 0000000..0b31500 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/builtin_sadd_128.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +extern void overflow_handler (); > + > +__int128 overflow_add (__int128 x, __int128 y) > +{ > + __int128 r; > + > + int ovr = __builtin_add_overflow (x, y, &r); > + if (ovr) > + overflow_handler (); > + > + return r; > +} > + > +/* { dg-final { scan-assembler "adds" } } */ > +/* { dg-final { scan-assembler "adcs" } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_saddl.c b/gcc/testsuite/gcc.target/aarch64/builtin_saddl.c > new file mode 100644 > index 0000000..9768a98 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/builtin_saddl.c > @@ -0,0 +1,17 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +extern void overflow_handler (); > + > +long overflow_add (long x, long y) > +{ > + long r; > + > + int ovr = __builtin_saddl_overflow (x, y, &r); > + if (ovr) > + overflow_handler (); > + > + return r; > +} > + > +/* { dg-final { scan-assembler "adds" } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_saddll.c b/gcc/testsuite/gcc.target/aarch64/builtin_saddll.c > new file mode 100644 > index 0000000..126a526 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/builtin_saddll.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +extern void 
overflow_handler ();
> +
> +long long overflow_add (long long x, long long y)
> +{
> +  long long r;
> +
> +  int ovr = __builtin_saddll_overflow (x, y, &r);
> +  if (ovr)
> +    overflow_handler ();
> +
> +  return r;
> +}
> +
> +/* { dg-final { scan-assembler "adds" } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_ssub_128.c b/gcc/testsuite/gcc.target/aarch64/builtin_ssub_128.c
> new file mode 100644
> index 0000000..c1261e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/builtin_ssub_128.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +extern void overflow_handler ();
> +
> +__int128 overflow_sub (__int128 x, __int128 y)
> +{
> +  __int128 r;
> +
> +  int ovr = __builtin_sub_overflow (x, y, &r);
> +  if (ovr)
> +    overflow_handler ();
> +
> +  return r;
> +}
> +
> +/* { dg-final { scan-assembler "subs" } } */
> +/* { dg-final { scan-assembler "sbcs" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_ssubl.c b/gcc/testsuite/gcc.target/aarch64/builtin_ssubl.c
> new file mode 100644
> index 0000000..1040464
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/builtin_ssubl.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +extern void overflow_handler ();
> +
> +long overflow_sub (long x, long y)
> +{
> +  long r;
> +
> +  int ovr = __builtin_ssubl_overflow (x, y, &r);
> +  if (ovr)
> +    overflow_handler ();
> +
> +  return r;
> +}
> +
> +/* { dg-final { scan-assembler "subs" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_ssubll.c b/gcc/testsuite/gcc.target/aarch64/builtin_ssubll.c
> new file mode 100644
> index 0000000..a03df88
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/builtin_ssubll.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +extern void overflow_handler ();
> +
> +long long overflow_sub (long long x, long long y)
> +{
> +  long long r;
> +
> +  int ovr = __builtin_ssubll_overflow (x, y, &r);
> +  if (ovr)
> +    overflow_handler ();
> +
> +  return r;
> +}
> +
> +/* { dg-final { scan-assembler "subs" } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_uadd_128.c b/gcc/testsuite/gcc.target/aarch64/builtin_uadd_128.c
> new file mode 100644
> index 0000000..c573c2a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/builtin_uadd_128.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +extern void overflow_handler ();
> +
> +unsigned __int128 overflow_add (unsigned __int128 x, unsigned __int128 y)
> +{
> +  unsigned __int128 r;
> +
> +  int ovr = __builtin_add_overflow (x, y, &r);
> +  if (ovr)
> +    overflow_handler ();
> +
> +  return r;
> +}
> +
> +/* { dg-final { scan-assembler "adds" } } */
> +/* { dg-final { scan-assembler "adcs" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_uaddl.c b/gcc/testsuite/gcc.target/aarch64/builtin_uaddl.c
> new file mode 100644
> index 0000000..e325591
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/builtin_uaddl.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +extern void overflow_handler ();
> +
> +unsigned long overflow_add (unsigned long x, unsigned long y)
> +{
> +  unsigned long r;
> +
> +  int ovr = __builtin_uaddl_overflow (x, y, &r);
> +  if (ovr)
> +    overflow_handler ();
> +
> +  return r;
> +}
> +
> +/* { dg-final { scan-assembler "adds" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_uaddll.c b/gcc/testsuite/gcc.target/aarch64/builtin_uaddll.c
> new file mode 100644
> index 0000000..5f42886
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/builtin_uaddll.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +extern void overflow_handler ();
> +
> +unsigned long long overflow_add (unsigned long long x, unsigned long long y)
> +{
> +  unsigned long long r;
> +
> +  int ovr = __builtin_uaddll_overflow (x, y, &r);
> +  if (ovr)
> +    overflow_handler ();
> +
> +  return r;
> +}
> +
> +/* { dg-final { scan-assembler "adds" } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_usub_128.c b/gcc/testsuite/gcc.target/aarch64/builtin_usub_128.c
> new file mode 100644
> index 0000000..a84f4a4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/builtin_usub_128.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +extern void overflow_handler ();
> +
> +unsigned __int128 overflow_sub (unsigned __int128 x, unsigned __int128 y)
> +{
> +  unsigned __int128 r;
> +
> +  int ovr = __builtin_sub_overflow (x, y, &r);
> +  if (ovr)
> +    overflow_handler ();
> +
> +  return r;
> +}
> +
> +/* { dg-final { scan-assembler "subs" } } */
> +/* { dg-final { scan-assembler "sbcs" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_usubl.c b/gcc/testsuite/gcc.target/aarch64/builtin_usubl.c
> new file mode 100644
> index 0000000..ed033da
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/builtin_usubl.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +extern void overflow_handler ();
> +
> +unsigned long overflow_sub (unsigned long x, unsigned long y)
> +{
> +  unsigned long r;
> +
> +  int ovr = __builtin_usubl_overflow (x, y, &r);
> +  if (ovr)
> +    overflow_handler ();
> +
> +  return r;
> +}
> +
> +/* { dg-final { scan-assembler "subs" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_usubll.c b/gcc/testsuite/gcc.target/aarch64/builtin_usubll.c
> new file mode 100644
> index 0000000..a742f0c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/builtin_usubll.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +extern void overflow_handler ();
> +
> +unsigned long long overflow_sub (unsigned long long x, unsigned long long y)
> +{
> +  unsigned long long r;
> +
> +  int ovr = __builtin_usubll_overflow (x, y, &r);
> +  if (ovr)
> +    overflow_handler ();
> +
> +  return r;
> +}
> +
> +/* { dg-final { scan-assembler "subs" } } */
> +
>