Hello!

Attached patch improves code generation for _addcarry_* and
_subborrow_* intrinsics considerably. For the following testcase:

--cut here--
typedef unsigned long long u64;

u64 testcarry_u64 (u64 a, u64 b, u64 c, u64 d)
{
  u64 result0, result1;

  __builtin_ia32_addcarryx_u64
    (__builtin_ia32_addcarryx_u64 (0, a, c, &result0), b, d, &result1);

  return result0 ^ result1;
}
--cut here--

the unpatched compiler generates:

        xorl    %r8d, %r8d
        addb    $-1, %r8b
        adcq    %rdi, %rdx
        setc    %r8b
        movq    %rdx, %rax
        addb    $-1, %r8b
        adcq    %rsi, %rcx
        xorq    %rcx, %rax
        ret

and the patched compiler improves the compiled code to:

        addq    %rdi, %rdx
        adcq    %rsi, %rcx
        movq    %rdx, %rax
        xorq    %rcx, %rax
        ret

To achieve this improvement, patterns involving UNSPECs were removed
and equivalent add3_cconly_overflow patterns are used instead. Also,
to enable the desired combine simplifications, several patterns were
rewritten to their canonical representation, similar to:

(define_insn "addcarry"
  [(set (reg:CCC FLAGS_REG)
        (compare:CCC
          (plus:SWI48
            (plus:SWI48
              (match_operator:SWI48 4 "ix86_carry_flag_operator"
               [(match_operand 3 "flags_reg_operand") (const_int 0)])
              (match_operand:SWI48 1 "nonimmediate_operand" "%0"))
            (match_operand:SWI48 2 "nonimmediate_operand" "rm"))
          (match_dup 1)))
   (set (match_operand:SWI48 0 "register_operand" "=r")
        (plus:SWI48 (plus:SWI48 (match_op_dup 4
                                 [(match_dup 3) (const_int 0)])
                                (match_dup 1))
                    (match_dup 2)))]
  "ix86_binary_operator_ok (PLUS, mode, operands)"
  "adc{}\t{%2, %0|%0, %2}"
  [(set_attr "type" "alu")
   (set_attr "use_carry" "1")
   (set_attr "pent_pair" "pu")
   (set_attr "mode" "")])

and

(define_insn "subborrow"
  [(set (reg:CCC FLAGS_REG)
        (compare:CCC
          (match_operand:SWI48 1 "nonimmediate_operand" "0")
          (plus:SWI48
            (match_operator:SWI48 4 "ix86_carry_flag_operator"
             [(match_operand 3 "flags_reg_operand") (const_int 0)])
            (match_operand:SWI48 2 "nonimmediate_operand" "rm"))))
   (set (match_operand:SWI48 0 "register_operand" "=r")
        (minus:SWI48 (minus:SWI48 (match_dup 1)
                                  (match_op_dup 4
                                   [(match_dup 3) (const_int 0)]))
                     (match_dup 2)))]
  "ix86_binary_operator_ok (MINUS, mode, operands)"
  "sbb{}\t{%2, %0|%0, %2}"
  [(set_attr "type" "alu")
   (set_attr "use_carry" "1")
   (set_attr "pent_pair" "pu")
   (set_attr "mode" "")])

Also, the patch removes generation of adcx instructions. I'd argue
that there is no point in generating the larger and probably slower
adcx instead of the equivalent, shorter and well-optimized adc[lq]
instructions. (If this is not the case, a follow-up patch to generate
the correct insn mnemonic depending on TARGET_ADX would be trivial.)

Please also note that the unpatched expander breaks carry flag chains.
The rewritten expander takes care not to expand arguments in the
middle of a carry flag chain, since expansion can result in add insns
that clobber the carry flag. Due to this problem, the patch will also
be backported to gcc-5 branch.

2015-08-27  Uros Bizjak

	PR target/67317
	* config/i386/i386.md (*add3_cc): Remove insn pattern.
	(addqi3_cc): Ditto.
	(UNSPEC_ADD_CARRY): Remove.
	(addqi3_cconly_overflow): New expander.
	(*add3_doubleword): Split to add3_cconly_overflow.  Adjust for
	changed add3_carry.
	(*neg2_doubleword): Adjust for changed add3_carry.
	(*sub3_doubleword): Adjust for changed sub3_carry.
	(3_carry): Remove expander.
	(*3_carry): Split insn pattern to add3_carry and sub3_carry.
	(plusminus_carry_mnemonic): Remove code attribute.
	(add3_carry): Canonicalize insn pattern.
	(*addsi3_carry_zext): Ditto.
	(sub3_carry): Ditto.
	(*subsi3_carry_zext): Ditto.
	(adcx3): Remove insn pattern.
	(addcarry): New insn pattern.
	(subborrow): Ditto.
	* config/i386/i386.c (ix86_expand_strlensi_unroll_1): Use
	gen_addqi3_cconly_overflow instead of gen_addqi3_cc.
	(ix86_expand_builtin) : Use CODE_FOR_subborrowsi,
	CODE_FOR_subborrowdi, CODE_FOR_addcarrysi and CODE_FOR_addcarrydi.
	Rewrite expander to not clobber carry flag chains.

testsuite/ChangeLog:

2015-08-27  Uros Bizjak

	PR target/67317
	* gcc.target/i386/pr67317-1.c: New test.
	* gcc.target/i386/pr67317-2.c: Ditto.
	* gcc.target/i386/pr67317-3.c: Ditto.
	* gcc.target/i386/pr67317-4.c: Ditto.
	* gcc.target/i386/adx-addcarryx32-1.c: Also scan for adcl.
	* gcc.target/i386/adx-addcarryx32-2.c: Also scan for adcq.

Patch was bootstrapped and regression tested on
x86_64-linux-gnu {,-m32}.

Patch was committed to mainline, and will be backported to gcc-5
branch after a week without problems in mainline.

Uros.
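
For readers who want to experiment with the carry chain from the testcase in this message, here is a minimal sketch, not part of the patch. It assumes the _addcarry_u64 intrinsic from <immintrin.h> on x86-64 (instead of calling the __builtin_ia32_addcarryx_u64 builtin directly) and falls back to plain C elsewhere. The names sum128, add128_chain, add128_portable and check_agreement are hypothetical and only serve this illustration.

```c
#include <assert.h>

#if defined(__x86_64__)
#include <immintrin.h>
#endif

typedef unsigned long long u64;

/* Hypothetical result type: two 64-bit limbs plus the final carry-out. */
typedef struct { u64 lo, hi; unsigned char carry; } sum128;

/* Reference version in plain C: carries are recovered by comparing
   each wrapped sum against one of its inputs.  */
static sum128 add128_portable (u64 a_lo, u64 a_hi, u64 b_lo, u64 b_hi)
{
  sum128 r;
  r.lo = a_lo + b_lo;
  unsigned char c0 = r.lo < a_lo;   /* carry out of the low limb */
  u64 t = a_hi + b_hi;
  unsigned char c1 = t < a_hi;      /* carry from adding the high limbs */
  r.hi = t + c0;
  unsigned char c2 = r.hi < t;      /* carry from adding the carry-in */
  r.carry = c1 | c2;                /* at most one of c1, c2 can be set */
  return r;
}

/* Intrinsic version: the carry-out of the first add feeds the second,
   which is the chain the patch aims to compile to add + adc.  */
static sum128 add128_chain (u64 a_lo, u64 a_hi, u64 b_lo, u64 b_hi)
{
#if defined(__x86_64__)
  sum128 r;
  unsigned char c = _addcarry_u64 (0, a_lo, b_lo, &r.lo);
  r.carry = _addcarry_u64 (c, a_hi, b_hi, &r.hi);
  return r;
#else
  /* Assumption: no x86 intrinsics available, use the reference code.  */
  return add128_portable (a_lo, a_hi, b_lo, b_hi);
#endif
}

/* Same argument order and return value as the testcase in the message:
   low limbs are a and c, high limbs are b and d.  */
static u64 testcarry_u64 (u64 a, u64 b, u64 c, u64 d)
{
  sum128 r = add128_chain (a, b, c, d);
  return r.lo ^ r.hi;
}

/* Asserts that the intrinsic chain and the reference code agree.  */
static void check_agreement (u64 a, u64 b, u64 c, u64 d)
{
  sum128 x = add128_chain (a, b, c, d);
  sum128 y = add128_portable (a, b, c, d);
  assert (x.lo == y.lo && x.hi == y.hi && x.carry == y.carry);
}
```

Compiling testcarry_u64 at -O2 and inspecting the assembly (gcc -O2 -S) is one way to see whether a given compiler keeps the carry in the flags register between the two additions; the exact output depends on the compiler version.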