* [PATCH 1/2] Add a new target hook: TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
From: Kong, Lingling @ 2024-06-14  1:38 UTC
To: gcc-patches; +Cc: Liu, Hongtao, Kong, Lingling, Uros Bizjak

From: konglin1 <lingling.kong@intel.com>

gcc/ChangeLog:

	* doc/tm.texi: Regenerated.
	* doc/tm.texi.in: Add TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
	* target.def (bool,): New hook.
	* targhooks.cc (default_have_conditional_move_mem_notrap): New function to
	hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP.
	* targhooks.h (default_have_conditional_move_mem_notrap): New target hook
	declaration.
---
 gcc/doc/tm.texi    |  6 ++++++
 gcc/doc/tm.texi.in |  2 ++
 gcc/target.def     | 11 +++++++++++
 gcc/targhooks.cc   |  8 ++++++++
 gcc/targhooks.h    |  1 +
 5 files changed, 28 insertions(+)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8a7aa70d605..f8faf44ab73 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -7311,6 +7311,12 @@ candidate as a replacement for the if-convertible sequence described in
 @code{if_info}.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP (rtx @var{x})
+This hook returns true if the target supports condition move instructions
+ that enables fault suppression of memory operands when the condition code
+ evaluates to false.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_NEW_ADDRESS_PROFITABLE_P (rtx @var{memref}, rtx_insn * @var{insn}, rtx @var{new_addr})
 Return @code{true} if it is profitable to replace the address in
 @var{memref} with @var{new_addr}.  This allows targets to prevent the
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 9e0830758ae..17c122aea43 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4748,6 +4748,8 @@ Define this macro if a non-short-circuit operation produced by
 
 @hook TARGET_NOCE_CONVERSION_PROFITABLE_P
 
+@hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
+
 @hook TARGET_NEW_ADDRESS_PROFITABLE_P
 
 @hook TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P
diff --git a/gcc/target.def b/gcc/target.def
index 70070caebc7..aa77737e006 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3993,6 +3993,17 @@ candidate as a replacement for the if-convertible sequence described in\n\
 bool, (rtx_insn *seq, struct noce_if_info *if_info),
 default_noce_conversion_profitable_p)
 
+/* Return true if the target support condition move instructions that enables
+   fault suppression of memory operands when the condition code evaluates to
+   false.  */
+DEFHOOK
+(have_conditional_move_mem_notrap,
+ "This hook returns true if the target supports condition move instructions\n\
+ that enables fault suppression of memory operands when the condition code\n\
+ evaluates to false.",
+bool, (rtx x),
+default_have_conditional_move_mem_notrap)
+
 /* Return true if new_addr should be preferred over the existing address used by
    memref in insn.  */
 DEFHOOK
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index fb339bf75dd..a616371b204 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -2816,4 +2816,12 @@ default_memtag_untagged_pointer (rtx tagged_pointer, rtx target)
   return untagged_base;
 }
 
+/* The default implementation of
+   TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP.  */
+bool
+default_have_conditional_move_mem_notrap (rtx x ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 85f3817c176..f8ea2fde53d 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -305,5 +305,6 @@ extern rtx default_memtag_add_tag (rtx, poly_int64, uint8_t);
 extern rtx default_memtag_set_tag (rtx, rtx, rtx);
 extern rtx default_memtag_extract_tag (rtx, rtx);
 extern rtx default_memtag_untagged_pointer (rtx, rtx);
+extern bool default_have_conditional_move_mem_notrap (rtx x);
 
 #endif /* GCC_TARGHOOKS_H */
-- 
2.31.1
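
The contract of the new hook can be illustrated with a small standalone sketch in C. This is not GCC code: `toy_rtx`, `toy_mode`, `toy_target` and every `toy_*` function are hypothetical stand-ins for `rtx`, `machine_mode` and `targetm`. Only the shape is taken from the series: a default that returns false, and a target override that accepts memory operands in HImode, SImode and DImode, as the i386 implementation in patch 2/2 does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's machine_mode and rtx.  */
enum toy_mode { TOY_QI, TOY_HI, TOY_SI, TOY_DI };

struct toy_rtx {
    bool is_mem;          /* models MEM_P (x) */
    enum toy_mode mode;   /* models GET_MODE (x) */
};

/* Models the default hook: no fault-suppressing conditional move.  */
static bool toy_default_have_cmove_mem_notrap(const struct toy_rtx *x)
{
    (void) x;
    return false;
}

/* Models a target override: only MEMs in HI/SI/DI modes qualify.  */
static bool toy_cfcmov_have_cmove_mem_notrap(const struct toy_rtx *x)
{
    return x->is_mem
           && (x->mode == TOY_HI || x->mode == TOY_SI || x->mode == TOY_DI);
}

/* Models the targetm hook table, reduced to a single entry.  */
struct toy_target {
    bool (*have_conditional_move_mem_notrap)(const struct toy_rtx *x);
};

/* Pass-side query: may a possibly-trapping MEM feed a conditional move?  */
static bool toy_can_if_convert_mem(const struct toy_target *t,
                                   const struct toy_rtx *x)
{
    return t->have_conditional_move_mem_notrap(x);
}

/* Usage: the default target refuses, the CFCMOV-like target accepts
   an SImode MEM but rejects a QImode MEM.  */
static void toy_demo(void)
{
    const struct toy_target generic = { toy_default_have_cmove_mem_notrap };
    const struct toy_target cfcmov = { toy_cfcmov_have_cmove_mem_notrap };
    const struct toy_rtx mem_si = { true, TOY_SI };
    const struct toy_rtx mem_qi = { true, TOY_QI };

    assert(!toy_can_if_convert_mem(&generic, &mem_si));
    assert(toy_can_if_convert_mem(&cfcmov, &mem_si));
    assert(!toy_can_if_convert_mem(&cfcmov, &mem_qi));
}
```

Because the default returns false, a target that does not define the hook would see no change in if-conversion behaviour under this model.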
* [PATCH 2/2] [APX CFCMOV] Support APX CFCMOV
From: Kong, Lingling @ 2024-06-14  1:38 UTC
To: gcc-patches; +Cc: Liu, Hongtao, Kong, Lingling, Uros Bizjak

From: konglin1 <lingling.kong@intel.com>

The APX CFCMOV feature implements conditional faulting: when the condition
code evaluates to false, all memory faults for the load or store of the
memory operand are suppressed.  With it, a conditional move can load or
store a memory operand that may trap or fault.

To enable CFCMOV, we add a target hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
that the if-conversion pass uses to allow conversion to cmov.

Bootstrapped & regtested on x86-64-pc-linux-gnu with binutils 2.42 branch.
OK for trunk?

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_can_cfcmov_p): New function that
	tests whether cfcmov can be generated.
	(ix86_expand_int_movcc): Expand to cfcmov pattern if ix86_can_cfcmov_p
	returns true.
	* config/i386/i386-opts.h (enum apx_features): Add apx_cfcmov.
	* config/i386/i386.cc (ix86_have_conditional_move_mem_notrap): New
	function to hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP.
	(TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP): Target hook define.
	(ix86_rtx_costs): Add UNSPEC_APX_CFCMOV cost.
	* config/i386/i386.h (TARGET_APX_CFCMOV): Define.
	* config/i386/i386.md (*cfcmov<mode>_1): New define_insn to support
	cfcmov.
	(*cfcmov<mode>_2): Ditto.
	(UNSPEC_APX_CFCMOV): New unspec for cfcmov.
	* config/i386/i386.opt: Add enum value for cfcmov.
	* ifcvt.cc (noce_try_cmove_load_mem_notrap): Use target hook to allow
	conversion to cfcmov for conditional load.
	(noce_try_cmove_store_mem_notrap): Convert to conditional store.
	(noce_process_if_block): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-cfcmov-1.c: New test.
* gcc.target/i386/apx-cfcmov-2.c: Ditto. --- gcc/config/i386/i386-expand.cc | 63 +++++ gcc/config/i386/i386-opts.h | 4 +- gcc/config/i386/i386.cc | 33 ++- gcc/config/i386/i386.h | 1 + gcc/config/i386/i386.md | 53 +++- gcc/config/i386/i386.opt | 3 + gcc/config/i386/predicates.md | 7 + gcc/ifcvt.cc | 247 ++++++++++++++++++- gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c | 73 ++++++ gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c | 40 +++ 10 files changed, 511 insertions(+), 13 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index 312329e550b..c02a4bcbec3 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -3336,6 +3336,30 @@ ix86_expand_int_addcc (rtx operands[]) return true; } +/* Return TRUE if we could convert "if (test) x = a; else x = b;" to cfcmov, + especially when load a or b or x store may cause memmory faults. */ +bool +ix86_can_cfcmov_p (rtx x, rtx a, rtx b) +{ + machine_mode mode = GET_MODE (x); + if (TARGET_APX_CFCMOV + && (mode == DImode || mode == SImode || mode == HImode)) + { + /* C load (r m r), (r m C), (r r m). For r m m could use + two cfcmov. */ + if (register_operand (x, mode) + && ((MEM_P (a) && register_operand (b, mode)) + || (MEM_P (a) && b == const0_rtx) + || (register_operand (a, mode) && MEM_P (b)) + || (MEM_P (a) && MEM_P (b)))) + return true; + /* C store (m r 0). 
*/ + else if (MEM_P (x) && x == b && register_operand (a, mode)) + return true; + } + return false; +} + bool ix86_expand_int_movcc (rtx operands[]) { @@ -3366,6 +3390,45 @@ ix86_expand_int_movcc (rtx operands[]) compare_code = GET_CODE (compare_op); + if (MEM_P (operands[0]) + && !ix86_can_cfcmov_p (operands[0], op2, op3)) + return false; + + if (may_trap_or_fault_p (op2) || may_trap_or_fault_p (op3)) + { + if (ix86_can_cfcmov_p (operands[0], op2, op3)) + { + if (may_trap_or_fault_p (op2)) + op2 = gen_rtx_UNSPEC (mode, gen_rtvec (1, operands[2]), + UNSPEC_APX_CFCMOV); + if (may_trap_or_fault_p (op3)) + op3 = gen_rtx_UNSPEC (mode, gen_rtvec (1, operands[3]), + UNSPEC_APX_CFCMOV); + emit_insn (compare_seq); + + if (may_trap_or_fault_p (op2) && may_trap_or_fault_p (op3)) + { + emit_insn (gen_rtx_SET (operands[0], + gen_rtx_IF_THEN_ELSE (mode, + compare_op, + op2, + operands[0]))); + emit_insn (gen_rtx_SET (operands[0], + gen_rtx_IF_THEN_ELSE (mode, + compare_op, + operands[0], + op3))); + } + else + emit_insn (gen_rtx_SET (operands[0], + gen_rtx_IF_THEN_ELSE (mode, + compare_op, + op2, op3))); + return true; + } + return false; + } + if ((op1 == const0_rtx && (code == GE || code == LT)) || (op1 == constm1_rtx && (code == GT || code == LE))) sign_bit_compare_p = true; diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h index c7ec0d9fd39..711519ffb53 100644 --- a/gcc/config/i386/i386-opts.h +++ b/gcc/config/i386/i386-opts.h @@ -143,8 +143,10 @@ enum apx_features { apx_nf = 1 << 4, apx_ccmp = 1 << 5, apx_zu = 1 << 6, + apx_cfcmov = 1 << 7, apx_all = apx_egpr | apx_push2pop2 | apx_ndd - | apx_ppx | apx_nf | apx_ccmp | apx_zu, + | apx_ppx | apx_nf | apx_ccmp | apx_zu + | apx_cfcmov, }; #endif diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 173db213d14..b14c0a3d9f2 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -22349,10 +22349,18 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, *total = 
COSTS_N_INSNS (1); if (!COMPARISON_P (XEXP (x, 0)) && !REG_P (XEXP (x, 0))) *total += rtx_cost (XEXP (x, 0), mode, code, 0, speed); - if (!REG_P (XEXP (x, 1))) - *total += rtx_cost (XEXP (x, 1), mode, code, 1, speed); - if (!REG_P (XEXP (x, 2))) - *total += rtx_cost (XEXP (x, 2), mode, code, 2, speed); + rtx op1, op2; + op1 = XEXP (x, 1); + op2 = XEXP (x, 2); + /* Handle UNSPEC_APX_CFCMOV for cfcmov. */ + if (GET_CODE (op1) == UNSPEC && XINT (op1, 1) == UNSPEC_APX_CFCMOV) + op1 = XVECEXP (op1, 0, 0); + if (GET_CODE (op2) == UNSPEC && XINT (op2, 1) == UNSPEC_APX_CFCMOV) + op2 = XVECEXP (op2, 0, 0); + if (!REG_P (op1)) + *total += rtx_cost (op1, mode, code, 1, speed); + if (!REG_P (op2)) + *total += rtx_cost (op2, mode, code, 2, speed); return true; } return false; @@ -24998,6 +25006,19 @@ ix86_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info) return default_noce_conversion_profitable_p (seq, if_info); } + +/* Implement targetm.have_conditional_move_mem_notrap hook. */ +static bool +ix86_have_conditional_move_mem_notrap (rtx x) +{ + machine_mode mode = GET_MODE (x); + if (TARGET_APX_CFCMOV + && (mode == DImode || mode == SImode || mode == HImode) + && MEM_P (x)) + return true; + return false; +} + /* x86-specific vector costs. */ class ix86_vector_costs : public vector_costs { @@ -26975,6 +26996,10 @@ ix86_libgcc_floating_mode_supported_p #undef TARGET_NOCE_CONVERSION_PROFITABLE_P #define TARGET_NOCE_CONVERSION_PROFITABLE_P ix86_noce_conversion_profitable_p +#undef TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP +#define TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP \ + ix86_have_conditional_move_mem_notrap + #undef TARGET_HARD_REGNO_NREGS #define TARGET_HARD_REGNO_NREGS ix86_hard_regno_nregs #undef TARGET_HARD_REGNO_MODE_OK diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index dc1a1f44320..6a20fa678c8 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -58,6 +58,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. 
If not, see #define TARGET_APX_NF (ix86_apx_features & apx_nf) #define TARGET_APX_CCMP (ix86_apx_features & apx_ccmp) #define TARGET_APX_ZU (ix86_apx_features & apx_zu) +#define TARGET_APX_CFCMOV (ix86_apx_features & apx_cfcmov) #include "config/vxworks-dummy.h" diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index fd48e764469..57448c07828 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -221,6 +221,9 @@ ;; For APX CCMP support ;; DFV = default flag value UNSPEC_APX_DFV + + ;; For APX CFCMOV support + UNSPEC_APX_CFCMOV ]) (define_c_enum "unspecv" [ @@ -579,7 +582,7 @@ noavx512dq,fma_or_avx512vl,avx512vl,noavx512vl,avxvnni, avx512vnnivl,avx512fp16,avxifma,avx512ifmavl,avxneconvert, avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl, - vaes_avx512vl,noapx_nf" + vaes_avx512vl,noapx_nf,apx_cfcmov" (const_string "base")) ;; The (bounding maximum) length of an instruction immediate. @@ -986,6 +989,7 @@ (eq_attr "mmx_isa" "avx") (symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX") (eq_attr "isa" "noapx_nf") (symbol_ref "!TARGET_APX_NF") + (eq_attr "isa" "apx_cfcmov") (symbol_ref "TARGET_APX_CFCMOV") ] (const_int 1))) @@ -24995,7 +24999,7 @@ ;; Conditional move instructions. 
(define_expand "mov<mode>cc" - [(set (match_operand:SWIM 0 "register_operand") + [(set (match_operand:SWIM 0 "register_or_cfc_mem_operand") (if_then_else:SWIM (match_operand 1 "comparison_operator") (match_operand:SWIM 2 "<general_operand>") (match_operand:SWIM 3 "<general_operand>")))] @@ -25103,19 +25107,54 @@ (set (match_dup 0) (neg:SWI (ltu:SWI (reg:CCC FLAGS_REG) (const_int 0))))]) +(define_insn "*cfcmov<mode>_1" + [(set (match_operand:SWI248 0 "register_operand" "=r,r") + (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator" + [(reg FLAGS_REG) (const_int 0)]) + (unspec:SWI248 + [(match_operand:SWI248 2 "memory_operand" "m,m")] + UNSPEC_APX_CFCMOV) + (match_operand:SWI248 3 "reg_or_0_operand" "C,r")))] + "TARGET_CMOVE && TARGET_APX_CFCMOV" + "@ + cfcmov%O2%C1\t{%2, %0|%0, %2} + cfcmov%O2%C1\t{%2, %3, %0|%0, %3, %2}" + [(set_attr "isa" "*,apx_ndd") + (set_attr "type" "icmov") + (set_attr "prefix" "evex") + (set_attr "mode" "<MODE>")]) + +(define_insn "*cfcmov<mode>_2" + [(set (match_operand:SWI248 0 "nonimmediate_operand" "=r,m") + (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator" + [(reg FLAGS_REG) (const_int 0)]) + (match_operand:SWI248 2 "register_operand" "r,r") + (unspec:SWI248 + [(match_operand:SWI248 3 "memory_operand" "m,0")] + UNSPEC_APX_CFCMOV)))] + "TARGET_CMOVE && TARGET_APX_CFCMOV" + "@ + cfcmov%O2%c1\t{%3, %2, %0|%0, %2, %3} + cfcmov%O2%C1\t{%2, %0|%0, %2}" + [(set_attr "isa" "apx_ndd,*") + (set_attr "type" "icmov") + (set_attr "prefix" "evex") + (set_attr "mode" "<MODE>")]) + (define_insn "*mov<mode>cc_noc" - [(set (match_operand:SWI248 0 "register_operand" "=r,r,r,r") + [(set (match_operand:SWI248 0 "register_operand" "=r,r,r,r,r") (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator" [(reg FLAGS_REG) (const_int 0)]) - (match_operand:SWI248 2 "nonimmediate_operand" "rm,0,rm,r") - (match_operand:SWI248 3 "nonimmediate_operand" "0,rm,r,rm")))] + (match_operand:SWI248 2 "nonimmediate_operand" "rm,0,rm,r,r") + 
(match_operand:SWI248 3 "nonimm_or_0_operand" "0,rm,r,rm,C")))] "TARGET_CMOVE && !(MEM_P (operands[2]) && MEM_P (operands[3]))" "@ cmov%O2%C1\t{%2, %0|%0, %2} cmov%O2%c1\t{%3, %0|%0, %3} cmov%O2%C1\t{%2, %3, %0|%0, %3, %2} - cmov%O2%c1\t{%3, %2, %0|%0, %2, %3}" - [(set_attr "isa" "*,*,apx_ndd,apx_ndd") + cmov%O2%c1\t{%3, %2, %0|%0, %2, %3} + cfcmov%O2%C1\t{%2, %0|%0, %2}" + [(set_attr "isa" "*,*,apx_ndd,apx_ndd,apx_cfcmov") (set_attr "type" "icmov") (set_attr "mode" "<MODE>")]) diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 353fffb2343..7d63d9abd95 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -1345,6 +1345,9 @@ Enum(apx_features) String(ccmp) Value(apx_ccmp) Set(7) EnumValue Enum(apx_features) String(zu) Value(apx_zu) Set(8) +EnumValue +Enum(apx_features) String(cfcmov) Value(apx_cfcmov) Set(9) + EnumValue Enum(apx_features) String(all) Value(apx_all) Set(1) diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 7afe3100cb7..d562e10ab41 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -2322,3 +2322,10 @@ return true; }) + +;; Return true if OP is a register operand or memory_operand is only +;; supported under TARGET_APX_CFCMOV. 
+(define_predicate "register_or_cfc_mem_operand" + (ior (match_operand 0 "register_operand") + (and (match_code "mem") + (match_test "TARGET_APX_CFCMOV")))) diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc index 58ed42673e5..6e3e48af810 100644 --- a/gcc/ifcvt.cc +++ b/gcc/ifcvt.cc @@ -783,6 +783,8 @@ static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx, rtx, rtx, rtx, rtx = NULL, rtx = NULL); static bool noce_try_cmove (struct noce_if_info *); static bool noce_try_cmove_arith (struct noce_if_info *); +static bool noce_try_cmove_load_mem_notrap (struct noce_if_info *); +static bool noce_try_cmove_store_mem_notrap (struct noce_if_info *, rtx *, rtx); static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **); static bool noce_try_minmax (struct noce_if_info *); static bool noce_try_abs (struct noce_if_info *); @@ -2401,6 +2403,237 @@ noce_try_cmove_arith (struct noce_if_info *if_info) return false; } +/* When target support suppress memory fault, try more complex cases involving + conditional_move's source or dest may trap or fault. */ + +static bool +noce_try_cmove_load_mem_notrap (struct noce_if_info *if_info) +{ + rtx a = if_info->a; + rtx b = if_info->b; + rtx x = if_info->x; + + if (MEM_P (x)) + return false; + /* Just handle a conditional move from one trap MEM + other non_trap, + non mem cases. 
*/ + if (!(MEM_P (a) ^ MEM_P (b))) + return false; + bool a_trap = may_trap_or_fault_p (a); + bool b_trap = may_trap_or_fault_p (b); + + if (!(a_trap ^ b_trap)) + return false; + if (a_trap && (!MEM_P (a) || !targetm.have_conditional_move_mem_notrap (a))) + return false; + if (b_trap && (!MEM_P (b) || !targetm.have_conditional_move_mem_notrap (b))) + return false; + + rtx orig_b; + rtx_insn *insn_a, *insn_b; + bool a_simple = if_info->then_simple; + bool b_simple = if_info->else_simple; + basic_block then_bb = if_info->then_bb; + basic_block else_bb = if_info->else_bb; + rtx target; + enum rtx_code code; + rtx cond = if_info->cond; + rtx_insn *ifcvt_seq; + + /* if (test) x = *a; else x = c - d; + => x = c - d; + if (test) + x = *a; + */ + + code = GET_CODE (cond); + insn_a = if_info->insn_a; + insn_b = if_info->insn_b; + + machine_mode x_mode = GET_MODE (x); + + if (!can_conditionally_move_p (x_mode)) + return false; + + /* Because we only handle one trap MEM + other non_trap, non mem cases, + just move one trap MEM always in then_bb. */ + if (noce_reversed_cond_code (if_info) != UNKNOWN) + { + bool reversep = false; + if (b_trap) + reversep = true; + + if (reversep) + { + if (if_info->rev_cond) + { + cond = if_info->rev_cond; + code = GET_CODE (cond); + } + else + code = reversed_comparison_code (cond, if_info->jump); + std::swap (a, b); + std::swap (insn_a, insn_b); + std::swap (a_simple, b_simple); + std::swap (then_bb, else_bb); + } + } + + if (then_bb && else_bb + && (!bbs_ok_for_cmove_arith (then_bb, else_bb, if_info->orig_x) + || !bbs_ok_for_cmove_arith (else_bb, then_bb, if_info->orig_x))) + return false; + + start_sequence (); + + /* If one of the blocks is empty then the corresponding B or A value + came from the test block. The non-empty complex block that we will + emit might clobber the register used by B or A, so move it to a pseudo + first. */ + + rtx tmp_b = NULL_RTX; + + /* Don't move trap mem to a pseudo. 
*/ + if (!may_trap_or_fault_p (b) && (b_simple || !else_bb)) + tmp_b = gen_reg_rtx (x_mode); + + orig_b = b; + + rtx emit_a = NULL_RTX; + rtx emit_b = NULL_RTX; + rtx_insn *tmp_insn = NULL; + bool modified_in_a = false; + bool modified_in_b = false; + /* If either operand is complex, load it into a register first. + The best way to do this is to copy the original insn. In this + way we preserve any clobbers etc that the insn may have had. + This is of course not possible in the IS_MEM case. */ + + if (! general_operand (b, GET_MODE (b)) || tmp_b) + { + if (insn_b) + { + b = tmp_b ? tmp_b : gen_reg_rtx (GET_MODE (b)); + rtx_insn *copy_of_b = as_a <rtx_insn *> (copy_rtx (insn_b)); + rtx set = single_set (copy_of_b); + + SET_DEST (set) = b; + emit_b = PATTERN (copy_of_b); + } + else + { + rtx tmp_reg = tmp_b ? tmp_b : gen_reg_rtx (GET_MODE (b)); + emit_b = gen_rtx_SET (tmp_reg, b); + b = tmp_reg; + } + } + + if (tmp_b && then_bb) + { + FOR_BB_INSNS (then_bb, tmp_insn) + /* Don't check inside insn_a. We will have changed it to emit_a + with a destination that doesn't conflict. */ + if (!(insn_a && tmp_insn == insn_a) + && modified_in_p (orig_b, tmp_insn)) + { + modified_in_a = true; + break; + } + + } + + modified_in_b = emit_b != NULL_RTX && modified_in_p (a, emit_b); + /* If insn to set up A clobbers any registers B depends on, try to + swap insn that sets up A with the one that sets up B. If even + that doesn't help, punt. */ + if (modified_in_a && !modified_in_b) + { + if (!noce_emit_bb (emit_b, else_bb, b_simple)) + goto end_seq_and_fail; + + if (!noce_emit_bb (emit_a, then_bb, a_simple)) + goto end_seq_and_fail; + } + else if (!modified_in_a) + { + if (!noce_emit_bb (emit_b, else_bb, b_simple)) + goto end_seq_and_fail; + + if (!noce_emit_bb (emit_a, then_bb, a_simple)) + goto end_seq_and_fail; + } + else + goto end_seq_and_fail; + + target = noce_emit_cmove (if_info, x, code, XEXP (cond, 0), XEXP (cond, 1), + a, b); + + if (! 
target) + goto end_seq_and_fail; + + if (target != x) + noce_emit_move_insn (x, target); + + ifcvt_seq = end_ifcvt_sequence (if_info); + if (!ifcvt_seq || !targetm.noce_conversion_profitable_p (ifcvt_seq, if_info)) + return false; + + emit_insn_before_setloc (ifcvt_seq, if_info->jump, + INSN_LOCATION (if_info->insn_a)); + if_info->transform_name = "noce_try_cmove_load_mem_notrap"; + return true; + + end_seq_and_fail: + end_sequence (); + return false; +} + +static bool +noce_try_cmove_store_mem_notrap (struct noce_if_info *if_info, rtx *x_ptr, rtx orig_x) +{ + rtx a = if_info->a; + rtx b = if_info->b; + rtx x = orig_x; + machine_mode x_mode = GET_MODE (x); + + if (!MEM_P (x) || !rtx_equal_p (x, b)) + return false; + if (!may_trap_or_fault_p (x) || !targetm.have_conditional_move_mem_notrap (x)) + return false; + if (!if_info->then_simple || !register_operand (a, x_mode)) + return false; + + rtx cond = if_info->cond; + enum rtx_code code = GET_CODE (cond); + rtx_insn *ifcvt_seq; + + start_sequence (); + + rtx target = noce_emit_cmove (if_info, x, code, XEXP (cond, 0), XEXP (cond, 1), + a, b); + + if (! target) + goto end_seq_and_fail; + + if (target != x) + noce_emit_move_insn (x, target); + + ifcvt_seq = end_ifcvt_sequence (if_info); + if (!ifcvt_seq || !targetm.noce_conversion_profitable_p (ifcvt_seq, if_info)) + return false; + + emit_insn_before_setloc (ifcvt_seq, if_info->jump, + INSN_LOCATION (if_info->insn_a)); + if_info->transform_name = "noce_try_cmove_load_mem_notrap"; + if_info->x = orig_x; + *x_ptr = orig_x; + return true; + + end_seq_and_fail: + end_sequence (); + return false; +} + /* For most cases, the simplified condition we found is the best choice, but this is not the case for the min/max/abs transforms. For these we wish to know that it is A or B in the condition. 
*/ @@ -4121,12 +4354,21 @@ noce_process_if_block (struct noce_if_info *if_info) } if (!set_b && MEM_P (orig_x)) + { + /* Conditional_move_suppress_fault for condition mem store would not + move any arithmetic calculations. */ + if (targetm.have_conditional_move_mem_notrap (orig_x) + && HAVE_conditional_move + && noce_try_cmove_store_mem_notrap (if_info, &x, orig_x)) + goto success; + else /* We want to avoid store speculation to avoid cases like if (pthread_mutex_trylock(mutex)) ++global_variable; Rather than go to much effort here, we rely on the SSA optimizers, which do a good enough job these days. */ - return false; + return false; + } if (noce_try_move (if_info)) goto success; @@ -4160,6 +4402,9 @@ noce_process_if_block (struct noce_if_info *if_info) if (HAVE_conditional_move && noce_try_cmove_arith (if_info)) goto success; + if (HAVE_conditional_move + && noce_try_cmove_load_mem_notrap (if_info)) + goto success; if (noce_try_sign_mask (if_info)) goto success; } diff --git a/gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c b/gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c new file mode 100644 index 00000000000..4a1fb91b24c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c @@ -0,0 +1,73 @@ +/* { dg-do compile { target { ! 
ia32 } } } */ +/* { dg-options "-O3 -mapxf" } */ + +/* { dg-final { scan-assembler-times "cfcmovne" 1 } } */ +/* { dg-final { scan-assembler-times "cfcmovg" 2} } */ +/* { dg-final { scan-assembler-times "cfcmove" 1 } } */ +/* { dg-final { scan-assembler-times "cfcmovl" 2 } } */ +/* { dg-final { scan-assembler-times "cfcmovle" 1 } } */ + +__attribute__((noinline, noclone, target("apxf"))) +int cfc_store (int a, int b, int c, int d, int *arr) +{ + if (a != b) + *arr = c; + return d; + +} + +__attribute__((noinline, noclone, target("apxf"))) +int cfc_load_ndd (int a, int b, int c, int *p) +{ + if (a > b) + return *p; + return c; +} + +__attribute__((noinline, noclone, target("apxf"))) +int cfc_load_2_trap (int a, int b, int *c, int *p) +{ + if (a > b) + return *p; + return *c; +} + +__attribute__((noinline, noclone, target("apxf"))) +int cfc_load_zero (int a, int b, int c) +{ + int sum = 0; + if (a == b) + return c; + return sum; +} + +__attribute__((noinline, noclone, target("apxf"))) +int cfc_load_mem (int a, int b, int *p) +{ + int sum = 0; + if (a < b ) + sum = *p; + return sum; +} + +__attribute__((noinline, noclone, target("apxf"))) +int cfc_load_arith_1 (int a, int b, int c, int *p) +{ + int sum = 0; + if (a > b) + sum = *p; + else + sum = a + c; + return sum + 1; +} + +__attribute__((noinline, noclone, target("apxf"))) +int cfc_load_arith_2 (int a, int b, int c, int *p) +{ + int sum = 0; + if (a > b) + sum = a + c; + else + sum = *p; + return sum + 1; +} diff --git a/gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c b/gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c new file mode 100644 index 00000000000..2b1660f64fa --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c @@ -0,0 +1,40 @@ +/* { dg-do run { target { ! 
ia32 } } } */
+/* { dg-require-effective-target apxf } */
+/* { dg-options "-mapxf -march=x86-64 -O3" } */
+
+#include "apx-cfcmov-1.c"
+
+extern void abort (void);
+
+int main ()
+{
+  if (!__builtin_cpu_supports ("apxf"))
+    return 0;
+
+  int arr = 6;
+  int arr1 = 5;
+  int res = cfc_store (1, 2, 3, 4, &arr);
+  if (arr != 3 && res != 4)
+    abort ();
+  res = cfc_load_ndd (2, 1, 2, &arr);
+  if (res != 3)
+    abort ();
+  res = cfc_load_2_trap (1, 2, &arr1, &arr);
+  if (res != 5)
+    abort ();
+  res = cfc_load_zero (1, 2, 3);
+  res = cfc_load_zero (1, 2, 3);
+  if (res != 0)
+    abort ();
+  res = cfc_load_mem (2, 1, &arr);
+  if (res != 0)
+    abort ();
+  res = cfc_load_arith_1 (1, 2, 3, &arr);
+  if (res != 5)
+    abort();
+  res = cfc_load_arith_2 (2, 1, 3,&arr);
+  if (res != 6)
+    abort();
+  return 0;
+}
+
-- 
2.31.1
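
The property that a conversion to a conditionally faulting move has to preserve can be modelled in portable C. The sketch below is an illustration under stated assumptions, not part of the patch: `model_cfc_load`, `model_cfc_store` and `model_demo` are hypothetical names, and the code says nothing about which instructions a compiler emits. It only states the source-level rule that the memory operand is not accessed when the condition is false, so a null pointer is harmless on that path.

```c
#include <assert.h>
#include <stddef.h>

/* Models "x = cond ? *p : fallback": *p is read only when cond holds.  */
static int model_cfc_load(int cond, const int *p, int fallback)
{
    return cond ? *p : fallback;
}

/* Models "if (cond) *p = value": *p is written only when cond holds.  */
static void model_cfc_store(int cond, int *p, int value)
{
    if (cond)
        *p = value;
}

/* Usage: a false condition never touches the (null) pointer; a true
   condition performs the ordinary load or store.  */
static int model_demo(void)
{
    int cell = 6;

    assert(model_cfc_load(0, NULL, 7) == 7);   /* no access, fallback used */
    assert(model_cfc_load(1, &cell, 7) == 6);  /* ordinary load */
    model_cfc_store(0, NULL, 9);               /* no access, no fault */
    model_cfc_store(1, &cell, 9);              /* ordinary store */
    return cell;
}
```

A plain cmov with a memory source would read the operand unconditionally, which is why if-conversion normally has to reject these shapes when the access may trap.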
* Re: [PATCH 2/2] [APX CFCMOV] Support APX CFCMOV
From: Richard Biener @ 2024-06-14  6:12 UTC
To: Kong, Lingling, Richard Sandiford; +Cc: gcc-patches, Liu, Hongtao, Uros Bizjak

On Fri, Jun 14, 2024 at 3:39 AM Kong, Lingling <lingling.kong@intel.com> wrote:
>
> From: konglin1 <lingling.kong@intel.com>
>
> APX CFCMOV feature implements conditionally faulting which means that all
> memory faults are suppressed when the condition code evaluates to false and
> load or store a memory operand. Now we could load or store a memory operand
> may trap or fault for conditional move.
>
> To enable CFCMOV, we add a target HOOK TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
> in if-conversion pass to allow convert to cmov.
>
> Bootstrapped & regtested on x86-64-pc-linux-gnu with binutils 2.42 branch.
> OK for trunk?

How does if-conversion end up modifying the IL?  I have the gut feeling
that your hook changes semantics of RTL and you should instead have
an optab for a "masked" load/store?

Richard - do you already have plans how to represent the first-fault loads?
(are there first-fault stores?)

Richard.

> gcc/ChangeLog:
>
>         * config/i386/i386-expand.cc (ix86_can_cfcmov_p): New function that
>         test if the cfcmov can be generated.
>         (ix86_expand_int_movcc): Expand to cfcmov pattern if ix86_can_cfcmov_p
>         return ture.
>         * config/i386/i386-opts.h (enum apx_features): Add apx_cfcmov.
>         * config/i386/i386.cc (ix86_have_conditional_move_mem_notrap): New
>         function to hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
>         (TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP): Target hook define.
>         (ix86_rtx_costs): Add UNSPEC_APX_CFCMOV cost;
>         * config/i386/i386.h (TARGET_APX_CFCMOV): Define.
> > * config/i386/i386.md (*cfcmov<mode>_1): New define_insn to support > > cfcmov. > > (*cfcmov<mode>_2): Ditto. > > (UNSPEC_APX_CFCMOV): New unspec for cfcmov. > > * config/i386/i386.opt: Add enum value for cfcmov. > > * ifcvt.cc (noce_try_cmove_load_mem_notrap): Use target hook to allow > > convert to cfcmov for conditional load. > > (noce_try_cmove_store_mem_notrap): Convert to conditional store. > > (noce_process_if_block): Ditto. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/apx-cfcmov-1.c: New test. > > * gcc.target/i386/apx-cfcmov-2.c: Ditto. > > --- > > gcc/config/i386/i386-expand.cc | 63 +++++ > > gcc/config/i386/i386-opts.h | 4 +- > > gcc/config/i386/i386.cc | 33 ++- > > gcc/config/i386/i386.h | 1 + > > gcc/config/i386/i386.md | 53 +++- > > gcc/config/i386/i386.opt | 3 + > > gcc/config/i386/predicates.md | 7 + > > gcc/ifcvt.cc | 247 ++++++++++++++++++- > > gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c | 73 ++++++ > > gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c | 40 +++ > > 10 files changed, 511 insertions(+), 13 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c > > create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c > > > > diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc > > index 312329e550b..c02a4bcbec3 100644 > > --- a/gcc/config/i386/i386-expand.cc > > +++ b/gcc/config/i386/i386-expand.cc > > @@ -3336,6 +3336,30 @@ ix86_expand_int_addcc (rtx operands[]) > > return true; > > } > > > > +/* Return TRUE if we could convert "if (test) x = a; else x = b;" to cfcmov, > > + especially when load a or b or x store may cause memmory faults. */ > > +bool > > +ix86_can_cfcmov_p (rtx x, rtx a, rtx b) > > +{ > > + machine_mode mode = GET_MODE (x); > > + if (TARGET_APX_CFCMOV > > + && (mode == DImode || mode == SImode || mode == HImode)) > > + { > > + /* C load (r m r), (r m C), (r r m). For r m m could use > > + two cfcmov. 
*/ > > + if (register_operand (x, mode) > > + && ((MEM_P (a) && register_operand (b, mode)) > > + || (MEM_P (a) && b == const0_rtx) > > + || (register_operand (a, mode) && MEM_P (b)) > > + || (MEM_P (a) && MEM_P (b)))) > > + return true; > > + /* C store (m r 0). */ > > + else if (MEM_P (x) && x == b && register_operand (a, mode)) > > + return true; > > + } > > + return false; > > +} > > + > > bool > > ix86_expand_int_movcc (rtx operands[]) > > { > > @@ -3366,6 +3390,45 @@ ix86_expand_int_movcc (rtx operands[]) > > > > compare_code = GET_CODE (compare_op); > > > > + if (MEM_P (operands[0]) > > + && !ix86_can_cfcmov_p (operands[0], op2, op3)) > > + return false; > > + > > + if (may_trap_or_fault_p (op2) || may_trap_or_fault_p (op3)) > > + { > > + if (ix86_can_cfcmov_p (operands[0], op2, op3)) > > + { > > + if (may_trap_or_fault_p (op2)) > > + op2 = gen_rtx_UNSPEC (mode, gen_rtvec (1, operands[2]), > > + UNSPEC_APX_CFCMOV); > > + if (may_trap_or_fault_p (op3)) > > + op3 = gen_rtx_UNSPEC (mode, gen_rtvec (1, operands[3]), > > + UNSPEC_APX_CFCMOV); > > + emit_insn (compare_seq); > > + > > + if (may_trap_or_fault_p (op2) && may_trap_or_fault_p (op3)) > > + { > > + emit_insn (gen_rtx_SET (operands[0], > > + gen_rtx_IF_THEN_ELSE (mode, > > + compare_op, > > + op2, > > + operands[0]))); > > + emit_insn (gen_rtx_SET (operands[0], > > + gen_rtx_IF_THEN_ELSE (mode, > > + compare_op, > > + operands[0], > > + op3))); > > + } > > + else > > + emit_insn (gen_rtx_SET (operands[0], > > + gen_rtx_IF_THEN_ELSE (mode, > > + compare_op, > > + op2, op3))); > > + return true; > > + } > > + return false; > > + } > > + > > if ((op1 == const0_rtx && (code == GE || code == LT)) > > || (op1 == constm1_rtx && (code == GT || code == LE))) > > sign_bit_compare_p = true; > > diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h > > index c7ec0d9fd39..711519ffb53 100644 > > --- a/gcc/config/i386/i386-opts.h > > +++ b/gcc/config/i386/i386-opts.h > > @@ -143,8 +143,10 @@ enum 
apx_features { > > apx_nf = 1 << 4, > > apx_ccmp = 1 << 5, > > apx_zu = 1 << 6, > > + apx_cfcmov = 1 << 7, > > apx_all = apx_egpr | apx_push2pop2 | apx_ndd > > - | apx_ppx | apx_nf | apx_ccmp | apx_zu, > > + | apx_ppx | apx_nf | apx_ccmp | apx_zu > > + | apx_cfcmov, > > }; > > > > #endif > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > > index 173db213d14..b14c0a3d9f2 100644 > > --- a/gcc/config/i386/i386.cc > > +++ b/gcc/config/i386/i386.cc > > @@ -22349,10 +22349,18 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, > > *total = COSTS_N_INSNS (1); > > if (!COMPARISON_P (XEXP (x, 0)) && !REG_P (XEXP (x, 0))) > > *total += rtx_cost (XEXP (x, 0), mode, code, 0, speed); > > - if (!REG_P (XEXP (x, 1))) > > - *total += rtx_cost (XEXP (x, 1), mode, code, 1, speed); > > - if (!REG_P (XEXP (x, 2))) > > - *total += rtx_cost (XEXP (x, 2), mode, code, 2, speed); > > + rtx op1, op2; > > + op1 = XEXP (x, 1); > > + op2 = XEXP (x, 2); > > + /* Handle UNSPEC_APX_CFCMOV for cfcmov. */ > > + if (GET_CODE (op1) == UNSPEC && XINT (op1, 1) == UNSPEC_APX_CFCMOV) > > + op1 = XVECEXP (op1, 0, 0); > > + if (GET_CODE (op2) == UNSPEC && XINT (op2, 1) == UNSPEC_APX_CFCMOV) > > + op2 = XVECEXP (op2, 0, 0); > > + if (!REG_P (op1)) > > + *total += rtx_cost (op1, mode, code, 1, speed); > > + if (!REG_P (op2)) > > + *total += rtx_cost (op2, mode, code, 2, speed); > > return true; > > } > > return false; > > @@ -24998,6 +25006,19 @@ ix86_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info) > > return default_noce_conversion_profitable_p (seq, if_info); > > } > > > > + > > +/* Implement targetm.have_conditional_move_mem_notrap hook. 
*/ > > +static bool > > +ix86_have_conditional_move_mem_notrap (rtx x) > > +{ > > + machine_mode mode = GET_MODE (x); > > + if (TARGET_APX_CFCMOV > > + && (mode == DImode || mode == SImode || mode == HImode) > > + && MEM_P (x)) > > + return true; > > + return false; > > +} > > + > > /* x86-specific vector costs. */ > > class ix86_vector_costs : public vector_costs > > { > > @@ -26975,6 +26996,10 @@ ix86_libgcc_floating_mode_supported_p > > #undef TARGET_NOCE_CONVERSION_PROFITABLE_P > > #define TARGET_NOCE_CONVERSION_PROFITABLE_P ix86_noce_conversion_profitable_p > > > > +#undef TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP > > +#define TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP \ > > + ix86_have_conditional_move_mem_notrap > > + > > #undef TARGET_HARD_REGNO_NREGS > > #define TARGET_HARD_REGNO_NREGS ix86_hard_regno_nregs > > #undef TARGET_HARD_REGNO_MODE_OK > > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h > > index dc1a1f44320..6a20fa678c8 100644 > > --- a/gcc/config/i386/i386.h > > +++ b/gcc/config/i386/i386.h > > @@ -58,6 +58,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. 
If not, see > > #define TARGET_APX_NF (ix86_apx_features & apx_nf) > > #define TARGET_APX_CCMP (ix86_apx_features & apx_ccmp) > > #define TARGET_APX_ZU (ix86_apx_features & apx_zu) > > +#define TARGET_APX_CFCMOV (ix86_apx_features & apx_cfcmov) > > > > #include "config/vxworks-dummy.h" > > > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > > index fd48e764469..57448c07828 100644 > > --- a/gcc/config/i386/i386.md > > +++ b/gcc/config/i386/i386.md > > @@ -221,6 +221,9 @@ > > ;; For APX CCMP support > > ;; DFV = default flag value > > UNSPEC_APX_DFV > > + > > + ;; For APX CFCMOV support > > + UNSPEC_APX_CFCMOV > > ]) > > > > (define_c_enum "unspecv" [ > > @@ -579,7 +582,7 @@ > > noavx512dq,fma_or_avx512vl,avx512vl,noavx512vl,avxvnni, > > avx512vnnivl,avx512fp16,avxifma,avx512ifmavl,avxneconvert, > > avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl, > > - vaes_avx512vl,noapx_nf" > > + vaes_avx512vl,noapx_nf,apx_cfcmov" > > (const_string "base")) > > > > ;; The (bounding maximum) length of an instruction immediate. > > @@ -986,6 +989,7 @@ > > (eq_attr "mmx_isa" "avx") > > (symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX") > > (eq_attr "isa" "noapx_nf") (symbol_ref "!TARGET_APX_NF") > > + (eq_attr "isa" "apx_cfcmov") (symbol_ref "TARGET_APX_CFCMOV") > > ] > > (const_int 1))) > > > > @@ -24995,7 +24999,7 @@ > > ;; Conditional move instructions. 
> > > > (define_expand "mov<mode>cc" > > - [(set (match_operand:SWIM 0 "register_operand") > > + [(set (match_operand:SWIM 0 "register_or_cfc_mem_operand") > > (if_then_else:SWIM (match_operand 1 "comparison_operator") > > (match_operand:SWIM 2 "<general_operand>") > > (match_operand:SWIM 3 "<general_operand>")))] > > @@ -25103,19 +25107,54 @@ > > (set (match_dup 0) > > (neg:SWI (ltu:SWI (reg:CCC FLAGS_REG) (const_int 0))))]) > > > > +(define_insn "*cfcmov<mode>_1" > > + [(set (match_operand:SWI248 0 "register_operand" "=r,r") > > + (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator" > > + [(reg FLAGS_REG) (const_int 0)]) > > + (unspec:SWI248 > > + [(match_operand:SWI248 2 "memory_operand" "m,m")] > > + UNSPEC_APX_CFCMOV) > > + (match_operand:SWI248 3 "reg_or_0_operand" "C,r")))] > > + "TARGET_CMOVE && TARGET_APX_CFCMOV" > > + "@ > > + cfcmov%O2%C1\t{%2, %0|%0, %2} > > + cfcmov%O2%C1\t{%2, %3, %0|%0, %3, %2}" > > + [(set_attr "isa" "*,apx_ndd") > > + (set_attr "type" "icmov") > > + (set_attr "prefix" "evex") > > + (set_attr "mode" "<MODE>")]) > > + > > +(define_insn "*cfcmov<mode>_2" > > + [(set (match_operand:SWI248 0 "nonimmediate_operand" "=r,m") > > + (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator" > > + [(reg FLAGS_REG) (const_int 0)]) > > + (match_operand:SWI248 2 "register_operand" "r,r") > > + (unspec:SWI248 > > + [(match_operand:SWI248 3 "memory_operand" "m,0")] > > + UNSPEC_APX_CFCMOV)))] > > + "TARGET_CMOVE && TARGET_APX_CFCMOV" > > + "@ > > + cfcmov%O2%c1\t{%3, %2, %0|%0, %2, %3} > > + cfcmov%O2%C1\t{%2, %0|%0, %2}" > > + [(set_attr "isa" "apx_ndd,*") > > + (set_attr "type" "icmov") > > + (set_attr "prefix" "evex") > > + (set_attr "mode" "<MODE>")]) > > + > > (define_insn "*mov<mode>cc_noc" > > - [(set (match_operand:SWI248 0 "register_operand" "=r,r,r,r") > > + [(set (match_operand:SWI248 0 "register_operand" "=r,r,r,r,r") > > (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator" > > [(reg FLAGS_REG) 
(const_int 0)]) > > - (match_operand:SWI248 2 "nonimmediate_operand" "rm,0,rm,r") > > - (match_operand:SWI248 3 "nonimmediate_operand" "0,rm,r,rm")))] > > + (match_operand:SWI248 2 "nonimmediate_operand" "rm,0,rm,r,r") > > + (match_operand:SWI248 3 "nonimm_or_0_operand" "0,rm,r,rm,C")))] > > "TARGET_CMOVE && !(MEM_P (operands[2]) && MEM_P (operands[3]))" > > "@ > > cmov%O2%C1\t{%2, %0|%0, %2} > > cmov%O2%c1\t{%3, %0|%0, %3} > > cmov%O2%C1\t{%2, %3, %0|%0, %3, %2} > > - cmov%O2%c1\t{%3, %2, %0|%0, %2, %3}" > > - [(set_attr "isa" "*,*,apx_ndd,apx_ndd") > > + cmov%O2%c1\t{%3, %2, %0|%0, %2, %3} > > + cfcmov%O2%C1\t{%2, %0|%0, %2}" > > + [(set_attr "isa" "*,*,apx_ndd,apx_ndd,apx_cfcmov") > > (set_attr "type" "icmov") > > (set_attr "mode" "<MODE>")]) > > > > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt > > index 353fffb2343..7d63d9abd95 100644 > > --- a/gcc/config/i386/i386.opt > > +++ b/gcc/config/i386/i386.opt > > @@ -1345,6 +1345,9 @@ Enum(apx_features) String(ccmp) Value(apx_ccmp) Set(7) > > EnumValue > > Enum(apx_features) String(zu) Value(apx_zu) Set(8) > > > > +EnumValue > > +Enum(apx_features) String(cfcmov) Value(apx_cfcmov) Set(9) > > + > > EnumValue > > Enum(apx_features) String(all) Value(apx_all) Set(1) > > > > diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md > > index 7afe3100cb7..d562e10ab41 100644 > > --- a/gcc/config/i386/predicates.md > > +++ b/gcc/config/i386/predicates.md > > @@ -2322,3 +2322,10 @@ > > > > return true; > > }) > > + > > +;; Return true if OP is a register operand or memory_operand is only > > +;; supported under TARGET_APX_CFCMOV. 
> > +(define_predicate "register_or_cfc_mem_operand" > > + (ior (match_operand 0 "register_operand") > > + (and (match_code "mem") > > + (match_test "TARGET_APX_CFCMOV")))) > > diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc > > index 58ed42673e5..6e3e48af810 100644 > > --- a/gcc/ifcvt.cc > > +++ b/gcc/ifcvt.cc > > @@ -783,6 +783,8 @@ static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx, > > rtx, rtx, rtx, rtx = NULL, rtx = NULL); > > static bool noce_try_cmove (struct noce_if_info *); > > static bool noce_try_cmove_arith (struct noce_if_info *); > > +static bool noce_try_cmove_load_mem_notrap (struct noce_if_info *); > > +static bool noce_try_cmove_store_mem_notrap (struct noce_if_info *, rtx *, rtx); > > static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **); > > static bool noce_try_minmax (struct noce_if_info *); > > static bool noce_try_abs (struct noce_if_info *); > > @@ -2401,6 +2403,237 @@ noce_try_cmove_arith (struct noce_if_info *if_info) > > return false; > > } > > > > +/* When the target supports memory fault suppression, try more complex cases > > + where the conditional move's source or destination may trap or fault. */ > > + > > +static bool > > +noce_try_cmove_load_mem_notrap (struct noce_if_info *if_info) > > +{ > > + rtx a = if_info->a; > > + rtx b = if_info->b; > > + rtx x = if_info->x; > > + > > + if (MEM_P (x)) > > + return false; > > + /* Only handle a conditional move from one trapping MEM and one > > + non-trapping, non-MEM operand. 
*/ > > + if (!(MEM_P (a) ^ MEM_P (b))) > > + return false; > > + bool a_trap = may_trap_or_fault_p (a); > > + bool b_trap = may_trap_or_fault_p (b); > > + > > + if (!(a_trap ^ b_trap)) > > + return false; > > + if (a_trap && (!MEM_P (a) || !targetm.have_conditional_move_mem_notrap (a))) > > + return false; > > + if (b_trap && (!MEM_P (b) || !targetm.have_conditional_move_mem_notrap (b))) > > + return false; > > + > > + rtx orig_b; > > + rtx_insn *insn_a, *insn_b; > > + bool a_simple = if_info->then_simple; > > + bool b_simple = if_info->else_simple; > > + basic_block then_bb = if_info->then_bb; > > + basic_block else_bb = if_info->else_bb; > > + rtx target; > > + enum rtx_code code; > > + rtx cond = if_info->cond; > > + rtx_insn *ifcvt_seq; > > + > > + /* if (test) x = *a; else x = c - d; > > + => x = c - d; > > + if (test) > > + x = *a; > > + */ > > + > > + code = GET_CODE (cond); > > + insn_a = if_info->insn_a; > > + insn_b = if_info->insn_b; > > + > > + machine_mode x_mode = GET_MODE (x); > > + > > + if (!can_conditionally_move_p (x_mode)) > > + return false; > > + > > + /* Because we only handle one trap MEM + other non_trap, non mem cases, > > + just move one trap MEM always in then_bb. 
*/ > > + if (noce_reversed_cond_code (if_info) != UNKNOWN) > > + { > > + bool reversep = false; > > + if (b_trap) > > + reversep = true; > > + > > + if (reversep) > > + { > > + if (if_info->rev_cond) > > + { > > + cond = if_info->rev_cond; > > + code = GET_CODE (cond); > > + } > > + else > > + code = reversed_comparison_code (cond, if_info->jump); > > + std::swap (a, b); > > + std::swap (insn_a, insn_b); > > + std::swap (a_simple, b_simple); > > + std::swap (then_bb, else_bb); > > + } > > + } > > + > > + if (then_bb && else_bb > > + && (!bbs_ok_for_cmove_arith (then_bb, else_bb, if_info->orig_x) > > + || !bbs_ok_for_cmove_arith (else_bb, then_bb, if_info->orig_x))) > > + return false; > > + > > + start_sequence (); > > + > > + /* If one of the blocks is empty then the corresponding B or A value > > + came from the test block. The non-empty complex block that we will > > + emit might clobber the register used by B or A, so move it to a pseudo > > + first. */ > > + > > + rtx tmp_b = NULL_RTX; > > + > > + /* Don't move trap mem to a pseudo. */ > > + if (!may_trap_or_fault_p (b) && (b_simple || !else_bb)) > > + tmp_b = gen_reg_rtx (x_mode); > > + > > + orig_b = b; > > + > > + rtx emit_a = NULL_RTX; > > + rtx emit_b = NULL_RTX; > > + rtx_insn *tmp_insn = NULL; > > + bool modified_in_a = false; > > + bool modified_in_b = false; > > + /* If either operand is complex, load it into a register first. > > + The best way to do this is to copy the original insn. In this > > + way we preserve any clobbers etc that the insn may have had. > > + This is of course not possible in the IS_MEM case. */ > > + > > + if (! general_operand (b, GET_MODE (b)) || tmp_b) > > + { > > + if (insn_b) > > + { > > + b = tmp_b ? tmp_b : gen_reg_rtx (GET_MODE (b)); > > + rtx_insn *copy_of_b = as_a <rtx_insn *> (copy_rtx (insn_b)); > > + rtx set = single_set (copy_of_b); > > + > > + SET_DEST (set) = b; > > + emit_b = PATTERN (copy_of_b); > > + } > > + else > > + { > > + rtx tmp_reg = tmp_b ? 
tmp_b : gen_reg_rtx (GET_MODE (b)); > > + emit_b = gen_rtx_SET (tmp_reg, b); > > + b = tmp_reg; > > + } > > + } > > + > > + if (tmp_b && then_bb) > > + { > > + FOR_BB_INSNS (then_bb, tmp_insn) > > + /* Don't check inside insn_a. We will have changed it to emit_a > > + with a destination that doesn't conflict. */ > > + if (!(insn_a && tmp_insn == insn_a) > > + && modified_in_p (orig_b, tmp_insn)) > > + { > > + modified_in_a = true; > > + break; > > + } > > + > > + } > > + > > + modified_in_b = emit_b != NULL_RTX && modified_in_p (a, emit_b); > > + /* If insn to set up A clobbers any registers B depends on, try to > > + swap insn that sets up A with the one that sets up B. If even > > + that doesn't help, punt. */ > > + if (modified_in_a && !modified_in_b) > > + { > > + if (!noce_emit_bb (emit_b, else_bb, b_simple)) > > + goto end_seq_and_fail; > > + > > + if (!noce_emit_bb (emit_a, then_bb, a_simple)) > > + goto end_seq_and_fail; > > + } > > + else if (!modified_in_a) > > + { > > + if (!noce_emit_bb (emit_b, else_bb, b_simple)) > > + goto end_seq_and_fail; > > + > > + if (!noce_emit_bb (emit_a, then_bb, a_simple)) > > + goto end_seq_and_fail; > > + } > > + else > > + goto end_seq_and_fail; > > + > > + target = noce_emit_cmove (if_info, x, code, XEXP (cond, 0), XEXP (cond, 1), > > + a, b); > > + > > + if (! 
target) > > + goto end_seq_and_fail; > > + > > + if (target != x) > > + noce_emit_move_insn (x, target); > > + > > + ifcvt_seq = end_ifcvt_sequence (if_info); > > + if (!ifcvt_seq || !targetm.noce_conversion_profitable_p (ifcvt_seq, if_info)) > > + return false; > > + > > + emit_insn_before_setloc (ifcvt_seq, if_info->jump, > > + INSN_LOCATION (if_info->insn_a)); > > + if_info->transform_name = "noce_try_cmove_load_mem_notrap"; > > + return true; > > + > > + end_seq_and_fail: > > + end_sequence (); > > + return false; > > +} > > + > > +static bool > > +noce_try_cmove_store_mem_notrap (struct noce_if_info *if_info, rtx *x_ptr, rtx orig_x) > > +{ > > + rtx a = if_info->a; > > + rtx b = if_info->b; > > + rtx x = orig_x; > > + machine_mode x_mode = GET_MODE (x); > > + > > + if (!MEM_P (x) || !rtx_equal_p (x, b)) > > + return false; > > + if (!may_trap_or_fault_p (x) || !targetm.have_conditional_move_mem_notrap (x)) > > + return false; > > + if (!if_info->then_simple || !register_operand (a, x_mode)) > > + return false; > > + > > + rtx cond = if_info->cond; > > + enum rtx_code code = GET_CODE (cond); > > + rtx_insn *ifcvt_seq; > > + > > + start_sequence (); > > + > > + rtx target = noce_emit_cmove (if_info, x, code, XEXP (cond, 0), XEXP (cond, 1), > > + a, b); > > + > > + if (! 
target) > > + goto end_seq_and_fail; > > + > > + if (target != x) > > + noce_emit_move_insn (x, target); > > + > > + ifcvt_seq = end_ifcvt_sequence (if_info); > > + if (!ifcvt_seq || !targetm.noce_conversion_profitable_p (ifcvt_seq, if_info)) > > + return false; > > + > > + emit_insn_before_setloc (ifcvt_seq, if_info->jump, > > + INSN_LOCATION (if_info->insn_a)); > > + if_info->transform_name = "noce_try_cmove_store_mem_notrap"; > > + if_info->x = orig_x; > > + *x_ptr = orig_x; > > + return true; > > + > > + end_seq_and_fail: > > + end_sequence (); > > + return false; > > +} > > + > > /* For most cases, the simplified condition we found is the best > > choice, but this is not the case for the min/max/abs transforms. > > For these we wish to know that it is A or B in the condition. */ > > @@ -4121,12 +4354,21 @@ noce_process_if_block (struct noce_if_info *if_info) > > } > > > > if (!set_b && MEM_P (orig_x)) > > + { > > + /* A fault-suppressing conditional move for a conditional memory store > > + does not move any arithmetic calculations. */ > > + if (targetm.have_conditional_move_mem_notrap (orig_x) > > + && HAVE_conditional_move > > + && noce_try_cmove_store_mem_notrap (if_info, &x, orig_x)) > > + goto success; > > + else > > /* We want to avoid store speculation to avoid cases like > > if (pthread_mutex_trylock(mutex)) > > ++global_variable; > > Rather than go to much effort here, we rely on the SSA optimizers, > > which do a good enough job these days. 
*/ > > - return false; > > + return false; > > + } > > > > if (noce_try_move (if_info)) > > goto success; > > @@ -4160,6 +4402,9 @@ noce_process_if_block (struct noce_if_info *if_info) > > if (HAVE_conditional_move > > && noce_try_cmove_arith (if_info)) > > goto success; > > + if (HAVE_conditional_move > > + && noce_try_cmove_load_mem_notrap (if_info)) > > + goto success; > > if (noce_try_sign_mask (if_info)) > > goto success; > > } > > diff --git a/gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c b/gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c > > new file mode 100644 > > index 00000000000..4a1fb91b24c > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c > > @@ -0,0 +1,73 @@ > > +/* { dg-do compile { target { ! ia32 } } } */ > > +/* { dg-options "-O3 -mapxf" } */ > > + > > +/* { dg-final { scan-assembler-times "cfcmovne" 1 } } */ > > +/* { dg-final { scan-assembler-times "cfcmovg" 2} } */ > > +/* { dg-final { scan-assembler-times "cfcmove" 1 } } */ > > +/* { dg-final { scan-assembler-times "cfcmovl" 2 } } */ > > +/* { dg-final { scan-assembler-times "cfcmovle" 1 } } */ > > + > > +__attribute__((noinline, noclone, target("apxf"))) > > +int cfc_store (int a, int b, int c, int d, int *arr) > > +{ > > + if (a != b) > > + *arr = c; > > + return d; > > + > > +} > > + > > +__attribute__((noinline, noclone, target("apxf"))) > > +int cfc_load_ndd (int a, int b, int c, int *p) > > +{ > > + if (a > b) > > + return *p; > > + return c; > > +} > > + > > +__attribute__((noinline, noclone, target("apxf"))) > > +int cfc_load_2_trap (int a, int b, int *c, int *p) > > +{ > > + if (a > b) > > + return *p; > > + return *c; > > +} > > + > > +__attribute__((noinline, noclone, target("apxf"))) > > +int cfc_load_zero (int a, int b, int c) > > +{ > > + int sum = 0; > > + if (a == b) > > + return c; > > + return sum; > > +} > > + > > +__attribute__((noinline, noclone, target("apxf"))) > > +int cfc_load_mem (int a, int b, int *p) > > +{ > > + int sum = 0; > > + if (a < b ) > > 
+ sum = *p; > > + return sum; > > +} > > + > > +__attribute__((noinline, noclone, target("apxf"))) > > +int cfc_load_arith_1 (int a, int b, int c, int *p) > > +{ > > + int sum = 0; > > + if (a > b) > > + sum = *p; > > + else > > + sum = a + c; > > + return sum + 1; > > +} > > + > > +__attribute__((noinline, noclone, target("apxf"))) > > +int cfc_load_arith_2 (int a, int b, int c, int *p) > > +{ > > + int sum = 0; > > + if (a > b) > > + sum = a + c; > > + else > > + sum = *p; > > + return sum + 1; > > +} > > diff --git a/gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c b/gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c > > new file mode 100644 > > index 00000000000..2b1660f64fa > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c > > @@ -0,0 +1,40 @@ > > +/* { dg-do run { target { ! ia32 } } } */ > > +/* { dg-require-effective-target apxf } */ > > +/* { dg-options "-mapxf -march=x86-64 -O3" } */ > > + > > +#include "apx-cfcmov-1.c" > > + > > +extern void abort (void); > > + > > +int main () > > +{ > > + if (!__builtin_cpu_supports ("apxf")) > > + return 0; > > + > > + int arr = 6; > > + int arr1 = 5; > > + int res = cfc_store (1, 2, 3, 4, &arr); > > + if (arr != 3 || res != 4) > > + abort (); > > + res = cfc_load_ndd (2, 1, 2, &arr); > > + if (res != 3) > > + abort (); > > + res = cfc_load_2_trap (1, 2, &arr1, &arr); > > + if (res != 5) > > + abort (); > > + res = cfc_load_zero (1, 2, 3); > > + res = cfc_load_zero (1, 2, 3); > > + if (res != 0) > > + abort (); > > + res = cfc_load_mem (2, 1, &arr); > > + if (res != 0) > > + abort (); > > + res = cfc_load_arith_1 (1, 2, 3, &arr); > > + if (res != 5) > > + abort (); > > + res = cfc_load_arith_2 (2, 1, 3, &arr); > > + if (res != 6) > > + abort (); > > + return 0; > > +} > > + > > -- > > 2.31.1 > > ^ permalink raw reply [flat|nested] 4+ messages in thread
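For readers following the thread, the behavior the patch relies on can be sketched in plain C. This is only an illustrative model, assuming the semantics stated in the cover letter (memory faults are suppressed when the condition code evaluates to false). The names cfc_load_model, cfc_store_model and cfc_model_selfcheck are hypothetical; they are not part of GCC or of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a conditionally-faulting load: the memory
   operand is dereferenced only when COND is true, so an invalid P
   cannot fault on the false path.  */
static int
cfc_load_model (int cond, const int *p, int fallback)
{
  return cond ? *p : fallback;
}

/* Hypothetical model of a conditionally-faulting store: memory is
   written only when COND is true.  */
static void
cfc_store_model (int cond, int *p, int value)
{
  if (cond)
    *p = value;
}

/* Exercise both paths; returns 0 when every check passes.  */
static int
cfc_model_selfcheck (void)
{
  int cell = 6;

  /* False condition: the NULL pointer is never touched.  */
  assert (cfc_load_model (0, NULL, 7) == 7);
  cfc_store_model (0, NULL, 3);

  /* True condition: ordinary load and store.  */
  assert (cfc_load_model (1, &cell, 7) == 6);
  cfc_store_model (1, &cell, 3);
  assert (cell == 3);
  return 0;
}
```

A plain cmov with a memory source performs the load regardless of the condition, which is why if-conversion normally refuses operands that may trap; the model above shows the guarantee a fault-suppressing instruction would have to provide instead.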
* RE: [PATCH 2/2] [APX CFCMOV] Support APX CFCMOV 2024-06-14 6:12 ` Richard Biener @ 2024-06-14 6:19 ` Liu, Hongtao 0 siblings, 0 replies; 4+ messages in thread From: Liu, Hongtao @ 2024-06-14 6:19 UTC (permalink / raw) To: Richard Biener, Kong, Lingling, Richard Sandiford Cc: gcc-patches, Uros Bizjak > -----Original Message----- > From: Richard Biener <richard.guenther@gmail.com> > Sent: Friday, June 14, 2024 2:13 PM > To: Kong, Lingling <lingling.kong@intel.com>; Richard Sandiford > <richard.sandiford@arm.com> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao <hongtao.liu@intel.com>; Uros > Bizjak <ubizjak@gmail.com> > Subject: Re: [PATCH 2/2] [APX CFCMOV] Support APX CFCMOV > > On Fri, Jun 14, 2024 at 3:39 AM Kong, Lingling <lingling.kong@intel.com> > wrote: > > > > From: konglin1 <lingling.kong@intel.com> > > > > > > > > The APX CFCMOV feature implements conditional faulting, which means that > > all > > > > memory faults are suppressed when the condition code evaluates to > > false while > > > > loading or storing a memory operand. A conditional move can now load or store a memory > > operand > > > > that may trap or fault. > > > > > > > > To enable CFCMOV, we add a target hook > > TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP > > > > to the if-conversion pass to allow conversion to cmov. > > > > > > > > Bootstrapped & regtested on x86-64-pc-linux-gnu with binutils 2.42 branch. > > > > OK for trunk? > > How does if-conversion end up modifying the IL? > > I have the gut feeling that your hook changes semantics of RTL and you should > instead have an optab for a "masked" load/store? > > Richard - do you already have plans how to represent the first-fault loads? > (are there first-fault stores?) Yes. > > Richard. > > > > > > > gcc/ChangeLog: > > > > > > > > * config/i386/i386-expand.cc (ix86_can_cfcmov_p): New > > function that > > > > tests whether cfcmov can be generated. > > > > (ix86_expand_int_movcc): Expand to cfcmov pattern if > > ix86_can_cfcmov_p > > > > returns true. 
> > > > * config/i386/i386-opts.h (enum apx_features): Add apx_cfcmov. > > > > * config/i386/i386.cc > > (ix86_have_conditional_move_mem_notrap): New > > > > function to hook > > TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP > > > > (TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP): Target hook > define. > > > > (ix86_rtx_costs): Add UNSPEC_APX_CFCMOV cost; > > > > * config/i386/i386.h (TARGET_APX_CFCMOV): Define. > > > > * config/i386/i386.md (*cfcmov<mode>_1): New > > define_insn to support > > > > cfcmov. > > > > (*cfcmov<mode>_2): Ditto. > > > > (UNSPEC_APX_CFCMOV): New unspec for cfcmov. > > > > * config/i386/i386.opt: Add enum value for cfcmov. > > > > * ifcvt.cc (noce_try_cmove_load_mem_notrap): Use target > > hook to allow > > > > convert to cfcmov for conditional load. > > > > (noce_try_cmove_store_mem_notrap): Convert to conditional store. > > > > (noce_process_if_block): Ditto. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > * gcc.target/i386/apx-cfcmov-1.c: New test. > > > > * gcc.target/i386/apx-cfcmov-2.c: Ditto. 
> > > > --- > > > > gcc/config/i386/i386-expand.cc | 63 +++++ > > > > gcc/config/i386/i386-opts.h | 4 +- > > > > gcc/config/i386/i386.cc | 33 ++- > > > > gcc/config/i386/i386.h | 1 + > > > > gcc/config/i386/i386.md | 53 +++- > > > > gcc/config/i386/i386.opt | 3 + > > > > gcc/config/i386/predicates.md | 7 + > > > > gcc/ifcvt.cc | 247 ++++++++++++++++++- > > > > gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c | 73 ++++++ > > > > gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c | 40 +++ > > > > 10 files changed, 511 insertions(+), 13 deletions(-) > > > > create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c > > > > > > > > diff --git a/gcc/config/i386/i386-expand.cc > > b/gcc/config/i386/i386-expand.cc > > > > index 312329e550b..c02a4bcbec3 100644 > > > > --- a/gcc/config/i386/i386-expand.cc > > > > +++ b/gcc/config/i386/i386-expand.cc > > > > @@ -3336,6 +3336,30 @@ ix86_expand_int_addcc (rtx operands[]) > > > > return true; > > > > } > > > > > > > > +/* Return TRUE if we could convert "if (test) x = a; else x = b;" to > > +cfcmov, > > > > + especially when load a or b or x store may cause memmory faults. > > + */ > > > > +bool > > > > +ix86_can_cfcmov_p (rtx x, rtx a, rtx b) > > > > +{ > > > > + machine_mode mode = GET_MODE (x); > > > > + if (TARGET_APX_CFCMOV > > > > + && (mode == DImode || mode == SImode || mode == HImode)) > > > > + { > > > > + /* C load (r m r), (r m C), (r r m). For r m m could use > > > > + two cfcmov. */ > > > > + if (register_operand (x, mode) > > > > + && ((MEM_P (a) && register_operand (b, mode)) > > > > + || (MEM_P (a) && b == const0_rtx) > > > > + || (register_operand (a, mode) && MEM_P (b)) > > > > + || (MEM_P (a) && MEM_P (b)))) > > > > + return true; > > > > + /* C store (m r 0). 
*/ > > > > + else if (MEM_P (x) && x == b && register_operand (a, mode)) > > > > + return true; > > > > + } > > > > + return false; > > > > +} > > > > + > > > > bool > > > > ix86_expand_int_movcc (rtx operands[]) > > > > { > > > > @@ -3366,6 +3390,45 @@ ix86_expand_int_movcc (rtx operands[]) > > > > > > > > compare_code = GET_CODE (compare_op); > > > > > > > > + if (MEM_P (operands[0]) > > > > + && !ix86_can_cfcmov_p (operands[0], op2, op3)) > > > > + return false; > > > > + > > > > + if (may_trap_or_fault_p (op2) || may_trap_or_fault_p (op3)) > > > > + { > > > > + if (ix86_can_cfcmov_p (operands[0], op2, op3)) > > > > + { > > > > + if (may_trap_or_fault_p (op2)) > > > > + op2 = gen_rtx_UNSPEC (mode, gen_rtvec (1, > > + operands[2]), > > > > + > > + UNSPEC_APX_CFCMOV); > > > > + if (may_trap_or_fault_p (op3)) > > > > + op3 = gen_rtx_UNSPEC (mode, gen_rtvec (1, > > + operands[3]), > > > > + > > + UNSPEC_APX_CFCMOV); > > > > + emit_insn (compare_seq); > > > > + > > > > + if (may_trap_or_fault_p (op2) && may_trap_or_fault_p > > + (op3)) > > > > + { > > > > + emit_insn (gen_rtx_SET (operands[0], > > > > + > > + gen_rtx_IF_THEN_ELSE (mode, > > > > + > > + compare_op, > > > > + > > + op2, > > > > + > > + operands[0]))); > > > > + emit_insn (gen_rtx_SET (operands[0], > > > > + > > + gen_rtx_IF_THEN_ELSE (mode, > > > > + > > + compare_op, > > > > + > > + operands[0], > > > > + > > + op3))); > > > > + } > > > > + else > > > > + emit_insn (gen_rtx_SET (operands[0], > > > > + > > + gen_rtx_IF_THEN_ELSE (mode, > > > > + > > + compare_op, > > > > + > > + op2, op3))); > > > > + return true; > > > > + } > > > > + return false; > > > > + } > > > > + > > > > if ((op1 == const0_rtx && (code == GE || code == LT)) > > > > || (op1 == constm1_rtx && (code == GT || code == LE))) > > > > sign_bit_compare_p = true; > > > > diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h > > > > index c7ec0d9fd39..711519ffb53 100644 > > > > --- a/gcc/config/i386/i386-opts.h > > > > 
+++ b/gcc/config/i386/i386-opts.h > > > > @@ -143,8 +143,10 @@ enum apx_features { > > > > apx_nf = 1 << 4, > > > > apx_ccmp = 1 << 5, > > > > apx_zu = 1 << 6, > > > > + apx_cfcmov = 1 << 7, > > > > apx_all = apx_egpr | apx_push2pop2 | apx_ndd > > > > - | apx_ppx | apx_nf | apx_ccmp | apx_zu, > > > > + | apx_ppx | apx_nf | apx_ccmp | apx_zu > > > > + | apx_cfcmov, > > > > }; > > > > > > > > #endif > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > > > > index 173db213d14..b14c0a3d9f2 100644 > > > > --- a/gcc/config/i386/i386.cc > > > > +++ b/gcc/config/i386/i386.cc > > > > @@ -22349,10 +22349,18 @@ ix86_rtx_costs (rtx x, machine_mode mode, > > int outer_code_i, int opno, > > > > *total = COSTS_N_INSNS (1); > > > > if (!COMPARISON_P (XEXP (x, 0)) && !REG_P (XEXP (x, > > 0))) > > > > *total += rtx_cost (XEXP (x, 0), mode, code, 0, > > speed); > > > > - if (!REG_P (XEXP (x, 1))) > > > > - *total += rtx_cost (XEXP (x, 1), mode, code, 1, speed); > > > > - if (!REG_P (XEXP (x, 2))) > > > > - *total += rtx_cost (XEXP (x, 2), mode, code, 2, speed); > > > > + rtx op1, op2; > > > > + op1 = XEXP (x, 1); > > > > + op2 = XEXP (x, 2); > > > > + /* Handle UNSPEC_APX_CFCMOV for cfcmov. */ > > > > + if (GET_CODE (op1) == UNSPEC && XINT (op1, 1) == > > + UNSPEC_APX_CFCMOV) > > > > + op1 = XVECEXP (op1, 0, 0); > > > > + if (GET_CODE (op2) == UNSPEC && XINT (op2, 1) == > > + UNSPEC_APX_CFCMOV) > > > > + op2 = XVECEXP (op2, 0, 0); > > > > + if (!REG_P (op1)) > > > > + *total += rtx_cost (op1, mode, code, 1, speed); > > > > + if (!REG_P (op2)) > > > > + *total += rtx_cost (op2, mode, code, 2, speed); > > > > return true; > > > > } > > > > return false; > > > > @@ -24998,6 +25006,19 @@ ix86_noce_conversion_profitable_p (rtx_insn > > *seq, struct noce_if_info *if_info) > > > > return default_noce_conversion_profitable_p (seq, if_info); > > > > } > > > > > > > > + > > > > +/* Implement targetm.have_conditional_move_mem_notrap hook. 
*/ > > > > +static bool > > > > +ix86_have_conditional_move_mem_notrap (rtx x) > > > > +{ > > > > + machine_mode mode = GET_MODE (x); > > > > + if (TARGET_APX_CFCMOV > > > > + && (mode == DImode || mode == SImode || mode == HImode) > > > > + && MEM_P (x)) > > > > + return true; > > > > + return false; > > > > +} > > > > + > > > > /* x86-specific vector costs. */ > > > > class ix86_vector_costs : public vector_costs > > > > { > > > > @@ -26975,6 +26996,10 @@ ix86_libgcc_floating_mode_supported_p > > > > #undef TARGET_NOCE_CONVERSION_PROFITABLE_P > > > > #define TARGET_NOCE_CONVERSION_PROFITABLE_P > > ix86_noce_conversion_profitable_p > > > > > > > > +#undef TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP > > > > +#define TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP \ > > > > + ix86_have_conditional_move_mem_notrap > > > > + > > > > #undef TARGET_HARD_REGNO_NREGS > > > > #define TARGET_HARD_REGNO_NREGS ix86_hard_regno_nregs > > > > #undef TARGET_HARD_REGNO_MODE_OK > > > > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h > > > > index dc1a1f44320..6a20fa678c8 100644 > > > > --- a/gcc/config/i386/i386.h > > > > +++ b/gcc/config/i386/i386.h > > > > @@ -58,6 +58,7 @@ see the files COPYING3 and COPYING.RUNTIME > > respectively. 
If not, see > > > > #define TARGET_APX_NF (ix86_apx_features & apx_nf) > > > > #define TARGET_APX_CCMP (ix86_apx_features & apx_ccmp) > > > > #define TARGET_APX_ZU (ix86_apx_features & apx_zu) > > > > +#define TARGET_APX_CFCMOV (ix86_apx_features & apx_cfcmov) > > > > > > > > #include "config/vxworks-dummy.h" > > > > > > > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > > > > index fd48e764469..57448c07828 100644 > > > > --- a/gcc/config/i386/i386.md > > > > +++ b/gcc/config/i386/i386.md > > > > @@ -221,6 +221,9 @@ > > > > ;; For APX CCMP support > > > > ;; DFV = default flag value > > > > UNSPEC_APX_DFV > > > > + > > > > + ;; For APX CFCMOV support > > > > + UNSPEC_APX_CFCMOV > > > > ]) > > > > > > > > (define_c_enum "unspecv" [ > > > > @@ -579,7 +582,7 @@ > > > > > > noavx512dq,fma_or_avx512vl,avx512vl,noavx512vl,avxvnni, > > > > > > avx512vnnivl,avx512fp16,avxifma,avx512ifmavl,avxneconvert, > > > > > > avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl, > > > > - vaes_avx512vl,noapx_nf" > > > > + vaes_avx512vl,noapx_nf,apx_cfcmov" > > > > (const_string "base")) > > > > > > > > ;; The (bounding maximum) length of an instruction immediate. > > > > @@ -986,6 +989,7 @@ > > > > (eq_attr "mmx_isa" "avx") > > > > (symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX") > > > > (eq_attr "isa" "noapx_nf") (symbol_ref > > "!TARGET_APX_NF") > > > > + (eq_attr "isa" "apx_cfcmov") (symbol_ref > > + "TARGET_APX_CFCMOV") > > > > ] > > > > (const_int 1))) > > > > > > > > @@ -24995,7 +24999,7 @@ > > > > ;; Conditional move instructions. 
> > > > > > > > (define_expand "mov<mode>cc" > > > > - [(set (match_operand:SWIM 0 "register_operand") > > > > + [(set (match_operand:SWIM 0 "register_or_cfc_mem_operand") > > > > (if_then_else:SWIM (match_operand 1 > > "comparison_operator") > > > > (match_operand:SWIM 2 > > "<general_operand>") > > > > (match_operand:SWIM 3 > > "<general_operand>")))] > > > > @@ -25103,19 +25107,54 @@ > > > > (set (match_dup 0) > > > > (neg:SWI (ltu:SWI (reg:CCC FLAGS_REG) (const_int > > 0))))]) > > > > > > > > +(define_insn "*cfcmov<mode>_1" > > > > + [(set (match_operand:SWI248 0 "register_operand" "=r,r") > > > > + (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator" > > > > + [(reg FLAGS_REG) (const_int 0)]) > > > > + (unspec:SWI248 > > > > + [(match_operand:SWI248 2 "memory_operand" "m,m")] > > > > + UNSPEC_APX_CFCMOV) > > > > + (match_operand:SWI248 3 "reg_or_0_operand" "C,r")))] > > > > + "TARGET_CMOVE && TARGET_APX_CFCMOV" > > > > + "@ > > > > + cfcmov%O2%C1\t{%2, %0|%0, %2} > > > > + cfcmov%O2%C1\t{%2, %3, %0|%0, %3, %2}" > > > > + [(set_attr "isa" "*,apx_ndd") > > > > + (set_attr "type" "icmov") > > > > + (set_attr "prefix" "evex") > > > > + (set_attr "mode" "<MODE>")]) > > > > + > > > > +(define_insn "*cfcmov<mode>_2" > > > > + [(set (match_operand:SWI248 0 "nonimmediate_operand" "=r,m") > > > > + (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator" > > > > + [(reg FLAGS_REG) (const_int 0)]) > > > > + (match_operand:SWI248 2 "register_operand" "r,r") > > > > + (unspec:SWI248 > > > > + [(match_operand:SWI248 3 "memory_operand" "m,0")] > > > > + UNSPEC_APX_CFCMOV)))] > > > > + "TARGET_CMOVE && TARGET_APX_CFCMOV" > > > > + "@ > > > > + cfcmov%O2%c1\t{%3, %2, %0|%0, %2, %3} > > > > + cfcmov%O2%C1\t{%2, %0|%0, %2}" > > > > + [(set_attr "isa" "apx_ndd,*") > > > > + (set_attr "type" "icmov") > > > > + (set_attr "prefix" "evex") > > > > + (set_attr "mode" "<MODE>")]) > > > > + > > > > (define_insn "*mov<mode>cc_noc" > > > > - [(set 
(match_operand:SWI248 0 "register_operand" "=r,r,r,r") > > > > + [(set (match_operand:SWI248 0 "register_operand" "=r,r,r,r,r") > > > > (if_then_else:SWI248 (match_operator 1 > "ix86_comparison_operator" > > > > [(reg FLAGS_REG) > > (const_int 0)]) > > > > - (match_operand:SWI248 2 "nonimmediate_operand" "rm,0,rm,r") > > > > - (match_operand:SWI248 3 "nonimmediate_operand" > "0,rm,r,rm")))] > > > > + (match_operand:SWI248 2 "nonimmediate_operand" > > + "rm,0,rm,r,r") > > > > + (match_operand:SWI248 3 "nonimm_or_0_operand" > > + "0,rm,r,rm,C")))] > > > > "TARGET_CMOVE && !(MEM_P (operands[2]) && MEM_P (operands[3]))" > > > > "@ > > > > cmov%O2%C1\t{%2, %0|%0, %2} > > > > cmov%O2%c1\t{%3, %0|%0, %3} > > > > cmov%O2%C1\t{%2, %3, %0|%0, %3, %2} > > > > - cmov%O2%c1\t{%3, %2, %0|%0, %2, %3}" > > > > - [(set_attr "isa" "*,*,apx_ndd,apx_ndd") > > > > + cmov%O2%c1\t{%3, %2, %0|%0, %2, %3} > > > > + cfcmov%O2%C1\t{%2, %0|%0, %2}" > > > > + [(set_attr "isa" "*,*,apx_ndd,apx_ndd,apx_cfcmov") > > > > (set_attr "type" "icmov") > > > > (set_attr "mode" "<MODE>")]) > > > > > > > > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt > > > > index 353fffb2343..7d63d9abd95 100644 > > > > --- a/gcc/config/i386/i386.opt > > > > +++ b/gcc/config/i386/i386.opt > > > > @@ -1345,6 +1345,9 @@ Enum(apx_features) String(ccmp) > Value(apx_ccmp) > > Set(7) > > > > EnumValue > > > > Enum(apx_features) String(zu) Value(apx_zu) Set(8) > > > > > > > > +EnumValue > > > > +Enum(apx_features) String(cfcmov) Value(apx_cfcmov) Set(9) > > > > + > > > > EnumValue > > > > Enum(apx_features) String(all) Value(apx_all) Set(1) > > > > > > > > diff --git a/gcc/config/i386/predicates.md > > b/gcc/config/i386/predicates.md > > > > index 7afe3100cb7..d562e10ab41 100644 > > > > --- a/gcc/config/i386/predicates.md > > > > +++ b/gcc/config/i386/predicates.md > > > > @@ -2322,3 +2322,10 @@ > > > > > > > > return true; > > > > }) > > > > + > > > > +;; Return true if OP is a register operand or 
memory_operand is only > > > > +;; supported under TARGET_APX_CFCMOV. > > > > +(define_predicate "register_or_cfc_mem_operand" > > > > + (ior (match_operand 0 "register_operand") > > > > + (and (match_code "mem") > > > > + (match_test "TARGET_APX_CFCMOV")))) > > > > diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc > > > > index 58ed42673e5..6e3e48af810 100644 > > > > --- a/gcc/ifcvt.cc > > > > +++ b/gcc/ifcvt.cc > > > > @@ -783,6 +783,8 @@ static rtx noce_emit_cmove (struct noce_if_info *, > > rtx, enum rtx_code, rtx, > > > > rtx, rtx, rtx, rtx = > > NULL, rtx = NULL); > > > > static bool noce_try_cmove (struct noce_if_info *); > > > > static bool noce_try_cmove_arith (struct noce_if_info *); > > > > +static bool noce_try_cmove_load_mem_notrap (struct noce_if_info *); > > > > +static bool noce_try_cmove_store_mem_notrap (struct noce_if_info *, > > +rtx *, rtx); > > > > static rtx noce_get_alt_condition (struct noce_if_info *, rtx, > > rtx_insn **); > > > > static bool noce_try_minmax (struct noce_if_info *); > > > > static bool noce_try_abs (struct noce_if_info *); > > > > @@ -2401,6 +2403,237 @@ noce_try_cmove_arith (struct noce_if_info > > *if_info) > > > > return false; > > > > } > > > > > > > > +/* When target support suppress memory fault, try more complex cases > > +involving > > > > + conditional_move's source or dest may trap or fault. */ > > > > + > > > > +static bool > > > > +noce_try_cmove_load_mem_notrap (struct noce_if_info *if_info) > > > > +{ > > > > + rtx a = if_info->a; > > > > + rtx b = if_info->b; > > > > + rtx x = if_info->x; > > > > + > > > > + if (MEM_P (x)) > > > > + return false; > > > > + /* Just handle a conditional move from one trap MEM + other > > + non_trap, > > > > + non mem cases. 
*/ > > > > + if (!(MEM_P (a) ^ MEM_P (b))) > > > > + return false; > > > > + bool a_trap = may_trap_or_fault_p (a); > > > > + bool b_trap = may_trap_or_fault_p (b); > > > > + > > > > + if (!(a_trap ^ b_trap)) > > > > + return false; > > > > + if (a_trap && (!MEM_P (a) || > > + !targetm.have_conditional_move_mem_notrap (a))) > > > > + return false; > > > > + if (b_trap && (!MEM_P (b) || > > + !targetm.have_conditional_move_mem_notrap (b))) > > > > + return false; > > > > + > > > > + rtx orig_b; > > > > + rtx_insn *insn_a, *insn_b; > > > > + bool a_simple = if_info->then_simple; > > > > + bool b_simple = if_info->else_simple; > > > > + basic_block then_bb = if_info->then_bb; > > > > + basic_block else_bb = if_info->else_bb; > > > > + rtx target; > > > > + enum rtx_code code; > > > > + rtx cond = if_info->cond; > > > > + rtx_insn *ifcvt_seq; > > > > + > > > > + /* if (test) x = *a; else x = c - d; > > > > + => x = c - d; > > > > + if (test) > > > > + x = *a; > > > > + */ > > > > + > > > > + code = GET_CODE (cond); > > > > + insn_a = if_info->insn_a; > > > > + insn_b = if_info->insn_b; > > > > + > > > > + machine_mode x_mode = GET_MODE (x); > > > > + > > > > + if (!can_conditionally_move_p (x_mode)) > > > > + return false; > > > > + > > > > + /* Because we only handle one trap MEM + other non_trap, non mem > > + cases, > > > > + just move one trap MEM always in then_bb. 
*/ > > > > + if (noce_reversed_cond_code (if_info) != UNKNOWN) > > > > + { > > > > + bool reversep = false; > > > > + if (b_trap) > > > > + reversep = true; > > > > + > > > > + if (reversep) > > > > + { > > > > + if (if_info->rev_cond) > > > > + { > > > > + cond = if_info->rev_cond; > > > > + code = GET_CODE (cond); > > > > + } > > > > + else > > > > + code = reversed_comparison_code (cond, > > + if_info->jump); > > > > + std::swap (a, b); > > > > + std::swap (insn_a, insn_b); > > > > + std::swap (a_simple, b_simple); > > > > + std::swap (then_bb, else_bb); > > > > + } > > > > + } > > > > + > > > > + if (then_bb && else_bb > > > > + && (!bbs_ok_for_cmove_arith (then_bb, else_bb, > > + if_info->orig_x) > > > > + || !bbs_ok_for_cmove_arith (else_bb, then_bb, > > + if_info->orig_x))) > > > > + return false; > > > > + > > > > + start_sequence (); > > > > + > > > > + /* If one of the blocks is empty then the corresponding B or A > > + value > > > > + came from the test block. The non-empty complex block that we > > + will > > > > + emit might clobber the register used by B or A, so move it to a > > + pseudo > > > > + first. */ > > > > + > > > > + rtx tmp_b = NULL_RTX; > > > > + > > > > + /* Don't move trap mem to a pseudo. */ > > > > + if (!may_trap_or_fault_p (b) && (b_simple || !else_bb)) > > > > + tmp_b = gen_reg_rtx (x_mode); > > > > + > > > > + orig_b = b; > > > > + > > > > + rtx emit_a = NULL_RTX; > > > > + rtx emit_b = NULL_RTX; > > > > + rtx_insn *tmp_insn = NULL; > > > > + bool modified_in_a = false; > > > > + bool modified_in_b = false; > > > > + /* If either operand is complex, load it into a register first. > > > > + The best way to do this is to copy the original insn. In this > > > > + way we preserve any clobbers etc that the insn may have had. > > > > + This is of course not possible in the IS_MEM case. */ > > > > + > > > > + if (! general_operand (b, GET_MODE (b)) || tmp_b) > > > > + { > > > > + if (insn_b) > > > > + { > > > > + b = tmp_b ? 
tmp_b : gen_reg_rtx (GET_MODE (b)); > > > > + rtx_insn *copy_of_b = as_a <rtx_insn *> (copy_rtx > > + (insn_b)); > > > > + rtx set = single_set (copy_of_b); > > > > + > > > > + SET_DEST (set) = b; > > > > + emit_b = PATTERN (copy_of_b); > > > > + } > > > > + else > > > > + { > > > > + rtx tmp_reg = tmp_b ? tmp_b : gen_reg_rtx > > + (GET_MODE (b)); > > > > + emit_b = gen_rtx_SET (tmp_reg, b); > > > > + b = tmp_reg; > > > > + } > > > > + } > > > > + > > > > + if (tmp_b && then_bb) > > > > + { > > > > + FOR_BB_INSNS (then_bb, tmp_insn) > > > > + /* Don't check inside insn_a. We will have changed it > > + to emit_a > > > > + with a destination that doesn't conflict. */ > > > > + if (!(insn_a && tmp_insn == insn_a) > > > > + && modified_in_p (orig_b, tmp_insn)) > > > > + { > > > > + modified_in_a = true; > > > > + break; > > > > + } > > > > + > > > > + } > > > > + > > > > + modified_in_b = emit_b != NULL_RTX && modified_in_p (a, emit_b); > > > > + /* If insn to set up A clobbers any registers B depends on, try to > > > > + swap insn that sets up A with the one that sets up B. If even > > > > + that doesn't help, punt. */ > > > > + if (modified_in_a && !modified_in_b) > > > > + { > > > > + if (!noce_emit_bb (emit_b, else_bb, b_simple)) > > > > + goto end_seq_and_fail; > > > > + > > > > + if (!noce_emit_bb (emit_a, then_bb, a_simple)) > > > > + goto end_seq_and_fail; > > > > + } > > > > + else if (!modified_in_a) > > > > + { > > > > + if (!noce_emit_bb (emit_b, else_bb, b_simple)) > > > > + goto end_seq_and_fail; > > > > + > > > > + if (!noce_emit_bb (emit_a, then_bb, a_simple)) > > > > + goto end_seq_and_fail; > > > > + } > > > > + else > > > > + goto end_seq_and_fail; > > > > + > > > > + target = noce_emit_cmove (if_info, x, code, XEXP (cond, 0), XEXP > > + (cond, 1), > > > > + a, b); > > > > + > > > > + if (! 
target) > > > > + goto end_seq_and_fail; > > > > + > > > > + if (target != x) > > > > + noce_emit_move_insn (x, target); > > > > + > > > > + ifcvt_seq = end_ifcvt_sequence (if_info); > > > > + if (!ifcvt_seq || !targetm.noce_conversion_profitable_p (ifcvt_seq, > > + if_info)) > > > > + return false; > > > > + > > > > + emit_insn_before_setloc (ifcvt_seq, if_info->jump, > > > > + INSN_LOCATION > > + (if_info->insn_a)); > > > > + if_info->transform_name = "noce_try_cmove_load_mem_notrap"; > > > > + return true; > > > > + > > > > + end_seq_and_fail: > > > > + end_sequence (); > > > > + return false; > > > > +} > > > > + > > > > +static bool > > > > +noce_try_cmove_store_mem_notrap (struct noce_if_info *if_info, rtx > > +*x_ptr, rtx orig_x) > > > > +{ > > > > + rtx a = if_info->a; > > > > + rtx b = if_info->b; > > > > + rtx x = orig_x; > > > > + machine_mode x_mode = GET_MODE (x); > > > > + > > > > + if (!MEM_P (x) || !rtx_equal_p (x, b)) > > > > + return false; > > > > + if (!may_trap_or_fault_p (x) || > > + !targetm.have_conditional_move_mem_notrap (x)) > > > > + return false; > > > > + if (!if_info->then_simple || !register_operand (a, x_mode)) > > > > + return false; > > > > + > > > > + rtx cond = if_info->cond; > > > > + enum rtx_code code = GET_CODE (cond); > > > > + rtx_insn *ifcvt_seq; > > > > + > > > > + start_sequence (); > > > > + > > > > + rtx target = noce_emit_cmove (if_info, x, code, XEXP (cond, 0), > > + XEXP (cond, 1), > > > > + a, b); > > > > + > > > > + if (! 
target) > > > > + goto end_seq_and_fail; > > > > + > > > > + if (target != x) > > > > + noce_emit_move_insn (x, target); > > > > + > > > > + ifcvt_seq = end_ifcvt_sequence (if_info); > > > > + if (!ifcvt_seq || !targetm.noce_conversion_profitable_p (ifcvt_seq, > > + if_info)) > > > > + return false; > > > > + > > > > + emit_insn_before_setloc (ifcvt_seq, if_info->jump, > > > > + INSN_LOCATION > > + (if_info->insn_a)); > > > > + if_info->transform_name = "noce_try_cmove_load_mem_notrap"; > > > > + if_info->x = orig_x; > > > > + *x_ptr = orig_x; > > > > + return true; > > > > + > > > > + end_seq_and_fail: > > > > + end_sequence (); > > > > + return false; > > > > +} > > > > + > > > > /* For most cases, the simplified condition we found is the best > > > > choice, but this is not the case for the min/max/abs transforms. > > > > For these we wish to know that it is A or B in the condition. */ > > > > @@ -4121,12 +4354,21 @@ noce_process_if_block (struct noce_if_info > > *if_info) > > > > } > > > > > > > > if (!set_b && MEM_P (orig_x)) > > > > + { > > > > + /* Conditional_move_suppress_fault for condition mem store > > + would not > > > > + move any arithmetic calculations. */ > > > > + if (targetm.have_conditional_move_mem_notrap (orig_x) > > > > + && HAVE_conditional_move > > > > + && noce_try_cmove_store_mem_notrap (if_info, &x, > > + orig_x)) > > > > + goto success; > > > > + else > > > > /* We want to avoid store speculation to avoid cases like > > > > if (pthread_mutex_trylock(mutex)) > > > > ++global_variable; > > > > Rather than go to much effort here, we rely on the SSA > > optimizers, > > > > which do a good enough job these days. 
*/ > > > > - return false; > > > > + return false; > > > > + } > > > > > > > > if (noce_try_move (if_info)) > > > > goto success; > > > > @@ -4160,6 +4402,9 @@ noce_process_if_block (struct noce_if_info > > *if_info) > > > > if (HAVE_conditional_move > > > > && noce_try_cmove_arith (if_info)) > > > > goto success; > > > > + if (HAVE_conditional_move > > > > + && noce_try_cmove_load_mem_notrap (if_info)) > > > > + goto success; > > > > if (noce_try_sign_mask (if_info)) > > > > goto success; > > > > } > > > > diff --git a/gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c > > b/gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c > > > > new file mode 100644 > > > > index 00000000000..4a1fb91b24c > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c > > > > @@ -0,0 +1,73 @@ > > > > +/* { dg-do compile { target { ! ia32 } } } */ > > > > +/* { dg-options "-O3 -mapxf" } */ > > > > + > > > > +/* { dg-final { scan-assembler-times "cfcmovne" 1 } } */ > > > > +/* { dg-final { scan-assembler-times "cfcmovg" 2} } */ > > > > +/* { dg-final { scan-assembler-times "cfcmove" 1 } } */ > > > > +/* { dg-final { scan-assembler-times "cfcmovl" 2 } } */ > > > > +/* { dg-final { scan-assembler-times "cfcmovle" 1 } } */ > > > > + > > > > +__attribute__((noinline, noclone, target("apxf"))) > > > > +int cfc_store (int a, int b, int c, int d, int *arr) > > > > +{ > > > > + if (a != b) > > > > + *arr = c; > > > > + return d; > > > > + > > > > +} > > > > + > > > > +__attribute__((noinline, noclone, target("apxf"))) > > > > +int cfc_load_ndd (int a, int b, int c, int *p) > > > > +{ > > > > + if (a > b) > > > > + return *p; > > > > + return c; > > > > +} > > > > + > > > > +__attribute__((noinline, noclone, target("apxf"))) > > > > +int cfc_load_2_trap (int a, int b, int *c, int *p) > > > > +{ > > > > + if (a > b) > > > > + return *p; > > > > + return *c; > > > > +} > > > > + > > > > +__attribute__((noinline, noclone, target("apxf"))) > > > > +int cfc_load_zero (int a, int b, int c) 
> > > > +{ > > > > + int sum = 0; > > > > + if (a == b) > > > > + return c; > > > > + return sum; > > > > +} > > > > + > > > > +__attribute__((noinline, noclone, target("apxf"))) > > > > +int cfc_load_mem (int a, int b, int *p) > > > > +{ > > > > + int sum = 0; > > > > + if (a < b ) > > > > + sum = *p; > > > > + return sum; > > > > +} > > > > + > > > > +__attribute__((noinline, noclone, target("apxf"))) > > > > +int cfc_load_arith_1 (int a, int b, int c, int *p) > > > > +{ > > > > + int sum = 0; > > > > + if (a > b) > > > > + sum = *p; > > > > + else > > > > + sum = a + c; > > > > + return sum + 1; > > > > +} > > > > + > > > > +__attribute__((noinline, noclone, target("apxf"))) > > > > +int cfc_load_arith_2 (int a, int b, int c, int *p) > > > > +{ > > > > + int sum = 0; > > > > + if (a > b) > > > > + sum = a + c; > > > > + else > > > > + sum = *p; > > > > + return sum + 1; > > > > +} > > > > diff --git a/gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c > > b/gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c > > > > new file mode 100644 > > > > index 00000000000..2b1660f64fa > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c > > > > @@ -0,0 +1,40 @@ > > > > +/* { dg-do run { target { ! 
ia32 } } } */ > > > > +/* { dg-require-effective-target apxf } */ > > > > +/* { dg-options "-mapxf -march=x86-64 -O3" } */ > > > > + > > > > +#include "apx-cfcmov-1.c" > > > > + > > > > +extern void abort (void); > > > > + > > > > +int main () > > > > +{ > > > > + if (!__builtin_cpu_supports ("apxf")) > > > > + return 0; > > > > + > > > > + int arr = 6; > > > > + int arr1 = 5; > > > > + int res = cfc_store (1, 2, 3, 4, &arr); > > > > + if (arr != 3 && res != 4) > > > > + abort (); > > > > + res = cfc_load_ndd (2, 1, 2, &arr); > > > > + if (res != 3) > > > > + abort (); > > > > + res = cfc_load_2_trap (1, 2, &arr1, &arr); > > > > + if (res != 5) > > > > + abort (); > > > > + res = cfc_load_zero (1, 2, 3); > > > > + res = cfc_load_zero (1, 2, 3); > > > > + if (res != 0) > > > > + abort (); > > > > + res = cfc_load_mem (2, 1, &arr); > > > > + if (res != 0) > > > > + abort (); > > > > + res = cfc_load_arith_1 (1, 2, 3, &arr); > > > > + if (res != 5) > > > > + abort(); > > > > + res = cfc_load_arith_2 (2, 1, 3,&arr); > > > > + if (res != 6) > > > > + abort(); > > > > + return 0; > > > > +} > > > > + > > > > -- > > > > 2.31.1 > > > > ^ permalink raw reply [flat|nested] 4+ messages in thread
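For readers following the quoted patch, here is a minimal sketch in plain C (not part of the posted patch) of the two source shapes the new ifcvt paths are aimed at: a load that is only performed when the condition holds, and a store that is only performed when the condition holds. The names cond_load, cond_store and check_cond_moves are hypothetical; the first two are modeled on cfc_load_mem and cfc_store from the quoted test case. The point of the sketch is the constraint any if-converted form has to preserve: when the condition is false the pointer is never dereferenced, so even a null pointer must not fault.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper modeled on cfc_load_mem in the quoted test:
   *P is read only when A < B, otherwise the result is 0.  */
int
cond_load (int a, int b, const int *p)
{
  int sum = 0;
  if (a < b)
    sum = *p;
  return sum;
}

/* Hypothetical helper modeled on cfc_store in the quoted test:
   *ARR is written only when A != B; D is returned either way.  */
int
cond_store (int a, int b, int c, int d, int *arr)
{
  if (a != b)
    *arr = c;
  return d;
}

/* Exercise both the taken and the not-taken side.  The not-taken
   calls pass NULL on purpose: the memory access must not happen.  */
void
check_cond_moves (void)
{
  int v = 6;

  assert (cond_load (1, 2, &v) == 6);    /* condition true: load performed */
  assert (cond_load (2, 1, NULL) == 0);  /* condition false: no load */

  assert (cond_store (1, 2, 3, 4, &v) == 4);
  assert (v == 3);                       /* condition true: store performed */
  assert (cond_store (1, 1, 9, 4, NULL) == 4); /* condition false: no store */
}
```

Calling check_cond_moves () from a test driver checks both the return value and the stored value separately, so a failure in either one is reported.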
end of thread, other threads:[~2024-06-14 6:19 UTC]

Thread overview: 4+ messages
[not found] <20240614005632.4088419-1-lingling.kong@intel.com>
2024-06-14 1:38 ` [PATCH 1/2] Add a new target hook: TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP Kong, Lingling
[not found] ` <20240614005632.4088419-2-lingling.kong@intel.com>
2024-06-14 1:38 ` [PATCH 2/2] [APX CFCMOV] Support APX CFCMOV Kong, Lingling
2024-06-14 6:12 ` Richard Biener
2024-06-14 6:19 ` Liu, Hongtao