* [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693] @ 2023-11-17 14:01 Jakub Jelinek 2023-11-18 10:27 ` [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization for direct optab [PR90693] Jakub Jelinek ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: Jakub Jelinek @ 2023-11-17 14:01 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches Hi! Per the earlier discussions on this PR, the following patch folds popcount (x) == 1 (and != 1) into (x ^ (x - 1)) > x - 1 (or <=) if the corresponding popcount optab isn't implemented (I think any double-word popcount or call will be necessarily slower than the above cheap 3 op check and even for -Os larger or same size). I've noticed e.g. C++ aligned new starts with std::has_single_bit which does popcount (x) == 1. As a follow-up, I'm considering changing in this routine the popcount call to IFN_POPCOUNT with 2 arguments and during expansion test costs. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2023-11-17 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/90693 * tree-ssa-math-opts.cc (match_single_bit_test): New function. (math_opts_dom_walker::after_dom_children): Call it for EQ_EXPR and NE_EXPR assignments and GIMPLE_CONDs. * gcc.target/i386/pr90693.c: New test. --- gcc/tree-ssa-math-opts.cc.jj 2023-11-13 15:51:02.920562303 +0100 +++ gcc/tree-ssa-math-opts.cc 2023-11-17 11:37:02.255905475 +0100 @@ -5154,6 +5154,72 @@ match_uaddc_usubc (gimple_stmt_iterator return true; } +/* Replace .POPCOUNT (x) == 1 or .POPCOUNT (x) != 1 with + (x & (x - 1)) > x - 1 or (x & (x - 1)) <= x - 1 if .POPCOUNT + isn't a direct optab. */ + +static void +match_single_bit_test (gimple_stmt_iterator *gsi, gimple *stmt) +{ + tree clhs, crhs; + enum tree_code code; + if (gimple_code (stmt) == GIMPLE_COND) + { + clhs = gimple_cond_lhs (stmt); + crhs = gimple_cond_rhs (stmt); + code = gimple_cond_code (stmt); + } + else + { + clhs = gimple_assign_rhs1 (stmt); + crhs = gimple_assign_rhs2 (stmt); + code = gimple_assign_rhs_code (stmt); + } + if (code != EQ_EXPR && code != NE_EXPR) + return; + if (TREE_CODE (clhs) != SSA_NAME || !integer_onep (crhs)) + return; + gimple *call = SSA_NAME_DEF_STMT (clhs); + combined_fn cfn = gimple_call_combined_fn (call); + switch (cfn) + { + CASE_CFN_POPCOUNT: + break; + default: + return; + } + if (!has_single_use (clhs)) + return; + tree arg = gimple_call_arg (call, 0); + tree type = TREE_TYPE (arg); + if (!INTEGRAL_TYPE_P (type)) + return; + if (direct_internal_fn_supported_p (IFN_POPCOUNT, type, OPTIMIZE_FOR_BOTH)) + return; + tree argm1 = make_ssa_name (type); + gimple *g = gimple_build_assign (argm1, PLUS_EXPR, arg, + build_int_cst (type, -1)); + gsi_insert_before (gsi, g, GSI_SAME_STMT); + g = gimple_build_assign (make_ssa_name (type), BIT_XOR_EXPR, arg, argm1); + gsi_insert_before (gsi, g, GSI_SAME_STMT); + if (gcond *cond = dyn_cast <gcond *> (stmt)) + { + gimple_cond_set_lhs (cond, gimple_assign_lhs (g)); + gimple_cond_set_rhs (cond, argm1); + gimple_cond_set_code (cond, code == EQ_EXPR ? GT_EXPR : LE_EXPR); + } + else + { + gimple_assign_set_rhs1 (stmt, gimple_assign_lhs (g)); + gimple_assign_set_rhs2 (stmt, argm1); + gimple_assign_set_rhs_code (stmt, code == EQ_EXPR ? GT_EXPR : LE_EXPR); + } + update_stmt (stmt); + gimple_stmt_iterator gsi2 = gsi_for_stmt (call); + gsi_remove (&gsi2, true); + release_defs (call); +} + /* Return true if target has support for divmod. */ static bool @@ -5807,6 +5873,11 @@ math_opts_dom_walker::after_dom_children match_uaddc_usubc (&gsi, stmt, code); break; + case EQ_EXPR: + case NE_EXPR: + match_single_bit_test (&gsi, stmt); + break; + default:; } } @@ -5872,7 +5943,10 @@ math_opts_dom_walker::after_dom_children } } else if (gimple_code (stmt) == GIMPLE_COND) - optimize_spaceship (as_a <gcond *> (stmt)); + { + match_single_bit_test (&gsi, stmt); + optimize_spaceship (as_a <gcond *> (stmt)); + } gsi_next (&gsi); } if (fma_state.m_deferring_p --- gcc/testsuite/gcc.target/i386/pr90693.c.jj 2023-11-16 19:31:43.383587048 +0100 +++ gcc/testsuite/gcc.target/i386/pr90693.c 2023-11-16 19:33:23.676177086 +0100 @@ -0,0 +1,29 @@ +/* PR tree-optimization/90693 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-abm -mno-popcnt -fdump-tree-optimized" } */ +/* { dg-final { scan-tree-dump-not "POPCOUNT \\\(" "optimized" } } */ +/* { dg-final { scan-tree-dump-not "__builtin_popcount(ll)? \\\(" "optimized" } } */ + +int +foo (unsigned int x) +{ + return __builtin_popcount (x) == 1; +} + +int +bar (unsigned int x) +{ + return (x ^ (x - 1)) > x - 1; +} + +int +baz (unsigned long long x) +{ + return __builtin_popcountll (x) == 1; +} + +int +qux (unsigned long long x) +{ + return (x ^ (x - 1)) > x - 1; +} Jakub ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization for direct optab [PR90693] 2023-11-17 14:01 [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693] Jakub Jelinek @ 2023-11-18 10:27 ` Jakub Jelinek 2023-11-19 4:03 ` Jeff Law 2023-11-20 8:01 ` Richard Biener 2023-11-19 3:59 ` [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693] Jeff Law 2023-11-20 7:54 ` Richard Biener 2 siblings, 2 replies; 11+ messages in thread From: Jakub Jelinek @ 2023-11-18 10:27 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches Hi! On Fri, Nov 17, 2023 at 03:01:04PM +0100, Jakub Jelinek wrote: > As a follow-up, I'm considering changing in this routine the popcount > call to IFN_POPCOUNT with 2 arguments and during expansion test costs. Here is the follow-up which does the rtx costs testing. While having to tweak internal-fn.def so that POPCOUNT can have a custom expand_POPCOUNT, I have noticed we are inconsistent, some DEF_INTERNAL* macros (most of them) were undefined at the end of internal-fn.def (but in some cases uselessly undefined again after inclusion), while others were not (and sometimes undefined after the inclusion). I've changed it to always undefine at the end of internal-fn.def. Ok for trunk if it passes bootstrap/regtest? 2023-11-18 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/90693 * tree-ssa-math-opts.cc (match_single_bit_test): Mark POPCOUNT with result only used in equality comparison against 1 with direct optab support as .POPCOUNT call with 2 arguments. * internal-fn.h (expand_POPCOUNT): Declare. * internal-fn.def: Document missing DEF_INTERNAL* macros and make sure they are all undefined at the end. (DEF_INTERNAL_INT_EXT_FN): New macro. (POPCOUNT): Use it instead of DEF_INTERNAL_INT_FN. * internal-fn.cc (lookup_hilo_internal_fn, lookup_evenodd_internal_fn, widening_fn_p, get_len_internal_fn): Don't undef DEF_INTERNAL_*FN macros after inclusion of internal-fn.def. (DEF_INTERNAL_INT_EXT_FN): Define to nothing before inclusion to define expanders. (expand_POPCOUNT): New function. --- gcc/tree-ssa-math-opts.cc.jj 2023-11-18 09:38:03.460813910 +0100 +++ gcc/tree-ssa-math-opts.cc 2023-11-18 10:25:18.751207936 +0100 @@ -5195,7 +5195,16 @@ match_single_bit_test (gimple_stmt_itera if (!INTEGRAL_TYPE_P (type)) return; if (direct_internal_fn_supported_p (IFN_POPCOUNT, type, OPTIMIZE_FOR_BOTH)) - return; + { + /* Tell expand_POPCOUNT the popcount result is only used in equality + comparison with one, so that it can decide based on rtx costs. */ + gimple *g = gimple_build_call_internal (IFN_POPCOUNT, 2, arg, + integer_one_node); + gimple_call_set_lhs (g, gimple_call_lhs (call)); + gimple_stmt_iterator gsi2 = gsi_for_stmt (call); + gsi_replace (&gsi2, g, true); + return; + } tree argm1 = make_ssa_name (type); gimple *g = gimple_build_assign (argm1, PLUS_EXPR, arg, build_int_cst (type, -1)); --- gcc/internal-fn.h.jj 2023-11-02 12:15:12.223998565 +0100 +++ gcc/internal-fn.h 2023-11-18 10:14:25.834340505 +0100 @@ -262,6 +262,7 @@ extern void expand_MULBITINT (internal_f extern void expand_DIVMODBITINT (internal_fn, gcall *); extern void expand_FLOATTOBITINT (internal_fn, gcall *); extern void expand_BITINTTOFLOAT (internal_fn, gcall *); +extern void expand_POPCOUNT (internal_fn, gcall *); extern bool vectorized_internal_fn_supported_p (internal_fn, tree); --- gcc/internal-fn.def.jj 2023-11-17 15:51:02.642802521 +0100 +++ gcc/internal-fn.def 2023-11-18 10:12:10.329236626 +0100 @@ -33,9 +33,13 @@ along with GCC; see the file COPYING3. DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SIGNED_OPTAB, UNSIGNED_OPTAB, TYPE) DEF_INTERNAL_FLT_FN (NAME, FLAGS, OPTAB, TYPE) + DEF_INTERNAL_FLT_FLOATN_FN (NAME, FLAGS, OPTAB, TYPE) DEF_INTERNAL_INT_FN (NAME, FLAGS, OPTAB, TYPE) + DEF_INTERNAL_INT_EXT_FN (NAME, FLAGS, OPTAB, TYPE) DEF_INTERNAL_COND_FN (NAME, FLAGS, OPTAB, TYPE) DEF_INTERNAL_SIGNED_COND_FN (NAME, FLAGS, OPTAB, TYPE) + DEF_INTERNAL_WIDENING_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, + TYPE) where NAME is the name of the function, FLAGS is a set of ECF_* flags and FNSPEC is a string describing functions fnspec. @@ -97,6 +101,10 @@ along with GCC; see the file COPYING3. says that the function extends the C-level BUILT_IN_<NAME>{,L,LL,IMAX} group of functions to any integral mode (including vector modes). + DEF_INTERNAL_INT_EXT_FN is like DEF_INTERNAL_INT_FN, except that it + has expand_##NAME defined in internal-fn.cc to override the + DEF_INTERNAL_INT_FN expansion behavior. + DEF_INTERNAL_WIDENING_OPTAB_FN is a wrapper that defines five internal functions with DEF_INTERNAL_SIGNED_OPTAB_FN: - one that describes a widening operation with the same number of elements @@ -160,6 +168,11 @@ along with GCC; see the file COPYING3. DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE) #endif +#ifndef DEF_INTERNAL_INT_EXT_FN +#define DEF_INTERNAL_INT_EXT_FN(NAME, FLAGS, OPTAB, TYPE) \ + DEF_INTERNAL_INT_FN (NAME, FLAGS, OPTAB, TYPE) +#endif + #ifndef DEF_INTERNAL_WIDENING_OPTAB_FN #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE) \ DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE) \ @@ -432,7 +445,7 @@ DEF_INTERNAL_INT_FN (CLZ, ECF_CONST | EC DEF_INTERNAL_INT_FN (CTZ, ECF_CONST | ECF_NOTHROW, ctz, unary) DEF_INTERNAL_INT_FN (FFS, ECF_CONST | ECF_NOTHROW, ffs, unary) DEF_INTERNAL_INT_FN (PARITY, ECF_CONST | ECF_NOTHROW, parity, unary) -DEF_INTERNAL_INT_FN (POPCOUNT, ECF_CONST | ECF_NOTHROW, popcount, unary) +DEF_INTERNAL_INT_EXT_FN (POPCOUNT, ECF_CONST | ECF_NOTHROW, popcount, unary) DEF_INTERNAL_FN (GOMP_TARGET_REV, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) DEF_INTERNAL_FN (GOMP_USE_SIMT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) @@ -572,6 +585,10 @@ DEF_INTERNAL_FN (DIVMODBITINT, ECF_LEAF, DEF_INTERNAL_FN (FLOATTOBITINT, ECF_LEAF | ECF_NOTHROW, ". O . . ") DEF_INTERNAL_FN (BITINTTOFLOAT, ECF_PURE | ECF_LEAF, ". R . ") +#undef DEF_INTERNAL_WIDENING_OPTAB_FN +#undef DEF_INTERNAL_SIGNED_COND_FN +#undef DEF_INTERNAL_COND_FN +#undef DEF_INTERNAL_INT_EXT_FN #undef DEF_INTERNAL_INT_FN #undef DEF_INTERNAL_FLT_FN #undef DEF_INTERNAL_FLT_FLOATN_FN --- gcc/internal-fn.cc.jj 2023-11-07 08:32:01.741254155 +0100 +++ gcc/internal-fn.cc 2023-11-18 11:06:31.740703230 +0100 @@ -102,8 +102,6 @@ lookup_hilo_internal_fn (internal_fn ifn { default: gcc_unreachable (); -#undef DEF_INTERNAL_FN -#undef DEF_INTERNAL_WIDENING_OPTAB_FN #define DEF_INTERNAL_FN(NAME, FLAGS, TYPE) #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \ case IFN_##NAME: \ @@ -111,8 +109,6 @@ lookup_hilo_internal_fn (internal_fn ifn *hi = internal_fn (IFN_##NAME##_HI); \ break; #include "internal-fn.def" -#undef DEF_INTERNAL_FN -#undef DEF_INTERNAL_WIDENING_OPTAB_FN } } @@ -129,8 +125,6 @@ lookup_evenodd_internal_fn (internal_fn { default: gcc_unreachable (); -#undef DEF_INTERNAL_FN -#undef DEF_INTERNAL_WIDENING_OPTAB_FN #define DEF_INTERNAL_FN(NAME, FLAGS, TYPE) #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \ case IFN_##NAME: \ @@ -138,8 +132,6 @@ lookup_evenodd_internal_fn (internal_fn *odd = internal_fn (IFN_##NAME##_ODD); \ break; #include "internal-fn.def" -#undef DEF_INTERNAL_FN -#undef DEF_INTERNAL_WIDENING_OPTAB_FN } } @@ -4261,7 +4253,6 @@ widening_fn_p (code_helper code) internal_fn fn = as_internal_fn ((combined_fn) code); switch (fn) { - #undef DEF_INTERNAL_WIDENING_OPTAB_FN #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \ case IFN_##NAME: \ case IFN_##NAME##_HI: \ @@ -4270,7 +4261,6 @@ widening_fn_p (code_helper code) case IFN_##NAME##_ODD: \ return true; #include "internal-fn.def" - #undef DEF_INTERNAL_WIDENING_OPTAB_FN default: return false; @@ -4295,6 +4285,7 @@ set_edom_supported_p (void) { \ expand_##TYPE##_optab_fn (fn, stmt, OPTAB##_optab); \ } +#define DEF_INTERNAL_INT_EXT_FN(CODE, FLAGS, OPTAB, TYPE) #define DEF_INTERNAL_SIGNED_OPTAB_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \ UNSIGNED_OPTAB, TYPE) \ static void \ @@ -4305,8 +4296,6 @@ set_edom_supported_p (void) expand_##TYPE##_optab_fn (fn, stmt, which_optab); \ } #include "internal-fn.def" -#undef DEF_INTERNAL_OPTAB_FN -#undef DEF_INTERNAL_SIGNED_OPTAB_FN /* Routines to expand each internal function, indexed by function number. Each routine has the prototype: @@ -4465,8 +4454,6 @@ get_len_internal_fn (internal_fn fn) { switch (fn) { -#undef DEF_INTERNAL_COND_FN -#undef DEF_INTERNAL_SIGNED_COND_FN #define DEF_INTERNAL_COND_FN(NAME, ...) \ case IFN_COND_##NAME: \ return IFN_COND_LEN_##NAME; @@ -4474,8 +4461,6 @@ get_len_internal_fn (internal_fn fn) case IFN_COND_##NAME: \ return IFN_COND_LEN_##NAME; #include "internal-fn.def" -#undef DEF_INTERNAL_COND_FN -#undef DEF_INTERNAL_SIGNED_COND_FN default: return IFN_LAST; } @@ -5113,3 +5098,67 @@ expand_BITINTTOFLOAT (internal_fn, gcall if (val != target) emit_move_insn (target, val); } + +void +expand_POPCOUNT (internal_fn fn, gcall *stmt) +{ + if (gimple_call_num_args (stmt) == 1) + { + expand_unary_optab_fn (fn, stmt, popcount_optab); + return; + } + /* If .POPCOUNT call has 2 arguments, match_single_bit_test marked it + because the result is only used in an equality comparison against 1. + Use rtx costs in that case to determine if .POPCOUNT (arg) == 1 + or (arg ^ (arg - 1)) > arg - 1 is cheaper. */ + bool speed_p = optimize_insn_for_speed_p (); + tree lhs = gimple_call_lhs (stmt); + tree arg = gimple_call_arg (stmt, 0); + tree type = TREE_TYPE (arg); + machine_mode mode = TYPE_MODE (type); + do_pending_stack_adjust (); + start_sequence (); + expand_unary_optab_fn (fn, stmt, popcount_optab); + rtx_insn *popcount_insns = get_insns (); + end_sequence (); + start_sequence (); + rtx plhs = expand_normal (lhs); + rtx pcmp = emit_store_flag (NULL_RTX, EQ, plhs, const1_rtx, mode, 0, 0); + if (pcmp == NULL_RTX) + { + fail: + end_sequence (); + emit_insn (popcount_insns); + return; + } + rtx_insn *popcount_cmp_insns = get_insns (); + end_sequence (); + start_sequence (); + rtx op0 = expand_normal (arg); + rtx argm1 = expand_simple_binop (mode, PLUS, op0, constm1_rtx, NULL_RTX, + 1, OPTAB_DIRECT); + if (argm1 == NULL_RTX) + goto fail; + rtx argxorargm1 = expand_simple_binop (mode, XOR, op0, argm1, NULL_RTX, + 1, OPTAB_DIRECT); + if (argxorargm1 == NULL_RTX) + goto fail; + rtx cmp = emit_store_flag (NULL_RTX, GTU, argxorargm1, argm1, mode, 1, 1); + if (cmp == NULL_RTX) + goto fail; + rtx_insn *cmp_insns = get_insns (); + end_sequence (); + unsigned popcount_cost = (seq_cost (popcount_insns, speed_p) + + seq_cost (popcount_cmp_insns, speed_p)); + unsigned cmp_cost = seq_cost (cmp_insns, speed_p); + if (popcount_cost < cmp_cost) + emit_insn (popcount_insns); + else + { + emit_insn (cmp_insns); + plhs = expand_normal (lhs); + if (GET_MODE (cmp) != GET_MODE (plhs)) + cmp = convert_to_mode (GET_MODE (plhs), cmp, 1); + emit_move_insn (plhs, cmp); + } +} Jakub ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization for direct optab [PR90693] 2023-11-18 10:27 ` [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization for direct optab [PR90693] Jakub Jelinek @ 2023-11-19 4:03 ` Jeff Law 2023-11-20 8:01 ` Richard Biener 1 sibling, 0 replies; 11+ messages in thread From: Jeff Law @ 2023-11-19 4:03 UTC (permalink / raw) To: Jakub Jelinek, Richard Biener; +Cc: gcc-patches On 11/18/23 03:27, Jakub Jelinek wrote: > Hi! > > On Fri, Nov 17, 2023 at 03:01:04PM +0100, Jakub Jelinek wrote: >> As a follow-up, I'm considering changing in this routine the popcount >> call to IFN_POPCOUNT with 2 arguments and during expansion test costs. > > Here is the follow-up which does the rtx costs testing. > While having to tweak internal-fn.def so that POPCOUNT can have a custom > expand_POPCOUNT, I have noticed we are inconsistent, some DEF_INTERNAL* > macros (most of them) were undefined at the end of internal-fn.def (but in > some cases uselessly undefined again after inclusion), while others were not > (and sometimes undefined after the inclusion). I've changed it to always > undefine at the end of internal-fn.def. > > Ok for trunk if it passes bootstrap/regtest? > > 2023-11-18 Jakub Jelinek <jakub@redhat.com> > > PR tree-optimization/90693 > * tree-ssa-math-opts.cc (match_single_bit_test): Mark POPCOUNT with > result only used in equality comparison against 1 with direct optab > support as .POPCOUNT call with 2 arguments. > * internal-fn.h (expand_POPCOUNT): Declare. > * internal-fn.def: Document missing DEF_INTERNAL* macros and make sure > they are all undefined at the end. > (DEF_INTERNAL_INT_EXT_FN): New macro. > (POPCOUNT): Use it instead of DEF_INTERNAL_INT_FN. > * internal-fn.cc (lookup_hilo_internal_fn, lookup_evenodd_internal_fn, > widening_fn_p, get_len_internal_fn): Don't undef DEF_INTERNAL_*FN > macros after inclusion of internal-fn.def. > (DEF_INTERNAL_INT_EXT_FN): Define to nothing before inclusion to > define expanders. > (expand_POPCOUNT): New function. > > + unsigned cmp_cost = seq_cost (cmp_insns, speed_p); > + if (popcount_cost < cmp_cost) > + emit_insn (popcount_insns); > + else > + { > + emit_insn (cmp_insns); > + plhs = expand_normal (lhs); > + if (GET_MODE (cmp) != GET_MODE (plhs)) > + cmp = convert_to_mode (GET_MODE (plhs), cmp, 1); > + emit_move_insn (plhs, cmp); > + } Did you want <= for the test to use popcount? That seems like a better choice in that scenario to me as the popcount is likely smaller. OK for the trunk as-is or using a <= test. jeff ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization for direct optab [PR90693] 2023-11-18 10:27 ` [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization for direct optab [PR90693] Jakub Jelinek 2023-11-19 4:03 ` Jeff Law @ 2023-11-20 8:01 ` Richard Biener 2023-11-20 9:05 ` Jakub Jelinek 1 sibling, 1 reply; 11+ messages in thread From: Richard Biener @ 2023-11-20 8:01 UTC (permalink / raw) To: Jakub Jelinek; +Cc: gcc-patches On Sat, 18 Nov 2023, Jakub Jelinek wrote: > Hi! > > On Fri, Nov 17, 2023 at 03:01:04PM +0100, Jakub Jelinek wrote: > > As a follow-up, I'm considering changing in this routine the popcount > > call to IFN_POPCOUNT with 2 arguments and during expansion test costs. > > Here is the follow-up which does the rtx costs testing. > While having to tweak internal-fn.def so that POPCOUNT can have a custom > expand_POPCOUNT, I have noticed we are inconsistent, some DEF_INTERNAL* > macros (most of them) were undefined at the end of internal-fn.def (but in > some cases uselessly undefined again after inclusion), while others were not > (and sometimes undefined after the inclusion). I've changed it to always > undefine at the end of internal-fn.def. The last bit is cleanup that could go in independently. > Ok for trunk if it passes bootstrap/regtest? > > 2023-11-18 Jakub Jelinek <jakub@redhat.com> > > PR tree-optimization/90693 > * tree-ssa-math-opts.cc (match_single_bit_test): Mark POPCOUNT with > result only used in equality comparison against 1 with direct optab > support as .POPCOUNT call with 2 arguments. > * internal-fn.h (expand_POPCOUNT): Declare. > * internal-fn.def: Document missing DEF_INTERNAL* macros and make sure > they are all undefined at the end. > (DEF_INTERNAL_INT_EXT_FN): New macro. > (POPCOUNT): Use it instead of DEF_INTERNAL_INT_FN. > * internal-fn.cc (lookup_hilo_internal_fn, lookup_evenodd_internal_fn, > widening_fn_p, get_len_internal_fn): Don't undef DEF_INTERNAL_*FN > macros after inclusion of internal-fn.def. > (DEF_INTERNAL_INT_EXT_FN): Define to nothing before inclusion to > define expanders. > (expand_POPCOUNT): New function. > > --- gcc/tree-ssa-math-opts.cc.jj 2023-11-18 09:38:03.460813910 +0100 > +++ gcc/tree-ssa-math-opts.cc 2023-11-18 10:25:18.751207936 +0100 > @@ -5195,7 +5195,16 @@ match_single_bit_test (gimple_stmt_itera > if (!INTEGRAL_TYPE_P (type)) > return; > if (direct_internal_fn_supported_p (IFN_POPCOUNT, type, OPTIMIZE_FOR_BOTH)) > - return; > + { > + /* Tell expand_POPCOUNT the popcount result is only used in equality > + comparison with one, so that it can decide based on rtx costs. */ > + gimple *g = gimple_build_call_internal (IFN_POPCOUNT, 2, arg, > + integer_one_node); > + gimple_call_set_lhs (g, gimple_call_lhs (call)); > + gimple_stmt_iterator gsi2 = gsi_for_stmt (call); > + gsi_replace (&gsi2, g, true); > + return; > + } > tree argm1 = make_ssa_name (type); > gimple *g = gimple_build_assign (argm1, PLUS_EXPR, arg, > build_int_cst (type, -1)); > --- gcc/internal-fn.h.jj 2023-11-02 12:15:12.223998565 +0100 > +++ gcc/internal-fn.h 2023-11-18 10:14:25.834340505 +0100 > @@ -262,6 +262,7 @@ extern void expand_MULBITINT (internal_f > extern void expand_DIVMODBITINT (internal_fn, gcall *); > extern void expand_FLOATTOBITINT (internal_fn, gcall *); > extern void expand_BITINTTOFLOAT (internal_fn, gcall *); > +extern void expand_POPCOUNT (internal_fn, gcall *); > > extern bool vectorized_internal_fn_supported_p (internal_fn, tree); > > --- gcc/internal-fn.def.jj 2023-11-17 15:51:02.642802521 +0100 > +++ gcc/internal-fn.def 2023-11-18 10:12:10.329236626 +0100 > @@ -33,9 +33,13 @@ along with GCC; see the file COPYING3. > DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SIGNED_OPTAB, > UNSIGNED_OPTAB, TYPE) > DEF_INTERNAL_FLT_FN (NAME, FLAGS, OPTAB, TYPE) > + DEF_INTERNAL_FLT_FLOATN_FN (NAME, FLAGS, OPTAB, TYPE) > DEF_INTERNAL_INT_FN (NAME, FLAGS, OPTAB, TYPE) > + DEF_INTERNAL_INT_EXT_FN (NAME, FLAGS, OPTAB, TYPE) > DEF_INTERNAL_COND_FN (NAME, FLAGS, OPTAB, TYPE) > DEF_INTERNAL_SIGNED_COND_FN (NAME, FLAGS, OPTAB, TYPE) > + DEF_INTERNAL_WIDENING_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, > + TYPE) > > where NAME is the name of the function, FLAGS is a set of > ECF_* flags and FNSPEC is a string describing functions fnspec. > @@ -97,6 +101,10 @@ along with GCC; see the file COPYING3. > says that the function extends the C-level BUILT_IN_<NAME>{,L,LL,IMAX} > group of functions to any integral mode (including vector modes). > > + DEF_INTERNAL_INT_EXT_FN is like DEF_INTERNAL_INT_FN, except that it > + has expand_##NAME defined in internal-fn.cc to override the > + DEF_INTERNAL_INT_FN expansion behavior. > + > DEF_INTERNAL_WIDENING_OPTAB_FN is a wrapper that defines five internal > functions with DEF_INTERNAL_SIGNED_OPTAB_FN: > - one that describes a widening operation with the same number of elements > @@ -160,6 +168,11 @@ along with GCC; see the file COPYING3. > DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE) > #endif > > +#ifndef DEF_INTERNAL_INT_EXT_FN > +#define DEF_INTERNAL_INT_EXT_FN(NAME, FLAGS, OPTAB, TYPE) \ > + DEF_INTERNAL_INT_FN (NAME, FLAGS, OPTAB, TYPE) > +#endif > + > #ifndef DEF_INTERNAL_WIDENING_OPTAB_FN > #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE) \ > DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE) \ > @@ -432,7 +445,7 @@ DEF_INTERNAL_INT_FN (CLZ, ECF_CONST | EC > DEF_INTERNAL_INT_FN (CTZ, ECF_CONST | ECF_NOTHROW, ctz, unary) > DEF_INTERNAL_INT_FN (FFS, ECF_CONST | ECF_NOTHROW, ffs, unary) > DEF_INTERNAL_INT_FN (PARITY, ECF_CONST | ECF_NOTHROW, parity, unary) > -DEF_INTERNAL_INT_FN (POPCOUNT, ECF_CONST | ECF_NOTHROW, popcount, unary) > +DEF_INTERNAL_INT_EXT_FN (POPCOUNT, ECF_CONST | ECF_NOTHROW, popcount, unary) > > DEF_INTERNAL_FN (GOMP_TARGET_REV, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) > DEF_INTERNAL_FN (GOMP_USE_SIMT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) > @@ -572,6 +585,10 @@ DEF_INTERNAL_FN (DIVMODBITINT, ECF_LEAF, > DEF_INTERNAL_FN (FLOATTOBITINT, ECF_LEAF | ECF_NOTHROW, ". O . . ") > DEF_INTERNAL_FN (BITINTTOFLOAT, ECF_PURE | ECF_LEAF, ". R . ") > > +#undef DEF_INTERNAL_WIDENING_OPTAB_FN > +#undef DEF_INTERNAL_SIGNED_COND_FN > +#undef DEF_INTERNAL_COND_FN > +#undef DEF_INTERNAL_INT_EXT_FN > #undef DEF_INTERNAL_INT_FN > #undef DEF_INTERNAL_FLT_FN > #undef DEF_INTERNAL_FLT_FLOATN_FN > --- gcc/internal-fn.cc.jj 2023-11-07 08:32:01.741254155 +0100 > +++ gcc/internal-fn.cc 2023-11-18 11:06:31.740703230 +0100 > @@ -102,8 +102,6 @@ lookup_hilo_internal_fn (internal_fn ifn > { > default: > gcc_unreachable (); > -#undef DEF_INTERNAL_FN > -#undef DEF_INTERNAL_WIDENING_OPTAB_FN > #define DEF_INTERNAL_FN(NAME, FLAGS, TYPE) > #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \ > case IFN_##NAME: \ > @@ -111,8 +109,6 @@ lookup_hilo_internal_fn (internal_fn ifn > *hi = internal_fn (IFN_##NAME##_HI); \ > break; > #include "internal-fn.def" > -#undef DEF_INTERNAL_FN > -#undef DEF_INTERNAL_WIDENING_OPTAB_FN > } > } > > @@ -129,8 +125,6 @@ lookup_evenodd_internal_fn (internal_fn > { > default: > gcc_unreachable (); > -#undef DEF_INTERNAL_FN > -#undef DEF_INTERNAL_WIDENING_OPTAB_FN > #define DEF_INTERNAL_FN(NAME, FLAGS, TYPE) > #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \ > case IFN_##NAME: \ > @@ -138,8 +132,6 @@ lookup_evenodd_internal_fn (internal_fn > *odd = internal_fn (IFN_##NAME##_ODD); \ > break; > #include "internal-fn.def" > -#undef DEF_INTERNAL_FN > -#undef DEF_INTERNAL_WIDENING_OPTAB_FN > } > } > > @@ -4261,7 +4253,6 @@ widening_fn_p (code_helper code) > internal_fn fn = as_internal_fn ((combined_fn) code); > switch (fn) > { > - #undef DEF_INTERNAL_WIDENING_OPTAB_FN > #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \ > case IFN_##NAME: \ > case IFN_##NAME##_HI: \ > @@ -4270,7 +4261,6 @@ widening_fn_p (code_helper code) > case IFN_##NAME##_ODD: \ > return true; > #include "internal-fn.def" > - #undef DEF_INTERNAL_WIDENING_OPTAB_FN > > default: > return false; > @@ -4295,6 +4285,7 @@ set_edom_supported_p (void) > { \ > expand_##TYPE##_optab_fn (fn, stmt, OPTAB##_optab); \ > } > +#define DEF_INTERNAL_INT_EXT_FN(CODE, FLAGS, OPTAB, TYPE) > #define DEF_INTERNAL_SIGNED_OPTAB_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \ > UNSIGNED_OPTAB, TYPE) \ > static void \ > @@ -4305,8 +4296,6 @@ set_edom_supported_p (void) > expand_##TYPE##_optab_fn (fn, stmt, which_optab); \ > } > #include "internal-fn.def" > -#undef DEF_INTERNAL_OPTAB_FN > -#undef DEF_INTERNAL_SIGNED_OPTAB_FN > > /* Routines to expand each internal function, indexed by function number. > Each routine has the prototype: > @@ -4465,8 +4454,6 @@ get_len_internal_fn (internal_fn fn) > { > switch (fn) > { > -#undef DEF_INTERNAL_COND_FN > -#undef DEF_INTERNAL_SIGNED_COND_FN > #define DEF_INTERNAL_COND_FN(NAME, ...) \ > case IFN_COND_##NAME: \ > return IFN_COND_LEN_##NAME; > @@ -4474,8 +4461,6 @@ get_len_internal_fn (internal_fn fn) > case IFN_COND_##NAME: \ > return IFN_COND_LEN_##NAME; > #include "internal-fn.def" > -#undef DEF_INTERNAL_COND_FN > -#undef DEF_INTERNAL_SIGNED_COND_FN > default: > return IFN_LAST; > } > @@ -5113,3 +5098,67 @@ expand_BITINTTOFLOAT (internal_fn, gcall > if (val != target) > emit_move_insn (target, val); > } > + > +void > +expand_POPCOUNT (internal_fn fn, gcall *stmt) > +{ > + if (gimple_call_num_args (stmt) == 1) > + { > + expand_unary_optab_fn (fn, stmt, popcount_optab); > + return; > + } > + /* If .POPCOUNT call has 2 arguments, match_single_bit_test marked it > + because the result is only used in an equality comparison against 1. > + Use rtx costs in that case to determine if .POPCOUNT (arg) == 1 > + or (arg ^ (arg - 1)) > arg - 1 is cheaper. */ We're doing plenty of RTX costing during GIMPLE optimizations, I wonder if it could be done equally well there given that seq_cost of an RTX sequence isn't going to capture the compare & branch cost very reliably anyway ... > + bool speed_p = optimize_insn_for_speed_p (); > + tree lhs = gimple_call_lhs (stmt); > + tree arg = gimple_call_arg (stmt, 0); > + tree type = TREE_TYPE (arg); > + machine_mode mode = TYPE_MODE (type); > + do_pending_stack_adjust (); > + start_sequence (); > + expand_unary_optab_fn (fn, stmt, popcount_optab); > + rtx_insn *popcount_insns = get_insns (); > + end_sequence (); > + start_sequence (); > + rtx plhs = expand_normal (lhs); > + rtx pcmp = emit_store_flag (NULL_RTX, EQ, plhs, const1_rtx, mode, 0, 0); > + if (pcmp == NULL_RTX) > + { > + fail: > + end_sequence (); > + emit_insn (popcount_insns); > + return; > + } > + rtx_insn *popcount_cmp_insns = get_insns (); > + end_sequence (); > + start_sequence (); > + rtx op0 = expand_normal (arg); > + rtx argm1 = expand_simple_binop (mode, PLUS, op0, constm1_rtx, NULL_RTX, > + 1, OPTAB_DIRECT); > + if (argm1 == NULL_RTX) > + goto fail; > + rtx argxorargm1 = expand_simple_binop (mode, XOR, op0, argm1, NULL_RTX, > + 1, OPTAB_DIRECT); > + if (argxorargm1 == NULL_RTX) > + goto fail; > + rtx cmp = emit_store_flag (NULL_RTX, GTU, argxorargm1, argm1, mode, 1, 1); > + if (cmp == NULL_RTX) > + goto fail; > + rtx_insn *cmp_insns = get_insns (); > + end_sequence (); > + unsigned popcount_cost = (seq_cost (popcount_insns, speed_p) > + + seq_cost (popcount_cmp_insns, speed_p)); > + unsigned cmp_cost = seq_cost (cmp_insns, speed_p); > + if (popcount_cost < cmp_cost) > + emit_insn (popcount_insns); > + else > + { > + emit_insn (cmp_insns); > + plhs = expand_normal (lhs); > + if (GET_MODE (cmp) != GET_MODE (plhs)) > + cmp = convert_to_mode (GET_MODE (plhs), cmp, 1); > + emit_move_insn (plhs, cmp); > + } > +} > > > Jakub > > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization for direct optab [PR90693] 2023-11-20 8:01 ` Richard Biener @ 2023-11-20 9:05 ` Jakub Jelinek 0 siblings, 0 replies; 11+ messages in thread From: Jakub Jelinek @ 2023-11-20 9:05 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches On Mon, Nov 20, 2023 at 08:01:45AM +0000, Richard Biener wrote: > > On Fri, Nov 17, 2023 at 03:01:04PM +0100, Jakub Jelinek wrote: > > > As a follow-up, I'm considering changing in this routine the popcount > > > call to IFN_POPCOUNT with 2 arguments and during expansion test costs. > > > > Here is the follow-up which does the rtx costs testing. > > While having to tweak internal-fn.def so that POPCOUNT can have a custom > > expand_POPCOUNT, I have noticed we are inconsistent, some DEF_INTERNAL* > > macros (most of them) were undefined at the end of internal-fn.def (but in > > some cases uselessly undefined again after inclusion), while others were not > > (and sometimes undefined after the inclusion). I've changed it to always > > undefine at the end of internal-fn.def. > > The last bit is cleanup that could go in independently. Ok, I've committed the following patch first and the 2 patches adjusted on top of that. The 2 original ones were bootstrapped/regtested successfully on x86_64-linux and i686-linux last night. 2023-11-20 Jakub Jelinek <jakub@redhat.com> * internal-fn.def: Document missing DEF_INTERNAL* macros and make sure they are all undefined at the end. * internal-fn.cc (lookup_hilo_internal_fn, lookup_evenodd_internal_fn, widening_fn_p, get_len_internal_fn): Don't undef DEF_INTERNAL_*FN macros after inclusion of internal-fn.def. --- gcc/internal-fn.def.jj 2023-11-17 15:51:02.642802521 +0100 +++ gcc/internal-fn.def 2023-11-18 10:12:10.329236626 +0100 @@ -33,9 +33,12 @@ along with GCC; see the file COPYING3. DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SIGNED_OPTAB, UNSIGNED_OPTAB, TYPE) DEF_INTERNAL_FLT_FN (NAME, FLAGS, OPTAB, TYPE) + DEF_INTERNAL_FLT_FLOATN_FN (NAME, FLAGS, OPTAB, TYPE) DEF_INTERNAL_INT_FN (NAME, FLAGS, OPTAB, TYPE) DEF_INTERNAL_COND_FN (NAME, FLAGS, OPTAB, TYPE) DEF_INTERNAL_SIGNED_COND_FN (NAME, FLAGS, OPTAB, TYPE) + DEF_INTERNAL_WIDENING_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, + TYPE) where NAME is the name of the function, FLAGS is a set of ECF_* flags and FNSPEC is a string describing functions fnspec. @@ -572,6 +585,9 @@ DEF_INTERNAL_FN (DIVMODBITINT, ECF_LEAF, DEF_INTERNAL_FN (FLOATTOBITINT, ECF_LEAF | ECF_NOTHROW, ". O . . ") DEF_INTERNAL_FN (BITINTTOFLOAT, ECF_PURE | ECF_LEAF, ". R . ") +#undef DEF_INTERNAL_WIDENING_OPTAB_FN +#undef DEF_INTERNAL_SIGNED_COND_FN +#undef DEF_INTERNAL_COND_FN #undef DEF_INTERNAL_INT_FN #undef DEF_INTERNAL_FLT_FN #undef DEF_INTERNAL_FLT_FLOATN_FN --- gcc/internal-fn.cc.jj 2023-11-07 08:32:01.741254155 +0100 +++ gcc/internal-fn.cc 2023-11-18 11:06:31.740703230 +0100 @@ -102,8 +102,6 @@ lookup_hilo_internal_fn (internal_fn ifn { default: gcc_unreachable (); -#undef DEF_INTERNAL_FN -#undef DEF_INTERNAL_WIDENING_OPTAB_FN #define DEF_INTERNAL_FN(NAME, FLAGS, TYPE) #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \ case IFN_##NAME: \ @@ -111,8 +109,6 @@ lookup_hilo_internal_fn (internal_fn ifn *hi = internal_fn (IFN_##NAME##_HI); \ break; #include "internal-fn.def" -#undef DEF_INTERNAL_FN -#undef DEF_INTERNAL_WIDENING_OPTAB_FN } } @@ -129,8 +125,6 @@ lookup_evenodd_internal_fn (internal_fn { default: gcc_unreachable (); -#undef DEF_INTERNAL_FN -#undef DEF_INTERNAL_WIDENING_OPTAB_FN #define DEF_INTERNAL_FN(NAME, FLAGS, TYPE) #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \ case IFN_##NAME: \ @@ -138,8 +132,6 @@ lookup_evenodd_internal_fn (internal_fn *odd = internal_fn (IFN_##NAME##_ODD); \ break; #include "internal-fn.def" -#undef DEF_INTERNAL_FN -#undef DEF_INTERNAL_WIDENING_OPTAB_FN } } @@ -4261,7 +4253,6 @@ widening_fn_p (code_helper code) internal_fn fn = as_internal_fn ((combined_fn) code); switch (fn) { - #undef DEF_INTERNAL_WIDENING_OPTAB_FN #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \ case IFN_##NAME: \ case IFN_##NAME##_HI: \ @@ -4270,7 +4261,6 @@ widening_fn_p (code_helper code) case IFN_##NAME##_ODD: \ return true; #include "internal-fn.def" - #undef DEF_INTERNAL_WIDENING_OPTAB_FN default: return false; @@ -4305,8 +4296,6 @@ set_edom_supported_p (void) expand_##TYPE##_optab_fn (fn, stmt, which_optab); \ } #include "internal-fn.def" -#undef DEF_INTERNAL_OPTAB_FN -#undef DEF_INTERNAL_SIGNED_OPTAB_FN /* Routines to expand each internal function, indexed by function number. Each routine has the prototype: @@ -4465,8 +4454,6 @@ get_len_internal_fn (internal_fn fn) { switch (fn) { -#undef DEF_INTERNAL_COND_FN -#undef DEF_INTERNAL_SIGNED_COND_FN #define DEF_INTERNAL_COND_FN(NAME, ...) \ case IFN_COND_##NAME: \ return IFN_COND_LEN_##NAME; @@ -4474,8 +4461,6 @@ get_len_internal_fn (internal_fn fn) case IFN_COND_##NAME: \ return IFN_COND_LEN_##NAME; #include "internal-fn.def" -#undef DEF_INTERNAL_COND_FN -#undef DEF_INTERNAL_SIGNED_COND_FN default: return IFN_LAST; } Jakub ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693] 2023-11-17 14:01 [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693] Jakub Jelinek 2023-11-18 10:27 ` [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization for direct optab [PR90693] Jakub Jelinek @ 2023-11-19 3:59 ` Jeff Law 2023-11-20 7:54 ` Richard Biener 2 siblings, 0 replies; 11+ messages in thread From: Jeff Law @ 2023-11-19 3:59 UTC (permalink / raw) To: Jakub Jelinek, Richard Biener; +Cc: gcc-patches On 11/17/23 07:01, Jakub Jelinek wrote: > Hi! > > Per the earlier discussions on this PR, the following patch folds > popcount (x) == 1 (and != 1) into (x ^ (x - 1)) > x - 1 (or <=) > if the corresponding popcount optab isn't implemented (I think any > double-word popcount or call will be necessarily slower than the > above cheap 3 op check and even for -Os larger or same size). > > I've noticed e.g. C++ aligned new starts with std::has_single_bit > which does popcount (x) == 1. > > As a follow-up, I'm considering changing in this routine the popcount > call to IFN_POPCOUNT with 2 arguments and during expansion test costs. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2023-11-17 Jakub Jelinek <jakub@redhat.com> > > PR tree-optimization/90693 > * tree-ssa-math-opts.cc (match_single_bit_test): New function. > (math_opts_dom_walker::after_dom_children): Call it for EQ_EXPR > and NE_EXPR assignments and GIMPLE_CONDs. > > * gcc.target/i386/pr90693.c: New test. OK. Jeff ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693] 2023-11-17 14:01 [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693] Jakub Jelinek 2023-11-18 10:27 ` [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization for direct optab [PR90693] Jakub Jelinek 2023-11-19 3:59 ` [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693] Jeff Law @ 2023-11-20 7:54 ` Richard Biener 2023-11-20 8:37 ` Jakub Jelinek 2 siblings, 1 reply; 11+ messages in thread From: Richard Biener @ 2023-11-20 7:54 UTC (permalink / raw) To: Jakub Jelinek; +Cc: gcc-patches On Fri, 17 Nov 2023, Jakub Jelinek wrote: > Hi! > > Per the earlier discussions on this PR, the following patch folds > popcount (x) == 1 (and != 1) into (x ^ (x - 1)) > x - 1 (or <=) > if the corresponding popcount optab isn't implemented (I think any > double-word popcount or call will be necessarily slower than the > above cheap 3 op check and even for -Os larger or same size). > > I've noticed e.g. C++ aligned new starts with std::has_single_bit > which does popcount (x) == 1. > > As a follow-up, I'm considering changing in this routine the popcount > call to IFN_POPCOUNT with 2 arguments and during expansion test costs. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Classically this would have been an RTL expansion alternative, given we want to do less of those the next place would have been the ISEL pass. Any particular reason you chose widening-mul for this (guess that pass just has a bad name and it's the effective "optimize" ISEL pass we have). Richard. > 2023-11-17 Jakub Jelinek <jakub@redhat.com> > > PR tree-optimization/90693 > * tree-ssa-math-opts.cc (match_single_bit_test): New function. > (math_opts_dom_walker::after_dom_children): Call it for EQ_EXPR > and NE_EXPR assignments and GIMPLE_CONDs. > > * gcc.target/i386/pr90693.c: New test. > > --- gcc/tree-ssa-math-opts.cc.jj 2023-11-13 15:51:02.920562303 +0100 > +++ gcc/tree-ssa-math-opts.cc 2023-11-17 11:37:02.255905475 +0100 > @@ -5154,6 +5154,72 @@ match_uaddc_usubc (gimple_stmt_iterator > return true; > } > > +/* Replace .POPCOUNT (x) == 1 or .POPCOUNT (x) != 1 with > + (x & (x - 1)) > x - 1 or (x & (x - 1)) <= x - 1 if .POPCOUNT > + isn't a direct optab. */ > + > +static void > +match_single_bit_test (gimple_stmt_iterator *gsi, gimple *stmt) > +{ > + tree clhs, crhs; > + enum tree_code code; > + if (gimple_code (stmt) == GIMPLE_COND) > + { > + clhs = gimple_cond_lhs (stmt); > + crhs = gimple_cond_rhs (stmt); > + code = gimple_cond_code (stmt); > + } > + else > + { > + clhs = gimple_assign_rhs1 (stmt); > + crhs = gimple_assign_rhs2 (stmt); > + code = gimple_assign_rhs_code (stmt); > + } > + if (code != EQ_EXPR && code != NE_EXPR) > + return; > + if (TREE_CODE (clhs) != SSA_NAME || !integer_onep (crhs)) > + return; > + gimple *call = SSA_NAME_DEF_STMT (clhs); > + combined_fn cfn = gimple_call_combined_fn (call); > + switch (cfn) > + { > + CASE_CFN_POPCOUNT: > + break; > + default: > + return; > + } > + if (!has_single_use (clhs)) > + return; > + tree arg = gimple_call_arg (call, 0); > + tree type = TREE_TYPE (arg); > + if (!INTEGRAL_TYPE_P (type)) > + return; > + if (direct_internal_fn_supported_p (IFN_POPCOUNT, type, OPTIMIZE_FOR_BOTH)) > + return; > + tree argm1 = make_ssa_name (type); > + gimple *g = gimple_build_assign (argm1, PLUS_EXPR, arg, > + build_int_cst (type, -1)); > + gsi_insert_before (gsi, g, GSI_SAME_STMT); > + g = gimple_build_assign (make_ssa_name (type), BIT_XOR_EXPR, arg, argm1); > + gsi_insert_before (gsi, g, GSI_SAME_STMT); > + if (gcond *cond = dyn_cast <gcond *> (stmt)) > + { > + gimple_cond_set_lhs (cond, gimple_assign_lhs (g)); > + gimple_cond_set_rhs (cond, argm1); > + gimple_cond_set_code (cond, code == EQ_EXPR ? GT_EXPR : LE_EXPR); > + } > + else > + { > + gimple_assign_set_rhs1 (stmt, gimple_assign_lhs (g)); > + gimple_assign_set_rhs2 (stmt, argm1); > + gimple_assign_set_rhs_code (stmt, code == EQ_EXPR ? GT_EXPR : LE_EXPR); > + } > + update_stmt (stmt); > + gimple_stmt_iterator gsi2 = gsi_for_stmt (call); > + gsi_remove (&gsi2, true); > + release_defs (call); > +} > + > /* Return true if target has support for divmod. */ > > static bool > @@ -5807,6 +5873,11 @@ math_opts_dom_walker::after_dom_children > match_uaddc_usubc (&gsi, stmt, code); > break; > > + case EQ_EXPR: > + case NE_EXPR: > + match_single_bit_test (&gsi, stmt); > + break; > + > default:; > } > } > @@ -5872,7 +5943,10 @@ math_opts_dom_walker::after_dom_children > } > } > else if (gimple_code (stmt) == GIMPLE_COND) > - optimize_spaceship (as_a <gcond *> (stmt)); > + { > + match_single_bit_test (&gsi, stmt); > + optimize_spaceship (as_a <gcond *> (stmt)); > + } > gsi_next (&gsi); > } > if (fma_state.m_deferring_p > --- gcc/testsuite/gcc.target/i386/pr90693.c.jj 2023-11-16 19:31:43.383587048 +0100 > +++ gcc/testsuite/gcc.target/i386/pr90693.c 2023-11-16 19:33:23.676177086 +0100 > @@ -0,0 +1,29 @@ > +/* PR tree-optimization/90693 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mno-abm -mno-popcnt -fdump-tree-optimized" } */ > +/* { dg-final { scan-tree-dump-not "POPCOUNT \\\(" "optimized" } } */ > +/* { dg-final { scan-tree-dump-not "__builtin_popcount(ll)? \\\(" "optimized" } } */ > + > +int > +foo (unsigned int x) > +{ > + return __builtin_popcount (x) == 1; > +} > + > +int > +bar (unsigned int x) > +{ > + return (x ^ (x - 1)) > x - 1; > +} > + > +int > +baz (unsigned long long x) > +{ > + return __builtin_popcountll (x) == 1; > +} > + > +int > +qux (unsigned long long x) > +{ > + return (x ^ (x - 1)) > x - 1; > +} > > Jakub > > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693] 2023-11-20 7:54 ` Richard Biener @ 2023-11-20 8:37 ` Jakub Jelinek 2023-11-20 8:39 ` Richard Biener 0 siblings, 1 reply; 11+ messages in thread From: Jakub Jelinek @ 2023-11-20 8:37 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches On Mon, Nov 20, 2023 at 07:54:54AM +0000, Richard Biener wrote: > On Fri, 17 Nov 2023, Jakub Jelinek wrote: > > Per the earlier discussions on this PR, the following patch folds > > popcount (x) == 1 (and != 1) into (x ^ (x - 1)) > x - 1 (or <=) > > if the corresponding popcount optab isn't implemented (I think any > > double-word popcount or call will be necessarily slower than the > > above cheap 3 op check and even for -Os larger or same size). > > > > I've noticed e.g. C++ aligned new starts with std::has_single_bit > > which does popcount (x) == 1. > > > > As a follow-up, I'm considering changing in this routine the popcount > > call to IFN_POPCOUNT with 2 arguments and during expansion test costs. > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > Classically this would have been an RTL expansion alternative, given > we want to do less of those the next place would have been the ISEL > pass. Any particular reason you chose widening-mul for this (guess > that pass just has a bad name and it's the effective "optimize" ISEL pass > we have). I think the ssa-math-opts pass does far more of this staff than the isel pass which only deals with vector stuff right now and you've even mentioned that pass for that in the PR90693 thread. That said, I can move it into the isel pass as well if you prefer that. Jakub ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693] 2023-11-20 8:37 ` Jakub Jelinek @ 2023-11-20 8:39 ` Richard Biener 2023-11-20 8:41 ` Jakub Jelinek 2023-11-20 13:03 ` Jeff Law 0 siblings, 2 replies; 11+ messages in thread From: Richard Biener @ 2023-11-20 8:39 UTC (permalink / raw) To: Jakub Jelinek; +Cc: gcc-patches On Mon, 20 Nov 2023, Jakub Jelinek wrote: > On Mon, Nov 20, 2023 at 07:54:54AM +0000, Richard Biener wrote: > > On Fri, 17 Nov 2023, Jakub Jelinek wrote: > > > Per the earlier discussions on this PR, the following patch folds > > > popcount (x) == 1 (and != 1) into (x ^ (x - 1)) > x - 1 (or <=) > > > if the corresponding popcount optab isn't implemented (I think any > > > double-word popcount or call will be necessarily slower than the > > > above cheap 3 op check and even for -Os larger or same size). > > > > > > I've noticed e.g. C++ aligned new starts with std::has_single_bit > > > which does popcount (x) == 1. > > > > > > As a follow-up, I'm considering changing in this routine the popcount > > > call to IFN_POPCOUNT with 2 arguments and during expansion test costs. > > > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > > > Classically this would have been an RTL expansion alternative, given > > we want to do less of those the next place would have been the ISEL > > pass. Any particular reason you chose widening-mul for this (guess > > that pass just has a bad name and it's the effective "optimize" ISEL pass > > we have). > > I think the ssa-math-opts pass does far more of this staff than the isel > pass which only deals with vector stuff right now and you've even mentioned > that pass for that in the PR90693 thread. > That said, I can move it into the isel pass as well if you prefer that. I think it's fine as you posted (and Jeff approved), I'm just wondering if we should rename that pass somehow ;) Note ISEL is more for required pre-expansion stuff and widen-mul is for expansion related but optimization parts. Richard. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693] 2023-11-20 8:39 ` Richard Biener @ 2023-11-20 8:41 ` Jakub Jelinek 2023-11-20 13:03 ` Jeff Law 1 sibling, 0 replies; 11+ messages in thread From: Jakub Jelinek @ 2023-11-20 8:41 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches On Mon, Nov 20, 2023 at 08:39:42AM +0000, Richard Biener wrote: > I think it's fine as you posted (and Jeff approved), I'm just wondering > if we should rename that pass somehow ;) Note ISEL is more for > required pre-expansion stuff and widen-mul is for expansion related but > optimization parts. +1 for renaming the pass. Jakub ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693] 2023-11-20 8:39 ` Richard Biener 2023-11-20 8:41 ` Jakub Jelinek @ 2023-11-20 13:03 ` Jeff Law 1 sibling, 0 replies; 11+ messages in thread From: Jeff Law @ 2023-11-20 13:03 UTC (permalink / raw) To: Richard Biener, Jakub Jelinek; +Cc: gcc-patches On 11/20/23 01:39, Richard Biener wrote: > On Mon, 20 Nov 2023, Jakub Jelinek wrote: > >> On Mon, Nov 20, 2023 at 07:54:54AM +0000, Richard Biener wrote: >>> On Fri, 17 Nov 2023, Jakub Jelinek wrote: >>>> Per the earlier discussions on this PR, the following patch folds >>>> popcount (x) == 1 (and != 1) into (x ^ (x - 1)) > x - 1 (or <=) >>>> if the corresponding popcount optab isn't implemented (I think any >>>> double-word popcount or call will be necessarily slower than the >>>> above cheap 3 op check and even for -Os larger or same size). >>>> >>>> I've noticed e.g. C++ aligned new starts with std::has_single_bit >>>> which does popcount (x) == 1. >>>> >>>> As a follow-up, I'm considering changing in this routine the popcount >>>> call to IFN_POPCOUNT with 2 arguments and during expansion test costs. >>>> >>>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? >>> >>> Classically this would have been an RTL expansion alternative, given >>> we want to do less of those the next place would have been the ISEL >>> pass. Any particular reason you chose widening-mul for this (guess >>> that pass just has a bad name and it's the effective "optimize" ISEL pass >>> we have). >> >> I think the ssa-math-opts pass does far more of this staff than the isel >> pass which only deals with vector stuff right now and you've even mentioned >> that pass for that in the PR90693 thread. >> That said, I can move it into the isel pass as well if you prefer that. > > I think it's fine as you posted (and Jeff approved), I'm just wondering > if we should rename that pass somehow ;) Note ISEL is more for > required pre-expansion stuff and widen-mul is for expansion related but > optimization parts. Yea, a rename seems appropriate given how much stuff in there is no longer math-ops related :-) jeff ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-11-21 8:20 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-11-17 14:01 [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693] Jakub Jelinek 2023-11-18 10:27 ` [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization for direct optab [PR90693] Jakub Jelinek 2023-11-19 4:03 ` Jeff Law 2023-11-20 8:01 ` Richard Biener 2023-11-20 9:05 ` Jakub Jelinek 2023-11-19 3:59 ` [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693] Jeff Law 2023-11-20 7:54 ` Richard Biener 2023-11-20 8:37 ` Jakub Jelinek 2023-11-20 8:39 ` Richard Biener 2023-11-20 8:41 ` Jakub Jelinek 2023-11-20 13:03 ` Jeff Law
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).