public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <rguenther@suse.de>
To: Jakub Jelinek <jakub@redhat.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization for direct optab [PR90693]
Date: Mon, 20 Nov 2023 08:01:45 +0000 (UTC)	[thread overview]
Message-ID: <nycvar.YFH.7.77.849.2311200758180.8772@jbgna.fhfr.qr> (raw)
In-Reply-To: <ZViRqiN7i3waSY2v@tucnak>

On Sat, 18 Nov 2023, Jakub Jelinek wrote:

> Hi!
> 
> On Fri, Nov 17, 2023 at 03:01:04PM +0100, Jakub Jelinek wrote:
> > As a follow-up, I'm considering changing in this routine the popcount
> > call to IFN_POPCOUNT with 2 arguments and during expansion test costs.
> 
> Here is the follow-up which does the rtx costs testing.
> While having to tweak internal-fn.def so that POPCOUNT can have a custom
> expand_POPCOUNT, I have noticed we are inconsistent, some DEF_INTERNAL*
> macros (most of them) were undefined at the end of internal-fn.def (but in
> some cases uselessly undefined again after inclusion), while others were not
> (and sometimes undefined after the inclusion).  I've changed it to always
> undefine at the end of internal-fn.def.

The last bit is cleanup that could go in independently.

> Ok for trunk if it passes bootstrap/regtest?
> 
> 2023-11-18  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/90693
> 	* tree-ssa-math-opts.cc (match_single_bit_test): Mark POPCOUNT with
> 	result only used in equality comparison against 1 with direct optab
> 	support as .POPCOUNT call with 2 arguments.
> 	* internal-fn.h (expand_POPCOUNT): Declare.
> 	* internal-fn.def: Document missing DEF_INTERNAL* macros and make sure
> 	they are all undefined at the end.
> 	(DEF_INTERNAL_INT_EXT_FN): New macro.
> 	(POPCOUNT): Use it instead of DEF_INTERNAL_INT_FN.
> 	* internal-fn.cc (lookup_hilo_internal_fn, lookup_evenodd_internal_fn,
> 	widening_fn_p, get_len_internal_fn): Don't undef DEF_INTERNAL_*FN
> 	macros after inclusion of internal-fn.def.
> 	(DEF_INTERNAL_INT_EXT_FN): Define to nothing before inclusion to
> 	define expanders.
> 	(expand_POPCOUNT): New function.
> 
> --- gcc/tree-ssa-math-opts.cc.jj	2023-11-18 09:38:03.460813910 +0100
> +++ gcc/tree-ssa-math-opts.cc	2023-11-18 10:25:18.751207936 +0100
> @@ -5195,7 +5195,16 @@ match_single_bit_test (gimple_stmt_itera
>    if (!INTEGRAL_TYPE_P (type))
>      return;
>    if (direct_internal_fn_supported_p (IFN_POPCOUNT, type, OPTIMIZE_FOR_BOTH))
> -    return;
> +    {
> +      /* Tell expand_POPCOUNT the popcount result is only used in equality
> +	 comparison with one, so that it can decide based on rtx costs.  */
> +      gimple *g = gimple_build_call_internal (IFN_POPCOUNT, 2, arg,
> +					      integer_one_node);
> +      gimple_call_set_lhs (g, gimple_call_lhs (call));
> +      gimple_stmt_iterator gsi2 = gsi_for_stmt (call);
> +      gsi_replace (&gsi2, g, true);
> +      return;
> +    }
>    tree argm1 = make_ssa_name (type);
>    gimple *g = gimple_build_assign (argm1, PLUS_EXPR, arg,
>  				   build_int_cst (type, -1));
> --- gcc/internal-fn.h.jj	2023-11-02 12:15:12.223998565 +0100
> +++ gcc/internal-fn.h	2023-11-18 10:14:25.834340505 +0100
> @@ -262,6 +262,7 @@ extern void expand_MULBITINT (internal_f
>  extern void expand_DIVMODBITINT (internal_fn, gcall *);
>  extern void expand_FLOATTOBITINT (internal_fn, gcall *);
>  extern void expand_BITINTTOFLOAT (internal_fn, gcall *);
> +extern void expand_POPCOUNT (internal_fn, gcall *);
>  
>  extern bool vectorized_internal_fn_supported_p (internal_fn, tree);
>  
> --- gcc/internal-fn.def.jj	2023-11-17 15:51:02.642802521 +0100
> +++ gcc/internal-fn.def	2023-11-18 10:12:10.329236626 +0100
> @@ -33,9 +33,13 @@ along with GCC; see the file COPYING3.
>       DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SIGNED_OPTAB,
>  				   UNSIGNED_OPTAB, TYPE)
>       DEF_INTERNAL_FLT_FN (NAME, FLAGS, OPTAB, TYPE)
> +     DEF_INTERNAL_FLT_FLOATN_FN (NAME, FLAGS, OPTAB, TYPE)
>       DEF_INTERNAL_INT_FN (NAME, FLAGS, OPTAB, TYPE)
> +     DEF_INTERNAL_INT_EXT_FN (NAME, FLAGS, OPTAB, TYPE)
>       DEF_INTERNAL_COND_FN (NAME, FLAGS, OPTAB, TYPE)
>       DEF_INTERNAL_SIGNED_COND_FN (NAME, FLAGS, OPTAB, TYPE)
> +     DEF_INTERNAL_WIDENING_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB,
> +				     TYPE)
>  
>     where NAME is the name of the function, FLAGS is a set of
>     ECF_* flags and FNSPEC is a string describing functions fnspec.
> @@ -97,6 +101,10 @@ along with GCC; see the file COPYING3.
>     says that the function extends the C-level BUILT_IN_<NAME>{,L,LL,IMAX}
>     group of functions to any integral mode (including vector modes).
>  
> +   DEF_INTERNAL_INT_EXT_FN is like DEF_INTERNAL_INT_FN, except that it
> +   has expand_##NAME defined in internal-fn.cc to override the
> +   DEF_INTERNAL_INT_FN expansion behavior.
> +
>     DEF_INTERNAL_WIDENING_OPTAB_FN is a wrapper that defines five internal
>     functions with DEF_INTERNAL_SIGNED_OPTAB_FN:
>     - one that describes a widening operation with the same number of elements
> @@ -160,6 +168,11 @@ along with GCC; see the file COPYING3.
>    DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE)
>  #endif
>  
> +#ifndef DEF_INTERNAL_INT_EXT_FN
> +#define DEF_INTERNAL_INT_EXT_FN(NAME, FLAGS, OPTAB, TYPE) \
> +  DEF_INTERNAL_INT_FN (NAME, FLAGS, OPTAB, TYPE)
> +#endif
> +
>  #ifndef DEF_INTERNAL_WIDENING_OPTAB_FN
>  #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE)		    \
>    DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE)			    \
> @@ -432,7 +445,7 @@ DEF_INTERNAL_INT_FN (CLZ, ECF_CONST | EC
>  DEF_INTERNAL_INT_FN (CTZ, ECF_CONST | ECF_NOTHROW, ctz, unary)
>  DEF_INTERNAL_INT_FN (FFS, ECF_CONST | ECF_NOTHROW, ffs, unary)
>  DEF_INTERNAL_INT_FN (PARITY, ECF_CONST | ECF_NOTHROW, parity, unary)
> -DEF_INTERNAL_INT_FN (POPCOUNT, ECF_CONST | ECF_NOTHROW, popcount, unary)
> +DEF_INTERNAL_INT_EXT_FN (POPCOUNT, ECF_CONST | ECF_NOTHROW, popcount, unary)
>  
>  DEF_INTERNAL_FN (GOMP_TARGET_REV, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (GOMP_USE_SIMT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
> @@ -572,6 +585,10 @@ DEF_INTERNAL_FN (DIVMODBITINT, ECF_LEAF,
>  DEF_INTERNAL_FN (FLOATTOBITINT, ECF_LEAF | ECF_NOTHROW, ". O . . ")
>  DEF_INTERNAL_FN (BITINTTOFLOAT, ECF_PURE | ECF_LEAF, ". R . ")
>  
> +#undef DEF_INTERNAL_WIDENING_OPTAB_FN
> +#undef DEF_INTERNAL_SIGNED_COND_FN
> +#undef DEF_INTERNAL_COND_FN
> +#undef DEF_INTERNAL_INT_EXT_FN
>  #undef DEF_INTERNAL_INT_FN
>  #undef DEF_INTERNAL_FLT_FN
>  #undef DEF_INTERNAL_FLT_FLOATN_FN
> --- gcc/internal-fn.cc.jj	2023-11-07 08:32:01.741254155 +0100
> +++ gcc/internal-fn.cc	2023-11-18 11:06:31.740703230 +0100
> @@ -102,8 +102,6 @@ lookup_hilo_internal_fn (internal_fn ifn
>      {
>      default:
>        gcc_unreachable ();
> -#undef DEF_INTERNAL_FN
> -#undef DEF_INTERNAL_WIDENING_OPTAB_FN
>  #define DEF_INTERNAL_FN(NAME, FLAGS, TYPE)
>  #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T)	\
>      case IFN_##NAME:						\
> @@ -111,8 +109,6 @@ lookup_hilo_internal_fn (internal_fn ifn
>        *hi = internal_fn (IFN_##NAME##_HI);			\
>        break;
>  #include "internal-fn.def"
> -#undef DEF_INTERNAL_FN
> -#undef DEF_INTERNAL_WIDENING_OPTAB_FN
>      }
>  }
>  
> @@ -129,8 +125,6 @@ lookup_evenodd_internal_fn (internal_fn
>      {
>      default:
>        gcc_unreachable ();
> -#undef DEF_INTERNAL_FN
> -#undef DEF_INTERNAL_WIDENING_OPTAB_FN
>  #define DEF_INTERNAL_FN(NAME, FLAGS, TYPE)
>  #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T)	\
>      case IFN_##NAME:						\
> @@ -138,8 +132,6 @@ lookup_evenodd_internal_fn (internal_fn
>        *odd = internal_fn (IFN_##NAME##_ODD);			\
>        break;
>  #include "internal-fn.def"
> -#undef DEF_INTERNAL_FN
> -#undef DEF_INTERNAL_WIDENING_OPTAB_FN
>      }
>  }
>  
> @@ -4261,7 +4253,6 @@ widening_fn_p (code_helper code)
>    internal_fn fn = as_internal_fn ((combined_fn) code);
>    switch (fn)
>      {
> -    #undef DEF_INTERNAL_WIDENING_OPTAB_FN
>      #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \
>      case IFN_##NAME:						  \
>      case IFN_##NAME##_HI:					  \
> @@ -4270,7 +4261,6 @@ widening_fn_p (code_helper code)
>      case IFN_##NAME##_ODD:					  \
>        return true;
>      #include "internal-fn.def"
> -    #undef DEF_INTERNAL_WIDENING_OPTAB_FN
>  
>      default:
>        return false;
> @@ -4295,6 +4285,7 @@ set_edom_supported_p (void)
>    {							\
>      expand_##TYPE##_optab_fn (fn, stmt, OPTAB##_optab);	\
>    }
> +#define DEF_INTERNAL_INT_EXT_FN(CODE, FLAGS, OPTAB, TYPE)
>  #define DEF_INTERNAL_SIGNED_OPTAB_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \
>  				     UNSIGNED_OPTAB, TYPE)		\
>    static void								\
> @@ -4305,8 +4296,6 @@ set_edom_supported_p (void)
>      expand_##TYPE##_optab_fn (fn, stmt, which_optab);			\
>    }
>  #include "internal-fn.def"
> -#undef DEF_INTERNAL_OPTAB_FN
> -#undef DEF_INTERNAL_SIGNED_OPTAB_FN
>  
>  /* Routines to expand each internal function, indexed by function number.
>     Each routine has the prototype:
> @@ -4465,8 +4454,6 @@ get_len_internal_fn (internal_fn fn)
>  {
>    switch (fn)
>      {
> -#undef DEF_INTERNAL_COND_FN
> -#undef DEF_INTERNAL_SIGNED_COND_FN
>  #define DEF_INTERNAL_COND_FN(NAME, ...)                                        \
>    case IFN_COND_##NAME:                                                        \
>      return IFN_COND_LEN_##NAME;
> @@ -4474,8 +4461,6 @@ get_len_internal_fn (internal_fn fn)
>    case IFN_COND_##NAME:                                                        \
>      return IFN_COND_LEN_##NAME;
>  #include "internal-fn.def"
> -#undef DEF_INTERNAL_COND_FN
> -#undef DEF_INTERNAL_SIGNED_COND_FN
>      default:
>        return IFN_LAST;
>      }
> @@ -5113,3 +5098,67 @@ expand_BITINTTOFLOAT (internal_fn, gcall
>    if (val != target)
>      emit_move_insn (target, val);
>  }
> +
> +void
> +expand_POPCOUNT (internal_fn fn, gcall *stmt)
> +{
> +  if (gimple_call_num_args (stmt) == 1)
> +    {
> +      expand_unary_optab_fn (fn, stmt, popcount_optab);
> +      return;
> +    }
> +  /* If .POPCOUNT call has 2 arguments, match_single_bit_test marked it
> +     because the result is only used in an equality comparison against 1.
> +     Use rtx costs in that case to determine if .POPCOUNT (arg) == 1
> +     or (arg ^ (arg - 1)) > arg - 1 is cheaper.  */

We're doing plenty of RTX costing during GIMPLE optimizations, I wonder
if it could be done equally well there given that seq_cost of an RTX
sequence isn't going to capture the compare & branch cost very reliably
anyway ...

> +  bool speed_p = optimize_insn_for_speed_p ();
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree arg = gimple_call_arg (stmt, 0);
> +  tree type = TREE_TYPE (arg);
> +  machine_mode mode = TYPE_MODE (type);
> +  do_pending_stack_adjust ();
> +  start_sequence ();
> +  expand_unary_optab_fn (fn, stmt, popcount_optab);
> +  rtx_insn *popcount_insns = get_insns ();
> +  end_sequence ();
> +  start_sequence ();
> +  rtx plhs = expand_normal (lhs);
> +  rtx pcmp = emit_store_flag (NULL_RTX, EQ, plhs, const1_rtx, mode, 0, 0);
> +  if (pcmp == NULL_RTX)
> +    {
> +    fail:
> +      end_sequence ();
> +      emit_insn (popcount_insns);
> +      return;
> +    }
> +  rtx_insn *popcount_cmp_insns = get_insns ();
> +  end_sequence ();
> +  start_sequence ();
> +  rtx op0 = expand_normal (arg);
> +  rtx argm1 = expand_simple_binop (mode, PLUS, op0, constm1_rtx, NULL_RTX,
> +				   1, OPTAB_DIRECT);
> +  if (argm1 == NULL_RTX)
> +    goto fail;
> +  rtx argxorargm1 = expand_simple_binop (mode, XOR, op0, argm1, NULL_RTX,
> +					 1, OPTAB_DIRECT);
> +  if (argxorargm1 == NULL_RTX)
> +    goto fail;
> +  rtx cmp = emit_store_flag (NULL_RTX, GTU, argxorargm1, argm1, mode, 1, 1);
> +  if (cmp == NULL_RTX)
> +    goto fail;
> +  rtx_insn *cmp_insns = get_insns ();
> +  end_sequence ();
> +  unsigned popcount_cost = (seq_cost (popcount_insns, speed_p)
> +			    + seq_cost (popcount_cmp_insns, speed_p));
> +  unsigned cmp_cost = seq_cost (cmp_insns, speed_p);
> +  if (popcount_cost < cmp_cost)
> +    emit_insn (popcount_insns);
> +  else
> +    {
> +      emit_insn (cmp_insns);
> +      plhs = expand_normal (lhs);
> +      if (GET_MODE (cmp) != GET_MODE (plhs))
> +	cmp = convert_to_mode (GET_MODE (plhs), cmp, 1);
> +      emit_move_insn (plhs, cmp);
> +    }
> +}
> 
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

  parent reply	other threads:[~2023-11-20  8:01 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-17 14:01 [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693] Jakub Jelinek
2023-11-18 10:27 ` [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization for direct optab [PR90693] Jakub Jelinek
2023-11-19  4:03   ` Jeff Law
2023-11-20  8:01   ` Richard Biener [this message]
2023-11-20  9:05     ` Jakub Jelinek
2023-11-19  3:59 ` [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693] Jeff Law
2023-11-20  7:54 ` Richard Biener
2023-11-20  8:37   ` Jakub Jelinek
2023-11-20  8:39     ` Richard Biener
2023-11-20  8:41       ` Jakub Jelinek
2023-11-20 13:03       ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.YFH.7.77.849.2311200758180.8772@jbgna.fhfr.qr \
    --to=rguenther@suse.de \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).