public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <rguenther@suse.de>
To: Jakub Jelinek <jakub@redhat.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] tree-vect-patterns: Improve __builtin_{clz,ctz,ffs}ll vectorization [PR109011]
Date: Wed, 19 Apr 2023 08:52:34 +0000 (UTC)	[thread overview]
Message-ID: <nycvar.YFH.7.77.849.2304190852260.4466@jbgna.fhfr.qr> (raw)
In-Reply-To: <ZD+rh/xdkZfD7Zwe@tucnak>

On Wed, 19 Apr 2023, Jakub Jelinek wrote:

> Hi!
> 
> For __builtin_popcountll tree-vect-patterns.cc has
> vect_recog_popcount_pattern, which improves the vectorized code.
> Without that the vectorization is always multi-type vectorization
> in the loop (at least int and long long types) where we emit two
> .POPCOUNT calls with long long arguments and int return value and then
> widen to long long, so effectively after vectorization do the
> V?DImode -> V?DImode popcount twice, then pack the result into V?SImode
> and immediately unpack.
> 
> The following patch extends that handling to __builtin_{clz,ctz,ffs}ll
> builtins as well (as long as there is an optab for them; more to come
> laster).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, plus tested on
> the testcase in crosses to powerpc64le-linux and s390x-linux.  Ok
> for trunk?

OK.

Richard.

> x86 can do __builtin_popcountll with -mavx512vpopcntdq, __builtin_clzll
> with -mavx512cd, ppc can do __builtin_popcountll and __builtin_clzll
> with -mpower8-vector and __builtin_ctzll with -mpower9-vector, s390
> can do __builtin_{popcount,clz,ctz}ll with -march=z13 -mzarch (i.e. VX).
> 
> 2023-04-19  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/109011
> 	* tree-vect-patterns.cc (vect_recog_popcount_pattern): Rename to ...
> 	(vect_recog_popcount_clz_ctz_ffs_pattern): ... this.  Handle also
> 	CLZ, CTZ and FFS.  Remove vargs variable, use
> 	gimple_build_call_internal rather than gimple_build_call_internal_vec.
> 	(vect_vect_recog_func_ptrs): Adjust popcount entry.
> 
> 	* gcc.dg/vect/pr109011-1.c: New test.
> 
> --- gcc/tree-vect-patterns.cc.jj	2023-03-01 09:51:27.995362601 +0100
> +++ gcc/tree-vect-patterns.cc	2023-04-18 17:16:42.733935262 +0200
> @@ -1501,7 +1501,7 @@ vect_recog_widen_minus_pattern (vec_info
>  				      "vect_recog_widen_minus_pattern");
>  }
>  
> -/* Function vect_recog_popcount_pattern
> +/* Function vect_recog_popcount_clz_ctz_ffs_pattern
>  
>     Try to find the following pattern:
>  
> @@ -1530,16 +1530,20 @@ vect_recog_widen_minus_pattern (vec_info
>     * Return value: A new stmt that will be used to replace the sequence of
>     stmts that constitute the pattern. In this case it will be:
>     B = .POPCOUNT (A);
> +
> +   Similarly for clz, ctz and ffs.
>  */
>  
>  static gimple *
> -vect_recog_popcount_pattern (vec_info *vinfo,
> -			     stmt_vec_info stmt_vinfo, tree *type_out)
> +vect_recog_popcount_clz_ctz_ffs_pattern (vec_info *vinfo,
> +					 stmt_vec_info stmt_vinfo,
> +					 tree *type_out)
>  {
>    gassign *last_stmt = dyn_cast <gassign *> (stmt_vinfo->stmt);
> -  gimple *popcount_stmt, *pattern_stmt;
> +  gimple *call_stmt, *pattern_stmt;
>    tree rhs_oprnd, rhs_origin, lhs_oprnd, lhs_type, vec_type, new_var;
> -  auto_vec<tree> vargs;
> +  internal_fn ifn = IFN_LAST;
> +  int addend = 0;
>  
>    /* Find B = (TYPE1) temp_out. */
>    if (!last_stmt)
> @@ -1557,51 +1561,137 @@ vect_recog_popcount_pattern (vec_info *v
>    if (TREE_CODE (rhs_oprnd) != SSA_NAME
>        || !has_single_use (rhs_oprnd))
>      return NULL;
> -  popcount_stmt = SSA_NAME_DEF_STMT (rhs_oprnd);
> +  call_stmt = SSA_NAME_DEF_STMT (rhs_oprnd);
>  
>    /* Find temp_out = __builtin_popcount{,l,ll} (temp_in);  */
> -  if (!is_gimple_call (popcount_stmt))
> +  if (!is_gimple_call (call_stmt))
>      return NULL;
> -  switch (gimple_call_combined_fn (popcount_stmt))
> +  switch (gimple_call_combined_fn (call_stmt))
>      {
> +      int val;
>      CASE_CFN_POPCOUNT:
> +      ifn = IFN_POPCOUNT;
> +      break;
> +    CASE_CFN_CLZ:
> +      ifn = IFN_CLZ;
> +      /* Punt if call result is unsigned and defined value at zero
> +	 is negative, as the negative value doesn't extend correctly.  */
> +      if (TYPE_UNSIGNED (TREE_TYPE (rhs_oprnd))
> +	  && gimple_call_internal_p (call_stmt)
> +	  && CLZ_DEFINED_VALUE_AT_ZERO
> +	       (SCALAR_INT_TYPE_MODE (TREE_TYPE (rhs_oprnd)), val) == 2
> +	  && val < 0)
> +	return NULL;
> +      break;
> +    CASE_CFN_CTZ:
> +      ifn = IFN_CTZ;
> +      /* Punt if call result is unsigned and defined value at zero
> +	 is negative, as the negative value doesn't extend correctly.  */
> +      if (TYPE_UNSIGNED (TREE_TYPE (rhs_oprnd))
> +	  && gimple_call_internal_p (call_stmt)
> +	  && CTZ_DEFINED_VALUE_AT_ZERO
> +	       (SCALAR_INT_TYPE_MODE (TREE_TYPE (rhs_oprnd)), val) == 2
> +	  && val < 0)
> +	return NULL;
> +      break;
> +    CASE_CFN_FFS:
> +      ifn = IFN_FFS;
>        break;
>      default:
>        return NULL;
>      }
>  
> -  if (gimple_call_num_args (popcount_stmt) != 1)
> +  if (gimple_call_num_args (call_stmt) != 1)
>      return NULL;
>  
> -  rhs_oprnd = gimple_call_arg (popcount_stmt, 0);
> +  rhs_oprnd = gimple_call_arg (call_stmt, 0);
>    vect_unpromoted_value unprom_diff;
> -  rhs_origin = vect_look_through_possible_promotion (vinfo, rhs_oprnd,
> -						    &unprom_diff);
> +  rhs_origin
> +    = vect_look_through_possible_promotion (vinfo, rhs_oprnd, &unprom_diff);
>  
>    if (!rhs_origin)
>      return NULL;
>  
> -  /* Input and output of .POPCOUNT should be same-precision integer.
> -     Also A should be unsigned or same precision as temp_in,
> -     otherwise there would be sign_extend from A to temp_in.  */
> -  if (TYPE_PRECISION (unprom_diff.type) != TYPE_PRECISION (lhs_type)
> -      || (!TYPE_UNSIGNED (unprom_diff.type)
> -	  && (TYPE_PRECISION (unprom_diff.type)
> -	      != TYPE_PRECISION (TREE_TYPE (rhs_oprnd)))))
> +  /* Input and output of .POPCOUNT should be same-precision integer.  */
> +  if (TYPE_PRECISION (unprom_diff.type) != TYPE_PRECISION (lhs_type))
>      return NULL;
> -  vargs.safe_push (unprom_diff.op);
>  
> -  vect_pattern_detected ("vec_regcog_popcount_pattern", popcount_stmt);
> +  /* Also A should be unsigned or same precision as temp_in, otherwise
> +     different builtins/internal functions have different behaviors.  */
> +  if (TYPE_PRECISION (unprom_diff.type)
> +      != TYPE_PRECISION (TREE_TYPE (rhs_oprnd)))
> +    switch (ifn)
> +      {
> +      case IFN_POPCOUNT:
> +	/* For popcount require zero extension, which doesn't add any
> +	   further bits to the count.  */
> +	if (!TYPE_UNSIGNED (unprom_diff.type))
> +	  return NULL;
> +	break;
> +      case IFN_CLZ:
> +	/* clzll (x) == clz (x) + 32 for unsigned x != 0, so ok
> +	   if it is undefined at zero or if it matches also for the
> +	   defined value there.  */
> +	if (!TYPE_UNSIGNED (unprom_diff.type))
> +	  return NULL;
> +	if (!type_has_mode_precision_p (lhs_type)
> +	    || !type_has_mode_precision_p (TREE_TYPE (rhs_oprnd)))
> +	  return NULL;
> +	addend = (TYPE_PRECISION (TREE_TYPE (rhs_oprnd))
> +		  - TYPE_PRECISION (lhs_type));
> +	if (gimple_call_internal_p (call_stmt))
> +	  {
> +	    int val1, val2;
> +	    int d1
> +	      = CLZ_DEFINED_VALUE_AT_ZERO
> +		  (SCALAR_INT_TYPE_MODE (TREE_TYPE (rhs_oprnd)), val1);
> +	    int d2
> +	      = CLZ_DEFINED_VALUE_AT_ZERO (SCALAR_INT_TYPE_MODE (lhs_type),
> +					   val2);
> +	    if (d1 != 2)
> +	      break;
> +	    if (d2 != 2 || val1 != val2 + addend)
> +	      return NULL;
> +	  }
> +	break;
> +      case IFN_CTZ:
> +	/* ctzll (x) == ctz (x) for unsigned or signed x != 0, so ok
> +	   if it is undefined at zero or if it matches also for the
> +	   defined value there.  */
> +	if (gimple_call_internal_p (call_stmt))
> +	  {
> +	    int val1, val2;
> +	    int d1
> +	      = CTZ_DEFINED_VALUE_AT_ZERO
> +		  (SCALAR_INT_TYPE_MODE (TREE_TYPE (rhs_oprnd)), val1);
> +	    int d2
> +	      = CTZ_DEFINED_VALUE_AT_ZERO (SCALAR_INT_TYPE_MODE (lhs_type),
> +					   val2);
> +	    if (d1 != 2)
> +	      break;
> +	    if (d2 != 2 || val1 != val2)
> +	      return NULL;
> +	  }
> +	break;
> +      case IFN_FFS:
> +	/* ffsll (x) == ffs (x) for unsigned or signed x.  */
> +	break;
> +      default:
> +	gcc_unreachable ();
> +      }
> +
> +  vect_pattern_detected ("vec_recog_popcount_clz_ctz_ffs_pattern",
> +			 call_stmt);
>    vec_type = get_vectype_for_scalar_type (vinfo, lhs_type);
> -  /* Do it only if the backend has popcount<vector_mode>2 pattern.  */
> +  /* Do it only if the backend has popcount<vector_mode>2 etc. pattern.  */
>    if (!vec_type
> -      || !direct_internal_fn_supported_p (IFN_POPCOUNT, vec_type,
> +      || !direct_internal_fn_supported_p (ifn, vec_type,
>  					  OPTIMIZE_FOR_SPEED))
>      return NULL;
>  
>    /* Create B = .POPCOUNT (A).  */
>    new_var = vect_recog_temp_ssa_var (lhs_type, NULL);
> -  pattern_stmt = gimple_build_call_internal_vec (IFN_POPCOUNT, vargs);
> +  pattern_stmt = gimple_build_call_internal (ifn, 1, unprom_diff.op);
>    gimple_call_set_lhs (pattern_stmt, new_var);
>    gimple_set_location (pattern_stmt, gimple_location (last_stmt));
>    *type_out = vec_type;
> @@ -1609,6 +1699,14 @@ vect_recog_popcount_pattern (vec_info *v
>    if (dump_enabled_p ())
>      dump_printf_loc (MSG_NOTE, vect_location,
>  		     "created pattern stmt: %G", pattern_stmt);
> +
> +  if (addend)
> +    {
> +      append_pattern_def_seq (vinfo, stmt_vinfo, pattern_stmt, vec_type);
> +      tree ret_var = vect_recog_temp_ssa_var (lhs_type, NULL);
> +      pattern_stmt = gimple_build_assign (ret_var, PLUS_EXPR, new_var,
> +					  build_int_cst (lhs_type, addend));
> +    }
>    return pattern_stmt;
>  }
>  
> @@ -6051,7 +6149,7 @@ static vect_recog_func vect_vect_recog_f
>    { vect_recog_sad_pattern, "sad" },
>    { vect_recog_widen_sum_pattern, "widen_sum" },
>    { vect_recog_pow_pattern, "pow" },
> -  { vect_recog_popcount_pattern, "popcount" },
> +  { vect_recog_popcount_clz_ctz_ffs_pattern, "popcount_clz_ctz_ffs" },
>    { vect_recog_widen_shift_pattern, "widen_shift" },
>    { vect_recog_rotate_pattern, "rotate" },
>    { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
> --- gcc/testsuite/gcc.dg/vect/pr109011-1.c.jj	2023-04-18 14:40:47.117397908 +0200
> +++ gcc/testsuite/gcc.dg/vect/pr109011-1.c	2023-04-18 14:40:05.124004362 +0200
> @@ -0,0 +1,48 @@
> +/* PR tree-optimization/109011 */
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fno-unroll-loops --param=vect-epilogues-nomask=0 -fdump-tree-optimized" } */
> +/* { dg-additional-options "-mavx512cd" { target { { i?86-*-* x86_64-*-* } && avx512cd } } } */
> +/* { dg-additional-options "-mavx512vpopcntdq" { target { { i?86-*-* x86_64-*-* } && avx512vpopcntdq } } } */
> +/* { dg-additional-options "-mpower8-vector" { target powerpc_p8vector_ok } } */
> +/* { dg-additional-options "-mpower9-vector" { target powerpc_p9vector_ok } } */
> +/* { dg-additional-options "-march=z13 -mzarch" { target s390_vx } } */
> +
> +void
> +foo (long long *p, long long *q)
> +{
> +#pragma omp simd
> +  for (int i = 0; i < 2048; ++i)
> +    p[i] = __builtin_popcountll (q[i]);
> +}
> +
> +/* { dg-final { scan-tree-dump-times " = \.POPCOUNT \\\(" 1 "optimized" { target { { i?86-*-* x86_64-*-* } && avx512vpopcntdq } } } } */
> +/* { dg-final { scan-tree-dump-times " = \.POPCOUNT \\\(" 1 "optimized" { target { powerpc_p8vector_ok || s390_vx } } } } */
> +
> +void
> +bar (long long *p, long long *q)
> +{
> +#pragma omp simd
> +  for (int i = 0; i < 2048; ++i)
> +    p[i] = __builtin_clzll (q[i]);
> +}
> +
> +/* { dg-final { scan-tree-dump-times " = \.CLZ \\\(" 1 "optimized" { target { { i?86-*-* x86_64-*-* } && avx512cd } } } } */
> +/* { dg-final { scan-tree-dump-times " = \.CLZ \\\(" 1 "optimized" { target { powerpc_p8vector_ok || s390_vx } } } } */
> +
> +void
> +baz (long long *p, long long *q)
> +{
> +#pragma omp simd
> +  for (int i = 0; i < 2048; ++i)
> +    p[i] = __builtin_ctzll (q[i]);
> +}
> +
> +/* { dg-final { scan-tree-dump-times " = \.CTZ \\\(" 1 "optimized" { target { powerpc_p9vector_ok || s390_vx } } } } */
> +
> +void
> +qux (long long *p, long long *q)
> +{
> +#pragma omp simd
> +  for (int i = 0; i < 2048; ++i)
> +    p[i] = __builtin_ffsll (q[i]);
> +}
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

      reply	other threads:[~2023-04-19  8:52 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-19  8:51 Jakub Jelinek
2023-04-19  8:52 ` Richard Biener [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.YFH.7.77.849.2304190852260.4466@jbgna.fhfr.qr \
    --to=rguenther@suse.de \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).