public inbox for gcc-patches@gcc.gnu.org
* [PATCH 1/5] [PR target/96342] Change field "simdlen" into poly_uint64
@ 2020-10-30  2:29 yangyang (ET)
  2020-10-30 13:32 ` Richard Sandiford
  0 siblings, 1 reply; 9+ messages in thread
From: yangyang (ET) @ 2020-10-30  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford


Hi,

    This is the first patch for PR96342.

    In order to support generating SVE functions for "omp declare simd", this patch changes the type of the field "simdlen" of struct cgraph_simd_clone from unsigned int to poly_uint64.

    Although Richard mentioned in the PR that poly_uint64 will naturally decay to a uint64_t in i386 target files, it seems that operation /= is not supported yet, so I change "clonei->simdlen /= GET_MODE_BITSIZE (TYPE_MODE (base_type));" into "clonei->simdlen = clonei->simdlen / GET_MODE_BITSIZE (TYPE_MODE (base_type));". Calls to to_constant () are also added in the printf formats to pass bootstrap. However, I am not sure whether these are the best ways to handle this; any suggestions?

    Richard also suggested defining a new macro to calculate a vector multiple instead of using constant_multiple_p in the part 2 patch. I found that there are similar situations in the part 1 patch as well, so I introduce the macro there. I couldn't think of a better name, so I use "vector_unroll_factor".

    Bootstrapped and tested on both aarch64 and x86 Linux platforms; no new regressions witnessed.

    Ok for trunk?

Thanks,
Yang Yang

[-- Attachment #2: PR96342-part1-v1.patch --]
[-- Type: application/octet-stream, Size: 23047 bytes --]

From d01d655e3ebcaa76bec21f2c62b304c4ff9d8f56 Mon Sep 17 00:00:00 2001
From: Yang Yang <yangyang305@huawei.com>
Date: Thu, 29 Oct 2020 02:50:40 +0800
Subject: [PATCH] PR target/96342 Change field "simdlen" into poly_uint64

This is the first patch of PR96342. In order to add support for
"omp declare simd", change the type of the field "simdlen" of
struct cgraph_simd_clone from unsigned int to poly_uint64, with
related adaptations, since the length might be variable for the
SVE cases.

2020-10-30  Yang Yang  <yangyang305@huawei.com>

gcc/ChangeLog:

	* cgraph.h (struct cgraph_simd_clone): Change field "simdlen" of
	struct cgraph_simd_clone from unsigned int to poly_uint64.
	* config/aarch64/aarch64.c
	(aarch64_simd_clone_compute_vecsize_and_simdlen): Adapt
	operations on "simdlen".
	* config/i386/i386.c (ix86_simd_clone_compute_vecsize_and_simdlen):
	Printf formats update.
	* gengtype.c (main): Handle poly_uint64.
	* omp-simd-clone.c (simd_clone_mangle): Likewise.
	(simd_clone_adjust_return_type): Likewise.
	(create_tmp_simd_array): Likewise.
	(simd_clone_adjust_argument_types): Likewise.
	(simd_clone_init_simd_arrays): Likewise.
	(ipa_simd_modify_function_body): Likewise.
	(simd_clone_adjust): Likewise.
	(expand_simd_clones): Likewise.
	* poly-int-types.h (vector_unroll_factor): New macro.
	* tree-vect-stmts.c (vectorizable_simd_clone_call): Adapt to
	poly_uint64 "simdlen".
---
 gcc/cgraph.h                 |  6 +--
 gcc/config/aarch64/aarch64.c | 30 +++++++++------
 gcc/config/i386/i386.c       |  8 ++--
 gcc/gengtype.c               |  1 +
 gcc/omp-simd-clone.c         | 72 ++++++++++++++++++++++--------------
 gcc/poly-int-types.h         |  8 ++++
 gcc/tree-vect-stmts.c        | 59 +++++++++++++++--------------
 7 files changed, 112 insertions(+), 72 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 96d6cf609fe..9dc886cc58a 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -826,17 +826,17 @@ struct GTY(()) cgraph_simd_clone_arg {
 
 struct GTY(()) cgraph_simd_clone {
   /* Number of words in the SIMD lane associated with this clone.  */
-  unsigned int simdlen;
+  poly_uint64 simdlen;
 
   /* Number of annotated function arguments in `args'.  This is
      usually the number of named arguments in FNDECL.  */
   unsigned int nargs;
 
   /* Max hardware vector size in bits for integral vectors.  */
-  unsigned int vecsize_int;
+  poly_uint64 vecsize_int;
 
   /* Max hardware vector size in bits for floating point vectors.  */
-  unsigned int vecsize_float;
+  poly_uint64 vecsize_float;
 
   /* Machine mode of the mask argument(s), if they are to be passed
      as bitmasks in integer argument(s).  VOIDmode if masks are passed
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a8cc545c370..c630c0c7f81 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -23044,18 +23044,23 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 					tree base_type, int num)
 {
   tree t, ret_type, arg_type;
-  unsigned int elt_bits, vec_bits, count;
+  unsigned int elt_bits, count;
+  unsigned HOST_WIDE_INT const_simdlen;
+  poly_uint64 vec_bits;
 
   if (!TARGET_SIMD)
     return 0;
 
-  if (clonei->simdlen
-      && (clonei->simdlen < 2
-	  || clonei->simdlen > 1024
-	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
+  /* For now, SVE simdclones won't produce illegal simdlen, So only check
+     const simdlens here.  */
+  if (maybe_ne (clonei->simdlen, 0U)
+      && (clonei->simdlen.is_constant (&const_simdlen))
+      && (const_simdlen < 2
+	  || const_simdlen > 1024
+	  || (const_simdlen & (const_simdlen - 1)) != 0))
     {
       warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		  "unsupported simdlen %d", clonei->simdlen);
+		  "unsupported simdlen %wd", const_simdlen);
       return 0;
     }
 
@@ -23099,21 +23104,24 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   clonei->vecsize_mangle = 'n';
   clonei->mask_mode = VOIDmode;
   elt_bits = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
-  if (clonei->simdlen == 0)
+  if (known_eq (clonei->simdlen, 0U))
     {
       count = 2;
       vec_bits = (num == 0 ? 64 : 128);
-      clonei->simdlen = vec_bits / elt_bits;
+      clonei->simdlen = exact_div (vec_bits, elt_bits);
     }
   else
     {
       count = 1;
       vec_bits = clonei->simdlen * elt_bits;
-      if (vec_bits != 64 && vec_bits != 128)
+      /* For now, SVE simdclones won't produce illegal simdlen, So only check
+	 const simdlens here.  */
+      if (clonei->simdlen.is_constant (&const_simdlen)
+	  && known_ne (vec_bits, 64U) && known_ne (vec_bits, 128U))
 	{
 	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		      "GCC does not currently support simdlen %d for type %qT",
-		      clonei->simdlen, base_type);
+		      "GCC does not currently support simdlen %wd for type %qT",
+		      const_simdlen, base_type);
 	  return 0;
 	}
     }
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 54c2cdaf060..0ef037e5e55 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22140,7 +22140,7 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
     {
       warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		  "unsupported simdlen %d", clonei->simdlen);
+		  "unsupported simdlen %ld", clonei->simdlen.to_constant ());
       return 0;
     }
 
@@ -22245,7 +22245,8 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	clonei->simdlen = clonei->vecsize_int;
       else
 	clonei->simdlen = clonei->vecsize_float;
-      clonei->simdlen /= GET_MODE_BITSIZE (TYPE_MODE (base_type));
+      clonei->simdlen = clonei->simdlen
+			/ GET_MODE_BITSIZE (TYPE_MODE (base_type));
     }
   else if (clonei->simdlen > 16)
     {
@@ -22267,7 +22268,8 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
       if (cnt > (TARGET_64BIT ? 16 : 8))
 	{
 	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		      "unsupported simdlen %d", clonei->simdlen);
+		      "unsupported simdlen %ld",
+		      clonei->simdlen.to_constant ());
 	  return 0;
 	}
       }
diff --git a/gcc/gengtype.c b/gcc/gengtype.c
index a59a8823f82..919e3a00bf2 100644
--- a/gcc/gengtype.c
+++ b/gcc/gengtype.c
@@ -5198,6 +5198,7 @@ main (int argc, char **argv)
       POS_HERE (do_scalar_typedef ("widest_int", &pos));
       POS_HERE (do_scalar_typedef ("int64_t", &pos));
       POS_HERE (do_scalar_typedef ("poly_int64", &pos));
+      POS_HERE (do_scalar_typedef ("poly_uint64", &pos));
       POS_HERE (do_scalar_typedef ("uint64_t", &pos));
       POS_HERE (do_scalar_typedef ("uint8", &pos));
       POS_HERE (do_scalar_typedef ("uintptr_t", &pos));
diff --git a/gcc/omp-simd-clone.c b/gcc/omp-simd-clone.c
index 942fb971cb7..0671f93635c 100644
--- a/gcc/omp-simd-clone.c
+++ b/gcc/omp-simd-clone.c
@@ -338,16 +338,18 @@ simd_clone_mangle (struct cgraph_node *node,
 {
   char vecsize_mangle = clone_info->vecsize_mangle;
   char mask = clone_info->inbranch ? 'M' : 'N';
-  unsigned int simdlen = clone_info->simdlen;
+  poly_uint64 simdlen = clone_info->simdlen;
   unsigned int n;
   pretty_printer pp;
 
-  gcc_assert (vecsize_mangle && simdlen);
+  gcc_assert (vecsize_mangle && maybe_ne (simdlen, 0U));
 
   pp_string (&pp, "_ZGV");
   pp_character (&pp, vecsize_mangle);
   pp_character (&pp, mask);
-  pp_decimal_int (&pp, simdlen);
+  /* For now, simdlen is always constant, while variable simdlen pp 'n'.  */
+  unsigned int len = simdlen.to_constant ();
+  pp_decimal_int (&pp, (len));
 
   for (n = 0; n < clone_info->nargs; ++n)
     {
@@ -491,7 +493,7 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
 {
   tree fndecl = node->decl;
   tree orig_rettype = TREE_TYPE (TREE_TYPE (fndecl));
-  unsigned int veclen;
+  poly_uint64 veclen;
   tree t;
 
   /* Adjust the function return type.  */
@@ -502,17 +504,18 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
     veclen = node->simdclone->vecsize_int;
   else
     veclen = node->simdclone->vecsize_float;
-  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t));
-  if (veclen > node->simdclone->simdlen)
+  veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
+  if (known_gt (veclen, node->simdclone->simdlen))
     veclen = node->simdclone->simdlen;
   if (POINTER_TYPE_P (t))
     t = pointer_sized_int_node;
-  if (veclen == node->simdclone->simdlen)
+  if (known_eq (veclen, node->simdclone->simdlen))
     t = build_vector_type (t, node->simdclone->simdlen);
   else
     {
       t = build_vector_type (t, veclen);
-      t = build_array_type_nelts (t, node->simdclone->simdlen / veclen);
+      t = build_array_type_nelts (t, exact_div (node->simdclone->simdlen,
+				  veclen));
     }
   TREE_TYPE (TREE_TYPE (fndecl)) = t;
   if (!node->definition)
@@ -526,7 +529,7 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
 
   tree atype = build_array_type_nelts (orig_rettype,
 				       node->simdclone->simdlen);
-  if (veclen != node->simdclone->simdlen)
+  if (maybe_ne (veclen, node->simdclone->simdlen))
     return build1 (VIEW_CONVERT_EXPR, atype, t);
 
   /* Set up a SIMD array to use as the return value.  */
@@ -546,7 +549,7 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
    SIMDLEN is the number of elements.  */
 
 static tree
-create_tmp_simd_array (const char *prefix, tree type, int simdlen)
+create_tmp_simd_array (const char *prefix, tree type, poly_uint64 simdlen)
 {
   tree atype = build_array_type_nelts (type, simdlen);
   tree avar = create_tmp_var_raw (atype, prefix);
@@ -578,7 +581,8 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
   struct cgraph_simd_clone *sc = node->simdclone;
   vec<ipa_adjusted_param, va_gc> *new_params = NULL;
   vec_safe_reserve (new_params, sc->nargs);
-  unsigned i, j, veclen;
+  unsigned i, j, k;
+  poly_uint64 veclen;
 
   for (i = 0; i < sc->nargs; ++i)
     {
@@ -614,8 +618,9 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	    veclen = sc->vecsize_int;
 	  else
 	    veclen = sc->vecsize_float;
-	  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type));
-	  if (veclen > sc->simdlen)
+	  veclen = exact_div (veclen,
+			      GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type)));
+	  if (known_gt (veclen, sc->simdlen))
 	    veclen = sc->simdlen;
 	  adj.op = IPA_PARAM_OP_NEW;
 	  adj.param_prefix_index = IPA_PARAM_PREFIX_SIMD;
@@ -624,10 +629,11 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	  else
 	    adj.type = build_vector_type (parm_type, veclen);
 	  sc->args[i].vector_type = adj.type;
-	  for (j = veclen; j < sc->simdlen; j += veclen)
+	  k = vector_unroll_factor (sc->simdlen, veclen);
+	  for (j = 1; j < k; j++)
 	    {
 	      vec_safe_push (new_params, adj);
-	      if (j == veclen)
+	      if (j == 1)
 		{
 		  memset (&adj, 0, sizeof (adj));
 		  adj.op = IPA_PARAM_OP_NEW;
@@ -663,8 +669,9 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	veclen = sc->vecsize_int;
       else
 	veclen = sc->vecsize_float;
-      veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
-      if (veclen > sc->simdlen)
+      veclen = exact_div (veclen,
+			  GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
+      if (known_gt (veclen, sc->simdlen))
 	veclen = sc->simdlen;
       if (sc->mask_mode != VOIDmode)
 	adj.type
@@ -675,7 +682,8 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	adj.type = build_vector_type (base_type, veclen);
       vec_safe_push (new_params, adj);
 
-      for (j = veclen; j < sc->simdlen; j += veclen)
+      k = vector_unroll_factor (sc->simdlen, veclen);
+      for (j = 1; j < k; j++)
 	vec_safe_push (new_params, adj);
 
       /* We have previously allocated one extra entry for the mask.  Use
@@ -690,9 +698,9 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	  if (sc->mask_mode == VOIDmode)
 	    sc->args[i].simd_array
 	      = create_tmp_simd_array ("mask", base_type, sc->simdlen);
-	  else if (veclen < sc->simdlen)
+	  else if (known_lt (veclen, sc->simdlen))
 	    sc->args[i].simd_array
-	      = create_tmp_simd_array ("mask", adj.type, sc->simdlen / veclen);
+	      = create_tmp_simd_array ("mask", adj.type, k);
 	  else
 	    sc->args[i].simd_array = NULL_TREE;
 	}
@@ -783,7 +791,8 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
 	    }
 	  continue;
 	}
-      if (simd_clone_subparts (TREE_TYPE (arg)) == node->simdclone->simdlen)
+      if (known_eq (simd_clone_subparts (TREE_TYPE (arg)),
+		    node->simdclone->simdlen))
 	{
 	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
 	  tree ptr = build_fold_addr_expr (array);
@@ -795,8 +804,10 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
       else
 	{
 	  unsigned int simdlen = simd_clone_subparts (TREE_TYPE (arg));
+	  unsigned int times = vector_unroll_factor (node->simdclone->simdlen,
+						     simdlen);
 	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
-	  for (k = 0; k < node->simdclone->simdlen; k += simdlen)
+	  for (k = 0; k < times; k++)
 	    {
 	      tree ptr = build_fold_addr_expr (array);
 	      int elemsize;
@@ -808,7 +819,7 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
 	      tree elemtype = TREE_TYPE (TREE_TYPE (arg));
 	      elemsize = GET_MODE_SIZE (SCALAR_TYPE_MODE (elemtype));
 	      tree t = build2 (MEM_REF, TREE_TYPE (arg), ptr,
-			       build_int_cst (ptype, k * elemsize));
+			       build_int_cst (ptype, k * elemsize * simdlen));
 	      t = build2 (MODIFY_EXPR, TREE_TYPE (t), t, arg);
 	      gimplify_and_add (t, &seq);
 	    }
@@ -981,8 +992,11 @@ ipa_simd_modify_function_body (struct cgraph_node *node,
 		  iter, NULL_TREE, NULL_TREE);
       adjustments->register_replacement (&(*adjustments->m_adj_params)[j], r);
 
-      if (simd_clone_subparts (vectype) < node->simdclone->simdlen)
-	j += node->simdclone->simdlen / simd_clone_subparts (vectype) - 1;
+      if (known_lt (simd_clone_subparts (vectype), node->simdclone->simdlen))
+	{
+	  j += vector_unroll_factor (node->simdclone->simdlen,
+				     simd_clone_subparts (vectype)) - 1;
+	}
     }
 
   tree name;
@@ -1249,7 +1263,8 @@ simd_clone_adjust (struct cgraph_node *node)
 	 below).  */
       loop = alloc_loop ();
       cfun->has_force_vectorize_loops = true;
-      loop->safelen = node->simdclone->simdlen;
+      /* For now, simlen is always constant.  */
+      loop->safelen = node->simdclone->simdlen.to_constant ();
       loop->force_vectorize = true;
       loop->header = body_bb;
     }
@@ -1275,7 +1290,8 @@ simd_clone_adjust (struct cgraph_node *node)
 	    {
 	      tree maskt = TREE_TYPE (mask_array);
 	      int c = tree_to_uhwi (TYPE_MAX_VALUE (TYPE_DOMAIN (maskt)));
-	      c = node->simdclone->simdlen / (c + 1);
+	      /* For now, c must be constant here.  */
+	      c = exact_div (node->simdclone->simdlen, c + 1).to_constant ();
 	      int s = exact_log2 (c);
 	      gcc_assert (s > 0);
 	      c--;
@@ -1683,7 +1699,7 @@ expand_simd_clones (struct cgraph_node *node)
       if (clone_info == NULL)
 	continue;
 
-      int orig_simdlen = clone_info->simdlen;
+      poly_uint64 orig_simdlen = clone_info->simdlen;
       tree base_type = simd_clone_compute_base_data_type (node, clone_info);
       /* The target can return 0 (no simd clones should be created),
 	 1 (just one ISA of simd clones should be created) or higher
diff --git a/gcc/poly-int-types.h b/gcc/poly-int-types.h
index 5e04e63ebf2..78083098baa 100644
--- a/gcc/poly-int-types.h
+++ b/gcc/poly-int-types.h
@@ -81,6 +81,14 @@ typedef poly_int<NUM_POLY_INT_COEFFS, widest_int> poly_widest_int;
 #define vector_element_size(SIZE, NELTS) \
   (exact_div (SIZE, NELTS).to_constant ())
 
+/* Return the number of unroll times when a vector has NELTS1 elements
+   is unrolled to vectors has NELTS2 elements.
+
+   to_constant () is safe in this situation because the multiples of the
+   NELTS of two vectors are always constant-size scalars.  */
+#define vector_unroll_factor(NELTS1, NELTS2) \
+  (exact_div (NELTS1, NELTS2).to_constant ())
+
 /* Wrapper for poly_int arguments to target macros, so that if a target
    doesn't need polynomial-sized modes, its header file can continue to
    treat the argument as a normal constant.  This should go away once
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 3575f25241f..c0e979cd8ee 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -3731,7 +3731,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
   tree op, type;
   tree vec_oprnd0 = NULL_TREE;
   tree vectype;
-  unsigned int nunits;
+  poly_uint64 nunits;
   loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
   bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
   class loop *loop = loop_vinfo ? LOOP_VINFO_LOOP (loop_vinfo) : NULL;
@@ -3883,8 +3883,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       arginfo.quick_push (thisarginfo);
     }
 
-  unsigned HOST_WIDE_INT vf;
-  if (!LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&vf))
+  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  if (!vf.is_constant ())
     {
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -3902,12 +3902,12 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	 n = n->simdclone->next_clone)
       {
 	unsigned int this_badness = 0;
-	if (n->simdclone->simdlen > vf
+	if (known_gt (n->simdclone->simdlen, vf)
 	    || n->simdclone->nargs != nargs)
 	  continue;
-	if (n->simdclone->simdlen < vf)
-	  this_badness += (exact_log2 (vf)
-			   - exact_log2 (n->simdclone->simdlen)) * 1024;
+	if (known_lt (n->simdclone->simdlen, vf))
+	  this_badness += exact_log2
+	    (vector_unroll_factor (vf, n->simdclone->simdlen)) * 1024;
 	if (n->simdclone->inbranch)
 	  this_badness += 2048;
 	int target_badness = targetm.simd_clone.usable (n);
@@ -3988,19 +3988,19 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
 							  slp_node);
 	if (arginfo[i].vectype == NULL
-	    || (simd_clone_subparts (arginfo[i].vectype)
-		> bestn->simdclone->simdlen))
+	    || (known_gt (simd_clone_subparts (arginfo[i].vectype),
+			  bestn->simdclone->simdlen)))
 	  return false;
       }
 
   fndecl = bestn->decl;
   nunits = bestn->simdclone->simdlen;
-  ncopies = vf / nunits;
+  ncopies = vector_unroll_factor (vf, nunits);
 
   /* If the function isn't const, only allow it in simd loops where user
      has asserted that at least nunits consecutive iterations can be
      performed using SIMD instructions.  */
-  if ((loop == NULL || (unsigned) loop->safelen < nunits)
+  if ((loop == NULL || known_lt ((unsigned) loop->safelen, nunits))
       && gimple_vuse (stmt))
     return false;
 
@@ -4078,15 +4078,16 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	    {
 	    case SIMD_CLONE_ARG_TYPE_VECTOR:
 	      atype = bestn->simdclone->args[i].vector_type;
-	      o = nunits / simd_clone_subparts (atype);
+	      o = vector_unroll_factor (nunits,
+					simd_clone_subparts (atype));
 	      for (m = j * o; m < (j + 1) * o; m++)
 		{
-		  if (simd_clone_subparts (atype)
-		      < simd_clone_subparts (arginfo[i].vectype))
+		  if (known_lt (simd_clone_subparts (atype),
+				simd_clone_subparts (arginfo[i].vectype)))
 		    {
 		      poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (atype));
-		      k = (simd_clone_subparts (arginfo[i].vectype)
-			   / simd_clone_subparts (atype));
+		      k = simd_clone_subparts (arginfo[i].vectype)
+			  / simd_clone_subparts (atype);
 		      gcc_assert ((k & (k - 1)) == 0);
 		      if (m == 0)
 			{
@@ -4116,8 +4117,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		    }
 		  else
 		    {
-		      k = (simd_clone_subparts (atype)
-			   / simd_clone_subparts (arginfo[i].vectype));
+		      k = simd_clone_subparts (atype)
+			  / simd_clone_subparts (arginfo[i].vectype);
 		      gcc_assert ((k & (k - 1)) == 0);
 		      vec<constructor_elt, va_gc> *ctor_elts;
 		      if (k != 1)
@@ -4203,7 +4204,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		      ? POINTER_PLUS_EXPR : PLUS_EXPR;
 		  tree type = POINTER_TYPE_P (TREE_TYPE (op))
 			      ? sizetype : TREE_TYPE (op);
-		  widest_int cst
+		  poly_widest_int cst
 		    = wi::mul (bestn->simdclone->args[i].linear_step,
 			       ncopies * nunits);
 		  tree tcst = wide_int_to_tree (type, cst);
@@ -4224,7 +4225,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		      ? POINTER_PLUS_EXPR : PLUS_EXPR;
 		  tree type = POINTER_TYPE_P (TREE_TYPE (op))
 			      ? sizetype : TREE_TYPE (op);
-		  widest_int cst
+		  poly_widest_int cst
 		    = wi::mul (bestn->simdclone->args[i].linear_step,
 			       j * nunits);
 		  tree tcst = wide_int_to_tree (type, cst);
@@ -4250,7 +4251,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       gcall *new_call = gimple_build_call_vec (fndecl, vargs);
       if (vec_dest)
 	{
-	  gcc_assert (ratype || simd_clone_subparts (rtype) == nunits);
+	  gcc_assert (ratype || known_eq (simd_clone_subparts (rtype),
+					  nunits));
 	  if (ratype)
 	    new_temp = create_tmp_var (ratype);
 	  else if (useless_type_conversion_p (vectype, rtype))
@@ -4264,12 +4266,13 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 
       if (vec_dest)
 	{
-	  if (simd_clone_subparts (vectype) < nunits)
+	  if (known_lt (simd_clone_subparts (vectype), nunits))
 	    {
 	      unsigned int k, l;
 	      poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
 	      poly_uint64 bytes = GET_MODE_SIZE (TYPE_MODE (vectype));
-	      k = nunits / simd_clone_subparts (vectype);
+	      k = vector_unroll_factor (nunits,
+					simd_clone_subparts (vectype));
 	      gcc_assert ((k & (k - 1)) == 0);
 	      for (l = 0; l < k; l++)
 		{
@@ -4295,16 +4298,18 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		vect_clobber_variable (vinfo, stmt_info, gsi, new_temp);
 	      continue;
 	    }
-	  else if (simd_clone_subparts (vectype) > nunits)
+	  else if (known_gt (simd_clone_subparts (vectype), nunits))
 	    {
-	      unsigned int k = (simd_clone_subparts (vectype)
-				/ simd_clone_subparts (rtype));
+	      unsigned int k = simd_clone_subparts (vectype)
+			       / simd_clone_subparts (rtype);
 	      gcc_assert ((k & (k - 1)) == 0);
 	      if ((j & (k - 1)) == 0)
 		vec_alloc (ret_ctor_elts, k);
 	      if (ratype)
 		{
-		  unsigned int m, o = nunits / simd_clone_subparts (rtype);
+		  unsigned int m, o;
+		  o = vector_unroll_factor (nunits,
+					    simd_clone_subparts (rtype));
 		  for (m = 0; m < o; m++)
 		    {
 		      tree tem = build4 (ARRAY_REF, rtype, new_temp,
-- 
2.19.1



* Re: [PATCH 1/5] [PR target/96342] Change field "simdlen" into poly_uint64
  2020-10-30  2:29 [PATCH 1/5] [PR target/96342] Change field "simdlen" into poly_uint64 yangyang (ET)
@ 2020-10-30 13:32 ` Richard Sandiford
  2020-11-02 13:17   ` yangyang (ET)
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2020-10-30 13:32 UTC (permalink / raw)
  To: yangyang (ET); +Cc: gcc-patches

"yangyang (ET)" <yangyang305@huawei.com> writes:
>         Although Richard mentioned in the PR that poly_uint64 will naturally decay to a uint64_t in i386 target files, it seems that operation /= is not supported yet, so I change "clonei->simdlen /= GET_MODE_BITSIZE (TYPE_MODE (base_type));" into "clonei->simdlen = clonei->simdlen / GET_MODE_BITSIZE (TYPE_MODE (base_type));".

Ah, don't remember encountering that one.  But yeah, expanding the
/= seems like the best approach for now.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index a8cc545c370..c630c0c7f81 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -23044,18 +23044,23 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>  					tree base_type, int num)
>  {
>    tree t, ret_type, arg_type;
> -  unsigned int elt_bits, vec_bits, count;
> +  unsigned int elt_bits, count;
> +  unsigned HOST_WIDE_INT const_simdlen;
> +  poly_uint64 vec_bits;
>  
>    if (!TARGET_SIMD)
>      return 0;
>  
> -  if (clonei->simdlen
> -      && (clonei->simdlen < 2
> -	  || clonei->simdlen > 1024
> -	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
> +  /* For now, SVE simdclones won't produce illegal simdlen, So only check
> +     const simdlens here.  */
> +  if (maybe_ne (clonei->simdlen, 0U)
> +      && (clonei->simdlen.is_constant (&const_simdlen))

Very minor, but GCC style is (mostly!) not to wrap a condition like this
in parentheses if it fits on a single line, so just:

      && clonei->simdlen.is_constant (&const_simdlen)

>    else
>      {
>        count = 1;
>        vec_bits = clonei->simdlen * elt_bits;
> -      if (vec_bits != 64 && vec_bits != 128)
> +      /* For now, SVE simdclones won't produce illegal simdlen, So only check
> +	 const simdlens here.  */
> +      if (clonei->simdlen.is_constant (&const_simdlen)
> +	  && known_ne (vec_bits, 64U) && known_ne (vec_bits, 128U))

Although it won't make a difference in context due to the is_constant
check, in principle this should be “maybe_ne” rather than “known_ne”.
E.g. when testing SVE conditions, known_ne (2 + 2 * (VQ - 1), 2)
is false but maybe_ne (2 + 2 * (VQ - 1), 2) is true.

Alternatively:

   !(known_eq (vec_bits, 64U) || known_eq (vec_bits, 128U))

if that seems more natural (either's fine).

> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 54c2cdaf060..0ef037e5e55 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -22140,7 +22140,7 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>  	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
>      {
>        warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> -		  "unsupported simdlen %d", clonei->simdlen);
> +		  "unsupported simdlen %ld", clonei->simdlen.to_constant ());

I think this should be %wd instead.

> @@ -22267,7 +22268,8 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>        if (cnt > (TARGET_64BIT ? 16 : 8))
>  	{
>  	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> -		      "unsupported simdlen %d", clonei->simdlen);
> +		      "unsupported simdlen %ld",
> +		      clonei->simdlen.to_constant ());

Same here.

> @@ -502,17 +504,18 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
>      veclen = node->simdclone->vecsize_int;
>    else
>      veclen = node->simdclone->vecsize_float;
> -  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t));
> -  if (veclen > node->simdclone->simdlen)
> +  veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
> +  if (known_gt (veclen, node->simdclone->simdlen))
>      veclen = node->simdclone->simdlen;

Although again it probably doesn't make a difference in practice,
the known/maybe situation is similar here.  When comparing:

- an SVE vector of 2 + 2 * (VQ - 1) doubles and
- an Advanced SIMD vector of 2 doubles

the Advanced SIMD version is conceptually ordered <= the SVE one,
in the sense that the SVE vector always contains a whole number of
Advanced SIMD vectors whereas the Advanced SIMD vector might not
contain a whole number of SVE vectors.

In other words, the number of lanes in the Advanced SIMD vector
is known_le the number of lanes in the SVE vector, and the number
of lanes in the SVE vector is known_ge and maybe_gt (but not known_gt)
the number of lanes in the Advanced SIMD vector.  So for these kinds of
comparison, known_gt can look a bit unexpected, even if (as here) it's
probably fine in practice.

There's currently a hard-coded assumption in this code and in the
vectoriser that both constant-length software vectors and constant-length
hardware vectors are a power of 2 in size.  This means that the > above
is effectively testing whether veclen contains a whole number of
node->simdclone->simdlens, not just whether the veclen is bigger.

So when adding the initial autovec support for SVE, we generally rewrote
these kinds of test to use multiple_p instead of comparisons.  I think
that makes the intent more obvious (but others might disagree :-)).

That was a long-winded way of saying that I suggest we use:

  if (multiple_p (veclen, node->simdclone->simdlen))
    veclen = node->simdclone->simdlen;

instead.

>    if (POINTER_TYPE_P (t))
>      t = pointer_sized_int_node;
> -  if (veclen == node->simdclone->simdlen)
> +  if (known_eq (veclen, node->simdclone->simdlen))
>      t = build_vector_type (t, node->simdclone->simdlen);
>    else
>      {
>        t = build_vector_type (t, veclen);
> -      t = build_array_type_nelts (t, node->simdclone->simdlen / veclen);
> +      t = build_array_type_nelts (t, exact_div (node->simdclone->simdlen,
> +				  veclen));

Another minor formatting thing, but the “veclen” should be indented under
“node->simdclone->simdlen”.

>      }
>    TREE_TYPE (TREE_TYPE (fndecl)) = t;
>    if (!node->definition)
> @@ -663,8 +669,9 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
>  	veclen = sc->vecsize_int;
>        else
>  	veclen = sc->vecsize_float;
> -      veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
> -      if (veclen > sc->simdlen)
> +      veclen = exact_div (veclen,
> +			  GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
> +      if (known_gt (veclen, sc->simdlen))
>  	veclen = sc->simdlen;

Similarly here.

>        if (sc->mask_mode != VOIDmode)
>  	adj.type
> @@ -675,7 +682,8 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
>  	adj.type = build_vector_type (base_type, veclen);
>        vec_safe_push (new_params, adj);
>  
> -      for (j = veclen; j < sc->simdlen; j += veclen)
> +      k = vector_unroll_factor (sc->simdlen, veclen);
> +      for (j = 1; j < k; j++)
>  	vec_safe_push (new_params, adj);
>  
>        /* We have previously allocated one extra entry for the mask.  Use
> @@ -690,9 +698,9 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
>  	  if (sc->mask_mode == VOIDmode)
>  	    sc->args[i].simd_array
>  	      = create_tmp_simd_array ("mask", base_type, sc->simdlen);
> -	  else if (veclen < sc->simdlen)
> +	  else if (known_lt (veclen, sc->simdlen))

I think k > 1 might be clearer, in the sense that it avoids a choice
between “maybe” and “known”.

>  	    sc->args[i].simd_array
> -	      = create_tmp_simd_array ("mask", adj.type, sc->simdlen / veclen);
> +	      = create_tmp_simd_array ("mask", adj.type, k);
>  	  else
>  	    sc->args[i].simd_array = NULL_TREE;
>  	}
> […]
> @@ -981,8 +992,11 @@ ipa_simd_modify_function_body (struct cgraph_node *node,
>  		  iter, NULL_TREE, NULL_TREE);
>        adjustments->register_replacement (&(*adjustments->m_adj_params)[j], r);
>  
> -      if (simd_clone_subparts (vectype) < node->simdclone->simdlen)
> -	j += node->simdclone->simdlen / simd_clone_subparts (vectype) - 1;
> +      if (known_lt (simd_clone_subparts (vectype), node->simdclone->simdlen))
> +	{
> +	  j += vector_unroll_factor (node->simdclone->simdlen,
> +				     simd_clone_subparts (vectype)) - 1;
> +	}

And similarly here:

      if (multiple_p (node->simdclone->simdlen, simd_clone_subparts (vectype)))

> diff --git a/gcc/poly-int-types.h b/gcc/poly-int-types.h
> index 5e04e63ebf2..78083098baa 100644
> --- a/gcc/poly-int-types.h
> +++ b/gcc/poly-int-types.h
> @@ -81,6 +81,14 @@ typedef poly_int<NUM_POLY_INT_COEFFS, widest_int> poly_widest_int;
>  #define vector_element_size(SIZE, NELTS) \
>    (exact_div (SIZE, NELTS).to_constant ())
>  
> +/* Return the number of unroll times when a vector has NELTS1 elements

s/vector has/vector that has/

> +   is unrolled to vectors has NELTS2 elements.

s/vectors has/vectors that have/.

> +
> +   to_constant () is safe in this situation because the multiples of the
> +   NELTS of two vectors are always constant-size scalars.  */
> +#define vector_unroll_factor(NELTS1, NELTS2) \
> +  (exact_div (NELTS1, NELTS2).to_constant ())
> +
>  /* Wrapper for poly_int arguments to target macros, so that if a target
>     doesn't need polynomial-sized modes, its header file can continue to
>     treat the argument as a normal constant.  This should go away once
> @@ -3902,12 +3902,12 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  	 n = n->simdclone->next_clone)
>        {
>  	unsigned int this_badness = 0;
> -	if (n->simdclone->simdlen > vf
> +	if (known_gt (n->simdclone->simdlen, vf)

I think !constant_multiple_p rather than known_gt here.  Until we add
support for vectorising simd clones with partial vectors, a whole number
of simdlens must fit within VF.  So maybe:

     unsigned int num_calls;
     if (!constant_multiple_p (n->simdclone->simdlen, vf, &num_calls)

and then…

>  	    || n->simdclone->nargs != nargs)
>  	  continue;
> -	if (n->simdclone->simdlen < vf)
> -	  this_badness += (exact_log2 (vf)
> -			   - exact_log2 (n->simdclone->simdlen)) * 1024;
> +	if (known_lt (n->simdclone->simdlen, vf))
> +	  this_badness += exact_log2
> +	    (vector_unroll_factor (vf, n->simdclone->simdlen)) * 1024;

…use num_calls instead of vector_unroll_factor here.


>  	if (n->simdclone->inbranch)
>  	  this_badness += 2048;
>  	int target_badness = targetm.simd_clone.usable (n);
> @@ -3988,19 +3988,19 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  	arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
>  							  slp_node);
>  	if (arginfo[i].vectype == NULL
> -	    || (simd_clone_subparts (arginfo[i].vectype)
> -		> bestn->simdclone->simdlen))
> +	    || (known_gt (simd_clone_subparts (arginfo[i].vectype),
> +			  bestn->simdclone->simdlen)))

Here too I think we want constant_multiple_p:

	    || !constant_multiple_p (bestn->simdclone->simdlen,
				     simd_clone_subparts (arginfo[i].vectype))

>  	  return false;
>        }
>  
>    fndecl = bestn->decl;
>    nunits = bestn->simdclone->simdlen;
> -  ncopies = vf / nunits;
> +  ncopies = vector_unroll_factor (vf, nunits);
>  
>    /* If the function isn't const, only allow it in simd loops where user
>       has asserted that at least nunits consecutive iterations can be
>       performed using SIMD instructions.  */
> -  if ((loop == NULL || (unsigned) loop->safelen < nunits)
> +  if ((loop == NULL || known_lt ((unsigned) loop->safelen, nunits))

This should be maybe_lt, because we can only safely optimise if we know
that the simdlen >= safelen.

>        && gimple_vuse (stmt))
>      return false;
>  
> @@ -4078,15 +4078,16 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  	    {
>  	    case SIMD_CLONE_ARG_TYPE_VECTOR:
>  	      atype = bestn->simdclone->args[i].vector_type;
> -	      o = nunits / simd_clone_subparts (atype);
> +	      o = vector_unroll_factor (nunits,
> +					simd_clone_subparts (atype));
>  	      for (m = j * o; m < (j + 1) * o; m++)
>  		{
> -		  if (simd_clone_subparts (atype)
> -		      < simd_clone_subparts (arginfo[i].vectype))
> +		  if (known_lt (simd_clone_subparts (atype),
> +				simd_clone_subparts (arginfo[i].vectype)))

I think we should leave this and…

>  		    {
>  		      poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (atype));
> -		      k = (simd_clone_subparts (arginfo[i].vectype)
> -			   / simd_clone_subparts (atype));
> +		      k = simd_clone_subparts (arginfo[i].vectype)
> +			  / simd_clone_subparts (atype);

…this until the simd_clone_subparts calls are removed.  FWIW, the original
formatting of the division is the preferred one (with the justification
that it helps emacs to indent the second line correctly :-)).

>  		      gcc_assert ((k & (k - 1)) == 0);
>  		      if (m == 0)
>  			{
> @@ -4116,8 +4117,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  		    }
>  		  else
>  		    {
> -		      k = (simd_clone_subparts (atype)
> -			   / simd_clone_subparts (arginfo[i].vectype));
> +		      k = simd_clone_subparts (atype)
> +			  / simd_clone_subparts (arginfo[i].vectype);

Similarly here, the original formatting was correct.

>  		      gcc_assert ((k & (k - 1)) == 0);
>  		      vec<constructor_elt, va_gc> *ctor_elts;
>  		      if (k != 1)
> @@ -4250,7 +4251,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>        gcall *new_call = gimple_build_call_vec (fndecl, vargs);
>        if (vec_dest)
>  	{
> -	  gcc_assert (ratype || simd_clone_subparts (rtype) == nunits);
> +	  gcc_assert (ratype || known_eq (simd_clone_subparts (rtype),
> +					  nunits));

Another formatting nit, sorry, but now that it needs to span multiple lines,
the preferred formatting is:

	  gcc_assert (ratype
		      || known_eq (simd_clone_subparts (rtype), nunits));

>  	  if (ratype)
>  	    new_temp = create_tmp_var (ratype);
>  	  else if (useless_type_conversion_p (vectype, rtype))
> @@ -4264,12 +4266,13 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  
>        if (vec_dest)
>  	{
> -	  if (simd_clone_subparts (vectype) < nunits)
> +	  if (known_lt (simd_clone_subparts (vectype), nunits))

multiple_p (nunits, …) here too.  It would also be possible to do:

	  if (constant_multiple_p (nunits, simd_clone_subparts (vectype), &k)

and avoid the separate vector_unroll_factor.

>  	    {
>  	      unsigned int k, l;
>  	      poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
>  	      poly_uint64 bytes = GET_MODE_SIZE (TYPE_MODE (vectype));
> -	      k = nunits / simd_clone_subparts (vectype);
> +	      k = vector_unroll_factor (nunits,
> +					simd_clone_subparts (vectype));
>  	      gcc_assert ((k & (k - 1)) == 0);
>  	      for (l = 0; l < k; l++)
>  		{
> @@ -4295,16 +4298,18 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  		vect_clobber_variable (vinfo, stmt_info, gsi, new_temp);
>  	      continue;
>  	    }
> -	  else if (simd_clone_subparts (vectype) > nunits)
> +	  else if (known_gt (simd_clone_subparts (vectype), nunits))

multiple_p here too.

>  	    {
> -	      unsigned int k = (simd_clone_subparts (vectype)
> -				/ simd_clone_subparts (rtype));
> +	      unsigned int k = simd_clone_subparts (vectype)
> +			       / simd_clone_subparts (rtype);

Same comment about formatting.

The patch looks really good though, thanks.  The above comments are only
really repetitions of a small number of minor things.

Richard

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 1/5] [PR target/96342] Change field "simdlen" into poly_uint64
  2020-10-30 13:32 ` Richard Sandiford
@ 2020-11-02 13:17   ` yangyang (ET)
  2020-11-02 14:15     ` Richard Sandiford
  0 siblings, 1 reply; 9+ messages in thread
From: yangyang (ET) @ 2020-11-02 13:17 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 8612 bytes --]

Hi,

I have revised the patch based on your suggestions, and the following are some points that I think need to be mentioned in this mail.


> > @@ -502,17 +504,18 @@ simd_clone_adjust_return_type (struct
> cgraph_node *node)
> >      veclen = node->simdclone->vecsize_int;
> >    else
> >      veclen = node->simdclone->vecsize_float;
> > -  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t));
> > -  if (veclen > node->simdclone->simdlen)
> > +  veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE
> > + (t)));  if (known_gt (veclen, node->simdclone->simdlen))
> >      veclen = node->simdclone->simdlen;
>
> Although again it probably doesn't make a difference in practice, the
> known/maybe situation is similar here.  When comparing:
>
> - an SVE vector of 2 + 2 * (VQ - 1) doubles and
> - an Advanced SIMD vector of 2 doubles
>
> the Advanced SIMD version is conceptually ordered <= the SVE one, in the
> sense that the SVE vector always contains a whole number of Advanced SIMD
> vectors whereas the Advanced SIMD vector might not contain a whole number
> of SVE vectors.
>
> In other words, the number of lanes in the Advanced SIMD vector is known_le
> the number of lanes in the SVE vector, and the number of lanes in the SVE
> vector is known_ge and maybe_gt (but not known_gt) the number of lanes in
> the Advanced SIMD vector.  So for these kinds of comparison, known_gt can
> look a bit unexpected, even if (as here) it's probably fine in practice.
>
> There's currently a hard-coded assumption in this code and in the vectoriser
> that both constant-length software vectors and constant-length hardware
> vectors are a power of 2 in size.  This means that the > above is effectively
> testing whether veclen contains a whole number of
> node->simdclone->simdlens, not just whether the veclen is bigger.
>
> So when adding the initial autovec support for SVE, we generally rewrote these
> kinds of test to use multiple_p instead of comparisons.  I think that makes the
> intent more obvious (but others might disagree :-)).
>
> That was a long-winded way of saying that I suggest we use:
>
>   if (multiple_p (veclen, node->simdclone->simdlen))
>     veclen = node->simdclone->simdlen;
>
> instead.

As far as I understand, multiple_p (a, b) is equivalent to known_ge (a, b) due to the hard-coded assumption, and !multiple_p (b, a) is equivalent to known_gt (a, b). So it seems better to use !multiple_p (node->simdclone->simdlen, veclen) here, although using multiple_p (veclen, node->simdclone->simdlen) is also OK, since doing the assignment in the equal case doesn't change anything. Anyway, I used multiple_p (veclen, node->simdclone->simdlen) in the revised patch.

> >      }
> >    TREE_TYPE (TREE_TYPE (fndecl)) = t;
> >    if (!node->definition)
> > @@ -663,8 +669,9 @@ simd_clone_adjust_argument_types (struct
> cgraph_node *node)
> >     veclen = sc->vecsize_int;
> >        else
> >     veclen = sc->vecsize_float;
> > -      veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
> > -      if (veclen > sc->simdlen)
> > +      veclen = exact_div (veclen,
> > +                     GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
> > +      if (known_gt (veclen, sc->simdlen))
> >     veclen = sc->simdlen;
>
> Similarly here.

Similarly, the revised patch uses multiple_p (veclen, sc->simdlen) here as well.

> > @@ -981,8 +992,11 @@ ipa_simd_modify_function_body (struct
> cgraph_node *node,
> >               iter, NULL_TREE, NULL_TREE);
> >        adjustments->register_replacement
> > (&(*adjustments->m_adj_params)[j], r);
> >
> > -      if (simd_clone_subparts (vectype) < node->simdclone->simdlen)
> > -   j += node->simdclone->simdlen / simd_clone_subparts (vectype) - 1;
> > +      if (known_lt (simd_clone_subparts (vectype),
> node->simdclone->simdlen))
> > +   {
> > +     j += vector_unroll_factor (node->simdclone->simdlen,
> > +                                simd_clone_subparts (vectype)) - 1;
> > +   }
>
> And similarly here:
>
>       if (multiple_p (node->simdclone->simdlen, simd_clone_subparts
> (vectype)))
>

Similarly, the revised patch uses multiple_p (node->simdclone->simdlen, simd_clone_subparts (vectype)) here as well.

> >  /* Wrapper for poly_int arguments to target macros, so that if a target
> >     doesn't need polynomial-sized modes, its header file can continue to
> >     treat the argument as a normal constant.  This should go away once
> > @@ -3902,12 +3902,12 @@ vectorizable_simd_clone_call (vec_info *vinfo,
> stmt_vec_info stmt_info,
> >      n = n->simdclone->next_clone)
> >        {
> >     unsigned int this_badness = 0;
> > -   if (n->simdclone->simdlen > vf
> > +   if (known_gt (n->simdclone->simdlen, vf)
>
> I think !constant_multiple_p rather than known_gt here.  Until we add
> support for vectorising simd clones with partial vectors, a whole number of
> simdlens must fit within VF.  So maybe:
>
>      unsigned int num_calls;
>      if (!constant_multiple_p (n->simdclone->simdlen, vf, &num_calls)
>
> and then…
>
> >         || n->simdclone->nargs != nargs)
> >       continue;
> > -   if (n->simdclone->simdlen < vf)
> > -     this_badness += (exact_log2 (vf)
> > -                      - exact_log2 (n->simdclone->simdlen)) * 1024;
> > +   if (known_lt (n->simdclone->simdlen, vf))
> > +     this_badness += exact_log2
> > +       (vector_unroll_factor (vf, n->simdclone->simdlen)) * 1024;
>
> …use num_calls instead of vector_unroll_factor here.

The revised patch writes this code as:

unsigned int num_calls;
if (!constant_multiple_p (vf, n->simdclone->simdlen, &num_calls)
  || n->simdclone->nargs != nargs)
 continue;
if (!multiple_p (n->simdclone->simdlen, vf))
 this_badness += exact_log2 (num_calls) * 1024;

>
> >     if (n->simdclone->inbranch)
> >       this_badness += 2048;
> >     int target_badness = targetm.simd_clone.usable (n); @@ -3988,19
> > +3988,19 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info
> stmt_info,
> >     arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
> >                                                       slp_node);
> >     if (arginfo[i].vectype == NULL
> > -       || (simd_clone_subparts (arginfo[i].vectype)
> > -           > bestn->simdclone->simdlen))
> > +       || (known_gt (simd_clone_subparts (arginfo[i].vectype),
> > +                     bestn->simdclone->simdlen)))
>
> Here too I think we want constant_multiple_p:
>
>           || !constant_multiple_p (bestn->simdclone->simdlen,
>                                    simd_clone_subparts (arginfo[i].vectype))
>

I used multiple_p here, since the multiple itself is not needed.

> >       if (ratype)
> >         new_temp = create_tmp_var (ratype);
> >       else if (useless_type_conversion_p (vectype, rtype)) @@ -4264,12
> > +4266,13 @@ vectorizable_simd_clone_call (vec_info *vinfo,
> > stmt_vec_info stmt_info,
> >
> >        if (vec_dest)
> >     {
> > -     if (simd_clone_subparts (vectype) < nunits)
> > +     if (known_lt (simd_clone_subparts (vectype), nunits))
>
> multiple_p (nunits, …) here too.  It would also be possible to do:
>
>         if (constant_multiple_p (nunits, simd_clone_subparts (vectype), &k)
>
> and avoid the separate vector_unroll_factor.

I used !multiple_p (simd_clone_subparts (vectype), nunits) here, since the following code should not be executed in the equal case.

>
> >         {
> >           unsigned int k, l;
> >           poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
> >           poly_uint64 bytes = GET_MODE_SIZE (TYPE_MODE (vectype));
> > -         k = nunits / simd_clone_subparts (vectype);
> > +         k = vector_unroll_factor (nunits,
> > +                                   simd_clone_subparts (vectype));
> >           gcc_assert ((k & (k - 1)) == 0);
> >           for (l = 0; l < k; l++)
> >             {
> > @@ -4295,16 +4298,18 @@ vectorizable_simd_clone_call (vec_info *vinfo,
> stmt_vec_info stmt_info,
> >             vect_clobber_variable (vinfo, stmt_info, gsi, new_temp);
> >           continue;
> >         }
> > -     else if (simd_clone_subparts (vectype) > nunits)
> > +     else if (known_gt (simd_clone_subparts (vectype), nunits))
>
> multiple_p here too.
>

Similarly, I used !multiple_p (nunits, simd_clone_subparts (vectype)) here.

Bootstrapped and tested on both aarch64 and x86 Linux platforms; no new regressions observed.

Any suggestions?

Thanks,
Yang Yang

[-- Attachment #2: PR96342-part1-v2.patch --]
[-- Type: application/octet-stream, Size: 22069 bytes --]

From dc8ec002322499823324c45973db5f16cf21f3c4 Mon Sep 17 00:00:00 2001
From: Yang Yang <yangyang305@huawei.com>
Date: Mon, 2 Nov 2020 19:06:45 +0800
Subject: [PATCH] PR target/96342 Change field "simdlen" into poly_uint64

This is the first patch of PR96342. In order to add support for
"omp declare simd", change the type of the field "simdlen" of
struct cgraph_simd_clone from unsigned int to poly_uint64 and
adapt the related code, since the length might be variable for
the SVE cases.

2020-11-02  Yang Yang  <yangyang305@huawei.com>

gcc/ChangeLog:

	* cgraph.h (struct cgraph_simd_clone): Change field "simdlen" of
	struct cgraph_simd_clone from unsigned int to poly_uint64.
	* config/aarch64/aarch64.c
	(aarch64_simd_clone_compute_vecsize_and_simdlen): Adapt
	operations on "simdlen".
	* config/i386/i386.c (ix86_simd_clone_compute_vecsize_and_simdlen):
	Update printf formats.
	* gengtype.c (main): Handle poly_uint64.
	* omp-simd-clone.c (simd_clone_mangle): Likewise.
	(simd_clone_adjust_return_type): Likewise.
	(create_tmp_simd_array): Likewise.
	(simd_clone_adjust_argument_types): Likewise.
	(simd_clone_init_simd_arrays): Likewise.
	(ipa_simd_modify_function_body): Likewise.
	(simd_clone_adjust): Likewise.
	(expand_simd_clones): Likewise.
	* poly-int-types.h (vector_unroll_factor): New macro.
	* tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise.
---
 gcc/cgraph.h                 |  6 +--
 gcc/config/aarch64/aarch64.c | 30 +++++++++------
 gcc/config/i386/i386.c       |  8 ++--
 gcc/gengtype.c               |  1 +
 gcc/omp-simd-clone.c         | 72 ++++++++++++++++++++++--------------
 gcc/poly-int-types.h         |  8 ++++
 gcc/tree-vect-stmts.c        | 43 +++++++++++----------
 7 files changed, 104 insertions(+), 64 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 96d6cf609fe..9dc886cc58a 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -826,17 +826,17 @@ struct GTY(()) cgraph_simd_clone_arg {
 
 struct GTY(()) cgraph_simd_clone {
   /* Number of words in the SIMD lane associated with this clone.  */
-  unsigned int simdlen;
+  poly_uint64 simdlen;
 
   /* Number of annotated function arguments in `args'.  This is
      usually the number of named arguments in FNDECL.  */
   unsigned int nargs;
 
   /* Max hardware vector size in bits for integral vectors.  */
-  unsigned int vecsize_int;
+  poly_uint64 vecsize_int;
 
   /* Max hardware vector size in bits for floating point vectors.  */
-  unsigned int vecsize_float;
+  poly_uint64 vecsize_float;
 
   /* Machine mode of the mask argument(s), if they are to be passed
      as bitmasks in integer argument(s).  VOIDmode if masks are passed
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a8cc545c370..aa1c798cec1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -23044,18 +23044,23 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 					tree base_type, int num)
 {
   tree t, ret_type, arg_type;
-  unsigned int elt_bits, vec_bits, count;
+  unsigned int elt_bits, count;
+  unsigned HOST_WIDE_INT const_simdlen;
+  poly_uint64 vec_bits;
 
   if (!TARGET_SIMD)
     return 0;
 
-  if (clonei->simdlen
-      && (clonei->simdlen < 2
-	  || clonei->simdlen > 1024
-	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
+  /* For now, SVE simdclones won't produce illegal simdlens, so only check
+     const simdlens here.  */
+  if (maybe_ne (clonei->simdlen, 0U)
+      && clonei->simdlen.is_constant (&const_simdlen)
+      && (const_simdlen < 2
+	  || const_simdlen > 1024
+	  || (const_simdlen & (const_simdlen - 1)) != 0))
     {
       warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		  "unsupported simdlen %d", clonei->simdlen);
+		  "unsupported simdlen %wd", const_simdlen);
       return 0;
     }
 
@@ -23099,21 +23104,24 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   clonei->vecsize_mangle = 'n';
   clonei->mask_mode = VOIDmode;
   elt_bits = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
-  if (clonei->simdlen == 0)
+  if (known_eq (clonei->simdlen, 0U))
     {
       count = 2;
       vec_bits = (num == 0 ? 64 : 128);
-      clonei->simdlen = vec_bits / elt_bits;
+      clonei->simdlen = exact_div (vec_bits, elt_bits);
     }
   else
     {
       count = 1;
       vec_bits = clonei->simdlen * elt_bits;
-      if (vec_bits != 64 && vec_bits != 128)
+      /* For now, SVE simdclones won't produce illegal simdlens, so only check
+	 const simdlens here.  */
+      if (clonei->simdlen.is_constant (&const_simdlen)
+	  && maybe_ne (vec_bits, 64U) && maybe_ne (vec_bits, 128U))
 	{
 	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		      "GCC does not currently support simdlen %d for type %qT",
-		      clonei->simdlen, base_type);
+		      "GCC does not currently support simdlen %wd for type %qT",
+		      const_simdlen, base_type);
 	  return 0;
 	}
     }
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 54c2cdaf060..9ee27e1240e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22140,7 +22140,7 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
     {
       warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		  "unsupported simdlen %d", clonei->simdlen);
+		  "unsupported simdlen %wd", clonei->simdlen.to_constant ());
       return 0;
     }
 
@@ -22245,7 +22245,8 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	clonei->simdlen = clonei->vecsize_int;
       else
 	clonei->simdlen = clonei->vecsize_float;
-      clonei->simdlen /= GET_MODE_BITSIZE (TYPE_MODE (base_type));
+      clonei->simdlen = clonei->simdlen
+			/ GET_MODE_BITSIZE (TYPE_MODE (base_type));
     }
   else if (clonei->simdlen > 16)
     {
@@ -22267,7 +22268,8 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
       if (cnt > (TARGET_64BIT ? 16 : 8))
 	{
 	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		      "unsupported simdlen %d", clonei->simdlen);
+		      "unsupported simdlen %wd",
+		      clonei->simdlen.to_constant ());
 	  return 0;
 	}
       }
diff --git a/gcc/gengtype.c b/gcc/gengtype.c
index a59a8823f82..919e3a00bf2 100644
--- a/gcc/gengtype.c
+++ b/gcc/gengtype.c
@@ -5198,6 +5198,7 @@ main (int argc, char **argv)
       POS_HERE (do_scalar_typedef ("widest_int", &pos));
       POS_HERE (do_scalar_typedef ("int64_t", &pos));
       POS_HERE (do_scalar_typedef ("poly_int64", &pos));
+      POS_HERE (do_scalar_typedef ("poly_uint64", &pos));
       POS_HERE (do_scalar_typedef ("uint64_t", &pos));
       POS_HERE (do_scalar_typedef ("uint8", &pos));
       POS_HERE (do_scalar_typedef ("uintptr_t", &pos));
diff --git a/gcc/omp-simd-clone.c b/gcc/omp-simd-clone.c
index 942fb971cb7..9b9e9721735 100644
--- a/gcc/omp-simd-clone.c
+++ b/gcc/omp-simd-clone.c
@@ -338,16 +338,18 @@ simd_clone_mangle (struct cgraph_node *node,
 {
   char vecsize_mangle = clone_info->vecsize_mangle;
   char mask = clone_info->inbranch ? 'M' : 'N';
-  unsigned int simdlen = clone_info->simdlen;
+  poly_uint64 simdlen = clone_info->simdlen;
   unsigned int n;
   pretty_printer pp;
 
-  gcc_assert (vecsize_mangle && simdlen);
+  gcc_assert (vecsize_mangle && maybe_ne (simdlen, 0U));
 
   pp_string (&pp, "_ZGV");
   pp_character (&pp, vecsize_mangle);
   pp_character (&pp, mask);
-  pp_decimal_int (&pp, simdlen);
+  /* For now, simdlen is always constant; variable simdlen prints 'n'.  */
+  unsigned int len = simdlen.to_constant ();
+  pp_decimal_int (&pp, len);
 
   for (n = 0; n < clone_info->nargs; ++n)
     {
@@ -491,7 +493,7 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
 {
   tree fndecl = node->decl;
   tree orig_rettype = TREE_TYPE (TREE_TYPE (fndecl));
-  unsigned int veclen;
+  poly_uint64 veclen;
   tree t;
 
   /* Adjust the function return type.  */
@@ -502,17 +504,18 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
     veclen = node->simdclone->vecsize_int;
   else
     veclen = node->simdclone->vecsize_float;
-  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t));
-  if (veclen > node->simdclone->simdlen)
+  veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
+  if (multiple_p (veclen, node->simdclone->simdlen))
     veclen = node->simdclone->simdlen;
   if (POINTER_TYPE_P (t))
     t = pointer_sized_int_node;
-  if (veclen == node->simdclone->simdlen)
+  if (known_eq (veclen, node->simdclone->simdlen))
     t = build_vector_type (t, node->simdclone->simdlen);
   else
     {
       t = build_vector_type (t, veclen);
-      t = build_array_type_nelts (t, node->simdclone->simdlen / veclen);
+      t = build_array_type_nelts (t, exact_div (node->simdclone->simdlen,
+						veclen));
     }
   TREE_TYPE (TREE_TYPE (fndecl)) = t;
   if (!node->definition)
@@ -526,7 +529,7 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
 
   tree atype = build_array_type_nelts (orig_rettype,
 				       node->simdclone->simdlen);
-  if (veclen != node->simdclone->simdlen)
+  if (maybe_ne (veclen, node->simdclone->simdlen))
     return build1 (VIEW_CONVERT_EXPR, atype, t);
 
   /* Set up a SIMD array to use as the return value.  */
@@ -546,7 +549,7 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
    SIMDLEN is the number of elements.  */
 
 static tree
-create_tmp_simd_array (const char *prefix, tree type, int simdlen)
+create_tmp_simd_array (const char *prefix, tree type, poly_uint64 simdlen)
 {
   tree atype = build_array_type_nelts (type, simdlen);
   tree avar = create_tmp_var_raw (atype, prefix);
@@ -578,7 +581,8 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
   struct cgraph_simd_clone *sc = node->simdclone;
   vec<ipa_adjusted_param, va_gc> *new_params = NULL;
   vec_safe_reserve (new_params, sc->nargs);
-  unsigned i, j, veclen;
+  unsigned i, j, k;
+  poly_uint64 veclen;
 
   for (i = 0; i < sc->nargs; ++i)
     {
@@ -614,8 +618,9 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	    veclen = sc->vecsize_int;
 	  else
 	    veclen = sc->vecsize_float;
-	  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type));
-	  if (veclen > sc->simdlen)
+	  veclen = exact_div (veclen,
+			      GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type)));
+	  if (known_gt (veclen, sc->simdlen))
 	    veclen = sc->simdlen;
 	  adj.op = IPA_PARAM_OP_NEW;
 	  adj.param_prefix_index = IPA_PARAM_PREFIX_SIMD;
@@ -624,10 +629,11 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	  else
 	    adj.type = build_vector_type (parm_type, veclen);
 	  sc->args[i].vector_type = adj.type;
-	  for (j = veclen; j < sc->simdlen; j += veclen)
+	  k = vector_unroll_factor (sc->simdlen, veclen);
+	  for (j = 1; j < k; j++)
 	    {
 	      vec_safe_push (new_params, adj);
-	      if (j == veclen)
+	      if (j == 1)
 		{
 		  memset (&adj, 0, sizeof (adj));
 		  adj.op = IPA_PARAM_OP_NEW;
@@ -663,8 +669,9 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	veclen = sc->vecsize_int;
       else
 	veclen = sc->vecsize_float;
-      veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
-      if (veclen > sc->simdlen)
+      veclen = exact_div (veclen,
+			  GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
+      if (multiple_p (veclen, sc->simdlen))
 	veclen = sc->simdlen;
       if (sc->mask_mode != VOIDmode)
 	adj.type
@@ -675,7 +682,8 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	adj.type = build_vector_type (base_type, veclen);
       vec_safe_push (new_params, adj);
 
-      for (j = veclen; j < sc->simdlen; j += veclen)
+      k = vector_unroll_factor (sc->simdlen, veclen);
+      for (j = 1; j < k; j++)
 	vec_safe_push (new_params, adj);
 
       /* We have previously allocated one extra entry for the mask.  Use
@@ -690,9 +698,9 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	  if (sc->mask_mode == VOIDmode)
 	    sc->args[i].simd_array
 	      = create_tmp_simd_array ("mask", base_type, sc->simdlen);
-	  else if (veclen < sc->simdlen)
+	  else if (k > 1)
 	    sc->args[i].simd_array
-	      = create_tmp_simd_array ("mask", adj.type, sc->simdlen / veclen);
+	      = create_tmp_simd_array ("mask", adj.type, k);
 	  else
 	    sc->args[i].simd_array = NULL_TREE;
 	}
@@ -783,7 +791,8 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
 	    }
 	  continue;
 	}
-      if (simd_clone_subparts (TREE_TYPE (arg)) == node->simdclone->simdlen)
+      if (known_eq (simd_clone_subparts (TREE_TYPE (arg)),
+		    node->simdclone->simdlen))
 	{
 	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
 	  tree ptr = build_fold_addr_expr (array);
@@ -795,8 +804,10 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
       else
 	{
 	  unsigned int simdlen = simd_clone_subparts (TREE_TYPE (arg));
+	  unsigned int times = vector_unroll_factor (node->simdclone->simdlen,
+						     simdlen);
 	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
-	  for (k = 0; k < node->simdclone->simdlen; k += simdlen)
+	  for (k = 0; k < times; k++)
 	    {
 	      tree ptr = build_fold_addr_expr (array);
 	      int elemsize;
@@ -808,7 +819,7 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
 	      tree elemtype = TREE_TYPE (TREE_TYPE (arg));
 	      elemsize = GET_MODE_SIZE (SCALAR_TYPE_MODE (elemtype));
 	      tree t = build2 (MEM_REF, TREE_TYPE (arg), ptr,
-			       build_int_cst (ptype, k * elemsize));
+			       build_int_cst (ptype, k * elemsize * simdlen));
 	      t = build2 (MODIFY_EXPR, TREE_TYPE (t), t, arg);
 	      gimplify_and_add (t, &seq);
 	    }
@@ -981,8 +992,11 @@ ipa_simd_modify_function_body (struct cgraph_node *node,
 		  iter, NULL_TREE, NULL_TREE);
       adjustments->register_replacement (&(*adjustments->m_adj_params)[j], r);
 
-      if (simd_clone_subparts (vectype) < node->simdclone->simdlen)
-	j += node->simdclone->simdlen / simd_clone_subparts (vectype) - 1;
+      if (multiple_p (node->simdclone->simdlen, simd_clone_subparts (vectype)))
+	{
+	  j += vector_unroll_factor (node->simdclone->simdlen,
+				     simd_clone_subparts (vectype)) - 1;
+	}
     }
 
   tree name;
@@ -1249,7 +1263,8 @@ simd_clone_adjust (struct cgraph_node *node)
 	 below).  */
       loop = alloc_loop ();
       cfun->has_force_vectorize_loops = true;
-      loop->safelen = node->simdclone->simdlen;
+      /* For now, simdlen is always constant.  */
+      loop->safelen = node->simdclone->simdlen.to_constant ();
       loop->force_vectorize = true;
       loop->header = body_bb;
     }
@@ -1275,7 +1290,8 @@ simd_clone_adjust (struct cgraph_node *node)
 	    {
 	      tree maskt = TREE_TYPE (mask_array);
 	      int c = tree_to_uhwi (TYPE_MAX_VALUE (TYPE_DOMAIN (maskt)));
-	      c = node->simdclone->simdlen / (c + 1);
+	      /* For now, c must be constant here.  */
+	      c = exact_div (node->simdclone->simdlen, c + 1).to_constant ();
 	      int s = exact_log2 (c);
 	      gcc_assert (s > 0);
 	      c--;
@@ -1683,7 +1699,7 @@ expand_simd_clones (struct cgraph_node *node)
       if (clone_info == NULL)
 	continue;
 
-      int orig_simdlen = clone_info->simdlen;
+      poly_uint64 orig_simdlen = clone_info->simdlen;
       tree base_type = simd_clone_compute_base_data_type (node, clone_info);
       /* The target can return 0 (no simd clones should be created),
 	 1 (just one ISA of simd clones should be created) or higher
diff --git a/gcc/poly-int-types.h b/gcc/poly-int-types.h
index 5e04e63ebf2..302e5bfd97d 100644
--- a/gcc/poly-int-types.h
+++ b/gcc/poly-int-types.h
@@ -81,6 +81,14 @@ typedef poly_int<NUM_POLY_INT_COEFFS, widest_int> poly_widest_int;
 #define vector_element_size(SIZE, NELTS) \
   (exact_div (SIZE, NELTS).to_constant ())
 
+/* Return the number of unrolled copies needed when a vector of NELTS1
+   elements is processed as vectors of NELTS2 elements.
+
+   to_constant () is safe here because the ratio between the element
+   counts of two vectors is always a compile-time constant.  */
+#define vector_unroll_factor(NELTS1, NELTS2) \
+  (exact_div (NELTS1, NELTS2).to_constant ())
+
 /* Wrapper for poly_int arguments to target macros, so that if a target
    doesn't need polynomial-sized modes, its header file can continue to
    treat the argument as a normal constant.  This should go away once
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 3575f25241f..daa84ab1e55 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -3731,7 +3731,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
   tree op, type;
   tree vec_oprnd0 = NULL_TREE;
   tree vectype;
-  unsigned int nunits;
+  poly_uint64 nunits;
   loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
   bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
   class loop *loop = loop_vinfo ? LOOP_VINFO_LOOP (loop_vinfo) : NULL;
@@ -3883,8 +3883,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       arginfo.quick_push (thisarginfo);
     }
 
-  unsigned HOST_WIDE_INT vf;
-  if (!LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&vf))
+  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  if (!vf.is_constant ())
     {
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -3902,12 +3902,12 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	 n = n->simdclone->next_clone)
       {
 	unsigned int this_badness = 0;
-	if (n->simdclone->simdlen > vf
+	unsigned int num_calls;
+	if (!constant_multiple_p (vf, n->simdclone->simdlen, &num_calls)
 	    || n->simdclone->nargs != nargs)
 	  continue;
-	if (n->simdclone->simdlen < vf)
-	  this_badness += (exact_log2 (vf)
-			   - exact_log2 (n->simdclone->simdlen)) * 1024;
+	if (!multiple_p (n->simdclone->simdlen, vf))
+	  this_badness += exact_log2 (num_calls) * 1024;
 	if (n->simdclone->inbranch)
 	  this_badness += 2048;
 	int target_badness = targetm.simd_clone.usable (n);
@@ -3988,19 +3988,19 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
 							  slp_node);
 	if (arginfo[i].vectype == NULL
-	    || (simd_clone_subparts (arginfo[i].vectype)
-		> bestn->simdclone->simdlen))
+	    || !multiple_p (bestn->simdclone->simdlen,
+			    simd_clone_subparts (arginfo[i].vectype)))
 	  return false;
       }
 
   fndecl = bestn->decl;
   nunits = bestn->simdclone->simdlen;
-  ncopies = vf / nunits;
+  ncopies = vector_unroll_factor (vf, nunits);
 
   /* If the function isn't const, only allow it in simd loops where user
      has asserted that at least nunits consecutive iterations can be
      performed using SIMD instructions.  */
-  if ((loop == NULL || (unsigned) loop->safelen < nunits)
+  if ((loop == NULL || maybe_lt ((unsigned) loop->safelen, nunits))
       && gimple_vuse (stmt))
     return false;
 
@@ -4078,7 +4078,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	    {
 	    case SIMD_CLONE_ARG_TYPE_VECTOR:
 	      atype = bestn->simdclone->args[i].vector_type;
-	      o = nunits / simd_clone_subparts (atype);
+	      o = vector_unroll_factor (nunits,
+					simd_clone_subparts (atype));
 	      for (m = j * o; m < (j + 1) * o; m++)
 		{
 		  if (simd_clone_subparts (atype)
@@ -4203,7 +4204,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		      ? POINTER_PLUS_EXPR : PLUS_EXPR;
 		  tree type = POINTER_TYPE_P (TREE_TYPE (op))
 			      ? sizetype : TREE_TYPE (op);
-		  widest_int cst
+		  poly_widest_int cst
 		    = wi::mul (bestn->simdclone->args[i].linear_step,
 			       ncopies * nunits);
 		  tree tcst = wide_int_to_tree (type, cst);
@@ -4224,7 +4225,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		      ? POINTER_PLUS_EXPR : PLUS_EXPR;
 		  tree type = POINTER_TYPE_P (TREE_TYPE (op))
 			      ? sizetype : TREE_TYPE (op);
-		  widest_int cst
+		  poly_widest_int cst
 		    = wi::mul (bestn->simdclone->args[i].linear_step,
 			       j * nunits);
 		  tree tcst = wide_int_to_tree (type, cst);
@@ -4250,7 +4251,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       gcall *new_call = gimple_build_call_vec (fndecl, vargs);
       if (vec_dest)
 	{
-	  gcc_assert (ratype || simd_clone_subparts (rtype) == nunits);
+	  gcc_assert (ratype
+		      || known_eq (simd_clone_subparts (rtype), nunits));
 	  if (ratype)
 	    new_temp = create_tmp_var (ratype);
 	  else if (useless_type_conversion_p (vectype, rtype))
@@ -4264,12 +4266,13 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 
       if (vec_dest)
 	{
-	  if (simd_clone_subparts (vectype) < nunits)
+	  if (!multiple_p (simd_clone_subparts (vectype), nunits))
 	    {
 	      unsigned int k, l;
 	      poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
 	      poly_uint64 bytes = GET_MODE_SIZE (TYPE_MODE (vectype));
-	      k = nunits / simd_clone_subparts (vectype);
+	      k = vector_unroll_factor (nunits,
+					simd_clone_subparts (vectype));
 	      gcc_assert ((k & (k - 1)) == 0);
 	      for (l = 0; l < k; l++)
 		{
@@ -4295,7 +4298,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		vect_clobber_variable (vinfo, stmt_info, gsi, new_temp);
 	      continue;
 	    }
-	  else if (simd_clone_subparts (vectype) > nunits)
+	  else if (!multiple_p (nunits, simd_clone_subparts (vectype)))
 	    {
 	      unsigned int k = (simd_clone_subparts (vectype)
 				/ simd_clone_subparts (rtype));
@@ -4304,7 +4307,9 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		vec_alloc (ret_ctor_elts, k);
 	      if (ratype)
 		{
-		  unsigned int m, o = nunits / simd_clone_subparts (rtype);
+		  unsigned int m, o;
+		  o = vector_unroll_factor (nunits,
+					    simd_clone_subparts (rtype));
 		  for (m = 0; m < o; m++)
 		    {
 		      tree tem = build4 (ARRAY_REF, rtype, new_temp,
-- 
2.19.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/5] [PR target/96342] Change field "simdlen" into poly_uint64
  2020-11-02 13:17   ` yangyang (ET)
@ 2020-11-02 14:15     ` Richard Sandiford
  2020-11-03 11:07       ` yangyang (ET)
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2020-11-02 14:15 UTC (permalink / raw)
  To: yangyang (ET); +Cc: gcc-patches

"yangyang (ET)" <yangyang305@huawei.com> writes:
> Hi,
>
> I have revised the patch based on your suggestions, and the following are some points that I think need to be mentioned in the mail.
>
>> > @@ -502,17 +504,18 @@ simd_clone_adjust_return_type (struct
>> cgraph_node *node)
>> >      veclen = node->simdclone->vecsize_int;
>> >    else
>> >      veclen = node->simdclone->vecsize_float;
>> > -  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t));
>> > -  if (veclen > node->simdclone->simdlen)
>> > +  veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE
>> > + (t)));  if (known_gt (veclen, node->simdclone->simdlen))
>> >      veclen = node->simdclone->simdlen;
>>
>> Although again it probably doesn't make a difference in practice, the
>> known/maybe situation is similar here.  When comparing:
>>
>> - an SVE vector of 2 + 2 * (VQ - 1) doubles and
>> - an Advanced SIMD vector of 2 doubles
>>
>> the Advanced SIMD version is conceptually ordered <= the SVE one, in the
>> sense that the SVE vector always contains a whole number of Advanced SIMD
>> vectors whereas the Advanced SIMD vector might not contain a whole number
>> of SVE vectors.
>>
>> In other words, the number of lanes in the Advanced SIMD vector is known_le
>> the number of lanes in the SVE vector, and the number of lanes in the SVE
>> vector is known_ge and maybe_gt (but not known_gt) the number of lanes in
>> the Advanced SIMD vector.  So for these kinds of comparison, known_gt can
>> look a bit unexpected, even if (as here) it's probably fine in practice.
>>
>> There's currently a hard-coded assumption in this code and in the vectoriser
>> that both constant-length software vectors and constant-length hardware
>> vectors are a power of 2 in size.  This means that the > above is effectively
>> testing whether veclen contains a whole number of
>> node->simdclone->simdlens, not just whether the veclen is bigger.
>>
>> So when adding the initial autovec support for SVE, we generally rewrote these
>> kinds of test to use multiple_p instead of comparisons.  I think that makes the
>> intent more obvious (but others might disagree :-)).
>>
>> That was a long-winded way of saying that I suggest we use:
>>
>>   if (multiple_p (veclen, node->simdclone->simdlen))
>>     veclen = node->simdclone->simdlen;
>>
>> instead.
>
> As far as I understand, multiple_p (a, b) is equal to known_ge (a, b) due to the hard-coded assumption, and !multiple_p (b, a) is equal to known_gt (a, b).

Yes to the first part, but !multiple_p (b, a) is equal to known_ge (a, b)
(== known_le (b, a)) rather than known_gt (a, b).  E.g.:

  multiple_p (2, 2 + 2X)

is false (for X>0) but:

  known_gt (2 + 2X, 2)

is also false (for X==0).  In both cases, the conditions have an
implicit “for all X”.

> So it seems better to use !multiple_p (node->simdclone->simdlen, veclen) here, while using multiple_p (veclen, node->simdclone->simdlen) is OK since doing the assignment for the eq situation won't change anything. Anyway, I used multiple_p (veclen, node->simdclone->simdlen) in the revised patch.

!multiple_p is fine with me too FWIW, if you prefer that.

>> >      }
>> >    TREE_TYPE (TREE_TYPE (fndecl)) = t;
>> >    if (!node->definition)
>> > @@ -663,8 +669,9 @@ simd_clone_adjust_argument_types (struct
>> cgraph_node *node)
>> >     veclen = sc->vecsize_int;
>> >        else
>> >     veclen = sc->vecsize_float;
>> > -      veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
>> > -      if (veclen > sc->simdlen)
>> > +      veclen = exact_div (veclen,
>> > +                     GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
>> > +      if (known_gt (veclen, sc->simdlen))
>> >     veclen = sc->simdlen;
>>
>> Similarly here.
>
> Similarly here, use multiple_p (veclen, sc->simdlen) in the revised patch in addition.

Here too !multiple_p sounds OK if you prefer that.

>> > @@ -981,8 +992,11 @@ ipa_simd_modify_function_body (struct
>> cgraph_node *node,
>> >               iter, NULL_TREE, NULL_TREE);
>> >        adjustments->register_replacement
>> > (&(*adjustments->m_adj_params)[j], r);
>> >
>> > -      if (simd_clone_subparts (vectype) < node->simdclone->simdlen)
>> > -   j += node->simdclone->simdlen / simd_clone_subparts (vectype) - 1;
>> > +      if (known_lt (simd_clone_subparts (vectype),
>> node->simdclone->simdlen))
>> > +   {
>> > +     j += vector_unroll_factor (node->simdclone->simdlen,
>> > +                                simd_clone_subparts (vectype)) - 1;
>> > +   }
>>
>> And similarly here:
>>
>>       if (multiple_p (node->simdclone->simdlen, simd_clone_subparts
>> (vectype)))
>>
>
> Similarly here, use multiple_p (node->simdclone->simdlen, simd_clone_subparts (vectype)) in the revised patch in addition.

To be clear, I think multiple_p is still natural here though.

>> >  /* Wrapper for poly_int arguments to target macros, so that if a target
>> >     doesn't need polynomial-sized modes, its header file can continue to
>> >     treat the argument as a normal constant.  This should go away once
>> > @@ -3902,12 +3902,12 @@ vectorizable_simd_clone_call (vec_info *vinfo,
>> stmt_vec_info stmt_info,
>> >      n = n->simdclone->next_clone)
>> >        {
>> >     unsigned int this_badness = 0;
>> > -   if (n->simdclone->simdlen > vf
>> > +   if (known_gt (n->simdclone->simdlen, vf)
>>
>> I think !constant_multiple_p rather than known_gt here.  Until we add
>> support for vectorising simd clones with partial vectors, a whole number of
>> simdlens must fit within VF.  So maybe:
>>
>>      unsigned int num_calls;
>>      if (!constant_multiple_p (n->simdclone->simdlen, vf, &num_calls)
>>
>> and then…
>>
>> >         || n->simdclone->nargs != nargs)
>> >       continue;
>> > -   if (n->simdclone->simdlen < vf)
>> > -     this_badness += (exact_log2 (vf)
>> > -                      - exact_log2 (n->simdclone->simdlen)) * 1024;
>> > +   if (known_lt (n->simdclone->simdlen, vf))
>> > +     this_badness += exact_log2
>> > +       (vector_unroll_factor (vf, n->simdclone->simdlen)) * 1024;
>>
>> …use num_calls instead of vector_unroll_factor here.
>
> The revised patch wrote these code as:
>
> unsigned int num_calls;
> if (!constant_multiple_p (vf, n->simdclone->simdlen, &num_calls)
>   || n->simdclone->nargs != nargs)
>  continue;
> if (!multiple_p (n->simdclone->simdlen, vf))
>  this_badness += exact_log2 (num_calls) * 1024;

I think this should be:

  if (num_calls != 1)
    this_badness += exact_log2 (num_calls) * 1024;

(Or we could do the addition unconditionally, both work.)

>> >     if (n->simdclone->inbranch)
>> >       this_badness += 2048;
>> >     int target_badness = targetm.simd_clone.usable (n); @@ -3988,19
>> > +3988,19 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info
>> stmt_info,
>> >     arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
>> >                                                       slp_node);
>> >     if (arginfo[i].vectype == NULL
>> > -       || (simd_clone_subparts (arginfo[i].vectype)
>> > -           > bestn->simdclone->simdlen))
>> > +       || (known_gt (simd_clone_subparts (arginfo[i].vectype),
>> > +                     bestn->simdclone->simdlen)))
>>
>> Here too I think we want constant_multiple_p:
>>
>>           || !constant_multiple_p (bestn->simdclone->simdlen,
>>                                    simd_clone_subparts (arginfo[i].vectype))
>>
>
> Use multiple_p here since the multiple is not needed.

True, but in the case of vectorisation, we need to generate a constant
number of copies at compile time.  If we don't enforce a constant
multiple, we might end up trying to use an Advanced SIMD routine
when vectorising for SVE.

The reason we don't have a two-argument version of constant_multiple_p
is that so far nothing has needed it (at least AFAIK).  There's no
conceptual problem with adding it though.  I'm happy to do that if
it would help.

>> >       if (ratype)
>> >         new_temp = create_tmp_var (ratype);
>> >       else if (useless_type_conversion_p (vectype, rtype)) @@ -4264,12
>> > +4266,13 @@ vectorizable_simd_clone_call (vec_info *vinfo,
>> > stmt_vec_info stmt_info,
>> >
>> >        if (vec_dest)
>> >     {
>> > -     if (simd_clone_subparts (vectype) < nunits)
>> > +     if (known_lt (simd_clone_subparts (vectype), nunits))
>>
>> multiple_p (nunits, …) here too.  It would also be possible to do:
>>
>>         if (constant_multiple_p (nunits, simd_clone_subparts (vectype), &k)
>>
>> and avoid the separate vector_unroll_factor.
>
> Use !multiple_p (simd_clone_subparts (vectype), nunits) here, since it seems that the following code should not be executed in the eq situation.

Ah, yes, sorry for missing that.

>>
>> >         {
>> >           unsigned int k, l;
>> >           poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
>> >           poly_uint64 bytes = GET_MODE_SIZE (TYPE_MODE (vectype));
>> > -         k = nunits / simd_clone_subparts (vectype);
>> > +         k = vector_unroll_factor (nunits,
>> > +                                   simd_clone_subparts (vectype));
>> >           gcc_assert ((k & (k - 1)) == 0);
>> >           for (l = 0; l < k; l++)
>> >             {
>> > @@ -4295,16 +4298,18 @@ vectorizable_simd_clone_call (vec_info *vinfo,
>> stmt_vec_info stmt_info,
>> >             vect_clobber_variable (vinfo, stmt_info, gsi, new_temp);
>> >           continue;
>> >         }
>> > -     else if (simd_clone_subparts (vectype) > nunits)
>> > +     else if (known_gt (simd_clone_subparts (vectype), nunits))
>>
>> multiple_p here too.
>>
>
> Similarly, use !multiple_p (nunits, simd_clone_subparts (vectype)) here.

OK.

> @@ -614,8 +618,9 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
>  	    veclen = sc->vecsize_int;
>  	  else
>  	    veclen = sc->vecsize_float;
> -	  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type));
> -	  if (veclen > sc->simdlen)
> +	  veclen = exact_div (veclen,
> +			      GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type)));
> +	  if (known_gt (veclen, sc->simdlen))
>  	    veclen = sc->simdlen;
>  	  adj.op = IPA_PARAM_OP_NEW;
>  	  adj.param_prefix_index = IPA_PARAM_PREFIX_SIMD;

I think this should use multiple_p (or !multiple_p) too.

> @@ -981,8 +992,11 @@ ipa_simd_modify_function_body (struct cgraph_node *node,
>  		  iter, NULL_TREE, NULL_TREE);
>        adjustments->register_replacement (&(*adjustments->m_adj_params)[j], r);
 
> -      if (simd_clone_subparts (vectype) < node->simdclone->simdlen)
> -	j += node->simdclone->simdlen / simd_clone_subparts (vectype) - 1;
> +      if (multiple_p (node->simdclone->simdlen, simd_clone_subparts (vectype)))
> +	{
> +	  j += vector_unroll_factor (node->simdclone->simdlen,
> +				     simd_clone_subparts (vectype)) - 1;
> +	}
>      }
> 
>    tree name;

Minor nit, but: the original formatting without braces is preferred
for single statements, even if the statement spans multiple lines.

Otherwise the patch looks good, thanks.

Richard

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 1/5] [PR target/96342] Change field "simdlen" into poly_uint64
  2020-11-02 14:15     ` Richard Sandiford
@ 2020-11-03 11:07       ` yangyang (ET)
  2020-11-03 16:14         ` Richard Sandiford
  0 siblings, 1 reply; 9+ messages in thread
From: yangyang (ET) @ 2020-11-03 11:07 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1836 bytes --]

Hi,

I have revised the patch based on your suggestions. Where the equal case is acceptable, I use multiple_p instead of !multiple_p to make it easier to understand.

> >> >     if (n->simdclone->inbranch)
> >> >       this_badness += 2048;
> >> >     int target_badness = targetm.simd_clone.usable (n); @@ -3988,19
> >> > +3988,19 @@ vectorizable_simd_clone_call (vec_info *vinfo,
> >> > +stmt_vec_info
> >> stmt_info,
> >> >     arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
> >> >
> slp_node);
> >> >     if (arginfo[i].vectype == NULL
> >> > -       || (simd_clone_subparts (arginfo[i].vectype)
> >> > -           > bestn->simdclone->simdlen))
> >> > +       || (known_gt (simd_clone_subparts (arginfo[i].vectype),
> >> > +                     bestn->simdclone->simdlen)))
> >>
> >> Here too I think we want constant_multiple_p:
> >>
> >>           || !constant_multiple_p (bestn->simdclone->simdlen,
> >>                                    simd_clone_subparts
> >> (arginfo[i].vectype))
> >>
> >
> > Use multiple_p here since the multiple is not needed.
>
> True, but in the case of vectorisation, we need to generate a constant number
> of copies at compile time.  If we don't enforce a constant multiple, we might
> end up trying to use an Advanced SIMD routine when vectorising for SVE.
>
> The reason we don't have a two-argument version of constant_multiple_p is
> that so far nothing has needed it (at least AFAIK).  There's no conceptual
> problem with adding it though.  I'm happy to do that if it would help.
>

Two-argument versions of constant_multiple_p are added in the v3 patch. Could you please check if the added versions are OK?

Bootstrap and tested on both aarch64 and x86 Linux platform, no new regression witnessed.

Any suggestions?

Thanks,
Yang Yang

[-- Attachment #2: PR96342-part1-v3.patch --]
[-- Type: application/octet-stream, Size: 24154 bytes --]

From 5a7d943641ac96ecd2464252305cdbd523cac76e Mon Sep 17 00:00:00 2001
From: Yang Yang <yangyang305@huawei.com>
Date: Tue, 3 Nov 2020 23:54:45 +0800
Subject: [PATCH] PR target/96342 Change field "simdlen" into poly_uint64

This is the first patch of PR96342. In order to add support for
"omp declare simd", change the type of the field "simdlen" of
struct cgraph_simd_clone from unsigned int to poly_uint64 and
adapt related code, since the length might be variable in the
SVE case.

2020-11-03  Yang Yang  <yangyang305@huawei.com>

gcc/ChangeLog:

	* cgraph.h (struct cgraph_simd_clone): Change field "simdlen" of
	struct cgraph_simd_clone from unsigned int to poly_uint64.
	* config/aarch64/aarch64.c
	(aarch64_simd_clone_compute_vecsize_and_simdlen): Adapt
	operations on "simdlen".
	* config/i386/i386.c (ix86_simd_clone_compute_vecsize_and_simdlen):
	Printf formats update.
	* gengtype.c (main): Handle poly_uint64.
	* omp-simd-clone.c (simd_clone_mangle): Likewise.
	(simd_clone_adjust_return_type): Likewise.
	(create_tmp_simd_array): Likewise.
	(simd_clone_adjust_argument_types): Likewise.
	(simd_clone_init_simd_arrays): Likewise.
	(ipa_simd_modify_function_body): Likewise.
	(simd_clone_adjust): Likewise.
	(expand_simd_clones): Likewise.
	* poly-int-types.h (vector_unroll_factor): New macro.
	* poly-int.h (constant_multiple_p): Add two-argument versions.
	* tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise.
---
 gcc/cgraph.h                 |  6 ++--
 gcc/config/aarch64/aarch64.c | 30 ++++++++++------
 gcc/config/i386/i386.c       |  8 +++--
 gcc/gengtype.c               |  1 +
 gcc/omp-simd-clone.c         | 70 +++++++++++++++++++++---------------
 gcc/poly-int-types.h         |  8 +++++
 gcc/poly-int.h               | 57 +++++++++++++++++++++++++++++
 gcc/tree-vect-stmts.c        | 43 ++++++++++++----------
 8 files changed, 159 insertions(+), 64 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 96d6cf609fe..9dc886cc58a 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -826,17 +826,17 @@ struct GTY(()) cgraph_simd_clone_arg {
 
 struct GTY(()) cgraph_simd_clone {
   /* Number of words in the SIMD lane associated with this clone.  */
-  unsigned int simdlen;
+  poly_uint64 simdlen;
 
   /* Number of annotated function arguments in `args'.  This is
      usually the number of named arguments in FNDECL.  */
   unsigned int nargs;
 
   /* Max hardware vector size in bits for integral vectors.  */
-  unsigned int vecsize_int;
+  poly_uint64 vecsize_int;
 
   /* Max hardware vector size in bits for floating point vectors.  */
-  unsigned int vecsize_float;
+  poly_uint64 vecsize_float;
 
   /* Machine mode of the mask argument(s), if they are to be passed
      as bitmasks in integer argument(s).  VOIDmode if masks are passed
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a8cc545c370..aa1c798cec1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -23044,18 +23044,23 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 					tree base_type, int num)
 {
   tree t, ret_type, arg_type;
-  unsigned int elt_bits, vec_bits, count;
+  unsigned int elt_bits, count;
+  unsigned HOST_WIDE_INT const_simdlen;
+  poly_uint64 vec_bits;
 
   if (!TARGET_SIMD)
     return 0;
 
-  if (clonei->simdlen
-      && (clonei->simdlen < 2
-	  || clonei->simdlen > 1024
-	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
+  /* For now, SVE simdclones won't produce illegal simdlen, so only check
+     const simdlens here.  */
+  if (maybe_ne (clonei->simdlen, 0U)
+      && clonei->simdlen.is_constant (&const_simdlen)
+      && (const_simdlen < 2
+	  || const_simdlen > 1024
+	  || (const_simdlen & (const_simdlen - 1)) != 0))
     {
       warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		  "unsupported simdlen %d", clonei->simdlen);
+		  "unsupported simdlen %wd", const_simdlen);
       return 0;
     }
 
@@ -23099,21 +23104,24 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   clonei->vecsize_mangle = 'n';
   clonei->mask_mode = VOIDmode;
   elt_bits = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
-  if (clonei->simdlen == 0)
+  if (known_eq (clonei->simdlen, 0U))
     {
       count = 2;
       vec_bits = (num == 0 ? 64 : 128);
-      clonei->simdlen = vec_bits / elt_bits;
+      clonei->simdlen = exact_div (vec_bits, elt_bits);
     }
   else
     {
       count = 1;
       vec_bits = clonei->simdlen * elt_bits;
-      if (vec_bits != 64 && vec_bits != 128)
+      /* For now, SVE simdclones won't produce illegal simdlen, so only check
+	 const simdlens here.  */
+      if (clonei->simdlen.is_constant (&const_simdlen)
+	  && maybe_ne (vec_bits, 64U) && maybe_ne (vec_bits, 128U))
 	{
 	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		      "GCC does not currently support simdlen %d for type %qT",
-		      clonei->simdlen, base_type);
+		      "GCC does not currently support simdlen %wd for type %qT",
+		      const_simdlen, base_type);
 	  return 0;
 	}
     }
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 54c2cdaf060..9ee27e1240e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22140,7 +22140,7 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
     {
       warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		  "unsupported simdlen %d", clonei->simdlen);
+		  "unsupported simdlen %wd", clonei->simdlen.to_constant ());
       return 0;
     }
 
@@ -22245,7 +22245,8 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	clonei->simdlen = clonei->vecsize_int;
       else
 	clonei->simdlen = clonei->vecsize_float;
-      clonei->simdlen /= GET_MODE_BITSIZE (TYPE_MODE (base_type));
+      clonei->simdlen = clonei->simdlen
+			/ GET_MODE_BITSIZE (TYPE_MODE (base_type));
     }
   else if (clonei->simdlen > 16)
     {
@@ -22267,7 +22268,8 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
       if (cnt > (TARGET_64BIT ? 16 : 8))
 	{
 	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		      "unsupported simdlen %d", clonei->simdlen);
+		      "unsupported simdlen %wd",
+		      clonei->simdlen.to_constant ());
 	  return 0;
 	}
       }
diff --git a/gcc/gengtype.c b/gcc/gengtype.c
index a59a8823f82..919e3a00bf2 100644
--- a/gcc/gengtype.c
+++ b/gcc/gengtype.c
@@ -5198,6 +5198,7 @@ main (int argc, char **argv)
       POS_HERE (do_scalar_typedef ("widest_int", &pos));
       POS_HERE (do_scalar_typedef ("int64_t", &pos));
       POS_HERE (do_scalar_typedef ("poly_int64", &pos));
+      POS_HERE (do_scalar_typedef ("poly_uint64", &pos));
       POS_HERE (do_scalar_typedef ("uint64_t", &pos));
       POS_HERE (do_scalar_typedef ("uint8", &pos));
       POS_HERE (do_scalar_typedef ("uintptr_t", &pos));
diff --git a/gcc/omp-simd-clone.c b/gcc/omp-simd-clone.c
index 942fb971cb7..cbd58c8987b 100644
--- a/gcc/omp-simd-clone.c
+++ b/gcc/omp-simd-clone.c
@@ -338,16 +338,18 @@ simd_clone_mangle (struct cgraph_node *node,
 {
   char vecsize_mangle = clone_info->vecsize_mangle;
   char mask = clone_info->inbranch ? 'M' : 'N';
-  unsigned int simdlen = clone_info->simdlen;
+  poly_uint64 simdlen = clone_info->simdlen;
   unsigned int n;
   pretty_printer pp;
 
-  gcc_assert (vecsize_mangle && simdlen);
+  gcc_assert (vecsize_mangle && maybe_ne (simdlen, 0U));
 
   pp_string (&pp, "_ZGV");
   pp_character (&pp, vecsize_mangle);
   pp_character (&pp, mask);
-  pp_decimal_int (&pp, simdlen);
+  /* For now, simdlen is always constant; a variable simdlen would use 'n'.  */
+  unsigned int len = simdlen.to_constant ();
+  pp_decimal_int (&pp, (len));
 
   for (n = 0; n < clone_info->nargs; ++n)
     {
@@ -491,7 +493,7 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
 {
   tree fndecl = node->decl;
   tree orig_rettype = TREE_TYPE (TREE_TYPE (fndecl));
-  unsigned int veclen;
+  poly_uint64 veclen;
   tree t;
 
   /* Adjust the function return type.  */
@@ -502,17 +504,18 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
     veclen = node->simdclone->vecsize_int;
   else
     veclen = node->simdclone->vecsize_float;
-  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t));
-  if (veclen > node->simdclone->simdlen)
+  veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
+  if (multiple_p (veclen, node->simdclone->simdlen))
     veclen = node->simdclone->simdlen;
   if (POINTER_TYPE_P (t))
     t = pointer_sized_int_node;
-  if (veclen == node->simdclone->simdlen)
+  if (known_eq (veclen, node->simdclone->simdlen))
     t = build_vector_type (t, node->simdclone->simdlen);
   else
     {
       t = build_vector_type (t, veclen);
-      t = build_array_type_nelts (t, node->simdclone->simdlen / veclen);
+      t = build_array_type_nelts (t, exact_div (node->simdclone->simdlen,
+						veclen));
     }
   TREE_TYPE (TREE_TYPE (fndecl)) = t;
   if (!node->definition)
@@ -526,7 +529,7 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
 
   tree atype = build_array_type_nelts (orig_rettype,
 				       node->simdclone->simdlen);
-  if (veclen != node->simdclone->simdlen)
+  if (maybe_ne (veclen, node->simdclone->simdlen))
     return build1 (VIEW_CONVERT_EXPR, atype, t);
 
   /* Set up a SIMD array to use as the return value.  */
@@ -546,7 +549,7 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
    SIMDLEN is the number of elements.  */
 
 static tree
-create_tmp_simd_array (const char *prefix, tree type, int simdlen)
+create_tmp_simd_array (const char *prefix, tree type, poly_uint64 simdlen)
 {
   tree atype = build_array_type_nelts (type, simdlen);
   tree avar = create_tmp_var_raw (atype, prefix);
@@ -578,7 +581,8 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
   struct cgraph_simd_clone *sc = node->simdclone;
   vec<ipa_adjusted_param, va_gc> *new_params = NULL;
   vec_safe_reserve (new_params, sc->nargs);
-  unsigned i, j, veclen;
+  unsigned i, j, k;
+  poly_uint64 veclen;
 
   for (i = 0; i < sc->nargs; ++i)
     {
@@ -614,8 +618,9 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	    veclen = sc->vecsize_int;
 	  else
 	    veclen = sc->vecsize_float;
-	  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type));
-	  if (veclen > sc->simdlen)
+	  veclen = exact_div (veclen,
+			      GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type)));
+	  if (multiple_p (veclen, sc->simdlen))
 	    veclen = sc->simdlen;
 	  adj.op = IPA_PARAM_OP_NEW;
 	  adj.param_prefix_index = IPA_PARAM_PREFIX_SIMD;
@@ -624,10 +629,11 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	  else
 	    adj.type = build_vector_type (parm_type, veclen);
 	  sc->args[i].vector_type = adj.type;
-	  for (j = veclen; j < sc->simdlen; j += veclen)
+	  k = vector_unroll_factor (sc->simdlen, veclen);
+	  for (j = 1; j < k; j++)
 	    {
 	      vec_safe_push (new_params, adj);
-	      if (j == veclen)
+	      if (j == 1)
 		{
 		  memset (&adj, 0, sizeof (adj));
 		  adj.op = IPA_PARAM_OP_NEW;
@@ -663,8 +669,9 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	veclen = sc->vecsize_int;
       else
 	veclen = sc->vecsize_float;
-      veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
-      if (veclen > sc->simdlen)
+      veclen = exact_div (veclen,
+			  GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
+      if (multiple_p (veclen, sc->simdlen))
 	veclen = sc->simdlen;
       if (sc->mask_mode != VOIDmode)
 	adj.type
@@ -675,7 +682,8 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	adj.type = build_vector_type (base_type, veclen);
       vec_safe_push (new_params, adj);
 
-      for (j = veclen; j < sc->simdlen; j += veclen)
+      k = vector_unroll_factor (sc->simdlen, veclen);
+      for (j = 1; j < k; j++)
 	vec_safe_push (new_params, adj);
 
       /* We have previously allocated one extra entry for the mask.  Use
@@ -690,9 +698,9 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	  if (sc->mask_mode == VOIDmode)
 	    sc->args[i].simd_array
 	      = create_tmp_simd_array ("mask", base_type, sc->simdlen);
-	  else if (veclen < sc->simdlen)
+	  else if (k > 1)
 	    sc->args[i].simd_array
-	      = create_tmp_simd_array ("mask", adj.type, sc->simdlen / veclen);
+	      = create_tmp_simd_array ("mask", adj.type, k);
 	  else
 	    sc->args[i].simd_array = NULL_TREE;
 	}
@@ -783,7 +791,8 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
 	    }
 	  continue;
 	}
-      if (simd_clone_subparts (TREE_TYPE (arg)) == node->simdclone->simdlen)
+      if (known_eq (simd_clone_subparts (TREE_TYPE (arg)),
+		    node->simdclone->simdlen))
 	{
 	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
 	  tree ptr = build_fold_addr_expr (array);
@@ -795,8 +804,10 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
       else
 	{
 	  unsigned int simdlen = simd_clone_subparts (TREE_TYPE (arg));
+	  unsigned int times = vector_unroll_factor (node->simdclone->simdlen,
+						     simdlen);
 	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
-	  for (k = 0; k < node->simdclone->simdlen; k += simdlen)
+	  for (k = 0; k < times; k++)
 	    {
 	      tree ptr = build_fold_addr_expr (array);
 	      int elemsize;
@@ -808,7 +819,7 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
 	      tree elemtype = TREE_TYPE (TREE_TYPE (arg));
 	      elemsize = GET_MODE_SIZE (SCALAR_TYPE_MODE (elemtype));
 	      tree t = build2 (MEM_REF, TREE_TYPE (arg), ptr,
-			       build_int_cst (ptype, k * elemsize));
+			       build_int_cst (ptype, k * elemsize * simdlen));
 	      t = build2 (MODIFY_EXPR, TREE_TYPE (t), t, arg);
 	      gimplify_and_add (t, &seq);
 	    }
@@ -981,8 +992,9 @@ ipa_simd_modify_function_body (struct cgraph_node *node,
 		  iter, NULL_TREE, NULL_TREE);
       adjustments->register_replacement (&(*adjustments->m_adj_params)[j], r);
 
-      if (simd_clone_subparts (vectype) < node->simdclone->simdlen)
-	j += node->simdclone->simdlen / simd_clone_subparts (vectype) - 1;
+      if (multiple_p (node->simdclone->simdlen, simd_clone_subparts (vectype)))
+	j += vector_unroll_factor (node->simdclone->simdlen,
+				   simd_clone_subparts (vectype)) - 1;
     }
 
   tree name;
@@ -1249,7 +1261,8 @@ simd_clone_adjust (struct cgraph_node *node)
 	 below).  */
       loop = alloc_loop ();
       cfun->has_force_vectorize_loops = true;
-      loop->safelen = node->simdclone->simdlen;
+      /* For now, simdlen is always constant.  */
+      loop->safelen = node->simdclone->simdlen.to_constant ();
       loop->force_vectorize = true;
       loop->header = body_bb;
     }
@@ -1275,7 +1288,8 @@ simd_clone_adjust (struct cgraph_node *node)
 	    {
 	      tree maskt = TREE_TYPE (mask_array);
 	      int c = tree_to_uhwi (TYPE_MAX_VALUE (TYPE_DOMAIN (maskt)));
-	      c = node->simdclone->simdlen / (c + 1);
+	      /* For now, c must be constant here.  */
+	      c = exact_div (node->simdclone->simdlen, c + 1).to_constant ();
 	      int s = exact_log2 (c);
 	      gcc_assert (s > 0);
 	      c--;
@@ -1683,7 +1697,7 @@ expand_simd_clones (struct cgraph_node *node)
       if (clone_info == NULL)
 	continue;
 
-      int orig_simdlen = clone_info->simdlen;
+      poly_uint64 orig_simdlen = clone_info->simdlen;
       tree base_type = simd_clone_compute_base_data_type (node, clone_info);
       /* The target can return 0 (no simd clones should be created),
 	 1 (just one ISA of simd clones should be created) or higher
diff --git a/gcc/poly-int-types.h b/gcc/poly-int-types.h
index 5e04e63ebf2..302e5bfd97d 100644
--- a/gcc/poly-int-types.h
+++ b/gcc/poly-int-types.h
@@ -81,6 +81,14 @@ typedef poly_int<NUM_POLY_INT_COEFFS, widest_int> poly_widest_int;
 #define vector_element_size(SIZE, NELTS) \
   (exact_div (SIZE, NELTS).to_constant ())
 
+/* Return the number of times a vector with NELTS1 elements must be unrolled
+   into vectors with NELTS2 elements.
+
+   to_constant () is safe here because the ratio of the element counts of
+   two vectors is always a constant scalar.  */
+#define vector_unroll_factor(NELTS1, NELTS2) \
+  (exact_div (NELTS1, NELTS2).to_constant ())
+
 /* Wrapper for poly_int arguments to target macros, so that if a target
    doesn't need polynomial-sized modes, its header file can continue to
    treat the argument as a normal constant.  This should go away once
diff --git a/gcc/poly-int.h b/gcc/poly-int.h
index b953ffacec4..96a763daedf 100644
--- a/gcc/poly-int.h
+++ b/gcc/poly-int.h
@@ -2044,6 +2044,63 @@ constant_multiple_p (const poly_int_pod<N, Ca> &a,
   return true;
 }
 
+/* Return true if A is a constant multiple of B.  */
+
+template<unsigned int N, typename Ca, typename Cb>
+inline typename if_nonpoly<Cb, bool>::type
+constant_multiple_p (const poly_int_pod<N, Ca> &a, Cb b)
+{
+  typedef POLY_CAST (Ca, Cb) NCa;
+  typedef POLY_CAST (Cb, Ca) NCb;
+
+  /* Do the modulus before the constant check, to catch divide by
+     zero errors.  */
+  if (NCa (a.coeffs[0]) % NCb (b) != 0 || !a.is_constant ())
+    return false;
+  return true;
+}
+
+template<unsigned int N, typename Ca, typename Cb>
+inline typename if_nonpoly<Ca, bool>::type
+constant_multiple_p (Ca a, const poly_int_pod<N, Cb> &b)
+{
+  typedef POLY_CAST (Ca, Cb) NCa;
+  typedef POLY_CAST (Cb, Ca) NCb;
+  typedef POLY_INT_TYPE (Ca) int_type;
+
+  /* Do the modulus before the constant check, to catch divide by
+     zero errors.  */
+  if (NCa (a) % NCb (b.coeffs[0]) != 0
+      || (a != int_type (0) && !b.is_constant ()))
+    return false;
+  return true;
+}
+
+template<unsigned int N, typename Ca, typename Cb>
+inline bool
+constant_multiple_p (const poly_int_pod<N, Ca> &a,
+		     const poly_int_pod<N, Cb> &b)
+{
+  typedef POLY_CAST (Ca, Cb) NCa;
+  typedef POLY_CAST (Cb, Ca) NCb;
+  typedef POLY_INT_TYPE (Ca) ICa;
+  typedef POLY_INT_TYPE (Cb) ICb;
+  typedef POLY_BINARY_COEFF (Ca, Cb) C;
+
+  if (NCa (a.coeffs[0]) % NCb (b.coeffs[0]) != 0)
+    return false;
+
+  C r = NCa (a.coeffs[0]) / NCb (b.coeffs[0]);
+  for (unsigned int i = 1; i < N; ++i)
+    if (b.coeffs[i] == ICb (0)
+	? a.coeffs[i] != ICa (0)
+	: (NCa (a.coeffs[i]) % NCb (b.coeffs[i]) != 0
+	   || NCa (a.coeffs[i]) / NCb (b.coeffs[i]) != r))
+      return false;
+  return true;
+}
+
+
 /* Return true if A is a multiple of B.  */
 
 template<typename Ca, typename Cb>
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 3575f25241f..69296ff3a2c 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -3731,7 +3731,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
   tree op, type;
   tree vec_oprnd0 = NULL_TREE;
   tree vectype;
-  unsigned int nunits;
+  poly_uint64 nunits;
   loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
   bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
   class loop *loop = loop_vinfo ? LOOP_VINFO_LOOP (loop_vinfo) : NULL;
@@ -3883,8 +3883,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       arginfo.quick_push (thisarginfo);
     }
 
-  unsigned HOST_WIDE_INT vf;
-  if (!LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&vf))
+  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  if (!vf.is_constant ())
     {
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -3902,12 +3902,12 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	 n = n->simdclone->next_clone)
       {
 	unsigned int this_badness = 0;
-	if (n->simdclone->simdlen > vf
+	unsigned int num_calls;
+	if (!constant_multiple_p (vf, n->simdclone->simdlen, &num_calls)
 	    || n->simdclone->nargs != nargs)
 	  continue;
-	if (n->simdclone->simdlen < vf)
-	  this_badness += (exact_log2 (vf)
-			   - exact_log2 (n->simdclone->simdlen)) * 1024;
+	if (num_calls != 1)
+	  this_badness += exact_log2 (num_calls) * 1024;
 	if (n->simdclone->inbranch)
 	  this_badness += 2048;
 	int target_badness = targetm.simd_clone.usable (n);
@@ -3988,19 +3988,19 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
 							  slp_node);
 	if (arginfo[i].vectype == NULL
-	    || (simd_clone_subparts (arginfo[i].vectype)
-		> bestn->simdclone->simdlen))
+	    || !constant_multiple_p (bestn->simdclone->simdlen,
+				     simd_clone_subparts (arginfo[i].vectype)))
 	  return false;
       }
 
   fndecl = bestn->decl;
   nunits = bestn->simdclone->simdlen;
-  ncopies = vf / nunits;
+  ncopies = vector_unroll_factor (vf, nunits);
 
   /* If the function isn't const, only allow it in simd loops where user
      has asserted that at least nunits consecutive iterations can be
      performed using SIMD instructions.  */
-  if ((loop == NULL || (unsigned) loop->safelen < nunits)
+  if ((loop == NULL || maybe_lt ((unsigned) loop->safelen, nunits))
       && gimple_vuse (stmt))
     return false;
 
@@ -4078,7 +4078,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	    {
 	    case SIMD_CLONE_ARG_TYPE_VECTOR:
 	      atype = bestn->simdclone->args[i].vector_type;
-	      o = nunits / simd_clone_subparts (atype);
+	      o = vector_unroll_factor (nunits,
+					simd_clone_subparts (atype));
 	      for (m = j * o; m < (j + 1) * o; m++)
 		{
 		  if (simd_clone_subparts (atype)
@@ -4203,7 +4204,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		      ? POINTER_PLUS_EXPR : PLUS_EXPR;
 		  tree type = POINTER_TYPE_P (TREE_TYPE (op))
 			      ? sizetype : TREE_TYPE (op);
-		  widest_int cst
+		  poly_widest_int cst
 		    = wi::mul (bestn->simdclone->args[i].linear_step,
 			       ncopies * nunits);
 		  tree tcst = wide_int_to_tree (type, cst);
@@ -4224,7 +4225,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		      ? POINTER_PLUS_EXPR : PLUS_EXPR;
 		  tree type = POINTER_TYPE_P (TREE_TYPE (op))
 			      ? sizetype : TREE_TYPE (op);
-		  widest_int cst
+		  poly_widest_int cst
 		    = wi::mul (bestn->simdclone->args[i].linear_step,
 			       j * nunits);
 		  tree tcst = wide_int_to_tree (type, cst);
@@ -4250,7 +4251,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       gcall *new_call = gimple_build_call_vec (fndecl, vargs);
       if (vec_dest)
 	{
-	  gcc_assert (ratype || simd_clone_subparts (rtype) == nunits);
+	  gcc_assert (ratype
+		      || known_eq (simd_clone_subparts (rtype), nunits));
 	  if (ratype)
 	    new_temp = create_tmp_var (ratype);
 	  else if (useless_type_conversion_p (vectype, rtype))
@@ -4264,12 +4266,13 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 
       if (vec_dest)
 	{
-	  if (simd_clone_subparts (vectype) < nunits)
+	  if (!multiple_p (simd_clone_subparts (vectype), nunits))
 	    {
 	      unsigned int k, l;
 	      poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
 	      poly_uint64 bytes = GET_MODE_SIZE (TYPE_MODE (vectype));
-	      k = nunits / simd_clone_subparts (vectype);
+	      k = vector_unroll_factor (nunits,
+					simd_clone_subparts (vectype));
 	      gcc_assert ((k & (k - 1)) == 0);
 	      for (l = 0; l < k; l++)
 		{
@@ -4295,7 +4298,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		vect_clobber_variable (vinfo, stmt_info, gsi, new_temp);
 	      continue;
 	    }
-	  else if (simd_clone_subparts (vectype) > nunits)
+	  else if (!multiple_p (nunits, simd_clone_subparts (vectype)))
 	    {
 	      unsigned int k = (simd_clone_subparts (vectype)
 				/ simd_clone_subparts (rtype));
@@ -4304,7 +4307,9 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		vec_alloc (ret_ctor_elts, k);
 	      if (ratype)
 		{
-		  unsigned int m, o = nunits / simd_clone_subparts (rtype);
+		  unsigned int m, o;
+		  o = vector_unroll_factor (nunits,
+					    simd_clone_subparts (rtype));
 		  for (m = 0; m < o; m++)
 		    {
 		      tree tem = build4 (ARRAY_REF, rtype, new_temp,
-- 
2.19.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/5] [PR target/96342] Change field "simdlen" into poly_uint64
  2020-11-03 11:07       ` yangyang (ET)
@ 2020-11-03 16:14         ` Richard Sandiford
  2020-11-04  6:01           ` yangyang (ET)
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2020-11-03 16:14 UTC (permalink / raw)
  To: yangyang (ET); +Cc: gcc-patches

"yangyang (ET)" <yangyang305@huawei.com> writes:
> Hi,
>
> I have revised the patch based on your suggestions. I use multiple_p instead of !multiple_p if the eq situation is OK to make it easier to understand.
>
>> >> >     if (n->simdclone->inbranch)
>> >> >       this_badness += 2048;
>> >> >     int target_badness = targetm.simd_clone.usable (n); @@ -3988,19
>> >> > +3988,19 @@ vectorizable_simd_clone_call (vec_info *vinfo,
>> >> > +stmt_vec_info
>> >> stmt_info,
>> >> >     arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
>> >> >
>> slp_node);
>> >> >     if (arginfo[i].vectype == NULL
>> >> > -       || (simd_clone_subparts (arginfo[i].vectype)
>> >> > -           > bestn->simdclone->simdlen))
>> >> > +       || (known_gt (simd_clone_subparts (arginfo[i].vectype),
>> >> > +                     bestn->simdclone->simdlen)))
>> >>
>> >> Here too I think we want constant_multiple_p:
>> >>
>> >>           || !constant_multiple_p (bestn->simdclone->simdlen,
>> >>                                    simd_clone_subparts
>> >> (arginfo[i].vectype))
>> >>
>> >
>> > Use multiple_p here since the multiple is not needed.
>>
>> True, but in the case of vectorisation, we need to generate a constant number
>> of copies at compile time.  If we don't enforce a constant multiple, we might
>> end up trying to use an Advanced SIMD routine when vectorising for SVE.
>>
>> The reason we don't have a two-argument version of constant_multiple_p is
>> that so far nothing has needed it (at least AFAIK).  There's no conceptual
>> problem with adding it though.  I'm happy to do that if it would help.
>>
>
> Two-argument versions of constant_multiple_p are added in the v3-patch. Could you please check if the added versions are OK ?
>
> Bootstrap and tested on both aarch64 and x86 Linux platform, no new regression witnessed.

Looks great, thanks.  Pushed to trunk.

Richard


* RE: [PATCH 1/5] [PR target/96342] Change field "simdlen" into poly_uint64
  2020-11-03 16:14         ` Richard Sandiford
@ 2020-11-04  6:01           ` yangyang (ET)
  2020-11-04 10:26             ` Richard Sandiford
  0 siblings, 1 reply; 9+ messages in thread
From: yangyang (ET) @ 2020-11-04  6:01 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

Hi, 

> -----Original Message-----
> From: Richard Sandiford [mailto:richard.sandiford@arm.com]
> Sent: Wednesday, November 4, 2020 12:15 AM
> To: yangyang (ET) <yangyang305@huawei.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 1/5] [PR target/96342] Change field "simdlen" into
> poly_uint64
> 
> "yangyang (ET)" <yangyang305@huawei.com> writes:
> > Hi,
> >
> > I have revised the patch based on your suggestions. I use multiple_p instead
> of !multiple_p if the eq situation is OK to make it easier to understand.
> >
> >> >> >     if (n->simdclone->inbranch)
> >> >> >       this_badness += 2048;
> >> >> >     int target_badness = targetm.simd_clone.usable (n); @@
> >> >> > -3988,19
> >> >> > +3988,19 @@ vectorizable_simd_clone_call (vec_info *vinfo,
> >> >> > +stmt_vec_info
> >> >> stmt_info,
> >> >> >     arginfo[i].vectype = get_vectype_for_scalar_type (vinfo,
> >> >> > arg_type,
> >> >> >
> >> slp_node);
> >> >> >     if (arginfo[i].vectype == NULL
> >> >> > -       || (simd_clone_subparts (arginfo[i].vectype)
> >> >> > -           > bestn->simdclone->simdlen))
> >> >> > +       || (known_gt (simd_clone_subparts (arginfo[i].vectype),
> >> >> > +                     bestn->simdclone->simdlen)))
> >> >>
> >> >> Here too I think we want constant_multiple_p:
> >> >>
> >> >>           || !constant_multiple_p (bestn->simdclone->simdlen,
> >> >>                                    simd_clone_subparts
> >> >> (arginfo[i].vectype))
> >> >>
> >> >
> >> > Use multiple_p here since the multiple is not needed.
> >>
> >> True, but in the case of vectorisation, we need to generate a
> >> constant number of copies at compile time.  If we don't enforce a
> >> constant multiple, we might end up trying to use an Advanced SIMD routine
> when vectorising for SVE.
> >>
> >> The reason we don't have a two-argument version of
> >> constant_multiple_p is that so far nothing has needed it (at least
> >> AFAIK).  There's no conceptual problem with adding it though.  I'm happy
> to do that if it would help.
> >>
> >
> > Two-argument versions of constant_multiple_p are added in the v3-patch.
> Could you please check if the added versions are OK ?
> >
> > Bootstrap and tested on both aarch64 and x86 Linux platform, no new
> regression witnessed.
> 
> Looks great, thanks.  Pushed to trunk.
> 
> Richard

Thanks for installing the patch. As you mentioned in the PR, stage 1 of GCC 11 is going to close in a few weeks, and
the GCC Development Plan describes stage 3 as "During this two-month period, the only (non-documentation) changes
that may be made are changes that fix bugs or new ports which do not require changes to other parts of the compiler.
New functionality may not be introduced during this period."  So does this mean that the remaining four patches of
this feature need to wait for GCC 12 stage 1 to be installed?

Thanks,
Yang Yang


* Re: [PATCH 1/5] [PR target/96342] Change field "simdlen" into poly_uint64
  2020-11-04  6:01           ` yangyang (ET)
@ 2020-11-04 10:26             ` Richard Sandiford
  2020-11-05  1:46               ` yangyang (ET)
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2020-11-04 10:26 UTC (permalink / raw)
  To: yangyang (ET); +Cc: gcc-patches

"yangyang (ET)" <yangyang305@huawei.com> writes:
> Hi, 
>
>> -----Original Message-----
>> From: Richard Sandiford [mailto:richard.sandiford@arm.com]
>> Sent: Wednesday, November 4, 2020 12:15 AM
>> To: yangyang (ET) <yangyang305@huawei.com>
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH 1/5] [PR target/96342] Change field "simdlen" into
>> poly_uint64
>> 
>> "yangyang (ET)" <yangyang305@huawei.com> writes:
>> > Hi,
>> >
>> > I have revised the patch based on your suggestions. I use multiple_p instead
>> of !multiple_p if the eq situation is OK to make it easier to understand.
>> >
>> >> >> >     if (n->simdclone->inbranch)
>> >> >> >       this_badness += 2048;
>> >> >> >     int target_badness = targetm.simd_clone.usable (n); @@
>> >> >> > -3988,19
>> >> >> > +3988,19 @@ vectorizable_simd_clone_call (vec_info *vinfo,
>> >> >> > +stmt_vec_info
>> >> >> stmt_info,
>> >> >> >     arginfo[i].vectype = get_vectype_for_scalar_type (vinfo,
>> >> >> > arg_type,
>> >> >> >
>> >> slp_node);
>> >> >> >     if (arginfo[i].vectype == NULL
>> >> >> > -       || (simd_clone_subparts (arginfo[i].vectype)
>> >> >> > -           > bestn->simdclone->simdlen))
>> >> >> > +       || (known_gt (simd_clone_subparts (arginfo[i].vectype),
>> >> >> > +                     bestn->simdclone->simdlen)))
>> >> >>
>> >> >> Here too I think we want constant_multiple_p:
>> >> >>
>> >> >>           || !constant_multiple_p (bestn->simdclone->simdlen,
>> >> >>                                    simd_clone_subparts
>> >> >> (arginfo[i].vectype))
>> >> >>
>> >> >
>> >> > Use multiple_p here since the multiple is not needed.
>> >>
>> >> True, but in the case of vectorisation, we need to generate a
>> >> constant number of copies at compile time.  If we don't enforce a
>> >> constant multiple, we might end up trying to use an Advanced SIMD routine
>> when vectorising for SVE.
>> >>
>> >> The reason we don't have a two-argument version of
>> >> constant_multiple_p is that so far nothing has needed it (at least
>> >> AFAIK).  There's no conceptual problem with adding it though.  I'm happy
>> to do that if it would help.
>> >>
>> >
>> > Two-argument versions of constant_multiple_p are added in the v3-patch.
>> Could you please check if the added versions are OK ?
>> >
>> > Bootstrap and tested on both aarch64 and x86 Linux platform, no new
>> regression witnessed.
>> 
>> Looks great, thanks.  Pushed to trunk.
>> 
>> Richard
>
> Thanks for installing the patch. As you mentioned in the PR, stage 1 of GCC 11 is going to close in a few weeks, and
> the GCC Development Plan describes stage 3 as "During this two-month period, the only (non-documentation) changes
> that may be made are changes that fix bugs or new ports which do not require changes to other parts of the compiler.
> New functionality may not be introduced during this period."  So does this mean that the remaining four patches of
> this feature need to wait for GCC 12 stage 1 to be installed?

Any target-independent patches would need to be posted by the end of next
week to get into GCC 11.  There's a bit more leeway for SVE-specific
pieces in config/aarch64, since those have a lower impact.

Thanks,
Richard


* RE: [PATCH 1/5] [PR target/96342] Change field "simdlen" into poly_uint64
  2020-11-04 10:26             ` Richard Sandiford
@ 2020-11-05  1:46               ` yangyang (ET)
  0 siblings, 0 replies; 9+ messages in thread
From: yangyang (ET) @ 2020-11-05  1:46 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

> > Thanks for installing the patch. As you mentioned in the PR, stage1 of
> > GCC 11 is going to close in a few weeks, and GCC Development Plan
> > describes the stage3 as " During this two-month period, the only
> (non-documentation) changes that may be made are changes that fix bugs or
> new ports which do not require changes to other parts of the compiler.
> > New functionality may not be introduced during this period. ". So does
> > it mean that the rest four patches of this feature need to wait for the GCC 12
> stage1 to get installed?
> 
> Any target-independent patches would need to be posted by the end of next
> week to get into GCC 11.  There's a bit more leeway for SVE-specific pieces in
> config/aarch64, since those have a lower impact.
> 
> Thanks,
> Richard

Thanks. All the remaining patches seem to contain changes to shared code, and I haven't finished them yet, so I might wait for GCC 12 stage 1 to get them in.

Thanks,
Yang Yang


end of thread, other threads:[~2020-11-05  1:46 UTC | newest]

Thread overview: 9+ messages
2020-10-30  2:29 [PATCH 1/5] [PR target/96342] Change field "simdlen" into poly_uint64 yangyang (ET)
2020-10-30 13:32 ` Richard Sandiford
2020-11-02 13:17   ` yangyang (ET)
2020-11-02 14:15     ` Richard Sandiford
2020-11-03 11:07       ` yangyang (ET)
2020-11-03 16:14         ` Richard Sandiford
2020-11-04  6:01           ` yangyang (ET)
2020-11-04 10:26             ` Richard Sandiford
2020-11-05  1:46               ` yangyang (ET)
