[PATCH 0/3] vect, aarch64: Add SVE support for simdclones

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH 0/3] vect, aarch64: Add SVE support for simdclones
@ 2024-01-30 14:31 Andre Vieira
  2024-01-30 14:31 ` [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE Andre Vieira
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Andre Vieira @ 2024-01-30 14:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard.Sandiford, rguenther, Andre Vieira

[-- Attachment #1: Type: text/plain, Size: 2533 bytes --]

Hi,

This patch series is a set of patches that I have sent up for review before and it enables initial support SVE simd clones with some caveats.
Caveat 1: we do not support SVE simd clones with function bodies.
To enable support for this we need to change the way we 'simdify' a function body. For each argument that maps to a vector an array is created with 'simdlen'. This however does not work for VLA simdlen.  We will need to come up with a way to support this such that the generated code is performant, there's little reason to 'simdify' a function by generating really slow code. I have some ideas on how we might be able to do this, though I'm not convinced it's even worth trying, but I think that's a bigger discussion.  For now I've disabled generating SVE simdclones for functions with function bodies.  This still fits our libmvec usecase as the simd clones are handwritten using intrinsics in glibc.

Caveat 2: we can not generate ncopy calls to a SVE simd clone call.
When I first sent the second patch of this series upstream Richi asked me to look at enabling being able to support calling ncopies of VLA simdlen simd clones, I have vectorizer code to do this, however I found that we didn't yet have enough backend support to be able to index VLA vectors to support this.  I think that's something that will need to wait until gcc 15, so for now I'd simply reject vectorization where that is required.

Caveat 3: we don't yet support SVE simdclones for VLS codegen.
We've disabled the use of SVE simdclones when the -msve-vector-bits option is used to request VLS codegen. We need this because the mangling is determined by the 'simdlen' of a simd clone which will not be VLA when -msve-vector-bits is passed. We would like to support using VLA simd clones when generating VLS, but for that to work right now we'd need to set the simdlen of the simd clone to the VLS value and that messes up the mangling.  In the future we will need to add a target hook to specify the mangling.

Given that the target agnostic changes are minimal, have been suggested before and have no impact on other targets, the target specific parts have been reviewed before, would this still be acceptable for Stage 4? I would really like to make use of the work that was done to support this and the SVE simdclones added to glibc.

Kind regards,
Andre

Andre Vieira (3):
vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
vect: disable multiple calls of poly simdclones
aarch64: Add SVE support for simd clones [PR 96342]

-- 
2.17.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
  2024-01-30 14:31 [PATCH 0/3] vect, aarch64: Add SVE support for simdclones Andre Vieira
@ 2024-01-30 14:31 ` Andre Vieira
  2024-01-31 12:11   ` Richard Biener
  2024-01-30 14:31 ` [PATCH 2/3] vect: disable multiple calls of poly simdclones Andre Vieira
  2024-01-30 14:31 ` [PATCH 3/3] aarch64: Add SVE support for simd clones [PR 96342] Andre Vieira
  2 siblings, 1 reply; 23+ messages in thread
From: Andre Vieira @ 2024-01-30 14:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard.Sandiford, rguenther, Andre Vieira

[-- Attachment #1: Type: text/plain, Size: 859 bytes --]


This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
target can reject a simd_clone based on the vector mode it is using.
This is needed because for VLS SVE vectorization the vectorizer accepts
Advanced SIMD simd clones when vectorizing using SVE types because the simdlens
might match.  This will cause type errors later on.

Other targets do not currently need to use this argument.

gcc/ChangeLog:

	* target.def (TARGET_SIMD_CLONE_USABLE): Add argument.
	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass stmt_info to
	call TARGET_SIMD_CLONE_USABLE.
	* config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add argument
	and use it to reject the use of SVE simd clones with Advanced SIMD
	modes.
	* config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused argument.
	* config/i386/i386.cc (ix86_simd_clone_usable): Likewise.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-vect-Pass-stmt_vec_info-to-TARGET_SIMD_CLONE_USABLE.patch --]
[-- Type: text/x-patch; name="0001-vect-Pass-stmt_vec_info-to-TARGET_SIMD_CLONE_USABLE.patch", Size: 2862 bytes --]

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a37d47b243e..31617510160 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -28694,13 +28694,16 @@ aarch64_simd_clone_adjust (struct cgraph_node *node)
 /* Implement TARGET_SIMD_CLONE_USABLE.  */
 
 static int
-aarch64_simd_clone_usable (struct cgraph_node *node)
+aarch64_simd_clone_usable (struct cgraph_node *node, stmt_vec_info stmt_vinfo)
 {
   switch (node->simdclone->vecsize_mangle)
     {
     case 'n':
       if (!TARGET_SIMD)
 	return -1;
+      if (STMT_VINFO_VECTYPE (stmt_vinfo)
+	  && aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))))
+	return -1;
       return 0;
     default:
       gcc_unreachable ();
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index e80de2ce056..c48b212d9e6 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5658,7 +5658,8 @@ gcn_simd_clone_adjust (struct cgraph_node *ARG_UNUSED (node))
 /* Implement TARGET_SIMD_CLONE_USABLE.  */
 
 static int
-gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node))
+gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node),
+		       stmt_vec_info ARG_UNUSED (stmt_vinfo))
 {
   /* We don't need to do anything here because
      gcn_simd_clone_compute_vecsize_and_simdlen currently only returns one
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b3e7c74846e..63e6b9d2643 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -25193,7 +25193,8 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
    slightly less desirable, etc.).  */
 
 static int
-ix86_simd_clone_usable (struct cgraph_node *node)
+ix86_simd_clone_usable (struct cgraph_node *node,
+			stmt_vec_info ARG_UNUSED (stmt_vinfo))
 {
   switch (node->simdclone->vecsize_mangle)
     {
diff --git a/gcc/target.def b/gcc/target.def
index fdad7bbc93e..4fade9c4eec 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1648,7 +1648,7 @@ DEFHOOK
 in vectorized loops in current function, or non-negative number if it is\n\
 usable.  In that case, the smaller the number is, the more desirable it is\n\
 to use it.",
-int, (struct cgraph_node *), NULL)
+int, (struct cgraph_node *, _stmt_vec_info *), NULL)
 
 HOOK_VECTOR_END (simd_clone)
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 1dbe1115da4..da02082c034 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4074,7 +4074,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	  this_badness += floor_log2 (num_calls) * 4096;
 	if (n->simdclone->inbranch)
 	  this_badness += 8192;
-	int target_badness = targetm.simd_clone.usable (n);
+	int target_badness = targetm.simd_clone.usable (n, stmt_info);
 	if (target_badness < 0)
 	  continue;
 	this_badness += target_badness * 512;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 2/3] vect: disable multiple calls of poly simdclones
  2024-01-30 14:31 [PATCH 0/3] vect, aarch64: Add SVE support for simdclones Andre Vieira
  2024-01-30 14:31 ` [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE Andre Vieira
@ 2024-01-30 14:31 ` Andre Vieira
  2024-01-31 12:13   ` Richard Biener
  2024-01-30 14:31 ` [PATCH 3/3] aarch64: Add SVE support for simd clones [PR 96342] Andre Vieira
  2 siblings, 1 reply; 23+ messages in thread
From: Andre Vieira @ 2024-01-30 14:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard.Sandiford, rguenther, Andre Vieira

[-- Attachment #1: Type: text/plain, Size: 478 bytes --]


The current codegen code to support VF's that are multiples of a simdclone
simdlen rely on BIT_FIELD_REF to create multiple input vectors.  This does not
work for non-constant simdclones, so we should disable using such clones when
the VF is a multiple of the non-constant simdlen until we change the codegen to
support those.

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Reject simdclones
	with non-constant simdlen when VF is not exactly the same.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0002-vect-disable-multiple-calls-of-poly-simdclones.patch --]
[-- Type: text/x-patch; name="0002-vect-disable-multiple-calls-of-poly-simdclones.patch", Size: 763 bytes --]

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index da02082c034..9bfb898683d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4068,7 +4068,10 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	if (!constant_multiple_p (vf * group_size, n->simdclone->simdlen,
 				  &num_calls)
 	    || (!n->simdclone->inbranch && (masked_call_offset > 0))
-	    || (nargs != simd_nargs))
+	    || (nargs != simd_nargs)
+	    /* Currently we do not support multiple calls of non-constant
+	       simdlen as poly vectors can not be accessed by BIT_FIELD_REF.  */
+	    || (!n->simdclone->simdlen.is_constant () && num_calls != 1))
 	  continue;
 	if (num_calls != 1)
 	  this_badness += floor_log2 (num_calls) * 4096;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3/3] aarch64: Add SVE support for simd clones [PR 96342]
  2024-01-30 14:31 [PATCH 0/3] vect, aarch64: Add SVE support for simdclones Andre Vieira
  2024-01-30 14:31 ` [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE Andre Vieira
  2024-01-30 14:31 ` [PATCH 2/3] vect: disable multiple calls of poly simdclones Andre Vieira
@ 2024-01-30 14:31 ` Andre Vieira
  2024-02-01 21:59   ` Richard Sandiford
  2 siblings, 1 reply; 23+ messages in thread
From: Andre Vieira @ 2024-01-30 14:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard.Sandiford, rguenther, Andre Vieira

[-- Attachment #1: Type: text/plain, Size: 1187 bytes --]


This patch finalizes adding support for the generation of SVE simd clones when
no simdlen is provided, following the ABI rules where the widest data type
determines the minimum amount of elements in a length agnostic vector.

gcc/ChangeLog:

	* config/aarch64/aarch64-protos.h (add_sve_type_attribute): Declare.
	* config/aarch64/aarch64-sve-builtins.cc (add_sve_type_attribute): Make
	visibility global and support use for non_acle types.
	* config/aarch64/aarch64.cc
	(aarch64_simd_clone_compute_vecsize_and_simdlen): Create VLA simd clone
	when no simdlen is provided, according to ABI rules.
	(simd_clone_adjust_sve_vector_type): New helper function.
	(aarch64_simd_clone_adjust): Add '+sve' attribute to SVE simd clones
	and modify types to use SVE types.
	* omp-simd-clone.cc (simd_clone_mangle): Print 'x' for VLA simdlen.
	(simd_clone_adjust): Adapt safelen check to be compatible with VLA
	simdlen.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/declare-variant-14.c: Make i?86 and x86_64 target
	only test.
	* gfortran.dg/gomp/declare-variant-14.f90: Likewise.
	* gcc.target/aarch64/declare-simd-2.c: Add SVE clone scan.
	* gcc.target/aarch64/vect-simd-clone-1.c: New test.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0003-aarch64-Add-SVE-support-for-simd-clones-PR-96342.patch --]
[-- Type: text/x-patch; name="0003-aarch64-Add-SVE-support-for-simd-clones-PR-96342.patch", Size: 17079 bytes --]

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index a0b142e0b94..207396de0ff 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1031,6 +1031,8 @@ namespace aarch64_sve {
 #ifdef GCC_TARGET_H
   bool verify_type_context (location_t, type_context_kind, const_tree, bool);
 #endif
+ void add_sve_type_attribute (tree, unsigned int, unsigned int,
+			      const char *, const char *);
 }
 
 extern void aarch64_split_combinev16qi (rtx operands[3]);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 11f5c5c500c..747131e684e 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -953,14 +953,16 @@ static bool reported_missing_registers_p;
 /* Record that TYPE is an ABI-defined SVE type that contains NUM_ZR SVE vectors
    and NUM_PR SVE predicates.  MANGLED_NAME, if nonnull, is the ABI-defined
    mangling of the type.  ACLE_NAME is the <arm_sve.h> name of the type.  */
-static void
+void
 add_sve_type_attribute (tree type, unsigned int num_zr, unsigned int num_pr,
 			const char *mangled_name, const char *acle_name)
 {
   tree mangled_name_tree
     = (mangled_name ? get_identifier (mangled_name) : NULL_TREE);
+  tree acle_name_tree
+    = (acle_name ? get_identifier (acle_name) : NULL_TREE);
 
-  tree value = tree_cons (NULL_TREE, get_identifier (acle_name), NULL_TREE);
+  tree value = tree_cons (NULL_TREE, acle_name_tree, NULL_TREE);
   value = tree_cons (NULL_TREE, mangled_name_tree, value);
   value = tree_cons (NULL_TREE, size_int (num_pr), value);
   value = tree_cons (NULL_TREE, size_int (num_zr), value);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 31617510160..cba8879ab33 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -28527,7 +28527,7 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 					int num, bool explicit_p)
 {
   tree t, ret_type;
-  unsigned int nds_elt_bits;
+  unsigned int nds_elt_bits, wds_elt_bits;
   unsigned HOST_WIDE_INT const_simdlen;
 
   if (!TARGET_SIMD)
@@ -28572,10 +28572,14 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   if (TREE_CODE (ret_type) != VOID_TYPE)
     {
       nds_elt_bits = lane_size (SIMD_CLONE_ARG_TYPE_VECTOR, ret_type);
+      wds_elt_bits = nds_elt_bits;
       vec_elts.safe_push (std::make_pair (ret_type, nds_elt_bits));
     }
   else
-    nds_elt_bits = POINTER_SIZE;
+    {
+      nds_elt_bits = POINTER_SIZE;
+      wds_elt_bits = 0;
+    }
 
   int i;
   tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
@@ -28583,44 +28587,72 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0;
        t && t != void_list_node; t = TREE_CHAIN (t), i++)
     {
-      tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
+      tree type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
       if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM
-	  && !supported_simd_type (arg_type))
+	  && !supported_simd_type (type))
 	{
 	  if (!explicit_p)
 	    ;
-	  else if (COMPLEX_FLOAT_TYPE_P (ret_type))
+	  else if (COMPLEX_FLOAT_TYPE_P (type))
 	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
 			"GCC does not currently support argument type %qT "
-			"for simd", arg_type);
+			"for simd", type);
 	  else
 	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
 			"unsupported argument type %qT for simd",
-			arg_type);
+			type);
 	  return 0;
 	}
-      unsigned lane_bits = lane_size (clonei->args[i].arg_type, arg_type);
+      unsigned lane_bits = lane_size (clonei->args[i].arg_type, type);
       if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
-	vec_elts.safe_push (std::make_pair (arg_type, lane_bits));
+	vec_elts.safe_push (std::make_pair (type, lane_bits));
       if (nds_elt_bits > lane_bits)
 	nds_elt_bits = lane_bits;
+      if (wds_elt_bits < lane_bits)
+	wds_elt_bits = lane_bits;
     }
 
-  clonei->vecsize_mangle = 'n';
+  /* If we could not determine the WDS type from available parameters/return,
+     then fallback to using uintptr_t.  */
+  if (wds_elt_bits == 0)
+    wds_elt_bits = POINTER_SIZE;
+
   clonei->mask_mode = VOIDmode;
   poly_uint64 simdlen;
-  auto_vec<poly_uint64> simdlens (2);
+  auto_vec<poly_uint64> simdlens (3);
+  auto_vec<char> simdmangle (3);
   /* Keep track of the possible simdlens the clones of this function can have,
      and check them later to see if we support them.  */
   if (known_eq (clonei->simdlen, 0U))
     {
       simdlen = exact_div (poly_uint64 (64), nds_elt_bits);
       if (maybe_ne (simdlen, 1U))
-	simdlens.safe_push (simdlen);
+	{
+	  simdlens.safe_push (simdlen);
+	  simdmangle.safe_push ('n');
+	}
       simdlens.safe_push (simdlen * 2);
+      simdmangle.safe_push ('n');
+      /* Only create a SVE simd clone if we aren't dealing with an unprototyped
+	 function.
+	We have also disabled support for creating SVE simdclones for functions
+	with function bodies and any simdclones when -msve-vector-bits is used.
+	TODO: add support for these.  */
+      if ((DECL_ARGUMENTS (node->decl) != 0
+	   || type_arg_types != 0)
+	  && !node->definition
+	  && !aarch64_sve_vg.is_constant ())
+	{
+	  poly_uint64 sve_simdlen = aarch64_sve_vg * 64;
+	  simdlens.safe_push (exact_div (sve_simdlen, wds_elt_bits));
+	  simdmangle.safe_push ('s');
+	}
     }
   else
-    simdlens.safe_push (clonei->simdlen);
+    {
+      simdlens.safe_push (clonei->simdlen);
+      simdmangle.safe_push ('n');
+    }
 
   clonei->vecsize_int = 0;
   clonei->vecsize_float = 0;
@@ -28638,7 +28670,8 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
     {
       bool remove_simdlen = false;
       for (auto elt : vec_elts)
-	if (known_gt (simdlens[j] * elt.second, 128U))
+	if (simdmangle[j] == 'n'
+	    && known_gt (simdlens[j] * elt.second, 128U))
 	  {
 	    /* Don't issue a warning for every simdclone when there is no
 	       specific simdlen clause.  */
@@ -28651,12 +28684,14 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	    break;
 	  }
       if (remove_simdlen)
-	simdlens.ordered_remove (j);
+	{
+	  simdlens.ordered_remove (j);
+	  simdmangle.ordered_remove (j);
+	}
       else
 	j++;
     }
 
-
   int count = simdlens.length ();
   if (count == 0)
     {
@@ -28675,20 +28710,107 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 
   gcc_assert (num < count);
   clonei->simdlen = simdlens[num];
+  clonei->vecsize_mangle = simdmangle[num];
+  /* SVE simdclones always have a Mask, so set inbranch to 1.  */
+  if (clonei->vecsize_mangle == 's')
+    clonei->inbranch = 1;
   return count;
 }
 
+static tree
+simd_clone_adjust_sve_vector_type (tree type, bool is_mask, poly_uint64 simdlen)
+{
+    unsigned int num_zr = 0;
+    unsigned int num_pr = 0;
+    machine_mode vector_mode;
+    type = TREE_TYPE (type);
+    scalar_mode scalar_m = as_a <scalar_mode> (TYPE_MODE (type));
+    gcc_assert (aarch64_sve_data_mode (scalar_m,
+				       simdlen).exists (&vector_mode));
+    type = build_vector_type_for_mode (type, vector_mode);
+    if (is_mask)
+      {
+	type = truth_type_for (type);
+	num_pr = 1;
+      }
+    else
+      num_zr = 1;
+
+    aarch64_sve::add_sve_type_attribute (type, num_zr, num_pr, NULL,
+					 NULL);
+    return type;
+}
+
 /* Implement TARGET_SIMD_CLONE_ADJUST.  */
 
 static void
 aarch64_simd_clone_adjust (struct cgraph_node *node)
 {
-  /* Add aarch64_vector_pcs target attribute to SIMD clones so they
-     use the correct ABI.  */
-
   tree t = TREE_TYPE (node->decl);
-  TYPE_ATTRIBUTES (t) = make_attribute ("aarch64_vector_pcs", "default",
-					TYPE_ATTRIBUTES (t));
+  cl_target_option cur_target;
+  bool m_old_have_regs_of_mode[MAX_MACHINE_MODE];
+
+  if (node->simdclone->vecsize_mangle == 's')
+    {
+      tree target = build_string (strlen ("+sve"), "+sve");
+      aarch64_option_valid_attribute_p (node->decl, NULL_TREE, target, 0);
+      cl_target_option_save (&cur_target, &global_options, &global_options_set);
+      tree new_target = DECL_FUNCTION_SPECIFIC_TARGET (node->decl);
+      cl_target_option_restore (&global_options, &global_options_set,
+				TREE_TARGET_OPTION (new_target));
+      aarch64_override_options_internal (&global_options);
+      memcpy (m_old_have_regs_of_mode, have_regs_of_mode,
+	      sizeof (have_regs_of_mode));
+      for (int i = 0; i < NUM_MACHINE_MODES; ++i)
+	if (aarch64_sve_mode_p ((machine_mode) i))
+	  have_regs_of_mode[i] = true;
+    }
+  else
+    {
+	/* Add aarch64_vector_pcs target attribute to SIMD clones so they
+	   use the correct ABI.  */
+	TYPE_ATTRIBUTES (t) = make_attribute ("aarch64_vector_pcs", "default",
+					      TYPE_ATTRIBUTES (t));
+    }
+  cgraph_simd_clone *sc = node->simdclone;
+
+  for (unsigned i = 0; i < sc->nargs; ++i)
+    {
+      bool is_mask = false;
+      tree type;
+      switch (sc->args[i].arg_type)
+	{
+	case SIMD_CLONE_ARG_TYPE_MASK:
+	  is_mask = true;
+	  gcc_fallthrough ();
+	case SIMD_CLONE_ARG_TYPE_VECTOR:
+	case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP:
+	case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_VARIABLE_STEP:
+	  type = sc->args[i].vector_type;
+	  gcc_assert (VECTOR_TYPE_P (type));
+	  if (node->simdclone->vecsize_mangle == 's')
+	    type = simd_clone_adjust_sve_vector_type (type, is_mask,
+						      sc->simdlen);
+	  else if (is_mask)
+	    type = truth_type_for (type);
+	  sc->args[i].vector_type = type;
+	default:
+	    continue;
+	}
+    }
+  if (node->simdclone->vecsize_mangle == 's')
+    {
+      tree ret_type = TREE_TYPE (t);
+      if (VECTOR_TYPE_P (ret_type))
+	TREE_TYPE (t)
+	  = simd_clone_adjust_sve_vector_type (ret_type, false,
+					       node->simdclone->simdlen);
+      /* Restore current options.  */
+      cl_target_option_restore (&global_options, &global_options_set, &cur_target);
+      aarch64_override_options_internal (&global_options);
+      memcpy (have_regs_of_mode, m_old_have_regs_of_mode,
+	      sizeof (have_regs_of_mode));
+    }
 }
 
 /* Implement TARGET_SIMD_CLONE_USABLE.  */
@@ -28705,6 +28827,10 @@ aarch64_simd_clone_usable (struct cgraph_node *node, stmt_vec_info stmt_vinfo)
 	  && aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))))
 	return -1;
       return 0;
+    case 's':
+      if (!TARGET_SVE)
+	return -1;
+      return 0;
     default:
       gcc_unreachable ();
     }
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 864586207ee..066b6217253 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -541,9 +541,12 @@ simd_clone_mangle (struct cgraph_node *node,
   pp_string (&pp, "_ZGV");
   pp_character (&pp, vecsize_mangle);
   pp_character (&pp, mask);
-  /* For now, simdlen is always constant, while variable simdlen pp 'n'.  */
-  unsigned int len = simdlen.to_constant ();
-  pp_decimal_int (&pp, (len));
+
+  unsigned long long len = 0;
+  if (simdlen.is_constant (&len))
+    pp_decimal_int (&pp, (int) (len));
+  else
+    pp_character (&pp, 'x');
 
   for (n = 0; n < clone_info->nargs; ++n)
     {
@@ -1533,8 +1536,8 @@ simd_clone_adjust (struct cgraph_node *node)
 	 below).  */
       loop = alloc_loop ();
       cfun->has_force_vectorize_loops = true;
-      /* For now, simlen is always constant.  */
-      loop->safelen = node->simdclone->simdlen.to_constant ();
+      /* We can assert that safelen is the 'minimum' simdlen.  */
+      loop->safelen = constant_lower_bound (node->simdclone->simdlen);
       loop->force_vectorize = true;
       loop->header = body_bb;
     }
diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
index e3668893afe..2b71869787e 100644
--- a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
+++ b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
@@ -1,6 +1,6 @@
-/* { dg-do compile { target vect_simd_clones } } */
+/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && vect_simd_clones } } } */
 /* { dg-additional-options "-fdump-tree-gimple -fdump-tree-optimized" } */
-/* { dg-additional-options "-mno-sse3" { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-mno-sse3" } */
 
 int f01 (int);
 int f02 (int);
@@ -15,15 +15,13 @@ int
 test1 (int x)
 {
   /* At gimplification time, we can't decide yet which function to call.  */
-  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" { target { !aarch64*-*-* } } } } */
+  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" } } */
   /* After simd clones are created, the original non-clone test1 shall
      call f03 (score 6), the sse2/avx/avx2 clones too, but avx512f clones
      shall call f01 with score 8.  */
   /* { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } } */
-  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" { target { !aarch64*-*-* } } } } */
-  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 10 "optimized" { target { aarch64*-*-* } } } } */
-  /* { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" { target { !aarch64*-*-* } } } } */
-  /* { dg-final { scan-tree-dump-times "f01 \\\(x" 0 "optimized" { target { aarch64*-*-* } } } } */
+  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" } } */
+  /* { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" } } */
   int a = f04 (x);
   int b = f04 (x);
   return a + b;
diff --git a/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c b/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c
index e2e80f0c663..2f4d3a866e5 100644
--- a/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c
@@ -43,6 +43,7 @@ float f04 (double a)
 }
 /* { dg-final { scan-assembler {_ZGVnN2v_f04:} } } */
 /* { dg-final { scan-assembler {_ZGVnM2v_f04:} } } */
+/* { dg-final { scan-assembler-not {_ZGVs[0-9a-z]*_f04:} } } */
 
 #pragma omp declare simd uniform(a) linear (b)
 void f05 (short a, short *b, short c)
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-1.c b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-1.c
new file mode 100644
index 00000000000..71fd361acec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-1.c
@@ -0,0 +1,52 @@
+/* { dg-do compile }  */
+/* { dg-options "-std=c99" } */
+/* { dg-additional-options "-O3 -march=armv8-a+sve -mcpu=neoverse-n2" } */
+extern int __attribute__ ((simd, const)) fn0 (int);
+
+void test_fn0 (int *a, int *b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    a[i] += fn0 (b[i]);
+}
+
+/* { dg-final { scan-assembler {_ZGVsMxv_fn0} } } */
+
+extern int __attribute__ ((simd, const)) fn1 (short, int);
+
+void test_fn1 (int *a, int *b, short *c, int n)
+{
+  for (int i = 0; i < n; ++i)
+    a[i] = fn1 (c[i], b[i]);
+}
+
+/* { dg-final { scan-assembler {_ZGVsMxvv_fn1} } } */
+
+extern short __attribute__ ((simd, const)) fn2 (short, int);
+
+void test_fn2 (short *a, int *b, short *c, int n)
+{
+  for (int i = 0; i < n; ++i)
+    a[i] = fn2 (c[i], b[i]);
+}
+
+/* { dg-final { scan-assembler {_ZGVsMxvv_fn2} } } */
+
+extern char __attribute__ ((simd, const)) fn3 (int, char);
+
+void test_fn3 (int *a, int *b, char *c, int n)
+{
+  for (int i = 0; i < n; ++i)
+    a[i] = (int) (fn3 (b[i], c[i]) + c[i]);
+}
+
+/* { dg-final { scan-assembler {_ZGVsMxvv_fn3} } } */
+
+extern short __attribute__ ((simd, const)) fn4 (int, short);
+
+void test_fn4 (int *a, int *b, short *c, int n)
+{
+  for (int i = 0; i < n; ++i)
+    a[i] = (int) (fn4 (b[i], c[i]) + c[i]);
+}
+
+/* { dg-final { scan-assembler {_ZGVsMxvv_fn4} } } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
index 6319df0558f..3c7d093c5c6 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
@@ -1,6 +1,6 @@
-! { dg-do compile { target vect_simd_clones } }
+! { dg-do compile { target { { i?86-*-* x86_64-*-* } && vect_simd_clones } } } */
 ! { dg-additional-options "-O0 -fdump-tree-gimple -fdump-tree-optimized" }
-! { dg-additional-options "-mno-sse3" { target { i?86-*-* x86_64-*-* } } }
+! { dg-additional-options "-mno-sse3" }
 
 module main
   implicit none
@@ -41,7 +41,7 @@ contains
     ! shall call f01 with score 8.
     ! { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } }
     ! { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" { target { !aarch64*-*-* } } } }
-    ! { dg-final { scan-tree-dump-times "f03 \\\(x" 6 "optimized" { target { aarch64*-*-* } } } }
+    ! { dg-final { scan-tree-dump-times "f03 \\\(x" 8 "optimized" { target { aarch64*-*-* } } } }
     ! { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" { target { !aarch64*-*-* } } } }
     ! { dg-final { scan-tree-dump-times "f01 \\\(x" 0 "optimized" { target { aarch64*-*-* } } } }
     a = f04 (x)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
  2024-01-30 14:31 ` [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE Andre Vieira
@ 2024-01-31 12:11   ` Richard Biener
  2024-01-31 12:13     ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2024-01-31 12:11 UTC (permalink / raw)
  To: Andre Vieira; +Cc: gcc-patches, Richard.Sandiford

On Tue, 30 Jan 2024, Andre Vieira wrote:

> 
> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
> target can reject a simd_clone based on the vector mode it is using.
> This is needed because for VLS SVE vectorization the vectorizer accepts
> Advanced SIMD simd clones when vectorizing using SVE types because the simdlens
> might match.  This will cause type errors later on.
> 
> Other targets do not currently need to use this argument.

Can you instead pass down the mode?

> gcc/ChangeLog:
> 
> 	* target.def (TARGET_SIMD_CLONE_USABLE): Add argument.
> 	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass stmt_info to
> 	call TARGET_SIMD_CLONE_USABLE.
> 	* config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add argument
> 	and use it to reject the use of SVE simd clones with Advanced SIMD
> 	modes.
> 	* config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused argument.
> 	* config/i386/i386.cc (ix86_simd_clone_usable): Likewise.
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
  2024-01-31 12:11   ` Richard Biener
@ 2024-01-31 12:13     ` Richard Biener
  2024-01-31 13:52       ` Andre Vieira (lists)
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2024-01-31 12:13 UTC (permalink / raw)
  To: Andre Vieira; +Cc: gcc-patches, Richard.Sandiford

On Wed, 31 Jan 2024, Richard Biener wrote:

> On Tue, 30 Jan 2024, Andre Vieira wrote:
> 
> > 
> > This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
> > target can reject a simd_clone based on the vector mode it is using.
> > This is needed because for VLS SVE vectorization the vectorizer accepts
> > Advanced SIMD simd clones when vectorizing using SVE types because the simdlens
> > might match.  This will cause type errors later on.
> > 
> > Other targets do not currently need to use this argument.
> 
> Can you instead pass down the mode?

Thinking about that again the cgraph_simd_clone info in the clone
should have sufficient information to disambiguate.  If it doesn't
then we should amend it.

Richard.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/3] vect: disable multiple calls of poly simdclones
  2024-01-30 14:31 ` [PATCH 2/3] vect: disable multiple calls of poly simdclones Andre Vieira
@ 2024-01-31 12:13   ` Richard Biener
  0 siblings, 0 replies; 23+ messages in thread
From: Richard Biener @ 2024-01-31 12:13 UTC (permalink / raw)
  To: Andre Vieira; +Cc: gcc-patches, Richard.Sandiford

On Tue, 30 Jan 2024, Andre Vieira wrote:

> 
> The current codegen code to support VF's that are multiples of a simdclone
> simdlen rely on BIT_FIELD_REF to create multiple input vectors.  This does not
> work for non-constant simdclones, so we should disable using such clones when
> the VF is a multiple of the non-constant simdlen until we change the codegen to
> support those.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
> 
> 	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Reject simdclones
> 	with non-constant simdlen when VF is not exactly the same.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
  2024-01-31 12:13     ` Richard Biener
@ 2024-01-31 13:52       ` Andre Vieira (lists)
  2024-01-31 13:58         ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Andre Vieira (lists) @ 2024-01-31 13:52 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard.Sandiford

On 31/01/2024 12:13, Richard Biener wrote:
> On Wed, 31 Jan 2024, Richard Biener wrote:
> 
>> On Tue, 30 Jan 2024, Andre Vieira wrote:
>>
>>>
>>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
>>> target can reject a simd_clone based on the vector mode it is using.
>>> This is needed because for VLS SVE vectorization the vectorizer accepts
>>> Advanced SIMD simd clones when vectorizing using SVE types because the simdlens
>>> might match.  This will cause type errors later on.
>>>
>>> Other targets do not currently need to use this argument.
>>
>> Can you instead pass down the mode?
> 
> Thinking about that again the cgraph_simd_clone info in the clone
> should have sufficient information to disambiguate.  If it doesn't
> then we should amend it.
> 
> Richard.

Hi Richard,

Thanks for the review, I don't think cgraph_simd_clone_info is the right 
place to pass down this information, since this is information about the 
caller rather than the simdclone itself. What we are trying to achieve 
here is making the vectorizer being able to accept or reject simdclones 
based on the ISA we are vectorizing for. To distinguish between SVE and 
Advanced SIMD ISAs we use modes, I am also not sure that's ideal but it 
is what we currently use. So to answer your earlier question, yes I can 
also pass down mode if that's preferable.

Regards,
Andre

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
  2024-01-31 13:52       ` Andre Vieira (lists)
@ 2024-01-31 13:58         ` Richard Biener
  2024-01-31 14:03           ` Richard Biener
  2024-01-31 14:35           ` Andre Vieira (lists)
  0 siblings, 2 replies; 23+ messages in thread
From: Richard Biener @ 2024-01-31 13:58 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard.Sandiford

On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:

> 
> 
> On 31/01/2024 12:13, Richard Biener wrote:
> > On Wed, 31 Jan 2024, Richard Biener wrote:
> > 
> >> On Tue, 30 Jan 2024, Andre Vieira wrote:
> >>
> >>>
> >>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
> >>> target can reject a simd_clone based on the vector mode it is using.
> >>> This is needed because for VLS SVE vectorization the vectorizer accepts
> >>> Advanced SIMD simd clones when vectorizing using SVE types because the
> >>> simdlens
> >>> might match.  This will cause type errors later on.
> >>>
> >>> Other targets do not currently need to use this argument.
> >>
> >> Can you instead pass down the mode?
> > 
> > Thinking about that again the cgraph_simd_clone info in the clone
> > should have sufficient information to disambiguate.  If it doesn't
> > then we should amend it.
> > 
> > Richard.
> 
> Hi Richard,
> 
> Thanks for the review, I don't think cgraph_simd_clone_info is the right place
> to pass down this information, since this is information about the caller
> rather than the simdclone itself. What we are trying to achieve here is making
> the vectorizer being able to accept or reject simdclones based on the ISA we
> are vectorizing for. To distinguish between SVE and Advanced SIMD ISAs we use
> modes, I am also not sure that's ideal but it is what we currently use. So to
> answer your earlier question, yes I can also pass down mode if that's
> preferable.

Note cgraph_simd_clone_info has simdlen and we seem to check elsewhere
whether that's POLY or constant.  I wonder how aarch64_sve_mode_p
comes into play here which in the end classifies VLS SVE modes as
non-SVE?

> Regards,
> Andre
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
  2024-01-31 13:58         ` Richard Biener
@ 2024-01-31 14:03           ` Richard Biener
  2024-01-31 16:13             ` Andre Vieira (lists)
  2024-01-31 14:35           ` Andre Vieira (lists)
  1 sibling, 1 reply; 23+ messages in thread
From: Richard Biener @ 2024-01-31 14:03 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard.Sandiford

On Wed, 31 Jan 2024, Richard Biener wrote:

> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
> 
> > 
> > 
> > On 31/01/2024 12:13, Richard Biener wrote:
> > > On Wed, 31 Jan 2024, Richard Biener wrote:
> > > 
> > >> On Tue, 30 Jan 2024, Andre Vieira wrote:
> > >>
> > >>>
> > >>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
> > >>> target can reject a simd_clone based on the vector mode it is using.
> > >>> This is needed because for VLS SVE vectorization the vectorizer accepts
> > >>> Advanced SIMD simd clones when vectorizing using SVE types because the
> > >>> simdlens
> > >>> might match.  This will cause type errors later on.
> > >>>
> > >>> Other targets do not currently need to use this argument.
> > >>
> > >> Can you instead pass down the mode?
> > > 
> > > Thinking about that again the cgraph_simd_clone info in the clone
> > > should have sufficient information to disambiguate.  If it doesn't
> > > then we should amend it.
> > > 
> > > Richard.
> > 
> > Hi Richard,
> > 
> > Thanks for the review, I don't think cgraph_simd_clone_info is the right place
> > to pass down this information, since this is information about the caller
> > rather than the simdclone itself. What we are trying to achieve here is making
> > the vectorizer being able to accept or reject simdclones based on the ISA we
> > are vectorizing for. To distinguish between SVE and Advanced SIMD ISAs we use
> > modes, I am also not sure that's ideal but it is what we currently use. So to
> > answer your earlier question, yes I can also pass down mode if that's
> > preferable.
> 
> Note cgraph_simd_clone_info has simdlen and we seem to check elsewhere
> whether that's POLY or constant.  I wonder how aarch64_sve_mode_p
> comes into play here which in the end classifies VLS SVE modes as
> non-SVE?

Maybe it's just a bit non-obvious as you key on mangling:

 static int
-aarch64_simd_clone_usable (struct cgraph_node *node)
+aarch64_simd_clone_usable (struct cgraph_node *node, stmt_vec_info 
stmt_vinfo)
 {
   switch (node->simdclone->vecsize_mangle)
     {
     case 'n':
       if (!TARGET_SIMD)
        return -1;
+      if (STMT_VINFO_VECTYPE (stmt_vinfo)
+         && aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE 
(stmt_vinfo))))
+       return -1;

?  What does 'n' mean?  It's documented as

  /* The mangling character for a given vector size.  This is used
     to determine the ISA mangling bit as specified in the Intel
     Vector ABI.  */
  unsigned char vecsize_mangle;

which is slightly misleading.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
  2024-01-31 14:35           ` Andre Vieira (lists)
@ 2024-01-31 14:35             ` Richard Biener
  2024-01-31 16:36               ` Andre Vieira (lists)
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2024-01-31 14:35 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard.Sandiford

On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:

> 
> 
> On 31/01/2024 13:58, Richard Biener wrote:
> > On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
> > 
> >>
> >>
> >> On 31/01/2024 12:13, Richard Biener wrote:
> >>> On Wed, 31 Jan 2024, Richard Biener wrote:
> >>>
> >>>> On Tue, 30 Jan 2024, Andre Vieira wrote:
> >>>>
> >>>>>
> >>>>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure
> >>>>> the
> >>>>> target can reject a simd_clone based on the vector mode it is using.
> >>>>> This is needed because for VLS SVE vectorization the vectorizer accepts
> >>>>> Advanced SIMD simd clones when vectorizing using SVE types because the
> >>>>> simdlens
> >>>>> might match.  This will cause type errors later on.
> >>>>>
> >>>>> Other targets do not currently need to use this argument.
> >>>>
> >>>> Can you instead pass down the mode?
> >>>
> >>> Thinking about that again the cgraph_simd_clone info in the clone
> >>> should have sufficient information to disambiguate.  If it doesn't
> >>> then we should amend it.
> >>>
> >>> Richard.
> >>
> >> Hi Richard,
> >>
> >> Thanks for the review, I don't think cgraph_simd_clone_info is the right
> >> place
> >> to pass down this information, since this is information about the caller
> >> rather than the simdclone itself. What we are trying to achieve here is
> >> making
> >> the vectorizer being able to accept or reject simdclones based on the ISA
> >> we
> >> are vectorizing for. To distinguish between SVE and Advanced SIMD ISAs we
> >> use
> >> modes, I am also not sure that's ideal but it is what we currently use. So
> >> to
> >> answer your earlier question, yes I can also pass down mode if that's
> >> preferable.
> > 
> > Note cgraph_simd_clone_info has simdlen and we seem to check elsewhere
> > whether that's POLY or constant.  I wonder how aarch64_sve_mode_p
> > comes into play here which in the end classifies VLS SVE modes as
> > non-SVE?
> > 
> 
> Using -msve-vector-bits=128
> (gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))
> $4 = E_VNx4SImode
> (gdb) p  TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo))
> $5 = (tree) 0xfffff741c1b0
> (gdb) p debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)))
> 128
> (gdb) p aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo)))
> $5 = true
> 
> and for reference without vls codegen:
> (gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))
> $1 = E_VNx4SImode
> (gdb) p  debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)))
> POLY_INT_CST [128, 128]
> 
> Having said that I believe that the USABLE targethook implementation for
> aarch64 should also block other uses, like an Advanced SIMD mode being used as
> input for a SVE VLS SIMDCLONE. The reason being that for instance 'half'
> registers like VNx2SI are packed differently from V2SI.
> 
> We could teach the vectorizer to support these of course, but that requires
> more work and is not extremely useful just yet. I'll add the extra check that
> to the patch once we agree on how to pass down the information we need. Happy
> to use either mode, or stmt_vec_info and extract the mode from it like it does
> now.

As said, please pass down 'mode'.  But I wonder how to document it,
which mode is that supposed to be?  Any of result or any argument
mode that happens to be a vector?  I think that we might be able
to mix Advanced SIMD modes and SVE modes with -msve-vector-bits=128
in the same loop?

Are the simd clones you don't want to use with -msve-vector-bits=128
having constant simdlen?  If so why do you generate them in the first
place?

That said, I wonder how we end up mixing things up in the first place.

Richard.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
  2024-01-31 13:58         ` Richard Biener
  2024-01-31 14:03           ` Richard Biener
@ 2024-01-31 14:35           ` Andre Vieira (lists)
  2024-01-31 14:35             ` Richard Biener
  1 sibling, 1 reply; 23+ messages in thread
From: Andre Vieira (lists) @ 2024-01-31 14:35 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard.Sandiford



On 31/01/2024 13:58, Richard Biener wrote:
> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
> 
>>
>>
>> On 31/01/2024 12:13, Richard Biener wrote:
>>> On Wed, 31 Jan 2024, Richard Biener wrote:
>>>
>>>> On Tue, 30 Jan 2024, Andre Vieira wrote:
>>>>
>>>>>
>>>>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
>>>>> target can reject a simd_clone based on the vector mode it is using.
>>>>> This is needed because for VLS SVE vectorization the vectorizer accepts
>>>>> Advanced SIMD simd clones when vectorizing using SVE types because the
>>>>> simdlens
>>>>> might match.  This will cause type errors later on.
>>>>>
>>>>> Other targets do not currently need to use this argument.
>>>>
>>>> Can you instead pass down the mode?
>>>
>>> Thinking about that again the cgraph_simd_clone info in the clone
>>> should have sufficient information to disambiguate.  If it doesn't
>>> then we should amend it.
>>>
>>> Richard.
>>
>> Hi Richard,
>>
>> Thanks for the review, I don't think cgraph_simd_clone_info is the right place
>> to pass down this information, since this is information about the caller
>> rather than the simdclone itself. What we are trying to achieve here is making
>> the vectorizer being able to accept or reject simdclones based on the ISA we
>> are vectorizing for. To distinguish between SVE and Advanced SIMD ISAs we use
>> modes, I am also not sure that's ideal but it is what we currently use. So to
>> answer your earlier question, yes I can also pass down mode if that's
>> preferable.
> 
> Note cgraph_simd_clone_info has simdlen and we seem to check elsewhere
> whether that's POLY or constant.  I wonder how aarch64_sve_mode_p
> comes into play here which in the end classifies VLS SVE modes as
> non-SVE?
> 

Using -msve-vector-bits=128
(gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))
$4 = E_VNx4SImode
(gdb) p  TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo))
$5 = (tree) 0xfffff741c1b0
(gdb) p debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)))
128
(gdb) p aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo)))
$5 = true

and for reference without vls codegen:
(gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))
$1 = E_VNx4SImode
(gdb) p  debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)))
POLY_INT_CST [128, 128]

Having said that I believe that the USABLE targethook implementation for 
aarch64 should also block other uses, like an Advanced SIMD mode being 
used as input for a SVE VLS SIMDCLONE. The reason being that for 
instance 'half' registers like VNx2SI are packed differently from V2SI.

We could teach the vectorizer to support these of course, but that 
requires more work and is not extremely useful just yet. I'll add the 
extra check that to the patch once we agree on how to pass down the 
information we need. Happy to use either mode, or stmt_vec_info and 
extract the mode from it like it does now.

>> Regards,
>> Andre
>>
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
  2024-01-31 14:03           ` Richard Biener
@ 2024-01-31 16:13             ` Andre Vieira (lists)
  0 siblings, 0 replies; 23+ messages in thread
From: Andre Vieira (lists) @ 2024-01-31 16:13 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard.Sandiford



On 31/01/2024 14:03, Richard Biener wrote:
> On Wed, 31 Jan 2024, Richard Biener wrote:
> 
>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
>>
>>>
>>>
>>> On 31/01/2024 12:13, Richard Biener wrote:
>>>> On Wed, 31 Jan 2024, Richard Biener wrote:
>>>>
>>>>> On Tue, 30 Jan 2024, Andre Vieira wrote:
>>>>>
>>>>>>
>>>>>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
>>>>>> target can reject a simd_clone based on the vector mode it is using.
>>>>>> This is needed because for VLS SVE vectorization the vectorizer accepts
>>>>>> Advanced SIMD simd clones when vectorizing using SVE types because the
>>>>>> simdlens
>>>>>> might match.  This will cause type errors later on.
>>>>>>
>>>>>> Other targets do not currently need to use this argument.
>>>>>
>>>>> Can you instead pass down the mode?
>>>>
>>>> Thinking about that again the cgraph_simd_clone info in the clone
>>>> should have sufficient information to disambiguate.  If it doesn't
>>>> then we should amend it.
>>>>
>>>> Richard.
>>>
>>> Hi Richard,
>>>
>>> Thanks for the review, I don't think cgraph_simd_clone_info is the right place
>>> to pass down this information, since this is information about the caller
>>> rather than the simdclone itself. What we are trying to achieve here is making
>>> the vectorizer being able to accept or reject simdclones based on the ISA we
>>> are vectorizing for. To distinguish between SVE and Advanced SIMD ISAs we use
>>> modes, I am also not sure that's ideal but it is what we currently use. So to
>>> answer your earlier question, yes I can also pass down mode if that's
>>> preferable.
>>
>> Note cgraph_simd_clone_info has simdlen and we seem to check elsewhere
>> whether that's POLY or constant.  I wonder how aarch64_sve_mode_p
>> comes into play here which in the end classifies VLS SVE modes as
>> non-SVE?
> 
> Maybe it's just a bit non-obvious as you key on mangling:
> 
>   static int
> -aarch64_simd_clone_usable (struct cgraph_node *node)
> +aarch64_simd_clone_usable (struct cgraph_node *node, stmt_vec_info
> stmt_vinfo)
>   {
>     switch (node->simdclone->vecsize_mangle)
>       {
>       case 'n':
>         if (!TARGET_SIMD)
>          return -1;
> +      if (STMT_VINFO_VECTYPE (stmt_vinfo)
> +         && aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE
> (stmt_vinfo))))
> +       return -1;
> 
> ?  What does 'n' mean?  It's documented as
> 
>    /* The mangling character for a given vector size.  This is used
>       to determine the ISA mangling bit as specified in the Intel
>       Vector ABI.  */
>    unsigned char vecsize_mangle;

I'll update the comment, but yeh 'n' is for Advanced SIMD, 's' is for SVE.
> 
> which is slightly misleading.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
  2024-01-31 14:35             ` Richard Biener
@ 2024-01-31 16:36               ` Andre Vieira (lists)
  2024-02-01  7:19                 ` Richard Biener
  2024-02-01  7:59                 ` Richard Sandiford
  0 siblings, 2 replies; 23+ messages in thread
From: Andre Vieira (lists) @ 2024-01-31 16:36 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard.Sandiford



On 31/01/2024 14:35, Richard Biener wrote:
> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
> 
>>
>>
>> On 31/01/2024 13:58, Richard Biener wrote:
>>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
>>>
>>>>
>>>>
>>>> On 31/01/2024 12:13, Richard Biener wrote:
>>>>> On Wed, 31 Jan 2024, Richard Biener wrote:
>>>>>
>>>>>> On Tue, 30 Jan 2024, Andre Vieira wrote:
>>>>>>
>>>>>>>
>>>>>>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure
>>>>>>> the
>>>>>>> target can reject a simd_clone based on the vector mode it is using.
>>>>>>> This is needed because for VLS SVE vectorization the vectorizer accepts
>>>>>>> Advanced SIMD simd clones when vectorizing using SVE types because the
>>>>>>> simdlens
>>>>>>> might match.  This will cause type errors later on.
>>>>>>>
>>>>>>> Other targets do not currently need to use this argument.
>>>>>>
>>>>>> Can you instead pass down the mode?
>>>>>
>>>>> Thinking about that again the cgraph_simd_clone info in the clone
>>>>> should have sufficient information to disambiguate.  If it doesn't
>>>>> then we should amend it.
>>>>>
>>>>> Richard.
>>>>
>>>> Hi Richard,
>>>>
>>>> Thanks for the review, I don't think cgraph_simd_clone_info is the right
>>>> place
>>>> to pass down this information, since this is information about the caller
>>>> rather than the simdclone itself. What we are trying to achieve here is
>>>> making
>>>> the vectorizer being able to accept or reject simdclones based on the ISA
>>>> we
>>>> are vectorizing for. To distinguish between SVE and Advanced SIMD ISAs we
>>>> use
>>>> modes, I am also not sure that's ideal but it is what we currently use. So
>>>> to
>>>> answer your earlier question, yes I can also pass down mode if that's
>>>> preferable.
>>>
>>> Note cgraph_simd_clone_info has simdlen and we seem to check elsewhere
>>> whether that's POLY or constant.  I wonder how aarch64_sve_mode_p
>>> comes into play here which in the end classifies VLS SVE modes as
>>> non-SVE?
>>>
>>
>> Using -msve-vector-bits=128
>> (gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))
>> $4 = E_VNx4SImode
>> (gdb) p  TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo))
>> $5 = (tree) 0xfffff741c1b0
>> (gdb) p debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)))
>> 128
>> (gdb) p aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo)))
>> $5 = true
>>
>> and for reference without vls codegen:
>> (gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))
>> $1 = E_VNx4SImode
>> (gdb) p  debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)))
>> POLY_INT_CST [128, 128]
>>
>> Having said that I believe that the USABLE targethook implementation for
>> aarch64 should also block other uses, like an Advanced SIMD mode being used as
>> input for a SVE VLS SIMDCLONE. The reason being that for instance 'half'
>> registers like VNx2SI are packed differently from V2SI.
>>
>> We could teach the vectorizer to support these of course, but that requires
>> more work and is not extremely useful just yet. I'll add the extra check that
>> to the patch once we agree on how to pass down the information we need. Happy
>> to use either mode, or stmt_vec_info and extract the mode from it like it does
>> now.
> 
> As said, please pass down 'mode'.  But I wonder how to document it,
> which mode is that supposed to be?  Any of result or any argument
> mode that happens to be a vector?  I think that we might be able
> to mix Advanced SIMD modes and SVE modes with -msve-vector-bits=128
> in the same loop?
> 
> Are the simd clones you don't want to use with -msve-vector-bits=128
> having constant simdlen?  If so why do you generate them in the first
> place?

So this is where things get a bit confusing and I will write up some 
text for these cases to put in our ABI document (currently in Beta and 
in need of some tlc).

Our intended behaviour is for a 'declare simd' without a simdlen to 
generate simdclones for:
* Advanced SIMD 128 and 64-bit vectors, where possible (we don't allow 
for simdlen 1, Tamar fixed that in gcc recently),
* SVE VLA vectors.

Let me illustrate this with an example:

__attribute__ ((simd (notinbranch), const)) float cosf(float);

Should tell the compiler the following simd clones are available:
__ZGVnN4v_cosf 128-bit 4x4 float Advanced SIMD clone
__ZGVnN2v_cosf 64-bit  4x2 float Advanced SIMD clone
__ZGVsMxv_cosf [128, 128]-bit 4x4xN SVE SIMD clone

[To save you looking into the abi let me break this down, _ZGV is 
prefix, then 'n' or 's' picks between Advanced SIMD and SVE, 'N' or 'M' 
picks between Not Masked and Masked (SVE is always masked even if we ask 
for notinbranch), then a digit or 'x' picks between Vector Length or 
VLA, and after that you get a letter per argument, where v = vector mapped]

Regardless of -msve-vector-bits, however, the vectorizer (and any other 
part of the compiler) may assume that the VL of the VLA SVE clone is 
that specified by -msve-vector-bits, which if the clone is written in a 
VLA way will still work.

If the attribute is used with a function definition rather than 
declaration, so:

__attribute__ ((simd (notinbranch), const)) float fn0(float a)
{
   return a + 1.0f;
}

the compiler should again generate the three simd clones:
__ZGVnN4v_fn0 128-bit 4x4 float Advanced SIMD clone
__ZGVnN2v_fn0 64-bit  4x2 float Advanced SIMD clone
__ZGVsMxv_fn0 [128, 128]-bit 4x4xN SVE SIMD clone

However, in the last one it may assume a VL for the codegen of the body 
and it's the user's responsibility to only use it for targets with that 
length , much like any other code produced this way.

So that's what we tell the compiler is available and what the compiler 
generates depending on where we use the attribute. The question at hand 
here is, what can the vectorizer use for a specific loop. If we are 
using Advanced SIMD modes then it needs to call an Advanced SIMD clone, 
and if we are using SVE modes then it needs to call an SVE clone. At 
least until we support the ABI conversion, because like I said for an 
unpacked argument they behave differently.

PS: In the future OpenMP may add specifications that allow us to define 
a specific VLA simdlen... in other words, whether we want [128, 128] or 
[256, 256], [512, 512] ... etc, but that still needs agreement on the 
OpenMP Spec, which is why for now we piggy back on the simdlen-less 
definition to provide us a VLA SVE simdclone with [128, 128] VL.

Hopefully this makes things a bit clearer :/
> 
> That said, I wonder how we end up mixing things up in the first place.
> 
> Richard.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
  2024-01-31 16:36               ` Andre Vieira (lists)
@ 2024-02-01  7:19                 ` Richard Biener
  2024-02-01 17:01                   ` Andre Vieira (lists)
  2024-02-01  7:59                 ` Richard Sandiford
  1 sibling, 1 reply; 23+ messages in thread
From: Richard Biener @ 2024-02-01  7:19 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard.Sandiford

On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:

> 
> 
> On 31/01/2024 14:35, Richard Biener wrote:
> > On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
> > 
> >>
> >>
> >> On 31/01/2024 13:58, Richard Biener wrote:
> >>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
> >>>
> >>>>
> >>>>
> >>>> On 31/01/2024 12:13, Richard Biener wrote:
> >>>>> On Wed, 31 Jan 2024, Richard Biener wrote:
> >>>>>
> >>>>>> On Tue, 30 Jan 2024, Andre Vieira wrote:
> >>>>>>
> >>>>>>>
> >>>>>>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure
> >>>>>>> the
> >>>>>>> target can reject a simd_clone based on the vector mode it is using.
> >>>>>>> This is needed because for VLS SVE vectorization the vectorizer
> >>>>>>> accepts
> >>>>>>> Advanced SIMD simd clones when vectorizing using SVE types because the
> >>>>>>> simdlens
> >>>>>>> might match.  This will cause type errors later on.
> >>>>>>>
> >>>>>>> Other targets do not currently need to use this argument.
> >>>>>>
> >>>>>> Can you instead pass down the mode?
> >>>>>
> >>>>> Thinking about that again the cgraph_simd_clone info in the clone
> >>>>> should have sufficient information to disambiguate.  If it doesn't
> >>>>> then we should amend it.
> >>>>>
> >>>>> Richard.
> >>>>
> >>>> Hi Richard,
> >>>>
> >>>> Thanks for the review, I don't think cgraph_simd_clone_info is the right
> >>>> place
> >>>> to pass down this information, since this is information about the caller
> >>>> rather than the simdclone itself. What we are trying to achieve here is
> >>>> making
> >>>> the vectorizer being able to accept or reject simdclones based on the ISA
> >>>> we
> >>>> are vectorizing for. To distinguish between SVE and Advanced SIMD ISAs we
> >>>> use
> >>>> modes, I am also not sure that's ideal but it is what we currently use.
> >>>> So
> >>>> to
> >>>> answer your earlier question, yes I can also pass down mode if that's
> >>>> preferable.
> >>>
> >>> Note cgraph_simd_clone_info has simdlen and we seem to check elsewhere
> >>> whether that's POLY or constant.  I wonder how aarch64_sve_mode_p
> >>> comes into play here which in the end classifies VLS SVE modes as
> >>> non-SVE?
> >>>
> >>
> >> Using -msve-vector-bits=128
> >> (gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))
> >> $4 = E_VNx4SImode
> >> (gdb) p  TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo))
> >> $5 = (tree) 0xfffff741c1b0
> >> (gdb) p debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)))
> >> 128
> >> (gdb) p aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo)))
> >> $5 = true
> >>
> >> and for reference without vls codegen:
> >> (gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))
> >> $1 = E_VNx4SImode
> >> (gdb) p  debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)))
> >> POLY_INT_CST [128, 128]
> >>
> >> Having said that I believe that the USABLE targethook implementation for
> >> aarch64 should also block other uses, like an Advanced SIMD mode being used
> >> as
> >> input for a SVE VLS SIMDCLONE. The reason being that for instance 'half'
> >> registers like VNx2SI are packed differently from V2SI.
> >>
> >> We could teach the vectorizer to support these of course, but that requires
> >> more work and is not extremely useful just yet. I'll add the extra check
> >> that
> >> to the patch once we agree on how to pass down the information we need.
> >> Happy
> >> to use either mode, or stmt_vec_info and extract the mode from it like it
> >> does
> >> now.
> > 
> > As said, please pass down 'mode'.  But I wonder how to document it,
> > which mode is that supposed to be?  Any of result or any argument
> > mode that happens to be a vector?  I think that we might be able
> > to mix Advanced SIMD modes and SVE modes with -msve-vector-bits=128
> > in the same loop?
> > 
> > Are the simd clones you don't want to use with -msve-vector-bits=128
> > having constant simdlen?  If so why do you generate them in the first
> > place?
> 
> So this is where things get a bit confusing and I will write up some text for
> these cases to put in our ABI document (currently in Beta and in need of some
> tlc).
> 
> Our intended behaviour is for a 'declare simd' without a simdlen to generate
> simdclones for:
> * Advanced SIMD 128 and 64-bit vectors, where possible (we don't allow for
> simdlen 1, Tamar fixed that in gcc recently),
> * SVE VLA vectors.
> 
> Let me illustrate this with an example:
> 
> __attribute__ ((simd (notinbranch), const)) float cosf(float);
> 
> Should tell the compiler the following simd clones are available:
> __ZGVnN4v_cosf 128-bit 4x4 float Advanced SIMD clone
> __ZGVnN2v_cosf 64-bit  4x2 float Advanced SIMD clone
> __ZGVsMxv_cosf [128, 128]-bit 4x4xN SVE SIMD clone
> 
> [To save you looking into the abi let me break this down, _ZGV is prefix, then
> 'n' or 's' picks between Advanced SIMD and SVE, 'N' or 'M' picks between Not
> Masked and Masked (SVE is always masked even if we ask for notinbranch), then
> a digit or 'x' picks between Vector Length or VLA, and after that you get a
> letter per argument, where v = vector mapped]
> 
> Regardless of -msve-vector-bits, however, the vectorizer (and any other part
> of the compiler) may assume that the VL of the VLA SVE clone is that specified
> by -msve-vector-bits, which if the clone is written in a VLA way will still
> work.
> 
> If the attribute is used with a function definition rather than declaration,
> so:
> 
> __attribute__ ((simd (notinbranch), const)) float fn0(float a)
> {
>   return a + 1.0f;
> }
> 
> the compiler should again generate the three simd clones:
> __ZGVnN4v_fn0 128-bit 4x4 float Advanced SIMD clone
> __ZGVnN2v_fn0 64-bit  4x2 float Advanced SIMD clone
> __ZGVsMxv_fn0 [128, 128]-bit 4x4xN SVE SIMD clone
> 
> However, in the last one it may assume a VL for the codegen of the body and
> it's the user's responsibility to only use it for targets with that length ,
> much like any other code produced this way.
> 
> So that's what we tell the compiler is available and what the compiler
> generates depending on where we use the attribute. The question at hand here
> is, what can the vectorizer use for a specific loop. If we are using Advanced
> SIMD modes then it needs to call an Advanced SIMD clone, and if we are using
> SVE modes then it needs to call an SVE clone. At least until we support the
> ABI conversion, because like I said for an unpacked argument they behave
> differently.
> 
> PS: In the future OpenMP may add specifications that allow us to define a
> specific VLA simdlen... in other words, whether we want [128, 128] or [256,
> 256], [512, 512] ... etc, but that still needs agreement on the OpenMP Spec,
> which is why for now we piggy back on the simdlen-less definition to provide
> us a VLA SVE simdclone with [128, 128] VL.
> 
> Hopefully this makes things a bit clearer :/

So where does it go wrong?  What case does the patch fix?  For
the non-definition case the SVE clone should have a POLY_INT simdlen
and as you say it should be fine to use that even with -msve-vector-bits.
For the definition case the SVE clone might have a constant simdlen
but so does the caller (unless we allow different setting between
functions/TUs?).  The only thing the vectorizer looks at is I think

        if (!constant_multiple_p (vf * group_size, n->simdclone->simdlen,
                                  &num_calls)
            || (!n->simdclone->inbranch && (masked_call_offset > 0))
            || (nargs != simd_nargs))
          continue;

plus your 2nd patch rejecting num_calls > 1 for variable-length SVE.

The patch didn't come with a testcase so it's really hard to tell
what goes wrong now and how it is fixed ...

Richard.

> > 
> > That said, I wonder how we end up mixing things up in the first place.
> > 
> > Richard.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
  2024-01-31 16:36               ` Andre Vieira (lists)
  2024-02-01  7:19                 ` Richard Biener
@ 2024-02-01  7:59                 ` Richard Sandiford
  1 sibling, 0 replies; 23+ messages in thread
From: Richard Sandiford @ 2024-02-01  7:59 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: Richard Biener, gcc-patches

"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> [...] The question at hand 
> here is, what can the vectorizer use for a specific loop. If we are 
> using Advanced SIMD modes then it needs to call an Advanced SIMD clone, 
> and if we are using SVE modes then it needs to call an SVE clone. At 
> least until we support the ABI conversion, because like I said for an 
> unpacked argument they behave differently.

Probably also worth noting that multi-byte elements are laid out
differently for big-endian.  E.g. V4SI is loaded as a 128-bit integer
whereas VNx4SI is loaded as an array of 4 32-bit integers, with the
first 32-bit integer going in the least significant bits of the register.

So it would only be possible to use Advanced SIMD clones for SVE modes
and vice versa for little-endian, or if the elements are all bytes,
or if we add some reverses to the inputs and outputs.

Richard

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
  2024-02-01  7:19                 ` Richard Biener
@ 2024-02-01 17:01                   ` Andre Vieira (lists)
  2024-02-05  9:56                     ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Andre Vieira (lists) @ 2024-02-01 17:01 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard.Sandiford

On 01/02/2024 07:19, Richard Biener wrote:
> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
> 
> 
> The patch didn't come with a testcase so it's really hard to tell
> what goes wrong now and how it is fixed ...

My bad! I had a testcase locally but never added it...

However... now I look at it and ran it past Richard S, the codegen isn't 
'wrong', but it does have the potential to lead to some pretty slow 
codegen, especially for inbranch simdclones where it transforms the SVE 
predicate into an Advanced SIMD vector by inserting the elements one at 
a time...

An example of which can be seen if you do:

gcc -O3 -march=armv8-a+sve -msve-vector-bits=128  -fopenmp-simd t.c -S

with the following t.c:
#pragma omp declare simd simdlen(4) inbranch
int __attribute__ ((const)) fn5(int);

void fn4 (int *a, int *b, int n)
{
     for (int i = 0; i < n; ++i)
         b[i] = fn5(a[i]);
}

Now I do have to say, for our main usecase of libmvec we won't have any 
'inbranch' Advanced SIMD clones, so we avoid that issue... But of course 
that doesn't mean user-code will.

I'm gonna remove this patch and run another test regression to see if it 
catches anything weird, but if not then I guess we do have the option to 
not use this patch and aim to solve the costing or codegen issue in 
GCC-15. We don't currently do any simdclone costing and I don't have a 
clear suggestion for how given openmp has no mechanism that I know off 
to expose the speedup of a simdclone over it's scalar variant, so how 
would we 'compare' a simdclone call with extra overhead of argument 
preparation vs scalar, though at least we could prefer a call to a 
different simdclone with less argument preparation. Anyways I digress.

Other tests, these require aarch64-autovec-preference=2 so that also has 
me worried less...

gcc -O3 -march=armv8-a+sve -msve-vector-bits=128 --param 
aarch64-autovec-preference=2 -fopenmp-simd t.c -S

t.c:
#pragma omp declare simd simdlen(2) notinbranch
float __attribute__ ((const)) fn1(double);

void fn0 (float *a, float *b, int n)
{
     for (int i = 0; i < n; ++i)
         b[i] = fn1((double) a[i]);
}

#pragma omp declare simd simdlen(2) notinbranch
float __attribute__ ((const)) fn3(float);

void fn2 (float *a, double *b, int n)
{
     for (int i = 0; i < n; ++i)
         b[i] = (double) fn3(a[i]);
}

> Richard.
> 
>>>
>>> That said, I wonder how we end up mixing things up in the first place.
>>>
>>> Richard.
>>
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/3] aarch64: Add SVE support for simd clones [PR 96342]
  2024-01-30 14:31 ` [PATCH 3/3] aarch64: Add SVE support for simd clones [PR 96342] Andre Vieira
@ 2024-02-01 21:59   ` Richard Sandiford
  0 siblings, 0 replies; 23+ messages in thread
From: Richard Sandiford @ 2024-02-01 21:59 UTC (permalink / raw)
  To: Andre Vieira; +Cc: gcc-patches, rguenther, jakub

Andre Vieira <andre.simoesdiasvieira@arm.com> writes:
> This patch finalizes adding support for the generation of SVE simd clones when
> no simdlen is provided, following the ABI rules where the widest data type
> determines the minimum amount of elements in a length agnostic vector.
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-protos.h (add_sve_type_attribute): Declare.
> 	* config/aarch64/aarch64-sve-builtins.cc (add_sve_type_attribute): Make
> 	visibility global and support use for non_acle types.
> 	* config/aarch64/aarch64.cc
> 	(aarch64_simd_clone_compute_vecsize_and_simdlen): Create VLA simd clone
> 	when no simdlen is provided, according to ABI rules.
> 	(simd_clone_adjust_sve_vector_type): New helper function.
> 	(aarch64_simd_clone_adjust): Add '+sve' attribute to SVE simd clones
> 	and modify types to use SVE types.
> 	* omp-simd-clone.cc (simd_clone_mangle): Print 'x' for VLA simdlen.
> 	(simd_clone_adjust): Adapt safelen check to be compatible with VLA
> 	simdlen.
>
> gcc/testsuite/ChangeLog:
>
> 	* c-c++-common/gomp/declare-variant-14.c: Make i?86 and x86_64 target
> 	only test.
> 	* gfortran.dg/gomp/declare-variant-14.f90: Likewise.
> 	* gcc.target/aarch64/declare-simd-2.c: Add SVE clone scan.
> 	* gcc.target/aarch64/vect-simd-clone-1.c: New test.
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index a0b142e0b94..207396de0ff 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -1031,6 +1031,8 @@ namespace aarch64_sve {
>  #ifdef GCC_TARGET_H
>    bool verify_type_context (location_t, type_context_kind, const_tree, bool);
>  #endif
> + void add_sve_type_attribute (tree, unsigned int, unsigned int,
> +			      const char *, const char *);
>  }
>  
>  extern void aarch64_split_combinev16qi (rtx operands[3]);
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 11f5c5c500c..747131e684e 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -953,14 +953,16 @@ static bool reported_missing_registers_p;
>  /* Record that TYPE is an ABI-defined SVE type that contains NUM_ZR SVE vectors
>     and NUM_PR SVE predicates.  MANGLED_NAME, if nonnull, is the ABI-defined
>     mangling of the type.  ACLE_NAME is the <arm_sve.h> name of the type.  */
> -static void
> +void
>  add_sve_type_attribute (tree type, unsigned int num_zr, unsigned int num_pr,
>  			const char *mangled_name, const char *acle_name)
>  {
>    tree mangled_name_tree
>      = (mangled_name ? get_identifier (mangled_name) : NULL_TREE);
> +  tree acle_name_tree
> +    = (acle_name ? get_identifier (acle_name) : NULL_TREE);
>  
> -  tree value = tree_cons (NULL_TREE, get_identifier (acle_name), NULL_TREE);
> +  tree value = tree_cons (NULL_TREE, acle_name_tree, NULL_TREE);
>    value = tree_cons (NULL_TREE, mangled_name_tree, value);
>    value = tree_cons (NULL_TREE, size_int (num_pr), value);
>    value = tree_cons (NULL_TREE, size_int (num_zr), value);
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 31617510160..cba8879ab33 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -28527,7 +28527,7 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>  					int num, bool explicit_p)
>  {
>    tree t, ret_type;
> -  unsigned int nds_elt_bits;
> +  unsigned int nds_elt_bits, wds_elt_bits;
>    unsigned HOST_WIDE_INT const_simdlen;
>  
>    if (!TARGET_SIMD)
> @@ -28572,10 +28572,14 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>    if (TREE_CODE (ret_type) != VOID_TYPE)
>      {
>        nds_elt_bits = lane_size (SIMD_CLONE_ARG_TYPE_VECTOR, ret_type);
> +      wds_elt_bits = nds_elt_bits;
>        vec_elts.safe_push (std::make_pair (ret_type, nds_elt_bits));
>      }
>    else
> -    nds_elt_bits = POINTER_SIZE;
> +    {
> +      nds_elt_bits = POINTER_SIZE;
> +      wds_elt_bits = 0;
> +    }
>  
>    int i;
>    tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
> @@ -28583,44 +28587,72 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>    for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0;
>         t && t != void_list_node; t = TREE_CHAIN (t), i++)
>      {
> -      tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
> +      tree type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
>        if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM
> -	  && !supported_simd_type (arg_type))
> +	  && !supported_simd_type (type))
>  	{
>  	  if (!explicit_p)
>  	    ;
> -	  else if (COMPLEX_FLOAT_TYPE_P (ret_type))
> +	  else if (COMPLEX_FLOAT_TYPE_P (type))
>  	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
>  			"GCC does not currently support argument type %qT "
> -			"for simd", arg_type);
> +			"for simd", type);
>  	  else
>  	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
>  			"unsupported argument type %qT for simd",
> -			arg_type);
> +			type);
>  	  return 0;
>  	}
> -      unsigned lane_bits = lane_size (clonei->args[i].arg_type, arg_type);
> +      unsigned lane_bits = lane_size (clonei->args[i].arg_type, type);
>        if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
> -	vec_elts.safe_push (std::make_pair (arg_type, lane_bits));
> +	vec_elts.safe_push (std::make_pair (type, lane_bits));
>        if (nds_elt_bits > lane_bits)
>  	nds_elt_bits = lane_bits;
> +      if (wds_elt_bits < lane_bits)
> +	wds_elt_bits = lane_bits;
>      }
>  
> -  clonei->vecsize_mangle = 'n';
> +  /* If we could not determine the WDS type from available parameters/return,
> +     then fallback to using uintptr_t.  */
> +  if (wds_elt_bits == 0)
> +    wds_elt_bits = POINTER_SIZE;
> +
>    clonei->mask_mode = VOIDmode;
>    poly_uint64 simdlen;
> -  auto_vec<poly_uint64> simdlens (2);
> +  auto_vec<poly_uint64> simdlens (3);
> +  auto_vec<char> simdmangle (3);

Minor, but I think it'd be neater to use an ad-hoc structure that
contains the mangling prefix and simdlen together, so that only one
vector is needed.  Brace initialization should make it a bit shorter too.

>    /* Keep track of the possible simdlens the clones of this function can have,
>       and check them later to see if we support them.  */
>    if (known_eq (clonei->simdlen, 0U))
>      {
>        simdlen = exact_div (poly_uint64 (64), nds_elt_bits);
>        if (maybe_ne (simdlen, 1U))
> -	simdlens.safe_push (simdlen);
> +	{
> +	  simdlens.safe_push (simdlen);
> +	  simdmangle.safe_push ('n');
> +	}
>        simdlens.safe_push (simdlen * 2);
> +      simdmangle.safe_push ('n');
> +      /* Only create a SVE simd clone if we aren't dealing with an unprototyped
> +	 function.
> +	We have also disabled support for creating SVE simdclones for functions
> +	with function bodies and any simdclones when -msve-vector-bits is used.
> +	TODO: add support for these.  */
> +      if ((DECL_ARGUMENTS (node->decl) != 0
> +	   || type_arg_types != 0)

I think my comment from the previous review still stands:

  This check feels a bit indirect.  Does it work to use:

    if (prototype_p (TREE_TYPE (node->decl)))

  instead?

Or does that not work?

> +	  && !node->definition
> +	  && !aarch64_sve_vg.is_constant ())
> +	{
> +	  poly_uint64 sve_simdlen = aarch64_sve_vg * 64;
> +	  simdlens.safe_push (exact_div (sve_simdlen, wds_elt_bits));

Simpler as:

	  simdlens.safe_push (exact_div (BITS_PER_SVE_VECTOR, wds_elt_bits));

> +	  simdmangle.safe_push ('s');
> +	}
>      }
>    else
> -    simdlens.safe_push (clonei->simdlen);
> +    {
> +      simdlens.safe_push (clonei->simdlen);
> +      simdmangle.safe_push ('n');
> +    }
>  
>    clonei->vecsize_int = 0;
>    clonei->vecsize_float = 0;
> @@ -28638,7 +28670,8 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>      {
>        bool remove_simdlen = false;
>        for (auto elt : vec_elts)
> -	if (known_gt (simdlens[j] * elt.second, 128U))
> +	if (simdmangle[j] == 'n'
> +	    && known_gt (simdlens[j] * elt.second, 128U))
>  	  {
>  	    /* Don't issue a warning for every simdclone when there is no
>  	       specific simdlen clause.  */
> @@ -28651,12 +28684,14 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>  	    break;
>  	  }
>        if (remove_simdlen)
> -	simdlens.ordered_remove (j);
> +	{
> +	  simdlens.ordered_remove (j);
> +	  simdmangle.ordered_remove (j);
> +	}
>        else
>  	j++;
>      }
>  
> -
>    int count = simdlens.length ();
>    if (count == 0)
>      {
> @@ -28675,20 +28710,107 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>  
>    gcc_assert (num < count);
>    clonei->simdlen = simdlens[num];
> +  clonei->vecsize_mangle = simdmangle[num];
> +  /* SVE simdclones always have a Mask, so set inbranch to 1.  */
> +  if (clonei->vecsize_mangle == 's')
> +    clonei->inbranch = 1;
>    return count;
>  }
>  
> +static tree
> +simd_clone_adjust_sve_vector_type (tree type, bool is_mask, poly_uint64 simdlen)
> +{
> +    unsigned int num_zr = 0;

From the previous review:

  Nits: missing function comment.  The body is indented by too many columns.

> +    unsigned int num_pr = 0;
> +    machine_mode vector_mode;
> +    type = TREE_TYPE (type);
> +    scalar_mode scalar_m = as_a <scalar_mode> (TYPE_MODE (type));

SCALAR_TYPE_MODE

> +    gcc_assert (aarch64_sve_data_mode (scalar_m,
> +				       simdlen).exists (&vector_mode));

Better to use require () instead, since gcc_asserts can be compiled out.

> +    type = build_vector_type_for_mode (type, vector_mode);
> +    if (is_mask)
> +      {
> +	type = truth_type_for (type);
> +	num_pr = 1;
> +      }
> +    else
> +      num_zr = 1;
> +
> +    aarch64_sve::add_sve_type_attribute (type, num_zr, num_pr, NULL,
> +					 NULL);

The comment from my previous review still stands:

  Before adding the atttribute, I think we should call:

    type = build_distinct_type_copy (type);

  so that we don't change a type that is already in use, or associate
  any new types with this one.

I think it'd also be worth adding a comment to say why we take this
approach instead of reusing ACLE types.  (The reason being that we need
to handle unpacked vectors as well, which the ACLE doesn't provide.)

> +    return type;
> +}
> +
>  /* Implement TARGET_SIMD_CLONE_ADJUST.  */
>  
>  static void
>  aarch64_simd_clone_adjust (struct cgraph_node *node)
>  {
> -  /* Add aarch64_vector_pcs target attribute to SIMD clones so they
> -     use the correct ABI.  */
> -
>    tree t = TREE_TYPE (node->decl);
> -  TYPE_ATTRIBUTES (t) = make_attribute ("aarch64_vector_pcs", "default",
> -					TYPE_ATTRIBUTES (t));
> +  cl_target_option cur_target;
> +  bool m_old_have_regs_of_mode[MAX_MACHINE_MODE];
> +
> +  if (node->simdclone->vecsize_mangle == 's')
> +    {
> +      tree target = build_string (strlen ("+sve"), "+sve");

Probably worth adding a comment here to say (as you noted in the reply
to the last review) that this is additive and has no effect if SVE (or
higher) is already enabled.

> +      aarch64_option_valid_attribute_p (node->decl, NULL_TREE, target, 0);

I still think it'd be better to assert that this succeeds (via
a gcc_unreachable).  It looks weird to call a _p function and not test
the result.

> +      cl_target_option_save (&cur_target, &global_options, &global_options_set);
> +      tree new_target = DECL_FUNCTION_SPECIFIC_TARGET (node->decl);
> +      cl_target_option_restore (&global_options, &global_options_set,
> +				TREE_TARGET_OPTION (new_target));
> +      aarch64_override_options_internal (&global_options);
> +      memcpy (m_old_have_regs_of_mode, have_regs_of_mode,
> +	      sizeof (have_regs_of_mode));
> +      for (int i = 0; i < NUM_MACHINE_MODES; ++i)
> +	if (aarch64_sve_mode_p ((machine_mode) i))
> +	  have_regs_of_mode[i] = true;

Sorry, just realised I never replied to your question about the
push_cfun/pop_cfun suggestion.  I think the function we'd push is
node->decl, i.e. the one that received the +sve target attribute.

I.e. could we do:

    push_cfun (node->decl);

after aarch64_option_valid_attribute_p and skip the rest?  Then do
pop_cfun as the restoration step.

Does the above work with the:

  /* If what we're processing is the current pragma string then the
     target option node is already stored in target_option_current_node
     by aarch64_pragma_target_parse in aarch64-c.cc.  Use that to avoid
     having to re-parse the string.  This is especially useful to keep
     arm_neon.h compile times down since that header contains a lot
     of intrinsics enclosed in pragmas.  */
  if (!existing_target && args == current_target_pragma)

shortcut in aarch64_override_options_internal?  I have no particular
reason to believe that it wouldn't, just wanted to check...

> +    }
> +  else
> +    {
> +	/* Add aarch64_vector_pcs target attribute to SIMD clones so they
> +	   use the correct ABI.  */
> +	TYPE_ATTRIBUTES (t) = make_attribute ("aarch64_vector_pcs", "default",
> +					      TYPE_ATTRIBUTES (t));
> +    }
> +  cgraph_simd_clone *sc = node->simdclone;
> +
> +  for (unsigned i = 0; i < sc->nargs; ++i)
> +    {
> +      bool is_mask = false;
> +      tree type;
> +      switch (sc->args[i].arg_type)
> +	{
> +	case SIMD_CLONE_ARG_TYPE_MASK:
> +	  is_mask = true;
> +	  gcc_fallthrough ();
> +	case SIMD_CLONE_ARG_TYPE_VECTOR:
> +	case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP:
> +	case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_VARIABLE_STEP:
> +	  type = sc->args[i].vector_type;
> +	  gcc_assert (VECTOR_TYPE_P (type));
> +	  if (node->simdclone->vecsize_mangle == 's')
> +	    type = simd_clone_adjust_sve_vector_type (type, is_mask,
> +						      sc->simdlen);
> +	  else if (is_mask)
> +	    type = truth_type_for (type);
> +	  sc->args[i].vector_type = type;

Probably best to add a break here (or a fall-through if you prefer).

> +	default:
> +	    continue;

Nit: over-indented continue.  But it might as well be a break.

> +	}
> +    }
> +  if (node->simdclone->vecsize_mangle == 's')
> +    {
> +      tree ret_type = TREE_TYPE (t);
> +      if (VECTOR_TYPE_P (ret_type))
> +	TREE_TYPE (t)
> +	  = simd_clone_adjust_sve_vector_type (ret_type, false,
> +					       node->simdclone->simdlen);
> +      /* Restore current options.  */
> +      cl_target_option_restore (&global_options, &global_options_set, &cur_target);
> +      aarch64_override_options_internal (&global_options);
> +      memcpy (have_regs_of_mode, m_old_have_regs_of_mode,
> +	      sizeof (have_regs_of_mode));
> +    }
>  }
>  
>  /* Implement TARGET_SIMD_CLONE_USABLE.  */
> @@ -28705,6 +28827,10 @@ aarch64_simd_clone_usable (struct cgraph_node *node, stmt_vec_info stmt_vinfo)
>  	  && aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))))
>  	return -1;
>        return 0;
> +    case 's':
> +      if (!TARGET_SVE)
> +	return -1;
> +      return 0;
>      default:
>        gcc_unreachable ();
>      }
> diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
> index 864586207ee..066b6217253 100644
> --- a/gcc/omp-simd-clone.cc
> +++ b/gcc/omp-simd-clone.cc
> @@ -541,9 +541,12 @@ simd_clone_mangle (struct cgraph_node *node,
>    pp_string (&pp, "_ZGV");
>    pp_character (&pp, vecsize_mangle);
>    pp_character (&pp, mask);
> -  /* For now, simdlen is always constant, while variable simdlen pp 'n'.  */
> -  unsigned int len = simdlen.to_constant ();
> -  pp_decimal_int (&pp, (len));
> +
> +  unsigned long long len = 0;

unsigned HOST_WIDE_INT

> +  if (simdlen.is_constant (&len))
> +    pp_decimal_int (&pp, (int) (len));
> +  else
> +    pp_character (&pp, 'x');
>  
>    for (n = 0; n < clone_info->nargs; ++n)
>      {
> @@ -1533,8 +1536,8 @@ simd_clone_adjust (struct cgraph_node *node)
>  	 below).  */
>        loop = alloc_loop ();
>        cfun->has_force_vectorize_loops = true;
> -      /* For now, simlen is always constant.  */
> -      loop->safelen = node->simdclone->simdlen.to_constant ();
> +      /* We can assert that safelen is the 'minimum' simdlen.  */
> +      loop->safelen = constant_lower_bound (node->simdclone->simdlen);
>        loop->force_vectorize = true;
>        loop->header = body_bb;
>      }
> diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
> index e3668893afe..2b71869787e 100644
> --- a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
> +++ b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
> @@ -1,6 +1,6 @@
> -/* { dg-do compile { target vect_simd_clones } } */
> +/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && vect_simd_clones } } } */
>  /* { dg-additional-options "-fdump-tree-gimple -fdump-tree-optimized" } */
> -/* { dg-additional-options "-mno-sse3" { target { i?86-*-* x86_64-*-* } } } */
> +/* { dg-additional-options "-mno-sse3" } */

Please get Jakub's OK for this part.  Similarly for the Fortran test.

>  
>  int f01 (int);
>  int f02 (int);
> @@ -15,15 +15,13 @@ int
>  test1 (int x)
>  {
>    /* At gimplification time, we can't decide yet which function to call.  */
> -  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" { target { !aarch64*-*-* } } } } */
> +  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" } } */
>    /* After simd clones are created, the original non-clone test1 shall
>       call f03 (score 6), the sse2/avx/avx2 clones too, but avx512f clones
>       shall call f01 with score 8.  */
>    /* { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } } */
> -  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" { target { !aarch64*-*-* } } } } */
> -  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 10 "optimized" { target { aarch64*-*-* } } } } */
> -  /* { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" { target { !aarch64*-*-* } } } } */
> -  /* { dg-final { scan-tree-dump-times "f01 \\\(x" 0 "optimized" { target { aarch64*-*-* } } } } */
> +  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" } } */
> +  /* { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" } } */
>    int a = f04 (x);
>    int b = f04 (x);
>    return a + b;

This part I feel safer with :)

> diff --git a/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c b/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c
> index e2e80f0c663..2f4d3a866e5 100644
> --- a/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c
> @@ -43,6 +43,7 @@ float f04 (double a)
>  }
>  /* { dg-final { scan-assembler {_ZGVnN2v_f04:} } } */
>  /* { dg-final { scan-assembler {_ZGVnM2v_f04:} } } */
> +/* { dg-final { scan-assembler-not {_ZGVs[0-9a-z]*_f04:} } } */
>  
>  #pragma omp declare simd uniform(a) linear (b)
>  void f05 (short a, short *b, short c)
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-1.c b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-1.c
> new file mode 100644
> index 00000000000..71fd361acec
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-1.c
> @@ -0,0 +1,52 @@
> +/* { dg-do compile }  */
> +/* { dg-options "-std=c99" } */
> +/* { dg-additional-options "-O3 -march=armv8-a+sve -mcpu=neoverse-n2" } */
> +extern int __attribute__ ((simd, const)) fn0 (int);
> +
> +void test_fn0 (int *a, int *b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    a[i] += fn0 (b[i]);
> +}
> +
> +/* { dg-final { scan-assembler {_ZGVsMxv_fn0} } } */
> +
> +extern int __attribute__ ((simd, const)) fn1 (short, int);
> +
> +void test_fn1 (int *a, int *b, short *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    a[i] = fn1 (c[i], b[i]);
> +}
> +
> +/* { dg-final { scan-assembler {_ZGVsMxvv_fn1} } } */
> +
> +extern short __attribute__ ((simd, const)) fn2 (short, int);
> +
> +void test_fn2 (short *a, int *b, short *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    a[i] = fn2 (c[i], b[i]);
> +}
> +
> +/* { dg-final { scan-assembler {_ZGVsMxvv_fn2} } } */
> +
> +extern char __attribute__ ((simd, const)) fn3 (int, char);
> +
> +void test_fn3 (int *a, int *b, char *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    a[i] = (int) (fn3 (b[i], c[i]) + c[i]);
> +}
> +
> +/* { dg-final { scan-assembler {_ZGVsMxvv_fn3} } } */
> +
> +extern short __attribute__ ((simd, const)) fn4 (int, short);
> +
> +void test_fn4 (int *a, int *b, short *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    a[i] = (int) (fn4 (b[i], c[i]) + c[i]);
> +}
> +
> +/* { dg-final { scan-assembler {_ZGVsMxvv_fn4} } } */

It'd be nice to have some more specific testing here.  Although there
are 5 different signatures, the last 4 are interchangeable as far as
the test goes.  E.g. maybe it would be possible to have some partial
check-function-bodies tests that match the inner loop.  Do we
use extending loads for the unpacked vectors?  (Hope so.)

Thanks,
Richard

> diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
> index 6319df0558f..3c7d093c5c6 100644
> --- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
> +++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
> @@ -1,6 +1,6 @@
> -! { dg-do compile { target vect_simd_clones } }
> +! { dg-do compile { target { { i?86-*-* x86_64-*-* } && vect_simd_clones } } } */
>  ! { dg-additional-options "-O0 -fdump-tree-gimple -fdump-tree-optimized" }
> -! { dg-additional-options "-mno-sse3" { target { i?86-*-* x86_64-*-* } } }
> +! { dg-additional-options "-mno-sse3" }
>  
>  module main
>    implicit none
> @@ -41,7 +41,7 @@ contains
>      ! shall call f01 with score 8.
>      ! { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } }
>      ! { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" { target { !aarch64*-*-* } } } }
> -    ! { dg-final { scan-tree-dump-times "f03 \\\(x" 6 "optimized" { target { aarch64*-*-* } } } }
> +    ! { dg-final { scan-tree-dump-times "f03 \\\(x" 8 "optimized" { target { aarch64*-*-* } } } }
>      ! { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" { target { !aarch64*-*-* } } } }
>      ! { dg-final { scan-tree-dump-times "f01 \\\(x" 0 "optimized" { target { aarch64*-*-* } } } }
>      a = f04 (x)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
  2024-02-01 17:01                   ` Andre Vieira (lists)
@ 2024-02-05  9:56                     ` Richard Biener
  2024-02-26 16:56                       ` Andre Vieira (lists)
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2024-02-05  9:56 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard.Sandiford

On Thu, 1 Feb 2024, Andre Vieira (lists) wrote:

> 
> 
> On 01/02/2024 07:19, Richard Biener wrote:
> > On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
> > 
> > 
> > The patch didn't come with a testcase so it's really hard to tell
> > what goes wrong now and how it is fixed ...
> 
> My bad! I had a testcase locally but never added it...
> 
> However... now I look at it and ran it past Richard S, the codegen isn't
> 'wrong', but it does have the potential to lead to some pretty slow codegen,
> especially for inbranch simdclones where it transforms the SVE predicate into
> an Advanced SIMD vector by inserting the elements one at a time...
> 
> An example of which can be seen if you do:
> 
> gcc -O3 -march=armv8-a+sve -msve-vector-bits=128  -fopenmp-simd t.c -S
> 
> with the following t.c:
> #pragma omp declare simd simdlen(4) inbranch
> int __attribute__ ((const)) fn5(int);
> 
> void fn4 (int *a, int *b, int n)
> {
>     for (int i = 0; i < n; ++i)
>         b[i] = fn5(a[i]);
> }
> 
> Now I do have to say, for our main usecase of libmvec we won't have any
> 'inbranch' Advanced SIMD clones, so we avoid that issue... But of course that
> doesn't mean user-code will.

It seems to use SVE masks with vector(4) <signed-boolean:4> and the
ABI says the mask is vector(4) int.  You say that's because we choose
a Adv SIMD clone for the SVE VLS vector code (it calls _ZGVnM4v_fn5).

The vectorizer creates

  _44 = VEC_COND_EXPR <loop_mask_41, { 1, 1, 1, 1 }, { 0, 0, 0, 0 }>;

and then vector lowering decomposes this.  That means the vectorizer
lacks a check that the target handles this VEC_COND_EXPR.

Of course I would expect that SVE with VLS vectors is able to
code generate this operation, so it's missing patterns in the end.

Richard.

> I'm gonna remove this patch and run another test regression to see if it
> catches anything weird, but if not then I guess we do have the option to not
> use this patch and aim to solve the costing or codegen issue in GCC-15. We
> don't currently do any simdclone costing and I don't have a clear suggestion
> for how given openmp has no mechanism that I know off to expose the speedup of
> a simdclone over it's scalar variant, so how would we 'compare' a simdclone
> call with extra overhead of argument preparation vs scalar, though at least we
> could prefer a call to a different simdclone with less argument preparation.
> Anyways I digress.
> 
> Other tests, these require aarch64-autovec-preference=2 so that also has me
> worried less...
> 
> gcc -O3 -march=armv8-a+sve -msve-vector-bits=128 --param
> aarch64-autovec-preference=2 -fopenmp-simd t.c -S
> 
> t.c:
> #pragma omp declare simd simdlen(2) notinbranch
> float __attribute__ ((const)) fn1(double);
> 
> void fn0 (float *a, float *b, int n)
> {
>     for (int i = 0; i < n; ++i)
>         b[i] = fn1((double) a[i]);
> }
> 
> #pragma omp declare simd simdlen(2) notinbranch
> float __attribute__ ((const)) fn3(float);
> 
> void fn2 (float *a, double *b, int n)
> {
>     for (int i = 0; i < n; ++i)
>         b[i] = (double) fn3(a[i]);
> }
> 
> > Richard.
> > 
> >>>
> >>> That said, I wonder how we end up mixing things up in the first place.
> >>>
> >>> Richard.
> >>
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
  2024-02-05  9:56                     ` Richard Biener
@ 2024-02-26 16:56                       ` Andre Vieira (lists)
  2024-02-27  8:47                         ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Andre Vieira (lists) @ 2024-02-26 16:56 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard.Sandiford



On 05/02/2024 09:56, Richard Biener wrote:
> On Thu, 1 Feb 2024, Andre Vieira (lists) wrote:
> 
>>
>>
>> On 01/02/2024 07:19, Richard Biener wrote:
>>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
>>>
>>>
>>> The patch didn't come with a testcase so it's really hard to tell
>>> what goes wrong now and how it is fixed ...
>>
>> My bad! I had a testcase locally but never added it...
>>
>> However... now I look at it and ran it past Richard S, the codegen isn't
>> 'wrong', but it does have the potential to lead to some pretty slow codegen,
>> especially for inbranch simdclones where it transforms the SVE predicate into
>> an Advanced SIMD vector by inserting the elements one at a time...
>>
>> An example of which can be seen if you do:
>>
>> gcc -O3 -march=armv8-a+sve -msve-vector-bits=128  -fopenmp-simd t.c -S
>>
>> with the following t.c:
>> #pragma omp declare simd simdlen(4) inbranch
>> int __attribute__ ((const)) fn5(int);
>>
>> void fn4 (int *a, int *b, int n)
>> {
>>      for (int i = 0; i < n; ++i)
>>          b[i] = fn5(a[i]);
>> }
>>
>> Now I do have to say, for our main usecase of libmvec we won't have any
>> 'inbranch' Advanced SIMD clones, so we avoid that issue... But of course that
>> doesn't mean user-code will.
> 
> It seems to use SVE masks with vector(4) <signed-boolean:4> and the
> ABI says the mask is vector(4) int.  You say that's because we choose
> a Adv SIMD clone for the SVE VLS vector code (it calls _ZGVnM4v_fn5).
> 
> The vectorizer creates
> 
>    _44 = VEC_COND_EXPR <loop_mask_41, { 1, 1, 1, 1 }, { 0, 0, 0, 0 }>;
> 
> and then vector lowering decomposes this.  That means the vectorizer
> lacks a check that the target handles this VEC_COND_EXPR.
> 
> Of course I would expect that SVE with VLS vectors is able to
> code generate this operation, so it's missing patterns in the end.
> 
> Richard.
> 

What should we do for GCC-14? Going forward I think the right thing to 
do is to add these patterns. But I am not even going to try to do that 
right now and even though we can codegen for this, the result doesn't 
feel like it would ever be profitable which means I'd rather not 
vectorize, or well pick a different vector mode if possible.

This would be achieved with the change to the targethook. If I change 
the hook to take modes, using STMT_VINFO_VECTYPE (stmt_vinfo), is that 
OK for now?

Kind regards,
Andre

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
  2024-02-26 16:56                       ` Andre Vieira (lists)
@ 2024-02-27  8:47                         ` Richard Biener
  2024-02-28 17:25                           ` Andre Vieira (lists)
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2024-02-27  8:47 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard.Sandiford

On Mon, 26 Feb 2024, Andre Vieira (lists) wrote:

> 
> 
> On 05/02/2024 09:56, Richard Biener wrote:
> > On Thu, 1 Feb 2024, Andre Vieira (lists) wrote:
> > 
> >>
> >>
> >> On 01/02/2024 07:19, Richard Biener wrote:
> >>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
> >>>
> >>>
> >>> The patch didn't come with a testcase so it's really hard to tell
> >>> what goes wrong now and how it is fixed ...
> >>
> >> My bad! I had a testcase locally but never added it...
> >>
> >> However... now I look at it and ran it past Richard S, the codegen isn't
> >> 'wrong', but it does have the potential to lead to some pretty slow
> >> codegen,
> >> especially for inbranch simdclones where it transforms the SVE predicate
> >> into
> >> an Advanced SIMD vector by inserting the elements one at a time...
> >>
> >> An example of which can be seen if you do:
> >>
> >> gcc -O3 -march=armv8-a+sve -msve-vector-bits=128  -fopenmp-simd t.c -S
> >>
> >> with the following t.c:
> >> #pragma omp declare simd simdlen(4) inbranch
> >> int __attribute__ ((const)) fn5(int);
> >>
> >> void fn4 (int *a, int *b, int n)
> >> {
> >>      for (int i = 0; i < n; ++i)
> >>          b[i] = fn5(a[i]);
> >> }
> >>
> >> Now I do have to say, for our main usecase of libmvec we won't have any
> >> 'inbranch' Advanced SIMD clones, so we avoid that issue... But of course
> >> that
> >> doesn't mean user-code will.
> > 
> > It seems to use SVE masks with vector(4) <signed-boolean:4> and the
> > ABI says the mask is vector(4) int.  You say that's because we choose
> > a Adv SIMD clone for the SVE VLS vector code (it calls _ZGVnM4v_fn5).
> > 
> > The vectorizer creates
> > 
> >    _44 = VEC_COND_EXPR <loop_mask_41, { 1, 1, 1, 1 }, { 0, 0, 0, 0 }>;
> > 
> > and then vector lowering decomposes this.  That means the vectorizer
> > lacks a check that the target handles this VEC_COND_EXPR.
> > 
> > Of course I would expect that SVE with VLS vectors is able to
> > code generate this operation, so it's missing patterns in the end.
> > 
> > Richard.
> > 
> 
> What should we do for GCC-14? Going forward I think the right thing to do is
> to add these patterns. But I am not even going to try to do that right now and
> even though we can codegen for this, the result doesn't feel like it would
> ever be profitable which means I'd rather not vectorize, or well pick a
> different vector mode if possible.
> 
> This would be achieved with the change to the targethook. If I change the hook
> to take modes, using STMT_VINFO_VECTYPE (stmt_vinfo), is that OK for now?

Passing in a mode is OK.  I'm still not fully understanding why the
clone isn't fully specifying 'mode' and if it does not why the
vectorizer itself can not disregard it.

From the past discussion I understood the existing situation isn't
as bad as initially thought and no bad things happen right now?

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
  2024-02-27  8:47                         ` Richard Biener
@ 2024-02-28 17:25                           ` Andre Vieira (lists)
  2024-02-29  7:26                             ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Andre Vieira (lists) @ 2024-02-28 17:25 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard.Sandiford

[-- Attachment #1: Type: text/plain, Size: 5591 bytes --]



On 27/02/2024 08:47, Richard Biener wrote:
> On Mon, 26 Feb 2024, Andre Vieira (lists) wrote:
> 
>>
>>
>> On 05/02/2024 09:56, Richard Biener wrote:
>>> On Thu, 1 Feb 2024, Andre Vieira (lists) wrote:
>>>
>>>>
>>>>
>>>> On 01/02/2024 07:19, Richard Biener wrote:
>>>>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
>>>>>
>>>>>
>>>>> The patch didn't come with a testcase so it's really hard to tell
>>>>> what goes wrong now and how it is fixed ...
>>>>
>>>> My bad! I had a testcase locally but never added it...
>>>>
>>>> However... now I look at it and ran it past Richard S, the codegen isn't
>>>> 'wrong', but it does have the potential to lead to some pretty slow
>>>> codegen,
>>>> especially for inbranch simdclones where it transforms the SVE predicate
>>>> into
>>>> an Advanced SIMD vector by inserting the elements one at a time...
>>>>
>>>> An example of which can be seen if you do:
>>>>
>>>> gcc -O3 -march=armv8-a+sve -msve-vector-bits=128  -fopenmp-simd t.c -S
>>>>
>>>> with the following t.c:
>>>> #pragma omp declare simd simdlen(4) inbranch
>>>> int __attribute__ ((const)) fn5(int);
>>>>
>>>> void fn4 (int *a, int *b, int n)
>>>> {
>>>>       for (int i = 0; i < n; ++i)
>>>>           b[i] = fn5(a[i]);
>>>> }
>>>>
>>>> Now I do have to say, for our main usecase of libmvec we won't have any
>>>> 'inbranch' Advanced SIMD clones, so we avoid that issue... But of course
>>>> that
>>>> doesn't mean user-code will.
>>>
>>> It seems to use SVE masks with vector(4) <signed-boolean:4> and the
>>> ABI says the mask is vector(4) int.  You say that's because we choose
>>> a Adv SIMD clone for the SVE VLS vector code (it calls _ZGVnM4v_fn5).
>>>
>>> The vectorizer creates
>>>
>>>     _44 = VEC_COND_EXPR <loop_mask_41, { 1, 1, 1, 1 }, { 0, 0, 0, 0 }>;
>>>
>>> and then vector lowering decomposes this.  That means the vectorizer
>>> lacks a check that the target handles this VEC_COND_EXPR.
>>>
>>> Of course I would expect that SVE with VLS vectors is able to
>>> code generate this operation, so it's missing patterns in the end.
>>>
>>> Richard.
>>>
>>
>> What should we do for GCC-14? Going forward I think the right thing to do is
>> to add these patterns. But I am not even going to try to do that right now and
>> even though we can codegen for this, the result doesn't feel like it would
>> ever be profitable which means I'd rather not vectorize, or well pick a
>> different vector mode if possible.
>>
>> This would be achieved with the change to the targethook. If I change the hook
>> to take modes, using STMT_VINFO_VECTYPE (stmt_vinfo), is that OK for now?
> 
> Passing in a mode is OK.  I'm still not fully understanding why the
> clone isn't fully specifying 'mode' and if it does not why the
> vectorizer itself can not disregard it.


We could check that the modes of the parameters & return type are the 
same as the vector operands & result in the vectorizer. But then we'd 
also want to make sure we don't reject cases where we have simdclones 
with compatible modes, aka same element type, but a multiple element 
count.  Which is where'd we get in trouble again I think, because we'd 
want to accept V8SI -> 2x V4SI, but not V8SI -> 2x VNx4SI (with VLS and 
aarch64_sve_vg = 2), not because it's invalid, but because right now the 
codegen is bad. And it's easier to do this in the targethook, which we 
can technically also use to 'rank' simdclones by setting a 
target_badness value, so in the future we could decide to assign some 
'badness' to influence the rank a SVE simdclone for Advanced SIMD loops 
vs an Advanced SIMD clone for Advanced SIMD loops.

This does touch another issue of simdclone costing, which is a larger 
issue in general and one we (arm) might want to approach in the future. 
It's a complex issue, because the vectorizer doesn't know the 
performance impact of a simdclone, we assume (as we should) that its 
faster than the original scalar, though we currently don't record costs 
for either, but we don't know by how much or how much impact it has, so 
the vectorizer can't reason whether it's beneficial to use a simdclone 
if it has to do a lot of operand preparation, we can merely tell it to 
use it, or not and all the other operations in the loop will determine 
costing.


>  From the past discussion I understood the existing situation isn't
> as bad as initially thought and no bad things happen right now?
Nope, I thought they compiler would fall apart, but it seems to be able 
to transform the operands from one mode into the other, so without the 
targethook it just generates slower loops in certain cases, which we'd 
rather avoid given the usecase for simdclones is to speed things up ;)


Attached reworked patch.


This patch adds a machine_mode argument to TARGET_SIMD_CLONE_USABLE to 
make sure the target can reject a simd_clone based on the vector mode it 
is using.  This is needed because for VLS SVE vectorization the 
vectorizer accepts Advanced SIMD simd clones when vectorizing using SVE 
types because the simdlens might match, this currently leads to 
suboptimal codegen.

Other targets do not currently need to use this argument.

gcc/ChangeLog:

	* target.def (TARGET_SIMD_CLONE_USABLE): Add argument.
	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass vector_mode
	to call TARGET_SIMD_CLONE_USABLE.
	* config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add argument
	and use it to reject the use of SVE simd clones with Advanced SIMD
	modes.
	* config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused argument.
	* config/i386/i386.cc (ix86_simd_clone_usable): Likewise.

[-- Attachment #2: target_simd_clone_usable.patch --]
[-- Type: text/plain, Size: 4672 bytes --]

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 16318bf925883ecedf9345e53fc0824a553b2747..6ee77f61235219b477d1f622fceb752d54c58b87 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -28769,12 +28769,12 @@ aarch64_simd_clone_adjust (struct cgraph_node *node)
 /* Implement TARGET_SIMD_CLONE_USABLE.  */
 
 static int
-aarch64_simd_clone_usable (struct cgraph_node *node)
+aarch64_simd_clone_usable (struct cgraph_node *node, machine_mode vector_mode)
 {
   switch (node->simdclone->vecsize_mangle)
     {
     case 'n':
-      if (!TARGET_SIMD)
+      if (!TARGET_SIMD || aarch64_sve_mode_p (vector_mode))
 	return -1;
       return 0;
     default:
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index bc076d1120d9e7d03c9bed23b8df215ae35e442c..9624b7c1aab29665c52f7b82d8b437af2e8e1ea1 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5667,7 +5667,8 @@ gcn_simd_clone_adjust (struct cgraph_node *ARG_UNUSED (node))
 /* Implement TARGET_SIMD_CLONE_USABLE.  */
 
 static int
-gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node))
+gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node),
+		       machine_mode ARG_UNUSED (vector_mode))
 {
   /* We don't need to do anything here because
      gcn_simd_clone_compute_vecsize_and_simdlen currently only returns one
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index fc5068539c11748a5adf70ec77b2f1cae1a1e231..c54f66543fdd4103d58c2f9390a3c91060597b94 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -25249,7 +25249,8 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
    slightly less desirable, etc.).  */
 
 static int
-ix86_simd_clone_usable (struct cgraph_node *node)
+ix86_simd_clone_usable (struct cgraph_node *node,
+			machine_mode ARG_UNUSED (vector_mode))
 {
   switch (node->simdclone->vecsize_mangle)
     {
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c8b8b126b2424b6552f824ba42ac329cfaf84d84..03f7d72a429204a584253dc5c6e8fa1b3074795d 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6498,11 +6498,11 @@ This hook should add implicit @code{attribute(target("..."))} attribute
 to SIMD clone @var{node} if needed.
 @end deftypefn
 
-@deftypefn {Target Hook} int TARGET_SIMD_CLONE_USABLE (struct cgraph_node *@var{})
+@deftypefn {Target Hook} int TARGET_SIMD_CLONE_USABLE (struct cgraph_node *@var{}, @var{machine_mode})
 This hook should return -1 if SIMD clone @var{node} shouldn't be used
-in vectorized loops in current function, or non-negative number if it is
-usable.  In that case, the smaller the number is, the more desirable it is
-to use it.
+in vectorized loops in current function with @var{vector_mode}, or
+non-negative number if it is usable.  In that case, the smaller the number
+is, the more desirable it is to use it.
 @end deftypefn
 
 @deftypefn {Target Hook} int TARGET_SIMT_VF (void)
diff --git a/gcc/target.def b/gcc/target.def
index fdad7bbc93e2ad8aea30336d5cd4af67801e9c74..7e8921b6bd4078770268819a38595fdce612b548 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1645,10 +1645,10 @@ void, (struct cgraph_node *), NULL)
 DEFHOOK
 (usable,
 "This hook should return -1 if SIMD clone @var{node} shouldn't be used\n\
-in vectorized loops in current function, or non-negative number if it is\n\
-usable.  In that case, the smaller the number is, the more desirable it is\n\
-to use it.",
-int, (struct cgraph_node *), NULL)
+in vectorized loops in current function with @var{vector_mode}, or\n\
+non-negative number if it is usable.  In that case, the smaller the number\n\
+is, the more desirable it is to use it.",
+int, (struct cgraph_node *, machine_mode), NULL)
 
 HOOK_VECTOR_END (simd_clone)
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 1dbe1115da4d7dd4fc590e5830a9c7f05be6945a..f06a53d37ee05737e00e80d9c265192bede6aa18 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4074,7 +4074,14 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	  this_badness += floor_log2 (num_calls) * 4096;
 	if (n->simdclone->inbranch)
 	  this_badness += 8192;
-	int target_badness = targetm.simd_clone.usable (n);
+
+	/* If STMT_VINFO_VECTYPE has not been set yet pass the general vector
+	   mode,  which for targets that use it will determine what ISA we can
+	   vectorize this code with.  */
+	machine_mode vector_mode = vinfo->vector_mode;
+	if (vectype)
+	  vector_mode = TYPE_MODE (vectype);
+	int target_badness = targetm.simd_clone.usable (n, vector_mode);
 	if (target_badness < 0)
 	  continue;
 	this_badness += target_badness * 512;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
  2024-02-28 17:25                           ` Andre Vieira (lists)
@ 2024-02-29  7:26                             ` Richard Biener
  0 siblings, 0 replies; 23+ messages in thread
From: Richard Biener @ 2024-02-29  7:26 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard.Sandiford

On Wed, 28 Feb 2024, Andre Vieira (lists) wrote:

> 
> 
> On 27/02/2024 08:47, Richard Biener wrote:
> > On Mon, 26 Feb 2024, Andre Vieira (lists) wrote:
> > 
> >>
> >>
> >> On 05/02/2024 09:56, Richard Biener wrote:
> >>> On Thu, 1 Feb 2024, Andre Vieira (lists) wrote:
> >>>
> >>>>
> >>>>
> >>>> On 01/02/2024 07:19, Richard Biener wrote:
> >>>>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
> >>>>>
> >>>>>
> >>>>> The patch didn't come with a testcase so it's really hard to tell
> >>>>> what goes wrong now and how it is fixed ...
> >>>>
> >>>> My bad! I had a testcase locally but never added it...
> >>>>
> >>>> However... now I look at it and ran it past Richard S, the codegen isn't
> >>>> 'wrong', but it does have the potential to lead to some pretty slow
> >>>> codegen,
> >>>> especially for inbranch simdclones where it transforms the SVE predicate
> >>>> into
> >>>> an Advanced SIMD vector by inserting the elements one at a time...
> >>>>
> >>>> An example of which can be seen if you do:
> >>>>
> >>>> gcc -O3 -march=armv8-a+sve -msve-vector-bits=128  -fopenmp-simd t.c -S
> >>>>
> >>>> with the following t.c:
> >>>> #pragma omp declare simd simdlen(4) inbranch
> >>>> int __attribute__ ((const)) fn5(int);
> >>>>
> >>>> void fn4 (int *a, int *b, int n)
> >>>> {
> >>>>       for (int i = 0; i < n; ++i)
> >>>>           b[i] = fn5(a[i]);
> >>>> }
> >>>>
> >>>> Now I do have to say, for our main usecase of libmvec we won't have any
> >>>> 'inbranch' Advanced SIMD clones, so we avoid that issue... But of course
> >>>> that
> >>>> doesn't mean user-code will.
> >>>
> >>> It seems to use SVE masks with vector(4) <signed-boolean:4> and the
> >>> ABI says the mask is vector(4) int.  You say that's because we choose
> >>> a Adv SIMD clone for the SVE VLS vector code (it calls _ZGVnM4v_fn5).
> >>>
> >>> The vectorizer creates
> >>>
> >>>     _44 = VEC_COND_EXPR <loop_mask_41, { 1, 1, 1, 1 }, { 0, 0, 0, 0 }>;
> >>>
> >>> and then vector lowering decomposes this.  That means the vectorizer
> >>> lacks a check that the target handles this VEC_COND_EXPR.
> >>>
> >>> Of course I would expect that SVE with VLS vectors is able to
> >>> code generate this operation, so it's missing patterns in the end.
> >>>
> >>> Richard.
> >>>
> >>
> >> What should we do for GCC-14? Going forward I think the right thing to do
> >> is
> >> to add these patterns. But I am not even going to try to do that right now
> >> and
> >> even though we can codegen for this, the result doesn't feel like it would
> >> ever be profitable which means I'd rather not vectorize, or well pick a
> >> different vector mode if possible.
> >>
> >> This would be achieved with the change to the targethook. If I change the
> >> hook
> >> to take modes, using STMT_VINFO_VECTYPE (stmt_vinfo), is that OK for now?
> > 
> > Passing in a mode is OK.  I'm still not fully understanding why the
> > clone isn't fully specifying 'mode' and if it does not why the
> > vectorizer itself can not disregard it.
> 
> 
> We could check that the modes of the parameters & return type are the same as
> the vector operands & result in the vectorizer. But then we'd also want to
> make sure we don't reject cases where we have simdclones with compatible
> modes, aka same element type, but a multiple element count.  Which is where'd
> we get in trouble again I think, because we'd want to accept V8SI -> 2x V4SI,
> but not V8SI -> 2x VNx4SI (with VLS and aarch64_sve_vg = 2), not because it's
> invalid, but because right now the codegen is bad. And it's easier to do this
> in the targethook, which we can technically also use to 'rank' simdclones by
> setting a target_badness value, so in the future we could decide to assign
> some 'badness' to influence the rank a SVE simdclone for Advanced SIMD loops
> vs an Advanced SIMD clone for Advanced SIMD loops.
> 
> This does touch another issue of simdclone costing, which is a larger issue in
> general and one we (arm) might want to approach in the future. It's a complex
> issue, because the vectorizer doesn't know the performance impact of a
> simdclone, we assume (as we should) that its faster than the original scalar,
> though we currently don't record costs for either, but we don't know by how
> much or how much impact it has, so the vectorizer can't reason whether it's
> beneficial to use a simdclone if it has to do a lot of operand preparation, we
> can merely tell it to use it, or not and all the other operations in the loop
> will determine costing.
> 
> 
> > From the past discussion I understood the existing situation isn't
> > as bad as initially thought and no bad things happen right now?
> Nope, I thought they compiler would fall apart, but it seems to be able to
> transform the operands from one mode into the other, so without the targethook
> it just generates slower loops in certain cases, which we'd rather avoid given
> the usecase for simdclones is to speed things up ;)
> 
> 
> Attached reworked patch.
> 
> 
> This patch adds a machine_mode argument to TARGET_SIMD_CLONE_USABLE to make
> sure the target can reject a simd_clone based on the vector mode it is using.
> This is needed because for VLS SVE vectorization the vectorizer accepts
> Advanced SIMD simd clones when vectorizing using SVE types because the
> simdlens might match, this currently leads to suboptimal codegen.
> 
> Other targets do not currently need to use this argument.

+ix86_simd_clone_usable (struct cgraph_node *node,
+                       machine_mode ARG_UNUSED (vector_mode))

we use C++, just omit the parameter name.

You use STMT_VINFO_VECTYPE conditional and vinfo->vector_mode otherwise.
I think simdclones without a return value might be a thing?  What type
would STMT_VINFO_VECTYPE correspond to?  The documentation also doesn't
say whether it's the mode of a return value, or which argument value.
It seems it is just a random mode that might or might not provide
properties of the incoming(?) argument values and that might differ
from the actual argument modes?  As said, I think the vectorizer has
more info - it knows the incoming mode for each arg and the expected
argument modes for each arg, including mask modes involved (that
extra mode argument is for values and never for masks?).

Anyway, I can live with this but I'll leave it to Richard S. to
approve (and take the blame ;)).

Thanks,
Richard.

> gcc/ChangeLog:
> 
> 	* target.def (TARGET_SIMD_CLONE_USABLE): Add argument.
> 	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass vector_mode
> 	to call TARGET_SIMD_CLONE_USABLE.
> 	* config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add argument
> 	and use it to reject the use of SVE simd clones with Advanced SIMD
> 	modes.
> 	* config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused argument.
> 	* config/i386/i386.cc (ix86_simd_clone_usable): Likewise.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2024-02-29  7:26 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-30 14:31 [PATCH 0/3] vect, aarch64: Add SVE support for simdclones Andre Vieira
2024-01-30 14:31 ` [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE Andre Vieira
2024-01-31 12:11   ` Richard Biener
2024-01-31 12:13     ` Richard Biener
2024-01-31 13:52       ` Andre Vieira (lists)
2024-01-31 13:58         ` Richard Biener
2024-01-31 14:03           ` Richard Biener
2024-01-31 16:13             ` Andre Vieira (lists)
2024-01-31 14:35           ` Andre Vieira (lists)
2024-01-31 14:35             ` Richard Biener
2024-01-31 16:36               ` Andre Vieira (lists)
2024-02-01  7:19                 ` Richard Biener
2024-02-01 17:01                   ` Andre Vieira (lists)
2024-02-05  9:56                     ` Richard Biener
2024-02-26 16:56                       ` Andre Vieira (lists)
2024-02-27  8:47                         ` Richard Biener
2024-02-28 17:25                           ` Andre Vieira (lists)
2024-02-29  7:26                             ` Richard Biener
2024-02-01  7:59                 ` Richard Sandiford
2024-01-30 14:31 ` [PATCH 2/3] vect: disable multiple calls of poly simdclones Andre Vieira
2024-01-31 12:13   ` Richard Biener
2024-01-30 14:31 ` [PATCH 3/3] aarch64: Add SVE support for simd clones [PR 96342] Andre Vieira
2024-02-01 21:59   ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).