public inbox for gcc-patches@gcc.gnu.org
* aarch64, vect, omp: Add SVE support for simd clones [PR 96342]
@ 2023-08-30  8:49 Andre Vieira (lists)
  2023-08-30  9:06 ` [PATCH 1/8] parloops: Copy target and optimizations when creating a function clone Andre Vieira (lists)
                   ` (9 more replies)
  0 siblings, 10 replies; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-08-30  8:49 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Richard Biener, jakub

Hi,

This patch series aims to implement support for SVE simd clones when not 
specifying a 'simdlen' clause for AArch64. This patch depends on my 
earlier patch: '[PATCH] aarch64: enable mixed-types for aarch64 simdclones'.

Bootstrapped and regression tested the series on 
aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu. I also tried building 
the patches separately, but that was before some further clean-up 
restructuring, so will do that again prior to pushing.

Andre Vieira (8):

parloops: Copy target and optimizations when creating a function clone
parloops: Allow poly nit and bound
vect: Fix vect_get_smallest_scalar_type for simd clones
vect: don't allow fully masked loops with non-masked simd clones [PR 110485]
vect: Use inbranch simdclones in masked loops
vect: Add vector_mode paramater to simd_clone_usable
vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM
aarch64: Add SVE support for simd clones [PR 96342]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 1/8] parloops: Copy target and optimizations when creating a function clone
  2023-08-30  8:49 aarch64, vect, omp: Add SVE support for simd clones [PR 96342] Andre Vieira (lists)
@ 2023-08-30  9:06 ` Andre Vieira (lists)
  2023-08-30 12:31   ` Richard Biener
  2023-08-30  9:08 ` [Patch 2/8] parloops: Allow poly nit and bound Andre Vieira (lists)
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-08-30  9:06 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Richard Biener, jakub

[-- Attachment #1: Type: text/plain, Size: 528 bytes --]


SVE simd clones need to be compiled with an SVE target enabled, or the 
argument types will not be created properly. To achieve this we need to 
copy DECL_FUNCTION_SPECIFIC_TARGET from the original function 
declaration to the clones.  I decided it was probably also a good idea 
to copy DECL_FUNCTION_SPECIFIC_OPTIMIZATION, in case the original 
function is meant to be compiled with specific optimization options.

gcc/ChangeLog:

	* tree-parloops.cc (create_loop_fn): Copy specific target and
	optimization options to clone.

[-- Attachment #2: sve_simd_clones_1.patch --]
[-- Type: text/plain, Size: 593 bytes --]

diff --git a/gcc/tree-parloops.cc b/gcc/tree-parloops.cc
index e495bbd65270bdf90bae2c4a2b52777522352a77..a35f3d5023b06e5ef96eb4222488fcb34dd7bd45 100644
--- a/gcc/tree-parloops.cc
+++ b/gcc/tree-parloops.cc
@@ -2203,6 +2203,11 @@ create_loop_fn (location_t loc)
   DECL_CONTEXT (t) = decl;
   TREE_USED (t) = 1;
   DECL_ARGUMENTS (decl) = t;
+  DECL_FUNCTION_SPECIFIC_TARGET (decl)
+    = DECL_FUNCTION_SPECIFIC_TARGET (act_cfun->decl);
+  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl)
+    = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (act_cfun->decl);
+
 
   allocate_struct_function (decl, false);
 


* [Patch 2/8] parloops: Allow poly nit and bound
  2023-08-30  8:49 aarch64, vect, omp: Add SVE support for simd clones [PR 96342] Andre Vieira (lists)
  2023-08-30  9:06 ` [PATCH 1/8] parloops: Copy target and optimizations when creating a function clone Andre Vieira (lists)
@ 2023-08-30  9:08 ` Andre Vieira (lists)
  2023-08-30 12:32   ` Richard Biener
  2023-08-30  9:10 ` [Patch 3/8] vect: Fix vect_get_smallest_scalar_type for simd clones Andre Vieira (lists)
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-08-30  9:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 217 bytes --]

Teach parloops how to handle a poly number of iterations (nit) and 
bound, ahead of the changes to enable non-constant simdlen.

gcc/ChangeLog:

	* tree-parloops.cc (try_to_transform_to_exit_first_loop_alt): Accept
	poly NIT and ALT_BOUND.

[-- Attachment #2: sve_simd_clones_2.patch --]
[-- Type: text/plain, Size: 940 bytes --]

diff --git a/gcc/tree-parloops.cc b/gcc/tree-parloops.cc
index a35f3d5023b06e5ef96eb4222488fcb34dd7bd45..cf713e53d712fb5ad050e274f373adba5a90c5a7 100644
--- a/gcc/tree-parloops.cc
+++ b/gcc/tree-parloops.cc
@@ -2531,14 +2531,16 @@ try_transform_to_exit_first_loop_alt (class loop *loop,
   tree nit_type = TREE_TYPE (nit);
 
   /* Figure out whether nit + 1 overflows.  */
-  if (TREE_CODE (nit) == INTEGER_CST)
+  if (TREE_CODE (nit) == INTEGER_CST
+      || TREE_CODE (nit) == POLY_INT_CST)
     {
       if (!tree_int_cst_equal (nit, TYPE_MAX_VALUE (nit_type)))
 	{
 	  alt_bound = fold_build2_loc (UNKNOWN_LOCATION, PLUS_EXPR, nit_type,
 				       nit, build_one_cst (nit_type));
 
-	  gcc_assert (TREE_CODE (alt_bound) == INTEGER_CST);
+	  gcc_assert (TREE_CODE (alt_bound) == INTEGER_CST
+		      || TREE_CODE (alt_bound) == POLY_INT_CST);
 	  transform_to_exit_first_loop_alt (loop, reduction_list, alt_bound);
 	  return true;
 	}


* [Patch 3/8] vect: Fix vect_get_smallest_scalar_type for simd clones
  2023-08-30  8:49 aarch64, vect, omp: Add SVE support for simd clones [PR 96342] Andre Vieira (lists)
  2023-08-30  9:06 ` [PATCH 1/8] parloops: Copy target and optimizations when creating a function clone Andre Vieira (lists)
  2023-08-30  9:08 ` [Patch 2/8] parloops: Allow poly nit and bound Andre Vieira (lists)
@ 2023-08-30  9:10 ` Andre Vieira (lists)
  2023-08-30 12:54   ` Richard Biener
  2023-08-30  9:11 ` [PATCH 4/8] vect: don't allow fully masked loops with non-masked simd clones [PR 110485] Andre Vieira (lists)
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-08-30  9:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Richard Biener, jakub

[-- Attachment #1: Type: text/plain, Size: 965 bytes --]

The vect_get_smallest_scalar_type helper function was using any argument 
of a simd clone call when trying to determine the smallest scalar type 
that would be vectorized.  This included the function pointer type in a 
MASK_CALL, for instance, and would result in the wrong type being 
selected.  Instead, this patch special-cases simd clone calls and uses 
only the scalar types of the original function that get transformed into 
vector types.

gcc/ChangeLog:

	* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Special case
	simd clone calls and only use types that are mapped to vectors.
	* tree-vect-stmts.cc (simd_clone_call_p): New helper function.
	* tree-vectorizer.h (simd_clone_call_p): Declare new function.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-simd-clone-16f.c: Remove unnecessary differentiation
	between targets with different pointer sizes.
	* gcc.dg/vect/vect-simd-clone-17f.c: Likewise.
	* gcc.dg/vect/vect-simd-clone-18f.c: Likewise.

[-- Attachment #2: sve_simd_clones_3.patch --]
[-- Type: text/plain, Size: 5882 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
index 574698d3e133ecb8700e698fa42a6b05dd6b8a18..7cd29e894d0502a59fadfe67db2db383133022d3 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
@@ -7,9 +7,8 @@
 #include "vect-simd-clone-16.c"
 
 /* Ensure the the in-branch simd clones are used on targets that support them.
-   Some targets use pairs of vectors and do twice the calls.  */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } } } */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" { target { { i?86*-*-* x86_64-*-* } && { ! lp64 } } } } } */
+ */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" } } */
 
 /* The LTO test produces two dump files and we scan the wrong one.  */
 /* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
index 8bb6d19301a67a3eebce522daaf7d54d88f708d7..177521dc44531479fca1f1a1a0f2010f30fa3fb5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
@@ -7,9 +7,8 @@
 #include "vect-simd-clone-17.c"
 
 /* Ensure the the in-branch simd clones are used on targets that support them.
-   Some targets use pairs of vectors and do twice the calls.  */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } } } */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" { target { { i?86*-*-* x86_64-*-* } && { ! lp64 } } } } } */
+ */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" } } */
 
 /* The LTO test produces two dump files and we scan the wrong one.  */
 /* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
index d34f23f4db8e9c237558cc22fe66b7e02b9e6c20..4dd51381d73c0c7c8ec812f24e5054df038059c5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
@@ -7,9 +7,8 @@
 #include "vect-simd-clone-18.c"
 
 /* Ensure the the in-branch simd clones are used on targets that support them.
-   Some targets use pairs of vectors and do twice the calls.  */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } } } */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" { target { { i?86*-*-* x86_64-*-* } && { ! lp64 } } } } } */
+ */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" } } */
 
 /* The LTO test produces two dump files and we scan the wrong one.  */
 /* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index a3570c45b5209281ac18c1220c3b95398487f389..1bdbea232afc6facddac23269ee3da033eb1ed50 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -119,6 +119,7 @@ tree
 vect_get_smallest_scalar_type (stmt_vec_info stmt_info, tree scalar_type)
 {
   HOST_WIDE_INT lhs, rhs;
+  cgraph_node *node;
 
   /* During the analysis phase, this function is called on arbitrary
      statements that might not have scalar results.  */
@@ -145,6 +146,23 @@ vect_get_smallest_scalar_type (stmt_vec_info stmt_info, tree scalar_type)
 	    scalar_type = rhs_type;
 	}
     }
+  else if (simd_clone_call_p (stmt_info->stmt, &node))
+    {
+      auto clone = node->simd_clones->simdclone;
+      for (unsigned int i = 0; i < clone->nargs; ++i)
+	{
+	  if (clone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
+	    {
+	      tree arg_scalar_type = TREE_TYPE (clone->args[i].vector_type);
+	      rhs = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (arg_scalar_type));
+	      if (rhs < lhs)
+		{
+		  scalar_type = arg_scalar_type;
+		  lhs = rhs;
+		}
+	    }
+	}
+    }
   else if (gcall *call = dyn_cast <gcall *> (stmt_info->stmt))
     {
       unsigned int i = 0;
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 0fe5d0594abc095d3770b5ce4b9f2bad5205ab2f..35207de7acb410358220dbe8d1af82215b5091bf 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3965,6 +3965,29 @@ vect_simd_lane_linear (tree op, class loop *loop,
     }
 }
 
+bool
+simd_clone_call_p (gimple *stmt, cgraph_node **out_node)
+{
+  gcall *call = dyn_cast <gcall *> (stmt);
+  if (!call)
+    return false;
+
+  tree fndecl = NULL_TREE;
+  if (gimple_call_internal_p (call, IFN_MASK_CALL))
+    fndecl = TREE_OPERAND (gimple_call_arg (stmt, 0), 0);
+  else
+    fndecl = gimple_call_fndecl (stmt);
+
+  if (fndecl == NULL_TREE)
+    return false;
+
+  cgraph_node *node = cgraph_node::get (fndecl);
+  if (out_node)
+    *out_node = node;
+
+  return node != NULL && node->simd_clones != NULL;
+}
+
 /* Function vectorizable_simd_clone_call.
 
    Check if STMT_INFO performs a function call that can be vectorized
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index a65161499ea13f200aa745ca396db663a217b081..69634f7a6032696b394a62fb7ca8986bc78987c8 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2165,6 +2165,7 @@ extern bool vect_can_advance_ivs_p (loop_vec_info);
 extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
 
 /* In tree-vect-stmts.cc.  */
+extern bool simd_clone_call_p (gimple *, struct cgraph_node **node = NULL);
 extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
 						 poly_uint64 = 0);
 extern tree get_vectype_for_scalar_type (vec_info *, tree, unsigned int = 0);


* [PATCH 4/8] vect: don't allow fully masked loops with non-masked simd clones [PR 110485]
  2023-08-30  8:49 aarch64, vect, omp: Add SVE support for simd clones [PR 96342] Andre Vieira (lists)
                   ` (2 preceding siblings ...)
  2023-08-30  9:10 ` [Patch 3/8] vect: Fix vect_get_smallest_scalar_type for simd clones Andre Vieira (lists)
@ 2023-08-30  9:11 ` Andre Vieira (lists)
  2023-08-30 12:54   ` Richard Biener
  2023-08-30  9:13 ` [PATCH 5/8] vect: Use inbranch simdclones in masked loops Andre Vieira (lists)
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-08-30  9:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Biener

[-- Attachment #1: Type: text/plain, Size: 637 bytes --]

When analyzing a loop and choosing a simdclone to use, it is possible to 
choose a simdclone that cannot be used 'inbranch' for a loop that can 
use partial vectors.  This may lead to the vectorizer deciding to use 
partial vectors, which are not supported by notinbranch simd clones. 
This patch fixes that by disabling the use of partial vectors once a 
notinbranch simd clone has been selected.

gcc/ChangeLog:

	PR tree-optimization/110485
	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Disable partial
	vectors usage if a notinbranch simdclone has been selected.

gcc/testsuite/ChangeLog:

	* gcc.dg/gomp/pr110485.c: New test.

[-- Attachment #2: sve_simd_clones_4.patch --]
[-- Type: text/plain, Size: 1925 bytes --]

diff --git a/gcc/testsuite/gcc.dg/gomp/pr110485.c b/gcc/testsuite/gcc.dg/gomp/pr110485.c
new file mode 100644
index 0000000000000000000000000000000000000000..ba6817a127f40246071e32ccebf692cc4d121d15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/pr110485.c
@@ -0,0 +1,19 @@
+/* PR 110485 */
+/* { dg-do compile } */
+/* { dg-additional-options "-Ofast -fdump-tree-vect-details" } */
+/* { dg-additional-options "-march=znver4 --param=vect-partial-vector-usage=1" { target x86_64-*-* } } */
+#pragma omp declare simd notinbranch uniform(p)
+extern double __attribute__ ((const)) bar (double a, double p);
+
+double a[1024];
+double b[1024];
+
+void foo (int n)
+{
+  #pragma omp simd
+  for (int i = 0; i < n; ++i)
+    a[i] = bar (b[i], 71.2);
+}
+
+/* { dg-final { scan-tree-dump-not "MASK_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump "can't use a fully-masked loop because a non-masked simd clone was selected." "vect" { target x86_64-*-* } } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 35207de7acb410358220dbe8d1af82215b5091bf..664c3b5f7ca48fdb49383fb8a97f407465574479 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4349,6 +4349,17 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		       ? boolean_true_node : boolean_false_node;
 	    STMT_VINFO_SIMD_CLONE_INFO (stmt_info).safe_push (sll);
 	  }
+
+      if (!bestn->simdclone->inbranch)
+	{
+	  if (dump_enabled_p ()
+	      && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "can't use a fully-masked loop because a"
+			     " non-masked simd clone was selected.\n");
+	  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
+	}
+
       STMT_VINFO_TYPE (stmt_info) = call_simd_clone_vec_info_type;
       DUMP_VECT_SCOPE ("vectorizable_simd_clone_call");
 /*      vect_model_simple_cost (vinfo, stmt_info, ncopies,


* [PATCH 5/8] vect: Use inbranch simdclones in masked loops
  2023-08-30  8:49 aarch64, vect, omp: Add SVE support for simd clones [PR 96342] Andre Vieira (lists)
                   ` (3 preceding siblings ...)
  2023-08-30  9:11 ` [PATCH 4/8] vect: don't allow fully masked loops with non-masked simd clones [PR 110485] Andre Vieira (lists)
@ 2023-08-30  9:13 ` Andre Vieira (lists)
  2023-10-18 14:41   ` Andre Vieira (lists)
  2023-08-30  9:14 ` [PATCH 6/8] vect: Add vector_mode paramater to simd_clone_usable Andre Vieira (lists)
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-08-30  9:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Biener, Richard Sandiford, jakub

[-- Attachment #1: Type: text/plain, Size: 452 bytes --]

This patch enables the compiler to use inbranch simdclones when 
generating masked loops in autovectorization.

gcc/ChangeLog:

	* omp-simd-clone.cc (simd_clone_adjust_argument_types): Make function
	compatible with mask parameters in clone.
	* tree-vect-stmts.cc (vect_convert): New helper function.
	(vect_build_all_ones_mask): Allow vector boolean typed masks.
	(vectorizable_simd_clone_call): Enable the use of masked clones in
	fully masked loops.

[-- Attachment #2: sve_simd_clones_5.patch --]
[-- Type: text/plain, Size: 14633 bytes --]

diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index a42643400ddcf10961633448b49d4caafb999f12..ef0b9b48c7212900023bc0eaebca5e1f9389db77 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -807,8 +807,14 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
     {
       ipa_adjusted_param adj;
       memset (&adj, 0, sizeof (adj));
-      tree parm = args[i];
-      tree parm_type = node->definition ? TREE_TYPE (parm) : parm;
+      tree parm = NULL_TREE;
+      tree parm_type = NULL_TREE;
+      if(i < args.length())
+	{
+	  parm = args[i];
+	  parm_type = node->definition ? TREE_TYPE (parm) : parm;
+	}
+
       adj.base_index = i;
       adj.prev_clone_index = i;
 
@@ -1547,7 +1553,7 @@ simd_clone_adjust (struct cgraph_node *node)
 	  mask = gimple_assign_lhs (g);
 	  g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),
 				   BIT_AND_EXPR, mask,
-				   build_int_cst (TREE_TYPE (mask), 1));
+				   build_one_cst (TREE_TYPE (mask)));
 	  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
 	  mask = gimple_assign_lhs (g);
 	}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 664c3b5f7ca48fdb49383fb8a97f407465574479..7217f36a250d549b955c874d7c7644d94982b0b5 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1723,6 +1723,20 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype,
     }
 }
 
+/* Return SSA name of the result of the conversion of OPERAND into type TYPE.
+   The conversion statement is inserted at GSI.  */
+
+static tree
+vect_convert (vec_info *vinfo, stmt_vec_info stmt_info, tree type, tree operand,
+	      gimple_stmt_iterator *gsi)
+{
+  operand = build1 (VIEW_CONVERT_EXPR, type, operand);
+  gassign *new_stmt = gimple_build_assign (make_ssa_name (type),
+					   operand);
+  vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+  return gimple_get_lhs (new_stmt);
+}
+
 /* Return the mask input to a masked load or store.  VEC_MASK is the vectorized
    form of the scalar mask condition and LOOP_MASK, if nonnull, is the mask
    that needs to be applied to all loads and stores in a vectorized loop.
@@ -2666,7 +2680,8 @@ vect_build_all_ones_mask (vec_info *vinfo,
 {
   if (TREE_CODE (masktype) == INTEGER_TYPE)
     return build_int_cst (masktype, -1);
-  else if (TREE_CODE (TREE_TYPE (masktype)) == INTEGER_TYPE)
+  else if (VECTOR_BOOLEAN_TYPE_P (masktype)
+	   || TREE_CODE (TREE_TYPE (masktype)) == INTEGER_TYPE)
     {
       tree mask = build_int_cst (TREE_TYPE (masktype), -1);
       mask = build_vector_from_val (masktype, mask);
@@ -4018,7 +4033,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
   size_t i, nargs;
   tree lhs, rtype, ratype;
   vec<constructor_elt, va_gc> *ret_ctor_elts = NULL;
-  int arg_offset = 0;
+  int masked_call_offset = 0;
 
   /* Is STMT a vectorizable call?   */
   gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt);
@@ -4033,7 +4048,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       gcc_checking_assert (TREE_CODE (fndecl) == ADDR_EXPR);
       fndecl = TREE_OPERAND (fndecl, 0);
       gcc_checking_assert (TREE_CODE (fndecl) == FUNCTION_DECL);
-      arg_offset = 1;
+      masked_call_offset = 1;
     }
   if (fndecl == NULL_TREE)
     return false;
@@ -4065,7 +4080,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
     return false;
 
   /* Process function arguments.  */
-  nargs = gimple_call_num_args (stmt) - arg_offset;
+  nargs = gimple_call_num_args (stmt) - masked_call_offset;
 
   /* Bail out if the function has zero arguments.  */
   if (nargs == 0)
@@ -4083,7 +4098,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       thisarginfo.op = NULL_TREE;
       thisarginfo.simd_lane_linear = false;
 
-      op = gimple_call_arg (stmt, i + arg_offset);
+      op = gimple_call_arg (stmt, i + masked_call_offset);
       if (!vect_is_simple_use (op, vinfo, &thisarginfo.dt,
 			       &thisarginfo.vectype)
 	  || thisarginfo.dt == vect_uninitialized_def)
@@ -4161,14 +4176,6 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
     }
 
   poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-  if (!vf.is_constant ())
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			 "not considering SIMD clones; not yet supported"
-			 " for variable-width vectors.\n");
-      return false;
-    }
 
   unsigned int badness = 0;
   struct cgraph_node *bestn = NULL;
@@ -4181,7 +4188,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	unsigned int this_badness = 0;
 	unsigned int num_calls;
 	if (!constant_multiple_p (vf, n->simdclone->simdlen, &num_calls)
-	    || n->simdclone->nargs != nargs)
+	    || (!n->simdclone->inbranch && (masked_call_offset > 0))
+	    || nargs != n->simdclone->nargs)
 	  continue;
 	if (num_calls != 1)
 	  this_badness += exact_log2 (num_calls) * 4096;
@@ -4198,7 +4206,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	      case SIMD_CLONE_ARG_TYPE_VECTOR:
 		if (!useless_type_conversion_p
 			(n->simdclone->args[i].orig_type,
-			 TREE_TYPE (gimple_call_arg (stmt, i + arg_offset))))
+			 TREE_TYPE (gimple_call_arg (stmt,
+						     i + masked_call_offset))))
 		  i = -1;
 		else if (arginfo[i].dt == vect_constant_def
 			 || arginfo[i].dt == vect_external_def
@@ -4243,6 +4252,17 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	  }
 	if (i == (size_t) -1)
 	  continue;
+	if (masked_call_offset == 0
+	    && n->simdclone->inbranch
+	    && n->simdclone->nargs > nargs)
+	  {
+	    gcc_assert (n->simdclone->args[n->simdclone->nargs - 1].arg_type ==
+			SIMD_CLONE_ARG_TYPE_MASK);
+	    /* Penalize using a masked SIMD clone in a non-masked loop, that is
+	       not in a branch, as we'd have to construct an all-true mask.  */
+	    if (!loop_vinfo || !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+	      this_badness += 64;
+	  }
 	if (bestn == NULL || this_badness < badness)
 	  {
 	    bestn = n;
@@ -4259,7 +4279,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	   || arginfo[i].dt == vect_external_def)
 	  && bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
 	{
-	  tree arg_type = TREE_TYPE (gimple_call_arg (stmt, i + arg_offset));
+	  tree arg_type = TREE_TYPE (gimple_call_arg (stmt,
+						      i + masked_call_offset));
 	  arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
 							    slp_node);
 	  if (arginfo[i].vectype == NULL
@@ -4331,24 +4352,38 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	  && TREE_CODE (TREE_TYPE (TREE_TYPE (bestn->decl))) == ARRAY_TYPE)
 	vinfo->any_known_not_updated_vssa = true;
       STMT_VINFO_SIMD_CLONE_INFO (stmt_info).safe_push (bestn->decl);
-      for (i = 0; i < nargs; i++)
-	if ((bestn->simdclone->args[i].arg_type
-	     == SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP)
-	    || (bestn->simdclone->args[i].arg_type
-		== SIMD_CLONE_ARG_TYPE_LINEAR_REF_CONSTANT_STEP))
-	  {
-	    STMT_VINFO_SIMD_CLONE_INFO (stmt_info).safe_grow_cleared (i * 3
-									+ 1,
-								      true);
-	    STMT_VINFO_SIMD_CLONE_INFO (stmt_info).safe_push (arginfo[i].op);
-	    tree lst = POINTER_TYPE_P (TREE_TYPE (arginfo[i].op))
-		       ? size_type_node : TREE_TYPE (arginfo[i].op);
-	    tree ls = build_int_cst (lst, arginfo[i].linear_step);
-	    STMT_VINFO_SIMD_CLONE_INFO (stmt_info).safe_push (ls);
-	    tree sll = arginfo[i].simd_lane_linear
-		       ? boolean_true_node : boolean_false_node;
-	    STMT_VINFO_SIMD_CLONE_INFO (stmt_info).safe_push (sll);
-	  }
+
+      for (i = 0; i < bestn->simdclone->nargs; i++)
+	{
+	  switch (bestn->simdclone->args[i].arg_type)
+	    {
+	    default:
+	      continue;
+	    case SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP:
+	    case SIMD_CLONE_ARG_TYPE_LINEAR_REF_CONSTANT_STEP:
+	      {
+		auto &clone_info = STMT_VINFO_SIMD_CLONE_INFO (stmt_info);
+		clone_info.safe_grow_cleared (i * 3 + 1, true);
+		clone_info.safe_push (arginfo[i].op);
+		tree lst = POINTER_TYPE_P (TREE_TYPE (arginfo[i].op))
+			   ? size_type_node : TREE_TYPE (arginfo[i].op);
+		tree ls = build_int_cst (lst, arginfo[i].linear_step);
+		clone_info.safe_push (ls);
+		tree sll = arginfo[i].simd_lane_linear
+			   ? boolean_true_node : boolean_false_node;
+		clone_info.safe_push (sll);
+	      }
+	      break;
+	    case SIMD_CLONE_ARG_TYPE_MASK:
+	      if (loop_vinfo
+		  && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+		vect_record_loop_mask (loop_vinfo,
+				       &LOOP_VINFO_MASKS (loop_vinfo),
+				       ncopies, vectype, op);
+
+	      break;
+	    }
+	}
 
       if (!bestn->simdclone->inbranch)
 	{
@@ -4394,6 +4429,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
   vec_oprnds_i.safe_grow_cleared (nargs, true);
   for (j = 0; j < ncopies; ++j)
     {
+      poly_uint64 callee_nelements;
+      poly_uint64 caller_nelements;
       /* Build argument list for the vectorized call.  */
       if (j == 0)
 	vargs.create (nargs);
@@ -4404,8 +4441,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	{
 	  unsigned int k, l, m, o;
 	  tree atype;
-	  poly_uint64 callee_nelements, caller_nelements;
-	  op = gimple_call_arg (stmt, i + arg_offset);
+	  op = gimple_call_arg (stmt, i + masked_call_offset);
 	  switch (bestn->simdclone->args[i].arg_type)
 	    {
 	    case SIMD_CLONE_ARG_TYPE_VECTOR:
@@ -4482,16 +4518,9 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		      if (k == 1)
 			if (!useless_type_conversion_p (TREE_TYPE (vec_oprnd0),
 						       atype))
-			  {
-			    vec_oprnd0
-			      = build1 (VIEW_CONVERT_EXPR, atype, vec_oprnd0);
-			    gassign *new_stmt
-			      = gimple_build_assign (make_ssa_name (atype),
-						     vec_oprnd0);
-			    vect_finish_stmt_generation (vinfo, stmt_info,
-							 new_stmt, gsi);
-			    vargs.safe_push (gimple_assign_lhs (new_stmt));
-			  }
+			  vargs.safe_push (vect_convert (vinfo, stmt_info,
+							 atype, vec_oprnd0,
+							 gsi));
 			else
 			  vargs.safe_push (vec_oprnd0);
 		      else
@@ -4544,6 +4573,24 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 			      vec_oprnds_i[i] = 0;
 			    }
 			  vec_oprnd0 = vec_oprnds[i][vec_oprnds_i[i]++];
+			  if (loop_vinfo
+			      && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+			    {
+			      vec_loop_masks *loop_masks
+				= &LOOP_VINFO_MASKS (loop_vinfo);
+			      tree loop_mask
+				= vect_get_loop_mask (loop_vinfo, gsi,
+						      loop_masks, ncopies,
+						      vectype, j);
+			      vec_oprnd0
+				= prepare_vec_mask (loop_vinfo,
+						    TREE_TYPE (loop_mask),
+						    loop_mask, vec_oprnd0,
+						    gsi);
+			      loop_vinfo->vec_cond_masked_set.add ({ vec_oprnd0,
+								     loop_mask });
+
+			    }
 			  vec_oprnd0
 			    = build3 (VEC_COND_EXPR, atype, vec_oprnd0,
 				      build_vector_from_val (atype, one),
@@ -4641,6 +4688,64 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	    }
 	}
 
+      if (masked_call_offset == 0
+	  && bestn->simdclone->inbranch
+	  && bestn->simdclone->nargs > nargs)
+	{
+	  unsigned long m, o;
+	  size_t mask_i = bestn->simdclone->nargs - 1;
+	  tree mask;
+	  gcc_assert (bestn->simdclone->args[mask_i].arg_type ==
+		      SIMD_CLONE_ARG_TYPE_MASK);
+
+	  tree masktype = bestn->simdclone->args[mask_i].vector_type;
+	  callee_nelements = TYPE_VECTOR_SUBPARTS (masktype);
+	  o = vector_unroll_factor (nunits, callee_nelements);
+	  for (m = j * o; m < (j + 1) * o; m++)
+	    {
+	      if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+		{
+		  vec_loop_masks *loop_masks = &LOOP_VINFO_MASKS (loop_vinfo);
+		  mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
+					     ncopies, vectype, j);
+		}
+	      else
+		mask = vect_build_all_ones_mask (vinfo, stmt_info, masktype);
+
+	      if (!useless_type_conversion_p (TREE_TYPE (mask), masktype))
+		{
+		  gassign *new_stmt;
+		  if (bestn->simdclone->mask_mode != VOIDmode)
+		    {
+		      /* This means we are dealing with integer mask modes.
+			 First convert to an integer type with the same size as
+			 the current vector type.  */
+		      unsigned HOST_WIDE_INT intermediate_size
+			= tree_to_uhwi (TYPE_SIZE (TREE_TYPE (mask)));
+		      tree mid_int_type =
+			build_nonstandard_integer_type (intermediate_size, 1);
+		      mask = build1 (VIEW_CONVERT_EXPR, mid_int_type, mask);
+		      new_stmt
+			= gimple_build_assign (make_ssa_name (mid_int_type),
+					       mask);
+		      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+		      /* Then zero-extend to the mask mode.  */
+		      mask = fold_build1 (NOP_EXPR, masktype,
+					  gimple_get_lhs (new_stmt));
+		    }
+		  else
+		    mask = build1 (VIEW_CONVERT_EXPR, masktype, mask);
+
+		  new_stmt = gimple_build_assign (make_ssa_name (masktype),
+						  mask);
+		  vect_finish_stmt_generation (vinfo, stmt_info,
+					       new_stmt, gsi);
+		  mask = gimple_assign_lhs (new_stmt);
+		}
+	      vargs.safe_push (mask);
+	    }
+	}
+
       gcall *new_call = gimple_build_call_vec (fndecl, vargs);
       if (vec_dest)
 	{
@@ -4659,13 +4764,13 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 
       if (vec_dest)
 	{
-	  if (!multiple_p (TYPE_VECTOR_SUBPARTS (vectype), nunits))
+	  caller_nelements = TYPE_VECTOR_SUBPARTS (vectype);
+	  if (!multiple_p (caller_nelements, nunits))
 	    {
 	      unsigned int k, l;
 	      poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
 	      poly_uint64 bytes = GET_MODE_SIZE (TYPE_MODE (vectype));
-	      k = vector_unroll_factor (nunits,
-					TYPE_VECTOR_SUBPARTS (vectype));
+	      k = vector_unroll_factor (nunits, caller_nelements);
 	      gcc_assert ((k & (k - 1)) == 0);
 	      for (l = 0; l < k; l++)
 		{
@@ -4691,11 +4796,11 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		vect_clobber_variable (vinfo, stmt_info, gsi, new_temp);
 	      continue;
 	    }
-	  else if (!multiple_p (nunits, TYPE_VECTOR_SUBPARTS (vectype)))
+	  else if (!multiple_p (nunits, caller_nelements))
 	    {
 	      unsigned int k;
-	      if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (rtype),
-					TYPE_VECTOR_SUBPARTS (vectype), &k))
+	      if (!constant_multiple_p (caller_nelements,
+					TYPE_VECTOR_SUBPARTS (rtype), &k))
 		gcc_unreachable ();
 	      gcc_assert ((k & (k - 1)) == 0);
 	      if ((j & (k - 1)) == 0)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 6/8] vect: Add vector_mode parameter to simd_clone_usable
  2023-08-30  8:49 aarch64, vect, omp: Add SVE support for simd clones [PR 96342] Andre Vieira (lists)
                   ` (4 preceding siblings ...)
  2023-08-30  9:13 ` [PATCH 5/8] vect: Use inbranch simdclones in masked loops Andre Vieira (lists)
@ 2023-08-30  9:14 ` Andre Vieira (lists)
  2023-08-30  9:17   ` Andre Vieira (lists)
  2023-08-30 13:01   ` Richard Biener
  2023-08-30  9:17 ` [PATCH 7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM Andre Vieira (lists)
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-08-30  9:14 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 680 bytes --]

This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE 
hook to enable rejecting SVE modes when the target architecture does not 
support SVE.

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add mode
	parameter and use it to reject SVE modes when the target architecture
	does not support SVE.
	* config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused mode parameter.
	* config/i386/i386.cc (ix86_simd_clone_usable): Likewise.
	* doc/tm.texi (TARGET_SIMD_CLONE_USABLE): Document new parameter.
	* target.def (usable): Add new parameter.
	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass vector mode
	to TARGET_SIMD_CLONE_USABLE hook.
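
As context for how the vectorizer consumes the hook's verdict, here is a
minimal stand-alone sketch (plain C, with a hypothetical `clone` record
rather than the real cgraph data structures) of the badness scoring in
the tree-vect-stmts.cc hunk of this patch: a clone the target hook
rejects (-1) is skipped entirely, otherwise the hook's non-negative
answer is weighted into the score, lower being better.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for a simd clone candidate; not GCC's types.  */
struct clone
{
  int num_calls;       /* calls needed to cover the loop's VF */
  int inbranch;        /* masked variant?  */
  int target_badness;  /* TARGET_SIMD_CLONE_USABLE result, -1 = unusable */
};

/* num_calls is always a power of two here.  */
static int
exact_log2u (int x)
{
  int l = 0;
  while (x > 1)
    {
      x >>= 1;
      l++;
    }
  return l;
}

/* Mirrors the weighting in vectorizable_simd_clone_call: unrolling
   costs exact_log2 (num_calls) * 4096, an inbranch clone costs 8192,
   and the target's answer is scaled by 512; lower is better.  */
static int
clone_badness (const struct clone *c)
{
  if (c->target_badness < 0)
    return INT_MAX;  /* the usable hook rejected this clone */
  int badness = exact_log2u (c->num_calls) * 4096;
  if (c->inbranch)
    badness += 8192;
  return badness + c->target_badness * 512;
}
```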

[-- Attachment #2: sve_simd_clones_6.patch --]
[-- Type: text/plain, Size: 4541 bytes --]

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 5fb4c863d875871d6de865e72ce360506a3694d2..a13d3fba05f9f9d2989b36c681bc77d71e943e0d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -27498,12 +27498,18 @@ aarch64_simd_clone_adjust (struct cgraph_node *node)
 /* Implement TARGET_SIMD_CLONE_USABLE.  */
 
 static int
-aarch64_simd_clone_usable (struct cgraph_node *node)
+aarch64_simd_clone_usable (struct cgraph_node *node, machine_mode vector_mode)
 {
   switch (node->simdclone->vecsize_mangle)
     {
     case 'n':
-      if (!TARGET_SIMD)
+      if (!TARGET_SIMD
+	  || aarch64_sve_mode_p (vector_mode))
+	return -1;
+      return 0;
+    case 's':
+      if (!TARGET_SVE
+	  || !aarch64_sve_mode_p (vector_mode))
 	return -1;
       return 0;
     default:
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 02f4dedec4214b1eea9e6f5057ed57d7e0db316a..252676273f06500c99df6ae251f0406c618df891 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5599,7 +5599,8 @@ gcn_simd_clone_adjust (struct cgraph_node *ARG_UNUSED (node))
 /* Implement TARGET_SIMD_CLONE_USABLE.  */
 
 static int
-gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node))
+gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node),
+		       machine_mode ARG_UNUSED (mode))
 {
   /* We don't need to do anything here because
      gcn_simd_clone_compute_vecsize_and_simdlen currently only returns one
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 5d57726e22cea8bcaa8ac8b1b25ac420193f39bb..84f0d5a7cb679e6be92001f59802276635506e97 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -24379,7 +24379,8 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
    slightly less desirable, etc.).  */
 
 static int
-ix86_simd_clone_usable (struct cgraph_node *node)
+ix86_simd_clone_usable (struct cgraph_node *node,
+			machine_mode mode ATTRIBUTE_UNUSED)
 {
   switch (node->simdclone->vecsize_mangle)
     {
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 95ba56e05ae4a0f11639cc4a21d6736c53ad5ef1..bde22e562ebb9069122eb3b142ab8f4a4ae56a3a 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6336,11 +6336,11 @@ This hook should add implicit @code{attribute(target("..."))} attribute
 to SIMD clone @var{node} if needed.
 @end deftypefn
 
-@deftypefn {Target Hook} int TARGET_SIMD_CLONE_USABLE (struct cgraph_node *@var{})
+@deftypefn {Target Hook} int TARGET_SIMD_CLONE_USABLE (struct cgraph_node *@var{}, @var{machine_mode})
 This hook should return -1 if SIMD clone @var{node} shouldn't be used
-in vectorized loops in current function, or non-negative number if it is
-usable.  In that case, the smaller the number is, the more desirable it is
-to use it.
+in vectorized loops being vectorized with mode @var{m} in current function, or
+non-negative number if it is usable.  In that case, the smaller the number is,
+the more desirable it is to use it.
 @end deftypefn
 
 @deftypefn {Target Hook} int TARGET_SIMT_VF (void)
diff --git a/gcc/target.def b/gcc/target.def
index 7d684296c17897b4ceecb31c5de1ae8665a8228e..6a0cbc454526ee29011451b570354bf234a4eabd 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1645,10 +1645,11 @@ void, (struct cgraph_node *), NULL)
 DEFHOOK
 (usable,
 "This hook should return -1 if SIMD clone @var{node} shouldn't be used\n\
-in vectorized loops in current function, or non-negative number if it is\n\
-usable.  In that case, the smaller the number is, the more desirable it is\n\
-to use it.",
-int, (struct cgraph_node *), NULL)
+in vectorized loops being vectorized with mode @var{m} in current function, or\n\
+non-negative number if it is usable.  In that case, the smaller the number is,\n\
+the more desirable it is to use it.",
+int, (struct cgraph_node *, machine_mode), NULL)
+
 
 HOOK_VECTOR_END (simd_clone)
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 7217f36a250d549b955c874d7c7644d94982b0b5..dc2fc20ef9fe777132308c9e33f7731d62717466 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4195,7 +4195,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	  this_badness += exact_log2 (num_calls) * 4096;
 	if (n->simdclone->inbranch)
 	  this_badness += 8192;
-	int target_badness = targetm.simd_clone.usable (n);
+	int target_badness = targetm.simd_clone.usable (n, vinfo->vector_mode);
 	if (target_badness < 0)
 	  continue;
 	this_badness += target_badness * 512;

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 6/8] vect: Add vector_mode parameter to simd_clone_usable
  2023-08-30  9:14 ` [PATCH 6/8] vect: Add vector_mode parameter to simd_clone_usable Andre Vieira (lists)
@ 2023-08-30  9:17   ` Andre Vieira (lists)
  2023-08-30 13:01   ` Richard Biener
  1 sibling, 0 replies; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-08-30  9:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Biener, Richard Sandiford, jakub

Forgot to CC this one to maintainers...

On 30/08/2023 10:14, Andre Vieira (lists) via Gcc-patches wrote:
> This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE 
> hook to enable rejecting SVE modes when the target architecture does not 
> support SVE.
> 
> gcc/ChangeLog:
> 
>      * config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add mode
>      parameter and use it to reject SVE modes when the target architecture
>      does not support SVE.
>      * config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused mode 
> parameter.
>      * config/i386/i386.cc (ix86_simd_clone_usable): Likewise.
>      * doc/tm.texi (TARGET_SIMD_CLONE_USABLE): Document new parameter.
>      * target.def (usable): Add new parameter.
>      * tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass vector mode
>      to TARGET_SIMD_CLONE_USABLE hook.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM
  2023-08-30  8:49 aarch64, vect, omp: Add SVE support for simd clones [PR 96342] Andre Vieira (lists)
                   ` (5 preceding siblings ...)
  2023-08-30  9:14 ` [PATCH 6/8] vect: Add vector_mode parameter to simd_clone_usable Andre Vieira (lists)
@ 2023-08-30  9:17 ` Andre Vieira (lists)
  2023-08-30 13:04   ` Richard Biener
  2023-08-30  9:19 ` [PATCH 8/8] aarch64: Add SVE support for simd clones [PR 96342] Andre Vieira (lists)
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-08-30  9:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Biener, Richard Sandiford, jakub

[-- Attachment #1: Type: text/plain, Size: 1076 bytes --]

This patch adds a new target hook that lets us adjust the types of the
return value and parameters of simd clones.  We use this in two ways;
the first is to make sure we can create valid SVE types, including the
SVE type attribute, when creating an SVE simd clone, even when the
target options do not support SVE.  This follows the behaviour seen
with x86, which creates simd clones according to the ABI rules when no
simdlen is provided, even if that simdlen is not supported by the
current target options.  Note that this doesn't mean the simd clone
will be used in auto-vectorization.

gcc/ChangeLog:

	(TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): Define.
	* doc/tm.texi (TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): Document.
	* doc/tm.texi.in (TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): New.
	* omp-simd-clone.cc (simd_adjust_return_type): Call new hook.
	(simd_clone_adjust_argument_types): Likewise.
	* target.def (adjust_ret_or_param): New hook.
	* targhooks.cc (default_simd_clone_adjust_ret_or_param): New.
	* targhooks.h (default_simd_clone_adjust_ret_or_param): New.
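
A minimal model of the hook contract introduced here (hypothetical
names and stand-in types, not GCC's real hook machinery): the default
implementation is the identity on the type, so targets that do not
override it are unaffected, while a target can install its own function
to rewrite a clone's vector types.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for GCC's 'tree' and 'cgraph_node'; hypothetical.  */
typedef const char *tree_t;
struct cgraph_node_t;

/* Mirrors the hook signature: (node, type, is_mask) -> adjusted type.  */
typedef tree_t (*adjust_ret_or_param_fn) (struct cgraph_node_t *, tree_t,
					  int);

/* Default implementation: hand the type back unchanged, as
   default_simd_clone_adjust_ret_or_param does in targhooks.cc.  */
static tree_t
default_adjust_ret_or_param (struct cgraph_node_t *node, tree_t type,
			     int is_mask)
{
  (void) node;
  (void) is_mask;
  return type;
}

/* A hook slot, defaulting to the identity implementation; a target
   would overwrite this with its own adjustment function.  */
static adjust_ret_or_param_fn simd_clone_adjust_ret_or_param_hook
  = default_adjust_ret_or_param;
```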

[-- Attachment #2: sve_simd_clones_7.patch --]
[-- Type: text/plain, Size: 5023 bytes --]

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bde22e562ebb9069122eb3b142ab8f4a4ae56a3a..b80c09ec36d51f1bb55b14229f46207fb4457223 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6343,6 +6343,9 @@ non-negative number if it is usable.  In that case, the smaller the number is,
 the more desirable it is to use it.
 @end deftypefn
 
+@deftypefn {Target Hook} tree TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM (struct cgraph_node *@var{}, @var{tree}, @var{bool})
+If defined, this hook should adjust the type of the return or parameter
+@var{type} to be used by the simd clone @var{node}.
 @end deftypefn
 
 @deftypefn {Target Hook} int TARGET_SIMT_VF (void)
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 4ac96dc357d35e0e57bb43a41d1b1a4f66d05946..7496a32d84f7c422fe7ea88215ee72f3c354a3f4 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4211,6 +4211,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_SIMD_CLONE_USABLE
 
+@hook TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM
+
 @hook TARGET_SIMT_VF
 
 @hook TARGET_OMP_DEVICE_KIND_ARCH_ISA
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index ef0b9b48c7212900023bc0eaebca5e1f9389db77..c2fd4d3be878e56b6394e34097d2de826a0ba1ff 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -736,6 +736,7 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
       t = build_array_type_nelts (t, exact_div (node->simdclone->simdlen,
 						veclen));
     }
+  t = targetm.simd_clone.adjust_ret_or_param (node, t, false);
   TREE_TYPE (TREE_TYPE (fndecl)) = t;
   if (!node->definition)
     return NULL_TREE;
@@ -748,6 +749,7 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
 
   tree atype = build_array_type_nelts (orig_rettype,
 				       node->simdclone->simdlen);
+  atype = targetm.simd_clone.adjust_ret_or_param (node, atype, false);
   if (maybe_ne (veclen, node->simdclone->simdlen))
     return build1 (VIEW_CONVERT_EXPR, atype, t);
 
@@ -880,6 +882,8 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 				       ? IDENTIFIER_POINTER (DECL_NAME (parm))
 				       : NULL, parm_type, sc->simdlen);
 	}
+      adj.type = targetm.simd_clone.adjust_ret_or_param (node, adj.type,
+							 false);
       vec_safe_push (new_params, adj);
     }
 
@@ -912,6 +916,8 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	adj.type = build_vector_type (pointer_sized_int_node, veclen);
       else
 	adj.type = build_vector_type (base_type, veclen);
+      adj.type = targetm.simd_clone.adjust_ret_or_param (node, adj.type,
+							 true);
       vec_safe_push (new_params, adj);
 
       k = vector_unroll_factor (sc->simdlen, veclen);
diff --git a/gcc/target.def b/gcc/target.def
index 6a0cbc454526ee29011451b570354bf234a4eabd..665083ce035da03b40b15f23684ccdacce33c9d3 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1650,6 +1650,13 @@ non-negative number if it is usable.  In that case, the smaller the number is,\n
 the more desirable it is to use it.",
 int, (struct cgraph_node *, machine_mode), NULL)
 
+DEFHOOK
+(adjust_ret_or_param,
+"If defined, this hook should adjust the type of the return or parameter\n\
+@var{type} to be used by the simd clone @var{node}.",
+tree, (struct cgraph_node *, tree, bool),
+default_simd_clone_adjust_ret_or_param)
+
 
 HOOK_VECTOR_END (simd_clone)
 
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 1a0db8dddd594d9b1fb04ae0d9a66ad6b7a396dc..558157514814228ef2ed41ae0861e1c088eea9ef 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -75,6 +75,9 @@ extern void default_print_operand (FILE *, rtx, int);
 extern void default_print_operand_address (FILE *, machine_mode, rtx);
 extern bool default_print_operand_punct_valid_p (unsigned char);
 extern tree default_mangle_assembler_name (const char *);
+extern tree default_simd_clone_adjust_ret_or_param
+  (struct cgraph_node *, tree, bool);
+
 
 extern machine_mode default_translate_mode_attribute (machine_mode);
 extern bool default_scalar_mode_supported_p (scalar_mode);
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index e190369f87a92e6a92372dc348d9374c3a965c0a..6b6f6132c6dc62b92ad8d448d63ca6041386788f 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -399,6 +399,16 @@ default_mangle_assembler_name (const char *name ATTRIBUTE_UNUSED)
   return get_identifier (stripped);
 }
 
+/* The default implementation of TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM.  */
+
+tree
+default_simd_clone_adjust_ret_or_param (struct cgraph_node *node ATTRIBUTE_UNUSED,
+					tree type,
+					bool is_return ATTRIBUTE_UNUSED)
+{
+  return type;
+}
+
 /* The default implementation of TARGET_TRANSLATE_MODE_ATTRIBUTE.  */
 
 machine_mode

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 8/8] aarch64: Add SVE support for simd clones [PR 96342]
  2023-08-30  8:49 aarch64, vect, omp: Add SVE support for simd clones [PR 96342] Andre Vieira (lists)
                   ` (6 preceding siblings ...)
  2023-08-30  9:17 ` [PATCH 7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM Andre Vieira (lists)
@ 2023-08-30  9:19 ` Andre Vieira (lists)
  2023-10-18 14:41   ` Andre Vieira (lists)
  2023-10-18 14:40 ` aarch64, vect, omp: " Andre Vieira (lists)
  2023-10-18 14:41 ` [PATCH 0/8] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS Andre Vieira (lists)
  9 siblings, 1 reply; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-08-30  9:19 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 1290 bytes --]

This patch finalizes support for generating SVE simd clones when no
simdlen is provided, following the ABI rules where the widest data
type determines the minimum number of elements in a length-agnostic
vector.

gcc/ChangeLog:

	* config/aarch64/aarch64-protos.h (add_sve_type_attribute): Declare.
	* config/aarch64/aarch64-sve-builtins.cc (add_sve_type_attribute): Make
	visibility global.
	* config/aarch64/aarch64.cc (aarch64_fntype_abi): Ensure SVE ABI is
	chosen over SIMD ABI if an SVE type is used in return or arguments.
	(aarch64_simd_clone_compute_vecsize_and_simdlen): Create VLA simd clone
	when no simdlen is provided, according to ABI rules.
	(aarch64_simd_clone_adjust): Add '+sve' attribute to SVE simd clones.
	(aarch64_simd_clone_adjust_ret_or_param): New.
	(TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): Define.
	* omp-simd-clone.cc (simd_clone_mangle): Print 'x' for VLA simdlen.
	(simd_clone_adjust): Adapt safelen check to be compatible with VLA
	simdlen.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/declare-variant-14.c: Adapt aarch64 scan.
	* gfortran.dg/gomp/declare-variant-14.f90: Likewise.
	* gcc.target/aarch64/declare-simd-1.c: Remove warning checks where no
	longer necessary.
	* gcc.target/aarch64/declare-simd-2.c: Add SVE clone scan.
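
The NDS/WDS rule described above can be sketched as follows (a hedged
model in plain C, not the actual
aarch64_simd_clone_compute_vecsize_and_simdlen logic): the narrowest
element type (NDS) sets the lane counts of the two fixed-length
Advanced SIMD candidates, while the widest element type (WDS) sets the
number of lanes per 128-bit granule of the length-agnostic SVE
candidate.

```c
#include <assert.h>

/* Hypothetical model of the candidate-simdlen computation; element
   sizes are in bits.  */
struct simdlen_candidates
{
  unsigned advsimd_lo;      /* lanes of the 64-bit Advanced SIMD clone */
  unsigned advsimd_hi;      /* lanes of the 128-bit Advanced SIMD clone */
  unsigned sve_per_granule; /* lanes per 128-bit granule of the SVE clone */
};

static struct simdlen_candidates
compute_candidates (unsigned nds_elt_bits, unsigned wds_elt_bits)
{
  struct simdlen_candidates c;
  /* The narrowest element type fixes the fixed-length candidates.  */
  c.advsimd_lo = 64 / nds_elt_bits;
  c.advsimd_hi = 128 / nds_elt_bits;
  /* The widest element type fixes the length-agnostic candidate.  */
  c.sve_per_granule = 128 / wds_elt_bits;
  return c;
}
```

For example, `long long f01 (int, short)` from declare-simd-1.c has an
NDS of 16 bits and a WDS of 64 bits, giving Advanced SIMD simdlens 4
and 8 (matching the warnings this patch removes from that test) and 2
lanes per granule for the SVE clone.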

[-- Attachment #2: sve_simd_clones_8.patch --]
[-- Type: text/plain, Size: 17532 bytes --]

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 70303d6fd953e0c397b9138ede8858c2db2e53db..d7888c95a4999fad1a4c55d5cd2287c2040302c8 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1001,6 +1001,8 @@ namespace aarch64_sve {
 #ifdef GCC_TARGET_H
   bool verify_type_context (location_t, type_context_kind, const_tree, bool);
 #endif
+  void add_sve_type_attribute (tree, unsigned int, unsigned int,
+			       const char *, const char *);
 }
 
 extern void aarch64_split_combinev16qi (rtx operands[3]);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 161a14edde7c9fb1b13b146cf50463e2d78db264..6f99c438d10daa91b7e3b623c995489f1a8a0f4c 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -569,14 +569,16 @@ static bool reported_missing_registers_p;
 /* Record that TYPE is an ABI-defined SVE type that contains NUM_ZR SVE vectors
    and NUM_PR SVE predicates.  MANGLED_NAME, if nonnull, is the ABI-defined
    mangling of the type.  ACLE_NAME is the <arm_sve.h> name of the type.  */
-static void
+void
 add_sve_type_attribute (tree type, unsigned int num_zr, unsigned int num_pr,
 			const char *mangled_name, const char *acle_name)
 {
   tree mangled_name_tree
     = (mangled_name ? get_identifier (mangled_name) : NULL_TREE);
+  tree acle_name_tree
+    = (acle_name ? get_identifier (acle_name) : NULL_TREE);
 
-  tree value = tree_cons (NULL_TREE, get_identifier (acle_name), NULL_TREE);
+  tree value = tree_cons (NULL_TREE, acle_name_tree, NULL_TREE);
   value = tree_cons (NULL_TREE, mangled_name_tree, value);
   value = tree_cons (NULL_TREE, size_int (num_pr), value);
   value = tree_cons (NULL_TREE, size_int (num_zr), value);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a13d3fba05f9f9d2989b36c681bc77d71e943e0d..492acb9ce081866162faa8dfca777e4cb943797f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -4034,13 +4034,13 @@ aarch64_takes_arguments_in_sve_regs_p (const_tree fntype)
 static const predefined_function_abi &
 aarch64_fntype_abi (const_tree fntype)
 {
-  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)))
-    return aarch64_simd_abi ();
-
   if (aarch64_returns_value_in_sve_regs_p (fntype)
       || aarch64_takes_arguments_in_sve_regs_p (fntype))
     return aarch64_sve_abi ();
 
+  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)))
+    return aarch64_simd_abi ();
+
   return default_function_abi;
 }
 
@@ -27327,7 +27327,7 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 					int num, bool explicit_p)
 {
   tree t, ret_type;
-  unsigned int nds_elt_bits;
+  unsigned int nds_elt_bits, wds_elt_bits;
   int count;
   unsigned HOST_WIDE_INT const_simdlen;
   poly_uint64 vec_bits;
@@ -27374,10 +27374,14 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   if (TREE_CODE (ret_type) != VOID_TYPE)
     {
       nds_elt_bits = lane_size (SIMD_CLONE_ARG_TYPE_VECTOR, ret_type);
+      wds_elt_bits = nds_elt_bits;
       vec_elts.safe_push (std::make_pair (ret_type, nds_elt_bits));
     }
   else
-    nds_elt_bits = POINTER_SIZE;
+    {
+      nds_elt_bits = POINTER_SIZE;
+      wds_elt_bits = 0;
+    }
 
   int i;
   tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
@@ -27385,30 +27389,36 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0;
        t && t != void_list_node; t = TREE_CHAIN (t), i++)
     {
-      tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
+      tree type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
       if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM
-	  && !supported_simd_type (arg_type))
+	  && !supported_simd_type (type))
 	{
 	  if (!explicit_p)
 	    ;
-	  else if (COMPLEX_FLOAT_TYPE_P (ret_type))
+	  else if (COMPLEX_FLOAT_TYPE_P (type))
 	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
 			"GCC does not currently support argument type %qT "
-			"for simd", arg_type);
+			"for simd", type);
 	  else
 	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
 			"unsupported argument type %qT for simd",
-			arg_type);
+			type);
 	  return 0;
 	}
-      unsigned lane_bits = lane_size (clonei->args[i].arg_type, arg_type);
+      unsigned lane_bits = lane_size (clonei->args[i].arg_type, type);
       if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
-	vec_elts.safe_push (std::make_pair (arg_type, lane_bits));
+	vec_elts.safe_push (std::make_pair (type, lane_bits));
       if (nds_elt_bits > lane_bits)
 	nds_elt_bits = lane_bits;
+      else if (wds_elt_bits < lane_bits)
+	wds_elt_bits = lane_bits;
     }
 
-  clonei->vecsize_mangle = 'n';
+  /* If we could not determine the WDS type from the available parameters or
+     return type, then fall back to using uintptr_t.  */
+  if (wds_elt_bits == 0)
+    wds_elt_bits = POINTER_SIZE;
+
   clonei->mask_mode = VOIDmode;
   poly_uint64 simdlen;
   auto_vec<poly_uint64> simdlens (2);
@@ -27419,6 +27429,11 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
       simdlen = exact_div (poly_uint64 (64), nds_elt_bits);
       simdlens.safe_push (simdlen);
       simdlens.safe_push (simdlen * 2);
+      /* Only create an SVE simd clone if we aren't dealing with an
+	 unprototyped function.  */
+      if (DECL_ARGUMENTS (node->decl) != 0
+	  || type_arg_types != 0)
+	simdlens.safe_push (exact_div (poly_uint64 (128, 128), wds_elt_bits));
     }
   else
     simdlens.safe_push (clonei->simdlen);
@@ -27439,19 +27454,20 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   while (j < count && !simdlens.is_empty ())
     {
       bool remove_simdlen = false;
-      for (auto elt : vec_elts)
-	if (known_gt (simdlens[j] * elt.second, 128U))
-	  {
-	    /* Don't issue a warning for every simdclone when there is no
-	       specific simdlen clause.  */
-	    if (explicit_p && known_ne (clonei->simdlen, 0U))
-	      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-			  "GCC does not currently support simdlen %wd for "
-			  "type %qT",
-			  constant_lower_bound (simdlens[j]), elt.first);
-	    remove_simdlen = true;
-	    break;
-	  }
+      if (simdlens[j].is_constant ())
+	for (auto elt : vec_elts)
+	  if (known_gt (simdlens[j] * elt.second, 128U))
+	    {
+	      /* Don't issue a warning for every simdclone when there is no
+		 specific simdlen clause.  */
+	      if (explicit_p && known_ne (clonei->simdlen, 0U))
+		warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+			    "GCC does not currently support simdlen %wd for "
+			    "type %qT",
+			    constant_lower_bound (simdlens[j]), elt.first);
+	      remove_simdlen = true;
+	      break;
+	    }
       if (remove_simdlen)
 	{
 	  count--;
@@ -27479,6 +27495,13 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 
   gcc_assert (num < count);
   clonei->simdlen = simdlens[num];
+  if (clonei->simdlen.is_constant ())
+    clonei->vecsize_mangle = 'n';
+  else
+    {
+      clonei->vecsize_mangle = 's';
+      clonei->inbranch = 1;
+    }
   return count;
 }
 
@@ -27493,6 +27516,11 @@ aarch64_simd_clone_adjust (struct cgraph_node *node)
   tree t = TREE_TYPE (node->decl);
   TYPE_ATTRIBUTES (t) = make_attribute ("aarch64_vector_pcs", "default",
 					TYPE_ATTRIBUTES (t));
+  if (node->simdclone->vecsize_mangle == 's')
+    {
+      tree target = build_string (strlen ("+sve"), "+sve");
+      aarch64_option_valid_attribute_p (node->decl, NULL_TREE, target, 0);
+    }
 }
 
 /* Implement TARGET_SIMD_CLONE_USABLE.  */
@@ -27517,6 +27545,57 @@ aarch64_simd_clone_usable (struct cgraph_node *node, machine_mode vector_mode)
     }
 }
 
+/* Implement TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM.  */
+
+static tree
+aarch64_simd_clone_adjust_ret_or_param (cgraph_node *node, tree type,
+					bool is_mask)
+{
+  if (type
+      && VECTOR_TYPE_P (type)
+      && node->simdclone->vecsize_mangle == 's')
+    {
+      cl_target_option cur_target;
+      cl_target_option_save (&cur_target, &global_options, &global_options_set);
+      tree new_target = DECL_FUNCTION_SPECIFIC_TARGET (node->decl);
+      cl_target_option_restore (&global_options, &global_options_set,
+				TREE_TARGET_OPTION (new_target));
+      aarch64_override_options_internal (&global_options);
+      bool m_old_have_regs_of_mode[MAX_MACHINE_MODE];
+      memcpy (m_old_have_regs_of_mode, have_regs_of_mode,
+	      sizeof (have_regs_of_mode));
+      for (int i = 0; i < NUM_MACHINE_MODES; ++i)
+	if (aarch64_sve_mode_p ((machine_mode) i))
+	  have_regs_of_mode[i] = true;
+      poly_uint16 old_sve_vg = aarch64_sve_vg;
+      if (!node->simdclone->simdlen.is_constant ())
+	aarch64_sve_vg = poly_uint16 (2, 2);
+      unsigned int num_zr = 0;
+      unsigned int num_pr = 0;
+      type = TREE_TYPE (type);
+      type = build_vector_type (type, node->simdclone->simdlen);
+      if (is_mask)
+	{
+	  type = truth_type_for (type);
+	  num_pr = 1;
+	}
+      else
+	num_zr = 1;
+
+      aarch64_sve::add_sve_type_attribute (type, num_zr, num_pr, NULL, NULL);
+      cl_target_option_restore (&global_options, &global_options_set, &cur_target);
+      aarch64_override_options_internal (&global_options);
+      memcpy (have_regs_of_mode, m_old_have_regs_of_mode,
+	      sizeof (have_regs_of_mode));
+      aarch64_sve_vg = old_sve_vg;
+    }
+  else if (type
+	   && VECTOR_TYPE_P (type)
+	   && is_mask)
+    type = truth_type_for (type);
+  return type;
+}
+
 /* Implement TARGET_COMP_TYPE_ATTRIBUTES */
 
 static int
@@ -28590,6 +28669,10 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_SIMD_CLONE_ADJUST
 #define TARGET_SIMD_CLONE_ADJUST aarch64_simd_clone_adjust
 
+#undef TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM
+#define TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM \
+  aarch64_simd_clone_adjust_ret_or_param
+
 #undef TARGET_SIMD_CLONE_USABLE
 #define TARGET_SIMD_CLONE_USABLE aarch64_simd_clone_usable
 
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index c2fd4d3be878e56b6394e34097d2de826a0ba1ff..091f194f1829fb9f70827d8674fd4dae44282d55 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -541,9 +541,12 @@ simd_clone_mangle (struct cgraph_node *node,
   pp_string (&pp, "_ZGV");
   pp_character (&pp, vecsize_mangle);
   pp_character (&pp, mask);
-  /* For now, simdlen is always constant, while variable simdlen pp 'n'.  */
-  unsigned int len = simdlen.to_constant ();
-  pp_decimal_int (&pp, (len));
+
+  unsigned long long len = 0;
+  if (simdlen.is_constant (&len))
+    pp_decimal_int (&pp, (int) (len));
+  else
+    pp_character (&pp, 'x');
 
   for (n = 0; n < clone_info->nargs; ++n)
     {
@@ -1499,8 +1502,8 @@ simd_clone_adjust (struct cgraph_node *node)
 	 below).  */
       loop = alloc_loop ();
       cfun->has_force_vectorize_loops = true;
-      /* For now, simlen is always constant.  */
-      loop->safelen = node->simdclone->simdlen.to_constant ();
+      /* For a VLA simdlen, the constant lower bound is a valid safelen.  */
+      loop->safelen = constant_lower_bound (node->simdclone->simdlen);
       loop->force_vectorize = true;
       loop->header = body_bb;
     }
diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
index e3668893afe33a58c029cddd433d9bf43cce2bfa..12f8b3b839b7f3ff9e4f99768e59c0e1c5339062 100644
--- a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
+++ b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
@@ -21,7 +21,7 @@ test1 (int x)
      shall call f01 with score 8.  */
   /* { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } } */
   /* { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" { target { !aarch64*-*-* } } } } */
-  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 10 "optimized" { target { aarch64*-*-* } } } } */
+  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 12 "optimized" { target { aarch64*-*-* } } } } */
   /* { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" { target { !aarch64*-*-* } } } } */
   /* { dg-final { scan-tree-dump-times "f01 \\\(x" 0 "optimized" { target { aarch64*-*-* } } } } */
   int a = f04 (x);
diff --git a/gcc/testsuite/gcc.target/aarch64/declare-simd-1.c b/gcc/testsuite/gcc.target/aarch64/declare-simd-1.c
index aab8c17f0c442a7cda4dce23cc18162a0b7f676e..add6e7c93019834fbd5bed5ead18b52d4cdd0a37 100644
--- a/gcc/testsuite/gcc.target/aarch64/declare-simd-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/declare-simd-1.c
@@ -4,28 +4,39 @@
 extern "C" {
 #endif
 #pragma omp declare simd
-int __attribute__ ((const)) f00 (int a , char b) /* { dg-warning {GCC does not currently support a simdclone with simdlens 8 and 16 for these types.} } */
+int __attribute__ ((const)) f00 (int a , char b)
 {
   return a + b;
 }
 
+/* { dg-final { scan-assembler-not {_ZGVn[a-z0-9]+_f00} } } */
+/* { dg-final { scan-assembler {_ZGVsMxvv_f00} } } */
+
 #pragma omp declare simd
-long long __attribute__ ((const)) f01 (int a , short b) /* { dg-warning {GCC does not currently support a simdclone with simdlens 4 and 8 for these types.} } */
+long long __attribute__ ((const)) f01 (int a , short b)
 {
   return a + b;
 }
+/* { dg-final { scan-assembler-not {_ZGVn[a-z0-9]+_f01} } } */
+/* { dg-final { scan-assembler {_ZGVsMxvv_f01} } } */
 
 #pragma omp declare simd linear(b)
-long long __attribute__ ((const)) f02 (short *b, int a) /* { dg-warning {GCC does not currently support a simdclone with simdlens 4 and 8 for these types.} } */
+long long __attribute__ ((const)) f02 (short *b, int a)
 {
   return a + *b;
 }
 
+/* { dg-final { scan-assembler-not {_ZGVn[a-z0-9]+_f02} } } */
+/* { dg-final { scan-assembler {_ZGVsMxl2v_f02} } } */
+
 #pragma omp declare simd uniform(b)
-void f03 (char b, int a) /* { dg-warning {GCC does not currently support a simdclone with simdlens 8 and 16 for these types.} } */
+void f03 (char b, int a)
 {
 }
 
+/* { dg-final { scan-assembler-not {_ZGVn[a-z0-9]+_f03} } } */
+/* { dg-final { scan-assembler {_ZGVsMxuv_f03} } } */
+
 #pragma omp declare simd simdlen(4)
 double f04 (void) /* { dg-warning {GCC does not currently support simdlen 4 for type 'double'} } */
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c b/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c
index abb128ffc9cd2c1353b99eb38aae72377746e6d6..604869a30456e4db988bba86e059a27f19dda589 100644
--- a/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c
@@ -10,6 +10,7 @@ short __attribute__ ((const)) f00 (short a , char b)
 }
 /* { dg-final { scan-assembler {_ZGVnN8vv_f00:} } } */
 /* { dg-final { scan-assembler {_ZGVnM8vv_f00:} } } */
+/* { dg-final { scan-assembler {_ZGVsMxvv_f00:} } } */
 
 #pragma omp declare simd notinbranch
 short __attribute__ ((const)) f01 (int a , short b)
@@ -17,6 +18,7 @@ short __attribute__ ((const)) f01 (int a , short b)
   return a + b;
 }
 /* { dg-final { scan-assembler {_ZGVnN4vv_f01:} } } */
+/* { dg-final { scan-assembler {_ZGVsMxvv_f01:} } } */
 
 #pragma omp declare simd linear(b) inbranch
 int __attribute__ ((const)) f02 (int a, short *b)
@@ -24,6 +26,7 @@ int __attribute__ ((const)) f02 (int a, short *b)
   return a + *b;
 }
 /* { dg-final { scan-assembler {_ZGVnM4vl2_f02:} } } */
+/* { dg-final { scan-assembler {_ZGVsMxvl2_f02:} } } */
 
 #pragma omp declare simd uniform(a) notinbranch
 void f03 (char b, int a)
@@ -31,6 +34,7 @@ void f03 (char b, int a)
 }
 /* { dg-final { scan-assembler {_ZGVnN8vu_f03:} } } */
 /* { dg-final { scan-assembler {_ZGVnN16vu_f03:} } } */
+/* { dg-final { scan-assembler {_ZGVsMxvu_f03:} } } */
 
 #pragma omp declare simd simdlen(2)
 float f04 (double a)
@@ -39,6 +43,7 @@ float f04 (double a)
 }
 /* { dg-final { scan-assembler {_ZGVnN2v_f04:} } } */
 /* { dg-final { scan-assembler {_ZGVnM2v_f04:} } } */
+/* { dg-final { scan-assembler-not {_ZGVs[0-9a-z]*_f04:} } } */
 
 #pragma omp declare simd uniform(a) linear (b)
 void f05 (short a, short *b, short c)
@@ -50,6 +55,7 @@ void f05 (short a, short *b, short c)
 /* { dg-final { scan-assembler {_ZGVnN4ul2v_f05:} } } */
 /* { dg-final { scan-assembler {_ZGVnM8ul2v_f05:} } } */
 /* { dg-final { scan-assembler {_ZGVnM8ul2v_f05:} } } */
+/* { dg-final { scan-assembler {_ZGVsMxul2v_f05:} } } */
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
index 6319df0558f37b95f1b2eb17374bdb4ecbc33295..38677b8f7a76b960ce9363b1c0cabf6fc5086ab6 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
@@ -41,7 +41,7 @@ contains
     ! shall call f01 with score 8.
     ! { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } }
     ! { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" { target { !aarch64*-*-* } } } }
-    ! { dg-final { scan-tree-dump-times "f03 \\\(x" 6 "optimized" { target { aarch64*-*-* } } } }
+    ! { dg-final { scan-tree-dump-times "f03 \\\(x" 8 "optimized" { target { aarch64*-*-* } } } }
     ! { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" { target { !aarch64*-*-* } } } }
     ! { dg-final { scan-tree-dump-times "f01 \\\(x" 0 "optimized" { target { aarch64*-*-* } } } }
     a = f04 (x)

^ permalink raw reply	[flat|nested] 40+ messages in thread
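An aside on the scan-assembler patterns in these tests: the symbols follow the Vector Function ABI name mangling for AArch64, roughly _ZGV + isa + mask + length + parameter kinds + _ + name, where 'n' is Advanced SIMD, 's' is SVE, 'M'/'N' is masked/unmasked, and 'x' is a scalable length. The toy decoder below is an illustrative sketch of that scheme, not part of the patch, and covers only the encodings used in these tests:

```python
import re

# Decode AArch64 Vector Function ABI mangled names such as the
# _ZGVsMxvv_f00 patterns the tests scan for.  Illustrative sketch only:
# it handles just the parameter kinds appearing in these tests
# ('v' vector, 'u' uniform, 'l<n>' linear with step n).
def decode_vector_abi(name):
    m = re.match(r'_ZGV([ns])([MN])(x|\d+)([a-z0-9]*)_(\w+)$', name)
    if not m:
        raise ValueError(f'not a vector ABI mangled name: {name}')
    isa, mask, length, params, base = m.groups()
    return {
        'isa': 'SVE' if isa == 's' else 'AdvSIMD',
        'masked': mask == 'M',
        'simdlen': 'scalable' if length == 'x' else int(length),
        'params': re.findall(r'[vul]\d*', params),
        'function': base,
    }

print(decode_vector_abi('_ZGVsMxvv_f00'))
# {'isa': 'SVE', 'masked': True, 'simdlen': 'scalable', 'params': ['v', 'v'], 'function': 'f00'}
print(decode_vector_abi('_ZGVnM4vl2_f02'))
# {'isa': 'AdvSIMD', 'masked': True, 'simdlen': 4, 'params': ['v', 'l2'], 'function': 'f02'}
```

So _ZGVsMxvv_f00 is the masked, scalable-length SVE clone of f00 taking two vector arguments, which is what the updated dg-final directives expect the series to emit.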

* Re: [PATCH 1/8] parloops: Copy target and optimizations when creating a function clone
  2023-08-30  9:06 ` [PATCH 1/8] parloops: Copy target and optimizations when creating a function clone Andre Vieira (lists)
@ 2023-08-30 12:31   ` Richard Biener
  2023-10-18 14:40     ` Andre Vieira (lists)
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Biener @ 2023-08-30 12:31 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard Sandiford, jakub

On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:

> 
> SVE simd clones need to be compiled with an SVE target enabled or the
> argument types will not be created properly. To achieve this we need to copy
> DECL_FUNCTION_SPECIFIC_TARGET from the original function declaration to the
> clones.  I decided it was probably also a good idea to copy
> DECL_FUNCTION_SPECIFIC_OPTIMIZATION in case the original function is meant to
> be compiled with specific optimization options.

OK.

> gcc/ChangeLog:
> 
> 	* tree-parloops.cc (create_loop_fn): Copy specific target and
> 	optimization options to clone.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* Re: [Patch 2/8] parloops: Allow poly nit and bound
  2023-08-30  9:08 ` [Patch 2/8] parloops: Allow poly nit and bound Andre Vieira (lists)
@ 2023-08-30 12:32   ` Richard Biener
  2023-10-18 14:40     ` Andre Vieira (lists)
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Biener @ 2023-08-30 12:32 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard Sandiford

On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:

> Teach parloops how to handle a poly nit and bound ahead of the changes to
> enable non-constant simdlen.

Can you use poly_int_tree_p to combine INTEGER_CST || POLY_INT_CST please?

OK with that change.

> gcc/ChangeLog:
> 
> 	* tree-parloops.cc (try_to_transform_to_exit_first_loop_alt): Accept
> 	poly NIT and ALT_BOUND.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* Re: [Patch 3/8] vect: Fix vect_get_smallest_scalar_type for simd clones
  2023-08-30  9:10 ` [Patch 3/8] vect: Fix vect_get_smallest_scalar_type for simd clones Andre Vieira (lists)
@ 2023-08-30 12:54   ` Richard Biener
  2023-10-18 14:40     ` Andre Vieira (lists)
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Biener @ 2023-08-30 12:54 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard Sandiford, jakub

On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:

> The vect_get_smallest_scalar_type helper function was using any argument to a
> simd clone call when trying to determine the smallest scalar type that would
> be vectorized.  This included the function pointer type in a MASK_CALL for
> instance, and would result in the wrong type being selected.  Instead this
> patch special cases simd_clone_call's and uses only scalar types of the
> original function that get transformed into vector types.

Looks sensible.

+bool
+simd_clone_call_p (gimple *stmt, cgraph_node **out_node)

you could return the cgraph_node * or NULL here.  Are you going to
use the function elsewhere?  Otherwise put it in the same TU as
the only use please and avoid exporting it.

Richard.

> gcc/ChangeLog:
> 
> 	* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Special
> 	case simd clone calls and only use types that are mapped to vectors.
> 	* tree-vect-stmts.cc (simd_clone_call_p): New helper function.
> 	* tree-vectorizer.h (simd_clone_call_p): Declare new function.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/vect/vect-simd-clone-16f.c: Remove unnecessary differentiation
> 	between targets with different pointer sizes.
> 	* gcc.dg/vect/vect-simd-clone-17f.c: Likewise.
> 	* gcc.dg/vect/vect-simd-clone-18f.c: Likewise.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* Re: [PATCH 4/8] vect: don't allow fully masked loops with non-masked simd clones [PR 110485]
  2023-08-30  9:11 ` [PATCH 4/8] vect: don't allow fully masked loops with non-masked simd clones [PR 110485] Andre Vieira (lists)
@ 2023-08-30 12:54   ` Richard Biener
  2023-10-18 14:40     ` Andre Vieira (lists)
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Biener @ 2023-08-30 12:54 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches

On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:

> When analyzing a loop and choosing a simdclone to use it is possible to choose
> a simdclone that cannot be used 'inbranch' for a loop that can use partial
> vectors.  This may lead to the vectorizer deciding to use partial vectors
> which are not supported for notinbranch simd clones. This patch fixes that by
> disabling the use of partial vectors once a notinbranch simd clone has been
> selected.

OK.

> gcc/ChangeLog:
> 
> 	PR tree-optimization/110485
> 	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Disable partial
> 	vectors usage if a notinbranch simdclone has been selected.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/gomp/pr110485.c: New test.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

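The problem this patch addresses can be illustrated with a toy model (Python rather than GCC internals, and a deliberate simplification): a loop using partial vectors executes its tail lanes under a loop mask, so the clone called inside it must take that mask; a notinbranch clone evaluates every lane unconditionally, including the inactive tail lanes.

```python
VF = 4  # vectorization factor of the toy loop

def f(x):                 # the scalar function being "simd cloned";
    return 100 // x       # faults (ZeroDivisionError) for x == 0

def inbranch_clone(xs, mask):
    # mask-aware clone: inactive lanes are never evaluated
    return [f(x) if m else 0 for x, m in zip(xs, mask)]

def notinbranch_clone(xs):
    # unmasked clone: evaluates every lane unconditionally
    return [f(x) for x in xs]

data = [1, 2, 5, 10, 20, 25]                # 6 elements, not a multiple of VF
padded = data + [0] * (-len(data) % VF)     # tail lanes hold padding
out = []
for i in range(0, len(padded), VF):
    lanes = padded[i:i + VF]
    mask = [i + j < len(data) for j in range(VF)]
    out += inbranch_clone(lanes, mask)      # safe under the loop mask
print(out[:len(data)])                      # [100, 50, 20, 10, 5, 4]
# notinbranch_clone(padded[4:8]) would divide by zero on the padding
# lanes, which is why a fully masked loop must not use a notinbranch
# clone -- the fix disables partial vectors once one is selected.
```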

* Re: [PATCH 6/8] vect: Add vector_mode parameter to simd_clone_usable
  2023-08-30  9:14 ` [PATCH 6/8] vect: Add vector_mode parameter to simd_clone_usable Andre Vieira (lists)
  2023-08-30  9:17   ` Andre Vieira (lists)
@ 2023-08-30 13:01   ` Richard Biener
  2023-08-30 15:02     ` Andre Vieira (lists)
  1 sibling, 1 reply; 40+ messages in thread
From: Richard Biener @ 2023-08-30 13:01 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches

On Wed, Aug 30, 2023 at 11:15 AM Andre Vieira (lists) via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE
> hook to enable rejecting SVE modes when the target architecture does not
> support SVE.

How does the graph node of the SIMD clone lack this information?  That is, it
should have information on the types (and thus modes) for all formal arguments
and return values already, no?  At least the target would know how to
instantiate
it if it's not readily available at the point of use.

> gcc/ChangeLog:
>
>         * config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add mode
>         parameter and use it to reject SVE modes when the target architecture does
>         not support SVE.
>         * config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused mode parameter.
>         * config/i386/i386.cc (ix86_simd_clone_usable): Likewise.
>         * doc/tm.texi (TARGET_SIMD_CLONE_USABLE): Document new parameter.
>         * target.def (usable): Add new parameter.
>         * tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass vector mode
>         to TARGET_SIMD_CLONE_USABLE hook.


* Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM
  2023-08-30  9:17 ` [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM Andre Vieira (lists)
@ 2023-08-30 13:04   ` Richard Biener
  2023-10-04 10:32     ` Andre Vieira (lists)
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Biener @ 2023-08-30 13:04 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard Sandiford, jakub

On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:

> This patch adds a new target hook to enable us to adapt the types of return
> and parameters of simd clones.  We use this in two ways, the first one is to
> make sure we can create valid SVE types, including the SVE type attribute,
> when creating a SVE simd clone, even when the target options do not support
> SVE.  We are following the same behaviour seen with x86 that creates simd
> clones according to the ABI rules when no simdlen is provided, even if that
> simdlen is not supported by the current target options.  Note that this
> doesn't mean the simd clone will be used in auto-vectorization.

You are not documenting the bool parameter of the new hook.

What's wrong with doing the adjustment in TARGET_SIMD_CLONE_ADJUST?

> gcc/ChangeLog:
> 
> 	(TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): Define.
> 	* doc/tm.texi (TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): Document.
> 	* doc/tm.texi.in (TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): New.
> 	* omp-simd-clone.cc (simd_adjust_return_type): Call new hook.
> 	(simd_clone_adjust_argument_types): Likewise.
> 	* target.def (adjust_ret_or_param): New hook.
> 	* targhooks.cc (default_simd_clone_adjust_ret_or_param): New.
> 	* targhooks.h (default_simd_clone_adjust_ret_or_param): New.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* Re: [PATCH 6/8] vect: Add vector_mode parameter to simd_clone_usable
  2023-08-30 13:01   ` Richard Biener
@ 2023-08-30 15:02     ` Andre Vieira (lists)
  2023-08-31  6:39       ` Richard Biener
  0 siblings, 1 reply; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-08-30 15:02 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard Sandiford



On 30/08/2023 14:01, Richard Biener wrote:
> On Wed, Aug 30, 2023 at 11:15 AM Andre Vieira (lists) via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE
>> hook to enable rejecting SVE modes when the target architecture does not
>> support SVE.
> 
> How does the graph node of the SIMD clone lack this information?  That is, it
> should have information on the types (and thus modes) for all formal arguments
> and return values already, no?  At least the target would know how to
> instantiate
> it if it's not readily available at the point of use.
> 

Yes it does, but those are the modes the simd clone itself uses; it does 
not know what vector_mode we are currently vectorizing for. Which is 
exactly why we need the vinfo's vector_mode to make sure the simd clone 
and its types are compatible with the vector mode.

In practice, this makes sure that SVE simd clones are only used in loops 
being vectorized for SVE modes. Having said that... I just realized that 
the simdlen check already takes care of that currently...

by simdlen check I mean the one that writes off simdclones that match:
         if (!constant_multiple_p (vf, n->simdclone->simdlen, &num_calls)

However, when using -msve-vector-bits this will become an issue, as the 
VF will be constant and we will match NEON simdclones.  This requires 
some further attention though given that we now also reject the use of 
SVE simdclones when using -msve-vector-bits, and I'm not entirely sure 
we should...

I'm going on holidays for 2 weeks now though, so I'll have a look at 
that scenario when I get back. Same with other feedback, didn't expect 
feedback this quickly ;) Thank you!!

Kind regards,
Andre


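The interaction described here can be sketched with a toy model of poly_int and constant_multiple_p (a simplification of GCC's poly-int machinery, for illustration only): a scalable value is c0 + c1*x for an unknown runtime scale x >= 0, and one value is a constant multiple of another only when both coefficients scale by the same integer.

```python
class Poly:
    """Toy poly_int: value = c0 + c1*x for unknown runtime scale x >= 0."""
    def __init__(self, c0, c1=0):
        self.c0, self.c1 = c0, c1

def constant_multiple_p(a, b):
    """Return m if a == m * b for a compile-time constant m > 0, else None."""
    base = b.c0 if b.c0 else b.c1
    if not base:
        return None
    m = (a.c0 if b.c0 else a.c1) // base
    if m > 0 and a.c0 == m * b.c0 and a.c1 == m * b.c1:
        return m
    return None

sve_vf     = Poly(4, 4)   # e.g. an SVE VF: 4 + 4x 32-bit lanes
sve_clone  = Poly(4, 4)   # _ZGVsMx... clone: scalable simdlen
neon_clone = Poly(4)      # _ZGVnN4... clone: fixed simdlen 4

print(constant_multiple_p(sve_vf, sve_clone))     # 1    -> SVE clone matches
print(constant_multiple_p(sve_vf, neon_clone))    # None -> Neon clone written off

# With -msve-vector-bits=256 the VF collapses to a plain constant:
fixed_vf = Poly(8)
print(constant_multiple_p(fixed_vf, neon_clone))  # 2    -> Neon clone now matches
print(constant_multiple_p(fixed_vf, sve_clone))   # None -> scalable SVE clone written off
```

This is the scenario raised above: once the VF is a compile-time constant, the check favours fixed-length clones and rejects the still-scalable SVE ones.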

* Re: [PATCH 6/8] vect: Add vector_mode parameter to simd_clone_usable
  2023-08-30 15:02     ` Andre Vieira (lists)
@ 2023-08-31  6:39       ` Richard Biener
  2023-09-28 15:57         ` Andre Vieira (lists)
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Biener @ 2023-08-31  6:39 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard Sandiford

On Wed, Aug 30, 2023 at 5:02 PM Andre Vieira (lists)
<andre.simoesdiasvieira@arm.com> wrote:
>
>
>
> On 30/08/2023 14:01, Richard Biener wrote:
> > On Wed, Aug 30, 2023 at 11:15 AM Andre Vieira (lists) via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE
> >> hook to enable rejecting SVE modes when the target architecture does not
> >> support SVE.
> >
> > How does the graph node of the SIMD clone lack this information?  That is, it
> > should have information on the types (and thus modes) for all formal arguments
> > and return values already, no?  At least the target would know how to
> > instantiate
> > it if it's not readily available at the point of use.
> >
>
> Yes it does, but that's the modes the simd clone itself uses, it does
> not know what vector_mode we are currently vectorizing for. Which is
> exactly why we need the vinfo's vector_mode to make sure the simd clone
> and its types are compatible with the vector mode.
>
> In practice, to make sure that a SVE simd clones are only used in loops
> being vectorized for SVE modes. Having said that... I just realized that
> the simdlen check already takes care of that currently...
>
> by simdlen check I mean the one that writes off simdclones that match:
>          if (!constant_multiple_p (vf, n->simdclone->simdlen, &num_calls)
>
> However, when using -msve-vector-bits this will become an issue, as the
> VF will be constant and we will match NEON simdclones.  This requires
> some further attention though given that we now also reject the use of
> SVE simdclones when using -msve-vector-bits, and I'm not entirely sure
> we should...

Hmm, but vectorizable_simdclone should check for compatible types here
and if they are compatible why should we reject them?  Are -msve-vector-bits
"SVE" modes different from "NEON" modes?  I suppose not, because otherwise
the type compatibility check would say incompatible.

> I'm going on holidays for 2 weeks now though, so I'll have a look at
> that scenario when I get back. Same with other feedback, didn't expect
> feedback this quickly ;) Thank you!!
>
> Kind regards,
> Andre
>


* Re: [PATCH 6/8] vect: Add vector_mode parameter to simd_clone_usable
  2023-08-31  6:39       ` Richard Biener
@ 2023-09-28 15:57         ` Andre Vieira (lists)
  0 siblings, 0 replies; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-09-28 15:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard Sandiford



On 31/08/2023 07:39, Richard Biener wrote:
> On Wed, Aug 30, 2023 at 5:02 PM Andre Vieira (lists)
> <andre.simoesdiasvieira@arm.com> wrote:
>>
>>
>>
>> On 30/08/2023 14:01, Richard Biener wrote:
>>> On Wed, Aug 30, 2023 at 11:15 AM Andre Vieira (lists) via Gcc-patches
>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>
>>>> This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE
>>>> hook to enable rejecting SVE modes when the target architecture does not
>>>> support SVE.
>>>
>>> How does the graph node of the SIMD clone lack this information?  That is, it
>>> should have information on the types (and thus modes) for all formal arguments
>>> and return values already, no?  At least the target would know how to
>>> instantiate
>>> it if it's not readily available at the point of use.
>>>
>>
>> Yes it does, but that's the modes the simd clone itself uses, it does
>> not know what vector_mode we are currently vectorizing for. Which is
>> exactly why we need the vinfo's vector_mode to make sure the simd clone
>> and its types are compatible with the vector mode.
>>
>> In practice, to make sure that a SVE simd clones are only used in loops
>> being vectorized for SVE modes. Having said that... I just realized that
>> the simdlen check already takes care of that currently...
>>
>> by simdlen check I mean the one that writes off simdclones that match:
>>           if (!constant_multiple_p (vf, n->simdclone->simdlen, &num_calls)
>>
>> However, when using -msve-vector-bits this will become an issue, as the
>> VF will be constant and we will match NEON simdclones.  This requires
>> some further attention though given that we now also reject the use of
>> SVE simdclones when using -msve-vector-bits, and I'm not entirely sure
>> we should...
> 
> Hmm, but vectorizable_simdclone should check for compatible types here
> and if they are compatible why should we reject them?  Are -msve-vector-bits
> "SVE" modes different from "NEON" modes?  I suppose not, because otherwise
> the type compatibility check would say incompatible.
> 
Prior to transformation we do all checks on the original scalar values, 
not the vector types. But I do believe you are right in that we don't 
need to pass the vector_mode. The simdlen check should be enough, and if 
the length is the same or a multiple, the rest of the code should be 
able to deal with that and any conversions when dealing with things like 
SVE types that require the attribute.

I'll update the patch series soon and after that I'll look at how this 
reacts to -msve-vector-bits in more detail.

Thanks,
Andre


* Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM
  2023-08-30 13:04   ` Richard Biener
@ 2023-10-04 10:32     ` Andre Vieira (lists)
  2023-10-04 10:41       ` Richard Biener
  0 siblings, 1 reply; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-10-04 10:32 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard Sandiford, jakub



On 30/08/2023 14:04, Richard Biener wrote:
> On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:
> 
>> This patch adds a new target hook to enable us to adapt the types of return
>> and parameters of simd clones.  We use this in two ways, the first one is to
>> make sure we can create valid SVE types, including the SVE type attribute,
>> when creating a SVE simd clone, even when the target options do not support
>> SVE.  We are following the same behaviour seen with x86 that creates simd
>> clones according to the ABI rules when no simdlen is provided, even if that
>> simdlen is not supported by the current target options.  Note that this
>> doesn't mean the simd clone will be used in auto-vectorization.
> 
> You are not documenting the bool parameter of the new hook.
> 
> What's wrong with doing the adjustment in TARGET_SIMD_CLONE_ADJUST?

simd_clone_adjust_argument_types is called after that hook, so by the 
time we call TARGET_SIMD_CLONE_ADJUST the types are still scalar, not 
vector.  The same is true for the return type one.

Also the changes to the types need to be taken into consideration in 
'adjustments' I think.

PS: I hope the subject line survived, my email client is having a bit of 
a wobble this morning... it's what you get for updating software :(


* Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM
  2023-10-04 10:32     ` Andre Vieira (lists)
@ 2023-10-04 10:41       ` Richard Biener
  2023-10-04 12:40         ` Andre Vieira (lists)
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Biener @ 2023-10-04 10:41 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard Sandiford, jakub

On Wed, 4 Oct 2023, Andre Vieira (lists) wrote:

> 
> 
> On 30/08/2023 14:04, Richard Biener wrote:
> > On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:
> > 
> >> This patch adds a new target hook to enable us to adapt the types of return
> >> and parameters of simd clones.  We use this in two ways, the first one is
> >> to
> >> make sure we can create valid SVE types, including the SVE type attribute,
> >> when creating a SVE simd clone, even when the target options do not support
> >> SVE.  We are following the same behaviour seen with x86 that creates simd
> >> clones according to the ABI rules when no simdlen is provided, even if that
> >> simdlen is not supported by the current target options.  Note that this
> >> doesn't mean the simd clone will be used in auto-vectorization.
> > 
> > You are not documenting the bool parameter of the new hook.
> > 
> > What's wrong with doing the adjustment in TARGET_SIMD_CLONE_ADJUST?
> 
> simd_clone_adjust_argument_types is called after that hook, so by the time we
> call TARGET_SIMD_CLONE_ADJUST the types are still in scalar, not vector.  The
> same is true for the return type one.
> 
> Also the changes to the types need to be taken into consideration in
> 'adjustments' I think.

Nothing in the three existing implementations of TARGET_SIMD_CLONE_ADJUST
relies on this ordering I think, how about moving the hook invocation 
after simd_clone_adjust_argument_types?

Richard.

> PS: I hope the subject line survived, my email client is having a bit of a
> wobble this morning... it's what you get for updating software :(


* Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM
  2023-10-04 10:41       ` Richard Biener
@ 2023-10-04 12:40         ` Andre Vieira (lists)
  2023-10-18 14:41           ` [PATCH6/8] omp: Reorder call for TARGET_SIMD_CLONE_ADJUST (was Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM) Andre Vieira (lists)
  0 siblings, 1 reply; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-10-04 12:40 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard Sandiford, jakub



On 04/10/2023 11:41, Richard Biener wrote:
> On Wed, 4 Oct 2023, Andre Vieira (lists) wrote:
> 
>>
>>
>> On 30/08/2023 14:04, Richard Biener wrote:
>>> On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:
>>>
>>>> This patch adds a new target hook to enable us to adapt the types of return
>>>> and parameters of simd clones.  We use this in two ways, the first one is
>>>> to
>>>> make sure we can create valid SVE types, including the SVE type attribute,
>>>> when creating a SVE simd clone, even when the target options do not support
>>>> SVE.  We are following the same behaviour seen with x86 that creates simd
>>>> clones according to the ABI rules when no simdlen is provided, even if that
>>>> simdlen is not supported by the current target options.  Note that this
>>>> doesn't mean the simd clone will be used in auto-vectorization.
>>>
>>> You are not documenting the bool parameter of the new hook.
>>>
>>> What's wrong with doing the adjustment in TARGET_SIMD_CLONE_ADJUST?
>>
>> simd_clone_adjust_argument_types is called after that hook, so by the time we
>> call TARGET_SIMD_CLONE_ADJUST the types are still in scalar, not vector.  The
>> same is true for the return type one.
>>
>> Also the changes to the types need to be taken into consideration in
>> 'adjustments' I think.
> 
> Nothing in the three existing implementations of TARGET_SIMD_CLONE_ADJUST
> relies on this ordering I think, how about moving the hook invocation
> after simd_clone_adjust_argument_types?
> 

But that wouldn't change the 'ipa_param_body_adjustments' for when we 
have a function definition and we need to redo the body.
> Richard.
> 
>> PS: I hope the subject line survived, my email client is having a bit of a
>> wobble this morning... it's what you get for updating software :(


* Re: aarch64, vect, omp: Add SVE support for simd clones [PR 96342]
  2023-08-30  8:49 aarch64, vect, omp: Add SVE support for simd clones [PR 96342] Andre Vieira (lists)
                   ` (7 preceding siblings ...)
  2023-08-30  9:19 ` [PATCH 8/8] aarch64: Add SVE support for simd clones [PR 96342] Andre Vieira (lists)
@ 2023-10-18 14:40 ` Andre Vieira (lists)
  2023-10-18 14:41 ` [PATCH 0/8] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS Andre Vieira (lists)
  9 siblings, 0 replies; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-10-18 14:40 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Richard Biener, jakub

Hi,

I noticed I had missed one of the preparatory patches at the start of 
this series (the first one), which is added now. I also removed 'vect: 
Add vector_mode parameter to simd_clone_usable', since after review we 
no longer deemed it necessary, and replaced the old 'vect: Add 
TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM' with 'omp: Reorder call for 
TARGET_SIMD_CLONE_ADJUST' after comments.

Bootstrapped and regression tested the series on 
aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu.


Andre Vieira (8):

omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS [NEW]
parloops: Copy target and optimizations when creating a function clone 
[Reviewed]
parloops: Allow poly nit and bound [Cond Reviewed, made the requested 
changes]
vect: Fix vect_get_smallest_scalar_type for simd clones [First Reviewed, 
made the requested changes, OK?]
vect: don't allow fully masked loops with non-masked simd clones [PR 
110485] [Reviewed]
vect: Use inbranch simdclones in masked loops [Needs review]
vect: omp: Reorder call for TARGET_SIMD_CLONE_ADJUST [NEW]
aarch64: Add SVE support for simd clones [PR 96342] [Needs review]

PS: apologies for the inconsistent numbering of the emails, things got a 
bit confusing with removing and adding patches to the series.

On 30/08/2023 09:49, Andre Vieira (lists) via Gcc-patches wrote:
> Hi,
> 
> This patch series aims to implement support for SVE simd clones when not 
> specifying a 'simdlen' clause for AArch64. This patch depends on my 
> earlier patch: '[PATCH] aarch64: enable mixed-types for aarch64 
> simdclones'.
> 
> Bootstrapped and regression tested the series on 
> aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu. I also tried building 
> the patches separately, but that was before some further clean-up 
> restructuring, so will do that again prior to pushing.
> 
> Andre Vieira (8):
> 
> parloops: Copy target and optimizations when creating a function clone
> parloops: Allow poly nit and bound
> vect: Fix vect_get_smallest_scalar_type for simd clones
> vect: don't allow fully masked loops with non-masked simd clones [PR 
> 110485]
> vect: Use inbranch simdclones in masked loops
> vect: Add vector_mode parameter to simd_clone_usable
> vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM
> aarch64: Add SVE support for simd clones [PR 96342]


* Re: [PATCH 1/8] parloops: Copy target and optimizations when creating a function clone
  2023-08-30 12:31   ` Richard Biener
@ 2023-10-18 14:40     ` Andre Vieira (lists)
  0 siblings, 0 replies; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-10-18 14:40 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard Sandiford, jakub

[-- Attachment #1: Type: text/plain, Size: 707 bytes --]

Just posting a rebase for completion.

On 30/08/2023 13:31, Richard Biener wrote:
> On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:
> 
>>
>> SVE simd clones need to be compiled with an SVE target enabled or the
>> argument types will not be created properly. To achieve this we need to copy
>> DECL_FUNCTION_SPECIFIC_TARGET from the original function declaration to the
>> clones.  I decided it was probably also a good idea to copy
>> DECL_FUNCTION_SPECIFIC_OPTIMIZATION in case the original function is meant to
>> be compiled with specific optimization options.
> 
> OK.
> 
>> gcc/ChangeLog:
>>
>> 	* tree-parloops.cc (create_loop_fn): Copy specific target and
>> 	optimization options to clone.
>>
> 

[-- Attachment #2: sve_simd_clone_1v2.patch --]
[-- Type: text/plain, Size: 593 bytes --]

diff --git a/gcc/tree-parloops.cc b/gcc/tree-parloops.cc
index e495bbd65270bdf90bae2c4a2b52777522352a77..a35f3d5023b06e5ef96eb4222488fcb34dd7bd45 100644
--- a/gcc/tree-parloops.cc
+++ b/gcc/tree-parloops.cc
@@ -2203,6 +2203,11 @@ create_loop_fn (location_t loc)
   DECL_CONTEXT (t) = decl;
   TREE_USED (t) = 1;
   DECL_ARGUMENTS (decl) = t;
+  DECL_FUNCTION_SPECIFIC_TARGET (decl)
+    = DECL_FUNCTION_SPECIFIC_TARGET (act_cfun->decl);
+  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl)
+    = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (act_cfun->decl);
+
 
   allocate_struct_function (decl, false);
 


* Re: [Patch 2/8] parloops: Allow poly nit and bound
  2023-08-30 12:32   ` Richard Biener
@ 2023-10-18 14:40     ` Andre Vieira (lists)
  0 siblings, 0 replies; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-10-18 14:40 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 505 bytes --]

Posting the changed patch for completion, already reviewed.

On 30/08/2023 13:32, Richard Biener wrote:
> On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:
> 
>> Teach parloops how to handle a poly nit and bound ahead of the changes to
>> enable non-constant simdlen.
> 
> Can you use poly_int_tree_p to combine INTEGER_CST || POLY_INT_CST please?
> 
> OK with that change.
> 
>> gcc/ChangeLog:
>>
>> 	* tree-parloops.cc (try_to_transform_to_exit_first_loop_alt): Accept
>> 	poly NIT and ALT_BOUND.
>>
> 
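
The guard the hunk modifies can be sketched in scalar C (this is an
illustration with plain unsigned arithmetic, not GCC tree internals): the
alternative exclusive bound nit + 1 is only usable when incrementing nit
cannot overflow, i.e. when nit is not the maximum value of its type.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Scalar sketch of try_transform_to_exit_first_loop_alt's overflow check:
   only form the alternative bound nit + 1 when nit != TYPE_MAX_VALUE, so
   the increment cannot wrap.  */
static bool
compute_alt_bound (unsigned int nit, unsigned int *alt_bound)
{
  if (nit == UINT_MAX)
    return false;	/* nit + 1 would overflow; keep the old loop form.  */
  *alt_bound = nit + 1;
  return true;
}
```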

[-- Attachment #2: sve_simd_clones_2v2.patch --]
[-- Type: text/plain, Size: 889 bytes --]

diff --git a/gcc/tree-parloops.cc b/gcc/tree-parloops.cc
index a35f3d5023b06e5ef96eb4222488fcb34dd7bd45..80f3dd6dce281e1eb1d76d38bd09e6638a875142 100644
--- a/gcc/tree-parloops.cc
+++ b/gcc/tree-parloops.cc
@@ -2531,14 +2531,15 @@ try_transform_to_exit_first_loop_alt (class loop *loop,
   tree nit_type = TREE_TYPE (nit);
 
   /* Figure out whether nit + 1 overflows.  */
-  if (TREE_CODE (nit) == INTEGER_CST)
+  if (poly_int_tree_p (nit))
     {
       if (!tree_int_cst_equal (nit, TYPE_MAX_VALUE (nit_type)))
 	{
 	  alt_bound = fold_build2_loc (UNKNOWN_LOCATION, PLUS_EXPR, nit_type,
 				       nit, build_one_cst (nit_type));
 
-	  gcc_assert (TREE_CODE (alt_bound) == INTEGER_CST);
+	  gcc_assert (TREE_CODE (alt_bound) == INTEGER_CST
+		      || TREE_CODE (alt_bound) == POLY_INT_CST);
 	  transform_to_exit_first_loop_alt (loop, reduction_list, alt_bound);
 	  return true;
 	}

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch 3/8] vect: Fix vect_get_smallest_scalar_type for simd clones
  2023-08-30 12:54   ` Richard Biener
@ 2023-10-18 14:40     ` Andre Vieira (lists)
  2023-10-19 12:07       ` Richard Biener
  0 siblings, 1 reply; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-10-18 14:40 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard Sandiford, jakub

[-- Attachment #1: Type: text/plain, Size: 1712 bytes --]

Made it a local function and changed the prototype according to the comments.

Is this OK?

gcc/ChangeLog:

	* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Special
	case simd clone calls and only use types that are mapped to
	vectors.
	(simd_clone_call_p): New helper function.

On 30/08/2023 13:54, Richard Biener wrote:
> On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:
> 
>> The vect_get_smallest_scalar_type helper function was using any argument to a
>> simd clone call when trying to determine the smallest scalar type that would
>> be vectorized.  This included the function pointer type in a MASK_CALL for
>> instance, and would result in the wrong type being selected.  Instead this
>> patch special cases simd_clone_call's and uses only scalar types of the
>> original function that get transformed into vector types.
> 
> Looks sensible.
> 
> +bool
> +simd_clone_call_p (gimple *stmt, cgraph_node **out_node)
> 
> you could return the cgraph_node * or NULL here.  Are you going to
> use the function elsewhere?  Otherwise put it in the same TU as
> the only use please and avoid exporting it.
> 
> Richard.
> 
>> gcc/ChangeLog:
>>
>> 	* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Special
>> 	case simd clone calls and only use types that are mapped to
>> 	vectors.
>> 	* tree-vect-stmts.cc (simd_clone_call_p): New helper function.
>> 	* tree-vectorizer.h (simd_clone_call_p): Declare new function.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.dg/vect/vect-simd-clone-16f.c: Remove unnecessary differentiation
>> 	between targets with different pointer sizes.
>> 	* gcc.dg/vect/vect-simd-clone-17f.c: Likewise.
>> 	* gcc.dg/vect/vect-simd-clone-18f.c: Likewise.
>>
> 
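
The selection loop the patch adds can be illustrated in plain C (the
struct, enum and sizes below are hypothetical stand-ins, not GCC
internals): only arguments classified as vector arguments contribute to
the smallest scalar type, so operands such as the function pointer of an
IFN_MASK_CALL no longer skew the result.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the new special case in vect_get_smallest_scalar_type:
   walk the chosen clone's arguments and keep the smallest scalar size,
   but only for SIMD_CLONE_ARG_TYPE_VECTOR arguments.  */
enum arg_type { ARG_VECTOR, ARG_UNIFORM, ARG_LINEAR };

struct arg { enum arg_type type; size_t scalar_size; };

static size_t
smallest_vectorized_scalar_size (const struct arg *args, unsigned nargs,
				 size_t current)
{
  for (unsigned i = 0; i < nargs; i++)
    if (args[i].type == ARG_VECTOR && args[i].scalar_size < current)
      current = args[i].scalar_size;	/* Non-vector args are ignored.  */
  return current;
}
```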

[-- Attachment #2: sve_simd_clones_3v2.patch --]
[-- Type: text/plain, Size: 4949 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
index 574698d3e133ecb8700e698fa42a6b05dd6b8a18..7cd29e894d0502a59fadfe67db2db383133022d3 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
@@ -7,9 +7,8 @@
 #include "vect-simd-clone-16.c"
 
 /* Ensure the the in-branch simd clones are used on targets that support them.
-   Some targets use pairs of vectors and do twice the calls.  */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } } } */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" { target { { i?86*-*-* x86_64-*-* } && { ! lp64 } } } } } */
+ */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" } } */
 
 /* The LTO test produces two dump files and we scan the wrong one.  */
 /* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
index 8bb6d19301a67a3eebce522daaf7d54d88f708d7..177521dc44531479fca1f1a1a0f2010f30fa3fb5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
@@ -7,9 +7,8 @@
 #include "vect-simd-clone-17.c"
 
 /* Ensure the the in-branch simd clones are used on targets that support them.
-   Some targets use pairs of vectors and do twice the calls.  */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } } } */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" { target { { i?86*-*-* x86_64-*-* } && { ! lp64 } } } } } */
+ */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" } } */
 
 /* The LTO test produces two dump files and we scan the wrong one.  */
 /* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
index d34f23f4db8e9c237558cc22fe66b7e02b9e6c20..4dd51381d73c0c7c8ec812f24e5054df038059c5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
@@ -7,9 +7,8 @@
 #include "vect-simd-clone-18.c"
 
 /* Ensure the the in-branch simd clones are used on targets that support them.
-   Some targets use pairs of vectors and do twice the calls.  */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } } } */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" { target { { i?86*-*-* x86_64-*-* } && { ! lp64 } } } } } */
+ */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" } } */
 
 /* The LTO test produces two dump files and we scan the wrong one.  */
 /* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 40ab568fe355964b878d770010aa9eeaef63eeac..106d46e68910df94e806433e1cd841894a86d062 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -97,6 +97,34 @@ vect_lanes_optab_supported_p (const char *name, convert_optab optab,
   return true;
 }
 
+/* Helper function to identify a simd clone call.  If this is a call to a
+   function with simd clones then return the corresponding cgraph_node,
+   otherwise return NULL.  */
+
+static cgraph_node*
+simd_clone_call_p (gimple *stmt)
+{
+  gcall *call = dyn_cast <gcall *> (stmt);
+  if (!call)
+    return NULL;
+
+  tree fndecl = NULL_TREE;
+  if (gimple_call_internal_p (call, IFN_MASK_CALL))
+    fndecl = TREE_OPERAND (gimple_call_arg (stmt, 0), 0);
+  else
+    fndecl = gimple_call_fndecl (stmt);
+
+  if (fndecl == NULL_TREE)
+    return NULL;
+
+  cgraph_node *node = cgraph_node::get (fndecl);
+  if (node && node->simd_clones != NULL)
+    return node;
+
+  return NULL;
+}
+
+
 
 /* Return the smallest scalar part of STMT_INFO.
    This is used to determine the vectype of the stmt.  We generally set the
@@ -145,6 +173,23 @@ vect_get_smallest_scalar_type (stmt_vec_info stmt_info, tree scalar_type)
 	    scalar_type = rhs_type;
 	}
     }
+  else if (cgraph_node *node = simd_clone_call_p (stmt_info->stmt))
+    {
+      auto clone = node->simd_clones->simdclone;
+      for (unsigned int i = 0; i < clone->nargs; ++i)
+	{
+	  if (clone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
+	    {
+	      tree arg_scalar_type = TREE_TYPE (clone->args[i].vector_type);
+	      rhs = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (arg_scalar_type));
+	      if (rhs < lhs)
+		{
+		  scalar_type = arg_scalar_type;
+		  lhs = rhs;
+		}
+	    }
+	}
+    }
   else if (gcall *call = dyn_cast <gcall *> (stmt_info->stmt))
     {
       unsigned int i = 0;

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/8] vect: don't allow fully masked loops with non-masked simd clones [PR 110485]
  2023-08-30 12:54   ` Richard Biener
@ 2023-10-18 14:40     ` Andre Vieira (lists)
  2023-10-19 12:06       ` Richard Biener
  0 siblings, 1 reply; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-10-18 14:40 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 946 bytes --]

Rebased on top of trunk; minor change to check that loop_vinfo is
non-null, since we now do some SLP vectorization for simd clones.

I assume the previous OK still holds.

On 30/08/2023 13:54, Richard Biener wrote:
> On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:
> 
>> When analyzing a loop and choosing a simdclone to use, it is possible to choose
>> a simdclone that cannot be used 'inbranch' for a loop that can use partial
>> vectors.  This may lead to the vectorizer deciding to use partial vectors
>> which are not supported for notinbranch simd clones. This patch fixes that by
>> disabling the use of partial vectors once a notinbranch simd clone has been
>> selected.
> 
> OK.
> 
>> gcc/ChangeLog:
>>
>> 	PR tree-optimization/110485
>> 	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Disable partial
>> 	vectors usage if a notinbranch simdclone has been selected.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.dg/gomp/pr110485.c: New test.
>>
> 
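
At source level the situation mirrors the new pr110485 test: bar has only
a notinbranch clone, which takes no mask argument, so every vector lane
of a call must be active and a partially-masked tail iteration cannot be
expressed.  In the sketch below bar is given a trivial multiply body for
illustration (the test case declares it extern), so this is an assumption,
not the PR's actual function.

```c
#include <assert.h>

/* bar only has a notinbranch simd clone; with partial-vector usage the
   loop tail would need masked lanes that the clone cannot honour, so the
   vectorizer must now disable partial vectors when it selects bar.  */
#pragma omp declare simd notinbranch uniform(p)
double __attribute__ ((const))
bar (double a, double p)
{
  return a * p;
}

void
foo (double *a, const double *b, int n)
{
  #pragma omp simd
  for (int i = 0; i < n; ++i)
    a[i] = bar (b[i], 2.0);
}
```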

[-- Attachment #2: sve_simd_clones_4v2.patch --]
[-- Type: text/plain, Size: 1916 bytes --]

diff --git a/gcc/testsuite/gcc.dg/gomp/pr110485.c b/gcc/testsuite/gcc.dg/gomp/pr110485.c
new file mode 100644
index 0000000000000000000000000000000000000000..ba6817a127f40246071e32ccebf692cc4d121d15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/pr110485.c
@@ -0,0 +1,19 @@
+/* PR 110485 */
+/* { dg-do compile } */
+/* { dg-additional-options "-Ofast -fdump-tree-vect-details" } */
+/* { dg-additional-options "-march=znver4 --param=vect-partial-vector-usage=1" { target x86_64-*-* } } */
+#pragma omp declare simd notinbranch uniform(p)
+extern double __attribute__ ((const)) bar (double a, double p);
+
+double a[1024];
+double b[1024];
+
+void foo (int n)
+{
+  #pragma omp simd
+  for (int i = 0; i < n; ++i)
+    a[i] = bar (b[i], 71.2);
+}
+
+/* { dg-final { scan-tree-dump-not "MASK_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump "can't use a fully-masked loop because a non-masked simd clone was selected." "vect" { target x86_64-*-* } } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a9156975d64c7a335ffd27614e87f9d11b23d1ba..731acc76350cae39c899a866584068cff247183a 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4539,6 +4539,17 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		       ? boolean_true_node : boolean_false_node;
 	    simd_clone_info.safe_push (sll);
 	  }
+
+      if (!bestn->simdclone->inbranch && loop_vinfo)
+	{
+	  if (dump_enabled_p ()
+	      && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "can't use a fully-masked loop because a"
+			     " non-masked simd clone was selected.\n");
+	  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
+	}
+
       STMT_VINFO_TYPE (stmt_info) = call_simd_clone_vec_info_type;
       DUMP_VECT_SCOPE ("vectorizable_simd_clone_call");
 /*      vect_model_simple_cost (vinfo, stmt_info, ncopies,

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 5/8] vect: Use inbranch simdclones in masked loops
  2023-08-30  9:13 ` [PATCH 5/8] vect: Use inbranch simdclones in masked loops Andre Vieira (lists)
@ 2023-10-18 14:41   ` Andre Vieira (lists)
  2023-10-19 12:17     ` Richard Biener
  0 siblings, 1 reply; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-10-18 14:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Biener, Richard Sandiford, jakub

[-- Attachment #1: Type: text/plain, Size: 611 bytes --]

Rebased, needs review.

On 30/08/2023 10:13, Andre Vieira (lists) via Gcc-patches wrote:
> This patch enables the compiler to use inbranch simdclones when 
> generating masked loops in autovectorization.
> 
> gcc/ChangeLog:
> 
>      * omp-simd-clone.cc (simd_clone_adjust_argument_types): Make function
>      compatible with mask parameters in clone.
>      * tree-vect-stmts.cc (vect_convert): New helper function.
>      (vect_build_all_ones_mask): Allow vector boolean typed masks.
>      (vectorizable_simd_clone_call): Enable the use of masked clones in
>      fully masked loops.
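
A user-level illustration of what this enables (function names here are
hypothetical): a call under a condition can be vectorized with an
inbranch clone, where the branch condition becomes the clone's mask
argument and, in a fully-masked loop, is additionally combined with the
loop mask via prepare_vec_mask.

```c
#include <assert.h>

/* The condition a[i] > 0 is turned into the mask argument of the
   inbranch clone of inc_if_pos; lanes where the mask is false keep
   their original value.  */
#pragma omp declare simd inbranch
int
inc_if_pos (int x)
{
  return x + 1;
}

void
apply (int *a, int n)
{
  #pragma omp simd
  for (int i = 0; i < n; ++i)
    if (a[i] > 0)
      a[i] = inc_if_pos (a[i]);
}
```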

[-- Attachment #2: sve_simd_clones_5v2.patch --]
[-- Type: text/plain, Size: 13395 bytes --]

diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index a42643400ddcf10961633448b49d4caafb999f12..ef0b9b48c7212900023bc0eaebca5e1f9389db77 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -807,8 +807,14 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
     {
       ipa_adjusted_param adj;
       memset (&adj, 0, sizeof (adj));
-      tree parm = args[i];
-      tree parm_type = node->definition ? TREE_TYPE (parm) : parm;
+      tree parm = NULL_TREE;
+      tree parm_type = NULL_TREE;
+      if(i < args.length())
+	{
+	  parm = args[i];
+	  parm_type = node->definition ? TREE_TYPE (parm) : parm;
+	}
+
       adj.base_index = i;
       adj.prev_clone_index = i;
 
@@ -1547,7 +1553,7 @@ simd_clone_adjust (struct cgraph_node *node)
 	  mask = gimple_assign_lhs (g);
 	  g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),
 				   BIT_AND_EXPR, mask,
-				   build_int_cst (TREE_TYPE (mask), 1));
+				   build_one_cst (TREE_TYPE (mask)));
 	  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
 	  mask = gimple_assign_lhs (g);
 	}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 731acc76350cae39c899a866584068cff247183a..6e2c70c1d3970af652c1e50e41b144162884bf24 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1594,6 +1594,20 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype,
     }
 }
 
+/* Return SSA name of the result of the conversion of OPERAND into type TYPE.
+   The conversion statement is inserted at GSI.  */
+
+static tree
+vect_convert (vec_info *vinfo, stmt_vec_info stmt_info, tree type, tree operand,
+	      gimple_stmt_iterator *gsi)
+{
+  operand = build1 (VIEW_CONVERT_EXPR, type, operand);
+  gassign *new_stmt = gimple_build_assign (make_ssa_name (type),
+					   operand);
+  vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+  return gimple_get_lhs (new_stmt);
+}
+
 /* Return the mask input to a masked load or store.  VEC_MASK is the vectorized
    form of the scalar mask condition and LOOP_MASK, if nonnull, is the mask
    that needs to be applied to all loads and stores in a vectorized loop.
@@ -2547,7 +2561,8 @@ vect_build_all_ones_mask (vec_info *vinfo,
 {
   if (TREE_CODE (masktype) == INTEGER_TYPE)
     return build_int_cst (masktype, -1);
-  else if (TREE_CODE (TREE_TYPE (masktype)) == INTEGER_TYPE)
+  else if (VECTOR_BOOLEAN_TYPE_P (masktype)
+	   || TREE_CODE (TREE_TYPE (masktype)) == INTEGER_TYPE)
     {
       tree mask = build_int_cst (TREE_TYPE (masktype), -1);
       mask = build_vector_from_val (masktype, mask);
@@ -4156,7 +4171,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
   size_t i, nargs;
   tree lhs, rtype, ratype;
   vec<constructor_elt, va_gc> *ret_ctor_elts = NULL;
-  int arg_offset = 0;
+  int masked_call_offset = 0;
 
   /* Is STMT a vectorizable call?   */
   gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt);
@@ -4171,7 +4186,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       gcc_checking_assert (TREE_CODE (fndecl) == ADDR_EXPR);
       fndecl = TREE_OPERAND (fndecl, 0);
       gcc_checking_assert (TREE_CODE (fndecl) == FUNCTION_DECL);
-      arg_offset = 1;
+      masked_call_offset = 1;
     }
   if (fndecl == NULL_TREE)
     return false;
@@ -4199,7 +4214,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
     return false;
 
   /* Process function arguments.  */
-  nargs = gimple_call_num_args (stmt) - arg_offset;
+  nargs = gimple_call_num_args (stmt) - masked_call_offset;
 
   /* Bail out if the function has zero arguments.  */
   if (nargs == 0)
@@ -4221,7 +4236,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       thisarginfo.op = NULL_TREE;
       thisarginfo.simd_lane_linear = false;
 
-      int op_no = i + arg_offset;
+      int op_no = i + masked_call_offset;
       if (slp_node)
 	op_no = vect_slp_child_index_for_operand (stmt, op_no);
       if (!vect_is_simple_use (vinfo, stmt_info, slp_node,
@@ -4303,16 +4318,6 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       arginfo.quick_push (thisarginfo);
     }
 
-  if (loop_vinfo
-      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			 "not considering SIMD clones; not yet supported"
-			 " for variable-width vectors.\n");
-      return false;
-    }
-
   poly_uint64 vf = loop_vinfo ? LOOP_VINFO_VECT_FACTOR (loop_vinfo) : 1;
   unsigned group_size = slp_node ? SLP_TREE_LANES (slp_node) : 1;
   unsigned int badness = 0;
@@ -4325,9 +4330,10 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       {
 	unsigned int this_badness = 0;
 	unsigned int num_calls;
-	if (!constant_multiple_p (vf * group_size,
-				  n->simdclone->simdlen, &num_calls)
-	    || n->simdclone->nargs != nargs)
+	if (!constant_multiple_p (vf * group_size, n->simdclone->simdlen,
+				  &num_calls)
+	    || (!n->simdclone->inbranch && (masked_call_offset > 0))
+	    || nargs != n->simdclone->nargs)
 	  continue;
 	if (num_calls != 1)
 	  this_badness += exact_log2 (num_calls) * 4096;
@@ -4344,7 +4350,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	      case SIMD_CLONE_ARG_TYPE_VECTOR:
 		if (!useless_type_conversion_p
 			(n->simdclone->args[i].orig_type,
-			 TREE_TYPE (gimple_call_arg (stmt, i + arg_offset))))
+			 TREE_TYPE (gimple_call_arg (stmt,
+						     i + masked_call_offset))))
 		  i = -1;
 		else if (arginfo[i].dt == vect_constant_def
 			 || arginfo[i].dt == vect_external_def
@@ -4392,6 +4399,17 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	  }
 	if (i == (size_t) -1)
 	  continue;
+	if (masked_call_offset == 0
+	    && n->simdclone->inbranch
+	    && n->simdclone->nargs > nargs)
+	  {
+	    gcc_assert (n->simdclone->args[n->simdclone->nargs - 1].arg_type ==
+			SIMD_CLONE_ARG_TYPE_MASK);
+	    /* Penalize using a masked SIMD clone in a non-masked loop, that is
+	       not in a branch, as we'd have to construct an all-true mask.  */
+	    if (!loop_vinfo || !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+	      this_badness += 64;
+	  }
 	if (bestn == NULL || this_badness < badness)
 	  {
 	    bestn = n;
@@ -4414,7 +4432,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	   || arginfo[i].dt == vect_external_def)
 	  && bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
 	{
-	  tree arg_type = TREE_TYPE (gimple_call_arg (stmt, i + arg_offset));
+	  tree arg_type = TREE_TYPE (gimple_call_arg (stmt,
+						      i + masked_call_offset));
 	  arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
 							    slp_node);
 	  if (arginfo[i].vectype == NULL
@@ -4523,22 +4542,37 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       if (gimple_vuse (stmt) && slp_node)
 	vinfo->any_known_not_updated_vssa = true;
       simd_clone_info.safe_push (bestn->decl);
-      for (i = 0; i < nargs; i++)
-	if ((bestn->simdclone->args[i].arg_type
-	     == SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP)
-	    || (bestn->simdclone->args[i].arg_type
-		== SIMD_CLONE_ARG_TYPE_LINEAR_REF_CONSTANT_STEP))
-	  {
-	    simd_clone_info.safe_grow_cleared (i * 3 + 1, true);
-	    simd_clone_info.safe_push (arginfo[i].op);
-	    tree lst = POINTER_TYPE_P (TREE_TYPE (arginfo[i].op))
-		       ? size_type_node : TREE_TYPE (arginfo[i].op);
-	    tree ls = build_int_cst (lst, arginfo[i].linear_step);
-	    simd_clone_info.safe_push (ls);
-	    tree sll = arginfo[i].simd_lane_linear
-		       ? boolean_true_node : boolean_false_node;
-	    simd_clone_info.safe_push (sll);
-	  }
+      for (i = 0; i < bestn->simdclone->nargs; i++)
+	{
+	  switch (bestn->simdclone->args[i].arg_type)
+	    {
+	    default:
+	      continue;
+	    case SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP:
+	    case SIMD_CLONE_ARG_TYPE_LINEAR_REF_CONSTANT_STEP:
+	      {
+		auto &clone_info = STMT_VINFO_SIMD_CLONE_INFO (stmt_info);
+		clone_info.safe_grow_cleared (i * 3 + 1, true);
+		clone_info.safe_push (arginfo[i].op);
+		tree lst = POINTER_TYPE_P (TREE_TYPE (arginfo[i].op))
+			   ? size_type_node : TREE_TYPE (arginfo[i].op);
+		tree ls = build_int_cst (lst, arginfo[i].linear_step);
+		clone_info.safe_push (ls);
+		tree sll = arginfo[i].simd_lane_linear
+			   ? boolean_true_node : boolean_false_node;
+		clone_info.safe_push (sll);
+	      }
+	      break;
+	    case SIMD_CLONE_ARG_TYPE_MASK:
+	      if (loop_vinfo
+		  && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+		vect_record_loop_mask (loop_vinfo,
+				       &LOOP_VINFO_MASKS (loop_vinfo),
+				       ncopies, vectype, op);
+
+	      break;
+	    }
+	}
 
       if (!bestn->simdclone->inbranch && loop_vinfo)
 	{
@@ -4590,6 +4624,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
     vec_oprnds.safe_grow_cleared (nargs, true);
   for (j = 0; j < ncopies; ++j)
     {
+      poly_uint64 callee_nelements;
+      poly_uint64 caller_nelements;
       /* Build argument list for the vectorized call.  */
       if (j == 0)
 	vargs.create (nargs);
@@ -4600,8 +4636,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	{
 	  unsigned int k, l, m, o;
 	  tree atype;
-	  poly_uint64 callee_nelements, caller_nelements;
-	  op = gimple_call_arg (stmt, i + arg_offset);
+	  op = gimple_call_arg (stmt, i + masked_call_offset);
 	  switch (bestn->simdclone->args[i].arg_type)
 	    {
 	    case SIMD_CLONE_ARG_TYPE_VECTOR:
@@ -4680,16 +4715,9 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		      if (k == 1)
 			if (!useless_type_conversion_p (TREE_TYPE (vec_oprnd0),
 						       atype))
-			  {
-			    vec_oprnd0
-			      = build1 (VIEW_CONVERT_EXPR, atype, vec_oprnd0);
-			    gassign *new_stmt
-			      = gimple_build_assign (make_ssa_name (atype),
-						     vec_oprnd0);
-			    vect_finish_stmt_generation (vinfo, stmt_info,
-							 new_stmt, gsi);
-			    vargs.safe_push (gimple_assign_lhs (new_stmt));
-			  }
+			  vargs.safe_push (vect_convert (vinfo, stmt_info,
+							 atype, vec_oprnd0,
+							 gsi));
 			else
 			  vargs.safe_push (vec_oprnd0);
 		      else
@@ -4738,6 +4766,24 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 			      vec_oprnds_i[i] = 0;
 			    }
 			  vec_oprnd0 = vec_oprnds[i][vec_oprnds_i[i]++];
+			  if (loop_vinfo
+			      && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+			    {
+			      vec_loop_masks *loop_masks
+				= &LOOP_VINFO_MASKS (loop_vinfo);
+			      tree loop_mask
+				= vect_get_loop_mask (loop_vinfo, gsi,
+						      loop_masks, ncopies,
+						      vectype, j);
+			      vec_oprnd0
+				= prepare_vec_mask (loop_vinfo,
+						    TREE_TYPE (loop_mask),
+						    loop_mask, vec_oprnd0,
+						    gsi);
+			      loop_vinfo->vec_cond_masked_set.add ({ vec_oprnd0,
+								     loop_mask });
+
+			    }
 			  vec_oprnd0
 			    = build3 (VEC_COND_EXPR, atype, vec_oprnd0,
 				      build_vector_from_val (atype, one),
@@ -4901,6 +4947,64 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	    }
 	}
 
+      if (masked_call_offset == 0
+	  && bestn->simdclone->inbranch
+	  && bestn->simdclone->nargs > nargs)
+	{
+	  unsigned long m, o;
+	  size_t mask_i = bestn->simdclone->nargs - 1;
+	  tree mask;
+	  gcc_assert (bestn->simdclone->args[mask_i].arg_type ==
+		      SIMD_CLONE_ARG_TYPE_MASK);
+
+	  tree masktype = bestn->simdclone->args[mask_i].vector_type;
+	  callee_nelements = TYPE_VECTOR_SUBPARTS (masktype);
+	  o = vector_unroll_factor (nunits, callee_nelements);
+	  for (m = j * o; m < (j + 1) * o; m++)
+	    {
+	      if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+		{
+		  vec_loop_masks *loop_masks = &LOOP_VINFO_MASKS (loop_vinfo);
+		  mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
+					     ncopies, vectype, j);
+		}
+	      else
+		mask = vect_build_all_ones_mask (vinfo, stmt_info, masktype);
+
+	      if (!useless_type_conversion_p (TREE_TYPE (mask), masktype))
+		{
+		  gassign *new_stmt;
+		  if (bestn->simdclone->mask_mode != VOIDmode)
+		    {
+		      /* This means we are dealing with integer mask modes.
+			 First convert to an integer type with the same size as
+			 the current vector type.  */
+		      unsigned HOST_WIDE_INT intermediate_size
+			= tree_to_uhwi (TYPE_SIZE (TREE_TYPE (mask)));
+		      tree mid_int_type =
+			build_nonstandard_integer_type (intermediate_size, 1);
+		      mask = build1 (VIEW_CONVERT_EXPR, mid_int_type, mask);
+		      new_stmt
+			= gimple_build_assign (make_ssa_name (mid_int_type),
+					       mask);
+		      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+		      /* Then zero-extend to the mask mode.  */
+		      mask = fold_build1 (NOP_EXPR, masktype,
+					  gimple_get_lhs (new_stmt));
+		    }
+		  else
+		    mask = build1 (VIEW_CONVERT_EXPR, masktype, mask);
+
+		  new_stmt = gimple_build_assign (make_ssa_name (masktype),
+						  mask);
+		  vect_finish_stmt_generation (vinfo, stmt_info,
+					       new_stmt, gsi);
+		  mask = gimple_assign_lhs (new_stmt);
+		}
+	      vargs.safe_push (mask);
+	    }
+	}
+
       gcall *new_call = gimple_build_call_vec (fndecl, vargs);
       if (vec_dest)
 	{

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH6/8] omp: Reorder call for TARGET_SIMD_CLONE_ADJUST (was Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM)
  2023-10-04 12:40         ` Andre Vieira (lists)
@ 2023-10-18 14:41           ` Andre Vieira (lists)
  2023-10-30 18:34             ` Andre Vieira (lists)
  2023-10-31  7:59             ` Richard Biener
  0 siblings, 2 replies; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-10-18 14:41 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard Sandiford, jakub

[-- Attachment #1: Type: text/plain, Size: 2896 bytes --]

This patch moves the call to TARGET_SIMD_CLONE_ADJUST until after the 
arguments and return types have been transformed into vector types.  It 
also constructs the adjustments and retval modifications after this 
call, allowing targets to alter the types of the arguments and return 
type of the clone prior to the modifications to the function definition.

Is this OK?

gcc/ChangeLog:

         * omp-simd-clone.cc (simd_clone_adjust_return_type): Hoist out
         code to create return array and don't return new type.
         (simd_clone_adjust_argument_types): Hoist out code that creates
         ipa_param_body_adjustments and don't return them.
         (simd_clone_adjust): Call TARGET_SIMD_CLONE_ADJUST after return
         and argument types have been vectorized, create adjustments and
         return array after the hook.
         (expand_simd_clones): Call TARGET_SIMD_CLONE_ADJUST after return
         and argument types have been vectorized.

On 04/10/2023 13:40, Andre Vieira (lists) wrote:
> 
> 
> On 04/10/2023 11:41, Richard Biener wrote:
>> On Wed, 4 Oct 2023, Andre Vieira (lists) wrote:
>>
>>>
>>>
>>> On 30/08/2023 14:04, Richard Biener wrote:
>>>> On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:
>>>>
>>>>> This patch adds a new target hook to enable us to adapt the types 
>>>>> of return
>>>>> and parameters of simd clones.  We use this in two ways, the first 
>>>>> one is
>>>>> to
>>>>> make sure we can create valid SVE types, including the SVE type 
>>>>> attribute,
>>>>> when creating a SVE simd clone, even when the target options do not 
>>>>> support
>>>>> SVE.  We are following the same behaviour seen with x86 that 
>>>>> creates simd
>>>>> clones according to the ABI rules when no simdlen is provided, even 
>>>>> if that
>>>>> simdlen is not supported by the current target options.  Note that 
>>>>> this
>>>>> doesn't mean the simd clone will be used in auto-vectorization.
>>>>
>>>> You are not documenting the bool parameter of the new hook.
>>>>
>>>> What's wrong with doing the adjustment in TARGET_SIMD_CLONE_ADJUST?
>>>
>>> simd_clone_adjust_argument_types is called after that hook, so by the 
>>> time we
>>> call TARGET_SIMD_CLONE_ADJUST the types are still in scalar, not 
>>> vector.  The
>>> same is true for the return type one.
>>>
>>> Also the changes to the types need to be taken into consideration in
>>> 'adjustments' I think.
>>
>> Nothing in the three existing implementations of TARGET_SIMD_CLONE_ADJUST
>> relies on this ordering I think, how about moving the hook invocation
>> after simd_clone_adjust_argument_types?
>>
> 
> But that wouldn't change the 'ipa_param_body_adjustments' for when we 
> have a function definition and we need to redo the body.
>> Richard.
>>
>>> PS: I hope the subject line survived, my email client is having a bit 
>>> of a
>>> wobble this morning... it's what you get for updating software :(

[-- Attachment #2: sve_simd_clones_6v2.patch --]
[-- Type: text/plain, Size: 14008 bytes --]

diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index ef0b9b48c7212900023bc0eaebca5e1f9389db77..fb80888190c88e29895ecfbbe1b17d390c9a9dfe 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -701,10 +701,9 @@ simd_clone_create (struct cgraph_node *old_node, bool force_local)
 }
 
 /* Adjust the return type of the given function to its appropriate
-   vector counterpart.  Returns a simd array to be used throughout the
-   function as a return value.  */
+   vector counterpart.  */
 
-static tree
+static void
 simd_clone_adjust_return_type (struct cgraph_node *node)
 {
   tree fndecl = node->decl;
@@ -714,7 +713,7 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
 
   /* Adjust the function return type.  */
   if (orig_rettype == void_type_node)
-    return NULL_TREE;
+    return;
   t = TREE_TYPE (TREE_TYPE (fndecl));
   if (INTEGRAL_TYPE_P (t) || POINTER_TYPE_P (t))
     veclen = node->simdclone->vecsize_int;
@@ -737,24 +736,6 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
 						veclen));
     }
   TREE_TYPE (TREE_TYPE (fndecl)) = t;
-  if (!node->definition)
-    return NULL_TREE;
-
-  t = DECL_RESULT (fndecl);
-  /* Adjust the DECL_RESULT.  */
-  gcc_assert (TREE_TYPE (t) != void_type_node);
-  TREE_TYPE (t) = TREE_TYPE (TREE_TYPE (fndecl));
-  relayout_decl (t);
-
-  tree atype = build_array_type_nelts (orig_rettype,
-				       node->simdclone->simdlen);
-  if (maybe_ne (veclen, node->simdclone->simdlen))
-    return build1 (VIEW_CONVERT_EXPR, atype, t);
-
-  /* Set up a SIMD array to use as the return value.  */
-  tree retval = create_tmp_var_raw (atype, "retval");
-  gimple_add_tmp_var (retval);
-  return retval;
 }
 
 /* Each vector argument has a corresponding array to be used locally
@@ -788,7 +769,7 @@ create_tmp_simd_array (const char *prefix, tree type, poly_uint64 simdlen)
    declarations will be remapped.  New arguments which are not to be remapped
    are marked with USER_FLAG.  */
 
-static ipa_param_body_adjustments *
+static void
 simd_clone_adjust_argument_types (struct cgraph_node *node)
 {
   auto_vec<tree> args;
@@ -798,15 +779,11 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
   else
     simd_clone_vector_of_formal_parm_types (&args, node->decl);
   struct cgraph_simd_clone *sc = node->simdclone;
-  vec<ipa_adjusted_param, va_gc> *new_params = NULL;
-  vec_safe_reserve (new_params, sc->nargs);
-  unsigned i, j, k;
+  unsigned i, k;
   poly_uint64 veclen;
 
   for (i = 0; i < sc->nargs; ++i)
     {
-      ipa_adjusted_param adj;
-      memset (&adj, 0, sizeof (adj));
       tree parm = NULL_TREE;
       tree parm_type = NULL_TREE;
       if(i < args.length())
@@ -815,17 +792,12 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	  parm_type = node->definition ? TREE_TYPE (parm) : parm;
 	}
 
-      adj.base_index = i;
-      adj.prev_clone_index = i;
-
       sc->args[i].orig_arg = node->definition ? parm : NULL_TREE;
       sc->args[i].orig_type = parm_type;
 
       switch (sc->args[i].arg_type)
 	{
 	default:
-	  /* No adjustment necessary for scalar arguments.  */
-	  adj.op = IPA_PARAM_OP_COPY;
 	  break;
 	case SIMD_CLONE_ARG_TYPE_LINEAR_UVAL_CONSTANT_STEP:
 	case SIMD_CLONE_ARG_TYPE_LINEAR_UVAL_VARIABLE_STEP:
@@ -834,7 +806,6 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	      = create_tmp_simd_array (IDENTIFIER_POINTER (DECL_NAME (parm)),
 				       TREE_TYPE (parm_type),
 				       sc->simdlen);
-	  adj.op = IPA_PARAM_OP_COPY;
 	  break;
 	case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP:
 	case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_VARIABLE_STEP:
@@ -851,28 +822,12 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 			   GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type)));
 	  if (multiple_p (veclen, sc->simdlen))
 	    veclen = sc->simdlen;
-	  adj.op = IPA_PARAM_OP_NEW;
-	  adj.param_prefix_index = IPA_PARAM_PREFIX_SIMD;
+	  tree vtype;
 	  if (POINTER_TYPE_P (parm_type))
-	    adj.type = build_vector_type (pointer_sized_int_node, veclen);
+	    vtype = build_vector_type (pointer_sized_int_node, veclen);
 	  else
-	    adj.type = build_vector_type (parm_type, veclen);
-	  sc->args[i].vector_type = adj.type;
-	  k = vector_unroll_factor (sc->simdlen, veclen);
-	  for (j = 1; j < k; j++)
-	    {
-	      vec_safe_push (new_params, adj);
-	      if (j == 1)
-		{
-		  memset (&adj, 0, sizeof (adj));
-		  adj.op = IPA_PARAM_OP_NEW;
-		  adj.user_flag = 1;
-		  adj.param_prefix_index = IPA_PARAM_PREFIX_SIMD;
-		  adj.base_index = i;
-		  adj.prev_clone_index = i;
-		  adj.type = sc->args[i].vector_type;
-		}
-	    }
+	    vtype = build_vector_type (parm_type, veclen);
+	  sc->args[i].vector_type = vtype;
 
 	  if (node->definition)
 	    sc->args[i].simd_array
@@ -880,20 +835,12 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 				       ? IDENTIFIER_POINTER (DECL_NAME (parm))
 				       : NULL, parm_type, sc->simdlen);
 	}
-      vec_safe_push (new_params, adj);
     }
 
   if (sc->inbranch)
     {
       tree base_type = simd_clone_compute_base_data_type (sc->origin, sc);
-      ipa_adjusted_param adj;
-      memset (&adj, 0, sizeof (adj));
-      adj.op = IPA_PARAM_OP_NEW;
-      adj.user_flag = 1;
-      adj.param_prefix_index = IPA_PARAM_PREFIX_MASK;
-
-      adj.base_index = i;
-      adj.prev_clone_index = i;
+      tree mask_type;
       if (INTEGRAL_TYPE_P (base_type) || POINTER_TYPE_P (base_type))
 	veclen = sc->vecsize_int;
       else
@@ -906,17 +853,14 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
       if (multiple_p (veclen, sc->simdlen))
 	veclen = sc->simdlen;
       if (sc->mask_mode != VOIDmode)
-	adj.type
+	mask_type
 	  = lang_hooks.types.type_for_mode (sc->mask_mode, 1);
       else if (POINTER_TYPE_P (base_type))
-	adj.type = build_vector_type (pointer_sized_int_node, veclen);
+	mask_type = build_vector_type (pointer_sized_int_node, veclen);
       else
-	adj.type = build_vector_type (base_type, veclen);
-      vec_safe_push (new_params, adj);
+	mask_type = build_vector_type (base_type, veclen);
 
       k = vector_unroll_factor (sc->simdlen, veclen);
-      for (j = 1; j < k; j++)
-	vec_safe_push (new_params, adj);
 
       /* We have previously allocated one extra entry for the mask.  Use
 	 it and fill it.  */
@@ -932,24 +876,16 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	      = create_tmp_simd_array ("mask", base_type, sc->simdlen);
 	  else if (k > 1)
 	    sc->args[i].simd_array
-	      = create_tmp_simd_array ("mask", adj.type, k);
+	      = create_tmp_simd_array ("mask", mask_type, k);
 	  else
 	    sc->args[i].simd_array = NULL_TREE;
 	}
       sc->args[i].orig_type = base_type;
       sc->args[i].arg_type = SIMD_CLONE_ARG_TYPE_MASK;
-      sc->args[i].vector_type = adj.type;
+      sc->args[i].vector_type = mask_type;
     }
 
-  if (node->definition)
-    {
-      ipa_param_body_adjustments *adjustments
-	= new ipa_param_body_adjustments (new_params, node->decl);
-
-      adjustments->modify_formal_parameters ();
-      return adjustments;
-    }
-  else
+  if (!node->definition)
     {
       tree new_arg_types = NULL_TREE, new_reversed;
       bool last_parm_void = false;
@@ -957,15 +893,20 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	last_parm_void = true;
 
       gcc_assert (TYPE_ARG_TYPES (TREE_TYPE (node->decl)));
-      j = vec_safe_length (new_params);
-      for (i = 0; i < j; i++)
+      for (i = 0; i < sc->nargs; i++)
 	{
-	  struct ipa_adjusted_param *adj = &(*new_params)[i];
 	  tree ptype;
-	  if (adj->op == IPA_PARAM_OP_COPY)
-	    ptype = args[adj->base_index];
-	  else
-	    ptype = adj->type;
+	  switch (sc->args[i].arg_type)
+	    {
+	    default:
+	      ptype = sc->args[i].orig_type;
+	      break;
+	    case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP:
+	    case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_VARIABLE_STEP:
+	    case SIMD_CLONE_ARG_TYPE_VECTOR:
+	      ptype = sc->args[i].vector_type;
+	      break;
+	    }
 	  new_arg_types = tree_cons (NULL_TREE, ptype, new_arg_types);
 	}
       new_reversed = nreverse (new_arg_types);
@@ -977,7 +918,6 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	    new_reversed = void_list_node;
 	}
       TYPE_ARG_TYPES (TREE_TYPE (node->decl)) = new_reversed;
-      return NULL;
     }
 }
 
@@ -996,7 +936,8 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
        arg;
        arg = DECL_CHAIN (arg), i++, j++)
     {
-      if ((*adjustments->m_adj_params)[j].op == IPA_PARAM_OP_COPY
+      ipa_adjusted_param adj = (*adjustments->m_adj_params)[j];
+      if (adj.op == IPA_PARAM_OP_COPY
 	  || POINTER_TYPE_P (TREE_TYPE (arg)))
 	continue;
 
@@ -1004,7 +945,7 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
 
       tree array = node->simdclone->args[i].simd_array;
       if (node->simdclone->mask_mode != VOIDmode
-	  && node->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK)
+	  && adj.param_prefix_index == IPA_PARAM_PREFIX_MASK)
 	{
 	  if (array == NULL_TREE)
 	    continue;
@@ -1024,8 +965,9 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
 	    }
 	  continue;
 	}
-      if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg)),
-		    node->simdclone->simdlen))
+      if (!VECTOR_TYPE_P (TREE_TYPE (arg))
+	  || known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg)),
+		       node->simdclone->simdlen))
 	{
 	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
 	  tree ptr = build_fold_addr_expr (array);
@@ -1423,13 +1365,120 @@ simd_clone_adjust (struct cgraph_node *node)
 {
   push_cfun (DECL_STRUCT_FUNCTION (node->decl));
 
+  tree orig_rettype = TREE_TYPE (TREE_TYPE (node->decl));
   TREE_TYPE (node->decl) = build_distinct_type_copy (TREE_TYPE (node->decl));
+  simd_clone_adjust_return_type (node);
+  simd_clone_adjust_argument_types (node);
   targetm.simd_clone.adjust (node);
+  tree retval = NULL_TREE;
+  if (orig_rettype != void_type_node)
+    {
+      poly_uint64 veclen;
+      if (INTEGRAL_TYPE_P (orig_rettype) || POINTER_TYPE_P (orig_rettype))
+	veclen = node->simdclone->vecsize_int;
+      else
+	veclen = node->simdclone->vecsize_float;
+      if (known_eq (veclen, 0U))
+	veclen = node->simdclone->simdlen;
+      else
+	veclen = exact_div (veclen,
+			    GET_MODE_BITSIZE (SCALAR_TYPE_MODE (orig_rettype)));
+      if (multiple_p (veclen, node->simdclone->simdlen))
+	veclen = node->simdclone->simdlen;
+
+      retval = DECL_RESULT (node->decl);
+      /* Adjust the DECL_RESULT.  */
+      TREE_TYPE (retval) = TREE_TYPE (TREE_TYPE (node->decl));
+      relayout_decl (retval);
+
+      tree atype = build_array_type_nelts (orig_rettype,
+					   node->simdclone->simdlen);
+      if (maybe_ne (veclen, node->simdclone->simdlen))
+	retval = build1 (VIEW_CONVERT_EXPR, atype, retval);
+      else
+	{
+	  /* Set up a SIMD array to use as the return value.  */
+	  retval = create_tmp_var_raw (atype, "retval");
+	  gimple_add_tmp_var (retval);
+	}
+    }
 
-  tree retval = simd_clone_adjust_return_type (node);
+  struct cgraph_simd_clone *sc = node->simdclone;
+  vec<ipa_adjusted_param, va_gc> *new_params = NULL;
+  vec_safe_reserve (new_params, sc->nargs);
+  unsigned i, j, k;
+  for (i = 0; i < sc->nargs; ++i)
+    {
+      ipa_adjusted_param adj;
+      memset (&adj, 0, sizeof (adj));
+      poly_uint64 veclen;
+      tree elem_type;
+
+      adj.base_index = i;
+      adj.prev_clone_index = i;
+      switch (sc->args[i].arg_type)
+	{
+	default:
+	  /* No adjustment necessary for scalar arguments.  */
+	  adj.op = IPA_PARAM_OP_COPY;
+	  break;
+	case SIMD_CLONE_ARG_TYPE_LINEAR_UVAL_VARIABLE_STEP:
+	  adj.op = IPA_PARAM_OP_COPY;
+	  break;
+	case SIMD_CLONE_ARG_TYPE_MASK:
+	case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP:
+	case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_VARIABLE_STEP:
+	case SIMD_CLONE_ARG_TYPE_VECTOR:
+	  if (sc->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK
+	      && sc->mask_mode != VOIDmode)
+	    elem_type = boolean_type_node;
+	  else
+	    elem_type = TREE_TYPE (sc->args[i].vector_type);
+	  if (INTEGRAL_TYPE_P (elem_type) || POINTER_TYPE_P (elem_type))
+	    veclen = sc->vecsize_int;
+	  else
+	    veclen = sc->vecsize_float;
+	  if (known_eq (veclen, 0U))
+	    veclen = sc->simdlen;
+	  else
+	    veclen
+	      = exact_div (veclen,
+			   GET_MODE_BITSIZE (SCALAR_TYPE_MODE (elem_type)));
+	  if (multiple_p (veclen, sc->simdlen))
+	    veclen = sc->simdlen;
+	  if (sc->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK)
+	    {
+	      adj.user_flag = 1;
+	      adj.param_prefix_index = IPA_PARAM_PREFIX_MASK;
+	    }
+	  else
+	    adj.param_prefix_index = IPA_PARAM_PREFIX_SIMD;
+	  adj.op = IPA_PARAM_OP_NEW;
+	  adj.type =  sc->args[i].vector_type;
+	  k = vector_unroll_factor (sc->simdlen, veclen);
+	  for (j = 1; j < k; j++)
+	    {
+	      vec_safe_push (new_params, adj);
+	      if (j == 1)
+		{
+		  memset (&adj, 0, sizeof (adj));
+		  adj.op = IPA_PARAM_OP_NEW;
+		  adj.user_flag = 1;
+		  if (sc->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK)
+		    adj.param_prefix_index = IPA_PARAM_PREFIX_MASK;
+		  else
+		    adj.param_prefix_index = IPA_PARAM_PREFIX_SIMD;
+		  adj.base_index = i;
+		  adj.prev_clone_index = i;
+		  adj.type = sc->args[i].vector_type;
+		}
+	    }
+	}
+      vec_safe_push (new_params, adj);
+    }
   ipa_param_body_adjustments *adjustments
-    = simd_clone_adjust_argument_types (node);
-  gcc_assert (adjustments);
+    = new ipa_param_body_adjustments (new_params, node->decl);
+  adjustments->modify_formal_parameters ();
 
   push_gimplify_context ();
 
@@ -2050,9 +2099,9 @@ expand_simd_clones (struct cgraph_node *node)
 	    {
 	      TREE_TYPE (n->decl)
 		= build_distinct_type_copy (TREE_TYPE (n->decl));
-	      targetm.simd_clone.adjust (n);
 	      simd_clone_adjust_return_type (n);
 	      simd_clone_adjust_argument_types (n);
+	      targetm.simd_clone.adjust (n);
 	    }
 	  if (dump_file)
 	    fprintf (dump_file, "\nGenerated %s clone %s\n",

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 0/8] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS
  2023-08-30  8:49 aarch64, vect, omp: Add SVE support for simd clones [PR 96342] Andre Vieira (lists)
                   ` (8 preceding siblings ...)
  2023-10-18 14:40 ` aarch64, vect, omp: " Andre Vieira (lists)
@ 2023-10-18 14:41 ` Andre Vieira (lists)
  2023-10-19  7:10   ` Richard Biener
  9 siblings, 1 reply; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-10-18 14:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Richard Biener, jakub

[-- Attachment #1: Type: text/plain, Size: 367 bytes --]


Refactor simd clone handling code ahead of support for poly simdlen.

gcc/ChangeLog:

	* omp-simd-clone.cc (simd_clone_subparts): Remove.
	(simd_clone_init_simd_arrays): Replace simd_clone_subparts with
	TYPE_VECTOR_SUBPARTS.
	(ipa_simd_modify_function_body): Likewise.
	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Likewise.
	(simd_clone_subparts): Remove.

[-- Attachment #2: sve_simd_clones_0.patch --]
[-- Type: text/plain, Size: 10191 bytes --]

diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index c1cb7cc8a5c770940bc2032f824e084b37e96dbe..a42643400ddcf10961633448b49d4caafb999f12 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -255,16 +255,6 @@ ok_for_auto_simd_clone (struct cgraph_node *node)
   return true;
 }
 
-
-/* Return the number of elements in vector type VECTYPE, which is associated
-   with a SIMD clone.  At present these always have a constant length.  */
-
-static unsigned HOST_WIDE_INT
-simd_clone_subparts (tree vectype)
-{
-  return TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
-}
-
 /* Allocate a fresh `simd_clone' and return it.  NARGS is the number
    of arguments to reserve space for.  */
 
@@ -1028,7 +1018,7 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
 	    }
 	  continue;
 	}
-      if (known_eq (simd_clone_subparts (TREE_TYPE (arg)),
+      if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg)),
 		    node->simdclone->simdlen))
 	{
 	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
@@ -1040,7 +1030,7 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
 	}
       else
 	{
-	  unsigned int simdlen = simd_clone_subparts (TREE_TYPE (arg));
+	  poly_uint64 simdlen = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg));
 	  unsigned int times = vector_unroll_factor (node->simdclone->simdlen,
 						     simdlen);
 	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
@@ -1226,9 +1216,9 @@ ipa_simd_modify_function_body (struct cgraph_node *node,
 		  iter, NULL_TREE, NULL_TREE);
       adjustments->register_replacement (&(*adjustments->m_adj_params)[j], r);
 
-      if (multiple_p (node->simdclone->simdlen, simd_clone_subparts (vectype)))
+      if (multiple_p (node->simdclone->simdlen, TYPE_VECTOR_SUBPARTS (vectype)))
 	j += vector_unroll_factor (node->simdclone->simdlen,
-				   simd_clone_subparts (vectype)) - 1;
+				   TYPE_VECTOR_SUBPARTS (vectype)) - 1;
     }
   adjustments->sort_replacements ();
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 9bb43e98f56d18929c9c02227954fdf38eafefd8..a9156975d64c7a335ffd27614e87f9d11b23d1ba 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4126,16 +4126,6 @@ vect_simd_lane_linear (tree op, class loop *loop,
     }
 }
 
-/* Return the number of elements in vector type VECTYPE, which is associated
-   with a SIMD clone.  At present these vectors always have a constant
-   length.  */
-
-static unsigned HOST_WIDE_INT
-simd_clone_subparts (tree vectype)
-{
-  return TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
-}
-
 /* Function vectorizable_simd_clone_call.
 
    Check if STMT_INFO performs a function call that can be vectorized
@@ -4429,7 +4419,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 							    slp_node);
 	  if (arginfo[i].vectype == NULL
 	      || !constant_multiple_p (bestn->simdclone->simdlen,
-				       simd_clone_subparts (arginfo[i].vectype)))
+				       TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)))
 	    return false;
 	}
 
@@ -4444,10 +4434,11 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 
       if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK)
 	{
+	  tree clone_arg_vectype = bestn->simdclone->args[i].vector_type;
 	  if (bestn->simdclone->mask_mode == VOIDmode)
 	    {
-	      if (simd_clone_subparts (bestn->simdclone->args[i].vector_type)
-		  != simd_clone_subparts (arginfo[i].vectype))
+	      if (maybe_ne (TYPE_VECTOR_SUBPARTS (clone_arg_vectype),
+			    TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)))
 		{
 		  /* FORNOW we only have partial support for vector-type masks
 		     that can't hold all of simdlen. */
@@ -4464,7 +4455,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	      if (!SCALAR_INT_MODE_P (TYPE_MODE (arginfo[i].vectype))
 		  || maybe_ne (exact_div (bestn->simdclone->simdlen,
 					  num_mask_args),
-			       simd_clone_subparts (arginfo[i].vectype)))
+			       TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)))
 		{
 		  /* FORNOW we only have partial support for integer-type masks
 		     that represent the same number of lanes as the
@@ -4598,21 +4589,24 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	{
 	  unsigned int k, l, m, o;
 	  tree atype;
+	  poly_uint64 callee_nelements, caller_nelements;
 	  op = gimple_call_arg (stmt, i + arg_offset);
 	  switch (bestn->simdclone->args[i].arg_type)
 	    {
 	    case SIMD_CLONE_ARG_TYPE_VECTOR:
 	      atype = bestn->simdclone->args[i].vector_type;
-	      o = vector_unroll_factor (nunits,
-					simd_clone_subparts (atype));
+	      caller_nelements = TYPE_VECTOR_SUBPARTS (arginfo[i].vectype);
+	      callee_nelements = TYPE_VECTOR_SUBPARTS (atype);
+	      o = vector_unroll_factor (nunits, callee_nelements);
 	      for (m = j * o; m < (j + 1) * o; m++)
 		{
-		  if (simd_clone_subparts (atype)
-		      < simd_clone_subparts (arginfo[i].vectype))
+		  if (known_lt (callee_nelements, caller_nelements))
 		    {
 		      poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (atype));
-		      k = (simd_clone_subparts (arginfo[i].vectype)
-			   / simd_clone_subparts (atype));
+		      if (!constant_multiple_p (caller_nelements,
+						callee_nelements, &k))
+			gcc_unreachable ();
+
 		      gcc_assert ((k & (k - 1)) == 0);
 		      if (m == 0)
 			{
@@ -4643,8 +4637,9 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		    }
 		  else
 		    {
-		      k = (simd_clone_subparts (atype)
-			   / simd_clone_subparts (arginfo[i].vectype));
+		      if (!constant_multiple_p (callee_nelements,
+						caller_nelements, &k))
+			gcc_unreachable ();
 		      gcc_assert ((k & (k - 1)) == 0);
 		      vec<constructor_elt, va_gc> *ctor_elts;
 		      if (k != 1)
@@ -4706,20 +4701,19 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		  tree elt_type = TREE_TYPE (atype);
 		  tree one = fold_convert (elt_type, integer_one_node);
 		  tree zero = fold_convert (elt_type, integer_zero_node);
-		  o = vector_unroll_factor (nunits,
-					    simd_clone_subparts (atype));
+		  callee_nelements = TYPE_VECTOR_SUBPARTS (atype);
+		  caller_nelements = TYPE_VECTOR_SUBPARTS (arginfo[i].vectype);
+		  o = vector_unroll_factor (nunits, callee_nelements);
 		  for (m = j * o; m < (j + 1) * o; m++)
 		    {
-		      if (simd_clone_subparts (atype)
-			  < simd_clone_subparts (arginfo[i].vectype))
+		      if (maybe_lt (callee_nelements, caller_nelements))
 			{
 			  /* The mask type has fewer elements than simdlen.  */
 
 			  /* FORNOW */
 			  gcc_unreachable ();
 			}
-		      else if (simd_clone_subparts (atype)
-			       == simd_clone_subparts (arginfo[i].vectype))
+		      else if (known_eq (callee_nelements, caller_nelements))
 			{
 			  /* The SIMD clone function has the same number of
 			     elements as the current function.  */
@@ -4757,9 +4751,9 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		{
 		  atype = bestn->simdclone->args[i].vector_type;
 		  /* Guess the number of lanes represented by atype.  */
-		  unsigned HOST_WIDE_INT atype_subparts
+		  poly_uint64 atype_subparts
 		    = exact_div (bestn->simdclone->simdlen,
-				 num_mask_args).to_constant ();
+				 num_mask_args);
 		  o = vector_unroll_factor (nunits, atype_subparts);
 		  for (m = j * o; m < (j + 1) * o; m++)
 		    {
@@ -4772,16 +4766,16 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 							   &vec_oprnds[i]);
 			  vec_oprnds_i[i] = 0;
 			}
-		      if (atype_subparts
-			  < simd_clone_subparts (arginfo[i].vectype))
+		      if (maybe_lt (atype_subparts,
+				    TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)))
 			{
 			  /* The mask argument has fewer elements than the
 			     input vector.  */
 			  /* FORNOW */
 			  gcc_unreachable ();
 			}
-		      else if (atype_subparts
-			       == simd_clone_subparts (arginfo[i].vectype))
+		      else if (known_eq (atype_subparts,
+					 TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)))
 			{
 			  /* The vector mask argument matches the input
 			     in the number of lanes, but not necessarily
@@ -4900,7 +4894,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       if (vec_dest)
 	{
 	  gcc_assert (ratype
-		      || known_eq (simd_clone_subparts (rtype), nunits));
+		      || known_eq (TYPE_VECTOR_SUBPARTS (rtype), nunits));
 	  if (ratype)
 	    new_temp = create_tmp_var (ratype);
 	  else if (useless_type_conversion_p (vectype, rtype))
@@ -4914,13 +4908,13 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 
       if (vec_dest)
 	{
-	  if (!multiple_p (simd_clone_subparts (vectype), nunits))
+	  if (!multiple_p (TYPE_VECTOR_SUBPARTS (vectype), nunits))
 	    {
 	      unsigned int k, l;
 	      poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
 	      poly_uint64 bytes = GET_MODE_SIZE (TYPE_MODE (vectype));
 	      k = vector_unroll_factor (nunits,
-					simd_clone_subparts (vectype));
+					TYPE_VECTOR_SUBPARTS (vectype));
 	      gcc_assert ((k & (k - 1)) == 0);
 	      for (l = 0; l < k; l++)
 		{
@@ -4950,10 +4944,12 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		vect_clobber_variable (vinfo, stmt_info, gsi, new_temp);
 	      continue;
 	    }
-	  else if (!multiple_p (nunits, simd_clone_subparts (vectype)))
+	  else if (!multiple_p (nunits, TYPE_VECTOR_SUBPARTS (vectype)))
 	    {
-	      unsigned int k = (simd_clone_subparts (vectype)
-				/ simd_clone_subparts (rtype));
+	      unsigned int k;
+	      if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (vectype),
+					TYPE_VECTOR_SUBPARTS (rtype), &k))
+		gcc_unreachable ();
 	      gcc_assert ((k & (k - 1)) == 0);
 	      if ((j & (k - 1)) == 0)
 		vec_alloc (ret_ctor_elts, k);
@@ -4961,7 +4957,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		{
 		  unsigned int m, o;
 		  o = vector_unroll_factor (nunits,
-					    simd_clone_subparts (rtype));
+					    TYPE_VECTOR_SUBPARTS (rtype));
 		  for (m = 0; m < o; m++)
 		    {
 		      tree tem = build4 (ARRAY_REF, rtype, new_temp,


* Re: [PATCH 8/8] aarch64: Add SVE support for simd clones [PR 96342]
  2023-08-30  9:19 ` [PATCH 8/8] aarch64: Add SVE support for simd clones [PR 96342] Andre Vieira (lists)
@ 2023-10-18 14:41   ` Andre Vieira (lists)
  2023-11-29 17:01     ` Richard Sandiford
  0 siblings, 1 reply; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-10-18 14:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 1610 bytes --]

Rebased, no major changes, still needs review.

On 30/08/2023 10:19, Andre Vieira (lists) via Gcc-patches wrote:
> This patch finalizes adding support for the generation of SVE simd
> clones when no simdlen is provided, following the ABI rules where the
> widest data type determines the minimum number of elements in a
> length-agnostic vector.
> 
> gcc/ChangeLog:
> 
>      * config/aarch64/aarch64-protos.h (add_sve_type_attribute): Declare.
>      * config/aarch64/aarch64-sve-builtins.cc (add_sve_type_attribute):
>      Make visibility global.
>      * config/aarch64/aarch64.cc (aarch64_fntype_abi): Ensure SVE ABI is
>      chosen over SIMD ABI if an SVE type is used in return or arguments.
>      (aarch64_simd_clone_compute_vecsize_and_simdlen): Create VLA simd
>      clone when no simdlen is provided, according to ABI rules.
>      (aarch64_simd_clone_adjust): Add '+sve' attribute to SVE simd clones.
>      (aarch64_simd_clone_adjust_ret_or_param): New.
>      (TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): Define.
>      * omp-simd-clone.cc (simd_clone_mangle): Print 'x' for VLA simdlen.
>      (simd_clone_adjust): Adapt safelen check to be compatible with VLA
>      simdlen.
> gcc/testsuite/ChangeLog:
> 
>      * c-c++-common/gomp/declare-variant-14.c: Adapt aarch64 scan.
>      * gfortran.dg/gomp/declare-variant-14.f90: Likewise.
>      * gcc.target/aarch64/declare-simd-1.c: Remove warning checks where no
>      longer necessary.
>      * gcc.target/aarch64/declare-simd-2.c: Add SVE clone scan.

[-- Attachment #2: sve_simd_clones_7v2.patch --]
[-- Type: text/plain, Size: 19045 bytes --]

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 60a55f4bc1956786ea687fc7cad7ec9e4a84e1f0..769d637f63724a7f0044f48f3dd683e0fb46049c 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1005,6 +1005,8 @@ namespace aarch64_sve {
 #ifdef GCC_TARGET_H
   bool verify_type_context (location_t, type_context_kind, const_tree, bool);
 #endif
+ void add_sve_type_attribute (tree, unsigned int, unsigned int,
+			      const char *, const char *);
 }
 
 extern void aarch64_split_combinev16qi (rtx operands[3]);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 161a14edde7c9fb1b13b146cf50463e2d78db264..6f99c438d10daa91b7e3b623c995489f1a8a0f4c 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -569,14 +569,16 @@ static bool reported_missing_registers_p;
 /* Record that TYPE is an ABI-defined SVE type that contains NUM_ZR SVE vectors
    and NUM_PR SVE predicates.  MANGLED_NAME, if nonnull, is the ABI-defined
    mangling of the type.  ACLE_NAME is the <arm_sve.h> name of the type.  */
-static void
+void
 add_sve_type_attribute (tree type, unsigned int num_zr, unsigned int num_pr,
 			const char *mangled_name, const char *acle_name)
 {
   tree mangled_name_tree
     = (mangled_name ? get_identifier (mangled_name) : NULL_TREE);
+  tree acle_name_tree
+    = (acle_name ? get_identifier (acle_name) : NULL_TREE);
 
-  tree value = tree_cons (NULL_TREE, get_identifier (acle_name), NULL_TREE);
+  tree value = tree_cons (NULL_TREE, acle_name_tree, NULL_TREE);
   value = tree_cons (NULL_TREE, mangled_name_tree, value);
   value = tree_cons (NULL_TREE, size_int (num_pr), value);
   value = tree_cons (NULL_TREE, size_int (num_zr), value);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 37507f091c2a6154fa944c3a9fad6a655ab5d5a1..cb0947b18c6a611d55579b5b08d93f6a4a9c3b2c 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -4080,13 +4080,13 @@ aarch64_takes_arguments_in_sve_regs_p (const_tree fntype)
 static const predefined_function_abi &
 aarch64_fntype_abi (const_tree fntype)
 {
-  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)))
-    return aarch64_simd_abi ();
-
   if (aarch64_returns_value_in_sve_regs_p (fntype)
       || aarch64_takes_arguments_in_sve_regs_p (fntype))
     return aarch64_sve_abi ();
 
+  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)))
+    return aarch64_simd_abi ();
+
   return default_function_abi;
 }
 
@@ -27467,7 +27467,7 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 					int num, bool explicit_p)
 {
   tree t, ret_type;
-  unsigned int nds_elt_bits;
+  unsigned int nds_elt_bits, wds_elt_bits;
   int count;
   unsigned HOST_WIDE_INT const_simdlen;
 
@@ -27513,10 +27513,14 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   if (TREE_CODE (ret_type) != VOID_TYPE)
     {
       nds_elt_bits = lane_size (SIMD_CLONE_ARG_TYPE_VECTOR, ret_type);
+      wds_elt_bits = nds_elt_bits;
       vec_elts.safe_push (std::make_pair (ret_type, nds_elt_bits));
     }
   else
-    nds_elt_bits = POINTER_SIZE;
+    {
+      nds_elt_bits = POINTER_SIZE;
+      wds_elt_bits = 0;
+    }
 
   int i;
   tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
@@ -27524,30 +27528,36 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0;
        t && t != void_list_node; t = TREE_CHAIN (t), i++)
     {
-      tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
+      tree type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
       if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM
-	  && !supported_simd_type (arg_type))
+	  && !supported_simd_type (type))
 	{
 	  if (!explicit_p)
 	    ;
-	  else if (COMPLEX_FLOAT_TYPE_P (ret_type))
+	  else if (COMPLEX_FLOAT_TYPE_P (type))
 	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
 			"GCC does not currently support argument type %qT "
-			"for simd", arg_type);
+			"for simd", type);
 	  else
 	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
 			"unsupported argument type %qT for simd",
-			arg_type);
+			type);
 	  return 0;
 	}
-      unsigned lane_bits = lane_size (clonei->args[i].arg_type, arg_type);
+      unsigned lane_bits = lane_size (clonei->args[i].arg_type, type);
       if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
-	vec_elts.safe_push (std::make_pair (arg_type, lane_bits));
+	vec_elts.safe_push (std::make_pair (type, lane_bits));
       if (nds_elt_bits > lane_bits)
 	nds_elt_bits = lane_bits;
+      else if (wds_elt_bits < lane_bits)
+	wds_elt_bits = lane_bits;
     }
 
-  clonei->vecsize_mangle = 'n';
+  /* If we could not determine the WDS type from available parameters/return,
+     then fallback to using uintptr_t.  */
+  if (wds_elt_bits == 0)
+    wds_elt_bits = POINTER_SIZE;
+
   clonei->mask_mode = VOIDmode;
   poly_uint64 simdlen;
   auto_vec<poly_uint64> simdlens (2);
@@ -27558,6 +27568,11 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
       simdlen = exact_div (poly_uint64 (64), nds_elt_bits);
       simdlens.safe_push (simdlen);
       simdlens.safe_push (simdlen * 2);
+      /* Only create a SVE simd clone if we aren't dealing with an unprototyped
+	 function.  */
+      if (DECL_ARGUMENTS (node->decl) != 0
+	  || type_arg_types != 0)
+	simdlens.safe_push (exact_div (poly_uint64 (128, 128), wds_elt_bits));
     }
   else
     simdlens.safe_push (clonei->simdlen);
@@ -27578,19 +27593,20 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   while (j < count && !simdlens.is_empty ())
     {
       bool remove_simdlen = false;
-      for (auto elt : vec_elts)
-	if (known_gt (simdlens[j] * elt.second, 128U))
-	  {
-	    /* Don't issue a warning for every simdclone when there is no
-	       specific simdlen clause.  */
-	    if (explicit_p && known_ne (clonei->simdlen, 0U))
-	      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-			  "GCC does not currently support simdlen %wd for "
-			  "type %qT",
-			  constant_lower_bound (simdlens[j]), elt.first);
-	    remove_simdlen = true;
-	    break;
-	  }
+      if (simdlens[j].is_constant ())
+	for (auto elt : vec_elts)
+	  if (known_gt (simdlens[j] * elt.second, 128U))
+	    {
+	      /* Don't issue a warning for every simdclone when there is no
+		 specific simdlen clause.  */
+	      if (explicit_p && known_ne (clonei->simdlen, 0U))
+		warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+			    "GCC does not currently support simdlen %wd for "
+			    "type %qT",
+			    constant_lower_bound (simdlens[j]), elt.first);
+	      remove_simdlen = true;
+	      break;
+	    }
       if (remove_simdlen)
 	{
 	  count--;
@@ -27618,9 +27634,36 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 
   gcc_assert (num < count);
   clonei->simdlen = simdlens[num];
+  if (clonei->simdlen.is_constant ())
+    clonei->vecsize_mangle = 'n';
+  else
+    {
+      clonei->vecsize_mangle = 's';
+      clonei->inbranch = 1;
+    }
   return count;
 }
 
+static tree
+simd_clone_adjust_sve_vector_type (tree type, bool is_mask, poly_uint64 simdlen)
+{
+    unsigned int num_zr = 0;
+    unsigned int num_pr = 0;
+    type = TREE_TYPE (type);
+    type = build_vector_type (type, simdlen);
+    if (is_mask)
+      {
+	type = truth_type_for (type);
+	num_pr = 1;
+      }
+    else
+      num_zr = 1;
+
+    aarch64_sve::add_sve_type_attribute (type, num_zr, num_pr, NULL,
+					 NULL);
+    return type;
+}
+
 /* Implement TARGET_SIMD_CLONE_ADJUST.  */
 
 static void
@@ -27632,6 +27675,69 @@ aarch64_simd_clone_adjust (struct cgraph_node *node)
   tree t = TREE_TYPE (node->decl);
   TYPE_ATTRIBUTES (t) = make_attribute ("aarch64_vector_pcs", "default",
 					TYPE_ATTRIBUTES (t));
+
+  cl_target_option cur_target;
+  poly_uint16 old_sve_vg;
+  bool m_old_have_regs_of_mode[MAX_MACHINE_MODE];
+
+  if (node->simdclone->vecsize_mangle == 's')
+    {
+      tree target = build_string (strlen ("+sve"), "+sve");
+      aarch64_option_valid_attribute_p (node->decl, NULL_TREE, target, 0);
+      cl_target_option_save (&cur_target, &global_options, &global_options_set);
+      tree new_target = DECL_FUNCTION_SPECIFIC_TARGET (node->decl);
+      cl_target_option_restore (&global_options, &global_options_set,
+				TREE_TARGET_OPTION (new_target));
+      aarch64_override_options_internal (&global_options);
+      memcpy (m_old_have_regs_of_mode, have_regs_of_mode,
+	      sizeof (have_regs_of_mode));
+      for (int i = 0; i < NUM_MACHINE_MODES; ++i)
+	if (aarch64_sve_mode_p ((machine_mode) i))
+	  have_regs_of_mode[i] = true;
+      old_sve_vg = aarch64_sve_vg;
+      if (!node->simdclone->simdlen.is_constant ())
+	aarch64_sve_vg = poly_uint16 (2, 2);
+    }
+  cgraph_simd_clone *sc = node->simdclone;
+
+  for (unsigned i = 0; i < sc->nargs; ++i)
+    {
+      bool is_mask = false;
+      tree type;
+      switch (sc->args[i].arg_type)
+	{
+	case SIMD_CLONE_ARG_TYPE_MASK:
+	  is_mask = true;
+	  gcc_fallthrough ();
+	case SIMD_CLONE_ARG_TYPE_VECTOR:
+	case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP:
+	case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_VARIABLE_STEP:
+	  type = sc->args[i].vector_type;
+	  gcc_assert (VECTOR_TYPE_P (type));
+	  if (node->simdclone->vecsize_mangle == 's')
+	    type = simd_clone_adjust_sve_vector_type (type, is_mask,
+						      sc->simdlen);
+	  else if (is_mask)
+	    type = truth_type_for (type);
+	  sc->args[i].vector_type = type;
+	default:
+	    continue;
+	}
+    }
+  if (node->simdclone->vecsize_mangle == 's')
+    {
+      tree ret_type = TREE_TYPE (t);
+      if (VECTOR_TYPE_P (ret_type))
+	TREE_TYPE (t)
+	  = simd_clone_adjust_sve_vector_type (ret_type, false,
+					       node->simdclone->simdlen);
+      /* Restore current options and AARCH64_SVE_VG.  */
+      cl_target_option_restore (&global_options, &global_options_set, &cur_target);
+      aarch64_override_options_internal (&global_options);
+      memcpy (have_regs_of_mode, m_old_have_regs_of_mode,
+	      sizeof (have_regs_of_mode));
+      aarch64_sve_vg = old_sve_vg;
+    }
 }
 
 /* Implement TARGET_SIMD_CLONE_USABLE.  */
@@ -27645,6 +27751,10 @@ aarch64_simd_clone_usable (struct cgraph_node *node)
       if (!TARGET_SIMD)
 	return -1;
       return 0;
+    case 's':
+      if (!TARGET_SVE)
+	return -1;
+      return 0;
     default:
       gcc_unreachable ();
     }
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index fb80888190c88e29895ecfbbe1b17d390c9a9dfe..150af24c5bc52b6737f3ca46ba73d1c890b143e9 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -541,9 +541,12 @@ simd_clone_mangle (struct cgraph_node *node,
   pp_string (&pp, "_ZGV");
   pp_character (&pp, vecsize_mangle);
   pp_character (&pp, mask);
-  /* For now, simdlen is always constant, while variable simdlen pp 'n'.  */
-  unsigned int len = simdlen.to_constant ();
-  pp_decimal_int (&pp, (len));
+
+  unsigned long long len = 0;
+  if (simdlen.is_constant (&len))
+    pp_decimal_int (&pp, (int) (len));
+  else
+    pp_character (&pp, 'x');
 
   for (n = 0; n < clone_info->nargs; ++n)
     {
@@ -1541,8 +1544,8 @@ simd_clone_adjust (struct cgraph_node *node)
 	 below).  */
       loop = alloc_loop ();
       cfun->has_force_vectorize_loops = true;
-      /* For now, simlen is always constant.  */
-      loop->safelen = node->simdclone->simdlen.to_constant ();
+      /* We can assert that safelen is the 'minimum' simdlen.  */
+      loop->safelen = constant_lower_bound (node->simdclone->simdlen);
       loop->force_vectorize = true;
       loop->header = body_bb;
     }
diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
index e3668893afe33a58c029cddd433d9bf43cce2bfa..12f8b3b839b7f3ff9e4f99768e59c0e1c5339062 100644
--- a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
+++ b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
@@ -21,7 +21,7 @@ test1 (int x)
      shall call f01 with score 8.  */
   /* { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } } */
   /* { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" { target { !aarch64*-*-* } } } } */
-  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 10 "optimized" { target { aarch64*-*-* } } } } */
+  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 12 "optimized" { target { aarch64*-*-* } } } } */
   /* { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" { target { !aarch64*-*-* } } } } */
   /* { dg-final { scan-tree-dump-times "f01 \\\(x" 0 "optimized" { target { aarch64*-*-* } } } } */
   int a = f04 (x);
diff --git a/gcc/testsuite/gcc.target/aarch64/declare-simd-1.c b/gcc/testsuite/gcc.target/aarch64/declare-simd-1.c
index aab8c17f0c442a7cda4dce23cc18162a0b7f676e..add6e7c93019834fbd5bed5ead18b52d4cdd0a37 100644
--- a/gcc/testsuite/gcc.target/aarch64/declare-simd-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/declare-simd-1.c
@@ -4,28 +4,39 @@
 extern "C" {
 #endif
 #pragma omp declare simd
-int __attribute__ ((const)) f00 (int a , char b) /* { dg-warning {GCC does not currently support a simdclone with simdlens 8 and 16 for these types.} } */
+int __attribute__ ((const)) f00 (int a , char b)
 {
   return a + b;
 }
 
+/* { dg-final { scan-assembler-not {_ZGVn[a-z0-9]+_f00} } } */
+/* { dg-final { scan-assembler {_ZGVsMxvv_f00} } } */
+
 #pragma omp declare simd
-long long __attribute__ ((const)) f01 (int a , short b) /* { dg-warning {GCC does not currently support a simdclone with simdlens 4 and 8 for these types.} } */
+long long __attribute__ ((const)) f01 (int a , short b)
 {
   return a + b;
 }
+/* { dg-final { scan-assembler-not {_ZGVn[a-z0-9]+_f01} } } */
+/* { dg-final { scan-assembler {_ZGVsMxvv_f01} } } */
 
 #pragma omp declare simd linear(b)
-long long __attribute__ ((const)) f02 (short *b, int a) /* { dg-warning {GCC does not currently support a simdclone with simdlens 4 and 8 for these types.} } */
+long long __attribute__ ((const)) f02 (short *b, int a)
 {
   return a + *b;
 }
 
+/* { dg-final { scan-assembler-not {_ZGVn[a-z0-9]+_f02} } } */
+/* { dg-final { scan-assembler {_ZGVsMxl2v_f02} } } */
+
 #pragma omp declare simd uniform(b)
-void f03 (char b, int a) /* { dg-warning {GCC does not currently support a simdclone with simdlens 8 and 16 for these types.} } */
+void f03 (char b, int a)
 {
 }
 
+/* { dg-final { scan-assembler-not {_ZGVn[a-z0-9]+_f03} } } */
+/* { dg-final { scan-assembler {_ZGVsMxuv_f03} } } */
+
 #pragma omp declare simd simdlen(4)
 double f04 (void) /* { dg-warning {GCC does not currently support simdlen 4 for type 'double'} } */
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c b/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c
index abb128ffc9cd2c1353b99eb38aae72377746e6d6..604869a30456e4db988bba86e059a27f19dda589 100644
--- a/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c
@@ -10,6 +10,7 @@ short __attribute__ ((const)) f00 (short a , char b)
 }
 /* { dg-final { scan-assembler {_ZGVnN8vv_f00:} } } */
 /* { dg-final { scan-assembler {_ZGVnM8vv_f00:} } } */
+/* { dg-final { scan-assembler {_ZGVsMxvv_f00:} } } */
 
 #pragma omp declare simd notinbranch
 short __attribute__ ((const)) f01 (int a , short b)
@@ -17,6 +18,7 @@ short __attribute__ ((const)) f01 (int a , short b)
   return a + b;
 }
 /* { dg-final { scan-assembler {_ZGVnN4vv_f01:} } } */
+/* { dg-final { scan-assembler {_ZGVsMxvv_f01:} } } */
 
 #pragma omp declare simd linear(b) inbranch
 int __attribute__ ((const)) f02 (int a, short *b)
@@ -24,6 +26,7 @@ int __attribute__ ((const)) f02 (int a, short *b)
   return a + *b;
 }
 /* { dg-final { scan-assembler {_ZGVnM4vl2_f02:} } } */
+/* { dg-final { scan-assembler {_ZGVsMxvl2_f02:} } } */
 
 #pragma omp declare simd uniform(a) notinbranch
 void f03 (char b, int a)
@@ -31,6 +34,7 @@ void f03 (char b, int a)
 }
 /* { dg-final { scan-assembler {_ZGVnN8vu_f03:} } } */
 /* { dg-final { scan-assembler {_ZGVnN16vu_f03:} } } */
+/* { dg-final { scan-assembler {_ZGVsMxvu_f03:} } } */
 
 #pragma omp declare simd simdlen(2)
 float f04 (double a)
@@ -39,6 +43,7 @@ float f04 (double a)
 }
 /* { dg-final { scan-assembler {_ZGVnN2v_f04:} } } */
 /* { dg-final { scan-assembler {_ZGVnM2v_f04:} } } */
+/* { dg-final { scan-assembler-not {_ZGVs[0-9a-z]*_f04:} } } */
 
 #pragma omp declare simd uniform(a) linear (b)
 void f05 (short a, short *b, short c)
@@ -50,6 +55,7 @@ void f05 (short a, short *b, short c)
 /* { dg-final { scan-assembler {_ZGVnN4ul2v_f05:} } } */
 /* { dg-final { scan-assembler {_ZGVnM8ul2v_f05:} } } */
 /* { dg-final { scan-assembler {_ZGVnM8ul2v_f05:} } } */
+/* { dg-final { scan-assembler {_ZGVsMxul2v_f05:} } } */
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
index 6319df0558f37b95f1b2eb17374bdb4ecbc33295..38677b8f7a76b960ce9363b1c0cabf6fc5086ab6 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
@@ -41,7 +41,7 @@ contains
     ! shall call f01 with score 8.
     ! { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } }
     ! { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" { target { !aarch64*-*-* } } } }
-    ! { dg-final { scan-tree-dump-times "f03 \\\(x" 6 "optimized" { target { aarch64*-*-* } } } }
+    ! { dg-final { scan-tree-dump-times "f03 \\\(x" 8 "optimized" { target { aarch64*-*-* } } } }
     ! { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" { target { !aarch64*-*-* } } } }
     ! { dg-final { scan-tree-dump-times "f01 \\\(x" 0 "optimized" { target { aarch64*-*-* } } } }
     a = f04 (x)
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-1.c b/libgomp/testsuite/libgomp.c/declare-variant-1.c
index 6129f23a0f80585246957022d63608dc3a68f1ff..591867bdc97f28bcb64fbe8bb9db39ccee5fe643 100644
--- a/libgomp/testsuite/libgomp.c/declare-variant-1.c
+++ b/libgomp/testsuite/libgomp.c/declare-variant-1.c
@@ -48,7 +48,7 @@ test1 (int x)
   /* { dg-final { scan-ltrans-tree-dump-not "f04 \\\(x" "optimized" } } */
   /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 14 "optimized" { target { !aarch64*-*-* } } } } } */
   /* { dg-final { scan-ltrans-tree-dump-times "f01 \\\(x" 4 "optimized" { target { !aarch64*-*-* } } } } } */
-  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 10 "optimized" { target { aarch64*-*-* } } } } } */
+  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 12 "optimized" { target { aarch64*-*-* } } } } } */
   /* { dg-final { scan-ltrans-tree-dump-not "f01 \\\(x" "optimized" { target { aarch64*-*-* } } } } } */
   int a = f04 (x);
   int b = f04 (x);

^ permalink raw reply	[flat|nested] 40+ messages in thread
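As an editorial aside: the omp-simd-clone.cc mangling hunk in the patch above follows the Vector Function ABI naming scheme, where a constant simdlen is printed as a decimal and a vector-length-agnostic clone uses 'x'. A minimal sketch of that scheme (hypothetical helper, not the GCC implementation):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a Vector Function ABI style name: "_ZGV" + ISA mangling
   ('n' for Advanced SIMD, 's' for SVE) + mask ('M' masked, 'N' not)
   + simdlen (decimal, or 'x' when vector-length agnostic) + parameter
   kinds + "_" + function name.  simdlen == 0 encodes "agnostic" here,
   standing in for a non-constant poly_uint64.  */
static void
mangle_simd_clone (char *buf, size_t bufsz, char isa, char mask,
		   unsigned simdlen, const char *params, const char *fn)
{
  if (simdlen)
    snprintf (buf, bufsz, "_ZGV%c%c%u%s_%s", isa, mask, simdlen, params, fn);
  else
    snprintf (buf, bufsz, "_ZGV%c%cx%s_%s", isa, mask, params, fn);
}
```

With isa 's', mask 'M', an agnostic simdlen and params "vv" this yields "_ZGVsMxvv_f00", matching the scans the patch adds to declare-simd-1.c.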

* Re: [PATCH 0/8] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS
  2023-10-18 14:41 ` [PATCH 0/8] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS Andre Vieira (lists)
@ 2023-10-19  7:10   ` Richard Biener
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Biener @ 2023-10-19  7:10 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard Sandiford, jakub

On Wed, 18 Oct 2023, Andre Vieira (lists) wrote:

> 
> Refactor simd clone handling code ahead of support for poly simdlen.

OK.

Richard.

> gcc/ChangeLog:
> 
> 	* omp-simd-clone.cc (simd_clone_subparts): Remove.
> 	(simd_clone_init_simd_arrays): Replace simd_clone_supbarts with
> 	TYPE_VECTOR_SUBPARTS.
> 	(ipa_simd_modify_function_body): Likewise.
> 	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Likewise.
> 	(simd_clone_subparts): Remove.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
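For readers unfamiliar with why the constant-only simd_clone_subparts helper had to go: TYPE_VECTOR_SUBPARTS returns a poly_uint64, which for SVE has the form c0 + c1*x with x only known at run time. A toy model of that idea (illustrative only, not GCC's poly_int implementation):

```c
#include <assert.h>

/* Toy model of a degree-1 poly_uint64: value = coeff0 + coeff1 * x,
   where x >= 0 is only known at run time (extra SVE vector-length
   units beyond the 128-bit minimum).  */
struct poly_u { unsigned coeff0, coeff1; };

static int
poly_is_constant (struct poly_u p)
{
  return p.coeff1 == 0;
}

/* Minimum value over all possible x; this is the quantity safelen is
   set to in the simd_clone_adjust hunk of this series.  */
static unsigned
constant_lower_bound (struct poly_u p)
{
  return p.coeff0;
}
```

For example, an SVE VNx4SI vector would be modelled as {4, 4}: at least four int lanes, with four more per additional vector-length unit.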

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/8] vect: don't allow fully masked loops with non-masked simd clones [PR 110485]
  2023-10-18 14:40     ` Andre Vieira (lists)
@ 2023-10-19 12:06       ` Richard Biener
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Biener @ 2023-10-19 12:06 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches

On Wed, 18 Oct 2023, Andre Vieira (lists) wrote:

> Rebased on top of trunk; minor change to check for loop_vinfo since we now do
> some SLP vectorization for simd clones.
> 
> I assume the previous OK still holds.

Ack.

> On 30/08/2023 13:54, Richard Biener wrote:
> > On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:
> > 
> >> When analyzing a loop and choosing a simdclone to use it is possible to
> >> choose
> >> a simdclone that cannot be used 'inbranch' for a loop that can use partial
> >> vectors.  This may lead to the vectorizer deciding to use partial vectors
> >> which are not supported for notinbranch simd clones. This patch fixes that
> >> by
> >> disabling the use of partial vectors once a notinbranch simd clone has been
> >> selected.
> > 
> > OK.
> > 
> >> gcc/ChangeLog:
> >>
> >>  PR tree-optimization/110485
> >>  * tree-vect-stmts.cc (vectorizable_simd_clone_call): Disable partial
> >>  vectors usage if a notinbranch simdclone has been selected.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>  * gcc.dg/gomp/pr110485.c: New test.
> >>
> > 
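A hedged sketch of the kind of source the PR 110485 fix concerns (illustrative, not the committed testcase): a clone declared notinbranch takes no mask argument, so every vector lane of a call must be active, and the vectorizer must not use partial vectors once such a clone is selected.

```c
#include <assert.h>

/* A function with only a notinbranch simd clone: the clone has no mask
   parameter, so a partially-masked vector call to it would be invalid.  */
#pragma omp declare simd notinbranch
static int
f (int a)
{
  return a + 1;
}

/* With -fopenmp-simd the vectorizer may call the clone here, but after
   the fix above only with full vectors (no partial last iteration).  */
static int
sum (const int *x, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += f (x[i]);
  return s;
}
```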


* Re: [Patch 3/8] vect: Fix vect_get_smallest_scalar_type for simd clones
  2023-10-18 14:40     ` Andre Vieira (lists)
@ 2023-10-19 12:07       ` Richard Biener
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Biener @ 2023-10-19 12:07 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard Sandiford, jakub

On Wed, 18 Oct 2023, Andre Vieira (lists) wrote:

> Made it a local function and changed prototype according to comments.
> 
> Is this OK?

OK.

>  gcc/ChangeLog:
>  	* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Special
>  	case simd clone calls and only use types that are mapped to vectors.
>  	(simd_clone_call_p): New helper function.
> 
> On 30/08/2023 13:54, Richard Biener wrote:
> > On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:
> > 
> >> The vect_get_smallest_scalar_type helper function was using any argument to
> >> a
> >> simd clone call when trying to determine the smallest scalar type that
> >> would
> >> be vectorized.  This included the function pointer type in a MASK_CALL for
> >> instance, and would result in the wrong type being selected.  Instead this
> >> patch special cases simd_clone_call's and uses only scalar types of the
> >> original function that get transformed into vector types.
> > 
> > Looks sensible.
> > 
> > +bool
> > +simd_clone_call_p (gimple *stmt, cgraph_node **out_node)
> > 
> > you could return the cgraph_node * or NULL here.  Are you going to
> > use the function elsewhere?  Otherwise put it in the same TU as
> > the only use please and avoid exporting it.
> > 
> > Richard.
> > 
> >> gcc/ChangeLog:
> >>
> >>  * tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Special
> >>  case
> >>  simd clone calls and only use types that are mapped to vectors.
> >>  * tree-vect-stmts.cc (simd_clone_call_p): New helper function.
> >>  * tree-vectorizer.h (simd_clone_call_p): Declare new function.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>  * gcc.dg/vect/vect-simd-clone-16f.c: Remove unnecessary differentation
> >>  between targets with different pointer sizes.
> >>  * gcc.dg/vect/vect-simd-clone-17f.c: Likewise.
> >>  * gcc.dg/vect/vect-simd-clone-18f.c: Likewise.
> >>
> > 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
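To illustrate the bug the patch above fixes (a hypothetical reduction, not GCC code): when scanning a simd clone call for the smallest scalar type, operands that never become vectors, such as the callee-address operand of a masked call or a uniform argument, must be skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Toy operand descriptor: scalar size in bytes, and whether the operand
   is actually mapped to a vector in the clone (a uniform argument or
   the function-pointer operand of a masked call is not).  */
struct operand { unsigned size; int vectorized; };

/* Return the smallest scalar size among vectorized operands only,
   mirroring the simd_clone_call_p special case described above;
   returns 0 if no operand is vectorized.  */
static unsigned
smallest_vector_lane_size (const struct operand *ops, size_t n)
{
  unsigned smallest = 0;
  for (size_t i = 0; i < n; i++)
    if (ops[i].vectorized && (smallest == 0 || ops[i].size < smallest))
      smallest = ops[i].size;
  return smallest;
}
```

Without the `vectorized` filter, the 8-byte function pointer in the example below would wrongly win over the real 2-byte lane type.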


* Re: [PATCH 5/8] vect: Use inbranch simdclones in masked loops
  2023-10-18 14:41   ` Andre Vieira (lists)
@ 2023-10-19 12:17     ` Richard Biener
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Biener @ 2023-10-19 12:17 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard Sandiford, jakub


On Wed, 18 Oct 2023, Andre Vieira (lists) wrote:

> Rebased, needs review.

+      tree parm_type = NULL_TREE;
+      if(i < args.length())
+       {

space before (

+/* Return SSA name of the result of the conversion of OPERAND into type TYPE.
+   The conversion statement is inserted at GSI.  */
+
+static tree
+vect_convert (vec_info *vinfo, stmt_vec_info stmt_info, tree type, tree operand,
+	      gimple_stmt_iterator *gsi)
+{
+  operand = build1 (VIEW_CONVERT_EXPR, type, operand);
+  gassign *new_stmt = gimple_build_assign (make_ssa_name (type),
+                                          operand);

I don't like this much, it's got one use in your patch only.  Please
leave this abstraction out.

OK with the above two changes.

Thanks,
Richard.

> On 30/08/2023 10:13, Andre Vieira (lists) via Gcc-patches wrote:
> > This patch enables the compiler to use inbranch simdclones when generating
> > masked loops in autovectorization.
> > 
> > gcc/ChangeLog:
> > 
> >      * omp-simd-clone.cc (simd_clone_adjust_argument_types): Make function
> >      compatible with mask parameters in clone.
> >      * tree-vect-stmts.cc (vect_convert): New helper function.
> >      (vect_build_all_ones_mask): Allow vector boolean typed masks.
> >      (vectorizable_simd_clone_call): Enable the use of masked clones in
> >      fully masked loops.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
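A sketch of what "use inbranch simdclones in masked loops" enables at the source level (illustrative only): an inbranch clone takes an extra mask argument, so a fully-masked vector loop can pass its loop-control mask to the clone rather than falling back to scalar code.

```c
#include <assert.h>

/* With only an inbranch clone available, every call site must supply a
   mask; in a fully-masked vector loop the loop mask itself can serve as
   that argument, which is what the patch above wires up.  */
#pragma omp declare simd inbranch
static int
g (int a)
{
  return 2 * a;
}

static void
apply (int *out, const int *in, const int *cond, int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      out[i] = g (in[i]);
}
```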


* Re: [PATCH6/8] omp: Reorder call for TARGET_SIMD_CLONE_ADJUST (was Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM)
  2023-10-18 14:41           ` [PATCH6/8] omp: Reorder call for TARGET_SIMD_CLONE_ADJUST (was Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM) Andre Vieira (lists)
@ 2023-10-30 18:34             ` Andre Vieira (lists)
  2023-10-31  7:59             ` Richard Biener
  1 sibling, 0 replies; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-10-30 18:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard Sandiford, jakub

Hi Richi,

Friendly ping on this. I'm going away for two weeks at the end of this 
week, so I won't be here for the end of stage-1, but I'd still very much 
like to get this done for GCC 14.

I don't know if you had a chance to look at this when you reviewed the 
other patches, or if you maybe just missed it? A quick tl;dr: this moves 
the TARGET_SIMD_CLONE_ADJUST call to after we've vectorized the types in 
simd clones, to avoid having to add extra target hooks to change the 
types.  This required moving around both the code that constructs the 
adjustments and the code that constructs the array for the return value.

Kind regards,
Andre

On 18/10/2023 15:41, Andre Vieira (lists) wrote:
> This patch moves the call to TARGET_SIMD_CLONE_ADJUST until after the 
> arguments and return types have been transformed into vector types.  It 
> also constructs the adjustments and retval modifications after this call, 
> allowing targets to alter the types of the arguments and return of the 
> clone prior to the modifications to the function definition.
> 
> Is this OK?
> 
> gcc/ChangeLog:
> 
>          * omp-simd-clone.cc (simd_clone_adjust_return_type): Hoist out
>          code to create return array and don't return new type.
>          (simd_clone_adjust_argument_types): Hoist out code that creates
>          ipa_param_body_adjustments and don't return them.
>          (simd_clone_adjust): Call TARGET_SIMD_CLONE_ADJUST after return
>          and argument types have been vectorized, create adjustments and
>          return array after the hook.
>          (expand_simd_clones): Call TARGET_SIMD_CLONE_ADJUST after return
>          and argument types have been vectorized.
> 
> On 04/10/2023 13:40, Andre Vieira (lists) wrote:
>>
>>
>> On 04/10/2023 11:41, Richard Biener wrote:
>>> On Wed, 4 Oct 2023, Andre Vieira (lists) wrote:
>>>
>>>>
>>>>
>>>> On 30/08/2023 14:04, Richard Biener wrote:
>>>>> On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:
>>>>>
>>>>>> This patch adds a new target hook to enable us to adapt the types 
>>>>>> of return
>>>>>> and parameters of simd clones.  We use this in two ways, the first 
>>>>>> one is
>>>>>> to
>>>>>> make sure we can create valid SVE types, including the SVE type 
>>>>>> attribute,
>>>>>> when creating a SVE simd clone, even when the target options do 
>>>>>> not support
>>>>>> SVE.  We are following the same behaviour seen with x86 that 
>>>>>> creates simd
>>>>>> clones according to the ABI rules when no simdlen is provided, 
>>>>>> even if that
>>>>>> simdlen is not supported by the current target options.  Note that 
>>>>>> this
>>>>>> doesn't mean the simd clone will be used in auto-vectorization.
>>>>>
>>>>> You are not documenting the bool parameter of the new hook.
>>>>>
>>>>> What's wrong with doing the adjustment in TARGET_SIMD_CLONE_ADJUST?
>>>>
>>>> simd_clone_adjust_argument_types is called after that hook, so by 
>>>> the time we
>>>> call TARGET_SIMD_CLONE_ADJUST the types are still in scalar, not 
>>>> vector.  The
>>>> same is true for the return type one.
>>>>
>>>> Also the changes to the types need to be taken into consideration in
>>>> 'adjustments' I think.
>>>
>>> Nothing in the three existing implementations of 
>>> TARGET_SIMD_CLONE_ADJUST
>>> relies on this ordering I think, how about moving the hook invocation
>>> after simd_clone_adjust_argument_types?
>>>
>>
>> But that wouldn't change the 'ipa_param_body_adjustments' for when we 
>> have a function definition and we need to redo the body.
>>> Richard.
>>>
>>>> PS: I hope the subject line survived, my email client is having a 
>>>> bit of a
>>>> wobble this morning... it's what you get for updating software :(


* Re: [PATCH6/8] omp: Reorder call for TARGET_SIMD_CLONE_ADJUST (was Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM)
  2023-10-18 14:41           ` [PATCH6/8] omp: Reorder call for TARGET_SIMD_CLONE_ADJUST (was Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM) Andre Vieira (lists)
  2023-10-30 18:34             ` Andre Vieira (lists)
@ 2023-10-31  7:59             ` Richard Biener
  2023-12-08 10:35               ` Jakub Jelinek
  1 sibling, 1 reply; 40+ messages in thread
From: Richard Biener @ 2023-10-31  7:59 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard Sandiford, jakub


On Wed, 18 Oct 2023, Andre Vieira (lists) wrote:

> This patch moves the call to TARGET_SIMD_CLONE_ADJUST until after the
> arguments and return types have been transformed into vector types.  It also
> constructs the adjustments and retval modifications after this call, allowing
> targets to alter the types of the arguments and return of the clone prior to
> the modifications to the function definition.
> 
> Is this OK?

OK (I was hoping for Jakub to have a look).

Thanks,
Richard.

> gcc/ChangeLog:
> 
>         * omp-simd-clone.cc (simd_clone_adjust_return_type): Hoist out
>         code to create return array and don't return new type.
>         (simd_clone_adjust_argument_types): Hoist out code that creates
>         ipa_param_body_adjustments and don't return them.
>         (simd_clone_adjust): Call TARGET_SIMD_CLONE_ADJUST after return
>         and argument types have been vectorized, create adjustments and
>         return array after the hook.
>         (expand_simd_clones): Call TARGET_SIMD_CLONE_ADJUST after return
>         and argument types have been vectorized.
> 
> On 04/10/2023 13:40, Andre Vieira (lists) wrote:
> > 
> > 
> > On 04/10/2023 11:41, Richard Biener wrote:
> >> On Wed, 4 Oct 2023, Andre Vieira (lists) wrote:
> >>
> >>>
> >>>
> >>> On 30/08/2023 14:04, Richard Biener wrote:
> >>>> On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:
> >>>>
> >>>>> This patch adds a new target hook to enable us to adapt the types of
> >>>>> return
> >>>>> and parameters of simd clones.  We use this in two ways, the first one
> >>>>> is
> >>>>> to
> >>>>> make sure we can create valid SVE types, including the SVE type
> >>>>> attribute,
> >>>>> when creating a SVE simd clone, even when the target options do not
> >>>>> support
> >>>>> SVE.  We are following the same behaviour seen with x86 that creates
> >>>>> simd
> >>>>> clones according to the ABI rules when no simdlen is provided, even if
> >>>>> that
> >>>>> simdlen is not supported by the current target options.  Note that this
> >>>>> doesn't mean the simd clone will be used in auto-vectorization.
> >>>>
> >>>> You are not documenting the bool parameter of the new hook.
> >>>>
> >>>> What's wrong with doing the adjustment in TARGET_SIMD_CLONE_ADJUST?
> >>>
> >>> simd_clone_adjust_argument_types is called after that hook, so by the time
> >>> we
> >>> call TARGET_SIMD_CLONE_ADJUST the types are still in scalar, not vector. 
> >>> The
> >>> same is true for the return type one.
> >>>
> >>> Also the changes to the types need to be taken into consideration in
> >>> 'adjustments' I think.
> >>
> >> Nothing in the three existing implementations of TARGET_SIMD_CLONE_ADJUST
> >> relies on this ordering I think, how about moving the hook invocation
> >> after simd_clone_adjust_argument_types?
> >>
> > 
> > But that wouldn't change the 'ipa_param_body_adjustments' for when we have a
> > function definition and we need to redo the body.
> >> Richard.
> >>
> >>> PS: I hope the subject line survived, my email client is having a bit of a
> >>> wobble this morning... it's what you get for updating software :(
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* Re: [PATCH 8/8] aarch64: Add SVE support for simd clones [PR 96342]
  2023-10-18 14:41   ` Andre Vieira (lists)
@ 2023-11-29 17:01     ` Richard Sandiford
  2023-12-01 16:39       ` Andre Vieira (lists)
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Sandiford @ 2023-11-29 17:01 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches

"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> Rebased, no major changes, still needs review.
>
> On 30/08/2023 10:19, Andre Vieira (lists) via Gcc-patches wrote:
>> This patch finalizes adding support for the generation of SVE simd 
>> clones when no simdlen is provided, following the ABI rules where the 
>> widest data type determines the minimum amount of elements in a length 
>> agnostic vector.
>> 
>> gcc/ChangeLog:
>> 
>>          * config/aarch64/aarch64-protos.h (add_sve_type_attribute): 
>> Declare.
>>      * config/aarch64/aarch64-sve-builtins.cc (add_sve_type_attribute): 
>> Make
>>      visibility global.
>>      * config/aarch64/aarch64.cc (aarch64_fntype_abi): Ensure SVE ABI is
>>      chosen over SIMD ABI if a SVE type is used in return or arguments.
>>      (aarch64_simd_clone_compute_vecsize_and_simdlen): Create VLA simd 
>> clone
>>      when no simdlen is provided, according to ABI rules.
>>      (aarch64_simd_clone_adjust): Add '+sve' attribute to SVE simd clones.
>>      (aarch64_simd_clone_adjust_ret_or_param): New.
>>      (TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): Define.
>>      * omp-simd-clone.cc (simd_clone_mangle): Print 'x' for VLA simdlen.
>>      (simd_clone_adjust): Adapt safelen check to be compatible with VLA
>>      simdlen.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>      * c-c++-common/gomp/declare-variant-14.c: Adapt aarch64 scan.
>>      * gfortran.dg/gomp/declare-variant-14.f90: Likewise.
>>      * gcc.target/aarch64/declare-simd-1.c: Remove warning checks where no
>>      longer necessary.
>>      * gcc.target/aarch64/declare-simd-2.c: Add SVE clone scan.
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 60a55f4bc1956786ea687fc7cad7ec9e4a84e1f0..769d637f63724a7f0044f48f3dd683e0fb46049c 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -1005,6 +1005,8 @@ namespace aarch64_sve {
>  #ifdef GCC_TARGET_H
>    bool verify_type_context (location_t, type_context_kind, const_tree, bool);
>  #endif
> + void add_sve_type_attribute (tree, unsigned int, unsigned int,
> +			      const char *, const char *);
>  }
>  
>  extern void aarch64_split_combinev16qi (rtx operands[3]);
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 161a14edde7c9fb1b13b146cf50463e2d78db264..6f99c438d10daa91b7e3b623c995489f1a8a0f4c 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -569,14 +569,16 @@ static bool reported_missing_registers_p;
>  /* Record that TYPE is an ABI-defined SVE type that contains NUM_ZR SVE vectors
>     and NUM_PR SVE predicates.  MANGLED_NAME, if nonnull, is the ABI-defined
>     mangling of the type.  ACLE_NAME is the <arm_sve.h> name of the type.  */
> -static void
> +void
>  add_sve_type_attribute (tree type, unsigned int num_zr, unsigned int num_pr,
>  			const char *mangled_name, const char *acle_name)
>  {
>    tree mangled_name_tree
>      = (mangled_name ? get_identifier (mangled_name) : NULL_TREE);
> +  tree acle_name_tree
> +    = (acle_name ? get_identifier (acle_name) : NULL_TREE);
>  
> -  tree value = tree_cons (NULL_TREE, get_identifier (acle_name), NULL_TREE);
> +  tree value = tree_cons (NULL_TREE, acle_name_tree, NULL_TREE);
>    value = tree_cons (NULL_TREE, mangled_name_tree, value);
>    value = tree_cons (NULL_TREE, size_int (num_pr), value);
>    value = tree_cons (NULL_TREE, size_int (num_zr), value);
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 37507f091c2a6154fa944c3a9fad6a655ab5d5a1..cb0947b18c6a611d55579b5b08d93f6a4a9c3b2c 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -4080,13 +4080,13 @@ aarch64_takes_arguments_in_sve_regs_p (const_tree fntype)
>  static const predefined_function_abi &
>  aarch64_fntype_abi (const_tree fntype)
>  {
> -  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)))
> -    return aarch64_simd_abi ();
> -
>    if (aarch64_returns_value_in_sve_regs_p (fntype)
>        || aarch64_takes_arguments_in_sve_regs_p (fntype))
>      return aarch64_sve_abi ();
>  
> +  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)))
> +    return aarch64_simd_abi ();
> +
>    return default_function_abi;
>  }
>  

I think we discussed this off-list later, but the change above shouldn't
be necessary.  aarch64_vector_pcs must not be attached to SVE PCS functions,
so the two cases should be mutually exclusive.

> @@ -27467,7 +27467,7 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>  					int num, bool explicit_p)
>  {
>    tree t, ret_type;
> -  unsigned int nds_elt_bits;
> +  unsigned int nds_elt_bits, wds_elt_bits;
>    int count;
>    unsigned HOST_WIDE_INT const_simdlen;
>  
> @@ -27513,10 +27513,14 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>    if (TREE_CODE (ret_type) != VOID_TYPE)
>      {
>        nds_elt_bits = lane_size (SIMD_CLONE_ARG_TYPE_VECTOR, ret_type);
> +      wds_elt_bits = nds_elt_bits;
>        vec_elts.safe_push (std::make_pair (ret_type, nds_elt_bits));
>      }
>    else
> -    nds_elt_bits = POINTER_SIZE;
> +    {
> +      nds_elt_bits = POINTER_SIZE;
> +      wds_elt_bits = 0;
> +    }
>  
>    int i;
>    tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
> @@ -27524,30 +27528,36 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>    for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0;
>         t && t != void_list_node; t = TREE_CHAIN (t), i++)
>      {
> -      tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
> +      tree type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);

Not sure renaming arg_type is worth it.  The original was probably
more descriptive.

>        if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM
> -	  && !supported_simd_type (arg_type))
> +	  && !supported_simd_type (type))
>  	{
>  	  if (!explicit_p)
>  	    ;
> -	  else if (COMPLEX_FLOAT_TYPE_P (ret_type))
> +	  else if (COMPLEX_FLOAT_TYPE_P (type))
>  	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
>  			"GCC does not currently support argument type %qT "
> -			"for simd", arg_type);
> +			"for simd", type);
>  	  else
>  	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
>  			"unsupported argument type %qT for simd",
> -			arg_type);
> +			type);
>  	  return 0;
>  	}
> -      unsigned lane_bits = lane_size (clonei->args[i].arg_type, arg_type);
> +      unsigned lane_bits = lane_size (clonei->args[i].arg_type, type);
>        if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
> -	vec_elts.safe_push (std::make_pair (arg_type, lane_bits));
> +	vec_elts.safe_push (std::make_pair (type, lane_bits));
>        if (nds_elt_bits > lane_bits)
>  	nds_elt_bits = lane_bits;
> +      else if (wds_elt_bits < lane_bits)
> +	wds_elt_bits = lane_bits;
>      }
>  
> -  clonei->vecsize_mangle = 'n';
> +  /* If we could not determine the WDS type from available parameters/return,
> +     then fallback to using uintptr_t.  */
> +  if (wds_elt_bits == 0)
> +    wds_elt_bits = POINTER_SIZE;
> +
>    clonei->mask_mode = VOIDmode;
>    poly_uint64 simdlen;
>    auto_vec<poly_uint64> simdlens (2);
> @@ -27558,6 +27568,11 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>        simdlen = exact_div (poly_uint64 (64), nds_elt_bits);
>        simdlens.safe_push (simdlen);
>        simdlens.safe_push (simdlen * 2);
> +      /* Only create a SVE simd clone if we aren't dealing with an unprototyped
> +	 function.  */
> +      if (DECL_ARGUMENTS (node->decl) != 0
> +	  || type_arg_types != 0)
> +	simdlens.safe_push (exact_div (poly_uint64 (128, 128), wds_elt_bits));

This check feels a bit indirect.  Does it work to use:

  if (prototype_p (TREE_TYPE (node->decl)))

instead?
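As an aside, the narrowest/widest data size (NDS/WDS) arithmetic in the hunks above can be modelled standalone. The sketch below is only an illustration of the ABI rule (a 128-bit granule, VLA simdlen = 128 / WDS); the struct and helper names are invented, not GCC's.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical standalone model of the NDS/WDS computation above:
// NDS (narrowest data size) starts at the return type's lane size (or
// pointer size for a void return), WDS (widest) at 0, and each argument's
// lane size tightens them.  The VLA simdlen is then 128 / WDS, matching
// the AArch64 vector function ABI's 128-bit granule.
struct SimdlenModel {
  unsigned nds, wds;
  explicit SimdlenModel(unsigned ret_lane_bits /* 0 for void */) {
    if (ret_lane_bits) { nds = wds = ret_lane_bits; }
    else { nds = 64 /* POINTER_SIZE on LP64 */; wds = 0; }
  }
  void add_arg(unsigned lane_bits) {
    nds = std::min(nds, lane_bits);
    wds = std::max(wds, lane_bits);
  }
  unsigned vla_simdlen() const {
    unsigned w = wds ? wds : 64;  // fall back to uintptr_t, as in the patch
    return 128 / w;               // elements per 128-bit granule
  }
};
```

For example, f01 above (long long return, int and short arguments) gives NDS = 16, WDS = 64, so the SVE clone gets a minimum of two elements per granule.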

>      }
>    else
>      simdlens.safe_push (clonei->simdlen);
> @@ -27578,19 +27593,20 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>    while (j < count && !simdlens.is_empty ())
>      {
>        bool remove_simdlen = false;
> -      for (auto elt : vec_elts)
> -	if (known_gt (simdlens[j] * elt.second, 128U))
> -	  {
> -	    /* Don't issue a warning for every simdclone when there is no
> -	       specific simdlen clause.  */
> -	    if (explicit_p && known_ne (clonei->simdlen, 0U))
> -	      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> -			  "GCC does not currently support simdlen %wd for "
> -			  "type %qT",
> -			  constant_lower_bound (simdlens[j]), elt.first);
> -	    remove_simdlen = true;
> -	    break;
> -	  }
> +      if (simdlens[j].is_constant ())
> +	for (auto elt : vec_elts)
> +	  if (known_gt (simdlens[j] * elt.second, 128U))
> +	    {
> +	      /* Don't issue a warning for every simdclone when there is no
> +		 specific simdlen clause.  */
> +	      if (explicit_p && known_ne (clonei->simdlen, 0U))
> +		warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> +			    "GCC does not currently support simdlen %wd for "
> +			    "type %qT",
> +			    constant_lower_bound (simdlens[j]), elt.first);
> +	      remove_simdlen = true;
> +	      break;
> +	    }
>        if (remove_simdlen)
>  	{
>  	  count--;
> @@ -27618,9 +27634,36 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>  
>    gcc_assert (num < count);
>    clonei->simdlen = simdlens[num];
> +  if (clonei->simdlen.is_constant ())
> +    clonei->vecsize_mangle = 'n';
> +  else
> +    {
> +      clonei->vecsize_mangle = 's';
> +      clonei->inbranch = 1;
> +    }
>    return count;
>  }
>  
> +static tree
> +simd_clone_adjust_sve_vector_type (tree type, bool is_mask, poly_uint64 simdlen)
> +{
> +    unsigned int num_zr = 0;

Nits: missing function comment.  The body is indented by too many columns.

> +    unsigned int num_pr = 0;
> +    type = TREE_TYPE (type);
> +    type = build_vector_type (type, simdlen);

Is simdlen ever different from the original TYPE_VECTOR_SUBPARTS?
I think a comment is needed if so.

> +    if (is_mask)
> +      {
> +	type = truth_type_for (type);
> +	num_pr = 1;
> +      }
> +    else
> +      num_zr = 1;
> +
> +    aarch64_sve::add_sve_type_attribute (type, num_zr, num_pr, NULL,
> +					 NULL);

Before adding the attribute, I think we should call:

  type = build_distinct_type_copy (type);

so that we don't change a type that is already in use, or associate
any new types with this one.  If simdlen is always equal to the original
TYPE_VECTOR_SUBPARTS then the build_distinct_type_copy would replace the
build_vector_type.

> +    return type;
> +}
> +
>  /* Implement TARGET_SIMD_CLONE_ADJUST.  */
>  
>  static void
> @@ -27632,6 +27675,69 @@ aarch64_simd_clone_adjust (struct cgraph_node *node)
>    tree t = TREE_TYPE (node->decl);
>    TYPE_ATTRIBUTES (t) = make_attribute ("aarch64_vector_pcs", "default",
>  					TYPE_ATTRIBUTES (t));
> +
> +  cl_target_option cur_target;
> +  poly_uint16 old_sve_vg;
> +  bool m_old_have_regs_of_mode[MAX_MACHINE_MODE];
> +
> +  if (node->simdclone->vecsize_mangle == 's')
> +    {
> +      tree target = build_string (strlen ("+sve"), "+sve");
> +      aarch64_option_valid_attribute_p (node->decl, NULL_TREE, target, 0);

It would be good to assert that this succeeds.  It's unfortunate that we
have a predicate with side-effects, but that's obviously not your fault. :)

> +      cl_target_option_save (&cur_target, &global_options, &global_options_set);
> +      tree new_target = DECL_FUNCTION_SPECIFIC_TARGET (node->decl);
> +      cl_target_option_restore (&global_options, &global_options_set,
> +				TREE_TARGET_OPTION (new_target));
> +      aarch64_override_options_internal (&global_options);

Does this preserve any existing target attributes too, to the extent
possible?  E.g. if the function has:

  #pragma GCC target "+sve2"

then I think we should honour that rather than dial down to "+sve".
Same if the code is compiled with -march=armv9-a+sve2: we should
compile the clone as SVE2 rather than SVE.

> +      memcpy (m_old_have_regs_of_mode, have_regs_of_mode,
> +	      sizeof (have_regs_of_mode));
> +      for (int i = 0; i < NUM_MACHINE_MODES; ++i)
> +	if (aarch64_sve_mode_p ((machine_mode) i))
> +	  have_regs_of_mode[i] = true;

Would it be possible to use push_cfun and pop_cfun instead, so that
we do a proper target switch?

(The SVE ACLE code uses a similar technique to the above, but that's
because it's run in a non-function context.)

> +      old_sve_vg = aarch64_sve_vg;
> +      if (!node->simdclone->simdlen.is_constant ())
> +	aarch64_sve_vg = poly_uint16 (2, 2);

I'm not sure we should change VG here.

The basis for that and for allowing SVE2 above is that the ODR requires
that all comdat instances of a function are compiled in the same way.
That applies to all functions, not just simd clones.  So IMO it's user
error if a clone is compiled multiple times with different target options,
or if it's compiled with target options that the runtime target doesn't in
fact support.

We're already implicitly assuming the same thing for Advanced SIMD,
since we'll use whatever post-Armv8-A features happen to be enabled.

> +    }
> +  cgraph_simd_clone *sc = node->simdclone;
> +
> +  for (unsigned i = 0; i < sc->nargs; ++i)
> +    {
> +      bool is_mask = false;
> +      tree type;
> +      switch (sc->args[i].arg_type)
> +	{
> +	case SIMD_CLONE_ARG_TYPE_MASK:
> +	  is_mask = true;
> +	  gcc_fallthrough ();
> +	case SIMD_CLONE_ARG_TYPE_VECTOR:
> +	case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP:
> +	case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_VARIABLE_STEP:
> +	  type = sc->args[i].vector_type;
> +	  gcc_assert (VECTOR_TYPE_P (type));
> +	  if (node->simdclone->vecsize_mangle == 's')
> +	    type = simd_clone_adjust_sve_vector_type (type, is_mask,
> +						      sc->simdlen);
> +	  else if (is_mask)
> +	    type = truth_type_for (type);
> +	  sc->args[i].vector_type = type;
> +	default:
> +	    continue;
> +	}
> +    }
> +  if (node->simdclone->vecsize_mangle == 's')
> +    {
> +      tree ret_type = TREE_TYPE (t);
> +      if (VECTOR_TYPE_P (ret_type))
> +	TREE_TYPE (t)
> +	  = simd_clone_adjust_sve_vector_type (ret_type, false,
> +					       node->simdclone->simdlen);
> +      /* Restore current options and AARCH64_SVE_VG.  */
> +      cl_target_option_restore (&global_options, &global_options_set, &cur_target);
> +      aarch64_override_options_internal (&global_options);
> +      memcpy (have_regs_of_mode, m_old_have_regs_of_mode,
> +	      sizeof (have_regs_of_mode));
> +      aarch64_sve_vg = old_sve_vg;
> +    }
>  }
>  
>  /* Implement TARGET_SIMD_CLONE_USABLE.  */
> @@ -27645,6 +27751,10 @@ aarch64_simd_clone_usable (struct cgraph_node *node)
>        if (!TARGET_SIMD)
>  	return -1;
>        return 0;
> +    case 's':
> +      if (!TARGET_SVE)
> +	return -1;
> +      return 0;
>      default:
>        gcc_unreachable ();
>      }
> diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
> index fb80888190c88e29895ecfbbe1b17d390c9a9dfe..150af24c5bc52b6737f3ca46ba73d1c890b143e9 100644
> --- a/gcc/omp-simd-clone.cc
> +++ b/gcc/omp-simd-clone.cc
> @@ -541,9 +541,12 @@ simd_clone_mangle (struct cgraph_node *node,
>    pp_string (&pp, "_ZGV");
>    pp_character (&pp, vecsize_mangle);
>    pp_character (&pp, mask);
> -  /* For now, simdlen is always constant, while variable simdlen pp 'n'.  */
> -  unsigned int len = simdlen.to_constant ();
> -  pp_decimal_int (&pp, (len));
> +
> +  unsigned long long len = 0;
> +  if (simdlen.is_constant (&len))
> +    pp_decimal_int (&pp, (int) (len));
> +  else
> +    pp_character (&pp, 'x');

FWIW, I agree this is the right approach for now.  We can put the 'x' behind
a target hook later if another VLA target uses a different convention.

Thanks,
Richard
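To make the 'x' convention concrete, here is a hedged standalone sketch of how names like the `_ZGVsMxvv_f00` expected by the tests later in this thread are assembled. The token rules are simplified and the helper is purely illustrative, not GCC's `simd_clone_mangle`.

```cpp
#include <cassert>
#include <string>

// Illustrative (non-GCC) model of the vector-function name mangling:
// _ZGV + ISA ('n' = Advanced SIMD, 's' = SVE) + mask ('M'/'N')
// + simdlen (decimal if constant, 'x' if vector-length agnostic)
// + one token per parameter (e.g. 'v' vector, 'u' uniform, 'l<step>'
//   linear) + '_' + function name.
static std::string mangle(char isa, bool masked, int simdlen /* -1 = VLA */,
                          const std::string &param_tokens,
                          const std::string &fn_name)
{
  std::string s = "_ZGV";
  s += isa;
  s += masked ? 'M' : 'N';
  if (simdlen < 0)
    s += 'x';                        // the VLA case added by this patch
  else
    s += std::to_string(simdlen);
  s += param_tokens;                 // e.g. "vv", "l2v", "uv"
  s += '_';
  s += fn_name;
  return s;
}
```

Under this model, `mangle('s', true, -1, "vv", "f00")` yields `_ZGVsMxvv_f00` and `mangle('n', false, 8, "vv", "f00")` yields `_ZGVnN8vv_f00`, matching the scan patterns in declare-simd-1.c and declare-simd-2.c.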

>    for (n = 0; n < clone_info->nargs; ++n)
>      {
> @@ -1541,8 +1544,8 @@ simd_clone_adjust (struct cgraph_node *node)
>  	 below).  */
>        loop = alloc_loop ();
>        cfun->has_force_vectorize_loops = true;
> -      /* For now, simlen is always constant.  */
> -      loop->safelen = node->simdclone->simdlen.to_constant ();
> +      /* We can assert that safelen is the 'minimum' simdlen.  */
> +      loop->safelen = constant_lower_bound (node->simdclone->simdlen);
>        loop->force_vectorize = true;
>        loop->header = body_bb;
>      }
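A quick standalone model of why `constant_lower_bound` is the right safelen in the hunk above: a VLA simdlen is a degree-1 polynomial N0 + N1*x in the runtime vector-length parameter x >= 0, and its minimum over all runtime lengths is the constant coefficient. The toy struct below only illustrates that idea; it is not GCC's `poly_int` API.

```cpp
#include <cassert>

// Toy stand-in for poly_uint64: value = coeff0 + coeff1 * x with x >= 0,
// where x models how far the runtime vector length exceeds the minimum
// (in 128-bit granules).  The smallest value the polynomial can take is
// at x = 0, i.e. coeff0 -- which is what safelen needs as a guaranteed
// lower bound.
struct poly {
  unsigned long coeff0, coeff1;
  bool is_constant() const { return coeff1 == 0; }
  unsigned long constant_lower_bound() const { return coeff0; }
  unsigned long eval(unsigned long x) const { return coeff0 + coeff1 * x; }
};
```

So a simdlen of (2, 2) - two elements per granule - always permits a safelen of 2, whatever the hardware vector length turns out to be.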
> diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
> index e3668893afe33a58c029cddd433d9bf43cce2bfa..12f8b3b839b7f3ff9e4f99768e59c0e1c5339062 100644
> --- a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
> +++ b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
> @@ -21,7 +21,7 @@ test1 (int x)
>       shall call f01 with score 8.  */
>    /* { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } } */
>    /* { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" { target { !aarch64*-*-* } } } } */
> -  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 10 "optimized" { target { aarch64*-*-* } } } } */
> +  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 12 "optimized" { target { aarch64*-*-* } } } } */
>    /* { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" { target { !aarch64*-*-* } } } } */
>    /* { dg-final { scan-tree-dump-times "f01 \\\(x" 0 "optimized" { target { aarch64*-*-* } } } } */
>    int a = f04 (x);
> diff --git a/gcc/testsuite/gcc.target/aarch64/declare-simd-1.c b/gcc/testsuite/gcc.target/aarch64/declare-simd-1.c
> index aab8c17f0c442a7cda4dce23cc18162a0b7f676e..add6e7c93019834fbd5bed5ead18b52d4cdd0a37 100644
> --- a/gcc/testsuite/gcc.target/aarch64/declare-simd-1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/declare-simd-1.c
> @@ -4,28 +4,39 @@
>  extern "C" {
>  #endif
>  #pragma omp declare simd
> -int __attribute__ ((const)) f00 (int a , char b) /* { dg-warning {GCC does not currently support a simdclone with simdlens 8 and 16 for these types.} } */
> +int __attribute__ ((const)) f00 (int a , char b)
>  {
>    return a + b;
>  }
>  
> +/* { dg-final { scan-assembler-not {_ZGVn[a-z0-9]+_f00} } } */
> +/* { dg-final { scan-assembler {_ZGVsMxvv_f00} } } */
> +
>  #pragma omp declare simd
> -long long __attribute__ ((const)) f01 (int a , short b) /* { dg-warning {GCC does not currently support a simdclone with simdlens 4 and 8 for these types.} } */
> +long long __attribute__ ((const)) f01 (int a , short b)
>  {
>    return a + b;
>  }
> +/* { dg-final { scan-assembler-not {_ZGVn[a-z0-9]+_f01} } } */
> +/* { dg-final { scan-assembler {_ZGVsMxvv_f01} } } */
>  
>  #pragma omp declare simd linear(b)
> -long long __attribute__ ((const)) f02 (short *b, int a) /* { dg-warning {GCC does not currently support a simdclone with simdlens 4 and 8 for these types.} } */
> +long long __attribute__ ((const)) f02 (short *b, int a)
>  {
>    return a + *b;
>  }
>  
> +/* { dg-final { scan-assembler-not {_ZGVn[a-z0-9]+_f02} } } */
> +/* { dg-final { scan-assembler {_ZGVsMxl2v_f02} } } */
> +
>  #pragma omp declare simd uniform(b)
> -void f03 (char b, int a) /* { dg-warning {GCC does not currently support a simdclone with simdlens 8 and 16 for these types.} } */
> +void f03 (char b, int a)
>  {
>  }
>  
> +/* { dg-final { scan-assembler-not {_ZGVn[a-z0-9]+_f03} } } */
> +/* { dg-final { scan-assembler {_ZGVsMxuv_f03} } } */
> +
>  #pragma omp declare simd simdlen(4)
>  double f04 (void) /* { dg-warning {GCC does not currently support simdlen 4 for type 'double'} } */
>  {
> diff --git a/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c b/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c
> index abb128ffc9cd2c1353b99eb38aae72377746e6d6..604869a30456e4db988bba86e059a27f19dda589 100644
> --- a/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c
> @@ -10,6 +10,7 @@ short __attribute__ ((const)) f00 (short a , char b)
>  }
>  /* { dg-final { scan-assembler {_ZGVnN8vv_f00:} } } */
>  /* { dg-final { scan-assembler {_ZGVnM8vv_f00:} } } */
> +/* { dg-final { scan-assembler {_ZGVsMxvv_f00:} } } */
>  
>  #pragma omp declare simd notinbranch
>  short __attribute__ ((const)) f01 (int a , short b)
> @@ -17,6 +18,7 @@ short __attribute__ ((const)) f01 (int a , short b)
>    return a + b;
>  }
>  /* { dg-final { scan-assembler {_ZGVnN4vv_f01:} } } */
> +/* { dg-final { scan-assembler {_ZGVsMxvv_f01:} } } */
>  
>  #pragma omp declare simd linear(b) inbranch
>  int __attribute__ ((const)) f02 (int a, short *b)
> @@ -24,6 +26,7 @@ int __attribute__ ((const)) f02 (int a, short *b)
>    return a + *b;
>  }
>  /* { dg-final { scan-assembler {_ZGVnM4vl2_f02:} } } */
> +/* { dg-final { scan-assembler {_ZGVsMxvl2_f02:} } } */
>  
>  #pragma omp declare simd uniform(a) notinbranch
>  void f03 (char b, int a)
> @@ -31,6 +34,7 @@ void f03 (char b, int a)
>  }
>  /* { dg-final { scan-assembler {_ZGVnN8vu_f03:} } } */
>  /* { dg-final { scan-assembler {_ZGVnN16vu_f03:} } } */
> +/* { dg-final { scan-assembler {_ZGVsMxvu_f03:} } } */
>  
>  #pragma omp declare simd simdlen(2)
>  float f04 (double a)
> @@ -39,6 +43,7 @@ float f04 (double a)
>  }
>  /* { dg-final { scan-assembler {_ZGVnN2v_f04:} } } */
>  /* { dg-final { scan-assembler {_ZGVnM2v_f04:} } } */
> +/* { dg-final { scan-assembler-not {_ZGVs[0-9a-z]*_f04:} } } */
>  
>  #pragma omp declare simd uniform(a) linear (b)
>  void f05 (short a, short *b, short c)
> @@ -50,6 +55,7 @@ void f05 (short a, short *b, short c)
>  /* { dg-final { scan-assembler {_ZGVnN4ul2v_f05:} } } */
>  /* { dg-final { scan-assembler {_ZGVnM8ul2v_f05:} } } */
>  /* { dg-final { scan-assembler {_ZGVnM8ul2v_f05:} } } */
> +/* { dg-final { scan-assembler {_ZGVsMxul2v_f05:} } } */
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
> index 6319df0558f37b95f1b2eb17374bdb4ecbc33295..38677b8f7a76b960ce9363b1c0cabf6fc5086ab6 100644
> --- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
> +++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
> @@ -41,7 +41,7 @@ contains
>      ! shall call f01 with score 8.
>      ! { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } }
>      ! { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" { target { !aarch64*-*-* } } } }
> -    ! { dg-final { scan-tree-dump-times "f03 \\\(x" 6 "optimized" { target { aarch64*-*-* } } } }
> +    ! { dg-final { scan-tree-dump-times "f03 \\\(x" 8 "optimized" { target { aarch64*-*-* } } } }
>      ! { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" { target { !aarch64*-*-* } } } }
>      ! { dg-final { scan-tree-dump-times "f01 \\\(x" 0 "optimized" { target { aarch64*-*-* } } } }
>      a = f04 (x)
> diff --git a/libgomp/testsuite/libgomp.c/declare-variant-1.c b/libgomp/testsuite/libgomp.c/declare-variant-1.c
> index 6129f23a0f80585246957022d63608dc3a68f1ff..591867bdc97f28bcb64fbe8bb9db39ccee5fe643 100644
> --- a/libgomp/testsuite/libgomp.c/declare-variant-1.c
> +++ b/libgomp/testsuite/libgomp.c/declare-variant-1.c
> @@ -48,7 +48,7 @@ test1 (int x)
>    /* { dg-final { scan-ltrans-tree-dump-not "f04 \\\(x" "optimized" } } */
>    /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 14 "optimized" { target { !aarch64*-*-* } } } } } */
>    /* { dg-final { scan-ltrans-tree-dump-times "f01 \\\(x" 4 "optimized" { target { !aarch64*-*-* } } } } } */
> -  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 10 "optimized" { target { aarch64*-*-* } } } } } */
> +  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 12 "optimized" { target { aarch64*-*-* } } } } } */
>    /* { dg-final { scan-ltrans-tree-dump-not "f01 \\\(x" "optimized" { target { aarch64*-*-* } } } } } */
>    int a = f04 (x);
>    int b = f04 (x);


* Re: [PATCH 8/8] aarch64: Add SVE support for simd clones [PR 96342]
  2023-11-29 17:01     ` Richard Sandiford
@ 2023-12-01 16:39       ` Andre Vieira (lists)
  0 siblings, 0 replies; 40+ messages in thread
From: Andre Vieira (lists) @ 2023-12-01 16:39 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford



On 29/11/2023 17:01, Richard Sandiford wrote:
> "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
>> Rebased, no major changes, still needs review.
>>
>> On 30/08/2023 10:19, Andre Vieira (lists) via Gcc-patches wrote:
>>> This patch finalizes adding support for the generation of SVE simd
>>> clones when no simdlen is provided, following the ABI rules where the
>>> widest data type determines the minimum amount of elements in a length
>>> agnostic vector.
>>>
>>> gcc/ChangeLog:
>>>
>>>           * config/aarch64/aarch64-protos.h (add_sve_type_attribute):
>>> Declare.
>>>       * config/aarch64/aarch64-sve-builtins.cc (add_sve_type_attribute):
>>> Make
>>>       visibility global.
>>>       * config/aarch64/aarch64.cc (aarch64_fntype_abi): Ensure SVE ABI is
>>>       chosen over SIMD ABI if a SVE type is used in return or arguments.
>>>       (aarch64_simd_clone_compute_vecsize_and_simdlen): Create VLA simd
>>> clone
>>>       when no simdlen is provided, according to ABI rules.
>>>       (aarch64_simd_clone_adjust): Add '+sve' attribute to SVE simd clones.
>>>       (aarch64_simd_clone_adjust_ret_or_param): New.
>>>       (TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): Define.
>>>       * omp-simd-clone.cc (simd_clone_mangle): Print 'x' for VLA simdlen.
>>>       (simd_clone_adjust): Adapt safelen check to be compatible with VLA
>>>       simdlen.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>       * c-c++-common/gomp/declare-variant-14.c: Adapt aarch64 scan.
>>>       * gfortran.dg/gomp/declare-variant-14.f90: Likewise.
>>>       * gcc.target/aarch64/declare-simd-1.c: Remove warning checks where no
>>>       longer necessary.
>>>       * gcc.target/aarch64/declare-simd-2.c: Add SVE clone scan.
>>
>> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
>> index 60a55f4bc1956786ea687fc7cad7ec9e4a84e1f0..769d637f63724a7f0044f48f3dd683e0fb46049c 100644
>> --- a/gcc/config/aarch64/aarch64-protos.h
>> +++ b/gcc/config/aarch64/aarch64-protos.h
>> @@ -1005,6 +1005,8 @@ namespace aarch64_sve {
>>   #ifdef GCC_TARGET_H
>>     bool verify_type_context (location_t, type_context_kind, const_tree, bool);
>>   #endif
>> + void add_sve_type_attribute (tree, unsigned int, unsigned int,
>> +			      const char *, const char *);
>>   }
>>   
>>   extern void aarch64_split_combinev16qi (rtx operands[3]);
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> index 161a14edde7c9fb1b13b146cf50463e2d78db264..6f99c438d10daa91b7e3b623c995489f1a8a0f4c 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> @@ -569,14 +569,16 @@ static bool reported_missing_registers_p;
>>   /* Record that TYPE is an ABI-defined SVE type that contains NUM_ZR SVE vectors
>>      and NUM_PR SVE predicates.  MANGLED_NAME, if nonnull, is the ABI-defined
>>      mangling of the type.  ACLE_NAME is the <arm_sve.h> name of the type.  */
>> -static void
>> +void
>>   add_sve_type_attribute (tree type, unsigned int num_zr, unsigned int num_pr,
>>   			const char *mangled_name, const char *acle_name)
>>   {
>>     tree mangled_name_tree
>>       = (mangled_name ? get_identifier (mangled_name) : NULL_TREE);
>> +  tree acle_name_tree
>> +    = (acle_name ? get_identifier (acle_name) : NULL_TREE);
>>   
>> -  tree value = tree_cons (NULL_TREE, get_identifier (acle_name), NULL_TREE);
>> +  tree value = tree_cons (NULL_TREE, acle_name_tree, NULL_TREE);
>>     value = tree_cons (NULL_TREE, mangled_name_tree, value);
>>     value = tree_cons (NULL_TREE, size_int (num_pr), value);
>>     value = tree_cons (NULL_TREE, size_int (num_zr), value);
>> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> index 37507f091c2a6154fa944c3a9fad6a655ab5d5a1..cb0947b18c6a611d55579b5b08d93f6a4a9c3b2c 100644
>> --- a/gcc/config/aarch64/aarch64.cc
>> +++ b/gcc/config/aarch64/aarch64.cc
>> @@ -4080,13 +4080,13 @@ aarch64_takes_arguments_in_sve_regs_p (const_tree fntype)
>>   static const predefined_function_abi &
>>   aarch64_fntype_abi (const_tree fntype)
>>   {
>> -  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)))
>> -    return aarch64_simd_abi ();
>> -
>>     if (aarch64_returns_value_in_sve_regs_p (fntype)
>>         || aarch64_takes_arguments_in_sve_regs_p (fntype))
>>       return aarch64_sve_abi ();
>>   
>> +  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)))
>> +    return aarch64_simd_abi ();
>> +
>>     return default_function_abi;
>>   }
>>   
> 
> I think we discussed this off-list later, but the change above shouldn't
> be necessary.  aarch64_vector_pcs must not be attached to SVE PCS functions,
> so the two cases should be mutually exclusive.

Yeah, I had made the changes locally but hadn't updated the patch yet.
> 
>> @@ -27467,7 +27467,7 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>>   					int num, bool explicit_p)
>>   {
>>     tree t, ret_type;
>> -  unsigned int nds_elt_bits;
>> +  unsigned int nds_elt_bits, wds_elt_bits;
>>     int count;
>>     unsigned HOST_WIDE_INT const_simdlen;
>>   
>> @@ -27513,10 +27513,14 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>>     if (TREE_CODE (ret_type) != VOID_TYPE)
>>       {
>>         nds_elt_bits = lane_size (SIMD_CLONE_ARG_TYPE_VECTOR, ret_type);
>> +      wds_elt_bits = nds_elt_bits;
>>         vec_elts.safe_push (std::make_pair (ret_type, nds_elt_bits));
>>       }
>>     else
>> -    nds_elt_bits = POINTER_SIZE;
>> +    {
>> +      nds_elt_bits = POINTER_SIZE;
>> +      wds_elt_bits = 0;
>> +    }
>>   
>>     int i;
>>     tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
>> @@ -27524,30 +27528,36 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>>     for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0;
>>          t && t != void_list_node; t = TREE_CHAIN (t), i++)
>>       {
>> -      tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
>> +      tree type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
> 
> Not sure renaming arg_type is worth it.  The original was probably
> more descriptive.
> 
>>         if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM
>> -	  && !supported_simd_type (arg_type))
>> +	  && !supported_simd_type (type))
>>   	{
>>   	  if (!explicit_p)
>>   	    ;
>> -	  else if (COMPLEX_FLOAT_TYPE_P (ret_type))
>> +	  else if (COMPLEX_FLOAT_TYPE_P (type))
>>   	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
>>   			"GCC does not currently support argument type %qT "
>> -			"for simd", arg_type);
>> +			"for simd", type);
>>   	  else
>>   	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
>>   			"unsupported argument type %qT for simd",
>> -			arg_type);
>> +			type);
>>   	  return 0;
>>   	}
>> -      unsigned lane_bits = lane_size (clonei->args[i].arg_type, arg_type);
>> +      unsigned lane_bits = lane_size (clonei->args[i].arg_type, type);
>>         if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
>> -	vec_elts.safe_push (std::make_pair (arg_type, lane_bits));
>> +	vec_elts.safe_push (std::make_pair (type, lane_bits));
>>         if (nds_elt_bits > lane_bits)
>>   	nds_elt_bits = lane_bits;
>> +      else if (wds_elt_bits < lane_bits)
>> +	wds_elt_bits = lane_bits;
>>       }
>>   
>> -  clonei->vecsize_mangle = 'n';
>> +  /* If we could not determine the WDS type from available parameters/return,
>> +     then fallback to using uintptr_t.  */
>> +  if (wds_elt_bits == 0)
>> +    wds_elt_bits = POINTER_SIZE;
>> +
>>     clonei->mask_mode = VOIDmode;
>>     poly_uint64 simdlen;
>>     auto_vec<poly_uint64> simdlens (2);
>> @@ -27558,6 +27568,11 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>>         simdlen = exact_div (poly_uint64 (64), nds_elt_bits);
>>         simdlens.safe_push (simdlen);
>>         simdlens.safe_push (simdlen * 2);
>> +      /* Only create a SVE simd clone if we aren't dealing with an unprototyped
>> +	 function.  */
>> +      if (DECL_ARGUMENTS (node->decl) != 0
>> +	  || type_arg_types != 0)
>> +	simdlens.safe_push (exact_div (poly_uint64 (128, 128), wds_elt_bits));
> 
> This check feels a bit indirect.  Does it work to use:
> 
>    if (prototype_p (TREE_TYPE (node->decl)))
> 
> instead?
> 
>>       }
>>     else
>>       simdlens.safe_push (clonei->simdlen);
>> @@ -27578,19 +27593,20 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>>     while (j < count && !simdlens.is_empty ())
>>       {
>>         bool remove_simdlen = false;
>> -      for (auto elt : vec_elts)
>> -	if (known_gt (simdlens[j] * elt.second, 128U))
>> -	  {
>> -	    /* Don't issue a warning for every simdclone when there is no
>> -	       specific simdlen clause.  */
>> -	    if (explicit_p && known_ne (clonei->simdlen, 0U))
>> -	      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
>> -			  "GCC does not currently support simdlen %wd for "
>> -			  "type %qT",
>> -			  constant_lower_bound (simdlens[j]), elt.first);
>> -	    remove_simdlen = true;
>> -	    break;
>> -	  }
>> +      if (simdlens[j].is_constant ())
>> +	for (auto elt : vec_elts)
>> +	  if (known_gt (simdlens[j] * elt.second, 128U))
>> +	    {
>> +	      /* Don't issue a warning for every simdclone when there is no
>> +		 specific simdlen clause.  */
>> +	      if (explicit_p && known_ne (clonei->simdlen, 0U))
>> +		warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
>> +			    "GCC does not currently support simdlen %wd for "
>> +			    "type %qT",
>> +			    constant_lower_bound (simdlens[j]), elt.first);
>> +	      remove_simdlen = true;
>> +	      break;
>> +	    }
>>         if (remove_simdlen)
>>   	{
>>   	  count--;
>> @@ -27618,9 +27634,36 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>>   
>>     gcc_assert (num < count);
>>     clonei->simdlen = simdlens[num];
>> +  if (clonei->simdlen.is_constant ())
>> +    clonei->vecsize_mangle = 'n';
>> +  else
>> +    {
>> +      clonei->vecsize_mangle = 's';
>> +      clonei->inbranch = 1;
>> +    }
>>     return count;
>>   }
>>   
>> +static tree
>> +simd_clone_adjust_sve_vector_type (tree type, bool is_mask, poly_uint64 simdlen)
>> +{
>> +    unsigned int num_zr = 0;
> 
> Nits: missing function comment.  The body is indented by too many columns.
> 
>> +    unsigned int num_pr = 0;
>> +    type = TREE_TYPE (type);
>> +    type = build_vector_type (type, simdlen);
> 
> Is simdlen ever different from the original TYPE_VECTOR_SUBPARTS?
> I think a comment is needed if so.

Not right now, though I'm not sure why you're asking. The reason it can't
currently differ is that we don't support multi-{register,argument} vector
mappings, so we reject any simdlen that wouldn't fit in a single register;
that holds for both Advanced SIMD and SVE. If we wanted to support such
mappings, simdlen could then be larger than the type's TYPE_VECTOR_SUBPARTS.


>> +
>>   /* Implement TARGET_SIMD_CLONE_ADJUST.  */
>>   
>>   static void
>> @@ -27632,6 +27675,69 @@ aarch64_simd_clone_adjust (struct cgraph_node *node)
>>     tree t = TREE_TYPE (node->decl);
>>     TYPE_ATTRIBUTES (t) = make_attribute ("aarch64_vector_pcs", "default",
>>   					TYPE_ATTRIBUTES (t));
>> +
>> +  cl_target_option cur_target;
>> +  poly_uint16 old_sve_vg;
>> +  bool m_old_have_regs_of_mode[MAX_MACHINE_MODE];
>> +
>> +  if (node->simdclone->vecsize_mangle == 's')
>> +    {
>> +      tree target = build_string (strlen ("+sve"), "+sve");
>> +      aarch64_option_valid_attribute_p (node->decl, NULL_TREE, target, 0);
> 
> It would be good to assert that this succeeds.  It's unfortunate that we
> have a predicate with side-effects, but that's obviously not your fault. :)
Scared to think of how it wouldn't...
> 
>> +      cl_target_option_save (&cur_target, &global_options, &global_options_set);
>> +      tree new_target = DECL_FUNCTION_SPECIFIC_TARGET (node->decl);
>> +      cl_target_option_restore (&global_options, &global_options_set,
>> +				TREE_TARGET_OPTION (new_target));
>> +      aarch64_override_options_internal (&global_options);
> 
> Does this preserve any existing target attributes too, to the extent
> possible?  E.g. if the function has:
> 
>    #pragma GCC target "+sve2"
> 
> then I think we should honour that rather than dial down to "+sve".
> Same if the code is compiled with -march=armv9-a+sve2: we should
> compile the clone as SVE2 rather than SVE.

Yes, it adds to it, so for SVE2 this actually has no effect, because 
SVE is already enabled.
> 
>> +      memcpy (m_old_have_regs_of_mode, have_regs_of_mode,
>> +	      sizeof (have_regs_of_mode));
>> +      for (int i = 0; i < NUM_MACHINE_MODES; ++i)
>> +	if (aarch64_sve_mode_p ((machine_mode) i))
>> +	  have_regs_of_mode[i] = true;
> 
> Would it be possible to use push_cfun and pop_cfun instead, so that
> we do a proper target switch?

Where would I get the new cfun from? I'll go have a bit of a look around 
and see if I can make sense of what this would do.
> 
> (The SVE ACLE code uses a similar technique to the above, but that's
> because it's run in a non-function context.)
> 
>> +      old_sve_vg = aarch64_sve_vg;
>> +      if (!node->simdclone->simdlen.is_constant ())
>> +	aarch64_sve_vg = poly_uint16 (2, 2);
> 
> I'm not sure we should change VG here.
Agree. I forgot about the decision to accept that -msve-vector-bits 
would influence the VG of simdclones. I'll also make the corresponding 
changes to the code that computes simdlen.

> 
> The basis for that and for allowing SVE2 above is that the ODR requires
> that all comdat instances of a function are compiled in the same way.
> That applies to all functions, not just simd clones.  So IMO it's user
> error if a clone is compiled multiple times with different target options,
> or if it's compiled with target options that the runtime target doesn't in
> fact support.
> 
> We're already implicitly assuming the same thing for Advanced SIMD,
> since we'll use whatever post-Armv8-A features happen to be enabled.
> 
Agree.



* Re: [PATCH6/8] omp: Reorder call for TARGET_SIMD_CLONE_ADJUST (was Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM)
  2023-10-31  7:59             ` Richard Biener
@ 2023-12-08 10:35               ` Jakub Jelinek
  0 siblings, 0 replies; 40+ messages in thread
From: Jakub Jelinek @ 2023-12-08 10:35 UTC (permalink / raw)
  To: Richard Biener; +Cc: Andre Vieira (lists), gcc-patches, Richard Sandiford

On Tue, Oct 31, 2023 at 07:59:25AM +0000, Richard Biener wrote:
> On Wed, 18 Oct 2023, Andre Vieira (lists) wrote:
> 
> > This patch moves the call to TARGET_SIMD_CLONE_ADJUST until after the
> > arguments and return types have been transformed into vector types.  It also
> > constructs the adjustments and retval modifications after this call, allowing
> > targets to alter the types of the arguments and return of the clone prior to
> > the modifications to the function definition.
> > 
> > Is this OK?
> 
> OK (I was hoping for Jakub to have a look).

Sorry for the delay, no objections from me there.

	Jakub



end of thread

Thread overview: 40+ messages
2023-08-30  8:49 aarch64, vect, omp: Add SVE support for simd clones [PR 96342] Andre Vieira (lists)
2023-08-30  9:06 ` [PATCH 1/8] parloops: Copy target and optimizations when creating a function clone Andre Vieira (lists)
2023-08-30 12:31   ` Richard Biener
2023-10-18 14:40     ` Andre Vieira (lists)
2023-08-30  9:08 ` [Patch 2/8] parloops: Allow poly nit and bound Andre Vieira (lists)
2023-08-30 12:32   ` Richard Biener
2023-10-18 14:40     ` Andre Vieira (lists)
2023-08-30  9:10 ` [Patch 3/8] vect: Fix vect_get_smallest_scalar_type for simd clones Andre Vieira (lists)
2023-08-30 12:54   ` Richard Biener
2023-10-18 14:40     ` Andre Vieira (lists)
2023-10-19 12:07       ` Richard Biener
2023-08-30  9:11 ` [PATCH 4/8] vect: don't allow fully masked loops with non-masked simd clones [PR 110485] Andre Vieira (lists)
2023-08-30 12:54   ` Richard Biener
2023-10-18 14:40     ` Andre Vieira (lists)
2023-10-19 12:06       ` Richard Biener
2023-08-30  9:13 ` [PATCH 5/8] vect: Use inbranch simdclones in masked loops Andre Vieira (lists)
2023-10-18 14:41   ` Andre Vieira (lists)
2023-10-19 12:17     ` Richard Biener
2023-08-30  9:14 ` [PATCH 6/8] vect: Add vector_mode paramater to simd_clone_usable Andre Vieira (lists)
2023-08-30  9:17   ` Andre Vieira (lists)
2023-08-30 13:01   ` Richard Biener
2023-08-30 15:02     ` Andre Vieira (lists)
2023-08-31  6:39       ` Richard Biener
2023-09-28 15:57         ` Andre Vieira (lists)
2023-08-30  9:17 ` [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM Andre Vieira (lists)
2023-08-30 13:04   ` Richard Biener
2023-10-04 10:32     ` Andre Vieira (lists)
2023-10-04 10:41       ` Richard Biener
2023-10-04 12:40         ` Andre Vieira (lists)
2023-10-18 14:41           ` [PATCH6/8] omp: Reorder call for TARGET_SIMD_CLONE_ADJUST (was Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM) Andre Vieira (lists)
2023-10-30 18:34             ` Andre Vieira (lists)
2023-10-31  7:59             ` Richard Biener
2023-12-08 10:35               ` Jakub Jelinek
2023-08-30  9:19 ` [PATCH 8/8] aarch64: Add SVE support for simd clones [PR 96342] Andre Vieira (lists)
2023-10-18 14:41   ` Andre Vieira (lists)
2023-11-29 17:01     ` Richard Sandiford
2023-12-01 16:39       ` Andre Vieira (lists)
2023-10-18 14:40 ` aarch64, vect, omp: " Andre Vieira (lists)
2023-10-18 14:41 ` [PATCH 0/8] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS Andre Vieira (lists)
2023-10-19  7:10   ` Richard Biener
