[RFC 0/X] Implement GCC support for AArch64 libmvec

public inbox for gcc-patches@gcc.gnu.org

* [RFC 0/X] Implement GCC support for AArch64 libmvec
@ 2023-03-08 16:17 Andre Vieira (lists)
  2023-03-08 16:20 ` [PATCH 1/X] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS Andre Vieira (lists)
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Andre Vieira (lists) @ 2023-03-08 16:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: jakub, Richard Sandiford, Richard Biener

Hi all,

This is a series of patches/RFCs that adds support in GCC for targeting
the AArch64 libmvec functions that are being added to glibc.
We have chosen '#pragma omp declare variant ...' with a simd construct
as the way for glibc to tell GCC which functions are available.

For example, if we would like to supply a vector version of the scalar 
'cosf' we would have an include file with something like:
typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
typedef __attribute__((__neon_vector_type__(2))) float __f32x2_t;
typedef __SVFloat32_t __sv_f32_t;
typedef __SVBool_t __sv_bool_t;
__f32x4_t _ZGVnN4v_cosf (__f32x4_t);
__f32x2_t _ZGVnN2v_cosf (__f32x2_t);
__sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
#pragma omp declare variant(_ZGVnN4v_cosf) \
     match(construct = {simd(notinbranch, simdlen(4))}, device = {isa("simd")})
#pragma omp declare variant(_ZGVnN2v_cosf) \
     match(construct = {simd(notinbranch, simdlen(2))}, device = {isa("simd")})
#pragma omp declare variant(_ZGVsMxv_cosf) \
     match(construct = {simd(inbranch)}, device = {isa("sve")})
extern float cosf (float);
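
As an illustration only (this is not part of the series), the kind of
caller such declarations are aimed at could look like the sketch below.
The name cos_array is hypothetical. Whether the calls end up as
_ZGVnN4v_cosf, _ZGVnN2v_cosf or _ZGVsMxv_cosf depends on the compiler,
the flags used and on this series being applied; without any of that the
loop simply calls the scalar cosf.

```cpp
#include <cassert>
#include <math.h>
#include <stddef.h>

// Hypothetical caller: applies cosf to every element of IN and stores the
// result in OUT.  A loop of this shape is what an auto-vectorizer could
// rewrite to call a vector variant of cosf, if one has been declared.
void
cos_array (float *out, const float *in, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    out[i] = cosf (in[i]);
}
```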

The BETA ABI can be found in the vfabia64 subdir of 
https://github.com/ARM-software/abi-aa/
This currently disagrees with how this patch series implements 'omp 
declare simd' for SVE and I also do not see a need for the 'omp declare 
variant' scalable extension constructs. I will make changes to the ABI 
once we've finalized the co-design of the ABI and this implementation.

The patch series has three main steps:
1) Add SVE support for 'omp declare simd', see PR 96342
2) Enable GCC to use omp declare variants with simd constructs as simd 
clones during auto-vectorization.
3) Add SLP support for vectorizable_simd_clone_call (This sounded like a 
nice thing to add as we want to move away from non-slp vectorization).

Below is the list of current Patches/RFCs; the difference between the
two is how confident I am in the proposed changes. For the RFCs I am
hoping to get early comments on the approach, rather than in-depth
code reviews.

I appreciate we are still in Stage 4, so I can completely understand if 
you don't have time to review this now, but I thought it can't hurt to 
post these early.

Andre Vieira:
[PATCH] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS
[PATCH] parloops: Copy target and optimizations when creating a function 
clone
[PATCH] parloops: Allow poly nit and bound
[RFC] omp, aarch64: Add SVE support for 'omp declare simd' [PR 96342]
[RFC] omp: Create simd clones from 'omp declare variant's
[RFC] omp: Allow creation of simd clones from omp declare variant with 
-fopenmp-simd flag

Work in progress:
[RFC] vect: Enable SLP codegen for vectorizable_simd_clone_call


* [PATCH 1/X] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS
  2023-03-08 16:17 [RFC 0/X] Implement GCC support for AArch64 libmvec Andre Vieira (lists)
@ 2023-03-08 16:20 ` Andre Vieira (lists)
  2023-04-20 15:20   ` Richard Sandiford
  2023-03-08 16:21 ` [PATCH 2/X] parloops: Copy target and optimizations when creating a function clone Andre Vieira (lists)
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Andre Vieira (lists) @ 2023-03-08 16:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: jakub, Richard Sandiford, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 507 bytes --]

Hi,

This patch replaces the uses of simd_clone_subparts with
TYPE_VECTOR_SUBPARTS and removes the definition of the former.

gcc/ChangeLog:

         * omp-simd-clone.cc (simd_clone_subparts): Remove.
         (simd_clone_init_simd_arrays): Replace simd_clone_subparts with 
TYPE_VECTOR_SUBPARTS.
         (ipa_simd_modify_function_body): Likewise.
         * tree-vect-stmts.cc (simd_clone_subparts): Remove.
         (vectorizable_simd_clone_call): Replace simd_clone_subparts 
with TYPE_VECTOR_SUBPARTS.

[-- Attachment #2: libmvec_1.patch --]
[-- Type: text/plain, Size: 6470 bytes --]

diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 0949b8ba288dfc7e7692403bfc600983faddf5dd..48b480e7556d9ad8e5502e10e513ec36b17b9cbb 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -255,16 +255,6 @@ ok_for_auto_simd_clone (struct cgraph_node *node)
   return true;
 }
 
-
-/* Return the number of elements in vector type VECTYPE, which is associated
-   with a SIMD clone.  At present these always have a constant length.  */
-
-static unsigned HOST_WIDE_INT
-simd_clone_subparts (tree vectype)
-{
-  return TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
-}
-
 /* Allocate a fresh `simd_clone' and return it.  NARGS is the number
    of arguments to reserve space for.  */
 
@@ -1027,7 +1017,7 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
 	    }
 	  continue;
 	}
-      if (known_eq (simd_clone_subparts (TREE_TYPE (arg)),
+      if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg)),
 		    node->simdclone->simdlen))
 	{
 	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
@@ -1039,7 +1029,7 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
 	}
       else
 	{
-	  unsigned int simdlen = simd_clone_subparts (TREE_TYPE (arg));
+	  poly_uint64 simdlen = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg));
 	  unsigned int times = vector_unroll_factor (node->simdclone->simdlen,
 						     simdlen);
 	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
@@ -1225,9 +1215,9 @@ ipa_simd_modify_function_body (struct cgraph_node *node,
 		  iter, NULL_TREE, NULL_TREE);
       adjustments->register_replacement (&(*adjustments->m_adj_params)[j], r);
 
-      if (multiple_p (node->simdclone->simdlen, simd_clone_subparts (vectype)))
+      if (multiple_p (node->simdclone->simdlen, TYPE_VECTOR_SUBPARTS (vectype)))
 	j += vector_unroll_factor (node->simdclone->simdlen,
-				   simd_clone_subparts (vectype)) - 1;
+				   TYPE_VECTOR_SUBPARTS (vectype)) - 1;
     }
   adjustments->sort_replacements ();
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index df6239a1c61c7213ad3c1468723bc1adf70bc02c..c85b6babc4bc5bc3111ef326dcc8f32bb25333f6 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3964,16 +3964,6 @@ vect_simd_lane_linear (tree op, class loop *loop,
     }
 }
 
-/* Return the number of elements in vector type VECTYPE, which is associated
-   with a SIMD clone.  At present these vectors always have a constant
-   length.  */
-
-static unsigned HOST_WIDE_INT
-simd_clone_subparts (tree vectype)
-{
-  return TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
-}
-
 /* Function vectorizable_simd_clone_call.
 
    Check if STMT_INFO performs a function call that can be vectorized
@@ -4251,7 +4241,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 							  slp_node);
 	if (arginfo[i].vectype == NULL
 	    || !constant_multiple_p (bestn->simdclone->simdlen,
-				     simd_clone_subparts (arginfo[i].vectype)))
+				     TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)))
 	  return false;
       }
 
@@ -4349,15 +4339,19 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	    case SIMD_CLONE_ARG_TYPE_VECTOR:
 	      atype = bestn->simdclone->args[i].vector_type;
 	      o = vector_unroll_factor (nunits,
-					simd_clone_subparts (atype));
+					TYPE_VECTOR_SUBPARTS (atype));
 	      for (m = j * o; m < (j + 1) * o; m++)
 		{
-		  if (simd_clone_subparts (atype)
-		      < simd_clone_subparts (arginfo[i].vectype))
+		  poly_uint64 atype_subparts = TYPE_VECTOR_SUBPARTS (atype);
+		  poly_uint64 arginfo_subparts
+		    = TYPE_VECTOR_SUBPARTS (arginfo[i].vectype);
+		  if (known_lt (atype_subparts, arginfo_subparts))
 		    {
 		      poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (atype));
-		      k = (simd_clone_subparts (arginfo[i].vectype)
-			   / simd_clone_subparts (atype));
+		      if (!constant_multiple_p (arginfo_subparts,
+						atype_subparts, &k))
+			gcc_unreachable ();
+
 		      gcc_assert ((k & (k - 1)) == 0);
 		      if (m == 0)
 			{
@@ -4387,8 +4381,9 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		    }
 		  else
 		    {
-		      k = (simd_clone_subparts (atype)
-			   / simd_clone_subparts (arginfo[i].vectype));
+		      if (!constant_multiple_p (atype_subparts,
+						arginfo_subparts, &k))
+			gcc_unreachable ();
 		      gcc_assert ((k & (k - 1)) == 0);
 		      vec<constructor_elt, va_gc> *ctor_elts;
 		      if (k != 1)
@@ -4522,7 +4517,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       if (vec_dest)
 	{
 	  gcc_assert (ratype
-		      || known_eq (simd_clone_subparts (rtype), nunits));
+		      || known_eq (TYPE_VECTOR_SUBPARTS (rtype), nunits));
 	  if (ratype)
 	    new_temp = create_tmp_var (ratype);
 	  else if (useless_type_conversion_p (vectype, rtype))
@@ -4536,13 +4531,13 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 
       if (vec_dest)
 	{
-	  if (!multiple_p (simd_clone_subparts (vectype), nunits))
+	  if (!multiple_p (TYPE_VECTOR_SUBPARTS (vectype), nunits))
 	    {
 	      unsigned int k, l;
 	      poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
 	      poly_uint64 bytes = GET_MODE_SIZE (TYPE_MODE (vectype));
 	      k = vector_unroll_factor (nunits,
-					simd_clone_subparts (vectype));
+					TYPE_VECTOR_SUBPARTS (vectype));
 	      gcc_assert ((k & (k - 1)) == 0);
 	      for (l = 0; l < k; l++)
 		{
@@ -4568,10 +4563,12 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		vect_clobber_variable (vinfo, stmt_info, gsi, new_temp);
 	      continue;
 	    }
-	  else if (!multiple_p (nunits, simd_clone_subparts (vectype)))
+	  else if (!multiple_p (nunits, TYPE_VECTOR_SUBPARTS (vectype)))
 	    {
-	      unsigned int k = (simd_clone_subparts (vectype)
-				/ simd_clone_subparts (rtype));
+	      unsigned int k;
+	      if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (vectype),
+					TYPE_VECTOR_SUBPARTS (rtype), &k))
+		gcc_unreachable ();
 	      gcc_assert ((k & (k - 1)) == 0);
 	      if ((j & (k - 1)) == 0)
 		vec_alloc (ret_ctor_elts, k);
@@ -4579,7 +4576,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		{
 		  unsigned int m, o;
 		  o = vector_unroll_factor (nunits,
-					    simd_clone_subparts (rtype));
+					    TYPE_VECTOR_SUBPARTS (rtype));
 		  for (m = 0; m < o; m++)
 		    {
 		      tree tem = build4 (ARRAY_REF, rtype, new_temp,


* [PATCH 2/X] parloops: Copy target and optimizations when creating a function clone
  2023-03-08 16:17 [RFC 0/X] Implement GCC support for AArch64 libmvec Andre Vieira (lists)
  2023-03-08 16:20 ` [PATCH 1/X] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS Andre Vieira (lists)
@ 2023-03-08 16:21 ` Andre Vieira (lists)
  2023-03-08 16:23 ` [PATCH 3/X] parloops: Allow poly number of iterations Andre Vieira (lists)
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Andre Vieira (lists) @ 2023-03-08 16:21 UTC (permalink / raw)
  To: gcc-patches; +Cc: jakub, Richard Sandiford, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 604 bytes --]

Hi,

This patch makes sure we copy over 
DECL_FUNCTION_SPECIFIC_{TARGET,OPTIMIZATION} in parloops when creating 
function clones.  This is required for SVE clones as we will need to 
enable +sve for them, regardless of the current target options.
I don't actually need the 'OPTIMIZATION' for this patch, but it sounds 
like a nice feature to have, so you can use pragmas to better control 
options used in simd_clone generation.

gcc/ChangeLog:

         * tree-parloops.cc (create_loop_fn): Copy specific target and 
optimization options
         when creating a function clone.

Is this OK for stage 1?

[-- Attachment #2: libmvec_2.patch --]
[-- Type: text/plain, Size: 593 bytes --]

diff --git a/gcc/tree-parloops.cc b/gcc/tree-parloops.cc
index dfb75c369d6d00d893ddd6fc28f189ec0d774711..02c1ed3220a949c1349536ef3f74bb497bf76f71 100644
--- a/gcc/tree-parloops.cc
+++ b/gcc/tree-parloops.cc
@@ -2203,6 +2203,11 @@ create_loop_fn (location_t loc)
   DECL_CONTEXT (t) = decl;
   TREE_USED (t) = 1;
   DECL_ARGUMENTS (decl) = t;
+  DECL_FUNCTION_SPECIFIC_TARGET (decl)
+    = DECL_FUNCTION_SPECIFIC_TARGET (act_cfun->decl);
+  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl)
+    = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (act_cfun->decl);
+
 
   allocate_struct_function (decl, false);
 


* [PATCH 3/X] parloops: Allow poly number of iterations
  2023-03-08 16:17 [RFC 0/X] Implement GCC support for AArch64 libmvec Andre Vieira (lists)
  2023-03-08 16:20 ` [PATCH 1/X] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS Andre Vieira (lists)
  2023-03-08 16:21 ` [PATCH 2/X] parloops: Copy target and optimizations when creating a function clone Andre Vieira (lists)
@ 2023-03-08 16:23 ` Andre Vieira (lists)
  2023-03-08 16:25 ` [RFC 4/X] omp, aarch64: Add SVE support for 'omp declare simd' [PR 96342] Andre Vieira (lists)
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Andre Vieira (lists) @ 2023-03-08 16:23 UTC (permalink / raw)
  To: gcc-patches; +Cc: jakub, Richard Sandiford, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 236 bytes --]

Hi,

This patch modifies try_transform_to_exit_first_loop_alt in parloops so
that it can handle loops with poly iteration counts.
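
For context, the code touched in the diff below computes nit + 1 unless
nit already equals the maximum value of its type. The sketch here models
only that check, and only for a plain constant; toy_alt_bound is a
hypothetical name and the 32-bit type is an assumption made for the
illustration. The patch itself extends the same path to POLY_INT_CST
values, which this toy does not attempt to model.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: return nit + 1, or nothing when nit is already the
// maximum value of its type and the increment would wrap around.
std::optional<std::uint32_t>
toy_alt_bound (std::uint32_t nit)
{
  if (nit == std::numeric_limits<std::uint32_t>::max ())
    return std::nullopt;
  return nit + 1;
}
```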

gcc/ChangeLog:

         * tree-parloops.cc (try_transform_to_exit_first_loop_alt): 
Handle poly nits.

Is this OK for Stage 1?

[-- Attachment #2: libmvec_3.patch --]
[-- Type: text/plain, Size: 940 bytes --]

diff --git a/gcc/tree-parloops.cc b/gcc/tree-parloops.cc
index 02c1ed3220a949c1349536ef3f74bb497bf76f71..0a3133a3ae7932e11aa680dc14b8ea01613a514c 100644
--- a/gcc/tree-parloops.cc
+++ b/gcc/tree-parloops.cc
@@ -2531,14 +2531,16 @@ try_transform_to_exit_first_loop_alt (class loop *loop,
   tree nit_type = TREE_TYPE (nit);
 
   /* Figure out whether nit + 1 overflows.  */
-  if (TREE_CODE (nit) == INTEGER_CST)
+  if (TREE_CODE (nit) == INTEGER_CST
+      || TREE_CODE (nit) == POLY_INT_CST)
     {
       if (!tree_int_cst_equal (nit, TYPE_MAX_VALUE (nit_type)))
 	{
 	  alt_bound = fold_build2_loc (UNKNOWN_LOCATION, PLUS_EXPR, nit_type,
 				       nit, build_one_cst (nit_type));
 
-	  gcc_assert (TREE_CODE (alt_bound) == INTEGER_CST);
+	  gcc_assert (TREE_CODE (alt_bound) == INTEGER_CST
+		      || TREE_CODE (alt_bound) == POLY_INT_CST);
 	  transform_to_exit_first_loop_alt (loop, reduction_list, alt_bound);
 	  return true;
 	}


* [RFC 4/X] omp, aarch64: Add SVE support for 'omp declare simd' [PR 96342]
  2023-03-08 16:17 [RFC 0/X] Implement GCC support for AArch64 libmvec Andre Vieira (lists)
                   ` (2 preceding siblings ...)
  2023-03-08 16:23 ` [PATCH 3/X] parloops: Allow poly number of iterations Andre Vieira (lists)
@ 2023-03-08 16:25 ` Andre Vieira (lists)
  2023-03-08 16:26 ` [RFC 5/X] omp: Create simd clones from 'omp declare variant's Andre Vieira (lists)
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Andre Vieira (lists) @ 2023-03-08 16:25 UTC (permalink / raw)
  To: gcc-patches; +Cc: jakub, Richard Sandiford, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 2309 bytes --]

Hi,

This patch adds SVE support for simd clone generation when using 'omp
declare simd'. The design is based on what was discussed in PR 96342,
but I did not look at YangYang's patch as I wasn't sure whether that
code's copyright had been assigned to the FSF.

This patch also is not in accordance with the examples in the BETA 
VFABIA64 document that can be found in the vfabia64 subdir of 
https://github.com/ARM-software/abi-aa/
If we agree to this approach I will propose changes to the ABI.
It differs in that we take the omission of 'simdlen' to be the only way
to create an SVE simd clone using 'omp declare simd', and that the
current target defined on the command-line has no bearing on which simd
clones are generated. This SVE simd clone is always VLA.
The document describes a way to specify VLS SVE simd clones by using
the simdlen clause, but that would require another way to toggle between
SVE and Advanced SIMD. Since there is no clause to do that for 'omp
declare simd', I would have to assume this would be controlled by the
command-line target options (march/mcpu).
By generating all possible Advanced SIMD simdlens and a VLA simdlen for
SVE when omitting simdlen we would be following the same practice as
x86_64.

Targethook changes

This patch proposes two targethook changes:
1) Add mode parameter to TARGET_SIMD_CLONE_USABLE
We require the mode parameter to distinguish between calls to a simd
clone from an Advanced SIMD mode and an SVE mode.

2) Add new TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM
We require this to be able to modify the types used in SVE simd clones, 
as we need to add the SVE type attribute so that the correct PCS can be 
applied.
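
The first hook change can be pictured with the toy model below, which
mirrors the shape of the aarch64_simd_clone_usable change in the diff
further down. The toy_target struct, the mode_is_sve flag and the
function name are hypothetical simplifications for this sketch; the real
hook receives a cgraph_node and a machine_mode. Returning -1 for an
unknown mangling letter is an assumption of the sketch.

```cpp
#include <cassert>

// Toy description of what the current target supports.
struct toy_target
{
  bool has_simd;
  bool has_sve;
};

// Return -1 if a clone with mangling letter VECSIZE_MANGLE must not be
// used from a loop vectorized in the given kind of mode, 0 if it may be.
int
toy_simd_clone_usable (char vecsize_mangle, toy_target target,
		       bool mode_is_sve)
{
  switch (vecsize_mangle)
    {
    case 'n':
      // Advanced SIMD clone: needs SIMD and a non-SVE loop mode.
      return (!target.has_simd || mode_is_sve) ? -1 : 0;
    case 's':
      // SVE clone: needs SVE and an SVE loop mode.
      return (!target.has_sve || !mode_is_sve) ? -1 : 0;
    default:
      return -1;
    }
}
```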

Other notable changes:
- We discourage the use of an 'inbranch' simd clone when the caller
is not in a branch, so that a 'notinbranch' variant is picked over an
inbranch one if available. (We could probably rely on ordering, but
that's quite error prone and the ordering I'm looking at is by
definition target specific.)
- I currently put the VLA mangling in the target agnostic mangling
function. If other targets with VLA want to use a different mangling in
the future we may want to change this into a targethook.
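
To make the VLA mangling concrete: in the simd_clone_mangle hunk below,
a constant simdlen is printed as a decimal number and a non-constant one
as 'x'. The sketch models just that naming rule. toy_simd_clone_mangle
and its parameters are hypothetical, and the per-argument letters are
passed in as a ready-made string rather than derived from clone info.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Build "_ZGV" <isa> <mask> <len> <params> "_" <name>.  An empty SIMDLEN
// stands for a variable-length (VLA) clone and is printed as 'x'.
std::string
toy_simd_clone_mangle (char isa, char mask, std::optional<unsigned> simdlen,
		       const std::string &params, const std::string &name)
{
  std::string out = "_ZGV";
  out += isa;
  out += mask;
  if (simdlen)
    out += std::to_string (*simdlen);
  else
    out += 'x';
  out += params;
  out += '_';
  out += name;
  return out;
}
```

With isa 'n', mask 'N' and simdlen 4 this yields _ZGVnN4v_cosf; with isa
's', mask 'M' and no constant simdlen it yields _ZGVsMxv_cosf, matching
the names used in the cover letter.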


I'll create a ChangeLog when I turn this into a PATCH if we agree on 
this direction.

[-- Attachment #2: libmvec_4.patch --]
[-- Type: text/plain, Size: 20319 bytes --]

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index f75eb892f3daa7c2576efcedc8d944ab1e895cdb..122a473770eb4526ecce326f02d843608d088b5b 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -995,6 +995,8 @@ namespace aarch64_sve {
 #ifdef GCC_TARGET_H
   bool verify_type_context (location_t, type_context_kind, const_tree, bool);
 #endif
+ void add_sve_type_attribute (tree, unsigned int, unsigned int,
+			      const char *, const char *);
 }
 
 extern void aarch64_split_combinev16qi (rtx operands[3]);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 161a14edde7c9fb1b13b146cf50463e2d78db264..6f99c438d10daa91b7e3b623c995489f1a8a0f4c 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -569,14 +569,16 @@ static bool reported_missing_registers_p;
 /* Record that TYPE is an ABI-defined SVE type that contains NUM_ZR SVE vectors
    and NUM_PR SVE predicates.  MANGLED_NAME, if nonnull, is the ABI-defined
    mangling of the type.  ACLE_NAME is the <arm_sve.h> name of the type.  */
-static void
+void
 add_sve_type_attribute (tree type, unsigned int num_zr, unsigned int num_pr,
 			const char *mangled_name, const char *acle_name)
 {
   tree mangled_name_tree
     = (mangled_name ? get_identifier (mangled_name) : NULL_TREE);
+  tree acle_name_tree
+    = (acle_name ? get_identifier (acle_name) : NULL_TREE);
 
-  tree value = tree_cons (NULL_TREE, get_identifier (acle_name), NULL_TREE);
+  tree value = tree_cons (NULL_TREE, acle_name_tree, NULL_TREE);
   value = tree_cons (NULL_TREE, mangled_name_tree, value);
   value = tree_cons (NULL_TREE, size_int (num_pr), value);
   value = tree_cons (NULL_TREE, size_int (num_zr), value);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 5c40b6ed22a508723bd535a7460762c3a243d441..ef93a4e9d43799df4410f152cdd798db285e8897 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -4015,13 +4015,13 @@ aarch64_takes_arguments_in_sve_regs_p (const_tree fntype)
 static const predefined_function_abi &
 aarch64_fntype_abi (const_tree fntype)
 {
-  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)))
-    return aarch64_simd_abi ();
-
   if (aarch64_returns_value_in_sve_regs_p (fntype)
       || aarch64_takes_arguments_in_sve_regs_p (fntype))
     return aarch64_sve_abi ();
 
+  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)))
+    return aarch64_simd_abi ();
+
   return default_function_abi;
 }
 
@@ -26968,14 +26968,21 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	}
     }
 
-  clonei->vecsize_mangle = 'n';
   clonei->mask_mode = VOIDmode;
   elt_bits = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
   if (known_eq (clonei->simdlen, 0U))
     {
-      count = 2;
-      vec_bits = (num == 0 ? 64 : 128);
-      clonei->simdlen = exact_div (vec_bits, elt_bits);
+      if (num >= 2)
+	{
+	  vec_bits = poly_uint64 (128, 128);
+	  clonei->simdlen = exact_div (vec_bits, elt_bits);
+	}
+      else
+	{
+	  count = 3;
+	  vec_bits = (num == 0 ? 64 : 128);
+	  clonei->simdlen = exact_div (vec_bits, elt_bits);
+	}
     }
   else
     {
@@ -26994,6 +27001,15 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	  return 0;
 	}
     }
+
+  if (num >= 2)
+    {
+      clonei->vecsize_mangle = 's';
+      clonei->inbranch = 1;
+    }
+  else
+    clonei->vecsize_mangle = 'n';
+
   clonei->vecsize_int = vec_bits;
   clonei->vecsize_float = vec_bits;
   return count;
@@ -27010,17 +27026,28 @@ aarch64_simd_clone_adjust (struct cgraph_node *node)
   tree t = TREE_TYPE (node->decl);
   TYPE_ATTRIBUTES (t) = make_attribute ("aarch64_vector_pcs", "default",
 					TYPE_ATTRIBUTES (t));
+  if (node->simdclone->vecsize_mangle == 's')
+    {
+      tree target = build_string (strlen ("+sve"), "+sve");
+      aarch64_option_valid_attribute_p (node->decl, NULL_TREE, target, 0);
+    }
 }
 
 /* Implement TARGET_SIMD_CLONE_USABLE.  */
 
 static int
-aarch64_simd_clone_usable (struct cgraph_node *node)
+aarch64_simd_clone_usable (struct cgraph_node *node, machine_mode vector_mode)
 {
   switch (node->simdclone->vecsize_mangle)
     {
     case 'n':
-      if (!TARGET_SIMD)
+      if (!TARGET_SIMD
+	  || aarch64_sve_mode_p (vector_mode))
+	return -1;
+      return 0;
+    case 's':
+      if (!TARGET_SVE
+	  || !aarch64_sve_mode_p (vector_mode))
 	return -1;
       return 0;
     default:
@@ -27028,6 +27055,61 @@ aarch64_simd_clone_usable (struct cgraph_node *node)
     }
 }
 
+/* Implement TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM.  */
+
+static tree
+aarch64_simd_clone_adjust_ret_or_param (struct cgraph_node *node, tree type,
+					bool is_mask)
+{
+  if (type
+      && VECTOR_TYPE_P (type)
+      && node->simdclone->vecsize_mangle == 's')
+    {
+      cl_target_option cur_target;
+      cl_target_option_save (&cur_target, &global_options, &global_options_set);
+      tree new_target = DECL_FUNCTION_SPECIFIC_TARGET (node->decl);
+      cl_target_option_restore (&global_options, &global_options_set,
+				TREE_TARGET_OPTION (new_target));
+      aarch64_override_options_internal (&global_options);
+      bool m_old_have_regs_of_mode[MAX_MACHINE_MODE];
+      memcpy (m_old_have_regs_of_mode, have_regs_of_mode,
+	      sizeof (have_regs_of_mode));
+      for (int i = 0; i < NUM_MACHINE_MODES; ++i)
+	if (aarch64_sve_mode_p ((machine_mode) i))
+	  have_regs_of_mode[i] = true;
+      poly_uint16 old_sve_vg = aarch64_sve_vg;
+      if (!node->simdclone->simdlen.is_constant ())
+	aarch64_sve_vg = poly_uint16 (2, 2);
+      unsigned int num_zr = 0;
+      unsigned int num_pr = 0;
+      if (is_mask)
+	{
+	  type = truth_type_for (type);
+	  num_pr = 1;
+	}
+      else
+	{
+	  num_zr = 1;
+	  tree base_type = TREE_TYPE (type);
+	  if (POINTER_TYPE_P (base_type))
+	    base_type = pointer_sized_int_node;
+	  poly_int64 vec_size = tree_to_poly_int64 (TYPE_SIZE (type));
+	  scalar_mode base_mode = as_a <scalar_mode> (TYPE_MODE (base_type));
+	  machine_mode vec_mode
+	    = aarch64_simd_container_mode (base_mode, vec_size);
+	  type = build_vector_type_for_mode (base_type, vec_mode);
+	}
+
+      aarch64_sve::add_sve_type_attribute (type, num_zr, num_pr, NULL, NULL);
+      cl_target_option_restore (&global_options, &global_options_set, &cur_target);
+      aarch64_override_options_internal (&global_options);
+      memcpy (have_regs_of_mode, m_old_have_regs_of_mode,
+	      sizeof (have_regs_of_mode));
+      aarch64_sve_vg = old_sve_vg;
+    }
+  return type;
+}
+
 /* Implement TARGET_COMP_TYPE_ATTRIBUTES */
 
 static int
@@ -28048,6 +28130,10 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_SIMD_CLONE_USABLE
 #define TARGET_SIMD_CLONE_USABLE aarch64_simd_clone_usable
 
+#undef TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM
+#define TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM \
+  aarch64_simd_clone_adjust_ret_or_param
+
 #undef TARGET_COMP_TYPE_ATTRIBUTES
 #define TARGET_COMP_TYPE_ATTRIBUTES aarch64_comp_type_attributes
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c6c891972d1e58cd163b259ba96a599d62326865..ed12271027305a0017cb9b2ff821bad403c52836 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6306,11 +6306,16 @@ This hook should add implicit @code{attribute(target("..."))} attribute
 to SIMD clone @var{node} if needed.
 @end deftypefn
 
-@deftypefn {Target Hook} int TARGET_SIMD_CLONE_USABLE (struct cgraph_node *@var{})
+@deftypefn {Target Hook} int TARGET_SIMD_CLONE_USABLE (struct cgraph_node *@var{}, @var{machine_mode})
 This hook should return -1 if SIMD clone @var{node} shouldn't be used
-in vectorized loops in current function, or non-negative number if it is
-usable.  In that case, the smaller the number is, the more desirable it is
-to use it.
+in vectorized loops being vectorized with mode @var{m} in current function, or
+non-negative number if it is usable.  In that case, the smaller the number is,
+the more desirable it is to use it.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM (struct cgraph_node *@var{}, @var{tree}, @var{bool})
+If defined, this hook should adjust the type of the return or parameter
+@var{type} to be used by the simd clone @var{node}.
 @end deftypefn
 
 @deftypefn {Target Hook} int TARGET_SIMT_VF (void)
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 613b2534149415f442163d599503efaf423b673b..fd0d2c8d0dcc2fd249b34745d77749d99c49d13d 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4205,6 +4205,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_SIMD_CLONE_USABLE
 
+@hook TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM
+
 @hook TARGET_SIMT_VF
 
 @hook TARGET_OMP_DEVICE_KIND_ARCH_ISA
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 48b480e7556d9ad8e5502e10e513ec36b17b9cbb..4808608b7a1c06802ee231480c2003cf41c11799 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -378,8 +378,9 @@ simd_clone_clauses_extract (struct cgraph_node *node, tree clauses,
 		  arg_type = SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP;
 		clone_info->args[argno].arg_type = arg_type;
 		clone_info->args[argno].linear_step = tree_to_shwi (step);
+		int nargs = clone_info->nargs;
 		gcc_assert (clone_info->args[argno].linear_step >= 0
-			    && clone_info->args[argno].linear_step < n);
+			    && clone_info->args[argno].linear_step < nargs);
 	      }
 	    else
 	      {
@@ -541,9 +542,12 @@ simd_clone_mangle (struct cgraph_node *node,
   pp_string (&pp, "_ZGV");
   pp_character (&pp, vecsize_mangle);
   pp_character (&pp, mask);
-  /* For now, simdlen is always constant, while variable simdlen pp 'n'.  */
-  unsigned int len = simdlen.to_constant ();
-  pp_decimal_int (&pp, (len));
+
+  unsigned long long len = 0;
+  if (simdlen.is_constant (&len))
+    pp_decimal_int (&pp, (int) (len));
+  else
+    pp_character (&pp, 'x');
 
   for (n = 0; n < clone_info->nargs; ++n)
     {
@@ -736,6 +740,7 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
       t = build_array_type_nelts (t, exact_div (node->simdclone->simdlen,
 						veclen));
     }
+  t = targetm.simd_clone.adjust_ret_or_param (node, t, false);
   TREE_TYPE (TREE_TYPE (fndecl)) = t;
   if (!node->definition)
     return NULL_TREE;
@@ -748,6 +753,7 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
 
   tree atype = build_array_type_nelts (orig_rettype,
 				       node->simdclone->simdlen);
+  atype = targetm.simd_clone.adjust_ret_or_param (node, atype, false);
   if (maybe_ne (veclen, node->simdclone->simdlen))
     return build1 (VIEW_CONVERT_EXPR, atype, t);
 
@@ -807,8 +813,14 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
     {
       ipa_adjusted_param adj;
       memset (&adj, 0, sizeof (adj));
-      tree parm = args[i];
-      tree parm_type = node->definition ? TREE_TYPE (parm) : parm;
+      tree parm = NULL_TREE;
+      tree parm_type = NULL_TREE;
+      if (i < args.length ())
+	{
+	  parm = args[i];
+	  parm_type = node->definition ? TREE_TYPE (parm) : parm;
+	}
+
       adj.base_index = i;
       adj.prev_clone_index = i;
 
@@ -874,6 +886,8 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 				       ? IDENTIFIER_POINTER (DECL_NAME (parm))
 				       : NULL, parm_type, sc->simdlen);
 	}
+      adj.type = targetm.simd_clone.adjust_ret_or_param (node, adj.type,
+							 false);
       vec_safe_push (new_params, adj);
     }
 
@@ -906,6 +920,8 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	adj.type = build_vector_type (pointer_sized_int_node, veclen);
       else
 	adj.type = build_vector_type (base_type, veclen);
+      adj.type = targetm.simd_clone.adjust_ret_or_param (node, adj.type,
+							 true);
       vec_safe_push (new_params, adj);
 
       k = vector_unroll_factor (sc->simdlen, veclen);
@@ -931,6 +947,7 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	    sc->args[i].simd_array = NULL_TREE;
 	}
       sc->args[i].orig_type = base_type;
+      sc->args[i].vector_type = adj.type;
       sc->args[i].arg_type = SIMD_CLONE_ARG_TYPE_MASK;
     }
 
@@ -1485,8 +1502,8 @@ simd_clone_adjust (struct cgraph_node *node)
 	 below).  */
       loop = alloc_loop ();
       cfun->has_force_vectorize_loops = true;
-      /* For now, simlen is always constant.  */
-      loop->safelen = node->simdclone->simdlen.to_constant ();
+      /* We can assert that safelen is the 'minimum' simdlen.  */
+      loop->safelen = constant_lower_bound (node->simdclone->simdlen);
       loop->force_vectorize = true;
       loop->header = body_bb;
     }
@@ -1546,7 +1563,7 @@ simd_clone_adjust (struct cgraph_node *node)
 	  mask = gimple_assign_lhs (g);
 	  g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),
 				   BIT_AND_EXPR, mask,
-				   build_int_cst (TREE_TYPE (mask), 1));
+				   build_one_cst (TREE_TYPE (mask)));
 	  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
 	  mask = gimple_assign_lhs (g);
 	}
diff --git a/gcc/target.def b/gcc/target.def
index db8af0cbe81624513f114fc9bbd8be61d855f409..ffa12aa9023bb8f26a647a9848800c77f34afc67 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1645,10 +1645,18 @@ void, (struct cgraph_node *), NULL)
 DEFHOOK
 (usable,
 "This hook should return -1 if SIMD clone @var{node} shouldn't be used\n\
-in vectorized loops in current function, or non-negative number if it is\n\
-usable.  In that case, the smaller the number is, the more desirable it is\n\
-to use it.",
-int, (struct cgraph_node *), NULL)
+in vectorized loops being vectorized with mode @var{m} in current function, or\n\
+non-negative number if it is usable.  In that case, the smaller the number is,\n\
+the more desirable it is to use it.",
+int, (struct cgraph_node *, machine_mode), NULL)
+
+DEFHOOK
+(adjust_ret_or_param,
+"If defined, this hook should adjust the type of the return or parameter\n\
+@var{type} to be used by the simd clone @var{node}.",
+tree, (struct cgraph_node *, tree, bool),
+default_simd_clone_adjust_ret_or_param)
+
 
 HOOK_VECTOR_END (simd_clone)
 
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index a1df260f5483dc84f18d8f12c5202484a32d5bb7..860fb8ccbf1ab00c43dc4b4d32808c1f488406e4 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -73,6 +73,9 @@ extern void default_print_operand (FILE *, rtx, int);
 extern void default_print_operand_address (FILE *, machine_mode, rtx);
 extern bool default_print_operand_punct_valid_p (unsigned char);
 extern tree default_mangle_assembler_name (const char *);
+extern tree default_simd_clone_adjust_ret_or_param
+  (struct cgraph_node *,tree , bool);
+
 
 extern machine_mode default_translate_mode_attribute (machine_mode);
 extern bool default_scalar_mode_supported_p (scalar_mode);
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index fe0116521feaf32187e7bc113bf93b1805852c79..4e54ceb0297828cf13e418dfa113651670a6f112 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -398,6 +398,16 @@ default_mangle_assembler_name (const char *name ATTRIBUTE_UNUSED)
   return get_identifier (stripped);
 }
 
+/* The default implementation of TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM.  */
+
+tree
+default_simd_clone_adjust_ret_or_param (struct cgraph_node *node ATTRIBUTE_UNUSED,
+					tree type,
+					bool is_return ATTRIBUTE_UNUSED)
+{
+  return type;
+}
+
 /* The default implementation of TARGET_TRANSLATE_MODE_ATTRIBUTE.  */
 
 machine_mode
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index c85b6babc4bc5bc3111ef326dcc8f32bb25333f6..da6aa3f193bd52a1e40bb6dbe3d483f92ecd7896 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2759,7 +2759,8 @@ vect_build_all_ones_mask (vec_info *vinfo,
 {
   if (TREE_CODE (masktype) == INTEGER_TYPE)
     return build_int_cst (masktype, -1);
-  else if (TREE_CODE (TREE_TYPE (masktype)) == INTEGER_TYPE)
+  else if (VECTOR_BOOLEAN_TYPE_P (masktype)
+	   || TREE_CODE (TREE_TYPE (masktype)) == INTEGER_TYPE)
     {
       tree mask = build_int_cst (TREE_TYPE (masktype), -1);
       mask = build_vector_from_val (masktype, mask);
@@ -4136,14 +4137,6 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
     }
 
   poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-  if (!vf.is_constant ())
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			 "not considering SIMD clones; not yet supported"
-			 " for variable-width vectors.\n");
-      return false;
-    }
 
   unsigned int badness = 0;
   struct cgraph_node *bestn = NULL;
@@ -4156,20 +4149,17 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	unsigned int this_badness = 0;
 	unsigned int num_calls;
 	if (!constant_multiple_p (vf, n->simdclone->simdlen, &num_calls)
-	    || n->simdclone->nargs != nargs)
+	    || n->simdclone->nargs != (nargs + n->simdclone->inbranch))
 	  continue;
 	if (num_calls != 1)
 	  this_badness += exact_log2 (num_calls) * 4096;
 	if (n->simdclone->inbranch)
 	  this_badness += 8192;
-	int target_badness = targetm.simd_clone.usable (n);
+	int target_badness = targetm.simd_clone.usable (n, vinfo->vector_mode);
 	if (target_badness < 0)
 	  continue;
 	this_badness += target_badness * 512;
-	/* FORNOW: Have to add code to add the mask argument.  */
-	if (n->simdclone->inbranch)
-	  continue;
-	for (i = 0; i < nargs; i++)
+	for (i = 0; i < n->simdclone->nargs; i++)
 	  {
 	    switch (n->simdclone->args[i].arg_type)
 	      {
@@ -4206,16 +4196,22 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		i = -1;
 		break;
 	      case SIMD_CLONE_ARG_TYPE_MASK:
-		gcc_unreachable ();
+		/* Penalize using a predicated SIMD clone in a non-masked loop,
+		   as we'd have to needlessly construct an all-true mask.  */
+		if (!loop_vinfo || !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+		  this_badness += 64;
+		break;
 	      }
 	    if (i == (size_t) -1)
 	      break;
-	    if (n->simdclone->args[i].alignment > arginfo[i].align)
+	    if (i < nargs
+		&& n->simdclone->args[i].alignment > arginfo[i].align)
 	      {
 		i = -1;
 		break;
 	      }
-	    if (arginfo[i].align)
+	    if (i < nargs
+		&& arginfo[i].align)
 	      this_badness += (exact_log2 (arginfo[i].align)
 			       - exact_log2 (n->simdclone->args[i].alignment));
 	  }
@@ -4248,6 +4244,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
   fndecl = bestn->decl;
   nunits = bestn->simdclone->simdlen;
   ncopies = vector_unroll_factor (vf, nunits);
+  nargs = bestn->simdclone->nargs;
 
   /* If the function isn't const, only allow it in simd loops where user
      has asserted that at least nunits consecutive iterations can be
@@ -4331,11 +4328,45 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 
       for (i = 0; i < nargs; i++)
 	{
-	  unsigned int k, l, m, o;
+	  unsigned long long k, l, m, o;
 	  tree atype;
-	  op = gimple_call_arg (stmt, i);
+	  if (i < gimple_call_num_args (stmt))
+	    op = gimple_call_arg (stmt, i);
+	  else
+	    op = NULL_TREE;
+
 	  switch (bestn->simdclone->args[i].arg_type)
 	    {
+	    case SIMD_CLONE_ARG_TYPE_MASK:
+		{
+		    tree mask;
+		    atype = bestn->simdclone->args[i].vector_type;
+		    if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+		      {
+			vec_loop_masks *loop_masks
+			  = &LOOP_VINFO_MASKS (loop_vinfo);
+			mask = vect_get_loop_mask (gsi, loop_masks, ncopies,
+						   vectype, j);
+		      }
+		    else
+		      {
+			tree mask_type = bestn->simdclone->args[i].vector_type;
+			mask
+			  = vect_build_all_ones_mask (vinfo, stmt_info,
+						      mask_type);
+		      }
+		    if (!useless_type_conversion_p (TREE_TYPE (mask), atype))
+		      {
+			mask = build1 (VIEW_CONVERT_EXPR, atype, mask);
+			gassign *new_stmt
+			  = gimple_build_assign (make_ssa_name (atype), mask);
+			vect_finish_stmt_generation (vinfo, stmt_info,
+						     new_stmt, gsi);
+			mask = gimple_assign_lhs (new_stmt);
+		      }
+		    vargs.safe_push (mask);
+		}
+	      break;
 	    case SIMD_CLONE_ARG_TYPE_VECTOR:
 	      atype = bestn->simdclone->args[i].vector_type;
 	      o = vector_unroll_factor (nunits,
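
The clone-selection hunks of vectorizable_simd_clone_call above add up several penalties per candidate clone. The following is a minimal standalone sketch of that scoring, not the GCC implementation: the struct clone_desc type and the names clone_badness and log2u are hypothetical, the alignment terms are left out, and num_calls is assumed to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the clone properties the scoring looks at;
   this is not a GCC type.  */
struct clone_desc
{
  unsigned num_calls;   /* vf / simdlen, assumed to be a power of two.  */
  bool inbranch;        /* The clone takes a mask argument.  */
  int target_badness;   /* Result of the 'usable' hook, < 0 if unusable.  */
};

/* floor (log2 (x)) for x >= 1.  */
static unsigned
log2u (unsigned x)
{
  assert (x != 0);
  unsigned r = 0;
  while (x >>= 1)
    r++;
  return r;
}

/* Return -1 if the clone is rejected, otherwise its badness score.
   The weights are the ones visible in the diff above.  */
static long
clone_badness (const struct clone_desc *c, bool loop_fully_masked)
{
  if (c->target_badness < 0)
    return -1;

  long badness = 0;
  if (c->num_calls != 1)
    badness += (long) log2u (c->num_calls) * 4096;
  if (c->inbranch)
    badness += 8192;
  badness += (long) c->target_badness * 512;
  /* A predicated clone in a non-masked loop needs an all-true mask.  */
  if (c->inbranch && !loop_fully_masked)
    badness += 64;
  return badness;
}
```

With these weights, a notinbranch clone that covers the vectorization factor in a single call scores lowest, and an inbranch clone is slightly cheaper in a fully-masked loop than in an unmasked one.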


* [RFC 5/X] omp: Create simd clones from 'omp declare variant's
  2023-03-08 16:17 [RFC 0/X] Implement GCC support for AArch64 libmvec Andre Vieira (lists)
                   ` (3 preceding siblings ...)
  2023-03-08 16:25 ` [RFC 4/X] omp, aarch64: Add SVE support for 'omp declare simd' [PR 96342] Andre Vieira (lists)
@ 2023-03-08 16:26 ` Andre Vieira (lists)
  2023-03-08 16:28 ` [RFC 6/X] omp: Allow creation of simd clones from omp declare variant with -fopenmp-simd flag Andre Vieira (lists)
  2023-04-20 14:51 ` [RFC 0/X] Implement GCC support for AArch64 libmvec Richard Sandiford
  6 siblings, 0 replies; 15+ messages in thread
From: Andre Vieira (lists) @ 2023-03-08 16:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: jakub, Richard Sandiford, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 1096 bytes --]

Hi,

This RFC extends the omp-simd-clone pass to create simd clones for
functions with 'omp declare variant' pragmas that contain simd
constructs. It also implements AArch64's use of this functionality.
This requires two extra pieces of information to be kept for each
simd clone: a 'variant_name', since each variant has to be named upon
declaration, and a 'device', since an omp variant can have device
clauses that 'select' the device the variant is meant to be used with.
For the latter I decided, for now, to implement it as an integer that
holds a target-dependent 'code' per device, though we may want to
expand this in the future to contain a target-dependent 'target
selector' of sorts. That would enable the implementation of the 'arch'
device clause we describe in the BETA ABI, which can be found in the
vfabia64 subdir of https://github.com/ARM-software/abi-aa/. This patch
only implements support for the two 'isa' device clauses isa("simd")
and isa("sve").
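
To illustrate the per-device 'code', here is a minimal standalone sketch modelled on the AArch64 hook in the attached patch. The names device_code_for_isa and accumulate_device_codes and the have_sve parameter (standing in for TARGET_SVE) are hypothetical; only the codes 1 and 2 and the prefix comparison are taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Device codes as used by the AArch64 hook in this patch.  */
enum { DEVICE_NONE = 0, DEVICE_SIMD = 1, DEVICE_SVE = 2 };

/* Map an isa("...") string to a device code.  HAVE_SVE is a stand-in
   for TARGET_SVE.  Like the patch, this compares only the prefix, so
   a name such as "sve2" also maps to DEVICE_SVE.  */
static unsigned
device_code_for_isa (const char *name, int have_sve)
{
  if (strncmp (name, "simd", strlen ("simd")) == 0)
    return DEVICE_SIMD;
  if (strncmp (name, "sve", strlen ("sve")) == 0 && have_sve)
    return DEVICE_SVE;
  return DEVICE_NONE;
}

/* OR together the codes of all isa strings of one variant, in the same
   way the omp-simd-clone hunk accumulates into clone_info->device.  */
static unsigned
accumulate_device_codes (const char *const *isas, size_t n, int have_sve)
{
  unsigned device = DEVICE_NONE;
  for (size_t i = 0; i < n; i++)
    device |= device_code_for_isa (isas[i], have_sve);
  return device;
}
```

One consequence of OR-ing the codes, at least in this sketch: a variant that lists both isa("simd") and isa("sve") ends up with the value 3, which matches neither the 'device < 2' nor the 'device == 2' checks, so a bitmask-aware test may be worth considering if such combinations are meant to be supported.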

I'll create a ChangeLog when I turn this into a PATCH if we agree on 
this direction.

[-- Attachment #2: libmvec_5.patch --]
[-- Type: text/plain, Size: 21717 bytes --]

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index b5fc739f1b0602a871040292a5bb1d69a9ef305f..ae1af65a9b5913ec435e783223e79767ddd68341 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -810,6 +810,14 @@ struct GTY(()) cgraph_simd_clone {
   /* Original cgraph node the SIMD clones were created for.  */
   cgraph_node *origin;
 
+  /* This is a flag to indicate what device was selected for the variant
+     clone.  Always 0 for 'omp declare simd' clones.  */
+  unsigned device;
+
+  /* The identifier for the name of the variant in case of a declare variant
+     clone, this is NULL_TREE for declare simd clones.  */
+  tree variant_name;
+
   /* Annotated function arguments for the original function.  */
   cgraph_simd_clone_arg GTY((length ("%h.nargs"))) args[1];
 };
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index ef93a4e9d43799df4410f152cdd798db285e8897..344c6001fdd646a31326f5deb8ff94873d346ed1 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -26970,15 +26970,28 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 
   clonei->mask_mode = VOIDmode;
   elt_bits = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
+  /* A simdclone without simdlen can legally originate from either a:
+     'omp declare simd':
+	In this case generate at least 3 simd clones, one for Advanced SIMD
+	64-bit vectors, one for Advanced SIMD 128-bit vectors and one for SVE
+	vector length agnostic vectors.
+      'omp declare variant':
+	In this case we must be generating a simd clone for SVE vector length
+	agnostic vectors.
+   */
   if (known_eq (clonei->simdlen, 0U))
     {
-      if (num >= 2)
+      if (clonei->device == 2 || num >= 2)
 	{
+	  count = 1;
 	  vec_bits = poly_uint64 (128, 128);
 	  clonei->simdlen = exact_div (vec_bits, elt_bits);
 	}
       else
 	{
+	  if (clonei->device != 0)
+	    return 0;
+
 	  count = 3;
 	  vec_bits = (num == 0 ? 64 : 128);
 	  clonei->simdlen = exact_div (vec_bits, elt_bits);
@@ -26991,7 +27004,14 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
       /* For now, SVE simdclones won't produce illegal simdlen, So only check
 	 const simdlens here.  */
       if (clonei->simdlen.is_constant (&const_simdlen)
-	  && maybe_ne (vec_bits, 64U) && maybe_ne (vec_bits, 128U))
+	  /* For Advanced SIMD we require either 64- or 128-bit vectors.  */
+	  && ((clonei->device < 2
+	       && maybe_ne (vec_bits, 64U)
+	       && maybe_ne (vec_bits, 128U))
+	  /* For SVE we require multiples of 128-bits.  TODO: should we check
+	     for max VL?  */
+	      || (clonei->device == 2
+		  && !constant_multiple_p (vec_bits, 128))))
 	{
 	  if (explicit_p)
 	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
@@ -27002,7 +27022,7 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	}
     }
 
-  if (num >= 2)
+  if (clonei->device == 2 || num >= 2)
     {
       clonei->vecsize_mangle = 's';
       clonei->inbranch = 1;
@@ -27082,22 +27102,21 @@ aarch64_simd_clone_adjust_ret_or_param (struct cgraph_node *node, tree type,
 	aarch64_sve_vg = poly_uint16 (2, 2);
       unsigned int num_zr = 0;
       unsigned int num_pr = 0;
+      tree base_type = TREE_TYPE (type);
+      if (POINTER_TYPE_P (base_type))
+	base_type = pointer_sized_int_node;
+      scalar_mode base_mode = as_a <scalar_mode> (TYPE_MODE (base_type));
+      machine_mode vec_mode = aarch64_full_sve_mode (base_mode).require ();
+      tree vectype = build_vector_type_for_mode (base_type, vec_mode);
       if (is_mask)
 	{
-	  type = truth_type_for (type);
 	  num_pr = 1;
+	  type = truth_type_for (vectype);
 	}
       else
 	{
 	  num_zr = 1;
-	  tree base_type = TREE_TYPE (type);
-	  if (POINTER_TYPE_P (base_type))
-	    base_type = pointer_sized_int_node;
-	  poly_int64 vec_size = tree_to_poly_int64 (TYPE_SIZE (type));
-	  scalar_mode base_mode = as_a <scalar_mode> (TYPE_MODE (base_type));
-	  machine_mode vec_mode
-	    = aarch64_simd_container_mode (base_mode, vec_size);
-	  type = build_vector_type_for_mode (base_type, vec_mode);
+	  type = vectype;
 	}
 
       aarch64_sve::add_sve_type_attribute (type, num_zr, num_pr, NULL, NULL);
@@ -27223,6 +27242,22 @@ aarch64_can_tag_addresses ()
   return !TARGET_ILP32;
 }
 
+int
+aarch64_omp_device_kind_arch_isa (enum omp_device_kind_arch_isa trait,
+				  const char *name)
+{
+  if (trait != omp_device_isa)
+    return default_omp_device_kind_arch_isa (trait, name);
+
+  if (strncmp (name, "simd", strlen ("simd")) == 0)
+    return 1;
+  if (strncmp (name, "sve", strlen ("sve")) == 0
+      && TARGET_SVE)
+    return 2;
+
+  return 0;
+}
+
 /* Implement TARGET_ASM_FILE_END for AArch64.  This adds the AArch64 GNU NOTE
    section at the end if needed.  */
 #define GNU_PROPERTY_AARCH64_FEATURE_1_AND	0xc0000000
@@ -28146,6 +28181,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_MEMTAG_CAN_TAG_ADDRESSES
 #define TARGET_MEMTAG_CAN_TAG_ADDRESSES aarch64_can_tag_addresses
 
+#undef TARGET_OMP_DEVICE_KIND_ARCH_ISA
+#define TARGET_OMP_DEVICE_KIND_ARCH_ISA aarch64_omp_device_kind_arch_isa
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 30c7470974d4b62ec6c03b2a7dd37f046983a247..1aa5f1a7898df9483a2af4f6f9fea99e6b219271 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -7998,6 +7998,18 @@ decl_maybe_constant_destruction (tree decl, tree type)
 
 static tree declare_simd_adjust_this (tree *, int *, void *);
 
+tree declare_variant_adjust_parm (tree *tp, int *walk_subtrees, void *data)
+{
+  tree *parm = (tree *) data;
+  if (TREE_CODE (*tp) == FUNCTION_DECL)
+    {
+      *parm = DECL_ARGUMENTS (*tp);
+      *walk_subtrees = 0;
+      return NULL_TREE;
+    }
+  return NULL_TREE;
+}
+
 /* Helper function of omp_declare_variant_finalize.  Finalize one
    "omp declare variant base" attribute.  Return true if it should be
    removed.  */
@@ -8015,13 +8027,14 @@ omp_declare_variant_finalize_one (tree decl, tree attr)
 
   tree ctx = TREE_VALUE (TREE_VALUE (attr));
   tree simd = omp_get_context_selector (ctx, "construct", "simd");
+  tree parm = NULL_TREE;
   if (simd)
     {
       TREE_VALUE (simd)
 	= c_omp_declare_simd_clauses_to_numbers (DECL_ARGUMENTS (decl),
 						 TREE_VALUE (simd));
-      /* FIXME, adjusting simd args unimplemented.  */
-      return true;
+      walk_tree (&TREE_PURPOSE (TREE_VALUE (attr)), declare_variant_adjust_parm,
+		 &parm, NULL);
     }
 
   tree chain = TREE_CHAIN (TREE_VALUE (attr));
@@ -8035,7 +8048,8 @@ omp_declare_variant_finalize_one (tree decl, tree attr)
   input_location = varid_loc;
 
   releasing_vec args;
-  tree parm = DECL_ARGUMENTS (decl);
+  if (!parm)
+    parm = DECL_ARGUMENTS (decl);
   if (TREE_CODE (TREE_TYPE (decl)) == METHOD_TYPE)
     parm = DECL_CHAIN (parm);
   for (; parm; parm = DECL_CHAIN (parm))
@@ -8096,7 +8110,9 @@ omp_declare_variant_finalize_one (tree decl, tree attr)
   if (variant)
     {
       const char *varname = IDENTIFIER_POINTER (DECL_NAME (variant));
-      if (!comptypes (TREE_TYPE (decl), TREE_TYPE (variant), 0))
+      /* TODO: Should we check that if (simd) the return vector type has an
+	 element type that is compatible to the declaration's return type.  */
+      if (!simd && !comptypes (TREE_TYPE (decl), TREE_TYPE (variant), 0))
 	{
 	  error_at (varid_loc, "variant %qD and base %qD have incompatible "
 			       "types", variant, decl);
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 4808608b7a1c06802ee231480c2003cf41c11799..9e7e1a15cb0c1ddf59e99c568d16c45fede5f8a8 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -299,39 +299,58 @@ simd_clone_vector_of_formal_parm_types (vec<tree> *args, tree fndecl)
     (*args)[i] = TREE_TYPE ((*args)[i]);
 }
 
+static tree
+find_variant_clauses (tree attr, tree *fn_decl, tree *device_clauses)
+{
+  if (!attr)
+    return NULL_TREE;
+
+  gcc_assert (TREE_CODE (attr) == TREE_LIST);
+
+  *fn_decl = TREE_PURPOSE (attr);
+
+  tree clauses = TREE_VALUE (attr);
+  tree simd_clause = NULL_TREE;
+
+  while (clauses)
+    {
+      tree identifier = TREE_PURPOSE (clauses);
+      if (identifier == maybe_get_identifier ("construct"))
+	{
+	  tree construct_clauses = TREE_VALUE (clauses);
+	  while (construct_clauses)
+	    {
+	      identifier = TREE_PURPOSE (construct_clauses);
+	      if (identifier == maybe_get_identifier ("simd"))
+		simd_clause = TREE_VALUE (construct_clauses);
+	      else
+		return NULL_TREE;
+	      construct_clauses = TREE_CHAIN (construct_clauses);
+	    }
+	}
+      else if (identifier == maybe_get_identifier ("device"))
+	*device_clauses = TREE_VALUE (clauses);
+      clauses = TREE_CHAIN (clauses);
+    }
+  return simd_clause;
+}
+
 /* Given a simd function in NODE, extract the simd specific
    information from the OMP clauses passed in CLAUSES, and return
    the struct cgraph_simd_clone * if it should be cloned.  *INBRANCH_SPECIFIED
    is set to TRUE if the `inbranch' or `notinbranch' clause specified,
    otherwise set to FALSE.  */
 
-static struct cgraph_simd_clone *
-simd_clone_clauses_extract (struct cgraph_node *node, tree clauses,
+static bool
+simd_clone_clauses_extract (struct cgraph_node *node ATTRIBUTE_UNUSED,
+			    struct cgraph_simd_clone *clone_info,
+			    auto_vec<tree> &args, tree clauses,
 			    bool *inbranch_specified)
 {
-  auto_vec<tree> args;
-  simd_clone_vector_of_formal_parm_types (&args, node->decl);
-  tree t;
-  int n;
-  *inbranch_specified = false;
-
-  n = args.length ();
-  if (n > 0 && args.last () == void_type_node)
-    n--;
-
-  /* Allocate one more than needed just in case this is an in-branch
-     clone which will require a mask argument.  */
-  struct cgraph_simd_clone *clone_info = simd_clone_struct_alloc (n + 1);
-  clone_info->nargs = n;
-
-  if (!clauses)
-    goto out;
-
-  clauses = TREE_VALUE (clauses);
   if (!clauses || TREE_CODE (clauses) != OMP_CLAUSE)
-    goto out;
+    return true;
 
-  for (t = clauses; t; t = OMP_CLAUSE_CHAIN (t))
+  for (tree t = clauses; t; t = OMP_CLAUSE_CHAIN (t))
     {
       switch (OMP_CLAUSE_CODE (t))
 	{
@@ -390,13 +409,13 @@ simd_clone_clauses_extract (struct cgraph_node *node, tree clauses,
 		  {
 		    warning_at (OMP_CLAUSE_LOCATION (t), 0,
 				"ignoring large linear step");
-		    return NULL;
+		    return false;
 		  }
 		else if (integer_zerop (step))
 		  {
 		    warning_at (OMP_CLAUSE_LOCATION (t), 0,
 				"ignoring zero linear step");
-		    return NULL;
+		    return false;
 		  }
 		else
 		  {
@@ -453,7 +472,76 @@ simd_clone_clauses_extract (struct cgraph_node *node, tree clauses,
 	}
     }
 
- out:
+  return true;
+}
+
+static struct cgraph_simd_clone *
+create_simd_clone_for_simd_or_variant (struct cgraph_node *node, tree attr,
+				       bool variant, bool *inbranch_specified)
+{
+  tree fn_decl = NULL_TREE;
+  tree device_clauses = NULL_TREE;
+  *inbranch_specified = false;
+
+  tree simd_clauses;
+  if (variant)
+    simd_clauses = find_variant_clauses (attr, &fn_decl, &device_clauses);
+  else
+    {
+      /* ATTR is currently pointing to 'omp declare simd', use TREE_VALUE to
+	 get the TREE_LIST with OMP_CLAUSE.  */
+      simd_clauses = TREE_VALUE (attr);
+      /* If SIMD_CLAUSES is not NULL_TREE, then it should be a TREE_LIST with
+	 OMP_CLAUSE inside.  */
+      if (simd_clauses)
+	simd_clauses = TREE_VALUE (simd_clauses);
+    }
+  auto_vec<tree> args;
+  simd_clone_vector_of_formal_parm_types (&args, node->decl);
+
+  int n = args.length ();
+  if (n > 0 && args.last () == void_type_node)
+    n--;
+
+  /* Allocate one more than needed just in case this is an in-branch
+     clone which will require a mask argument.  */
+  struct cgraph_simd_clone *clone_info = simd_clone_struct_alloc (n + 1);
+  clone_info->nargs = n;
+
+  if (!simd_clone_clauses_extract (node, clone_info, args,
+				   simd_clauses,
+				   inbranch_specified))
+    return NULL;
+
+  if (!clone_info)
+    return NULL;
+
+  clone_info->device = 0;
+  if (device_clauses)
+    {
+      while (device_clauses)
+	{
+	  tree identifier = TREE_PURPOSE (device_clauses);
+	  if (identifier == maybe_get_identifier ("isa"))
+	    {
+	      tree string_cst = TREE_VALUE (TREE_VALUE (device_clauses));
+	      const char * string_cst_p = TREE_STRING_POINTER (string_cst);
+	      clone_info->device
+		|= targetm.omp.device_kind_arch_isa (omp_device_isa,
+						    string_cst_p);
+	    }
+	  else if (identifier == maybe_get_identifier ("arch"))
+	    {
+	      tree string_cst = TREE_VALUE (TREE_VALUE (device_clauses));
+	      const char * string_cst_p = TREE_STRING_POINTER (string_cst);
+	      clone_info->device
+		|= targetm.omp.device_kind_arch_isa (omp_device_arch,
+						    string_cst_p);
+	    }
+	  device_clauses = TREE_CHAIN (device_clauses);
+	}
+    }
+
   if (TYPE_ATOMIC (TREE_TYPE (TREE_TYPE (node->decl))))
     {
       warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
@@ -473,6 +561,11 @@ simd_clone_clauses_extract (struct cgraph_node *node, tree clauses,
 	return NULL;
       }
 
+  if (variant)
+    clone_info->variant_name = DECL_NAME (TREE_PURPOSE (attr));
+  else
+    clone_info->variant_name = NULL_TREE;
+
   return clone_info;
 }
 
@@ -531,6 +624,9 @@ static tree
 simd_clone_mangle (struct cgraph_node *node,
 		   struct cgraph_simd_clone *clone_info)
 {
+  if (clone_info->variant_name)
+    return clone_info->variant_name;
+
   char vecsize_mangle = clone_info->vecsize_mangle;
   char mask = clone_info->inbranch ? 'M' : 'N';
   poly_uint64 simdlen = clone_info->simdlen;
@@ -1911,21 +2007,48 @@ simd_clone_adjust (struct cgraph_node *node)
   pop_cfun ();
 }
 
+tree
+get_simd_or_variant_attrs (tree attrs, bool *variant)
+{
+  tree attr = lookup_attribute ("omp declare simd", attrs);
+  *variant = false;
+  if (attr)
+    return attr;
+  attr = lookup_attribute ("omp declare variant base", attrs);
+  if (!attr)
+    return NULL_TREE;
+  /* Go through the 'omp declare variant base' and function declaration.  */
+  attr = TREE_VALUE (TREE_VALUE (attr));
+  attr = lookup_attribute ("construct", attr);
+  if (!attr)
+    return NULL_TREE;
+  /* Go through 'construct'.  */
+  attr = TREE_VALUE (attr);
+  attr = lookup_attribute ("simd", attr);
+  if (!attr)
+    return NULL_TREE;
+  *variant = true;
+  return TREE_VALUE (attrs);
+}
+
 /* If the function in NODE is tagged as an elemental SIMD function,
    create the appropriate SIMD clones.  */
 
 void
 expand_simd_clones (struct cgraph_node *node)
 {
-  tree attr;
+  tree attr, attrs;
+  bool variant = false;
   bool explicit_p = true;
 
   if (node->inlined_to
       || lookup_attribute ("noclone", DECL_ATTRIBUTES (node->decl)))
     return;
 
-  attr = lookup_attribute ("omp declare simd",
-			   DECL_ATTRIBUTES (node->decl));
+  attrs = DECL_ATTRIBUTES (node->decl);
+  attr = get_simd_or_variant_attrs (attrs, &variant);
+  if (variant)
+    explicit_p = false;
 
   /* See if we can add an "omp declare simd" directive implicitly
      before giving up.  */
@@ -1944,8 +2067,7 @@ expand_simd_clones (struct cgraph_node *node)
       && !oacc_get_fn_attrib (node->decl)
       && ok_for_auto_simd_clone (node))
     {
-      attr = tree_cons (get_identifier ("omp declare simd"), NULL,
-			DECL_ATTRIBUTES (node->decl));
+      attr = tree_cons (get_identifier ("omp declare simd"), NULL, attrs);
       DECL_ATTRIBUTES (node->decl) = attr;
       explicit_p = false;
     }
@@ -1970,8 +2092,9 @@ expand_simd_clones (struct cgraph_node *node)
       /* Start with parsing the "omp declare simd" attribute(s).  */
       bool inbranch_clause_specified;
       struct cgraph_simd_clone *clone_info
-	= simd_clone_clauses_extract (node, TREE_VALUE (attr),
-				      &inbranch_clause_specified);
+	= create_simd_clone_for_simd_or_variant (node, attr, variant,
+						 &inbranch_clause_specified);
+
       if (clone_info == NULL)
 	continue;
 
@@ -2070,7 +2193,8 @@ expand_simd_clones (struct cgraph_node *node)
 		     IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (n->decl)));
 	}
     }
-  while ((attr = lookup_attribute ("omp declare simd", TREE_CHAIN (attr))));
+  while ((attrs = TREE_CHAIN (attrs))
+	  && (attr = get_simd_or_variant_attrs (attrs, &variant)));
 }
 
 /* Entry point for IPA simd clone creation pass.  */
diff --git a/gcc/target.def b/gcc/target.def
index ffa12aa9023bb8f26a647a9848800c77f34afc67..ba14cc6da9dba5b6294c78c54b95b4622ea3139a 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1683,7 +1683,8 @@ DEFHOOK
 device trait set, return 0 if not present in any OpenMP context in the\n\
 whole translation unit, or -1 if not present in the current OpenMP context\n\
 but might be present in another OpenMP context in the same TU.",
-int, (enum omp_device_kind_arch_isa trait, const char *name), NULL)
+int, (enum omp_device_kind_arch_isa trait, const char *name),
+default_omp_device_kind_arch_isa)
 
 HOOK_VECTOR_END (omp)
 
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 860fb8ccbf1ab00c43dc4b4d32808c1f488406e4..2599d3fad0451e00df8fcf2ac1d6434d33fd9997 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -76,6 +76,8 @@ extern tree default_mangle_assembler_name (const char *);
 extern tree default_simd_clone_adjust_ret_or_param
   (struct cgraph_node *,tree , bool);
 
+extern int default_omp_device_kind_arch_isa
+  (omp_device_kind_arch_isa , const char *);
 
 extern machine_mode default_translate_mode_attribute (machine_mode);
 extern bool default_scalar_mode_supported_p (scalar_mode);
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index 4e54ceb0297828cf13e418dfa113651670a6f112..e6c65447fb7db4ff96108de98807a012d839bac8 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -408,6 +408,14 @@ default_simd_clone_adjust_ret_or_param (struct cgraph_node *node ATTRIBUTE_UNUSE
   return type;
 }
 
+int
+default_omp_device_kind_arch_isa (omp_device_kind_arch_isa trait, const char *name)
+{
+    if (trait == omp_device_kind)
+      return strncmp (name, "cpu", strlen ("cpu")) == 0;
+    return 0;
+}
+
 /* The default implementation of TARGET_TRANSLATE_MODE_ATTRIBUTE.  */
 
 machine_mode
diff --git a/gcc/testsuite/gcc.target/aarch64/declare-variant-1.c b/gcc/testsuite/gcc.target/aarch64/declare-variant-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..c44c9464f4e27047db9be5b0c9710ae3cfee8eee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/declare-variant-1.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fopenmp" } */
+
+#include "declare-variant-1.x"
+
+/* { dg-final { scan-assembler "_ZGVnN4v_callee" } } */
+/* { dg-final { scan-assembler "_ZGVnN8v_callee" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/declare-variant-1.x b/gcc/testsuite/gcc.target/aarch64/declare-variant-1.x
new file mode 100644
index 0000000000000000000000000000000000000000..61bcf8eff02e415a5044a7cbda8a593607fd0c56
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/declare-variant-1.x
@@ -0,0 +1,27 @@
+#if __ARM_FEATURE_SVE
+__SVInt16_t _ZGVsMxv_callee (__SVInt16_t, __SVBool_t);
+__SVInt16_t _ZGVsM8v_callee (__SVInt16_t, __SVBool_t);
+__SVInt16_t _ZGVsM16v_callee (__SVInt16_t, __SVBool_t);
+#endif
+__Int16x4_t _ZGVnN4v_callee (__Int16x4_t);
+__Int16x8_t _ZGVnN8v_callee (__Int16x8_t);
+#if __ARM_FEATURE_SVE
+#pragma omp declare variant(_ZGVsM16v_callee) \
+    match(construct = {simd(notinbranch, simdlen(16))}, device = {isa("sve")})
+#pragma omp declare variant(_ZGVsM8v_callee) \
+    match(construct = {simd(notinbranch, simdlen(8))}, device = {isa("sve")})
+#pragma omp declare variant(_ZGVsMxv_callee) \
+    match(construct = {simd(notinbranch)}, device = {isa("sve")})
+#endif
+#pragma omp declare variant(_ZGVnN4v_callee) \
+    match(construct = {simd(notinbranch, simdlen(4))}, device = {isa("simd")})
+#pragma omp declare variant(_ZGVnN8v_callee) \
+    match(construct = {simd(notinbranch, simdlen(8))}, device = {isa("simd")})
+extern short __attribute__ ((const)) callee (short);
+
+
+void caller_autovec (short * __restrict a, short *__restrict b, unsigned n)
+{
+  for (unsigned i = 0; i < n; ++i)
+    a[i] = callee (b[i]);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-1.c b/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..7a8129fe88ac9759b2337892a3d14f4e8196e61f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-1.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fopenmp" } */
+
+#include "../declare-variant-1.x"
+
+/* { dg-final { scan-assembler "_ZGVsMxv_callee" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-2.c b/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..2b6eabac76cf1cd059ec8d960ddd9e30973dc797
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-2.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fopenmp -msve-vector-bits=128" } */
+
+#include "../declare-variant-1.x"
+
+/* { dg-final { scan-assembler "_ZGVsM8v_callee" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-3.c b/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..e8b598fe479d7e1e92eb7f9e3413d5ac183626a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-3.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fopenmp -msve-vector-bits=256" } */
+
+#include "../declare-variant-1.x"
+
+/* { dg-final { scan-assembler "_ZGVsM16v_callee" } } */


* [RFC 6/X] omp: Allow creation of simd clones from omp declare variant with -fopenmp-simd flag
  2023-03-08 16:17 [RFC 0/X] Implement GCC support for AArch64 libmvec Andre Vieira (lists)
                   ` (4 preceding siblings ...)
  2023-03-08 16:26 ` [RFC 5/X] omp: Create simd clones from 'omp declare variant's Andre Vieira (lists)
@ 2023-03-08 16:28 ` Andre Vieira (lists)
  2023-04-20 14:51 ` [RFC 0/X] Implement GCC support for AArch64 libmvec Richard Sandiford
  6 siblings, 0 replies; 15+ messages in thread
From: Andre Vieira (lists) @ 2023-03-08 16:28 UTC (permalink / raw)
  To: gcc-patches; +Cc: jakub, Richard Sandiford, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 734 bytes --]

Hi,

This RFC proposes relaxing the flag needed to allow the creation of
simd clones from omp declare variants, such that we can use
-fopenmp-simd rather than -fopenmp.
This should only change the behaviour of omp simd clones and should not
enable any other openmp functionality, though I need to test this
further. For the time being I have just played around a bit with some of
the existing declare-variant tests.

Any objections to this in general? And/or ideas to properly test the 
effect of this on other omp codegen? My current plan is to have a look 
at the declare-variant tests we had before this patch series, locally 
modify them to pass -fopenmp-simd and make sure they fail the same way 
before and after this patch.
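
As a rough illustration of the reordering in the C parser hunk of the attached patch, here is a toy model of the keyword dispatch. The names classify_omp_declare and declare_action are hypothetical and do not exist in GCC, and the position of the "simd" and "target" checks is assumed from the surrounding diff context rather than shown in full by the patch.

```cpp
#include <cassert>
#include <cstring>

// Toy model only: none of these names exist in GCC.
enum class declare_action
{
  handle_simd,
  handle_reduction,
  handle_variant,
  handle_target,
  skip_pragma,
  parse_error
};

// Models the order of checks in c_parser_omp_declare after the patch.
// The caller is assumed to run with at least -fopenmp-simd; FLAG_OPENMP
// says whether full -fopenmp is enabled as well.
static declare_action
classify_omp_declare (const char *p, bool flag_openmp)
{
  assert (p != nullptr);
  if (std::strcmp (p, "simd") == 0)
    return declare_action::handle_simd;
  if (std::strcmp (p, "reduction") == 0)
    return declare_action::handle_reduction;
  // Moved above the !flag_openmp check, so -fopenmp-simd is enough.
  if (std::strcmp (p, "variant") == 0)
    return declare_action::handle_variant;
  // Everything below still needs full -fopenmp.
  if (!flag_openmp)
    return declare_action::skip_pragma;
  if (std::strcmp (p, "target") == 0)
    return declare_action::handle_target;
  return declare_action::parse_error;
}
```

In this model "variant" is accepted with flag_openmp == false, while "target" is still skipped, which is the behaviour the RFC is asking about.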

[-- Attachment #2: libmvec_6.patch --]
[-- Type: text/plain, Size: 4223 bytes --]

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 21bc3167ce224823c214efc064be399f2da9c787..b28e3d0a8adb520941dc3a17173cc07de4a653c5 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -23564,6 +23564,13 @@ c_parser_omp_declare (c_parser *parser, enum pragma_context context)
 	  c_parser_omp_declare_reduction (parser, context);
 	  return false;
 	}
+      if (strcmp (p, "variant") == 0)
+	{
+	  /* c_parser_consume_token (parser); done in
+	     c_parser_omp_declare_simd.  */
+	  c_parser_omp_declare_simd (parser, context);
+	  return true;
+	}
       if (!flag_openmp)  /* flag_openmp_simd  */
 	{
 	  c_parser_skip_to_pragma_eol (parser, false);
@@ -23575,13 +23582,6 @@ c_parser_omp_declare (c_parser *parser, enum pragma_context context)
 	  c_parser_omp_declare_target (parser);
 	  return false;
 	}
-      if (strcmp (p, "variant") == 0)
-	{
-	  /* c_parser_consume_token (parser); done in
-	     c_parser_omp_declare_simd.  */
-	  c_parser_omp_declare_simd (parser, context);
-	  return true;
-	}
     }
 
   c_parser_error (parser, "expected %<simd%>, %<reduction%>, "
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 1aa5f1a7898df9483a2af4f6f9fea99e6b219271..7bd32fd3e345a003be03d1e9acf33db76eed9460 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -8428,7 +8428,7 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
 	suppress_warning (decl, OPT_Winit_self);
     }
 
-  if (flag_openmp
+  if (flag_openmp_simd
       && TREE_CODE (decl) == FUNCTION_DECL
       /* #pragma omp declare variant on methods handled in finish_struct
 	 instead.  */
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 1a124f5395e018f3c4b2f9f36fcd42159d0b868f..d1c7f9d91d2546ad8f5674232a05f7d7726eeafe 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -47884,7 +47884,7 @@ cp_parser_omp_declare (cp_parser *parser, cp_token *pragma_tok,
 				      context, false);
 	  return true;
 	}
-      if (flag_openmp && strcmp (p, "variant") == 0)
+      if (strcmp (p, "variant") == 0)
 	{
 	  cp_lexer_consume_token (parser->lexer);
 	  cp_parser_omp_declare_simd (parser, pragma_tok,
diff --git a/gcc/testsuite/gcc.target/aarch64/declare-variant-1.c b/gcc/testsuite/gcc.target/aarch64/declare-variant-1.c
index c44c9464f4e27047db9be5b0c9710ae3cfee8eee..83eeadd108b5578623c63e73dea11b2b17a08618 100644
--- a/gcc/testsuite/gcc.target/aarch64/declare-variant-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/declare-variant-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fopenmp" } */
+/* { dg-options "-O3 -fopenmp-simd" } */
 
 #include "declare-variant-1.x"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-1.c b/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-1.c
index 7a8129fe88ac9759b2337892a3d14f4e8196e61f..616b0ed1c1dc019103dae504d2cec65523a35a3d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fopenmp" } */
+/* { dg-options "-O3 -fopenmp-simd" } */
 
 #include "../declare-variant-1.x"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-2.c b/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-2.c
index 2b6eabac76cf1cd059ec8d960ddd9e30973dc797..a832c5255306999b0006b68b1890c7f42c3dafb0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fopenmp -msve-vector-bits=128" } */
+/* { dg-options "-O3 -fopenmp-simd -msve-vector-bits=128" } */
 
 #include "../declare-variant-1.x"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-3.c b/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-3.c
index e8b598fe479d7e1e92eb7f9e3413d5ac183626a9..455c0338d4680d143daae666c29e4f018df5bff9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fopenmp -msve-vector-bits=256" } */
+/* { dg-options "-O3 -fopenmp-simd -msve-vector-bits=256" } */
 
 #include "../declare-variant-1.x"
 

* Re: [RFC 0/X] Implement GCC support for AArch64 libmvec
  2023-03-08 16:17 [RFC 0/X] Implement GCC support for AArch64 libmvec Andre Vieira (lists)
                   ` (5 preceding siblings ...)
  2023-03-08 16:28 ` [RFC 6/X] omp: Allow creation of simd clones from omp declare variant with -fopenmp-simd flag Andre Vieira (lists)
@ 2023-04-20 14:51 ` Richard Sandiford
  2023-04-20 15:22   ` Andre Vieira (lists)
  6 siblings, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2023-04-20 14:51 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, jakub, Richard Biener

"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> Hi all,
>
> This is a series of patches/RFCs to implement support in GCC to be able 
> to target AArch64's libmvec functions that will be/are being added to glibc.
> We have chosen to use the omp pragma '#pragma omp declare variant ...' 
> with a simd construct as the way for glibc to inform GCC what functions 
> are available.
>
> For example, if we would like to supply a vector version of the scalar 
> 'cosf' we would have an include file with something like:
> typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
> typedef __attribute__((__neon_vector_type__(2))) float __f32x2_t;
> typedef __SVFloat32_t __sv_f32_t;
> typedef __SVBool_t __sv_bool_t;
> __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
> __f32x2_t _ZGVnN2v_cosf (__f32x2_t);
> __sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
> #pragma omp declare variant(_ZGVnN4v_cosf) \
>      match(construct = {simd(notinbranch, simdlen(4))}, device = 
> {isa("simd")})
> #pragma omp declare variant(_ZGVnN2v_cosf) \
>      match(construct = {simd(notinbranch, simdlen(2))}, device = 
> {isa("simd")})
> #pragma omp declare variant(_ZGVsMxv_cosf) \
>      match(construct = {simd(inbranch)}, device = {isa("sve")})
> extern float cosf (float);
>
> The BETA ABI can be found in the vfabia64 subdir of 
> https://github.com/ARM-software/abi-aa/
> This currently disagrees with how this patch series implements 'omp 
> declare simd' for SVE and I also do not see a need for the 'omp declare 
> variant' scalable extension constructs. I will make changes to the ABI 
> once we've finalized the co-design of the ABI and this implementation.

I don't see a good reason for dropping the extension("scalable").
The problem is that since the base spec requires a simdlen clause,
GCC should in general raise an error if simdlen is omitted.
Relaxing that for an explicit extension seems better than doing it
only based on the ISA (which should in general be a free-form string).
Having "scalable" in the definition also helps to make the intent clearer.

Any change to the declare simd behaviour should probably be agreed
with the LLVM folks first.  Like you say, we already know that GCC
can do your version, since it already does the equivalent thing for x86.

I'm not sure, but I'm guessing the declare simd VFABI was written
that way because, at the time (several years ago), there were
concerns about switching SVE on and off on a function-by-function
basis in LLVM.

But I'm not sure it makes sense to ignore -msve-vector-bits= when
compiling the SVE version (which is what patch 4 seems to do).
If someone compiles with -march=armv8.4-a, we'll use all Armv8.4-A
features in the Advanced SIMD routines.  Why should we ignore
SVE-related target information for the SVE routines?

Of course, the fact that we take command-line options into account
means that omp simd/variant clauses on linkonce/comdat group functions
are an ODR violation waiting to happen.  But the same is true for the
original scalar functions that the clauses are attached to.

Thanks,
Richard

> The patch series has three main steps:
> 1) Add SVE support for 'omp declare simd', see PR 96342
> 2) Enable GCC to use omp declare variants with simd constructs as simd 
> clones during auto-vectorization.
> 3) Add SLP support for vectorizable_simd_clone_call (This sounded like a 
> nice thing to add as we want to move away from non-slp vectorization).
>
> Below you can see the list of current Patches/RFCs, the difference being 
> on how confident I am of the proposed changes. For the RFC I am hoping 
> to get early comments on the approach, rather than more indepth 
> code-reviews.
>
> I appreciate we are still in Stage 4, so I can completely understand if 
> you don't have time to review this now, but I thought it can't hurt to 
> post these early.
>
> Andre Vieira:
> [PATCH] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS
> [PATCH] parloops: Copy target and optimizations when creating a function 
> clone
> [PATCH] parloops: Allow poly nit and bound
> [RFC] omp, aarch64: Add SVE support for 'omp declare simd' [PR 96342]
> [RFC] omp: Create simd clones from 'omp declare variant's
> [RFC] omp: Allow creation of simd clones from omp declare variant with 
> -fopenmp-simd flag
>
> Work in progress:
> [RFC] vect: Enable SLP codegen for vectorizable_simd_clone_call

* Re: [PATCH 1/X] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS
  2023-03-08 16:20 ` [PATCH 1/X] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS Andre Vieira (lists)
@ 2023-04-20 15:20   ` Richard Sandiford
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Sandiford @ 2023-04-20 15:20 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, jakub, Richard Biener

"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> Hi,
>
> This patch replaces the uses of simd_clone_subparts with 
> TYPE_VECTOR_SUBPARTS and removes the definition of the first.
>
> gcc/ChangeLog:
>
>          * omp-simd-clone.cc (simd_clone_subparts): Remove.
>          (simd_clone_init_simd_arrays): Replace simd_clone_subparts with 
> TYPE_VECTOR_SUBPARTS.
>          (ipa_simd_modify_function_body): Likewise.
>          * tree-vect-stmts.cc (simd_clone_subparts): Remove.
>          (vectorizable_simd_clone_call): Replace simd_clone_subparts 
> with TYPE_VECTOR_SUBPARTS.
>
> diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
> index 0949b8ba288dfc7e7692403bfc600983faddf5dd..48b480e7556d9ad8e5502e10e513ec36b17b9cbb 100644
> --- a/gcc/omp-simd-clone.cc
> +++ b/gcc/omp-simd-clone.cc
> @@ -255,16 +255,6 @@ ok_for_auto_simd_clone (struct cgraph_node *node)
>    return true;
>  }
>  
> -
> -/* Return the number of elements in vector type VECTYPE, which is associated
> -   with a SIMD clone.  At present these always have a constant length.  */
> -
> -static unsigned HOST_WIDE_INT
> -simd_clone_subparts (tree vectype)
> -{
> -  return TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> -}
> -
>  /* Allocate a fresh `simd_clone' and return it.  NARGS is the number
>     of arguments to reserve space for.  */
>  
> @@ -1027,7 +1017,7 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
>  	    }
>  	  continue;
>  	}
> -      if (known_eq (simd_clone_subparts (TREE_TYPE (arg)),
> +      if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg)),
>  		    node->simdclone->simdlen))
>  	{
>  	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
> @@ -1039,7 +1029,7 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
>  	}
>        else
>  	{
> -	  unsigned int simdlen = simd_clone_subparts (TREE_TYPE (arg));
> +	  poly_uint64 simdlen = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg));
>  	  unsigned int times = vector_unroll_factor (node->simdclone->simdlen,
>  						     simdlen);
>  	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
> @@ -1225,9 +1215,9 @@ ipa_simd_modify_function_body (struct cgraph_node *node,
>  		  iter, NULL_TREE, NULL_TREE);
>        adjustments->register_replacement (&(*adjustments->m_adj_params)[j], r);
>  
> -      if (multiple_p (node->simdclone->simdlen, simd_clone_subparts (vectype)))
> +      if (multiple_p (node->simdclone->simdlen, TYPE_VECTOR_SUBPARTS (vectype)))
>  	j += vector_unroll_factor (node->simdclone->simdlen,
> -				   simd_clone_subparts (vectype)) - 1;
> +				   TYPE_VECTOR_SUBPARTS (vectype)) - 1;
>      }
>    adjustments->sort_replacements ();
>  
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index df6239a1c61c7213ad3c1468723bc1adf70bc02c..c85b6babc4bc5bc3111ef326dcc8f32bb25333f6 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3964,16 +3964,6 @@ vect_simd_lane_linear (tree op, class loop *loop,
>      }
>  }
>  
> -/* Return the number of elements in vector type VECTYPE, which is associated
> -   with a SIMD clone.  At present these vectors always have a constant
> -   length.  */
> -
> -static unsigned HOST_WIDE_INT
> -simd_clone_subparts (tree vectype)
> -{
> -  return TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> -}
> -
>  /* Function vectorizable_simd_clone_call.
>  
>     Check if STMT_INFO performs a function call that can be vectorized
> @@ -4251,7 +4241,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  							  slp_node);
>  	if (arginfo[i].vectype == NULL
>  	    || !constant_multiple_p (bestn->simdclone->simdlen,
> -				     simd_clone_subparts (arginfo[i].vectype)))
> +				     TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)))
>  	  return false;
>        }
>  
> @@ -4349,15 +4339,19 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  	    case SIMD_CLONE_ARG_TYPE_VECTOR:
>  	      atype = bestn->simdclone->args[i].vector_type;
>  	      o = vector_unroll_factor (nunits,
> -					simd_clone_subparts (atype));
> +					TYPE_VECTOR_SUBPARTS (atype));
>  	      for (m = j * o; m < (j + 1) * o; m++)
>  		{
> -		  if (simd_clone_subparts (atype)
> -		      < simd_clone_subparts (arginfo[i].vectype))
> +		  poly_uint64 atype_subparts = TYPE_VECTOR_SUBPARTS (atype);
> +		  poly_uint64 arginfo_subparts
> +		    = TYPE_VECTOR_SUBPARTS (arginfo[i].vectype);
> +		  if (known_lt (atype_subparts, arginfo_subparts))
>  		    {
>  		      poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (atype));
> -		      k = (simd_clone_subparts (arginfo[i].vectype)
> -			   / simd_clone_subparts (atype));
> +		      if (!constant_multiple_p (atype_subparts,
> +						arginfo_subparts, &k))
> +			gcc_unreachable ();
> +

Very minor, but I think it's conceptually cleaner to use the
constant_multiple_p as the if condition, rather than known_lt.
Then...

>  		      gcc_assert ((k & (k - 1)) == 0);
>  		      if (m == 0)
>  			{
> @@ -4387,8 +4381,9 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  		    }
>  		  else
>  		    {
> -		      k = (simd_clone_subparts (atype)
> -			   / simd_clone_subparts (arginfo[i].vectype));
> +		      if (!constant_multiple_p (arginfo_subparts,
> +						atype_subparts, &k))
> +			gcc_unreachable ();

...make this else conditional on constant_multiple_p too,
with a new final else that contains gcc_unreachable.
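
A minimal sketch of the control flow suggested above, using plain integers instead of poly_uint64. constant_multiple_p_sketch is a hypothetical stand-in for GCC's constant_multiple_p, and plan_vector_arg, arg_action and arg_plan are made-up names for illustration only; std::abort plays the role of gcc_unreachable.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical integer-only stand-in for GCC's constant_multiple_p:
// returns true if A == B * K for some integer K and stores it in *K.
static bool
constant_multiple_p_sketch (unsigned a, unsigned b, unsigned *k)
{
  if (b == 0 || a % b != 0)
    return false;
  *k = a / b;
  return true;
}

enum class arg_action { split, combine };

struct arg_plan
{
  arg_action action;
  unsigned k;
};

// ATYPE_SUBPARTS is the element count of the clone's argument type,
// ARGINFO_SUBPARTS the element count of the vector the loop provides.
static arg_plan
plan_vector_arg (unsigned atype_subparts, unsigned arginfo_subparts)
{
  unsigned k;
  if (constant_multiple_p_sketch (arginfo_subparts, atype_subparts, &k))
    {
      // The loop vector is K times wider: pass it in K pieces.
      assert ((k & (k - 1)) == 0);
      return { arg_action::split, k };
    }
  else if (constant_multiple_p_sketch (atype_subparts, arginfo_subparts, &k))
    {
      // The clone argument is K times wider: combine K loop vectors.
      assert ((k & (k - 1)) == 0);
      return { arg_action::combine, k };
    }
  else
    std::abort ();  // stands in for gcc_unreachable ()
}
```

Note that when both counts are equal, both tests succeed with k == 1 and this sketch takes the first branch, whereas the known_lt version would take the else branch. Which branch should own that case is left open here.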

>  		      gcc_assert ((k & (k - 1)) == 0);
>  		      vec<constructor_elt, va_gc> *ctor_elts;
>  		      if (k != 1)
> @@ -4522,7 +4517,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>        if (vec_dest)
>  	{
>  	  gcc_assert (ratype
> -		      || known_eq (simd_clone_subparts (rtype), nunits));
> +		      || known_eq (TYPE_VECTOR_SUBPARTS (rtype), nunits));
>  	  if (ratype)
>  	    new_temp = create_tmp_var (ratype);
>  	  else if (useless_type_conversion_p (vectype, rtype))
> @@ -4536,13 +4531,13 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  
>        if (vec_dest)
>  	{
> -	  if (!multiple_p (simd_clone_subparts (vectype), nunits))
> +	  if (!multiple_p (TYPE_VECTOR_SUBPARTS (vectype), nunits))
>  	    {
>  	      unsigned int k, l;
>  	      poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
>  	      poly_uint64 bytes = GET_MODE_SIZE (TYPE_MODE (vectype));
>  	      k = vector_unroll_factor (nunits,
> -					simd_clone_subparts (vectype));
> +					TYPE_VECTOR_SUBPARTS (vectype));
>  	      gcc_assert ((k & (k - 1)) == 0);
>  	      for (l = 0; l < k; l++)
>  		{
> @@ -4568,10 +4563,12 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  		vect_clobber_variable (vinfo, stmt_info, gsi, new_temp);
>  	      continue;
>  	    }
> -	  else if (!multiple_p (nunits, simd_clone_subparts (vectype)))
> +	  else if (!multiple_p (nunits, TYPE_VECTOR_SUBPARTS (vectype)))
>  	    {
> -	      unsigned int k = (simd_clone_subparts (vectype)
> -				/ simd_clone_subparts (rtype));
> +	      unsigned int k;
> +	      if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (vectype),
> +					TYPE_VECTOR_SUBPARTS (rtype), &k))
> +		gcc_unreachable ();

Suggest using vector_unroll_factor here too.

Thanks,
Richard

>  	      gcc_assert ((k & (k - 1)) == 0);
>  	      if ((j & (k - 1)) == 0)
>  		vec_alloc (ret_ctor_elts, k);
> @@ -4579,7 +4576,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  		{
>  		  unsigned int m, o;
>  		  o = vector_unroll_factor (nunits,
> -					    simd_clone_subparts (rtype));
> +					    TYPE_VECTOR_SUBPARTS (rtype));
>  		  for (m = 0; m < o; m++)
>  		    {
>  		      tree tem = build4 (ARRAY_REF, rtype, new_temp,

* Re: [RFC 0/X] Implement GCC support for AArch64 libmvec
  2023-04-20 14:51 ` [RFC 0/X] Implement GCC support for AArch64 libmvec Richard Sandiford
@ 2023-04-20 15:22   ` Andre Vieira (lists)
  2023-04-20 16:02     ` Jakub Jelinek
  2023-04-20 16:13     ` Richard Sandiford
  0 siblings, 2 replies; 15+ messages in thread
From: Andre Vieira (lists) @ 2023-04-20 15:22 UTC (permalink / raw)
  To: gcc-patches, jakub, Richard Biener, richard.sandiford



On 20/04/2023 15:51, Richard Sandiford wrote:
> "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
>> Hi all,
>>
>> This is a series of patches/RFCs to implement support in GCC to be able
>> to target AArch64's libmvec functions that will be/are being added to glibc.
>> We have chosen to use the omp pragma '#pragma omp declare variant ...'
>> with a simd construct as the way for glibc to inform GCC what functions
>> are available.
>>
>> For example, if we would like to supply a vector version of the scalar
>> 'cosf' we would have an include file with something like:
>> typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
>> typedef __attribute__((__neon_vector_type__(2))) float __f32x2_t;
>> typedef __SVFloat32_t __sv_f32_t;
>> typedef __SVBool_t __sv_bool_t;
>> __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
>> __f32x2_t _ZGVnN2v_cosf (__f32x2_t);
>> __sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
>> #pragma omp declare variant(_ZGVnN4v_cosf) \
>>       match(construct = {simd(notinbranch, simdlen(4))}, device =
>> {isa("simd")})
>> #pragma omp declare variant(_ZGVnN2v_cosf) \
>>       match(construct = {simd(notinbranch, simdlen(2))}, device =
>> {isa("simd")})
>> #pragma omp declare variant(_ZGVsMxv_cosf) \
>>       match(construct = {simd(inbranch)}, device = {isa("sve")})
>> extern float cosf (float);
>>
>> The BETA ABI can be found in the vfabia64 subdir of
>> https://github.com/ARM-software/abi-aa/
>> This currently disagrees with how this patch series implements 'omp
>> declare simd' for SVE and I also do not see a need for the 'omp declare
>> variant' scalable extension constructs. I will make changes to the ABI
>> once we've finalized the co-design of the ABI and this implementation.
> 
> I don't see a good reason for dropping the extension("scalable").
> The problem is that since the base spec requires a simdlen clause,
> GCC should in general raise an error if simdlen is omitted.
Where can you find this in the specs? I tried to find it but couldn't.

I assume leaving out simdlen in an 'omp declare simd' is OK, since our 
vector ABI defines behaviour for this. But I couldn't find what it means 
for an omp declare variant. It obviously can't be the same as for declare 
simd, as that is defined to mean 'define a set of clones' and only one 
clone can be associated to a declare variant.
> 
> But I'm not sure it makes sense to ignore -msve-vector-bits= when
> compiling the SVE version (which is what patch 4 seems to do).
> If someone compiles with -march=armv8.4-a, we'll use all Armv8.4-A
> features in the Advanced SIMD routines.  Why should we ignore
> SVE-related target information for the SVE routines?
Not sure I understand what you mean.  The vector ABI defines that if 
simdlen is omitted, an SVE VLA clone is available in addition to the 
NEON clones. So how would I take -msve-vector-bits into consideration? 
Do you mean I ought to add it as an option to pass to the function so 
that it gets used when doing the codegen for the clone (if a function 
body is available)?

This is where things get a bit iffy for me though... We purposefully 
generate an SVE simdclone regardless of command-line options, just like 
x86 does, so why would these options affect simd clone generation but 
not the actual availability of SVE? Just seems a bit odd...

A viable alternative would be to rely on declare variant for such 
behaviour, where we could use function attributes to pass specific 
target options to the variant's prototype to be able to add more 
specific tuning options per variant.  Not sure it will work but I can 
try it with my rebased patches at some point. I have to admit though, it 
is not a feature we are looking to use, so not sure it's worth the 
effort. The SVE simdclone codegen (with function bodies) is already 
pretty bad, so if we do believe there is a usecase for these, that might 
be something we should focus on before this sort of more specific tuning.
> 
> Of course, the fact that we take command-line options into account
> means that omp simd/variant clauses on linkonce/comdat group functions
> are an ODR violation waiting to happen.  But the same is true for the
> original scalar functions that the clauses are attached to.
Can't find proper definitions of linkonce/comdat group functions so 
can't comment.

> 
> Thanks,
> Richard
> 
>> The patch series has three main steps:
>> 1) Add SVE support for 'omp declare simd', see PR 96342
>> 2) Enable GCC to use omp declare variants with simd constructs as simd
>> clones during auto-vectorization.
>> 3) Add SLP support for vectorizable_simd_clone_call (This sounded like a
>> nice thing to add as we want to move away from non-slp vectorization).
>>
>> Below you can see the list of current Patches/RFCs, the difference being
>> on how confident I am of the proposed changes. For the RFC I am hoping
>> to get early comments on the approach, rather than more indepth
>> code-reviews.
>>
>> I appreciate we are still in Stage 4, so I can completely understand if
>> you don't have time to review this now, but I thought it can't hurt to
>> post these early.
>>
>> Andre Vieira:
>> [PATCH] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS
>> [PATCH] parloops: Copy target and optimizations when creating a function
>> clone
>> [PATCH] parloops: Allow poly nit and bound
>> [RFC] omp, aarch64: Add SVE support for 'omp declare simd' [PR 96342]
>> [RFC] omp: Create simd clones from 'omp declare variant's
>> [RFC] omp: Allow creation of simd clones from omp declare variant with
>> -fopenmp-simd flag
>>
>> Work in progress:
>> [RFC] vect: Enable SLP codegen for vectorizable_simd_clone_call

* Re: [RFC 0/X] Implement GCC support for AArch64 libmvec
  2023-04-20 15:22   ` Andre Vieira (lists)
@ 2023-04-20 16:02     ` Jakub Jelinek
  2023-04-20 16:13     ` Richard Sandiford
  1 sibling, 0 replies; 15+ messages in thread
From: Jakub Jelinek @ 2023-04-20 16:02 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard Biener, richard.sandiford

On Thu, Apr 20, 2023 at 04:22:50PM +0100, Andre Vieira (lists) wrote:
> > I don't see a good reason for dropping the extension("scalable").
> > The problem is that since the base spec requires a simdlen clause,
> > GCC should in general raise an error if simdlen is omitted.
> Where can you find this in the specs? I tried to find it but couldn't.
> 
> Leaving out simdlen in a 'omp declare simd' I assume is OK, our vector ABI
> defines behaviour for this. But I couldn't find what it meant for a omp
> declare variant, obviously can't be the same as for declare simd, as that is
> defined to mean 'define a set of clones' and only one clone can be
> associated to a declare variant.

For missing simdlen on omp declare simd, OpenMP 5.2 says [202:14-15]:
"If a SIMD version is created and the simdlen clause is not specified, the number of concurrent
arguments for the function is implementation defined."
Nobody says it must be a constant when not specified; when specified it has
to be a constant.
declare variant is function call specialization based on lots of different
aspects.  If you specify simd among construct selectors, then the
implementation is allowed (and kind of expected, but this is not currently
implemented in GCC) to change the calling convention based on the declare
simd ABIs.  But again, simdlen might be specified (then it has to have a
constant number in it) or not, and in the latter case I bet it is supposed
to be derived from the actual differences in the calling convention, i.e.
from which of them the variant matches.
But as I said, this part isn't implemented yet even on other targets.

	Jakub
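
To make the two declare simd cases above concrete, here is a small sketch with made-up function names (twice, thrice, sum_both). The pragmas only take effect when compiling with -fopenmp or -fopenmp-simd; without either flag they are ignored and the code runs as ordinary scalar code. How many lanes the first function gets is implementation defined, so nothing here depends on it.

```cpp
#include <cassert>

// No simdlen clause: the number of concurrent arguments of any SIMD
// version is implementation defined.
#pragma omp declare simd notinbranch
float
twice (float x)
{
  return 2.0f * x;
}

// simdlen clause present: its argument has to be a constant.
#pragma omp declare simd simdlen(4) notinbranch
float
thrice (float x)
{
  return 3.0f * x;
}

// A loop in which a vectorizer may (but does not have to) call SIMD
// versions of the two functions above.
float
sum_both (const float *in, int n)
{
  assert (n >= 0);
  float total = 0.0f;
#pragma omp simd reduction(+ : total)
  for (int i = 0; i < n; ++i)
    total += twice (in[i]) + thrice (in[i]);
  return total;
}
```

Whether the compiler actually emits or uses the clones depends on the target and optimization level; the scalar result is the same either way.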

* Re: [RFC 0/X] Implement GCC support for AArch64 libmvec
  2023-04-20 15:22   ` Andre Vieira (lists)
  2023-04-20 16:02     ` Jakub Jelinek
@ 2023-04-20 16:13     ` Richard Sandiford
  2023-04-21  9:28       ` Andre Vieira (lists)
  1 sibling, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2023-04-20 16:13 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, jakub, Richard Biener

"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> On 20/04/2023 15:51, Richard Sandiford wrote:
>> "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
>>> Hi all,
>>>
>>> This is a series of patches/RFCs to implement support in GCC to be able
>>> to target AArch64's libmvec functions that will be/are being added to glibc.
>>> We have chosen to use the omp pragma '#pragma omp declare variant ...'
>>> with a simd construct as the way for glibc to inform GCC what functions
>>> are available.
>>>
>>> For example, if we would like to supply a vector version of the scalar
>>> 'cosf' we would have an include file with something like:
>>> typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
>>> typedef __attribute__((__neon_vector_type__(2))) float __f32x2_t;
>>> typedef __SVFloat32_t __sv_f32_t;
>>> typedef __SVBool_t __sv_bool_t;
>>> __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
>>> __f32x2_t _ZGVnN2v_cosf (__f32x2_t);
>>> __sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
>>> #pragma omp declare variant(_ZGVnN4v_cosf) \
>>>       match(construct = {simd(notinbranch, simdlen(4))}, device =
>>> {isa("simd")})
>>> #pragma omp declare variant(_ZGVnN2v_cosf) \
>>>       match(construct = {simd(notinbranch, simdlen(2))}, device =
>>> {isa("simd")})
>>> #pragma omp declare variant(_ZGVsMxv_cosf) \
>>>       match(construct = {simd(inbranch)}, device = {isa("sve")})
>>> extern float cosf (float);
>>>
>>> The BETA ABI can be found in the vfabia64 subdir of
>>> https://github.com/ARM-software/abi-aa/
>>> This currently disagrees with how this patch series implements 'omp
>>> declare simd' for SVE and I also do not see a need for the 'omp declare
>>> variant' scalable extension constructs. I will make changes to the ABI
>>> once we've finalized the co-design of the ABI and this implementation.
>> 
>> I don't see a good reason for dropping the extension("scalable").
>> The problem is that since the base spec requires a simdlen clause,
>> GCC should in general raise an error if simdlen is omitted.
> Where can you find this in the specs? I tried to find it but couldn't.
>
> Leaving out simdlen in a 'omp declare simd' I assume is OK, our vector 
> ABI defines behaviour for this. But I couldn't find what it meant for a 
> omp declare variant, obviously can't be the same as for declare simd, as 
> that is defined to mean 'define a set of clones' and only one clone can 
> be associated to a declare variant.

I was going from https://www.openmp.org/spec-html/5.0/openmpsu25.html ,
which says:

  The simd trait can be further defined with properties that match the
  clauses accepted by the declare simd directive with the same name and
  semantics. The simd trait must define at least the simdlen property and
  one of the inbranch or notinbranch properties.

(probably best to read it in the original -- it's almost incomprehensible
without markup)

Richard
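
The rule quoted above, together with the relaxation discussed earlier in the thread for an explicit extension("scalable"), can be written down as a small check. This is only a sketch of the idea under discussion, not GCC behaviour: simd_trait_sketch and simd_trait_ok are hypothetical names, reading "one of" as "exactly one" is an assumption, and so is accepting the scalable extension in place of simdlen.

```cpp
// Hypothetical description of the properties given to a simd trait in a
// declare variant match clause.
struct simd_trait_sketch
{
  bool has_simdlen;         // simdlen(N) was given
  bool inbranch;            // inbranch was given
  bool notinbranch;         // notinbranch was given
  bool scalable_extension;  // extension("scalable") was given (assumed spelling)
};

// True if the trait satisfies the quoted OpenMP 5.0 rule, relaxed so that
// an explicit scalable extension may stand in for simdlen.
constexpr bool
simd_trait_ok (const simd_trait_sketch &t)
{
  return (t.inbranch != t.notinbranch)
	 && (t.has_simdlen || t.scalable_extension);
}
```

Under this reading, simd(notinbranch, simdlen(4)) passes, a bare simd(inbranch) is rejected, and simd(inbranch) combined with the scalable extension passes.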

* Re: [RFC 0/X] Implement GCC support for AArch64 libmvec
  2023-04-20 16:13     ` Richard Sandiford
@ 2023-04-21  9:28       ` Andre Vieira (lists)
  2023-04-21  9:54         ` Richard Sandiford
  0 siblings, 1 reply; 15+ messages in thread
From: Andre Vieira (lists) @ 2023-04-21  9:28 UTC (permalink / raw)
  To: gcc-patches, jakub, Richard Biener, richard.sandiford



On 20/04/2023 17:13, Richard Sandiford wrote:
> "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
>> On 20/04/2023 15:51, Richard Sandiford wrote:
>>> "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
>>>> Hi all,
>>>>
>>>> This is a series of patches/RFCs to implement support in GCC to be able
>>>> to target AArch64's libmvec functions that will be/are being added to glibc.
>>>> We have chosen to use the omp pragma '#pragma omp declare variant ...'
>>>> with a simd construct as the way for glibc to inform GCC what functions
>>>> are available.
>>>>
>>>> For example, if we would like to supply a vector version of the scalar
>>>> 'cosf' we would have an include file with something like:
>>>> typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
>>>> typedef __attribute__((__neon_vector_type__(2))) float __f32x2_t;
>>>> typedef __SVFloat32_t __sv_f32_t;
>>>> typedef __SVBool_t __sv_bool_t;
>>>> __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
>>>> __f32x2_t _ZGVnN2v_cosf (__f32x2_t);
>>>> __sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
>>>> #pragma omp declare variant(_ZGVnN4v_cosf) \
>>>>        match(construct = {simd(notinbranch, simdlen(4))}, device =
>>>> {isa("simd")})
>>>> #pragma omp declare variant(_ZGVnN2v_cosf) \
>>>>        match(construct = {simd(notinbranch, simdlen(2))}, device =
>>>> {isa("simd")})
>>>> #pragma omp declare variant(_ZGVsMxv_cosf) \
>>>>        match(construct = {simd(inbranch)}, device = {isa("sve")})
>>>> extern float cosf (float);
>>>>
>>>> The BETA ABI can be found in the vfabia64 subdir of
>>>> https://github.com/ARM-software/abi-aa/
>>>> This currently disagrees with how this patch series implements 'omp
>>>> declare simd' for SVE and I also do not see a need for the 'omp declare
>>>> variant' scalable extension constructs. I will make changes to the ABI
>>>> once we've finalized the co-design of the ABI and this implementation.
>>>
>>> I don't see a good reason for dropping the extension("scalable").
>>> The problem is that since the base spec requires a simdlen clause,
>>> GCC should in general raise an error if simdlen is omitted.
>> Where can you find this in the specs? I tried to find it but couldn't.
>>
>> Leaving out simdlen in a 'omp declare simd' I assume is OK, our vector
>> ABI defines behaviour for this. But I couldn't find what it meant for a
>> omp declare variant, obviously can't be the same as for declare simd, as
>> that is defined to mean 'define a set of clones' and only one clone can
>> be associated to a declare variant.
> 
> I was going from https://www.openmp.org/spec-html/5.0/openmpsu25.html ,
> which says:
> 
>    The simd trait can be further defined with properties that match the
>    clauses accepted by the declare simd directive with the same name and
>    semantics. The simd trait must define at least the simdlen property and
>    one of the inbranch or notinbranch properties.
> 
> (probably best to read it in the original -- it's almost incomprehensible
> without markup)
> 
I'm guessing the keyword here is 'trait', which is different from an omp
declare simd directive; that would explain why an omp declare simd is not
required to have a simdlen clause (see Jakub's comment).

But for declare variant I guess the simd trait does require one? That doesn't
'break' anything; it just means I need to add support for parsing the
extension clause, as was originally planned.
> Richard


* Re: [RFC 0/X] Implement GCC support for AArch64 libmvec
  2023-04-21  9:28       ` Andre Vieira (lists)
@ 2023-04-21  9:54         ` Richard Sandiford
  2023-04-21 10:28           ` Jakub Jelinek
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2023-04-21  9:54 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, jakub, Richard Biener

"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> On 20/04/2023 17:13, Richard Sandiford wrote:
>> "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
>>> On 20/04/2023 15:51, Richard Sandiford wrote:
>>>> "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
>>>>> Hi all,
>>>>>
>>>>> This is a series of patches/RFCs to implement support in GCC to be able
>>>>> to target AArch64's libmvec functions that will be/are being added to glibc.
>>>>> We have chosen to use the omp pragma '#pragma omp declare variant ...'
>>>>> with a simd construct as the way for glibc to inform GCC what functions
>>>>> are available.
>>>>>
>>>>> For example, if we would like to supply a vector version of the scalar
>>>>> 'cosf' we would have an include file with something like:
>>>>> typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
>>>>> typedef __attribute__((__neon_vector_type__(2))) float __f32x2_t;
>>>>> typedef __SVFloat32_t __sv_f32_t;
>>>>> typedef __SVBool_t __sv_bool_t;
>>>>> __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
>>>>> __f32x2_t _ZGVnN2v_cosf (__f32x2_t);
>>>>> __sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
>>>>> #pragma omp declare variant(_ZGVnN4v_cosf) \
>>>>>        match(construct = {simd(notinbranch, simdlen(4))}, \
>>>>>              device = {isa("simd")})
>>>>> #pragma omp declare variant(_ZGVnN2v_cosf) \
>>>>>        match(construct = {simd(notinbranch, simdlen(2))}, \
>>>>>              device = {isa("simd")})
>>>>> #pragma omp declare variant(_ZGVsMxv_cosf) \
>>>>>        match(construct = {simd(inbranch)}, device = {isa("sve")})
>>>>> extern float cosf (float);
>>>>>
>>>>> The BETA ABI can be found in the vfabia64 subdir of
>>>>> https://github.com/ARM-software/abi-aa/
>>>>> This currently disagrees with how this patch series implements 'omp
>>>>> declare simd' for SVE and I also do not see a need for the 'omp declare
>>>>> variant' scalable extension constructs. I will make changes to the ABI
>>>>> once we've finalized the co-design of the ABI and this implementation.
>>>>
>>>> I don't see a good reason for dropping the extension("scalable").
>>>> The problem is that since the base spec requires a simdlen clause,
>>>> GCC should in general raise an error if simdlen is omitted.
>>> Where can you find this in the specs? I tried to find it but couldn't.
>>>
>>> Leaving out simdlen in a 'omp declare simd' I assume is OK, our vector
>>> ABI defines behaviour for this. But I couldn't find what it meant for a
>>> omp declare variant, obviously can't be the same as for declare simd, as
>>> that is defined to mean 'define a set of clones' and only one clone can
>>> be associated to a declare variant.
>> 
>> I was going from https://www.openmp.org/spec-html/5.0/openmpsu25.html ,
>> which says:
>> 
>>    The simd trait can be further defined with properties that match the
>>    clauses accepted by the declare simd directive with the same name and
>>    semantics. The simd trait must define at least the simdlen property and
>>    one of the inbranch or notinbranch properties.
>> 
>> (probably best to read it in the original -- it's almost incomprehensible
>> without markup)
>> 
> I'm guessing the keyword here is 'trait' which I'm guessing is different 
> from a omp declare simd directive, which is why it's not required to 
> have a simdlen clause in an omp declare simd (see Jakub's comment).

Sure.  The thread above is about whether we need extension("scalable")
or should drop it.  And extension("scalable") is only used in omp
declare variant.  This was in response to "I also do not see a need
for the 'omp declare variant' scalable extension constructs".

Not having a simdlen on an omp declare simd is of course OK (and the
VFABI defines behaviour for that case).
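
To make the argument concrete, here is a minimal sketch of the check being
discussed, based only on the OpenMP 5.0 text quoted above.  The struct and
function names are hypothetical and are not GCC internals; treating
extension("scalable") as the one thing that lifts the simdlen requirement is
an assumption about how the proposal would work, not settled behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the properties written in a simd trait of an
   'omp declare variant' context selector.  Not a GCC data structure.  */
struct simd_trait
{
  bool has_simdlen;   /* simdlen(N) was given */
  bool inbranch;      /* inbranch was given */
  bool notinbranch;   /* notinbranch was given */
  bool scalable_ext;  /* assumed: extension("scalable") is in the selector */
};

/* Returns true if the trait would be accepted under the quoted rule:
   one of inbranch/notinbranch must be present, and simdlen must be
   present unless the (assumed) scalable extension stands in for it.
   The sketch does not decide whether simdlen plus the extension
   together should be rejected.  */
static bool
simd_trait_ok (const struct simd_trait *t)
{
  if (t->inbranch == t->notinbranch)
    return false;
  if (t->has_simdlen)
    return true;
  return t->scalable_ext;
}
```

Under this model the SVE variant from the original example, which has
simd(inbranch) and no simdlen, is rejected unless the extension is present,
while the two Advanced SIMD variants are accepted as written.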

Richard


* Re: [RFC 0/X] Implement GCC support for AArch64 libmvec
  2023-04-21  9:54         ` Richard Sandiford
@ 2023-04-21 10:28           ` Jakub Jelinek
  0 siblings, 0 replies; 15+ messages in thread
From: Jakub Jelinek @ 2023-04-21 10:28 UTC (permalink / raw)
  To: Andre Vieira (lists), gcc-patches, Richard Biener, richard.sandiford

On Fri, Apr 21, 2023 at 10:54:51AM +0100, Richard Sandiford wrote:
> > I'm guessing the keyword here is 'trait' which I'm guessing is different 
> > from a omp declare simd directive, which is why it's not required to 
> > have a simdlen clause in an omp declare simd (see Jakub's comment).
> 
> Sure.  The thread above is about whether we need extension("scalable")
> or should drop it.  And extension("scalable") is only used in omp
> declare variant.  This was in response to "I also do not see a need
> for the 'omp declare variant' scalable extension constructs".

I'm not sure extension("scalable") in context selectors is what you want
to handle declare variant.  While extension trait is allowed and it is
implementation defined what is accepted as its arguments (within the
boundaries of allowed syntax), in this case you really want to adjust
behavior of the simd trait, so it would be better to specify you want
scalable simdlen using a simd trait property.

There will be an OpenMP F2F meeting next month; I think this should be
discussed there and agreed on how to do this.  After all, it seems ARM
won't be the only architecture that needs it; RISC-V might be another.
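
For reference, the vector function names quoted earlier in the thread already
carry the length information in their mangling: after the _ZGV prefix comes an
ISA letter ('n' for Advanced SIMD, 's' for SVE), a mask letter ('N' unmasked,
'M' masked), then either a lane count or 'x' for a scalable length.  A minimal
sketch of decoding that, assuming only the simple forms used in this thread
(the struct and function names are hypothetical, and parameter letters are
counted but not interpreted):

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type, not part of GCC or glibc.  */
struct vfabi_name
{
  char isa;                 /* 'n' = Advanced SIMD, 's' = SVE */
  bool masked;              /* true for 'M', false for 'N' */
  bool scalable;            /* true when the length field is 'x' */
  unsigned simdlen;         /* lane count, 0 when scalable */
  size_t param_count;       /* number of parameter letters (simple forms) */
  const char *scalar_name;  /* points into the input string */
};

/* Decode names such as _ZGVnN4v_cosf or _ZGVsMxv_cosf.
   Returns false if the name does not fit the simple pattern.  */
static bool
vfabi_decode (const char *s, struct vfabi_name *out)
{
  if (strncmp (s, "_ZGV", 4) != 0)
    return false;
  s += 4;
  if (*s != 'n' && *s != 's')
    return false;
  out->isa = *s++;
  if (*s != 'M' && *s != 'N')
    return false;
  out->masked = (*s++ == 'M');
  if (*s == 'x')
    {
      out->scalable = true;
      out->simdlen = 0;
      s++;
    }
  else
    {
      char *end;
      unsigned long n;
      if (!isdigit ((unsigned char) *s))
        return false;
      n = strtoul (s, &end, 10);
      if (n == 0)
        return false;
      out->scalable = false;
      out->simdlen = (unsigned) n;
      s = end;
    }
  /* Everything up to the next underscore is the parameter list.  */
  const char *us = strchr (s, '_');
  if (us == NULL || us == s || us[1] == '\0')
    return false;
  out->param_count = (size_t) (us - s);
  out->scalar_name = us + 1;
  return true;
}
```

Decoding _ZGVsMxv_cosf this way gives a masked SVE variant with no fixed lane
count, which is exactly the case where the context selector has no simdlen
value to state.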

	Jakub



Thread overview: 15+ messages
2023-03-08 16:17 [RFC 0/X] Implement GCC support for AArch64 libmvec Andre Vieira (lists)
2023-03-08 16:20 ` [PATCH 1/X] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS Andre Vieira (lists)
2023-04-20 15:20   ` Richard Sandiford
2023-03-08 16:21 ` [PATCH 2/X] parloops: Copy target and optimizations when creating a function clone Andre Vieira (lists)
2023-03-08 16:23 ` [PATCH 3/X] parloops: Allow poly number of iterations Andre Vieira (lists)
2023-03-08 16:25 ` [RFC 4/X] omp, aarch64: Add SVE support for 'omp declare simd' [PR 96342] Andre Vieira (lists)
2023-03-08 16:26 ` [RFC 5/X] omp: Create simd clones from 'omp declare variant's Andre Vieira (lists)
2023-03-08 16:28 ` [RFC 6/X] omp: Allow creation of simd clones from omp declare variant with -fopenmp-simd flag Andre Vieira (lists)
2023-04-20 14:51 ` [RFC 0/X] Implement GCC support for AArch64 libmvec Richard Sandiford
2023-04-20 15:22   ` Andre Vieira (lists)
2023-04-20 16:02     ` Jakub Jelinek
2023-04-20 16:13     ` Richard Sandiford
2023-04-21  9:28       ` Andre Vieira (lists)
2023-04-21  9:54         ` Richard Sandiford
2023-04-21 10:28           ` Jakub Jelinek
