public inbox for gcc-patches@gcc.gnu.org
* [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types
@ 2023-11-16 15:26 Christophe Lyon
  2023-11-16 15:26 ` [PATCH 2/6] arm: [MVE intrinsics] Add support for void and load/store pointers as argument types Christophe Lyon
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Christophe Lyon @ 2023-11-16 15:26 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford, richard.earnshaw, kyrylo.tkachov
  Cc: Christophe Lyon

So far we define arm_simd_types and scalar_types using mode-based type
nodes such as intSI_type_node.

This is causing problems with later patches which re-implement
load/store MVE intrinsics, leading to error messages such as:
  error: passing argument 1 of 'vst1q_s32' from incompatible pointer type
  note: expected 'int *' but argument is of type 'int32_t *' {aka 'long int *'}

This patch uses get_typenode_from_name (INT32_TYPE) etc. instead, which
yields the types as defined by the target and its C library.

2023-11-16  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/arm-builtins.cc (arm_init_simd_builtin_types): Fix
	initialization of arm_simd_types[].eltype.
	* config/arm/arm-mve-builtins.def (DEF_MVE_TYPE): Fix scalar
	types.
---
 gcc/config/arm/arm-builtins.cc      | 28 ++++++++++++++--------------
 gcc/config/arm/arm-mve-builtins.def | 16 ++++++++--------
 2 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index fca7dcaf565..dd9c5815c45 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -1580,20 +1580,20 @@ arm_init_simd_builtin_types (void)
       TYPE_STRING_FLAG (arm_simd_polyHI_type_node) = false;
     }
   /* Init all the element types built by the front-end.  */
-  arm_simd_types[Int8x8_t].eltype = intQI_type_node;
-  arm_simd_types[Int8x16_t].eltype = intQI_type_node;
-  arm_simd_types[Int16x4_t].eltype = intHI_type_node;
-  arm_simd_types[Int16x8_t].eltype = intHI_type_node;
-  arm_simd_types[Int32x2_t].eltype = intSI_type_node;
-  arm_simd_types[Int32x4_t].eltype = intSI_type_node;
-  arm_simd_types[Int64x2_t].eltype = intDI_type_node;
-  arm_simd_types[Uint8x8_t].eltype = unsigned_intQI_type_node;
-  arm_simd_types[Uint8x16_t].eltype = unsigned_intQI_type_node;
-  arm_simd_types[Uint16x4_t].eltype = unsigned_intHI_type_node;
-  arm_simd_types[Uint16x8_t].eltype = unsigned_intHI_type_node;
-  arm_simd_types[Uint32x2_t].eltype = unsigned_intSI_type_node;
-  arm_simd_types[Uint32x4_t].eltype = unsigned_intSI_type_node;
-  arm_simd_types[Uint64x2_t].eltype = unsigned_intDI_type_node;
+  arm_simd_types[Int8x8_t].eltype = get_typenode_from_name (INT8_TYPE);
+  arm_simd_types[Int8x16_t].eltype = get_typenode_from_name (INT8_TYPE);
+  arm_simd_types[Int16x4_t].eltype = get_typenode_from_name (INT16_TYPE);
+  arm_simd_types[Int16x8_t].eltype = get_typenode_from_name (INT16_TYPE);
+  arm_simd_types[Int32x2_t].eltype = get_typenode_from_name (INT32_TYPE);
+  arm_simd_types[Int32x4_t].eltype = get_typenode_from_name (INT32_TYPE);
+  arm_simd_types[Int64x2_t].eltype = get_typenode_from_name (INT64_TYPE);
+  arm_simd_types[Uint8x8_t].eltype = get_typenode_from_name (UINT8_TYPE);
+  arm_simd_types[Uint8x16_t].eltype = get_typenode_from_name (UINT8_TYPE);
+  arm_simd_types[Uint16x4_t].eltype = get_typenode_from_name (UINT16_TYPE);
+  arm_simd_types[Uint16x8_t].eltype = get_typenode_from_name (UINT16_TYPE);
+  arm_simd_types[Uint32x2_t].eltype = get_typenode_from_name (UINT32_TYPE);
+  arm_simd_types[Uint32x4_t].eltype = get_typenode_from_name (UINT32_TYPE);
+  arm_simd_types[Uint64x2_t].eltype = get_typenode_from_name (UINT64_TYPE);
 
   /* Note: poly64x2_t is defined in arm_neon.h, to ensure it gets default
      mangling.  */
diff --git a/gcc/config/arm/arm-mve-builtins.def b/gcc/config/arm/arm-mve-builtins.def
index e2cf1baf370..a901d8231e9 100644
--- a/gcc/config/arm/arm-mve-builtins.def
+++ b/gcc/config/arm/arm-mve-builtins.def
@@ -39,14 +39,14 @@ DEF_MVE_MODE (r, none, none, none)
 
 #define REQUIRES_FLOAT false
 DEF_MVE_TYPE (mve_pred16_t, boolean_type_node)
-DEF_MVE_TYPE (uint8x16_t, unsigned_intQI_type_node)
-DEF_MVE_TYPE (uint16x8_t, unsigned_intHI_type_node)
-DEF_MVE_TYPE (uint32x4_t, unsigned_intSI_type_node)
-DEF_MVE_TYPE (uint64x2_t, unsigned_intDI_type_node)
-DEF_MVE_TYPE (int8x16_t, intQI_type_node)
-DEF_MVE_TYPE (int16x8_t, intHI_type_node)
-DEF_MVE_TYPE (int32x4_t, intSI_type_node)
-DEF_MVE_TYPE (int64x2_t, intDI_type_node)
+DEF_MVE_TYPE (uint8x16_t, get_typenode_from_name (UINT8_TYPE))
+DEF_MVE_TYPE (uint16x8_t, get_typenode_from_name (UINT16_TYPE))
+DEF_MVE_TYPE (uint32x4_t, get_typenode_from_name (UINT32_TYPE))
+DEF_MVE_TYPE (uint64x2_t, get_typenode_from_name (UINT64_TYPE))
+DEF_MVE_TYPE (int8x16_t, get_typenode_from_name (INT8_TYPE))
+DEF_MVE_TYPE (int16x8_t, get_typenode_from_name (INT16_TYPE))
+DEF_MVE_TYPE (int32x4_t, get_typenode_from_name (INT32_TYPE))
+DEF_MVE_TYPE (int64x2_t, get_typenode_from_name (INT64_TYPE))
 #undef REQUIRES_FLOAT
 
 #define REQUIRES_FLOAT true
-- 
2.34.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/6] arm: [MVE intrinsics] Add support for void and load/store pointers as argument types.
  2023-11-16 15:26 [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Christophe Lyon
@ 2023-11-16 15:26 ` Christophe Lyon
  2023-11-16 16:47   ` Kyrylo Tkachov
  2023-11-16 15:26 ` [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores Christophe Lyon
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Christophe Lyon @ 2023-11-16 15:26 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford, richard.earnshaw, kyrylo.tkachov
  Cc: Christophe Lyon

This patch adds support for the '_', 'al' and 'as' format specifiers,
which denote void, load-pointer and store-pointer argument/return types
in intrinsic signatures.

It also adds a new memory_scalar_type () helper to function_instance,
which is used by 'al' and 'as'.

2023-11-16  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/arm-mve-builtins-shapes.cc (build_const_pointer):
	New.
	(parse_type): Add support for '_', 'al' and 'as'.
	* config/arm/arm-mve-builtins.h (function_instance): Add
	memory_scalar_type.
	(function_base): Likewise.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 25 +++++++++++++++++++++++
 gcc/config/arm/arm-mve-builtins.h         | 17 +++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 23eb9d0e69b..ce87ebcef30 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -39,6 +39,13 @@
 
 namespace arm_mve {
 
+/* Return a representation of "const T *".  */
+static tree
+build_const_pointer (tree t)
+{
+  return build_pointer_type (build_qualified_type (t, TYPE_QUAL_CONST));
+}
+
 /* If INSTANCE has a predicate, add it to the list of argument types
    in ARGUMENT_TYPES.  RETURN_TYPE is the type returned by the
    function.  */
@@ -140,6 +147,9 @@ parse_element_type (const function_instance &instance, const char *&format)
 /* Read and return a type from FORMAT for function INSTANCE.  Advance
    FORMAT beyond the type string.  The format is:
 
+   _       - void
+   al      - array pointer for loads
+   as      - array pointer for stores
    p       - predicates with type mve_pred16_t
    s<elt>  - a scalar type with the given element suffix
    t<elt>  - a vector or tuple type with given element suffix [*1]
@@ -156,6 +166,21 @@ parse_type (const function_instance &instance, const char *&format)
 {
   int ch = *format++;
 
+
+  if (ch == '_')
+    return void_type_node;
+
+  if (ch == 'a')
+    {
+      ch = *format++;
+      if (ch == 'l')
+	return build_const_pointer (instance.memory_scalar_type ());
+      if (ch == 's') {
+	return build_pointer_type (instance.memory_scalar_type ());
+      }
+      gcc_unreachable ();
+    }
+
   if (ch == 'p')
     return get_mve_pred16_t ();
 
diff --git a/gcc/config/arm/arm-mve-builtins.h b/gcc/config/arm/arm-mve-builtins.h
index 37b8223dfb2..4fd230fe4c7 100644
--- a/gcc/config/arm/arm-mve-builtins.h
+++ b/gcc/config/arm/arm-mve-builtins.h
@@ -277,6 +277,7 @@ public:
   bool could_trap_p () const;
 
   unsigned int vectors_per_tuple () const;
+  tree memory_scalar_type () const;
 
   const mode_suffix_info &mode_suffix () const;
 
@@ -519,6 +520,14 @@ public:
      of vectors in the tuples, otherwise return 1.  */
   virtual unsigned int vectors_per_tuple () const { return 1; }
 
+  /* If the function addresses memory, return the type of a single
+     scalar memory element.  */
+  virtual tree
+  memory_scalar_type (const function_instance &) const
+  {
+    gcc_unreachable ();
+  }
+
   /* Try to fold the given gimple call.  Return the new gimple statement
      on success, otherwise return null.  */
   virtual gimple *fold (gimple_folder &) const { return NULL; }
@@ -644,6 +653,14 @@ function_instance::vectors_per_tuple () const
   return base->vectors_per_tuple ();
 }
 
+/* If the function addresses memory, return the type of a single
+   scalar memory element.  */
+inline tree
+function_instance::memory_scalar_type () const
+{
+  return base->memory_scalar_type (*this);
+}
+
 /* Return information about the function's mode suffix.  */
 inline const mode_suffix_info &
 function_instance::mode_suffix () const
-- 
2.34.1



* [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores
  2023-11-16 15:26 [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Christophe Lyon
  2023-11-16 15:26 ` [PATCH 2/6] arm: [MVE intrinsics] Add support for void and load/store pointers as argument types Christophe Lyon
@ 2023-11-16 15:26 ` Christophe Lyon
  2023-11-16 16:48   ` Kyrylo Tkachov
  2023-11-23 13:29   ` Jan-Benedict Glaw
  2023-11-16 15:26 ` [PATCH 4/6] arm: [MVE intrinsics] add load and store shapes Christophe Lyon
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 15+ messages in thread
From: Christophe Lyon @ 2023-11-16 15:26 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford, richard.earnshaw, kyrylo.tkachov
  Cc: Christophe Lyon

This patch adds base support for load/store intrinsics to the
framework, starting with loads and stores of contiguous memory
elements, without extension or truncation.

Compared to the aarch64/SVE implementation, there's no support for
gather/scatter loads/stores yet.  This will be added later as needed.

2023-11-16  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/arm-mve-builtins-functions.h (multi_vector_function)
	(full_width_access): New classes.
	* config/arm/arm-mve-builtins.cc
	(find_type_suffix_for_scalar_type, infer_pointer_type)
	(require_pointer_type, get_contiguous_base, add_mem_operand)
	(add_fixed_operand, use_contiguous_load_insn)
	(use_contiguous_store_insn): New.
	* config/arm/arm-mve-builtins.h (memory_vector_mode)
	(infer_pointer_type, require_pointer_type, get_contiguous_base)
	(add_mem_operand)
	(add_fixed_operand, use_contiguous_load_insn)
	(use_contiguous_store_insn): New.
---
 gcc/config/arm/arm-mve-builtins-functions.h |  56 ++++++++++
 gcc/config/arm/arm-mve-builtins.cc          | 116 ++++++++++++++++++++
 gcc/config/arm/arm-mve-builtins.h           |  28 ++++-
 3 files changed, 199 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm-mve-builtins-functions.h b/gcc/config/arm/arm-mve-builtins-functions.h
index eba1f071af0..6d234a2dd7c 100644
--- a/gcc/config/arm/arm-mve-builtins-functions.h
+++ b/gcc/config/arm/arm-mve-builtins-functions.h
@@ -966,6 +966,62 @@ public:
   }
 };
 
+/* A function_base that sometimes or always operates on tuples of
+   vectors.  */
+class multi_vector_function : public function_base
+{
+public:
+  CONSTEXPR multi_vector_function (unsigned int vectors_per_tuple)
+    : m_vectors_per_tuple (vectors_per_tuple) {}
+
+  unsigned int
+  vectors_per_tuple () const override
+  {
+    return m_vectors_per_tuple;
+  }
+
+  /* The number of vectors in a tuple, or 1 if the function only operates
+     on single vectors.  */
+  unsigned int m_vectors_per_tuple;
+};
+
+/* A function_base that loads or stores contiguous memory elements
+   without extending or truncating them.  */
+class full_width_access : public multi_vector_function
+{
+public:
+  CONSTEXPR full_width_access (unsigned int vectors_per_tuple = 1)
+    : multi_vector_function (vectors_per_tuple) {}
+
+  tree
+  memory_scalar_type (const function_instance &fi) const override
+  {
+    return fi.scalar_type (0);
+  }
+
+  machine_mode
+  memory_vector_mode (const function_instance &fi) const override
+  {
+    machine_mode mode = fi.vector_mode (0);
+    /* Vectors of floating-point are managed in memory as vectors of
+       integers.  */
+    switch (mode)
+      {
+      case E_V4SFmode:
+	mode = E_V4SImode;
+	break;
+      case E_V8HFmode:
+	mode = E_V8HImode;
+	break;
+      }
+
+    if (m_vectors_per_tuple != 1)
+      mode = targetm.array_mode (mode, m_vectors_per_tuple).require ();
+
+    return mode;
+  }
+};
+
 } /* end namespace arm_mve */
 
 /* Declare the global function base NAME, creating it from an instance
diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
index 02dc8fa9b73..a265cb05553 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -36,6 +36,7 @@
 #include "fold-const.h"
 #include "gimple.h"
 #include "gimple-iterator.h"
+#include "explow.h"
 #include "emit-rtl.h"
 #include "langhooks.h"
 #include "stringpool.h"
@@ -529,6 +530,22 @@ matches_type_p (const_tree model_type, const_tree candidate)
 	  && TYPE_MAIN_VARIANT (model_type) == TYPE_MAIN_VARIANT (candidate));
 }
 
+/* If TYPE is a valid MVE element type, return the corresponding type
+   suffix, otherwise return NUM_TYPE_SUFFIXES.  */
+static type_suffix_index
+find_type_suffix_for_scalar_type (const_tree type)
+{
+  /* A linear search should be OK here, since the code isn't hot and
+     the number of types is only small.  */
+  for (unsigned int suffix_i = 0; suffix_i < NUM_TYPE_SUFFIXES; ++suffix_i)
+      {
+	vector_type_index vector_i = type_suffixes[suffix_i].vector_type;
+	if (matches_type_p (scalar_types[vector_i], type))
+	  return type_suffix_index (suffix_i);
+      }
+  return NUM_TYPE_SUFFIXES;
+}
+
 /* Report an error against LOCATION that the user has tried to use
    a floating point function when the mve.fp extension is disabled.  */
 static void
@@ -1125,6 +1142,37 @@ function_resolver::resolve_to (mode_suffix_index mode,
   return res;
 }
 
+/* Require argument ARGNO to be a pointer to a scalar type that has a
+   corresponding type suffix.  Return that type suffix on success,
+   otherwise report an error and return NUM_TYPE_SUFFIXES.  */
+type_suffix_index
+function_resolver::infer_pointer_type (unsigned int argno)
+{
+  tree actual = get_argument_type (argno);
+  if (actual == error_mark_node)
+    return NUM_TYPE_SUFFIXES;
+
+  if (TREE_CODE (actual) != POINTER_TYPE)
+    {
+      error_at (location, "passing %qT to argument %d of %qE, which"
+		" expects a pointer type", actual, argno + 1, fndecl);
+      return NUM_TYPE_SUFFIXES;
+    }
+
+  tree target = TREE_TYPE (actual);
+  type_suffix_index type = find_type_suffix_for_scalar_type (target);
+  if (type == NUM_TYPE_SUFFIXES)
+    {
+      error_at (location, "passing %qT to argument %d of %qE, but %qT is not"
+		" a valid MVE element type", actual, argno + 1, fndecl,
+		build_qualified_type (target, 0));
+      return NUM_TYPE_SUFFIXES;
+    }
+  unsigned int bits = type_suffixes[type].element_bits;
+
+  return type;
+}
+
 /* Require argument ARGNO to be a single vector or a tuple of NUM_VECTORS
    vectors; NUM_VECTORS is 1 for the former.  Return the associated type
    suffix on success, using TYPE_SUFFIX_b for predicates.  Report an error
@@ -1498,6 +1546,22 @@ function_resolver::require_scalar_type (unsigned int argno,
   return true;
 }
 
+/* Require argument ARGNO to be some form of pointer, without being specific
+   about its target type.  Return true if the argument has the right form,
+   otherwise report an appropriate error.  */
+bool
+function_resolver::require_pointer_type (unsigned int argno)
+{
+  if (!scalar_argument_p (argno))
+    {
+      error_at (location, "passing %qT to argument %d of %qE, which"
+		" expects a scalar pointer", get_argument_type (argno),
+		argno + 1, fndecl);
+      return false;
+    }
+  return true;
+}
+
 /* Require the function to have exactly EXPECTED arguments.  Return true
    if it does, otherwise report an appropriate error.  */
 bool
@@ -1955,6 +2019,14 @@ function_expander::direct_optab_handler (optab op, unsigned int suffix_i)
   return ::direct_optab_handler (op, vector_mode (suffix_i));
 }
 
+/* Return the base address for a contiguous load or store
+   function.  */
+rtx
+function_expander::get_contiguous_base ()
+{
+  return args[0];
+}
+
 /* For a function that does the equivalent of:
 
      OUTPUT = COND ? FN (INPUTS) : FALLBACK;
@@ -2043,6 +2115,26 @@ function_expander::add_integer_operand (HOST_WIDE_INT x)
   create_integer_operand (&m_ops.last (), x);
 }
 
+/* Add a memory operand with mode MODE and address ADDR.  */
+void
+function_expander::add_mem_operand (machine_mode mode, rtx addr)
+{
+  gcc_assert (VECTOR_MODE_P (mode));
+  rtx mem = gen_rtx_MEM (mode, memory_address (mode, addr));
+  /* The memory is only guaranteed to be element-aligned.  */
+  set_mem_align (mem, GET_MODE_ALIGNMENT (GET_MODE_INNER (mode)));
+  add_fixed_operand (mem);
+}
+
+/* Add an operand that must be X.  The only way of legitimizing an
+   invalid X is to reload the address of a MEM.  */
+void
+function_expander::add_fixed_operand (rtx x)
+{
+  m_ops.safe_grow (m_ops.length () + 1, true);
+  create_fixed_operand (&m_ops.last (), x);
+}
+
 /* Generate instruction ICODE, given that its operands have already
    been added to M_OPS.  Return the value of the first operand.  */
 rtx
@@ -2137,6 +2229,30 @@ function_expander::use_cond_insn (insn_code icode, unsigned int merge_argno)
   return generate_insn (icode);
 }
 
+/* Implement the call using instruction ICODE, which loads memory operand 1
+   into register operand 0.  */
+rtx
+function_expander::use_contiguous_load_insn (insn_code icode)
+{
+  machine_mode mem_mode = memory_vector_mode ();
+
+  add_output_operand (icode);
+  add_mem_operand (mem_mode, get_contiguous_base ());
+  return generate_insn (icode);
+}
+
+/* Implement the call using instruction ICODE, which stores register operand 1
+   into memory operand 0.  */
+rtx
+function_expander::use_contiguous_store_insn (insn_code icode)
+{
+  machine_mode mem_mode = memory_vector_mode ();
+
+  add_mem_operand (mem_mode, get_contiguous_base ());
+  add_input_operand (icode, args[1]);
+  return generate_insn (icode);
+}
+
 /* Implement the call using a normal unpredicated optab for PRED_none.
 
    <optab> corresponds to:
diff --git a/gcc/config/arm/arm-mve-builtins.h b/gcc/config/arm/arm-mve-builtins.h
index 4fd230fe4c7..9c219fa8db4 100644
--- a/gcc/config/arm/arm-mve-builtins.h
+++ b/gcc/config/arm/arm-mve-builtins.h
@@ -278,6 +278,7 @@ public:
 
   unsigned int vectors_per_tuple () const;
   tree memory_scalar_type () const;
+  machine_mode memory_vector_mode () const;
 
   const mode_suffix_info &mode_suffix () const;
 
@@ -383,6 +384,7 @@ public:
 		   type_suffix_index = NUM_TYPE_SUFFIXES,
 		   type_suffix_index = NUM_TYPE_SUFFIXES);
 
+  type_suffix_index infer_pointer_type (unsigned int);
   type_suffix_index infer_vector_or_tuple_type (unsigned int, unsigned int);
   type_suffix_index infer_vector_type (unsigned int);
 
@@ -394,8 +396,9 @@ public:
 				    type_suffix_index,
 				    type_class_index = SAME_TYPE_CLASS,
 				    unsigned int = SAME_SIZE);
-  bool require_integer_immediate (unsigned int);
   bool require_scalar_type (unsigned int, const char *);
+  bool require_pointer_type (unsigned int);
+  bool require_integer_immediate (unsigned int);
   bool require_derived_scalar_type (unsigned int, type_class_index,
 				    unsigned int = SAME_SIZE);
   
@@ -476,18 +479,23 @@ public:
 
   insn_code direct_optab_handler (optab, unsigned int = 0);
 
+  rtx get_contiguous_base ();
   rtx get_fallback_value (machine_mode, unsigned int, unsigned int &);
   rtx get_reg_target ();
 
   void add_output_operand (insn_code);
   void add_input_operand (insn_code, rtx);
   void add_integer_operand (HOST_WIDE_INT);
+  void add_mem_operand (machine_mode, rtx);
+  void add_fixed_operand (rtx);
   rtx generate_insn (insn_code);
 
   rtx use_exact_insn (insn_code);
   rtx use_unpred_insn (insn_code);
   rtx use_pred_x_insn (insn_code);
   rtx use_cond_insn (insn_code, unsigned int = DEFAULT_MERGE_ARGNO);
+  rtx use_contiguous_load_insn (insn_code);
+  rtx use_contiguous_store_insn (insn_code);
 
   rtx map_to_rtx_codes (rtx_code, rtx_code, rtx_code);
 
@@ -528,6 +536,15 @@ public:
     gcc_unreachable ();
   }
 
+  /* If the function addresses memory, return a vector mode whose
+     GET_MODE_NUNITS is the number of elements addressed and whose
+     GET_MODE_INNER is the mode of a single scalar memory element.  */
+  virtual machine_mode
+  memory_vector_mode (const function_instance &) const
+  {
+    gcc_unreachable ();
+  }
+
   /* Try to fold the given gimple call.  Return the new gimple statement
      on success, otherwise return null.  */
   virtual gimple *fold (gimple_folder &) const { return NULL; }
@@ -661,6 +678,15 @@ function_instance::memory_scalar_type () const
   return base->memory_scalar_type (*this);
 }
 
+/* If the function addresses memory, return a vector mode whose
+   GET_MODE_NUNITS is the number of elements addressed and whose
+   GET_MODE_INNER is the mode of a single scalar memory element.  */
+inline machine_mode
+function_instance::memory_vector_mode () const
+{
+  return base->memory_vector_mode (*this);
+}
+
 /* Return information about the function's mode suffix.  */
 inline const mode_suffix_info &
 function_instance::mode_suffix () const
-- 
2.34.1



* [PATCH 4/6] arm: [MVE intrinsics] add load and store shapes
  2023-11-16 15:26 [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Christophe Lyon
  2023-11-16 15:26 ` [PATCH 2/6] arm: [MVE intrinsics] Add support for void and load/store pointers as argument types Christophe Lyon
  2023-11-16 15:26 ` [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores Christophe Lyon
@ 2023-11-16 15:26 ` Christophe Lyon
  2023-11-16 16:49   ` Kyrylo Tkachov
  2023-11-16 15:26 ` [PATCH 5/6] arm: [MVE intrinsics] fix vst1 tests Christophe Lyon
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Christophe Lyon @ 2023-11-16 15:26 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford, richard.earnshaw, kyrylo.tkachov
  Cc: Christophe Lyon

This patch adds the load and store shape descriptions.

2023-11-16  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/arm-mve-builtins-shapes.cc (load, store): New.
	* config/arm/arm-mve-builtins-shapes.h (load, store): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 67 +++++++++++++++++++++++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  2 +
 2 files changed, 69 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index ce87ebcef30..fe983e7c736 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -1428,6 +1428,38 @@ struct inherent_def : public nonoverloaded_base
 };
 SHAPE (inherent)
 
+/* <T0>_t vfoo[_t0](<T0>_t const *)
+
+   Example: vld1q.
+   int8x16_t [__arm_]vld1q[_s8](int8_t const *base)
+   int8x16_t [__arm_]vld1q_z[_s8](int8_t const *base, mve_pred16_t p)  */
+struct load_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+	 bool preserve_user_namespace) const override
+  {
+    b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+    build_all (b, "t0,al", group, MODE_none, preserve_user_namespace);
+  }
+
+  /* Resolve a call based purely on a pointer argument.  */
+  tree
+  resolve (function_resolver &r) const override
+  {
+    gcc_assert (r.mode_suffix_id == MODE_none);
+
+    unsigned int i, nargs;
+    type_suffix_index type;
+    if (!r.check_gp_argument (1, i, nargs)
+	|| (type = r.infer_pointer_type (i)) == NUM_TYPE_SUFFIXES)
+      return error_mark_node;
+
+    return r.resolve_to (r.mode_suffix_id, type);
+  }
+};
+SHAPE (load)
+
 /* <T0>_t vfoo[_t0](<T0>_t)
    <T0>_t vfoo_n_t0(<sT0>_t)
 
@@ -1477,6 +1509,41 @@ struct mvn_def : public overloaded_base<0>
 };
 SHAPE (mvn)
 
+/* void vfoo[_t0](<X>_t *, v<t0>[xN]_t)
+
+   where <X> might be tied to <t0> (for non-truncating stores) or might
+   depend on the function base name (for truncating stores).
+
+   Example: vst1q.
+   void [__arm_]vst1q[_s8](int8_t *base, int8x16_t value)
+   void [__arm_]vst1q_p[_s8](int8_t *base, int8x16_t value, mve_pred16_t p)  */
+struct store_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+	 bool preserve_user_namespace) const override
+  {
+    b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+    build_all (b, "_,as,v0", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+    gcc_assert (r.mode_suffix_id == MODE_none);
+
+    unsigned int i, nargs;
+    type_suffix_index type;
+    if (!r.check_gp_argument (2, i, nargs)
+	|| !r.require_pointer_type (0)
+	|| (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
+      return error_mark_node;
+
+    return r.resolve_to (r.mode_suffix_id, type);
+  }
+};
+SHAPE (store)
+
 /* <T0>_t vfoo[_t0](<T0>_t, <T0>_t, <T0>_t)
 
    i.e. the standard shape for ternary operations that operate on
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index a93245321c9..aa9309dec7e 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -61,7 +61,9 @@ namespace arm_mve
     extern const function_shape *const cmp;
     extern const function_shape *const create;
     extern const function_shape *const inherent;
+    extern const function_shape *const load;
     extern const function_shape *const mvn;
+    extern const function_shape *const store;
     extern const function_shape *const ternary;
     extern const function_shape *const ternary_lshift;
     extern const function_shape *const ternary_n;
-- 
2.34.1



* [PATCH 5/6] arm: [MVE intrinsics] fix vst1 tests
  2023-11-16 15:26 [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Christophe Lyon
                   ` (2 preceding siblings ...)
  2023-11-16 15:26 ` [PATCH 4/6] arm: [MVE intrinsics] add load and store shapes Christophe Lyon
@ 2023-11-16 15:26 ` Christophe Lyon
  2023-11-16 15:30   ` Kyrylo Tkachov
  2023-11-16 15:26 ` [PATCH 6/6] arm: [MVE intrinsics] rework vld1q vst1q Christophe Lyon
  2023-11-16 16:46 ` [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Kyrylo Tkachov
  5 siblings, 1 reply; 15+ messages in thread
From: Christophe Lyon @ 2023-11-16 15:26 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford, richard.earnshaw, kyrylo.tkachov
  Cc: Christophe Lyon

vst1q intrinsics return void, so we should not write 'return vst1q_f16 (base, value);'.

This was OK so far, but will trigger an error/warning with the new
implementation of these intrinsics.

This patch just removes the 'return' keyword.

2023-11-16  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/testsuite/
	* gcc.target/arm/mve/intrinsics/vst1q_f16.c: Remove 'return'.
	* gcc.target/arm/mve/intrinsics/vst1q_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_u8.c: Likewise.
---
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c | 4 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c | 4 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c | 4 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c | 4 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c  | 4 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c | 4 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c | 4 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c  | 4 ++--
 8 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
index 1fa02f00f53..e4b40604d54 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
@@ -18,7 +18,7 @@ extern "C" {
 void
 foo (float16_t *base, float16x8_t value)
 {
-  return vst1q_f16 (base, value);
+  vst1q_f16 (base, value);
 }
 
 
@@ -31,7 +31,7 @@ foo (float16_t *base, float16x8_t value)
 void
 foo1 (float16_t *base, float16x8_t value)
 {
-  return vst1q (base, value);
+  vst1q (base, value);
 }
 
 #ifdef __cplusplus
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
index 67cc3ae3b47..8f42323c603 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
@@ -18,7 +18,7 @@ extern "C" {
 void
 foo (float32_t *base, float32x4_t value)
 {
-  return vst1q_f32 (base, value);
+  vst1q_f32 (base, value);
 }
 
 
@@ -31,7 +31,7 @@ foo (float32_t *base, float32x4_t value)
 void
 foo1 (float32_t *base, float32x4_t value)
 {
-  return vst1q (base, value);
+  vst1q (base, value);
 }
 
 #ifdef __cplusplus
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
index 052959b2083..891ac4155d9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
@@ -18,7 +18,7 @@ extern "C" {
 void
 foo (int16_t *base, int16x8_t value)
 {
-  return vst1q_s16 (base, value);
+  vst1q_s16 (base, value);
 }
 
 
@@ -31,7 +31,7 @@ foo (int16_t *base, int16x8_t value)
 void
 foo1 (int16_t *base, int16x8_t value)
 {
-  return vst1q (base, value);
+  vst1q (base, value);
 }
 
 #ifdef __cplusplus
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
index 444ad07f4ef..a28d1eb98db 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
@@ -18,7 +18,7 @@ extern "C" {
 void
 foo (int32_t *base, int32x4_t value)
 {
-  return vst1q_s32 (base, value);
+  vst1q_s32 (base, value);
 }
 
 
@@ -31,7 +31,7 @@ foo (int32_t *base, int32x4_t value)
 void
 foo1 (int32_t *base, int32x4_t value)
 {
-  return vst1q (base, value);
+  vst1q (base, value);
 }
 
 #ifdef __cplusplus
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
index 684ff0aca5b..81c141a63e0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
@@ -18,7 +18,7 @@ extern "C" {
 void
 foo (int8_t *base, int8x16_t value)
 {
-  return vst1q_s8 (base, value);
+  vst1q_s8 (base, value);
 }
 
 
@@ -31,7 +31,7 @@ foo (int8_t *base, int8x16_t value)
 void
 foo1 (int8_t *base, int8x16_t value)
 {
-  return vst1q (base, value);
+  vst1q (base, value);
 }
 
 #ifdef __cplusplus
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
index 1fea2de1e76..b8ce7fbe6ee 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
@@ -18,7 +18,7 @@ extern "C" {
 void
 foo (uint16_t *base, uint16x8_t value)
 {
-  return vst1q_u16 (base, value);
+  vst1q_u16 (base, value);
 }
 
 
@@ -31,7 +31,7 @@ foo (uint16_t *base, uint16x8_t value)
 void
 foo1 (uint16_t *base, uint16x8_t value)
 {
-  return vst1q (base, value);
+  vst1q (base, value);
 }
 
 #ifdef __cplusplus
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
index 64c43c59d47..1dbb55538a9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
@@ -18,7 +18,7 @@ extern "C" {
 void
 foo (uint32_t *base, uint32x4_t value)
 {
-  return vst1q_u32 (base, value);
+  vst1q_u32 (base, value);
 }
 
 
@@ -31,7 +31,7 @@ foo (uint32_t *base, uint32x4_t value)
 void
 foo1 (uint32_t *base, uint32x4_t value)
 {
-  return vst1q (base, value);
+  vst1q (base, value);
 }
 
 #ifdef __cplusplus
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
index 5517611bba6..ab22be81647 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
@@ -18,7 +18,7 @@ extern "C" {
 void
 foo (uint8_t *base, uint8x16_t value)
 {
-  return vst1q_u8 (base, value);
+  vst1q_u8 (base, value);
 }
 
 
@@ -31,7 +31,7 @@ foo (uint8_t *base, uint8x16_t value)
 void
 foo1 (uint8_t *base, uint8x16_t value)
 {
-  return vst1q (base, value);
+  vst1q (base, value);
 }
 
 #ifdef __cplusplus
-- 
2.34.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 6/6] arm: [MVE intrinsics] rework vld1q vst1q
  2023-11-16 15:26 [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Christophe Lyon
                   ` (3 preceding siblings ...)
  2023-11-16 15:26 ` [PATCH 5/6] arm: [MVE intrinsics] fix vst1 tests Christophe Lyon
@ 2023-11-16 15:26 ` Christophe Lyon
  2023-11-16 16:49   ` Kyrylo Tkachov
  2023-11-16 16:46 ` [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Kyrylo Tkachov
  5 siblings, 1 reply; 15+ messages in thread
From: Christophe Lyon @ 2023-11-16 15:26 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford, richard.earnshaw, kyrylo.tkachov
  Cc: Christophe Lyon

Implement vld1q, vst1q using the new MVE builtins framework.

2023-11-16  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/arm-mve-builtins-base.cc (vld1_impl, vld1q)
	(vst1_impl, vst1q): New.
	* config/arm/arm-mve-builtins-base.def (vld1q, vst1q): New.
	* config/arm/arm-mve-builtins-base.h (vld1q, vst1q): New.
	* config/arm/arm_mve.h
	(vld1q): Delete.
	(vst1q): Delete.
	(vld1q_s8): Delete.
	(vld1q_s32): Delete.
	(vld1q_s16): Delete.
	(vld1q_u8): Delete.
	(vld1q_u32): Delete.
	(vld1q_u16): Delete.
	(vld1q_f32): Delete.
	(vld1q_f16): Delete.
	(vst1q_f32): Delete.
	(vst1q_f16): Delete.
	(vst1q_s8): Delete.
	(vst1q_s32): Delete.
	(vst1q_s16): Delete.
	(vst1q_u8): Delete.
	(vst1q_u32): Delete.
	(vst1q_u16): Delete.
	(__arm_vld1q_s8): Delete.
	(__arm_vld1q_s32): Delete.
	(__arm_vld1q_s16): Delete.
	(__arm_vld1q_u8): Delete.
	(__arm_vld1q_u32): Delete.
	(__arm_vld1q_u16): Delete.
	(__arm_vst1q_s8): Delete.
	(__arm_vst1q_s32): Delete.
	(__arm_vst1q_s16): Delete.
	(__arm_vst1q_u8): Delete.
	(__arm_vst1q_u32): Delete.
	(__arm_vst1q_u16): Delete.
	(__arm_vld1q_f32): Delete.
	(__arm_vld1q_f16): Delete.
	(__arm_vst1q_f32): Delete.
	(__arm_vst1q_f16): Delete.
	(__arm_vld1q): Delete.
	(__arm_vst1q): Delete.
	* config/arm/mve.md (mve_vld1q_f<mode>): Rename into ...
	(@mve_vld1q_f<mode>): ... this.
	(mve_vld1q_<supf><mode>): Rename into ...
	(@mve_vld1q_<supf><mode>): ... this.
	(mve_vst1q_f<mode>): Rename into ...
	(@mve_vst1q_f<mode>): ... this.
	(mve_vst1q_<supf><mode>): Rename into ...
	(@mve_vst1q_<supf><mode>): ... this.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |  58 +++++
 gcc/config/arm/arm-mve-builtins-base.def |   4 +
 gcc/config/arm/arm-mve-builtins-base.h   |   4 +-
 gcc/config/arm/arm_mve.h                 | 282 -----------------------
 gcc/config/arm/mve.md                    |   8 +-
 5 files changed, 69 insertions(+), 287 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index 5478cac8aeb..cfe1b954a29 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -83,6 +83,62 @@ class vuninitializedq_impl : public quiet<function_base>
   }
 };
 
+class vld1_impl : public full_width_access
+{
+public:
+  unsigned int
+  call_properties (const function_instance &) const override
+  {
+    return CP_READ_MEMORY;
+  }
+
+  rtx
+  expand (function_expander &e) const override
+  {
+    insn_code icode;
+    if (e.type_suffix (0).float_p)
+      icode = code_for_mve_vld1q_f(e.vector_mode (0));
+    else
+      {
+	if (e.type_suffix (0).unsigned_p)
+	  icode = code_for_mve_vld1q(VLD1Q_U,
+				     e.vector_mode (0));
+	else
+	  icode = code_for_mve_vld1q(VLD1Q_S,
+				     e.vector_mode (0));
+      }
+    return e.use_contiguous_load_insn (icode);
+  }
+};
+
+class vst1_impl : public full_width_access
+{
+public:
+  unsigned int
+  call_properties (const function_instance &) const override
+  {
+    return CP_WRITE_MEMORY;
+  }
+
+  rtx
+  expand (function_expander &e) const override
+  {
+    insn_code icode;
+    if (e.type_suffix (0).float_p)
+      icode = code_for_mve_vst1q_f(e.vector_mode (0));
+    else
+      {
+	if (e.type_suffix (0).unsigned_p)
+	  icode = code_for_mve_vst1q(VST1Q_U,
+				     e.vector_mode (0));
+	else
+	  icode = code_for_mve_vst1q(VST1Q_S,
+				     e.vector_mode (0));
+      }
+    return e.use_contiguous_store_insn (icode);
+  }
+};
+
 } /* end anonymous namespace */
 
 namespace arm_mve {
@@ -290,6 +346,7 @@ FUNCTION (vfmasq, unspec_mve_function_exact_insn, (-1, -1, -1, -1, -1, VFMASQ_N_
 FUNCTION (vfmsq, unspec_mve_function_exact_insn, (-1, -1, VFMSQ_F, -1, -1, -1, -1, -1, VFMSQ_M_F, -1, -1, -1))
 FUNCTION_WITH_M_N_NO_F (vhaddq, VHADDQ)
 FUNCTION_WITH_M_N_NO_F (vhsubq, VHSUBQ)
+FUNCTION (vld1q, vld1_impl,)
 FUNCTION_PRED_P_S (vmaxavq, VMAXAVQ)
 FUNCTION_WITHOUT_N_NO_U_F (vmaxaq, VMAXAQ)
 FUNCTION_ONLY_F (vmaxnmaq, VMAXNMAQ)
@@ -405,6 +462,7 @@ FUNCTION_ONLY_N_NO_F (vshrntq, VSHRNTQ)
 FUNCTION_ONLY_N_NO_F (vshrq, VSHRQ)
 FUNCTION_ONLY_N_NO_F (vsliq, VSLIQ)
 FUNCTION_ONLY_N_NO_F (vsriq, VSRIQ)
+FUNCTION (vst1q, vst1_impl,)
 FUNCTION_WITH_RTX_M_N (vsubq, MINUS, VSUBQ)
 FUNCTION (vuninitializedq, vuninitializedq_impl,)
 
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index 01dfbdef8a3..16879246237 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -47,6 +47,7 @@ DEF_MVE_FUNCTION (vhaddq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vhcaddq_rot90, binary, all_signed, mx_or_none)
 DEF_MVE_FUNCTION (vhcaddq_rot270, binary, all_signed, mx_or_none)
 DEF_MVE_FUNCTION (vhsubq, binary_opt_n, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vld1q, load, all_integer, none)
 DEF_MVE_FUNCTION (vmaxaq, binary_maxamina, all_signed, m_or_none)
 DEF_MVE_FUNCTION (vmaxavq, binary_maxavminav, all_signed, p_or_none)
 DEF_MVE_FUNCTION (vmaxq, binary, all_integer, mx_or_none)
@@ -150,6 +151,7 @@ DEF_MVE_FUNCTION (vshrntq, binary_rshift_narrow, integer_16_32, m_or_none)
 DEF_MVE_FUNCTION (vshrq, binary_rshift, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vsliq, ternary_lshift, all_integer, m_or_none)
 DEF_MVE_FUNCTION (vsriq, ternary_rshift, all_integer, m_or_none)
+DEF_MVE_FUNCTION (vst1q, store, all_integer, none)
 DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vuninitializedq, inherent, all_integer_with_64, none)
 #undef REQUIRES_FLOAT
@@ -182,6 +184,7 @@ DEF_MVE_FUNCTION (veorq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vfmaq, ternary_opt_n, all_float, m_or_none)
 DEF_MVE_FUNCTION (vfmasq, ternary_n, all_float, m_or_none)
 DEF_MVE_FUNCTION (vfmsq, ternary, all_float, m_or_none)
+DEF_MVE_FUNCTION (vld1q, load, all_float, none)
 DEF_MVE_FUNCTION (vmaxnmaq, binary, all_float, m_or_none)
 DEF_MVE_FUNCTION (vmaxnmavq, binary_maxvminv, all_float, p_or_none)
 DEF_MVE_FUNCTION (vmaxnmq, binary, all_float, mx_or_none)
@@ -203,6 +206,7 @@ DEF_MVE_FUNCTION (vrndnq, unary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vrndpq, unary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vrndq, unary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vrndxq, unary, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vst1q, store, all_float, none)
 DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vuninitializedq, inherent, all_float, none)
 #undef REQUIRES_FLOAT
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index c574c32ac53..8c7e5fe5c3e 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -63,6 +63,7 @@ extern const function_base *const vhaddq;
 extern const function_base *const vhcaddq_rot270;
 extern const function_base *const vhcaddq_rot90;
 extern const function_base *const vhsubq;
+extern const function_base *const vld1q;
 extern const function_base *const vmaxaq;
 extern const function_base *const vmaxavq;
 extern const function_base *const vmaxnmaq;
@@ -103,8 +104,8 @@ extern const function_base *const vmovnbq;
 extern const function_base *const vmovntq;
 extern const function_base *const vmulhq;
 extern const function_base *const vmullbq_int;
-extern const function_base *const vmulltq_int;
 extern const function_base *const vmullbq_poly;
+extern const function_base *const vmulltq_int;
 extern const function_base *const vmulltq_poly;
 extern const function_base *const vmulq;
 extern const function_base *const vmvnq;
@@ -178,6 +179,7 @@ extern const function_base *const vshrntq;
 extern const function_base *const vshrq;
 extern const function_base *const vsliq;
 extern const function_base *const vsriq;
+extern const function_base *const vst1q;
 extern const function_base *const vsubq;
 extern const function_base *const vuninitializedq;
 
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index b82d94e59bd..cc027f9cbb5 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -56,7 +56,6 @@
 #define vstrbq_scatter_offset_p(__base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p(__base, __offset, __value, __p)
 #define vstrwq_scatter_base_p(__addr, __offset, __value, __p) __arm_vstrwq_scatter_base_p(__addr, __offset, __value, __p)
 #define vldrbq_gather_offset_z(__base, __offset, __p) __arm_vldrbq_gather_offset_z(__base, __offset, __p)
-#define vld1q(__base) __arm_vld1q(__base)
 #define vldrhq_gather_offset(__base, __offset) __arm_vldrhq_gather_offset(__base, __offset)
 #define vldrhq_gather_offset_z(__base, __offset, __p) __arm_vldrhq_gather_offset_z(__base, __offset, __p)
 #define vldrhq_gather_shifted_offset(__base, __offset) __arm_vldrhq_gather_shifted_offset(__base, __offset)
@@ -69,7 +68,6 @@
 #define vldrwq_gather_offset_z(__base, __offset, __p) __arm_vldrwq_gather_offset_z(__base, __offset, __p)
 #define vldrwq_gather_shifted_offset(__base, __offset) __arm_vldrwq_gather_shifted_offset(__base, __offset)
 #define vldrwq_gather_shifted_offset_z(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z(__base, __offset, __p)
-#define vst1q(__addr, __value) __arm_vst1q(__addr, __value)
 #define vstrhq_scatter_offset(__base, __offset, __value) __arm_vstrhq_scatter_offset(__base, __offset, __value)
 #define vstrhq_scatter_offset_p(__base, __offset, __value, __p) __arm_vstrhq_scatter_offset_p(__base, __offset, __value, __p)
 #define vstrhq_scatter_shifted_offset(__base, __offset, __value) __arm_vstrhq_scatter_shifted_offset(__base, __offset, __value)
@@ -346,12 +344,6 @@
 #define vldrbq_z_u32(__base, __p) __arm_vldrbq_z_u32(__base, __p)
 #define vldrwq_gather_base_z_u32(__addr,  __offset, __p) __arm_vldrwq_gather_base_z_u32(__addr,  __offset, __p)
 #define vldrwq_gather_base_z_s32(__addr,  __offset, __p) __arm_vldrwq_gather_base_z_s32(__addr,  __offset, __p)
-#define vld1q_s8(__base) __arm_vld1q_s8(__base)
-#define vld1q_s32(__base) __arm_vld1q_s32(__base)
-#define vld1q_s16(__base) __arm_vld1q_s16(__base)
-#define vld1q_u8(__base) __arm_vld1q_u8(__base)
-#define vld1q_u32(__base) __arm_vld1q_u32(__base)
-#define vld1q_u16(__base) __arm_vld1q_u16(__base)
 #define vldrhq_gather_offset_s32(__base, __offset) __arm_vldrhq_gather_offset_s32(__base, __offset)
 #define vldrhq_gather_offset_s16(__base, __offset) __arm_vldrhq_gather_offset_s16(__base, __offset)
 #define vldrhq_gather_offset_u32(__base, __offset) __arm_vldrhq_gather_offset_u32(__base, __offset)
@@ -380,8 +372,6 @@
 #define vldrwq_u32(__base) __arm_vldrwq_u32(__base)
 #define vldrwq_z_s32(__base, __p) __arm_vldrwq_z_s32(__base, __p)
 #define vldrwq_z_u32(__base, __p) __arm_vldrwq_z_u32(__base, __p)
-#define vld1q_f32(__base) __arm_vld1q_f32(__base)
-#define vld1q_f16(__base) __arm_vld1q_f16(__base)
 #define vldrhq_f16(__base) __arm_vldrhq_f16(__base)
 #define vldrhq_z_f16(__base, __p) __arm_vldrhq_z_f16(__base, __p)
 #define vldrwq_f32(__base) __arm_vldrwq_f32(__base)
@@ -416,14 +406,6 @@
 #define vldrwq_gather_shifted_offset_z_f32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_f32(__base, __offset, __p)
 #define vldrwq_gather_shifted_offset_z_s32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_s32(__base, __offset, __p)
 #define vldrwq_gather_shifted_offset_z_u32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_u32(__base, __offset, __p)
-#define vst1q_f32(__addr, __value) __arm_vst1q_f32(__addr, __value)
-#define vst1q_f16(__addr, __value) __arm_vst1q_f16(__addr, __value)
-#define vst1q_s8(__addr, __value) __arm_vst1q_s8(__addr, __value)
-#define vst1q_s32(__addr, __value) __arm_vst1q_s32(__addr, __value)
-#define vst1q_s16(__addr, __value) __arm_vst1q_s16(__addr, __value)
-#define vst1q_u8(__addr, __value) __arm_vst1q_u8(__addr, __value)
-#define vst1q_u32(__addr, __value) __arm_vst1q_u32(__addr, __value)
-#define vst1q_u16(__addr, __value) __arm_vst1q_u16(__addr, __value)
 #define vstrhq_f16(__addr, __value) __arm_vstrhq_f16(__addr, __value)
 #define vstrhq_scatter_offset_s32( __base, __offset, __value) __arm_vstrhq_scatter_offset_s32( __base, __offset, __value)
 #define vstrhq_scatter_offset_s16( __base, __offset, __value) __arm_vstrhq_scatter_offset_s16( __base, __offset, __value)
@@ -1537,48 +1519,6 @@ __arm_vldrwq_gather_base_z_u32 (uint32x4_t __addr, const int __offset, mve_pred1
   return __builtin_mve_vldrwq_gather_base_z_uv4si (__addr, __offset, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q_s8 (int8_t const * __base)
-{
-  return __builtin_mve_vld1q_sv16qi ((__builtin_neon_qi *) __base);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q_s32 (int32_t const * __base)
-{
-  return __builtin_mve_vld1q_sv4si ((__builtin_neon_si *) __base);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q_s16 (int16_t const * __base)
-{
-  return __builtin_mve_vld1q_sv8hi ((__builtin_neon_hi *) __base);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q_u8 (uint8_t const * __base)
-{
-  return __builtin_mve_vld1q_uv16qi ((__builtin_neon_qi *) __base);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q_u32 (uint32_t const * __base)
-{
-  return __builtin_mve_vld1q_uv4si ((__builtin_neon_si *) __base);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q_u16 (uint16_t const * __base)
-{
-  return __builtin_mve_vld1q_uv8hi ((__builtin_neon_hi *) __base);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrhq_gather_offset_s32 (int16_t const * __base, uint32x4_t __offset)
@@ -1917,48 +1857,6 @@ __arm_vldrwq_gather_shifted_offset_z_u32 (uint32_t const * __base, uint32x4_t __
   return __builtin_mve_vldrwq_gather_shifted_offset_z_uv4si ((__builtin_neon_si *) __base, __offset, __p);
 }
 
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q_s8 (int8_t * __addr, int8x16_t __value)
-{
-  __builtin_mve_vst1q_sv16qi ((__builtin_neon_qi *) __addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q_s32 (int32_t * __addr, int32x4_t __value)
-{
-  __builtin_mve_vst1q_sv4si ((__builtin_neon_si *) __addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q_s16 (int16_t * __addr, int16x8_t __value)
-{
-  __builtin_mve_vst1q_sv8hi ((__builtin_neon_hi *) __addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q_u8 (uint8_t * __addr, uint8x16_t __value)
-{
-  __builtin_mve_vst1q_uv16qi ((__builtin_neon_qi *) __addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q_u32 (uint32_t * __addr, uint32x4_t __value)
-{
-  __builtin_mve_vst1q_uv4si ((__builtin_neon_si *) __addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q_u16 (uint16_t * __addr, uint16x8_t __value)
-{
-  __builtin_mve_vst1q_uv8hi ((__builtin_neon_hi *) __addr, __value);
-}
-
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vstrhq_scatter_offset_s32 (int16_t * __base, uint32x4_t __offset, int32x4_t __value)
@@ -4421,20 +4319,6 @@ __arm_vornq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve
   return __builtin_mve_vornq_m_fv8hf (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q_f32 (float32_t const * __base)
-{
-  return __builtin_mve_vld1q_fv4sf((__builtin_neon_si *) __base);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q_f16 (float16_t const * __base)
-{
-  return __builtin_mve_vld1q_fv8hf((__builtin_neon_hi *) __base);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_f32 (float32_t const * __base)
@@ -4547,20 +4431,6 @@ __arm_vstrwq_f32 (float32_t * __addr, float32x4_t __value)
   __builtin_mve_vstrwq_fv4sf ((__builtin_neon_si *) __addr, __value);
 }
 
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q_f32 (float32_t * __addr, float32x4_t __value)
-{
-  __builtin_mve_vst1q_fv4sf ((__builtin_neon_si *) __addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q_f16 (float16_t * __addr, float16x8_t __value)
-{
-  __builtin_mve_vst1q_fv8hf ((__builtin_neon_hi *) __addr, __value);
-}
-
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vstrhq_f16 (float16_t * __addr, float16x8_t __value)
@@ -5651,48 +5521,6 @@ __arm_vldrbq_gather_offset_z (uint8_t const * __base, uint16x8_t __offset, mve_p
  return __arm_vldrbq_gather_offset_z_u16 (__base, __offset, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q (int8_t const * __base)
-{
- return __arm_vld1q_s8 (__base);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q (int32_t const * __base)
-{
- return __arm_vld1q_s32 (__base);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q (int16_t const * __base)
-{
- return __arm_vld1q_s16 (__base);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q (uint8_t const * __base)
-{
- return __arm_vld1q_u8 (__base);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q (uint32_t const * __base)
-{
- return __arm_vld1q_u32 (__base);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q (uint16_t const * __base)
-{
- return __arm_vld1q_u16 (__base);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrhq_gather_offset (int16_t const * __base, uint32x4_t __offset)
@@ -5917,48 +5745,6 @@ __arm_vldrwq_gather_shifted_offset_z (uint32_t const * __base, uint32x4_t __offs
  return __arm_vldrwq_gather_shifted_offset_z_u32 (__base, __offset, __p);
 }
 
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q (int8_t * __addr, int8x16_t __value)
-{
- __arm_vst1q_s8 (__addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q (int32_t * __addr, int32x4_t __value)
-{
- __arm_vst1q_s32 (__addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q (int16_t * __addr, int16x8_t __value)
-{
- __arm_vst1q_s16 (__addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q (uint8_t * __addr, uint8x16_t __value)
-{
- __arm_vst1q_u8 (__addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q (uint32_t * __addr, uint32x4_t __value)
-{
- __arm_vst1q_u32 (__addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q (uint16_t * __addr, uint16x8_t __value)
-{
- __arm_vst1q_u16 (__addr, __value);
-}
-
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vstrhq_scatter_offset (int16_t * __base, uint32x4_t __offset, int32x4_t __value)
@@ -7809,20 +7595,6 @@ __arm_vornq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pre
  return __arm_vornq_m_f16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q (float32_t const * __base)
-{
- return __arm_vld1q_f32 (__base);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q (float16_t const * __base)
-{
- return __arm_vld1q_f16 (__base);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrhq_gather_offset (float16_t const * __base, uint16x8_t __offset)
@@ -7893,20 +7665,6 @@ __arm_vstrwq (float32_t * __addr, float32x4_t __value)
  __arm_vstrwq_f32 (__addr, __value);
 }
 
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q (float32_t * __addr, float32x4_t __value)
-{
- __arm_vst1q_f32 (__addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q (float16_t * __addr, float16x8_t __value)
-{
- __arm_vst1q_f16 (__addr, __value);
-}
-
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vstrhq (float16_t * __addr, float16x8_t __value)
@@ -8670,17 +8428,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vornq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vornq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
-#define __arm_vld1q(p0) (\
-  _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
-  int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *)), \
-  int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld1q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *)), \
-  int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld1q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *)), \
-  int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld1q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld1q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *)), \
-  int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld1q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *)), \
-  int (*)[__ARM_mve_type_float16_t_ptr]: __arm_vld1q_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *)), \
-  int (*)[__ARM_mve_type_float32_t_ptr]: __arm_vld1q_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *))))
-
 #define __arm_vld1q_z(p0,p1) ( \
   _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
   int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_z_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), p1), \
@@ -8792,17 +8539,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8x2_t]: __arm_vst2q_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *), __ARM_mve_coerce(__p1, float16x8x2_t)), \
   int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4x2_t]: __arm_vst2q_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *), __ARM_mve_coerce(__p1, float32x4x2_t)));})
 
-#define __arm_vst1q(p0,p1) ({ __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vst1q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vst1q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vst1q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vst1q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vst1q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vst1q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t)), \
-  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8_t]: __arm_vst1q_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *), __ARM_mve_coerce(__p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: __arm_vst1q_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *), __ARM_mve_coerce(__p1, float32x4_t)));})
-
 #define __arm_vstrhq(p0,p1) ({ __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
   int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vstrhq_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t)), \
@@ -9149,15 +8885,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_p_s32 (p0, p1, __ARM_mve_coerce(__p2, int32x4_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_p_u32 (p0, p1, __ARM_mve_coerce(__p2, uint32x4_t), p3));})
 
-#define __arm_vld1q(p0) (\
-  _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
-  int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *)), \
-  int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld1q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *)), \
-  int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld1q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *)), \
-  int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld1q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld1q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *)), \
-  int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld1q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *))))
-
 #define __arm_vldrhq_gather_offset(p0,p1) ({ __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
   int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_offset_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
@@ -9206,15 +8933,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vldrwq_gather_shifted_offset_z_s32 (__ARM_mve_coerce_s32_ptr(__p0, int32_t *), p1, p2), \
   int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vldrwq_gather_shifted_offset_z_u32 (__ARM_mve_coerce_u32_ptr(__p0, uint32_t *), p1, p2));})
 
-#define __arm_vst1q(p0,p1) ({ __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vst1q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vst1q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vst1q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vst1q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vst1q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vst1q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
 #define __arm_vst1q_p(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
   int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vst1q_p_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t), p2), \
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 366cec0812a..b0d3443da9c 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -3690,7 +3690,7 @@ (define_insn "mve_vldrwq_z_<supf>v4si"
 }
   [(set_attr "length" "8")])
 
-(define_expand "mve_vld1q_f<mode>"
+(define_expand "@mve_vld1q_f<mode>"
   [(match_operand:MVE_0 0 "s_register_operand")
    (unspec:MVE_0 [(match_operand:<MVE_CNVT> 1 "mve_memory_operand")] VLD1Q_F)
   ]
@@ -3700,7 +3700,7 @@ (define_expand "mve_vld1q_f<mode>"
   DONE;
 })
 
-(define_expand "mve_vld1q_<supf><mode>"
+(define_expand "@mve_vld1q_<supf><mode>"
   [(match_operand:MVE_2 0 "s_register_operand")
    (unspec:MVE_2 [(match_operand:MVE_2 1 "mve_memory_operand")] VLD1Q)
   ]
@@ -4408,7 +4408,7 @@ (define_insn "mve_vstrwq_<supf>v4si"
 }
   [(set_attr "length" "4")])
 
-(define_expand "mve_vst1q_f<mode>"
+(define_expand "@mve_vst1q_f<mode>"
   [(match_operand:<MVE_CNVT> 0 "mve_memory_operand")
    (unspec:<MVE_CNVT> [(match_operand:MVE_0 1 "s_register_operand")] VST1Q_F)
   ]
@@ -4418,7 +4418,7 @@ (define_expand "mve_vst1q_f<mode>"
   DONE;
 })
 
-(define_expand "mve_vst1q_<supf><mode>"
+(define_expand "@mve_vst1q_<supf><mode>"
   [(match_operand:MVE_2 0 "mve_memory_operand")
    (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand")] VST1Q)
   ]
-- 
2.34.1
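For readers unfamiliar with the '@' prefix added to the define_expand names in the
patch above: prefixing a pattern name with '@' asks GCC's genemit to create
"parameterized name" helpers (code_for_<name>, gen_<name> overloads) that take the
iterator values as arguments, so later C++ code can select the right pattern from a
mode/unspec value instead of spelling out every gen_mve_vld1q_* variant by hand.
A rough stand-alone analogy in plain C follows -- all names in it are illustrative
stand-ins, not GCC internals:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for GCC's mode and <supf> iterator values.  */
enum mode { V16QI, V8HI, V4SI };
enum supf { SUPF_S, SUPF_U };

static const char *vld1q_s_v4si (void) { return "mve_vld1q_sv4si"; }
static const char *vld1q_u_v4si (void) { return "mve_vld1q_uv4si"; }

/* With '@mve_vld1q_<supf><mode>', genemit effectively builds a table
   indexed by the iterator values...  */
typedef const char *(*gen_fn) (void);
static gen_fn table[2][3] = {
  [SUPF_S][V4SI] = vld1q_s_v4si,
  [SUPF_U][V4SI] = vld1q_u_v4si,
};

/* ...so callers perform one lookup instead of switching over patterns.  */
static gen_fn
code_for_mve_vld1q (enum supf s, enum mode m)
{
  return table[s][m];
}
```

This is why patch 6/6 can expand all the vld1q/vst1q variants from one place in the
intrinsics framework.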


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 5/6] arm: [MVE intrinsics] fix vst1 tests
  2023-11-16 15:26 ` [PATCH 5/6] arm: [MVE intrinsics] fix vst1 tests Christophe Lyon
@ 2023-11-16 15:30   ` Kyrylo Tkachov
  2023-11-16 15:37     ` Christophe Lyon
  0 siblings, 1 reply; 15+ messages in thread
From: Kyrylo Tkachov @ 2023-11-16 15:30 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Sandiford, Richard Earnshaw



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@linaro.org>
> Sent: Thursday, November 16, 2023 3:26 PM
> To: gcc-patches@gcc.gnu.org; Richard Sandiford
> <Richard.Sandiford@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: Christophe Lyon <christophe.lyon@linaro.org>
> Subject: [PATCH 5/6] arm: [MVE intrinsics] fix vst1 tests
> 
> vst1q intrinsics return void, so we should not do 'return vst1q_f16 (base,
> value);'
> 
> This was OK so far, but will trigger an error/warning with the new
> implementation of these intrinsics.
> 

Whoops!
Ok (could have gone in as obvious IMO).
Thanks,
Kyrill

> This patch just removes the 'return' keyword.
> 
> 2023-11-16  Christophe Lyon  <christophe.lyon@linaro.org>
> 
> 	gcc/testsuite/
> 	* gcc.target/arm/mve/intrinsics/vst1q_f16.c: Remove 'return'.
> 	* gcc.target/arm/mve/intrinsics/vst1q_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vst1q_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vst1q_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vst1q_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vst1q_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vst1q_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vst1q_u8.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c | 4 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c | 4 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c | 4 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c | 4 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c  | 4 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c | 4 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c | 4 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c  | 4 ++--
>  8 files changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> index 1fa02f00f53..e4b40604d54 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> @@ -18,7 +18,7 @@ extern "C" {
>  void
>  foo (float16_t *base, float16x8_t value)
>  {
> -  return vst1q_f16 (base, value);
> +  vst1q_f16 (base, value);
>  }
> 
> 
> @@ -31,7 +31,7 @@ foo (float16_t *base, float16x8_t value)
>  void
>  foo1 (float16_t *base, float16x8_t value)
>  {
> -  return vst1q (base, value);
> +  vst1q (base, value);
>  }
> 
>  #ifdef __cplusplus
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
> index 67cc3ae3b47..8f42323c603 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
> @@ -18,7 +18,7 @@ extern "C" {
>  void
>  foo (float32_t *base, float32x4_t value)
>  {
> -  return vst1q_f32 (base, value);
> +  vst1q_f32 (base, value);
>  }
> 
> 
> @@ -31,7 +31,7 @@ foo (float32_t *base, float32x4_t value)
>  void
>  foo1 (float32_t *base, float32x4_t value)
>  {
> -  return vst1q (base, value);
> +  vst1q (base, value);
>  }
> 
>  #ifdef __cplusplus
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> index 052959b2083..891ac4155d9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> @@ -18,7 +18,7 @@ extern "C" {
>  void
>  foo (int16_t *base, int16x8_t value)
>  {
> -  return vst1q_s16 (base, value);
> +  vst1q_s16 (base, value);
>  }
> 
> 
> @@ -31,7 +31,7 @@ foo (int16_t *base, int16x8_t value)
>  void
>  foo1 (int16_t *base, int16x8_t value)
>  {
> -  return vst1q (base, value);
> +  vst1q (base, value);
>  }
> 
>  #ifdef __cplusplus
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
> index 444ad07f4ef..a28d1eb98db 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
> @@ -18,7 +18,7 @@ extern "C" {
>  void
>  foo (int32_t *base, int32x4_t value)
>  {
> -  return vst1q_s32 (base, value);
> +  vst1q_s32 (base, value);
>  }
> 
> 
> @@ -31,7 +31,7 @@ foo (int32_t *base, int32x4_t value)
>  void
>  foo1 (int32_t *base, int32x4_t value)
>  {
> -  return vst1q (base, value);
> +  vst1q (base, value);
>  }
> 
>  #ifdef __cplusplus
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> index 684ff0aca5b..81c141a63e0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> @@ -18,7 +18,7 @@ extern "C" {
>  void
>  foo (int8_t *base, int8x16_t value)
>  {
> -  return vst1q_s8 (base, value);
> +  vst1q_s8 (base, value);
>  }
> 
> 
> @@ -31,7 +31,7 @@ foo (int8_t *base, int8x16_t value)
>  void
>  foo1 (int8_t *base, int8x16_t value)
>  {
> -  return vst1q (base, value);
> +  vst1q (base, value);
>  }
> 
>  #ifdef __cplusplus
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> index 1fea2de1e76..b8ce7fbe6ee 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> @@ -18,7 +18,7 @@ extern "C" {
>  void
>  foo (uint16_t *base, uint16x8_t value)
>  {
> -  return vst1q_u16 (base, value);
> +  vst1q_u16 (base, value);
>  }
> 
> 
> @@ -31,7 +31,7 @@ foo (uint16_t *base, uint16x8_t value)
>  void
>  foo1 (uint16_t *base, uint16x8_t value)
>  {
> -  return vst1q (base, value);
> +  vst1q (base, value);
>  }
> 
>  #ifdef __cplusplus
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
> index 64c43c59d47..1dbb55538a9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
> @@ -18,7 +18,7 @@ extern "C" {
>  void
>  foo (uint32_t *base, uint32x4_t value)
>  {
> -  return vst1q_u32 (base, value);
> +  vst1q_u32 (base, value);
>  }
> 
> 
> @@ -31,7 +31,7 @@ foo (uint32_t *base, uint32x4_t value)
>  void
>  foo1 (uint32_t *base, uint32x4_t value)
>  {
> -  return vst1q (base, value);
> +  vst1q (base, value);
>  }
> 
>  #ifdef __cplusplus
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
> index 5517611bba6..ab22be81647 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
> @@ -18,7 +18,7 @@ extern "C" {
>  void
>  foo (uint8_t *base, uint8x16_t value)
>  {
> -  return vst1q_u8 (base, value);
> +  vst1q_u8 (base, value);
>  }
> 
> 
> @@ -31,7 +31,7 @@ foo (uint8_t *base, uint8x16_t value)
>  void
>  foo1 (uint8_t *base, uint8x16_t value)
>  {
> -  return vst1q (base, value);
> +  vst1q (base, value);
>  }
> 
>  #ifdef __cplusplus
> --
> 2.34.1
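The fix above is mechanical but worth spelling out: in ISO C, a return statement
with an expression must not appear in a function whose return type is void
(C11 6.8.6.4), which is why the reworked intrinsics implementation starts
diagnosing the old tests. A minimal sketch, using a hypothetical stand-in for the
store intrinsic:

```c
/* Stand-in for an MVE store intrinsic: like vst1q_s32, it returns void.  */
static void
fake_vst1q_s32 (int *base, const int *value)
{
  for (int i = 0; i < 4; i++)
    base[i] = value[i];
}

/* 'return fake_vst1q_s32 (base, value);' would return a void expression
   from a void function -- the constraint violation the new implementation
   flags.  The corrected tests simply call the intrinsic as a statement: */
static void
foo (int *base, const int *value)
{
  fake_vst1q_s32 (base, value);
}
```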



* Re: [PATCH 5/6] arm: [MVE intrinsics] fix vst1 tests
  2023-11-16 15:30   ` Kyrylo Tkachov
@ 2023-11-16 15:37     ` Christophe Lyon
  0 siblings, 0 replies; 15+ messages in thread
From: Christophe Lyon @ 2023-11-16 15:37 UTC (permalink / raw)
  To: Kyrylo Tkachov; +Cc: gcc-patches, Richard Sandiford, Richard Earnshaw

On Thu, 16 Nov 2023 at 16:30, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Christophe Lyon <christophe.lyon@linaro.org>
> > Sent: Thursday, November 16, 2023 3:26 PM
> > To: gcc-patches@gcc.gnu.org; Richard Sandiford
> > <Richard.Sandiford@arm.com>; Richard Earnshaw
> > <Richard.Earnshaw@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > Cc: Christophe Lyon <christophe.lyon@linaro.org>
> > Subject: [PATCH 5/6] arm: [MVE intrinsics] fix vst1 tests
> >
> > vst1q intrinsics return void, so we should not do 'return vst1q_f16 (base,
> > value);'
> >
> > This was OK so far, but will trigger an error/warning with the new
> > implementation of these intrinsics.
> >
>
> Whoops!
> Ok (could have gone in as obvious IMO).

Indeed, I'll try to remember that when I write the same patch for the
other vst* intrinsics tests ;-)

Thanks,

Christophe

> Thanks,
> Kyrill
>
> > This patch just removes the 'return' keyword.
> >
> > 2023-11-16  Christophe Lyon  <christophe.lyon@linaro.org>
> >
> >       gcc/testsuite/
> >       * gcc.target/arm/mve/intrinsics/vst1q_f16.c: Remove 'return'.
> >       * gcc.target/arm/mve/intrinsics/vst1q_f32.c: Likewise.
> >       * gcc.target/arm/mve/intrinsics/vst1q_s16.c: Likewise.
> >       * gcc.target/arm/mve/intrinsics/vst1q_s32.c: Likewise.
> >       * gcc.target/arm/mve/intrinsics/vst1q_s8.c: Likewise.
> >       * gcc.target/arm/mve/intrinsics/vst1q_u16.c: Likewise.
> >       * gcc.target/arm/mve/intrinsics/vst1q_u32.c: Likewise.
> >       * gcc.target/arm/mve/intrinsics/vst1q_u8.c: Likewise.
> > ---
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c | 4 ++--
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c | 4 ++--
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c | 4 ++--
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c | 4 ++--
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c  | 4 ++--
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c | 4 ++--
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c | 4 ++--
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c  | 4 ++--
> >  8 files changed, 16 insertions(+), 16 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> > index 1fa02f00f53..e4b40604d54 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> > @@ -18,7 +18,7 @@ extern "C" {
> >  void
> >  foo (float16_t *base, float16x8_t value)
> >  {
> > -  return vst1q_f16 (base, value);
> > +  vst1q_f16 (base, value);
> >  }
> >
> >
> > @@ -31,7 +31,7 @@ foo (float16_t *base, float16x8_t value)
> >  void
> >  foo1 (float16_t *base, float16x8_t value)
> >  {
> > -  return vst1q (base, value);
> > +  vst1q (base, value);
> >  }
> >
> >  #ifdef __cplusplus
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
> > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
> > index 67cc3ae3b47..8f42323c603 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
> > @@ -18,7 +18,7 @@ extern "C" {
> >  void
> >  foo (float32_t *base, float32x4_t value)
> >  {
> > -  return vst1q_f32 (base, value);
> > +  vst1q_f32 (base, value);
> >  }
> >
> >
> > @@ -31,7 +31,7 @@ foo (float32_t *base, float32x4_t value)
> >  void
> >  foo1 (float32_t *base, float32x4_t value)
> >  {
> > -  return vst1q (base, value);
> > +  vst1q (base, value);
> >  }
> >
> >  #ifdef __cplusplus
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> > index 052959b2083..891ac4155d9 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> > @@ -18,7 +18,7 @@ extern "C" {
> >  void
> >  foo (int16_t *base, int16x8_t value)
> >  {
> > -  return vst1q_s16 (base, value);
> > +  vst1q_s16 (base, value);
> >  }
> >
> >
> > @@ -31,7 +31,7 @@ foo (int16_t *base, int16x8_t value)
> >  void
> >  foo1 (int16_t *base, int16x8_t value)
> >  {
> > -  return vst1q (base, value);
> > +  vst1q (base, value);
> >  }
> >
> >  #ifdef __cplusplus
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
> > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
> > index 444ad07f4ef..a28d1eb98db 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
> > @@ -18,7 +18,7 @@ extern "C" {
> >  void
> >  foo (int32_t *base, int32x4_t value)
> >  {
> > -  return vst1q_s32 (base, value);
> > +  vst1q_s32 (base, value);
> >  }
> >
> >
> > @@ -31,7 +31,7 @@ foo (int32_t *base, int32x4_t value)
> >  void
> >  foo1 (int32_t *base, int32x4_t value)
> >  {
> > -  return vst1q (base, value);
> > +  vst1q (base, value);
> >  }
> >
> >  #ifdef __cplusplus
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> > index 684ff0aca5b..81c141a63e0 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> > @@ -18,7 +18,7 @@ extern "C" {
> >  void
> >  foo (int8_t *base, int8x16_t value)
> >  {
> > -  return vst1q_s8 (base, value);
> > +  vst1q_s8 (base, value);
> >  }
> >
> >
> > @@ -31,7 +31,7 @@ foo (int8_t *base, int8x16_t value)
> >  void
> >  foo1 (int8_t *base, int8x16_t value)
> >  {
> > -  return vst1q (base, value);
> > +  vst1q (base, value);
> >  }
> >
> >  #ifdef __cplusplus
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> > index 1fea2de1e76..b8ce7fbe6ee 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> > @@ -18,7 +18,7 @@ extern "C" {
> >  void
> >  foo (uint16_t *base, uint16x8_t value)
> >  {
> > -  return vst1q_u16 (base, value);
> > +  vst1q_u16 (base, value);
> >  }
> >
> >
> > @@ -31,7 +31,7 @@ foo (uint16_t *base, uint16x8_t value)
> >  void
> >  foo1 (uint16_t *base, uint16x8_t value)
> >  {
> > -  return vst1q (base, value);
> > +  vst1q (base, value);
> >  }
> >
> >  #ifdef __cplusplus
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
> > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
> > index 64c43c59d47..1dbb55538a9 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
> > @@ -18,7 +18,7 @@ extern "C" {
> >  void
> >  foo (uint32_t *base, uint32x4_t value)
> >  {
> > -  return vst1q_u32 (base, value);
> > +  vst1q_u32 (base, value);
> >  }
> >
> >
> > @@ -31,7 +31,7 @@ foo (uint32_t *base, uint32x4_t value)
> >  void
> >  foo1 (uint32_t *base, uint32x4_t value)
> >  {
> > -  return vst1q (base, value);
> > +  vst1q (base, value);
> >  }
> >
> >  #ifdef __cplusplus
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
> > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
> > index 5517611bba6..ab22be81647 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
> > @@ -18,7 +18,7 @@ extern "C" {
> >  void
> >  foo (uint8_t *base, uint8x16_t value)
> >  {
> > -  return vst1q_u8 (base, value);
> > +  vst1q_u8 (base, value);
> >  }
> >
> >
> > @@ -31,7 +31,7 @@ foo (uint8_t *base, uint8x16_t value)
> >  void
> >  foo1 (uint8_t *base, uint8x16_t value)
> >  {
> > -  return vst1q (base, value);
> > +  vst1q (base, value);
> >  }
> >
> >  #ifdef __cplusplus
> > --
> > 2.34.1
>


* RE: [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types
  2023-11-16 15:26 [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Christophe Lyon
                   ` (4 preceding siblings ...)
  2023-11-16 15:26 ` [PATCH 6/6] arm: [MVE intrinsics] rework vldq1 vst1q Christophe Lyon
@ 2023-11-16 16:46 ` Kyrylo Tkachov
  5 siblings, 0 replies; 15+ messages in thread
From: Kyrylo Tkachov @ 2023-11-16 16:46 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Sandiford, Richard Earnshaw



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@linaro.org>
> Sent: Thursday, November 16, 2023 3:26 PM
> To: gcc-patches@gcc.gnu.org; Richard Sandiford
> <Richard.Sandiford@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: Christophe Lyon <christophe.lyon@linaro.org>
> Subject: [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types
> 
> So far we define arm_simd_types and scalar_types using type
> definitions like intSI_type_node, etc...
> 
> This is causing problems with later patches which re-implement
> load/store MVE intrinsics, leading to error messages such as:
>   error: passing argument 1 of 'vst1q_s32' from incompatible pointer type
>   note: expected 'int *' but argument is of type 'int32_t *' {aka 'long int *'}
> 
> This patch uses get_typenode_from_name (INT32_TYPE) instead, which
> defines the types as appropriate for the target/C library.

Ok.
Thanks,
Kyrill

> 
> 2023-11-16  Christophe Lyon  <christophe.lyon@linaro.org>
> 
> 	gcc/
> 	* config/arm/arm-builtins.cc (arm_init_simd_builtin_types): Fix
> 	initialization of arm_simd_types[].eltype.
> 	* config/arm/arm-mve-builtins.def (DEF_MVE_TYPE): Fix scalar
> 	types.
> ---
>  gcc/config/arm/arm-builtins.cc      | 28 ++++++++++++++--------------
>  gcc/config/arm/arm-mve-builtins.def | 16 ++++++++--------
>  2 files changed, 22 insertions(+), 22 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
> index fca7dcaf565..dd9c5815c45 100644
> --- a/gcc/config/arm/arm-builtins.cc
> +++ b/gcc/config/arm/arm-builtins.cc
> @@ -1580,20 +1580,20 @@ arm_init_simd_builtin_types (void)
>        TYPE_STRING_FLAG (arm_simd_polyHI_type_node) = false;
>      }
>    /* Init all the element types built by the front-end.  */
> -  arm_simd_types[Int8x8_t].eltype = intQI_type_node;
> -  arm_simd_types[Int8x16_t].eltype = intQI_type_node;
> -  arm_simd_types[Int16x4_t].eltype = intHI_type_node;
> -  arm_simd_types[Int16x8_t].eltype = intHI_type_node;
> -  arm_simd_types[Int32x2_t].eltype = intSI_type_node;
> -  arm_simd_types[Int32x4_t].eltype = intSI_type_node;
> -  arm_simd_types[Int64x2_t].eltype = intDI_type_node;
> -  arm_simd_types[Uint8x8_t].eltype = unsigned_intQI_type_node;
> -  arm_simd_types[Uint8x16_t].eltype = unsigned_intQI_type_node;
> -  arm_simd_types[Uint16x4_t].eltype = unsigned_intHI_type_node;
> -  arm_simd_types[Uint16x8_t].eltype = unsigned_intHI_type_node;
> -  arm_simd_types[Uint32x2_t].eltype = unsigned_intSI_type_node;
> -  arm_simd_types[Uint32x4_t].eltype = unsigned_intSI_type_node;
> -  arm_simd_types[Uint64x2_t].eltype = unsigned_intDI_type_node;
> +  arm_simd_types[Int8x8_t].eltype = get_typenode_from_name (INT8_TYPE);
> +  arm_simd_types[Int8x16_t].eltype = get_typenode_from_name (INT8_TYPE);
> +  arm_simd_types[Int16x4_t].eltype = get_typenode_from_name (INT16_TYPE);
> +  arm_simd_types[Int16x8_t].eltype = get_typenode_from_name (INT16_TYPE);
> +  arm_simd_types[Int32x2_t].eltype = get_typenode_from_name (INT32_TYPE);
> +  arm_simd_types[Int32x4_t].eltype = get_typenode_from_name (INT32_TYPE);
> +  arm_simd_types[Int64x2_t].eltype = get_typenode_from_name (INT64_TYPE);
> +  arm_simd_types[Uint8x8_t].eltype = get_typenode_from_name (UINT8_TYPE);
> +  arm_simd_types[Uint8x16_t].eltype = get_typenode_from_name (UINT8_TYPE);
> +  arm_simd_types[Uint16x4_t].eltype = get_typenode_from_name (UINT16_TYPE);
> +  arm_simd_types[Uint16x8_t].eltype = get_typenode_from_name (UINT16_TYPE);
> +  arm_simd_types[Uint32x2_t].eltype = get_typenode_from_name (UINT32_TYPE);
> +  arm_simd_types[Uint32x4_t].eltype = get_typenode_from_name (UINT32_TYPE);
> +  arm_simd_types[Uint64x2_t].eltype = get_typenode_from_name (UINT64_TYPE);
> 
>    /* Note: poly64x2_t is defined in arm_neon.h, to ensure it gets default
>       mangling.  */
> diff --git a/gcc/config/arm/arm-mve-builtins.def b/gcc/config/arm/arm-mve-builtins.def
> index e2cf1baf370..a901d8231e9 100644
> --- a/gcc/config/arm/arm-mve-builtins.def
> +++ b/gcc/config/arm/arm-mve-builtins.def
> @@ -39,14 +39,14 @@ DEF_MVE_MODE (r, none, none, none)
> 
>  #define REQUIRES_FLOAT false
>  DEF_MVE_TYPE (mve_pred16_t, boolean_type_node)
> -DEF_MVE_TYPE (uint8x16_t, unsigned_intQI_type_node)
> -DEF_MVE_TYPE (uint16x8_t, unsigned_intHI_type_node)
> -DEF_MVE_TYPE (uint32x4_t, unsigned_intSI_type_node)
> -DEF_MVE_TYPE (uint64x2_t, unsigned_intDI_type_node)
> -DEF_MVE_TYPE (int8x16_t, intQI_type_node)
> -DEF_MVE_TYPE (int16x8_t, intHI_type_node)
> -DEF_MVE_TYPE (int32x4_t, intSI_type_node)
> -DEF_MVE_TYPE (int64x2_t, intDI_type_node)
> +DEF_MVE_TYPE (uint8x16_t, get_typenode_from_name (UINT8_TYPE))
> +DEF_MVE_TYPE (uint16x8_t, get_typenode_from_name (UINT16_TYPE))
> +DEF_MVE_TYPE (uint32x4_t, get_typenode_from_name (UINT32_TYPE))
> +DEF_MVE_TYPE (uint64x2_t, get_typenode_from_name (UINT64_TYPE))
> +DEF_MVE_TYPE (int8x16_t, get_typenode_from_name (INT8_TYPE))
> +DEF_MVE_TYPE (int16x8_t, get_typenode_from_name (INT16_TYPE))
> +DEF_MVE_TYPE (int32x4_t, get_typenode_from_name (INT32_TYPE))
> +DEF_MVE_TYPE (int64x2_t, get_typenode_from_name (INT64_TYPE))
>  #undef REQUIRES_FLOAT
> 
>  #define REQUIRES_FLOAT true
> --
> 2.34.1
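The underlying portability point in the patch above is that int32_t need not be
plain int: which 32-bit type it resolves to is the target/C library's choice (the
error message quoted in the commit shows int32_t as 'long int' on this target), so
hard-coding intSI_type_node produces prototypes taking 'int *' that reject
'int32_t *' arguments. A hedged illustration -- the compatibility result below is
target-dependent by design; only the size is guaranteed:

```c
#include <stdint.h>

/* int32_t is SOME 32-bit integer type, but whether it is 'int' or
   'long' is decided by the C library; pointer compatibility of
   'int32_t *' with 'int *' follows from that choice.  The GCC/clang
   builtin below reports the answer for the current target.  */
int
int32_is_plain_int (void)
{
  return __builtin_types_compatible_p (int32_t, int);  /* target-dependent */
}
```

get_typenode_from_name (INT32_TYPE) returns whichever node the target actually
uses, which is why the patch switches to it.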



* RE: [PATCH 2/6] arm: [MVE intrinsics] Add support for void and load/store pointers as argument types.
  2023-11-16 15:26 ` [PATCH 2/6] arm: [MVE intrinsics] Add support for void and load/store pointers as argument types Christophe Lyon
@ 2023-11-16 16:47   ` Kyrylo Tkachov
  0 siblings, 0 replies; 15+ messages in thread
From: Kyrylo Tkachov @ 2023-11-16 16:47 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Sandiford, Richard Earnshaw



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@linaro.org>
> Sent: Thursday, November 16, 2023 3:26 PM
> To: gcc-patches@gcc.gnu.org; Richard Sandiford
> <Richard.Sandiford@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: Christophe Lyon <christophe.lyon@linaro.org>
> Subject: [PATCH 2/6] arm: [MVE intrinsics] Add support for void and
> load/store pointers as argument types.
> 
> This patch adds support for '_', 'al' and 'as' for void, load pointer
> and store pointer argument/return value types in intrinsic signatures.
> 
> It also adds a new memory_scalar_type() helper to function_instance,
> which is used by 'al' and 'as'.

Ok.
Thanks,
Kyrill

> 
> 2023-11-16  Christophe Lyon  <christophe.lyon@linaro.org>
> 
> 	gcc/
> 	* config/arm/arm-mve-builtins-shapes.cc (build_const_pointer):
> 	New.
> 	(parse_type): Add support for '_', 'al' and 'as'.
> 	* config/arm/arm-mve-builtins.h (function_instance): Add
> 	memory_scalar_type.
> 	(function_base): Likewise.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 25 +++++++++++++++++++++++
>  gcc/config/arm/arm-mve-builtins.h         | 17 +++++++++++++++
>  2 files changed, 42 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
> index 23eb9d0e69b..ce87ebcef30 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -39,6 +39,13 @@
> 
>  namespace arm_mve {
> 
> +/* Return a representation of "const T *".  */
> +static tree
> +build_const_pointer (tree t)
> +{
> +  return build_pointer_type (build_qualified_type (t, TYPE_QUAL_CONST));
> +}
> +
>  /* If INSTANCE has a predicate, add it to the list of argument types
>     in ARGUMENT_TYPES.  RETURN_TYPE is the type returned by the
>     function.  */
> @@ -140,6 +147,9 @@ parse_element_type (const function_instance &instance, const char *&format)
>  /* Read and return a type from FORMAT for function INSTANCE.  Advance
>     FORMAT beyond the type string.  The format is:
> 
> +   _       - void
> +   al      - array pointer for loads
> +   as      - array pointer for stores
>     p       - predicates with type mve_pred16_t
>     s<elt>  - a scalar type with the given element suffix
>     t<elt>  - a vector or tuple type with given element suffix [*1]
> @@ -156,6 +166,21 @@ parse_type (const function_instance &instance, const char *&format)
>  {
>    int ch = *format++;
> 
> +
> +  if (ch == '_')
> +    return void_type_node;
> +
> +  if (ch == 'a')
> +    {
> +      ch = *format++;
> +      if (ch == 'l')
> +	return build_const_pointer (instance.memory_scalar_type ());
> +      if (ch == 's') {
> +	return build_pointer_type (instance.memory_scalar_type ());
> +      }
> +      gcc_unreachable ();
> +    }
> +
>    if (ch == 'p')
>      return get_mve_pred16_t ();
> 
> diff --git a/gcc/config/arm/arm-mve-builtins.h b/gcc/config/arm/arm-mve-builtins.h
> index 37b8223dfb2..4fd230fe4c7 100644
> --- a/gcc/config/arm/arm-mve-builtins.h
> +++ b/gcc/config/arm/arm-mve-builtins.h
> @@ -277,6 +277,7 @@ public:
>    bool could_trap_p () const;
> 
>    unsigned int vectors_per_tuple () const;
> +  tree memory_scalar_type () const;
> 
>    const mode_suffix_info &mode_suffix () const;
> 
> @@ -519,6 +520,14 @@ public:
>       of vectors in the tuples, otherwise return 1.  */
>    virtual unsigned int vectors_per_tuple () const { return 1; }
> 
> +  /* If the function addresses memory, return the type of a single
> +     scalar memory element.  */
> +  virtual tree
> +  memory_scalar_type (const function_instance &) const
> +  {
> +    gcc_unreachable ();
> +  }
> +
>    /* Try to fold the given gimple call.  Return the new gimple statement
>       on success, otherwise return null.  */
>    virtual gimple *fold (gimple_folder &) const { return NULL; }
> @@ -644,6 +653,14 @@ function_instance::vectors_per_tuple () const
>    return base->vectors_per_tuple ();
>  }
> 
> +/* If the function addresses memory, return the type of a single
> +   scalar memory element.  */
> +inline tree
> +function_instance::memory_scalar_type () const
> +{
> +  return base->memory_scalar_type (*this);
> +}
> +
>  /* Return information about the function's mode suffix.  */
>  inline const mode_suffix_info &
>  function_instance::mode_suffix () const
> --
> 2.34.1
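To make the new format characters concrete: parse_type consumes a signature string
one fragment at a time, advancing the cursor as it goes. A simplified stand-alone
mirror of that dispatch -- hypothetical names, returning type descriptions instead
of tree nodes:

```c
/* Simplified mirror of parse_type's new cases: '_' is void, "al" is a
   const pointer for loads, "as" is a mutable pointer for stores, and
   'p' is the predicate type.  The cursor advances past what was read,
   as in the real parser.  */
static const char *
parse_type_sketch (const char **format)
{
  int ch = *(*format)++;
  if (ch == '_')
    return "void";
  if (ch == 'a')
    {
      ch = *(*format)++;
      if (ch == 'l')
        return "const T *";   /* array pointer for loads */
      if (ch == 's')
        return "T *";         /* array pointer for stores */
    }
  if (ch == 'p')
    return "mve_pred16_t";
  return "unknown";
}
```

For example, a vst1q-style signature would pair "as" for the base pointer with a
vector argument, while vld1q would use "al"; the real code builds the
corresponding const/non-const pointer types from memory_scalar_type ().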



* RE: [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores
  2023-11-16 15:26 ` [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores Christophe Lyon
@ 2023-11-16 16:48   ` Kyrylo Tkachov
  2023-11-23 13:29   ` Jan-Benedict Glaw
  1 sibling, 0 replies; 15+ messages in thread
From: Kyrylo Tkachov @ 2023-11-16 16:48 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Sandiford, Richard Earnshaw



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@linaro.org>
> Sent: Thursday, November 16, 2023 3:26 PM
> To: gcc-patches@gcc.gnu.org; Richard Sandiford
> <Richard.Sandiford@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: Christophe Lyon <christophe.lyon@linaro.org>
> Subject: [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads
> and stores
> 
> This patch adds base support for load/store intrinsics to the
> framework, starting with loads and stores for contiguous memory
> elements, without extension nor truncation.
> 
> Compared to the aarch64/SVE implementation, there's no support for
> gather/scatter loads/stores yet.  This will be added later as needed.
> 

Ok.
Thanks,
Kyrill

> 2023-11-16  Christophe Lyon  <christophe.lyon@linaro.org>
> 
> 	gcc/
> 	* config/arm/arm-mve-builtins-functions.h (multi_vector_function)
> 	(full_width_access): New classes.
> 	* config/arm/arm-mve-builtins.cc
> 	(find_type_suffix_for_scalar_type, infer_pointer_type)
> 	(require_pointer_type, get_contiguous_base, add_mem_operand)
> 	(add_fixed_operand, use_contiguous_load_insn)
> 	(use_contiguous_store_insn): New.
> 	* config/arm/arm-mve-builtins.h (memory_vector_mode)
> 	(infer_pointer_type, require_pointer_type, get_contiguous_base)
> 	(add_mem_operand)
> 	(add_fixed_operand, use_contiguous_load_insn)
> 	(use_contiguous_store_insn): New.
> ---
>  gcc/config/arm/arm-mve-builtins-functions.h |  56 ++++++++++
>  gcc/config/arm/arm-mve-builtins.cc          | 116 ++++++++++++++++++++
>  gcc/config/arm/arm-mve-builtins.h           |  28 ++++-
>  3 files changed, 199 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-functions.h
> b/gcc/config/arm/arm-mve-builtins-functions.h
> index eba1f071af0..6d234a2dd7c 100644
> --- a/gcc/config/arm/arm-mve-builtins-functions.h
> +++ b/gcc/config/arm/arm-mve-builtins-functions.h
> @@ -966,6 +966,62 @@ public:
>    }
>  };
> 
> +/* A function_base that sometimes or always operates on tuples of
> +   vectors.  */
> +class multi_vector_function : public function_base
> +{
> +public:
> +  CONSTEXPR multi_vector_function (unsigned int vectors_per_tuple)
> +    : m_vectors_per_tuple (vectors_per_tuple) {}
> +
> +  unsigned int
> +  vectors_per_tuple () const override
> +  {
> +    return m_vectors_per_tuple;
> +  }
> +
> +  /* The number of vectors in a tuple, or 1 if the function only operates
> +     on single vectors.  */
> +  unsigned int m_vectors_per_tuple;
> +};
> +
> +/* A function_base that loads or stores contiguous memory elements
> +   without extending or truncating them.  */
> +class full_width_access : public multi_vector_function
> +{
> +public:
> +  CONSTEXPR full_width_access (unsigned int vectors_per_tuple = 1)
> +    : multi_vector_function (vectors_per_tuple) {}
> +
> +  tree
> +  memory_scalar_type (const function_instance &fi) const override
> +  {
> +    return fi.scalar_type (0);
> +  }
> +
> +  machine_mode
> +  memory_vector_mode (const function_instance &fi) const override
> +  {
> +    machine_mode mode = fi.vector_mode (0);
> +    /* Vectors of floating-point are managed in memory as vectors of
> +       integers.  */
> +    switch (mode)
> +      {
> +      case E_V4SFmode:
> +	mode = E_V4SImode;
> +	break;
> +      case E_V8HFmode:
> +	mode = E_V8HImode;
> +	break;
> +      }
> +
> +    if (m_vectors_per_tuple != 1)
> +      mode = targetm.array_mode (mode, m_vectors_per_tuple).require ();
> +
> +    return mode;
> +  }
> +};
> +
>  } /* end namespace arm_mve */
> 
>  /* Declare the global function base NAME, creating it from an instance
> diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
> index 02dc8fa9b73..a265cb05553 100644
> --- a/gcc/config/arm/arm-mve-builtins.cc
> +++ b/gcc/config/arm/arm-mve-builtins.cc
> @@ -36,6 +36,7 @@
>  #include "fold-const.h"
>  #include "gimple.h"
>  #include "gimple-iterator.h"
> +#include "explow.h"
>  #include "emit-rtl.h"
>  #include "langhooks.h"
>  #include "stringpool.h"
> @@ -529,6 +530,22 @@ matches_type_p (const_tree model_type, const_tree candidate)
> 	  && TYPE_MAIN_VARIANT (model_type) == TYPE_MAIN_VARIANT (candidate));
>  }
> 
> +/* If TYPE is a valid MVE element type, return the corresponding type
> +   suffix, otherwise return NUM_TYPE_SUFFIXES.  */
> +static type_suffix_index
> +find_type_suffix_for_scalar_type (const_tree type)
> +{
> +  /* A linear search should be OK here, since the code isn't hot and
> +     the number of types is small.  */
> +  for (unsigned int suffix_i = 0; suffix_i < NUM_TYPE_SUFFIXES; ++suffix_i)
> +    {
> +      vector_type_index vector_i = type_suffixes[suffix_i].vector_type;
> +      if (matches_type_p (scalar_types[vector_i], type))
> +	return type_suffix_index (suffix_i);
> +    }
> +  return NUM_TYPE_SUFFIXES;
> +}
> +
>  /* Report an error against LOCATION that the user has tried to use
>     a floating point function when the mve.fp extension is disabled.  */
>  static void
> @@ -1125,6 +1142,37 @@ function_resolver::resolve_to (mode_suffix_index mode,
>    return res;
>  }
> 
> +/* Require argument ARGNO to be a pointer to a scalar type that has a
> +   corresponding type suffix.  Return that type suffix on success,
> +   otherwise report an error and return NUM_TYPE_SUFFIXES.  */
> +type_suffix_index
> +function_resolver::infer_pointer_type (unsigned int argno)
> +{
> +  tree actual = get_argument_type (argno);
> +  if (actual == error_mark_node)
> +    return NUM_TYPE_SUFFIXES;
> +
> +  if (TREE_CODE (actual) != POINTER_TYPE)
> +    {
> +      error_at (location, "passing %qT to argument %d of %qE, which"
> +		" expects a pointer type", actual, argno + 1, fndecl);
> +      return NUM_TYPE_SUFFIXES;
> +    }
> +
> +  tree target = TREE_TYPE (actual);
> +  type_suffix_index type = find_type_suffix_for_scalar_type (target);
> +  if (type == NUM_TYPE_SUFFIXES)
> +    {
> +      error_at (location, "passing %qT to argument %d of %qE, but %qT is not"
> +		" a valid MVE element type", actual, argno + 1, fndecl,
> +		build_qualified_type (target, 0));
> +      return NUM_TYPE_SUFFIXES;
> +    }
> +
> +  return type;
> +}
> +
>  /* Require argument ARGNO to be a single vector or a tuple of NUM_VECTORS
>     vectors; NUM_VECTORS is 1 for the former.  Return the associated type
>     suffix on success, using TYPE_SUFFIX_b for predicates.  Report an error
> @@ -1498,6 +1546,22 @@ function_resolver::require_scalar_type (unsigned int argno,
>    return true;
>  }
> 
> +/* Require argument ARGNO to be some form of pointer, without being specific
> +   about its target type.  Return true if the argument has the right form,
> +   otherwise report an appropriate error.  */
> +bool
> +function_resolver::require_pointer_type (unsigned int argno)
> +{
> +  if (!scalar_argument_p (argno))
> +    {
> +      error_at (location, "passing %qT to argument %d of %qE, which"
> +		" expects a scalar pointer", get_argument_type (argno),
> +		argno + 1, fndecl);
> +      return false;
> +    }
> +  return true;
> +}
> +
>  /* Require the function to have exactly EXPECTED arguments.  Return true
>     if it does, otherwise report an appropriate error.  */
>  bool
> @@ -1955,6 +2019,14 @@ function_expander::direct_optab_handler (optab op, unsigned int suffix_i)
>    return ::direct_optab_handler (op, vector_mode (suffix_i));
>  }
> 
> +/* Return the base address for a contiguous load or store
> +   function.  */
> +rtx
> +function_expander::get_contiguous_base ()
> +{
> +  return args[0];
> +}
> +
>  /* For a function that does the equivalent of:
> 
>       OUTPUT = COND ? FN (INPUTS) : FALLBACK;
> @@ -2043,6 +2115,26 @@ function_expander::add_integer_operand (HOST_WIDE_INT x)
>    create_integer_operand (&m_ops.last (), x);
>  }
> 
> +/* Add a memory operand with mode MODE and address ADDR.  */
> +void
> +function_expander::add_mem_operand (machine_mode mode, rtx addr)
> +{
> +  gcc_assert (VECTOR_MODE_P (mode));
> +  rtx mem = gen_rtx_MEM (mode, memory_address (mode, addr));
> +  /* The memory is only guaranteed to be element-aligned.  */
> +  set_mem_align (mem, GET_MODE_ALIGNMENT (GET_MODE_INNER (mode)));
> +  add_fixed_operand (mem);
> +}
> +
> +/* Add an operand that must be X.  The only way of legitimizing an
> +   invalid X is to reload the address of a MEM.  */
> +void
> +function_expander::add_fixed_operand (rtx x)
> +{
> +  m_ops.safe_grow (m_ops.length () + 1, true);
> +  create_fixed_operand (&m_ops.last (), x);
> +}
> +
>  /* Generate instruction ICODE, given that its operands have already
>     been added to M_OPS.  Return the value of the first operand.  */
>  rtx
> @@ -2137,6 +2229,30 @@ function_expander::use_cond_insn (insn_code icode, unsigned int merge_argno)
>    return generate_insn (icode);
>  }
> 
> +/* Implement the call using instruction ICODE, which loads memory operand 1
> +   into register operand 0.  */
> +rtx
> +function_expander::use_contiguous_load_insn (insn_code icode)
> +{
> +  machine_mode mem_mode = memory_vector_mode ();
> +
> +  add_output_operand (icode);
> +  add_mem_operand (mem_mode, get_contiguous_base ());
> +  return generate_insn (icode);
> +}
> +
> +/* Implement the call using instruction ICODE, which stores register operand 1
> +   into memory operand 0.  */
> +rtx
> +function_expander::use_contiguous_store_insn (insn_code icode)
> +{
> +  machine_mode mem_mode = memory_vector_mode ();
> +
> +  add_mem_operand (mem_mode, get_contiguous_base ());
> +  add_input_operand (icode, args[1]);
> +  return generate_insn (icode);
> +}
> +
>  /* Implement the call using a normal unpredicated optab for PRED_none.
> 
>     <optab> corresponds to:
> diff --git a/gcc/config/arm/arm-mve-builtins.h b/gcc/config/arm/arm-mve-builtins.h
> index 4fd230fe4c7..9c219fa8db4 100644
> --- a/gcc/config/arm/arm-mve-builtins.h
> +++ b/gcc/config/arm/arm-mve-builtins.h
> @@ -278,6 +278,7 @@ public:
> 
>    unsigned int vectors_per_tuple () const;
>    tree memory_scalar_type () const;
> +  machine_mode memory_vector_mode () const;
> 
>    const mode_suffix_info &mode_suffix () const;
> 
> @@ -383,6 +384,7 @@ public:
>  		   type_suffix_index = NUM_TYPE_SUFFIXES,
>  		   type_suffix_index = NUM_TYPE_SUFFIXES);
> 
> +  type_suffix_index infer_pointer_type (unsigned int);
>    type_suffix_index infer_vector_or_tuple_type (unsigned int, unsigned int);
>    type_suffix_index infer_vector_type (unsigned int);
> 
> @@ -394,8 +396,9 @@ public:
>  				    type_suffix_index,
>  				    type_class_index = SAME_TYPE_CLASS,
>  				    unsigned int = SAME_SIZE);
> -  bool require_integer_immediate (unsigned int);
>    bool require_scalar_type (unsigned int, const char *);
> +  bool require_pointer_type (unsigned int);
> +  bool require_integer_immediate (unsigned int);
>    bool require_derived_scalar_type (unsigned int, type_class_index,
>  				    unsigned int = SAME_SIZE);
> 
> @@ -476,18 +479,23 @@ public:
> 
>    insn_code direct_optab_handler (optab, unsigned int = 0);
> 
> +  rtx get_contiguous_base ();
>    rtx get_fallback_value (machine_mode, unsigned int, unsigned int &);
>    rtx get_reg_target ();
> 
>    void add_output_operand (insn_code);
>    void add_input_operand (insn_code, rtx);
>    void add_integer_operand (HOST_WIDE_INT);
> +  void add_mem_operand (machine_mode, rtx);
> +  void add_fixed_operand (rtx);
>    rtx generate_insn (insn_code);
> 
>    rtx use_exact_insn (insn_code);
>    rtx use_unpred_insn (insn_code);
>    rtx use_pred_x_insn (insn_code);
>    rtx use_cond_insn (insn_code, unsigned int = DEFAULT_MERGE_ARGNO);
> +  rtx use_contiguous_load_insn (insn_code);
> +  rtx use_contiguous_store_insn (insn_code);
> 
>    rtx map_to_rtx_codes (rtx_code, rtx_code, rtx_code);
> 
> @@ -528,6 +536,15 @@ public:
>      gcc_unreachable ();
>    }
> 
> +  /* If the function addresses memory, return a vector mode whose
> +     GET_MODE_NUNITS is the number of elements addressed and whose
> +     GET_MODE_INNER is the mode of a single scalar memory element.  */
> +  virtual machine_mode
> +  memory_vector_mode (const function_instance &) const
> +  {
> +    gcc_unreachable ();
> +  }
> +
>    /* Try to fold the given gimple call.  Return the new gimple statement
>       on success, otherwise return null.  */
>    virtual gimple *fold (gimple_folder &) const { return NULL; }
> @@ -661,6 +678,15 @@ function_instance::memory_scalar_type () const
>    return base->memory_scalar_type (*this);
>  }
> 
> +/* If the function addresses memory, return a vector mode whose
> +   GET_MODE_NUNITS is the number of elements addressed and whose
> +   GET_MODE_INNER is the mode of a single scalar memory element.  */
> +inline machine_mode
> +function_instance::memory_vector_mode () const
> +{
> +  return base->memory_vector_mode (*this);
> +}
> +
>  /* Return information about the function's mode suffix.  */
>  inline const mode_suffix_info &
>  function_instance::mode_suffix () const
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 4/6] arm: [MVE intrinsics] add load and store shapes
  2023-11-16 15:26 ` [PATCH 4/6] arm: [MVE intrinsics] add load and store shapes Christophe Lyon
@ 2023-11-16 16:49   ` Kyrylo Tkachov
  0 siblings, 0 replies; 15+ messages in thread
From: Kyrylo Tkachov @ 2023-11-16 16:49 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Sandiford, Richard Earnshaw



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@linaro.org>
> Sent: Thursday, November 16, 2023 3:26 PM
> To: gcc-patches@gcc.gnu.org; Richard Sandiford
> <Richard.Sandiford@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: Christophe Lyon <christophe.lyon@linaro.org>
> Subject: [PATCH 4/6] arm: [MVE intrinsics] add load and store shapes
> 
> This patch adds the load and store shape descriptions.

Ok.
Thanks,
Kyrill

> 
> 2023-11-16  Christophe Lyon  <christophe.lyon@linaro.org>
> 
> 	gcc/
> 	* config/arm/arm-mve-builtins-shapes.cc (load, store): New.
> 	* config/arm/arm-mve-builtins-shapes.h (load, store): New.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 67 +++++++++++++++++++++++
>  gcc/config/arm/arm-mve-builtins-shapes.h  |  2 +
>  2 files changed, 69 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
> index ce87ebcef30..fe983e7c736 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -1428,6 +1428,38 @@ struct inherent_def : public nonoverloaded_base
>  };
>  SHAPE (inherent)
> 
> +/* <T0>_t vfoo[_t0](<T0>_t const *)
> +
> +   Example: vld1q.
> +   int8x16_t [__arm_]vld1q[_s8](int8_t const *base)
> +   int8x16_t [__arm_]vld1q_z[_s8](int8_t const *base, mve_pred16_t p)  */
> +struct load_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group,
> +	 bool preserve_user_namespace) const override
> +  {
> +    b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
> +    build_all (b, "t0,al", group, MODE_none, preserve_user_namespace);
> +  }
> +
> +  /* Resolve a call based purely on a pointer argument.  */
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +    gcc_assert (r.mode_suffix_id == MODE_none);
> +
> +    unsigned int i, nargs;
> +    type_suffix_index type;
> +    if (!r.check_gp_argument (1, i, nargs)
> +	|| (type = r.infer_pointer_type (i)) == NUM_TYPE_SUFFIXES)
> +      return error_mark_node;
> +
> +    return r.resolve_to (r.mode_suffix_id, type);
> +  }
> +};
> +SHAPE (load)
> +
>  /* <T0>_t vfoo[_t0](<T0>_t)
>     <T0>_t vfoo_n_t0(<sT0>_t)
> 
> @@ -1477,6 +1509,41 @@ struct mvn_def : public overloaded_base<0>
>  };
>  SHAPE (mvn)
> 
> +/* void vfoo[_t0](<X>_t *, v<t0>[xN]_t)
> +
> +   where <X> might be tied to <t0> (for non-truncating stores) or might
> +   depend on the function base name (for truncating stores).
> +
> +   Example: vst1q.
> +   void [__arm_]vst1q[_s8](int8_t *base, int8x16_t value)
> +   void [__arm_]vst1q_p[_s8](int8_t *base, int8x16_t value, mve_pred16_t p)  */
> +struct store_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group,
> +	 bool preserve_user_namespace) const override
> +  {
> +    b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
> +    build_all (b, "_,as,v0", group, MODE_none, preserve_user_namespace);
> +  }
> +
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +    gcc_assert (r.mode_suffix_id == MODE_none);
> +
> +    unsigned int i, nargs;
> +    type_suffix_index type;
> +    if (!r.check_gp_argument (2, i, nargs)
> +	|| !r.require_pointer_type (0)
> +	|| (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
> +      return error_mark_node;
> +
> +    return r.resolve_to (r.mode_suffix_id, type);
> +  }
> +};
> +SHAPE (store)
> +
>  /* <T0>_t vfoo[_t0](<T0>_t, <T0>_t, <T0>_t)
> 
>     i.e. the standard shape for ternary operations that operate on
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
> index a93245321c9..aa9309dec7e 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -61,7 +61,9 @@ namespace arm_mve
>      extern const function_shape *const cmp;
>      extern const function_shape *const create;
>      extern const function_shape *const inherent;
> +    extern const function_shape *const load;
>      extern const function_shape *const mvn;
> +    extern const function_shape *const store;
>      extern const function_shape *const ternary;
>      extern const function_shape *const ternary_lshift;
>      extern const function_shape *const ternary_n;
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 6/6] arm: [MVE intrinsics] rework vld1q vst1q
  2023-11-16 15:26 ` [PATCH 6/6] arm: [MVE intrinsics] rework vld1q vst1q Christophe Lyon
@ 2023-11-16 16:49   ` Kyrylo Tkachov
  0 siblings, 0 replies; 15+ messages in thread
From: Kyrylo Tkachov @ 2023-11-16 16:49 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Sandiford, Richard Earnshaw



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@linaro.org>
> Sent: Thursday, November 16, 2023 3:26 PM
> To: gcc-patches@gcc.gnu.org; Richard Sandiford
> <Richard.Sandiford@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: Christophe Lyon <christophe.lyon@linaro.org>
> Subject: [PATCH 6/6] arm: [MVE intrinsics] rework vld1q vst1q
> 
> Implement vld1q, vst1q using the new MVE builtins framework.

Ok. Nice to see more MVE intrinsics getting the good treatment.
Thanks,
Kyrill

> 
> 2023-11-16  Christophe Lyon  <christophe.lyon@linaro.org>
> 
> 	gcc/
> 	* config/arm/arm-mve-builtins-base.cc (vld1_impl, vld1q)
> 	(vst1_impl, vst1q): New.
> 	* config/arm/arm-mve-builtins-base.def (vld1q, vst1q): New.
> 	* config/arm/arm-mve-builtins-base.h (vld1q, vst1q): New.
> 	* config/arm/arm_mve.h
> 	(vld1q): Delete.
> 	(vst1q): Delete.
> 	(vld1q_s8): Delete.
> 	(vld1q_s32): Delete.
> 	(vld1q_s16): Delete.
> 	(vld1q_u8): Delete.
> 	(vld1q_u32): Delete.
> 	(vld1q_u16): Delete.
> 	(vld1q_f32): Delete.
> 	(vld1q_f16): Delete.
> 	(vst1q_f32): Delete.
> 	(vst1q_f16): Delete.
> 	(vst1q_s8): Delete.
> 	(vst1q_s32): Delete.
> 	(vst1q_s16): Delete.
> 	(vst1q_u8): Delete.
> 	(vst1q_u32): Delete.
> 	(vst1q_u16): Delete.
> 	(__arm_vld1q_s8): Delete.
> 	(__arm_vld1q_s32): Delete.
> 	(__arm_vld1q_s16): Delete.
> 	(__arm_vld1q_u8): Delete.
> 	(__arm_vld1q_u32): Delete.
> 	(__arm_vld1q_u16): Delete.
> 	(__arm_vst1q_s8): Delete.
> 	(__arm_vst1q_s32): Delete.
> 	(__arm_vst1q_s16): Delete.
> 	(__arm_vst1q_u8): Delete.
> 	(__arm_vst1q_u32): Delete.
> 	(__arm_vst1q_u16): Delete.
> 	(__arm_vld1q_f32): Delete.
> 	(__arm_vld1q_f16): Delete.
> 	(__arm_vst1q_f32): Delete.
> 	(__arm_vst1q_f16): Delete.
> 	(__arm_vld1q): Delete.
> 	(__arm_vst1q): Delete.
> 	* config/arm/mve.md (mve_vld1q_f<mode>): Rename into ...
> 	(@mve_vld1q_f<mode>): ... this.
> 	(mve_vld1q_<supf><mode>): Rename into ...
> 	(@mve_vld1q_<supf><mode>) ... this.
> 	(mve_vst1q_f<mode>): Rename into ...
> 	(@mve_vst1q_f<mode>): ... this.
> 	(mve_vst1q_<supf><mode>): Rename into ...
> 	(@mve_vst1q_<supf><mode>) ... this.
> ---
>  gcc/config/arm/arm-mve-builtins-base.cc  |  58 +++++
>  gcc/config/arm/arm-mve-builtins-base.def |   4 +
>  gcc/config/arm/arm-mve-builtins-base.h   |   4 +-
>  gcc/config/arm/arm_mve.h                 | 282 -----------------------
>  gcc/config/arm/mve.md                    |   8 +-
>  5 files changed, 69 insertions(+), 287 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
> index 5478cac8aeb..cfe1b954a29 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.cc
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -83,6 +83,62 @@ class vuninitializedq_impl : public quiet<function_base>
>    }
>  };
> 
> +class vld1_impl : public full_width_access
> +{
> +public:
> +  unsigned int
> +  call_properties (const function_instance &) const override
> +  {
> +    return CP_READ_MEMORY;
> +  }
> +
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    insn_code icode;
> +    if (e.type_suffix (0).float_p)
> +      icode = code_for_mve_vld1q_f (e.vector_mode (0));
> +    else
> +      {
> +	if (e.type_suffix (0).unsigned_p)
> +	  icode = code_for_mve_vld1q (VLD1Q_U, e.vector_mode (0));
> +	else
> +	  icode = code_for_mve_vld1q (VLD1Q_S, e.vector_mode (0));
> +      }
> +    return e.use_contiguous_load_insn (icode);
> +  }
> +};
> +
> +class vst1_impl : public full_width_access
> +{
> +public:
> +  unsigned int
> +  call_properties (const function_instance &) const override
> +  {
> +    return CP_WRITE_MEMORY;
> +  }
> +
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    insn_code icode;
> +    if (e.type_suffix (0).float_p)
> +      icode = code_for_mve_vst1q_f (e.vector_mode (0));
> +    else
> +      {
> +	if (e.type_suffix (0).unsigned_p)
> +	  icode = code_for_mve_vst1q (VST1Q_U, e.vector_mode (0));
> +	else
> +	  icode = code_for_mve_vst1q (VST1Q_S, e.vector_mode (0));
> +      }
> +    return e.use_contiguous_store_insn (icode);
> +  }
> +};
> +
>  } /* end anonymous namespace */
> 
>  namespace arm_mve {
> @@ -290,6 +346,7 @@ FUNCTION (vfmasq, unspec_mve_function_exact_insn, (-1, -1, -1, -1, -1, VFMASQ_N_
>  FUNCTION (vfmsq, unspec_mve_function_exact_insn, (-1, -1, VFMSQ_F, -1, -1, -1, -1, -1, VFMSQ_M_F, -1, -1, -1))
>  FUNCTION_WITH_M_N_NO_F (vhaddq, VHADDQ)
>  FUNCTION_WITH_M_N_NO_F (vhsubq, VHSUBQ)
> +FUNCTION (vld1q, vld1_impl,)
>  FUNCTION_PRED_P_S (vmaxavq, VMAXAVQ)
>  FUNCTION_WITHOUT_N_NO_U_F (vmaxaq, VMAXAQ)
>  FUNCTION_ONLY_F (vmaxnmaq, VMAXNMAQ)
> @@ -405,6 +462,7 @@ FUNCTION_ONLY_N_NO_F (vshrntq, VSHRNTQ)
>  FUNCTION_ONLY_N_NO_F (vshrq, VSHRQ)
>  FUNCTION_ONLY_N_NO_F (vsliq, VSLIQ)
>  FUNCTION_ONLY_N_NO_F (vsriq, VSRIQ)
> +FUNCTION (vst1q, vst1_impl,)
>  FUNCTION_WITH_RTX_M_N (vsubq, MINUS, VSUBQ)
>  FUNCTION (vuninitializedq, vuninitializedq_impl,)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
> index 01dfbdef8a3..16879246237 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.def
> +++ b/gcc/config/arm/arm-mve-builtins-base.def
> @@ -47,6 +47,7 @@ DEF_MVE_FUNCTION (vhaddq, binary_opt_n, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vhcaddq_rot90, binary, all_signed, mx_or_none)
>  DEF_MVE_FUNCTION (vhcaddq_rot270, binary, all_signed, mx_or_none)
>  DEF_MVE_FUNCTION (vhsubq, binary_opt_n, all_integer, mx_or_none)
> +DEF_MVE_FUNCTION (vld1q, load, all_integer, none)
>  DEF_MVE_FUNCTION (vmaxaq, binary_maxamina, all_signed, m_or_none)
>  DEF_MVE_FUNCTION (vmaxavq, binary_maxavminav, all_signed, p_or_none)
>  DEF_MVE_FUNCTION (vmaxq, binary, all_integer, mx_or_none)
> @@ -150,6 +151,7 @@ DEF_MVE_FUNCTION (vshrntq, binary_rshift_narrow, integer_16_32, m_or_none)
>  DEF_MVE_FUNCTION (vshrq, binary_rshift, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vsliq, ternary_lshift, all_integer, m_or_none)
>  DEF_MVE_FUNCTION (vsriq, ternary_rshift, all_integer, m_or_none)
> +DEF_MVE_FUNCTION (vst1q, store, all_integer, none)
>  DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vuninitializedq, inherent, all_integer_with_64, none)
>  #undef REQUIRES_FLOAT
> @@ -182,6 +184,7 @@ DEF_MVE_FUNCTION (veorq, binary, all_float, mx_or_none)
>  DEF_MVE_FUNCTION (vfmaq, ternary_opt_n, all_float, m_or_none)
>  DEF_MVE_FUNCTION (vfmasq, ternary_n, all_float, m_or_none)
>  DEF_MVE_FUNCTION (vfmsq, ternary, all_float, m_or_none)
> +DEF_MVE_FUNCTION (vld1q, load, all_float, none)
>  DEF_MVE_FUNCTION (vmaxnmaq, binary, all_float, m_or_none)
>  DEF_MVE_FUNCTION (vmaxnmavq, binary_maxvminv, all_float, p_or_none)
>  DEF_MVE_FUNCTION (vmaxnmq, binary, all_float, mx_or_none)
> @@ -203,6 +206,7 @@ DEF_MVE_FUNCTION (vrndnq, unary, all_float, mx_or_none)
>  DEF_MVE_FUNCTION (vrndpq, unary, all_float, mx_or_none)
>  DEF_MVE_FUNCTION (vrndq, unary, all_float, mx_or_none)
>  DEF_MVE_FUNCTION (vrndxq, unary, all_float, mx_or_none)
> +DEF_MVE_FUNCTION (vst1q, store, all_float, none)
>  DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_float, mx_or_none)
>  DEF_MVE_FUNCTION (vuninitializedq, inherent, all_float, none)
>  #undef REQUIRES_FLOAT
> diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
> index c574c32ac53..8c7e5fe5c3e 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.h
> +++ b/gcc/config/arm/arm-mve-builtins-base.h
> @@ -63,6 +63,7 @@ extern const function_base *const vhaddq;
>  extern const function_base *const vhcaddq_rot270;
>  extern const function_base *const vhcaddq_rot90;
>  extern const function_base *const vhsubq;
> +extern const function_base *const vld1q;
>  extern const function_base *const vmaxaq;
>  extern const function_base *const vmaxavq;
>  extern const function_base *const vmaxnmaq;
> @@ -103,8 +104,8 @@ extern const function_base *const vmovnbq;
>  extern const function_base *const vmovntq;
>  extern const function_base *const vmulhq;
>  extern const function_base *const vmullbq_int;
> -extern const function_base *const vmulltq_int;
>  extern const function_base *const vmullbq_poly;
> +extern const function_base *const vmulltq_int;
>  extern const function_base *const vmulltq_poly;
>  extern const function_base *const vmulq;
>  extern const function_base *const vmvnq;
> @@ -178,6 +179,7 @@ extern const function_base *const vshrntq;
>  extern const function_base *const vshrq;
>  extern const function_base *const vsliq;
>  extern const function_base *const vsriq;
> +extern const function_base *const vst1q;
>  extern const function_base *const vsubq;
>  extern const function_base *const vuninitializedq;
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index b82d94e59bd..cc027f9cbb5 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -56,7 +56,6 @@
>  #define vstrbq_scatter_offset_p(__base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p(__base, __offset, __value, __p)
>  #define vstrwq_scatter_base_p(__addr, __offset, __value, __p) __arm_vstrwq_scatter_base_p(__addr, __offset, __value, __p)
>  #define vldrbq_gather_offset_z(__base, __offset, __p) __arm_vldrbq_gather_offset_z(__base, __offset, __p)
> -#define vld1q(__base) __arm_vld1q(__base)
>  #define vldrhq_gather_offset(__base, __offset) __arm_vldrhq_gather_offset(__base, __offset)
>  #define vldrhq_gather_offset_z(__base, __offset, __p) __arm_vldrhq_gather_offset_z(__base, __offset, __p)
>  #define vldrhq_gather_shifted_offset(__base, __offset) __arm_vldrhq_gather_shifted_offset(__base, __offset)
> @@ -69,7 +68,6 @@
>  #define vldrwq_gather_offset_z(__base, __offset, __p) __arm_vldrwq_gather_offset_z(__base, __offset, __p)
>  #define vldrwq_gather_shifted_offset(__base, __offset) __arm_vldrwq_gather_shifted_offset(__base, __offset)
>  #define vldrwq_gather_shifted_offset_z(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z(__base, __offset, __p)
> -#define vst1q(__addr, __value) __arm_vst1q(__addr, __value)
>  #define vstrhq_scatter_offset(__base, __offset, __value) __arm_vstrhq_scatter_offset(__base, __offset, __value)
>  #define vstrhq_scatter_offset_p(__base, __offset, __value, __p) __arm_vstrhq_scatter_offset_p(__base, __offset, __value, __p)
>  #define vstrhq_scatter_shifted_offset(__base, __offset, __value) __arm_vstrhq_scatter_shifted_offset(__base, __offset, __value)
> @@ -346,12 +344,6 @@
>  #define vldrbq_z_u32(__base, __p) __arm_vldrbq_z_u32(__base, __p)
>  #define vldrwq_gather_base_z_u32(__addr,  __offset, __p) __arm_vldrwq_gather_base_z_u32(__addr,  __offset, __p)
>  #define vldrwq_gather_base_z_s32(__addr,  __offset, __p) __arm_vldrwq_gather_base_z_s32(__addr,  __offset, __p)
> -#define vld1q_s8(__base) __arm_vld1q_s8(__base)
> -#define vld1q_s32(__base) __arm_vld1q_s32(__base)
> -#define vld1q_s16(__base) __arm_vld1q_s16(__base)
> -#define vld1q_u8(__base) __arm_vld1q_u8(__base)
> -#define vld1q_u32(__base) __arm_vld1q_u32(__base)
> -#define vld1q_u16(__base) __arm_vld1q_u16(__base)
>  #define vldrhq_gather_offset_s32(__base, __offset) __arm_vldrhq_gather_offset_s32(__base, __offset)
>  #define vldrhq_gather_offset_s16(__base, __offset) __arm_vldrhq_gather_offset_s16(__base, __offset)
>  #define vldrhq_gather_offset_u32(__base, __offset) __arm_vldrhq_gather_offset_u32(__base, __offset)
> @@ -380,8 +372,6 @@
>  #define vldrwq_u32(__base) __arm_vldrwq_u32(__base)
>  #define vldrwq_z_s32(__base, __p) __arm_vldrwq_z_s32(__base, __p)
>  #define vldrwq_z_u32(__base, __p) __arm_vldrwq_z_u32(__base, __p)
> -#define vld1q_f32(__base) __arm_vld1q_f32(__base)
> -#define vld1q_f16(__base) __arm_vld1q_f16(__base)
>  #define vldrhq_f16(__base) __arm_vldrhq_f16(__base)
>  #define vldrhq_z_f16(__base, __p) __arm_vldrhq_z_f16(__base, __p)
>  #define vldrwq_f32(__base) __arm_vldrwq_f32(__base)
> @@ -416,14 +406,6 @@
>  #define vldrwq_gather_shifted_offset_z_f32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_f32(__base, __offset, __p)
>  #define vldrwq_gather_shifted_offset_z_s32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_s32(__base, __offset, __p)
>  #define vldrwq_gather_shifted_offset_z_u32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_u32(__base, __offset, __p)
> -#define vst1q_f32(__addr, __value) __arm_vst1q_f32(__addr, __value)
> -#define vst1q_f16(__addr, __value) __arm_vst1q_f16(__addr, __value)
> -#define vst1q_s8(__addr, __value) __arm_vst1q_s8(__addr, __value)
> -#define vst1q_s32(__addr, __value) __arm_vst1q_s32(__addr, __value)
> -#define vst1q_s16(__addr, __value) __arm_vst1q_s16(__addr, __value)
> -#define vst1q_u8(__addr, __value) __arm_vst1q_u8(__addr, __value)
> -#define vst1q_u32(__addr, __value) __arm_vst1q_u32(__addr, __value)
> -#define vst1q_u16(__addr, __value) __arm_vst1q_u16(__addr, __value)
>  #define vstrhq_f16(__addr, __value) __arm_vstrhq_f16(__addr, __value)
>  #define vstrhq_scatter_offset_s32( __base, __offset, __value) __arm_vstrhq_scatter_offset_s32( __base, __offset, __value)
>  #define vstrhq_scatter_offset_s16( __base, __offset, __value) __arm_vstrhq_scatter_offset_s16( __base, __offset, __value)
> @@ -1537,48 +1519,6 @@ __arm_vldrwq_gather_base_z_u32 (uint32x4_t __addr, const int __offset, mve_pred1
>    return __builtin_mve_vldrwq_gather_base_z_uv4si (__addr, __offset, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q_s8 (int8_t const * __base)
> -{
> -  return __builtin_mve_vld1q_sv16qi ((__builtin_neon_qi *) __base);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q_s32 (int32_t const * __base)
> -{
> -  return __builtin_mve_vld1q_sv4si ((__builtin_neon_si *) __base);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q_s16 (int16_t const * __base)
> -{
> -  return __builtin_mve_vld1q_sv8hi ((__builtin_neon_hi *) __base);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q_u8 (uint8_t const * __base)
> -{
> -  return __builtin_mve_vld1q_uv16qi ((__builtin_neon_qi *) __base);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q_u32 (uint32_t const * __base)
> -{
> -  return __builtin_mve_vld1q_uv4si ((__builtin_neon_si *) __base);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q_u16 (uint16_t const * __base)
> -{
> -  return __builtin_mve_vld1q_uv8hi ((__builtin_neon_hi *) __base);
> -}
> -
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vldrhq_gather_offset_s32 (int16_t const * __base, uint32x4_t __offset)
> @@ -1917,48 +1857,6 @@ __arm_vldrwq_gather_shifted_offset_z_u32 (uint32_t const * __base, uint32x4_t __
>    return __builtin_mve_vldrwq_gather_shifted_offset_z_uv4si ((__builtin_neon_si *) __base, __offset, __p);
>  }
> 
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q_s8 (int8_t * __addr, int8x16_t __value)
> -{
> -  __builtin_mve_vst1q_sv16qi ((__builtin_neon_qi *) __addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q_s32 (int32_t * __addr, int32x4_t __value)
> -{
> -  __builtin_mve_vst1q_sv4si ((__builtin_neon_si *) __addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q_s16 (int16_t * __addr, int16x8_t __value)
> -{
> -  __builtin_mve_vst1q_sv8hi ((__builtin_neon_hi *) __addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q_u8 (uint8_t * __addr, uint8x16_t __value)
> -{
> -  __builtin_mve_vst1q_uv16qi ((__builtin_neon_qi *) __addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q_u32 (uint32_t * __addr, uint32x4_t __value)
> -{
> -  __builtin_mve_vst1q_uv4si ((__builtin_neon_si *) __addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q_u16 (uint16_t * __addr, uint16x8_t __value)
> -{
> -  __builtin_mve_vst1q_uv8hi ((__builtin_neon_hi *) __addr, __value);
> -}
> -
>  __extension__ extern __inline void
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vstrhq_scatter_offset_s32 (int16_t * __base, uint32x4_t __offset, int32x4_t __value)
> @@ -4421,20 +4319,6 @@ __arm_vornq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve
>    return __builtin_mve_vornq_m_fv8hf (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q_f32 (float32_t const * __base)
> -{
> -  return __builtin_mve_vld1q_fv4sf((__builtin_neon_si *) __base);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q_f16 (float16_t const * __base)
> -{
> -  return __builtin_mve_vld1q_fv8hf((__builtin_neon_hi *) __base);
> -}
> -
>  __extension__ extern __inline float32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vldrwq_f32 (float32_t const * __base)
> @@ -4547,20 +4431,6 @@ __arm_vstrwq_f32 (float32_t * __addr, float32x4_t __value)
>    __builtin_mve_vstrwq_fv4sf ((__builtin_neon_si *) __addr, __value);
>  }
> 
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q_f32 (float32_t * __addr, float32x4_t __value)
> -{
> -  __builtin_mve_vst1q_fv4sf ((__builtin_neon_si *) __addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q_f16 (float16_t * __addr, float16x8_t __value)
> -{
> -  __builtin_mve_vst1q_fv8hf ((__builtin_neon_hi *) __addr, __value);
> -}
> -
>  __extension__ extern __inline void
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vstrhq_f16 (float16_t * __addr, float16x8_t __value)
> @@ -5651,48 +5521,6 @@ __arm_vldrbq_gather_offset_z (uint8_t const * __base, uint16x8_t __offset, mve_p
>   return __arm_vldrbq_gather_offset_z_u16 (__base, __offset, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q (int8_t const * __base)
> -{
> - return __arm_vld1q_s8 (__base);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q (int32_t const * __base)
> -{
> - return __arm_vld1q_s32 (__base);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q (int16_t const * __base)
> -{
> - return __arm_vld1q_s16 (__base);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q (uint8_t const * __base)
> -{
> - return __arm_vld1q_u8 (__base);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q (uint32_t const * __base)
> -{
> - return __arm_vld1q_u32 (__base);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q (uint16_t const * __base)
> -{
> - return __arm_vld1q_u16 (__base);
> -}
> -
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vldrhq_gather_offset (int16_t const * __base, uint32x4_t __offset)
> @@ -5917,48 +5745,6 @@ __arm_vldrwq_gather_shifted_offset_z (uint32_t const * __base, uint32x4_t __offs
>   return __arm_vldrwq_gather_shifted_offset_z_u32 (__base, __offset, __p);
>  }
> 
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q (int8_t * __addr, int8x16_t __value)
> -{
> - __arm_vst1q_s8 (__addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q (int32_t * __addr, int32x4_t __value)
> -{
> - __arm_vst1q_s32 (__addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q (int16_t * __addr, int16x8_t __value)
> -{
> - __arm_vst1q_s16 (__addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q (uint8_t * __addr, uint8x16_t __value)
> -{
> - __arm_vst1q_u8 (__addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q (uint32_t * __addr, uint32x4_t __value)
> -{
> - __arm_vst1q_u32 (__addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q (uint16_t * __addr, uint16x8_t __value)
> -{
> - __arm_vst1q_u16 (__addr, __value);
> -}
> -
>  __extension__ extern __inline void
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vstrhq_scatter_offset (int16_t * __base, uint32x4_t __offset, int32x4_t __value)
> @@ -7809,20 +7595,6 @@ __arm_vornq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pre
>   return __arm_vornq_m_f16 (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q (float32_t const * __base)
> -{
> - return __arm_vld1q_f32 (__base);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q (float16_t const * __base)
> -{
> - return __arm_vld1q_f16 (__base);
> -}
> -
>  __extension__ extern __inline float16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vldrhq_gather_offset (float16_t const * __base, uint16x8_t __offset)
> @@ -7893,20 +7665,6 @@ __arm_vstrwq (float32_t * __addr, float32x4_t __value)
>   __arm_vstrwq_f32 (__addr, __value);
>  }
> 
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q (float32_t * __addr, float32x4_t __value)
> -{
> - __arm_vst1q_f32 (__addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q (float16_t * __addr, float16x8_t __value)
> -{
> - __arm_vst1q_f16 (__addr, __value);
> -}
> -
>  __extension__ extern __inline void
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vstrhq (float16_t * __addr, float16x8_t __value)
> @@ -8670,17 +8428,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vornq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vornq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
> 
> -#define __arm_vld1q(p0) (\
> -  _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
> -  int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *)), \
> -  int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld1q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *)), \
> -  int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld1q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *)), \
> -  int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld1q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *)), \
> -  int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld1q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *)), \
> -  int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld1q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *)), \
> -  int (*)[__ARM_mve_type_float16_t_ptr]: __arm_vld1q_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *)), \
> -  int (*)[__ARM_mve_type_float32_t_ptr]: __arm_vld1q_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *))))
> -
>  #define __arm_vld1q_z(p0,p1) ( \
>    _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
>    int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_z_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), p1), \
> @@ -8792,17 +8539,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8x2_t]: __arm_vst2q_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *), __ARM_mve_coerce(__p1, float16x8x2_t)), \
>    int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4x2_t]: __arm_vst2q_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *), __ARM_mve_coerce(__p1, float32x4x2_t)));})
> 
> -#define __arm_vst1q(p0,p1) ({ __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vst1q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vst1q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vst1q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vst1q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vst1q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vst1q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8_t]: __arm_vst1q_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *), __ARM_mve_coerce(__p1, float16x8_t)), \
> -  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: __arm_vst1q_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *), __ARM_mve_coerce(__p1, float32x4_t)));})
> -
>  #define __arm_vstrhq(p0,p1) ({ __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
>    int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vstrhq_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t)), \
> @@ -9149,15 +8885,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_p_s32 (p0, p1, __ARM_mve_coerce(__p2, int32x4_t), p3), \
>    int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_p_u32 (p0, p1, __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> 
> -#define __arm_vld1q(p0) (\
> -  _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
> -  int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *)), \
> -  int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld1q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *)), \
> -  int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld1q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *)), \
> -  int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld1q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *)), \
> -  int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld1q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *)), \
> -  int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld1q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *))))
> -
>  #define __arm_vldrhq_gather_offset(p0,p1) ({ __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
>    int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_offset_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
> @@ -9206,15 +8933,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vldrwq_gather_shifted_offset_z_s32 (__ARM_mve_coerce_s32_ptr(__p0, int32_t *), p1, p2), \
>    int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vldrwq_gather_shifted_offset_z_u32 (__ARM_mve_coerce_u32_ptr(__p0, uint32_t *), p1, p2));})
> 
> -#define __arm_vst1q(p0,p1) ({ __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vst1q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vst1q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vst1q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vst1q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vst1q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vst1q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
>  #define __arm_vst1q_p(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
>    int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vst1q_p_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t), p2), \
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 366cec0812a..b0d3443da9c 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -3690,7 +3690,7 @@ (define_insn "mve_vldrwq_z_<supf>v4si"
>  }
>    [(set_attr "length" "8")])
> 
> -(define_expand "mve_vld1q_f<mode>"
> +(define_expand "@mve_vld1q_f<mode>"
>    [(match_operand:MVE_0 0 "s_register_operand")
>     (unspec:MVE_0 [(match_operand:<MVE_CNVT> 1 "mve_memory_operand")] VLD1Q_F)
>    ]
> @@ -3700,7 +3700,7 @@ (define_expand "mve_vld1q_f<mode>"
>    DONE;
>  })
> 
> -(define_expand "mve_vld1q_<supf><mode>"
> +(define_expand "@mve_vld1q_<supf><mode>"
>    [(match_operand:MVE_2 0 "s_register_operand")
>     (unspec:MVE_2 [(match_operand:MVE_2 1 "mve_memory_operand")] VLD1Q)
>    ]
> @@ -4408,7 +4408,7 @@ (define_insn "mve_vstrwq_<supf>v4si"
>  }
>    [(set_attr "length" "4")])
> 
> -(define_expand "mve_vst1q_f<mode>"
> +(define_expand "@mve_vst1q_f<mode>"
>    [(match_operand:<MVE_CNVT> 0 "mve_memory_operand")
>     (unspec:<MVE_CNVT> [(match_operand:MVE_0 1 "s_register_operand")] VST1Q_F)
>    ]
> @@ -4418,7 +4418,7 @@ (define_expand "mve_vst1q_f<mode>"
>    DONE;
>  })
> 
> -(define_expand "mve_vst1q_<supf><mode>"
> +(define_expand "@mve_vst1q_<supf><mode>"
>    [(match_operand:MVE_2 0 "mve_memory_operand")
>     (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand")] VST1Q)
>    ]
> --
> 2.34.1
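[Editorial note: the `__arm_vld1q`/`__arm_vst1q` macros removed by this patch overload the intrinsic on the argument's pointer type via C11 `_Generic` (arm_mve.h additionally routes the selection through its `__ARM_mve_typeid` machinery). A minimal standalone sketch of the underlying dispatch technique, with hypothetical `load_*` names rather than the real intrinsics:]

```c
#include <assert.h>
#include <stdint.h>

/* Each type-suffixed "intrinsic" returns its element width so the
   dispatch result is observable.  */
static int load_s8 (const int8_t *p)   { (void) p; return 8; }
static int load_s32 (const int32_t *p) { (void) p; return 32; }

/* _Generic selects the callee from the static type of P, mimicking how
   the removed __arm_vld1q macro picked __arm_vld1q_s8/_s32/... from the
   pointer type of its argument.  */
#define load(p)                        \
  _Generic ((p),                       \
            int8_t *: load_s8,         \
            const int8_t *: load_s8,   \
            int32_t *: load_s32,       \
            const int32_t *: load_s32) (p)

int dispatch_width_s8 (void)  { int8_t b[16] = { 0 };  return load (b); }
int dispatch_width_s32 (void) { int32_t w[4] = { 0 };  return load (w); }
```

After the rework in this series, the same overloading is resolved by the C++ MVE intrinsics framework instead of these macros.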


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores
  2023-11-16 15:26 ` [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores Christophe Lyon
  2023-11-16 16:48   ` Kyrylo Tkachov
@ 2023-11-23 13:29   ` Jan-Benedict Glaw
  2023-11-23 15:57     ` Christophe Lyon
  1 sibling, 1 reply; 15+ messages in thread
From: Jan-Benedict Glaw @ 2023-11-23 13:29 UTC (permalink / raw)
  To: Christophe Lyon
  Cc: gcc-patches, richard.sandiford, richard.earnshaw, kyrylo.tkachov


On Thu, 2023-11-16 15:26:14 +0000, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> diff --git a/gcc/config/arm/arm-mve-builtins-functions.h b/gcc/config/arm/arm-mve-builtins-functions.h
> index eba1f071af0..6d234a2dd7c 100644
> --- a/gcc/config/arm/arm-mve-builtins-functions.h
> +++ b/gcc/config/arm/arm-mve-builtins-functions.h
> @@ -966,6 +966,62 @@ public:
[...]

> +class full_width_access : public multi_vector_function
> +{
> +public:
> +  CONSTEXPR full_width_access (unsigned int vectors_per_tuple = 1)
> +    : multi_vector_function (vectors_per_tuple) {}
> +
> +  tree
> +  memory_scalar_type (const function_instance &fi) const override
> +  {
> +    return fi.scalar_type (0);
> +  }
> +
> +  machine_mode
> +  memory_vector_mode (const function_instance &fi) const override
> +  {
> +    machine_mode mode = fi.vector_mode (0);
> +    /* Vectors of floating-point are managed in memory as vectors of
> +       integers.  */
> +    switch (mode)
> +      {
> +      case E_V4SFmode:
> +	mode = E_V4SImode;
> +	break;
> +      case E_V8HFmode:
> +	mode = E_V8HImode;
> +	break;
> +      }

This introduces warnings about many enum values not being handled, so
a default would be good I think. (I do automated builds with
--enable-werror-always, see eg.
http://toolchain.lug-owl.de/laminar/log/gcc-arm-eabi/48)

MfG, JBG

-- 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores
  2023-11-23 13:29   ` Jan-Benedict Glaw
@ 2023-11-23 15:57     ` Christophe Lyon
  0 siblings, 0 replies; 15+ messages in thread
From: Christophe Lyon @ 2023-11-23 15:57 UTC (permalink / raw)
  To: Jan-Benedict Glaw
  Cc: gcc-patches, richard.sandiford, richard.earnshaw, kyrylo.tkachov

Hi!

On Thu, 23 Nov 2023 at 14:29, Jan-Benedict Glaw <jbglaw@lug-owl.de> wrote:
>
> On Thu, 2023-11-16 15:26:14 +0000, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> > diff --git a/gcc/config/arm/arm-mve-builtins-functions.h b/gcc/config/arm/arm-mve-builtins-functions.h
> > index eba1f071af0..6d234a2dd7c 100644
> > --- a/gcc/config/arm/arm-mve-builtins-functions.h
> > +++ b/gcc/config/arm/arm-mve-builtins-functions.h
> > @@ -966,6 +966,62 @@ public:
> [...]
>
> > +class full_width_access : public multi_vector_function
> > +{
> > +public:
> > +  CONSTEXPR full_width_access (unsigned int vectors_per_tuple = 1)
> > +    : multi_vector_function (vectors_per_tuple) {}
> > +
> > +  tree
> > +  memory_scalar_type (const function_instance &fi) const override
> > +  {
> > +    return fi.scalar_type (0);
> > +  }
> > +
> > +  machine_mode
> > +  memory_vector_mode (const function_instance &fi) const override
> > +  {
> > +    machine_mode mode = fi.vector_mode (0);
> > +    /* Vectors of floating-point are managed in memory as vectors of
> > +       integers.  */
> > +    switch (mode)
> > +      {
> > +      case E_V4SFmode:
> > +     mode = E_V4SImode;
> > +     break;
> > +      case E_V8HFmode:
> > +     mode = E_V8HImode;
> > +     break;
> > +      }
>
> This introduces warnings about many enum values not being handled, so
> a default would be good I think. (I do automated builds with
> --enable-werror-always, see eg.
> http://toolchain.lug-owl.de/laminar/log/gcc-arm-eabi/48)
>

Ha right, thanks for catching this.

Fixed by commit b9dbdefac626ba20222ca534b58f7e493d713b9a

Christophe

> MfG, JBG
>
> --
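[Editorial note: the warning reported above comes from `-Wswitch`, which fires when a `switch` over an enumerated type lists only some enumerators and has no `default` arm. A minimal standalone illustration of the usual fix, using a stand-in enum rather than GCC's real `machine_mode`; the actual commit referenced above may differ in detail:]

```c
#include <assert.h>

/* Stand-in for a few machine_mode enumerators; illustrative only.  */
enum mode { MODE_V4SF, MODE_V8HF, MODE_V4SI, MODE_V8HI, MODE_OTHER };

/* Map float vector modes to the equivalent integer vector mode, as in
   memory_vector_mode; the default arm leaves every other mode unchanged
   and keeps -Wswitch (and so --enable-werror-always builds) quiet.  */
static enum mode
memory_mode (enum mode m)
{
  switch (m)
    {
    case MODE_V4SF:
      m = MODE_V4SI;
      break;
    case MODE_V8HF:
      m = MODE_V8HI;
      break;
    default:
      /* All other modes are used in memory as-is.  */
      break;
    }
  return m;
}
```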

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-11-23 15:58 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-16 15:26 [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Christophe Lyon
2023-11-16 15:26 ` [PATCH 2/6] arm: [MVE intrinsics] Add support for void and load/store pointers as argument types Christophe Lyon
2023-11-16 16:47   ` Kyrylo Tkachov
2023-11-16 15:26 ` [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores Christophe Lyon
2023-11-16 16:48   ` Kyrylo Tkachov
2023-11-23 13:29   ` Jan-Benedict Glaw
2023-11-23 15:57     ` Christophe Lyon
2023-11-16 15:26 ` [PATCH 4/6] arm: [MVE intrinsics] add load and store shapes Christophe Lyon
2023-11-16 16:49   ` Kyrylo Tkachov
2023-11-16 15:26 ` [PATCH 5/6] arm: [MVE intrinsics] fix vst1 tests Christophe Lyon
2023-11-16 15:30   ` Kyrylo Tkachov
2023-11-16 15:37     ` Christophe Lyon
2023-11-16 15:26 ` [PATCH 6/6] arm: [MVE intrinsics] rework vldq1 vst1q Christophe Lyon
2023-11-16 16:49   ` Kyrylo Tkachov
2023-11-16 16:46 ` [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Kyrylo Tkachov
