* [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types
@ 2023-11-16 15:26 Christophe Lyon
2023-11-16 15:26 ` [PATCH 2/6] arm: [MVE intrinsics] Add support for void and load/store pointers as argument types Christophe Lyon
` (5 more replies)
0 siblings, 6 replies; 15+ messages in thread
From: Christophe Lyon @ 2023-11-16 15:26 UTC (permalink / raw)
To: gcc-patches, richard.sandiford, richard.earnshaw, kyrylo.tkachov
Cc: Christophe Lyon
So far we define arm_simd_types and scalar_types using type nodes
such as intSI_type_node. This causes problems with later patches,
which re-implement the load/store MVE intrinsics, leading to error
messages such as:
error: passing argument 1 of 'vst1q_s32' from incompatible pointer type
note: expected 'int *' but argument is of type 'int32_t *' {aka 'long int *'}
This patch uses get_typenode_from_name (INT32_TYPE) instead, which
defines the types as appropriate for the target/C library.
2023-11-16 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm-builtins.cc (arm_init_simd_builtin_types): Fix
initialization of arm_simd_types[].eltype.
* config/arm/arm-mve-builtins.def (DEF_MVE_TYPE): Fix scalar
types.
---
gcc/config/arm/arm-builtins.cc | 28 ++++++++++++++--------------
gcc/config/arm/arm-mve-builtins.def | 16 ++++++++--------
2 files changed, 22 insertions(+), 22 deletions(-)
diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index fca7dcaf565..dd9c5815c45 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -1580,20 +1580,20 @@ arm_init_simd_builtin_types (void)
TYPE_STRING_FLAG (arm_simd_polyHI_type_node) = false;
}
/* Init all the element types built by the front-end. */
- arm_simd_types[Int8x8_t].eltype = intQI_type_node;
- arm_simd_types[Int8x16_t].eltype = intQI_type_node;
- arm_simd_types[Int16x4_t].eltype = intHI_type_node;
- arm_simd_types[Int16x8_t].eltype = intHI_type_node;
- arm_simd_types[Int32x2_t].eltype = intSI_type_node;
- arm_simd_types[Int32x4_t].eltype = intSI_type_node;
- arm_simd_types[Int64x2_t].eltype = intDI_type_node;
- arm_simd_types[Uint8x8_t].eltype = unsigned_intQI_type_node;
- arm_simd_types[Uint8x16_t].eltype = unsigned_intQI_type_node;
- arm_simd_types[Uint16x4_t].eltype = unsigned_intHI_type_node;
- arm_simd_types[Uint16x8_t].eltype = unsigned_intHI_type_node;
- arm_simd_types[Uint32x2_t].eltype = unsigned_intSI_type_node;
- arm_simd_types[Uint32x4_t].eltype = unsigned_intSI_type_node;
- arm_simd_types[Uint64x2_t].eltype = unsigned_intDI_type_node;
+ arm_simd_types[Int8x8_t].eltype = get_typenode_from_name (INT8_TYPE);
+ arm_simd_types[Int8x16_t].eltype = get_typenode_from_name (INT8_TYPE);
+ arm_simd_types[Int16x4_t].eltype = get_typenode_from_name (INT16_TYPE);
+ arm_simd_types[Int16x8_t].eltype = get_typenode_from_name (INT16_TYPE);
+ arm_simd_types[Int32x2_t].eltype = get_typenode_from_name (INT32_TYPE);
+ arm_simd_types[Int32x4_t].eltype = get_typenode_from_name (INT32_TYPE);
+ arm_simd_types[Int64x2_t].eltype = get_typenode_from_name (INT64_TYPE);
+ arm_simd_types[Uint8x8_t].eltype = get_typenode_from_name (UINT8_TYPE);
+ arm_simd_types[Uint8x16_t].eltype = get_typenode_from_name (UINT8_TYPE);
+ arm_simd_types[Uint16x4_t].eltype = get_typenode_from_name (UINT16_TYPE);
+ arm_simd_types[Uint16x8_t].eltype = get_typenode_from_name (UINT16_TYPE);
+ arm_simd_types[Uint32x2_t].eltype = get_typenode_from_name (UINT32_TYPE);
+ arm_simd_types[Uint32x4_t].eltype = get_typenode_from_name (UINT32_TYPE);
+ arm_simd_types[Uint64x2_t].eltype = get_typenode_from_name (UINT64_TYPE);
/* Note: poly64x2_t is defined in arm_neon.h, to ensure it gets default
mangling. */
diff --git a/gcc/config/arm/arm-mve-builtins.def b/gcc/config/arm/arm-mve-builtins.def
index e2cf1baf370..a901d8231e9 100644
--- a/gcc/config/arm/arm-mve-builtins.def
+++ b/gcc/config/arm/arm-mve-builtins.def
@@ -39,14 +39,14 @@ DEF_MVE_MODE (r, none, none, none)
#define REQUIRES_FLOAT false
DEF_MVE_TYPE (mve_pred16_t, boolean_type_node)
-DEF_MVE_TYPE (uint8x16_t, unsigned_intQI_type_node)
-DEF_MVE_TYPE (uint16x8_t, unsigned_intHI_type_node)
-DEF_MVE_TYPE (uint32x4_t, unsigned_intSI_type_node)
-DEF_MVE_TYPE (uint64x2_t, unsigned_intDI_type_node)
-DEF_MVE_TYPE (int8x16_t, intQI_type_node)
-DEF_MVE_TYPE (int16x8_t, intHI_type_node)
-DEF_MVE_TYPE (int32x4_t, intSI_type_node)
-DEF_MVE_TYPE (int64x2_t, intDI_type_node)
+DEF_MVE_TYPE (uint8x16_t, get_typenode_from_name (UINT8_TYPE))
+DEF_MVE_TYPE (uint16x8_t, get_typenode_from_name (UINT16_TYPE))
+DEF_MVE_TYPE (uint32x4_t, get_typenode_from_name (UINT32_TYPE))
+DEF_MVE_TYPE (uint64x2_t, get_typenode_from_name (UINT64_TYPE))
+DEF_MVE_TYPE (int8x16_t, get_typenode_from_name (INT8_TYPE))
+DEF_MVE_TYPE (int16x8_t, get_typenode_from_name (INT16_TYPE))
+DEF_MVE_TYPE (int32x4_t, get_typenode_from_name (INT32_TYPE))
+DEF_MVE_TYPE (int64x2_t, get_typenode_from_name (INT64_TYPE))
#undef REQUIRES_FLOAT
#define REQUIRES_FLOAT true
--
2.34.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/6] arm: [MVE intrinsics] Add support for void and load/store pointers as argument types.
2023-11-16 15:26 [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Christophe Lyon
@ 2023-11-16 15:26 ` Christophe Lyon
2023-11-16 16:47 ` Kyrylo Tkachov
2023-11-16 15:26 ` [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores Christophe Lyon
` (4 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Christophe Lyon @ 2023-11-16 15:26 UTC (permalink / raw)
To: gcc-patches, richard.sandiford, richard.earnshaw, kyrylo.tkachov
Cc: Christophe Lyon
This patch adds support for '_', 'al' and 'as' for void, load pointer
and store pointer argument/return value types in intrinsic signatures.
It also adds a new memory_scalar_type() helper to function_instance,
which is used by 'al' and 'as'.
2023-11-16 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm-mve-builtins-shapes.cc (build_const_pointer):
New.
(parse_type): Add support for '_', 'al' and 'as'.
* config/arm/arm-mve-builtins.h (function_instance): Add
memory_scalar_type.
(function_base): Likewise.
---
gcc/config/arm/arm-mve-builtins-shapes.cc | 25 +++++++++++++++++++++++
gcc/config/arm/arm-mve-builtins.h | 17 +++++++++++++++
2 files changed, 42 insertions(+)
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 23eb9d0e69b..ce87ebcef30 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -39,6 +39,13 @@
namespace arm_mve {
+/* Return a representation of "const T *". */
+static tree
+build_const_pointer (tree t)
+{
+ return build_pointer_type (build_qualified_type (t, TYPE_QUAL_CONST));
+}
+
/* If INSTANCE has a predicate, add it to the list of argument types
in ARGUMENT_TYPES. RETURN_TYPE is the type returned by the
function. */
@@ -140,6 +147,9 @@ parse_element_type (const function_instance &instance, const char *&format)
/* Read and return a type from FORMAT for function INSTANCE. Advance
FORMAT beyond the type string. The format is:
+ _ - void
+ al - array pointer for loads
+ as - array pointer for stores
p - predicates with type mve_pred16_t
s<elt> - a scalar type with the given element suffix
t<elt> - a vector or tuple type with given element suffix [*1]
@@ -156,6 +166,21 @@ parse_type (const function_instance &instance, const char *&format)
{
int ch = *format++;
+
+ if (ch == '_')
+ return void_type_node;
+
+ if (ch == 'a')
+ {
+ ch = *format++;
+ if (ch == 'l')
+ return build_const_pointer (instance.memory_scalar_type ());
+ if (ch == 's')
+ return build_pointer_type (instance.memory_scalar_type ());
+ gcc_unreachable ();
+ }
+
if (ch == 'p')
return get_mve_pred16_t ();
diff --git a/gcc/config/arm/arm-mve-builtins.h b/gcc/config/arm/arm-mve-builtins.h
index 37b8223dfb2..4fd230fe4c7 100644
--- a/gcc/config/arm/arm-mve-builtins.h
+++ b/gcc/config/arm/arm-mve-builtins.h
@@ -277,6 +277,7 @@ public:
bool could_trap_p () const;
unsigned int vectors_per_tuple () const;
+ tree memory_scalar_type () const;
const mode_suffix_info &mode_suffix () const;
@@ -519,6 +520,14 @@ public:
of vectors in the tuples, otherwise return 1. */
virtual unsigned int vectors_per_tuple () const { return 1; }
+ /* If the function addresses memory, return the type of a single
+ scalar memory element. */
+ virtual tree
+ memory_scalar_type (const function_instance &) const
+ {
+ gcc_unreachable ();
+ }
+
/* Try to fold the given gimple call. Return the new gimple statement
on success, otherwise return null. */
virtual gimple *fold (gimple_folder &) const { return NULL; }
@@ -644,6 +653,14 @@ function_instance::vectors_per_tuple () const
return base->vectors_per_tuple ();
}
+/* If the function addresses memory, return the type of a single
+ scalar memory element. */
+inline tree
+function_instance::memory_scalar_type () const
+{
+ return base->memory_scalar_type (*this);
+}
+
/* Return information about the function's mode suffix. */
inline const mode_suffix_info &
function_instance::mode_suffix () const
--
2.34.1
* [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores
2023-11-16 15:26 [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Christophe Lyon
2023-11-16 15:26 ` [PATCH 2/6] arm: [MVE intrinsics] Add support for void and load/store pointers as argument types Christophe Lyon
@ 2023-11-16 15:26 ` Christophe Lyon
2023-11-16 16:48 ` Kyrylo Tkachov
2023-11-23 13:29 ` Jan-Benedict Glaw
2023-11-16 15:26 ` [PATCH 4/6] arm: [MVE intrinsics] add load and store shapes Christophe Lyon
` (3 subsequent siblings)
5 siblings, 2 replies; 15+ messages in thread
From: Christophe Lyon @ 2023-11-16 15:26 UTC (permalink / raw)
To: gcc-patches, richard.sandiford, richard.earnshaw, kyrylo.tkachov
Cc: Christophe Lyon
This patch adds base support for load/store intrinsics to the
framework, starting with loads and stores for contiguous memory
elements, without extension or truncation.
Compared to the aarch64/SVE implementation, there's no support for
gather/scatter loads/stores yet. This will be added later as needed.
2023-11-16 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm-mve-builtins-functions.h (multi_vector_function)
(full_width_access): New classes.
* config/arm/arm-mve-builtins.cc
(find_type_suffix_for_scalar_type, infer_pointer_type)
(require_pointer_type, get_contiguous_base, add_mem_operand)
(add_fixed_operand, use_contiguous_load_insn)
(use_contiguous_store_insn): New.
* config/arm/arm-mve-builtins.h (memory_vector_mode)
(infer_pointer_type, require_pointer_type, get_contiguous_base)
(add_mem_operand)
(add_fixed_operand, use_contiguous_load_insn)
(use_contiguous_store_insn): New.
---
gcc/config/arm/arm-mve-builtins-functions.h | 56 ++++++++++
gcc/config/arm/arm-mve-builtins.cc | 116 ++++++++++++++++++++
gcc/config/arm/arm-mve-builtins.h | 28 ++++-
3 files changed, 199 insertions(+), 1 deletion(-)
diff --git a/gcc/config/arm/arm-mve-builtins-functions.h b/gcc/config/arm/arm-mve-builtins-functions.h
index eba1f071af0..6d234a2dd7c 100644
--- a/gcc/config/arm/arm-mve-builtins-functions.h
+++ b/gcc/config/arm/arm-mve-builtins-functions.h
@@ -966,6 +966,62 @@ public:
}
};
+/* A function_base that sometimes or always operates on tuples of
+ vectors. */
+class multi_vector_function : public function_base
+{
+public:
+ CONSTEXPR multi_vector_function (unsigned int vectors_per_tuple)
+ : m_vectors_per_tuple (vectors_per_tuple) {}
+
+ unsigned int
+ vectors_per_tuple () const override
+ {
+ return m_vectors_per_tuple;
+ }
+
+ /* The number of vectors in a tuple, or 1 if the function only operates
+ on single vectors. */
+ unsigned int m_vectors_per_tuple;
+};
+
+/* A function_base that loads or stores contiguous memory elements
+ without extending or truncating them. */
+class full_width_access : public multi_vector_function
+{
+public:
+ CONSTEXPR full_width_access (unsigned int vectors_per_tuple = 1)
+ : multi_vector_function (vectors_per_tuple) {}
+
+ tree
+ memory_scalar_type (const function_instance &fi) const override
+ {
+ return fi.scalar_type (0);
+ }
+
+ machine_mode
+ memory_vector_mode (const function_instance &fi) const override
+ {
+ machine_mode mode = fi.vector_mode (0);
+ /* Vectors of floating-point are managed in memory as vectors of
+ integers. */
+ switch (mode)
+ {
+ case E_V4SFmode:
+ mode = E_V4SImode;
+ break;
+ case E_V8HFmode:
+ mode = E_V8HImode;
+ break;
+ }
+
+ if (m_vectors_per_tuple != 1)
+ mode = targetm.array_mode (mode, m_vectors_per_tuple).require ();
+
+ return mode;
+ }
+};
+
} /* end namespace arm_mve */
/* Declare the global function base NAME, creating it from an instance
diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
index 02dc8fa9b73..a265cb05553 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -36,6 +36,7 @@
#include "fold-const.h"
#include "gimple.h"
#include "gimple-iterator.h"
+#include "explow.h"
#include "emit-rtl.h"
#include "langhooks.h"
#include "stringpool.h"
@@ -529,6 +530,22 @@ matches_type_p (const_tree model_type, const_tree candidate)
&& TYPE_MAIN_VARIANT (model_type) == TYPE_MAIN_VARIANT (candidate));
}
+/* If TYPE is a valid MVE element type, return the corresponding type
+ suffix, otherwise return NUM_TYPE_SUFFIXES. */
+static type_suffix_index
+find_type_suffix_for_scalar_type (const_tree type)
+{
+ /* A linear search should be OK here, since the code isn't hot and
+ the number of types is only small. */
+ for (unsigned int suffix_i = 0; suffix_i < NUM_TYPE_SUFFIXES; ++suffix_i)
+ {
+ vector_type_index vector_i = type_suffixes[suffix_i].vector_type;
+ if (matches_type_p (scalar_types[vector_i], type))
+ return type_suffix_index (suffix_i);
+ }
+ return NUM_TYPE_SUFFIXES;
+}
+
/* Report an error against LOCATION that the user has tried to use
a floating point function when the mve.fp extension is disabled. */
static void
@@ -1125,6 +1142,37 @@ function_resolver::resolve_to (mode_suffix_index mode,
return res;
}
+/* Require argument ARGNO to be a pointer to a scalar type that has a
+ corresponding type suffix. Return that type suffix on success,
+ otherwise report an error and return NUM_TYPE_SUFFIXES. */
+type_suffix_index
+function_resolver::infer_pointer_type (unsigned int argno)
+{
+ tree actual = get_argument_type (argno);
+ if (actual == error_mark_node)
+ return NUM_TYPE_SUFFIXES;
+
+ if (TREE_CODE (actual) != POINTER_TYPE)
+ {
+ error_at (location, "passing %qT to argument %d of %qE, which"
+ " expects a pointer type", actual, argno + 1, fndecl);
+ return NUM_TYPE_SUFFIXES;
+ }
+
+ tree target = TREE_TYPE (actual);
+ type_suffix_index type = find_type_suffix_for_scalar_type (target);
+ if (type == NUM_TYPE_SUFFIXES)
+ {
+ error_at (location, "passing %qT to argument %d of %qE, but %qT is not"
+ " a valid MVE element type", actual, argno + 1, fndecl,
+ build_qualified_type (target, 0));
+ return NUM_TYPE_SUFFIXES;
+ }
+ unsigned int bits = type_suffixes[type].element_bits;
+
+ return type;
+}
+
/* Require argument ARGNO to be a single vector or a tuple of NUM_VECTORS
vectors; NUM_VECTORS is 1 for the former. Return the associated type
suffix on success, using TYPE_SUFFIX_b for predicates. Report an error
@@ -1498,6 +1546,22 @@ function_resolver::require_scalar_type (unsigned int argno,
return true;
}
+/* Require argument ARGNO to be some form of pointer, without being specific
+ about its target type. Return true if the argument has the right form,
+ otherwise report an appropriate error. */
+bool
+function_resolver::require_pointer_type (unsigned int argno)
+{
+ if (!scalar_argument_p (argno))
+ {
+ error_at (location, "passing %qT to argument %d of %qE, which"
+ " expects a scalar pointer", get_argument_type (argno),
+ argno + 1, fndecl);
+ return false;
+ }
+ return true;
+}
+
/* Require the function to have exactly EXPECTED arguments. Return true
if it does, otherwise report an appropriate error. */
bool
@@ -1955,6 +2019,14 @@ function_expander::direct_optab_handler (optab op, unsigned int suffix_i)
return ::direct_optab_handler (op, vector_mode (suffix_i));
}
+/* Return the base address for a contiguous load or store
+ function. */
+rtx
+function_expander::get_contiguous_base ()
+{
+ return args[0];
+}
+
/* For a function that does the equivalent of:
OUTPUT = COND ? FN (INPUTS) : FALLBACK;
@@ -2043,6 +2115,26 @@ function_expander::add_integer_operand (HOST_WIDE_INT x)
create_integer_operand (&m_ops.last (), x);
}
+/* Add a memory operand with mode MODE and address ADDR. */
+void
+function_expander::add_mem_operand (machine_mode mode, rtx addr)
+{
+ gcc_assert (VECTOR_MODE_P (mode));
+ rtx mem = gen_rtx_MEM (mode, memory_address (mode, addr));
+ /* The memory is only guaranteed to be element-aligned. */
+ set_mem_align (mem, GET_MODE_ALIGNMENT (GET_MODE_INNER (mode)));
+ add_fixed_operand (mem);
+}
+
+/* Add an operand that must be X. The only way of legitimizing an
+ invalid X is to reload the address of a MEM. */
+void
+function_expander::add_fixed_operand (rtx x)
+{
+ m_ops.safe_grow (m_ops.length () + 1, true);
+ create_fixed_operand (&m_ops.last (), x);
+}
+
/* Generate instruction ICODE, given that its operands have already
been added to M_OPS. Return the value of the first operand. */
rtx
@@ -2137,6 +2229,30 @@ function_expander::use_cond_insn (insn_code icode, unsigned int merge_argno)
return generate_insn (icode);
}
+/* Implement the call using instruction ICODE, which loads memory operand 1
+ into register operand 0. */
+rtx
+function_expander::use_contiguous_load_insn (insn_code icode)
+{
+ machine_mode mem_mode = memory_vector_mode ();
+
+ add_output_operand (icode);
+ add_mem_operand (mem_mode, get_contiguous_base ());
+ return generate_insn (icode);
+}
+
+/* Implement the call using instruction ICODE, which stores register operand 1
+ into memory operand 0. */
+rtx
+function_expander::use_contiguous_store_insn (insn_code icode)
+{
+ machine_mode mem_mode = memory_vector_mode ();
+
+ add_mem_operand (mem_mode, get_contiguous_base ());
+ add_input_operand (icode, args[1]);
+ return generate_insn (icode);
+}
+
/* Implement the call using a normal unpredicated optab for PRED_none.
<optab> corresponds to:
diff --git a/gcc/config/arm/arm-mve-builtins.h b/gcc/config/arm/arm-mve-builtins.h
index 4fd230fe4c7..9c219fa8db4 100644
--- a/gcc/config/arm/arm-mve-builtins.h
+++ b/gcc/config/arm/arm-mve-builtins.h
@@ -278,6 +278,7 @@ public:
unsigned int vectors_per_tuple () const;
tree memory_scalar_type () const;
+ machine_mode memory_vector_mode () const;
const mode_suffix_info &mode_suffix () const;
@@ -383,6 +384,7 @@ public:
type_suffix_index = NUM_TYPE_SUFFIXES,
type_suffix_index = NUM_TYPE_SUFFIXES);
+ type_suffix_index infer_pointer_type (unsigned int);
type_suffix_index infer_vector_or_tuple_type (unsigned int, unsigned int);
type_suffix_index infer_vector_type (unsigned int);
@@ -394,8 +396,9 @@ public:
type_suffix_index,
type_class_index = SAME_TYPE_CLASS,
unsigned int = SAME_SIZE);
- bool require_integer_immediate (unsigned int);
bool require_scalar_type (unsigned int, const char *);
+ bool require_pointer_type (unsigned int);
+ bool require_integer_immediate (unsigned int);
bool require_derived_scalar_type (unsigned int, type_class_index,
unsigned int = SAME_SIZE);
@@ -476,18 +479,23 @@ public:
insn_code direct_optab_handler (optab, unsigned int = 0);
+ rtx get_contiguous_base ();
rtx get_fallback_value (machine_mode, unsigned int, unsigned int &);
rtx get_reg_target ();
void add_output_operand (insn_code);
void add_input_operand (insn_code, rtx);
void add_integer_operand (HOST_WIDE_INT);
+ void add_mem_operand (machine_mode, rtx);
+ void add_fixed_operand (rtx);
rtx generate_insn (insn_code);
rtx use_exact_insn (insn_code);
rtx use_unpred_insn (insn_code);
rtx use_pred_x_insn (insn_code);
rtx use_cond_insn (insn_code, unsigned int = DEFAULT_MERGE_ARGNO);
+ rtx use_contiguous_load_insn (insn_code);
+ rtx use_contiguous_store_insn (insn_code);
rtx map_to_rtx_codes (rtx_code, rtx_code, rtx_code);
@@ -528,6 +536,15 @@ public:
gcc_unreachable ();
}
+ /* If the function addresses memory, return a vector mode whose
+ GET_MODE_NUNITS is the number of elements addressed and whose
+ GET_MODE_INNER is the mode of a single scalar memory element. */
+ virtual machine_mode
+ memory_vector_mode (const function_instance &) const
+ {
+ gcc_unreachable ();
+ }
+
/* Try to fold the given gimple call. Return the new gimple statement
on success, otherwise return null. */
virtual gimple *fold (gimple_folder &) const { return NULL; }
@@ -661,6 +678,15 @@ function_instance::memory_scalar_type () const
return base->memory_scalar_type (*this);
}
+/* If the function addresses memory, return a vector mode whose
+ GET_MODE_NUNITS is the number of elements addressed and whose
+ GET_MODE_INNER is the mode of a single scalar memory element. */
+inline machine_mode
+function_instance::memory_vector_mode () const
+{
+ return base->memory_vector_mode (*this);
+}
+
/* Return information about the function's mode suffix. */
inline const mode_suffix_info &
function_instance::mode_suffix () const
--
2.34.1
* [PATCH 4/6] arm: [MVE intrinsics] add load and store shapes
2023-11-16 15:26 [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Christophe Lyon
2023-11-16 15:26 ` [PATCH 2/6] arm: [MVE intrinsics] Add support for void and load/store pointers as argument types Christophe Lyon
2023-11-16 15:26 ` [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores Christophe Lyon
@ 2023-11-16 15:26 ` Christophe Lyon
2023-11-16 16:49 ` Kyrylo Tkachov
2023-11-16 15:26 ` [PATCH 5/6] arm: [MVE intrinsics] fix vst1 tests Christophe Lyon
` (2 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Christophe Lyon @ 2023-11-16 15:26 UTC (permalink / raw)
To: gcc-patches, richard.sandiford, richard.earnshaw, kyrylo.tkachov
Cc: Christophe Lyon
This patch adds the load and store shapes descriptions.
2023-11-16 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm-mve-builtins-shapes.cc (load, store): New.
* config/arm/arm-mve-builtins-shapes.h (load, store): New.
---
gcc/config/arm/arm-mve-builtins-shapes.cc | 67 +++++++++++++++++++++++
gcc/config/arm/arm-mve-builtins-shapes.h | 2 +
2 files changed, 69 insertions(+)
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index ce87ebcef30..fe983e7c736 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -1428,6 +1428,38 @@ struct inherent_def : public nonoverloaded_base
};
SHAPE (inherent)
+/* sv<t0>_t svfoo[_t0](const <t0>_t *)
+
+ Example: vld1q.
+ int8x16_t [__arm_]vld1q[_s8](int8_t const *base)
+ int8x16_t [__arm_]vld1q_z[_s8](int8_t const *base, mve_pred16_t p) */
+struct load_def : public overloaded_base<0>
+{
+ void
+ build (function_builder &b, const function_group_info &group,
+ bool preserve_user_namespace) const override
+ {
+ b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+ build_all (b, "t0,al", group, MODE_none, preserve_user_namespace);
+ }
+
+ /* Resolve a call based purely on a pointer argument. */
+ tree
+ resolve (function_resolver &r) const override
+ {
+ gcc_assert (r.mode_suffix_id == MODE_none);
+
+ unsigned int i, nargs;
+ type_suffix_index type;
+ if (!r.check_gp_argument (1, i, nargs)
+ || (type = r.infer_pointer_type (i)) == NUM_TYPE_SUFFIXES)
+ return error_mark_node;
+
+ return r.resolve_to (r.mode_suffix_id, type);
+ }
+};
+SHAPE (load)
+
/* <T0>_t vfoo[_t0](<T0>_t)
<T0>_t vfoo_n_t0(<sT0>_t)
@@ -1477,6 +1509,41 @@ struct mvn_def : public overloaded_base<0>
};
SHAPE (mvn)
+/* void vfoo[_t0](<X>_t *, v<t0>[xN]_t)
+
+ where <X> might be tied to <t0> (for non-truncating stores) or might
+ depend on the function base name (for truncating stores).
+
+ Example: vst1q.
+ void [__arm_]vst1q[_s8](int8_t *base, int8x16_t value)
+ void [__arm_]vst1q_p[_s8](int8_t *base, int8x16_t value, mve_pred16_t p) */
+struct store_def : public overloaded_base<0>
+{
+ void
+ build (function_builder &b, const function_group_info &group,
+ bool preserve_user_namespace) const override
+ {
+ b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+ build_all (b, "_,as,v0", group, MODE_none, preserve_user_namespace);
+ }
+
+ tree
+ resolve (function_resolver &r) const override
+ {
+ gcc_assert (r.mode_suffix_id == MODE_none);
+
+ unsigned int i, nargs;
+ type_suffix_index type;
+ if (!r.check_gp_argument (2, i, nargs)
+ || !r.require_pointer_type (0)
+ || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
+ return error_mark_node;
+
+ return r.resolve_to (r.mode_suffix_id, type);
+ }
+};
+SHAPE (store)
+
/* <T0>_t vfoo[_t0](<T0>_t, <T0>_t, <T0>_t)
i.e. the standard shape for ternary operations that operate on
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index a93245321c9..aa9309dec7e 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -61,7 +61,9 @@ namespace arm_mve
extern const function_shape *const cmp;
extern const function_shape *const create;
extern const function_shape *const inherent;
+ extern const function_shape *const load;
extern const function_shape *const mvn;
+ extern const function_shape *const store;
extern const function_shape *const ternary;
extern const function_shape *const ternary_lshift;
extern const function_shape *const ternary_n;
--
2.34.1
* [PATCH 5/6] arm: [MVE intrinsics] fix vst1 tests
2023-11-16 15:26 [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Christophe Lyon
` (2 preceding siblings ...)
2023-11-16 15:26 ` [PATCH 4/6] arm: [MVE intrinsics] add load and store shapes Christophe Lyon
@ 2023-11-16 15:26 ` Christophe Lyon
2023-11-16 15:30 ` Kyrylo Tkachov
2023-11-16 15:26 ` [PATCH 6/6] arm: [MVE intrinsics] rework vld1q vst1q Christophe Lyon
2023-11-16 16:46 ` [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Kyrylo Tkachov
5 siblings, 1 reply; 15+ messages in thread
From: Christophe Lyon @ 2023-11-16 15:26 UTC (permalink / raw)
To: gcc-patches, richard.sandiford, richard.earnshaw, kyrylo.tkachov
Cc: Christophe Lyon
vst1q intrinsics return void, so we should not do 'return vst1q_f16 (base, value);'
This was OK so far, but will trigger an error/warning with the new
implementation of these intrinsics.
This patch just removes the 'return' keyword.
2023-11-16 Christophe Lyon <christophe.lyon@linaro.org>
gcc/testsuite/
* gcc.target/arm/mve/intrinsics/vst1q_f16.c: Remove 'return'.
* gcc.target/arm/mve/intrinsics/vst1q_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst1q_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst1q_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst1q_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst1q_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst1q_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst1q_u8.c: Likewise.
---
gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c | 4 ++--
gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c | 4 ++--
gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c | 4 ++--
gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c | 4 ++--
gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c | 4 ++--
gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c | 4 ++--
gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c | 4 ++--
gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c | 4 ++--
8 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
index 1fa02f00f53..e4b40604d54 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
@@ -18,7 +18,7 @@ extern "C" {
void
foo (float16_t *base, float16x8_t value)
{
- return vst1q_f16 (base, value);
+ vst1q_f16 (base, value);
}
@@ -31,7 +31,7 @@ foo (float16_t *base, float16x8_t value)
void
foo1 (float16_t *base, float16x8_t value)
{
- return vst1q (base, value);
+ vst1q (base, value);
}
#ifdef __cplusplus
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
index 67cc3ae3b47..8f42323c603 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
@@ -18,7 +18,7 @@ extern "C" {
void
foo (float32_t *base, float32x4_t value)
{
- return vst1q_f32 (base, value);
+ vst1q_f32 (base, value);
}
@@ -31,7 +31,7 @@ foo (float32_t *base, float32x4_t value)
void
foo1 (float32_t *base, float32x4_t value)
{
- return vst1q (base, value);
+ vst1q (base, value);
}
#ifdef __cplusplus
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
index 052959b2083..891ac4155d9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
@@ -18,7 +18,7 @@ extern "C" {
void
foo (int16_t *base, int16x8_t value)
{
- return vst1q_s16 (base, value);
+ vst1q_s16 (base, value);
}
@@ -31,7 +31,7 @@ foo (int16_t *base, int16x8_t value)
void
foo1 (int16_t *base, int16x8_t value)
{
- return vst1q (base, value);
+ vst1q (base, value);
}
#ifdef __cplusplus
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
index 444ad07f4ef..a28d1eb98db 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
@@ -18,7 +18,7 @@ extern "C" {
void
foo (int32_t *base, int32x4_t value)
{
- return vst1q_s32 (base, value);
+ vst1q_s32 (base, value);
}
@@ -31,7 +31,7 @@ foo (int32_t *base, int32x4_t value)
void
foo1 (int32_t *base, int32x4_t value)
{
- return vst1q (base, value);
+ vst1q (base, value);
}
#ifdef __cplusplus
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
index 684ff0aca5b..81c141a63e0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
@@ -18,7 +18,7 @@ extern "C" {
void
foo (int8_t *base, int8x16_t value)
{
- return vst1q_s8 (base, value);
+ vst1q_s8 (base, value);
}
@@ -31,7 +31,7 @@ foo (int8_t *base, int8x16_t value)
void
foo1 (int8_t *base, int8x16_t value)
{
- return vst1q (base, value);
+ vst1q (base, value);
}
#ifdef __cplusplus
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
index 1fea2de1e76..b8ce7fbe6ee 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
@@ -18,7 +18,7 @@ extern "C" {
void
foo (uint16_t *base, uint16x8_t value)
{
- return vst1q_u16 (base, value);
+ vst1q_u16 (base, value);
}
@@ -31,7 +31,7 @@ foo (uint16_t *base, uint16x8_t value)
void
foo1 (uint16_t *base, uint16x8_t value)
{
- return vst1q (base, value);
+ vst1q (base, value);
}
#ifdef __cplusplus
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
index 64c43c59d47..1dbb55538a9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
@@ -18,7 +18,7 @@ extern "C" {
void
foo (uint32_t *base, uint32x4_t value)
{
- return vst1q_u32 (base, value);
+ vst1q_u32 (base, value);
}
@@ -31,7 +31,7 @@ foo (uint32_t *base, uint32x4_t value)
void
foo1 (uint32_t *base, uint32x4_t value)
{
- return vst1q (base, value);
+ vst1q (base, value);
}
#ifdef __cplusplus
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
index 5517611bba6..ab22be81647 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
@@ -18,7 +18,7 @@ extern "C" {
void
foo (uint8_t *base, uint8x16_t value)
{
- return vst1q_u8 (base, value);
+ vst1q_u8 (base, value);
}
@@ -31,7 +31,7 @@ foo (uint8_t *base, uint8x16_t value)
void
foo1 (uint8_t *base, uint8x16_t value)
{
- return vst1q (base, value);
+ vst1q (base, value);
}
#ifdef __cplusplus
--
2.34.1
* [PATCH 6/6] arm: [MVE intrinsics] rework vld1q vst1q
2023-11-16 15:26 [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Christophe Lyon
` (3 preceding siblings ...)
2023-11-16 15:26 ` [PATCH 5/6] arm: [MVE intrinsics] fix vst1 tests Christophe Lyon
@ 2023-11-16 15:26 ` Christophe Lyon
2023-11-16 16:49 ` Kyrylo Tkachov
2023-11-16 16:46 ` [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Kyrylo Tkachov
5 siblings, 1 reply; 15+ messages in thread
From: Christophe Lyon @ 2023-11-16 15:26 UTC (permalink / raw)
To: gcc-patches, richard.sandiford, richard.earnshaw, kyrylo.tkachov
Cc: Christophe Lyon
Implement vld1q, vst1q using the new MVE builtins framework.
2023-11-16 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm-mve-builtins-base.cc (vld1_impl, vld1q)
(vst1_impl, vst1q): New.
* config/arm/arm-mve-builtins-base.def (vld1q, vst1q): New.
* config/arm/arm-mve-builtins-base.h (vld1q, vst1q): New.
* config/arm/arm_mve.h
(vld1q): Delete.
(vst1q): Delete.
(vld1q_s8): Delete.
(vld1q_s32): Delete.
(vld1q_s16): Delete.
(vld1q_u8): Delete.
(vld1q_u32): Delete.
(vld1q_u16): Delete.
(vld1q_f32): Delete.
(vld1q_f16): Delete.
(vst1q_f32): Delete.
(vst1q_f16): Delete.
(vst1q_s8): Delete.
(vst1q_s32): Delete.
(vst1q_s16): Delete.
(vst1q_u8): Delete.
(vst1q_u32): Delete.
(vst1q_u16): Delete.
(__arm_vld1q_s8): Delete.
(__arm_vld1q_s32): Delete.
(__arm_vld1q_s16): Delete.
(__arm_vld1q_u8): Delete.
(__arm_vld1q_u32): Delete.
(__arm_vld1q_u16): Delete.
(__arm_vst1q_s8): Delete.
(__arm_vst1q_s32): Delete.
(__arm_vst1q_s16): Delete.
(__arm_vst1q_u8): Delete.
(__arm_vst1q_u32): Delete.
(__arm_vst1q_u16): Delete.
(__arm_vld1q_f32): Delete.
(__arm_vld1q_f16): Delete.
(__arm_vst1q_f32): Delete.
(__arm_vst1q_f16): Delete.
(__arm_vld1q): Delete.
(__arm_vst1q): Delete.
* config/arm/mve.md (mve_vld1q_f<mode>): Rename into ...
(@mve_vld1q_f<mode>): ... this.
(mve_vld1q_<supf><mode>): Rename into ...
(@mve_vld1q_<supf><mode>): ... this.
(mve_vst1q_f<mode>): Rename into ...
(@mve_vst1q_f<mode>): ... this.
(mve_vst1q_<supf><mode>): Rename into ...
(@mve_vst1q_<supf><mode>): ... this.
---
gcc/config/arm/arm-mve-builtins-base.cc | 58 +++++
gcc/config/arm/arm-mve-builtins-base.def | 4 +
gcc/config/arm/arm-mve-builtins-base.h | 4 +-
gcc/config/arm/arm_mve.h | 282 -----------------------
gcc/config/arm/mve.md | 8 +-
5 files changed, 69 insertions(+), 287 deletions(-)
diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index 5478cac8aeb..cfe1b954a29 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -83,6 +83,62 @@ class vuninitializedq_impl : public quiet<function_base>
}
};
+class vld1_impl : public full_width_access
+{
+public:
+ unsigned int
+ call_properties (const function_instance &) const override
+ {
+ return CP_READ_MEMORY;
+ }
+
+ rtx
+ expand (function_expander &e) const override
+ {
+ insn_code icode;
+ if (e.type_suffix (0).float_p)
+ icode = code_for_mve_vld1q_f (e.vector_mode (0));
+ else
+ {
+ if (e.type_suffix (0).unsigned_p)
+ icode = code_for_mve_vld1q (VLD1Q_U,
+ e.vector_mode (0));
+ else
+ icode = code_for_mve_vld1q (VLD1Q_S,
+ e.vector_mode (0));
+ }
+ return e.use_contiguous_load_insn (icode);
+ }
+};
+
+class vst1_impl : public full_width_access
+{
+public:
+ unsigned int
+ call_properties (const function_instance &) const override
+ {
+ return CP_WRITE_MEMORY;
+ }
+
+ rtx
+ expand (function_expander &e) const override
+ {
+ insn_code icode;
+ if (e.type_suffix (0).float_p)
+ icode = code_for_mve_vst1q_f (e.vector_mode (0));
+ else
+ {
+ if (e.type_suffix (0).unsigned_p)
+ icode = code_for_mve_vst1q (VST1Q_U,
+ e.vector_mode (0));
+ else
+ icode = code_for_mve_vst1q (VST1Q_S,
+ e.vector_mode (0));
+ }
+ return e.use_contiguous_store_insn (icode);
+ }
+};
+
} /* end anonymous namespace */
namespace arm_mve {
@@ -290,6 +346,7 @@ FUNCTION (vfmasq, unspec_mve_function_exact_insn, (-1, -1, -1, -1, -1, VFMASQ_N_
FUNCTION (vfmsq, unspec_mve_function_exact_insn, (-1, -1, VFMSQ_F, -1, -1, -1, -1, -1, VFMSQ_M_F, -1, -1, -1))
FUNCTION_WITH_M_N_NO_F (vhaddq, VHADDQ)
FUNCTION_WITH_M_N_NO_F (vhsubq, VHSUBQ)
+FUNCTION (vld1q, vld1_impl,)
FUNCTION_PRED_P_S (vmaxavq, VMAXAVQ)
FUNCTION_WITHOUT_N_NO_U_F (vmaxaq, VMAXAQ)
FUNCTION_ONLY_F (vmaxnmaq, VMAXNMAQ)
@@ -405,6 +462,7 @@ FUNCTION_ONLY_N_NO_F (vshrntq, VSHRNTQ)
FUNCTION_ONLY_N_NO_F (vshrq, VSHRQ)
FUNCTION_ONLY_N_NO_F (vsliq, VSLIQ)
FUNCTION_ONLY_N_NO_F (vsriq, VSRIQ)
+FUNCTION (vst1q, vst1_impl,)
FUNCTION_WITH_RTX_M_N (vsubq, MINUS, VSUBQ)
FUNCTION (vuninitializedq, vuninitializedq_impl,)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index 01dfbdef8a3..16879246237 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -47,6 +47,7 @@ DEF_MVE_FUNCTION (vhaddq, binary_opt_n, all_integer, mx_or_none)
DEF_MVE_FUNCTION (vhcaddq_rot90, binary, all_signed, mx_or_none)
DEF_MVE_FUNCTION (vhcaddq_rot270, binary, all_signed, mx_or_none)
DEF_MVE_FUNCTION (vhsubq, binary_opt_n, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vld1q, load, all_integer, none)
DEF_MVE_FUNCTION (vmaxaq, binary_maxamina, all_signed, m_or_none)
DEF_MVE_FUNCTION (vmaxavq, binary_maxavminav, all_signed, p_or_none)
DEF_MVE_FUNCTION (vmaxq, binary, all_integer, mx_or_none)
@@ -150,6 +151,7 @@ DEF_MVE_FUNCTION (vshrntq, binary_rshift_narrow, integer_16_32, m_or_none)
DEF_MVE_FUNCTION (vshrq, binary_rshift, all_integer, mx_or_none)
DEF_MVE_FUNCTION (vsliq, ternary_lshift, all_integer, m_or_none)
DEF_MVE_FUNCTION (vsriq, ternary_rshift, all_integer, m_or_none)
+DEF_MVE_FUNCTION (vst1q, store, all_integer, none)
DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_integer, mx_or_none)
DEF_MVE_FUNCTION (vuninitializedq, inherent, all_integer_with_64, none)
#undef REQUIRES_FLOAT
@@ -182,6 +184,7 @@ DEF_MVE_FUNCTION (veorq, binary, all_float, mx_or_none)
DEF_MVE_FUNCTION (vfmaq, ternary_opt_n, all_float, m_or_none)
DEF_MVE_FUNCTION (vfmasq, ternary_n, all_float, m_or_none)
DEF_MVE_FUNCTION (vfmsq, ternary, all_float, m_or_none)
+DEF_MVE_FUNCTION (vld1q, load, all_float, none)
DEF_MVE_FUNCTION (vmaxnmaq, binary, all_float, m_or_none)
DEF_MVE_FUNCTION (vmaxnmavq, binary_maxvminv, all_float, p_or_none)
DEF_MVE_FUNCTION (vmaxnmq, binary, all_float, mx_or_none)
@@ -203,6 +206,7 @@ DEF_MVE_FUNCTION (vrndnq, unary, all_float, mx_or_none)
DEF_MVE_FUNCTION (vrndpq, unary, all_float, mx_or_none)
DEF_MVE_FUNCTION (vrndq, unary, all_float, mx_or_none)
DEF_MVE_FUNCTION (vrndxq, unary, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vst1q, store, all_float, none)
DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_float, mx_or_none)
DEF_MVE_FUNCTION (vuninitializedq, inherent, all_float, none)
#undef REQUIRES_FLOAT
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index c574c32ac53..8c7e5fe5c3e 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -63,6 +63,7 @@ extern const function_base *const vhaddq;
extern const function_base *const vhcaddq_rot270;
extern const function_base *const vhcaddq_rot90;
extern const function_base *const vhsubq;
+extern const function_base *const vld1q;
extern const function_base *const vmaxaq;
extern const function_base *const vmaxavq;
extern const function_base *const vmaxnmaq;
@@ -103,8 +104,8 @@ extern const function_base *const vmovnbq;
extern const function_base *const vmovntq;
extern const function_base *const vmulhq;
extern const function_base *const vmullbq_int;
-extern const function_base *const vmulltq_int;
extern const function_base *const vmullbq_poly;
+extern const function_base *const vmulltq_int;
extern const function_base *const vmulltq_poly;
extern const function_base *const vmulq;
extern const function_base *const vmvnq;
@@ -178,6 +179,7 @@ extern const function_base *const vshrntq;
extern const function_base *const vshrq;
extern const function_base *const vsliq;
extern const function_base *const vsriq;
+extern const function_base *const vst1q;
extern const function_base *const vsubq;
extern const function_base *const vuninitializedq;
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index b82d94e59bd..cc027f9cbb5 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -56,7 +56,6 @@
#define vstrbq_scatter_offset_p(__base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p(__base, __offset, __value, __p)
#define vstrwq_scatter_base_p(__addr, __offset, __value, __p) __arm_vstrwq_scatter_base_p(__addr, __offset, __value, __p)
#define vldrbq_gather_offset_z(__base, __offset, __p) __arm_vldrbq_gather_offset_z(__base, __offset, __p)
-#define vld1q(__base) __arm_vld1q(__base)
#define vldrhq_gather_offset(__base, __offset) __arm_vldrhq_gather_offset(__base, __offset)
#define vldrhq_gather_offset_z(__base, __offset, __p) __arm_vldrhq_gather_offset_z(__base, __offset, __p)
#define vldrhq_gather_shifted_offset(__base, __offset) __arm_vldrhq_gather_shifted_offset(__base, __offset)
@@ -69,7 +68,6 @@
#define vldrwq_gather_offset_z(__base, __offset, __p) __arm_vldrwq_gather_offset_z(__base, __offset, __p)
#define vldrwq_gather_shifted_offset(__base, __offset) __arm_vldrwq_gather_shifted_offset(__base, __offset)
#define vldrwq_gather_shifted_offset_z(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z(__base, __offset, __p)
-#define vst1q(__addr, __value) __arm_vst1q(__addr, __value)
#define vstrhq_scatter_offset(__base, __offset, __value) __arm_vstrhq_scatter_offset(__base, __offset, __value)
#define vstrhq_scatter_offset_p(__base, __offset, __value, __p) __arm_vstrhq_scatter_offset_p(__base, __offset, __value, __p)
#define vstrhq_scatter_shifted_offset(__base, __offset, __value) __arm_vstrhq_scatter_shifted_offset(__base, __offset, __value)
@@ -346,12 +344,6 @@
#define vldrbq_z_u32(__base, __p) __arm_vldrbq_z_u32(__base, __p)
#define vldrwq_gather_base_z_u32(__addr, __offset, __p) __arm_vldrwq_gather_base_z_u32(__addr, __offset, __p)
#define vldrwq_gather_base_z_s32(__addr, __offset, __p) __arm_vldrwq_gather_base_z_s32(__addr, __offset, __p)
-#define vld1q_s8(__base) __arm_vld1q_s8(__base)
-#define vld1q_s32(__base) __arm_vld1q_s32(__base)
-#define vld1q_s16(__base) __arm_vld1q_s16(__base)
-#define vld1q_u8(__base) __arm_vld1q_u8(__base)
-#define vld1q_u32(__base) __arm_vld1q_u32(__base)
-#define vld1q_u16(__base) __arm_vld1q_u16(__base)
#define vldrhq_gather_offset_s32(__base, __offset) __arm_vldrhq_gather_offset_s32(__base, __offset)
#define vldrhq_gather_offset_s16(__base, __offset) __arm_vldrhq_gather_offset_s16(__base, __offset)
#define vldrhq_gather_offset_u32(__base, __offset) __arm_vldrhq_gather_offset_u32(__base, __offset)
@@ -380,8 +372,6 @@
#define vldrwq_u32(__base) __arm_vldrwq_u32(__base)
#define vldrwq_z_s32(__base, __p) __arm_vldrwq_z_s32(__base, __p)
#define vldrwq_z_u32(__base, __p) __arm_vldrwq_z_u32(__base, __p)
-#define vld1q_f32(__base) __arm_vld1q_f32(__base)
-#define vld1q_f16(__base) __arm_vld1q_f16(__base)
#define vldrhq_f16(__base) __arm_vldrhq_f16(__base)
#define vldrhq_z_f16(__base, __p) __arm_vldrhq_z_f16(__base, __p)
#define vldrwq_f32(__base) __arm_vldrwq_f32(__base)
@@ -416,14 +406,6 @@
#define vldrwq_gather_shifted_offset_z_f32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_f32(__base, __offset, __p)
#define vldrwq_gather_shifted_offset_z_s32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_s32(__base, __offset, __p)
#define vldrwq_gather_shifted_offset_z_u32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_u32(__base, __offset, __p)
-#define vst1q_f32(__addr, __value) __arm_vst1q_f32(__addr, __value)
-#define vst1q_f16(__addr, __value) __arm_vst1q_f16(__addr, __value)
-#define vst1q_s8(__addr, __value) __arm_vst1q_s8(__addr, __value)
-#define vst1q_s32(__addr, __value) __arm_vst1q_s32(__addr, __value)
-#define vst1q_s16(__addr, __value) __arm_vst1q_s16(__addr, __value)
-#define vst1q_u8(__addr, __value) __arm_vst1q_u8(__addr, __value)
-#define vst1q_u32(__addr, __value) __arm_vst1q_u32(__addr, __value)
-#define vst1q_u16(__addr, __value) __arm_vst1q_u16(__addr, __value)
#define vstrhq_f16(__addr, __value) __arm_vstrhq_f16(__addr, __value)
#define vstrhq_scatter_offset_s32( __base, __offset, __value) __arm_vstrhq_scatter_offset_s32( __base, __offset, __value)
#define vstrhq_scatter_offset_s16( __base, __offset, __value) __arm_vstrhq_scatter_offset_s16( __base, __offset, __value)
@@ -1537,48 +1519,6 @@ __arm_vldrwq_gather_base_z_u32 (uint32x4_t __addr, const int __offset, mve_pred1
return __builtin_mve_vldrwq_gather_base_z_uv4si (__addr, __offset, __p);
}
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q_s8 (int8_t const * __base)
-{
- return __builtin_mve_vld1q_sv16qi ((__builtin_neon_qi *) __base);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q_s32 (int32_t const * __base)
-{
- return __builtin_mve_vld1q_sv4si ((__builtin_neon_si *) __base);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q_s16 (int16_t const * __base)
-{
- return __builtin_mve_vld1q_sv8hi ((__builtin_neon_hi *) __base);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q_u8 (uint8_t const * __base)
-{
- return __builtin_mve_vld1q_uv16qi ((__builtin_neon_qi *) __base);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q_u32 (uint32_t const * __base)
-{
- return __builtin_mve_vld1q_uv4si ((__builtin_neon_si *) __base);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q_u16 (uint16_t const * __base)
-{
- return __builtin_mve_vld1q_uv8hi ((__builtin_neon_hi *) __base);
-}
-
__extension__ extern __inline int32x4_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
__arm_vldrhq_gather_offset_s32 (int16_t const * __base, uint32x4_t __offset)
@@ -1917,48 +1857,6 @@ __arm_vldrwq_gather_shifted_offset_z_u32 (uint32_t const * __base, uint32x4_t __
return __builtin_mve_vldrwq_gather_shifted_offset_z_uv4si ((__builtin_neon_si *) __base, __offset, __p);
}
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q_s8 (int8_t * __addr, int8x16_t __value)
-{
- __builtin_mve_vst1q_sv16qi ((__builtin_neon_qi *) __addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q_s32 (int32_t * __addr, int32x4_t __value)
-{
- __builtin_mve_vst1q_sv4si ((__builtin_neon_si *) __addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q_s16 (int16_t * __addr, int16x8_t __value)
-{
- __builtin_mve_vst1q_sv8hi ((__builtin_neon_hi *) __addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q_u8 (uint8_t * __addr, uint8x16_t __value)
-{
- __builtin_mve_vst1q_uv16qi ((__builtin_neon_qi *) __addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q_u32 (uint32_t * __addr, uint32x4_t __value)
-{
- __builtin_mve_vst1q_uv4si ((__builtin_neon_si *) __addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q_u16 (uint16_t * __addr, uint16x8_t __value)
-{
- __builtin_mve_vst1q_uv8hi ((__builtin_neon_hi *) __addr, __value);
-}
-
__extension__ extern __inline void
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
__arm_vstrhq_scatter_offset_s32 (int16_t * __base, uint32x4_t __offset, int32x4_t __value)
@@ -4421,20 +4319,6 @@ __arm_vornq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve
return __builtin_mve_vornq_m_fv8hf (__inactive, __a, __b, __p);
}
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q_f32 (float32_t const * __base)
-{
- return __builtin_mve_vld1q_fv4sf((__builtin_neon_si *) __base);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q_f16 (float16_t const * __base)
-{
- return __builtin_mve_vld1q_fv8hf((__builtin_neon_hi *) __base);
-}
-
__extension__ extern __inline float32x4_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
__arm_vldrwq_f32 (float32_t const * __base)
@@ -4547,20 +4431,6 @@ __arm_vstrwq_f32 (float32_t * __addr, float32x4_t __value)
__builtin_mve_vstrwq_fv4sf ((__builtin_neon_si *) __addr, __value);
}
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q_f32 (float32_t * __addr, float32x4_t __value)
-{
- __builtin_mve_vst1q_fv4sf ((__builtin_neon_si *) __addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q_f16 (float16_t * __addr, float16x8_t __value)
-{
- __builtin_mve_vst1q_fv8hf ((__builtin_neon_hi *) __addr, __value);
-}
-
__extension__ extern __inline void
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
__arm_vstrhq_f16 (float16_t * __addr, float16x8_t __value)
@@ -5651,48 +5521,6 @@ __arm_vldrbq_gather_offset_z (uint8_t const * __base, uint16x8_t __offset, mve_p
return __arm_vldrbq_gather_offset_z_u16 (__base, __offset, __p);
}
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q (int8_t const * __base)
-{
- return __arm_vld1q_s8 (__base);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q (int32_t const * __base)
-{
- return __arm_vld1q_s32 (__base);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q (int16_t const * __base)
-{
- return __arm_vld1q_s16 (__base);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q (uint8_t const * __base)
-{
- return __arm_vld1q_u8 (__base);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q (uint32_t const * __base)
-{
- return __arm_vld1q_u32 (__base);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q (uint16_t const * __base)
-{
- return __arm_vld1q_u16 (__base);
-}
-
__extension__ extern __inline int32x4_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
__arm_vldrhq_gather_offset (int16_t const * __base, uint32x4_t __offset)
@@ -5917,48 +5745,6 @@ __arm_vldrwq_gather_shifted_offset_z (uint32_t const * __base, uint32x4_t __offs
return __arm_vldrwq_gather_shifted_offset_z_u32 (__base, __offset, __p);
}
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q (int8_t * __addr, int8x16_t __value)
-{
- __arm_vst1q_s8 (__addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q (int32_t * __addr, int32x4_t __value)
-{
- __arm_vst1q_s32 (__addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q (int16_t * __addr, int16x8_t __value)
-{
- __arm_vst1q_s16 (__addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q (uint8_t * __addr, uint8x16_t __value)
-{
- __arm_vst1q_u8 (__addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q (uint32_t * __addr, uint32x4_t __value)
-{
- __arm_vst1q_u32 (__addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q (uint16_t * __addr, uint16x8_t __value)
-{
- __arm_vst1q_u16 (__addr, __value);
-}
-
__extension__ extern __inline void
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
__arm_vstrhq_scatter_offset (int16_t * __base, uint32x4_t __offset, int32x4_t __value)
@@ -7809,20 +7595,6 @@ __arm_vornq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pre
return __arm_vornq_m_f16 (__inactive, __a, __b, __p);
}
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q (float32_t const * __base)
-{
- return __arm_vld1q_f32 (__base);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld1q (float16_t const * __base)
-{
- return __arm_vld1q_f16 (__base);
-}
-
__extension__ extern __inline float16x8_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
__arm_vldrhq_gather_offset (float16_t const * __base, uint16x8_t __offset)
@@ -7893,20 +7665,6 @@ __arm_vstrwq (float32_t * __addr, float32x4_t __value)
__arm_vstrwq_f32 (__addr, __value);
}
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q (float32_t * __addr, float32x4_t __value)
-{
- __arm_vst1q_f32 (__addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst1q (float16_t * __addr, float16x8_t __value)
-{
- __arm_vst1q_f16 (__addr, __value);
-}
-
__extension__ extern __inline void
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
__arm_vstrhq (float16_t * __addr, float16x8_t __value)
@@ -8670,17 +8428,6 @@ extern void *__ARM_undef;
int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vornq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vornq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-#define __arm_vld1q(p0) (\
- _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
- int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *)), \
- int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld1q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *)), \
- int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld1q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *)), \
- int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld1q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *)), \
- int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld1q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *)), \
- int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld1q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *)), \
- int (*)[__ARM_mve_type_float16_t_ptr]: __arm_vld1q_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *)), \
- int (*)[__ARM_mve_type_float32_t_ptr]: __arm_vld1q_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *))))
-
#define __arm_vld1q_z(p0,p1) ( \
_Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_z_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), p1), \
@@ -8792,17 +8539,6 @@ extern void *__ARM_undef;
int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8x2_t]: __arm_vst2q_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *), __ARM_mve_coerce(__p1, float16x8x2_t)), \
int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4x2_t]: __arm_vst2q_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *), __ARM_mve_coerce(__p1, float32x4x2_t)));})
-#define __arm_vst1q(p0,p1) ({ __typeof(p1) __p1 = (p1); \
- _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
- int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vst1q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t)), \
- int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vst1q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t)), \
- int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vst1q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t)), \
- int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vst1q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t)), \
- int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vst1q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
- int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vst1q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t)), \
- int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8_t]: __arm_vst1q_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *), __ARM_mve_coerce(__p1, float16x8_t)), \
- int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: __arm_vst1q_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *), __ARM_mve_coerce(__p1, float32x4_t)));})
-
#define __arm_vstrhq(p0,p1) ({ __typeof(p1) __p1 = (p1); \
_Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vstrhq_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t)), \
@@ -9149,15 +8885,6 @@ extern void *__ARM_undef;
int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_p_s32 (p0, p1, __ARM_mve_coerce(__p2, int32x4_t), p3), \
int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_p_u32 (p0, p1, __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-#define __arm_vld1q(p0) (\
- _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
- int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *)), \
- int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld1q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *)), \
- int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld1q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *)), \
- int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld1q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *)), \
- int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld1q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *)), \
- int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld1q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *))))
-
#define __arm_vldrhq_gather_offset(p0,p1) ({ __typeof(p1) __p1 = (p1); \
_Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_offset_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
@@ -9206,15 +8933,6 @@ extern void *__ARM_undef;
int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vldrwq_gather_shifted_offset_z_s32 (__ARM_mve_coerce_s32_ptr(__p0, int32_t *), p1, p2), \
int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vldrwq_gather_shifted_offset_z_u32 (__ARM_mve_coerce_u32_ptr(__p0, uint32_t *), p1, p2));})
-#define __arm_vst1q(p0,p1) ({ __typeof(p1) __p1 = (p1); \
- _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
- int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vst1q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t)), \
- int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vst1q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t)), \
- int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vst1q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t)), \
- int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vst1q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t)), \
- int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vst1q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
- int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vst1q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
#define __arm_vst1q_p(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \
_Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vst1q_p_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t), p2), \
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 366cec0812a..b0d3443da9c 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -3690,7 +3690,7 @@ (define_insn "mve_vldrwq_z_<supf>v4si"
}
[(set_attr "length" "8")])
-(define_expand "mve_vld1q_f<mode>"
+(define_expand "@mve_vld1q_f<mode>"
[(match_operand:MVE_0 0 "s_register_operand")
(unspec:MVE_0 [(match_operand:<MVE_CNVT> 1 "mve_memory_operand")] VLD1Q_F)
]
@@ -3700,7 +3700,7 @@ (define_expand "mve_vld1q_f<mode>"
DONE;
})
-(define_expand "mve_vld1q_<supf><mode>"
+(define_expand "@mve_vld1q_<supf><mode>"
[(match_operand:MVE_2 0 "s_register_operand")
(unspec:MVE_2 [(match_operand:MVE_2 1 "mve_memory_operand")] VLD1Q)
]
@@ -4408,7 +4408,7 @@ (define_insn "mve_vstrwq_<supf>v4si"
}
[(set_attr "length" "4")])
-(define_expand "mve_vst1q_f<mode>"
+(define_expand "@mve_vst1q_f<mode>"
[(match_operand:<MVE_CNVT> 0 "mve_memory_operand")
(unspec:<MVE_CNVT> [(match_operand:MVE_0 1 "s_register_operand")] VST1Q_F)
]
@@ -4418,7 +4418,7 @@ (define_expand "mve_vst1q_f<mode>"
DONE;
})
-(define_expand "mve_vst1q_<supf><mode>"
+(define_expand "@mve_vst1q_<supf><mode>"
[(match_operand:MVE_2 0 "mve_memory_operand")
(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand")] VST1Q)
]
--
2.34.1
* RE: [PATCH 5/6] arm: [MVE intrinsics] fix vst1 tests
2023-11-16 15:26 ` [PATCH 5/6] arm: [MVE intrinsics] fix vst1 tests Christophe Lyon
@ 2023-11-16 15:30 ` Kyrylo Tkachov
2023-11-16 15:37 ` Christophe Lyon
0 siblings, 1 reply; 15+ messages in thread
From: Kyrylo Tkachov @ 2023-11-16 15:30 UTC (permalink / raw)
To: Christophe Lyon, gcc-patches, Richard Sandiford, Richard Earnshaw
> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@linaro.org>
> Sent: Thursday, November 16, 2023 3:26 PM
> To: gcc-patches@gcc.gnu.org; Richard Sandiford
> <Richard.Sandiford@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: Christophe Lyon <christophe.lyon@linaro.org>
> Subject: [PATCH 5/6] arm: [MVE intrinsics] fix vst1 tests
>
> vst1q intrinsics return void, so we should not do 'return vst1q_f16 (base,
> value);'
>
> This was OK so far, but will trigger an error/warning with the new
> implementation of these intrinsics.
>
Whoops!
Ok (could have gone in as obvious IMO).
Thanks,
Kyrill
> This patch just removes the 'return' keyword.
>
> 2023-11-16 Christophe Lyon <christophe.lyon@linaro.org>
>
> gcc/testsuite/
> * gcc.target/arm/mve/intrinsics/vst1q_f16.c: Remove 'return'.
> * gcc.target/arm/mve/intrinsics/vst1q_f32.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vst1q_s16.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vst1q_s32.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vst1q_s8.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vst1q_u16.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vst1q_u32.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vst1q_u8.c: Likewise.
> ---
> gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c | 4 ++--
> gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c | 4 ++--
> gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c | 4 ++--
> gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c | 4 ++--
> gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c | 4 ++--
> gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c | 4 ++--
> gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c | 4 ++--
> gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c | 4 ++--
> 8 files changed, 16 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> index 1fa02f00f53..e4b40604d54 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> @@ -18,7 +18,7 @@ extern "C" {
> void
> foo (float16_t *base, float16x8_t value)
> {
> - return vst1q_f16 (base, value);
> + vst1q_f16 (base, value);
> }
>
>
> @@ -31,7 +31,7 @@ foo (float16_t *base, float16x8_t value)
> void
> foo1 (float16_t *base, float16x8_t value)
> {
> - return vst1q (base, value);
> + vst1q (base, value);
> }
>
> #ifdef __cplusplus
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
> index 67cc3ae3b47..8f42323c603 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
> @@ -18,7 +18,7 @@ extern "C" {
> void
> foo (float32_t *base, float32x4_t value)
> {
> - return vst1q_f32 (base, value);
> + vst1q_f32 (base, value);
> }
>
>
> @@ -31,7 +31,7 @@ foo (float32_t *base, float32x4_t value)
> void
> foo1 (float32_t *base, float32x4_t value)
> {
> - return vst1q (base, value);
> + vst1q (base, value);
> }
>
> #ifdef __cplusplus
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> index 052959b2083..891ac4155d9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> @@ -18,7 +18,7 @@ extern "C" {
> void
> foo (int16_t *base, int16x8_t value)
> {
> - return vst1q_s16 (base, value);
> + vst1q_s16 (base, value);
> }
>
>
> @@ -31,7 +31,7 @@ foo (int16_t *base, int16x8_t value)
> void
> foo1 (int16_t *base, int16x8_t value)
> {
> - return vst1q (base, value);
> + vst1q (base, value);
> }
>
> #ifdef __cplusplus
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
> index 444ad07f4ef..a28d1eb98db 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
> @@ -18,7 +18,7 @@ extern "C" {
> void
> foo (int32_t *base, int32x4_t value)
> {
> - return vst1q_s32 (base, value);
> + vst1q_s32 (base, value);
> }
>
>
> @@ -31,7 +31,7 @@ foo (int32_t *base, int32x4_t value)
> void
> foo1 (int32_t *base, int32x4_t value)
> {
> - return vst1q (base, value);
> + vst1q (base, value);
> }
>
> #ifdef __cplusplus
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> index 684ff0aca5b..81c141a63e0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> @@ -18,7 +18,7 @@ extern "C" {
> void
> foo (int8_t *base, int8x16_t value)
> {
> - return vst1q_s8 (base, value);
> + vst1q_s8 (base, value);
> }
>
>
> @@ -31,7 +31,7 @@ foo (int8_t *base, int8x16_t value)
> void
> foo1 (int8_t *base, int8x16_t value)
> {
> - return vst1q (base, value);
> + vst1q (base, value);
> }
>
> #ifdef __cplusplus
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> index 1fea2de1e76..b8ce7fbe6ee 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> @@ -18,7 +18,7 @@ extern "C" {
> void
> foo (uint16_t *base, uint16x8_t value)
> {
> - return vst1q_u16 (base, value);
> + vst1q_u16 (base, value);
> }
>
>
> @@ -31,7 +31,7 @@ foo (uint16_t *base, uint16x8_t value)
> void
> foo1 (uint16_t *base, uint16x8_t value)
> {
> - return vst1q (base, value);
> + vst1q (base, value);
> }
>
> #ifdef __cplusplus
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
> index 64c43c59d47..1dbb55538a9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
> @@ -18,7 +18,7 @@ extern "C" {
> void
> foo (uint32_t *base, uint32x4_t value)
> {
> - return vst1q_u32 (base, value);
> + vst1q_u32 (base, value);
> }
>
>
> @@ -31,7 +31,7 @@ foo (uint32_t *base, uint32x4_t value)
> void
> foo1 (uint32_t *base, uint32x4_t value)
> {
> - return vst1q (base, value);
> + vst1q (base, value);
> }
>
> #ifdef __cplusplus
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
> index 5517611bba6..ab22be81647 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
> @@ -18,7 +18,7 @@ extern "C" {
> void
> foo (uint8_t *base, uint8x16_t value)
> {
> - return vst1q_u8 (base, value);
> + vst1q_u8 (base, value);
> }
>
>
> @@ -31,7 +31,7 @@ foo (uint8_t *base, uint8x16_t value)
> void
> foo1 (uint8_t *base, uint8x16_t value)
> {
> - return vst1q (base, value);
> + vst1q (base, value);
> }
>
> #ifdef __cplusplus
> --
> 2.34.1
* Re: [PATCH 5/6] arm: [MVE intrinsics] fix vst1 tests
2023-11-16 15:30 ` Kyrylo Tkachov
@ 2023-11-16 15:37 ` Christophe Lyon
0 siblings, 0 replies; 15+ messages in thread
From: Christophe Lyon @ 2023-11-16 15:37 UTC (permalink / raw)
To: Kyrylo Tkachov; +Cc: gcc-patches, Richard Sandiford, Richard Earnshaw
On Thu, 16 Nov 2023 at 16:30, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Christophe Lyon <christophe.lyon@linaro.org>
> > Sent: Thursday, November 16, 2023 3:26 PM
> > To: gcc-patches@gcc.gnu.org; Richard Sandiford
> > <Richard.Sandiford@arm.com>; Richard Earnshaw
> > <Richard.Earnshaw@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > Cc: Christophe Lyon <christophe.lyon@linaro.org>
> > Subject: [PATCH 5/6] arm: [MVE intrinsics] fix vst1 tests
> >
> > vst1q intrinsics return void, so we should not do 'return vst1q_f16 (base,
> > value);'
> >
> > This was OK so far, but will trigger an error/warning with the new
> > implementation of these intrinsics.
> >
>
> Whoops!
> Ok (could have gone in as obvious IMO).
Indeed, I'll try to remember that when I write the same patch for the
other vst* intrinsics tests ;-)
Thanks,
Christophe
> Thanks,
> Kyrill
>
> > This patch just removes the 'return' keyword.
> >
> > 2023-11-16 Christophe Lyon <christophe.lyon@linaro.org>
> >
> > gcc/testsuite/
> > * gcc.target/arm/mve/intrinsics/vst1q_f16.c: Remove 'return'.
> > * gcc.target/arm/mve/intrinsics/vst1q_f32.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vst1q_s16.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vst1q_s32.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vst1q_s8.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vst1q_u16.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vst1q_u32.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vst1q_u8.c: Likewise.
> > ---
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c | 4 ++--
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c | 4 ++--
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c | 4 ++--
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c | 4 ++--
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c | 4 ++--
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c | 4 ++--
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c | 4 ++--
> > gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c | 4 ++--
> > 8 files changed, 16 insertions(+), 16 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> > index 1fa02f00f53..e4b40604d54 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> > @@ -18,7 +18,7 @@ extern "C" {
> > void
> > foo (float16_t *base, float16x8_t value)
> > {
> > - return vst1q_f16 (base, value);
> > + vst1q_f16 (base, value);
> > }
> >
> >
> > @@ -31,7 +31,7 @@ foo (float16_t *base, float16x8_t value)
> > void
> > foo1 (float16_t *base, float16x8_t value)
> > {
> > - return vst1q (base, value);
> > + vst1q (base, value);
> > }
> >
> > #ifdef __cplusplus
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
> > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
> > index 67cc3ae3b47..8f42323c603 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
> > @@ -18,7 +18,7 @@ extern "C" {
> > void
> > foo (float32_t *base, float32x4_t value)
> > {
> > - return vst1q_f32 (base, value);
> > + vst1q_f32 (base, value);
> > }
> >
> >
> > @@ -31,7 +31,7 @@ foo (float32_t *base, float32x4_t value)
> > void
> > foo1 (float32_t *base, float32x4_t value)
> > {
> > - return vst1q (base, value);
> > + vst1q (base, value);
> > }
> >
> > #ifdef __cplusplus
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> > index 052959b2083..891ac4155d9 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> > @@ -18,7 +18,7 @@ extern "C" {
> > void
> > foo (int16_t *base, int16x8_t value)
> > {
> > - return vst1q_s16 (base, value);
> > + vst1q_s16 (base, value);
> > }
> >
> >
> > @@ -31,7 +31,7 @@ foo (int16_t *base, int16x8_t value)
> > void
> > foo1 (int16_t *base, int16x8_t value)
> > {
> > - return vst1q (base, value);
> > + vst1q (base, value);
> > }
> >
> > #ifdef __cplusplus
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
> > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
> > index 444ad07f4ef..a28d1eb98db 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
> > @@ -18,7 +18,7 @@ extern "C" {
> > void
> > foo (int32_t *base, int32x4_t value)
> > {
> > - return vst1q_s32 (base, value);
> > + vst1q_s32 (base, value);
> > }
> >
> >
> > @@ -31,7 +31,7 @@ foo (int32_t *base, int32x4_t value)
> > void
> > foo1 (int32_t *base, int32x4_t value)
> > {
> > - return vst1q (base, value);
> > + vst1q (base, value);
> > }
> >
> > #ifdef __cplusplus
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> > index 684ff0aca5b..81c141a63e0 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> > @@ -18,7 +18,7 @@ extern "C" {
> > void
> > foo (int8_t *base, int8x16_t value)
> > {
> > - return vst1q_s8 (base, value);
> > + vst1q_s8 (base, value);
> > }
> >
> >
> > @@ -31,7 +31,7 @@ foo (int8_t *base, int8x16_t value)
> > void
> > foo1 (int8_t *base, int8x16_t value)
> > {
> > - return vst1q (base, value);
> > + vst1q (base, value);
> > }
> >
> > #ifdef __cplusplus
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> > index 1fea2de1e76..b8ce7fbe6ee 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> > @@ -18,7 +18,7 @@ extern "C" {
> > void
> > foo (uint16_t *base, uint16x8_t value)
> > {
> > - return vst1q_u16 (base, value);
> > + vst1q_u16 (base, value);
> > }
> >
> >
> > @@ -31,7 +31,7 @@ foo (uint16_t *base, uint16x8_t value)
> > void
> > foo1 (uint16_t *base, uint16x8_t value)
> > {
> > - return vst1q (base, value);
> > + vst1q (base, value);
> > }
> >
> > #ifdef __cplusplus
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
> > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
> > index 64c43c59d47..1dbb55538a9 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
> > @@ -18,7 +18,7 @@ extern "C" {
> > void
> > foo (uint32_t *base, uint32x4_t value)
> > {
> > - return vst1q_u32 (base, value);
> > + vst1q_u32 (base, value);
> > }
> >
> >
> > @@ -31,7 +31,7 @@ foo (uint32_t *base, uint32x4_t value)
> > void
> > foo1 (uint32_t *base, uint32x4_t value)
> > {
> > - return vst1q (base, value);
> > + vst1q (base, value);
> > }
> >
> > #ifdef __cplusplus
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
> > b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
> > index 5517611bba6..ab22be81647 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
> > @@ -18,7 +18,7 @@ extern "C" {
> > void
> > foo (uint8_t *base, uint8x16_t value)
> > {
> > - return vst1q_u8 (base, value);
> > + vst1q_u8 (base, value);
> > }
> >
> >
> > @@ -31,7 +31,7 @@ foo (uint8_t *base, uint8x16_t value)
> > void
> > foo1 (uint8_t *base, uint8x16_t value)
> > {
> > - return vst1q (base, value);
> > + vst1q (base, value);
> > }
> >
> > #ifdef __cplusplus
> > --
> > 2.34.1
>
* RE: [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types
2023-11-16 15:26 [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Christophe Lyon
` (4 preceding siblings ...)
2023-11-16 15:26 ` [PATCH 6/6] arm: [MVE intrinsics] rework vld1q vst1q Christophe Lyon
@ 2023-11-16 16:46 ` Kyrylo Tkachov
5 siblings, 0 replies; 15+ messages in thread
From: Kyrylo Tkachov @ 2023-11-16 16:46 UTC (permalink / raw)
To: Christophe Lyon, gcc-patches, Richard Sandiford, Richard Earnshaw
> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@linaro.org>
> Sent: Thursday, November 16, 2023 3:26 PM
> To: gcc-patches@gcc.gnu.org; Richard Sandiford
> <Richard.Sandiford@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: Christophe Lyon <christophe.lyon@linaro.org>
> Subject: [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types
>
> So far we define arm_simd_types and scalar_types using type
> definitions like intSI_type_node, etc...
>
> This is causing problems with later patches which re-implement
> load/store MVE intrinsics, leading to error messages such as:
> error: passing argument 1 of 'vst1q_s32' from incompatible pointer type
> note: expected 'int *' but argument is of type 'int32_t *' {aka 'long int *'}
>
> This patch uses get_typenode_from_name (INT32_TYPE) instead, which
> defines the types as appropriate for the target/C library.
Ok.
Thanks,
Kyrill
>
> 2023-11-16 Christophe Lyon <christophe.lyon@linaro.org>
>
> gcc/
> * config/arm/arm-builtins.cc (arm_init_simd_builtin_types): Fix
> initialization of arm_simd_types[].eltype.
> * config/arm/arm-mve-builtins.def (DEF_MVE_TYPE): Fix scalar
> types.
> ---
> gcc/config/arm/arm-builtins.cc | 28 ++++++++++++++--------------
> gcc/config/arm/arm-mve-builtins.def | 16 ++++++++--------
> 2 files changed, 22 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
> index fca7dcaf565..dd9c5815c45 100644
> --- a/gcc/config/arm/arm-builtins.cc
> +++ b/gcc/config/arm/arm-builtins.cc
> @@ -1580,20 +1580,20 @@ arm_init_simd_builtin_types (void)
> TYPE_STRING_FLAG (arm_simd_polyHI_type_node) = false;
> }
> /* Init all the element types built by the front-end. */
> - arm_simd_types[Int8x8_t].eltype = intQI_type_node;
> - arm_simd_types[Int8x16_t].eltype = intQI_type_node;
> - arm_simd_types[Int16x4_t].eltype = intHI_type_node;
> - arm_simd_types[Int16x8_t].eltype = intHI_type_node;
> - arm_simd_types[Int32x2_t].eltype = intSI_type_node;
> - arm_simd_types[Int32x4_t].eltype = intSI_type_node;
> - arm_simd_types[Int64x2_t].eltype = intDI_type_node;
> - arm_simd_types[Uint8x8_t].eltype = unsigned_intQI_type_node;
> - arm_simd_types[Uint8x16_t].eltype = unsigned_intQI_type_node;
> - arm_simd_types[Uint16x4_t].eltype = unsigned_intHI_type_node;
> - arm_simd_types[Uint16x8_t].eltype = unsigned_intHI_type_node;
> - arm_simd_types[Uint32x2_t].eltype = unsigned_intSI_type_node;
> - arm_simd_types[Uint32x4_t].eltype = unsigned_intSI_type_node;
> - arm_simd_types[Uint64x2_t].eltype = unsigned_intDI_type_node;
> + arm_simd_types[Int8x8_t].eltype = get_typenode_from_name (INT8_TYPE);
> + arm_simd_types[Int8x16_t].eltype = get_typenode_from_name (INT8_TYPE);
> + arm_simd_types[Int16x4_t].eltype = get_typenode_from_name (INT16_TYPE);
> + arm_simd_types[Int16x8_t].eltype = get_typenode_from_name (INT16_TYPE);
> + arm_simd_types[Int32x2_t].eltype = get_typenode_from_name (INT32_TYPE);
> + arm_simd_types[Int32x4_t].eltype = get_typenode_from_name (INT32_TYPE);
> + arm_simd_types[Int64x2_t].eltype = get_typenode_from_name (INT64_TYPE);
> + arm_simd_types[Uint8x8_t].eltype = get_typenode_from_name (UINT8_TYPE);
> + arm_simd_types[Uint8x16_t].eltype = get_typenode_from_name (UINT8_TYPE);
> + arm_simd_types[Uint16x4_t].eltype = get_typenode_from_name (UINT16_TYPE);
> + arm_simd_types[Uint16x8_t].eltype = get_typenode_from_name (UINT16_TYPE);
> + arm_simd_types[Uint32x2_t].eltype = get_typenode_from_name (UINT32_TYPE);
> + arm_simd_types[Uint32x4_t].eltype = get_typenode_from_name (UINT32_TYPE);
> + arm_simd_types[Uint64x2_t].eltype = get_typenode_from_name (UINT64_TYPE);
>
> /* Note: poly64x2_t is defined in arm_neon.h, to ensure it gets default
> mangling. */
> diff --git a/gcc/config/arm/arm-mve-builtins.def b/gcc/config/arm/arm-mve-builtins.def
> index e2cf1baf370..a901d8231e9 100644
> --- a/gcc/config/arm/arm-mve-builtins.def
> +++ b/gcc/config/arm/arm-mve-builtins.def
> @@ -39,14 +39,14 @@ DEF_MVE_MODE (r, none, none, none)
>
> #define REQUIRES_FLOAT false
> DEF_MVE_TYPE (mve_pred16_t, boolean_type_node)
> -DEF_MVE_TYPE (uint8x16_t, unsigned_intQI_type_node)
> -DEF_MVE_TYPE (uint16x8_t, unsigned_intHI_type_node)
> -DEF_MVE_TYPE (uint32x4_t, unsigned_intSI_type_node)
> -DEF_MVE_TYPE (uint64x2_t, unsigned_intDI_type_node)
> -DEF_MVE_TYPE (int8x16_t, intQI_type_node)
> -DEF_MVE_TYPE (int16x8_t, intHI_type_node)
> -DEF_MVE_TYPE (int32x4_t, intSI_type_node)
> -DEF_MVE_TYPE (int64x2_t, intDI_type_node)
> +DEF_MVE_TYPE (uint8x16_t, get_typenode_from_name (UINT8_TYPE))
> +DEF_MVE_TYPE (uint16x8_t, get_typenode_from_name (UINT16_TYPE))
> +DEF_MVE_TYPE (uint32x4_t, get_typenode_from_name (UINT32_TYPE))
> +DEF_MVE_TYPE (uint64x2_t, get_typenode_from_name (UINT64_TYPE))
> +DEF_MVE_TYPE (int8x16_t, get_typenode_from_name (INT8_TYPE))
> +DEF_MVE_TYPE (int16x8_t, get_typenode_from_name (INT16_TYPE))
> +DEF_MVE_TYPE (int32x4_t, get_typenode_from_name (INT32_TYPE))
> +DEF_MVE_TYPE (int64x2_t, get_typenode_from_name (INT64_TYPE))
> #undef REQUIRES_FLOAT
>
> #define REQUIRES_FLOAT true
> --
> 2.34.1
* RE: [PATCH 2/6] arm: [MVE intrinsics] Add support for void and load/store pointers as argument types.
2023-11-16 15:26 ` [PATCH 2/6] arm: [MVE intrinsics] Add support for void and load/store pointers as argument types Christophe Lyon
@ 2023-11-16 16:47 ` Kyrylo Tkachov
0 siblings, 0 replies; 15+ messages in thread
From: Kyrylo Tkachov @ 2023-11-16 16:47 UTC (permalink / raw)
To: Christophe Lyon, gcc-patches, Richard Sandiford, Richard Earnshaw
> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@linaro.org>
> Sent: Thursday, November 16, 2023 3:26 PM
> To: gcc-patches@gcc.gnu.org; Richard Sandiford
> <Richard.Sandiford@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: Christophe Lyon <christophe.lyon@linaro.org>
> Subject: [PATCH 2/6] arm: [MVE intrinsics] Add support for void and
> load/store pointers as argument types.
>
> This patch adds support for '_', 'al' and 'as' for void, load pointer
> and store pointer argument/return value types in intrinsic signatures.
>
> It also adds a new memory_scalar_type() helper to function_instance,
> which is used by 'al' and 'as'.
Ok.
Thanks,
Kyrill
>
> 2023-11-16 Christophe Lyon <christophe.lyon@linaro.org>
>
> gcc/
> * config/arm/arm-mve-builtins-shapes.cc (build_const_pointer):
> New.
> (parse_type): Add support for '_', 'al' and 'as'.
> * config/arm/arm-mve-builtins.h (function_instance): Add
> memory_scalar_type.
> (function_base): Likewise.
> ---
> gcc/config/arm/arm-mve-builtins-shapes.cc | 25 +++++++++++++++++++++++
> gcc/config/arm/arm-mve-builtins.h | 17 +++++++++++++++
> 2 files changed, 42 insertions(+)
>
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
> index 23eb9d0e69b..ce87ebcef30 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -39,6 +39,13 @@
>
> namespace arm_mve {
>
> +/* Return a representation of "const T *". */
> +static tree
> +build_const_pointer (tree t)
> +{
> + return build_pointer_type (build_qualified_type (t, TYPE_QUAL_CONST));
> +}
> +
> /* If INSTANCE has a predicate, add it to the list of argument types
> in ARGUMENT_TYPES. RETURN_TYPE is the type returned by the
> function. */
> @@ -140,6 +147,9 @@ parse_element_type (const function_instance &instance, const char *&format)
> /* Read and return a type from FORMAT for function INSTANCE. Advance
> FORMAT beyond the type string. The format is:
>
> + _ - void
> + al - array pointer for loads
> + as - array pointer for stores
> p - predicates with type mve_pred16_t
> s<elt> - a scalar type with the given element suffix
> t<elt> - a vector or tuple type with given element suffix [*1]
> @@ -156,6 +166,21 @@ parse_type (const function_instance &instance, const char *&format)
> {
> int ch = *format++;
>
> +
> + if (ch == '_')
> + return void_type_node;
> +
> + if (ch == 'a')
> + {
> + ch = *format++;
> + if (ch == 'l')
> + return build_const_pointer (instance.memory_scalar_type ());
> + if (ch == 's') {
> + return build_pointer_type (instance.memory_scalar_type ());
> + }
> + gcc_unreachable ();
> + }
> +
> if (ch == 'p')
> return get_mve_pred16_t ();
>
> diff --git a/gcc/config/arm/arm-mve-builtins.h b/gcc/config/arm/arm-mve-builtins.h
> index 37b8223dfb2..4fd230fe4c7 100644
> --- a/gcc/config/arm/arm-mve-builtins.h
> +++ b/gcc/config/arm/arm-mve-builtins.h
> @@ -277,6 +277,7 @@ public:
> bool could_trap_p () const;
>
> unsigned int vectors_per_tuple () const;
> + tree memory_scalar_type () const;
>
> const mode_suffix_info &mode_suffix () const;
>
> @@ -519,6 +520,14 @@ public:
> of vectors in the tuples, otherwise return 1. */
> virtual unsigned int vectors_per_tuple () const { return 1; }
>
> + /* If the function addresses memory, return the type of a single
> + scalar memory element. */
> + virtual tree
> + memory_scalar_type (const function_instance &) const
> + {
> + gcc_unreachable ();
> + }
> +
> /* Try to fold the given gimple call. Return the new gimple statement
> on success, otherwise return null. */
> virtual gimple *fold (gimple_folder &) const { return NULL; }
> @@ -644,6 +653,14 @@ function_instance::vectors_per_tuple () const
> return base->vectors_per_tuple ();
> }
>
> +/* If the function addresses memory, return the type of a single
> + scalar memory element. */
> +inline tree
> +function_instance::memory_scalar_type () const
> +{
> + return base->memory_scalar_type (*this);
> +}
> +
> /* Return information about the function's mode suffix. */
> inline const mode_suffix_info &
> function_instance::mode_suffix () const
> --
> 2.34.1
* RE: [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores
2023-11-16 15:26 ` [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores Christophe Lyon
@ 2023-11-16 16:48 ` Kyrylo Tkachov
2023-11-23 13:29 ` Jan-Benedict Glaw
1 sibling, 0 replies; 15+ messages in thread
From: Kyrylo Tkachov @ 2023-11-16 16:48 UTC (permalink / raw)
To: Christophe Lyon, gcc-patches, Richard Sandiford, Richard Earnshaw
> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@linaro.org>
> Sent: Thursday, November 16, 2023 3:26 PM
> To: gcc-patches@gcc.gnu.org; Richard Sandiford
> <Richard.Sandiford@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: Christophe Lyon <christophe.lyon@linaro.org>
> Subject: [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads
> and stores
>
> This patch adds base support for load/store intrinsics to the
> framework, starting with loads and stores for contiguous memory
> elements, without extension nor truncation.
>
> Compared to the aarch64/SVE implementation, there's no support for
> gather/scatter loads/stores yet. This will be added later as needed.
>
Ok.
Thanks,
Kyrill
> 2023-11-16 Christophe Lyon <christophe.lyon@linaro.org>
>
> gcc/
> * config/arm/arm-mve-builtins-functions.h (multi_vector_function)
> (full_width_access): New classes.
> * config/arm/arm-mve-builtins.cc
> (find_type_suffix_for_scalar_type, infer_pointer_type)
> (require_pointer_type, get_contiguous_base, add_mem_operand)
> (add_fixed_operand, use_contiguous_load_insn)
> (use_contiguous_store_insn): New.
> * config/arm/arm-mve-builtins.h (memory_vector_mode)
> (infer_pointer_type, require_pointer_type, get_contiguous_base)
> (add_mem_operand)
> (add_fixed_operand, use_contiguous_load_insn)
> (use_contiguous_store_insn): New.
> ---
> gcc/config/arm/arm-mve-builtins-functions.h | 56 ++++++++++
> gcc/config/arm/arm-mve-builtins.cc | 116 ++++++++++++++++++++
> gcc/config/arm/arm-mve-builtins.h | 28 ++++-
> 3 files changed, 199 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/arm/arm-mve-builtins-functions.h
> b/gcc/config/arm/arm-mve-builtins-functions.h
> index eba1f071af0..6d234a2dd7c 100644
> --- a/gcc/config/arm/arm-mve-builtins-functions.h
> +++ b/gcc/config/arm/arm-mve-builtins-functions.h
> @@ -966,6 +966,62 @@ public:
> }
> };
>
> +/* A function_base that sometimes or always operates on tuples of
> + vectors. */
> +class multi_vector_function : public function_base
> +{
> +public:
> + CONSTEXPR multi_vector_function (unsigned int vectors_per_tuple)
> + : m_vectors_per_tuple (vectors_per_tuple) {}
> +
> + unsigned int
> + vectors_per_tuple () const override
> + {
> + return m_vectors_per_tuple;
> + }
> +
> + /* The number of vectors in a tuple, or 1 if the function only operates
> + on single vectors. */
> + unsigned int m_vectors_per_tuple;
> +};
> +
> +/* A function_base that loads or stores contiguous memory elements
> + without extending or truncating them. */
> +class full_width_access : public multi_vector_function
> +{
> +public:
> + CONSTEXPR full_width_access (unsigned int vectors_per_tuple = 1)
> + : multi_vector_function (vectors_per_tuple) {}
> +
> + tree
> + memory_scalar_type (const function_instance &fi) const override
> + {
> + return fi.scalar_type (0);
> + }
> +
> + machine_mode
> + memory_vector_mode (const function_instance &fi) const override
> + {
> + machine_mode mode = fi.vector_mode (0);
> + /* Vectors of floating-point are managed in memory as vectors of
> + integers. */
> + switch (mode)
> + {
> + case E_V4SFmode:
> + mode = E_V4SImode;
> + break;
> + case E_V8HFmode:
> + mode = E_V8HImode;
> + break;
> + }
> +
> + if (m_vectors_per_tuple != 1)
> + mode = targetm.array_mode (mode, m_vectors_per_tuple).require ();
> +
> + return mode;
> + }
> +};
> +
> } /* end namespace arm_mve */
>
> /* Declare the global function base NAME, creating it from an instance
> diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
> index 02dc8fa9b73..a265cb05553 100644
> --- a/gcc/config/arm/arm-mve-builtins.cc
> +++ b/gcc/config/arm/arm-mve-builtins.cc
> @@ -36,6 +36,7 @@
> #include "fold-const.h"
> #include "gimple.h"
> #include "gimple-iterator.h"
> +#include "explow.h"
> #include "emit-rtl.h"
> #include "langhooks.h"
> #include "stringpool.h"
> @@ -529,6 +530,22 @@ matches_type_p (const_tree model_type, const_tree candidate)
> && TYPE_MAIN_VARIANT (model_type) == TYPE_MAIN_VARIANT (candidate));
> }
>
> +/* If TYPE is a valid MVE element type, return the corresponding type
> + suffix, otherwise return NUM_TYPE_SUFFIXES. */
> +static type_suffix_index
> +find_type_suffix_for_scalar_type (const_tree type)
> +{
> + /* A linear search should be OK here, since the code isn't hot and
> + the number of types is only small. */
> + for (unsigned int suffix_i = 0; suffix_i < NUM_TYPE_SUFFIXES; ++suffix_i)
> + {
> + vector_type_index vector_i = type_suffixes[suffix_i].vector_type;
> + if (matches_type_p (scalar_types[vector_i], type))
> + return type_suffix_index (suffix_i);
> + }
> + return NUM_TYPE_SUFFIXES;
> +}
> +
> /* Report an error against LOCATION that the user has tried to use
> a floating point function when the mve.fp extension is disabled. */
> static void
> @@ -1125,6 +1142,37 @@ function_resolver::resolve_to (mode_suffix_index mode,
> return res;
> }
>
> +/* Require argument ARGNO to be a pointer to a scalar type that has a
> + corresponding type suffix. Return that type suffix on success,
> + otherwise report an error and return NUM_TYPE_SUFFIXES. */
> +type_suffix_index
> +function_resolver::infer_pointer_type (unsigned int argno)
> +{
> + tree actual = get_argument_type (argno);
> + if (actual == error_mark_node)
> + return NUM_TYPE_SUFFIXES;
> +
> + if (TREE_CODE (actual) != POINTER_TYPE)
> + {
> + error_at (location, "passing %qT to argument %d of %qE, which"
> + " expects a pointer type", actual, argno + 1, fndecl);
> + return NUM_TYPE_SUFFIXES;
> + }
> +
> + tree target = TREE_TYPE (actual);
> + type_suffix_index type = find_type_suffix_for_scalar_type (target);
> + if (type == NUM_TYPE_SUFFIXES)
> + {
> + error_at (location, "passing %qT to argument %d of %qE, but %qT is not"
> + " a valid MVE element type", actual, argno + 1, fndecl,
> + build_qualified_type (target, 0));
> + return NUM_TYPE_SUFFIXES;
> + }
> + unsigned int bits = type_suffixes[type].element_bits;
> +
> + return type;
> +}
> +
> /* Require argument ARGNO to be a single vector or a tuple of NUM_VECTORS
> vectors; NUM_VECTORS is 1 for the former. Return the associated type
> suffix on success, using TYPE_SUFFIX_b for predicates. Report an error
> @@ -1498,6 +1546,22 @@ function_resolver::require_scalar_type (unsigned int argno,
> return true;
> }
>
> +/* Require argument ARGNO to be some form of pointer, without being specific
> + about its target type. Return true if the argument has the right form,
> + otherwise report an appropriate error. */
> +bool
> +function_resolver::require_pointer_type (unsigned int argno)
> +{
> + if (!scalar_argument_p (argno))
> + {
> + error_at (location, "passing %qT to argument %d of %qE, which"
> + " expects a scalar pointer", get_argument_type (argno),
> + argno + 1, fndecl);
> + return false;
> + }
> + return true;
> +}
> +
> /* Require the function to have exactly EXPECTED arguments. Return true
> if it does, otherwise report an appropriate error. */
> bool
> @@ -1955,6 +2019,14 @@ function_expander::direct_optab_handler (optab op, unsigned int suffix_i)
> return ::direct_optab_handler (op, vector_mode (suffix_i));
> }
>
> +/* Return the base address for a contiguous load or store
> + function. */
> +rtx
> +function_expander::get_contiguous_base ()
> +{
> + return args[0];
> +}
> +
> /* For a function that does the equivalent of:
>
> OUTPUT = COND ? FN (INPUTS) : FALLBACK;
> @@ -2043,6 +2115,26 @@ function_expander::add_integer_operand (HOST_WIDE_INT x)
> create_integer_operand (&m_ops.last (), x);
> }
>
> +/* Add a memory operand with mode MODE and address ADDR. */
> +void
> +function_expander::add_mem_operand (machine_mode mode, rtx addr)
> +{
> + gcc_assert (VECTOR_MODE_P (mode));
> + rtx mem = gen_rtx_MEM (mode, memory_address (mode, addr));
> + /* The memory is only guaranteed to be element-aligned. */
> + set_mem_align (mem, GET_MODE_ALIGNMENT (GET_MODE_INNER (mode)));
> + add_fixed_operand (mem);
> +}
> +
> +/* Add an operand that must be X. The only way of legitimizing an
> + invalid X is to reload the address of a MEM. */
> +void
> +function_expander::add_fixed_operand (rtx x)
> +{
> + m_ops.safe_grow (m_ops.length () + 1, true);
> + create_fixed_operand (&m_ops.last (), x);
> +}
> +
> /* Generate instruction ICODE, given that its operands have already
> been added to M_OPS. Return the value of the first operand. */
> rtx
> @@ -2137,6 +2229,30 @@ function_expander::use_cond_insn (insn_code icode, unsigned int merge_argno)
> return generate_insn (icode);
> }
>
> +/* Implement the call using instruction ICODE, which loads memory operand 1
> + into register operand 0. */
> +rtx
> +function_expander::use_contiguous_load_insn (insn_code icode)
> +{
> + machine_mode mem_mode = memory_vector_mode ();
> +
> + add_output_operand (icode);
> + add_mem_operand (mem_mode, get_contiguous_base ());
> + return generate_insn (icode);
> +}
> +
> +/* Implement the call using instruction ICODE, which stores register operand 1
> + into memory operand 0. */
> +rtx
> +function_expander::use_contiguous_store_insn (insn_code icode)
> +{
> + machine_mode mem_mode = memory_vector_mode ();
> +
> + add_mem_operand (mem_mode, get_contiguous_base ());
> + add_input_operand (icode, args[1]);
> + return generate_insn (icode);
> +}
> +
> /* Implement the call using a normal unpredicated optab for PRED_none.
>
> <optab> corresponds to:
> diff --git a/gcc/config/arm/arm-mve-builtins.h b/gcc/config/arm/arm-mve-builtins.h
> index 4fd230fe4c7..9c219fa8db4 100644
> --- a/gcc/config/arm/arm-mve-builtins.h
> +++ b/gcc/config/arm/arm-mve-builtins.h
> @@ -278,6 +278,7 @@ public:
>
> unsigned int vectors_per_tuple () const;
> tree memory_scalar_type () const;
> + machine_mode memory_vector_mode () const;
>
> const mode_suffix_info &mode_suffix () const;
>
> @@ -383,6 +384,7 @@ public:
> type_suffix_index = NUM_TYPE_SUFFIXES,
> type_suffix_index = NUM_TYPE_SUFFIXES);
>
> + type_suffix_index infer_pointer_type (unsigned int);
> type_suffix_index infer_vector_or_tuple_type (unsigned int, unsigned int);
> type_suffix_index infer_vector_type (unsigned int);
>
> @@ -394,8 +396,9 @@ public:
> type_suffix_index,
> type_class_index = SAME_TYPE_CLASS,
> unsigned int = SAME_SIZE);
> - bool require_integer_immediate (unsigned int);
> bool require_scalar_type (unsigned int, const char *);
> + bool require_pointer_type (unsigned int);
> + bool require_integer_immediate (unsigned int);
> bool require_derived_scalar_type (unsigned int, type_class_index,
> unsigned int = SAME_SIZE);
>
> @@ -476,18 +479,23 @@ public:
>
> insn_code direct_optab_handler (optab, unsigned int = 0);
>
> + rtx get_contiguous_base ();
> rtx get_fallback_value (machine_mode, unsigned int, unsigned int &);
> rtx get_reg_target ();
>
> void add_output_operand (insn_code);
> void add_input_operand (insn_code, rtx);
> void add_integer_operand (HOST_WIDE_INT);
> + void add_mem_operand (machine_mode, rtx);
> + void add_fixed_operand (rtx);
> rtx generate_insn (insn_code);
>
> rtx use_exact_insn (insn_code);
> rtx use_unpred_insn (insn_code);
> rtx use_pred_x_insn (insn_code);
> rtx use_cond_insn (insn_code, unsigned int = DEFAULT_MERGE_ARGNO);
> + rtx use_contiguous_load_insn (insn_code);
> + rtx use_contiguous_store_insn (insn_code);
>
> rtx map_to_rtx_codes (rtx_code, rtx_code, rtx_code);
>
> @@ -528,6 +536,15 @@ public:
> gcc_unreachable ();
> }
>
> + /* If the function addresses memory, return a vector mode whose
> + GET_MODE_NUNITS is the number of elements addressed and whose
> + GET_MODE_INNER is the mode of a single scalar memory element. */
> + virtual machine_mode
> + memory_vector_mode (const function_instance &) const
> + {
> + gcc_unreachable ();
> + }
> +
> /* Try to fold the given gimple call. Return the new gimple statement
> on success, otherwise return null. */
> virtual gimple *fold (gimple_folder &) const { return NULL; }
> @@ -661,6 +678,15 @@ function_instance::memory_scalar_type () const
> return base->memory_scalar_type (*this);
> }
>
> +/* If the function addresses memory, return a vector mode whose
> + GET_MODE_NUNITS is the number of elements addressed and whose
> + GET_MODE_INNER is the mode of a single scalar memory element. */
> +inline machine_mode
> +function_instance::memory_vector_mode () const
> +{
> + return base->memory_vector_mode (*this);
> +}
> +
> /* Return information about the function's mode suffix. */
> inline const mode_suffix_info &
> function_instance::mode_suffix () const
> --
> 2.34.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [PATCH 4/6] arm: [MVE intrinsics] add load and store shapes
2023-11-16 15:26 ` [PATCH 4/6] arm: [MVE intrinsics] add load and store shapes Christophe Lyon
@ 2023-11-16 16:49 ` Kyrylo Tkachov
0 siblings, 0 replies; 15+ messages in thread
From: Kyrylo Tkachov @ 2023-11-16 16:49 UTC (permalink / raw)
To: Christophe Lyon, gcc-patches, Richard Sandiford, Richard Earnshaw
> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@linaro.org>
> Sent: Thursday, November 16, 2023 3:26 PM
> To: gcc-patches@gcc.gnu.org; Richard Sandiford
> <Richard.Sandiford@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: Christophe Lyon <christophe.lyon@linaro.org>
> Subject: [PATCH 4/6] arm: [MVE intrinsics] add load and store shapes
>
> This patch adds the load and store shapes descriptions.
Ok.
Thanks,
Kyrill
>
> 2023-11-16 Christophe Lyon <christophe.lyon@linaro.org>
>
> gcc/
> * config/arm/arm-mve-builtins-shapes.cc (load, store): New.
> * config/arm/arm-mve-builtins-shapes.h (load, store): New.
> ---
> gcc/config/arm/arm-mve-builtins-shapes.cc | 67 +++++++++++++++++++++++
> gcc/config/arm/arm-mve-builtins-shapes.h | 2 +
> 2 files changed, 69 insertions(+)
>
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
> index ce87ebcef30..fe983e7c736 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -1428,6 +1428,38 @@ struct inherent_def : public nonoverloaded_base
> };
> SHAPE (inherent)
>
> +/* sv<t0>_t svfoo[_t0](const <t0>_t *)
> +
> + Example: vld1q.
> + int8x16_t [__arm_]vld1q[_s8](int8_t const *base)
> + int8x16_t [__arm_]vld1q_z[_s8](int8_t const *base, mve_pred16_t p) */
> +struct load_def : public overloaded_base<0>
> +{
> + void
> + build (function_builder &b, const function_group_info &group,
> + bool preserve_user_namespace) const override
> + {
> + b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
> + build_all (b, "t0,al", group, MODE_none, preserve_user_namespace);
> + }
> +
> + /* Resolve a call based purely on a pointer argument. */
> + tree
> + resolve (function_resolver &r) const override
> + {
> + gcc_assert (r.mode_suffix_id == MODE_none);
> +
> + unsigned int i, nargs;
> + type_suffix_index type;
> + if (!r.check_gp_argument (1, i, nargs)
> + || (type = r.infer_pointer_type (i)) == NUM_TYPE_SUFFIXES)
> + return error_mark_node;
> +
> + return r.resolve_to (r.mode_suffix_id, type);
> + }
> +};
> +SHAPE (load)
> +
> /* <T0>_t vfoo[_t0](<T0>_t)
> <T0>_t vfoo_n_t0(<sT0>_t)
>
> @@ -1477,6 +1509,41 @@ struct mvn_def : public overloaded_base<0>
> };
> SHAPE (mvn)
>
> +/* void vfoo[_t0](<X>_t *, v<t0>[xN]_t)
> +
> + where <X> might be tied to <t0> (for non-truncating stores) or might
> + depend on the function base name (for truncating stores).
> +
> + Example: vst1q.
> + void [__arm_]vst1q[_s8](int8_t *base, int8x16_t value)
> + void [__arm_]vst1q_p[_s8](int8_t *base, int8x16_t value, mve_pred16_t p) */
> +struct store_def : public overloaded_base<0>
> +{
> + void
> + build (function_builder &b, const function_group_info &group,
> + bool preserve_user_namespace) const override
> + {
> + b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
> + build_all (b, "_,as,v0", group, MODE_none, preserve_user_namespace);
> + }
> +
> + tree
> + resolve (function_resolver &r) const override
> + {
> + gcc_assert (r.mode_suffix_id == MODE_none);
> +
> + unsigned int i, nargs;
> + type_suffix_index type;
> + if (!r.check_gp_argument (2, i, nargs)
> + || !r.require_pointer_type (0)
> + || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
> + return error_mark_node;
> +
> + return r.resolve_to (r.mode_suffix_id, type);
> + }
> +};
> +SHAPE (store)
> +
> /* <T0>_t vfoo[_t0](<T0>_t, <T0>_t, <T0>_t)
>
> i.e. the standard shape for ternary operations that operate on
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
> index a93245321c9..aa9309dec7e 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -61,7 +61,9 @@ namespace arm_mve
> extern const function_shape *const cmp;
> extern const function_shape *const create;
> extern const function_shape *const inherent;
> + extern const function_shape *const load;
> extern const function_shape *const mvn;
> + extern const function_shape *const store;
> extern const function_shape *const ternary;
> extern const function_shape *const ternary_lshift;
> extern const function_shape *const ternary_n;
> --
> 2.34.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [PATCH 6/6] arm: [MVE intrinsics] rework vldq1 vst1q
2023-11-16 15:26 ` [PATCH 6/6] arm: [MVE intrinsics] rework vldq1 vst1q Christophe Lyon
@ 2023-11-16 16:49 ` Kyrylo Tkachov
0 siblings, 0 replies; 15+ messages in thread
From: Kyrylo Tkachov @ 2023-11-16 16:49 UTC (permalink / raw)
To: Christophe Lyon, gcc-patches, Richard Sandiford, Richard Earnshaw
> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@linaro.org>
> Sent: Thursday, November 16, 2023 3:26 PM
> To: gcc-patches@gcc.gnu.org; Richard Sandiford
> <Richard.Sandiford@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: Christophe Lyon <christophe.lyon@linaro.org>
> Subject: [PATCH 6/6] arm: [MVE intrinsics] rework vldq1 vst1q
>
> Implement vld1q, vst1q using the new MVE builtins framework.
Ok. Nice to see more MVE intrinsics getting the good treatment.
Thanks,
Kyrill
>
> 2023-11-16 Christophe Lyon <christophe.lyon@linaro.org>
>
> gcc/
> * config/arm/arm-mve-builtins-base.cc (vld1_impl, vld1q)
> (vst1_impl, vst1q): New.
> * config/arm/arm-mve-builtins-base.def (vld1q, vst1q): New.
> * config/arm/arm-mve-builtins-base.h (vld1q, vst1q): New.
> * config/arm/arm_mve.h
> (vld1q): Delete.
> (vst1q): Delete.
> (vld1q_s8): Delete.
> (vld1q_s32): Delete.
> (vld1q_s16): Delete.
> (vld1q_u8): Delete.
> (vld1q_u32): Delete.
> (vld1q_u16): Delete.
> (vld1q_f32): Delete.
> (vld1q_f16): Delete.
> (vst1q_f32): Delete.
> (vst1q_f16): Delete.
> (vst1q_s8): Delete.
> (vst1q_s32): Delete.
> (vst1q_s16): Delete.
> (vst1q_u8): Delete.
> (vst1q_u32): Delete.
> (vst1q_u16): Delete.
> (__arm_vld1q_s8): Delete.
> (__arm_vld1q_s32): Delete.
> (__arm_vld1q_s16): Delete.
> (__arm_vld1q_u8): Delete.
> (__arm_vld1q_u32): Delete.
> (__arm_vld1q_u16): Delete.
> (__arm_vst1q_s8): Delete.
> (__arm_vst1q_s32): Delete.
> (__arm_vst1q_s16): Delete.
> (__arm_vst1q_u8): Delete.
> (__arm_vst1q_u32): Delete.
> (__arm_vst1q_u16): Delete.
> (__arm_vld1q_f32): Delete.
> (__arm_vld1q_f16): Delete.
> (__arm_vst1q_f32): Delete.
> (__arm_vst1q_f16): Delete.
> (__arm_vld1q): Delete.
> (__arm_vst1q): Delete.
> * config/arm/mve.md (mve_vld1q_f<mode>): Rename into ...
> (@mve_vld1q_f<mode>): ... this.
> (mve_vld1q_<supf><mode>): Rename into ...
> (@mve_vld1q_<supf><mode>) ... this.
> (mve_vst1q_f<mode>): Rename into ...
> (@mve_vst1q_f<mode>): ... this.
> (mve_vst1q_<supf><mode>): Rename into ...
> (@mve_vst1q_<supf><mode>) ... this.
> ---
> gcc/config/arm/arm-mve-builtins-base.cc | 58 +++++
> gcc/config/arm/arm-mve-builtins-base.def | 4 +
> gcc/config/arm/arm-mve-builtins-base.h | 4 +-
> gcc/config/arm/arm_mve.h | 282 -----------------------
> gcc/config/arm/mve.md | 8 +-
> 5 files changed, 69 insertions(+), 287 deletions(-)
>
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
> index 5478cac8aeb..cfe1b954a29 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.cc
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -83,6 +83,62 @@ class vuninitializedq_impl : public quiet<function_base>
> }
> };
>
> +class vld1_impl : public full_width_access
> +{
> +public:
> + unsigned int
> + call_properties (const function_instance &) const override
> + {
> + return CP_READ_MEMORY;
> + }
> +
> + rtx
> + expand (function_expander &e) const override
> + {
> + insn_code icode;
> + if (e.type_suffix (0).float_p)
> + icode = code_for_mve_vld1q_f(e.vector_mode (0));
> + else
> + {
> + if (e.type_suffix (0).unsigned_p)
> + icode = code_for_mve_vld1q(VLD1Q_U,
> + e.vector_mode (0));
> + else
> + icode = code_for_mve_vld1q(VLD1Q_S,
> + e.vector_mode (0));
> + }
> + return e.use_contiguous_load_insn (icode);
> + }
> +};
> +
> +class vst1_impl : public full_width_access
> +{
> +public:
> + unsigned int
> + call_properties (const function_instance &) const override
> + {
> + return CP_WRITE_MEMORY;
> + }
> +
> + rtx
> + expand (function_expander &e) const override
> + {
> + insn_code icode;
> + if (e.type_suffix (0).float_p)
> + icode = code_for_mve_vst1q_f(e.vector_mode (0));
> + else
> + {
> + if (e.type_suffix (0).unsigned_p)
> + icode = code_for_mve_vst1q(VST1Q_U,
> + e.vector_mode (0));
> + else
> + icode = code_for_mve_vst1q(VST1Q_S,
> + e.vector_mode (0));
> + }
> + return e.use_contiguous_store_insn (icode);
> + }
> +};
> +
> } /* end anonymous namespace */
>
> namespace arm_mve {
> @@ -290,6 +346,7 @@ FUNCTION (vfmasq, unspec_mve_function_exact_insn, (-1, -1, -1, -1, -1, VFMASQ_N_
> FUNCTION (vfmsq, unspec_mve_function_exact_insn, (-1, -1, VFMSQ_F, -1, -1, -1, -1, -1, VFMSQ_M_F, -1, -1, -1))
> FUNCTION_WITH_M_N_NO_F (vhaddq, VHADDQ)
> FUNCTION_WITH_M_N_NO_F (vhsubq, VHSUBQ)
> +FUNCTION (vld1q, vld1_impl,)
> FUNCTION_PRED_P_S (vmaxavq, VMAXAVQ)
> FUNCTION_WITHOUT_N_NO_U_F (vmaxaq, VMAXAQ)
> FUNCTION_ONLY_F (vmaxnmaq, VMAXNMAQ)
> @@ -405,6 +462,7 @@ FUNCTION_ONLY_N_NO_F (vshrntq, VSHRNTQ)
> FUNCTION_ONLY_N_NO_F (vshrq, VSHRQ)
> FUNCTION_ONLY_N_NO_F (vsliq, VSLIQ)
> FUNCTION_ONLY_N_NO_F (vsriq, VSRIQ)
> +FUNCTION (vst1q, vst1_impl,)
> FUNCTION_WITH_RTX_M_N (vsubq, MINUS, VSUBQ)
> FUNCTION (vuninitializedq, vuninitializedq_impl,)
>
> diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
> index 01dfbdef8a3..16879246237 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.def
> +++ b/gcc/config/arm/arm-mve-builtins-base.def
> @@ -47,6 +47,7 @@ DEF_MVE_FUNCTION (vhaddq, binary_opt_n, all_integer, mx_or_none)
> DEF_MVE_FUNCTION (vhcaddq_rot90, binary, all_signed, mx_or_none)
> DEF_MVE_FUNCTION (vhcaddq_rot270, binary, all_signed, mx_or_none)
> DEF_MVE_FUNCTION (vhsubq, binary_opt_n, all_integer, mx_or_none)
> +DEF_MVE_FUNCTION (vld1q, load, all_integer, none)
> DEF_MVE_FUNCTION (vmaxaq, binary_maxamina, all_signed, m_or_none)
> DEF_MVE_FUNCTION (vmaxavq, binary_maxavminav, all_signed, p_or_none)
> DEF_MVE_FUNCTION (vmaxq, binary, all_integer, mx_or_none)
> @@ -150,6 +151,7 @@ DEF_MVE_FUNCTION (vshrntq, binary_rshift_narrow, integer_16_32, m_or_none)
> DEF_MVE_FUNCTION (vshrq, binary_rshift, all_integer, mx_or_none)
> DEF_MVE_FUNCTION (vsliq, ternary_lshift, all_integer, m_or_none)
> DEF_MVE_FUNCTION (vsriq, ternary_rshift, all_integer, m_or_none)
> +DEF_MVE_FUNCTION (vst1q, store, all_integer, none)
> DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_integer, mx_or_none)
> DEF_MVE_FUNCTION (vuninitializedq, inherent, all_integer_with_64, none)
> #undef REQUIRES_FLOAT
> @@ -182,6 +184,7 @@ DEF_MVE_FUNCTION (veorq, binary, all_float, mx_or_none)
> DEF_MVE_FUNCTION (vfmaq, ternary_opt_n, all_float, m_or_none)
> DEF_MVE_FUNCTION (vfmasq, ternary_n, all_float, m_or_none)
> DEF_MVE_FUNCTION (vfmsq, ternary, all_float, m_or_none)
> +DEF_MVE_FUNCTION (vld1q, load, all_float, none)
> DEF_MVE_FUNCTION (vmaxnmaq, binary, all_float, m_or_none)
> DEF_MVE_FUNCTION (vmaxnmavq, binary_maxvminv, all_float, p_or_none)
> DEF_MVE_FUNCTION (vmaxnmq, binary, all_float, mx_or_none)
> @@ -203,6 +206,7 @@ DEF_MVE_FUNCTION (vrndnq, unary, all_float, mx_or_none)
> DEF_MVE_FUNCTION (vrndpq, unary, all_float, mx_or_none)
> DEF_MVE_FUNCTION (vrndq, unary, all_float, mx_or_none)
> DEF_MVE_FUNCTION (vrndxq, unary, all_float, mx_or_none)
> +DEF_MVE_FUNCTION (vst1q, store, all_float, none)
> DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_float, mx_or_none)
> DEF_MVE_FUNCTION (vuninitializedq, inherent, all_float, none)
> #undef REQUIRES_FLOAT
> diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
> index c574c32ac53..8c7e5fe5c3e 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.h
> +++ b/gcc/config/arm/arm-mve-builtins-base.h
> @@ -63,6 +63,7 @@ extern const function_base *const vhaddq;
> extern const function_base *const vhcaddq_rot270;
> extern const function_base *const vhcaddq_rot90;
> extern const function_base *const vhsubq;
> +extern const function_base *const vld1q;
> extern const function_base *const vmaxaq;
> extern const function_base *const vmaxavq;
> extern const function_base *const vmaxnmaq;
> @@ -103,8 +104,8 @@ extern const function_base *const vmovnbq;
> extern const function_base *const vmovntq;
> extern const function_base *const vmulhq;
> extern const function_base *const vmullbq_int;
> -extern const function_base *const vmulltq_int;
> extern const function_base *const vmullbq_poly;
> +extern const function_base *const vmulltq_int;
> extern const function_base *const vmulltq_poly;
> extern const function_base *const vmulq;
> extern const function_base *const vmvnq;
> @@ -178,6 +179,7 @@ extern const function_base *const vshrntq;
> extern const function_base *const vshrq;
> extern const function_base *const vsliq;
> extern const function_base *const vsriq;
> +extern const function_base *const vst1q;
> extern const function_base *const vsubq;
> extern const function_base *const vuninitializedq;
>
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index b82d94e59bd..cc027f9cbb5 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -56,7 +56,6 @@
> #define vstrbq_scatter_offset_p(__base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p(__base, __offset, __value, __p)
> #define vstrwq_scatter_base_p(__addr, __offset, __value, __p) __arm_vstrwq_scatter_base_p(__addr, __offset, __value, __p)
> #define vldrbq_gather_offset_z(__base, __offset, __p) __arm_vldrbq_gather_offset_z(__base, __offset, __p)
> -#define vld1q(__base) __arm_vld1q(__base)
> #define vldrhq_gather_offset(__base, __offset) __arm_vldrhq_gather_offset(__base, __offset)
> #define vldrhq_gather_offset_z(__base, __offset, __p) __arm_vldrhq_gather_offset_z(__base, __offset, __p)
> #define vldrhq_gather_shifted_offset(__base, __offset) __arm_vldrhq_gather_shifted_offset(__base, __offset)
> @@ -69,7 +68,6 @@
> #define vldrwq_gather_offset_z(__base, __offset, __p) __arm_vldrwq_gather_offset_z(__base, __offset, __p)
> #define vldrwq_gather_shifted_offset(__base, __offset) __arm_vldrwq_gather_shifted_offset(__base, __offset)
> #define vldrwq_gather_shifted_offset_z(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z(__base, __offset, __p)
> -#define vst1q(__addr, __value) __arm_vst1q(__addr, __value)
> #define vstrhq_scatter_offset(__base, __offset, __value) __arm_vstrhq_scatter_offset(__base, __offset, __value)
> #define vstrhq_scatter_offset_p(__base, __offset, __value, __p) __arm_vstrhq_scatter_offset_p(__base, __offset, __value, __p)
> #define vstrhq_scatter_shifted_offset(__base, __offset, __value) __arm_vstrhq_scatter_shifted_offset(__base, __offset, __value)
> @@ -346,12 +344,6 @@
> #define vldrbq_z_u32(__base, __p) __arm_vldrbq_z_u32(__base, __p)
> #define vldrwq_gather_base_z_u32(__addr, __offset, __p) __arm_vldrwq_gather_base_z_u32(__addr, __offset, __p)
> #define vldrwq_gather_base_z_s32(__addr, __offset, __p) __arm_vldrwq_gather_base_z_s32(__addr, __offset, __p)
> -#define vld1q_s8(__base) __arm_vld1q_s8(__base)
> -#define vld1q_s32(__base) __arm_vld1q_s32(__base)
> -#define vld1q_s16(__base) __arm_vld1q_s16(__base)
> -#define vld1q_u8(__base) __arm_vld1q_u8(__base)
> -#define vld1q_u32(__base) __arm_vld1q_u32(__base)
> -#define vld1q_u16(__base) __arm_vld1q_u16(__base)
> #define vldrhq_gather_offset_s32(__base, __offset) __arm_vldrhq_gather_offset_s32(__base, __offset)
> #define vldrhq_gather_offset_s16(__base, __offset) __arm_vldrhq_gather_offset_s16(__base, __offset)
> #define vldrhq_gather_offset_u32(__base, __offset) __arm_vldrhq_gather_offset_u32(__base, __offset)
> @@ -380,8 +372,6 @@
> #define vldrwq_u32(__base) __arm_vldrwq_u32(__base)
> #define vldrwq_z_s32(__base, __p) __arm_vldrwq_z_s32(__base, __p)
> #define vldrwq_z_u32(__base, __p) __arm_vldrwq_z_u32(__base, __p)
> -#define vld1q_f32(__base) __arm_vld1q_f32(__base)
> -#define vld1q_f16(__base) __arm_vld1q_f16(__base)
> #define vldrhq_f16(__base) __arm_vldrhq_f16(__base)
> #define vldrhq_z_f16(__base, __p) __arm_vldrhq_z_f16(__base, __p)
> #define vldrwq_f32(__base) __arm_vldrwq_f32(__base)
> @@ -416,14 +406,6 @@
> #define vldrwq_gather_shifted_offset_z_f32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_f32(__base, __offset, __p)
> #define vldrwq_gather_shifted_offset_z_s32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_s32(__base, __offset, __p)
> #define vldrwq_gather_shifted_offset_z_u32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_u32(__base, __offset, __p)
> -#define vst1q_f32(__addr, __value) __arm_vst1q_f32(__addr, __value)
> -#define vst1q_f16(__addr, __value) __arm_vst1q_f16(__addr, __value)
> -#define vst1q_s8(__addr, __value) __arm_vst1q_s8(__addr, __value)
> -#define vst1q_s32(__addr, __value) __arm_vst1q_s32(__addr, __value)
> -#define vst1q_s16(__addr, __value) __arm_vst1q_s16(__addr, __value)
> -#define vst1q_u8(__addr, __value) __arm_vst1q_u8(__addr, __value)
> -#define vst1q_u32(__addr, __value) __arm_vst1q_u32(__addr, __value)
> -#define vst1q_u16(__addr, __value) __arm_vst1q_u16(__addr, __value)
> #define vstrhq_f16(__addr, __value) __arm_vstrhq_f16(__addr, __value)
> #define vstrhq_scatter_offset_s32( __base, __offset, __value) __arm_vstrhq_scatter_offset_s32( __base, __offset, __value)
> #define vstrhq_scatter_offset_s16( __base, __offset, __value) __arm_vstrhq_scatter_offset_s16( __base, __offset, __value)
> @@ -1537,48 +1519,6 @@ __arm_vldrwq_gather_base_z_u32 (uint32x4_t __addr, const int __offset, mve_pred1
> return __builtin_mve_vldrwq_gather_base_z_uv4si (__addr, __offset, __p);
> }
>
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q_s8 (int8_t const * __base)
> -{
> - return __builtin_mve_vld1q_sv16qi ((__builtin_neon_qi *) __base);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q_s32 (int32_t const * __base)
> -{
> - return __builtin_mve_vld1q_sv4si ((__builtin_neon_si *) __base);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q_s16 (int16_t const * __base)
> -{
> - return __builtin_mve_vld1q_sv8hi ((__builtin_neon_hi *) __base);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q_u8 (uint8_t const * __base)
> -{
> - return __builtin_mve_vld1q_uv16qi ((__builtin_neon_qi *) __base);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q_u32 (uint32_t const * __base)
> -{
> - return __builtin_mve_vld1q_uv4si ((__builtin_neon_si *) __base);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q_u16 (uint16_t const * __base)
> -{
> - return __builtin_mve_vld1q_uv8hi ((__builtin_neon_hi *) __base);
> -}
> -
> __extension__ extern __inline int32x4_t
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> __arm_vldrhq_gather_offset_s32 (int16_t const * __base, uint32x4_t __offset)
> @@ -1917,48 +1857,6 @@ __arm_vldrwq_gather_shifted_offset_z_u32 (uint32_t const * __base, uint32x4_t __
> return __builtin_mve_vldrwq_gather_shifted_offset_z_uv4si ((__builtin_neon_si *) __base, __offset, __p);
> }
>
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q_s8 (int8_t * __addr, int8x16_t __value)
> -{
> - __builtin_mve_vst1q_sv16qi ((__builtin_neon_qi *) __addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q_s32 (int32_t * __addr, int32x4_t __value)
> -{
> - __builtin_mve_vst1q_sv4si ((__builtin_neon_si *) __addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q_s16 (int16_t * __addr, int16x8_t __value)
> -{
> - __builtin_mve_vst1q_sv8hi ((__builtin_neon_hi *) __addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q_u8 (uint8_t * __addr, uint8x16_t __value)
> -{
> - __builtin_mve_vst1q_uv16qi ((__builtin_neon_qi *) __addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q_u32 (uint32_t * __addr, uint32x4_t __value)
> -{
> - __builtin_mve_vst1q_uv4si ((__builtin_neon_si *) __addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q_u16 (uint16_t * __addr, uint16x8_t __value)
> -{
> - __builtin_mve_vst1q_uv8hi ((__builtin_neon_hi *) __addr, __value);
> -}
> -
> __extension__ extern __inline void
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> __arm_vstrhq_scatter_offset_s32 (int16_t * __base, uint32x4_t __offset, int32x4_t __value)
> @@ -4421,20 +4319,6 @@ __arm_vornq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve
> return __builtin_mve_vornq_m_fv8hf (__inactive, __a, __b, __p);
> }
>
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q_f32 (float32_t const * __base)
> -{
> - return __builtin_mve_vld1q_fv4sf((__builtin_neon_si *) __base);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q_f16 (float16_t const * __base)
> -{
> - return __builtin_mve_vld1q_fv8hf((__builtin_neon_hi *) __base);
> -}
> -
> __extension__ extern __inline float32x4_t
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> __arm_vldrwq_f32 (float32_t const * __base)
> @@ -4547,20 +4431,6 @@ __arm_vstrwq_f32 (float32_t * __addr, float32x4_t __value)
> __builtin_mve_vstrwq_fv4sf ((__builtin_neon_si *) __addr, __value);
> }
>
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q_f32 (float32_t * __addr, float32x4_t __value)
> -{
> - __builtin_mve_vst1q_fv4sf ((__builtin_neon_si *) __addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q_f16 (float16_t * __addr, float16x8_t __value)
> -{
> - __builtin_mve_vst1q_fv8hf ((__builtin_neon_hi *) __addr, __value);
> -}
> -
> __extension__ extern __inline void
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> __arm_vstrhq_f16 (float16_t * __addr, float16x8_t __value)
> @@ -5651,48 +5521,6 @@ __arm_vldrbq_gather_offset_z (uint8_t const * __base, uint16x8_t __offset, mve_p
> return __arm_vldrbq_gather_offset_z_u16 (__base, __offset, __p);
> }
>
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q (int8_t const * __base)
> -{
> - return __arm_vld1q_s8 (__base);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q (int32_t const * __base)
> -{
> - return __arm_vld1q_s32 (__base);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q (int16_t const * __base)
> -{
> - return __arm_vld1q_s16 (__base);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q (uint8_t const * __base)
> -{
> - return __arm_vld1q_u8 (__base);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q (uint32_t const * __base)
> -{
> - return __arm_vld1q_u32 (__base);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q (uint16_t const * __base)
> -{
> - return __arm_vld1q_u16 (__base);
> -}
> -
> __extension__ extern __inline int32x4_t
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> __arm_vldrhq_gather_offset (int16_t const * __base, uint32x4_t __offset)
> @@ -5917,48 +5745,6 @@ __arm_vldrwq_gather_shifted_offset_z (uint32_t const * __base, uint32x4_t __offs
> return __arm_vldrwq_gather_shifted_offset_z_u32 (__base, __offset, __p);
> }
>
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q (int8_t * __addr, int8x16_t __value)
> -{
> - __arm_vst1q_s8 (__addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q (int32_t * __addr, int32x4_t __value)
> -{
> - __arm_vst1q_s32 (__addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q (int16_t * __addr, int16x8_t __value)
> -{
> - __arm_vst1q_s16 (__addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q (uint8_t * __addr, uint8x16_t __value)
> -{
> - __arm_vst1q_u8 (__addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q (uint32_t * __addr, uint32x4_t __value)
> -{
> - __arm_vst1q_u32 (__addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q (uint16_t * __addr, uint16x8_t __value)
> -{
> - __arm_vst1q_u16 (__addr, __value);
> -}
> -
> __extension__ extern __inline void
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> __arm_vstrhq_scatter_offset (int16_t * __base, uint32x4_t __offset, int32x4_t __value)
> @@ -7809,20 +7595,6 @@ __arm_vornq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pre
> return __arm_vornq_m_f16 (__inactive, __a, __b, __p);
> }
>
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q (float32_t const * __base)
> -{
> - return __arm_vld1q_f32 (__base);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vld1q (float16_t const * __base)
> -{
> - return __arm_vld1q_f16 (__base);
> -}
> -
> __extension__ extern __inline float16x8_t
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> __arm_vldrhq_gather_offset (float16_t const * __base, uint16x8_t __offset)
> @@ -7893,20 +7665,6 @@ __arm_vstrwq (float32_t * __addr, float32x4_t __value)
> __arm_vstrwq_f32 (__addr, __value);
> }
>
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q (float32_t * __addr, float32x4_t __value)
> -{
> - __arm_vst1q_f32 (__addr, __value);
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vst1q (float16_t * __addr, float16x8_t __value)
> -{
> - __arm_vst1q_f16 (__addr, __value);
> -}
> -
> __extension__ extern __inline void
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> __arm_vstrhq (float16_t * __addr, float16x8_t __value)
> @@ -8670,17 +8428,6 @@ extern void *__ARM_undef;
> int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vornq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
> int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vornq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
>
> -#define __arm_vld1q(p0) (\
> - _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
> - int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *)), \
> - int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld1q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *)), \
> - int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld1q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *)), \
> - int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld1q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *)), \
> - int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld1q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *)), \
> - int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld1q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *)), \
> - int (*)[__ARM_mve_type_float16_t_ptr]: __arm_vld1q_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *)), \
> - int (*)[__ARM_mve_type_float32_t_ptr]: __arm_vld1q_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *))))
> -
> #define __arm_vld1q_z(p0,p1) ( \
> _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
> int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_z_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), p1), \
> @@ -8792,17 +8539,6 @@ extern void *__ARM_undef;
> int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8x2_t]: __arm_vst2q_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *), __ARM_mve_coerce(__p1, float16x8x2_t)), \
> int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4x2_t]: __arm_vst2q_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *), __ARM_mve_coerce(__p1, float32x4x2_t)));})
>
> -#define __arm_vst1q(p0,p1) ({ __typeof(p1) __p1 = (p1); \
> - _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
> - int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vst1q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t)), \
> - int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vst1q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t)), \
> - int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vst1q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t)), \
> - int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vst1q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t)), \
> - int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vst1q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
> - int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vst1q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t)), \
> - int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8_t]: __arm_vst1q_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *), __ARM_mve_coerce(__p1, float16x8_t)), \
> - int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: __arm_vst1q_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *), __ARM_mve_coerce(__p1, float32x4_t)));})
> -
> #define __arm_vstrhq(p0,p1) ({ __typeof(p1) __p1 = (p1); \
> _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
> int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vstrhq_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t)), \
> @@ -9149,15 +8885,6 @@ extern void *__ARM_undef;
> int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_p_s32 (p0, p1, __ARM_mve_coerce(__p2, int32x4_t), p3), \
> int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_p_u32 (p0, p1, __ARM_mve_coerce(__p2, uint32x4_t), p3));})
>
> -#define __arm_vld1q(p0) (\
> - _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
> - int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *)), \
> - int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld1q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *)), \
> - int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld1q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *)), \
> - int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld1q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *)), \
> - int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld1q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *)), \
> - int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld1q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *))))
> -
> #define __arm_vldrhq_gather_offset(p0,p1) ({ __typeof(p1) __p1 = (p1); \
> _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
> int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_offset_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
> @@ -9206,15 +8933,6 @@ extern void *__ARM_undef;
> int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vldrwq_gather_shifted_offset_z_s32 (__ARM_mve_coerce_s32_ptr(__p0, int32_t *), p1, p2), \
> int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vldrwq_gather_shifted_offset_z_u32 (__ARM_mve_coerce_u32_ptr(__p0, uint32_t *), p1, p2));})
>
> -#define __arm_vst1q(p0,p1) ({ __typeof(p1) __p1 = (p1); \
> - _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
> - int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vst1q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t)), \
> - int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vst1q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t)), \
> - int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vst1q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t)), \
> - int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vst1q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t)), \
> - int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vst1q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
> - int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vst1q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
> #define __arm_vst1q_p(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \
> _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
> int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vst1q_p_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t), p2), \
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 366cec0812a..b0d3443da9c 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -3690,7 +3690,7 @@ (define_insn "mve_vldrwq_z_<supf>v4si"
> }
> [(set_attr "length" "8")])
>
> -(define_expand "mve_vld1q_f<mode>"
> +(define_expand "@mve_vld1q_f<mode>"
> [(match_operand:MVE_0 0 "s_register_operand")
> (unspec:MVE_0 [(match_operand:<MVE_CNVT> 1 "mve_memory_operand")] VLD1Q_F)
> ]
> @@ -3700,7 +3700,7 @@ (define_expand "mve_vld1q_f<mode>"
> DONE;
> })
>
> -(define_expand "mve_vld1q_<supf><mode>"
> +(define_expand "@mve_vld1q_<supf><mode>"
> [(match_operand:MVE_2 0 "s_register_operand")
> (unspec:MVE_2 [(match_operand:MVE_2 1 "mve_memory_operand")] VLD1Q)
> ]
> @@ -4408,7 +4408,7 @@ (define_insn "mve_vstrwq_<supf>v4si"
> }
> [(set_attr "length" "4")])
>
> -(define_expand "mve_vst1q_f<mode>"
> +(define_expand "@mve_vst1q_f<mode>"
> [(match_operand:<MVE_CNVT> 0 "mve_memory_operand")
> (unspec:<MVE_CNVT> [(match_operand:MVE_0 1 "s_register_operand")] VST1Q_F)
> ]
> @@ -4418,7 +4418,7 @@ (define_expand "mve_vst1q_f<mode>"
> DONE;
> })
>
> -(define_expand "mve_vst1q_<supf><mode>"
> +(define_expand "@mve_vst1q_<supf><mode>"
> [(match_operand:MVE_2 0 "mve_memory_operand")
> (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand")] VST1Q)
> ]
> --
> 2.34.1
* Re: [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores
2023-11-16 15:26 ` [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores Christophe Lyon
2023-11-16 16:48 ` Kyrylo Tkachov
@ 2023-11-23 13:29 ` Jan-Benedict Glaw
2023-11-23 15:57 ` Christophe Lyon
1 sibling, 1 reply; 15+ messages in thread
From: Jan-Benedict Glaw @ 2023-11-23 13:29 UTC (permalink / raw)
To: Christophe Lyon
Cc: gcc-patches, richard.sandiford, richard.earnshaw, kyrylo.tkachov
On Thu, 2023-11-16 15:26:14 +0000, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> diff --git a/gcc/config/arm/arm-mve-builtins-functions.h b/gcc/config/arm/arm-mve-builtins-functions.h
> index eba1f071af0..6d234a2dd7c 100644
> --- a/gcc/config/arm/arm-mve-builtins-functions.h
> +++ b/gcc/config/arm/arm-mve-builtins-functions.h
> @@ -966,6 +966,62 @@ public:
[...]
> +class full_width_access : public multi_vector_function
> +{
> +public:
> + CONSTEXPR full_width_access (unsigned int vectors_per_tuple = 1)
> + : multi_vector_function (vectors_per_tuple) {}
> +
> + tree
> + memory_scalar_type (const function_instance &fi) const override
> + {
> + return fi.scalar_type (0);
> + }
> +
> + machine_mode
> + memory_vector_mode (const function_instance &fi) const override
> + {
> + machine_mode mode = fi.vector_mode (0);
> + /* Vectors of floating-point are managed in memory as vectors of
> + integers. */
> + switch (mode)
> + {
> + case E_V4SFmode:
> + mode = E_V4SImode;
> + break;
> + case E_V8HFmode:
> + mode = E_V8HImode;
> + break;
> + }
This introduces warnings about many enum values not being handled, so
a default would be good I think. (I do automated builds with
--enable-werror-always, see eg.
http://toolchain.lug-owl.de/laminar/log/gcc-arm-eabi/48)
MfG, JBG
--
* Re: [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores
2023-11-23 13:29 ` Jan-Benedict Glaw
@ 2023-11-23 15:57 ` Christophe Lyon
0 siblings, 0 replies; 15+ messages in thread
From: Christophe Lyon @ 2023-11-23 15:57 UTC (permalink / raw)
To: Jan-Benedict Glaw
Cc: gcc-patches, richard.sandiford, richard.earnshaw, kyrylo.tkachov
Hi!
On Thu, 23 Nov 2023 at 14:29, Jan-Benedict Glaw <jbglaw@lug-owl.de> wrote:
>
> On Thu, 2023-11-16 15:26:14 +0000, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> > diff --git a/gcc/config/arm/arm-mve-builtins-functions.h b/gcc/config/arm/arm-mve-builtins-functions.h
> > index eba1f071af0..6d234a2dd7c 100644
> > --- a/gcc/config/arm/arm-mve-builtins-functions.h
> > +++ b/gcc/config/arm/arm-mve-builtins-functions.h
> > @@ -966,6 +966,62 @@ public:
> [...]
>
> > +class full_width_access : public multi_vector_function
> > +{
> > +public:
> > + CONSTEXPR full_width_access (unsigned int vectors_per_tuple = 1)
> > + : multi_vector_function (vectors_per_tuple) {}
> > +
> > + tree
> > + memory_scalar_type (const function_instance &fi) const override
> > + {
> > + return fi.scalar_type (0);
> > + }
> > +
> > + machine_mode
> > + memory_vector_mode (const function_instance &fi) const override
> > + {
> > + machine_mode mode = fi.vector_mode (0);
> > + /* Vectors of floating-point are managed in memory as vectors of
> > + integers. */
> > + switch (mode)
> > + {
> > + case E_V4SFmode:
> > + mode = E_V4SImode;
> > + break;
> > + case E_V8HFmode:
> > + mode = E_V8HImode;
> > + break;
> > + }
>
> This introduces warnings about many enum values not being handled, so
> a default would be good I think. (I do automated builds with
> --enable-werror-always, see eg.
> http://toolchain.lug-owl.de/laminar/log/gcc-arm-eabi/48)
>
Ha right, thanks for catching this.
Fixed by commit b9dbdefac626ba20222ca534b58f7e493d713b9a
Christophe
> MfG, JBG
>
> --
2023-11-16 15:26 [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Christophe Lyon
2023-11-16 15:26 ` [PATCH 2/6] arm: [MVE intrinsics] Add support for void and load/store pointers as argument types Christophe Lyon
2023-11-16 16:47 ` Kyrylo Tkachov
2023-11-16 15:26 ` [PATCH 3/6] arm: [MVE intrinsics] Add support for contiguous loads and stores Christophe Lyon
2023-11-16 16:48 ` Kyrylo Tkachov
2023-11-23 13:29 ` Jan-Benedict Glaw
2023-11-23 15:57 ` Christophe Lyon
2023-11-16 15:26 ` [PATCH 4/6] arm: [MVE intrinsics] add load and store shapes Christophe Lyon
2023-11-16 16:49 ` Kyrylo Tkachov
2023-11-16 15:26 ` [PATCH 5/6] arm: [MVE intrinsics] fix vst1 tests Christophe Lyon
2023-11-16 15:30 ` Kyrylo Tkachov
2023-11-16 15:37 ` Christophe Lyon
2023-11-16 15:26 ` [PATCH 6/6] arm: [MVE intrinsics] rework vldq1 vst1q Christophe Lyon
2023-11-16 16:49 ` Kyrylo Tkachov
2023-11-16 16:46 ` [PATCH 1/6] arm: Fix arm_simd_types and MVE scalar_types Kyrylo Tkachov