Re: [PATCH] aarch64: Add -mveclibabi=sleefgnu

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
To: Lou Knauer <lou.knauer@sipearl.com>, Andrew Pinski <pinskia@gmail.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	Etienne Renault <etienne.renault@sipearl.com>
Subject: Re: [PATCH] aarch64: Add -mveclibabi=sleefgnu
Date: Fri, 14 Apr 2023 16:23:05 +0100	[thread overview]
Message-ID: <af88e102-3775-4a44-7ca5-698f9918cae8@arm.com> (raw)
In-Reply-To: <d3f697ffb3df4c1ea213fbf7d724ccfc@ex13mbxc01n01.ikhex.ikoula.com>

No problem Lou and testing will be appreciated.

I strongly recommend against rebasing the version that is on the mailing 
list now, the conflicts with Andrew's patches aren't simple to resolve ;)

I'll do my best to get you revised versions next week :)

On 14/04/2023 16:07, Lou Knauer wrote:
> Hello Andre,
> 
> Thank you for your nice reply. I can see that your patches,
> combined with upcoming AArch64 support in libmvec,
> will nullify the need for my patch.
> 
> I will definitely test your patches and look forward to seeing them
> evolve, I will rebase and apply them myself.
> 
> Thank you very much!
> Kind regards,
> Lou
> 
>> -----Original Message-----
>> From: Andre Vieira (lists) <andre.simoesdiasvieira@arm.com>
>> Sent: Friday, April 14, 2023 12:30
>> To: Lou Knauer <lou.knauer@sipearl.com>; Andrew Pinski <pinskia@gmail.com>
>> Cc: gcc-patches@gcc.gnu.org; Etienne Renault <etienne.renault@sipearl.com>
>> Subject: Re: [PATCH] aarch64: Add -mveclibabi=sleefgnu
>>
>> I have (outdated) RFC's here:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613593.html
>>
>> I am working on this patch series for stage 1. The list of features I am
>> working on are:
>> * SVE support for #pragma omp declare simd
>> * Support for simdclone usage in autovec from #pragma omp declare variant
>>     This offers us a more fine-tuned approach to define what is and
>> what's not available per function
>> * Support for use of simdclones in SLP
>>
>> Also planning to enable the use of mixed-types that is currently
>> disabled for AArch64, it's not a feature I suspect we need for our
>> use-case but it will enable better testing as we can then enable AArch64
>> as a simdclone target in the testsuite.
>>
>> I could try to post some updates to the RFCs, I have been rebasing them
>> on top of Andrew Stubbs latest patch to enable inbranch codegen. Let me
>> know if you'd like to see these updates sooner rather than later so you
>> can try them out for your usecase.
>>
>> Kind regards,
>> Andre
>>
>> On 14/04/2023 10:34, Lou Knauer via Gcc-patches wrote:
>>>> -----Original Message-----
>>>> From: Andrew Pinski <pinskia@gmail.com>
>>>> Sent: Friday, April 14, 2023 09:08
>>>> To: Lou Knauer <lou.knauer@sipearl.com>
>>>> Cc: gcc-patches@gcc.gnu.org; Etienne Renault <etienne.renault@sipearl.com>
>>>> Subject: Re: [PATCH] aarch64: Add -mveclibabi=sleefgnu
>>>>
>>>> On Fri, Apr 14, 2023 at 12:03 AM Lou Knauer via Gcc-patches
>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>
>>>>> This adds support for the -mveclibabi option to the AArch64 backend of GCC by
>>>>> implementing the builtin_vectorized_function target hook for AArch64.
>>>>> The SLEEF Vectorized Math Library's GNUABI interface is used, and
>>>>> NEON/Advanced SIMD as well as SVE are supported.
>>>>>
>>>>> This was tested on the gcc testsuite and the llvm-test-suite on a AArch64
>>>>> host for NEON and SVE as well as on hand-written benchmarks. Where the
>>>>> vectorization of builtins was applied successfully in loops bound by the
>>>>> calls to those, significant (>2) performance gains can be observed.
>>>>
>>>> This is so wrong and it is better if you actually just used a header
>>>> file instead.  Specifically the openmp vect pragmas.
>>>>
>>>> Thanks,
>>>> Andrew Pinski
>>>>
>>>
>>> Thank you for your quick response. I do not fully understand your point:
>>> the OpenMP Declare SIMD pragmas are not yet implemented for SVE (here [0]
>>> someone started working on that, but it does not work in its current state).
>>> The `-mveclibabi` flag seems to be the only solution for SVE vectorization of
>>> libm functions from our point of view.
>>>
>>> Indeed, a custom header that redirects regular libm function calls to their
>>> Sleef equivalent would be a solution for NEON since OpenMP Declare SIMD
>>> pragmas are implemented for NEON in GCC. Nonetheless as far as I can tell,
>>> the libmvec is not yet support for AArch64, so Sleef is unavoidable. I
>>> therefore opted for a solution similar to the one for x86 and the SVML, where
>>> only a additional flag during compilation is needed (instead of having to
>>> modify source code to add includes). From a vectorization legality perspective,
>>> this strategy also seems more reliable than a redirecting header since
>>> Sleef functions (even the scalar ones) never set the errno and GCC already
>>> verifies such details when transforming libm calls to builtins.
>>>
>>> Alternatively, do you prefere a patch that adds SVE support for
>>> #pragma omp declare simd declarations, thus enabling the same header-based
>>> strategy for SVE as for NEON?
>>>
>>> Thank you and kind regards,
>>> Lou Knauer
>>>
>>> [0]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
>>>
>>>>
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>           * config/aarch64/aarch64.opt: Add -mveclibabi option.
>>>>>           * config/aarch64/aarch64-opts.h: Add aarch64_veclibabi enum.
>>>>>           * config/aarch64/aarch64-protos.h: Add
>>>>>           aarch64_builtin_vectorized_function declaration.
>>>>>           * config/aarch64/aarch64.cc: Handle -mveclibabi option and pure
>>>>>           scalable type info for scalable vectors without "SVE type" attributes.
>>>>>           * config/aarch64/aarch64-builtins.cc: Add
>>>>>           aarch64_builtin_vectorized_function definition.
>>>>>           * doc/invoke.texi: Document -mveclibabi for AArch64 targets.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>           * gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c: New testcase.
>>>>>           * gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c: New testcase.
>>>>> ---
>>>>>    gcc/config/aarch64/aarch64-builtins.cc        | 113 ++++++++++++++++++
>>>>>    gcc/config/aarch64/aarch64-opts.h             |   5 +
>>>>>    gcc/config/aarch64/aarch64-protos.h           |   3 +
>>>>>    gcc/config/aarch64/aarch64.cc                 |  66 ++++++++++
>>>>>    gcc/config/aarch64/aarch64.opt                |  15 +++
>>>>>    gcc/doc/invoke.texi                           |  15 +++
>>>>>    .../aarch64/vect-vecabi-sleefgnu-neon.c       |  16 +++
>>>>>    .../aarch64/vect-vecabi-sleefgnu-sve.c        |  16 +++
>>>>>    8 files changed, 249 insertions(+)
>>>>>    create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c
>>>>>    create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c
>>>>>
>>>>> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
>>>>> index cc6b7c01fd1..f53fa91b8d0 100644
>>>>> --- a/gcc/config/aarch64/aarch64-builtins.cc
>>>>> +++ b/gcc/config/aarch64/aarch64-builtins.cc
>>>>> @@ -47,6 +47,7 @@
>>>>>    #include "stringpool.h"
>>>>>    #include "attribs.h"
>>>>>    #include "gimple-fold.h"
>>>>> +#include "builtins.h"
>>>>>
>>>>>    #define v8qi_UP  E_V8QImode
>>>>>    #define v8di_UP  E_V8DImode
>>>>> @@ -3450,6 +3451,118 @@ aarch64_resolve_overloaded_builtin_general (location_t loc, tree function,
>>>>>      return NULL_TREE;
>>>>>    }
>>>>>
>>>>> +/* The vector library abi to use, if any.  */
>>>>> +extern enum aarch64_veclibabi aarch64_selected_veclibabi;
>>>>> +
>>>>> +/* Returns a function declaration for a vectorized version of the combined
>>>>> +   function with combined_fn code FN and the result vector type TYPE.
>>>>> +   NULL_TREE is returned if there is none available.  */
>>>>> +tree
>>>>> +aarch64_builtin_vectorized_function (unsigned int fn_code,
>>>>> +                                    tree type_out, tree type_in)
>>>>> +{
>>>>> +  if (TREE_CODE (type_out) != VECTOR_TYPE
>>>>> +      || TREE_CODE (type_in) != VECTOR_TYPE
>>>>> +      || aarch64_selected_veclibabi != aarch64_veclibabi_type_sleefgnu
>>>>> +      || !flag_unsafe_math_optimizations)
>>>>> +    return NULL_TREE;
>>>>> +
>>>>> +  machine_mode mode = TYPE_MODE (TREE_TYPE (type_out));
>>>>> +  poly_uint64 n = TYPE_VECTOR_SUBPARTS (type_out);
>>>>> +  if (mode != TYPE_MODE (TREE_TYPE (type_in))
>>>>> +      || !known_eq (n, TYPE_VECTOR_SUBPARTS (type_in)))
>>>>> +    return NULL_TREE;
>>>>> +
>>>>> +  bool is_scalable = !n.is_constant ();
>>>>> +  if (is_scalable)
>>>>> +    {
>>>>> +      /* SVE is needed for scalable vectors, a SVE register's size is
>>>>> +        always a multiple of 128.  */
>>>>> +      if (!TARGET_SVE
>>>>> +         || (mode == DFmode && !known_eq (n, poly_uint64 (2, 2)))
>>>>> +         || (mode == SFmode && !known_eq (n, poly_uint64 (4, 4))))
>>>>> +       return NULL_TREE;
>>>>> +    }
>>>>> +  else
>>>>> +    {
>>>>> +      /* A NEON register can hold two doubles or one float.  */
>>>>> +      if (!TARGET_SIMD
>>>>> +         || (mode == DFmode && n.to_constant () != 2)
>>>>> +         || (mode == SFmode && n.to_constant () != 4))
>>>>> +       return NULL_TREE;
>>>>> +    }
>>>>> +
>>>>> +  tree fntype;
>>>>> +  combined_fn fn = combined_fn (fn_code);
>>>>> +  const char *argencoding;
>>>>> +  switch (fn)
>>>>> +    {
>>>>> +      CASE_CFN_EXP:
>>>>> +      CASE_CFN_LOG:
>>>>> +      CASE_CFN_LOG10:
>>>>> +      CASE_CFN_TANH:
>>>>> +      CASE_CFN_TAN:
>>>>> +      CASE_CFN_ATAN:
>>>>> +      CASE_CFN_ATANH:
>>>>> +      CASE_CFN_CBRT:
>>>>> +      CASE_CFN_SINH:
>>>>> +      CASE_CFN_SIN:
>>>>> +      CASE_CFN_ASINH:
>>>>> +      CASE_CFN_ASIN:
>>>>> +      CASE_CFN_COSH:
>>>>> +      CASE_CFN_COS:
>>>>> +      CASE_CFN_ACOSH:
>>>>> +      CASE_CFN_ACOS:
>>>>> +       fntype = build_function_type_list (type_out, type_in, NULL);
>>>>> +       argencoding = "v";
>>>>> +       break;
>>>>> +
>>>>> +      CASE_CFN_POW:
>>>>> +      CASE_CFN_ATAN2:
>>>>> +       fntype = build_function_type_list (type_out, type_in, type_in, NULL);
>>>>> +       argencoding = "vv";
>>>>> +       break;
>>>>> +
>>>>> +      default:
>>>>> +       return NULL_TREE;
>>>>> +    }
>>>>> +
>>>>> +  tree fndecl = mathfn_built_in (mode == DFmode
>>>>> +                                ? double_type_node : float_type_node, fn);
>>>>> +  const char *scalar_name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
>>>>> +  /* Builtins will always be prefixed with '__builtin_'.  */
>>>>> +  gcc_assert (strncmp (scalar_name, "__builtin_", 10) == 0);
>>>>> +  scalar_name += 10;
>>>>> +
>>>>> +  char vectorized_name[32];
>>>>> +  if (is_scalable)
>>>>> +    {
>>>>> +      /* SVE ISA */
>>>>> +      int n = snprintf (vectorized_name, sizeof (vectorized_name),
>>>>> +                       "_ZGVsNx%s_%s", argencoding, scalar_name);
>>>>> +      if (n < 0 || n > sizeof (vectorized_name))
>>>>> +       return NULL_TREE;
>>>>> +    }
>>>>> +  else
>>>>> +    {
>>>>> +      /* NEON ISA */
>>>>> +      int n = snprintf (vectorized_name, sizeof (vectorized_name),
>>>>> +                       "_ZGVnN%d%s_%s", mode == SFmode ? 4 : 2,
>>>>> +                       argencoding, scalar_name);
>>>>> +      if (n < 0 || n > sizeof (vectorized_name))
>>>>> +       return NULL_TREE;
>>>>> +    }
>>>>> +
>>>>> +  tree new_fndecl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
>>>>> +                               get_identifier (vectorized_name), fntype);
>>>>> +  TREE_PUBLIC (new_fndecl) = 1;
>>>>> +  TREE_READONLY (new_fndecl) = 1;
>>>>> +  DECL_EXTERNAL (new_fndecl) = 1;
>>>>> +  DECL_IS_NOVOPS (new_fndecl) = 1;
>>>>> +
>>>>> +  return new_fndecl;
>>>>> +}
>>>>> +
>>>>>    #undef AARCH64_CHECK_BUILTIN_MODE
>>>>>    #undef AARCH64_FIND_FRINT_VARIANT
>>>>>    #undef CF0
>>>>> diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
>>>>> index a9f3e2715ca..d12871b893c 100644
>>>>> --- a/gcc/config/aarch64/aarch64-opts.h
>>>>> +++ b/gcc/config/aarch64/aarch64-opts.h
>>>>> @@ -98,4 +98,9 @@ enum aarch64_key_type {
>>>>>      AARCH64_KEY_B
>>>>>    };
>>>>>
>>>>> +enum aarch64_veclibabi {
>>>>> +  aarch64_veclibabi_type_none,
>>>>> +  aarch64_veclibabi_type_sleefgnu
>>>>> +};
>>>>> +
>>>>>    #endif
>>>>> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
>>>>> index 63339fa47df..53c6e455da8 100644
>>>>> --- a/gcc/config/aarch64/aarch64-protos.h
>>>>> +++ b/gcc/config/aarch64/aarch64-protos.h
>>>>> @@ -1066,4 +1066,7 @@ extern bool aarch64_harden_sls_blr_p (void);
>>>>>
>>>>>    extern void aarch64_output_patchable_area (unsigned int, bool);
>>>>>
>>>>> +extern tree aarch64_builtin_vectorized_function (unsigned int fn,
>>>>> +                                                tree type_out, tree type_in);
>>>>> +
>>>>>    #endif /* GCC_AARCH64_PROTOS_H */
>>>>> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>>>>> index 42617ced73a..50ac37ff01e 100644
>>>>> --- a/gcc/config/aarch64/aarch64.cc
>>>>> +++ b/gcc/config/aarch64/aarch64.cc
>>>>> @@ -84,6 +84,7 @@
>>>>>    #include "aarch64-feature-deps.h"
>>>>>    #include "config/arm/aarch-common.h"
>>>>>    #include "config/arm/aarch-common-protos.h"
>>>>> +#include "print-tree.h"
>>>>>
>>>>>    /* This file should be included last.  */
>>>>>    #include "target-def.h"
>>>>> @@ -2951,6 +2952,62 @@ pure_scalable_type_info::analyze (const_tree type)
>>>>>          return IS_PST;
>>>>>        }
>>>>>
>>>>> +  /* Only functions and types that are part of the ARM C Language
>>>>> +     Extensions (arm_sve.h) have the SVE type attributes.
>>>>> +     The auto-vectorizer does not annotate the vector types it creates with
>>>>> +     those attributes. With the support of vectorized libm function
>>>>> +     builtins  for SVE, scalable vectors without special attributes
>>>>> +     have to be treated as well.  */
>>>>> +  if (TREE_CODE (type) == VECTOR_TYPE
>>>>> +      && !TYPE_VECTOR_SUBPARTS (type).is_constant ())
>>>>> +    {
>>>>> +      /* Boolean vectors are special because they are used by
>>>>> +        the vectorizer as masks that must go into the
>>>>> +        predicate registers.  */
>>>>> +      if (TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
>>>>> +       {
>>>>> +         p.num_zr = 0;
>>>>> +         p.num_pr = 1;
>>>>> +         p.mode = p.orig_mode = TYPE_MODE (type);
>>>>> +         add_piece (p);
>>>>> +         return IS_PST;
>>>>> +       }
>>>>> +
>>>>> +      static const struct {
>>>>> +       machine_mode mode;
>>>>> +       unsigned int element_size;
>>>>> +       poly_uint64 vector_size;
>>>>> +      } valid_vectors[] = {
>>>>> +       { VNx8BFmode, 16, poly_uint64 (8, 8) }, /* svbfloat16_t */
>>>>> +       { VNx8HFmode, 16, poly_uint64 (8, 8) }, /* svfloat16_t */
>>>>> +       { VNx4SFmode, 32, poly_uint64 (4, 4) }, /* svfloat32_t */
>>>>> +       { VNx2DFmode, 64, poly_uint64 (2, 2) }, /* svfloat64_t */
>>>>> +       { VNx16BImode, 8, poly_uint64 (16, 16) }, /* sv[u]int8_t */
>>>>> +       { VNx8HImode, 16, poly_uint64 (8, 8) }, /* sv[u]int16_t */
>>>>> +       { VNx4SImode, 32, poly_uint64 (4, 4) }, /* sv[u]int32_t */
>>>>> +       { VNx2DImode, 64, poly_uint64 (2, 2) }, /* sv[u]int64_t */
>>>>> +      };
>>>>> +
>>>>> +      machine_mode elm_mode = TYPE_MODE (TREE_TYPE (type));
>>>>> +      unsigned int elm_size = GET_MODE_BITSIZE (elm_mode).to_constant ();
>>>>> +      for (unsigned i = 0;
>>>>> +          i < sizeof (valid_vectors) / sizeof (valid_vectors[0]); i++)
>>>>> +       if (valid_vectors[i].element_size == elm_size
>>>>> +           && valid_vectors[i].mode == TYPE_MODE (type)
>>>>> +           && known_eq (valid_vectors[i].vector_size,
>>>>> +                        TYPE_VECTOR_SUBPARTS (type)))
>>>>> +         {
>>>>> +           p.num_zr = 1;
>>>>> +           p.num_pr = 0;
>>>>> +           p.mode = p.orig_mode = valid_vectors[i].mode;
>>>>> +           add_piece (p);
>>>>> +           return IS_PST;
>>>>> +         }
>>>>> +
>>>>> +      fatal_error (input_location, "unsupported vector type %qT"
>>>>> +                  " as function parameter without SVE attributes", type);
>>>>> +    }
>>>>> +
>>>>>      /* Check for user-defined PSTs.  */
>>>>>      if (TREE_CODE (type) == ARRAY_TYPE)
>>>>>        return analyze_array (type);
>>>>> @@ -17851,6 +17908,8 @@ aarch64_override_options_after_change_1 (struct gcc_options *opts)
>>>>>        flag_mrecip_low_precision_sqrt = true;
>>>>>    }
>>>>>
>>>>> +enum aarch64_veclibabi aarch64_selected_veclibabi = aarch64_veclibabi_type_none;
>>>>> +
>>>>>    /* 'Unpack' up the internal tuning structs and update the options
>>>>>        in OPTS.  The caller must have set up selected_tune and selected_arch
>>>>>        as all the other target-specific codegen decisions are
>>>>> @@ -18031,6 +18090,9 @@ aarch64_override_options_internal (struct gcc_options *opts)
>>>>>          && opts->x_optimize >= aarch64_tune_params.prefetch->default_opt_level)
>>>>>        opts->x_flag_prefetch_loop_arrays = 1;
>>>>>
>>>>> +  if (opts->x_aarch64_veclibabi_type == aarch64_veclibabi_type_sleefgnu)
>>>>> +    aarch64_selected_veclibabi = aarch64_veclibabi_type_sleefgnu;
>>>>> +
>>>>>      aarch64_override_options_after_change_1 (opts);
>>>>>    }
>>>>>
>>>>> @@ -28085,6 +28147,10 @@ aarch64_libgcc_floating_mode_supported_p
>>>>>    #undef TARGET_CONST_ANCHOR
>>>>>    #define TARGET_CONST_ANCHOR 0x1000000
>>>>>
>>>>> +#undef TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
>>>>> +#define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
>>>>> +  aarch64_builtin_vectorized_function
>>>>> +
>>>>>    struct gcc_target targetm = TARGET_INITIALIZER;
>>>>>
>>>>>    #include "gt-aarch64.h"
>>>>> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
>>>>> index 1d7967db9c0..76013dacdea 100644
>>>>> --- a/gcc/config/aarch64/aarch64.opt
>>>>> +++ b/gcc/config/aarch64/aarch64.opt
>>>>> @@ -302,3 +302,18 @@ Constant memset size in bytes from which to start using MOPS sequence.
>>>>>    -param=aarch64-vect-unroll-limit=
>>>>>    Target Joined UInteger Var(aarch64_vect_unroll_limit) Init(4) Param
>>>>>    Limit how much the autovectorizer may unroll a loop.
>>>>> +
>>>>> +;; -mveclibabi=
>>>>> +TargetVariable
>>>>> +enum aarch64_veclibabi aarch64_veclibabi_type = aarch64_veclibabi_type_none
>>>>> +
>>>>> +mveclibabi=
>>>>> +Target RejectNegative Joined Var(aarch64_veclibabi_type) Enum(aarch64_veclibabi) Init(aarch64_veclibabi_type_none)
>>>>> +Vector library ABI to use.
>>>>> +
>>>>> +Enum
>>>>> +Name(aarch64_veclibabi) Type(enum aarch64_veclibabi)
>>>>> +Known vectorization library ABIs (for use with the -mveclibabi= option):
>>>>> +
>>>>> +EnumValue
>>>>> +Enum(aarch64_veclibabi) String(sleefgnu) Value(aarch64_veclibabi_type_sleefgnu)
>>>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>>>> index a38547f53e5..71fbbf27522 100644
>>>>> --- a/gcc/doc/invoke.texi
>>>>> +++ b/gcc/doc/invoke.texi
>>>>> @@ -20383,6 +20383,21 @@ across releases.
>>>>>
>>>>>    This option is only intended to be useful when developing GCC.
>>>>>
>>>>> +@opindex mveclibabi
>>>>> +@item -mveclibabi=@var{type}
>>>>> +Specifies the ABI type to use for vectorizing intrinsics using an
>>>>> +external library.  The only type supported at present is @samp{sleefgnu},
>>>>> +which specifies to use the GNU ABI variant of the Sleef Vectorized
>>>>> +Math Library. This flag can be used for both, Advanced SIMD (NEON) and SVE.
>>>>> +
>>>>> +GCC currently emits vectorized calls to @code{exp}, @code{log}, @code{log10},
>>>>> +@code{tanh}, @code{tan}, @code{atan}, @code{atanh}, @code{cbrt}, @code{sinh},
>>>>> +@code{sin}, @code{asinh} and @code{asin} when possible and profitable
>>>>> +on AArch64.
>>>>> +
>>>>> +Both @option{-ftree-vectorize} and @option{-funsafe-math-optimizations}
>>>>> +must also be enabled. The libsleefgnu must be specified at link time.
>>>>> +
>>>>>    @opindex mverbose-cost-dump
>>>>>    @item -mverbose-cost-dump
>>>>>    Enable verbose cost model dumping in the debug dump files.  This option is
>>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c b/gcc/testsuite/gcc.target/aarch64/vect-
>> vecabi-
>>>> sleefgnu-neon.c
>>>>> new file mode 100644
>>>>> index 00000000000..e9f6078cd12
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c
>>>>> @@ -0,0 +1,16 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O3 -march=armv8-a+simd -ftree-vectorize -mveclibabi=sleefgnu -ffast-math" } */
>>>>> +
>>>>> +extern float sinf(float);
>>>>> +
>>>>> +float x[256];
>>>>> +
>>>>> +void foo(void)
>>>>> +{
>>>>> +  int i;
>>>>> +
>>>>> +  for (i=0; i<256; ++i)
>>>>> +    x[i] = sinf(x[i]);
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler "_ZGVnN4v_sinf" } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-
>>>> sleefgnu-sve.c
>>>>> new file mode 100644
>>>>> index 00000000000..8319ae420e1
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c
>>>>> @@ -0,0 +1,16 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O3 -march=armv8-a+sve -ftree-vectorize -mveclibabi=sleefgnu -ffast-math" } */
>>>>> +
>>>>> +extern float sinf(float);
>>>>> +
>>>>> +float x[256];
>>>>> +
>>>>> +void foo(void)
>>>>> +{
>>>>> +  int i;
>>>>> +
>>>>> +  for (i=0; i<256; ++i)
>>>>> +    x[i] = sinf(x[i]);
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler "_ZGVsNxv_sinf" } } */
>>>>> --
>>>>> 2.25.1
>>>>>

     prev parent reply	other threads:[~2023-04-14 15:23 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-14  7:02 Lou Knauer
2023-04-14  7:07 ` Andrew Pinski
2023-04-14  9:34   ` Lou Knauer
2023-04-14 10:30     ` Andre Vieira (lists)
2023-04-14 15:07       ` Lou Knauer
2023-04-14 15:23         ` Andre Vieira (lists) [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=af88e102-3775-4a44-7ca5-698f9918cae8@arm.com \
    --to=andre.simoesdiasvieira@arm.com \
    --cc=etienne.renault@sipearl.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=lou.knauer@sipearl.com \
    --cc=pinskia@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).