From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 14 Apr 2023 16:23:05 +0100
Subject: Re: [PATCH] aarch64: Add -mveclibabi=sleefgnu
From: "Andre Vieira (lists)"
To: Lou Knauer , Andrew Pinski
Cc: "gcc-patches@gcc.gnu.org" , Etienne Renault

No problem, Lou, and testing will be appreciated.

I strongly recommend against rebasing the version that is on the mailing
list now; the conflicts with Andrew's patches aren't simple to resolve ;)
I'll do my best to get you revised versions next week :)

On 14/04/2023 16:07, Lou Knauer wrote:
> Hello Andre,
>
> Thank you for your nice reply. I can see that your patches,
> combined with upcoming AArch64 support in libmvec,
> will remove the need for my patch.
>
> I will definitely test your patches and look forward to seeing them
> evolve; I will rebase and apply them myself.
>
> Thank you very much!
> Kind regards,
> Lou
>
>> -----Original Message-----
>> From: Andre Vieira (lists)
>> Sent: Friday, April 14, 2023 12:30
>> To: Lou Knauer ; Andrew Pinski
>> Cc: gcc-patches@gcc.gnu.org; Etienne Renault
>> Subject: Re: [PATCH] aarch64: Add -mveclibabi=sleefgnu
>>
>> I have (outdated) RFCs here:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613593.html
>>
>> I am working on this patch series for stage 1. The features I am
>> working on are:
>> * SVE support for #pragma omp declare simd
>> * Support for simdclone usage in autovec from #pragma omp declare variant
>>   This offers us a more fine-tuned approach to define what is and
>>   what's not available per function
>> * Support for use of simdclones in SLP
>>
>> I am also planning to enable the use of mixed types, which is currently
>> disabled for AArch64. It's not a feature I suspect we need for our
>> use case, but it will enable better testing, as we can then enable AArch64
>> as a simdclone target in the testsuite.
>>
>> I could try to post some updates to the RFCs; I have been rebasing them
>> on top of Andrew Stubbs' latest patch to enable inbranch codegen.
>> Let me know if you'd like to see these updates sooner rather than later so
>> you can try them out for your use case.
>>
>> Kind regards,
>> Andre
>>
>> On 14/04/2023 10:34, Lou Knauer via Gcc-patches wrote:
>>>> -----Original Message-----
>>>> From: Andrew Pinski
>>>> Sent: Friday, April 14, 2023 09:08
>>>> To: Lou Knauer
>>>> Cc: gcc-patches@gcc.gnu.org; Etienne Renault
>>>> Subject: Re: [PATCH] aarch64: Add -mveclibabi=sleefgnu
>>>>
>>>> On Fri, Apr 14, 2023 at 12:03 AM Lou Knauer via Gcc-patches
>>>> wrote:
>>>>>
>>>>> This adds support for the -mveclibabi option to the AArch64 backend of GCC by
>>>>> implementing the builtin_vectorized_function target hook for AArch64.
>>>>> The SLEEF Vectorized Math Library's GNUABI interface is used, and
>>>>> NEON/Advanced SIMD as well as SVE are supported.
>>>>>
>>>>> This was tested with the GCC testsuite and the llvm-test-suite on an AArch64
>>>>> host for NEON and SVE, as well as with hand-written benchmarks. Where the
>>>>> vectorization of builtins was applied successfully in loops bound by those
>>>>> calls, significant (>2x) performance gains can be observed.
>>>>
>>>> This is so wrong and it is better if you actually just used a header
>>>> file instead. Specifically the openmp vect pragmas.
>>>>
>>>> Thanks,
>>>> Andrew Pinski
>>>>
>>>
>>> Thank you for your quick response. I do not fully understand your point:
>>> the OpenMP Declare SIMD pragmas are not yet implemented for SVE (here [0]
>>> someone started working on that, but it does not work in its current state).
>>> The `-mveclibabi` flag seems to be the only solution for SVE vectorization of
>>> libm functions from our point of view.
>>>
>>> Indeed, a custom header that redirects regular libm function calls to their
>>> Sleef equivalents would be a solution for NEON, since OpenMP Declare SIMD
>>> pragmas are implemented for NEON in GCC. Nonetheless, as far as I can tell,
>>> libmvec is not yet available for AArch64, so Sleef is unavoidable. I
>>> therefore opted for a solution similar to the one for x86 and the SVML, where
>>> only an additional flag at compile time is needed (instead of having to
>>> modify source code to add includes). From a vectorization-legality perspective,
>>> this strategy also seems more reliable than a redirecting header, since
>>> Sleef functions (even the scalar ones) never set errno, and GCC already
>>> verifies such details when transforming libm calls to builtins.
>>>
>>> Alternatively, would you prefer a patch that adds SVE support for
>>> #pragma omp declare simd declarations, thus enabling the same header-based
>>> strategy for SVE as for NEON?
>>>
>>> Thank you and kind regards,
>>> Lou Knauer
>>>
>>> [0]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
>>>
>>>>
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>         * config/aarch64/aarch64.opt: Add -mveclibabi option.
>>>>>         * config/aarch64/aarch64-opts.h: Add aarch64_veclibabi enum.
>>>>>         * config/aarch64/aarch64-protos.h: Add
>>>>>         aarch64_builtin_vectorized_function declaration.
>>>>>         * config/aarch64/aarch64.cc: Handle -mveclibabi option and pure
>>>>>         scalable type info for scalable vectors without "SVE type" attributes.
>>>>>         * config/aarch64/aarch64-builtins.cc: Add
>>>>>         aarch64_builtin_vectorized_function definition.
>>>>>         * doc/invoke.texi: Document -mveclibabi for AArch64 targets.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>         * gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c: New testcase.
>>>>>         * gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c: New testcase.
>>>>> ---
>>>>>  gcc/config/aarch64/aarch64-builtins.cc       | 113 ++++++++++++++++++
>>>>>  gcc/config/aarch64/aarch64-opts.h            |   5 +
>>>>>  gcc/config/aarch64/aarch64-protos.h          |   3 +
>>>>>  gcc/config/aarch64/aarch64.cc                |  66 ++++++++++
>>>>>  gcc/config/aarch64/aarch64.opt               |  15 +++
>>>>>  gcc/doc/invoke.texi                          |  15 +++
>>>>>  .../aarch64/vect-vecabi-sleefgnu-neon.c      |  16 +++
>>>>>  .../aarch64/vect-vecabi-sleefgnu-sve.c       |  16 +++
>>>>>  8 files changed, 249 insertions(+)
>>>>>  create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c
>>>>>  create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c
>>>>>
>>>>> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
>>>>> index cc6b7c01fd1..f53fa91b8d0 100644
>>>>> --- a/gcc/config/aarch64/aarch64-builtins.cc
>>>>> +++ b/gcc/config/aarch64/aarch64-builtins.cc
>>>>> @@ -47,6 +47,7 @@
>>>>>  #include "stringpool.h"
>>>>>  #include "attribs.h"
>>>>>  #include "gimple-fold.h"
>>>>> +#include "builtins.h"
>>>>>
>>>>>  #define v8qi_UP E_V8QImode
>>>>>  #define v8di_UP E_V8DImode
>>>>> @@ -3450,6 +3451,118 @@ aarch64_resolve_overloaded_builtin_general (location_t loc, tree function,
>>>>>    return NULL_TREE;
>>>>>  }
>>>>>
>>>>> +/* The vector library ABI to use, if any.  */
>>>>> +extern enum aarch64_veclibabi aarch64_selected_veclibabi;
>>>>> +
>>>>> +/* Return a function declaration for a vectorized version of the combined
>>>>> +   function with combined_fn code FN_CODE, result vector type TYPE_OUT and
>>>>> +   argument vector type TYPE_IN.  Return NULL_TREE if none is available.  */
>>>>> +tree
>>>>> +aarch64_builtin_vectorized_function (unsigned int fn_code,
>>>>> +                                     tree type_out, tree type_in)
>>>>> +{
>>>>> +  if (TREE_CODE (type_out) != VECTOR_TYPE
>>>>> +      || TREE_CODE (type_in) != VECTOR_TYPE
>>>>> +      || aarch64_selected_veclibabi != aarch64_veclibabi_type_sleefgnu
>>>>> +      || !flag_unsafe_math_optimizations)
>>>>> +    return NULL_TREE;
>>>>> +
>>>>> +  machine_mode mode = TYPE_MODE (TREE_TYPE (type_out));
>>>>> +  poly_uint64 n = TYPE_VECTOR_SUBPARTS (type_out);
>>>>> +  if (mode != TYPE_MODE (TREE_TYPE (type_in))
>>>>> +      || !known_eq (n, TYPE_VECTOR_SUBPARTS (type_in)))
>>>>> +    return NULL_TREE;
>>>>> +
>>>>> +  bool is_scalable = !n.is_constant ();
>>>>> +  if (is_scalable)
>>>>> +    {
>>>>> +      /* SVE is needed for scalable vectors; an SVE register's size is
>>>>> +         always a multiple of 128 bits.  */
>>>>> +      if (!TARGET_SVE
>>>>> +          || (mode == DFmode && !known_eq (n, poly_uint64 (2, 2)))
>>>>> +          || (mode == SFmode && !known_eq (n, poly_uint64 (4, 4))))
>>>>> +        return NULL_TREE;
>>>>> +    }
>>>>> +  else
>>>>> +    {
>>>>> +      /* A NEON register can hold two doubles or four floats.  */
>>>>> +      if (!TARGET_SIMD
>>>>> +          || (mode == DFmode && n.to_constant () != 2)
>>>>> +          || (mode == SFmode && n.to_constant () != 4))
>>>>> +        return NULL_TREE;
>>>>> +    }
>>>>> +
>>>>> +  tree fntype;
>>>>> +  combined_fn fn = combined_fn (fn_code);
>>>>> +  const char *argencoding;
>>>>> +  switch (fn)
>>>>> +    {
>>>>> +    CASE_CFN_EXP:
>>>>> +    CASE_CFN_LOG:
>>>>> +    CASE_CFN_LOG10:
>>>>> +    CASE_CFN_TANH:
>>>>> +    CASE_CFN_TAN:
>>>>> +    CASE_CFN_ATAN:
>>>>> +    CASE_CFN_ATANH:
>>>>> +    CASE_CFN_CBRT:
>>>>> +    CASE_CFN_SINH:
>>>>> +    CASE_CFN_SIN:
>>>>> +    CASE_CFN_ASINH:
>>>>> +    CASE_CFN_ASIN:
>>>>> +    CASE_CFN_COSH:
>>>>> +    CASE_CFN_COS:
>>>>> +    CASE_CFN_ACOSH:
>>>>> +    CASE_CFN_ACOS:
>>>>> +      fntype = build_function_type_list (type_out, type_in, NULL);
>>>>> +      argencoding = "v";
>>>>> +      break;
>>>>> +
>>>>> +    CASE_CFN_POW:
>>>>> +    CASE_CFN_ATAN2:
>>>>> +      fntype = build_function_type_list (type_out, type_in, type_in, NULL);
>>>>> +      argencoding = "vv";
>>>>> +      break;
>>>>> +
>>>>> +    default:
>>>>> +      return NULL_TREE;
>>>>> +    }
>>>>> +
>>>>> +  tree fndecl = mathfn_built_in (mode == DFmode
>>>>> +                                 ? double_type_node : float_type_node, fn);
>>>>> +  const char *scalar_name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
>>>>> +  /* Builtins will always be prefixed with '__builtin_'.  */
>>>>> +  gcc_assert (strncmp (scalar_name, "__builtin_", 10) == 0);
>>>>> +  scalar_name += 10;
>>>>> +
>>>>> +  char vectorized_name[32];
>>>>> +  if (is_scalable)
>>>>> +    {
>>>>> +      /* SVE ISA */
>>>>> +      int n = snprintf (vectorized_name, sizeof (vectorized_name),
>>>>> +                        "_ZGVsNx%s_%s", argencoding, scalar_name);
>>>>> +      if (n < 0 || (size_t) n >= sizeof (vectorized_name))
>>>>> +        return NULL_TREE;
>>>>> +    }
>>>>> +  else
>>>>> +    {
>>>>> +      /* NEON ISA */
>>>>> +      int n = snprintf (vectorized_name, sizeof (vectorized_name),
>>>>> +                        "_ZGVnN%d%s_%s", mode == SFmode ? 4 : 2,
>>>>> +                        argencoding, scalar_name);
>>>>> +      if (n < 0 || (size_t) n >= sizeof (vectorized_name))
>>>>> +        return NULL_TREE;
>>>>> +    }
>>>>> +
>>>>> +  tree new_fndecl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
>>>>> +                                get_identifier (vectorized_name), fntype);
>>>>> +  TREE_PUBLIC (new_fndecl) = 1;
>>>>> +  TREE_READONLY (new_fndecl) = 1;
>>>>> +  DECL_EXTERNAL (new_fndecl) = 1;
>>>>> +  DECL_IS_NOVOPS (new_fndecl) = 1;
>>>>> +
>>>>> +  return new_fndecl;
>>>>> +}
>>>>> +
>>>>>  #undef AARCH64_CHECK_BUILTIN_MODE
>>>>>  #undef AARCH64_FIND_FRINT_VARIANT
>>>>>  #undef CF0
>>>>> diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
>>>>> index a9f3e2715ca..d12871b893c 100644
>>>>> --- a/gcc/config/aarch64/aarch64-opts.h
>>>>> +++ b/gcc/config/aarch64/aarch64-opts.h
>>>>> @@ -98,4 +98,9 @@ enum aarch64_key_type {
>>>>>    AARCH64_KEY_B
>>>>>  };
>>>>>
>>>>> +enum aarch64_veclibabi {
>>>>> +  aarch64_veclibabi_type_none,
>>>>> +  aarch64_veclibabi_type_sleefgnu
>>>>> +};
>>>>> +
>>>>>  #endif
>>>>> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
>>>>> index 63339fa47df..53c6e455da8 100644
>>>>> --- a/gcc/config/aarch64/aarch64-protos.h
>>>>> +++ b/gcc/config/aarch64/aarch64-protos.h
>>>>> @@ -1066,4 +1066,7 @@ extern bool aarch64_harden_sls_blr_p (void);
>>>>>
>>>>>  extern void aarch64_output_patchable_area (unsigned int, bool);
>>>>>
>>>>> +extern tree aarch64_builtin_vectorized_function (unsigned int fn,
>>>>> +                                                 tree type_out, tree type_in);
>>>>> +
>>>>>  #endif /* GCC_AARCH64_PROTOS_H */
>>>>> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>>>>> index 42617ced73a..50ac37ff01e 100644
>>>>> --- a/gcc/config/aarch64/aarch64.cc
>>>>> +++ b/gcc/config/aarch64/aarch64.cc
>>>>> @@ -84,6 +84,7 @@
>>>>>  #include "aarch64-feature-deps.h"
>>>>>  #include "config/arm/aarch-common.h"
>>>>>  #include "config/arm/aarch-common-protos.h"
>>>>> +#include "print-tree.h"
>>>>>
>>>>>  /* This file should be included last. */
>>>>>  #include "target-def.h"
>>>>> @@ -2951,6 +2952,62 @@ pure_scalable_type_info::analyze (const_tree type)
>>>>>        return IS_PST;
>>>>>      }
>>>>>
>>>>> +  /* Only functions and types that are part of the Arm C Language
>>>>> +     Extensions (arm_sve.h) have the SVE type attributes.
>>>>> +     The auto-vectorizer does not annotate the vector types it creates with
>>>>> +     those attributes.  With the support of vectorized libm function
>>>>> +     builtins for SVE, scalable vectors without these attributes
>>>>> +     have to be handled as well.  */
>>>>> +  if (TREE_CODE (type) == VECTOR_TYPE
>>>>> +      && !TYPE_VECTOR_SUBPARTS (type).is_constant ())
>>>>> +    {
>>>>> +      /* Boolean vectors are special because they are used by
>>>>> +         the vectorizer as masks that must go into the
>>>>> +         predicate registers.  */
>>>>> +      if (TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
>>>>> +        {
>>>>> +          p.num_zr = 0;
>>>>> +          p.num_pr = 1;
>>>>> +          p.mode = p.orig_mode = TYPE_MODE (type);
>>>>> +          add_piece (p);
>>>>> +          return IS_PST;
>>>>> +        }
>>>>> +
>>>>> +      static const struct {
>>>>> +        machine_mode mode;
>>>>> +        unsigned int element_size;
>>>>> +        poly_uint64 vector_size;
>>>>> +      } valid_vectors[] = {
>>>>> +        { VNx8BFmode, 16, poly_uint64 (8, 8) },   /* svbfloat16_t */
>>>>> +        { VNx8HFmode, 16, poly_uint64 (8, 8) },   /* svfloat16_t */
>>>>> +        { VNx4SFmode, 32, poly_uint64 (4, 4) },   /* svfloat32_t */
>>>>> +        { VNx2DFmode, 64, poly_uint64 (2, 2) },   /* svfloat64_t */
>>>>> +        { VNx16QImode, 8, poly_uint64 (16, 16) }, /* sv[u]int8_t */
>>>>> +        { VNx8HImode, 16, poly_uint64 (8, 8) },   /* sv[u]int16_t */
>>>>> +        { VNx4SImode, 32, poly_uint64 (4, 4) },   /* sv[u]int32_t */
>>>>> +        { VNx2DImode, 64, poly_uint64 (2, 2) },   /* sv[u]int64_t */
>>>>> +      };
>>>>> +
>>>>> +      machine_mode elm_mode = TYPE_MODE (TREE_TYPE (type));
>>>>> +      unsigned int elm_size = GET_MODE_BITSIZE (elm_mode).to_constant ();
>>>>> +      for (unsigned i = 0;
>>>>> +           i < sizeof (valid_vectors) / sizeof (valid_vectors[0]); i++)
>>>>> +        if (valid_vectors[i].element_size == elm_size
>>>>> +            && valid_vectors[i].mode == TYPE_MODE (type)
>>>>> +            && known_eq (valid_vectors[i].vector_size,
>>>>> +                         TYPE_VECTOR_SUBPARTS (type)))
>>>>> +          {
>>>>> +            p.num_zr = 1;
>>>>> +            p.num_pr = 0;
>>>>> +            p.mode = p.orig_mode = valid_vectors[i].mode;
>>>>> +            add_piece (p);
>>>>> +            return IS_PST;
>>>>> +          }
>>>>> +
>>>>> +      fatal_error (input_location, "unsupported vector type %qT"
>>>>> +                   " as function parameter without SVE attributes", type);
>>>>> +    }
>>>>> +
>>>>>    /* Check for user-defined PSTs.  */
>>>>>    if (TREE_CODE (type) == ARRAY_TYPE)
>>>>>      return analyze_array (type);
>>>>> @@ -17851,6 +17908,8 @@ aarch64_override_options_after_change_1 (struct gcc_options *opts)
>>>>>      flag_mrecip_low_precision_sqrt = true;
>>>>>  }
>>>>>
>>>>> +enum aarch64_veclibabi aarch64_selected_veclibabi = aarch64_veclibabi_type_none;
>>>>> +
>>>>>  /* 'Unpack' up the internal tuning structs and update the options
>>>>>     in OPTS.  The caller must have set up selected_tune and selected_arch
>>>>>     as all the other target-specific codegen decisions are
>>>>> @@ -18031,6 +18090,9 @@ aarch64_override_options_internal (struct gcc_options *opts)
>>>>>        && opts->x_optimize >= aarch64_tune_params.prefetch->default_opt_level)
>>>>>      opts->x_flag_prefetch_loop_arrays = 1;
>>>>>
>>>>> +  if (opts->x_aarch64_veclibabi_type == aarch64_veclibabi_type_sleefgnu)
>>>>> +    aarch64_selected_veclibabi = aarch64_veclibabi_type_sleefgnu;
>>>>> +
>>>>>    aarch64_override_options_after_change_1 (opts);
>>>>>  }
>>>>>
>>>>> @@ -28085,6 +28147,10 @@ aarch64_libgcc_floating_mode_supported_p
>>>>>  #undef TARGET_CONST_ANCHOR
>>>>>  #define TARGET_CONST_ANCHOR 0x1000000
>>>>>
>>>>> +#undef TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
>>>>> +#define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
>>>>> +  aarch64_builtin_vectorized_function
>>>>> +
>>>>>  struct gcc_target targetm = TARGET_INITIALIZER;
>>>>>
>>>>>  #include "gt-aarch64.h"
>>>>> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
>>>>> index 1d7967db9c0..76013dacdea 100644
>>>>> --- a/gcc/config/aarch64/aarch64.opt
>>>>> +++ b/gcc/config/aarch64/aarch64.opt
>>>>> @@ -302,3 +302,18 @@ Constant memset size in bytes from which to start using MOPS sequence.
>>>>>  -param=aarch64-vect-unroll-limit=
>>>>>  Target Joined UInteger Var(aarch64_vect_unroll_limit) Init(4) Param
>>>>>  Limit how much the autovectorizer may unroll a loop.
>>>>> +
>>>>> +;; -mveclibabi=
>>>>> +TargetVariable
>>>>> +enum aarch64_veclibabi aarch64_veclibabi_type = aarch64_veclibabi_type_none
>>>>> +
>>>>> +mveclibabi=
>>>>> +Target RejectNegative Joined Var(aarch64_veclibabi_type) Enum(aarch64_veclibabi) Init(aarch64_veclibabi_type_none)
>>>>> +Vector library ABI to use.
>>>>> +
>>>>> +Enum
>>>>> +Name(aarch64_veclibabi) Type(enum aarch64_veclibabi)
>>>>> +Known vectorization library ABIs (for use with the -mveclibabi= option):
>>>>> +
>>>>> +EnumValue
>>>>> +Enum(aarch64_veclibabi) String(sleefgnu) Value(aarch64_veclibabi_type_sleefgnu)
>>>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>>>> index a38547f53e5..71fbbf27522 100644
>>>>> --- a/gcc/doc/invoke.texi
>>>>> +++ b/gcc/doc/invoke.texi
>>>>> @@ -20383,6 +20383,21 @@ across releases.
>>>>>
>>>>>  This option is only intended to be useful when developing GCC.
>>>>>
>>>>> +@opindex mveclibabi
>>>>> +@item -mveclibabi=@var{type}
>>>>> +Specifies the ABI type to use for vectorizing intrinsics using an
>>>>> +external library.  The only type supported at present is @samp{sleefgnu},
>>>>> +which selects the GNU ABI variant of the SLEEF Vectorized Math Library.
>>>>> +This flag can be used for both Advanced SIMD (NEON) and SVE.
>>>>> +
>>>>> +GCC currently emits vectorized calls to @code{exp}, @code{log}, @code{log10},
>>>>> +@code{sin}, @code{cos}, @code{tan}, @code{asin}, @code{acos}, @code{atan},
>>>>> +@code{sinh}, @code{cosh}, @code{tanh}, @code{asinh}, @code{acosh}, @code{atanh},
>>>>> +@code{cbrt}, @code{pow} and @code{atan2} when possible and profitable on AArch64.
>>>>> +
>>>>> +Both @option{-ftree-vectorize} and @option{-funsafe-math-optimizations}
>>>>> +must also be enabled, and the SLEEF GNU ABI library must be linked in.
>>>>> +
>>>>>  @opindex mverbose-cost-dump
>>>>>  @item -mverbose-cost-dump
>>>>>  Enable verbose cost model dumping in the debug dump files. This option is
>>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c
>>>>> new file mode 100644
>>>>> index 00000000000..e9f6078cd12
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c
>>>>> @@ -0,0 +1,16 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O3 -march=armv8-a+simd -ftree-vectorize -mveclibabi=sleefgnu -ffast-math" } */
>>>>> +
>>>>> +extern float sinf(float);
>>>>> +
>>>>> +float x[256];
>>>>> +
>>>>> +void foo(void)
>>>>> +{
>>>>> +  int i;
>>>>> +
>>>>> +  for (i=0; i<256; ++i)
>>>>> +    x[i] = sinf(x[i]);
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler "_ZGVnN4v_sinf" } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c
>>>>> new file mode 100644
>>>>> index 00000000000..8319ae420e1
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c
>>>>> @@ -0,0 +1,16 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O3 -march=armv8-a+sve -ftree-vectorize -mveclibabi=sleefgnu -ffast-math" } */
>>>>> +
>>>>> +extern float sinf(float);
>>>>> +
>>>>> +float x[256];
>>>>> +
>>>>> +void foo(void)
>>>>> +{
>>>>> +  int i;
>>>>> +
>>>>> +  for (i=0; i<256; ++i)
>>>>> +    x[i] = sinf(x[i]);
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler "_ZGVsNxv_sinf" } } */
>>>>> --
>>>>> 2.25.1
>>>>>
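
For readers following the header-based alternative that Andrew Pinski suggests
and Lou evaluates above, a minimal sketch of what such a redirection header
could look like for Advanced SIMD is shown below. It is illustrative only and
not part of the patch: the header name is made up, it assumes compilation with
-fopenmp-simd (or -fopenmp) so the pragma is honored, it assumes SLEEF's GNUABI
library provides the 128-bit entry point _ZGVnN4v_sinf (the same symbol the
NEON testcase scans for), and in practice -ffast-math or at least
-fno-math-errno would also be wanted, per the errno discussion above. SVE
cannot be reached this way until #pragma omp declare simd gains SVE support,
which is the gap the patch and the RFCs in this thread address.

/* sleef-neon-redirect.h -- illustrative sketch only, not part of the patch.
   With -fopenmp-simd, the declare simd pragma tells GCC that GNU vector ABI
   variants of sinf exist, so the autovectorizer may emit calls to the
   Advanced SIMD entry point _ZGVnN4v_sinf from SLEEF's GNUABI library
   instead of scalar sinf.  simdlen(4) restricts this to the 128-bit
   (4 x float) variant; notinbranch means no masked variant is assumed.  */
#pragma omp declare simd simdlen(4) notinbranch
extern float sinf (float);

/* Usage: with -O3 -fopenmp-simd (plus -ffast-math, matching the testcases
   above) and this declaration visible, the loop below can be vectorized
   into calls to _ZGVnN4v_sinf.  */
float x[256];

void
foo (void)
{
  for (int i = 0; i < 256; ++i)
    x[i] = sinf (x[i]);
}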