From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pj1-x1032.google.com (mail-pj1-x1032.google.com [IPv6:2607:f8b0:4864:20::1032]) by sourceware.org (Postfix) with ESMTPS id E488B385840A for ; Fri, 14 Apr 2023 07:07:51 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E488B385840A Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-pj1-x1032.google.com with SMTP id my14-20020a17090b4c8e00b0024708e8e2ddso6889500pjb.4 for ; Fri, 14 Apr 2023 00:07:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1681456070; x=1684048070; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=fLLfI+KAotQebBG2j1c41w4VLDy2tvQWD2PaXRbhJr0=; b=T3D+Cpx0RLjd5HqDUDb8HQrhB03k4nWPbiE5OmloseqLfxqcOO4ERCCgiYrQWxTZKW PoZU0u4k9RskSaTDcCNwpJZxugvUKheDJLB6BlpmSBGS8SegY1msFmXtcwNEcjq/oIx5 vK3w6fGxf4VD9fJ1rbaHsGY6Jr23IT+GZFP+9LogFMDGHuKFmEXXBC0hIZq/kt6FFzry hNtukO38Ov2DMptnX6dMoSkO/i0Hr5VRxXrQS7f/va1qsfpRWBbSpcsVc7H7BS2NAQLN Mh19odLrNGqB1EhfANWt8fKJo6o0dRV5SFPMzDRh6+XJprWmAOVn2OgCV2waLG5IrViy ZSJw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681456070; x=1684048070; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=fLLfI+KAotQebBG2j1c41w4VLDy2tvQWD2PaXRbhJr0=; b=N6ykcHpLrjrQS2Me1VxWHlWGy3BBDyFAjY83xwKo3QRKm6UnSgoRXrCjBfbQDTlCuT VRRiZyoD9ncJHn9lQxwnHtuTOzrj6T6D2JVyVAlmq28rUo6Ejr6YzDQ0tXvpFa07t/N6 3hGQOruoItoTh78YLQ6wqT5F0APy987gnl7TxctzqZjVjUoE00fKoojuirMfPHnkZU/a ErZC22xMB3cuT7Ezb72vZW8AqrEITerFXKVyJxwcBSrdLEfCisCxzyqmX5oZB/NgTrKT h/yo8kjizkSYqvNQexmGWMgY+3bWfT5zhNNNqwsULLyY5fOk6XYHwlJ6jDd02HkwNjVo 0bmw== X-Gm-Message-State: AAQBX9e5xjhTWOv6f/0T5ewJ3zGEsEz28KRAQZ5svYGTnTAzpzmwe8cY y0a4mmPHt6NVs2L3uouv1+H6kAk1EqKr48Tjs5De1IEL97I= X-Google-Smtp-Source: AKy350Z5b6CCEndEzezfeHVFSOx8UDBxjJ8SBejG8VqJaWSVVngiRwXrf2xaEPOwwNWuWiD+cwSFOgIX3UW0CNNFijg= X-Received: by 2002:a17:902:c943:b0:1a6:6e78:9f7d with SMTP id i3-20020a170902c94300b001a66e789f7dmr1755862pla.49.1681456070113; Fri, 14 Apr 2023 00:07:50 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Andrew Pinski Date: Fri, 14 Apr 2023 00:07:38 -0700 Message-ID: Subject: Re: [PATCH] aarch64: Add -mveclibabi=sleefgnu To: Lou Knauer Cc: "gcc-patches@gcc.gnu.org" , Etienne Renault Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-7.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, Apr 14, 2023 at 12:03=E2=80=AFAM Lou Knauer via Gcc-patches wrote: > > This adds support for the -mveclibabi option to the AArch64 backend of GC= C by > implementing the builtin_vectorized_function target hook for AArch64. > The SLEEF Vectorized Math Library's GNUABI interface is used, and > NEON/Advanced SIMD as well as SVE are supported. > > This was tested on the gcc testsuite and the llvm-test-suite on a AArch64 > host for NEON and SVE as well as on hand-written benchmarks. Where the > vectorization of builtins was applied successfully in loops bound by the > calls to those, significant (>2) performance gains can be observed. This is so wrong and it is better if you actually just used a header file instead. Specifically the openmp vect pragmas. Thanks, Andrew Pinski > > gcc/ChangeLog: > > * config/aarch64/aarch64.opt: Add -mveclibabi option. > * config/aarch64/aarch64-opts.h: Add aarch64_veclibabi enum. > * config/aarch64/aarch64-protos.h: Add > aarch64_builtin_vectorized_function declaration. > * config/aarch64/aarch64.cc: Handle -mveclibabi option and pure > scalable type info for scalable vectors without "SVE type" attrib= utes. > * config/aarch64/aarch64-builtins.cc: Add > aarch64_builtin_vectorized_function definition. > * doc/invoke.texi: Document -mveclibabi for AArch64 targets. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c: New testcase. > * gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c: New testcase. > --- > gcc/config/aarch64/aarch64-builtins.cc | 113 ++++++++++++++++++ > gcc/config/aarch64/aarch64-opts.h | 5 + > gcc/config/aarch64/aarch64-protos.h | 3 + > gcc/config/aarch64/aarch64.cc | 66 ++++++++++ > gcc/config/aarch64/aarch64.opt | 15 +++ > gcc/doc/invoke.texi | 15 +++ > .../aarch64/vect-vecabi-sleefgnu-neon.c | 16 +++ > .../aarch64/vect-vecabi-sleefgnu-sve.c | 16 +++ > 8 files changed, 249 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu= -neon.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu= -sve.c > > diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/= aarch64-builtins.cc > index cc6b7c01fd1..f53fa91b8d0 100644 > --- a/gcc/config/aarch64/aarch64-builtins.cc > +++ b/gcc/config/aarch64/aarch64-builtins.cc > @@ -47,6 +47,7 @@ > #include "stringpool.h" > #include "attribs.h" > #include "gimple-fold.h" > +#include "builtins.h" > > #define v8qi_UP E_V8QImode > #define v8di_UP E_V8DImode > @@ -3450,6 +3451,118 @@ aarch64_resolve_overloaded_builtin_general (locat= ion_t loc, tree function, > return NULL_TREE; > } > > +/* The vector library abi to use, if any. */ > +extern enum aarch64_veclibabi aarch64_selected_veclibabi; > + > +/* Returns a function declaration for a vectorized version of the combin= ed > + function with combined_fn code FN and the result vector type TYPE. > + NULL_TREE is returned if there is none available. */ > +tree > +aarch64_builtin_vectorized_function (unsigned int fn_code, > + tree type_out, tree type_in) > +{ > + if (TREE_CODE (type_out) !=3D VECTOR_TYPE > + || TREE_CODE (type_in) !=3D VECTOR_TYPE > + || aarch64_selected_veclibabi !=3D aarch64_veclibabi_type_sleefgnu > + || !flag_unsafe_math_optimizations) > + return NULL_TREE; > + > + machine_mode mode =3D TYPE_MODE (TREE_TYPE (type_out)); > + poly_uint64 n =3D TYPE_VECTOR_SUBPARTS (type_out); > + if (mode !=3D TYPE_MODE (TREE_TYPE (type_in)) > + || !known_eq (n, TYPE_VECTOR_SUBPARTS (type_in))) > + return NULL_TREE; > + > + bool is_scalable =3D !n.is_constant (); > + if (is_scalable) > + { > + /* SVE is needed for scalable vectors, a SVE register's size is > + always a multiple of 128. */ > + if (!TARGET_SVE > + || (mode =3D=3D DFmode && !known_eq (n, poly_uint64 (2, 2))) > + || (mode =3D=3D SFmode && !known_eq (n, poly_uint64 (4, 4)))) > + return NULL_TREE; > + } > + else > + { > + /* A NEON register can hold two doubles or one float. */ > + if (!TARGET_SIMD > + || (mode =3D=3D DFmode && n.to_constant () !=3D 2) > + || (mode =3D=3D SFmode && n.to_constant () !=3D 4)) > + return NULL_TREE; > + } > + > + tree fntype; > + combined_fn fn =3D combined_fn (fn_code); > + const char *argencoding; > + switch (fn) > + { > + CASE_CFN_EXP: > + CASE_CFN_LOG: > + CASE_CFN_LOG10: > + CASE_CFN_TANH: > + CASE_CFN_TAN: > + CASE_CFN_ATAN: > + CASE_CFN_ATANH: > + CASE_CFN_CBRT: > + CASE_CFN_SINH: > + CASE_CFN_SIN: > + CASE_CFN_ASINH: > + CASE_CFN_ASIN: > + CASE_CFN_COSH: > + CASE_CFN_COS: > + CASE_CFN_ACOSH: > + CASE_CFN_ACOS: > + fntype =3D build_function_type_list (type_out, type_in, NULL); > + argencoding =3D "v"; > + break; > + > + CASE_CFN_POW: > + CASE_CFN_ATAN2: > + fntype =3D build_function_type_list (type_out, type_in, type_in, = NULL); > + argencoding =3D "vv"; > + break; > + > + default: > + return NULL_TREE; > + } > + > + tree fndecl =3D mathfn_built_in (mode =3D=3D DFmode > + ? double_type_node : float_type_node, fn= ); > + const char *scalar_name =3D IDENTIFIER_POINTER (DECL_NAME (fndecl)); > + /* Builtins will always be prefixed with '__builtin_'. */ > + gcc_assert (strncmp (scalar_name, "__builtin_", 10) =3D=3D 0); > + scalar_name +=3D 10; > + > + char vectorized_name[32]; > + if (is_scalable) > + { > + /* SVE ISA */ > + int n =3D snprintf (vectorized_name, sizeof (vectorized_name), > + "_ZGVsNx%s_%s", argencoding, scalar_name); > + if (n < 0 || n > sizeof (vectorized_name)) > + return NULL_TREE; > + } > + else > + { > + /* NEON ISA */ > + int n =3D snprintf (vectorized_name, sizeof (vectorized_name), > + "_ZGVnN%d%s_%s", mode =3D=3D SFmode ? 4 : 2, > + argencoding, scalar_name); > + if (n < 0 || n > sizeof (vectorized_name)) > + return NULL_TREE; > + } > + > + tree new_fndecl =3D build_decl (BUILTINS_LOCATION, FUNCTION_DECL, > + get_identifier (vectorized_name), fntype)= ; > + TREE_PUBLIC (new_fndecl) =3D 1; > + TREE_READONLY (new_fndecl) =3D 1; > + DECL_EXTERNAL (new_fndecl) =3D 1; > + DECL_IS_NOVOPS (new_fndecl) =3D 1; > + > + return new_fndecl; > +} > + > #undef AARCH64_CHECK_BUILTIN_MODE > #undef AARCH64_FIND_FRINT_VARIANT > #undef CF0 > diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch= 64-opts.h > index a9f3e2715ca..d12871b893c 100644 > --- a/gcc/config/aarch64/aarch64-opts.h > +++ b/gcc/config/aarch64/aarch64-opts.h > @@ -98,4 +98,9 @@ enum aarch64_key_type { > AARCH64_KEY_B > }; > > +enum aarch64_veclibabi { > + aarch64_veclibabi_type_none, > + aarch64_veclibabi_type_sleefgnu > +}; > + > #endif > diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aar= ch64-protos.h > index 63339fa47df..53c6e455da8 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -1066,4 +1066,7 @@ extern bool aarch64_harden_sls_blr_p (void); > > extern void aarch64_output_patchable_area (unsigned int, bool); > > +extern tree aarch64_builtin_vectorized_function (unsigned int fn, > + tree type_out, tree type= _in); > + > #endif /* GCC_AARCH64_PROTOS_H */ > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.c= c > index 42617ced73a..50ac37ff01e 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -84,6 +84,7 @@ > #include "aarch64-feature-deps.h" > #include "config/arm/aarch-common.h" > #include "config/arm/aarch-common-protos.h" > +#include "print-tree.h" > > /* This file should be included last. */ > #include "target-def.h" > @@ -2951,6 +2952,62 @@ pure_scalable_type_info::analyze (const_tree type) > return IS_PST; > } > > + /* Only functions and types that are part of the ARM C Language > + Extensions (arm_sve.h) have the SVE type attributes. > + The auto-vectorizer does not annotate the vector types it creates w= ith > + those attributes. With the support of vectorized libm function > + builtins for SVE, scalable vectors without special attributes > + have to be treated as well. */ > + if (TREE_CODE (type) =3D=3D VECTOR_TYPE > + && !TYPE_VECTOR_SUBPARTS (type).is_constant ()) > + { > + /* Boolean vectors are special because they are used by > + the vectorizer as masks that must go into the > + predicate registers. */ > + if (TREE_CODE (TREE_TYPE (type)) =3D=3D BOOLEAN_TYPE) > + { > + p.num_zr =3D 0; > + p.num_pr =3D 1; > + p.mode =3D p.orig_mode =3D TYPE_MODE (type); > + add_piece (p); > + return IS_PST; > + } > + > + static const struct { > + machine_mode mode; > + unsigned int element_size; > + poly_uint64 vector_size; > + } valid_vectors[] =3D { > + { VNx8BFmode, 16, poly_uint64 (8, 8) }, /* svbfloat16_t */ > + { VNx8HFmode, 16, poly_uint64 (8, 8) }, /* svfloat16_t */ > + { VNx4SFmode, 32, poly_uint64 (4, 4) }, /* svfloat32_t */ > + { VNx2DFmode, 64, poly_uint64 (2, 2) }, /* svfloat64_t */ > + { VNx16BImode, 8, poly_uint64 (16, 16) }, /* sv[u]int8_t */ > + { VNx8HImode, 16, poly_uint64 (8, 8) }, /* sv[u]int16_t */ > + { VNx4SImode, 32, poly_uint64 (4, 4) }, /* sv[u]int32_t */ > + { VNx2DImode, 64, poly_uint64 (2, 2) }, /* sv[u]int64_t */ > + }; > + > + machine_mode elm_mode =3D TYPE_MODE (TREE_TYPE (type)); > + unsigned int elm_size =3D GET_MODE_BITSIZE (elm_mode).to_constant = (); > + for (unsigned i =3D 0; > + i < sizeof (valid_vectors) / sizeof (valid_vectors[0]); i++) > + if (valid_vectors[i].element_size =3D=3D elm_size > + && valid_vectors[i].mode =3D=3D TYPE_MODE (type) > + && known_eq (valid_vectors[i].vector_size, > + TYPE_VECTOR_SUBPARTS (type))) > + { > + p.num_zr =3D 1; > + p.num_pr =3D 0; > + p.mode =3D p.orig_mode =3D valid_vectors[i].mode; > + add_piece (p); > + return IS_PST; > + } > + > + fatal_error (input_location, "unsupported vector type %qT" > + " as function parameter without SVE attributes", type)= ; > + } > + > /* Check for user-defined PSTs. */ > if (TREE_CODE (type) =3D=3D ARRAY_TYPE) > return analyze_array (type); > @@ -17851,6 +17908,8 @@ aarch64_override_options_after_change_1 (struct g= cc_options *opts) > flag_mrecip_low_precision_sqrt =3D true; > } > > +enum aarch64_veclibabi aarch64_selected_veclibabi =3D aarch64_veclibabi_= type_none; > + > /* 'Unpack' up the internal tuning structs and update the options > in OPTS. The caller must have set up selected_tune and selected_arc= h > as all the other target-specific codegen decisions are > @@ -18031,6 +18090,9 @@ aarch64_override_options_internal (struct gcc_opt= ions *opts) > && opts->x_optimize >=3D aarch64_tune_params.prefetch->default_opt= _level) > opts->x_flag_prefetch_loop_arrays =3D 1; > > + if (opts->x_aarch64_veclibabi_type =3D=3D aarch64_veclibabi_type_sleef= gnu) > + aarch64_selected_veclibabi =3D aarch64_veclibabi_type_sleefgnu; > + > aarch64_override_options_after_change_1 (opts); > } > > @@ -28085,6 +28147,10 @@ aarch64_libgcc_floating_mode_supported_p > #undef TARGET_CONST_ANCHOR > #define TARGET_CONST_ANCHOR 0x1000000 > > +#undef TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION > +#define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \ > + aarch64_builtin_vectorized_function > + > struct gcc_target targetm =3D TARGET_INITIALIZER; > > #include "gt-aarch64.h" > diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.= opt > index 1d7967db9c0..76013dacdea 100644 > --- a/gcc/config/aarch64/aarch64.opt > +++ b/gcc/config/aarch64/aarch64.opt > @@ -302,3 +302,18 @@ Constant memset size in bytes from which to start us= ing MOPS sequence. > -param=3Daarch64-vect-unroll-limit=3D > Target Joined UInteger Var(aarch64_vect_unroll_limit) Init(4) Param > Limit how much the autovectorizer may unroll a loop. > + > +;; -mveclibabi=3D > +TargetVariable > +enum aarch64_veclibabi aarch64_veclibabi_type =3D aarch64_veclibabi_type= _none > + > +mveclibabi=3D > +Target RejectNegative Joined Var(aarch64_veclibabi_type) Enum(aarch64_ve= clibabi) Init(aarch64_veclibabi_type_none) > +Vector library ABI to use. > + > +Enum > +Name(aarch64_veclibabi) Type(enum aarch64_veclibabi) > +Known vectorization library ABIs (for use with the -mveclibabi=3D option= ): > + > +EnumValue > +Enum(aarch64_veclibabi) String(sleefgnu) Value(aarch64_veclibabi_type_sl= eefgnu) > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index a38547f53e5..71fbbf27522 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -20383,6 +20383,21 @@ across releases. > > This option is only intended to be useful when developing GCC. > > +@opindex mveclibabi > +@item -mveclibabi=3D@var{type} > +Specifies the ABI type to use for vectorizing intrinsics using an > +external library. The only type supported at present is @samp{sleefgnu}= , > +which specifies to use the GNU ABI variant of the Sleef Vectorized > +Math Library. This flag can be used for both, Advanced SIMD (NEON) and S= VE. > + > +GCC currently emits vectorized calls to @code{exp}, @code{log}, @code{lo= g10}, > +@code{tanh}, @code{tan}, @code{atan}, @code{atanh}, @code{cbrt}, @code{s= inh}, > +@code{sin}, @code{asinh} and @code{asin} when possible and profitable > +on AArch64. > + > +Both @option{-ftree-vectorize} and @option{-funsafe-math-optimizations} > +must also be enabled. The libsleefgnu must be specified at link time. > + > @opindex mverbose-cost-dump > @item -mverbose-cost-dump > Enable verbose cost model dumping in the debug dump files. This option = is > diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c= b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c > new file mode 100644 > index 00000000000..e9f6078cd12 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -march=3Darmv8-a+simd -ftree-vectorize -mveclibabi= =3Dsleefgnu -ffast-math" } */ > + > +extern float sinf(float); > + > +float x[256]; > + > +void foo(void) > +{ > + int i; > + > + for (i=3D0; i<256; ++i) > + x[i] =3D sinf(x[i]); > +} > + > +/* { dg-final { scan-assembler "_ZGVnN4v_sinf" } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c = b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c > new file mode 100644 > index 00000000000..8319ae420e1 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -march=3Darmv8-a+sve -ftree-vectorize -mveclibabi= =3Dsleefgnu -ffast-math" } */ > + > +extern float sinf(float); > + > +float x[256]; > + > +void foo(void) > +{ > + int i; > + > + for (i=3D0; i<256; ++i) > + x[i] =3D sinf(x[i]); > +} > + > +/* { dg-final { scan-assembler "_ZGVsNxv_sinf" } } */ > -- > 2.25.1 >