From: Richard Sandiford
To: Richard Ball
Cc: gcc-patches@gcc.gnu.org, Richard Earnshaw, Kyrylo Tkachov,
    Marcus Shawcroft
Subject: Re: [PATCH v3] aarch64: SVE/NEON Bridging intrinsics
Date: Wed, 22 Nov 2023 14:52:58 +0000
In-Reply-To: (Richard Ball's message of "Thu, 9 Nov 2023 16:14:50 +0000")

Richard Ball writes:
> ACLE has added intrinsics to bridge between SVE and Neon.
>
> The NEON_SVE Bridge adds intrinsics that allow conversions between NEON
> and SVE vectors.
>
> This patch adds support to GCC for the following 3 intrinsics:
> svset_neonq, svget_neonq and svdup_neonq
>
> gcc/ChangeLog:
>
> 	* config.gcc: Adds new header to config.
> 	* config/aarch64/aarch64-builtins.cc (enum aarch64_type_qualifiers):
> 	Moved to header file.
> 	(ENTRY): Likewise.
> 	(enum aarch64_simd_type): Likewise.
> 	(struct aarch64_simd_type_info): Make extern.
> 	(GTY): Likewise.
> 	* config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64):
> 	Defines pragma for arm_neon_sve_bridge.h.
> 	* config/aarch64/aarch64-protos.h: New function.
> 	* config/aarch64/aarch64-sve-builtins-base.h: New intrinsics.
> 	* config/aarch64/aarch64-sve-builtins-base.cc
> 	(class svget_neonq_impl): New intrinsic implementation.
> 	(class svset_neonq_impl): Likewise.
> 	(class svdup_neonq_impl): Likewise.
> 	(NEON_SVE_BRIDGE_FUNCTION): New intrinsics.
> 	* config/aarch64/aarch64-sve-builtins-functions.h
> 	(NEON_SVE_BRIDGE_FUNCTION): Defines macro for NEON_SVE_BRIDGE
> 	functions.
> 	* config/aarch64/aarch64-sve-builtins-shapes.h: New shapes.
> 	* config/aarch64/aarch64-sve-builtins-shapes.cc
> 	(parse_element_type): Add NEON element types.
> 	(parse_type): Likewise.
> 	(struct get_neonq_def): Defines function shape for get_neonq.
> 	(struct set_neonq_def): Defines function shape for set_neonq.
> 	(struct dup_neonq_def): Defines function shape for dup_neonq.
> 	* config/aarch64/aarch64-sve-builtins.cc (DEF_SVE_TYPE_SUFFIX):
> 	(DEF_SVE_NEON_TYPE_SUFFIX): Defines macro for NEON_SVE_BRIDGE
> 	type suffixes.
> 	(DEF_NEON_SVE_FUNCTION): Defines macro for NEON_SVE_BRIDGE
> 	functions.
> 	(function_resolver::infer_neon128_vector_type): Infers type suffix
> 	for overloaded functions.
> 	(init_neon_sve_builtins): Initialise neon_sve_bridge_builtins
> 	for LTO.
> 	(handle_arm_neon_sve_bridge_h): Handles #pragma
> 	arm_neon_sve_bridge.h.
> 	* config/aarch64/aarch64-sve-builtins.def
> 	(DEF_SVE_NEON_TYPE_SUFFIX): Macro for handling neon_sve type
> 	suffixes.
> 	(bf16): Replace entry with neon-sve entry.
> 	(f16): Likewise.
> 	(f32): Likewise.
> 	(f64): Likewise.
> 	(s8): Likewise.
> 	(s16): Likewise.
> 	(s32): Likewise.
> 	(s64): Likewise.
> 	(u8): Likewise.
> 	(u16): Likewise.
> 	(u32): Likewise.
> 	(u64): Likewise.
> 	* config/aarch64/aarch64-sve-builtins.h
> 	(GCC_AARCH64_SVE_BUILTINS_H): Include aarch64-builtins.h.
> 	(ENTRY): Add aarch64_simd_type definition.
> 	(enum aarch64_simd_type): Add neon information to type_suffix_info.
> 	(struct type_suffix_info): New function.
> 	* config/aarch64/aarch64-sve.md
> 	(@aarch64_sve_get_neonq_<mode>): New intrinsic insn for big endian.
> 	(@aarch64_sve_set_neonq_<mode>): Likewise.
> 	(@aarch64_sve_dup_neonq_<mode>): Likewise.
> 	* config/aarch64/aarch64.cc
> 	(aarch64_init_builtins): Add call to init_neon_sve_builtins.
> 	(aarch64_output_sve_set_neonq): Asm output for big-endian
> 	set_neonq.
> 	* config/aarch64/iterators.md: Add UNSPEC_SET_NEONQ.
> 	* config/aarch64/aarch64-builtins.h: New file.
> 	* config/aarch64/aarch64-neon-sve-bridge-builtins.def: New file.
> 	* config/aarch64/arm_neon_sve_bridge.h: New file.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Add include of
> 	the arm_neon_sve_bridge header file.
> 	* gcc.dg/torture/neon-sve-bridge.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_bf16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_f16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_f32.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_f64.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_s16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_s32.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_s64.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_s8.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_u16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_u32.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_u64.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_u8.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_bf16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_f16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_f32.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_f64.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_s16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_s32.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_s64.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_s8.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_u16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_u32.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_u64.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_u8.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_bf16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_f16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_f32.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_f64.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_s16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_s32.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_s64.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_s8.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_u16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_u32.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_u64.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_u8.c: New test.
> 	* gcc.target/aarch64/sve/acle/general-c/dup_neonq_1.c: New test.
> 	* gcc.target/aarch64/sve/acle/general-c/get_neonq_1.c: New test.
> 	* gcc.target/aarch64/sve/acle/general-c/set_neonq_1.c: New test.

Thanks, looks good.  Some comments below, but nothing major.

>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index d34ea246a980b5d8aaab86e4459de5ef4d341fe2..1c92c390e9b1b14d2f756ec233bba713ca8aaa94 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -345,7 +345,7 @@ m32c*-*-*)
>  	;;
>  aarch64*-*-*)
>  	cpu_type=aarch64
> -	extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h"
> +	extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h arm_neon_sve_bridge.h"
>  	c_target_objs="aarch64-c.o"
>  	cxx_target_objs="aarch64-c.o"
>  	d_target_objs="aarch64-d.o"
> diff --git a/gcc/config/aarch64/aarch64-builtins.h b/gcc/config/aarch64/aarch64-builtins.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..ec4580981587ab3acbb39e0b0721ed247e309a74
> --- /dev/null
> +++ b/gcc/config/aarch64/aarch64-builtins.h
> @@ -0,0 +1,86 @@
> +/* Builtins' description for AArch64 SIMD architecture.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of GCC.
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */

Please keep the copyright text verbatim, including the blank lines.

(aarch64-neon-sve-bridge-builtins.def looks good.)

> +#ifndef GCC_AARCH64_BUILTINS_H
> +#define GCC_AARCH64_BUILTINS_H
> +
> +enum aarch64_type_qualifiers
> +{
> +  /* T foo.  */
> +  qualifier_none = 0x0,
> +  /* unsigned T foo.  */
> +  qualifier_unsigned = 0x1, /* 1 << 0 */
> +  /* const T foo.  */
> +  qualifier_const = 0x2, /* 1 << 1 */
> +  /* T *foo.  */
> +  qualifier_pointer = 0x4, /* 1 << 2 */
> +  /* Used when expanding arguments if an operand could
> +     be an immediate.  */
> +  qualifier_immediate = 0x8, /* 1 << 3 */
> +  qualifier_maybe_immediate = 0x10, /* 1 << 4 */
> +  /* void foo (...).  */
> +  qualifier_void = 0x20, /* 1 << 5 */
> +  /* 1 << 6 is now unused */
> +  /* Some builtins should use the T_*mode* encoded in a simd_builtin_datum
> +     rather than using the type of the operand.  */
> +  qualifier_map_mode = 0x80, /* 1 << 7 */
> +  /* qualifier_pointer | qualifier_map_mode */
> +  qualifier_pointer_map_mode = 0x84,
> +  /* qualifier_const | qualifier_pointer | qualifier_map_mode */
> +  qualifier_const_pointer_map_mode = 0x86,
> +  /* Polynomial types.  */
> +  qualifier_poly = 0x100,
> +  /* Lane indices - must be in range, and flipped for bigendian.  */
> +  qualifier_lane_index = 0x200,
> +  /* Lane indices for single lane structure loads and stores.  */
> +  qualifier_struct_load_store_lane_index = 0x400,
> +  /* Lane indices selected in pairs. - must be in range, and flipped for
> +     bigendian.  */
> +  qualifier_lane_pair_index = 0x800,
> +  /* Lane indices selected in quadtuplets. - must be in range, and flipped
> +     for bigendian.  */
> +  qualifier_lane_quadtup_index = 0x1000,
> +};
> +#define ENTRY(E, M, Q, G) E,
> +enum aarch64_simd_type
> +{
> +#include "aarch64-simd-builtin-types.def"
> +  ARM_NEON_H_TYPES_LAST
> +};
> +#undef ENTRY
> +struct GTY(()) aarch64_simd_type_info
> +{
> +  enum aarch64_simd_type type;
> +  /* Internal type name.  */
> +  const char *name;
> +  /* Internal type name(mangled).  The mangled names conform to the
> +     AAPCS64 (see "Procedure Call Standard for the ARM 64-bit Architecture",
> +     Appendix A).  To qualify for emission with the mangled names defined in
> +     that document, a vector type must not only be of the correct mode but also
> +     be of the correct internal AdvSIMD vector type (e.g. __Int8x8_t); these
> +     types are registered by aarch64_init_simd_builtin_types ().  In other
> +     words, vector types defined in other ways e.g. via vector_size attribute
> +     will get default mangled names.  */
> +  const char *mangle;
> +  /* Internal type.  */
> +  tree itype;
> +  /* Element type.  */
> +  tree eltype;
> +  /* Machine mode the internal type maps to.  */
> +  enum machine_mode mode;
> +  /* Qualifiers.  */
> +  enum aarch64_type_qualifiers q;
> +};

Sorry for the trivia, but: I thought the blank lines in the original
aarch64_simd_type_info made this easier to read.

> +extern aarch64_simd_type_info aarch64_simd_types[];
> +#endif
> \ No newline at end of file
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
> index 04f59fd9a54306d6422b03e32dce79bc00aed4f8..0b039c075a5cb312339729d388c9be0072f80b91 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -47,6 +47,7 @@
>  #include "stringpool.h"
>  #include "attribs.h"
>  #include "gimple-fold.h"
> +#include "aarch64-builtins.h"
>
>  #define v8qi_UP  E_V8QImode
>  #define v8di_UP  E_V8DImode
> @@ -183,47 +184,8 @@
>  #define SIMD_INTR_QUAL(suffix) QUAL_##suffix
>  #define SIMD_INTR_LENGTH_CHAR(length) LENGTH_##length
> -
>  #define SIMD_MAX_BUILTIN_ARGS 5
>
> -enum aarch64_type_qualifiers
> -{
> -  /* T foo.  */
> -  qualifier_none = 0x0,
> -  /* unsigned T foo.  */
> -  qualifier_unsigned = 0x1, /* 1 << 0 */
> -  /* const T foo.  */
> -  qualifier_const = 0x2, /* 1 << 1 */
> -  /* T *foo.  */
> -  qualifier_pointer = 0x4, /* 1 << 2 */
> -  /* Used when expanding arguments if an operand could
> -     be an immediate.  */
> -  qualifier_immediate = 0x8, /* 1 << 3 */
> -  qualifier_maybe_immediate = 0x10, /* 1 << 4 */
> -  /* void foo (...).  */
> -  qualifier_void = 0x20, /* 1 << 5 */
> -  /* 1 << 6 is now unused */
> -  /* Some builtins should use the T_*mode* encoded in a simd_builtin_datum
> -     rather than using the type of the operand.  */
> -  qualifier_map_mode = 0x80, /* 1 << 7 */
> -  /* qualifier_pointer | qualifier_map_mode */
> -  qualifier_pointer_map_mode = 0x84,
> -  /* qualifier_const | qualifier_pointer | qualifier_map_mode */
> -  qualifier_const_pointer_map_mode = 0x86,
> -  /* Polynomial types.  */
> -  qualifier_poly = 0x100,
> -  /* Lane indices - must be in range, and flipped for bigendian.  */
> -  qualifier_lane_index = 0x200,
> -  /* Lane indices for single lane structure loads and stores.  */
> -  qualifier_struct_load_store_lane_index = 0x400,
> -  /* Lane indices selected in pairs. - must be in range, and flipped for
> -     bigendian.  */
> -  qualifier_lane_pair_index = 0x800,
> -  /* Lane indices selected in quadtuplets. - must be in range, and flipped
> -     for bigendian.  */
> -  qualifier_lane_quadtup_index = 0x1000,
> -};
> -
>  /* Flags that describe what a function might do.  */
>  const unsigned int FLAG_NONE = 0U;
>  const unsigned int FLAG_READ_FPCR = 1U << 0;
> @@ -883,47 +845,9 @@ const char *aarch64_scalar_builtin_types[] = {
>    NULL
>  };
>
> -#define ENTRY(E, M, Q, G) E,
> -enum aarch64_simd_type
> -{
> -#include "aarch64-simd-builtin-types.def"
> -  ARM_NEON_H_TYPES_LAST
> -};
> -#undef ENTRY
> -
> -struct GTY(()) aarch64_simd_type_info
> -{
> -  enum aarch64_simd_type type;
> -
> -  /* Internal type name.  */
> -  const char *name;
> -
> -  /* Internal type name(mangled).  The mangled names conform to the
> -     AAPCS64 (see "Procedure Call Standard for the ARM 64-bit Architecture",
> -     Appendix A).  To qualify for emission with the mangled names defined in
> -     that document, a vector type must not only be of the correct mode but also
> -     be of the correct internal AdvSIMD vector type (e.g. __Int8x8_t); these
> -     types are registered by aarch64_init_simd_builtin_types ().  In other
> -     words, vector types defined in other ways e.g. via vector_size attribute
> -     will get default mangled names.  */
> -  const char *mangle;
> -
> -  /* Internal type.  */
> -  tree itype;
> -
> -  /* Element type.  */
> -  tree eltype;
> -
> -  /* Machine mode the internal type maps to.  */
> -  enum machine_mode mode;
> -
> -  /* Qualifiers.  */
> -  enum aarch64_type_qualifiers q;
> -};
> -
>  #define ENTRY(E, M, Q, G) \
>    {E, "__" #E, #G "__" #E, NULL_TREE, NULL_TREE, E_##M##mode, qualifier_##Q},
> -static GTY(()) struct aarch64_simd_type_info aarch64_simd_types [] = {
> +extern GTY(()) struct aarch64_simd_type_info aarch64_simd_types [] = {
>  #include "aarch64-simd-builtin-types.def"
>  };
>  #undef ENTRY
> diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
> index ab8844f6049dc95b97648b651bfcd3a4ccd3ca0b..591cbaad24a4874029ebddedef23f22ff5196295 100644
> --- a/gcc/config/aarch64/aarch64-c.cc
> +++ b/gcc/config/aarch64/aarch64-c.cc
> @@ -295,6 +295,8 @@ aarch64_pragma_aarch64 (cpp_reader *)
>      handle_arm_neon_h ();
>    else if (strcmp (name, "arm_acle.h") == 0)
>      handle_arm_acle_h ();
> +  else if (strcmp (name, "arm_neon_sve_bridge.h") == 0)
> +    aarch64_sve::handle_arm_neon_sve_bridge_h ();
>    else
>      error ("unknown %<#pragma GCC aarch64%> option %qs", name);
>  }
> diff --git a/gcc/config/aarch64/aarch64-neon-sve-bridge-builtins.def b/gcc/config/aarch64/aarch64-neon-sve-bridge-builtins.def
> new file mode 100644
> index 0000000000000000000000000000000000000000..0c3cf233c9382b2f7420379054a53fa846d46c8c
> --- /dev/null
> +++ b/gcc/config/aarch64/aarch64-neon-sve-bridge-builtins.def
> @@ -0,0 +1,28 @@
> +/* Builtin lists for AArch64 NEON-SVE-Bridge
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef DEF_NEON_SVE_FUNCTION
> +#define DEF_NEON_SVE_FUNCTION(A, B, C, D)
> +#endif
> +
> +DEF_NEON_SVE_FUNCTION (svset_neonq, set_neonq, all_data, none)
> +DEF_NEON_SVE_FUNCTION (svget_neonq, get_neonq, all_data, none)
> +DEF_NEON_SVE_FUNCTION (svdup_neonq, dup_neonq, all_data, none)
> +
> +#undef DEF_NEON_SVE_FUNCTION
> \ No newline at end of file
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 60a55f4bc1956786ea687fc7cad7ec9e4a84e1f0..5d05cac51c237b12bd2b2f11eb91b01480750ded 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -817,6 +817,7 @@ char *aarch64_output_simd_mov_immediate (rtx, unsigned,
>  					 enum simd_immediate_check w = AARCH64_CHECK_MOV);
>  char *aarch64_output_sve_mov_immediate (rtx);
>  char *aarch64_output_sve_ptrues (rtx);
> +const char *aarch64_output_sve_set_neonq (rtx *, machine_mode);
>  bool aarch64_pad_reg_upward (machine_mode, const_tree, bool);
>  bool aarch64_regno_ok_for_base_p (int, bool);
>  bool aarch64_regno_ok_for_index_p (int, bool);
> @@ -990,7 +991,9 @@ void handle_arm_neon_h (void);
>
>  namespace aarch64_sve {
>    void init_builtins ();
> +  void init_neon_sve_builtins ();
>    void handle_arm_sve_h ();
> +  void handle_arm_neon_sve_bridge_h ();
>    tree builtin_decl (unsigned, bool);
>    bool builtin_type_p (const_tree);
>    bool builtin_type_p (const_tree, unsigned int *, unsigned int *);
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.h b/gcc/config/aarch64/aarch64-sve-builtins-base.h
> index d300e3a85d00b58ad790851a81d43af709b66bce..df75e4c1ecf81f3ddfa256edbcf8637d092fcfde 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.h
> @@ -299,6 +299,12 @@ namespace aarch64_sve
>      extern const function_base *const svzip2;
>      extern const function_base *const svzip2q;
>    }
> +  namespace neon_sve_bridge_functions
> +  {
> +    extern const function_base *const svset_neonq;
> +    extern const function_base *const svget_neonq;
> +    extern const function_base *const svdup_neonq;
> +  }
>  }
>
>  #endif
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 9010ecca6da23c107f5ded9ab3cfa678e308daf9..5e3b1fb19776a84710f2d730bc028614ecd54095 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -44,6 +44,7 @@
>  #include "aarch64-sve-builtins-shapes.h"
>  #include "aarch64-sve-builtins-base.h"
>  #include "aarch64-sve-builtins-functions.h"
> +#include "aarch64-builtins.h"
>  #include "ssa.h"
>  #include "gimple-fold.h"
>
> @@ -1064,6 +1065,131 @@ public:
>    }
>  };
>
> +class svget_neonq_impl : public function_base
> +{
> +public:
> +  gimple *
> +  fold (gimple_folder &f) const override
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      return NULL;
> +    tree rhs_tuple = gimple_call_arg (f.call, 0);
> +    tree rhs_vector = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs),
> +			      rhs_tuple, bitsize_int(128), bitsize_int(0));

Formatting nit: convention is to add a space before the "(128)" and "(0)".

The argument isn't a tuple, but instead an SVE vector.  Maybe just use
rhs_vector for both, or rhs_sve_vector for the first, etc.
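
I.e. something like this, just to show the two changes above (untested):

  tree rhs_sve_vector = gimple_call_arg (f.call, 0);
  tree rhs_vector = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs),
			    rhs_sve_vector, bitsize_int (128),
			    bitsize_int (0));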
> +    return gimple_build_assign (f.lhs, rhs_vector);
> +  }
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      {
> +	machine_mode mode = e.vector_mode (0);
> +	insn_code icode = code_for_aarch64_sve_get_neonq (mode);
> +	unsigned int nunits = 128 / GET_MODE_UNIT_BITSIZE (mode);
> +	rtx indices = aarch64_gen_stepped_int_parallel
> +	  (nunits, (nunits - 1) , -1);

Formatting: (nunits, nunits - 1, -1);

> +
> +	e.add_output_operand (icode);
> +	e.add_input_operand (icode, e.args[0]);
> +	e.add_fixed_operand (indices);
> +	return e.generate_insn (icode);
> +      }
> +    return simplify_gen_subreg (e.vector_mode (0), e.args[0],
> +				GET_MODE (e.args[0]),

e.vector_mode (0) is the mode of the argument rather than the mode of
the result.

> +				INTVAL (e.args[1]) * BYTES_PER_SVE_VECTOR);

There is no argument 1.  I think the final simplify_gen_subreg argument
should just be zero.

It's hard to test this with the fold in place, but it would be good to
try the tests with the fold disabled.

> +  }
> +};
> +
> +class svset_neonq_impl : public function_base
> +{
> +public:
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    insn_code icode;
> +    machine_mode mode = e.vector_mode (0);
> +    icode = code_for_vcond_mask (mode, mode);
> +    rtx_vector_builder builder (VNx16BImode, 16, 2);
> +    for (unsigned int i = 0; i < 16; i++)
> +      {
> +	builder.quick_push (CONST1_RTX (BImode));
> +      }

Formatting trivia, sorry, but: no braces around single statements.
Same for the rest of the patch.

> +    for (unsigned int i = 0; i < 16; i++)
> +      {
> +	builder.quick_push (CONST0_RTX (BImode));
> +      }
> +    e.args.quick_push (builder.build ());
> +    if (BYTES_BIG_ENDIAN)
> +      {
> +	return e.use_exact_insn (code_for_aarch64_sve_set_neonq (mode));
> +      }

Very minor, but it might be good to move the icode down here:

    insn_code icode = code_for_vcond_mask (mode, mode);

to avoid giving the impression that it's used for big-endian.

> +    e.args[1] = lowpart_subreg (mode, e.args[1], GET_MODE (e.args[1]));
> +    e.add_output_operand (icode);
> +    e.add_input_operand (icode, e.args[1]);
> +    e.add_input_operand (icode, e.args[0]);
> +    e.add_input_operand (icode, e.args[2]);
> +    return e.generate_insn (icode);
> +  }
> +};
> +
> +class svdup_neonq_impl : public function_base
> +{
> +public:
> +  gimple *
> +  fold (gimple_folder &f) const override
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      {
> +	return NULL;
> +      }
> +    tree rhs_vector = gimple_call_arg (f.call, 0);
> +    unsigned int nargs = gimple_call_num_args (f.call);
> +    unsigned HOST_WIDE_INT NEONnelts
> +      = TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs_vector)).to_constant ();
> +    poly_uint64 SVEnelts;
> +    SVEnelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (f.lhs));

GCC style is to use lower-case variable names, so maybe neon_nelts and
sve_nelts instead.
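
I.e. something like (untested):

  unsigned HOST_WIDE_INT neon_nelts
    = TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs_vector)).to_constant ();
  poly_uint64 sve_nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (f.lhs));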
> +    vec_perm_builder builder (SVEnelts, NEONnelts, 1);
> +    for (unsigned int i = 0; i < NEONnelts; i++)
> +      {
> +	builder.quick_push (i);
> +      }
> +    vec_perm_indices indices (builder, 1, NEONnelts);
> +    tree perm_type = build_vector_type (ssizetype, SVEnelts);
> +    return gimple_build_assign (f.lhs, VEC_PERM_EXPR,
> +				rhs_vector,
> +				rhs_vector,
> +				vec_perm_indices_to_tree (perm_type, indices));
> +  }
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    insn_code icode;
> +    machine_mode mode = e.vector_mode (0);
> +    if (BYTES_BIG_ENDIAN)
> +      {
> +	icode = code_for_aarch64_sve_dup_neonq (mode);
> +	unsigned int nunits = 128 / GET_MODE_UNIT_BITSIZE (mode);
> +	rtx indices = aarch64_gen_stepped_int_parallel
> +	  (nunits, (nunits - 1) , -1);

Same formatting comment as above.

> +
> +	e.add_output_operand (icode);
> +	e.add_input_operand (icode, e.args[0]);
> +	e.add_fixed_operand (indices);
> +	return e.generate_insn (icode);
> +      }
> +    if (valid_for_const_vector_p (GET_MODE_INNER (mode), e.args.last ()))
> +      /* Duplicate the constant to fill a vector.  The pattern optimizes
> +	 various cases involving constant operands, falling back to SEL
> +	 if necessary.  */
> +      icode = code_for_vcond_mask (mode, mode);
> +    else
> +      /* Use the pattern for selecting between a duplicated scalar
> +	 variable and a vector fallback.  */
> +      icode = code_for_aarch64_sel_dup (mode);
> +    return e.use_vcond_mask_insn (icode);

I think this should just unconditionally use:

  @aarch64_vec_duplicate_vq<mode>_le

Again, the only good way to test it is to disable the fold locally and
then run the tests.

> +  }
> +};
> +
>  class svindex_impl : public function_base
>  {
>  public:
> @@ -3028,5 +3154,8 @@ FUNCTION (svzip1q, unspec_based_function, (UNSPEC_ZIP1Q, UNSPEC_ZIP1Q,
>  FUNCTION (svzip2, svzip_impl, (1))
>  FUNCTION (svzip2q, unspec_based_function, (UNSPEC_ZIP2Q, UNSPEC_ZIP2Q,
>  					   UNSPEC_ZIP2Q))
> +NEON_SVE_BRIDGE_FUNCTION (svget_neonq, svget_neonq_impl,)
> +NEON_SVE_BRIDGE_FUNCTION (svset_neonq, svset_neonq_impl,)
> +NEON_SVE_BRIDGE_FUNCTION (svdup_neonq, svdup_neonq_impl,)
>
>  } /* end namespace aarch64_sve */
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-functions.h b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
> index 2729877d914414eff33182e03ab1dfc94a3515fa..bfb7fea674a905a2eb99f2bac7cbcb72af681b52 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-functions.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
> @@ -622,4 +622,8 @@ public:
>    namespace { static CONSTEXPR const CLASS NAME##_obj ARGS; } \
>    namespace functions { const function_base *const NAME = &NAME##_obj; }
>
> +#define NEON_SVE_BRIDGE_FUNCTION(NAME, CLASS, ARGS) \
> +  namespace { static CONSTEXPR const CLASS NAME##_obj ARGS; } \
> +  namespace neon_sve_bridge_functions { const function_base *const NAME = &NAME##_obj; }
> +
>  #endif
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.h b/gcc/config/aarch64/aarch64-sve-builtins-shapes.h
> index 7483c1d04b8e463e607e8e65aa94233460f77648..5aff20d1d21afddb934be4d5a103049b0b6c40ea 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.h
> @@ -105,9 +105,11 @@ namespace aarch64_sve
>      extern const function_shape *const count_vector;
>      extern const function_shape *const create;
>      extern const function_shape *const dupq;
> +    extern const function_shape *const dup_neonq;
>      extern const function_shape *const ext;
>      extern const function_shape *const fold_left;
>      extern const function_shape *const get;
> +    extern const function_shape *const get_neonq;
>      extern const function_shape *const inc_dec;
>      extern const function_shape *const inc_dec_pat;
>      extern const function_shape *const inc_dec_pred;
> @@ -135,6 +137,7 @@ namespace aarch64_sve
>      extern const function_shape *const reduction_wide;
>      extern const function_shape *const set;
>      extern const function_shape *const setffr;
> +    extern const function_shape *const set_neonq;
>      extern const function_shape *const shift_left_imm_long;
>      extern const function_shape *const shift_left_imm_to_uint;
>      extern const function_shape *const shift_right_imm;
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
> index af816c4c9e705d9cc4bce5cc50481cb27e6a03a7..4b0a84fe0cb5b5f4bc6b7dd012de0bc75ee4326b 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
> @@ -29,6 +29,7 @@
>  #include "optabs.h"
>  #include "aarch64-sve-builtins.h"
>  #include "aarch64-sve-builtins-shapes.h"
> +#include "aarch64-builtins.h"
>
>  /* In the comments below, _t0 represents the first type suffix and _t1
>     represents the second.  Square brackets enclose characters that are
> @@ -158,6 +159,8 @@ parse_element_type (const function_instance &instance, const char *&format)
>     s<elt> - a scalar type with the given element suffix
>     t<elt> - a vector or tuple type with given element suffix [*1]
>     v<elt> - a vector with the given element suffix
> +   D<elt> - a 64 bit neon vector
> +   Q<elt> - a 128 bit neon vector
>
>     where <elt> has the format described above parse_element_type
>
> @@ -224,6 +227,20 @@ parse_type (const function_instance &instance, const char *&format)
>        return acle_vector_types[0][type_suffixes[suffix].vector_type];
>      }
>
> +  if (ch == 'D')
> +    {
> +      type_suffix_index suffix = parse_element_type (instance, format);
> +      int neon_index = type_suffixes[suffix].neon64_type;
> +      return aarch64_simd_types[neon_index].itype;
> +    }
> +
> +  if (ch == 'Q')
> +    {
> +      type_suffix_index suffix = parse_element_type (instance, format);
> +      int neon_index = type_suffixes[suffix].neon128_type;
> +      return aarch64_simd_types[neon_index].itype;
> +    }
> +
>    gcc_unreachable ();
>  }
>
> @@ -1917,6 +1934,67 @@ struct get_def : public overloaded_base<0>
>  };
>  SHAPE (get)
>
> +/* <t0>xN_t svfoo[_t0](sv<t0>_t).  */
> +struct get_neonq_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group) const override
> +  {
> +    b.add_overloaded_functions (group, MODE_none);
> +    build_all (b, "Q0,v0", group, MODE_none);
> +  }
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +    return r.resolve_unary ();
> +  }
> +};
> +SHAPE (get_neonq)
> +
> +/* sv<t0>_t svfoo[_t0](sv<t0>_t, <t0>xN_t).  */
> +struct set_neonq_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group) const override
> +  {
> +    b.add_overloaded_functions (group, MODE_none);
> +    build_all (b, "v0,v0,Q0", group, MODE_none);
> +  }
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +    unsigned int i, nargs;
> +    type_suffix_index type;
> +    if (!r.check_gp_argument (2, i, nargs)
> +	|| (type = r.infer_neon128_vector_type (i + 1)) == NUM_TYPE_SUFFIXES)
> +      return error_mark_node;
> +    return r.resolve_to (r.mode_suffix_id, type);
> +  }
> +};
> +SHAPE (set_neonq)
> +
> +/* sv<t0>_t svfoo[_t0](<t0>xN_t).  */
> +struct dup_neonq_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group) const override
> +  {
> +    b.add_overloaded_functions (group, MODE_none);
> +    build_all (b, "v0,Q0", group, MODE_none);
> +  }
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +    unsigned int i, nargs;
> +    type_suffix_index type;
> +    if (!r.check_gp_argument (1, i, nargs)
> +	|| (type = r.infer_neon128_vector_type (i)) == NUM_TYPE_SUFFIXES)
> +      return error_mark_node;
> +    return r.resolve_to (r.mode_suffix_id, type);
> +  }
> +};
> +SHAPE (dup_neonq)
> +
>  /* sv<t0>_t svfoo[_t0](sv<t0>_t, uint64_t)
>     <t0>_t svfoo[_n_t0](<t0>_t, uint64_t)
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 161a14edde7c9fb1b13b146cf50463e2d78db264..6ff5c65e2610de8309a57b004e16d4602ea76999 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -52,6 +52,7 @@
>  #include "aarch64-sve-builtins-base.h"
>  #include "aarch64-sve-builtins-sve2.h"
>  #include "aarch64-sve-builtins-shapes.h"
> +#include "aarch64-builtins.h"
>
>  namespace aarch64_sve {
>
> @@ -127,7 +128,8 @@ CONSTEXPR const mode_suffix_info mode_suffixes[] = {
>
>  /* Static information about each type_suffix_index.  */
>  CONSTEXPR const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1] = {
> -#define DEF_SVE_TYPE_SUFFIX(NAME, ACLE_TYPE, CLASS, BITS, MODE) \
> +#define DEF_SVE_NEON_TYPE_SUFFIX(NAME, ACLE_TYPE, CLASS, BITS, MODE, \
> +				 NEON64, NEON128) \
>    { "_" #NAME, \
>      VECTOR_TYPE_##ACLE_TYPE, \
>      TYPE_##CLASS, \
> @@ -138,10 +140,15 @@ CONSTEXPR const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1] = {
>      TYPE_##CLASS == TYPE_float, \
>      TYPE_##CLASS == TYPE_bool, \
>      0, \
> -    MODE },
> +    MODE, \
> +    NEON64, \
> +    NEON128 },
> +#define DEF_SVE_TYPE_SUFFIX(NAME, ACLE_TYPE, CLASS, BITS, MODE) \
> +  DEF_SVE_NEON_TYPE_SUFFIX (NAME, ACLE_TYPE, CLASS, BITS, MODE, \
> +			    ARM_NEON_H_TYPES_LAST, ARM_NEON_H_TYPES_LAST)
>  #include "aarch64-sve-builtins.def"
>    { "", NUM_VECTOR_TYPES, TYPE_bool, 0, 0, false, false, false, false,
> -    0, VOIDmode }
> +    0, VOIDmode, ARM_NEON_H_TYPES_LAST, ARM_NEON_H_TYPES_LAST }
>  };
>
>  /* Define a TYPES_<combination> macro for each combination of type
> @@ -529,6 +536,13 @@ static CONSTEXPR const function_group_info function_groups[] = {
>  #include "aarch64-sve-builtins.def"
>  };
>
> +/* A list of all NEON-SVE-Bridge ACLE functions.  */
> +static CONSTEXPR const function_group_info neon_sve_function_groups[] = {
> +#define DEF_NEON_SVE_FUNCTION(NAME, SHAPE, TYPES, PREDS) \
> +  { #NAME, &neon_sve_bridge_functions::NAME, &shapes::SHAPE, types_##TYPES, preds_##PREDS },
> +#include "aarch64-neon-sve-bridge-builtins.def"
> +};
> +
>  /* The scalar type associated with each vector type.  */
>  extern GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
>  tree scalar_types[NUM_VECTOR_TYPES];
> @@ -1403,6 +1417,32 @@ function_resolver::infer_integer_vector_type (unsigned int argno)
>    return type;
>  }
>
> +type_suffix_index
> +function_resolver::infer_neon128_vector_type (unsigned int argno)

Missing function comment.
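
Maybe something like the following, modelled on the comments above the
other infer_* functions (wording just a suggestion):

  /* Require argument ARGNO to have a 128-bit NEON vector type that
     corresponds to one of the SVE type suffixes.  Return that type
     suffix on success, otherwise report an error and return
     NUM_TYPE_SUFFIXES.  */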
> +{
> +  tree actual = get_argument_type (argno);
> +  if (actual == error_mark_node)
> +    return NUM_TYPE_SUFFIXES;
> +
> +  for (unsigned int suffix_i = 0; suffix_i < NUM_TYPE_SUFFIXES; ++suffix_i)
> +    {
> +      int neon_index = type_suffixes[suffix_i].neon128_type;
> +      if (neon_index != ARM_NEON_H_TYPES_LAST)
> +	{
> +	  tree type = aarch64_simd_types[neon_index].itype;
> +	  if (type && matches_type_p (type, actual))
> +	    {
> +	      return type_suffix_index (suffix_i);
> +	    }
> +	}
> +    }
> +
> +  error_at (location, "passing %qT to argument %d of %qE, which"
> +	    " expects a 128 bit NEON vector type", actual, argno + 1, fndecl);
> +  return NUM_TYPE_SUFFIXES;
> +}
> +
> +
>  /* Like infer_vector_type, but also require the type to be an unsigned
>     integer.  */
>  type_suffix_index
> @@ -3410,6 +3450,13 @@ init_builtins ()
>    handle_arm_sve_h ();
>  }
>
> +void
> +init_neon_sve_builtins ()

Missing function comment.

> +{
> +  if (in_lto_p)
> +    handle_arm_neon_sve_bridge_h ();
> +}
> +
>  /* Register vector type TYPE under its arm_sve.h name.  */
>  static void
>  register_vector_type (vector_type_index type)
> @@ -3560,6 +3607,16 @@ handle_arm_sve_h ()
>      builder.register_function_group (function_groups[i]);
>  }
>
> +/* Implement #pragma GCC aarch64 "arm_neon_sve_bridge.h".  */
> +void
> +handle_arm_neon_sve_bridge_h ()
> +{
> +  /* Define the functions.  */
> +  function_builder builder;
> +  for (unsigned int i = 0; i < ARRAY_SIZE (neon_sve_function_groups); ++i)
> +    builder.register_function_group (neon_sve_function_groups[i]);
> +}
> +
>  /* Return the function decl with SVE function subcode CODE, or error_mark_node
>     if no such function exists.  */
>  tree
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.def b/gcc/config/aarch64/aarch64-sve-builtins.def
> index 534f6e69d72342fdcfcc00bd330585db1eae32e1..e8b4a919e1bb7a2d5d3239e6d303c9ee4e73d54f 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.def
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.def
> @@ -29,6 +29,11 @@
>  #define DEF_SVE_TYPE_SUFFIX(A, B, C, D, E)
>  #endif
>
> +#ifndef DEF_SVE_NEON_TYPE_SUFFIX
> +#define DEF_SVE_NEON_TYPE_SUFFIX(A, B, C, D, E, F, G) \
> +  DEF_SVE_TYPE_SUFFIX(A, B, C, D, E)
> +#endif
> +
>  #ifndef DEF_SVE_FUNCTION
>  #define DEF_SVE_FUNCTION(A, B, C, D)
>  #endif
> @@ -82,23 +87,36 @@ DEF_SVE_TYPE_SUFFIX (b8, svbool_t, bool, 8, VNx16BImode)
>  DEF_SVE_TYPE_SUFFIX (b16, svbool_t, bool, 16, VNx8BImode)
>  DEF_SVE_TYPE_SUFFIX (b32, svbool_t, bool, 32, VNx4BImode)
>  DEF_SVE_TYPE_SUFFIX (b64, svbool_t, bool, 64, VNx2BImode)
> -DEF_SVE_TYPE_SUFFIX (bf16, svbfloat16_t, bfloat, 16, VNx8BFmode)
> -DEF_SVE_TYPE_SUFFIX (f16, svfloat16_t, float, 16, VNx8HFmode)
> -DEF_SVE_TYPE_SUFFIX (f32, svfloat32_t, float, 32, VNx4SFmode)
> -DEF_SVE_TYPE_SUFFIX (f64, svfloat64_t, float, 64, VNx2DFmode)
> -DEF_SVE_TYPE_SUFFIX (s8, svint8_t, signed, 8, VNx16QImode)
> -DEF_SVE_TYPE_SUFFIX (s16, svint16_t, signed, 16, VNx8HImode)
> -DEF_SVE_TYPE_SUFFIX (s32, svint32_t, signed, 32, VNx4SImode)
> -DEF_SVE_TYPE_SUFFIX (s64, svint64_t, signed, 64, VNx2DImode)
> -DEF_SVE_TYPE_SUFFIX (u8, svuint8_t, unsigned, 8, VNx16QImode)
> -DEF_SVE_TYPE_SUFFIX (u16, svuint16_t, unsigned, 16, VNx8HImode)
> -DEF_SVE_TYPE_SUFFIX (u32, svuint32_t, unsigned, 32, VNx4SImode)
> -DEF_SVE_TYPE_SUFFIX (u64, svuint64_t, unsigned, 64, VNx2DImode)
> +DEF_SVE_NEON_TYPE_SUFFIX (bf16, svbfloat16_t, bfloat, 16, VNx8BFmode,
> +			  Bfloat16x4_t, Bfloat16x8_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (f16, svfloat16_t, float, 16, VNx8HFmode,
> +			  Float16x4_t, Float16x8_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (f32, svfloat32_t, float, 32, VNx4SFmode,
> +			  Float32x2_t, Float32x4_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (f64, svfloat64_t, float, 64, VNx2DFmode,
> +			  Float64x1_t, Float64x2_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (s8, svint8_t, signed, 8, VNx16QImode,
> +			  Int8x8_t, Int8x16_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (s16, svint16_t, signed, 16, VNx8HImode,
> +			  Int16x4_t, Int16x8_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (s32, svint32_t, signed, 32, VNx4SImode,
> +			  Int32x2_t, Int32x4_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (s64, svint64_t, signed, 64, VNx2DImode,
> +			  Int64x1_t, Int64x2_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (u8, svuint8_t, unsigned, 8, VNx16QImode,
> +			  Uint8x8_t, Uint8x16_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (u16, svuint16_t, unsigned, 16, VNx8HImode,
> +			  Uint16x4_t, Uint16x8_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (u32, svuint32_t, unsigned, 32, VNx4SImode,
> +			  Uint32x2_t, Uint32x4_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (u64, svuint64_t, unsigned, 64, VNx2DImode,
> +			  Uint64x1_t, Uint64x2_t)
>
>  #include "aarch64-sve-builtins-base.def"
>  #include "aarch64-sve-builtins-sve2.def"
>
>  #undef DEF_SVE_FUNCTION
> +#undef DEF_SVE_NEON_TYPE_SUFFIX
>  #undef DEF_SVE_TYPE_SUFFIX
>  #undef DEF_SVE_TYPE
>  #undef DEF_SVE_MODE
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
> index a301570b82ed3477306d203080ccb76608322c09..d32bf5b57ae7b48a130a7794f3f8277ad59ed03e 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
> @@ -20,6 +20,8 @@
>  #ifndef GCC_AARCH64_SVE_BUILTINS_H
>  #define GCC_AARCH64_SVE_BUILTINS_H
>
> +#include "aarch64-builtins.h"
> +
>  /* The full name of an SVE ACLE function is the concatenation of:
>
>     - the base name ("svadd", etc.)
> @@ -206,6 +208,14 @@ struct mode_suffix_info
>    units_index displacement_units;
>  };
>
> +#define ENTRY(E, M, Q, G) E,
> +enum aarch64_simd_type
> +{
> +#include "aarch64-simd-builtin-types.def"
> +  ARM_NEON_H_TYPES_LAST
> +};
> +#undef ENTRY
> +
>  /* Static information about a type suffix.  */
>  struct type_suffix_info
>  {
> @@ -235,6 +245,11 @@ struct type_suffix_info
>
>    /* The associated vector or predicate mode.  */
>    machine_mode vector_mode : 16;
> +
> +  /* The corresponding 64-bit and 128-bit arm_neon.h types, or
> +     ARM_NEON_H_TYPES_LAST if none.  */
> +  aarch64_simd_type neon64_type;
> +  aarch64_simd_type neon128_type;
>  };
>
>  /* Static information about a set of functions.  */
> @@ -400,6 +415,7 @@ public:
>    type_suffix_index infer_vector_or_tuple_type (unsigned int, unsigned int);
>    type_suffix_index infer_vector_type (unsigned int);
>    type_suffix_index infer_integer_vector_type (unsigned int);
> +  type_suffix_index infer_neon128_vector_type (unsigned int);
>    type_suffix_index infer_unsigned_vector_type (unsigned int);
>    type_suffix_index infer_sd_vector_type (unsigned int);
>    type_suffix_index infer_tuple_type (unsigned int);
> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> index 5a652d8536a0ef9461f40da7b22834e683e73ceb..3e01669fbaaa805ac4de0d2615e50674f265ee59 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -10834,3 +10834,57 @@
>      operands[4] = CONSTM1_RTX (<VPRED>mode);
>    }
>  )
> +
> +(define_insn_and_split "@aarch64_sve_get_neonq_<mode>"
> +  [(set (match_operand:<V128> 0 "register_operand" "=w")
> +	(vec_select:<V128>
> +	  (match_operand:SVE_FULL 1 "register_operand" "w")
> +	  (match_operand 2 "descending_int_parallel")))]
> +  "TARGET_SVE
> +   && BYTES_BIG_ENDIAN
> +   && known_eq (INTVAL (XVECEXP (operands[2], 0, 0)),
> +		GET_MODE_NUNITS (<V128>mode) - 1)"
> +  "#"
> +  "&& reload_completed"
> +  [(set (match_dup 0) (match_dup 1))]
> +  {
> +    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));
> +  }
> +)
> +
> +(define_insn "@aarch64_sve_set_neonq_<mode>"
> +  [(set (match_operand:SVE_FULL 0 "register_operand" "=w")
> +	(unspec:SVE_FULL
> +	  [(match_operand:SVE_FULL 1 "register_operand" "w")
> +	   (match_operand:<V128> 2 "register_operand" "w")
> +	   (match_operand:<VPRED> 3 "register_operand" "Upl")]
> +	  UNSPEC_SET_NEONQ))]
> +  "TARGET_SVE
> +   && BYTES_BIG_ENDIAN"
> +  {
> +    operands[2] = lowpart_subreg (<MODE>mode, operands[2],
> +				  GET_MODE (operands[2]));
> +    return aarch64_output_sve_set_neonq (operands, <MODE>mode);
> +  }
> +)
> +
> +(define_insn_and_split "@aarch64_sve_dup_neonq_<mode>"
> +  [(set (match_operand:SVE_FULL 0 "register_operand")
> +	(vec_duplicate:SVE_FULL
> +	  (vec_select:<V128>
> +	    (match_operand:<V128> 1 "register_operand")
> +	    (match_operand 2 "descending_int_parallel"))))]

We already have @aarch64_vec_duplicate_vq<mode>_be for this.  Also,
the split...

> +  "TARGET_SVE
> +   && BYTES_BIG_ENDIAN
> +   && known_eq (INTVAL (XVECEXP (operands[2], 0, 0)),
> +		GET_MODE_NUNITS (<V128>mode) - 1)"
> +  {@ [ cons: =0 , 1  ]
> +     [ w        , 0  ] #
> +     [ w        , ?w ] #
> +  }
> +  "&& reload_completed"
> +  [(set (match_dup 0) (match_dup 1))]
> +  {
> +    operands[1] = gen_rtx_REG (<MODE>mode, REGNO (operands[1]));
> +  }

...here isn't correct, since we need the 128-bit vector to be
duplicated to fill the whole of the SVE vector.  The split pattern
instead just initialises the low 128 bits.
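
FWIW, a rough sketch of what I mean for the big-endian path of the
expander, reusing the existing pattern instead of adding a new one
(untested, and I haven't double-checked the generated code_for_* name):

  if (BYTES_BIG_ENDIAN)
    {
      insn_code icode = code_for_aarch64_vec_duplicate_vq_be (mode);
      unsigned int nunits = 128 / GET_MODE_UNIT_BITSIZE (mode);
      rtx indices = aarch64_gen_stepped_int_parallel (nunits,
						      nunits - 1, -1);
      e.add_output_operand (icode);
      e.add_input_operand (icode, e.args[0]);
      e.add_fixed_operand (indices);
      return e.generate_insn (icode);
    }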
> +)
> \ No newline at end of file
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 5fd7063663c67a15e654eea66ffe7193caebf6b6..bf9b725eb63f6b713a4cac430554166bd677e01a 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -15594,6 +15594,7 @@ aarch64_init_builtins ()
>  {
>    aarch64_general_init_builtins ();
>    aarch64_sve::init_builtins ();
> +  aarch64_sve::init_neon_sve_builtins ();
>  #ifdef SUBTARGET_INIT_BUILTINS
>    SUBTARGET_INIT_BUILTINS;
>  #endif
> @@ -24100,6 +24101,23 @@ aarch64_output_sve_ptrues (rtx const_unspec)
>    return templ;
>  }
>
> +const char *
> +aarch64_output_sve_set_neonq (rtx * operands, machine_mode mode)
> +{
> +  switch(GET_MODE_UNIT_BITSIZE(mode))
> +    {
> +      case 64:
> +	return "sel\t%0.d, %3, %2.d, %1.d";
> +      case 32:
> +	return "sel\t%0.s, %3, %2.s, %1.s";
> +      case 16:
> +	return "sel\t%0.h, %3, %2.h, %1.h";
> +      case 8:
> +	return "sel\t%0.b, %3, %2.b, %1.b";
> +    }
> +
> +}

This function shouldn't be needed.  It should be possible to do this
directly in the define_insn, with an asm template such as:

  "sel\t%0.<Vetype>, %3, %2.<Vetype>, %1.<Vetype>"

> +
>  /* Split operands into moves from op[1] + op[2] into op[0].  */
>
>  void
> [...]

The tests look good, but:

> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set_neonq_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set_neonq_bf16.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b12ce1d46b468359728a7fef5ae464b9e80c2e52
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set_neonq_bf16.c
> @@ -0,0 +1,23 @@
> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
> +
> +#include "test_sve_acle.h"
> +
> +/*
> +** set_neonq_bf16_z24:
> +**	ptrue	p3.h, vl8
> +**	sel	z24.h, p3, z0.h, z4.h
> +**	ret
> +*/
> +TEST_SET_NEONQ (set_neonq_bf16_z24, svbfloat16_t, bfloat16x8_t,
> +		z24 = svset_neonq_bf16 (z4, z0),
> +		z24 = svset_neonq (z4, z0))

There's nothing that forces the predicate to be p3.  I think it should be:

/*
** set_neonq_bf16_z24:
**	ptrue	(p[0-9]+).h, vl8
**	sel	z24.h, \1, z0.h, z4.h
**	ret
*/

Same for the other tests and files.

> +
> +/*
> +** set_neonq_bf16_z4:
> +**	ptrue	p3.h, vl8
> +**	sel	(z0.h|z4.h), p3, z0.h, z4.h

Given:

> +**	ret
> +*/
> +TEST_SET_NEONQ (set_neonq_bf16_z4, svbfloat16_t, bfloat16x8_t,
> +		z4 = svset_neonq_bf16 (z4, z0),
> +		z4 = svset_neonq (z4, z0))

...this, we should try to force the z4 allocation of the result.
It's probably easiest to do that using register asms in
TEST_SET_NEONQ, like TEST_DUP_NEONQ already does.

Thanks,
Richard