From: Richard Sandiford
To: Richard Ball
Cc: gcc-patches@gcc.gnu.org, Richard Earnshaw, Kyrylo Tkachov,
    Marcus Shawcroft
Subject: Re: [PATCH v3] aarch64: SVE/NEON Bridging intrinsics
Date: Wed, 22 Nov 2023 14:52:58 +0000
In-Reply-To: (Richard Ball's message of "Thu, 9 Nov 2023 16:14:50 +0000")

Richard Ball writes:
> ACLE has added intrinsics to bridge between SVE and Neon.
>
> The NEON_SVE Bridge adds intrinsics that allow conversions between NEON
> and SVE vectors.
>
> This patch adds support to GCC for the following 3 intrinsics:
> svset_neonq, svget_neonq and svdup_neonq
>
> gcc/ChangeLog:
>
> 	* config.gcc: Adds new header to config.
> 	* config/aarch64/aarch64-builtins.cc (enum aarch64_type_qualifiers):
> 	Moved to header file.
> 	(ENTRY): Likewise.
> 	(enum aarch64_simd_type): Likewise.
> 	(struct aarch64_simd_type_info): Make extern.
> 	(GTY): Likewise.
> 	* config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64):
> 	Defines pragma for arm_neon_sve_bridge.h.
> 	* config/aarch64/aarch64-protos.h: New function.
> 	* config/aarch64/aarch64-sve-builtins-base.h: New intrinsics.
> 	* config/aarch64/aarch64-sve-builtins-base.cc
> 	(class svget_neonq_impl): New intrinsic implementation.
> 	(class svset_neonq_impl): Likewise.
> 	(class svdup_neonq_impl): Likewise.
> 	(NEON_SVE_BRIDGE_FUNCTION): New intrinsics.
> 	* config/aarch64/aarch64-sve-builtins-functions.h
> 	(NEON_SVE_BRIDGE_FUNCTION): Defines macro for NEON_SVE_BRIDGE
> 	functions.
> 	* config/aarch64/aarch64-sve-builtins-shapes.h: New shapes.
> 	* config/aarch64/aarch64-sve-builtins-shapes.cc
> 	(parse_element_type): Add NEON element types.
> 	(parse_type): Likewise.
> 	(struct get_neonq_def): Defines function shape for get_neonq.
> 	(struct set_neonq_def): Defines function shape for set_neonq.
> 	(struct dup_neonq_def): Defines function shape for dup_neonq.
> 	* config/aarch64/aarch64-sve-builtins.cc (DEF_SVE_TYPE_SUFFIX):
> 	(DEF_SVE_NEON_TYPE_SUFFIX): Defines macro for NEON_SVE_BRIDGE
> 	type suffixes.
> 	(DEF_NEON_SVE_FUNCTION): Defines macro for NEON_SVE_BRIDGE
> 	functions.
> 	(function_resolver::infer_neon128_vector_type): Infers type suffix
> 	for overloaded functions.
> 	(init_neon_sve_builtins): Initialise neon_sve_bridge_builtins
> 	for LTO.
> 	(handle_arm_neon_sve_bridge_h): Handles #pragma
> 	arm_neon_sve_bridge.h.
> 	* config/aarch64/aarch64-sve-builtins.def
> 	(DEF_SVE_NEON_TYPE_SUFFIX): Macro for handling neon_sve type
> 	suffixes.
> 	(bf16): Replace entry with neon-sve entry.
> 	(f16): Likewise.
> 	(f32): Likewise.
> 	(f64): Likewise.
> 	(s8): Likewise.
> 	(s16): Likewise.
> 	(s32): Likewise.
> 	(s64): Likewise.
> 	(u8): Likewise.
> 	(u16): Likewise.
> 	(u32): Likewise.
> 	(u64): Likewise.
> 	* config/aarch64/aarch64-sve-builtins.h
> 	(GCC_AARCH64_SVE_BUILTINS_H): Include aarch64-builtins.h.
> 	(ENTRY): Add aarch64_simd_type definition.
> 	(enum aarch64_simd_type): Add neon information to type_suffix_info.
> 	(struct type_suffix_info): New function.
> 	* config/aarch64/aarch64-sve.md
> 	(@aarch64_sve_get_neonq_<mode>): New intrinsic insn for big endian.
> 	(@aarch64_sve_set_neonq_<mode>): Likewise.
> 	(@aarch64_sve_dup_neonq_<mode>): Likewise.
> 	* config/aarch64/aarch64.cc
> 	(aarch64_init_builtins): Add call to init_neon_sve_builtins.
> 	(aarch64_output_sve_set_neonq): Asm output for big-endian
> 	set_neonq.
> 	* config/aarch64/iterators.md: Add UNSPEC_SET_NEONQ.
> 	* config/aarch64/aarch64-builtins.h: New file.
> 	* config/aarch64/aarch64-neon-sve-bridge-builtins.def: New file.
> 	* config/aarch64/arm_neon_sve_bridge.h: New file.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Add include of
> 	the arm_neon_sve_bridge header file.
> 	* gcc.dg/torture/neon-sve-bridge.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_bf16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_f16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_f32.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_f64.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_s16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_s32.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_s64.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_s8.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_u16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_u32.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_u64.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/dup_neonq_u8.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_bf16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_f16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_f32.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_f64.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_s16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_s32.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_s64.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_s8.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_u16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_u32.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_u64.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/get_neonq_u8.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_bf16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_f16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_f32.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_f64.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_s16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_s32.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_s64.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_s8.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_u16.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_u32.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_u64.c: New test.
> 	* gcc.target/aarch64/sve/acle/asm/set_neonq_u8.c: New test.
> 	* gcc.target/aarch64/sve/acle/general-c/dup_neonq_1.c: New test.
> 	* gcc.target/aarch64/sve/acle/general-c/get_neonq_1.c: New test.
> 	* gcc.target/aarch64/sve/acle/general-c/set_neonq_1.c: New test.

Thanks, looks good.  Some comments below, but nothing major.

>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index d34ea246a980b5d8aaab86e4459de5ef4d341fe2..1c92c390e9b1b14d2f756ec233bba713ca8aaa94 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -345,7 +345,7 @@ m32c*-*-*)
>  	;;
>  aarch64*-*-*)
>  	cpu_type=aarch64
> -	extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h"
> +	extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h arm_neon_sve_bridge.h"
>  	c_target_objs="aarch64-c.o"
>  	cxx_target_objs="aarch64-c.o"
>  	d_target_objs="aarch64-d.o"
> diff --git a/gcc/config/aarch64/aarch64-builtins.h b/gcc/config/aarch64/aarch64-builtins.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..ec4580981587ab3acbb39e0b0721ed247e309a74
> --- /dev/null
> +++ b/gcc/config/aarch64/aarch64-builtins.h
> @@ -0,0 +1,86 @@
> +/* Builtins' description for AArch64 SIMD architecture.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of GCC.
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */

Please keep the copyright text verbatim, including the blank lines.

(aarch64-neon-sve-bridge-builtins.def looks good.)

> +#ifndef GCC_AARCH64_BUILTINS_H
> +#define GCC_AARCH64_BUILTINS_H
> +
> +enum aarch64_type_qualifiers
> +{
> +  /* T foo.  */
> +  qualifier_none = 0x0,
> +  /* unsigned T foo.  */
> +  qualifier_unsigned = 0x1, /* 1 << 0 */
> +  /* const T foo.  */
> +  qualifier_const = 0x2, /* 1 << 1 */
> +  /* T *foo.  */
> +  qualifier_pointer = 0x4, /* 1 << 2 */
> +  /* Used when expanding arguments if an operand could
> +     be an immediate.  */
> +  qualifier_immediate = 0x8, /* 1 << 3 */
> +  qualifier_maybe_immediate = 0x10, /* 1 << 4 */
> +  /* void foo (...).  */
> +  qualifier_void = 0x20, /* 1 << 5 */
> +  /* 1 << 6 is now unused */
> +  /* Some builtins should use the T_*mode* encoded in a simd_builtin_datum
> +     rather than using the type of the operand.  */
> +  qualifier_map_mode = 0x80, /* 1 << 7 */
> +  /* qualifier_pointer | qualifier_map_mode */
> +  qualifier_pointer_map_mode = 0x84,
> +  /* qualifier_const | qualifier_pointer | qualifier_map_mode */
> +  qualifier_const_pointer_map_mode = 0x86,
> +  /* Polynomial types.  */
> +  qualifier_poly = 0x100,
> +  /* Lane indices - must be in range, and flipped for bigendian.  */
> +  qualifier_lane_index = 0x200,
> +  /* Lane indices for single lane structure loads and stores.  */
> +  qualifier_struct_load_store_lane_index = 0x400,
> +  /* Lane indices selected in pairs. - must be in range, and flipped for
> +     bigendian.  */
> +  qualifier_lane_pair_index = 0x800,
> +  /* Lane indices selected in quadtuplets. - must be in range, and flipped
> +     for bigendian.  */
> +  qualifier_lane_quadtup_index = 0x1000,
> +};
> +#define ENTRY(E, M, Q, G) E,
> +enum aarch64_simd_type
> +{
> +#include "aarch64-simd-builtin-types.def"
> +  ARM_NEON_H_TYPES_LAST
> +};
> +#undef ENTRY
> +struct GTY(()) aarch64_simd_type_info
> +{
> +  enum aarch64_simd_type type;
> +  /* Internal type name.  */
> +  const char *name;
> +  /* Internal type name(mangled).  The mangled names conform to the
> +     AAPCS64 (see "Procedure Call Standard for the ARM 64-bit Architecture",
> +     Appendix A).  To qualify for emission with the mangled names defined in
> +     that document, a vector type must not only be of the correct mode but also
> +     be of the correct internal AdvSIMD vector type (e.g. __Int8x8_t); these
> +     types are registered by aarch64_init_simd_builtin_types ().  In other
> +     words, vector types defined in other ways e.g. via vector_size attribute
> +     will get default mangled names.  */
> +  const char *mangle;
> +  /* Internal type.  */
> +  tree itype;
> +  /* Element type.  */
> +  tree eltype;
> +  /* Machine mode the internal type maps to.  */
> +  enum machine_mode mode;
> +  /* Qualifiers.  */
> +  enum aarch64_type_qualifiers q;
> +};

Sorry for the trivia, but: I thought the blank lines in the original
aarch64_simd_type_info made this easier to read.

> +extern aarch64_simd_type_info aarch64_simd_types[];
> +#endif
> \ No newline at end of file
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
> index 04f59fd9a54306d6422b03e32dce79bc00aed4f8..0b039c075a5cb312339729d388c9be0072f80b91 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -47,6 +47,7 @@
>  #include "stringpool.h"
>  #include "attribs.h"
>  #include "gimple-fold.h"
> +#include "aarch64-builtins.h"
>
>  #define v8qi_UP  E_V8QImode
>  #define v8di_UP  E_V8DImode
> @@ -183,47 +184,8 @@
>  #define SIMD_INTR_QUAL(suffix) QUAL_##suffix
>  #define SIMD_INTR_LENGTH_CHAR(length) LENGTH_##length
> -
>  #define SIMD_MAX_BUILTIN_ARGS 5
>
> -enum aarch64_type_qualifiers
> -{
> -  /* T foo.  */
> -  qualifier_none = 0x0,
> -  /* unsigned T foo.  */
> -  qualifier_unsigned = 0x1, /* 1 << 0 */
> -  /* const T foo.  */
> -  qualifier_const = 0x2, /* 1 << 1 */
> -  /* T *foo.  */
> -  qualifier_pointer = 0x4, /* 1 << 2 */
> -  /* Used when expanding arguments if an operand could
> -     be an immediate.  */
> -  qualifier_immediate = 0x8, /* 1 << 3 */
> -  qualifier_maybe_immediate = 0x10, /* 1 << 4 */
> -  /* void foo (...).  */
> -  qualifier_void = 0x20, /* 1 << 5 */
> -  /* 1 << 6 is now unused */
> -  /* Some builtins should use the T_*mode* encoded in a simd_builtin_datum
> -     rather than using the type of the operand.  */
> -  qualifier_map_mode = 0x80, /* 1 << 7 */
> -  /* qualifier_pointer | qualifier_map_mode */
> -  qualifier_pointer_map_mode = 0x84,
> -  /* qualifier_const | qualifier_pointer | qualifier_map_mode */
> -  qualifier_const_pointer_map_mode = 0x86,
> -  /* Polynomial types.  */
> -  qualifier_poly = 0x100,
> -  /* Lane indices - must be in range, and flipped for bigendian.  */
> -  qualifier_lane_index = 0x200,
> -  /* Lane indices for single lane structure loads and stores.  */
> -  qualifier_struct_load_store_lane_index = 0x400,
> -  /* Lane indices selected in pairs. - must be in range, and flipped for
> -     bigendian.  */
> -  qualifier_lane_pair_index = 0x800,
> -  /* Lane indices selected in quadtuplets. - must be in range, and flipped
> -     for bigendian.  */
> -  qualifier_lane_quadtup_index = 0x1000,
> -};
> -
>  /* Flags that describe what a function might do.  */
>  const unsigned int FLAG_NONE = 0U;
>  const unsigned int FLAG_READ_FPCR = 1U << 0;
> @@ -883,47 +845,9 @@ const char *aarch64_scalar_builtin_types[] = {
>    NULL
>  };
>
> -#define ENTRY(E, M, Q, G) E,
> -enum aarch64_simd_type
> -{
> -#include "aarch64-simd-builtin-types.def"
> -  ARM_NEON_H_TYPES_LAST
> -};
> -#undef ENTRY
> -
> -struct GTY(()) aarch64_simd_type_info
> -{
> -  enum aarch64_simd_type type;
> -
> -  /* Internal type name.  */
> -  const char *name;
> -
> -  /* Internal type name(mangled).  The mangled names conform to the
> -     AAPCS64 (see "Procedure Call Standard for the ARM 64-bit Architecture",
> -     Appendix A).  To qualify for emission with the mangled names defined in
> -     that document, a vector type must not only be of the correct mode but also
> -     be of the correct internal AdvSIMD vector type (e.g. __Int8x8_t); these
> -     types are registered by aarch64_init_simd_builtin_types ().  In other
> -     words, vector types defined in other ways e.g. via vector_size attribute
> -     will get default mangled names.  */
> -  const char *mangle;
> -
> -  /* Internal type.  */
> -  tree itype;
> -
> -  /* Element type.  */
> -  tree eltype;
> -
> -  /* Machine mode the internal type maps to.  */
> -  enum machine_mode mode;
> -
> -  /* Qualifiers.  */
> -  enum aarch64_type_qualifiers q;
> -};
> -
>  #define ENTRY(E, M, Q, G) \
>    {E, "__" #E, #G "__" #E, NULL_TREE, NULL_TREE, E_##M##mode, qualifier_##Q},
> -static GTY(()) struct aarch64_simd_type_info aarch64_simd_types [] = {
> +extern GTY(()) struct aarch64_simd_type_info aarch64_simd_types [] = {
>  #include "aarch64-simd-builtin-types.def"
>  };
>  #undef ENTRY
> diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
> index ab8844f6049dc95b97648b651bfcd3a4ccd3ca0b..591cbaad24a4874029ebddedef23f22ff5196295 100644
> --- a/gcc/config/aarch64/aarch64-c.cc
> +++ b/gcc/config/aarch64/aarch64-c.cc
> @@ -295,6 +295,8 @@ aarch64_pragma_aarch64 (cpp_reader *)
>      handle_arm_neon_h ();
>    else if (strcmp (name, "arm_acle.h") == 0)
>      handle_arm_acle_h ();
> +  else if (strcmp (name, "arm_neon_sve_bridge.h") == 0)
> +    aarch64_sve::handle_arm_neon_sve_bridge_h ();
>    else
>      error ("unknown %<#pragma GCC aarch64%> option %qs", name);
>  }
> diff --git a/gcc/config/aarch64/aarch64-neon-sve-bridge-builtins.def b/gcc/config/aarch64/aarch64-neon-sve-bridge-builtins.def
> new file mode 100644
> index 0000000000000000000000000000000000000000..0c3cf233c9382b2f7420379054a53fa846d46c8c
> --- /dev/null
> +++ b/gcc/config/aarch64/aarch64-neon-sve-bridge-builtins.def
> @@ -0,0 +1,28 @@
> +/* Builtin lists for AArch64 NEON-SVE-Bridge
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef DEF_NEON_SVE_FUNCTION
> +#define DEF_NEON_SVE_FUNCTION(A, B, C, D)
> +#endif
> +
> +DEF_NEON_SVE_FUNCTION (svset_neonq, set_neonq, all_data, none)
> +DEF_NEON_SVE_FUNCTION (svget_neonq, get_neonq, all_data, none)
> +DEF_NEON_SVE_FUNCTION (svdup_neonq, dup_neonq, all_data, none)
> +
> +#undef DEF_NEON_SVE_FUNCTION
> \ No newline at end of file
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 60a55f4bc1956786ea687fc7cad7ec9e4a84e1f0..5d05cac51c237b12bd2b2f11eb91b01480750ded 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -817,6 +817,7 @@ char *aarch64_output_simd_mov_immediate (rtx, unsigned,
>  					 enum simd_immediate_check w = AARCH64_CHECK_MOV);
>  char *aarch64_output_sve_mov_immediate (rtx);
>  char *aarch64_output_sve_ptrues (rtx);
> +const char *aarch64_output_sve_set_neonq (rtx *, machine_mode);
>  bool aarch64_pad_reg_upward (machine_mode, const_tree, bool);
>  bool aarch64_regno_ok_for_base_p (int, bool);
>  bool aarch64_regno_ok_for_index_p (int, bool);
> @@ -990,7 +991,9 @@ void handle_arm_neon_h (void);
>
>  namespace aarch64_sve {
>    void init_builtins ();
> +  void init_neon_sve_builtins ();
>    void handle_arm_sve_h ();
> +  void handle_arm_neon_sve_bridge_h ();
>    tree builtin_decl (unsigned, bool);
>    bool builtin_type_p (const_tree);
>    bool builtin_type_p (const_tree, unsigned int *, unsigned int *);
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.h b/gcc/config/aarch64/aarch64-sve-builtins-base.h
> index d300e3a85d00b58ad790851a81d43af709b66bce..df75e4c1ecf81f3ddfa256edbcf8637d092fcfde 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.h
> @@ -299,6 +299,12 @@ namespace aarch64_sve
>      extern const function_base *const svzip2;
>      extern const function_base *const svzip2q;
>    }
> +  namespace neon_sve_bridge_functions
> +  {
> +    extern const function_base *const svset_neonq;
> +    extern const function_base *const svget_neonq;
> +    extern const function_base *const svdup_neonq;
> +  }
>  }
>
>  #endif
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 9010ecca6da23c107f5ded9ab3cfa678e308daf9..5e3b1fb19776a84710f2d730bc028614ecd54095 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -44,6 +44,7 @@
>  #include "aarch64-sve-builtins-shapes.h"
>  #include "aarch64-sve-builtins-base.h"
>  #include "aarch64-sve-builtins-functions.h"
> +#include "aarch64-builtins.h"
>  #include "ssa.h"
>  #include "gimple-fold.h"
>
> @@ -1064,6 +1065,131 @@ public:
>    }
>  };
>
> +class svget_neonq_impl : public function_base
> +{
> +public:
> +  gimple *
> +  fold (gimple_folder &f) const override
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      return NULL;
> +    tree rhs_tuple = gimple_call_arg (f.call, 0);
> +    tree rhs_vector = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs),
> +			      rhs_tuple, bitsize_int(128), bitsize_int(0));

Formatting nit: convention is to add a space before the "(128)" and "(0)".

The argument isn't a tuple, but instead an SVE vector.  Maybe just use
rhs_vector for both, or rhs_sve_vector for the first, etc.
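
I.e. something like this, just to show the two changes above (untested):

  tree rhs_sve_vector = gimple_call_arg (f.call, 0);
  tree rhs_vector = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs),
			    rhs_sve_vector, bitsize_int (128),
			    bitsize_int (0));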
> +    return gimple_build_assign (f.lhs, rhs_vector);
> +  }
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      {
> +	machine_mode mode = e.vector_mode (0);
> +	insn_code icode = code_for_aarch64_sve_get_neonq (mode);
> +	unsigned int nunits = 128 / GET_MODE_UNIT_BITSIZE (mode);
> +	rtx indices = aarch64_gen_stepped_int_parallel
> +	  (nunits, (nunits - 1) , -1);

Formatting: (nunits, nunits - 1, -1);

> +
> +	e.add_output_operand (icode);
> +	e.add_input_operand (icode, e.args[0]);
> +	e.add_fixed_operand (indices);
> +	return e.generate_insn (icode);
> +      }
> +    return simplify_gen_subreg (e.vector_mode (0), e.args[0],
> +				GET_MODE (e.args[0]),

e.vector_mode (0) is the mode of the argument rather than the mode of
the result.

> +				INTVAL (e.args[1]) * BYTES_PER_SVE_VECTOR);

There is no argument 1.  I think the final simplify_gen_subreg argument
should just be zero.

It's hard to test this with the fold in place, but it would be good to
try the tests with the fold disabled.

> +  }
> +};
> +
> +class svset_neonq_impl : public function_base
> +{
> +public:
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    insn_code icode;
> +    machine_mode mode = e.vector_mode (0);
> +    icode = code_for_vcond_mask (mode, mode);
> +    rtx_vector_builder builder (VNx16BImode, 16, 2);
> +    for (unsigned int i = 0; i < 16; i++)
> +      {
> +	builder.quick_push (CONST1_RTX (BImode));
> +      }

Formatting trivia, sorry, but: no braces around single statements.
Same for the rest of the patch.

> +    for (unsigned int i = 0; i < 16; i++)
> +      {
> +	builder.quick_push (CONST0_RTX (BImode));
> +      }
> +    e.args.quick_push (builder.build ());
> +    if (BYTES_BIG_ENDIAN)
> +      {
> +	return e.use_exact_insn (code_for_aarch64_sve_set_neonq (mode));
> +      }

Very minor, but it might be good to move the icode down here:

    insn_code icode = code_for_vcond_mask (mode, mode);

to avoid giving the impression that it's used for big-endian.

> +    e.args[1] = lowpart_subreg (mode, e.args[1], GET_MODE (e.args[1]));
> +    e.add_output_operand (icode);
> +    e.add_input_operand (icode, e.args[1]);
> +    e.add_input_operand (icode, e.args[0]);
> +    e.add_input_operand (icode, e.args[2]);
> +    return e.generate_insn (icode);
> +  }
> +};
> +
> +class svdup_neonq_impl : public function_base
> +{
> +public:
> +  gimple *
> +  fold (gimple_folder &f) const override
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      {
> +	return NULL;
> +      }
> +    tree rhs_vector = gimple_call_arg (f.call, 0);
> +    unsigned int nargs = gimple_call_num_args (f.call);
> +    unsigned HOST_WIDE_INT NEONnelts
> +      = TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs_vector)).to_constant ();
> +    poly_uint64 SVEnelts;
> +    SVEnelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (f.lhs));

GCC style is to use lower-case variable names, so maybe neon_nelts and
sve_nelts instead.
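
I.e. something like (untested):

  unsigned HOST_WIDE_INT neon_nelts
    = TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs_vector)).to_constant ();
  poly_uint64 sve_nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (f.lhs));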
> +    vec_perm_builder builder (SVEnelts, NEONnelts, 1);
> +    for (unsigned int i = 0; i < NEONnelts; i++)
> +      {
> +	builder.quick_push (i);
> +      }
> +    vec_perm_indices indices (builder, 1, NEONnelts);
> +    tree perm_type = build_vector_type (ssizetype, SVEnelts);
> +    return gimple_build_assign (f.lhs, VEC_PERM_EXPR,
> +				rhs_vector,
> +				rhs_vector,
> +				vec_perm_indices_to_tree (perm_type, indices));
> +  }
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    insn_code icode;
> +    machine_mode mode = e.vector_mode (0);
> +    if (BYTES_BIG_ENDIAN)
> +      {
> +	icode = code_for_aarch64_sve_dup_neonq (mode);
> +	unsigned int nunits = 128 / GET_MODE_UNIT_BITSIZE (mode);
> +	rtx indices = aarch64_gen_stepped_int_parallel
> +	  (nunits, (nunits - 1) , -1);

Same formatting comment as above.

> +
> +	e.add_output_operand (icode);
> +	e.add_input_operand (icode, e.args[0]);
> +	e.add_fixed_operand (indices);
> +	return e.generate_insn (icode);
> +      }
> +    if (valid_for_const_vector_p (GET_MODE_INNER (mode), e.args.last ()))
> +      /* Duplicate the constant to fill a vector.  The pattern optimizes
> +	 various cases involving constant operands, falling back to SEL
> +	 if necessary.  */
> +      icode = code_for_vcond_mask (mode, mode);
> +    else
> +      /* Use the pattern for selecting between a duplicated scalar
> +	 variable and a vector fallback.  */
> +      icode = code_for_aarch64_sel_dup (mode);
> +    return e.use_vcond_mask_insn (icode);

I think this should just unconditionally use:

  @aarch64_vec_duplicate_vq<mode>_le

Again, the only good way to test it is to disable the fold locally and
then run the tests.

> +  }
> +};
> +
>  class svindex_impl : public function_base
>  {
>  public:
> @@ -3028,5 +3154,8 @@ FUNCTION (svzip1q, unspec_based_function, (UNSPEC_ZIP1Q, UNSPEC_ZIP1Q,
>  FUNCTION (svzip2, svzip_impl, (1))
>  FUNCTION (svzip2q, unspec_based_function, (UNSPEC_ZIP2Q, UNSPEC_ZIP2Q,
>  					   UNSPEC_ZIP2Q))
> +NEON_SVE_BRIDGE_FUNCTION (svget_neonq, svget_neonq_impl,)
> +NEON_SVE_BRIDGE_FUNCTION (svset_neonq, svset_neonq_impl,)
> +NEON_SVE_BRIDGE_FUNCTION (svdup_neonq, svdup_neonq_impl,)
>
>  } /* end namespace aarch64_sve */
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-functions.h b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
> index 2729877d914414eff33182e03ab1dfc94a3515fa..bfb7fea674a905a2eb99f2bac7cbcb72af681b52 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-functions.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
> @@ -622,4 +622,8 @@ public:
>    namespace { static CONSTEXPR const CLASS NAME##_obj ARGS; } \
>    namespace functions { const function_base *const NAME = &NAME##_obj; }
>
> +#define NEON_SVE_BRIDGE_FUNCTION(NAME, CLASS, ARGS) \
> +  namespace { static CONSTEXPR const CLASS NAME##_obj ARGS; } \
> +  namespace neon_sve_bridge_functions { const function_base *const NAME = &NAME##_obj; }
> +
>  #endif
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.h b/gcc/config/aarch64/aarch64-sve-builtins-shapes.h
> index 7483c1d04b8e463e607e8e65aa94233460f77648..5aff20d1d21afddb934be4d5a103049b0b6c40ea 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.h
> @@ -105,9 +105,11 @@ namespace aarch64_sve
>      extern const function_shape *const count_vector;
>      extern const function_shape *const create;
>      extern const function_shape *const dupq;
> +    extern const function_shape *const dup_neonq;
>      extern const function_shape *const ext;
>      extern const function_shape *const fold_left;
>      extern const function_shape *const get;
> +    extern const function_shape *const get_neonq;
>      extern const function_shape *const inc_dec;
>      extern const function_shape *const inc_dec_pat;
>      extern const function_shape *const inc_dec_pred;
> @@ -135,6 +137,7 @@ namespace aarch64_sve
>      extern const function_shape *const reduction_wide;
>      extern const function_shape *const set;
>      extern const function_shape *const setffr;
> +    extern const function_shape *const set_neonq;
>      extern const function_shape *const shift_left_imm_long;
>      extern const function_shape *const shift_left_imm_to_uint;
>      extern const function_shape *const shift_right_imm;
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
> index af816c4c9e705d9cc4bce5cc50481cb27e6a03a7..4b0a84fe0cb5b5f4bc6b7dd012de0bc75ee4326b 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
> @@ -29,6 +29,7 @@
>  #include "optabs.h"
>  #include "aarch64-sve-builtins.h"
>  #include "aarch64-sve-builtins-shapes.h"
> +#include "aarch64-builtins.h"
>
>  /* In the comments below, _t0 represents the first type suffix and _t1
>     represents the second.  Square brackets enclose characters that are
> @@ -158,6 +159,8 @@ parse_element_type (const function_instance &instance, const char *&format)
>     s<elt> - a scalar type with the given element suffix
>     t<elt> - a vector or tuple type with given element suffix [*1]
>     v<elt> - a vector with the given element suffix
> +   D<elt> - a 64 bit neon vector
> +   Q<elt> - a 128 bit neon vector
>
>     where <elt> has the format described above parse_element_type
>
> @@ -224,6 +227,20 @@ parse_type (const function_instance &instance, const char *&format)
>        return acle_vector_types[0][type_suffixes[suffix].vector_type];
>      }
>
> +  if (ch == 'D')
> +    {
> +      type_suffix_index suffix = parse_element_type (instance, format);
> +      int neon_index = type_suffixes[suffix].neon64_type;
> +      return aarch64_simd_types[neon_index].itype;
> +    }
> +
> +  if (ch == 'Q')
> +    {
> +      type_suffix_index suffix = parse_element_type (instance, format);
> +      int neon_index = type_suffixes[suffix].neon128_type;
> +      return aarch64_simd_types[neon_index].itype;
> +    }
> +
>    gcc_unreachable ();
>  }
>
> @@ -1917,6 +1934,67 @@ struct get_def : public overloaded_base<0>
>  };
>  SHAPE (get)
>
> +/* <t0>xN_t svfoo[_t0](sv<t0>_t).  */
> +struct get_neonq_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group) const override
> +  {
> +    b.add_overloaded_functions (group, MODE_none);
> +    build_all (b, "Q0,v0", group, MODE_none);
> +  }
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +    return r.resolve_unary ();
> +  }
> +};
> +SHAPE (get_neonq)
> +
> +/* sv<t0>_t svfoo[_t0](sv<t0>_t, <t0>xN_t).  */
> +struct set_neonq_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group) const override
> +  {
> +    b.add_overloaded_functions (group, MODE_none);
> +    build_all (b, "v0,v0,Q0", group, MODE_none);
> +  }
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +    unsigned int i, nargs;
> +    type_suffix_index type;
> +    if (!r.check_gp_argument (2, i, nargs)
> +	|| (type = r.infer_neon128_vector_type (i + 1)) == NUM_TYPE_SUFFIXES)
> +      return error_mark_node;
> +    return r.resolve_to (r.mode_suffix_id, type);
> +  }
> +};
> +SHAPE (set_neonq)
> +
> +/* sv<t0>_t svfoo[_t0](<t0>xN_t).  */
> +struct dup_neonq_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group) const override
> +  {
> +    b.add_overloaded_functions (group, MODE_none);
> +    build_all (b, "v0,Q0", group, MODE_none);
> +  }
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +    unsigned int i, nargs;
> +    type_suffix_index type;
> +    if (!r.check_gp_argument (1, i, nargs)
> +	|| (type = r.infer_neon128_vector_type (i)) == NUM_TYPE_SUFFIXES)
> +      return error_mark_node;
> +    return r.resolve_to (r.mode_suffix_id, type);
> +  }
> +};
> +SHAPE (dup_neonq)
> +
>  /* sv<t0>_t svfoo[_t0](sv<t0>_t, uint64_t)
>     <t0>_t svfoo[_n_t0](<t0>_t, uint64_t)
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 161a14edde7c9fb1b13b146cf50463e2d78db264..6ff5c65e2610de8309a57b004e16d4602ea76999 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -52,6 +52,7 @@
>  #include "aarch64-sve-builtins-base.h"
>  #include "aarch64-sve-builtins-sve2.h"
>  #include "aarch64-sve-builtins-shapes.h"
> +#include "aarch64-builtins.h"
>
>  namespace aarch64_sve {
>
> @@ -127,7 +128,8 @@ CONSTEXPR const mode_suffix_info mode_suffixes[] = {
>
>  /* Static information about each type_suffix_index.  */
>  CONSTEXPR const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1] = {
> -#define DEF_SVE_TYPE_SUFFIX(NAME, ACLE_TYPE, CLASS, BITS, MODE) \
> +#define DEF_SVE_NEON_TYPE_SUFFIX(NAME, ACLE_TYPE, CLASS, BITS, MODE, \
> +				 NEON64, NEON128) \
>    { "_" #NAME, \
>      VECTOR_TYPE_##ACLE_TYPE, \
>      TYPE_##CLASS, \
> @@ -138,10 +140,15 @@ CONSTEXPR const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1] = {
>      TYPE_##CLASS == TYPE_float, \
>      TYPE_##CLASS == TYPE_bool, \
>      0, \
> -    MODE },
> +    MODE, \
> +    NEON64, \
> +    NEON128 },
> +#define DEF_SVE_TYPE_SUFFIX(NAME, ACLE_TYPE, CLASS, BITS, MODE) \
> +  DEF_SVE_NEON_TYPE_SUFFIX (NAME, ACLE_TYPE, CLASS, BITS, MODE, \
> +			    ARM_NEON_H_TYPES_LAST, ARM_NEON_H_TYPES_LAST)
>  #include "aarch64-sve-builtins.def"
>    { "", NUM_VECTOR_TYPES, TYPE_bool, 0, 0, false, false, false, false,
> -    0, VOIDmode }
> +    0, VOIDmode, ARM_NEON_H_TYPES_LAST, ARM_NEON_H_TYPES_LAST }
>  };
>
>  /* Define a TYPES_<combination> macro for each combination of type
> @@ -529,6 +536,13 @@ static CONSTEXPR const function_group_info function_groups[] = {
>  #include "aarch64-sve-builtins.def"
>  };
>
> +/* A list of all NEON-SVE-Bridge ACLE functions.  */
> +static CONSTEXPR const function_group_info neon_sve_function_groups[] = {
> +#define DEF_NEON_SVE_FUNCTION(NAME, SHAPE, TYPES, PREDS) \
> +  { #NAME, &neon_sve_bridge_functions::NAME, &shapes::SHAPE, types_##TYPES, preds_##PREDS },
> +#include "aarch64-neon-sve-bridge-builtins.def"
> +};
> +
>  /* The scalar type associated with each vector type.  */
>  extern GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
>  tree scalar_types[NUM_VECTOR_TYPES];
> @@ -1403,6 +1417,32 @@ function_resolver::infer_integer_vector_type (unsigned int argno)
>    return type;
>  }
>
> +type_suffix_index
> +function_resolver::infer_neon128_vector_type (unsigned int argno)

Missing function comment.
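
Maybe something like the following, modelled on the comments above the
other infer_* functions (wording just a suggestion):

  /* Require argument ARGNO to have a 128-bit NEON vector type that
     corresponds to one of the SVE type suffixes.  Return that type
     suffix on success, otherwise report an error and return
     NUM_TYPE_SUFFIXES.  */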
> +{
> +  tree actual = get_argument_type (argno);
> +  if (actual == error_mark_node)
> +    return NUM_TYPE_SUFFIXES;
> +
> +  for (unsigned int suffix_i = 0; suffix_i < NUM_TYPE_SUFFIXES; ++suffix_i)
> +    {
> +      int neon_index = type_suffixes[suffix_i].neon128_type;
> +      if (neon_index != ARM_NEON_H_TYPES_LAST)
> +	{
> +	  tree type = aarch64_simd_types[neon_index].itype;
> +	  if (type && matches_type_p (type, actual))
> +	    {
> +	      return type_suffix_index (suffix_i);
> +	    }
> +	}
> +    }
> +
> +  error_at (location, "passing %qT to argument %d of %qE, which"
> +	    " expects a 128 bit NEON vector type", actual, argno + 1, fndecl);
> +  return NUM_TYPE_SUFFIXES;
> +}
> +
> +
>  /* Like infer_vector_type, but also require the type to be an unsigned
>     integer.  */
>  type_suffix_index
> @@ -3410,6 +3450,13 @@ init_builtins ()
>    handle_arm_sve_h ();
>  }
>
> +void
> +init_neon_sve_builtins ()

Missing function comment.

> +{
> +  if (in_lto_p)
> +    handle_arm_neon_sve_bridge_h ();
> +}
> +
>  /* Register vector type TYPE under its arm_sve.h name.  */
>  static void
>  register_vector_type (vector_type_index type)
> @@ -3560,6 +3607,16 @@ handle_arm_sve_h ()
>      builder.register_function_group (function_groups[i]);
>  }
>
> +/* Implement #pragma GCC aarch64 "arm_neon_sve_bridge.h".  */
> +void
> +handle_arm_neon_sve_bridge_h ()
> +{
> +  /* Define the functions.  */
> +  function_builder builder;
> +  for (unsigned int i = 0; i < ARRAY_SIZE (neon_sve_function_groups); ++i)
> +    builder.register_function_group (neon_sve_function_groups[i]);
> +}
> +
>  /* Return the function decl with SVE function subcode CODE, or error_mark_node
>     if no such function exists.  */
>  tree
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.def b/gcc/config/aarch64/aarch64-sve-builtins.def
> index 534f6e69d72342fdcfcc00bd330585db1eae32e1..e8b4a919e1bb7a2d5d3239e6d303c9ee4e73d54f 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.def
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.def
> @@ -29,6 +29,11 @@
>  #define DEF_SVE_TYPE_SUFFIX(A, B, C, D, E)
>  #endif
>
> +#ifndef DEF_SVE_NEON_TYPE_SUFFIX
> +#define DEF_SVE_NEON_TYPE_SUFFIX(A, B, C, D, E, F, G) \
> +  DEF_SVE_TYPE_SUFFIX(A, B, C, D, E)
> +#endif
> +
>  #ifndef DEF_SVE_FUNCTION
>  #define DEF_SVE_FUNCTION(A, B, C, D)
>  #endif
> @@ -82,23 +87,36 @@ DEF_SVE_TYPE_SUFFIX (b8, svbool_t, bool, 8, VNx16BImode)
>  DEF_SVE_TYPE_SUFFIX (b16, svbool_t, bool, 16, VNx8BImode)
>  DEF_SVE_TYPE_SUFFIX (b32, svbool_t, bool, 32, VNx4BImode)
>  DEF_SVE_TYPE_SUFFIX (b64, svbool_t, bool, 64, VNx2BImode)
> -DEF_SVE_TYPE_SUFFIX (bf16, svbfloat16_t, bfloat, 16, VNx8BFmode)
> -DEF_SVE_TYPE_SUFFIX (f16, svfloat16_t, float, 16, VNx8HFmode)
> -DEF_SVE_TYPE_SUFFIX (f32, svfloat32_t, float, 32, VNx4SFmode)
> -DEF_SVE_TYPE_SUFFIX (f64, svfloat64_t, float, 64, VNx2DFmode)
> -DEF_SVE_TYPE_SUFFIX (s8, svint8_t, signed, 8, VNx16QImode)
> -DEF_SVE_TYPE_SUFFIX (s16, svint16_t, signed, 16, VNx8HImode)
> -DEF_SVE_TYPE_SUFFIX (s32, svint32_t, signed, 32, VNx4SImode)
> -DEF_SVE_TYPE_SUFFIX (s64, svint64_t, signed, 64, VNx2DImode)
> -DEF_SVE_TYPE_SUFFIX (u8, svuint8_t, unsigned, 8, VNx16QImode)
> -DEF_SVE_TYPE_SUFFIX (u16, svuint16_t, unsigned, 16, VNx8HImode)
> -DEF_SVE_TYPE_SUFFIX (u32, svuint32_t, unsigned, 32, VNx4SImode)
> -DEF_SVE_TYPE_SUFFIX (u64, svuint64_t, unsigned, 64, VNx2DImode)
> +DEF_SVE_NEON_TYPE_SUFFIX (bf16, svbfloat16_t, bfloat, 16, VNx8BFmode,
> +			  Bfloat16x4_t, Bfloat16x8_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (f16, svfloat16_t, float, 16, VNx8HFmode,
> +			  Float16x4_t, Float16x8_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (f32, svfloat32_t, float, 32, VNx4SFmode,
> +			  Float32x2_t, Float32x4_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (f64, svfloat64_t, float, 64, VNx2DFmode,
> +			  Float64x1_t, Float64x2_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (s8, svint8_t, signed, 8, VNx16QImode,
> +			  Int8x8_t, Int8x16_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (s16, svint16_t, signed, 16, VNx8HImode,
> +			  Int16x4_t, Int16x8_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (s32, svint32_t, signed, 32, VNx4SImode,
> +			  Int32x2_t, Int32x4_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (s64, svint64_t, signed, 64, VNx2DImode,
> +			  Int64x1_t, Int64x2_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (u8, svuint8_t, unsigned, 8, VNx16QImode,
> +			  Uint8x8_t, Uint8x16_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (u16, svuint16_t, unsigned, 16, VNx8HImode,
> +			  Uint16x4_t, Uint16x8_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (u32, svuint32_t, unsigned, 32, VNx4SImode,
> +			  Uint32x2_t, Uint32x4_t)
> +DEF_SVE_NEON_TYPE_SUFFIX (u64, svuint64_t, unsigned, 64, VNx2DImode,
> +			  Uint64x1_t, Uint64x2_t)
>
>  #include "aarch64-sve-builtins-base.def"
>  #include "aarch64-sve-builtins-sve2.def"
>
>  #undef DEF_SVE_FUNCTION
> +#undef DEF_SVE_NEON_TYPE_SUFFIX
>  #undef DEF_SVE_TYPE_SUFFIX
>  #undef DEF_SVE_TYPE
>  #undef DEF_SVE_MODE
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
> index a301570b82ed3477306d203080ccb76608322c09..d32bf5b57ae7b48a130a7794f3f8277ad59ed03e 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
> @@ -20,6 +20,8 @@
>  #ifndef GCC_AARCH64_SVE_BUILTINS_H
>  #define GCC_AARCH64_SVE_BUILTINS_H
>
> +#include "aarch64-builtins.h"
> +
>  /* The full name of an SVE ACLE function is the concatenation of:
>
>     - the base name ("svadd", etc.)
> @@ -206,6 +208,14 @@ struct mode_suffix_info
>    units_index displacement_units;
>  };
>
> +#define ENTRY(E, M, Q, G) E,
> +enum aarch64_simd_type
> +{
> +#include "aarch64-simd-builtin-types.def"
> +  ARM_NEON_H_TYPES_LAST
> +};
> +#undef ENTRY
> +
>  /* Static information about a type suffix.  */
>  struct type_suffix_info
>  {
> @@ -235,6 +245,11 @@ struct type_suffix_info
>
>    /* The associated vector or predicate mode.  */
>    machine_mode vector_mode : 16;
> +
> +  /* The corresponding 64-bit and 128-bit arm_neon.h types, or
> +     ARM_NEON_H_TYPES_LAST if none.  */
> +  aarch64_simd_type neon64_type;
> +  aarch64_simd_type neon128_type;
>  };
>
>  /* Static information about a set of functions.  */
> @@ -400,6 +415,7 @@ public:
>    type_suffix_index infer_vector_or_tuple_type (unsigned int, unsigned int);
>    type_suffix_index infer_vector_type (unsigned int);
>    type_suffix_index infer_integer_vector_type (unsigned int);
> +  type_suffix_index infer_neon128_vector_type (unsigned int);
>    type_suffix_index infer_unsigned_vector_type (unsigned int);
>    type_suffix_index infer_sd_vector_type (unsigned int);
>    type_suffix_index infer_tuple_type (unsigned int);
> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> index 5a652d8536a0ef9461f40da7b22834e683e73ceb..3e01669fbaaa805ac4de0d2615e50674f265ee59 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -10834,3 +10834,57 @@
>      operands[4] = CONSTM1_RTX (<VPRED>mode);
>    }
>  )
> +
> +(define_insn_and_split "@aarch64_sve_get_neonq_<mode>"
> +  [(set (match_operand:<V128> 0 "register_operand" "=w")
> +	(vec_select:<V128>
> +	  (match_operand:SVE_FULL 1 "register_operand" "w")
> +	  (match_operand 2 "descending_int_parallel")))]
> +  "TARGET_SVE
> +   && BYTES_BIG_ENDIAN
> +   && known_eq (INTVAL (XVECEXP (operands[2], 0, 0)),
> +		GET_MODE_NUNITS (<V128>mode) - 1)"
> +  "#"
> +  "&& reload_completed"
> +  [(set (match_dup 0) (match_dup 1))]
> +  {
> +    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));
> +  }
> +)
> +
> +(define_insn "@aarch64_sve_set_neonq_<mode>"
> +  [(set (match_operand:SVE_FULL 0 "register_operand" "=w")
> +	(unspec:SVE_FULL
> +	  [(match_operand:SVE_FULL 1 "register_operand" "w")
> +	   (match_operand:<V128> 2 "register_operand" "w")
> +	   (match_operand:<VPRED> 3 "register_operand" "Upl")]
> +	  UNSPEC_SET_NEONQ))]
> +  "TARGET_SVE
> +   && BYTES_BIG_ENDIAN"
> +  {
> +    operands[2] = lowpart_subreg (<MODE>mode, operands[2],
> +				  GET_MODE (operands[2]));
> +    return aarch64_output_sve_set_neonq (operands, <MODE>mode);
> +  }
> +)
> +
> +(define_insn_and_split "@aarch64_sve_dup_neonq_<mode>"
> +  [(set (match_operand:SVE_FULL 0 "register_operand")
> +	(vec_duplicate:SVE_FULL
> +	  (vec_select:<V128>
> +	    (match_operand:<V128> 1 "register_operand")
> +	    (match_operand 2 "descending_int_parallel"))))]

We already have @aarch64_vec_duplicate_vq<mode>_be for this.  Also,
the split...

> +  "TARGET_SVE
> +   && BYTES_BIG_ENDIAN
> +   && known_eq (INTVAL (XVECEXP (operands[2], 0, 0)),
> +		GET_MODE_NUNITS (<V128>mode) - 1)"
> +  {@ [ cons: =0 , 1  ]
> +     [ w        , 0  ] #
> +     [ w        , ?w ] #
> +  }
> +  "&& reload_completed"
> +  [(set (match_dup 0) (match_dup 1))]
> +  {
> +    operands[1] = gen_rtx_REG (<MODE>mode, REGNO (operands[1]));
> +  }

...here isn't correct, since we need the 128-bit vector to be
duplicated to fill the whole of the SVE vector.  The split pattern
instead just initialises the low 128 bits.
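
FWIW, a rough sketch of what I mean for the big-endian path of the
expander, reusing the existing pattern instead of adding a new one
(untested, and I haven't double-checked the generated code_for_* name):

  if (BYTES_BIG_ENDIAN)
    {
      insn_code icode = code_for_aarch64_vec_duplicate_vq_be (mode);
      unsigned int nunits = 128 / GET_MODE_UNIT_BITSIZE (mode);
      rtx indices = aarch64_gen_stepped_int_parallel (nunits,
						      nunits - 1, -1);
      e.add_output_operand (icode);
      e.add_input_operand (icode, e.args[0]);
      e.add_fixed_operand (indices);
      return e.generate_insn (icode);
    }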
> +)
> \ No newline at end of file
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 5fd7063663c67a15e654eea66ffe7193caebf6b6..bf9b725eb63f6b713a4cac430554166bd677e01a 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -15594,6 +15594,7 @@ aarch64_init_builtins ()
>  {
>    aarch64_general_init_builtins ();
>    aarch64_sve::init_builtins ();
> +  aarch64_sve::init_neon_sve_builtins ();
>  #ifdef SUBTARGET_INIT_BUILTINS
>    SUBTARGET_INIT_BUILTINS;
>  #endif
> @@ -24100,6 +24101,23 @@ aarch64_output_sve_ptrues (rtx const_unspec)
>    return templ;
>  }
>
> +const char *
> +aarch64_output_sve_set_neonq (rtx * operands, machine_mode mode)
> +{
> +  switch(GET_MODE_UNIT_BITSIZE(mode))
> +    {
> +      case 64:
> +	return "sel\t%0.d, %3, %2.d, %1.d";
> +      case 32:
> +	return "sel\t%0.s, %3, %2.s, %1.s";
> +      case 16:
> +	return "sel\t%0.h, %3, %2.h, %1.h";
> +      case 8:
> +	return "sel\t%0.b, %3, %2.b, %1.b";
> +    }
> +
> +}

This function shouldn't be needed.  It should be possible to do this
directly in the define_insn, with an asm template such as:

  "sel\t%0.<Vetype>, %3, %2.<Vetype>, %1.<Vetype>"

> +
>  /* Split operands into moves from op[1] + op[2] into op[0].  */
>
>  void
> [...]

The tests look good, but:

> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set_neonq_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set_neonq_bf16.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b12ce1d46b468359728a7fef5ae464b9e80c2e52
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/set_neonq_bf16.c
> @@ -0,0 +1,23 @@
> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
> +
> +#include "test_sve_acle.h"
> +
> +/*
> +** set_neonq_bf16_z24:
> +**	ptrue	p3.h, vl8
> +**	sel	z24.h, p3, z0.h, z4.h
> +**	ret
> +*/
> +TEST_SET_NEONQ (set_neonq_bf16_z24, svbfloat16_t, bfloat16x8_t,
> +		z24 = svset_neonq_bf16 (z4, z0),
> +		z24 = svset_neonq (z4, z0))

There's nothing that forces the predicate to be p3.  I think it should be:

/*
** set_neonq_bf16_z24:
**	ptrue	(p[0-9]+).h, vl8
**	sel	z24.h, \1, z0.h, z4.h
**	ret
*/

Same for the other tests and files.

> +
> +/*
> +** set_neonq_bf16_z4:
> +**	ptrue	p3.h, vl8
> +**	sel	(z0.h|z4.h), p3, z0.h, z4.h

Given:

> +**	ret
> +*/
> +TEST_SET_NEONQ (set_neonq_bf16_z4, svbfloat16_t, bfloat16x8_t,
> +		z4 = svset_neonq_bf16 (z4, z0),
> +		z4 = svset_neonq (z4, z0))

...this, we should try to force the z4 allocation of the result.
It's probably easiest to do that using register asms in
TEST_SET_NEONQ, like TEST_DUP_NEONQ already does.

Thanks,
Richard