From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Ball
Cc: gcc-patches@gcc.gnu.org, Richard Earnshaw, Kyrylo Tkachov, Marcus Shawcroft
Subject: Re: [PATCH] aarch64: SVE/NEON Bridging intrinsics
Date: Wed, 09 Aug 2023 12:39:11 +0100
In-Reply-To: Richard Ball's message of "Wed, 2 Aug 2023 13:09:54 +0100"

Richard Ball writes:
> ACLE has added intrinsics to bridge between SVE and Neon.
>
> The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and
> SVE vectors.
>
> This patch adds support to GCC for the following 3 intrinsics:
> svset_neonq, svget_neonq and svdup_neonq
>
> gcc/ChangeLog:
>
> * config.gcc: Adds new header to config.
> * config/aarch64/aarch64-builtins.cc (GTY): Externs aarch64_simd_types.
> * config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64):
> Defines pragma for arm_neon_sve_bridge.h.
> * config/aarch64/aarch64-protos.h: New function.
> * config/aarch64/aarch64-sve-builtins-base.h: New intrinsics.
> * config/aarch64/aarch64-sve-builtins-base.cc
> (class svget_neonq_impl): New intrinsic implementation.
> (class svset_neonq_impl): Likewise.
> (class svdup_neonq_impl): Likewise.
> (NEON_SVE_BRIDGE_FUNCTION): New intrinsics.
> * config/aarch64/aarch64-sve-builtins-functions.h
> (NEON_SVE_BRIDGE_FUNCTION): Defines macro for NEON_SVE_BRIDGE
> functions.
> * config/aarch64/aarch64-sve-builtins-shapes.h: New shapes.
> * config/aarch64/aarch64-sve-builtins-shapes.cc
> (parse_neon_type): Parser for NEON types.
> (parse_element_type): Add NEON element types.
> (parse_type): Likewise.
> (NEON_SVE_BRIDGE_SHAPE): Defines macro for NEON_SVE_BRIDGE shapes.
> (struct get_neonq_def): Defines function shape for get_neonq.
> (struct set_neonq_def): Defines function shape for set_neonq.
> (struct dup_neonq_def): Defines function shape for dup_neonq.
> * config/aarch64/aarch64-sve-builtins.cc (DEF_NEON_SVE_FUNCTION): Defines
> macro for NEON_SVE_BRIDGE functions.
> (handle_arm_neon_sve_bridge_h): Handles #pragma arm_neon_sve_bridge.h.
> * config/aarch64/aarch64-builtins.h: New header file to extern neon types.
> * config/aarch64/aarch64-neon-sve-bridge-builtins.def: New intrinsics
> function def file.
> * config/aarch64/arm_neon_sve_bridge.h: New header file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/execute/neon-sve-bridge.c: New test.
>
> #############################################################################
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index d88071773c9e1280cc5f38e36e09573214323b48..ca55992200dbe58782c3dbf66906339de021ba6b 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -334,7 +334,7 @@ m32c*-*-*)
> 	;;
> aarch64*-*-*)
> 	cpu_type=aarch64
> -	extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h"
> +	extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h arm_neon_sve_bridge.h"
> 	c_target_objs="aarch64-c.o"
> 	cxx_target_objs="aarch64-c.o"
> 	d_target_objs="aarch64-d.o"
> diff --git a/gcc/config/aarch64/aarch64-builtins.h b/gcc/config/aarch64/aarch64-builtins.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..eebde448f92c230c8f88b4da1ca8ebd9670b1536
> --- /dev/null
> +++ b/gcc/config/aarch64/aarch64-builtins.h
> @@ -0,0 +1,86 @@
> +/* Builtins' description for AArch64 SIMD architecture.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of GCC.
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +#ifndef GCC_AARCH64_BUILTINS_H
> +#define GCC_AARCH64_BUILTINS_H
> +#include "tree.h"

It looks like the include shouldn't be needed.  tree is forward-declared
in coretypes.h, which is included everywhere.

> +enum aarch64_type_qualifiers
> +{
> +  /* T foo.  */
> +  qualifier_none = 0x0,
> +  /* unsigned T foo.  */
> +  qualifier_unsigned = 0x1, /* 1 << 0 */
> +  /* const T foo.  */
> +  qualifier_const = 0x2, /* 1 << 1 */
> +  /* T *foo.  */
> +  qualifier_pointer = 0x4, /* 1 << 2 */
> +  /* Used when expanding arguments if an operand could
> +     be an immediate.  */
> +  qualifier_immediate = 0x8, /* 1 << 3 */
> +  qualifier_maybe_immediate = 0x10, /* 1 << 4 */
> +  /* void foo (...).  */
> +  qualifier_void = 0x20, /* 1 << 5 */
> +  /* 1 << 6 is now unused */
> +  /* Some builtins should use the T_*mode* encoded in a simd_builtin_datum
> +     rather than using the type of the operand.  */
> +  qualifier_map_mode = 0x80, /* 1 << 7 */
> +  /* qualifier_pointer | qualifier_map_mode */
> +  qualifier_pointer_map_mode = 0x84,
> +  /* qualifier_const | qualifier_pointer | qualifier_map_mode */
> +  qualifier_const_pointer_map_mode = 0x86,
> +  /* Polynomial types.  */
> +  qualifier_poly = 0x100,
> +  /* Lane indices - must be in range, and flipped for bigendian.  */
> +  qualifier_lane_index = 0x200,
> +  /* Lane indices for single lane structure loads and stores.  */
> +  qualifier_struct_load_store_lane_index = 0x400,
> +  /* Lane indices selected in pairs. - must be in range, and flipped for
> +     bigendian.  */
> +  qualifier_lane_pair_index = 0x800,
> +  /* Lane indices selected in quadtuplets.
> +     - must be in range, and flipped for bigendian.  */
> +  qualifier_lane_quadtup_index = 0x1000,
> +};
> +#define ENTRY(E, M, Q, G) E,
> +enum aarch64_simd_type
> +{
> +#include "aarch64-simd-builtin-types.def"
> +  ARM_NEON_H_TYPES_LAST
> +};
> +#undef ENTRY
> +struct GTY(()) aarch64_simd_type_info
> +{
> +  enum aarch64_simd_type type;
> +  /* Internal type name.  */
> +  const char *name;
> +  /* Internal type name(mangled).  The mangled names conform to the
> +     AAPCS64 (see "Procedure Call Standard for the ARM 64-bit Architecture",
> +     Appendix A).  To qualify for emission with the mangled names defined in
> +     that document, a vector type must not only be of the correct mode but also
> +     be of the correct internal AdvSIMD vector type (e.g. __Int8x8_t); these
> +     types are registered by aarch64_init_simd_builtin_types ().  In other
> +     words, vector types defined in other ways e.g. via vector_size attribute
> +     will get default mangled names.  */
> +  const char *mangle;
> +  /* Internal type.  */
> +  tree itype;
> +  /* Element type.  */
> +  tree eltype;
> +  /* Machine mode the internal type maps to.  */
> +  enum machine_mode mode;
> +  /* Qualifiers.  */
> +  enum aarch64_type_qualifiers q;
> +};
> +extern aarch64_simd_type_info aarch64_simd_types[];
> +#endif
> \ No newline at end of file

Putting these in a header file is good, but we should then also remove
the copy in aarch64-builtins.cc, and make aarch64-builtins.cc include
this file instead.

> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
> index 04f59fd9a54306d6422b03e32dce79bc00aed4f8..6a3aca6420624ad5ea93d64d7ed580791d65d4e4 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -923,7 +923,7 @@ struct GTY(()) aarch64_simd_type_info
>
> #define ENTRY(E, M, Q, G) \
>   {E, "__" #E, #G "__" #E, NULL_TREE, NULL_TREE, E_##M##mode, qualifier_##Q},
> -static GTY(()) struct aarch64_simd_type_info aarch64_simd_types [] = {
> +extern GTY(()) struct aarch64_simd_type_info aarch64_simd_types [] = {
> #include "aarch64-simd-builtin-types.def"
> };
> #undef ENTRY
> diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
> index 578ec6f45b06347d90f951b37064006786baf10f..ada8b81a7bef6c2e58b07324a7bfc38eecb651da 100644
> --- a/gcc/config/aarch64/aarch64-c.cc
> +++ b/gcc/config/aarch64/aarch64-c.cc
> @@ -294,6 +294,8 @@ aarch64_pragma_aarch64 (cpp_reader *)
>     handle_arm_neon_h ();
>   else if (strcmp (name, "arm_acle.h") == 0)
>     handle_arm_acle_h ();
> +  else if (strcmp (name, "arm_neon_sve_bridge.h") == 0)
> +    aarch64_sve::handle_arm_neon_sve_bridge_h ();
>   else
>     error ("unknown %<#pragma GCC aarch64%> option %qs", name);
> }
> diff --git a/gcc/config/aarch64/aarch64-neon-sve-bridge-builtins.def b/gcc/config/aarch64/aarch64-neon-sve-bridge-builtins.def
> new file mode 100644
> index 0000000000000000000000000000000000000000..0c3cf233c9382b2f7420379054a53fa846d46c8c
> --- /dev/null
> +++ b/gcc/config/aarch64/aarch64-neon-sve-bridge-builtins.def
> @@ -0,0 +1,28 @@
> +/* Builtin lists for AArch64 NEON-SVE-Bridge
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef DEF_NEON_SVE_FUNCTION
> +#define DEF_NEON_SVE_FUNCTION(A, B, C, D)
> +#endif
> +
> +DEF_NEON_SVE_FUNCTION (svset_neonq, set_neonq, all_data, none)
> +DEF_NEON_SVE_FUNCTION (svget_neonq, get_neonq, all_data, none)
> +DEF_NEON_SVE_FUNCTION (svdup_neonq, dup_neonq, all_data, none)
> +
> +#undef DEF_NEON_SVE_FUNCTION
> \ No newline at end of file
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 70303d6fd953e0c397b9138ede8858c2db2e53db..c5e4e20e73cedb363d867a73869c0659ed9b237d 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -987,6 +987,7 @@ void handle_arm_neon_h (void);
> namespace aarch64_sve {
>   void init_builtins ();
>   void handle_arm_sve_h ();
> +  void handle_arm_neon_sve_bridge_h ();
>   tree builtin_decl (unsigned, bool);
>   bool builtin_type_p (const_tree);
>   bool builtin_type_p (const_tree, unsigned int *, unsigned int *);
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.h b/gcc/config/aarch64/aarch64-sve-builtins-base.h
> index d300e3a85d00b58ad790851a81d43af709b66bce..df75e4c1ecf81f3ddfa256edbcf8637d092fcfde 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.h
> @@ -299,6 +299,12 @@ namespace aarch64_sve
>     extern const function_base *const svzip2;
>     extern const function_base *const svzip2q;
>   }
> +  namespace neon_sve_bridge_functions
> +  {
> +    extern const function_base *const svset_neonq;
> +    extern const function_base *const svget_neonq;
> +    extern const function_base *const svdup_neonq;
> +  }
> }
>
> #endif
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 9010ecca6da23c107f5ded9ab3cfa678e308daf9..0acc3acf7d34b54af8679dc36effb85f7b557543 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -44,6 +44,7 @@
> #include "aarch64-sve-builtins-shapes.h"
> #include "aarch64-sve-builtins-base.h"
> #include "aarch64-sve-builtins-functions.h"
> +#include "aarch64-builtins.h"
> #include "ssa.h"
> #include "gimple-fold.h"
>
> @@ -1064,6 +1065,99 @@ public:
>   }
> };
>
> +class svget_neonq_impl : public function_base
> +{
> +public:
> +  gimple *
> +  fold (gimple_folder &f) const override
> +  {
> +    tree rhs_tuple = gimple_call_arg (f.call, 0);
> +    tree rhs_vector = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs),
> +			      rhs_tuple, bitsize_int(128), bitsize_int(0));
> +    return gimple_build_assign (f.lhs, rhs_vector);
> +  }

I hope this doesn't contradict something I said earlier :) but I don't
think the fold is valid for big endian.  Quoting from the svset_neonq
documentation:

------------------------------------------------------------------------
These intrinsics set the first 128 bits of SVE vector `vec` to `subvec`.
That is, bit *i* of the result is equal to:

* bit *i* of `subvec` if *i* < 128
* bit *i* of `vec` otherwise

On big-endian targets, this leaves lanes in a different order from the
“native” SVE order.
For example, if `subvec` is `int32x4_t`, then on big-endian targets,
the first memory element is in lane 3 of `subvec` and is therefore in
lane 3 of the returned SVE vector.  Using `svld1` to load elements
would instead put the first memory element in lane 0 of the returned
SVE vector.
------------------------------------------------------------------------

This means that, on big endian:

    svint32_t *b;
    int32x4_t *a;
    ...
    *a = svget_neonq (*b);

would leave a[0] == b[3], a[1] == b[2], etc.  (b is loaded from using
SVE's LD1W and a is stored to using Advanced SIMD's STR.)

The GCC representation follows memory ordering, so if we were going to
fold on big endian, we would need an extra VEC_PERM_EXPR after the
bitfield reference.  But I'm not sure it's worth it.  Let's just return
null for big endian for now.

(The bitfield is at the right offset though, which is another potential
trap for big endian.)

> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    return simplify_gen_subreg (e.vector_mode (0), e.args[0],
> +				GET_MODE (e.args[0]),
> +				INTVAL (e.args[1]) * BYTES_PER_SVE_VECTOR);

It looks like this would fail if the fold doesn't happen, since there
should only be 1 argument rather than 2.  It'd be good to test the
patch with the folds commented out.

Subregs also follow memory order, so I think for big endian this needs
to use a real define_insn.  The pattern and condition would be similar to:

(define_insn "@aarch64_vec_duplicate_vq<mode>_be"
  [(set (match_operand:SVE_FULL 0 "register_operand" "=w")
	(vec_duplicate:SVE_FULL
	  (vec_select:<V128>
	    (match_operand:<V128> 1 "register_operand" "w")
	    (match_operand 2 "descending_int_parallel"))))]
  "TARGET_SVE
   && BYTES_BIG_ENDIAN
   && known_eq (INTVAL (XVECEXP (operands[2], 0, 0)),
		GET_MODE_NUNITS (<V128>mode) - 1)"
  {
    operands[1] = gen_rtx_REG (<MODE>mode, REGNO (operands[1]));
    return "dup\t%0.q, %1.q[0]";
  }
)

but without the outermost vec_duplicate.  The implementation can expand
to nothing after register allocation, as for @aarch64_sve_reinterpret.

Similar comments for the others.
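Concretely, the early-out I'm suggesting for the fold would be something
like this (just a sketch, otherwise keeping the patch's code as-is):

  gimple *
  fold (gimple_folder &f) const override
  {
    /* GIMPLE lane order is memory order, so on big endian the low 128
       bits of the SVE vector are not the first lanes of the Advanced
       SIMD view.  Punt rather than emit a VEC_PERM_EXPR.  */
    if (BYTES_BIG_ENDIAN)
      return NULL;
    tree rhs_tuple = gimple_call_arg (f.call, 0);
    tree rhs_vector = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs),
			      rhs_tuple, bitsize_int (128), bitsize_int (0));
    return gimple_build_assign (f.lhs, rhs_vector);
  }

Returning null keeps the call around, so the expander still has to get
the big-endian case right on its own.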
> +  }
> +};
> +
> +class svset_neonq_impl : public function_base
> +{
> +public:
> +  gimple *
> +  fold (gimple_folder &f) const override
> +  {
> +    tree rhs_tuple = gimple_call_arg (f.call, 0);
> +    tree rhs_vector = gimple_call_arg (f.call, 1);
> +    gassign *copy = gimple_build_assign (unshare_expr (f.lhs), rhs_tuple);
> +    tree lhs_vector = build3 (BIT_INSERT_EXPR, TREE_TYPE (rhs_vector),
> +			      f.lhs, rhs_vector, bitsize_int(0));
> +    gassign *update = gimple_build_assign (f.lhs, lhs_vector);
> +    gsi_insert_after (f.gsi, update, GSI_SAME_STMT);
> +    return copy;
> +  }
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    rtx rhs_tuple = e.args[0];
> +    unsigned int index = INTVAL (e.args[1]);
> +    rtx rhs_vector = e.args[2];
> +    rtx lhs_tuple = e.get_nonoverlapping_reg_target ();
> +    emit_move_insn (lhs_tuple, rhs_tuple);
> +    rtx lhs_vector = simplify_gen_subreg (GET_MODE (rhs_vector),
> +					  lhs_tuple, GET_MODE (lhs_tuple),
> +					  index * BYTES_PER_SVE_VECTOR);
> +    emit_move_insn (lhs_vector, rhs_vector);
> +    return lhs_vector;
> +  }
> +};
> +
> +class svdup_neonq_impl : public function_base
> +{
> +public:
> +  gimple *
> +  fold (gimple_folder &f) const override
> +  {
> +    tree rhs_vector = gimple_call_arg (f.call, 0);
> +    unsigned int nargs = gimple_call_num_args (f.call);
> +    unsigned HOST_WIDE_INT NEONnelts;
> +    TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs_vector)).is_constant (&NEONnelts);
> +    poly_uint64 SVEnelts;
> +    SVEnelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (f.lhs));
> +    vec_perm_builder builder (SVEnelts, NEONnelts, 1);
> +    for (unsigned int i = 0; i < NEONnelts; i++)
> +      {
> +	builder.quick_push (i);
> +      }
> +    vec_perm_indices indices (builder, 1, NEONnelts);
> +    tree perm_type = build_vector_type (ssizetype, SVEnelts);
> +    return gimple_build_assign (f.lhs, VEC_PERM_EXPR,
> +				rhs_vector,
> +				rhs_vector,
> +				vec_perm_indices_to_tree (perm_type, indices));
> +  }
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    insn_code icode;
> +    machine_mode mode = e.vector_mode (0);
> +    if (valid_for_const_vector_p (GET_MODE_INNER (mode), e.args.last ()))
> +      /* Duplicate the constant to fill a vector.  The pattern optimizes
> +	 various cases involving constant operands, falling back to SEL
> +	 if necessary.  */
> +      icode = code_for_vcond_mask (mode, mode);
> +    else
> +      /* Use the pattern for selecting between a duplicated scalar
> +	 variable and a vector fallback.  */
> +      icode = code_for_aarch64_sel_dup (mode);
> +    return e.use_vcond_mask_insn (icode);
> +  }
> +};
> +
> class svindex_impl : public function_base
> {
> public:
> @@ -3028,5 +3122,8 @@ FUNCTION (svzip1q, unspec_based_function, (UNSPEC_ZIP1Q, UNSPEC_ZIP1Q,
> FUNCTION (svzip2, svzip_impl, (1))
> FUNCTION (svzip2q, unspec_based_function, (UNSPEC_ZIP2Q, UNSPEC_ZIP2Q,
> 					   UNSPEC_ZIP2Q))
> +NEON_SVE_BRIDGE_FUNCTION (svget_neonq, svget_neonq_impl,)
> +NEON_SVE_BRIDGE_FUNCTION (svset_neonq, svset_neonq_impl,)
> +NEON_SVE_BRIDGE_FUNCTION (svdup_neonq, svdup_neonq_impl,)
>
> } /* end namespace aarch64_sve */
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-functions.h b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
> index 2729877d914414eff33182e03ab1dfc94a3515fa..bfb7fea674a905a2eb99f2bac7cbcb72af681b52 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-functions.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
> @@ -622,4 +622,8 @@ public:
>   namespace { static CONSTEXPR const CLASS NAME##_obj ARGS; } \
>   namespace functions { const function_base *const NAME = &NAME##_obj; }
>
> +#define NEON_SVE_BRIDGE_FUNCTION(NAME, CLASS, ARGS) \
> +  namespace { static CONSTEXPR const CLASS NAME##_obj ARGS; } \
> +  namespace neon_sve_bridge_functions { const function_base *const NAME = &NAME##_obj; }
> +
> #endif
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.h b/gcc/config/aarch64/aarch64-sve-builtins-shapes.h
> index 7483c1d04b8e463e607e8e65aa94233460f77648..30c0bf8503622b0320a334b79c328233248122a4 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.h
> @@ -186,6 +186,13 @@ namespace aarch64_sve
>     extern const function_shape *const unary_uint;
>     extern const function_shape *const unary_widen;
>   }
> +
> +  namespace neon_sve_bridge_shapes
> +  {
> +    extern const function_shape *const get_neonq;
> +    extern const function_shape *const set_neonq;
> +    extern const function_shape *const dup_neonq;
> +  }
> }
>
> #endif
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
> index af816c4c9e705d9cc4bce5cc50481cb27e6a03a7..46e65cc78b3cf7bb70344a856c8fdb481534f46c 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
> @@ -29,6 +29,7 @@
> #include "optabs.h"
> #include "aarch64-sve-builtins.h"
> #include "aarch64-sve-builtins-shapes.h"
> +#include "aarch64-builtins.h"
>
> /* In the comments below, _t0 represents the first type suffix and _t1
>    represents the second.
>    Square brackets enclose characters that are
> @@ -145,6 +146,76 @@ parse_element_type (const function_instance &instance, const char *&format)
>   gcc_unreachable ();
> }
>
> +int
> +parse_neon_type (type_suffix_index suffix, int ch)
> +{
> +  if (ch == 'Q')
> +    {
> +      switch (suffix)
> +	{
> +	case TYPE_SUFFIX_s8:
> +	  return Int8x16_t;
> +	case TYPE_SUFFIX_s16:
> +	  return Int16x8_t;
> +	case TYPE_SUFFIX_s32:
> +	  return Int32x4_t;
> +	case TYPE_SUFFIX_s64:
> +	  return Int64x2_t;
> +	case TYPE_SUFFIX_u8:
> +	  return Uint8x16_t;
> +	case TYPE_SUFFIX_u16:
> +	  return Uint16x8_t;
> +	case TYPE_SUFFIX_u32:
> +	  return Uint32x4_t;
> +	case TYPE_SUFFIX_u64:
> +	  return Uint64x2_t;
> +	case TYPE_SUFFIX_f16:
> +	  return Float16x8_t;
> +	case TYPE_SUFFIX_f32:
> +	  return Float32x4_t;
> +	case TYPE_SUFFIX_f64:
> +	  return Float64x2_t;
> +	case TYPE_SUFFIX_bf16:
> +	  return Bfloat16x8_t;
> +	default:
> +	  gcc_unreachable ();
> +	}
> +    }
> +  if (ch == 'D')
> +    {
> +      switch (suffix)
> +	{
> +	case TYPE_SUFFIX_s8:
> +	  return Int8x8_t;
> +	case TYPE_SUFFIX_s16:
> +	  return Int16x4_t;
> +	case TYPE_SUFFIX_s32:
> +	  return Int32x2_t;
> +	case TYPE_SUFFIX_s64:
> +	  return Int64x1_t;
> +	case TYPE_SUFFIX_u8:
> +	  return Uint8x8_t;
> +	case TYPE_SUFFIX_u16:
> +	  return Uint16x4_t;
> +	case TYPE_SUFFIX_u32:
> +	  return Uint32x2_t;
> +	case TYPE_SUFFIX_u64:
> +	  return Uint64x1_t;
> +	case TYPE_SUFFIX_f16:
> +	  return Float16x4_t;
> +	case TYPE_SUFFIX_f32:
> +	  return Float32x2_t;
> +	case TYPE_SUFFIX_f64:
> +	  return Float64x1_t;
> +	case TYPE_SUFFIX_bf16:
> +	  return Bfloat16x4_t;
> +	default:
> +	  gcc_unreachable ();
> +	}
> +    }
> +  gcc_unreachable ();
> +}

I think it'd be better to put this information in the type_suffix_info.
E.g. maybe we could add a DEF_SVE_NEON_TYPE macro that allows definitions
like:

DEF_SVE_NEON_TYPE_SUFFIX (s8, svint8_t, signed, 8, VNx16QImode,
			  Int8x8_t, Int8x16_t)

aarch64-sve-builtins.def could then provide a default definition that
forwards through DEF_SVE_TYPE_SUFFIX.  ARM_NEON_H_TYPES_LAST could be
used to initialise the new type_suffix_info fields for types that don't
have an Advanced SIMD equivalent (which is just predicates, AFAIK).
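The forwarding default would be a stub along these lines (a sketch only;
the macro name and parameter names are illustrative, matching the example
above):

#ifndef DEF_SVE_NEON_TYPE_SUFFIX
#define DEF_SVE_NEON_TYPE_SUFFIX(NAME, ACLE_TYPE, CLASS, BITS, MODE, \
				 NEON64, NEON128) \
  DEF_SVE_TYPE_SUFFIX (NAME, ACLE_TYPE, CLASS, BITS, MODE)
#endif

so that existing includers of aarch64-sve-builtins.def that only define
DEF_SVE_TYPE_SUFFIX are unaffected.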
> +
> /* Read and return a type from FORMAT for function INSTANCE.  Advance
>    FORMAT beyond the type string.  The format is:
>
>    s<elt>  - a scalar type with the given element suffix
>    t<elt>  - a vector or tuple type with given element suffix [*1]
>    v<elt>  - a vector with the given element suffix
> +  D<elt>  - a 64 bit neon vector
> +  Q<elt>  - a 128 bit neon vector
>
>    where <elt> has the format described above parse_element_type
>
> @@ -224,6 +297,13 @@ parse_type (const function_instance &instance, const char *&format)
>       return acle_vector_types[0][type_suffixes[suffix].vector_type];
>     }
>
> +  if (ch == 'Q' || ch == 'D')
> +    {
> +      type_suffix_index suffix = parse_element_type (instance, format);
> +      int neon_index = parse_neon_type (suffix, ch);
> +      return aarch64_simd_types[neon_index].itype;
> +    }
> +
>   gcc_unreachable ();
> }
>
> @@ -450,6 +530,12 @@ long_type_suffix (function_resolver &r, type_suffix_index type)
>   static CONSTEXPR const NAME##_def NAME##_obj; \
>   namespace shapes { const function_shape *const NAME = &NAME##_obj; }
>
> +/* Declare the function neon_sve_bridge_shape NAME, pointing it to an instance
> +   of class <NAME>_def.  */
> +#define NEON_SVE_BRIDGE_SHAPE(NAME) \
> +  static CONSTEXPR const NAME##_def NAME##_obj; \
> +  namespace neon_sve_bridge_shapes { const function_shape *const NAME = &NAME##_obj; }
> +

I don't think these shapes need to go in their own namespace.  The shapes
are there for whatever needs them.

(I agree it makes sense to use a separate namespace for the functions
though, to help distinguish functions that are defined by the arm_sve.h
pragma from those that are defined by the arm_neon_sve_bridge.h pragma.)

> /* Base class for functions that are not overloaded.  */
> struct nonoverloaded_base : public function_shape
> {
> @@ -1917,6 +2003,72 @@ struct get_def : public overloaded_base<0>
> };
> SHAPE (get)
>
> +/* <t0>xN_t svfoo[_t0](sv<t0>_t).  */
> +struct get_neonq_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group) const override
> +  {
> +    b.add_overloaded_functions (group, MODE_none);
> +    build_all (b, "Q0,v0", group, MODE_none);
> +  }
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +    unsigned int i, nargs;
> +    type_suffix_index type;
> +    if (!r.check_gp_argument (1, i, nargs)
> +	|| (type = r.infer_tuple_type (i)) == NUM_TYPE_SUFFIXES)
> +      return error_mark_node;
> +    return r.resolve_to (r.mode_suffix_id, type);
> +  }

I think this can just forward to r.resolve_unary.
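That is, just (untested):

  tree
  resolve (function_resolver &r) const override
  {
    return r.resolve_unary ();
  }

resolve_unary already does the argument checking and vector type
inference that this open-codes.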
> +};
> +NEON_SVE_BRIDGE_SHAPE (get_neonq)
> +
> +/* sv<t0>_t svfoo[_t0](sv<t0>_t, <t0>xN_t).  */
> +struct set_neonq_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group) const override
> +  {
> +    b.add_overloaded_functions (group, MODE_none);
> +    build_all (b, "v0,v0,Q0", group, MODE_none);
> +  }
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +    unsigned int i, nargs;
> +    type_suffix_index type;
> +    if (!r.check_gp_argument (1, i, nargs)
> +	|| (type = r.infer_tuple_type (i)) == NUM_TYPE_SUFFIXES)
> +      return error_mark_node;
> +    return r.resolve_to (r.mode_suffix_id, type);

This ought to check both arguments, which would then require some
inference for neon types.

> +  }
> +};
> +NEON_SVE_BRIDGE_SHAPE (set_neonq)
> +
> +/* sv<t0>_t svfoo[_t0](<t0>xN_t).  */
> +struct dup_neonq_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group) const override
> +  {
> +    b.add_overloaded_functions (group, MODE_none);
> +    build_all (b, "v0,Q0", group, MODE_none);
> +  }
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +    unsigned int i, nargs;
> +    type_suffix_index type;
> +    if (!r.check_gp_argument (1, i, nargs)
> +	|| (type = r.infer_tuple_type (i)) == NUM_TYPE_SUFFIXES)
> +      return error_mark_node;
> +    return r.resolve_to (r.mode_suffix_id, type);

I wouldn't expect this to work, since it's likely to expect an SVE
vector rather than an Advanced SIMD vector.

> +  }
> +};
> +NEON_SVE_BRIDGE_SHAPE (dup_neonq)
> +
> /* sv<t0>_t svfoo[_t0](sv<t0>_t, uint64_t)
>    <t0>_t svfoo[_n_t0](<t0>_t, uint64_t)
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 161a14edde7c9fb1b13b146cf50463e2d78db264..c994c83c5777e500ab2cf76ee2ed29dcebca074f 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -529,6 +529,13 @@ static CONSTEXPR const function_group_info function_groups[] = {
> #include "aarch64-sve-builtins.def"
> };
>
> +/* A list of all NEON-SVE-Bridge ACLE functions.  */
> +static CONSTEXPR const function_group_info neon_sve_function_groups[] = {
> +#define DEF_NEON_SVE_FUNCTION(NAME, SHAPE, TYPES, PREDS) \
> +  { #NAME, &neon_sve_bridge_functions::NAME, &neon_sve_bridge_shapes::SHAPE, types_##TYPES, preds_##PREDS },
> +#include "aarch64-neon-sve-bridge-builtins.def"
> +};
> +
> /* The scalar type associated with each vector type.  */
> extern GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
> tree scalar_types[NUM_VECTOR_TYPES];
> @@ -3560,6 +3567,20 @@ handle_arm_sve_h ()
>     builder.register_function_group (function_groups[i]);
> }
>
> +/* Implement #pragma GCC aarch64 "arm_sve.h".  */
> +void
> +handle_arm_neon_sve_bridge_h ()
> +{
> +

Nit: excess vertical space.

> +  sve_switcher sve;
> +
> +  /* Define the functions.  */
> +  function_table = new hash_table<registered_function_hasher> (1023);
> +  function_builder builder;
> +  for (unsigned int i = 0; i < ARRAY_SIZE (neon_sve_function_groups); ++i)
> +    builder.register_function_group (neon_sve_function_groups[i]);
> +}
> +
> /* Return the function decl with SVE function subcode CODE, or error_mark_node
>    if no such function exists.  */
> tree
> diff --git a/gcc/config/aarch64/arm_neon_sve_bridge.h b/gcc/config/aarch64/arm_neon_sve_bridge.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..8f526eae86b94f615d22fe8de52583bb403e102e
> --- /dev/null
> +++ b/gcc/config/aarch64/arm_neon_sve_bridge.h
> @@ -0,0 +1,38 @@
> +/* AArch64 NEON-SVE Bridge intrinsics include file.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published
> +   by the Free Software Foundation; either version 3, or (at your
> +   option) any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +   License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _ARM_NEON_SVE_BRIDGE_H_
> +#define _ARM_NEON_SVE_BRIDGE_H_
> +
> +#include <arm_neon.h>
> +#include <arm_sve.h>
> +
> +/* NOTE: This implementation of arm_neon_sve_bridge.h is intentionally short.  It does
> +   not define the types and intrinsic functions directly in C and C++
> +   code, but instead uses the following pragma to tell GCC to insert the
> +   necessary type and function definitions itself.  The net effect is the
> +   same, and the file is a complete implementation of arm_neon_sve_bridge.h.  */
> +#pragma GCC aarch64 "arm_neon_sve_bridge.h"
> +
> +#endif
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.c-torture/execute/neon-sve-bridge.c b/gcc/testsuite/gcc.c-torture/execute/neon-sve-bridge.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..45dbcf97a647f0842693dbe47eedb4264e7b61fd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/neon-sve-bridge.c
> @@ -0,0 +1,55 @@
> +// { dg-options "-march=armv8.2-a+sve" }
> +// { dg-do run { target aarch64_sve_hw } }
> +
> +#include <arm_neon_sve_bridge.h>
> +
> +extern void abort (void);
> +
> +int
> +svget_neonq_test ()
> +{
> +  int64_t val1 = 987654321;
> +  svint64_t sveInput = svdup_n_s64(val1);
> +  int64x2_t neonReturn = svget_neonq_s64 (sveInput);
> +  int64_t val1Return = vgetq_lane_s64(neonReturn, 1);
> +  if (val1 == val1Return)
> +    return 0;
> +  return 1;
> +}
> +
> +int
> +svset_neonq_test ()
> +{
> +  int64_t val2 = 123456789;
> +  int64x2_t NeonInput = vdupq_n_s64(val2);
> +  svint64_t sveReturn;
> +  sveReturn = svset_neonq_s64 (sveReturn, NeonInput);
> +  int64_t val2Return = svlasta_s64(svptrue_b64(), sveReturn);
> +  if (val2 == val2Return)
> +    return 0;
> +  return 1;
> +}
> +
> +int
> +svdup_neonq_test ()
> +{
> +  uint32_t val2 = 123456789;
> +  uint32x4_t NeonInput = vdupq_n_u32(val2);
> +  svuint32_t sveReturn = svdup_neonq_u32 (NeonInput);
> +  uint32_t val2Return = svlastb_u32(svptrue_b32(), sveReturn);
> +  if (val2 == val2Return)
> +    return 0;
> +  return 1;
> +}
> +
> +int
> +main ()
> +{
> +  if (svget_neonq_test() == 1)
> +    abort ();
> +  if (svset_neonq_test() == 1)
> +    abort ();
> +  if (svdup_neonq_test() == 1)
> +    abort ();
> +  return 0;
> +}
> \ No newline at end of file

This is a good test to have.  It should probably go in gcc.dg/torture
instead, though, since there don't seem to be any target-specific tests
in gcc.c-torture/execute.

Also, I think it'd be more robust to compare the full 128 bits, rather
than a duplicated element.
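E.g. something like this (a sketch, untested):

  int64_t expected[2] = { 111, 222 };
  /* Round-trip distinct lane values through the bridge...  */
  svint64_t sve = svset_neonq_s64 (svdup_n_s64 (0), vld1q_s64 (expected));
  int64x2_t back = svget_neonq_s64 (sve);
  /* ...and check every lane, not just one.  */
  if (vgetq_lane_s64 (back, 0) != expected[0]
      || vgetq_lane_s64 (back, 1) != expected[1])
    abort ();

Distinct values per lane would also catch lane-order bugs that a
duplicated element can't.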
In addition, it'd be good to have some tests for the resolver functions,
along the lines of those in gcc.target/aarch64/sve/acle/general-c.
There only needs to be one file per shape rather than one file per
function, but that amounts to the same thing in this case.  The tests
are named after the shape.

Also -- and this is going to be really tedious, sorry -- I think we need
to exercise every variation in both overloaded and non-overloaded form.
The way the SVE ACLE code currently does that is via
gcc.target/aarch64/sve/acle/asm.

Thanks,
Richard