From: Richard Sandiford
To: Richard Ball
Cc: gcc-patches@gcc.gnu.org, Richard Earnshaw, Marcus Shawcroft,
 Kyrylo Tkachov
Subject: Re: [PATCH v4] aarch64: SVE/NEON Bridging intrinsics
Date: Wed, 13 Dec 2023 14:38:17 +0000
In-Reply-To: <3caeab7f-c38f-4640-bc51-d8245c05c860@arm.com> (Richard Ball's
 message of "Mon, 11 Dec 2023 15:13:03 +0000")

Richard Ball writes:
> ACLE has added intrinsics to bridge between SVE and Neon.
>
> The NEON_SVE Bridge adds intrinsics that allow conversions between NEON
> and SVE vectors.
>
> This patch adds support to GCC for the following 3 intrinsics:
> svset_neonq, svget_neonq and svdup_neonq.
>
> gcc/ChangeLog:
>
>         * config.gcc: Adds new header to config.
>         * config/aarch64/aarch64-builtins.cc (enum aarch64_type_qualifiers):
>         Moved to header file.
>         (ENTRY): Likewise.
>         (enum aarch64_simd_type): Likewise.
>         (struct aarch64_simd_type_info): Remove static.
>         (GTY): Likewise.
>         * config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64):
>         Defines pragma for arm_neon_sve_bridge.h.
>         * config/aarch64/aarch64-sve-builtins-base.h: New intrinsics.
>         * config/aarch64/aarch64-sve-builtins-base.cc
>         (class svget_neonq_impl): New intrinsic implementation.
>         (class svset_neonq_impl): Likewise.
>         (class svdup_neonq_impl): Likewise.
>         (NEON_SVE_BRIDGE_FUNCTION): New intrinsics.
>         * config/aarch64/aarch64-sve-builtins-functions.h
>         (NEON_SVE_BRIDGE_FUNCTION): Defines macro for NEON_SVE_BRIDGE
>         functions.
>         * config/aarch64/aarch64-sve-builtins-shapes.h: New shapes.
>         * config/aarch64/aarch64-sve-builtins-shapes.cc
>         (parse_element_type): Add NEON element types.
>         (parse_type): Likewise.
>         (struct get_neonq_def): Defines function shape for get_neonq.
>         (struct set_neonq_def): Defines function shape for set_neonq.
>         (struct dup_neonq_def): Defines function shape for dup_neonq.
>         * config/aarch64/aarch64-sve-builtins.cc
>         (DEF_SVE_TYPE_SUFFIX): Changed to be called through SVE_NEON macro.
>         (DEF_SVE_NEON_TYPE_SUFFIX): Defines macro for NEON_SVE_BRIDGE
>         type suffixes.
>         (DEF_NEON_SVE_FUNCTION): Defines macro for NEON_SVE_BRIDGE functions.
>         (function_resolver::infer_neon128_vector_type): Infers type suffix
>         for overloaded functions.
>         (init_neon_sve_builtins): Initialise neon_sve_bridge_builtins for LTO.
>         (handle_arm_neon_sve_bridge_h): Handles #pragma arm_neon_sve_bridge.h.
>         * config/aarch64/aarch64-sve-builtins.def
>         (DEF_SVE_NEON_TYPE_SUFFIX): Macro for handling neon_sve type suffixes.
>         (bf16): Replace entry with neon-sve entry.
>         (f16): Likewise.
>         (f32): Likewise.
>         (f64): Likewise.
>         (s8): Likewise.
>         (s16): Likewise.
>         (s32): Likewise.
>         (s64): Likewise.
>         (u8): Likewise.
>         (u16): Likewise.
>         (u32): Likewise.
>         (u64): Likewise.
>         * config/aarch64/aarch64-sve-builtins.h
>         (GCC_AARCH64_SVE_BUILTINS_H): Include aarch64-builtins.h.
>         (ENTRY): Add aarch64_simd_type definition.
>         (enum aarch64_simd_type): Add neon information to type_suffix_info.
>         (struct type_suffix_info): New function.
>         * config/aarch64/aarch64-sve.md
>         (@aarch64_sve_get_neonq_): New intrinsic insn for big endian.
>         (@aarch64_sve_set_neonq_): Likewise.
>         * config/aarch64/aarch64.cc
>         (aarch64_init_builtins): Add call to init_neon_sve_builtins.
>         * config/aarch64/iterators.md: Add UNSPEC_SET_NEONQ.
>         * config/aarch64/aarch64-builtins.h: New file.
>         * config/aarch64/aarch64-neon-sve-bridge-builtins.def: New file.
>         * config/aarch64/arm_neon_sve_bridge.h: New file.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Add include of
>         the arm_neon_sve_bridge header file.
>         * gcc.dg/torture/neon-sve-bridge.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/dup_neonq_bf16.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/dup_neonq_f16.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/dup_neonq_f32.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/dup_neonq_f64.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/dup_neonq_s16.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/dup_neonq_s32.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/dup_neonq_s64.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/dup_neonq_s8.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/dup_neonq_u16.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/dup_neonq_u32.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/dup_neonq_u64.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/dup_neonq_u8.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/get_neonq_bf16.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/get_neonq_f16.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/get_neonq_f32.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/get_neonq_f64.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/get_neonq_s16.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/get_neonq_s32.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/get_neonq_s64.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/get_neonq_s8.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/get_neonq_u16.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/get_neonq_u32.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/get_neonq_u64.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/get_neonq_u8.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/set_neonq_bf16.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/set_neonq_f16.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/set_neonq_f32.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/set_neonq_f64.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/set_neonq_s16.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/set_neonq_s32.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/set_neonq_s64.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/set_neonq_s8.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/set_neonq_u16.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/set_neonq_u32.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/set_neonq_u64.c: New test.
>         * gcc.target/aarch64/sve/acle/asm/set_neonq_u8.c: New test.
>         * gcc.target/aarch64/sve/acle/general-c/dup_neonq_1.c: New test.
>         * gcc.target/aarch64/sve/acle/general-c/get_neonq_1.c: New test.
>         * gcc.target/aarch64/sve/acle/general-c/set_neonq_1.c: New test.

Thanks, looks great.  OK for trunk with the trivial changes below.
No need to repost unless you want to.

> [...]
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 4e5a88aa03a994e42f6b2528c44547410390b26c..a7766a7d468cdeccdde8907632b3f702969f4bd7 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -44,6 +44,7 @@
>  #include "aarch64-sve-builtins-shapes.h"
>  #include "aarch64-sve-builtins-base.h"
>  #include "aarch64-sve-builtins-functions.h"
> +#include "aarch64-builtins.h"
>  #include "ssa.h"
>  #include "gimple-fold.h"
>
> @@ -1099,6 +1100,116 @@ public:
>    }
>  };
>
> +class svget_neonq_impl : public function_base
> +{
> +public:
> +  gimple *
> +  fold (gimple_folder &f) const override
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      return NULL;
> +    tree rhs_sve_vector = gimple_call_arg (f.call, 0);
> +    tree rhs_vector = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs),
> +                              rhs_sve_vector, bitsize_int (128), bitsize_int (0));
> +    return gimple_build_assign (f.lhs, rhs_vector);
> +  }
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      {
> +        machine_mode mode = e.vector_mode (0);
> +        insn_code icode = code_for_aarch64_sve_get_neonq (mode);
> +        unsigned int nunits = 128 / GET_MODE_UNIT_BITSIZE (mode);
> +        rtx indices = aarch64_gen_stepped_int_parallel
> +          (nunits, nunits - 1 , -1);

Formatting nit, should be:

          (nunits, nunits - 1, -1);

(with no space before the comma)

> +
> +        e.add_output_operand (icode);
> +        e.add_input_operand (icode, e.args[0]);
> +        e.add_fixed_operand (indices);
> +        return e.generate_insn (icode);
> +      }
> +    return simplify_gen_subreg (e.result_mode (), e.args[0],
> +                                GET_MODE (e.args[0]), 0);
> +  }
> +};
> +
> +class svset_neonq_impl : public function_base
> +{
> +public:
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    machine_mode mode = e.vector_mode (0);
> +    rtx_vector_builder builder (VNx16BImode, 16, 2);
> +    for (unsigned int i = 0; i < 16; i++)
> +      builder.quick_push (CONST1_RTX (BImode));
> +    for (unsigned int i = 0; i < 16; i++)
> +      builder.quick_push (CONST0_RTX (BImode));
> +    e.args.quick_push (builder.build ());
> +    if (BYTES_BIG_ENDIAN)
> +      return e.use_exact_insn (code_for_aarch64_sve_set_neonq (mode));
> +    insn_code icode = code_for_vcond_mask (mode, mode);
> +    e.args[1] = lowpart_subreg (mode, e.args[1], GET_MODE (e.args[1]));
> +    e.add_output_operand (icode);
> +    e.add_input_operand (icode, e.args[1]);
> +    e.add_input_operand (icode, e.args[0]);
> +    e.add_input_operand (icode, e.args[2]);
> +    return e.generate_insn (icode);
> +  }
> +};
> +
> +class svdup_neonq_impl : public function_base
> +{
> +public:
> +  gimple *
> +  fold (gimple_folder &f) const override
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      {
> +        return NULL;
> +      }

Formatting nit, should just be:

    if (BYTES_BIG_ENDIAN)
      return NULL;

without the braces.

> +    tree rhs_vector = gimple_call_arg (f.call, 0);
> +    unsigned HOST_WIDE_INT neon_nelts
> +      = TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs_vector)).to_constant ();
> +    poly_uint64 sve_nelts;
> +    sve_nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (f.lhs));

It doesn't seem necessary to split this over two lines.  Seems simpler as:

    poly_uint64 sve_nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (f.lhs));

> +    vec_perm_builder builder (sve_nelts, neon_nelts, 1);
> +    for (unsigned int i = 0; i < neon_nelts; i++)
> +      {
> +        builder.quick_push (i);
> +      }

Formatting nit, should be:

    for (unsigned int i = 0; i < neon_nelts; i++)
      builder.quick_push (i);

> +    vec_perm_indices indices (builder, 1, neon_nelts);
> +    tree perm_type = build_vector_type (ssizetype, sve_nelts);
> +    return gimple_build_assign (f.lhs, VEC_PERM_EXPR,
> +                                rhs_vector,
> +                                rhs_vector,
> +                                vec_perm_indices_to_tree (perm_type, indices));
> +  }
> +  rtx

Formatting nit, missing blank line between functions.

> +  expand (function_expander &e) const override
> +  {
> +    insn_code icode;

Very minor, but there doesn't seem any need to forward-declare this
variable.  Can just be:

> +    machine_mode mode = e.vector_mode (0);
> +    if (BYTES_BIG_ENDIAN)
> +      {
> +        icode = code_for_aarch64_vec_duplicate_vq_be (mode);

        insn_code icode = code_for_aarch64_vec_duplicate_vq_be (mode);

here and similarly below.

> +        unsigned int nunits = 128 / GET_MODE_UNIT_BITSIZE (mode);
> +        rtx indices = aarch64_gen_stepped_int_parallel
> +          (nunits, nunits - 1 , -1);

Should be:

          (nunits, nunits - 1, -1);

> +
> +        e.add_output_operand (icode);
> +        e.add_input_operand (icode, e.args[0]);
> +        e.add_fixed_operand (indices);
> +        return e.generate_insn (icode);
> +      }
> +    icode = code_for_aarch64_vec_duplicate_vq_le (mode);
> +    e.add_output_operand (icode);
> +    e.add_input_operand (icode, e.args[0]);
> +    return e.generate_insn (icode);
> +  }
> +};
> +
>  class svindex_impl : public function_base
>  {
>  public:
> @@ -3122,5 +3233,8 @@ FUNCTION (svzip1q, unspec_based_function, (UNSPEC_ZIP1Q, UNSPEC_ZIP1Q,
>  FUNCTION (svzip2, svzip_impl, (1))
>  FUNCTION (svzip2q, unspec_based_function, (UNSPEC_ZIP2Q, UNSPEC_ZIP2Q,
>                                             UNSPEC_ZIP2Q))
> +NEON_SVE_BRIDGE_FUNCTION (svget_neonq, svget_neonq_impl,)
> +NEON_SVE_BRIDGE_FUNCTION (svset_neonq, svset_neonq_impl,)
> +NEON_SVE_BRIDGE_FUNCTION (svdup_neonq, svdup_neonq_impl,)
>
>  } /* end namespace aarch64_sve */
> [...]
> @@ -2092,6 +2109,35 @@ function_resolver::infer_integer_vector_type (unsigned int argno)
>    return type;
>  }
>
> +/* Require argument ARGNO to have some form of NEON128 vector type.  Return the
> +   associated type suffix on success.
> +   Report an error and return NUM_TYPE_SUFFIXES on failure.  */
> +type_suffix_index
> +function_resolver::infer_neon128_vector_type (unsigned int argno)
> +{
> +  tree actual = get_argument_type (argno);
> +  if (actual == error_mark_node)
> +    return NUM_TYPE_SUFFIXES;
> +
> +  for (unsigned int suffix_i = 0; suffix_i < NUM_TYPE_SUFFIXES; ++suffix_i)
> +    {
> +      int neon_index = type_suffixes[suffix_i].neon128_type;
> +      if (neon_index != ARM_NEON_H_TYPES_LAST)
> +        {
> +          tree type = aarch64_simd_types[neon_index].itype;
> +          if (type && matches_type_p (type, actual))
> +            {
> +              return type_suffix_index (suffix_i);
> +            }

Formatting, should be:

          if (type && matches_type_p (type, actual))
            return type_suffix_index (suffix_i);

> +        }
> +    }
> +
> +  error_at (location, "passing %qT to argument %d of %qE, which"
> +            " expects a 128 bit NEON vector type", actual, argno + 1, fndecl);
> +  return NUM_TYPE_SUFFIXES;
> +}
> +
> +
>  /* Like infer_vector_type, but also require the type to be an unsigned
>     integer.  */
>  type_suffix_index
> @@ -4457,6 +4503,14 @@ init_builtins ()
>      }
>  }
>
> +/* Initialize the SVE-NEON Bridge at start-up, if LTO is required.  */
> +void
> +init_neon_sve_builtins ()
> +{
> +  if (in_lto_p)
> +    handle_arm_neon_sve_bridge_h ();
> +}
> +

I think the existing in_lto_p if statement in init_builtins should do
this, rather than adding a new "external" function that
aarch64_init_builtins has to call.

Thanks,
Richard
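
P.S. To make that last point concrete, the idea is roughly the following
(just a sketch, assuming the existing LTO block in init_builtins only
calls handle_arm_sve_h at the moment):

    /* In aarch64-sve-builtins.cc:init_builtins, alongside the existing
       LTO handling, instead of a separate init_neon_sve_builtins entry
       point called from aarch64_init_builtins.  */
    if (in_lto_p)
      {
        handle_arm_sve_h ();
        handle_arm_neon_sve_bridge_h ();
      }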
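
FWIW, for anyone reading the archive, the user-facing side of the bridge
looks roughly like this (untested sketch; the _s32 spellings follow the
ACLE naming for the bridge intrinsics, and the example assumes SVE is
enabled at compile time):

    #include <arm_neon.h>
    #include <arm_sve.h>
    #include <arm_neon_sve_bridge.h>

    /* Use plain NEON arithmetic on the low 128 bits of an SVE vector
       and write the result back into the SVE vector.  */
    svint32_t
    bump_low_lanes (svint32_t sv)
    {
      int32x4_t lo = svget_neonq_s32 (sv);   /* low 128 bits as int32x4_t */
      lo = vaddq_s32 (lo, vdupq_n_s32 (1));  /* ordinary NEON operation */
      return svset_neonq_s32 (sv, lo);       /* replace the low 128 bits */
    }

    /* Broadcast a 128-bit NEON vector to fill a whole SVE vector.  */
    svint32_t
    splat_quad (int32x4_t v)
    {
      return svdup_neonq_s32 (v);
    }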