From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 346613858C2C for ; Fri, 1 Dec 2023 16:39:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 346613858C2C Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 346613858C2C Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701448796; cv=none; b=KO3hYphModzY9cz2bTs1cGK/5U87bmN8zHl/dcliJR5weyYQvgSuW5UuzouQ/FCqsv/Jwjwt8EA9TFWP+9pF2ma6UrOHRoxXi7paBsw8zLEJhQydAThbEe45KoZqGHFS7g2UU9BNdppgsUkLjXyrkUvF0Q2cMkU7t7VUqCnqpEM= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701448796; c=relaxed/simple; bh=tsuhyHa+BlBI40/1XQnHOL+IQFNVnf4sBRljXU/e81I=; h=Message-ID:Date:MIME-Version:Subject:To:From; b=BcTwPma8Ovz7X34mfZSol7BRw9FRU8pKkG9TgGb82FhvB85iK1KQ2Ly31l5sMp0C4Av1GpbPkyPzdUCGAVMYI/KViCFjBo/eu2IO4ntjq27JWCAZmHqByF2fCsHOobRK60tAwBb9HWnNnBB0SHFyS0FN9WVBbokivXoV+HKx+TQ= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DB6661007; Fri, 1 Dec 2023 08:40:38 -0800 (PST) Received: from [10.57.72.109] (unknown [10.57.72.109]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id ED55A3F6C4; Fri, 1 Dec 2023 08:39:51 -0800 (PST) Message-ID: Date: Fri, 1 Dec 2023 16:39:44 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH 8/8] aarch64: Add SVE support for simd clones [PR 96342] To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com References: <73b53052-c3a4-4028-2836-ade419431eda@arm.com> <25cccf6c-1b3c-a032-7930-aba25a311dca@arm.com> Content-Language: en-US From: "Andre Vieira (lists)" In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00,BODY_8BITS,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_LOTSOFHASH,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 29/11/2023 17:01, Richard Sandiford wrote: > "Andre Vieira (lists)" writes: >> Rebased, no major changes, still needs review. >> >> On 30/08/2023 10:19, Andre Vieira (lists) via Gcc-patches wrote: >>> This patch finalizes adding support for the generation of SVE simd >>> clones when no simdlen is provided, following the ABI rules where the >>> widest data type determines the minimum amount of elements in a length >>> agnostic vector. >>> >>> gcc/ChangeLog: >>> >>>         * config/aarch64/aarch64-protos.h (add_sve_type_attribute): >>> Declare. >>>     * config/aarch64/aarch64-sve-builtins.cc (add_sve_type_attribute): >>> Make >>>     visibility global. >>>     * config/aarch64/aarch64.cc (aarch64_fntype_abi): Ensure SVE ABI is >>>     chosen over SIMD ABI if a SVE type is used in return or arguments. >>>     (aarch64_simd_clone_compute_vecsize_and_simdlen): Create VLA simd >>> clone >>>     when no simdlen is provided, according to ABI rules. >>>     (aarch64_simd_clone_adjust): Add '+sve' attribute to SVE simd clones. >>>     (aarch64_simd_clone_adjust_ret_or_param): New. >>>     (TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): Define. >>>     * omp-simd-clone.cc (simd_clone_mangle): Print 'x' for VLA simdlen. >>>     (simd_clone_adjust): Adapt safelen check to be compatible with VLA >>>     simdlen. >>> >>> gcc/testsuite/ChangeLog: >>> >>>     * c-c++-common/gomp/declare-variant-14.c: Adapt aarch64 scan. >>>     * gfortran.dg/gomp/declare-variant-14.f90: Likewise. >>>     * gcc.target/aarch64/declare-simd-1.c: Remove warning checks where no >>>     longer necessary. >>>     * gcc.target/aarch64/declare-simd-2.c: Add SVE clone scan. >> >> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h >> index 60a55f4bc1956786ea687fc7cad7ec9e4a84e1f0..769d637f63724a7f0044f48f3dd683e0fb46049c 100644 >> --- a/gcc/config/aarch64/aarch64-protos.h >> +++ b/gcc/config/aarch64/aarch64-protos.h >> @@ -1005,6 +1005,8 @@ namespace aarch64_sve { >> #ifdef GCC_TARGET_H >> bool verify_type_context (location_t, type_context_kind, const_tree, bool); >> #endif >> + void add_sve_type_attribute (tree, unsigned int, unsigned int, >> + const char *, const char *); >> } >> >> extern void aarch64_split_combinev16qi (rtx operands[3]); >> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc >> index 161a14edde7c9fb1b13b146cf50463e2d78db264..6f99c438d10daa91b7e3b623c995489f1a8a0f4c 100644 >> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc >> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc >> @@ -569,14 +569,16 @@ static bool reported_missing_registers_p; >> /* Record that TYPE is an ABI-defined SVE type that contains NUM_ZR SVE vectors >> and NUM_PR SVE predicates. MANGLED_NAME, if nonnull, is the ABI-defined >> mangling of the type. ACLE_NAME is the name of the type. */ >> -static void >> +void >> add_sve_type_attribute (tree type, unsigned int num_zr, unsigned int num_pr, >> const char *mangled_name, const char *acle_name) >> { >> tree mangled_name_tree >> = (mangled_name ? get_identifier (mangled_name) : NULL_TREE); >> + tree acle_name_tree >> + = (acle_name ? get_identifier (acle_name) : NULL_TREE); >> >> - tree value = tree_cons (NULL_TREE, get_identifier (acle_name), NULL_TREE); >> + tree value = tree_cons (NULL_TREE, acle_name_tree, NULL_TREE); >> value = tree_cons (NULL_TREE, mangled_name_tree, value); >> value = tree_cons (NULL_TREE, size_int (num_pr), value); >> value = tree_cons (NULL_TREE, size_int (num_zr), value); >> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc >> index 37507f091c2a6154fa944c3a9fad6a655ab5d5a1..cb0947b18c6a611d55579b5b08d93f6a4a9c3b2c 100644 >> --- a/gcc/config/aarch64/aarch64.cc >> +++ b/gcc/config/aarch64/aarch64.cc >> @@ -4080,13 +4080,13 @@ aarch64_takes_arguments_in_sve_regs_p (const_tree fntype) >> static const predefined_function_abi & >> aarch64_fntype_abi (const_tree fntype) >> { >> - if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype))) >> - return aarch64_simd_abi (); >> - >> if (aarch64_returns_value_in_sve_regs_p (fntype) >> || aarch64_takes_arguments_in_sve_regs_p (fntype)) >> return aarch64_sve_abi (); >> >> + if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype))) >> + return aarch64_simd_abi (); >> + >> return default_function_abi; >> } >> > > I think we discussed this off-list later, but the change above shouldn't > be necessary. aarch64_vector_pcs must not be attached to SVE PCS functions, > so the two cases should be mutually exclusive. Yeah I had made the changes locally, but not updated the patch yet. > >> @@ -27467,7 +27467,7 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, >> int num, bool explicit_p) >> { >> tree t, ret_type; >> - unsigned int nds_elt_bits; >> + unsigned int nds_elt_bits, wds_elt_bits; >> int count; >> unsigned HOST_WIDE_INT const_simdlen; >> >> @@ -27513,10 +27513,14 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, >> if (TREE_CODE (ret_type) != VOID_TYPE) >> { >> nds_elt_bits = lane_size (SIMD_CLONE_ARG_TYPE_VECTOR, ret_type); >> + wds_elt_bits = nds_elt_bits; >> vec_elts.safe_push (std::make_pair (ret_type, nds_elt_bits)); >> } >> else >> - nds_elt_bits = POINTER_SIZE; >> + { >> + nds_elt_bits = POINTER_SIZE; >> + wds_elt_bits = 0; >> + } >> >> int i; >> tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl)); >> @@ -27524,30 +27528,36 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, >> for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0; >> t && t != void_list_node; t = TREE_CHAIN (t), i++) >> { >> - tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t); >> + tree type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t); > > Not sure renaming arg_type is worth it. The original was probably > more descriptive. > >> if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM >> - && !supported_simd_type (arg_type)) >> + && !supported_simd_type (type)) >> { >> if (!explicit_p) >> ; >> - else if (COMPLEX_FLOAT_TYPE_P (ret_type)) >> + else if (COMPLEX_FLOAT_TYPE_P (type)) >> warning_at (DECL_SOURCE_LOCATION (node->decl), 0, >> "GCC does not currently support argument type %qT " >> - "for simd", arg_type); >> + "for simd", type); >> else >> warning_at (DECL_SOURCE_LOCATION (node->decl), 0, >> "unsupported argument type %qT for simd", >> - arg_type); >> + type); >> return 0; >> } >> - unsigned lane_bits = lane_size (clonei->args[i].arg_type, arg_type); >> + unsigned lane_bits = lane_size (clonei->args[i].arg_type, type); >> if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR) >> - vec_elts.safe_push (std::make_pair (arg_type, lane_bits)); >> + vec_elts.safe_push (std::make_pair (type, lane_bits)); >> if (nds_elt_bits > lane_bits) >> nds_elt_bits = lane_bits; >> + else if (wds_elt_bits < lane_bits) >> + wds_elt_bits = lane_bits; >> } >> >> - clonei->vecsize_mangle = 'n'; >> + /* If we could not determine the WDS type from available parameters/return, >> + then fallback to using uintptr_t. */ >> + if (wds_elt_bits == 0) >> + wds_elt_bits = POINTER_SIZE; >> + >> clonei->mask_mode = VOIDmode; >> poly_uint64 simdlen; >> auto_vec simdlens (2); >> @@ -27558,6 +27568,11 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, >> simdlen = exact_div (poly_uint64 (64), nds_elt_bits); >> simdlens.safe_push (simdlen); >> simdlens.safe_push (simdlen * 2); >> + /* Only create a SVE simd clone if we aren't dealing with an unprototyped >> + function. */ >> + if (DECL_ARGUMENTS (node->decl) != 0 >> + || type_arg_types != 0) >> + simdlens.safe_push (exact_div (poly_uint64 (128, 128), wds_elt_bits)); > > This check feels a bit indirect. Does it work to use: > > if (prototype_p (TREE_TYPE (node->decl))) > > instead? > >> } >> else >> simdlens.safe_push (clonei->simdlen); >> @@ -27578,19 +27593,20 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, >> while (j < count && !simdlens.is_empty ()) >> { >> bool remove_simdlen = false; >> - for (auto elt : vec_elts) >> - if (known_gt (simdlens[j] * elt.second, 128U)) >> - { >> - /* Don't issue a warning for every simdclone when there is no >> - specific simdlen clause. */ >> - if (explicit_p && known_ne (clonei->simdlen, 0U)) >> - warning_at (DECL_SOURCE_LOCATION (node->decl), 0, >> - "GCC does not currently support simdlen %wd for " >> - "type %qT", >> - constant_lower_bound (simdlens[j]), elt.first); >> - remove_simdlen = true; >> - break; >> - } >> + if (simdlens[j].is_constant ()) >> + for (auto elt : vec_elts) >> + if (known_gt (simdlens[j] * elt.second, 128U)) >> + { >> + /* Don't issue a warning for every simdclone when there is no >> + specific simdlen clause. */ >> + if (explicit_p && known_ne (clonei->simdlen, 0U)) >> + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, >> + "GCC does not currently support simdlen %wd for " >> + "type %qT", >> + constant_lower_bound (simdlens[j]), elt.first); >> + remove_simdlen = true; >> + break; >> + } >> if (remove_simdlen) >> { >> count--; >> @@ -27618,9 +27634,36 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, >> >> gcc_assert (num < count); >> clonei->simdlen = simdlens[num]; >> + if (clonei->simdlen.is_constant ()) >> + clonei->vecsize_mangle = 'n'; >> + else >> + { >> + clonei->vecsize_mangle = 's'; >> + clonei->inbranch = 1; >> + } >> return count; >> } >> >> +static tree >> +simd_clone_adjust_sve_vector_type (tree type, bool is_mask, poly_uint64 simdlen) >> +{ >> + unsigned int num_zr = 0; > > Nits: missing function comment. The body is indented by too many columns. > >> + unsigned int num_pr = 0; >> + type = TREE_TYPE (type); >> + type = build_vector_type (type, simdlen); > > Is simdlen ever different from the original TYPE_VECTOR_SUBPARTS? > I think a comment is needed if so. Not right now, but I'm not sure why you are asking. So the reason why I say not right now is because we don't support multi-{register,argument} vector mappings, so as soon as simdlen means it wouldn't fit in a single register we reject, that's for both Advanced SIMD and SVE. If we would want to, then simdlen could be larger than type's TYPE_VECTOR_SUBPARTS. >> + >> /* Implement TARGET_SIMD_CLONE_ADJUST. */ >> >> static void >> @@ -27632,6 +27675,69 @@ aarch64_simd_clone_adjust (struct cgraph_node *node) >> tree t = TREE_TYPE (node->decl); >> TYPE_ATTRIBUTES (t) = make_attribute ("aarch64_vector_pcs", "default", >> TYPE_ATTRIBUTES (t)); >> + >> + cl_target_option cur_target; >> + poly_uint16 old_sve_vg; >> + bool m_old_have_regs_of_mode[MAX_MACHINE_MODE]; >> + >> + if (node->simdclone->vecsize_mangle == 's') >> + { >> + tree target = build_string (strlen ("+sve"), "+sve"); >> + aarch64_option_valid_attribute_p (node->decl, NULL_TREE, target, 0); > > It would be good to assert that this succeeds. It's unfortunate that we > have a predicate with side-effects, but that's obviously not your fault. :) Scared to think of how it wouldn't... > >> + cl_target_option_save (&cur_target, &global_options, &global_options_set); >> + tree new_target = DECL_FUNCTION_SPECIFIC_TARGET (node->decl); >> + cl_target_option_restore (&global_options, &global_options_set, >> + TREE_TARGET_OPTION (new_target)); >> + aarch64_override_options_internal (&global_options); > > Does this preserve any existing target attributes too, to the extent > possible? E.g. if the function has: > > #pragma GCC target "+sve2" > > then I think we should honour that rather than dial down to "+sve". > Same if the code is compiled with -march=armv9-a+sve2: we should > compile the clone as SVE2 rather than SVE. Yes it adds to it, so for sve2 this actually has no effect, because obviously sve is already enabled. > >> + memcpy (m_old_have_regs_of_mode, have_regs_of_mode, >> + sizeof (have_regs_of_mode)); >> + for (int i = 0; i < NUM_MACHINE_MODES; ++i) >> + if (aarch64_sve_mode_p ((machine_mode) i)) >> + have_regs_of_mode[i] = true; > > Would it be possible to use push_cfun and pop_cfun instead, so that > we do a proper target switch? Where would I get the new cfun from? I'll go have a bit of a look around see if I can make snse of what this would do. > > (The SVE ACLE code uses a similar technique to the above, but that's > because it's run in a non-function context.) > >> + old_sve_vg = aarch64_sve_vg; >> + if (!node->simdclone->simdlen.is_constant ()) >> + aarch64_sve_vg = poly_uint16 (2, 2); > > I'm not sure we should change VG here. Agree. I forgot about the decision to accept that -msve-vector-bits would influence the VG of simdclones. I'll also make the changes to the code to compute simdlen. > > The basis for that and for allowing SVE2 above is that the ODR requires > that all comdat instances of a function are compiled in the same way. > That applies to all functions, not just simd clones. So IMO it's user > error if a clone is compiled multiple times with different target options, > or if it's compiled with target options that the runtime target doesn't in > fact support. > > We're already implicitly assuming the same thing for Advanced SIMD, > since we'll use whatever post-Armv8-A features happen to be enabled. > Agree.