public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: Re: [PATCH 8/8] aarch64: Add SVE support for simd clones [PR 96342]
Date: Fri, 1 Dec 2023 16:39:44 +0000	[thread overview]
Message-ID: <a1231b28-0b20-4003-8857-648b5bbd3faf@arm.com> (raw)
In-Reply-To: <mpt5y1kijdk.fsf@arm.com>



On 29/11/2023 17:01, Richard Sandiford wrote:
> "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
>> Rebased, no major changes, still needs review.
>>
>> On 30/08/2023 10:19, Andre Vieira (lists) via Gcc-patches wrote:
>>> This patch finalizes adding support for the generation of SVE simd
>>> clones when no simdlen is provided, following the ABI rules where the
>>> widest data type determines the minimum amount of elements in a length
>>> agnostic vector.
>>>
>>> gcc/ChangeLog:
>>>
>>>           * config/aarch64/aarch64-protos.h (add_sve_type_attribute):
>>> Declare.
>>>       * config/aarch64/aarch64-sve-builtins.cc (add_sve_type_attribute):
>>> Make
>>>       visibility global.
>>>       * config/aarch64/aarch64.cc (aarch64_fntype_abi): Ensure SVE ABI is
>>>       chosen over SIMD ABI if a SVE type is used in return or arguments.
>>>       (aarch64_simd_clone_compute_vecsize_and_simdlen): Create VLA simd
>>> clone
>>>       when no simdlen is provided, according to ABI rules.
>>>       (aarch64_simd_clone_adjust): Add '+sve' attribute to SVE simd clones.
>>>       (aarch64_simd_clone_adjust_ret_or_param): New.
>>>       (TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): Define.
>>>       * omp-simd-clone.cc (simd_clone_mangle): Print 'x' for VLA simdlen.
>>>       (simd_clone_adjust): Adapt safelen check to be compatible with VLA
>>>       simdlen.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>       * c-c++-common/gomp/declare-variant-14.c: Adapt aarch64 scan.
>>>       * gfortran.dg/gomp/declare-variant-14.f90: Likewise.
>>>       * gcc.target/aarch64/declare-simd-1.c: Remove warning checks where no
>>>       longer necessary.
>>>       * gcc.target/aarch64/declare-simd-2.c: Add SVE clone scan.
>>
>> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
>> index 60a55f4bc1956786ea687fc7cad7ec9e4a84e1f0..769d637f63724a7f0044f48f3dd683e0fb46049c 100644
>> --- a/gcc/config/aarch64/aarch64-protos.h
>> +++ b/gcc/config/aarch64/aarch64-protos.h
>> @@ -1005,6 +1005,8 @@ namespace aarch64_sve {
>>   #ifdef GCC_TARGET_H
>>     bool verify_type_context (location_t, type_context_kind, const_tree, bool);
>>   #endif
>> + void add_sve_type_attribute (tree, unsigned int, unsigned int,
>> +			      const char *, const char *);
>>   }
>>   
>>   extern void aarch64_split_combinev16qi (rtx operands[3]);
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> index 161a14edde7c9fb1b13b146cf50463e2d78db264..6f99c438d10daa91b7e3b623c995489f1a8a0f4c 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> @@ -569,14 +569,16 @@ static bool reported_missing_registers_p;
>>   /* Record that TYPE is an ABI-defined SVE type that contains NUM_ZR SVE vectors
>>      and NUM_PR SVE predicates.  MANGLED_NAME, if nonnull, is the ABI-defined
>>      mangling of the type.  ACLE_NAME is the <arm_sve.h> name of the type.  */
>> -static void
>> +void
>>   add_sve_type_attribute (tree type, unsigned int num_zr, unsigned int num_pr,
>>   			const char *mangled_name, const char *acle_name)
>>   {
>>     tree mangled_name_tree
>>       = (mangled_name ? get_identifier (mangled_name) : NULL_TREE);
>> +  tree acle_name_tree
>> +    = (acle_name ? get_identifier (acle_name) : NULL_TREE);
>>   
>> -  tree value = tree_cons (NULL_TREE, get_identifier (acle_name), NULL_TREE);
>> +  tree value = tree_cons (NULL_TREE, acle_name_tree, NULL_TREE);
>>     value = tree_cons (NULL_TREE, mangled_name_tree, value);
>>     value = tree_cons (NULL_TREE, size_int (num_pr), value);
>>     value = tree_cons (NULL_TREE, size_int (num_zr), value);
>> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> index 37507f091c2a6154fa944c3a9fad6a655ab5d5a1..cb0947b18c6a611d55579b5b08d93f6a4a9c3b2c 100644
>> --- a/gcc/config/aarch64/aarch64.cc
>> +++ b/gcc/config/aarch64/aarch64.cc
>> @@ -4080,13 +4080,13 @@ aarch64_takes_arguments_in_sve_regs_p (const_tree fntype)
>>   static const predefined_function_abi &
>>   aarch64_fntype_abi (const_tree fntype)
>>   {
>> -  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)))
>> -    return aarch64_simd_abi ();
>> -
>>     if (aarch64_returns_value_in_sve_regs_p (fntype)
>>         || aarch64_takes_arguments_in_sve_regs_p (fntype))
>>       return aarch64_sve_abi ();
>>   
>> +  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)))
>> +    return aarch64_simd_abi ();
>> +
>>     return default_function_abi;
>>   }
>>   
> 
> I think we discussed this off-list later, but the change above shouldn't
> be necessary.  aarch64_vector_pcs must not be attached to SVE PCS functions,
> so the two cases should be mutually exclusive.

Yeah I had made the changes locally, but not updated the patch yet.
> 
>> @@ -27467,7 +27467,7 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>>   					int num, bool explicit_p)
>>   {
>>     tree t, ret_type;
>> -  unsigned int nds_elt_bits;
>> +  unsigned int nds_elt_bits, wds_elt_bits;
>>     int count;
>>     unsigned HOST_WIDE_INT const_simdlen;
>>   
>> @@ -27513,10 +27513,14 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>>     if (TREE_CODE (ret_type) != VOID_TYPE)
>>       {
>>         nds_elt_bits = lane_size (SIMD_CLONE_ARG_TYPE_VECTOR, ret_type);
>> +      wds_elt_bits = nds_elt_bits;
>>         vec_elts.safe_push (std::make_pair (ret_type, nds_elt_bits));
>>       }
>>     else
>> -    nds_elt_bits = POINTER_SIZE;
>> +    {
>> +      nds_elt_bits = POINTER_SIZE;
>> +      wds_elt_bits = 0;
>> +    }
>>   
>>     int i;
>>     tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
>> @@ -27524,30 +27528,36 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>>     for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0;
>>          t && t != void_list_node; t = TREE_CHAIN (t), i++)
>>       {
>> -      tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
>> +      tree type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
> 
> Not sure renaming arg_type is worth it.  The original was probably
> more descriptive.
> 
>>         if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM
>> -	  && !supported_simd_type (arg_type))
>> +	  && !supported_simd_type (type))
>>   	{
>>   	  if (!explicit_p)
>>   	    ;
>> -	  else if (COMPLEX_FLOAT_TYPE_P (ret_type))
>> +	  else if (COMPLEX_FLOAT_TYPE_P (type))
>>   	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
>>   			"GCC does not currently support argument type %qT "
>> -			"for simd", arg_type);
>> +			"for simd", type);
>>   	  else
>>   	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
>>   			"unsupported argument type %qT for simd",
>> -			arg_type);
>> +			type);
>>   	  return 0;
>>   	}
>> -      unsigned lane_bits = lane_size (clonei->args[i].arg_type, arg_type);
>> +      unsigned lane_bits = lane_size (clonei->args[i].arg_type, type);
>>         if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
>> -	vec_elts.safe_push (std::make_pair (arg_type, lane_bits));
>> +	vec_elts.safe_push (std::make_pair (type, lane_bits));
>>         if (nds_elt_bits > lane_bits)
>>   	nds_elt_bits = lane_bits;
>> +      else if (wds_elt_bits < lane_bits)
>> +	wds_elt_bits = lane_bits;
>>       }
>>   
>> -  clonei->vecsize_mangle = 'n';
>> +  /* If we could not determine the WDS type from available parameters/return,
>> +     then fallback to using uintptr_t.  */
>> +  if (wds_elt_bits == 0)
>> +    wds_elt_bits = POINTER_SIZE;
>> +
>>     clonei->mask_mode = VOIDmode;
>>     poly_uint64 simdlen;
>>     auto_vec<poly_uint64> simdlens (2);
>> @@ -27558,6 +27568,11 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>>         simdlen = exact_div (poly_uint64 (64), nds_elt_bits);
>>         simdlens.safe_push (simdlen);
>>         simdlens.safe_push (simdlen * 2);
>> +      /* Only create a SVE simd clone if we aren't dealing with an unprototyped
>> +	 function.  */
>> +      if (DECL_ARGUMENTS (node->decl) != 0
>> +	  || type_arg_types != 0)
>> +	simdlens.safe_push (exact_div (poly_uint64 (128, 128), wds_elt_bits));
> 
> This check feels a bit indirect.  Does it work to use:
> 
>    if (prototype_p (TREE_TYPE (node->decl)))
> 
> instead?
> 
>>       }
>>     else
>>       simdlens.safe_push (clonei->simdlen);
>> @@ -27578,19 +27593,20 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>>     while (j < count && !simdlens.is_empty ())
>>       {
>>         bool remove_simdlen = false;
>> -      for (auto elt : vec_elts)
>> -	if (known_gt (simdlens[j] * elt.second, 128U))
>> -	  {
>> -	    /* Don't issue a warning for every simdclone when there is no
>> -	       specific simdlen clause.  */
>> -	    if (explicit_p && known_ne (clonei->simdlen, 0U))
>> -	      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
>> -			  "GCC does not currently support simdlen %wd for "
>> -			  "type %qT",
>> -			  constant_lower_bound (simdlens[j]), elt.first);
>> -	    remove_simdlen = true;
>> -	    break;
>> -	  }
>> +      if (simdlens[j].is_constant ())
>> +	for (auto elt : vec_elts)
>> +	  if (known_gt (simdlens[j] * elt.second, 128U))
>> +	    {
>> +	      /* Don't issue a warning for every simdclone when there is no
>> +		 specific simdlen clause.  */
>> +	      if (explicit_p && known_ne (clonei->simdlen, 0U))
>> +		warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
>> +			    "GCC does not currently support simdlen %wd for "
>> +			    "type %qT",
>> +			    constant_lower_bound (simdlens[j]), elt.first);
>> +	      remove_simdlen = true;
>> +	      break;
>> +	    }
>>         if (remove_simdlen)
>>   	{
>>   	  count--;
>> @@ -27618,9 +27634,36 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>>   
>>     gcc_assert (num < count);
>>     clonei->simdlen = simdlens[num];
>> +  if (clonei->simdlen.is_constant ())
>> +    clonei->vecsize_mangle = 'n';
>> +  else
>> +    {
>> +      clonei->vecsize_mangle = 's';
>> +      clonei->inbranch = 1;
>> +    }
>>     return count;
>>   }
>>   
>> +static tree
>> +simd_clone_adjust_sve_vector_type (tree type, bool is_mask, poly_uint64 simdlen)
>> +{
>> +    unsigned int num_zr = 0;
> 
> Nits: missing function comment.  The body is indented by too many columns.
> 
>> +    unsigned int num_pr = 0;
>> +    type = TREE_TYPE (type);
>> +    type = build_vector_type (type, simdlen);
> 
> Is simdlen ever different from the original TYPE_VECTOR_SUBPARTS?
> I think a comment is needed if so.

Not right now, but I'm not sure why you are asking. So the reason why I 
say not right now is because we don't support multi-{register,argument} 
vector mappings, so as soon as simdlen means it wouldn't fit in a single 
register we reject, that's for both Advanced SIMD and SVE. If we would 
want to, then simdlen could be larger than type's TYPE_VECTOR_SUBPARTS.


>> +
>>   /* Implement TARGET_SIMD_CLONE_ADJUST.  */
>>   
>>   static void
>> @@ -27632,6 +27675,69 @@ aarch64_simd_clone_adjust (struct cgraph_node *node)
>>     tree t = TREE_TYPE (node->decl);
>>     TYPE_ATTRIBUTES (t) = make_attribute ("aarch64_vector_pcs", "default",
>>   					TYPE_ATTRIBUTES (t));
>> +
>> +  cl_target_option cur_target;
>> +  poly_uint16 old_sve_vg;
>> +  bool m_old_have_regs_of_mode[MAX_MACHINE_MODE];
>> +
>> +  if (node->simdclone->vecsize_mangle == 's')
>> +    {
>> +      tree target = build_string (strlen ("+sve"), "+sve");
>> +      aarch64_option_valid_attribute_p (node->decl, NULL_TREE, target, 0);
> 
> It would be good to assert that this succeeds.  It's unfortunate that we
> have a predicate with side-effects, but that's obviously not your fault. :)
Scared to think of how it wouldn't...
> 
>> +      cl_target_option_save (&cur_target, &global_options, &global_options_set);
>> +      tree new_target = DECL_FUNCTION_SPECIFIC_TARGET (node->decl);
>> +      cl_target_option_restore (&global_options, &global_options_set,
>> +				TREE_TARGET_OPTION (new_target));
>> +      aarch64_override_options_internal (&global_options);
> 
> Does this preserve any existing target attributes too, to the extent
> possible?  E.g. if the function has:
> 
>    #pragma GCC target "+sve2"
> 
> then I think we should honour that rather than dial down to "+sve".
> Same if the code is compiled with -march=armv9-a+sve2: we should
> compile the clone as SVE2 rather than SVE.

Yes it adds to it, so for sve2 this actually has no effect, because 
obviously sve is already enabled.
> 
>> +      memcpy (m_old_have_regs_of_mode, have_regs_of_mode,
>> +	      sizeof (have_regs_of_mode));
>> +      for (int i = 0; i < NUM_MACHINE_MODES; ++i)
>> +	if (aarch64_sve_mode_p ((machine_mode) i))
>> +	  have_regs_of_mode[i] = true;
> 
> Would it be possible to use push_cfun and pop_cfun instead, so that
> we do a proper target switch?

Where would I get the new cfun from? I'll go have a bit of a look around 
see if I can make snse of what this would do.
> 
> (The SVE ACLE code uses a similar technique to the above, but that's
> because it's run in a non-function context.)
> 
>> +      old_sve_vg = aarch64_sve_vg;
>> +      if (!node->simdclone->simdlen.is_constant ())
>> +	aarch64_sve_vg = poly_uint16 (2, 2);
> 
> I'm not sure we should change VG here.
Agree. I forgot about the decision to accept that -msve-vector-bits 
would influence the VG of simdclones. I'll also make the changes to the 
code to compute simdlen.

> 
> The basis for that and for allowing SVE2 above is that the ODR requires
> that all comdat instances of a function are compiled in the same way.
> That applies to all functions, not just simd clones.  So IMO it's user
> error if a clone is compiled multiple times with different target options,
> or if it's compiled with target options that the runtime target doesn't in
> fact support.
> 
> We're already implicitly assuming the same thing for Advanced SIMD,
> since we'll use whatever post-Armv8-A features happen to be enabled.
> 
Agree.


  reply	other threads:[~2023-12-01 16:39 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-30  8:49 aarch64, vect, omp: " Andre Vieira (lists)
2023-08-30  9:06 ` [PATCH 1/8] parloops: Copy target and optimizations when creating a function clone Andre Vieira (lists)
2023-08-30 12:31   ` Richard Biener
2023-10-18 14:40     ` Andre Vieira (lists)
2023-08-30  9:08 ` [Patch 2/8] parloops: Allow poly nit and bound Andre Vieira (lists)
2023-08-30 12:32   ` Richard Biener
2023-10-18 14:40     ` Andre Vieira (lists)
2023-08-30  9:10 ` [Patch 3/8] vect: Fix vect_get_smallest_scalar_type for simd clones Andre Vieira (lists)
2023-08-30 12:54   ` Richard Biener
2023-10-18 14:40     ` Andre Vieira (lists)
2023-10-19 12:07       ` Richard Biener
2023-08-30  9:11 ` [PATCH 4/8] vect: don't allow fully masked loops with non-masked simd clones [PR 110485] Andre Vieira (lists)
2023-08-30 12:54   ` Richard Biener
2023-10-18 14:40     ` Andre Vieira (lists)
2023-10-19 12:06       ` Richard Biener
2023-08-30  9:13 ` [PATCH 5/8] vect: Use inbranch simdclones in masked loops Andre Vieira (lists)
2023-10-18 14:41   ` Andre Vieira (lists)
2023-10-19 12:17     ` Richard Biener
2023-08-30  9:14 ` [PATCH 6/8] vect: Add vector_mode paramater to simd_clone_usable Andre Vieira (lists)
2023-08-30  9:17   ` Andre Vieira (lists)
2023-08-30 13:01   ` Richard Biener
2023-08-30 15:02     ` Andre Vieira (lists)
2023-08-31  6:39       ` Richard Biener
2023-09-28 15:57         ` Andre Vieira (lists)
2023-08-30  9:17 ` [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM Andre Vieira (lists)
2023-08-30 13:04   ` Richard Biener
2023-10-04 10:32     ` Andre Vieira (lists)
2023-10-04 10:41       ` Richard Biener
2023-10-04 12:40         ` Andre Vieira (lists)
2023-10-18 14:41           ` [PATCH6/8] omp: Reorder call for TARGET_SIMD_CLONE_ADJUST (was Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM) Andre Vieira (lists)
2023-10-30 18:34             ` Andre Vieira (lists)
2023-10-31  7:59             ` Richard Biener
2023-12-08 10:35               ` Jakub Jelinek
2023-08-30  9:19 ` [PATCH 8/8] aarch64: Add SVE support for simd clones [PR 96342] Andre Vieira (lists)
2023-10-18 14:41   ` Andre Vieira (lists)
2023-11-29 17:01     ` Richard Sandiford
2023-12-01 16:39       ` Andre Vieira (lists) [this message]
2023-10-18 14:40 ` aarch64, vect, omp: " Andre Vieira (lists)
2023-10-18 14:41 ` [PATCH 0/8] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS Andre Vieira (lists)
2023-10-19  7:10   ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a1231b28-0b20-4003-8857-648b5bbd3faf@arm.com \
    --to=andre.simoesdiasvieira@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).