[Aarch64] Vector Function Application Binary Interface Specification for OpenMP

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
@ 2017-03-15  9:50 Sekhar, Ashwin
  2017-03-17 14:02 ` James Greenhalgh
  0 siblings, 1 reply; 18+ messages in thread
From: Sekhar, Ashwin @ 2017-03-15  9:50 UTC (permalink / raw)
  To: gcc; +Cc: richard.earnshaw, marcus.shawcroft, james.greenhalgh

Hi GCC Team, Aarch64 Maintainers,

The rules in Vector Function Application Binary Interface Specification  for OpenMP  (https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt)  is used in x86 for generating the simd clones of a function.

Is there a similar one defined for Aarch64?

If not, would like to start a discussion on the same for Aarch64. To  kick start the same, a draft proposal for Aarch64 (on the same lines as  x86 ABI) is included below. The only change from x86 ABI is in the  function name mangling. Here the letter 'b' is used for indicating the  ASIMD isa.

Please review and comment.

Thanks and Regards,

Ashwin Sekhar T K

------------------------------------ CUT HERE ----------------------------------

================================================================================
 Aarch64 Vector Function Application Binary Interface Specification for OpenMP
================================================================================

1. Vector Function ABI Overview

Aarch64 Vector Function ABI provides ABI for the vector functions generated by
compiler supporting SIMD constructs of OpenMP 4.0 [1] in Aarch64. This is
based on the x86 Vector Function Application Binary Interface Specification for
OpenMP [2].

================================================================================

2. Vector Function ABI

Vector Function ABI defines a set of rules that the caller and the callee
functions must obey.

These rules consist of:
  * Calling convention
  * Vector length (the number of concurrent scalar invocations to be processed
    per invocation of the vector function)
  * Mapping from element data types to vector data types
  * Ordering of vector arguments
  * Vector function masking
  * Vector function name mangling
  * Compiler generated variants of vector function

--------------------------------------------------------------------------------

2.1. Calling Convention

The vector functions should use calling convention described in Procedure Call
Standard for the ARM 64-bit Architecture (AArch64) [3].

--------------------------------------------------------------------------------

2.2. Vector Length

Every vector variant of a SIMD-enabled function has a vector length (VLEN). If
OpenMP clause "simdlen" is used, the VLEN is the value of the argument of that
clause. The VLEN value must be power of 2. In other case the notion of the
function`s "characteristic data type" (CDT) is used to compute the vector
length.

CDT is defined in the following order:
  a) For non-void function, the CDT is the return type.
  b) If the function has any non-uniform, non-linear parameters, then the CDT
     is the type of the first such parameter.
  c) If the CDT determined by a) or b) above is struct, union, or class type
     which is pass-by-value (except for the type that maps to the built-in
     complex data type), the characteristic data type is int.
  d) If none of the above three cases is applicable, the CDT is int.

VLEN  = sizeof(vector_register) / sizeof(CDT),

For example, if ISA is ASIMD, sizeof(vector_register) = 16, as the vector
registers are 128 bit. And if the CDT of the function is "int", sizeof(CDT) = 4.
So, VLEN = 4.

--------------------------------------------------------------------------------

2.3. Element Data Type to Vector Data Type Mapping

The vector data types for parameters are selected depending on ISA, vector
length, data type of original parameter, and parameter specification.

For uniform and linear parameters (detailed description could be found in [1]),
the original data type is preserved.

For vector parameters, vector data types are selected by the compiler. The
mapping from element data type to vector data type is described as below.

  * The bit size of vector data type of parameter is computed as:

    size_of_vector_data_type = VLEN * sizeof(original_parameter_data_type) * 8

    For instance, for ASIMD version of vector function with parameter data type
    "int": If VLEN = 4, size_of_vector_data_type = 4 * 4 * 8 = 128 (bits), which
    means one argument of type __m128 to be passed.

  * If the size_of_vector_data_type is greater than the width of the vector
    register, multiple vector registers are selected and the parameter will be
    passed in multiple vector registers.

    For instance, for ASIMD version of vector function with parameter data type
    "int":

    If VLEN = 8, size_of_vector_data_type = 8 * 4 * 8 = 256 (bits), so the
    vector data type is __m256, which means 2 arguments of type __m128 are to
    be passed.

--------------------------------------------------------------------------------

2.4. Ordering of Vector Arguments

  * When a parameter in the original data type results in one argument in the
    vector function, the ordering rule is a simple one to one match with the
    original argument order.

    For example, when the original  argument list is (int a, float b, int c),
    VLEN is 4, the ISA is ASIMD, and all a, b, and c are classified  vector
    parameters, the vector function argument list becomes (__m128i vec_a,
    __m128 vec_b, __m128i vec_c).

  * There are cases where a single parameter in the original data type results
    in the multiple arguments in the vector function. Those addition second and
    subsequent arguments are inserted in the argument list right after the
    corresponding first argument, not appended to the end of the argument list
    of the vector function.

    For example, the original argument list is (int a, float
    b, int c), VLEN is 8, the ISA is ASIMD, and all a, b, and c are classified
    as vector parameters, the vector function argument list becomes
    (__m128i vec_a1, __m128i vec_a2, __m128 vec_b1, __m128 vec_b2,
    __m128i vec_c1, __m128i vec_c2).

--------------------------------------------------------------------------------

2.5. Masking of Vector Function

Masked vector function variant used for invocation in conditional statement
(please refer to [1] for detailed information) additionally takes an implicit
mask argument, which disables processing of some of the vector lanes. For
masked vector functions, the additional "mask" parameters are required.

Each element of "mask" parameters has the data type of the CDT (see Section
2.2). The number of mask parameters is the same as number of parameters
required to pass the vector of CDT for the given vector length. The value of a
mask parameter must be either bit patterns of all ones or all zeros for each
element.

For each element of the vector, if the corresponding mask value is zero, the
return value associated to that element is zero. Mask parameters are passed
after all other parameters in the same order of parameters that they are apply
to.

--------------------------------------------------------------------------------

2.6. Vector Function Name Mangling

The name mangling of the generated vector function based on vector annotation
is important part of Vector ABI. It allows the caller and the callee functions
to be compiled in separate files or compilation units. Using the function
prototype in header files to communicate vector function annotation
information, the compiler can perform function matching while vectorizing code
at call sites.

The vector function name is mangled as the concatenation of the following items:

<vector_prefix> <isa> <mask> <vlen> <vparameters> '_' <original_name>

The descriptions of each item are:
  * <vector_prefix>
      string "_ZGV"

  * <original_name>
      name of scalar function, including C++ and Fortran mangling

  * <isa>
      'b'    // ASIMD

  * <mask>
      'M'    // masked version
      | 'N'  // unmasked version

  * <vlen>
      decimal-number

  * <vparameters>
      /* empty */
      <vparameter> <opt-align> <vparameters>
          o <vparameter>
          (please refer to [1] for information about parameter types used below)
              's' decimal-number // linear parameter, variable stride ,
                                 // decimal number is the position # of
                                 // stride argument, which starts from 0
              | 'l' <number>     // linear parameter, constant stride
              | 'u'              // uniform parameter  
              | 'v'              // vector parameter
                 o <number>
                     [n] non-negative decimal integer  // n indicates negative
          o <opt-align>  
              /* empty */
              | 'a' non-negative-decimal-number

Please refer to section 2.7 Compiler generated variants of vector function for
examples of vector function mangling.

--------------------------------------------------------------------------------

2.7. Compiler generated variants of vector function

Compiler's architecture selection flag has no impact on ISA selection for the
generated vector variants.

Vector variants should be generated by compiler for each ISA for both masked and
unmasked versions for each ISA (if one of them is not specified with according
clause). Compiler implementations must not generate calls to version of other
ISAs unless some non-standard pragma or clause is used to declare those other
versions are available.

Example 1.
#pragma omp declare simd uniform(q) aligned(q:16) linear(k:1)
float foo(float *q, float x, int k)
{
    q[k] = q[k] + x;
    return q[k];
}

Below is the list of generated function names or list of symbols provided by
library with the same pragma in "foo" prototype.

1) _ZGVbN4ua16vl_foo (ASIMD ISA, unmasked version)
2) _ZGVbM4ua16vl_foo (ASIMD ISA, masked version)

Where the "foo" is the original mangled function name, "_ZGV" is the prefix of
the vector function name, "b" indicates the ASIMD ISA, "N" indicates that this
is a unmasked version, "M" indicates that this is a masked version, "4" is the
vector length for ASIMD ISA, "ua16" indicates uniform(q) and align(a:32), "v"
indicates second argument x is vector argument, "l" indicates linear(k:1) - k
is a linear variable whose stride is 1.

Example 2.
#pragma omp declare simd notinbranch
double foo(double x)
{
    return x*x;
}

Below is the list of generated function names or list of symbols provided by
library with the same pragma in "foo" prototype.

1) _ZGVbN2v_foo (ASIMD ISA, unmasked version)

Where the "foo" is the original mangled function name, "_ZGV" is the prefix of
the vector function name, "b" indicates the ASIMD ISA, "N" indicates that this
is a unmasked version, "2" is the vector length for ASIMD ISA, "v" indicates
single argument x is vector argument.

================================================================================

3. References

[1] OpenMP 4.0 Specification
http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf

[2] Vector Function Application Binary Interface Specification for OpenMP (x86)
https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt

[3] Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
  2017-03-15  9:50 [Aarch64] Vector Function Application Binary Interface Specification for OpenMP Sekhar, Ashwin
@ 2017-03-17 14:02 ` James Greenhalgh
  2017-03-20  4:30   ` Sekhar, Ashwin
  0 siblings, 1 reply; 18+ messages in thread
From: James Greenhalgh @ 2017-03-17 14:02 UTC (permalink / raw)
  To: Sekhar, Ashwin; +Cc: gcc, richard.earnshaw, marcus.shawcroft, nd

On Wed, Mar 15, 2017 at 09:50:18AM +0000, Sekhar, Ashwin wrote:
> Hi GCC Team, Aarch64 Maintainers,
> 
> 
> The rules in Vector Function Application Binary Interface Specification  for
> OpenMP
> (https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt)
> is used in x86 for generating the simd clones of a function.
> 
> Is there a similar one defined for Aarch64?
> 
> If not, would like to start a discussion on the same for Aarch64. To  kick
> start the same, a draft proposal for Aarch64 (on the same lines as  x86 ABI)
> is included below. The only change from x86 ABI is in the  function name
> mangling. Here the letter 'b' is used for indicating the  ASIMD isa.

Hi Ashwin,

Thanks for the question. ARM has defined a vector function ABI, based
on the Vector Function ABI Specification you linked below, which
is designed to be suitable for both the Advanced SIMD and Scalable
Vector Extensions. There has not yet been a release of this document
which I can point you at, nor can I give you an estimate of when the
document will be published.

However, Francesco Petrogalli has recently made a proposal to the
LLVM mailing list ( https://reviews.llvm.org/D30739 ) which I would
note conflicts with your proposal in one way. You choose 'b' for name
mangling for a vector function using Advanced SIMD, while Francesco
uses 'n', which is the agreed character in the Vector Function ABI
Specification we have been working on.

I'd encourage you to wait for formal publication of the ARM Vector
Function ABI to prevent any unexpected divergence between
implementations.

Thanks,
James

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
  2017-03-17 14:02 ` James Greenhalgh
@ 2017-03-20  4:30   ` Sekhar, Ashwin
  0 siblings, 0 replies; 18+ messages in thread
From: Sekhar, Ashwin @ 2017-03-20  4:30 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc, richard.earnshaw, marcus.shawcroft, nd

On Friday 17 March 2017 07:31 PM, James Greenhalgh wrote:
> On Wed, Mar 15, 2017 at 09:50:18AM +0000, Sekhar, Ashwin wrote:
>> Hi GCC Team, Aarch64 Maintainers,
>>
>>
>> The rules in Vector Function Application Binary Interface Specification  for
>> OpenMP
>> (https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt)
>> is used in x86 for generating the simd clones of a function.
>>
>> Is there a similar one defined for Aarch64?
>>
>> If not, would like to start a discussion on the same for Aarch64. To  kick
>> start the same, a draft proposal for Aarch64 (on the same lines as  x86 ABI)
>> is included below. The only change from x86 ABI is in the  function name
>> mangling. Here the letter 'b' is used for indicating the  ASIMD isa.
>
> Hi Ashwin,
>
> Thanks for the question. ARM has defined a vector function ABI, based
> on the Vector Function ABI Specification you linked below, which
> is designed to be suitable for both the Advanced SIMD and Scalable
> Vector Extensions. There has not yet been a release of this document
> which I can point you at, nor can I give you an estimate of when the
> document will be published.
>
> However, Francesco Petrogalli has recently made a proposal to the
> LLVM mailing list ( https://reviews.llvm.org/D30739 ) which I would
> note conflicts with your proposal in one way. You choose 'b' for name
> mangling for a vector function using Advanced SIMD, while Francesco
> uses 'n', which is the agreed character in the Vector Function ABI
> Specification we have been working on.
>
> I'd encourage you to wait for formal publication of the ARM Vector
> Function ABI to prevent any unexpected divergence between
> implementations.
Thanks for the information. We at Cavium are also working on libraries 
which requires this ABI specification. So we would like to see this 
published as early as possible.

>
> Thanks,
> James
>
>
Thanks
Ashwin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
  2018-05-16 16:21   ` Steve Ellcey
  2018-05-16 16:30     ` Richard Earnshaw (lists)
@ 2018-07-02 18:16     ` Francesco Petrogalli
  1 sibling, 0 replies; 18+ messages in thread
From: Francesco Petrogalli @ 2018-07-02 18:16 UTC (permalink / raw)
  To: sellcey
  Cc: James Greenhalgh, Sekhar, Ashwin, gcc, Richard Earnshaw,
	Marcus Shawcroft, nd

Dear all,

I just want to let you know that we just published the final version of the Vector Function ABI specification. The call-clobbered and call-preserved lists of register has been updated (see section 2.1) .

The document is located at the same address:

https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi

Kind regards,

Francesco

> On May 16, 2018, at 11:21 AM, Steve Ellcey <sellcey@cavium.com> wrote:
> 
> On Tue, 2018-05-15 at 18:29 +0000, Francesco Petrogalli wrote:
> 
>> Hi Steve,
>> 
>> I am happy to let you know that the Vector Function ABI for AArch64
>> is now public and available via the link at [1].
>> 
>> Don’t hesitate to contact me in case you have any questions.
>> 
>> Kind regards,
>> 
>> Francesco
>> 
>> [1] https://developer.arm.com/products/software-development-tools/hpc
>> /arm-compiler-for-hpc/vector-function-abi
>> 
>>> 
>>> Steve Ellcey
>>> sellcey@cavium.com
> 
> Thanks for publishing this Francesco, it looks like the main issue for
> GCC is that the Vector Function ABI has different caller saved / callee
> saved register conventions than the standard ARM calling convention.
> 
> If I understand things correctly, in the standard calling convention
> the callee will only save the bottom 64 bits of V8-V15 and so the
> caller needs to save those registers if it is using the top half.  In
> the Vector calling convention the callee will save all 128 bits of
> these registers (and possibly more registers) so the caller does not
> have to save these registers at all, even if it is using all 128 bits
> of them.
> 
> It doesn't look like GCC has any existing mechanism for having different
> sets of caller saved/callee saved registers depending on the function
> attributes of the calling or called function.
> 
> Changing what registers a callee function saves and restores shouldn't
> be too difficult since that can be done when generating the prologue
> and epilogue code but changing what registers a caller saves/restores
> when doing the call seems trickier.  The macro
> TARGET_HARD_REGNO_CALL_PART_CLOBBERED doesn't know anything about the
> function being called.  It returns true/false depending on just the
> register number and mode.
> 
> Steve Ellcey
> sellcey@cavium.com


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
  2018-05-31 10:39                   ` Alan Hayward
@ 2018-06-12  3:11                     ` Jeff Law
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Law @ 2018-06-12  3:11 UTC (permalink / raw)
  To: Alan Hayward, Richard Sandiford
  Cc: Steve Ellcey, Richard Earnshaw, Francesco Petrogalli,
	James Greenhalgh, Sekhar, Ashwin, gcc, Marcus Shawcroft, nd

On 05/31/2018 04:39 AM, Alan Hayward wrote:
> (Missed this thread initially due to incorrect email address)
Sorry.  Good to hear your're still interested in figuring this out.

> 
>> On 29 May 2018, at 11:05, Richard Sandiford <richard.sandiford@linaro.org> wrote:
>>
>> Jeff Law <law@redhat.com> writes:
>>> Now that we're in stage1 I do want to revisit the CLOBBER_HIGH stuff.
>>> When we left things I think we were trying to decide between
>>> CLOBBER_HIGH and clobbering the appropriate subreg.  The problem with
>>> the latter is the dataflow we compute is inaccurate (overly pessimistic)
>>> so that'd have to be fixed.
> 
> Yes, I want to get back to looking at this again, however Iâ€™ve been busy
> elsewhere.
Similarly.

> 
>>
>> The clobbered part of the register in this case is a high-part subreg,
>> which is ill-formed for single registers.  It would also be difficult
>> to represent in terms of the mode, since there are no defined modes for
>> what can be stored in the high part of an SVE register.  For 128-bit
>> SVE that mode would have zero bits. :-)
>>
>> I thought the alternative suggestion was instead to have:
>>
>>   (set (reg:M X) (reg:M X))
>>
>> when X is preserved in mode M but not in wider modes.  But that seems
>> like too much of a special case to me, both in terms of the source and
>> the destination:
> 
> Agreed. When I looked at doing it that way back in Jan, my conclusion was
> that if we did it that way we end up with more or less the same code but
> instead of:
> 
> if (GET_CODE (setter) == CLOBBER_HIGH
>    && reg_is_clobbered_by_clobber_high(REGNO(dest), GET_MODE (rsp->last_set_value))
> 
> Now becomes something like:
> 
> if (GET_CODE (setter) == SET
>    && REG_P (dest) && HARD_REGISTER_P (dest) && REG_P (src) && REGNO(dst) == REGNO(src)
>    && reg_is_clobbered_by_self_set(REGNO(dest), GET_MODE (rsp->last_set_value))
> 
> Ok, some of that code can go into a macro, but it feel much clearer to
> explicitly check for CLOBBER_HIGH rather then applying some special semantics
> to a specific SET case.
Then let's return to the CLOBBER_HIGH approach.  The hope was that most
of the places where you had to introduce CLOBBER_HIGH would "just work"
with the self-set approach.  If that's not the case, then there's really
nothing to be gained with self-set.

I suggest you get the patch updated for the trunk and repost now that
we're in broad agreement that self-set is a rathole.

jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
  2018-05-29 10:06                 ` Richard Sandiford
  2018-05-31 10:39                   ` Alan Hayward
@ 2018-06-11 23:06                   ` Jeff Law
  1 sibling, 0 replies; 18+ messages in thread
From: Jeff Law @ 2018-06-11 23:06 UTC (permalink / raw)
  To: Steve Ellcey, Alan.Hayward, Richard Earnshaw (lists),
	Francesco Petrogalli, James Greenhalgh, Sekhar, Ashwin, gcc,
	Marcus Shawcroft, nd, richard.sandiford

On 05/29/2018 04:05 AM, Richard Sandiford wrote:
> Jeff Law <law@redhat.com> writes:
>> Now that we're in stage1 I do want to revisit the CLOBBER_HIGH stuff.
>> When we left things I think we were trying to decide between
>> CLOBBER_HIGH and clobbering the appropriate subreg.  The problem with
>> the latter is the dataflow we compute is inaccurate (overly pessimistic)
>> so that'd have to be fixed.
> 
> The clobbered part of the register in this case is a high-part subreg,
> which is ill-formed for single registers.  It would also be difficult
> to represent in terms of the mode, since there are no defined modes for
> what can be stored in the high part of an SVE register.  For 128-bit
> SVE that mode would have zero bits. :-)
> 
> I thought the alternative suggestion was instead to have:
> 
>    (set (reg:M X) (reg:M X))
You're right.  I mis-remembered.  IT happens far too often these days.

> 
> when X is preserved in mode M but not in wider modes.  But that seems
> like too much of a special case to me, both in terms of the source and
> the destination:
Well, the hope was this would "just work" without having to introduce a
new RTX code and teach all the RTL passes about it.  The self-assignment
has the right semantics, but I believe Alan showed that the DF
infrastructure pessimized it horribly.  At which point the question
became how painful would it be to fix DF and compare that to the pain of
adding a new RTX code.




> 
> - On the destination side, a SET normally provides something for later
>   instructions to use, whereas here the effect is intended to be the
>   opposite: the instruction has no effect at all on a value of mode M
>   in X.  As you say, this would pessimise df without specific handling.
>   But I think all optimisations that look for the definition of a value
>   would need to be taught to "look through" this set to find the real
>   definition of (reg:M X) (or any value of a mode no larger than M in X).
>   Very few passes use the df def-uses chains for this due its high cost.
But how often do we really need to look for the REG in a large mode than
M?  Yea, it happens occasionally, but I don't think it's pervasive and
the cases where we do probably aren't *that* important performance-wise.

Though at a conceptual level I agree.  SET is meant to provide something
for later consumption, we'd be abusing it.


> 
>   More fundamentally, it should be possible in RTL to express an
>   instruction J that *does* read X in mode M and clobbers its high part.
>   If we use the SET above to represent the clobber, and treat the rhs use
>   as special, then presumably J would need two uses of X, one "dummy" one
>   on the no-op SET and one "real" one on some other SET (or perhaps in a
>   top-level USE).  Having the number of uses determine this seems
>   a bit awkward.
> 
> IMO CLOBBER and SET have different semantics for good reason: CLOBBER
> represents an optimisation barrier for things that care about the value
> of a certain rtx object, while SET represents a productive effect or
> side-effect.  The effect we want here is the same as a normal clobber,
> except that the clobber is mode-dependent.
I largely agree.  It was really a matter of whether or not using the
self-set would simplify the implementation in a significant way.

jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
  2018-05-29 10:06                 ` Richard Sandiford
@ 2018-05-31 10:39                   ` Alan Hayward
  2018-06-12  3:11                     ` Jeff Law
  2018-06-11 23:06                   ` Jeff Law
  1 sibling, 1 reply; 18+ messages in thread
From: Alan Hayward @ 2018-05-31 10:39 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Jeff Law, Steve Ellcey, Richard Earnshaw, Francesco Petrogalli,
	James Greenhalgh, Sekhar, Ashwin, gcc, Marcus Shawcroft, nd

(Missed this thread initially due to incorrect email address)

> On 29 May 2018, at 11:05, Richard Sandiford <richard.sandiford@linaro.org> wrote:
> 
> Jeff Law <law@redhat.com> writes:
>> Now that we're in stage1 I do want to revisit the CLOBBER_HIGH stuff.
>> When we left things I think we were trying to decide between
>> CLOBBER_HIGH and clobbering the appropriate subreg.  The problem with
>> the latter is the dataflow we compute is inaccurate (overly pessimistic)
>> so that'd have to be fixed.

Yes, I want to get back to looking at this again, however I’ve been busy
elsewhere.

> 
> The clobbered part of the register in this case is a high-part subreg,
> which is ill-formed for single registers.  It would also be difficult
> to represent in terms of the mode, since there are no defined modes for
> what can be stored in the high part of an SVE register.  For 128-bit
> SVE that mode would have zero bits. :-)
> 
> I thought the alternative suggestion was instead to have:
> 
>   (set (reg:M X) (reg:M X))
> 
> when X is preserved in mode M but not in wider modes.  But that seems
> like too much of a special case to me, both in terms of the source and
> the destination:

Agreed. When I looked at doing it that way back in Jan, my conclusion was
that if we did it that way we end up with more or less the same code but
instead of:

if (GET_CODE (setter) == CLOBBER_HIGH
   && reg_is_clobbered_by_clobber_high(REGNO(dest), GET_MODE (rsp->last_set_value))

Now becomes something like:

if (GET_CODE (setter) == SET
   && REG_P (dest) && HARD_REGISTER_P (dest) && REG_P (src) && REGNO(dst) == REGNO(src)
   && reg_is_clobbered_by_self_set(REGNO(dest), GET_MODE (rsp->last_set_value))

Ok, some of that code can go into a macro, but it feel much clearer to
explicitly check for CLOBBER_HIGH rather then applying some special semantics
to a specific SET case.

Alan.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
  2018-05-27 15:59               ` Jeff Law
@ 2018-05-29 10:06                 ` Richard Sandiford
  2018-05-31 10:39                   ` Alan Hayward
  2018-06-11 23:06                   ` Jeff Law
  0 siblings, 2 replies; 18+ messages in thread
From: Richard Sandiford @ 2018-05-29 10:06 UTC (permalink / raw)
  To: Jeff Law
  Cc: Steve Ellcey, Alan.Hayward, Richard Earnshaw (lists),
	Francesco Petrogalli, James Greenhalgh, Sekhar, Ashwin, gcc,
	Marcus Shawcroft, nd

Jeff Law <law@redhat.com> writes:
> Now that we're in stage1 I do want to revisit the CLOBBER_HIGH stuff.
> When we left things I think we were trying to decide between
> CLOBBER_HIGH and clobbering the appropriate subreg.  The problem with
> the latter is the dataflow we compute is inaccurate (overly pessimistic)
> so that'd have to be fixed.

The clobbered part of the register in this case is a high-part subreg,
which is ill-formed for single registers.  It would also be difficult
to represent in terms of the mode, since there are no defined modes for
what can be stored in the high part of an SVE register.  For 128-bit
SVE that mode would have zero bits. :-)

I thought the alternative suggestion was instead to have:

   (set (reg:M X) (reg:M X))

when X is preserved in mode M but not in wider modes.  But that seems
like too much of a special case to me, both in terms of the source and
the destination:

- On the destination side, a SET normally provides something for later
  instructions to use, whereas here the effect is intended to be the
  opposite: the instruction has no effect at all on a value of mode M
  in X.  As you say, this would pessimise df without specific handling.
  But I think all optimisations that look for the definition of a value
  would need to be taught to "look through" this set to find the real
  definition of (reg:M X) (or any value of a mode no larger than M in X).
  Very few passes use the df def-uses chains for this due its high cost.

- On the source side, the instruction doesn't actually care what's in X,
  but nevertheless appears to use it.  This means that most passes would
  need to be taught that a use of X on the rhs of a no-op SET is special
  and should usually be ignored.

  More fundamentally, it should be possible in RTL to express an
  instruction J that *does* read X in mode M and clobbers its high part.
  If we use the SET above to represent the clobber, and treat the rhs use
  as special, then presumably J would need two uses of X, one "dummy" one
  on the no-op SET and one "real" one on some other SET (or perhaps in a
  top-level USE).  Having the number of uses determine this seems
  a bit awkward.

IMO CLOBBER and SET have different semantics for good reason: CLOBBER
represents an optimisation barrier for things that care about the value
of a certain rtx object, while SET represents a productive effect or
side-effect.  The effect we want here is the same as a normal clobber,
except that the clobber is mode-dependent.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
  2018-05-26 10:09             ` Richard Sandiford
  2018-05-26 22:13               ` Segher Boessenkool
@ 2018-05-27 15:59               ` Jeff Law
  2018-05-29 10:06                 ` Richard Sandiford
  1 sibling, 1 reply; 18+ messages in thread
From: Jeff Law @ 2018-05-27 15:59 UTC (permalink / raw)
  To: Steve Ellcey, Alan.Haward, Richard Earnshaw (lists),
	Francesco Petrogalli, James Greenhalgh, Sekhar, Ashwin, gcc,
	Marcus Shawcroft, nd, richard.sandiford

On 05/26/2018 04:09 AM, Richard Sandiford wrote:
> Steve Ellcey <sellcey@cavium.com> writes:
>> On Wed, 2018-05-16 at 22:11 +0100, Richard Sandiford wrote:
>>> Â 
>>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED is the only current way
>>> of saying that an rtl instruction preserves the low part of a
>>> register but clobbers the high part.Â Â We would need something like
>>> Alan H's CLOBBER_HIGH patches to do it using explicit clobbers.
>>>
>>> Another approach would be to piggy-back on the -fipa-ra
>>> infrastructure
>>> and record that vector PCS functions only clobber Q0-Q7.Â Â If -fipa-ra
>>> knows that a function doesn't clobber Q8-Q15 then that should
>>> override
>>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED.Â Â (I'm not sure whether it does
>>> in practice, but it should :-)Â Â And if it doesn't that's a bug that's
>>> worth fixing for its own sake.)
>>>
>>> Thanks,
>>> Richard
>>
>> Alan,
>>
>> I have been looking at your CLOBBER_HIGH patches to see if they
>> might be helpful in implementing the ARM SIMD Vector ABI in GCC.
>> I have also been looking at the -fipa-ra flag and how it works.
>>
>> I was wondering if you considered using the ipa-ra infrastructure
>> for the SVE work that you are currently trying to support withÂ 
>> the CLOBBER_HIGH macro?
>>
>> My current thought for the ABI work is to mark all the floating
>> point / vector registers as caller saved (the lower half of V8-V15
>> are currently callee saved) and remove
>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED.
>> This should work but would be inefficient.
>>
>> The next step would be to split get_call_reg_set_usage up into
>> two functions so that I don't have to pass in a default set of
>> registers.Â Â One function would return call_used_reg_set by
>> default (but could return a smaller set if it had actual used
>> register information) and the other would return regs_invalidated
>> by_call by default (but could also return a smaller set).
>>
>> Next I would add a 'largest mode used' array to call_cgraph_rtl_info
>> structure in addition to the current function_used_regs register
>> set.
>>
>> Then I could turn the get_call_reg_set_usage replacement functions
>> into target specific functions and with the information in the
>> call_cgraph_rtl_info structure and any simd attribute information on
>> a function I could modify what registers are really being used/invalidated
>> without being saved.
>>
>> If the called function only uses the bottom half of a register it would not
>> be marked as used/invalidated.Â Â If it uses the entire register and the
>> function is not marked as simd, then the register would marked as
>> used/invalidated.Â Â If the function was marked as simd the register would not
>> be marked because a simd function would save both the upper and lower halves
>> of a callee saved register (whereas a non simd function would only save the
>> lower half).
>>
>> Does this sound like something that could be used in place of yourÂ 
>> CLOBBER_HIGH patch?
> 
> One of the advantages of CLOBBER_HIGH is that it can be attached to
> arbitrary instructions, not just calls.  The motivating example was
> tlsdesc_small_<mode>, which isn't treated as a call but as a normal
> instruction.  (And I don't think we want to change that, since it's much
> easier for rtl optimisers to deal with normal instructions compared to
> calls.  In general a call is part of a longer sequence of instructions
> that includes setting up arguments, etc.)
Yea.  I don't think we want to change tlsdesc*.  Representing them as
normal insns rather than calls seems reasonable to me.

Now that we're in stage1 I do want to revisit the CLOBBER_HIGH stuff.
When we left things I think we were trying to decide between
CLOBBER_HIGH and clobbering the appropriate subreg.  The problem with
the latter is the dataflow we compute is inaccurate (overly pessimistic)
so that'd have to be fixed.

Jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
  2018-05-26 10:09             ` Richard Sandiford
@ 2018-05-26 22:13               ` Segher Boessenkool
  2018-05-27 15:59               ` Jeff Law
  1 sibling, 0 replies; 18+ messages in thread
From: Segher Boessenkool @ 2018-05-26 22:13 UTC (permalink / raw)
  To: Steve Ellcey, Alan.Haward, Richard Earnshaw (lists),
	Francesco Petrogalli, James Greenhalgh, Sekhar, Ashwin, gcc,
	Marcus Shawcroft, nd, richard.sandiford

On Sat, May 26, 2018 at 11:09:24AM +0100, Richard Sandiford wrote:
> On the wider point about changing the way call clobber information
> is represented: I agree it would be good to generalise what we have
> now.  But if possible I think we should avoid target hooks that take
> a specific call, and instead make it an inherent part of the call insn
> itself, much like CALL_INSN_FUNCTION_USAGE is now.  E.g. we could add
> a field that points to an ABI description, with -fipa-ra effectively
> creating ad-hoc ABIs.  That ABI description could start out with
> whatever we think is relevant now and could grow over time.

Somewhat related: there still is PR68150 open for problems with
HARD_REGNO_CALL_PART_CLOBBERED in postreload-gcse (it ignores it).


Segher

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
  2018-05-24 17:50           ` Steve Ellcey
@ 2018-05-26 10:09             ` Richard Sandiford
  2018-05-26 22:13               ` Segher Boessenkool
  2018-05-27 15:59               ` Jeff Law
  0 siblings, 2 replies; 18+ messages in thread
From: Richard Sandiford @ 2018-05-26 10:09 UTC (permalink / raw)
  To: Steve Ellcey
  Cc: Alan.Haward, Richard Earnshaw (lists),
	Francesco Petrogalli, James Greenhalgh, Sekhar, Ashwin, gcc,
	Marcus Shawcroft, nd

Steve Ellcey <sellcey@cavium.com> writes:
> On Wed, 2018-05-16 at 22:11 +0100, Richard Sandiford wrote:
>> 
>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED is the only current way
>> of saying that an rtl instruction preserves the low part of a
>> register but clobbers the high part.  We would need something like
>> Alan H's CLOBBER_HIGH patches to do it using explicit clobbers.
>> 
>> Another approach would be to piggy-back on the -fipa-ra
>> infrastructure
>> and record that vector PCS functions only clobber Q0-Q7.  If -fipa-ra
>> knows that a function doesn't clobber Q8-Q15 then that should
>> override
>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  (I'm not sure whether it does
>> in practice, but it should :-)  And if it doesn't that's a bug that's
>> worth fixing for its own sake.)
>> 
>> Thanks,
>> Richard
>
> Alan,
>
> I have been looking at your CLOBBER_HIGH patches to see if they
> might be helpful in implementing the ARM SIMD Vector ABI in GCC.
> I have also been looking at the -fipa-ra flag and how it works.
>
> I was wondering if you considered using the ipa-ra infrastructure
> for the SVE work that you are currently trying to support with 
> the CLOBBER_HIGH macro?
>
> My current thought for the ABI work is to mark all the floating
> point / vector registers as caller saved (the lower half of V8-V15
> are currently callee saved) and remove
> TARGET_HARD_REGNO_CALL_PART_CLOBBERED.
> This should work but would be inefficient.
>
> The next step would be to split get_call_reg_set_usage up into
> two functions so that I don't have to pass in a default set of
> registers.  One function would return call_used_reg_set by
> default (but could return a smaller set if it had actual used
> register information) and the other would return regs_invalidated
> by_call by default (but could also return a smaller set).
>
> Next I would add a 'largest mode used' array to call_cgraph_rtl_info
> structure in addition to the current function_used_regs register
> set.
>
> Then I could turn the get_call_reg_set_usage replacement functions
> into target specific functions and with the information in the
> call_cgraph_rtl_info structure and any simd attribute information on
> a function I could modify what registers are really being used/invalidated
> without being saved.
>
> If the called function only uses the bottom half of a register it would not
> be marked as used/invalidated.  If it uses the entire register and the
> function is not marked as simd, then the register would marked as
> used/invalidated.  If the function was marked as simd the register would not
> be marked because a simd function would save both the upper and lower halves
> of a callee saved register (whereas a non simd function would only save the
> lower half).
>
> Does this sound like something that could be used in place of your 
> CLOBBER_HIGH patch?

One of the advantages of CLOBBER_HIGH is that it can be attached to
arbitrary instructions, not just calls.  The motivating example was
tlsdesc_small_<mode>, which isn't treated as a call but as a normal
instruction.  (And I don't think we want to change that, since it's much
easier for rtl optimisers to deal with normal instructions compared to
calls.  In general a call is part of a longer sequence of instructions
that includes setting up arguments, etc.)

The other use case (not implemented in the posted patches) would be
to represent the effect of syscalls, which clobber the "SVE part"
of all vector registers.  In that case the clobber would need to be
attached to an inline asm insn.

On the wider point about changing the way call clobber information
is represented: I agree it would be good to generalise what we have
now.  But if possible I think we should avoid target hooks that take
a specific call, and instead make it an inherent part of the call insn
itself, much like CALL_INSN_FUNCTION_USAGE is now.  E.g. we could add
a field that points to an ABI description, with -fipa-ra effectively
creating ad-hoc ABIs.  That ABI description could start out with
whatever we think is relevant now and could grow over time.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
  2018-05-16 21:11         ` Richard Sandiford
@ 2018-05-24 17:50           ` Steve Ellcey
  2018-05-26 10:09             ` Richard Sandiford
  0 siblings, 1 reply; 18+ messages in thread
From: Steve Ellcey @ 2018-05-24 17:50 UTC (permalink / raw)
  To: Richard Sandiford, Alan.Haward
  Cc: Richard Earnshaw (lists),
	Francesco Petrogalli, James Greenhalgh, Sekhar, Ashwin, gcc,
	Marcus Shawcroft, nd

On Wed, 2018-05-16 at 22:11 +0100, Richard Sandiford wrote:
>Â 
> TARGET_HARD_REGNO_CALL_PART_CLOBBERED is the only current way
> of saying that an rtl instruction preserves the low part of a
> register but clobbers the high part.Â Â We would need something like
> Alan H's CLOBBER_HIGH patches to do it using explicit clobbers.
> 
> Another approach would be to piggy-back on the -fipa-ra
> infrastructure
> and record that vector PCS functions only clobber Q0-Q7.Â Â If -fipa-ra
> knows that a function doesn't clobber Q8-Q15 then that should
> override
> TARGET_HARD_REGNO_CALL_PART_CLOBBERED.Â Â (I'm not sure whether it does
> in practice, but it should :-)Â Â And if it doesn't that's a bug that's
> worth fixing for its own sake.)
> 
> Thanks,
> Richard

Alan,

I have been looking at your CLOBBER_HIGH patches to see if they
might be helpful in implementing the ARM SIMD Vector ABI in GCC.
I have also been looking at the -fipa-ra flag and how it works.

I was wondering if you considered using the ipa-ra infrastructure
for the SVE work that you are currently trying to support withÂ 
the CLOBBER_HIGH macro?

My current thought for the ABI work is to mark all the floating
point / vector registers as caller saved (the lower half of V8-V15
are currently callee saved) and remove
TARGET_HARD_REGNO_CALL_PART_CLOBBERED.
This should work but would be inefficient.

The next step would be to split get_call_reg_set_usage up into
two functions so that I don't have to pass in a default set of
registers.Â Â One function would return call_used_reg_set by
default (but could return a smaller set if it had actual used
register information) and the other would return regs_invalidated
by_call by default (but could also return a smaller set).

Next I would add a 'largest mode used' array to call_cgraph_rtl_info
structure in addition to the current function_used_regs register
set.

Then I could turn the get_call_reg_set_usage replacement functions
into target specific functions and with the information in the
call_cgraph_rtl_info structure and any simd attribute information on
a function I could modify what registers are really being used/invalidated
without being saved.

If the called function only uses the bottom half of a register it would not
be marked as used/invalidated.Â Â If it uses the entire register and the
function is not marked as simd, then the register would marked as
used/invalidated.Â Â If the function was marked as simd the register would not
be marked because a simd function would save both the upper and lower halves
of a callee saved register (whereas a non simd function would only save the
lower half).

Does this sound like something that could be used in place of yourÂ 
CLOBBER_HIGH patch?

Steve Ellcey
sellcey@cavium.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
  2018-05-16 17:30       ` Steve Ellcey
@ 2018-05-16 21:11         ` Richard Sandiford
  2018-05-24 17:50           ` Steve Ellcey
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2018-05-16 21:11 UTC (permalink / raw)
  To: Steve Ellcey
  Cc: Richard Earnshaw (lists),
	Francesco Petrogalli, James Greenhalgh, Sekhar, Ashwin, gcc,
	Marcus Shawcroft, nd

Steve Ellcey <sellcey@cavium.com> writes:
> On Wed, 2018-05-16 at 17:30 +0100, Richard Earnshaw (lists) wrote:
>> On 16/05/18 17:21, Steve Ellcey wrote:
>> > 
>> > It doesn't look like GCC has any existing mechanism for having different
>> > sets of caller saved/callee saved registers depending on the function
>> > attributes of the calling or called function.
>> > 
>> > Changing what registers a callee function saves and restores shouldn't
>> > be too difficult since that can be done when generating the prologue
>> > and epilogue code but changing what registers a caller saves/restores
>> > when doing the call seems trickier.  The macro
>> > TARGET_HARD_REGNO_CALL_PART_CLOBBERED doesn't know anything about the
>> > function being called.  It returns true/false depending on just the
>> > register number and mode.
>> > 
>> > Steve Ellcey
>> > sellcey@cavium.com
>> > 
>> 
>> Actually, we can.  See, for example, the attribute((pcs)) for the ARM
>> port.  I think we could probably handle this automagically for the SVE
>> vector calling convention in AArch64.
>> 
>> R.
>
> Interesting, it looks like one could use aarch64_emit_call to emit
> extra use_reg / clobber_reg instructions but in this case we want to
> tell the caller that some registers are not being clobbered by the
> callee.  The ARM port does not
> define TARGET_HARD_REGNO_CALL_PART_CLOBBERED and that seemed like one
> of the most problamatic issues with Aarch64.  Maybe we would have to
> undefine this for aarch64 and use explicit clobbers to say what
> floating point registers / vector registers are clobbered for each
> call?  I wonder how that would affect register allocation.

TARGET_HARD_REGNO_CALL_PART_CLOBBERED is the only current way
of saying that an rtl instruction preserves the low part of a
register but clobbers the high part.  We would need something like
Alan H's CLOBBER_HIGH patches to do it using explicit clobbers.

Another approach would be to piggy-back on the -fipa-ra infrastructure
and record that vector PCS functions only clobber Q0-Q7.  If -fipa-ra
knows that a function doesn't clobber Q8-Q15 then that should override
TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  (I'm not sure whether it does
in practice, but it should :-)  And if it doesn't that's a bug that's
worth fixing for its own sake.)

Thanks,
Richard

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
  2018-05-16 16:30     ` Richard Earnshaw (lists)
@ 2018-05-16 17:30       ` Steve Ellcey
  2018-05-16 21:11         ` Richard Sandiford
  0 siblings, 1 reply; 18+ messages in thread
From: Steve Ellcey @ 2018-05-16 17:30 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Francesco Petrogalli
  Cc: James Greenhalgh, Sekhar, Ashwin, gcc, Marcus Shawcroft, nd

On Wed, 2018-05-16 at 17:30 +0100, Richard Earnshaw (lists) wrote:
> On 16/05/18 17:21, Steve Ellcey wrote:
> >Â 
> > It doesn't look like GCC has any existing mechanism for having different
> > sets of caller saved/callee saved registers depending on the function
> > attributes of the calling or called function.
> > 
> > Changing what registers a callee function saves and restores shouldn't
> > be too difficult since that can be done when generating the prologue
> > and epilogue code but changing what registers a caller saves/restores
> > when doing the call seems trickier.Â Â The macro
> > TARGET_HARD_REGNO_CALL_PART_CLOBBERED doesn't know anything about the
> > function being called.Â Â It returns true/false depending on just the
> > register number and mode.
> > 
> > Steve Ellcey
> > sellcey@cavium.com
> > 
> 
> Actually, we can.Â Â See, for example, the attribute((pcs)) for the ARM
> port.Â Â I think we could probably handle this automagically for the SVE
> vector calling convention in AArch64.
> 
> R.

Interesting, it looks like one could use aarch64_emit_call to emit
extra use_reg / clobber_reg instructions but in this case we want to
tell the caller that some registers are not being clobbered by the
callee. Â The ARM port does not
defineÂ TARGET_HARD_REGNO_CALL_PART_CLOBBERED and that seemed like one
of the most problamatic issues with Aarch64. Â Maybe we would have to
undefine this for aarch64 and use explicit clobbers to say what
floating point registers / vector registers are clobbered for each
call? Â I wonder how that would affect register allocation.

Steve Ellcey
sellcey@cavium.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
  2018-05-16 16:21   ` Steve Ellcey
@ 2018-05-16 16:30     ` Richard Earnshaw (lists)
  2018-05-16 17:30       ` Steve Ellcey
  2018-07-02 18:16     ` Francesco Petrogalli
  1 sibling, 1 reply; 18+ messages in thread
From: Richard Earnshaw (lists) @ 2018-05-16 16:30 UTC (permalink / raw)
  To: sellcey, Francesco Petrogalli
  Cc: James Greenhalgh, Sekhar, Ashwin, gcc, Marcus Shawcroft, nd

On 16/05/18 17:21, Steve Ellcey wrote:
> On Tue, 2018-05-15 at 18:29 +0000, Francesco Petrogalli wrote:
> 
>> Hi Steve,
>>
>> I am happy to let you know that the Vector Function ABI for AArch64
>> is now public and available via the link at [1].
>>
>> Donâ€™t hesitate to contact me in case you have any questions.
>>
>> Kind regards,
>>
>> Francesco
>>
>> [1] https://developer.arm.com/products/software-development-tools/hpc
>> /arm-compiler-for-hpc/vector-function-abi
>>
>>>
>>> Steve Ellcey
>>> sellcey@cavium.com
> 
> Thanks for publishing this Francesco, it looks like the main issue for
> GCC is that the Vector Function ABI has different caller saved / callee
> saved register conventions than the standard ARM calling convention.
> 
> If I understand things correctly, in the standard calling convention
> the callee will only save the bottom 64 bits of V8-V15 and so the
> caller needs to save those registers if it is using the top half.Â Â In
> the Vector calling convention the callee will save all 128 bits of
> these registers (and possibly more registers) so the caller does not
> have to save these registers at all, even if it is using all 128 bits
> of them.
> 
> It doesn't look like GCC has any existing mechanism for having different
> sets of caller saved/callee saved registers depending on the function
> attributes of the calling or called function.
> 
> Changing what registers a callee function saves and restores shouldn't
> be too difficult since that can be done when generating the prologue
> and epilogue code but changing what registers a caller saves/restores
> when doing the call seems trickier.Â Â The macro
> TARGET_HARD_REGNO_CALL_PART_CLOBBERED doesn't know anything about the
> function being called.Â Â It returns true/false depending on just the
> register number and mode.
> 
> Steve Ellcey
> sellcey@cavium.com
> 


Actually, we can.  See, for example, the attribute((pcs)) for the ARM
port.  I think we could probably handle this automagically for the SVE
vector calling convention in AArch64.

R.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
  2018-05-15 18:29 ` Francesco Petrogalli
@ 2018-05-16 16:21   ` Steve Ellcey
  2018-05-16 16:30     ` Richard Earnshaw (lists)
  2018-07-02 18:16     ` Francesco Petrogalli
  0 siblings, 2 replies; 18+ messages in thread
From: Steve Ellcey @ 2018-05-16 16:21 UTC (permalink / raw)
  To: Francesco Petrogalli
  Cc: James Greenhalgh, Sekhar, Ashwin, gcc, Richard Earnshaw,
	Marcus Shawcroft, nd

On Tue, 2018-05-15 at 18:29 +0000, Francesco Petrogalli wrote:

> Hi Steve,
> 
> I am happy to let you know that the Vector Function ABI for AArch64
> is now public and available via the link at [1].
> 
> Donâ€™t hesitate to contact me in case you have any questions.
> 
> Kind regards,
> 
> Francesco
> 
> [1] https://developer.arm.com/products/software-development-tools/hpc
> /arm-compiler-for-hpc/vector-function-abi
> 
> > 
> > Steve Ellcey
> > sellcey@cavium.com

Thanks for publishing this Francesco, it looks like the main issue for
GCC is that the Vector Function ABI has different caller saved / callee
saved register conventions than the standard ARM calling convention.

If I understand things correctly, in the standard calling convention
the callee will only save the bottom 64 bits of V8-V15 and so the
caller needs to save those registers if it is using the top half.Â Â In
the Vector calling convention the callee will save all 128 bits of
these registers (and possibly more registers) so the caller does not
have to save these registers at all, even if it is using all 128 bits
of them.

It doesn't look like GCC has any existing mechanism for having different
sets of caller saved/callee saved registers depending on the function
attributes of the calling or called function.

Changing what registers a callee function saves and restores shouldn't
be too difficult since that can be done when generating the prologue
and epilogue code but changing what registers a caller saves/restores
when doing the call seems trickier.Â Â The macro
TARGET_HARD_REGNO_CALL_PART_CLOBBERED doesn't know anything about the
function being called.Â Â It returns true/false depending on just the
register number and mode.

Steve Ellcey
sellcey@cavium.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
  2018-02-09 21:47 Steve Ellcey
@ 2018-05-15 18:29 ` Francesco Petrogalli
  2018-05-16 16:21   ` Steve Ellcey
  0 siblings, 1 reply; 18+ messages in thread
From: Francesco Petrogalli @ 2018-05-15 18:29 UTC (permalink / raw)
  To: sellcey
  Cc: James Greenhalgh, Sekhar, Ashwin, gcc, Richard Earnshaw,
	Marcus Shawcroft, nd


> On Feb 9, 2018, at 3:47 PM, Steve Ellcey <sellcey@cavium.com> wrote:
> 
> […]
> I was wondering if the function vector ABI has been published yet and
> if so, where I could find it.
> 

Hi Steve,

I am happy to let you know that the Vector Function ABI for AArch64 is now public and available via the link at [1].

Don’t hesitate to contact me in case you have any questions.

Kind regards,

Francesco

[1] https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi

> Steve Ellcey
> sellcey@cavium.com


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
@ 2018-02-09 21:47 Steve Ellcey
  2018-05-15 18:29 ` Francesco Petrogalli
  0 siblings, 1 reply; 18+ messages in thread
From: Steve Ellcey @ 2018-02-09 21:47 UTC (permalink / raw)
  To: james.greenhalgh, Sekhar, Ashwin, gcc, richard.earnshaw,
	Marcus Shawcroft, nd

James,

This is a follow-up toÂ https://gcc.gnu.org/ml/gcc/2017-03/msg00109.html
Â where you said:

| Hi Ashwin,
|Â 
| Thanks for the question. ARM has defined a vector function ABI, based
| on the Vector Function ABI Specification you linked below, which
| is designed to be suitable for both the Advanced SIMD and Scalable
| Vector Extensions. There has not yet been a release of this document
| which I can point you at, nor can I give you an estimate of when the
| document will be published.

I was wondering if the function vector ABI has been published yet and
if so, where I could find it.

Steve Ellcey
sellcey@cavium.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-07-02 18:16 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-15  9:50 [Aarch64] Vector Function Application Binary Interface Specification for OpenMP Sekhar, Ashwin
2017-03-17 14:02 ` James Greenhalgh
2017-03-20  4:30   ` Sekhar, Ashwin
2018-02-09 21:47 Steve Ellcey
2018-05-15 18:29 ` Francesco Petrogalli
2018-05-16 16:21   ` Steve Ellcey
2018-05-16 16:30     ` Richard Earnshaw (lists)
2018-05-16 17:30       ` Steve Ellcey
2018-05-16 21:11         ` Richard Sandiford
2018-05-24 17:50           ` Steve Ellcey
2018-05-26 10:09             ` Richard Sandiford
2018-05-26 22:13               ` Segher Boessenkool
2018-05-27 15:59               ` Jeff Law
2018-05-29 10:06                 ` Richard Sandiford
2018-05-31 10:39                   ` Alan Hayward
2018-06-12  3:11                     ` Jeff Law
2018-06-11 23:06                   ` Jeff Law
2018-07-02 18:16     ` Francesco Petrogalli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).