public inbox for libc-alpha@sourceware.org
* PPC64 libmvec sincos/sincosf ABI
@ 2019-08-01 13:01 GT
  2019-08-01 17:04 ` Joseph Myers
  2019-08-07 21:17 ` Tulio Magno Quites Machado Filho
  0 siblings, 2 replies; 18+ messages in thread
From: GT @ 2019-08-01 13:01 UTC (permalink / raw)
  To: libc-alpha@sourceware.org

I believe PPC64 needs to implement functions analogous to x86_64 _ZGVbN4vvv_sincosf, _ZGVbN4vl4l4_sincosf, _ZGVbN2vvv_sincos, _ZGVbN2vl8l8_sincos.

The function signatures of scalar sincosf and sincos are:

sincosf (float, float *, float *)
sincos (double, double *, double *)

How do I determine the vector function signatures in C, of the 4 vector functions referenced at the top of this message?

Thanks.
Bert.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PPC64 libmvec sincos/sincosf ABI
  2019-08-01 13:01 PPC64 libmvec sincos/sincosf ABI GT
@ 2019-08-01 17:04 ` Joseph Myers
  2019-08-07 21:17 ` Tulio Magno Quites Machado Filho
  1 sibling, 0 replies; 18+ messages in thread
From: Joseph Myers @ 2019-08-01 17:04 UTC (permalink / raw)
  To: GT; +Cc: libc-alpha@sourceware.org

On Thu, 1 Aug 2019, GT wrote:

> I believe PPC64 needs to implement functions analogous to x86_64 
> _ZGVbN4vvv_sincosf, _ZGVbN4vl4l4_sincosf, _ZGVbN2vvv_sincos, 
> _ZGVbN2vl8l8_sincos.

The x86_64 functions ended up with an ABI (vectors of pointers being 
passed) that may well be a mistake and certainly wasn't what was intended 
when they were first added.  I think it's necessary to answer questions 
along the following lines.

1. What is the best vector ABI (best performance) for sincos on PPC64?  
That may be a function of the particular vector instructions available on 
PPC64; the best choice of ABI on PPC64 need not correspond to the best 
choice on x86_64.

2. What is the correct pragma / attribute to use in header declarations to 
indicate that sincos has that ABI, and the corresponding name mangling?  
This needs resolving in conjunction with people working on the ABI 
document and compilers to ensure there is common agreement about how to 
tell the compiler that certain vector function variants are available.  
The pragma / attribute may well not be the same as used for other libmvec 
functions.

Only once you have answers to those questions can you know what function 
names should be implemented, what the interface to them should be, and, 
thus, the corresponding C interface for directly testing them from C.
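
For the second question, the existing x86_64 mechanism gives a reference
point. glibc's x86_64 headers mark libmvec-capable declarations with
`#pragma omp declare simd notinbranch` or, for GCC 6 and later, with
`__attribute__ ((__simd__ ("notinbranch")))`. Below is a rough sketch of the
attribute form on a made-up function; `SKETCH_DECL_SIMD` and `sketch_pair`
are illustrative names, not glibc ones, and the arithmetic is a placeholder.
With no `linear` clause every parameter is treated as a vector, so the
pointer parameters would be expected to produce a `vvv`-style mangling
(vectors of pointers) rather than `vl8l8`.

```c
/* GCC 6 and later accept the simd attribute without any OpenMP flag.
   Other compilers get an empty macro so the file still builds.  */
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 6
# define SKETCH_DECL_SIMD __attribute__ ((__simd__ ("notinbranch")))
#else
# define SKETCH_DECL_SIMD
#endif

/* Hypothetical function with the same parameter shape as sincos:
   one value in, two results out through pointers.  */
SKETCH_DECL_SIMD void
sketch_pair (double x, double *twice, double *square)
{
  *twice = x + x;    /* placeholder arithmetic, not a real sine */
  *square = x * x;   /* placeholder arithmetic, not a real cosine */
}
```

The scalar behaviour is unchanged by the attribute; which vector variants a
given compiler associates with the declaration is what an end-to-end test
would need to confirm.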

As noted in previous discussions, the fact that for x86_64 the vector 
sincos functions initially had an ABI inconsistent with the one implied by 
the header declarations used, and the difference between those functions 
and other libmvec functions, makes it particularly important to do an 
end-to-end test, using the relevant header declarations and a compiler 
that supports generating vector function calls given those declarations, 
to make sure that the x86_64 mistake isn't repeated and that the PPC64 
sincos functions really do have the intended ABI corresponding to the 
header declarations.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: PPC64 libmvec sincos/sincosf ABI
  2019-08-01 13:01 PPC64 libmvec sincos/sincosf ABI GT
  2019-08-01 17:04 ` Joseph Myers
@ 2019-08-07 21:17 ` Tulio Magno Quites Machado Filho
  2019-08-08 13:34   ` Bill Schmidt
  1 sibling, 1 reply; 18+ messages in thread
From: Tulio Magno Quites Machado Filho @ 2019-08-07 21:17 UTC (permalink / raw)
  To: GT, libc-alpha@sourceware.org; +Cc: William J. Schmidt, Joseph Myers, nd

GT <tnggil@protonmail.com> writes:

> I believe PPC64 needs to implement functions analogous to x86_64 _ZGVbN4vvv_sincosf, _ZGVbN4vl4l4_sincosf, _ZGVbN2vvv_sincos, _ZGVbN2vl8l8_sincos.

I can't follow you here.
Why do you think both implementations for each type are necessary?

AFAIU, both _ZGVbN4vvv_sincosf and _ZGVbN2vvv_sincos should not exist.
Or are you implying they're required somewhere else?

> The function signatures of scalar sincosf and sincos are:
>
> sincosf (float, float *, float *)
> sincos (double, double *, double *)
>
> How do I determine the vector function signatures in C, of the 4 vector functions referenced at the top of this message?

For _ZGVbN4vl4l4_sincosf and _ZGVbN2vl8l8_sincos I'd write them as:

void sincosf (vector float, vector float *, vector float *);
void sincos (vector double, vector double *, vector double *);
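
A rough C reading of the two x86_64-style manglings, written with GCC's
generic vector extension so that it compiles on any target. In the x86_64
vector ABI naming scheme, `b` selects the ISA class, `N` means unmasked, the
digit is the lane count, `v` marks a parameter passed as a vector, and `l8`
marks a pointer passed once as a scalar that advances 8 bytes per lane. The
names `v2df`, `sketch_sincos_vl8l8` and `sketch_sincos_vvv` are made up for
this illustration, arrays of pointers stand in for vectors of pointers, and
the arithmetic is a placeholder for the real sine and cosine.

```c
#include <assert.h>
#include <stddef.h>

/* Two doubles in one 16-byte vector (GCC/Clang generic vector extension). */
typedef double v2df __attribute__ ((vector_size (16)));

/* Placeholders for sin and cos so the sketch needs no libm.  */
static double fake_sin (double x) { return x + x; }
static double fake_cos (double x) { return x * x; }

/* Shape implied by "vl8l8": one vector of inputs plus two scalar pointers,
   each addressing consecutive doubles (8-byte step per lane).  */
void
sketch_sincos_vl8l8 (v2df x, double *sinp, double *cosp)
{
  assert (sinp != NULL && cosp != NULL);
  for (size_t lane = 0; lane < 2; lane++)
    {
      sinp[lane] = fake_sin (x[lane]);
      cosp[lane] = fake_cos (x[lane]);
    }
}

/* Shape implied by "vvv": every lane carries its own pair of pointers.  */
void
sketch_sincos_vvv (v2df x, double *sinp[2], double *cosp[2])
{
  for (size_t lane = 0; lane < 2; lane++)
    {
      assert (sinp[lane] != NULL && cosp[lane] != NULL);
      *sinp[lane] = fake_sin (x[lane]);
      *cosp[lane] = fake_cos (x[lane]);
    }
}
```

In the linear form, whether the pointer is spelled `double *` or
`vector double *` should make no difference at the register level, since
either way a single address is passed.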

-- 
Tulio Magno

* Re: PPC64 libmvec sincos/sincosf ABI
  2019-08-07 21:17 ` Tulio Magno Quites Machado Filho
@ 2019-08-08 13:34   ` Bill Schmidt
  2019-08-08 15:48     ` GT
  0 siblings, 1 reply; 18+ messages in thread
From: Bill Schmidt @ 2019-08-08 13:34 UTC (permalink / raw)
  To: Tulio Magno Quites Machado Filho, GT, libc-alpha@sourceware.org
  Cc: Joseph Myers, nd

On 8/7/19 4:17 PM, Tulio Magno Quites Machado Filho wrote:
> GT <tnggil@protonmail.com> writes:
>
>> I believe PPC64 needs to implement functions analogous to x86_64 _ZGVbN4vvv_sincosf, _ZGVbN4vl4l4_sincosf, _ZGVbN2vvv_sincos, _ZGVbN2vl8l8_sincos.
> I can't follow you here.
> Why do you think both implementations for each type are necessary?
>
> AFAIU, both _ZGVbN4vvv_sincosf and _ZGVbN2vvv_sincos should not exist.
> Or are you implying they're required somewhere else?
>
>> The function signatures of scalar sincosf and sincos are:
>>
>> sincosf (float, float *, float *)
>> sincos (double, double *, double *)
>>
>> How do I determine the vector function signatures in C, of the 4 vector functions referenced at the top of this message?
> For _ZGVbN4vl4l4_sincosf and _ZGVbN2vl8l8_sincos I'd write them as:
>
> void sincosf (vector float, vector float *, vector float *);
> void sincos (vector double, vector double *, vector double *);
>
I'm trying to work my way into understanding the veclibabi support in
GCC, so please bear with me.

Why are we interested in sincos at all?  There is no handling of sincos
in the i386 SVML or ACML interfaces for libmvec.  They handle only sin
and cos separately, as does libmassv for Power.  I am coming late to the
discussion, but I don't understand how this fits into the libmvec ABI
requirements.

Because sincos has an oddball interface, it doesn't fit in well with the
-mveclibabi=* machinery, so far as I can tell.

Thanks,
Bill

* Re: PPC64 libmvec sincos/sincosf ABI
  2019-08-08 13:34   ` Bill Schmidt
@ 2019-08-08 15:48     ` GT
  2019-08-08 15:56       ` Florian Weimer
  2019-08-08 16:11       ` Bill Schmidt
  0 siblings, 2 replies; 18+ messages in thread
From: GT @ 2019-08-08 15:48 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: libc-alpha

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, August 8, 2019 1:33 PM, Bill Schmidt <wschmidt@linux.ibm.com> wrote:

>
> I'm trying to work my way into understanding the veclibabi support in
> GCC, so please bear with me.
>
> Why are we interested in sincos at all?  There is no handling of sincos
> in the i386 SVML or ACML interfaces for libmvec.  They handle only sin
> and cos separately, as does libmassv for Power.  I am coming late to the
> discussion, but I don't understand how this fits into the libmvec ABI
> requirements.
>

1. I understood sincos to be included in implementation of libmvec on PPC64
because x86_64 provides that capability. There is discussion of an initial
implementation bug in x86_64 sincos at:
https://sourceware.org/bugzilla/show_bug.cgi?id=20024

The final comment in the thread declares the bug fixed in GLIBC 2.24.

2. I am trying to understand how GCC currently vectorizes loops containing calls
to sincos. GLIBC wiki page https://sourceware.org/glibc/wiki/libmvec has two
examples of code that GCC will vectorize if GLIBC is properly implemented. I
changed Example 2 so that it now reads as below:

tst_sincos.c
-------------------
#include <math.h>

int N = 3200;

double a[3200];
double b[3200];
double c[3200];

int main (void)
{
  int i;

  for (i = 0; i < N; i += 1)
  {
    sincos (a[i], &b[i], &c[i]);
  }

  return (0);
}
--------------------

Compiled it with gcc version 8.3.1 using the command in the Example:
gcc ./tst_sincos.c -O1 -ftree-loop-vectorize -ffast-math -lm -mavx

Does not generate vectorized call. There is the following warning issued
by gcc:

----------------------
./tst_sincos.c: In function ‘main’:
./tst_sincos.c:15:5: warning: implicit declaration of function ‘sincos’ [-Wimplicit-function-declaration]
     sincos (a[i], &b[i], &c[i]);
     ^~~~~~
./tst_sincos.c:15:5: warning: incompatible implicit declaration of built-in function ‘sincos’
./tst_sincos.c:15:5: note: include ‘<math.h>’ or provide a declaration of ‘sincos’
./tst_sincos.c:2:1:
+#include <math.h>

./tst_sincos.c:15:5:
     sincos (a[i], &b[i], &c[i]);
     ^~~~~~
----------------------

The same warning is issued even when not requesting vectorization:
gcc ./tst_sincos.c -lm

All the above was on an x86_64 system. Got the same warning on PPC64 POWER8.

What is the issue with tst_sincos.c above? math.h is clearly included.

* Re: PPC64 libmvec sincos/sincosf ABI
  2019-08-08 15:48     ` GT
@ 2019-08-08 15:56       ` Florian Weimer
  2019-08-08 16:56         ` GT
  2019-08-08 16:11       ` Bill Schmidt
  1 sibling, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2019-08-08 15:56 UTC (permalink / raw)
  To: GT; +Cc: Bill Schmidt, libc-alpha

* GT:

> ./tst_sincos.c: In function ‘main’:
> ./tst_sincos.c:15:5: warning: implicit declaration of function ‘sincos’ [-Wimplicit-function-declaration]
>      sincos (a[i], &b[i], &c[i]);
>      ^~~~~~
> ./tst_sincos.c:15:5: warning: incompatible implicit declaration of built-in function ‘sincos’
> ./tst_sincos.c:15:5: note: include ‘<math.h>’ or provide a declaration of ‘sincos’
> ./tst_sincos.c:2:1:
> +#include <math.h>

For sincos, you currently need to build with _GNU_SOURCE.  (I don't get
a vector call for this, though.)

We should probably make sincos etc. available by default because these
functions are widely available elsewhere.

Thanks,
Florian

* Re: PPC64 libmvec sincos/sincosf ABI
  2019-08-08 15:48     ` GT
  2019-08-08 15:56       ` Florian Weimer
@ 2019-08-08 16:11       ` Bill Schmidt
  2019-08-08 17:42         ` GT
  1 sibling, 1 reply; 18+ messages in thread
From: Bill Schmidt @ 2019-08-08 16:11 UTC (permalink / raw)
  To: GT; +Cc: libc-alpha

On 8/8/19 10:48 AM, GT wrote:
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Thursday, August 8, 2019 1:33 PM, Bill Schmidt <wschmidt@linux.ibm.com> wrote:
>
>> I'm trying to work my way into understanding the veclibabi support in
>> GCC, so please bear with me.
>>
>> Why are we interested in sincos at all?  There is no handling of sincos
>> in the i386 SVML or ACML interfaces for libmvec.  They handle only sin
>> and cos separately, as does libmassv for Power.  I am coming late to the
>> discussion, but I don't understand how this fits into the libmvec ABI
>> requirements.
>>
> 1. I understood sincos to be included in implementation of libmvec on PPC64
> because x86_64 provides that capability. There is discussion of an initial
> implementation bug in x86_64 sincos at:
> https://sourceware.org/bugzilla/show_bug.cgi?id=20024
>
> The final comment in the thread declares the bug fixed in GLIBC 2.24.

Interesting.  I believe you -- I'm still learning about this.  It
appears that, although there is an interface provided in libmvec, there
isn't GCC code to generate calls to it.  I'd like to be proven wrong
about that.
>
> 2. I am trying to understand how GCC currently vectorizes loops containing calls
> to sincos. GLIBC wiki page https://sourceware.org/glibc/wiki/libmvec has two
> examples of code that GCC will vectorize if GLIBC is properly implemented. I
> changed Example 2 so that it now reads as below:
>
> tst_sincos.c
> -------------------
> #include <math.h>
>
> int N = 3200;
>
> double a[3200];
> double b[3200];
> double c[3200];
>
> int main (void)
> {
>   int i;
>
>   for (i = 0; i < N; i += 1)
>   {
>     sincos (a[i], &b[i], &c[i]);
>   }
>
>   return (0);
> }
> --------------------
>
> Compiled it with gcc version 8.3.1 using the command in the Example:
> gcc ./tst_sincos.c -O1 -ftree-loop-vectorize -ffast-math -lm -mavx
>
> Does not generate vectorized call. There is the following warning issued
> by gcc:
>
> ----------------------
> ./tst_sincos.c: In function ‘main’:
> ./tst_sincos.c:15:5: warning: implicit declaration of function ‘sincos’ [-Wimplicit-function-declaration]
>      sincos (a[i], &b[i], &c[i]);
>      ^~~~~~
> ./tst_sincos.c:15:5: warning: incompatible implicit declaration of built-in function ‘sincos’
> ./tst_sincos.c:15:5: note: include ‘<math.h>’ or provide a declaration of ‘sincos’
> ./tst_sincos.c:2:1:
> +#include <math.h>
>
> ./tst_sincos.c:15:5:
>      sincos (a[i], &b[i], &c[i]);
>      ^~~~~~
> ----------------------
>
> The same warning is issued even when not requesting vectorization:
> gcc ./tst_sincos.c -lm
>
> All the above was on an x86_64 system. Got the same warning on PPC64 POWER8.
>
> What is the issue with tst_sincos.c above? math.h is clearly included.
I agree that this is odd.  I have no explanation for you.

You aren't going to get the vectorized function calls having removed the
#pragma omp simd, I think.

But even with that addressed, I don't think you'll see the vectorized
function calls, because GCC doesn't specifically handle the sincos
builtin.  In ix86_veclibabi_svml, there's a switch statement that
describes which builtins are handled.  These are the cases that result
in calls to libmvec:

    CASE_CFN_EXP:
    CASE_CFN_LOG:
    CASE_CFN_LOG10:
    CASE_CFN_POW:
    CASE_CFN_TANH:
    CASE_CFN_TAN:
    CASE_CFN_ATAN:
    CASE_CFN_ATAN2:
    CASE_CFN_ATANH:
    CASE_CFN_CBRT:
    CASE_CFN_SINH:
    CASE_CFN_SIN:
    CASE_CFN_ASINH:
    CASE_CFN_ASIN:
    CASE_CFN_COSH:
    CASE_CFN_COS:
    CASE_CFN_ACOSH:
    CASE_CFN_ACOS:

CASE_CFN_SINCOS does not appear among them.  It's possibly because the
interface is different enough that GCC hasn't been taught to handle it. 
Or perhaps it's an oversight.  The sin and cos examples from the link
above should work, since those are covered here.

Anyway, at a minimum we probably want to provide the libmvec sincos[f]
functions for ppc64le (see the other fork of this thread for the return
value discussion), but at the moment existing vector library ABI support
in GCC doesn't seem to target those.

Bill

* Re: PPC64 libmvec sincos/sincosf ABI
  2019-08-08 15:56       ` Florian Weimer
@ 2019-08-08 16:56         ` GT
  0 siblings, 0 replies; 18+ messages in thread
From: GT @ 2019-08-08 16:56 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha@sourceware.org

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, August 8, 2019 3:56 PM, Florian Weimer <fweimer@redhat.com> wrote:

> * GT:
>
> > ./tst_sincos.c: In function ‘main’:
> > ./tst_sincos.c:15:5: warning: implicit declaration of function ‘sincos’ [-Wimplicit-function-declaration]
> >      sincos (a[i], &b[i], &c[i]);
> >      ^~~~~~
> > ./tst_sincos.c:15:5: warning: incompatible implicit declaration of built-in function ‘sincos’
> > ./tst_sincos.c:15:5: note: include ‘<math.h>’ or provide a declaration of ‘sincos’
> > ./tst_sincos.c:2:1:
> > +#include <math.h>
>
> For sincos, you currently need to build with _GNU_SOURCE. (I don't get
> a vector call for this, though.)

Thanks for this info. I will rebuild with that macro defined. But as explained by
Bill Schmidt in this message: https://sourceware.org/ml/libc-alpha/2019-08/msg00142.html,
support is missing in ix86_veclibabi_svml for sincos, hence no vectorization for it
will occur presently.

Thanks.
Bert.

* Re: PPC64 libmvec sincos/sincosf ABI
  2019-08-08 16:11       ` Bill Schmidt
@ 2019-08-08 17:42         ` GT
  2019-08-08 17:51           ` Bill Schmidt
  0 siblings, 1 reply; 18+ messages in thread
From: GT @ 2019-08-08 17:42 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: libc-alpha

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, August 8, 2019 4:11 PM, Bill Schmidt <wschmidt@linux.ibm.com> wrote:

> > 1. I understood sincos to be included in implementation of libmvec on PPC64
> > because x86_64 provides that capability. There is discussion of an initial
> > implementation bug in x86_64 sincos at:
> > https://sourceware.org/bugzilla/show_bug.cgi?id=20024
> >
> >
> > The final comment in the thread declares the bug fixed in GLIBC 2.24.
>
> Interesting.  I believe you -- I'm still learning about this.  It
> appears that, although there is an interface provided in libmvec, there
> isn't GCC code to generate calls to it.  I'd like to be proven wrong
> about that.

Your note below regarding the missing logic for sincos in ix86_veclibabi_svml
is proof that you are correct. Isn't it?

> > Compiled it with gcc version 8.3.1 using the command in the Example:
> > gcc ./tst_sincos.c -O1 -ftree-loop-vectorize -ffast-math -lm -mavx
> > Does not generate vectorized call. There is the following warning issued
> > by gcc:
> > ./tst_sincos.c: In function ‘main’:
> > ./tst_sincos.c:15:5: warning: implicit declaration of function ‘sincos’ [-Wimplicit-function-declaration]
> >      sincos (a[i], &b[i], &c[i]);
> >      ^~~~~~
> > ./tst_sincos.c:15:5: warning: incompatible implicit declaration of built-in function ‘sincos’
> > ./tst_sincos.c:15:5: note: include ‘<math.h>’ or provide a declaration of ‘sincos’
> > ./tst_sincos.c:2:1:
> > +#include <math.h>
> >
> > ./tst_sincos.c:15:5:
> >      sincos (a[i], &b[i], &c[i]);
> >      ^~~~~~
> >
> > The same warning is issued even when not requesting vectorization:
> > gcc ./tst_sincos.c -lm
> > All the above was on an x86_64 system. Got the same warning on PPC64 POWER8.
> > What is the issue with tst_sincos.c above? math.h is clearly included.
>
> I agree that this is odd.  I have no explanation for you.

Florian has given the reason for the warnings as being that it is required to build
with _GNU_SOURCE for sincos.

> You aren't going to get the vectorized function calls having removed the
> #pragma omp simd, I think.

GCC flags -O1 and -ftree-loop-vectorize enable vectorization of loops even in the absence
of simd pragmas. Example 2 explicitly states that. The failure to vectorize is specific to
how sincos is being handled. Which goes back to ix86_veclibabi_svml.

Thanks.
Bert.

* Re: PPC64 libmvec sincos/sincosf ABI
  2019-08-08 17:42         ` GT
@ 2019-08-08 17:51           ` Bill Schmidt
  0 siblings, 0 replies; 18+ messages in thread
From: Bill Schmidt @ 2019-08-08 17:51 UTC (permalink / raw)
  To: GT; +Cc: libc-alpha

On 8/8/19 12:42 PM, GT wrote:
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Thursday, August 8, 2019 4:11 PM, Bill Schmidt <wschmidt@linux.ibm.com> wrote:
>
>>> 1. I understood sincos to be included in implementation of libmvec on PPC64
>>> because x86_64 provides that capability. There is discussion of an initial
>>> implementation bug in x86_64 sincos at:
>>> https://sourceware.org/bugzilla/show_bug.cgi?id=20024
>>>
>>>
>>> The final comment in the thread declares the bug fixed in GLIBC 2.24.
>> Interesting.  I believe you -- I'm still learning about this.  It
>> appears that, although there is an interface provided in libmvec, there
>> isn't GCC code to generate calls to it.  I'd like to be proven wrong
>> about that.
> Your note below regarding the missing logic for sincos in ix86_veclibabi_svml
> is proof that you are correct. Isn't it?

I hope so. :-) 
>
>>> Compiled it with gcc version 8.3.1 using the command in the Example:
>>> gcc ./tst_sincos.c -O1 -ftree-loop-vectorize -ffast-math -lm -mavx
>>> Does not generate vectorized call. There is the following warning issued
>>> by gcc:
>>> ./tst_sincos.c: In function ‘main’:
>>> ./tst_sincos.c:15:5: warning: implicit declaration of function ‘sincos’ [-Wimplicit-function-declaration]
>>>      sincos (a[i], &b[i], &c[i]);
>>>      ^~~~~~
>>> ./tst_sincos.c:15:5: warning: incompatible implicit declaration of built-in function ‘sincos’
>>> ./tst_sincos.c:15:5: note: include ‘<math.h>’ or provide a declaration of ‘sincos’
>>> ./tst_sincos.c:2:1:
>>> +#include <math.h>
>>>
>>> ./tst_sincos.c:15:5:
>>>      sincos (a[i], &b[i], &c[i]);
>>>      ^~~~~~
>>>
>>> The same warning is issued even when not requesting vectorization:
>>> gcc ./tst_sincos.c -lm
>>> All the above was on an x86_64 system. Got the same warning on PPC64 POWER8.
>>> What is the issue with tst_sincos.c above? math.h is clearly included.
>> I agree that this is odd.  I have no explanation for you.
> Florian has given the reason for the warnings as being that it is required to build
> with _GNU_SOURCE for sincos.
>
>> You aren't going to get the vectorized function calls having removed the
>> #pragma omp simd, I think.
> GCC flags -O1 and -ftree-loop-vectorize enable vectorization of loops even in the absence
> of simd pragmas. Example 2 explicitly states that. The failure to vectorize is specific to
> how sincos is being handled. Which goes back to ix86_veclibabi_svml.

Ah, I'm sorry.  I missed that statement in example 2.  So yes, I think
it's all clear, then; GCC just won't generate this.
>
> Thanks.
> Bert.
>

* Re: PPC64 libmvec sincos/sincosf ABI
  2019-09-23 18:02       ` GT
@ 2019-09-24 16:43         ` Bill Schmidt
  0 siblings, 0 replies; 18+ messages in thread
From: Bill Schmidt @ 2019-09-24 16:43 UTC (permalink / raw)
  To: GT, libc-alpha@sourceware.org

Hi!  Please CC me directly as I don't read libc-alpha religiously every day.

On 9/23/19 1:02 PM, GT wrote:
>> Sure, I can work together with you on this.  I agree that a new
>> attribute is needed.  The term we use for this in our existing ELFv2 ABI
>> document is "homogeneous aggregates," so it would be good if the name of
>> the attribute could reflect that the interface returns a homogeneous
>> aggregate.  This is a bit of a mouthful, so may require some shortening.
> How about this for the attribute specification:
>
> __attribute__ ((__elfv2_aggregate_return__))
>
> It's rather long, but there already exist attribute names of similar length, like
> no_profile_instrument_function.


I like it.  Good choice.

>
>> As far as the new ABI document goes, I think we are looking to you to
>> complete the proposal of interfaces, attributes, and so forth so that
>> the document can be written.  I am the right person to work with on this.
> I plan on reusing and adapting GCC's implementation of function cos as much as
> possible. Nothing special about cos. Could just as well say reuse/adapt from
> function sin.
>
> Sincos differs from cos in that the scalar function has 2 extra input arguments;
> the pointers to locations in which to store the sine and cosine results. So:
>
> 1. Prior to GCC making the vectorized cos call, arguments from multiple scalar
> cosine calls are assembled into a single input vector argument to the vector cos
> function. I think this part of code can be used almost verbatim for sincos. The
> reason is that the first argument to sincos is passed by value and is in fact the
> exact same value that would be passed to scalar sin and cos separately.


Agreed.

>
> 2. On return from the vector cos call, GCC extracts scalar results from the returned
> vector output and assigns each to its respective scalar variable. Much of the code
> here can be reused as long as a few changes are made:
>
> i. When assembling the vector sincos call, each scalar call's 2nd and 3rd arguments
> must be saved so that results will later be written to those locations.
> ii. On return from the vector sincos, the code needs to account for the fact that scalar
> results go to locations given by pointers rather than to named variables for cos.
>
> Have I overlooked any significant issue?
>

I think that's fine from an ABI perspective. Implementation-wise, in the 
most common case we would expect the combined scalar calls' pointer 
arguments to be consecutive, in which case we can optimize to do vector 
stores from the aggregate return registers (v2, v3).  But we have to be 
prepared to distribute the scalars independently if necessary.
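
The caller-side steps can be sketched in portable C, under some loud
assumptions: `v2df`, `sketch_vec_sincos`, `sketch_write_back` and
`sketch_two_calls` are made-up names, the vector routine uses placeholder
arithmetic instead of real sine and cosine, and GCC's generic vector
extension stands in for the PPC64 vector types. It models two scalar calls
being combined into one vector call, with a single 16-byte store when the
saved pointers are consecutive and a lane-by-lane write otherwise.

```c
#include <assert.h>
#include <string.h>

typedef double v2df __attribute__ ((vector_size (16)));

/* Two result vectors returned by value.  */
struct sincosret
{
  v2df sinvals;
  v2df cosvals;
};

/* Stand-in for the vector routine: same shape, placeholder arithmetic.  */
static struct sincosret
sketch_vec_sincos (v2df x)
{
  struct sincosret r;
  r.sinvals = x + x;
  r.cosvals = x * x;
  return r;
}

/* Write one result vector back through the two saved scalar pointers.
   The consecutive case assumes both pointers index the same array.  */
static void
sketch_write_back (v2df vals, double *p0, double *p1)
{
  if (p1 == p0 + 1)
    memcpy (p0, &vals, sizeof vals);  /* consecutive: one 16-byte store */
  else
    {
      *p0 = vals[0];                  /* otherwise distribute per lane */
      *p1 = vals[1];
    }
}

/* Stands in for two scalar calls sincos (x[k], sp[k], cp[k]), k = 0, 1.  */
void
sketch_two_calls (const double x[2], double *sp[2], double *cp[2])
{
  assert (sp[0] && sp[1] && cp[0] && cp[1]);
  v2df in = { x[0], x[1] };                     /* step 1: gather inputs */
  struct sincosret r = sketch_vec_sincos (in);  /* one vector call */
  sketch_write_back (r.sinvals, sp[0], sp[1]);  /* step 2: results to memory */
  sketch_write_back (r.cosvals, cp[0], cp[1]);
}
```

A real vectorizer would make the consecutive-or-not decision at compile
time from the address expressions rather than with a run-time comparison.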

Thanks, Bert!

Bill

* Re: PPC64 libmvec sincos/sincosf ABI
  2019-09-20 20:25     ` Bill Schmidt
@ 2019-09-23 18:02       ` GT
  2019-09-24 16:43         ` Bill Schmidt
  0 siblings, 1 reply; 18+ messages in thread
From: GT @ 2019-09-23 18:02 UTC (permalink / raw)
  To: libc-alpha@sourceware.org

> Sure, I can work together with you on this.  I agree that a new
> attribute is needed.  The term we use for this in our existing ELFv2 ABI
> document is "homogeneous aggregates," so it would be good if the name of
> the attribute could reflect that the interface returns a homogeneous
> aggregate.  This is a bit of a mouthful, so may require some shortening.

How about this for the attribute specification:

__attribute__ ((__elfv2_aggregate_return__))

It's rather long, but there already exist attribute names of similar length, like
no_profile_instrument_function.

>
> As far as the new ABI document goes, I think we are looking to you to
> complete the proposal of interfaces, attributes, and so forth so that
> the document can be written.  I am the right person to work with on this.

I plan on reusing and adapting GCC's implementation of function cos as much as
possible. Nothing special about cos. Could just as well say reuse/adapt from
function sin.

Sincos differs from cos in that the scalar function has 2 extra input arguments;
the pointers to locations in which to store the sine and cosine results. So:

1. Prior to GCC making the vectorized cos call, arguments from multiple scalar
cosine calls are assembled into a single input vector argument to the vector cos
function. I think this part of code can be used almost verbatim for sincos. The
reason is that the first argument to sincos is passed by value and is in fact the
exact same value that would be passed to scalar sin and cos separately.

2. On return from the vector cos call, GCC extracts scalar results from the returned
vector output and assigns each to its respective scalar variable. Much of the code
here can be reused as long as a few changes are made:

i. When assembling the vector sincos call, each scalar call's 2nd and 3rd arguments
must be saved so that results will later be written to those locations.
ii. On return from the vector sincos, the code needs to account for the fact that scalar
results go to locations given by pointers rather than to named variables for cos.

Have I overlooked any significant issue?

* Re: PPC64 libmvec sincos/sincosf ABI
  2019-09-20 19:25   ` GT
@ 2019-09-20 20:25     ` Bill Schmidt
  2019-09-23 18:02       ` GT
  0 siblings, 1 reply; 18+ messages in thread
From: Bill Schmidt @ 2019-09-20 20:25 UTC (permalink / raw)
  To: GT, libc-alpha

On 9/20/19 2:25 PM, GT wrote:
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Thursday, August 8, 2019 11:25 AM, Bill Schmidt <wschmidt@linux.ibm.com> wrote:
>
>> Let me jump in here to answer a general question that I think Bert has
>> had for a while.
>> For the PPC64LE ABI, we should be returning everything through registers
>> wherever possible.  The ABI supports multiple return values of the same
>> type (up to 8 vector return values, for example), using the same
>> registers used for passing parameters.  For simplicity in this example,
>> I'll use the AltiVec-style types (vector double), but this works
>> identically if you use more generically defined vector types.
>> #include <altivec.h>
>> struct sincosret
>> {
>>      vector double sinvals;
>>      vector double cosvals;
>> };
>> struct sincosret
>> mysincos (vector double a)
>> {
>>      struct sincosret scr;
>>      scr.sinvals = a+a;  // May be slightly incorrect
>>      scr.cosvals = a*a;  // Ditto
>>      return scr;
>> }
>> This will result in the values being returned in VR2 and VR3:
>> xvmuldp 35,34,34
>> xvadddp 34,34,34
>> blr
>> This is preferable to returning values indirectly through memory, which
>> on older POWER processors can result in stalls from the store and load
>> being too close together and possibly executed out of order.  The cost
>> is pretty much negligible compared to the cost of computing sin/cos, but
>> we might as well do it the best way that the ABI provides.
> I believe we can now answer the issues that Joseph raised earlier in this thread.
> Those questions are here: https://sourceware.org/ml/libc-alpha/2019-08/msg00022.html
>
> The PowerPC64 double-precision vector sincos will have this as its prototype:
> struct sincosret _ZGVbN2v_sincos (vector double);
>
> The corresponding single-precision vector sincosf will have a prototype:
> struct sincosretf _ZGVbN4v_sincosf (vector float);
> -----------------------
>
> We also need a new attribute that will indicate when scalar sincos[f] in a loop can be vectorized using the newly redefined PowerPC64 vector sincos[f] functions. None of the existing attributes can be used since the technique used to return multiple values in registers is new AFAIU. So, Bill, are you the designer who can attest that what is agreed to here for the sincos API and ABI will be faithfully reflected in the ABI document?


Hi Bert,

Sure, I can work together with you on this.  I agree that a new 
attribute is needed.  The term we use for this in our existing ELFv2 ABI 
document is "homogeneous aggregates," so it would be good if the name of 
the attribute could reflect that the interface returns a homogeneous 
aggregate.  This is a bit of a mouthful, so may require some shortening.

As far as the new ABI document goes, I think we are looking to you to 
complete the proposal of interfaces, attributes, and so forth so that 
the document can be written.  I am the right person to work with on this.

Thanks!

Bill

>

* Re: PPC64 libmvec sincos/sincosf ABI
  2019-08-08 15:25 ` Bill Schmidt
  2019-08-08 18:48   ` Bill Schmidt
@ 2019-09-20 19:25   ` GT
  2019-09-20 20:25     ` Bill Schmidt
  1 sibling, 1 reply; 18+ messages in thread
From: GT @ 2019-09-20 19:25 UTC (permalink / raw)
  To: libc-alpha

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, August 8, 2019 11:25 AM, Bill Schmidt <wschmidt@linux.ibm.com> wrote:

>
> Let me jump in here to answer a general question that I think Bert has
> had for a while.
> For the PPC64LE ABI, we should be returning everything through registers
> wherever possible.  The ABI supports multiple return values of the same
> type (up to 8 vector return values, for example), using the same
> registers used for passing parameters.  For simplicity in this example,
> I'll use the AltiVec-style types (vector double), but this works
> identically if you use more generically defined vector types.
>
> #include <altivec.h>
>
> struct sincosret
> {
>     vector double sinvals;
>     vector double cosvals;
> };
>
> struct sincosret
> mysincos (vector double a)
> {
>     struct sincosret scr;
>     scr.sinvals = a+a;  // May be slightly incorrect
>     scr.cosvals = a*a;  // Ditto
>     return scr;
> }
>
> This will result in the values being returned in VR2 and VR3:
>
>     xvmuldp 35,34,34
>     xvadddp 34,34,34
>     blr
>
> This is preferable to returning values indirectly through memory, which
> on older POWER processors can result in stalls from the store and load
> being too close together and possibly executed out of order.  The cost
> is pretty much negligible compared to the cost of computing sin/cos, but
> we might as well do it the best way that the ABI provides.

I believe we can now answer the issues that Joseph raised earlier in this thread.
Those questions are here: https://sourceware.org/ml/libc-alpha/2019-08/msg00022.html

The PowerPC64 double-precision vector sincos will have this as its prototype:
struct sincosret _ZGVbN2v_sincos (vector double);

The corresponding single-precision vector sincosf will have a prototype:
struct sincosretf _ZGVbN4v_sincosf (vector float);
-----------------------
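
Neither struct is spelled out next to the prototypes above, and struct sincosretf has not been defined anywhere in this thread. A minimal sketch of the declarations follows, assuming sincosretf simply mirrors Bill's sincosret with float lanes. The typedef names v2df and v4sf are illustrative only, and GCC's generic vector_size types stand in for the AltiVec "vector double" and "vector float" so the fragment also compiles on non-POWER hosts.

```c
/* Sketch only: v2df/v4sf are illustrative names, and the layout of
   struct sincosretf is an assumption (it mirrors struct sincosret).  */
typedef double v2df __attribute__ ((vector_size (16)));  /* 2 x double */
typedef float  v4sf __attribute__ ((vector_size (16)));  /* 4 x float  */

struct sincosret
{
  v2df sinvals;
  v2df cosvals;
};

struct sincosretf
{
  v4sf sinvals;
  v4sf cosvals;
};

/* Proposed PowerPC64 entry points: one vector argument, both results
   returned by value.  Declarations only; nothing is defined here.  */
struct sincosret  _ZGVbN2v_sincos  (v2df);
struct sincosretf _ZGVbN4v_sincosf (v4sf);

/* Each struct is two 16-byte vectors and nothing else.  */
_Static_assert (sizeof (struct sincosret) == 32, "two 16-byte vectors");
_Static_assert (sizeof (struct sincosretf) == 32, "two 16-byte vectors");
```

In the mangled names, N2 and N4 denote the unmasked variants with 2 and 4 lanes, and the single v marks one vector argument, following the same convention as the x86_64 names quoted at the top of the thread.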

We also need a new attribute to indicate when a scalar sincos[f] call in a loop can be vectorized using the newly redefined PowerPC64 vector sincos[f] functions. As far as I understand, none of the existing attributes can be used, because returning multiple values in registers is a new technique.

So, Bill, are you the designer who can attest that what is agreed to here for the sincos API and ABI will be faithfully reflected in the ABI document?
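
To make the vectorization requirement concrete, here is a hand-written sketch of the loop transformation that such an attribute would have to license. Everything in it is hypothetical: vec_sincosf_stub stands in for the proposed _ZGVbN4v_sincosf, sincosf_stub stands in for scalar sincosf, and both use placeholder arithmetic (as in Bill's example) instead of real sine and cosine.

```c
typedef float v4sf __attribute__ ((vector_size (16)));

struct sincosretf
{
  v4sf sinvals;
  v4sf cosvals;
};

/* Stand-in for the proposed _ZGVbN4v_sincosf; the arithmetic is a
   placeholder, not a real sine or cosine.  */
static struct sincosretf
vec_sincosf_stub (v4sf x)
{
  struct sincosretf r;
  r.sinvals = x + x;
  r.cosvals = x * x;
  return r;
}

/* Scalar counterpart with the same placeholder arithmetic.  */
static void
sincosf_stub (float x, float *s, float *c)
{
  *s = x + x;
  *c = x * x;
}

/* What a vectorizer would have to produce from
     for (i = 0; i < n; i++) sincosf (x[i], &s[i], &c[i]);  */
void
sincosf_array (const float *x, float *s, float *c, int n)
{
  int i = 0;

  /* Main loop: four elements per call, results come back by value.  */
  for (; i + 4 <= n; i += 4)
    {
      v4sf vx = { x[i], x[i + 1], x[i + 2], x[i + 3] };
      struct sincosretf r = vec_sincosf_stub (vx);
      for (int k = 0; k < 4; k++)
        {
          s[i + k] = r.sinvals[k];
          c[i + k] = r.cosvals[k];
        }
    }

  /* Remainder loop: leftover elements use the scalar function.  */
  for (; i < n; i++)
    sincosf_stub (x[i], &s[i], &c[i]);
}
```

The point of the sketch is the mismatch in shape: the scalar call writes through two pointers, while the vector call returns a two-member struct, so the compiler needs to be told how the struct members map back onto the pointer arguments.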

* Re: PPC64 libmvec sincos/sincosf ABI
  2019-08-08 15:25 ` Bill Schmidt
@ 2019-08-08 18:48   ` Bill Schmidt
  2019-09-20 19:25   ` GT
  1 sibling, 0 replies; 18+ messages in thread
From: Bill Schmidt @ 2019-08-08 18:48 UTC (permalink / raw)
  To: Wilco Dijkstra, 'GNU C Library', tnggil, Joseph Myers
  Cc: nd, Tulio Magno Quites Machado Filho

On 8/8/19 10:25 AM, Bill Schmidt wrote:
> On 8/6/19 12:42 PM, Wilco Dijkstra wrote:
>> Hi,
>>
>>> 1. What is the best vector ABI (best performance) for sincos on PPC64?  
>>> That may be a function of the particular vector instructions available on 
>>> PPC64; the best choice of ABI on PPC64 need not correspond to the best 
>>> choice on x86_64.
>> I don't think it is related to the target - the fastest ABI is one that avoids
>> unnecessary work. For example scalar sincos is slow due to the inefficient
>> ABI which forces the results through memory (fixing that gives a 50% speedup). 
>>
>> Similarly for the vector ABI I think returning 2 vectors in registers will be the
>> fastest option in all cases. The actual vector instructions shouldn't affect the
>> ABI beyond the vector widths that can be supported.
>>
>> Wilco
>>
> Let me jump in here to answer a general question that I think Bert has
> had for a while.
>
> For the PPC64LE ABI, we should be returning everything through registers
> wherever possible.  The ABI supports multiple return values of the same
> type (up to 8 vector return values, for example), using the same
> registers used for passing parameters.  For simplicity in this example,
> I'll use the AltiVec-style types (vector double), but this works
> identically if you use more generically defined vector types.
>
> #include <altivec.h>
>
> struct sincosret
> {
>     vector double sinvals;
>     vector double cosvals;
> };
>
> struct sincosret
> mysincos (vector double a)
> {
>     struct sincosret scr;
>     scr.sinvals = a+a;  // May be slightly incorrect
>     scr.cosvals = a*a;  // Ditto
>     return scr;
> }
>
> This will result in the values being returned in VR2 and VR3:
>
>     xvmuldp 35,34,34
>     xvadddp 34,34,34
>     blr
>
> This is preferable to returning values indirectly through memory, which
> on older POWER processors can result in stalls from the store and load
> being too close together and possibly executed out of order.  The cost
> is pretty much negligible compared to the cost of computing sin/cos, but
> we might as well do it the best way that the ABI provides.

Important caveat to the above.  This is the ELFv2 ABI, used for
little-endian.  For the older ELFv1 ABI, the returned values will still
go through memory.

This doesn't restrict us from supporting ELFv1, but we just won't get
the benefit.
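
For code that needs to know which case it is in, GCC and Clang predefine _CALL_ELF on powerpc64 targets (1 for ELFv1, 2 for ELFv2). A small sketch of a compile-time check follows; the macro SINCOS_STRUCT_IN_VRS and the helper function are names made up for this illustration.

```c
/* _CALL_ELF is predefined by the compiler for powerpc64 targets:
   1 selects ELFv1, 2 selects ELFv2.  SINCOS_STRUCT_IN_VRS is a
   made-up name for this sketch.  */
#if defined (_CALL_ELF) && _CALL_ELF == 2
# define SINCOS_STRUCT_IN_VRS 1  /* two vectors come back in registers */
#else
# define SINCOS_STRUCT_IN_VRS 0  /* ELFv1, or not a PPC64 target at all */
#endif

/* Returns 1 only when built for the ELFv2 ABI.  */
int
sincos_struct_in_vrs (void)
{
  return SINCOS_STRUCT_IN_VRS;
}
```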

Thanks,
Bill
>
> Now, as I've said elsewhere, dealing with sincos in the -mveclibabi
> framework in GCC may be less than straightforward, due to the different
> description of the output types, but perhaps AArch64 has already laid
> some groundwork here.  I'm not up to date on the pending patches.
>
> Hope this helps,
> Bill
>

* Re: PPC64 libmvec sincos/sincosf ABI
  2019-08-06 17:42 Wilco Dijkstra
  2019-08-06 20:31 ` Joseph Myers
@ 2019-08-08 15:25 ` Bill Schmidt
  2019-08-08 18:48   ` Bill Schmidt
  2019-09-20 19:25   ` GT
  1 sibling, 2 replies; 18+ messages in thread
From: Bill Schmidt @ 2019-08-08 15:25 UTC (permalink / raw)
  To: Wilco Dijkstra, 'GNU C Library', tnggil, Joseph Myers; +Cc: nd

On 8/6/19 12:42 PM, Wilco Dijkstra wrote:
> Hi,
>
>> 1. What is the best vector ABI (best performance) for sincos on PPC64?  
>> That may be a function of the particular vector instructions available on 
>> PPC64; the best choice of ABI on PPC64 need not correspond to the best 
>> choice on x86_64.
> I don't think it is related to the target - the fastest ABI is one that avoids
> unnecessary work. For example scalar sincos is slow due to the inefficient
> ABI which forces the results through memory (fixing that gives a 50% speedup). 
>
> Similarly for the vector ABI I think returning 2 vectors in registers will be the
> fastest option in all cases. The actual vector instructions shouldn't affect the
> ABI beyond the vector widths that can be supported.
>
> Wilco
>
Let me jump in here to answer a general question that I think Bert has
had for a while.

For the PPC64LE ABI, we should be returning everything through registers
wherever possible.  The ABI supports multiple return values of the same
type (up to 8 vector return values, for example), using the same
registers used for passing parameters.  For simplicity in this example,
I'll use the AltiVec-style types (vector double), but this works
identically if you use more generically defined vector types.

#include <altivec.h>

struct sincosret
{
    vector double sinvals;
    vector double cosvals;
};

struct sincosret
mysincos (vector double a)
{
    struct sincosret scr;
    scr.sinvals = a+a;  // May be slightly incorrect
    scr.cosvals = a*a;  // Ditto
    return scr;
}

This will result in the values being returned in VR2 and VR3:

    xvmuldp 35,34,34
    xvadddp 34,34,34
    blr

This is preferable to returning values indirectly through memory, which
on older POWER processors can result in stalls from the store and load
being too close together and possibly executed out of order.  The cost
is pretty much negligible compared to the cost of computing sin/cos, but
we might as well do it the best way that the ABI provides.
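
For reference, the same example written with generically defined vector types, here GCC's vector_size attribute. The names v2df and mysincos_generic are illustrative, and the arithmetic is the same placeholder as above. On an ELFv2 target this would be expected to give the same register-return sequence, though that is worth confirming with -S.

```c
/* Generic GCC vector type: two doubles in 16 bytes.  */
typedef double v2df __attribute__ ((vector_size (16)));

struct sincosret
{
  v2df sinvals;
  v2df cosvals;
};

struct sincosret
mysincos_generic (v2df a)
{
  struct sincosret scr;
  scr.sinvals = a + a;  /* placeholder, as in the AltiVec version */
  scr.cosvals = a * a;  /* ditto */
  return scr;
}
```

Individual lanes of the result can be read with ordinary subscripting, for example scr.sinvals[0].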

Now, as I've said elsewhere, dealing with sincos in the -mveclibabi
framework in GCC may be less than straightforward, due to the different
description of the output types, but perhaps AArch64 has already laid
some groundwork here.  I'm not up to date on the pending patches.

Hope this helps,
Bill

* Re: PPC64 libmvec sincos/sincosf ABI
  2019-08-06 17:42 Wilco Dijkstra
@ 2019-08-06 20:31 ` Joseph Myers
  2019-08-08 15:25 ` Bill Schmidt
  1 sibling, 0 replies; 18+ messages in thread
From: Joseph Myers @ 2019-08-06 20:31 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: 'GNU C Library', tnggil, nd

On Tue, 6 Aug 2019, Wilco Dijkstra wrote:

> I don't think it is related to the target - the fastest ABI is one that avoids
> unnecessary work. For example scalar sincos is slow due to the inefficient
> ABI which forces the results through memory (fixing that gives a 50% speedup). 

At the scalar API level this is an argument for the previously discussed 
idea of adding cexpi functions to <complex.h> (as a GNU extension and 
potentially something proposed for standardization).

You'd then have the question of what the API and ABI should look like for 
declaring to the compiler that cexpi has certain corresponding vector 
functions available.
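
As an illustration of the shape such an interface could take, here is a sketch that returns cos(x) + i*sin(x) as a single complex value. No cexpi is declared in <complex.h> today, so cexpi_sketch is a hypothetical name, and toy_sin/toy_cos are short Taylor polynomials that are only reasonable for small |x|; they merely stand in for a real implementation.

```c
#include <complex.h>

/* Toy stand-ins, adequate only for small |x|.  */
static double
toy_sin (double x)
{
  double x2 = x * x;
  /* x - x^3/6 + x^5/120 - x^7/5040 */
  return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0 * (1.0 - x2 / 42.0)));
}

static double
toy_cos (double x)
{
  double x2 = x * x;
  /* 1 - x^2/2 + x^4/24 - x^6/720 */
  return 1.0 - x2 / 2.0 * (1.0 - x2 / 12.0 * (1.0 - x2 / 30.0));
}

/* Hypothetical interface: both results come back in one return value,
   real part = cosine, imaginary part = sine.  */
double complex
cexpi_sketch (double x)
{
  return toy_cos (x) + toy_sin (x) * I;
}
```

A caller would then pick the two results apart with creal and cimag instead of passing two pointers.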

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: PPC64 libmvec sincos/sincosf ABI
@ 2019-08-06 17:42 Wilco Dijkstra
  2019-08-06 20:31 ` Joseph Myers
  2019-08-08 15:25 ` Bill Schmidt
  0 siblings, 2 replies; 18+ messages in thread
From: Wilco Dijkstra @ 2019-08-06 17:42 UTC (permalink / raw)
  To: 'GNU C Library', tnggil, Joseph Myers; +Cc: nd

Hi,

> 1. What is the best vector ABI (best performance) for sincos on PPC64?  
> That may be a function of the particular vector instructions available on 
> PPC64; the best choice of ABI on PPC64 need not correspond to the best 
> choice on x86_64.

I don't think it is related to the target - the fastest ABI is one that avoids
unnecessary work. For example scalar sincos is slow due to the inefficient
ABI which forces the results through memory (fixing that gives a 50% speedup). 

Similarly for the vector ABI I think returning 2 vectors in registers will be the
fastest option in all cases. The actual vector instructions shouldn't affect the
ABI beyond the vector widths that can be supported.
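
A small sketch of the two scalar interface styles being contrasted. The function and struct names are made up, and the arithmetic is a placeholder, not a real sine or cosine. It shows only the difference in how results travel; it does not reproduce any speedup figure.

```c
/* Pointer-style interface, as in today's sincos: both results are
   stored to memory and the caller loads them back.  */
void
sincos_by_pointer (double x, double *s, double *c)
{
  *s = x + x;  /* placeholder arithmetic, not a real sine */
  *c = x * x;  /* placeholder arithmetic, not a real cosine */
}

/* Value-style interface: a struct of two doubles, which ABIs such as
   x86_64 SysV, AArch64 and PPC64 ELFv2 return in floating-point
   registers.  */
struct sincos_pair
{
  double s;
  double c;
};

struct sincos_pair
sincos_by_value (double x)
{
  struct sincos_pair r = { x + x, x * x };
  return r;
}
```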

Wilco

Thread overview: 18+ messages
2019-08-01 13:01 PPC64 libmvec sincos/sincosf ABI GT
2019-08-01 17:04 ` Joseph Myers
2019-08-07 21:17 ` Tulio Magno Quites Machado Filho
2019-08-08 13:34   ` Bill Schmidt
2019-08-08 15:48     ` GT
2019-08-08 15:56       ` Florian Weimer
2019-08-08 16:56         ` GT
2019-08-08 16:11       ` Bill Schmidt
2019-08-08 17:42         ` GT
2019-08-08 17:51           ` Bill Schmidt
2019-08-06 17:42 Wilco Dijkstra
2019-08-06 20:31 ` Joseph Myers
2019-08-08 15:25 ` Bill Schmidt
2019-08-08 18:48   ` Bill Schmidt
2019-09-20 19:25   ` GT
2019-09-20 20:25     ` Bill Schmidt
2019-09-23 18:02       ` GT
2019-09-24 16:43         ` Bill Schmidt