PPC64 libmvec implementation of sincos

public inbox for gcc@gcc.gnu.org

* PPC64 libmvec implementation of sincos
@ 2019-09-27 19:23 GT
  2019-09-30 13:53 ` Szabolcs Nagy
  0 siblings, 1 reply; 25+ messages in thread
From: GT @ 2019-09-27 19:23 UTC (permalink / raw)
  To: gcc; +Cc: Bill Schmidt

I am attempting to create a vector version of sincos for PPC64.
The relevant discussion thread is on the GLIBC libc-alpha mailing list.
Navigate it beginning at https://sourceware.org/ml/libc-alpha/2019-09/msg00334.html

The intention is to reuse as much as possible from the existing GCC implementation of other libmvec functions.
My questions are: which function(s) in GCC:

1. Gather scalar function input arguments, from multiple loop iterations, into a single vector input argument for the vector function version?
2. Distribute scalar function outputs, to appropriate loop iteration result, from the single vector function output result?

I am referring especially to vectorization of sin and cos.

Thanks.
Bert Tenjy.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PPC64 libmvec implementation of sincos
  2019-09-27 19:23 PPC64 libmvec implementation of sincos GT
@ 2019-09-30 13:53 ` Szabolcs Nagy
  2019-09-30 17:30   ` GT
  2019-09-30 18:09   ` Richard Biener
  0 siblings, 2 replies; 25+ messages in thread
From: Szabolcs Nagy @ 2019-09-30 13:53 UTC (permalink / raw)
  To: GT, gcc; +Cc: nd, Bill Schmidt

On 27/09/2019 20:23, GT wrote:
> I am attempting to create a vector version of sincos for PPC64.
> The relevant discussion thread is on the GLIBC libc-alpha mailing list.
> Navigate it beginning at https://sourceware.org/ml/libc-alpha/2019-09/msg00334.html
> 
> The intention is to reuse as much as possible from the existing GCC implementation of other libmvec functions.
> My questions are: Which function(s) in GCC;
> 
> 1. Gather scalar function input arguments, from multiple loop iterations, into a single vector input argument for the vector function version?
> 2. Distribute scalar function outputs, to appropriate loop iteration result, from the single vector function output result?
> 
> I am referring especially to vectorization of sin and cos.

i wonder if gcc can auto-vectorize scalar sincos
calls, the vectorizer seems to want the calls to
have no side-effect, but attribute pure or const
is not appropriate for sincos (which has no return
value but takes writable pointer args)

"#pragma omp simd" on a loop seems to work but i
could not get unannotated sincos loops to vectorize.

it seems it would be nice if we could add pure/const
somehow (maybe to the simd variant only? afaik openmp
requires no side effects for simd variants, but that's
probably only for explicitly marked loops?)

* Re: PPC64 libmvec implementation of sincos
  2019-09-30 13:53 ` Szabolcs Nagy
@ 2019-09-30 17:30   ` GT
  2019-09-30 18:07     ` Szabolcs Nagy
  2019-09-30 18:09   ` Richard Biener
  1 sibling, 1 reply; 25+ messages in thread
From: GT @ 2019-09-30 17:30 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: gcc, nd, Bill Schmidt

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, September 30, 2019 9:52 AM, Szabolcs Nagy <Szabolcs.Nagy@arm.com> wrote:

> On 27/09/2019 20:23, GT wrote:
>
> > I am attempting to create a vector version of sincos for PPC64.
> > The relevant discussion thread is on the GLIBC libc-alpha mailing list.
> > Navigate it beginning at https://sourceware.org/ml/libc-alpha/2019-09/msg00334.html
> > The intention is to reuse as much as possible from the existing GCC implementation of other libmvec functions.
> > My questions are: Which function(s) in GCC;
> >
> > 1.  Gather scalar function input arguments, from multiple loop iterations, into a single vector input argument for the vector function version?
> > 2.  Distribute scalar function outputs, to appropriate loop iteration result, from the single vector function output result?
> >
> > I am referring especially to vectorization of sin and cos.
>
> i wonder if gcc can auto-vectorize scalar sincos
> calls, the vectorizer seems to want the calls to
> have no side-effect, but attribute pure or const
> is not appropriate for sincos (which has no return
> value but takes writable pointer args)

1.  Do you mean whether x86_64 already does auto-vectorize sincos?
2.  Where in the code do you see the vectorizer require no side-effect?

> "#pragma omp simd" on a loop seems to work but i
> could not get unannotated sincos loops to vectorize.
>
> it seems it would be nice if we could add pure/const
> somehow (maybe to the simd variant only? afaik openmp
> requires no sideeffects for simd variants, but that's
> probably only for explicitly marked loops?)

1. Example 1 and Example 2 at https://sourceware.org/glibc/wiki/libmvec show the 2 different
ways to activate auto-vectorization. When you refer to "unannotated sincos", which of
the 2 techniques do you mean?
2. Which function was auto-vectorized by "pragma omp simd" in the loop?

* Re: PPC64 libmvec implementation of sincos
  2019-09-30 17:30   ` GT
@ 2019-09-30 18:07     ` Szabolcs Nagy
  0 siblings, 0 replies; 25+ messages in thread
From: Szabolcs Nagy @ 2019-09-30 18:07 UTC (permalink / raw)
  To: GT; +Cc: nd, gcc, Bill Schmidt

On 30/09/2019 18:30, GT wrote:
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Monday, September 30, 2019 9:52 AM, Szabolcs Nagy <Szabolcs.Nagy@arm.com> wrote:
> 
>> On 27/09/2019 20:23, GT wrote:
>>
>>> I am attempting to create a vector version of sincos for PPC64.
>>> The relevant discussion thread is on the GLIBC libc-alpha mailing list.
>>> Navigate it beginning at https://sourceware.org/ml/libc-alpha/2019-09/msg00334.html
>>> The intention is to reuse as much as possible from the existing GCC implementation of other libmvec functions.
>>> My questions are: Which function(s) in GCC;
>>>
>>> 1.  Gather scalar function input arguments, from multiple loop iterations, into a single vector input argument for the vector function version?
>>> 2.  Distribute scalar function outputs, to appropriate loop iteration result, from the single vector function output result?
>>>
>>> I am referring especially to vectorization of sin and cos.
>>
>> i wonder if gcc can auto-vectorize scalar sincos
>> calls, the vectorizer seems to want the calls to
>> have no side-effect, but attribute pure or const
>> is not appropriate for sincos (which has no return
>> value but takes writable pointer args)
> 
> 1.  Do you mean whether x86_64 already does auto-vectorize sincos?

any current target with simd attribute or omp declare simd support.

> 2.  Where in the code do you see the vectorizer require no side-effect?

i don't know where it is in the code, but

__attribute__((simd)) float foo (float);

void bar (float *restrict a, float *restrict b)
{
	for(int i=0; i<4000; i++)
		a[i] = foo (b[i]);
}

is not vectorized, however it gets vectorized if

i add __attribute__((const)) to foo
OR
if i add '#pragma omp simd' to the loop and compile with
-fopenmp-simd.

(which makes sense to me: you don't want to vectorize
if you don't know the side-effects, otoh, there is no
attribute to say that i know there will be no side-effects
in functions taking pointer arguments so i don't see
how sincos can get vectorized)
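
The two variants that do get vectorized can be sketched as follows. The
body given to foo is an assumption made only so the fragment is complete
(in the experiment above foo is an external declaration), and the names
bar_const and bar_pragma are hypothetical.

```c
#include <assert.h>

#define N 4000

/* Variant 1: promise that foo has no side effects and reads no global
   memory.  The definition is a stand-in for illustration only.  */
__attribute__ ((simd, const)) float
foo (float x)
{
  return 2.0f * x + 1.0f;
}

/* Plain loop; the const attribute on foo is what allows the call to be
   considered for vectorization with -ftree-vectorize alone.  */
void
bar_const (float *restrict a, const float *restrict b)
{
  assert (a != b);  /* restrict: the arrays must not alias */
  for (int i = 0; i < N; i++)
    a[i] = foo (b[i]);
}

/* Variant 2: annotate the loop instead; needs -fopenmp-simd.  */
void
bar_pragma (float *restrict a, const float *restrict b)
{
  assert (a != b);
#pragma omp simd
  for (int i = 0; i < N; i++)
    a[i] = foo (b[i]);
}
```

Neither variant carries over directly to sincos, because const is not a
valid promise for a function that writes through its pointer arguments.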

>> "#pragma omp simd" on a loop seems to work but i
>> could not get unannotated sincos loops to vectorize.
>>
>> it seems it would be nice if we could add pure/const
>> somehow (maybe to the simd variant only? afaik openmp
>> requires no sideeffects for simd variants, but that's
>> probably only for explicitly marked loops?)
> 
> 1. Example 1 and Example 2 at https://sourceware.org/glibc/wiki/libmvec show the 2 different
> ways to activate auto-vectorization. When you refer to "unannotated sincos", which of
> the 2 techniques do you mean?

example 1 annotates the loop with #pragma omp simd.
(and requires -fopenmp-simd cflag to work)

example 2 is my goal where -ftree-vectorize is enough
without annotation.

> 2. Which function was auto-vectorized by "pragma omp simd" in the loop?

see example above.

* Re: PPC64 libmvec implementation of sincos
  2019-09-30 13:53 ` Szabolcs Nagy
  2019-09-30 17:30   ` GT
@ 2019-09-30 18:09   ` Richard Biener
  2019-11-25 16:53     ` GT
  1 sibling, 1 reply; 25+ messages in thread
From: Richard Biener @ 2019-09-30 18:09 UTC (permalink / raw)
  To: gcc, Szabolcs Nagy, GT, gcc; +Cc: nd, Bill Schmidt

On September 30, 2019 3:52:52 PM GMT+02:00, Szabolcs Nagy <Szabolcs.Nagy@arm.com> wrote:
>On 27/09/2019 20:23, GT wrote:
>> I am attempting to create a vector version of sincos for PPC64.
>> The relevant discussion thread is on the GLIBC libc-alpha mailing
>list.
>> Navigate it beginning at
>https://sourceware.org/ml/libc-alpha/2019-09/msg00334.html
>> 
>> The intention is to reuse as much as possible from the existing GCC
>implementation of other libmvec functions.
>> My questions are: Which function(s) in GCC;
>> 
>> 1. Gather scalar function input arguments, from multiple loop
>iterations, into a single vector input argument for the vector function
>version?
>> 2. Distribute scalar function outputs, to appropriate loop iteration
>result, from the single vector function output result?
>> 
>> I am referring especially to vectorization of sin and cos.
>
>i wonder if gcc can auto-vectorize scalar sincos
>calls, the vectorizer seems to want the calls to
>have no side-effect, but attribute pure or const
>is not appropriate for sincos (which has no return
>value but takes writable pointer args)

We have __builtin_cexpi for that, but I am not sure whether any of the mechanisms can provide a mapping to a vectorized variant.

>"#pragma omp simd" on a loop seems to work but i
>could not get unannotated sincos loops to vectorize.
>
>it seems it would be nice if we could add pure/const
>somehow (maybe to the simd variant only? afaik openmp
>requires no sideeffects for simd variants, but that's
>probably only for explicitly marked loops?)

* Re: PPC64 libmvec implementation of sincos
  2019-09-30 18:09   ` Richard Biener
@ 2019-11-25 16:53     ` GT
  2019-11-27  8:19       ` Richard Biener
  0 siblings, 1 reply; 25+ messages in thread
From: GT @ 2019-11-25 16:53 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc, Szabolcs Nagy, nd, Bill Schmidt

> >
> > i wonder if gcc can auto-vectorize scalar sincos
> > calls, the vectorizer seems to want the calls to
> > have no side-effect, but attribute pure or const
> > is not appropriate for sincos (which has no return
> > value but takes writable pointer args)
>
> We have __builtin_cexpi for that but not sure if any of the mechanisms can provide a mapping to a vectorized variant.
>

1. Using flags -fopt-info-all and -fopt-info-internals, the failure to vectorize sincos
is reported as "unsupported data-type: complex double". The default GCC behavior is to
replace sincos calls with calls to __builtin_cexpi.

2. Using flags -fno-builtin-sincos and -fno-builtin-cexpi, the failure to vectorize
sincos is different. In this case, the failure to vectorize is due to "number of iterations
could not be computed". No calls to __builtin_cexpi; sincos calls retained.

Questions:
1. Should we aim to provide a vectorized version of __builtin_cexpi? If so, it would have
to be a PPC64-only vector __builtin_cexpi, right?

2. Or should we require that vectorized sincos be available only when -fno-builtin-sincos flag
is used in compilation?

I don't think we need to fix both types of vectorization failures in order to obtain sincos
vectorization.

Thanks.
Bert.

* Re: PPC64 libmvec implementation of sincos
  2019-11-25 16:53     ` GT
@ 2019-11-27  8:19       ` Richard Biener
  2019-12-04 20:53         ` GT
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Biener @ 2019-11-27  8:19 UTC (permalink / raw)
  To: GT; +Cc: gcc, Szabolcs Nagy, nd, Bill Schmidt

On Mon, Nov 25, 2019 at 5:53 PM GT <tnggil@protonmail.com> wrote:
>
> > >
> > > i wonder if gcc can auto-vectorize scalar sincos
> > > calls, the vectorizer seems to want the calls to
> > > have no side-effect, but attribute pure or const
> > > is not appropriate for sincos (which has no return
> > > value but takes writable pointer args)
> >
> > We have __builtin_cexpi for that but not sure if any of the mechanisms can provide a mapping to a vectorized variant.
> >
>
> 1. Using flags -fopt-info-all and -fopt-info-internals, the failure to vectorize sincos
> is reported as "unsupported data-type: complex double". The default GCC behavior is to
> replace sincos calls with calls to __builtin_cexpi.
>
> 2. Using flags -fno-builtin-sincos and -fno-builtin-cexpi, the failure to vectorize
> sincos is different. In this case, the failure to vectorize is due to "number of iterations
> could not be computed". No calls to __builtin_cexpi; sincos calls retained.
>
> Questions:
> 1. Should we aim to provide a vectorized version of __builtin_cexpi? If so, it would have
> to be a PPC64-only vector __builtin_cexpi, right?
>
> 2. Or should we require that vectorized sincos be available only when -fno-builtin-sincos flag
> is used in compilation?
>
> I don't think we need to fix both types of vectorization failures in order to obtain sincos
> vectorization.

I think we should have a vectorized cexpi, since that has a sane ABI.
The complex return type of cexpi makes it a little awkward for the
vectorizer, but handling this should be manageable.  It's a bit difficult
to expose complex types to the vectorizer since most cases are lowered
early.

Richard.

> Thanks.
> Bert.

* Re: PPC64 libmvec implementation of sincos
  2019-11-27  8:19       ` Richard Biener
@ 2019-12-04 20:53         ` GT
  2019-12-05  9:44           ` Richard Biener
  0 siblings, 1 reply; 25+ messages in thread
From: GT @ 2019-12-04 20:53 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc, Szabolcs Nagy, nd, Bill Schmidt, tnggil

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, November 27, 2019 3:19 AM, Richard Biener <richard.guenther@gmail.com> wrote:

...

> > Questions:
> >
> > 1.  Should we aim to provide a vectorized version of __builtin_cexpi? If so, it would have
> >     to be a PPC64-only vector __builtin_cexpi, right?
> >
> > 2.  Or should we require that vectorized sincos be available only when -fno-builtin-sincos flag
> >     is used in compilation?
> >
> >
> > I don't think we need to fix both types of vectorization failures in order to obtain sincos
> > vectorization.
>
> I think we should have a vectorized cexpi since that's having a sane
> ABI. The complex
> return type of cexpi makes it a little awkward for the vectorizer but
> handling this should
> be manageable. It's a bit difficult to expose complex types to the
> vectorizer since
> most cases are lowered early.
>

I'm trying to identify the source code which needs modification but I need help proceeding.

I am comparing two compilations: The first is a simple file with a call to sin in a loop.
Vectorization succeeds. The second is an almost identical file but with a call to sincos
in the loop. Vectorization fails.

In gdb, the earliest code location where the two compilations differ is in function
number_of_iterations_exit_assumptions in file tree-ssa-loop-niter.c. Line

op0 = gimple_cond_lhs (stmt);

returns a tree which when analyzed in function instantiate_scev_r (in file tree-scalar-evolution.c)
results in the first branch of the switch being taken for sincos. For sin, the 2nd branch of the
switch is taken.

How can I correlate stmt in the source line above to the relevant line in any dump among those created
using debugging dump option -fdump-tree-all?

Thanks.
Bert.

* Re: PPC64 libmvec implementation of sincos
  2019-12-04 20:53         ` GT
@ 2019-12-05  9:44           ` Richard Biener
  2019-12-05 17:46             ` GT
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Biener @ 2019-12-05  9:44 UTC (permalink / raw)
  To: GT; +Cc: gcc, Szabolcs Nagy, nd, Bill Schmidt

On Wed, Dec 4, 2019 at 9:53 PM GT <tnggil@protonmail.com> wrote:
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Wednesday, November 27, 2019 3:19 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> ...
>
> > > Questions:
> > >
> > > 1.  Should we aim to provide a vectorized version of __builtin_cexpi? If so, it would have
> > >     to be a PPC64-only vector __builtin_cexpi, right?
> > >
> > > 2.  Or should we require that vectorized sincos be available only when -fno-builtin-sincos flag
> > >     is used in compilation?
> > >
> > >
> > > I don't think we need to fix both types of vectorization failures in order to obtain sincos
> > > vectorization.
> >
> > I think we should have a vectorized cexpi since that's having a sane
> > ABI. The complex
> > return type of cexpi makes it a little awkward for the vectorizer but
> > handling this should
> > be manageable. It's a bit difficult to expose complex types to the
> > vectorizer since
> > most cases are lowered early.
> >
>
> I'm trying to identify the source code which needs modification but I need help proceeding.
>
> I am comparing two compilations: The first is a simple file with a call to sin in a loop.
> Vectorization succeeds. The second is an almost identical file but with a call to sincos
> in the loop. Vectorization fails.
>
> In gdb, the earliest code location where the two compilations differ is in function
> number_of_iterations_exit_assumptions in file tree-ssa-loop-niter.c. Line
>
> op0 = gimple_cond_lhs (stmt);
>
> returns a tree which when analyzed in function instantiate_scev_r (in file tree-scalar-evolution.c)
> results in the first branch of the switch being taken for sincos. For sin, the 2nd branch of the
> switch is taken.
>
> How can I correlate stmt in the source line above to the relevant line in any dump among those created
> using debugging dump option -fdump-tree-all?

grep ;)

Can you provide a testcase with a simd attribute annotated cexpi that
one can play with?

Richard.

>
> Thanks.
> Bert.

* Re: PPC64 libmvec implementation of sincos
  2019-12-05  9:44           ` Richard Biener
@ 2019-12-05 17:46             ` GT
  2019-12-06 10:48               ` Richard Biener
  0 siblings, 1 reply; 25+ messages in thread
From: GT @ 2019-12-05 17:46 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc, Szabolcs Nagy, nd, Bill Schmidt, tnggil


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, December 5, 2019 4:44 AM, Richard Biener <richard.guenther@gmail.com> wrote:

...
...
...

> >
> > I'm trying to identify the source code which needs modification but I need help proceeding.
> > I am comparing two compilations: The first is a simple file with a call to sin in a loop.
> > Vectorization succeeds. The second is an almost identical file but with a call to sincos
> > in the loop. Vectorization fails.
> > In gdb, the earliest code location where the two compilations differ is in function
> > number_of_iterations_exit_assumptions in file tree-ssa-loop-niter.c. Line
> > op0 = gimple_cond_lhs (stmt);
> > returns a tree which when analyzed in function instantiate_scev_r (in file tree-scalar-evolution.c)
> > results in the first branch of the switch being taken for sincos. For sin, the 2nd branch of the
> > switch is taken.
> > How can I correlate stmt in the source line above to the relevant line in any dump among those created
> > using debugging dump option -fdump-tree-all?
>
> grep ;)
>
> Can you provide a testcase with a simd attribute annotated cexpi that
> one can play with?
>

On an x86_64 system, run Example 2 at this link:

https://sourceware.org/glibc/wiki/libmvec

After verifying vectorization (by finding a name with prefix _ZGV and suffix _sin in a.out), replace
the call to sin by one to sincos. The file should be similar to this:

================

#include <math.h>

int N = 3200;
double c[3200];
double b[3200];
double a[3200];

int main (void)
{
  int i;

  for (i = 0; i < N; i += 1)
  {
    sincos (a[i], &b[i], &c[i]);
  }

  return (0);
}

================

In addition to the options shown in Example 2, I passed GCC flags -fopt-info-all, -fopt-info-internals and
-fdump-tree-all to obtain more verbose messages.

That should show vectorization failing for sincos, and diagnostics on the screen indicating reason(s) for
the failure.

Performing the runs on PPC64 requires building both GCC and GLIBC with modifications not yet accepted
into the main development branches of the projects.

Please let me know if you are able to run on x86_64; if not, then perhaps I can push the local GCC
changes to some github repository. GLIBC changes are available at branch tuliom/libmvec of the
development repository.

Bert.

* Re: PPC64 libmvec implementation of sincos
  2019-12-05 17:46             ` GT
@ 2019-12-06 10:48               ` Richard Biener
  2019-12-06 11:15                 ` Jakub Jelinek
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Biener @ 2019-12-06 10:48 UTC (permalink / raw)
  To: GT, Jakub Jelinek; +Cc: gcc, Szabolcs Nagy, nd, Bill Schmidt

On Thu, Dec 5, 2019 at 6:45 PM GT <tnggil@protonmail.com> wrote:
>
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Thursday, December 5, 2019 4:44 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> ...
> ...
> ...
>
> > >
> > > I'm trying to identify the source code which needs modification but I need help proceeding.
> > > I am comparing two compilations: The first is a simple file with a call to sin in a loop.
> > > Vectorization succeeds. The second is an almost identical file but with a call to sincos
> > > in the loop. Vectorization fails.
> > > In gdb, the earliest code location where the two compilations differ is in function
> > > number_of_iterations_exit_assumptions in file tree-ssa-loop-niter.c. Line
> > > op0 = gimple_cond_lhs (stmt);
> > > returns a tree which when analyzed in function instantiate_scev_r (in file tree-scalar-evolution.c)
> > > results in the first branch of the switch being taken for sincos. For sin, the 2nd branch of the
> > > switch is taken.
> > > How can I correlate stmt in the source line above to the relevant line in any dump among those created
> > > using debugging dump option -fdump-tree-all?
> >
> > grep ;)
> >
> > Can you provide a testcase with a simd attribute annotated cexpi that
> > one can play with?
> >
>
> On an x86_64 system, run Example 2 at this link:
>
> https://sourceware.org/glibc/wiki/libmvec
>
> After verifying vectorization (by finding a name with prefix _ZGV and suffix _sin in a.out), replace
> the call to sin by one to sincos. The file should be similar to this:
>
> ================
>
> #include <math.h>
>
> int N = 3200;
> double c[3200];
> double b[3200];
> double a[3200];
>
> int main (void)
> {
>   int i;
>
>   for (i = 0; i < N; i += 1)
>   {
>     sincos (a[i], &b[i], &c[i]);
>   }
>
>   return (0);
> }
>
> ================
>
> In addition to the options shown in Example 2, I passed GCC flags -fopt-info-all, -fopt-info-internals and
> -fdump-tree-all to obtain more verbose messages.
>
> That should show vectorization failing for sincos, and diagnostics on the screen indicating reason(s) for
> the failure.
>
> To perform the runs on PPC64 requires building both GCC and GLIBC with modifications not yet accepted
> into the main development branches of the projects.
>
> Please let me know if you are able to run on x86_64; if not, then perhaps I can push the local GCC
> changes to some github repository. GLIBC changes are available at branch tuliom/libmvec of the
> development repository.

So I used

void sincos(double x, double *sin, double *cos);
_Complex double __attribute__((__simd__("notinbranch")))
__builtin_cexpi (double);

int N = 3200;
double c[3200];
double b[3200];
double a[3200];

int main (void)
{
  int i;

  for (i = 0; i < N; i += 1)
  {
    sincos (a[i], &b[i], &c[i]);
  }

  return (0);
}

and get

t.c:2:58: warning: unsupported return type ‘complex double’ for simd

so I suppose that would need fixing / ABI adjustments.  Then vectorization
fails with the expected

t.c:13:3: note:   ==> examining statement: _8 = __builtin_cexpi (_1);
t.c:13:3: note:   get vectype for scalar type: complex double
t.c:15:5: missed:   not vectorized: unsupported data-type complex double
t.c:13:3: missed:  can't determine vectorization factor.

For the ABI thing the alternative is to go with "something" for sincos
and have the vectorizer query that something at cexpi vectorization
time, emitting code for that ABI.

But of course the vectorizer needs to be taught to deal with the cexpi
call in the IL, which was very low priority because there wasn't any
SIMD implementation of sincos (with whatever ABI).  I can help with
that to some extent, but I wonder what OpenMP says about _Complex
types and simd functions for those?  Jakub?

Richard.

> Bert.

* Re: PPC64 libmvec implementation of sincos
  2019-12-06 10:48               ` Richard Biener
@ 2019-12-06 11:15                 ` Jakub Jelinek
  2019-12-06 11:39                   ` Richard Biener
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Jelinek @ 2019-12-06 11:15 UTC (permalink / raw)
  To: Richard Biener; +Cc: GT, gcc, Szabolcs Nagy, nd, Bill Schmidt

On Fri, Dec 06, 2019 at 11:48:03AM +0100, Richard Biener wrote:
> So I used
> 
> void sincos(double x, double *sin, double *cos);
> _Complex double __attribute__((__simd__("notinbranch")))
> __builtin_cexpi (double);

While Intel-ABI-Vector-Function-2015-v0.9.8.pdf talks about complex numbers,
the reason we punt:
unsupported return type ‘complex double’ for simd
etc. is that we really don't support VECTOR_TYPE with COMPLEX_TYPE element
type, I guess the vectorizer doesn't do anything with that either unless
some earlier optimization was able to scalarize the complex halves.
In theory we could represent the vector counterparts of complex types
as just vectors of double width whose element type is that of the
COMPLEX_TYPE, have a look at what exactly ICC does to find out if the vector
ordering is real0 complex0 real1 complex1 ... or
real0 real1 real2 ... complex0 complex1 complex2 ...
and tweak everything that needs to cope.
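
The two candidate orderings can be made concrete with a small sketch
that packs the same complex values into a flat array of doubles both
ways. The function names and the lane count are hypothetical and say
nothing about what ICC or any ABI document actually specifies; "imag"
below corresponds to "complex" in the orderings above, and __real__ and
__imag__ are GNU C extensions.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Interleaved ordering: real0 imag0 real1 imag1 ...
   OUT must have room for 2 * LANES doubles.  */
void
pack_interleaved (const double complex *z, double *out, size_t lanes)
{
  assert (z != NULL && out != NULL);
  for (size_t i = 0; i < lanes; i++)
    {
      out[2 * i] = __real__ z[i];
      out[2 * i + 1] = __imag__ z[i];
    }
}

/* Split ordering: real0 real1 ... imag0 imag1 ...
   OUT must have room for 2 * LANES doubles.  */
void
pack_split (const double complex *z, double *out, size_t lanes)
{
  assert (z != NULL && out != NULL);
  for (size_t i = 0; i < lanes; i++)
    {
      out[i] = __real__ z[i];
      out[lanes + i] = __imag__ z[i];
    }
}
```

The interleaved form matches the in-memory layout of an array of
_Complex double, so it needs no shuffling on load or store; the split
form keeps all sines (or cosines) contiguous.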

	Jakub

* Re: PPC64 libmvec implementation of sincos
  2019-12-06 11:15                 ` Jakub Jelinek
@ 2019-12-06 11:39                   ` Richard Biener
  2019-12-06 16:50                     ` GT
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Biener @ 2019-12-06 11:39 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GT, gcc, Szabolcs Nagy, nd, Bill Schmidt

On Fri, Dec 6, 2019 at 12:15 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Fri, Dec 06, 2019 at 11:48:03AM +0100, Richard Biener wrote:
> > So I used
> >
> > void sincos(double x, double *sin, double *cos);
> > _Complex double __attribute__((__simd__("notinbranch")))
> > __builtin_cexpi (double);
>
> While Intel-ABI-Vector-Function-2015-v0.9.8.pdf talks about complex numbers,
> the reason we punt:
> unsupported return type ‘complex double’ for simd
> etc. is that we really don't support VECTOR_TYPE with COMPLEX_TYPE element
> type, I guess the vectorizer doesn't do anything with that either unless
> some earlier optimization was able to scalarize the complex halves.
> In theory we could represent the vector counterparts of complex types
> as just vectors of double width with element type of COMPLEX_TYPE element
> type, have a look at what exactly ICC does to find out if the vector
> ordering is real0 complex0 real1 complex1 ... or
> real0 real1 real2 ... complex0 complex1 complex2 ...
> and tweak everything that needs to cope.

I hope real0 complex0, ...

Anyway, the first step is to support vectorizing code where parts of it are
already vectors:

typedef double v2df __attribute__((vector_size(16)));
#define N 1024
v2df a[N];
double b[N];
double c[N];
void foo()
{
  for (int i = 0; i < N; ++i)
    {
      v2df tem = a[i];
      b[i] = tem[0];
      c[i] = tem[1];
    }
}

that can be "re-vectorized" for AVX for example.  If you substitute
_Complex double for the vector type we only handle it during
vectorization because forwprop combines the load and the
__real/imag which helps.
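
A minimal sketch of that substitution, using the GNU __real__ and
__imag__ operators in place of the vector subscripts. The function name
split_complex is hypothetical, and whether the loop is actually
vectorized depends on the compiler version and target.

```c
#include <assert.h>

#define N 1024

_Complex double a[N];
double b[N];
double c[N];

/* Same shape as foo above, with _Complex double replacing v2df.  */
void
split_complex (void)
{
  /* A complex double is laid out as two doubles: real part first,
     then imaginary part.  */
  assert (sizeof (a[0]) == 2 * sizeof (b[0]));

  for (int i = 0; i < N; ++i)
    {
      _Complex double tem = a[i];
      b[i] = __real__ tem;
      c[i] = __imag__ tem;
    }
}
```

Compiling with -O3 -fopt-info-vec is one way to see whether the loop was
vectorized.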

Richard.

>         Jakub
>

* Re: PPC64 libmvec implementation of sincos
  2019-12-06 11:39                   ` Richard Biener
@ 2019-12-06 16:50                     ` GT
  2019-12-06 17:43                       ` Richard Biener
  0 siblings, 1 reply; 25+ messages in thread
From: GT @ 2019-12-06 16:50 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jakub Jelinek, gcc, Szabolcs Nagy, nd, Bill Schmidt, tnggil

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, December 6, 2019 6:38 AM, Richard Biener <richard.guenther@gmail.com> wrote:

> On Fri, Dec 6, 2019 at 12:15 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> > On Fri, Dec 06, 2019 at 11:48:03AM +0100, Richard Biener wrote:
> >
> > > So I used
> > > void sincos(double x, double *sin, double *cos);
> > > _Complex double __attribute__((__simd__("notinbranch")))
> > > __builtin_cexpi (double);
> >
> > While Intel-ABI-Vector-Function-2015-v0.9.8.pdf talks about complex numbers,
> > the reason we punt:
> > unsupported return type ‘complex double’ for simd
> > etc. is that we really don't support VECTOR_TYPE with COMPLEX_TYPE element
> > type, I guess the vectorizer doesn't do anything with that either unless
> > some earlier optimization was able to scalarize the complex halves.
> > In theory we could represent the vector counterparts of complex types
> > as just vectors of double width with element type of COMPLEX_TYPE element
> > type, have a look at what exactly ICC does to find out if the vector
> > ordering is real0 complex0 real1 complex1 ... or
> > real0 real1 real2 ... complex0 complex1 complex2 ...
> > and tweak everything that needs to cope.
>
> I hope real0 complex0, ...
>
> Anyway, the first step is to support vectorizing code where parts of it are
> already vectors:
>
> typedef double v2df __attribute__((vector_size(16)));
> #define N 1024
> v2df a[N];
> double b[N];
> double c[N];
> void foo()
> {
>   for (int i = 0; i < N; ++i)
>     {
>       v2df tem = a[i];
>       b[i] = tem[0];
>       c[i] = tem[1];
>     }
> }
>
> that can be "re-vectorized" for AVX for example. If you substitute
> _Complex double for the vector type we only handle it during
> vectorization because forwprop combines the load and the
> __real/imag which helps.
>

Are we certain the change we want is to support _Complex double so that cexpi is auto-vectorized?
Looking at the resulting executable of the code with sincos in the loop, the only function called
is sincos, not __builtin_cexpi or any variant of cexpi. File gcc/builtins.c expands calls to __builtin_cexpi
to sincos! What is gained by the compiler going through the transformations sincos -> __builtin_cexpi ->
sincos?

Bert.

* Re: PPC64 libmvec implementation of sincos
  2019-12-06 16:50                     ` GT
@ 2019-12-06 17:43                       ` Richard Biener
  2019-12-08 21:40                         ` GT
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Biener @ 2019-12-06 17:43 UTC (permalink / raw)
  To: GT; +Cc: Jakub Jelinek, gcc, Szabolcs Nagy, nd, Bill Schmidt, tnggil

On December 6, 2019 5:50:25 PM GMT+01:00, GT <tnggil@protonmail.com> wrote:
>‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>On Friday, December 6, 2019 6:38 AM, Richard Biener
><richard.guenther@gmail.com> wrote:
>
>> On Fri, Dec 6, 2019 at 12:15 PM Jakub Jelinek <jakub@redhat.com> wrote:
>>
>> > On Fri, Dec 06, 2019 at 11:48:03AM +0100, Richard Biener wrote:
>> >
>> > > So I used
>> > > void sincos(double x, double *sin, double *cos);
>> > > _Complex double __attribute__((simd("notinbranch")))
>> > > __builtin_cexpi (double);
>> >
>> > While Intel-ABI-Vector-Function-2015-v0.9.8.pdf talks about complex
>numbers,
>> > the reason we punt:
>> > unsupported return type ‘complex double’ for simd
>> > etc. is that we really don't support VECTOR_TYPE with COMPLEX_TYPE
>element
>> > type, I guess the vectorizer doesn't do anything with that either
>unless
>> > some earlier optimization was able to scalarize the complex halves.
>> > In theory we could represent the vector counterparts of complex
>types
>> > as just vectors of double width with element type of COMPLEX_TYPE
>element
>> > type, have a look at what exactly ICC does to find out if the
>vector
>> > ordering is real0 complex0 real1 complex1 ... or
>> > real0 real1 real2 ... complex0 complex1 complex2 ...
>> > and tweak everything that needs to cope.
>>
>> I hope real0 complex0, ...
>>
>> Anyway, the first step is to support vectorizing code where parts of
>it are
>> already vectors:
>>
>> typedef double v2df __attribute__((vector_size(16)));
>> #define N 1024
>> v2df a[N];
>> double b[N];
>> double c[N];
>> void foo()
>> {
>>   for (int i = 0; i < N; ++i)
>>     {
>>       v2df tem = a[i];
>>       b[i] = tem[0];
>>       c[i] = tem[1];
>>     }
>> }
>>
>> that can be "re-vectorized" for AVX for example. If you substitute
>> _Complex double for the vector type we only handle it during
>> vectorization because forwprop combines the load and the
>> __real/imag which helps.
>>
>
>Are we certain the change we want is to support _Complex double so that
>cexpi is auto-vectorized?
>Looking at the resulting executable of the code with sincos in the
>loop, the only function called
>is sincos. Not __builtin_cexpi or any variant of cexpi. File
>gcc/builtins.c expands calls to __builtin_cexpi
>to sincos! What is gained by the compiler going through the
>transformations sincos -> builtin_cexpi ->
>sincos?

Yes, we want to support vectorizing cexpi because that is what the compiler will lower sincos to. The sincos API is painful to deal with due to the data dependences it introduces. Now, the vectorizer can of course emit calls to a vectorized sincos; it just needs to be able to deal with cexpi input IL.

Richard. 

>Bert.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PPC64 libmvec implementation of sincos
  2019-12-06 17:43                       ` Richard Biener
@ 2019-12-08 21:40                         ` GT
  2019-12-09  8:40                           ` Richard Biener
  0 siblings, 1 reply; 25+ messages in thread
From: GT @ 2019-12-08 21:40 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jakub Jelinek, gcc, Szabolcs Nagy, nd, Bill Schmidt, tnggil

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, December 6, 2019 12:43 PM, Richard Biener richard.guenther@gmail.com wrote:

...
...

> > Are we certain the change we want is to support _Complex double so that
> > cexpi is auto-vectorized?
> > Looking at the resulting executable of the code with sincos in the
> > loop, the only function called
> > is sincos. Not __builtin_cexpi or any variant of cexpi. File
> > gcc/builtins.c expands calls to __builtin_cexpi
> > to sincos! What is gained by the compiler going through the
> > transformations sincos -> builtin_cexpi ->
> > sincos?
>
> Yes, we want to support vectorizing cexpi because that is what the compiler will lower sincos to. The sincos API is painful to deal with due to the data dependences it introduces. Now, the vectorizer can of course emit calls to a vectorized sincos; it just needs to be able to deal with cexpi input IL.
>
> Richard.

I'm modifying the code trying to get complex double accepted as a valid type by the vectorizer.
This is the first time I'm dealing with GCC source so I ask for some patience.

Function mode_for_vector in gcc/stor-layout.c requires a new else-if for complex double. I cannot
seem to find a header file where MIN_MODE_VECTOR_FLOAT and similar macros are defined. I expect
a new MIN_MODE_COMPLEX_VECTOR_FLOAT to be defined in the same file as the existing similar macros.
How do I go about making this change?

Thanks.
Bert.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PPC64 libmvec implementation of sincos
  2019-12-08 21:40                         ` GT
@ 2019-12-09  8:40                           ` Richard Biener
  2019-12-09 17:36                             ` GT
                                               ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Richard Biener @ 2019-12-09  8:40 UTC (permalink / raw)
  To: GT; +Cc: Jakub Jelinek, gcc, Szabolcs Nagy, nd, Bill Schmidt

On Sun, Dec 8, 2019 at 10:40 PM GT <tnggil@protonmail.com> wrote:
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Friday, December 6, 2019 12:43 PM, Richard Biener richard.guenther@gmail.com wrote:
>
> ...
> ...
>
> > > Are we certain the change we want is to support _Complex double so that
> > > cexpi is auto-vectorized?
> > > Looking at the resulting executable of the code with sincos in the
> > > loop, the only function called
> > > is sincos. Not __builtin_cexpi or any variant of cexpi. File
> > > gcc/builtins.c expands calls to __builtin_cexpi
> > > to sincos! What is gained by the compiler going through the
> > > transformations sincos -> builtin_cexpi ->
> > > sincos?
> >
> > Yes, we want to support vectorizing cexpi because that is what the compiler will lower sincos to. The sincos API is painful to deal with due to the data dependences it introduces. Now, the vectorizer can of course emit calls to a vectorized sincos; it just needs to be able to deal with cexpi input IL.
> >
> > Richard.
>
> I'm modifying the code trying to get complex double accepted as a valid type by the vectorizer.
> This is the first time I'm dealing with GCC source so I ask for some patience.
>
> Function mode_for_vector in gcc/stor-layout.c requires a new else-if for complex double. I cannot
> seem to find a header file where MIN_MODE_VECTOR_FLOAT and similar macros are defined. I expect
> a new MIN_MODE_COMPLEX_VECTOR_FLOAT to be defined in the same file as the existing similar macros.
> How do I go about making this change?

You don't want to do it this way but map _Complex double to a vector
of 2 * n doubles instead.
Look into get_related_vectype_for_scalar_type where it already has
code to "change" the
scalar type into something that fits what we allow for vectors.
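
A minimal sketch of why the "2 * n doubles" view works at the source level, assuming nothing about GCC internals. C99 guarantees that a _Complex double has the same representation as an array of two doubles (real part first), so an array of n complex values is already laid out as real0 imag0 real1 imag1 ... The helper name split_complex is hypothetical.

```c
#include <complex.h>
#include <stddef.h>

/* View n complex doubles as 2*n plain doubles and de-interleave them.
   The cast relies on the C99 rule that a complex type has the same
   representation as a two-element array of its real type. */
void split_complex(const double complex *z, double *re, double *im, size_t n)
{
    const double *flat = (const double *) z;  /* re0, im0, re1, im1, ... */

    for (size_t i = 0; i < n; ++i) {
        re[i] = flat[2 * i];      /* even slots hold the real parts */
        im[i] = flat[2 * i + 1];  /* odd slots hold the imaginary parts */
    }
}
```

This only illustrates the memory layout; how the vectorizer should represent the same thing with vector types is the open question in this thread.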

Richard.

> Thanks.
> Bert.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PPC64 libmvec implementation of sincos
  2019-12-09  8:40                           ` Richard Biener
@ 2019-12-09 17:36                             ` GT
  2019-12-11 17:17                               ` GT
  2019-12-18 16:50                             ` GT
  2019-12-28 20:01                             ` GT
  2 siblings, 1 reply; 25+ messages in thread
From: GT @ 2019-12-09 17:36 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jakub Jelinek, gcc, Szabolcs Nagy, nd, Bill Schmidt, tnggil

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, December 9, 2019 3:39 AM, Richard Biener richard.guenther@gmail.com wrote:

> > I'm modifying the code trying to get complex double accepted as a valid type by the vectorizer.
> > This is the first time I'm dealing with GCC source so I ask for some patience.
> > Function mode_for_vector in gcc/stor-layout.c requires a new else-if for complex double. I cannot
> > seem to find a header file where MIN_MODE_VECTOR_FLOAT and similar macros are defined. I expect
> > a new MIN_MODE_COMPLEX_VECTOR_FLOAT to be defined in the same file as the existing similar macros.
> > How do I go about making this change?
>
> You don't want to do it this way but map _Complex double to a vector
> of 2 * n doubles instead.
> Look into get_related_vectype_for_scalar_type where it already has
> code to "change" the
> scalar type into something that fits what we allow for vectors.
>

Function get_related_vectype_for_scalar_type doesn't exist. There is one named
get_vectype_for_scalar_type, which in turn calls get_vectype_for_scalar_type_and_size. In that
last function I already have 2 changes to prevent NULL_TREE being returned for _Complex double.

1.  In the first if statement of the function, added new condition !is_complex_float_mode (...),
    with arguments identical to those of the existing !is_int_mode and !is_float_mode conditions.

2.  In the 2nd if statement, the else-if has a new condition !COMPLEX_FLOAT_TYPE_P (scalar_type)

    After those changes, NULL_TREE is returned by a clause of the if statement whose first condition
    is if (known_eq (size, 0U)). The 2nd part of the else-if returns true for !mode_for_vector (...).

    Unless the correct path should involve a call similar to build_nonstandard_integer_type in the
    2nd if statement, I still end up requiring the change to mode_for_vector as in my last post.

    Bert.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PPC64 libmvec implementation of sincos
  2019-12-09 17:36                             ` GT
@ 2019-12-11 17:17                               ` GT
  0 siblings, 0 replies; 25+ messages in thread
From: GT @ 2019-12-11 17:17 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jakub Jelinek, gcc, Szabolcs Nagy, nd, Bill Schmidt, tnggil

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, December 9, 2019 12:36 PM, GT <tnggil@protonmail.com> wrote:

> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Monday, December 9, 2019 3:39 AM, Richard Biener richard.guenther@gmail.com wrote:
>
> > > I'm modifying the code trying to get complex double accepted as a valid type by the vectorizer.
> > > This is the first time I'm dealing with GCC source so I ask for some patience.
> > > Function mode_for_vector in gcc/stor-layout.c requires a new else-if for complex double. I cannot
> > > seem to find a header file where MIN_MODE_VECTOR_FLOAT and similar macros are defined. I expect
> > > a new MIN_MODE_COMPLEX_VECTOR_FLOAT to be defined in the same file as the existing similar macros.
> > > How do I go about making this change?
> >
> > You don't want to do it this way but map _Complex double to a vector
> > of 2 * n doubles instead.
> > Look into get_related_vectype_for_scalar_type where it already has
> > code to "change" the
> > scalar type into something that fits what we allow for vectors.
>
> Function get_related_vectype_for_scalar_type doesn't exist. There is one named
> get_vectype_for_scalar_type, which in turn calls get_vectype_for_scalar_type_and_size. In that
> last function I already have 2 changes to prevent NULL_TREE being returned for _Complex double.
>
> 1.  In the first if statement of the function, added new condition !is_complex_float_mode (...),
>     with arguments identical to those of the existing !is_int_mode and !is_float_mode conditions.
>
> 2.  In the 2nd if statement, the else-if has a new condition !COMPLEX_FLOAT_TYPE_P (scalar_type)
>
>     After those changes, NULL_TREE is returned by a clause of the if statement whose first condition
>     is if (known_eq (size, 0U)). The 2nd part of the else-if returns true for !mode_for_vector (...).
>
>     Unless the correct path should involve a call similar to build_nonstandard_integer_type in the
>     2nd if statement, I still end up requiring the change to mode_for_vector as in my last post.
>
>     Bert.
>

Please disregard the most recent post. I was using a repository that was outdated. After an update
I see function get_related_vectype_for_scalar_type in the code.

Bert.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PPC64 libmvec implementation of sincos
  2019-12-09  8:40                           ` Richard Biener
  2019-12-09 17:36                             ` GT
@ 2019-12-18 16:50                             ` GT
  2019-12-28 20:01                             ` GT
  2 siblings, 0 replies; 25+ messages in thread
From: GT @ 2019-12-18 16:50 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jakub Jelinek, gcc, Szabolcs Nagy, nd, Bill Schmidt, tnggil

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, December 9, 2019 3:39 AM, Richard Biener <richard.guenther@gmail.com> wrote:

> You don't want to do it this way but map _Complex double to a vector
> of 2 * n doubles instead.
> Look into get_related_vectype_for_scalar_type where it already has
> code to "change" the
> scalar type into something that fits what we allow for vectors.
>
> Richard.

I'm a little farther along. I'm comparing a compilation that calls sine in a loop, and
is successfully vectorized, to a compilation calling sincos in a loop and for which
vectorization fails.

The earliest difference between the two now is that vectorizing sine returns true
in the call to vectorizable_simd_clone_call but is false for vectorizing sincos.

My understanding is that sine is known to be vectorizable because there is a vector sine
prototype in GLIBC's math-vector.h. In which director(y|ies) do declaration and definition
of a vector version of built-in cexpi go?

Bert.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PPC64 libmvec implementation of sincos
  2019-12-09  8:40                           ` Richard Biener
  2019-12-09 17:36                             ` GT
  2019-12-18 16:50                             ` GT
@ 2019-12-28 20:01                             ` GT
  2020-01-09 13:43                               ` Richard Biener
  2 siblings, 1 reply; 25+ messages in thread
From: GT @ 2019-12-28 20:01 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jakub Jelinek, gcc, Szabolcs Nagy, nd, Bill Schmidt, tnggil

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, December 9, 2019 3:39 AM, Richard Biener <richard.guenther@gmail.com> wrote:

> > I'm modifying the code trying to get complex double accepted as a valid type by the vectorizer.
> > This is the first time I'm dealing with GCC source so I ask for some patience.
> > Function mode_for_vector in gcc/stor-layout.c requires a new else-if for complex double. I cannot
> > seem to find a header file where MIN_MODE_VECTOR_FLOAT and similar macros are defined. I expect
> > a new MIN_MODE_COMPLEX_VECTOR_FLOAT to be defined in the same file as the existing similar macros.
> > How do I go about making this change?
>
> You don't want to do it this way but map _Complex double to a vector
> of 2 * n doubles instead.
> Look into get_related_vectype_for_scalar_type where it already has
> code to "change" the
> scalar type into something that fits what we allow for vectors.
>

I need more guidance on how to proceed here. Function mode_for_vector is called by
build_vector_type_for_mode. The latter is called by get_related_vectype_for_scalar_type,
which you suggested to study for a preferred implementation. The code in mode_for_vector
relies on the relationship that vector mode E_V2DFmode is composed of E_DFmode scalars.
The complex argument has mode E_DCmode. Where should I create a relationship between
E_V2DFmode and E_DCmode so that mode_for_vector returns the appropriate vector mode?
The solution has to be general so that single precision float complex vectorization is
also supported.

Bert.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PPC64 libmvec implementation of sincos
  2019-12-28 20:01                             ` GT
@ 2020-01-09 13:43                               ` Richard Biener
  2020-01-16 11:40                                 ` GT
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Biener @ 2020-01-09 13:43 UTC (permalink / raw)
  To: GT; +Cc: Jakub Jelinek, gcc, Szabolcs Nagy, nd, Bill Schmidt

On Sat, Dec 28, 2019 at 9:01 PM GT <tnggil@protonmail.com> wrote:
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Monday, December 9, 2019 3:39 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> > > I'm modifying the code trying to get complex double accepted as a valid type by the vectorizer.
> > > This is the first time I'm dealing with GCC source so I ask for some patience.
> > > Function mode_for_vector in gcc/stor-layout.c requires a new else-if for complex double. I cannot
> > > seem to find a header file where MIN_MODE_VECTOR_FLOAT and similar macros are defined. I expect
> > > a new MIN_MODE_COMPLEX_VECTOR_FLOAT to be defined in the same file as the existing similar macros.
> > > How do I go about making this change?
> >
> > You don't want to do it this way but map _Complex double to a vector
> > of 2 * n doubles instead.
> > Look into get_related_vectype_for_scalar_type where it already has
> > code to "change" the
> > scalar type into something that fits what we allow for vectors.
> >
>
> I need more guidance on how to proceed here. Function mode_for_vector is called by
> build_vector_type_for_mode. The latter is called by get_related_vectype_for_scalar_type,
> which you suggested to study for a preferred implementation. The code in mode_for_vector
> relies on the relationship that vector mode E_V2DFmode is composed of E_DFmode scalars.
> The complex argument has mode E_DCmode. Where should I create a relationship between
> E_V2DFmode and E_DCmode so that mode_for_vector returns the appropriate vector mode?
> The solution has to be general so that single precision float complex vectorization is
> also supported.

For _Complex types you call mode_for_vector with the component type
but twice the number
of elements.  I would not suggest to try making a vector type with
complex components.

As for the other question for testing you probably want to provide a
OMP simd declaration
of a function like

_Complex double mycexpi (double);

and make a testcase like

void foo (_Complex double * __restrict out, double *in)
{
  for (int i = 0; i < 1024; ++i)
    {
       out[i] = mycexpi (in[i]);
    }
}

or eventually with two output arrays and explicit __real/__imag
processing.  The real
and main question is how is the OMP SIMD declaration of mycexpi looking like?

So I'd completely side-step sincos() and GCC's sincos() ->
__builtin_cexpi transform
and concentrate on OMP SIMD of a function with the signature we need to handle.

Richard.

> Bert.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PPC64 libmvec implementation of sincos
  2020-01-09 13:43                               ` Richard Biener
@ 2020-01-16 11:40                                 ` GT
  2020-01-17  8:17                                   ` GT
  0 siblings, 1 reply; 25+ messages in thread
From: GT @ 2020-01-16 11:40 UTC (permalink / raw)
  To: Richard Biener, tnggil
  Cc: Jakub Jelinek, gcc, Szabolcs Nagy, nd, Bill Schmidt

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, January 9, 2020 8:42 AM, Richard Biener <richard.guenther@gmail.com> wrote:

>
> As for the other question for testing you probably want to provide a
> OMP simd declaration
> of a function like
>
> _Complex double mycexpi (double);
>
> and make a testcase like
>
> void foo (_Complex double * __restrict out, double *in)
> {
> for (int i = 0; i < 1024; ++i)
> {
> out[i] = mycexpi (in[i]);
> }
> }
>
> or eventually with two output arrays and explicit __real/__imag
> processing. The real
> and main question is how is the OMP SIMD declaration of mycexpi looking like?
>
> So I'd completely side-step sincos() and GCC's sincos() ->
> __builtin_cexpi transform
> and concentrate on OMP SIMD of a function with the signature we need to handle.
>
> Richard.

I think what is required here is to attach either #pragma omp declare simd or __attribute__ ((simd))
to the declaration of builtin cexpi. In gcc/builtins.def, some attributes are provided during
creation of cexpi (line 656, call containing BUILT_IN_CEXPI). Attaching the simd attributes to
function declarations is how sin, cos, and the other math functions were handled in math-vector.h
glibc header file.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PPC64 libmvec implementation of sincos
  2020-01-16 11:40                                 ` GT
@ 2020-01-17  8:17                                   ` GT
  2020-01-17 16:37                                     ` Richard Biener
  0 siblings, 1 reply; 25+ messages in thread
From: GT @ 2020-01-17  8:17 UTC (permalink / raw)
  To: Richard Biener, tnggil
  Cc: Jakub Jelinek, gcc, Szabolcs Nagy, nd, Bill Schmidt


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, January 15, 2020 3:20 PM, GT <tnggil@protonmail.com> wrote:

> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Thursday, January 9, 2020 8:42 AM, Richard Biener richard.guenther@gmail.com wrote:
>
> > As for the other question for testing you probably want to provide a
> > OMP simd declaration
> > of a function like
> > _Complex double mycexpi (double);
> > and make a testcase like
> > void foo (_Complex double * __restrict out, double *in)
> > {
> > for (int i = 0; i < 1024; ++i)
> > {
> > out[i] = mycexpi (in[i]);
> > }
> > }
> > or eventually with two output arrays and explicit __real/__imag
> > processing. The real
> > and main question is how is the OMP SIMD declaration of mycexpi looking like?
> > So I'd completely side-step sincos() and GCC's sincos() ->
> > __builtin_cexpi transform
> > and concentrate on OMP SIMD of a function with the signature we need to handle.
> > Richard.
>
> I think what is required here is to attach either #pragma omp declare simd or __attribute__ ((simd))
> to the declaration of builtin cexpi. In gcc/builtins.def, some attributes are provided during
> creation of cexpi (line 656, call containing BUILT_IN_CEXPI). Attaching the simd attributes to
> function declarations is how sin, cos, and the other math functions were handled in math-vector.h
> glibc header file.

You probably intended that we first teach GCC how to vectorize any function which returns a
_Complex double and has a single parameter of type double. When that's done, move on to solving
the specific vectorization of __builtin_cexpi. Right?

Bert.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PPC64 libmvec implementation of sincos
  2020-01-17  8:17                                   ` GT
@ 2020-01-17 16:37                                     ` Richard Biener
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Biener @ 2020-01-17 16:37 UTC (permalink / raw)
  To: GT; +Cc: Jakub Jelinek, gcc, Szabolcs Nagy, nd, Bill Schmidt

On Thu, Jan 16, 2020 at 4:54 PM GT <tnggil@protonmail.com> wrote:
>
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Wednesday, January 15, 2020 3:20 PM, GT <tnggil@protonmail.com> wrote:
>
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Thursday, January 9, 2020 8:42 AM, Richard Biener richard.guenther@gmail.com wrote:
> >
> > > As for the other question for testing you probably want to provide a
> > > OMP simd declaration
> > > of a function like
> > > _Complex double mycexpi (double);
> > > and make a testcase like
> > > void foo (_Complex double * __restrict out, double *in)
> > > {
> > > for (int i = 0; i < 1024; ++i)
> > > {
> > > out[i] = mycexpi (in[i]);
> > > }
> > > }
> > > or eventually with two output arrays and explicit __real/__imag
> > > processing. The real
> > > and main question is how is the OMP SIMD declaration of mycexpi looking like?
> > > So I'd completely side-step sincos() and GCC's sincos() ->
> > > __builtin_cexpi transform
> > > and concentrate on OMP SIMD of a function with the signature we need to handle.
> > > Richard.
> >
> > I think what is required here is to attach either #pragma omp declare simd or __attribute__ ((simd))
> > to the declaration of builtin cexpi. In gcc/builtins.def, some attributes are provided during
> > creation of cexpi (line 656, call containing BUILT_IN_CEXPI). Attaching the simd attributes to
> > function declarations is how sin, cos, and the other math functions were handled in math-vector.h
> > glibc header file.
>
> You probably intended that we first teach GCC how to vectorize any function which returns a
> _Complex double and has a single parameter of type double. When that's done, move on to solving
> the specific vectorization of __builtin_cexpi. Right?

Yes.

Richard.

> Bert.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2020-01-17  8:19 UTC | newest]

Thread overview: 25+ messages
2019-09-27 19:23 PPC64 libmvec implementation of sincos GT
2019-09-30 13:53 ` Szabolcs Nagy
2019-09-30 17:30   ` GT
2019-09-30 18:07     ` Szabolcs Nagy
2019-09-30 18:09   ` Richard Biener
2019-11-25 16:53     ` GT
2019-11-27  8:19       ` Richard Biener
2019-12-04 20:53         ` GT
2019-12-05  9:44           ` Richard Biener
2019-12-05 17:46             ` GT
2019-12-06 10:48               ` Richard Biener
2019-12-06 11:15                 ` Jakub Jelinek
2019-12-06 11:39                   ` Richard Biener
2019-12-06 16:50                     ` GT
2019-12-06 17:43                       ` Richard Biener
2019-12-08 21:40                         ` GT
2019-12-09  8:40                           ` Richard Biener
2019-12-09 17:36                             ` GT
2019-12-11 17:17                               ` GT
2019-12-18 16:50                             ` GT
2019-12-28 20:01                             ` GT
2020-01-09 13:43                               ` Richard Biener
2020-01-16 11:40                                 ` GT
2020-01-17  8:17                                   ` GT
2020-01-17 16:37                                     ` Richard Biener
