* [Bug tree-optimization/40770] Vectorization of complex types, vectorization of sincos missing
2009-07-16 9:42 [Bug tree-optimization/40770] New: Vectorization of complex types, vectorization of sincos missing rguenth at gcc dot gnu dot org
@ 2009-07-16 9:43 ` ubizjak at gmail dot com
2009-07-16 9:44 ` rguenth at gcc dot gnu dot org
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: ubizjak at gmail dot com @ 2009-07-16 9:43 UTC
To: gcc-bugs
--
ubizjak at gmail dot com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-07-16 09:43:14
               date|                            |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40770
* [Bug tree-optimization/40770] Vectorization of complex types, vectorization of sincos missing
From: rguenth at gcc dot gnu dot org @ 2009-07-16 9:44 UTC
To: gcc-bugs
------- Comment #1 from rguenth at gcc dot gnu dot org 2009-07-16 09:44 -------
The middle-end presents the vectorizer with
<bb 3>:
# i_13 = PHI <i_7(4), 0(2)>
# ivtmp.26_8 = PHI <ivtmp.26_16(4), 1024(2)>
D.1623_3 = xd[i_13];
sincostmp.21_1 = __builtin_cexpi (D.1623_3);
D.1624_4 = IMAGPART_EXPR <sincostmp.21_1>;
sd[i_13] = D.1624_4;
D.1625_6 = REALPART_EXPR <sincostmp.21_1>;
cd[i_13] = D.1625_6;
i_7 = i_13 + 1;
ivtmp.26_16 = ivtmp.26_8 - 1;
if (ivtmp.26_16 != 0)
goto <bb 4>;
else
goto <bb 5>;
which, first of all, has complex types (they should be recognized as V2DF
with vectorization factor 1, and thus be SLP-able).
For the float case
<bb 3>:
# i_13 = PHI <i_7(4), 0(2)>
# ivtmp.6_8 = PHI <ivtmp.6_16(4), 1024(2)>
D.1610_3 = xf[i_13];
sincostmp.1_1 = __builtin_cexpif (D.1610_3);
D.1611_4 = IMAGPART_EXPR <sincostmp.1_1>;
sf[i_13] = D.1611_4;
D.1612_6 = REALPART_EXPR <sincostmp.1_1>;
cf[i_13] = D.1612_6;
i_7 = i_13 + 1;
ivtmp.6_16 = ivtmp.6_8 - 1;
if (ivtmp.6_16 != 0)
goto <bb 4>;
else
goto <bb 5>;
they should be V2SF, so use V4SF and a vectorization factor of 2. Probably
still use SLP.
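
As an illustration of why a complex double lines up with a two-lane vector of doubles (the V2DF mentioned above): C99 gives a complex type the same representation as a two-element array of its real type, with the real part first. The sketch below only demonstrates that layout. The helper names `make_complex` and `complex_as_lanes` are made up for this note and are not part of GCC.

```c
#include <assert.h>
#include <string.h>

/* Build a complex double from (re, im) by writing its two components
   directly, relying on the C99 layout guarantee: element 0 is the real
   part, element 1 the imaginary part.  Hypothetical helper. */
double _Complex make_complex(double re, double im)
{
    double parts[2] = { re, im };
    double _Complex z;
    assert(sizeof z == sizeof parts);
    memcpy(&z, parts, sizeof z);
    return z;
}

/* View a complex double as two lanes of double, the way a V2DF would
   hold it: lanes[0] = real part, lanes[1] = imaginary part. */
void complex_as_lanes(const double _Complex *z, double lanes[2])
{
    memcpy(lanes, z, sizeof *z);
}
```

The same reasoning applies to complex float, which occupies two float lanes, so two such values fill the four lanes of a V4SF.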
* [Bug tree-optimization/40770] Vectorization of complex types, vectorization of sincos missing
From: irar at il dot ibm dot com @ 2009-07-16 12:29 UTC
To: gcc-bugs
------- Comment #2 from irar at il dot ibm dot com 2009-07-16 12:29 -------
pr40770.c:20: note: ==> examining statement: sincostmp.21_1 = __builtin_cexpi
(D.1625_3);
pr40770.c:20: note: get vectype for scalar type: complex double
pr40770.c:20: note: not vectorized: unsupported data-type complex double
make_vector_type returns NULL for this type.
* [Bug tree-optimization/40770] Vectorization of complex types, vectorization of sincos missing
From: rguenther at suse dot de @ 2009-07-16 13:05 UTC
To: gcc-bugs
------- Comment #3 from rguenther at suse dot de 2009-07-16 13:05 -------
Subject: Re: Vectorization of complex types,
vectorization of sincos missing
On Thu, 16 Jul 2009, irar at il dot ibm dot com wrote:
> ------- Comment #2 from irar at il dot ibm dot com 2009-07-16 12:29 -------
> pr40770.c:20: note: ==> examining statement: sincostmp.21_1 = __builtin_cexpi
> (D.1625_3);
> pr40770.c:20: note: get vectype for scalar type: complex double
> pr40770.c:20: note: not vectorized: unsupported data-type complex double
>
> make_vector_type returns NULL for this type.
Yes - there is no vector type for complex double. But the vectorizer
could query for a vector type for the complex component type (double)
and divide the vector element count by 2 (for complex) to get the
vectorization factor, which would be 1 here. Should SLP then be possible
for that loop?
Richard.
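
The calculation suggested in this comment can be written out as a small helper. This is a sketch only: `complex_vectorization_factor` is a hypothetical name, not a vectorizer function, and the 16-byte vector size used in the note below is an assumption matching the V2DF and V4SF modes discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a GCC internal: how many complex values fit in
   one vector of vector_bytes bytes when each complex value is made of two
   components of component_bytes bytes each. */
unsigned complex_vectorization_factor(size_t vector_bytes,
                                      size_t component_bytes)
{
    assert(component_bytes > 0);
    assert(vector_bytes % (2 * component_bytes) == 0);

    /* Lanes of the component type, e.g. 2 for V2DF or 4 for V4SF. */
    size_t lanes = vector_bytes / component_bytes;

    /* Each complex value consumes two lanes. */
    return (unsigned)(lanes / 2);
}
```

With 16-byte vectors and the usual 8-byte double and 4-byte float, this gives a factor of 1 for complex double and 2 for complex float, which are the numbers quoted in the thread.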
* [Bug tree-optimization/40770] Vectorization of complex types, vectorization of sincos missing
From: burnus at gcc dot gnu dot org @ 2009-07-16 13:32 UTC
To: gcc-bugs
------- Comment #4 from burnus at gcc dot gnu dot org 2009-07-16 13:32 -------
(In reply to comment #3)
> Yes - there is no vector type for complex double. But the vectorizer
> could query for a vector type for the complex component type (double)
> and divide the vector element count by 2 (for complex) to get the
> vectorization factor which would be 1 here.
I do not know much about this, but wouldn't that fail if one wants to vectorize
true complex functions such as ccosf (assuming that they are in principle
vectorizable)?
* [Bug tree-optimization/40770] Vectorization of complex types, vectorization of sincos missing
From: rguenther at suse dot de @ 2009-07-16 13:58 UTC
To: gcc-bugs
------- Comment #5 from rguenther at suse dot de 2009-07-16 13:57 -------
Subject: Re: Vectorization of complex types,
vectorization of sincos missing
On Thu, 16 Jul 2009, burnus at gcc dot gnu dot org wrote:
> ------- Comment #4 from burnus at gcc dot gnu dot org 2009-07-16 13:32 -------
> (In reply to comment #3)
> > Yes - there is no vector type for complex double. But the vectorizer
> > could query for a vector type for the complex component type (double)
> > and divide the vector element count by 2 (for complex) to get the
> > vectorization factor which would be 1 here.
>
> I do not know much about this, but wouldn't that fail if one wants to vectorize
> true complex functions such as ccosf (assuming that they are in principle
> vectorizable)?
Well, for ccosf we would have a vectorization factor of 2 left for V4SF.
Of course this assumes that we present the vectorizer with a vectorized
ccosf with the signature v4sf (*)(v4sf). Or we would need to introduce
complex vector modes - which I'd rather avoid.
Richard.
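
To make the `v4sf (*)(v4sf)` shape concrete, here is a minimal sketch using the GCC/Clang `vector_size` extension. It assumes the V4SF holds two complex floats interleaved as [re0, im0, re1, im1]. The names `apply_to_complex_pairs`, `times_i` and `vec_times_i` are hypothetical, and `times_i` (multiplication by i) is only a stand-in for a real complex function so that the example does not depend on libm.

```c
#include <assert.h>
#include <string.h>

/* GCC/Clang vector extension: four floats in 16 bytes. */
typedef float v4sf __attribute__((vector_size(16)));

/* Type of a scalar complex function such as ccosf. */
typedef float _Complex (*scalar_cfn)(float _Complex);

/* Hypothetical wrapper: treat v as two interleaved complex floats
   [re0, im0, re1, im1] and apply f to each of them. */
v4sf apply_to_complex_pairs(scalar_cfn f, v4sf v)
{
    float _Complex z[2];
    assert(sizeof z == sizeof v);
    memcpy(z, &v, sizeof v);   /* reinterpret the lanes as two complex floats */
    z[0] = f(z[0]);
    z[1] = f(z[1]);
    memcpy(&v, z, sizeof v);
    return v;
}

/* Stand-in scalar function: multiply by i, so (re, im) becomes (-im, re). */
float _Complex times_i(float _Complex z)
{
    float in[2], out[2];
    memcpy(in, &z, sizeof z);
    out[0] = -in[1];
    out[1] = in[0];
    memcpy(&z, out, sizeof z);
    return z;
}

/* A function with the v4sf (*)(v4sf) signature discussed above. */
v4sf vec_times_i(v4sf v)
{
    return apply_to_complex_pairs(times_i, v);
}
```

Passing `ccosf` from `<complex.h>` instead of `times_i` would give a (scalarized) function of the same signature for the ccosf case, at the cost of linking against the math library.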
* [Bug tree-optimization/40770] Vectorization of complex types, vectorization of sincos missing
From: irar at il dot ibm dot com @ 2009-07-16 17:31 UTC
To: gcc-bugs
------- Comment #6 from irar at il dot ibm dot com 2009-07-16 17:31 -------
(In reply to comment #3)
> > make_vector_type returns NULL for this type.
> Yes - there is no vector type for complex double. But the vectorizer
> could query for a vector type for the complex component type (double)
> and divide the vector element count by 2 (for complex) to get the
> vectorization factor which would be 1 here.
I see.
> Should SLP then be possible
> for that loop?
Not with the current implementation: SLP needs strided stores to start from, and
here the stores are not even adjacent. I think it would be better to vectorize this
loop with regular loop-based vectorization to avoid permutations. I'll take a
better look on Sunday.
Ira
> Richard.
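
The difference between the two store patterns mentioned in this comment can be sketched in plain C. This is an illustration only: the real and imaginary parts are passed in precomputed instead of coming from `__builtin_cexpi`, the function names are hypothetical, and nothing here claims what any particular GCC version does with either loop.

```c
#include <assert.h>
#include <stddef.h>

/* Shape of the loop in this report: the two results of each iteration go
   to two different arrays, so the two stores are not adjacent in memory. */
void store_split(const double *re, const double *im, size_t n,
                 double *sd, double *cd)
{
    assert(sd != cd);
    for (size_t i = 0; i < n; i++) {
        sd[i] = im[i];   /* imaginary part (sine) */
        cd[i] = re[i];   /* real part (cosine) */
    }
}

/* For contrast: both results of an iteration land next to each other in a
   single array, forming a group of adjacent (strided) stores. */
void store_interleaved(const double *re, const double *im, size_t n,
                       double *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = re[i];
        out[2 * i + 1] = im[i];
    }
}
```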
* [Bug tree-optimization/40770] Vectorization of complex types, vectorization of sincos missing
From: irar at il dot ibm dot com @ 2009-07-20 11:18 UTC
To: gcc-bugs
------- Comment #7 from irar at il dot ibm dot com 2009-07-20 11:18 -------
AFAIU, querying for the component type of a complex type is not difficult to
implement.
I think that loop-based vectorization is preferable here, so we should stay
with a vectorization factor of 2 for doubles.
The next problem is to vectorize
D.1611_4 = IMAGPART_EXPR <sincostmp.1_1>;
and
D.1612_6 = REALPART_EXPR <sincostmp.1_1>;
Currently, we support only loads and stores with IMAGPART/REALPART_EXPR,
vectorizing them as strided accesses, with extract odd and even operations for
loads. So, we will have to support interleaving of non-memory variables.
Does __builtin_cexpi have a vector implementation? If so, does it return two
vectors?
If not, I guess, we need something like:
sincostmp.1 = __builtin_cexpi (xd[i]);
sincostmp.2 = __builtin_cexpi (xd[i+1]);
v1 = VEC_EXTRACT_EVEN (sincostmp.1, sincostmp.2);
v2 = VEC_EXTRACT_ODD (sincostmp.1, sincostmp.2);
sf[i:i+1] = v1;
cf[i:i+1] = v2;
i = i + 2;
Or we can use the two vectors from vectorized __builtin_cexpi as parameters of
extract operations.
Does that make sense?
Ira
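
The two extract operations used in this comment can be modelled in plain C on two-lane vectors. This is a sketch under the assumption that each complex result is laid out as (re, im), which is the C99 layout for complex types; the lower-case function names are hypothetical models, not the GCC tree codes themselves.

```c
#include <assert.h>

/* Model of an extract-even operation: take lanes 0 and 2 of the
   concatenation a|b.  a and b each hold one complex result as (re, im). */
void vec_extract_even(const double a[2], const double b[2], double out[2])
{
    assert(out != a && out != b);
    out[0] = a[0];   /* lane 0: real part of the first result */
    out[1] = b[0];   /* lane 2: real part of the second result */
}

/* Model of an extract-odd operation: take lanes 1 and 3 of a|b. */
void vec_extract_odd(const double a[2], const double b[2], double out[2])
{
    assert(out != a && out != b);
    out[0] = a[1];   /* lane 1: imaginary part of the first result */
    out[1] = b[1];   /* lane 3: imaginary part of the second result */
}
```

Under that layout the even lanes collect the real parts and the odd lanes the imaginary parts of the two results, so each output is a full vector that can be stored with a single vector store.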
* [Bug tree-optimization/40770] Vectorization of complex types, vectorization of sincos missing
From: ubizjak at gmail dot com @ 2009-07-20 12:30 UTC
To: gcc-bugs
------- Comment #8 from ubizjak at gmail dot com 2009-07-20 12:30 -------
(In reply to comment #7)
> Does __builtin_cexpi have a vector implementation? If so, does it return two
> vectors?
No, only vectorized sincos is implemented (see also links at PR40766, Comment
#11).
* [Bug tree-optimization/40770] Vectorization of complex types, vectorization of sincos missing
From: rguenther at suse dot de @ 2009-07-20 12:55 UTC
To: gcc-bugs
------- Comment #9 from rguenther at suse dot de 2009-07-20 12:55 -------
Subject: Re: Vectorization of complex types,
vectorization of sincos missing
On Mon, 20 Jul 2009, irar at il dot ibm dot com wrote:
>
>
> ------- Comment #7 from irar at il dot ibm dot com 2009-07-20 11:18 -------
> AFAIU, querying for the component type of complex type is not difficult to
> implement.
> I think, that loop-based vectorization is preferable here, so we should stay
> with vectorization factor of 2 for doubles.
>
> The next problem is to vectorize
> D.1611_4 = IMAGPART_EXPR <sincostmp.1_1>;
> and
> D.1612_6 = REALPART_EXPR <sincostmp.1_1>;
>
> Currently, we support only loads and stores with IMAGPART/REALPART_EXPR,
> vectorizing them as strided accesses, with extract odd and even operations for
> loads. So, we will have to support interleaving of non-memory variables.
>
> Does __builtin_cexpi have a vector implementation? If so, does it return two
> vectors?
No, currently cexpi doesn't have a vectorized version. We could add
an internal builtin for it that takes a vector as argument and
returns a vector with complex components, and lower this during expansion
to a suitable available form (possibly just two calls).
> If not, I guess, we need something like:
>
> sincostmp.1 = __builtin_cexpi (xd[i]);
> sincostmp.2 = __builtin_cexpi (xd[i+1]);
> v1 = VEC_EXTRACT_EVEN (sincostmp.1, sincostmp.2);
> v2 = VEC_EXTRACT_ODD (sincostmp.1, sincostmp.2);
> sf[i:i+1] = v1;
> cf[i:i+1] = v2;
> i = i + 2;
Yes, that was my initial idea.
> Or we can use the two vectors from vectorized __builtin_cexpi as parameters of
> extract operations.
> Does that make sense?
Yes, I think so. With a vectorized builtin we'd have
v0 = xd[i:i+1];
sincostmp.1 = __builtin_vect_cexpi (v0);
v1 = VEC_EXTRACT_EVEN (sincostmp.1[0], sincostmp.1[1]);
v2 = VEC_EXTRACT_ODD (sincostmp.1[0], sincostmp.1[1]);
sf[i:i+1] = v1;
cf[i:i+1] = v2;
i = i + 2;
where sincostmp.1[0] would select the lower half of a V4DF and
sincostmp.1[1] the upper half of a V4DF. But that's probably
more difficult as we'd have both V2DF and V4DF in the IL.
Richard.
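
The half selection described in this comment can be modelled in plain C. This is a sketch that assumes the V4DF result holds two complex doubles as [re0, im0, re1, im1]; `select_half` and `split_real_imag` are hypothetical names, and arrays stand in for the V2DF and V4DF vector types.

```c
#include <assert.h>
#include <string.h>

/* Select the lower (half == 0) or upper (half == 1) two lanes of a
   four-lane vector, modelling sincostmp.1[0] and sincostmp.1[1]. */
void select_half(const double v4[4], int half, double v2[2])
{
    assert(half == 0 || half == 1);
    memcpy(v2, v4 + 2 * half, 2 * sizeof(double));
}

/* Split [re0, im0, re1, im1] into a vector of real parts and a vector of
   imaginary parts by taking the even and odd lanes of the two halves. */
void split_real_imag(const double v4[4], double re[2], double im[2])
{
    double lo[2], hi[2];
    select_half(v4, 0, lo);
    select_half(v4, 1, hi);
    re[0] = lo[0];   /* even lanes */
    re[1] = hi[0];
    im[0] = lo[1];   /* odd lanes */
    im[1] = hi[1];
}
```

The sketch makes the mix of widths visible: the builtin result is four lanes wide while everything around it is two lanes wide.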