* Complex numbers support: discussions summary
@ 2023-09-25 15:15 Sylvain Noiry
2023-09-26 7:30 ` Richard Biener
2023-09-26 15:46 ` Joseph Myers
0 siblings, 2 replies; 15+ messages in thread
From: Sylvain Noiry @ 2023-09-25 15:15 UTC (permalink / raw)
To: gcc; +Cc: sylvain.noiry
Hi,
We had very interesting discussions during our presentation with Paul on the
support of complex numbers in gcc at the Cauldron.
Thank you all for your participation!
Here is a small summary from our viewpoint:
- Replace CONCAT with a backend defined internal representation in RTL
--> No particular problems
- Allow backend to write patterns for operation on complex modes
--> No particular problems
- Conditional lowering depending on whether a pattern exists or not
--> Concerns when the vectorization of split complex operations performs
better than non-vectorized unified complex operations
- Centralize complex lowering in cplxlower
--> No particular problems if it doesn't prevent IEEE compliance and
optimizations (like const folding)
- Vectorization of complex operations
--> 2 representations (interleaved and separated real/imag): cannot impose
one if some machines prefer the other
--> Complex modes are composite modes; the vectorizer assumes that the inner
mode is scalar to do some optimizations (which ones?)
--> Mixed split/unified complex operations cannot be vectorized easily
--> Assuming that the inner representation of complex vectors is left to
target backends, the vectorizer doesn't know it, which prevents some
optimizations (which ones?)
- Explicit vectors of complex
--> Cplxlower cannot lower them, and moving veclower before cplxlower is a
bad idea as it prevents some optimizations
--> Teaching cplxlower how to deal with vectors of complex seems to be a
reasonable alternative
--> Concerns about ABI or indexing if the internal representation is left
to the backend and differs from the representation in memory
- Impact of the current SLP pattern matching of complex operations
--> Only with -ffast-math
--> It can match user-defined operations (not C99) that can be simplified
with a complex instruction
--> Dedicated opcode and real vector type chosen there, versus standard
opcode and complex mode in our implementation
--> Need to preserve SLP pattern matching, as too many applications redefine
complex and bypass the C99 standard.
--> So need to harmonize with our implementation
- Support of the pure imaginary type (_Imaginary)
--> Still not supported by gcc (or llvm), nor in our implementation
--> Issues come from the fact that an imaginary is not a complex with real
part set to 0
--> The same issue arises with complex multiplication by a real (which is
split in the frontend, and our implementation hasn't changed it yet)
--> Idea: add an attribute to the Tree complex type which specifies pure
real / pure imaginary / full complex?
- Fast pattern for IEEE compliant emulated operations
--> Not enough time to discuss it
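To make the "interleaved" and "separated real/imag" representations from the
vectorization item concrete, here is a minimal sketch in C. It relies only on
the C99 guarantee that a complex type has the same representation as an array
of two elements of the corresponding real type (real part first). The function
names deinterleave and interleave are hypothetical and only for illustration;
they are not part of GCC or of the implementation discussed here.

```c
#include <complex.h>
#include <stddef.h>

/* Interleaved -> separated: split n complex values into two real arrays.
   Each complex value is read through its double[2] view (real part first). */
void deinterleave(const double complex *in, double *re, double *im, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const double *parts = (const double *)&in[i];
        re[i] = parts[0];
        im[i] = parts[1];
    }
}

/* Separated -> interleaved: rebuild n complex values from two real arrays. */
void interleave(const double *re, const double *im, double complex *out,
                size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double *parts = (double *)&out[i];
        parts[0] = re[i];
        parts[1] = im[i];
    }
}
```

A target that prefers the separated form would conceptually pay for something
like deinterleave at loads and interleave at stores, which is one reason the
choice cannot easily be imposed on every machine.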
Don't hesitate to add something or to be more precise if you want.
As I said at the end of the presentation, we have written a paper which
explains our implementation in detail. You can find it on the wiki page of
the Cauldron
(https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&target=Exposing+Complex+Numbers+to+Target+Back-ends+%28paper%29.pdf).
Sylvain
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Complex numbers support: discussions summary
2023-09-25 15:15 Complex numbers support: discussions summary Sylvain Noiry
@ 2023-09-26 7:30 ` Richard Biener
2023-09-26 8:29 ` Tamar Christina
` (2 more replies)
2023-09-26 15:46 ` Joseph Myers
1 sibling, 3 replies; 15+ messages in thread
From: Richard Biener @ 2023-09-26 7:30 UTC (permalink / raw)
To: Sylvain Noiry; +Cc: gcc, sylvain.noiry
On Mon, Sep 25, 2023 at 5:17 PM Sylvain Noiry via Gcc <gcc@gcc.gnu.org> wrote:
>
> Hi,
>
> We had very interesting discussions during our presentation with Paul on
> the
> support of complex numbers in gcc at the Cauldron.
>
> Thank you all for your participation !
>
> Here is a small summary from our viewpoint:
>
> - Replace CONCAT with a backend defined internal representation in RTL
> --> No particular problems
>
> - Allow backend to write patterns for operation on complex modes
> --> No particular problems
>
> - Conditional lowering depending on whether a pattern exists or not
> --> Concerns when the vectorization of split complex operations performs
> better
> than not vectorized unified complex operations
>
> - Centralize complex lowering in cplxlower
> --> No particular problems if it doesn't prevent IEEE compliance and
> optimizations (like const folding)
>
> - Vectorization of complex operations
> --> 2 representations (interleaved and separated real/imag): cannot
> impose one
> if some machines prefer the other
> --> Complex are composite modes, the vectorizer assumes that the inner
> mode is
> scalar to do some optimizations (which ones ?)
> --> Mixed split/unified complex operations cannot be vectorized easely
> --> Assuming that the inner representation of complex vectors is let to
> target
> backends, the vectorizer doesn't know it, which prevent some
> optimizations
> (which ones ?)
>
> - Explicit vectors of complex
> --> Cplxlower cannot lower it, and moving veclower before cplxlower is a
> bad
> idea as it prevents some optimizations
> --> Teaching cplxlower how to deal with vectors of complex seems to be a
> reasonable alternative
> --> Concerns about ABI or indexing if the internal representation is let
> to the
> backend and differs from the representation in memory
>
> - Impact of the current SLP pattern matching of complex operations
> --> Only with -ffast-math
> --> It can match user defined operations (not C99) that can be
> simplified with a
> complex instruction
> --> Dedicated opcode and real vector type choosen VS standard opcode and
> complex
> mode in our implementation
> --> Need to preserve SLP pattern matching as too many applications
> redefines
> complex and bypass C99 standard.
> --> So need to harmonize with our implementation
>
> - Support of the pure imaginary type (_Imaginary)
> --> Still not supported by gcc (and llvm), neither in our implementation
> --> Issues comes from the fact that an imaginary is not a complex with
> real part
> set to 0
> --> The same issue with complex multiplication by a real (which is split
> in the
> frontend, and our implementation hasn't changed it yet)
> --> Idea: Add an attribute to the Tree complex type which specify pure
> real / pure
> imaginary / full complex ?
>
> - Fast pattern for IEEE compliant emulated operations
> --> Not enough time to discuss about it
>
> Don't hesitate to add something or bring more precision if you want.
>
> As I said at the end of the presentation, we have written a paper which
> explains
> our implementation in details. You can find it on the wiki page of the
> Cauldron
> (https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&target=Exposing+Complex+Numbers+to+Target+Back-ends+%28paper%29.pdf).
Thanks for the detailed presentation at the Cauldron.
My personal summary is that I'm less convinced delaying lowering is
the way to go.
I do think that if targets implement complex optabs we should use them but
eventually re-discovering complex operations from lowered form is going to be
more useful. That's because, as you said, use of _Complex is limited and people
invent their own representations. SLP vectorization can discover some ops
already, with the limiting factor being that we don't specifically search for
only complex operations (plus we expose the result as vector operations,
requiring target support for the vector ops rather than [SD]Cmode operations).
There's gimple-isel.cc or the widen-mul pass, which perform instruction
selection and could be enhanced to discover scalar [SD]Cmode operations.
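For illustration, this is the kind of application code meant by "people
inventing their own representation": a complex type and product written on
scalar components, with no _Complex in sight. It is only a sketch; the names
my_cplx, my_cmul and my_cmul_array are hypothetical, and whether a given GCC
version and target rediscovers a complex multiply from it is not something
this example claims.

```c
#include <stddef.h>

/* Hypothetical application-defined complex type (not C99 _Complex). */
struct my_cplx { double re, im; };

/* Textbook complex product written directly on the scalar components. */
static struct my_cplx my_cmul(struct my_cplx a, struct my_cplx b)
{
    struct my_cplx r;
    r.re = a.re * b.re - a.im * b.im;
    r.im = a.re * b.im + a.im * b.re;
    return r;
}

/* Element-wise product over arrays; the compiler only ever sees the
   lowered form (four multiplies, one subtract, one add per element). */
void my_cmul_array(struct my_cplx *out, const struct my_cplx *a,
                   const struct my_cplx *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = my_cmul(a[i], b[i]);
}
```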
Richard.
> Sylvain
* RE: Complex numbers support: discussions summary
2023-09-26 7:30 ` Richard Biener
@ 2023-09-26 8:29 ` Tamar Christina
2023-09-26 10:19 ` Paul Iannetta
2023-09-26 8:53 ` Paul Iannetta
2023-09-26 18:40 ` Toon Moene
2 siblings, 1 reply; 15+ messages in thread
From: Tamar Christina @ 2023-09-26 8:29 UTC (permalink / raw)
To: Richard Biener, Sylvain Noiry; +Cc: gcc, sylvain.noiry
Hi,
I tried to find you two on Sunday but couldn't locate you. Thanks for the presentation!
> >
> > We had very interesting discussions during our presentation with Paul
> > on the support of complex numbers in gcc at the Cauldron.
> >
> > Thank you all for your participation !
> >
> > Here is a small summary from our viewpoint:
> >
> > - Replace CONCAT with a backend defined internal representation in RTL
> > --> No particular problems
> >
> > - Allow backend to write patterns for operation on complex modes
> > --> No particular problems
> >
> > - Conditional lowering depending on whether a pattern exists or not
> > --> Concerns when the vectorization of split complex operations
> > --> performs
> > better
> > than not vectorized unified complex operations
> >
> > - Centralize complex lowering in cplxlower
> > --> No particular problems if it doesn't prevent IEEE compliance and
> > optimizations (like const folding)
> >
> > - Vectorization of complex operations
> > --> 2 representations (interleaved and separated real/imag): cannot
> > impose one
> > if some machines prefer the other
> > --> Complex are composite modes, the vectorizer assumes that the inner
> > mode is
> > scalar to do some optimizations (which ones ?)
> > --> Mixed split/unified complex operations cannot be vectorized easely
> > --> Assuming that the inner representation of complex vectors is let
> > --> to
> > target
> > backends, the vectorizer doesn't know it, which prevent some
> > optimizations
> > (which ones ?)
> >
> > - Explicit vectors of complex
> > --> Cplxlower cannot lower it, and moving veclower before cplxlower is
> > --> a
> > bad
> > idea as it prevents some optimizations
> > --> Teaching cplxlower how to deal with vectors of complex seems to be
> > --> a
> > reasonable alternative
> > --> Concerns about ABI or indexing if the internal representation is
> > --> let
> > to the
> > backend and differs from the representation in memory
> >
> > - Impact of the current SLP pattern matching of complex operations
> > --> Only with -ffast-math
> > --> It can match user defined operations (not C99) that can be
> > simplified with a
> > complex instruction
> > --> Dedicated opcode and real vector type choosen VS standard opcode
> > --> and
> > complex
> > mode in our implementation
> > --> Need to preserve SLP pattern matching as too many applications
> > redefines
> > complex and bypass C99 standard.
> > --> So need to harmonize with our implementation
> >
> > - Support of the pure imaginary type (_Imaginary)
> > --> Still not supported by gcc (and llvm), neither in our
> > --> implementation Issues comes from the fact that an imaginary is not
> > --> a complex with
> > real part
> > set to 0
> > --> The same issue with complex multiplication by a real (which is
> > --> split
> > in the
> > frontend, and our implementation hasn't changed it yet)
> > --> Idea: Add an attribute to the Tree complex type which specify pure
> > real / pure
> > imaginary / full complex ?
> >
> > - Fast pattern for IEEE compliant emulated operations
> > --> Not enough time to discuss about it
> >
> > Don't hesitate to add something or bring more precision if you want.
> >
> > As I said at the end of the presentation, we have written a paper
> > which explains our implementation in details. You can find it on the
> > wiki page of the Cauldron
> >
> (https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&tar
> get=Exposing+Complex+Numbers+to+Target+Back-ends+%28paper%29.pdf).
>
> Thanks for the detailed presentation at the Cauldron.
>
> My personal summary is that I'm less convinced delaying lowering is the way
> to go.
I personally like the delayed lowering for scalar because it allows us to properly
reassociate as a unit. That is to say, it's easier to detect a * b * c when they
are still complex ops. And the late lowering will allow better codegen than today.
However I think we should *unconditionally* not lower them, even in situations
such as a * b * imag(b). This situation can arise from late optimizations anyway,
so it has to be dealt with regardless, and I don't think it should punt.
I think you can then conditionally lower if the target does *not* implement the
optab. i.e. for AArch64 the complex mode wouldn't be useful.
> I do think that if targets implement complex optabs we should use them but
> eventually re-discovering complex operations from lowered form is going to be
> more useful.
> That's because as you said, use of _Complex is limited and
> people inventing their own representation. SLP vectorization can discover
> some ops already with the limiting factor being that we don't specifically
> search for only complex operations (plus we expose the result as vector
> operations, requiring target support for the vector ops rather than [SD]Cmode
> operations).
I don't think the two are mutually exclusive. I do think we should form complex
instructions from scalar ops as well, because we can generate better expansions.
Today we only expand efficiently when the COMPLEX_EXPR node is still there
and bitfield expansion then knows that the entire value will be written. So
rediscovery will help there.
I also think that if we don't lower early then, as you mention, we should lower the
complex operations in the vectorizer. I don't think having the complex modes as
vectors is useful. This can be easily done by using the scalar vect pattern. It'll
have to handle all arithmetic ops though, but for those where the target has an
optab we can form it early, which would have the SLP one skip it later.
This also means vec_lower doesn't have issues anymore.
Cheers,
Tamar
>
> There's the gimple-isel.cc or the widen-mul pass that perform instruction
> selection which could be enhanced to discover scalar [SD]Cmode operations.
>
> Richard.
>
> > Sylvain
* Re: Complex numbers support: discussions summary
2023-09-26 8:29 ` Tamar Christina
@ 2023-09-26 10:19 ` Paul Iannetta
0 siblings, 0 replies; 15+ messages in thread
From: Paul Iannetta @ 2023-09-26 10:19 UTC (permalink / raw)
To: Tamar Christina; +Cc: Richard Biener, Sylvain Noiry, gcc, sylvain.noiry
On Tue, Sep 26, 2023 at 08:29:16AM +0000, Tamar Christina via Gcc wrote:
> Hi,
>
> I tried to find you two on Sunday but couldn't locate you. Thanks for the presentation!
Yes, sadly we could not attend on Sunday because we wanted to be back
for Monday.
> > >
> > > We had very interesting discussions during our presentation with Paul
> > > on the support of complex numbers in gcc at the Cauldron.
> > >
> > > Thank you all for your participation !
> > >
> > > Here is a small summary from our viewpoint:
> > >
> > > - Replace CONCAT with a backend defined internal representation in RTL
> > > --> No particular problems
> > >
> > > - Allow backend to write patterns for operation on complex modes
> > > --> No particular problems
> > >
> > > - Conditional lowering depending on whether a pattern exists or not
> > > --> Concerns when the vectorization of split complex operations
> > > --> performs
> > > better
> > > than not vectorized unified complex operations
> > >
> > > - Centralize complex lowering in cplxlower
> > > --> No particular problems if it doesn't prevent IEEE compliance and
> > > optimizations (like const folding)
> > >
> > > - Vectorization of complex operations
> > > --> 2 representations (interleaved and separated real/imag): cannot
> > > impose one
> > > if some machines prefer the other
> > > --> Complex are composite modes, the vectorizer assumes that the inner
> > > mode is
> > > scalar to do some optimizations (which ones ?)
> > > --> Mixed split/unified complex operations cannot be vectorized easely
> > > --> Assuming that the inner representation of complex vectors is let
> > > --> to
> > > target
> > > backends, the vectorizer doesn't know it, which prevent some
> > > optimizations
> > > (which ones ?)
> > >
> > > - Explicit vectors of complex
> > > --> Cplxlower cannot lower it, and moving veclower before cplxlower is
> > > --> a
> > > bad
> > > idea as it prevents some optimizations
> > > --> Teaching cplxlower how to deal with vectors of complex seems to be
> > > --> a
> > > reasonable alternative
> > > --> Concerns about ABI or indexing if the internal representation is
> > > --> let
> > > to the
> > > backend and differs from the representation in memory
> > >
> > > - Impact of the current SLP pattern matching of complex operations
> > > --> Only with -ffast-math
> > > --> It can match user defined operations (not C99) that can be
> > > simplified with a
> > > complex instruction
> > > --> Dedicated opcode and real vector type choosen VS standard opcode
> > > --> and
> > > complex
> > > mode in our implementation
> > > --> Need to preserve SLP pattern matching as too many applications
> > > redefines
> > > complex and bypass C99 standard.
> > > --> So need to harmonize with our implementation
> > >
> > > - Support of the pure imaginary type (_Imaginary)
> > > --> Still not supported by gcc (and llvm), neither in our
> > > --> implementation Issues comes from the fact that an imaginary is not
> > > --> a complex with
> > > real part
> > > set to 0
> > > --> The same issue with complex multiplication by a real (which is
> > > --> split
> > > in the
> > > frontend, and our implementation hasn't changed it yet)
> > > --> Idea: Add an attribute to the Tree complex type which specify pure
> > > real / pure
> > > imaginary / full complex ?
> > >
> > > - Fast pattern for IEEE compliant emulated operations
> > > --> Not enough time to discuss about it
> > >
> > > Don't hesitate to add something or bring more precision if you want.
> > >
> > > As I said at the end of the presentation, we have written a paper
> > > which explains our implementation in details. You can find it on the
> > > wiki page of the Cauldron
> > >
> > (https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&tar
> > get=Exposing+Complex+Numbers+to+Target+Back-ends+%28paper%29.pdf).
> >
> > Thanks for the detailed presentation at the Cauldron.
> >
> > My personal summary is that I'm less convinced delaying lowering is the way
> > to go.
>
> I personally like the delayed lowering for scalar because it allows us to properly
> reassociate as a unit. That is to say, it's easier to detect a * b * c when they
> are still complex ops. And the late lowering will allow beter codegen than today.
>
> However I think we should *unconditionally* not lower them, even in situations
> such as a * b * imag(b). This situation can happen by late optimizations anyway
> so it has to be dealt with regardless so I don't think it should punt.
>
> I think you can then conditionally lower if the target does *not* implement the
> optab. i.e. for AArch64 the complex mode wouldn't be useful.
>
Indeed, our current approach in the vectorizer works only if the
complex scalar patterns exist as well, and I agree that it would be
better if the absence of either scalar or vector patterns did not
prevent any optimizations.
Keeping everything unified until after the vectorizer and the SLP
passes and lowering after that might work. But we would have to try,
and see whether we run into any problems with -ffast-math and/or
IEEE 754 compliance.
In particular, in order not to lower imag(b) we could promote it to
__complex_expr__ (0, b, IMAGINARY), at the cost of adding a field to
__complex_expr__. That would help with floating-point compliance and be
a step towards the support of _Imaginary.
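A small sketch of why the extra field matters for floating-point compliance:
under IEEE 754 arithmetic, multiplying by a pure imaginary i*y is not the same
as multiplying by the full complex (0, y) with the textbook formula, because
the zero real part meets an infinity and produces a NaN. The struct and
function names below are hypothetical, and the full product deliberately uses
the plain four-multiply formula rather than the built-in _Complex operator
(which may apply additional recovery steps).

```c
#include <math.h>

struct cplx { double re, im; };

/* Multiply a by the pure imaginary i*y: only two real multiplications,
   (a.re + i*a.im) * (i*y) = -a.im*y + i*(a.re*y). */
struct cplx mul_by_imag(struct cplx a, double y)
{
    struct cplx r = { -a.im * y, a.re * y };
    return r;
}

/* Multiply a by a full complex b using the textbook four-multiply formula. */
struct cplx mul_full(struct cplx a, struct cplx b)
{
    struct cplx r = { a.re * b.re - a.im * b.im,
                      a.re * b.im + a.im * b.re };
    return r;
}
```

With a = (1, INFINITY) and y = 2, mul_by_imag gives a finite imaginary part,
while mul_full with b = (0, 2) computes INFINITY * 0 in the imaginary part and
ends up with a NaN there.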
> > I do think that if targets implement complex optabs we should use them but
> > eventually re-discovering complex operations from lowered form is going to be
> > more useful.
> > That's because as you said, use of _Complex is limited and
> > people inventing their own representation. SLP vectorization can discover
> > some ops already with the limiting factor being that we don't specifically
> > search for only complex operations (plus we expose the result as vector
> > operations, requiring target support for the vector ops rather than [SD]Cmode
> > operations).
>
> I don't think the two are mutually exclusive, I do think we should form complex
> instructions from scalar ops as well, because we can generate better expansions.
>
> Today we only expand efficiently when the COMPLEX_EXPR node is still there
> and bitfield expansion knows then that the entire value will be written. So
> rediscovery will help there.
>
> I also think if we don't lower early, as you mention we should lower the complex
> operations in the vectorizer. I don't think having the complex mode as vectors
> are useful. This can be easily done by using the scalar vect pattern. It'll have
> to handle all arithmetic ops though, but for those the target has an optab we
> can form it early which would have the SLP one skip it later.
>
Our main motivation for introducing complex modes for vectors was to avoid
duplicating common SPNs with alternatives like "cmul" and such. It is
not a hard requirement on our part. We just thought that it would be
cleaner.
Paul & Sylvain
> This also means vec_lower doesn't have issues anymore.
>
> Cheers,
> Tamar
>
> >
> > There's the gimple-isel.cc or the widen-mul pass that perform instruction
> > selection which could be enhanced to discover scalar [SD]Cmode operations.
> >
> > Richard.
> >
> > > Sylvain
* Re: Complex numbers support: discussions summary
2023-09-26 7:30 ` Richard Biener
2023-09-26 8:29 ` Tamar Christina
@ 2023-09-26 8:53 ` Paul Iannetta
2023-09-26 9:28 ` Tamar Christina
2023-09-26 18:40 ` Toon Moene
2 siblings, 1 reply; 15+ messages in thread
From: Paul Iannetta @ 2023-09-26 8:53 UTC (permalink / raw)
To: Richard Biener; +Cc: Sylvain Noiry, gcc, sylvain.noiry
On Tue, Sep 26, 2023 at 09:30:21AM +0200, Richard Biener via Gcc wrote:
> On Mon, Sep 25, 2023 at 5:17 PM Sylvain Noiry via Gcc <gcc@gcc.gnu.org> wrote:
> >
> > Hi,
> >
> > We had very interesting discussions during our presentation with Paul on
> > the
> > support of complex numbers in gcc at the Cauldron.
> >
> > Thank you all for your participation !
> >
> > Here is a small summary from our viewpoint:
> >
> > - Replace CONCAT with a backend defined internal representation in RTL
> > --> No particular problems
> >
> > - Allow backend to write patterns for operation on complex modes
> > --> No particular problems
> >
> > - Conditional lowering depending on whether a pattern exists or not
> > --> Concerns when the vectorization of split complex operations performs
> > better
> > than not vectorized unified complex operations
> >
> > - Centralize complex lowering in cplxlower
> > --> No particular problems if it doesn't prevent IEEE compliance and
> > optimizations (like const folding)
> >
> > - Vectorization of complex operations
> > --> 2 representations (interleaved and separated real/imag): cannot
> > impose one
> > if some machines prefer the other
> > --> Complex are composite modes, the vectorizer assumes that the inner
> > mode is
> > scalar to do some optimizations (which ones ?)
> > --> Mixed split/unified complex operations cannot be vectorized easely
> > --> Assuming that the inner representation of complex vectors is let to
> > target
> > backends, the vectorizer doesn't know it, which prevent some
> > optimizations
> > (which ones ?)
> >
> > - Explicit vectors of complex
> > --> Cplxlower cannot lower it, and moving veclower before cplxlower is a
> > bad
> > idea as it prevents some optimizations
> > --> Teaching cplxlower how to deal with vectors of complex seems to be a
> > reasonable alternative
> > --> Concerns about ABI or indexing if the internal representation is let
> > to the
> > backend and differs from the representation in memory
> >
> > - Impact of the current SLP pattern matching of complex operations
> > --> Only with -ffast-math
> > --> It can match user defined operations (not C99) that can be
> > simplified with a
> > complex instruction
> > --> Dedicated opcode and real vector type choosen VS standard opcode and
> > complex
> > mode in our implementation
> > --> Need to preserve SLP pattern matching as too many applications
> > redefines
> > complex and bypass C99 standard.
> > --> So need to harmonize with our implementation
> >
> > - Support of the pure imaginary type (_Imaginary)
> > --> Still not supported by gcc (and llvm), neither in our implementation
> > --> Issues comes from the fact that an imaginary is not a complex with
> > real part
> > set to 0
> > --> The same issue with complex multiplication by a real (which is split
> > in the
> > frontend, and our implementation hasn't changed it yet)
> > --> Idea: Add an attribute to the Tree complex type which specify pure
> > real / pure
> > imaginary / full complex ?
> >
> > - Fast pattern for IEEE compliant emulated operations
> > --> Not enough time to discuss about it
> >
> > Don't hesitate to add something or bring more precision if you want.
> >
> > As I said at the end of the presentation, we have written a paper which
> > explains
> > our implementation in details. You can find it on the wiki page of the
> > Cauldron
> > (https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&target=Exposing+Complex+Numbers+to+Target+Back-ends+%28paper%29.pdf).
>
> Thanks for the detailed presentation at the Cauldron.
>
> My personal summary is that I'm less convinced delaying lowering is
> the way to go.
This is not only delayed lowering: if the standard pattern names (SPNs) are
there, there is no lowering at all.
> I do think that if targets implement complex optabs we should use them but
> eventually re-discovering complex operations from lowered form is going to be
> more useful.
I would not be opposed to rediscovering complex operations, but I think
that although rediscovering a + b and a - b is easy and a * b would
still be doable, a / b will be hard. That said, I doubt we
will see a hardware complex division, but who knows. However, once
lowered, re-associating a * b * c and more complex expressions is going
to be hard.
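To give an idea of what a rediscovery pass would have to match, here is a
sketch of the naive lowered form of a / b. The names cplx and naive_cdiv are
hypothetical, and this is the textbook formula only: real implementations
typically use more careful schemes to limit overflow and handle special
values, so the actual lowered code can be larger still.

```c
struct cplx { double re, im; };

/* Naive complex division:
   (a.re + i*a.im) / (b.re + i*b.im)
     = ((a.re*b.re + a.im*b.im) + i*(a.im*b.re - a.re*b.im))
       / (b.re*b.re + b.im*b.im)
   That is six multiplies, three add/sub and two divides, compared with a
   single add or subtract per component for a + b and a - b. */
struct cplx naive_cdiv(struct cplx a, struct cplx b)
{
    double denom = b.re * b.re + b.im * b.im;
    struct cplx r;
    r.re = (a.re * b.re + a.im * b.im) / denom;
    r.im = (a.im * b.re - a.re * b.im) / denom;
    return r;
}
```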
> That's because as you said, use of _Complex is limited and people
> inventing their own representation.
Yes, this would be a step back at first, but proper support for
_Complex would probably be an incentive for library writers to take
it into account.
> SLP vectorization can discover some ops
> already with the limiting factor being that we don't specifically search for
> only complex operations (plus we expose the result as vector operations,
> requiring target support for the vector ops rather than [SD]Cmode operations).
Our only concern with SLP is that it only works within loops. If we
want to re-discover complex numbers we could either add a
dedicated pass before the SLP vectorizer or rely on match.pd?
>
> There's the gimple-isel.cc or the widen-mul pass that perform
> instruction selection
> which could be enhanced to discover scalar [SD]Cmode operations.
We'll have another look there.
Thanks,
Paul
>
> Richard.
>
> > Sylvain
* RE: Complex numbers support: discussions summary
2023-09-26 8:53 ` Paul Iannetta
@ 2023-09-26 9:28 ` Tamar Christina
2023-09-26 9:40 ` Paul Iannetta
0 siblings, 1 reply; 15+ messages in thread
From: Tamar Christina @ 2023-09-26 9:28 UTC (permalink / raw)
To: Paul Iannetta, Richard Biener; +Cc: Sylvain Noiry, gcc, sylvain.noiry
> -----Original Message-----
> From: Gcc <gcc-bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf
> Of Paul Iannetta via Gcc
> Sent: Tuesday, September 26, 2023 9:54 AM
> To: Richard Biener <richard.guenther@gmail.com>
> Cc: Sylvain Noiry <snoiry@kalrayinc.com>; gcc@gcc.gnu.org;
> sylvain.noiry@hotmail.fr
> Subject: Re: Complex numbers support: discussions summary
>
> On Tue, Sep 26, 2023 at 09:30:21AM +0200, Richard Biener via Gcc wrote:
> > On Mon, Sep 25, 2023 at 5:17 PM Sylvain Noiry via Gcc <gcc@gcc.gnu.org>
> wrote:
> > >
> > > Hi,
> > >
> > > We had very interesting discussions during our presentation with
> > > Paul on the support of complex numbers in gcc at the Cauldron.
> > >
> > > Thank you all for your participation !
> > >
> > > Here is a small summary from our viewpoint:
> > >
> > > - Replace CONCAT with a backend defined internal representation in
> > > RTL
> > > --> No particular problems
> > >
> > > - Allow backend to write patterns for operation on complex modes
> > > --> No particular problems
> > >
> > > - Conditional lowering depending on whether a pattern exists or not
> > > --> Concerns when the vectorization of split complex operations
> > > --> performs
> > > better
> > > than not vectorized unified complex operations
> > >
> > > - Centralize complex lowering in cplxlower
> > > --> No particular problems if it doesn't prevent IEEE compliance and
> > > optimizations (like const folding)
> > >
> > > - Vectorization of complex operations
> > > --> 2 representations (interleaved and separated real/imag): cannot
> > > impose one
> > > if some machines prefer the other
> > > --> Complex are composite modes, the vectorizer assumes that the
> > > --> inner
> > > mode is
> > > scalar to do some optimizations (which ones ?)
> > > --> Mixed split/unified complex operations cannot be vectorized
> > > --> easely Assuming that the inner representation of complex vectors
> > > --> is let to
> > > target
> > > backends, the vectorizer doesn't know it, which prevent some
> > > optimizations
> > > (which ones ?)
> > >
> > > - Explicit vectors of complex
> > > --> Cplxlower cannot lower it, and moving veclower before cplxlower
> > > --> is a
> > > bad
> > > idea as it prevents some optimizations
> > > --> Teaching cplxlower how to deal with vectors of complex seems to
> > > --> be a
> > > reasonable alternative
> > > --> Concerns about ABI or indexing if the internal representation is
> > > --> let
> > > to the
> > > backend and differs from the representation in memory
> > >
> > > - Impact of the current SLP pattern matching of complex operations
> > > --> Only with -ffast-math
> > > --> It can match user defined operations (not C99) that can be
> > > simplified with a
> > > complex instruction
> > > --> Dedicated opcode and real vector type choosen VS standard opcode
> > > --> and
> > > complex
> > > mode in our implementation
> > > --> Need to preserve SLP pattern matching as too many applications
> > > redefines
> > > complex and bypass C99 standard.
> > > --> So need to harmonize with our implementation
> > >
> > > - Support of the pure imaginary type (_Imaginary)
> > > --> Still not supported by gcc (and llvm), neither in our
> > > --> implementation Issues comes from the fact that an imaginary is
> > > --> not a complex with
> > > real part
> > > set to 0
> > > --> The same issue with complex multiplication by a real (which is
> > > --> split
> > > in the
> > > frontend, and our implementation hasn't changed it yet)
> > > --> Idea: Add an attribute to the Tree complex type which specify
> > > --> pure
> > > real / pure
> > > imaginary / full complex ?
> > >
> > > - Fast pattern for IEEE compliant emulated operations
> > > --> Not enough time to discuss about it
> > >
> > > Don't hesitate to add something or bring more precision if you want.
> > >
> > > As I said at the end of the presentation, we have written a paper
> > > which explains our implementation in details. You can find it on the
> > > wiki page of the Cauldron
> > >
> (https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&tar
> get=Exposing+Complex+Numbers+to+Target+Back-ends+%28paper%29.pdf).
> >
> > Thanks for the detailed presentation at the Cauldron.
> >
> > My personal summary is that I'm less convinced delaying lowering is
> > the way to go.
>
> This is not only delayed lowering: if the SPNs (standard pattern names) are
> there, there is no lowering at all.
>
> > I do think that if targets implement complex optabs we should use them
> > but eventually re-discovering complex operations from lowered form is
> > going to be more useful.
>
> I would not be opposed to rediscovering complex operations, but I think that
> even though rediscovering a + b and a - b is easy, and a * b would still be
> doable, a / b will be hard. That said, I doubt we will see a hardware complex
> division, but who knows. However, once lowered, re-associating a * b * c and
> more complex expressions is going to be hard.
>
> > That's because as you said, use of _Complex is limited and people
> > inventing their own representation.
>
> Yes, this would be a step back at first, but, proper support for _Complex would
> probably be an incentive for library writers to take them into account.
>
> > SLP vectorization can discover some ops already with the limiting
> > factor being that we don't specifically search for only complex
> > operations (plus we expose the result as vector operations, requiring
> > target support for the vector ops rather than [SD]Cmode operations).
>
> Our only concern with SLP is that it only works within loops. If we want to
> rediscover complex numbers, we could either add a dedicated pass before the
> SLP vectorizer or rely on match.pd?

SLP doesn't work only in loops. SLP works on scalar statements inside BBs, starting
from sinks (constructors, stores, reductions, etc.).

I think you're confusing Loop-Aware SLP and SLP (in GCC these are two different
passes that share much common code).

Tamar
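
To make the point about basic-block SLP concrete, here is a minimal sketch in C of straight-line code with no loop at all. The names `my_complex` and `my_cmul` are hypothetical; they stand in for the kind of user-defined complex type mentioned earlier in the thread. The two adjacent stores at the end are the sort of sink that basic-block SLP can start from. Whether GCC actually vectorizes this, or matches it to a complex instruction, depends on the target, the flags and the GCC version.

```c
/* Hypothetical user-defined complex type (not C99 _Complex). */
typedef struct { float re, im; } my_complex;

/* Straight-line complex multiply: no loop anywhere in this function. */
void my_cmul(my_complex *restrict out,
             const my_complex *a, const my_complex *b)
{
    float re = a->re * b->re - a->im * b->im;
    float im = a->re * b->im + a->im * b->re;

    /* Two stores to adjacent memory: a possible seed for BB SLP. */
    out->re = re;
    out->im = im;
}
```

Compiling such a function with and without -ffast-math and comparing the vectorizer dumps is one way to see which cases the existing SLP pattern matching picks up.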
>
> >
> > There's the gimple-isel.cc or the widen-mul pass that perform
> > instruction selection which could be enhanced to discover scalar
> > [SD]Cmode operations.
>
> We'll have another look there.
>
> Thanks,
> Paul
> >
> > Richard.
> >
> > > Sylvain
* Re: Complex numbers support: discussions summary
2023-09-26 9:28 ` Tamar Christina
@ 2023-09-26 9:40 ` Paul Iannetta
0 siblings, 0 replies; 15+ messages in thread
From: Paul Iannetta @ 2023-09-26 9:40 UTC (permalink / raw)
To: Tamar Christina; +Cc: Richard Biener, Sylvain Noiry, gcc, sylvain.noiry
On Tue, Sep 26, 2023 at 09:28:08AM +0000, Tamar Christina wrote:
> > -----Original Message-----
> > From: Gcc <gcc-bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf
> > Of Paul Iannetta via Gcc
> > Sent: Tuesday, September 26, 2023 9:54 AM
> > To: Richard Biener <richard.guenther@gmail.com>
> > Cc: Sylvain Noiry <snoiry@kalrayinc.com>; gcc@gcc.gnu.org;
> > sylvain.noiry@hotmail.fr
> > Subject: Re: Complex numbers support: discussions summary
> >
> > On Tue, Sep 26, 2023 at 09:30:21AM +0200, Richard Biener via Gcc wrote:
> > > On Mon, Sep 25, 2023 at 5:17 PM Sylvain Noiry via Gcc <gcc@gcc.gnu.org>
> > wrote:
> > > >
> > > > Hi,
> > > >
> > > > We had very interesting discussions during our presentation with
> > > > Paul on the support of complex numbers in gcc at the Cauldron.
> > > >
> > > > Thank you all for your participation !
> > > >
> > > > Here is a small summary from our viewpoint:
> > > >
> > > > - Replace CONCAT with a backend-defined internal representation in RTL
> > > > --> No particular problems
> > > >
> > > > - Allow backends to write patterns for operations on complex modes
> > > > --> No particular problems
> > > >
> > > > - Conditional lowering depending on whether a pattern exists or not
> > > > --> Concerns when the vectorization of split complex operations
> > > >     performs better than non-vectorized unified complex operations
> > > >
> > > > - Centralize complex lowering in cplxlower
> > > > --> No particular problems if it doesn't prevent IEEE compliance and
> > > >     optimizations (like const folding)
> > > >
> > > > - Vectorization of complex operations
> > > > --> 2 representations (interleaved and separated real/imag): cannot
> > > >     impose one if some machines prefer the other
> > > > --> Complex modes are composite modes; the vectorizer assumes that the
> > > >     inner mode is scalar to do some optimizations (which ones?)
> > > > --> Mixed split/unified complex operations cannot be vectorized easily
> > > > --> Assuming that the inner representation of complex vectors is left
> > > >     to target backends, the vectorizer doesn't know it, which prevents
> > > >     some optimizations (which ones?)
> > > >
> > > > - Explicit vectors of complex
> > > > --> Cplxlower cannot lower them, and moving veclower before cplxlower
> > > >     is a bad idea as it prevents some optimizations
> > > > --> Teaching cplxlower how to deal with vectors of complex seems to be
> > > >     a reasonable alternative
> > > > --> Concerns about ABI or indexing if the internal representation is
> > > >     left to the backend and differs from the representation in memory
> > > >
> > > > - Impact of the current SLP pattern matching of complex operations
> > > > --> Only with -ffast-math
> > > > --> It can match user-defined operations (not C99) that can be
> > > >     simplified with a complex instruction
> > > > --> Dedicated opcode and real vector type chosen vs. standard opcode
> > > >     and complex mode in our implementation
> > > > --> Need to preserve SLP pattern matching, as too many applications
> > > >     redefine complex and bypass the C99 standard
> > > > --> So we need to harmonize it with our implementation
> > > >
> > > > - Support of the pure imaginary type (_Imaginary)
> > > > --> Still not supported by gcc (or llvm), nor by our implementation
> > > > --> Issues come from the fact that an imaginary is not a complex with
> > > >     its real part set to 0
> > > > --> The same issue arises with complex multiplication by a real (which
> > > >     is split in the frontend, and our implementation hasn't changed it
> > > >     yet)
> > > > --> Idea: add an attribute to the Tree complex type which specifies
> > > >     pure real / pure imaginary / full complex?
> > > >
> > > > - Fast pattern for IEEE compliant emulated operations
> > > > --> Not enough time to discuss it
> > > >
> > > > Don't hesitate to add something or bring more precision if you want.
> > > >
> > > > As I said at the end of the presentation, we have written a paper
> > > > which explains our implementation in detail. You can find it on the
> > > > wiki page of the Cauldron
> > > > (https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&target=Exposing+Complex+Numbers+to+Target+Back-ends+%28paper%29.pdf).
> > >
> > > Thanks for the detailed presentation at the Cauldron.
> > >
> > > My personal summary is that I'm less convinced delaying lowering is
> > > the way to go.
> >
> > This is not only delayed lowering: if the SPNs (standard pattern names) are
> > there, there is no lowering at all.
> >
> > > I do think that if targets implement complex optabs we should use them
> > > but eventually re-discovering complex operations from lowered form is
> > > going to be more useful.
> >
> > I would not be opposed to rediscovering complex operations, but I think that
> > even though rediscovering a + b and a - b is easy, and a * b would still be
> > doable, a / b will be hard. That said, I doubt we will see a hardware complex
> > division, but who knows. However, once lowered, re-associating a * b * c and
> > more complex expressions is going to be hard.
> >
> > > That's because as you said, use of _Complex is limited and people
> > > inventing their own representation.
> >
> > Yes, this would be a step back at first, but, proper support for _Complex would
> > probably be an incentive for library writers to take them into account.
> >
> > > SLP vectorization can discover some ops already with the limiting
> > > factor being that we don't specifically search for only complex
> > > operations (plus we expose the result as vector operations, requiring
> > > target support for the vector ops rather than [SD]Cmode operations).
> >
> > Our only concern with SLP is that it only works within loops. If we want to
> > rediscover complex numbers, we could either add a dedicated pass before the
> > SLP vectorizer or rely on match.pd?
>
> SLP doesn't work only in loops. SLP works on scalar statements inside BBs, starting
> from sinks (constructors, stores, reductions, etc.).
> I think you're confusing Loop-Aware SLP and SLP (in GCC these are two different
> passes that share much common code).
>
Indeed, we conflated both. Thanks for pointing this out!
Paul
> Tamar
> >
> > >
> > > There's the gimple-isel.cc or the widen-mul pass that perform
> > > instruction selection which could be enhanced to discover scalar
> > > [SD]Cmode operations.
> >
> > We'll have another look there.
> >
> > Thanks,
> > Paul
> > >
> > > Richard.
> > >
> > > > Sylvain
* Re: Complex numbers support: discussions summary
2023-09-26 7:30 ` Richard Biener
2023-09-26 8:29 ` Tamar Christina
2023-09-26 8:53 ` Paul Iannetta
@ 2023-09-26 18:40 ` Toon Moene
2023-10-05 14:45 ` Toon Moene
2 siblings, 1 reply; 15+ messages in thread
From: Toon Moene @ 2023-09-26 18:40 UTC (permalink / raw)
To: gcc
On 9/26/23 09:30, Richard Biener via Gcc wrote:
> On Mon, Sep 25, 2023 at 5:17 PM Sylvain Noiry via Gcc <gcc@gcc.gnu.org> wrote:
>> As I said at the end of the presentation, we have written a paper which
>> explains
>> our implementation in details. You can find it on the wiki page of the
>> Cauldron
>> (https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&target=Exposing+Complex+Numbers+to+Target+Back-ends+%28paper%29.pdf).
>
> Thanks for the detailed presentation at the Cauldron.
>
> My personal summary is that I'm less convinced delaying lowering is
> the way to go.
Thanks Sylvain for the quick summary of the discussion - it helps a
great deal now that the discussion is still fresh in our memory.

Some thoughts I came up with (of course, only after the end of the
conference):

In what way is the handling of the complex type different from that of
the 128-bit real (i.e., float) type?

Neither is implemented on most architectures; on most they require two
registers (or possibly two memory locations that do not necessarily have
to be adjacent) to be implemented.

Yet both are supported by the middle end - consider the clear
equivalence of the handling of variables a and b when looking at the
result of -fdump-tree-ssa (on x86_64) for:

cat 128.f90
      parameter (iq=kind(1q0))
      real(kind=iq) :: a, b
      read*, a, b
      print*, a / b
      end

and:

cat complex.f90
      complex a,b
      read*,a,b
      print*,a/b
      end

Hope this helps for a continuing fruitful discussion.

Kind regards,
--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
* Re: Complex numbers support: discussions summary
2023-09-26 18:40 ` Toon Moene
@ 2023-10-05 14:45 ` Toon Moene
0 siblings, 0 replies; 15+ messages in thread
From: Toon Moene @ 2023-10-05 14:45 UTC (permalink / raw)
To: gcc; +Cc: gfortran
On 9/26/23 20:40, Toon Moene wrote:
> On 9/26/23 09:30, Richard Biener via Gcc wrote:
>
>> On Mon, Sep 25, 2023 at 5:17 PM Sylvain Noiry via Gcc
>> <gcc@gcc.gnu.org> wrote:
>
>>> As I said at the end of the presentation, we have written a paper which
>>> explains
>>> our implementation in details. You can find it on the wiki page of the
>>> Cauldron
>>> (https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&target=Exposing+Complex+Numbers+to+Target+Back-ends+%28paper%29.pdf).
>>
>> Thanks for the detailed presentation at the Cauldron.
>>
>> My personal summary is that I'm less convinced delaying lowering is
>> the way to go.
>
> Thanks Sylvain for the quick summary of the discussion - it helps a
> great deal now that the discussion is still fresh in our memory.
I found time today to run some tests.
First of all, the result of the gcc test harness as applied to the top
of the complex/kvx branch in the https://github.com/kalray/gcc repository:
https://gcc.gnu.org/pipermail/gcc-testresults/2023-October/797627.html
I think there are several complex failures here that are not in
"standard" 12.2 release (for x86_64-linux-gnu).
I also compiled all of lapack-3.11.0 with that compiler and obtained the
same results as with gcc/gfortran 13.2:

                    --> LAPACK TESTING SUMMARY <--
    Processing LAPACK Testing output found in the TESTING directory

SUMMARY              nb test run    numerical error     other error
================     ===========    =================   ================
REAL                 1327023        0    (0.000%)       0    (0.000%)
DOUBLE PRECISION     1300917        6    (0.000%)       0    (0.000%)
COMPLEX              786775         0    (0.000%)       0    (0.000%)
COMPLEX16            787842         0    (0.000%)       0    (0.000%)

--> ALL PRECISIONS   4202557        6    (0.000%)       0    (0.000%)

Kind regards,
--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
* Re: Complex numbers support: discussions summary
2023-09-25 15:15 Complex numbers support: discussions summary Sylvain Noiry
2023-09-26 7:30 ` Richard Biener
@ 2023-09-26 15:46 ` Joseph Myers
1 sibling, 0 replies; 15+ messages in thread
From: Joseph Myers @ 2023-09-26 15:46 UTC (permalink / raw)
To: Sylvain Noiry; +Cc: gcc, sylvain.noiry
[-- Attachment #1: Type: text/plain, Size: 920 bytes --]
On Mon, 25 Sep 2023, Sylvain Noiry via Gcc wrote:
> --> Idea: Add an attribute to the Tree complex type which specify pure
>     real / pure imaginary / full complex ?
If you start from the implementation approach of lowering imaginary type
operations in the front end, a flag on a REAL_TYPE would seem natural (and
then the rest of the compiler could treat such types the same as other
real types with the same machine modes).
That would however mean that e.g. conversions to/from imaginary types in
the front end are not the same thing as what you get from middle-end
conversions (the former would apply the rules that converting imaginary to
real or real to imaginary produces zero while preserving side effects from
the expression converted; the latter would be a no-op conversion if the
machine modes are the same), which has the potential to be confusing.
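
The difference between the two kinds of conversion can be sketched in C. Since GCC has no `_Imaginary` type, the struct `imaginary` below is a hypothetical stand-in for one, and all function names are invented for illustration. `imaginary_to_real` follows the C Annex G rule (the result is zero, and the operand is still evaluated), while `noop_conversion` models what a plain same-mode conversion would do if the type were only a flagged real type.

```c
/* Hypothetical stand-in for an imaginary value b*i: only b is stored. */
typedef struct { double im; } imaginary;

int calls = 0;   /* counts evaluations, to make side effects visible */

/* An expression with a side effect that yields the imaginary value 2.5i. */
imaginary next_imaginary(void)
{
    imaginary r = { 2.5 };
    calls++;
    return r;
}

/* Front-end style conversion, imaginary -> real: the operand has already
   been evaluated by the caller (side effects kept), the result is zero. */
double imaginary_to_real(imaginary x)
{
    (void) x;
    return 0.0;
}

/* Front-end style conversion, real -> imaginary: also zero. */
imaginary real_to_imaginary(double x)
{
    imaginary r = { 0.0 };
    (void) x;
    return r;
}

/* What a no-op same-mode conversion would give: the stored value
   passes through unchanged, which is not the language semantics. */
double noop_conversion(imaginary x)
{
    return x.im;
}
```

Calling `imaginary_to_real(next_imaginary())` returns zero while still incrementing `calls`, whereas `noop_conversion(next_imaginary())` returns 2.5; that mismatch is the potential confusion described above.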
--
Joseph S. Myers
joseph@codesourcery.com
* Complex numbers support: discussions summary
@ 2023-10-16 9:14 Sylvain Noiry
2023-10-17 20:37 ` Toon Moene
0 siblings, 1 reply; 15+ messages in thread
From: Sylvain Noiry @ 2023-10-16 9:14 UTC (permalink / raw)
To: gcc; +Cc: piannetta
Hi,
We are trying to update our patches on complex numbers to take into
account what has been discussed.
The main change from our previous patches consists of replacing vectors
of complex types with classical vectors of real types (e.g. V4SF instead
of V2SC) associated with existing complex opcodes (like .COMPLEX_MUL)
when vectorizing. Non-vectorized complex modes are also replaced by
vectors of two reals at the end of the middle-end (e.g. SC to V2SF), so
that it can reuse already existing patterns. Indeed, operations that are
not complex-specific, like an addition, do not require a specific
pattern anymore, and already implemented patterns like cmul, cmul_conj,
cadd90, ... can be used.
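
As a small illustration of why an addition needs no complex-specific pattern, here is a sketch in C, assuming only the C guarantee that a complex type has the same representation as an array of two elements of the corresponding real type. The function name `cadd_as_reals` is hypothetical and this is not code from the patches.

```c
#include <complex.h>
#include <string.h>

/* Add n complex floats by treating each one as two interleaved reals
   (real part first, imaginary part second). */
void cadd_as_reals(float _Complex *out, const float _Complex *a,
                   const float _Complex *b, int n)
{
    for (int i = 0; i < n; i++) {
        float ra[2], rb[2], ro[2];

        /* memcpy sidesteps any aliasing question; the layout is the same. */
        memcpy(ra, &a[i], sizeof ra);
        memcpy(rb, &b[i], sizeof rb);

        ro[0] = ra[0] + rb[0];   /* real parts */
        ro[1] = ra[1] + rb[1];   /* imaginary parts */

        memcpy(&out[i], ro, sizeof ro);
    }
}
```

The result is the same as `a[i] + b[i]` computed on the complex type, which is why a plain vector-of-reals add can serve both cases. A multiplication does not decompose lane by lane like this, hence the dedicated opcodes.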
To do so, the cplxlower pass has been cut into two passes:
- The first one replaces complex-specific operations with dedicated
opcodes (like .COMPLEX_MUL replacing MUL_EXPR with SC mode), but complex
modes are kept at this point. Operations without native support are also
lowered, because we assume that it's better to lower and hope for
standard optimizations in the middle-end than to try to vectorize with a
near-zero chance of success, and lower only afterwards.
- The second one does little more than remap non-vectorized complex
modes into vectors of two reals (like SC to V2SF).
So the vectorizer takes complex modes as input but vectorizes with
vectors of real modes (e.g. V4SF vector mode for SC). Because the
complex-specific opcodes have been set beforehand, no confusion with real
operations is possible. We may also use vectors of two reals as inputs,
but vectorizing small vector modes into bigger ones (like V2SF to V4SF)
is not possible.
Here are some advantages of this new approach:
- No more vectors of complex modes
- The vectorization of complex operations is improved, because split
and unified vectorized statements can easily be mixed as they use the
same vector type. We can also imagine testing multiple options (first:
native vectorized, second: split vectorized, third: unified scalar, ...).
- It reuses patterns for vectors of two reals for operations that are
not complex-specific, and also already existing complex patterns like
cmul implemented on aarch64, which could mean almost free performance
gains on many targets.
On the performance side, we can still exploit the full potential of
complex instructions on KVX. To illustrate the gains on aarch64 without
rewriting any patterns (except a mov), here is the assembly generated
for a vectorized complex mul mul add with -O2 -mcpu=neoverse-v1 (and
without -ffast-math, which the SLP pattern matching requires):

void vfmma (_Complex float a[restrict N], _Complex float b[restrict N],
            _Complex float c[restrict N], _Complex float d[restrict N])
{
  for (int i = 0; i < N; i++)
    c[i] += a[i] * b[i] * d[i];
}

vfmma:
        movi    v3.4s, 0
        mov     x4, 0
        .align  5
.L2:
        ldr     q2, [x1, x4]
        mov     v1.16b, v3.16b
        ldr     q0, [x0, x4]
        fcmla   v1.4s, v0.4s, v2.4s, #0
        fcmla   v1.4s, v0.4s, v2.4s, #90
        ldr     q0, [x2, x4]
        ldr     q2, [x3, x4]
        fcmla   v0.4s, v2.4s, v1.4s, #0
        fcmla   v0.4s, v2.4s, v1.4s, #90
        str     q0, [x2, x4]
        add     x4, x4, 16
        cmp     x4, 256
        bne     .L2
        ret
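
For readers less familiar with fcmla, the loop body can be modelled in scalar C roughly as follows. This is only a sketch: the type `cf` and the function names are hypothetical, N = 32 is an assumption inferred from the 256-byte loop bound with 8-byte elements, and the model ignores the fused rounding of the real instruction and any special handling of infinities or NaNs.

```c
#define N 32   /* assumption: 256 bytes / 8 bytes per complex float */

typedef struct { float re, im; } cf;   /* one complex lane pair */

/* Model of "fcmla acc, n, m, #0": uses only the real part of n. */
cf fcmla_rot0(cf acc, cf n, cf m)
{
    acc.re += n.re * m.re;
    acc.im += n.re * m.im;
    return acc;
}

/* Model of "fcmla acc, n, m, #90": uses only the imaginary part of n. */
cf fcmla_rot90(cf acc, cf n, cf m)
{
    acc.re -= n.im * m.im;
    acc.im += n.im * m.re;
    return acc;
}

/* The #0 / #90 pair together computes acc + n * m. */
cf cmla(cf acc, cf n, cf m)
{
    return fcmla_rot90(fcmla_rot0(acc, n, m), n, m);
}

/* Scalar model of the loop: c[i] += a[i] * b[i] * d[i]. */
void vfmma_model(const cf a[N], const cf b[N], cf c[N], const cf d[N])
{
    const cf zero = { 0.0f, 0.0f };

    for (int i = 0; i < N; i++) {
        cf t = cmla(zero, a[i], b[i]);   /* first fcmla pair: t = a * b     */
        c[i] = cmla(c[i], d[i], t);      /* second pair:      c = c + d * t */
    }
}
```

Each vector instruction in the listing handles two such lane pairs at once, which is why the index advances by 16 bytes per iteration.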
We have only done some experimentation with this approach. If you think
that it could be interesting we will try to develop it more.
Thanks,
Sylvain
* Re: Complex numbers support: discussions summary
2023-10-16 9:14 Sylvain Noiry
@ 2023-10-17 20:37 ` Toon Moene
2023-10-18 7:24 ` Sylvain Noiry
0 siblings, 1 reply; 15+ messages in thread
From: Toon Moene @ 2023-10-17 20:37 UTC (permalink / raw)
To: Sylvain Noiry, gcc; +Cc: piannetta
Sylvain,
Is this on a branch in your github repository
https://github.com/kalray/gcc
somewhere ?
That would make it easier to test it for me (and probably others).
See for instance my mail here (dated Thu Oct 5 14:45:05 GMT 2023):
https://gcc.gnu.org/pipermail/gcc/2023-October/242643.html
Thanks in advance.
Kind regards,
Toon Moene.
On 10/16/23 11:14, Sylvain Noiry via Gcc wrote:
> Hi,
>
> We are trying to update our patches on complex numbers to take into
> account what has been discussed.
>
> The main change from our previous patches consists of replacing vectors
> of complex types with classical vectors of real types (e.g. V4SF instead
> of V2SC) associated with existing complex opcodes (like .COMPLEX_MUL)
> when vectorizing. Non-vectorized complex modes are also replaced by
> vectors of two reals at the end of the middle-end (e.g. SC to V2SF), so
> that it can reuse already existing patterns. Indeed, operations that are
> not complex-specific, like an addition, do not require a specific
> pattern anymore, and already implemented patterns like cmul, cmul_conj,
> cadd90, ... can be used.
>
> To do so, the cplxlower pass has been cut into two passes:
> - The first one replaces complex-specific operations with dedicated
> opcodes (like .COMPLEX_MUL replacing MUL_EXPR with SC mode), but complex
> modes are kept at this point. Operations without native support are also
> lowered, because we assume that it's better to lower and hope for
> standard optimizations in the middle-end than to try to vectorize with a
> near-zero chance of success, and lower only afterwards.
> - The second one does little more than remap non-vectorized complex
> modes into vectors of two reals (like SC to V2SF).
>
> So the vectorizer takes complex modes as input but vectorizes with
> vectors of real modes (e.g. V4SF vector mode for SC). Because the
> complex-specific opcodes have been set beforehand, no confusion with real
> operations is possible. We may also use vectors of two reals as inputs,
> but vectorizing small vector modes into bigger ones (like V2SF to V4SF)
> is not possible.
>
> Here are some advantages of this new approach:
> - No more vectors of complex modes
> - The vectorization of complex operations is improved, because split
> and unified vectorized statements can easily be mixed as they use the
> same vector type. We can also imagine testing multiple options (first:
> native vectorized, second: split vectorized, third: unified scalar, ...).
> - It reuses patterns for vectors of two reals for operations that are
> not complex-specific, and also already existing complex patterns like
> cmul implemented on aarch64, which could mean almost free performance
> gains on many targets.
>
> On the performance side, we can still exploit the full potential of
> complex instructions on KVX. To illustrate the gains on aarch64 without
> rewriting any patterns (except a mov), here is the assembly generated
> for a vectorized complex mul mul add with -O2 -mcpu=neoverse-v1 (and
> without -ffast-math, which the SLP pattern matching requires):
>
> void vfmma (_Complex float a[restrict N], _Complex float b[restrict N],
>             _Complex float c[restrict N], _Complex float d[restrict N])
> {
>   for (int i = 0; i < N; i++)
>     c[i] += a[i] * b[i] * d[i];
> }
>
> vfmma:
>         movi    v3.4s, 0
>         mov     x4, 0
>         .align  5
> .L2:
>         ldr     q2, [x1, x4]
>         mov     v1.16b, v3.16b
>         ldr     q0, [x0, x4]
>         fcmla   v1.4s, v0.4s, v2.4s, #0
>         fcmla   v1.4s, v0.4s, v2.4s, #90
>         ldr     q0, [x2, x4]
>         ldr     q2, [x3, x4]
>         fcmla   v0.4s, v2.4s, v1.4s, #0
>         fcmla   v0.4s, v2.4s, v1.4s, #90
>         str     q0, [x2, x4]
>         add     x4, x4, 16
>         cmp     x4, 256
>         bne     .L2
>         ret
>
> We have only done some experimentation with this approach. If you think
> that it could be interesting we will try to develop it more.
>
> Thanks,
>
> Sylvain
--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
* Re: Complex numbers support: discussions summary
2023-10-17 20:37 ` Toon Moene
@ 2023-10-18 7:24 ` Sylvain Noiry
0 siblings, 0 replies; 15+ messages in thread
From: Sylvain Noiry @ 2023-10-18 7:24 UTC (permalink / raw)
To: Toon Moene, gcc; +Cc: piannetta
Hello Toon,

The implementation is not finished; we have only run some tests so far.
If no one sees major problems with this new approach, we will continue to
implement and stabilize it.

Thank you for your interest!

Sylvain
On 10/17/23 22:37, Toon Moene wrote:
> Sylvain,
>
> Is this on a branch in your github repository
>
> https://github.com/kalray/gcc
>
> somewhere ?
>
> That would make it easier to test it for me (and probably others).
>
> See for instance my mail here (dated Thu Oct 5 14:45:05 GMT 2023):
>
> https://gcc.gnu.org/pipermail/gcc/2023-October/242643.html
>
> Thanks in advance.
>
> Kind regards,
>
> Toon Moene.
>
> On 10/16/23 11:14, Sylvain Noiry via Gcc wrote:
>
>> Hi,
>>
>> We are trying to update our patches on complex numbers to take into
>> account what has been discussed.
>>
>> The main change from our previous patches consists of replacing
>> vectors of complex types with classical vectors of real types (e.g.
>> V4SF instead of V2SC) associated with existing complex opcodes (like
>> .COMPLEX_MUL) when vectorizing. Non-vectorized complex modes are also
>> replaced by vectors of two reals at the end of the middle-end (e.g. SC
>> to V2SF), so that it can reuse already existing patterns. Indeed,
>> operations that are not complex-specific, like an addition, do not
>> require a specific pattern anymore, and already implemented patterns
>> like cmul, cmul_conj, cadd90, ... can be used.
>>
>> To do so, the cplxlower pass has been cut into two passes:
>> - The first one replaces complex-specific operations with dedicated
>> opcodes (like .COMPLEX_MUL replacing MUL_EXPR with SC mode), but
>> complex modes are kept at this point. Operations without native
>> support are also lowered, because we assume that it's better to lower
>> and hope for standard optimizations in the middle-end than to try to
>> vectorize with a near-zero chance of success, and lower only afterwards.
>> - The second one does little more than remap non-vectorized complex
>> modes into vectors of two reals (like SC to V2SF).
>>
>> So the vectorizer takes complex modes as input but vectorizes with
>> vectors of real modes (e.g. V4SF vector mode for SC). Because the
>> complex-specific opcodes have been set beforehand, no confusion with
>> real operations is possible. We may also use vectors of two reals as
>> inputs, but vectorizing small vector modes into bigger ones (like
>> V2SF to V4SF) is not possible.
>>
>> Here are some advantages of this new approach:
>> - No more vectors of complex modes
>> - The vectorization of complex operations is improved, because
>> split and unified vectorized statements can easily be mixed as they
>> use the same vector type. We can also imagine testing multiple options
>> (first: native vectorized, second: split vectorized, third: unified
>> scalar, ...).
>> - It reuses patterns for vectors of two reals for operations that
>> are not complex-specific, and also already existing complex patterns
>> like cmul implemented on aarch64, which could mean almost free
>> performance gains on many targets.
>>
>> On the performance side, we can still exploit the full potential of
>> complex instructions on KVX. To illustrate the gains on aarch64
>> without rewriting any patterns (except a mov), here is the assembly
>> generated for a vectorized complex mul mul add with -O2
>> -mcpu=neoverse-v1 (and without -ffast-math, which the SLP pattern
>> matching requires):
>>
>> void vfmma (_Complex float a[restrict N], _Complex float b[restrict N],
>>             _Complex float c[restrict N], _Complex float d[restrict N])
>> {
>>   for (int i = 0; i < N; i++)
>>     c[i] += a[i] * b[i] * d[i];
>> }
>>
>> vfmma:
>>         movi    v3.4s, 0
>>         mov     x4, 0
>>         .align  5
>> .L2:
>>         ldr     q2, [x1, x4]
>>         mov     v1.16b, v3.16b
>>         ldr     q0, [x0, x4]
>>         fcmla   v1.4s, v0.4s, v2.4s, #0
>>         fcmla   v1.4s, v0.4s, v2.4s, #90
>>         ldr     q0, [x2, x4]
>>         ldr     q2, [x3, x4]
>>         fcmla   v0.4s, v2.4s, v1.4s, #0
>>         fcmla   v0.4s, v2.4s, v1.4s, #90
>>         str     q0, [x2, x4]
>>         add     x4, x4, 16
>>         cmp     x4, 256
>>         bne     .L2
>>         ret
>>
>> We have only done some experimentation with this approach. If you
>> think that it could be interesting we will try to develop it more.
>>
>> Thanks,
>>
>> Sylvain
* Complex numbers support: discussions summary
@ 2023-10-09 13:29 Sylvain Noiry
0 siblings, 0 replies; 15+ messages in thread
From: Sylvain Noiry @ 2023-10-09 13:29 UTC (permalink / raw)
To: gcc, toon
[-- Attachment #1: Type: text/plain, Size: 2653 bytes --]
> On 9/26/23 20:40, Toon Moene wrote:
>
>> On 9/26/23 09:30, Richard Biener via Gcc wrote:
>>
>>> On Mon, Sep 25, 2023 at 5:17 PM Sylvain Noiry via Gcc
>>> <gcc@gcc.gnu.org> wrote:
>>
>>>> As I said at the end of the presentation, we have written a paper which
>>>> explains our implementation in details. You can find it on the wiki page
>>>> of the Cauldron
>>>> (https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&target=Exposing+Complex+Numbers+to+Target+Back-ends+%28paper%29.pdf).
>>>
>>> Thanks for the detailed presentation at the Cauldron.
>>>
>>> My personal summary is that I'm less convinced delaying lowering is
>>> the way to go.
>>
>> Thanks Sylvain for the quick summary of the discussion - it helps a
>> great deal now that the discussion is still fresh in our memory.
>
> I found time today to run some tests.
>
> First of all, the result of the gcc test harness as applied to the top
> of the complex/kvx branch in the https://github.com/kalray/gcc repository:
>
> https://gcc.gnu.org/pipermail/gcc-testresults/2023-October/797627.html
>
> I think there are several complex failures here that are not in
> "standard" 12.2 release (for x86_64-linux-gnu).

We have removed some special cases for complex operations outside the
cplxlower pass (especially in tree-ssa-forwprop.cc), because they ruined
our efforts to keep complex operations unlowered. So the performance is
fine on the KVX target, but some (SLP) vectorization cases are missed on
other targets which do not exploit complex patterns.
It may be worth guarding these cases with a condition rather than just
removing them.

> I also compiled all of lapack-3.11.0 with that compiler and obtained the
> same results as with gcc/gfortran 13.2:
>
>                     --> LAPACK TESTING SUMMARY <--
>     Processing LAPACK Testing output found in the TESTING directory
>
> SUMMARY              nb test run    numerical error     other error
> ================     ===========    =================   ================
> REAL                 1327023        0    (0.000%)       0    (0.000%)
> DOUBLE PRECISION     1300917        6    (0.000%)       0    (0.000%)
> COMPLEX              786775         0    (0.000%)       0    (0.000%)
> COMPLEX16            787842         0    (0.000%)       0    (0.000%)
>
> --> ALL PRECISIONS   4202557        6    (0.000%)       0    (0.000%)

Thank you! This doesn't surprise me, because GCC still processes complex
operations as before when the backend does not exploit complex patterns.

Best regards,
Sylvain
* Complex numbers support: discussions summary
@ 2023-09-25 14:56 Sylvain Noiry
0 siblings, 0 replies; 15+ messages in thread
From: Sylvain Noiry @ 2023-09-25 14:56 UTC (permalink / raw)
To: gcc; +Cc: piannetta, Benoit Dinechin
[-- Attachment #1: Type: text/plain, Size: 3077 bytes --]
Hi,

We had very interesting discussions during our presentation with Paul on
the support of complex numbers in gcc at the Cauldron.

Thank you all for your participation!

Here is a small summary from our viewpoint:

- Replace CONCAT with a backend-defined internal representation in RTL
--> No particular problems

- Allow backends to write patterns for operations on complex modes
--> No particular problems

- Conditional lowering depending on whether a pattern exists or not
--> Concerns when the vectorization of split complex operations
    performs better than non-vectorized unified complex operations

- Centralize complex lowering in cplxlower
--> No particular problems if it doesn't prevent IEEE compliance and
    optimizations (like const folding)

- Vectorization of complex operations
--> 2 representations (interleaved and separated real/imag): cannot
    impose one if some machines prefer the other
--> Complex modes are composite modes; the vectorizer assumes that the
    inner mode is scalar to do some optimizations (which ones?)
--> Mixed split/unified complex operations cannot be vectorized easily
--> Assuming that the inner representation of complex vectors is left
    to target backends, the vectorizer doesn't know it, which prevents
    some optimizations (which ones?)

- Explicit vectors of complex
--> Cplxlower cannot lower them, and moving veclower before cplxlower
    is a bad idea as it prevents some optimizations
--> Teaching cplxlower how to deal with vectors of complex seems to be
    a reasonable alternative
--> Concerns about ABI or indexing if the internal representation is
    left to the backend and differs from the representation in memory

- Impact of the current SLP pattern matching of complex operations
--> Only with -ffast-math
--> It can match user-defined operations (not C99) that can be
    simplified with a complex instruction
--> Dedicated opcode and real vector type chosen vs. standard opcode
    and complex mode in our implementation
--> Need to preserve SLP pattern matching, as too many applications
    redefine complex and bypass the C99 standard
--> So we need to harmonize it with our implementation

- Support of the pure imaginary type (_Imaginary)
--> Still not supported by gcc (or llvm), nor by our implementation
--> Issues come from the fact that an imaginary is not a complex with
    its real part set to 0
--> The same issue arises with complex multiplication by a real (which
    is split in the frontend, and our implementation hasn't changed it
    yet)
--> Idea: add an attribute to the Tree complex type which specifies
    pure real / pure imaginary / full complex?

- Fast pattern for IEEE compliant emulated operations
--> Not enough time to discuss it

Don't hesitate to add something or bring more precision if you want.

As I said at the end of the presentation, we have written a paper which
explains our implementation in detail. You can find it attached to this
mail (and later on the wiki page of the Cauldron).

Sylvain
[-- Attachment #2: exposing_complex_numbers_to_target_backends_GNU_Cauldron_2023.pdf --]
[-- Type: application/pdf, Size: 715823 bytes --]
end of thread, other threads:[~2023-10-18 7:24 UTC | newest]
Thread overview: 15+ messages
2023-09-25 15:15 Complex numbers support: discussions summary Sylvain Noiry
2023-09-26 7:30 ` Richard Biener
2023-09-26 8:29 ` Tamar Christina
2023-09-26 10:19 ` Paul Iannetta
2023-09-26 8:53 ` Paul Iannetta
2023-09-26 9:28 ` Tamar Christina
2023-09-26 9:40 ` Paul Iannetta
2023-09-26 18:40 ` Toon Moene
2023-10-05 14:45 ` Toon Moene
2023-09-26 15:46 ` Joseph Myers
-- strict thread matches above, loose matches on Subject: below --
2023-10-16 9:14 Sylvain Noiry
2023-10-17 20:37 ` Toon Moene
2023-10-18 7:24 ` Sylvain Noiry
2023-10-09 13:29 Sylvain Noiry
2023-09-25 14:56 Sylvain Noiry