public inbox for gcc@gcc.gnu.org
* Complex numbers support: discussions summary
@ 2023-09-25 15:15 Sylvain Noiry
  2023-09-26  7:30 ` Richard Biener
  2023-09-26 15:46 ` Joseph Myers
  0 siblings, 2 replies; 15+ messages in thread
From: Sylvain Noiry @ 2023-09-25 15:15 UTC
  To: gcc; +Cc: sylvain.noiry

Hi,

We had very interesting discussions during our presentation with Paul on the
support of complex numbers in gcc at the Cauldron.

Thank you all for your participation!

Here is a small summary from our viewpoint:

- Replace CONCAT with a backend-defined internal representation in RTL
--> No particular problems

- Allow backends to write patterns for operations on complex modes
--> No particular problems

- Conditional lowering depending on whether a pattern exists or not
--> Concerns when the vectorization of split complex operations performs
    better than non-vectorized unified complex operations

- Centralize complex lowering in cplxlower
--> No particular problems if it doesn't prevent IEEE compliance and
    optimizations (like const folding)

- Vectorization of complex operations
--> 2 representations (interleaved and separated real/imag): cannot impose
    one if some machines prefer the other
--> Complex modes are composite modes, and the vectorizer assumes that the
    inner mode is scalar to do some optimizations (which ones?)
--> Mixed split/unified complex operations cannot be vectorized easily
--> Assuming that the inner representation of complex vectors is left to
    target backends, the vectorizer doesn't know it, which prevents some
    optimizations (which ones?)

- Explicit vectors of complex
--> Cplxlower cannot lower them, and moving veclower before cplxlower is a
    bad idea as it prevents some optimizations
--> Teaching cplxlower how to deal with vectors of complex seems to be a
    reasonable alternative
--> Concerns about ABI or indexing if the internal representation is left
    to the backend and differs from the representation in memory

- Impact of the current SLP pattern matching of complex operations
--> Only with -ffast-math
--> It can match user-defined operations (not C99) that can be simplified
    with a complex instruction
--> Dedicated opcode and real vector type chosen, versus standard opcode
    and complex mode in our implementation
--> Need to preserve SLP pattern matching, as too many applications
    redefine complex and bypass the C99 standard
--> So need to harmonize with our implementation

- Support of the pure imaginary type (_Imaginary)
--> Still not supported by gcc (and llvm), nor in our implementation
--> Issues come from the fact that an imaginary is not a complex with its
    real part set to 0
--> The same issue arises with complex multiplication by a real (which is
    split in the frontend, and our implementation hasn't changed it yet)
--> Idea: Add an attribute to the Tree complex type which specifies pure
    real / pure imaginary / full complex?

- Fast pattern for IEEE compliant emulated operations
--> Not enough time to discuss it

Don't hesitate to add something or clarify any point if you want.

As I said at the end of the presentation, we have written a paper which
explains our implementation in detail. You can find it on the wiki page of
the Cauldron
(https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&target=Exposing+Complex+Numbers+to+Target+Back-ends+%28paper%29.pdf).

Sylvain






* Complex numbers support: discussions summary
@ 2023-10-16  9:14 Sylvain Noiry
  2023-10-17 20:37 ` Toon Moene
  0 siblings, 1 reply; 15+ messages in thread
From: Sylvain Noiry @ 2023-10-16  9:14 UTC
  To: gcc; +Cc: piannetta

Hi,

We are trying to update our patches on complex numbers to take into 
account what has been discussed.

The main change from our previous patches consists of replacing vectors 
of complex types with classical vectors of real types (e.g. V4SF instead 
of V2SC) associated with existing complex opcodes (like .COMPLEX_MUL) 
when vectorizing.  Non-vectorized complex modes are also replaced by 
vectors of two reals at the end of the middle-end (e.g. SC to V2SF), so 
that existing patterns can be reused.  Indeed, operations that are not 
complex-specific, like an addition, do not require a specific pattern 
anymore, and already implemented patterns like cmul, cmul_conj, 
cadd90,... can be used.
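
As a rough illustration of the SC to V2SF remapping (a sketch only, not taken 
from the patches): C guarantees that a float _Complex has the same layout as 
an array of two floats, real part first, so a complex addition is just an 
element-wise addition on two lanes.  The type v2sf_t and the helper names 
below are hypothetical stand-ins, not GCC internals.

```c
#include <complex.h>
#include <string.h>

/* Hypothetical stand-in for the V2SF mode: two float lanes. */
typedef struct { float lane[2]; } v2sf_t;

/* Reinterpret a complex float as two floats (real, then imaginary). */
v2sf_t sc_to_v2sf(float _Complex z)
{
  v2sf_t v;
  memcpy(v.lane, &z, sizeof v.lane);
  return v;
}

/* Reinterpret two floats as a complex float. */
float _Complex v2sf_to_sc(v2sf_t v)
{
  float _Complex z;
  memcpy(&z, v.lane, sizeof z);
  return z;
}

/* Plain element-wise add: no complex-specific pattern is needed. */
v2sf_t v2sf_add(v2sf_t a, v2sf_t b)
{
  v2sf_t r = { { a.lane[0] + b.lane[0], a.lane[1] + b.lane[1] } };
  return r;
}
```

Calling v2sf_to_sc (v2sf_add (sc_to_v2sf (x), sc_to_v2sf (y))) gives the same 
value as x + y, which is why an ordinary two-lane add pattern is enough.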

To do so, the cplxlower pass has been split into two passes:
   - The first one replaces complex-specific operations with dedicated 
opcodes (like .COMPLEX_MUL replacing MUL_EXPR with SC mode), but complex 
modes are kept at this point.  Operations without native support are also 
lowered, because we assume that it's better to lower and hope for 
standard optimizations in the middle-end than to try to vectorize with 
a near-zero chance of success and lower only afterwards.
   - The second one does little more than remap non-vectorized complex 
modes to vectors of two reals (like SC to V2SF).
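
For readers less familiar with what "lowering" means here, a minimal sketch 
of the split form of a complex multiplication, written by hand in C.  This is 
an assumption-laden illustration: lowered_mul is a hypothetical name, and the 
sketch leaves out the NaN recovery that an IEEE-compliant lowering also needs.

```c
#include <complex.h>
#include <string.h>

/* Complex multiply expressed only with real operations (split form). */
float _Complex lowered_mul(float _Complex a, float _Complex b)
{
  float ar = crealf(a), ai = cimagf(a);
  float br = crealf(b), bi = cimagf(b);

  /* Four real multiplications, one subtraction, one addition. */
  float parts[2];
  parts[0] = ar * br - ai * bi;   /* real part */
  parts[1] = ar * bi + ai * br;   /* imaginary part */

  /* Reassemble the result: a complex float is laid out as {real, imag}. */
  float _Complex res;
  memcpy(&res, parts, sizeof res);
  return res;
}
```

Once a multiplication is in this form, the middle-end only sees scalar float 
operations, which is what makes the usual optimizations applicable.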

So the vectorizer takes complex modes as input but vectorizes with 
vectors of real modes (e.g. V4SF vector mode for SC).  Because 
complex-specific opcodes have been set before, no confusion with real 
operations is possible.  We could also use vectors of two reals as inputs, 
but vectorizing small vector modes into bigger ones (like V2SF to V4SF) is 
not possible.

Here are some advantages of this new approach:
   - No more vectors of complex modes
   - The vectorization of complex operations is improved, because split 
and unified vectorized statements can easily be mixed as they use the same 
vector type.  We can also imagine testing multiple options (first: native 
vectorized, second: split vectorized, third: unified scalar,...).
   - It reuses patterns for vectors of two reals for operations that are 
not complex-specific, and also already existing complex patterns like 
cmul implemented on aarch64, which could mean almost free performance 
gains on many targets.

On the performance side, we can still exploit the full potential of 
complex instructions on KVX.  To illustrate the gains on aarch64 without 
rewriting any patterns (except a mov), here is the assembly generated 
for a vectorized complex multiply-multiply-add with -O2 
-mcpu=neoverse-v1 (and without -ffast-math, which SLP pattern matching 
requires):

void vfmma (_Complex float a[restrict N], _Complex float b[restrict N],
            _Complex float c[restrict N], _Complex float d[restrict N])
{
   for (int i = 0; i < N; i++)
     c[i] += a[i] * b[i] * d[i];
}


vfmma:
         movi    v3.4s, 0
         mov     x4, 0
         .align  5
.L2:
         ldr     q2, [x1, x4]
         mov     v1.16b, v3.16b
         ldr     q0, [x0, x4]
         fcmla   v1.4s, v0.4s, v2.4s, #0
         fcmla   v1.4s, v0.4s, v2.4s, #90
         ldr     q0, [x2, x4]
         ldr     q2, [x3, x4]
         fcmla   v0.4s, v2.4s, v1.4s, #0
         fcmla   v0.4s, v2.4s, v1.4s, #90
         str     q0, [x2, x4]
         add     x4, x4, 16
         cmp     x4, 256
         bne     .L2
         ret
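
To help read the loop above: each pair of fcmla instructions (#0 then #90) 
accumulates one complex product.  The first pair computes a * b into a zeroed 
register, and the second pair adds d * (a * b) to c.  Below is a minimal 
scalar model of one complex lane, as a sketch only: lane_t and the function 
names are hypothetical, and the model ignores the fused rounding of the real 
instruction.

```c
/* Hypothetical model of one complex lane (two adjacent floats of a .4s
   register, real part first). */
typedef struct { float re, im; } lane_t;

/* Models "fcmla acc, a, b, #0" on one lane: uses the real part of a. */
lane_t fcmla_rot0(lane_t acc, lane_t a, lane_t b)
{
  acc.re += a.re * b.re;
  acc.im += a.re * b.im;
  return acc;
}

/* Models "fcmla acc, a, b, #90" on one lane: uses the imaginary part of a. */
lane_t fcmla_rot90(lane_t acc, lane_t a, lane_t b)
{
  acc.re -= a.im * b.im;
  acc.im += a.im * b.re;
  return acc;
}

/* acc + a * b, obtained by chaining the two rotations. */
lane_t cmla(lane_t acc, lane_t a, lane_t b)
{
  return fcmla_rot90(fcmla_rot0(acc, a, b), a, b);
}
```

For example, with a zero accumulator, a = 1 + 2i and b = 3 + 4i, the chained 
rotations give -5 + 10i, which is the full complex product.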

We have only done some experimentation with this approach.  If you think 
that it could be interesting we will try to develop it more.

Thanks,

Sylvain






* Complex numbers support: discussions summary
@ 2023-10-09 13:29 Sylvain Noiry
  0 siblings, 0 replies; 15+ messages in thread
From: Sylvain Noiry @ 2023-10-09 13:29 UTC
  To: gcc, toon

[-- Attachment #1: Type: text/plain, Size: 2653 bytes --]

> On 9/26/23 20:40, Toon Moene wrote:
>
>> On 9/26/23 09:30, Richard Biener via Gcc wrote:
>>
>>> On Mon, Sep 25, 2023 at 5:17 PM Sylvain Noiry via Gcc
>>> <gcc@gcc.gnu.org> wrote:
>>>
>>>> As I said at the end of the presentation, we have written a paper which
>>>> explains
>>>> our implementation in details. You can find it on the wiki page of the
>>>> Cauldron
>>>> (https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&target=Exposing+Complex+Numbers+to+Target+Back-ends+%28paper%29.pdf).
>>>
>>> Thanks for the detailed presentation at the Cauldron.
>>>
>>> My personal summary is that I'm less convinced delaying lowering is
>>> the way to go.
>>
>> Thanks Sylvain for the quick summary of the discussion - it helps a
>> great deal now that the discussion is still fresh in our memory.
>
> I found time today to run some tests.
>
> First of all, the result of the gcc test harness as applied to the top 
> of the complex/kvx branch in the https://github.com/kalray/gcc  repository:
>
> https://gcc.gnu.org/pipermail/gcc-testresults/2023-October/797627.html
>
> I think there are several complex failures here that are not in 
> "standard" 12.2 release (for x86_64-linux-gnu).

We have removed some special cases for complex operations outside the cplxlower pass (especially in tree-ssa-forwprop.cc), because they ruined our efforts to keep complex operations unlowered. So the performance is fine on the KVX target, but some (SLP) vectorization cases are missed on other targets which do not exploit complex patterns.

It may be interesting to add conditions on these cases rather than just removing them.

> I also compiled all of lapack-3.11.0 with that compiler and obtained the 
> same results as with gcc/gfortran 13.2:
>
>                   -->   LAPACK TESTING SUMMARY  <--
>   Processing LAPACK Testing output found in the TESTING directory
> SUMMARY              nb test run   numerical error     other error
> ================     ===========   =================   ================
> REAL                 1327023       0    (0.000%)       0    (0.000%)
> DOUBLE PRECISION     1300917       6    (0.000%)       0    (0.000%)
> COMPLEX              786775        0    (0.000%)       0    (0.000%)
> COMPLEX16            787842        0    (0.000%)       0    (0.000%)
>
> --> ALL PRECISIONS   4202557       6    (0.000%)       0    (0.000%)

Thank you! It doesn't surprise me, because GCC still processes complex operations as before when the backend does not exploit complex patterns.

Best regards,

Sylvain



* Complex numbers support: discussions summary
@ 2023-09-25 14:56 Sylvain Noiry
  0 siblings, 0 replies; 15+ messages in thread
From: Sylvain Noiry @ 2023-09-25 14:56 UTC
  To: gcc; +Cc: piannetta, Benoit Dinechin

[-- Attachment #1: Type: text/plain, Size: 3077 bytes --]

Hi,

We had very interesting discussions during our presentation with Paul on the
support of complex numbers in gcc at the Cauldron.

Thank you all for your participation!

Here is a small summary from our viewpoint:

- Replace CONCAT with a backend-defined internal representation in RTL
--> No particular problems

- Allow backends to write patterns for operations on complex modes
--> No particular problems

- Conditional lowering depending on whether a pattern exists or not
--> Concerns when the vectorization of split complex operations performs
    better than non-vectorized unified complex operations

- Centralize complex lowering in cplxlower
--> No particular problems if it doesn't prevent IEEE compliance and
    optimizations (like const folding)

- Vectorization of complex operations
--> 2 representations (interleaved and separated real/imag): cannot impose
    one if some machines prefer the other
--> Complex modes are composite modes, and the vectorizer assumes that the
    inner mode is scalar to do some optimizations (which ones?)
--> Mixed split/unified complex operations cannot be vectorized easily
--> Assuming that the inner representation of complex vectors is left to
    target backends, the vectorizer doesn't know it, which prevents some
    optimizations (which ones?)

- Explicit vectors of complex
--> Cplxlower cannot lower them, and moving veclower before cplxlower is a
    bad idea as it prevents some optimizations
--> Teaching cplxlower how to deal with vectors of complex seems to be a
    reasonable alternative
--> Concerns about ABI or indexing if the internal representation is left
    to the backend and differs from the representation in memory

- Impact of the current SLP pattern matching of complex operations
--> Only with -ffast-math
--> It can match user-defined operations (not C99) that can be simplified
    with a complex instruction
--> Dedicated opcode and real vector type chosen, versus standard opcode
    and complex mode in our implementation
--> Need to preserve SLP pattern matching, as too many applications
    redefine complex and bypass the C99 standard
--> So need to harmonize with our implementation

- Support of the pure imaginary type (_Imaginary)
--> Still not supported by gcc (and llvm), nor in our implementation
--> Issues come from the fact that an imaginary is not a complex with its
    real part set to 0
--> The same issue arises with complex multiplication by a real (which is
    split in the frontend, and our implementation hasn't changed it yet)
--> Idea: Add an attribute to the Tree complex type which specifies pure
    real / pure imaginary / full complex?

- Fast pattern for IEEE compliant emulated operations
--> Not enough time to discuss it

Don't hesitate to add something or clarify any point if you want.

As I said at the end of the presentation, we have written a paper which
explains our implementation in detail. You can find it attached to this mail
(and later on the wiki page of the Cauldron).

Sylvain



[-- Attachment #2: exposing_complex_numbers_to_target_backends_GNU_Cauldron_2023.pdf --]
[-- Type: application/pdf, Size: 715823 bytes --]


Thread overview: 15+ messages
2023-09-25 15:15 Complex numbers support: discussions summary Sylvain Noiry
2023-09-26  7:30 ` Richard Biener
2023-09-26  8:29   ` Tamar Christina
2023-09-26 10:19     ` Paul Iannetta
2023-09-26  8:53   ` Paul Iannetta
2023-09-26  9:28     ` Tamar Christina
2023-09-26  9:40       ` Paul Iannetta
2023-09-26 18:40   ` Toon Moene
2023-10-05 14:45     ` Toon Moene
2023-09-26 15:46 ` Joseph Myers
  -- strict thread matches above, loose matches on Subject: below --
2023-10-16  9:14 Sylvain Noiry
2023-10-17 20:37 ` Toon Moene
2023-10-18  7:24   ` Sylvain Noiry
2023-10-09 13:29 Sylvain Noiry
2023-09-25 14:56 Sylvain Noiry
