public inbox for gcc@gcc.gnu.org
From: Sylvain Noiry <snoiry@kalrayinc.com>
To: gcc@gcc.gnu.org
Cc: piannetta@kalrayinc.com
Subject: Complex numbers support: discussions summary
Date: Mon, 16 Oct 2023 11:14:28 +0200
Message-ID: <da342ce9-f289-9063-8a2c-1c3a80022691@kalrayinc.com>

Hi,

We are trying to update our patches for complex numbers to take into 
account what has been discussed.

The main change from our previous patches is that vectors of complex 
types are replaced by classical vectors of real types (e.g. V4SF instead 
of V2SC), combined with the existing complex opcodes (like .COMPLEX_MUL) 
when vectorizing.  Non-vectorized complex modes are also replaced by 
vectors of two reals at the end of the middle-end (e.g. SC by V2SF), so 
that existing patterns can be reused.  Operations that are not 
complex-specific, such as addition, no longer require a specific 
pattern, and already implemented patterns like cmul, cmul_conj, 
cadd90, ... can be used.
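A minimal sketch of why the SC to V2SF remapping is sound: C11 (6.2.5) specifies that `_Complex float` has the same representation as a two-element `float` array holding the real part then the imaginary part, so an SC value can be reinterpreted as a two-lane float vector. The `v2sf` typedef (GCC's `vector_size` extension) and the helper names below are ours, for illustration only. They show that complex addition is a plain lanewise V2SF addition, whereas a lanewise multiply is not the complex product, which is why patterns such as cmul are still needed.

```c
#include <complex.h>
#include <string.h>

/* Two-lane float vector (GCC/Clang vector extension), standing in for V2SF. */
typedef float v2sf __attribute__((vector_size(8)));

/* Reinterpret an SC value as V2SF; C11 guarantees the {re, im} layout. */
static v2sf sc_to_v2sf(float _Complex z)
{
    v2sf v;
    memcpy(&v, &z, sizeof v);
    return v;
}

/* Reinterpret a V2SF value back as SC. */
static float _Complex v2sf_to_sc(v2sf v)
{
    float _Complex z;
    memcpy(&z, &v, sizeof z);
    return z;
}

/* Complex addition: a plain lanewise vector add, no complex-specific pattern. */
static float _Complex add_via_v2sf(float _Complex a, float _Complex b)
{
    return v2sf_to_sc(sc_to_v2sf(a) + sc_to_v2sf(b));
}

/* A lanewise vector multiply yields {ar*br, ai*bi}, which is NOT the
   complex product: multiplication still needs a dedicated pattern. */
static float _Complex lanewise_mul_via_v2sf(float _Complex a, float _Complex b)
{
    return v2sf_to_sc(sc_to_v2sf(a) * sc_to_v2sf(b));
}
```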

To do so, the cplxlower pass has been split into two passes:
   - The first one replaces complex-specific operations with dedicated 
opcodes (like .COMPLEX_MUL replacing a MULT_EXPR in SC mode), but complex 
modes are kept at this point.  Operations without native support are 
also lowered here: we assume it is better to lower them early and rely 
on the standard middle-end optimizations than to attempt vectorization 
with a near-zero chance of success and lower them only afterwards.
   - The second one does little more than remap non-vectorized complex 
modes to vectors of two reals (like SC to V2SF).
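The order of decisions in the two passes can be summarised with a deliberately tiny model. Everything in it (the `stmt` struct, the opcode and mode enums, the `target_has_cmul` flag) is hypothetical and far simpler than GIMPLE: the first pass retags complex multiplications, or lowers them when the target lacks support, while keeping the SC mode; the second pass only rewrites the remaining SC modes to V2SF.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified statement: an opcode and a mode.
   None of these names are GCC's internal names. */
enum op   { OP_PLUS, OP_MULT, OP_COMPLEX_MUL, OP_LOWERED_SF };
enum mode { MODE_SC, MODE_V2SF, MODE_SF };

struct stmt { enum op op; enum mode mode; };

/* Pass 1: give complex-specific operations a dedicated opcode while
   keeping the complex mode; lower what the target cannot do natively
   (OP_LOWERED_SF stands for the sequence of SF-mode statements). */
static void pass1_complex_ops(struct stmt *s, bool target_has_cmul)
{
    if (s->mode == MODE_SC && s->op == OP_MULT) {
        if (target_has_cmul) {
            s->op = OP_COMPLEX_MUL;
        } else {
            s->op = OP_LOWERED_SF;
            s->mode = MODE_SF;
        }
    }
}

/* Pass 2 (after vectorization): remap the remaining scalar complex
   modes to vectors of two reals so existing V2SF patterns apply. */
static void pass2_remap_modes(struct stmt *s)
{
    if (s->mode == MODE_SC)
        s->mode = MODE_V2SF;
}
```

Note that an addition in SC mode passes through the first pass untouched and only has its mode changed by the second one.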

So the vectorizer takes complex modes as input but vectorizes with 
vectors of real modes (e.g. the V4SF vector mode for SC).  Because the 
complex-specific opcodes have already been set, no confusion with real 
operations is possible.  We could also use vectors of two reals as 
inputs, but vectorizing small vector modes into bigger ones (like V2SF 
into V4SF) is not possible.
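As an illustration of vectorizing SC data with a V4SF vector type, here is a hand-written model (not actual vectorizer output) of the loop `c[i] = a[i] + b[i]` over `_Complex float` arrays. Each V4SF holds two consecutive complex values as {re0, im0, re1, im1}, so the addition needs no complex-specific opcode. The `v4sf` typedef relies on the GCC/Clang vector extension, and `cadd_arrays` is a hypothetical name.

```c
#include <complex.h>  /* only needed for I in usage code */
#include <stddef.h>
#include <string.h>

/* Four-lane float vector standing in for V4SF: lanes {re0, im0, re1, im1}
   hold two consecutive _Complex float values. */
typedef float v4sf __attribute__((vector_size(16)));

/* Model of the vectorized form of c[i] = a[i] + b[i]: the main loop
   processes two SC elements per V4SF operation, and a scalar epilogue
   handles a trailing odd element. */
static void cadd_arrays(size_t n, const float _Complex *a,
                        const float _Complex *b, float _Complex *c)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        v4sf va, vb;
        memcpy(&va, &a[i], sizeof va);   /* load two SC values */
        memcpy(&vb, &b[i], sizeof vb);
        v4sf vc = va + vb;               /* plain V4SF add */
        memcpy(&c[i], &vc, sizeof vc);   /* store two SC values */
    }
    for (; i < n; i++)                   /* scalar epilogue */
        c[i] = a[i] + b[i];
}
```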

Here are some advantages of this new approach:
   - No more vectors of complex modes.
   - The vectorization of complex operations is improved, because split 
and unified vectorized statements can easily be mixed, since they use 
the same vector type.  We could also try several options in turn 
(first: native vectorized, second: split vectorized, third: unified 
scalar, ...).
   - It reuses the patterns for vectors of two reals for operations that 
are not complex-specific, as well as existing complex patterns such as 
cmul implemented on aarch64, which could mean almost free performance 
gains on many targets.
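To make the "split vectorized" option more concrete, here is one way a complex multiplication on interleaved V4SF data can be expressed with ordinary real operations only (lane permutations, lanewise multiplies and adds). Because the result is again a V4SF, such statements can sit next to a native .COMPLEX_MUL on the same vector type. This is our own sketch using the GCC/Clang vector extension, not code from the patches; the function names are hypothetical.

```c
#include <complex.h>
#include <string.h>

typedef float v4sf __attribute__((vector_size(16)));

/* Split complex multiply on V4SF: a = {ar0, ai0, ar1, ai1},
   b = {br0, bi0, br1, bi1}; only real lanewise ops and permutes. */
static v4sf cmul_split_v4sf(v4sf a, v4sf b)
{
    v4sf a_re   = { a[0], a[0], a[2], a[2] };  /* duplicate real parts */
    v4sf a_im   = { a[1], a[1], a[3], a[3] };  /* duplicate imag parts */
    v4sf b_swap = { b[1], b[0], b[3], b[2] };  /* swap re/im in each pair */
    v4sf sign   = { -1.0f, 1.0f, -1.0f, 1.0f };
    /* re = ar*br - ai*bi,  im = ar*bi + ai*br */
    return a_re * b + a_im * b_swap * sign;
}

/* Convenience wrapper: multiply two pairs of _Complex float. */
static void cmul_pairs(const float _Complex x[2], const float _Complex y[2],
                       float _Complex out[2])
{
    v4sf va, vb, vr;
    memcpy(&va, x, sizeof va);
    memcpy(&vb, y, sizeof vb);
    vr = cmul_split_v4sf(va, vb);
    memcpy(out, &vr, sizeof vr);
}
```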

On the performance side, we can still exploit the full potential of 
complex instructions on KVX.  To illustrate the gains on aarch64 without 
rewriting any patterns (except a mov), here is the assembly generated 
for a vectorized complex multiply-multiply-add with -O2 
-mcpu=neoverse-v1 (and without -ffast-math, which the SLP-based approach 
requires):

#define N 32  /* matches the loop bound below: 256 bytes / 8 bytes per element */

void vfmma (_Complex float a[restrict N], _Complex float b[restrict N],
            _Complex float c[restrict N], _Complex float d[restrict N])
{
   for (int i = 0; i < N; i++)
     c[i] += a[i] * b[i] * d[i];
}


vfmma:
         movi    v3.4s, 0
         mov     x4, 0
         .align  5
.L2:
         ldr     q2, [x1, x4]
         mov     v1.16b, v3.16b
         ldr     q0, [x0, x4]
         fcmla   v1.4s, v0.4s, v2.4s, #0
         fcmla   v1.4s, v0.4s, v2.4s, #90
         ldr     q0, [x2, x4]
         ldr     q2, [x3, x4]
         fcmla   v0.4s, v2.4s, v1.4s, #0
         fcmla   v0.4s, v2.4s, v1.4s, #90
         str     q0, [x2, x4]
         add     x4, x4, 16
         cmp     x4, 256
         bne     .L2
         ret
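To help read the loop above, here is a scalar C model of one complex lane of FCMLA, based on our reading of the Arm architecture description: rotation #0 accumulates n.re * m, and rotation #90 accumulates the i * n.im * m contribution, so the two together add the full product n * m to the accumulator. In the loop, the first pair builds t = a[i] * b[i] from a zeroed register, and the second pair adds d[i] * t into c[i]. The struct and function names are hypothetical, and the model does not fuse the multiply-adds as the hardware does, so rounding may differ for general inputs.

```c
/* One complex element {re, im} of an FCMLA vector operand. */
struct cf { float re, im; };

/* fcmla d, n, m, #0:  d.re += n.re * m.re;  d.im += n.re * m.im */
static struct cf fcmla_rot0(struct cf d, struct cf n, struct cf m)
{
    d.re += n.re * m.re;
    d.im += n.re * m.im;
    return d;
}

/* fcmla d, n, m, #90: d.re -= n.im * m.im;  d.im += n.im * m.re */
static struct cf fcmla_rot90(struct cf d, struct cf n, struct cf m)
{
    d.re -= n.im * m.im;
    d.im += n.im * m.re;
    return d;
}

/* One element of the loop: t = a * b (from zero), then c += d * t. */
static struct cf vfmma_element(struct cf a, struct cf b,
                               struct cf c, struct cf d)
{
    struct cf t = { 0.0f, 0.0f };
    t = fcmla_rot90(fcmla_rot0(t, a, b), a, b);
    return fcmla_rot90(fcmla_rot0(c, d, t), d, t);
}
```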

So far we have only experimented with this approach.  If you think 
it could be interesting, we will develop it further.

Thanks,

Sylvain