From: Toon Moene
Organization: Moene Computational Physics, Maartensdijk, The Netherlands
To: Sylvain Noiry, gcc@gcc.gnu.org
Cc: piannetta@kalrayinc.com
Subject: Re: Complex numbers support: discussions summary
Date: Tue, 17 Oct 2023 22:37:01 +0200
Message-ID: <567d0202-bb1a-4242-96f0-e6bc93d237a1@moene.org>

Sylvain,

Is this on a branch in your github repository
https://github.com/kalray/gcc somewhere?

That would make it easier for me (and probably others) to test it.

See for instance my mail here (d.d. Thu Oct 5 14:45:05 GMT 2023):

https://gcc.gnu.org/pipermail/gcc/2023-October/242643.html

Thanks in advance.

Kind regards,
Toon Moene.

On 10/16/23 11:14, Sylvain Noiry via Gcc wrote:
> Hi,
>
> We are trying to update our patches on complex numbers to take into
> account what has been discussed.
>
> The main change from our previous patches consists of replacing vectors
> of complex types with classical vectors of real types (e.g. V4SF instead
> of V2SC) associated with existing complex opcodes (like .COMPLEX_MUL)
> when vectorizing.  Non-vectored complex modes are also replaced by
> vectors of two reals at the end of the middle-end (e.g. SC to V2SF), so
> that they can reuse already existing patterns.  Indeed, non-complex-specific
> operations like addition do not require a specific pattern anymore, and
> already implemented patterns like cmul, cmul_conj, cadd90, ... can be used.
>
> To do so, the cplxlower pass has been cut into two passes:
>   - The first one replaces complex-specific opcodes with dedicated
> opcodes (like .COMPLEX_MUL replacing MUL_EXPR with SC mode), but complex
> modes are kept at this point.  Unsupported native operations are also
> lowered, because we assume that it is better to lower and hope for
> standard optimizations in the middle-end than to try to vectorize with
> a near-zero chance of success and lower only afterwards.
>   - The second one almost only remaps non-vectored complex modes into
> vectors of two reals (like SC to V2SF).
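
The SC to V2SF remapping described above relies on a complex float being
laid out as two floats, real part first, which the C standard guarantees.
The following is only a sketch in plain C, not code from the patches: the
names cmul_pair, close_enough and pair_view_matches are hypothetical and
introduced purely for illustration.  It computes a complex product on the
"vector of two reals" view and compares it with the native complex multiply.

```c
#include <assert.h>   /* static_assert (C11) */
#include <complex.h>
#include <string.h>

/* A complex type has the same representation as an array of two elements
   of the corresponding real type, real part first. */
static_assert(sizeof(float _Complex) == 2 * sizeof(float),
              "complex float must be exactly two floats");

/* Hypothetical helper: complex multiply written purely on pairs of reals,
   i.e. on the two-reals view of a single complex value. */
void cmul_pair(const float a[2], const float b[2], float out[2])
{
    out[0] = a[0] * b[0] - a[1] * b[1];  /* real part */
    out[1] = a[0] * b[1] + a[1] * b[0];  /* imaginary part */
}

/* Hypothetical helper: absolute-tolerance comparison without libm. */
int close_enough(float p, float q)
{
    float d = p - q;
    return d < 1e-5f && d > -1e-5f;
}

/* Returns 1 if the pair-based product agrees with the native complex
   product for the given (finite) operands, 0 otherwise. */
int pair_view_matches(float _Complex x, float _Complex y)
{
    float a[2], b[2], got[2], want[2];
    float _Complex z = x * y;       /* native complex multiply */

    memcpy(a, &x, sizeof a);        /* reinterpret as two reals */
    memcpy(b, &y, sizeof b);
    memcpy(want, &z, sizeof want);
    cmul_pair(a, b, got);

    return close_enough(got[0], want[0]) && close_enough(got[1], want[1]);
}
```

For instance, pair_view_matches(1.0f + 2.0f * I, 3.0f - 4.0f * I) compares
both routes for the product (1 + 2i)(3 - 4i).
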
>
> So the vectorizer takes complex modes as input but vectorizes with
> vectors of real modes (e.g. V4SF vector mode for SC).  Because complex-
> specific opcodes have been set before, no confusion with real operations
> is possible.  We could also use vectors of two reals as inputs, but
> vectorizing small vector modes into bigger ones (like V2SF to V4SF) is
> not possible.
>
> Here are some advantages of this new approach:
>   - No more vectors of complex modes.
>   - The vectorization of complex operations is improved, because split
> and unified vectored statements can easily be mixed, as they use the same
> vector type.  We can also imagine testing multiple options (first: native
> vectored, second: split vectored, third: unified scalar, ...).
>   - It reuses patterns for vectors of two reals for non-complex-specific
> operations, and also already existing complex patterns like
> cmul implemented on aarch64, which could mean almost free performance
> gains on many targets.
>
> On the performance side, we can still exploit the full potential of
> complex instructions on KVX.
> To illustrate the gains on aarch64 without
> rewriting any patterns (except a mov), here is the assembly generated
> for a vector complex mul mul add with -O2 -mcpu=neoverse-v1 (and without
> -ffast-math, like with SLP):
>
> void vfmma (_Complex float a[restrict N], _Complex float b[restrict N],
>             _Complex float c[restrict N], _Complex float d[restrict N])
> {
>   for (int i = 0; i < N; i++)
>     c[i] += a[i] * b[i] * d[i];
> }
>
>
> vfmma:
>         movi    v3.4s, 0
>         mov     x4, 0
>         .align  5
> .L2:
>         ldr     q2, [x1, x4]
>         mov     v1.16b, v3.16b
>         ldr     q0, [x0, x4]
>         fcmla   v1.4s, v0.4s, v2.4s, #0
>         fcmla   v1.4s, v0.4s, v2.4s, #90
>         ldr     q0, [x2, x4]
>         ldr     q2, [x3, x4]
>         fcmla   v0.4s, v2.4s, v1.4s, #0
>         fcmla   v0.4s, v2.4s, v1.4s, #90
>         str     q0, [x2, x4]
>         add     x4, x4, 16
>         cmp     x4, 256
>         bne     .L2
>         ret
>
> We have only done some experimentation with this approach.  If you think
> that it could be interesting we will try to develop it more.
>
> Thanks,
>
> Sylvain

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
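
For anyone who wants to compile the quoted vfmma example on its own, here
is a minimal self-contained sketch.  The quoted source does not define N;
the value 32 below is an assumption, chosen to match the loop bound of 256
bytes in the quoted assembly at 8 bytes per complex float.  The function
vfmma_selfcheck is a hypothetical helper, not part of the original mail: it
uses small integer inputs so that all float results are exact, and compares
the loop against the same computation expanded into real and imaginary parts.

```c
#include <assert.h>   /* static_assert (C11) */
#include <complex.h>

#define N 32  /* assumption: 256 bytes / 8 bytes per complex float */

static_assert(sizeof(_Complex float) == 8,
              "the choice of N assumes an 8-byte complex float");

/* The loop from the quoted mail, unchanged apart from layout. */
void vfmma(_Complex float a[restrict N], _Complex float b[restrict N],
           _Complex float c[restrict N], _Complex float d[restrict N])
{
    for (int i = 0; i < N; i++)
        c[i] += a[i] * b[i] * d[i];
}

/* Hypothetical self-check: returns 1 if vfmma matches the product
   expanded by hand into real and imaginary parts, 0 otherwise. */
int vfmma_selfcheck(void)
{
    _Complex float a[N], b[N], c[N], d[N];

    for (int i = 0; i < N; i++) {
        float fi = (float)i;
        a[i] = fi + 1.0f * I;      /* a = i + 1i   */
        b[i] = 2.0f - fi * I;      /* b = 2 - i*1i */
        d[i] = 1.0f + 1.0f * I;    /* d = 1 + 1i   */
        c[i] = fi;                 /* c = i        */
    }

    vfmma(a, b, c, d);

    for (int i = 0; i < N; i++) {
        float fi = (float)i;
        /* p = a * b */
        float p_re = fi * 2.0f - 1.0f * (-fi);
        float p_im = fi * (-fi) + 1.0f * 2.0f;
        /* q = p * d, with d = 1 + 1i */
        float q_re = p_re - p_im;
        float q_im = p_re + p_im;
        _Complex float want = (fi + q_re) + q_im * I;
        if (c[i] != want)
            return 0;
    }
    return 1;
}
```

Calling vfmma_selfcheck() from a small main and building with the flags
from the quoted mail (-O2 -mcpu=neoverse-v1) should let one inspect the
generated code for vfmma next to a correctness check.
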