[RFC] Exposing complex numbers to target backends

public inbox for gcc@gcc.gnu.org

* [RFC] Exposing complex numbers to target backends
@ 2023-07-05 15:12 Sylvain Noiry
  2023-07-05 18:10 ` Toon Moene
  2023-07-06 11:02 ` Richard Biener
  0 siblings, 2 replies; 6+ messages in thread
From: Sylvain Noiry @ 2023-07-05 15:12 UTC
  To: gcc; +Cc: Paul Iannetta, Benoit Dinechin

Hi,

My name is Sylvain; I am an intern at Kalray, where I work on improving the GCC backend for the KVX target.  The KVX ISA has dedicated instructions for handling complex numbers, which GCC cannot select because of how it handles complex numbers internally.  My goal is to make GCC able to expose new patterns dealing with complex numbers to machine description files.  I already have a proof of concept that can increase performance even on other backends, such as x86, if the new patterns are implemented.

My approach is to prevent the lowering of complex operations when the backend can handle them natively, and to work directly on complex modes (SC, DC, CDI, CSI, CHI, CQI).  The cplxlower pass looks for supported optabs related to complex numbers and uses them directly.  Another advantage is that native operations can now go through all GIMPLE passes and preserve most optimisations, such as FMA generation.
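For readers unfamiliar with the lowering step, the following C sketch shows roughly what the cplxlower pass produces for a _Complex float multiplication when no native pattern is available and -fcx-limited-range (implied by -ffast-math) is in effect: the operation is split into scalar operations on the real and imaginary parts. Without that flag, GCC keeps the C99 Annex G semantics and may fall back to a call to the libgcc routine __mulsc3. The function name mul_lowered is hypothetical, and __real__/__imag__ are GNU C extensions.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical helper: a hand-written equivalent of a lowered complex
   multiply (limited-range formula): four real multiplications, one
   subtraction and one addition, with no NaN/infinity recovery.  */
float complex mul_lowered(float complex a, float complex b)
{
    float ar = crealf(a), ai = cimagf(a);
    float br = crealf(b), bi = cimagf(b);
    float complex r = 0;
    __real__ r = ar * br - ai * bi;   /* real part */
    __imag__ r = ar * bi + ai * br;   /* imaginary part */
    return r;
}
```

With a native mulsc3 pattern, the point of the proposal is that this splitting (and the __mulsc3 fallback) is skipped and a single instruction is emitted instead.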

Vectorization is also preserved with native complex operands, although some functions had to be updated.  Because vectorization assumes that inner elements are scalar, and complex values cannot be considered scalar, some functions that only take scalars have been adapted or duplicated to handle complex elements.
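As an illustration of what vectorization with native complex operands targets, here is a plain C loop over _Complex float arrays. Under the approach described above, if the backend provides both a scalar and a vector complex multiplication pattern (for example mulsc3 and a mulv2sc3; the vector pattern name is an assumption here), the vectorizer could process several complex elements per iteration without first splitting them into real and imaginary parts. The function name cmul_arrays is hypothetical.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical example: element-wise product of two complex arrays.
   restrict tells the compiler the arrays do not overlap, which helps
   vectorization.  */
void cmul_arrays(float complex *restrict out,
                 const float complex *restrict a,
                 const float complex *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}
```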

I've also changed the representation of complex numbers during the expand pass.  READ_COMPLEX_PART and WRITE_COMPLEX_PART have been turned into target hooks, and a new hook, GEN_RTX_COMPLEX, allows each backend to choose its preferred RTL representation of complex numbers.  The default still uses CONCAT, as before, but the KVX backend uses registers in a complex mode containing both the real and imaginary parts.

Now each backend can add its own native complex operations with patterns in its machine description. The following example implements a complex multiplication with mode SC on the KVX backend:
(define_insn "mulsc3"
  [(set (match_operand:SC 0 "register_operand" "=r")
        (mult:SC (match_operand:SC 1 "register_operand" "r")
                 (match_operand:SC 2 "register_operand" "r")))]
  ""
  "fmulwc %0 = %1, %2"
  [(set_attr "type" "mau_fpu")]
)

The main patch affects around 1400 lines of generic code, mostly in expr.cc and tree-complex.cc.  These changes are mainly additions, or the result of moving READ_COMPLEX_PART and WRITE_COMPLEX_PART from expr.cc into target hooks.

I know that Arm developers have added partial support for complex instructions.  However, since their work operates during vectorization, promoting operations on vectors of floating-point numbers that look like operations on (vectors of) complex numbers, it misses simple cases.  At that point, they create operations on vectors of floating-point numbers, which are later caught by dedicated define_expands.  Our approach, on the other hand, propagates complex numbers through the whole middle-end, so it is easier to recombine operations and to recognize what Arm does.  Some decisions will be needed to merge the two approaches, although I've already reused their work on complex rotations in my implementation.
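To make the contrast concrete, here is a C sketch of the kind of code the Arm work targets: a complex multiplication written by hand on interleaved float arrays (re0, im0, re1, im1, ...), with no _Complex type involved. GCC's SLP pattern matching can recognize shapes like this during vectorization and map them to the cmul-style optabs; because that matching happens during vectorization, code that is not vectorized does not benefit from it. The function name cmul_interleaved is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: complex multiply on interleaved float data.
   Element i has its real part at index 2*i and its imaginary part
   at index 2*i+1.  */
void cmul_interleaved(float *restrict out, const float *restrict a,
                      const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float ar = a[2 * i], ai = a[2 * i + 1];
        float br = b[2 * i], bi = b[2 * i + 1];
        out[2 * i]     = ar * br - ai * bi;   /* real part */
        out[2 * i + 1] = ar * bi + ai * br;   /* imaginary part */
    }
}
```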

Results:

I have tested my implementation on multiple code samples, as well as a few FFTs.  On a simple in-place radix-2 FFT with precomputed twiddle seeds (2 complex multiplications, 1 addition and 1 subtraction per loop iteration), the compute time was divided by 3 when compiling with -O3 (because calls to __mulsc3 are replaced by native instructions) and reduced by 20% with -ffast-math.  In both cases, performance is now on par with another version written using intrinsics.  These improvements do not come exclusively from the newly generated hardware instructions: replacing CONCATs with registers also prevents GCC from generating instructions that extract the real and imaginary parts into their own registers and recombine them later.
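The benchmark kernel itself is not included in the message; the following is only a hypothetical C reconstruction of an in-place radix-2 FFT whose inner loop matches the description: per iteration, two complex multiplications (one by the twiddle factor, one advancing the twiddle by its precomputed seed), one addition and one subtraction. The function names and the seed layout (one seed per stage, seeds[s] = exp(-2*pi*i/m) for butterfly span m = 2 << s) are assumptions.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Reorder x into bit-reversed index order so the butterflies below can
   work in place.  */
static void bit_reverse(float complex *x, size_t n)
{
    for (size_t i = 1, j = 0; i < n; i++) {
        size_t bit = n >> 1;
        for (; j & bit; bit >>= 1)
            j ^= bit;
        j ^= bit;
        if (i < j) {
            float complex t = x[i];
            x[i] = x[j];
            x[j] = t;
        }
    }
}

/* Hypothetical in-place radix-2 decimation-in-time FFT.
   n must be a power of two; seeds[s] must hold exp(-2*pi*i / m) for the
   stage whose butterfly span is m = 2 << s (precomputed twiddle seeds).  */
void fft_radix2(float complex *x, size_t n, const float complex *seeds)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    bit_reverse(x, n);
    size_t s = 0;
    for (size_t m = 2; m <= n; m <<= 1, s++) {
        size_t half = m / 2;
        for (size_t k = 0; k < n; k += m) {
            float complex w = 1.0f;                    /* twiddle for j = 0 */
            for (size_t j = 0; j < half; j++) {
                float complex t = w * x[k + j + half]; /* complex mult #1 */
                float complex u = x[k + j];
                x[k + j] = u + t;                      /* add */
                x[k + j + half] = u - t;               /* sub */
                w *= seeds[s];                         /* complex mult #2 */
            }
        }
    }
}
```

Advancing the twiddle by repeated multiplication accumulates rounding error for large n; it is, however, what makes the second complex multiplication part of every iteration.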

This new approach can also bring a performance uplift to other backends.  For x86, I have tried reusing the same RTL complex representation as KVX, together with a few patterns.  Although large programs still contain useless moves, simple examples like the one below already show a performance uplift.

_Complex float add(_Complex float a, _Complex float b)
{
  return a + b;
}

With -O2, the generated assembly is now on par with LLVM's and looks like this:

add:
        addps  %xmm1, %xmm0
        ret
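The single addps works because, on x86-64, the System V ABI passes and returns a _Complex float packed in the low 64 bits of one SSE register, and C guarantees that a complex type has the same representation as a two-element array of its real type, real part first (C11 6.2.5). The sketch below, using GCC's vector_size extension and a hypothetical function name, shows that complex addition is simply a lane-wise add of the two parts:

```c
#include <assert.h>
#include <complex.h>
#include <string.h>

/* Two float lanes: [0] = real part, [1] = imaginary part.  */
typedef float v2sf __attribute__((vector_size(8)));

_Static_assert(sizeof(v2sf) == sizeof(float complex),
               "a complex float is two packed floats");

/* Hypothetical example: complex addition performed as one vector add.  */
float complex add_as_vector(float complex a, float complex b)
{
    v2sf va, vb, vr;
    float complex r;
    memcpy(&va, &a, sizeof va);
    memcpy(&vb, &b, sizeof vb);
    vr = va + vb;                 /* lane-wise: re+re, im+im */
    memcpy(&r, &vr, sizeof r);
    return r;
}
```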

Choices to be made:
  - Currently, Arm uses optabs whose names start with "c", like "cmul", to distinguish between real floating-point and complex operations.  Since we keep complex modes, this could simply be done with mul<mode>.
  - Currently, the parser performs some early optimizations and lowering that could be moved into the cplxlower pass.  For example, I've slightly changed how complex rotations by 90° and 270°, which are recognized in fold-const.cc, are processed.  A call to a new COMPLEX_ROT90/270 internal function is now inserted, which is then either lowered or kept by the cplxlower pass.  Finally, the widening_mul pass can generate COMPLEX_ADD_ROT90/270 internal functions, which are expanded using the cadd90/270 optabs; otherwise, COMPLEX_ROT90/270 are expanded using the new crot90/270 optabs.
  - Currently, we have to duplicate preferred_simd_mode since it only accepts scalar modes.  If we unify enough, we could have a new type that would be a union of scalar_mode and complex_mode, but we did not do it since it would require many modifications.
  - Declaring complex vectors through attributes: this would be a new C extension (and Clang does not support it either).
  - The KVX ISA supports some fused conjugate-and-operate instructions (e.g. a + conjf(b)), which are caught directly by the combine pass if the corresponding pattern is present in the backend.  This solution is simple, but these operations could also be caught in the middle-end, like FMAs.
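For reference, a + conjf(b) from the last item expands, once lowered, into an addition on the real parts and a subtraction on the imaginary parts, which is what a fused conjugate-and-add instruction would compute in one step. A minimal C sketch, with hypothetical function names, comparing the source form with the lowered one:

```c
#include <assert.h>
#include <complex.h>

/* Source form: the operation a fused conjugate-and-add would implement.  */
float complex add_conj(float complex a, float complex b)
{
    return a + conjf(b);
}

/* Lowered form: (ar + br) + (ai - bi)i, using GNU C __real__/__imag__.  */
float complex add_conj_lowered(float complex a, float complex b)
{
    float complex r = 0;
    __real__ r = crealf(a) + crealf(b);
    __imag__ r = cimagf(a) - cimagf(b);
    return r;
}
```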

Currently supported patterns:
  - all basic arithmetic operations for scalar and vector complex modes (add, mul, neg, ...)
  - conj<mode> for the conjugate operation, using a new conj_optab
  - crot90<mode>/crot270<mode> for complex rotations, using new optabs
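The rotations behind crot90/crot270 (and the COMPLEX_ROT90/270 internal functions mentioned above) need no multiplication at all: multiplying x + yi by i gives -y + xi, and multiplying by -i gives y - xi. The sketch below writes them out with GNU C's __real__/__imag__ extensions, together with an add-with-rotation corresponding to cadd90; all function names are hypothetical.

```c
#include <assert.h>
#include <complex.h>

/* z * I: swap the parts and negate the new real part.  */
float complex rot90(float complex z)
{
    float complex r = 0;
    __real__ r = -cimagf(z);
    __imag__ r = crealf(z);
    return r;
}

/* z * -I: swap the parts and negate the new imaginary part.  */
float complex rot270(float complex z)
{
    float complex r = 0;
    __real__ r = cimagf(z);
    __imag__ r = -crealf(z);
    return r;
}

/* a + b * I, the combined operation behind cadd90.  */
float complex add_rot90(float complex a, float complex b)
{
    return a + rot90(b);
}
```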

I would like to have your opinion on my approach. I can send you the patch if you want.

Best regards,

Sylvain Noiry


* Re: [RFC] Exposing complex numbers to target backends
  2023-07-05 15:12 [RFC] Exposing complex numbers to target backends Sylvain Noiry
@ 2023-07-05 18:10 ` Toon Moene
  2023-07-06 11:02 ` Richard Biener
  1 sibling, 0 replies; 6+ messages in thread
From: Toon Moene @ 2023-07-05 18:10 UTC
  To: Sylvain Noiry, gcc; +Cc: Paul Iannetta, Benoit Dinechin

On 7/5/23 17:12, Sylvain Noiry via Gcc wrote:

> Hi,
> 
> My name is Sylvain; I am an intern at Kalray, where I work on improving the GCC backend for the KVX target.  The KVX ISA has dedicated instructions for handling complex numbers, which GCC cannot select because of how it handles complex numbers internally.  My goal is to make GCC able to expose new patterns dealing with complex numbers to machine description files.  I already have a proof of concept that can increase performance even on other backends, such as x86, if the new patterns are implemented.

I do not have the expertise to evaluate if your approach is the way we 
want to go forward with the handling of complex numbers in the 
middle-/back-end(s) of GCC.

However, I *do* have a sizable amount of (Fortran) code over here that 
uses complex numbers in a day to day operation that I am willing to test 
(obviously, "day to day operation" is to be interpreted loosely here - 
what I do is following the *real* operation at my employer (KNMI) with 
my own weather forecasting computations at home, compiled with GCC).

I suppose that your patch is against the master branch of GCC's 
repository - so I'll have to test that first, cleanly.

Kind regards,

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands


* Re: [RFC] Exposing complex numbers to target backends
  2023-07-05 15:12 [RFC] Exposing complex numbers to target backends Sylvain Noiry
  2023-07-05 18:10 ` Toon Moene
@ 2023-07-06 11:02 ` Richard Biener
  2023-07-06 15:04   ` Sylvain Noiry
  1 sibling, 1 reply; 6+ messages in thread
From: Richard Biener @ 2023-07-06 11:02 UTC
  To: Sylvain Noiry; +Cc: gcc, Paul Iannetta, Benoit Dinechin

On Wed, Jul 5, 2023 at 5:14 PM Sylvain Noiry via Gcc <gcc@gcc.gnu.org> wrote:
>
> [...]
>
> My approach is to prevent the lowering of complex operations when the backend can handle them natively, and to work directly on complex modes (SC, DC, CDI, CSI, CHI, CQI).  The cplxlower pass looks for supported optabs related to complex numbers and uses them directly.  Another advantage is that native operations can now go through all GIMPLE passes and preserve most optimisations, such as FMA generation.

I'll note that complex lowering takes advantage of complex numbers
that are known to be purely real or purely imaginary; I suppose you are
preserving some of these optimizations and only preventing the lowering
of ops we natively support.  I think that's a reasonable thing and I agree the
standard optabs should be used with complex modes.
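To illustrate the optimization referred to here: the lowering pass tracks whether each complex value is known to be purely real, purely imaginary, or neither, and simplifies operations accordingly. For example, when one factor is a real scalar promoted to complex, the full product collapses to two real multiplications. A hand-written C equivalent, with a hypothetical function name:

```c
#include <assert.h>
#include <complex.h>

/* Product of a complex value and a real scalar s (imaginary part known
   to be zero): (ar + ai i) * s = (ar*s) + (ai*s)i, i.e. two real
   multiplications instead of the four of a general complex product.  */
float complex scale_lowered(float complex a, float s)
{
    float complex r = 0;
    __real__ r = crealf(a) * s;
    __imag__ r = cimagf(a) * s;
    return r;
}
```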

> Vectorization is also preserved with native complex operands, although some functions had to be updated.  Because vectorization assumes that inner elements are scalar, and complex values cannot be considered scalar, some functions that only take scalars have been adapted or duplicated to handle complex elements.

I don't quite understand whether you end up with vectors with complex
components or vectors with twice the number of
scalar elements, implicitly representing real/imag parts interleaved.
Can you clarify?  We've recently had discussions
around this and agreed we don't want vector modes with complex component modes.

> [...]
>
> This new approach can also bring a performance uplift to other backends.  For x86, I have tried reusing the same RTL complex representation as KVX, together with a few patterns.  Although large programs still contain useless moves, simple examples like the one below already show a performance uplift.
>
> _Complex float add(_Complex float a, _Complex float b)
> {
>   return a + b;
> }

Yeah, the splitting doesn't help our bad job dealing with parameter
and return value expansion and your
change likely skirts that issue.  Vectorizing would likely fix it in a
similar way.

> [...]
>
> I would like to have your opinion on my approach. I can send you the patch if you want.

It would be nice if you could split the patch into a series of changes.

Thanks,
Richard.



* Re: [RFC] Exposing complex numbers to target backends
  2023-07-06 11:02 ` Richard Biener
@ 2023-07-06 15:04   ` Sylvain Noiry
  2023-07-17  9:18     ` Sylvain Noiry
  0 siblings, 1 reply; 6+ messages in thread
From: Sylvain Noiry @ 2023-07-06 15:04 UTC
  To: Richard Biener; +Cc: gcc, Paul Iannetta, Benoit Dinechin


________________________________
From: Richard Biener <richard.guenther@gmail.com>
Sent: Thursday, July 6, 2023 1:02 PM
To: Sylvain Noiry <snoiry@kalrayinc.com>
Cc: gcc@gcc.gnu.org <gcc@gcc.gnu.org>; Paul Iannetta <piannetta@kalrayinc.com>; Benoit Dinechin <bddinechin@kalrayinc.com>
Subject: Re: [RFC] Exposing complex numbers to target backends

On Wed, Jul 5, 2023 at 5:14 PM Sylvain Noiry via Gcc <gcc@gcc.gnu.org> wrote:
>>
>> [...]
>>
>> My approach is to prevent the lowering of complex operations when the backend can handle them natively, and to work directly on complex modes (SC, DC, CDI, CSI, CHI, CQI).  The cplxlower pass looks for supported optabs related to complex numbers and uses them directly.  Another advantage is that native operations can now go through all GIMPLE passes and preserve most optimisations, such as FMA generation.

>I'll note that complex lowering takes advantage of complex numbers
>that are known to be purely real or purely imaginary; I suppose you are
>preserving some of these optimizations and only preventing the lowering
>of ops we natively support.  I think that's a reasonable thing and I agree the
>standard optabs should be used with complex modes.

Native complex operations are kept only if there is an optab for them;
otherwise, they are lowered. In addition, we keep all three components of
each constant and SSA variable (real part, imaginary part, and the whole
value) available at any time during this pass, so that it is now easy to
mix lowered and non-lowered operations. Later dead code elimination passes
remove the unused components.

Thus, complex handling remains exactly the same for backends that define
neither complex patterns nor the redefined target hooks (READ_COMPLEX_PART,
WRITE_COMPLEX_PART, GEN_RTX_COMPLEX).

>> Vectorization is also preserved with native complex operands, although some functions had to be updated.  Because vectorization assumes that inner elements are scalar, and complex values cannot be considered scalar, some functions that only take scalars have been adapted or duplicated to handle complex elements.

>I don't quite understand whether you end up with vectors with complex
>components or vectors with twice the number of
>scalar elements, implicitly representing real/imag parts interleaved.
>Can you clarify?  We've recently had discussions
>around this and agreed we don't want vector modes with complex component modes.

Yes, I end up with vectors of complex inner types (like V2SC or V4CHI).
Apart from the few functions that I've adapted, the vectorization pass
works fine. For example, if both mulsc and mulv2sc patterns are present
in the backend, a complex multiplication can be vectorized without any
extra effort. But of course this approach conflicts with Arm's previous
work, where what I call V2SC is a V4SF.
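Because a _Complex float has the layout of float[2] (real part first), two consecutive complex values occupy the same 16 bytes as four floats ordered re0, im0, re1, im1. The sketch below, using GCC's vector_size extension and a hypothetical function name, shows that the V2SC view and the V4SF view hold identical bits; the two approaches differ only in how the compiler types the lanes.

```c
#include <assert.h>
#include <complex.h>
#include <string.h>

/* Four float lanes: the V4SF view used by the Arm work.  */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical example: reinterpret two complex floats (the V2SC view)
   as one four-lane float vector.  */
v4sf as_v4sf(const float complex z[2])
{
    v4sf v;
    memcpy(&v, z, sizeof v);   /* 2 * sizeof(float complex) == 16 bytes */
    return v;
}
```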

>> [...]
>>
>> This new approach can also bring a performance uplift to other backends.  For x86, I have tried reusing the same RTL complex representation as KVX, together with a few patterns.  Although large programs still contain useless moves, simple examples like the one below already show a performance uplift.
>>
>> _Complex float add(_Complex float a, _Complex float b)
>> {
>>   return a + b;
>> }

>Yeah, the splitting doesn't help our bad job dealing with parameter
>and return value expansion and your
>change likely skirts that issue.  Vectorizing would likely fix it in a
>similar way.

I'm not sure I've understood what you mean, but yes, the current extractions
and insertions of the real and imaginary parts are clearly a performance issue.
So my idea is to let each target decide its own representation of complex
numbers using the three target hooks. The default representation is the current
one, with CONCAT, but in the KVX backend we have chosen to pair the real and
imaginary parts in one or more contiguous registers.

>> [...]
>>
>> I would like to have your opinion on my approach. I can send you the patch if you want.

>It would be nice if you could split the patch into a series of changes.
>
>Thanks,
>Richard.

You can find my code in this repo "https://github.com/ElectrikSpace/gcc.git"
The implementation was originally done against the main dev branch of the KVX port,
and the working proof of concept is located in the "complex/kvx" branch.
I've also tried to apply the patch to GCC's master branch today, with a minimal
adaptation of the x86 backend, in the "complex/x86" branch.
Both branches have the same generic patch (that I will split later) plus a
backend-specific patch. I'm a little bit lost in the x86 backend, so it's not very
usable for now, but I will work on it.

Best regards,
Sylvain


* Re: [RFC] Exposing complex numbers to target backends
  2023-07-06 15:04   ` Sylvain Noiry
@ 2023-07-17  9:18     ` Sylvain Noiry
  2023-09-12 10:12       ` Sylvain Noiry
  0 siblings, 1 reply; 6+ messages in thread
From: Sylvain Noiry @ 2023-07-17  9:18 UTC
  To: Richard Biener; +Cc: gcc, Paul Iannetta, Benoit Dinechin

Hi,

> [...]
> You can find my code in this repo "https://github.com/ElectrikSpace/gcc.git"
> The implementation was originally done against the main dev branch of the KVX port,
> and the working proof of concept is located in the "complex/kvx" branch.
> I've also tried to apply the patch to GCC's master branch today, with a minimal
> adaptation of the x86 backend, in the "complex/x86" branch.
> Both branches have the same generic patch (that I will split later) plus a
> backend-specific patch. I'm a little bit lost in the x86 backend, so it's not very
> usable for now, but I will work on it.

A little earlier, I sent a series of patches describing my
implementation of support for native complex operations. There are
8 generic patches and 1 experimental x86 patch which exploits a portion
of the added features.

The initial message, "[PATCH 0/9] Native complex operations" [1], explains the
implementation details and illustrates them with some examples, mainly on KVX, because
that backend implements all the features and its ISA has native complex instructions.
Some implementation details still need to be discussed, especially concerning vectors
of complex elements and the merge with the work from Arm.

[1] : "https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624644.html"

Best regards,
Sylvain


* Re: [RFC] Exposing complex numbers to target backends
  2023-07-17  9:18     ` Sylvain Noiry
@ 2023-09-12 10:12       ` Sylvain Noiry
  0 siblings, 0 replies; 6+ messages in thread
From: Sylvain Noiry @ 2023-09-12 10:12 UTC
  To: Richard Biener; +Cc: gcc, Paul Iannetta, Benoit Dinechin


Hi,

> [...]
> A little earlier, I sent a series of patches describing my
> implementation of support for native complex operations. There are
> 8 generic patches and 1 experimental x86 patch which exploits a portion
> of the added features.
> [...]

I've updated my series of patches. Apart from bug fixes, two major changes have been
made. Everything is explained in the message "[PATCH v2 0/11] Native complex
operations" [1].

Paul Iannetta and I will present our work on complex numbers in GCC at the
GNU Cauldron 2023!

[1] : "https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630011.html"

Best regards,
Sylvain

