public inbox for gcc-bugs@sourceware.org
* [Bug target/109632] New: Inefficient codegen when complex numbers are emulated with structs
@ 2023-04-26 13:03 tnfchris at gcc dot gnu.org
  2023-04-26 14:14 ` [Bug target/109632] " rguenth at gcc dot gnu.org
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-04-26 13:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

            Bug ID: 109632
           Summary: Inefficient codegen when complex numbers are emulated
                    with structs
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*

The following two functions are semantically equivalent:

struct complx_t {
    float re;
    float im;
};

complx_t
add(const complx_t &a, const complx_t &b) {
  return {a.re + b.re, a.im + b.im};
}

_Complex float
add(const _Complex float *a, const _Complex float *b) {
  return {__real__ *a + __real__ *b, __imag__ *a + __imag__ *b};
}
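
As a quick sanity check of the equivalence claim, here is a minimal sketch that runs both overloads on the same inputs. The helper check_same is hypothetical (not part of the report), and the _Complex version is built with __real__/__imag__ assignments instead of a braced return, on the assumption that this form is accepted more widely by GNU-compatible C++ compilers.

```cpp
#include <cassert>

struct complx_t {
    float re;
    float im;
};

// Struct-based emulation, as in the report.
complx_t add(const complx_t &a, const complx_t &b) {
  return {a.re + b.re, a.im + b.im};
}

// Native complex version (GNU extension: _Complex, __real__, __imag__).
_Complex float add(const _Complex float *a, const _Complex float *b) {
  _Complex float r = 0.0f;
  __real__ r = __real__ *a + __real__ *b;
  __imag__ r = __imag__ *a + __imag__ *b;
  return r;
}

// Hypothetical helper: asserts that both overloads agree on one input pair.
void check_same(float are, float aim, float bre, float bim) {
  complx_t sa{are, aim}, sb{bre, bim};
  _Complex float ca = 0.0f, cb = 0.0f;
  __real__ ca = are; __imag__ ca = aim;
  __real__ cb = bre; __imag__ cb = bim;

  complx_t s = add(sa, sb);
  _Complex float c = add(&ca, &cb);

  assert(s.re == __real__ c);
  assert(s.im == __imag__ c);
}
```

Calling check_same(1.0f, 2.0f, 3.0f, 4.0f) from a test driver exercises both paths. It only checks results and says nothing about the generated code.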

But we generate very different code for them at -O2. For the first one we get:

        ldr     d1, [x1]
        ldr     d0, [x0]
        fadd    v0.2s, v0.2s, v1.2s
        fmov    x0, d0
        lsr     x1, x0, 32
        lsr     w0, w0, 0
        fmov    s1, w1
        fmov    s0, w0
        ret

which is bad for obvious reasons, but it also never needed to go through the
general registers for such a reversal. We could have used many other NEON
instructions.

For the second one we generate the good instructions:

add(float _Complex const*, float _Complex const*):
        ldp     s3, s2, [x0]
        ldp     s0, s1, [x1]
        fadd    s1, s2, s1
        fadd    s0, s3, s0
        ret

The difference being that in the second one we have decomposed the initial
structure by loading the elements:

  <bb 2> [local count: 1073741824]:
  _1 = REALPART_EXPR <*a_8(D)>;
  _2 = REALPART_EXPR <*b_9(D)>;
  _3 = _1 + _2;
  _4 = IMAGPART_EXPR <*a_8(D)>;
  _5 = IMAGPART_EXPR <*b_9(D)>;
  _6 = _4 + _5;
  _10 = COMPLEX_EXPR <_3, _6>;
  return _10;

In the first one we've kept them as vectors:

  <bb 2> [local count: 1073741824]:
  vect__1.6_13 = MEM <const vector(2) float> [(float *)a_8(D)];
  vect__2.9_15 = MEM <const vector(2) float> [(float *)b_9(D)];
  vect__3.10_16 = vect__1.6_13 + vect__2.9_15;
  MEM <vector(2) float> [(float *)&D.4435] = vect__3.10_16;
  return D.4435;

This part is probably a costing issue: we SLP them even though it's not
profitable, because under the AAPCS64 we have to return them in separate
registers.
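
To make the shape of the SLP'd version concrete, the following is a rough source-level model of the vectorized GIMPLE, written with the GNU vector_size extension. The names v2sf and add_slp are made up for illustration, and this is only an approximation of what the vectorizer produces, not its actual output.

```cpp
#include <cassert>
#include <cstring>

struct complx_t {
    float re;
    float im;
};

// Two-lane float vector (GNU vector extension); the name v2sf is illustrative.
typedef float v2sf __attribute__((vector_size(8)));

static_assert(sizeof(v2sf) == sizeof(complx_t),
              "model assumes the struct is exactly two packed floats");

// Rough model of the SLP form: two 64-bit loads, one 2-lane add, one store.
complx_t add_slp(const complx_t &a, const complx_t &b) {
  v2sf va, vb;
  std::memcpy(&va, &a, sizeof va);  // MEM <vector(2) float> [a]
  std::memcpy(&vb, &b, sizeof vb);  // MEM <vector(2) float> [b]
  v2sf vr = va + vb;                // single vector add

  complx_t r;
  std::memcpy(&r, &vr, sizeof r);   // store into the return slot
  return r;                         // both lanes still live in one vector here
}

// Small self-check on exactly representable values.
void check_add_slp() {
  complx_t r = add_slp({1.0f, 2.0f}, {3.0f, 4.0f});
  assert(r.re == 4.0f);
  assert(r.im == 6.0f);
}
```

The single vector add saves one fadd, but the result sits in one 64-bit register while the return value has to be split into two scalar registers, so the split can cost more than the add saved.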

Using -fno-tree-vectorize gets the gimple code right:

  <bb 2> [local count: 1073741824]:
  _1 = a_8(D)->re;
  _2 = b_9(D)->re;
  _3 = _1 + _2;
  D.4435.re = _3;
  _4 = a_8(D)->im;
  _5 = b_9(D)->im;
  _6 = _4 + _5;
  D.4435.im = _6;
  return D.4435;

But we generate worse code:

        ldp     s1, s0, [x0]
        mov     x2, 0
        ldp     s3, s2, [x1]
        fadd    s1, s1, s3
        fadd    s0, s0, s2
        fmov    w1, s1
        fmov    w0, s0
        bfi     x2, x1, 0, 32
        bfi     x2, x0, 32, 32
        lsr     x0, x2, 32
        lsr     w2, w2, 0
        fmov    s1, w0
        fmov    s0, w2

where we again go through the general registers as a very complicated way to do
a no-op.
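
For reference, the fmov/bfi/lsr sequence can be modelled in plain C++ as packing two float bit patterns into one 64-bit integer and immediately unpacking them again. This is only a sketch of the data flow; the helper names (fmov_to_gpr, fmov_to_fpr, round_trip) are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Models "fmov wN, sM": copy the raw bits of a float into a 32-bit integer.
static std::uint32_t fmov_to_gpr(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// Models "fmov sM, wN": copy raw 32-bit integer bits back into a float.
static float fmov_to_fpr(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

struct pair_t {
    float re;
    float im;
};

// Models the sequence after the two fadds in the scalar code.
pair_t round_trip(float re, float im) {
  std::uint64_t x2 = 0;                                        // mov x2, 0
  x2 |= static_cast<std::uint64_t>(fmov_to_gpr(re));           // bfi x2, x1, 0, 32
  x2 |= static_cast<std::uint64_t>(fmov_to_gpr(im)) << 32;     // bfi x2, x0, 32, 32

  std::uint32_t hi = static_cast<std::uint32_t>(x2 >> 32);     // lsr x0, x2, 32
  std::uint32_t lo = static_cast<std::uint32_t>(x2);           // lsr w2, w2, 0

  return {fmov_to_fpr(lo), fmov_to_fpr(hi)};
}

// The round trip hands back exactly the values it was given.
void check_round_trip() {
  pair_t p = round_trip(4.0f, 6.0f);
  assert(p.re == 4.0f);
  assert(p.im == 6.0f);
}
```

Since round_trip returns its inputs unchanged, the only real effect of the sequence is to decide which FP register each value ends up in.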

So there are two bugs here:

1. a costing issue: we shouldn't SLP here.
2. an expansion issue: the code out of expand is bad to begin with.


Thread overview: 13+ messages
2023-04-26 13:03 [Bug target/109632] New: Inefficient codegen when complex numbers are emulated with structs tnfchris at gcc dot gnu.org
2023-04-26 14:14 ` [Bug target/109632] " rguenth at gcc dot gnu.org
2023-04-26 14:40 ` tnfchris at gcc dot gnu.org
2023-04-26 15:23 ` tnfchris at gcc dot gnu.org
2023-04-27  7:52 ` rsandifo at gcc dot gnu.org
2023-04-27 11:17 ` rsandifo at gcc dot gnu.org
2023-04-27 11:24 ` tnfchris at gcc dot gnu.org
2023-04-27 13:33 ` rsandifo at gcc dot gnu.org
2023-04-27 17:50 ` rsandifo at gcc dot gnu.org
2023-04-27 19:11 ` tnfchris at gcc dot gnu.org
2023-05-02 15:00 ` rsandifo at gcc dot gnu.org
2023-05-23 10:34 ` cvs-commit at gcc dot gnu.org
2023-05-23 19:16 ` rsandifo at gcc dot gnu.org
