[Bug target/109632] New: Inefficient codegen when complex numbers are emulated with structs

From: "tnfchris at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/109632] New: Inefficient codegen when complex numbers are emulated with structs
Date: Wed, 26 Apr 2023 13:03:02 +0000
Message-ID: <bug-109632-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

            Bug ID: 109632
           Summary: Inefficient codegen when complex numbers are emulated
                    with structs
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*

The following two functions are semantically equivalent:

struct complx_t {
    float re;
    float im;
};

complx_t
add(const complx_t &a, const complx_t &b) {
  return {a.re + b.re, a.im + b.im};
}

_Complex float
add(const _Complex float *a, const _Complex float *b) {
  return {__real__ *a + __real__ *b, __imag__ *a + __imag__ *b};
}
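
As a quick sanity check of the equivalence claim, here is a minimal sketch that
runs both forms on the same inputs and compares the results. The names
add_struct, add_complex and same_result are hypothetical (the report overloads
a single name, add), and the _Complex value is built through __real__ and
__imag__, which are GNU extensions in C++.

```cpp
// Sketch only: add_struct, add_complex and same_result are illustrative names.
struct complx_t {
    float re;
    float im;
};

// Struct-based emulation of a complex add, as in the first test case.
complx_t add_struct(const complx_t &a, const complx_t &b) {
    return {a.re + b.re, a.im + b.im};
}

// _Complex float version, as in the second test case (GNU extension in C++).
_Complex float add_complex(const _Complex float *a, const _Complex float *b) {
    _Complex float r;
    __real__ r = __real__ *a + __real__ *b;
    __imag__ r = __imag__ *a + __imag__ *b;
    return r;
}

// Returns true when both forms produce identical real and imaginary parts.
bool same_result(float ar, float ai, float br, float bi) {
    complx_t sa{ar, ai}, sb{br, bi};
    _Complex float ca, cb;
    __real__ ca = ar; __imag__ ca = ai;
    __real__ cb = br; __imag__ cb = bi;

    complx_t s = add_struct(sa, sb);
    _Complex float c = add_complex(&ca, &cb);
    return s.re == __real__ c && s.im == __imag__ c;
}
```

Both functions perform the same two float additions, so any difference in the
generated code comes from how the compiler handles the types, not from the
arithmetic.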

But we generate very different code for them at -O2. For the first one we get:

        ldr     d1, [x1]
        ldr     d0, [x0]
        fadd    v0.2s, v0.2s, v1.2s
        fmov    x0, d0
        lsr     x1, x0, 32
        lsr     w0, w0, 0
        fmov    s1, w1
        fmov    s0, w0
        ret

which is bad for obvious reasons, but it also never needed to go through the
general registers to move the two lanes apart. We could have used any of
several NEON instructions instead.
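
To make the vectorized shape concrete, here is a minimal sketch of what the SLP
version computes at the source level, written with the GNU vector extension.
The names v2sf and add_vec are hypothetical and not from the report. The point
is that the result is one 2-lane add followed by two lane extracts, which on
AArch64 could in principle stay in the SIMD registers (for example with a lane
move such as dup s1, v0.s[1]) rather than round-tripping through x0/x1.

```cpp
// Sketch only: v2sf and add_vec are illustrative names, not from the report.
typedef float v2sf __attribute__((vector_size(8)));  // two floats, GNU extension

struct complx_t {
    float re;
    float im;
};

complx_t add_vec(const complx_t &a, const complx_t &b) {
    v2sf va, vb;
    // Copy the structs into vectors; memcpy avoids aliasing problems.
    __builtin_memcpy(&va, &a, sizeof va);
    __builtin_memcpy(&vb, &b, sizeof vb);

    v2sf sum = va + vb;       // one 2-lane add, like fadd v0.2s, v0.2s, v1.2s
    return {sum[0], sum[1]};  // lane 0 -> re, lane 1 -> im
}
```

Whether a given compiler version emits lane moves for this form is not
something this sketch guarantees; it only spells out the data flow.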

For the second one we generate the good instructions:

add(float _Complex const*, float _Complex const*):
        ldp     s3, s2, [x0]
        ldp     s0, s1, [x1]
        fadd    s1, s2, s1
        fadd    s0, s3, s0
        ret

The difference being that in the second one we have decomposed the initial
structure by loading the elements:

  <bb 2> [local count: 1073741824]:
  _1 = REALPART_EXPR <*a_8(D)>;
  _2 = REALPART_EXPR <*b_9(D)>;
  _3 = _1 + _2;
  _4 = IMAGPART_EXPR <*a_8(D)>;
  _5 = IMAGPART_EXPR <*b_9(D)>;
  _6 = _4 + _5;
  _10 = COMPLEX_EXPR <_3, _6>;
  return _10;

In the first one we've kept them as vectors:

  <bb 2> [local count: 1073741824]:
  vect__1.6_13 = MEM <const vector(2) float> [(float *)a_8(D)];
  vect__2.9_15 = MEM <const vector(2) float> [(float *)b_9(D)];
  vect__3.10_16 = vect__1.6_13 + vect__2.9_15;
  MEM <vector(2) float> [(float *)&D.4435] = vect__3.10_16;
  return D.4435;

This part is probably a costing issue: we SLP vectorize them even though it is
not profitable, because under the AAPCS64 we have to return the two fields in
separate registers.
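
For context, a struct of two floats is a homogeneous floating-point aggregate
under the AArch64 procedure call standard, so re and im come back in s0 and s1
(as the _Complex float output above shows). The sketch below only checks the
layout preconditions for that; looks_like_two_float_hfa is a hypothetical
helper name, and the check is an illustration rather than a full ABI
classification.

```cpp
#include <cstddef>
#include <type_traits>

struct complx_t {
    float re;
    float im;
};

// Hypothetical helper: true when complx_t is exactly two packed floats,
// which is the shape that gets returned as two separate FP registers.
constexpr bool looks_like_two_float_hfa() {
    return std::is_standard_layout<complx_t>::value
        && sizeof(complx_t) == 2 * sizeof(float)
        && offsetof(complx_t, im) == sizeof(float);
}

static_assert(looks_like_two_float_hfa(),
              "complx_t is expected to be two contiguous floats");
```

Since the return value must be split across two registers anyway, packing the
result into one 64-bit vector only adds the cost of unpacking it again.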

Using -fno-tree-vectorize gets the gimple code right:

  <bb 2> [local count: 1073741824]:
  _1 = a_8(D)->re;
  _2 = b_9(D)->re;
  _3 = _1 + _2;
  D.4435.re = _3;
  _4 = a_8(D)->im;
  _5 = b_9(D)->im;
  _6 = _4 + _5;
  D.4435.im = _6;
  return D.4435;

But we generate worse code:

        ldp     s1, s0, [x0]
        mov     x2, 0
        ldp     s3, s2, [x1]
        fadd    s1, s1, s3
        fadd    s0, s0, s2
        fmov    w1, s1
        fmov    w0, s0
        bfi     x2, x1, 0, 32
        bfi     x2, x0, 32, 32
        lsr     x0, x2, 32
        lsr     w2, w2, 0
        fmov    s1, w0
        fmov    s0, w2

where we again go through the general registers as a very complicated way to
do a no-op.

So there are two bugs here:

1. a costing bug: we shouldn't SLP vectorize this.
2. an expansion bug: the code out of expand is bad to begin with.
