[Bug target/114252] Introducing bswapsi reduces code performance

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

From: "gjl at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/114252] Introducing bswapsi reduces code performance
Date: Thu, 07 Mar 2024 08:45:09 +0000	[thread overview]
Message-ID: <bug-114252-4-KXiWzLoQ2w@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-114252-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114252

--- Comment #8 from Georg-Johann Lay <gjl at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> Note I do understand what you are saying, just the middle-end in detecting
> and using __builtin_bswap32 does what it does everywhere else - it checks
> whether the target implements the operation.
> 
> The middle-end doesn't try to actually compare costs (it has no idea of the
> bswapsi costs),

But even when the bswapsi insn costs nothing, the v14 code has these additional
6 movqi insns 32...37 compared to v13 code.  In order to have the same
performance like v13 code, a bswapsi would have to cost negative 6 insns.  And
an optimizer that assumes negative costs is not reasonable, in particular
because the recognition of bswap opportunities serves optimization -- or is
supposed to serve it as far as I understand.

> and it most definitely doesn't see how AVR is special in
> having only QImode registers and thus the created SImode load (which the
> target supports!) will end up as four registers.

Even when the bswap insn would cost nothing the code is worse.

> The only thing that maybe would make sense with AVR exposing bswapsi is
> users calling __builtin_bswap but since it always expands as a libcall
> even that makes no sense.

It makes perfect sense when C/C++ code uses __builtin_bswap32:

* With current bswapsi insn, the code does a call that performs SI:22 =
bswap(SI:22) with NO additionall register pressure.

* Without bswap insn, the code does a real ABI call that performs SI:22 =
bswap(SI:22) PLUS IT CLOBBERS r18, r19, r20, r21, r26, r27, r30 and r31; which
are the most powerful GPRs.

> So my preferred fix would be to remove bswapsi from avr.md?

Is there a way that the backend can fold a call to an insn that performs better
that a call? Like in TARGET_FOLD_BUILTIN?  As far as I know, the backend can
only fold target builtins, but not common builtins?  Tree fold cannot fold to
an insn obviously, but it could fold to inline asm, no?

Or can the target change an optabs entry so it expands to an insn that's more
profitable that a respective call? (like avr.md's bswap insn with transparent
call is more profitable than a real call).

The avr backend does this for many other stuff, too:

divmod, SI and PSI multiplications, parity, popcount, clz, ffs, 

> Does it benefit from recognizing bswap done with shifts on an int?

I don't fully understand that question. You mean to write code that shifts
bytes around like in
    uint32_t res = 0;
    res |= ((uint32_t) buf[0]) << 24;
    res |= ((uint32_t) buf[1]) << 16;
    res |= (uint32_t) buf[2] << 8;
    res |= buf[3];
    return res;
is better than a bswapsi call?

next prev parent reply	other threads:[~2024-03-07  8:45 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-06 10:01 [Bug tree-optimization/114252] New: " gjl at gcc dot gnu.org
2024-03-06 11:55 ` [Bug target/114252] " rguenth at gcc dot gnu.org
2024-03-06 11:59 ` rguenth at gcc dot gnu.org
2024-03-06 12:15 ` gjl at gcc dot gnu.org
2024-03-06 12:37 ` rguenth at gcc dot gnu.org
2024-03-06 16:12 ` gjl at gcc dot gnu.org
2024-03-06 17:18 ` rguenther at suse dot de
2024-03-07  7:45 ` rguenth at gcc dot gnu.org
2024-03-07  8:45 ` gjl at gcc dot gnu.org [this message]
2024-03-07  9:05 ` gjl at gcc dot gnu.org
2024-03-07  9:14 ` rguenth at gcc dot gnu.org
2024-03-07  9:42 ` rguenth at gcc dot gnu.org
2024-03-07 10:47 ` gjl at gcc dot gnu.org
2024-03-07 10:56 ` rguenth at gcc dot gnu.org
2024-03-07 14:12 ` gjl at gcc dot gnu.org
2024-03-07 15:02 ` pinskia at gcc dot gnu.org
2024-03-07 17:52 ` rguenther at suse dot de

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-114252-4-KXiWzLoQ2w@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).