public inbox for gcc-bugs@sourceware.org
* [Bug target/105928] New: [AArch64] 64-bit constants with same high/low halves can use ADD lsl 32 (-Os at least)
@ 2022-06-11 20:19 peter at cordes dot ca
  2022-07-05 11:48 ` [Bug target/105928] " rsandifo at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: peter at cordes dot ca @ 2022-06-11 20:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105928

            Bug ID: 105928
           Summary: [AArch64] 64-bit constants with same high/low halves
                    can use ADD lsl 32 (-Os at least)
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---
            Target: arm64-*-*

void foo(unsigned long *p) {
    *p = 0xdeadbeefdeadbeef;
}

cleverly compiles (https://godbolt.org/z/b3oqao5Kz) to:

        mov     w1, 48879
        movk    w1, 0xdead, lsl 16
        stp     w1, w1, [x0]
        ret

But producing the value in a register takes four instructions (MOV + 3x MOVK):

unsigned long constant(){
    return 0xdeadbeefdeadbeef;
}

        mov     x0, 48879
        movk    x0, 0xdead, lsl 16
        movk    x0, 0xbeef, lsl 32
        movk    x0, 0xdead, lsl 48
        ret

At least at -Os, and maybe at -O2 or -O3 if it's efficient on the target, we
could use a shifted ADD or ORR to broadcast a zero-extended 32-bit value into
both halves of a 64-bit register:

        mov     x0, 48879
        movk    x0, 0xdead, lsl 16
        add     x0, x0, x0, lsl 32
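
In C terms (my illustration, not part of the bug report), the trick relies on
the upper 32 bits being zero after the MOV/MOVK pair, so adding the value to
itself shifted left by 32 (or ORing it in, since no set bits overlap)
replicates it into both halves:

#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not GCC output: mirrors "add x0, x0, x0, lsl 32". */
static uint64_t broadcast32(uint32_t lo32)
{
    uint64_t x = lo32;     /* upper 32 bits are zero, as after a W-register write */
    return x + (x << 32);  /* low half of (x << 32) is zero, so ADD and ORR agree */
}

int main(void)
{
    assert(broadcast32(0xdeadbeefu) == 0xdeadbeefdeadbeef);
    return 0;
}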

Some CPUs may fuse sequences of MOVK, and a shifted operand for an ALU op can
cost extra latency on others, so this might not actually be optimal for
performance, but at 3 instructions instead of 4 (12 vs. 16 bytes) it is
smaller for -Os and -Oz.

We should also be using that trick for stores to _Atomic or volatile long*,
where we currently do MOV + 3x MOVK and then an STR, even with ARMv8.4-a,
which guarantees the atomicity of STP within an aligned 16-byte chunk.
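
As a sketch of that store case (function name is mine, not from a testcase in
this report): GCC currently builds the constant in an X register with MOV +
3x MOVK and stores it with a single STR, so the three-instruction sequence
above would save an instruction here as well.

#include <stdatomic.h>

void store_atomic(_Atomic unsigned long *p)
{
    /* A relaxed store must still be a single 64-bit access, so the
       two-W-register STP trick from foo() isn't usable without the
       ARMv8.4-a guarantee; only the constant-building part shrinks.  */
    atomic_store_explicit(p, 0xdeadbeefdeadbeef, memory_order_relaxed);
}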


---

ARMv8.4-a and later guarantee atomicity for LDP/STP within an aligned 16-byte
chunk, so we should use MOV + MOVK + STP there even for volatile or
__ATOMIC_RELAXED.  But presumably that's a different part of GCC's internals,
so I'll report that separately.
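
A minimal reproducer for that separate report might look like the following
(my sketch; the asm in the comment is the codegen argued for above with
-march=armv8.4-a, not what GCC emits today):

void store_volatile(volatile unsigned long *p)
{
    /* With the ARMv8.4-a guarantee that LDP/STP within an aligned 16-byte
       chunk is atomic, the same two-W-register sequence as in foo() would do:
           mov     w1, 48879
           movk    w1, 0xdead, lsl 16
           stp     w1, w1, [x0]
    */
    *p = 0xdeadbeefdeadbeef;
}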
