public inbox for gcc-bugs@sourceware.org
* [Bug c/101950] New: __builtin_clrsb is never inlined
@ 2021-08-17 16:13 sven.koehler at gmail dot com
  2021-08-17 18:01 ` [Bug c/101950] " jakub at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: sven.koehler at gmail dot com @ 2021-08-17 16:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101950

            Bug ID: 101950
           Summary: __builtin_clrsb is never inlined
           Product: gcc
           Version: 11.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sven.koehler at gmail dot com
  Target Milestone: ---

With gcc 11.1 on ARM 32-bit and Intel, __builtin_clrsb is not inlined. On
AARCH64 it is inlined and the cls instruction is used, as expected. I use the
C code below to compare the generated assembly. For ARM, I use -O3
-mcpu=cortex-a53 -marm and for Intel I just use -O3.


On ARM 32-bit, clrsb1 seems to be the fastest code (see below for the assembly
code) since clz handles zero correctly. On Intel, bsr does not handle zero,
hence the workaround in clrsb2 of setting the lsb before calling
__builtin_clzl (see below for the assembly code). On Intel, clrsb1 is slightly
longer and uses a jump to handle the zero case. clang apparently uses variant
clrsb1 on ARM and Intel, and inlines it on both architectures when using -O3.





#define SHIFT (sizeof(x)*8-1)

int clz(unsigned long x) {
    if (x == 0) {
        return sizeof(x)*8;
    }
    return __builtin_clzl(x);
}

int clsb(long x) {
    return clz(x ^ (x >> SHIFT));
}

int clrsb1(long x) {
    return clsb(x)-1;
}

int clrsb2(long x) {
    /* Setting the lsb keeps the argument nonzero, so __builtin_clzl is defined. */
    x = ((x << 1) ^ (x >> SHIFT)) | 1;
    return __builtin_clzl(x);
}

int clrsb3(long x) {
    return __builtin_clrsbl(x);
}
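
For anyone who wants to cross-check the variants above, here is a minimal sketch, not part of the original report. The names clrsb_ref, clrsb_or1 and check_clrsb are hypothetical. clrsb_or1 restates the clrsb2 idea with unsigned arithmetic, so it does not rely on how the compiler treats shifts of negative signed values, and clrsb_ref is a deliberately slow bit-by-bit reference.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical reference: count how many bits directly below the sign
   bit are equal to the sign bit, one bit at a time. */
int clrsb_ref(long x) {
    const int bits = (int)(sizeof(long) * CHAR_BIT);
    unsigned long ux = (unsigned long)x;
    unsigned long sign = ux >> (bits - 1);
    int n = 0;
    for (int i = bits - 2; i >= 0; i--) {
        if (((ux >> i) & 1UL) != sign)
            break;
        n++;
    }
    return n;
}

/* Hypothetical restatement of clrsb2 using unsigned arithmetic only. */
int clrsb_or1(long x) {
    const int bits = (int)(sizeof(long) * CHAR_BIT);
    unsigned long ux = (unsigned long)x;
    /* All ones if x is negative, all zeros otherwise. */
    unsigned long mask = (ux >> (bits - 1)) ? ~0UL : 0UL;
    /* OR-ing in the lsb keeps the clz argument nonzero. */
    unsigned long y = ((ux << 1) ^ mask) | 1UL;
    return __builtin_clzl(y);
}

/* Aborts if either the builtin or the OR-1 variant disagrees with the
   reference for the given input. */
void check_clrsb(long x) {
    assert(clrsb_or1(x) == clrsb_ref(x));
    assert(__builtin_clrsbl(x) == clrsb_ref(x));
}
```

Calling check_clrsb on 0, -1, 1, LONG_MIN and LONG_MAX covers the edge cases discussed here (the all-zero and all-one inputs in particular).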



on ARM 32-bit:
clrsb1:
        eor     x0, x0, x0, asr 63
        clz     x0, x0
        sub     w0, w0, #1
        ret

on Intel:
clrsb2:
        lea     rax, [rdi+rdi]
        sar     rdi, 63
        xor     rax, rdi
        or      rax, 1
        bsr     rax, rax
        xor     eax, 63
        ret
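
A note on the last two instructions of the Intel listing: bsr yields the index of the highest set bit of a nonzero operand, and for an index in the range 0..63 the expression index ^ 63 equals 63 - index, which is the leading-zero count. The sketch below illustrates that in C; highest_set_bit and clz_via_xor are hypothetical helper names, and a 64-bit operand is assumed.

```c
#include <assert.h>

/* Hypothetical helper: index of the highest set bit of a nonzero
   64-bit value, which is what bsr reports for nonzero input. */
int highest_set_bit(unsigned long long x) {
    int idx = 0;
    while (x >>= 1)
        idx++;
    return idx;
}

/* Leading-zero count derived from the bit index. For idx in 0..63,
   idx ^ 63 is the same as 63 - idx. Only meaningful for nonzero x. */
int clz_via_xor(unsigned long long x) {
    assert(x != 0);
    return highest_set_bit(x) ^ 63;
}
```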


Thread overview: 6+ messages
2021-08-17 16:13 [Bug c/101950] New: __builtin_clrsb is never inlined sven.koehler at gmail dot com
2021-08-17 18:01 ` [Bug c/101950] " jakub at gcc dot gnu.org
2021-08-18 12:04 ` [Bug middle-end/101950] " jakub at gcc dot gnu.org
2021-08-19  9:02 ` cvs-commit at gcc dot gnu.org
2021-08-19  9:03 ` jakub at gcc dot gnu.org
2021-08-25  2:45 ` pinskia at gcc dot gnu.org
