public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug other/113682] New: Branches in branchless binary search rather than cmov/csel/csinc
@ 2024-01-31 13:48 redbeard0531 at gmail dot com
  2024-01-31 14:16 ` [Bug rtl-optimization/113682] " rguenth at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: redbeard0531 at gmail dot com @ 2024-01-31 13:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113682

            Bug ID: 113682
           Summary: Branches in branchless binary search rather than
                    cmov/csel/csinc
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

I've been trying to eliminate unpredictable branches in a hot function where
perf counters show a high mispredict percentage. Unfortunately, I haven't been
able to find an incantation to get gcc to generate branchless code other than
inline asm, which I'd rather avoid. In this case I've even laid out the
critical lines so that they exactly match the behavior of the csinc and csel
instructions on arm64, but they are still not used.


Somewhat minimized repro:

typedef unsigned long size_t;
struct ITEM {char* addr; size_t len;};
int cmp(ITEM* user, ITEM* tree);

size_t bsearch2(ITEM* user, ITEM** tree, size_t treeSize) {
    auto pos = tree;
    size_t low = 0;
    size_t high = treeSize;
    while (low < high) {
        size_t i = (low + high) / 2;
        int res = cmp(user, tree[i]);

        // These should be cmp + csinc + csel on arm
        // and lea + test + cmov + cmov on x86.
        low = res > 0 ? i + 1 : low; // csinc
        high = res < 0 ? i: high; // csel

        if (res == 0) return i;
    }
    return -1;
}


On arm64 that generates a conditional branch on res > 0:
        bl      cmp(ITEM*, ITEM*)
        cmp     w0, 0
        bgt     .L15 // does low = i + 1 then loops
        mov     x20, x19
        bne     .L4 // loop


On x86_64 it does similar:
        call    cmp(ITEM*, ITEM*)
        test    eax, eax
        jg      .L16 
        jne     .L17


Note that clang produces the desired codegen for both:

arm:
        bl      cmp(ITEM*, ITEM*)
        cmp     w0, #0
        csinc   x23, x23, x22, le
        csel    x19, x22, x19, lt
        cbnz    w0, .LBB0_1

x86:
        call    cmp(ITEM*, ITEM*)@PLT
        lea     rcx, [r12 + 1]
        test    eax, eax
        cmovg   r13, rcx
        cmovs   rbx, r12
        jne     .LBB0_1

(full output for all 4 available at https://www.godbolt.org/z/aWrKbYPTG. Code
snippets from trunk, but also repos on 13.2)

While ideally gcc would generate the branchless output for the supplied code,
if there is some (reasonable) incantation that would cause it to produce
branchless output, I'd be happy to have that too.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-04-03 11:10 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-31 13:48 [Bug other/113682] New: Branches in branchless binary search rather than cmov/csel/csinc redbeard0531 at gmail dot com
2024-01-31 14:16 ` [Bug rtl-optimization/113682] " rguenth at gcc dot gnu.org
2024-01-31 16:53 ` pinskia at gcc dot gnu.org
2024-01-31 18:00 ` [Bug middle-end/113682] " pinskia at gcc dot gnu.org
2024-02-01 10:30 ` tnfchris at gcc dot gnu.org
2024-02-01 13:52 ` redbeard0531 at gmail dot com
2024-02-02  1:04 ` pinskia at gcc dot gnu.org
2024-04-02 17:46 ` [Bug rtl-optimization/113682] " tnfchris at gcc dot gnu.org
2024-04-02 18:01 ` pinskia at gcc dot gnu.org
2024-04-03 11:10 ` tnfchris at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).