public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/97734] New: GCC using branches when a conditional move would be better
@ 2020-11-05 17:18 rafael at espindo dot la
  2020-11-06  7:23 ` [Bug target/97734] " rguenth at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: rafael at espindo dot la @ 2020-11-05 17:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97734

            Bug ID: 97734
           Summary: GCC using branches when a conditional move would be
                    better
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rafael at espindo dot la
  Target Milestone: ---

Created attachment 49511
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49511&action=edit
graph

Playing with the code in https://github.com/patmorin/arraylayout I noticed that
I could not reproduce the results for the eytzinger layout. It turns out the
issue was with current gcc selecting moves instead of conditional moves for
that particular code.

A reduced testcase is

#include <stdint.h>
uint64_t branchfree_search(uint64_t x, uint64_t n, uint32_t *a) {
    uint64_t i = 0;
    while (i < n) {
        i = (x <= a[i]) ? (2*i + 1) : (2*i + 2);
    }
    uint64_t j = (i+1) >> __builtin_ffsl(~(i+1));
    return (j == 0) ? n : j-1;
}

I have placed it in

https://gcc.godbolt.org/z/Krqrz7

Results
* ICC: conditional move
* Clang: branches
* GCC 6.4: conditional move
* Newer GCCs with -O2:  branches
* GCC with -Os: conditional move

The attached graph shows how the conditional move is better for "small" array
sizes.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug target/97734] GCC using branches when a conditional move would be better
  2020-11-05 17:18 [Bug target/97734] New: GCC using branches when a conditional move would be better rafael at espindo dot la
@ 2020-11-06  7:23 ` rguenth at gcc dot gnu.org
  2020-11-06  9:38 ` amonakov at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-11-06  7:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97734

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I guess x86

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug target/97734] GCC using branches when a conditional move would be better
  2020-11-05 17:18 [Bug target/97734] New: GCC using branches when a conditional move would be better rafael at espindo dot la
  2020-11-06  7:23 ` [Bug target/97734] " rguenth at gcc dot gnu.org
@ 2020-11-06  9:38 ` amonakov at gcc dot gnu.org
  2020-11-06 22:15 ` rafael at espindo dot la
  2021-12-15  1:07 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: amonakov at gcc dot gnu.org @ 2020-11-06  9:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97734

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #2 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
By the time RTL ce2 pass runs the RTL is suitable for if-conversion, but the
pass rejects the transform (probably due to costs, not visible in dumps so
impossible to tell without gdb'ing the compiler).

Note that this cmov lengthens the loop-carried data dependency on 'i', so it's
only beneficial on workloads where the control dependency it replaces
corresponds to an unpredictable branch. And GCC has no way to know that.

For a recent counter-example (cmov dramatically slowing down a loop with a
trivially predictable branch) please see
https://stackoverflow.com/a/64285902/4755075

At risk of needlessly repeating myself: I think what people doing such research
really need is __builtin_branchless_select that is properly guaranteed to be
branchless (selection statement on GIMPLE and branchless sequence on RTL).
Otherwise they spend time tweaking their code to make cmov appear where they
need it.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug target/97734] GCC using branches when a conditional move would be better
  2020-11-05 17:18 [Bug target/97734] New: GCC using branches when a conditional move would be better rafael at espindo dot la
  2020-11-06  7:23 ` [Bug target/97734] " rguenth at gcc dot gnu.org
  2020-11-06  9:38 ` amonakov at gcc dot gnu.org
@ 2020-11-06 22:15 ` rafael at espindo dot la
  2021-12-15  1:07 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: rafael at espindo dot la @ 2020-11-06 22:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97734

--- Comment #3 from Rafael Avila de Espindola <rafael at espindo dot la> ---
I just realized I made a mistake when producing the reduced testcase. The
argument x should have been an uint32_t. Unfortunately, with that fix gcc
always produces a branch.

ICC still produces a cmov:

https://gcc.godbolt.org/z/zszrPK

I would be very happy with a __builtin_branchless_select.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug target/97734] GCC using branches when a conditional move would be better
  2020-11-05 17:18 [Bug target/97734] New: GCC using branches when a conditional move would be better rafael at espindo dot la
                   ` (2 preceding siblings ...)
  2020-11-06 22:15 ` rafael at espindo dot la
@ 2021-12-15  1:07 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-15  1:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97734

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-12-15
     Ever confirmed|0                           |1

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note ICX produces a branch too just like LLVM.
MSVC produces:
$LL2@branchfree:
; Line 12
        mov     eax, DWORD PTR [r8+r10*4]
        xor     r9d, r9d
        cmp     rcx, rax
        seta    r9b
        inc     r9
        lea     r10, QWORD PTR [r9+r10*2]
        cmp     r10, rbx
        jb      SHORT $LL2@branchfree


The reason why GCC does not produce the cmov is because gimple produces:
  if (x_20(D) <= _27)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 4> [local count: 477815112]:
  _23 = i_31 * 2;
  iftmp.0_17 = _23 + 1;
  goto <bb 6>; [100.00%]

  <bb 5> [local count: 477815112]:
  _25 = i_31 + 1;
  iftmp.0_24 = _25 * 2;

  <bb 6> [local count: 955630224]:
  # i_11 = PHI <iftmp.0_24(5), iftmp.0_17(4)>

If we write the code like:

        uint64_t t22 = 2*i + 1;
        i = (x <= a[i]) ? t22 : t22+1;

Then GCC will produce what MSVC produces (MSVC will still produce it too).
If we add unused variable like:

        uint64_t t22 = 2*i + 1;

GCC will produce the conditional move.

I can't get LLVM to produce either the setb/seta nor conditional move either.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-12-15  1:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-05 17:18 [Bug target/97734] New: GCC using branches when a conditional move would be better rafael at espindo dot la
2020-11-06  7:23 ` [Bug target/97734] " rguenth at gcc dot gnu.org
2020-11-06  9:38 ` amonakov at gcc dot gnu.org
2020-11-06 22:15 ` rafael at espindo dot la
2021-12-15  1:07 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).