public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/97734] New: GCC using branches when a conditional move would be better
@ 2020-11-05 17:18 rafael at espindo dot la
2020-11-06 7:23 ` [Bug target/97734] " rguenth at gcc dot gnu.org
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: rafael at espindo dot la @ 2020-11-05 17:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97734
Bug ID: 97734
Summary: GCC using branches when a conditional move would be
better
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rafael at espindo dot la
Target Milestone: ---
Created attachment 49511
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49511&action=edit
graph
Playing with the code in https://github.com/patmorin/arraylayout I noticed that
I could not reproduce the results for the eytzinger layout. It turns out the
issue was with current gcc selecting moves instead of conditional moves for
that particular code.
A reduced testcase is
#include <stdint.h>
uint64_t branchfree_search(uint64_t x, uint64_t n, uint32_t *a) {
uint64_t i = 0;
while (i < n) {
i = (x <= a[i]) ? (2*i + 1) : (2*i + 2);
}
uint64_t j = (i+1) >> __builtin_ffsl(~(i+1));
return (j == 0) ? n : j-1;
}
I have placed it in
https://gcc.godbolt.org/z/Krqrz7
Results
* ICC: conditional move
* Clang: branches
* GCC 6.4: conditional move
* Newer GCCs with -O2: branches
* GCC with -Os: conditional move
The attached graph shows how the conditional move is better for "small" array
sizes.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug target/97734] GCC using branches when a conditional move would be better
2020-11-05 17:18 [Bug target/97734] New: GCC using branches when a conditional move would be better rafael at espindo dot la
@ 2020-11-06 7:23 ` rguenth at gcc dot gnu.org
2020-11-06 9:38 ` amonakov at gcc dot gnu.org
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-11-06 7:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97734
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target| |x86_64-*-*
Keywords| |missed-optimization
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I guess x86
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug target/97734] GCC using branches when a conditional move would be better
2020-11-05 17:18 [Bug target/97734] New: GCC using branches when a conditional move would be better rafael at espindo dot la
2020-11-06 7:23 ` [Bug target/97734] " rguenth at gcc dot gnu.org
@ 2020-11-06 9:38 ` amonakov at gcc dot gnu.org
2020-11-06 22:15 ` rafael at espindo dot la
2021-12-15 1:07 ` pinskia at gcc dot gnu.org
3 siblings, 0 replies; 5+ messages in thread
From: amonakov at gcc dot gnu.org @ 2020-11-06 9:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97734
Alexander Monakov <amonakov at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |amonakov at gcc dot gnu.org
--- Comment #2 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
By the time RTL ce2 pass runs the RTL is suitable for if-conversion, but the
pass rejects the transform (probably due to costs, not visible in dumps so
impossible to tell without gdb'ing the compiler).
Note that this cmov lengthens the loop-carried data dependency on 'i', so it's
only beneficial on workloads where the control dependency it replaces
corresponds to an unpredictable branch. And GCC has no way to know that.
For a recent counter-example (cmov dramatically slowing down a loop with a
trivially predictable branch) please see
https://stackoverflow.com/a/64285902/4755075
At risk of needlessly repeating myself: I think what people doing such research
really need is __builtin_branchless_select that is properly guaranteed to be
branchless (selection statement on GIMPLE and branchless sequence on RTL).
Otherwise they spend time tweaking their code to make cmov appear where they
need it.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug target/97734] GCC using branches when a conditional move would be better
2020-11-05 17:18 [Bug target/97734] New: GCC using branches when a conditional move would be better rafael at espindo dot la
2020-11-06 7:23 ` [Bug target/97734] " rguenth at gcc dot gnu.org
2020-11-06 9:38 ` amonakov at gcc dot gnu.org
@ 2020-11-06 22:15 ` rafael at espindo dot la
2021-12-15 1:07 ` pinskia at gcc dot gnu.org
3 siblings, 0 replies; 5+ messages in thread
From: rafael at espindo dot la @ 2020-11-06 22:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97734
--- Comment #3 from Rafael Avila de Espindola <rafael at espindo dot la> ---
I just realized I made a mistake when producing the reduced testcase. The
argument x should have been an uint32_t. Unfortunately, with that fix gcc
always produces a branch.
ICC still produces a cmov:
https://gcc.godbolt.org/z/zszrPK
I would be very happy with a __builtin_branchless_select.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug target/97734] GCC using branches when a conditional move would be better
2020-11-05 17:18 [Bug target/97734] New: GCC using branches when a conditional move would be better rafael at espindo dot la
` (2 preceding siblings ...)
2020-11-06 22:15 ` rafael at espindo dot la
@ 2021-12-15 1:07 ` pinskia at gcc dot gnu.org
3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-15 1:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97734
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
Status|UNCONFIRMED |NEW
Last reconfirmed| |2021-12-15
Ever confirmed|0 |1
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note ICX produces a branch too just like LLVM.
MSVC produces:
$LL2@branchfree:
; Line 12
mov eax, DWORD PTR [r8+r10*4]
xor r9d, r9d
cmp rcx, rax
seta r9b
inc r9
lea r10, QWORD PTR [r9+r10*2]
cmp r10, rbx
jb SHORT $LL2@branchfree
The reason why GCC does not produce the cmov is because gimple produces:
if (x_20(D) <= _27)
goto <bb 4>; [50.00%]
else
goto <bb 5>; [50.00%]
<bb 4> [local count: 477815112]:
_23 = i_31 * 2;
iftmp.0_17 = _23 + 1;
goto <bb 6>; [100.00%]
<bb 5> [local count: 477815112]:
_25 = i_31 + 1;
iftmp.0_24 = _25 * 2;
<bb 6> [local count: 955630224]:
# i_11 = PHI <iftmp.0_24(5), iftmp.0_17(4)>
If we write the code like:
uint64_t t22 = 2*i + 1;
i = (x <= a[i]) ? t22 : t22+1;
Then GCC will produce what MSVC produces (MSVC will still produce it too).
If we add unused variable like:
uint64_t t22 = 2*i + 1;
GCC will produce the conditional move.
I can't get LLVM to produce either the setb/seta nor conditional move either.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-12-15 1:07 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-05 17:18 [Bug target/97734] New: GCC using branches when a conditional move would be better rafael at espindo dot la
2020-11-06 7:23 ` [Bug target/97734] " rguenth at gcc dot gnu.org
2020-11-06 9:38 ` amonakov at gcc dot gnu.org
2020-11-06 22:15 ` rafael at espindo dot la
2021-12-15 1:07 ` pinskia at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).