public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/95784] New: Failure to optimize usage of __builtin_add_overflow with return statement properly
@ 2020-06-20 10:13 gabravier at gmail dot com
2020-06-20 12:09 ` [Bug target/95784] " pinskia at gcc dot gnu.org
` (9 more replies)
0 siblings, 10 replies; 11+ messages in thread
From: gabravier at gmail dot com @ 2020-06-20 10:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95784
Bug ID: 95784
Summary: Failure to optimize usage of __builtin_add_overflow
with return statement properly
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f(uint8_t operand, int8_t *result)
{
if (__builtin_add_overflow(operand, 0, result))
{
*result = 0;
return 10213;
}
return 0;
}
With -O3, LLVM outputs this :
f(unsigned char, signed char*): # @f(unsigned char, signed char*)
movsx eax, dil
xor ecx, ecx
cmp edi, eax
cmovne edi, ecx
mov byte ptr [rsi], dil
mov eax, 10213
cmove eax, ecx
ret
GCC outputs this :
f(unsigned char, signed char*):
movzx eax, dil
movsx di, dil
cmp ax, di
setne dl
mov r8d, edx
sal r8d, 31
sar r8d, 31
and r8d, 10213
test dl, dl
mov edx, 0
cmovne eax, edx
mov BYTE PTR [rsi], al
mov eax, r8d
ret
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/95784] Failure to optimize usage of __builtin_add_overflow with return statement properly
2020-06-20 10:13 [Bug tree-optimization/95784] New: Failure to optimize usage of __builtin_add_overflow with return statement properly gabravier at gmail dot com
@ 2020-06-20 12:09 ` pinskia at gcc dot gnu.org
2020-06-20 14:01 ` gabravier at gmail dot com
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2020-06-20 12:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95784
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|tree-optimization |target
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I dont see an issue here. Gcc is avoiding one cmov by doing a sign extend from
bit 0 (sal/sar) and an and instruction.
Cmov on x86 processors is a bit weird and sometimes slower than using
arthematic instructions.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/95784] Failure to optimize usage of __builtin_add_overflow with return statement properly
2020-06-20 10:13 [Bug tree-optimization/95784] New: Failure to optimize usage of __builtin_add_overflow with return statement properly gabravier at gmail dot com
2020-06-20 12:09 ` [Bug target/95784] " pinskia at gcc dot gnu.org
@ 2020-06-20 14:01 ` gabravier at gmail dot com
2020-06-20 14:02 ` gabravier at gmail dot com
` (7 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: gabravier at gmail dot com @ 2020-06-20 14:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95784
--- Comment #2 from Gabriel Ravier <gabravier at gmail dot com> ---
cmov is so slow that :
- 1 movzx
- 1 setcc
- 1 sal
- 1 sar
- 1 and
- 1 test
is worth avoiding it ? From what I can see, the dependencies for the LLVM
version for the first cmov are :
- The left operand, which the preceding cmp already depends on
- EFLAGS from the cmp
- The 0 from the xor
And the dependencies for the second cmov are :
- The left operand from the preceding mov
- The right operand, which the preceding cmov already depends on
- EFLAGS, which the preceding cmov already depends on
And I can't see how the dependencies for the second cmov can slow it down so
much that doing all the extra computing is worth it.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/95784] Failure to optimize usage of __builtin_add_overflow with return statement properly
2020-06-20 10:13 [Bug tree-optimization/95784] New: Failure to optimize usage of __builtin_add_overflow with return statement properly gabravier at gmail dot com
2020-06-20 12:09 ` [Bug target/95784] " pinskia at gcc dot gnu.org
2020-06-20 14:01 ` gabravier at gmail dot com
@ 2020-06-20 14:02 ` gabravier at gmail dot com
2020-06-20 14:25 ` gabravier at gmail dot com
` (6 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: gabravier at gmail dot com @ 2020-06-20 14:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95784
--- Comment #3 from Gabriel Ravier <gabravier at gmail dot com> ---
The most important problem here is really that all the computations for r8d are
dependant on each other, too, the sal+sar+and chain all depend on the previous
operation, the LLVM version seems much better for OOO than the GCC version.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/95784] Failure to optimize usage of __builtin_add_overflow with return statement properly
2020-06-20 10:13 [Bug tree-optimization/95784] New: Failure to optimize usage of __builtin_add_overflow with return statement properly gabravier at gmail dot com
` (2 preceding siblings ...)
2020-06-20 14:02 ` gabravier at gmail dot com
@ 2020-06-20 14:25 ` gabravier at gmail dot com
2020-06-20 14:27 ` gabravier at gmail dot com
` (5 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: gabravier at gmail dot com @ 2020-06-20 14:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95784
--- Comment #4 from Gabriel Ravier <gabravier at gmail dot com> ---
I've tried to benchmark this on my "Intel(R) Core(TM) i5-4310U CPU @ 2.00GHz"
(this is from /proc/cpuinfo) and I'm getting some really weird results, so if
you want to try to assist me in benchmarking (which should determine which
version really is faster) that'd be welcome. I've tried to benchmark this with
the attached test.S source file and got these results :
$ gcc test.S -O3 -ggdb3 -DGCC_VERSION && time ./a.out && gcc test.S -O3 -ggdb3
-DLLVM_VERSION && time ./a.out && gcc test.S -O3 -ggdb3 && time ./a.out
real 0m2.210s # GCC version
user 0m2.200s
sys 0m0.001s
real 0m1.825s # LLVM version is faster as I expected
user 0m1.817s
sys 0m0.001s
real 0m2.080s # Doing nothing is somehow slower than the LLVM version ?
user 0m2.071s
sys 0m0.001s
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/95784] Failure to optimize usage of __builtin_add_overflow with return statement properly
2020-06-20 10:13 [Bug tree-optimization/95784] New: Failure to optimize usage of __builtin_add_overflow with return statement properly gabravier at gmail dot com
` (3 preceding siblings ...)
2020-06-20 14:25 ` gabravier at gmail dot com
@ 2020-06-20 14:27 ` gabravier at gmail dot com
2020-06-20 14:33 ` gabravier at gmail dot com
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: gabravier at gmail dot com @ 2020-06-20 14:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95784
--- Comment #5 from Gabriel Ravier <gabravier at gmail dot com> ---
Created attachment 48760
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48760&action=edit
File for benchmarking this function
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/95784] Failure to optimize usage of __builtin_add_overflow with return statement properly
2020-06-20 10:13 [Bug tree-optimization/95784] New: Failure to optimize usage of __builtin_add_overflow with return statement properly gabravier at gmail dot com
` (4 preceding siblings ...)
2020-06-20 14:27 ` gabravier at gmail dot com
@ 2020-06-20 14:33 ` gabravier at gmail dot com
2020-06-20 15:51 ` jakub at gcc dot gnu.org
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: gabravier at gmail dot com @ 2020-06-20 14:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95784
--- Comment #6 from Gabriel Ravier <gabravier at gmail dot com> ---
Created attachment 48761
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48761&action=edit
File for benchmarking this function but everything is aligned properly.
I've changed the source file slightly, it looks like the LLVM version was
faster than the "do nothing" version because the loop was misaligned. This is
the test results I get with the version with aligned loops (I've also adjusted
the amount of iterations) :
$ gcc test.S -O3 -ggdb3 -DGCC_VERSION && time ./a.out && gcc test.S -O3 -ggdb3
-DLLVM_VERSION && time ./a.out && gcc test.S -O3 -ggdb3 && time ./a.out
real 0m3.130s # GCC version
user 0m3.122s
sys 0m0.001s
real 0m2.599s # LLVM version
user 0m2.593s
sys 0m0.001s
real 0m2.597s # version that does nothing
user 0m2.591s
sys 0m0.000s
I can now note that the LLVM version is now almost as fast as literally doing
nothing, so now it looks really much better than the GCC version, at least to
me.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/95784] Failure to optimize usage of __builtin_add_overflow with return statement properly
2020-06-20 10:13 [Bug tree-optimization/95784] New: Failure to optimize usage of __builtin_add_overflow with return statement properly gabravier at gmail dot com
` (5 preceding siblings ...)
2020-06-20 14:33 ` gabravier at gmail dot com
@ 2020-06-20 15:51 ` jakub at gcc dot gnu.org
2020-06-20 19:05 ` gabravier at gmail dot com
` (2 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-06-20 15:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95784
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
This is not a good benchmark, you aren't actually using the result of the cmov
(e.g. in the caller) in any way, so no wonder you don't care about the terrible
latency it has.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/95784] Failure to optimize usage of __builtin_add_overflow with return statement properly
2020-06-20 10:13 [Bug tree-optimization/95784] New: Failure to optimize usage of __builtin_add_overflow with return statement properly gabravier at gmail dot com
` (6 preceding siblings ...)
2020-06-20 15:51 ` jakub at gcc dot gnu.org
@ 2020-06-20 19:05 ` gabravier at gmail dot com
2020-06-22 8:05 ` rguenth at gcc dot gnu.org
2021-08-03 3:44 ` [Bug rtl-optimization/95784] " pinskia at gcc dot gnu.org
9 siblings, 0 replies; 11+ messages in thread
From: gabravier at gmail dot com @ 2020-06-20 19:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95784
Gabriel Ravier <gabravier at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #48760|0 |1
is obsolete| |
Attachment #48761|0 |1
is obsolete| |
--- Comment #8 from Gabriel Ravier <gabravier at gmail dot com> ---
Created attachment 48763
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48763&action=edit
File for benchmarking this function but everything is aligned properly and the
result is used
I've added another benchmark that uses the result of the operation (stores it
on the stack). Here are the average time taken to execute the program over 10
runs of each version :
GCC's version : 2.418 seconds
LLVM's version : 2.160 seconds
The "do nothing" version : 2.107 seconds
Do tell me if the benchmark is still flawed.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/95784] Failure to optimize usage of __builtin_add_overflow with return statement properly
2020-06-20 10:13 [Bug tree-optimization/95784] New: Failure to optimize usage of __builtin_add_overflow with return statement properly gabravier at gmail dot com
` (7 preceding siblings ...)
2020-06-20 19:05 ` gabravier at gmail dot com
@ 2020-06-22 8:05 ` rguenth at gcc dot gnu.org
2021-08-03 3:44 ` [Bug rtl-optimization/95784] " pinskia at gcc dot gnu.org
9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-06-22 8:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95784
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
I wouldn't be surprised if a version with a branch is faster even with each
of the branches mispredicted. cmovs are weird beasts but since they
are not dependent on each other their latency at least shouldn't add up here
so LLVMs two cmovs shouldnt be worse off than GCCs one cmov. You'd need to
compare against a variant without any cmov.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug rtl-optimization/95784] Failure to optimize usage of __builtin_add_overflow with return statement properly
2020-06-20 10:13 [Bug tree-optimization/95784] New: Failure to optimize usage of __builtin_add_overflow with return statement properly gabravier at gmail dot com
` (8 preceding siblings ...)
2020-06-22 8:05 ` rguenth at gcc dot gnu.org
@ 2021-08-03 3:44 ` pinskia at gcc dot gnu.org
9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-03 3:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95784
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2021-08-03
Status|UNCONFIRMED |NEW
Component|target |rtl-optimization
Ever confirmed|0 |1
--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Even aarch64 has:
cmp w3, w4, uxth
cset w3, ne
cmp w3, 0
csel w2, w2, wzr, eq
csel w0, w0, wzr, ne
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2021-08-03 3:44 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-20 10:13 [Bug tree-optimization/95784] New: Failure to optimize usage of __builtin_add_overflow with return statement properly gabravier at gmail dot com
2020-06-20 12:09 ` [Bug target/95784] " pinskia at gcc dot gnu.org
2020-06-20 14:01 ` gabravier at gmail dot com
2020-06-20 14:02 ` gabravier at gmail dot com
2020-06-20 14:25 ` gabravier at gmail dot com
2020-06-20 14:27 ` gabravier at gmail dot com
2020-06-20 14:33 ` gabravier at gmail dot com
2020-06-20 15:51 ` jakub at gcc dot gnu.org
2020-06-20 19:05 ` gabravier at gmail dot com
2020-06-22 8:05 ` rguenth at gcc dot gnu.org
2021-08-03 3:44 ` [Bug rtl-optimization/95784] " pinskia at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).