public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/66986] New: poor performance of __builtin_isinf on x64
@ 2015-07-24 7:18 neleai at seznam dot cz
2015-07-24 7:20 ` [Bug middle-end/66986] " neleai at seznam dot cz
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: neleai at seznam dot cz @ 2015-07-24 7:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986
Bug ID: 66986
Summary: poor performance of __builtin_isinf on x64
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: neleai at seznam dot cz
Target Milestone: ---
Created attachment 36046
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36046&action=edit
benchmark.
Hi,
On x64, the floating-point builtins are considerably slower than equivalent
integer code. For easier tracking I am splitting this into one report per function.
Currently libc uses separate inlines to get better performance than the builtins, so
fixing these would be appreciated.
As for isinf, the main problem is how to measure it: depending on the use case, parts of
it will be optimized away.
The attached benchmark tests 12 cases, which are combinations of the following
surrounding expressions:
1:
if (isinf2 (x))
*d+=42;
2:
if (isinf2 (x))
abort();
3:
*d += isinf2 (x);
In cases 1 and 3 gcc tries to generate branchless code, which does more
harm than good. Case 2 is the common usage; it stands for
if (isinf (x)) foo (); else bar ();
Each expression is then placed in a function and selectively inlined, so that
we can measure the effect of constructing the needed integer constants.
The attached benchmark clearly shows that it is better to use the following inline
instead.
#define EXTRACT_WORDS64(i, d) \
do { \
int64_t i_; \
asm ("movq %1, %0" : "=rm" (i_) : "x" ((double) (d))); \
(i) = i_; \
} while (0)
int I2
isinf2 (double dx)
{
unsigned long x;
EXTRACT_WORDS64(dx, x);
if (2 * x == 0xffe0000000000000)
return 0;
else
return (int) (x >> 32);
}
* [Bug middle-end/66986] poor performance of __builtin_isinf on x64
From: neleai at seznam dot cz @ 2015-07-24 7:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986
--- Comment #1 from Ondrej Bilka <neleai at seznam dot cz> ---
Created attachment 36047
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36047&action=edit
testing script
* [Bug target/66986] poor performance of __builtin_isinf on x64
From: pinskia at gcc dot gnu.org @ 2015-07-24 7:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|middle-end |target
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Which processor is this on? I suspect it really depends on the processor.
* [Bug target/66986] poor performance of __builtin_isinf on x64
From: pinskia at gcc dot gnu.org @ 2015-07-24 7:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Also, have you tried adding -march=native?
* [Bug target/66986] poor performance of __builtin_isinf on x64
From: neleai at seznam dot cz @ 2015-07-24 7:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986
--- Comment #4 from Ondrej Bilka <neleai at seznam dot cz> ---
OK, I added an updated benchmark that uses -mtune=native, with tests for core2,
haswell and fx10. The results stay pretty consistent.
Times are in seconds.

                                     branched               builtin
mode               test              real   user   sys      real   user   sys
don't inline       conditional add   0.698  0.698  0.000    0.775  0.776  0.000
don't inline       branch            0.715  0.716  0.000    0.774  0.774  0.000
don't inline       sum               0.697  0.697  0.000    0.774  0.775  0.000
inline outer call  conditional add   0.391  0.391  0.000    0.543  0.543  0.000
inline outer call  branch            0.391  0.392  0.000    0.404  0.404  0.000
inline outer call  sum               0.695  0.696  0.000    0.695  0.696  0.000
inline inner call  conditional add   0.466  0.466  0.000    0.541  0.541  0.000
inline inner call  branch            0.389  0.388  0.000    0.430  0.427  0.003
inline inner call  sum               0.700  0.701  0.000    0.695  0.695  0.000
tight loop         conditional add   0.080  0.079  0.000    0.158  0.158  0.000
tight loop         branch            0.080  0.080  0.000    0.160  0.160  0.000
tight loop         sum               0.233  0.232  0.000    0.310  0.311  0.000

fx10
                                     branched               builtin
mode               test              real   user   sys      real   user   sys
don't inline       conditional add   0.803  0.804  0.000    0.861  0.862  0.000
don't inline       branch            0.650  0.650  0.000    0.686  0.683  0.004
don't inline       sum               1.300  1.299  0.004    1.347  1.346  0.004
inline outer call  conditional add   0.366  0.366  0.000    0.539  0.539  0.000
inline outer call  branch            0.367  0.364  0.004    0.416  0.413  0.004
inline outer call  sum               1.301  1.303  0.000    1.307  1.308  0.000
inline inner call  conditional add   0.587  0.587  0.000    0.590  0.586  0.004
inline inner call  branch            0.516  0.517  0.000    0.553  0.553  0.001
inline inner call  sum               1.294  1.295  0.000    1.310  1.309  0.004
tight loop         conditional add   0.118  0.115  0.004    0.409  0.409  0.000
tight loop         branch            0.154  0.154  0.000    0.262  0.263  0.000
tight loop         sum               0.369  0.369  0.000    0.408  0.409  0.000

core2
                                     branched               builtin
mode               test              real   user   sys      real   user   sys
don't inline       conditional add   1.573  1.573  0.000    1.696  1.692  0.004
don't inline       branch            1.455  1.455  0.000    1.332  1.332  0.000
don't inline       sum               1.332  1.328  0.004    1.574  1.574  0.000
inline outer call  conditional add   0.850  0.849  0.000    1.211  1.210  0.000
inline outer call  branch            0.851  0.850  0.000    0.970  0.966  0.004
inline outer call  sum               1.091  1.091  0.000    1.108  1.107  0.000
inline inner call  conditional add   1.091  1.091  0.000    1.091  1.091  0.000
inline inner call  branch            0.850  0.846  0.003    0.970  0.969  0.000
inline inner call  sum               1.096  1.095  0.000    1.101  1.100  0.000
tight loop         conditional add   0.126  0.126  0.000    0.369  0.368  0.000
tight loop         branch            0.124  0.124  0.000    0.367  0.366  0.000
tight loop         sum               0.365  0.364  0.000    0.627  0.626  0.000
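To put the gap in relative terms, the sketch below divides the builtin time by the branched time for the tight-loop conditional-add rows, taking the reported wall-clock numbers at face value. The struct, array and function names are hypothetical, and the first entry is called "unlabeled" because the first block of results carries no CPU name in the output above.

```c
#include <stddef.h>
#include <stdio.h>

/* One "real" timing pair from the tight loop / conditional add rows. */
struct timing {
    const char *cpu;
    double branched_s;
    double builtin_s;
};

static const struct timing tight_loop_cond_add[] = {
    { "unlabeled", 0.080, 0.158 },  /* first block, no CPU label in the output */
    { "fx10",      0.118, 0.409 },
    { "core2",     0.126, 0.369 },
};

/* Ratio of builtin time to branched time; values above 1 mean the builtin is slower. */
double slowdown(const struct timing *t)
{
    return t->builtin_s / t->branched_s;
}

/* Print one ratio per CPU. */
void print_slowdowns(void)
{
    size_t n = sizeof tight_loop_cond_add / sizeof tight_loop_cond_add[0];
    for (size_t i = 0; i < n; i++)
        printf("%-10s %.2fx\n", tight_loop_cond_add[i].cpu,
               slowdown(&tight_loop_cond_add[i]));
}
```

For these three rows the ratio falls between roughly 2x and 3.5x. The non-inlined rows are much closer to 1x, which suggests call overhead dominates there.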
* [Bug target/66986] poor performance of __builtin_isinf on x64
From: neleai at seznam dot cz @ 2015-07-24 7:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986
Ondrej Bilka <neleai at seznam dot cz> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #36047|0 |1
is obsolete| |
--- Comment #5 from Ondrej Bilka <neleai at seznam dot cz> ---
Created attachment 36048
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36048&action=edit
testing script
* [Bug target/66986] poor performance of __builtin_isinf on x64
From: pinskia at gcc dot gnu.org @ 2021-08-07 1:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |INVALID
Status|UNCONFIRMED |RESOLVED
--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Your definition of isinf2 is incorrect. It should be:
int I2
isinf2 (double dx)
{
unsigned long x;
memcpy(&x, &dx, sizeof(dx));
if (2 * x == 0xffe0000000000000)
return 0;
else
return (int) (x >> 32);
}
With that change, the GCC version that is produced is faster.
isinf2:
.LFB22:
.cfi_startproc
#APP
# 19 "/app/example.cpp" 1
movq %xmm0, %rax
# 0 "" 2
#NO_APP
movabsq $-9007199254740992, %rdx
leaq (%rax,%rax), %rcx
shrq $32, %rax
cmpq %rdx, %rcx
movl $0, %edx
cmove %edx, %eax
ret
vs
isinf2:
.LFB22:
.cfi_startproc
xorl %eax, %eax
andpd .LC0(%rip), %xmm0
ucomisd .LC1(%rip), %xmm0
seta %al
ret
For the inlined case (T1):
.L15:
movsd (%rax), %xmm0
addsd %xmm4, %xmm0
andpd %xmm3, %xmm0
ucomisd %xmm2, %xmm0
jbe .L14
addsd %xmm5, %xmm1
.L14:
addq $8, %rax
cmpq %rax, %rdx
jne .L15
vs
.L19:
movsd (%rax), %xmm3
addsd %xmm0, %xmm3
movq %xmm3, %rdx
leaq (%rdx,%rdx), %rcx
cmpq %rdi, %rcx
je .L18
shrq $32, %rdx
testl %edx, %edx
je .L18
addsd %xmm2, %xmm1
.L18:
addq $8, %rax
cmpq %rsi, %rax
jne .L19
The integer version needs a double jump.