public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/66986] New: poor performance of __builtin_isinf on x64
From: neleai at seznam dot cz @ 2015-07-24  7:18 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986

            Bug ID: 66986
           Summary: poor performance of __builtin_isinf on x64
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: neleai at seznam dot cz
  Target Milestone: ---

Created attachment 36046
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36046&action=edit
benchmark.

Hi,

On x64, the floating-point builtins are considerably slower than equivalent
integer code. For easier tracking I have split this into one report per
function. Currently libc uses separate inlines to get better performance than
the builtins, so fixing these would be appreciated.

For isinf, the main problem is how to measure it: depending on the use case,
parts of it will be optimized away. The benchmark tests 12 cases, built from
three surrounding expressions:

1: if (isinf2 (x)) *d+=42;
2: if (isinf2 (x)) abort();
3: *d += isinf2 (x);

In cases 1 and 3 gcc tries to generate branchless code, which does more harm
than good. Case 2 is the common usage and stands for

    if (isinf(x)) foo(); else bar();

Each expression is then placed in a function, and by selectively inlining we
measure the effect of constructing the needed integer constants.

The attached benchmark clearly shows that it is better to use the following
inline instead.

#define EXTRACT_WORDS64(i, d)                                  \
  do {                                                         \
    int64_t i_;                                                \
    asm ("movq %1, %0" : "=rm" (i_) : "x" ((double) (d)));     \
    (i) = i_;                                                  \
  } while (0)

int I2
isinf2 (double dx)
{
  unsigned long x;
  EXTRACT_WORDS64(dx, x);
  if (2 * x == 0xffe0000000000000)
    return 0;
  else
    return (int) (x >> 32);
}

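The three surrounding expressions listed in the report can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the attached benchmark: the function names `cond_add`, `branch_abort` and `sum_result` are hypothetical, and GCC's `__builtin_isinf` stands in for `isinf2`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrappers, one per expression shape from the report.
   __builtin_isinf is used here in place of isinf2. */

/* Case 1: conditional add. */
void cond_add(double x, double *d)
{
    assert(d != NULL);
    if (__builtin_isinf(x))
        *d += 42;
}

/* Case 2: branch to a rarely taken path. */
void branch_abort(double x)
{
    if (__builtin_isinf(x))
        abort();
}

/* Case 3: add the predicate's result to an accumulator. */
void sum_result(double x, double *d)
{
    assert(d != NULL);
    *d += __builtin_isinf(x);
}
```

Whether each wrapper is inlined into the loop that calls it is the second axis the report varies; marking a wrapper with `__attribute__((noinline))` would be one way to control that in GCC.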
* [Bug middle-end/66986] poor performance of __builtin_isinf on x64
From: neleai at seznam dot cz @ 2015-07-24  7:20 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986

--- Comment #1 from Ondrej Bilka <neleai at seznam dot cz> ---
Created attachment 36047
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36047&action=edit
testing script

* [Bug target/66986] poor performance of __builtin_isinf on x64
From: pinskia at gcc dot gnu.org @ 2015-07-24  7:29 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Also, have you tried adding -march=native?

* [Bug target/66986] poor performance of __builtin_isinf on x64
From: pinskia at gcc dot gnu.org @ 2015-07-24  7:29 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |target

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Which processor is this running on? I suspect it really depends on the
processor.

* [Bug target/66986] poor performance of __builtin_isinf on x64
From: neleai at seznam dot cz @ 2015-07-24  7:41 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986

--- Comment #4 from Ondrej Bilka <neleai at seznam dot cz> ---
OK, I added an updated benchmark that adds -mtune=native and tests for core2,
haswell and fx10. The results stay pretty consistent.

don't inline
  conditional add  branched  real 0m0.698s  user 0m0.698s  sys 0m0.000s
                   builtin   real 0m0.775s  user 0m0.776s  sys 0m0.000s
  branch           branched  real 0m0.715s  user 0m0.716s  sys 0m0.000s
                   builtin   real 0m0.774s  user 0m0.774s  sys 0m0.000s
  sum              branched  real 0m0.697s  user 0m0.697s  sys 0m0.000s
                   builtin   real 0m0.774s  user 0m0.775s  sys 0m0.000s
inline outer call
  conditional add  branched  real 0m0.391s  user 0m0.391s  sys 0m0.000s
                   builtin   real 0m0.543s  user 0m0.543s  sys 0m0.000s
  branch           branched  real 0m0.391s  user 0m0.392s  sys 0m0.000s
                   builtin   real 0m0.404s  user 0m0.404s  sys 0m0.000s
  sum              branched  real 0m0.695s  user 0m0.696s  sys 0m0.000s
                   builtin   real 0m0.695s  user 0m0.696s  sys 0m0.000s
inline inner call
  conditional add  branched  real 0m0.466s  user 0m0.466s  sys 0m0.000s
                   builtin   real 0m0.541s  user 0m0.541s  sys 0m0.000s
  branch           branched  real 0m0.389s  user 0m0.388s  sys 0m0.000s
                   builtin   real 0m0.430s  user 0m0.427s  sys 0m0.003s
  sum              branched  real 0m0.700s  user 0m0.701s  sys 0m0.000s
                   builtin   real 0m0.695s  user 0m0.695s  sys 0m0.000s
tight loop
  conditional add  branched  real 0m0.080s  user 0m0.079s  sys 0m0.000s
                   builtin   real 0m0.158s  user 0m0.158s  sys 0m0.000s
  branch           branched  real 0m0.080s  user 0m0.080s  sys 0m0.000s
                   builtin   real 0m0.160s  user 0m0.160s  sys 0m0.000s
  sum              branched  real 0m0.233s  user 0m0.232s  sys 0m0.000s
                   builtin   real 0m0.310s  user 0m0.311s  sys 0m0.000s

fx10

don't inline
  conditional add  branched  real 0m0.803s  user 0m0.804s  sys 0m0.000s
                   builtin   real 0m0.861s  user 0m0.862s  sys 0m0.000s
  branch           branched  real 0m0.650s  user 0m0.650s  sys 0m0.000s
                   builtin   real 0m0.686s  user 0m0.683s  sys 0m0.004s
  sum              branched  real 0m1.300s  user 0m1.299s  sys 0m0.004s
                   builtin   real 0m1.347s  user 0m1.346s  sys 0m0.004s
inline outer call
  conditional add  branched  real 0m0.366s  user 0m0.366s  sys 0m0.000s
                   builtin   real 0m0.539s  user 0m0.539s  sys 0m0.000s
  branch           branched  real 0m0.367s  user 0m0.364s  sys 0m0.004s
                   builtin   real 0m0.416s  user 0m0.413s  sys 0m0.004s
  sum              branched  real 0m1.301s  user 0m1.303s  sys 0m0.000s
                   builtin   real 0m1.307s  user 0m1.308s  sys 0m0.000s
inline inner call
  conditional add  branched  real 0m0.587s  user 0m0.587s  sys 0m0.000s
                   builtin   real 0m0.590s  user 0m0.586s  sys 0m0.004s
  branch           branched  real 0m0.516s  user 0m0.517s  sys 0m0.000s
                   builtin   real 0m0.553s  user 0m0.553s  sys 0m0.001s
  sum              branched  real 0m1.294s  user 0m1.295s  sys 0m0.000s
                   builtin   real 0m1.310s  user 0m1.309s  sys 0m0.004s
tight loop
  conditional add  branched  real 0m0.118s  user 0m0.115s  sys 0m0.004s
                   builtin   real 0m0.409s  user 0m0.409s  sys 0m0.000s
  branch           branched  real 0m0.154s  user 0m0.154s  sys 0m0.000s
                   builtin   real 0m0.262s  user 0m0.263s  sys 0m0.000s
  sum              branched  real 0m0.369s  user 0m0.369s  sys 0m0.000s
                   builtin   real 0m0.408s  user 0m0.409s  sys 0m0.000s

core2

don't inline
  conditional add  branched  real 0m1.573s  user 0m1.573s  sys 0m0.000s
                   builtin   real 0m1.696s  user 0m1.692s  sys 0m0.004s
  branch           branched  real 0m1.455s  user 0m1.455s  sys 0m0.000s
                   builtin   real 0m1.332s  user 0m1.332s  sys 0m0.000s
  sum              branched  real 0m1.332s  user 0m1.328s  sys 0m0.004s
                   builtin   real 0m1.574s  user 0m1.574s  sys 0m0.000s
inline outer call
  conditional add  branched  real 0m0.850s  user 0m0.849s  sys 0m0.000s
                   builtin   real 0m1.211s  user 0m1.210s  sys 0m0.000s
  branch           branched  real 0m0.851s  user 0m0.850s  sys 0m0.000s
                   builtin   real 0m0.970s  user 0m0.966s  sys 0m0.004s
  sum              branched  real 0m1.091s  user 0m1.091s  sys 0m0.000s
                   builtin   real 0m1.108s  user 0m1.107s  sys 0m0.000s
inline inner call
  conditional add  branched  real 0m1.091s  user 0m1.091s  sys 0m0.000s
                   builtin   real 0m1.091s  user 0m1.091s  sys 0m0.000s
  branch           branched  real 0m0.850s  user 0m0.846s  sys 0m0.003s
                   builtin   real 0m0.970s  user 0m0.969s  sys 0m0.000s
  sum              branched  real 0m1.096s  user 0m1.095s  sys 0m0.000s
                   builtin   real 0m1.101s  user 0m1.100s  sys 0m0.000s
tight loop
  conditional add  branched  real 0m0.126s  user 0m0.126s  sys 0m0.000s
                   builtin   real 0m0.369s  user 0m0.368s  sys 0m0.000s
  branch           branched  real 0m0.124s  user 0m0.124s  sys 0m0.000s
                   builtin   real 0m0.367s  user 0m0.366s  sys 0m0.000s
  sum              branched  real 0m0.365s  user 0m0.364s  sys 0m0.000s
                   builtin   real 0m0.627s  user 0m0.626s  sys 0m0.000s

* [Bug target/66986] poor performance of __builtin_isinf on x64
From: neleai at seznam dot cz @ 2015-07-24  7:48 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986

Ondrej Bilka <neleai at seznam dot cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #36047|0                           |1
        is obsolete|                            |

--- Comment #5 from Ondrej Bilka <neleai at seznam dot cz> ---
Created attachment 36048
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36048&action=edit
testing script

* [Bug target/66986] poor performance of __builtin_isinf on x64
From: pinskia at gcc dot gnu.org @ 2021-08-07  1:03 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Your definition of isinf2 is incorrect; a corrected version:

int I2
isinf2 (double dx)
{
  unsigned long x;
  memcpy(&x, &dx, sizeof(dx));
  if (2 * x == 0xffe0000000000000)
    return 0;
  else
    return (int) (x >> 32);
}

With that change, the GCC version that is produced is faster.

isinf2:
.LFB22:
        .cfi_startproc
#APP
# 19 "/app/example.cpp" 1
        movq %xmm0, %rax
# 0 "" 2
#NO_APP
        movabsq $-9007199254740992, %rdx
        leaq    (%rax,%rax), %rcx
        shrq    $32, %rax
        cmpq    %rdx, %rcx
        movl    $0, %edx
        cmove   %edx, %eax
        ret

vs

isinf2:
.LFB22:
        .cfi_startproc
        xorl    %eax, %eax
        andpd   .LC0(%rip), %xmm0
        ucomisd .LC1(%rip), %xmm0
        seta    %al
        ret

For the inlined case (for T1):

.L15:
        movsd   (%rax), %xmm0
        addsd   %xmm4, %xmm0
        andpd   %xmm3, %xmm0
        ucomisd %xmm2, %xmm0
        jbe     .L14
        addsd   %xmm5, %xmm1
.L14:
        addq    $8, %rax
        cmpq    %rax, %rdx
        jne     .L15

vs

.L19:
        movsd   (%rax), %xmm3
        addsd   %xmm0, %xmm3
        movq    %xmm3, %rdx
        leaq    (%rdx,%rdx), %rcx
        cmpq    %rdi, %rcx
        je      .L18
        shrq    $32, %rdx
        testl   %edx, %edx
        je      .L18
        addsd   %xmm2, %xmm1
.L18:
        addq    $8, %rax
        cmpq    %rsi, %rax
        jne     .L19

A double jump.

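The `memcpy` approach from comment #6 can be tried in isolation with a small sketch like the one below. It assumes IEEE 754 binary64 doubles. `isinf_bits` and `agrees_with_builtin` are hypothetical names introduced here, not code from the thread. Unlike the `isinf2` quoted above, whose equality branch returns 0, this helper returns 1 for an infinity so that it can be compared directly with `__builtin_isinf`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(uint64_t) == sizeof(double),
              "this sketch assumes a 64-bit double");

/* Hypothetical helper: 1 for +inf or -inf, 0 otherwise.
   Assumes IEEE 754 binary64 representation. */
int isinf_bits(double dx)
{
    uint64_t x;
    /* memcpy is a well-defined way to read the bit pattern of dx. */
    memcpy(&x, &dx, sizeof x);
    /* Doubling shifts the sign bit out, so both infinities
       (0x7ff0000000000000 and 0xfff0000000000000) become
       0xffe0000000000000. */
    return 2 * x == UINT64_C(0xffe0000000000000);
}

/* Returns 1 when the bit test and GCC's builtin classify v the same way. */
int agrees_with_builtin(double v)
{
    return isinf_bits(v) == (__builtin_isinf(v) != 0);
}
```

The constant matches the `movabsq $-9007199254740992` operand in the listing above, since -9007199254740992 is 0xffe0000000000000 read as a signed 64-bit value.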