public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/66986] New: poor performance of __builtin_isinf on x64
@ 2015-07-24  7:18 neleai at seznam dot cz
  2015-07-24  7:20 ` [Bug middle-end/66986] " neleai at seznam dot cz
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: neleai at seznam dot cz @ 2015-07-24  7:18 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986

            Bug ID: 66986
           Summary: poor performance of __builtin_isinf on x64
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: neleai at seznam dot cz
  Target Milestone: ---

Created attachment 36046
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36046&action=edit
benchmark.

Hi,

On x64, the floating-point builtins are considerably slower than equivalent
integer code. For easier tracking I have split this into one report per function.

Currently libc uses separate inlines to get better performance than the
builtins, so fixing these would be appreciated.

As for isinf, the main problem is how to measure it. Depending on the use case,
parts of it will be optimized away.


The benchmark tests 12 cases, each a combination of an inlining mode and one of
these surrounding expressions:
1:
  if (isinf2 (x))
    *d += 42;
2:
  if (isinf2 (x))
    abort ();
3:
  *d += isinf2 (x);

In cases 1 and 3 gcc tries to generate branchless code, which does more harm
than good. Case 2 is the common usage and stands for
if (isinf (x)) foo (); else bar ();

Each expression is then placed in a function, and by selectively inlining it
we measure the effect of constructing the needed integer constants.

The attached benchmark clearly shows that it is better to use the following
inline instead.

#define EXTRACT_WORDS64(i, d)                                                 \
  do {                                                                        \
    int64_t i_;                                                               \
    asm ("movq %1, %0" : "=rm" (i_) : "x" ((double) (d)));                   \
    (i) = i_;                                                                 \
  } while (0)


int I2
isinf2 (double dx)
{
  unsigned long x;
  EXTRACT_WORDS64(dx, x);
  if (2 * x == 0xffe0000000000000)
    return 0;
  else
    return (int) (x >> 32);
}



* [Bug middle-end/66986] poor performance of __builtin_isinf on x64
From: neleai at seznam dot cz @ 2015-07-24  7:20 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986

--- Comment #1 from Ondrej Bilka <neleai at seznam dot cz> ---
Created attachment 36047
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36047&action=edit
testing script



* [Bug target/66986] poor performance of __builtin_isinf on x64
From: pinskia at gcc dot gnu.org @ 2015-07-24  7:29 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |target

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Which processor was this run on? I suspect it really depends on the
processor.



* [Bug target/66986] poor performance of __builtin_isinf on x64
From: pinskia at gcc dot gnu.org @ 2015-07-24  7:29 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Also, have you tried adding -march=native?



* [Bug target/66986] poor performance of __builtin_isinf on x64
From: neleai at seznam dot cz @ 2015-07-24  7:41 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986

--- Comment #4 from Ondrej Bilka <neleai at seznam dot cz> ---
OK, I added an updated benchmark that uses -mtune=native, with tests on core2,
haswell and fx10. The results stay pretty consistent.

don't inline
conditional add
branched

real    0m0.698s
user    0m0.698s
sys     0m0.000s
builtin

real    0m0.775s
user    0m0.776s
sys     0m0.000s
branch
branched

real    0m0.715s
user    0m0.716s
sys     0m0.000s
builtin

real    0m0.774s
user    0m0.774s
sys     0m0.000s
sum
branched

real    0m0.697s
user    0m0.697s
sys     0m0.000s
builtin

real    0m0.774s
user    0m0.775s
sys     0m0.000s
inline outer call
conditional add
branched

real    0m0.391s
user    0m0.391s
sys     0m0.000s
builtin

real    0m0.543s
user    0m0.543s
sys     0m0.000s
branch
branched

real    0m0.391s
user    0m0.392s
sys     0m0.000s
builtin

real    0m0.404s
user    0m0.404s
sys     0m0.000s
sum
branched

real    0m0.695s
user    0m0.696s
sys     0m0.000s
builtin

real    0m0.695s
user    0m0.696s
sys     0m0.000s
inline inner call
conditional add
branched

real    0m0.466s
user    0m0.466s
sys     0m0.000s
builtin

real    0m0.541s
user    0m0.541s
sys     0m0.000s
branch
branched

real    0m0.389s
user    0m0.388s
sys     0m0.000s
builtin

real    0m0.430s
user    0m0.427s
sys     0m0.003s
sum
branched

real    0m0.700s
user    0m0.701s
sys     0m0.000s
builtin

real    0m0.695s
user    0m0.695s
sys     0m0.000s
tight loop
conditional add
branched

real    0m0.080s
user    0m0.079s
sys     0m0.000s
builtin

real    0m0.158s
user    0m0.158s
sys     0m0.000s
branch
branched

real    0m0.080s
user    0m0.080s
sys     0m0.000s
builtin

real    0m0.160s
user    0m0.160s
sys     0m0.000s
sum
branched

real    0m0.233s
user    0m0.232s
sys     0m0.000s
builtin

real    0m0.310s
user    0m0.311s
sys     0m0.000s

fx10

don't inline
conditional add
branched

real    0m0.803s
user    0m0.804s
sys     0m0.000s
builtin

real    0m0.861s
user    0m0.862s
sys     0m0.000s
branch
branched

real    0m0.650s
user    0m0.650s
sys     0m0.000s
builtin

real    0m0.686s
user    0m0.683s
sys     0m0.004s
sum
branched

real    0m1.300s
user    0m1.299s
sys     0m0.004s
builtin

real    0m1.347s
user    0m1.346s
sys     0m0.004s
inline outer call
conditional add
branched

real    0m0.366s
user    0m0.366s
sys     0m0.000s
builtin

real    0m0.539s
user    0m0.539s
sys     0m0.000s
branch
branched

real    0m0.367s
user    0m0.364s
sys     0m0.004s
builtin

real    0m0.416s
user    0m0.413s
sys     0m0.004s
sum
branched

real    0m1.301s
user    0m1.303s
sys     0m0.000s
builtin

real    0m1.307s
user    0m1.308s
sys     0m0.000s
inline inner call
conditional add
branched

real    0m0.587s
user    0m0.587s
sys     0m0.000s
builtin

real    0m0.590s
user    0m0.586s
sys     0m0.004s
branch
branched

real    0m0.516s
user    0m0.517s
sys     0m0.000s
builtin

real    0m0.553s
user    0m0.553s
sys     0m0.001s
sum
branched

real    0m1.294s
user    0m1.295s
sys     0m0.000s
builtin

real    0m1.310s
user    0m1.309s
sys     0m0.004s
tight loop
conditional add
branched

real    0m0.118s
user    0m0.115s
sys     0m0.004s
builtin

real    0m0.409s
user    0m0.409s
sys     0m0.000s
branch
branched

real    0m0.154s
user    0m0.154s
sys     0m0.000s
builtin

real    0m0.262s
user    0m0.263s
sys     0m0.000s
sum
branched

real    0m0.369s
user    0m0.369s
sys     0m0.000s
builtin

real    0m0.408s
user    0m0.409s
sys     0m0.000s


core2

don't inline
conditional add
branched

real    0m1.573s
user    0m1.573s
sys     0m0.000s
builtin

real    0m1.696s
user    0m1.692s
sys     0m0.004s
branch
branched

real    0m1.455s
user    0m1.455s
sys     0m0.000s
builtin

real    0m1.332s
user    0m1.332s
sys     0m0.000s
sum
branched

real    0m1.332s
user    0m1.328s
sys     0m0.004s
builtin

real    0m1.574s
user    0m1.574s
sys     0m0.000s
inline outer call
conditional add
branched

real    0m0.850s
user    0m0.849s
sys     0m0.000s
builtin

real    0m1.211s
user    0m1.210s
sys     0m0.000s
branch
branched

real    0m0.851s
user    0m0.850s
sys     0m0.000s
builtin

real    0m0.970s
user    0m0.966s
sys     0m0.004s
sum
branched

real    0m1.091s
user    0m1.091s
sys     0m0.000s
builtin

real    0m1.108s
user    0m1.107s
sys     0m0.000s
inline inner call
conditional add
branched

real    0m1.091s
user    0m1.091s
sys     0m0.000s
builtin

real    0m1.091s
user    0m1.091s
sys     0m0.000s
branch
branched

real    0m0.850s
user    0m0.846s
sys     0m0.003s
builtin

real    0m0.970s
user    0m0.969s
sys     0m0.000s
sum
branched

real    0m1.096s
user    0m1.095s
sys     0m0.000s
builtin

real    0m1.101s
user    0m1.100s
sys     0m0.000s
tight loop
conditional add
branched

real    0m0.126s
user    0m0.126s
sys     0m0.000s
builtin

real    0m0.369s
user    0m0.368s
sys     0m0.000s
branch
branched

real    0m0.124s
user    0m0.124s
sys     0m0.000s
builtin

real    0m0.367s
user    0m0.366s
sys     0m0.000s
sum
branched

real    0m0.365s
user    0m0.364s
sys     0m0.000s
builtin

real    0m0.627s
user    0m0.626s
sys     0m0.000s



* [Bug target/66986] poor performance of __builtin_isinf on x64
From: neleai at seznam dot cz @ 2015-07-24  7:48 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986

Ondrej Bilka <neleai at seznam dot cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #36047|0                           |1
        is obsolete|                            |

--- Comment #5 from Ondrej Bilka <neleai at seznam dot cz> ---
Created attachment 36048
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36048&action=edit
testing script



* [Bug target/66986] poor performance of __builtin_isinf on x64
From: pinskia at gcc dot gnu.org @ 2021-08-07  1:03 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Your definition of isinf2 is incorrect:
int I2
isinf2 (double dx)
{
  unsigned long x;
  memcpy(&x, &dx, sizeof(dx));
  if (2 * x == 0xffe0000000000000)
    return 0;
  else
    return (int) (x >> 32);
}

With that change, the version that GCC produces is faster.

isinf2:
.LFB22:
        .cfi_startproc
#APP
# 19 "/app/example.cpp" 1
        movq %xmm0, %rax
# 0 "" 2
#NO_APP
        movabsq $-9007199254740992, %rdx
        leaq    (%rax,%rax), %rcx
        shrq    $32, %rax
        cmpq    %rdx, %rcx
        movl    $0, %edx
        cmove   %edx, %eax
        ret


vs
isinf2:
.LFB22:
        .cfi_startproc
        xorl    %eax, %eax
        andpd   .LC0(%rip), %xmm0
        ucomisd .LC1(%rip), %xmm0
        seta    %al
        ret


For the inlined case (T1):
.L15:
        movsd   (%rax), %xmm0
        addsd   %xmm4, %xmm0
        andpd   %xmm3, %xmm0
        ucomisd %xmm2, %xmm0
        jbe     .L14
        addsd   %xmm5, %xmm1
.L14:
        addq    $8, %rax
        cmpq    %rax, %rdx
        jne     .L15

vs
.L19:
        movsd   (%rax), %xmm3
        addsd   %xmm0, %xmm3
        movq    %xmm3, %rdx
        leaq    (%rdx,%rdx), %rcx
        cmpq    %rdi, %rcx
        je      .L18
        shrq    $32, %rdx
        testl   %edx, %edx
        je      .L18
        addsd   %xmm2, %xmm1
.L18:
        addq    $8, %rax
        cmpq    %rsi, %rax
        jne     .L19

The second loop needs a double jump.
