public inbox for gcc-bugs@sourceware.org
* [Bug c/50168] New: __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64
@ 2011-08-23 16:40 gpiez at web dot de
  2011-08-23 17:00 ` [Bug c/50168] " jakub at gcc dot gnu.org
From: gpiez at web dot de @ 2011-08-23 16:40 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168

             Bug #: 50168
           Summary: __builtin_ctz() and intrinsics __bsr(), __bsf()
                    generate suboptimal code on x86_64
    Classification: Unclassified
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: gpiez@web.de


Testcase:

--------------------
#include <x86intrin.h>

static inline long my_bsfq(long x) __attribute__((__always_inline__));
static inline long my_bsfq(long x) {
    long result;
    asm(" bsfq %1, %0 \n"
        : "=r"(result)
        : "r"(x)
    );
    return result;
}

long c[64];

long f(long i) {
    return c[ __bsfq(i) ];
}

long g(long i) {
    return c[ __builtin_ctzll(i) ];
}

long h(long i) {
    return c[ my_bsfq(i) ];
}
----------------------



When I compile this with 'gcc -O3 -g testcase.c -c -o testcase.o
&& objdump -d testcase.o', I get



----------------------
0000000000000000 <f>:
   0:   48 0f bc ff             bsf    %rdi,%rdi
   4:   48 63 ff                movslq %edi,%rdi
   7:   48 8b 04 fd 00 00 00    mov    0x0(,%rdi,8),%rax
   e:   00 
   f:   c3                      retq   

0000000000000010 <g>:
  10:   48 0f bc ff             bsf    %rdi,%rdi
  14:   48 63 ff                movslq %edi,%rdi
  17:   48 8b 04 fd 00 00 00    mov    0x0(,%rdi,8),%rax
  1e:   00 
  1f:   c3                      retq   

0000000000000020 <h>:
  20:   48 0f bc ff             bsf    %rdi,%rdi
  24:   48 8b 04 fd 00 00 00    mov    0x0(,%rdi,8),%rax
  2b:   00 
  2c:   c3                      retq   
-----------------------



Please note the unneeded 32-to-64-bit sign extension ('movslq %edi,%rdi') that
the compiler inserts in f() and g(). Both should compile to the same code as h().

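I suspect the cause is the builtin's prototype. Its return type 'int' does not
match the "natural" result width on x86_64, which is 64 bit, the same register
size as the input register. A 64-bit bsf writes the full 64-bit register, and for
any nonzero input its result lies in the range 0..63, so sign-extending it from
32 to 64 bits can never change its value.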
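The range argument can be checked empirically. The sketch below is portable C and assumes GCC or Clang for `__builtin_ctzll`. It compares the builtin against a naive loop for every single-bit value and for a run of pseudo-random nonzero values, and it checks that every result lies in 0..63. Zero is excluded because `__builtin_ctzll(0)` is undefined. The helper names `ctz_reference` and `ctz_results_fit_in_index_range` are hypothetical and not part of GCC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: naive count of trailing zero bits.
   The caller must pass x != 0, otherwise the loop never terminates. */
static int ctz_reference(uint64_t x)
{
    int n = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++n;
    }
    return n;
}

/* Hypothetical helper: returns true if __builtin_ctzll matches the naive
   loop for all 64 single-bit inputs and for `count` pseudo-random nonzero
   inputs, and every result fits in 0..63. */
static bool ctz_results_fit_in_index_range(int count)
{
    for (int k = 0; k < 64; ++k) {
        int r = __builtin_ctzll((uint64_t)1 << k);
        if (r != k)
            return false;
    }

    uint64_t x = 0x9E3779B97F4A7C15ULL;  /* arbitrary nonzero seed */
    for (int i = 0; i < count; ++i) {
        /* xorshift64 step: an invertible map, so a nonzero state stays nonzero */
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        int r = __builtin_ctzll(x);
        if (r != ctz_reference(x) || r < 0 || r > 63)
            return false;
    }
    return true;
}
```

Because each defined result is non-negative and at most 63, sign extension and zero extension give the same 64-bit index. That is why h() can use the bsf result directly.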

Replacing the builtin/intrinsic with the self-made asm version gives a 2%
speedup in my chess engine.




Thread overview: 11+ messages
2011-08-23 16:40 [Bug c/50168] New: __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64 gpiez at web dot de
2011-08-23 17:00 ` [Bug c/50168] " jakub at gcc dot gnu.org
2011-08-23 18:07 ` jakub at gcc dot gnu.org
2011-08-23 22:01 ` gpiez at web dot de
2011-08-23 23:53 ` gpiez at web dot de
2011-08-24  8:18 ` [Bug middle-end/50168] " rguenth at gcc dot gnu.org
2011-08-24  9:41 ` jakub at gcc dot gnu.org
2011-08-24  9:47 ` rguenth at gcc dot gnu.org
2011-08-24  9:49 ` jakub at gcc dot gnu.org
2011-08-24  9:53 ` jakub at gcc dot gnu.org
2021-08-08 22:48 ` [Bug middle-end/50168] __builtin_ctz() and intrinsics __bsr(), __bsf() generate extra sign extend " pinskia at gcc dot gnu.org
