public inbox for gcc-bugs@sourceware.org
* [Bug c/31695]  New: __builtin_ctzll slower than 2*__builtin_ctz
@ 2007-04-25  8:17 joerg dot richter at pdv-fs dot de
  2007-04-25 14:38 ` [Bug c/31695] " rguenth at gcc dot gnu dot org
  0 siblings, 1 reply; 2+ messages in thread
From: joerg dot richter at pdv-fs dot de @ 2007-04-25  8:17 UTC (permalink / raw)
  To: gcc-bugs

int func1( unsigned long long val )
{
  return __builtin_ctzll( val );
}

int func2( unsigned long long val )
{
  unsigned lo = (unsigned)val;
  return lo ? __builtin_ctz(lo) : __builtin_ctz((unsigned)(val>>32)) + 32;
}

func1 is more than twice as slow as func2, but it should be at least as
fast.

__builtin_ctzll is not expanded inline the way __builtin_ctz is.


-- 
           Summary: __builtin_ctzll slower than 2*__builtin_ctz
           Product: gcc
           Version: 4.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: joerg dot richter at pdv-fs dot de
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31695



* [Bug c/31695] __builtin_ctzll slower than 2*__builtin_ctz
  2007-04-25  8:17 [Bug c/31695] New: __builtin_ctzll slower than 2*__builtin_ctz joerg dot richter at pdv-fs dot de
@ 2007-04-25 14:38 ` rguenth at gcc dot gnu dot org
  0 siblings, 0 replies; 2+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2007-04-25 14:38 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2007-04-25 15:38 -------
Because it calls into libgcc, and does so without a tail call:

_Z5func1y:
.LFB2:
        pushl   %ebp
.LCFI2:
        movl    %esp, %ebp
.LCFI3:
        subl    $24, %esp
.LCFI4:
        movl    8(%ebp), %eax
        movl    12(%ebp), %edx
        movl    %eax, (%esp)
        movl    %edx, 4(%esp)
        call    __ctzdi2
        leave
        ret

libgcc implements it as

int
__ctzDI2 (UDWtype x)
{
  const DWunion uu = {.ll = x};
  UWtype word;
  Wtype ret, add;

  if (uu.s.low)
    word = uu.s.low, add = 0;
  else
    word = uu.s.high, add = W_TYPE_SIZE;

  count_trailing_zeros (ret, word);
  return ret + add;
}

(count_trailing_zeros expands to an inline-asm bsfl on x86, which is fine.)

The question remains why we don't tail-call __ctzdi2.  We could also
expand the long long version inline.
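
For illustration only, here is a minimal sketch of what such an inline
expansion would have to compute on a 32-bit target. It mirrors the low/high
word selection in the libgcc routine quoted above and is essentially func2
written as plain C. The name ctz64_via_ctz32 is hypothetical and is not a GCC
or libgcc symbol. The assert is an added assumption that makes the "undefined
for zero" precondition of the builtins explicit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: count trailing zeros of a 64-bit value using only
   the 32-bit builtin, choosing the word the same way libgcc does. */
static inline int ctz64_via_ctz32(uint64_t val)
{
    /* Like __builtin_ctzll, the result is undefined for 0; reject it. */
    assert(val != 0);

    uint32_t lo = (uint32_t)val;          /* low 32 bits  */
    uint32_t hi = (uint32_t)(val >> 32);  /* high 32 bits */

    /* A nonzero low word contains the lowest set bit.  Otherwise the high
       word must be nonzero, and the low word contributes 32 zero bits. */
    return lo ? __builtin_ctz(lo) : __builtin_ctz(hi) + 32;
}
```

On x86 each __builtin_ctz here can become a single bsfl, so the whole
function needs no call into libgcc.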


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
           Keywords|                            |missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2007-04-25 15:38:21
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31695

