public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/36041]  New: Speed up builtin_popcountll
@ 2008-04-25  0:35 intvnut at gmail dot com
  2008-04-25  0:37 ` [Bug c/36041] " intvnut at gmail dot com
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: intvnut at gmail dot com @ 2008-04-25  0:35 UTC (permalink / raw)
  To: gcc-bugs

The current __builtin_popcountll (and likely __builtin_popcount) are fairly
slow as compared to a simple, short C version derived from what can be found in
Knuth's recent publications.  The following short function is about 3x as fast
as the __builtin version, which runs counter to the idea that __builtin_XXX
provides access to implementations that are exemplars for a given platform.

unsigned int popcount64(unsigned long long x)
{
    x = (x & 0x5555555555555555ULL) + ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x & 0x0F0F0F0F0F0F0F0FULL) + ((x >> 4) & 0x0F0F0F0F0F0F0F0FULL);
    return (x * 0x0101010101010101ULL) >> 56;
}

This version has the additional benefit that it omits the lookup table that the
current "builtin" version uses.

I measured the above function vs. __builtin_popcountll with a loop like the
following:

    t1 = clock();
    for (j = 0; j < 1000000; j++)
        for (i = 0; i < 1024; i++)
            pt = popcount64(data[i]);
    t2 = clock();

    printf("popcount64 = %d clocks\n", t2 - t1);

...where data[] is a u64 that's preinitialized.

I'll attach the exact source I used, which also includes two other possible
implementations of popcountll.


-- 
           Summary: Speed up builtin_popcountll
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: intvnut at gmail dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug c/36041] Speed up builtin_popcountll
  2008-04-25  0:35 [Bug c/36041] New: Speed up builtin_popcountll intvnut at gmail dot com
@ 2008-04-25  0:37 ` intvnut at gmail dot com
  2008-04-25  0:40 ` [Bug middle-end/36041] " intvnut at gmail dot com
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: intvnut at gmail dot com @ 2008-04-25  0:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from intvnut at gmail dot com  2008-04-25 00:36 -------
Created an attachment (id=15529)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15529&action=view)
Popcount sample implementations

These implementations are derived from insights in Henry S. Warren's Hacker's
Delight and Knuth's recent fascicle regarding boolean tips and tricks.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug middle-end/36041] Speed up builtin_popcountll
  2008-04-25  0:35 [Bug c/36041] New: Speed up builtin_popcountll intvnut at gmail dot com
  2008-04-25  0:37 ` [Bug c/36041] " intvnut at gmail dot com
@ 2008-04-25  0:40 ` intvnut at gmail dot com
  2008-04-25  8:45 ` rguenth at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: intvnut at gmail dot com @ 2008-04-25  0:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from intvnut at gmail dot com  2008-04-25 00:39 -------
When run on my Opteron 280 system, the four separate implementations give the
following run times:

popcount64_1 = 13130000 clocks
popcount64_2 = 6480000 clocks
popcount64_3 = 3740000 clocks
popcount64_4 = 5490000 clocks

As one can see, the popcount64_3 implementation is over 3.5x the speed of the
__builtin_popcountll implementation.


-- 

intvnut at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |intvnut at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug middle-end/36041] Speed up builtin_popcountll
  2008-04-25  0:35 [Bug c/36041] New: Speed up builtin_popcountll intvnut at gmail dot com
  2008-04-25  0:37 ` [Bug c/36041] " intvnut at gmail dot com
  2008-04-25  0:40 ` [Bug middle-end/36041] " intvnut at gmail dot com
@ 2008-04-25  8:45 ` rguenth at gcc dot gnu dot org
  2008-04-25 12:29 ` intvnut at gmail dot com
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-04-25  8:45 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from rguenth at gcc dot gnu dot org  2008-04-25 08:44 -------
The implementation is written so it is also reasonable on targets like the AVR
which only has 8bit registers...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug middle-end/36041] Speed up builtin_popcountll
  2008-04-25  0:35 [Bug c/36041] New: Speed up builtin_popcountll intvnut at gmail dot com
                   ` (2 preceding siblings ...)
  2008-04-25  8:45 ` rguenth at gcc dot gnu dot org
@ 2008-04-25 12:29 ` intvnut at gmail dot com
  2008-04-25 14:52 ` rguenth at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: intvnut at gmail dot com @ 2008-04-25 12:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from intvnut at gmail dot com  2008-04-25 12:29 -------
Is there a mechanism to provide different implementations based on the target
(or in this case, class of target)?  The byte count approach certainly is more
appropriate for 8-bit targets, sure, but what about the rest of us?  How are
targets handled that might have this as an instruction?

FWIW, I'd be happy to write a 32-bit version to complement the 64-bit version I
provided with my report if there's a way to build a different implementation
based on the class of target being compiled for.  That way, embedded 8/16 bit,
32 bit and 64 bit targets can each have a version that's appropriate for that
class of target.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug middle-end/36041] Speed up builtin_popcountll
  2008-04-25  0:35 [Bug c/36041] New: Speed up builtin_popcountll intvnut at gmail dot com
                   ` (3 preceding siblings ...)
  2008-04-25 12:29 ` intvnut at gmail dot com
@ 2008-04-25 14:52 ` rguenth at gcc dot gnu dot org
  2008-04-29  3:42 ` intvnut at gmail dot com
  2010-02-21  1:34 ` manu at gcc dot gnu dot org
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-04-25 14:52 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from rguenth at gcc dot gnu dot org  2008-04-25 14:52 -------
It should be possible to have an alternate implementation in libgcc2.c by means
of just selecting on a proper architecture define or the size of the argument
mode.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug middle-end/36041] Speed up builtin_popcountll
  2008-04-25  0:35 [Bug c/36041] New: Speed up builtin_popcountll intvnut at gmail dot com
                   ` (4 preceding siblings ...)
  2008-04-25 14:52 ` rguenth at gcc dot gnu dot org
@ 2008-04-29  3:42 ` intvnut at gmail dot com
  2010-02-21  1:34 ` manu at gcc dot gnu dot org
  6 siblings, 0 replies; 8+ messages in thread
From: intvnut at gmail dot com @ 2008-04-29  3:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from intvnut at gmail dot com  2008-04-29 03:42 -------
(In reply to comment #5)
> It should be possible to have an alternate implementation in libgcc2.c by means
> of just selecting on a proper architecture define or the size of the argument
> mode.
> 

I see where it would go in libgcc2.c, but I don't know the appropriate
architecture defines to key off of, since I really do know nothing about GCC's
internals.

Since the method I used above is likely to be a strict improvement over the
table driven method on 32-bit and 64-bit targets, is it enough to, say, key off
of "#if W_TYPE_SIZE == 32" and "#if W_TYPE_SIZE == 64"?  Is there some
documentation I can read to know how best to propose a patch? 

(I'm just a motivated user, not a compiler developer, in case you couldn't
tell.)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug middle-end/36041] Speed up builtin_popcountll
  2008-04-25  0:35 [Bug c/36041] New: Speed up builtin_popcountll intvnut at gmail dot com
                   ` (5 preceding siblings ...)
  2008-04-29  3:42 ` intvnut at gmail dot com
@ 2010-02-21  1:34 ` manu at gcc dot gnu dot org
  6 siblings, 0 replies; 8+ messages in thread
From: manu at gcc dot gnu dot org @ 2010-02-21  1:34 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from manu at gcc dot gnu dot org  2010-02-21 01:34 -------
Given Richard's comment, I am confirming this.

Joseph,

bugzilla is too busy to keep track of conversations. If you have questions
about gcc development, go to gcc@gcc.gnu.org. See also
http://gcc.gnu.org/contribute.html

If you send a patch to gcc-patches@gcc.gnu.org, you may get more specific
feedback.


-- 

manu at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |manu at gcc dot gnu dot org
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2010-02-21 01:34:11
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2010-02-21  1:34 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-25  0:35 [Bug c/36041] New: Speed up builtin_popcountll intvnut at gmail dot com
2008-04-25  0:37 ` [Bug c/36041] " intvnut at gmail dot com
2008-04-25  0:40 ` [Bug middle-end/36041] " intvnut at gmail dot com
2008-04-25  8:45 ` rguenth at gcc dot gnu dot org
2008-04-25 12:29 ` intvnut at gmail dot com
2008-04-25 14:52 ` rguenth at gcc dot gnu dot org
2008-04-29  3:42 ` intvnut at gmail dot com
2010-02-21  1:34 ` manu at gcc dot gnu dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).