public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/36041] New: Speed up builtin_popcountll
@ 2008-04-25 0:35 intvnut at gmail dot com
2008-04-25 0:37 ` [Bug c/36041] " intvnut at gmail dot com
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: intvnut at gmail dot com @ 2008-04-25 0:35 UTC (permalink / raw)
To: gcc-bugs
The current __builtin_popcountll (and likely __builtin_popcount) are fairly
slow as compared to a simple, short C version derived from what can be found in
Knuth's recent publications. The following short function is about 3x as fast
as the __builtin version, which runs counter to the idea that __builtin_XXX
provides access to implementations that are exemplars for a given platform.
unsigned int popcount64(unsigned long long x)
{
x = (x & 0x5555555555555555ULL) + ((x >> 1) & 0x5555555555555555ULL);
x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
x = (x & 0x0F0F0F0F0F0F0F0FULL) + ((x >> 4) & 0x0F0F0F0F0F0F0F0FULL);
return (x * 0x0101010101010101ULL) >> 56;
}
This version has the additional benefit that it omits the lookup table that the
current "builtin" version uses.
I measured the above function vs. __builtin_popcountll with a loop like the
following:
t1 = clock();
for (j = 0; j < 1000000; j++)
for (i = 0; i < 1024; i++)
pt = popcount64(data[i]);
t2 = clock();
printf("popcount64 = %d clocks\n", t2 - t1);
...where data[] is a u64 that's preinitialized.
I'll attach the exact source I used, which also includes two other possible
implementations of popcountll.
--
Summary: Speed up builtin_popcountll
Product: gcc
Version: 4.2.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: intvnut at gmail dot com
GCC build triplet: x86_64-unknown-linux-gnu
GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug c/36041] Speed up builtin_popcountll
2008-04-25 0:35 [Bug c/36041] New: Speed up builtin_popcountll intvnut at gmail dot com
@ 2008-04-25 0:37 ` intvnut at gmail dot com
2008-04-25 0:40 ` [Bug middle-end/36041] " intvnut at gmail dot com
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: intvnut at gmail dot com @ 2008-04-25 0:37 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from intvnut at gmail dot com 2008-04-25 00:36 -------
Created an attachment (id=15529)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15529&action=view)
Popcount sample implementations
These implementations are derived from insights in Henry S. Warren's Hacker's
Delight and Knuth's recent fascicle regarding boolean tips and tricks.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug middle-end/36041] Speed up builtin_popcountll
2008-04-25 0:35 [Bug c/36041] New: Speed up builtin_popcountll intvnut at gmail dot com
2008-04-25 0:37 ` [Bug c/36041] " intvnut at gmail dot com
@ 2008-04-25 0:40 ` intvnut at gmail dot com
2008-04-25 8:45 ` rguenth at gcc dot gnu dot org
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: intvnut at gmail dot com @ 2008-04-25 0:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from intvnut at gmail dot com 2008-04-25 00:39 -------
When run on my Opteron 280 system, the four separate implementations give the
following run times:
popcount64_1 = 13130000 clocks
popcount64_2 = 6480000 clocks
popcount64_3 = 3740000 clocks
popcount64_4 = 5490000 clocks
As one can see, the popcount64_3 implementation is over 3.5x the speed of the
__builtin_popcountll implementation.
--
intvnut at gmail dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |intvnut at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug middle-end/36041] Speed up builtin_popcountll
2008-04-25 0:35 [Bug c/36041] New: Speed up builtin_popcountll intvnut at gmail dot com
2008-04-25 0:37 ` [Bug c/36041] " intvnut at gmail dot com
2008-04-25 0:40 ` [Bug middle-end/36041] " intvnut at gmail dot com
@ 2008-04-25 8:45 ` rguenth at gcc dot gnu dot org
2008-04-25 12:29 ` intvnut at gmail dot com
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-04-25 8:45 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from rguenth at gcc dot gnu dot org 2008-04-25 08:44 -------
The implementation is written so it is also reasonable on targets like the AVR
which only has 8bit registers...
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug middle-end/36041] Speed up builtin_popcountll
2008-04-25 0:35 [Bug c/36041] New: Speed up builtin_popcountll intvnut at gmail dot com
` (2 preceding siblings ...)
2008-04-25 8:45 ` rguenth at gcc dot gnu dot org
@ 2008-04-25 12:29 ` intvnut at gmail dot com
2008-04-25 14:52 ` rguenth at gcc dot gnu dot org
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: intvnut at gmail dot com @ 2008-04-25 12:29 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from intvnut at gmail dot com 2008-04-25 12:29 -------
Is there a mechanism to provide different implementations based on the target
(or in this case, class of target)? The byte count approach certainly is more
appropriate for 8-bit targets, sure, but what about the rest of us? How are
targets handled that might have this as an instruction?
FWIW, I'd be happy to write a 32-bit version to complement the 64-bit version I
provided with my report if there's a way to build a different implementation
based on the class of target being compiled for. That way, embedded 8/16 bit,
32 bit and 64 bit targets can each have a version that's appropriate for that
class of target.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug middle-end/36041] Speed up builtin_popcountll
2008-04-25 0:35 [Bug c/36041] New: Speed up builtin_popcountll intvnut at gmail dot com
` (3 preceding siblings ...)
2008-04-25 12:29 ` intvnut at gmail dot com
@ 2008-04-25 14:52 ` rguenth at gcc dot gnu dot org
2008-04-29 3:42 ` intvnut at gmail dot com
2010-02-21 1:34 ` manu at gcc dot gnu dot org
6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-04-25 14:52 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from rguenth at gcc dot gnu dot org 2008-04-25 14:52 -------
It should be possible to have an alternate implementation in libgcc2.c by means
of just selecting on a proper architecture define or the size of the argument
mode.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug middle-end/36041] Speed up builtin_popcountll
2008-04-25 0:35 [Bug c/36041] New: Speed up builtin_popcountll intvnut at gmail dot com
` (4 preceding siblings ...)
2008-04-25 14:52 ` rguenth at gcc dot gnu dot org
@ 2008-04-29 3:42 ` intvnut at gmail dot com
2010-02-21 1:34 ` manu at gcc dot gnu dot org
6 siblings, 0 replies; 8+ messages in thread
From: intvnut at gmail dot com @ 2008-04-29 3:42 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from intvnut at gmail dot com 2008-04-29 03:42 -------
(In reply to comment #5)
> It should be possible to have an alternate implementation in libgcc2.c by means
> of just selecting on a proper architecture define or the size of the argument
> mode.
>
I see where it would go in libgcc2.c, but I don't know the appropriate
architecture defines to key off of, since I really do know nothing about GCC's
internals.
Since the method I used above is likely to be a strict improvement over the
table driven method on 32-bit and 64-bit targets, is it enough to, say, key off
of "#if W_TYPE_SIZE == 32" and "#if W_TYPE_SIZE == 64"? Is there some
documentation I can read to know how best to propose a patch?
(I'm just a motivated user, not a compiler developer, in case you couldn't
tell.)
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug middle-end/36041] Speed up builtin_popcountll
2008-04-25 0:35 [Bug c/36041] New: Speed up builtin_popcountll intvnut at gmail dot com
` (5 preceding siblings ...)
2008-04-29 3:42 ` intvnut at gmail dot com
@ 2010-02-21 1:34 ` manu at gcc dot gnu dot org
6 siblings, 0 replies; 8+ messages in thread
From: manu at gcc dot gnu dot org @ 2010-02-21 1:34 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from manu at gcc dot gnu dot org 2010-02-21 01:34 -------
Given Richard's comment, I am confirming this.
Joseph,
bugzilla is too busy to keep track of conversations. If you have questions
about gcc development, go to gcc@gcc.gnu.org. See also
http://gcc.gnu.org/contribute.html
If you send a patch to gcc-patches@gcc.gnu.org, you may get more specific
feedback.
--
manu at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |manu at gcc dot gnu dot org
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
Last reconfirmed|0000-00-00 00:00:00 |2010-02-21 01:34:11
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2010-02-21 1:34 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-25 0:35 [Bug c/36041] New: Speed up builtin_popcountll intvnut at gmail dot com
2008-04-25 0:37 ` [Bug c/36041] " intvnut at gmail dot com
2008-04-25 0:40 ` [Bug middle-end/36041] " intvnut at gmail dot com
2008-04-25 8:45 ` rguenth at gcc dot gnu dot org
2008-04-25 12:29 ` intvnut at gmail dot com
2008-04-25 14:52 ` rguenth at gcc dot gnu dot org
2008-04-29 3:42 ` intvnut at gmail dot com
2010-02-21 1:34 ` manu at gcc dot gnu dot org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).