public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/36041] Speed up builtin_popcountll
       [not found] <bug-36041-4@http.gcc.gnu.org/bugzilla/>
@ 2012-09-05 10:40 ` jsalavert at gmail dot com
  2012-09-05 15:21 ` paolo.carlini at oracle dot com
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: jsalavert at gmail dot com @ 2012-09-05 10:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

José Salavert Torres <jsalavert at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jsalavert at gmail dot com

--- Comment #8 from José Salavert Torres <jsalavert at gmail dot com> 2012-09-05 10:39:45 UTC ---
Hello, has there been any progress on this issue? The approach from Knuth's
publication would be great for 8 bit registers also.

Also, allowing different behaviour for each architecture would be better.

In the forums, the implementation described here now looks like this; it seems
to use fewer operations:

inline unsigned int bitcount32(uint32_t i) {

  //Parallel binary bit add
  i = i - ((i >> 1) & 0x55555555);
  i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
  return (((i + (i >> 4)) & 0xF0F0F0F) * 0x1010101) >> 24;

}

// The function header was missing from the original post; the name and
// signature below are assumed by analogy with bitcount32.
inline unsigned int bitcount64(uint64_t i) {

  //Parallel binary bit add
  i = i - ((i >> 1) & 0x5555555555555555);
  i = (i & 0x3333333333333333) + ((i >> 2) & 0x3333333333333333);
  return (((i + (i >> 4)) & 0xF0F0F0F0F0F0F0F) * 0x101010101010101) >> 56;

}

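For readers following the thread, here is a minimal annotated sketch of the
32-bit parallel bit count quoted above, next to a naive reference loop. The
names popcount32_swar and popcount32_naive are made up for this illustration;
they do not come from the bug report or from libgcc.

```c
#include <stdint.h>

/* Parallel bit count; hypothetical name, same steps as bitcount32 above. */
static inline unsigned int popcount32_swar(uint32_t i)
{
    /* Each 2-bit field now holds the number of set bits it contained. */
    i = i - ((i >> 1) & 0x55555555u);
    /* Add neighbouring 2-bit counts into 4-bit fields (values 0..4). */
    i = (i & 0x33333333u) + ((i >> 2) & 0x33333333u);
    /* Add neighbouring 4-bit counts into bytes (values 0..8). */
    i = (i + (i >> 4)) & 0x0F0F0F0Fu;
    /* The multiply accumulates all four byte counts in the top byte. */
    return (i * 0x01010101u) >> 24;
}

/* Straightforward reference: look at one bit per iteration. */
static inline unsigned int popcount32_naive(uint32_t i)
{
    unsigned int n = 0;
    while (i != 0) {
        n += i & 1u;
        i >>= 1;
    }
    return n;
}
```

Comparing the two functions over a range of inputs is a cheap way to check a
hand-written variant before benchmarking it.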

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug middle-end/36041] Speed up builtin_popcountll
       [not found] <bug-36041-4@http.gcc.gnu.org/bugzilla/>
  2012-09-05 10:40 ` [Bug middle-end/36041] Speed up builtin_popcountll jsalavert at gmail dot com
@ 2012-09-05 15:21 ` paolo.carlini at oracle dot com
  2012-10-26 15:51 ` gpiez at web dot de
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: paolo.carlini at oracle dot com @ 2012-09-05 15:21 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

Paolo Carlini <paolo.carlini at oracle dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|gcc-bugs at gcc dot gnu.org |glisse at gcc dot gnu.org

--- Comment #9 from Paolo Carlini <paolo.carlini at oracle dot com> 2012-09-05 15:21:12 UTC ---
Maybe Marc is interested.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug middle-end/36041] Speed up builtin_popcountll
       [not found] <bug-36041-4@http.gcc.gnu.org/bugzilla/>
  2012-09-05 10:40 ` [Bug middle-end/36041] Speed up builtin_popcountll jsalavert at gmail dot com
  2012-09-05 15:21 ` paolo.carlini at oracle dot com
@ 2012-10-26 15:51 ` gpiez at web dot de
  2013-06-26 18:52 ` crrodriguez at opensuse dot org
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: gpiez at web dot de @ 2012-10-26 15:51 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

Gunther Piez <gpiez at web dot de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gpiez at web dot de

--- Comment #10 from Gunther Piez <gpiez at web dot de> 2012-10-26 15:51:24 UTC ---
Just noted the exceptional slowness of the provided __builtin_popcountll() even
on ARMv5.

I already used the above parallel bit count algorithm in the case that a native
bit count instruction (like the SSE popcnt or NEON vcnt) is not present, but
native 64 bit registers are available. 

But on a 32 bit architecture like ARM I figured it made sense to just use the
__builtin_popcountll() because the many 64 bit instructions in the algorithm
may be very slow without NEON or similar support on a pure 32 bit architecture.

But "optimizing" my code with some macro magic to make it use the library
popcount made the whole program 25% slower, although only a minor part of it
actually does use the popcount instruction.
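
The selection described in this comment might look roughly like the sketch
below. It is only an illustration under assumptions: the helper name
popcount64_select is hypothetical, __POPCNT__ is the macro GCC defines when
popcnt is enabled on x86, and the UINTPTR_MAX test is just one possible
stand-in for "native 64 bit registers are available".

```c
#include <stdint.h>

/* Hypothetical helper: choose a 64-bit popcount strategy at compile time. */
static inline unsigned int popcount64_select(uint64_t x)
{
#if defined(__POPCNT__)
    /* x86 built with -mpopcnt (or an -march that implies it):
       the builtin expands to a single instruction. */
    return (unsigned int)__builtin_popcountll(x);
#elif UINTPTR_MAX > 0xFFFFFFFFu
    /* 64-bit registers, no native instruction enabled: parallel bit count. */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (unsigned int)((x * 0x0101010101010101ULL) >> 56);
#else
    /* 32-bit target: defer to the builtin, which may become a libgcc call. */
    return (unsigned int)__builtin_popcountll(x);
#endif
}
```

The last branch corresponds to the case reported above as slow on ARMv5, so
it is shown for completeness rather than as a recommendation.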


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug middle-end/36041] Speed up builtin_popcountll
       [not found] <bug-36041-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2012-10-26 15:51 ` gpiez at web dot de
@ 2013-06-26 18:52 ` crrodriguez at opensuse dot org
  2013-06-26 23:28 ` glisse at gcc dot gnu.org
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: crrodriguez at opensuse dot org @ 2013-06-26 18:52 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

--- Comment #11 from Cristian Rodríguez <crrodriguez at opensuse dot org> ---
Not to be annoying, but compiling the test case attached to this bug report
with clang 3.3 produces code in which

inline u32 popcount64_1(u64 x) { return __builtin_popcountll(x); }


is over 3 times faster than with GCC 4.8.1 on x86_64.

I think GCC could "just" generate IFUNCs for generic targets: on x86_64, one
function with attribute target popcnt and the other a call to libgcc, which
would at least match the clang performance.
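
As a rough user-level illustration of the IFUNC idea, the sketch below picks
between a popcnt build and a generic build of the same function when the
symbol is resolved. It assumes GCC (or a compatible compiler) on x86_64
GNU/Linux with glibc; on anything else it falls back to the generic code. All
function names are hypothetical and this is not what GCC emits.

```c
#include <stdint.h>

/* Generic parallel bit count, usable on any target. */
static unsigned int popcount64_generic(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (unsigned int)((x * 0x0101010101010101ULL) >> 56);
}

#if defined(__GNUC__) && defined(__x86_64__) && defined(__linux__) && defined(__GLIBC__)

/* Compiled with popcnt enabled for this one function only. */
__attribute__((target("popcnt")))
static unsigned int popcount64_hw(uint64_t x)
{
    return (unsigned int)__builtin_popcountll(x);
}

typedef unsigned int popcount64_fn(uint64_t);

/* Resolver: run by the dynamic loader to pick an implementation. */
static popcount64_fn *resolve_popcount64(void)
{
    __builtin_cpu_init();  /* resolvers can run before constructors */
    return __builtin_cpu_supports("popcnt") ? popcount64_hw
                                            : popcount64_generic;
}

unsigned int popcount64(uint64_t x)
    __attribute__((ifunc("resolve_popcount64")));

#else

/* No ifunc support assumed: always use the generic version. */
unsigned int popcount64(uint64_t x) { return popcount64_generic(x); }

#endif
```

Callers simply call popcount64(x); every call still goes through an indirect
jump, so this does not reach the speed of an inlined instruction.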


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug middle-end/36041] Speed up builtin_popcountll
       [not found] <bug-36041-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2013-06-26 18:52 ` crrodriguez at opensuse dot org
@ 2013-06-26 23:28 ` glisse at gcc dot gnu.org
  2013-06-26 23:31 ` pinskia at gcc dot gnu.org
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-06-26 23:28 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

--- Comment #12 from Marc Glisse <glisse at gcc dot gnu.org> ---
Created attachment 30381
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30381&action=edit
IFUNC proof of concept patch

Sadly, libgcc is compiled with gcc and not g++ so we can't use the recent
multiversioning support with the target attribute and we have to manually set
up ifunc. Note that the si/di difference is not a typo, just a wart in the way
libgcc is configured.

This is just a proof of concept, we'd want to replace also __popcountti2 at
least. And most importantly we need to restrict the inclusion of t-ifunc to
platforms where ifunc is supported (move it elsewhere in config.host, maybe
even include the content of t-ifunc in an existing t-*).

There are probably better ways to organize this, putting the generic
implementation in libgcc2.c protected by suitable macros (which ones?) so it
benefits also darwin/cygwin (no ifunc) and non-x86 platforms.

I didn't check the generic code, I just pasted it from one of the comments.

If you want this to happen, please work out a patch and post it to gcc-patches
(you can start from this one or not), don't wait for others to write one, I
won't have more time to spend on this. Don't be too afraid to test the wrong
macro, the reviewer will tell you if that is the case.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug middle-end/36041] Speed up builtin_popcountll
       [not found] <bug-36041-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2013-06-26 23:28 ` glisse at gcc dot gnu.org
@ 2013-06-26 23:31 ` pinskia at gcc dot gnu.org
  2013-06-26 23:38 ` crrodriguez at opensuse dot org
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: pinskia at gcc dot gnu.org @ 2013-06-26 23:31 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

--- Comment #13 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Marc Glisse from comment #12)
> Created attachment 30381 [details]
> IFUNC proof of concept patch

I think it is a bad idea to use ifunc for such a function, as in most cases
it is linked statically.  Why can't you compile your code
with -march=native for the places where you know you are going to compile and
run directly on the same machine?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug middle-end/36041] Speed up builtin_popcountll
       [not found] <bug-36041-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2013-06-26 23:31 ` pinskia at gcc dot gnu.org
@ 2013-06-26 23:38 ` crrodriguez at opensuse dot org
  2013-06-26 23:49 ` glisse at gcc dot gnu.org
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: crrodriguez at opensuse dot org @ 2013-06-26 23:38 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

--- Comment #14 from Cristian Rodríguez <crrodriguez at opensuse dot org> ---
(In reply to Andrew Pinski from comment #13)
> (In reply to Marc Glisse from comment #12)
> Why can't you compile
> your code with -march=native for the places where you know you are going to
> compile and run directly on the same machine?

Because it will be useless to general purpose distributions of course.


* [Bug middle-end/36041] Speed up builtin_popcountll
From: pinskia at gcc dot gnu.org @ 2013-06-26 23:41 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

--- Comment #15 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Cristian Rodríguez from comment #14)
> Because it will be useless to general purpose distributions of course.

Then ifunc for this short of a function is not useful either.  Then maybe we
should move over to use the non table version for GCC in general.


* [Bug middle-end/36041] Speed up builtin_popcountll
From: glisse at gcc dot gnu.org @ 2013-06-26 23:43 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

--- Comment #16 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #13)
> I think it is a bad idea to use ifunc for such a function, as in most cases
> it is linked statically.

g++ links to it dynamically by default.
Maybe we only want ifunc for libgcc_s and not libgcc, I haven't thought about
it.

> Why can't you compile your code with -march=native

That's what I do, but Cristian is probably compiling generic packages for a
distribution

> for the places where you know you are going to
> compile and run directly on the same machine?

so he is not in this situation.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug middle-end/36041] Speed up builtin_popcountll
       [not found] <bug-36041-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2013-06-26 23:38 ` crrodriguez at opensuse dot org
@ 2013-06-26 23:49 ` glisse at gcc dot gnu.org
  2013-06-27  5:34 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-06-26 23:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

--- Comment #17 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #15)
> (In reply to Cristian Rodríguez from comment #14)
> > Because it will be useless to general purpose distributions of course.
> 
> Then ifunc for this short of a function is not useful either.  Then maybe we
> should move over to use the non table version for GCC in general.

Moving from the table to the non-table version speeds things up by a factor of
2. The ifunc version gains another factor of 2. I wouldn't call that useless.
(Obviously, it remains 4 or 5 times slower than an inlined version.)


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug middle-end/36041] Speed up builtin_popcountll
       [not found] <bug-36041-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2013-06-26 23:49 ` glisse at gcc dot gnu.org
@ 2013-06-27  5:34 ` jakub at gcc dot gnu.org
  2013-06-27  6:14 ` crrodriguez at opensuse dot org
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-06-27  5:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #18 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I think it is a bad idea to introduce the IFUNC into libgcc_s, because then
while you speed up the few users of this builtin, you slow down all users of
libgcc_s (pretty much all C++ programs and lots of C programs), because they
will need to resolve the ifunc.  For a very heavily used builtin perhaps, but
for a rarely used one it just isn't a good idea.  Users can just use
multi-versioning themselves and use __builtin_popcount* in the multi-versioned
function.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug middle-end/36041] Speed up builtin_popcountll
       [not found] <bug-36041-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2013-06-27  5:34 ` jakub at gcc dot gnu.org
@ 2013-06-27  6:14 ` crrodriguez at opensuse dot org
  2013-06-27  7:13 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: crrodriguez at opensuse dot org @ 2013-06-27  6:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

--- Comment #19 from Cristian Rodríguez <crrodriguez at opensuse dot org> ---
(In reply to Jakub Jelinek from comment #18)
> I think it is a bad idea to introduce the IFUNC into libgcc_s, because then
> while you speed up the few users of this builtin, you slow down all users of
> libgcc_s (pretty much all C++ programs and lots of C programs), because they
> will need to resolve the ifunc.  For a very heavily used builtin perhaps,
> but for a rarely used one it just isn't a good idea.  Users can just use
> multi-versioning themselves and use __builtin_popcount* in the
> multi-versioned function.

Hold on. Apparently I used ambiguous language in my comment. Adding ifuncs to
libgcc* was not my real suggestion; the idea was to EMIT such IFUNCs in the
resulting final user code when the target environment allows it: one generic,
one hardware/arch specific.


* [Bug middle-end/36041] Speed up builtin_popcountll
From: glisse at gcc dot gnu.org @ 2013-06-27  6:43 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

--- Comment #20 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #18)
> I think it is a bad idea to introduce the IFUNC into libgcc_s, because then
> while you speed up the few users of this builtin, you slow down all users of
> libgcc_s (pretty much all C++ programs and lots of C programs), because they
> will need to resolve the ifunc.

I assume it is only those that use the builtin at least once, no? At least
LD_DEBUG seems to say so. I have no idea how heavy the ifunc resolution is, so
ok. We are back to only considering the non-table version... (By the way,
shouldn't these builtins act like C99 inline functions, so we can sometimes
inline them at -O3 (it could also enable vectorization)? Or maybe they already
do and it's just that I didn't test hard enough)


(In reply to Cristian Rodríguez from comment #19)
> Hold on..Apparently I used ambiguous language in my comment.. adding ifuncs
> to libgcc* was not my real suggestion, but to EMIT such IFUNC s in the
> resulting final user code when the target environment allows it. One
> generic, one hardware/arch specific.

Not sure if that's much better. Ideally we'd clone the hot loop that uses it
and propagate the versioning to that, not just the instruction, but I don't
think we have any code for that. Although if gcc saw the full code:
if(__builtin_cpu_supports("popcnt"))_mm_popcnt_u64(x);else{call lib}, it might
already manage to clone the loop.
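
A hand-written version of that idea, versioning the whole loop instead of the
single instruction, could look like the sketch below. It assumes GCC or a
compatible compiler; the function names are hypothetical, and whether GCC
would perform this cloning on its own is exactly the open question above.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic loop: parallel bit count, no special instructions required. */
static uint64_t count_bits_generic(const uint64_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t k = 0; k < n; ++k) {
        uint64_t x = v[k];
        x = x - ((x >> 1) & 0x5555555555555555ULL);
        x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
        x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
        total += (x * 0x0101010101010101ULL) >> 56;
    }
    return total;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same loop, compiled with popcnt enabled so the builtin can be inlined. */
__attribute__((target("popcnt")))
static uint64_t count_bits_popcnt(const uint64_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t k = 0; k < n; ++k)
        total += (uint64_t)__builtin_popcountll(v[k]);
    return total;
}
#endif

/* The CPU check happens once per call to the loop, not once per element. */
uint64_t count_bits(const uint64_t *v, size_t n)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("popcnt"))
        return count_bits_popcnt(v, n);
#endif
    return count_bits_generic(v, n);
}
```

Hoisting the feature test out of the loop keeps the per-element cost at one
instruction on CPUs that have popcnt, while other CPUs run the generic loop.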


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug middle-end/36041] Speed up builtin_popcountll
       [not found] <bug-36041-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2013-06-27  6:14 ` crrodriguez at opensuse dot org
@ 2013-06-27  7:13 ` jakub at gcc dot gnu.org
  2013-06-28 12:50 ` glisse at gcc dot gnu.org
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-06-27  7:13 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

--- Comment #21 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 30382
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30382&action=edit
gcc49-pr36041.patch

Untested libgcc2.c implementation (no hw support).  HW support is IMHO best
dealt with on the compiler side; a PLT call alone is already fairly
expensive, but it depends on whether __builtin_popcount* is used in a hot loop
or in cold code (in the latter case it really doesn't matter).

I've looked at code generated for:
int
foo (unsigned long long i)
{
  i = i - ((i >> 1) & 0x5555555555555555);
  i = (i & 0x3333333333333333) + ((i >> 2) & 0x3333333333333333);
  i = (i + (i >> 4)) & 0xF0F0F0F0F0F0F0F;
  return (i * 0x101010101010101) >> 56;
}

int
bar (unsigned long long i)
{
  unsigned int i1 = i, i2 = i >> 32;
  i1 = i1 - ((i1 >> 1) & 0x55555555);
  i2 = i2 - ((i2 >> 1) & 0x55555555);
  i1 = (i1 & 0x33333333) + ((i1 >> 2) & 0x33333333);
  i2 = (i2 & 0x33333333) + ((i2 >> 2) & 0x33333333);
  i1 = (i1 + (i1 >> 4)) & 0xF0F0F0F;
  i2 = (i2 + (i2 >> 4)) & 0xF0F0F0F;
  return ((i1 + i2) * 0x1010101) >> 24;
}

int
baz (unsigned long long i)
{
  i = i - ((i >> 1) & 0x5555555555555555);
  i = (i & 0x3333333333333333) + ((i >> 2) & 0x3333333333333333);
  i = (i + (i >> 4)) & 0xF0F0F0F0F0F0F0F;
  return ((((unsigned int) i) + (unsigned int) (i >> 32)) * 0x1010101) >> 24;
}
on gcc -O2 -m32 and picked the second variant as shortest for the UDWtype
implementation, I guess that is likely the case on most targets.  Note that the
patch still doesn't attempt to figure out if UWtype multiplication is expensive
or not, perhaps a useful test would be whether we emit __umul<mode>3 for that
mode inside of libgcc - we'd need to define some macro and if multiplication is
expensive use the shifts + additions alternative instead.
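
For illustration, the "shifts + additions" alternative mentioned above could
replace the final multiply of the second variant as in the sketch below. This
is an assumption about what such an alternative might look like, not code
from the attached patch, and the name popcount64_shiftadd is hypothetical.

```c
#include <stdint.h>

/* Like bar() above, but the final multiply is replaced by shifts and adds. */
static inline int popcount64_shiftadd(uint64_t i)
{
    uint32_t i1 = (uint32_t)i, i2 = (uint32_t)(i >> 32);
    i1 = i1 - ((i1 >> 1) & 0x55555555u);
    i2 = i2 - ((i2 >> 1) & 0x55555555u);
    i1 = (i1 & 0x33333333u) + ((i1 >> 2) & 0x33333333u);
    i2 = (i2 & 0x33333333u) + ((i2 >> 2) & 0x33333333u);
    i1 = (i1 + (i1 >> 4)) & 0x0F0F0F0Fu;
    i2 = (i2 + (i2 >> 4)) & 0x0F0F0F0Fu;
    i1 += i2;        /* each byte now holds a count in 0..16 */
    i1 += i1 >> 8;   /* lowest byte: sum of the two low bytes, 0..32 */
    i1 += i1 >> 16;  /* lowest byte: sum of all four bytes, 0..64 */
    return (int)(i1 & 0x7Fu);
}
```

No byte sum exceeds 64 here, so nothing carries into a neighbouring byte and
the low seven bits of the result hold the full count.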


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug middle-end/36041] Speed up builtin_popcountll
       [not found] <bug-36041-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2013-06-27  7:13 ` jakub at gcc dot gnu.org
@ 2013-06-28 12:50 ` glisse at gcc dot gnu.org
  2013-06-28 13:01 ` jakub at gcc dot gnu.org
  2021-08-16 23:28 ` pinskia at gcc dot gnu.org
  13 siblings, 0 replies; 20+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-06-28 12:50 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

--- Comment #23 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #18)
> I think it is a bad idea to introduce the IFUNC into libgcc_s, because then
> while you speed up the few users of this builtin, you slow down all users of
> libgcc_s (pretty much all C++ programs and lots of C programs), because they
> will need to resolve the ifunc.

Do you have a pointer to some text that explains this cost? I tried to read a
bit, but it seems to me that unless you use LD_BIND_NOW, the ifunc won't be
resolved if it isn't called. And if it is called, the main cost compared to a
normal relocation should be the feature test, which is indeed a bit long but
not that bad, especially since calling popcount once greatly increases the
probability that it will be called again.

Not that it matters that much, interested distributions could always ship an
sse4 libgcc_s where they replace the implementation of popcount, and best
performance requires changing things before there is even a call to libgcc, but
I'd like to understand when ifunc should be used or avoided (quite a few glibc
functions seem to use ifunc, though they are much more used than popcount and
most have a non-constant complexity).
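
To make the cost model above concrete, here is a rough user-level analogue of
lazy resolution. It is NOT an ifunc and not how libgcc_s or the dynamic linker
work internally; it is a plain function pointer that starts out pointing at a
resolver, runs the feature test on the first call only, and then calls the
chosen implementation directly. All names (popcount64_generic, popcount64_hw,
popcount64_resolve, popcount64_ptr, my_popcount64) are invented for this
sketch, and the x86 feature test assumes GCC or a compatible compiler.

```c
#include <stdint.h>

/* Portable fallback: the multiply-based bit trick from this thread.  */
static int
popcount64_generic (uint64_t i)
{
  i = i - ((i >> 1) & 0x5555555555555555ull);
  i = (i & 0x3333333333333333ull) + ((i >> 2) & 0x3333333333333333ull);
  i = (i + (i >> 4)) & 0x0F0F0F0F0F0F0F0Full;
  return (int) ((i * 0x0101010101010101ull) >> 56);
}

#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
# define HAVE_X86_FEATURE_TEST 1
/* Compiled with popcnt enabled for this one function only.  */
__attribute__ ((target ("popcnt")))
static int
popcount64_hw (uint64_t i)
{
  return __builtin_popcountll (i);
}
#endif

static int popcount64_resolve (uint64_t i);

/* Starts out pointing at the resolver, loosely like an unresolved slot.  */
static int (*popcount64_ptr) (uint64_t) = popcount64_resolve;

/* Runs once, on the first call: pays for the feature test, records the
   chosen implementation, then forwards the call to it.  */
static int
popcount64_resolve (uint64_t i)
{
  int (*impl) (uint64_t) = popcount64_generic;
#ifdef HAVE_X86_FEATURE_TEST
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("popcnt"))
    impl = popcount64_hw;
#endif
  popcount64_ptr = impl;
  return impl (i);
}

/* Entry point: one indirect call; no feature test after the first call.  */
static int
my_popcount64 (uint64_t i)
{
  return popcount64_ptr (i);
}
```

If my_popcount64 is never called, the feature test never runs, which is the
lazy-binding case described above. Eager binding would correspond to calling
the resolver at startup whether or not popcount is ever used.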



* [Bug middle-end/36041] Speed up builtin_popcountll
       [not found] <bug-36041-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2013-06-28 12:50 ` glisse at gcc dot gnu.org
@ 2013-06-28 13:01 ` jakub at gcc dot gnu.org
  2021-08-16 23:28 ` pinskia at gcc dot gnu.org
  13 siblings, 0 replies; 20+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-06-28 13:01 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

--- Comment #24 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Marc Glisse from comment #23)
> (In reply to Jakub Jelinek from comment #18)
> > I think it is a bad idea to introduce the IFUNC into libgcc_s, because then
> > while you speed up the few users of this builtin, you slow down all users of
> > libgcc_s (pretty much all C++ programs and lots of C programs), because they
> > will need to resolve the ifunc.
> 
> Do you have a pointer to some text that explains this cost? I tried to read
> a bit, but it seems to me that unless you use LD_BIND_NOW, the ifunc won't
> be resolved if it isn't called. And if it is called, the main cost compared
> to a normal relocation should be the feature test, which is indeed a bit
> long but not that bad, especially since calling popcount once greatly
> increases the probability that it will be called again.
> 
> Not that it matters that much, interested distributions could always ship an
> sse4 libgcc_s where they replace the implementation of popcount, and best
> performance requires changing things before there is even a call to libgcc,
> but I'd like to understand when ifunc should be used or avoided (quite a few
> glibc functions seem to use ifunc, though they are much more used than
> popcount and most have a non-constant complexity).

If you use prelink, then everything successfully prelinked is effectively
LD_BIND_NOW too (without runtime cost for most relocations, but not for ifunc),
and every IFUNC requires a .gnu.conflict ifunc relocation that needs to be
resolved right away.



* [Bug middle-end/36041] Speed up builtin_popcountll
       [not found] <bug-36041-4@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2013-06-28 13:01 ` jakub at gcc dot gnu.org
@ 2021-08-16 23:28 ` pinskia at gcc dot gnu.org
  13 siblings, 0 replies; 20+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-16 23:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #25 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I still wonder if we should inline popcount if the target does not support it. 
Same with the vectorized version of it.


* [Bug middle-end/36041] Speed up builtin_popcountll
  2008-04-25  0:35 [Bug c/36041] New: " intvnut at gmail dot com
                   ` (4 preceding siblings ...)
  2008-04-29  3:42 ` intvnut at gmail dot com
@ 2010-02-21  1:34 ` manu at gcc dot gnu dot org
  5 siblings, 0 replies; 20+ messages in thread
From: manu at gcc dot gnu dot org @ 2010-02-21  1:34 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from manu at gcc dot gnu dot org  2010-02-21 01:34 -------
Given Richard's comment, I am confirming this.

Joseph,

bugzilla is too busy to keep track of conversations. If you have questions
about gcc development, go to gcc@gcc.gnu.org. See also
http://gcc.gnu.org/contribute.html

If you send a patch to gcc-patches@gcc.gnu.org, you may get more specific
feedback.


-- 

manu at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |manu at gcc dot gnu dot org
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2010-02-21 01:34:11
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041



* [Bug middle-end/36041] Speed up builtin_popcountll
  2008-04-25  0:35 [Bug c/36041] New: " intvnut at gmail dot com
                   ` (3 preceding siblings ...)
  2008-04-25 14:52 ` rguenth at gcc dot gnu dot org
@ 2008-04-29  3:42 ` intvnut at gmail dot com
  2010-02-21  1:34 ` manu at gcc dot gnu dot org
  5 siblings, 0 replies; 20+ messages in thread
From: intvnut at gmail dot com @ 2008-04-29  3:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from intvnut at gmail dot com  2008-04-29 03:42 -------
(In reply to comment #5)
> It should be possible to have an alternate implementation in libgcc2.c by means
> of just selecting on a proper architecture define or the size of the argument
> mode.
> 

I see where it would go in libgcc2.c, but I don't know the appropriate
architecture defines to key off of, since I really do know nothing about GCC's
internals.

Since the method I used above is likely to be a strict improvement over the
table driven method on 32-bit and 64-bit targets, is it enough to, say, key off
of "#if W_TYPE_SIZE == 32" and "#if W_TYPE_SIZE == 64"?  Is there some
documentation I can read to know how best to propose a patch? 

(I'm just a motivated user, not a compiler developer, in case you couldn't
tell.)
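
As a rough illustration of keying off the word size, a selection along those
lines might be sketched as follows. Outside of libgcc there is no W_TYPE_SIZE,
so the sketch derives a stand-in from ULONG_MAX; that derivation, the word_t
typedef and the name popcount_word are assumptions made for this example, not
anything taken from libgcc2.c.

```c
#include <limits.h>
#include <stdint.h>

/* Stand-in for libgcc's W_TYPE_SIZE (assumption: use the width of
   unsigned long when the macro is not already defined).  */
#ifndef W_TYPE_SIZE
# if ULONG_MAX == 0xFFFFFFFFul
#  define W_TYPE_SIZE 32
# else
#  define W_TYPE_SIZE 64
# endif
#endif

#if W_TYPE_SIZE == 64
typedef uint64_t word_t;

/* 64-bit parallel bit add with a multiply for the final sum.  */
static int
popcount_word (word_t i)
{
  i = i - ((i >> 1) & 0x5555555555555555ull);
  i = (i & 0x3333333333333333ull) + ((i >> 2) & 0x3333333333333333ull);
  i = (i + (i >> 4)) & 0x0F0F0F0F0F0F0F0Full;
  return (int) ((i * 0x0101010101010101ull) >> 56);
}
#elif W_TYPE_SIZE == 32
typedef uint32_t word_t;

/* 32-bit version of the same technique.  */
static int
popcount_word (word_t i)
{
  i = i - ((i >> 1) & 0x55555555u);
  i = (i & 0x33333333u) + ((i >> 2) & 0x33333333u);
  i = (i + (i >> 4)) & 0x0F0F0F0Fu;
  return (int) ((i * 0x01010101u) >> 24);
}
#else
# error "other word sizes would keep the existing table-driven code"
#endif
```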


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041



* [Bug middle-end/36041] Speed up builtin_popcountll
  2008-04-25  0:35 [Bug c/36041] New: " intvnut at gmail dot com
                   ` (2 preceding siblings ...)
  2008-04-25 12:29 ` intvnut at gmail dot com
@ 2008-04-25 14:52 ` rguenth at gcc dot gnu dot org
  2008-04-29  3:42 ` intvnut at gmail dot com
  2010-02-21  1:34 ` manu at gcc dot gnu dot org
  5 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-04-25 14:52 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from rguenth at gcc dot gnu dot org  2008-04-25 14:52 -------
It should be possible to have an alternate implementation in libgcc2.c by means
of just selecting on a proper architecture define or the size of the argument
mode.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041



* [Bug middle-end/36041] Speed up builtin_popcountll
  2008-04-25  0:35 [Bug c/36041] New: " intvnut at gmail dot com
  2008-04-25  0:40 ` [Bug middle-end/36041] " intvnut at gmail dot com
  2008-04-25  8:45 ` rguenth at gcc dot gnu dot org
@ 2008-04-25 12:29 ` intvnut at gmail dot com
  2008-04-25 14:52 ` rguenth at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: intvnut at gmail dot com @ 2008-04-25 12:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from intvnut at gmail dot com  2008-04-25 12:29 -------
Is there a mechanism to provide different implementations based on the target
(or in this case, class of target)?  The byte count approach certainly is more
appropriate for 8-bit targets, sure, but what about the rest of us?  How are
targets handled that might have this as an instruction?

FWIW, I'd be happy to write a 32-bit version to complement the 64-bit version I
provided with my report if there's a way to build a different implementation
based on the class of target being compiled for.  That way, embedded 8/16 bit,
32 bit and 64 bit targets can each have a version that's appropriate for that
class of target.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041



* [Bug middle-end/36041] Speed up builtin_popcountll
  2008-04-25  0:35 [Bug c/36041] New: " intvnut at gmail dot com
  2008-04-25  0:40 ` [Bug middle-end/36041] " intvnut at gmail dot com
@ 2008-04-25  8:45 ` rguenth at gcc dot gnu dot org
  2008-04-25 12:29 ` intvnut at gmail dot com
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-04-25  8:45 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from rguenth at gcc dot gnu dot org  2008-04-25 08:44 -------
The implementation is written so it is also reasonable on targets like the AVR,
which only has 8-bit registers...
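
For readers without the libgcc source at hand, the byte-at-a-time, table-driven
idea can be sketched as below. This is an illustration of the general
technique, not a copy of libgcc's code; the names popcount_tab and
popcount64_table and the macro-generated table are assumptions for this
example.

```c
#include <stdint.h>

/* Build a 256-entry table of per-byte bit counts at compile time.
   B2(n) lists the counts for the four values of two extra low bits.  */
#define B2(n) n, n + 1, n + 1, n + 2
#define B4(n) B2 (n), B2 (n + 1), B2 (n + 1), B2 (n + 2)
#define B6(n) B4 (n), B4 (n + 1), B4 (n + 1), B4 (n + 2)

static const unsigned char popcount_tab[256] =
{
  B6 (0), B6 (1), B6 (1), B6 (2)
};

/* Hypothetical helper: look up each of the eight bytes and add the
   results.  Only byte-sized extracts and small additions are needed,
   which is why this shape suits targets with 8-bit registers.  */
static int
popcount64_table (uint64_t x)
{
  int count = 0;
  int shift;

  for (shift = 0; shift < 64; shift += 8)
    count += popcount_tab[(x >> shift) & 0xFF];
  return count;
}
```

On 32-bit and 64-bit targets the eight table loads are the part that the
parallel bit add discussed in this bug avoids.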


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041



* [Bug middle-end/36041] Speed up builtin_popcountll
  2008-04-25  0:35 [Bug c/36041] New: " intvnut at gmail dot com
@ 2008-04-25  0:40 ` intvnut at gmail dot com
  2008-04-25  8:45 ` rguenth at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: intvnut at gmail dot com @ 2008-04-25  0:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from intvnut at gmail dot com  2008-04-25 00:39 -------
When run on my Opteron 280 system, the four separate implementations give the
following run times:

popcount64_1 = 13130000 clocks
popcount64_2 = 6480000 clocks
popcount64_3 = 3740000 clocks
popcount64_4 = 5490000 clocks

As one can see, the popcount64_3 implementation is over 3.5x the speed of the
__builtin_popcountll implementation.


-- 

intvnut at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |intvnut at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041



Thread overview: 20+ messages
     [not found] <bug-36041-4@http.gcc.gnu.org/bugzilla/>
2012-09-05 10:40 ` [Bug middle-end/36041] Speed up builtin_popcountll jsalavert at gmail dot com
2012-09-05 15:21 ` paolo.carlini at oracle dot com
2012-10-26 15:51 ` gpiez at web dot de
2013-06-26 18:52 ` crrodriguez at opensuse dot org
2013-06-26 23:28 ` glisse at gcc dot gnu.org
2013-06-26 23:31 ` pinskia at gcc dot gnu.org
2013-06-26 23:38 ` crrodriguez at opensuse dot org
2013-06-26 23:49 ` glisse at gcc dot gnu.org
2013-06-27  5:34 ` jakub at gcc dot gnu.org
2013-06-27  6:14 ` crrodriguez at opensuse dot org
2013-06-27  7:13 ` jakub at gcc dot gnu.org
2013-06-28 12:50 ` glisse at gcc dot gnu.org
2013-06-28 13:01 ` jakub at gcc dot gnu.org
2021-08-16 23:28 ` pinskia at gcc dot gnu.org
2008-04-25  0:35 [Bug c/36041] New: " intvnut at gmail dot com
2008-04-25  0:40 ` [Bug middle-end/36041] " intvnut at gmail dot com
2008-04-25  8:45 ` rguenth at gcc dot gnu dot org
2008-04-25 12:29 ` intvnut at gmail dot com
2008-04-25 14:52 ` rguenth at gcc dot gnu dot org
2008-04-29  3:42 ` intvnut at gmail dot com
2010-02-21  1:34 ` manu at gcc dot gnu dot org
