public inbox for gcc-help@gcc.gnu.org
 help / color / mirror / Atom feed
* Is this a missed optimization for popcount implementation?
@ 2020-12-05 15:13 andre maute
  2020-12-05 18:38 ` Jonathan Wakely
  0 siblings, 1 reply; 5+ messages in thread
From: andre maute @ 2020-12-05 15:13 UTC (permalink / raw)
  To: gcc-help

Hi list,

I'm wondering if we have here a missed optimization
for a popcount implementation.

https://en.cppreference.com/w/cpp/numeric/popcount

std::popcount is available with C++20, which might not be able
on many systems for years.

Let us consider the code below

I'm running a Fedora 32 linux with
$ g++ --version
g++ (GCC) 10.2.1 20201016 (Red Hat 10.2.1-6)
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ g++ -O3 -save-temps popcount.cc -march=ivybridge

popcount1() what I consider the standard implementation
does NOT get reduced into the relevant popcount instruction

popcount2() what I consider the standard "trick" implementation
gets reduced into the relevant popcount instruction

Is this expected behaviour?
You'll have to know the trick to get the optimization!

Regards Andre

/////////////// popcount.cc ///////////////
#include <iostream>

unsigned int popcount1( unsigned int n )
{
	unsigned int res = 0;
	while( n != 0 ) {
		res += n & 1;
		n >>= 1;
	}
	return res;
}

unsigned int popcount2( unsigned int n )
{
	unsigned int res = 0;
	while( n != 0 ) {
		res++;
		n &= (n-1);
	}
	return res;
}

int main()
{
	for( unsigned int n = 0; n < (1 << 31); n++ ) {
		if( n % 100000 == 0 ) {
			std::cout << n << std::endl;
		}
		unsigned int c1 = popcount1(n);
		unsigned int c2 = popcount2(n);
		if( c1 != c2 ) {
			std::cout << n << " " << c1 << " " << c2 << std::endl;
			return 1;
		}
	}

	return 0;
}
/////////////// popcount.cc ///////////////





^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Is this a missed optimization for popcount implementation?
  2020-12-05 15:13 Is this a missed optimization for popcount implementation? andre maute
@ 2020-12-05 18:38 ` Jonathan Wakely
  2020-12-05 19:48   ` andre maute
  0 siblings, 1 reply; 5+ messages in thread
From: Jonathan Wakely @ 2020-12-05 18:38 UTC (permalink / raw)
  To: andre maute; +Cc: gcc-help

On Sat, 5 Dec 2020, 15:15 andre maute, <andre.maute@gmx.de> wrote:

> Hi list,
>
> I'm wondering if we have here a missed optimization
> for a popcount implementation.
>
> https://en.cppreference.com/w/cpp/numeric/popcount
>
> std::popcount is available with C++20, which might not be able
> on many systems for years.
>
> Let us consider the code below
>
> I'm running a Fedora 32 linux with
> $ g++ --version
> g++ (GCC) 10.2.1 20201016 (Red Hat 10.2.1-6)
> Copyright (C) 2020 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> $ g++ -O3 -save-temps popcount.cc -march=ivybridge
>
> popcount1() what I consider the standard implementation
> does NOT get reduced into the relevant popcount instruction
>
> popcount2() what I consider the standard "trick" implementation
> gets reduced into the relevant popcount instruction
>
> Is this expected behaviour?
> You'll have to know the trick to get the optimization!
>
> Regards Andre
>
> /////////////// popcount.cc ///////////////
> #include <iostream>
>
> unsigned int popcount1( unsigned int n )
> {
>

#if __has_builtin(__builtin_popcount)
  return __builtin_popcount(n);
#endif

        unsigned int res = 0;
>         while( n != 0 ) {
>                 res += n & 1;
>                 n >>= 1;
>         }
>         return res;
> }
>
> unsigned int popcount2( unsigned int n )
> {
>         unsigned int res = 0;
>         while( n != 0 ) {
>                 res++;
>                 n &= (n-1);
>         }
>         return res;
> }
>
> int main()
> {
>         for( unsigned int n = 0; n < (1 << 31); n++ ) {
>                 if( n % 100000 == 0 ) {
>                         std::cout << n << std::endl;
>                 }
>                 unsigned int c1 = popcount1(n);
>                 unsigned int c2 = popcount2(n);
>                 if( c1 != c2 ) {
>                         std::cout << n << " " << c1 << " " << c2 <<
> std::endl;
>                         return 1;
>                 }
>         }
>
>         return 0;
> }
> /////////////// popcount.cc ///////////////
>
>
>
>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Is this a missed optimization for popcount implementation?
  2020-12-05 18:38 ` Jonathan Wakely
@ 2020-12-05 19:48   ` andre maute
  2020-12-07 22:03     ` Jim Wilson
  0 siblings, 1 reply; 5+ messages in thread
From: andre maute @ 2020-12-05 19:48 UTC (permalink / raw)
  To: gcc-help

Hi Jonathan,

the builtin is not the point I would like to make.

I'm wondering why is there a special optimization
for the "trick" code, and why isn't there one for the "naive" code?

Regards Andre


On 12/5/20 7:38 PM, Jonathan Wakely wrote:
> On Sat, 5 Dec 2020, 15:15 andre maute, <andre.maute@gmx.de> wrote:
>
>> Hi list,
>>
>> I'm wondering if we have here a missed optimization
>> for a popcount implementation.
>>
>> https://en.cppreference.com/w/cpp/numeric/popcount
>>
>> std::popcount is available with C++20, which might not be able
>> on many systems for years.
>>
>> Let us consider the code below
>>
>> I'm running a Fedora 32 linux with
>> $ g++ --version
>> g++ (GCC) 10.2.1 20201016 (Red Hat 10.2.1-6)
>> Copyright (C) 2020 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions.  There is NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>
>> $ g++ -O3 -save-temps popcount.cc -march=ivybridge
>>
>> popcount1() what I consider the standard implementation
>> does NOT get reduced into the relevant popcount instruction
>>
>> popcount2() what I consider the standard "trick" implementation
>> gets reduced into the relevant popcount instruction
>>
>> Is this expected behaviour?
>> You'll have to know the trick to get the optimization!
>>
>> Regards Andre
>>
>> /////////////// popcount.cc ///////////////
>> #include <iostream>
>>
>> unsigned int popcount1( unsigned int n )
>> {
>>
>
> #if __has_builtin(__builtin_popcount)
>    return __builtin_popcount(n);
> #endif
>
>          unsigned int res = 0;
>>          while( n != 0 ) {
>>                  res += n & 1;
>>                  n >>= 1;
>>          }
>>          return res;
>> }
>>
>> unsigned int popcount2( unsigned int n )
>> {
>>          unsigned int res = 0;
>>          while( n != 0 ) {
>>                  res++;
>>                  n &= (n-1);
>>          }
>>          return res;
>> }
>>
>> int main()
>> {
>>          for( unsigned int n = 0; n < (1 << 31); n++ ) {
>>                  if( n % 100000 == 0 ) {
>>                          std::cout << n << std::endl;
>>                  }
>>                  unsigned int c1 = popcount1(n);
>>                  unsigned int c2 = popcount2(n);
>>                  if( c1 != c2 ) {
>>                          std::cout << n << " " << c1 << " " << c2 <<
>> std::endl;
>>                          return 1;
>>                  }
>>          }
>>
>>          return 0;
>> }
>> /////////////// popcount.cc ///////////////
>>
>>
>>
>>
>>
>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Is this a missed optimization for popcount implementation?
  2020-12-05 19:48   ` andre maute
@ 2020-12-07 22:03     ` Jim Wilson
  2020-12-08 23:39       ` andre maute
  0 siblings, 1 reply; 5+ messages in thread
From: Jim Wilson @ 2020-12-07 22:03 UTC (permalink / raw)
  To: andre maute; +Cc: gcc-help

On Sat, Dec 5, 2020 at 11:48 AM andre maute <andre.maute@gmx.de> wrote:

> I'm wondering why is there a special optimization
> for the "trick" code, and why isn't there one for the "naive" code?
>
Because a bug report was submitted for the example that you are calling
trick code.
     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82479
and a patch was written to fix that.  The bug report says this code is
present in spec2017, which is why they wanted it optimized., and why a
volunteer was willing to write the complicated code necessary to perform
this optimization.

Anyways, yes, I'd call this a missing optimization, though likely of
limited interest, unless you can point at a benchmark or important program
that really needs it.  You are welcome to submit a bug report, and maybe
point back at bug 82479.

But in general, you are better off calling __builtin_popcount.

Jim

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Is this a missed optimization for popcount implementation?
  2020-12-07 22:03     ` Jim Wilson
@ 2020-12-08 23:39       ` andre maute
  0 siblings, 0 replies; 5+ messages in thread
From: andre maute @ 2020-12-08 23:39 UTC (permalink / raw)
  To: gcc-help

On 12/7/20 11:03 PM, Jim Wilson wrote:
> On Sat, Dec 5, 2020 at 11:48 AM andre maute <andre.maute@gmx.de> wrote:
>
>> I'm wondering why is there a special optimization
>> for the "trick" code, and why isn't there one for the "naive" code?
>>
> Because a bug report was submitted for the example that you are calling
> trick code.
>       https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82479
> and a patch was written to fix that.  The bug report says this code is
> present in spec2017, which is why they wanted it optimized., and why a
> volunteer was willing to write the complicated code necessary to perform
> this optimization.
>
> Anyways, yes, I'd call this a missing optimization, though likely of
> limited interest, unless you can point at a benchmark or important program
> that really needs it.  You are welcome to submit a bug report, and maybe
> point back at bug 82479.
>
> But in general, you are better off calling __builtin_popcount.
>
> Jim
>
Thank you Jim for the pointer.
I was triggered watching a youtube video with Matt Godbolt
	https://www.youtube.com/watch?v=w0sz5WbS5AM
about how amazing compilers are.

Wouldn't it be nice if you could tell your colleagues
that you don't have to remember those bit hacking "tricks" anymore.

This might become relevant, if you still have to additionally support
old pre C++11 multiplatform sources.

With your pointer and
https://graphics.stanford.edu/~seander/bithacks.html
this could be a nice beginner project.

Regards Andre

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-12-08 23:39 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-05 15:13 Is this a missed optimization for popcount implementation? andre maute
2020-12-05 18:38 ` Jonathan Wakely
2020-12-05 19:48   ` andre maute
2020-12-07 22:03     ` Jim Wilson
2020-12-08 23:39       ` andre maute

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).