public inbox for gcc@gcc.gnu.org
* Another epic optimiser failure
@ 2023-05-27 21:04 Stefan Kanthak
  2023-05-27 21:20 ` Jakub Jelinek
  2023-05-28  6:28 ` Nicholas Vinson
  0 siblings, 2 replies; 11+ messages in thread
From: Stefan Kanthak @ 2023-05-27 21:04 UTC
  To: gcc

--- .c ---
int ispowerof2(unsigned long long argument) {
    return __builtin_popcountll(argument) == 1;
}
--- EOF ---

GCC 13.1    gcc -m32 -march=alderlake -O3
            gcc -m32 -march=sapphirerapids -O3
            gcc -m32 -mpopcnt -mtune=sapphirerapids -O3

https://gcc.godbolt.org/z/cToYrrYPq
ispowerof2(unsigned long long):
        xor     eax, eax        # superfluous
        xor     edx, edx        # superfluous
        popcnt  eax, [esp+4]
        popcnt  edx, [esp+8]
        add     eax, edx
        cmp     eax, 1      ->    dec  eax
        sete    al
        movzx   eax, al         # superfluous
        ret

9 instructions in 28 bytes      # 6 instructions in 20 bytes

OUCH: popcnt writes the WHOLE result register, there is ABSOLUTELY
      no need to clear it beforehand nor to clear the higher 24 bits
      afterwards!
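
[Editor's sketch, not part of the original post.] The claim about the upper
24 bits can be checked with a value-level model of the two endings of the
sequence. The helper names below (set_low_byte, popcount_sum, model_cmp_sete,
model_dec_sete) are hypothetical, and the model assumes a 32-bit unsigned int
and ignores timing and dependency effects. Under those assumptions, dropping
the movzx is harmless after "cmp eax, 1", because eax still holds a sum in
0..64. After "dec eax" it appears not to be harmless for an argument of 0,
where eax wraps to 0xFFFFFFFF before sete writes only the low byte.

```c
#include <stdint.h>

/* Value-level model of "sete al": only the low 8 bits of eax change. */
static uint32_t set_low_byte(uint32_t eax, int flag) {
    return (eax & 0xFFFFFF00u) | (flag ? 1u : 0u);
}

/* popcnt eax, lo; popcnt edx, hi; add eax, edx: the sum is always 0..64. */
static uint32_t popcount_sum(unsigned long long argument) {
    uint32_t eax = (uint32_t)__builtin_popcount((unsigned)(argument & 0xFFFFFFFFu));
    uint32_t edx = (uint32_t)__builtin_popcount((unsigned)(argument >> 32));
    return eax + edx;
}

/* Ending "cmp eax, 1; sete al; ret" without the movzx. */
static uint32_t model_cmp_sete(unsigned long long argument) {
    uint32_t eax = popcount_sum(argument);      /* cmp leaves eax unchanged */
    return set_low_byte(eax, eax == 1);
}

/* Ending "dec eax; sete al; ret" without the movzx. */
static uint32_t model_dec_sete(unsigned long long argument) {
    uint32_t eax = popcount_sum(argument) - 1u; /* dec wraps 0 to 0xFFFFFFFF */
    return set_low_byte(eax, eax == 0);
}
```

In this model, model_cmp_sete() yields only 0 or 1, while model_dec_sete(0)
yields 0xFFFFFF00, which a caller testing the int result would treat as true.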

JFTR: before GCC zealots write nonsense: see -march= or -mtune=

GCC 13.1    gcc -mpopcnt -mtune=barcelona -O3

https://gcc.godbolt.org/z/3Ks8vh7a6
ispowerof2(unsigned long long):
        popcnt  rdi, rdi    ->    popcnt  rax, rdi
        xor     eax, eax        # superfluous!
        dec     edi         ->    dec     eax
        sete    al          ->    setz    al
        ret
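
[Editor's sketch, not part of the original post.] For the 64-bit listing, a
similar value-level model suggests what the "xor eax, eax" contributes: it
supplies the zero upper bits that "sete al" does not write. The function
names model64_gcc and model64_proposed are hypothetical, and the model only
tracks register values, not performance.

```c
#include <stdint.h>

/* GCC's sequence: popcnt rdi, rdi; xor eax, eax; dec edi; sete al; ret */
static uint32_t model64_gcc(unsigned long long argument) {
    uint32_t edi = (uint32_t)__builtin_popcountll(argument);
    uint32_t eax = 0;                                   /* xor eax, eax */
    edi -= 1u;                                          /* dec edi */
    eax = (eax & 0xFFFFFF00u) | (edi == 0 ? 1u : 0u);   /* sete al */
    return eax;
}

/* Proposed sequence: popcnt rax, rdi; dec eax; setz al; ret */
static uint32_t model64_proposed(unsigned long long argument) {
    uint32_t eax = (uint32_t)__builtin_popcountll(argument);
    eax -= 1u;                                          /* dec eax */
    eax = (eax & 0xFFFFFF00u) | (eax == 0 ? 1u : 0u);   /* setz al */
    return eax;
}
```

Both models agree for every argument with at least one bit set. For an
argument of 0 the proposed sequence leaves 0xFFFFFF00 in eax, whereas the
sequence with the xor returns 0.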

GCC 13.1    gcc -m32 -mpopcnt -mtune=barcelona -O3

https://gcc.godbolt.org/z/s5s5KTGnv
ispowerof2(unsigned long long):
        popcnt  eax, [esp+4]
        popcnt  edx, [esp+8]
        add     eax, edx
        dec     eax
        sete    al
        movzx   eax, al        # superfluous!
        ret

Will GCC eventually generate properly optimised code instead of bloat?

Stefan

* Re: Another epic optimiser failure
@ 2023-05-28  7:50 Julian Waters
  2023-05-29 19:01 ` Dave Blanchard
  0 siblings, 1 reply; 11+ messages in thread
From: Julian Waters @ 2023-05-28  7:50 UTC
  To: gcc

Man, these clang fanboys sure are getting out of hand.

I feel like all this garbage could easily be resolved by y'all showing this
idiot the exact options required and attaching the resulting assembly
exactly as he wants it or, if gcc doesn't produce the exact assembly he
wants, explaining why gcc chose a different route than the quote-unquote
"perfect assembly" that he expects it to spit out.

And Stefan? Ever heard the saying that "the loudest man in the room is
always the weakest"?


Thread overview: 11+ messages
2023-05-27 21:04 Another epic optimiser failure Stefan Kanthak
2023-05-27 21:20 ` Jakub Jelinek
2023-05-27 21:28   ` Stefan Kanthak
2023-05-27 21:42     ` Andrew Pinski
2023-05-27 22:00       ` Stefan Kanthak
2023-05-27 22:46         ` Jonathan Wakely
2023-05-28  6:28 ` Nicholas Vinson
2023-05-28  7:50 Julian Waters
2023-05-29 19:01 ` Dave Blanchard
2023-05-29 23:44   ` Nicholas Vinson
2023-05-30  4:04   ` Julian Waters
