public inbox for gcc@gcc.gnu.org
* Epic code generator/optimiser failures
@ 2023-05-27 17:32 Stefan Kanthak
From: Stefan Kanthak @ 2023-05-27 17:32 UTC
  To: gcc

--- demo.c ---
int ispowerof2(unsigned long long argument) {
    return (argument != 0) && ((argument & argument - 1) == 0);
}
--- EOF ---
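
To make explicit what any generated code must compute, here is a minimal sketch that compares the function from demo.c against a naive bit-counting reference. The names `ispowerof2_reference` and `count_mismatches` are hypothetical helpers introduced only for this illustration; they are not part of demo.c.

```c
#include <assert.h>

/* Function under discussion, as in demo.c; parentheses added for
   readability only ('-' already binds tighter than '&'). */
int ispowerof2(unsigned long long argument) {
    return (argument != 0) && ((argument & (argument - 1)) == 0);
}

/* Hypothetical reference: count the set bits one at a time.
   A power of 2 has exactly one bit set. */
static int ispowerof2_reference(unsigned long long argument) {
    int bits = 0;
    while (argument != 0) {
        bits += (int)(argument & 1);
        argument >>= 1;
    }
    return bits == 1;
}

/* Hypothetical helper: number of disagreements between the two
   functions over single-bit values and their neighbours. */
int count_mismatches(void) {
    int mismatches = 0;
    for (int i = 0; i < 64; i++) {
        unsigned long long bit = 1ULL << i;
        unsigned long long samples[3] = { bit, bit - 1, bit | 1 };
        for (int j = 0; j < 3; j++)
            mismatches += ispowerof2(samples[j])
                       != ispowerof2_reference(samples[j]);
    }
    mismatches += ispowerof2(0) != ispowerof2_reference(0);
    mismatches += ispowerof2(~0ULL) != ispowerof2_reference(~0ULL);
    return mismatches;
}
```

Calling `count_mismatches()` from a test driver and asserting that it returns 0 gives a quick sanity check that is independent of the instruction sequence the compiler picks.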

GCC 13.1    gcc -m32 -mavx -O3 # or -march=native instead of -mavx

https://gcc.godbolt.org/z/T31Gzo85W
ispowerof2(unsigned long long):
        vmovq   xmm1, QWORD PTR [esp+4]        ->    movq     xmm0, qword ptr [esp+4]
        xor     eax, eax                       ->    xor      eax, eax
        vpunpcklqdq     xmm0, xmm1, xmm1       # superfluous
        vptest  xmm0, xmm0                     ->    ptest    xmm0, xmm0
        je      .L1                            ->    jz       .L1
        vpcmpeqd        xmm0, xmm0, xmm0       ->    pcmpeqd  xmm1, xmm1
        xor     eax, eax                       # superfluous
        vpaddq  xmm0, xmm1, xmm0               ->    paddq    xmm1, xmm0
        vpand   xmm0, xmm0, xmm1               # superfluous
        vpunpcklqdq     xmm0, xmm0, xmm0       # superfluous
        vptest  xmm0, xmm0                     ->    ptest    xmm1, xmm0
        sete    al                             ->    setz     al
.L1:
        ret                                    ->    ret

4 out of 13 instructions are SUPERFLUOUS here!

OUCH #1: there's ABSOLUTELY no need to generate AVX instructions and
         bloat the code through VEX prefixes and longer instructions!

OUCH #2: [V]MOVQ zeroes the upper 64 bits of the destination XMM
         register, so there's ABSOLUTELY no need for the [V]PUNPCKLQDQ
         instructions.
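
The reasoning behind OUCH #2 can be sketched with a plain-C model of the shorter sequence. The type `xmm_model` and the `model_*` functions are hypothetical stand-ins for the register and instruction semantics (they are not intrinsics), and the model covers only the behaviour relevant here: MOVQ zero-extends, PADDQ adds per 64-bit lane, and PTEST sets ZF when the AND of its operands is zero across all 128 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-lane model of an XMM register. */
typedef struct { uint64_t lo, hi; } xmm_model;

/* MOVQ xmm, m64: low lane = value, high lane = 0. */
static xmm_model model_movq(uint64_t value) {
    xmm_model r = { value, 0 };
    return r;
}

/* PCMPEQD xmm, xmm with identical operands: every bit set. */
static xmm_model model_all_ones(void) {
    xmm_model r = { ~(uint64_t)0, ~(uint64_t)0 };
    return r;
}

/* PADDQ: independent wrapping 64-bit additions per lane. */
static xmm_model model_paddq(xmm_model a, xmm_model b) {
    xmm_model r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

/* PTEST a, b: ZF = 1 if (a AND b) is zero over all 128 bits. */
static int model_ptest_zf(xmm_model a, xmm_model b) {
    return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* Model of the shorter sequence from the right-hand column. */
int ispowerof2_model(uint64_t argument) {
    xmm_model x = model_movq(argument);       /* movq    xmm0, [esp+4] */
    if (model_ptest_zf(x, x))                 /* ptest   xmm0, xmm0    */
        return 0;                             /* jz      .L1           */
    xmm_model m = model_all_ones();           /* pcmpeqd xmm1, xmm1    */
    m = model_paddq(m, x);                    /* paddq   xmm1, xmm0    */
    /* m.lo = argument - 1; m.hi is all ones, but x.hi is 0, so the
       high lane contributes nothing to the final test. */
    return model_ptest_zf(m, x);              /* ptest   xmm1, xmm0    */
}
```

In this model the high lane of the argument register stays zero throughout, so the high lane of the AND is always zero and the result depends only on `argument & (argument - 1)`.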

GCC 13.1    gcc -m32 -msse4.1 -O3

https://gcc.godbolt.org/z/bqsqec6r1
ispowerof2(unsigned long long):
        movq    xmm1, QWORD PTR [esp+4]       ->    movq    xmm0, [esp+4]
        xor     eax, eax                      ->    xor     eax, eax
        movdqa  xmm0, xmm1                    # superfluous
        punpcklqdq      xmm0, xmm1            # superfluous
        ptest   xmm0, xmm0                    ->    ptest   xmm0, xmm0
        je      .L1                           ->    jz      .L1
        pcmpeqd xmm0, xmm0                    ->    pcmpeqq xmm1, xmm1
        xor     eax, eax                      # superfluous
        paddq   xmm0, xmm1                    ->    paddq   xmm1, xmm0
        pand    xmm0, xmm1                    # superfluous
        punpcklqdq      xmm0, xmm0            # superfluous
        ptest   xmm0, xmm0                    ->    ptest   xmm1, xmm0
        sete    al                            ->    setz    al
.L1:
        ret                                   ->    ret

5 out of 14 instructions are superfluous here, or 18 of 50 bytes!

OUCH #3/#4: see above!

Will GCC eventually generate proper SSE4.1/AVX code?

Stefan
