public inbox for gcc@gcc.gnu.org
From: "Stefan Kanthak" <stefan.kanthak@nexgo.de>
To: <gcc@gnu.org>
Subject: Will GCC eventually support SSE2 or SSE4.1?
Date: Fri, 26 May 2023 08:46:19 +0200
Message-ID: <51071A92918346ABBC6B5703179F5174@H270>

Hi,

compile the following function on a system with a Core2 processor
(released January 2008) for the 32-bit execution environment:

--- demo.c ---
int ispowerof2(unsigned long long argument)
{
    return (argument & argument - 1) == 0;
}
--- EOF ---
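
For reference, a minimal sketch of what demo.c computes, with the implicit
operator grouping written out (binary '-' binds tighter than '&'). The helper
check_ispowerof2() is hypothetical and not part of the original demo. Note that
an argument of 0 also yields 1, because 0 - 1 wraps around to all ones and
0 & anything is 0.

```c
#include <assert.h>

/* Same expression as demo.c, with the grouping made explicit:
   argument & argument - 1 parses as argument & (argument - 1). */
int ispowerof2(unsigned long long argument)
{
    return (argument & (argument - 1)) == 0;
}

/* Hypothetical self-check, not part of the original demo. */
void check_ispowerof2(void)
{
    assert(ispowerof2(1ULL) == 1);          /* 2^0 */
    assert(ispowerof2(1ULL << 63) == 1);    /* highest single bit */
    assert(ispowerof2(6ULL) == 0);          /* two bits set */
    assert(ispowerof2(0ULL) == 1);          /* 0 passes the test too */
}
```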

GCC 13.1: gcc -m32 -O3 demo.c

NOTE: -mtune=native is the default!

# https://godbolt.org/z/b43cjGdY9
ispowerof2(unsigned long long):
        movq    xmm1, [esp+4]
        pcmpeqd xmm0, xmm0
        paddq   xmm0, xmm1
        pand    xmm0, xmm1
        movd    edx, xmm0      #    pxor    xmm1, xmm1
        psrlq   xmm0, 32       #    pcmpeqb xmm0, xmm1
        movd    eax, xmm0      #    pmovmskb eax, xmm0
        or      edx, eax       #    cmp     al, 255
        sete    al             #    sete    al
        movzx   eax, al        #
        ret

11 instructions in 40 bytes    # 10 instructions in 36 bytes

OOPS: why does GCC (ab)use the SSE2 alias "Willamette New Instruction Set"
      here instead of the native SSE4.1 alias "Penryn New Instruction Set"
      of the Core2 (and all later processors)?

OUCH: why does it FAIL to REALLY use SSE2, as shown in the comments on the
      right side?
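
The sequence in the right-hand comments can be sketched with SSE2 intrinsics
roughly as follows. This is an illustration under stated assumptions, not the
code GCC emits: the name ispowerof2_sse2() and the self-check are hypothetical,
the compiler is free to pick different instructions for the intrinsics, and
the scalar fallback is used when __SSE2__ is not defined.

```c
#include <assert.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical name; mirrors the comment column of the listing above. */
int ispowerof2_sse2(unsigned long long argument)
{
#if defined(__SSE2__)
    /* movq: load 64 bits, upper half of the register is zeroed */
    __m128i x    = _mm_loadl_epi64((const __m128i *)&argument);
    /* all ones, i.e. -1 in each 64-bit lane (pcmpeqd xmm0, xmm0) */
    __m128i ones = _mm_set1_epi32(-1);
    /* paddq + pand: (argument - 1) & argument in the low lane */
    __m128i t    = _mm_and_si128(_mm_add_epi64(ones, x), x);
    /* pxor + pcmpeqb: each byte becomes 0xFF where t is zero */
    __m128i eq   = _mm_cmpeq_epi8(t, _mm_setzero_si128());
    /* pmovmskb + cmp al, 255 + sete: low 8 bytes must all be zero */
    return (_mm_movemask_epi8(eq) & 0xFF) == 0xFF;
#else
    /* Fallback when SSE2 is not enabled for this translation unit. */
    return (argument & (argument - 1)) == 0;
#endif
}

/* Hypothetical self-check against the plain C expression. */
void check_ispowerof2_sse2(void)
{
    unsigned long long v[] = { 0ULL, 1ULL, 6ULL, 1ULL << 32,
                               0x100000001ULL, 1ULL << 63 };
    for (unsigned i = 0; i < sizeof v / sizeof v[0]; ++i)
        assert(ispowerof2_sse2(v[i]) == ((v[i] & (v[i] - 1)) == 0));
}
```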


Now add the -mtune=core2 option to EXPLICITLY enable the NATIVE SSE4.1
alias "Penryn New Instruction Set" of the Core2 processor:

GCC 13.1: gcc -m32 -mtune=core2 -O3 demo.c

# https://godbolt.org/z/svhEoYT11
ispowerof2(unsigned long long):
                               #    xor      eax, eax
        movq    xmm1, [esp+4]  #    movq     xmm1, [esp+4]
        pcmpeqd xmm0, xmm0     #    pcmpeqq  xmm0, xmm0
        paddq   xmm0, xmm1     #    paddq    xmm0, xmm1
        pand    xmm0, xmm1     #    ptest    xmm0, xmm1
        movd    edx, xmm0      #
        psrlq   xmm0, 32       #
        movd    eax, xmm0      #
        or      edx, eax       #
        sete    al             #    sete     al
        movzx   eax, al        #
        ret                    #    ret

11 instructions in 40 bytes    # 7 instructions in 26 bytes

OUCH: GCC FAILS to use SSE4.1 as shown in the comments on the right side.
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
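
The PTEST variant in the right-hand comments can be sketched with SSE4.1
intrinsics. This is only a sketch: the name ispowerof2_sse41() and the
self-check are hypothetical, and the intrinsic path is compiled only when the
translation unit is built with SSE4.1 enabled so that __SSE4_1__ is defined
(with GCC, for example via -msse4.1); otherwise the plain C expression is used.

```c
#include <assert.h>
#if defined(__SSE4_1__)
#include <smmintrin.h>
#endif

/* Hypothetical name; mirrors the comment column of the listing above. */
int ispowerof2_sse41(unsigned long long argument)
{
#if defined(__SSE4_1__)
    /* movq: load 64 bits, upper half of the register is zeroed */
    __m128i x    = _mm_loadl_epi64((const __m128i *)&argument);
    /* all ones, i.e. -1 in each 64-bit lane */
    __m128i ones = _mm_set1_epi32(-1);
    /* paddq: argument - 1 in the low lane */
    __m128i dec  = _mm_add_epi64(ones, x);
    /* ptest + sete: returns 1 exactly when (dec & x) is all zero */
    return _mm_testz_si128(dec, x);
#else
    /* Fallback when SSE4.1 is not enabled for this translation unit. */
    return (argument & (argument - 1)) == 0;
#endif
}

/* Hypothetical self-check against the plain C expression. */
void check_ispowerof2_sse41(void)
{
    unsigned long long v[] = { 0ULL, 1ULL, 6ULL, 1ULL << 32,
                               0x100000001ULL, 1ULL << 63 };
    for (unsigned i = 0; i < sizeof v / sizeof v[0]; ++i)
        assert(ispowerof2_sse41(v[i]) == ((v[i] & (v[i] - 1)) == 0));
}
```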

Finally, compile with -mtune=i386 for the i386 processor:

GCC 13.1: gcc -m32 -mtune=i386 -O3 demo.c

# https://godbolt.org/z/e76W6dsMj
ispowerof2(unsigned long long):
        push    ebx            #
        mov     ecx, [esp+8]   #    mov    eax, [esp+4]
        mov     ebx, [esp+12]  #    mov    edx, [esp+8]
        mov     eax, ecx       #
        mov     edx, ebx       #
        add     eax, -1        #    add    eax, -1
        adc     edx, -1        #    adc    edx, -1
        and     eax, ecx       #    and    eax, [esp+4]
        and     edx, ebx       #    and    edx, [esp+8]
        or      eax, edx       #    or     eax, edx
        sete    al             #    neg    eax
        movzx   eax, al        #    sbb    eax, eax
        pop     ebx            #    inc    eax
        ret                    #    ret

14 instructions in 33 bytes    # 11 instructions in 32 bytes

OUCH: why does GCC abuse EBX (and ECX too) and perform a superfluous
      memory write?
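
The arithmetic behind the right-hand column can be modelled in C on the two
32-bit halves. This is a sketch for illustration only: the name
ispowerof2_halves() and the self-check are hypothetical, and a compiler will
not necessarily turn it into the instructions named in the comments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; models the ADD/ADC and NEG/SBB/INC steps above. */
int ispowerof2_halves(unsigned long long argument)
{
    uint32_t lo = (uint32_t)argument;          /* mov eax, [esp+4] */
    uint32_t hi = (uint32_t)(argument >> 32);  /* mov edx, [esp+8] */

    /* add eax, -1: wraps modulo 2^32; carry is set unless lo was 0 */
    uint32_t dec_lo = lo + UINT32_C(0xFFFFFFFF);
    uint32_t carry  = (lo != 0);
    /* adc edx, -1: adds the carry from the low half */
    uint32_t dec_hi = hi + UINT32_C(0xFFFFFFFF) + carry;

    /* and, and, or: nonzero if any bit of (argument - 1) & argument is set */
    uint32_t rest = (dec_lo & lo) | (dec_hi & hi);

    /* neg sets the carry flag if rest != 0, sbb eax, eax turns it into
       0 or -1, and inc maps that to 1 or 0 */
    return 1 - (int)(rest != 0);
}

/* Hypothetical self-check against the plain C expression. */
void check_ispowerof2_halves(void)
{
    unsigned long long v[] = { 0ULL, 1ULL, 6ULL, 1ULL << 32,
                               0x100000001ULL, 1ULL << 63 };
    for (unsigned i = 0; i < sizeof v / sizeof v[0]; ++i)
        assert(ispowerof2_halves(v[i]) == ((v[i] & (v[i] - 1)) == 0));
}
```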


Stefan Kanthak
