From: "Stefan Kanthak" <stefan.kanthak@nexgo.de>
To: <gcc@gnu.org>
Subject: Will GCC eventually support SSE2 or SSE4.1?
Date: Fri, 26 May 2023 08:46:19 +0200
Message-ID: <51071A92918346ABBC6B5703179F5174@H270>
Hi,
compile the following function on a system with a Core2 processor
(released January 2008) for the 32-bit execution environment:
--- demo.c ---
int ispowerof2(unsigned long long argument)
{
    return (argument & argument - 1) == 0;
}
--- EOF ---
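For reference, a minimal sketch of what demo.c computes. In C, binary `-` binds tighter than `&`, so the expression groups as `argument & (argument - 1)`, which clears the lowest set bit; the result is zero exactly when at most one bit is set, so 0 is accepted as well. The name `ispowerof2_strict` is hypothetical and not part of demo.c; it only illustrates how 0 could be excluded.

```c
/* Same function as demo.c, with the implicit grouping written out. */
int ispowerof2(unsigned long long argument)
{
    /* argument - 1 flips the lowest set bit and every bit below it,
       so the AND clears the lowest set bit of argument. */
    return (argument & (argument - 1)) == 0;
}

/* Hypothetical variant (not in demo.c) that rejects 0. */
int ispowerof2_strict(unsigned long long argument)
{
    return argument != 0 && ispowerof2(argument);
}
```

With these definitions, `ispowerof2(0)` and `ispowerof2(1ULL << 63)` both yield 1, while `ispowerof2(6)` and `ispowerof2_strict(0)` yield 0.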
GCC 13.3: gcc -m32 -O3 demo.c
NOTE: -mtune=native is the default!
# https://godbolt.org/z/b43cjGdY9
ispowerof2(unsigned long long):
        movq    xmm1, [esp+4]
        pcmpeqd xmm0, xmm0
        paddq   xmm0, xmm1
        pand    xmm0, xmm1
        movd    edx, xmm0           #   pxor     xmm1, xmm1
        psrlq   xmm0, 32            #   pcmpeqb  xmm0, xmm1
        movd    eax, xmm0           #   pmovmskb eax, xmm0
        or      edx, eax            #   cmp      al, 255
        sete    al                  #   sete     al
        movzx   eax, al             #
        ret
11 instructions in 40 bytes         # 10 instructions in 36 bytes
OOPS: why does GCC (ab)use the SSE2 alias "Willamette New Instruction Set"
here instead of the native SSE4.1 alias "Penryn New Instruction Set"
of the Core2 (and all later processors)?
OUCH: why does it FAIL to REALLY use SSE2, as shown in the comments on the
right side?
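The sequence in the comments on the right side can be approximated with SSE2 intrinsics. This is only a sketch: the function name `ispowerof2_sse2` is hypothetical, the compiler is free to emit different instructions than the ones named in the comments, and a scalar fallback is used when `__SSE2__` is not defined.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical name; mirrors the SSE2 sequence from the right-hand comments. */
int ispowerof2_sse2(uint64_t argument)
{
#if defined(__SSE2__)
    __m128i x    = _mm_loadl_epi64((const __m128i *)&argument); /* movq              */
    __m128i ones = _mm_cmpeq_epi32(x, x);                        /* pcmpeqd: all ones */
    __m128i t    = _mm_and_si128(_mm_add_epi64(ones, x), x);     /* paddq, pand       */
    __m128i eq   = _mm_cmpeq_epi8(t, _mm_setzero_si128());       /* pxor, pcmpeqb     */
    /* pmovmskb yields one mask bit per byte; the low 8 bits cover the
       64-bit value, which corresponds to the "cmp al, 255" step. */
    return (_mm_movemask_epi8(eq) & 0xFF) == 0xFF;
#else
    /* Scalar fallback when SSE2 code generation is not enabled. */
    return (argument & (argument - 1)) == 0;
#endif
}
```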
Now add the -mtune=core2 option to EXPLICITLY enable the NATIVE SSE4.1
alias "Penryn New Instruction Set" of the Core2 processor:
GCC 13.3: gcc -m32 -mtune=core2 -O3 demo.c
# https://godbolt.org/z/svhEoYT11
ispowerof2(unsigned long long):
                                    #   xor      eax, eax
        movq    xmm1, [esp+4]       #   movq     xmm1, [esp+4]
        pcmpeqd xmm0, xmm0          #   pcmpeqq  xmm0, xmm0
        paddq   xmm0, xmm1          #   paddq    xmm0, xmm1
        pand    xmm0, xmm1          #   ptest    xmm0, xmm1
        movd    edx, xmm0           #
        psrlq   xmm0, 32            #
        movd    eax, xmm0           #
        or      edx, eax            #
        sete    al                  #   sete     al
        movzx   eax, al             #
        ret                         #   ret
11 instructions in 40 bytes         # 7 instructions in 26 bytes
OUCH: GCC FAILS to use SSE4.1 as shown in the comments on the right side.
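The PTEST-based sequence from the comments can likewise be sketched with SSE4.1 intrinsics. The function name `ispowerof2_sse41` is hypothetical. Note that GCC defines `__SSE4_1__` only when SSE4.1 code generation is enabled (for example with -msse4.1), so without such an option this sketch compiles to the scalar fallback.

```c
#include <stdint.h>
#if defined(__SSE4_1__)
#include <smmintrin.h>
#endif

/* Hypothetical name; mirrors the SSE4.1 sequence from the right-hand comments. */
int ispowerof2_sse41(uint64_t argument)
{
#if defined(__SSE4_1__)
    __m128i x    = _mm_loadl_epi64((const __m128i *)&argument); /* movq              */
    __m128i ones = _mm_cmpeq_epi64(x, x);                        /* pcmpeqq: all ones */
    __m128i dec  = _mm_add_epi64(ones, x);                       /* paddq: x - 1      */
    /* ptest sets ZF when (dec & x) is all zero; _mm_testz_si128 returns
       that flag, which corresponds to the "sete al" step. */
    return _mm_testz_si128(dec, x);
#else
    /* Scalar fallback when SSE4.1 code generation is not enabled. */
    return (argument & (argument - 1)) == 0;
#endif
}
```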
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Last compile with -mtune=i386 for the i386 processor:
GCC 13.3: gcc -m32 -mtune=i386 -O3 demo.c
# https://godbolt.org/z/e76W6dsMj
ispowerof2(unsigned long long):
        push    ebx                 #
        mov     ecx, [esp+8]        #   mov     eax, [esp+4]
        mov     ebx, [esp+12]       #   mov     edx, [esp+8]
        mov     eax, ecx            #
        mov     edx, ebx            #
        add     eax, -1             #   add     eax, -1
        adc     edx, -1             #   adc     edx, -1
        and     eax, ecx            #   and     eax, [esp+4]
        and     edx, ebx            #   and     edx, [esp+8]
        or      eax, edx            #   or      eax, edx
        sete    al                  #   neg     eax
        movzx   eax, al             #   sbb     eax, eax
        pop     ebx                 #   inc     eax
        ret                         #   ret
14 instructions in 33 bytes         # 11 instructions in 32 bytes
OUCH: why does GCC abuse EBX (and ECX too) and perform a superfluous
memory write?
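What both i386 columns compute can be written out in C on the two 32-bit halves of the argument. This is a sketch only; the function name `ispowerof2_halves` and its split `lo`/`hi` parameters are hypothetical and merely model the two stack slots [esp+4] and [esp+8].

```c
#include <stdint.h>

/* Hypothetical helper: lo and hi are the low and high 32 bits of the argument. */
int ispowerof2_halves(uint32_t lo, uint32_t hi)
{
    uint32_t dec_lo = lo + 0xFFFFFFFFu;         /* add eax, -1                  */
    uint32_t carry  = dec_lo < lo;              /* carry flag: set iff lo != 0  */
    uint32_t dec_hi = hi + 0xFFFFFFFFu + carry; /* adc edx, -1                  */
    /* and, and, or: the 64-bit value (argument - 1) & argument, folded to 32 bits */
    uint32_t folded = (dec_lo & lo) | (dec_hi & hi);
    return folded == 0;                         /* sete, or neg/sbb/inc         */
}
```

For example, `ispowerof2_halves(0, 1)` models the argument 2^32 and yields 1, while `ispowerof2_halves(1, 1)` models 2^32 + 1 and yields 0.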
Stefan Kanthak