From: "Stefan Kanthak" <stefan.kanthak@nexgo.de>
To: "Jakub Jelinek" <jakub@redhat.com>
Cc: "Jonathan Wakely" <jwakely.gcc@gmail.com>, <gcc@gnu.org>,
"Andrew Pinski" <pinskia@gmail.com>
Subject: Re: Will GCC eventually support SSE2 or SSE4.1?
Date: Fri, 26 May 2023 13:28:00 +0200
Message-ID: <7D6327CDEBFD4331B6FFBD67E4B514FD@H270>
In-Reply-To: <ZHB6Qtn4RujimEi2@tucnak>
"Jakub Jelinek" <jakub@redhat.com> wrote:
> On Fri, May 26, 2023 at 10:59:03AM +0200, Stefan Kanthak wrote:
>> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it.
>> That's bad, REALITY CHECK, please!
>
> You're wrong.
> SSE4.1 first appeared in the 45nm versions of Core2, the 65nm versions
> didn't have it.
That's correct, I failed to see this difference.
> The supported CPU names don't distinguish between core2 submodels,
> so if you have core2 with sse4.1, you should either be using -march=native
> if compiling on such a machine, or use -march=core2 -msse4.1,
This is one of the combinations I didn't test until now; with it (and with
-m32 -msse4.1 too) GCC generates SSE4.1 instructions, but FAILS to optimise:
# Compilation provided by Compiler Explorer at https://godbolt.org/
ispowerof2(unsigned long long):
        movq    xmm1, QWORD PTR [esp+4]
        pcmpeqd xmm0, xmm0
        xor     eax, eax
        paddq   xmm0, xmm1
        pand    xmm0, xmm1              # SUPERFLUOUS!
        punpcklqdq      xmm0, xmm0      # SUPERFLUOUS!
        ptest   xmm0, xmm0              # ptest xmm0, xmm1
        sete    al
        ret
9 instructions in 36 bytes instead of 7 instructions in 26 bytes.
JFTR: the documentation of MOVQ specifies
| when the destination operand is an XMM register, the quadword is
| stored to the low quadword of the register, and the high quadword
| is cleared to all 0s.
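For reference, a minimal C sketch of the kind of source that yields such a listing. The function body is inferred from the assembly above (add all-ones, AND with the argument, test for zero) and is NOT quoted from this message, so treat it as an assumption. It also illustrates why the two marked instructions look avoidable: PTEST sets ZF from the AND of its two operands, and since MOVQ clears the high quadword of xmm1, the high half of that AND is 0 regardless of what PADDQ left in xmm0.

```c
/* Sketch only: body reconstructed from the listing, not taken from the thread. */

/* Returns 1 if at most one bit of x is set, else 0.
   x - 1 is what "pcmpeqd xmm0, xmm0; paddq xmm0, xmm1" computes (x + ~0);
   the AND and the compare against 0 are what a single
   "ptest xmm0, xmm1; sete al" could do in one step.
   Note: like the listing, this also returns 1 for x == 0. */
int ispowerof2(unsigned long long x)
{
    return ((x - 1) & x) == 0;
}
```

Compiling this with -m32 -msse4.1 (or -march=core2 -msse4.1) is the combination discussed above.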
> there is no -march={conroe,allendale,wolfdale,merom,penryn,...}.
>
>> 4) If the documentation is right, then the behaviour of GCC is wrong: it
>> doesn't allow to use SSE4.1 without SSE4.2!
>
> If you aren't able to read the documentation, it is hard to argue.
When the documentation is wrong or incomplete it's hard to trust it!
| -m32
...
| The -m32 option sets int, long, and pointer types to 32 bits, and
| generates code that runs on any i386 system.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OUCH: as shown in https://godbolt.org/z/b43cjGdY9, -m32 ALONE
generates SSE2 instructions which DON'T run on ANY i386 system!
OOPS: as shown above, -m32 -msse4.1 (or another -msse*) also generates
code that does NOT run on ANY i386 system!
Where is the precedence of the different -m* options for the CPU type
documented?
Where is their influence on each other documented?
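Absent such documentation, the effective result of a given option combination can at least be checked empirically. A minimal sketch, assuming only GCC's predefined feature macros (__SSE__, __SSE2__, __SSE4_1__, __SSE4_2__); the function name enabled_sse_level is hypothetical and chosen just for illustration:

```c
/* Sketch: report the highest SSE level the current options enabled.
   Only the __SSE*__ macros come from the compiler; the function name
   is made up for this example. */
const char *enabled_sse_level(void)
{
#if defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__SSE__)
    return "SSE";
#else
    return "none";
#endif
}
```

Building this once with -m32 alone and once with -m32 -msse4.1, and printing the result, shows which instruction sets each combination actually turned on.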
| -march=cpu-type
...
| Specifying -march=cpu-type implies -mtune=cpu-type, except where noted
| otherwise.
...
| -mtune=cpu-type
...
| the compiler does not generate any code that cannot run on the default
| machine type unless you use a -march=cpu-type option.
Why is the "default machine type" not mentioned/specified with -march=?
Stefan