public inbox for gcc-bugs@sourceware.org
* [Bug target/95566] New: x86 instruction selection --- some REX prefixes unnecessary
@ 2020-06-07 10:01 zero at smallinteger dot com
From: zero at smallinteger dot com @ 2020-06-07 10:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95566
Bug ID: 95566
Summary: x86 instruction selection --- some REX prefixes
unnecessary
Product: gcc
Version: 10.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: zero at smallinteger dot com
Target Milestone: ---
Created attachment 48696
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48696&action=edit
sample code
Consider the code attached, compiled with
gcc -O3 sample.c -o sample
GCC produces unrolled loop code that follows the pattern below:
movzx ecx, WORD PTR [rsp-62]
cmp rdx, rcx
Here, rdx holds the value of k >> 48. The top 32 bits of rdx are zero after the
shift, so the entire value of k >> 48 fits in edx. Likewise, movzx leaves the
top 32 bits of rcx zero. Thus, the cmp instruction could be
cmp edx, ecx
instead. The 32-bit form needs no REX prefix, so the instruction is one byte
shorter. After sufficient unrolling (or with e.g. more complex comparisons
that depend on k >> 48), shorter instructions without the REX prefix will be
better even accounting for the partial register dependency (or an instruction
to break the dependency). The Intel optimization manual says shorter
instructions are better.
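To make the equivalence concrete, here is a minimal C sketch (not taken from the attachment). The names cmp_wide, cmp_narrow and forms_agree are made up for illustration, and the byte arrays show one valid encoding of each instruction (opcode 39 /r); an assembler may pick a different but equally long form. The sketch only checks that comparing in 32 bits gives the same answer as comparing in 64 bits when the left-hand side is k >> 48 and the right-hand side is a zero-extended 16-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One valid encoding of each compare (opcode 39 /r, ModRM 0xCA). */
const unsigned char cmp_rdx_rcx[] = { 0x48, 0x39, 0xCA }; /* REX.W prefix */
const unsigned char cmp_edx_ecx[] = { 0x39, 0xCA };       /* no prefix    */

/* 64-bit comparison, mirroring `cmp rdx, rcx`. */
int cmp_wide(uint64_t k, uint16_t v)
{
    return (k >> 48) == (uint64_t)v;
}

/* 32-bit comparison, mirroring the proposed `cmp edx, ecx`. */
int cmp_narrow(uint64_t k, uint16_t v)
{
    return (uint32_t)(k >> 48) == (uint32_t)v;
}

/* Returns 1 if both forms agree on a handful of edge-case inputs. */
int forms_agree(void)
{
    static const uint64_t ks[] = {
        0, 1, 0x1234ull << 48, 0xFFFFull << 48,
        0x8000000000000000ull, UINT64_MAX
    };
    static const uint16_t vs[] = { 0, 1, 0x1234, 0x8000, 0xFFFF };

    for (size_t i = 0; i < sizeof ks / sizeof ks[0]; i++)
        for (size_t j = 0; j < sizeof vs / sizeof vs[0]; j++)
            if (cmp_wide(ks[i], vs[j]) != cmp_narrow(ks[i], vs[j]))
                return 0;
    return 1;
}
```

The two forms agree because k >> 48 is always below 2^16, so truncating it to 32 bits loses nothing. The only difference left is the one-byte REX.W prefix.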
The attachment is the entirety of sample.c. I did not include other files
because the attachment appears to qualify for exemption (ii): the attached
test case is small and does not include any other file.
I originally found this behavior looking at the disassembly of gcc (Gentoo
9.2.0-r2 p3) 9.2.0. I verified the same behavior with gcc 10.1 and gcc trunk
at godbolt.
* [Bug target/95566] x86 instruction selection --- some REX prefixes unnecessary
From: pinskia at gcc dot gnu.org @ 2021-08-20 5:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95566
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-08-20
           Severity|normal                      |enhancement
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Reduced testcase:
int f(unsigned short *a, unsigned long long d)
{
  return *a == (d >> 48);
}
---- CUT ----
Of the compilers I have compared, only ICX can do this:
shrq $48, %rsi
xorl %eax, %eax
cmpw %si, (%rdi)
sete %al
retq
* [Bug target/95566] x86 instruction selection --- some REX prefixes unnecessary
From: crazylht at gmail dot com @ 2021-08-20 8:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95566
Hongtao.liu <crazylht at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com
--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Andrew Pinski from comment #1)
> Reduced testcase:
> int f(unsigned short *a, unsigned long long d)
> {
> return *a == (d>>48);
> }
>
> ---- CUT ----
> of the compilers I have compared, only ICX can do this:
> shrq $48, %rsi
> xorl %eax, %eax
> cmpw %si, (%rdi)
> sete %al
> retq
Failed to match this instruction:
(set (reg:QI 93)
    (eq:QI (lshiftrt:DI (reg:DI 95)
            (const_int 48 [0x30]))
        (zero_extend:DI (mem:HI (reg:DI 94) [1 *a_6(D)+0 S2 A16]))))
I guess we can drop the zero_extend here, but it is still 3 instructions vs. 3
instructions, so this is only a code-size optimization.
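As a rough illustration of why dropping the zero_extend is safe for this testcase, the C sketch below compares the widened form against a 16-bit form, roughly what the `cmpw` in comment #1 does. It is only a sanity check under stated assumptions, not a proposed patch: f_wide, f_narrow and count_mismatches are hypothetical names, and the check covers every 16-bit value of *a against a few chosen values of d rather than the whole input space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Original form: zero-extend *a to 64 bits, then compare in 64 bits. */
int f_wide(const uint16_t *a, uint64_t d)
{
    return (uint64_t)*a == (d >> 48);
}

/* Narrowed form: truncate the shift result and compare 16-bit values. */
int f_narrow(const uint16_t *a, uint64_t d)
{
    return *a == (uint16_t)(d >> 48);
}

/* Counts inputs on which the two forms disagree: every 16-bit *a against
   a few top-16-bit patterns of d, with arbitrary noise in the low 48 bits. */
int count_mismatches(void)
{
    static const uint64_t tops[] = { 0, 1, 0x7FFF, 0x8000, 0xBEEF, 0xFFFF };
    int mismatches = 0;

    for (size_t i = 0; i < sizeof tops / sizeof tops[0]; i++) {
        uint64_t d = (tops[i] << 48) | 0x0000DEADBEEFCAFEull;
        for (unsigned v = 0; v <= 0xFFFF; v++) {
            uint16_t a = (uint16_t)v;
            if (f_wide(&a, d) != f_narrow(&a, d))
                mismatches++;
        }
    }
    return mismatches;
}
```

Since d >> 48 never exceeds 0xFFFF, the truncation to 16 bits is lossless, which is why the comparison can be done at the narrower width.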