public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/53687] New: _mm_cmpistri generates redundant movslq %ecx,%rcx on x86-64
@ 2012-06-15 17:21 jbemmel at zonnet dot nl
2024-04-29 7:47 ` [Bug target/53687] " lh_mouse at 126 dot com
0 siblings, 1 reply; 2+ messages in thread
From: jbemmel at zonnet dot nl @ 2012-06-15 17:21 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53687
Bug #: 53687
Summary: _mm_cmpistri generates redundant movslq %ecx,%rcx on
x86-64
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: jbemmel@zonnet.nl
Compile the following strcmp() implementation with -O5 -march=corei7
#include <nmmintrin.h>
static inline int __strcmp(const char * cs, const char * ct)
{
// Works for both 32-bit and 64-bit code
// see http://www.strchr.com/strcmp_and_strlen_using_sse_4.2
long diff = cs-ct;
long nextbytes = 16;
ct -= 16;
loop:
__m128i ct16cs = _mm_loadu_si128( (const __m128i *) (ct += nextbytes) );
int offset = _mm_cmpistri( ct16cs, * (const __m128i *) (ct+diff),
_SIDD_CMP_EQUAL_EACH | _SIDD_NEGATIVE_POLARITY );
__asm__ __volatile__ goto( "ja %l[loop] \n jc %l[not_equal]" : : :
"memory" : loop, not_equal );
return 0;
not_equal:
return ct[diff+offset] - ct[offset];
}
GCC generates the following code:
00000000004007c0 <strcmp>:
4007c0: 48 29 f7 sub %rsi,%rdi
4007c3: 48 83 ee 10 sub $0x10,%rsi
4007c7: 48 83 c6 10 add $0x10,%rsi
4007cb: f3 0f 6f 06 movdqu (%rsi),%xmm0
4007cf: 66 0f 3a 63 04 3e 18 pcmpistri $0x18,(%rsi,%rdi,1),%xmm0
4007d6: 77 ef ja 4007c7 <strcmp+0x7>
4007d8: 72 06 jb 4007e0 <strcmp+0x20>
4007da: 31 c0 xor %eax,%eax
4007dc: c3 retq
4007dd: 0f 1f 00 nopl (%rax)
* 4007e0: 48 63 c9 movslq %ecx,%rcx
4007e3: 48 01 f7 add %rsi,%rdi
4007e6: 0f be 04 0f movsbl (%rdi,%rcx,1),%eax
4007ea: 0f be 14 0e movsbl (%rsi,%rcx,1),%edx
4007ee: 29 d0 sub %edx,%eax
4007f0: c3 retq
4007f1: 66 66 66 66 66 66 2e data32 data32 data32 data32 data32 nopw
4007f8: 0f 1f 84 00 00 00 00 %cs:0x0(%rax,%rax,1)
4007ff: 00
The "movslq" instruction is redundant, because pcmpistri clears the upper bits
of RCX when generating an index (verified using gdb)
^ permalink raw reply [flat|nested] 2+ messages in thread
* [Bug target/53687] _mm_cmpistri generates redundant movslq %ecx,%rcx on x86-64
2012-06-15 17:21 [Bug rtl-optimization/53687] New: _mm_cmpistri generates redundant movslq %ecx,%rcx on x86-64 jbemmel at zonnet dot nl
@ 2024-04-29 7:47 ` lh_mouse at 126 dot com
0 siblings, 0 replies; 2+ messages in thread
From: lh_mouse at 126 dot com @ 2024-04-29 7:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53687
LIU Hao <lh_mouse at 126 dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |lh_mouse at 126 dot com
--- Comment #2 from LIU Hao <lh_mouse at 126 dot com> ---
Intel's choice that `_mm_cmpistri()` should return `int` is just awkward. The
result has always been zero-extended to RCX; similarly for `PMOVMSK` (with AVX,
`VPMOVMSKB` produces a 32-bit result which is still zero-extended).
For Clang, casting the result to `uint32_t` is sufficient to eliminate the
zero-extension; for GCC it does not work.
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2024-04-29 7:47 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-15 17:21 [Bug rtl-optimization/53687] New: _mm_cmpistri generates redundant movslq %ecx,%rcx on x86-64 jbemmel at zonnet dot nl
2024-04-29 7:47 ` [Bug target/53687] " lh_mouse at 126 dot com
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).