public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH v1] x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S
@ 2021-10-23  5:26 Noah Goldstein
  2021-10-23 13:22 ` H.J. Lu
  0 siblings, 1 reply; 3+ messages in thread
From: Noah Goldstein @ 2021-10-23  5:26 UTC (permalink / raw)
  To: libc-alpha

This commit replaces two usages of SSE2 'movups' with AVX 'vmovdqu'.

it could potentially be dangerous to use SSE2 if this function is ever
called without using 'vzeroupper' beforehand. While compilers appear
to use 'vzeroupper' before function calls if AVX2 has been used, using
SSE2 here is more brittle. Since it is not absolutely necessary it
should be avoided.

It costs 2-extra bytes but the extra bytes should only eat into
alignment padding.
---
 sysdeps/x86_64/multiarch/memcmp-evex-movbe.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S b/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
index 2761b54f2e..640f6757fa 100644
--- a/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
+++ b/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
@@ -561,13 +561,13 @@ L(between_16_31):
 	/* From 16 to 31 bytes.  No branch when size == 16.  */
 
 	/* Use movups to save code size.  */
-	movups	(%rsi), %xmm2
+	vmovdqu	(%rsi), %xmm2
 	VPCMP	$4, (%rdi), %xmm2, %k1
 	kmovd	%k1, %eax
 	testl	%eax, %eax
 	jnz	L(return_vec_0_lv)
 	/* Use overlapping loads to avoid branches.  */
-	movups	-16(%rsi, %rdx, CHAR_SIZE), %xmm2
+	vmovdqu	-16(%rsi, %rdx, CHAR_SIZE), %xmm2
 	VPCMP	$4, -16(%rdi, %rdx, CHAR_SIZE), %xmm2, %k1
 	addl	$(CHAR_PER_VEC - (16 / CHAR_SIZE)), %edx
 	kmovd	%k1, %eax
-- 
2.29.2


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-04-23  1:22 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-23  5:26 [PATCH v1] x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S Noah Goldstein
2021-10-23 13:22 ` H.J. Lu
2022-04-23  1:22   ` Sunil Pandey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).