public inbox for libc-alpha@sourceware.org
* [PATCH v1] x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S
@ 2021-10-23  5:26 Noah Goldstein
  2021-10-23 13:22 ` H.J. Lu
  0 siblings, 1 reply; 3+ messages in thread
From: Noah Goldstein @ 2021-10-23  5:26 UTC (permalink / raw)
  To: libc-alpha

This commit replaces two uses of SSE2 'movups' with AVX 'vmovdqu'.

It could be dangerous to use SSE2 if this function is ever called
without 'vzeroupper' having been executed beforehand: running legacy
SSE instructions while the upper halves of the YMM registers are
dirty can incur an SSE-AVX transition penalty. While compilers appear
to emit 'vzeroupper' before function calls when AVX2 has been used,
relying on that makes the SSE2 here brittle. Since SSE2 is not
strictly necessary, it should be avoided.
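
For illustration, a minimal sketch of the hazard (hypothetical caller
code, not from this file; exact costs vary by microarchitecture):

	vmovdqu	(%rdi), %ymm0	/* AVX2 use dirties the upper halves
				   of the YMM registers.  */
	/* ... no vzeroupper before the call ... */
	call	memcmp		/* A legacy-SSE 'movups' executed here
				   with dirty upper state can pay an
				   SSE-AVX transition penalty; the
				   VEX-encoded 'vmovdqu' does not.  */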

It costs 2 extra bytes (the VEX-encoded 'vmovdqu' is one byte longer
than 'movups'), but the extra bytes should only eat into alignment
padding.
---
 sysdeps/x86_64/multiarch/memcmp-evex-movbe.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S b/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
index 2761b54f2e..640f6757fa 100644
--- a/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
+++ b/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
@@ -561,13 +561,13 @@ L(between_16_31):
 	/* From 16 to 31 bytes.  No branch when size == 16.  */
 
 	/* Use movups to save code size.  */
-	movups	(%rsi), %xmm2
+	vmovdqu	(%rsi), %xmm2
 	VPCMP	$4, (%rdi), %xmm2, %k1
 	kmovd	%k1, %eax
 	testl	%eax, %eax
 	jnz	L(return_vec_0_lv)
 	/* Use overlapping loads to avoid branches.  */
-	movups	-16(%rsi, %rdx, CHAR_SIZE), %xmm2
+	vmovdqu	-16(%rsi, %rdx, CHAR_SIZE), %xmm2
 	VPCMP	$4, -16(%rdi, %rdx, CHAR_SIZE), %xmm2, %k1
 	addl	$(CHAR_PER_VEC - (16 / CHAR_SIZE)), %edx
 	kmovd	%k1, %eax
-- 
2.29.2
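
For reference, the idiom in this hunk restated with fuller comments
(the same instructions as above; CHAR_SIZE is 1 for the byte version
of memcmp, so the second load starts at len - 16):

	/* Compare the first 16 bytes.  VPCMP with predicate 4
	   (not-equal) sets one bit in %k1 per differing element.  */
	vmovdqu	(%rsi), %xmm2
	VPCMP	$4, (%rdi), %xmm2, %k1
	kmovd	%k1, %eax		/* Mask register -> GPR.  */
	testl	%eax, %eax
	jnz	L(return_vec_0_lv)	/* Any mismatch: report it.  */
	/* Compare the last 16 bytes, [len - 16, len).  Because
	   16 <= len <= 31, the two windows overlap and together
	   cover every byte, so no branch on the exact length is
	   needed.  */
	vmovdqu	-16(%rsi, %rdx, CHAR_SIZE), %xmm2
	VPCMP	$4, -16(%rdi, %rdx, CHAR_SIZE), %xmm2, %k1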



* Re: [PATCH v1] x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S
  2021-10-23  5:26 [PATCH v1] x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S Noah Goldstein
@ 2021-10-23 13:22 ` H.J. Lu
  2022-04-23  1:22   ` Sunil Pandey
  0 siblings, 1 reply; 3+ messages in thread
From: H.J. Lu @ 2021-10-23 13:22 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: GNU C Library, Carlos O'Donell

On Fri, Oct 22, 2021 at 10:26 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> This commit replaces two uses of SSE2 'movups' with AVX 'vmovdqu'.
>
> It could be dangerous to use SSE2 if this function is ever called
> without 'vzeroupper' having been executed beforehand: running legacy
> SSE instructions while the upper halves of the YMM registers are
> dirty can incur an SSE-AVX transition penalty. While compilers appear
> to emit 'vzeroupper' before function calls when AVX2 has been used,
> relying on that makes the SSE2 here brittle. Since SSE2 is not
> strictly necessary, it should be avoided.
>
> It costs 2 extra bytes (the VEX-encoded 'vmovdqu' is one byte longer
> than 'movups'), but the extra bytes should only eat into alignment
> padding.
> ---
>  sysdeps/x86_64/multiarch/memcmp-evex-movbe.S | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S b/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
> index 2761b54f2e..640f6757fa 100644
> --- a/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
> +++ b/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
> @@ -561,13 +561,13 @@ L(between_16_31):
>         /* From 16 to 31 bytes.  No branch when size == 16.  */
>
>         /* Use movups to save code size.  */
> -       movups  (%rsi), %xmm2
> +       vmovdqu (%rsi), %xmm2
>         VPCMP   $4, (%rdi), %xmm2, %k1
>         kmovd   %k1, %eax
>         testl   %eax, %eax
>         jnz     L(return_vec_0_lv)
>         /* Use overlapping loads to avoid branches.  */
> -       movups  -16(%rsi, %rdx, CHAR_SIZE), %xmm2
> +       vmovdqu -16(%rsi, %rdx, CHAR_SIZE), %xmm2
>         VPCMP   $4, -16(%rdi, %rdx, CHAR_SIZE), %xmm2, %k1
>         addl    $(CHAR_PER_VEC - (16 / CHAR_SIZE)), %edx
>         kmovd   %k1, %eax
> --
> 2.29.2
>

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.

-- 
H.J.


* Re: [PATCH v1] x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S
  2021-10-23 13:22 ` H.J. Lu
@ 2022-04-23  1:22   ` Sunil Pandey
  0 siblings, 0 replies; 3+ messages in thread
From: Sunil Pandey @ 2022-04-23  1:22 UTC (permalink / raw)
  To: H.J. Lu, libc-stable; +Cc: Noah Goldstein, GNU C Library

On Sat, Oct 23, 2021 at 6:22 AM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On Fri, Oct 22, 2021 at 10:26 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > This commit replaces two uses of SSE2 'movups' with AVX 'vmovdqu'.
> >
> > It could be dangerous to use SSE2 if this function is ever called
> > without 'vzeroupper' having been executed beforehand: running legacy
> > SSE instructions while the upper halves of the YMM registers are
> > dirty can incur an SSE-AVX transition penalty. While compilers appear
> > to emit 'vzeroupper' before function calls when AVX2 has been used,
> > relying on that makes the SSE2 here brittle. Since SSE2 is not
> > strictly necessary, it should be avoided.
> >
> > It costs 2 extra bytes (the VEX-encoded 'vmovdqu' is one byte longer
> > than 'movups'), but the extra bytes should only eat into alignment
> > padding.
> > ---
> >  sysdeps/x86_64/multiarch/memcmp-evex-movbe.S | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S b/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
> > index 2761b54f2e..640f6757fa 100644
> > --- a/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
> > +++ b/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
> > @@ -561,13 +561,13 @@ L(between_16_31):
> >         /* From 16 to 31 bytes.  No branch when size == 16.  */
> >
> >         /* Use movups to save code size.  */
> > -       movups  (%rsi), %xmm2
> > +       vmovdqu (%rsi), %xmm2
> >         VPCMP   $4, (%rdi), %xmm2, %k1
> >         kmovd   %k1, %eax
> >         testl   %eax, %eax
> >         jnz     L(return_vec_0_lv)
> >         /* Use overlapping loads to avoid branches.  */
> > -       movups  -16(%rsi, %rdx, CHAR_SIZE), %xmm2
> > +       vmovdqu -16(%rsi, %rdx, CHAR_SIZE), %xmm2
> >         VPCMP   $4, -16(%rdi, %rdx, CHAR_SIZE), %xmm2, %k1
> >         addl    $(CHAR_PER_VEC - (16 / CHAR_SIZE)), %edx
> >         kmovd   %k1, %eax
> > --
> > 2.29.2
> >
>
> LGTM.
>
> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
>
> Thanks.
>
> --
> H.J.

I would like to backport this patch to release branches.
Any comments or objections?

--Sunil
