public inbox for libc-alpha@sourceware.org
* [PATCH] x86_64: Align memmove unaligned routines to 64 byte
@ 2024-05-29 16:36 Sunil K Pandey
  2024-05-29 18:05 ` Noah Goldstein
  0 siblings, 1 reply; 6+ messages in thread
From: Sunil K Pandey @ 2024-05-29 16:36 UTC (permalink / raw)
  To: libc-alpha

This patch aligns the memmove unaligned routines to a 64-byte boundary.
The default 16-byte alignment may cause up to a 15% random performance
regression for memmove calls smaller than the vector size.
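
For reference, ENTRY is just ENTRY_P2ALIGN with a 16-byte (2^4) alignment,
so this change only raises the entry alignment of the symbol to 64 bytes
(2^6).  A simplified sketch of the macros involved (the real definitions
in sysdeps/x86/sysdep.h also emit CFI, CET and profiling directives):

#define ALIGNARG(log2)	1 << log2

/* Simplified sketch of the real macro in sysdeps/x86/sysdep.h.  */
#define ENTRY_P2ALIGN(name, alignment)		\
  .globl C_SYMBOL_NAME(name);			\
  .type C_SYMBOL_NAME(name), @function;		\
  .align ALIGNARG(alignment);			\
  C_LABEL(name)					\
  cfi_startproc

/* Default entry points are 16-byte (2^4) aligned.  */
#define ENTRY(name)	ENTRY_P2ALIGN (name, 4)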
---
 sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
index 838f8f8bff..85c0efd9e3 100644
--- a/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
@@ -207,7 +207,7 @@ ENTRY (MEMMOVE_CHK_SYMBOL (__memmove_chk, unaligned))
 END (MEMMOVE_CHK_SYMBOL (__memmove_chk, unaligned))
 #endif
 
-ENTRY (MEMMOVE_SYMBOL (__memmove, unaligned))
+ENTRY_P2ALIGN (MEMMOVE_SYMBOL (__memmove, unaligned), 6)
 	movq	%rdi, %rax
 L(start):
 # ifdef __ILP32__
-- 
2.44.0



* Re: [PATCH] x86_64: Align memmove unaligned routines to 64 byte
  2024-05-29 16:36 [PATCH] x86_64: Align memmove unaligned routines to 64 byte Sunil K Pandey
@ 2024-05-29 18:05 ` Noah Goldstein
  2024-05-29 18:26   ` Sunil Pandey
  0 siblings, 1 reply; 6+ messages in thread
From: Noah Goldstein @ 2024-05-29 18:05 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha

On Wed, May 29, 2024 at 11:37 AM Sunil K Pandey <skpgkp2@gmail.com> wrote:
>
> This patch aligns the memmove unaligned routines to a 64-byte boundary.
> The default 16-byte alignment may cause up to a 15% random performance
> regression for memmove calls smaller than the vector size.
> ---
>  sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
> index 838f8f8bff..85c0efd9e3 100644
> --- a/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
> +++ b/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
> @@ -207,7 +207,7 @@ ENTRY (MEMMOVE_CHK_SYMBOL (__memmove_chk, unaligned))
>  END (MEMMOVE_CHK_SYMBOL (__memmove_chk, unaligned))
>  #endif
>
> -ENTRY (MEMMOVE_SYMBOL (__memmove, unaligned))
> +ENTRY_P2ALIGN (MEMMOVE_SYMBOL (__memmove, unaligned), 6)
>         movq    %rdi, %rax
>  L(start):
>  # ifdef __ILP32__
> --
> 2.44.0
>

Isn't the erms variant nearly always the one used?


* Re: [PATCH] x86_64: Align memmove unaligned routines to 64 byte
  2024-05-29 18:05 ` Noah Goldstein
@ 2024-05-29 18:26   ` Sunil Pandey
  2024-05-29 18:33     ` Noah Goldstein
  0 siblings, 1 reply; 6+ messages in thread
From: Sunil Pandey @ 2024-05-29 18:26 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: libc-alpha

On Wed, May 29, 2024 at 11:06 AM Noah Goldstein <goldstein.w.n@gmail.com>
wrote:

> On Wed, May 29, 2024 at 11:37 AM Sunil K Pandey <skpgkp2@gmail.com> wrote:
> >
> > This patch aligns the memmove unaligned routines to a 64-byte boundary.
> > The default 16-byte alignment may cause up to a 15% random performance
> > regression for memmove calls smaller than the vector size.
> > ---
> >  sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
> b/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
> > index 838f8f8bff..85c0efd9e3 100644
> > --- a/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
> > +++ b/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
> > @@ -207,7 +207,7 @@ ENTRY (MEMMOVE_CHK_SYMBOL (__memmove_chk, unaligned))
> >  END (MEMMOVE_CHK_SYMBOL (__memmove_chk, unaligned))
> >  #endif
> >
> > -ENTRY (MEMMOVE_SYMBOL (__memmove, unaligned))
> > +ENTRY_P2ALIGN (MEMMOVE_SYMBOL (__memmove, unaligned), 6)
> >         movq    %rdi, %rax
> >  L(start):
> >  # ifdef __ILP32__
> > --
> > 2.44.0
> >
>
> Isn't the erms variant nearly always the one used?
>

It's used in the libMicro benchmark:
https://github.com/redhat-performance/libMicro
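
For example, a loop along these lines (a minimal sketch, not the libMicro
harness itself; the size and iteration count are only illustrative) is
enough to exercise the sub-vector-size path this patch targets:

/* Minimal timing sketch for small memmove calls; not the libMicro
   harness itself.  Build with -fno-builtin-memmove so the calls go
   through glibc's memmove rather than a compiler-expanded copy.  */
#include <stdio.h>
#include <string.h>
#include <time.h>

int
main (void)
{
  static char src[64], dst[64];
  const size_t size = 15;		/* smaller than one vector register */
  const long iters = 100000000;
  struct timespec t0, t1;

  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (long i = 0; i < iters; i++)
    memmove (dst, src, size);
  clock_gettime (CLOCK_MONOTONIC, &t1);

  double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
  printf ("%.2f ns per memmove call\n", ns / iters);
  return 0;
}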


* Re: [PATCH] x86_64: Align memmove unaligned routines to 64 byte
  2024-05-29 18:26   ` Sunil Pandey
@ 2024-05-29 18:33     ` Noah Goldstein
  2024-05-29 18:53       ` Sunil Pandey
  0 siblings, 1 reply; 6+ messages in thread
From: Noah Goldstein @ 2024-05-29 18:33 UTC (permalink / raw)
  To: Sunil Pandey; +Cc: libc-alpha

On Wed, May 29, 2024 at 1:26 PM Sunil Pandey <skpgkp2@gmail.com> wrote:
>
>
>
> On Wed, May 29, 2024 at 11:06 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>>
>> On Wed, May 29, 2024 at 11:37 AM Sunil K Pandey <skpgkp2@gmail.com> wrote:
>> >
>> > This patch aligns the memmove unaligned routines to a 64-byte boundary.
>> > The default 16-byte alignment may cause up to a 15% random performance
>> > regression for memmove calls smaller than the vector size.
>> > ---
>> >  sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
>> > index 838f8f8bff..85c0efd9e3 100644
>> > --- a/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
>> > +++ b/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
>> > @@ -207,7 +207,7 @@ ENTRY (MEMMOVE_CHK_SYMBOL (__memmove_chk, unaligned))
>> >  END (MEMMOVE_CHK_SYMBOL (__memmove_chk, unaligned))
>> >  #endif
>> >
>> > -ENTRY (MEMMOVE_SYMBOL (__memmove, unaligned))
>> > +ENTRY_P2ALIGN (MEMMOVE_SYMBOL (__memmove, unaligned), 6)
>> >         movq    %rdi, %rax
>> >  L(start):
>> >  # ifdef __ILP32__
>> > --
>> > 2.44.0
>> >
>>
>> Isn't the erms variant nearly always the one used?
>
>
> It's used in the libMicro benchmark:
> https://github.com/redhat-performance/libMicro

I'm not disputing that if you microbenchmark this it will be faster
when it's 64-byte aligned.

The question is: outside of microbenchmarks, is this implementation
important enough to optimize?


* Re: [PATCH] x86_64: Align memmove unaligned routines to 64 byte
  2024-05-29 18:33     ` Noah Goldstein
@ 2024-05-29 18:53       ` Sunil Pandey
  2024-06-03 13:29         ` Carlos O'Donell
  0 siblings, 1 reply; 6+ messages in thread
From: Sunil Pandey @ 2024-05-29 18:53 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: libc-alpha

On Wed, May 29, 2024 at 11:34 AM Noah Goldstein <goldstein.w.n@gmail.com>
wrote:

> On Wed, May 29, 2024 at 1:26 PM Sunil Pandey <skpgkp2@gmail.com> wrote:
> >
> >
> >
> > On Wed, May 29, 2024 at 11:06 AM Noah Goldstein <goldstein.w.n@gmail.com>
> wrote:
> >>
> >> On Wed, May 29, 2024 at 11:37 AM Sunil K Pandey <skpgkp2@gmail.com>
> wrote:
> >> >
> >> > This patch aligns the memmove unaligned routines to a 64-byte boundary.
> >> > The default 16-byte alignment may cause up to a 15% random performance
> >> > regression for memmove calls smaller than the vector size.
> >> > ---
> >> >  sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S | 2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> > diff --git a/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
> b/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
> >> > index 838f8f8bff..85c0efd9e3 100644
> >> > --- a/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
> >> > +++ b/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
> >> > @@ -207,7 +207,7 @@ ENTRY (MEMMOVE_CHK_SYMBOL (__memmove_chk,
> unaligned))
> >> >  END (MEMMOVE_CHK_SYMBOL (__memmove_chk, unaligned))
> >> >  #endif
> >> >
> >> > -ENTRY (MEMMOVE_SYMBOL (__memmove, unaligned))
> >> > +ENTRY_P2ALIGN (MEMMOVE_SYMBOL (__memmove, unaligned), 6)
> >> >         movq    %rdi, %rax
> >> >  L(start):
> >> >  # ifdef __ILP32__
> >> > --
> >> > 2.44.0
> >> >
> >>
> >> Isn't the erms variant nearly always the one used?
> >
> >
> > It's used in the libMicro benchmark:
> > https://github.com/redhat-performance/libMicro
>
> I'm not disputing that if you microbenchmark this it will be faster
> when it's 64-byte aligned.
>
> The question is: outside of microbenchmarks, is this implementation
> important enough to optimize?
>

There are people who rely on this benchmark data for their hardware/software
decisions.

So, if this implementation is there and we can easily optimize it, we should.
If nothing else, it can reduce the time they spend troubleshooting random
regressions.


* Re: [PATCH] x86_64: Align memmove unaligned routines to 64 byte
  2024-05-29 18:53       ` Sunil Pandey
@ 2024-06-03 13:29         ` Carlos O'Donell
  0 siblings, 0 replies; 6+ messages in thread
From: Carlos O'Donell @ 2024-06-03 13:29 UTC (permalink / raw)
  To: Sunil Pandey, Noah Goldstein; +Cc: libc-alpha

On 5/29/24 2:53 PM, Sunil Pandey wrote:
> There are people who rely on this benchmark data for their hardware/software
> decisions.

They should not, and we should not base our implementation decisions
*solely* on this criterion.

We should take a nuanced approach and think clearly and deeply about
what we want to achieve regarding function performance and any
true alignment requirements.
 
> So, if this implementation is there and we can easily optimize it, we should.
> If nothing else, it can reduce the time they spend troubleshooting random
> regressions.

Yes, if it's easy, we can do it... but we should do it only if it really
makes sense for applications.

-- 
Cheers,
Carlos.


