public inbox for libc-alpha@sourceware.org
* [libc/string] State of PAGE_COPY_FWD / PAGE_COPY_THRESHOLD
@ 2016-11-01  9:28 Maxim Kuvyrkov
  2016-11-01 13:59 ` Adhemerval Zanella
  2016-11-10  8:27 ` Florian Weimer
  0 siblings, 2 replies; 16+ messages in thread
From: Maxim Kuvyrkov @ 2016-11-01  9:28 UTC (permalink / raw)
  To: GNU C Library

I wanted to check the performance impact of using Linux zero page sharing in calls to memset (PTR, 0, SIZE).  I remembered seeing PAGE_COPY_FWD_MAYBE and PAGE_COPY_THRESHOLD in string/memcpy.c, and my plan was to copy this logic to an experimental memset() implementation.

Closer inspection of the current code showed that only the Mach port attempted to use full-page copying in memcpy.c, but now even the Mach port disables it.  The net result is that the page-copy code in string/memcpy.c, as well as parts of the headers sysdeps/generic/pagecopy.h and sysdeps/generic/memcopy.h, is dead code.

From the above we have 2 questions:
1. Is it possible to use full-page copy (with COW) in the Linux glibc port for memcpy() and/or memset(0)?

2. If not, then is there any reason to keep the dead code around or should we clean it up?

--
Maxim Kuvyrkov
www.linaro.org



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libc/string] State of PAGE_COPY_FWD / PAGE_COPY_THRESHOLD
  2016-11-01  9:28 [libc/string] State of PAGE_COPY_FWD / PAGE_COPY_THRESHOLD Maxim Kuvyrkov
@ 2016-11-01 13:59 ` Adhemerval Zanella
  2016-11-10  7:39   ` Maxim Kuvyrkov
  2016-11-10  8:27 ` Florian Weimer
  1 sibling, 1 reply; 16+ messages in thread
From: Adhemerval Zanella @ 2016-11-01 13:59 UTC (permalink / raw)
  To: libc-alpha



On 01/11/2016 07:28, Maxim Kuvyrkov wrote:
> I wanted to check performance impact of using linux zero page sharing in calls to memset (PTR, 0, SIZE).  I remembered seeing PAGE_COPY_FWD_MAYBE and PAGE_COPY_THRESHOLD in string/memcpy.c, and my plan was to copy this logic to an experimental memset() implementation.
> 
> Closer inspection of the current code showed that only Mach port attempted to use full-page copying in memcpy.c, but now even the Mach port disables it.  The net result is that code in string/memcpy.c, as well as parts of headers sysdeps/generic/pagecopy.h and sysdeps/generic/memcopy.h are dead code.
> 
> From the above we have 2 questions:
> 1. Is it possible to use full-page copy (with COW) in the Linux glibc port for memcpy() and/or memset(0)?

It is still possible to use the generic algorithms in string/mem{cpy,set}.c;
you just need to make some changes for the architecture you are targeting.

On x86_64, for instance, you will need to remove any possible assembly
implementation so sysdeps won't use it instead.  While configuring
with --disable-multi-arch (to remove the ifunc usage and keep it
simpler), I removed:

        deleted:    sysdeps/x86_64/memcpy.S
        deleted:    sysdeps/x86_64/memcpy_chk.S
        deleted:    sysdeps/x86_64/memmove.S
        deleted:    sysdeps/x86_64/mempcpy.S
        deleted:    sysdeps/x86_64/wordcopy.c

'memcpy.S' is the default optimized implementation and 'memcpy_chk.S'
is an empty one (since it is implemented in memcpy.S for x86_64, and we
will need the symbols provided).  The same logic applies to the other
removed files (memmove.S and mempcpy.S).

I also removed sysdeps/x86_64/wordcopy.c because the idea is to use the
default one in string/wordcopy.c.

Next, OP_T_THRES needs to be defined, so I created the file
'sysdeps/x86_64/memcopy.h' with the contents:

$ cat sysdeps/x86_64/memcopy.h
#include <sysdeps/generic/memcopy.h>

#undef OP_T_THRES
#define        OP_T_THRES      8

(I think we should just define it to WORDSIZE/8 somewhere).

This should enable building and using the generic memcpy implementation.
To actually use the PAGE_COPY_* macros you will need to add an
arch-specific pagecopy.h header.  Using the x86_64 example:

$ cat sysdeps/x86/pagecopy.h

#define PAGE_SIZE           4096
#define PAGE_COPY_THRESHOLD PAGE_SIZE

#define PAGE_COPY_FWD(dstp, srcp, nbytes_left, nbytes)  /* Implement it */
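
For illustration, here is a minimal, self-contained sketch of the kind of dispatch a page-copy hook implies.  It paraphrases the idea behind PAGE_COPY_FWD_MAYBE and is not the glibc code: every `sketch_*` name is hypothetical, the 4096-byte page size is the assumption from the header above, and plain memcpy stands in for a kernel page-copy primitive such as Mach's vm_copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u                  /* assumed page size */
#define SKETCH_PAGE_COPY_THRESHOLD SKETCH_PAGE_SIZE

/* Counts pages handed to the stand-in page-copy primitive.  */
static size_t sketch_pages_copied;

/* Hypothetical stand-in for a kernel page-copy operation: copies as many
   whole pages as fit in NBYTES and returns the number of bytes copied.  */
static size_t
sketch_page_copy (unsigned char *dst, const unsigned char *src, size_t nbytes)
{
  size_t whole = nbytes & ~(size_t) (SKETCH_PAGE_SIZE - 1);
  memcpy (dst, src, whole);
  sketch_pages_copied += whole / SKETCH_PAGE_SIZE;
  return whole;
}

void *
sketch_memcpy (void *dstpp, const void *srcpp, size_t nbytes)
{
  unsigned char *dst = dstpp;
  const unsigned char *src = srcpp;
  uintptr_t delta = (uintptr_t) dst - (uintptr_t) src;

  /* Page copying only applies when the request is large enough and source
     and destination share the same offset within a page.  */
  if (nbytes >= SKETCH_PAGE_COPY_THRESHOLD
      && (delta & (SKETCH_PAGE_SIZE - 1)) == 0)
    {
      /* Bytes before the first page boundary are copied normally.  */
      size_t before = (size_t) (-(uintptr_t) dst & (SKETCH_PAGE_SIZE - 1));
      assert (before < nbytes);
      memcpy (dst, src, before);
      dst += before; src += before; nbytes -= before;

      /* Whole pages go through the page-copy primitive.  */
      size_t done = sketch_page_copy (dst, src, nbytes);
      dst += done; src += done; nbytes -= done;
    }

  /* Whatever is left (or everything, if page copying did not apply).  */
  memcpy (dst, src, nbytes);
  return dstpp;
}
```

With page-aligned buffers, copying 3 * 4096 bytes from offset 100 to offset 100 sends two whole pages through the primitive and copies the head and tail bytewise.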

It should work on any other architecture as well.  Now the question
is whether this actually makes sense for Linux.  Hurd/Mach provides
a syscall (?) to actually copy the pages (vm_copy), which seems to apply
some tricks to avoid copying full pages.  By 'linux zero page sharing' are
you referring to KSM (Kernel Samepage Merging)?

If so, on a system without a kernel interface to work directly
with the underlying memory mapping (such as the one Mach has), mem{cpy,set}
will actually need to touch the pages, and it will be up to the kernel page
fault mechanism to handle it (by identifying common pages and adjusting
the vma mapping accordingly).  And AFAIK this is only enabled with KSM if you
explicitly madvise the pages.  So I am not grasping the need to
actually implement page copying on Linux.
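
To make the opt-in nature of KSM concrete, here is a minimal sketch, assuming Linux and a kernel that may or may not be built with KSM support.  The function name `sketch_ksm_opt_in` is hypothetical; madvise with MADV_MERGEABLE is the documented way to mark a range as a merge candidate, and the call fails on kernels without KSM.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps NPAGES anonymous pages and asks the kernel to consider them for
   same-page merging.  Returns 1 if the kernel accepted the advice, 0 if it
   refused (for example, KSM is not configured), and -1 if the mapping
   itself could not be created.  */
int
sketch_ksm_opt_in (size_t npages)
{
  assert (npages > 0);

  long page = sysconf (_SC_PAGESIZE);
  if (page <= 0)
    return -1;

  size_t len = (size_t) page * npages;
  void *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;

  int accepted = 0;
#ifdef MADV_MERGEABLE
  /* Without this explicit call the range is never scanned by KSM.  */
  accepted = madvise (p, len, MADV_MERGEABLE) == 0;
#endif

  munmap (p, len);
  return accepted;
}
```

Even when the advice is accepted, merging happens later and asynchronously, so a memcpy or memset still has to write every page first.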

> 
> 2. If not, then is there any reason to keep the dead code around or should we clean it up?

In fact I think the hurd/mach intent is indeed to use it, and
it is not doing so only because of a missing adjustment in commit
99f8dc922033821edcc13f9f8360e9fda40dfcff (Fix -Wundef warning on
PAGE_COPY_THRESHOLD).  That commit should have changed the PAGE_THRESHOLD
definition in 'sysdeps/mach/pagecopy.h' to PAGE_COPY_THRESHOLD.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libc/string] State of PAGE_COPY_FWD / PAGE_COPY_THRESHOLD
  2016-11-01 13:59 ` Adhemerval Zanella
@ 2016-11-10  7:39   ` Maxim Kuvyrkov
  2016-11-10  7:48     ` Andrew Pinski
  2016-11-10  8:26     ` Florian Weimer
  0 siblings, 2 replies; 16+ messages in thread
From: Maxim Kuvyrkov @ 2016-11-10  7:39 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

> On Nov 1, 2016, at 5:59 PM, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> 
...
> $ cat sysdeps/x86/pagecopy.h
> 
> #define PAGE_SIZE           4096
> #define PAGE_COPY_THRESHOLD PAGE_SIZE
> 
> #define PAGE_COPY_FWD(dstp, srcp, nbytes_left, nbytes)  /* Implement it */
> 
> It should work on any other architecture as well.  Now the question
> is whether this actually does make sense for Linux.  Hurd/mach provided
> a syscall (?) to actually copy the pages (vm_copy) which seems to apply
> some tricks to avoid full copy pages. By 'linux zero page sharing' are 
> you referring to KSM (Kernel Samepage Merging)? 
> 
> If so, on a system without a provided kernel interface to work directed 
> with underlying memory mapping (such as for mach), mem{cpy,set} will 
> actually need to touch the pages and it will be up to kernel page fault 
> mechanism to actually handle it (by identifying common pages and adjusting
> vma mapping accordingly). And AFAIK this are only enabled on KSM if you 
> actually madavise the page explicit. So I am not grasping the need to
> actually implement page copying on Linux.

The Linux kernel has a reserved page filled with zeroes, so if there /were/ a syscall to tell the kernel to map N consecutive pages starting at address PTR to that zero page, we could use that in glibc for really big memset(0).

A quick investigation shows that there is no such syscall provided by the Linux kernel.  That doesn't mean we can't ask for / implement one.
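
This is not the syscall described above, but a related interface that already exists can be used to experiment: for page-aligned ranges inside a private anonymous mapping, madvise(MADV_DONTNEED) on Linux drops the pages so that the next access sees zero-filled memory.  The sketch below assumes Linux; the `sketch_*` names are hypothetical, and the technique does not apply to shared, locked, or arbitrary malloc'ed memory.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Drops the pages in [PTR, PTR + LEN).  PTR must be page-aligned and the
   range must belong to a private anonymous mapping.  Returns 0 on success.  */
int
sketch_zero_pages (void *ptr, size_t len)
{
  return madvise (ptr, len, MADV_DONTNEED);
}

/* Fills four pages with a pattern, drops them, and checks that they read
   back as zero.  Returns 1 if they do, 0 if not, -1 on setup failure.  */
int
sketch_demo_zero_pages (void)
{
  long page = sysconf (_SC_PAGESIZE);
  if (page <= 0)
    return -1;

  size_t len = (size_t) page * 4;
  unsigned char *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;

  memset (p, 0xAA, len);
  assert (p[0] == 0xAA);

  if (sketch_zero_pages (p, len) != 0)
    {
      munmap (p, len);
      return -1;
    }

  int all_zero = 1;
  for (size_t i = 0; i < len; i++)
    if (p[i] != 0)
      {
        all_zero = 0;
        break;
      }

  munmap (p, len);
  return all_zero;
}
```

Each page touched afterwards takes a page fault, which is exactly the cost that has to be measured against a plain memset.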

--
Maxim Kuvyrkov
www.linaro.org


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libc/string] State of PAGE_COPY_FWD / PAGE_COPY_THRESHOLD
  2016-11-10  7:39   ` Maxim Kuvyrkov
@ 2016-11-10  7:48     ` Andrew Pinski
  2016-11-10  7:52       ` Maxim Kuvyrkov
  2016-11-10  8:26     ` Florian Weimer
  1 sibling, 1 reply; 16+ messages in thread
From: Andrew Pinski @ 2016-11-10  7:48 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Adhemerval Zanella, GNU C Library

On Wed, Nov 9, 2016 at 11:39 PM, Maxim Kuvyrkov
<maxim.kuvyrkov@linaro.org> wrote:
>> On Nov 1, 2016, at 5:59 PM, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>>
> ...
>> $ cat sysdeps/x86/pagecopy.h
>>
>> #define PAGE_SIZE           4096
>> #define PAGE_COPY_THRESHOLD PAGE_SIZE
>>
>> #define PAGE_COPY_FWD(dstp, srcp, nbytes_left, nbytes)  /* Implement it */
>>
>> It should work on any other architecture as well.  Now the question
>> is whether this actually does make sense for Linux.  Hurd/mach provided
>> a syscall (?) to actually copy the pages (vm_copy) which seems to apply
>> some tricks to avoid full copy pages. By 'linux zero page sharing' are
>> you referring to KSM (Kernel Samepage Merging)?
>>
>> If so, on a system without a provided kernel interface to work directed
>> with underlying memory mapping (such as for mach), mem{cpy,set} will
>> actually need to touch the pages and it will be up to kernel page fault
>> mechanism to actually handle it (by identifying common pages and adjusting
>> vma mapping accordingly). And AFAIK this are only enabled on KSM if you
>> actually madavise the page explicit. So I am not grasping the need to
>> actually implement page copying on Linux.
>
> Linux kernel has a reserved page filled with zeroes, so it there /were/ a syscall to tell kernel to map N consecutive pages starting at address PTR to that zero page, we could use that in GLIBC for really big memset(0).
>
> A quick investigation shows that there is no such syscall provided by the Linux kernel.  Doesn't mean we can't ask for / implement one.

And then there would be a COW interrupt on the first write.  Not a
good idea, since most likely you are writing zeros to a big page for
security reasons before filling it again with other data.  That means
each page would need to be copied, which is normally slower than
zeroing in the first place.

COW is only useful when most of the pages will not be written to; it
is not useful when doing memcpy or memset, mainly because there is no
need to pay the overhead of taking an interrupt twice (a system call
is still an interrupt).

Thanks,
Andrew



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libc/string] State of PAGE_COPY_FWD / PAGE_COPY_THRESHOLD
  2016-11-10  7:48     ` Andrew Pinski
@ 2016-11-10  7:52       ` Maxim Kuvyrkov
  2016-11-10  8:01         ` Andrew Pinski
  0 siblings, 1 reply; 16+ messages in thread
From: Maxim Kuvyrkov @ 2016-11-10  7:52 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Adhemerval Zanella, GNU C Library

> On Nov 10, 2016, at 11:48 AM, Andrew Pinski <pinskia@gmail.com> wrote:
> 
> On Wed, Nov 9, 2016 at 11:39 PM, Maxim Kuvyrkov
> <maxim.kuvyrkov@linaro.org> wrote:
>>> On Nov 1, 2016, at 5:59 PM, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>>> 
>> ...
>>> $ cat sysdeps/x86/pagecopy.h
>>> 
>>> #define PAGE_SIZE           4096
>>> #define PAGE_COPY_THRESHOLD PAGE_SIZE
>>> 
>>> #define PAGE_COPY_FWD(dstp, srcp, nbytes_left, nbytes)  /* Implement it */
>>> 
>>> It should work on any other architecture as well.  Now the question
>>> is whether this actually does make sense for Linux.  Hurd/mach provided
>>> a syscall (?) to actually copy the pages (vm_copy) which seems to apply
>>> some tricks to avoid full copy pages. By 'linux zero page sharing' are
>>> you referring to KSM (Kernel Samepage Merging)?
>>> 
>>> If so, on a system without a provided kernel interface to work directed
>>> with underlying memory mapping (such as for mach), mem{cpy,set} will
>>> actually need to touch the pages and it will be up to kernel page fault
>>> mechanism to actually handle it (by identifying common pages and adjusting
>>> vma mapping accordingly). And AFAIK this are only enabled on KSM if you
>>> actually madavise the page explicit. So I am not grasping the need to
>>> actually implement page copying on Linux.
>> 
>> Linux kernel has a reserved page filled with zeroes, so it there /were/ a syscall to tell kernel to map N consecutive pages starting at address PTR to that zero page, we could use that in GLIBC for really big memset(0).
>> 
>> A quick investigation shows that there is no such syscall provided by the Linux kernel.  Doesn't mean we can't ask for / implement one.
> 
> And then there would be a COW interrupt on the first write.  Not a
> good idea.  Since most likely you are writing zeros to a big page for
> security reasons before filling it again with other data.

I'm looking at this as a possible performance optimization for a well-known benchmark.  

>   That mean
> each page would need to be copied which is normally slower than
> zeroing in the first place.

It may be like you say, or it may be a significant performance improvement.  I want to see numbers before deciding on how useful this may be.

> 
> COW is only useful when most of the pages will not be written to; it
> is not useful when doing memcpy or memset.  Mainly because you don't
> need to take the overhead of taking an interrupt twice (a system call
> is still an interrupt).


--
Maxim Kuvyrkov
www.linaro.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libc/string] State of PAGE_COPY_FWD / PAGE_COPY_THRESHOLD
  2016-11-10  7:52       ` Maxim Kuvyrkov
@ 2016-11-10  8:01         ` Andrew Pinski
  2016-11-10  8:05           ` Andrew Pinski
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Pinski @ 2016-11-10  8:01 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Adhemerval Zanella, GNU C Library

On Wed, Nov 9, 2016 at 11:52 PM, Maxim Kuvyrkov
<maxim.kuvyrkov@linaro.org> wrote:
>> On Nov 10, 2016, at 11:48 AM, Andrew Pinski <pinskia@gmail.com> wrote:
>>
>> On Wed, Nov 9, 2016 at 11:39 PM, Maxim Kuvyrkov
>> <maxim.kuvyrkov@linaro.org> wrote:
>>>> On Nov 1, 2016, at 5:59 PM, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>>>>
>>> ...
>>>> $ cat sysdeps/x86/pagecopy.h
>>>>
>>>> #define PAGE_SIZE           4096
>>>> #define PAGE_COPY_THRESHOLD PAGE_SIZE
>>>>
>>>> #define PAGE_COPY_FWD(dstp, srcp, nbytes_left, nbytes)  /* Implement it */
>>>>
>>>> It should work on any other architecture as well.  Now the question
>>>> is whether this actually does make sense for Linux.  Hurd/mach provided
>>>> a syscall (?) to actually copy the pages (vm_copy) which seems to apply
>>>> some tricks to avoid full copy pages. By 'linux zero page sharing' are
>>>> you referring to KSM (Kernel Samepage Merging)?
>>>>
>>>> If so, on a system without a provided kernel interface to work directed
>>>> with underlying memory mapping (such as for mach), mem{cpy,set} will
>>>> actually need to touch the pages and it will be up to kernel page fault
>>>> mechanism to actually handle it (by identifying common pages and adjusting
>>>> vma mapping accordingly). And AFAIK this are only enabled on KSM if you
>>>> actually madavise the page explicit. So I am not grasping the need to
>>>> actually implement page copying on Linux.
>>>
>>> Linux kernel has a reserved page filled with zeroes, so it there /were/ a syscall to tell kernel to map N consecutive pages starting at address PTR to that zero page, we could use that in GLIBC for really big memset(0).
>>>
>>> A quick investigation shows that there is no such syscall provided by the Linux kernel.  Doesn't mean we can't ask for / implement one.
>>
>> And then there would be a COW interrupt on the first write.  Not a
>> good idea.  Since most likely you are writing zeros to a big page for
>> security reasons before filling it again with other data.
>
> I'm looking at this as a possible performance optimization for a well-known benchmark.

Please don't do it unless you benchmark real workloads.  Doing this
for a benchmark is not a good idea.  Please use something like WRF, mysql,
hadoop, spark or any other real workload that does lots of
memset/memcpy.  Please don't do this just for a well-known broken
benchmark.  Seriously, this is just a broken benchmark anyway.

>
>>   That mean
>> each page would need to be copied which is normally slower than
>> zeroing in the first place.
>
> It may be like you say, or it may be a significant performance improvement.  I want to see numbers before deciding on how useful this may be.

Copying is always slower than setting to zero.  There are
instructions on most major architectures (including AArch64) for zeroing a
cache line.  Copying means loading one cache line into L1 and then doing
stores.  Yes, you can mark the cache line as not going to be used later,
but that still means going to the cache.
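
As a sketch of what such an instruction looks like in practice, the following assumes AArch64's DC ZVA, with the block size read from DCZID_EL0, and falls back to plain memset on other architectures or when the instruction is prohibited.  The function name `sketch_zero_region` is hypothetical and this is not glibc's memset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zeroes N bytes at P, using whole-block zeroing where available.  */
void
sketch_zero_region (void *p, size_t n)
{
  unsigned char *cur = p;
  assert (p != NULL || n == 0);
  if (n == 0)
    return;

#if defined (__aarch64__)
  uint64_t dczid;
  __asm__ volatile ("mrs %0, dczid_el0" : "=r" (dczid));

  /* Bit 4 (DZP) set means DC ZVA is prohibited; bits 3:0 give log2 of the
     block size in 4-byte words.  */
  if ((dczid & 0x10) == 0)
    {
      size_t block = (size_t) 4 << (dczid & 0xf);
      size_t head = (size_t) (-(uintptr_t) cur & (block - 1));

      if (n >= head + block)
        {
          /* Unaligned head goes through memset.  */
          memset (cur, 0, head);
          cur += head;
          n -= head;

          /* Each DC ZVA zeroes one naturally aligned block without first
             reading it.  */
          while (n >= block)
            {
              __asm__ volatile ("dc zva, %0" : : "r" (cur) : "memory");
              cur += block;
              n -= block;
            }
        }
    }
#endif

  /* Tail, or the whole region on the fallback path.  */
  memset (cur, 0, n);
}
```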

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libc/string] State of PAGE_COPY_FWD / PAGE_COPY_THRESHOLD
  2016-11-10  8:01         ` Andrew Pinski
@ 2016-11-10  8:05           ` Andrew Pinski
  2016-11-10  8:25             ` Florian Weimer
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Pinski @ 2016-11-10  8:05 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Adhemerval Zanella, GNU C Library

On Thu, Nov 10, 2016 at 12:01 AM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Wed, Nov 9, 2016 at 11:52 PM, Maxim Kuvyrkov
> <maxim.kuvyrkov@linaro.org> wrote:
>>> On Nov 10, 2016, at 11:48 AM, Andrew Pinski <pinskia@gmail.com> wrote:
>>>
>>> On Wed, Nov 9, 2016 at 11:39 PM, Maxim Kuvyrkov
>>> <maxim.kuvyrkov@linaro.org> wrote:
>>>>> On Nov 1, 2016, at 5:59 PM, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>>>>>
>>>> ...
>>>>> $ cat sysdeps/x86/pagecopy.h
>>>>>
>>>>> #define PAGE_SIZE           4096
>>>>> #define PAGE_COPY_THRESHOLD PAGE_SIZE
>>>>>
>>>>> #define PAGE_COPY_FWD(dstp, srcp, nbytes_left, nbytes)  /* Implement it */
>>>>>
>>>>> It should work on any other architecture as well.  Now the question
>>>>> is whether this actually does make sense for Linux.  Hurd/mach provided
>>>>> a syscall (?) to actually copy the pages (vm_copy) which seems to apply
>>>>> some tricks to avoid full copy pages. By 'linux zero page sharing' are
>>>>> you referring to KSM (Kernel Samepage Merging)?
>>>>>
>>>>> If so, on a system without a provided kernel interface to work directed
>>>>> with underlying memory mapping (such as for mach), mem{cpy,set} will
>>>>> actually need to touch the pages and it will be up to kernel page fault
>>>>> mechanism to actually handle it (by identifying common pages and adjusting
>>>>> vma mapping accordingly). And AFAIK this are only enabled on KSM if you
>>>>> actually madavise the page explicit. So I am not grasping the need to
>>>>> actually implement page copying on Linux.
>>>>
>>>> Linux kernel has a reserved page filled with zeroes, so it there /were/ a syscall to tell kernel to map N consecutive pages starting at address PTR to that zero page, we could use that in GLIBC for really big memset(0).
>>>>
>>>> A quick investigation shows that there is no such syscall provided by the Linux kernel.  Doesn't mean we can't ask for / implement one.
>>>
>>> And then there would be a COW interrupt on the first write.  Not a
>>> good idea.  Since most likely you are writing zeros to a big page for
>>> security reasons before filling it again with other data.
>>
>> I'm looking at this as a possible performance optimization for a well-known benchmark.
>
> Please don't do it unless you benchmark real workloads.  Doing this
> for a benchmark is not a good.  Please use something like WRF, mysql,
> hadoop, spark or any other real workload that does lots of
> memset/memcpy.  Please don't do this just for a well-known broken
> benchmark.  Seriously this is just a broken benchmark anyways.
>
>>
>>>   That mean
>>> each page would need to be copied which is normally slower than
>>> zeroing in the first place.
>>
>> It may be like you say, or it may be a significant performance improvement.  I want to see numbers before deciding on how useful this may be.
>
> Copying is always slower than doing setting zero.  There are
> instructions on most major arch (including AARCH64) for zeroing a
> cache line.  Copying means loading one cache line to L1 and then doing
> stores.  Yes you can mark the cache line as not going to be used later
> but that still means going to the cache.

Also, optimizing memcpy/memset for each micro-arch is better in general
than doing this kind of optimization.  I have a
semi-optimized memcpy for ThunderX (T88, not T81 or T83; I still have
not looked into an optimized version for T83/T81) and have an idea of
how to optimize it for another core, but there is no way to handle this
in glibc because of the kernel or glibc infrastructure for arm64.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libc/string] State of PAGE_COPY_FWD / PAGE_COPY_THRESHOLD
  2016-11-10  8:05           ` Andrew Pinski
@ 2016-11-10  8:25             ` Florian Weimer
  2016-11-10  8:34               ` Andrew Pinski
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2016-11-10  8:25 UTC (permalink / raw)
  To: Andrew Pinski, Maxim Kuvyrkov; +Cc: Adhemerval Zanella, GNU C Library

On 11/10/2016 09:05 AM, Andrew Pinski wrote:

> Also memcpy/memset is useful to optimize for each micro-arch instead
> of doing this kind of optimization is better in general.  I have a
> semi-optimized memcpy for ThunderX (T88, not T81 or T83; still have
> not looked into an optimized version for T83/T81 yet) and have an idea
> how to optimize it for another core but there is no way to handle this
> in glibc because of kernel or glibc infrastructure for arm64.

We do this all the time on x86_64 (and presumably POWER).  What's 
missing on aarch64?

Thanks,
Florian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libc/string] State of PAGE_COPY_FWD / PAGE_COPY_THRESHOLD
  2016-11-10  7:39   ` Maxim Kuvyrkov
  2016-11-10  7:48     ` Andrew Pinski
@ 2016-11-10  8:26     ` Florian Weimer
  1 sibling, 0 replies; 16+ messages in thread
From: Florian Weimer @ 2016-11-10  8:26 UTC (permalink / raw)
  To: Maxim Kuvyrkov, Adhemerval Zanella; +Cc: libc-alpha

On 11/10/2016 08:39 AM, Maxim Kuvyrkov wrote:

> A quick investigation shows that there is no such syscall provided by the Linux kernel.  Doesn't mean we can't ask for / implement one.

Linux did exactly this when you read a large block from /dev/zero.  It
was removed because it did not provide any real benefit to applications.
It just looked very good in certain benchmarks.

Florian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libc/string] State of PAGE_COPY_FWD / PAGE_COPY_THRESHOLD
  2016-11-01  9:28 [libc/string] State of PAGE_COPY_FWD / PAGE_COPY_THRESHOLD Maxim Kuvyrkov
  2016-11-01 13:59 ` Adhemerval Zanella
@ 2016-11-10  8:27 ` Florian Weimer
  1 sibling, 0 replies; 16+ messages in thread
From: Florian Weimer @ 2016-11-10  8:27 UTC (permalink / raw)
  To: Maxim Kuvyrkov, GNU C Library

On 11/01/2016 10:28 AM, Maxim Kuvyrkov wrote:
> I wanted to check performance impact of using linux zero page sharing in calls to memset (PTR, 0, SIZE).  I remembered seeing PAGE_COPY_FWD_MAYBE and PAGE_COPY_THRESHOLD in string/memcpy.c, and my plan was to copy this logic to an experimental memset() implementation.
>
> Closer inspection of the current code showed that only Mach port attempted to use full-page copying in memcpy.c, but now even the Mach port disables it.  The net result is that code in string/memcpy.c, as well as parts of headers sysdeps/generic/pagecopy.h and sysdeps/generic/memcopy.h are dead code.

I posted a patch to remove this, but it got stalled.

The Mach port no longer compiles on the master branch, so I don't know 
how we can make changes to it.

Florian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libc/string] State of PAGE_COPY_FWD / PAGE_COPY_THRESHOLD
  2016-11-10  8:25             ` Florian Weimer
@ 2016-11-10  8:34               ` Andrew Pinski
  2016-11-10  8:55                 ` Andrew Pinski
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Pinski @ 2016-11-10  8:34 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Maxim Kuvyrkov, Adhemerval Zanella, GNU C Library

On Thu, Nov 10, 2016 at 12:25 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 11/10/2016 09:05 AM, Andrew Pinski wrote:
>
>> Also memcpy/memset is useful to optimize for each micro-arch instead
>> of doing this kind of optimization is better in general.  I have a
>> semi-optimized memcpy for ThunderX (T88, not T81 or T83; still have
>> not looked into an optimized version for T83/T81 yet) and have an idea
>> how to optimize it for another core but there is no way to handle this
>> in glibc because of kernel or glibc infrastructure for arm64.
>
>
> We do this all the time on x86_64 (and presumably POWER).  What's missing on
> aarch64?

Unless you want to read a file (either /proc/cpuinfo or
/sys/devices/system/cpu/cpuN/regs/identification/midr_el1), the arch
does not provide a way to identify in userspace what processor you are
running on.  There have been some discussions already on the kernel
list and I was going to have another discussion with the ARM folks, but
I can't find the email thread where the meeting was set up :).
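
For reference, a minimal sketch of the file-based route, assuming the sysfs path named above exists on the running kernel.  The `sketch_*` names are hypothetical; the field layout follows the architectural definition of MIDR_EL1 (implementer in bits 31:24, variant in 23:20, architecture in 19:16, part number in 15:4, revision in 3:0).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct sketch_midr
{
  unsigned implementer, variant, architecture, part, revision;
};

/* Splits a raw MIDR_EL1 value into its architectural fields.  */
struct sketch_midr
sketch_decode_midr (uint64_t midr)
{
  struct sketch_midr m;
  m.implementer = (unsigned) ((midr >> 24) & 0xff);
  m.variant = (unsigned) ((midr >> 20) & 0xf);
  m.architecture = (unsigned) ((midr >> 16) & 0xf);
  m.part = (unsigned) ((midr >> 4) & 0xfff);
  m.revision = (unsigned) (midr & 0xf);
  return m;
}

/* Reads the MIDR of CPU number CPU from sysfs.  Returns 0 on success and
   -1 if the file is missing or cannot be parsed (for example on kernels or
   architectures that do not export it).  */
int
sketch_read_midr (unsigned cpu, uint64_t *out)
{
  char path[128];
  char line[64];

  assert (out != NULL);
  snprintf (path, sizeof path,
            "/sys/devices/system/cpu/cpu%u/regs/identification/midr_el1",
            cpu);

  FILE *f = fopen (path, "r");
  if (f == NULL)
    return -1;
  char *got = fgets (line, sizeof line, f);
  fclose (f);
  if (got == NULL)
    return -1;

  char *end;
  errno = 0;
  unsigned long long value = strtoull (line, &end, 16);
  if (errno != 0 || end == line)
    return -1;

  *out = value;
  return 0;
}
```

Opening and parsing a file like this is awkward to do from an IFUNC resolver, which is part of the problem under discussion.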

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libc/string] State of PAGE_COPY_FWD / PAGE_COPY_THRESHOLD
  2016-11-10  8:34               ` Andrew Pinski
@ 2016-11-10  8:55                 ` Andrew Pinski
  2016-11-10  9:25                   ` Siddhesh Poyarekar
  2016-11-30 11:18                   ` Siddhesh Poyarekar
  0 siblings, 2 replies; 16+ messages in thread
From: Andrew Pinski @ 2016-11-10  8:55 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Maxim Kuvyrkov, Adhemerval Zanella, GNU C Library

On Thu, Nov 10, 2016 at 12:34 AM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Thu, Nov 10, 2016 at 12:25 AM, Florian Weimer <fweimer@redhat.com> wrote:
>> On 11/10/2016 09:05 AM, Andrew Pinski wrote:
>>
>>> Also memcpy/memset is useful to optimize for each micro-arch instead
>>> of doing this kind of optimization is better in general.  I have a
>>> semi-optimized memcpy for ThunderX (T88, not T81 or T83; still have
>>> not looked into an optimized version for T83/T81 yet) and have an idea
>>> how to optimize it for another core but there is no way to handle this
>>> in glibc because of kernel or glibc infrastructure for arm64.
>>
>>
>> We do this all the time on x86_64 (and presumably POWER).  What's missing on
>> aarch64?
>
> Unless you want to read a file (either /proc/cpuinfo or
> /sys/devices/system/cpu/cpuN/regs/identification/midr_el1) the arch
> does not provide a way to identify what processor you are running on
> in userspace.  There has been some discussions already on the kernel
> list and I was going to have another discussion with the arm folks but
> I can't find the email thread where the meeting was setup for :).

I found the email thread; it is set up for this Friday.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libc/string] State of PAGE_COPY_FWD / PAGE_COPY_THRESHOLD
  2016-11-10  8:55                 ` Andrew Pinski
@ 2016-11-10  9:25                   ` Siddhesh Poyarekar
  2016-11-10  9:29                     ` Florian Weimer
  2016-11-30 11:18                   ` Siddhesh Poyarekar
  1 sibling, 1 reply; 16+ messages in thread
From: Siddhesh Poyarekar @ 2016-11-10  9:25 UTC (permalink / raw)
  To: Andrew Pinski, Florian Weimer
  Cc: Maxim Kuvyrkov, Adhemerval Zanella, GNU C Library

On Thursday 10 November 2016 02:24 PM, Andrew Pinski wrote:
> I found the email thread, it is setup for this Friday.

Would you or someone else be able to post a summary of the discussion?  Last
I knew from the discussions at Cauldron, I am responsible for working on
fixing the IFUNC-related issues that prevent us from doing anything useful
in an ifunc, including accessing the vDSO.  It is on my list of things to do
right after tunables.

Also, FWIW we can use tunables to work around this until the ifunc
mechanism is in place.  H. J. Lu is looking to do something similar,
except as an override for x86, so we don't need any additional work on
top of his work.  The only requirement is for tunables to be initialized
early enough, which they are.

Siddhesh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libc/string] State of PAGE_COPY_FWD / PAGE_COPY_THRESHOLD
  2016-11-10  9:25                   ` Siddhesh Poyarekar
@ 2016-11-10  9:29                     ` Florian Weimer
  2016-11-10  9:34                       ` Siddhesh Poyarekar
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2016-11-10  9:29 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Andrew Pinski
  Cc: Maxim Kuvyrkov, Adhemerval Zanella, GNU C Library

On 11/10/2016 10:25 AM, Siddhesh Poyarekar wrote:
> On Thursday 10 November 2016 02:24 PM, Andrew Pinski wrote:
>> I found the email thread, it is setup for this Friday.
>
> Would you or someone be able to post a summary of the discussion?  Last
> I know from discussions at Cauldron, I am responsible for working on
> fixing IFUNC related issues that prevent us from doing anything useful
> in an ifunc, including accessing vdso.  It is on my list of things to do
> right after tunables.

Another option would be to get the complete implementation of memcmp
etc. from the vDSO, I think.

In the ARM context, there were also suggestions that we should support 
big/little asymmetric multi-processing.  I hope we don't have to do 
this.  It would seriously limit IFUNC selection and other 
microarchitecture-based optimizations.
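
For readers unfamiliar with the mechanism, here is a minimal sketch of IFUNC selection using the GNU `ifunc` function attribute.  It assumes GCC or Clang on an ELF/Linux target; all `sketch_*` names are hypothetical, and the SSE2 check is only a placeholder for a real micro-architecture test.  The resolver runs once, at relocation time, and its choice then applies to the whole process, which is why cores that differ within one system are a problem for this scheme.

```c
#include <assert.h>
#include <stddef.h>

typedef void *sketch_copy_fn (void *, const void *, size_t);

/* Baseline variant: one byte at a time.  */
static void *
sketch_copy_bytes (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  assert (n == 0 || (dst != NULL && src != NULL));
  for (size_t i = 0; i < n; i++)
    d[i] = s[i];
  return dst;
}

/* "Tuned" variant: one machine word at a time, then the remaining bytes.  */
static void *
sketch_copy_words (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  assert (n == 0 || (dst != NULL && src != NULL));
  while (n >= sizeof (unsigned long))
    {
      unsigned long w;
      __builtin_memcpy (&w, s, sizeof w);
      __builtin_memcpy (d, &w, sizeof w);
      d += sizeof w;
      s += sizeof w;
      n -= sizeof w;
    }
  while (n-- > 0)
    *d++ = *s++;
  return dst;
}

/* Resolver: called by the dynamic loader before normal code runs, so it
   must not rely on much of libc.  */
static sketch_copy_fn *
sketch_resolve_copy (void)
{
#if defined (__x86_64__) || defined (__i386__)
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("sse2"))
    return sketch_copy_words;
#endif
  return sketch_copy_bytes;
}

/* Calls to sketch_copy are bound to whichever variant the resolver chose.  */
void *sketch_copy (void *, const void *, size_t)
  __attribute__ ((ifunc ("sketch_resolve_copy")));
```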

Thanks,
Florian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libc/string] State of PAGE_COPY_FWD / PAGE_COPY_THRESHOLD
  2016-11-10  9:29                     ` Florian Weimer
@ 2016-11-10  9:34                       ` Siddhesh Poyarekar
  0 siblings, 0 replies; 16+ messages in thread
From: Siddhesh Poyarekar @ 2016-11-10  9:34 UTC (permalink / raw)
  To: Florian Weimer, Andrew Pinski
  Cc: Maxim Kuvyrkov, Adhemerval Zanella, GNU C Library

On Thursday 10 November 2016 02:59 PM, Florian Weimer wrote:
> Another option would be to get the complete implementation of memcmp
> etc. from the vDSO, I think.

Hmm, that is an interesting thought.

> In the ARM context, there were also suggestions that we should support
> big/little asymmetric multi-processing.  I hope we don't have to do
> this.  It would seriously limit IFUNC selection and other
> microarchitecture-based optimizations.

Even if we don't actually have optimal routines for big.LITTLE, we still
have to account for the possibility of their existence, hence the need
to peek into all online processors in the hope (because some could be
offline, giving an incorrect view of the system) of guessing the correct
execution environment.

Siddhesh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libc/string] State of PAGE_COPY_FWD / PAGE_COPY_THRESHOLD
  2016-11-10  8:55                 ` Andrew Pinski
  2016-11-10  9:25                   ` Siddhesh Poyarekar
@ 2016-11-30 11:18                   ` Siddhesh Poyarekar
  1 sibling, 0 replies; 16+ messages in thread
From: Siddhesh Poyarekar @ 2016-11-30 11:18 UTC (permalink / raw)
  To: Andrew Pinski, Florian Weimer
  Cc: Maxim Kuvyrkov, Adhemerval Zanella, GNU C Library

On Thursday 10 November 2016 02:24 PM, Andrew Pinski wrote:
>> Unless you want to read a file (either /proc/cpuinfo or
>> /sys/devices/system/cpu/cpuN/regs/identification/midr_el1) the arch
>> does not provide a way to identify what processor you are running on
>> in userspace.  There has been some discussions already on the kernel
>> list and I was going to have another discussion with the arm folks but
>> I can't find the email thread where the meeting was setup for :).
> 
> I found the email thread, it is setup for this Friday.

Hi Andrew,

Did this meeting happen?  If yes, then can you please summarize the
result of the discussion?  I saw a patchset on LKML that emulates MRS
for MIDR and other registers and assumed it may have something to do
with this discussion.

Siddhesh

^ permalink raw reply	[flat|nested] 16+ messages in thread
