* [PATCH] powerpc: Use aligned stores in memset
From: Rajalakshmi Srinivasaraghavan @ 2017-08-18  5:13 UTC
To: libc-alpha; +Cc: Rajalakshmi Srinivasaraghavan

The powerpc hardware does not allow unaligned accesses on non cacheable
memory.  This patch avoids misaligned stores for sizes less than 8 in
memset to avoid such cases.  Tested on powerpc64 and powerpc64le.

2017-08-17  Rajalakshmi Srinivasaraghavan  <raji@linux.vnet.ibm.com>

        * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
        for unaligned inputs if size is less than 8.
---
 sysdeps/powerpc/powerpc64/power8/memset.S | 68 ++++++++++++++++++++++++++++++-
 1 file changed, 66 insertions(+), 2 deletions(-)

diff --git a/sysdeps/powerpc/powerpc64/power8/memset.S b/sysdeps/powerpc/powerpc64/power8/memset.S
index 7ad3bb1b00..504bab0841 100644
--- a/sysdeps/powerpc/powerpc64/power8/memset.S
+++ b/sysdeps/powerpc/powerpc64/power8/memset.S
@@ -377,7 +377,8 @@ L(write_LT_32):
         subf    r5,r0,r5
 2:      bf      30,1f
-        sth     r4,0(r10)
+        stb     r4,0(r10)
+        stb     r4,1(r10)
         addi    r10,r10,2
 1:      bf      31,L(end_4bytes_alignment)
@@ -437,11 +438,74 @@ L(tail5):
         /* Handles copies of 0~8 bytes. */
         .align  4
 L(write_LE_8):
-        bne     cr6,L(tail4)
+        /* Use stb instead of sth which is safe for
+           both aligned and unaligned inputs. */
+        bne     cr6,L(LE7_tail4)
+        /* If input is word aligned, use stw, Else use stb. */
+        andi.   r0,r10,3
+        bne     L(8_unalign)
         stw     r4,0(r10)
         stw     r4,4(r10)
         blr
+
+        /* Unaligned input and size is 8. */
+        .align  4
+L(8_unalign):
+        andi.   r0,r10,1
+        beq     L(8_hwalign)
+        stb     r4,0(r10)
+        sth     r4,1(r10)
+        sth     r4,3(r10)
+        sth     r4,5(r10)
+        stb     r4,7(r10)
+        blr
+
+        /* Halfword aligned input and size is 8. */
+        .align  4
+L(8_hwalign):
+        sth     r4,0(r10)
+        sth     r4,2(r10)
+        sth     r4,4(r10)
+        sth     r4,6(r10)
+        blr
+
+        .align  4
+        /* Copies 4~7 bytes. */
+L(LE7_tail4):
+        bf      29,L(LE7_tail2)
+        stb     r4,0(r10)
+        stb     r4,1(r10)
+        stb     r4,2(r10)
+        stb     r4,3(r10)
+        bf      30,L(LE7_tail5)
+        stb     r4,4(r10)
+        stb     r4,5(r10)
+        bflr    31
+        stb     r4,6(r10)
+        blr
+
+        .align  4
+        /* Copies 2~3 bytes. */
+L(LE7_tail2):
+        bf      30,1f
+        stb     r4,0(r10)
+        stb     r4,1(r10)
+        bflr    31
+        stb     r4,2(r10)
+        blr
+
+        .align  4
+L(LE7_tail5):
+        bflr    31
+        stb     r4,4(r10)
+        blr
+
+        .align  4
+1:      bflr    31
+        stb     r4,0(r10)
+        blr
+
 END_GEN_TB (MEMSET,TB_TOCLESS)
 libc_hidden_builtin_def (memset)
-- 
2.11.0
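For readers who do not read PowerPC assembly, the store selection in the patched L(write_LE_8) path can be sketched in C roughly as follows. This is only an illustration under assumptions: small_fill and store_pattern are hypothetical names that do not exist in glibc, and the fixed-size memcpy merely stands in for stb/sth/stw, since a C compiler is free to pick other instructions (which is why the real code is assembly).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for one stb/sth/stw: store `width` bytes of the replicated
   fill byte at p.  Hypothetical helper, not part of glibc.  */
static void
store_pattern (unsigned char *p, uint32_t pattern, size_t width)
{
  memcpy (p, &pattern, width);
}

/* Hypothetical sketch: fill n (0..8) bytes at dst without issuing a
   store that is misaligned for its own width.  */
static void
small_fill (unsigned char *dst, int c, size_t n)
{
  uint32_t pattern = (uint32_t) (unsigned char) c * UINT32_C (0x01010101);
  uintptr_t addr = (uintptr_t) dst;

  assert (n <= 8);

  if (n < 8)
    {
      /* Sizes 0..7: byte stores only, safe for any alignment.  */
      for (size_t i = 0; i < n; i++)
        store_pattern (dst + i, pattern, 1);
      return;
    }

  if ((addr & 3) == 0)
    {
      /* Word aligned: two word stores.  */
      store_pattern (dst, pattern, 4);
      store_pattern (dst + 4, pattern, 4);
    }
  else if ((addr & 1) == 0)
    {
      /* Halfword aligned: four halfword stores.  */
      for (size_t i = 0; i < 8; i += 2)
        store_pattern (dst + i, pattern, 2);
    }
  else
    {
      /* Odd address: byte, three halfwords, byte.  */
      store_pattern (dst, pattern, 1);
      for (size_t i = 1; i < 7; i += 2)
        store_pattern (dst + i, pattern, 2);
      store_pattern (dst + 7, pattern, 1);
    }
}
```

A call such as small_fill (buf + 1, 0, 8) takes the odd-address branch, so every halfword store lands on an even address.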
* Re: [PATCH] powerpc: Use aligned stores in memset 2017-08-18 5:13 [PATCH] powerpc: Use aligned stores in memset Rajalakshmi Srinivasaraghavan @ 2017-08-18 6:21 ` Florian Weimer 2017-08-18 6:51 ` Rajalakshmi Srinivasaraghavan 2017-08-18 6:25 ` [PATCH] powerpc: Use aligned stores in memset Andrew Pinski 2017-08-21 2:20 ` Tulio Magno Quites Machado Filho 2 siblings, 1 reply; 31+ messages in thread From: Florian Weimer @ 2017-08-18 6:21 UTC (permalink / raw) To: Rajalakshmi Srinivasaraghavan, libc-alpha On 08/18/2017 07:11 AM, Rajalakshmi Srinivasaraghavan wrote: > * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte > for unaligned inputs if size is less than 8. This makes me rather nervous. powerpc64le was supposed to have reasonable efficient unaligned loads and stores. GCC happily generates them, too. Thanks, Florian ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] powerpc: Use aligned stores in memset 2017-08-18 6:21 ` Florian Weimer @ 2017-08-18 6:51 ` Rajalakshmi Srinivasaraghavan 2017-08-18 9:10 ` Florian Weimer 0 siblings, 1 reply; 31+ messages in thread From: Rajalakshmi Srinivasaraghavan @ 2017-08-18 6:51 UTC (permalink / raw) To: Florian Weimer, libc-alpha On 08/18/2017 11:51 AM, Florian Weimer wrote: > On 08/18/2017 07:11 AM, Rajalakshmi Srinivasaraghavan wrote: >> * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte >> for unaligned inputs if size is less than 8. > > This makes me rather nervous. powerpc64le was supposed to have > reasonable efficient unaligned loads and stores. GCC happily generates > them, too. This is meant ONLY for caching inhibited accesses. Caching Inhibited accesses are required to be Guarded and properly aligned. > > Thanks, > Florian > > -- Thanks Rajalakshmi S ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] powerpc: Use aligned stores in memset 2017-08-18 6:51 ` Rajalakshmi Srinivasaraghavan @ 2017-08-18 9:10 ` Florian Weimer 2017-08-18 12:13 ` Adhemerval Zanella 2017-09-12 10:30 ` Florian Weimer 0 siblings, 2 replies; 31+ messages in thread From: Florian Weimer @ 2017-08-18 9:10 UTC (permalink / raw) To: Rajalakshmi Srinivasaraghavan; +Cc: libc-alpha On 08/18/2017 08:51 AM, Rajalakshmi Srinivasaraghavan wrote: > > > On 08/18/2017 11:51 AM, Florian Weimer wrote: >> On 08/18/2017 07:11 AM, Rajalakshmi Srinivasaraghavan wrote: >>> * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte >>> for unaligned inputs if size is less than 8. >> >> This makes me rather nervous. powerpc64le was supposed to have >> reasonable efficient unaligned loads and stores. GCC happily generates >> them, too. > > This is meant ONLY for caching inhibited accesses. Caching Inhibited > accesses are required to be Guarded and properly aligned. The intent is to support memset for such memory regions, right? This change is insufficient. You have to fix GCC as well because it will inline memset of unaligned pointers, like this: typedef long __attribute__ ((aligned(1))) long_unaligned; void clear (long_unaligned *p) { memset (p, 0, sizeof (*p)); } clear: li 9,0 std 9,0(3) blr That's why I think your change is not useful in isolation. Thanks, Florian ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] powerpc: Use aligned stores in memset 2017-08-18 9:10 ` Florian Weimer @ 2017-08-18 12:13 ` Adhemerval Zanella 2017-09-12 10:30 ` Florian Weimer 1 sibling, 0 replies; 31+ messages in thread From: Adhemerval Zanella @ 2017-08-18 12:13 UTC (permalink / raw) To: libc-alpha On 18/08/2017 06:10, Florian Weimer wrote: > On 08/18/2017 08:51 AM, Rajalakshmi Srinivasaraghavan wrote: >> >> >> On 08/18/2017 11:51 AM, Florian Weimer wrote: >>> On 08/18/2017 07:11 AM, Rajalakshmi Srinivasaraghavan wrote: >>>> * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte >>>> for unaligned inputs if size is less than 8. >>> >>> This makes me rather nervous. powerpc64le was supposed to have >>> reasonable efficient unaligned loads and stores. GCC happily generates >>> them, too. >> >> This is meant ONLY for caching inhibited accesses. Caching Inhibited >> accesses are required to be Guarded and properly aligned. > > The intent is to support memset for such memory regions, right? This > change is insufficient. You have to fix GCC as well because it will > inline memset of unaligned pointers, like this: > > typedef long __attribute__ ((aligned(1))) long_unaligned; > > void > clear (long_unaligned *p) > { > memset (p, 0, sizeof (*p)); > } > > clear: > li 9,0 > std 9,0(3) > blr > > That's why I think your change is not useful in isolation. POWER8 does have fast unaligned access memory and in fact unaligned access could be used to provide a faster memcpy/memmove implementation (I created one that I never sent upstream some time ago [1]). Unaligned accesses are used extensively in some optimized str* implementation I created for POWER8. It also allows GCC to use unaligned access for builtin mem* operation without issue on *most* of the cases. 
The problem is that memset/memcpy/memmove *specifically* are used in some
userland drivers for DMA (if I recall correctly, for some XORG drivers), and
for these specific use cases unaligned accesses, especially vector ones,
will cause the kernel to trap on *every* unaligned instruction, leading to
abysmal performance.  That's why I pushed
87868c2418fb74357757e3b739ce5b76b17a8929 to fix this very issue for POWER7
memcpy.

We already discussed this same issue some time ago [2] to try to overcome
this limitation.  I think ideally the drivers that rely on aligned mem*
operations should use their own mem* operations (similar to what dpdk
does [3]).

[1] https://github.com/zatrazz/glibc/commits/memopt-power8
[2] https://sourceware.org/ml/libc-alpha/2015-01/msg00130.html
[3] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/common/include/arch/ppc_64/rte_memcpy.h
* Re: [PATCH] powerpc: Use aligned stores in memset
From: Florian Weimer @ 2017-09-12 10:30 UTC
To: Rajalakshmi Srinivasaraghavan; +Cc: libc-alpha

On 08/18/2017 11:10 AM, Florian Weimer wrote:
> On 08/18/2017 08:51 AM, Rajalakshmi Srinivasaraghavan wrote:
>>
>>
>> On 08/18/2017 11:51 AM, Florian Weimer wrote:
>>> On 08/18/2017 07:11 AM, Rajalakshmi Srinivasaraghavan wrote:
>>>> * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
>>>> for unaligned inputs if size is less than 8.
>>>
>>> This makes me rather nervous.  powerpc64le was supposed to have
>>> reasonable efficient unaligned loads and stores.  GCC happily generates
>>> them, too.
>>
>> This is meant ONLY for caching inhibited accesses.  Caching Inhibited
>> accesses are required to be Guarded and properly aligned.
>
> The intent is to support memset for such memory regions, right?  This
> change is insufficient.  You have to fix GCC as well because it will
> inline memset of unaligned pointers, like this:

Here's a more complete example:

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef long __attribute__ ((aligned(1))) long_unaligned;

__attribute__ ((noinline, noclone, weak))
void
clear (long_unaligned *p)
{
  memset (p, 0, sizeof (*p));
}

struct data
{
  char misalign;
  long_unaligned data;
};

int
main (void)
{
  struct data *data = malloc (sizeof (*data));
  assert (data != NULL);
  long_unaligned *p = &data->data;
  printf ("pointer: %p\n", p);
  clear (p);
  return 0;
}

The clear function compiles to:

clear:
        li 9,0
        std 9,0(3)
        blr

At run time, I get:

pointer: 0x10003c10011

This means that GCC introduced an unaligned store, no matter how memset
was implemented.

I could not find the manual which has the requirement that the mem*
functions do not use unaligned accesses.  Unless they are worded in a
very peculiar way, right now, the GCC/glibc combination does not comply
with a requirement that memset & Co. can be used for device memory access.

Furthermore, I find it very peculiar that over-reading device memory is
acceptable.  Some memory-mapped devices behave strangely if memory
locations are read out of order or multiple times, and the current glibc
implementation accesses locations which are outside the specified object
boundaries.

So I think the implementation constraint on the mem* functions is wrong.
It leads to a slower implementation of the mem* function for most of
userspace which does not access device memory, and even for device
memory, it is probably not what you want.

Thanks,
Florian
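As a side note on why the pointer printed above ends in ...11: with the aligned(1) typedef, GCC places the long member directly after the char, at offset 1. A minimal sketch that checks this, assuming GCC or a compatible compiler; is_misaligned is a hypothetical helper introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef long __attribute__ ((aligned (1))) long_unaligned;

struct data
{
  char misalign;
  long_unaligned data;  /* no padding is inserted before this member */
};

/* Hypothetical helper: nonzero if p is not a multiple of align,
   where align must be a power of two.  */
static int
is_misaligned (const void *p, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return ((uintptr_t) p & (align - 1)) != 0;
}
```

Since malloc returns suitably aligned memory, the address of the data member is the aligned base plus 1, so a doubleword store to it is necessarily misaligned.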
* Re: [PATCH] powerpc: Use aligned stores in memset 2017-09-12 10:30 ` Florian Weimer @ 2017-09-12 12:18 ` Zack Weinberg 2017-09-12 13:57 ` Steven Munroe ` (2 more replies) 2017-09-12 13:38 ` Steven Munroe 2017-09-13 13:12 ` Tulio Magno Quites Machado Filho 2 siblings, 3 replies; 31+ messages in thread From: Zack Weinberg @ 2017-09-12 12:18 UTC (permalink / raw) To: Florian Weimer; +Cc: Rajalakshmi Srinivasaraghavan, GNU C Library On Tue, Sep 12, 2017 at 6:30 AM, Florian Weimer <fweimer@redhat.com> wrote: > > I could not find the manual which has the requirement that the mem* > functions do not use unaligned accesses. Unless they are worded in a > very peculiar way, right now, the GCC/glibc combination does not comply > with a requirement that memset & Co. can be used for device memory access. mem* are required to behave as-if they access memory as an array of unsigned char. Therefore it is valid to give them arbitrarily (un)aligned pointers. The C abstract machine doesn't specifically contemplate the possibility of a CPU that can do unaligned word reads but maybe not to all memory addresses, but I would argue that if there is such a CPU, then mem* are obliged to cope with it. > ...the current glibc > implementation accesses locations which are outside the specified object > boundaries. I think that's technically a defect. Nothing in the C standard licenses it to do that; we just get away with it because, on the implementations to date, it's not observable (unless you go past the end of a page, which you'll note there are a bunch of tests to ensure we don't do). If an over-read by a single byte is observable, then mem* is not allowed to do that. zw ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] powerpc: Use aligned stores in memset
From: Steven Munroe @ 2017-09-12 13:57 UTC
To: Zack Weinberg
Cc: Florian Weimer, Rajalakshmi Srinivasaraghavan, GNU C Library

On Tue, 2017-09-12 at 08:18 -0400, Zack Weinberg wrote:
> On Tue, Sep 12, 2017 at 6:30 AM, Florian Weimer <fweimer@redhat.com> wrote:
> >
> > I could not find the manual which has the requirement that the mem*
> > functions do not use unaligned accesses.  Unless they are worded in a
> > very peculiar way, right now, the GCC/glibc combination does not comply
> > with a requirement that memset & Co. can be used for device memory access.
>
> mem* are required to behave as-if they access memory as an array of
> unsigned char.  Therefore it is valid to give them arbitrarily
> (un)aligned pointers.  The C abstract machine doesn't specifically
> contemplate the possibility of a CPU that can do unaligned word reads
> but maybe not to all memory addresses, but I would argue that if there
> is such a CPU, then mem* are obliged to cope with it.
>
> > ...the current glibc
> > implementation accesses locations which are outside the specified object
> > boundaries.
>
> I think that's technically a defect.  Nothing in the C standard
> licenses it to do that; we just get away with it because, on the
> implementations to date, it's not observable (unless you go past the
> end of a page, which you'll note there are a bunch of tests to ensure
> we don't do).  If an over-read by a single byte is observable, then
> mem* is not allowed to do that.
>
Also a bit of an overreaction.  As long as the library routine does not
cause a visible artifact (segfault or alignment check), an aligned access
before or after the requested start address and length is an optimization.

For example, accessing the source at offset 3 and length 10 with an
aligned quadword load is OK as long as I clear the leading and trailing
bytes.  But attempting to store 7 bytes within a quadword by merging bytes
in a register and storing the whole quadword would violate single copy
atomicity and is not allowed.

> zw
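The load-side case described above (offset 3, length 10 inside one aligned quadword) could be sketched in C as follows. This is a hedged illustration, not glibc code: load_within_block and BLOCK are hypothetical names, and the fixed-size memcpy only stands in for a single aligned 16-byte load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16  /* quadword size in bytes */

/* Hypothetical sketch: fetch len bytes starting at offset off inside the
   16-byte aligned block, by reading the whole block once and then
   clearing the bytes that were not requested.  */
static void
load_within_block (const unsigned char *block, size_t off, size_t len,
                   unsigned char out[BLOCK])
{
  assert (((uintptr_t) block & (BLOCK - 1)) == 0);
  assert (off <= BLOCK && len <= BLOCK - off);

  memcpy (out, block, BLOCK);                      /* aligned, over-wide read */
  memset (out, 0, off);                            /* clear leading bytes */
  memset (out + off + len, 0, BLOCK - off - len);  /* clear trailing bytes */
}
```

The store-side counterpart is deliberately not shown: writing back a whole quadword to change 7 bytes would also rewrite neighbouring bytes that another agent may be updating, which is the single copy atomicity problem mentioned above.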
* Re: [PATCH] powerpc: Use aligned stores in memset 2017-09-12 12:18 ` Zack Weinberg 2017-09-12 13:57 ` Steven Munroe @ 2017-09-12 14:37 ` Joseph Myers 2017-09-12 15:06 ` Zack Weinberg 2017-09-12 17:09 ` Florian Weimer 2 siblings, 1 reply; 31+ messages in thread From: Joseph Myers @ 2017-09-12 14:37 UTC (permalink / raw) To: Zack Weinberg Cc: Florian Weimer, Rajalakshmi Srinivasaraghavan, GNU C Library On Tue, 12 Sep 2017, Zack Weinberg wrote: > On Tue, Sep 12, 2017 at 6:30 AM, Florian Weimer <fweimer@redhat.com> wrote: > > > > I could not find the manual which has the requirement that the mem* > > functions do not use unaligned accesses. Unless they are worded in a > > very peculiar way, right now, the GCC/glibc combination does not comply > > with a requirement that memset & Co. can be used for device memory access. > > mem* are required to behave as-if they access memory as an array of > unsigned char. Therefore it is valid to give them arbitrarily > (un)aligned pointers. The C abstract machine doesn't specifically > contemplate the possibility of a CPU that can do unaligned word reads > but maybe not to all memory addresses, but I would argue that if there > is such a CPU, then mem* are obliged to cope with it. Only if there is a way, within the standard, in which you might obtain a pointer to such memory. It is explicitly undefined in ISO C to access "an object defined with a volatile-qualified type through use of an lvalue with non-volatile-qualified type" (C11 6.7.3#6). Thus you can't use mem* functions on objects defined as volatile. I think device memory with special access requirements should be considered to be defined as volatile. (So any access from C code should use volatile-qualified lvalues.) -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 31+ messages in thread
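To make the volatile suggestion concrete, device registers are commonly accessed through small helpers that take volatile-qualified pointers, so each call is exactly one access in the abstract machine. A minimal sketch; reg_write32 and reg_read32 are hypothetical names, and real drivers usually also need barriers that are outside the scope of this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one naturally aligned 32-bit store through a
   volatile-qualified lvalue.  */
static inline void
reg_write32 (volatile uint32_t *reg, uint32_t value)
{
  assert (((uintptr_t) reg & 3) == 0);
  *reg = value;
}

/* Hypothetical helper: one naturally aligned 32-bit load through a
   volatile-qualified lvalue.  */
static inline uint32_t
reg_read32 (const volatile uint32_t *reg)
{
  assert (((uintptr_t) reg & 3) == 0);
  return *reg;
}
```

Passing such a pointer to memset would require casting away the volatile qualifier, which is the situation C11 6.7.3#6 leaves undefined for objects defined as volatile.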
* Re: [PATCH] powerpc: Use aligned stores in memset 2017-09-12 14:37 ` Joseph Myers @ 2017-09-12 15:06 ` Zack Weinberg 0 siblings, 0 replies; 31+ messages in thread From: Zack Weinberg @ 2017-09-12 15:06 UTC (permalink / raw) To: Joseph Myers; +Cc: Florian Weimer, Rajalakshmi Srinivasaraghavan, GNU C Library On Tue, Sep 12, 2017 at 10:37 AM, Joseph Myers <joseph@codesourcery.com> wrote: > On Tue, 12 Sep 2017, Zack Weinberg wrote: >> >> mem* are required to behave as-if they access memory as an array of >> unsigned char. Therefore it is valid to give them arbitrarily >> (un)aligned pointers. The C abstract machine doesn't specifically >> contemplate the possibility of a CPU that can do unaligned word reads >> but maybe not to all memory addresses, but I would argue that if there >> is such a CPU, then mem* are obliged to cope with it. > > Only if there is a way, within the standard, in which you might obtain a > pointer to such memory. Perhaps it is only a matter of QoI, but I would argue that if there is _any_ way to obtain such a pointer, considering the entire operating system, then mem* can and should cope with it. > I think device memory with > special access requirements should be considered to be defined as > volatile. (So any access from C code should use volatile-qualified > lvalues.) I know you know that's violently disputed. zw ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] powerpc: Use aligned stores in memset 2017-09-12 12:18 ` Zack Weinberg 2017-09-12 13:57 ` Steven Munroe 2017-09-12 14:37 ` Joseph Myers @ 2017-09-12 17:09 ` Florian Weimer 2 siblings, 0 replies; 31+ messages in thread From: Florian Weimer @ 2017-09-12 17:09 UTC (permalink / raw) To: Zack Weinberg; +Cc: Rajalakshmi Srinivasaraghavan, GNU C Library On 09/12/2017 02:18 PM, Zack Weinberg wrote: > On Tue, Sep 12, 2017 at 6:30 AM, Florian Weimer <fweimer@redhat.com> wrote: >> >> I could not find the manual which has the requirement that the mem* >> functions do not use unaligned accesses. Unless they are worded in a >> very peculiar way, right now, the GCC/glibc combination does not comply >> with a requirement that memset & Co. can be used for device memory access. > > mem* are required to behave as-if they access memory as an array of > unsigned char. Therefore it is valid to give them arbitrarily > (un)aligned pointers. The C abstract machine doesn't specifically > contemplate the possibility of a CPU that can do unaligned word reads > but maybe not to all memory addresses, but I would argue that if there > is such a CPU, then mem* are obliged to cope with it. I disagree. On most architectures, including x86-64, you can tell, with certain hardware devices, that our mem* functions do not perform byte-wise read or write access. On many architectures, just a hardware watchpoint installed using ptrace (a supported API) is sufficient. But this theoretical possibility does not mean that we cannot or should not optimize the mem* functions. If you need specific memory access patterns, you need to use inline assembly. In many cases, volatile loads and stores are sufficient, too. >> ...the current glibc >> implementation accesses locations which are outside the specified object >> boundaries. > > I think that's technically a defect. Nothing in the C standard > licenses it to do that; It's permitted under the as-if rule. 
Florian
* Re: [PATCH] powerpc: Use aligned stores in memset
From: Steven Munroe @ 2017-09-12 13:38 UTC
To: Florian Weimer; +Cc: Rajalakshmi Srinivasaraghavan, libc-alpha

On Tue, 2017-09-12 at 12:30 +0200, Florian Weimer wrote:
> On 08/18/2017 11:10 AM, Florian Weimer wrote:
> > On 08/18/2017 08:51 AM, Rajalakshmi Srinivasaraghavan wrote:
> >>
> >>
> >> On 08/18/2017 11:51 AM, Florian Weimer wrote:
> >>> On 08/18/2017 07:11 AM, Rajalakshmi Srinivasaraghavan wrote:
> >>>> * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
> >>>> for unaligned inputs if size is less than 8.
> >>>
> >>> This makes me rather nervous.  powerpc64le was supposed to have
> >>> reasonable efficient unaligned loads and stores.  GCC happily generates
> >>> them, too.
> >>
> >> This is meant ONLY for caching inhibited accesses.  Caching Inhibited
> >> accesses are required to be Guarded and properly aligned.
> >
> > The intent is to support memset for such memory regions, right?  This
> > change is insufficient.  You have to fix GCC as well because it will
> > inline memset of unaligned pointers, like this:
>
> Here's a more complete example:
>
..snip
>
> This means that GCC introduced an unaligned store, no matter how memset
> was implemented.
>
C will do whatever the programmer wants.  We cannot stop that.

And in user mode and cache coherent memory this is not a problem, as
Adhemerval explained.

So we are not going to degrade the performance of general applications
for a tiny subset of specialized device drivers.  Those guys have to know
what they are doing.

But in a library (like libc) that might be called from a user mode
device driver (Xorg for example) and access cache-inhibited memory, the
memcpy implementation has to check alignment and size and use the
correct instructions for each case.

That is what we are doing here.

> I could not find the manual which has the requirement that the mem*
> functions do not use unaligned accesses.  Unless they are worded in a
> very peculiar way, right now, the GCC/glibc combination does not comply
> with a requirement that memset & Co. can be used for device memory access.
>
> Furthermore, I find it very peculiar that over-reading device memory is
> acceptable.  Some memory-mapped devices behave strangely if memory
> locations are read out of order or multiple times, and the current glibc
> implementation accesses locations which are outside the specified object
> boundaries.
>
Yes, device driver writers have to know what they are doing.

> So I think the implementation constraint on the mem* functions is wrong.
> It leads to a slower implementation of the mem* function for most of
> userspace which does not access device memory, and even for device
> memory, it is probably not what you want.
>
We are just trying to make the mem* functions safe (no segfault or
alignment check) if used correctly.  The definition of correctly is a bit
fluid.

I personally disagree with the Xorg folks but so far they have refused
to bend...

> Thanks,
> Florian
>
* Re: [PATCH] powerpc: Use aligned stores in memset
From: Florian Weimer @ 2017-09-12 14:08 UTC
To: Steven Munroe; +Cc: Rajalakshmi Srinivasaraghavan, libc-alpha

* Steven Munroe:

>> This means that GCC introduced an unaligned store, no matter how memset
>> was implemented.
>>
> C will do what ever the programmer wants. We can not stop that.

That's not true.  If some specification says that for POWER, mem* must
behave in a certain way, and the GCC/glibc combination does not do
that, that's a bug on POWER.

The programmer only sees the entire toolchain, and it is our job to
make the whole thing compliant with applicable specifications, even if
this means coordinating among different projects.

> And in user mode and cache coherent memory this is not a problem as
> Adhemerval explained.

Obviously not, otherwise we wouldn't be changing glibc.

> So we are not going to degrade the performance of general applications
> for a tiny subset of specialized device drivers. Those guy have to know
> what they are doing.
>
> But in the library (like libc) that might be called from a user mode
> device driver (Xorg for example) and access Cache inhibited memory the
> memcpy implementation has to check alignment and size and using the
> correct instructions for each case.
>
> That is what we are doing here.

Sorry, but you are contradicting yourself.  I very much doubt the
Xorg-compatible memcmp is an improvement across the board.
* Re: [PATCH] powerpc: Use aligned stores in memset 2017-09-12 14:08 ` Florian Weimer @ 2017-09-12 14:16 ` Steven Munroe 2017-09-12 17:04 ` Florian Weimer 0 siblings, 1 reply; 31+ messages in thread From: Steven Munroe @ 2017-09-12 14:16 UTC (permalink / raw) To: Florian Weimer; +Cc: Rajalakshmi Srinivasaraghavan, libc-alpha On Tue, 2017-09-12 at 16:08 +0200, Florian Weimer wrote: > * Steven Munroe: > > >> This means that GCC introduced an unaligned store, no matter how memset > >> was implemented. > >> > > C will do what ever the programmer wants. We can not stop that. > > That's not true. If some specification says that for POWER, mem* must > behave in a certain way, and the GCC/glibc combiniation does not do > that, that's a bug on POWER. > What is the bug that you think we are not fixing? > The programmer only sees the entire toolchain, and it is our job to > make the whole thing compliant with applicable specifications, even if > this means coordinating among different projects. > > > And in user mode and cache coherent memory this is not a problem as > > Adhemerval explained. > > Obviously not, otherwise we wouldn't be changing glibc. > I was arguing against forcing GCC and compilers in general being forced to be aware of Cache Inhibited memory. Programmers do. What are you arguing? > > So we are not going to degrade the performance of general applications > > for a tiny subset of specialized device drivers. Those guy have to know > > what they are doing. > > > > But in the library (like libc) that might be called from a user mode > > device driver (Xorg for example) and access Cache inhibited memory the > > memcpy implementation has to check alignment and size and using the > > correct instructions for each case. > > > > That is what we are doing here. > > Sorry, but you are contradicting yourself. I very much doubt the > Xorg-compatible memcmp is an improvement across the board. > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] powerpc: Use aligned stores in memset 2017-09-12 14:16 ` Steven Munroe @ 2017-09-12 17:04 ` Florian Weimer 2017-09-12 19:21 ` Steven Munroe 0 siblings, 1 reply; 31+ messages in thread From: Florian Weimer @ 2017-09-12 17:04 UTC (permalink / raw) To: munroesj; +Cc: Rajalakshmi Srinivasaraghavan, libc-alpha On 09/12/2017 04:16 PM, Steven Munroe wrote: > On Tue, 2017-09-12 at 16:08 +0200, Florian Weimer wrote: >> * Steven Munroe: >> >>>> This means that GCC introduced an unaligned store, no matter how memset >>>> was implemented. >>>> >>> C will do what ever the programmer wants. We can not stop that. >> >> That's not true. If some specification says that for POWER, mem* must >> behave in a certain way, and the GCC/glibc combiniation does not do >> that, that's a bug on POWER. >> > What is the bug that you think we are not fixing? memset, as called by the C programmer, still uses unaligned stores. >> The programmer only sees the entire toolchain, and it is our job to >> make the whole thing compliant with applicable specifications, even if >> this means coordinating among different projects. >> >>> And in user mode and cache coherent memory this is not a problem as >>> Adhemerval explained. >> >> Obviously not, otherwise we wouldn't be changing glibc. >> > I was arguing against forcing GCC and compilers in general being forced > to be aware of Cache Inhibited memory. Programmers do. Exactly. In order to give programmers this choice, you need functions like device_memset, which are not subject to compiler or library optimizations which are not valid for device memory. > What are you arguing? If you want a memset which is compatible with device memory, you need to fix GCC *and* glibc. Just patching glibc is not enough because GCC optimizes memset in ways that are incompatible with your apparent goal.. Thanks, Florian ^ permalink raw reply [flat|nested] 31+ messages in thread
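The thread names device_memset without proposing an implementation, so the following is purely an assumed sketch of what such a driver-side function might look like: byte stores up to an 8-byte boundary, naturally aligned doubleword stores in the middle, byte stores for the tail, all through volatile lvalues so the compiler does not replace them with a memset call or wider accesses. The u64_alias typedef uses the GCC may_alias attribute; everything here is hypothetical and is not part of glibc or GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC extension: a 64-bit type that may alias objects of any type.  */
typedef uint64_t __attribute__ ((may_alias)) u64_alias;

/* Hypothetical device_memset: never issues a store that is misaligned
   for its own width.  */
static void *
device_memset (void *dst, int c, size_t n)
{
  volatile unsigned char *p = dst;
  uint64_t pattern
    = (uint64_t) (unsigned char) c * UINT64_C (0x0101010101010101);

  assert (dst != NULL || n == 0);

  /* Byte stores until p is 8-byte aligned or n is exhausted.  */
  while (n > 0 && ((uintptr_t) p & 7) != 0)
    {
      *p++ = (unsigned char) c;
      n--;
    }

  /* Naturally aligned doubleword stores.  */
  while (n >= 8)
    {
      *(volatile u64_alias *) p = pattern;
      p += 8;
      n -= 8;
    }

  /* Remaining tail, byte by byte.  */
  while (n > 0)
    {
      *p++ = (unsigned char) c;
      n--;
    }

  return dst;
}
```

Keeping this in the driver rather than in libc leaves the general-purpose memset free to use unaligned and vector stores on cacheable memory.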
* Re: [PATCH] powerpc: Use aligned stores in memset 2017-09-12 17:04 ` Florian Weimer @ 2017-09-12 19:21 ` Steven Munroe 2017-09-12 19:45 ` Florian Weimer 0 siblings, 1 reply; 31+ messages in thread From: Steven Munroe @ 2017-09-12 19:21 UTC (permalink / raw) To: Florian Weimer; +Cc: Rajalakshmi Srinivasaraghavan, libc-alpha On Tue, 2017-09-12 at 19:04 +0200, Florian Weimer wrote: > On 09/12/2017 04:16 PM, Steven Munroe wrote: > > On Tue, 2017-09-12 at 16:08 +0200, Florian Weimer wrote: > >> * Steven Munroe: > >> > >>>> This means that GCC introduced an unaligned store, no matter how memset > >>>> was implemented. > >>>> > >>> C will do what ever the programmer wants. We can not stop that. > >> > >> That's not true. If some specification says that for POWER, mem* must > >> behave in a certain way, and the GCC/glibc combiniation does not do > >> that, that's a bug on POWER. > >> > > What is the bug that you think we are not fixing? > > memset, as called by the C programmer, still uses unaligned stores. > Are you sure? Which one? 

find ./ -name 'memset*' | grep powerpc
./sysdeps/powerpc/powerpc32/power7/memset.S
./sysdeps/powerpc/powerpc32/memset.S
./sysdeps/powerpc/powerpc32/476/memset.S
./sysdeps/powerpc/powerpc32/405/memset.S
./sysdeps/powerpc/powerpc32/power6/memset.S
./sysdeps/powerpc/powerpc32/power4/memset.S
./sysdeps/powerpc/powerpc32/power4/multiarch/memset-ppc32.S
./sysdeps/powerpc/powerpc32/power4/multiarch/memset-power6.S
./sysdeps/powerpc/powerpc32/power4/multiarch/memset.c
./sysdeps/powerpc/powerpc32/power4/multiarch/memset-power7.S
./sysdeps/powerpc/powerpc64/power8/memset.S
./sysdeps/powerpc/powerpc64/power7/memset.S
./sysdeps/powerpc/powerpc64/memset.S
./sysdeps/powerpc/powerpc64/multiarch/memset-power8.S
./sysdeps/powerpc/powerpc64/multiarch/memset-power4.S
./sysdeps/powerpc/powerpc64/multiarch/memset-power6.S
./sysdeps/powerpc/powerpc64/multiarch/memset.c
./sysdeps/powerpc/powerpc64/multiarch/memset-ppc64.S
./sysdeps/powerpc/powerpc64/multiarch/memset-power7.S
./sysdeps/powerpc/powerpc64/power6/memset.S
./sysdeps/powerpc/powerpc64/power4/memset.S

> >> The programmer only sees the entire toolchain, and it is our job to
> >> make the whole thing compliant with applicable specifications, even if
> >> this means coordinating among different projects.
> >>
> >>> And in user mode and cache coherent memory this is not a problem as
> >>> Adhemerval explained.
> >>
> >> Obviously not, otherwise we wouldn't be changing glibc.
> >>
> > I was arguing against forcing GCC and compilers in general being forced
> > to be aware of Cache Inhibited memory. Programmers do.
>
> Exactly.  In order to give programmers this choice, you need functions
> like device_memset, which are not subject to compiler or library
> optimizations which are not valid for device memory.
>
Which project is going to host device_memset?  Are you suggesting that
GLIBC should?

> > What are you arguing?
>
> If you want a memset which is compatible with device memory, you need to
> fix GCC *and* glibc.  Just patching glibc is not enough because GCC
> optimizes memset in ways that are incompatible with your apparent goal..
>
I still don't see how GCC changes are required for this.  You need to be
specific here.

We are not going to version every loop that might contain stores based
on speculation that someone who does not know what they are doing might
access Cache Inhibited storage.

Not going to happen.

> Thanks,
> Florian
>
* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-12 19:21 ` Steven Munroe
@ 2017-09-12 19:45 ` Florian Weimer
  2017-09-12 20:25 ` Steven Munroe
  0 siblings, 1 reply; 31+ messages in thread
From: Florian Weimer @ 2017-09-12 19:45 UTC (permalink / raw)
  To: munroesj; +Cc: Rajalakshmi Srinivasaraghavan, libc-alpha

On 09/12/2017 09:21 PM, Steven Munroe wrote:
> On Tue, 2017-09-12 at 19:04 +0200, Florian Weimer wrote:
>> On 09/12/2017 04:16 PM, Steven Munroe wrote:
>>> On Tue, 2017-09-12 at 16:08 +0200, Florian Weimer wrote:
>>>> * Steven Munroe:
>>>>
>>>>>> This means that GCC introduced an unaligned store, no matter how memset
>>>>>> was implemented.
>>>>>>
>>>>> C will do whatever the programmer wants. We can not stop that.
>>>>
>>>> That's not true.  If some specification says that for POWER, mem* must
>>>> behave in a certain way, and the GCC/glibc combination does not do
>>>> that, that's a bug on POWER.
>>>>
>>> What is the bug that you think we are not fixing?
>>
>> memset, as called by the C programmer, still uses unaligned stores.

Please look at my example and its disassembly.

> We are not going to version every loop that might contain stores based
> on speculation that someone who does not know what they are doing might
> access Cache Inhibited storage.

You need to remove optimizations from GCC which expand memset calls
using other instructions if those expansions do not compensate for the
possibility of unaligned stores.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-12 19:45 ` Florian Weimer
@ 2017-09-12 20:25 ` Steven Munroe
  0 siblings, 0 replies; 31+ messages in thread
From: Steven Munroe @ 2017-09-12 20:25 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Rajalakshmi Srinivasaraghavan, libc-alpha

On Tue, 2017-09-12 at 21:45 +0200, Florian Weimer wrote:
> On 09/12/2017 09:21 PM, Steven Munroe wrote:
> > On Tue, 2017-09-12 at 19:04 +0200, Florian Weimer wrote:
> >> On 09/12/2017 04:16 PM, Steven Munroe wrote:
> >>> On Tue, 2017-09-12 at 16:08 +0200, Florian Weimer wrote:
> >>>> * Steven Munroe:
> >>>>
> >>>>>> This means that GCC introduced an unaligned store, no matter how memset
> >>>>>> was implemented.
> >>>>>>
> >>>>> C will do whatever the programmer wants. We can not stop that.
> >>>>
> >>>> That's not true.  If some specification says that for POWER, mem* must
> >>>> behave in a certain way, and the GCC/glibc combination does not do
> >>>> that, that's a bug on POWER.
> >>>>
> >>> What is the bug that you think we are not fixing?
> >>
> >> memset, as called by the C programmer, still uses unaligned stores.
>
> Please look at my example and its disassembly.
>
> > We are not going to version every loop that might contain stores based
> > on speculation that someone who does not know what they are doing might
> > access Cache Inhibited storage.
>
> You need to remove optimizations from GCC which expand memset calls
> using other instructions if those expansions do not compensate for the
> possibility of unaligned stores.
>

No, the programmer should use -fno-builtin-memset if that programmer
knows he is accessing cache inhibited space.

To be clear, this is not new: cache coherent and cache inhibited storage
have been in the PowerISA from the beginning.

So why all the fuss now?

^ permalink raw reply	[flat|nested] 31+ messages in thread
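For readers following the device-memory argument, here is a minimal sketch of what a byte-wise fill routine for cache-inhibited regions could look like. It is not glibc or GCC code: device_memset is only the hypothetical name floated earlier in the thread, and routing every store through a volatile-qualified pointer is an assumption about how one might keep the compiler from merging the byte stores into wider, possibly unaligned ones.

```c
#include <stddef.h>

/* Hypothetical helper, not part of glibc or GCC.  Every store goes
   through a volatile-qualified pointer, so the compiler has to perform
   one byte-sized store per iteration and cannot replace the loop with
   a memset call or with wider stores.  Byte stores are always
   naturally aligned.  */
void *
device_memset (volatile void *dst, int c, size_t n)
{
  volatile unsigned char *p = dst;

  for (size_t i = 0; i < n; i++)
    p[i] = (unsigned char) c;

  /* Drop the qualifier only for the return value, mirroring memset.  */
  return (void *) dst;
}
```

Building with -fno-builtin-memset, as suggested above, addresses a different part of the problem: it keeps GCC from expanding memset calls inline, so the call reaches the library routine instead.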
* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-12 10:30 ` Florian Weimer
  2017-09-12 12:18 ` Zack Weinberg
  2017-09-12 13:38 ` Steven Munroe
@ 2017-09-13 13:12 ` Tulio Magno Quites Machado Filho
  2017-09-18 13:54 ` Florian Weimer
  2 siblings, 1 reply; 31+ messages in thread
From: Tulio Magno Quites Machado Filho @ 2017-09-13 13:12 UTC (permalink / raw)
  To: Florian Weimer, Rajalakshmi Srinivasaraghavan; +Cc: libc-alpha

Florian Weimer <fweimer@redhat.com> writes:

> On 08/18/2017 11:10 AM, Florian Weimer wrote:
>> On 08/18/2017 08:51 AM, Rajalakshmi Srinivasaraghavan wrote:
>>>
>>>
>>> On 08/18/2017 11:51 AM, Florian Weimer wrote:
>>>> On 08/18/2017 07:11 AM, Rajalakshmi Srinivasaraghavan wrote:
>>>>>     * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
>>>>>     for unaligned inputs if size is less than 8.
>>>>
>>>> This makes me rather nervous.  powerpc64le was supposed to have
>>>> reasonable efficient unaligned loads and stores.  GCC happily generates
>>>> them, too.
>>>
>>> This is meant ONLY for caching inhibited accesses.  Caching Inhibited
>>> accesses are required to be Guarded and properly aligned.
>>
>> The intent is to support memset for such memory regions, right?  This
>> change is insufficient.  You have to fix GCC as well because it will
>> inline memset of unaligned pointers, like this:
>
> Here's a more complete example:
>
>
> #include <assert.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> typedef long __attribute__ ((aligned(1))) long_unaligned;
>
> __attribute__ ((noinline, noclone, weak))
> void
> clear (long_unaligned *p)
> {
>   memset (p, 0, sizeof (*p));
> }
>
> struct data
> {
>   char misalign;
>   long_unaligned data;
> };
>
> int
> main (void)
> {
>   struct data *data = malloc (sizeof (*data));
>   assert (data != NULL);
>   long_unaligned *p = &data->data;
>   printf ("pointer: %p\n", p);
>   clear (p);
>   return 0;
> }
>
> The clear function compiles to:
>
> typedef long __attribute__ ((aligned(1))) long_unaligned;
>
> void
> clear (long_unaligned *p)
> {
>   memset (p, 0, sizeof (*p));
> }
>
> At run time, I get:
>
> pointer: 0x10003c10011
>
> This means that GCC introduced an unaligned store, no matter how memset
> was implemented.

Which isn't necessarily a problem.
The performance penalty only appears when the memory access refers to
an address which isn't at the instruction's natural boundary. In this
case, memset should use stb to avoid an alignment interrupt.

Notice that if the memory access is not at the natural boundary, an
alignment interrupt is generated and it won't generate an error. The
access will still happen, but it will have a performance penalty.

> So I think the implementation constraint on the mem* functions is wrong.
> It leads to a slower implementation of the mem* function for most of
> userspace which does not access device memory, and even for device
> memory, it is probably not what you want.

Makes sense. But as there is nothing in the standard allowing or prohibiting
the usage of mem* functions to access caching-inhibited memory, I thought it
would make sense to provide functions that are as generic as possible.

IMHO, it's easier for programmers to use generic functions in most scenarios
and have access to specialized functions, e.g. a function for data already
aligned at 16 bytes.

-- 
Tulio Magno

^ permalink raw reply	[flat|nested] 31+ messages in thread
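To make the "natural boundary" point concrete, the sketch below plans a sequence of stores that are each aligned to their own width, using 8, 4, 2 and 1 bytes to stand for std, stw, sth and stb. It is an illustration only: the function names are hypothetical, and the greedy widest-first choice is an assumption, not the exact sequence the patch uses for its 8-byte case.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the widest store (8, 4, 2 or 1 bytes) that is naturally
   aligned at ADDR and does not write past the REMAINING bytes.
   Hypothetical helper for illustration.  */
size_t
widest_aligned_store (uintptr_t addr, size_t remaining)
{
  for (size_t width = 8; width > 1; width /= 2)
    if (addr % width == 0 && remaining >= width)
      return width;
  return 1;
}

/* Fill WIDTHS with the store sizes used to cover N bytes starting at
   ADDR and return how many stores were planned.  WIDTHS needs room
   for N entries, since the worst case is N byte stores.  */
size_t
plan_aligned_stores (uintptr_t addr, size_t n, size_t *widths)
{
  size_t count = 0;

  while (n > 0)
    {
      size_t w = widest_aligned_store (addr, n);
      widths[count++] = w;
      addr += w;
      n -= w;
    }
  return count;
}
```

For an 8-byte fill starting at an odd address such as 0x11, this plan is one byte store, one halfword store, one word store and a final byte store, so no store crosses its natural boundary.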
* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-13 13:12 ` Tulio Magno Quites Machado Filho
@ 2017-09-18 13:54 ` Florian Weimer
  2017-10-03 18:29 ` Adhemerval Zanella
  0 siblings, 1 reply; 31+ messages in thread
From: Florian Weimer @ 2017-09-18 13:54 UTC (permalink / raw)
  To: Tulio Magno Quites Machado Filho, Rajalakshmi Srinivasaraghavan
  Cc: libc-alpha

On 09/13/2017 03:12 PM, Tulio Magno Quites Machado Filho wrote:
>> So I think the implementation constraint on the mem* functions is wrong.
>>   It leads to a slower implementation of the mem* function for most of
>> userspace which does not access device memory, and even for device
>> memory, it is probably not what you want.
> Makes sense. But as there is nothing in the standard allowing or prohibiting
> the usage of mem* functions to access caching-inhibited memory, I thought it
> would make sense to provide functions that are as generic as possible.

But I have shown that you aren't doing that because of the GCC
optimization which inlines the memset call.

But I won't continue this conversation as I don't see it particularly
useful to anyone.  In the end, you are the architecture maintainers, and
you should do what you think is best.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-18 13:54 ` Florian Weimer
@ 2017-10-03 18:29 ` Adhemerval Zanella
  2017-10-05 12:13 ` Rajalakshmi Srinivasaraghavan
  2017-11-08 18:52 ` Tulio Magno Quites Machado Filho
  0 siblings, 2 replies; 31+ messages in thread
From: Adhemerval Zanella @ 2017-10-03 18:29 UTC (permalink / raw)
  To: Florian Weimer, Tulio Magno Quites Machado Filho,
	Rajalakshmi Srinivasaraghavan
  Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 2791 bytes --]

On 18/09/2017 10:54, Florian Weimer wrote:
> On 09/13/2017 03:12 PM, Tulio Magno Quites Machado Filho wrote:
>>> So I think the implementation constraint on the mem* functions is wrong.
>>>   It leads to a slower implementation of the mem* function for most of
>>> userspace which does not access device memory, and even for device
>>> memory, it is probably not what you want.
>> Makes sense. But as there is nothing in the standard allowing or prohibiting
>> the usage of mem* functions to access caching-inhibited memory, I thought it
>> would make sense to provide functions that are as generic as possible.
>
> But I have shown that you aren't doing that because of the GCC optimization which inlines the memset call.
>
> But I won't continue this conversation as I don't see it particularly useful to anyone.  In the end, you are the architecture maintainers, and you should do what you think is best.
>
> Thanks,
> Florian

I think one way to provide a slightly better memcpy implementation for POWER8
and still be able to avoid unaligned accesses on non-cacheable memory is to
use tunables.

The branch azanella/memcpy-power8 [1] has a power8 memcpy optimization which
uses unaligned loads and stores. I created it some time ago but never actually
sent it upstream. It shows better performance on both bench-memcpy and
bench-memcpy-random (about 10% on the latter) and mixed results on
bench-memcpy-large (which is mainly dominated by memory throughput, and in the
environment I am using, a shared PowerKVM instance, the results do not seem to
be reliable).

It could use some tuning, especially in some of the ranges I used for
unrolling the loads/stores. It also does not take care of unaligned accesses
that cross a page boundary (which tend to be quite slow on current hardware,
but are also uncommon with the usual current page size of 64k).

This first patch does not enable this option as a default for POWER8, it
just adds it to the string tests as an option. The second patch changes the
selection to:

  1. If glibc is configured with tunables, set the new implementation as the
     default for ISA 2.07 (power8).

  2. Also if tunables are active, add the parameter glibc.tune.aligned_memopt
     to disable the new implementation selection.

So programs that rely on aligned loads can set:

GLIBC_TUNABLES=glibc.tune.aligned_memopt=1

And then the memcpy ifunc selection would pick the power7 one, which uses
only aligned loads and stores.

This is an RFC patch, and if the idea sounds good to the powerpc arch
maintainers I can work on finishing the patch with more comments and send it
upstream.

I tried to apply the same unaligned idea to memset and memmove, but I could
not get any real improvement in either.
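The selection rule described above can be sketched in plain C as follows. This is a simplified illustration, not the RFC patch: the real selection happens in the memcpy ifunc resolver, the two stand-in functions below are hypothetical placeholders for the power7 and power8 assembly routines, and collapsing every non-ISA-2.07 case to the power7 routine is an assumption made to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_fn) (void *, const void *, size_t);

/* Hypothetical stand-ins for the real assembly routines; both simply
   forward to memcpy so the example stays self-contained.  */
void *
memcpy_power7_standin (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

void *
memcpy_power8_standin (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

/* Pick the unaligned-access power8 routine only when the CPU supports
   ISA 2.07 and the user has not asked for aligned accesses through the
   proposed glibc.tune.aligned_memopt tunable.  */
memcpy_fn
select_memcpy (bool has_isa_2_07, bool aligned_memopt)
{
  if (has_isa_2_07 && !aligned_memopt)
    return memcpy_power8_standin;
  return memcpy_power7_standin;
}
```

With this shape, setting the tunable only changes which routine is picked at load time, so programs that never touch cache-inhibited memory pay nothing for the fallback.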
[1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/memcpy-power8 [-- Attachment #2: bench-memcpy-random.out --] [-- Type: text/plain, Size: 769 bytes --] { "timing_type": "hp_timing", "functions": { "memcpy": { "bench-variant": "random", "ifuncs": ["__memcpy_power8", "__memcpy_power7", "__memcpy_a2", "__memcpy_power6", "__memcpy_power4", "__memcpy_ppc"], "results": [ { "max-size": 4096, "timings": [21935.9, 22892, 35018.3, 27433.9, 27430.6, 27156.3] }, { "max-size": 8192, "timings": [19792.4, 23740.1, 34017, 27218.6, 26828.5, 24779] }, { "max-size": 16384, "timings": [20685.2, 23021.5, 34795.3, 27244.5, 26756.5, 27921.6] }, { "max-size": 32768, "timings": [20024.8, 22553.9, 34411.3, 28268.6, 26792.3, 27380.6] }, { "max-size": 65536, "timings": [20758.8, 23063.2, 38000.7, 28168.8, 27867.9, 25706.8] }] } } } [-- Attachment #3: bench-memcpy.out --] [-- Type: text/plain, Size: 67126 bytes --] { "timing_type": "hp_timing", "functions": { "memcpy": { "bench-variant": "default", "ifuncs": ["builtin_memcpy", "simple_memcpy", "__memcpy_power8", "__memcpy_power7", "__memcpy_a2", "__memcpy_power6", "__memcpy_power4", "__memcpy_ppc"], "results": [ { "length": 1, "align1": 0, "align2": 0, "timings": [153.906, 5.25, 4.0625, 7.64062, 12.3125, 14.7656, 14.5156, 9.48438] }, { "length": 1, "align1": 0, "align2": 0, "timings": [6.51562, 4.98438, 4.23438, 6.46875, 9.70312, 10.7188, 10.7188, 6.78125] }, { "length": 1, "align1": 0, "align2": 0, "timings": [6.51562, 5.15625, 3.9375, 6.26562, 9.54688, 10.6719, 10.5156, 6.39062] }, { "length": 1, "align1": 0, "align2": 0, "timings": [6.39062, 4.89062, 3.92188, 6.375, 9.1875, 10.7031, 10.75, 6.39062] }, { "length": 2, "align1": 0, "align2": 0, "timings": [6.90625, 3.82812, 4.5625, 5.96875, 9.54688, 11.3438, 10.7656, 7.07812] }, { "length": 2, "align1": 1, "align2": 0, "timings": [6.51562, 3.51562, 4.375, 5.3125, 9.5625, 11.125, 10.8281, 6.40625] }, { "length": 2, "align1": 0, "align2": 1, "timings": [6.78125, 5, 
3.95312, 5.3125, 9.54688, 10.9062, 10.125, 6.70312] }, { "length": 2, "align1": 1, "align2": 1, "timings": [6.75, 3.65625, 4.14062, 5.54688, 9.51562, 11.0625, 10.5469, 6.42188] }, { "length": 4, "align1": 0, "align2": 0, "timings": [6.6875, 5.35938, 4.21875, 5.95312, 9.73438, 11.0156, 10.7656, 7] }, { "length": 4, "align1": 2, "align2": 0, "timings": [6.59375, 5.17188, 4.0625, 5.625, 9.64062, 10.9219, 10.8281, 6.40625] }, { "length": 4, "align1": 0, "align2": 2, "timings": [6.46875, 5.03125, 4.21875, 5.5, 9.67188, 10.7031, 10.8438, 6.45312] }, { "length": 4, "align1": 2, "align2": 2, "timings": [6.54688, 5, 4.15625, 5.5625, 9.59375, 10.5781, 10.8438, 6.65625] }, { "length": 8, "align1": 0, "align2": 0, "timings": [7.3125, 8.42188, 4.78125, 4.64062, 6, 9.25, 9.09375, 4.625] }, { "length": 8, "align1": 3, "align2": 0, "timings": [7.28125, 8.03125, 4.8125, 4.42188, 5.875, 8.90625, 8.9375, 4.875] }, { "length": 8, "align1": 0, "align2": 3, "timings": [7, 7.98438, 4.67188, 4.23438, 5.70312, 8.8125, 8.57812, 4.85938] }, { "length": 8, "align1": 3, "align2": 3, "timings": [7.23438, 7.85938, 4.79688, 4.26562, 5.85938, 8.60938, 8.78125, 4.84375] }, { "length": 16, "align1": 0, "align2": 0, "timings": [6.25, 23.3594, 3.92188, 7.1875, 23.8281, 10.5938, 12.3438, 8.95312] }, { "length": 16, "align1": 4, "align2": 0, "timings": [6.07812, 23.5625, 3.78125, 6.70312, 23.3438, 10.5938, 12.1562, 8.10938] }, { "length": 16, "align1": 0, "align2": 4, "timings": [6.20312, 23.4219, 3.95312, 6.40625, 33.5625, 10.4531, 12.0312, 8.40625] }, { "length": 16, "align1": 4, "align2": 4, "timings": [6.25, 23.3125, 3.84375, 6.51562, 32.9688, 10.5, 12, 8.42188] }, { "length": 32, "align1": 0, "align2": 0, "timings": [6.20312, 32.2031, 3.75, 9.0625, 23.7656, 8.48438, 7.10938, 8.17188] }, { "length": 32, "align1": 5, "align2": 0, "timings": [5.95312, 31.9688, 3.85938, 6.57812, 23.375, 18.6875, 12.1094, 9.60938] }, { "length": 32, "align1": 0, "align2": 5, "timings": [6.23438, 31.7188, 3.95312, 
8.26562, 28.2656, 14.9688, 12.9531, 11.4375] }, { "length": 32, "align1": 5, "align2": 5, "timings": [6.375, 31.9844, 3.76562, 8.78125, 27.7188, 9.09375, 6.60938, 9.15625] }, { "length": 64, "align1": 0, "align2": 0, "timings": [7.07812, 55.6094, 4.51562, 8.53125, 25.1875, 8.73438, 8.14062, 8.92188] }, { "length": 64, "align1": 6, "align2": 0, "timings": [7.17188, 55.75, 4.625, 7.5, 24.6562, 19, 16.125, 12] }, { "length": 64, "align1": 0, "align2": 6, "timings": [7.10938, 55.5781, 4.54688, 8.23438, 30.0625, 19.375, 19, 13.9844] }, { "length": 64, "align1": 6, "align2": 6, "timings": [6.84375, 55.7812, 4.29688, 8.46875, 29.5781, 8.875, 7.98438, 9.89062] }, { "length": 128, "align1": 0, "align2": 0, "timings": [10.1875, 104.078, 7.17188, 10.6719, 33.375, 6.76562, 10.1719, 11.9375] }, { "length": 128, "align1": 7, "align2": 0, "timings": [8.9375, 103.641, 7.21875, 10, 31.5312, 21.7656, 20.3594, 21.875] }, { "length": 128, "align1": 0, "align2": 7, "timings": [9.1875, 103.516, 6.67188, 10.9375, 32.7188, 22.0781, 22.0469, 18.1562] }, { "length": 128, "align1": 7, "align2": 7, "timings": [8.92188, 103.828, 6.51562, 9.01562, 32.0938, 9.89062, 9.64062, 11.75] }, { "length": 256, "align1": 0, "align2": 0, "timings": [11.125, 199.516, 9.71875, 12.6406, 35.4062, 9.375, 14.8125, 16.8438] }, { "length": 256, "align1": 8, "align2": 0, "timings": [11.5469, 200.297, 9.14062, 15.0156, 35.3438, 8.96875, 14.3438, 15.7344] }, { "length": 256, "align1": 0, "align2": 8, "timings": [10.8594, 200.234, 7.84375, 16.4844, 50.1719, 9.17188, 14.5312, 15.5469] }, { "length": 256, "align1": 8, "align2": 8, "timings": [10.6406, 199.344, 7.90625, 10.9375, 49.3906, 9.34375, 14.125, 15.8281] }, { "length": 512, "align1": 0, "align2": 0, "timings": [16.1406, 394.062, 12.5625, 16.625, 44.625, 17.0156, 26.1562, 29.5156] }, { "length": 512, "align1": 9, "align2": 0, "timings": [14.6719, 395.766, 12.3125, 25.7812, 45.2969, 45.9531, 44.5625, 52.0469] }, { "length": 512, "align1": 0, "align2": 9, 
"timings": [14.7969, 394.875, 11.8906, 28.4062, 58.0781, 46.4844, 46.625, 53.9688] }, { "length": 512, "align1": 9, "align2": 9, "timings": [14.5156, 394.672, 12.5625, 13.625, 56.2188, 18.2344, 28.4688, 29.9531] }, { "length": 1024, "align1": 0, "align2": 0, "timings": [23.4375, 787.422, 19.4062, 29.1719, 56.125, 35.1406, 44.0781, 47.3438] }, { "length": 1024, "align1": 10, "align2": 0, "timings": [22.6406, 781.578, 19.4688, 42.0781, 57.2031, 77.7344, 76.0469, 91.8281] }, { "length": 1024, "align1": 0, "align2": 10, "timings": [22.8438, 780, 19.8125, 43.4062, 71.4531, 79.3281, 78.7031, 93.7344] }, { "length": 1024, "align1": 10, "align2": 10, "timings": [21.7031, 784.812, 19.5938, 24.9375, 67.8906, 38.4219, 45.6875, 47.5781] }, { "length": 2048, "align1": 0, "align2": 0, "timings": [41.9219, 1575.08, 39.1875, 44.6719, 88.7031, 66.8906, 75.625, 79.6094] }, { "length": 2048, "align1": 11, "align2": 0, "timings": [42.0938, 1568.55, 39.8594, 73.9062, 92.0312, 142.828, 139.266, 169.688] }, { "length": 2048, "align1": 0, "align2": 11, "timings": [42.8594, 1556.59, 40.0625, 76.5156, 109.656, 141.547, 142, 172.188] }, { "length": 2048, "align1": 11, "align2": 11, "timings": [40.9688, 1559.66, 38.6094, 42.2031, 101.938, 70.625, 76.4062, 78.9375] }, { "length": 4096, "align1": 0, "align2": 0, "timings": [75.7031, 3117.66, 70.1875, 77.2969, 156.703, 132.562, 139.469, 142.438] }, { "length": 4096, "align1": 12, "align2": 0, "timings": [87.4531, 3124.62, 85.6719, 137.453, 181.906, 351.281, 260.047, 317.625] }, { "length": 4096, "align1": 0, "align2": 12, "timings": [95.375, 3228.45, 93.0469, 139.812, 179.641, 258.766, 256, 323.047] }, { "length": 4096, "align1": 12, "align2": 12, "timings": [109.906, 3613.47, 107.359, 75.4375, 171.453, 135.234, 142.406, 142.859] }, { "length": 8192, "align1": 0, "align2": 0, "timings": [140.156, 6294.27, 135.547, 141.969, 292.047, 260.062, 266.172, 269.781] }, { "length": 8192, "align1": 13, "align2": 0, "timings": [168.172, 6270.34, 165.844, 
264.703, 339.953, 484.297, 479.812, 612.125] }, { "length": 8192, "align1": 0, "align2": 13, "timings": [179.281, 6167.27, 135.5, 200.328, 261.109, 342.625, 337.828, 407.719] }, { "length": 8192, "align1": 13, "align2": 13, "timings": [112.781, 3590.89, 95.4375, 78.4531, 170.625, 146.688, 149.062, 150.438] }, { "length": 16384, "align1": 0, "align2": 0, "timings": [149.25, 6959.89, 146.656, 151.75, 313.969, 287.203, 290.875, 292.859] }, { "length": 16384, "align1": 14, "align2": 0, "timings": [184.672, 7171.08, 183.453, 289.938, 362.469, 525.375, 521.75, 661.656] }, { "length": 16384, "align1": 0, "align2": 14, "timings": [193.766, 7127.11, 192.062, 290.516, 351.25, 526.719, 522.234, 663.406] }, { "length": 16384, "align1": 14, "align2": 14, "timings": [169.203, 6986.52, 167.734, 150.062, 322.141, 288.453, 291.031, 292.891] }, { "length": 32768, "align1": 0, "align2": 0, "timings": [291.828, 16318.4, 336.234, 296.516, 615.844, 571.281, 575.281, 577.266] }, { "length": 32768, "align1": 15, "align2": 0, "timings": [368, 15069.5, 364.469, 574.172, 716.656, 1027.11, 1029.91, 1312.67] }, { "length": 32768, "align1": 0, "align2": 15, "timings": [379.422, 15164.2, 377.938, 575.594, 687.625, 1029.11, 1025.59, 1312.98] }, { "length": 32768, "align1": 15, "align2": 15, "timings": [311.281, 15046.3, 309.719, 298.016, 623.141, 572.75, 575.781, 576.75] }, { "length": 65536, "align1": 0, "align2": 0, "timings": [608.219, 30219.4, 593.5, 676.047, 1244.56, 1152.98, 1188.02, 1168.06] }, { "length": 65536, "align1": 16, "align2": 0, "timings": [656.047, 30360.4, 595.766, 630.266, 1235.25, 1158.94, 1169.88, 1169.11] }, { "length": 65536, "align1": 0, "align2": 16, "timings": [700.203, 30408.5, 604.453, 637.156, 1272.98, 1158.08, 1171.02, 1169.58] }, { "length": 65536, "align1": 16, "align2": 16, "timings": [603.094, 31151.9, 595.812, 630.578, 1244.88, 1159.19, 1415.02, 1169.95] }, { "length": 0, "align1": 0, "align2": 0, "timings": [4.28125, 2.95312, 2.60938, 3.73438, 5.5625, 
6.85938, 6.73438, 4.26562] }, { "length": 0, "align1": 0, "align2": 0, "timings": [4.0625, 2.76562, 2.60938, 3.5625, 5.59375, 6.46875, 6.23438, 4.04688] }, { "length": 0, "align1": 0, "align2": 0, "timings": [3.8125, 2.73438, 2.60938, 3.54688, 5.73438, 6.375, 6.46875, 4.10938] }, { "length": 0, "align1": 0, "align2": 0, "timings": [3.9375, 2.9375, 2.60938, 3.46875, 5.79688, 6.21875, 6.48438, 4.0625] }, { "length": 1, "align1": 0, "align2": 0, "timings": [3.65625, 2.92188, 2.375, 3.5625, 5.40625, 6.10938, 6.23438, 3.79688] }, { "length": 1, "align1": 1, "align2": 0, "timings": [3.54688, 2.84375, 2.3125, 3.39062, 5.34375, 6.03125, 6.07812, 3.57812] }, { "length": 1, "align1": 0, "align2": 1, "timings": [3.60938, 2.8125, 2.20312, 3.375, 5.15625, 6.0625, 6.10938, 3.73438] }, { "length": 1, "align1": 1, "align2": 1, "timings": [3.51562, 2.85938, 2.28125, 3.59375, 5.25, 6.0625, 6.01562, 3.57812] }, { "length": 2, "align1": 0, "align2": 0, "timings": [3.59375, 1.92188, 2.20312, 3.26562, 5.35938, 6.14062, 6.15625, 3.79688] }, { "length": 2, "align1": 2, "align2": 0, "timings": [3.6875, 2, 2.21875, 3.07812, 5.375, 6.23438, 6.03125, 3.75] }, { "length": 2, "align1": 0, "align2": 2, "timings": [3.70312, 2.60938, 2.20312, 3.14062, 5.3125, 6, 6.03125, 3.75] }, { "length": 2, "align1": 2, "align2": 2, "timings": [3.59375, 1.96875, 2.20312, 3.125, 5.34375, 6.20312, 6.03125, 3.75] }, { "length": 3, "align1": 0, "align2": 0, "timings": [3.32812, 5.78125, 2.34375, 3.28125, 4.90625, 5.76562, 5.73438, 3.75] }, { "length": 3, "align1": 3, "align2": 0, "timings": [3.40625, 5.78125, 2.20312, 3.15625, 4.85938, 5.73438, 5.75, 3.48438] }, { "length": 3, "align1": 0, "align2": 3, "timings": [3.34375, 7.42188, 2.65625, 2.92188, 4.9375, 5.75, 5.84375, 3.48438] }, { "length": 3, "align1": 3, "align2": 3, "timings": [3.34375, 5.23438, 1.98438, 3.04688, 4.76562, 5.65625, 5.85938, 3.46875] }, { "length": 4, "align1": 0, "align2": 0, "timings": [3.95312, 2.71875, 2.4375, 3.28125, 5.35938, 5.9375, 
6.01562, 3.60938] }, { "length": 4, "align1": 4, "align2": 0, "timings": [3.71875, 2.875, 2.21875, 3.14062, 5.40625, 6.01562, 5.96875, 3.625] }, { "length": 4, "align1": 0, "align2": 4, "timings": [3.625, 2.8125, 2.25, 3.10938, 5.21875, 5.8125, 6.03125, 3.73438] }, { "length": 4, "align1": 4, "align2": 4, "timings": [3.64062, 2.79688, 2.35938, 3.125, 5.35938, 6.04688, 6, 3.75] }, { "length": 5, "align1": 0, "align2": 0, "timings": [3.51562, 3.60938, 2.17188, 3.1875, 5.125, 5.6875, 5.79688, 3.625] }, { "length": 5, "align1": 5, "align2": 0, "timings": [3.375, 3.42188, 2.0625, 3.07812, 4.89062, 5.57812, 5.67188, 3.4375] }, { "length": 5, "align1": 0, "align2": 5, "timings": [3.39062, 3.20312, 2.04688, 3.125, 4.92188, 5.375, 5.65625, 3.35938] }, { "length": 5, "align1": 5, "align2": 5, "timings": [3.32812, 3.07812, 2.60938, 3.125, 4.79688, 5.51562, 5.57812, 3.375] }, { "length": 6, "align1": 0, "align2": 0, "timings": [3.46875, 6.92188, 2, 2.85938, 5.1875, 5.5625, 5.90625, 3.26562] }, { "length": 6, "align1": 6, "align2": 0, "timings": [3.42188, 7.03125, 2, 2.70312, 5.15625, 5.42188, 5.71875, 3.4375] }, { "length": 6, "align1": 0, "align2": 6, "timings": [3.39062, 6.85938, 1.85938, 2.5, 5.03125, 5.3125, 5.6875, 3.32812] }, { "length": 6, "align1": 6, "align2": 6, "timings": [3.26562, 6.92188, 2.01562, 2.73438, 5.0625, 5.5, 5.25, 3.32812] }, { "length": 7, "align1": 0, "align2": 0, "timings": [3.17188, 4.29688, 3.0625, 2.79688, 4.64062, 5.375, 5.48438, 3.125] }, { "length": 7, "align1": 7, "align2": 0, "timings": [3.20312, 3.98438, 2.92188, 2.71875, 4.65625, 5.3125, 5.34375, 3.14062] }, { "length": 7, "align1": 0, "align2": 7, "timings": [3, 4.03125, 2.75, 2.85938, 4.65625, 5.25, 5.45312, 3] }, { "length": 7, "align1": 7, "align2": 7, "timings": [3.125, 3.92188, 2.51562, 2.6875, 4.64062, 5.125, 5.20312, 3.07812] }, { "length": 8, "align1": 0, "align2": 0, "timings": [4.14062, 4.48438, 2.71875, 2.4375, 3.45312, 5.26562, 5, 2.76562] }, { "length": 8, "align1": 8, 
"align2": 0, "timings": [3.9375, 4.4375, 2.54688, 2.46875, 3.32812, 5.32812, 4.96875, 2.54688] }, { "length": 8, "align1": 0, "align2": 8, "timings": [3.89062, 4.375, 2.57812, 2.28125, 3.28125, 5.23438, 4.96875, 2.67188] }, { "length": 8, "align1": 8, "align2": 8, "timings": [3.85938, 4.40625, 2.71875, 2.34375, 3.14062, 5.1875, 4.875, 2.67188] }, { "length": 9, "align1": 0, "align2": 0, "timings": [3.875, 5.17188, 2.46875, 3.85938, 2.95312, 6.85938, 6.60938, 4.48438] }, { "length": 9, "align1": 9, "align2": 0, "timings": [3.59375, 4.95312, 2.35938, 3.42188, 2.82812, 7.17188, 7.26562, 5.21875] }, { "length": 9, "align1": 0, "align2": 9, "timings": [3.84375, 4.90625, 2.34375, 3.79688, 2.82812, 6.54688, 6.4375, 4.28125] }, { "length": 9, "align1": 9, "align2": 9, "timings": [3.71875, 4.84375, 2.32812, 2.96875, 2.84375, 7.125, 7.125, 5.03125] }, { "length": 10, "align1": 0, "align2": 0, "timings": [3.98438, 5.75, 2.53125, 3.54688, 3.01562, 6.82812, 6.375, 4.46875] }, { "length": 10, "align1": 10, "align2": 0, "timings": [3.84375, 5.6875, 2.45312, 4.21875, 2.84375, 6.21875, 7.23438, 4.90625] }, { "length": 10, "align1": 0, "align2": 10, "timings": [3.79688, 5.70312, 2.35938, 3.39062, 2.6875, 6.875, 6.29688, 4.28125] }, { "length": 10, "align1": 10, "align2": 10, "timings": [3.82812, 5.75, 2.34375, 3.90625, 2.67188, 6.15625, 7.20312, 4.89062] }, { "length": 11, "align1": 0, "align2": 0, "timings": [3.60938, 6.35938, 2.01562, 3.46875, 2.46875, 6.53125, 6.04688, 4.03125] }, { "length": 11, "align1": 11, "align2": 0, "timings": [3.39062, 6.14062, 2.21875, 3.78125, 2.46875, 7.21875, 6.6875, 4.54688] }, { "length": 11, "align1": 0, "align2": 11, "timings": [3.3125, 5.73438, 2.1875, 3.39062, 2.46875, 6.45312, 5.98438, 3.85938] }, { "length": 11, "align1": 11, "align2": 11, "timings": [3.59375, 5.71875, 2.1875, 3.98438, 2.5, 7.20312, 6.60938, 4.60938] }, { "length": 12, "align1": 0, "align2": 0, "timings": [3.875, 6.59375, 2.65625, 3.5, 3.21875, 5.64062, 6.40625, 4.46875] }, { 
"length": 12, "align1": 12, "align2": 0, "timings": [3.82812, 6.3125, 2.375, 3.28125, 2.96875, 5.57812, 6.34375, 4.15625] }, { "length": 12, "align1": 0, "align2": 12, "timings": [3.64062, 6.1875, 2.40625, 3.26562, 2.98438, 5.34375, 6.20312, 4.25] }, { "length": 12, "align1": 12, "align2": 12, "timings": [3.84375, 6.1875, 2.40625, 3.09375, 3, 5.4375, 6.125, 4.15625] }, { "length": 13, "align1": 0, "align2": 0, "timings": [3.65625, 6.85938, 2.875, 3.25, 2.6875, 6.42188, 5.95312, 3.79688] }, { "length": 13, "align1": 13, "align2": 0, "timings": [3.46875, 6.67188, 2.82812, 3.42188, 2.57812, 7.82812, 7.1875, 5.29688] }, { "length": 13, "align1": 0, "align2": 13, "timings": [3.5, 6.54688, 2.54688, 3.28125, 2.53125, 6.32812, 5.875, 3.70312] }, { "length": 13, "align1": 13, "align2": 13, "timings": [3.57812, 6.57812, 2.70312, 3.29688, 2.59375, 7.57812, 7.03125, 4.82812] }, { "length": 14, "align1": 0, "align2": 0, "timings": [3.78125, 10.5156, 2.26562, 3.04688, 2.73438, 6.65625, 5.875, 3.9375] }, { "length": 14, "align1": 14, "align2": 0, "timings": [3.64062, 10.3438, 2.3125, 3.75, 2.625, 5.8125, 6.9375, 4.85938] }, { "length": 14, "align1": 0, "align2": 14, "timings": [3.70312, 10.3438, 2.15625, 3.01562, 2.46875, 6.60938, 5.96875, 4.25] }, { "length": 14, "align1": 14, "align2": 14, "timings": [3.59375, 10.3125, 2.73438, 3.60938, 2.5625, 5.89062, 6.53125, 4.71875] }, { "length": 15, "align1": 0, "align2": 0, "timings": [3.26562, 10.7188, 3.04688, 3.09375, 2.89062, 6.1875, 5.6875, 3.42188] }, { "length": 15, "align1": 15, "align2": 0, "timings": [3.375, 10.7188, 2.85938, 3.5, 2.8125, 7.0625, 6.26562, 4.34375] }, { "length": 15, "align1": 0, "align2": 15, "timings": [3.35938, 10.7344, 2.78125, 3.07812, 2.95312, 6.1875, 5.60938, 3.28125] }, { "length": 15, "align1": 15, "align2": 15, "timings": [3.21875, 10.75, 2.95312, 3.42188, 2.64062, 6.60938, 6.01562, 4.23438] }, { "length": 16, "align1": 0, "align2": 0, "timings": [3.54688, 12.9219, 2.3125, 3.92188, 13.2031, 6, 7, 
4.84375] }, { "length": 16, "align1": 16, "align2": 0, "timings": [3.39062, 13.0938, 2.1875, 3.73438, 13.0469, 5.65625, 6.89062, 4.60938] }, { "length": 16, "align1": 0, "align2": 16, "timings": [3.53125, 13.0156, 2.20312, 3.625, 13.0625, 5.8125, 6.78125, 4.8125] }, { "length": 16, "align1": 16, "align2": 16, "timings": [3.53125, 12.9844, 2.20312, 3.6875, 13.1562, 5.89062, 6.53125, 4.6875] }, { "length": 17, "align1": 0, "align2": 0, "timings": [3.53125, 13.3281, 2.10938, 3.70312, 12.8281, 6.90625, 6.15625, 4.29688] }, { "length": 17, "align1": 17, "align2": 0, "timings": [3.39062, 13.7344, 2.14062, 3.25, 12.6719, 7.51562, 6.875, 5.23438] }, { "length": 17, "align1": 0, "align2": 17, "timings": [3.5, 13.6562, 2.20312, 3.875, 17.9531, 6.6875, 6.5, 4.125] }, { "length": 17, "align1": 17, "align2": 17, "timings": [3.46875, 13.6875, 2.14062, 3.01562, 17.5469, 7.20312, 6.65625, 4.59375] }, { "length": 18, "align1": 0, "align2": 0, "timings": [3.46875, 14, 2.23438, 3.4375, 12.2969, 7.09375, 6.51562, 4.20312] }, { "length": 18, "align1": 18, "align2": 0, "timings": [3.375, 13.8594, 2.1875, 4.23438, 12.2812, 6.29688, 7.25, 5.29688] }, { "length": 18, "align1": 0, "align2": 18, "timings": [3.39062, 13.9531, 2.20312, 3.39062, 18.3125, 6.78125, 6.29688, 4.23438] }, { "length": 18, "align1": 18, "align2": 18, "timings": [3.59375, 13.7656, 2.15625, 4.07812, 18.2031, 6.17188, 6.92188, 5.0625] }, { "length": 19, "align1": 0, "align2": 0, "timings": [3.375, 9.70312, 2.07812, 3.42188, 12.3594, 6.70312, 6.28125, 4.04688] }, { "length": 19, "align1": 19, "align2": 0, "timings": [3.4375, 9.45312, 2.07812, 3.65625, 12.2812, 7.46875, 6.70312, 4.65625] }, { "length": 19, "align1": 0, "align2": 19, "timings": [3.34375, 9.375, 2.07812, 3.39062, 18.25, 6.65625, 6.23438, 3.9375] }, { "length": 19, "align1": 19, "align2": 19, "timings": [3.375, 9.45312, 2.23438, 3.60938, 17.8438, 7.09375, 6.39062, 4.40625] }, { "length": 20, "align1": 0, "align2": 0, "timings": [3.35938, 9.875, 2.20312, 
3.34375, 12.5312, 5.76562, 6.64062, 4.23438] }, { "length": 20, "align1": 20, "align2": 0, "timings": [3.39062, 9.75, 2.0625, 3.3125, 12.3281, 5.65625, 6.32812, 4.09375] }, { "length": 20, "align1": 0, "align2": 20, "timings": [3.46875, 9.82812, 2.15625, 3.21875, 16.1094, 5.67188, 6.25, 4.3125] }, { "length": 20, "align1": 20, "align2": 20, "timings": [3.39062, 9.76562, 2.17188, 3.23438, 16.0469, 5.48438, 6.35938, 4.25] }, { "length": 21, "align1": 0, "align2": 0, "timings": [3.375, 13.3438, 2.1875, 3.15625, 12.375, 6.46875, 5.9375, 3.95312] }, { "length": 21, "align1": 21, "align2": 0, "timings": [3.53125, 13.2969, 2.07812, 3.57812, 12.3281, 7.9375, 7.26562, 5.15625] }, { "length": 21, "align1": 0, "align2": 21, "timings": [3.46875, 13.1875, 2.14062, 3.39062, 16.0312, 6.39062, 6, 3.875] }, { "length": 21, "align1": 21, "align2": 21, "timings": [3.42188, 15.1562, 2.07812, 3.375, 15.6406, 7.60938, 7.10938, 4.71875] }, { "length": 22, "align1": 0, "align2": 0, "timings": [3.40625, 13.5781, 2.125, 3.07812, 12.0625, 6.42188, 5.9375, 3.90625] }, { "length": 22, "align1": 22, "align2": 0, "timings": [3.54688, 13.6875, 2.20312, 3.82812, 12, 5.875, 6.9375, 4.85938] }, { "length": 22, "align1": 0, "align2": 22, "timings": [3.32812, 13.5938, 2.20312, 2.90625, 16.0469, 6.39062, 6.07812, 3.76562] }, { "length": 22, "align1": 22, "align2": 22, "timings": [3.59375, 13.5781, 2.21875, 3.5, 15.7969, 5.48438, 6.64062, 4.59375] }, { "length": 23, "align1": 0, "align2": 0, "timings": [3.46875, 14.0312, 2.14062, 3.20312, 11.7812, 6.0625, 5.67188, 3.59375] }, { "length": 23, "align1": 23, "align2": 0, "timings": [3.34375, 14.1094, 2.17188, 3.29688, 11.5469, 6.96875, 6.03125, 4.46875] }, { "length": 23, "align1": 0, "align2": 23, "timings": [3.40625, 14.0781, 2.15625, 3.09375, 15.8438, 6.07812, 5.70312, 3.67188] }, { "length": 23, "align1": 23, "align2": 23, "timings": [3.39062, 14.1094, 2.17188, 3, 15.4688, 6.75, 6.14062, 4.34375] }, { "length": 24, "align1": 0, "align2": 0, "timings": 
[3.57812, 14.5469, 2.1875, 3.89062, 10.0156, 5.6875, 6.5625, 4.45312] }, { "length": 24, "align1": 24, "align2": 0, "timings": [3.5625, 14.5, 2.17188, 3.57812, 9.95312, 5.23438, 6.39062, 4.15625] }, { "length": 24, "align1": 0, "align2": 24, "timings": [3.45312, 14.4062, 2.07812, 3.64062, 16.6406, 5.6875, 6.21875, 4.32812] }, { "length": 24, "align1": 24, "align2": 24, "timings": [3.40625, 14.4375, 2.21875, 3.64062, 16.5781, 5.6875, 6.34375, 4.34375] }, { "length": 25, "align1": 0, "align2": 0, "timings": [3.48438, 14.8906, 2.20312, 3.76562, 9.70312, 6.39062, 6.21875, 3.71875] }, { "length": 25, "align1": 25, "align2": 0, "timings": [3.54688, 14.8906, 2.21875, 3.23438, 9.5, 7.5, 6.78125, 4.96875] }, { "length": 25, "align1": 0, "align2": 25, "timings": [3.51562, 14.9219, 2.09375, 3.53125, 15.9531, 6.48438, 6.21875, 4.125] }, { "length": 25, "align1": 25, "align2": 25, "timings": [3.57812, 14.8281, 2.09375, 2.84375, 15.6719, 7.20312, 6.78125, 4.67188] }, { "length": 26, "align1": 0, "align2": 0, "timings": [3.39062, 15.3906, 2.07812, 3.25, 9.625, 6.45312, 5.90625, 3.95312] }, { "length": 26, "align1": 26, "align2": 0, "timings": [3.35938, 15.3906, 2.09375, 3.73438, 9.46875, 5.92188, 7.09375, 4.98438] }, { "length": 26, "align1": 0, "align2": 26, "timings": [3.53125, 15.2812, 2.125, 3.14062, 15.9844, 6.60938, 5.875, 3.73438] }, { "length": 26, "align1": 26, "align2": 26, "timings": [3.59375, 15.4375, 2.09375, 3.57812, 15.9062, 5.78125, 6.875, 5.0625] }, { "length": 27, "align1": 0, "align2": 0, "timings": [3.59375, 15.75, 2.23438, 3.53125, 9.1875, 6.3125, 5.875, 3.8125] }, { "length": 27, "align1": 27, "align2": 0, "timings": [3.48438, 15.7344, 2.125, 3.8125, 9.04688, 6.9375, 6.375, 4.54688] }, { "length": 27, "align1": 0, "align2": 27, "timings": [3.39062, 15.7188, 2.07812, 3.34375, 15.9844, 6.0625, 5.71875, 3.6875] }, { "length": 27, "align1": 27, "align2": 27, "timings": [3.40625, 15.75, 2.23438, 3.46875, 15.6875, 6.79688, 6.125, 4.4375] }, { "length": 28, 
"align1": 0, "align2": 0, "timings": [3.48438, 16.0625, 2.21875, 3.1875, 9.89062, 5.32812, 6.15625, 4.20312] }, { "length": 28, "align1": 28, "align2": 0, "timings": [3.5, 16.1406, 2.09375, 3.15625, 9.75, 5.09375, 5.75, 4.03125] }, { "length": 28, "align1": 0, "align2": 28, "timings": [3.5625, 16.2188, 2.1875, 3, 13.9062, 5.14062, 5.92188, 3.90625] }, { "length": 28, "align1": 28, "align2": 28, "timings": [3.35938, 16.1562, 2.15625, 3.0625, 13.7656, 5.125, 5.98438, 3.78125] }, { "length": 29, "align1": 0, "align2": 0, "timings": [3.40625, 16.5312, 2.20312, 3.17188, 9.21875, 6.20312, 5.5, 3.59375] }, { "length": 29, "align1": 29, "align2": 0, "timings": [3.39062, 16.625, 2.07812, 3.57812, 8.9375, 7.625, 6.78125, 4.89062] }, { "length": 29, "align1": 0, "align2": 29, "timings": [3.46875, 16.5312, 2.17188, 3.26562, 13.3125, 6.1875, 5.45312, 3.48438] }, { "length": 29, "align1": 29, "align2": 29, "timings": [3.32812, 16.5625, 2.0625, 3.29688, 13, 7.3125, 6.73438, 4.67188] }, { "length": 30, "align1": 0, "align2": 0, "timings": [3.53125, 16.9688, 2.04688, 2.9375, 9.17188, 6.39062, 5.70312, 3.78125] }, { "length": 30, "align1": 30, "align2": 0, "timings": [3.40625, 16.9688, 2.20312, 3.3125, 9.125, 5.65625, 6.46875, 4.78125] }, { "length": 30, "align1": 0, "align2": 30, "timings": [3.32812, 16.9219, 2.07812, 2.79688, 13.6094, 6.45312, 5.85938, 3.71875] }, { "length": 30, "align1": 30, "align2": 30, "timings": [3.54688, 16.875, 2.07812, 3.21875, 13.4531, 5.4375, 6.39062, 4.25] }, { "length": 31, "align1": 0, "align2": 0, "timings": [3.60938, 17.6406, 2.20312, 2.875, 8.875, 6.04688, 5.26562, 3.28125] }, { "length": 31, "align1": 31, "align2": 0, "timings": [3.5, 17.6562, 2.10938, 3.21875, 8.70312, 6.5, 6.07812, 4.29688] }, { "length": 31, "align1": 0, "align2": 31, "timings": [3.5625, 17.6094, 2.15625, 2.76562, 13.1719, 5.82812, 5.32812, 3.125] }, { "length": 31, "align1": 31, "align2": 31, "timings": [3.39062, 17.6094, 2.0625, 3.15625, 13.0156, 6.6875, 5.73438, 3.95312] }, 
{ "length": 48, "align1": 0, "align2": 0, "timings": [3.98438, 24.4531, 2.40625, 4.625, 13.8125, 4.29688, 3.79688, 4.67188] }, { "length": 48, "align1": 3, "align2": 0, "timings": [3.78125, 24.4375, 2.5, 3.34375, 13.3438, 10.0312, 9.07812, 6.15625] }, { "length": 48, "align1": 0, "align2": 3, "timings": [3.85938, 24.4531, 2.48438, 4.64062, 15.7969, 11.2969, 10.4375, 7.23438] }, { "length": 48, "align1": 3, "align2": 3, "timings": [3.73438, 24.4531, 2.5, 4.89062, 15.6406, 5.57812, 4.09375, 5.40625] }, { "length": 80, "align1": 0, "align2": 0, "timings": [5.20312, 37.8125, 3.23438, 4.48438, 14.6094, 5.0625, 4.29688, 5.29688] }, { "length": 80, "align1": 5, "align2": 0, "timings": [4.53125, 37.8438, 3.25, 4.03125, 14.3281, 10.5938, 9.32812, 7.23438] }, { "length": 80, "align1": 0, "align2": 5, "timings": [4.67188, 37.9219, 3.1875, 5.4375, 16.9062, 11.4844, 11.0156, 8.5] }, { "length": 80, "align1": 5, "align2": 5, "timings": [4.51562, 37.8594, 3.01562, 4.90625, 16.6562, 5.32812, 5.03125, 5.85938] }, { "length": 96, "align1": 0, "align2": 0, "timings": [4.875, 44.4531, 3.34375, 4.45312, 14.9531, 4.76562, 5.07812, 5.6875] }, { "length": 96, "align1": 6, "align2": 0, "timings": [4.79688, 44.4375, 3.26562, 4.73438, 14.6406, 10.8281, 10.2344, 7.875] }, { "length": 96, "align1": 0, "align2": 6, "timings": [4.5625, 44.4531, 3, 5.375, 17.7031, 11, 10.9375, 8.70312] }, { "length": 96, "align1": 6, "align2": 6, "timings": [4.57812, 44.4219, 2.96875, 4.65625, 17.2188, 5.15625, 5.23438, 5.73438] }, { "length": 112, "align1": 0, "align2": 0, "timings": [5.46875, 51.1562, 3.84375, 4.45312, 15.4062, 4.625, 5.01562, 5.6875] }, { "length": 112, "align1": 7, "align2": 0, "timings": [5.23438, 51.1562, 3.82812, 4.26562, 14.9531, 12.1094, 10.875, 8.625] }, { "length": 112, "align1": 0, "align2": 7, "timings": [5.07812, 51.1875, 3.57812, 6.17188, 17.7031, 12.5, 11.8594, 9.51562] }, { "length": 112, "align1": 7, "align2": 7, "timings": [4.89062, 51.125, 3.6875, 4.76562, 17.4219, 5.34375, 
5.4375, 6.42188] }, { "length": 144, "align1": 0, "align2": 0, "timings": [5.75, 64.625, 3.98438, 5.54688, 22.7656, 5.10938, 5.53125, 6.5625] }, { "length": 144, "align1": 9, "align2": 0, "timings": [5.625, 64.625, 4.09375, 5.51562, 22.25, 12.8594, 12, 12.7031] }, { "length": 144, "align1": 0, "align2": 9, "timings": [5.46875, 64.6719, 3.9375, 7.03125, 23.1406, 13.9062, 13.3125, 14.0625] }, { "length": 144, "align1": 9, "align2": 9, "timings": [5.92188, 64.6406, 4.34375, 5.89062, 22.9062, 6.71875, 6, 7.15625] }, { "length": 160, "align1": 0, "align2": 0, "timings": [5.5625, 71.125, 4.29688, 5.5, 22.9531, 5.3125, 6.65625, 7.35938] }, { "length": 160, "align1": 10, "align2": 0, "timings": [5.64062, 71.5312, 4.29688, 6.23438, 22.7031, 13.5469, 12.5781, 13.2031] }, { "length": 160, "align1": 0, "align2": 10, "timings": [5.14062, 71.1562, 3.75, 6.71875, 23.75, 14.2188, 13.4844, 14.7969] }, { "length": 160, "align1": 10, "align2": 10, "timings": [5.3125, 71.6562, 3.65625, 5.75, 23.2031, 6.98438, 6.29688, 7.6875] }, { "length": 176, "align1": 0, "align2": 0, "timings": [6.32812, 78.0625, 4.57812, 5.35938, 23.2344, 4.96875, 6.14062, 7.125] }, { "length": 176, "align1": 11, "align2": 0, "timings": [5.8125, 77.9531, 4.70312, 6.07812, 22.625, 14.7344, 13.6875, 14.25] }, { "length": 176, "align1": 0, "align2": 11, "timings": [5.78125, 78.2031, 4.42188, 7.45312, 23.4688, 15.0156, 14.2031, 15.1406] }, { "length": 176, "align1": 11, "align2": 11, "timings": [5.42188, 77.9844, 4.28125, 5.46875, 22.7969, 7.1875, 7.10938, 7.53125] }, { "length": 192, "align1": 0, "align2": 0, "timings": [5.64062, 84.6719, 3.40625, 5.875, 23.4844, 5.21875, 7.26562, 7.625] }, { "length": 192, "align1": 12, "align2": 0, "timings": [4.95312, 84.7031, 3.375, 6.96875, 23.0156, 14.9688, 14.3438, 14.9531] }, { "length": 192, "align1": 0, "align2": 12, "timings": [5.98438, 84.5625, 4.10938, 7.65625, 24.4531, 16.0938, 16.25, 16.2031] }, { "length": 192, "align1": 12, "align2": 12, "timings": [5.45312, 84.875, 
4.23438, 5.79688, 23.9688, 6.9375, 7.54688, 7.92188] }, { "length": 208, "align1": 0, "align2": 0, "timings": [5.64062, 91.6406, 4.03125, 5.54688, 23.8906, 5.1875, 6.71875, 7.65625] }, { "length": 208, "align1": 13, "align2": 0, "timings": [5.29688, 91.875, 3.9375, 6.34375, 23.5938, 16, 15.0625, 15.7344] }, { "length": 208, "align1": 0, "align2": 13, "timings": [5.59375, 91.8125, 4.23438, 8.07812, 26.8281, 16.25, 16.2812, 17.2812] }, { "length": 208, "align1": 13, "align2": 13, "timings": [5.21875, 91.4375, 4.15625, 5.6875, 26.6094, 6.6875, 7.34375, 8.1875] }, { "length": 224, "align1": 0, "align2": 0, "timings": [5.34375, 98.1719, 3.875, 5.39062, 24.3594, 5.35938, 7.78125, 8.60938] }, { "length": 224, "align1": 14, "align2": 0, "timings": [5.28125, 98.5781, 3.9375, 7.10938, 23.8438, 16.0469, 15.3906, 16.3906] }, { "length": 224, "align1": 0, "align2": 14, "timings": [5.5, 98.3125, 4.17188, 8.0625, 28.0312, 16.2812, 16.4531, 17.125] }, { "length": 224, "align1": 14, "align2": 14, "timings": [5.45312, 98.2812, 4.125, 5.34375, 27.6562, 6.48438, 7.8125, 8.42188] }, { "length": 240, "align1": 0, "align2": 0, "timings": [6.23438, 105.109, 4.375, 5.15625, 24.6406, 5.32812, 7.60938, 8.3125] }, { "length": 240, "align1": 15, "align2": 0, "timings": [5.71875, 104.781, 4.29688, 7.1875, 24.3594, 17, 15.9062, 17.0938] }, { "length": 240, "align1": 0, "align2": 15, "timings": [5.875, 105.484, 4.3125, 8.76562, 28.2969, 17.1875, 16.6562, 17.6094] }, { "length": 240, "align1": 15, "align2": 15, "timings": [5.8125, 104.719, 4.34375, 5.78125, 28.1875, 7.09375, 8.03125, 8.92188] }, { "length": 272, "align1": 0, "align2": 0, "timings": [6.79688, 118.125, 5.3125, 6.64062, 25.0938, 5.875, 8.23438, 9.15625] }, { "length": 272, "align1": 17, "align2": 0, "timings": [6.5, 118.25, 5.4375, 7.625, 24.6562, 17.4688, 16.5156, 18.4531] }, { "length": 272, "align1": 0, "align2": 17, "timings": [7.21875, 118.719, 5.23438, 9.60938, 28.1094, 18.9375, 18.25, 19.7969] }, { "length": 272, "align1": 17, 
"align2": 17, "timings": [6.34375, 118.625, 5.01562, 7.67188, 27.2031, 7, 9.09375, 9.78125] }, { "length": 288, "align1": 0, "align2": 0, "timings": [6.45312, 126.078, 5.29688, 6.85938, 24.9688, 5.9375, 9.42188, 10.0469] }, { "length": 288, "align1": 18, "align2": 0, "timings": [6.625, 125.172, 4.875, 8.54688, 24.9219, 17.7188, 17.1406, 19.0625] }, { "length": 288, "align1": 0, "align2": 18, "timings": [6.625, 125.625, 4.96875, 9.0625, 27.9375, 18.9531, 18.375, 20.5469] }, { "length": 288, "align1": 18, "align2": 18, "timings": [6.46875, 125.516, 4.92188, 6.76562, 27.3125, 7.48438, 9.51562, 10.125] }, { "length": 304, "align1": 0, "align2": 0, "timings": [7.10938, 131.75, 5.59375, 6.60938, 24.3594, 6.07812, 9.35938, 10.2969] }, { "length": 304, "align1": 19, "align2": 0, "timings": [6.90625, 131.875, 5.625, 8.20312, 24.3594, 18.4844, 17.7344, 19.6562] }, { "length": 304, "align1": 0, "align2": 19, "timings": [7.59375, 132.594, 5.28125, 12.5, 28.6406, 19.3125, 18.6406, 20.9688] }, { "length": 304, "align1": 19, "align2": 19, "timings": [6.9375, 131.906, 5.21875, 6.78125, 27.6562, 7.5, 9.45312, 10.5781] }, { "length": 320, "align1": 0, "align2": 0, "timings": [6.375, 138.938, 4.70312, 6.53125, 24.9688, 6.60938, 10.5938, 10.5156] }, { "length": 320, "align1": 20, "align2": 0, "timings": [6.14062, 138.672, 4.65625, 9.0625, 24.8125, 18.8906, 18.0625, 20.4219] }, { "length": 320, "align1": 0, "align2": 20, "timings": [7.85938, 139.281, 5.14062, 10.5938, 29.0781, 19.8125, 19.5781, 22.2344] }, { "length": 320, "align1": 20, "align2": 20, "timings": [6.76562, 138.891, 5.10938, 6.96875, 28.4531, 7.54688, 10.9531, 11] }, { "length": 336, "align1": 0, "align2": 0, "timings": [6.59375, 145.422, 4.96875, 6.51562, 25.3594, 6.75, 9.70312, 10.9375] }, { "length": 336, "align1": 21, "align2": 0, "timings": [6.1875, 145.047, 5.07812, 8.71875, 25.4844, 19.8125, 19, 21.3281] }, { "length": 336, "align1": 0, "align2": 21, "timings": [7.01562, 145.75, 5.34375, 10.7969, 29.2812, 20.2031, 
20.1094, 22.9375] }, { "length": 336, "align1": 21, "align2": 21, "timings": [6.1875, 145.391, 4.95312, 6.96875, 28.25, 7.29688, 10.4062, 11.3281] }, { "length": 352, "align1": 0, "align2": 0, "timings": [6.23438, 152.562, 5.04688, 5.92188, 25.6719, 6.82812, 10.8438, 12.0781] }, { "length": 352, "align1": 22, "align2": 0, "timings": [6.20312, 151.938, 4.95312, 11.1094, 25.75, 20.375, 19.5938, 21.8594] }, { "length": 352, "align1": 0, "align2": 22, "timings": [7.25, 152.141, 5.07812, 12.0625, 29.6719, 20.4688, 20.8594, 23] }, { "length": 352, "align1": 22, "align2": 22, "timings": [6.10938, 153.438, 4.9375, 6.5, 29.1719, 7.45312, 10.7031, 11.4531] }, { "length": 368, "align1": 0, "align2": 0, "timings": [7.29688, 159.375, 5.32812, 6.15625, 26.1406, 7.10938, 10.8438, 12.375] }, { "length": 368, "align1": 23, "align2": 0, "timings": [7, 159.328, 5.34375, 9.64062, 26.2969, 21.3125, 20.2031, 22.5625] }, { "length": 368, "align1": 0, "align2": 23, "timings": [7.04688, 158.891, 5.29688, 12.4531, 32.1094, 21.5, 20.9062, 23.6875] }, { "length": 368, "align1": 23, "align2": 23, "timings": [6.96875, 158.781, 5.35938, 6.8125, 30.7812, 7.65625, 10.9531, 11.8594] }, { "length": 384, "align1": 0, "align2": 0, "timings": [7.64062, 166.25, 6.07812, 8.15625, 22.3125, 7.57812, 12.2188, 13.4688] }, { "length": 384, "align1": 24, "align2": 0, "timings": [7.40625, 166.016, 5.48438, 12, 21.8281, 7.3125, 12.0156, 13.3594] }, { "length": 384, "align1": 0, "align2": 24, "timings": [6.6875, 166, 5.4375, 12.2344, 27.9688, 7.42188, 12.1562, 13.4062] }, { "length": 384, "align1": 24, "align2": 24, "timings": [6.95312, 166.859, 5.46875, 7.26562, 27.1719, 7.26562, 12.1406, 13.4219] }, { "length": 400, "align1": 0, "align2": 0, "timings": [7.89062, 171.234, 6.3125, 7.51562, 27.5312, 7.85938, 12.4688, 13.625] }, { "length": 400, "align1": 25, "align2": 0, "timings": [7.76562, 172.172, 6.25, 12.1875, 27.9062, 22.1094, 21.5469, 24.0625] }, { "length": 400, "align1": 0, "align2": 25, "timings": 
[7.84375, 173.031, 6.28125, 14.0781, 28.0625, 23.2188, 23.1094, 25.2812] }, { "length": 400, "align1": 25, "align2": 25, "timings": [7.29688, 173.172, 5.84375, 7.8125, 26.6562, 9.21875, 13.3594, 14.4219] }, { "length": 416, "align1": 0, "align2": 0, "timings": [7.59375, 178.797, 6.14062, 7.65625, 27.9844, 8.0625, 12.75, 14.4688] }, { "length": 416, "align1": 26, "align2": 0, "timings": [7.59375, 179.469, 6.1875, 12.6406, 28.2031, 22.625, 21.8594, 24.625] }, { "length": 416, "align1": 0, "align2": 26, "timings": [7.51562, 179.656, 5.9375, 13.5156, 27.8125, 23.8438, 23.1875, 25.9219] }, { "length": 416, "align1": 26, "align2": 26, "timings": [7.45312, 180.141, 5.76562, 7.45312, 26.4375, 9.10938, 13.3125, 14.8438] }, { "length": 432, "align1": 0, "align2": 0, "timings": [8.1875, 185.938, 6.75, 7.09375, 27.3438, 8.4375, 13.2344, 14.4375] }, { "length": 432, "align1": 27, "align2": 0, "timings": [7.9375, 186.531, 6.28125, 13.3125, 28.0312, 23.0781, 22.4375, 25.3906] }, { "length": 432, "align1": 0, "align2": 27, "timings": [7.79688, 186.703, 6.70312, 14.1406, 28.1875, 24.0469, 23.7969, 26.2656] }, { "length": 432, "align1": 27, "align2": 27, "timings": [7.71875, 186.391, 6.375, 7.8125, 26.3906, 9.10938, 13.7031, 14.6875] }, { "length": 448, "align1": 0, "align2": 0, "timings": [7.46875, 193.969, 5.71875, 8.125, 27.4688, 8.78125, 13.3906, 15.0938] }, { "length": 448, "align1": 28, "align2": 0, "timings": [6.82812, 193.094, 5.5625, 13.0469, 28.1094, 23.4219, 22.7656, 26] }, { "length": 448, "align1": 0, "align2": 28, "timings": [7.95312, 193.969, 6.34375, 14.5312, 28.9375, 24.75, 24.5938, 27.8594] }, { "length": 448, "align1": 28, "align2": 28, "timings": [7.76562, 192.328, 6.42188, 7.95312, 27.6719, 9.76562, 14.2812, 15.2344] }, { "length": 464, "align1": 0, "align2": 0, "timings": [7.54688, 199.109, 5.76562, 7.40625, 27.5312, 8.95312, 13.7969, 15.1875] }, { "length": 464, "align1": 29, "align2": 0, "timings": [7.1875, 199.625, 5.85938, 13.7969, 28.3906, 24.375, 23.25, 
27.0312] }, { "length": 464, "align1": 0, "align2": 29, "timings": [7.65625, 200.156, 6.625, 14.4531, 29.125, 25.0781, 24.7344, 28.5469] }, { "length": 464, "align1": 29, "align2": 29, "timings": [7.17188, 200.312, 5.875, 8.125, 27.5156, 9.375, 14.3438, 15.5156] }, { "length": 480, "align1": 0, "align2": 0, "timings": [7.125, 206.875, 5.79688, 7.65625, 28.0469, 9.25, 14.0781, 15.75] }, { "length": 480, "align1": 30, "align2": 0, "timings": [7.26562, 206.422, 5.78125, 13.625, 28.6875, 24.6094, 23.75, 27.5781] }, { "length": 480, "align1": 0, "align2": 30, "timings": [7.625, 207.656, 6.21875, 14.5625, 29.5312, 25.2812, 25.1719, 28.3125] }, { "length": 480, "align1": 30, "align2": 30, "timings": [7.17188, 206.641, 5.76562, 7.3125, 27.9844, 9.73438, 14.5625, 15.7031] }, { "length": 496, "align1": 0, "align2": 0, "timings": [7.92188, 215.156, 6.25, 7.15625, 28.0469, 9.375, 14.4062, 15.8594] }, { "length": 496, "align1": 31, "align2": 0, "timings": [8.0625, 212.906, 6.90625, 14.3125, 28.3906, 25.4219, 24.375, 28.1875] }, { "length": 496, "align1": 0, "align2": 31, "timings": [8.03125, 214.797, 6.6875, 15.1406, 33.1719, 26.2031, 25.8594, 29.0312] }, { "length": 496, "align1": 31, "align2": 31, "timings": [8.125, 213.375, 6.95312, 7.79688, 31.125, 10.4219, 15.1719, 16.4375] }, { "length": 1024, "align1": 0, "align2": 0, "timings": [13.0625, 438.641, 10.8906, 16.1406, 31.125, 19.6094, 24.5, 26.5625] }, { "length": 1024, "align1": 32, "align2": 0, "timings": [12.7656, 438.047, 10.8906, 15.8438, 30.6719, 19.5156, 24.4688, 26.4219] }, { "length": 1024, "align1": 0, "align2": 32, "timings": [12.6562, 438.875, 10.8594, 16.2656, 37.2344, 19.5469, 25.0781, 26.9531] }, { "length": 1024, "align1": 32, "align2": 32, "timings": [12.7812, 439.281, 10.875, 15.9219, 36.0156, 19.5625, 24.5312, 26.2969] }, { "length": 1056, "align1": 0, "align2": 0, "timings": [12.9219, 450.016, 11.2812, 15.6406, 36, 20.6875, 24.9688, 26.8438] }, { "length": 1056, "align1": 33, "align2": 0, "timings": 
[13.1719, 451.609, 11.6719, 24.3125, 36.875, 44.2344, 43.8594, 52.4219] }, { "length": 1056, "align1": 0, "align2": 33, "timings": [14.2344, 453.031, 12.3281, 26.0781, 41.0156, 45.1719, 45.4531, 53.5625] }, { "length": 1056, "align1": 33, "align2": 33, "timings": [13.1719, 449.547, 11.5156, 16.7812, 39.0312, 23.0312, 26.1719, 27.3594] }, { "length": 1088, "align1": 0, "align2": 0, "timings": [12.5781, 485.172, 10.875, 15.7344, 36.4219, 21.6562, 25.7344, 27.6562] }, { "length": 1088, "align1": 34, "align2": 0, "timings": [12.9375, 462.172, 10.9531, 24.875, 37.2812, 45.7031, 44.75, 53.5] }, { "length": 1088, "align1": 0, "align2": 34, "timings": [14.5469, 464.047, 12.2812, 25.8438, 41.875, 46.7031, 46.1406, 55.1406] }, { "length": 1088, "align1": 34, "align2": 34, "timings": [13.8125, 466.016, 11.6719, 15.9219, 39.5312, 23.2969, 26.4844, 27.7188] }, { "length": 1120, "align1": 0, "align2": 0, "timings": [12.7188, 480.078, 11.0625, 15.5, 36.75, 21.1094, 26.3281, 28.1875] }, { "length": 1120, "align1": 35, "align2": 0, "timings": [12.9688, 477.594, 11.75, 25.4062, 37.7812, 47.7656, 45.8594, 54.875] }, { "length": 1120, "align1": 0, "align2": 35, "timings": [14.3594, 477.312, 12.1562, 26.8281, 47.0156, 47.2031, 47.0312, 56] }, { "length": 1120, "align1": 35, "align2": 35, "timings": [12.9844, 479.438, 11.9219, 16.0781, 45.3594, 23.7969, 26.6875, 28.0938] }, { "length": 1152, "align1": 0, "align2": 0, "timings": [15.2188, 490.391, 13.6562, 17.0625, 33.2969, 22.1094, 26.75, 28.7969] }, { "length": 1152, "align1": 36, "align2": 0, "timings": [15.6719, 490.25, 14.25, 26.1094, 34.5, 47.75, 46.9219, 56.4531] }, { "length": 1152, "align1": 0, "align2": 36, "timings": [14.75, 497.047, 12.7344, 26.9062, 42.7969, 48.7812, 49.0625, 57.9844] }, { "length": 1152, "align1": 36, "align2": 36, "timings": [13.5781, 489.797, 11.8281, 15.5938, 41.0156, 23.6562, 28.125, 28.3906] }, { "length": 1184, "align1": 0, "align2": 0, "timings": [15.4688, 524.609, 13.875, 16.8594, 38.3594, 22.9219, 
27.2344, 29.3125] }, { "length": 1184, "align1": 37, "align2": 0, "timings": [15.7812, 505.344, 14.2812, 26.375, 39.5, 49.3281, 48.3125, 57.8594] }, { "length": 1184, "align1": 0, "align2": 37, "timings": [16.1719, 504.938, 14.5938, 27.5156, 43.0625, 50.7656, 49.5469, 59.6094] }, { "length": 1184, "align1": 37, "align2": 37, "timings": [15.3438, 504.969, 14.0156, 17.125, 41.0625, 25.0625, 27.5625, 28.7656] }, { "length": 1216, "align1": 0, "align2": 0, "timings": [14.9844, 516.484, 13.3906, 16.9375, 38.7344, 23.6406, 27.8906, 29.7188] }, { "length": 1216, "align1": 38, "align2": 0, "timings": [15.3125, 523.016, 13.9062, 27.0938, 39.3438, 50.5625, 49.1719, 59.2188] }, { "length": 1216, "align1": 0, "align2": 38, "timings": [17.1406, 589.031, 15.3906, 27.7812, 44.0938, 51.625, 50.7188, 60.0938] }, { "length": 1216, "align1": 38, "align2": 38, "timings": [15.7656, 522.156, 14.4531, 16.8281, 42.5625, 24.9375, 27.9375, 29.4844] }, { "length": 1248, "align1": 0, "align2": 0, "timings": [15.3906, 531.859, 13.5312, 16.6094, 39.1875, 23.1406, 28.4375, 30.375] }, { "length": 1248, "align1": 39, "align2": 0, "timings": [16, 533.922, 14.9375, 27.4062, 40.1562, 51.1562, 50.6094, 60.4844] }, { "length": 1248, "align1": 0, "align2": 39, "timings": [17.0312, 539.953, 15.8281, 28.5781, 46.8125, 51.375, 51.8125, 61.7812] }, { "length": 1248, "align1": 39, "align2": 39, "timings": [16, 533.938, 14.5469, 17.0469, 44.7344, 25.5781, 28.625, 30.0625] }, { "length": 1280, "align1": 0, "align2": 0, "timings": [17.1875, 577.375, 15.3906, 18.2656, 49.2031, 24.1094, 28.8594, 31.0938] }, { "length": 1280, "align1": 40, "align2": 0, "timings": [17.5, 549.016, 15.9844, 28.2188, 35.3906, 23.7188, 28.7812, 30.9062] }, { "length": 1280, "align1": 0, "align2": 40, "timings": [17.6094, 547.156, 15.75, 29.5781, 41.9062, 24.1562, 30.1406, 31.9219] }, { "length": 1280, "align1": 40, "align2": 40, "timings": [16.3125, 544.828, 15.1719, 17.4375, 41.2812, 23.9219, 28.9219, 30.8906] }, { "length": 1312, 
"align1": 0, "align2": 0, "timings": [17.25, 558.266, 15.3906, 17.9375, 40.7656, 25.0625, 29.5156, 31.3594] }, { "length": 1312, "align1": 41, "align2": 0, "timings": [17.5469, 562.672, 21.8281, 28.875, 41.8281, 53.0938, 52.3438, 63.5] }, { "length": 1312, "align1": 0, "align2": 41, "timings": [18.2656, 559.328, 16.6094, 30.2812, 43.125, 53.7344, 53.8906, 64.4219] }, { "length": 1312, "align1": 41, "align2": 41, "timings": [16.9219, 559.688, 15.7188, 18.6719, 41.3438, 27.2344, 30.6875, 31.7344] }, { "length": 1344, "align1": 0, "align2": 0, "timings": [16.9375, 577.328, 15.2188, 17.8438, 41.0625, 25.9062, 30.0938, 32.0156] }, { "length": 1344, "align1": 42, "align2": 0, "timings": [17.2344, 570.812, 15.7812, 29.6094, 42.4062, 56.0312, 53.7031, 64.9531] }, { "length": 1344, "align1": 0, "align2": 42, "timings": [18.7656, 573.234, 17.0469, 29.8594, 44.0625, 55.7656, 54.875, 65.5781] }, { "length": 1344, "align1": 42, "align2": 42, "timings": [17.5938, 571.172, 16.1875, 17.8281, 42.5, 27.9219, 30.7656, 32.1875] }, { "length": 1376, "align1": 0, "align2": 0, "timings": [17.1406, 587.203, 15.375, 17.6406, 41.4062, 25.375, 30.6719, 32.6875] }, { "length": 1376, "align1": 43, "align2": 0, "timings": [17.4375, 591.641, 15.9375, 30.3281, 43.0312, 55.2656, 54.6406, 66.0781] }, { "length": 1376, "align1": 0, "align2": 43, "timings": [18.2344, 585.75, 16.3906, 30.7344, 47.8125, 55.8594, 55.7031, 66.75] }, { "length": 1376, "align1": 43, "align2": 43, "timings": [16.9219, 589.609, 15.6562, 18.125, 44.5781, 27.9062, 30.875, 32.2656] }, { "length": 1408, "align1": 0, "align2": 0, "timings": [18.125, 601.797, 16.4688, 19.4219, 38.125, 26.2812, 31.0781, 33.1406] }, { "length": 1408, "align1": 44, "align2": 0, "timings": [18.4375, 603.312, 17.1562, 30.6562, 38.9219, 56.625, 55.7344, 67.5156] }, { "length": 1408, "align1": 0, "align2": 44, "timings": [18.8125, 606.578, 17.0156, 31.4219, 46.5625, 57.2344, 57.625, 68.6719] }, { "length": 1408, "align1": 44, "align2": 44, "timings": 
[17.5781, 662.484, 16.2344, 17.9844, 43.4531, 27.9844, 32.3906, 33.3281] }, { "length": 1440, "align1": 0, "align2": 0, "timings": [18.0938, 609.641, 16.7188, 19.125, 43.375, 27.125, 31.6875, 33.6094] }, { "length": 1440, "align1": 45, "align2": 0, "timings": [18.6562, 606.828, 22.9062, 31.3906, 44, 58.1875, 56.9375, 68.9219] }, { "length": 1440, "align1": 0, "align2": 45, "timings": [19.5469, 627.25, 18, 31.5, 46.6719, 59.3125, 58.1406, 70.4062] }, { "length": 1440, "align1": 45, "align2": 45, "timings": [18.3594, 609.484, 16.8594, 19.4219, 43.5469, 29.375, 32.0156, 33.25] }, { "length": 1472, "align1": 0, "align2": 0, "timings": [18.0312, 629.766, 16.3281, 18.9688, 43.5625, 28.0938, 32.2656, 34.4375] }, { "length": 1472, "align1": 46, "align2": 0, "timings": [18.2656, 651.969, 17.0312, 31.8906, 44.8438, 73.4219, 58.0469, 70.4844] }, { "length": 1472, "align1": 0, "align2": 46, "timings": [19.9375, 629.516, 18.4375, 31.9062, 46.3125, 60.4688, 59.2969, 70.8438] }, { "length": 1472, "align1": 46, "align2": 46, "timings": [18.5938, 631.625, 17.2344, 19.0312, 44.9531, 29.4531, 32.4688, 33.6719] }, { "length": 1504, "align1": 0, "align2": 0, "timings": [18.25, 646.125, 16.5156, 18.8438, 43.7969, 27.75, 32.9844, 34.7344] }, { "length": 1504, "align1": 47, "align2": 0, "timings": [18.5156, 646.016, 17.125, 32.4688, 46.0312, 59.8438, 59.1562, 71.3438] }, { "length": 1504, "align1": 0, "align2": 47, "timings": [19.5625, 649.047, 17.7188, 32.7812, 49.1562, 59.8594, 60.0938, 72.3438] }, { "length": 1504, "align1": 47, "align2": 47, "timings": [18.2344, 646.5, 16.8594, 19.125, 46.7969, 30.0625, 33.2344, 34.6094] }, { "length": 1536, "align1": 0, "align2": 0, "timings": [19.2812, 658.672, 17.4844, 20.4844, 40.125, 28.1875, 33.4688, 35.375] }, { "length": 1536, "align1": 48, "align2": 0, "timings": [18.8438, 655.109, 17.5938, 20.3594, 40.0469, 28.25, 33.3906, 35.4688] }, { "length": 1536, "align1": 0, "align2": 48, "timings": [18.7969, 658.188, 17.625, 20.4219, 45.875, 28.4375, 
34.6719, 36.5] }, { "length": 1536, "align1": 48, "align2": 48, "timings": [18.7969, 653.5, 17.5312, 20.375, 45.625, 28.0781, 33.375, 35.3125] }, { "length": 1568, "align1": 0, "align2": 0, "timings": [19.0781, 668.141, 17.75, 20.3438, 45.8594, 29.2812, 33.9375, 35.8594] }, { "length": 1568, "align1": 49, "align2": 0, "timings": [19.8594, 668.766, 18.3125, 33.6094, 47.0938, 61.9062, 61.4219, 74.6719] }, { "length": 1568, "align1": 0, "align2": 49, "timings": [20.7031, 669.219, 21.5469, 34.9688, 51.6719, 62.5625, 62.3438, 75.625] }, { "length": 1568, "align1": 49, "align2": 49, "timings": [19.1094, 671.594, 18.0156, 21.3281, 48.3125, 31.4062, 34.9688, 36.4844] }, { "length": 1600, "align1": 0, "align2": 0, "timings": [18.9219, 690.344, 17.4062, 20.0625, 46.0469, 30.1562, 34.5312, 36.4844] }, { "length": 1600, "align1": 50, "align2": 0, "timings": [19.6406, 690.359, 18.125, 33.9062, 47.0312, 63.2969, 62.2031, 75.6875] }, { "length": 1600, "align1": 0, "align2": 50, "timings": [21.4375, 681.328, 22.125, 34.7812, 51.8438, 64.3438, 63.9688, 76.8281] }, { "length": 1600, "align1": 50, "align2": 50, "timings": [19.8281, 679.891, 18.2344, 20.3438, 48.9062, 32.1562, 35.125, 36.375] }, { "length": 1632, "align1": 0, "align2": 0, "timings": [19.4219, 698.516, 17.4844, 19.9375, 46.5, 29.9688, 35.0781, 36.8594] }, { "length": 1632, "align1": 51, "align2": 0, "timings": [19.5625, 690.531, 18.2656, 34.75, 48.1094, 66.3125, 63.7656, 77.3125] }, { "length": 1632, "align1": 0, "align2": 51, "timings": [20.7188, 702.453, 19.75, 35.6875, 56.0625, 64.1875, 64.5625, 77.9062] }, { "length": 1632, "align1": 51, "align2": 51, "timings": [19.3281, 702.922, 17.8281, 20.5469, 52.9062, 32.6406, 35.5781, 36.6406] }, { "length": 1664, "align1": 0, "align2": 0, "timings": [20.4531, 713.938, 18.6094, 21.5469, 43.125, 30.5781, 35.6094, 37.5469] }, { "length": 1664, "align1": 52, "align2": 0, "timings": [20.75, 769.547, 19.625, 35.2188, 44.9688, 65.125, 64.7344, 78.6719] }, { "length": 1664, 
"align1": 0, "align2": 52, "timings": [21.5312, 716.344, 20.3438, 36.125, 53.6562, 65.8125, 66.1406, 79.3438] }, { "length": 1664, "align1": 52, "align2": 52, "timings": [19.7188, 713.531, 18.3281, 20.2344, 50.3906, 32.2812, 54.2812, 37.4062] }, { "length": 1696, "align1": 0, "align2": 0, "timings": [20.6094, 727.172, 18.8125, 21.4062, 47.7812, 31.6875, 36.2812, 38.2656] }, { "length": 1696, "align1": 53, "align2": 0, "timings": [20.8438, 732.984, 19.3906, 35.7969, 49.9688, 66.9375, 65.6562, 79.875] }, { "length": 1696, "align1": 0, "align2": 53, "timings": [22.125, 719.859, 22.0781, 36.5156, 54.6719, 66.75, 67.0469, 81.0156] }, { "length": 1696, "align1": 53, "align2": 53, "timings": [20.5469, 720, 18.9844, 21.7812, 50.4844, 33.6406, 36.3438, 37.7031] }, { "length": 1728, "align1": 0, "align2": 0, "timings": [20.2812, 738.047, 18.4062, 21.1875, 48.375, 32.5938, 36.8438, 38.7812] }, { "length": 1728, "align1": 54, "align2": 0, "timings": [20.6094, 738.234, 19.2969, 36.4375, 49.4375, 67.7969, 66.7188, 81.2656] }, { "length": 1728, "align1": 0, "align2": 54, "timings": [22.4375, 737.281, 23.4219, 37.0156, 53.8125, 69.1719, 67.875, 81.9219] }, { "length": 1728, "align1": 54, "align2": 54, "timings": [20.8594, 734.531, 19.5156, 21.4531, 51.6875, 33.75, 36.9062, 38.2188] }, { "length": 1760, "align1": 0, "align2": 0, "timings": [20.5469, 751.828, 18.625, 21.0156, 48.9062, 32.2656, 37.3281, 39.125] }, { "length": 1760, "align1": 55, "align2": 0, "timings": [20.9531, 749.047, 19.375, 36.9688, 50.5312, 68.6094, 67.9688, 82.6406] }, { "length": 1760, "align1": 0, "align2": 55, "timings": [22.3125, 756.469, 20.9219, 37.5, 55.7656, 68.6406, 69.3594, 83.4219] }, { "length": 1760, "align1": 55, "align2": 55, "timings": [20.4688, 748.219, 19.0156, 21.0938, 68.4688, 34.6562, 37.625, 38.875] }, { "length": 1792, "align1": 0, "align2": 0, "timings": [21.4688, 768.312, 19.6562, 22.7031, 45.4688, 32.7031, 37.8281, 39.7188] }, { "length": 1792, "align1": 56, "align2": 0, "timings": 
[21.7656, 765.891, 20.4531, 37.4844, 45.4844, 32.7656, 37.7344, 39.6719] }, { "length": 1792, "align1": 0, "align2": 56, "timings": [22.6094, 764.438, 20.8438, 38.6094, 51.3438, 33.0469, 39.9844, 41.8906] }, { "length": 1792, "align1": 56, "align2": 56, "timings": [20.875, 766.344, 19.4844, 21.9688, 50.7812, 32.8125, 37.8125, 39.5938] }, { "length": 1824, "align1": 0, "align2": 0, "timings": [21.75, 772.656, 19.7656, 22.4688, 50.2344, 33.8281, 38.4844, 40.3438] }, { "length": 1824, "align1": 57, "align2": 0, "timings": [21.9844, 777.922, 26.5156, 38.0312, 52.3281, 70.4219, 69.9688, 85.2812] }, { "length": 1824, "align1": 0, "align2": 57, "timings": [23.25, 781.812, 21.5938, 39.3281, 54.6875, 71.3125, 71.5, 86.6562] }, { "length": 1824, "align1": 57, "align2": 57, "timings": [21.4531, 776.578, 20.2188, 23.3438, 50.7031, 36.2188, 39.3594, 40.9062] }, { "length": 1856, "align1": 0, "align2": 0, "timings": [21.4062, 793.719, 19.625, 22.4375, 50.5469, 34.7656, 39.1094, 40.8281] }, { "length": 1856, "align1": 58, "align2": 0, "timings": [21.9688, 787.234, 20.3281, 38.5625, 52.4062, 73.5156, 71.25, 86.6875] }, { "length": 1856, "align1": 0, "align2": 58, "timings": [23.875, 796.328, 22.1094, 38.7031, 54.8281, 73.0938, 72.5781, 87.7656] }, { "length": 1856, "align1": 58, "align2": 58, "timings": [21.8125, 840.25, 20.6406, 22.5, 51.9219, 36.4844, 39.5156, 40.7969] }, { "length": 1888, "align1": 0, "align2": 0, "timings": [21.7969, 802.969, 19.7031, 22.1406, 51.2656, 34.4688, 39.5625, 41.375] }, { "length": 1888, "align1": 59, "align2": 0, "timings": [27.375, 802.938, 87.6562, 39.2188, 53.6406, 72.9688, 72.2188, 106.266] }, { "length": 1888, "align1": 0, "align2": 59, "timings": [23.2969, 811.766, 21.5469, 39.6875, 56.25, 73.2031, 73.3594, 88.8438] }, { "length": 1888, "align1": 59, "align2": 59, "timings": [21.5781, 805.219, 20.0156, 22.5469, 51.9062, 37.0781, 39.8594, 41.1719] }, { "length": 1920, "align1": 0, "align2": 0, "timings": [22.5625, 818.109, 20.75, 23.8281, 
47.7031, 35.25, 40.1875, 41.9688] }, { "length": 1920, "align1": 60, "align2": 0, "timings": [23.0625, 819.141, 21.8125, 39.7188, 49.0938, 74.0312, 73.3906, 89.5938] }, { "length": 1920, "align1": 0, "align2": 60, "timings": [23.9062, 817.953, 21.9375, 40.3594, 57.1562, 74.75, 75, 90.1406] }, { "length": 1920, "align1": 60, "align2": 60, "timings": [21.8125, 816.75, 20.4531, 22.4844, 52.6875, 36.6875, 41.2188, 42.2188] }, { "length": 1952, "align1": 0, "align2": 0, "timings": [22.75, 824.703, 21.0625, 23.6562, 52.8125, 36.1406, 40.5781, 42.5312] }, { "length": 1952, "align1": 61, "align2": 0, "timings": [23.2031, 832.562, 27.4844, 40.2969, 54.4219, 75.5625, 74.5625, 91.3281] }, { "length": 1952, "align1": 0, "align2": 61, "timings": [24.4219, 835.703, 22.9219, 40.6094, 56.7812, 76.7031, 75.4688, 90.6875] }, { "length": 1952, "align1": 61, "align2": 61, "timings": [22.5312, 834.641, 21.2812, 24.0469, 52.7812, 38.1562, 40.7812, 41.9844] }, { "length": 1984, "align1": 0, "align2": 0, "timings": [22.4688, 840.312, 20.6562, 23.5938, 53.1094, 36.9531, 41.1875, 43.1094] }, { "length": 1984, "align1": 62, "align2": 0, "timings": [23.0469, 832.453, 21.6562, 40.7812, 54.7812, 76.3906, 75.4062, 91.5312] }, { "length": 1984, "align1": 0, "align2": 62, "timings": [24.9375, 853.438, 23.375, 40.7969, 57.6875, 78.0469, 76.5938, 92.9844] }, { "length": 1984, "align1": 62, "align2": 62, "timings": [23.1719, 846.984, 21.6406, 23.5469, 54.0938, 38.125, 41.4844, 42.625] }, { "length": 2016, "align1": 0, "align2": 0, "timings": [22.7031, 855.812, 20.8594, 23.3594, 53.5625, 36.7188, 41.8594, 43.75] }, { "length": 2016, "align1": 63, "align2": 0, "timings": [28.2812, 856.766, 21.6406, 41.4219, 56.25, 76.9062, 76.7031, 93.5156] }, { "length": 2016, "align1": 0, "align2": 63, "timings": [24.2969, 861.422, 22.9219, 41.4688, 58.875, 77.1562, 77.7812, 94.75] }, { "length": 2016, "align1": 63, "align2": 63, "timings": [22.7656, 853.516, 21.2188, 23.7812, 54.7031, 38.7812, 42, 43.1875] }, { 
"length": 65536, "align1": 0, "align2": 0, "timings": [609.469, 30388.2, 604.344, 637.438, 1236.55, 1157.64, 1169.56, 1231.25] }] } } } ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] powerpc: Use aligned stores in memset 2017-10-03 18:29 ` Adhemerval Zanella @ 2017-10-05 12:13 ` Rajalakshmi Srinivasaraghavan 2017-11-08 18:52 ` Tulio Magno Quites Machado Filho 1 sibling, 0 replies; 31+ messages in thread From: Rajalakshmi Srinivasaraghavan @ 2017-10-05 12:13 UTC (permalink / raw) To: Adhemerval Zanella, Tulio Magno Quites Machado Filho; +Cc: libc-alpha On 10/03/2017 11:59 PM, Adhemerval Zanella wrote: > I think one way to provide a slight better memcpy implementation for POWER8 > and still be able to circumvent the non-aligned on non-cacheable memory > is to use tunables. > > The branch azanella/memcpy-power8 [1] has a power8 memcpy optimization which > uses unaligned load and stores that I created some time ago but never actually > send upstream. It shows better performance on both bench-memcpy and > bench-memcpy-random (about 10% on latter) and mixed results on bench-memcpy-large > (which it is mainly dominated by memory throughput and on the environment I am > using, a shared PowerKVM instance, the results does not seem to be reliable). > > It could use some tunning, specially on some the range I used for unrolling > the load/stores and it also does not care for unaligned access on cross-page > boundary (which tend to be quite slow on current hardware, but also on > current page size of usual 64k also uncommon). > > This first patch does not enable this option as a default for POWER8, it just > add on string tests as an option. The second patch changes the selection to: > > 1. If glibc is configure with tunables, set the new implementation as the > default for ISA 2.07 (power8). > > 2. Also if tunable is active, add the parameter glibc.tune.aligned_memopt > to disable the new implementation selection. > > So programs that rely on aligned loads can set: > > GLIBC_TUNABLES=glibc.tune.aligned_memopt=1 > > And then the memcpy ifunc selection would pick the power7 one which uses > only aligned load and stores. 
> > This is a RFC patch and if the idea sounds to powerpc arch mantainers I can > work on finishing the patch with more comments and send upstream. I tried > to apply same unaligned idea for memset and memmove, but I could get any real > improvement in neither. > > [1]https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/memcpy-power8 Thanks for sharing the patches. At this point we are also working on memcpy for power8 with a different approach and we are planning to post it soon. We can choose the better performing version and use your tunables patch too. -- Thanks Rajalakshmi S ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] powerpc: Use aligned stores in memset 2017-10-03 18:29 ` Adhemerval Zanella 2017-10-05 12:13 ` Rajalakshmi Srinivasaraghavan @ 2017-11-08 18:52 ` Tulio Magno Quites Machado Filho 2017-12-08 19:52 ` [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory Tulio Magno Quites Machado Filho 1 sibling, 1 reply; 31+ messages in thread From: Tulio Magno Quites Machado Filho @ 2017-11-08 18:52 UTC (permalink / raw) To: Adhemerval Zanella, Rajalakshmi Srinivasaraghavan Cc: libc-alpha, Florian Weimer Adhemerval Zanella <adhemerval.zanella@linaro.org> writes: > I think one way to provide a slight better memcpy implementation for POWER8 > and still be able to circumvent the non-aligned on non-cacheable memory > is to use tunables. > > The branch azanella/memcpy-power8 [1] has a power8 memcpy optimization which > uses unaligned load and stores that I created some time ago but never actually > send upstream. It shows better performance on both bench-memcpy and > bench-memcpy-random (about 10% on latter) and mixed results on bench-memcpy-large > (which it is mainly dominated by memory throughput and on the environment I am > using, a shared PowerKVM instance, the results does not seem to be reliable). > > It could use some tunning, specially on some the range I used for unrolling > the load/stores and it also does not care for unaligned access on cross-page > boundary (which tend to be quite slow on current hardware, but also on > current page size of usual 64k also uncommon). > > This first patch does not enable this option as a default for POWER8, it just > add on string tests as an option. The second patch changes the selection to: > > 1. If glibc is configure with tunables, set the new implementation as the > default for ISA 2.07 (power8). > > 2. Also if tunable is active, add the parameter glibc.tune.aligned_memopt > to disable the new implementation selection. I think it would be safer if we don't change the default behavior. 
IMHO, programs that want a performance improvement would have to set a tunable. In other words, the new implementation would be disabled by default. > So programs that rely on aligned loads can set: > > GLIBC_TUNABLES=glibc.tune.aligned_memopt=1 I also think that we should not expose internal details of the implementation to users, i.e. we should avoid using aligned/unaligned in the name of the function and in the tunables. I think that glibc.tune.cached_memopt=1 better conveys the optimal use case for this implementation. > This is a RFC patch and if the idea sounds to powerpc arch mantainers I can > work on finishing the patch with more comments and send upstream. I tried > to apply same unaligned idea for memset and memmove, but I could get any real > improvement in neither. I like the idea. Could you merge both patches and send the result to libc-alpha, please? > [1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/memcpy-power8 > [ bench-memcpy-random.out: text/plain ] > { > "timing_type": "hp_timing", > "functions": { > "memcpy": { > "bench-variant": "random", > "ifuncs": ["__memcpy_power8", I also suggest giving it a more specific name, e.g. __memcpy_power8_cached. That would make room for a POWER8 implementation that uses only naturally aligned loads/stores. Your implementation uses lxvd2x and stxvd2x, which should be avoided in a cache-inhibited scenario, i.e. glibc.tune.aligned_memopt=0. However, after changing the tunable's name to glibc.tune.cached_memopt, I think these instructions could stay, since they're executed when glibc.tune.cached_memopt=1. Thanks!!! -- Tulio Magno ^ permalink raw reply [flat|nested] 31+ messages in thread
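[Editorial sketch, not part of the original message.] The opt-in model described above means a program only sees the unaligned-access memcpy when it is started with the tunable set, for example with GLIBC_TUNABLES=glibc.tune.cached_memopt=1 in its environment. A small self-check such as the one below could be built once and run both with and without the tunable to confirm that copies between misaligned offsets still behave the same. The helper name check_unaligned_memcpy and the buffer sizes are made up for this illustration, and a compiler is free to inline some memcpy calls, so this only exercises the library for calls that actually reach it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of glibc: copy LEN bytes between two
   deliberately misaligned offsets and verify the result.  Returns 0 on
   success and -1 on any mismatch or invalid argument.  */
int
check_unaligned_memcpy (size_t src_off, size_t dst_off, size_t len)
{
  static unsigned char src[4096 + 64];
  static unsigned char dst[4096 + 64];

  if (len > 4096 || src_off >= 64 || dst_off >= 64)
    return -1;

  /* Fill the source with a recognizable pattern and the destination
     with a guard value.  */
  for (size_t i = 0; i < sizeof src; i++)
    src[i] = (unsigned char) (i * 7 + 3);
  memset (dst, 0xaa, sizeof dst);

  void *ret = memcpy (dst + dst_off, src + src_off, len);

  /* memcpy must return the destination pointer.  */
  if (ret != dst + dst_off)
    return -1;

  /* The copied range must match the source range.  */
  if (memcmp (dst + dst_off, src + src_off, len) != 0)
    return -1;

  /* Bytes before and after the copied range must be untouched.  */
  for (size_t i = 0; i < dst_off; i++)
    if (dst[i] != 0xaa)
      return -1;
  for (size_t i = dst_off + len; i < sizeof dst; i++)
    if (dst[i] != 0xaa)
      return -1;

  return 0;
}
```

Calling it with a few odd offsets and lengths on either side of the 16, 32, 64 and 128 byte thresholds used by the proposed implementation would cover each of its code paths.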
* [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory 2017-11-08 18:52 ` Tulio Magno Quites Machado Filho @ 2017-12-08 19:52 ` Tulio Magno Quites Machado Filho 2017-12-08 20:06 ` Florian Weimer 2017-12-10 7:11 ` Rajalakshmi Srinivasaraghavan 0 siblings, 2 replies; 31+ messages in thread From: Tulio Magno Quites Machado Filho @ 2017-12-08 19:52 UTC (permalink / raw) To: libc-alpha; +Cc: Adhemerval Zanella, Rajalakshmi Srinivasaraghavan From: Adhemerval Zanella <azanella@linux.vnet.ibm.com> I made the changes I requested, updated copyright entries, added a manual entry and fixed a build issue on powerpc64. --- 8< --- On POWER8, unaligned memory accesses to cached memory has little impact on performance as opposed to its ancestors. It is disabled by default and will only be available when the tunable glibc.tune.cached_memopt is set to 1. __memcpy_power8_cached __memcpy_power7 ============================================================ max-size=4096: 33325.70 ( 12.65%) 38153.00 max-size=8192: 32878.20 ( 11.17%) 37012.30 max-size=16384: 33782.20 ( 11.61%) 38219.20 max-size=32768: 33296.20 ( 11.30%) 37538.30 max-size=65536: 33765.60 ( 10.53%) 37738.40 2017-12-08 Adhemerval Zanella <azanella@linux.vnet.ibm.com> Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com> * manual/tunables.texi (Hardware Capability Tunables): Document glibc.tune.cached_memopt. * sysdeps/powerpc/cpu-features.c: New file. * sysdeps/powerpc/cpu-features.h: New file. * sysdeps/powerpc/dl-procinfo.c [!IS_IN(ldconfig)]: Add _dl_powerpc_cpu_features. * sysdeps/powerpc/dl-tunables.list: New file. * sysdeps/powerpc/ldsodefs.h: Include cpu-features.h. * sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h: . * sysdeps/powerpc/powerpc64/dl-machine.h (INIT_ARCH): Initialize use_aligned_memopt. * sysdeps/powerpc/powerpc64/multiarch/Makefile (sysdep_routines): Add memcpy-power8-cached. * sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Add __memcpy_power8_cached. 
* sysdeps/powerpc/powerpc64/multiarch/memcpy.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S: New file. --- manual/tunables.texi | 7 + sysdeps/powerpc/cpu-features.c | 39 +++++ sysdeps/powerpc/cpu-features.h | 28 ++++ sysdeps/powerpc/dl-procinfo.c | 16 ++ sysdeps/powerpc/dl-tunables.list | 28 ++++ sysdeps/powerpc/ldsodefs.h | 1 + .../powerpc/powerpc32/power4/multiarch/init-arch.h | 2 + sysdeps/powerpc/powerpc64/dl-machine.h | 4 +- sysdeps/powerpc/powerpc64/multiarch/Makefile | 4 +- .../powerpc/powerpc64/multiarch/ifunc-impl-list.c | 2 + .../powerpc64/multiarch/memcpy-power8-cached.S | 179 +++++++++++++++++++++ sysdeps/powerpc/powerpc64/multiarch/memcpy.c | 23 +-- 12 files changed, 320 insertions(+), 13 deletions(-) create mode 100644 sysdeps/powerpc/cpu-features.c create mode 100644 sysdeps/powerpc/cpu-features.h create mode 100644 sysdeps/powerpc/dl-tunables.list create mode 100644 sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S diff --git a/manual/tunables.texi b/manual/tunables.texi index e851b95..17ceb64 100644 --- a/manual/tunables.texi +++ b/manual/tunables.texi @@ -319,6 +319,13 @@ the ones in @code{sysdeps/x86/cpu-features.h}. This tunable is specific to i386 and x86-64. @end deftp +@deftp Tunable glibc.tune.cached_memopt +The @code{glibc.tune.cached_memopt=[0|1]} tunable allows the user to enable +optimizations recommended to cacheable memory. + +This tunable is specific to powerpc, powerpc64 and powerpc64le. +@end deftp + @deftp Tunable glibc.tune.cpu The @code{glibc.tune.cpu=xxx} tunable allows the user to tell @theglibc{} to assume that the CPU is @code{xxx} where xxx may have one of these values: diff --git a/sysdeps/powerpc/cpu-features.c b/sysdeps/powerpc/cpu-features.c new file mode 100644 index 0000000..6870582 --- /dev/null +++ b/sysdeps/powerpc/cpu-features.c @@ -0,0 +1,39 @@ +/* Initialize cpu feature data. PowerPC version. + Copyright (C) 2017 Free Software Foundation, Inc. 
+ This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include <stdint.h> +#include <cpu-features.h> + +#if HAVE_TUNABLES +# include <elf/dl-tunables.h> +#endif + +static inline void +init_cpu_features (struct cpu_features *cpu_features) +{ + /* Default is to use aligned memory access on optimized function unless + tunables is enable, since for this case user can explicit disable + unaligned optimizations. */ +#if HAVE_TUNABLES + int32_t cached_memfunc = TUNABLE_GET (glibc, tune, cached_memopt, int32_t, + NULL); + cpu_features->use_cached_memopt = (cached_memfunc > 0); +#else + cpu_features->use_cached_memopt = false; +#endif +} diff --git a/sysdeps/powerpc/cpu-features.h b/sysdeps/powerpc/cpu-features.h new file mode 100644 index 0000000..36a8bb4 --- /dev/null +++ b/sysdeps/powerpc/cpu-features.h @@ -0,0 +1,28 @@ +/* Initialize cpu feature data. PowerPC version. + Copyright (C) 2017 Free Software Foundation, Inc. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef __CPU_FEATURES_POWERPC_H +# define __CPU_FEATURES_POWERPC_H + +#include <stdbool.h> + +struct cpu_features +{ + bool use_cached_memopt; +}; + +#endif /* __CPU_FEATURES_H */ diff --git a/sysdeps/powerpc/dl-procinfo.c b/sysdeps/powerpc/dl-procinfo.c index 55a6e78..c8b14454d 100644 --- a/sysdeps/powerpc/dl-procinfo.c +++ b/sysdeps/powerpc/dl-procinfo.c @@ -42,6 +42,22 @@ # define PROCINFO_CLASS #endif +#if !IS_IN (ldconfig) +# if !defined PROCINFO_DECL && defined SHARED + ._dl_powerpc_cpu_features +# else +PROCINFO_CLASS struct cpu_features _dl_powerpc_cpu_features +# endif +# ifndef PROCINFO_DECL += { } +# endif +# if !defined SHARED || defined PROCINFO_DECL +; +# else +, +# endif +#endif + #if !defined PROCINFO_DECL && defined SHARED ._dl_powerpc_cap_flags #else diff --git a/sysdeps/powerpc/dl-tunables.list b/sysdeps/powerpc/dl-tunables.list new file mode 100644 index 0000000..9e14b9a --- /dev/null +++ b/sysdeps/powerpc/dl-tunables.list @@ -0,0 +1,28 @@ +# powerpc specific tunables. +# Copyright (C) 2017 Free Software Foundation, Inc. +# This file is part of the GNU C Library. + +# The GNU C Library is free software; you can redistribute it and/or +# modify it under the terms of the GNU Lesser General Public +# License as published by the Free Software Foundation; either +# version 2.1 of the License, or (at your option) any later version. + +# The GNU C Library is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU +# Lesser General Public License for more details. + +# You should have received a copy of the GNU Lesser General Public +# License along with the GNU C Library; if not, see +# <http://www.gnu.org/licenses/>. + +glibc { + tune { + cached_memopt { + type: INT_32 + minval: 0 + maxval: 1 + default: 0 + } + } +} diff --git a/sysdeps/powerpc/ldsodefs.h b/sysdeps/powerpc/ldsodefs.h index 466de79..6f8b3a2 100644 --- a/sysdeps/powerpc/ldsodefs.h +++ b/sysdeps/powerpc/ldsodefs.h @@ -20,6 +20,7 @@ #define _POWERPC_LDSODEFS_H 1 #include <elf.h> +#include <cpu-features.h> struct La_ppc32_regs; struct La_ppc32_retval; diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h b/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h index f2e6a4b..6038941 100644 --- a/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h +++ b/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h @@ -37,6 +37,8 @@ #define INIT_ARCH() \ unsigned long int hwcap = __GLRO(dl_hwcap); \ unsigned long int __attribute__((unused)) hwcap2 = __GLRO(dl_hwcap2); \ + bool __attribute__((unused)) use_cached_memopt = \ + GLRO(dl_powerpc_cpu_features).use_cached_memopt; \ if (hwcap & PPC_FEATURE_ARCH_2_06) \ hwcap |= PPC_FEATURE_ARCH_2_05 | \ PPC_FEATURE_POWER5_PLUS | \ diff --git a/sysdeps/powerpc/powerpc64/dl-machine.h b/sysdeps/powerpc/powerpc64/dl-machine.h index aeb91b8..76dceee 100644 --- a/sysdeps/powerpc/powerpc64/dl-machine.h +++ b/sysdeps/powerpc/powerpc64/dl-machine.h @@ -27,6 +27,7 @@ #include <dl-tls.h> #include <sysdep.h> #include <hwcapinfo.h> +#include <cpu-features.c> /* Translate a processor specific dynamic tag to the index in l_info array. */ @@ -300,13 +301,14 @@ BODY_PREFIX "_dl_start_user:\n" \ /* We define an initialization function to initialize HWCAP/HWCAP2 and platform data so it can be copied into the TCB later. This is called very early in _dl_sysdep_start for dynamically linked binaries. 
*/ -#ifdef SHARED +#if defined(SHARED) && IS_IN (rtld) # define DL_PLATFORM_INIT dl_platform_init () static inline void __attribute__ ((unused)) dl_platform_init (void) { __tcb_parse_hwcap_and_convert_at_platform (); + init_cpu_features (&GLRO(dl_powerpc_cpu_features)); } #endif diff --git a/sysdeps/powerpc/powerpc64/multiarch/Makefile b/sysdeps/powerpc/powerpc64/multiarch/Makefile index dea49ac..4df6b45 100644 --- a/sysdeps/powerpc/powerpc64/multiarch/Makefile +++ b/sysdeps/powerpc/powerpc64/multiarch/Makefile @@ -1,6 +1,6 @@ ifeq ($(subdir),string) -sysdep_routines += memcpy-power7 memcpy-a2 memcpy-power6 memcpy-cell \ - memcpy-power4 memcpy-ppc64 \ +sysdep_routines += memcpy-power8-cached memcpy-power7 memcpy-a2 memcpy-power6 \ + memcpy-cell memcpy-power4 memcpy-ppc64 \ memcmp-power8 memcmp-power7 memcmp-power4 memcmp-ppc64 \ memset-power7 memset-power6 memset-power4 \ memset-ppc64 memset-power8 \ diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c index 6a88536..77a60ea 100644 --- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c +++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c @@ -51,6 +51,8 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, #ifdef SHARED /* Support sysdeps/powerpc/powerpc64/multiarch/memcpy.c. */ IFUNC_IMPL (i, name, memcpy, + IFUNC_IMPL_ADD (array, i, memcpy, hwcap2 & PPC_FEATURE2_ARCH_2_07, + __memcpy_power8_cached) IFUNC_IMPL_ADD (array, i, memcpy, hwcap & PPC_FEATURE_HAS_VSX, __memcpy_power7) IFUNC_IMPL_ADD (array, i, memcpy, hwcap & PPC_FEATURE_ARCH_2_06, diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S new file mode 100644 index 0000000..e5b6f25 --- /dev/null +++ b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S @@ -0,0 +1,179 @@ +/* Optimized memcpy implementation for cached memory on PowerPC64/POWER8. 
+ Copyright (C) 2017 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include <sysdep.h> + + +/* __ptr_t [r3] memcpy (__ptr_t dst [r3], __ptr_t src [r4], size_t len [r5]); + Returns 'dst'. */ + + .machine power8 +ENTRY_TOCLESS (__memcpy_power8_cached, 5) + CALL_MCOUNT 3 + + cmpldi cr7,r5,15 + bgt cr7,L(ge_16) + andi. r9,r5,0x1 + mr r9,r3 + beq cr0,1f + lbz r10,0(r4) + addi r9,r3,1 + addi r4,r4,1 + stb r10,0(r3) +1: + andi. r10,r5,0x2 + beq cr0,2f + lhz r10,0(r4) + addi r9,r9,2 + addi r4,r4,2 + sth r10,-2(r9) +2: + andi. r10,r5,0x4 + beq cr0,3f + lwz r10,0(r4) + addi r9,9,4 + addi r4,4,4 + stw r10,-4(r9) +3: + andi. r10,r5,0x8 + beqlr cr0 + ld r10,0(r4) + std r10,0(r9) + blr + + .align 4 +L(ge_16): + cmpldi cr7,r5,32 + ble cr7,L(ge_16_le_32) + cmpldi cr7,r5,64 + ble cr7,L(gt_32_le_64) + + /* Align dst to 16 bytes. */ + andi. r9,r3,0xf + mr r12,r3 + beq cr0,L(dst_is_align_16) + lxvd2x v0,r0,r4 + subfic r12,r9,16 + subf r5,r12,r5 + add r4,r4,r12 + add r12,r3,r12 + stxvd2x v0,r0,r3 +L(dst_is_align_16): + cmpldi cr7,r5,127 + ble cr7,L(tail_copy) + addi r8,r5,-128 + mr r9,r12 + rldicr r8,r8,0,56 + li r11,16 + srdi r10,r8,7 + addi r0,r8,128 + addi r10,r10,1 + li r6,32 + mtctr r10 + li r7,48 + + /* Main loop, copy 128 bytes each time. 
*/ + .align 4 +L(copy_128): + lxvd2x v10,r0,r4 + lxvd2x v11,r4,r11 + addi r8,r4,64 + addi r10,r9,64 + lxvd2x v12,r4,r6 + lxvd2x v0,r4,r7 + addi r4,r4,128 + stxvd2x v10,r0,r9 + stxvd2x v11,r9,r11 + stxvd2x v12,r9,r6 + stxvd2x v0,r9,r7 + addi r9,r9,128 + lxvd2x v10,r0,r8 + lxvd2x v11,r8,r11 + lxvd2x v12,r8,r6 + lxvd2x v0,r8,r7 + stxvd2x v10,r0,r10 + stxvd2x v11,r10,r11 + stxvd2x v12,r10,r6 + stxvd2x v0,r10,r7 + bdnz L(copy_128) + + add r12,r12,r0 + rldicl r5,r5,0,57 +L(tail_copy): + cmpldi cr7,r5,63 + ble cr7,L(tail_le_64) + li r8,16 + li r10,32 + lxvd2x v10,r0,r4 + li r9,48 + addi r5,r5,-64 + lxvd2x v11,r4,r8 + lxvd2x v12,r4,r10 + lxvd2x v0,r4,r9 + addi r4,r4,64 + stxvd2x v10,r0,r12 + stxvd2x v11,r12,r8 + stxvd2x v12,r12,r10 + stxvd2x v0,r12,9 + addi r12,r12,64 + +L(tail_le_64): + cmpldi cr7,r5,32 + bgt cr7,L(tail_gt_32_le_64) + cmpdi cr7,r5,0 + beqlr cr7 + addi r5,r5,-32 + li r9,16 + add r8,r4,r5 + add r10,r12,r5 + lxvd2x v12,r4,r5 + lxvd2x v0,r8,r9 + stxvd2x v12,r12,r5 + stxvd2x v0,r10,r9 + blr + + .align 4 +L(ge_16_le_32): + addi r5,r5,-16 + lxvd2x v0,r0,r4 + lxvd2x v1,r4,r5 + stxvd2x v0,r0,r3 + stxvd2x v1,r3,r5 + blr + + .align 4 +L(gt_32_le_64): + mr r12,r3 + + .align 4 +L(tail_gt_32_le_64): + li r9,16 + lxvd2x v0,r0,r4 + addi r5,r5,-32 + lxvd2x v1,r4,r9 + add r8,r4,r5 + lxvd2x v2,r4,r5 + add r10,r12,r5 + lxvd2x v3,r8,r9 + stxvd2x v0,r0,r12 + stxvd2x v1,r12,r9 + stxvd2x v2,r12,r5 + stxvd2x v3,r10,r9 + blr + +END_GEN_TB (__memcpy_power8_cached,TB_TOCLESS) diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcpy.c b/sysdeps/powerpc/powerpc64/multiarch/memcpy.c index 9f4286c..fb49fe1 100644 --- a/sysdeps/powerpc/powerpc64/multiarch/memcpy.c +++ b/sysdeps/powerpc/powerpc64/multiarch/memcpy.c @@ -35,18 +35,21 @@ extern __typeof (__redirect_memcpy) __memcpy_cell attribute_hidden; extern __typeof (__redirect_memcpy) __memcpy_power6 attribute_hidden; extern __typeof (__redirect_memcpy) __memcpy_a2 attribute_hidden; extern __typeof (__redirect_memcpy) __memcpy_power7 
attribute_hidden; +extern __typeof (__redirect_memcpy) __memcpy_power8_cached attribute_hidden; libc_ifunc (__libc_memcpy, - (hwcap & PPC_FEATURE_HAS_VSX) - ? __memcpy_power7 : - (hwcap & PPC_FEATURE_ARCH_2_06) - ? __memcpy_a2 : - (hwcap & PPC_FEATURE_ARCH_2_05) - ? __memcpy_power6 : - (hwcap & PPC_FEATURE_CELL_BE) - ? __memcpy_cell : - (hwcap & PPC_FEATURE_POWER4) - ? __memcpy_power4 + ((hwcap2 & PPC_FEATURE2_ARCH_2_07) && use_cached_memopt) + ? __memcpy_power8_cached : + (hwcap & PPC_FEATURE_HAS_VSX) + ? __memcpy_power7 : + (hwcap & PPC_FEATURE_ARCH_2_06) + ? __memcpy_a2 : + (hwcap & PPC_FEATURE_ARCH_2_05) + ? __memcpy_power6 : + (hwcap & PPC_FEATURE_CELL_BE) + ? __memcpy_cell : + (hwcap & PPC_FEATURE_POWER4) + ? __memcpy_power4 : __memcpy_ppc); #undef memcpy -- 2.9.5 ^ permalink raw reply [flat|nested] 31+ messages in thread
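[Editorial sketch, not part of the original message.] The libc_ifunc change above is a priority chain: the new implementation is considered first, but only when both the ISA 2.07 bit and the tunable-derived flag are set; otherwise selection falls through to the existing order. The model below mirrors that order in plain C. The FEAT_* constants are placeholder bit values invented for this sketch (they are not the real PPC_FEATURE_* values), and the function returns the implementation name as a string rather than a function pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Placeholder feature bits for illustration only; the real values come
   from the kernel's AT_HWCAP/AT_HWCAP2 definitions.  */
enum
{
  FEAT_POWER4 = 1 << 0,
  FEAT_CELL_BE = 1 << 1,
  FEAT_ARCH_2_05 = 1 << 2,
  FEAT_ARCH_2_06 = 1 << 3,
  FEAT_HAS_VSX = 1 << 4
};
enum { FEAT2_ARCH_2_07 = 1 << 0 };

/* Hypothetical model of the selection order in the patched memcpy.c.  */
const char *
select_memcpy (unsigned long hwcap, unsigned long hwcap2,
               bool use_cached_memopt)
{
  return ((hwcap2 & FEAT2_ARCH_2_07) && use_cached_memopt)
         ? "__memcpy_power8_cached"
         : (hwcap & FEAT_HAS_VSX) ? "__memcpy_power7"
         : (hwcap & FEAT_ARCH_2_06) ? "__memcpy_a2"
         : (hwcap & FEAT_ARCH_2_05) ? "__memcpy_power6"
         : (hwcap & FEAT_CELL_BE) ? "__memcpy_cell"
         : (hwcap & FEAT_POWER4) ? "__memcpy_power4"
         : "__memcpy_ppc";
}
```

With the tunable left at its default of 0, a POWER8 machine in this model still resolves to __memcpy_power7, which matches the "disabled by default" behavior described in the commit message.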
* Re: [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory 2017-12-08 19:52 ` [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory Tulio Magno Quites Machado Filho @ 2017-12-08 20:06 ` Florian Weimer 2017-12-11 12:44 ` Tulio Magno Quites Machado Filho 2017-12-10 7:11 ` Rajalakshmi Srinivasaraghavan 1 sibling, 1 reply; 31+ messages in thread From: Florian Weimer @ 2017-12-08 20:06 UTC (permalink / raw) To: Tulio Magno Quites Machado Filho, libc-alpha Cc: Adhemerval Zanella, Rajalakshmi Srinivasaraghavan On 12/08/2017 08:40 PM, Tulio Magno Quites Machado Filho wrote: > +@deftp Tunable glibc.tune.cached_memopt > +The @code{glibc.tune.cached_memopt=[0|1]} tunable allows the user to enable > +optimizations recommended to cacheable memory. > + > +This tunable is specific to powerpc, powerpc64 and powerpc64le. > +@end deftp I think this has a slight grammar problem. What about this instead? The @code{glibc.tune.cached_memopt=[0|1]} tunable allows the user to enable optimizations recommended for cacheable memory. If set to @code{1}, @theglibc{} assumes that the process memory image consists of cacheable (non-device) memory only. The default, @code{0}, indicates that the process may use device memory. (I think it's best not to mention string functions here because it is impossible to describe how glibc.tune.cached_memopt affects them due to compiler optimizations.) Thanks, Florian ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory 2017-12-08 20:06 ` Florian Weimer @ 2017-12-11 12:44 ` Tulio Magno Quites Machado Filho 2017-12-11 20:09 ` Adhemerval Zanella 0 siblings, 1 reply; 31+ messages in thread From: Tulio Magno Quites Machado Filho @ 2017-12-11 12:44 UTC (permalink / raw) To: Florian Weimer, libc-alpha Cc: Adhemerval Zanella, Rajalakshmi Srinivasaraghavan Florian Weimer <fweimer@redhat.com> writes: > On 12/08/2017 08:40 PM, Tulio Magno Quites Machado Filho wrote: >> +@deftp Tunable glibc.tune.cached_memopt >> +The @code{glibc.tune.cached_memopt=[0|1]} tunable allows the user to enable >> +optimizations recommended to cacheable memory. >> + >> +This tunable is specific to powerpc, powerpc64 and powerpc64le. >> +@end deftp > > I think this has a slight grammar problem. > > What about this instead? > > The @code{glibc.tune.cached_memopt=[0|1]} tunable allows the user to > enable optimizations recommended for cacheable memory. If set to > @code{1}, @theglibc{} assumes that the process memory image consists of > cacheable (non-device) memory only. The default, @code{0}, indicates > that the process may use device memory. And a much better description. > (I think it's best not to mention string functions here because it is > impossible to describe how glibc.tune.cached_memopt affects them due to > compiler optimizations.) Ack. Fixed locally. Thanks! -- Tulio Magno ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory 2017-12-11 12:44 ` Tulio Magno Quites Machado Filho @ 2017-12-11 20:09 ` Adhemerval Zanella 0 siblings, 0 replies; 31+ messages in thread From: Adhemerval Zanella @ 2017-12-11 20:09 UTC (permalink / raw) To: Tulio Magno Quites Machado Filho, Florian Weimer, libc-alpha Cc: Rajalakshmi Srinivasaraghavan On 11/12/2017 10:44, Tulio Magno Quites Machado Filho wrote: > Florian Weimer <fweimer@redhat.com> writes: > >> On 12/08/2017 08:40 PM, Tulio Magno Quites Machado Filho wrote: >>> +@deftp Tunable glibc.tune.cached_memopt >>> +The @code{glibc.tune.cached_memopt=[0|1]} tunable allows the user to enable >>> +optimizations recommended to cacheable memory. >>> + >>> +This tunable is specific to powerpc, powerpc64 and powerpc64le. >>> +@end deftp >> >> I think this has a slight grammar problem. >> >> What about this instead? >> >> The @code{glibc.tune.cached_memopt=[0|1]} tunable allows the user to >> enable optimizations recommended for cacheable memory. If set to >> @code{1}, @theglibc{} assumes that the process memory image consists of >> cacheable (non-device) memory only. The default, @code{0}, indicates >> that the process may use device memory. > > And a much better description. > >> (I think it's best not to mention string functions here because it is >> impossible to describe how glibc.tune.cached_memopt affects them due to >> compiler optimizations.) > > Ack. > > Fixed locally. > > Thanks! > Thanks for working on this Tulio, LGTM. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory 2017-12-08 19:52 ` [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory Tulio Magno Quites Machado Filho 2017-12-08 20:06 ` Florian Weimer @ 2017-12-10 7:11 ` Rajalakshmi Srinivasaraghavan 2017-12-11 19:48 ` Tulio Magno Quites Machado Filho 1 sibling, 1 reply; 31+ messages in thread From: Rajalakshmi Srinivasaraghavan @ 2017-12-10 7:11 UTC (permalink / raw) To: libc-alpha On 12/09/2017 01:10 AM, Tulio Magno Quites Machado Filho wrote: > From: Adhemerval Zanella<azanella@linux.vnet.ibm.com> > > I made the changes I requested, updated copyright entries, added a > manual entry and fixed a build issue on powerpc64. > > --- 8< --- > > On POWER8, unaligned memory accesses to cached memory has little impact > on performance as opposed to its ancestors. > > It is disabled by default and will only be available when the tunable > glibc.tune.cached_memopt is set to 1. > > __memcpy_power8_cached __memcpy_power7 > ============================================================ > max-size=4096: 33325.70 ( 12.65%) 38153.00 > max-size=8192: 32878.20 ( 11.17%) 37012.30 > max-size=16384: 33782.20 ( 11.61%) 38219.20 > max-size=32768: 33296.20 ( 11.30%) 37538.30 > max-size=65536: 33765.60 ( 10.53%) 37738.40 > > 2017-12-08 Adhemerval Zanella<azanella@linux.vnet.ibm.com> > Tulio Magno Quites Machado Filho<tuliom@linux.vnet.ibm.com> > > * manual/tunables.texi (Hardware Capability Tunables): Document > glibc.tune.cached_memopt. > * sysdeps/powerpc/cpu-features.c: New file. > * sysdeps/powerpc/cpu-features.h: New file. > * sysdeps/powerpc/dl-procinfo.c [!IS_IN(ldconfig)]: Add > _dl_powerpc_cpu_features. > * sysdeps/powerpc/dl-tunables.list: New file. > * sysdeps/powerpc/ldsodefs.h: Include cpu-features.h. > * sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h: . Comment missing. > * sysdeps/powerpc/powerpc64/dl-machine.h (INIT_ARCH): Initialize > use_aligned_memopt. Should this be moved to init-arch.h? 
(also use_cached_memopt) > * sysdeps/powerpc/powerpc64/multiarch/Makefile (sysdep_routines): > Add memcpy-power8-cached. > * sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Add > __memcpy_power8_cached. > * sysdeps/powerpc/powerpc64/multiarch/memcpy.c: Likewise. > * sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S: > New file. > --- > diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S > new file mode 100644 > index 0000000..e5b6f25 > --- /dev/null > +++ b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S > @@ -0,0 +1,179 @@ > + stxvd2x v0,r0,r3 > +L(dst_is_align_16): > + cmpldi cr7,r5,127 > + ble cr7,L(tail_copy) > + addi r8,r5,-128 > + mr r9,r12 > + rldicr r8,r8,0,56 > + li r11,16 > + srdi r10,r8,7 > + addi r0,r8,128 > + addi r10,r10,1 Can we directly do rldicr r0, r5, 0, 56 srdi r10,r5,7 instead of this sequence? 79 addi r8,r5,-128 81 rldicr r8,r8,0,56 83 srdi r10,r8,7 84 addi r0,r8,128 85 addi r10,r10,1 > + li r6,32 > + mtctr r10 > + li r7,48 > + > + /* Main loop, copy 128 bytes each time. */ LGTM. Reviewed-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com> -- Thanks Rajalakshmi S ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory 2017-12-10 7:11 ` Rajalakshmi Srinivasaraghavan @ 2017-12-11 19:48 ` Tulio Magno Quites Machado Filho 0 siblings, 0 replies; 31+ messages in thread From: Tulio Magno Quites Machado Filho @ 2017-12-11 19:48 UTC (permalink / raw) To: Rajalakshmi Srinivasaraghavan, libc-alpha Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com> writes: > On 12/09/2017 01:10 AM, Tulio Magno Quites Machado Filho wrote: >> * manual/tunables.texi (Hardware Capability Tunables): Document >> glibc.tune.cached_memopt. >> * sysdeps/powerpc/cpu-features.c: New file. >> * sysdeps/powerpc/cpu-features.h: New file. >> * sysdeps/powerpc/dl-procinfo.c [!IS_IN(ldconfig)]: Add >> _dl_powerpc_cpu_features. >> * sysdeps/powerpc/dl-tunables.list: New file. >> * sysdeps/powerpc/ldsodefs.h: Include cpu-features.h. >> * sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h: . > > Comment missing. Ooops. >> * sysdeps/powerpc/powerpc64/dl-machine.h (INIT_ARCH): Initialize >> use_aligned_memopt. > > Should this be moved to init-arch.h? (also use_cached_memopt) Indeed. Changed to: * sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h (INIT_ARCH): Initialize use_cached_memopt. * sysdeps/powerpc/powerpc64/dl-machine.h [defined(SHARED) && IS_IN(rtld)]: Restrict dl_platform_init availability and initialize CPU features used by tunables. >> diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S >> new file mode 100644 >> index 0000000..e5b6f25 >> --- /dev/null >> +++ b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S >> @@ -0,0 +1,179 @@ >> + stxvd2x v0,r0,r3 >> +L(dst_is_align_16): >> + cmpldi cr7,r5,127 >> + ble cr7,L(tail_copy) >> + addi r8,r5,-128 >> + mr r9,r12 >> + rldicr r8,r8,0,56 >> + li r11,16 >> + srdi r10,r8,7 >> + addi r0,r8,128 >> + addi r10,r10,1 > > Can we directly do > rldicr r0, r5, 0, 56 > srdi r10,r5,7 > instead of this sequence? 
> 79 addi r8,r5,-128 > 81 rldicr r8,r8,0,56 > 83 srdi r10,r8,7 > 84 addi r0,r8,128 > 85 addi r10,r10,1 Yes. I changed that and made more changes for clarity: - Replaced rldicr with clrrdi. - Replace r0 with 0 where it's treated as an immediate. Pushed as c9cd7b0ce5c5. -- Tulio Magno ^ permalink raw reply [flat|nested] 31+ messages in thread
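The substitution agreed on above can be checked outside the assembler. The C sketch below models both instruction sequences with 64-bit arithmetic and compares the results. It rests on a few assumptions that are not spelled out in the thread: that this code is reached only with a length of at least 128 (which the preceding `cmpldi cr7,r5,127` / `ble` pair suggests), that `r10` is the main-loop iteration count (it feeds `mtctr`), and that `r0` holds the number of bytes the main loop covers. All function names are made up for illustration.

```c
#include <stdint.h>

/* Clear the low 7 bits: what "rldicr rX,rY,0,56" computes
   (the same operation can be written "clrrdi rX,rY,7").  */
static uint64_t clear_low7(uint64_t x)
{
  return x & ~UINT64_C(127);
}

/* Original five-instruction sequence; only meaningful for n >= 128.  */
static uint64_t old_loop_count(uint64_t n)
{
  uint64_t r8 = clear_low7(n - 128);  /* addi r8,r5,-128; rldicr r8,r8,0,56 */
  return (r8 >> 7) + 1;               /* srdi r10,r8,7; addi r10,r10,1 */
}

static uint64_t old_loop_bytes(uint64_t n)
{
  uint64_t r8 = clear_low7(n - 128);
  return r8 + 128;                    /* addi r0,r8,128 */
}

/* Proposed two-instruction replacement.  */
static uint64_t new_loop_count(uint64_t n)
{
  return n >> 7;                      /* srdi r10,r5,7 */
}

static uint64_t new_loop_bytes(uint64_t n)
{
  return clear_low7(n);               /* rldicr r0,r5,0,56 */
}

/* Returns 1 when both sequences agree for every length in [128, limit].  */
static int sequences_agree(uint64_t limit)
{
  for (uint64_t n = 128; n <= limit; n++)
    if (old_loop_count(n) != new_loop_count(n)
        || old_loop_bytes(n) != new_loop_bytes(n))
      return 0;
  return 1;
}
```

Calling `sequences_agree` with a modest limit such as `1 << 16` exercises every residue modulo 128 many times over. The agreement follows from the fact that subtracting 128 and adding it back does not disturb the low 7 bits, so the detour through `r8` is redundant once the length is known to be at least 128.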
* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-08-18  5:13 [PATCH] powerpc: Use aligned stores in memset Rajalakshmi Srinivasaraghavan
  2017-08-18  6:21 ` Florian Weimer
@ 2017-08-18  6:25 ` Andrew Pinski
  2017-08-21  2:20 ` Tulio Magno Quites Machado Filho
From: Andrew Pinski @ 2017-08-18  6:25 UTC
To: Rajalakshmi Srinivasaraghavan; +Cc: GNU C Library

On Thu, Aug 17, 2017 at 10:11 PM, Rajalakshmi Srinivasaraghavan
<raji@linux.vnet.ibm.com> wrote:
> The powerpc hardware does not allow unaligned accesses on non cacheable
> memory.  This patch avoids misaligned stores for sizes less than 8 in
> memset to avoid such cases.  Tested on powerpc64 and powerpc64le.

Why are you using memset on non cacheable memory?
In fact how are you getting non-cacheable memory, mmap of /dev/mem or
something different?

Thanks,
Andrew

>
> 2017-08-17  Rajalakshmi Srinivasaraghavan  <raji@linux.vnet.ibm.com>
>
>         * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
>         for unaligned inputs if size is less than 8.
> ---
>  sysdeps/powerpc/powerpc64/power8/memset.S | 68 ++++++++++++++++++++++++++++++-
>  1 file changed, 66 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/powerpc/powerpc64/power8/memset.S b/sysdeps/powerpc/powerpc64/power8/memset.S
> index 7ad3bb1b00..504bab0841 100644
> --- a/sysdeps/powerpc/powerpc64/power8/memset.S
> +++ b/sysdeps/powerpc/powerpc64/power8/memset.S
> @@ -377,7 +377,8 @@ L(write_LT_32):
>         subf    r5,r0,r5
>
>  2:     bf      30,1f
> -       sth     r4,0(r10)
> +       stb     r4,0(r10)
> +       stb     r4,1(r10)
>         addi    r10,r10,2
>
>  1:     bf      31,L(end_4bytes_alignment)
> @@ -437,11 +438,74 @@ L(tail5):
>         /* Handles copies of 0~8 bytes.  */
>         .align  4
>  L(write_LE_8):
> -       bne     cr6,L(tail4)
> +       /* Use stb instead of sth which is safe for
> +          both aligned and unaligned inputs.  */
> +       bne     cr6,L(LE7_tail4)
> +       /* If input is word aligned, use stw, Else use stb.  */
> +       andi.   r0,r10,3
> +       bne     L(8_unalign)
>
>         stw     r4,0(r10)
>         stw     r4,4(r10)
>         blr
> +
> +       /* Unaligned input and size is 8.  */
> +       .align  4
> +L(8_unalign):
> +       andi.   r0,r10,1
> +       beq     L(8_hwalign)
> +       stb     r4,0(r10)
> +       sth     r4,1(r10)
> +       sth     r4,3(r10)
> +       sth     r4,5(r10)
> +       stb     r4,7(r10)
> +       blr
> +
> +       /* Halfword aligned input and size is 8.  */
> +       .align  4
> +L(8_hwalign):
> +       sth     r4,0(r10)
> +       sth     r4,2(r10)
> +       sth     r4,4(r10)
> +       sth     r4,6(r10)
> +       blr
> +
> +       .align  4
> +       /* Copies 4~7 bytes.  */
> +L(LE7_tail4):
> +       bf      29,L(LE7_tail2)
> +       stb     r4,0(r10)
> +       stb     r4,1(r10)
> +       stb     r4,2(r10)
> +       stb     r4,3(r10)
> +       bf      30,L(LE7_tail5)
> +       stb     r4,4(r10)
> +       stb     r4,5(r10)
> +       bflr    31
> +       stb     r4,6(r10)
> +       blr
> +
> +       .align  4
> +       /* Copies 2~3 bytes.  */
> +L(LE7_tail2):
> +       bf      30,1f
> +       stb     r4,0(r10)
> +       stb     r4,1(r10)
> +       bflr    31
> +       stb     r4,2(r10)
> +       blr
> +
> +       .align  4
> +L(LE7_tail5):
> +       bflr    31
> +       stb     r4,4(r10)
> +       blr
> +
> +       .align  4
> +1:     bflr    31
> +       stb     r4,0(r10)
> +       blr
> +
>  END_GEN_TB (MEMSET,TB_TOCLESS)
>  libc_hidden_builtin_def (memset)
>
> --
> 2.11.0
>
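For readers following the quoted hunk: in the exactly-8-byte case, the new `L(write_LE_8)` path picks the widest store that is naturally aligned for the destination address. A rough C model of that selection is sketched below. It is not glibc code; the enum, the function names, and the test buffer are hypothetical, and `memcpy` of 2 or 4 bytes stands in for `sth`/`stw` so that the model stays portable.

```c
#include <stdint.h>
#include <string.h>

/* The three store patterns the patch uses for an 8-byte memset.  */
enum plan8 { TWO_WORDS, FOUR_HALFWORDS, BYTE_3HALFWORDS_BYTE };

static enum plan8 plan_for_8_bytes(uintptr_t addr)
{
  if ((addr & 3) == 0)          /* andi. r0,r10,3: word aligned */
    return TWO_WORDS;
  if ((addr & 1) == 0)          /* andi. r0,r10,1: halfword aligned */
    return FOUR_HALFWORDS;
  return BYTE_3HALFWORDS_BYTE;  /* odd address */
}

static void set8(unsigned char *dst, int c)
{
  unsigned char b = (unsigned char) c;
  uint16_t h = (uint16_t) (b * 0x0101u);  /* byte replicated twice */
  uint32_t w = b * 0x01010101u;           /* byte replicated four times */

  switch (plan_for_8_bytes((uintptr_t) dst))
    {
    case TWO_WORDS:
      memcpy(dst, &w, 4);
      memcpy(dst + 4, &w, 4);
      break;
    case FOUR_HALFWORDS:
      for (int i = 0; i < 8; i += 2)
        memcpy(dst + i, &h, 2);
      break;
    case BYTE_3HALFWORDS_BYTE:
      dst[0] = b;                          /* dst + 1 is now even */
      for (int i = 1; i < 7; i += 2)
        memcpy(dst + i, &h, 2);
      dst[7] = b;
      break;
    }
}

/* Hypothetical checker: compares set8 against memset at a given offset
   into an 8-byte-aligned buffer.  Returns 1 on a match.  */
static int set8_matches_memset(unsigned offset, int c)
{
  static union { uint64_t align; unsigned char bytes[32]; } buf;
  unsigned char expect[32];

  if (offset > 24)
    return 0;
  memset(buf.bytes, 0, sizeof buf.bytes);
  memset(expect, 0, sizeof expect);
  memset(expect + offset, c, 8);
  set8(buf.bytes + offset, c);
  return memcmp(buf.bytes, expect, sizeof expect) == 0;
}
```

In the odd-address case the leading single byte moves the pointer to an even address, which is why the three halfword stores at offsets 1, 3 and 5 are themselves aligned.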
* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-08-18  5:13 [PATCH] powerpc: Use aligned stores in memset Rajalakshmi Srinivasaraghavan
  2017-08-18  6:21 ` Florian Weimer
  2017-08-18  6:25 ` [PATCH] powerpc: Use aligned stores in memset Andrew Pinski
@ 2017-08-21  2:20 ` Tulio Magno Quites Machado Filho
From: Tulio Magno Quites Machado Filho @ 2017-08-21  2:20 UTC
To: Rajalakshmi Srinivasaraghavan, libc-alpha

Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com> writes:

> The powerpc hardware does not allow unaligned accesses on non cacheable
> memory.  This patch avoids misaligned stores for sizes less than 8 in
> memset to avoid such cases.  Tested on powerpc64 and powerpc64le.

This commit message is misleading.
I think it needs to be improved as follows:

 1. Remove the first line.
 2. Mention the performance impact and what causes it.
 3. Reference the section "3.1.4.2 Alignment Interrupts" of the "POWER8
    Processor User's Manual for the Single-Chip Module", which describes
    this behavior.
 4. Mention which kind of programs are affected by the old behavior.

> 2017-08-17  Rajalakshmi Srinivasaraghavan  <raji@linux.vnet.ibm.com>
>
>         * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
>         for unaligned inputs if size is less than 8.
> ---
>  sysdeps/powerpc/powerpc64/power8/memset.S | 68 ++++++++++++++++++++++++++++++-
>  1 file changed, 66 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/powerpc/powerpc64/power8/memset.S b/sysdeps/powerpc/powerpc64/power8/memset.S
> index 7ad3bb1b00..504bab0841 100644
> --- a/sysdeps/powerpc/powerpc64/power8/memset.S
> +++ b/sysdeps/powerpc/powerpc64/power8/memset.S
> @@ -377,7 +377,8 @@ L(write_LT_32):
>         subf    r5,r0,r5
>
>  2:     bf      30,1f
> -       sth     r4,0(r10)
> +       stb     r4,0(r10)
> +       stb     r4,1(r10)

Needs a comment to prevent mistakes in the future.

> @@ -437,11 +438,74 @@ L(tail5):
>         /* Handles copies of 0~8 bytes.  */
>         .align  4
>  L(write_LE_8):
> -       bne     cr6,L(tail4)
> +       /* Use stb instead of sth which is safe for
> +          both aligned and unaligned inputs.  */

I don't think "safe" is the correct term.  What about this?

    Use stb instead of sth because it doesn't generate alignment
    interrupts on cache-inhibited storage.

> +       bne     cr6,L(LE7_tail4)
> +       /* If input is word aligned, use stw, Else use stb.  */

s/Else/else/

-- 
Tulio Magno
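The reasoning behind the suggested comment is that a single-byte store can never be misaligned, so a byte-only path cannot raise an alignment interrupt whatever the destination address. A minimal C sketch of such a byte-only tail for lengths below 8 follows. It assumes that the `bf 29` / `bf 30` / `bf 31` tests in the patch examine the 4, 2 and 1 bits of the length (the instruction that loads those condition-register bits is outside the quoted hunk), and it advances the pointer instead of using the fixed offsets the assembly uses. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Fill n (< 8) bytes using byte stores only, driven by the bits of n.  */
static void set_lt8_bytewise(unsigned char *dst, int c, size_t n)
{
  unsigned char b = (unsigned char) c;

  if (n & 4)          /* "bf 29": the 4 bit of the length */
    {
      dst[0] = b;
      dst[1] = b;
      dst[2] = b;
      dst[3] = b;
      dst += 4;
    }
  if (n & 2)          /* "bf 30": the 2 bit of the length */
    {
      dst[0] = b;
      dst[1] = b;
      dst += 2;
    }
  if (n & 1)          /* "bf 31": the 1 bit of the length */
    dst[0] = b;
}

/* Hypothetical checker: compares the byte-only tail against memset for
   a given offset and length.  Returns 1 on a match.  */
static int tail_matches_memset(size_t offset, size_t n, int c)
{
  unsigned char got[16], expect[16];

  if (n > 7 || offset > 8)
    return 0;
  memset(got, 0, sizeof got);
  memset(expect, 0, sizeof expect);
  memset(expect + offset, c, n);
  set_lt8_bytewise(got + offset, c, n);
  return memcmp(got, expect, sizeof got) == 0;
}
```

The trade-off discussed in the review applies here too: byte stores cost more instructions than halfword or word stores, which is why the thread asks for the performance impact to be stated in the commit message.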
Thread overview: 31+ messages

2017-08-18  5:13 [PATCH] powerpc: Use aligned stores in memset Rajalakshmi Srinivasaraghavan
2017-08-18  6:21 ` Florian Weimer
2017-08-18  6:51 ` Rajalakshmi Srinivasaraghavan
2017-08-18  9:10 ` Florian Weimer
2017-08-18 12:13 ` Adhemerval Zanella
2017-09-12 10:30 ` Florian Weimer
2017-09-12 12:18 ` Zack Weinberg
2017-09-12 13:57 ` Steven Munroe
2017-09-12 14:37 ` Joseph Myers
2017-09-12 15:06 ` Zack Weinberg
2017-09-12 17:09 ` Florian Weimer
2017-09-12 13:38 ` Steven Munroe
2017-09-12 14:08 ` Florian Weimer
2017-09-12 14:16 ` Steven Munroe
2017-09-12 17:04 ` Florian Weimer
2017-09-12 19:21 ` Steven Munroe
2017-09-12 19:45 ` Florian Weimer
2017-09-12 20:25 ` Steven Munroe
2017-09-13 13:12 ` Tulio Magno Quites Machado Filho
2017-09-18 13:54 ` Florian Weimer
2017-10-03 18:29 ` Adhemerval Zanella
2017-10-05 12:13 ` Rajalakshmi Srinivasaraghavan
2017-11-08 18:52 ` Tulio Magno Quites Machado Filho
2017-12-08 19:52 ` [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory Tulio Magno Quites Machado Filho
2017-12-08 20:06 ` Florian Weimer
2017-12-11 12:44 ` Tulio Magno Quites Machado Filho
2017-12-11 20:09 ` Adhemerval Zanella
2017-12-10  7:11 ` Rajalakshmi Srinivasaraghavan
2017-12-11 19:48 ` Tulio Magno Quites Machado Filho
2017-08-18  6:25 ` [PATCH] powerpc: Use aligned stores in memset Andrew Pinski
2017-08-21  2:20 ` Tulio Magno Quites Machado Filho