public inbox for libc-alpha@sourceware.org
* [PATCH] powerpc: Use aligned stores in memset
@ 2017-08-18  5:13 Rajalakshmi Srinivasaraghavan
  2017-08-18  6:21 ` Florian Weimer
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Rajalakshmi Srinivasaraghavan @ 2017-08-18  5:13 UTC (permalink / raw)
  To: libc-alpha; +Cc: Rajalakshmi Srinivasaraghavan

The powerpc hardware does not allow unaligned accesses on non cacheable
memory.  This patch avoids misaligned stores for sizes less than 8 in
memset to avoid such cases.  Tested on powerpc64 and powerpc64le.

2017-08-17  Rajalakshmi Srinivasaraghavan  <raji@linux.vnet.ibm.com>

	* sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
	for unaligned inputs if size is less than 8.
---
 sysdeps/powerpc/powerpc64/power8/memset.S | 68 ++++++++++++++++++++++++++++++-
 1 file changed, 66 insertions(+), 2 deletions(-)

diff --git a/sysdeps/powerpc/powerpc64/power8/memset.S b/sysdeps/powerpc/powerpc64/power8/memset.S
index 7ad3bb1b00..504bab0841 100644
--- a/sysdeps/powerpc/powerpc64/power8/memset.S
+++ b/sysdeps/powerpc/powerpc64/power8/memset.S
@@ -377,7 +377,8 @@ L(write_LT_32):
 	subf	r5,r0,r5
 
 2:	bf	30,1f
-	sth	r4,0(r10)
+	stb	r4,0(r10)
+	stb	r4,1(r10)
 	addi	r10,r10,2
 
 1:	bf	31,L(end_4bytes_alignment)
@@ -437,11 +438,74 @@ L(tail5):
 	/* Handles copies of 0~8 bytes.  */
 	.align	4
 L(write_LE_8):
-	bne	cr6,L(tail4)
+	/* Use stb instead of sth which is safe for
+	   both aligned and unaligned inputs.  */
+	bne	cr6,L(LE7_tail4)
+	/* If input is word aligned, use stw, Else use stb.  */
+	andi.	r0,r10,3
+	bne	L(8_unalign)
 
 	stw	r4,0(r10)
 	stw	r4,4(r10)
 	blr
+
+	/* Unaligned input and size is 8.  */
+	.align	4
+L(8_unalign):
+	andi.	r0,r10,1
+	beq	L(8_hwalign)
+	stb	r4,0(r10)
+	sth	r4,1(r10)
+	sth	r4,3(r10)
+	sth	r4,5(r10)
+	stb	r4,7(r10)
+	blr
+
+	/* Halfword aligned input and size is 8.  */
+	.align	4
+L(8_hwalign):
+	sth	r4,0(r10)
+	sth	r4,2(r10)
+	sth	r4,4(r10)
+	sth	r4,6(r10)
+	blr
+
+	.align	4
+	/* Copies 4~7 bytes.  */
+L(LE7_tail4):
+	bf	29,L(LE7_tail2)
+	stb	r4,0(r10)
+	stb	r4,1(r10)
+	stb	r4,2(r10)
+	stb	r4,3(r10)
+	bf	30,L(LE7_tail5)
+	stb	r4,4(r10)
+	stb	r4,5(r10)
+	bflr	31
+	stb	r4,6(r10)
+	blr
+
+	.align	4
+	/* Copies 2~3 bytes.  */
+L(LE7_tail2):
+	bf	30,1f
+	stb	r4,0(r10)
+	stb	r4,1(r10)
+	bflr	31
+	stb	r4,2(r10)
+	blr
+
+	.align	4
+L(LE7_tail5):
+	bflr	31
+	stb	r4,4(r10)
+	blr
+
+	.align	4
+1: 	bflr	31
+	stb	r4,0(r10)
+	blr
+
 END_GEN_TB (MEMSET,TB_TOCLESS)
 libc_hidden_builtin_def (memset)
 
-- 
2.11.0
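
For readers who do not read PowerPC assembly, the store selection in the patch above can be modelled in C roughly as follows.  This is only a sketch under stated assumptions, not glibc code: the names small_memset_sketch, store2 and store4 are made up for illustration, and a C compiler is free to pick different instructions than the hand-written assembly does.  It shows the decision logic only: for size 8 the widest store that is naturally aligned is used, and for sizes below 8 only byte stores are used, selected by the low three bits of the size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative helpers: a fixed-size memcpy is the portable way to
   express a 2- or 4-byte store in C.  The compiler chooses the actual
   instruction, so this models the selection logic only.  */
static void store2 (unsigned char *p, unsigned char c)
{
  unsigned char v[2] = { c, c };
  memcpy (p, v, sizeof v);
}

static void store4 (unsigned char *p, unsigned char c)
{
  unsigned char v[4] = { c, c, c, c };
  memcpy (p, v, sizeof v);
}

/* Hypothetical C model of the patched L(write_LE_8) path, n <= 8.  */
void small_memset_sketch (unsigned char *p, unsigned char c, size_t n)
{
  assert (n <= 8);
  if (n == 8)
    {
      uintptr_t a = (uintptr_t) p;
      if ((a & 3) == 0)
        {
          /* Word aligned: two 4-byte stores (stw in the patch).  */
          store4 (p, c);
          store4 (p + 4, c);
        }
      else if ((a & 1) == 0)
        {
          /* Halfword aligned: four 2-byte stores (sth).  */
          for (size_t i = 0; i < 8; i += 2)
            store2 (p + i, c);
        }
      else
        {
          /* Odd address: byte, three aligned halfwords, byte.  */
          p[0] = c;
          store2 (p + 1, c);
          store2 (p + 3, c);
          store2 (p + 5, c);
          p[7] = c;
        }
      return;
    }

  /* n < 8: byte stores only, selected by the 4, 2 and 1 bits of n.  */
  if (n & 4)
    {
      p[0] = c; p[1] = c; p[2] = c; p[3] = c;
      p += 4;
    }
  if (n & 2)
    {
      p[0] = c; p[1] = c;
      p += 2;
    }
  if (n & 1)
    p[0] = c;
}
```

Calling small_memset_sketch (buf + 1, 0, 8) exercises the odd-address branch, where every halfword store lands on an even address.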

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-08-18  5:13 [PATCH] powerpc: Use aligned stores in memset Rajalakshmi Srinivasaraghavan
@ 2017-08-18  6:21 ` Florian Weimer
  2017-08-18  6:51   ` Rajalakshmi Srinivasaraghavan
  2017-08-18  6:25 ` [PATCH] powerpc: Use aligned stores in memset Andrew Pinski
  2017-08-21  2:20 ` Tulio Magno Quites Machado Filho
  2 siblings, 1 reply; 31+ messages in thread
From: Florian Weimer @ 2017-08-18  6:21 UTC (permalink / raw)
  To: Rajalakshmi Srinivasaraghavan, libc-alpha

On 08/18/2017 07:11 AM, Rajalakshmi Srinivasaraghavan wrote:
> 	* sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
> 	for unaligned inputs if size is less than 8.

This makes me rather nervous.  powerpc64le was supposed to have
reasonably efficient unaligned loads and stores.  GCC happily generates
them, too.

Thanks,
Florian


* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-08-18  5:13 [PATCH] powerpc: Use aligned stores in memset Rajalakshmi Srinivasaraghavan
  2017-08-18  6:21 ` Florian Weimer
@ 2017-08-18  6:25 ` Andrew Pinski
  2017-08-21  2:20 ` Tulio Magno Quites Machado Filho
  2 siblings, 0 replies; 31+ messages in thread
From: Andrew Pinski @ 2017-08-18  6:25 UTC (permalink / raw)
  To: Rajalakshmi Srinivasaraghavan; +Cc: GNU C Library

On Thu, Aug 17, 2017 at 10:11 PM, Rajalakshmi Srinivasaraghavan
<raji@linux.vnet.ibm.com> wrote:
> The powerpc hardware does not allow unaligned accesses on non cacheable
> memory.  This patch avoids misaligned stores for sizes less than 8 in
> memset to avoid such cases.  Tested on powerpc64 and powerpc64le.

Why are you using memset on non cacheable memory?  In fact how are you
getting non-cacheable memory, mmap of /dev/mem or something different?

Thanks,
Andrew

>
> 2017-08-17  Rajalakshmi Srinivasaraghavan  <raji@linux.vnet.ibm.com>
>
>         * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
>         for unaligned inputs if size is less than 8.
> ---
>  sysdeps/powerpc/powerpc64/power8/memset.S | 68 ++++++++++++++++++++++++++++++-
>  1 file changed, 66 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/powerpc/powerpc64/power8/memset.S b/sysdeps/powerpc/powerpc64/power8/memset.S
> index 7ad3bb1b00..504bab0841 100644
> --- a/sysdeps/powerpc/powerpc64/power8/memset.S
> +++ b/sysdeps/powerpc/powerpc64/power8/memset.S
> @@ -377,7 +377,8 @@ L(write_LT_32):
>         subf    r5,r0,r5
>
>  2:     bf      30,1f
> -       sth     r4,0(r10)
> +       stb     r4,0(r10)
> +       stb     r4,1(r10)
>         addi    r10,r10,2
>
>  1:     bf      31,L(end_4bytes_alignment)
> @@ -437,11 +438,74 @@ L(tail5):
>         /* Handles copies of 0~8 bytes.  */
>         .align  4
>  L(write_LE_8):
> -       bne     cr6,L(tail4)
> +       /* Use stb instead of sth which is safe for
> +          both aligned and unaligned inputs.  */
> +       bne     cr6,L(LE7_tail4)
> +       /* If input is word aligned, use stw, Else use stb.  */
> +       andi.   r0,r10,3
> +       bne     L(8_unalign)
>
>         stw     r4,0(r10)
>         stw     r4,4(r10)
>         blr
> +
> +       /* Unaligned input and size is 8.  */
> +       .align  4
> +L(8_unalign):
> +       andi.   r0,r10,1
> +       beq     L(8_hwalign)
> +       stb     r4,0(r10)
> +       sth     r4,1(r10)
> +       sth     r4,3(r10)
> +       sth     r4,5(r10)
> +       stb     r4,7(r10)
> +       blr
> +
> +       /* Halfword aligned input and size is 8.  */
> +       .align  4
> +L(8_hwalign):
> +       sth     r4,0(r10)
> +       sth     r4,2(r10)
> +       sth     r4,4(r10)
> +       sth     r4,6(r10)
> +       blr
> +
> +       .align  4
> +       /* Copies 4~7 bytes.  */
> +L(LE7_tail4):
> +       bf      29,L(LE7_tail2)
> +       stb     r4,0(r10)
> +       stb     r4,1(r10)
> +       stb     r4,2(r10)
> +       stb     r4,3(r10)
> +       bf      30,L(LE7_tail5)
> +       stb     r4,4(r10)
> +       stb     r4,5(r10)
> +       bflr    31
> +       stb     r4,6(r10)
> +       blr
> +
> +       .align  4
> +       /* Copies 2~3 bytes.  */
> +L(LE7_tail2):
> +       bf      30,1f
> +       stb     r4,0(r10)
> +       stb     r4,1(r10)
> +       bflr    31
> +       stb     r4,2(r10)
> +       blr
> +
> +       .align  4
> +L(LE7_tail5):
> +       bflr    31
> +       stb     r4,4(r10)
> +       blr
> +
> +       .align  4
> +1:     bflr    31
> +       stb     r4,0(r10)
> +       blr
> +
>  END_GEN_TB (MEMSET,TB_TOCLESS)
>  libc_hidden_builtin_def (memset)
>
> --
> 2.11.0
>


* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-08-18  6:21 ` Florian Weimer
@ 2017-08-18  6:51   ` Rajalakshmi Srinivasaraghavan
  2017-08-18  9:10     ` Florian Weimer
  0 siblings, 1 reply; 31+ messages in thread
From: Rajalakshmi Srinivasaraghavan @ 2017-08-18  6:51 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha



On 08/18/2017 11:51 AM, Florian Weimer wrote:
> On 08/18/2017 07:11 AM, Rajalakshmi Srinivasaraghavan wrote:
>> 	* sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
>> 	for unaligned inputs if size is less than 8.
> 
> This makes me rather nervous.  powerpc64le was supposed to have
> reasonable efficient unaligned loads and stores.  GCC happily generates
> them, too.

This is meant ONLY for caching inhibited accesses.  Caching Inhibited 
accesses are required to be Guarded and properly aligned.

> 
> Thanks,
> Florian
> 
> 

-- 
Thanks
Rajalakshmi S


* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-08-18  6:51   ` Rajalakshmi Srinivasaraghavan
@ 2017-08-18  9:10     ` Florian Weimer
  2017-08-18 12:13       ` Adhemerval Zanella
  2017-09-12 10:30       ` Florian Weimer
  0 siblings, 2 replies; 31+ messages in thread
From: Florian Weimer @ 2017-08-18  9:10 UTC (permalink / raw)
  To: Rajalakshmi Srinivasaraghavan; +Cc: libc-alpha

On 08/18/2017 08:51 AM, Rajalakshmi Srinivasaraghavan wrote:
> 
> 
> On 08/18/2017 11:51 AM, Florian Weimer wrote:
>> On 08/18/2017 07:11 AM, Rajalakshmi Srinivasaraghavan wrote:
>>>     * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
>>>     for unaligned inputs if size is less than 8.
>>
>> This makes me rather nervous.  powerpc64le was supposed to have
>> reasonable efficient unaligned loads and stores.  GCC happily generates
>> them, too.
> 
> This is meant ONLY for caching inhibited accesses.  Caching Inhibited
> accesses are required to be Guarded and properly aligned.

The intent is to support memset for such memory regions, right?  This
change is insufficient.  You have to fix GCC as well because it will
inline memset of unaligned pointers, like this:

typedef long __attribute__ ((aligned(1))) long_unaligned;

void
clear (long_unaligned *p)
{
  memset (p, 0, sizeof (*p));
}

clear:
	li 9,0
	std 9,0(3)
	blr

That's why I think your change is not useful in isolation.

Thanks,
Florian


* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-08-18  9:10     ` Florian Weimer
@ 2017-08-18 12:13       ` Adhemerval Zanella
  2017-09-12 10:30       ` Florian Weimer
  1 sibling, 0 replies; 31+ messages in thread
From: Adhemerval Zanella @ 2017-08-18 12:13 UTC (permalink / raw)
  To: libc-alpha



On 18/08/2017 06:10, Florian Weimer wrote:
> On 08/18/2017 08:51 AM, Rajalakshmi Srinivasaraghavan wrote:
>>
>>
>> On 08/18/2017 11:51 AM, Florian Weimer wrote:
>>> On 08/18/2017 07:11 AM, Rajalakshmi Srinivasaraghavan wrote:
>>>>     * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
>>>>     for unaligned inputs if size is less than 8.
>>>
>>> This makes me rather nervous.  powerpc64le was supposed to have
>>> reasonable efficient unaligned loads and stores.  GCC happily generates
>>> them, too.
>>
>> This is meant ONLY for caching inhibited accesses.  Caching Inhibited
>> accesses are required to be Guarded and properly aligned.
> 
> The intent is to support memset for such memory regions, right?  This
> change is insufficient.  You have to fix GCC as well because it will
> inline memset of unaligned pointers, like this:
> 
> typedef long __attribute__ ((aligned(1))) long_unaligned;
> 
> void
> clear (long_unaligned *p)
> {
>   memset (p, 0, sizeof (*p));
> }
> 
> clear:
> 	li 9,0
> 	std 9,0(3)
> 	blr
> 
> That's why I think your change is not useful in isolation.


POWER8 does have fast unaligned memory access, and in fact unaligned accesses
could be used to provide a faster memcpy/memmove implementation (I created
one some time ago that I never sent upstream [1]). Unaligned accesses are
used extensively in some optimized str* implementations I created for POWER8.
They also allow GCC to use unaligned accesses for builtin mem* operations
without issue in *most* cases.

The problem is that memset/memcpy/memmove *specifically* are used in some
userland drivers for DMA (if I recall correctly, in some XORG drivers), and
for these specific use cases unaligned accesses, especially vector ones, will
cause the kernel to trap on *every* unaligned instruction, leading to abysmal
performance. That's why I pushed 87868c2418fb74357757e3b739ce5b76b17a8929
to fix this very issue for POWER7 memcpy.

We already discussed this same issue some time ago [2] to try to overcome
this limitation. I think ideally the drivers that rely on aligned mem*
operations should use their own mem* operations (similar to how dpdk does [3]).

[1] https://github.com/zatrazz/glibc/commits/memopt-power8
[2] https://sourceware.org/ml/libc-alpha/2015-01/msg00130.html
[3] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/common/include/arch/ppc_64/rte_memcpy.h
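
A driver-private fill routine along the lines suggested above might look like the following.  This is a minimal sketch, not code from glibc, Xorg or DPDK: the name io_memset_sketch is hypothetical, and it assumes the destination may be accessed as 8-byte units once it is 8-byte aligned.  The volatile qualifier is there so that the compiler keeps the individual stores instead of turning the loops back into a memset call or merging them into wider accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver-side helper (an assumption for illustration).
   Every store it issues is naturally aligned for its width.  */
void io_memset_sketch (volatile void *dst, unsigned char c, size_t n)
{
  volatile unsigned char *p = dst;

  /* Head: byte stores until p is 8-byte aligned.  */
  while (n > 0 && ((uintptr_t) p & 7) != 0)
    {
      *p++ = c;
      n--;
    }

  /* Body: aligned 8-byte stores of the replicated byte.  */
  uint64_t pattern = UINT64_C (0x0101010101010101) * c;
  volatile uint64_t *w = (volatile uint64_t *) p;
  for (; n >= 8; n -= 8)
    *w++ = pattern;

  /* Tail: remaining bytes, one at a time.  */
  p = (volatile unsigned char *) w;
  for (; n > 0; n--)
    *p++ = c;
}
```

The trade-off is the one discussed in this thread: the routine is slower than an unaligned-store memset on cacheable memory, so it belongs in the code that knows it is touching cache-inhibited storage rather than in the general-purpose library.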


* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-08-18  5:13 [PATCH] powerpc: Use aligned stores in memset Rajalakshmi Srinivasaraghavan
  2017-08-18  6:21 ` Florian Weimer
  2017-08-18  6:25 ` [PATCH] powerpc: Use aligned stores in memset Andrew Pinski
@ 2017-08-21  2:20 ` Tulio Magno Quites Machado Filho
  2 siblings, 0 replies; 31+ messages in thread
From: Tulio Magno Quites Machado Filho @ 2017-08-21  2:20 UTC (permalink / raw)
  To: Rajalakshmi Srinivasaraghavan, libc-alpha

Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com> writes:

> The powerpc hardware does not allow unaligned accesses on non cacheable
> memory.  This patch avoids misaligned stores for sizes less than 8 in
> memset to avoid such cases.  Tested on powerpc64 and powerpc64le.

This commit message is misleading.  I think it's necessary to improve with:

 1. Remove the first line.
 2. Mention the performance impact and what causes it.
 3. Reference the section "3.1.4.2 Alignment Interrupts" of the "POWER8
    Processor User's Manual for the Single-Chip Module", which describes
    this behavior.
 4. Mention which kind of programs are affected by the old behavior.

> 2017-08-17  Rajalakshmi Srinivasaraghavan  <raji@linux.vnet.ibm.com>
>
> 	* sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
> 	for unaligned inputs if size is less than 8.
> ---
>  sysdeps/powerpc/powerpc64/power8/memset.S | 68 ++++++++++++++++++++++++++++++-
>  1 file changed, 66 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/powerpc/powerpc64/power8/memset.S b/sysdeps/powerpc/powerpc64/power8/memset.S
> index 7ad3bb1b00..504bab0841 100644
> --- a/sysdeps/powerpc/powerpc64/power8/memset.S
> +++ b/sysdeps/powerpc/powerpc64/power8/memset.S
> @@ -377,7 +377,8 @@ L(write_LT_32):
>  	subf	r5,r0,r5
>
>  2:	bf	30,1f
> -	sth	r4,0(r10)
> +	stb	r4,0(r10)
> +	stb	r4,1(r10)

Needs a comment to prevent mistakes in the future.

> @@ -437,11 +438,74 @@ L(tail5):
>  	/* Handles copies of 0~8 bytes.  */
>  	.align	4
>  L(write_LE_8):
> -	bne	cr6,L(tail4)
> +	/* Use stb instead of sth which is safe for
> +	   both aligned and unaligned inputs.  */

I don't think "safe" is the correct term.  What about this?

    Use stb instead of sth because it doesn't generate alignment interrupts
    on cache-inhibited storage.

> +	bne	cr6,L(LE7_tail4)
> +	/* If input is word aligned, use stw, Else use stb.  */

s/Else/else/


-- 
Tulio Magno


* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-08-18  9:10     ` Florian Weimer
  2017-08-18 12:13       ` Adhemerval Zanella
@ 2017-09-12 10:30       ` Florian Weimer
  2017-09-12 12:18         ` Zack Weinberg
                           ` (2 more replies)
  1 sibling, 3 replies; 31+ messages in thread
From: Florian Weimer @ 2017-09-12 10:30 UTC (permalink / raw)
  To: Rajalakshmi Srinivasaraghavan; +Cc: libc-alpha

On 08/18/2017 11:10 AM, Florian Weimer wrote:
> On 08/18/2017 08:51 AM, Rajalakshmi Srinivasaraghavan wrote:
>>
>>
>> On 08/18/2017 11:51 AM, Florian Weimer wrote:
>>> On 08/18/2017 07:11 AM, Rajalakshmi Srinivasaraghavan wrote:
>>>>     * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
>>>>     for unaligned inputs if size is less than 8.
>>>
>>> This makes me rather nervous.  powerpc64le was supposed to have
>>> reasonable efficient unaligned loads and stores.  GCC happily generates
>>> them, too.
>>
>> This is meant ONLY for caching inhibited accesses.  Caching Inhibited
>> accesses are required to be Guarded and properly aligned.
> 
> The intent is to support memset for such memory regions, right?  This
> change is insufficient.  You have to fix GCC as well because it will
> inline memset of unaligned pointers, like this:

Here's a more complete example:


#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef long __attribute__ ((aligned(1))) long_unaligned;

__attribute__ ((noinline, noclone, weak))
void
clear (long_unaligned *p)
{
  memset (p, 0, sizeof (*p));
}

struct data
{
  char misalign;
  long_unaligned data;
};

int
main (void)
{
  struct data *data = malloc (sizeof (*data));
  assert (data != NULL);
  long_unaligned *p = &data->data;
  printf ("pointer: %p\n", p);
  clear (p);
  return 0;
}

The clear function compiles to:

clear:
	li 9,0
	std 9,0(3)
	blr

At run time, I get:

pointer: 0x10003c10011

This means that GCC introduced an unaligned store, no matter how memset
was implemented.

I could not find the manual which has the requirement that the mem*
functions do not use unaligned accesses.  Unless they are worded in a
very peculiar way, right now, the GCC/glibc combination does not comply
with a requirement that memset & Co. can be used for device memory access.

Furthermore, I find it very peculiar that over-reading device memory is
acceptable.  Some memory-mapped devices behave strangely if memory
locations are read out of order or multiple times, and the current glibc
implementation accesses locations which are outside the specified object
boundaries.

So I think the implementation constraint on the mem* functions is wrong.
It leads to a slower implementation of the mem* functions for most of
userspace, which does not access device memory, and even for device
memory, it is probably not what you want.

Thanks,
Florian


* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-12 10:30       ` Florian Weimer
@ 2017-09-12 12:18         ` Zack Weinberg
  2017-09-12 13:57           ` Steven Munroe
                             ` (2 more replies)
  2017-09-12 13:38         ` Steven Munroe
  2017-09-13 13:12         ` Tulio Magno Quites Machado Filho
  2 siblings, 3 replies; 31+ messages in thread
From: Zack Weinberg @ 2017-09-12 12:18 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Rajalakshmi Srinivasaraghavan, GNU C Library

On Tue, Sep 12, 2017 at 6:30 AM, Florian Weimer <fweimer@redhat.com> wrote:
>
> I could not find the manual which has the requirement that the mem*
> functions do not use unaligned accesses.  Unless they are worded in a
> very peculiar way, right now, the GCC/glibc combination does not comply
> with a requirement that memset & Co. can be used for device memory access.

mem* are required to behave as-if they access memory as an array of
unsigned char.  Therefore it is valid to give them arbitrarily
(un)aligned pointers.  The C abstract machine doesn't specifically
contemplate the possibility of a CPU that can do unaligned word reads
but maybe not to all memory addresses, but I would argue that if there
is such a CPU, then mem* are obliged to cope with it.

> ...the current glibc
> implementation accesses locations which are outside the specified object
> boundaries.

I think that's technically a defect.  Nothing in the C standard
licenses it to do that; we just get away with it because, on the
implementations to date, it's not observable (unless you go past the
end of a page, which you'll note there are a bunch of tests to ensure
we don't do).  If an over-read by a single byte is observable, then
mem* is not allowed to do that.

zw
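
The as-if rule above can be written down as a reference implementation and used as an oracle.  This is a sketch for illustration only: ref_memset and memset_matches_reference are hypothetical names, not glibc test-suite functions, and the check only observes the final memory contents, not which store instructions were used or whether anything outside the range was read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reference semantics: write exactly the bytes [s, s + n), each one
   through an unsigned char lvalue.  */
void *ref_memset (void *s, int c, size_t n)
{
  unsigned char *p = s;
  for (size_t i = 0; i < n; i++)
    p[i] = (unsigned char) c;
  return s;
}

/* Returns 1 if the library memset and ref_memset leave identical
   64-byte buffers, including the guard bytes around the target.  */
int memset_matches_reference (size_t off, size_t n, int c)
{
  unsigned char a[64], b[64];
  assert (off <= sizeof a && n <= sizeof a - off);
  ref_memset (a, 0xAA, sizeof a);
  ref_memset (b, 0xAA, sizeof b);
  memset (a + off, c, n);
  ref_memset (b + off, c, n);
  return memcmp (a, b, sizeof a) == 0;
}
```

Running the comparison over every small offset and length covers arbitrarily (un)aligned pointers, which is the property argued for above.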


* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-12 10:30       ` Florian Weimer
  2017-09-12 12:18         ` Zack Weinberg
@ 2017-09-12 13:38         ` Steven Munroe
  2017-09-12 14:08           ` Florian Weimer
  2017-09-13 13:12         ` Tulio Magno Quites Machado Filho
  2 siblings, 1 reply; 31+ messages in thread
From: Steven Munroe @ 2017-09-12 13:38 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Rajalakshmi Srinivasaraghavan, libc-alpha

On Tue, 2017-09-12 at 12:30 +0200, Florian Weimer wrote:
> On 08/18/2017 11:10 AM, Florian Weimer wrote:
> > On 08/18/2017 08:51 AM, Rajalakshmi Srinivasaraghavan wrote:
> >>
> >>
> >> On 08/18/2017 11:51 AM, Florian Weimer wrote:
> >>> On 08/18/2017 07:11 AM, Rajalakshmi Srinivasaraghavan wrote:
> >>>>     * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
> >>>>     for unaligned inputs if size is less than 8.
> >>>
> >>> This makes me rather nervous.  powerpc64le was supposed to have
> >>> reasonable efficient unaligned loads and stores.  GCC happily generates
> >>> them, too.
> >>
> >> This is meant ONLY for caching inhibited accesses.  Caching Inhibited
> >> accesses are required to be Guarded and properly aligned.
> > 
> > The intent is to support memset for such memory regions, right?  This
> > change is insufficient.  You have to fix GCC as well because it will
> > inline memset of unaligned pointers, like this:
> 
> Here's a more complete example:
> 

..snip

> 
> This means that GCC introduced an unaligned store, no matter how memset
> was implemented.
> 
C will do whatever the programmer wants. We cannot stop that.

And in user mode with cache-coherent memory this is not a problem, as
Adhemerval explained.

So we are not going to degrade the performance of general applications
for a tiny subset of specialized device drivers. Those guys have to know
what they are doing.

But a library (like libc) might be called from a user-mode device
driver (Xorg, for example) to access cache-inhibited memory, so the
memcpy implementation has to check alignment and size and use the
correct instructions for each case.

That is what we are doing here.


> I could not find the manual which has the requirement that the mem*
> functions do not use unaligned accesses.  Unless they are worded in a
> very peculiar way, right now, the GCC/glibc combination does not comply
> with a requirement that memset & Co. can be used for device memory access.
> 
> Furthermore, I find it very peculiar that over-reading device memory is
> acceptable.  Some memory-mapped devices behave strangely if memory
> locations are read out of order or multiple times, and the current glibc
> implementation accesses locations which are outside the specified object
> boundaries.
> 
Yes device driver writers have to know what they are doing.

> So I think the implementation constraint on the mem* functions is wrong.
>  It leads to a slower implementation of the mem* function for most of
> userspace which does not access device memory, and even for device
> memory, it is probably not what you want.
> 
We are just trying to make the mem* functions safe (no segfault or
alignment check) if used correctly.

The definition of correctly is a bit fluid. I personally disagree with
the Xorg folks but so far they have refused to bend...


> Thanks,
> Florian
> 



* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-12 12:18         ` Zack Weinberg
@ 2017-09-12 13:57           ` Steven Munroe
  2017-09-12 14:37           ` Joseph Myers
  2017-09-12 17:09           ` Florian Weimer
  2 siblings, 0 replies; 31+ messages in thread
From: Steven Munroe @ 2017-09-12 13:57 UTC (permalink / raw)
  To: Zack Weinberg
  Cc: Florian Weimer, Rajalakshmi Srinivasaraghavan, GNU C Library

On Tue, 2017-09-12 at 08:18 -0400, Zack Weinberg wrote:
> On Tue, Sep 12, 2017 at 6:30 AM, Florian Weimer <fweimer@redhat.com> wrote:
> >
> > I could not find the manual which has the requirement that the mem*
> > functions do not use unaligned accesses.  Unless they are worded in a
> > very peculiar way, right now, the GCC/glibc combination does not comply
> > with a requirement that memset & Co. can be used for device memory access.
> 
> mem* are required to behave as-if they access memory as an array of
> unsigned char.  Therefore it is valid to give them arbitrarily
> (un)aligned pointers.  The C abstract machine doesn't specifically
> contemplate the possibility of a CPU that can do unaligned word reads
> but maybe not to all memory addresses, but I would argue that if there
> is such a CPU, then mem* are obliged to cope with it.
> 
> > ...the current glibc
> > implementation accesses locations which are outside the specified object
> > boundaries.
> 
> I think that's technically a defect.  Nothing in the C standard
> licenses it to do that; we just get away with it because, on the
> implementations to date, it's not observable (unless you go past the
> end of a page, which you'll note there are a bunch of tests to ensure
> we don't do).  If an over-read by a single byte is observable, then
> mem* is not allowed to do that.
> 
That is also a bit of an overreaction.

As long as the library routine does not cause a visible artifact (segfault
or alignment check), an aligned access before or after the requested start
address and length is an optimization.

For example, accessing the source at offset 3 and length 10 with an
aligned quadword load is OK as long as I clear the leading and trailing
bytes.

But attempting to store 7 bytes within a quadword by merging bytes in a
register and storing the whole quadword would violate single-copy
atomicity and is not allowed.

> zw
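
The offset 3, length 10 example above can be modelled in portable C as follows.  This is a sketch under stated assumptions, not glibc code: load_quadword_masked is a hypothetical name, the fixed-size memcpy stands in for a single aligned 16-byte load, and the caller is assumed to guarantee that the whole aligned 16-byte block is readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of "aligned quadword load, then clear the
   leading and trailing bytes".  block must be 16-byte aligned and the
   requested bytes are block[offset] .. block[offset + len - 1].  */
void load_quadword_masked (const unsigned char *block, size_t offset,
                           size_t len, unsigned char out[16])
{
  assert (((uintptr_t) block & 15) == 0);
  assert (offset <= 16 && len <= 16 - offset);

  /* Models one aligned 16-byte load; it reads bytes that were not
     requested, but never crosses the aligned block.  */
  memcpy (out, block, 16);

  /* Clear the leading bytes before the requested range.  */
  memset (out, 0, offset);

  /* Clear the trailing bytes after the requested range.  */
  memset (out + offset + len, 0, 16 - offset - len);
}
```

The store direction has no equivalent shortcut: writing the merged 16 bytes back would also rewrite the bytes outside the requested range, which is the single-copy atomicity problem described above.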
> 



* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-12 13:38         ` Steven Munroe
@ 2017-09-12 14:08           ` Florian Weimer
  2017-09-12 14:16             ` Steven Munroe
  0 siblings, 1 reply; 31+ messages in thread
From: Florian Weimer @ 2017-09-12 14:08 UTC (permalink / raw)
  To: Steven Munroe; +Cc: Rajalakshmi Srinivasaraghavan, libc-alpha

* Steven Munroe:

>> This means that GCC introduced an unaligned store, no matter how memset
>> was implemented.
>> 
> C will do what ever the programmer wants. We can not stop that.

That's not true.  If some specification says that for POWER, mem* must
behave in a certain way, and the GCC/glibc combination does not do
that, that's a bug on POWER.

The programmer only sees the entire toolchain, and it is our job to
make the whole thing compliant with applicable specifications, even if
this means coordinating among different projects.

> And in user mode and cache coherent memory this is not a problem as
> Adhemerval explained.

Obviously not, otherwise we wouldn't be changing glibc.

> So we are not going to degrade the performance of general applications
> for a tiny subset of specialized device drivers. Those guy have to know
> what they are doing.
>
> But in the library (like libc) that might be called from a user mode
> device driver (Xorg for example) and access Cache inhibited memory the
> memcpy implementation has to check alignment and size and using the
> correct instructions for each case.
>
> That is what we are doing here.

Sorry, but you are contradicting yourself.  I very much doubt the
Xorg-compatible memcmp is an improvement across the board.


* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-12 14:08           ` Florian Weimer
@ 2017-09-12 14:16             ` Steven Munroe
  2017-09-12 17:04               ` Florian Weimer
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Munroe @ 2017-09-12 14:16 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Rajalakshmi Srinivasaraghavan, libc-alpha

On Tue, 2017-09-12 at 16:08 +0200, Florian Weimer wrote:
> * Steven Munroe:
> 
> >> This means that GCC introduced an unaligned store, no matter how memset
> >> was implemented.
> >> 
> > C will do what ever the programmer wants. We can not stop that.
> 
> That's not true.  If some specification says that for POWER, mem* must
> behave in a certain way, and the GCC/glibc combiniation does not do
> that, that's a bug on POWER.
> 
What is the bug that you think we are not fixing?

> The programmer only sees the entire toolchain, and it is our job to
> make the whole thing compliant with applicable specifications, even if
> this means coordinating among different projects.
> 
> > And in user mode and cache coherent memory this is not a problem as
> > Adhemerval explained.
> 
> Obviously not, otherwise we wouldn't be changing glibc.
> 
I was arguing against forcing GCC, and compilers in general, to be
aware of Cache Inhibited memory. Programmers have to be.

What are you arguing? 

> > So we are not going to degrade the performance of general applications
> > for a tiny subset of specialized device drivers. Those guy have to know
> > what they are doing.
> >
> > But in the library (like libc) that might be called from a user mode
> > device driver (Xorg for example) and access Cache inhibited memory the
> > memcpy implementation has to check alignment and size and using the
> > correct instructions for each case.
> >
> > That is what we are doing here.
> 
> Sorry, but you are contradicting yourself.  I very much doubt the
> Xorg-compatible memcmp is an improvement across the board.
> 



* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-12 12:18         ` Zack Weinberg
  2017-09-12 13:57           ` Steven Munroe
@ 2017-09-12 14:37           ` Joseph Myers
  2017-09-12 15:06             ` Zack Weinberg
  2017-09-12 17:09           ` Florian Weimer
  2 siblings, 1 reply; 31+ messages in thread
From: Joseph Myers @ 2017-09-12 14:37 UTC (permalink / raw)
  To: Zack Weinberg
  Cc: Florian Weimer, Rajalakshmi Srinivasaraghavan, GNU C Library

On Tue, 12 Sep 2017, Zack Weinberg wrote:

> On Tue, Sep 12, 2017 at 6:30 AM, Florian Weimer <fweimer@redhat.com> wrote:
> >
> > I could not find the manual which has the requirement that the mem*
> > functions do not use unaligned accesses.  Unless they are worded in a
> > very peculiar way, right now, the GCC/glibc combination does not comply
> > with a requirement that memset & Co. can be used for device memory access.
> 
> mem* are required to behave as-if they access memory as an array of
> unsigned char.  Therefore it is valid to give them arbitrarily
> (un)aligned pointers.  The C abstract machine doesn't specifically
> contemplate the possibility of a CPU that can do unaligned word reads
> but maybe not to all memory addresses, but I would argue that if there
> is such a CPU, then mem* are obliged to cope with it.

Only if there is a way, within the standard, in which you might obtain a 
pointer to such memory.

It is explicitly undefined in ISO C to access "an object defined with a 
volatile-qualified type through use of an lvalue with 
non-volatile-qualified type" (C11 6.7.3#6).  Thus you can't use mem* 
functions on objects defined as volatile.  I think device memory with 
special access requirements should be considered to be defined as 
volatile.  (So any access from C code should use volatile-qualified 
lvalues.)

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-12 14:37           ` Joseph Myers
@ 2017-09-12 15:06             ` Zack Weinberg
  0 siblings, 0 replies; 31+ messages in thread
From: Zack Weinberg @ 2017-09-12 15:06 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Florian Weimer, Rajalakshmi Srinivasaraghavan, GNU C Library

On Tue, Sep 12, 2017 at 10:37 AM, Joseph Myers <joseph@codesourcery.com> wrote:
> On Tue, 12 Sep 2017, Zack Weinberg wrote:
>>
>> mem* are required to behave as-if they access memory as an array of
>> unsigned char.  Therefore it is valid to give them arbitrarily
>> (un)aligned pointers.  The C abstract machine doesn't specifically
>> contemplate the possibility of a CPU that can do unaligned word reads
>> but maybe not to all memory addresses, but I would argue that if there
>> is such a CPU, then mem* are obliged to cope with it.
>
> Only if there is a way, within the standard, in which you might obtain a
> pointer to such memory.

Perhaps it is only a matter of QoI, but I would argue that if there is
_any_ way to obtain such a pointer, considering the entire operating
system, then mem* can and should cope with it.

> I think device memory with
> special access requirements should be considered to be defined as
> volatile.  (So any access from C code should use volatile-qualified
> lvalues.)

I know you know that's violently disputed.

zw

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-12 14:16             ` Steven Munroe
@ 2017-09-12 17:04               ` Florian Weimer
  2017-09-12 19:21                 ` Steven Munroe
  0 siblings, 1 reply; 31+ messages in thread
From: Florian Weimer @ 2017-09-12 17:04 UTC (permalink / raw)
  To: munroesj; +Cc: Rajalakshmi Srinivasaraghavan, libc-alpha

On 09/12/2017 04:16 PM, Steven Munroe wrote:
> On Tue, 2017-09-12 at 16:08 +0200, Florian Weimer wrote:
>> * Steven Munroe:
>>
>>>> This means that GCC introduced an unaligned store, no matter how memset
>>>> was implemented.
>>>>
>>> C will do what ever the programmer wants. We can not stop that.
>>
>> That's not true.  If some specification says that for POWER, mem* must
>> behave in a certain way, and the GCC/glibc combination does not do
>> that, that's a bug on POWER.
>>
> What is the bug that you think we are not fixing?

memset, as called by the C programmer, still uses unaligned stores.

>> The programmer only sees the entire toolchain, and it is our job to
>> make the whole thing compliant with applicable specifications, even if
>> this means coordinating among different projects.
>>
>>> And in user mode and cache coherent memory this is not a problem as
>>> Adhemerval explained.
>>
>> Obviously not, otherwise we wouldn't be changing glibc.
>>
> I was arguing against GCC and compilers in general being forced
> to be aware of Cache Inhibited memory. Programmers do need to be.

Exactly.  In order to give programmers this choice, you need functions
like device_memset, which are not subject to compiler or library
optimizations which are not valid for device memory.

> What are you arguing? 

If you want a memset which is compatible with device memory, you need to
fix GCC *and* glibc.  Just patching glibc is not enough because GCC
optimizes memset in ways that are incompatible with your apparent goal.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-12 12:18         ` Zack Weinberg
  2017-09-12 13:57           ` Steven Munroe
  2017-09-12 14:37           ` Joseph Myers
@ 2017-09-12 17:09           ` Florian Weimer
  2 siblings, 0 replies; 31+ messages in thread
From: Florian Weimer @ 2017-09-12 17:09 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Rajalakshmi Srinivasaraghavan, GNU C Library

On 09/12/2017 02:18 PM, Zack Weinberg wrote:
> On Tue, Sep 12, 2017 at 6:30 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>
>> I could not find the manual which has the requirement that the mem*
>> functions do not use unaligned accesses.  Unless they are worded in a
>> very peculiar way, right now, the GCC/glibc combination does not comply
>> with a requirement that memset & Co. can be used for device memory access.
> 
> mem* are required to behave as-if they access memory as an array of
> unsigned char.  Therefore it is valid to give them arbitrarily
> (un)aligned pointers.  The C abstract machine doesn't specifically
> contemplate the possibility of a CPU that can do unaligned word reads
> but maybe not to all memory addresses, but I would argue that if there
> is such a CPU, then mem* are obliged to cope with it.

I disagree.  On most architectures, including x86-64, you can tell, with
certain hardware devices, that our mem* functions do not perform
byte-wise read or write access.  On many architectures, just a hardware
watchpoint installed using ptrace (a supported API) is sufficient.  But
this theoretical possibility does not mean that we cannot or should not
optimize the mem* functions.

If you need specific memory access patterns, you need to use inline
assembly.  In many cases, volatile loads and stores are sufficient, too.
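
As a minimal sketch of the volatile approach (the name device_memset_bytes
is hypothetical and not a glibc or GCC interface), a byte-wise fill through
a volatile-qualified lvalue could look like the following.  The assumption
is that the volatile qualifier keeps the compiler from replacing the loop
with a memset call or with wider stores; that should be checked against
the generated assembly for the target in question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of glibc: fill n bytes at dst with the
   value c, using one single-byte volatile store per byte.  */
void
device_memset_bytes (volatile void *dst, int c, size_t n)
{
  volatile unsigned char *p = dst;

  assert (dst != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    p[i] = (unsigned char) c;	/* One byte store per iteration.  */
}
```

Single-byte stores are always naturally aligned, so this trades speed for
a predictable access pattern.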

>> ...the current glibc
>> implementation accesses locations which are outside the specified object
>> boundaries.
> 
> I think that's technically a defect.  Nothing in the C standard
> licenses it to do that;

It's permitted under the as-if rule.

Florian

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-12 17:04               ` Florian Weimer
@ 2017-09-12 19:21                 ` Steven Munroe
  2017-09-12 19:45                   ` Florian Weimer
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Munroe @ 2017-09-12 19:21 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Rajalakshmi Srinivasaraghavan, libc-alpha

On Tue, 2017-09-12 at 19:04 +0200, Florian Weimer wrote:
> On 09/12/2017 04:16 PM, Steven Munroe wrote:
> > On Tue, 2017-09-12 at 16:08 +0200, Florian Weimer wrote:
> >> * Steven Munroe:
> >>
> >>>> This means that GCC introduced an unaligned store, no matter how memset
> >>>> was implemented.
> >>>>
> >>> C will do what ever the programmer wants. We can not stop that.
> >>
> >> That's not true.  If some specification says that for POWER, mem* must
> >> behave in a certain way, and the GCC/glibc combination does not do
> >> that, that's a bug on POWER.
> >>
> > What is the bug that you think we are not fixing?
> 
> memset, as called by the C programmer, still uses unaligned stores.
> 

Are you sure? Which one?

find ./ -name 'memset*' | grep powerpc
./sysdeps/powerpc/powerpc32/power7/memset.S
./sysdeps/powerpc/powerpc32/memset.S
./sysdeps/powerpc/powerpc32/476/memset.S
./sysdeps/powerpc/powerpc32/405/memset.S
./sysdeps/powerpc/powerpc32/power6/memset.S
./sysdeps/powerpc/powerpc32/power4/memset.S
./sysdeps/powerpc/powerpc32/power4/multiarch/memset-ppc32.S
./sysdeps/powerpc/powerpc32/power4/multiarch/memset-power6.S
./sysdeps/powerpc/powerpc32/power4/multiarch/memset.c
./sysdeps/powerpc/powerpc32/power4/multiarch/memset-power7.S
./sysdeps/powerpc/powerpc64/power8/memset.S
./sysdeps/powerpc/powerpc64/power7/memset.S
./sysdeps/powerpc/powerpc64/memset.S
./sysdeps/powerpc/powerpc64/multiarch/memset-power8.S
./sysdeps/powerpc/powerpc64/multiarch/memset-power4.S
./sysdeps/powerpc/powerpc64/multiarch/memset-power6.S
./sysdeps/powerpc/powerpc64/multiarch/memset.c
./sysdeps/powerpc/powerpc64/multiarch/memset-ppc64.S
./sysdeps/powerpc/powerpc64/multiarch/memset-power7.S
./sysdeps/powerpc/powerpc64/power6/memset.S
./sysdeps/powerpc/powerpc64/power4/memset.S

> >> The programmer only sees the entire toolchain, and it is our job to
> >> make the whole thing compliant with applicable specifications, even if
> >> this means coordinating among different projects.
> >>
> >>> And in user mode and cache coherent memory this is not a problem as
> >>> Adhemerval explained.
> >>
> >> Obviously not, otherwise we wouldn't be changing glibc.
> >>
> > I was arguing against GCC and compilers in general being forced
> > to be aware of Cache Inhibited memory. Programmers do need to be.
> 
> Exactly.  In order to give programmers this choice, you need functions
> like device_memset, which are not subject to compiler or library
> optimizations which are not valid for device memory.
> 
Which project is going to host device_memset?

Are you suggesting that GLIBC should?

> > What are you arguing? 
> 
> If you want a memset which is compatible with device memory, you need to
> fix GCC *and* glibc.  Just patching glibc is not enough because GCC
> optimizes memset in ways that are incompatible with your apparent goal.
> 
I still don't see how GCC changes are required for this.

You need to be specific here.

We are not going to version every loop that might contain stores based
on speculation that someone who does not know what they are doing might
access Cache Inhibited storage.

Not going to happen.

> Thanks,
> Florian
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-12 19:21                 ` Steven Munroe
@ 2017-09-12 19:45                   ` Florian Weimer
  2017-09-12 20:25                     ` Steven Munroe
  0 siblings, 1 reply; 31+ messages in thread
From: Florian Weimer @ 2017-09-12 19:45 UTC (permalink / raw)
  To: munroesj; +Cc: Rajalakshmi Srinivasaraghavan, libc-alpha

On 09/12/2017 09:21 PM, Steven Munroe wrote:
> On Tue, 2017-09-12 at 19:04 +0200, Florian Weimer wrote:
>> On 09/12/2017 04:16 PM, Steven Munroe wrote:
>>> On Tue, 2017-09-12 at 16:08 +0200, Florian Weimer wrote:
>>>> * Steven Munroe:
>>>>
>>>>>> This means that GCC introduced an unaligned store, no matter how memset
>>>>>> was implemented.
>>>>>>
>>>>> C will do what ever the programmer wants. We can not stop that.
>>>>
>>>> That's not true.  If some specification says that for POWER, mem* must
>>>> behave in a certain way, and the GCC/glibc combination does not do
>>>> that, that's a bug on POWER.
>>>>
>>> What is the bug that you think we are not fixing?
>>
>> memset, as called by the C programmer, still uses unaligned stores.

Please look at my example and its disassembly.

> We are not going to version every loop that might contain stores based
> on speculation that someone who does not know what they are doing might
> access Cache Inhibited storage.

You need to remove optimizations from GCC which expand memset calls
using other instructions if those expansions do not compensate for the
possibility of unaligned stores.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-12 19:45                   ` Florian Weimer
@ 2017-09-12 20:25                     ` Steven Munroe
  0 siblings, 0 replies; 31+ messages in thread
From: Steven Munroe @ 2017-09-12 20:25 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Rajalakshmi Srinivasaraghavan, libc-alpha

On Tue, 2017-09-12 at 21:45 +0200, Florian Weimer wrote:
> On 09/12/2017 09:21 PM, Steven Munroe wrote:
> > On Tue, 2017-09-12 at 19:04 +0200, Florian Weimer wrote:
> >> On 09/12/2017 04:16 PM, Steven Munroe wrote:
> >>> On Tue, 2017-09-12 at 16:08 +0200, Florian Weimer wrote:
> >>>> * Steven Munroe:
> >>>>
> >>>>>> This means that GCC introduced an unaligned store, no matter how memset
> >>>>>> was implemented.
> >>>>>>
> >>>>> C will do what ever the programmer wants. We can not stop that.
> >>>>
> >>>> That's not true.  If some specification says that for POWER, mem* must
> >>>> behave in a certain way, and the GCC/glibc combination does not do
> >>>> that, that's a bug on POWER.
> >>>>
> >>> What is the bug that you think we are not fixing?
> >>
> >> memset, as called by the C programmer, still uses unaligned stores.
> 
> Please look at my example and its disassembly.
> 
> > We are not going to version every loop that might contain stores based
> > on speculation that someone who does not know what they are doing might
> > access Cache Inhibited storage.
> 
> You need to remove optimizations from GCC which expand memset calls
> using other instructions if those expansions do not compensate for the
> possibility of unaligned stores.
> 
No, the programmer should use -fno-builtin-memset if that programmer
knows he is accessing cache inhibited space.
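
For illustration only, here is a sketch of how that might look for a small
fixed-size clear (the file name clear.c and the function name clear_long
are made up for this example).  With -fno-builtin-memset, GCC is expected
to emit a call to the library memset instead of expanding it inline, but
the disassembly is the only reliable confirmation.

```c
/* Assumed build line:  gcc -O2 -fno-builtin-memset -c clear.c  */
#include <assert.h>
#include <string.h>

/* Hypothetical example function: zero one long through memset.  Without
   the flag above, GCC may expand this memset into inline stores.  */
void
clear_long (long *p)
{
  assert (p != NULL);
  memset (p, 0, sizeof *p);
}
```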

To be clear this is not new, cache coherent and cache inhibited storage
have been in the PowerISA from the beginning.

So why all the fuss now?



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-12 10:30       ` Florian Weimer
  2017-09-12 12:18         ` Zack Weinberg
  2017-09-12 13:38         ` Steven Munroe
@ 2017-09-13 13:12         ` Tulio Magno Quites Machado Filho
  2017-09-18 13:54           ` Florian Weimer
  2 siblings, 1 reply; 31+ messages in thread
From: Tulio Magno Quites Machado Filho @ 2017-09-13 13:12 UTC (permalink / raw)
  To: Florian Weimer, Rajalakshmi Srinivasaraghavan; +Cc: libc-alpha

Florian Weimer <fweimer@redhat.com> writes:

> On 08/18/2017 11:10 AM, Florian Weimer wrote:
>> On 08/18/2017 08:51 AM, Rajalakshmi Srinivasaraghavan wrote:
>>>
>>>
>>> On 08/18/2017 11:51 AM, Florian Weimer wrote:
>>>> On 08/18/2017 07:11 AM, Rajalakshmi Srinivasaraghavan wrote:
>>>>>     * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
>>>>>     for unaligned inputs if size is less than 8.
>>>>
>>>> This makes me rather nervous.  powerpc64le was supposed to have
>>>> reasonable efficient unaligned loads and stores.  GCC happily generates
>>>> them, too.
>>>
>>> This is meant ONLY for caching inhibited accesses.  Caching Inhibited
>>> accesses are required to be Guarded and properly aligned.
>> 
>> The intent is to support memset for such memory regions, right?  This
>> change is insufficient.  You have to fix GCC as well because it will
>> inline memset of unaligned pointers, like this:
>
> Here's a more complete example:
>
>
> #include <assert.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> typedef long __attribute__ ((aligned(1))) long_unaligned;
>
> __attribute__ ((noinline, noclone, weak))
> void
> clear (long_unaligned *p)
> {
>   memset (p, 0, sizeof (*p));
> }
>
> struct data
> {
>   char misalign;
>   long_unaligned data;
> };
>
> int
> main (void)
> {
>   struct data *data = malloc (sizeof (*data));
>   assert (data != NULL);
>   long_unaligned *p = &data->data;
>   printf ("pointer: %p\n", p);
>   clear (p);
>   return 0;
> }
>
> The clear function compiles to:
>
> typedef long __attribute__ ((aligned(1))) long_unaligned;
>
> void
> clear (long_unaligned *p)
> {
>   memset (p, 0, sizeof (*p));
> }
>
> At run time, I get:
>
> pointer: 0x10003c10011
>
> This means that GCC introduced an unaligned store, no matter how memset
> was implemented.

Which isn't necessarily a problem.
The performance penalty only appears when the memory access is referring
to an address which isn't at the instruction's natural boundary.

In this case, memset should use stb to avoid an alignment interrupt.

Notice that if the memory access is not at the natural boundary, an alignment
interrupt is generated and it won't generate an error.  The access will still
happen, but it will have a performance penalty.
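
To make "natural boundary" concrete, here is a rough sketch (the helper
names are hypothetical, not taken from the patch) of the kind of check the
proposed memset.S change performs with "andi. r0,r10,3" and
"andi. r0,r10,1": pick the widest store whose size divides the address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p is aligned to size bytes; size must be a power of two.  */
int
is_naturally_aligned (const void *p, size_t size)
{
  assert (size != 0 && (size & (size - 1)) == 0);
  return ((uintptr_t) p & (size - 1)) == 0;
}

/* Widest store width (4, 2 or 1 bytes) that is naturally aligned at p,
   roughly the stw / sth / stb choice.  */
size_t
widest_aligned_store (const void *p)
{
  if (is_naturally_aligned (p, 4))
    return 4;
  if (is_naturally_aligned (p, 2))
    return 2;
  return 1;
}
```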

> So I think the implementation constraint on the mem* functions is wrong.
>  It leads to a slower implementation of the mem* function for most of
> userspace which does not access device memory, and even for device
> memory, it is probably not what you want.

Makes sense.  But as there is nothing in the standard allowing or prohibiting
the usage of mem* functions to access caching-inhibited memory, I thought it
would make sense to provide functions that are as generic as possible.

IMHO, it's easier for programmers to use generic functions in most scenarios
and have access to specialized functions, e.g. a function for data already
aligned at 16 bytes.

-- 
Tulio Magno

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-13 13:12         ` Tulio Magno Quites Machado Filho
@ 2017-09-18 13:54           ` Florian Weimer
  2017-10-03 18:29             ` Adhemerval Zanella
  0 siblings, 1 reply; 31+ messages in thread
From: Florian Weimer @ 2017-09-18 13:54 UTC (permalink / raw)
  To: Tulio Magno Quites Machado Filho, Rajalakshmi Srinivasaraghavan
  Cc: libc-alpha

On 09/13/2017 03:12 PM, Tulio Magno Quites Machado Filho wrote:
>> So I think the implementation constraint on the mem* functions is wrong.
>>   It leads to a slower implementation of the mem* function for most of
>> userspace which does not access device memory, and even for device
>> memory, it is probably not what you want.
> Makes sense.  But as there is nothing in the standard allowing or prohibiting
> the usage of mem* functions to access caching-inhibited memory, I thought it
> would make sense to provide functions that are as generic as possible.

But I have shown that you aren't doing that because of the GCC 
optimization which inlines the memset call.

But I won't continue this conversation as I don't see it particularly 
useful to anyone.  In the end, you are the architecture maintainers, and 
you should do what you think is best.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-09-18 13:54           ` Florian Weimer
@ 2017-10-03 18:29             ` Adhemerval Zanella
  2017-10-05 12:13               ` Rajalakshmi Srinivasaraghavan
  2017-11-08 18:52               ` Tulio Magno Quites Machado Filho
  0 siblings, 2 replies; 31+ messages in thread
From: Adhemerval Zanella @ 2017-10-03 18:29 UTC (permalink / raw)
  To: Florian Weimer, Tulio Magno Quites Machado Filho,
	Rajalakshmi Srinivasaraghavan
  Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 2791 bytes --]



On 18/09/2017 10:54, Florian Weimer wrote:
> On 09/13/2017 03:12 PM, Tulio Magno Quites Machado Filho wrote:
>>> So I think the implementation constraint on the mem* functions is wrong.
>>>   It leads to a slower implementation of the mem* function for most of
>>> userspace which does not access device memory, and even for device
>>> memory, it is probably not what you want.
>> Makes sense.  But as there is nothing in the standard allowing or prohibiting
>> the usage of mem* functions to access caching-inhibited memory, I thought it
>> would make sense to provide functions that are as generic as possible.
> 
> But I have shown that you aren't doing that because of the GCC optimization which inlines the memset call.
> 
> But I won't continue this conversation as I don't see it particularly useful to anyone.  In the end, you are the architecture maintainers, and you should do what you think is best.
> 
> Thanks,
> Florian

I think one way to provide a slightly better memcpy implementation for POWER8
and still be able to avoid unaligned accesses on non-cacheable memory
is to use tunables.

The branch azanella/memcpy-power8 [1] has a power8 memcpy optimization which
uses unaligned loads and stores that I created some time ago but never actually
sent upstream.  It shows better performance on both bench-memcpy and
bench-memcpy-random (about 10% on the latter) and mixed results on
bench-memcpy-large (which is mainly dominated by memory throughput, and on the
environment I am using, a shared PowerKVM instance, the results do not seem to
be reliable).

It could use some tuning, especially on the ranges I used for unrolling
the loads/stores, and it also does not handle unaligned accesses that cross a
page boundary (which tend to be quite slow on current hardware, but are also
uncommon with the usual current page size of 64k).

The first patch does not enable this option as a default for POWER8; it just
adds it to the string tests as an option.  The second patch changes the
selection to:

  1. If glibc is configured with tunables, set the new implementation as the
     default for ISA 2.07 (power8).

  2. Also, if tunables are active, add the parameter glibc.tune.aligned_memopt
     to disable the new implementation selection.

So programs that rely on aligned loads can set:

GLIBC_TUNABLES=glibc.tune.aligned_memopt=1

And then the memcpy ifunc selection would pick the power7 one, which uses
only aligned loads and stores.

This is an RFC patch, and if the idea sounds good to the powerpc arch
maintainers I can work on finishing the patch with more comments and send it
upstream.  I tried to apply the same unaligned idea to memset and memmove, but
I could not get any real improvement in either.

[1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/memcpy-power8

[-- Attachment #2: bench-memcpy-random.out --]
[-- Type: text/plain, Size: 769 bytes --]

{
 "timing_type": "hp_timing",
 "functions": {
  "memcpy": {
   "bench-variant": "random",
   "ifuncs": ["__memcpy_power8", "__memcpy_power7", "__memcpy_a2", "__memcpy_power6", "__memcpy_power4", "__memcpy_ppc"],
   "results": [
    {
     "max-size": 4096,
     "timings": [21935.9, 22892, 35018.3, 27433.9, 27430.6, 27156.3]
    },
    {
     "max-size": 8192,
     "timings": [19792.4, 23740.1, 34017, 27218.6, 26828.5, 24779]
    },
    {
     "max-size": 16384,
     "timings": [20685.2, 23021.5, 34795.3, 27244.5, 26756.5, 27921.6]
    },
    {
     "max-size": 32768,
     "timings": [20024.8, 22553.9, 34411.3, 28268.6, 26792.3, 27380.6]
    },
    {
     "max-size": 65536,
     "timings": [20758.8, 23063.2, 38000.7, 28168.8, 27867.9, 25706.8]
    }]
  }
 }
}

[-- Attachment #3: bench-memcpy.out --]
[-- Type: text/plain, Size: 67126 bytes --]

{
 "timing_type": "hp_timing",
 "functions": {
  "memcpy": {
   "bench-variant": "default",
   "ifuncs": ["builtin_memcpy", "simple_memcpy", "__memcpy_power8", "__memcpy_power7", "__memcpy_a2", "__memcpy_power6", "__memcpy_power4", "__memcpy_ppc"],
   "results": [
    {
     "length": 1,
     "align1": 0,
     "align2": 0,
     "timings": [153.906, 5.25, 4.0625, 7.64062, 12.3125, 14.7656, 14.5156, 9.48438]
    },
    {
     "length": 1,
     "align1": 0,
     "align2": 0,
     "timings": [6.51562, 4.98438, 4.23438, 6.46875, 9.70312, 10.7188, 10.7188, 6.78125]
    },
    {
     "length": 1,
     "align1": 0,
     "align2": 0,
     "timings": [6.51562, 5.15625, 3.9375, 6.26562, 9.54688, 10.6719, 10.5156, 6.39062]
    },
    {
     "length": 1,
     "align1": 0,
     "align2": 0,
     "timings": [6.39062, 4.89062, 3.92188, 6.375, 9.1875, 10.7031, 10.75, 6.39062]
    },
    {
     "length": 2,
     "align1": 0,
     "align2": 0,
     "timings": [6.90625, 3.82812, 4.5625, 5.96875, 9.54688, 11.3438, 10.7656, 7.07812]
    },
    {
     "length": 2,
     "align1": 1,
     "align2": 0,
     "timings": [6.51562, 3.51562, 4.375, 5.3125, 9.5625, 11.125, 10.8281, 6.40625]
    },
    {
     "length": 2,
     "align1": 0,
     "align2": 1,
     "timings": [6.78125, 5, 3.95312, 5.3125, 9.54688, 10.9062, 10.125, 6.70312]
    },
    {
     "length": 2,
     "align1": 1,
     "align2": 1,
     "timings": [6.75, 3.65625, 4.14062, 5.54688, 9.51562, 11.0625, 10.5469, 6.42188]
    },
    {
     "length": 4,
     "align1": 0,
     "align2": 0,
     "timings": [6.6875, 5.35938, 4.21875, 5.95312, 9.73438, 11.0156, 10.7656, 7]
    },
    {
     "length": 4,
     "align1": 2,
     "align2": 0,
     "timings": [6.59375, 5.17188, 4.0625, 5.625, 9.64062, 10.9219, 10.8281, 6.40625]
    },
    {
     "length": 4,
     "align1": 0,
     "align2": 2,
     "timings": [6.46875, 5.03125, 4.21875, 5.5, 9.67188, 10.7031, 10.8438, 6.45312]
    },
    {
     "length": 4,
     "align1": 2,
     "align2": 2,
     "timings": [6.54688, 5, 4.15625, 5.5625, 9.59375, 10.5781, 10.8438, 6.65625]
    },
    {
     "length": 8,
     "align1": 0,
     "align2": 0,
     "timings": [7.3125, 8.42188, 4.78125, 4.64062, 6, 9.25, 9.09375, 4.625]
    },
    {
     "length": 8,
     "align1": 3,
     "align2": 0,
     "timings": [7.28125, 8.03125, 4.8125, 4.42188, 5.875, 8.90625, 8.9375, 4.875]
    },
    {
     "length": 8,
     "align1": 0,
     "align2": 3,
     "timings": [7, 7.98438, 4.67188, 4.23438, 5.70312, 8.8125, 8.57812, 4.85938]
    },
    {
     "length": 8,
     "align1": 3,
     "align2": 3,
     "timings": [7.23438, 7.85938, 4.79688, 4.26562, 5.85938, 8.60938, 8.78125, 4.84375]
    },
    {
     "length": 16,
     "align1": 0,
     "align2": 0,
     "timings": [6.25, 23.3594, 3.92188, 7.1875, 23.8281, 10.5938, 12.3438, 8.95312]
    },
    {
     "length": 16,
     "align1": 4,
     "align2": 0,
     "timings": [6.07812, 23.5625, 3.78125, 6.70312, 23.3438, 10.5938, 12.1562, 8.10938]
    },
    {
     "length": 16,
     "align1": 0,
     "align2": 4,
     "timings": [6.20312, 23.4219, 3.95312, 6.40625, 33.5625, 10.4531, 12.0312, 8.40625]
    },
    {
     "length": 16,
     "align1": 4,
     "align2": 4,
     "timings": [6.25, 23.3125, 3.84375, 6.51562, 32.9688, 10.5, 12, 8.42188]
    },
    {
     "length": 32,
     "align1": 0,
     "align2": 0,
     "timings": [6.20312, 32.2031, 3.75, 9.0625, 23.7656, 8.48438, 7.10938, 8.17188]
    },
    {
     "length": 32,
     "align1": 5,
     "align2": 0,
     "timings": [5.95312, 31.9688, 3.85938, 6.57812, 23.375, 18.6875, 12.1094, 9.60938]
    },
    {
     "length": 32,
     "align1": 0,
     "align2": 5,
     "timings": [6.23438, 31.7188, 3.95312, 8.26562, 28.2656, 14.9688, 12.9531, 11.4375]
    },
    {
     "length": 32,
     "align1": 5,
     "align2": 5,
     "timings": [6.375, 31.9844, 3.76562, 8.78125, 27.7188, 9.09375, 6.60938, 9.15625]
    },
    {
     "length": 64,
     "align1": 0,
     "align2": 0,
     "timings": [7.07812, 55.6094, 4.51562, 8.53125, 25.1875, 8.73438, 8.14062, 8.92188]
    },
    {
     "length": 64,
     "align1": 6,
     "align2": 0,
     "timings": [7.17188, 55.75, 4.625, 7.5, 24.6562, 19, 16.125, 12]
    },
    {
     "length": 64,
     "align1": 0,
     "align2": 6,
     "timings": [7.10938, 55.5781, 4.54688, 8.23438, 30.0625, 19.375, 19, 13.9844]
    },
    {
     "length": 64,
     "align1": 6,
     "align2": 6,
     "timings": [6.84375, 55.7812, 4.29688, 8.46875, 29.5781, 8.875, 7.98438, 9.89062]
    },
    {
     "length": 128,
     "align1": 0,
     "align2": 0,
     "timings": [10.1875, 104.078, 7.17188, 10.6719, 33.375, 6.76562, 10.1719, 11.9375]
    },
    {
     "length": 128,
     "align1": 7,
     "align2": 0,
     "timings": [8.9375, 103.641, 7.21875, 10, 31.5312, 21.7656, 20.3594, 21.875]
    },
    {
     "length": 128,
     "align1": 0,
     "align2": 7,
     "timings": [9.1875, 103.516, 6.67188, 10.9375, 32.7188, 22.0781, 22.0469, 18.1562]
    },
    {
     "length": 128,
     "align1": 7,
     "align2": 7,
     "timings": [8.92188, 103.828, 6.51562, 9.01562, 32.0938, 9.89062, 9.64062, 11.75]
    },
    {
     "length": 256,
     "align1": 0,
     "align2": 0,
     "timings": [11.125, 199.516, 9.71875, 12.6406, 35.4062, 9.375, 14.8125, 16.8438]
    },
    {
     "length": 256,
     "align1": 8,
     "align2": 0,
     "timings": [11.5469, 200.297, 9.14062, 15.0156, 35.3438, 8.96875, 14.3438, 15.7344]
    },
    {
     "length": 256,
     "align1": 0,
     "align2": 8,
     "timings": [10.8594, 200.234, 7.84375, 16.4844, 50.1719, 9.17188, 14.5312, 15.5469]
    },
    {
     "length": 256,
     "align1": 8,
     "align2": 8,
     "timings": [10.6406, 199.344, 7.90625, 10.9375, 49.3906, 9.34375, 14.125, 15.8281]
    },
    {
     "length": 512,
     "align1": 0,
     "align2": 0,
     "timings": [16.1406, 394.062, 12.5625, 16.625, 44.625, 17.0156, 26.1562, 29.5156]
    },
    {
     "length": 512,
     "align1": 9,
     "align2": 0,
     "timings": [14.6719, 395.766, 12.3125, 25.7812, 45.2969, 45.9531, 44.5625, 52.0469]
    },
    {
     "length": 512,
     "align1": 0,
     "align2": 9,
     "timings": [14.7969, 394.875, 11.8906, 28.4062, 58.0781, 46.4844, 46.625, 53.9688]
    },
    {
     "length": 512,
     "align1": 9,
     "align2": 9,
     "timings": [14.5156, 394.672, 12.5625, 13.625, 56.2188, 18.2344, 28.4688, 29.9531]
    },
    {
     "length": 1024,
     "align1": 0,
     "align2": 0,
     "timings": [23.4375, 787.422, 19.4062, 29.1719, 56.125, 35.1406, 44.0781, 47.3438]
    },
    {
     "length": 1024,
     "align1": 10,
     "align2": 0,
     "timings": [22.6406, 781.578, 19.4688, 42.0781, 57.2031, 77.7344, 76.0469, 91.8281]
    },
    {
     "length": 1024,
     "align1": 0,
     "align2": 10,
     "timings": [22.8438, 780, 19.8125, 43.4062, 71.4531, 79.3281, 78.7031, 93.7344]
    },
    {
     "length": 1024,
     "align1": 10,
     "align2": 10,
     "timings": [21.7031, 784.812, 19.5938, 24.9375, 67.8906, 38.4219, 45.6875, 47.5781]
    },
    {
     "length": 2048,
     "align1": 0,
     "align2": 0,
     "timings": [41.9219, 1575.08, 39.1875, 44.6719, 88.7031, 66.8906, 75.625, 79.6094]
    },
    {
     "length": 2048,
     "align1": 11,
     "align2": 0,
     "timings": [42.0938, 1568.55, 39.8594, 73.9062, 92.0312, 142.828, 139.266, 169.688]
    },
    {
     "length": 2048,
     "align1": 0,
     "align2": 11,
     "timings": [42.8594, 1556.59, 40.0625, 76.5156, 109.656, 141.547, 142, 172.188]
    },
    {
     "length": 2048,
     "align1": 11,
     "align2": 11,
     "timings": [40.9688, 1559.66, 38.6094, 42.2031, 101.938, 70.625, 76.4062, 78.9375]
    },
    {
     "length": 4096,
     "align1": 0,
     "align2": 0,
     "timings": [75.7031, 3117.66, 70.1875, 77.2969, 156.703, 132.562, 139.469, 142.438]
    },
    {
     "length": 4096,
     "align1": 12,
     "align2": 0,
     "timings": [87.4531, 3124.62, 85.6719, 137.453, 181.906, 351.281, 260.047, 317.625]
    },
    {
     "length": 4096,
     "align1": 0,
     "align2": 12,
     "timings": [95.375, 3228.45, 93.0469, 139.812, 179.641, 258.766, 256, 323.047]
    },
    {
     "length": 4096,
     "align1": 12,
     "align2": 12,
     "timings": [109.906, 3613.47, 107.359, 75.4375, 171.453, 135.234, 142.406, 142.859]
    },
    {
     "length": 8192,
     "align1": 0,
     "align2": 0,
     "timings": [140.156, 6294.27, 135.547, 141.969, 292.047, 260.062, 266.172, 269.781]
    },
    {
     "length": 8192,
     "align1": 13,
     "align2": 0,
     "timings": [168.172, 6270.34, 165.844, 264.703, 339.953, 484.297, 479.812, 612.125]
    },
    {
     "length": 8192,
     "align1": 0,
     "align2": 13,
     "timings": [179.281, 6167.27, 135.5, 200.328, 261.109, 342.625, 337.828, 407.719]
    },
    {
     "length": 8192,
     "align1": 13,
     "align2": 13,
     "timings": [112.781, 3590.89, 95.4375, 78.4531, 170.625, 146.688, 149.062, 150.438]
    },
    {
     "length": 16384,
     "align1": 0,
     "align2": 0,
     "timings": [149.25, 6959.89, 146.656, 151.75, 313.969, 287.203, 290.875, 292.859]
    },
    {
     "length": 16384,
     "align1": 14,
     "align2": 0,
     "timings": [184.672, 7171.08, 183.453, 289.938, 362.469, 525.375, 521.75, 661.656]
    },
    {
     "length": 16384,
     "align1": 0,
     "align2": 14,
     "timings": [193.766, 7127.11, 192.062, 290.516, 351.25, 526.719, 522.234, 663.406]
    },
    {
     "length": 16384,
     "align1": 14,
     "align2": 14,
     "timings": [169.203, 6986.52, 167.734, 150.062, 322.141, 288.453, 291.031, 292.891]
    },
    {
     "length": 32768,
     "align1": 0,
     "align2": 0,
     "timings": [291.828, 16318.4, 336.234, 296.516, 615.844, 571.281, 575.281, 577.266]
    },
    {
     "length": 32768,
     "align1": 15,
     "align2": 0,
     "timings": [368, 15069.5, 364.469, 574.172, 716.656, 1027.11, 1029.91, 1312.67]
    },
    {
     "length": 32768,
     "align1": 0,
     "align2": 15,
     "timings": [379.422, 15164.2, 377.938, 575.594, 687.625, 1029.11, 1025.59, 1312.98]
    },
    {
     "length": 32768,
     "align1": 15,
     "align2": 15,
     "timings": [311.281, 15046.3, 309.719, 298.016, 623.141, 572.75, 575.781, 576.75]
    },
    {
     "length": 65536,
     "align1": 0,
     "align2": 0,
     "timings": [608.219, 30219.4, 593.5, 676.047, 1244.56, 1152.98, 1188.02, 1168.06]
    },
    {
     "length": 65536,
     "align1": 16,
     "align2": 0,
     "timings": [656.047, 30360.4, 595.766, 630.266, 1235.25, 1158.94, 1169.88, 1169.11]
    },
    {
     "length": 65536,
     "align1": 0,
     "align2": 16,
     "timings": [700.203, 30408.5, 604.453, 637.156, 1272.98, 1158.08, 1171.02, 1169.58]
    },
    {
     "length": 65536,
     "align1": 16,
     "align2": 16,
     "timings": [603.094, 31151.9, 595.812, 630.578, 1244.88, 1159.19, 1415.02, 1169.95]
    },
    {
     "length": 0,
     "align1": 0,
     "align2": 0,
     "timings": [4.28125, 2.95312, 2.60938, 3.73438, 5.5625, 6.85938, 6.73438, 4.26562]
    },
    {
     "length": 0,
     "align1": 0,
     "align2": 0,
     "timings": [4.0625, 2.76562, 2.60938, 3.5625, 5.59375, 6.46875, 6.23438, 4.04688]
    },
    {
     "length": 0,
     "align1": 0,
     "align2": 0,
     "timings": [3.8125, 2.73438, 2.60938, 3.54688, 5.73438, 6.375, 6.46875, 4.10938]
    },
    {
     "length": 0,
     "align1": 0,
     "align2": 0,
     "timings": [3.9375, 2.9375, 2.60938, 3.46875, 5.79688, 6.21875, 6.48438, 4.0625]
    },
    {
     "length": 1,
     "align1": 0,
     "align2": 0,
     "timings": [3.65625, 2.92188, 2.375, 3.5625, 5.40625, 6.10938, 6.23438, 3.79688]
    },
    {
     "length": 1,
     "align1": 1,
     "align2": 0,
     "timings": [3.54688, 2.84375, 2.3125, 3.39062, 5.34375, 6.03125, 6.07812, 3.57812]
    },
    {
     "length": 1,
     "align1": 0,
     "align2": 1,
     "timings": [3.60938, 2.8125, 2.20312, 3.375, 5.15625, 6.0625, 6.10938, 3.73438]
    },
    {
     "length": 1,
     "align1": 1,
     "align2": 1,
     "timings": [3.51562, 2.85938, 2.28125, 3.59375, 5.25, 6.0625, 6.01562, 3.57812]
    },
    {
     "length": 2,
     "align1": 0,
     "align2": 0,
     "timings": [3.59375, 1.92188, 2.20312, 3.26562, 5.35938, 6.14062, 6.15625, 3.79688]
    },
    {
     "length": 2,
     "align1": 2,
     "align2": 0,
     "timings": [3.6875, 2, 2.21875, 3.07812, 5.375, 6.23438, 6.03125, 3.75]
    },
    {
     "length": 2,
     "align1": 0,
     "align2": 2,
     "timings": [3.70312, 2.60938, 2.20312, 3.14062, 5.3125, 6, 6.03125, 3.75]
    },
    {
     "length": 2,
     "align1": 2,
     "align2": 2,
     "timings": [3.59375, 1.96875, 2.20312, 3.125, 5.34375, 6.20312, 6.03125, 3.75]
    },
    {
     "length": 3,
     "align1": 0,
     "align2": 0,
     "timings": [3.32812, 5.78125, 2.34375, 3.28125, 4.90625, 5.76562, 5.73438, 3.75]
    },
    {
     "length": 3,
     "align1": 3,
     "align2": 0,
     "timings": [3.40625, 5.78125, 2.20312, 3.15625, 4.85938, 5.73438, 5.75, 3.48438]
    },
    {
     "length": 3,
     "align1": 0,
     "align2": 3,
     "timings": [3.34375, 7.42188, 2.65625, 2.92188, 4.9375, 5.75, 5.84375, 3.48438]
    },
    {
     "length": 3,
     "align1": 3,
     "align2": 3,
     "timings": [3.34375, 5.23438, 1.98438, 3.04688, 4.76562, 5.65625, 5.85938, 3.46875]
    },
    {
     "length": 4,
     "align1": 0,
     "align2": 0,
     "timings": [3.95312, 2.71875, 2.4375, 3.28125, 5.35938, 5.9375, 6.01562, 3.60938]
    },
    {
     "length": 4,
     "align1": 4,
     "align2": 0,
     "timings": [3.71875, 2.875, 2.21875, 3.14062, 5.40625, 6.01562, 5.96875, 3.625]
    },
    {
     "length": 4,
     "align1": 0,
     "align2": 4,
     "timings": [3.625, 2.8125, 2.25, 3.10938, 5.21875, 5.8125, 6.03125, 3.73438]
    },
    {
     "length": 4,
     "align1": 4,
     "align2": 4,
     "timings": [3.64062, 2.79688, 2.35938, 3.125, 5.35938, 6.04688, 6, 3.75]
    },
    {
     "length": 5,
     "align1": 0,
     "align2": 0,
     "timings": [3.51562, 3.60938, 2.17188, 3.1875, 5.125, 5.6875, 5.79688, 3.625]
    },
    {
     "length": 5,
     "align1": 5,
     "align2": 0,
     "timings": [3.375, 3.42188, 2.0625, 3.07812, 4.89062, 5.57812, 5.67188, 3.4375]
    },
    {
     "length": 5,
     "align1": 0,
     "align2": 5,
     "timings": [3.39062, 3.20312, 2.04688, 3.125, 4.92188, 5.375, 5.65625, 3.35938]
    },
    {
     "length": 5,
     "align1": 5,
     "align2": 5,
     "timings": [3.32812, 3.07812, 2.60938, 3.125, 4.79688, 5.51562, 5.57812, 3.375]
    },
    {
     "length": 6,
     "align1": 0,
     "align2": 0,
     "timings": [3.46875, 6.92188, 2, 2.85938, 5.1875, 5.5625, 5.90625, 3.26562]
    },
    {
     "length": 6,
     "align1": 6,
     "align2": 0,
     "timings": [3.42188, 7.03125, 2, 2.70312, 5.15625, 5.42188, 5.71875, 3.4375]
    },
    {
     "length": 6,
     "align1": 0,
     "align2": 6,
     "timings": [3.39062, 6.85938, 1.85938, 2.5, 5.03125, 5.3125, 5.6875, 3.32812]
    },
    {
     "length": 6,
     "align1": 6,
     "align2": 6,
     "timings": [3.26562, 6.92188, 2.01562, 2.73438, 5.0625, 5.5, 5.25, 3.32812]
    },
    {
     "length": 7,
     "align1": 0,
     "align2": 0,
     "timings": [3.17188, 4.29688, 3.0625, 2.79688, 4.64062, 5.375, 5.48438, 3.125]
    },
    {
     "length": 7,
     "align1": 7,
     "align2": 0,
     "timings": [3.20312, 3.98438, 2.92188, 2.71875, 4.65625, 5.3125, 5.34375, 3.14062]
    },
    {
     "length": 7,
     "align1": 0,
     "align2": 7,
     "timings": [3, 4.03125, 2.75, 2.85938, 4.65625, 5.25, 5.45312, 3]
    },
    {
     "length": 7,
     "align1": 7,
     "align2": 7,
     "timings": [3.125, 3.92188, 2.51562, 2.6875, 4.64062, 5.125, 5.20312, 3.07812]
    },
    {
     "length": 8,
     "align1": 0,
     "align2": 0,
     "timings": [4.14062, 4.48438, 2.71875, 2.4375, 3.45312, 5.26562, 5, 2.76562]
    },
    {
     "length": 8,
     "align1": 8,
     "align2": 0,
     "timings": [3.9375, 4.4375, 2.54688, 2.46875, 3.32812, 5.32812, 4.96875, 2.54688]
    },
    {
     "length": 8,
     "align1": 0,
     "align2": 8,
     "timings": [3.89062, 4.375, 2.57812, 2.28125, 3.28125, 5.23438, 4.96875, 2.67188]
    },
    {
     "length": 8,
     "align1": 8,
     "align2": 8,
     "timings": [3.85938, 4.40625, 2.71875, 2.34375, 3.14062, 5.1875, 4.875, 2.67188]
    },
    {
     "length": 9,
     "align1": 0,
     "align2": 0,
     "timings": [3.875, 5.17188, 2.46875, 3.85938, 2.95312, 6.85938, 6.60938, 4.48438]
    },
    {
     "length": 9,
     "align1": 9,
     "align2": 0,
     "timings": [3.59375, 4.95312, 2.35938, 3.42188, 2.82812, 7.17188, 7.26562, 5.21875]
    },
    {
     "length": 9,
     "align1": 0,
     "align2": 9,
     "timings": [3.84375, 4.90625, 2.34375, 3.79688, 2.82812, 6.54688, 6.4375, 4.28125]
    },
    {
     "length": 9,
     "align1": 9,
     "align2": 9,
     "timings": [3.71875, 4.84375, 2.32812, 2.96875, 2.84375, 7.125, 7.125, 5.03125]
    },
    {
     "length": 10,
     "align1": 0,
     "align2": 0,
     "timings": [3.98438, 5.75, 2.53125, 3.54688, 3.01562, 6.82812, 6.375, 4.46875]
    },
    {
     "length": 10,
     "align1": 10,
     "align2": 0,
     "timings": [3.84375, 5.6875, 2.45312, 4.21875, 2.84375, 6.21875, 7.23438, 4.90625]
    },
    {
     "length": 10,
     "align1": 0,
     "align2": 10,
     "timings": [3.79688, 5.70312, 2.35938, 3.39062, 2.6875, 6.875, 6.29688, 4.28125]
    },
    {
     "length": 10,
     "align1": 10,
     "align2": 10,
     "timings": [3.82812, 5.75, 2.34375, 3.90625, 2.67188, 6.15625, 7.20312, 4.89062]
    },
    {
     "length": 11,
     "align1": 0,
     "align2": 0,
     "timings": [3.60938, 6.35938, 2.01562, 3.46875, 2.46875, 6.53125, 6.04688, 4.03125]
    },
    {
     "length": 11,
     "align1": 11,
     "align2": 0,
     "timings": [3.39062, 6.14062, 2.21875, 3.78125, 2.46875, 7.21875, 6.6875, 4.54688]
    },
    {
     "length": 11,
     "align1": 0,
     "align2": 11,
     "timings": [3.3125, 5.73438, 2.1875, 3.39062, 2.46875, 6.45312, 5.98438, 3.85938]
    },
    {
     "length": 11,
     "align1": 11,
     "align2": 11,
     "timings": [3.59375, 5.71875, 2.1875, 3.98438, 2.5, 7.20312, 6.60938, 4.60938]
    },
    {
     "length": 12,
     "align1": 0,
     "align2": 0,
     "timings": [3.875, 6.59375, 2.65625, 3.5, 3.21875, 5.64062, 6.40625, 4.46875]
    },
    {
     "length": 12,
     "align1": 12,
     "align2": 0,
     "timings": [3.82812, 6.3125, 2.375, 3.28125, 2.96875, 5.57812, 6.34375, 4.15625]
    },
    {
     "length": 12,
     "align1": 0,
     "align2": 12,
     "timings": [3.64062, 6.1875, 2.40625, 3.26562, 2.98438, 5.34375, 6.20312, 4.25]
    },
    {
     "length": 12,
     "align1": 12,
     "align2": 12,
     "timings": [3.84375, 6.1875, 2.40625, 3.09375, 3, 5.4375, 6.125, 4.15625]
    },
    {
     "length": 13,
     "align1": 0,
     "align2": 0,
     "timings": [3.65625, 6.85938, 2.875, 3.25, 2.6875, 6.42188, 5.95312, 3.79688]
    },
    {
     "length": 13,
     "align1": 13,
     "align2": 0,
     "timings": [3.46875, 6.67188, 2.82812, 3.42188, 2.57812, 7.82812, 7.1875, 5.29688]
    },
    {
     "length": 13,
     "align1": 0,
     "align2": 13,
     "timings": [3.5, 6.54688, 2.54688, 3.28125, 2.53125, 6.32812, 5.875, 3.70312]
    },
    {
     "length": 13,
     "align1": 13,
     "align2": 13,
     "timings": [3.57812, 6.57812, 2.70312, 3.29688, 2.59375, 7.57812, 7.03125, 4.82812]
    },
    {
     "length": 14,
     "align1": 0,
     "align2": 0,
     "timings": [3.78125, 10.5156, 2.26562, 3.04688, 2.73438, 6.65625, 5.875, 3.9375]
    },
    {
     "length": 14,
     "align1": 14,
     "align2": 0,
     "timings": [3.64062, 10.3438, 2.3125, 3.75, 2.625, 5.8125, 6.9375, 4.85938]
    },
    {
     "length": 14,
     "align1": 0,
     "align2": 14,
     "timings": [3.70312, 10.3438, 2.15625, 3.01562, 2.46875, 6.60938, 5.96875, 4.25]
    },
    {
     "length": 14,
     "align1": 14,
     "align2": 14,
     "timings": [3.59375, 10.3125, 2.73438, 3.60938, 2.5625, 5.89062, 6.53125, 4.71875]
    },
    {
     "length": 15,
     "align1": 0,
     "align2": 0,
     "timings": [3.26562, 10.7188, 3.04688, 3.09375, 2.89062, 6.1875, 5.6875, 3.42188]
    },
    {
     "length": 15,
     "align1": 15,
     "align2": 0,
     "timings": [3.375, 10.7188, 2.85938, 3.5, 2.8125, 7.0625, 6.26562, 4.34375]
    },
    {
     "length": 15,
     "align1": 0,
     "align2": 15,
     "timings": [3.35938, 10.7344, 2.78125, 3.07812, 2.95312, 6.1875, 5.60938, 3.28125]
    },
    {
     "length": 15,
     "align1": 15,
     "align2": 15,
     "timings": [3.21875, 10.75, 2.95312, 3.42188, 2.64062, 6.60938, 6.01562, 4.23438]
    },
    {
     "length": 16,
     "align1": 0,
     "align2": 0,
     "timings": [3.54688, 12.9219, 2.3125, 3.92188, 13.2031, 6, 7, 4.84375]
    },
    {
     "length": 16,
     "align1": 16,
     "align2": 0,
     "timings": [3.39062, 13.0938, 2.1875, 3.73438, 13.0469, 5.65625, 6.89062, 4.60938]
    },
    {
     "length": 16,
     "align1": 0,
     "align2": 16,
     "timings": [3.53125, 13.0156, 2.20312, 3.625, 13.0625, 5.8125, 6.78125, 4.8125]
    },
    {
     "length": 16,
     "align1": 16,
     "align2": 16,
     "timings": [3.53125, 12.9844, 2.20312, 3.6875, 13.1562, 5.89062, 6.53125, 4.6875]
    },
    {
     "length": 17,
     "align1": 0,
     "align2": 0,
     "timings": [3.53125, 13.3281, 2.10938, 3.70312, 12.8281, 6.90625, 6.15625, 4.29688]
    },
    {
     "length": 17,
     "align1": 17,
     "align2": 0,
     "timings": [3.39062, 13.7344, 2.14062, 3.25, 12.6719, 7.51562, 6.875, 5.23438]
    },
    {
     "length": 17,
     "align1": 0,
     "align2": 17,
     "timings": [3.5, 13.6562, 2.20312, 3.875, 17.9531, 6.6875, 6.5, 4.125]
    },
    {
     "length": 17,
     "align1": 17,
     "align2": 17,
     "timings": [3.46875, 13.6875, 2.14062, 3.01562, 17.5469, 7.20312, 6.65625, 4.59375]
    },
    {
     "length": 18,
     "align1": 0,
     "align2": 0,
     "timings": [3.46875, 14, 2.23438, 3.4375, 12.2969, 7.09375, 6.51562, 4.20312]
    },
    {
     "length": 18,
     "align1": 18,
     "align2": 0,
     "timings": [3.375, 13.8594, 2.1875, 4.23438, 12.2812, 6.29688, 7.25, 5.29688]
    },
    {
     "length": 18,
     "align1": 0,
     "align2": 18,
     "timings": [3.39062, 13.9531, 2.20312, 3.39062, 18.3125, 6.78125, 6.29688, 4.23438]
    },
    {
     "length": 18,
     "align1": 18,
     "align2": 18,
     "timings": [3.59375, 13.7656, 2.15625, 4.07812, 18.2031, 6.17188, 6.92188, 5.0625]
    },
    {
     "length": 19,
     "align1": 0,
     "align2": 0,
     "timings": [3.375, 9.70312, 2.07812, 3.42188, 12.3594, 6.70312, 6.28125, 4.04688]
    },
    {
     "length": 19,
     "align1": 19,
     "align2": 0,
     "timings": [3.4375, 9.45312, 2.07812, 3.65625, 12.2812, 7.46875, 6.70312, 4.65625]
    },
    {
     "length": 19,
     "align1": 0,
     "align2": 19,
     "timings": [3.34375, 9.375, 2.07812, 3.39062, 18.25, 6.65625, 6.23438, 3.9375]
    },
    {
     "length": 19,
     "align1": 19,
     "align2": 19,
     "timings": [3.375, 9.45312, 2.23438, 3.60938, 17.8438, 7.09375, 6.39062, 4.40625]
    },
    {
     "length": 20,
     "align1": 0,
     "align2": 0,
     "timings": [3.35938, 9.875, 2.20312, 3.34375, 12.5312, 5.76562, 6.64062, 4.23438]
    },
    {
     "length": 20,
     "align1": 20,
     "align2": 0,
     "timings": [3.39062, 9.75, 2.0625, 3.3125, 12.3281, 5.65625, 6.32812, 4.09375]
    },
    {
     "length": 20,
     "align1": 0,
     "align2": 20,
     "timings": [3.46875, 9.82812, 2.15625, 3.21875, 16.1094, 5.67188, 6.25, 4.3125]
    },
    {
     "length": 20,
     "align1": 20,
     "align2": 20,
     "timings": [3.39062, 9.76562, 2.17188, 3.23438, 16.0469, 5.48438, 6.35938, 4.25]
    },
    {
     "length": 21,
     "align1": 0,
     "align2": 0,
     "timings": [3.375, 13.3438, 2.1875, 3.15625, 12.375, 6.46875, 5.9375, 3.95312]
    },
    {
     "length": 21,
     "align1": 21,
     "align2": 0,
     "timings": [3.53125, 13.2969, 2.07812, 3.57812, 12.3281, 7.9375, 7.26562, 5.15625]
    },
    {
     "length": 21,
     "align1": 0,
     "align2": 21,
     "timings": [3.46875, 13.1875, 2.14062, 3.39062, 16.0312, 6.39062, 6, 3.875]
    },
    {
     "length": 21,
     "align1": 21,
     "align2": 21,
     "timings": [3.42188, 15.1562, 2.07812, 3.375, 15.6406, 7.60938, 7.10938, 4.71875]
    },
    {
     "length": 22,
     "align1": 0,
     "align2": 0,
     "timings": [3.40625, 13.5781, 2.125, 3.07812, 12.0625, 6.42188, 5.9375, 3.90625]
    },
    {
     "length": 22,
     "align1": 22,
     "align2": 0,
     "timings": [3.54688, 13.6875, 2.20312, 3.82812, 12, 5.875, 6.9375, 4.85938]
    },
    {
     "length": 22,
     "align1": 0,
     "align2": 22,
     "timings": [3.32812, 13.5938, 2.20312, 2.90625, 16.0469, 6.39062, 6.07812, 3.76562]
    },
    {
     "length": 22,
     "align1": 22,
     "align2": 22,
     "timings": [3.59375, 13.5781, 2.21875, 3.5, 15.7969, 5.48438, 6.64062, 4.59375]
    },
    {
     "length": 23,
     "align1": 0,
     "align2": 0,
     "timings": [3.46875, 14.0312, 2.14062, 3.20312, 11.7812, 6.0625, 5.67188, 3.59375]
    },
    {
     "length": 23,
     "align1": 23,
     "align2": 0,
     "timings": [3.34375, 14.1094, 2.17188, 3.29688, 11.5469, 6.96875, 6.03125, 4.46875]
    },
    {
     "length": 23,
     "align1": 0,
     "align2": 23,
     "timings": [3.40625, 14.0781, 2.15625, 3.09375, 15.8438, 6.07812, 5.70312, 3.67188]
    },
    {
     "length": 23,
     "align1": 23,
     "align2": 23,
     "timings": [3.39062, 14.1094, 2.17188, 3, 15.4688, 6.75, 6.14062, 4.34375]
    },
    {
     "length": 24,
     "align1": 0,
     "align2": 0,
     "timings": [3.57812, 14.5469, 2.1875, 3.89062, 10.0156, 5.6875, 6.5625, 4.45312]
    },
    {
     "length": 24,
     "align1": 24,
     "align2": 0,
     "timings": [3.5625, 14.5, 2.17188, 3.57812, 9.95312, 5.23438, 6.39062, 4.15625]
    },
    {
     "length": 24,
     "align1": 0,
     "align2": 24,
     "timings": [3.45312, 14.4062, 2.07812, 3.64062, 16.6406, 5.6875, 6.21875, 4.32812]
    },
    {
     "length": 24,
     "align1": 24,
     "align2": 24,
     "timings": [3.40625, 14.4375, 2.21875, 3.64062, 16.5781, 5.6875, 6.34375, 4.34375]
    },
    {
     "length": 25,
     "align1": 0,
     "align2": 0,
     "timings": [3.48438, 14.8906, 2.20312, 3.76562, 9.70312, 6.39062, 6.21875, 3.71875]
    },
    {
     "length": 25,
     "align1": 25,
     "align2": 0,
     "timings": [3.54688, 14.8906, 2.21875, 3.23438, 9.5, 7.5, 6.78125, 4.96875]
    },
    {
     "length": 25,
     "align1": 0,
     "align2": 25,
     "timings": [3.51562, 14.9219, 2.09375, 3.53125, 15.9531, 6.48438, 6.21875, 4.125]
    },
    {
     "length": 25,
     "align1": 25,
     "align2": 25,
     "timings": [3.57812, 14.8281, 2.09375, 2.84375, 15.6719, 7.20312, 6.78125, 4.67188]
    },
    {
     "length": 26,
     "align1": 0,
     "align2": 0,
     "timings": [3.39062, 15.3906, 2.07812, 3.25, 9.625, 6.45312, 5.90625, 3.95312]
    },
    {
     "length": 26,
     "align1": 26,
     "align2": 0,
     "timings": [3.35938, 15.3906, 2.09375, 3.73438, 9.46875, 5.92188, 7.09375, 4.98438]
    },
    {
     "length": 26,
     "align1": 0,
     "align2": 26,
     "timings": [3.53125, 15.2812, 2.125, 3.14062, 15.9844, 6.60938, 5.875, 3.73438]
    },
    {
     "length": 26,
     "align1": 26,
     "align2": 26,
     "timings": [3.59375, 15.4375, 2.09375, 3.57812, 15.9062, 5.78125, 6.875, 5.0625]
    },
    {
     "length": 27,
     "align1": 0,
     "align2": 0,
     "timings": [3.59375, 15.75, 2.23438, 3.53125, 9.1875, 6.3125, 5.875, 3.8125]
    },
    {
     "length": 27,
     "align1": 27,
     "align2": 0,
     "timings": [3.48438, 15.7344, 2.125, 3.8125, 9.04688, 6.9375, 6.375, 4.54688]
    },
    {
     "length": 27,
     "align1": 0,
     "align2": 27,
     "timings": [3.39062, 15.7188, 2.07812, 3.34375, 15.9844, 6.0625, 5.71875, 3.6875]
    },
    {
     "length": 27,
     "align1": 27,
     "align2": 27,
     "timings": [3.40625, 15.75, 2.23438, 3.46875, 15.6875, 6.79688, 6.125, 4.4375]
    },
    {
     "length": 28,
     "align1": 0,
     "align2": 0,
     "timings": [3.48438, 16.0625, 2.21875, 3.1875, 9.89062, 5.32812, 6.15625, 4.20312]
    },
    {
     "length": 28,
     "align1": 28,
     "align2": 0,
     "timings": [3.5, 16.1406, 2.09375, 3.15625, 9.75, 5.09375, 5.75, 4.03125]
    },
    {
     "length": 28,
     "align1": 0,
     "align2": 28,
     "timings": [3.5625, 16.2188, 2.1875, 3, 13.9062, 5.14062, 5.92188, 3.90625]
    },
    {
     "length": 28,
     "align1": 28,
     "align2": 28,
     "timings": [3.35938, 16.1562, 2.15625, 3.0625, 13.7656, 5.125, 5.98438, 3.78125]
    },
    {
     "length": 29,
     "align1": 0,
     "align2": 0,
     "timings": [3.40625, 16.5312, 2.20312, 3.17188, 9.21875, 6.20312, 5.5, 3.59375]
    },
    {
     "length": 29,
     "align1": 29,
     "align2": 0,
     "timings": [3.39062, 16.625, 2.07812, 3.57812, 8.9375, 7.625, 6.78125, 4.89062]
    },
    {
     "length": 29,
     "align1": 0,
     "align2": 29,
     "timings": [3.46875, 16.5312, 2.17188, 3.26562, 13.3125, 6.1875, 5.45312, 3.48438]
    },
    {
     "length": 29,
     "align1": 29,
     "align2": 29,
     "timings": [3.32812, 16.5625, 2.0625, 3.29688, 13, 7.3125, 6.73438, 4.67188]
    },
    {
     "length": 30,
     "align1": 0,
     "align2": 0,
     "timings": [3.53125, 16.9688, 2.04688, 2.9375, 9.17188, 6.39062, 5.70312, 3.78125]
    },
    {
     "length": 30,
     "align1": 30,
     "align2": 0,
     "timings": [3.40625, 16.9688, 2.20312, 3.3125, 9.125, 5.65625, 6.46875, 4.78125]
    },
    {
     "length": 30,
     "align1": 0,
     "align2": 30,
     "timings": [3.32812, 16.9219, 2.07812, 2.79688, 13.6094, 6.45312, 5.85938, 3.71875]
    },
    {
     "length": 30,
     "align1": 30,
     "align2": 30,
     "timings": [3.54688, 16.875, 2.07812, 3.21875, 13.4531, 5.4375, 6.39062, 4.25]
    },
    {
     "length": 31,
     "align1": 0,
     "align2": 0,
     "timings": [3.60938, 17.6406, 2.20312, 2.875, 8.875, 6.04688, 5.26562, 3.28125]
    },
    {
     "length": 31,
     "align1": 31,
     "align2": 0,
     "timings": [3.5, 17.6562, 2.10938, 3.21875, 8.70312, 6.5, 6.07812, 4.29688]
    },
    {
     "length": 31,
     "align1": 0,
     "align2": 31,
     "timings": [3.5625, 17.6094, 2.15625, 2.76562, 13.1719, 5.82812, 5.32812, 3.125]
    },
    {
     "length": 31,
     "align1": 31,
     "align2": 31,
     "timings": [3.39062, 17.6094, 2.0625, 3.15625, 13.0156, 6.6875, 5.73438, 3.95312]
    },
    {
     "length": 48,
     "align1": 0,
     "align2": 0,
     "timings": [3.98438, 24.4531, 2.40625, 4.625, 13.8125, 4.29688, 3.79688, 4.67188]
    },
    {
     "length": 48,
     "align1": 3,
     "align2": 0,
     "timings": [3.78125, 24.4375, 2.5, 3.34375, 13.3438, 10.0312, 9.07812, 6.15625]
    },
    {
     "length": 48,
     "align1": 0,
     "align2": 3,
     "timings": [3.85938, 24.4531, 2.48438, 4.64062, 15.7969, 11.2969, 10.4375, 7.23438]
    },
    {
     "length": 48,
     "align1": 3,
     "align2": 3,
     "timings": [3.73438, 24.4531, 2.5, 4.89062, 15.6406, 5.57812, 4.09375, 5.40625]
    },
    {
     "length": 80,
     "align1": 0,
     "align2": 0,
     "timings": [5.20312, 37.8125, 3.23438, 4.48438, 14.6094, 5.0625, 4.29688, 5.29688]
    },
    {
     "length": 80,
     "align1": 5,
     "align2": 0,
     "timings": [4.53125, 37.8438, 3.25, 4.03125, 14.3281, 10.5938, 9.32812, 7.23438]
    },
    {
     "length": 80,
     "align1": 0,
     "align2": 5,
     "timings": [4.67188, 37.9219, 3.1875, 5.4375, 16.9062, 11.4844, 11.0156, 8.5]
    },
    {
     "length": 80,
     "align1": 5,
     "align2": 5,
     "timings": [4.51562, 37.8594, 3.01562, 4.90625, 16.6562, 5.32812, 5.03125, 5.85938]
    },
    {
     "length": 96,
     "align1": 0,
     "align2": 0,
     "timings": [4.875, 44.4531, 3.34375, 4.45312, 14.9531, 4.76562, 5.07812, 5.6875]
    },
    {
     "length": 96,
     "align1": 6,
     "align2": 0,
     "timings": [4.79688, 44.4375, 3.26562, 4.73438, 14.6406, 10.8281, 10.2344, 7.875]
    },
    {
     "length": 96,
     "align1": 0,
     "align2": 6,
     "timings": [4.5625, 44.4531, 3, 5.375, 17.7031, 11, 10.9375, 8.70312]
    },
    {
     "length": 96,
     "align1": 6,
     "align2": 6,
     "timings": [4.57812, 44.4219, 2.96875, 4.65625, 17.2188, 5.15625, 5.23438, 5.73438]
    },
    {
     "length": 112,
     "align1": 0,
     "align2": 0,
     "timings": [5.46875, 51.1562, 3.84375, 4.45312, 15.4062, 4.625, 5.01562, 5.6875]
    },
    {
     "length": 112,
     "align1": 7,
     "align2": 0,
     "timings": [5.23438, 51.1562, 3.82812, 4.26562, 14.9531, 12.1094, 10.875, 8.625]
    },
    {
     "length": 112,
     "align1": 0,
     "align2": 7,
     "timings": [5.07812, 51.1875, 3.57812, 6.17188, 17.7031, 12.5, 11.8594, 9.51562]
    },
    {
     "length": 112,
     "align1": 7,
     "align2": 7,
     "timings": [4.89062, 51.125, 3.6875, 4.76562, 17.4219, 5.34375, 5.4375, 6.42188]
    },
    {
     "length": 144,
     "align1": 0,
     "align2": 0,
     "timings": [5.75, 64.625, 3.98438, 5.54688, 22.7656, 5.10938, 5.53125, 6.5625]
    },
    {
     "length": 144,
     "align1": 9,
     "align2": 0,
     "timings": [5.625, 64.625, 4.09375, 5.51562, 22.25, 12.8594, 12, 12.7031]
    },
    {
     "length": 144,
     "align1": 0,
     "align2": 9,
     "timings": [5.46875, 64.6719, 3.9375, 7.03125, 23.1406, 13.9062, 13.3125, 14.0625]
    },
    {
     "length": 144,
     "align1": 9,
     "align2": 9,
     "timings": [5.92188, 64.6406, 4.34375, 5.89062, 22.9062, 6.71875, 6, 7.15625]
    },
    {
     "length": 160,
     "align1": 0,
     "align2": 0,
     "timings": [5.5625, 71.125, 4.29688, 5.5, 22.9531, 5.3125, 6.65625, 7.35938]
    },
    {
     "length": 160,
     "align1": 10,
     "align2": 0,
     "timings": [5.64062, 71.5312, 4.29688, 6.23438, 22.7031, 13.5469, 12.5781, 13.2031]
    },
    {
     "length": 160,
     "align1": 0,
     "align2": 10,
     "timings": [5.14062, 71.1562, 3.75, 6.71875, 23.75, 14.2188, 13.4844, 14.7969]
    },
    {
     "length": 160,
     "align1": 10,
     "align2": 10,
     "timings": [5.3125, 71.6562, 3.65625, 5.75, 23.2031, 6.98438, 6.29688, 7.6875]
    },
    {
     "length": 176,
     "align1": 0,
     "align2": 0,
     "timings": [6.32812, 78.0625, 4.57812, 5.35938, 23.2344, 4.96875, 6.14062, 7.125]
    },
    {
     "length": 176,
     "align1": 11,
     "align2": 0,
     "timings": [5.8125, 77.9531, 4.70312, 6.07812, 22.625, 14.7344, 13.6875, 14.25]
    },
    {
     "length": 176,
     "align1": 0,
     "align2": 11,
     "timings": [5.78125, 78.2031, 4.42188, 7.45312, 23.4688, 15.0156, 14.2031, 15.1406]
    },
    {
     "length": 176,
     "align1": 11,
     "align2": 11,
     "timings": [5.42188, 77.9844, 4.28125, 5.46875, 22.7969, 7.1875, 7.10938, 7.53125]
    },
    {
     "length": 192,
     "align1": 0,
     "align2": 0,
     "timings": [5.64062, 84.6719, 3.40625, 5.875, 23.4844, 5.21875, 7.26562, 7.625]
    },
    {
     "length": 192,
     "align1": 12,
     "align2": 0,
     "timings": [4.95312, 84.7031, 3.375, 6.96875, 23.0156, 14.9688, 14.3438, 14.9531]
    },
    {
     "length": 192,
     "align1": 0,
     "align2": 12,
     "timings": [5.98438, 84.5625, 4.10938, 7.65625, 24.4531, 16.0938, 16.25, 16.2031]
    },
    {
     "length": 192,
     "align1": 12,
     "align2": 12,
     "timings": [5.45312, 84.875, 4.23438, 5.79688, 23.9688, 6.9375, 7.54688, 7.92188]
    },
    {
     "length": 208,
     "align1": 0,
     "align2": 0,
     "timings": [5.64062, 91.6406, 4.03125, 5.54688, 23.8906, 5.1875, 6.71875, 7.65625]
    },
    {
     "length": 208,
     "align1": 13,
     "align2": 0,
     "timings": [5.29688, 91.875, 3.9375, 6.34375, 23.5938, 16, 15.0625, 15.7344]
    },
    {
     "length": 208,
     "align1": 0,
     "align2": 13,
     "timings": [5.59375, 91.8125, 4.23438, 8.07812, 26.8281, 16.25, 16.2812, 17.2812]
    },
    {
     "length": 208,
     "align1": 13,
     "align2": 13,
     "timings": [5.21875, 91.4375, 4.15625, 5.6875, 26.6094, 6.6875, 7.34375, 8.1875]
    },
    {
     "length": 224,
     "align1": 0,
     "align2": 0,
     "timings": [5.34375, 98.1719, 3.875, 5.39062, 24.3594, 5.35938, 7.78125, 8.60938]
    },
    {
     "length": 224,
     "align1": 14,
     "align2": 0,
     "timings": [5.28125, 98.5781, 3.9375, 7.10938, 23.8438, 16.0469, 15.3906, 16.3906]
    },
    {
     "length": 224,
     "align1": 0,
     "align2": 14,
     "timings": [5.5, 98.3125, 4.17188, 8.0625, 28.0312, 16.2812, 16.4531, 17.125]
    },
    {
     "length": 224,
     "align1": 14,
     "align2": 14,
     "timings": [5.45312, 98.2812, 4.125, 5.34375, 27.6562, 6.48438, 7.8125, 8.42188]
    },
    {
     "length": 240,
     "align1": 0,
     "align2": 0,
     "timings": [6.23438, 105.109, 4.375, 5.15625, 24.6406, 5.32812, 7.60938, 8.3125]
    },
    {
     "length": 240,
     "align1": 15,
     "align2": 0,
     "timings": [5.71875, 104.781, 4.29688, 7.1875, 24.3594, 17, 15.9062, 17.0938]
    },
    {
     "length": 240,
     "align1": 0,
     "align2": 15,
     "timings": [5.875, 105.484, 4.3125, 8.76562, 28.2969, 17.1875, 16.6562, 17.6094]
    },
    {
     "length": 240,
     "align1": 15,
     "align2": 15,
     "timings": [5.8125, 104.719, 4.34375, 5.78125, 28.1875, 7.09375, 8.03125, 8.92188]
    },
    {
     "length": 272,
     "align1": 0,
     "align2": 0,
     "timings": [6.79688, 118.125, 5.3125, 6.64062, 25.0938, 5.875, 8.23438, 9.15625]
    },
    {
     "length": 272,
     "align1": 17,
     "align2": 0,
     "timings": [6.5, 118.25, 5.4375, 7.625, 24.6562, 17.4688, 16.5156, 18.4531]
    },
    {
     "length": 272,
     "align1": 0,
     "align2": 17,
     "timings": [7.21875, 118.719, 5.23438, 9.60938, 28.1094, 18.9375, 18.25, 19.7969]
    },
    {
     "length": 272,
     "align1": 17,
     "align2": 17,
     "timings": [6.34375, 118.625, 5.01562, 7.67188, 27.2031, 7, 9.09375, 9.78125]
    },
    {
     "length": 288,
     "align1": 0,
     "align2": 0,
     "timings": [6.45312, 126.078, 5.29688, 6.85938, 24.9688, 5.9375, 9.42188, 10.0469]
    },
    {
     "length": 288,
     "align1": 18,
     "align2": 0,
     "timings": [6.625, 125.172, 4.875, 8.54688, 24.9219, 17.7188, 17.1406, 19.0625]
    },
    {
     "length": 288,
     "align1": 0,
     "align2": 18,
     "timings": [6.625, 125.625, 4.96875, 9.0625, 27.9375, 18.9531, 18.375, 20.5469]
    },
    {
     "length": 288,
     "align1": 18,
     "align2": 18,
     "timings": [6.46875, 125.516, 4.92188, 6.76562, 27.3125, 7.48438, 9.51562, 10.125]
    },
    {
     "length": 304,
     "align1": 0,
     "align2": 0,
     "timings": [7.10938, 131.75, 5.59375, 6.60938, 24.3594, 6.07812, 9.35938, 10.2969]
    },
    {
     "length": 304,
     "align1": 19,
     "align2": 0,
     "timings": [6.90625, 131.875, 5.625, 8.20312, 24.3594, 18.4844, 17.7344, 19.6562]
    },
    {
     "length": 304,
     "align1": 0,
     "align2": 19,
     "timings": [7.59375, 132.594, 5.28125, 12.5, 28.6406, 19.3125, 18.6406, 20.9688]
    },
    {
     "length": 304,
     "align1": 19,
     "align2": 19,
     "timings": [6.9375, 131.906, 5.21875, 6.78125, 27.6562, 7.5, 9.45312, 10.5781]
    },
    {
     "length": 320,
     "align1": 0,
     "align2": 0,
     "timings": [6.375, 138.938, 4.70312, 6.53125, 24.9688, 6.60938, 10.5938, 10.5156]
    },
    {
     "length": 320,
     "align1": 20,
     "align2": 0,
     "timings": [6.14062, 138.672, 4.65625, 9.0625, 24.8125, 18.8906, 18.0625, 20.4219]
    },
    {
     "length": 320,
     "align1": 0,
     "align2": 20,
     "timings": [7.85938, 139.281, 5.14062, 10.5938, 29.0781, 19.8125, 19.5781, 22.2344]
    },
    {
     "length": 320,
     "align1": 20,
     "align2": 20,
     "timings": [6.76562, 138.891, 5.10938, 6.96875, 28.4531, 7.54688, 10.9531, 11]
    },
    {
     "length": 336,
     "align1": 0,
     "align2": 0,
     "timings": [6.59375, 145.422, 4.96875, 6.51562, 25.3594, 6.75, 9.70312, 10.9375]
    },
    {
     "length": 336,
     "align1": 21,
     "align2": 0,
     "timings": [6.1875, 145.047, 5.07812, 8.71875, 25.4844, 19.8125, 19, 21.3281]
    },
    {
     "length": 336,
     "align1": 0,
     "align2": 21,
     "timings": [7.01562, 145.75, 5.34375, 10.7969, 29.2812, 20.2031, 20.1094, 22.9375]
    },
    {
     "length": 336,
     "align1": 21,
     "align2": 21,
     "timings": [6.1875, 145.391, 4.95312, 6.96875, 28.25, 7.29688, 10.4062, 11.3281]
    },
    {
     "length": 352,
     "align1": 0,
     "align2": 0,
     "timings": [6.23438, 152.562, 5.04688, 5.92188, 25.6719, 6.82812, 10.8438, 12.0781]
    },
    {
     "length": 352,
     "align1": 22,
     "align2": 0,
     "timings": [6.20312, 151.938, 4.95312, 11.1094, 25.75, 20.375, 19.5938, 21.8594]
    },
    {
     "length": 352,
     "align1": 0,
     "align2": 22,
     "timings": [7.25, 152.141, 5.07812, 12.0625, 29.6719, 20.4688, 20.8594, 23]
    },
    {
     "length": 352,
     "align1": 22,
     "align2": 22,
     "timings": [6.10938, 153.438, 4.9375, 6.5, 29.1719, 7.45312, 10.7031, 11.4531]
    },
    {
     "length": 368,
     "align1": 0,
     "align2": 0,
     "timings": [7.29688, 159.375, 5.32812, 6.15625, 26.1406, 7.10938, 10.8438, 12.375]
    },
    {
     "length": 368,
     "align1": 23,
     "align2": 0,
     "timings": [7, 159.328, 5.34375, 9.64062, 26.2969, 21.3125, 20.2031, 22.5625]
    },
    {
     "length": 368,
     "align1": 0,
     "align2": 23,
     "timings": [7.04688, 158.891, 5.29688, 12.4531, 32.1094, 21.5, 20.9062, 23.6875]
    },
    {
     "length": 368,
     "align1": 23,
     "align2": 23,
     "timings": [6.96875, 158.781, 5.35938, 6.8125, 30.7812, 7.65625, 10.9531, 11.8594]
    },
    {
     "length": 384,
     "align1": 0,
     "align2": 0,
     "timings": [7.64062, 166.25, 6.07812, 8.15625, 22.3125, 7.57812, 12.2188, 13.4688]
    },
    {
     "length": 384,
     "align1": 24,
     "align2": 0,
     "timings": [7.40625, 166.016, 5.48438, 12, 21.8281, 7.3125, 12.0156, 13.3594]
    },
    {
     "length": 384,
     "align1": 0,
     "align2": 24,
     "timings": [6.6875, 166, 5.4375, 12.2344, 27.9688, 7.42188, 12.1562, 13.4062]
    },
    {
     "length": 384,
     "align1": 24,
     "align2": 24,
     "timings": [6.95312, 166.859, 5.46875, 7.26562, 27.1719, 7.26562, 12.1406, 13.4219]
    },
    {
     "length": 400,
     "align1": 0,
     "align2": 0,
     "timings": [7.89062, 171.234, 6.3125, 7.51562, 27.5312, 7.85938, 12.4688, 13.625]
    },
    {
     "length": 400,
     "align1": 25,
     "align2": 0,
     "timings": [7.76562, 172.172, 6.25, 12.1875, 27.9062, 22.1094, 21.5469, 24.0625]
    },
    {
     "length": 400,
     "align1": 0,
     "align2": 25,
     "timings": [7.84375, 173.031, 6.28125, 14.0781, 28.0625, 23.2188, 23.1094, 25.2812]
    },
    {
     "length": 400,
     "align1": 25,
     "align2": 25,
     "timings": [7.29688, 173.172, 5.84375, 7.8125, 26.6562, 9.21875, 13.3594, 14.4219]
    },
    {
     "length": 416,
     "align1": 0,
     "align2": 0,
     "timings": [7.59375, 178.797, 6.14062, 7.65625, 27.9844, 8.0625, 12.75, 14.4688]
    },
    {
     "length": 416,
     "align1": 26,
     "align2": 0,
     "timings": [7.59375, 179.469, 6.1875, 12.6406, 28.2031, 22.625, 21.8594, 24.625]
    },
    {
     "length": 416,
     "align1": 0,
     "align2": 26,
     "timings": [7.51562, 179.656, 5.9375, 13.5156, 27.8125, 23.8438, 23.1875, 25.9219]
    },
    {
     "length": 416,
     "align1": 26,
     "align2": 26,
     "timings": [7.45312, 180.141, 5.76562, 7.45312, 26.4375, 9.10938, 13.3125, 14.8438]
    },
    {
     "length": 432,
     "align1": 0,
     "align2": 0,
     "timings": [8.1875, 185.938, 6.75, 7.09375, 27.3438, 8.4375, 13.2344, 14.4375]
    },
    {
     "length": 432,
     "align1": 27,
     "align2": 0,
     "timings": [7.9375, 186.531, 6.28125, 13.3125, 28.0312, 23.0781, 22.4375, 25.3906]
    },
    {
     "length": 432,
     "align1": 0,
     "align2": 27,
     "timings": [7.79688, 186.703, 6.70312, 14.1406, 28.1875, 24.0469, 23.7969, 26.2656]
    },
    {
     "length": 432,
     "align1": 27,
     "align2": 27,
     "timings": [7.71875, 186.391, 6.375, 7.8125, 26.3906, 9.10938, 13.7031, 14.6875]
    },
    {
     "length": 448,
     "align1": 0,
     "align2": 0,
     "timings": [7.46875, 193.969, 5.71875, 8.125, 27.4688, 8.78125, 13.3906, 15.0938]
    },
    {
     "length": 448,
     "align1": 28,
     "align2": 0,
     "timings": [6.82812, 193.094, 5.5625, 13.0469, 28.1094, 23.4219, 22.7656, 26]
    },
    {
     "length": 448,
     "align1": 0,
     "align2": 28,
     "timings": [7.95312, 193.969, 6.34375, 14.5312, 28.9375, 24.75, 24.5938, 27.8594]
    },
    {
     "length": 448,
     "align1": 28,
     "align2": 28,
     "timings": [7.76562, 192.328, 6.42188, 7.95312, 27.6719, 9.76562, 14.2812, 15.2344]
    },
    {
     "length": 464,
     "align1": 0,
     "align2": 0,
     "timings": [7.54688, 199.109, 5.76562, 7.40625, 27.5312, 8.95312, 13.7969, 15.1875]
    },
    {
     "length": 464,
     "align1": 29,
     "align2": 0,
     "timings": [7.1875, 199.625, 5.85938, 13.7969, 28.3906, 24.375, 23.25, 27.0312]
    },
    {
     "length": 464,
     "align1": 0,
     "align2": 29,
     "timings": [7.65625, 200.156, 6.625, 14.4531, 29.125, 25.0781, 24.7344, 28.5469]
    },
    {
     "length": 464,
     "align1": 29,
     "align2": 29,
     "timings": [7.17188, 200.312, 5.875, 8.125, 27.5156, 9.375, 14.3438, 15.5156]
    },
    {
     "length": 480,
     "align1": 0,
     "align2": 0,
     "timings": [7.125, 206.875, 5.79688, 7.65625, 28.0469, 9.25, 14.0781, 15.75]
    },
    {
     "length": 480,
     "align1": 30,
     "align2": 0,
     "timings": [7.26562, 206.422, 5.78125, 13.625, 28.6875, 24.6094, 23.75, 27.5781]
    },
    {
     "length": 480,
     "align1": 0,
     "align2": 30,
     "timings": [7.625, 207.656, 6.21875, 14.5625, 29.5312, 25.2812, 25.1719, 28.3125]
    },
    {
     "length": 480,
     "align1": 30,
     "align2": 30,
     "timings": [7.17188, 206.641, 5.76562, 7.3125, 27.9844, 9.73438, 14.5625, 15.7031]
    },
    {
     "length": 496,
     "align1": 0,
     "align2": 0,
     "timings": [7.92188, 215.156, 6.25, 7.15625, 28.0469, 9.375, 14.4062, 15.8594]
    },
    {
     "length": 496,
     "align1": 31,
     "align2": 0,
     "timings": [8.0625, 212.906, 6.90625, 14.3125, 28.3906, 25.4219, 24.375, 28.1875]
    },
    {
     "length": 496,
     "align1": 0,
     "align2": 31,
     "timings": [8.03125, 214.797, 6.6875, 15.1406, 33.1719, 26.2031, 25.8594, 29.0312]
    },
    {
     "length": 496,
     "align1": 31,
     "align2": 31,
     "timings": [8.125, 213.375, 6.95312, 7.79688, 31.125, 10.4219, 15.1719, 16.4375]
    },
    {
     "length": 1024,
     "align1": 0,
     "align2": 0,
     "timings": [13.0625, 438.641, 10.8906, 16.1406, 31.125, 19.6094, 24.5, 26.5625]
    },
    {
     "length": 1024,
     "align1": 32,
     "align2": 0,
     "timings": [12.7656, 438.047, 10.8906, 15.8438, 30.6719, 19.5156, 24.4688, 26.4219]
    },
    {
     "length": 1024,
     "align1": 0,
     "align2": 32,
     "timings": [12.6562, 438.875, 10.8594, 16.2656, 37.2344, 19.5469, 25.0781, 26.9531]
    },
    {
     "length": 1024,
     "align1": 32,
     "align2": 32,
     "timings": [12.7812, 439.281, 10.875, 15.9219, 36.0156, 19.5625, 24.5312, 26.2969]
    },
    {
     "length": 1056,
     "align1": 0,
     "align2": 0,
     "timings": [12.9219, 450.016, 11.2812, 15.6406, 36, 20.6875, 24.9688, 26.8438]
    },
    {
     "length": 1056,
     "align1": 33,
     "align2": 0,
     "timings": [13.1719, 451.609, 11.6719, 24.3125, 36.875, 44.2344, 43.8594, 52.4219]
    },
    {
     "length": 1056,
     "align1": 0,
     "align2": 33,
     "timings": [14.2344, 453.031, 12.3281, 26.0781, 41.0156, 45.1719, 45.4531, 53.5625]
    },
    {
     "length": 1056,
     "align1": 33,
     "align2": 33,
     "timings": [13.1719, 449.547, 11.5156, 16.7812, 39.0312, 23.0312, 26.1719, 27.3594]
    },
    {
     "length": 1088,
     "align1": 0,
     "align2": 0,
     "timings": [12.5781, 485.172, 10.875, 15.7344, 36.4219, 21.6562, 25.7344, 27.6562]
    },
    {
     "length": 1088,
     "align1": 34,
     "align2": 0,
     "timings": [12.9375, 462.172, 10.9531, 24.875, 37.2812, 45.7031, 44.75, 53.5]
    },
    {
     "length": 1088,
     "align1": 0,
     "align2": 34,
     "timings": [14.5469, 464.047, 12.2812, 25.8438, 41.875, 46.7031, 46.1406, 55.1406]
    },
    {
     "length": 1088,
     "align1": 34,
     "align2": 34,
     "timings": [13.8125, 466.016, 11.6719, 15.9219, 39.5312, 23.2969, 26.4844, 27.7188]
    },
    {
     "length": 1120,
     "align1": 0,
     "align2": 0,
     "timings": [12.7188, 480.078, 11.0625, 15.5, 36.75, 21.1094, 26.3281, 28.1875]
    },
    {
     "length": 1120,
     "align1": 35,
     "align2": 0,
     "timings": [12.9688, 477.594, 11.75, 25.4062, 37.7812, 47.7656, 45.8594, 54.875]
    },
    {
     "length": 1120,
     "align1": 0,
     "align2": 35,
     "timings": [14.3594, 477.312, 12.1562, 26.8281, 47.0156, 47.2031, 47.0312, 56]
    },
    {
     "length": 1120,
     "align1": 35,
     "align2": 35,
     "timings": [12.9844, 479.438, 11.9219, 16.0781, 45.3594, 23.7969, 26.6875, 28.0938]
    },
    {
     "length": 1152,
     "align1": 0,
     "align2": 0,
     "timings": [15.2188, 490.391, 13.6562, 17.0625, 33.2969, 22.1094, 26.75, 28.7969]
    },
    {
     "length": 1152,
     "align1": 36,
     "align2": 0,
     "timings": [15.6719, 490.25, 14.25, 26.1094, 34.5, 47.75, 46.9219, 56.4531]
    },
    {
     "length": 1152,
     "align1": 0,
     "align2": 36,
     "timings": [14.75, 497.047, 12.7344, 26.9062, 42.7969, 48.7812, 49.0625, 57.9844]
    },
    {
     "length": 1152,
     "align1": 36,
     "align2": 36,
     "timings": [13.5781, 489.797, 11.8281, 15.5938, 41.0156, 23.6562, 28.125, 28.3906]
    },
    {
     "length": 1184,
     "align1": 0,
     "align2": 0,
     "timings": [15.4688, 524.609, 13.875, 16.8594, 38.3594, 22.9219, 27.2344, 29.3125]
    },
    {
     "length": 1184,
     "align1": 37,
     "align2": 0,
     "timings": [15.7812, 505.344, 14.2812, 26.375, 39.5, 49.3281, 48.3125, 57.8594]
    },
    {
     "length": 1184,
     "align1": 0,
     "align2": 37,
     "timings": [16.1719, 504.938, 14.5938, 27.5156, 43.0625, 50.7656, 49.5469, 59.6094]
    },
    {
     "length": 1184,
     "align1": 37,
     "align2": 37,
     "timings": [15.3438, 504.969, 14.0156, 17.125, 41.0625, 25.0625, 27.5625, 28.7656]
    },
    {
     "length": 1216,
     "align1": 0,
     "align2": 0,
     "timings": [14.9844, 516.484, 13.3906, 16.9375, 38.7344, 23.6406, 27.8906, 29.7188]
    },
    {
     "length": 1216,
     "align1": 38,
     "align2": 0,
     "timings": [15.3125, 523.016, 13.9062, 27.0938, 39.3438, 50.5625, 49.1719, 59.2188]
    },
    {
     "length": 1216,
     "align1": 0,
     "align2": 38,
     "timings": [17.1406, 589.031, 15.3906, 27.7812, 44.0938, 51.625, 50.7188, 60.0938]
    },
    {
     "length": 1216,
     "align1": 38,
     "align2": 38,
     "timings": [15.7656, 522.156, 14.4531, 16.8281, 42.5625, 24.9375, 27.9375, 29.4844]
    },
    {
     "length": 1248,
     "align1": 0,
     "align2": 0,
     "timings": [15.3906, 531.859, 13.5312, 16.6094, 39.1875, 23.1406, 28.4375, 30.375]
    },
    {
     "length": 1248,
     "align1": 39,
     "align2": 0,
     "timings": [16, 533.922, 14.9375, 27.4062, 40.1562, 51.1562, 50.6094, 60.4844]
    },
    {
     "length": 1248,
     "align1": 0,
     "align2": 39,
     "timings": [17.0312, 539.953, 15.8281, 28.5781, 46.8125, 51.375, 51.8125, 61.7812]
    },
    {
     "length": 1248,
     "align1": 39,
     "align2": 39,
     "timings": [16, 533.938, 14.5469, 17.0469, 44.7344, 25.5781, 28.625, 30.0625]
    },
    {
     "length": 1280,
     "align1": 0,
     "align2": 0,
     "timings": [17.1875, 577.375, 15.3906, 18.2656, 49.2031, 24.1094, 28.8594, 31.0938]
    },
    {
     "length": 1280,
     "align1": 40,
     "align2": 0,
     "timings": [17.5, 549.016, 15.9844, 28.2188, 35.3906, 23.7188, 28.7812, 30.9062]
    },
    {
     "length": 1280,
     "align1": 0,
     "align2": 40,
     "timings": [17.6094, 547.156, 15.75, 29.5781, 41.9062, 24.1562, 30.1406, 31.9219]
    },
    {
     "length": 1280,
     "align1": 40,
     "align2": 40,
     "timings": [16.3125, 544.828, 15.1719, 17.4375, 41.2812, 23.9219, 28.9219, 30.8906]
    },
    {
     "length": 1312,
     "align1": 0,
     "align2": 0,
     "timings": [17.25, 558.266, 15.3906, 17.9375, 40.7656, 25.0625, 29.5156, 31.3594]
    },
    {
     "length": 1312,
     "align1": 41,
     "align2": 0,
     "timings": [17.5469, 562.672, 21.8281, 28.875, 41.8281, 53.0938, 52.3438, 63.5]
    },
    {
     "length": 1312,
     "align1": 0,
     "align2": 41,
     "timings": [18.2656, 559.328, 16.6094, 30.2812, 43.125, 53.7344, 53.8906, 64.4219]
    },
    {
     "length": 1312,
     "align1": 41,
     "align2": 41,
     "timings": [16.9219, 559.688, 15.7188, 18.6719, 41.3438, 27.2344, 30.6875, 31.7344]
    },
    {
     "length": 1344,
     "align1": 0,
     "align2": 0,
     "timings": [16.9375, 577.328, 15.2188, 17.8438, 41.0625, 25.9062, 30.0938, 32.0156]
    },
    {
     "length": 1344,
     "align1": 42,
     "align2": 0,
     "timings": [17.2344, 570.812, 15.7812, 29.6094, 42.4062, 56.0312, 53.7031, 64.9531]
    },
    {
     "length": 1344,
     "align1": 0,
     "align2": 42,
     "timings": [18.7656, 573.234, 17.0469, 29.8594, 44.0625, 55.7656, 54.875, 65.5781]
    },
    {
     "length": 1344,
     "align1": 42,
     "align2": 42,
     "timings": [17.5938, 571.172, 16.1875, 17.8281, 42.5, 27.9219, 30.7656, 32.1875]
    },
    {
     "length": 1376,
     "align1": 0,
     "align2": 0,
     "timings": [17.1406, 587.203, 15.375, 17.6406, 41.4062, 25.375, 30.6719, 32.6875]
    },
    {
     "length": 1376,
     "align1": 43,
     "align2": 0,
     "timings": [17.4375, 591.641, 15.9375, 30.3281, 43.0312, 55.2656, 54.6406, 66.0781]
    },
    {
     "length": 1376,
     "align1": 0,
     "align2": 43,
     "timings": [18.2344, 585.75, 16.3906, 30.7344, 47.8125, 55.8594, 55.7031, 66.75]
    },
    {
     "length": 1376,
     "align1": 43,
     "align2": 43,
     "timings": [16.9219, 589.609, 15.6562, 18.125, 44.5781, 27.9062, 30.875, 32.2656]
    },
    {
     "length": 1408,
     "align1": 0,
     "align2": 0,
     "timings": [18.125, 601.797, 16.4688, 19.4219, 38.125, 26.2812, 31.0781, 33.1406]
    },
    {
     "length": 1408,
     "align1": 44,
     "align2": 0,
     "timings": [18.4375, 603.312, 17.1562, 30.6562, 38.9219, 56.625, 55.7344, 67.5156]
    },
    {
     "length": 1408,
     "align1": 0,
     "align2": 44,
     "timings": [18.8125, 606.578, 17.0156, 31.4219, 46.5625, 57.2344, 57.625, 68.6719]
    },
    {
     "length": 1408,
     "align1": 44,
     "align2": 44,
     "timings": [17.5781, 662.484, 16.2344, 17.9844, 43.4531, 27.9844, 32.3906, 33.3281]
    },
    {
     "length": 1440,
     "align1": 0,
     "align2": 0,
     "timings": [18.0938, 609.641, 16.7188, 19.125, 43.375, 27.125, 31.6875, 33.6094]
    },
    {
     "length": 1440,
     "align1": 45,
     "align2": 0,
     "timings": [18.6562, 606.828, 22.9062, 31.3906, 44, 58.1875, 56.9375, 68.9219]
    },
    {
     "length": 1440,
     "align1": 0,
     "align2": 45,
     "timings": [19.5469, 627.25, 18, 31.5, 46.6719, 59.3125, 58.1406, 70.4062]
    },
    {
     "length": 1440,
     "align1": 45,
     "align2": 45,
     "timings": [18.3594, 609.484, 16.8594, 19.4219, 43.5469, 29.375, 32.0156, 33.25]
    },
    {
     "length": 1472,
     "align1": 0,
     "align2": 0,
     "timings": [18.0312, 629.766, 16.3281, 18.9688, 43.5625, 28.0938, 32.2656, 34.4375]
    },
    {
     "length": 1472,
     "align1": 46,
     "align2": 0,
     "timings": [18.2656, 651.969, 17.0312, 31.8906, 44.8438, 73.4219, 58.0469, 70.4844]
    },
    {
     "length": 1472,
     "align1": 0,
     "align2": 46,
     "timings": [19.9375, 629.516, 18.4375, 31.9062, 46.3125, 60.4688, 59.2969, 70.8438]
    },
    {
     "length": 1472,
     "align1": 46,
     "align2": 46,
     "timings": [18.5938, 631.625, 17.2344, 19.0312, 44.9531, 29.4531, 32.4688, 33.6719]
    },
    {
     "length": 1504,
     "align1": 0,
     "align2": 0,
     "timings": [18.25, 646.125, 16.5156, 18.8438, 43.7969, 27.75, 32.9844, 34.7344]
    },
    {
     "length": 1504,
     "align1": 47,
     "align2": 0,
     "timings": [18.5156, 646.016, 17.125, 32.4688, 46.0312, 59.8438, 59.1562, 71.3438]
    },
    {
     "length": 1504,
     "align1": 0,
     "align2": 47,
     "timings": [19.5625, 649.047, 17.7188, 32.7812, 49.1562, 59.8594, 60.0938, 72.3438]
    },
    {
     "length": 1504,
     "align1": 47,
     "align2": 47,
     "timings": [18.2344, 646.5, 16.8594, 19.125, 46.7969, 30.0625, 33.2344, 34.6094]
    },
    {
     "length": 1536,
     "align1": 0,
     "align2": 0,
     "timings": [19.2812, 658.672, 17.4844, 20.4844, 40.125, 28.1875, 33.4688, 35.375]
    },
    {
     "length": 1536,
     "align1": 48,
     "align2": 0,
     "timings": [18.8438, 655.109, 17.5938, 20.3594, 40.0469, 28.25, 33.3906, 35.4688]
    },
    {
     "length": 1536,
     "align1": 0,
     "align2": 48,
     "timings": [18.7969, 658.188, 17.625, 20.4219, 45.875, 28.4375, 34.6719, 36.5]
    },
    {
     "length": 1536,
     "align1": 48,
     "align2": 48,
     "timings": [18.7969, 653.5, 17.5312, 20.375, 45.625, 28.0781, 33.375, 35.3125]
    },
    {
     "length": 1568,
     "align1": 0,
     "align2": 0,
     "timings": [19.0781, 668.141, 17.75, 20.3438, 45.8594, 29.2812, 33.9375, 35.8594]
    },
    {
     "length": 1568,
     "align1": 49,
     "align2": 0,
     "timings": [19.8594, 668.766, 18.3125, 33.6094, 47.0938, 61.9062, 61.4219, 74.6719]
    },
    {
     "length": 1568,
     "align1": 0,
     "align2": 49,
     "timings": [20.7031, 669.219, 21.5469, 34.9688, 51.6719, 62.5625, 62.3438, 75.625]
    },
    {
     "length": 1568,
     "align1": 49,
     "align2": 49,
     "timings": [19.1094, 671.594, 18.0156, 21.3281, 48.3125, 31.4062, 34.9688, 36.4844]
    },
    {
     "length": 1600,
     "align1": 0,
     "align2": 0,
     "timings": [18.9219, 690.344, 17.4062, 20.0625, 46.0469, 30.1562, 34.5312, 36.4844]
    },
    {
     "length": 1600,
     "align1": 50,
     "align2": 0,
     "timings": [19.6406, 690.359, 18.125, 33.9062, 47.0312, 63.2969, 62.2031, 75.6875]
    },
    {
     "length": 1600,
     "align1": 0,
     "align2": 50,
     "timings": [21.4375, 681.328, 22.125, 34.7812, 51.8438, 64.3438, 63.9688, 76.8281]
    },
    {
     "length": 1600,
     "align1": 50,
     "align2": 50,
     "timings": [19.8281, 679.891, 18.2344, 20.3438, 48.9062, 32.1562, 35.125, 36.375]
    },
    {
     "length": 1632,
     "align1": 0,
     "align2": 0,
     "timings": [19.4219, 698.516, 17.4844, 19.9375, 46.5, 29.9688, 35.0781, 36.8594]
    },
    {
     "length": 1632,
     "align1": 51,
     "align2": 0,
     "timings": [19.5625, 690.531, 18.2656, 34.75, 48.1094, 66.3125, 63.7656, 77.3125]
    },
    {
     "length": 1632,
     "align1": 0,
     "align2": 51,
     "timings": [20.7188, 702.453, 19.75, 35.6875, 56.0625, 64.1875, 64.5625, 77.9062]
    },
    {
     "length": 1632,
     "align1": 51,
     "align2": 51,
     "timings": [19.3281, 702.922, 17.8281, 20.5469, 52.9062, 32.6406, 35.5781, 36.6406]
    },
    {
     "length": 1664,
     "align1": 0,
     "align2": 0,
     "timings": [20.4531, 713.938, 18.6094, 21.5469, 43.125, 30.5781, 35.6094, 37.5469]
    },
    {
     "length": 1664,
     "align1": 52,
     "align2": 0,
     "timings": [20.75, 769.547, 19.625, 35.2188, 44.9688, 65.125, 64.7344, 78.6719]
    },
    {
     "length": 1664,
     "align1": 0,
     "align2": 52,
     "timings": [21.5312, 716.344, 20.3438, 36.125, 53.6562, 65.8125, 66.1406, 79.3438]
    },
    {
     "length": 1664,
     "align1": 52,
     "align2": 52,
     "timings": [19.7188, 713.531, 18.3281, 20.2344, 50.3906, 32.2812, 54.2812, 37.4062]
    },
    {
     "length": 1696,
     "align1": 0,
     "align2": 0,
     "timings": [20.6094, 727.172, 18.8125, 21.4062, 47.7812, 31.6875, 36.2812, 38.2656]
    },
    {
     "length": 1696,
     "align1": 53,
     "align2": 0,
     "timings": [20.8438, 732.984, 19.3906, 35.7969, 49.9688, 66.9375, 65.6562, 79.875]
    },
    {
     "length": 1696,
     "align1": 0,
     "align2": 53,
     "timings": [22.125, 719.859, 22.0781, 36.5156, 54.6719, 66.75, 67.0469, 81.0156]
    },
    {
     "length": 1696,
     "align1": 53,
     "align2": 53,
     "timings": [20.5469, 720, 18.9844, 21.7812, 50.4844, 33.6406, 36.3438, 37.7031]
    },
    {
     "length": 1728,
     "align1": 0,
     "align2": 0,
     "timings": [20.2812, 738.047, 18.4062, 21.1875, 48.375, 32.5938, 36.8438, 38.7812]
    },
    {
     "length": 1728,
     "align1": 54,
     "align2": 0,
     "timings": [20.6094, 738.234, 19.2969, 36.4375, 49.4375, 67.7969, 66.7188, 81.2656]
    },
    {
     "length": 1728,
     "align1": 0,
     "align2": 54,
     "timings": [22.4375, 737.281, 23.4219, 37.0156, 53.8125, 69.1719, 67.875, 81.9219]
    },
    {
     "length": 1728,
     "align1": 54,
     "align2": 54,
     "timings": [20.8594, 734.531, 19.5156, 21.4531, 51.6875, 33.75, 36.9062, 38.2188]
    },
    {
     "length": 1760,
     "align1": 0,
     "align2": 0,
     "timings": [20.5469, 751.828, 18.625, 21.0156, 48.9062, 32.2656, 37.3281, 39.125]
    },
    {
     "length": 1760,
     "align1": 55,
     "align2": 0,
     "timings": [20.9531, 749.047, 19.375, 36.9688, 50.5312, 68.6094, 67.9688, 82.6406]
    },
    {
     "length": 1760,
     "align1": 0,
     "align2": 55,
     "timings": [22.3125, 756.469, 20.9219, 37.5, 55.7656, 68.6406, 69.3594, 83.4219]
    },
    {
     "length": 1760,
     "align1": 55,
     "align2": 55,
     "timings": [20.4688, 748.219, 19.0156, 21.0938, 68.4688, 34.6562, 37.625, 38.875]
    },
    {
     "length": 1792,
     "align1": 0,
     "align2": 0,
     "timings": [21.4688, 768.312, 19.6562, 22.7031, 45.4688, 32.7031, 37.8281, 39.7188]
    },
    {
     "length": 1792,
     "align1": 56,
     "align2": 0,
     "timings": [21.7656, 765.891, 20.4531, 37.4844, 45.4844, 32.7656, 37.7344, 39.6719]
    },
    {
     "length": 1792,
     "align1": 0,
     "align2": 56,
     "timings": [22.6094, 764.438, 20.8438, 38.6094, 51.3438, 33.0469, 39.9844, 41.8906]
    },
    {
     "length": 1792,
     "align1": 56,
     "align2": 56,
     "timings": [20.875, 766.344, 19.4844, 21.9688, 50.7812, 32.8125, 37.8125, 39.5938]
    },
    {
     "length": 1824,
     "align1": 0,
     "align2": 0,
     "timings": [21.75, 772.656, 19.7656, 22.4688, 50.2344, 33.8281, 38.4844, 40.3438]
    },
    {
     "length": 1824,
     "align1": 57,
     "align2": 0,
     "timings": [21.9844, 777.922, 26.5156, 38.0312, 52.3281, 70.4219, 69.9688, 85.2812]
    },
    {
     "length": 1824,
     "align1": 0,
     "align2": 57,
     "timings": [23.25, 781.812, 21.5938, 39.3281, 54.6875, 71.3125, 71.5, 86.6562]
    },
    {
     "length": 1824,
     "align1": 57,
     "align2": 57,
     "timings": [21.4531, 776.578, 20.2188, 23.3438, 50.7031, 36.2188, 39.3594, 40.9062]
    },
    {
     "length": 1856,
     "align1": 0,
     "align2": 0,
     "timings": [21.4062, 793.719, 19.625, 22.4375, 50.5469, 34.7656, 39.1094, 40.8281]
    },
    {
     "length": 1856,
     "align1": 58,
     "align2": 0,
     "timings": [21.9688, 787.234, 20.3281, 38.5625, 52.4062, 73.5156, 71.25, 86.6875]
    },
    {
     "length": 1856,
     "align1": 0,
     "align2": 58,
     "timings": [23.875, 796.328, 22.1094, 38.7031, 54.8281, 73.0938, 72.5781, 87.7656]
    },
    {
     "length": 1856,
     "align1": 58,
     "align2": 58,
     "timings": [21.8125, 840.25, 20.6406, 22.5, 51.9219, 36.4844, 39.5156, 40.7969]
    },
    {
     "length": 1888,
     "align1": 0,
     "align2": 0,
     "timings": [21.7969, 802.969, 19.7031, 22.1406, 51.2656, 34.4688, 39.5625, 41.375]
    },
    {
     "length": 1888,
     "align1": 59,
     "align2": 0,
     "timings": [27.375, 802.938, 87.6562, 39.2188, 53.6406, 72.9688, 72.2188, 106.266]
    },
    {
     "length": 1888,
     "align1": 0,
     "align2": 59,
     "timings": [23.2969, 811.766, 21.5469, 39.6875, 56.25, 73.2031, 73.3594, 88.8438]
    },
    {
     "length": 1888,
     "align1": 59,
     "align2": 59,
     "timings": [21.5781, 805.219, 20.0156, 22.5469, 51.9062, 37.0781, 39.8594, 41.1719]
    },
    {
     "length": 1920,
     "align1": 0,
     "align2": 0,
     "timings": [22.5625, 818.109, 20.75, 23.8281, 47.7031, 35.25, 40.1875, 41.9688]
    },
    {
     "length": 1920,
     "align1": 60,
     "align2": 0,
     "timings": [23.0625, 819.141, 21.8125, 39.7188, 49.0938, 74.0312, 73.3906, 89.5938]
    },
    {
     "length": 1920,
     "align1": 0,
     "align2": 60,
     "timings": [23.9062, 817.953, 21.9375, 40.3594, 57.1562, 74.75, 75, 90.1406]
    },
    {
     "length": 1920,
     "align1": 60,
     "align2": 60,
     "timings": [21.8125, 816.75, 20.4531, 22.4844, 52.6875, 36.6875, 41.2188, 42.2188]
    },
    {
     "length": 1952,
     "align1": 0,
     "align2": 0,
     "timings": [22.75, 824.703, 21.0625, 23.6562, 52.8125, 36.1406, 40.5781, 42.5312]
    },
    {
     "length": 1952,
     "align1": 61,
     "align2": 0,
     "timings": [23.2031, 832.562, 27.4844, 40.2969, 54.4219, 75.5625, 74.5625, 91.3281]
    },
    {
     "length": 1952,
     "align1": 0,
     "align2": 61,
     "timings": [24.4219, 835.703, 22.9219, 40.6094, 56.7812, 76.7031, 75.4688, 90.6875]
    },
    {
     "length": 1952,
     "align1": 61,
     "align2": 61,
     "timings": [22.5312, 834.641, 21.2812, 24.0469, 52.7812, 38.1562, 40.7812, 41.9844]
    },
    {
     "length": 1984,
     "align1": 0,
     "align2": 0,
     "timings": [22.4688, 840.312, 20.6562, 23.5938, 53.1094, 36.9531, 41.1875, 43.1094]
    },
    {
     "length": 1984,
     "align1": 62,
     "align2": 0,
     "timings": [23.0469, 832.453, 21.6562, 40.7812, 54.7812, 76.3906, 75.4062, 91.5312]
    },
    {
     "length": 1984,
     "align1": 0,
     "align2": 62,
     "timings": [24.9375, 853.438, 23.375, 40.7969, 57.6875, 78.0469, 76.5938, 92.9844]
    },
    {
     "length": 1984,
     "align1": 62,
     "align2": 62,
     "timings": [23.1719, 846.984, 21.6406, 23.5469, 54.0938, 38.125, 41.4844, 42.625]
    },
    {
     "length": 2016,
     "align1": 0,
     "align2": 0,
     "timings": [22.7031, 855.812, 20.8594, 23.3594, 53.5625, 36.7188, 41.8594, 43.75]
    },
    {
     "length": 2016,
     "align1": 63,
     "align2": 0,
     "timings": [28.2812, 856.766, 21.6406, 41.4219, 56.25, 76.9062, 76.7031, 93.5156]
    },
    {
     "length": 2016,
     "align1": 0,
     "align2": 63,
     "timings": [24.2969, 861.422, 22.9219, 41.4688, 58.875, 77.1562, 77.7812, 94.75]
    },
    {
     "length": 2016,
     "align1": 63,
     "align2": 63,
     "timings": [22.7656, 853.516, 21.2188, 23.7812, 54.7031, 38.7812, 42, 43.1875]
    },
    {
     "length": 65536,
     "align1": 0,
     "align2": 0,
     "timings": [609.469, 30388.2, 604.344, 637.438, 1236.55, 1157.64, 1169.56, 1231.25]
    }]
  }
 }
}

* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-10-03 18:29             ` Adhemerval Zanella
@ 2017-10-05 12:13               ` Rajalakshmi Srinivasaraghavan
  2017-11-08 18:52               ` Tulio Magno Quites Machado Filho
  1 sibling, 0 replies; 31+ messages in thread
From: Rajalakshmi Srinivasaraghavan @ 2017-10-05 12:13 UTC (permalink / raw)
  To: Adhemerval Zanella, Tulio Magno Quites Machado Filho; +Cc: libc-alpha



On 10/03/2017 11:59 PM, Adhemerval Zanella wrote:
> I think one way to provide a slightly better memcpy implementation for POWER8
> and still be able to circumvent the non-aligned access problem on
> non-cacheable memory is to use tunables.
> 
> The branch azanella/memcpy-power8 [1] has a power8 memcpy optimization which
> uses unaligned loads and stores that I created some time ago but never actually
> sent upstream.  It shows better performance on both bench-memcpy and
> bench-memcpy-random (about 10% on the latter) and mixed results on
> bench-memcpy-large (which is mainly dominated by memory throughput, and on the
> environment I am using, a shared PowerKVM instance, the results do not seem to
> be reliable).
> 
> It could use some tuning, especially on the ranges I used for unrolling
> the loads/stores, and it also does not handle unaligned accesses that cross a
> page boundary (which tend to be quite slow on current hardware, but are also
> uncommon with the usual current page size of 64k).
> 
> This first patch does not enable this option as a default for POWER8, it just
> adds it to the string tests as an option.  The second patch changes the
> selection to:
> 
>    1. If glibc is configured with tunables, set the new implementation as the
>       default for ISA 2.07 (power8).
> 
>    2. Also if tunables are active, add the parameter glibc.tune.aligned_memopt
>       to disable the new implementation selection.
> 
> So programs that rely on aligned loads can set:
> 
> GLIBC_TUNABLES=glibc.tune.aligned_memopt=1
> 
> And then the memcpy ifunc selection would pick the power7 one which uses
> only aligned loads and stores.
> 
> This is an RFC patch and if the idea sounds good to the powerpc arch
> maintainers I can work on finishing the patch with more comments and send it
> upstream.  I tried to apply the same unaligned idea to memset and memmove, but
> I could not get any real improvement in either.
> 
> [1]https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/memcpy-power8

Thanks for sharing the patches. At this point we are also working on
memcpy for power8 with a different approach and we are planning
to post it soon. We can choose the better performing version and
use your tunables patch too.

-- 
Thanks
Rajalakshmi S

* Re: [PATCH] powerpc: Use aligned stores in memset
  2017-10-03 18:29             ` Adhemerval Zanella
  2017-10-05 12:13               ` Rajalakshmi Srinivasaraghavan
@ 2017-11-08 18:52               ` Tulio Magno Quites Machado Filho
  2017-12-08 19:52                 ` [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory Tulio Magno Quites Machado Filho
  1 sibling, 1 reply; 31+ messages in thread
From: Tulio Magno Quites Machado Filho @ 2017-11-08 18:52 UTC (permalink / raw)
  To: Adhemerval Zanella, Rajalakshmi Srinivasaraghavan
  Cc: libc-alpha, Florian Weimer

Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

> I think one way to provide a slightly better memcpy implementation for POWER8
> and still be able to circumvent the non-aligned access problem on
> non-cacheable memory is to use tunables.
>
> The branch azanella/memcpy-power8 [1] has a power8 memcpy optimization which
> uses unaligned loads and stores that I created some time ago but never actually
> sent upstream.  It shows better performance on both bench-memcpy and
> bench-memcpy-random (about 10% on the latter) and mixed results on
> bench-memcpy-large (which is mainly dominated by memory throughput, and on the
> environment I am using, a shared PowerKVM instance, the results do not seem to
> be reliable).
>
> It could use some tuning, especially on the ranges I used for unrolling
> the loads/stores, and it also does not handle unaligned accesses that cross a
> page boundary (which tend to be quite slow on current hardware, but are also
> uncommon with the usual current page size of 64k).
>
> This first patch does not enable this option as a default for POWER8, it just
> adds it to the string tests as an option.  The second patch changes the
> selection to:
>
>   1. If glibc is configured with tunables, set the new implementation as the
>      default for ISA 2.07 (power8).
>
>   2. Also if tunables are active, add the parameter glibc.tune.aligned_memopt
>      to disable the new implementation selection.

I think it would be safer if we don't change the default behavior.
IMHO, programs that want a performance improvement would have to set a
tunable.

In other words, the new implementation would be disabled by default.
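
A minimal sketch of that opt-in selection, written in plain C rather than
with glibc's real ifunc machinery.  The function select_memcpy and its two
parameters are hypothetical names used only for illustration, and the sketch
collapses every other variant into __memcpy_power7:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the ifunc resolver.  Real glibc returns
   function pointers through its ifunc macros, not strings.  */
const char *
select_memcpy (bool has_arch_2_07, bool cached_memopt)
{
  /* Opt-in: the unaligned-access variant is chosen only on ISA 2.07
     hardware and only when the user explicitly set the tunable.  */
  if (has_arch_2_07 && cached_memopt)
    return "__memcpy_power8_cached";

  /* Default: keep the variant that uses only aligned loads and stores.  */
  return "__memcpy_power7";
}
```

With this shape, a POWER8 system without the tunable keeps today's
behavior, which is the point of leaving the new code disabled by default.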

> So programs that rely on aligned loads can set:
>
> GLIBC_TUNABLES=glibc.tune.aligned_memopt=1

I also think that we should not expose internal details of the implementation
to users, i.e. we should avoid using aligned/unaligned in the name of the
function and in the tunables.
I think that glibc.tune.cached_memopt=1 better conveys the optimal
use-case scenario of this implementation.
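
For illustration, a program could check whether that setting was requested
in its environment with a sketch like the one below.  It relies only on the
documented GLIBC_TUNABLES syntax (a colon-separated list of name=value
pairs); tunable_is_set is a hypothetical helper, not a glibc API, and
glibc's own parser is more involved:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if TUNABLES (the value of the
   GLIBC_TUNABLES variable, "name=value:name=value...") contains the
   exact entry NAME=VALUE.  */
bool
tunable_is_set (const char *tunables, const char *name, const char *value)
{
  if (tunables == NULL)
    return false;

  size_t name_len = strlen (name);
  size_t value_len = strlen (value);

  for (const char *p = tunables; *p != '\0'; )
    {
      /* Length of the current colon-separated entry.  */
      size_t entry_len = strcspn (p, ":");

      if (entry_len == name_len + 1 + value_len
          && strncmp (p, name, name_len) == 0
          && p[name_len] == '='
          && strncmp (p + name_len + 1, value, value_len) == 0)
        return true;

      /* Skip the entry and the separator that follows it, if any.  */
      p += entry_len;
      if (*p == ':')
        p++;
    }
  return false;
}
```

A caller would pass getenv ("GLIBC_TUNABLES") as the first argument.  This
only reports what the user asked for; it says nothing about which memcpy
the dynamic loader actually selected.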

> This is an RFC patch and if the idea sounds good to the powerpc arch
> maintainers I can work on finishing the patch with more comments and send it
> upstream.  I tried to apply the same unaligned idea to memset and memmove, but
> I could not get any real improvement in either.

I like the idea.  Could you merge both patches and send it to libc-alpha,
please?

> [1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/memcpy-power8
> [ bench-memcpy-random.out: text/plain ]
> {
>  "timing_type": "hp_timing",
>  "functions": {
>   "memcpy": {
>    "bench-variant": "random",
>    "ifuncs": ["__memcpy_power8",

I also suggest giving it a more specific name, e.g. __memcpy_power8_cached.
That would make room for a POWER8 implementation that uses only naturally
aligned loads/stores.

Your implementation uses lxvd2x and stxvd2x, which should be avoided in a
cache-inhibited scenario, i.e. glibc.tune.aligned_memopt=0.

However, after changing the tunable's name to glibc.tune.cached_memopt, I think
these instructions could stay, provided they're executed only when
glibc.tune.cached_memopt=1.

Thanks!!!

-- 
Tulio Magno

* [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory
  2017-11-08 18:52               ` Tulio Magno Quites Machado Filho
@ 2017-12-08 19:52                 ` Tulio Magno Quites Machado Filho
  2017-12-08 20:06                   ` Florian Weimer
  2017-12-10  7:11                   ` Rajalakshmi Srinivasaraghavan
  0 siblings, 2 replies; 31+ messages in thread
From: Tulio Magno Quites Machado Filho @ 2017-12-08 19:52 UTC (permalink / raw)
  To: libc-alpha; +Cc: Adhemerval Zanella, Rajalakshmi Srinivasaraghavan

From: Adhemerval Zanella <azanella@linux.vnet.ibm.com>

I made the changes I requested, updated copyright entries, added a
manual entry and fixed a build issue on powerpc64.

--- 8< ---

On POWER8, unaligned memory accesses to cached memory have little impact
on performance, unlike on its predecessors.

This memcpy optimization is disabled by default and will only be available
when the tunable glibc.tune.cached_memopt is set to 1.

                 __memcpy_power8_cached      __memcpy_power7
============================================================
    max-size=4096:     33325.70 ( 12.65%)        38153.00
    max-size=8192:     32878.20 ( 11.17%)        37012.30
   max-size=16384:     33782.20 ( 11.61%)        38219.20
   max-size=32768:     33296.20 ( 11.30%)        37538.30
   max-size=65536:     33765.60 ( 10.53%)        37738.40
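
The percentages in parentheses are consistent with the relative reduction
of the first column against the second, (power7 - power8_cached) / power7.
A quick way to check this in C, where improvement_pct is a hypothetical
helper introduced only for this illustration:

```c
/* Relative reduction of NEW_VALUE versus BASE, in percent.  */
double
improvement_pct (double base, double new_value)
{
  return (base - new_value) / base * 100.0;
}
```

For the max-size=4096 row, improvement_pct (38153.00, 33325.70) evaluates
to roughly 12.65, matching the table.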

2017-12-08  Adhemerval Zanella  <azanella@linux.vnet.ibm.com>
	    Tulio Magno Quites Machado Filho  <tuliom@linux.vnet.ibm.com>

	* manual/tunables.texi (Hardware Capability Tunables): Document
	glibc.tune.cached_memopt.
	* sysdeps/powerpc/cpu-features.c: New file.
	* sysdeps/powerpc/cpu-features.h: New file.
	* sysdeps/powerpc/dl-procinfo.c [!IS_IN(ldconfig)]: Add
	_dl_powerpc_cpu_features.
	* sysdeps/powerpc/dl-tunables.list: New file.
	* sysdeps/powerpc/ldsodefs.h: Include cpu-features.h.
	* sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h: .
	* sysdeps/powerpc/powerpc64/dl-machine.h (INIT_ARCH): Initialize
	use_aligned_memopt.
	* sysdeps/powerpc/powerpc64/multiarch/Makefile (sysdep_routines):
	Add memcpy-power8-cached.
	* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Add
	__memcpy_power8_cached.
	* sysdeps/powerpc/powerpc64/multiarch/memcpy.c: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S:
	New file.
---
 manual/tunables.texi                               |   7 +
 sysdeps/powerpc/cpu-features.c                     |  39 +++++
 sysdeps/powerpc/cpu-features.h                     |  28 ++++
 sysdeps/powerpc/dl-procinfo.c                      |  16 ++
 sysdeps/powerpc/dl-tunables.list                   |  28 ++++
 sysdeps/powerpc/ldsodefs.h                         |   1 +
 .../powerpc/powerpc32/power4/multiarch/init-arch.h |   2 +
 sysdeps/powerpc/powerpc64/dl-machine.h             |   4 +-
 sysdeps/powerpc/powerpc64/multiarch/Makefile       |   4 +-
 .../powerpc/powerpc64/multiarch/ifunc-impl-list.c  |   2 +
 .../powerpc64/multiarch/memcpy-power8-cached.S     | 179 +++++++++++++++++++++
 sysdeps/powerpc/powerpc64/multiarch/memcpy.c       |  23 +--
 12 files changed, 320 insertions(+), 13 deletions(-)
 create mode 100644 sysdeps/powerpc/cpu-features.c
 create mode 100644 sysdeps/powerpc/cpu-features.h
 create mode 100644 sysdeps/powerpc/dl-tunables.list
 create mode 100644 sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S

diff --git a/manual/tunables.texi b/manual/tunables.texi
index e851b95..17ceb64 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -319,6 +319,13 @@ the ones in @code{sysdeps/x86/cpu-features.h}.
 This tunable is specific to i386 and x86-64.
 @end deftp
 
+@deftp Tunable glibc.tune.cached_memopt
+The @code{glibc.tune.cached_memopt=[0|1]} tunable allows the user to enable
+optimizations recommended to cacheable memory.
+
+This tunable is specific to powerpc, powerpc64 and powerpc64le.
+@end deftp
+
 @deftp Tunable glibc.tune.cpu
 The @code{glibc.tune.cpu=xxx} tunable allows the user to tell @theglibc{} to
 assume that the CPU is @code{xxx} where xxx may have one of these values:
diff --git a/sysdeps/powerpc/cpu-features.c b/sysdeps/powerpc/cpu-features.c
new file mode 100644
index 0000000..6870582
--- /dev/null
+++ b/sysdeps/powerpc/cpu-features.c
@@ -0,0 +1,39 @@
+/* Initialize cpu feature data.  PowerPC version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdint.h>
+#include <cpu-features.h>
+
+#if HAVE_TUNABLES
+# include <elf/dl-tunables.h>
+#endif
+
+static inline void
+init_cpu_features (struct cpu_features *cpu_features)
+{
+  /* Default to aligned memory accesses in the optimized functions.  When
+     tunables are enabled, the user can explicitly opt in to the
+     cached-memory (unaligned) optimizations.  */
+#if HAVE_TUNABLES
+  int32_t cached_memfunc = TUNABLE_GET (glibc, tune, cached_memopt, int32_t,
+					NULL);
+  cpu_features->use_cached_memopt = (cached_memfunc > 0);
+#else
+  cpu_features->use_cached_memopt = false;
+#endif
+}
diff --git a/sysdeps/powerpc/cpu-features.h b/sysdeps/powerpc/cpu-features.h
new file mode 100644
index 0000000..36a8bb4
--- /dev/null
+++ b/sysdeps/powerpc/cpu-features.h
@@ -0,0 +1,28 @@
+/* Initialize cpu feature data.  PowerPC version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef __CPU_FEATURES_POWERPC_H
+# define __CPU_FEATURES_POWERPC_H
+
+#include <stdbool.h>
+
+struct cpu_features
+{
+  bool use_cached_memopt;
+};
+
+#endif /* __CPU_FEATURES_POWERPC_H  */
diff --git a/sysdeps/powerpc/dl-procinfo.c b/sysdeps/powerpc/dl-procinfo.c
index 55a6e78..c8b14454d 100644
--- a/sysdeps/powerpc/dl-procinfo.c
+++ b/sysdeps/powerpc/dl-procinfo.c
@@ -42,6 +42,22 @@
 # define PROCINFO_CLASS
 #endif
 
+#if !IS_IN (ldconfig)
+# if !defined PROCINFO_DECL && defined SHARED
+  ._dl_powerpc_cpu_features
+# else
+PROCINFO_CLASS struct cpu_features _dl_powerpc_cpu_features
+# endif
+# ifndef PROCINFO_DECL
+= { }
+# endif
+# if !defined SHARED || defined PROCINFO_DECL
+;
+# else
+,
+# endif
+#endif
+
 #if !defined PROCINFO_DECL && defined SHARED
   ._dl_powerpc_cap_flags
 #else
diff --git a/sysdeps/powerpc/dl-tunables.list b/sysdeps/powerpc/dl-tunables.list
new file mode 100644
index 0000000..9e14b9a
--- /dev/null
+++ b/sysdeps/powerpc/dl-tunables.list
@@ -0,0 +1,28 @@
+# powerpc specific tunables.
+# Copyright (C) 2017 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+glibc {
+  tune {
+    cached_memopt {
+      type: INT_32
+      minval: 0
+      maxval: 1
+      default: 0
+    }
+  }
+}
diff --git a/sysdeps/powerpc/ldsodefs.h b/sysdeps/powerpc/ldsodefs.h
index 466de79..6f8b3a2 100644
--- a/sysdeps/powerpc/ldsodefs.h
+++ b/sysdeps/powerpc/ldsodefs.h
@@ -20,6 +20,7 @@
 #define	_POWERPC_LDSODEFS_H	1
 
 #include <elf.h>
+#include <cpu-features.h>
 
 struct La_ppc32_regs;
 struct La_ppc32_retval;
diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h b/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h
index f2e6a4b..6038941 100644
--- a/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h
+++ b/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h
@@ -37,6 +37,8 @@
 #define INIT_ARCH() \
   unsigned long int hwcap = __GLRO(dl_hwcap); 			\
   unsigned long int __attribute__((unused)) hwcap2 = __GLRO(dl_hwcap2); \
+  bool __attribute__((unused)) use_cached_memopt =		\
+    GLRO(dl_powerpc_cpu_features).use_cached_memopt;		\
   if (hwcap & PPC_FEATURE_ARCH_2_06)				\
     hwcap |= PPC_FEATURE_ARCH_2_05 |				\
 	     PPC_FEATURE_POWER5_PLUS |				\
diff --git a/sysdeps/powerpc/powerpc64/dl-machine.h b/sysdeps/powerpc/powerpc64/dl-machine.h
index aeb91b8..76dceee 100644
--- a/sysdeps/powerpc/powerpc64/dl-machine.h
+++ b/sysdeps/powerpc/powerpc64/dl-machine.h
@@ -27,6 +27,7 @@
 #include <dl-tls.h>
 #include <sysdep.h>
 #include <hwcapinfo.h>
+#include <cpu-features.c>
 
 /* Translate a processor specific dynamic tag to the index
    in l_info array.  */
@@ -300,13 +301,14 @@ BODY_PREFIX "_dl_start_user:\n"						\
 /* We define an initialization function to initialize HWCAP/HWCAP2 and
    platform data so it can be copied into the TCB later.  This is called
    very early in _dl_sysdep_start for dynamically linked binaries.  */
-#ifdef SHARED
+#if defined(SHARED) && IS_IN (rtld)
 # define DL_PLATFORM_INIT dl_platform_init ()
 
 static inline void __attribute__ ((unused))
 dl_platform_init (void)
 {
   __tcb_parse_hwcap_and_convert_at_platform ();
+  init_cpu_features (&GLRO(dl_powerpc_cpu_features));
 }
 #endif
 
diff --git a/sysdeps/powerpc/powerpc64/multiarch/Makefile b/sysdeps/powerpc/powerpc64/multiarch/Makefile
index dea49ac..4df6b45 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc64/multiarch/Makefile
@@ -1,6 +1,6 @@
 ifeq ($(subdir),string)
-sysdep_routines += memcpy-power7 memcpy-a2 memcpy-power6 memcpy-cell \
-		   memcpy-power4 memcpy-ppc64 \
+sysdep_routines += memcpy-power8-cached memcpy-power7 memcpy-a2 memcpy-power6 \
+		   memcpy-cell memcpy-power4 memcpy-ppc64 \
 		   memcmp-power8 memcmp-power7 memcmp-power4 memcmp-ppc64 \
 		   memset-power7 memset-power6 memset-power4 \
 		   memset-ppc64 memset-power8 \
diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
index 6a88536..77a60ea 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
@@ -51,6 +51,8 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 #ifdef SHARED
   /* Support sysdeps/powerpc/powerpc64/multiarch/memcpy.c.  */
   IFUNC_IMPL (i, name, memcpy,
+	      IFUNC_IMPL_ADD (array, i, memcpy, hwcap2 & PPC_FEATURE2_ARCH_2_07,
+			      __memcpy_power8_cached)
 	      IFUNC_IMPL_ADD (array, i, memcpy, hwcap & PPC_FEATURE_HAS_VSX,
 			      __memcpy_power7)
 	      IFUNC_IMPL_ADD (array, i, memcpy, hwcap & PPC_FEATURE_ARCH_2_06,
diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S
new file mode 100644
index 0000000..e5b6f25
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S
@@ -0,0 +1,179 @@
+/* Optimized memcpy implementation for cached memory on PowerPC64/POWER8.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+
+/* __ptr_t [r3] memcpy (__ptr_t dst [r3], __ptr_t src [r4], size_t len [r5]);
+   Returns 'dst'.  */
+
+	.machine power8
+ENTRY_TOCLESS (__memcpy_power8_cached, 5)
+	CALL_MCOUNT 3
+
+	cmpldi	cr7,r5,15
+	bgt	cr7,L(ge_16)
+	andi.	r9,r5,0x1
+	mr	r9,r3
+	beq	cr0,1f
+	lbz	r10,0(r4)
+	addi	r9,r3,1
+	addi	r4,r4,1
+	stb	r10,0(r3)
+1:
+	andi.	r10,r5,0x2
+	beq	cr0,2f
+	lhz	r10,0(r4)
+	addi	r9,r9,2
+	addi	r4,r4,2
+	sth	r10,-2(r9)
+2:
+	andi.	r10,r5,0x4
+	beq	cr0,3f
+	lwz	r10,0(r4)
+	addi	r9,9,4
+	addi	r4,4,4
+	stw	r10,-4(r9)
+3:
+	andi.	r10,r5,0x8
+	beqlr	cr0
+	ld	r10,0(r4)
+	std	r10,0(r9)
+	blr
+
+	.align 4
+L(ge_16):
+	cmpldi	cr7,r5,32
+	ble	cr7,L(ge_16_le_32)
+	cmpldi	cr7,r5,64
+	ble	cr7,L(gt_32_le_64)
+
+	/* Align dst to 16 bytes.  */
+	andi.	r9,r3,0xf
+	mr	r12,r3
+	beq	cr0,L(dst_is_align_16)
+	lxvd2x	v0,r0,r4
+	subfic	r12,r9,16
+	subf	r5,r12,r5
+	add	r4,r4,r12
+	add	r12,r3,r12
+	stxvd2x	v0,r0,r3
+L(dst_is_align_16):
+	cmpldi	cr7,r5,127
+	ble	cr7,L(tail_copy)
+	addi	r8,r5,-128
+	mr	r9,r12
+	rldicr	r8,r8,0,56
+	li	r11,16
+	srdi	r10,r8,7
+	addi	r0,r8,128
+	addi	r10,r10,1
+	li	r6,32
+	mtctr	r10
+	li	r7,48
+
+	/* Main loop, copy 128 bytes each time.  */
+	.align 4
+L(copy_128):
+	lxvd2x	v10,r0,r4
+	lxvd2x	v11,r4,r11
+	addi	r8,r4,64
+	addi	r10,r9,64
+	lxvd2x	v12,r4,r6
+	lxvd2x	v0,r4,r7
+	addi	r4,r4,128
+	stxvd2x v10,r0,r9
+	stxvd2x v11,r9,r11
+	stxvd2x v12,r9,r6
+	stxvd2x v0,r9,r7
+	addi	r9,r9,128
+	lxvd2x	v10,r0,r8
+	lxvd2x	v11,r8,r11
+	lxvd2x	v12,r8,r6
+	lxvd2x	v0,r8,r7
+	stxvd2x v10,r0,r10
+	stxvd2x v11,r10,r11
+	stxvd2x v12,r10,r6
+	stxvd2x v0,r10,r7
+	bdnz	L(copy_128)
+
+	add	r12,r12,r0
+	rldicl 	r5,r5,0,57
+L(tail_copy):
+	cmpldi	cr7,r5,63
+	ble	cr7,L(tail_le_64)
+	li	r8,16
+	li	r10,32
+	lxvd2x	v10,r0,r4
+	li	r9,48
+	addi	r5,r5,-64
+	lxvd2x	v11,r4,r8
+	lxvd2x	v12,r4,r10
+	lxvd2x	v0,r4,r9
+	addi	r4,r4,64
+	stxvd2x	v10,r0,r12
+	stxvd2x	v11,r12,r8
+	stxvd2x	v12,r12,r10
+	stxvd2x	v0,r12,9
+	addi	r12,r12,64
+
+L(tail_le_64):
+	cmpldi	cr7,r5,32
+	bgt	cr7,L(tail_gt_32_le_64)
+	cmpdi	cr7,r5,0
+	beqlr	cr7
+	addi	r5,r5,-32
+	li	r9,16
+	add	r8,r4,r5
+	add	r10,r12,r5
+	lxvd2x	v12,r4,r5
+	lxvd2x	v0,r8,r9
+	stxvd2x	v12,r12,r5
+	stxvd2x	v0,r10,r9
+	blr
+
+	.align 4
+L(ge_16_le_32):
+	addi	r5,r5,-16
+	lxvd2x	v0,r0,r4
+	lxvd2x	v1,r4,r5
+	stxvd2x	v0,r0,r3
+	stxvd2x	v1,r3,r5
+	blr
+
+	.align 4
+L(gt_32_le_64):
+	mr	r12,r3
+
+	.align 4
+L(tail_gt_32_le_64):
+	li	r9,16
+	lxvd2x	v0,r0,r4
+	addi	r5,r5,-32
+	lxvd2x	v1,r4,r9
+	add	r8,r4,r5
+	lxvd2x	v2,r4,r5
+	add	r10,r12,r5
+	lxvd2x	v3,r8,r9
+	stxvd2x	v0,r0,r12
+	stxvd2x	v1,r12,r9
+	stxvd2x	v2,r12,r5
+	stxvd2x	v3,r10,r9
+	blr
+
+END_GEN_TB (__memcpy_power8_cached,TB_TOCLESS)
diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcpy.c b/sysdeps/powerpc/powerpc64/multiarch/memcpy.c
index 9f4286c..fb49fe1 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/memcpy.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/memcpy.c
@@ -35,18 +35,21 @@ extern __typeof (__redirect_memcpy) __memcpy_cell attribute_hidden;
 extern __typeof (__redirect_memcpy) __memcpy_power6 attribute_hidden;
 extern __typeof (__redirect_memcpy) __memcpy_a2 attribute_hidden;
 extern __typeof (__redirect_memcpy) __memcpy_power7 attribute_hidden;
+extern __typeof (__redirect_memcpy) __memcpy_power8_cached attribute_hidden;
 
 libc_ifunc (__libc_memcpy,
-            (hwcap & PPC_FEATURE_HAS_VSX)
-            ? __memcpy_power7 :
-	      (hwcap & PPC_FEATURE_ARCH_2_06)
-	      ? __memcpy_a2 :
-		(hwcap & PPC_FEATURE_ARCH_2_05)
-		? __memcpy_power6 :
-		  (hwcap & PPC_FEATURE_CELL_BE)
-		  ? __memcpy_cell :
-		    (hwcap & PPC_FEATURE_POWER4)
-		    ? __memcpy_power4
+	    ((hwcap2 & PPC_FEATURE2_ARCH_2_07) && use_cached_memopt)
+	    ? __memcpy_power8_cached :
+	      (hwcap & PPC_FEATURE_HAS_VSX)
+	      ? __memcpy_power7 :
+		(hwcap & PPC_FEATURE_ARCH_2_06)
+		? __memcpy_a2 :
+		  (hwcap & PPC_FEATURE_ARCH_2_05)
+		  ? __memcpy_power6 :
+		    (hwcap & PPC_FEATURE_CELL_BE)
+		    ? __memcpy_cell :
+		      (hwcap & PPC_FEATURE_POWER4)
+		      ? __memcpy_power4
             : __memcpy_ppc);
 
 #undef memcpy
-- 
2.9.5


* Re: [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory
  2017-12-08 19:52                 ` [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory Tulio Magno Quites Machado Filho
@ 2017-12-08 20:06                   ` Florian Weimer
  2017-12-11 12:44                     ` Tulio Magno Quites Machado Filho
  2017-12-10  7:11                   ` Rajalakshmi Srinivasaraghavan
  1 sibling, 1 reply; 31+ messages in thread
From: Florian Weimer @ 2017-12-08 20:06 UTC (permalink / raw)
  To: Tulio Magno Quites Machado Filho, libc-alpha
  Cc: Adhemerval Zanella, Rajalakshmi Srinivasaraghavan

On 12/08/2017 08:40 PM, Tulio Magno Quites Machado Filho wrote:
> +@deftp Tunable glibc.tune.cached_memopt
> +The @code{glibc.tune.cached_memopt=[0|1]} tunable allows the user to enable
> +optimizations recommended to cacheable memory.
> +
> +This tunable is specific to powerpc, powerpc64 and powerpc64le.
> +@end deftp

I think this has a slight grammar problem.

What about this instead?

The @code{glibc.tune.cached_memopt=[0|1]} tunable allows the user to 
enable optimizations recommended for cacheable memory.  If set to 
@code{1}, @theglibc{} assumes that the process memory image consists of 
cacheable (non-device) memory only.  The default, @code{0}, indicates 
that the process may use device memory.

(I think it's best not to mention string functions here because it is 
impossible to describe how glibc.tune.cached_memopt affects them due to 
compiler optimizations.)

Thanks,
Florian


* Re: [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory
  2017-12-08 19:52                 ` [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory Tulio Magno Quites Machado Filho
  2017-12-08 20:06                   ` Florian Weimer
@ 2017-12-10  7:11                   ` Rajalakshmi Srinivasaraghavan
  2017-12-11 19:48                     ` Tulio Magno Quites Machado Filho
  1 sibling, 1 reply; 31+ messages in thread
From: Rajalakshmi Srinivasaraghavan @ 2017-12-10  7:11 UTC (permalink / raw)
  To: libc-alpha



On 12/09/2017 01:10 AM, Tulio Magno Quites Machado Filho wrote:
> From: Adhemerval Zanella<azanella@linux.vnet.ibm.com>
> 
> I made the changes I requested, updated copyright entries, added a
> manual entry and fixed a build issue on powerpc64.
> 
> --- 8< ---
> 
> On POWER8, unaligned memory accesses to cached memory has little impact
> on performance as opposed to its ancestors.
> 
> It is disabled by default and will only be available when the tunable
> glibc.tune.cached_memopt is set to 1.
> 
>                   __memcpy_power8_cached      __memcpy_power7
> ============================================================
>      max-size=4096:     33325.70 ( 12.65%)        38153.00
>      max-size=8192:     32878.20 ( 11.17%)        37012.30
>     max-size=16384:     33782.20 ( 11.61%)        38219.20
>     max-size=32768:     33296.20 ( 11.30%)        37538.30
>     max-size=65536:     33765.60 ( 10.53%)        37738.40
> 
> 2017-12-08  Adhemerval Zanella<azanella@linux.vnet.ibm.com>
> 	    Tulio Magno Quites Machado Filho<tuliom@linux.vnet.ibm.com>
> 
> 	* manual/tunables.texi (Hardware Capability Tunables): Document
> 	glibc.tune.cached_memopt.
> 	* sysdeps/powerpc/cpu-features.c: New file.
> 	* sysdeps/powerpc/cpu-features.h: New file.
> 	* sysdeps/powerpc/dl-procinfo.c [!IS_IN(ldconfig)]: Add
> 	_dl_powerpc_cpu_features.
> 	* sysdeps/powerpc/dl-tunables.list: New file.
> 	* sysdeps/powerpc/ldsodefs.h: Include cpu-features.h.
> 	* sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h: .

Comment missing.
> 	* sysdeps/powerpc/powerpc64/dl-machine.h (INIT_ARCH): Initialize
> 	use_aligned_memopt.

Should this be moved to init-arch.h? (also use_cached_memopt)
> 	* sysdeps/powerpc/powerpc64/multiarch/Makefile (sysdep_routines):
> 	Add memcpy-power8-cached.
> 	* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Add
> 	__memcpy_power8_cached.
> 	* sysdeps/powerpc/powerpc64/multiarch/memcpy.c: Likewise.
> 	* sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S:
> 	New file.
> ---
> diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S
> new file mode 100644
> index 0000000..e5b6f25
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S
> @@ -0,0 +1,179 @@
> +	stxvd2x	v0,r0,r3
> +L(dst_is_align_16):
> +	cmpldi	cr7,r5,127
> +	ble	cr7,L(tail_copy)
> +	addi	r8,r5,-128
> +	mr	r9,r12
> +	rldicr	r8,r8,0,56
> +	li	r11,16
> +	srdi	r10,r8,7
> +	addi	r0,r8,128
> +	addi	r10,r10,1

Can we directly do
	rldicr  r0, r5, 0, 56
	srdi    r10,r5,7
instead of this sequence?
	addi    r8,r5,-128
	rldicr  r8,r8,0,56
	srdi    r10,r8,7
	addi    r0,r8,128
	addi    r10,r10,1

> +	li	r6,32
> +	mtctr	r10
> +	li	r7,48
> +
> +	/* Main loop, copy 128 bytes each time.  */

LGTM.

Reviewed-by: Rajalakshmi Srinivasaraghavan  <raji@linux.vnet.ibm.com>


-- 
Thanks
Rajalakshmi S


* Re: [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory
  2017-12-08 20:06                   ` Florian Weimer
@ 2017-12-11 12:44                     ` Tulio Magno Quites Machado Filho
  2017-12-11 20:09                       ` Adhemerval Zanella
  0 siblings, 1 reply; 31+ messages in thread
From: Tulio Magno Quites Machado Filho @ 2017-12-11 12:44 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha
  Cc: Adhemerval Zanella, Rajalakshmi Srinivasaraghavan

Florian Weimer <fweimer@redhat.com> writes:

> On 12/08/2017 08:40 PM, Tulio Magno Quites Machado Filho wrote:
>> +@deftp Tunable glibc.tune.cached_memopt
>> +The @code{glibc.tune.cached_memopt=[0|1]} tunable allows the user to enable
>> +optimizations recommended to cacheable memory.
>> +
>> +This tunable is specific to powerpc, powerpc64 and powerpc64le.
>> +@end deftp
>
> I think this has a slight grammar problem.
>
> What about this instead?
>
> The @code{glibc.tune.cached_memopt=[0|1]} tunable allows the user to 
> enable optimizations recommended for cacheable memory.  If set to 
> @code{1}, @theglibc{} assumes that the process memory image consists of 
> cacheable (non-device) memory only.  The default, @code{0}, indicates 
> that the process may use device memory.

And a much better description.

> (I think it's best not to mention string functions here because it is 
> impossible to describe how glibc.tune.cached_memopt affects them due to 
> compiler optimizations.)

Ack.

Fixed locally.

Thanks!

-- 
Tulio Magno


* Re: [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory
  2017-12-10  7:11                   ` Rajalakshmi Srinivasaraghavan
@ 2017-12-11 19:48                     ` Tulio Magno Quites Machado Filho
  0 siblings, 0 replies; 31+ messages in thread
From: Tulio Magno Quites Machado Filho @ 2017-12-11 19:48 UTC (permalink / raw)
  To: Rajalakshmi Srinivasaraghavan, libc-alpha

Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com> writes:

> On 12/09/2017 01:10 AM, Tulio Magno Quites Machado Filho wrote:
>> 	* manual/tunables.texi (Hardware Capability Tunables): Document
>> 	glibc.tune.cached_memopt.
>> 	* sysdeps/powerpc/cpu-features.c: New file.
>> 	* sysdeps/powerpc/cpu-features.h: New file.
>> 	* sysdeps/powerpc/dl-procinfo.c [!IS_IN(ldconfig)]: Add
>> 	_dl_powerpc_cpu_features.
>> 	* sysdeps/powerpc/dl-tunables.list: New file.
>> 	* sysdeps/powerpc/ldsodefs.h: Include cpu-features.h.
>> 	* sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h: .
>
> Comment missing.

Ooops.

>> 	* sysdeps/powerpc/powerpc64/dl-machine.h (INIT_ARCH): Initialize
>> 	use_aligned_memopt.
>
> Should this be moved to init-arch.h? (also use_cached_memopt)

Indeed.
Changed to:

	* sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h
	(INIT_ARCH): Initialize use_aligned_memopt.
	* sysdeps/powerpc/powerpc64/dl-machine.h [defined(SHARED) &&
	IS_IN (rtld)]: Restrict dl_platform_init availability and
	initialize CPU features used by tunables.

>> diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S
>> new file mode 100644
>> index 0000000..e5b6f25
>> --- /dev/null
>> +++ b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S
>> @@ -0,0 +1,179 @@
>> +	stxvd2x	v0,r0,r3
>> +L(dst_is_align_16):
>> +	cmpldi	cr7,r5,127
>> +	ble	cr7,L(tail_copy)
>> +	addi	r8,r5,-128
>> +	mr	r9,r12
>> +	rldicr	r8,r8,0,56
>> +	li	r11,16
>> +	srdi	r10,r8,7
>> +	addi	r0,r8,128
>> +	addi	r10,r10,1
>
> Can we directly do
> 	rldicr  r0, r5, 0, 56
> 	srdi    r10,r5,7
> instead of this sequence?
> 	addi    r8,r5,-128
> 	rldicr  r8,r8,0,56
> 	srdi    r10,r8,7
> 	addi    r0,r8,128
> 	addi    r10,r10,1

Yes.  I changed that and made more changes for clarity:
 - Replaced rldicr with clrrdi.
 - Replaced r0 with 0 where it's treated as an immediate.

Pushed as c9cd7b0ce5c5.

-- 
Tulio Magno


* Re: [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory
  2017-12-11 12:44                     ` Tulio Magno Quites Machado Filho
@ 2017-12-11 20:09                       ` Adhemerval Zanella
  0 siblings, 0 replies; 31+ messages in thread
From: Adhemerval Zanella @ 2017-12-11 20:09 UTC (permalink / raw)
  To: Tulio Magno Quites Machado Filho, Florian Weimer, libc-alpha
  Cc: Rajalakshmi Srinivasaraghavan



On 11/12/2017 10:44, Tulio Magno Quites Machado Filho wrote:
> Florian Weimer <fweimer@redhat.com> writes:
> 
>> On 12/08/2017 08:40 PM, Tulio Magno Quites Machado Filho wrote:
>>> +@deftp Tunable glibc.tune.cached_memopt
>>> +The @code{glibc.tune.cached_memopt=[0|1]} tunable allows the user to enable
>>> +optimizations recommended to cacheable memory.
>>> +
>>> +This tunable is specific to powerpc, powerpc64 and powerpc64le.
>>> +@end deftp
>>
>> I think this has a slight grammar problem.
>>
>> What about this instead?
>>
>> The @code{glibc.tune.cached_memopt=[0|1]} tunable allows the user to 
>> enable optimizations recommended for cacheable memory.  If set to 
>> @code{1}, @theglibc{} assumes that the process memory image consists of 
>> cacheable (non-device) memory only.  The default, @code{0}, indicates 
>> that the process may use device memory.
> 
> And a much better description.
> 
>> (I think it's best not to mention string functions here because it is 
>> impossible to describe how glibc.tune.cached_memopt affects them due to 
>> compiler optimizations.)
> 
> Ack.
> 
> Fixed locally.
> 
> Thanks!
> 

Thanks for working on this, Tulio.  LGTM.


end of thread, other threads:[~2017-12-11 20:09 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-18  5:13 [PATCH] powerpc: Use aligned stores in memset Rajalakshmi Srinivasaraghavan
2017-08-18  6:21 ` Florian Weimer
2017-08-18  6:51   ` Rajalakshmi Srinivasaraghavan
2017-08-18  9:10     ` Florian Weimer
2017-08-18 12:13       ` Adhemerval Zanella
2017-09-12 10:30       ` Florian Weimer
2017-09-12 12:18         ` Zack Weinberg
2017-09-12 13:57           ` Steven Munroe
2017-09-12 14:37           ` Joseph Myers
2017-09-12 15:06             ` Zack Weinberg
2017-09-12 17:09           ` Florian Weimer
2017-09-12 13:38         ` Steven Munroe
2017-09-12 14:08           ` Florian Weimer
2017-09-12 14:16             ` Steven Munroe
2017-09-12 17:04               ` Florian Weimer
2017-09-12 19:21                 ` Steven Munroe
2017-09-12 19:45                   ` Florian Weimer
2017-09-12 20:25                     ` Steven Munroe
2017-09-13 13:12         ` Tulio Magno Quites Machado Filho
2017-09-18 13:54           ` Florian Weimer
2017-10-03 18:29             ` Adhemerval Zanella
2017-10-05 12:13               ` Rajalakshmi Srinivasaraghavan
2017-11-08 18:52               ` Tulio Magno Quites Machado Filho
2017-12-08 19:52                 ` [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory Tulio Magno Quites Machado Filho
2017-12-08 20:06                   ` Florian Weimer
2017-12-11 12:44                     ` Tulio Magno Quites Machado Filho
2017-12-11 20:09                       ` Adhemerval Zanella
2017-12-10  7:11                   ` Rajalakshmi Srinivasaraghavan
2017-12-11 19:48                     ` Tulio Magno Quites Machado Filho
2017-08-18  6:25 ` [PATCH] powerpc: Use aligned stores in memset Andrew Pinski
2017-08-21  2:20 ` Tulio Magno Quites Machado Filho
