public inbox for libc-alpha@sourceware.org
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
@ 2017-02-07 12:42 Wilco Dijkstra
  2017-02-07 13:01 ` Siddhesh Poyarekar
  0 siblings, 1 reply; 38+ messages in thread
From: Wilco Dijkstra @ 2017-02-07 12:42 UTC (permalink / raw)
  To: Siddhesh Poyarekar, sellcey; +Cc: adhemerval.zanella, libc-alpha, nd

Siddhesh wrote:
> I think it would be cleaner to put the full generic and thunderx
> implementations in separate files instead of trying to do this macro
> dance because it keeps micro-architecture details separate.  Assembly
> code is hard to maintain as it is without adding conditional compilation
> using macros.

I agree we want to avoid using conditional compilation as much as possible.
On the other hand, duplication is a bad idea too; I've seen too many cases where
bugs were only fixed in one of the N duplicates.

However I'm actually wondering whether we need an ifunc for this case.
For large copies from L2 I think adding a prefetch should be benign even on 
cores that don't need it, so if the benchmarks confirm this we should consider
updating the generic memcpy.

> I also second Adhemerval's suggestion to separate the patch to add the
> framework from the one to add the thunderx ifunc.  It makes for easier
> cherry picking and git-blaming.

Agreed.

Wilco

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-07 12:42 [PATCH] Add ifunc memcpy and memmove for aarch64 Wilco Dijkstra
@ 2017-02-07 13:01 ` Siddhesh Poyarekar
  2017-02-07 13:22   ` Adhemerval Zanella
  0 siblings, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-02-07 13:01 UTC (permalink / raw)
  To: Wilco Dijkstra, sellcey; +Cc: adhemerval.zanella, libc-alpha, nd

On Tuesday 07 February 2017 06:12 PM, Wilco Dijkstra wrote:
> I agree we want to avoid using conditional compilation as much as possible.
> On the other hand duplication is a bad idea too, I've seen too many cases where
> bugs were only fixed in one of the N duplicates.

Sure, but in that case the de-duplication must be done by identifying a
logical code block and making that into a macro to override, not by
arbitrarily injecting hunks of code.  In this case it could be that
alternate implementations of copy_long are sufficient, so #define
COPY_LONG in both memcpy_generic and memcpy_thunderx and have the parent
(memcpy.S) use that macro.  In fact, that might even end up making the
code a bit nicer to read.
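A minimal C sketch of the layout described above, assuming hypothetical names throughout (DEFINE_MEMCPY_VARIANT, LONG_COPY_MIN, copy_long_generic, copy_long_prefetch and the *_sketch entry points are illustrative, not glibc code). In memcpy.S each variant file would #define COPY_LONG and then include the shared body; because a single self-contained C file cannot do that include step, a generator macro that takes the COPY_LONG hook as a parameter stands in for it.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cut-over size between the shared small-copy path and the
   per-variant large-copy block; illustrative only.  */
#define LONG_COPY_MIN 128

/* Shared "parent" body: every variant handles small copies the same way
   and delegates large copies to its COPY_LONG block.  */
#define DEFINE_MEMCPY_VARIANT(NAME, COPY_LONG)                  \
  void *NAME (void *dstin, const void *srcin, size_t count)     \
  {                                                             \
    unsigned char *dst = dstin;                                 \
    const unsigned char *src = srcin;                           \
    if (count > LONG_COPY_MIN)                                  \
      COPY_LONG (dst, src, count);                              \
    else                                                        \
      for (size_t i = 0; i < count; i++)                        \
        dst[i] = src[i];                                        \
    return dstin;                                               \
  }

/* Generic large-copy block: a plain forward loop.  */
static void
copy_long_generic (unsigned char *dst, const unsigned char *src,
                   size_t count)
{
  assert (count > LONG_COPY_MIN);
  for (size_t i = 0; i < count; i++)
    dst[i] = src[i];
}

/* Variant large-copy block: the same loop plus a software prefetch
   512 bytes ahead at the start of each 64-byte chunk, issued only while
   the prefetched address stays inside the source buffer.  */
static void
copy_long_prefetch (unsigned char *dst, const unsigned char *src,
                    size_t count)
{
  assert (count > LONG_COPY_MIN);
  for (size_t i = 0; i < count; i++)
    {
      if (i % 64 == 0 && i + 512 < count)
        __builtin_prefetch (src + i + 512);
      dst[i] = src[i];
    }
}

/* Each "variant file" reduces to naming its COPY_LONG block.  */
DEFINE_MEMCPY_VARIANT (memcpy_generic_sketch, copy_long_generic)
DEFINE_MEMCPY_VARIANT (memcpy_thunderx_sketch, copy_long_prefetch)
```

Both entry points produce identical results; only the large-copy block differs, which is what keeps the micro-architecture detail confined to one place.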

> However I'm actually wondering whether we need an ifunc for this case.
> For large copies from L2 I think adding a prefetch should be benign even on 
> cores that don't need it, so if the benchmarks confirm this we should consider
> updating the generic memcpy.

That is a call that ARM maintainers can take and is also another reason
to separate the IFUNC infrastructure code from the thunderx change.
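To illustrate the split suggested above: the infrastructure patch only needs to record MIDR_EL1, and the thunderx change would presumably add a selector on top of it. A hedged sketch of that selection step follows, written as an ordinary function-pointer selector; glibc IFUNC resolvers return the chosen implementation's address in a similar way, but are run at relocation time. All names (memcpy_fn, copy_generic, copy_thunderx, is_thunderx, select_memcpy) are hypothetical, and both implementations simply forward to memcpy here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void *memcpy_fn (void *, const void *, size_t);

/* Placeholder implementations; a real tuned variant would differ.  */
static void *
copy_generic (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

static void *
copy_thunderx (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

/* Implementer field (bits 31:24) is 'C' and part number field
   (bits 15:4) is 0x0a1.  */
static int
is_thunderx (uint64_t midr)
{
  return ((midr >> 24) & 0xff) == 'C' && ((midr >> 4) & 0xfff) == 0x0a1;
}

/* Resolver-style selector: consult the recorded MIDR once and return
   the implementation to use from then on.  */
static memcpy_fn *
select_memcpy (uint64_t midr)
{
  return is_thunderx (midr) ? copy_thunderx : copy_generic;
}
```

A MIDR of 0 (what the patch stores when HWCAP_CPUID is absent) falls back to the generic implementation.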

Siddhesh


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-07 13:01 ` Siddhesh Poyarekar
@ 2017-02-07 13:22   ` Adhemerval Zanella
  2017-02-07 23:20     ` Steve Ellcey
  0 siblings, 1 reply; 38+ messages in thread
From: Adhemerval Zanella @ 2017-02-07 13:22 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Wilco Dijkstra, sellcey; +Cc: libc-alpha, nd

[-- Attachment #1: Type: text/plain, Size: 1467 bytes --]



On 07/02/2017 11:01, Siddhesh Poyarekar wrote:
> On Tuesday 07 February 2017 06:12 PM, Wilco Dijkstra wrote:
>> I agree we want to avoid using conditional compilation as much as possible.
>> On the other hand duplication is a bad idea too, I've seen too many cases where
>> bugs were only fixed in one of the N duplicates.
> 
> Sure, but then in that case the de-duplication must be done by
> identifying a logical code block and make that into a macro to override
> and not just arbitrarily inject hunks of code.  So in this case it could
> be alternate implementations of copy_long that is sufficient so #define
> COPY_LONG in both memcpy_generic and memcpy_thunderx and have the parent
> (memcpy.S) use that macro.  In fact, that might even end up making the
> code a bit nicer to read.
> 
>> However I'm actually wondering whether we need an ifunc for this case.
>> For large copies from L2 I think adding a prefetch should be benign even on 
>> cores that don't need it, so if the benchmarks confirm this we should consider
>> updating the generic memcpy.
> 
> That is a call that ARM maintainers can take and is also another reason
> to separate the IFUNC infrastructure code from the thunderx change.

I checked only the memcpy change on an APM X-Gene 1, and the results seem to show
improvements on aligned input, at least for sizes shorter than 4MB.  I would
like to check on more armv8 chips, but it does seem a nice improvement
over the generic implementation.

[-- Attachment #2: bench-memcpy-large.out --]
[-- Type: text/plain, Size: 1689 bytes --]

                       	memcpy
Length 65543, alignment  0/ 0:	4553.71
Length 65551, alignment  0/ 3:	11239.8
Length 65567, alignment  3/ 0:	11201.6
Length 65599, alignment  3/ 5:	11221.2
Length 131079, alignment  0/ 0:	9023.67
Length 131087, alignment  0/ 3:	22489.5
Length 131103, alignment  3/ 0:	22439.6
Length 131135, alignment  3/ 5:	22426.3
Length 262151, alignment  0/ 0:	21198.5
Length 262159, alignment  0/ 3:	48474
Length 262175, alignment  3/ 0:	48292.3
Length 262207, alignment  3/ 5:	48545.1
Length 524295, alignment  0/ 0:	43480.7
Length 524303, alignment  0/ 3:	93729.3
Length 524319, alignment  3/ 0:	93706.8
Length 524351, alignment  3/ 5:	93809.2
Length 1048583, alignment  0/ 0:	86732.2
Length 1048591, alignment  0/ 3:	187419
Length 1048607, alignment  3/ 0:	187153
Length 1048639, alignment  3/ 5:	187384
Length 2097159, alignment  0/ 0:	173630
Length 2097167, alignment  0/ 3:	373671
Length 2097183, alignment  3/ 0:	373776
Length 2097215, alignment  3/ 5:	374153
Length 4194311, alignment  0/ 0:	383575
Length 4194319, alignment  0/ 3:	752044
Length 4194335, alignment  3/ 0:	750919
Length 4194367, alignment  3/ 5:	751680
Length 8388615, alignment  0/ 0:	1.24695e+06
Length 8388623, alignment  0/ 3:	1.6407e+06
Length 8388639, alignment  3/ 0:	1.63961e+06
Length 8388671, alignment  3/ 5:	1.6407e+06
Length 16777223, alignment  0/ 0:	2.7774e+06
Length 16777231, alignment  0/ 3:	3.34092e+06
Length 16777247, alignment  3/ 0:	3.33036e+06
Length 16777279, alignment  3/ 5:	3.33811e+06
Length 33554439, alignment  0/ 0:	5.4628e+06
Length 33554447, alignment  0/ 3:	6.56429e+06
Length 33554463, alignment  3/ 0:	6.56451e+06
Length 33554495, alignment  3/ 5:	6.5654e+06

[-- Attachment #3: bench-memcpy-large.patched --]
[-- Type: text/plain, Size: 1690 bytes --]

                       	memcpy
Length 65543, alignment  0/ 0:	5590.23
Length 65551, alignment  0/ 3:	11171
Length 65567, alignment  3/ 0:	11146.2
Length 65599, alignment  3/ 5:	11154.1
Length 131079, alignment  0/ 0:	11109
Length 131087, alignment  0/ 3:	22266.3
Length 131103, alignment  3/ 0:	22296.1
Length 131135, alignment  3/ 5:	22257.1
Length 262151, alignment  0/ 0:	22780.6
Length 262159, alignment  0/ 3:	46212.7
Length 262175, alignment  3/ 0:	45999.7
Length 262207, alignment  3/ 5:	46221.3
Length 524295, alignment  0/ 0:	47787.3
Length 524303, alignment  0/ 3:	93263.7
Length 524319, alignment  3/ 0:	93028.3
Length 524351, alignment  3/ 5:	93301.5
Length 1048583, alignment  0/ 0:	95413.2
Length 1048591, alignment  0/ 3:	186367
Length 1048607, alignment  3/ 0:	185780
Length 1048639, alignment  3/ 5:	186296
Length 2097159, alignment  0/ 0:	190546
Length 2097167, alignment  0/ 3:	372310
Length 2097183, alignment  3/ 0:	371187
Length 2097215, alignment  3/ 5:	372281
Length 4194311, alignment  0/ 0:	379009
Length 4194319, alignment  0/ 3:	736763
Length 4194335, alignment  3/ 0:	733672
Length 4194367, alignment  3/ 5:	736531
Length 8388615, alignment  0/ 0:	1.26684e+06
Length 8388623, alignment  0/ 3:	1.61883e+06
Length 8388639, alignment  3/ 0:	1.6062e+06
Length 8388671, alignment  3/ 5:	1.61872e+06
Length 16777223, alignment  0/ 0:	2.68259e+06
Length 16777231, alignment  0/ 3:	3.24415e+06
Length 16777247, alignment  3/ 0:	3.23356e+06
Length 16777279, alignment  3/ 5:	3.2449e+06
Length 33554439, alignment  0/ 0:	5.47245e+06
Length 33554447, alignment  0/ 3:	6.56719e+06
Length 33554463, alignment  3/ 0:	6.55255e+06
Length 33554495, alignment  3/ 5:	6.56698e+06

[-- Attachment #4: memcpy_aarch64.patch --]
[-- Type: text/x-patch, Size: 1885 bytes --]

diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S
index 29af8b1..4742a01 100644
--- a/sysdeps/aarch64/memcpy.S
+++ b/sysdeps/aarch64/memcpy.S
@@ -158,10 +158,13 @@ L(copy96):
 
 	.p2align 4
 L(copy_long):
+	cmp	count, #32768
+	b.lo	L(copy_long_without_prefetch)
 	and	tmp1, dstin, 15
 	bic	dst, dstin, 15
 	ldp	D_l, D_h, [src]
 	sub	src, src, tmp1
+	prfm	pldl1strm, [src, 384]
 	add	count, count, tmp1	/* Count is now 16 too large.  */
 	ldp	A_l, A_h, [src, 16]
 	stp	D_l, D_h, [dstin]
@@ -169,7 +172,10 @@ L(copy_long):
 	ldp	C_l, C_h, [src, 48]
 	ldp	D_l, D_h, [src, 64]!
 	subs	count, count, 128 + 16	/* Test and readjust count.  */
-	b.ls	2f
+
+L(prefetch_loop64):
+	tbz	src, #6, 1f
+	prfm	pldl1strm, [src, 512]
 1:
 	stp	A_l, A_h, [dst, 16]
 	ldp	A_l, A_h, [src, 16]
@@ -180,12 +186,39 @@ L(copy_long):
 	stp	D_l, D_h, [dst, 64]!
 	ldp	D_l, D_h, [src, 64]!
 	subs	count, count, 64
-	b.hi	1b
+	b.hi	L(prefetch_loop64)
+	b	L(last64)
+
+L(copy_long_without_prefetch):
+
+	and	tmp1, dstin, 15
+	bic	dst, dstin, 15
+	ldp	D_l, D_h, [src]
+	sub	src, src, tmp1
+	add	count, count, tmp1	/* Count is now 16 too large.  */
+	ldp	A_l, A_h, [src, 16]
+	stp	D_l, D_h, [dstin]
+	ldp	B_l, B_h, [src, 32]
+	ldp	C_l, C_h, [src, 48]
+	ldp	D_l, D_h, [src, 64]!
+	subs	count, count, 128 + 16	/* Test and readjust count.  */
+	b.ls	L(last64)
+L(loop64):
+	stp	A_l, A_h, [dst, 16]
+	ldp	A_l, A_h, [src, 16]
+	stp	B_l, B_h, [dst, 32]
+	ldp	B_l, B_h, [src, 32]
+	stp	C_l, C_h, [dst, 48]
+	ldp	C_l, C_h, [src, 48]
+	stp	D_l, D_h, [dst, 64]!
+	ldp	D_l, D_h, [src, 64]!
+	subs	count, count, 64
+	b.hi	L(loop64)
 
 	/* Write the last full set of 64 bytes.  The remainder is at most 64
 	   bytes, so it is safe to always copy 64 bytes from the end even if
 	   there is just 1 byte left.  */
-2:
+L(last64):
 	ldp	E_l, E_h, [srcend, -64]
 	stp	A_l, A_h, [dst, 16]
 	ldp	A_l, A_h, [srcend, -48]
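
The control flow this patch adds to L(copy_long) can be modeled in C roughly as follows. This is a sketch under stated assumptions: __builtin_prefetch (GCC/Clang) stands in for prfm, with read access and locality hint 0 as the closest match to the streaming pldl1strm hint; the constants 32768, 384 and 512 come from the patch; copy_long_model, prefetch_if_inside and the memcpy-based 64-byte chunks are illustrative simplifications (the real loop keeps four register pairs in flight and finishes with an overlapping copy of the last 64 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Constants taken from the patch; the rest is a simplified model.  */
#define PREFETCH_MIN_COUNT 32768  /* cmp count, #32768; b.lo ...  */
#define PREFETCH_FIRST     384    /* prfm pldl1strm, [src, 384]   */
#define PREFETCH_AHEAD     512    /* prfm pldl1strm, [src, 512]   */

/* Issue a read prefetch with no temporal locality, but only for offsets
   inside the source buffer so the C pointer arithmetic stays in bounds.
   PRFM itself never faults, so the assembly needs no such check.  */
static void
prefetch_if_inside (const unsigned char *src, size_t off, size_t count)
{
  if (off < count)
    __builtin_prefetch (src + off, 0, 0);
}

void
copy_long_model (unsigned char *dst, const unsigned char *src, size_t count)
{
  int prefetch = count >= PREFETCH_MIN_COUNT;
  size_t i = 0;

  if (prefetch)
    prefetch_if_inside (src, PREFETCH_FIRST, count);

  for (; count - i >= 64; i += 64)
    {
      /* "tbz src, #6, 1f": prefetch only when bit 6 of the current source
         address is set, i.e. on every other 64-byte block.  */
      if (prefetch && ((uintptr_t) (src + i) & 64) != 0)
        prefetch_if_inside (src, i + PREFETCH_AHEAD, count);
      memcpy (dst + i, src + i, 64);
    }

  /* Remaining tail of fewer than 64 bytes.  */
  memcpy (dst + i, src + i, count - i);
}
```

Copies shorter than 32768 bytes never take the prefetch branch, mirroring the b.lo to L(copy_long_without_prefetch).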


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-07 13:22   ` Adhemerval Zanella
@ 2017-02-07 23:20     ` Steve Ellcey
  2017-02-08  5:46       ` Siddhesh Poyarekar
  0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-02-07 23:20 UTC (permalink / raw)
  To: Adhemerval Zanella, Siddhesh Poyarekar, Wilco Dijkstra; +Cc: libc-alpha, nd

[-- Attachment #1: Type: text/plain, Size: 977 bytes --]

OK, here is the basic IFUNC enablement for aarch64 without the
memcpy/memmove changes that use it.  I verified that it builds and
causes no regressions on aarch64.  As mentioned in the original email,
this code uses the mrs instruction to read midr_el1, which is a privileged
register, but the 4.11 kernel will emulate that access
(https://lkml.org/lkml/2017/1/10/816).
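
A hedged user-space illustration of the dependency described above: the same guard that init_cpu_features uses in this patch, executing mrs on midr_el1 only when the kernel advertises HWCAP_CPUID in AT_HWCAP. The sketch assumes glibc's getauxval from <sys/auxv.h>; read_midr_el1 is a hypothetical name, the HWCAP_CPUID fallback value is the one the patch uses, and on anything other than AArch64 Linux the function simply returns 0.

```c
#include <assert.h>
#include <stdint.h>

#if defined __aarch64__ && defined __linux__
# include <sys/auxv.h>                  /* getauxval, AT_HWCAP */
# ifndef HWCAP_CPUID
#  define HWCAP_CPUID (1 << 11)         /* same fallback as the patch */
# endif
#endif

/* Return MIDR_EL1, or 0 when the kernel does not advertise emulation of
   ID register reads (HWCAP_CPUID) or this is not AArch64 Linux.  Without
   that emulation, an EL0 read of MIDR_EL1 is an undefined instruction
   and the process would get SIGILL, so the hwcap is checked first.  */
static uint64_t
read_midr_el1 (void)
{
#if defined __aarch64__ && defined __linux__
  if (getauxval (AT_HWCAP) & HWCAP_CPUID)
    {
      uint64_t midr;
      __asm__ volatile ("mrs %0, midr_el1" : "=r" (midr));
      return midr;
    }
#endif
  return 0;
}
```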

OK to check in this part?

Steve Ellcey
sellcey@cavium.com


2017-02-07  Steve Ellcey  <sellcey@caviumnetworks.com>
	    Adhemerval Zanella  <adhemerval.zanella@linaro.org>

	* sysdeps/aarch64/dl-machine.h: Include cpu-features.c.
	(DL_PLATFORM_INIT): New define.
	(dl_platform_init): New function.
	* sysdeps/aarch64/ldsodefs.h: Include cpu-features.h.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.c: New file.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Ditto.
	* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Ditto.
	* sysdeps/unix/sysv/linux/aarch64/libc-start.c: Ditto.
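
The cpu-features.h in this patch decodes MIDR_EL1 with shift-and-mask macros and identifies ThunderX by implementer 'C' (0x43) and part number 0x0a1. A small self-contained sketch of that decoding follows. make_midr is a hypothetical helper for building test values, which are assembled from the field layout rather than read from hardware. One deliberate difference from the patch: the masks use unsigned constants, since 0xff << 24 on a plain int shifts into the sign bit.

```c
#include <assert.h>
#include <stdint.h>

#define MIDR_PARTNUM_SHIFT      4
#define MIDR_PARTNUM_MASK       (0xfffu << MIDR_PARTNUM_SHIFT)
#define MIDR_PARTNUM(midr) \
  (((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
#define MIDR_ARCHITECTURE_SHIFT 16
#define MIDR_ARCHITECTURE_MASK  (0xfu << MIDR_ARCHITECTURE_SHIFT)
#define MIDR_ARCHITECTURE(midr) \
  (((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
#define MIDR_VARIANT_SHIFT      20
#define MIDR_VARIANT_MASK       (0xfu << MIDR_VARIANT_SHIFT)
#define MIDR_VARIANT(midr) \
  (((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
#define MIDR_IMPLEMENTOR_SHIFT  24
#define MIDR_IMPLEMENTOR_MASK   (0xffu << MIDR_IMPLEMENTOR_SHIFT)
#define MIDR_IMPLEMENTOR(midr) \
  (((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)

/* Same test as the patch: implementer 'C' and part number 0x0a1.  */
#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR (midr) == 'C' \
                           && MIDR_PARTNUM (midr) == 0x0a1)

/* Hypothetical helper: pack the fields into a MIDR value (the revision
   occupies bits 3:0).  Each field must fit its width.  */
static uint64_t
make_midr (unsigned impl, unsigned variant, unsigned arch,
           unsigned partnum, unsigned revision)
{
  assert (impl <= 0xff && variant <= 0xf && arch <= 0xf
          && partnum <= 0xfff && revision <= 0xf);
  return ((uint64_t) impl << MIDR_IMPLEMENTOR_SHIFT)
         | ((uint64_t) variant << MIDR_VARIANT_SHIFT)
         | ((uint64_t) arch << MIDR_ARCHITECTURE_SHIFT)
         | ((uint64_t) partnum << MIDR_PARTNUM_SHIFT)
         | revision;
}
```

For example, make_midr ('C', 0, 0xf, 0x0a1, 0) yields 0x430F0A10, which IS_THUNDERX accepts and from which MIDR_PARTNUM recovers 0x0a1.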

[-- Attachment #2: ifunc.patch --]
[-- Type: text/x-patch, Size: 8975 bytes --]

diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
index 84b8aec..15d79a6 100644
--- a/sysdeps/aarch64/dl-machine.h
+++ b/sysdeps/aarch64/dl-machine.h
@@ -25,6 +25,7 @@
 #include <tls.h>
 #include <dl-tlsdesc.h>
 #include <dl-irel.h>
+#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -225,6 +226,23 @@ _dl_start_user:								\n\
 #define ELF_MACHINE_NO_REL 1
 #define ELF_MACHINE_NO_RELA 0
 
+#define DL_PLATFORM_INIT dl_platform_init ()
+
+static inline void __attribute__ ((unused))
+dl_platform_init (void)
+{
+  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
+    /* Avoid an empty string which would disturb us.  */
+    GLRO(dl_platform) = NULL;
+
+#ifdef SHARED
+  /* init_cpu_features has been called early from __libc_start_main in
+     static executable.  */
+  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
+#endif
+}
+
+
 static inline ElfW(Addr)
 elf_machine_fixup_plt (struct link_map *map, lookup_t t,
 		       const ElfW(Rela) *reloc,
diff --git a/sysdeps/aarch64/ldsodefs.h b/sysdeps/aarch64/ldsodefs.h
index f277074..ba4ada3 100644
--- a/sysdeps/aarch64/ldsodefs.h
+++ b/sysdeps/aarch64/ldsodefs.h
@@ -20,6 +20,7 @@
 #define _AARCH64_LDSODEFS_H 1
 
 #include <elf.h>
+#include <cpu-features.h>
 
 struct La_aarch64_regs;
 struct La_aarch64_retval;
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
index e69de29..8e4b514 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
@@ -0,0 +1,38 @@
+/* Initialize CPU feature data.  AArch64 version.
+   This file is part of the GNU C Library.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <cpu-features.h>
+
+#ifndef HWCAP_CPUID
+# define HWCAP_CPUID		(1 << 11)
+#endif
+
+static inline void
+init_cpu_features (struct cpu_features *cpu_features)
+{
+  if (GLRO(dl_hwcap) & HWCAP_CPUID)
+    {
+      register uint64_t id = 0;
+      asm volatile ("mrs %0, midr_el1" : "=r"(id));
+      cpu_features->midr_el1 = id;
+    }
+  else
+    {
+      cpu_features->midr_el1 = 0;
+    }
+}
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
index e69de29..c92b650 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
@@ -0,0 +1,49 @@
+/* Initialize CPU feature data.  AArch64 version.
+   This file is part of the GNU C Library.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _CPU_FEATURES_AARCH64_H
+#define _CPU_FEATURES_AARCH64_H
+
+#include <stdint.h>
+
+#define MIDR_PARTNUM_SHIFT	4
+#define MIDR_PARTNUM_MASK	(0xfff << MIDR_PARTNUM_SHIFT)
+#define MIDR_PARTNUM(midr)	\
+	(((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
+#define MIDR_ARCHITECTURE_SHIFT	16
+#define MIDR_ARCHITECTURE_MASK	(0xf << MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_ARCHITECTURE(midr)	\
+	(((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_VARIANT_SHIFT	20
+#define MIDR_VARIANT_MASK	(0xf << MIDR_VARIANT_SHIFT)
+#define MIDR_VARIANT(midr)	\
+	(((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
+#define MIDR_IMPLEMENTOR_SHIFT	24
+#define MIDR_IMPLEMENTOR_MASK	(0xff << MIDR_IMPLEMENTOR_SHIFT)
+#define MIDR_IMPLEMENTOR(midr)	\
+	(((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
+
+#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR(midr) == 'C'	\
+			   && MIDR_PARTNUM(midr) == 0x0a1)
+
+struct cpu_features
+{
+  uint64_t midr_el1;
+};
+
+#endif /* _CPU_FEATURES_AARCH64_H  */
diff --git a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
index e69de29..438046a 100644
--- a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
+++ b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
@@ -0,0 +1,60 @@
+/* Data for AArch64 version of processor capability information.
+   Linux version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* If anything should be added here check whether the size of each string
+   is still ok with the given array size.
+
+   All the #ifdefs in the definitions are quite irritating but
+   necessary if we want to avoid duplicating the information.  There
+   are three different modes:
+
+   - PROCINFO_DECL is defined.  This means we are only interested in
+     declarations.
+
+   - PROCINFO_DECL is not defined:
+
+     + if SHARED is defined the file is included in an array
+       initializer.  The .element = { ... } syntax is needed.
+
+     + if SHARED is not defined a normal array initialization is
+       needed.
+  */
+
+#ifndef PROCINFO_CLASS
+# define PROCINFO_CLASS
+#endif
+
+#if !IS_IN (ldconfig)
+# if !defined PROCINFO_DECL && defined SHARED
+  ._dl_aarch64_cpu_features
+# else
+PROCINFO_CLASS struct cpu_features _dl_aarch64_cpu_features
+# endif
+# ifndef PROCINFO_DECL
+= { }
+# endif
+# if !defined SHARED || defined PROCINFO_DECL
+;
+# else
+,
+# endif
+#endif
+
+#undef PROCINFO_DECL
+#undef PROCINFO_CLASS
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc-start.c b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
index e69de29..c98aff1 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc-start.c
+++ b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
@@ -0,0 +1,40 @@
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifdef SHARED
+# include <csu/libc-start.c>
+# else
+/* The main work is done in the generic function.  */
+# define LIBC_START_DISABLE_INLINE
+# define LIBC_START_MAIN generic_start_main
+# include <csu/libc-start.c>
+# include <cpu-features.c>
+
+extern struct cpu_features _dl_aarch64_cpu_features;
+
+int
+__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
+		   int argc, char **argv,
+		   __typeof (main) init,
+		   void (*fini) (void),
+		   void (*rtld_fini) (void), void *stack_end)
+{
+  init_cpu_features (&_dl_aarch64_cpu_features);
+  return generic_start_main (main, argc, argv, init, fini, rtld_fini,
+			     stack_end);
+}
+#endif
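
For static executables, dl_platform_init above skips init_cpu_features (it is under #ifdef SHARED); instead, as the dl-machine.h comment says, it is called "early from __libc_start_main": libc-start.c compiles the generic __libc_start_main under the name generic_start_main and wraps it. The same rename-and-wrap pattern in a minimal, self-contained form, using hypothetical names (START_MAIN, generic_start_main_sketch, start_main_sketch, features_sketch) and a fixed MIDR value instead of a real mrs read:

```c
#include <assert.h>
#include <stdint.h>

struct cpu_features_sketch
{
  uint64_t midr_el1;
};

/* Hypothetical global feature data, zero until initialized.  */
static struct cpu_features_sketch features_sketch;

static void
init_cpu_features_sketch (struct cpu_features_sketch *f)
{
  /* Fixed example value instead of reading MIDR_EL1.  */
  f->midr_el1 = 0x430F0A10u;
}

/* The generic entry point is compiled under another name, as the patch
   does by defining LIBC_START_MAIN to generic_start_main before
   including csu/libc-start.c.  */
#define START_MAIN generic_start_main_sketch
static int
START_MAIN (int argc)
{
  /* Stand-in for the generic startup work: return argc only if the
     feature data was already initialized when this ran, else -1.  */
  return features_sketch.midr_el1 != 0 ? argc : -1;
}
#undef START_MAIN

/* Public entry point: fill in the feature data first, then hand over
   to the generic code.  */
int
start_main_sketch (int argc)
{
  init_cpu_features_sketch (&features_sketch);
  return generic_start_main_sketch (argc);
}
```

The ordering is the point of the wrapper: anything the generic startup path consults afterwards sees initialized feature data.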


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-07 23:20     ` Steve Ellcey
@ 2017-02-08  5:46       ` Siddhesh Poyarekar
  2017-02-08  5:48         ` Siddhesh Poyarekar
  2017-02-09  0:02         ` Steve Ellcey
  0 siblings, 2 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-02-08  5:46 UTC (permalink / raw)
  To: Steve Ellcey, Adhemerval Zanella, Wilco Dijkstra; +Cc: libc-alpha, nd

On Wednesday 08 February 2017 04:50 AM, Steve Ellcey wrote:
> OK, here is the basic IFUNC enablement for aarch64 without the
> memcpy/memmove changes that use it.  I verified that it builds and
> causes no regressions on aarch64.  As mentioned in the original email
> this code depends on the mrs instruction which is privileged, but the
> 4.11 kernel will have emulation for it (https://lkml.org/lkml/2017/1/10/816).
> 
> OK to checkin this part?

Looks OK with a couple of nits below.

> 
> Steve Ellcey
> sellcey@cavium.com
> 
> 
> 2017-02-07  Steve Ellcey  <sellcey@caviumnetworks.com>
> 	    Adhemerval Zanella  <adhemerval.zanella@linaro.org>
> 
> 	* sysdeps/aarch64/dl-machine.h: Include cpu-features.c.
> 	(DL_PLATFORM_INIT): New define.
> 	(dl_platform_init): New function.
> 	* sysdeps/aarch64/ldsodefs.h: Include cpu-features.h.
> 	* sysdeps/unix/sysv/linux/aarch64/cpu-features.c: New file.
> 	* sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Ditto.
> 	* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Ditto.
> 	* sysdeps/unix/sysv/linux/aarch64/libc-start.c: Ditto.

I was told years ago that we prefer 'Likewise' to 'Ditto' :)

> [...]
> diff --git a/sysdeps/unix/sysv/linux/aarch64/libc-start.c b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
> index e69de29..c98aff1 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/libc-start.c
> +++ b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
> @@ -0,0 +1,40 @@

You've forgotten to add a one-line description for this file.

> +/* Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifdef SHARED
> +# include <csu/libc-start.c>
> +# else
> +/* The main work is done in the generic function.  */
> +# define LIBC_START_DISABLE_INLINE
> +# define LIBC_START_MAIN generic_start_main
> +# include <csu/libc-start.c>
> +# include <cpu-features.c>
> +
> +extern struct cpu_features _dl_aarch64_cpu_features;
> +
> +int
> +__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
> +		   int argc, char **argv,
> +		   __typeof (main) init,
> +		   void (*fini) (void),
> +		   void (*rtld_fini) (void), void *stack_end)
> +{
> +  init_cpu_features (&_dl_aarch64_cpu_features);
> +  return generic_start_main (main, argc, argv, init, fini, rtld_fini,
> +			     stack_end);
> +}
> +#endif
> 
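
For static binaries, the file above renames the generic csu/libc-start.c entry point (via LIBC_START_MAIN) and wraps it so that CPU features are recorded before the generic code runs. A self-contained sketch of that rename-and-wrap trick follows. The generic body is inlined rather than #included, and every name in it (generic_start, demo_start, init_features, count_args) is hypothetical:

```c
#include <stddef.h>

/* Hypothetical stand-in for struct cpu_features.  */
struct cpu_features
{
  int initialized;
  unsigned long long midr_el1;
};

static struct cpu_features features;

/* Stand-in for init_cpu_features; the patch's version reads MIDR_EL1.  */
static void
init_features (struct cpu_features *f)
{
  f->midr_el1 = 0;
  f->initialized = 1;
}

/* The patch #includes csu/libc-start.c with LIBC_START_MAIN defined as
   generic_start_main.  Here the "generic" body is written inline under a
   macro-chosen name to show the same renaming.  */
#define START_MAIN generic_start
static int
START_MAIN (int (*main_fn) (int, char **), int argc, char **argv)
{
  return main_fn (argc, argv);
}
#undef START_MAIN

/* Exported entry point: record CPU features, then delegate.  */
int
demo_start (int (*main_fn) (int, char **), int argc, char **argv)
{
  init_features (&features);
  return generic_start (main_fn, argc, argv);
}

/* Trivial "main" for demonstration: returns its argument count.  */
int
count_args (int argc, char **argv)
{
  (void) argv;
  return argc;
}
```

The ordering is the point of the wrapper. In the patch, generic_start_main is what applies the IRELATIVE relocations, so the features have to be filled in before it is entered.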


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-08  5:46       ` Siddhesh Poyarekar
@ 2017-02-08  5:48         ` Siddhesh Poyarekar
  2017-02-08 12:03           ` Szabolcs Nagy
  2017-02-09  0:02         ` Steve Ellcey
  1 sibling, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-02-08  5:48 UTC (permalink / raw)
  To: Steve Ellcey, Adhemerval Zanella, Wilco Dijkstra; +Cc: libc-alpha, nd

On Wednesday 08 February 2017 11:15 AM, Siddhesh Poyarekar wrote:
> Looks OK with a couple of nits below.

Oh and I suppose you need an ack from the ARM maintainers as well before
you push.

Siddhesh


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-08  5:48         ` Siddhesh Poyarekar
@ 2017-02-08 12:03           ` Szabolcs Nagy
  0 siblings, 0 replies; 38+ messages in thread
From: Szabolcs Nagy @ 2017-02-08 12:03 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Steve Ellcey, Adhemerval Zanella, Wilco Dijkstra
  Cc: nd, libc-alpha

On 08/02/17 05:46, Siddhesh Poyarekar wrote:
> On Wednesday 08 February 2017 11:15 AM, Siddhesh Poyarekar wrote:
>> Looks OK with a couple of nits below.
> 
> Oh and I suppose you need an ack from the ARM maintainers as well before
> you push.
> 

arm maintainer != aarch64 maintainer.


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-08  5:46       ` Siddhesh Poyarekar
  2017-02-08  5:48         ` Siddhesh Poyarekar
@ 2017-02-09  0:02         ` Steve Ellcey
  2017-02-09 10:51           ` Szabolcs Nagy
  1 sibling, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-02-09  0:02 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Adhemerval Zanella, Wilco Dijkstra; +Cc: libc-alpha, nd

[-- Attachment #1: Type: text/plain, Size: 847 bytes --]

On Wed, 2017-02-08 at 11:15 +0530, Siddhesh Poyarekar wrote:

> Looks OK with a couple of nits below.

Here is a de-nitted version with the ChangeLog using 'Likewise'
instead of 'Ditto' and with a one line description at the top
of libc-start.c.

Steve Ellcey
sellcey@cavium.com



2017-02-08  Steve Ellcey  <sellcey@caviumnetworks.com>
	    Adhemerval Zanella  <adhemerval.zanella@linaro.org>

	* sysdeps/aarch64/dl-machine.h: Include cpu-features.c.
	(DL_PLATFORM_INIT): New define.
	(dl_platform_init): New function.
	* sysdeps/aarch64/ldsodefs.h: Include cpu-features.h.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.c: New file.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Likewise.
	* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Likewise.
	* sysdeps/unix/sysv/linux/aarch64/libc-start.c: Likewise.

[-- Attachment #2: ifunc.patch --]
[-- Type: text/x-patch, Size: 9017 bytes --]

diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
index 84b8aec..15d79a6 100644
--- a/sysdeps/aarch64/dl-machine.h
+++ b/sysdeps/aarch64/dl-machine.h
@@ -25,6 +25,7 @@
 #include <tls.h>
 #include <dl-tlsdesc.h>
 #include <dl-irel.h>
+#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -225,6 +226,23 @@ _dl_start_user:								\n\
 #define ELF_MACHINE_NO_REL 1
 #define ELF_MACHINE_NO_RELA 0
 
+#define DL_PLATFORM_INIT dl_platform_init ()
+
+static inline void __attribute__ ((unused))
+dl_platform_init (void)
+{
+  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
+    /* Avoid an empty string which would disturb us.  */
+    GLRO(dl_platform) = NULL;
+
+#ifdef SHARED
+  /* init_cpu_features has been called early from __libc_start_main in
+     static executable.  */
+  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
+#endif
+}
+
+
 static inline ElfW(Addr)
 elf_machine_fixup_plt (struct link_map *map, lookup_t t,
 		       const ElfW(Rela) *reloc,
diff --git a/sysdeps/aarch64/ldsodefs.h b/sysdeps/aarch64/ldsodefs.h
index f277074..ba4ada3 100644
--- a/sysdeps/aarch64/ldsodefs.h
+++ b/sysdeps/aarch64/ldsodefs.h
@@ -20,6 +20,7 @@
 #define _AARCH64_LDSODEFS_H 1
 
 #include <elf.h>
+#include <cpu-features.h>
 
 struct La_aarch64_regs;
 struct La_aarch64_retval;
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
index e69de29..8e4b514 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
@@ -0,0 +1,38 @@
+/* Initialize CPU feature data.  AArch64 version.
+   This file is part of the GNU C Library.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <cpu-features.h>
+
+#ifndef HWCAP_CPUID
+# define HWCAP_CPUID		(1 << 11)
+#endif
+
+static inline void
+init_cpu_features (struct cpu_features *cpu_features)
+{
+  if (GLRO(dl_hwcap) & HWCAP_CPUID)
+    {
+      register uint64_t id = 0;
+      asm volatile ("mrs %0, midr_el1" : "=r"(id));
+      cpu_features->midr_el1 = id;
+    }
+  else
+    {
+      cpu_features->midr_el1 = 0;
+    }
+}
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
index e69de29..c92b650 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
@@ -0,0 +1,49 @@
+/* Initialize CPU feature data.  AArch64 version.
+   This file is part of the GNU C Library.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _CPU_FEATURES_AARCH64_H
+#define _CPU_FEATURES_AARCH64_H
+
+#include <stdint.h>
+
+#define MIDR_PARTNUM_SHIFT	4
+#define MIDR_PARTNUM_MASK	(0xfff << MIDR_PARTNUM_SHIFT)
+#define MIDR_PARTNUM(midr)	\
+	(((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
+#define MIDR_ARCHITECTURE_SHIFT	16
+#define MIDR_ARCHITECTURE_MASK	(0xf << MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_ARCHITECTURE(midr)	\
+	(((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_VARIANT_SHIFT	20
+#define MIDR_VARIANT_MASK	(0xf << MIDR_VARIANT_SHIFT)
+#define MIDR_VARIANT(midr)	\
+	(((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
+#define MIDR_IMPLEMENTOR_SHIFT	24
+#define MIDR_IMPLEMENTOR_MASK	(0xff << MIDR_IMPLEMENTOR_SHIFT)
+#define MIDR_IMPLEMENTOR(midr)	\
+	(((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
+
+#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR(midr) == 'C'	\
+			   && MIDR_PARTNUM(midr) == 0x0a1)
+
+struct cpu_features
+{
+  uint64_t midr_el1;
+};
+
+#endif /* _CPU_FEATURES_AARCH64_H  */
diff --git a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
index e69de29..438046a 100644
--- a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
+++ b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
@@ -0,0 +1,60 @@
+/* Data for AArch64 version of processor capability information.
+   Linux version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* If anything should be added here check whether the size of each string
+   is still ok with the given array size.
+
+   All the #ifdefs in the definitions are quite irritating but
+   necessary if we want to avoid duplicating the information.  There
+   are three different modes:
+
+   - PROCINFO_DECL is defined.  This means we are only interested in
+     declarations.
+
+   - PROCINFO_DECL is not defined:
+
+     + if SHARED is defined the file is included in an array
+       initializer.  The .element = { ... } syntax is needed.
+
+     + if SHARED is not defined a normal array initialization is
+       needed.
+  */
+
+#ifndef PROCINFO_CLASS
+# define PROCINFO_CLASS
+#endif
+
+#if !IS_IN (ldconfig)
+# if !defined PROCINFO_DECL && defined SHARED
+  ._dl_aarch64_cpu_features
+# else
+PROCINFO_CLASS struct cpu_features _dl_aarch64_cpu_features
+# endif
+# ifndef PROCINFO_DECL
+= { }
+# endif
+# if !defined SHARED || defined PROCINFO_DECL
+;
+# else
+,
+# endif
+#endif
+
+#undef PROCINFO_DECL
+#undef PROCINFO_CLASS
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc-start.c b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
index e69de29..a5babd4 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc-start.c
+++ b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
@@ -0,0 +1,41 @@
+/* Override csu/libc-start.c on AArch64.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifdef SHARED
+# include <csu/libc-start.c>
+# else
+/* The main work is done in the generic function.  */
+# define LIBC_START_DISABLE_INLINE
+# define LIBC_START_MAIN generic_start_main
+# include <csu/libc-start.c>
+# include <cpu-features.c>
+
+extern struct cpu_features _dl_aarch64_cpu_features;
+
+int
+__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
+		   int argc, char **argv,
+		   __typeof (main) init,
+		   void (*fini) (void),
+		   void (*rtld_fini) (void), void *stack_end)
+{
+  init_cpu_features (&_dl_aarch64_cpu_features);
+  return generic_start_main (main, argc, argv, init, fini, rtld_fini,
+			     stack_end);
+}
+#endif
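
The MIDR_* macros in cpu-features.h split MIDR_EL1 into implementer (bits 31:24), variant (23:20), architecture (19:16) and part number (15:4). Bits 3:0 hold the revision, which the patch does not decode; MIDR_REVISION below is an addition for completeness. The sketch extracts the same fields with shift-then-mask macros. It uses unsigned constants because `0xff << 24` on a plain int shifts into the sign bit. The example values are built from the field layout: implementer 'C' (0x43) with part 0x0a1 for ThunderX, and implementer 'A' (0x41) with part 0xd07 for a Cortex-A57.

```c
#include <stdint.h>
#include <stdio.h>

/* Same fields as the patch's macros, extracted shift-then-mask.  */
#define MIDR_REVISION(midr)     ((unsigned) ((midr) & 0xfu))
#define MIDR_PARTNUM(midr)      ((unsigned) (((midr) >> 4) & 0xfffu))
#define MIDR_ARCHITECTURE(midr) ((unsigned) (((midr) >> 16) & 0xfu))
#define MIDR_VARIANT(midr)      ((unsigned) (((midr) >> 20) & 0xfu))
#define MIDR_IMPLEMENTOR(midr)  ((unsigned) (((midr) >> 24) & 0xffu))

#define IS_THUNDERX(midr) \
  (MIDR_IMPLEMENTOR (midr) == 'C' && MIDR_PARTNUM (midr) == 0x0a1)

/* Print every decoded field of a MIDR_EL1 value.  */
void
print_midr (uint64_t midr)
{
  printf ("implementor=0x%02x variant=%u arch=0x%x part=0x%03x rev=%u%s\n",
          MIDR_IMPLEMENTOR (midr), MIDR_VARIANT (midr),
          MIDR_ARCHITECTURE (midr), MIDR_PARTNUM (midr),
          MIDR_REVISION (midr), IS_THUNDERX (midr) ? " (ThunderX)" : "");
}
```

For example, `print_midr (0x430f0a10)` reports implementer 0x43, architecture 0xf and part 0x0a1, which IS_THUNDERX accepts.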


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-09  0:02         ` Steve Ellcey
@ 2017-02-09 10:51           ` Szabolcs Nagy
  2017-02-09 11:04             ` Siddhesh Poyarekar
  2017-02-09 15:54             ` Andrew Pinski
  0 siblings, 2 replies; 38+ messages in thread
From: Szabolcs Nagy @ 2017-02-09 10:51 UTC (permalink / raw)
  To: Steve Ellcey, Siddhesh Poyarekar, Adhemerval Zanella, Wilco Dijkstra
  Cc: nd, libc-alpha

On 09/02/17 00:02, Steve Ellcey wrote:
> +static inline void __attribute__ ((unused))
> +dl_platform_init (void)
> +{
> +  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
> +    /* Avoid an empty string which would disturb us.  */
> +    GLRO(dl_platform) = NULL;
> +
> +#ifdef SHARED
> +  /* init_cpu_features has been called early from __libc_start_main in
> +     static executable.  */
> +  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
> +#endif
> +}
...
> +static inline void
> +init_cpu_features (struct cpu_features *cpu_features)
> +{
> +  if (GLRO(dl_hwcap) & HWCAP_CPUID)
> +    {
> +      register uint64_t id = 0;
> +      asm volatile ("mrs %0, midr_el1" : "=r"(id));
> +      cpu_features->midr_el1 = id;

this is a trap into the kernel at every process startup

since this is called very early (dynamic linking case
above, static linking case below) i wonder if there
could be a way for the user to request midr_el1==0
unconditionally (avoiding the overhead and making
sure the most generic implementation is used)

is there something like that on other targets?

> +    }
> +  else
> +    {
> +      cpu_features->midr_el1 = 0;
> +    }
> +}
...
> +#ifdef SHARED
> +# include <csu/libc-start.c>
> +# else
> +/* The main work is done in the generic function.  */
> +# define LIBC_START_DISABLE_INLINE
> +# define LIBC_START_MAIN generic_start_main
> +# include <csu/libc-start.c>
> +# include <cpu-features.c>
> +
> +extern struct cpu_features _dl_aarch64_cpu_features;
> +
> +int
> +__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
> +		   int argc, char **argv,
> +		   __typeof (main) init,
> +		   void (*fini) (void),
> +		   void (*rtld_fini) (void), void *stack_end)
> +{
> +  init_cpu_features (&_dl_aarch64_cpu_features);
> +  return generic_start_main (main, argc, argv, init, fini, rtld_fini,
> +			     stack_end);
> +}
> +#endif
> 
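
The trap being discussed can be seen in a user-space analogue of init_cpu_features. Reading MIDR_EL1 from EL0 is not a native access. The kernel traps and emulates it, and it advertises that emulation with HWCAP_CPUID. Without the bit the MRS would fault, which is why the read is gated. The sketch below assumes glibc's getauxval and mirrors the patch's fallback value for HWCAP_CPUID. The function names are hypothetical. On non-AArch64 hosts it always returns 0.

```c
#include <stdint.h>
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */

#ifndef HWCAP_CPUID
# define HWCAP_CPUID (1UL << 11)  /* Same bit value the patch hard-codes.  */
#endif

/* Return MIDR_EL1 if HWCAP advertises CPUID register emulation, else 0.
   On AArch64 the MRS below traps into the kernel, which emulates it;
   that is the per-startup cost mentioned above.  */
uint64_t
read_midr (unsigned long hwcap)
{
  uint64_t id = 0;
  if (hwcap & HWCAP_CPUID)
    {
#if defined __aarch64__
      __asm__ volatile ("mrs %0, midr_el1" : "=r" (id));
#endif
    }
  return id;
}

/* Convenience wrapper using the running process's auxiliary vector.  */
uint64_t
read_midr_from_auxv (void)
{
  return read_midr (getauxval (AT_HWCAP));
}
```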


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-09 10:51           ` Szabolcs Nagy
@ 2017-02-09 11:04             ` Siddhesh Poyarekar
  2017-02-09 11:40               ` Szabolcs Nagy
  2017-02-09 15:54             ` Andrew Pinski
  1 sibling, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-02-09 11:04 UTC (permalink / raw)
  To: Szabolcs Nagy, Steve Ellcey, Adhemerval Zanella, Wilco Dijkstra
  Cc: nd, libc-alpha

On Thursday 09 February 2017 04:20 PM, Szabolcs Nagy wrote:
> this is a trap into the kernel at every process startup
> 
> since this is called very early (dynamic linking case
> above, static linking case below) i wonder if there
> could be a way for the user to request midr_el1==0
> unconditionally (avoiding the overhead and making
> sure the most generic implementation is used)

Well you could use tunables to avoid it, but then if a single trap is a
problem for you then the tunables infra is going to be just as expensive.

Why is a single trap at startup such a concern though?

> is there something like that on other targets?

H.J. Lu had a patch to override IFUNCs using (what will now be) a
tunable but that patch did not make progress.  I hope it will now since
I too am interested in overriding IFUNC selection using tunables.  But
then this would be an orthogonal discussion to avoiding the trap since
the goals of both are different.

Siddhesh


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-09 11:04             ` Siddhesh Poyarekar
@ 2017-02-09 11:40               ` Szabolcs Nagy
  2017-02-09 11:53                 ` Siddhesh Poyarekar
  0 siblings, 1 reply; 38+ messages in thread
From: Szabolcs Nagy @ 2017-02-09 11:40 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Steve Ellcey, Adhemerval Zanella, Wilco Dijkstra
  Cc: nd, libc-alpha

On 09/02/17 11:04, Siddhesh Poyarekar wrote:
> On Thursday 09 February 2017 04:20 PM, Szabolcs Nagy wrote:
>> this is a trap into the kernel at every process startup
>>
>> since this is called very early (dynamic linking case
>> above, static linking case below) i wonder if there
>> could be a way for the user to request midr_el1==0
>> unconditionally (avoiding the overhead and making
>> sure the most generic implementation is used)
> 
> Well you could use tunables to avoid it, but then if a single trap is a
> problem for you then the tunables infra is going to be just as expensive.
> 
> Why is a single trap at startup such a concern though?

ok, it is probably not worth worrying about
(should be around 0.1% of the minimal startup time now)

but doing it just to control ifunc selection is still
useful (at least for development)

(if eventually there will be widespread use of ifunc
based function multi-versioning then the trap will
not be per process startup but per module load which
is a bit more relevant, but also not something we can
control from the libc)

>> is there something like that on other targets?
> 
> H.J. Lu had a patch to override IFUNCs using (what will now be) a
> tunable but that patch did not make progress.  I hope it will now since
> I too am interested in overriding IFUNC selection using tunables.  But
> then this would be an orthogonal discussion to avoiding the trap since
> the goals of both are different.

i see.


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-09 11:40               ` Szabolcs Nagy
@ 2017-02-09 11:53                 ` Siddhesh Poyarekar
  0 siblings, 0 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-02-09 11:53 UTC (permalink / raw)
  To: Szabolcs Nagy, Steve Ellcey, Adhemerval Zanella, Wilco Dijkstra
  Cc: nd, libc-alpha

On Thursday 09 February 2017 05:10 PM, Szabolcs Nagy wrote:
> ok, it is probably not worth worrying about
> (should be around 0.1% of the minimal startup time now)
> 
> but doing it just to control ifunc selection is still
> useful (at least for development)

A simple tunable that disables CPU selection would be nice:

glibc.tune.cpu = default|thunderx|a57

However then you would have to think harder about positioning the cpu
structure initialization in static binaries to have them be controlled
by tunables and not just blindly put them right at the top of
__libc_start as the easy way out.  It is something that should be done
anyway.
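
A tunable like the one proposed would map a name to the MIDR value that the IFUNC resolvers see, so a forced choice skips the MIDR_EL1 read entirely. The sketch below is only an illustration. The "glibc.tune.cpu" name comes from the proposal above, but tune_cpu_to_midr and the example MIDR constants are hypothetical and are not an existing glibc tunables API:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative MIDR values built from the field layout (hypothetical).  */
#define MIDR_THUNDERX_EXAMPLE 0x430f0a10u  /* implementer 'C', part 0x0a1 */
#define MIDR_A57_EXAMPLE      0x410fd070u  /* implementer 'A', part 0xd07 */

/* Map a glibc.tune.cpu value to the MIDR the resolvers should use.
   Returns 1 and stores *midr for a recognized name.  Returns 0 otherwise,
   in which case the caller would read MIDR_EL1 as usual.  */
int
tune_cpu_to_midr (const char *value, uint64_t *midr)
{
  static const struct { const char *name; uint64_t midr; } table[] = {
    { "default",  0 },                      /* force the generic routines */
    { "thunderx", MIDR_THUNDERX_EXAMPLE },
    { "a57",      MIDR_A57_EXAMPLE },
  };
  for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
    if (strcmp (value, table[i].name) == 0)
      {
        *midr = table[i].midr;
        return 1;
      }
  return 0;
}
```

For this to work, the lookup has to run after the tunables are parsed and before any IRELATIVE relocation is applied. That is the positioning concern raised above.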

However, couldn't we do that as an add-on to this patch?  I'd really
like this framework to go in early and be exercised a bit because I am
interested in pushing (in the coming weeks hopefully) some optimal
string routines for aarch64.
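
String routines built on this framework would use the recorded midr_el1 in an IFUNC resolver to choose a variant. The sketch below shows only that selection logic. memcpy_generic and memcpy_thunderx are hypothetical placeholders that both defer to memcpy, whereas real variants would be separate assembly routines:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MIDR_IMPLEMENTOR(midr) (((midr) >> 24) & 0xffu)
#define MIDR_PARTNUM(midr)     (((midr) >> 4) & 0xfffu)
#define IS_THUNDERX(midr) \
  (MIDR_IMPLEMENTOR (midr) == 'C' && MIDR_PARTNUM (midr) == 0x0a1)

typedef void *memcpy_fn (void *, const void *, size_t);

/* Hypothetical variants; both just call memcpy in this sketch.  */
static void *
memcpy_generic (void *d, const void *s, size_t n)
{
  return memcpy (d, s, n);
}

static void *
memcpy_thunderx (void *d, const void *s, size_t n)
{
  return memcpy (d, s, n);
}

/* The decision a resolver would make from the cached midr_el1.  A value
   of 0 (no HWCAP_CPUID, or a forced default) selects the generic code.  */
memcpy_fn *
select_memcpy (uint64_t midr)
{
  return IS_THUNDERX (midr) ? memcpy_thunderx : memcpy_generic;
}
```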

> (if eventually there will be widespread use of ifunc
> based function multi-versioning then the trap will
> not be per process startup but per module load which
> is a bit more relevant, but also not something we can
> control from the libc)

Right we cannot control that.

Siddhesh


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-09 10:51           ` Szabolcs Nagy
  2017-02-09 11:04             ` Siddhesh Poyarekar
@ 2017-02-09 15:54             ` Andrew Pinski
  2017-02-10  0:55               ` Steve Ellcey
  1 sibling, 1 reply; 38+ messages in thread
From: Andrew Pinski @ 2017-02-09 15:54 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Steve Ellcey, Siddhesh Poyarekar, Adhemerval Zanella,
	Wilco Dijkstra, nd, libc-alpha

On Thu, Feb 9, 2017 at 2:50 AM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> On 09/02/17 00:02, Steve Ellcey wrote:
>> +static inline void __attribute__ ((unused))
>> +dl_platform_init (void)
>> +{
>> +  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
>> +    /* Avoid an empty string which would disturb us.  */
>> +    GLRO(dl_platform) = NULL;
>> +
>> +#ifdef SHARED
>> +  /* init_cpu_features has been called early from __libc_start_main in
>> +     static executable.  */
>> +  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
>> +#endif
>> +}
> ...
>> +static inline void
>> +init_cpu_features (struct cpu_features *cpu_features)
>> +{
>> +  if (GLRO(dl_hwcap) & HWCAP_CPUID)
>> +    {
>> +      register uint64_t id = 0;
>> +      asm volatile ("mrs %0, midr_el1" : "=r"(id));
>> +      cpu_features->midr_el1 = id;
>
> this is a trap into the kernel at every process startup
>
> since this is called very early (dynamic linking case
> above, static linking case below) i wonder if there
> could be a way for the user to request midr_el1==0
> unconditionally (avoiding the overhead and making
> sure the most generic implementation is used)

Well the easy way to do this would be use LD_HWCAP_MASK and mask off
the HWCAP_CPUID bit.

>
> is there something like that on other targets?

Some targets like PowerPC use the hwcap to say which processor they
are being run on so you use the same method as above.

Thanks,
Andrew

>
>> +    }
>> +  else
>> +    {
>> +      cpu_features->midr_el1 = 0;
>> +    }
>> +}
> ...
>> +#ifdef SHARED
>> +# include <csu/libc-start.c>
>> +# else
>> +/* The main work is done in the generic function.  */
>> +# define LIBC_START_DISABLE_INLINE
>> +# define LIBC_START_MAIN generic_start_main
>> +# include <csu/libc-start.c>
>> +# include <cpu-features.c>
>> +
>> +extern struct cpu_features _dl_aarch64_cpu_features;
>> +
>> +int
>> +__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
>> +                int argc, char **argv,
>> +                __typeof (main) init,
>> +                void (*fini) (void),
>> +                void (*rtld_fini) (void), void *stack_end)
>> +{
>> +  init_cpu_features (&_dl_aarch64_cpu_features);
>> +  return generic_start_main (main, argc, argv, init, fini, rtld_fini,
>> +                          stack_end);
>> +}
>> +#endif
>>
>
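
The LD_HWCAP_MASK suggestion amounts to requiring that HWCAP_CPUID survive both the kernel's hwcap word and the user-controlled mask. A minimal model of that check follows. want_midr_read is a hypothetical name, and the bit value mirrors the patch's fallback definition:

```c
#ifndef HWCAP_CPUID
# define HWCAP_CPUID (1UL << 11)
#endif

/* Nonzero if MIDR_EL1 should be read: the kernel must advertise
   HWCAP_CPUID and the mask must not have cleared it.  */
int
want_midr_read (unsigned long hwcap, unsigned long hwcap_mask)
{
  return (hwcap & hwcap_mask & HWCAP_CPUID) != 0;
}
```

This only helps if the mask has already been parsed from the environment when the check runs. If the mask defaults to 0 (as with an empty HWCAP_IMPORTANT), the check is always false.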


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-09 15:54             ` Andrew Pinski
@ 2017-02-10  0:55               ` Steve Ellcey
  2017-02-23  0:27                 ` Steve Ellcey
  0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-02-10  0:55 UTC (permalink / raw)
  To: Andrew Pinski, Szabolcs Nagy
  Cc: Siddhesh Poyarekar, Adhemerval Zanella, Wilco Dijkstra, nd, libc-alpha

On Thu, 2017-02-09 at 07:54 -0800, Andrew Pinski wrote:
> On Thu, Feb 9, 2017 at 2:50 AM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> > since this is called very early (dynamic linking case
> > above, static linking case below) i wonder if there
> > could be a way for the user to request midr_el1==0
> > unconditionally (avoiding the overhead and making
> > sure the most generic implementation is used)
> Well the easy way to do this would be use LD_HWCAP_MASK and mask off
> the HWCAP_CPUID bit.
> 
> > is there something like that on other targets?
> Some targets like PowerPC use the hwcap to say which processor they
> are being run on so you use the same method as above.
> 
> Thanks,
> Andrew

Do you know if LD_HWCAP_MASK actually works on PowerPC to do this?  I
see where PowerPC is reading GLRO(dl_hwcap) but I don't see them
reading GLRO(dl_hwcap_mask).  I don't think that the generic parts of
libc apply LD_HWCAP_MASK to HWCAP automatically but maybe I missed it
somewhere.

I changed init_cpu_features in my patch from:

	if (GLRO(dl_hwcap) & HWCAP_CPUID)
to
	if (GLRO(dl_hwcap) & GLRO(dl_hwcap_mask) & HWCAP_CPUID)

to see what would happen.  Initially this returned false because
we were using the default dl-procinfo.h and HWCAP_IMPORTANT (used
to initialize dl_hwcap_mask) was 0.  I made a copy of dl-procinfo.h
and set HWCAP_IMPORTANT to HWCAP_CPUID and I got true here and the
thunderx ifuncs were run.

But if I ran things after setting LD_HWCAP_MASK to 0 it didn't seem to
have any effect and I still ran thunderx ifuncs.  I am not sure but it
seemed like this code in init_cpu_features may have been getting run
before LD_HWCAP_MASK was getting read and before GLRO(dl_hwcap_mask)
was reset from its initial value.
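
The suspected ordering problem can be reproduced with a toy model. If the feature check runs before the environment is processed, LD_HWCAP_MASK=0 cannot change the outcome. In glibc of this period, dl_main (where rtld's process_envvars reads LD_HWCAP_MASK) appears to be called near the end of _dl_sysdep_start, after the early start-up hooks. Every name below is hypothetical, and the MIDR value is only a marker meaning "MIDR was read":

```c
#include <stdint.h>
#include <stdlib.h>

#ifndef HWCAP_CPUID
# define HWCAP_CPUID (1UL << 11)
#endif

/* Toy model of the loader's start-up state.  */
struct loader_state
{
  unsigned long hwcap;       /* from AT_HWCAP */
  unsigned long hwcap_mask;  /* default, later overridden by LD_HWCAP_MASK */
  uint64_t midr;             /* what init_cpu_features would record */
};

static void
init_cpu_features_model (struct loader_state *s)
{
  s->midr = (s->hwcap & s->hwcap_mask & HWCAP_CPUID) ? 0x430f0a10u : 0;
}

static void
process_envvars_model (struct loader_state *s, const char *mask)
{
  if (mask != NULL)
    s->hwcap_mask = strtoul (mask, NULL, 0);
}

/* Run both steps in the chosen order and return the recorded MIDR.  */
uint64_t
startup (int features_first, const char *ld_hwcap_mask)
{
  struct loader_state s = { HWCAP_CPUID, HWCAP_CPUID, 0 };
  if (features_first)
    {
      init_cpu_features_model (&s);
      process_envvars_model (&s, ld_hwcap_mask);
    }
  else
    {
      process_envvars_model (&s, ld_hwcap_mask);
      init_cpu_features_model (&s);
    }
  return s.midr;
}
```

With features_first set, a mask of "0" arrives too late and the MIDR is still recorded. Reversing the order makes the mask take effect.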

Steve Ellcey


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-10  0:55               ` Steve Ellcey
@ 2017-02-23  0:27                 ` Steve Ellcey
  2017-02-23 14:21                   ` Siddhesh Poyarekar
  0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-02-23  0:27 UTC (permalink / raw)
  To: Andrew Pinski, Szabolcs Nagy
  Cc: Siddhesh Poyarekar, Adhemerval Zanella, Wilco Dijkstra, nd, libc-alpha

[-- Attachment #1: Type: text/plain, Size: 1471 bytes --]

Here is a new version of the IFUNC functionality for aarch64.  It does
not include the memcpy changes to use it.  I tried to move the cpu
feature check to later in the start up process so that I could check
GLRO(dl_hwcap_mask) but it does not seem to be working.  I was
wondering if anyone could look at this and see if I am doing something
wrong.

If I understand the code correctly, for a dynamically linked program,
dl_main is going to get called before dl_sysdep_start and dl_main is
where environment variables are processed and where dl_hwcap_mask
should be getting set.  But when I check dl_hwcap_mask in
init_cpu_features (called from dl_sysdep_start), it does not seem to
have changed from its original value that is set in the new
dl-procinfo.h header file.

Any ideas on why this isn't working?

Steve Ellcey
sellcey@cavium.com


2017-02-22  Steve Ellcey  <sellcey@caviumnetworks.com>
	    Adhemerval Zanella  <adhemerval.zanella@linaro.org>

	* csu/libc-start.c (LIBC_START_MAIN): Use INIT_CPU_FEATURES.
	* elf/dl-sysdep.c (_dl_sysdep_start): Likewise.
	* sysdeps/aarch64/dl-procinfo.h: New file.
	* sysdeps/aarch64/ldsodefs.h: Include cpu-features.h.
	* sysdeps/unix/sysv/linux/aarch64/Makefile (sysdep_routines):
	Add cpu-features.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.c: New file.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Likewise.
	* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Likewise.
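
The csu/libc-start.c and elf/dl-sysdep.c entries use an optional per-architecture hook: generic code invokes INIT_CPU_FEATURES only if the target's headers define it. The sketch below collapses both sides into one file to show the pattern. arch_init_cpu_features and generic_startup are hypothetical names:

```c
/* What an architecture header would provide (cf. cpu-features.h).  */
static int features_initialized;

static void
arch_init_cpu_features (void)
{
  features_initialized = 1;
}

#define INIT_CPU_FEATURES arch_init_cpu_features ()

/* Generic start-up code (cf. csu/libc-start.c).  */
void
generic_startup (void)
{
  /* ... __tunables_init would run here ...  */
#ifdef INIT_CPU_FEATURES
  INIT_CPU_FEATURES;  /* compiled out on targets that do not define it */
#endif
  /* ... IRELATIVE relocations, whose resolvers read the features ...  */
}
```

The hook's position matters: in the patch it sits after __tunables_init and before apply_irel, so resolvers see initialized data.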


[-- Attachment #2: ifunc.patch --]
[-- Type: text/x-patch, Size: 9536 bytes --]

diff --git a/csu/libc-start.c b/csu/libc-start.c
index 9a56dcb..ec19466 100644
--- a/csu/libc-start.c
+++ b/csu/libc-start.c
@@ -182,6 +182,10 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
 
   __tunables_init (__environ);
 
+#ifdef INIT_CPU_FEATURES
+  INIT_CPU_FEATURES;
+#endif
+
   /* Perform IREL{,A} relocations.  */
   apply_irel ();
 
diff --git a/elf/dl-sysdep.c b/elf/dl-sysdep.c
index 4053ff3..c963a1e 100644
--- a/elf/dl-sysdep.c
+++ b/elf/dl-sysdep.c
@@ -223,6 +223,10 @@ _dl_sysdep_start (void **start_argptr,
 
   __tunables_init (_environ);
 
+#ifdef INIT_CPU_FEATURES
+  INIT_CPU_FEATURES;
+#endif
+
 #ifdef DL_SYSDEP_INIT
   DL_SYSDEP_INIT;
 #endif
diff --git a/sysdeps/aarch64/dl-procinfo.h b/sysdeps/aarch64/dl-procinfo.h
index e69de29..0e0829f 100644
--- a/sysdeps/aarch64/dl-procinfo.h
+++ b/sysdeps/aarch64/dl-procinfo.h
@@ -0,0 +1,50 @@
+/* Stub version of processor capability information handling macros.
+   Copyright (C) 1998-2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Ulrich Drepper <drepper@cygnus.com>, 1998.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _DL_PROCINFO_H
+#define _DL_PROCINFO_H	1
+
+/* We cannot provide a general printing function.  */
+#define _dl_procinfo(type, word) -1
+
+/* There are no hardware capabilities defined.  */
+#define _dl_hwcap_string(idx) ""
+
+/* There are no different platforms defined.  */
+#define _dl_platform_string(idx) ""
+
+/* Needed here until this gets into kernel sources.  */
+#ifndef HWCAP_CPUID
+# define HWCAP_CPUID            (1 << 11)
+#endif
+
+/* By default there is no important hardware capability.  */
+#define HWCAP_IMPORTANT (HWCAP_CPUID)
+
+/* There're no platforms to filter out.  */
+#define _DL_HWCAP_PLATFORM 0
+
+/* We don't have any hardware capabilities.  */
+#define _DL_HWCAP_COUNT 0
+
+#define _dl_string_hwcap(str) (-1)
+
+#define _dl_string_platform(str) (-1)
+
+#endif /* dl-procinfo.h */
diff --git a/sysdeps/aarch64/ldsodefs.h b/sysdeps/aarch64/ldsodefs.h
index f277074..ba4ada3 100644
--- a/sysdeps/aarch64/ldsodefs.h
+++ b/sysdeps/aarch64/ldsodefs.h
@@ -20,6 +20,7 @@
 #define _AARCH64_LDSODEFS_H 1
 
 #include <elf.h>
+#include <cpu-features.h>
 
 struct La_aarch64_regs;
 struct La_aarch64_retval;
diff --git a/sysdeps/unix/sysv/linux/aarch64/Makefile b/sysdeps/unix/sysv/linux/aarch64/Makefile
index 6b4e620..d17dafe 100644
--- a/sysdeps/unix/sysv/linux/aarch64/Makefile
+++ b/sysdeps/unix/sysv/linux/aarch64/Makefile
@@ -1,5 +1,6 @@
 ifeq ($(subdir),csu)
 sysdep_routines      += __read_tp libc-__read_tp
+sysdep_routines      += cpu-features
 static-only-routines += __read_tp
 shared-only-routines += libc-__read_tp
 endif
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
index e69de29..867e1ca 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
@@ -0,0 +1,45 @@
+/* Initialize CPU feature data.  AArch64 version.
+   This file is part of the GNU C Library.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <cpu-features.h>
+#include <assert.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <ldsodefs.h>
+#include <exit-thread.h>
+#include <libc-internal.h>
+
+#ifndef HWCAP_CPUID
+# define HWCAP_CPUID		(1 << 11)
+#endif
+
+void
+init_cpu_features (struct cpu_features *cpu_features)
+{
+  if (HWCAP_CPUID & GLRO(dl_hwcap) & GLRO(dl_hwcap_mask))
+    {
+      register uint64_t id = 0;
+      asm volatile ("mrs %0, midr_el1" : "=r"(id));
+      cpu_features->midr_el1 = id;
+    }
+  else
+    {
+      cpu_features->midr_el1 = 0;
+    }
+}
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
index e69de29..0b2a51b 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
@@ -0,0 +1,53 @@
+/* Initialize CPU feature data.  AArch64 version.
+   This file is part of the GNU C Library.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _CPU_FEATURES_AARCH64_H
+#define _CPU_FEATURES_AARCH64_H
+
+#include <stdint.h>
+
+#define MIDR_PARTNUM_SHIFT	4
+#define MIDR_PARTNUM_MASK	(0xfff << MIDR_PARTNUM_SHIFT)
+#define MIDR_PARTNUM(midr)	\
+	(((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
+#define MIDR_ARCHITECTURE_SHIFT	16
+#define MIDR_ARCHITECTURE_MASK	(0xf << MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_ARCHITECTURE(midr)	\
+	(((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_VARIANT_SHIFT	20
+#define MIDR_VARIANT_MASK	(0xf << MIDR_VARIANT_SHIFT)
+#define MIDR_VARIANT(midr)	\
+	(((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
+#define MIDR_IMPLEMENTOR_SHIFT	24
+#define MIDR_IMPLEMENTOR_MASK	(0xffU << MIDR_IMPLEMENTOR_SHIFT)
+#define MIDR_IMPLEMENTOR(midr)	\
+	(((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
+
+#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR(midr) == 'C'	\
+			   && MIDR_PARTNUM(midr) == 0x0a1)
+
+struct cpu_features
+{
+  uint64_t midr_el1;
+};
+
+void init_cpu_features (struct cpu_features *cpu_features);
+
+#define INIT_CPU_FEATURES init_cpu_features(&GLRO(dl_aarch64_cpu_features))
+
+#endif /* _CPU_FEATURES_AARCH64_H  */
diff --git a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
index e69de29..438046a 100644
--- a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
+++ b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
@@ -0,0 +1,60 @@
+/* Data for AArch64 version of processor capability information.
+   Linux version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* If anything should be added here check whether the size of each string
+   is still ok with the given array size.
+
+   All the #ifdefs in the definitions are quite irritating but
+   necessary if we want to avoid duplicating the information.  There
+   are three different modes:
+
+   - PROCINFO_DECL is defined.  This means we are only interested in
+     declarations.
+
+   - PROCINFO_DECL is not defined:
+
+     + if SHARED is defined the file is included in an array
+       initializer.  The .element = { ... } syntax is needed.
+
+     + if SHARED is not defined a normal array initialization is
+       needed.
+  */
+
+#ifndef PROCINFO_CLASS
+# define PROCINFO_CLASS
+#endif
+
+#if !IS_IN (ldconfig)
+# if !defined PROCINFO_DECL && defined SHARED
+  ._dl_aarch64_cpu_features
+# else
+PROCINFO_CLASS struct cpu_features _dl_aarch64_cpu_features
+# endif
+# ifndef PROCINFO_DECL
+= { }
+# endif
+# if !defined SHARED || defined PROCINFO_DECL
+;
+# else
+,
+# endif
+#endif
+
+#undef PROCINFO_DECL
+#undef PROCINFO_CLASS


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-23  0:27                 ` Steve Ellcey
@ 2017-02-23 14:21                   ` Siddhesh Poyarekar
  2017-02-23 16:20                     ` Steve Ellcey
  0 siblings, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-02-23 14:21 UTC (permalink / raw)
  To: Steve Ellcey, Andrew Pinski, Szabolcs Nagy
  Cc: Adhemerval Zanella, Wilco Dijkstra, nd, libc-alpha

On Thursday 23 February 2017 05:57 AM, Steve Ellcey wrote:
> Here is a new version of the IFUNC functionality for aarch64.  It does
> not include the memcpy changes to use it.  I tried to move the cpu
> feature check to later in the start up process so that I could check
> GLRO(dl_hwcap_mask) but it does not seem to be working.  I was
> wondering if anyone could look at this and see if I am doing something
> wrong.
> 
> If I understand the code correctly, for a dynamically linked program,
> dl_main is going to get called before dl_sysdep_start and dl_main is
> where environment variables are processed and where dl_hwcap_mask
> should be getting set.  But when I check dl_hwcap_mask in
> init_cpu_features (called from dl_sysdep_start), it does not seem to
> have changed from its original value that is set in the new
> dl-procinfo.h header file.

dl_sysdep_start calls dl_main, so you've got the order wrong.  If you
want init_cpu_features to read dl_hwcap_mask then you'll have to move
the code to read LD_* environment variables earlier.  In fact they need
to go into tunables, something I've had in mind for this release.

Do you want me to move the envvars into tunables or would you like to do it?

Siddhesh

* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-23 14:21                   ` Siddhesh Poyarekar
@ 2017-02-23 16:20                     ` Steve Ellcey
  2017-02-23 16:34                       ` Siddhesh Poyarekar
  0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-02-23 16:20 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Andrew Pinski, Szabolcs Nagy
  Cc: Adhemerval Zanella, Wilco Dijkstra, nd, libc-alpha

On Thu, 2017-02-23 at 19:51 +0530, Siddhesh Poyarekar wrote:
> 
> dl_sysdep_start calls dl_main, so you've got the order wrong.  If you
> want init_cpu_features to read dl_hwcap_mask then you'll have to move
> the code to read LD_* environment variables earlier.  In fact they need
> to go into tunables, something I've had in mind for this release.
> 
> Do you want me to move the envvars into tunables or would you like to
> do it?
> 
> Siddhesh

If you want to move it, that would be great.

Steve Ellcey
sellcey@cavium.com

* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-23 16:20                     ` Steve Ellcey
@ 2017-02-23 16:34                       ` Siddhesh Poyarekar
  2017-02-23 16:42                         ` Steve Ellcey
  0 siblings, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-02-23 16:34 UTC (permalink / raw)
  To: Steve Ellcey, Andrew Pinski, Szabolcs Nagy
  Cc: Adhemerval Zanella, Wilco Dijkstra, nd, libc-alpha

On Thursday 23 February 2017 09:50 PM, Steve Ellcey wrote:
> If you want to move it, that would be great.

OK, I'll get started on it.  Meanwhile, it would be nice to get the
earlier patch in since it works well for everything except
dl_hwcap_mask.  Do the aarch64 machine maintainers have a strong opinion
on this?

Siddhesh

* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-23 16:34                       ` Siddhesh Poyarekar
@ 2017-02-23 16:42                         ` Steve Ellcey
  2017-02-23 16:53                           ` Siddhesh Poyarekar
  0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-02-23 16:42 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Andrew Pinski, Szabolcs Nagy
  Cc: Adhemerval Zanella, Wilco Dijkstra, nd, libc-alpha

On Thu, 2017-02-23 at 22:04 +0530, Siddhesh Poyarekar wrote:
> On Thursday 23 February 2017 09:50 PM, Steve Ellcey wrote:
> > 
> > If you want to move it, that would be great.
> OK, I'll get started on it.  Meanwhile, it would be nice to get the
> earlier patch in since it works well for everything except
> dl_hwcap_mask.  Do the aarch64 machine maintainers have a strong opinion
> on this?
> 
> Siddhesh

Just to be clear, the earlier patch would be this one, right?

https://sourceware.org/ml/libc-alpha/2017-02/msg00175.html

Steve Ellcey
sellcey@cavium.com

* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-23 16:42                         ` Steve Ellcey
@ 2017-02-23 16:53                           ` Siddhesh Poyarekar
  2017-03-01 18:48                             ` Steve Ellcey
  0 siblings, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-02-23 16:53 UTC (permalink / raw)
  To: Steve Ellcey, Andrew Pinski, Szabolcs Nagy
  Cc: Adhemerval Zanella, Wilco Dijkstra, nd, libc-alpha

On Thursday 23 February 2017 10:12 PM, Steve Ellcey wrote:
> Just to be clear, the earlier patch would be this one, right?
> 
> https://sourceware.org/ml/libc-alpha/2017-02/msg00175.html

Yes.  It does everything necessary to get multiarch enabled and working
correctly for kernels that support it.  The hwcap_mask feature is
something that can be added on top and I can commit to doing that within
this release.

Siddhesh

* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-02-23 16:53                           ` Siddhesh Poyarekar
@ 2017-03-01 18:48                             ` Steve Ellcey
  2017-03-14 18:46                               ` Szabolcs Nagy
  0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-03-01 18:48 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Andrew Pinski, Szabolcs Nagy
  Cc: Adhemerval Zanella, Wilco Dijkstra, nd, libc-alpha

On Thu, 2017-02-23 at 22:23 +0530, Siddhesh Poyarekar wrote:
> On Thursday 23 February 2017 10:12 PM, Steve Ellcey wrote:
> > 
> > Just to be clear, the earlier patch would be this one, right?
> > 
> > https://sourceware.org/ml/libc-alpha/2017-02/msg00175.html
> Yes.  It does everything necessary to get multiarch enabled and working
> correctly for kernels that support it.  The hwcap_mask feature is
> something that can be added on top and I can commit to doing that within
> this release.
> 
> Siddhesh

Ping.  Does anyone have any comments or objections to this patch
that enables IFUNC on aarch64?

Steve Ellcey

* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-03-01 18:48                             ` Steve Ellcey
@ 2017-03-14 18:46                               ` Szabolcs Nagy
  2017-03-14 18:51                                 ` Andrew Pinski
  2017-03-15 23:53                                 ` Steve Ellcey
  0 siblings, 2 replies; 38+ messages in thread
From: Szabolcs Nagy @ 2017-03-14 18:46 UTC (permalink / raw)
  To: Steve Ellcey, Siddhesh Poyarekar, Andrew Pinski
  Cc: nd, Adhemerval Zanella, Wilco Dijkstra, libc-alpha, Marcus Shawcroft

On 01/03/17 18:48, Steve Ellcey wrote:
> On Thu, 2017-02-23 at 22:23 +0530, Siddhesh Poyarekar wrote:
>> On Thursday 23 February 2017 10:12 PM, Steve Ellcey wrote:
>>>
>>> Just to be clear, the earlier patch would be this one, right?
>>>
>>> https://sourceware.org/ml/libc-alpha/2017-02/msg00175.html
>> Yes.  It does everything necessary to get multiarch enabled and working
>> correctly for kernels that support it.  The hwcap_mask feature is
>> something that can be added on top and I can commit to doing that within
>> this release.
>>
>> Siddhesh
> 
> Ping.  Does anyone have any comments or objections to this patch
> that enables IFUNC on aarch64?

the patch looks ok, with HWCAP in bits/hwcap.h instead of

+/* Needed here until this gets into kernel sources.  */
+#ifndef HWCAP_CPUID
+# define HWCAP_CPUID            (1 << 11)
+#endif

the hwcap value is not yet in linux v4.10, but already
allocated, if we are committed to this value then i
think it's better to only have it in one place.
you may need to include bits/hwcap.h in some files.

i was not sure if we should wait for this to be in a
linux release, but i guess the value won't change now.

an unrelated issue i was wondering about is how the
upcoming ilp32 ifunc resolver will receive the hwcap
argument: will it only see 32bit (unsigned long hwcap)
or 64 bits (with different ifunc resolver prototype)
and if there is something in this area that needs to
be changed before ifuncs are used.

* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-03-14 18:46                               ` Szabolcs Nagy
@ 2017-03-14 18:51                                 ` Andrew Pinski
  2017-03-15 23:53                                 ` Steve Ellcey
  1 sibling, 0 replies; 38+ messages in thread
From: Andrew Pinski @ 2017-03-14 18:51 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Steve Ellcey, Siddhesh Poyarekar, nd, Adhemerval Zanella,
	Wilco Dijkstra, libc-alpha, Marcus Shawcroft

On Tue, Mar 14, 2017 at 11:45 AM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> On 01/03/17 18:48, Steve Ellcey wrote:
>> On Thu, 2017-02-23 at 22:23 +0530, Siddhesh Poyarekar wrote:
>>> On Thursday 23 February 2017 10:12 PM, Steve Ellcey wrote:
>>>>
>>>> Just to be clear, the earlier patch would be this one, right?
>>>>
>>>> https://sourceware.org/ml/libc-alpha/2017-02/msg00175.html
>>> Yes.  It does everything necessary to get multiarch enabled and working
>>> correctly for kernels that support it.  The hwcap_mask feature is
>>> something that can be added on top and I can commit to doing that within
>>> this release.
>>>
>>> Siddhesh
>>
>> Ping.  Does anyone have any comments or objections to this patch
>> that enables IFUNC on aarch64?
>
> the patch looks ok, with HWCAP in bits/hwcap.h instead of
>
> +/* Needed here until this gets into kernel sources.  */
> +#ifndef HWCAP_CPUID
> +# define HWCAP_CPUID            (1 << 11)
> +#endif
>
> the hwcap value is not yet in linux v4.10, but already
> allocated, if we are committed to this value then i
> think it's better to only have it in one place.
> you may need to include bits/hwcap.h in some files.
>
> i was not sure if we should wait for this to be in a
> linux release, but i guess the value won't change now.

It is in 4.11-rc1 though.  Is that good enough?

>
> an unrelated issue i was wondering about is how the
> upcoming ilp32 ifunc resolver will receive the hwcap
> argument: will it only see 32bit (unsigned long hwcap)
> or 64 bits (with different ifunc resolver prototype)
> and if there is something in this area that needs to
> be changed before ifuncs are used.

Right now, hwcap does not use the full 64 bits.  But the kernel will
have to split it into hwcap and hwcap2 anyway when we hit that point.
Most likely we should have the ifunc resolver take a uint64_t now
rather than later.

Thanks,
Andrew Pinski

* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-03-14 18:46                               ` Szabolcs Nagy
  2017-03-14 18:51                                 ` Andrew Pinski
@ 2017-03-15 23:53                                 ` Steve Ellcey
  2017-03-22  5:38                                   ` Aarch64 machine maintainership (was: Add ifunc memcpy and memmove for aarch64) Siddhesh Poyarekar
  1 sibling, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-03-15 23:53 UTC (permalink / raw)
  To: Szabolcs Nagy, Siddhesh Poyarekar, Andrew Pinski
  Cc: nd, Adhemerval Zanella, Wilco Dijkstra, libc-alpha, Marcus Shawcroft

On Tue, 2017-03-14 at 18:45 +0000, Szabolcs Nagy wrote:
> 
> the hwcap value is not yet in linux v4.10, but already
> allocated, if we are committed to this value then i
> think it's better to only have it in one place.
> you may need to include bits/hwcap.h in some files.

I checked this patch in after moving the HWCAP_CPUID to bits/hwcap.h
and adding an include of sys/auxv.h to cpu-features.c to get the value.
When I added an include of bits/hwcap.h directly I got an error about
including auxv.h instead of hwcap.h.

Steve Ellcey
sellcey@cavium.com

* Aarch64 machine maintainership (was: Add ifunc memcpy and memmove for aarch64)
  2017-03-15 23:53                                 ` Steve Ellcey
@ 2017-03-22  5:38                                   ` Siddhesh Poyarekar
  2017-03-22 17:34                                     ` Joseph Myers
  0 siblings, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-03-22  5:38 UTC (permalink / raw)
  To: Steve Ellcey, Szabolcs Nagy, Andrew Pinski
  Cc: nd, Adhemerval Zanella, Wilco Dijkstra, libc-alpha, Marcus Shawcroft

On Thursday 16 March 2017 05:22 AM, Steve Ellcey wrote:
> I checked this patch in after moving the HWCAP_CPUID to bits/hwcap.h
> and adding an include of sys/auxv.h to cpu-features.c to get the value.
> When I added an include of bits/hwcap.h directly I got an error about
> including auxv.h instead of hwcap.h.

Technically you still needed to get an ack from Marcus who is the only
machine maintainer for aarch64.  Practically though, the patch is fine
and it appears that Marcus hasn't had the time to review the patch and
it doesn't make sense to wait too long for something that has had
all-round consensus.

That said, it would be nice to have one of two things happen going forward:

Either Marcus names one or more machine maintainers for aarch64 that
have the bandwidth to review and approve aarch64 patches for glibc.  I
would like to propose Adhemerval as an aarch64 machine maintainer too
since he has the necessary experience in glibc and also the involvement
in aarch64 work.

Alternatively, make aarch64 patch review open like x86.  This might be
the way forward given that aarch64 development interests are getting
increasingly diverse (probably more so than x86) and it may become quite
difficult for a single maintainer from ARM to gate everything that goes in.

Siddhesh

* Re: Aarch64 machine maintainership (was: Add ifunc memcpy and memmove for aarch64)
  2017-03-22  5:38                                   ` Aarch64 machine maintainership (was: Add ifunc memcpy and memmove for aarch64) Siddhesh Poyarekar
@ 2017-03-22 17:34                                     ` Joseph Myers
  2017-03-22 17:52                                       ` Aarch64 machine maintainership Siddhesh Poyarekar
  0 siblings, 1 reply; 38+ messages in thread
From: Joseph Myers @ 2017-03-22 17:34 UTC (permalink / raw)
  To: Siddhesh Poyarekar
  Cc: Steve Ellcey, Szabolcs Nagy, Andrew Pinski, nd,
	Adhemerval Zanella, Wilco Dijkstra, libc-alpha, Marcus Shawcroft

On Wed, 22 Mar 2017, Siddhesh Poyarekar wrote:

> Technically you still needed to get an ack from Marcus who is the only
> machine maintainer for aarch64.  Practically though, the patch is fine

No, machine patches are subject to consensus just like any other patches, 
and such consensus can be reached without needing a machine maintainer to 
comment (although one might hope they would review most substantial 
patches for their machine).

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: Aarch64 machine maintainership
  2017-03-22 17:34                                     ` Joseph Myers
@ 2017-03-22 17:52                                       ` Siddhesh Poyarekar
  0 siblings, 0 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-03-22 17:52 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Steve Ellcey, Szabolcs Nagy, Andrew Pinski, nd,
	Adhemerval Zanella, Wilco Dijkstra, libc-alpha, Marcus Shawcroft

On Wednesday 22 March 2017 11:04 PM, Joseph Myers wrote:
> No, machine patches are subject to consensus just like any other patches, 
> and such consensus can be reached without needing a machine maintainer to 
> comment (although one might hope they would review most substantial 
> patches for their machine).

That is not very clear from the Consensus wiki page, so I added the
following to the machine maintainer section:

 * If you are not a maintainer for the machine you're proposing the
   change for, your patches are subject to consensus like any other
   patches and while review from a machine maintainer may be ideal, it
   is not strictly necessary for the patch to be accepted.

Siddhesh

* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-01-25 17:34       ` Steve Ellcey
  2017-02-06 20:54         ` Adhemerval Zanella
@ 2017-02-07  6:47         ` Siddhesh Poyarekar
  1 sibling, 0 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-02-07  6:47 UTC (permalink / raw)
  To: Steve Ellcey, Adhemerval Zanella, libc-alpha

On Wednesday 25 January 2017 11:04 PM, Steve Ellcey wrote:
> Here is a new version of the aarch64 ifunc patch with the cpu-features
> style of initialization on startup.  Adhemerval, since I took some code
> from your branch I added your name to the ChangeLog.  In addition to
> doing the mrs instruction on startup the main difference in this patch
> from the last one is that it uses ifuncs in both the shared and archive
> libc libraries.
> 
> Steve Ellcey
> sellcey@cavium.com
> 
> 
> 2017-01-25  Steve Ellcey  <sellcey@caviumnetworks.com>
> 	    Adhemerval Zanella  <adhemerval.zanella@linaro.org>
> 
> 	* sysdeps/aarch64/dl-machine.h: Include cpu-features.c.
> 	(DL_PLATFORM_INIT): New define.
> 	(dl_platform_init): New function.
> 	* sysdeps/aarch64/ldsodefs.h: Include cpu-features.h.
> 	* sysdeps/aarch64/memcpy.S (MEMMOVE, MEMCPY): New macros.
> 	(memmove): Use MEMMOVE for name.
> 	(memcpy): Use MEMCPY for name.  Add loop with prefetching
> 	under USE_THUNDERX macro.
> 	* sysdeps/aarch64/multiarch/Makefile: New file.
> 	* sysdeps/aarch64/multiarch/ifunc-impl-list.c: Ditto.
> 	* sysdeps/aarch64/multiarch/init-arch.h: Ditto.
> 	* sysdeps/aarch64/multiarch/memcpy.c: Ditto.
> 	* sysdeps/aarch64/multiarch/memcpy_generic.S: Ditto.
> 	* sysdeps/aarch64/multiarch/memcpy_thunderx.S: Ditto.
> 	* sysdeps/aarch64/multiarch/memmove.c: Ditto.
> 	* sysdeps/unix/sysv/linux/aarch64/cpu-features.c: Ditto.
> 	* sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Ditto.
> 	* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Ditto.
> 	* sysdeps/unix/sysv/linux/aarch64/libc-start.c: Ditto.
> 
> 
> ifunc.patch
> 
> 
> diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
> index 84b8aec..15d79a6 100644
> --- a/sysdeps/aarch64/dl-machine.h
> +++ b/sysdeps/aarch64/dl-machine.h
> @@ -25,6 +25,7 @@
>  #include <tls.h>
>  #include <dl-tlsdesc.h>
>  #include <dl-irel.h>
> +#include <cpu-features.c>
>  
>  /* Return nonzero iff ELF header is compatible with the running host.  */
>  static inline int __attribute__ ((unused))
> @@ -225,6 +226,23 @@ _dl_start_user:								\n\
>  #define ELF_MACHINE_NO_REL 1
>  #define ELF_MACHINE_NO_RELA 0
>  
> +#define DL_PLATFORM_INIT dl_platform_init ()
> +
> +static inline void __attribute__ ((unused))
> +dl_platform_init (void)
> +{
> +  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
> +    /* Avoid an empty string which would disturb us.  */
> +    GLRO(dl_platform) = NULL;
> +
> +#ifdef SHARED
> +  /* init_cpu_features has been called early from __libc_start_main in
> +     static executable.  */
> +  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
> +#endif
> +}
> +
> +
>  static inline ElfW(Addr)
>  elf_machine_fixup_plt (struct link_map *map, lookup_t t,
>  		       const ElfW(Rela) *reloc,
> diff --git a/sysdeps/aarch64/ldsodefs.h b/sysdeps/aarch64/ldsodefs.h
> index f277074..ba4ada3 100644
> --- a/sysdeps/aarch64/ldsodefs.h
> +++ b/sysdeps/aarch64/ldsodefs.h
> @@ -20,6 +20,7 @@
>  #define _AARCH64_LDSODEFS_H 1
>  
>  #include <elf.h>
> +#include <cpu-features.h>
>  
>  struct La_aarch64_regs;
>  struct La_aarch64_retval;
> diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S
> index 29af8b1..74444b4 100644
> --- a/sysdeps/aarch64/memcpy.S
> +++ b/sysdeps/aarch64/memcpy.S
> @@ -59,7 +59,14 @@
>     Overlapping large forward memmoves use a loop that copies backwards.
>  */
>  
> -ENTRY_ALIGN (memmove, 6)
> +#ifndef MEMMOVE
> +#  define MEMMOVE memmove
> +#endif
> +#ifndef MEMCPY
> +#  define MEMCPY memcpy
> +#endif
> +
> +ENTRY_ALIGN (MEMMOVE, 6)
>  
>  	DELOUSE (0)
>  	DELOUSE (1)
> @@ -71,9 +78,9 @@ ENTRY_ALIGN (memmove, 6)
>  	b.lo	L(move_long)
>  
>  	/* Common case falls through into memcpy.  */
> -END (memmove)
> -libc_hidden_builtin_def (memmove)
> -ENTRY (memcpy)
> +END (MEMMOVE)
> +libc_hidden_builtin_def (MEMMOVE)
> +ENTRY (MEMCPY)
>  
>  	DELOUSE (0)
>  	DELOUSE (1)
> @@ -158,10 +165,22 @@ L(copy96):
>  
>  	.p2align 4
>  L(copy_long):
> +
> +#ifdef USE_THUNDERX
> +
> +	/* On thunderx, large memcpys are helped by software prefetching.
> +	   This loop is identical to the one below it but with prefetching
> +	   instructions included.  For copies smaller than 32768 bytes,
> +	   the prefetching does not help and slows the code down, so we only
> +	   use the prefetching loop for the largest memcpys.  */

I think it would be cleaner to put the full generic and thunderx
implementations in separate files instead of trying to do this macro
dance because it keeps micro-architecture details separate.  Assembly
code is hard to maintain as it is without adding conditional compilation
using macros.

I also second Adhemerval's suggestion to separate the patch to add the
framework from the one to add the thunderx ifunc.  It makes for easier
cherry picking and git-blaming.

Siddhesh

> +
> +	cmp	count, #32768
> +	b.lo	L(copy_long_without_prefetch)
>  	and	tmp1, dstin, 15
>  	bic	dst, dstin, 15
>  	ldp	D_l, D_h, [src]
>  	sub	src, src, tmp1
> +	prfm	pldl1strm, [src, 384]
>  	add	count, count, tmp1	/* Count is now 16 too large.  */
>  	ldp	A_l, A_h, [src, 16]
>  	stp	D_l, D_h, [dstin]
> @@ -169,7 +188,10 @@ L(copy_long):
>  	ldp	C_l, C_h, [src, 48]
>  	ldp	D_l, D_h, [src, 64]!
>  	subs	count, count, 128 + 16	/* Test and readjust count.  */
> -	b.ls	2f
> +
> +L(prefetch_loop64):
> +	tbz	src, #6, 1f
> +	prfm	pldl1strm, [src, 512]
>  1:
>  	stp	A_l, A_h, [dst, 16]
>  	ldp	A_l, A_h, [src, 16]
> @@ -180,12 +202,40 @@ L(copy_long):
>  	stp	D_l, D_h, [dst, 64]!
>  	ldp	D_l, D_h, [src, 64]!
>  	subs	count, count, 64
> -	b.hi	1b
> +	b.hi	L(prefetch_loop64)
> +	b	L(last64)
> +
> +L(copy_long_without_prefetch):
> +#endif
> +
> +	and	tmp1, dstin, 15
> +	bic	dst, dstin, 15
> +	ldp	D_l, D_h, [src]
> +	sub	src, src, tmp1
> +	add	count, count, tmp1	/* Count is now 16 too large.  */
> +	ldp	A_l, A_h, [src, 16]
> +	stp	D_l, D_h, [dstin]
> +	ldp	B_l, B_h, [src, 32]
> +	ldp	C_l, C_h, [src, 48]
> +	ldp	D_l, D_h, [src, 64]!
> +	subs	count, count, 128 + 16	/* Test and readjust count.  */
> +	b.ls	L(last64)
> +L(loop64):
> +	stp	A_l, A_h, [dst, 16]
> +	ldp	A_l, A_h, [src, 16]
> +	stp	B_l, B_h, [dst, 32]
> +	ldp	B_l, B_h, [src, 32]
> +	stp	C_l, C_h, [dst, 48]
> +	ldp	C_l, C_h, [src, 48]
> +	stp	D_l, D_h, [dst, 64]!
> +	ldp	D_l, D_h, [src, 64]!
> +	subs	count, count, 64
> +	b.hi	L(loop64)
>  
>  	/* Write the last full set of 64 bytes.  The remainder is at most 64
>  	   bytes, so it is safe to always copy 64 bytes from the end even if
>  	   there is just 1 byte left.  */
> -2:
> +L(last64):
>  	ldp	E_l, E_h, [srcend, -64]
>  	stp	A_l, A_h, [dst, 16]
>  	ldp	A_l, A_h, [srcend, -48]
> @@ -256,5 +306,5 @@ L(move_long):
>  	stp	C_l, C_h, [dstin]
>  3:	ret
>  
> -END (memcpy)
> -libc_hidden_builtin_def (memcpy)
> +END (MEMCPY)
> +libc_hidden_builtin_def (MEMCPY)
> diff --git a/sysdeps/aarch64/multiarch/Makefile b/sysdeps/aarch64/multiarch/Makefile
> index e69de29..78d52c7 100644
> --- a/sysdeps/aarch64/multiarch/Makefile
> +++ b/sysdeps/aarch64/multiarch/Makefile
> @@ -0,0 +1,3 @@
> +ifeq ($(subdir),string)
> +sysdep_routines += memcpy_generic memcpy_thunderx
> +endif
> diff --git a/sysdeps/aarch64/multiarch/ifunc-impl-list.c b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
> index e69de29..c4f23df 100644
> --- a/sysdeps/aarch64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
> @@ -0,0 +1,51 @@
> +/* Enumerate available IFUNC implementations of a function.  AARCH64 version.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <assert.h>
> +#include <string.h>
> +#include <wchar.h>
> +#include <ldsodefs.h>
> +#include <ifunc-impl-list.h>
> +#include <init-arch.h>
> +#include <stdio.h>
> +
> +/* Maximum number of IFUNC implementations.  */
> +#define MAX_IFUNC	2
> +
> +size_t
> +__libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> +			size_t max)
> +{
> +  assert (max >= MAX_IFUNC);
> +
> +  size_t i = 0;
> +
> +  INIT_ARCH ();
> +
> +  /* Support sysdeps/aarch64/multiarch/memcpy.c and memmove.c.  */
> +  IFUNC_IMPL (i, name, memcpy,
> +	      IFUNC_IMPL_ADD (array, i, memcpy, IS_THUNDERX (midr),
> +			      __memcpy_thunderx)
> +	      IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic))
> +  IFUNC_IMPL (i, name, memmove,
> +	      IFUNC_IMPL_ADD (array, i, memmove, IS_THUNDERX (midr),
> +			      __memmove_thunderx)
> +	      IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_generic))
> +
> +  return i;
> +}
> diff --git a/sysdeps/aarch64/multiarch/init-arch.h b/sysdeps/aarch64/multiarch/init-arch.h
> index e69de29..eafbf77 100644
> --- a/sysdeps/aarch64/multiarch/init-arch.h
> +++ b/sysdeps/aarch64/multiarch/init-arch.h
> @@ -0,0 +1,22 @@
> +/* This file is part of the GNU C Library.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <ldsodefs.h>
> +
> +#define INIT_ARCH()						\
> +  uint64_t __attribute__((unused)) midr = 			\
> +    GLRO(dl_aarch64_cpu_features).midr_el1;
> diff --git a/sysdeps/aarch64/multiarch/memcpy.c b/sysdeps/aarch64/multiarch/memcpy.c
> index e69de29..4e3f251 100644
> --- a/sysdeps/aarch64/multiarch/memcpy.c
> +++ b/sysdeps/aarch64/multiarch/memcpy.c
> @@ -0,0 +1,39 @@
> +/* Multiple versions of memcpy. AARCH64 version.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* Define multiple versions only for the definition in libc.  */
> +
> +#if IS_IN (libc)
> +/* Redefine memcpy so that the compiler won't complain about the type
> +   mismatch with the IFUNC selector in strong_alias, below.  */
> +# undef memcpy
> +# define memcpy __redirect_memcpy
> +# include <string.h>
> +# include <init-arch.h>
> +
> +extern __typeof (__redirect_memcpy) __libc_memcpy;
> +
> +extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
> +extern __typeof (__redirect_memcpy) __memcpy_thunderx attribute_hidden;
> +
> +libc_ifunc (__libc_memcpy,
> +            IS_THUNDERX (midr) ? __memcpy_thunderx : __memcpy_generic);
> +
> +#undef memcpy
> +strong_alias (__libc_memcpy, memcpy);
> +#endif
> diff --git a/sysdeps/aarch64/multiarch/memcpy_generic.S b/sysdeps/aarch64/multiarch/memcpy_generic.S
> index e69de29..50e1a1c 100644
> --- a/sysdeps/aarch64/multiarch/memcpy_generic.S
> +++ b/sysdeps/aarch64/multiarch/memcpy_generic.S
> @@ -0,0 +1,42 @@
> +/* A Generic Optimized memcpy implementation for AARCH64.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* The actual memcpy and memmove code is in ../memcpy.S.  If we are
> +   building libc this file defines __memcpy_generic and __memmove_generic.
> +   Otherwise the include of ../memcpy.S will define the normal __memcpy
> +   and __memmove entry points.  */
> +
> +#include <sysdep.h>
> +
> +#if IS_IN (libc)
> +
> +#define MEMCPY __memcpy_generic
> +#define MEMMOVE __memmove_generic
> +
> +/* Do not hide the generic versions of memcpy and memmove, we use them
> +   internally.  */
> +#undef libc_hidden_builtin_def
> +#define libc_hidden_builtin_def(name)
> +
> +/* It doesn't make sense to send libc-internal memcpy calls through a PLT. */
> +	.globl __GI_memcpy; __GI_memcpy = __memcpy_generic
> +	.globl __GI_memmove; __GI_memmove = __memmove_generic
> +
> +#endif
> +
> +#include "../memcpy.S"
> diff --git a/sysdeps/aarch64/multiarch/memcpy_thunderx.S b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
> index e69de29..ee971c8 100644
> --- a/sysdeps/aarch64/multiarch/memcpy_thunderx.S
> +++ b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
> @@ -0,0 +1,32 @@
> +/* A Thunderx Optimized memcpy implementation for AARCH64.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* The actual thunderx optimized code is in ../memcpy.S under the USE_THUNDERX
> +   ifdef.  If we are not building libc then we do not build anything when
> +   compiling this file and __memcpy is defined by memcpy_generic.S.  */
> +
> +#include <sysdep.h>
> +
> +#if IS_IN (libc)
> +
> +#define MEMCPY __memcpy_thunderx
> +#define MEMMOVE __memmove_thunderx
> +#define USE_THUNDERX
> +#include "../memcpy.S"
> +
> +#endif
> diff --git a/sysdeps/aarch64/multiarch/memmove.c b/sysdeps/aarch64/multiarch/memmove.c
> index e69de29..8d7a146 100644
> --- a/sysdeps/aarch64/multiarch/memmove.c
> +++ b/sysdeps/aarch64/multiarch/memmove.c
> @@ -0,0 +1,39 @@
> +/* Multiple versions of memmove. AARCH64 version.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* Define multiple versions only for the definition in libc.  */
> +
> +#if IS_IN (libc)
> +/* Redefine memmove so that the compiler won't complain about the type
> +   mismatch with the IFUNC selector in strong_alias, below.  */
> +# undef memmove
> +# define memmove __redirect_memmove
> +# include <string.h>
> +# include <init-arch.h>
> +
> +extern __typeof (__redirect_memmove) __libc_memmove;
> +
> +extern __typeof (__redirect_memmove) __memmove_generic attribute_hidden;
> +extern __typeof (__redirect_memmove) __memmove_thunderx attribute_hidden;
> +
> +libc_ifunc (__libc_memmove,
> +            IS_THUNDERX (midr) ? __memmove_thunderx : __memmove_generic);
> +
> +#undef memmove
> +strong_alias (__libc_memmove, memmove);
> +#endif
> diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
> index e69de29..8e4b514 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
> +++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
> @@ -0,0 +1,38 @@
> +/* Initialize CPU feature data.  AArch64 version.
> +   This file is part of the GNU C Library.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <cpu-features.h>
> +
> +#ifndef HWCAP_CPUID
> +# define HWCAP_CPUID		(1 << 11)
> +#endif
> +
> +static inline void
> +init_cpu_features (struct cpu_features *cpu_features)
> +{
> +  if (GLRO(dl_hwcap) & HWCAP_CPUID)
> +    {
> +      register uint64_t id = 0;
> +      asm volatile ("mrs %0, midr_el1" : "=r"(id));
> +      cpu_features->midr_el1 = id;
> +    }
> +  else
> +    {
> +      cpu_features->midr_el1 = 0;
> +    }
> +}
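To check what a given machine reports, the same MIDR_EL1 read can be done from user space. The sketch below assumes a Linux/AArch64 system with glibc's `getauxval`. `read_midr_el1` is a hypothetical name. On any other target the function simply returns 0, which the resolvers in this patch treat as "not ThunderX".

```c
#include <assert.h>
#include <stdint.h>
#if defined(__aarch64__) && defined(__linux__)
# include <sys/auxv.h>
#endif

/* AArch64 Linux advertises user-space access to the ID registers with
   bit 11 of AT_HWCAP; this is the same fallback value the patch uses.  */
#ifndef HWCAP_CPUID
# define HWCAP_CPUID (1UL << 11)
#endif

/* Return MIDR_EL1, or 0 if the kernel does not set HWCAP_CPUID or we are
   not on AArch64 Linux (mirrors init_cpu_features above).  */
static uint64_t
read_midr_el1 (void)
{
#if defined(__aarch64__) && defined(__linux__)
  if (getauxval (AT_HWCAP) & HWCAP_CPUID)
    {
      uint64_t id;
      /* At EL0 this traps and the kernel emulates the register read.  */
      __asm__ volatile ("mrs %0, midr_el1" : "=r" (id));
      return id;
    }
#endif
  return 0;
}
```

On an ARMv8 core the returned value has 0xf in its architecture field (bits 19:16). A value of 0 means the information was not available.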
> diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
> index e69de29..c92b650 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
> +++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
> @@ -0,0 +1,49 @@
> +/* Initialize CPU feature data.  AArch64 version.
> +   This file is part of the GNU C Library.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _CPU_FEATURES_AARCH64_H
> +#define _CPU_FEATURES_AARCH64_H
> +
> +#include <stdint.h>
> +
> +#define MIDR_PARTNUM_SHIFT	4
> +#define MIDR_PARTNUM_MASK	(0xfff << MIDR_PARTNUM_SHIFT)
> +#define MIDR_PARTNUM(midr)	\
> +	(((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
> +#define MIDR_ARCHITECTURE_SHIFT	16
> +#define MIDR_ARCHITECTURE_MASK	(0xf << MIDR_ARCHITECTURE_SHIFT)
> +#define MIDR_ARCHITECTURE(midr)	\
> +	(((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
> +#define MIDR_VARIANT_SHIFT	20
> +#define MIDR_VARIANT_MASK	(0xf << MIDR_VARIANT_SHIFT)
> +#define MIDR_VARIANT(midr)	\
> +	(((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
> +#define MIDR_IMPLEMENTOR_SHIFT	24
> +#define MIDR_IMPLEMENTOR_MASK	(0xff << MIDR_IMPLEMENTOR_SHIFT)
> +#define MIDR_IMPLEMENTOR(midr)	\
> +	(((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
> +
> +#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR(midr) == 'C'	\
> +			   && MIDR_PARTNUM(midr) == 0x0a1)
> +
> +struct cpu_features
> +{
> +  uint64_t midr_el1;
> +};
> +
> +#endif /* _CPU_FEATURES_AARCH64_H  */
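The following standalone sketch shows how `IS_THUNDERX` decodes the register. It copies the shifts from `cpu-features.h` above. Unlike that header, it builds the masks from `UINT64_C` constants, because `0xff << 24` does not fit in a 32-bit signed `int`. `make_midr` is a hypothetical helper for composing test values.

```c
#include <assert.h>
#include <stdint.h>

#define MIDR_PARTNUM_SHIFT	4
#define MIDR_PARTNUM_MASK	(UINT64_C (0xfff) << MIDR_PARTNUM_SHIFT)
#define MIDR_PARTNUM(midr)	\
	(((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
#define MIDR_ARCHITECTURE_SHIFT	16
#define MIDR_ARCHITECTURE_MASK	(UINT64_C (0xf) << MIDR_ARCHITECTURE_SHIFT)
#define MIDR_ARCHITECTURE(midr)	\
	(((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
#define MIDR_VARIANT_SHIFT	20
#define MIDR_VARIANT_MASK	(UINT64_C (0xf) << MIDR_VARIANT_SHIFT)
#define MIDR_VARIANT(midr)	\
	(((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
#define MIDR_IMPLEMENTOR_SHIFT	24
#define MIDR_IMPLEMENTOR_MASK	(UINT64_C (0xff) << MIDR_IMPLEMENTOR_SHIFT)
#define MIDR_IMPLEMENTOR(midr)	\
	(((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)

/* Implementer 'C' (Cavium) with part number 0x0a1, as in the patch.  */
#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR (midr) == 'C'	\
			   && MIDR_PARTNUM (midr) == 0x0a1)

/* Hypothetical helper: pack the MIDR fields; revision is bits 3:0.  */
static uint64_t
make_midr (unsigned impl, unsigned variant, unsigned arch,
	   unsigned partnum, unsigned rev)
{
  return ((uint64_t) impl << MIDR_IMPLEMENTOR_SHIFT)
	 | ((uint64_t) variant << MIDR_VARIANT_SHIFT)
	 | ((uint64_t) arch << MIDR_ARCHITECTURE_SHIFT)
	 | ((uint64_t) partnum << MIDR_PARTNUM_SHIFT)
	 | rev;
}
```

For example, `make_midr ('C', 1, 0xf, 0x0a1, 1)` yields 0x431f0a11 and satisfies `IS_THUNDERX`. An implementer 'A' part such as 0xd07 does not. As far as I know, 0xd07 is the Cortex-A57.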
> diff --git a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
> index e69de29..438046a 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
> +++ b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
> @@ -0,0 +1,60 @@
> +/* Data for AArch64 version of processor capability information.
> +   Linux version.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* If anything should be added here check whether the size of each string
> +   is still ok with the given array size.
> +
> +   All the #ifdefs in the definitions are quite irritating but
> +   necessary if we want to avoid duplicating the information.  There
> +   are three different modes:
> +
> +   - PROCINFO_DECL is defined.  This means we are only interested in
> +     declarations.
> +
> +   - PROCINFO_DECL is not defined:
> +
> +     + if SHARED is defined the file is included in an array
> +       initializer.  The .element = { ... } syntax is needed.
> +
> +     + if SHARED is not defined a normal array initialization is
> +       needed.
> +  */
> +
> +#ifndef PROCINFO_CLASS
> +# define PROCINFO_CLASS
> +#endif
> +
> +#if !IS_IN (ldconfig)
> +# if !defined PROCINFO_DECL && defined SHARED
> +  ._dl_aarch64_cpu_features
> +# else
> +PROCINFO_CLASS struct cpu_features _dl_aarch64_cpu_features
> +# endif
> +# ifndef PROCINFO_DECL
> += { }
> +# endif
> +# if !defined SHARED || defined PROCINFO_DECL
> +;
> +# else
> +,
> +# endif
> +#endif
> +
> +#undef PROCINFO_DECL
> +#undef PROCINFO_CLASS
> diff --git a/sysdeps/unix/sysv/linux/aarch64/libc-start.c b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
> index e69de29..c98aff1 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/libc-start.c
> +++ b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
> @@ -0,0 +1,40 @@
> +/* Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifdef SHARED
> +# include <csu/libc-start.c>
> +# else
> +/* The main work is done in the generic function.  */
> +# define LIBC_START_DISABLE_INLINE
> +# define LIBC_START_MAIN generic_start_main
> +# include <csu/libc-start.c>
> +# include <cpu-features.c>
> +
> +extern struct cpu_features _dl_aarch64_cpu_features;
> +
> +int
> +__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
> +		   int argc, char **argv,
> +		   __typeof (main) init,
> +		   void (*fini) (void),
> +		   void (*rtld_fini) (void), void *stack_end)
> +{
> +  init_cpu_features (&_dl_aarch64_cpu_features);
> +  return generic_start_main (main, argc, argv, init, fini, rtld_fini,
> +			     stack_end);
> +}
> +#endif
> 


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-01-25 17:34       ` Steve Ellcey
@ 2017-02-06 20:54         ` Adhemerval Zanella
  2017-02-07  6:47         ` Siddhesh Poyarekar
  1 sibling, 0 replies; 38+ messages in thread
From: Adhemerval Zanella @ 2017-02-06 20:54 UTC (permalink / raw)
  To: Steve Ellcey, libc-alpha



On 25/01/2017 15:34, Steve Ellcey wrote:
> Here is a new version of the aarch64 ifunc patch with the cpu-features
> style of initialization on startup.  Adhemerval, since I took some code
> from your branch I added your name to the ChangeLog.  In addition to
> doing the mrs instruction on startup, the main difference in this patch
> from the last one is that it uses ifuncs in both the shared and archive
> libc libraries.
> 
> Steve Ellcey
> sellcey@cavium.com

Hi Steve,

I think it is better to split this patchset in two: one for the aarch64 multiarch
foundation and another for the thunderx memcpy implementation itself.

Besides that, I think the patch should be ok.

> diff --git a/sysdeps/aarch64/multiarch/init-arch.h b/sysdeps/aarch64/multiarch/init-arch.h
> index e69de29..eafbf77 100644
> --- a/sysdeps/aarch64/multiarch/init-arch.h
> +++ b/sysdeps/aarch64/multiarch/init-arch.h
> @@ -0,0 +1,22 @@
> +/* This file is part of the GNU C Library.

Missing a one-line description for this file.


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-01-24 14:09     ` Adhemerval Zanella
  2017-01-24 19:34       ` Steve Ellcey
@ 2017-01-25 17:34       ` Steve Ellcey
  2017-02-06 20:54         ` Adhemerval Zanella
  2017-02-07  6:47         ` Siddhesh Poyarekar
  1 sibling, 2 replies; 38+ messages in thread
From: Steve Ellcey @ 2017-01-25 17:34 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha

[-- Attachment #1: Type: text/plain, Size: 1503 bytes --]

Here is a new version of the aarch64 ifunc patch with the cpu-features
style of initialization on startup.  Adhemerval, since I took some code
from your branch I added your name to the ChangeLog.  In addition to
doing the mrs instruction on startup, the main difference in this patch
from the last one is that it uses ifuncs in both the shared and archive
libc libraries.

Steve Ellcey
sellcey@cavium.com


2017-01-25  Steve Ellcey  <sellcey@caviumnetworks.com>
	    Adhemerval Zanella  <adhemerval.zanella@linaro.org>

	* sysdeps/aarch64/dl-machine.h: Include cpu-features.c.
	(DL_PLATFORM_INIT): New define.
	(dl_platform_init): New function.
	* sysdeps/aarch64/ldsodefs.h: Include cpu-features.h.
	* sysdeps/aarch64/memcpy.S (MEMMOVE, MEMCPY): New macros.
	(memmove): Use MEMMOVE for name.
	(memcpy): Use MEMCPY for name.  Add loop with prefetching
	under USE_THUNDERX macro.
	* sysdeps/aarch64/multiarch/Makefile: New file.
	* sysdeps/aarch64/multiarch/ifunc-impl-list.c: Ditto.
	* sysdeps/aarch64/multiarch/init-arch.h: Ditto.
	* sysdeps/aarch64/multiarch/memcpy.c: Ditto.
	* sysdeps/aarch64/multiarch/memcpy_generic.S: Ditto.
	* sysdeps/aarch64/multiarch/memcpy_thunderx.S: Ditto.
	* sysdeps/aarch64/multiarch/memmove.c: Ditto.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.c: Ditto.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Ditto.
	* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Ditto.
	* sysdeps/unix/sysv/linux/aarch64/libc-start.c: Ditto.

[-- Attachment #2: ifunc.patch --]
[-- Type: text/x-patch, Size: 22322 bytes --]

diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
index 84b8aec..15d79a6 100644
--- a/sysdeps/aarch64/dl-machine.h
+++ b/sysdeps/aarch64/dl-machine.h
@@ -25,6 +25,7 @@
 #include <tls.h>
 #include <dl-tlsdesc.h>
 #include <dl-irel.h>
+#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -225,6 +226,23 @@ _dl_start_user:								\n\
 #define ELF_MACHINE_NO_REL 1
 #define ELF_MACHINE_NO_RELA 0
 
+#define DL_PLATFORM_INIT dl_platform_init ()
+
+static inline void __attribute__ ((unused))
+dl_platform_init (void)
+{
+  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
+    /* Avoid an empty string which would disturb us.  */
+    GLRO(dl_platform) = NULL;
+
+#ifdef SHARED
+  /* init_cpu_features has been called early from __libc_start_main in
+     static executables.  */
+  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
+#endif
+}
+
+
 static inline ElfW(Addr)
 elf_machine_fixup_plt (struct link_map *map, lookup_t t,
 		       const ElfW(Rela) *reloc,
diff --git a/sysdeps/aarch64/ldsodefs.h b/sysdeps/aarch64/ldsodefs.h
index f277074..ba4ada3 100644
--- a/sysdeps/aarch64/ldsodefs.h
+++ b/sysdeps/aarch64/ldsodefs.h
@@ -20,6 +20,7 @@
 #define _AARCH64_LDSODEFS_H 1
 
 #include <elf.h>
+#include <cpu-features.h>
 
 struct La_aarch64_regs;
 struct La_aarch64_retval;
diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S
index 29af8b1..74444b4 100644
--- a/sysdeps/aarch64/memcpy.S
+++ b/sysdeps/aarch64/memcpy.S
@@ -59,7 +59,14 @@
    Overlapping large forward memmoves use a loop that copies backwards.
 */
 
-ENTRY_ALIGN (memmove, 6)
+#ifndef MEMMOVE
+#  define MEMMOVE memmove
+#endif
+#ifndef MEMCPY
+#  define MEMCPY memcpy
+#endif
+
+ENTRY_ALIGN (MEMMOVE, 6)
 
 	DELOUSE (0)
 	DELOUSE (1)
@@ -71,9 +78,9 @@ ENTRY_ALIGN (memmove, 6)
 	b.lo	L(move_long)
 
 	/* Common case falls through into memcpy.  */
-END (memmove)
-libc_hidden_builtin_def (memmove)
-ENTRY (memcpy)
+END (MEMMOVE)
+libc_hidden_builtin_def (MEMMOVE)
+ENTRY (MEMCPY)
 
 	DELOUSE (0)
 	DELOUSE (1)
@@ -158,10 +165,22 @@ L(copy96):
 
 	.p2align 4
 L(copy_long):
+
+#ifdef USE_THUNDERX
+
+	/* On thunderx, large memcpys are helped by software prefetching.
+	   This loop is identical to the one below it but with prefetching
+	   instructions included.  For copies smaller than 32768 bytes,
+	   the prefetching does not help and slows the code down, so we only
+	   use the prefetching loop for the largest memcpys.  */
+
+	cmp	count, #32768
+	b.lo	L(copy_long_without_prefetch)
 	and	tmp1, dstin, 15
 	bic	dst, dstin, 15
 	ldp	D_l, D_h, [src]
 	sub	src, src, tmp1
+	prfm	pldl1strm, [src, 384]
 	add	count, count, tmp1	/* Count is now 16 too large.  */
 	ldp	A_l, A_h, [src, 16]
 	stp	D_l, D_h, [dstin]
@@ -169,7 +188,10 @@ L(copy_long):
 	ldp	C_l, C_h, [src, 48]
 	ldp	D_l, D_h, [src, 64]!
 	subs	count, count, 128 + 16	/* Test and readjust count.  */
-	b.ls	2f
+
+L(prefetch_loop64):
+	tbz	src, #6, 1f
+	prfm	pldl1strm, [src, 512]
 1:
 	stp	A_l, A_h, [dst, 16]
 	ldp	A_l, A_h, [src, 16]
@@ -180,12 +202,40 @@ L(copy_long):
 	stp	D_l, D_h, [dst, 64]!
 	ldp	D_l, D_h, [src, 64]!
 	subs	count, count, 64
-	b.hi	1b
+	b.hi	L(prefetch_loop64)
+	b	L(last64)
+
+L(copy_long_without_prefetch):
+#endif
+
+	and	tmp1, dstin, 15
+	bic	dst, dstin, 15
+	ldp	D_l, D_h, [src]
+	sub	src, src, tmp1
+	add	count, count, tmp1	/* Count is now 16 too large.  */
+	ldp	A_l, A_h, [src, 16]
+	stp	D_l, D_h, [dstin]
+	ldp	B_l, B_h, [src, 32]
+	ldp	C_l, C_h, [src, 48]
+	ldp	D_l, D_h, [src, 64]!
+	subs	count, count, 128 + 16	/* Test and readjust count.  */
+	b.ls	L(last64)
+L(loop64):
+	stp	A_l, A_h, [dst, 16]
+	ldp	A_l, A_h, [src, 16]
+	stp	B_l, B_h, [dst, 32]
+	ldp	B_l, B_h, [src, 32]
+	stp	C_l, C_h, [dst, 48]
+	ldp	C_l, C_h, [src, 48]
+	stp	D_l, D_h, [dst, 64]!
+	ldp	D_l, D_h, [src, 64]!
+	subs	count, count, 64
+	b.hi	L(loop64)
 
 	/* Write the last full set of 64 bytes.  The remainder is at most 64
 	   bytes, so it is safe to always copy 64 bytes from the end even if
 	   there is just 1 byte left.  */
-2:
+L(last64):
 	ldp	E_l, E_h, [srcend, -64]
 	stp	A_l, A_h, [dst, 16]
 	ldp	A_l, A_h, [srcend, -48]
@@ -256,5 +306,5 @@ L(move_long):
 	stp	C_l, C_h, [dstin]
 3:	ret
 
-END (memcpy)
-libc_hidden_builtin_def (memcpy)
+END (MEMCPY)
+libc_hidden_builtin_def (MEMCPY)
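The `USE_THUNDERX` path above keeps the normal 64-byte loop. For copies of at least 32768 bytes, it also issues a streaming prefetch 512 bytes ahead on every other iteration.

Below is a rough C model of that control flow. It assumes the GCC/Clang `__builtin_prefetch` builtin, with locality 0 used as the nearest analogue of `pldl1strm`. The function and macro names are hypothetical. The model also keys the every-other-block alternation on the loop offset rather than on bit 6 of the source address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PREFETCH_THRESHOLD 32768 /* smaller copies skip prefetching */
#define PREFETCH_DISTANCE  512   /* bytes ahead of the current source */
#define BLOCK              64    /* bytes copied per loop iteration */

/* Forward copy of N bytes modelled on L(copy_long).  */
static void
copy_long_sketch (unsigned char *dst, const unsigned char *src, size_t n)
{
  size_t i = 0;

  if (n >= PREFETCH_THRESHOLD)
    {
      for (; i + BLOCK <= n; i += BLOCK)
	{
	  /* One prefetch per 128 bytes, and only while the target
	     address is still inside the source buffer.  */
	  if ((i & BLOCK) != 0 && i + PREFETCH_DISTANCE < n)
	    __builtin_prefetch (src + i + PREFETCH_DISTANCE, 0, 0);
	  memcpy (dst + i, src + i, BLOCK);
	}
    }
  else
    {
      for (; i + BLOCK <= n; i += BLOCK)
	memcpy (dst + i, src + i, BLOCK);
    }

  /* The assembly finishes with an overlapping copy of the last 64 bytes
     (L(last64)); a plain remainder copy keeps this model simple.  */
  memcpy (dst + i, src + i, n - i);
}
```

The model only illustrates where the threshold and the prefetches sit in the loop. Whether they pay off is a question for benchmarks on the target core.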
diff --git a/sysdeps/aarch64/multiarch/Makefile b/sysdeps/aarch64/multiarch/Makefile
index e69de29..78d52c7 100644
--- a/sysdeps/aarch64/multiarch/Makefile
+++ b/sysdeps/aarch64/multiarch/Makefile
@@ -0,0 +1,3 @@
+ifeq ($(subdir),string)
+sysdep_routines += memcpy_generic memcpy_thunderx
+endif
diff --git a/sysdeps/aarch64/multiarch/ifunc-impl-list.c b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
index e69de29..c4f23df 100644
--- a/sysdeps/aarch64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
@@ -0,0 +1,51 @@
+/* Enumerate available IFUNC implementations of a function.  AARCH64 version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <assert.h>
+#include <string.h>
+#include <wchar.h>
+#include <ldsodefs.h>
+#include <ifunc-impl-list.h>
+#include <init-arch.h>
+#include <stdio.h>
+
+/* Maximum number of IFUNC implementations.  */
+#define MAX_IFUNC	2
+
+size_t
+__libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
+			size_t max)
+{
+  assert (max >= MAX_IFUNC);
+
+  size_t i = 0;
+
+  INIT_ARCH ();
+
+  /* Support sysdeps/aarch64/multiarch/memcpy.c and memmove.c.  */
+  IFUNC_IMPL (i, name, memcpy,
+	      IFUNC_IMPL_ADD (array, i, memcpy, IS_THUNDERX (midr),
+			      __memcpy_thunderx)
+	      IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic))
+  IFUNC_IMPL (i, name, memmove,
+	      IFUNC_IMPL_ADD (array, i, memmove, IS_THUNDERX (midr),
+			      __memmove_thunderx)
+	      IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_generic))
+
+  return i;
+}
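`__libc_ifunc_impl_list` reports every variant along with a usable flag. As I understand it, this lets glibc's string tests and benchmarks exercise each implementation the current CPU can run.

The sketch below models that contract with a hypothetical `struct impl_entry` and `impl_list_sketch`. It does not reproduce glibc's `IFUNC_IMPL`/`IFUNC_IMPL_ADD` macros. It only illustrates the list this patch produces: two entries per function, with the ThunderX entry usable only when the MIDR matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for glibc's struct libc_ifunc_impl.  */
struct impl_entry
{
  const char *name;
  int usable;
};

#define MAX_IMPLS 2

static int
is_thunderx (uint64_t midr)
{
  return ((midr >> 24) & 0xff) == 'C' && ((midr >> 4) & 0xfff) == 0x0a1;
}

/* Fill ARRAY with the variants of NAME; return how many were written,
   or 0 if NAME is not a multiarch function.  */
static size_t
impl_list_sketch (const char *name, uint64_t midr,
		  struct impl_entry *array, size_t max)
{
  assert (max >= MAX_IMPLS);
  size_t i = 0;

  if (strcmp (name, "memcpy") == 0)
    {
      array[i++] = (struct impl_entry) { "__memcpy_thunderx",
					 is_thunderx (midr) };
      array[i++] = (struct impl_entry) { "__memcpy_generic", 1 };
      return i;
    }
  if (strcmp (name, "memmove") == 0)
    {
      array[i++] = (struct impl_entry) { "__memmove_thunderx",
					 is_thunderx (midr) };
      array[i++] = (struct impl_entry) { "__memmove_generic", 1 };
      return i;
    }
  return i;
}
```

The generic entry is always usable, so a caller that skips unusable entries still has at least one implementation to test.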
diff --git a/sysdeps/aarch64/multiarch/init-arch.h b/sysdeps/aarch64/multiarch/init-arch.h
index e69de29..eafbf77 100644
--- a/sysdeps/aarch64/multiarch/init-arch.h
+++ b/sysdeps/aarch64/multiarch/init-arch.h
@@ -0,0 +1,22 @@
+/* This file is part of the GNU C Library.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <ldsodefs.h>
+
+#define INIT_ARCH()						\
+  uint64_t __attribute__((unused)) midr = 			\
+    GLRO(dl_aarch64_cpu_features).midr_el1;
diff --git a/sysdeps/aarch64/multiarch/memcpy.c b/sysdeps/aarch64/multiarch/memcpy.c
index e69de29..4e3f251 100644
--- a/sysdeps/aarch64/multiarch/memcpy.c
+++ b/sysdeps/aarch64/multiarch/memcpy.c
@@ -0,0 +1,39 @@
+/* Multiple versions of memcpy. AARCH64 version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for the definition in libc.  */
+
+#if IS_IN (libc)
+/* Redefine memcpy so that the compiler won't complain about the type
+   mismatch with the IFUNC selector in strong_alias, below.  */
+# undef memcpy
+# define memcpy __redirect_memcpy
+# include <string.h>
+# include <init-arch.h>
+
+extern __typeof (__redirect_memcpy) __libc_memcpy;
+
+extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
+extern __typeof (__redirect_memcpy) __memcpy_thunderx attribute_hidden;
+
+libc_ifunc (__libc_memcpy,
+            IS_THUNDERX (midr) ? __memcpy_thunderx : __memcpy_generic);
+
+#undef memcpy
+strong_alias (__libc_memcpy, memcpy);
+#endif
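The `libc_ifunc` call in `memcpy.c` amounts to a resolver that picks an implementation from `midr` once, when the symbol is relocated, rather than on every call.

Below is a portable sketch of that decision using an ordinary function pointer in place of a real IFUNC symbol. `memcpy_generic_sketch`, `memcpy_thunderx_sketch` and `select_memcpy` are hypothetical names. Both variants here just call `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void *(*memcpy_fn) (void *, const void *, size_t);

/* Hypothetical stand-ins for __memcpy_generic and __memcpy_thunderx.  */
static void *
memcpy_generic_sketch (void *d, const void *s, size_t n)
{
  return memcpy (d, s, n);
}

static void *
memcpy_thunderx_sketch (void *d, const void *s, size_t n)
{
  return memcpy (d, s, n);
}

static int
is_thunderx (uint64_t midr)
{
  return ((midr >> 24) & 0xff) == 'C' && ((midr >> 4) & 0xfff) == 0x0a1;
}

/* Same choice as the libc_ifunc expression in memcpy.c.  */
static memcpy_fn
select_memcpy (uint64_t midr)
{
  return is_thunderx (midr) ? memcpy_thunderx_sketch : memcpy_generic_sketch;
}
```

A `midr` of 0 is what `init_cpu_features` stores when HWCAP_CPUID is absent. In that case the selection falls back to the generic version.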
diff --git a/sysdeps/aarch64/multiarch/memcpy_generic.S b/sysdeps/aarch64/multiarch/memcpy_generic.S
index e69de29..50e1a1c 100644
--- a/sysdeps/aarch64/multiarch/memcpy_generic.S
+++ b/sysdeps/aarch64/multiarch/memcpy_generic.S
@@ -0,0 +1,42 @@
+/* A Generic Optimized memcpy implementation for AARCH64.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* The actual memcpy and memmove code is in ../memcpy.S.  If we are
+   building libc this file defines __memcpy_generic and __memmove_generic.
+   Otherwise the include of ../memcpy.S will define the normal __memcpy
+   and __memmove entry points.  */
+
+#include <sysdep.h>
+
+#if IS_IN (libc)
+
+#define MEMCPY __memcpy_generic
+#define MEMMOVE __memmove_generic
+
+/* Do not hide the generic versions of memcpy and memmove, we use them
+   internally.  */
+#undef libc_hidden_builtin_def
+#define libc_hidden_builtin_def(name)
+
+/* It doesn't make sense to send libc-internal memcpy calls through a PLT. */
+	.globl __GI_memcpy; __GI_memcpy = __memcpy_generic
+	.globl __GI_memmove; __GI_memmove = __memmove_generic
+
+#endif
+
+#include "../memcpy.S"
diff --git a/sysdeps/aarch64/multiarch/memcpy_thunderx.S b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
index e69de29..ee971c8 100644
--- a/sysdeps/aarch64/multiarch/memcpy_thunderx.S
+++ b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
@@ -0,0 +1,32 @@
+/* A Thunderx Optimized memcpy implementation for AARCH64.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* The actual thunderx optimized code is in ../memcpy.S under the USE_THUNDERX
+   ifdef.  If we are not building libc then we do not build anything when
+   compiling this file and __memcpy is defined by memcpy_generic.S.  */
+
+#include <sysdep.h>
+
+#if IS_IN (libc)
+
+#define MEMCPY __memcpy_thunderx
+#define MEMMOVE __memmove_thunderx
+#define USE_THUNDERX
+#include "../memcpy.S"
+
+#endif
diff --git a/sysdeps/aarch64/multiarch/memmove.c b/sysdeps/aarch64/multiarch/memmove.c
index e69de29..8d7a146 100644
--- a/sysdeps/aarch64/multiarch/memmove.c
+++ b/sysdeps/aarch64/multiarch/memmove.c
@@ -0,0 +1,39 @@
+/* Multiple versions of memmove. AARCH64 version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for the definition in libc.  */
+
+#if IS_IN (libc)
+/* Redefine memmove so that the compiler won't complain about the type
+   mismatch with the IFUNC selector in strong_alias, below.  */
+# undef memmove
+# define memmove __redirect_memmove
+# include <string.h>
+# include <init-arch.h>
+
+extern __typeof (__redirect_memmove) __libc_memmove;
+
+extern __typeof (__redirect_memmove) __memmove_generic attribute_hidden;
+extern __typeof (__redirect_memmove) __memmove_thunderx attribute_hidden;
+
+libc_ifunc (__libc_memmove,
+            IS_THUNDERX (midr) ? __memmove_thunderx : __memmove_generic);
+
+#undef memmove
+strong_alias (__libc_memmove, memmove);
+#endif
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
index e69de29..8e4b514 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
@@ -0,0 +1,38 @@
+/* Initialize CPU feature data.  AArch64 version.
+   This file is part of the GNU C Library.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <cpu-features.h>
+
+#ifndef HWCAP_CPUID
+# define HWCAP_CPUID		(1 << 11)
+#endif
+
+static inline void
+init_cpu_features (struct cpu_features *cpu_features)
+{
+  if (GLRO(dl_hwcap) & HWCAP_CPUID)
+    {
+      register uint64_t id = 0;
+      asm volatile ("mrs %0, midr_el1" : "=r"(id));
+      cpu_features->midr_el1 = id;
+    }
+  else
+    {
+      cpu_features->midr_el1 = 0;
+    }
+}
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
index e69de29..c92b650 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
@@ -0,0 +1,49 @@
+/* Initialize CPU feature data.  AArch64 version.
+   This file is part of the GNU C Library.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _CPU_FEATURES_AARCH64_H
+#define _CPU_FEATURES_AARCH64_H
+
+#include <stdint.h>
+
+#define MIDR_PARTNUM_SHIFT	4
+#define MIDR_PARTNUM_MASK	(0xfff << MIDR_PARTNUM_SHIFT)
+#define MIDR_PARTNUM(midr)	\
+	(((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
+#define MIDR_ARCHITECTURE_SHIFT	16
+#define MIDR_ARCHITECTURE_MASK	(0xf << MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_ARCHITECTURE(midr)	\
+	(((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_VARIANT_SHIFT	20
+#define MIDR_VARIANT_MASK	(0xf << MIDR_VARIANT_SHIFT)
+#define MIDR_VARIANT(midr)	\
+	(((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
+#define MIDR_IMPLEMENTOR_SHIFT	24
+#define MIDR_IMPLEMENTOR_MASK	(0xff << MIDR_IMPLEMENTOR_SHIFT)
+#define MIDR_IMPLEMENTOR(midr)	\
+	(((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
+
+#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR(midr) == 'C'	\
+			   && MIDR_PARTNUM(midr) == 0x0a1)
+
+struct cpu_features
+{
+  uint64_t midr_el1;
+};
+
+#endif /* _CPU_FEATURES_AARCH64_H  */
diff --git a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
index e69de29..438046a 100644
--- a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
+++ b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
@@ -0,0 +1,60 @@
+/* Data for AArch64 version of processor capability information.
+   Linux version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* If anything should be added here check whether the size of each string
+   is still ok with the given array size.
+
+   All the #ifdefs in the definitions are quite irritating but
+   necessary if we want to avoid duplicating the information.  There
+   are three different modes:
+
+   - PROCINFO_DECL is defined.  This means we are only interested in
+     declarations.
+
+   - PROCINFO_DECL is not defined:
+
+     + if SHARED is defined the file is included in an array
+       initializer.  The .element = { ... } syntax is needed.
+
+     + if SHARED is not defined a normal array initialization is
+       needed.
+  */
+
+#ifndef PROCINFO_CLASS
+# define PROCINFO_CLASS
+#endif
+
+#if !IS_IN (ldconfig)
+# if !defined PROCINFO_DECL && defined SHARED
+  ._dl_aarch64_cpu_features
+# else
+PROCINFO_CLASS struct cpu_features _dl_aarch64_cpu_features
+# endif
+# ifndef PROCINFO_DECL
+= { }
+# endif
+# if !defined SHARED || defined PROCINFO_DECL
+;
+# else
+,
+# endif
+#endif
+
+#undef PROCINFO_DECL
+#undef PROCINFO_CLASS
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc-start.c b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
index e69de29..c98aff1 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc-start.c
+++ b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
@@ -0,0 +1,40 @@
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifdef SHARED
+# include <csu/libc-start.c>
+# else
+/* The main work is done in the generic function.  */
+# define LIBC_START_DISABLE_INLINE
+# define LIBC_START_MAIN generic_start_main
+# include <csu/libc-start.c>
+# include <cpu-features.c>
+
+extern struct cpu_features _dl_aarch64_cpu_features;
+
+int
+__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
+		   int argc, char **argv,
+		   __typeof (main) init,
+		   void (*fini) (void),
+		   void (*rtld_fini) (void), void *stack_end)
+{
+  init_cpu_features (&_dl_aarch64_cpu_features);
+  return generic_start_main (main, argc, argv, init, fini, rtld_fini,
+			     stack_end);
+}
+#endif

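The selection done by memmove.c and cpu-features.h above can be tried outside glibc. The following is a minimal sketch, not the patch's code. It uses GCC's ifunc function attribute directly instead of glibc's internal libc_ifunc macro, and the names copy_generic, copy_thunderx, resolve_copy, my_memmove and current_midr are hypothetical. The field macros follow the MIDR_EL1 layout from cpu-features.h, but they shift first and then mask with unsigned constants, because 0xff << 24 on a plain int strictly overflows. The MIDR value here is a constant rather than a hardware read, so the sketch builds on any GNU/ELF target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* MIDR_EL1 fields, same layout as cpu-features.h, unsigned masks.  */
#define MIDR_PARTNUM(midr)      (((midr) >> 4) & 0xfffu)
#define MIDR_ARCHITECTURE(midr) (((midr) >> 16) & 0xfu)
#define MIDR_VARIANT(midr)      (((midr) >> 20) & 0xfu)
#define MIDR_IMPLEMENTOR(midr)  (((midr) >> 24) & 0xffu)
#define IS_THUNDERX(midr) \
  (MIDR_IMPLEMENTOR (midr) == 'C' && MIDR_PARTNUM (midr) == 0x0a1)

/* Hypothetical stand-in for what init_cpu_features would store.
   0 means "unknown CPU", which is what the patch records when the
   kernel does not advertise HWCAP_CPUID.  */
static const uint64_t current_midr = 0;

static void *copy_generic (void *d, const void *s, size_t n)
{
  return memmove (d, s, n);
}

static void *copy_thunderx (void *d, const void *s, size_t n)
{
  /* A real variant would use a tuned copy loop (e.g. with prefetch).  */
  return memmove (d, s, n);
}

typedef void *copy_fn (void *, const void *, size_t);

/* Resolver: called once while relocations are processed, it returns
   the implementation that calls to my_memmove are bound to.  */
static copy_fn *resolve_copy (void)
{
  return IS_THUNDERX (current_midr) ? copy_thunderx : copy_generic;
}

void *my_memmove (void *d, const void *s, size_t n)
  __attribute__ ((ifunc ("resolve_copy")));
```

With current_midr left at 0 the resolver binds copy_generic, which matches the patch's fallback on kernels without HWCAP_CPUID. A MIDR value such as 0x430F0A10 (implementer 'C', part number 0x0a1) would make IS_THUNDERX true and select copy_thunderx instead.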

* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-01-24 19:34       ` Steve Ellcey
@ 2017-01-24 20:49         ` Steve Ellcey
  0 siblings, 0 replies; 38+ messages in thread
From: Steve Ellcey @ 2017-01-24 20:49 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha

Never mind.  I fixed this by moving the definition of dl_platform_init
up earlier in the file.  That, plus including cpu-features.c, fixed the
build.
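Why moving the definition helps can be read off the error log: the "In file included from dynamic-link.h ... from dl-conflict.c" and "In function '_dl_resolve_conflicts'" lines show that this part of dl-machine.h was being pulled in from inside a function body, where C does not allow a static function definition. Below is a toy reduction with hypothetical names (platform_init, resolve_conflicts), assuming that the earlier location in dl-machine.h is one that ends up at file scope.

```c
/* Set by the file-scope helper below.  */
static int platform_initialized;

/* File scope: a static inline definition is allowed here.  Presumably
   this is where dl_platform_init landed after being moved up.  */
static inline void platform_init (void)
{
  platform_initialized = 1;
}

int resolve_conflicts (void)
{
  /* Text #included at this point (as dl-conflict.c includes
     dynamic-link.h) may contain declarations and statements, but a
     static function definition here is rejected by GCC with
     "invalid storage class for function".  Calling the file-scope
     helper is fine.  */
  platform_init ();
  return platform_initialized;
}
```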

Steve


On Tue, 2017-01-24 at 11:34 -0800, Steve Ellcey wrote:
> 
> I added this code to sysdeps/aarch64/dl-machine.h but when I added it I
> got a build error.  I am using the same prototype for dl_platform_init
> that x86 has so I am not sure why I get this error.
> 
> Steve Ellcey
> sellcey@caviumnetworks.com
> 
> 
> diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
> index 84b8aec..7f38a68 100644
> --- a/sysdeps/aarch64/dl-machine.h
> +++ b/sysdeps/aarch64/dl-machine.h
> @@ -426,4 +426,20 @@ elf_machine_lazy_rel (struct link_map *map,
>      _dl_reloc_bad_type (map, r_type, 1);
>  }
>  
> +#define DL_PLATFORM_INIT dl_platform_init ()
> +
> +static inline void __attribute__ ((unused))
> +dl_platform_init (void)
> +{
> +  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
> +    /* Avoid an empty string which would disturb us.  */
> +    GLRO(dl_platform) = NULL;
> +
> +#ifdef SHARED
> +  /* init_cpu_features has been called early from __libc_start_main in
> +     static executable.  */
> +  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
> +#endif
> +}
> +
>  #endif
> 
> 
> 
> The error I get is:
> 
> In file included from dynamic-link.h:92:0,
>                  from dl-conflict.c:59:
> ../sysdeps/aarch64/dl-machine.h: In function ‘_dl_resolve_conflicts’:
> ../sysdeps/aarch64/dl-machine.h:432:1: error: invalid storage class for function ‘dl_platform_init’
>  dl_platform_init (void)
>  ^~~~~~~~~~~~~~~~
> ../o-iterator.mk:9: recipe for target '/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/elf/dl-conflict.o' failed
> make[2]: *** [/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/elf/dl-conflict.o] Error 1


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-01-24 14:09     ` Adhemerval Zanella
@ 2017-01-24 19:34       ` Steve Ellcey
  2017-01-24 20:49         ` Steve Ellcey
  2017-01-25 17:34       ` Steve Ellcey
  1 sibling, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-01-24 19:34 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha

On Tue, 2017-01-24 at 12:09 -0200, Adhemerval Zanella wrote:

> This branch in my personal repo [1] has a workable draft version for
> aarch64.  It contains 2 patches: one that implements cpu-features.c
> for aarch64 and another that actually uses it to implement the
> thunderx ifunc.
> 
> In the first patch I would like to remove sysdeps/aarch64/ldsodefs.h and
> make it Linux-specific only, because of hwcap.  I will try to clean this up
> later.
> 
> [1] https://github.com/zatrazz/glibc/tree/master-aarch64-ifunc

Thanks Adhemerval,

That clears a lot of things up.  One thing I noticed in your tree is
that you only call init_cpu_features from __libc_start_main for the
static glibc.  On x86, DL_PLATFORM_INIT is also defined as a routine
that calls init_cpu_features for the dynamically linked glibc.
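The two initialization paths described above can be modelled in one translation unit. This is a toy sketch, not glibc code: fake_start_main, fake_dl_platform_init and read_cpu_id are hypothetical stand-ins for __libc_start_main, dl_platform_init and the mrs read, and defining SHARED selects the shared-library path in the same way glibc's own SHARED macro does.

```c
#include <stdint.h>

struct cpu_features { uint64_t midr_el1; };

/* One global copy, like _dl_aarch64_cpu_features.  */
static struct cpu_features cpu_features_global;
static int init_calls;

static uint64_t read_cpu_id (void)
{
  return 0;  /* placeholder for the "mrs %0, midr_el1" read */
}

static void init_cpu_features (struct cpu_features *f)
{
  f->midr_el1 = read_cpu_id ();
  init_calls++;
}

#ifdef SHARED
/* Shared build: the dynamic loader's platform hook does the work.  */
void fake_dl_platform_init (void) { init_cpu_features (&cpu_features_global); }
int fake_start_main (void) { return 0; }
#else
/* Static build: there is no separate loader, so the start routine
   initializes the structure before calling the generic code.  */
void fake_dl_platform_init (void) { }
int fake_start_main (void)
{
  init_cpu_features (&cpu_features_global);
  return 0;
}
#endif
```

In either build only one of the two hooks does any work, so init_cpu_features runs exactly once per process.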

I added this code to sysdeps/aarch64/dl-machine.h but when I added it I
got a build error.  I am using the same prototype for dl_platform_init
that x86 has so I am not sure why I get this error.

Steve Ellcey
sellcey@caviumnetworks.com


diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
index 84b8aec..7f38a68 100644
--- a/sysdeps/aarch64/dl-machine.h
+++ b/sysdeps/aarch64/dl-machine.h
@@ -426,4 +426,20 @@ elf_machine_lazy_rel (struct link_map *map,
     _dl_reloc_bad_type (map, r_type, 1);
 }
 
+#define DL_PLATFORM_INIT dl_platform_init ()
+
+static inline void __attribute__ ((unused))
+dl_platform_init (void)
+{
+  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
+    /* Avoid an empty string which would disturb us.  */
+    GLRO(dl_platform) = NULL;
+
+#ifdef SHARED
+  /* init_cpu_features has been called early from __libc_start_main in
+     static executable.  */
+  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
+#endif
+}
+
 #endif



The error I get is:

In file included from dynamic-link.h:92:0,
                 from dl-conflict.c:59:
../sysdeps/aarch64/dl-machine.h: In function ‘_dl_resolve_conflicts’:
../sysdeps/aarch64/dl-machine.h:432:1: error: invalid storage class for function ‘dl_platform_init’
 dl_platform_init (void)
 ^~~~~~~~~~~~~~~~
../o-iterator.mk:9: recipe for target '/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/elf/dl-conflict.o' failed
make[2]: *** [/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/elf/dl-conflict.o] Error 1


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-01-23 23:33   ` Steve Ellcey
  2017-01-24  9:37     ` Florian Weimer
@ 2017-01-24 14:09     ` Adhemerval Zanella
  2017-01-24 19:34       ` Steve Ellcey
  2017-01-25 17:34       ` Steve Ellcey
  1 sibling, 2 replies; 38+ messages in thread
From: Adhemerval Zanella @ 2017-01-24 14:09 UTC (permalink / raw)
  To: Steve Ellcey, libc-alpha



On 23/01/2017 21:33, Steve Ellcey wrote:
> On Thu, 2017-01-19 at 17:41 -0200, Adhemerval Zanella wrote:
>>  
>> I think that, to avoid potentially multiple kernel traps at load or PLT resolve time,
>> a better solution would be to issue the mrs instruction once at loader/program startup,
>> fill in an internal structure with the required information, and use it later for
>> ifunc resolution.  This is similar to the cpu-features/cacheinfo strategy for x86.
>>
>> According to the documentation in the last patch iteration [1], the kernel provides
>> the HWCAP_CPUID bit in hwcap to indicate that it supports mrs emulation.  So, building
>> on my previous suggestion, I would recommend:
>>
>>   1. Remove any configure check or restriction.
>>   2. Add a cpu_features module similar to x86 that sets a global state with
>>      the cpu information obtained from the kernel.  It will first check the
>>      HWCAP_CPUID bit in hwcap and, if it is set, issue the mrs instruction.
>>      It will then populate the global state with the required cpu information.
>>   3. Use the cpu information to select the correct ifunc.
>>
>> It has the further advantage of avoiding extra complexity from glibc
>> builds with different minimum required kernels.
> 
> 
> Adhemerval,
> 
> I am looking at the cpu-features setup from x86 and trying to implement
> that for aarch64 but there are some things I don't understand about the
> code and I was hoping you (or someone else on the list) could help me.
> I have attached the patch I have so far.  This code does not use the
> cpu features yet; it is just the code that tries to initialize them at
> startup.  Right now it doesn't build and I am not sure what I am
> missing.
> 
> Specifically, I have these questions:
> 
> How is cpu-features-offsets.sym used and what do I need in this file?
> I think this may be how _dl_aarch64_cpu_features is supposed to be
> defined but I am not sure.
> 

The .sym files are a trick glibc uses to define struct or TLS offsets
for use in assembly implementations.  x86 uses them because it
originally implemented most of its ifunc resolvers directly in assembly
(back when compiler support was lacking).

Since you are implementing this directly in C, these files are unnecessary.
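As a rough analogue of what a .sym file gives an assembly implementation, the program below turns C constant expressions (offsetof, sizeof) into #define lines that an assembler source could include. It is not glibc's actual gen-as-const machinery, which, as far as I know, extracts the values from the compiler's assembly output instead of running a program. struct rtld_global_ro_toy, emit_offsets and the emitted macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct cpu_features { uint64_t midr_el1; };

/* Hypothetical layout; the real struct rtld_global_ro has many more
   fields.  */
struct rtld_global_ro_toy
{
  uint64_t dl_hwcap;
  struct cpu_features dl_aarch64_cpu_features;
};

/* Each output line pairs a symbol name with a constant expression,
   much like a "NAME expression" line in a .sym file.  */
int emit_offsets (FILE *out)
{
  return fprintf (out,
                  "#define CPU_FEATURES_OFFSET %zu\n"
                  "#define CPU_FEATURES_SIZE %zu\n",
                  offsetof (struct rtld_global_ro_toy, dl_aarch64_cpu_features),
                  sizeof (struct cpu_features));
}
```

On a typical LP64 target this prints an offset of 8 and a size of 8.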

> I obviously need something in init_cpu_features to check if mrs is
> emulated in the kernel but I am not sure how to do that.  I know it
> involves the HWCAPs but I am not sure how to access them.  Do I need a
> sym file to get access to that too?  Something like
> sysdeps/arm/rtld-global-offsets.sym?
> 
> Right now my build dies with:
> 
> <stdin>:2:102: error: implicit declaration of function ‘rtld_global_ro_offsetof’ [-Werror=implicit-function-declaration]
> <stdin>:2:127: error: ‘_dl_aarch64_cpu_features’ undeclared (first use in this function)
> <stdin>:2:127: note: each undeclared identifier is reported only once for each function it appears in
> <stdin>:3:82: error: invalid application of ‘sizeof’ to incomplete type ‘struct cpu_features’
> cc1: all warnings being treated as errors
> ../Makerules:266: recipe for target '/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/cpu-features-offsets.h' failed
> make[2]: *** [/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/cpu-features-offsets.h] Error 1
> 
> Steve Ellcey
> sellcey@caviumnetworks.com
> 

This branch in my personal repo [1] has a workable draft version for
aarch64.  It contains 2 patches: one that implements cpu-features.c
for aarch64 and another that actually uses it to implement the
thunderx ifunc.

In the first patch I would like to remove sysdeps/aarch64/ldsodefs.h and
make it Linux-specific only, because of hwcap.  I will try to clean this up
later.

[1] https://github.com/zatrazz/glibc/tree/master-aarch64-ifunc


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-01-23 23:33   ` Steve Ellcey
@ 2017-01-24  9:37     ` Florian Weimer
  2017-01-24 14:09     ` Adhemerval Zanella
  1 sibling, 0 replies; 38+ messages in thread
From: Florian Weimer @ 2017-01-24  9:37 UTC (permalink / raw)
  To: Steve Ellcey, Adhemerval Zanella, libc-alpha

On 01/24/2017 12:33 AM, Steve Ellcey wrote:
> How is cpu-features-offsets.sym used and what do I need in this file?
> I think this may be how _dl_aarch64_cpu_features is supposed to be
> defined but I am not sure.

It allows the assembler to use the values of C constant expressions. 
Commit 67aae64512cb42332f76a83e84ac2bc608ad4ad2 is an aarch64 example of 
its use.

Florian


* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-01-19 19:42 ` Adhemerval Zanella
  2017-01-19 21:04   ` Joseph Myers
@ 2017-01-23 23:33   ` Steve Ellcey
  2017-01-24  9:37     ` Florian Weimer
  2017-01-24 14:09     ` Adhemerval Zanella
  1 sibling, 2 replies; 38+ messages in thread
From: Steve Ellcey @ 2017-01-23 23:33 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha

[-- Attachment #1: Type: text/plain, Size: 2965 bytes --]

On Thu, 2017-01-19 at 17:41 -0200, Adhemerval Zanella wrote:
> 
> I think that, to avoid potentially multiple kernel traps at load or PLT resolve time,
> a better solution would be to issue the mrs instruction once at loader/program startup,
> fill in an internal structure with the required information, and use it later for
> ifunc resolution.  This is similar to the cpu-features/cacheinfo strategy for x86.
> 
> According to the documentation in the last patch iteration [1], the kernel provides
> the HWCAP_CPUID bit in hwcap to indicate that it supports mrs emulation.  So, building
> on my previous suggestion, I would recommend:
> 
>   1. Remove any configure check or restriction.
>   2. Add a cpu_features module similar to x86 that sets a global state with
>      the cpu information obtained from the kernel.  It will first check the
>      HWCAP_CPUID bit in hwcap and, if it is set, issue the mrs instruction.
>      It will then populate the global state with the required cpu information.
>   3. Use the cpu information to select the correct ifunc.
> 
> It has the further advantage of avoiding extra complexity from glibc
> builds with different minimum required kernels.


Adhemerval,

I am looking at the cpu-features setup from x86 and trying to implement
that for aarch64 but there are some things I don't understand about the
code and I was hoping you (or someone else on the list) could help me.
I have attached the patch I have so far.  This code does not use the
cpu features yet; it is just the code that tries to initialize them at
startup.  Right now it doesn't build and I am not sure what I am
missing.

Specifically, I have these questions:

How is cpu-features-offsets.sym used and what do I need in this file?
I think this may be how _dl_aarch64_cpu_features is supposed to be
defined but I am not sure.

I obviously need something in init_cpu_features to check if mrs is
emulated in the kernel but I am not sure how to do that.  I know it
involves the HWCAPs but I am not sure how to access them.  Do I need a
sym file to get access to that too?  Something like
sysdeps/arm/rtld-global-offsets.sym?
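From user space the same check can be made with getauxval (AT_HWCAP); inside glibc, the later version of this patch tests GLRO(dl_hwcap) instead, so no .sym file is needed for it. A minimal sketch, assuming a Linux target with glibc's <sys/auxv.h>: the fallback HWCAP_CPUID value (1 << 11) is the one the patch defines, and read_midr_el1 is a hypothetical helper. On non-AArch64 hosts it simply returns 0, the same "unknown CPU" value the patch stores when the bit is missing.

```c
#include <stdint.h>
#include <sys/auxv.h>

#ifndef HWCAP_CPUID
# define HWCAP_CPUID (1UL << 11)  /* value used by the patch */
#endif

/* Return MIDR_EL1, or 0 if the kernel does not advertise that it
   emulates EL0 reads of the CPU ID registers.  */
uint64_t read_midr_el1 (void)
{
  uint64_t midr = 0;
#if defined (__aarch64__)
  /* Without HWCAP_CPUID the mrs below would trap (SIGILL).  */
  if (getauxval (AT_HWCAP) & HWCAP_CPUID)
    __asm__ volatile ("mrs %0, midr_el1" : "=r" (midr));
#endif
  return midr;
}
```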

Right now my build dies with:

<stdin>:2:102: error: implicit declaration of function ‘rtld_global_ro_offsetof’ [-Werror=implicit-function-declaration]
<stdin>:2:127: error: ‘_dl_aarch64_cpu_features’ undeclared (first use in this function)
<stdin>:2:127: note: each undeclared identifier is reported only once for each function it appears in
<stdin>:3:82: error: invalid application of ‘sizeof’ to incomplete type ‘struct cpu_features’
cc1: all warnings being treated as errors
../Makerules:266: recipe for target '/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/cpu-features-offsets.h' failed
make[2]: *** [/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/cpu-features-offsets.h] Error 1

Steve Ellcey
sellcey@caviumnetworks.com

[-- Attachment #2: ifunc2.diff --]
[-- Type: text/x-patch, Size: 10446 bytes --]

diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 562c137..e1d47fd 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -5,11 +5,13 @@ CFLAGS-backtrace.c += -funwind-tables
 endif
 
 ifeq ($(subdir),elf)
+sysdep-dl-routines += dl-get-cpu-features
 sysdep-dl-routines += tlsdesc dl-tlsdesc
 gen-as-const-headers += dl-link.sym
 endif
 
 ifeq ($(subdir),csu)
+gen-as-const-headers += cpu-features-offsets.sym
 gen-as-const-headers += tlsdesc.sym
 endif
 
diff --git a/sysdeps/aarch64/cpu-features-offsets.sym b/sysdeps/aarch64/cpu-features-offsets.sym
index e69de29..ad33818 100644
--- a/sysdeps/aarch64/cpu-features-offsets.sym
+++ b/sysdeps/aarch64/cpu-features-offsets.sym
@@ -0,0 +1,4 @@
+
+RTLD_GLOBAL_RO_DL_AARCH64_CPU_FEATURES_OFFSET rtld_global_ro_offsetof (_dl_aarch64_cpu_features)
+
+CPU_FEATURES_SIZE	sizeof (struct cpu_features)
diff --git a/sysdeps/aarch64/cpu-features.c b/sysdeps/aarch64/cpu-features.c
index e69de29..6c8f065 100644
--- a/sysdeps/aarch64/cpu-features.c
+++ b/sysdeps/aarch64/cpu-features.c
@@ -0,0 +1,30 @@
+/* Initialize CPU feature data.
+   This file is part of the GNU C Library.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdint.h>
+#include <cpu-features.h>
+
+static inline void
+init_cpu_features (struct cpu_features *cpu_features)
+{
+  /* What do I do on a kernel that does not support mrs.  */
+
+  register uint64_t id = 0;
+  asm volatile ("mrs %0, midr_el1" : "=r"(id));
+  cpu_features->midr_el1 = id;
+}
diff --git a/sysdeps/aarch64/cpu-features.h b/sysdeps/aarch64/cpu-features.h
index e69de29..a2d0786 100644
--- a/sysdeps/aarch64/cpu-features.h
+++ b/sysdeps/aarch64/cpu-features.h
@@ -0,0 +1,48 @@
+/* This file is part of the GNU C Library.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef cpu_features_h
+#define cpu_features_h
+
+#include <stdint.h>
+
+#define MIDR_PARTNUM_SHIFT	4
+#define MIDR_PARTNUM_MASK	(0xfff << MIDR_PARTNUM_SHIFT)
+#define MIDR_PARTNUM(midr)	\
+	(((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
+#define MIDR_ARCHITECTURE_SHIFT	16
+#define MIDR_ARCHITECTURE_MASK	(0xf << MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_ARCHITECTURE(midr)	\
+	(((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_VARIANT_SHIFT	20
+#define MIDR_VARIANT_MASK	(0xf << MIDR_VARIANT_SHIFT)
+#define MIDR_VARIANT(midr)	\
+	(((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
+#define MIDR_IMPLEMENTOR_SHIFT	24
+#define MIDR_IMPLEMENTOR_MASK	(0xff << MIDR_IMPLEMENTOR_SHIFT)
+#define MIDR_IMPLEMENTOR(midr)	\
+	(((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
+
+#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR(midr) == 'C'	\
+			   && MIDR_PARTNUM(midr) == 0x0a1)
+
+struct cpu_features
+{
+  uint64_t midr_el1;
+};
+
+#endif /* cpu_features_h */
diff --git a/sysdeps/aarch64/dl-get-cpu-features.c b/sysdeps/aarch64/dl-get-cpu-features.c
index e69de29..1581c75 100644
--- a/sysdeps/aarch64/dl-get-cpu-features.c
+++ b/sysdeps/aarch64/dl-get-cpu-features.c
@@ -0,0 +1,26 @@
+/* This file is part of the GNU C Library.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <ldsodefs.h>
+
+#undef __get_cpu_features
+
+const struct cpu_features *
+__get_cpu_features (void)
+{
+  return &GLRO(dl_aarch64_cpu_features);
+}
diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
index 84b8aec..4e34b13 100644
--- a/sysdeps/aarch64/dl-machine.h
+++ b/sysdeps/aarch64/dl-machine.h
@@ -25,6 +25,7 @@
 #include <tls.h>
 #include <dl-tlsdesc.h>
 #include <dl-irel.h>
+#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -426,4 +427,20 @@ elf_machine_lazy_rel (struct link_map *map,
     _dl_reloc_bad_type (map, r_type, 1);
 }
 
+#define DL_PLATFORM_INIT dl_platform_init ()
+
+static inline void __attribute__ ((unused))
+dl_platform_init (void)
+{
+  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
+    /* Avoid an empty string which would disturb us.  */
+    GLRO(dl_platform) = NULL;
+
+#ifdef SHARED
+  /* init_cpu_features has been called early from __libc_start_main in
+     static executable.  */
+  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
+#endif
+}
+
 #endif
diff --git a/sysdeps/aarch64/dl-procinfo.c b/sysdeps/aarch64/dl-procinfo.c
index e69de29..8d477d5 100644
--- a/sysdeps/aarch64/dl-procinfo.c
+++ b/sysdeps/aarch64/dl-procinfo.c
@@ -0,0 +1,57 @@
+/* Data for Aarch64 version of processor capability information.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* If anything should be added here check whether the size of each string
+   is still ok with the given array size.
+
+   All the #ifdefs in the definitions are quite irritating but
+   necessary if we want to avoid duplicating the information.  There
+   are three different modes:
+
+   - PROCINFO_DECL is defined.  This means we are only interested in
+     declarations.
+
+   - PROCINFO_DECL is not defined:
+
+     + if SHARED is defined the file is included in an array
+       initializer.  The .element = { ... } syntax is needed.
+
+     + if SHARED is not defined a normal array initialization is
+       needed.
+  */
+
+#ifndef PROCINFO_CLASS
+# define PROCINFO_CLASS
+#endif
+
+#if !defined PROCINFO_DECL && defined SHARED
+  ._dl_aarch64_cpu_features
+#else
+PROCINFO_CLASS struct cpu_features _dl_aarch64_cpu_features
+#endif
+#ifndef PROCINFO_DECL
+= { }
+#endif
+#if !defined SHARED || defined PROCINFO_DECL
+;
+#else
+,
+#endif
+
+#undef PROCINFO_DECL
+#undef PROCINFO_CLASS
diff --git a/sysdeps/aarch64/ldsodefs.h b/sysdeps/aarch64/ldsodefs.h
index f277074..ba4ada3 100644
--- a/sysdeps/aarch64/ldsodefs.h
+++ b/sysdeps/aarch64/ldsodefs.h
@@ -20,6 +20,7 @@
 #define _AARCH64_LDSODEFS_H 1
 
 #include <elf.h>
+#include <cpu-features.h>
 
 struct La_aarch64_regs;
 struct La_aarch64_retval;
diff --git a/sysdeps/aarch64/libc-start.c b/sysdeps/aarch64/libc-start.c
index e69de29..49d5f4a 100644
--- a/sysdeps/aarch64/libc-start.c
+++ b/sysdeps/aarch64/libc-start.c
@@ -0,0 +1,41 @@
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifdef SHARED
+# include <csu/libc-start.c>
+# else
+/* The main work is done in the generic function.  */
+# define LIBC_START_DISABLE_INLINE
+# define LIBC_START_MAIN generic_start_main
+# include <csu/libc-start.c>
+# include <cpu-features.h>
+# include <cpu-features.c>
+
+extern struct cpu_features _dl_aarch64_cpu_features;
+
+int
+__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
+		   int argc, char **argv,
+		   __typeof (main) init,
+		   void (*fini) (void),
+		   void (*rtld_fini) (void), void *stack_end)
+{
+  init_cpu_features (&_dl_aarch64_cpu_features);
+  return generic_start_main (main, argc, argv, init, fini, rtld_fini,
+			     stack_end);
+}
+#endif

* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-01-19 19:42 ` Adhemerval Zanella
@ 2017-01-19 21:04   ` Joseph Myers
  2017-01-23 23:33   ` Steve Ellcey
  1 sibling, 0 replies; 38+ messages in thread
From: Joseph Myers @ 2017-01-19 21:04 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

On Thu, 19 Jan 2017, Adhemerval Zanella wrote:

> We need to make sure glibc built against older kernel headers (or with
> --enable-kernel=x.y.z) do not use mrs instruction and glibc built against
> newer kernel that may use mrs fail on loading with DL_SYSDEP_OSCHECK.

Agreed.  That is, I think that either the configured minimum kernel 
version or the kernel support at runtime (or both, with the configured 
minimum kernel allowing runtime tests to be disabled) should be what 
determines whether these implementations can be used - rather than 
enabling multi-arch changing the minimum kernel version.
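The policy described above can be sketched as a single predicate. This is a minimal sketch, not glibc code. It assumes the configured minimum kernel is available as a packed version number (KERNEL_VERSION_CODE is a hypothetical helper). It takes 4.11 as the first kernel with mrs emulation, as stated elsewhere in this thread, and uses the HWCAP_CPUID bit (bit 11 in the arm64 uapi asm/hwcap.h) as the runtime test.

```c
#include <stdbool.h>

/* Hypothetical helper for illustration; glibc's configure machinery
   encodes --enable-kernel differently.  */
#define KERNEL_VERSION_CODE(a, b, c) (((a) << 16) | ((b) << 8) | (c))

/* First kernel expected to emulate mrs, per this thread.  */
#define MRS_EMULATION_KERNEL KERNEL_VERSION_CODE (4, 11, 0)

/* HWCAP_CPUID from the arm64 kernel's uapi asm/hwcap.h.  */
#define HWCAP_CPUID_BIT (1UL << 11)

/* Decide whether ifunc resolvers may read MIDR_EL1 with mrs.
   configured_min_kernel: the --enable-kernel minimum.
   hwcap: the AT_HWCAP value supplied by the kernel at startup.  */
static bool
can_use_mrs (unsigned int configured_min_kernel, unsigned long hwcap)
{
  /* Every kernel this build supports emulates mrs, so the runtime
     test can be skipped entirely.  */
  if (configured_min_kernel >= MRS_EMULATION_KERNEL)
    return true;

  /* Otherwise trust only the kernel's own advertisement.  */
  return (hwcap & HWCAP_CPUID_BIT) != 0;
}
```

A build configured with --enable-kernel=4.11.0 or later never consults hwcap, which corresponds to the case of the configured minimum kernel allowing the runtime test to be disabled.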

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
  2017-01-19 18:23 [PATCH] Add ifunc memcpy and memmove for aarch64 Steve Ellcey
@ 2017-01-19 19:42 ` Adhemerval Zanella
  2017-01-19 21:04   ` Joseph Myers
  2017-01-23 23:33   ` Steve Ellcey
  0 siblings, 2 replies; 38+ messages in thread
From: Adhemerval Zanella @ 2017-01-19 19:42 UTC (permalink / raw)
  To: libc-alpha

Hi Steve,

On 19/01/2017 16:22, Steve Ellcey wrote:
> +extern uint64_t __midr attribute_hidden;
> +extern bool __is_thunderx attribute_hidden;
> +
> +#define INIT_ARCH()						\
> +  {								\
> +    if (__midr == 0)						\
> +      {								\
> +	asm volatile ("mrs %0, midr_el1" : "=r"(__midr));	\
> +	__is_thunderx = IS_THUNDERX(__midr);			\
> +      }								\
> +  }

I think that, to avoid potentially multiple kernel traps at load or PLT resolution time,
a better solution would be to issue the mrs instruction once at loader/program startup,
fill in an internal structure with the required information, and use it later on for
ifunc resolution.  This is similar to the cpu-features/cacheinfo strategy for x86.


> diff --git a/sysdeps/unix/sysv/linux/aarch64/configure.ac b/sysdeps/unix/sysv/linux/aarch64/configure.ac
> index 211fa9c..684cb46 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/configure.ac
> +++ b/sysdeps/unix/sysv/linux/aarch64/configure.ac
> @@ -1,6 +1,11 @@
>  GLIBC_PROVIDES dnl See aclocal.m4 in the top level source directory.
>  # Local configure fragment for sysdeps/unix/sysv/linux/aarch64.
>  
> -arch_minimum_kernel=3.7.0
> +# For multi-arch support we need a kernel that emulates the mrs instruction.
> +if test x$multi_arch = xyes; then
> +    arch_minimum_kernel=4.11.0
> +else
> +    arch_minimum_kernel=3.7.0
> +fi

I do not think this is sufficient to prevent the multiarch version from being used
on systems with old installed kernel headers.  It only helps if you explicitly use
--enable-multi-arch; however, multiarch is enabled by default in configure.ac
(configure.ac:877).  So building with old kernel headers will break
the runtime.

We need to make sure that a glibc built against older kernel headers (or with
--enable-kernel=x.y.z) does not use the mrs instruction, and that a glibc built
against a newer kernel that may use mrs fails to load on older kernels via
DL_SYSDEP_OSCHECK.

According to the documentation in the last iteration of the kernel patch [1], the
kernel sets the HWCAP_CPUID bit in hwcap to indicate that it supports mrs
emulation.  So, building on my previous suggestion, I would recommend:

  1. Remove any configure check or restriction.
  2. Add a cpu_features module, similar to x86, that sets a global state with
     the CPU information obtained from the kernel.  It first checks the
     HWCAP_CPUID bit in hwcap and, only if it is set, issues the mrs
     instruction.  It then populates the global state with the required CPU
     information.
  3. Use the CPU information to select the correct ifunc.
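Steps 2 and 3 can be sketched roughly as follows. This is a minimal sketch under stated assumptions: struct cpu_features here is hypothetical and has a single field, and a function pointer stands in for the mrs instruction so the example also runs off-target (on AArch64 it would wrap `mrs %0, midr_el1` as the patch's INIT_ARCH does). Only the IS_THUNDERX test is taken from the patch's init-arch.h; the rest is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* HWCAP_CPUID from the arm64 kernel's uapi asm/hwcap.h.  */
#define HWCAP_CPUID_BIT (1UL << 11)

/* Same bit fields as the MIDR_* macros in the patch's init-arch.h.  */
#define MIDR_IMPLEMENTOR(m) (((m) >> 24) & 0xff)
#define MIDR_PARTNUM(m) (((m) >> 4) & 0xfff)
#define IS_THUNDERX(m) \
  (MIDR_IMPLEMENTOR (m) == 'C' && MIDR_PARTNUM (m) == 0x0a1)

/* Hypothetical global state, filled in once at startup.  */
struct cpu_features
{
  uint64_t midr_el1;   /* 0 when the register could not be read.  */
};

/* Step 2: read_midr stands in for "mrs %0, midr_el1".  It is called
   only if the kernel advertises the emulation, so on a kernel without
   it the mrs instruction is never executed.  */
static void
init_cpu_features (struct cpu_features *cpu, unsigned long hwcap,
                   uint64_t (*read_midr) (void))
{
  cpu->midr_el1 = (hwcap & HWCAP_CPUID_BIT) ? read_midr () : 0;
}

/* Step 3: what an ifunc resolver would test, a plain load with no
   trap into the kernel.  */
static bool
want_thunderx_memcpy (const struct cpu_features *cpu)
{
  return IS_THUNDERX (cpu->midr_el1);
}

/* Example reader for testing: a ThunderX-like value
   (implementer 'C', part number 0x0a1).  */
static uint64_t
fake_thunderx_midr (void)
{
  return 0x430F0A10;
}
```

Because a kernel without HWCAP_CPUID leaves midr_el1 at 0, the resolver falls back to the generic implementation without any configure-time kernel restriction.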

This has the further advantage of avoiding the complexity of maintaining glibc
builds with different minimum required kernels.

* [PATCH] Add ifunc memcpy and memmove for aarch64
@ 2017-01-19 18:23 Steve Ellcey
  2017-01-19 19:42 ` Adhemerval Zanella
  0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-01-19 18:23 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 2298 bytes --]

This patch adds ifunc versions of memcpy and memmove for aarch64.  I
know this isn't appropriate for 2.25, but I wanted to submit it and get
it reviewed for 2.26.  The basic change is to add software
prefetching for large memcpys on thunderx, which can speed up those
routines by around 2X.  For memcpys under 32K bytes I found that the
software prefetching did not help (and sometimes hurt).  I wasn't
really interested in speeding up memmove, but since memcpy and memmove
are implemented in one file it seemed easier to make memmove an ifunc
along with memcpy rather than try to split them up.  memmove does get
a speedup when it uses the memcpy code.
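The size-dependent prefetching can be modelled in C for illustration. This is a sketch, not the patch's code: the real routine is hand-written assembly that also handles alignment and the tail specially. It assumes GCC or Clang for `__builtin_prefetch` and borrows the 32768-byte threshold and the 512-byte loop prefetch distance from the patch; copy_with_prefetch is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values taken from the patch: prefetching only pays off at 32 KiB and
   above, and the thunderx loop prefetches 512 bytes ahead.  */
#define PREFETCH_THRESHOLD 32768
#define PREFETCH_DISTANCE 512

/* Illustrative C model of the prefetching copy path.  */
static void *
copy_with_prefetch (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  if (n >= PREFETCH_THRESHOLD)
    {
      for (; n >= 64; n -= 64, d += 64, s += 64)
        {
          /* Like "tbz src, #6" in the patch, prefetch on every other
             64-byte block, and only for addresses inside the source.
             rw = 0 (read), locality = 0 (streaming) is loosely
             analogous to "prfm pldl1strm".  */
          if (((uintptr_t) s & 64) && n > PREFETCH_DISTANCE)
            __builtin_prefetch (s + PREFETCH_DISTANCE, 0, 0);
          memcpy (d, s, 64);
        }
    }

  /* Small copies and whatever tail the loop left behind.  */
  if (n != 0)
    memcpy (d, s, n);
  return dst;
}
```

Keeping the threshold check outside the loop means copies below 32 KiB pay only one comparison, matching the observation that prefetching did not help there.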

The ifunc code depends on the mrs instruction, which is privileged,
but version 4.11 of the Linux kernel will emulate it
(https://lkml.org/lkml/2017/1/10/816).  Since it is emulated (and
therefore slow), I added code to save its value rather than read it every
time we execute an ifunc selection function.  I also saved a flag
specifying whether the platform is thunderx, so that glibc does not have
to perform multiple logical operations on the mrs value in each ifunc
selection function to determine whether it is running on thunderx.
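The read-once caching and the MIDR decoding behind IS_THUNDERX can be sketched as below. This is an illustration only: decode_midr and cache_cpu_id are hypothetical names, the field layout follows the MIDR_* shifts and masks in the patch's init-arch.h, and a function pointer stands in for the emulated mrs read so the call count can be observed.

```c
#include <stdbool.h>
#include <stdint.h>

/* MIDR_EL1 fields, with the same shifts and masks as the MIDR_*
   macros in the patch's init-arch.h.  */
struct midr_fields
{
  unsigned int implementer;   /* bits 31:24 */
  unsigned int variant;       /* bits 23:20 */
  unsigned int architecture;  /* bits 19:16 */
  unsigned int partnum;       /* bits 15:4 */
  unsigned int revision;      /* bits 3:0 */
};

static struct midr_fields
decode_midr (uint64_t midr)
{
  struct midr_fields f = {
    .implementer = (midr >> 24) & 0xff,
    .variant = (midr >> 20) & 0xf,
    .architecture = (midr >> 16) & 0xf,
    .partnum = (midr >> 4) & 0xfff,
    .revision = midr & 0xf,
  };
  return f;
}

/* Stand-ins for the patch's __midr and __is_thunderx.  */
static uint64_t cached_midr;
static bool cached_is_thunderx;

/* Read and decode once; later calls return immediately, so the slow
   emulated read happens at most once.  */
static void
cache_cpu_id (uint64_t (*read_midr) (void))
{
  if (cached_midr != 0)
    return;
  cached_midr = read_midr ();
  struct midr_fields f = decode_midr (cached_midr);
  cached_is_thunderx = f.implementer == 'C' && f.partnum == 0x0a1;
}

/* Example reader for testing; counts how often it is called.  */
static int midr_reads;

static uint64_t
fake_thunderx_midr (void)
{
  midr_reads++;
  return 0x430F0A10;
}
```

Like the patch's INIT_ARCH, this uses 0 as the "not yet read" marker, so a MIDR value of 0 would be re-read on every call; real cores are expected to report a nonzero implementer.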

I have attached the bench-memcpy.out, bench-memcpy-large.out,
bench-memmove.out and bench-memmove-large.out files to show the performance
difference.  Most of the difference is seen in the large versions, as the
smaller ones only use prefetching on a couple of inputs.

Steve Ellcey
sellcey@caviumnetworks.com


2017-01-19  Steve Ellcey  <sellcey@caviumnetworks.com>

	* sysdeps/aarch64/memcpy.S (MEMMOVE, MEMCPY): New macros.
	(memmove): Use MEMMOVE for name.
	(memcpy): Use MEMCPY for name.  Add loop with prefetching
	under USE_THUNDERX macro.
	* sysdeps/aarch64/multiarch/Makefile: New file.
	* sysdeps/aarch64/multiarch/ifunc-impl-list.c: Ditto.
	* sysdeps/aarch64/multiarch/init-arch.h: Ditto.
	* sysdeps/aarch64/multiarch/memcpy.c: Ditto.
	* sysdeps/aarch64/multiarch/memcpy_generic.S: Ditto.
	* sysdeps/aarch64/multiarch/memcpy_thunderx.S: Ditto.
	* sysdeps/aarch64/multiarch/memmove.c: Ditto.
	* sysdeps/unix/sysv/linux/aarch64/configure.ac (arch_minimum_kernel):
	Set to 4.11.0 if building with multi_arch.
	* sysdeps/unix/sysv/linux/aarch64/configure: Regenerate.

[-- Attachment #2: ifunc.patch --]
[-- Type: text/x-patch, Size: 15812 bytes --]

diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S
index 29af8b1..74444b4 100644
--- a/sysdeps/aarch64/memcpy.S
+++ b/sysdeps/aarch64/memcpy.S
@@ -59,7 +59,14 @@
    Overlapping large forward memmoves use a loop that copies backwards.
 */
 
-ENTRY_ALIGN (memmove, 6)
+#ifndef MEMMOVE
+#  define MEMMOVE memmove
+#endif
+#ifndef MEMCPY
+#  define MEMCPY memcpy
+#endif
+
+ENTRY_ALIGN (MEMMOVE, 6)
 
 	DELOUSE (0)
 	DELOUSE (1)
@@ -71,9 +78,9 @@ ENTRY_ALIGN (memmove, 6)
 	b.lo	L(move_long)
 
 	/* Common case falls through into memcpy.  */
-END (memmove)
-libc_hidden_builtin_def (memmove)
-ENTRY (memcpy)
+END (MEMMOVE)
+libc_hidden_builtin_def (MEMMOVE)
+ENTRY (MEMCPY)
 
 	DELOUSE (0)
 	DELOUSE (1)
@@ -158,10 +165,22 @@ L(copy96):
 
 	.p2align 4
 L(copy_long):
+
+#ifdef USE_THUNDERX
+
+	/* On thunderx, large memcpys are helped by software prefetching.
+	   This loop is identical to the one below it but with prefetching
+	   instructions included.  For copies smaller than 32768 bytes,
+	   the prefetching does not help and slows the code down, so we only
+	   use the prefetching loop for the largest memcpys.  */
+
+	cmp	count, #32768
+	b.lo	L(copy_long_without_prefetch)
 	and	tmp1, dstin, 15
 	bic	dst, dstin, 15
 	ldp	D_l, D_h, [src]
 	sub	src, src, tmp1
+	prfm	pldl1strm, [src, 384]
 	add	count, count, tmp1	/* Count is now 16 too large.  */
 	ldp	A_l, A_h, [src, 16]
 	stp	D_l, D_h, [dstin]
@@ -169,7 +188,10 @@ L(copy_long):
 	ldp	C_l, C_h, [src, 48]
 	ldp	D_l, D_h, [src, 64]!
 	subs	count, count, 128 + 16	/* Test and readjust count.  */
-	b.ls	2f
+
+L(prefetch_loop64):
+	tbz	src, #6, 1f
+	prfm	pldl1strm, [src, 512]
 1:
 	stp	A_l, A_h, [dst, 16]
 	ldp	A_l, A_h, [src, 16]
@@ -180,12 +202,40 @@ L(copy_long):
 	stp	D_l, D_h, [dst, 64]!
 	ldp	D_l, D_h, [src, 64]!
 	subs	count, count, 64
-	b.hi	1b
+	b.hi	L(prefetch_loop64)
+	b	L(last64)
+
+L(copy_long_without_prefetch):
+#endif
+
+	and	tmp1, dstin, 15
+	bic	dst, dstin, 15
+	ldp	D_l, D_h, [src]
+	sub	src, src, tmp1
+	add	count, count, tmp1	/* Count is now 16 too large.  */
+	ldp	A_l, A_h, [src, 16]
+	stp	D_l, D_h, [dstin]
+	ldp	B_l, B_h, [src, 32]
+	ldp	C_l, C_h, [src, 48]
+	ldp	D_l, D_h, [src, 64]!
+	subs	count, count, 128 + 16	/* Test and readjust count.  */
+	b.ls	L(last64)
+L(loop64):
+	stp	A_l, A_h, [dst, 16]
+	ldp	A_l, A_h, [src, 16]
+	stp	B_l, B_h, [dst, 32]
+	ldp	B_l, B_h, [src, 32]
+	stp	C_l, C_h, [dst, 48]
+	ldp	C_l, C_h, [src, 48]
+	stp	D_l, D_h, [dst, 64]!
+	ldp	D_l, D_h, [src, 64]!
+	subs	count, count, 64
+	b.hi	L(loop64)
 
 	/* Write the last full set of 64 bytes.  The remainder is at most 64
 	   bytes, so it is safe to always copy 64 bytes from the end even if
 	   there is just 1 byte left.  */
-2:
+L(last64):
 	ldp	E_l, E_h, [srcend, -64]
 	stp	A_l, A_h, [dst, 16]
 	ldp	A_l, A_h, [srcend, -48]
@@ -256,5 +306,5 @@ L(move_long):
 	stp	C_l, C_h, [dstin]
 3:	ret
 
-END (memcpy)
-libc_hidden_builtin_def (memcpy)
+END (MEMCPY)
+libc_hidden_builtin_def (MEMCPY)
diff --git a/sysdeps/aarch64/multiarch/Makefile b/sysdeps/aarch64/multiarch/Makefile
index e69de29..78d52c7 100644
--- a/sysdeps/aarch64/multiarch/Makefile
+++ b/sysdeps/aarch64/multiarch/Makefile
@@ -0,0 +1,3 @@
+ifeq ($(subdir),string)
+sysdep_routines += memcpy_generic memcpy_thunderx
+endif
diff --git a/sysdeps/aarch64/multiarch/ifunc-impl-list.c b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
index e69de29..c6d63f6 100644
--- a/sysdeps/aarch64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
@@ -0,0 +1,61 @@
+/* Enumerate available IFUNC implementations of a function.  AARCH64 version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <assert.h>
+#include <string.h>
+#include <wchar.h>
+#include <ldsodefs.h>
+#include <ifunc-impl-list.h>
+#include <init-arch.h>
+#include <stdio.h>
+
+/* Access to the midr_el1 register is emulated by the linux kernel and
+   is slow so we save it in __midr after it is read once.  We also save
+   the value of IS_THUNDERX in __is_thunderx so it does not need to be
+   recomputed by checking multiple bits from __midr.  */
+
+uint64_t __midr attribute_hidden = 0;
+bool __is_thunderx attribute_hidden;
+
+/* Maximum number of IFUNC implementations.  */
+#define MAX_IFUNC	2
+
+size_t
+__libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
+			size_t max)
+{
+  assert (max >= MAX_IFUNC);
+
+  size_t i = 0;
+
+  INIT_ARCH ();
+
+#ifdef SHARED
+  /* Support sysdeps/aarch64/multiarch/memcpy.c.  */
+  IFUNC_IMPL (i, name, memcpy,
+	      IFUNC_IMPL_ADD (array, i, memcpy, __is_thunderx,
+			      __memcpy_thunderx)
+	      IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic))
+  IFUNC_IMPL (i, name, memmove,
+	      IFUNC_IMPL_ADD (array, i, memmove, __is_thunderx,
+			      __memmove_thunderx)
+	      IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_generic))
+#endif
+
+  return i;
+}
diff --git a/sysdeps/aarch64/multiarch/init-arch.h b/sysdeps/aarch64/multiarch/init-arch.h
index e69de29..e12ba61 100644
--- a/sysdeps/aarch64/multiarch/init-arch.h
+++ b/sysdeps/aarch64/multiarch/init-arch.h
@@ -0,0 +1,55 @@
+/* This file is part of the GNU C Library.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <elf.h>
+#include <stdint.h>
+#include <stdbool.h>
+
+#define MIDR_REVISION_MASK	0xf
+#define MIDR_REVISION(__midr)	((__midr) & MIDR_REVISION_MASK)
+#define MIDR_PARTNUM_SHIFT	4
+#define MIDR_PARTNUM_MASK	(0xfff << MIDR_PARTNUM_SHIFT)
+#define MIDR_PARTNUM(__midr)	\
+	(((__midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
+#define MIDR_ARCHITECTURE_SHIFT	16
+#define MIDR_ARCHITECTURE_MASK	(0xf << MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_ARCHITECTURE(__midr)	\
+	(((__midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_VARIANT_SHIFT	20
+#define MIDR_VARIANT_MASK	(0xf << MIDR_VARIANT_SHIFT)
+#define MIDR_VARIANT(__midr)	\
+	(((__midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
+#define MIDR_IMPLEMENTOR_SHIFT	24
+#define MIDR_IMPLEMENTOR_MASK	(0xff << MIDR_IMPLEMENTOR_SHIFT)
+#define MIDR_IMPLEMENTOR(__midr)	\
+	(((__midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
+
+#define IS_THUNDERX(__midr) (MIDR_IMPLEMENTOR(__midr) == 'C'	\
+			     && MIDR_PARTNUM(__midr) == 0x0a1)
+
+
+extern uint64_t __midr attribute_hidden;
+extern bool __is_thunderx attribute_hidden;
+
+#define INIT_ARCH()						\
+  {								\
+    if (__midr == 0)						\
+      {								\
+	asm volatile ("mrs %0, midr_el1" : "=r"(__midr));	\
+	__is_thunderx = IS_THUNDERX(__midr);			\
+      }								\
+  }
diff --git a/sysdeps/aarch64/multiarch/memcpy.c b/sysdeps/aarch64/multiarch/memcpy.c
index e69de29..b2b587b 100644
--- a/sysdeps/aarch64/multiarch/memcpy.c
+++ b/sysdeps/aarch64/multiarch/memcpy.c
@@ -0,0 +1,41 @@
+/* Multiple versions of memcpy. AARCH64 version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for the definition in lib and for
+   DSO.  In static binaries we need memcpy before the initialization
+   happened.  */
+
+#if defined SHARED && IS_IN (libc)
+/* Redefine memcpy so that the compiler won't complain about the type
+   mismatch with the IFUNC selector in strong_alias, below.  */
+# undef memcpy
+# define memcpy __redirect_memcpy
+# include <string.h>
+# include <init-arch.h>
+
+extern __typeof (__redirect_memcpy) __libc_memcpy;
+
+extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
+extern __typeof (__redirect_memcpy) __memcpy_thunderx attribute_hidden;
+
+libc_ifunc (__libc_memcpy,
+            __is_thunderx ? __memcpy_thunderx : __memcpy_generic);
+
+#undef memcpy
+strong_alias (__libc_memcpy, memcpy);
+#endif
diff --git a/sysdeps/aarch64/multiarch/memcpy_generic.S b/sysdeps/aarch64/multiarch/memcpy_generic.S
index e69de29..c0e3462 100644
--- a/sysdeps/aarch64/multiarch/memcpy_generic.S
+++ b/sysdeps/aarch64/multiarch/memcpy_generic.S
@@ -0,0 +1,42 @@
+/* A Generic Optimized memcpy implementation for AARCH64.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* The actual memcpy and memmove code is in ../memcpy.S.  If we are
+   building a shared libc using IFUNC this file defines __memcpy_generic
+   and __memmove_generic.  Otherwise the include of ../memcpy.S will
+   define the normal __memcpy and __memmove entry points.  */
+
+#include <sysdep.h>
+
+#if defined SHARED && IS_IN (libc)
+
+#define MEMCPY __memcpy_generic
+#define MEMMOVE __memmove_generic
+
+/* Do not hide the generic versions of memcpy and memmove, we use them
+   internally.  */
+#undef libc_hidden_builtin_def
+#define libc_hidden_builtin_def(name)
+
+/* It doesn't make sense to send libc-internal memcpy calls through a PLT. */
+	.globl __GI_memcpy; __GI_memcpy = __memcpy_generic
+	.globl __GI_memmove; __GI_memmove = __memmove_generic
+
+#endif
+
+#include "../memcpy.S"
diff --git a/sysdeps/aarch64/multiarch/memcpy_thunderx.S b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
index e69de29..df5e959 100644
--- a/sysdeps/aarch64/multiarch/memcpy_thunderx.S
+++ b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
@@ -0,0 +1,33 @@
+/* A Thunderx Optimized memcpy implementation for AARCH64.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* The actual thunderx optimized code is in ../memcpy.S under the USE_THUNDERX
+   ifdef.  If we are not building a shared libc with IFUNC then we do not
+   build anything when compiling this file and __memcpy is defined by
+   memcpy_generic.S.  */
+
+#include <sysdep.h>
+
+#if defined SHARED && IS_IN (libc)
+
+#define MEMCPY __memcpy_thunderx
+#define MEMMOVE __memmove_thunderx
+#define USE_THUNDERX
+#include "../memcpy.S"
+
+#endif
diff --git a/sysdeps/aarch64/multiarch/memmove.c b/sysdeps/aarch64/multiarch/memmove.c
index e69de29..c08c763 100644
--- a/sysdeps/aarch64/multiarch/memmove.c
+++ b/sysdeps/aarch64/multiarch/memmove.c
@@ -0,0 +1,40 @@
+/* Multiple versions of memmove. AARCH64 version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for the definition in lib and for
+   DSO.  In static binaries we need memmove before the initialization
+   happened.  */
+#if defined SHARED && IS_IN (libc)
+/* Redefine memmove so that the compiler won't complain about the type
+   mismatch with the IFUNC selector in strong_alias, below.  */
+# undef memmove
+# define memmove __redirect_memmove
+# include <string.h>
+# include <init-arch.h>
+
+extern __typeof (__redirect_memmove) __libc_memmove;
+
+extern __typeof (__redirect_memmove) __memmove_generic attribute_hidden;
+extern __typeof (__redirect_memmove) __memmove_thunderx attribute_hidden;
+
+libc_ifunc (__libc_memmove,
+            __is_thunderx ? __memmove_thunderx : __memmove_generic);
+
+#undef memmove
+strong_alias (__libc_memmove, memmove);
+#endif
diff --git a/sysdeps/unix/sysv/linux/aarch64/configure.ac b/sysdeps/unix/sysv/linux/aarch64/configure.ac
index 211fa9c..684cb46 100644
--- a/sysdeps/unix/sysv/linux/aarch64/configure.ac
+++ b/sysdeps/unix/sysv/linux/aarch64/configure.ac
@@ -1,6 +1,11 @@
 GLIBC_PROVIDES dnl See aclocal.m4 in the top level source directory.
 # Local configure fragment for sysdeps/unix/sysv/linux/aarch64.
 
-arch_minimum_kernel=3.7.0
+# For multi-arch support we need a kernel that emulates the mrs instruction.
+if test x$multi_arch = xyes; then
+    arch_minimum_kernel=4.11.0
+else
+    arch_minimum_kernel=3.7.0
+fi
 
 LIBC_SLIBDIR_RTLDDIR([lib64], [lib])

[-- Attachment #3: bench-memcpy.out --]
[-- Type: text/plain, Size: 26125 bytes --]

                       	builtin_memcpy	simple_memcpy	__memcpy_thunderx	__memcpy_generic
Length    1, alignment  0/ 0:	40.4688	21.25	21.0938	21.5625
Length    1, alignment  0/ 0:	24.0625	21.5625	21.4062	21.0938
Length    1, alignment  0/ 0:	23.9062	15.7812	20.7812	20.9375
Length    1, alignment  0/ 0:	24.0625	15.7812	20.7812	20.9375
Length    2, alignment  0/ 0:	24.0625	24.375	20.9375	20.9375
Length    2, alignment  1/ 0:	24.0625	23.75	20.7812	20.9375
Length    2, alignment  0/ 1:	24.0625	22.9688	20.9375	20.9375
Length    2, alignment  1/ 1:	24.0625	22.9688	20.7812	20.7812
Length    4, alignment  0/ 0:	22.9688	24.6875	19.0625	19.5312
Length    4, alignment  2/ 0:	22.0312	23.9062	18.9062	19.2188
Length    4, alignment  0/ 2:	22.0312	23.4375	18.9062	18.9062
Length    4, alignment  2/ 2:	22.0312	23.4375	18.9062	18.9062
Length    8, alignment  0/ 0:	21.875	35.3125	17.9688	18.125
Length    8, alignment  3/ 0:	30.3125	34.375	26.0938	26.25
Length    8, alignment  0/ 3:	31.875	33.5938	27.8125	27.6562
Length    8, alignment  3/ 3:	39.375	33.5938	35.1562	35.3125
Length   16, alignment  0/ 0:	20.7812	67.9688	17.6562	17.6562
Length   16, alignment  4/ 0:	30.3125	67.5	26.25	26.25
Length   16, alignment  0/ 4:	31.875	67.3438	27.6562	27.6562
Length   16, alignment  4/ 4:	39.375	67.5	35.1562	35.3125
Length   32, alignment  0/ 0:	22.3438	99.375	17.6562	17.6562
Length   32, alignment  5/ 0:	31.5625	99.2188	27.3438	27.3438
Length   32, alignment  0/ 5:	31.5625	99.2188	27.3438	27.3438
Length   32, alignment  5/ 5:	40.4688	99.2188	36.4062	36.4062
Length   64, alignment  0/ 0:	23.9062	179.219	19.6875	19.375
Length   64, alignment  6/ 0:	40.3125	179.219	36.0938	36.0938
Length   64, alignment  0/ 6:	41.7188	179.375	37.6562	37.5
Length   64, alignment  6/ 6:	57.5	179.219	53.4375	53.4375
Length  128, alignment  0/ 0:	30.4688	339.219	27.0312	25.3125
Length  128, alignment  7/ 0:	67.8125	339.219	64.375	63.125
Length  128, alignment  0/ 7:	69.8438	339.375	66.4062	64.375
Length  128, alignment  7/ 7:	72.3438	339.375	68.9062	67.3438
Length  256, alignment  0/ 0:	40.1562	659.219	35.9375	37.3438
Length  256, alignment  8/ 0:	108.594	659.219	105.156	106.094
Length  256, alignment  0/ 8:	110.469	659.375	107.188	107.5
Length  256, alignment  8/ 8:	81.4062	659.375	78.125	80
Length  512, alignment  0/ 0:	57.9688	1299.38	53.9062	52.3438
Length  512, alignment  9/ 0:	194.844	1299.53	191.719	190.156
Length  512, alignment  0/ 9:	196.875	1299.53	193.75	191.719
Length  512, alignment  9/ 9:	99.375	1299.53	96.0938	94.5312
Length 1024, alignment  0/ 0:	97.0312	2579.53	93.2812	91.4062
Length 1024, alignment 10/ 0:	372.812	2579.53	369.688	368.125
Length 1024, alignment  0/10:	375.156	2579.38	371.562	369.531
Length 1024, alignment 10/10:	139.062	2579.38	135.469	134.219
Length 2048, alignment  0/ 0:	194.375	5393.91	193.281	188.125
Length 2048, alignment 11/ 0:	713.906	5139.84	710.781	709.375
Length 2048, alignment  0/11:	715.938	5139.84	712.812	710.938
Length 2048, alignment 11/11:	235.781	5139.69	233.281	230.156
Length 4096, alignment  0/ 0:	369.062	10260.9	366.406	360.781
Length 4096, alignment 12/ 0:	1403.44	10260.9	1400.16	1398.59
Length 4096, alignment  0/12:	1405.16	10435.2	1402.66	1399.84
Length 4096, alignment 12/12:	410.781	10260.3	408.125	402.344
Length 8192, alignment  0/ 0:	717.344	20503.6	714.844	703.906
Length 8192, alignment 13/ 0:	2781.41	20740.5	2778.12	2776.72
Length 8192, alignment  0/13:	2783.28	20503.1	2779.38	2776.56
Length 8192, alignment 13/13:	759.375	20654.2	757.031	746.406
Length 16384, alignment  0/ 0:	1430.16	41006.7	1429.84	1408.59
Length 16384, alignment 14/ 0:	5563.91	41004.4	5550.16	5734.38
Length 16384, alignment  0/14:	5549.06	41002.2	5547.03	5693.75
Length 16384, alignment 14/14:	1461.88	40995.2	1461.41	1440.47
Length 32768, alignment  0/ 0:	3426.72	82858.4	3555.16	4450.62
Length 32768, alignment 15/ 0:	12141.2	83634.5	12128.8	12733.9
Length 32768, alignment  0/15:	12190.8	83705	12164.4	12682.2
Length 32768, alignment 15/15:	3548.28	83734.7	3546.41	3970
Length 65536, alignment  0/ 0:	7917.97	174364	7863.91	15286.1
Length 65536, alignment 16/ 0:	8100.16	174329	7898.44	15577.5
Length 65536, alignment  0/16:	7956.56	174425	7906.09	15322
Length 65536, alignment 16/16:	7952.34	174475	7907.34	15613.8
Length    0, alignment  0/ 0:	25.3125	17.0312	21.7188	21.7188
Length    0, alignment  0/ 0:	24.375	16.5625	21.4062	21.5625
Length    0, alignment  0/ 0:	24.375	16.4062	21.4062	21.25
Length    0, alignment  0/ 0:	24.5312	16.4062	21.25	21.4062
Length    1, alignment  0/ 0:	24.375	16.4062	21.25	20.9375
Length    1, alignment  1/ 0:	24.0625	15.9375	20.9375	21.0938
Length    1, alignment  0/ 1:	24.0625	15.9375	20.9375	21.0938
Length    1, alignment  1/ 1:	24.2188	15.9375	20.9375	20.9375
Length    2, alignment  0/ 0:	24.0625	23.4375	20.9375	21.0938
Length    2, alignment  2/ 0:	24.0625	23.4375	20.9375	21.0938
Length    2, alignment  0/ 2:	24.0625	23.125	20.9375	21.0938
Length    2, alignment  2/ 2:	24.0625	23.2812	20.9375	21.0938
Length    3, alignment  0/ 0:	24.0625	22.5	20.9375	20.9375
Length    3, alignment  3/ 0:	24.0625	21.875	20.9375	20.9375
Length    3, alignment  0/ 3:	24.0625	20.7812	20.9375	20.9375
Length    3, alignment  3/ 3:	24.2188	20.9375	20.9375	20.9375
Length    4, alignment  0/ 0:	22.3438	23.75	19.0625	18.9062
Length    4, alignment  4/ 0:	22.1875	23.5938	18.9062	19.0625
Length    4, alignment  0/ 4:	22.0312	23.4375	19.0625	18.9062
Length    4, alignment  4/ 4:	22.0312	23.4375	18.9062	19.0625
Length    5, alignment  0/ 0:	22.0312	27.1875	18.9062	18.9062
Length    5, alignment  5/ 0:	30.625	26.4062	27.5	27.3438
Length    5, alignment  0/ 5:	32.1875	25.9375	28.9062	29.0625
Length    5, alignment  5/ 5:	39.6875	25.9375	36.25	36.4062
Length    6, alignment  0/ 0:	22.0312	29.375	18.9062	18.9062
Length    6, alignment  6/ 0:	26.0938	28.75	22.9688	22.8125
Length    6, alignment  0/ 6:	27.5	28.5938	24.375	24.5312
Length    6, alignment  6/ 6:	31.0938	28.4375	27.9688	27.9688
Length    7, alignment  0/ 0:	21.875	32.5	18.9062	18.9062
Length    7, alignment  7/ 0:	25.9375	31.5625	22.9688	22.9688
Length    7, alignment  0/ 7:	27.5	30.9375	24.375	24.375
Length    7, alignment  7/ 7:	31.0938	31.0938	27.9688	27.9688
Length    8, alignment  0/ 0:	21.4062	34.0625	17.9688	17.8125
Length    8, alignment  8/ 0:	20.9375	33.9062	17.6562	17.8125
Length    8, alignment  0/ 8:	20.9375	33.75	17.8125	17.8125
Length    8, alignment  8/ 8:	20.9375	33.75	17.6562	17.8125
Length    9, alignment  0/ 0:	31.4062	37.0312	27.1875	27.3438
Length    9, alignment  9/ 0:	35.4688	36.5625	31.0938	31.25
Length    9, alignment  0/ 9:	35.4688	36.25	31.0938	31.4062
Length    9, alignment  9/ 9:	39.375	36.4062	35.3125	35.3125
Length   10, alignment  0/ 0:	31.4062	39.2188	27.3438	27.1875
Length   10, alignment 10/ 0:	35.3125	38.9062	31.25	31.25
Length   10, alignment  0/10:	35.4688	38.75	31.25	31.25
Length   10, alignment 10/10:	39.375	173.438	35.625	35.3125
Length   11, alignment  0/ 0:	31.4062	41.4062	27.3438	27.3438
Length   11, alignment 11/ 0:	35.3125	41.25	31.25	31.25
Length   11, alignment  0/11:	35.4688	41.0938	31.25	31.25
Length   11, alignment 11/11:	39.5312	41.0938	35.3125	35.3125
Length   12, alignment  0/ 0:	31.4062	44.0625	27.1875	27.1875
Length   12, alignment 12/ 0:	31.875	43.75	27.8125	27.8125
Length   12, alignment  0/12:	30.9375	43.5938	26.875	26.7188
Length   12, alignment 12/12:	30.9375	43.5938	26.7188	26.7188
Length   13, alignment  0/ 0:	31.4062	46.25	27.3438	27.1875
Length   13, alignment 13/ 0:	35.4688	46.25	31.25	31.25
Length   13, alignment  0/13:	35.3125	46.0938	31.25	31.25
Length   13, alignment 13/13:	39.375	46.25	35.3125	35.3125
Length   14, alignment  0/ 0:	31.4062	52.6562	27.3438	27.1875
Length   14, alignment 14/ 0:	35.4688	52.5	31.25	31.4062
Length   14, alignment  0/14:	35.3125	52.5	31.25	31.25
Length   14, alignment 14/14:	39.375	52.5	35.1562	35.3125
Length   15, alignment  0/ 0:	31.4062	65	27.3438	27.3438
Length   15, alignment 15/ 0:	35.3125	64.8438	31.25	31.25
Length   15, alignment  0/15:	35.4688	64.8438	31.0938	31.25
Length   15, alignment 15/15:	39.5312	64.8438	35.3125	35.3125
Length   16, alignment  0/ 0:	20.9375	67.5	17.6562	17.6562
Length   16, alignment 16/ 0:	23.2812	67.5	17.8125	17.6562
Length   16, alignment  0/16:	20.9375	67.3438	17.6562	17.6562
Length   16, alignment 16/16:	20.9375	67.3438	17.8125	17.6562
Length   17, alignment  0/ 0:	32.9688	62.0312	28.5938	28.4375
Length   17, alignment 17/ 0:	36.5625	61.875	32.5	32.5
Length   17, alignment  0/17:	36.5625	61.875	32.5	32.3438
Length   17, alignment 17/17:	40.625	61.875	36.4062	36.4062
Length   18, alignment  0/ 0:	32.5	64.375	28.4375	28.2812
Length   18, alignment 18/ 0:	36.5625	64.375	32.3438	32.3438
Length   18, alignment  0/18:	36.5625	64.375	32.5	32.3438
Length   18, alignment 18/18:	40.4688	64.375	36.4062	36.4062
Length   19, alignment  0/ 0:	32.5	66.875	28.4375	28.2812
Length   19, alignment 19/ 0:	36.7188	66.875	32.3438	32.5
Length   19, alignment  0/19:	36.5625	66.875	32.3438	32.3438
Length   19, alignment 19/19:	40.625	66.875	36.5625	36.4062
Length   20, alignment  0/ 0:	32.5	69.375	28.4375	28.2812
Length   20, alignment 20/ 0:	36.5625	69.375	32.5	32.3438
Length   20, alignment  0/20:	36.5625	69.375	32.5	32.3438
Length   20, alignment 20/20:	40.625	69.375	36.5625	36.25
Length   21, alignment  0/ 0:	32.5	71.875	28.4375	28.2812
Length   21, alignment 21/ 0:	36.5625	71.875	32.5	32.3438
Length   21, alignment  0/21:	36.5625	71.875	32.3438	32.3438
Length   21, alignment 21/21:	40.625	71.875	36.4062	36.4062
Length   22, alignment  0/ 0:	32.6562	74.375	28.4375	28.4375
Length   22, alignment 22/ 0:	36.5625	74.375	32.3438	32.3438
Length   22, alignment  0/22:	36.7188	74.375	32.3438	32.3438
Length   22, alignment 22/22:	40.4688	74.375	36.4062	36.4062
Length   23, alignment  0/ 0:	32.5	76.875	28.4375	28.2812
Length   23, alignment 23/ 0:	36.5625	76.875	32.3438	32.3438
Length   23, alignment  0/23:	36.5625	76.875	32.3438	32.3438
Length   23, alignment 23/23:	40.4688	76.875	36.4062	36.4062
Length   24, alignment  0/ 0:	32.6562	79.375	28.4375	28.4375
Length   24, alignment 24/ 0:	32.0312	79.375	27.9688	27.9688
Length   24, alignment  0/24:	32.0312	79.375	27.9688	27.9688
Length   24, alignment 24/24:	31.5625	79.375	27.5	27.3438
Length   25, alignment  0/ 0:	32.5	81.875	28.4375	28.4375
Length   25, alignment 25/ 0:	36.5625	81.875	32.5	32.3438
Length   25, alignment  0/25:	36.5625	81.875	32.5	32.3438
Length   25, alignment 25/25:	40.625	81.875	36.4062	36.4062
Length   26, alignment  0/ 0:	32.6562	84.375	28.4375	28.4375
Length   26, alignment 26/ 0:	36.5625	84.375	32.3438	32.3438
Length   26, alignment  0/26:	36.5625	84.2188	32.5	32.3438
Length   26, alignment 26/26:	40.625	84.375	36.4062	36.25
Length   27, alignment  0/ 0:	32.6562	86.7188	28.4375	28.2812
Length   27, alignment 27/ 0:	36.5625	86.875	32.3438	32.3438
Length   27, alignment  0/27:	36.7188	86.875	32.3438	32.3438
Length   27, alignment 27/27:	40.4688	86.875	36.5625	36.4062
Length   28, alignment  0/ 0:	32.6562	89.375	28.4375	28.4375
Length   28, alignment 28/ 0:	36.5625	89.375	32.5	32.3438
Length   28, alignment  0/28:	36.5625	89.2188	32.5	32.3438
Length   28, alignment 28/28:	40.625	89.375	36.4062	36.4062
Length   29, alignment  0/ 0:	32.6562	91.7188	28.4375	28.4375
Length   29, alignment 29/ 0:	36.5625	91.875	32.3438	32.3438
Length   29, alignment  0/29:	36.5625	91.875	32.3438	32.3438
Length   29, alignment 29/29:	40.4688	91.875	36.5625	36.4062
Length   30, alignment  0/ 0:	32.5	94.375	28.4375	28.4375
Length   30, alignment 30/ 0:	36.5625	94.375	32.3438	32.5
Length   30, alignment  0/30:	36.5625	94.375	32.3438	32.5
Length   30, alignment 30/30:	40.625	95.625	36.4062	36.4062
Length   31, alignment  0/ 0:	32.5	96.875	28.4375	28.2812
Length   31, alignment 31/ 0:	36.5625	96.7188	32.5	32.3438
Length   31, alignment  0/31:	36.5625	96.875	32.3438	32.3438
Length   31, alignment 31/31:	40.4688	96.875	36.4062	36.4062
Length   48, alignment  0/ 0:	22.9688	139.219	19.2188	19.2188
Length   48, alignment  3/ 0:	40.1562	139.219	36.25	36.0938
Length   48, alignment  0/ 3:	41.7188	139.375	37.6562	37.5
Length   48, alignment  3/ 3:	57.5	139.219	53.4375	53.4375
Length   80, alignment  0/ 0:	25.1562	219.219	20.7812	20.4688
Length   80, alignment  5/ 0:	51.5625	219.375	47.8125	48.2812
Length   80, alignment  0/ 5:	51.0938	219.219	46.875	46.875
Length   80, alignment  5/ 5:	78.5938	219.219	74.375	74.375
Length   96, alignment  0/ 0:	24.0625	259.375	20.4688	20.4688
Length   96, alignment  6/ 0:	51.5625	259.219	47.9688	47.9688
Length   96, alignment  0/ 6:	51.0938	259.219	46.875	46.875
Length   96, alignment  6/ 6:	78.4375	259.219	74.375	74.2188
Length  112, alignment  0/ 0:	30	299.375	26.7188	24.5312
Length  112, alignment  7/ 0:	67.8125	299.219	64.375	62.9688
Length  112, alignment  0/ 7:	69.8438	299.375	66.4062	64.375
Length  112, alignment  7/ 7:	72.3438	299.219	68.9062	67.1875
Length  144, alignment  0/ 0:	29.8438	379.375	26.4062	24.5312
Length  144, alignment  9/ 0:	67.8125	379.219	64.5312	62.9688
Length  144, alignment  0/ 9:	90.4688	379.375	86.7188	85.1562
Length  144, alignment  9/ 9:	76.5625	379.375	73.4375	72.1875
Length  160, alignment  0/ 0:	34.2188	419.375	30.9375	28.9062
Length  160, alignment 10/ 0:	87.0312	419.219	83.9062	82.3438
Length  160, alignment  0/10:	89.0625	419.375	86.0938	83.75
Length  160, alignment 10/10:	76.7188	419.375	73.4375	71.7188
Length  176, alignment  0/ 0:	34.2188	459.219	31.0938	28.75
Length  176, alignment 11/ 0:	87.1875	459.375	83.9062	82.1875
Length  176, alignment  0/11:	89.0625	459.375	85.9375	83.9062
Length  176, alignment 11/11:	76.7188	459.375	73.4375	71.7188
Length  192, alignment  0/ 0:	34.0625	499.375	31.0938	28.9062
Length  192, alignment 12/ 0:	87.1875	499.219	84.0625	82.1875
Length  192, alignment  0/12:	89.0625	499.219	85.9375	83.75
Length  192, alignment 12/12:	76.5625	499.375	73.5938	71.875
Length  208, alignment  0/ 0:	34.2188	539.375	30.9375	28.9062
Length  208, alignment 13/ 0:	87.0312	539.219	83.9062	82.3438
Length  208, alignment  0/13:	111.25	539.219	107.812	108.75
Length  208, alignment 13/13:	81.4062	539.219	78.125	80.1562
Length  224, alignment  0/ 0:	38.9062	579.375	35.7812	37.0312
Length  224, alignment 14/ 0:	108.906	579.375	105.625	106.875
Length  224, alignment  0/14:	110.938	579.375	107.812	108.438
Length  224, alignment 14/14:	81.4062	579.219	78.2812	80
Length  240, alignment  0/ 0:	38.9062	619.375	35.7812	37.0312
Length  240, alignment 15/ 0:	108.906	619.219	105.625	107.031
Length  240, alignment  0/15:	110.781	619.375	107.656	108.438
Length  240, alignment 15/15:	81.5625	619.375	78.125	80
Length  272, alignment  0/ 0:	38.9062	699.375	35.625	37.0312
Length  272, alignment 17/ 0:	108.906	699.375	105.781	106.875
Length  272, alignment  0/17:	133.125	864.688	129.531	127.812
Length  272, alignment 17/17:	85.7812	699.375	82.5	81.5625
Length  288, alignment  0/ 0:	43.4375	739.375	40.1562	38.125
Length  288, alignment 18/ 0:	130.312	739.375	127.031	125.469
Length  288, alignment  0/18:	132.344	739.375	129.062	127.031
Length  288, alignment 18/18:	85.9375	739.375	82.5	81.0938
Length  304, alignment  0/ 0:	43.2812	779.531	40	38.125
Length  304, alignment 19/ 0:	130.312	779.375	127.031	125.625
Length  304, alignment  0/19:	132.344	779.531	129.062	127.031
Length  304, alignment 19/19:	85.7812	779.375	82.5	81.0938
Length  320, alignment  0/ 0:	43.4375	819.375	40.1562	37.9688
Length  320, alignment 20/ 0:	130.312	819.375	127.031	125.469
Length  320, alignment  0/20:	132.344	819.531	128.906	127.031
Length  320, alignment 20/20:	85.9375	819.375	82.5	81.0938
Length  336, alignment  0/ 0:	43.4375	859.375	40.1562	38.125
Length  336, alignment 21/ 0:	130.312	859.375	127.031	125.625
Length  336, alignment  0/21:	155	859.375	150.781	149.375
Length  336, alignment 21/21:	90.1562	859.375	86.875	85.7812
Length  352, alignment  0/ 0:	47.6562	899.531	44.5312	42.5
Length  352, alignment 22/ 0:	151.562	899.375	148.438	146.875
Length  352, alignment  0/22:	153.594	899.375	150.469	148.281
Length  352, alignment 22/22:	90.1562	899.375	87.0312	85.4688
Length  368, alignment  0/ 0:	47.8125	939.375	44.5312	42.5
Length  368, alignment 23/ 0:	151.562	939.375	148.438	146.875
Length  368, alignment  0/23:	153.75	939.375	150.625	148.438
Length  368, alignment 23/23:	90.1562	939.375	87.0312	85.4688
Length  384, alignment  0/ 0:	47.8125	979.375	44.5312	42.5
Length  384, alignment 24/ 0:	151.562	979.375	147.969	146.406
Length  384, alignment  0/24:	153.125	979.531	149.844	147.969
Length  384, alignment 24/24:	90.1562	979.531	87.0312	85.4688
Length  400, alignment  0/ 0:	47.6562	1019.53	44.5312	42.5
Length  400, alignment 25/ 0:	151.719	1019.53	148.438	147.031
Length  400, alignment  0/25:	176.25	1019.53	172.188	170.781
Length  400, alignment 25/25:	94.8438	1187.5	91.4062	90
Length  416, alignment  0/ 0:	52.1875	1059.53	49.0625	46.875
Length  416, alignment 26/ 0:	173.125	1059.53	170	168.438
Length  416, alignment  0/26:	175.156	1059.53	172.031	169.844
Length  416, alignment 26/26:	94.6875	1059.53	91.5625	90
Length  432, alignment  0/ 0:	52.1875	1099.53	49.0625	47.0312
Length  432, alignment 27/ 0:	173.125	1099.69	170	168.594
Length  432, alignment  0/27:	175.156	1099.53	171.875	169.844
Length  432, alignment 27/27:	94.6875	1099.53	91.4062	90
Length  448, alignment  0/ 0:	52.1875	1139.53	49.0625	47.0312
Length  448, alignment 28/ 0:	173.125	1139.38	170	168.438
Length  448, alignment  0/28:	175.156	1139.53	171.875	169.844
Length  448, alignment 28/28:	94.6875	1139.53	91.4062	90
Length  464, alignment  0/ 0:	52.1875	1179.53	49.0625	47.0312
Length  464, alignment 29/ 0:	173.281	1179.53	170	168.438
Length  464, alignment  0/29:	197.344	1179.53	193.75	191.875
Length  464, alignment 29/29:	99.5312	1179.53	96.0938	94.6875
Length  480, alignment  0/ 0:	56.7188	1219.53	53.5938	51.5625
Length  480, alignment 30/ 0:	194.688	1219.53	191.562	190.156
Length  480, alignment  0/30:	196.875	1219.53	193.594	191.562
Length  480, alignment 30/30:	99.5312	1219.38	96.0938	94.5312
Length  496, alignment  0/ 0:	56.7188	1259.53	53.75	51.5625
Length  496, alignment 31/ 0:	194.688	1259.38	191.562	190
Length  496, alignment  0/31:	197.031	1259.53	193.594	191.719
Length  496, alignment 31/31:	99.375	1259.53	96.0938	94.5312
Length 1024, alignment  0/ 0:	96.4062	2579.38	93.125	91.25
Length 1024, alignment 32/ 0:	96.0938	2579.53	94.2188	91.0938
Length 1024, alignment  0/32:	96.25	2579.53	93.2812	90.9375
Length 1024, alignment 32/32:	96.4062	2579.53	92.9688	91.0938
Length 1056, alignment  0/ 0:	104.688	2659.53	101.094	99.2188
Length 1056, alignment 33/ 0:	391.719	2659.38	388.125	386.719
Length 1056, alignment  0/33:	393.594	2659.53	390.156	388.125
Length 1056, alignment 33/33:	143.75	2659.53	140.156	145.781
Length 1088, alignment  0/ 0:	101.094	2739.53	97.5	102.812
Length 1088, alignment 34/ 0:	391.719	2739.53	388.125	386.562
Length 1088, alignment  0/34:	393.594	2739.38	390	388.125
Length 1088, alignment 34/34:	143.594	2739.53	140	145.625
Length 1120, alignment  0/ 0:	105.312	2819.53	102.031	107.031
Length 1120, alignment 35/ 0:	413.125	2819.53	409.688	408.125
Length 1120, alignment  0/35:	415.156	2819.38	411.719	409.688
Length 1120, alignment 35/35:	148.125	2819.53	144.531	150.156
Length 1152, alignment  0/ 0:	105.781	3025.62	102.5	107.188
Length 1152, alignment 36/ 0:	413.281	2899.38	409.844	408.125
Length 1152, alignment  0/36:	415.156	2899.53	411.719	409.844
Length 1152, alignment 36/36:	148.125	2899.53	144.531	150.156
Length 1184, alignment  0/ 0:	110.156	2979.53	106.562	111.719
Length 1184, alignment 37/ 0:	434.688	2979.53	431.25	429.688
Length 1184, alignment  0/37:	436.719	2979.53	433.281	431.094
Length 1184, alignment 37/37:	152.656	2979.53	149.219	154.688
Length 1216, alignment  0/ 0:	110	3059.53	106.719	111.719
Length 1216, alignment 38/ 0:	434.688	3059.53	431.25	429.844
Length 1216, alignment  0/38:	436.719	3059.38	433.125	431.25
Length 1216, alignment 38/38:	152.812	3059.53	149.219	154.688
Length 1248, alignment  0/ 0:	114.531	3139.53	110.938	109.219
Length 1248, alignment 39/ 0:	456.25	3264.84	453.125	451.25
Length 1248, alignment  0/39:	458.281	3139.38	454.688	452.812
Length 1248, alignment 39/39:	157.188	3139.53	153.594	152.188
Length 1280, alignment  0/ 0:	114.531	3219.38	111.094	109.219
Length 1280, alignment 40/ 0:	456.094	3219.38	452.5	451.094
Length 1280, alignment  0/40:	458.125	3219.53	454.688	452.656
Length 1280, alignment 40/40:	157.188	3219.53	153.594	152.188
Length 1312, alignment  0/ 0:	121.562	3299.84	115.625	113.594
Length 1312, alignment 41/ 0:	477.812	3299.53	474.375	472.812
Length 1312, alignment  0/41:	479.844	3299.53	476.25	474.375
Length 1312, alignment 41/41:	161.562	3299.53	158.125	156.719
Length 1344, alignment  0/ 0:	119.219	3379.53	115.781	113.75
Length 1344, alignment 42/ 0:	477.812	3379.53	636.406	473.281
Length 1344, alignment  0/42:	479.844	3379.53	476.094	474.375
Length 1344, alignment 42/42:	161.562	3379.53	158.125	156.719
Length 1376, alignment  0/ 0:	123.594	3459.53	120.156	118.281
Length 1376, alignment 43/ 0:	499.219	3459.53	495.938	494.375
Length 1376, alignment  0/43:	501.406	3459.53	497.812	495.625
Length 1376, alignment 43/43:	166.25	3459.38	162.656	161.25
Length 1408, alignment  0/ 0:	123.75	3539.53	120.156	118.281
Length 1408, alignment 44/ 0:	499.219	3539.53	495.781	494.375
Length 1408, alignment  0/44:	501.406	3539.53	497.969	496.094
Length 1408, alignment 44/44:	166.25	3539.38	162.656	161.094
Length 1440, alignment  0/ 0:	128.281	3619.53	124.531	122.812
Length 1440, alignment 45/ 0:	520.938	3750.94	517.812	515.938
Length 1440, alignment  0/45:	522.969	3619.53	519.531	517.5
Length 1440, alignment 45/45:	170.625	3619.38	167.188	165.625
Length 1472, alignment  0/ 0:	128.281	3699.53	124.688	122.812
Length 1472, alignment 46/ 0:	521.094	3699.53	517.5	516.094
Length 1472, alignment  0/46:	522.812	3699.53	519.375	517.5
Length 1472, alignment 46/46:	170.781	3699.53	167.188	165.625
Length 1504, alignment  0/ 0:	132.812	3779.38	129.219	127.188
Length 1504, alignment 47/ 0:	542.344	3779.53	538.906	537.188
Length 1504, alignment  0/47:	545.156	3779.53	540.469	538.594
Length 1504, alignment 47/47:	175.312	3779.38	171.875	170.469
Length 1536, alignment  0/ 0:	132.812	3982.5	129.531	127.344
Length 1536, alignment 48/ 0:	132.812	3859.53	129.375	127.344
Length 1536, alignment  0/48:	133.281	3859.53	129.844	127.812
Length 1536, alignment 48/48:	133.281	3859.53	129.844	127.812
Length 1568, alignment  0/ 0:	138.438	3939.38	134.219	132.031
Length 1568, alignment 49/ 0:	563.594	3939.53	560	558.594
Length 1568, alignment  0/49:	565.469	3939.38	562.031	560
Length 1568, alignment 49/49:	180.625	3939.53	177.031	174.844
Length 1600, alignment  0/ 0:	137.5	4019.53	133.906	132.031
Length 1600, alignment 50/ 0:	563.438	4019.53	560	558.594
Length 1600, alignment  0/50:	565.469	4019.53	562.031	683.125
Length 1600, alignment 50/50:	180.625	4019.53	177.188	175.469
Length 1632, alignment  0/ 0:	142.5	4099.53	138.75	137.031
Length 1632, alignment 51/ 0:	585	4099.53	581.406	580
Length 1632, alignment  0/51:	587.031	4099.53	583.594	581.562
Length 1632, alignment 51/51:	188.75	4099.53	185.469	182.5
Length 1664, alignment  0/ 0:	142.5	4179.53	138.906	136.875
Length 1664, alignment 52/ 0:	585	4179.53	581.406	580
Length 1664, alignment  0/52:	587.031	4179.53	583.438	581.562
Length 1664, alignment 52/52:	188.281	4179.53	181.406	180.625
Length 1696, alignment  0/ 0:	154.688	4259.38	146.406	143.438
Length 1696, alignment 53/ 0:	732.5	4259.84	603.125	601.562
Length 1696, alignment  0/53:	608.594	4259.53	605	602.969
Length 1696, alignment 53/53:	199.375	4259.53	201.562	197.656
Length 1728, alignment  0/ 0:	148.906	4339.53	146.875	143.125
Length 1728, alignment 54/ 0:	606.406	4339.53	603.125	601.562
Length 1728, alignment  0/54:	608.438	4339.53	605	603.125
Length 1728, alignment 54/54:	197.5	4339.53	190.312	186.094
Length 1760, alignment  0/ 0:	153.906	4419.38	150.938	148.125
Length 1760, alignment 55/ 0:	627.969	4419.53	624.375	623.125
Length 1760, alignment  0/55:	630	4542.03	626.719	624.531
Length 1760, alignment 55/55:	213.906	4419.53	211.25	208.125
Length 1792, alignment  0/ 0:	152.969	4499.53	152.969	146.406
Length 1792, alignment 56/ 0:	628.125	4499.53	624.531	623.125
Length 1792, alignment  0/56:	630	4499.53	626.406	624.531
Length 1792, alignment 56/56:	212.031	4499.38	209.688	206.562
Length 1824, alignment  0/ 0:	164.844	4579.38	173.125	158.438
Length 1824, alignment 57/ 0:	649.531	4579.53	646.094	644.531
Length 1824, alignment  0/57:	651.562	4579.53	647.969	645.938
Length 1824, alignment 57/57:	218.281	4702.34	218.438	212.812
Length 1856, alignment  0/ 0:	168.281	4659.53	162.344	162.5
Length 1856, alignment 58/ 0:	649.531	4659.53	645.938	644.531
Length 1856, alignment  0/58:	651.406	4659.53	647.969	646.094
Length 1856, alignment 58/58:	218.281	4659.53	215.625	212.656
Length 1888, alignment  0/ 0:	168.594	4739.53	167.656	168.438
Length 1888, alignment 59/ 0:	671.094	4739.38	667.5	666.094
Length 1888, alignment  0/59:	672.969	4739.53	669.531	667.5
Length 1888, alignment 59/59:	224.844	4739.38	222.344	218.906
Length 1920, alignment  0/ 0:	173.594	4974.38	174.531	162.812
Length 1920, alignment 60/ 0:	670.938	4819.53	667.5	665.938
Length 1920, alignment  0/60:	672.969	4819.38	669.531	667.5
Length 1920, alignment 60/60:	224.688	4819.53	222.656	219.219
Length 1952, alignment  0/ 0:	190.156	4899.38	187.344	183.906
Length 1952, alignment 61/ 0:	692.5	4899.53	688.906	687.5
Length 1952, alignment  0/61:	694.531	4899.38	690.938	689.062
Length 1952, alignment 61/61:	229.062	4899.53	226.562	222.969
Length 1984, alignment  0/ 0:	190.156	4979.53	187.344	184.062
Length 1984, alignment 62/ 0:	692.5	4981.25	689.844	687.5
Length 1984, alignment  0/62:	694.531	4979.53	691.094	688.906
Length 1984, alignment 62/62:	229.375	4979.53	226.562	222.969
Length 2016, alignment  0/ 0:	194.375	5059.53	191.562	188.281
Length 2016, alignment 63/ 0:	713.906	5059.84	710.781	709.375
Length 2016, alignment  0/63:	715.938	5059.69	712.812	710.781
Length 2016, alignment 63/63:	235.625	5059.69	233.438	230
Length 4096, alignment  0/ 0:	369.219	10398	367.812	360.938

[-- Attachment #4: bench-memcpy-large.out --]
[-- Type: text/plain, Size: 2098 bytes --]

                       	__memcpy_thunderx	__memcpy_generic
Length 65543, alignment  0/ 0:	8083.75	15541.2
Length 65551, alignment  0/ 3:	24720.6	32405
Length 65567, alignment  3/ 0:	24486.2	33813.8
Length 65599, alignment  3/ 5:	24731.9	32400.6
Length 131079, alignment  0/ 0:	15959.4	31031.9
Length 131087, alignment  0/ 3:	49411.9	64778.1
Length 131103, alignment  3/ 0:	49505.6	66046.2
Length 131135, alignment  3/ 5:	49401.9	64780
Length 262151, alignment  0/ 0:	31648.1	62405
Length 262159, alignment  0/ 3:	98538.8	129344
Length 262175, alignment  3/ 0:	97577.5	132878
Length 262207, alignment  3/ 5:	99119.4	129346
Length 524295, alignment  0/ 0:	63994.4	124135
Length 524303, alignment  0/ 3:	199494	259969
Length 524319, alignment  3/ 0:	198194	264828
Length 524351, alignment  3/ 5:	199118	259842
Length 1048583, alignment  0/ 0:	152811	259784
Length 1048591, alignment  0/ 3:	422906	529983
Length 1048607, alignment  3/ 0:	424201	540640
Length 1048639, alignment  3/ 5:	422879	529976
Length 2097159, alignment  0/ 0:	276857	501925
Length 2097167, alignment  0/ 3:	812467	1.04305e+06
Length 2097183, alignment  3/ 0:	810185	1.06351e+06
Length 2097215, alignment  3/ 5:	812467	1.04307e+06
Length 4194311, alignment  0/ 0:	524355	986463
Length 4194319, alignment  0/ 3:	1.59268e+06	2.06977e+06
Length 4194335, alignment  3/ 0:	1.5818e+06	2.11026e+06
Length 4194367, alignment  3/ 5:	1.59222e+06	2.06932e+06
Length 8388615, alignment  0/ 0:	1.12852e+06	3.00444e+06
Length 8388623, alignment  0/ 3:	3.17872e+06	5.16414e+06
Length 8388639, alignment  3/ 0:	3.15213e+06	5.23659e+06
Length 8388671, alignment  3/ 5:	3.179e+06	5.1543e+06
Length 16777223, alignment  0/ 0:	3.54774e+06	1.30525e+07
Length 16777231, alignment  0/ 3:	6.8e+06	1.77641e+07
Length 16777247, alignment  3/ 0:	6.72802e+06	1.7955e+07
Length 16777279, alignment  3/ 5:	6.80436e+06	1.77679e+07
Length 33554439, alignment  0/ 0:	7.34141e+06	2.62947e+07
Length 33554447, alignment  0/ 3:	1.36974e+07	3.57826e+07
Length 33554463, alignment  3/ 0:	1.37467e+07	3.6138e+07
Length 33554495, alignment  3/ 5:	1.36981e+07	3.57831e+07

[-- Attachment #5: bench-memmove.out --]
[-- Type: text/plain, Size: 21879 bytes --]

                       	simple_memmove	__memmove_thunderx	__memmove_generic
Length    1, alignment  0/32:	37.1875	26.875	22.6562
Length    1, alignment 32/ 0:	18.2812	22.0312	22.0312
Length    1, alignment  0/ 0:	17.3438	21.875	21.875
Length    1, alignment  0/ 0:	16.875	21.875	21.875
Length    2, alignment  0/32:	22.0312	21.875	21.875
Length    2, alignment 32/ 0:	20.4688	21.875	21.875
Length    2, alignment  0/ 1:	21.25	21.875	21.875
Length    2, alignment  1/ 0:	20	21.875	21.875
Length    4, alignment  0/32:	27.0312	20.7812	20.3125
Length    4, alignment 32/ 0:	25.4688	19.8438	19.8438
Length    4, alignment  0/ 2:	26.4062	19.8438	19.6875
Length    4, alignment  2/ 0:	24.6875	19.8438	19.6875
Length    8, alignment  0/32:	37.1875	19.0625	18.75
Length    8, alignment 32/ 0:	35.625	18.125	18.2812
Length    8, alignment  0/ 3:	35.9375	28.75	28.5938
Length    8, alignment  3/ 0:	34.375	37.1875	37.1875
Length   16, alignment  0/32:	66.875	18.2812	18.125
Length   16, alignment 32/ 0:	69.0625	18.4375	18.125
Length   16, alignment  0/ 4:	66.25	28.5938	28.5938
Length   16, alignment  4/ 0:	68.2812	37.1875	37.0312
Length   32, alignment  0/32:	102.188	19.5312	19.2188
Length   32, alignment 32/ 0:	100.312	18.9062	19.0625
Length   32, alignment  0/ 5:	102.344	28.5938	28.5938
Length   32, alignment  5/ 0:	100.156	33.5938	33.5938
Length   64, alignment  0/32:	182.344	21.25	21.0938
Length   64, alignment 32/ 0:	180.312	20.625	20.4688
Length   64, alignment  0/ 6:	182.188	38.9062	38.5938
Length   64, alignment  6/ 0:	180.312	47.3438	47.3438
Length  128, alignment  0/32:	342.344	26.25	25.625
Length  128, alignment 32/ 0:	340.156	28.125	26.7188
Length  128, alignment  0/ 7:	342.344	65.1562	65
Length  128, alignment  7/ 0:	340.156	70.9375	69.2188
Length  256, alignment  0/32:	662.344	35.4688	35.3125
Length  256, alignment 32/ 0:	660.312	37.1875	36.0938
Length  256, alignment  0/ 8:	662.344	106.406	106.719
Length  256, alignment  8/ 0:	660.312	111.094	109.688
Length  512, alignment  0/32:	1302.34	53.75	53.75
Length  512, alignment 32/ 0:	1300.47	55.3125	53.9062
Length  512, alignment  0/ 9:	1302.34	192.656	192.5
Length  512, alignment  9/ 0:	1300.31	197.656	195.938
Length 1024, alignment  0/32:	2582.34	93.125	93.125
Length 1024, alignment 32/ 0:	2580.31	94.375	93.2812
Length 1024, alignment  0/10:	2582.34	367.188	366.875
Length 1024, alignment 10/ 0:	2580.47	375.625	371.094
Length 2048, alignment  0/32:	5537.97	193.75	190.469
Length 2048, alignment 32/ 0:	5142.81	192.812	189.375
Length 2048, alignment  0/11:	5142.66	712.031	712.031
Length 2048, alignment 11/ 0:	5141.09	716.719	715
Length 4096, alignment  0/32:	10263.1	366.094	362.031
Length 4096, alignment 32/ 0:	10261.7	367.031	361.25
Length 4096, alignment  0/12:	10263.3	1400.94	1564.69
Length 4096, alignment 12/ 0:	10261.6	1405.31	1403.75
Length    0, alignment  0/32:	19.6875	23.2812	22.9688
Length    0, alignment 32/ 0:	17.8125	22.3438	22.3438
Length    0, alignment  0/ 0:	17.5	22.1875	22.1875
Length    0, alignment  0/ 0:	17.1875	22.1875	22.1875
Length    1, alignment  0/32:	19.2188	22.0312	22.0312
Length    1, alignment 32/ 0:	17.1875	22.0312	21.875
Length    1, alignment  0/ 1:	19.0625	21.7188	21.875
Length    1, alignment  1/ 0:	17.0312	21.875	21.7188
Length    2, alignment  0/32:	21.4062	21.875	21.875
Length    2, alignment 32/ 0:	19.5312	21.875	22.0312
Length    2, alignment  0/ 2:	21.0938	21.875	21.875
Length    2, alignment  2/ 0:	19.5312	21.875	21.875
Length    3, alignment  0/32:	23.9062	21.7188	21.875
Length    3, alignment 32/ 0:	22.9688	21.875	21.875
Length    3, alignment  0/ 3:	23.2812	21.875	21.875
Length    3, alignment  3/ 0:	22.1875	21.875	21.875
Length    4, alignment  0/32:	26.25	20	19.8438
Length    4, alignment 32/ 0:	24.375	19.8438	19.6875
Length    4, alignment  0/ 4:	26.0938	19.8438	19.6875
Length    4, alignment  4/ 0:	24.5312	19.8438	19.6875
Length    5, alignment  0/32:	29.5312	19.6875	19.6875
Length    5, alignment 32/ 0:	27.9688	19.8438	19.6875
Length    5, alignment  0/ 5:	28.75	29.6875	29.6875
Length    5, alignment  5/ 0:	27.1875	28.2812	28.2812
Length    6, alignment  0/32:	35.4688	19.6875	19.6875
Length    6, alignment 32/ 0:	30.3125	19.8438	19.6875
Length    6, alignment  0/ 6:	31.25	25.3125	25.1562
Length    6, alignment  6/ 0:	29.8438	23.75	23.75
Length    7, alignment  0/32:	34.375	19.8438	19.6875
Length    7, alignment 32/ 0:	32.5	19.8438	19.6875
Length    7, alignment  0/ 7:	33.4375	25.3125	25.1562
Length    7, alignment  7/ 0:	32.1875	23.75	23.75
Length    8, alignment  0/32:	36.4062	18.2812	18.4375
Length    8, alignment 32/ 0:	35	18.2812	18.2812
Length    8, alignment  0/ 8:	36.0938	18.125	18.125
Length    8, alignment  8/ 0:	34.5312	18.2812	18.125
Length    9, alignment  0/32:	38.9062	28.2812	28.125
Length    9, alignment 32/ 0:	37.5	28.125	28.125
Length    9, alignment  0/ 9:	37.9688	37.0312	37.1875
Length    9, alignment  9/ 0:	37.0312	32.1875	32.1875
Length   10, alignment  0/32:	41.4062	28.125	28.125
Length   10, alignment 32/ 0:	40	28.125	28.125
Length   10, alignment  0/10:	41.0938	37.1875	37.0312
Length   10, alignment 10/ 0:	39.6875	32.1875	32.0312
Length   11, alignment  0/32:	43.4375	28.125	28.125
Length   11, alignment 32/ 0:	42.3438	28.125	27.9688
Length   11, alignment  0/11:	42.9688	37.1875	37.0312
Length   11, alignment 11/ 0:	41.875	32.1875	32.0312
Length   12, alignment  0/32:	46.25	28.125	28.125
Length   12, alignment 32/ 0:	44.5312	28.125	28.125
Length   12, alignment  0/12:	45.9375	27.6562	27.5
Length   12, alignment 12/ 0:	44.375	28.5938	28.5938
Length   13, alignment  0/32:	48.5938	28.2812	28.125
Length   13, alignment 32/ 0:	46.875	28.125	28.125
Length   13, alignment  0/13:	48.125	32.1875	32.0312
Length   13, alignment 13/ 0:	46.875	32.1875	32.1875
Length   14, alignment  0/32:	50.7812	28.2812	28.125
Length   14, alignment 32/ 0:	53.9062	28.125	28.125
Length   14, alignment  0/14:	50.7812	32.1875	32.0312
Length   14, alignment 14/ 0:	53.75	32.0312	32.0312
Length   15, alignment  0/32:	62.3438	28.125	28.125
Length   15, alignment 32/ 0:	60.4688	28.2812	28.125
Length   15, alignment  0/15:	62.3438	32.1875	32.0312
Length   15, alignment 15/ 0:	60.3125	32.1875	32.0312
Length   16, alignment  0/32:	66.25	18.125	18.125
Length   16, alignment 32/ 0:	66.875	18.2812	18.125
Length   16, alignment  0/16:	66.25	18.2812	18.125
Length   16, alignment 16/ 0:	66.875	18.4375	18.125
Length   17, alignment  0/32:	63.5938	29.8438	29.8438
Length   17, alignment 32/ 0:	70.7812	29.8438	29.6875
Length   17, alignment  0/17:	63.4375	33.75	33.5938
Length   17, alignment 17/ 0:	70.7812	33.5938	33.75
Length   18, alignment  0/32:	67.3438	29.5312	29.5312
Length   18, alignment 32/ 0:	73.4375	29.6875	29.6875
Length   18, alignment  0/18:	67.1875	33.5938	33.5938
Length   18, alignment 18/ 0:	73.2812	33.5938	33.5938
Length   19, alignment  0/32:	68.5938	29.6875	29.5312
Length   19, alignment 32/ 0:	75.9375	29.5312	29.5312
Length   19, alignment  0/19:	68.2812	33.5938	33.5938
Length   19, alignment 19/ 0:	75.9375	35.3125	33.5938
Length   20, alignment  0/32:	72.3438	29.6875	29.5312
Length   20, alignment 32/ 0:	70.3125	29.6875	29.5312
Length   20, alignment  0/20:	72.3438	33.75	33.75
Length   20, alignment 20/ 0:	70.3125	33.75	33.75
Length   21, alignment  0/32:	73.5938	29.6875	29.5312
Length   21, alignment 32/ 0:	72.9688	29.6875	29.5312
Length   21, alignment  0/21:	73.2812	33.5938	33.5938
Length   21, alignment 21/ 0:	72.8125	33.5938	33.5938
Length   22, alignment  0/32:	77.3438	29.6875	29.6875
Length   22, alignment 32/ 0:	75.3125	29.6875	29.6875
Length   22, alignment  0/22:	77.3438	33.75	33.5938
Length   22, alignment 22/ 0:	75.3125	33.75	33.5938
Length   23, alignment  0/32:	78.5938	29.6875	29.6875
Length   23, alignment 32/ 0:	77.8125	29.6875	29.6875
Length   23, alignment  0/23:	78.4375	33.5938	33.5938
Length   23, alignment 23/ 0:	77.9688	33.5938	33.5938
Length   24, alignment  0/32:	82.1875	29.6875	29.6875
Length   24, alignment 32/ 0:	80.4688	29.6875	29.6875
Length   24, alignment  0/24:	82.3438	29.2188	29.0625
Length   24, alignment 24/ 0:	80.3125	29.2188	29.0625
Length   25, alignment  0/32:	83.5938	29.6875	29.5312
Length   25, alignment 32/ 0:	82.8125	30.1562	30
Length   25, alignment  0/25:	83.4375	33.5938	33.5938
Length   25, alignment 25/ 0:	82.8125	33.75	33.5938
Length   26, alignment  0/32:	87.3438	29.6875	29.6875
Length   26, alignment 32/ 0:	85.4688	29.6875	29.5312
Length   26, alignment  0/26:	87.3438	33.75	33.5938
Length   26, alignment 26/ 0:	85.3125	33.75	33.5938
Length   27, alignment  0/32:	88.4375	29.6875	29.6875
Length   27, alignment 32/ 0:	87.8125	29.6875	29.5312
Length   27, alignment  0/27:	88.2812	33.5938	33.5938
Length   27, alignment 27/ 0:	87.8125	33.75	33.5938
Length   28, alignment  0/32:	92.3438	29.6875	29.6875
Length   28, alignment 32/ 0:	90.3125	29.6875	29.5312
Length   28, alignment  0/28:	92.1875	33.75	33.5938
Length   28, alignment 28/ 0:	90.3125	33.75	33.5938
Length   29, alignment  0/32:	93.5938	29.6875	29.5312
Length   29, alignment 32/ 0:	92.8125	29.6875	29.6875
Length   29, alignment  0/29:	93.2812	33.5938	33.5938
Length   29, alignment 29/ 0:	92.9688	33.5938	33.5938
Length   30, alignment  0/32:	97.1875	29.6875	29.6875
Length   30, alignment 32/ 0:	95.3125	29.6875	29.5312
Length   30, alignment  0/30:	97.3438	33.5938	33.5938
Length   30, alignment 30/ 0:	95.3125	33.75	33.5938
Length   31, alignment  0/32:	98.5938	29.6875	29.5312
Length   31, alignment 32/ 0:	97.9688	29.6875	29.5312
Length   31, alignment  0/31:	98.2812	33.75	33.75
Length   31, alignment 31/ 0:	97.8125	33.5938	33.5938
Length   48, alignment  0/32:	142.188	20.4688	20.4688
Length   48, alignment 32/ 0:	140.469	20.4688	20.4688
Length   48, alignment  0/ 3:	142.344	38.9062	38.75
Length   48, alignment  3/ 0:	140.312	47.3438	47.3438
Length   80, alignment  0/32:	222.188	22.5	22.3438
Length   80, alignment 32/ 0:	220.312	21.875	21.7188
Length   80, alignment  0/ 5:	222.344	48.125	48.125
Length   80, alignment  5/ 0:	220.312	49.375	49.0625
Length   96, alignment  0/32:	262.188	21.5625	21.5625
Length   96, alignment 32/ 0:	260.312	21.5625	21.5625
Length   96, alignment  0/ 6:	262.188	48.125	48.125
Length   96, alignment  6/ 0:	260.312	49.375	49.0625
Length  112, alignment  0/32:	302.344	25.9375	25.4688
Length  112, alignment 32/ 0:	300.156	27.8125	26.0938
Length  112, alignment  0/ 7:	302.188	65	64.8438
Length  112, alignment  7/ 0:	300.312	70.625	68.9062
Length  144, alignment  0/32:	382.344	31.0938	30.9375
Length  144, alignment 32/ 0:	380.312	27.3438	25.9375
Length  144, alignment  0/ 9:	382.344	85.4688	85
Length  144, alignment  9/ 0:	380.156	70.3125	68.75
Length  160, alignment  0/32:	422.344	30.625	30.3125
Length  160, alignment 32/ 0:	420.312	32.6562	31.0938
Length  160, alignment  0/10:	422.5	85	84.6875
Length  160, alignment 10/ 0:	420.312	105.156	103.75
Length  176, alignment  0/32:	462.344	30.4688	30.3125
Length  176, alignment 32/ 0:	460.312	32.0312	30.4688
Length  176, alignment  0/11:	462.344	84.8438	84.8438
Length  176, alignment 11/ 0:	460.312	89.8438	88.4375
Length  192, alignment  0/32:	502.344	30.4688	30.3125
Length  192, alignment 32/ 0:	500.312	32.0312	30.4688
Length  192, alignment  0/12:	502.344	84.8438	84.6875
Length  192, alignment 12/ 0:	500.312	90	88.4375
Length  208, alignment  0/32:	708.281	35.4688	35.1562
Length  208, alignment 32/ 0:	540.312	32.0312	30.4688
Length  208, alignment  0/13:	542.5	106.406	106.562
Length  208, alignment 13/ 0:	540.312	89.8438	88.4375
Length  224, alignment  0/32:	582.344	34.8438	35
Length  224, alignment 32/ 0:	580.312	36.5625	35.3125
Length  224, alignment  0/14:	582.344	106.406	106.406
Length  224, alignment 14/ 0:	580.312	126.25	125.156
Length  240, alignment  0/32:	622.344	34.8438	35
Length  240, alignment 32/ 0:	620.312	36.4062	35
Length  240, alignment  0/15:	622.344	106.406	106.406
Length  240, alignment 15/ 0:	620.312	111.25	109.844
Length  272, alignment  0/32:	702.344	40	39.8438
Length  272, alignment 32/ 0:	700.469	36.25	35
Length  272, alignment  0/17:	702.344	127.969	127.969
Length  272, alignment 17/ 0:	700.312	106.25	104.844
Length  288, alignment  0/32:	742.344	39.6875	39.375
Length  288, alignment 32/ 0:	740.312	42.3438	40.1562
Length  288, alignment  0/18:	742.344	128.125	127.812
Length  288, alignment 18/ 0:	740.781	128.594	126.406
Length  304, alignment  0/32:	782.188	39.375	39.2188
Length  304, alignment 32/ 0:	780.312	41.0938	39.2188
Length  304, alignment  0/19:	782.188	127.969	127.812
Length  304, alignment 19/ 0:	780.469	128.125	126.25
Length  320, alignment  0/32:	822.344	39.5312	39.375
Length  320, alignment 32/ 0:	820.469	40.9375	39.375
Length  320, alignment  0/20:	822.344	127.969	127.812
Length  320, alignment 20/ 0:	820.312	127.969	126.25
Length  336, alignment  0/32:	862.188	44.6875	44.375
Length  336, alignment 32/ 0:	860.312	41.0938	39.375
Length  336, alignment  0/21:	862.344	149.531	149.375
Length  336, alignment 21/ 0:	860.469	127.969	126.094
Length  352, alignment  0/32:	902.188	43.9062	43.9062
Length  352, alignment 32/ 0:	900.312	46.5625	44.6875
Length  352, alignment  0/22:	902.344	149.375	149.219
Length  352, alignment 22/ 0:	900.312	149.531	148.281
Length  368, alignment  0/32:	942.344	43.9062	43.75
Length  368, alignment 32/ 0:	940.469	45.3125	43.9062
Length  368, alignment  0/23:	942.344	149.531	149.219
Length  368, alignment 23/ 0:	940.469	149.375	147.969
Length  384, alignment  0/32:	982.344	43.9062	43.9062
Length  384, alignment 32/ 0:	980.469	45.4688	43.9062
Length  384, alignment  0/24:	1129.22	149.219	149.375
Length  384, alignment 24/ 0:	980.312	148.75	147.344
Length  400, alignment  0/32:	1022.19	49.0625	49.0625
Length  400, alignment 32/ 0:	1020.31	45.3125	44.0625
Length  400, alignment  0/25:	1022.34	171.094	170.938
Length  400, alignment 25/ 0:	1020.47	149.375	147.969
Length  416, alignment  0/32:	1062.34	48.75	48.2812
Length  416, alignment 32/ 0:	1060.31	51.0938	49.375
Length  416, alignment  0/26:	1062.34	171.094	170.781
Length  416, alignment 26/ 0:	1060.47	170.938	169.844
Length  432, alignment  0/32:	1102.34	48.75	48.2812
Length  432, alignment 32/ 0:	1100.47	49.8438	48.4375
Length  432, alignment  0/27:	1102.34	171.094	170.781
Length  432, alignment 27/ 0:	1100.47	170.938	169.531
Length  448, alignment  0/32:	1142.34	48.5938	48.2812
Length  448, alignment 32/ 0:	1140.47	49.8438	48.5938
Length  448, alignment  0/28:	1142.34	170.938	170.938
Length  448, alignment 28/ 0:	1140.31	170.781	169.531
Length  464, alignment  0/32:	1182.34	53.2812	53.2812
Length  464, alignment 32/ 0:	1180.47	49.8438	48.4375
Length  464, alignment  0/29:	1182.34	192.5	192.344
Length  464, alignment 29/ 0:	1180.47	170.938	169.531
Length  480, alignment  0/32:	1222.34	52.9688	52.8125
Length  480, alignment 32/ 0:	1220.31	54.6875	53.2812
Length  480, alignment  0/30:	1222.34	192.5	192.188
Length  480, alignment 30/ 0:	1220.47	192.5	190.938
Length  496, alignment  0/32:	1262.34	53.125	52.8125
Length  496, alignment 32/ 0:	1260.31	54.2188	52.6562
Length  496, alignment  0/31:	1262.5	192.5	192.188
Length  496, alignment 31/ 0:	1260.47	192.344	190.781
Length 1024, alignment  0/ 0:	2580.47	18.4375	18.4375
Length 1024, alignment 32/ 0:	2702.97	94.8438	92.6562
Length 1024, alignment  0/32:	2582.5	92.9688	92.8125
Length 1024, alignment 32/32:	2580.16	18.2812	18.2812
Length 1056, alignment  0/ 0:	2660.47	18.125	18.125
Length 1056, alignment 33/ 0:	2660.78	389.375	387.656
Length 1056, alignment  0/33:	2662.81	389.688	389.531
Length 1056, alignment 33/33:	2660.62	18.125	18.125
Length 1088, alignment  0/ 0:	2740.78	18.125	18.125
Length 1088, alignment 34/ 0:	2740.78	389.375	387.656
Length 1088, alignment  0/34:	2742.66	389.531	389.531
Length 1088, alignment 34/34:	2740.78	18.2812	18.125
Length 1120, alignment  0/ 0:	2820.78	18.125	18.125
Length 1120, alignment 35/ 0:	2820.62	410.781	409.219
Length 1120, alignment  0/35:	2822.81	411.094	410.938
Length 1120, alignment 35/35:	2820.78	18.125	18.125
Length 1152, alignment  0/ 0:	2900.78	18.2812	18.125
Length 1152, alignment 36/ 0:	2900.62	410.781	409.062
Length 1152, alignment  0/36:	3014.69	411.562	411.25
Length 1152, alignment 36/36:	2900.78	18.2812	17.9688
Length 1184, alignment  0/ 0:	2980.78	18.125	18.125
Length 1184, alignment 37/ 0:	2980.78	432.344	430.625
Length 1184, alignment  0/37:	2982.81	432.5	432.5
Length 1184, alignment 37/37:	2980.78	18.125	18.125
Length 1216, alignment  0/ 0:	3062.03	18.4375	18.125
Length 1216, alignment 38/ 0:	3060.16	432.188	430.625
Length 1216, alignment  0/38:	3062.34	432.656	432.344
Length 1216, alignment 38/38:	3060.47	18.2812	18.125
Length 1248, alignment  0/ 0:	3140.47	18.125	18.125
Length 1248, alignment 39/ 0:	3140.31	453.906	452.188
Length 1248, alignment  0/39:	3142.34	454.219	453.906
Length 1248, alignment 39/39:	3140.47	18.125	17.9688
Length 1280, alignment  0/ 0:	3220.31	18.125	17.9688
Length 1280, alignment 40/ 0:	3378.75	453.75	452.188
Length 1280, alignment  0/40:	3222.5	454.062	453.906
Length 1280, alignment 40/40:	3220.47	18.125	18.125
Length 1312, alignment  0/ 0:	3300.31	18.125	18.125
Length 1312, alignment 41/ 0:	3300.47	475.312	473.594
Length 1312, alignment  0/41:	3302.34	475.625	475.469
Length 1312, alignment 41/41:	3300.31	18.2812	18.125
Length 1344, alignment  0/ 0:	3380.31	18.125	18.125
Length 1344, alignment 42/ 0:	3380.47	475.312	473.75
Length 1344, alignment  0/42:	3382.34	475.625	475.469
Length 1344, alignment 42/42:	3380.47	18.125	18.125
Length 1376, alignment  0/ 0:	3460.31	18.2812	18.125
Length 1376, alignment 43/ 0:	3460.31	496.875	495
Length 1376, alignment  0/43:	3462.5	497.031	619.219
Length 1376, alignment 43/43:	3460.62	18.125	18.125
Length 1408, alignment  0/ 0:	3540.47	18.2812	17.9688
Length 1408, alignment 44/ 0:	3540.31	496.719	495.156
Length 1408, alignment  0/44:	3542.34	497.031	496.875
Length 1408, alignment 44/44:	3540.47	18.125	18.125
Length 1440, alignment  0/ 0:	3620.31	18.2812	18.125
Length 1440, alignment 45/ 0:	3620.31	518.281	516.719
Length 1440, alignment  0/45:	3622.34	518.594	518.438
Length 1440, alignment 45/45:	3620.47	18.125	18.125
Length 1472, alignment  0/ 0:	3700.47	18.2812	18.125
Length 1472, alignment 46/ 0:	3700.31	518.281	516.719
Length 1472, alignment  0/46:	3702.5	518.438	518.438
Length 1472, alignment 46/46:	3700.47	19.8438	17.9688
Length 1504, alignment  0/ 0:	3781.25	18.125	17.9688
Length 1504, alignment 47/ 0:	3780.47	539.688	538.125
Length 1504, alignment  0/47:	3782.81	540.156	540
Length 1504, alignment 47/47:	3780.47	18.125	18.125
Length 1536, alignment  0/ 0:	3860.31	18.125	18.125
Length 1536, alignment 48/ 0:	3860.47	130.469	128.75
Length 1536, alignment  0/48:	3862.5	128.906	128.906
Length 1536, alignment 48/48:	3860.47	18.125	17.9688
Length 1568, alignment  0/ 0:	3940.62	18.125	17.9688
Length 1568, alignment 49/ 0:	3940.94	561.25	559.531
Length 1568, alignment  0/49:	3942.81	561.719	561.562
Length 1568, alignment 49/49:	3940.47	18.125	18.125
Length 1600, alignment  0/ 0:	4164.53	19.2188	18.125
Length 1600, alignment 50/ 0:	4021.72	561.25	559.688
Length 1600, alignment  0/50:	4022.81	561.719	561.562
Length 1600, alignment 50/50:	4020.47	18.125	18.125
Length 1632, alignment  0/ 0:	4100.78	18.125	18.125
Length 1632, alignment 51/ 0:	4100.62	582.812	581.094
Length 1632, alignment  0/51:	4102.66	583.281	582.969
Length 1632, alignment 51/51:	4100.47	18.2812	18.125
Length 1664, alignment  0/ 0:	4180.78	18.125	18.125
Length 1664, alignment 52/ 0:	4180.62	582.812	581.25
Length 1664, alignment  0/52:	4182.66	583.125	582.969
Length 1664, alignment 52/52:	4315.94	18.2812	18.125
Length 1696, alignment  0/ 0:	4260.78	18.2812	17.9688
Length 1696, alignment 53/ 0:	4260.62	604.219	602.656
Length 1696, alignment  0/53:	4262.81	604.688	604.531
Length 1696, alignment 53/53:	4260.31	18.125	18.125
Length 1728, alignment  0/ 0:	4340.31	18.2812	17.9688
Length 1728, alignment 54/ 0:	4340.78	604.375	602.5
Length 1728, alignment  0/54:	4342.81	604.531	604.531
Length 1728, alignment 54/54:	4340.47	18.2812	18.2812
Length 1760, alignment  0/ 0:	4420.31	18.125	18.125
Length 1760, alignment 55/ 0:	4420.78	625.625	624.062
Length 1760, alignment  0/55:	4422.66	760.312	626.25
Length 1760, alignment 55/55:	4420.31	18.125	17.9688
Length 1792, alignment  0/ 0:	4500.62	18.125	18.125
Length 1792, alignment 56/ 0:	4500.62	625.781	624.062
Length 1792, alignment  0/56:	4502.81	625.938	625.938
Length 1792, alignment 56/56:	4500.47	18.125	18.125
Length 1824, alignment  0/ 0:	4580.62	18.2812	18.125
Length 1824, alignment 57/ 0:	4582.5	647.656	646.562
Length 1824, alignment  0/57:	4582.19	647.5	647.344
Length 1824, alignment 57/57:	4580.47	18.9062	18.75
Length 1856, alignment  0/ 0:	4661.09	18.125	18.125
Length 1856, alignment 58/ 0:	4834.06	647.812	646.25
Length 1856, alignment  0/58:	4663.12	647.656	647.5
Length 1856, alignment 58/58:	4660.31	18.75	18.75
Length 1888, alignment  0/ 0:	4741.09	18.125	18.125
Length 1888, alignment 59/ 0:	4741.09	668.75	667.031
Length 1888, alignment  0/59:	4742.34	668.906	668.906
Length 1888, alignment 59/59:	4740.31	18.75	18.75
Length 1920, alignment  0/ 0:	4821.09	18.125	18.75
Length 1920, alignment 60/ 0:	4821.09	668.75	667.031
Length 1920, alignment  0/60:	4822.34	669.062	668.906
Length 1920, alignment 60/60:	4963.28	18.9062	18.75
Length 1952, alignment  0/ 0:	4900.94	18.9062	18.75
Length 1952, alignment 61/ 0:	4901.09	690.156	688.75
Length 1952, alignment  0/61:	4902.5	690.625	690.469
Length 1952, alignment 61/61:	4900.47	18.75	18.75
Length 1984, alignment  0/ 0:	4981.09	18.125	18.75
Length 1984, alignment 62/ 0:	4980.94	690.156	688.75
Length 1984, alignment  0/62:	4982.5	690.625	690.469
Length 1984, alignment 62/62:	4980.47	18.75	18.75
Length 2016, alignment  0/ 0:	5060.94	18.2812	18.75
Length 2016, alignment 63/ 0:	5194.38	712.5	710.938
Length 2016, alignment  0/63:	5063.44	712.188	712.031
Length 2016, alignment 63/63:	5060.62	19.2188	19.0625

[-- Attachment #6: bench-memmove-large.out --]
[-- Type: text/plain, Size: 3251 bytes --]

                       	__memmove_thunderx	__memmove_generic
Length 4103, alignment  0/64:	525.625	418.125
Length 4111, alignment  0/ 3:	1465	1464.38
Length 4127, alignment  3/ 0:	1502.5	1497.5
Length 4159, alignment  3/ 7:	1485	1482.5
Length 4223, alignment  9/ 5:	1509.38	1510.62
Length 8199, alignment  0/64:	768.75	756.25
Length 8207, alignment  0/ 3:	2848.12	2846.88
Length 8223, alignment  3/ 0:	2881.25	2879.38
Length 8255, alignment  3/ 7:	2868.12	2868.12
Length 8319, alignment  9/ 5:	2893.75	2890
Length 16391, alignment  0/64:	1508.75	1468.75
Length 16399, alignment  0/ 3:	5630.62	5645
Length 16415, alignment  3/ 0:	6587.5	5696.88
Length 16447, alignment  3/ 7:	5665.62	5671.25
Length 16511, alignment  9/ 5:	5680	5715.62
Length 32775, alignment  0/64:	3404.38	3379.38
Length 32783, alignment  0/ 3:	11950	11948.8
Length 32799, alignment  3/ 0:	12272.5	12196.2
Length 32831, alignment  3/ 7:	11932.5	11937.5
Length 32895, alignment  9/ 5:	12278.1	12600.6
Length 65543, alignment  0/64:	15153.8	15145.6
Length 65551, alignment  0/ 3:	32700.6	32701.9
Length 65567, alignment  3/ 0:	24323.1	32856.9
Length 65599, alignment  3/ 7:	32333.1	32333.1
Length 65663, alignment  9/ 5:	24330.6	32889.4
Length 131079, alignment  0/64:	30443.8	30994.4
Length 131087, alignment  0/ 3:	65536.2	65528.8
Length 131103, alignment  3/ 0:	48560	65695
Length 131135, alignment  3/ 7:	65275.6	64768.8
Length 131199, alignment  9/ 5:	48574.4	66727.5
Length 262151, alignment  0/64:	61041.2	61043.1
Length 262159, alignment  0/ 3:	131192	132792
Length 262175, alignment  3/ 0:	97861.9	131354
Length 262207, alignment  3/ 7:	129631	130307
Length 262271, alignment  9/ 5:	97690	131389
Length 524295, alignment  0/64:	121656	122274
Length 524303, alignment  0/ 3:	262468	262572
Length 524319, alignment  3/ 0:	193571	262575
Length 524351, alignment  3/ 7:	259253	259378
Length 524415, alignment  9/ 5:	193574	262574
Length 1048583, alignment  0/64:	242923	242967
Length 1048591, alignment  0/ 3:	524019	524000
Length 1048607, alignment  3/ 0:	386996	524172
Length 1048639, alignment  3/ 7:	517732	517734
Length 1048703, alignment  9/ 5:	386961	524196
Length 2097159, alignment  0/64:	485096	484989
Length 2097167, alignment  0/ 3:	1.04767e+06	1.04724e+06
Length 2097183, alignment  3/ 0:	772307	1.04781e+06
Length 2097215, alignment  3/ 7:	1.03466e+06	1.03465e+06
Length 2097279, alignment  9/ 5:	772771	1.04743e+06
Length 4194311, alignment  0/64:	969175	969163
Length 4194319, alignment  0/ 3:	2.09413e+06	2.09371e+06
Length 4194335, alignment  3/ 0:	1.54408e+06	2.09386e+06
Length 4194367, alignment  3/ 7:	2.06856e+06	2.06895e+06
Length 4194431, alignment  9/ 5:	1.54376e+06	2.09388e+06
Length 8388615, alignment  0/64:	1.95435e+06	1.95496e+06
Length 8388623, alignment  0/ 3:	4.19442e+06	4.19203e+06
Length 8388639, alignment  3/ 0:	3.08706e+06	4.19641e+06
Length 8388671, alignment  3/ 7:	4.14203e+06	4.14215e+06
Length 8388735, alignment  9/ 5:	3.08678e+06	4.19257e+06
Length 16777223, alignment  0/64:	6.15746e+06	6.13601e+06
Length 16777231, alignment  0/ 3:	1.07097e+07	1.06811e+07
Length 16777247, alignment  3/ 0:	6.44117e+06	1.10652e+07
Length 16777279, alignment  3/ 7:	1.06051e+07	1.05801e+07
Length 16777343, alignment  9/ 5:	6.43968e+06	1.10505e+07

Thread overview: 38+ messages
2017-02-07 12:42 [PATCH] Add ifunc memcpy and memmove for aarch64 Wilco Dijkstra
2017-02-07 13:01 ` Siddhesh Poyarekar
2017-02-07 13:22   ` Adhemerval Zanella
2017-02-07 23:20     ` Steve Ellcey
2017-02-08  5:46       ` Siddhesh Poyarekar
2017-02-08  5:48         ` Siddhesh Poyarekar
2017-02-08 12:03           ` Szabolcs Nagy
2017-02-09  0:02         ` Steve Ellcey
2017-02-09 10:51           ` Szabolcs Nagy
2017-02-09 11:04             ` Siddhesh Poyarekar
2017-02-09 11:40               ` Szabolcs Nagy
2017-02-09 11:53                 ` Siddhesh Poyarekar
2017-02-09 15:54             ` Andrew Pinski
2017-02-10  0:55               ` Steve Ellcey
2017-02-23  0:27                 ` Steve Ellcey
2017-02-23 14:21                   ` Siddhesh Poyarekar
2017-02-23 16:20                     ` Steve Ellcey
2017-02-23 16:34                       ` Siddhesh Poyarekar
2017-02-23 16:42                         ` Steve Ellcey
2017-02-23 16:53                           ` Siddhesh Poyarekar
2017-03-01 18:48                             ` Steve Ellcey
2017-03-14 18:46                               ` Szabolcs Nagy
2017-03-14 18:51                                 ` Andrew Pinski
2017-03-15 23:53                                 ` Steve Ellcey
2017-03-22  5:38                                   ` Aarch64 machine maintainership (was: Add ifunc memcpy and memmove for aarch64) Siddhesh Poyarekar
2017-03-22 17:34                                     ` Joseph Myers
2017-03-22 17:52                                       ` Aarch64 machine maintainership Siddhesh Poyarekar
  -- strict thread matches above, loose matches on Subject: below --
2017-01-19 18:23 [PATCH] Add ifunc memcpy and memmove for aarch64 Steve Ellcey
2017-01-19 19:42 ` Adhemerval Zanella
2017-01-19 21:04   ` Joseph Myers
2017-01-23 23:33   ` Steve Ellcey
2017-01-24  9:37     ` Florian Weimer
2017-01-24 14:09     ` Adhemerval Zanella
2017-01-24 19:34       ` Steve Ellcey
2017-01-24 20:49         ` Steve Ellcey
2017-01-25 17:34       ` Steve Ellcey
2017-02-06 20:54         ` Adhemerval Zanella
2017-02-07  6:47         ` Siddhesh Poyarekar