* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
@ 2017-02-07 12:42 Wilco Dijkstra
2017-02-07 13:01 ` Siddhesh Poyarekar
0 siblings, 1 reply; 38+ messages in thread
From: Wilco Dijkstra @ 2017-02-07 12:42 UTC
To: Siddhesh Poyarekar, sellcey; +Cc: adhemerval.zanella, libc-alpha, nd
Siddhesh wrote:
> I think it would be cleaner to put the full generic and thunderx
> implementations in separate files instead of trying to do this macro
> dance because it keeps micro-architecture details separate. Assembly
> code is hard to maintain as it is without adding conditional compilation
> using macros.
I agree we want to avoid using conditional compilation as much as possible.
On the other hand duplication is a bad idea too, I've seen too many cases where
bugs were only fixed in one of the N duplicates.
However I'm actually wondering whether we need an ifunc for this case.
For large copies from L2 I think adding a prefetch should be benign even on
cores that don't need it, so if the benchmarks confirm this we should consider
updating the generic memcpy.
> I also second Adhemerval's suggestion to separate the patch to add the
> framework from the one to add the thunderx ifunc. It makes for easier
> cherry picking and git-blaming.
Agreed.
Wilco
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-07 12:42 [PATCH] Add ifunc memcpy and memmove for aarch64 Wilco Dijkstra
@ 2017-02-07 13:01 ` Siddhesh Poyarekar
2017-02-07 13:22 ` Adhemerval Zanella
0 siblings, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-02-07 13:01 UTC
To: Wilco Dijkstra, sellcey; +Cc: adhemerval.zanella, libc-alpha, nd
On Tuesday 07 February 2017 06:12 PM, Wilco Dijkstra wrote:
> I agree we want to avoid using conditional compilation as much as possible.
> On the other hand duplication is a bad idea too, I've seen too many cases where
> bugs were only fixed in one of the N duplicates.
Sure, but in that case the de-duplication must be done by
identifying a logical code block and making that into a macro to override,
not by arbitrarily injecting hunks of code. In this case, alternate
implementations of copy_long would be sufficient: #define COPY_LONG in
both memcpy_generic and memcpy_thunderx and have the parent (memcpy.S)
use that macro. In fact, that might even make the code a bit nicer to
read.
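As a rough C analogue of this suggestion (not the actual memcpy.S layout), the sketch below writes the shared body once and lets each variant supply only its own copy_long block. In glibc the variants would more likely be separate memcpy_generic/memcpy_thunderx files that #define COPY_LONG and then include the parent; the single-file DEFINE_MEMCPY_VARIANT macro and every function name here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the two copy_long blocks.  Both copy n
   bytes from s to d; only their strategy differs.  */
static void
copy_long_generic (unsigned char *d, const unsigned char *s, size_t n)
{
  for (size_t i = 0; i < n; i++)
    d[i] = s[i];
}

static void
copy_long_prefetch (unsigned char *d, const unsigned char *s, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      /* Once per 64-byte block, hint the data 512 bytes ahead (in range only).  */
      if (i % 64 == 0 && i + 512 < n)
        __builtin_prefetch (s + i + 512, 0, 0);
      d[i] = s[i];
    }
}

/* Shared body: a fix to the short-copy path reaches every variant.  */
#define DEFINE_MEMCPY_VARIANT(NAME, COPY_LONG)                \
  static void *                                               \
  NAME (void *dst, const void *src, size_t n)                 \
  {                                                           \
    unsigned char *d = dst;                                   \
    const unsigned char *s = src;                             \
    if (n < 128)                                              \
      {                                                       \
        /* Shared short-copy path, written exactly once.  */  \
        for (size_t i = 0; i < n; i++)                        \
          d[i] = s[i];                                        \
      }                                                       \
    else                                                      \
      COPY_LONG (d, s, n); /* The only per-variant block.  */ \
    return dst;                                               \
  }

DEFINE_MEMCPY_VARIANT (memcpy_generic_sketch, copy_long_generic)
DEFINE_MEMCPY_VARIANT (memcpy_thunderx_sketch, copy_long_prefetch)
```

The point of the design is that the override boundary is one named, logical block rather than scattered conditional hunks.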
> However I'm actually wondering whether we need an ifunc for this case.
> For large copies from L2 I think adding a prefetch should be benign even on
> cores that don't need it, so if the benchmarks confirm this we should consider
> updating the generic memcpy.
That is a call that ARM maintainers can take and is also another reason
to separate the IFUNC infrastructure code from the thunderx change.
Siddhesh
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-07 13:01 ` Siddhesh Poyarekar
@ 2017-02-07 13:22 ` Adhemerval Zanella
2017-02-07 23:20 ` Steve Ellcey
0 siblings, 1 reply; 38+ messages in thread
From: Adhemerval Zanella @ 2017-02-07 13:22 UTC
To: Siddhesh Poyarekar, Wilco Dijkstra, sellcey; +Cc: libc-alpha, nd
[-- Attachment #1: Type: text/plain, Size: 1467 bytes --]
On 07/02/2017 11:01, Siddhesh Poyarekar wrote:
> On Tuesday 07 February 2017 06:12 PM, Wilco Dijkstra wrote:
>> I agree we want to avoid using conditional compilation as much as possible.
>> On the other hand duplication is a bad idea too, I've seen too many cases where
>> bugs were only fixed in one of the N duplicates.
>
> Sure, but in that case the de-duplication must be done by
> identifying a logical code block and making that into a macro to override,
> not by arbitrarily injecting hunks of code. In this case, alternate
> implementations of copy_long would be sufficient: #define COPY_LONG in
> both memcpy_generic and memcpy_thunderx and have the parent (memcpy.S)
> use that macro. In fact, that might even make the code a bit nicer to
> read.
>
>> However I'm actually wondering whether we need an ifunc for this case.
>> For large copies from L2 I think adding a prefetch should be benign even on
>> cores that don't need it, so if the benchmarks confirm this we should consider
>> updating the generic memcpy.
>
> That is a call that ARM maintainers can take and is also another reason
> to separate the IFUNC infrastructure code from the thunderx change.
I checked only the memcpy change on an APM X-Gene 1, and the results seem to show
improvements on aligned input, at least for sizes shorter than 4MB. I would
like to check on more armv8 chips, but it does seem a nice improvement
over the generic implementation.
[-- Attachment #2: bench-memcpy-large.out --]
[-- Type: text/plain, Size: 1689 bytes --]
memcpy
Length 65543, alignment 0/ 0: 4553.71
Length 65551, alignment 0/ 3: 11239.8
Length 65567, alignment 3/ 0: 11201.6
Length 65599, alignment 3/ 5: 11221.2
Length 131079, alignment 0/ 0: 9023.67
Length 131087, alignment 0/ 3: 22489.5
Length 131103, alignment 3/ 0: 22439.6
Length 131135, alignment 3/ 5: 22426.3
Length 262151, alignment 0/ 0: 21198.5
Length 262159, alignment 0/ 3: 48474
Length 262175, alignment 3/ 0: 48292.3
Length 262207, alignment 3/ 5: 48545.1
Length 524295, alignment 0/ 0: 43480.7
Length 524303, alignment 0/ 3: 93729.3
Length 524319, alignment 3/ 0: 93706.8
Length 524351, alignment 3/ 5: 93809.2
Length 1048583, alignment 0/ 0: 86732.2
Length 1048591, alignment 0/ 3: 187419
Length 1048607, alignment 3/ 0: 187153
Length 1048639, alignment 3/ 5: 187384
Length 2097159, alignment 0/ 0: 173630
Length 2097167, alignment 0/ 3: 373671
Length 2097183, alignment 3/ 0: 373776
Length 2097215, alignment 3/ 5: 374153
Length 4194311, alignment 0/ 0: 383575
Length 4194319, alignment 0/ 3: 752044
Length 4194335, alignment 3/ 0: 750919
Length 4194367, alignment 3/ 5: 751680
Length 8388615, alignment 0/ 0: 1.24695e+06
Length 8388623, alignment 0/ 3: 1.6407e+06
Length 8388639, alignment 3/ 0: 1.63961e+06
Length 8388671, alignment 3/ 5: 1.6407e+06
Length 16777223, alignment 0/ 0: 2.7774e+06
Length 16777231, alignment 0/ 3: 3.34092e+06
Length 16777247, alignment 3/ 0: 3.33036e+06
Length 16777279, alignment 3/ 5: 3.33811e+06
Length 33554439, alignment 0/ 0: 5.4628e+06
Length 33554447, alignment 0/ 3: 6.56429e+06
Length 33554463, alignment 3/ 0: 6.56451e+06
Length 33554495, alignment 3/ 5: 6.5654e+06
[-- Attachment #3: bench-memcpy-large.patched --]
[-- Type: text/plain, Size: 1690 bytes --]
memcpy
Length 65543, alignment 0/ 0: 5590.23
Length 65551, alignment 0/ 3: 11171
Length 65567, alignment 3/ 0: 11146.2
Length 65599, alignment 3/ 5: 11154.1
Length 131079, alignment 0/ 0: 11109
Length 131087, alignment 0/ 3: 22266.3
Length 131103, alignment 3/ 0: 22296.1
Length 131135, alignment 3/ 5: 22257.1
Length 262151, alignment 0/ 0: 22780.6
Length 262159, alignment 0/ 3: 46212.7
Length 262175, alignment 3/ 0: 45999.7
Length 262207, alignment 3/ 5: 46221.3
Length 524295, alignment 0/ 0: 47787.3
Length 524303, alignment 0/ 3: 93263.7
Length 524319, alignment 3/ 0: 93028.3
Length 524351, alignment 3/ 5: 93301.5
Length 1048583, alignment 0/ 0: 95413.2
Length 1048591, alignment 0/ 3: 186367
Length 1048607, alignment 3/ 0: 185780
Length 1048639, alignment 3/ 5: 186296
Length 2097159, alignment 0/ 0: 190546
Length 2097167, alignment 0/ 3: 372310
Length 2097183, alignment 3/ 0: 371187
Length 2097215, alignment 3/ 5: 372281
Length 4194311, alignment 0/ 0: 379009
Length 4194319, alignment 0/ 3: 736763
Length 4194335, alignment 3/ 0: 733672
Length 4194367, alignment 3/ 5: 736531
Length 8388615, alignment 0/ 0: 1.26684e+06
Length 8388623, alignment 0/ 3: 1.61883e+06
Length 8388639, alignment 3/ 0: 1.6062e+06
Length 8388671, alignment 3/ 5: 1.61872e+06
Length 16777223, alignment 0/ 0: 2.68259e+06
Length 16777231, alignment 0/ 3: 3.24415e+06
Length 16777247, alignment 3/ 0: 3.23356e+06
Length 16777279, alignment 3/ 5: 3.2449e+06
Length 33554439, alignment 0/ 0: 5.47245e+06
Length 33554447, alignment 0/ 3: 6.56719e+06
Length 33554463, alignment 3/ 0: 6.55255e+06
Length 33554495, alignment 3/ 5: 6.56698e+06
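To make the two attachments easier to compare row by row, the sketch below divides each patched figure by the matching baseline figure for the 0/0 and 0/3 alignment columns. The numbers are copied verbatim from the attachments (0/3 rows are 8 bytes longer than the 0/0 lengths listed). The message does not state the benchmark's units, so whether a ratio below 1 is a gain or a loss depends on how bench-memcpy-large reports its results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Alignment 0/0 lengths from the attachments.  */
static const unsigned long lengths_0_0[] = {
  65543, 131079, 262151, 524295, 1048583,
  2097159, 4194311, 8388615, 16777223, 33554439 };

/* bench-memcpy-large.out (baseline) and bench-memcpy-large.patched.  */
static const double base_0_0[] = {
  4553.71, 9023.67, 21198.5, 43480.7, 86732.2,
  173630, 383575, 1.24695e6, 2.7774e6, 5.4628e6 };
static const double patched_0_0[] = {
  5590.23, 11109, 22780.6, 47787.3, 95413.2,
  190546, 379009, 1.26684e6, 2.68259e6, 5.47245e6 };
static const double base_0_3[] = {
  11239.8, 22489.5, 48474, 93729.3, 187419,
  373671, 752044, 1.6407e6, 3.34092e6, 6.56429e6 };
static const double patched_0_3[] = {
  11171, 22266.3, 46212.7, 93263.7, 186367,
  372310, 736763, 1.61883e6, 3.24415e6, 6.56719e6 };

#define NROWS (sizeof base_0_0 / sizeof base_0_0[0])

/* Above 1: the patched build reported a larger number for that row.  */
static double
ratio (const double *patched, const double *base, size_t row)
{
  return patched[row] / base[row];
}

static void
print_ratios (void)
{
  printf ("%10s %8s %8s\n", "length", "0/0", "0/3");
  for (size_t r = 0; r < NROWS; r++)
    printf ("%10lu %8.3f %8.3f\n", lengths_0_0[r],
            ratio (patched_0_0, base_0_0, r),
            ratio (patched_0_3, base_0_3, r));
}
```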
[-- Attachment #4: memcpy_aarch64.patch --]
[-- Type: text/x-patch, Size: 1885 bytes --]
diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S
index 29af8b1..4742a01 100644
--- a/sysdeps/aarch64/memcpy.S
+++ b/sysdeps/aarch64/memcpy.S
@@ -158,10 +158,13 @@ L(copy96):
.p2align 4
L(copy_long):
+ cmp count, #32768
+ b.lo L(copy_long_without_prefetch)
and tmp1, dstin, 15
bic dst, dstin, 15
ldp D_l, D_h, [src]
sub src, src, tmp1
+ prfm pldl1strm, [src, 384]
add count, count, tmp1 /* Count is now 16 too large. */
ldp A_l, A_h, [src, 16]
stp D_l, D_h, [dstin]
@@ -169,7 +172,10 @@ L(copy_long):
ldp C_l, C_h, [src, 48]
ldp D_l, D_h, [src, 64]!
subs count, count, 128 + 16 /* Test and readjust count. */
- b.ls 2f
+
+L(prefetch_loop64):
+ tbz src, #6, 1f
+ prfm pldl1strm, [src, 512]
1:
stp A_l, A_h, [dst, 16]
ldp A_l, A_h, [src, 16]
@@ -180,12 +186,39 @@ L(copy_long):
stp D_l, D_h, [dst, 64]!
ldp D_l, D_h, [src, 64]!
subs count, count, 64
- b.hi 1b
+ b.hi L(prefetch_loop64)
+ b L(last64)
+
+L(copy_long_without_prefetch):
+
+ and tmp1, dstin, 15
+ bic dst, dstin, 15
+ ldp D_l, D_h, [src]
+ sub src, src, tmp1
+ add count, count, tmp1 /* Count is now 16 too large. */
+ ldp A_l, A_h, [src, 16]
+ stp D_l, D_h, [dstin]
+ ldp B_l, B_h, [src, 32]
+ ldp C_l, C_h, [src, 48]
+ ldp D_l, D_h, [src, 64]!
+ subs count, count, 128 + 16 /* Test and readjust count. */
+ b.ls L(last64)
+L(loop64):
+ stp A_l, A_h, [dst, 16]
+ ldp A_l, A_h, [src, 16]
+ stp B_l, B_h, [dst, 32]
+ ldp B_l, B_h, [src, 32]
+ stp C_l, C_h, [dst, 48]
+ ldp C_l, C_h, [src, 48]
+ stp D_l, D_h, [dst, 64]!
+ ldp D_l, D_h, [src, 64]!
+ subs count, count, 64
+ b.hi L(loop64)
/* Write the last full set of 64 bytes. The remainder is at most 64
bytes, so it is safe to always copy 64 bytes from the end even if
there is just 1 byte left. */
-2:
+L(last64):
ldp E_l, E_h, [srcend, -64]
stp A_l, A_h, [dst, 16]
ldp A_l, A_h, [srcend, -48]
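For readers less used to AArch64 assembly, here is a rough C rendering of what the patched L(copy_long) path does. Below 32768 bytes it copies 64-byte blocks with no hints. At or above that size it also issues a prefetch 512 bytes ahead whenever bit 6 of the source address is set, i.e. roughly once per 128 bytes. This is a sketch only: GCC/Clang's __builtin_prefetch with locality 0 stands in for prfm pldl1strm, a 64-byte memcpy stands in for the ldp/stp pairs, the 16-byte destination alignment step is omitted, and unlike the assembly it never prefetches past the end of the source buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PREFETCH_THRESHOLD 32768 /* mirrors "cmp count, #32768; b.lo ..." */
#define PREFETCH_DISTANCE 512    /* mirrors "prfm pldl1strm, [src, 512]" */

/* Copy count bytes, 64 bytes per iteration, prefetching ahead for
   large copies.  dst and src must not overlap.  */
static void
copy_long_sketch (unsigned char *dst, const unsigned char *src, size_t count)
{
  int use_prefetch = count >= PREFETCH_THRESHOLD;
  size_t i = 0;

  for (; i + 64 <= count; i += 64)
    {
      /* "tbz src, #6, 1f" skips the hint when bit 6 of src is clear,
         so the hint is issued on every other 64-byte block.  */
      if (use_prefetch
          && ((uintptr_t) (src + i) & 64) != 0
          && i + PREFETCH_DISTANCE < count)
        __builtin_prefetch (src + i + PREFETCH_DISTANCE, 0, 0);
      memcpy (dst + i, src + i, 64);
    }

  /* Remaining tail of fewer than 64 bytes.  (The assembly instead
     copies the final 64 bytes relative to srcend.)  */
  memcpy (dst + i, src + i, count - i);
}
```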
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-07 13:22 ` Adhemerval Zanella
@ 2017-02-07 23:20 ` Steve Ellcey
2017-02-08 5:46 ` Siddhesh Poyarekar
0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-02-07 23:20 UTC
To: Adhemerval Zanella, Siddhesh Poyarekar, Wilco Dijkstra; +Cc: libc-alpha, nd
[-- Attachment #1: Type: text/plain, Size: 977 bytes --]
OK, here is the basic IFUNC enablement for aarch64 without the
memcpy/memmove changes that use it. I verified that it builds and
causes no regressions on aarch64. As mentioned in the original email,
this code depends on the mrs instruction, which is privileged, but the
4.11 kernel will have emulation for it (https://lkml.org/lkml/2017/1/10/816).
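The same guard can be sketched in user space with glibc's getauxval from <sys/auxv.h>: read MIDR_EL1 only when the kernel advertises HWCAP_CPUID, since otherwise the mrs would trap. The sketch reuses the patch's (1 << 11) fallback value; read_midr_el1 is a hypothetical helper name, and on non-AArch64 hosts it simply returns 0, mirroring what init_cpu_features stores when the capability is missing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/auxv.h>

#ifndef HWCAP_CPUID
# define HWCAP_CPUID (1 << 11) /* same fallback value as the patch */
#endif

/* Hypothetical helper: return MIDR_EL1, or 0 if it cannot be read safely.  */
static uint64_t
read_midr_el1 (void)
{
#if defined __aarch64__
  if (getauxval (AT_HWCAP) & HWCAP_CPUID)
    {
      uint64_t id;
      /* Traps to the kernel, which emulates the read when it
         advertises HWCAP_CPUID.  */
      __asm__ volatile ("mrs %0, midr_el1" : "=r" (id));
      return id;
    }
#endif
  return 0;
}
```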
OK to check in this part?
Steve Ellcey
sellcey@cavium.com
2017-02-07  Steve Ellcey  <sellcey@caviumnetworks.com>
    Adhemerval Zanella  <adhemerval.zanella@linaro.org>
* sysdeps/aarch64/dl-machine.h: Include cpu-features.c.
(DL_PLATFORM_INIT): New define.
(dl_platform_init): New function.
* sysdeps/aarch64/ldsodefs.h: Include cpu-features.h.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.c: New file.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Ditto.
* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Ditto.
* sysdeps/unix/sysv/linux/aarch64/libc-start.c: Ditto.
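The attached patch initializes the CPU feature data from one of two places. In a shared build, DL_PLATFORM_INIT (dl_platform_init in dl-machine.h) fills in GLRO(dl_aarch64_cpu_features) inside the dynamic loader. In a static executable, the new libc-start.c renames the generic __libc_start_main to generic_start_main and wraps it so init_cpu_features runs first; presumably this matters because the generic static start-up code is where IFUNC (IRELATIVE) relocations get resolved. The sketch below models only that wrapping pattern, with hypothetical names throughout; it is not glibc code and does not reproduce __libc_start_main's real signature.

```c
#include <assert.h>
#include <stdint.h>

struct cpu_features_sketch
{
  uint64_t midr_el1;
};

/* Stand-in for _dl_aarch64_cpu_features in a static executable.  */
static struct cpu_features_sketch cpu_features_global;

/* Hypothetical MIDR source; returns a ThunderX-style sample value.  */
static uint64_t
fake_read_midr (void)
{
  return UINT64_C (0x430f0a10);
}

static void
init_cpu_features_sketch (struct cpu_features_sketch *cf)
{
  cf->midr_el1 = fake_read_midr ();
}

/* Plays the role of generic_start_main: the unchanged generic code.  */
static int
generic_start_main_sketch (int (*main_fn) (void))
{
  return main_fn ();
}

/* Plays the role of the new __libc_start_main: fill in the CPU
   features before handing control to the generic start-up code.  */
static int
start_main_wrapper (int (*main_fn) (void))
{
  init_cpu_features_sketch (&cpu_features_global);
  return generic_start_main_sketch (main_fn);
}

/* Example "program": succeeds only if the features were set up first.  */
static int
user_main_sketch (void)
{
  return cpu_features_global.midr_el1 == UINT64_C (0x430f0a10) ? 0 : 1;
}
```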
[-- Attachment #2: ifunc.patch --]
[-- Type: text/x-patch, Size: 8975 bytes --]
diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
index 84b8aec..15d79a6 100644
--- a/sysdeps/aarch64/dl-machine.h
+++ b/sysdeps/aarch64/dl-machine.h
@@ -25,6 +25,7 @@
#include <tls.h>
#include <dl-tlsdesc.h>
#include <dl-irel.h>
+#include <cpu-features.c>
/* Return nonzero iff ELF header is compatible with the running host. */
static inline int __attribute__ ((unused))
@@ -225,6 +226,23 @@ _dl_start_user: \n\
#define ELF_MACHINE_NO_REL 1
#define ELF_MACHINE_NO_RELA 0
+#define DL_PLATFORM_INIT dl_platform_init ()
+
+static inline void __attribute__ ((unused))
+dl_platform_init (void)
+{
+ if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
+ /* Avoid an empty string which would disturb us. */
+ GLRO(dl_platform) = NULL;
+
+#ifdef SHARED
+ /* init_cpu_features has been called early from __libc_start_main in
+ static executable. */
+ init_cpu_features (&GLRO(dl_aarch64_cpu_features));
+#endif
+}
+
+
static inline ElfW(Addr)
elf_machine_fixup_plt (struct link_map *map, lookup_t t,
const ElfW(Rela) *reloc,
diff --git a/sysdeps/aarch64/ldsodefs.h b/sysdeps/aarch64/ldsodefs.h
index f277074..ba4ada3 100644
--- a/sysdeps/aarch64/ldsodefs.h
+++ b/sysdeps/aarch64/ldsodefs.h
@@ -20,6 +20,7 @@
#define _AARCH64_LDSODEFS_H 1
#include <elf.h>
+#include <cpu-features.h>
struct La_aarch64_regs;
struct La_aarch64_retval;
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
index e69de29..8e4b514 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
@@ -0,0 +1,38 @@
+/* Initialize CPU feature data. AArch64 version.
+ This file is part of the GNU C Library.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <cpu-features.h>
+
+#ifndef HWCAP_CPUID
+# define HWCAP_CPUID (1 << 11)
+#endif
+
+static inline void
+init_cpu_features (struct cpu_features *cpu_features)
+{
+ if (GLRO(dl_hwcap) & HWCAP_CPUID)
+ {
+ register uint64_t id = 0;
+ asm volatile ("mrs %0, midr_el1" : "=r"(id));
+ cpu_features->midr_el1 = id;
+ }
+ else
+ {
+ cpu_features->midr_el1 = 0;
+ }
+}
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
index e69de29..c92b650 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
@@ -0,0 +1,49 @@
+/* Initialize CPU feature data. AArch64 version.
+ This file is part of the GNU C Library.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifndef _CPU_FEATURES_AARCH64_H
+#define _CPU_FEATURES_AARCH64_H
+
+#include <stdint.h>
+
+#define MIDR_PARTNUM_SHIFT 4
+#define MIDR_PARTNUM_MASK (0xfff << MIDR_PARTNUM_SHIFT)
+#define MIDR_PARTNUM(midr) \
+ (((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
+#define MIDR_ARCHITECTURE_SHIFT 16
+#define MIDR_ARCHITECTURE_MASK (0xf << MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_ARCHITECTURE(midr) \
+ (((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_VARIANT_SHIFT 20
+#define MIDR_VARIANT_MASK (0xf << MIDR_VARIANT_SHIFT)
+#define MIDR_VARIANT(midr) \
+ (((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
+#define MIDR_IMPLEMENTOR_SHIFT 24
+#define MIDR_IMPLEMENTOR_MASK (0xff << MIDR_IMPLEMENTOR_SHIFT)
+#define MIDR_IMPLEMENTOR(midr) \
+ (((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
+
+#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR(midr) == 'C' \
+ && MIDR_PARTNUM(midr) == 0x0a1)
+
+struct cpu_features
+{
+ uint64_t midr_el1;
+};
+
+#endif /* _CPU_FEATURES_AARCH64_H */
diff --git a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
index e69de29..438046a 100644
--- a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
+++ b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
@@ -0,0 +1,60 @@
+/* Data for AArch64 version of processor capability information.
+ Linux version.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* If anything should be added here check whether the size of each string
+ is still ok with the given array size.
+
+ All the #ifdefs in the definitions are quite irritating but
+ necessary if we want to avoid duplicating the information. There
+ are three different modes:
+
+ - PROCINFO_DECL is defined. This means we are only interested in
+ declarations.
+
+ - PROCINFO_DECL is not defined:
+
+ + if SHARED is defined the file is included in an array
+ initializer. The .element = { ... } syntax is needed.
+
+ + if SHARED is not defined a normal array initialization is
+ needed.
+ */
+
+#ifndef PROCINFO_CLASS
+# define PROCINFO_CLASS
+#endif
+
+#if !IS_IN (ldconfig)
+# if !defined PROCINFO_DECL && defined SHARED
+ ._dl_aarch64_cpu_features
+# else
+PROCINFO_CLASS struct cpu_features _dl_aarch64_cpu_features
+# endif
+# ifndef PROCINFO_DECL
+= { }
+# endif
+# if !defined SHARED || defined PROCINFO_DECL
+;
+# else
+,
+# endif
+#endif
+
+#undef PROCINFO_DECL
+#undef PROCINFO_CLASS
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc-start.c b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
index e69de29..c98aff1 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc-start.c
+++ b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
@@ -0,0 +1,40 @@
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifdef SHARED
+# include <csu/libc-start.c>
+# else
+/* The main work is done in the generic function. */
+# define LIBC_START_DISABLE_INLINE
+# define LIBC_START_MAIN generic_start_main
+# include <csu/libc-start.c>
+# include <cpu-features.c>
+
+extern struct cpu_features _dl_aarch64_cpu_features;
+
+int
+__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
+ int argc, char **argv,
+ __typeof (main) init,
+ void (*fini) (void),
+ void (*rtld_fini) (void), void *stack_end)
+{
+ init_cpu_features (&_dl_aarch64_cpu_features);
+ return generic_start_main (main, argc, argv, init, fini, rtld_fini,
+ stack_end);
+}
+#endif
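The MIDR_* macros in cpu-features.h split MIDR_EL1 into implementer, variant, architecture, and part-number fields, and IS_THUNDERX checks for implementer 'C' (0x43, Cavium) with part number 0x0a1. The sketch below re-declares the same field layout so it can decode sample values in isolation. The two sample constants are illustrative values built to match a ThunderX and an Arm Cortex-A57 style MIDR, not dumps from real hardware. The masks use unsigned constants (0xffu and so on) so that shifting the implementer mask into bit 31 cannot overflow a signed int.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same field layout as the patch's cpu-features.h, unsigned masks.  */
#define MIDR_PARTNUM_SHIFT 4
#define MIDR_PARTNUM_MASK (0xfffu << MIDR_PARTNUM_SHIFT)
#define MIDR_PARTNUM(midr) \
  (((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
#define MIDR_ARCHITECTURE_SHIFT 16
#define MIDR_ARCHITECTURE_MASK (0xfu << MIDR_ARCHITECTURE_SHIFT)
#define MIDR_ARCHITECTURE(midr) \
  (((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
#define MIDR_VARIANT_SHIFT 20
#define MIDR_VARIANT_MASK (0xfu << MIDR_VARIANT_SHIFT)
#define MIDR_VARIANT(midr) \
  (((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
#define MIDR_IMPLEMENTOR_SHIFT 24
#define MIDR_IMPLEMENTOR_MASK (0xffu << MIDR_IMPLEMENTOR_SHIFT)
#define MIDR_IMPLEMENTOR(midr) \
  (((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)

#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR (midr) == 'C' \
                           && MIDR_PARTNUM (midr) == 0x0a1)

/* Illustrative samples (hypothetical values, not hardware readings).  */
#define SAMPLE_MIDR_THUNDERX UINT64_C (0x430f0a10) /* 'C', part 0x0a1 */
#define SAMPLE_MIDR_A57 UINT64_C (0x411fd070)      /* 'A', part 0xd07 */

static void
print_midr (uint64_t midr)
{
  printf ("impl 0x%02x var %u arch 0x%x part 0x%03x thunderx %d\n",
          (unsigned) MIDR_IMPLEMENTOR (midr), (unsigned) MIDR_VARIANT (midr),
          (unsigned) MIDR_ARCHITECTURE (midr), (unsigned) MIDR_PARTNUM (midr),
          IS_THUNDERX (midr));
}
```

Note that a zero MIDR, which init_cpu_features stores when HWCAP_CPUID is absent, never matches IS_THUNDERX, so such systems would fall back to the generic routines.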
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-07 23:20 ` Steve Ellcey
@ 2017-02-08 5:46 ` Siddhesh Poyarekar
2017-02-08 5:48 ` Siddhesh Poyarekar
2017-02-09 0:02 ` Steve Ellcey
0 siblings, 2 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-02-08 5:46 UTC
To: Steve Ellcey, Adhemerval Zanella, Wilco Dijkstra; +Cc: libc-alpha, nd
On Wednesday 08 February 2017 04:50 AM, Steve Ellcey wrote:
> OK, here is the basic IFUNC enablement for aarch64 without the
> memcpy/memmove changes that use it. I verified that it builds and
> causes no regressions on aarch64. As mentioned in the original email
> this code depends on the mrs instruction which is privileged, but the
> 4.11 kernel will have emulation for it (https://lkml.org/lkml/2017/1/10/816).
>
> OK to checkin this part?
Looks OK with a couple of nits below.
>
> Steve Ellcey
> sellcey@cavium.com
>
>
> 2017-02-07 Steve Ellcey <sellcey@caviumnetworks.com>
> Adhemerval Zanella <adhemerval.zanella@linaro.org>
>
> * sysdeps/aarch64/dl-machine.h: Include cpu-features.c.
> (DL_PLATFORM_INIT): New define.
> (dl_platform_init): New function.
> * sysdeps/aarch64/ldsodefs.h: Include cpu-features.h.
> * sysdeps/unix/sysv/linux/aarch64/cpu-features.c: New file.
> * sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Ditto.
> * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Ditto.
> * sysdeps/unix/sysv/linux/aarch64/libc-start.c: Ditto.
I was told years ago that we prefer 'Likewise' to 'Ditto' :)
> diff --git a/sysdeps/unix/sysv/linux/aarch64/libc-start.c b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
> index e69de29..c98aff1 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/libc-start.c
> +++ b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
> @@ -0,0 +1,40 @@
You've forgotten to add a one-line description for this file.
> +/* Copyright (C) 2017 Free Software Foundation, Inc.
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-08 5:46 ` Siddhesh Poyarekar
@ 2017-02-08 5:48 ` Siddhesh Poyarekar
2017-02-08 12:03 ` Szabolcs Nagy
2017-02-09 0:02 ` Steve Ellcey
1 sibling, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-02-08 5:48 UTC
To: Steve Ellcey, Adhemerval Zanella, Wilco Dijkstra; +Cc: libc-alpha, nd
On Wednesday 08 February 2017 11:15 AM, Siddhesh Poyarekar wrote:
> Looks OK with a couple of nits below.
Oh and I suppose you need an ack from the ARM maintainers as well before
you push.
Siddhesh
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-08 5:48 ` Siddhesh Poyarekar
@ 2017-02-08 12:03 ` Szabolcs Nagy
0 siblings, 0 replies; 38+ messages in thread
From: Szabolcs Nagy @ 2017-02-08 12:03 UTC (permalink / raw)
To: Siddhesh Poyarekar, Steve Ellcey, Adhemerval Zanella, Wilco Dijkstra
Cc: nd, libc-alpha
On 08/02/17 05:46, Siddhesh Poyarekar wrote:
> On Wednesday 08 February 2017 11:15 AM, Siddhesh Poyarekar wrote:
>> Looks OK with a couple of nits below.
>
> Oh and I suppose you need an ack from the ARM maintainers as well before
> you push.
>
arm maintainer != aarch64 maintainer.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-08 5:46 ` Siddhesh Poyarekar
2017-02-08 5:48 ` Siddhesh Poyarekar
@ 2017-02-09 0:02 ` Steve Ellcey
2017-02-09 10:51 ` Szabolcs Nagy
1 sibling, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-02-09 0:02 UTC (permalink / raw)
To: Siddhesh Poyarekar, Adhemerval Zanella, Wilco Dijkstra; +Cc: libc-alpha, nd
[-- Attachment #1: Type: text/plain, Size: 847 bytes --]
On Wed, 2017-02-08 at 11:15 +0530, Siddhesh Poyarekar wrote:
> Looks OK with a couple of nits below.
Here is a de-nitted version with the ChangeLog using 'Likewise'
instead of 'Ditto' and with a one line description at the top
of libc-start.c.
Steve Ellcey
sellcey@cavium.com
2017-02-08  Steve Ellcey  <sellcey@caviumnetworks.com>
    Adhemerval Zanella  <adhemerval.zanella@linaro.org>
* sysdeps/aarch64/dl-machine.h: Include cpu-features.c.
(DL_PLATFORM_INIT): New define.
(dl_platform_init): New function.
* sysdeps/aarch64/ldsodefs.h: Include cpu-features.h.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.c: New file.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Likewise.
* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Likewise.
* sysdeps/unix/sysv/linux/aarch64/libc-start.c: Likewise.
[-- Attachment #2: ifunc.patch --]
[-- Type: text/x-patch, Size: 9017 bytes --]
diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
index 84b8aec..15d79a6 100644
--- a/sysdeps/aarch64/dl-machine.h
+++ b/sysdeps/aarch64/dl-machine.h
@@ -25,6 +25,7 @@
#include <tls.h>
#include <dl-tlsdesc.h>
#include <dl-irel.h>
+#include <cpu-features.c>
/* Return nonzero iff ELF header is compatible with the running host. */
static inline int __attribute__ ((unused))
@@ -225,6 +226,23 @@ _dl_start_user: \n\
#define ELF_MACHINE_NO_REL 1
#define ELF_MACHINE_NO_RELA 0
+#define DL_PLATFORM_INIT dl_platform_init ()
+
+static inline void __attribute__ ((unused))
+dl_platform_init (void)
+{
+ if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
+ /* Avoid an empty string which would disturb us. */
+ GLRO(dl_platform) = NULL;
+
+#ifdef SHARED
+ /* init_cpu_features has been called early from __libc_start_main in
+ static executable. */
+ init_cpu_features (&GLRO(dl_aarch64_cpu_features));
+#endif
+}
+
+
static inline ElfW(Addr)
elf_machine_fixup_plt (struct link_map *map, lookup_t t,
const ElfW(Rela) *reloc,
diff --git a/sysdeps/aarch64/ldsodefs.h b/sysdeps/aarch64/ldsodefs.h
index f277074..ba4ada3 100644
--- a/sysdeps/aarch64/ldsodefs.h
+++ b/sysdeps/aarch64/ldsodefs.h
@@ -20,6 +20,7 @@
#define _AARCH64_LDSODEFS_H 1
#include <elf.h>
+#include <cpu-features.h>
struct La_aarch64_regs;
struct La_aarch64_retval;
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
index e69de29..8e4b514 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
@@ -0,0 +1,38 @@
+/* Initialize CPU feature data. AArch64 version.
+ This file is part of the GNU C Library.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <cpu-features.h>
+
+#ifndef HWCAP_CPUID
+# define HWCAP_CPUID (1 << 11)
+#endif
+
+static inline void
+init_cpu_features (struct cpu_features *cpu_features)
+{
+ if (GLRO(dl_hwcap) & HWCAP_CPUID)
+ {
+ register uint64_t id = 0;
+ asm volatile ("mrs %0, midr_el1" : "=r"(id));
+ cpu_features->midr_el1 = id;
+ }
+ else
+ {
+ cpu_features->midr_el1 = 0;
+ }
+}
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
index e69de29..c92b650 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
@@ -0,0 +1,49 @@
+/* Initialize CPU feature data. AArch64 version.
+ This file is part of the GNU C Library.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifndef _CPU_FEATURES_AARCH64_H
+#define _CPU_FEATURES_AARCH64_H
+
+#include <stdint.h>
+
+#define MIDR_PARTNUM_SHIFT 4
+#define MIDR_PARTNUM_MASK (0xfff << MIDR_PARTNUM_SHIFT)
+#define MIDR_PARTNUM(midr) \
+ (((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
+#define MIDR_ARCHITECTURE_SHIFT 16
+#define MIDR_ARCHITECTURE_MASK (0xf << MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_ARCHITECTURE(midr) \
+ (((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_VARIANT_SHIFT 20
+#define MIDR_VARIANT_MASK (0xf << MIDR_VARIANT_SHIFT)
+#define MIDR_VARIANT(midr) \
+ (((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
+#define MIDR_IMPLEMENTOR_SHIFT 24
+#define MIDR_IMPLEMENTOR_MASK (0xff << MIDR_IMPLEMENTOR_SHIFT)
+#define MIDR_IMPLEMENTOR(midr) \
+ (((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
+
+#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR(midr) == 'C' \
+ && MIDR_PARTNUM(midr) == 0x0a1)
+
+struct cpu_features
+{
+ uint64_t midr_el1;
+};
+
+#endif /* _CPU_FEATURES_AARCH64_H */
diff --git a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
index e69de29..438046a 100644
--- a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
+++ b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
@@ -0,0 +1,60 @@
+/* Data for AArch64 version of processor capability information.
+ Linux version.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* If anything should be added here check whether the size of each string
+ is still ok with the given array size.
+
+ All the #ifdefs in the definitions are quite irritating but
+ necessary if we want to avoid duplicating the information. There
+ are three different modes:
+
+ - PROCINFO_DECL is defined. This means we are only interested in
+ declarations.
+
+ - PROCINFO_DECL is not defined:
+
+ + if SHARED is defined the file is included in an array
+ initializer. The .element = { ... } syntax is needed.
+
+ + if SHARED is not defined a normal array initialization is
+ needed.
+ */
+
+#ifndef PROCINFO_CLASS
+# define PROCINFO_CLASS
+#endif
+
+#if !IS_IN (ldconfig)
+# if !defined PROCINFO_DECL && defined SHARED
+ ._dl_aarch64_cpu_features
+# else
+PROCINFO_CLASS struct cpu_features _dl_aarch64_cpu_features
+# endif
+# ifndef PROCINFO_DECL
+= { }
+# endif
+# if !defined SHARED || defined PROCINFO_DECL
+;
+# else
+,
+# endif
+#endif
+
+#undef PROCINFO_DECL
+#undef PROCINFO_CLASS
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc-start.c b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
index e69de29..a5babd4 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc-start.c
+++ b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
@@ -0,0 +1,41 @@
+/* Override csu/libc-start.c on AArch64.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifdef SHARED
+# include <csu/libc-start.c>
+# else
+/* The main work is done in the generic function. */
+# define LIBC_START_DISABLE_INLINE
+# define LIBC_START_MAIN generic_start_main
+# include <csu/libc-start.c>
+# include <cpu-features.c>
+
+extern struct cpu_features _dl_aarch64_cpu_features;
+
+int
+__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
+ int argc, char **argv,
+ __typeof (main) init,
+ void (*fini) (void),
+ void (*rtld_fini) (void), void *stack_end)
+{
+ init_cpu_features (&_dl_aarch64_cpu_features);
+ return generic_start_main (main, argc, argv, init, fini, rtld_fini,
+ stack_end);
+}
+#endif
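The MIDR_* macros in cpu-features.h split MIDR_EL1 into these fields: implementer (bits 31:24), variant (23:20), architecture (19:16) and part number (15:4). Bits 3:0 hold the revision and are not decoded. Below is a minimal standalone sketch of the same decoding plus a classifier such as an ifunc resolver might use. The masks are built from unsigned 64-bit constants, so `0xff << 24` cannot overflow a signed int. The sample values 0x430f0a10 (ThunderX) and 0x410fd070 (Cortex-A57 r0p0) are assumed for illustration, and `classify_midr` and `enum core_kind` are hypothetical names.

```c
#include <stdint.h>

/* MIDR_EL1 field layout, as in cpu-features.h, but with unsigned
   64-bit masks so that 0xff << 24 cannot overflow a signed int.  */
#define MIDR_PARTNUM_SHIFT 4
#define MIDR_PARTNUM_MASK (UINT64_C (0xfff) << MIDR_PARTNUM_SHIFT)
#define MIDR_PARTNUM(midr) \
  (((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
#define MIDR_ARCHITECTURE_SHIFT 16
#define MIDR_ARCHITECTURE_MASK (UINT64_C (0xf) << MIDR_ARCHITECTURE_SHIFT)
#define MIDR_ARCHITECTURE(midr) \
  (((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
#define MIDR_VARIANT_SHIFT 20
#define MIDR_VARIANT_MASK (UINT64_C (0xf) << MIDR_VARIANT_SHIFT)
#define MIDR_VARIANT(midr) \
  (((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
#define MIDR_IMPLEMENTOR_SHIFT 24
#define MIDR_IMPLEMENTOR_MASK (UINT64_C (0xff) << MIDR_IMPLEMENTOR_SHIFT)
#define MIDR_IMPLEMENTOR(midr) \
  (((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)

/* Implementer 'C' (0x43, Cavium) with part number 0x0a1 is ThunderX.  */
#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR (midr) == 'C' \
                           && MIDR_PARTNUM (midr) == 0x0a1)

enum core_kind { CORE_GENERIC, CORE_THUNDERX, CORE_CORTEX_A57 };

/* Map a MIDR value to the implementation a resolver would pick.  */
static enum core_kind
classify_midr (uint64_t midr)
{
  if (IS_THUNDERX (midr))
    return CORE_THUNDERX;
  /* Implementer 'A' (0x41, Arm) with part 0xd07 is Cortex-A57.  */
  if (MIDR_IMPLEMENTOR (midr) == 'A' && MIDR_PARTNUM (midr) == 0xd07)
    return CORE_CORTEX_A57;
  return CORE_GENERIC;
}
```

When HWCAP_CPUID is absent, init_cpu_features stores 0. A zero MIDR always classifies as generic.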
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-09 0:02 ` Steve Ellcey
@ 2017-02-09 10:51 ` Szabolcs Nagy
2017-02-09 11:04 ` Siddhesh Poyarekar
2017-02-09 15:54 ` Andrew Pinski
0 siblings, 2 replies; 38+ messages in thread
From: Szabolcs Nagy @ 2017-02-09 10:51 UTC (permalink / raw)
To: Steve Ellcey, Siddhesh Poyarekar, Adhemerval Zanella, Wilco Dijkstra
Cc: nd, libc-alpha
On 09/02/17 00:02, Steve Ellcey wrote:
> +static inline void __attribute__ ((unused))
> +dl_platform_init (void)
> +{
> + if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
> + /* Avoid an empty string which would disturb us. */
> + GLRO(dl_platform) = NULL;
> +
> +#ifdef SHARED
> + /* init_cpu_features has been called early from __libc_start_main in
> + static executable. */
> + init_cpu_features (&GLRO(dl_aarch64_cpu_features));
> +#endif
> +}
...
> +static inline void
> +init_cpu_features (struct cpu_features *cpu_features)
> +{
> + if (GLRO(dl_hwcap) & HWCAP_CPUID)
> + {
> + register uint64_t id = 0;
> + asm volatile ("mrs %0, midr_el1" : "=r"(id));
> + cpu_features->midr_el1 = id;
this is a trap into the kernel at every process startup
since this is called very early (dynamic linking case
above, static linking case below) i wonder if there
could be a way for the user to request midr_el1==0
unconditionally (avoiding the overhead and making
sure the most generic implementation is used)
is there something like that on other targets?
> + }
> + else
> + {
> + cpu_features->midr_el1 = 0;
> + }
> +}
...
> +#ifdef SHARED
> +# include <csu/libc-start.c>
> +# else
> +/* The main work is done in the generic function. */
> +# define LIBC_START_DISABLE_INLINE
> +# define LIBC_START_MAIN generic_start_main
> +# include <csu/libc-start.c>
> +# include <cpu-features.c>
> +
> +extern struct cpu_features _dl_aarch64_cpu_features;
> +
> +int
> +__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
> + int argc, char **argv,
> + __typeof (main) init,
> + void (*fini) (void),
> + void (*rtld_fini) (void), void *stack_end)
> +{
> + init_cpu_features (&_dl_aarch64_cpu_features);
> + return generic_start_main (main, argc, argv, init, fini, rtld_fini,
> + stack_end);
> +}
> +#endif
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-09 10:51 ` Szabolcs Nagy
@ 2017-02-09 11:04 ` Siddhesh Poyarekar
2017-02-09 11:40 ` Szabolcs Nagy
2017-02-09 15:54 ` Andrew Pinski
1 sibling, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-02-09 11:04 UTC (permalink / raw)
To: Szabolcs Nagy, Steve Ellcey, Adhemerval Zanella, Wilco Dijkstra
Cc: nd, libc-alpha
On Thursday 09 February 2017 04:20 PM, Szabolcs Nagy wrote:
> this is a trap into the kernel at every process startup
>
> since this is called very early (dynamic linking case
> above, static linking case below) i wonder if there
> could be a way for the user to request midr_el1==0
> unconditionally (avoiding the overhead and making
> sure the most generic implementation is used)
Well you could use tunables to avoid it, but then if a single trap is a
problem for you then the tunables infra is going to be just as expensive.
Why is a single trap at startup such a concern though?
> is there something like that on other targets?
H.J. Lu had a patch to override IFUNCs using (what will now be) a
tunable but that patch did not make progress. I hope it will now since
I too am interested in overriding IFUNC selection using tunables. But
then this would be an orthogonal discussion to avoiding the trap since
the goals of both are different.
Siddhesh
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-09 11:04 ` Siddhesh Poyarekar
@ 2017-02-09 11:40 ` Szabolcs Nagy
2017-02-09 11:53 ` Siddhesh Poyarekar
0 siblings, 1 reply; 38+ messages in thread
From: Szabolcs Nagy @ 2017-02-09 11:40 UTC (permalink / raw)
To: Siddhesh Poyarekar, Steve Ellcey, Adhemerval Zanella, Wilco Dijkstra
Cc: nd, libc-alpha
On 09/02/17 11:04, Siddhesh Poyarekar wrote:
> On Thursday 09 February 2017 04:20 PM, Szabolcs Nagy wrote:
>> this is a trap into the kernel at every process startup
>>
>> since this is called very early (dynamic linking case
>> above, static linking case below) i wonder if there
>> could be a way for the user to request midr_el1==0
>> unconditionally (avoiding the overhead and making
>> sure the most generic implementation is used)
>
> Well you could use tunables to avoid it, but then if a single trap is a
> problem for you then the tunables infra is going to be just as expensive.
>
> Why is a single trap at startup such a concern though?
ok, it is probably not worth worrying about
(should be around 0.1% of the minimal startup time now)
but doing it just to control ifunc selection is still
useful (at least for development)
(if eventually there will be widespread use of ifunc
based function multi-versioning then the trap will
not be per process startup but per module load which
is a bit more relevant, but also not something we can
control from the libc)
>> is there something like that on other targets?
>
> H.J. Lu had a patch to override IFUNCs using (what will now be) a
> tunable but that patch did not make progress. I hope it will now since
> I too am interested in overriding IFUNC selection using tunables. But
> then this would be an orthogonal discussion to avoiding the trap since
> the goals of both are different.
i see.
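An IFUNC symbol is bound once, when its resolver runs during relocation processing. In a dynamically linked program that happens in the dynamic loader. In a static binary it happens in apply_irel, which the patch calls after INIT_CPU_FEATURES. This is why the MIDR must already be known at that point. Below is a minimal sketch using GCC's `ifunc` function attribute (ELF/glibc targets). `my_copy`, `copy_generic`, `copy_thunderx`, `fake_midr` and the call counters are hypothetical. The resolver reads a fixed MIDR of 0 instead of the real register.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void *copy_fn (void *, const void *, size_t);

/* Hypothetical stand-in for _dl_aarch64_cpu_features.midr_el1; 0 is
   what init_cpu_features stores when HWCAP_CPUID is unavailable.  */
static const uint64_t fake_midr = 0;

/* Call counters, only so the selected implementation is observable.  */
static int generic_calls, thunderx_calls;

static void *
copy_generic (void *dst, const void *src, size_t n)
{
  generic_calls++;
  return memcpy (dst, src, n);
}

static void *
copy_thunderx (void *dst, const void *src, size_t n)
{
  thunderx_calls++;   /* A tuned version might add prefetching here.  */
  return memcpy (dst, src, n);
}

/* Runs once, while relocations are processed; every later call to
   my_copy goes directly to the function returned here.  */
static copy_fn *
resolve_my_copy (void)
{
  uint64_t midr = fake_midr;
  int thunderx = ((midr >> 24) & 0xff) == 'C'
                 && ((midr >> 4) & 0xfff) == 0x0a1;
  return thunderx ? copy_thunderx : copy_generic;
}

void *my_copy (void *dst, const void *src, size_t n)
  __attribute__ ((ifunc ("resolve_my_copy")));
```

Forcing the MIDR to zero makes every such resolver take its generic branch. The override could come from a masked hwcap or a tunable.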
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-09 11:40 ` Szabolcs Nagy
@ 2017-02-09 11:53 ` Siddhesh Poyarekar
0 siblings, 0 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-02-09 11:53 UTC (permalink / raw)
To: Szabolcs Nagy, Steve Ellcey, Adhemerval Zanella, Wilco Dijkstra
Cc: nd, libc-alpha
On Thursday 09 February 2017 05:10 PM, Szabolcs Nagy wrote:
> ok, it is probably not worth worrying about
> (should be around 0.1% of the minimal startup time now)
>
> but doing it just to control ifunc selection is still
> useful (at least for development)
A simple tunable that disables CPU selection would be nice:
glibc.tune.cpu = default|thunderx|a57
However then you would have to think harder about positioning the cpu
structure initialization in static binaries to have them be controlled
by tunables and not just blindly put them right at the top of
__libc_start as the easy way out. It is something that should be done
anyway.
However, couldn't we do that as an add-on to this patch? I'd really
like this framework to go in early and be exercised a bit because I am
interested in pushing (in the coming weeks hopefully) some optimal
string routines for aarch64.
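The `glibc.tune.cpu` suggestion amounts to overriding the MIDR value before any resolver looks at it. Below is a sketch under that assumption. The tunable is only a proposal in this thread, and `midr_for_tune_cpu` and `tune_cpu_table` are hypothetical names, not glibc API. The table's MIDR values are illustrative: "thunderx" is Cavium part 0x0a1 and "a57" is Arm part 0xd07.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mapping from a glibc.tune.cpu value to a forced MIDR.
   "default" forces 0, which selects the generic implementations.  */
static const struct
{
  const char *name;
  uint64_t midr;
} tune_cpu_table[] =
{
  { "default",  0 },
  { "thunderx", UINT64_C (0x430f0a10) },  /* implementer 'C', part 0x0a1 */
  { "a57",      UINT64_C (0x410fd070) },  /* implementer 'A', part 0xd07 */
};

/* If NAME is known, store the forced MIDR in *MIDR and return 1.
   Otherwise return 0 and leave *MIDR untouched, so the caller can
   fall back to reading MIDR_EL1.  */
static int
midr_for_tune_cpu (const char *name, uint64_t *midr)
{
  for (size_t i = 0; i < sizeof tune_cpu_table / sizeof tune_cpu_table[0];
       i++)
    if (strcmp (name, tune_cpu_table[i].name) == 0)
      {
        *midr = tune_cpu_table[i].midr;
        return 1;
      }
  return 0;
}
```

Unknown names are reported to the caller rather than treated as "default". That way a misspelt value does not silently change which implementations are chosen.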
> (if eventually there will be widespread use of ifunc
> based function multi-versioning then the trap will
> not be per process startup but per module load which
> is a bit more relevant, but also not something we can
> control from the libc)
Right we cannot control that.
Siddhesh
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-09 10:51 ` Szabolcs Nagy
2017-02-09 11:04 ` Siddhesh Poyarekar
@ 2017-02-09 15:54 ` Andrew Pinski
2017-02-10 0:55 ` Steve Ellcey
1 sibling, 1 reply; 38+ messages in thread
From: Andrew Pinski @ 2017-02-09 15:54 UTC (permalink / raw)
To: Szabolcs Nagy
Cc: Steve Ellcey, Siddhesh Poyarekar, Adhemerval Zanella,
Wilco Dijkstra, nd, libc-alpha
On Thu, Feb 9, 2017 at 2:50 AM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> On 09/02/17 00:02, Steve Ellcey wrote:
>> +static inline void __attribute__ ((unused))
>> +dl_platform_init (void)
>> +{
>> + if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
>> + /* Avoid an empty string which would disturb us. */
>> + GLRO(dl_platform) = NULL;
>> +
>> +#ifdef SHARED
>> + /* init_cpu_features has been called early from __libc_start_main in
>> + static executable. */
>> + init_cpu_features (&GLRO(dl_aarch64_cpu_features));
>> +#endif
>> +}
> ...
>> +static inline void
>> +init_cpu_features (struct cpu_features *cpu_features)
>> +{
>> + if (GLRO(dl_hwcap) & HWCAP_CPUID)
>> + {
>> + register uint64_t id = 0;
>> + asm volatile ("mrs %0, midr_el1" : "=r"(id));
>> + cpu_features->midr_el1 = id;
>
> this is a trap into the kernel at every process startup
>
> since this is called very early (dynamic linking case
> above, static linking case below) i wonder if there
> could be a way for the user to request midr_el1==0
> unconditionally (avoiding the overhead and making
> sure the most generic implementation is used)
Well the easy way to do this would be use LD_HWCAP_MASK and mask off
the HWCAP_CPUID bit.
>
> is there something like that on other targets?
Some targets like PowerPC use the hwcap to say which processor they
are being run on so you use the same method as above.
Thanks,
Andrew
>
>> + }
>> + else
>> + {
>> + cpu_features->midr_el1 = 0;
>> + }
>> +}
> ...
>> +#ifdef SHARED
>> +# include <csu/libc-start.c>
>> +# else
>> +/* The main work is done in the generic function. */
>> +# define LIBC_START_DISABLE_INLINE
>> +# define LIBC_START_MAIN generic_start_main
>> +# include <csu/libc-start.c>
>> +# include <cpu-features.c>
>> +
>> +extern struct cpu_features _dl_aarch64_cpu_features;
>> +
>> +int
>> +__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
>> + int argc, char **argv,
>> + __typeof (main) init,
>> + void (*fini) (void),
>> + void (*rtld_fini) (void), void *stack_end)
>> +{
>> + init_cpu_features (&_dl_aarch64_cpu_features);
>> + return generic_start_main (main, argc, argv, init, fini, rtld_fini,
>> + stack_end);
>> +}
>> +#endif
>>
>
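Andrew's LD_HWCAP_MASK suggestion works only if the CPUID bit is tested after a user-supplied mask has been applied. Below is a sketch of that check as a user-space program would see it. The `mask` argument is a hypothetical stand-in for GLRO(dl_hwcap_mask). `getauxval (AT_HWCAP)` from glibc's <sys/auxv.h> supplies the kernel hwcap, and HWCAP_CPUID is bit 11, as in the patch. The `mrs` read is compiled only on AArch64. Elsewhere the function returns 0.

```c
#include <stdint.h>
#include <sys/auxv.h>

#ifndef HWCAP_CPUID
# define HWCAP_CPUID (1UL << 11)
#endif

/* Return MIDR_EL1 if the kernel advertises CPUID emulation and MASK
   lets that bit through; otherwise return 0, which steers every
   resolver to the generic implementation.  */
static uint64_t
read_midr_masked (unsigned long hwcap, unsigned long mask)
{
  if ((hwcap & mask & HWCAP_CPUID) == 0)
    return 0;
#ifdef __aarch64__
  uint64_t id;
  /* Traps into the kernel, which emulates this EL1 register read.  */
  __asm__ volatile ("mrs %0, midr_el1" : "=r" (id));
  return id;
#else
  return 0;
#endif
}

/* Convenience wrapper using the real hwcap of the running process.  */
static uint64_t
current_midr (unsigned long mask)
{
  return read_midr_masked (getauxval (AT_HWCAP), mask);
}
```

Passing `~0UL` keeps normal behaviour. Passing `~HWCAP_CPUID` forces a zero MIDR and skips the trap entirely.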
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-09 15:54 ` Andrew Pinski
@ 2017-02-10 0:55 ` Steve Ellcey
2017-02-23 0:27 ` Steve Ellcey
0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-02-10 0:55 UTC (permalink / raw)
To: Andrew Pinski, Szabolcs Nagy
Cc: Siddhesh Poyarekar, Adhemerval Zanella, Wilco Dijkstra, nd, libc-alpha
On Thu, 2017-02-09 at 07:54 -0800, Andrew Pinski wrote:
> On Thu, Feb 9, 2017 at 2:50 AM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> > since this is called very early (dynamic linking case
> > above, static linking case below) i wonder if there
> > could be a way for the user to request midr_el1==0
> > unconditionally (avoiding the overhead and making
> > sure the most generic implementation is used)
> Well the easy way to do this would be use LD_HWCAP_MASK and mask off
> the HWCAP_CPUID bit.
>
> > is there something like that on other targets?
> Some targets like PowerPC use the hwcap to say which processor they
> are being run on so you use the same method as above.
>
> Thanks,
> Andrew
Do you know if LD_HWCAP_MASK actually works on PowerPC to do this? I
see where PowerPC is reading GLRO(dl_hwcap) but I don't see them
reading GLRO(dl_hwcap_mask). I don't think that the generic parts of
libc apply LD_HWCAP_MASK to HWCAP automatically but maybe I missed it
somewhere.
I changed init_cpu_features in my patch from:
if (GLRO(dl_hwcap) & HWCAP_CPUID)
to
if (GLRO(dl_hwcap) & GLRO(dl_hwcap_mask) & HWCAP_CPUID)
to see what would happen. Initially this returned false because
we were using the default dl-procinfo.h and HWCAP_IMPORTANT (used
to initialize dl_hwcap_mask) was 0. I made a copy of dl-procinfo.h
and set HWCAP_IMPORTANT to HWCAP_CPUID and I got true here and the
thunderx ifuncs were run.
But if I ran things after setting LD_HWCAP_MASK to 0 it didn't seem to
have any effect and I still ran thunderx ifuncs. I am not sure but it
seemed like this code in init_cpu_features may have been getting run
before LD_HWCAP_MASK was getting read and before GLRO(dl_hwcap_mask)
was reset from its initial value.
Steve Ellcey
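The behaviour described above can be reproduced with a toy model of the startup sequence. If the feature check runs before LD_HWCAP_MASK is parsed, it sees the default mask (HWCAP_IMPORTANT) rather than the user's value. Everything below is a hypothetical simulation, not ld.so code: `struct startup_state`, `apply_ld_hwcap_mask` and `init_features` are invented for illustration, and 0x430f0a10 plays the MIDR a ThunderX would report.

```c
#include <stdint.h>
#include <stdlib.h>

#ifndef HWCAP_CPUID
# define HWCAP_CPUID (1UL << 11)
#endif

/* Hypothetical model of the loader state involved.  */
struct startup_state
{
  unsigned long hwcap;       /* from AT_HWCAP */
  unsigned long hwcap_mask;  /* starts out as HWCAP_IMPORTANT */
  uint64_t midr;
};

/* Stand-in for environment processing of LD_HWCAP_MASK.  */
static void
apply_ld_hwcap_mask (struct startup_state *s, const char *value)
{
  if (value != NULL)
    s->hwcap_mask = strtoul (value, NULL, 0);
}

/* Stand-in for init_cpu_features: it sees whatever mask is current.  */
static void
init_features (struct startup_state *s)
{
  s->midr = (s->hwcap & s->hwcap_mask & HWCAP_CPUID)
            ? UINT64_C (0x430f0a10) : 0;
}
```

Run `init_features` first and LD_HWCAP_MASK=0 arrives too late, so the ThunderX MIDR is kept. Parse the mask first and the MIDR becomes 0. The first ordering matches the symptom reported above.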
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-10 0:55 ` Steve Ellcey
@ 2017-02-23 0:27 ` Steve Ellcey
2017-02-23 14:21 ` Siddhesh Poyarekar
0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-02-23 0:27 UTC (permalink / raw)
To: Andrew Pinski, Szabolcs Nagy
Cc: Siddhesh Poyarekar, Adhemerval Zanella, Wilco Dijkstra, nd, libc-alpha
[-- Attachment #1: Type: text/plain, Size: 1471 bytes --]
Here is a new version of the IFUNC functionality for aarch64. It does
not include the memcpy changes to use it. I tried to move the cpu
feature check to later in the start up process so that I could check
GLRO(dl_hwcap_mask) but it does not seem to be working. I was
wondering if anyone could look at this and see if I am doing something
wrong.
If I understand the code correctly, for a dynamically linked program,
dl_main is going to get called before dl_sysdep_start and dl_main is
where environment variables are processed and where dl_hwcap_mask
should be getting set. But when I check dl_hwcap_mask in
init_cpu_features (called from dl_sysdep_start), it does not seem to
have changed from its original value that is set in the new
dl-procinfo.h header file.
Any ideas on why this isn't working?
Steve Ellcey
sellcey@cavium.com
2017-02-22  Steve Ellcey  <sellcey@caviumnetworks.com>
    Adhemerval Zanella  <adhemerval.zanella@linaro.org>
* csu/libc-start.c (LIBC_START_MAIN): Use INIT_CPU_FEATURES.
* elf/dl-sysdep.c (_dl_sysdep_start): Likewise.
* sysdeps/aarch64/dl-procinfo.h: New file.
* sysdeps/aarch64/ldsodefs.h: Include cpu-features.h.
* sysdeps/unix/sysv/linux/aarch64/Makefile (sysdep_routines):
Add cpu-features.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.c: New file.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Likewise.
* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Likewise.
[-- Attachment #2: ifunc.patch --]
[-- Type: text/x-patch, Size: 9536 bytes --]
diff --git a/csu/libc-start.c b/csu/libc-start.c
index 9a56dcb..ec19466 100644
--- a/csu/libc-start.c
+++ b/csu/libc-start.c
@@ -182,6 +182,10 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
__tunables_init (__environ);
+#ifdef INIT_CPU_FEATURES
+ INIT_CPU_FEATURES;
+#endif
+
/* Perform IREL{,A} relocations. */
apply_irel ();
diff --git a/elf/dl-sysdep.c b/elf/dl-sysdep.c
index 4053ff3..c963a1e 100644
--- a/elf/dl-sysdep.c
+++ b/elf/dl-sysdep.c
@@ -223,6 +223,10 @@ _dl_sysdep_start (void **start_argptr,
__tunables_init (_environ);
+#ifdef INIT_CPU_FEATURES
+ INIT_CPU_FEATURES;
+#endif
+
#ifdef DL_SYSDEP_INIT
DL_SYSDEP_INIT;
#endif
diff --git a/sysdeps/aarch64/dl-procinfo.h b/sysdeps/aarch64/dl-procinfo.h
index e69de29..0e0829f 100644
--- a/sysdeps/aarch64/dl-procinfo.h
+++ b/sysdeps/aarch64/dl-procinfo.h
@@ -0,0 +1,50 @@
+/* Stub version of processor capability information handling macros.
+ Copyright (C) 1998-2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+ Contributed by Ulrich Drepper <drepper@cygnus.com>, 1998.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifndef _DL_PROCINFO_H
+#define _DL_PROCINFO_H 1
+
+/* We cannot provide a general printing function. */
+#define _dl_procinfo(type, word) -1
+
+/* There are no hardware capabilities defined. */
+#define _dl_hwcap_string(idx) ""
+
+/* There are no different platforms defined. */
+#define _dl_platform_string(idx) ""
+
+/* Needed here until this gets into kernel sources. */
+#ifndef HWCAP_CPUID
+# define HWCAP_CPUID (1 << 11)
+#endif
+
+/* HWCAP_CPUID is the only important hardware capability. */
+#define HWCAP_IMPORTANT (HWCAP_CPUID)
+
+/* There're no platforms to filter out. */
+#define _DL_HWCAP_PLATFORM 0
+
+/* We don't have any hardware capabilities. */
+#define _DL_HWCAP_COUNT 0
+
+#define _dl_string_hwcap(str) (-1)
+
+#define _dl_string_platform(str) (-1)
+
+#endif /* dl-procinfo.h */
diff --git a/sysdeps/aarch64/ldsodefs.h b/sysdeps/aarch64/ldsodefs.h
index f277074..ba4ada3 100644
--- a/sysdeps/aarch64/ldsodefs.h
+++ b/sysdeps/aarch64/ldsodefs.h
@@ -20,6 +20,7 @@
#define _AARCH64_LDSODEFS_H 1
#include <elf.h>
+#include <cpu-features.h>
struct La_aarch64_regs;
struct La_aarch64_retval;
diff --git a/sysdeps/unix/sysv/linux/aarch64/Makefile b/sysdeps/unix/sysv/linux/aarch64/Makefile
index 6b4e620..d17dafe 100644
--- a/sysdeps/unix/sysv/linux/aarch64/Makefile
+++ b/sysdeps/unix/sysv/linux/aarch64/Makefile
@@ -1,5 +1,6 @@
ifeq ($(subdir),csu)
sysdep_routines += __read_tp libc-__read_tp
+sysdep_routines += cpu-features
static-only-routines += __read_tp
shared-only-routines += libc-__read_tp
endif
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
index e69de29..867e1ca 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
@@ -0,0 +1,45 @@
+/* Initialize CPU feature data. AArch64 version.
+ This file is part of the GNU C Library.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <cpu-features.h>
+#include <assert.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <ldsodefs.h>
+#include <exit-thread.h>
+#include <libc-internal.h>
+
+#ifndef HWCAP_CPUID
+# define HWCAP_CPUID (1 << 11)
+#endif
+
+void
+init_cpu_features (struct cpu_features *cpu_features)
+{
+ if (HWCAP_CPUID & GLRO(dl_hwcap) & GLRO(dl_hwcap_mask))
+ {
+ register uint64_t id = 0;
+ asm volatile ("mrs %0, midr_el1" : "=r"(id));
+ cpu_features->midr_el1 = id;
+ }
+ else
+ {
+ cpu_features->midr_el1 = 0;
+ }
+}
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
index e69de29..0b2a51b 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
@@ -0,0 +1,53 @@
+/* Initialize CPU feature data. AArch64 version.
+ This file is part of the GNU C Library.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifndef _CPU_FEATURES_AARCH64_H
+#define _CPU_FEATURES_AARCH64_H
+
+#include <stdint.h>
+
+#define MIDR_PARTNUM_SHIFT 4
+#define MIDR_PARTNUM_MASK (0xfff << MIDR_PARTNUM_SHIFT)
+#define MIDR_PARTNUM(midr) \
+ (((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
+#define MIDR_ARCHITECTURE_SHIFT 16
+#define MIDR_ARCHITECTURE_MASK (0xf << MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_ARCHITECTURE(midr) \
+ (((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_VARIANT_SHIFT 20
+#define MIDR_VARIANT_MASK (0xf << MIDR_VARIANT_SHIFT)
+#define MIDR_VARIANT(midr) \
+ (((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
+#define MIDR_IMPLEMENTOR_SHIFT 24
+#define MIDR_IMPLEMENTOR_MASK (0xff << MIDR_IMPLEMENTOR_SHIFT)
+#define MIDR_IMPLEMENTOR(midr) \
+ (((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
+
+#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR(midr) == 'C' \
+ && MIDR_PARTNUM(midr) == 0x0a1)
+
+struct cpu_features
+{
+ uint64_t midr_el1;
+};
+
+void init_cpu_features (struct cpu_features *cpu_features);
+
+#define INIT_CPU_FEATURES init_cpu_features(&GLRO(dl_aarch64_cpu_features))
+
+#endif /* _CPU_FEATURES_AARCH64_H */
diff --git a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
index e69de29..438046a 100644
--- a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
+++ b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
@@ -0,0 +1,60 @@
+/* Data for AArch64 version of processor capability information.
+ Linux version.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* If anything should be added here check whether the size of each string
+ is still ok with the given array size.
+
+ All the #ifdefs in the definitions are quite irritating but
+ necessary if we want to avoid duplicating the information. There
+ are three different modes:
+
+ - PROCINFO_DECL is defined. This means we are only interested in
+ declarations.
+
+ - PROCINFO_DECL is not defined:
+
+ + if SHARED is defined the file is included in an array
+ initializer. The .element = { ... } syntax is needed.
+
+ + if SHARED is not defined a normal array initialization is
+ needed.
+ */
+
+#ifndef PROCINFO_CLASS
+# define PROCINFO_CLASS
+#endif
+
+#if !IS_IN (ldconfig)
+# if !defined PROCINFO_DECL && defined SHARED
+ ._dl_aarch64_cpu_features
+# else
+PROCINFO_CLASS struct cpu_features _dl_aarch64_cpu_features
+# endif
+# ifndef PROCINFO_DECL
+= { }
+# endif
+# if !defined SHARED || defined PROCINFO_DECL
+;
+# else
+,
+# endif
+#endif
+
+#undef PROCINFO_DECL
+#undef PROCINFO_CLASS
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-23 0:27 ` Steve Ellcey
@ 2017-02-23 14:21 ` Siddhesh Poyarekar
2017-02-23 16:20 ` Steve Ellcey
0 siblings, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-02-23 14:21 UTC (permalink / raw)
To: Steve Ellcey, Andrew Pinski, Szabolcs Nagy
Cc: Adhemerval Zanella, Wilco Dijkstra, nd, libc-alpha
On Thursday 23 February 2017 05:57 AM, Steve Ellcey wrote:
> Here is a new version of the IFUNC functionality for aarch64. It does
> not include the memcpy changes to use it. I tried to move the cpu
> feature check to later in the start up process so that I could check
> GLRO(dl_hwcap_mask) but it does not seem to be working. I was
> wondering if anyone could look at this and see if I am doing something
> wrong.
>
> If I understand the code correctly, for a dynamically linked program,
> dl_main is going to get called before dl_sysdep_start and dl_main is
> where environment variables are processed and where dl_hwcap_mask
> should be getting set. But when I check dl_hwcap_mask in
> init_cpu_features (called from dl_sysdep_start), it does not seem to
> have changed from its original value that is set in the new
> dl-procinfo.h header file.
dl_sysdep_start calls dl_main, so you've got the order wrong. If you
want init_cpu_features to read dl_hwcap_mask then you'll have to move
the code to read LD_* environment variables earlier. In fact they need
to go into tunables, something I've had in mind for this release.
Do you want me to move the envvars into tunables or would you like to do it?
Siddhesh
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-23 14:21 ` Siddhesh Poyarekar
@ 2017-02-23 16:20 ` Steve Ellcey
2017-02-23 16:34 ` Siddhesh Poyarekar
0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-02-23 16:20 UTC (permalink / raw)
To: Siddhesh Poyarekar, Andrew Pinski, Szabolcs Nagy
Cc: Adhemerval Zanella, Wilco Dijkstra, nd, libc-alpha
On Thu, 2017-02-23 at 19:51 +0530, Siddhesh Poyarekar wrote:
>
> dl_sysdep_start calls dl_main, so you've got the order wrong. If you
> want init_cpu_features to read dl_hwcap_mask then you'll have to move
> the code to read LD_* environment variables earlier. In fact they need
> to go into tunables, something I've had in mind for this release.
>
> Do you want me to move the envvars into tunables or would you like to
> do it?
>
> Siddhesh
If you want to move it, that would be great.
Steve Ellcey
sellcey@cavium.com
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-23 16:20 ` Steve Ellcey
@ 2017-02-23 16:34 ` Siddhesh Poyarekar
2017-02-23 16:42 ` Steve Ellcey
0 siblings, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-02-23 16:34 UTC (permalink / raw)
To: Steve Ellcey, Andrew Pinski, Szabolcs Nagy
Cc: Adhemerval Zanella, Wilco Dijkstra, nd, libc-alpha
On Thursday 23 February 2017 09:50 PM, Steve Ellcey wrote:
> If you want to move it, that would be great.
OK, I'll get started on it. Meanwhile, it would be nice to get the
earlier patch in since it works well for everything except
dl_hwcap_mask. Do the aarch64 machine maintainers have a strong opinion
on this?
Siddhesh
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-23 16:34 ` Siddhesh Poyarekar
@ 2017-02-23 16:42 ` Steve Ellcey
2017-02-23 16:53 ` Siddhesh Poyarekar
0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-02-23 16:42 UTC (permalink / raw)
To: Siddhesh Poyarekar, Andrew Pinski, Szabolcs Nagy
Cc: Adhemerval Zanella, Wilco Dijkstra, nd, libc-alpha
On Thu, 2017-02-23 at 22:04 +0530, Siddhesh Poyarekar wrote:
> On Thursday 23 February 2017 09:50 PM, Steve Ellcey wrote:
> >
> > If you want to move it, that would be great.
> OK, I'll get started on it. Meanwhile, it would be nice to get the
> earlier patch in since it works well for everything except
> dl_hwcap_mask. Do the aarch64 machine maintainers have a strong opinion
> on this?
>
> Siddhesh
Just to be clear, the earlier patch would be this one, right?
https://sourceware.org/ml/libc-alpha/2017-02/msg00175.html
Steve Ellcey
sellcey@cavium.com
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-23 16:42 ` Steve Ellcey
@ 2017-02-23 16:53 ` Siddhesh Poyarekar
2017-03-01 18:48 ` Steve Ellcey
0 siblings, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-02-23 16:53 UTC (permalink / raw)
To: Steve Ellcey, Andrew Pinski, Szabolcs Nagy
Cc: Adhemerval Zanella, Wilco Dijkstra, nd, libc-alpha
On Thursday 23 February 2017 10:12 PM, Steve Ellcey wrote:
> Just to be clear, the earlier patch would be this one, right?
>
> https://sourceware.org/ml/libc-alpha/2017-02/msg00175.html
Yes. It does everything necessary to get multiarch enabled and working
correctly for kernels that support it. The hwcap_mask feature is
something that can be added on top and I can commit to doing that within
this release.
Siddhesh
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-02-23 16:53 ` Siddhesh Poyarekar
@ 2017-03-01 18:48 ` Steve Ellcey
2017-03-14 18:46 ` Szabolcs Nagy
0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-03-01 18:48 UTC (permalink / raw)
To: Siddhesh Poyarekar, Andrew Pinski, Szabolcs Nagy
Cc: Adhemerval Zanella, Wilco Dijkstra, nd, libc-alpha
On Thu, 2017-02-23 at 22:23 +0530, Siddhesh Poyarekar wrote:
> On Thursday 23 February 2017 10:12 PM, Steve Ellcey wrote:
> >
> > Just to be clear, the earlier patch would be this one, right?
> >
> > https://sourceware.org/ml/libc-alpha/2017-02/msg00175.html
> Yes. It does everything necessary to get multiarch enabled and working
> correctly for kernels that support it. The hwcap_mask feature is
> something that can be added on top and I can commit to doing that within
> this release.
>
> Siddhesh
Ping. Does anyone have any comments or objections to this patch
that enables IFUNC on aarch64?
Steve Ellcey
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-03-01 18:48 ` Steve Ellcey
@ 2017-03-14 18:46 ` Szabolcs Nagy
2017-03-14 18:51 ` Andrew Pinski
2017-03-15 23:53 ` Steve Ellcey
0 siblings, 2 replies; 38+ messages in thread
From: Szabolcs Nagy @ 2017-03-14 18:46 UTC (permalink / raw)
To: Steve Ellcey, Siddhesh Poyarekar, Andrew Pinski
Cc: nd, Adhemerval Zanella, Wilco Dijkstra, libc-alpha, Marcus Shawcroft
On 01/03/17 18:48, Steve Ellcey wrote:
> On Thu, 2017-02-23 at 22:23 +0530, Siddhesh Poyarekar wrote:
>> On Thursday 23 February 2017 10:12 PM, Steve Ellcey wrote:
>>>
>>> Just to be clear, the earlier patch would be this one, right?
>>>
>>> https://sourceware.org/ml/libc-alpha/2017-02/msg00175.html
>> Yes. It does everything necessary to get multiarch enabled and working
>> correctly for kernels that support it. The hwcap_mask feature is
>> something that can be added on top and I can commit to doing that within
>> this release.
>>
>> Siddhesh
>
> Ping. Does anyone have any comments or objections to this patch
> that enables IFUNC on aarch64?
the patch looks ok, with HWCAP in bits/hwcap.h instead of
+/* Needed here until this gets into kernel sources. */
+#ifndef HWCAP_CPUID
+# define HWCAP_CPUID (1 << 11)
+#endif
the hwcap value is not yet in linux v4.10, but already
allocated, if we are committed to this value then i
think it's better to only have it in one place.
you may need to include bits/hwcap.h in some files.
i was not sure if we should wait for this to be in a
linux release, but i guess the value won't change now.
an unrelated issue i was wondering about is how the
upcoming ilp32 ifunc resolver will receive the hwcap
argument: will it only see 32bit (unsigned long hwcap)
or 64 bits (with different ifunc resolver prototype)
and if there is something in this area that needs to
be changed before ifuncs are used.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-03-14 18:46 ` Szabolcs Nagy
@ 2017-03-14 18:51 ` Andrew Pinski
2017-03-15 23:53 ` Steve Ellcey
1 sibling, 0 replies; 38+ messages in thread
From: Andrew Pinski @ 2017-03-14 18:51 UTC (permalink / raw)
To: Szabolcs Nagy
Cc: Steve Ellcey, Siddhesh Poyarekar, nd, Adhemerval Zanella,
Wilco Dijkstra, libc-alpha, Marcus Shawcroft
On Tue, Mar 14, 2017 at 11:45 AM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> On 01/03/17 18:48, Steve Ellcey wrote:
>> On Thu, 2017-02-23 at 22:23 +0530, Siddhesh Poyarekar wrote:
>>> On Thursday 23 February 2017 10:12 PM, Steve Ellcey wrote:
>>>>
>>>> Just to be clear, the earlier patch would be this one, right?
>>>>
>>>> https://sourceware.org/ml/libc-alpha/2017-02/msg00175.html
>>> Yes. It does everything necessary to get multiarch enabled and working
>>> correctly for kernels that support it. The hwcap_mask feature is
>>> something that can be added on top and I can commit to doing that within
>>> this release.
>>>
>>> Siddhesh
>>
>> Ping. Does anyone have any comments or objections to this patch
>> that enables IFUNC on aarch64?
>
> the patch looks ok, with HWCAP in bits/hwcap.h instead of
>
> +/* Needed here until this gets into kernel sources. */
> +#ifndef HWCAP_CPUID
> +# define HWCAP_CPUID (1 << 11)
> +#endif
>
> the hwcap value is not yet in linux v4.10, but already
> allocated, if we are committed to this value then i
> think it's better to only have it in one place.
> you may need to include bits/hwcap.h in some files.
>
> i was not sure if we should wait for this to be in a
> linux release, but i guess the value won't change now.
It is in 4.11rc1 though. Is that ok enough?
>
> an unrelated issue i was wondering about is how the
> upcoming ilp32 ifunc resolver will receive the hwcap
> argument: will it only see 32bit (unsigned long hwcap)
> or 64 bits (with different ifunc resolver prototype)
> and if there is something in this area that needs to
> be changed before ifuncs are used.
Right now, the hwcap is not using the full 64bit. But the kernel will
have to split it into hwcap and hwcap2 anyways when we hit that point.
Most likely we should have the ifunc resolver take a uint64_t now
rather than later.
Thanks,
Andrew Pinski
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-03-14 18:46 ` Szabolcs Nagy
2017-03-14 18:51 ` Andrew Pinski
@ 2017-03-15 23:53 ` Steve Ellcey
2017-03-22 5:38 ` Aarch64 machine maintainership (was: Add ifunc memcpy and memmove for aarch64) Siddhesh Poyarekar
1 sibling, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-03-15 23:53 UTC (permalink / raw)
To: Szabolcs Nagy, Siddhesh Poyarekar, Andrew Pinski
Cc: nd, Adhemerval Zanella, Wilco Dijkstra, libc-alpha, Marcus Shawcroft
On Tue, 2017-03-14 at 18:45 +0000, Szabolcs Nagy wrote:
>
> the hwcap value is not yet in linux v4.10, but already
> allocated, if we are committed to this value then i
> think it's better to only have it in one place.
> you may need to include bits/hwcap.h in some files.
I checked this patch in after moving the HWCAP_CPUID to bits/hwcap.h
and adding an include of sys/auxv.h to cpu-features.c to get the value.
When I added an include of bits/hwcap.h directly I got an error about
including auxv.h instead of hwcap.h.
Steve Ellcey
sellcey@cavium.com
^ permalink raw reply [flat|nested] 38+ messages in thread
* Aarch64 machine maintainership (was: Add ifunc memcpy and memmove for aarch64)
2017-03-15 23:53 ` Steve Ellcey
@ 2017-03-22 5:38 ` Siddhesh Poyarekar
2017-03-22 17:34 ` Joseph Myers
0 siblings, 1 reply; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-03-22 5:38 UTC (permalink / raw)
To: Steve Ellcey, Szabolcs Nagy, Andrew Pinski
Cc: nd, Adhemerval Zanella, Wilco Dijkstra, libc-alpha, Marcus Shawcroft
On Thursday 16 March 2017 05:22 AM, Steve Ellcey wrote:
> I checked this patch in after moving the HWCAP_CPUID to bits/hwcap.h
> and adding an include of sys/auxv.h to cpu-features.c to get the value.
> When I added an include of bits/hwcap.h directly I got an error about
> including auxv.h instead of hwcap.h.
Technically you still needed to get an ack from Marcus who is the only
machine maintainer for aarch64. Practically though, the patch is fine
and it appears that Marcus hasn't had the time to review the patch and
it doesn't make sense to wait too long for something that has had
all-round consensus.
That said, it would be nice to have one of two things happen going forward:
Either Marcus names one or more machine maintainers for aarch64 that
have the bandwidth to review and approve aarch64 patches for glibc. I
would like to propose Adhemerval as an aarch64 machine maintainer too
since he has the necessary experience in glibc and also the involvement
in aarch64 work.
Alternatively, make aarch64 patch review open like x86. This might be
the way forward given that aarch64 development interests are getting
increasingly diverse (probably more so than x86) and it may become quite
difficult for a single maintainer from ARM to gate everything that goes in.
Siddhesh
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Aarch64 machine maintainership (was: Add ifunc memcpy and memmove for aarch64)
2017-03-22 5:38 ` Aarch64 machine maintainership (was: Add ifunc memcpy and memmove for aarch64) Siddhesh Poyarekar
@ 2017-03-22 17:34 ` Joseph Myers
2017-03-22 17:52 ` Aarch64 machine maintainership Siddhesh Poyarekar
0 siblings, 1 reply; 38+ messages in thread
From: Joseph Myers @ 2017-03-22 17:34 UTC (permalink / raw)
To: Siddhesh Poyarekar
Cc: Steve Ellcey, Szabolcs Nagy, Andrew Pinski, nd,
Adhemerval Zanella, Wilco Dijkstra, libc-alpha, Marcus Shawcroft
On Wed, 22 Mar 2017, Siddhesh Poyarekar wrote:
> Technically you still needed to get an ack from Marcus who is the only
> machine maintainer for aarch64. Practically though, the patch is fine
No, machine patches are subject to consensus just like any other patches,
and such consensus can be reached without needing a machine maintainer to
comment (although one might hope they would review most substantial
patches for their machine).
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Aarch64 machine maintainership
2017-03-22 17:34 ` Joseph Myers
@ 2017-03-22 17:52 ` Siddhesh Poyarekar
0 siblings, 0 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-03-22 17:52 UTC (permalink / raw)
To: Joseph Myers
Cc: Steve Ellcey, Szabolcs Nagy, Andrew Pinski, nd,
Adhemerval Zanella, Wilco Dijkstra, libc-alpha, Marcus Shawcroft
On Wednesday 22 March 2017 11:04 PM, Joseph Myers wrote:
> No, machine patches are subject to consensus just like any other patches,
> and such consensus can be reached without needing a machine maintainer to
> comment (although one might hope they would review most substantial
> patches for their machine).
That is not very clear from the Consensus wiki page, so I added the
following to the machine maintainer section:
* If you are not a maintainer for the machine you're proposing the
change for, your patches are subject to consensus like any other
patches and while review from a machine maintainer may be ideal, it
is not strictly necessary for the patch to be accepted.
Siddhesh
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-01-25 17:34 ` Steve Ellcey
2017-02-06 20:54 ` Adhemerval Zanella
@ 2017-02-07 6:47 ` Siddhesh Poyarekar
1 sibling, 0 replies; 38+ messages in thread
From: Siddhesh Poyarekar @ 2017-02-07 6:47 UTC (permalink / raw)
To: Steve Ellcey, Adhemerval Zanella, libc-alpha
On Wednesday 25 January 2017 11:04 PM, Steve Ellcey wrote:
> Here is a new version of the aarch64 ifunc patch with the cpu-features
> style of initialization on startup. Adhemerval, since I took some code
> from your branch I added your name to the ChangeLog. In addition to
> doing the mrs instruction on startup the main difference in this patch
> from the last one is that it uses ifuncs in both the shared and archive
> libc libraries.
>
> Steve Ellcey
> sellcey@cavium.com
>
>
> 2017-01-25 Steve Ellcey <sellcey@caviumnetworks.com>
> Adhemerval Zanella <adhemerval.zanella@linaro.org>
>
> * sysdeps/aarch64/dl-machine.h: Include cpu-features.c.
> (DL_PLATFORM_INIT): New define.
> (dl_platform_init): New function.
> * sysdeps/aarch64/ldsodefs.h: Include cpu-features.h.
> * sysdeps/aarch64/memcpy.S (MEMMOVE, MEMCPY): New macros.
> (memmove): Use MEMMOVE for name.
> (memcpy): Use MEMCPY for name. Add loop with prefetching
> under USE_THUNDERX macro.
> * sysdeps/aarch64/multiarch/Makefile: New file.
> * sysdeps/aarch64/multiarch/ifunc-impl-list.c: Ditto.
> * sysdeps/aarch64/multiarch/init-arch.h: Ditto.
> * sysdeps/aarch64/multiarch/memcpy.c: Ditto.
> * sysdeps/aarch64/multiarch/memcpy_generic.S: Ditto.
> * sysdeps/aarch64/multiarch/memcpy_thunderx.S: Ditto.
> * sysdeps/aarch64/multiarch/memmove.c: Ditto.
> * sysdeps/unix/sysv/linux/aarch64/cpu-features.c: Ditto.
> * sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Ditto.
> * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Ditto.
> * sysdeps/unix/sysv/linux/aarch64/libc-start.c: Ditto.
>
>
> ifunc.patch
>
>
> diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
> index 84b8aec..15d79a6 100644
> --- a/sysdeps/aarch64/dl-machine.h
> +++ b/sysdeps/aarch64/dl-machine.h
> @@ -25,6 +25,7 @@
> #include <tls.h>
> #include <dl-tlsdesc.h>
> #include <dl-irel.h>
> +#include <cpu-features.c>
>
> /* Return nonzero iff ELF header is compatible with the running host. */
> static inline int __attribute__ ((unused))
> @@ -225,6 +226,23 @@ _dl_start_user: \n\
> #define ELF_MACHINE_NO_REL 1
> #define ELF_MACHINE_NO_RELA 0
>
> +#define DL_PLATFORM_INIT dl_platform_init ()
> +
> +static inline void __attribute__ ((unused))
> +dl_platform_init (void)
> +{
> + if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
> + /* Avoid an empty string which would disturb us. */
> + GLRO(dl_platform) = NULL;
> +
> +#ifdef SHARED
> + /* init_cpu_features has been called early from __libc_start_main in
> + static executable. */
> + init_cpu_features (&GLRO(dl_aarch64_cpu_features));
> +#endif
> +}
> +
> +
> static inline ElfW(Addr)
> elf_machine_fixup_plt (struct link_map *map, lookup_t t,
> const ElfW(Rela) *reloc,
> diff --git a/sysdeps/aarch64/ldsodefs.h b/sysdeps/aarch64/ldsodefs.h
> index f277074..ba4ada3 100644
> --- a/sysdeps/aarch64/ldsodefs.h
> +++ b/sysdeps/aarch64/ldsodefs.h
> @@ -20,6 +20,7 @@
> #define _AARCH64_LDSODEFS_H 1
>
> #include <elf.h>
> +#include <cpu-features.h>
>
> struct La_aarch64_regs;
> struct La_aarch64_retval;
> diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S
> index 29af8b1..74444b4 100644
> --- a/sysdeps/aarch64/memcpy.S
> +++ b/sysdeps/aarch64/memcpy.S
> @@ -59,7 +59,14 @@
> Overlapping large forward memmoves use a loop that copies backwards.
> */
>
> -ENTRY_ALIGN (memmove, 6)
> +#ifndef MEMMOVE
> +# define MEMMOVE memmove
> +#endif
> +#ifndef MEMCPY
> +# define MEMCPY memcpy
> +#endif
> +
> +ENTRY_ALIGN (MEMMOVE, 6)
>
> DELOUSE (0)
> DELOUSE (1)
> @@ -71,9 +78,9 @@ ENTRY_ALIGN (memmove, 6)
> b.lo L(move_long)
>
> /* Common case falls through into memcpy. */
> -END (memmove)
> -libc_hidden_builtin_def (memmove)
> -ENTRY (memcpy)
> +END (MEMMOVE)
> +libc_hidden_builtin_def (MEMMOVE)
> +ENTRY (MEMCPY)
>
> DELOUSE (0)
> DELOUSE (1)
> @@ -158,10 +165,22 @@ L(copy96):
>
> .p2align 4
> L(copy_long):
> +
> +#ifdef USE_THUNDERX
> +
> + /* On thunderx, large memcpy's are helped by software prefetching.
> + This loop is identical to the one below it but with prefetching
> + instructions included. For copies smaller than 32768 bytes,
> + the prefetching does not help and slows the code down, so we only
> + use the prefetching loop for the largest memcpys. */
I think it would be cleaner to put the full generic and thunderx
implementations in separate files instead of trying to do this macro
dance because it keeps micro-architecture details separate. Assembly
code is hard to maintain as it is without adding conditional compilation
using macros.
I also second Adhemerval's suggestion to separate the patch to add the
framework from the one to add the thunderx ifunc. It makes for easier
cherry picking and git-blaming.
Siddhesh
> +
> + cmp count, #32768
> + b.lo L(copy_long_without_prefetch)
> and tmp1, dstin, 15
> bic dst, dstin, 15
> ldp D_l, D_h, [src]
> sub src, src, tmp1
> + prfm pldl1strm, [src, 384]
> add count, count, tmp1 /* Count is now 16 too large. */
> ldp A_l, A_h, [src, 16]
> stp D_l, D_h, [dstin]
> @@ -169,7 +188,10 @@ L(copy_long):
> ldp C_l, C_h, [src, 48]
> ldp D_l, D_h, [src, 64]!
> subs count, count, 128 + 16 /* Test and readjust count. */
> - b.ls 2f
> +
> +L(prefetch_loop64):
> + tbz src, #6, 1f
> + prfm pldl1strm, [src, 512]
> 1:
> stp A_l, A_h, [dst, 16]
> ldp A_l, A_h, [src, 16]
> @@ -180,12 +202,40 @@ L(copy_long):
> stp D_l, D_h, [dst, 64]!
> ldp D_l, D_h, [src, 64]!
> subs count, count, 64
> - b.hi 1b
> + b.hi L(prefetch_loop64)
> + b L(last64)
> +
> +L(copy_long_without_prefetch):
> +#endif
> +
> + and tmp1, dstin, 15
> + bic dst, dstin, 15
> + ldp D_l, D_h, [src]
> + sub src, src, tmp1
> + add count, count, tmp1 /* Count is now 16 too large. */
> + ldp A_l, A_h, [src, 16]
> + stp D_l, D_h, [dstin]
> + ldp B_l, B_h, [src, 32]
> + ldp C_l, C_h, [src, 48]
> + ldp D_l, D_h, [src, 64]!
> + subs count, count, 128 + 16 /* Test and readjust count. */
> + b.ls L(last64)
> +L(loop64):
> + stp A_l, A_h, [dst, 16]
> + ldp A_l, A_h, [src, 16]
> + stp B_l, B_h, [dst, 32]
> + ldp B_l, B_h, [src, 32]
> + stp C_l, C_h, [dst, 48]
> + ldp C_l, C_h, [src, 48]
> + stp D_l, D_h, [dst, 64]!
> + ldp D_l, D_h, [src, 64]!
> + subs count, count, 64
> + b.hi L(loop64)
>
> /* Write the last full set of 64 bytes. The remainder is at most 64
> bytes, so it is safe to always copy 64 bytes from the end even if
> there is just 1 byte left. */
> -2:
> +L(last64):
> ldp E_l, E_h, [srcend, -64]
> stp A_l, A_h, [dst, 16]
> ldp A_l, A_h, [srcend, -48]
> @@ -256,5 +306,5 @@ L(move_long):
> stp C_l, C_h, [dstin]
> 3: ret
>
> -END (memcpy)
> -libc_hidden_builtin_def (memcpy)
> +END (MEMCPY)
> +libc_hidden_builtin_def (MEMCPY)
> diff --git a/sysdeps/aarch64/multiarch/Makefile b/sysdeps/aarch64/multiarch/Makefile
> index e69de29..78d52c7 100644
> --- a/sysdeps/aarch64/multiarch/Makefile
> +++ b/sysdeps/aarch64/multiarch/Makefile
> @@ -0,0 +1,3 @@
> +ifeq ($(subdir),string)
> +sysdep_routines += memcpy_generic memcpy_thunderx
> +endif
> diff --git a/sysdeps/aarch64/multiarch/ifunc-impl-list.c b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
> index e69de29..c4f23df 100644
> --- a/sysdeps/aarch64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
> @@ -0,0 +1,51 @@
> +/* Enumerate available IFUNC implementations of a function. AARCH64 version.
> + Copyright (C) 2017 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +#include <assert.h>
> +#include <string.h>
> +#include <wchar.h>
> +#include <ldsodefs.h>
> +#include <ifunc-impl-list.h>
> +#include <init-arch.h>
> +#include <stdio.h>
> +
> +/* Maximum number of IFUNC implementations. */
> +#define MAX_IFUNC 2
> +
> +size_t
> +__libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> + size_t max)
> +{
> + assert (max >= MAX_IFUNC);
> +
> + size_t i = 0;
> +
> + INIT_ARCH ();
> +
> + /* Support sysdeps/aarch64/multiarch/memcpy.c and memmove.c. */
> + IFUNC_IMPL (i, name, memcpy,
> + IFUNC_IMPL_ADD (array, i, memcpy, IS_THUNDERX (midr),
> + __memcpy_thunderx)
> + IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic))
> + IFUNC_IMPL (i, name, memmove,
> + IFUNC_IMPL_ADD (array, i, memmove, IS_THUNDERX (midr),
> + __memmove_thunderx)
> + IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_generic))
> +
> + return i;
> +}
> diff --git a/sysdeps/aarch64/multiarch/init-arch.h b/sysdeps/aarch64/multiarch/init-arch.h
> index e69de29..eafbf77 100644
> --- a/sysdeps/aarch64/multiarch/init-arch.h
> +++ b/sysdeps/aarch64/multiarch/init-arch.h
> @@ -0,0 +1,22 @@
> +/* This file is part of the GNU C Library.
> + Copyright (C) 2017 Free Software Foundation, Inc.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +#include <ldsodefs.h>
> +
> +#define INIT_ARCH() \
> + uint64_t __attribute__((unused)) midr = \
> + GLRO(dl_aarch64_cpu_features).midr_el1;
> diff --git a/sysdeps/aarch64/multiarch/memcpy.c b/sysdeps/aarch64/multiarch/memcpy.c
> index e69de29..4e3f251 100644
> --- a/sysdeps/aarch64/multiarch/memcpy.c
> +++ b/sysdeps/aarch64/multiarch/memcpy.c
> @@ -0,0 +1,39 @@
> +/* Multiple versions of memcpy. AARCH64 version.
> + Copyright (C) 2017 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +/* Define multiple versions only for the definition in libc. */
> +
> +#if IS_IN (libc)
> +/* Redefine memcpy so that the compiler won't complain about the type
> + mismatch with the IFUNC selector in strong_alias, below. */
> +# undef memcpy
> +# define memcpy __redirect_memcpy
> +# include <string.h>
> +# include <init-arch.h>
> +
> +extern __typeof (__redirect_memcpy) __libc_memcpy;
> +
> +extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
> +extern __typeof (__redirect_memcpy) __memcpy_thunderx attribute_hidden;
> +
> +libc_ifunc (__libc_memcpy,
> + IS_THUNDERX (midr) ? __memcpy_thunderx : __memcpy_generic);
> +
> +#undef memcpy
> +strong_alias (__libc_memcpy, memcpy);
> +#endif
> diff --git a/sysdeps/aarch64/multiarch/memcpy_generic.S b/sysdeps/aarch64/multiarch/memcpy_generic.S
> index e69de29..50e1a1c 100644
> --- a/sysdeps/aarch64/multiarch/memcpy_generic.S
> +++ b/sysdeps/aarch64/multiarch/memcpy_generic.S
> @@ -0,0 +1,42 @@
> +/* A Generic Optimized memcpy implementation for AARCH64.
> + Copyright (C) 2017 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +/* The actual memcpy and memmove code is in ../memcpy.S. If we are
> + building libc this file defines __memcpy_generic and __memmove_generic.
> + Otherwise the include of ../memcpy.S will define the normal __memcpy
> + and __memmove entry points. */
> +
> +#include <sysdep.h>
> +
> +#if IS_IN (libc)
> +
> +#define MEMCPY __memcpy_generic
> +#define MEMMOVE __memmove_generic
> +
> +/* Do not hide the generic versions of memcpy and memmove, we use them
> + internally. */
> +#undef libc_hidden_builtin_def
> +#define libc_hidden_builtin_def(name)
> +
> +/* It doesn't make sense to send libc-internal memcpy calls through a PLT. */
> + .globl __GI_memcpy; __GI_memcpy = __memcpy_generic
> + .globl __GI_memmove; __GI_memmove = __memmove_generic
> +
> +#endif
> +
> +#include "../memcpy.S"
> diff --git a/sysdeps/aarch64/multiarch/memcpy_thunderx.S b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
> index e69de29..ee971c8 100644
> --- a/sysdeps/aarch64/multiarch/memcpy_thunderx.S
> +++ b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
> @@ -0,0 +1,32 @@
> +/* A Thunderx Optimized memcpy implementation for AARCH64.
> + Copyright (C) 2017 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +/* The actual thunderx optimized code is in ../memcpy.S under the USE_THUNDERX
> + ifdef. If we are not building libc then we do not build anything when
> + compiling this file and __memcpy is defined by memcpy_generic.S. */
> +
> +#include <sysdep.h>
> +
> +#if IS_IN (libc)
> +
> +#define MEMCPY __memcpy_thunderx
> +#define MEMMOVE __memmove_thunderx
> +#define USE_THUNDERX
> +#include "../memcpy.S"
> +
> +#endif
> diff --git a/sysdeps/aarch64/multiarch/memmove.c b/sysdeps/aarch64/multiarch/memmove.c
> index e69de29..8d7a146 100644
> --- a/sysdeps/aarch64/multiarch/memmove.c
> +++ b/sysdeps/aarch64/multiarch/memmove.c
> @@ -0,0 +1,39 @@
> +/* Multiple versions of memmove. AARCH64 version.
> + Copyright (C) 2017 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +/* Define multiple versions only for the definition in libc. */
> +
> +#if IS_IN (libc)
> +/* Redefine memmove so that the compiler won't complain about the type
> + mismatch with the IFUNC selector in strong_alias, below. */
> +# undef memmove
> +# define memmove __redirect_memmove
> +# include <string.h>
> +# include <init-arch.h>
> +
> +extern __typeof (__redirect_memmove) __libc_memmove;
> +
> +extern __typeof (__redirect_memmove) __memmove_generic attribute_hidden;
> +extern __typeof (__redirect_memmove) __memmove_thunderx attribute_hidden;
> +
> +libc_ifunc (__libc_memmove,
> + IS_THUNDERX (midr) ? __memmove_thunderx : __memmove_generic);
> +
> +#undef memmove
> +strong_alias (__libc_memmove, memmove);
> +#endif
> diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
> index e69de29..8e4b514 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
> +++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
> @@ -0,0 +1,38 @@
> +/* Initialize CPU feature data. AArch64 version.
> + This file is part of the GNU C Library.
> + Copyright (C) 2017 Free Software Foundation, Inc.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +#include <cpu-features.h>
> +
> +#ifndef HWCAP_CPUID
> +# define HWCAP_CPUID (1 << 11)
> +#endif
> +
> +static inline void
> +init_cpu_features (struct cpu_features *cpu_features)
> +{
> + if (GLRO(dl_hwcap) & HWCAP_CPUID)
> + {
> + register uint64_t id = 0;
> + asm volatile ("mrs %0, midr_el1" : "=r"(id));
> + cpu_features->midr_el1 = id;
> + }
> + else
> + {
> + cpu_features->midr_el1 = 0;
> + }
> +}
> diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
> index e69de29..c92b650 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
> +++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
> @@ -0,0 +1,49 @@
> +/* Initialize CPU feature data. AArch64 version.
> + This file is part of the GNU C Library.
> + Copyright (C) 2017 Free Software Foundation, Inc.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +#ifndef _CPU_FEATURES_AARCH64_H
> +#define _CPU_FEATURES_AARCH64_H
> +
> +#include <stdint.h>
> +
> +#define MIDR_PARTNUM_SHIFT 4
> +#define MIDR_PARTNUM_MASK (0xfff << MIDR_PARTNUM_SHIFT)
> +#define MIDR_PARTNUM(midr) \
> + (((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
> +#define MIDR_ARCHITECTURE_SHIFT 16
> +#define MIDR_ARCHITECTURE_MASK (0xf << MIDR_ARCHITECTURE_SHIFT)
> +#define MIDR_ARCHITECTURE(midr) \
> + (((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
> +#define MIDR_VARIANT_SHIFT 20
> +#define MIDR_VARIANT_MASK (0xf << MIDR_VARIANT_SHIFT)
> +#define MIDR_VARIANT(midr) \
> + (((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
> +#define MIDR_IMPLEMENTOR_SHIFT 24
> +#define MIDR_IMPLEMENTOR_MASK (0xff << MIDR_IMPLEMENTOR_SHIFT)
> +#define MIDR_IMPLEMENTOR(midr) \
> + (((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
> +
> +#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR(midr) == 'C' \
> + && MIDR_PARTNUM(midr) == 0x0a1)
> +
> +struct cpu_features
> +{
> + uint64_t midr_el1;
> +};
> +
> +#endif /* _CPU_FEATURES_AARCH64_H */
> diff --git a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
> index e69de29..438046a 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
> +++ b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
> @@ -0,0 +1,60 @@
> +/* Data for AArch64 version of processor capability information.
> + Linux version.
> + Copyright (C) 2017 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +/* If anything should be added here check whether the size of each string
> + is still ok with the given array size.
> +
> + All the #ifdefs in the definitions are quite irritating but
> + necessary if we want to avoid duplicating the information. There
> + are three different modes:
> +
> + - PROCINFO_DECL is defined. This means we are only interested in
> + declarations.
> +
> + - PROCINFO_DECL is not defined:
> +
> + + if SHARED is defined the file is included in an array
> + initializer. The .element = { ... } syntax is needed.
> +
> + + if SHARED is not defined a normal array initialization is
> + needed.
> + */
> +
> +#ifndef PROCINFO_CLASS
> +# define PROCINFO_CLASS
> +#endif
> +
> +#if !IS_IN (ldconfig)
> +# if !defined PROCINFO_DECL && defined SHARED
> + ._dl_aarch64_cpu_features
> +# else
> +PROCINFO_CLASS struct cpu_features _dl_aarch64_cpu_features
> +# endif
> +# ifndef PROCINFO_DECL
> += { }
> +# endif
> +# if !defined SHARED || defined PROCINFO_DECL
> +;
> +# else
> +,
> +# endif
> +#endif
> +
> +#undef PROCINFO_DECL
> +#undef PROCINFO_CLASS
> diff --git a/sysdeps/unix/sysv/linux/aarch64/libc-start.c b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
> index e69de29..c98aff1 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/libc-start.c
> +++ b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
> @@ -0,0 +1,40 @@
> +/* Copyright (C) 2017 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +#ifdef SHARED
> +# include <csu/libc-start.c>
#else
> +/* The main work is done in the generic function. */
> +# define LIBC_START_DISABLE_INLINE
> +# define LIBC_START_MAIN generic_start_main
> +# include <csu/libc-start.c>
> +# include <cpu-features.c>
> +
> +extern struct cpu_features _dl_aarch64_cpu_features;
> +
> +int
> +__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
> + int argc, char **argv,
> + __typeof (main) init,
> + void (*fini) (void),
> + void (*rtld_fini) (void), void *stack_end)
> +{
> + init_cpu_features (&_dl_aarch64_cpu_features);
> + return generic_start_main (main, argc, argv, init, fini, rtld_fini,
> + stack_end);
> +}
> +#endif
>
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-01-25 17:34 ` Steve Ellcey
@ 2017-02-06 20:54 ` Adhemerval Zanella
2017-02-07 6:47 ` Siddhesh Poyarekar
1 sibling, 0 replies; 38+ messages in thread
From: Adhemerval Zanella @ 2017-02-06 20:54 UTC
To: Steve Ellcey, libc-alpha
On 25/01/2017 15:34, Steve Ellcey wrote:
> Here is a new version of the aarch64 ifunc patch with the cpu-features
> style of initialization on startup. Adhemerval, since I took some code
> from your branch I added your name to the ChangeLog. In addition to
> doing the mrs instruction on startup the main difference in this patch
> from the last one is that it uses ifuncs in both the shared and archive
> libc libraries.
>
> Steve Ellcey
> sellcey@cavium.com
Hi Steve,
I think it is better to split this patchset in two: one for the multiarch
foundation for aarch64 and another for the thunderx memcpy implementation itself.
Besides that, I think the patch should be ok.
> diff --git a/sysdeps/aarch64/multiarch/init-arch.h b/sysdeps/aarch64/multiarch/init-arch.h
> index e69de29..eafbf77 100644
> --- a/sysdeps/aarch64/multiarch/init-arch.h
> +++ b/sysdeps/aarch64/multiarch/init-arch.h
> @@ -0,0 +1,22 @@
> +/* This file is part of the GNU C Library.
Missing a one-line description for this file.
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-01-24 14:09 ` Adhemerval Zanella
2017-01-24 19:34 ` Steve Ellcey
@ 2017-01-25 17:34 ` Steve Ellcey
2017-02-06 20:54 ` Adhemerval Zanella
2017-02-07 6:47 ` Siddhesh Poyarekar
1 sibling, 2 replies; 38+ messages in thread
From: Steve Ellcey @ 2017-01-25 17:34 UTC
To: Adhemerval Zanella, libc-alpha
[-- Attachment #1: Type: text/plain, Size: 1503 bytes --]
Here is a new version of the aarch64 ifunc patch with the cpu-features
style of initialization on startup. Adhemerval, since I took some code
from your branch I added your name to the ChangeLog. In addition to
doing the mrs instruction on startup the main difference in this patch
from the last one is that it uses ifuncs in both the shared and archive
libc libraries.
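For readers following the startup-time `mrs` read: the patch records MIDR_EL1 only when the kernel sets HWCAP_CPUID, and IS_THUNDERX then matches implementer 'C' (0x43) with part number 0x0a1. Below is a minimal, portable sketch of that decode logic; it is not glibc code. Several details are assumptions made for the sketch. The raw register value is passed in as a parameter, because `mrs` only exists on AArch64. The helper names are hypothetical. The masks use unsigned 64-bit constants so that shifting 0xff left by 24 cannot overflow a signed int. The sample values 0x430F0A10 (ThunderX) and 0x410FD070 (Cortex-A57) are assembled from the field layout rather than read from hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MIDR_EL1 field layout, following the patch's cpu-features.h
   (unsigned constants here, so the shifts cannot overflow int).  */
#define MIDR_PARTNUM_SHIFT 4
#define MIDR_PARTNUM_MASK (0xfffULL << MIDR_PARTNUM_SHIFT)
#define MIDR_IMPLEMENTOR_SHIFT 24
#define MIDR_IMPLEMENTOR_MASK (0xffULL << MIDR_IMPLEMENTOR_SHIFT)

/* Hypothetical name for the HWCAP_CPUID bit; the patch falls back to
   1 << 11 when the kernel headers do not define HWCAP_CPUID.  */
#define SKETCH_HWCAP_CPUID (1ULL << 11)

static unsigned
midr_partnum (uint64_t midr)
{
  return (unsigned) ((midr & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT);
}

static unsigned
midr_implementor (uint64_t midr)
{
  return (unsigned) ((midr & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT);
}

/* Same test as IS_THUNDERX: implementer 'C' (0x43), part number 0x0a1.  */
static bool
is_thunderx (uint64_t midr)
{
  return midr_implementor (midr) == 'C' && midr_partnum (midr) == 0x0a1;
}

/* Mirrors init_cpu_features: keep the MIDR value only if the kernel
   advertises CPUID emulation, otherwise record 0.  */
static uint64_t
sketch_init_midr (uint64_t hwcap, uint64_t raw_midr)
{
  return (hwcap & SKETCH_HWCAP_CPUID) ? raw_midr : 0;
}
```

If `hwcap` lacks bit 11, `sketch_init_midr` yields 0, `is_thunderx (0)` is false, and a selector built on it falls back to the generic routine, which matches what the patch does on kernels without CPUID emulation.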
Steve Ellcey
sellcey@cavium.com
2017-01-25  Steve Ellcey  <sellcey@caviumnetworks.com>
    Adhemerval Zanella  <adhemerval.zanella@linaro.org>
* sysdeps/aarch64/dl-machine.h: Include cpu-features.c.
(DL_PLATFORM_INIT): New define.
(dl_platform_init): New function.
* sysdeps/aarch64/ldsodefs.h: Include cpu-features.h.
* sysdeps/aarch64/memcpy.S (MEMMOVE, MEMCPY): New macros.
(memmove): Use MEMMOVE for name.
(memcpy): Use MEMCPY for name.  Add loop with prefetching
under USE_THUNDERX macro.
* sysdeps/aarch64/multiarch/Makefile: New file.
* sysdeps/aarch64/multiarch/ifunc-impl-list.c: Ditto.
* sysdeps/aarch64/multiarch/init-arch.h: Ditto.
* sysdeps/aarch64/multiarch/memcpy.c: Ditto.
* sysdeps/aarch64/multiarch/memcpy_generic.S: Ditto.
* sysdeps/aarch64/multiarch/memcpy_thunderx.S: Ditto.
* sysdeps/aarch64/multiarch/memmove.c: Ditto.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.c: Ditto.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Ditto.
* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Ditto.
* sysdeps/unix/sysv/linux/aarch64/libc-start.c: Ditto.
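The ChangeLog above describes two pieces. The first is a selector that picks __memcpy_thunderx or __memcpy_generic. The second is a copy loop in which only the ThunderX variant prefetches, and only for copies of 32768 bytes or more. The sketch below models that dispatch in portable C under several assumptions. The function names are hypothetical. A plain function pointer stands in for the libc_ifunc resolver. GCC/Clang's `__builtin_prefetch` with locality 0 stands in for `prfm pldl1strm`. Each 64-byte block is copied with memcpy instead of the ldp/stp sequence. The prefetch cadence keys on the offset rather than on bit 6 of the source address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void *(*memcpy_fn) (void *, const void *, size_t);

/* Threshold from the patch: below this size prefetching did not help.  */
#define PREFETCH_THRESHOLD 32768
/* Prefetch distance in bytes (the patch's loop prefetches at src + 512).  */
#define PREFETCH_DISTANCE 512

static void *
sketch_memcpy_generic (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

static void *
sketch_memcpy_thunderx (void *dst, const void *src, size_t n)
{
  /* Small and medium copies take the same path as the generic variant.  */
  if (n < PREFETCH_THRESHOLD)
    return sketch_memcpy_generic (dst, src, n);

  unsigned char *d = dst;
  const unsigned char *s = src;
  size_t i = 0;
  /* Copy 64-byte blocks, issuing a read prefetch on every other block
     (every 128 bytes) while the target stays inside the source buffer.  */
  for (; i + 64 <= n; i += 64)
    {
      if ((i & 64) == 0 && i + PREFETCH_DISTANCE < n)
        __builtin_prefetch (s + i + PREFETCH_DISTANCE, 0, 0);
      memcpy (d + i, s + i, 64);
    }
  /* Copy the tail of fewer than 64 bytes, if any.  */
  if (i < n)
    memcpy (d + i, s + i, n - i);
  return dst;
}

/* Stand-in for the libc_ifunc selector in multiarch/memcpy.c.  */
static memcpy_fn
select_memcpy (int cpu_is_thunderx)
{
  return cpu_is_thunderx ? sketch_memcpy_thunderx : sketch_memcpy_generic;
}
```

Because the ThunderX variant reuses the generic path below the threshold, the two variants differ only for large copies, which is where any benchmark comparison between them should focus.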
[-- Attachment #2: ifunc.patch --]
[-- Type: text/x-patch, Size: 22322 bytes --]
diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
index 84b8aec..15d79a6 100644
--- a/sysdeps/aarch64/dl-machine.h
+++ b/sysdeps/aarch64/dl-machine.h
@@ -25,6 +25,7 @@
#include <tls.h>
#include <dl-tlsdesc.h>
#include <dl-irel.h>
+#include <cpu-features.c>
/* Return nonzero iff ELF header is compatible with the running host. */
static inline int __attribute__ ((unused))
@@ -225,6 +226,23 @@ _dl_start_user: \n\
#define ELF_MACHINE_NO_REL 1
#define ELF_MACHINE_NO_RELA 0
+#define DL_PLATFORM_INIT dl_platform_init ()
+
+static inline void __attribute__ ((unused))
+dl_platform_init (void)
+{
+ if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
+ /* Avoid an empty string which would disturb us. */
+ GLRO(dl_platform) = NULL;
+
+#ifdef SHARED
+ /* In static executables, init_cpu_features has already been called
+ early from __libc_start_main. */
+ init_cpu_features (&GLRO(dl_aarch64_cpu_features));
+#endif
+}
+
+
static inline ElfW(Addr)
elf_machine_fixup_plt (struct link_map *map, lookup_t t,
const ElfW(Rela) *reloc,
diff --git a/sysdeps/aarch64/ldsodefs.h b/sysdeps/aarch64/ldsodefs.h
index f277074..ba4ada3 100644
--- a/sysdeps/aarch64/ldsodefs.h
+++ b/sysdeps/aarch64/ldsodefs.h
@@ -20,6 +20,7 @@
#define _AARCH64_LDSODEFS_H 1
#include <elf.h>
+#include <cpu-features.h>
struct La_aarch64_regs;
struct La_aarch64_retval;
diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S
index 29af8b1..74444b4 100644
--- a/sysdeps/aarch64/memcpy.S
+++ b/sysdeps/aarch64/memcpy.S
@@ -59,7 +59,14 @@
Overlapping large forward memmoves use a loop that copies backwards.
*/
-ENTRY_ALIGN (memmove, 6)
+#ifndef MEMMOVE
+# define MEMMOVE memmove
+#endif
+#ifndef MEMCPY
+# define MEMCPY memcpy
+#endif
+
+ENTRY_ALIGN (MEMMOVE, 6)
DELOUSE (0)
DELOUSE (1)
@@ -71,9 +78,9 @@ ENTRY_ALIGN (memmove, 6)
b.lo L(move_long)
/* Common case falls through into memcpy. */
-END (memmove)
-libc_hidden_builtin_def (memmove)
-ENTRY (memcpy)
+END (MEMMOVE)
+libc_hidden_builtin_def (MEMMOVE)
+ENTRY (MEMCPY)
DELOUSE (0)
DELOUSE (1)
@@ -158,10 +165,22 @@ L(copy96):
.p2align 4
L(copy_long):
+
+#ifdef USE_THUNDERX
+
+ /* On thunderx, large memcpys benefit from software prefetching.
+ This loop is identical to the one below it but with prefetching
+ instructions included. For copies of less than 32768 bytes,
+ prefetching does not help and slows the code down, so we only
+ use the prefetching loop for the largest memcpys. */
+
+ cmp count, #32768
+ b.lo L(copy_long_without_prefetch)
and tmp1, dstin, 15
bic dst, dstin, 15
ldp D_l, D_h, [src]
sub src, src, tmp1
+ prfm pldl1strm, [src, 384]
add count, count, tmp1 /* Count is now 16 too large. */
ldp A_l, A_h, [src, 16]
stp D_l, D_h, [dstin]
@@ -169,7 +188,10 @@ L(copy_long):
ldp C_l, C_h, [src, 48]
ldp D_l, D_h, [src, 64]!
subs count, count, 128 + 16 /* Test and readjust count. */
- b.ls 2f
+
+L(prefetch_loop64):
+ tbz src, #6, 1f
+ prfm pldl1strm, [src, 512]
1:
stp A_l, A_h, [dst, 16]
ldp A_l, A_h, [src, 16]
@@ -180,12 +202,40 @@ L(copy_long):
stp D_l, D_h, [dst, 64]!
ldp D_l, D_h, [src, 64]!
subs count, count, 64
- b.hi 1b
+ b.hi L(prefetch_loop64)
+ b L(last64)
+
+L(copy_long_without_prefetch):
+#endif
+
+ and tmp1, dstin, 15
+ bic dst, dstin, 15
+ ldp D_l, D_h, [src]
+ sub src, src, tmp1
+ add count, count, tmp1 /* Count is now 16 too large. */
+ ldp A_l, A_h, [src, 16]
+ stp D_l, D_h, [dstin]
+ ldp B_l, B_h, [src, 32]
+ ldp C_l, C_h, [src, 48]
+ ldp D_l, D_h, [src, 64]!
+ subs count, count, 128 + 16 /* Test and readjust count. */
+ b.ls L(last64)
+L(loop64):
+ stp A_l, A_h, [dst, 16]
+ ldp A_l, A_h, [src, 16]
+ stp B_l, B_h, [dst, 32]
+ ldp B_l, B_h, [src, 32]
+ stp C_l, C_h, [dst, 48]
+ ldp C_l, C_h, [src, 48]
+ stp D_l, D_h, [dst, 64]!
+ ldp D_l, D_h, [src, 64]!
+ subs count, count, 64
+ b.hi L(loop64)
/* Write the last full set of 64 bytes. The remainder is at most 64
bytes, so it is safe to always copy 64 bytes from the end even if
there is just 1 byte left. */
-2:
+L(last64):
ldp E_l, E_h, [srcend, -64]
stp A_l, A_h, [dst, 16]
ldp A_l, A_h, [srcend, -48]
@@ -256,5 +306,5 @@ L(move_long):
stp C_l, C_h, [dstin]
3: ret
-END (memcpy)
-libc_hidden_builtin_def (memcpy)
+END (MEMCPY)
+libc_hidden_builtin_def (MEMCPY)
diff --git a/sysdeps/aarch64/multiarch/Makefile b/sysdeps/aarch64/multiarch/Makefile
index e69de29..78d52c7 100644
--- a/sysdeps/aarch64/multiarch/Makefile
+++ b/sysdeps/aarch64/multiarch/Makefile
@@ -0,0 +1,3 @@
+ifeq ($(subdir),string)
+sysdep_routines += memcpy_generic memcpy_thunderx
+endif
diff --git a/sysdeps/aarch64/multiarch/ifunc-impl-list.c b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
index e69de29..c4f23df 100644
--- a/sysdeps/aarch64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
@@ -0,0 +1,51 @@
+/* Enumerate available IFUNC implementations of a function. AARCH64 version.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <assert.h>
+#include <string.h>
+#include <wchar.h>
+#include <ldsodefs.h>
+#include <ifunc-impl-list.h>
+#include <init-arch.h>
+#include <stdio.h>
+
+/* Maximum number of IFUNC implementations. */
+#define MAX_IFUNC 2
+
+size_t
+__libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
+ size_t max)
+{
+ assert (max >= MAX_IFUNC);
+
+ size_t i = 0;
+
+ INIT_ARCH ();
+
+ /* Support sysdeps/aarch64/multiarch/memcpy.c and memmove.c. */
+ IFUNC_IMPL (i, name, memcpy,
+ IFUNC_IMPL_ADD (array, i, memcpy, IS_THUNDERX (midr),
+ __memcpy_thunderx)
+ IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic))
+ IFUNC_IMPL (i, name, memmove,
+ IFUNC_IMPL_ADD (array, i, memmove, IS_THUNDERX (midr),
+ __memmove_thunderx)
+ IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_generic))
+
+ return i;
+}
diff --git a/sysdeps/aarch64/multiarch/init-arch.h b/sysdeps/aarch64/multiarch/init-arch.h
index e69de29..eafbf77 100644
--- a/sysdeps/aarch64/multiarch/init-arch.h
+++ b/sysdeps/aarch64/multiarch/init-arch.h
@@ -0,0 +1,22 @@
+/* This file is part of the GNU C Library.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <ldsodefs.h>
+
+#define INIT_ARCH() \
+ uint64_t __attribute__((unused)) midr = \
+ GLRO(dl_aarch64_cpu_features).midr_el1;
diff --git a/sysdeps/aarch64/multiarch/memcpy.c b/sysdeps/aarch64/multiarch/memcpy.c
index e69de29..4e3f251 100644
--- a/sysdeps/aarch64/multiarch/memcpy.c
+++ b/sysdeps/aarch64/multiarch/memcpy.c
@@ -0,0 +1,39 @@
+/* Multiple versions of memcpy. AARCH64 version.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* Define multiple versions only for the definition in libc. */
+
+#if IS_IN (libc)
+/* Redefine memcpy so that the compiler won't complain about the type
+ mismatch with the IFUNC selector in strong_alias, below. */
+# undef memcpy
+# define memcpy __redirect_memcpy
+# include <string.h>
+# include <init-arch.h>
+
+extern __typeof (__redirect_memcpy) __libc_memcpy;
+
+extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
+extern __typeof (__redirect_memcpy) __memcpy_thunderx attribute_hidden;
+
+libc_ifunc (__libc_memcpy,
+ IS_THUNDERX (midr) ? __memcpy_thunderx : __memcpy_generic);
+
+#undef memcpy
+strong_alias (__libc_memcpy, memcpy);
+#endif
diff --git a/sysdeps/aarch64/multiarch/memcpy_generic.S b/sysdeps/aarch64/multiarch/memcpy_generic.S
index e69de29..50e1a1c 100644
--- a/sysdeps/aarch64/multiarch/memcpy_generic.S
+++ b/sysdeps/aarch64/multiarch/memcpy_generic.S
@@ -0,0 +1,42 @@
+/* A Generic Optimized memcpy implementation for AARCH64.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* The actual memcpy and memmove code is in ../memcpy.S. If we are
+ building libc this file defines __memcpy_generic and __memmove_generic.
+ Otherwise the include of ../memcpy.S will define the normal __memcpy
+ and __memmove entry points. */
+
+#include <sysdep.h>
+
+#if IS_IN (libc)
+
+#define MEMCPY __memcpy_generic
+#define MEMMOVE __memmove_generic
+
+/* Do not hide the generic versions of memcpy and memmove, we use them
+ internally. */
+#undef libc_hidden_builtin_def
+#define libc_hidden_builtin_def(name)
+
+/* It doesn't make sense to send libc-internal memcpy calls through a PLT. */
+ .globl __GI_memcpy; __GI_memcpy = __memcpy_generic
+ .globl __GI_memmove; __GI_memmove = __memmove_generic
+
+#endif
+
+#include "../memcpy.S"
diff --git a/sysdeps/aarch64/multiarch/memcpy_thunderx.S b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
index e69de29..ee971c8 100644
--- a/sysdeps/aarch64/multiarch/memcpy_thunderx.S
+++ b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
@@ -0,0 +1,32 @@
+/* A Thunderx Optimized memcpy implementation for AARCH64.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* The actual thunderx optimized code is in ../memcpy.S under the USE_THUNDERX
+ ifdef. If we are not building libc then we do not build anything when
+ compiling this file and __memcpy is defined by memcpy_generic.S. */
+
+#include <sysdep.h>
+
+#if IS_IN (libc)
+
+#define MEMCPY __memcpy_thunderx
+#define MEMMOVE __memmove_thunderx
+#define USE_THUNDERX
+#include "../memcpy.S"
+
+#endif
diff --git a/sysdeps/aarch64/multiarch/memmove.c b/sysdeps/aarch64/multiarch/memmove.c
index e69de29..8d7a146 100644
--- a/sysdeps/aarch64/multiarch/memmove.c
+++ b/sysdeps/aarch64/multiarch/memmove.c
@@ -0,0 +1,39 @@
+/* Multiple versions of memmove. AARCH64 version.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* Define multiple versions only for the definition in libc. */
+
+#if IS_IN (libc)
+/* Redefine memmove so that the compiler won't complain about the type
+ mismatch with the IFUNC selector in strong_alias, below. */
+# undef memmove
+# define memmove __redirect_memmove
+# include <string.h>
+# include <init-arch.h>
+
+extern __typeof (__redirect_memmove) __libc_memmove;
+
+extern __typeof (__redirect_memmove) __memmove_generic attribute_hidden;
+extern __typeof (__redirect_memmove) __memmove_thunderx attribute_hidden;
+
+libc_ifunc (__libc_memmove,
+ IS_THUNDERX (midr) ? __memmove_thunderx : __memmove_generic);
+
+#undef memmove
+strong_alias (__libc_memmove, memmove);
+#endif
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
index e69de29..8e4b514 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
@@ -0,0 +1,38 @@
+/* Initialize CPU feature data. AArch64 version.
+ This file is part of the GNU C Library.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <cpu-features.h>
+
+#ifndef HWCAP_CPUID
+# define HWCAP_CPUID (1 << 11)
+#endif
+
+static inline void
+init_cpu_features (struct cpu_features *cpu_features)
+{
+ if (GLRO(dl_hwcap) & HWCAP_CPUID)
+ {
+ register uint64_t id = 0;
+ asm volatile ("mrs %0, midr_el1" : "=r"(id));
+ cpu_features->midr_el1 = id;
+ }
+ else
+ {
+ cpu_features->midr_el1 = 0;
+ }
+}
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
index e69de29..c92b650 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
@@ -0,0 +1,49 @@
+/* Initialize CPU feature data. AArch64 version.
+ This file is part of the GNU C Library.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifndef _CPU_FEATURES_AARCH64_H
+#define _CPU_FEATURES_AARCH64_H
+
+#include <stdint.h>
+
+#define MIDR_PARTNUM_SHIFT 4
+#define MIDR_PARTNUM_MASK (0xfff << MIDR_PARTNUM_SHIFT)
+#define MIDR_PARTNUM(midr) \
+ (((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
+#define MIDR_ARCHITECTURE_SHIFT 16
+#define MIDR_ARCHITECTURE_MASK (0xf << MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_ARCHITECTURE(midr) \
+ (((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_VARIANT_SHIFT 20
+#define MIDR_VARIANT_MASK (0xf << MIDR_VARIANT_SHIFT)
+#define MIDR_VARIANT(midr) \
+ (((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
+#define MIDR_IMPLEMENTOR_SHIFT 24
+#define MIDR_IMPLEMENTOR_MASK (0xff << MIDR_IMPLEMENTOR_SHIFT)
+#define MIDR_IMPLEMENTOR(midr) \
+ (((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
+
+#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR(midr) == 'C' \
+ && MIDR_PARTNUM(midr) == 0x0a1)
+
+struct cpu_features
+{
+ uint64_t midr_el1;
+};
+
+#endif /* _CPU_FEATURES_AARCH64_H */
diff --git a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
index e69de29..438046a 100644
--- a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
+++ b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
@@ -0,0 +1,60 @@
+/* Data for AArch64 version of processor capability information.
+ Linux version.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* If anything should be added here check whether the size of each string
+ is still ok with the given array size.
+
+ All the #ifdefs in the definitions are quite irritating but
+ necessary if we want to avoid duplicating the information. There
+ are three different modes:
+
+ - PROCINFO_DECL is defined. This means we are only interested in
+ declarations.
+
+ - PROCINFO_DECL is not defined:
+
+ + if SHARED is defined the file is included in an array
+ initializer. The .element = { ... } syntax is needed.
+
+ + if SHARED is not defined a normal array initialization is
+ needed.
+ */
+
+#ifndef PROCINFO_CLASS
+# define PROCINFO_CLASS
+#endif
+
+#if !IS_IN (ldconfig)
+# if !defined PROCINFO_DECL && defined SHARED
+ ._dl_aarch64_cpu_features
+# else
+PROCINFO_CLASS struct cpu_features _dl_aarch64_cpu_features
+# endif
+# ifndef PROCINFO_DECL
+= { }
+# endif
+# if !defined SHARED || defined PROCINFO_DECL
+;
+# else
+,
+# endif
+#endif
+
+#undef PROCINFO_DECL
+#undef PROCINFO_CLASS
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc-start.c b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
index e69de29..c98aff1 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc-start.c
+++ b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
@@ -0,0 +1,40 @@
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifdef SHARED
+# include <csu/libc-start.c>
+# else
+/* The main work is done in the generic function. */
+# define LIBC_START_DISABLE_INLINE
+# define LIBC_START_MAIN generic_start_main
+# include <csu/libc-start.c>
+# include <cpu-features.c>
+
+extern struct cpu_features _dl_aarch64_cpu_features;
+
+int
+__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
+ int argc, char **argv,
+ __typeof (main) init,
+ void (*fini) (void),
+ void (*rtld_fini) (void), void *stack_end)
+{
+ init_cpu_features (&_dl_aarch64_cpu_features);
+ return generic_start_main (main, argc, argv, init, fini, rtld_fini,
+ stack_end);
+}
+#endif
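The MIDR_* macros above extract the implementer, variant, architecture, part-number and revision fields from MIDR_EL1. IS_THUNDERX then matches implementer 'C' (Cavium) with part number 0x0a1. Below is a minimal standalone sketch of the same decoding, assuming the bit layout those macros encode. The helper names are hypothetical. The masks are built from unsigned 64-bit constants, so no shift lands in the sign bit of a plain int (as `0xff << 24` does).

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract WIDTH bits starting at SHIFT; same fields as the MIDR_* macros,
   but computed on an unsigned 64-bit value (illustrative helper).  */
#define SK_MIDR_FIELD(midr, shift, width) \
  (((uint64_t) (midr) >> (shift)) & ((UINT64_C (1) << (width)) - 1))

struct midr_fields
{
  unsigned implementer;   /* bits 31:24, e.g. 'C' (0x43) for Cavium */
  unsigned variant;       /* bits 23:20 */
  unsigned architecture;  /* bits 19:16 */
  unsigned partnum;       /* bits 15:4 */
  unsigned revision;      /* bits 3:0 */
};

static struct midr_fields
decode_midr (uint64_t midr)
{
  struct midr_fields f;
  f.implementer = SK_MIDR_FIELD (midr, 24, 8);
  f.variant = SK_MIDR_FIELD (midr, 20, 4);
  f.architecture = SK_MIDR_FIELD (midr, 16, 4);
  f.partnum = SK_MIDR_FIELD (midr, 4, 12);
  f.revision = SK_MIDR_FIELD (midr, 0, 4);
  return f;
}

/* Same test as IS_THUNDERX: implementer 'C' and part number 0x0a1.  */
static bool
is_thunderx (uint64_t midr)
{
  struct midr_fields f = decode_midr (midr);
  return f.implementer == 'C' && f.partnum == 0x0a1;
}
```

For example, the hand-built value 0x431F0A10 decodes to implementer 0x43, variant 1, architecture 0xF, part number 0x0a1 and revision 0, so is_thunderx returns true. A value carrying Arm's implementer code 0x41 ('A') returns false.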
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-01-24 19:34 ` Steve Ellcey
@ 2017-01-24 20:49 ` Steve Ellcey
0 siblings, 0 replies; 38+ messages in thread
From: Steve Ellcey @ 2017-01-24 20:49 UTC (permalink / raw)
To: Adhemerval Zanella, libc-alpha
Never mind.  I fixed this by moving the definition of dl_platform_init
up earlier in the file.  That, plus including cpu-features.c, fixed the
build.
Steve
On Tue, 2017-01-24 at 11:34 -0800, Steve Ellcey wrote:
>
> I added this code to sysdeps/aarch64/dl-machine.h but when I added it I
> got a build error.  I am using the same prototype for dl_platform_init
> that x86 has so I am not sure why I get this error.
>
> Steve Ellcey
> sellcey@caviumnetworks.com
>
>
> diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
> index 84b8aec..7f38a68 100644
> --- a/sysdeps/aarch64/dl-machine.h
> +++ b/sysdeps/aarch64/dl-machine.h
> @@ -426,4 +426,20 @@ elf_machine_lazy_rel (struct link_map *map,
>       _dl_reloc_bad_type (map, r_type, 1);
>  }
>  
> +#define DL_PLATFORM_INIT dl_platform_init ()
> +
> +static inline void __attribute__ ((unused))
> +dl_platform_init (void)
> +{
> +  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
> +    /* Avoid an empty string which would disturb us.  */
> +    GLRO(dl_platform) = NULL;
> +
> +#ifdef SHARED
> +  /* init_cpu_features has been called early from __libc_start_main in
> +     static executable.  */
> +  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
> +#endif
> +}
> +
>  #endif
>
>
>
> The error I get is:
>
> In file included from dynamic-link.h:92:0,
>                  from dl-conflict.c:59:
> ../sysdeps/aarch64/dl-machine.h: In function ‘_dl_resolve_conflicts’:
> ../sysdeps/aarch64/dl-machine.h:432:1: error: invalid storage class for function ‘dl_platform_init’
>  dl_platform_init (void)
>  ^~~~~~~~~~~~~~~~
> ../o-iterator.mk:9: recipe for target '/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/elf/dl-conflict.o' failed
> make[2]: *** [/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/elf/dl-conflict.o] Error 1
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-01-24 14:09 ` Adhemerval Zanella
@ 2017-01-24 19:34 ` Steve Ellcey
2017-01-24 20:49 ` Steve Ellcey
2017-01-25 17:34 ` Steve Ellcey
1 sibling, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-01-24 19:34 UTC (permalink / raw)
To: Adhemerval Zanella, libc-alpha
On Tue, 2017-01-24 at 12:09 -0200, Adhemerval Zanella wrote:
> This branch in my personal repo [1] has a workable draft version for
> aarch64.  It contains 2 patches: one that implements cpu-features.c
> for aarch64 and another that actually uses it to implement the
> thunderx ifunc.
>
> In the first patch I would like to remove sysdeps/aarch64/ldsodefs.h and
> make it Linux-specific only, because of hwcap.  I will try to clean this up
> later.
>
> [1] https://github.com/zatrazz/glibc/tree/master-aarch64-ifunc
Thanks Adhemerval,
That clears a lot of things up.  One thing I noticed in your tree is
that you only call init_cpu_features from __libc_start_main for the
static glibc.  On x86 they also defined DL_PLATFORM_INIT to be a
routine that calls init_cpu_features for the dynamically loaded glibc.
I added this code to sysdeps/aarch64/dl-machine.h but when I added it I
got a build error.  I am using the same prototype for dl_platform_init
that x86 has so I am not sure why I get this error.
Steve Ellcey
sellcey@caviumnetworks.com
diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
index 84b8aec..7f38a68 100644
--- a/sysdeps/aarch64/dl-machine.h
+++ b/sysdeps/aarch64/dl-machine.h
@@ -426,4 +426,20 @@ elf_machine_lazy_rel (struct link_map *map,
     _dl_reloc_bad_type (map, r_type, 1);
 }
 
+#define DL_PLATFORM_INIT dl_platform_init ()
+
+static inline void __attribute__ ((unused))
+dl_platform_init (void)
+{
+  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
+    /* Avoid an empty string which would disturb us.  */
+    GLRO(dl_platform) = NULL;
+
+#ifdef SHARED
+  /* init_cpu_features has been called early from __libc_start_main in
+     static executable.  */
+  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
+#endif
+}
+
 #endif
The error I get is:
In file included from dynamic-link.h:92:0,
                 from dl-conflict.c:59:
../sysdeps/aarch64/dl-machine.h: In function ‘_dl_resolve_conflicts’:
../sysdeps/aarch64/dl-machine.h:432:1: error: invalid storage class for function ‘dl_platform_init’
 dl_platform_init (void)
 ^~~~~~~~~~~~~~~~
../o-iterator.mk:9: recipe for target '/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/elf/dl-conflict.o' failed
make[2]: *** [/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/elf/dl-conflict.o] Error 1
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-01-23 23:33 ` Steve Ellcey
2017-01-24 9:37 ` Florian Weimer
@ 2017-01-24 14:09 ` Adhemerval Zanella
2017-01-24 19:34 ` Steve Ellcey
2017-01-25 17:34 ` Steve Ellcey
1 sibling, 2 replies; 38+ messages in thread
From: Adhemerval Zanella @ 2017-01-24 14:09 UTC (permalink / raw)
To: Steve Ellcey, libc-alpha
On 23/01/2017 21:33, Steve Ellcey wrote:
> On Thu, 2017-01-19 at 17:41 -0200, Adhemerval Zanella wrote:
>>
>> I think that, to avoid potentially multiple kernel traps at load or PLT resolve time,
>> a better solution would be to issue the mrs instruction once at loader/program startup,
>> fill in an internal structure with the required information, and use it later on
>> during ifunc resolution. This is similar to the cpu-features/cacheinfo strategy for x86.
>>
>> From the documentation in the last patch iteration [1], the kernel provides the HWCAP_CPUID
>> bit in hwcap to indicate that it supports the mrs emulation. So, building on my previous
>> suggestion, I would recommend:
>>
>> 1. Remove any configure check or restriction.
>> 2. Add a cpu_features module similar to x86 that sets a global state with
>> the cpu information obtained from the kernel. It will first check the HWCAP_CPUID
>> bit in hwcap and, if it is set, issue the mrs instruction. It will
>> then populate the global state with the required cpu information.
>> 3. Use the cpu information to select the correct ifunc.
>>
>> This has the further advantage of avoiding the extra complexity of different glibc
>> builds with different minimum required kernels.
>
>
> Adhemerval,
>
> I am looking at the cpu-features setup from x86 and trying to implement
> that for aarch64 but there are some things I don't understand about the
> code and I was hoping you (or someone else on the list) could help me.
> I have attached the patch I have so far; this code doesn't contain any
> use of the cpu features code but is just the code that tries to initialize
> it on startup. Right now it doesn't build and I am not sure what I am
> missing.
>
> Specifically I have these questions.
>
> How is cpu-features-offsets.sym used and what do I need in this file?
> I think this may be how _dl_aarch64_cpu_features is supposed to be
> defined but I am not sure.
>
The .sym files are a trick glibc uses to define struct or TLS
offsets so that they can be used in assembly implementations.  x86 uses them
because it originally implemented most of its ifunc resolvers directly in
assembly (back when compiler support was lacking).
Since you are implementing directly in C, these files are unnecessary.
> I obviously need something in init_cpu_features to check if mrs is
> emulated in the kernel but I am not sure how to do that. I know it
> involves the HWCAPs but I am not sure how to access them.  Do I need a
> sym file to get access to that too? Something like
> sysdeps/arm/rtld-global-offsets.sym?
>
> Right now my build dies with:
>
> <stdin>:2:102: error: implicit declaration of function ‘rtld_global_ro_offsetof’ [-Werror=implicit-function-declaration]
> <stdin>:2:127: error: ‘_dl_aarch64_cpu_features’ undeclared (first use in this function)
> <stdin>:2:127: note: each undeclared identifier is reported only once for each function it appears in
> <stdin>:3:82: error: invalid application of ‘sizeof’ to incomplete type ‘struct cpu_features’
> cc1: all warnings being treated as errors
> ../Makerules:266: recipe for target '/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/cpu-features-offsets.h' failed
> make[2]: *** [/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/cpu-features-offsets.h] Error 1
>
> Steve Ellcey
> sellcey@caviumnetworks.com
>
This branch in my personal repo [1] has a workable draft version for
aarch64. It contains 2 patches: one that implements cpu-features.c
for aarch64 and another that actually uses it to implement the
thunderx ifunc.
In the first patch I would like to remove sysdeps/aarch64/ldsodefs.h and
make it Linux-specific only, because of hwcap. I will try to clean this up
later.
[1] https://github.com/zatrazz/glibc/tree/master-aarch64-ifunc
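A .sym file exists only to expose C constant expressions such as `sizeof` and `offsetof` to assembly sources as `#define`s. The sketch below is a rough standalone illustration of the values such a file would export for the struct cpu_features in the patch. glibc's real gen-as-const machinery extracts these constants from compiler output at build time rather than running a program, and `format_sym_define` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same layout as struct cpu_features in the patch.  */
struct cpu_features
{
  uint64_t midr_el1;
};

/* Hypothetical helper: format one "#define NAME VALUE" line of the kind a
   generated offsets header provides to assembly code.  Returns what
   snprintf returns.  */
static int
format_sym_define (char *buf, size_t len, const char *name, size_t value)
{
  return snprintf (buf, len, "#define %s %zu", name, value);
}
```

With the struct above, `format_sym_define (buf, sizeof buf, "CPU_FEATURES_SIZE", sizeof (struct cpu_features))` yields `#define CPU_FEATURES_SIZE 8`. A resolver written in C can use `sizeof` and `offsetof` directly, so it needs no generated header.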
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-01-23 23:33 ` Steve Ellcey
@ 2017-01-24 9:37 ` Florian Weimer
2017-01-24 14:09 ` Adhemerval Zanella
1 sibling, 0 replies; 38+ messages in thread
From: Florian Weimer @ 2017-01-24 9:37 UTC (permalink / raw)
To: Steve Ellcey, Adhemerval Zanella, libc-alpha
On 01/24/2017 12:33 AM, Steve Ellcey wrote:
> How is cpu-features-offsets.sym used and what do I need in this file?
> I think this may be how _dl_aarch64_cpu_features is supposed to be
> defined but I am not sure.
It allows the assembler to use the values of C constant expressions.
Commit 67aae64512cb42332f76a83e84ac2bc608ad4ad2 is an aarch64 example of
its use.
Florian
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-01-19 19:42 ` Adhemerval Zanella
2017-01-19 21:04 ` Joseph Myers
@ 2017-01-23 23:33 ` Steve Ellcey
2017-01-24 9:37 ` Florian Weimer
2017-01-24 14:09 ` Adhemerval Zanella
1 sibling, 2 replies; 38+ messages in thread
From: Steve Ellcey @ 2017-01-23 23:33 UTC (permalink / raw)
To: Adhemerval Zanella, libc-alpha
[-- Attachment #1: Type: text/plain, Size: 2965 bytes --]
On Thu, 2017-01-19 at 17:41 -0200, Adhemerval Zanella wrote:
>
> I think that, to avoid potentially multiple kernel traps at load or PLT resolve time,
> a better solution would be to issue the mrs instruction once at loader/program startup,
> fill in an internal structure with the required information, and use it later on
> during ifunc resolution.  This is similar to the cpu-features/cacheinfo strategy for x86.
>
> From the documentation in the last patch iteration [1], the kernel provides the HWCAP_CPUID
> bit in hwcap to indicate that it supports the mrs emulation.  So, building on my previous
> suggestion, I would recommend:
>
>  1. Remove any configure check or restriction.
>  2. Add a cpu_features module similar to x86 that sets a global state with
>     the cpu information obtained from the kernel.  It will first check the HWCAP_CPUID
>     bit in hwcap and, if it is set, issue the mrs instruction.  It will
>     then populate the global state with the required cpu information.
>  3. Use the cpu information to select the correct ifunc.
>
> This has the further advantage of avoiding the extra complexity of different glibc
> builds with different minimum required kernels.
Adhemerval,
I am looking at the cpu-features setup from x86 and trying to implement
that for aarch64 but there are some things I don't understand about the
code and I was hoping you (or someone else on the list) could help me.
I have attached the patch I have so far; this code doesn't contain any
use of the cpu features code but is just the code that tries to initialize
it on startup.  Right now it doesn't build and I am not sure what I am
missing.
Specifically, I have these questions.
How is cpu-features-offsets.sym used and what do I need in this file?
I think this may be how _dl_aarch64_cpu_features is supposed to be
defined but I am not sure.
I obviously need something in init_cpu_features to check if mrs is
emulated in the kernel but I am not sure how to do that.  I know it
involves the HWCAPs but I am not sure how to access them.  Do I need a
sym file to get access to that too?  Something like
sysdeps/arm/rtld-global-offsets.sym?
Right now my build dies with:
<stdin>:2:102: error: implicit declaration of function ‘rtld_global_ro_offsetof’ [-Werror=implicit-function-declaration]
<stdin>:2:127: error: ‘_dl_aarch64_cpu_features’ undeclared (first use in this function)
<stdin>:2:127: note: each undeclared identifier is reported only once for each function it appears in
<stdin>:3:82: error: invalid application of ‘sizeof’ to incomplete type ‘struct cpu_features’
cc1: all warnings being treated as errors
../Makerules:266: recipe for target '/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/cpu-features-offsets.h' failed
make[2]: *** [/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/cpu-features-offsets.h] Error 1
Steve Ellcey
sellcey@caviumnetworks.com
[-- Attachment #2: ifunc2.diff --]
[-- Type: text/x-patch, Size: 10446 bytes --]
diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 562c137..e1d47fd 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -5,11 +5,13 @@ CFLAGS-backtrace.c += -funwind-tables
endif
ifeq ($(subdir),elf)
+sysdep-dl-routines += dl-get-cpu-features
sysdep-dl-routines += tlsdesc dl-tlsdesc
gen-as-const-headers += dl-link.sym
endif
ifeq ($(subdir),csu)
+gen-as-const-headers += cpu-features-offsets.sym
gen-as-const-headers += tlsdesc.sym
endif
diff --git a/sysdeps/aarch64/cpu-features-offsets.sym b/sysdeps/aarch64/cpu-features-offsets.sym
index e69de29..ad33818 100644
--- a/sysdeps/aarch64/cpu-features-offsets.sym
+++ b/sysdeps/aarch64/cpu-features-offsets.sym
@@ -0,0 +1,4 @@
+
+RTLD_GLOBAL_RO_DL_AARCH64_CPU_FEATURES_OFFSET rtld_global_ro_offsetof (_dl_aarch64_cpu_features)
+
+CPU_FEATURES_SIZE sizeof (struct cpu_features)
diff --git a/sysdeps/aarch64/cpu-features.c b/sysdeps/aarch64/cpu-features.c
index e69de29..6c8f065 100644
--- a/sysdeps/aarch64/cpu-features.c
+++ b/sysdeps/aarch64/cpu-features.c
@@ -0,0 +1,30 @@
+/* Initialize CPU feature data.
+ This file is part of the GNU C Library.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <stdint.h>
+#include <cpu-features.h>
+
+static inline void
+init_cpu_features (struct cpu_features *cpu_features)
+{
+ /* What do I do on a kernel that does not support mrs. */
+
+ register uint64_t id = 0;
+ asm volatile ("mrs %0, midr_el1" : "=r"(id));
+ cpu_features->midr_el1 = id;
+}
diff --git a/sysdeps/aarch64/cpu-features.h b/sysdeps/aarch64/cpu-features.h
index e69de29..a2d0786 100644
--- a/sysdeps/aarch64/cpu-features.h
+++ b/sysdeps/aarch64/cpu-features.h
@@ -0,0 +1,48 @@
+/* This file is part of the GNU C Library.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifndef cpu_features_h
+#define cpu_features_h
+
+#include <stdint.h>
+
+#define MIDR_PARTNUM_SHIFT 4
+#define MIDR_PARTNUM_MASK (0xfff << MIDR_PARTNUM_SHIFT)
+#define MIDR_PARTNUM(midr) \
+ (((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
+#define MIDR_ARCHITECTURE_SHIFT 16
+#define MIDR_ARCHITECTURE_MASK (0xf << MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_ARCHITECTURE(midr) \
+ (((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_VARIANT_SHIFT 20
+#define MIDR_VARIANT_MASK (0xf << MIDR_VARIANT_SHIFT)
+#define MIDR_VARIANT(midr) \
+ (((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
+#define MIDR_IMPLEMENTOR_SHIFT 24
+#define MIDR_IMPLEMENTOR_MASK (0xff << MIDR_IMPLEMENTOR_SHIFT)
+#define MIDR_IMPLEMENTOR(midr) \
+ (((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
+
+#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR(midr) == 'C' \
+ && MIDR_PARTNUM(midr) == 0x0a1)
+
+struct cpu_features
+{
+ uint64_t midr_el1;
+};
+
+#endif /* cpu_features_h */
diff --git a/sysdeps/aarch64/dl-get-cpu-features.c b/sysdeps/aarch64/dl-get-cpu-features.c
index e69de29..1581c75 100644
--- a/sysdeps/aarch64/dl-get-cpu-features.c
+++ b/sysdeps/aarch64/dl-get-cpu-features.c
@@ -0,0 +1,26 @@
+/* This file is part of the GNU C Library.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <ldsodefs.h>
+
+#undef __get_cpu_features
+
+const struct cpu_features *
+__get_cpu_features (void)
+{
+ return &GLRO(dl_aarch64_cpu_features);
+}
diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
index 84b8aec..4e34b13 100644
--- a/sysdeps/aarch64/dl-machine.h
+++ b/sysdeps/aarch64/dl-machine.h
@@ -25,6 +25,7 @@
#include <tls.h>
#include <dl-tlsdesc.h>
#include <dl-irel.h>
+#include <cpu-features.c>
/* Return nonzero iff ELF header is compatible with the running host. */
static inline int __attribute__ ((unused))
@@ -426,4 +427,20 @@ elf_machine_lazy_rel (struct link_map *map,
_dl_reloc_bad_type (map, r_type, 1);
}
+#define DL_PLATFORM_INIT dl_platform_init ()
+
+static inline void __attribute__ ((unused))
+dl_platform_init (void)
+{
+ if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
+ /* Avoid an empty string which would disturb us. */
+ GLRO(dl_platform) = NULL;
+
+#ifdef SHARED
+ /* init_cpu_features has been called early from __libc_start_main in
+ static executable. */
+ init_cpu_features (&GLRO(dl_aarch64_cpu_features));
+#endif
+}
+
#endif
diff --git a/sysdeps/aarch64/dl-procinfo.c b/sysdeps/aarch64/dl-procinfo.c
index e69de29..8d477d5 100644
--- a/sysdeps/aarch64/dl-procinfo.c
+++ b/sysdeps/aarch64/dl-procinfo.c
@@ -0,0 +1,57 @@
+/* Data for Aarch64 version of processor capability information.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* If anything should be added here check whether the size of each string
+ is still ok with the given array size.
+
+ All the #ifdefs in the definitions are quite irritating but
+ necessary if we want to avoid duplicating the information. There
+ are three different modes:
+
+ - PROCINFO_DECL is defined. This means we are only interested in
+ declarations.
+
+ - PROCINFO_DECL is not defined:
+
+ + if SHARED is defined the file is included in an array
+ initializer. The .element = { ... } syntax is needed.
+
+ + if SHARED is not defined a normal array initialization is
+ needed.
+ */
+
+#ifndef PROCINFO_CLASS
+# define PROCINFO_CLASS
+#endif
+
+#if !defined PROCINFO_DECL && defined SHARED
+ ._dl_aarch64_cpu_features
+#else
+PROCINFO_CLASS struct cpu_features _dl_aarch64_cpu_features
+#endif
+#ifndef PROCINFO_DECL
+= { }
+#endif
+#if !defined SHARED || defined PROCINFO_DECL
+;
+#else
+,
+#endif
+
+#undef PROCINFO_DECL
+#undef PROCINFO_CLASS
diff --git a/sysdeps/aarch64/ldsodefs.h b/sysdeps/aarch64/ldsodefs.h
index f277074..ba4ada3 100644
--- a/sysdeps/aarch64/ldsodefs.h
+++ b/sysdeps/aarch64/ldsodefs.h
@@ -20,6 +20,7 @@
#define _AARCH64_LDSODEFS_H 1
#include <elf.h>
+#include <cpu-features.h>
struct La_aarch64_regs;
struct La_aarch64_retval;
diff --git a/sysdeps/aarch64/libc-start.c b/sysdeps/aarch64/libc-start.c
index e69de29..49d5f4a 100644
--- a/sysdeps/aarch64/libc-start.c
+++ b/sysdeps/aarch64/libc-start.c
@@ -0,0 +1,41 @@
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifdef SHARED
+# include <csu/libc-start.c>
+# else
+/* The main work is done in the generic function. */
+# define LIBC_START_DISABLE_INLINE
+# define LIBC_START_MAIN generic_start_main
+# include <csu/libc-start.c>
+# include <cpu-features.h>
+# include <cpu-features.c>
+
+extern struct cpu_features _dl_aarch64_cpu_features;
+
+int
+__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
+ int argc, char **argv,
+ __typeof (main) init,
+ void (*fini) (void),
+ void (*rtld_fini) (void), void *stack_end)
+{
+ init_cpu_features (&_dl_aarch64_cpu_features);
+ return generic_start_main (main, argc, argv, init, fini, rtld_fini,
+ stack_end);
+}
+#endif
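The init_cpu_features in this patch executes mrs unconditionally, and its comment asks what to do on a kernel that does not emulate it. Following the HWCAP_CPUID suggestion in this thread, here is a minimal sketch of a guarded version. Assumptions: the function takes the hwcap word as an argument (a hypothetical signature, so the logic can run off-target), the fallback HWCAP_CPUID value matches the Linux arm64 bit, and the non-aarch64 reader returns a made-up MIDR.

```c
#include <stdint.h>

#ifndef HWCAP_CPUID
/* Bit 11 of AT_HWCAP on Linux/arm64; defined here in case the
   installed headers predate it.  */
# define HWCAP_CPUID (1UL << 11)
#endif

struct cpu_features
{
  uint64_t midr_el1;
};

#ifdef __aarch64__
static uint64_t
read_midr_el1 (void)
{
  uint64_t id;
  /* Traps into the kernel, which emulates the read when it advertises
     HWCAP_CPUID.  */
  asm volatile ("mrs %0, midr_el1" : "=r" (id));
  return id;
}
#else
/* Off-target stand-in so the sketch builds elsewhere; the value is a
   hypothetical ThunderX-style MIDR.  */
static uint64_t
read_midr_el1 (void)
{
  return 0x431F0A10;
}
#endif

/* Hypothetical variant of init_cpu_features.  Without HWCAP_CPUID the
   mrs would be an illegal instruction at EL0, so MIDR is left as 0 and
   no CPU-specific ifunc will match.  */
static void
init_cpu_features_sketch (struct cpu_features *cpu_features,
                          unsigned long hwcap)
{
  if (hwcap & HWCAP_CPUID)
    cpu_features->midr_el1 = read_midr_el1 ();
  else
    cpu_features->midr_el1 = 0;
}
```

In a standalone aarch64 Linux program, the hwcap word could come from `getauxval (AT_HWCAP)` in `<sys/auxv.h>`. Inside glibc itself, the loader would use its own saved copy of the auxiliary vector instead, which this sketch does not model.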
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-01-19 19:42 ` Adhemerval Zanella
@ 2017-01-19 21:04 ` Joseph Myers
2017-01-23 23:33 ` Steve Ellcey
1 sibling, 0 replies; 38+ messages in thread
From: Joseph Myers @ 2017-01-19 21:04 UTC (permalink / raw)
To: Adhemerval Zanella; +Cc: libc-alpha
On Thu, 19 Jan 2017, Adhemerval Zanella wrote:
> We need to make sure that glibc built against older kernel headers (or with
> --enable-kernel=x.y.z) does not use the mrs instruction, and that glibc built against
> a newer kernel that may use mrs fails on loading with DL_SYSDEP_OSCHECK.
Agreed. That is, I think that either the configured minimum kernel
version or the kernel support at runtime (or both, with the configured
minimum kernel allowing runtime tests to be disabled) should be what
determines whether these implementations can be used - rather than
enabling multi-arch changing the minimum kernel version.
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] Add ifunc memcpy and memmove for aarch64
2017-01-19 18:23 [PATCH] Add ifunc memcpy and memmove for aarch64 Steve Ellcey
@ 2017-01-19 19:42 ` Adhemerval Zanella
2017-01-19 21:04 ` Joseph Myers
2017-01-23 23:33 ` Steve Ellcey
0 siblings, 2 replies; 38+ messages in thread
From: Adhemerval Zanella @ 2017-01-19 19:42 UTC (permalink / raw)
To: libc-alpha
Hi Steve,
On 19/01/2017 16:22, Steve Ellcey wrote:
> +extern uint64_t __midr attribute_hidden;
> +extern bool __is_thunderx attribute_hidden;
> +
> +#define INIT_ARCH() \
> + { \
> + if (__midr == 0) \
> + { \
> + asm volatile ("mrs %0, midr_el1" : "=r"(__midr)); \
> + __is_thunderx = IS_THUNDERX(__midr); \
> + } \
> + }
I think that, to avoid potentially multiple kernel traps at load or PLT resolve time,
a better solution would be to issue the mrs instruction once at loader/program startup,
fill in an internal structure with the required information, and use it later on
during ifunc resolution. This is similar to the cpu-features/cacheinfo strategy for x86.
> diff --git a/sysdeps/unix/sysv/linux/aarch64/configure.ac b/sysdeps/unix/sysv/linux/aarch64/configure.ac
> index 211fa9c..684cb46 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/configure.ac
> +++ b/sysdeps/unix/sysv/linux/aarch64/configure.ac
> @@ -1,6 +1,11 @@
> GLIBC_PROVIDES dnl See aclocal.m4 in the top level source directory.
> # Local configure fragment for sysdeps/unix/sysv/linux/aarch64.
>
> -arch_minimum_kernel=3.7.0
> +# For multi-arch support we need a kernel that emulates the mrs instruction.
> +if test x$multi_arch = xyes; then
> + arch_minimum_kernel=4.11.0
> +else
> + arch_minimum_kernel=3.7.0
> +fi
I do not think this is sufficient to prevent the multiarch version on systems with
old installed kernel headers. This will only prevent it if you explicitly use
--enable-multi-arch; however, multiarch is enabled by default in configure.ac
(configure.ac:877). So building with old kernel headers will break
the runtime.
We need to make sure that glibc built against older kernel headers (or with
--enable-kernel=x.y.z) does not use the mrs instruction, and that glibc built against
a newer kernel that may use mrs fails on loading with DL_SYSDEP_OSCHECK.
From the documentation in the last patch iteration [1], the kernel provides the HWCAP_CPUID
bit in hwcap to indicate that it supports the mrs emulation. So, building on my previous
suggestion, I would recommend:
1. Remove any configure check or restriction.
2. Add a cpu_features module similar to x86 that sets a global state with
the cpu information obtained from the kernel. It will first check the HWCAP_CPUID
bit in hwcap and, if it is set, issue the mrs instruction. It will
then populate the global state with the required cpu information.
3. Use the cpu information to select the correct ifunc.
This has the further advantage of avoiding the extra complexity of different glibc
builds with different minimum required kernels.
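Step 3 of this plan can be sketched in C as a selector that reads only the features recorded once at startup and never issues mrs itself. The two memcpy variants are hypothetical stand-ins for the assembly files named in the original patch's ChangeLog (memcpy_generic.S, memcpy_thunderx.S). In glibc, the selector's body would sit inside an ifunc resolver, for example one wired up with GCC's `ifunc` function attribute.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct cpu_features
{
  uint64_t midr_el1;   /* 0 if the kernel lacks HWCAP_CPUID */
};

/* Implementer 'C' (bits 31:24) and part number 0x0a1 (bits 15:4).  */
#define IS_THUNDERX(midr) \
  ((((midr) >> 24) & 0xff) == 'C' && (((midr) >> 4) & 0xfff) == 0x0a1)

typedef void *(*memcpy_fn) (void *, const void *, size_t);

/* Hypothetical stand-in for the generic assembly version.  */
static void *
memcpy_generic_sketch (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

/* Hypothetical stand-in for the prefetching ThunderX version.  */
static void *
memcpy_thunderx_sketch (void *dst, const void *src, size_t n)
{
  __builtin_prefetch (src);
  return memcpy (dst, src, n);
}

/* Resolver logic: consult the cached features only.  A zero MIDR
   (no mrs emulation) falls through to the generic version.  */
static memcpy_fn
select_memcpy (const struct cpu_features *cpu_features)
{
  if (IS_THUNDERX (cpu_features->midr_el1))
    return memcpy_thunderx_sketch;
  return memcpy_generic_sketch;
}
```

Because the MIDR is read once, each additional ifunc resolver costs only a load and a compare rather than another kernel trap.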
* [PATCH] Add ifunc memcpy and memmove for aarch64
@ 2017-01-19 18:23 Steve Ellcey
2017-01-19 19:42 ` Adhemerval Zanella
0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2017-01-19 18:23 UTC (permalink / raw)
To: libc-alpha
[-- Attachment #1: Type: text/plain, Size: 2298 bytes --]
This patch adds ifunc versions of memcpy and memmove for aarch64.  I
know this isn't appropriate for 2.25 but I wanted to submit it and get
it reviewed for 2.26.  The basic change is to add software
prefetching for large memcpys on thunderx, which can speed up those
routines by around 2X.  For memcpys under 32K bytes I found that
software prefetching did not help (and sometimes hurt).  I wasn't
really interested in speeding up memmove, but since memcpy and memmove
are implemented in one file it seemed easier to make memmove an ifunc
along with memcpy rather than try to split them up.  memmove does get
a speedup when it uses the memcpy code.
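Here is a C sketch of the strategy just described. Copies below 32K bytes go straight to a plain copy. Larger ones walk the buffer in fixed-size blocks while prefetching ahead with GCC's `__builtin_prefetch`. The 64-byte block and 512-byte prefetch distance are assumptions chosen for illustration, not values from the patch, whose real implementation is in assembly.

```c
#include <stddef.h>
#include <string.h>

#define PREFETCH_THRESHOLD (32 * 1024) /* below this, prefetching did not help */
#define BLOCK 64                       /* assumed cache-line-sized step */
#define PREFETCH_DISTANCE 512          /* assumed look-ahead in bytes */

/* Non-overlapping copy, like memcpy.  */
static void *
copy_with_prefetch (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  if (n < PREFETCH_THRESHOLD)
    return memcpy (dst, src, n);

  while (n >= BLOCK)
    {
      /* Only form the look-ahead pointer while it stays inside the
         source buffer.  */
      if (n > PREFETCH_DISTANCE)
        __builtin_prefetch (s + PREFETCH_DISTANCE);
      memcpy (d, s, BLOCK);
      d += BLOCK;
      s += BLOCK;
      n -= BLOCK;
    }
  memcpy (d, s, n);   /* remaining tail, fewer than BLOCK bytes */
  return dst;
}
```

The threshold mirrors the 32K figure above. The block size and prefetch distance would need tuning against benchmarks on real hardware.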
The ifunc code depends on reading the midr_el1 register with the mrs
instruction, which is a privileged access from user space, but version
4.11 of the Linux kernel will emulate it
(https://lkml.org/lkml/2017/1/10/816).  Since the access is emulated
(and therefore slow), I added code to save its value rather than read
it every time an ifunc selection function runs.  I also saved a flag
indicating whether the platform is thunderx, so that glibc does not
have to do multiple logical operations on the midr value in each ifunc
selection function to determine whether it is running on thunderx.
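The read-once-and-cache scheme can be sketched in portable C. The helper names are hypothetical. The field positions (implementer in bits 31:24, part number in bits 15:4) and the thunderx match (implementer 'C', part 0x0a1) follow the patch's init-arch.h. Unlike the patch's MIDR_IMPLEMENTOR_MASK (0xff << 24), which shifts into the sign bit of an int and is strictly undefined behaviour in ISO C, this sketch shifts first and then masks with small unsigned constants. The reader function is passed in so the sketch runs on any host.

```c
#include <stdbool.h>
#include <stdint.h>

/* MIDR_EL1 fields, extracted by shifting first so no constant overflows.  */
#define MIDR_IMPLEMENTOR(m) ((unsigned) (((m) >> 24) & 0xffu))
#define MIDR_PARTNUM(m)     ((unsigned) (((m) >> 4) & 0xfffu))

static bool
midr_is_thunderx (uint64_t midr)
{
  return MIDR_IMPLEMENTOR (midr) == 'C' && MIDR_PARTNUM (midr) == 0x0a1;
}

/* Hypothetical cache mirroring __midr and __is_thunderx in the patch;
   0 means "not read yet", as in the patch's INIT_ARCH.  */
static uint64_t cached_midr;
static bool cached_is_thunderx;
static unsigned midr_reads;   /* How many times the slow read ran.  */

static bool
cpu_is_thunderx (uint64_t (*read_midr) (void))
{
  if (cached_midr == 0)
    {
      cached_midr = read_midr ();   /* Slow, kernel-emulated access.  */
      midr_reads++;
      cached_is_thunderx = midr_is_thunderx (cached_midr);
    }
  return cached_is_thunderx;
}

/* Stand-in for the trapping mrs: implementer 'C' (0x43), part 0x0a1.  */
static uint64_t
example_thunderx_midr (void)
{
  return 0x430F0A10u;
}
```

As in the patch, 0 doubles as the "not yet read" marker, so if the read ever returned 0 it would be repeated on every call; a separate initialized flag would avoid that.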
I have attached the bench-memcpy.out, bench-memcpy-large.out,
bench-memmove.out and bench-memmove-large.out files to show the
performance difference.  Most of the difference is seen in the large
versions, as the smaller ones only use prefetching on a couple of inputs.
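As a quick check of the "around 2X" figure, the ratio of the __memcpy_generic and __memcpy_thunderx columns can be computed from three alignment 0/0 rows of the attached bench-memcpy.out. The values are copied verbatim; lower times are better, in the benchmark's own units. The struct and function names are only for illustration.

```c
#include <stddef.h>

/* Alignment 0/0 rows from bench-memcpy.out: __memcpy_thunderx and
   __memcpy_generic timings for each length.  */
struct bench_row
{
  unsigned length;
  double thunderx;
  double generic;
};

static const struct bench_row rows[] = {
  { 16384, 1429.84, 1408.59 },
  { 32768, 3555.16, 4450.62 },
  { 65536, 7863.91, 15286.1 },
};

/* Greater than 1.0 means the thunderx variant is faster.  */
static double
speedup (const struct bench_row *r)
{
  return r->generic / r->thunderx;
}
```

This gives about 0.99 at 16384 bytes, which is within noise since both variants run the same loop below the 32768-byte threshold. The ratio is about 1.25 at 32768 bytes and about 1.94 at 65536 bytes.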
Steve Ellcey
sellcey@caviumnetworks.com
2017-01-19  Steve Ellcey  <sellcey@caviumnetworks.com>
* sysdeps/aarch64/memcpy.S (MEMMOVE, MEMCPY): New macros.
(memmove): Use MEMMOVE for name.
(memcpy): Use MEMCPY for name.  Add loop with prefetching
under USE_THUNDERX macro.
* sysdeps/aarch64/multiarch/Makefile: New file.
* sysdeps/aarch64/multiarch/ifunc-impl-list.c: Ditto.
* sysdeps/aarch64/multiarch/init-arch.h: Ditto.
* sysdeps/aarch64/multiarch/memcpy.c: Ditto.
* sysdeps/aarch64/multiarch/memcpy_generic.S: Ditto.
* sysdeps/aarch64/multiarch/memcpy_thunderx.S: Ditto.
* sysdeps/aarch64/multiarch/memmove.c: Ditto.
* sysdeps/unix/sysv/linux/aarch64/configure.ac (arch_minimum_kernel):
Set to 4.11.0 if building with multi_arch.
* sysdeps/unix/sysv/linux/aarch64/configure: Regenerate.
[-- Attachment #2: ifunc.patch --]
[-- Type: text/x-patch, Size: 15812 bytes --]
diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S
index 29af8b1..74444b4 100644
--- a/sysdeps/aarch64/memcpy.S
+++ b/sysdeps/aarch64/memcpy.S
@@ -59,7 +59,14 @@
Overlapping large forward memmoves use a loop that copies backwards.
*/
-ENTRY_ALIGN (memmove, 6)
+#ifndef MEMMOVE
+# define MEMMOVE memmove
+#endif
+#ifndef MEMCPY
+# define MEMCPY memcpy
+#endif
+
+ENTRY_ALIGN (MEMMOVE, 6)
DELOUSE (0)
DELOUSE (1)
@@ -71,9 +78,9 @@ ENTRY_ALIGN (memmove, 6)
b.lo L(move_long)
/* Common case falls through into memcpy. */
-END (memmove)
-libc_hidden_builtin_def (memmove)
-ENTRY (memcpy)
+END (MEMMOVE)
+libc_hidden_builtin_def (MEMMOVE)
+ENTRY (MEMCPY)
DELOUSE (0)
DELOUSE (1)
@@ -158,10 +165,22 @@ L(copy96):
.p2align 4
L(copy_long):
+
+#ifdef USE_THUNDERX
+
+ /* On thunderx, large memcpys benefit from software prefetching.
+ This loop is identical to the one below it but with prefetch
+ instructions included. For copies smaller than 32768 bytes,
+ prefetching does not help and slows the code down, so we only
+ use the prefetching loop for the largest memcpys. */
+
+ cmp count, #32768
+ b.lo L(copy_long_without_prefetch)
and tmp1, dstin, 15
bic dst, dstin, 15
ldp D_l, D_h, [src]
sub src, src, tmp1
+ prfm pldl1strm, [src, 384]
add count, count, tmp1 /* Count is now 16 too large. */
ldp A_l, A_h, [src, 16]
stp D_l, D_h, [dstin]
@@ -169,7 +188,10 @@ L(copy_long):
ldp C_l, C_h, [src, 48]
ldp D_l, D_h, [src, 64]!
subs count, count, 128 + 16 /* Test and readjust count. */
- b.ls 2f
+
+L(prefetch_loop64):
+ tbz src, #6, 1f
+ prfm pldl1strm, [src, 512]
1:
stp A_l, A_h, [dst, 16]
ldp A_l, A_h, [src, 16]
@@ -180,12 +202,40 @@ L(copy_long):
stp D_l, D_h, [dst, 64]!
ldp D_l, D_h, [src, 64]!
subs count, count, 64
- b.hi 1b
+ b.hi L(prefetch_loop64)
+ b L(last64)
+
+L(copy_long_without_prefetch):
+#endif
+
+ and tmp1, dstin, 15
+ bic dst, dstin, 15
+ ldp D_l, D_h, [src]
+ sub src, src, tmp1
+ add count, count, tmp1 /* Count is now 16 too large. */
+ ldp A_l, A_h, [src, 16]
+ stp D_l, D_h, [dstin]
+ ldp B_l, B_h, [src, 32]
+ ldp C_l, C_h, [src, 48]
+ ldp D_l, D_h, [src, 64]!
+ subs count, count, 128 + 16 /* Test and readjust count. */
+ b.ls L(last64)
+L(loop64):
+ stp A_l, A_h, [dst, 16]
+ ldp A_l, A_h, [src, 16]
+ stp B_l, B_h, [dst, 32]
+ ldp B_l, B_h, [src, 32]
+ stp C_l, C_h, [dst, 48]
+ ldp C_l, C_h, [src, 48]
+ stp D_l, D_h, [dst, 64]!
+ ldp D_l, D_h, [src, 64]!
+ subs count, count, 64
+ b.hi L(loop64)
/* Write the last full set of 64 bytes. The remainder is at most 64
bytes, so it is safe to always copy 64 bytes from the end even if
there is just 1 byte left. */
-2:
+L(last64):
ldp E_l, E_h, [srcend, -64]
stp A_l, A_h, [dst, 16]
ldp A_l, A_h, [srcend, -48]
@@ -256,5 +306,5 @@ L(move_long):
stp C_l, C_h, [dstin]
3: ret
-END (memcpy)
-libc_hidden_builtin_def (memcpy)
+END (MEMCPY)
+libc_hidden_builtin_def (MEMCPY)
diff --git a/sysdeps/aarch64/multiarch/Makefile b/sysdeps/aarch64/multiarch/Makefile
index e69de29..78d52c7 100644
--- a/sysdeps/aarch64/multiarch/Makefile
+++ b/sysdeps/aarch64/multiarch/Makefile
@@ -0,0 +1,3 @@
+ifeq ($(subdir),string)
+sysdep_routines += memcpy_generic memcpy_thunderx
+endif
diff --git a/sysdeps/aarch64/multiarch/ifunc-impl-list.c b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
index e69de29..c6d63f6 100644
--- a/sysdeps/aarch64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
@@ -0,0 +1,61 @@
+/* Enumerate available IFUNC implementations of a function. AARCH64 version.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <assert.h>
+#include <string.h>
+#include <wchar.h>
+#include <ldsodefs.h>
+#include <ifunc-impl-list.h>
+#include <init-arch.h>
+#include <stdio.h>
+
+/* Access to the midr_el1 register is emulated by the linux kernel and
+ is slow so we save it in __midr after it is read once. We also save
+ the value of IS_THUNDERX in __is_thunderx so it does not need to be
+ recomputed by checking multiple bits from __midr. */
+
+uint64_t __midr attribute_hidden = 0;
+bool __is_thunderx attribute_hidden;
+
+/* Maximum number of IFUNC implementations. */
+#define MAX_IFUNC 2
+
+size_t
+__libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
+ size_t max)
+{
+ assert (max >= MAX_IFUNC);
+
+ size_t i = 0;
+
+ INIT_ARCH ();
+
+#ifdef SHARED
+ /* Support sysdeps/aarch64/multiarch/memcpy.c. */
+ IFUNC_IMPL (i, name, memcpy,
+ IFUNC_IMPL_ADD (array, i, memcpy, __is_thunderx,
+ __memcpy_thunderx)
+ IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic))
+ IFUNC_IMPL (i, name, memmove,
+ IFUNC_IMPL_ADD (array, i, memmove, __is_thunderx,
+ __memmove_thunderx)
+ IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_generic))
+#endif
+
+ return i;
+}
diff --git a/sysdeps/aarch64/multiarch/init-arch.h b/sysdeps/aarch64/multiarch/init-arch.h
index e69de29..e12ba61 100644
--- a/sysdeps/aarch64/multiarch/init-arch.h
+++ b/sysdeps/aarch64/multiarch/init-arch.h
@@ -0,0 +1,55 @@
+/* This file is part of the GNU C Library.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <elf.h>
+#include <stdint.h>
+#include <stdbool.h>
+
+#define MIDR_REVISION_MASK 0xf
+#define MIDR_REVISION(__midr) ((__midr) & MIDR_REVISION_MASK)
+#define MIDR_PARTNUM_SHIFT 4
+#define MIDR_PARTNUM_MASK (0xfff << MIDR_PARTNUM_SHIFT)
+#define MIDR_PARTNUM(__midr) \
+ (((__midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
+#define MIDR_ARCHITECTURE_SHIFT 16
+#define MIDR_ARCHITECTURE_MASK (0xf << MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_ARCHITECTURE(__midr) \
+ (((__midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_VARIANT_SHIFT 20
+#define MIDR_VARIANT_MASK (0xf << MIDR_VARIANT_SHIFT)
+#define MIDR_VARIANT(__midr) \
+ (((__midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
+#define MIDR_IMPLEMENTOR_SHIFT 24
+#define MIDR_IMPLEMENTOR_MASK (0xff << MIDR_IMPLEMENTOR_SHIFT)
+#define MIDR_IMPLEMENTOR(__midr) \
+ (((__midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
+
+#define IS_THUNDERX(__midr) (MIDR_IMPLEMENTOR(__midr) == 'C' \
+ && MIDR_PARTNUM(__midr) == 0x0a1)
+
+
+extern uint64_t __midr attribute_hidden;
+extern bool __is_thunderx attribute_hidden;
+
+#define INIT_ARCH() \
+ { \
+ if (__midr == 0) \
+ { \
+ asm volatile ("mrs %0, midr_el1" : "=r"(__midr)); \
+ __is_thunderx = IS_THUNDERX(__midr); \
+ } \
+ }
diff --git a/sysdeps/aarch64/multiarch/memcpy.c b/sysdeps/aarch64/multiarch/memcpy.c
index e69de29..b2b587b 100644
--- a/sysdeps/aarch64/multiarch/memcpy.c
+++ b/sysdeps/aarch64/multiarch/memcpy.c
@@ -0,0 +1,41 @@
+/* Multiple versions of memcpy. AARCH64 version.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* Define multiple versions only for the definition in lib and for
+ DSO. In static binaries we need memcpy before the initialization
+ happened. */
+
+#if defined SHARED && IS_IN (libc)
+/* Redefine memcpy so that the compiler won't complain about the type
+ mismatch with the IFUNC selector in strong_alias, below. */
+# undef memcpy
+# define memcpy __redirect_memcpy
+# include <string.h>
+# include <init-arch.h>
+
+extern __typeof (__redirect_memcpy) __libc_memcpy;
+
+extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
+extern __typeof (__redirect_memcpy) __memcpy_thunderx attribute_hidden;
+
+libc_ifunc (__libc_memcpy,
+ __is_thunderx ? __memcpy_thunderx : __memcpy_generic);
+
+#undef memcpy
+strong_alias (__libc_memcpy, memcpy);
+#endif
diff --git a/sysdeps/aarch64/multiarch/memcpy_generic.S b/sysdeps/aarch64/multiarch/memcpy_generic.S
index e69de29..c0e3462 100644
--- a/sysdeps/aarch64/multiarch/memcpy_generic.S
+++ b/sysdeps/aarch64/multiarch/memcpy_generic.S
@@ -0,0 +1,42 @@
+/* A Generic Optimized memcpy implementation for AARCH64.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* The actual memcpy and memmove code is in ../memcpy.S. If we are
+ building a shared libc using IFUNC this file defines __memcpy_generic
+ and __memmove_generic. Otherwise the include of ../memcpy.S will
+ define the normal __memcpy and __memmove entry points. */
+
+#include <sysdep.h>
+
+#if defined SHARED && IS_IN (libc)
+
+#define MEMCPY __memcpy_generic
+#define MEMMOVE __memmove_generic
+
+/* Do not hide the generic versions of memcpy and memmove, we use them
+ internally. */
+#undef libc_hidden_builtin_def
+#define libc_hidden_builtin_def(name)
+
+/* It doesn't make sense to send libc-internal memcpy calls through a PLT. */
+ .globl __GI_memcpy; __GI_memcpy = __memcpy_generic
+ .globl __GI_memmove; __GI_memmove = __memmove_generic
+
+#endif
+
+#include "../memcpy.S"
diff --git a/sysdeps/aarch64/multiarch/memcpy_thunderx.S b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
index e69de29..df5e959 100644
--- a/sysdeps/aarch64/multiarch/memcpy_thunderx.S
+++ b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
@@ -0,0 +1,33 @@
+/* A Thunderx Optimized memcpy implementation for AARCH64.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* The actual thunderx optimized code is in ../memcpy.S under the USE_THUNDERX
+ ifdef. If we are not building a shared libc with IFUNC then we do not
+ build anything when compiling this file and __memcpy is defined by
+ memcpy_generic.S. */
+
+#include <sysdep.h>
+
+#if defined SHARED && IS_IN (libc)
+
+#define MEMCPY __memcpy_thunderx
+#define MEMMOVE __memmove_thunderx
+#define USE_THUNDERX
+#include "../memcpy.S"
+
+#endif
diff --git a/sysdeps/aarch64/multiarch/memmove.c b/sysdeps/aarch64/multiarch/memmove.c
index e69de29..c08c763 100644
--- a/sysdeps/aarch64/multiarch/memmove.c
+++ b/sysdeps/aarch64/multiarch/memmove.c
@@ -0,0 +1,40 @@
+/* Multiple versions of memmove. AARCH64 version.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* Define multiple versions only for the definition in lib and for
+ DSO. In static binaries we need memmove before the initialization
+ happened. */
+#if defined SHARED && IS_IN (libc)
+/* Redefine memmove so that the compiler won't complain about the type
+ mismatch with the IFUNC selector in strong_alias, below. */
+# undef memmove
+# define memmove __redirect_memmove
+# include <string.h>
+# include <init-arch.h>
+
+extern __typeof (__redirect_memmove) __libc_memmove;
+
+extern __typeof (__redirect_memmove) __memmove_generic attribute_hidden;
+extern __typeof (__redirect_memmove) __memmove_thunderx attribute_hidden;
+
+libc_ifunc (__libc_memmove,
+ __is_thunderx ? __memmove_thunderx : __memmove_generic);
+
+#undef memmove
+strong_alias (__libc_memmove, memmove);
+#endif
diff --git a/sysdeps/unix/sysv/linux/aarch64/configure.ac b/sysdeps/unix/sysv/linux/aarch64/configure.ac
index 211fa9c..684cb46 100644
--- a/sysdeps/unix/sysv/linux/aarch64/configure.ac
+++ b/sysdeps/unix/sysv/linux/aarch64/configure.ac
@@ -1,6 +1,11 @@
GLIBC_PROVIDES dnl See aclocal.m4 in the top level source directory.
# Local configure fragment for sysdeps/unix/sysv/linux/aarch64.
-arch_minimum_kernel=3.7.0
+# For multi-arch support we need a kernel that emulates the mrs instruction.
+if test x$multi_arch = xyes; then
+ arch_minimum_kernel=4.11.0
+else
+ arch_minimum_kernel=3.7.0
+fi
LIBC_SLIBDIR_RTLDDIR([lib64], [lib])
[-- Attachment #3: bench-memcpy.out --]
[-- Type: text/plain, Size: 26125 bytes --]
builtin_memcpy simple_memcpy __memcpy_thunderx __memcpy_generic
Length 1, alignment 0/ 0: 40.4688 21.25 21.0938 21.5625
Length 1, alignment 0/ 0: 24.0625 21.5625 21.4062 21.0938
Length 1, alignment 0/ 0: 23.9062 15.7812 20.7812 20.9375
Length 1, alignment 0/ 0: 24.0625 15.7812 20.7812 20.9375
Length 2, alignment 0/ 0: 24.0625 24.375 20.9375 20.9375
Length 2, alignment 1/ 0: 24.0625 23.75 20.7812 20.9375
Length 2, alignment 0/ 1: 24.0625 22.9688 20.9375 20.9375
Length 2, alignment 1/ 1: 24.0625 22.9688 20.7812 20.7812
Length 4, alignment 0/ 0: 22.9688 24.6875 19.0625 19.5312
Length 4, alignment 2/ 0: 22.0312 23.9062 18.9062 19.2188
Length 4, alignment 0/ 2: 22.0312 23.4375 18.9062 18.9062
Length 4, alignment 2/ 2: 22.0312 23.4375 18.9062 18.9062
Length 8, alignment 0/ 0: 21.875 35.3125 17.9688 18.125
Length 8, alignment 3/ 0: 30.3125 34.375 26.0938 26.25
Length 8, alignment 0/ 3: 31.875 33.5938 27.8125 27.6562
Length 8, alignment 3/ 3: 39.375 33.5938 35.1562 35.3125
Length 16, alignment 0/ 0: 20.7812 67.9688 17.6562 17.6562
Length 16, alignment 4/ 0: 30.3125 67.5 26.25 26.25
Length 16, alignment 0/ 4: 31.875 67.3438 27.6562 27.6562
Length 16, alignment 4/ 4: 39.375 67.5 35.1562 35.3125
Length 32, alignment 0/ 0: 22.3438 99.375 17.6562 17.6562
Length 32, alignment 5/ 0: 31.5625 99.2188 27.3438 27.3438
Length 32, alignment 0/ 5: 31.5625 99.2188 27.3438 27.3438
Length 32, alignment 5/ 5: 40.4688 99.2188 36.4062 36.4062
Length 64, alignment 0/ 0: 23.9062 179.219 19.6875 19.375
Length 64, alignment 6/ 0: 40.3125 179.219 36.0938 36.0938
Length 64, alignment 0/ 6: 41.7188 179.375 37.6562 37.5
Length 64, alignment 6/ 6: 57.5 179.219 53.4375 53.4375
Length 128, alignment 0/ 0: 30.4688 339.219 27.0312 25.3125
Length 128, alignment 7/ 0: 67.8125 339.219 64.375 63.125
Length 128, alignment 0/ 7: 69.8438 339.375 66.4062 64.375
Length 128, alignment 7/ 7: 72.3438 339.375 68.9062 67.3438
Length 256, alignment 0/ 0: 40.1562 659.219 35.9375 37.3438
Length 256, alignment 8/ 0: 108.594 659.219 105.156 106.094
Length 256, alignment 0/ 8: 110.469 659.375 107.188 107.5
Length 256, alignment 8/ 8: 81.4062 659.375 78.125 80
Length 512, alignment 0/ 0: 57.9688 1299.38 53.9062 52.3438
Length 512, alignment 9/ 0: 194.844 1299.53 191.719 190.156
Length 512, alignment 0/ 9: 196.875 1299.53 193.75 191.719
Length 512, alignment 9/ 9: 99.375 1299.53 96.0938 94.5312
Length 1024, alignment 0/ 0: 97.0312 2579.53 93.2812 91.4062
Length 1024, alignment 10/ 0: 372.812 2579.53 369.688 368.125
Length 1024, alignment 0/10: 375.156 2579.38 371.562 369.531
Length 1024, alignment 10/10: 139.062 2579.38 135.469 134.219
Length 2048, alignment 0/ 0: 194.375 5393.91 193.281 188.125
Length 2048, alignment 11/ 0: 713.906 5139.84 710.781 709.375
Length 2048, alignment 0/11: 715.938 5139.84 712.812 710.938
Length 2048, alignment 11/11: 235.781 5139.69 233.281 230.156
Length 4096, alignment 0/ 0: 369.062 10260.9 366.406 360.781
Length 4096, alignment 12/ 0: 1403.44 10260.9 1400.16 1398.59
Length 4096, alignment 0/12: 1405.16 10435.2 1402.66 1399.84
Length 4096, alignment 12/12: 410.781 10260.3 408.125 402.344
Length 8192, alignment 0/ 0: 717.344 20503.6 714.844 703.906
Length 8192, alignment 13/ 0: 2781.41 20740.5 2778.12 2776.72
Length 8192, alignment 0/13: 2783.28 20503.1 2779.38 2776.56
Length 8192, alignment 13/13: 759.375 20654.2 757.031 746.406
Length 16384, alignment 0/ 0: 1430.16 41006.7 1429.84 1408.59
Length 16384, alignment 14/ 0: 5563.91 41004.4 5550.16 5734.38
Length 16384, alignment 0/14: 5549.06 41002.2 5547.03 5693.75
Length 16384, alignment 14/14: 1461.88 40995.2 1461.41 1440.47
Length 32768, alignment 0/ 0: 3426.72 82858.4 3555.16 4450.62
Length 32768, alignment 15/ 0: 12141.2 83634.5 12128.8 12733.9
Length 32768, alignment 0/15: 12190.8 83705 12164.4 12682.2
Length 32768, alignment 15/15: 3548.28 83734.7 3546.41 3970
Length 65536, alignment 0/ 0: 7917.97 174364 7863.91 15286.1
Length 65536, alignment 16/ 0: 8100.16 174329 7898.44 15577.5
Length 65536, alignment 0/16: 7956.56 174425 7906.09 15322
Length 65536, alignment 16/16: 7952.34 174475 7907.34 15613.8
Length 0, alignment 0/ 0: 25.3125 17.0312 21.7188 21.7188
Length 0, alignment 0/ 0: 24.375 16.5625 21.4062 21.5625
Length 0, alignment 0/ 0: 24.375 16.4062 21.4062 21.25
Length 0, alignment 0/ 0: 24.5312 16.4062 21.25 21.4062
Length 1, alignment 0/ 0: 24.375 16.4062 21.25 20.9375
Length 1, alignment 1/ 0: 24.0625 15.9375 20.9375 21.0938
Length 1, alignment 0/ 1: 24.0625 15.9375 20.9375 21.0938
Length 1, alignment 1/ 1: 24.2188 15.9375 20.9375 20.9375
Length 2, alignment 0/ 0: 24.0625 23.4375 20.9375 21.0938
Length 2, alignment 2/ 0: 24.0625 23.4375 20.9375 21.0938
Length 2, alignment 0/ 2: 24.0625 23.125 20.9375 21.0938
Length 2, alignment 2/ 2: 24.0625 23.2812 20.9375 21.0938
Length 3, alignment 0/ 0: 24.0625 22.5 20.9375 20.9375
Length 3, alignment 3/ 0: 24.0625 21.875 20.9375 20.9375
Length 3, alignment 0/ 3: 24.0625 20.7812 20.9375 20.9375
Length 3, alignment 3/ 3: 24.2188 20.9375 20.9375 20.9375
Length 4, alignment 0/ 0: 22.3438 23.75 19.0625 18.9062
Length 4, alignment 4/ 0: 22.1875 23.5938 18.9062 19.0625
Length 4, alignment 0/ 4: 22.0312 23.4375 19.0625 18.9062
Length 4, alignment 4/ 4: 22.0312 23.4375 18.9062 19.0625
Length 5, alignment 0/ 0: 22.0312 27.1875 18.9062 18.9062
Length 5, alignment 5/ 0: 30.625 26.4062 27.5 27.3438
Length 5, alignment 0/ 5: 32.1875 25.9375 28.9062 29.0625
Length 5, alignment 5/ 5: 39.6875 25.9375 36.25 36.4062
Length 6, alignment 0/ 0: 22.0312 29.375 18.9062 18.9062
Length 6, alignment 6/ 0: 26.0938 28.75 22.9688 22.8125
Length 6, alignment 0/ 6: 27.5 28.5938 24.375 24.5312
Length 6, alignment 6/ 6: 31.0938 28.4375 27.9688 27.9688
Length 7, alignment 0/ 0: 21.875 32.5 18.9062 18.9062
Length 7, alignment 7/ 0: 25.9375 31.5625 22.9688 22.9688
Length 7, alignment 0/ 7: 27.5 30.9375 24.375 24.375
Length 7, alignment 7/ 7: 31.0938 31.0938 27.9688 27.9688
Length 8, alignment 0/ 0: 21.4062 34.0625 17.9688 17.8125
Length 8, alignment 8/ 0: 20.9375 33.9062 17.6562 17.8125
Length 8, alignment 0/ 8: 20.9375 33.75 17.8125 17.8125
Length 8, alignment 8/ 8: 20.9375 33.75 17.6562 17.8125
Length 9, alignment 0/ 0: 31.4062 37.0312 27.1875 27.3438
Length 9, alignment 9/ 0: 35.4688 36.5625 31.0938 31.25
Length 9, alignment 0/ 9: 35.4688 36.25 31.0938 31.4062
Length 9, alignment 9/ 9: 39.375 36.4062 35.3125 35.3125
Length 10, alignment 0/ 0: 31.4062 39.2188 27.3438 27.1875
Length 10, alignment 10/ 0: 35.3125 38.9062 31.25 31.25
Length 10, alignment 0/10: 35.4688 38.75 31.25 31.25
Length 10, alignment 10/10: 39.375 173.438 35.625 35.3125
Length 11, alignment 0/ 0: 31.4062 41.4062 27.3438 27.3438
Length 11, alignment 11/ 0: 35.3125 41.25 31.25 31.25
Length 11, alignment 0/11: 35.4688 41.0938 31.25 31.25
Length 11, alignment 11/11: 39.5312 41.0938 35.3125 35.3125
Length 12, alignment 0/ 0: 31.4062 44.0625 27.1875 27.1875
Length 12, alignment 12/ 0: 31.875 43.75 27.8125 27.8125
Length 12, alignment 0/12: 30.9375 43.5938 26.875 26.7188
Length 12, alignment 12/12: 30.9375 43.5938 26.7188 26.7188
Length 13, alignment 0/ 0: 31.4062 46.25 27.3438 27.1875
Length 13, alignment 13/ 0: 35.4688 46.25 31.25 31.25
Length 13, alignment 0/13: 35.3125 46.0938 31.25 31.25
Length 13, alignment 13/13: 39.375 46.25 35.3125 35.3125
Length 14, alignment 0/ 0: 31.4062 52.6562 27.3438 27.1875
Length 14, alignment 14/ 0: 35.4688 52.5 31.25 31.4062
Length 14, alignment 0/14: 35.3125 52.5 31.25 31.25
Length 14, alignment 14/14: 39.375 52.5 35.1562 35.3125
Length 15, alignment 0/ 0: 31.4062 65 27.3438 27.3438
Length 15, alignment 15/ 0: 35.3125 64.8438 31.25 31.25
Length 15, alignment 0/15: 35.4688 64.8438 31.0938 31.25
Length 15, alignment 15/15: 39.5312 64.8438 35.3125 35.3125
Length 16, alignment 0/ 0: 20.9375 67.5 17.6562 17.6562
Length 16, alignment 16/ 0: 23.2812 67.5 17.8125 17.6562
Length 16, alignment 0/16: 20.9375 67.3438 17.6562 17.6562
Length 16, alignment 16/16: 20.9375 67.3438 17.8125 17.6562
Length 17, alignment 0/ 0: 32.9688 62.0312 28.5938 28.4375
Length 17, alignment 17/ 0: 36.5625 61.875 32.5 32.5
Length 17, alignment 0/17: 36.5625 61.875 32.5 32.3438
Length 17, alignment 17/17: 40.625 61.875 36.4062 36.4062
Length 18, alignment 0/ 0: 32.5 64.375 28.4375 28.2812
Length 18, alignment 18/ 0: 36.5625 64.375 32.3438 32.3438
Length 18, alignment 0/18: 36.5625 64.375 32.5 32.3438
Length 18, alignment 18/18: 40.4688 64.375 36.4062 36.4062
Length 19, alignment 0/ 0: 32.5 66.875 28.4375 28.2812
Length 19, alignment 19/ 0: 36.7188 66.875 32.3438 32.5
Length 19, alignment 0/19: 36.5625 66.875 32.3438 32.3438
Length 19, alignment 19/19: 40.625 66.875 36.5625 36.4062
Length 20, alignment 0/ 0: 32.5 69.375 28.4375 28.2812
Length 20, alignment 20/ 0: 36.5625 69.375 32.5 32.3438
Length 20, alignment 0/20: 36.5625 69.375 32.5 32.3438
Length 20, alignment 20/20: 40.625 69.375 36.5625 36.25
Length 21, alignment 0/ 0: 32.5 71.875 28.4375 28.2812
Length 21, alignment 21/ 0: 36.5625 71.875 32.5 32.3438
Length 21, alignment 0/21: 36.5625 71.875 32.3438 32.3438
Length 21, alignment 21/21: 40.625 71.875 36.4062 36.4062
Length 22, alignment 0/ 0: 32.6562 74.375 28.4375 28.4375
Length 22, alignment 22/ 0: 36.5625 74.375 32.3438 32.3438
Length 22, alignment 0/22: 36.7188 74.375 32.3438 32.3438
Length 22, alignment 22/22: 40.4688 74.375 36.4062 36.4062
Length 23, alignment 0/ 0: 32.5 76.875 28.4375 28.2812
Length 23, alignment 23/ 0: 36.5625 76.875 32.3438 32.3438
Length 23, alignment 0/23: 36.5625 76.875 32.3438 32.3438
Length 23, alignment 23/23: 40.4688 76.875 36.4062 36.4062
Length 24, alignment 0/ 0: 32.6562 79.375 28.4375 28.4375
Length 24, alignment 24/ 0: 32.0312 79.375 27.9688 27.9688
Length 24, alignment 0/24: 32.0312 79.375 27.9688 27.9688
Length 24, alignment 24/24: 31.5625 79.375 27.5 27.3438
Length 25, alignment 0/ 0: 32.5 81.875 28.4375 28.4375
Length 25, alignment 25/ 0: 36.5625 81.875 32.5 32.3438
Length 25, alignment 0/25: 36.5625 81.875 32.5 32.3438
Length 25, alignment 25/25: 40.625 81.875 36.4062 36.4062
Length 26, alignment 0/ 0: 32.6562 84.375 28.4375 28.4375
Length 26, alignment 26/ 0: 36.5625 84.375 32.3438 32.3438
Length 26, alignment 0/26: 36.5625 84.2188 32.5 32.3438
Length 26, alignment 26/26: 40.625 84.375 36.4062 36.25
Length 27, alignment 0/ 0: 32.6562 86.7188 28.4375 28.2812
Length 27, alignment 27/ 0: 36.5625 86.875 32.3438 32.3438
Length 27, alignment 0/27: 36.7188 86.875 32.3438 32.3438
Length 27, alignment 27/27: 40.4688 86.875 36.5625 36.4062
Length 28, alignment 0/ 0: 32.6562 89.375 28.4375 28.4375
Length 28, alignment 28/ 0: 36.5625 89.375 32.5 32.3438
Length 28, alignment 0/28: 36.5625 89.2188 32.5 32.3438
Length 28, alignment 28/28: 40.625 89.375 36.4062 36.4062
Length 29, alignment 0/ 0: 32.6562 91.7188 28.4375 28.4375
Length 29, alignment 29/ 0: 36.5625 91.875 32.3438 32.3438
Length 29, alignment 0/29: 36.5625 91.875 32.3438 32.3438
Length 29, alignment 29/29: 40.4688 91.875 36.5625 36.4062
Length 30, alignment 0/ 0: 32.5 94.375 28.4375 28.4375
Length 30, alignment 30/ 0: 36.5625 94.375 32.3438 32.5
Length 30, alignment 0/30: 36.5625 94.375 32.3438 32.5
Length 30, alignment 30/30: 40.625 95.625 36.4062 36.4062
Length 31, alignment 0/ 0: 32.5 96.875 28.4375 28.2812
Length 31, alignment 31/ 0: 36.5625 96.7188 32.5 32.3438
Length 31, alignment 0/31: 36.5625 96.875 32.3438 32.3438
Length 31, alignment 31/31: 40.4688 96.875 36.4062 36.4062
Length 48, alignment 0/ 0: 22.9688 139.219 19.2188 19.2188
Length 48, alignment 3/ 0: 40.1562 139.219 36.25 36.0938
Length 48, alignment 0/ 3: 41.7188 139.375 37.6562 37.5
Length 48, alignment 3/ 3: 57.5 139.219 53.4375 53.4375
Length 80, alignment 0/ 0: 25.1562 219.219 20.7812 20.4688
Length 80, alignment 5/ 0: 51.5625 219.375 47.8125 48.2812
Length 80, alignment 0/ 5: 51.0938 219.219 46.875 46.875
Length 80, alignment 5/ 5: 78.5938 219.219 74.375 74.375
Length 96, alignment 0/ 0: 24.0625 259.375 20.4688 20.4688
Length 96, alignment 6/ 0: 51.5625 259.219 47.9688 47.9688
Length 96, alignment 0/ 6: 51.0938 259.219 46.875 46.875
Length 96, alignment 6/ 6: 78.4375 259.219 74.375 74.2188
Length 112, alignment 0/ 0: 30 299.375 26.7188 24.5312
Length 112, alignment 7/ 0: 67.8125 299.219 64.375 62.9688
Length 112, alignment 0/ 7: 69.8438 299.375 66.4062 64.375
Length 112, alignment 7/ 7: 72.3438 299.219 68.9062 67.1875
Length 144, alignment 0/ 0: 29.8438 379.375 26.4062 24.5312
Length 144, alignment 9/ 0: 67.8125 379.219 64.5312 62.9688
Length 144, alignment 0/ 9: 90.4688 379.375 86.7188 85.1562
Length 144, alignment 9/ 9: 76.5625 379.375 73.4375 72.1875
Length 160, alignment 0/ 0: 34.2188 419.375 30.9375 28.9062
Length 160, alignment 10/ 0: 87.0312 419.219 83.9062 82.3438
Length 160, alignment 0/10: 89.0625 419.375 86.0938 83.75
Length 160, alignment 10/10: 76.7188 419.375 73.4375 71.7188
Length 176, alignment 0/ 0: 34.2188 459.219 31.0938 28.75
Length 176, alignment 11/ 0: 87.1875 459.375 83.9062 82.1875
Length 176, alignment 0/11: 89.0625 459.375 85.9375 83.9062
Length 176, alignment 11/11: 76.7188 459.375 73.4375 71.7188
Length 192, alignment 0/ 0: 34.0625 499.375 31.0938 28.9062
Length 192, alignment 12/ 0: 87.1875 499.219 84.0625 82.1875
Length 192, alignment 0/12: 89.0625 499.219 85.9375 83.75
Length 192, alignment 12/12: 76.5625 499.375 73.5938 71.875
Length 208, alignment 0/ 0: 34.2188 539.375 30.9375 28.9062
Length 208, alignment 13/ 0: 87.0312 539.219 83.9062 82.3438
Length 208, alignment 0/13: 111.25 539.219 107.812 108.75
Length 208, alignment 13/13: 81.4062 539.219 78.125 80.1562
Length 224, alignment 0/ 0: 38.9062 579.375 35.7812 37.0312
Length 224, alignment 14/ 0: 108.906 579.375 105.625 106.875
Length 224, alignment 0/14: 110.938 579.375 107.812 108.438
Length 224, alignment 14/14: 81.4062 579.219 78.2812 80
Length 240, alignment 0/ 0: 38.9062 619.375 35.7812 37.0312
Length 240, alignment 15/ 0: 108.906 619.219 105.625 107.031
Length 240, alignment 0/15: 110.781 619.375 107.656 108.438
Length 240, alignment 15/15: 81.5625 619.375 78.125 80
Length 272, alignment 0/ 0: 38.9062 699.375 35.625 37.0312
Length 272, alignment 17/ 0: 108.906 699.375 105.781 106.875
Length 272, alignment 0/17: 133.125 864.688 129.531 127.812
Length 272, alignment 17/17: 85.7812 699.375 82.5 81.5625
Length 288, alignment 0/ 0: 43.4375 739.375 40.1562 38.125
Length 288, alignment 18/ 0: 130.312 739.375 127.031 125.469
Length 288, alignment 0/18: 132.344 739.375 129.062 127.031
Length 288, alignment 18/18: 85.9375 739.375 82.5 81.0938
Length 304, alignment 0/ 0: 43.2812 779.531 40 38.125
Length 304, alignment 19/ 0: 130.312 779.375 127.031 125.625
Length 304, alignment 0/19: 132.344 779.531 129.062 127.031
Length 304, alignment 19/19: 85.7812 779.375 82.5 81.0938
Length 320, alignment 0/ 0: 43.4375 819.375 40.1562 37.9688
Length 320, alignment 20/ 0: 130.312 819.375 127.031 125.469
Length 320, alignment 0/20: 132.344 819.531 128.906 127.031
Length 320, alignment 20/20: 85.9375 819.375 82.5 81.0938
Length 336, alignment 0/ 0: 43.4375 859.375 40.1562 38.125
Length 336, alignment 21/ 0: 130.312 859.375 127.031 125.625
Length 336, alignment 0/21: 155 859.375 150.781 149.375
Length 336, alignment 21/21: 90.1562 859.375 86.875 85.7812
Length 352, alignment 0/ 0: 47.6562 899.531 44.5312 42.5
Length 352, alignment 22/ 0: 151.562 899.375 148.438 146.875
Length 352, alignment 0/22: 153.594 899.375 150.469 148.281
Length 352, alignment 22/22: 90.1562 899.375 87.0312 85.4688
Length 368, alignment 0/ 0: 47.8125 939.375 44.5312 42.5
Length 368, alignment 23/ 0: 151.562 939.375 148.438 146.875
Length 368, alignment 0/23: 153.75 939.375 150.625 148.438
Length 368, alignment 23/23: 90.1562 939.375 87.0312 85.4688
Length 384, alignment 0/ 0: 47.8125 979.375 44.5312 42.5
Length 384, alignment 24/ 0: 151.562 979.375 147.969 146.406
Length 384, alignment 0/24: 153.125 979.531 149.844 147.969
Length 384, alignment 24/24: 90.1562 979.531 87.0312 85.4688
Length 400, alignment 0/ 0: 47.6562 1019.53 44.5312 42.5
Length 400, alignment 25/ 0: 151.719 1019.53 148.438 147.031
Length 400, alignment 0/25: 176.25 1019.53 172.188 170.781
Length 400, alignment 25/25: 94.8438 1187.5 91.4062 90
Length 416, alignment 0/ 0: 52.1875 1059.53 49.0625 46.875
Length 416, alignment 26/ 0: 173.125 1059.53 170 168.438
Length 416, alignment 0/26: 175.156 1059.53 172.031 169.844
Length 416, alignment 26/26: 94.6875 1059.53 91.5625 90
Length 432, alignment 0/ 0: 52.1875 1099.53 49.0625 47.0312
Length 432, alignment 27/ 0: 173.125 1099.69 170 168.594
Length 432, alignment 0/27: 175.156 1099.53 171.875 169.844
Length 432, alignment 27/27: 94.6875 1099.53 91.4062 90
Length 448, alignment 0/ 0: 52.1875 1139.53 49.0625 47.0312
Length 448, alignment 28/ 0: 173.125 1139.38 170 168.438
Length 448, alignment 0/28: 175.156 1139.53 171.875 169.844
Length 448, alignment 28/28: 94.6875 1139.53 91.4062 90
Length 464, alignment 0/ 0: 52.1875 1179.53 49.0625 47.0312
Length 464, alignment 29/ 0: 173.281 1179.53 170 168.438
Length 464, alignment 0/29: 197.344 1179.53 193.75 191.875
Length 464, alignment 29/29: 99.5312 1179.53 96.0938 94.6875
Length 480, alignment 0/ 0: 56.7188 1219.53 53.5938 51.5625
Length 480, alignment 30/ 0: 194.688 1219.53 191.562 190.156
Length 480, alignment 0/30: 196.875 1219.53 193.594 191.562
Length 480, alignment 30/30: 99.5312 1219.38 96.0938 94.5312
Length 496, alignment 0/ 0: 56.7188 1259.53 53.75 51.5625
Length 496, alignment 31/ 0: 194.688 1259.38 191.562 190
Length 496, alignment 0/31: 197.031 1259.53 193.594 191.719
Length 496, alignment 31/31: 99.375 1259.53 96.0938 94.5312
Length 1024, alignment 0/ 0: 96.4062 2579.38 93.125 91.25
Length 1024, alignment 32/ 0: 96.0938 2579.53 94.2188 91.0938
Length 1024, alignment 0/32: 96.25 2579.53 93.2812 90.9375
Length 1024, alignment 32/32: 96.4062 2579.53 92.9688 91.0938
Length 1056, alignment 0/ 0: 104.688 2659.53 101.094 99.2188
Length 1056, alignment 33/ 0: 391.719 2659.38 388.125 386.719
Length 1056, alignment 0/33: 393.594 2659.53 390.156 388.125
Length 1056, alignment 33/33: 143.75 2659.53 140.156 145.781
Length 1088, alignment 0/ 0: 101.094 2739.53 97.5 102.812
Length 1088, alignment 34/ 0: 391.719 2739.53 388.125 386.562
Length 1088, alignment 0/34: 393.594 2739.38 390 388.125
Length 1088, alignment 34/34: 143.594 2739.53 140 145.625
Length 1120, alignment 0/ 0: 105.312 2819.53 102.031 107.031
Length 1120, alignment 35/ 0: 413.125 2819.53 409.688 408.125
Length 1120, alignment 0/35: 415.156 2819.38 411.719 409.688
Length 1120, alignment 35/35: 148.125 2819.53 144.531 150.156
Length 1152, alignment 0/ 0: 105.781 3025.62 102.5 107.188
Length 1152, alignment 36/ 0: 413.281 2899.38 409.844 408.125
Length 1152, alignment 0/36: 415.156 2899.53 411.719 409.844
Length 1152, alignment 36/36: 148.125 2899.53 144.531 150.156
Length 1184, alignment 0/ 0: 110.156 2979.53 106.562 111.719
Length 1184, alignment 37/ 0: 434.688 2979.53 431.25 429.688
Length 1184, alignment 0/37: 436.719 2979.53 433.281 431.094
Length 1184, alignment 37/37: 152.656 2979.53 149.219 154.688
Length 1216, alignment 0/ 0: 110 3059.53 106.719 111.719
Length 1216, alignment 38/ 0: 434.688 3059.53 431.25 429.844
Length 1216, alignment 0/38: 436.719 3059.38 433.125 431.25
Length 1216, alignment 38/38: 152.812 3059.53 149.219 154.688
Length 1248, alignment 0/ 0: 114.531 3139.53 110.938 109.219
Length 1248, alignment 39/ 0: 456.25 3264.84 453.125 451.25
Length 1248, alignment 0/39: 458.281 3139.38 454.688 452.812
Length 1248, alignment 39/39: 157.188 3139.53 153.594 152.188
Length 1280, alignment 0/ 0: 114.531 3219.38 111.094 109.219
Length 1280, alignment 40/ 0: 456.094 3219.38 452.5 451.094
Length 1280, alignment 0/40: 458.125 3219.53 454.688 452.656
Length 1280, alignment 40/40: 157.188 3219.53 153.594 152.188
Length 1312, alignment 0/ 0: 121.562 3299.84 115.625 113.594
Length 1312, alignment 41/ 0: 477.812 3299.53 474.375 472.812
Length 1312, alignment 0/41: 479.844 3299.53 476.25 474.375
Length 1312, alignment 41/41: 161.562 3299.53 158.125 156.719
Length 1344, alignment 0/ 0: 119.219 3379.53 115.781 113.75
Length 1344, alignment 42/ 0: 477.812 3379.53 636.406 473.281
Length 1344, alignment 0/42: 479.844 3379.53 476.094 474.375
Length 1344, alignment 42/42: 161.562 3379.53 158.125 156.719
Length 1376, alignment 0/ 0: 123.594 3459.53 120.156 118.281
Length 1376, alignment 43/ 0: 499.219 3459.53 495.938 494.375
Length 1376, alignment 0/43: 501.406 3459.53 497.812 495.625
Length 1376, alignment 43/43: 166.25 3459.38 162.656 161.25
Length 1408, alignment 0/ 0: 123.75 3539.53 120.156 118.281
Length 1408, alignment 44/ 0: 499.219 3539.53 495.781 494.375
Length 1408, alignment 0/44: 501.406 3539.53 497.969 496.094
Length 1408, alignment 44/44: 166.25 3539.38 162.656 161.094
Length 1440, alignment 0/ 0: 128.281 3619.53 124.531 122.812
Length 1440, alignment 45/ 0: 520.938 3750.94 517.812 515.938
Length 1440, alignment 0/45: 522.969 3619.53 519.531 517.5
Length 1440, alignment 45/45: 170.625 3619.38 167.188 165.625
Length 1472, alignment 0/ 0: 128.281 3699.53 124.688 122.812
Length 1472, alignment 46/ 0: 521.094 3699.53 517.5 516.094
Length 1472, alignment 0/46: 522.812 3699.53 519.375 517.5
Length 1472, alignment 46/46: 170.781 3699.53 167.188 165.625
Length 1504, alignment 0/ 0: 132.812 3779.38 129.219 127.188
Length 1504, alignment 47/ 0: 542.344 3779.53 538.906 537.188
Length 1504, alignment 0/47: 545.156 3779.53 540.469 538.594
Length 1504, alignment 47/47: 175.312 3779.38 171.875 170.469
Length 1536, alignment 0/ 0: 132.812 3982.5 129.531 127.344
Length 1536, alignment 48/ 0: 132.812 3859.53 129.375 127.344
Length 1536, alignment 0/48: 133.281 3859.53 129.844 127.812
Length 1536, alignment 48/48: 133.281 3859.53 129.844 127.812
Length 1568, alignment 0/ 0: 138.438 3939.38 134.219 132.031
Length 1568, alignment 49/ 0: 563.594 3939.53 560 558.594
Length 1568, alignment 0/49: 565.469 3939.38 562.031 560
Length 1568, alignment 49/49: 180.625 3939.53 177.031 174.844
Length 1600, alignment 0/ 0: 137.5 4019.53 133.906 132.031
Length 1600, alignment 50/ 0: 563.438 4019.53 560 558.594
Length 1600, alignment 0/50: 565.469 4019.53 562.031 683.125
Length 1600, alignment 50/50: 180.625 4019.53 177.188 175.469
Length 1632, alignment 0/ 0: 142.5 4099.53 138.75 137.031
Length 1632, alignment 51/ 0: 585 4099.53 581.406 580
Length 1632, alignment 0/51: 587.031 4099.53 583.594 581.562
Length 1632, alignment 51/51: 188.75 4099.53 185.469 182.5
Length 1664, alignment 0/ 0: 142.5 4179.53 138.906 136.875
Length 1664, alignment 52/ 0: 585 4179.53 581.406 580
Length 1664, alignment 0/52: 587.031 4179.53 583.438 581.562
Length 1664, alignment 52/52: 188.281 4179.53 181.406 180.625
Length 1696, alignment 0/ 0: 154.688 4259.38 146.406 143.438
Length 1696, alignment 53/ 0: 732.5 4259.84 603.125 601.562
Length 1696, alignment 0/53: 608.594 4259.53 605 602.969
Length 1696, alignment 53/53: 199.375 4259.53 201.562 197.656
Length 1728, alignment 0/ 0: 148.906 4339.53 146.875 143.125
Length 1728, alignment 54/ 0: 606.406 4339.53 603.125 601.562
Length 1728, alignment 0/54: 608.438 4339.53 605 603.125
Length 1728, alignment 54/54: 197.5 4339.53 190.312 186.094
Length 1760, alignment 0/ 0: 153.906 4419.38 150.938 148.125
Length 1760, alignment 55/ 0: 627.969 4419.53 624.375 623.125
Length 1760, alignment 0/55: 630 4542.03 626.719 624.531
Length 1760, alignment 55/55: 213.906 4419.53 211.25 208.125
Length 1792, alignment 0/ 0: 152.969 4499.53 152.969 146.406
Length 1792, alignment 56/ 0: 628.125 4499.53 624.531 623.125
Length 1792, alignment 0/56: 630 4499.53 626.406 624.531
Length 1792, alignment 56/56: 212.031 4499.38 209.688 206.562
Length 1824, alignment 0/ 0: 164.844 4579.38 173.125 158.438
Length 1824, alignment 57/ 0: 649.531 4579.53 646.094 644.531
Length 1824, alignment 0/57: 651.562 4579.53 647.969 645.938
Length 1824, alignment 57/57: 218.281 4702.34 218.438 212.812
Length 1856, alignment 0/ 0: 168.281 4659.53 162.344 162.5
Length 1856, alignment 58/ 0: 649.531 4659.53 645.938 644.531
Length 1856, alignment 0/58: 651.406 4659.53 647.969 646.094
Length 1856, alignment 58/58: 218.281 4659.53 215.625 212.656
Length 1888, alignment 0/ 0: 168.594 4739.53 167.656 168.438
Length 1888, alignment 59/ 0: 671.094 4739.38 667.5 666.094
Length 1888, alignment 0/59: 672.969 4739.53 669.531 667.5
Length 1888, alignment 59/59: 224.844 4739.38 222.344 218.906
Length 1920, alignment 0/ 0: 173.594 4974.38 174.531 162.812
Length 1920, alignment 60/ 0: 670.938 4819.53 667.5 665.938
Length 1920, alignment 0/60: 672.969 4819.38 669.531 667.5
Length 1920, alignment 60/60: 224.688 4819.53 222.656 219.219
Length 1952, alignment 0/ 0: 190.156 4899.38 187.344 183.906
Length 1952, alignment 61/ 0: 692.5 4899.53 688.906 687.5
Length 1952, alignment 0/61: 694.531 4899.38 690.938 689.062
Length 1952, alignment 61/61: 229.062 4899.53 226.562 222.969
Length 1984, alignment 0/ 0: 190.156 4979.53 187.344 184.062
Length 1984, alignment 62/ 0: 692.5 4981.25 689.844 687.5
Length 1984, alignment 0/62: 694.531 4979.53 691.094 688.906
Length 1984, alignment 62/62: 229.375 4979.53 226.562 222.969
Length 2016, alignment 0/ 0: 194.375 5059.53 191.562 188.281
Length 2016, alignment 63/ 0: 713.906 5059.84 710.781 709.375
Length 2016, alignment 0/63: 715.938 5059.69 712.812 710.781
Length 2016, alignment 63/63: 235.625 5059.69 233.438 230
Length 4096, alignment 0/ 0: 369.219 10398 367.812 360.938
[-- Attachment #4: bench-memcpy-large.out --]
[-- Type: text/plain, Size: 2098 bytes --]
__memcpy_thunderx __memcpy_generic
Length 65543, alignment 0/ 0: 8083.75 15541.2
Length 65551, alignment 0/ 3: 24720.6 32405
Length 65567, alignment 3/ 0: 24486.2 33813.8
Length 65599, alignment 3/ 5: 24731.9 32400.6
Length 131079, alignment 0/ 0: 15959.4 31031.9
Length 131087, alignment 0/ 3: 49411.9 64778.1
Length 131103, alignment 3/ 0: 49505.6 66046.2
Length 131135, alignment 3/ 5: 49401.9 64780
Length 262151, alignment 0/ 0: 31648.1 62405
Length 262159, alignment 0/ 3: 98538.8 129344
Length 262175, alignment 3/ 0: 97577.5 132878
Length 262207, alignment 3/ 5: 99119.4 129346
Length 524295, alignment 0/ 0: 63994.4 124135
Length 524303, alignment 0/ 3: 199494 259969
Length 524319, alignment 3/ 0: 198194 264828
Length 524351, alignment 3/ 5: 199118 259842
Length 1048583, alignment 0/ 0: 152811 259784
Length 1048591, alignment 0/ 3: 422906 529983
Length 1048607, alignment 3/ 0: 424201 540640
Length 1048639, alignment 3/ 5: 422879 529976
Length 2097159, alignment 0/ 0: 276857 501925
Length 2097167, alignment 0/ 3: 812467 1.04305e+06
Length 2097183, alignment 3/ 0: 810185 1.06351e+06
Length 2097215, alignment 3/ 5: 812467 1.04307e+06
Length 4194311, alignment 0/ 0: 524355 986463
Length 4194319, alignment 0/ 3: 1.59268e+06 2.06977e+06
Length 4194335, alignment 3/ 0: 1.5818e+06 2.11026e+06
Length 4194367, alignment 3/ 5: 1.59222e+06 2.06932e+06
Length 8388615, alignment 0/ 0: 1.12852e+06 3.00444e+06
Length 8388623, alignment 0/ 3: 3.17872e+06 5.16414e+06
Length 8388639, alignment 3/ 0: 3.15213e+06 5.23659e+06
Length 8388671, alignment 3/ 5: 3.179e+06 5.1543e+06
Length 16777223, alignment 0/ 0: 3.54774e+06 1.30525e+07
Length 16777231, alignment 0/ 3: 6.8e+06 1.77641e+07
Length 16777247, alignment 3/ 0: 6.72802e+06 1.7955e+07
Length 16777279, alignment 3/ 5: 6.80436e+06 1.77679e+07
Length 33554439, alignment 0/ 0: 7.34141e+06 2.62947e+07
Length 33554447, alignment 0/ 3: 1.36974e+07 3.57826e+07
Length 33554463, alignment 3/ 0: 1.37467e+07 3.6138e+07
Length 33554495, alignment 3/ 5: 1.36981e+07 3.57831e+07
[-- Attachment #5: bench-memmove.out --]
[-- Type: text/plain, Size: 21879 bytes --]
simple_memmove __memmove_thunderx __memmove_generic
Length 1, alignment 0/32: 37.1875 26.875 22.6562
Length 1, alignment 32/ 0: 18.2812 22.0312 22.0312
Length 1, alignment 0/ 0: 17.3438 21.875 21.875
Length 1, alignment 0/ 0: 16.875 21.875 21.875
Length 2, alignment 0/32: 22.0312 21.875 21.875
Length 2, alignment 32/ 0: 20.4688 21.875 21.875
Length 2, alignment 0/ 1: 21.25 21.875 21.875
Length 2, alignment 1/ 0: 20 21.875 21.875
Length 4, alignment 0/32: 27.0312 20.7812 20.3125
Length 4, alignment 32/ 0: 25.4688 19.8438 19.8438
Length 4, alignment 0/ 2: 26.4062 19.8438 19.6875
Length 4, alignment 2/ 0: 24.6875 19.8438 19.6875
Length 8, alignment 0/32: 37.1875 19.0625 18.75
Length 8, alignment 32/ 0: 35.625 18.125 18.2812
Length 8, alignment 0/ 3: 35.9375 28.75 28.5938
Length 8, alignment 3/ 0: 34.375 37.1875 37.1875
Length 16, alignment 0/32: 66.875 18.2812 18.125
Length 16, alignment 32/ 0: 69.0625 18.4375 18.125
Length 16, alignment 0/ 4: 66.25 28.5938 28.5938
Length 16, alignment 4/ 0: 68.2812 37.1875 37.0312
Length 32, alignment 0/32: 102.188 19.5312 19.2188
Length 32, alignment 32/ 0: 100.312 18.9062 19.0625
Length 32, alignment 0/ 5: 102.344 28.5938 28.5938
Length 32, alignment 5/ 0: 100.156 33.5938 33.5938
Length 64, alignment 0/32: 182.344 21.25 21.0938
Length 64, alignment 32/ 0: 180.312 20.625 20.4688
Length 64, alignment 0/ 6: 182.188 38.9062 38.5938
Length 64, alignment 6/ 0: 180.312 47.3438 47.3438
Length 128, alignment 0/32: 342.344 26.25 25.625
Length 128, alignment 32/ 0: 340.156 28.125 26.7188
Length 128, alignment 0/ 7: 342.344 65.1562 65
Length 128, alignment 7/ 0: 340.156 70.9375 69.2188
Length 256, alignment 0/32: 662.344 35.4688 35.3125
Length 256, alignment 32/ 0: 660.312 37.1875 36.0938
Length 256, alignment 0/ 8: 662.344 106.406 106.719
Length 256, alignment 8/ 0: 660.312 111.094 109.688
Length 512, alignment 0/32: 1302.34 53.75 53.75
Length 512, alignment 32/ 0: 1300.47 55.3125 53.9062
Length 512, alignment 0/ 9: 1302.34 192.656 192.5
Length 512, alignment 9/ 0: 1300.31 197.656 195.938
Length 1024, alignment 0/32: 2582.34 93.125 93.125
Length 1024, alignment 32/ 0: 2580.31 94.375 93.2812
Length 1024, alignment 0/10: 2582.34 367.188 366.875
Length 1024, alignment 10/ 0: 2580.47 375.625 371.094
Length 2048, alignment 0/32: 5537.97 193.75 190.469
Length 2048, alignment 32/ 0: 5142.81 192.812 189.375
Length 2048, alignment 0/11: 5142.66 712.031 712.031
Length 2048, alignment 11/ 0: 5141.09 716.719 715
Length 4096, alignment 0/32: 10263.1 366.094 362.031
Length 4096, alignment 32/ 0: 10261.7 367.031 361.25
Length 4096, alignment 0/12: 10263.3 1400.94 1564.69
Length 4096, alignment 12/ 0: 10261.6 1405.31 1403.75
Length 0, alignment 0/32: 19.6875 23.2812 22.9688
Length 0, alignment 32/ 0: 17.8125 22.3438 22.3438
Length 0, alignment 0/ 0: 17.5 22.1875 22.1875
Length 0, alignment 0/ 0: 17.1875 22.1875 22.1875
Length 1, alignment 0/32: 19.2188 22.0312 22.0312
Length 1, alignment 32/ 0: 17.1875 22.0312 21.875
Length 1, alignment 0/ 1: 19.0625 21.7188 21.875
Length 1, alignment 1/ 0: 17.0312 21.875 21.7188
Length 2, alignment 0/32: 21.4062 21.875 21.875
Length 2, alignment 32/ 0: 19.5312 21.875 22.0312
Length 2, alignment 0/ 2: 21.0938 21.875 21.875
Length 2, alignment 2/ 0: 19.5312 21.875 21.875
Length 3, alignment 0/32: 23.9062 21.7188 21.875
Length 3, alignment 32/ 0: 22.9688 21.875 21.875
Length 3, alignment 0/ 3: 23.2812 21.875 21.875
Length 3, alignment 3/ 0: 22.1875 21.875 21.875
Length 4, alignment 0/32: 26.25 20 19.8438
Length 4, alignment 32/ 0: 24.375 19.8438 19.6875
Length 4, alignment 0/ 4: 26.0938 19.8438 19.6875
Length 4, alignment 4/ 0: 24.5312 19.8438 19.6875
Length 5, alignment 0/32: 29.5312 19.6875 19.6875
Length 5, alignment 32/ 0: 27.9688 19.8438 19.6875
Length 5, alignment 0/ 5: 28.75 29.6875 29.6875
Length 5, alignment 5/ 0: 27.1875 28.2812 28.2812
Length 6, alignment 0/32: 35.4688 19.6875 19.6875
Length 6, alignment 32/ 0: 30.3125 19.8438 19.6875
Length 6, alignment 0/ 6: 31.25 25.3125 25.1562
Length 6, alignment 6/ 0: 29.8438 23.75 23.75
Length 7, alignment 0/32: 34.375 19.8438 19.6875
Length 7, alignment 32/ 0: 32.5 19.8438 19.6875
Length 7, alignment 0/ 7: 33.4375 25.3125 25.1562
Length 7, alignment 7/ 0: 32.1875 23.75 23.75
Length 8, alignment 0/32: 36.4062 18.2812 18.4375
Length 8, alignment 32/ 0: 35 18.2812 18.2812
Length 8, alignment 0/ 8: 36.0938 18.125 18.125
Length 8, alignment 8/ 0: 34.5312 18.2812 18.125
Length 9, alignment 0/32: 38.9062 28.2812 28.125
Length 9, alignment 32/ 0: 37.5 28.125 28.125
Length 9, alignment 0/ 9: 37.9688 37.0312 37.1875
Length 9, alignment 9/ 0: 37.0312 32.1875 32.1875
Length 10, alignment 0/32: 41.4062 28.125 28.125
Length 10, alignment 32/ 0: 40 28.125 28.125
Length 10, alignment 0/10: 41.0938 37.1875 37.0312
Length 10, alignment 10/ 0: 39.6875 32.1875 32.0312
Length 11, alignment 0/32: 43.4375 28.125 28.125
Length 11, alignment 32/ 0: 42.3438 28.125 27.9688
Length 11, alignment 0/11: 42.9688 37.1875 37.0312
Length 11, alignment 11/ 0: 41.875 32.1875 32.0312
Length 12, alignment 0/32: 46.25 28.125 28.125
Length 12, alignment 32/ 0: 44.5312 28.125 28.125
Length 12, alignment 0/12: 45.9375 27.6562 27.5
Length 12, alignment 12/ 0: 44.375 28.5938 28.5938
Length 13, alignment 0/32: 48.5938 28.2812 28.125
Length 13, alignment 32/ 0: 46.875 28.125 28.125
Length 13, alignment 0/13: 48.125 32.1875 32.0312
Length 13, alignment 13/ 0: 46.875 32.1875 32.1875
Length 14, alignment 0/32: 50.7812 28.2812 28.125
Length 14, alignment 32/ 0: 53.9062 28.125 28.125
Length 14, alignment 0/14: 50.7812 32.1875 32.0312
Length 14, alignment 14/ 0: 53.75 32.0312 32.0312
Length 15, alignment 0/32: 62.3438 28.125 28.125
Length 15, alignment 32/ 0: 60.4688 28.2812 28.125
Length 15, alignment 0/15: 62.3438 32.1875 32.0312
Length 15, alignment 15/ 0: 60.3125 32.1875 32.0312
Length 16, alignment 0/32: 66.25 18.125 18.125
Length 16, alignment 32/ 0: 66.875 18.2812 18.125
Length 16, alignment 0/16: 66.25 18.2812 18.125
Length 16, alignment 16/ 0: 66.875 18.4375 18.125
Length 17, alignment 0/32: 63.5938 29.8438 29.8438
Length 17, alignment 32/ 0: 70.7812 29.8438 29.6875
Length 17, alignment 0/17: 63.4375 33.75 33.5938
Length 17, alignment 17/ 0: 70.7812 33.5938 33.75
Length 18, alignment 0/32: 67.3438 29.5312 29.5312
Length 18, alignment 32/ 0: 73.4375 29.6875 29.6875
Length 18, alignment 0/18: 67.1875 33.5938 33.5938
Length 18, alignment 18/ 0: 73.2812 33.5938 33.5938
Length 19, alignment 0/32: 68.5938 29.6875 29.5312
Length 19, alignment 32/ 0: 75.9375 29.5312 29.5312
Length 19, alignment 0/19: 68.2812 33.5938 33.5938
Length 19, alignment 19/ 0: 75.9375 35.3125 33.5938
Length 20, alignment 0/32: 72.3438 29.6875 29.5312
Length 20, alignment 32/ 0: 70.3125 29.6875 29.5312
Length 20, alignment 0/20: 72.3438 33.75 33.75
Length 20, alignment 20/ 0: 70.3125 33.75 33.75
Length 21, alignment 0/32: 73.5938 29.6875 29.5312
Length 21, alignment 32/ 0: 72.9688 29.6875 29.5312
Length 21, alignment 0/21: 73.2812 33.5938 33.5938
Length 21, alignment 21/ 0: 72.8125 33.5938 33.5938
Length 22, alignment 0/32: 77.3438 29.6875 29.6875
Length 22, alignment 32/ 0: 75.3125 29.6875 29.6875
Length 22, alignment 0/22: 77.3438 33.75 33.5938
Length 22, alignment 22/ 0: 75.3125 33.75 33.5938
Length 23, alignment 0/32: 78.5938 29.6875 29.6875
Length 23, alignment 32/ 0: 77.8125 29.6875 29.6875
Length 23, alignment 0/23: 78.4375 33.5938 33.5938
Length 23, alignment 23/ 0: 77.9688 33.5938 33.5938
Length 24, alignment 0/32: 82.1875 29.6875 29.6875
Length 24, alignment 32/ 0: 80.4688 29.6875 29.6875
Length 24, alignment 0/24: 82.3438 29.2188 29.0625
Length 24, alignment 24/ 0: 80.3125 29.2188 29.0625
Length 25, alignment 0/32: 83.5938 29.6875 29.5312
Length 25, alignment 32/ 0: 82.8125 30.1562 30
Length 25, alignment 0/25: 83.4375 33.5938 33.5938
Length 25, alignment 25/ 0: 82.8125 33.75 33.5938
Length 26, alignment 0/32: 87.3438 29.6875 29.6875
Length 26, alignment 32/ 0: 85.4688 29.6875 29.5312
Length 26, alignment 0/26: 87.3438 33.75 33.5938
Length 26, alignment 26/ 0: 85.3125 33.75 33.5938
Length 27, alignment 0/32: 88.4375 29.6875 29.6875
Length 27, alignment 32/ 0: 87.8125 29.6875 29.5312
Length 27, alignment 0/27: 88.2812 33.5938 33.5938
Length 27, alignment 27/ 0: 87.8125 33.75 33.5938
Length 28, alignment 0/32: 92.3438 29.6875 29.6875
Length 28, alignment 32/ 0: 90.3125 29.6875 29.5312
Length 28, alignment 0/28: 92.1875 33.75 33.5938
Length 28, alignment 28/ 0: 90.3125 33.75 33.5938
Length 29, alignment 0/32: 93.5938 29.6875 29.5312
Length 29, alignment 32/ 0: 92.8125 29.6875 29.6875
Length 29, alignment 0/29: 93.2812 33.5938 33.5938
Length 29, alignment 29/ 0: 92.9688 33.5938 33.5938
Length 30, alignment 0/32: 97.1875 29.6875 29.6875
Length 30, alignment 32/ 0: 95.3125 29.6875 29.5312
Length 30, alignment 0/30: 97.3438 33.5938 33.5938
Length 30, alignment 30/ 0: 95.3125 33.75 33.5938
Length 31, alignment 0/32: 98.5938 29.6875 29.5312
Length 31, alignment 32/ 0: 97.9688 29.6875 29.5312
Length 31, alignment 0/31: 98.2812 33.75 33.75
Length 31, alignment 31/ 0: 97.8125 33.5938 33.5938
Length 48, alignment 0/32: 142.188 20.4688 20.4688
Length 48, alignment 32/ 0: 140.469 20.4688 20.4688
Length 48, alignment 0/ 3: 142.344 38.9062 38.75
Length 48, alignment 3/ 0: 140.312 47.3438 47.3438
Length 80, alignment 0/32: 222.188 22.5 22.3438
Length 80, alignment 32/ 0: 220.312 21.875 21.7188
Length 80, alignment 0/ 5: 222.344 48.125 48.125
Length 80, alignment 5/ 0: 220.312 49.375 49.0625
Length 96, alignment 0/32: 262.188 21.5625 21.5625
Length 96, alignment 32/ 0: 260.312 21.5625 21.5625
Length 96, alignment 0/ 6: 262.188 48.125 48.125
Length 96, alignment 6/ 0: 260.312 49.375 49.0625
Length 112, alignment 0/32: 302.344 25.9375 25.4688
Length 112, alignment 32/ 0: 300.156 27.8125 26.0938
Length 112, alignment 0/ 7: 302.188 65 64.8438
Length 112, alignment 7/ 0: 300.312 70.625 68.9062
Length 144, alignment 0/32: 382.344 31.0938 30.9375
Length 144, alignment 32/ 0: 380.312 27.3438 25.9375
Length 144, alignment 0/ 9: 382.344 85.4688 85
Length 144, alignment 9/ 0: 380.156 70.3125 68.75
Length 160, alignment 0/32: 422.344 30.625 30.3125
Length 160, alignment 32/ 0: 420.312 32.6562 31.0938
Length 160, alignment 0/10: 422.5 85 84.6875
Length 160, alignment 10/ 0: 420.312 105.156 103.75
Length 176, alignment 0/32: 462.344 30.4688 30.3125
Length 176, alignment 32/ 0: 460.312 32.0312 30.4688
Length 176, alignment 0/11: 462.344 84.8438 84.8438
Length 176, alignment 11/ 0: 460.312 89.8438 88.4375
Length 192, alignment 0/32: 502.344 30.4688 30.3125
Length 192, alignment 32/ 0: 500.312 32.0312 30.4688
Length 192, alignment 0/12: 502.344 84.8438 84.6875
Length 192, alignment 12/ 0: 500.312 90 88.4375
Length 208, alignment 0/32: 708.281 35.4688 35.1562
Length 208, alignment 32/ 0: 540.312 32.0312 30.4688
Length 208, alignment 0/13: 542.5 106.406 106.562
Length 208, alignment 13/ 0: 540.312 89.8438 88.4375
Length 224, alignment 0/32: 582.344 34.8438 35
Length 224, alignment 32/ 0: 580.312 36.5625 35.3125
Length 224, alignment 0/14: 582.344 106.406 106.406
Length 224, alignment 14/ 0: 580.312 126.25 125.156
Length 240, alignment 0/32: 622.344 34.8438 35
Length 240, alignment 32/ 0: 620.312 36.4062 35
Length 240, alignment 0/15: 622.344 106.406 106.406
Length 240, alignment 15/ 0: 620.312 111.25 109.844
Length 272, alignment 0/32: 702.344 40 39.8438
Length 272, alignment 32/ 0: 700.469 36.25 35
Length 272, alignment 0/17: 702.344 127.969 127.969
Length 272, alignment 17/ 0: 700.312 106.25 104.844
Length 288, alignment 0/32: 742.344 39.6875 39.375
Length 288, alignment 32/ 0: 740.312 42.3438 40.1562
Length 288, alignment 0/18: 742.344 128.125 127.812
Length 288, alignment 18/ 0: 740.781 128.594 126.406
Length 304, alignment 0/32: 782.188 39.375 39.2188
Length 304, alignment 32/ 0: 780.312 41.0938 39.2188
Length 304, alignment 0/19: 782.188 127.969 127.812
Length 304, alignment 19/ 0: 780.469 128.125 126.25
Length 320, alignment 0/32: 822.344 39.5312 39.375
Length 320, alignment 32/ 0: 820.469 40.9375 39.375
Length 320, alignment 0/20: 822.344 127.969 127.812
Length 320, alignment 20/ 0: 820.312 127.969 126.25
Length 336, alignment 0/32: 862.188 44.6875 44.375
Length 336, alignment 32/ 0: 860.312 41.0938 39.375
Length 336, alignment 0/21: 862.344 149.531 149.375
Length 336, alignment 21/ 0: 860.469 127.969 126.094
Length 352, alignment 0/32: 902.188 43.9062 43.9062
Length 352, alignment 32/ 0: 900.312 46.5625 44.6875
Length 352, alignment 0/22: 902.344 149.375 149.219
Length 352, alignment 22/ 0: 900.312 149.531 148.281
Length 368, alignment 0/32: 942.344 43.9062 43.75
Length 368, alignment 32/ 0: 940.469 45.3125 43.9062
Length 368, alignment 0/23: 942.344 149.531 149.219
Length 368, alignment 23/ 0: 940.469 149.375 147.969
Length 384, alignment 0/32: 982.344 43.9062 43.9062
Length 384, alignment 32/ 0: 980.469 45.4688 43.9062
Length 384, alignment 0/24: 1129.22 149.219 149.375
Length 384, alignment 24/ 0: 980.312 148.75 147.344
Length 400, alignment 0/32: 1022.19 49.0625 49.0625
Length 400, alignment 32/ 0: 1020.31 45.3125 44.0625
Length 400, alignment 0/25: 1022.34 171.094 170.938
Length 400, alignment 25/ 0: 1020.47 149.375 147.969
Length 416, alignment 0/32: 1062.34 48.75 48.2812
Length 416, alignment 32/ 0: 1060.31 51.0938 49.375
Length 416, alignment 0/26: 1062.34 171.094 170.781
Length 416, alignment 26/ 0: 1060.47 170.938 169.844
Length 432, alignment 0/32: 1102.34 48.75 48.2812
Length 432, alignment 32/ 0: 1100.47 49.8438 48.4375
Length 432, alignment 0/27: 1102.34 171.094 170.781
Length 432, alignment 27/ 0: 1100.47 170.938 169.531
Length 448, alignment 0/32: 1142.34 48.5938 48.2812
Length 448, alignment 32/ 0: 1140.47 49.8438 48.5938
Length 448, alignment 0/28: 1142.34 170.938 170.938
Length 448, alignment 28/ 0: 1140.31 170.781 169.531
Length 464, alignment 0/32: 1182.34 53.2812 53.2812
Length 464, alignment 32/ 0: 1180.47 49.8438 48.4375
Length 464, alignment 0/29: 1182.34 192.5 192.344
Length 464, alignment 29/ 0: 1180.47 170.938 169.531
Length 480, alignment 0/32: 1222.34 52.9688 52.8125
Length 480, alignment 32/ 0: 1220.31 54.6875 53.2812
Length 480, alignment 0/30: 1222.34 192.5 192.188
Length 480, alignment 30/ 0: 1220.47 192.5 190.938
Length 496, alignment 0/32: 1262.34 53.125 52.8125
Length 496, alignment 32/ 0: 1260.31 54.2188 52.6562
Length 496, alignment 0/31: 1262.5 192.5 192.188
Length 496, alignment 31/ 0: 1260.47 192.344 190.781
Length 1024, alignment 0/ 0: 2580.47 18.4375 18.4375
Length 1024, alignment 32/ 0: 2702.97 94.8438 92.6562
Length 1024, alignment 0/32: 2582.5 92.9688 92.8125
Length 1024, alignment 32/32: 2580.16 18.2812 18.2812
Length 1056, alignment 0/ 0: 2660.47 18.125 18.125
Length 1056, alignment 33/ 0: 2660.78 389.375 387.656
Length 1056, alignment 0/33: 2662.81 389.688 389.531
Length 1056, alignment 33/33: 2660.62 18.125 18.125
Length 1088, alignment 0/ 0: 2740.78 18.125 18.125
Length 1088, alignment 34/ 0: 2740.78 389.375 387.656
Length 1088, alignment 0/34: 2742.66 389.531 389.531
Length 1088, alignment 34/34: 2740.78 18.2812 18.125
Length 1120, alignment 0/ 0: 2820.78 18.125 18.125
Length 1120, alignment 35/ 0: 2820.62 410.781 409.219
Length 1120, alignment 0/35: 2822.81 411.094 410.938
Length 1120, alignment 35/35: 2820.78 18.125 18.125
Length 1152, alignment 0/ 0: 2900.78 18.2812 18.125
Length 1152, alignment 36/ 0: 2900.62 410.781 409.062
Length 1152, alignment 0/36: 3014.69 411.562 411.25
Length 1152, alignment 36/36: 2900.78 18.2812 17.9688
Length 1184, alignment 0/ 0: 2980.78 18.125 18.125
Length 1184, alignment 37/ 0: 2980.78 432.344 430.625
Length 1184, alignment 0/37: 2982.81 432.5 432.5
Length 1184, alignment 37/37: 2980.78 18.125 18.125
Length 1216, alignment 0/ 0: 3062.03 18.4375 18.125
Length 1216, alignment 38/ 0: 3060.16 432.188 430.625
Length 1216, alignment 0/38: 3062.34 432.656 432.344
Length 1216, alignment 38/38: 3060.47 18.2812 18.125
Length 1248, alignment 0/ 0: 3140.47 18.125 18.125
Length 1248, alignment 39/ 0: 3140.31 453.906 452.188
Length 1248, alignment 0/39: 3142.34 454.219 453.906
Length 1248, alignment 39/39: 3140.47 18.125 17.9688
Length 1280, alignment 0/ 0: 3220.31 18.125 17.9688
Length 1280, alignment 40/ 0: 3378.75 453.75 452.188
Length 1280, alignment 0/40: 3222.5 454.062 453.906
Length 1280, alignment 40/40: 3220.47 18.125 18.125
Length 1312, alignment 0/ 0: 3300.31 18.125 18.125
Length 1312, alignment 41/ 0: 3300.47 475.312 473.594
Length 1312, alignment 0/41: 3302.34 475.625 475.469
Length 1312, alignment 41/41: 3300.31 18.2812 18.125
Length 1344, alignment 0/ 0: 3380.31 18.125 18.125
Length 1344, alignment 42/ 0: 3380.47 475.312 473.75
Length 1344, alignment 0/42: 3382.34 475.625 475.469
Length 1344, alignment 42/42: 3380.47 18.125 18.125
Length 1376, alignment 0/ 0: 3460.31 18.2812 18.125
Length 1376, alignment 43/ 0: 3460.31 496.875 495
Length 1376, alignment 0/43: 3462.5 497.031 619.219
Length 1376, alignment 43/43: 3460.62 18.125 18.125
Length 1408, alignment 0/ 0: 3540.47 18.2812 17.9688
Length 1408, alignment 44/ 0: 3540.31 496.719 495.156
Length 1408, alignment 0/44: 3542.34 497.031 496.875
Length 1408, alignment 44/44: 3540.47 18.125 18.125
Length 1440, alignment 0/ 0: 3620.31 18.2812 18.125
Length 1440, alignment 45/ 0: 3620.31 518.281 516.719
Length 1440, alignment 0/45: 3622.34 518.594 518.438
Length 1440, alignment 45/45: 3620.47 18.125 18.125
Length 1472, alignment 0/ 0: 3700.47 18.2812 18.125
Length 1472, alignment 46/ 0: 3700.31 518.281 516.719
Length 1472, alignment 0/46: 3702.5 518.438 518.438
Length 1472, alignment 46/46: 3700.47 19.8438 17.9688
Length 1504, alignment 0/ 0: 3781.25 18.125 17.9688
Length 1504, alignment 47/ 0: 3780.47 539.688 538.125
Length 1504, alignment 0/47: 3782.81 540.156 540
Length 1504, alignment 47/47: 3780.47 18.125 18.125
Length 1536, alignment 0/ 0: 3860.31 18.125 18.125
Length 1536, alignment 48/ 0: 3860.47 130.469 128.75
Length 1536, alignment 0/48: 3862.5 128.906 128.906
Length 1536, alignment 48/48: 3860.47 18.125 17.9688
Length 1568, alignment 0/ 0: 3940.62 18.125 17.9688
Length 1568, alignment 49/ 0: 3940.94 561.25 559.531
Length 1568, alignment 0/49: 3942.81 561.719 561.562
Length 1568, alignment 49/49: 3940.47 18.125 18.125
Length 1600, alignment 0/ 0: 4164.53 19.2188 18.125
Length 1600, alignment 50/ 0: 4021.72 561.25 559.688
Length 1600, alignment 0/50: 4022.81 561.719 561.562
Length 1600, alignment 50/50: 4020.47 18.125 18.125
Length 1632, alignment 0/ 0: 4100.78 18.125 18.125
Length 1632, alignment 51/ 0: 4100.62 582.812 581.094
Length 1632, alignment 0/51: 4102.66 583.281 582.969
Length 1632, alignment 51/51: 4100.47 18.2812 18.125
Length 1664, alignment 0/ 0: 4180.78 18.125 18.125
Length 1664, alignment 52/ 0: 4180.62 582.812 581.25
Length 1664, alignment 0/52: 4182.66 583.125 582.969
Length 1664, alignment 52/52: 4315.94 18.2812 18.125
Length 1696, alignment 0/ 0: 4260.78 18.2812 17.9688
Length 1696, alignment 53/ 0: 4260.62 604.219 602.656
Length 1696, alignment 0/53: 4262.81 604.688 604.531
Length 1696, alignment 53/53: 4260.31 18.125 18.125
Length 1728, alignment 0/ 0: 4340.31 18.2812 17.9688
Length 1728, alignment 54/ 0: 4340.78 604.375 602.5
Length 1728, alignment 0/54: 4342.81 604.531 604.531
Length 1728, alignment 54/54: 4340.47 18.2812 18.2812
Length 1760, alignment 0/ 0: 4420.31 18.125 18.125
Length 1760, alignment 55/ 0: 4420.78 625.625 624.062
Length 1760, alignment 0/55: 4422.66 760.312 626.25
Length 1760, alignment 55/55: 4420.31 18.125 17.9688
Length 1792, alignment 0/ 0: 4500.62 18.125 18.125
Length 1792, alignment 56/ 0: 4500.62 625.781 624.062
Length 1792, alignment 0/56: 4502.81 625.938 625.938
Length 1792, alignment 56/56: 4500.47 18.125 18.125
Length 1824, alignment 0/ 0: 4580.62 18.2812 18.125
Length 1824, alignment 57/ 0: 4582.5 647.656 646.562
Length 1824, alignment 0/57: 4582.19 647.5 647.344
Length 1824, alignment 57/57: 4580.47 18.9062 18.75
Length 1856, alignment 0/ 0: 4661.09 18.125 18.125
Length 1856, alignment 58/ 0: 4834.06 647.812 646.25
Length 1856, alignment 0/58: 4663.12 647.656 647.5
Length 1856, alignment 58/58: 4660.31 18.75 18.75
Length 1888, alignment 0/ 0: 4741.09 18.125 18.125
Length 1888, alignment 59/ 0: 4741.09 668.75 667.031
Length 1888, alignment 0/59: 4742.34 668.906 668.906
Length 1888, alignment 59/59: 4740.31 18.75 18.75
Length 1920, alignment 0/ 0: 4821.09 18.125 18.75
Length 1920, alignment 60/ 0: 4821.09 668.75 667.031
Length 1920, alignment 0/60: 4822.34 669.062 668.906
Length 1920, alignment 60/60: 4963.28 18.9062 18.75
Length 1952, alignment 0/ 0: 4900.94 18.9062 18.75
Length 1952, alignment 61/ 0: 4901.09 690.156 688.75
Length 1952, alignment 0/61: 4902.5 690.625 690.469
Length 1952, alignment 61/61: 4900.47 18.75 18.75
Length 1984, alignment 0/ 0: 4981.09 18.125 18.75
Length 1984, alignment 62/ 0: 4980.94 690.156 688.75
Length 1984, alignment 0/62: 4982.5 690.625 690.469
Length 1984, alignment 62/62: 4980.47 18.75 18.75
Length 2016, alignment 0/ 0: 5060.94 18.2812 18.75
Length 2016, alignment 63/ 0: 5194.38 712.5 710.938
Length 2016, alignment 0/63: 5063.44 712.188 712.031
Length 2016, alignment 63/63: 5060.62 19.2188 19.0625
[-- Attachment #6: bench-memmove-large.out --]
[-- Type: text/plain, Size: 3251 bytes --]
__memmove_thunderx __memmove_generic
Length 4103, alignment 0/64: 525.625 418.125
Length 4111, alignment 0/ 3: 1465 1464.38
Length 4127, alignment 3/ 0: 1502.5 1497.5
Length 4159, alignment 3/ 7: 1485 1482.5
Length 4223, alignment 9/ 5: 1509.38 1510.62
Length 8199, alignment 0/64: 768.75 756.25
Length 8207, alignment 0/ 3: 2848.12 2846.88
Length 8223, alignment 3/ 0: 2881.25 2879.38
Length 8255, alignment 3/ 7: 2868.12 2868.12
Length 8319, alignment 9/ 5: 2893.75 2890
Length 16391, alignment 0/64: 1508.75 1468.75
Length 16399, alignment 0/ 3: 5630.62 5645
Length 16415, alignment 3/ 0: 6587.5 5696.88
Length 16447, alignment 3/ 7: 5665.62 5671.25
Length 16511, alignment 9/ 5: 5680 5715.62
Length 32775, alignment 0/64: 3404.38 3379.38
Length 32783, alignment 0/ 3: 11950 11948.8
Length 32799, alignment 3/ 0: 12272.5 12196.2
Length 32831, alignment 3/ 7: 11932.5 11937.5
Length 32895, alignment 9/ 5: 12278.1 12600.6
Length 65543, alignment 0/64: 15153.8 15145.6
Length 65551, alignment 0/ 3: 32700.6 32701.9
Length 65567, alignment 3/ 0: 24323.1 32856.9
Length 65599, alignment 3/ 7: 32333.1 32333.1
Length 65663, alignment 9/ 5: 24330.6 32889.4
Length 131079, alignment 0/64: 30443.8 30994.4
Length 131087, alignment 0/ 3: 65536.2 65528.8
Length 131103, alignment 3/ 0: 48560 65695
Length 131135, alignment 3/ 7: 65275.6 64768.8
Length 131199, alignment 9/ 5: 48574.4 66727.5
Length 262151, alignment 0/64: 61041.2 61043.1
Length 262159, alignment 0/ 3: 131192 132792
Length 262175, alignment 3/ 0: 97861.9 131354
Length 262207, alignment 3/ 7: 129631 130307
Length 262271, alignment 9/ 5: 97690 131389
Length 524295, alignment 0/64: 121656 122274
Length 524303, alignment 0/ 3: 262468 262572
Length 524319, alignment 3/ 0: 193571 262575
Length 524351, alignment 3/ 7: 259253 259378
Length 524415, alignment 9/ 5: 193574 262574
Length 1048583, alignment 0/64: 242923 242967
Length 1048591, alignment 0/ 3: 524019 524000
Length 1048607, alignment 3/ 0: 386996 524172
Length 1048639, alignment 3/ 7: 517732 517734
Length 1048703, alignment 9/ 5: 386961 524196
Length 2097159, alignment 0/64: 485096 484989
Length 2097167, alignment 0/ 3: 1.04767e+06 1.04724e+06
Length 2097183, alignment 3/ 0: 772307 1.04781e+06
Length 2097215, alignment 3/ 7: 1.03466e+06 1.03465e+06
Length 2097279, alignment 9/ 5: 772771 1.04743e+06
Length 4194311, alignment 0/64: 969175 969163
Length 4194319, alignment 0/ 3: 2.09413e+06 2.09371e+06
Length 4194335, alignment 3/ 0: 1.54408e+06 2.09386e+06
Length 4194367, alignment 3/ 7: 2.06856e+06 2.06895e+06
Length 4194431, alignment 9/ 5: 1.54376e+06 2.09388e+06
Length 8388615, alignment 0/64: 1.95435e+06 1.95496e+06
Length 8388623, alignment 0/ 3: 4.19442e+06 4.19203e+06
Length 8388639, alignment 3/ 0: 3.08706e+06 4.19641e+06
Length 8388671, alignment 3/ 7: 4.14203e+06 4.14215e+06
Length 8388735, alignment 9/ 5: 3.08678e+06 4.19257e+06
Length 16777223, alignment 0/64: 6.15746e+06 6.13601e+06
Length 16777231, alignment 0/ 3: 1.07097e+07 1.06811e+07
Length 16777247, alignment 3/ 0: 6.44117e+06 1.10652e+07
Length 16777279, alignment 3/ 7: 1.06051e+07 1.05801e+07
Length 16777343, alignment 9/ 5: 6.43968e+06 1.10505e+07
end of thread
Thread overview: 38+ messages
2017-02-07 12:42 [PATCH] Add ifunc memcpy and memmove for aarch64 Wilco Dijkstra
2017-02-07 13:01 ` Siddhesh Poyarekar
2017-02-07 13:22 ` Adhemerval Zanella
2017-02-07 23:20 ` Steve Ellcey
2017-02-08 5:46 ` Siddhesh Poyarekar
2017-02-08 5:48 ` Siddhesh Poyarekar
2017-02-08 12:03 ` Szabolcs Nagy
2017-02-09 0:02 ` Steve Ellcey
2017-02-09 10:51 ` Szabolcs Nagy
2017-02-09 11:04 ` Siddhesh Poyarekar
2017-02-09 11:40 ` Szabolcs Nagy
2017-02-09 11:53 ` Siddhesh Poyarekar
2017-02-09 15:54 ` Andrew Pinski
2017-02-10 0:55 ` Steve Ellcey
2017-02-23 0:27 ` Steve Ellcey
2017-02-23 14:21 ` Siddhesh Poyarekar
2017-02-23 16:20 ` Steve Ellcey
2017-02-23 16:34 ` Siddhesh Poyarekar
2017-02-23 16:42 ` Steve Ellcey
2017-02-23 16:53 ` Siddhesh Poyarekar
2017-03-01 18:48 ` Steve Ellcey
2017-03-14 18:46 ` Szabolcs Nagy
2017-03-14 18:51 ` Andrew Pinski
2017-03-15 23:53 ` Steve Ellcey
2017-03-22 5:38 ` Aarch64 machine maintainership (was: Add ifunc memcpy and memmove for aarch64) Siddhesh Poyarekar
2017-03-22 17:34 ` Joseph Myers
2017-03-22 17:52 ` Aarch64 machine maintainership Siddhesh Poyarekar
-- strict thread matches above, loose matches on Subject: below --
2017-01-19 18:23 [PATCH] Add ifunc memcpy and memmove for aarch64 Steve Ellcey
2017-01-19 19:42 ` Adhemerval Zanella
2017-01-19 21:04 ` Joseph Myers
2017-01-23 23:33 ` Steve Ellcey
2017-01-24 9:37 ` Florian Weimer
2017-01-24 14:09 ` Adhemerval Zanella
2017-01-24 19:34 ` Steve Ellcey
2017-01-24 20:49 ` Steve Ellcey
2017-01-25 17:34 ` Steve Ellcey
2017-02-06 20:54 ` Adhemerval Zanella
2017-02-07 6:47 ` Siddhesh Poyarekar