* [PATCH 0/4] aarch64: Add SME support
@ 2023-12-08 16:31 Szabolcs Nagy
2023-12-08 16:32 ` [PATCH 1/4] aarch64: Add SME runtime support Szabolcs Nagy
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Szabolcs Nagy @ 2023-12-08 16:31 UTC (permalink / raw)
To: libc-alpha
Most of the SME (Scalable Matrix Extension) runtime support is
in libgcc, but a bit of it has to be repeated in libc so that we
do not depend on libgcc to handle the ZA register state in
longjmp.
Szabolcs Nagy (4):
aarch64: Add SME runtime support
aarch64: Add longjmp support for SME
aarch64: Add setcontext support for SME
aarch64: Add longjmp test for SME
sysdeps/aarch64/Makefile | 13 +-
sysdeps/aarch64/__arm_za_disable.S | 112 ++++++++
sysdeps/aarch64/__longjmp.S | 22 ++
sysdeps/aarch64/rtld-global-offsets.sym | 10 +
sysdeps/aarch64/tst-sme-jmp.c | 278 +++++++++++++++++++
sysdeps/unix/sysv/linux/aarch64/setcontext.S | 19 ++
6 files changed, 451 insertions(+), 3 deletions(-)
create mode 100644 sysdeps/aarch64/__arm_za_disable.S
create mode 100644 sysdeps/aarch64/rtld-global-offsets.sym
create mode 100644 sysdeps/aarch64/tst-sme-jmp.c
--
2.25.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/4] aarch64: Add SME runtime support
2023-12-08 16:31 [PATCH 0/4] aarch64: Add SME support Szabolcs Nagy
@ 2023-12-08 16:32 ` Szabolcs Nagy
2023-12-28 13:41 ` Adhemerval Zanella Netto
2023-12-08 16:32 ` [PATCH 2/4] aarch64: Add longjmp support for SME Szabolcs Nagy
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Szabolcs Nagy @ 2023-12-08 16:32 UTC (permalink / raw)
To: libc-alpha
The runtime support routines for the call ABI of the Scalable Matrix
Extension (SME) are mostly in libgcc. Since libc.so cannot depend on
libgcc_s.so, add an implementation of __arm_za_disable in libc for
libc-internal use in longjmp and similar APIs.
__libc_arm_za_disable follows the same PCS rules as __arm_za_disable,
but it's a hidden symbol so it does not need variant PCS marking.
Use __libc_fatal instead of abort because it can print a message and
works in ld.so too, although for now we do not need SME routines in
ld.so.
To check the SME HWCAP in asm, we need the _dl_hwcap2 member offset in
_rtld_global_ro in the shared libc.so, while in libc.a the _dl_hwcap2
object is accessed.
---
sysdeps/aarch64/Makefile | 10 ++-
sysdeps/aarch64/__arm_za_disable.S | 112 ++++++++++++++++++++++++
sysdeps/aarch64/rtld-global-offsets.sym | 10 +++
3 files changed, 129 insertions(+), 3 deletions(-)
create mode 100644 sysdeps/aarch64/__arm_za_disable.S
create mode 100644 sysdeps/aarch64/rtld-global-offsets.sym
diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 6a9559e5f5..9d8844d9c8 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -48,7 +48,9 @@ endif
endif
ifeq ($(subdir),csu)
-gen-as-const-headers += tlsdesc.sym
+gen-as-const-headers += \
+ tlsdesc.sym \
+ rtld-global-offsets.sym
endif
ifeq ($(subdir),gmon)
@@ -62,8 +64,10 @@ endif
ifeq ($(subdir),misc)
sysdep_headers += sys/ifunc.h
-sysdep_routines += __mtag_tag_zero_region \
- __mtag_tag_region
+sysdep_routines += \
+ __mtag_tag_zero_region \
+ __mtag_tag_region \
+ __arm_za_disable
endif
ifeq ($(subdir),malloc)
diff --git a/sysdeps/aarch64/__arm_za_disable.S b/sysdeps/aarch64/__arm_za_disable.S
new file mode 100644
index 0000000000..f9e2d942f2
--- /dev/null
+++ b/sysdeps/aarch64/__arm_za_disable.S
@@ -0,0 +1,112 @@
+/* Libc internal support routine for SME.
+ Copyright (C) 2023 Free Software Foundation, Inc.
+
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include <rtld-global-offsets.h>
+
+#define HWCAP2_SME_BIT 23
+
+/* Disable ZA. Call ABI:
+ - Private ZA, streaming-compatible.
+ - x0-x13, x19-x29, sp and fp regs are call preserved.
+ - On return tpidr2_el0 = 0, ZA = 0.
+ - Takes no argument.
+ - Does not return a value.
+ - Can abort on failure (then registers are not preserved). */
+
+ENTRY (__libc_arm_za_disable)
+
+ /* Check if SME is available. */
+#ifdef SHARED
+ /* In libc.so. */
+ adrp x14, :got:_rtld_global_ro
+ ldr x14, [x14, :got_lo12:_rtld_global_ro]
+ ldr x14, [x14, GLRO_DL_HWCAP2_OFFSET]
+#else
+ /* In libc.a, may be PIC. */
+ adrp x14, _dl_hwcap2
+ ldr x14, [x14, :lo12:_dl_hwcap2]
+#endif
+ tbz x14, HWCAP2_SME_BIT, L(end)
+
+ .inst 0xd53bd0ae /* mrs x14, tpidr2_el0 */
+ cbz x14, L(end)
+
+ /* check reserved bytes. */
+ ldrh w15, [x14, 10]
+ ldr w16, [x14, 12]
+ orr w15, w15, w16
+ cbnz w15, L(fail)
+
+ ldr x16, [x14]
+ cbz x16, L(end)
+ ldrh w17, [x14, 8]
+ cbz w17, L(end)
+
+ /* x14: tpidr2, x15: 0,
+ x16: za_save_buffer, x17: num_za_save_slices. */
+
+L(save_loop):
+ .inst 0xe1206200 /* str za[w15, 0], [x16] */
+ .inst 0xe1206201 /* str za[w15, 1], [x16, 1, mul vl] */
+ .inst 0xe1206202 /* str za[w15, 2], [x16, 2, mul vl] */
+ .inst 0xe1206203 /* str za[w15, 3], [x16, 3, mul vl] */
+ .inst 0xe1206204 /* str za[w15, 4], [x16, 4, mul vl] */
+ .inst 0xe1206205 /* str za[w15, 5], [x16, 5, mul vl] */
+ .inst 0xe1206206 /* str za[w15, 6], [x16, 6, mul vl] */
+ .inst 0xe1206207 /* str za[w15, 7], [x16, 7, mul vl] */
+ .inst 0xe1206208 /* str za[w15, 8], [x16, 8, mul vl] */
+ .inst 0xe1206209 /* str za[w15, 9], [x16, 9, mul vl] */
+ .inst 0xe120620a /* str za[w15, 10], [x16, 10, mul vl] */
+ .inst 0xe120620b /* str za[w15, 11], [x16, 11, mul vl] */
+ .inst 0xe120620c /* str za[w15, 12], [x16, 12, mul vl] */
+ .inst 0xe120620d /* str za[w15, 13], [x16, 13, mul vl] */
+ .inst 0xe120620e /* str za[w15, 14], [x16, 14, mul vl] */
+ .inst 0xe120620f /* str za[w15, 15], [x16, 15, mul vl] */
+ add w15, w15, 16
+ .inst 0x04305a10 /* addsvl x16, x16, 16 */
+ cmp w17, w15
+ bhi L(save_loop)
+ .inst 0xd51bd0bf /* msr tpidr2_el0, xzr */
+ .inst 0xd503447f /* smstop za */
+L(end):
+ ret
+L(fail):
+#if HAVE_AARCH64_PAC_RET
+ PACIASP
+ cfi_window_save
+#endif
+ stp x29, x30, [sp, -32]!
+ cfi_adjust_cfa_offset (32)
+ cfi_rel_offset (x29, 0)
+ cfi_rel_offset (x30, 8)
+ mov x29, sp
+ .inst 0x04e0e3f0 /* cntd x16 */
+ str x16, [sp, 16]
+ cfi_rel_offset (46, 16)
+ .inst 0xd503467f /* smstop */
+ adrp x0, L(msg)
+ add x0, x0, :lo12:L(msg)
+ bl HIDDEN_JUMPTARGET (__libc_fatal)
+END (__libc_arm_za_disable)
+
+ .section .rodata
+ .align 3
+L(msg):
+ .string "FATAL: __libc_arm_za_disable failed.\n"
diff --git a/sysdeps/aarch64/rtld-global-offsets.sym b/sysdeps/aarch64/rtld-global-offsets.sym
new file mode 100644
index 0000000000..23cdaf7d9e
--- /dev/null
+++ b/sysdeps/aarch64/rtld-global-offsets.sym
@@ -0,0 +1,10 @@
+#define SHARED 1
+
+#include <ldsodefs.h>
+
+#define GLRO_offsetof(name) offsetof (struct rtld_global_ro, _##name)
+
+-- Offsets of _rtld_global_ro in libc.so
+
+GLRO_DL_HWCAP_OFFSET GLRO_offsetof (dl_hwcap)
+GLRO_DL_HWCAP2_OFFSET GLRO_offsetof (dl_hwcap2)
--
2.25.1
* [PATCH 2/4] aarch64: Add longjmp support for SME
2023-12-08 16:31 [PATCH 0/4] aarch64: Add SME support Szabolcs Nagy
2023-12-08 16:32 ` [PATCH 1/4] aarch64: Add SME runtime support Szabolcs Nagy
@ 2023-12-08 16:32 ` Szabolcs Nagy
2023-12-28 13:42 ` Adhemerval Zanella Netto
2023-12-08 16:32 ` [PATCH 3/4] aarch64: Add setcontext " Szabolcs Nagy
2023-12-08 16:32 ` [PATCH 4/4] aarch64: Add longjmp test " Szabolcs Nagy
3 siblings, 1 reply; 11+ messages in thread
From: Szabolcs Nagy @ 2023-12-08 16:32 UTC (permalink / raw)
To: libc-alpha
For the ZA lazy saving scheme to work, longjmp has to call
__libc_arm_za_disable.
In ld.so we assume ZA is not used, so longjmp does not need
special support there.
---
sysdeps/aarch64/__longjmp.S | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/sysdeps/aarch64/__longjmp.S b/sysdeps/aarch64/__longjmp.S
index d743e7478d..659199d7d4 100644
--- a/sysdeps/aarch64/__longjmp.S
+++ b/sysdeps/aarch64/__longjmp.S
@@ -49,6 +49,28 @@ ENTRY (__longjmp)
PTR_ARG (0)
+#if IS_IN(libc)
+ /* Disable ZA state of SME in libc.a and libc.so, but not in ld.so. */
+# if HAVE_AARCH64_PAC_RET
+ PACIASP
+ cfi_window_save
+# endif
+ stp x29, x30, [sp, -16]!
+ cfi_adjust_cfa_offset (16)
+ cfi_rel_offset (x29, 0)
+ cfi_rel_offset (x30, 8)
+ mov x29, sp
+ bl __libc_arm_za_disable
+ ldp x29, x30, [sp], 16
+ cfi_adjust_cfa_offset (-16)
+ cfi_restore (x29)
+ cfi_restore (x30)
+# if HAVE_AARCH64_PAC_RET
+ AUTIASP
+ cfi_window_save
+# endif
+#endif
+
ldp x19, x20, [x0, #JB_X19<<3]
ldp x21, x22, [x0, #JB_X21<<3]
ldp x23, x24, [x0, #JB_X23<<3]
--
2.25.1
* [PATCH 3/4] aarch64: Add setcontext support for SME
2023-12-08 16:31 [PATCH 0/4] aarch64: Add SME support Szabolcs Nagy
2023-12-08 16:32 ` [PATCH 1/4] aarch64: Add SME runtime support Szabolcs Nagy
2023-12-08 16:32 ` [PATCH 2/4] aarch64: Add longjmp support for SME Szabolcs Nagy
@ 2023-12-08 16:32 ` Szabolcs Nagy
2023-12-28 13:42 ` Adhemerval Zanella Netto
2023-12-08 16:32 ` [PATCH 4/4] aarch64: Add longjmp test " Szabolcs Nagy
3 siblings, 1 reply; 11+ messages in thread
From: Szabolcs Nagy @ 2023-12-08 16:32 UTC (permalink / raw)
To: libc-alpha
For the ZA lazy saving scheme to work, setcontext has to call
__libc_arm_za_disable.
This also fixes swapcontext, which uses setcontext internally.
---
sysdeps/unix/sysv/linux/aarch64/setcontext.S | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/sysdeps/unix/sysv/linux/aarch64/setcontext.S b/sysdeps/unix/sysv/linux/aarch64/setcontext.S
index d7756015c5..fe75adf61e 100644
--- a/sysdeps/unix/sysv/linux/aarch64/setcontext.S
+++ b/sysdeps/unix/sysv/linux/aarch64/setcontext.S
@@ -49,6 +49,25 @@ ENTRY (__setcontext)
cbz x0, 1f
b C_SYMBOL_NAME (__syscall_error)
1:
+ /* Disable ZA of SME. */
+#if HAVE_AARCH64_PAC_RET
+ PACIASP
+ cfi_window_save
+#endif
+ stp x29, x30, [sp, -16]!
+ cfi_adjust_cfa_offset (16)
+ cfi_rel_offset (x29, 0)
+ cfi_rel_offset (x30, 8)
+ mov x29, sp
+ bl __libc_arm_za_disable
+ ldp x29, x30, [sp], 16
+ cfi_adjust_cfa_offset (-16)
+ cfi_restore (x29)
+ cfi_restore (x30)
+#if HAVE_AARCH64_PAC_RET
+ AUTIASP
+ cfi_window_save
+#endif
/* Restore the general purpose registers. */
mov x0, x9
cfi_def_cfa (x0, 0)
--
2.25.1
* [PATCH 4/4] aarch64: Add longjmp test for SME
2023-12-08 16:31 [PATCH 0/4] aarch64: Add SME support Szabolcs Nagy
` (2 preceding siblings ...)
2023-12-08 16:32 ` [PATCH 3/4] aarch64: Add setcontext " Szabolcs Nagy
@ 2023-12-08 16:32 ` Szabolcs Nagy
2023-12-28 14:36 ` Adhemerval Zanella Netto
3 siblings, 1 reply; 11+ messages in thread
From: Szabolcs Nagy @ 2023-12-08 16:32 UTC (permalink / raw)
To: libc-alpha
This includes a test for setcontext too.
The test directly checks, after longjmp, whether ZA got disabled and
whether the ZA contents got saved following the lazy saving scheme.
It does not use ACLE code, so it does not verify that gcc can
interoperate with glibc.
---
sysdeps/aarch64/Makefile | 3 +
sysdeps/aarch64/tst-sme-jmp.c | 278 ++++++++++++++++++++++++++++++++++
2 files changed, 281 insertions(+)
create mode 100644 sysdeps/aarch64/tst-sme-jmp.c
diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 9d8844d9c8..141d7d9cc2 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -68,6 +68,9 @@ sysdep_routines += \
__mtag_tag_zero_region \
__mtag_tag_region \
__arm_za_disable
+
+tests += \
+ tst-sme-jmp
endif
ifeq ($(subdir),malloc)
diff --git a/sysdeps/aarch64/tst-sme-jmp.c b/sysdeps/aarch64/tst-sme-jmp.c
new file mode 100644
index 0000000000..08dd291b0c
--- /dev/null
+++ b/sysdeps/aarch64/tst-sme-jmp.c
@@ -0,0 +1,278 @@
+/* Test for SME longjmp.
+ Copyright (C) 2023 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <setjmp.h>
+#include <ucontext.h>
+#include <sys/auxv.h>
+#include <support/check.h>
+
+struct blk {
+ void *za_save_buffer;
+ uint16_t num_za_save_slices;
+ char __reserved[6];
+};
+
+static unsigned long svl;
+static uint8_t *za_orig;
+static uint8_t *za_dump;
+static uint8_t *za_save;
+
+static unsigned long
+get_svl (void)
+{
+ register unsigned long x0 asm ("x0");
+ asm volatile (
+ ".inst 0x04bf5820 /* rdsvl x0, 1 */\n"
+ : "=r" (x0));
+ return x0;
+}
+
+/* PSTATE.ZA = 1, set ZA state to active. */
+static void
+start_za (void)
+{
+ asm volatile (
+ ".inst 0xd503457f /* smstart za */");
+}
+
+/* Read SVCR to get SM (bit0) and ZA (bit1) state. */
+static unsigned long
+get_svcr (void)
+{
+ register unsigned long x0 asm ("x0");
+ asm volatile (
+ ".inst 0xd53b4240 /* mrs x0, svcr */\n"
+ : "=r" (x0));
+ return x0;
+}
+
+/* Load data into ZA byte by byte from p. */
+static void __attribute__ ((noinline))
+load_za (const void *p)
+{
+ register unsigned long x15 asm ("x15") = 0;
+ register unsigned long x16 asm ("x16") = (unsigned long)p;
+ register unsigned long x17 asm ("x17") = svl;
+
+ asm volatile (
+ ".inst 0xd503437f /* smstart sm */\n"
+ ".L_ldr_loop:\n"
+ ".inst 0xe1006200 /* ldr za[w15, 0], [x16] */\n"
+ "add w15, w15, 1\n"
+ ".inst 0x04305030 /* addvl x16, x16, 1 */\n"
+ "cmp w15, w17\n"
+ "bne .L_ldr_loop\n"
+ ".inst 0xd503427f /* smstop sm */\n"
+ : "+r"(x15), "+r"(x16), "+r"(x17));
+}
+
+/* Set tpidr2 to BLK. */
+static void
+set_tpidr2 (struct blk *blk)
+{
+ register unsigned long x0 asm ("x0") = (unsigned long)blk;
+ asm volatile (
+ ".inst 0xd51bd0a0 /* msr tpidr2_el0, x0 */\n"
+ :: "r"(x0) : "memory");
+}
+
+/* Returns tpidr2. */
+static void *
+get_tpidr2 (void)
+{
+ register unsigned long x0 asm ("x0");
+ asm volatile (
+ ".inst 0xd53bd0a0 /* mrs x0, tpidr2_el0 */\n"
+ : "=r"(x0) :: "memory");
+ return (void *) x0;
+}
+
+static void
+print_data(const char *msg, void *p)
+{
+ unsigned char *a = p;
+ printf ("%s:\n", msg);
+ for (int i = 0; i < svl; i++)
+ {
+ printf ("%d: ", i);
+ for (int j = 0; j < svl; j++)
+ printf("%02x,", a[i*svl+j]);
+ printf("\n");
+ }
+ printf(".\n");
+ fflush (stdout);
+}
+
+__attribute__ ((noinline))
+static void
+do_longjmp (jmp_buf env)
+{
+ longjmp (env, 1);
+}
+
+__attribute__ ((noinline))
+static void
+do_setcontext (const ucontext_t *p)
+{
+ setcontext (p);
+}
+
+static void
+longjmp_test (void)
+{
+ unsigned long svcr;
+ jmp_buf env;
+ void *p;
+ int r;
+ struct blk blk = {za_save, svl, {0}};
+
+ printf ("longjmp test:\n");
+ p = get_tpidr2 ();
+ printf ("initial tp2 = %p\n", p);
+ if (p != NULL)
+ FAIL_EXIT1 ("tpidr2 is not initialized to 0");
+ svcr = get_svcr ();
+ if (svcr != 0)
+ FAIL_EXIT1 ("svcr != 0: %lu", svcr);
+ set_tpidr2 (&blk);
+ start_za ();
+ load_za (za_orig);
+
+ print_data ("za save space", za_save);
+ p = get_tpidr2 ();
+ printf ("before setjmp: tp2 = %p\n", p);
+ if (p != &blk)
+ FAIL_EXIT1 ("tpidr2 is not set to BLK %p", (void *)&blk);
+ if (setjmp (env) == 0)
+ {
+ p = get_tpidr2 ();
+ printf ("before longjmp: tp2 = %p\n", p);
+ if (p != &blk)
+ FAIL_EXIT1 ("tpidr2 is clobbered");
+ do_longjmp (env);
+ FAIL_EXIT1 ("longjmp returned");
+ }
+ p = get_tpidr2 ();
+ printf ("after longjmp: tp2 = %p\n", p);
+ if (p != NULL)
+ FAIL_EXIT1 ("tpidr2 is not set to 0");
+ svcr = get_svcr ();
+ if (svcr != 0)
+ FAIL_EXIT1 ("svcr != 0: %lu", svcr);
+ print_data ("za save space", za_save);
+ r = memcmp (za_orig, za_save, svl*svl);
+ if (r != 0)
+ FAIL_EXIT1 ("saving za failed");
+}
+
+static void
+setcontext_test (void)
+{
+ volatile int setcontext_done = 0;
+ unsigned long svcr;
+ ucontext_t ctx;
+ void *p;
+ int r;
+ struct blk blk = {za_save, svl, {0}};
+
+ printf ("setcontext test:\n");
+ p = get_tpidr2 ();
+ printf ("initial tp2 = %p\n", p);
+ if (p != NULL)
+ FAIL_EXIT1 ("tpidr2 is not initialized to 0");
+ svcr = get_svcr ();
+ if (svcr != 0)
+ FAIL_EXIT1 ("svcr != 0: %lu", svcr);
+ set_tpidr2 (&blk);
+ start_za ();
+ load_za (za_orig);
+
+ print_data ("za save space", za_save);
+ p = get_tpidr2 ();
+ printf ("before getcontext: tp2 = %p\n", p);
+ if (p != &blk)
+ FAIL_EXIT1 ("tpidr2 is not set to BLK %p", (void *)&blk);
+ r = getcontext (&ctx);
+ if (r != 0)
+ FAIL_EXIT1 ("getcontext failed");
+ if (setcontext_done == 0)
+ {
+ p = get_tpidr2 ();
+ printf ("before setcontext: tp2 = %p\n", p);
+ if (p != &blk)
+ FAIL_EXIT1 ("tpidr2 is clobbered");
+ setcontext_done = 1;
+ do_setcontext (&ctx);
+ FAIL_EXIT1 ("setcontext returned");
+ }
+ p = get_tpidr2 ();
+ printf ("after setcontext: tp2 = %p\n", p);
+ if (p != NULL)
+ FAIL_EXIT1 ("tpidr2 is not set to 0");
+ svcr = get_svcr ();
+ if (svcr != 0)
+ FAIL_EXIT1 ("svcr != 0: %lu", svcr);
+ print_data ("za save space", za_save);
+ r = memcmp (za_orig, za_save, svl*svl);
+ if (r != 0)
+ FAIL_EXIT1 ("saving za failed");
+}
+
+static int
+do_test (void)
+{
+ unsigned long hwcap2;
+
+ hwcap2 = getauxval (AT_HWCAP2);
+ if ((hwcap2 & HWCAP2_SME) == 0)
+ return 77;
+
+ svl = get_svl ();
+ printf ("svl: %lu\n", svl);
+ if (svl < 16 || svl % 16 != 0 || svl >= (1 << 16))
+ FAIL_EXIT1 ("invalid svl");
+
+ za_orig = malloc (svl*svl);
+ za_save = malloc (svl*svl);
+ za_dump = malloc (svl*svl);
+ memset (za_orig, 1, svl*svl);
+ memset (za_save, 2, svl*svl);
+ memset (za_dump, 3, svl*svl);
+ for (int i = 0; i < svl; i++)
+ for (int j = 0; j < svl; j++)
+ za_orig[i*svl+j] = i*svl+j;
+ print_data ("original data", za_orig);
+
+ longjmp_test ();
+
+ memset (za_save, 2, svl*svl);
+ memset (za_dump, 3, svl*svl);
+
+ setcontext_test ();
+
+ free (za_orig);
+ free (za_save);
+ free (za_dump);
+ return 0;
+}
+
+#include <support/test-driver.c>
--
2.25.1
* Re: [PATCH 1/4] aarch64: Add SME runtime support
2023-12-08 16:32 ` [PATCH 1/4] aarch64: Add SME runtime support Szabolcs Nagy
@ 2023-12-28 13:41 ` Adhemerval Zanella Netto
2024-01-02 17:15 ` Szabolcs Nagy
0 siblings, 1 reply; 11+ messages in thread
From: Adhemerval Zanella Netto @ 2023-12-28 13:41 UTC (permalink / raw)
To: libc-alpha, Szabolcs Nagy
On 08/12/23 13:32, Szabolcs Nagy wrote:
> The runtime support routines for the call ABI of the Scalable Matrix
> Extension (SME) are mostly in libgcc. Since libc.so cannot depend on
> libgcc_s.so have an implementation of __arm_za_disable in libc for
> libc internal use in longjmp and similar APIs.
>
> __libc_arm_za_disable follows the same PCS rules as __arm_za_disable,
> but it's a hidden symbol so it does not need variant PCS marking.
>
> Using __libc_fatal instead of abort because it can print a message and
> works in ld.so too. But for now we don't need SME routines in ld.so.
>
> To check the SME HWCAP in asm, we need the _dl_hwcap2 member offset in
> _rtld_global_ro in the shared libc.so, while in libc.a the _dl_hwcap2
> object is accessed.
LGTM, thanks.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> ---
> sysdeps/aarch64/Makefile | 10 ++-
> sysdeps/aarch64/__arm_za_disable.S | 112 ++++++++++++++++++++++++
> sysdeps/aarch64/rtld-global-offsets.sym | 10 +++
> 3 files changed, 129 insertions(+), 3 deletions(-)
> create mode 100644 sysdeps/aarch64/__arm_za_disable.S
> create mode 100644 sysdeps/aarch64/rtld-global-offsets.sym
>
> diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
> index 6a9559e5f5..9d8844d9c8 100644
> --- a/sysdeps/aarch64/Makefile
> +++ b/sysdeps/aarch64/Makefile
> @@ -48,7 +48,9 @@ endif
> endif
>
> ifeq ($(subdir),csu)
> -gen-as-const-headers += tlsdesc.sym
> +gen-as-const-headers += \
> + tlsdesc.sym \
> + rtld-global-offsets.sym
> endif
>
> ifeq ($(subdir),gmon)
> @@ -62,8 +64,10 @@ endif
>
> ifeq ($(subdir),misc)
> sysdep_headers += sys/ifunc.h
> -sysdep_routines += __mtag_tag_zero_region \
> - __mtag_tag_region
> +sysdep_routines += \
> + __mtag_tag_zero_region \
> + __mtag_tag_region \
> + __arm_za_disable
> endif
>
> ifeq ($(subdir),malloc)
Ok (although usually the Makefile reflow makes more sense to be a
unrelated patch).
> diff --git a/sysdeps/aarch64/__arm_za_disable.S b/sysdeps/aarch64/__arm_za_disable.S
> new file mode 100644
> index 0000000000..f9e2d942f2
> --- /dev/null
> +++ b/sysdeps/aarch64/__arm_za_disable.S
> @@ -0,0 +1,112 @@
> +/* Libc internal support routine for SME.
> + Copyright (C) 2023 Free Software Foundation, Inc.
> +
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library. If not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +#include <sysdep.h>
> +#include <rtld-global-offsets.h>
> +
> +#define HWCAP2_SME_BIT 23
> +
> +/* Disable ZA. Call ABI:
> + - Private ZA, streaming-compatible.
> + - x0-x13, x19-x29, sp and fp regs are call preserved.
> + - On return tpidr2_el0 = 0, ZA = 0.
> + - Takes no argument.
> + - Does not return a value.
> + - Can abort on failure (then registers are not preserved). */
> +
> +ENTRY (__libc_arm_za_disable)
> +
> + /* Check if SME is available. */
> +#ifdef SHARED
> + /* In libc.so. */
> + adrp x14, :got:_rtld_global_ro
> + ldr x14, [x14, :got_lo12:_rtld_global_ro]
> + ldr x14, [x14, GLRO_DL_HWCAP2_OFFSET]
> +#else
> + /* In libc.a, may be PIC. */
> + adrp x14, _dl_hwcap2
> + ldr x14, [x14, :lo12:_dl_hwcap2]
> +#endif
> + tbz x14, HWCAP2_SME_BIT, L(end)
> +
> + .inst 0xd53bd0ae /* mrs x14, tpidr2_el0 */
> + cbz x14, L(end)
> +
> + /* check reserved bytes. */
Maybe add that the action chosen is to abort if non-zero bytes
are found.
> + ldrh w15, [x14, 10]
> + ldr w16, [x14, 12]
> + orr w15, w15, w16
> + cbnz w15, L(fail)
> +
> + ldr x16, [x14]
> + cbz x16, L(end)
> + ldrh w17, [x14, 8]
> + cbz w17, L(end)
> +
> + /* x14: tpidr2, x15: 0,
> + x16: za_save_buffer, x17: num_za_save_slices. */
> +
> +L(save_loop):
> + .inst 0xe1206200 /* str za[w15, 0], [x16] */
> + .inst 0xe1206201 /* str za[w15, 1], [x16, 1, mul vl] */
> + .inst 0xe1206202 /* str za[w15, 2], [x16, 2, mul vl] */
> + .inst 0xe1206203 /* str za[w15, 3], [x16, 3, mul vl] */
> + .inst 0xe1206204 /* str za[w15, 4], [x16, 4, mul vl] */
> + .inst 0xe1206205 /* str za[w15, 5], [x16, 5, mul vl] */
> + .inst 0xe1206206 /* str za[w15, 6], [x16, 6, mul vl] */
> + .inst 0xe1206207 /* str za[w15, 7], [x16, 7, mul vl] */
> + .inst 0xe1206208 /* str za[w15, 8], [x16, 8, mul vl] */
> + .inst 0xe1206209 /* str za[w15, 9], [x16, 9, mul vl] */
> + .inst 0xe120620a /* str za[w15, 10], [x16, 10, mul vl] */
> + .inst 0xe120620b /* str za[w15, 11], [x16, 11, mul vl] */
> + .inst 0xe120620c /* str za[w15, 12], [x16, 12, mul vl] */
> + .inst 0xe120620d /* str za[w15, 13], [x16, 13, mul vl] */
> + .inst 0xe120620e /* str za[w15, 14], [x16, 14, mul vl] */
> + .inst 0xe120620f /* str za[w15, 15], [x16, 15, mul vl] */
> + add w15, w15, 16
> + .inst 0x04305a10 /* addsvl x16, x16, 16 */
> + cmp w17, w15
> + bhi L(save_loop)
> + .inst 0xd51bd0bf /* msr tpidr2_el0, xzr */
> + .inst 0xd503447f /* smstop za */
> +L(end):
> + ret
> +L(fail):
> +#if HAVE_AARCH64_PAC_RET
> + PACIASP
> + cfi_window_save
> +#endif
> + stp x29, x30, [sp, -32]!
> + cfi_adjust_cfa_offset (32)
> + cfi_rel_offset (x29, 0)
> + cfi_rel_offset (x30, 8)
> + mov x29, sp
> + .inst 0x04e0e3f0 /* cntd x16 */
> + str x16, [sp, 16]
> + cfi_rel_offset (46, 16)
> + .inst 0xd503467f /* smstop */
> + adrp x0, L(msg)
> + add x0, x0, :lo12:L(msg)
> + bl HIDDEN_JUMPTARGET (__libc_fatal)
> +END (__libc_arm_za_disable)
> +
> + .section .rodata
> + .align 3
> +L(msg):
> + .string "FATAL: __libc_arm_za_disable failed.\n"
Ok.
> diff --git a/sysdeps/aarch64/rtld-global-offsets.sym b/sysdeps/aarch64/rtld-global-offsets.sym
> new file mode 100644
> index 0000000000..23cdaf7d9e
> --- /dev/null
> +++ b/sysdeps/aarch64/rtld-global-offsets.sym
> @@ -0,0 +1,10 @@
> +#define SHARED 1
> +
> +#include <ldsodefs.h>
> +
> +#define GLRO_offsetof(name) offsetof (struct rtld_global_ro, _##name)
> +
> +-- Offsets of _rtld_global_ro in libc.so
> +
> +GLRO_DL_HWCAP_OFFSET GLRO_offsetof (dl_hwcap)
It seems to be unused.
> +GLRO_DL_HWCAP2_OFFSET GLRO_offsetof (dl_hwcap2)
OK.
* Re: [PATCH 2/4] aarch64: Add longjmp support for SME
2023-12-08 16:32 ` [PATCH 2/4] aarch64: Add longjmp support for SME Szabolcs Nagy
@ 2023-12-28 13:42 ` Adhemerval Zanella Netto
0 siblings, 0 replies; 11+ messages in thread
From: Adhemerval Zanella Netto @ 2023-12-28 13:42 UTC (permalink / raw)
To: Szabolcs Nagy, libc-alpha
On 08/12/23 13:32, Szabolcs Nagy wrote:
> For the ZA lazy saving scheme to work, longjmp has to call
> __libc_arm_za_disable.
>
> In ld.so we assume ZA is not used so longjmp does not need
> special support there.
LGTM, thanks.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> ---
> sysdeps/aarch64/__longjmp.S | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/sysdeps/aarch64/__longjmp.S b/sysdeps/aarch64/__longjmp.S
> index d743e7478d..659199d7d4 100644
> --- a/sysdeps/aarch64/__longjmp.S
> +++ b/sysdeps/aarch64/__longjmp.S
> @@ -49,6 +49,28 @@ ENTRY (__longjmp)
>
> PTR_ARG (0)
>
> +#if IS_IN(libc)
> + /* Disable ZA state of SME in libc.a and libc.so, but not in ld.so. */
> +# if HAVE_AARCH64_PAC_RET
> + PACIASP
> + cfi_window_save
> +# endif
> + stp x29, x30, [sp, -16]!
> + cfi_adjust_cfa_offset (16)
> + cfi_rel_offset (x29, 0)
> + cfi_rel_offset (x30, 8)
> + mov x29, sp
> + bl __libc_arm_za_disable
> + ldp x29, x30, [sp], 16
> + cfi_adjust_cfa_offset (-16)
> + cfi_restore (x29)
> + cfi_restore (x30)
> +# if HAVE_AARCH64_PAC_RET
> + AUTIASP
> + cfi_window_save
> +# endif
> +#endif
> +
> ldp x19, x20, [x0, #JB_X19<<3]
> ldp x21, x22, [x0, #JB_X21<<3]
> ldp x23, x24, [x0, #JB_X23<<3]
* Re: [PATCH 3/4] aarch64: Add setcontext support for SME
2023-12-08 16:32 ` [PATCH 3/4] aarch64: Add setcontext " Szabolcs Nagy
@ 2023-12-28 13:42 ` Adhemerval Zanella Netto
0 siblings, 0 replies; 11+ messages in thread
From: Adhemerval Zanella Netto @ 2023-12-28 13:42 UTC (permalink / raw)
To: Szabolcs Nagy, libc-alpha
On 08/12/23 13:32, Szabolcs Nagy wrote:
> For the ZA lazy saving scheme to work, setcontext has to call
> __libc_arm_za_disable.
>
> Also fixes swapcontext which uses setcontext internally.
LGTM, thanks.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> ---
> sysdeps/unix/sysv/linux/aarch64/setcontext.S | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/sysdeps/unix/sysv/linux/aarch64/setcontext.S b/sysdeps/unix/sysv/linux/aarch64/setcontext.S
> index d7756015c5..fe75adf61e 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/setcontext.S
> +++ b/sysdeps/unix/sysv/linux/aarch64/setcontext.S
> @@ -49,6 +49,25 @@ ENTRY (__setcontext)
> cbz x0, 1f
> b C_SYMBOL_NAME (__syscall_error)
> 1:
> + /* Disable ZA of SME. */
> +#if HAVE_AARCH64_PAC_RET
> + PACIASP
> + cfi_window_save
> +#endif
> + stp x29, x30, [sp, -16]!
> + cfi_adjust_cfa_offset (16)
> + cfi_rel_offset (x29, 0)
> + cfi_rel_offset (x30, 8)
> + mov x29, sp
> + bl __libc_arm_za_disable
> + ldp x29, x30, [sp], 16
> + cfi_adjust_cfa_offset (-16)
> + cfi_restore (x29)
> + cfi_restore (x30)
> +#if HAVE_AARCH64_PAC_RET
> + AUTIASP
> + cfi_window_save
> +#endif
> /* Restore the general purpose registers. */
> mov x0, x9
> cfi_def_cfa (x0, 0)
* Re: [PATCH 4/4] aarch64: Add longjmp test for SME
2023-12-08 16:32 ` [PATCH 4/4] aarch64: Add longjmp test " Szabolcs Nagy
@ 2023-12-28 14:36 ` Adhemerval Zanella Netto
2024-01-02 17:20 ` Szabolcs Nagy
0 siblings, 1 reply; 11+ messages in thread
From: Adhemerval Zanella Netto @ 2023-12-28 14:36 UTC (permalink / raw)
To: libc-alpha, Szabolcs Nagy
On 08/12/23 13:32, Szabolcs Nagy wrote:
> Includes test for setcontext too.
>
> The test directly checks after longjmp if ZA got disabled and the
> ZA contents got saved following the lazy saving scheme. It does not
> use ACLE code to verify that gcc can interoperate with glibc.
LGTM, thanks. Some minor suggestions below.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> ---
> sysdeps/aarch64/Makefile | 3 +
> sysdeps/aarch64/tst-sme-jmp.c | 278 ++++++++++++++++++++++++++++++++++
> 2 files changed, 281 insertions(+)
> create mode 100644 sysdeps/aarch64/tst-sme-jmp.c
>
> diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
> index 9d8844d9c8..141d7d9cc2 100644
> --- a/sysdeps/aarch64/Makefile
> +++ b/sysdeps/aarch64/Makefile
> @@ -68,6 +68,9 @@ sysdep_routines += \
> __mtag_tag_zero_region \
> __mtag_tag_region \
> __arm_za_disable
> +
> +tests += \
> + tst-sme-jmp
> endif
>
> ifeq ($(subdir),malloc)
Ok.
> diff --git a/sysdeps/aarch64/tst-sme-jmp.c b/sysdeps/aarch64/tst-sme-jmp.c
> new file mode 100644
> index 0000000000..08dd291b0c
> --- /dev/null
> +++ b/sysdeps/aarch64/tst-sme-jmp.c
> @@ -0,0 +1,278 @@
> +/* Test for SME longjmp.
> + Copyright (C) 2023 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <setjmp.h>
> +#include <ucontext.h>
> +#include <sys/auxv.h>
> +#include <support/check.h>
> +
> +struct blk {
> + void *za_save_buffer;
> + uint16_t num_za_save_slices;
> + char __reserved[6];
> +};
> +
> +static unsigned long svl;
> +static uint8_t *za_orig;
> +static uint8_t *za_dump;
> +static uint8_t *za_save;
> +
> +static unsigned long
> +get_svl (void)
> +{
> + register unsigned long x0 asm ("x0");
> + asm volatile (
> + ".inst 0x04bf5820 /* rdsvl x0, 1 */\n"
> + : "=r" (x0));
> + return x0;
> +}
> +
> +/* PSTATE.ZA = 1, set ZA state to active. */
> +static void
> +start_za (void)
> +{
> + asm volatile (
> + ".inst 0xd503457f /* smstart za */");
> +}
> +
> +/* Read SVCR to get SM (bit0) and ZA (bit1) state. */
> +static unsigned long
> +get_svcr (void)
> +{
> + register unsigned long x0 asm ("x0");
> + asm volatile (
> + ".inst 0xd53b4240 /* mrs x0, svcr */\n"
> + : "=r" (x0));
> + return x0;
> +}
> +
> +/* Load data into ZA byte by byte from p. */
> +static void __attribute__ ((noinline))
> +load_za (const void *p)
> +{
> + register unsigned long x15 asm ("x15") = 0;
> + register unsigned long x16 asm ("x16") = (unsigned long)p;
> + register unsigned long x17 asm ("x17") = svl;
> +
> + asm volatile (
> + ".inst 0xd503437f /* smstart sm */\n"
> + ".L_ldr_loop:\n"
> + ".inst 0xe1006200 /* ldr za[w15, 0], [x16] */\n"
> + "add w15, w15, 1\n"
> + ".inst 0x04305030 /* addvl x16, x16, 1 */\n"
> + "cmp w15, w17\n"
> + "bne .L_ldr_loop\n"
> + ".inst 0xd503427f /* smstop sm */\n"
> + : "+r"(x15), "+r"(x16), "+r"(x17));
> +}
> +
> +/* Set tpidr2 to BLK. */
> +static void
> +set_tpidr2 (struct blk *blk)
> +{
> + register unsigned long x0 asm ("x0") = (unsigned long)blk;
> + asm volatile (
> + ".inst 0xd51bd0a0 /* msr tpidr2_el0, x0 */\n"
> + :: "r"(x0) : "memory");
> +}
> +
> +/* Returns tpidr2. */
> +static void *
> +get_tpidr2 (void)
> +{
> + register unsigned long x0 asm ("x0");
> + asm volatile (
> + ".inst 0xd53bd0a0 /* mrs x0, tpidr2_el0 */\n"
> + : "=r"(x0) :: "memory");
> + return (void *) x0;
> +}
> +
> +static void
> +print_data(const char *msg, void *p)
> +{
> + unsigned char *a = p;
> + printf ("%s:\n", msg);
> + for (int i = 0; i < svl; i++)
> + {
> + printf ("%d: ", i);
> + for (int j = 0; j < svl; j++)
> + printf("%02x,", a[i*svl+j]);
> + printf("\n");
> + }
> + printf(".\n");
> + fflush (stdout);
> +}
> +
> +__attribute__ ((noinline))
> +static void
> +do_longjmp (jmp_buf env)
> +{
> + longjmp (env, 1);
> +}
> +
> +__attribute__ ((noinline))
> +static void
> +do_setcontext (const ucontext_t *p)
> +{
> + setcontext (p);
> +}
> +
> +static void
> +longjmp_test (void)
> +{
> + unsigned long svcr;
> + jmp_buf env;
> + void *p;
> + int r;
> + struct blk blk = {za_save, svl, {0}};
> +
> + printf ("longjmp test:\n");
> + p = get_tpidr2 ();
> + printf ("initial tp2 = %p\n", p);
> + if (p != NULL)
> + FAIL_EXIT1 ("tpidr2 is not initialized to 0");
> + svcr = get_svcr ();
> + if (svcr != 0)
> + FAIL_EXIT1 ("svcr != 0: %lu", svcr);
> + set_tpidr2 (&blk);
> + start_za ();
> + load_za (za_orig);
> +
> + print_data ("za save space", za_save);
> + p = get_tpidr2 ();
> + printf ("before setjmp: tp2 = %p\n", p);
> + if (p != &blk)
> + FAIL_EXIT1 ("tpidr2 is not set to BLK %p", (void *)&blk);
> + if (setjmp (env) == 0)
> + {
> + p = get_tpidr2 ();
> + printf ("before longjmp: tp2 = %p\n", p);
> + if (p != &blk)
> + FAIL_EXIT1 ("tpidr2 is clobbered");
> + do_longjmp (env);
> + FAIL_EXIT1 ("longjmp returned");
> + }
> + p = get_tpidr2 ();
> + printf ("after longjmp: tp2 = %p\n", p);
> + if (p != NULL)
> + FAIL_EXIT1 ("tpidr2 is not set to 0");
> + svcr = get_svcr ();
> + if (svcr != 0)
> + FAIL_EXIT1 ("svcr != 0: %lu", svcr);
> + print_data ("za save space", za_save);
> + r = memcmp (za_orig, za_save, svl*svl);
> + if (r != 0)
> + FAIL_EXIT1 ("saving za failed");
> +}
> +
> +static void
> +setcontext_test (void)
> +{
> + volatile int setcontext_done = 0;
> + unsigned long svcr;
> + ucontext_t ctx;
> + void *p;
> + int r;
> + struct blk blk = {za_save, svl, {0}};
> +
> + printf ("setcontext test:\n");
> + p = get_tpidr2 ();
> + printf ("initial tp2 = %p\n", p);
> + if (p != NULL)
> + FAIL_EXIT1 ("tpidr2 is not initialized to 0");
> + svcr = get_svcr ();
> + if (svcr != 0)
> + FAIL_EXIT1 ("svcr != 0: %lu", svcr);
> + set_tpidr2 (&blk);
> + start_za ();
> + load_za (za_orig);
> +
> + print_data ("za save space", za_save);
> + p = get_tpidr2 ();
> + printf ("before getcontext: tp2 = %p\n", p);
> + if (p != &blk)
> + FAIL_EXIT1 ("tpidr2 is not set to BLK %p", (void *)&blk);
> + r = getcontext (&ctx);
> + if (r != 0)
> + FAIL_EXIT1 ("getcontext failed");
> + if (setcontext_done == 0)
> + {
> + p = get_tpidr2 ();
> + printf ("before setcontext: tp2 = %p\n", p);
> + if (p != &blk)
> + FAIL_EXIT1 ("tpidr2 is clobbered");
> + setcontext_done = 1;
> + do_setcontext (&ctx);
> + FAIL_EXIT1 ("setcontext returned");
> + }
> + p = get_tpidr2 ();
> + printf ("after setcontext: tp2 = %p\n", p);
> + if (p != NULL)
> + FAIL_EXIT1 ("tpidr2 is not set to 0");
> + svcr = get_svcr ();
> + if (svcr != 0)
> + FAIL_EXIT1 ("svcr != 0: %lu", svcr);
> + print_data ("za save space", za_save);
> + r = memcmp (za_orig, za_save, svl*svl);
> + if (r != 0)
> + FAIL_EXIT1 ("saving za failed");
> +}
> +
> +static int
> +do_test (void)
> +{
> + unsigned long hwcap2;
> +
> + hwcap2 = getauxval (AT_HWCAP2);
> + if ((hwcap2 & HWCAP2_SME) == 0)
> + return 77;
Use EXIT_UNSUPPORTED here.
> +
> + svl = get_svl ();
> + printf ("svl: %lu\n", svl);
> + if (svl < 16 || svl % 16 != 0 || svl >= (1 << 16))
> + FAIL_EXIT1 ("invalid svl");
> +
> + za_orig = malloc (svl*svl);
> + za_save = malloc (svl*svl);
> + za_dump = malloc (svl*svl);
Use xmalloc here (or xcalloc).
> + memset (za_orig, 1, svl*svl);
> + memset (za_save, 2, svl*svl);
> + memset (za_dump, 3, svl*svl);
> + for (int i = 0; i < svl; i++)
> + for (int j = 0; j < svl; j++)
> + za_orig[i*svl+j] = i*svl+j;
> + print_data ("original data", za_orig);
> +
> + longjmp_test ();
> +
> + memset (za_save, 2, svl*svl);
> + memset (za_dump, 3, svl*svl);
> +
> + setcontext_test ();
> +
> + free (za_orig);
> + free (za_save);
> + free (za_dump);
> + return 0;
> +}
> +
> +#include <support/test-driver.c>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/4] aarch64: Add SME runtime support
2023-12-28 13:41 ` Adhemerval Zanella Netto
@ 2024-01-02 17:15 ` Szabolcs Nagy
0 siblings, 0 replies; 11+ messages in thread
From: Szabolcs Nagy @ 2024-01-02 17:15 UTC (permalink / raw)
To: Adhemerval Zanella Netto, libc-alpha
The 12/28/2023 10:41, Adhemerval Zanella Netto wrote:
> On 08/12/23 13:32, Szabolcs Nagy wrote:
> > The runtime support routines for the call ABI of the Scalable Matrix
> > Extension (SME) are mostly in libgcc. Since libc.so cannot depend on
> > libgcc_s.so have an implementation of __arm_za_disable in libc for
> > libc internal use in longjmp and similar APIs.
> >
> > __libc_arm_za_disable follows the same PCS rules as __arm_za_disable,
> > but it's a hidden symbol so it does not need variant PCS marking.
> >
> > Using __libc_fatal instead of abort because it can print a message and
> > works in ld.so too. But for now we don't need SME routines in ld.so.
> >
> > To check the SME HWCAP in asm, we need the _dl_hwcap2 member offset in
> > _rtld_global_ro in the shared libc.so, while in libc.a the _dl_hwcap2
> > object is accessed.
>
> LGTM, thanks.
>
> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
thanks.
> > -sysdep_routines += __mtag_tag_zero_region \
> > - __mtag_tag_region
> > +sysdep_routines += \
> > + __mtag_tag_zero_region \
> > + __mtag_tag_region \
> > + __arm_za_disable
> > endif
> >
> > ifeq ($(subdir),malloc)
>
> Ok (although usually the Makefile reflow makes more sense as an
> unrelated patch).
I kept this, as it is a minor refactor.
> > + /* check reserved bytes. */
>
> Maybe add that the action chosen is to abort if non-zero bytes
> are found.
updated the comment.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 4/4] aarch64: Add longjmp test for SME
2023-12-28 14:36 ` Adhemerval Zanella Netto
@ 2024-01-02 17:20 ` Szabolcs Nagy
0 siblings, 0 replies; 11+ messages in thread
From: Szabolcs Nagy @ 2024-01-02 17:20 UTC (permalink / raw)
To: Adhemerval Zanella Netto, libc-alpha
The 12/28/2023 11:36, Adhemerval Zanella Netto wrote:
> On 08/12/23 13:32, Szabolcs Nagy wrote:
> > Includes test for setcontext too.
> >
> > The test directly checks after longjmp if ZA got disabled and the
> > ZA contents got saved following the lazy saving scheme. It does not
> > use ACLE code to verify that gcc can interoperate with glibc.
>
> LGTM, thanks. Some minor suggestions below.
>
> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
thanks.
> > + hwcap2 = getauxval (AT_HWCAP2);
> > + if ((hwcap2 & HWCAP2_SME) == 0)
> > + return 77;
>
> Use EXIT_UNSUPPORTED here.
changed.
> > + za_orig = malloc (svl*svl);
> > + za_save = malloc (svl*svl);
> > + za_dump = malloc (svl*svl);
>
> Use xmalloc here (or xcalloc).
updated to xmalloc.
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-01-02 17:21 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-08 16:31 [PATCH 0/4] aarch64: Add SME support Szabolcs Nagy
2023-12-08 16:32 ` [PATCH 1/4] aarch64: Add SME runtime support Szabolcs Nagy
2023-12-28 13:41 ` Adhemerval Zanella Netto
2024-01-02 17:15 ` Szabolcs Nagy
2023-12-08 16:32 ` [PATCH 2/4] aarch64: Add longjmp support for SME Szabolcs Nagy
2023-12-28 13:42 ` Adhemerval Zanella Netto
2023-12-08 16:32 ` [PATCH 3/4] aarch64: Add setcontext " Szabolcs Nagy
2023-12-28 13:42 ` Adhemerval Zanella Netto
2023-12-08 16:32 ` [PATCH 4/4] aarch64: Add longjmp test " Szabolcs Nagy
2023-12-28 14:36 ` Adhemerval Zanella Netto
2024-01-02 17:20 ` Szabolcs Nagy