public inbox for libc-alpha@sourceware.org
* [PATCH 0/4] aarch64: Add SME support
@ 2023-12-08 16:31 Szabolcs Nagy
  2023-12-08 16:32 ` [PATCH 1/4] aarch64: Add SME runtime support Szabolcs Nagy
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Szabolcs Nagy @ 2023-12-08 16:31 UTC (permalink / raw)
  To: libc-alpha

Most of the SME (Scalable Matrix Extension) runtime support is
in libgcc, but a bit of it has to be repeated in libc so that we
don't depend on libgcc to handle the ZA register state in
longjmp.

Szabolcs Nagy (4):
  aarch64: Add SME runtime support
  aarch64: Add longjmp support for SME
  aarch64: Add setcontext support for SME
  aarch64: Add longjmp test for SME

 sysdeps/aarch64/Makefile                     |  13 +-
 sysdeps/aarch64/__arm_za_disable.S           | 112 ++++++++
 sysdeps/aarch64/__longjmp.S                  |  22 ++
 sysdeps/aarch64/rtld-global-offsets.sym      |  10 +
 sysdeps/aarch64/tst-sme-jmp.c                | 278 +++++++++++++++++++
 sysdeps/unix/sysv/linux/aarch64/setcontext.S |  19 ++
 6 files changed, 451 insertions(+), 3 deletions(-)
 create mode 100644 sysdeps/aarch64/__arm_za_disable.S
 create mode 100644 sysdeps/aarch64/rtld-global-offsets.sym
 create mode 100644 sysdeps/aarch64/tst-sme-jmp.c

-- 
2.25.1



* [PATCH 1/4] aarch64: Add SME runtime support
  2023-12-08 16:31 [PATCH 0/4] aarch64: Add SME support Szabolcs Nagy
@ 2023-12-08 16:32 ` Szabolcs Nagy
  2023-12-28 13:41   ` Adhemerval Zanella Netto
  2023-12-08 16:32 ` [PATCH 2/4] aarch64: Add longjmp support for SME Szabolcs Nagy
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Szabolcs Nagy @ 2023-12-08 16:32 UTC (permalink / raw)
  To: libc-alpha

The runtime support routines for the call ABI of the Scalable Matrix
Extension (SME) are mostly in libgcc. Since libc.so cannot depend on
libgcc_s.so, add an implementation of __arm_za_disable to libc for
libc-internal use in longjmp and similar APIs.

__libc_arm_za_disable follows the same PCS rules as __arm_za_disable,
but it's a hidden symbol so it does not need variant PCS marking.

__libc_fatal is used instead of abort because it can print a message
and also works in ld.so, although for now we don't need SME routines
in ld.so.

To check the SME HWCAP in asm, we need the _dl_hwcap2 member offset in
_rtld_global_ro in the shared libc.so, while in libc.a the _dl_hwcap2
object is accessed.
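
As a rough illustration only, the HWCAP test done by the tbz instruction
can be written in C as below. The helper name hwcap2_has_sme is made up
for this sketch, and its argument stands in for the value that the
assembly loads from _dl_hwcap2 (or from _rtld_global_ro at
GLRO_DL_HWCAP2_OFFSET in libc.so); only the bit number 23 comes from
the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Bit index of the SME capability in AT_HWCAP2, as defined in the patch.  */
#define HWCAP2_SME_BIT 23

/* The bit has to fit in the type that carries the hwcap2 value.  */
static_assert (HWCAP2_SME_BIT < sizeof (unsigned long) * CHAR_BIT,
	       "HWCAP2_SME_BIT out of range");

/* Hypothetical helper: C equivalent of `tbz x14, HWCAP2_SME_BIT, L(end)'.
   Returns true when the SME bit is set in HWCAP2.  */
static bool
hwcap2_has_sme (unsigned long hwcap2)
{
  return ((hwcap2 >> HWCAP2_SME_BIT) & 1UL) != 0;
}
```

In user code the same value would normally come from
getauxval (AT_HWCAP2), which is what the test in this series does.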
---
 sysdeps/aarch64/Makefile                |  10 ++-
 sysdeps/aarch64/__arm_za_disable.S      | 112 ++++++++++++++++++++++++
 sysdeps/aarch64/rtld-global-offsets.sym |  10 +++
 3 files changed, 129 insertions(+), 3 deletions(-)
 create mode 100644 sysdeps/aarch64/__arm_za_disable.S
 create mode 100644 sysdeps/aarch64/rtld-global-offsets.sym

diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 6a9559e5f5..9d8844d9c8 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -48,7 +48,9 @@ endif
 endif
 
 ifeq ($(subdir),csu)
-gen-as-const-headers += tlsdesc.sym
+gen-as-const-headers += \
+  tlsdesc.sym \
+  rtld-global-offsets.sym
 endif
 
 ifeq ($(subdir),gmon)
@@ -62,8 +64,10 @@ endif
 
 ifeq ($(subdir),misc)
 sysdep_headers += sys/ifunc.h
-sysdep_routines += __mtag_tag_zero_region \
-		   __mtag_tag_region
+sysdep_routines += \
+  __mtag_tag_zero_region \
+  __mtag_tag_region \
+  __arm_za_disable
 endif
 
 ifeq ($(subdir),malloc)
diff --git a/sysdeps/aarch64/__arm_za_disable.S b/sysdeps/aarch64/__arm_za_disable.S
new file mode 100644
index 0000000000..f9e2d942f2
--- /dev/null
+++ b/sysdeps/aarch64/__arm_za_disable.S
@@ -0,0 +1,112 @@
+/* Libc internal support routine for SME.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <rtld-global-offsets.h>
+
+#define HWCAP2_SME_BIT 23
+
+/* Disable ZA.  Call ABI:
+   - Private ZA, streaming-compatible.
+   - x0-x13, x19-x29, sp and fp regs are call preserved.
+   - On return tpidr2_el0 = 0, ZA = 0.
+   - Takes no argument.
+   - Does not return a value.
+   - Can abort on failure (then registers are not preserved).  */
+
+ENTRY (__libc_arm_za_disable)
+
+	/* Check if SME is available.  */
+#ifdef SHARED
+	/* In libc.so.  */
+	adrp	x14, :got:_rtld_global_ro
+	ldr	x14, [x14, :got_lo12:_rtld_global_ro]
+	ldr	x14, [x14, GLRO_DL_HWCAP2_OFFSET]
+#else
+	/* In libc.a, may be PIC.  */
+	adrp	x14, _dl_hwcap2
+	ldr	x14, [x14, :lo12:_dl_hwcap2]
+#endif
+	tbz	x14, HWCAP2_SME_BIT, L(end)
+
+	.inst	0xd53bd0ae  /* mrs	x14, tpidr2_el0  */
+	cbz	x14, L(end)
+
+	/* check reserved bytes.  */
+	ldrh	w15, [x14, 10]
+	ldr	w16, [x14, 12]
+	orr	w15, w15, w16
+	cbnz	w15, L(fail)
+
+	ldr	x16, [x14]
+	cbz	x16, L(end)
+	ldrh	w17, [x14, 8]
+	cbz	w17, L(end)
+
+	/* x14: tpidr2, x15: 0,
+	   x16: za_save_buffer, x17: num_za_save_slices.  */
+
+L(save_loop):
+	.inst	0xe1206200  /* str	za[w15, 0], [x16]  */
+	.inst	0xe1206201  /* str	za[w15, 1], [x16, 1, mul vl] */
+	.inst	0xe1206202  /* str	za[w15, 2], [x16, 2, mul vl] */
+	.inst	0xe1206203  /* str	za[w15, 3], [x16, 3, mul vl] */
+	.inst	0xe1206204  /* str	za[w15, 4], [x16, 4, mul vl] */
+	.inst	0xe1206205  /* str	za[w15, 5], [x16, 5, mul vl] */
+	.inst	0xe1206206  /* str	za[w15, 6], [x16, 6, mul vl] */
+	.inst	0xe1206207  /* str	za[w15, 7], [x16, 7, mul vl] */
+	.inst	0xe1206208  /* str	za[w15, 8], [x16, 8, mul vl] */
+	.inst	0xe1206209  /* str	za[w15, 9], [x16, 9, mul vl] */
+	.inst	0xe120620a  /* str	za[w15, 10], [x16, 10, mul vl] */
+	.inst	0xe120620b  /* str	za[w15, 11], [x16, 11, mul vl] */
+	.inst	0xe120620c  /* str	za[w15, 12], [x16, 12, mul vl] */
+	.inst	0xe120620d  /* str	za[w15, 13], [x16, 13, mul vl] */
+	.inst	0xe120620e  /* str	za[w15, 14], [x16, 14, mul vl] */
+	.inst	0xe120620f  /* str	za[w15, 15], [x16, 15, mul vl] */
+	add	w15, w15, 16
+	.inst	0x04305a10  /* addsvl	x16, x16, 16  */
+	cmp	w17, w15
+	bhi	L(save_loop)
+	.inst	0xd51bd0bf  /* msr	tpidr2_el0, xzr  */
+	.inst	0xd503447f  /* smstop	za  */
+L(end):
+	ret
+L(fail):
+#if HAVE_AARCH64_PAC_RET
+	PACIASP
+	cfi_window_save
+#endif
+	stp	x29, x30, [sp, -32]!
+	cfi_adjust_cfa_offset (32)
+	cfi_rel_offset (x29, 0)
+	cfi_rel_offset (x30, 8)
+	mov	x29, sp
+	.inst	0x04e0e3f0  /* cntd	x16  */
+	str	x16, [sp, 16]
+	cfi_rel_offset (46, 16)
+	.inst	0xd503467f  /* smstop  */
+	adrp	x0, L(msg)
+	add	x0, x0, :lo12:L(msg)
+	bl	HIDDEN_JUMPTARGET (__libc_fatal)
+END (__libc_arm_za_disable)
+
+	.section        .rodata
+	.align  3
+L(msg):
+	.string "FATAL: __libc_arm_za_disable failed.\n"
diff --git a/sysdeps/aarch64/rtld-global-offsets.sym b/sysdeps/aarch64/rtld-global-offsets.sym
new file mode 100644
index 0000000000..23cdaf7d9e
--- /dev/null
+++ b/sysdeps/aarch64/rtld-global-offsets.sym
@@ -0,0 +1,10 @@
+#define SHARED 1
+
+#include <ldsodefs.h>
+
+#define GLRO_offsetof(name) offsetof (struct rtld_global_ro, _##name)
+
+-- Offsets of _rtld_global_ro in libc.so
+
+GLRO_DL_HWCAP_OFFSET	GLRO_offsetof (dl_hwcap)
+GLRO_DL_HWCAP2_OFFSET	GLRO_offsetof (dl_hwcap2)
-- 
2.25.1



* [PATCH 2/4] aarch64: Add longjmp support for SME
  2023-12-08 16:31 [PATCH 0/4] aarch64: Add SME support Szabolcs Nagy
  2023-12-08 16:32 ` [PATCH 1/4] aarch64: Add SME runtime support Szabolcs Nagy
@ 2023-12-08 16:32 ` Szabolcs Nagy
  2023-12-28 13:42   ` Adhemerval Zanella Netto
  2023-12-08 16:32 ` [PATCH 3/4] aarch64: Add setcontext " Szabolcs Nagy
  2023-12-08 16:32 ` [PATCH 4/4] aarch64: Add longjmp test " Szabolcs Nagy
  3 siblings, 1 reply; 11+ messages in thread
From: Szabolcs Nagy @ 2023-12-08 16:32 UTC (permalink / raw)
  To: libc-alpha

For the ZA lazy saving scheme to work, longjmp has to call
__libc_arm_za_disable.

In ld.so we assume ZA is not used so longjmp does not need
special support there.
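
For readers unfamiliar with the lazy saving scheme, here is a portable
C model of what the __libc_arm_za_disable assembly from patch 1 appears
to do when longjmp calls it. This is a sketch under assumptions, not
the real routine: struct sim_state and sim_za_disable are invented for
illustration, ZA and TPIDR2_EL0 are simulated with ordinary memory, the
block layout assumes an LP64 target, and a -1 return marks the point
where the assembly would call __libc_fatal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* TPIDR2 block, matching the offsets the assembly reads on LP64:
   buffer at 0, slice count at 8, reserved bytes at 10..15.  */
struct tpidr2_block
{
  void *za_save_buffer;
  uint16_t num_za_save_slices;
  uint8_t reserved[6];
};

/* Hypothetical stand-in for the hardware state.  */
struct sim_state
{
  struct tpidr2_block *tpidr2;	/* Models TPIDR2_EL0.  */
  int za_enabled;		/* Models PSTATE.ZA.  */
  const uint8_t *za;		/* Models ZA: svl rows of svl bytes.  */
  size_t svl;			/* Streaming vector length in bytes.  */
};

/* Returns 0 on success, -1 where the assembly would abort.  */
static int
sim_za_disable (struct sim_state *s)
{
  struct tpidr2_block *blk = s->tpidr2;

  /* No block installed: the posted assembly returns immediately.  */
  if (blk == NULL)
    return 0;

  /* Non-zero reserved bytes are treated as a fatal error.  */
  for (size_t i = 0; i < sizeof blk->reserved; i++)
    if (blk->reserved[i] != 0)
      return -1;

  /* Nothing to save: the posted assembly also returns here.  */
  if (blk->za_save_buffer == NULL || blk->num_za_save_slices == 0)
    return 0;

  /* The save loop stores 16 slices per iteration, so the count is
     effectively rounded up to a multiple of 16.  */
  size_t rows = ((size_t) blk->num_za_save_slices + 15) / 16 * 16;
  assert (rows <= s->svl);
  memcpy (blk->za_save_buffer, s->za, rows * s->svl);

  s->tpidr2 = NULL;	/* msr tpidr2_el0, xzr  */
  s->za_enabled = 0;	/* smstop za  */
  return 0;
}
```

The test in patch 4 checks the same three outcomes on real hardware:
tpidr2 reads back as 0, SVCR is 0, and the save buffer matches the
original ZA contents.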
---
 sysdeps/aarch64/__longjmp.S | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/sysdeps/aarch64/__longjmp.S b/sysdeps/aarch64/__longjmp.S
index d743e7478d..659199d7d4 100644
--- a/sysdeps/aarch64/__longjmp.S
+++ b/sysdeps/aarch64/__longjmp.S
@@ -49,6 +49,28 @@ ENTRY (__longjmp)
 
 	PTR_ARG (0)
 
+#if IS_IN(libc)
+	/* Disable ZA state of SME in libc.a and libc.so, but not in ld.so.  */
+# if HAVE_AARCH64_PAC_RET
+	PACIASP
+	cfi_window_save
+# endif
+	stp	x29, x30, [sp, -16]!
+	cfi_adjust_cfa_offset (16)
+	cfi_rel_offset (x29, 0)
+	cfi_rel_offset (x30, 8)
+	mov	x29, sp
+	bl	__libc_arm_za_disable
+	ldp	x29, x30, [sp], 16
+	cfi_adjust_cfa_offset (-16)
+	cfi_restore (x29)
+	cfi_restore (x30)
+# if HAVE_AARCH64_PAC_RET
+	AUTIASP
+	cfi_window_save
+# endif
+#endif
+
 	ldp	x19, x20, [x0, #JB_X19<<3]
 	ldp	x21, x22, [x0, #JB_X21<<3]
 	ldp	x23, x24, [x0, #JB_X23<<3]
-- 
2.25.1



* [PATCH 3/4] aarch64: Add setcontext support for SME
  2023-12-08 16:31 [PATCH 0/4] aarch64: Add SME support Szabolcs Nagy
  2023-12-08 16:32 ` [PATCH 1/4] aarch64: Add SME runtime support Szabolcs Nagy
  2023-12-08 16:32 ` [PATCH 2/4] aarch64: Add longjmp support for SME Szabolcs Nagy
@ 2023-12-08 16:32 ` Szabolcs Nagy
  2023-12-28 13:42   ` Adhemerval Zanella Netto
  2023-12-08 16:32 ` [PATCH 4/4] aarch64: Add longjmp test " Szabolcs Nagy
  3 siblings, 1 reply; 11+ messages in thread
From: Szabolcs Nagy @ 2023-12-08 16:32 UTC (permalink / raw)
  To: libc-alpha

For the ZA lazy saving scheme to work, setcontext has to call
__libc_arm_za_disable.

Also fixes swapcontext which uses setcontext internally.
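
The control flow affected by this change can be sketched in plain C as
below. No SME state is touched, so this only shows the
getcontext/setcontext pattern (the same one the test in patch 4 uses)
and the point at which setcontext would now disable ZA. The function
name resume_once is made up for the example.

```c
#include <assert.h>
#include <ucontext.h>

/* Hypothetical helper: capture a context, resume it exactly once and
   report how many times execution passed the getcontext call site.
   Returns -1 if getcontext or setcontext fails.  */
static int
resume_once (void)
{
  /* volatile so the values survive the jump back to getcontext.  */
  volatile int passes = 0;
  volatile int resumed = 0;
  ucontext_t ctx;

  if (getcontext (&ctx) != 0)
    return -1;
  passes++;
  if (!resumed)
    {
      resumed = 1;
      /* With this patch, ZA would be disabled inside setcontext
	 before the saved registers are restored.  */
      setcontext (&ctx);
      return -1;	/* Only reached if setcontext failed.  */
    }
  return passes;
}
```

A successful call passes the getcontext site twice, once directly and
once after setcontext resumes the saved context.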
---
 sysdeps/unix/sysv/linux/aarch64/setcontext.S | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/sysdeps/unix/sysv/linux/aarch64/setcontext.S b/sysdeps/unix/sysv/linux/aarch64/setcontext.S
index d7756015c5..fe75adf61e 100644
--- a/sysdeps/unix/sysv/linux/aarch64/setcontext.S
+++ b/sysdeps/unix/sysv/linux/aarch64/setcontext.S
@@ -49,6 +49,25 @@ ENTRY (__setcontext)
 	cbz	x0, 1f
 	b	C_SYMBOL_NAME (__syscall_error)
 1:
+	/* Disable ZA of SME.  */
+#if HAVE_AARCH64_PAC_RET
+	PACIASP
+	cfi_window_save
+#endif
+	stp	x29, x30, [sp, -16]!
+	cfi_adjust_cfa_offset (16)
+	cfi_rel_offset (x29, 0)
+	cfi_rel_offset (x30, 8)
+	mov	x29, sp
+	bl	__libc_arm_za_disable
+	ldp	x29, x30, [sp], 16
+	cfi_adjust_cfa_offset (-16)
+	cfi_restore (x29)
+	cfi_restore (x30)
+#if HAVE_AARCH64_PAC_RET
+	AUTIASP
+	cfi_window_save
+#endif
 	/* Restore the general purpose registers.  */
 	mov	x0, x9
 	cfi_def_cfa (x0, 0)
-- 
2.25.1



* [PATCH 4/4] aarch64: Add longjmp test for SME
  2023-12-08 16:31 [PATCH 0/4] aarch64: Add SME support Szabolcs Nagy
                   ` (2 preceding siblings ...)
  2023-12-08 16:32 ` [PATCH 3/4] aarch64: Add setcontext " Szabolcs Nagy
@ 2023-12-08 16:32 ` Szabolcs Nagy
  2023-12-28 14:36   ` Adhemerval Zanella Netto
  3 siblings, 1 reply; 11+ messages in thread
From: Szabolcs Nagy @ 2023-12-08 16:32 UTC (permalink / raw)
  To: libc-alpha

Includes a test for setcontext too.

The test directly checks after longjmp whether ZA got disabled and
whether the ZA contents got saved following the lazy saving scheme.
The test does not use ACLE code, so it does not verify that gcc can
interoperate with glibc.
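
The data setup in do_test can be summarised by the following sketch.
The helper names svl_is_valid and fill_pattern are invented here; the
bounds and the fill rule are taken from the test itself. The upper
bound presumably reflects the 16-bit num_za_save_slices field, but
that reading is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same sanity check as in do_test: SVL in bytes must be a multiple
   of 16, at least 16, and below 1 << 16.  */
static bool
svl_is_valid (unsigned long svl)
{
  return svl >= 16 && svl % 16 == 0 && svl < (1UL << 16);
}

/* Fill an svl x svl byte matrix so that every byte holds its own
   linear index truncated to 8 bits, as do_test does for za_orig.  */
static void
fill_pattern (uint8_t *za, unsigned long svl)
{
  assert (svl_is_valid (svl));
  for (unsigned long i = 0; i < svl; i++)
    for (unsigned long j = 0; j < svl; j++)
      za[i * svl + j] = (uint8_t) (i * svl + j);
}
```

After longjmp or setcontext, the test compares the save buffer against
this pattern with memcmp over svl*svl bytes.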
---
 sysdeps/aarch64/Makefile      |   3 +
 sysdeps/aarch64/tst-sme-jmp.c | 278 ++++++++++++++++++++++++++++++++++
 2 files changed, 281 insertions(+)
 create mode 100644 sysdeps/aarch64/tst-sme-jmp.c

diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 9d8844d9c8..141d7d9cc2 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -68,6 +68,9 @@ sysdep_routines += \
   __mtag_tag_zero_region \
   __mtag_tag_region \
   __arm_za_disable
+
+tests += \
+  tst-sme-jmp
 endif
 
 ifeq ($(subdir),malloc)
diff --git a/sysdeps/aarch64/tst-sme-jmp.c b/sysdeps/aarch64/tst-sme-jmp.c
new file mode 100644
index 0000000000..08dd291b0c
--- /dev/null
+++ b/sysdeps/aarch64/tst-sme-jmp.c
@@ -0,0 +1,278 @@
+/* Test for SME longjmp.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <setjmp.h>
+#include <ucontext.h>
+#include <sys/auxv.h>
+#include <support/check.h>
+
+struct blk {
+  void *za_save_buffer;
+  uint16_t num_za_save_slices;
+  char __reserved[6];
+};
+
+static unsigned long svl;
+static uint8_t *za_orig;
+static uint8_t *za_dump;
+static uint8_t *za_save;
+
+static unsigned long
+get_svl (void)
+{
+  register unsigned long x0 asm ("x0");
+  asm volatile (
+    ".inst   0x04bf5820  /* rdsvl   x0, 1  */\n"
+    : "=r" (x0));
+  return x0;
+}
+
+/* PSTATE.ZA = 1, set ZA state to active.  */
+static void
+start_za (void)
+{
+  asm volatile (
+    ".inst   0xd503457f  /* smstart za  */");
+}
+
+/* Read SVCR to get SM (bit0) and ZA (bit1) state.  */
+static unsigned long
+get_svcr (void)
+{
+  register unsigned long x0 asm ("x0");
+  asm volatile (
+    ".inst   0xd53b4240  /* mrs     x0, svcr  */\n"
+    : "=r" (x0));
+  return x0;
+}
+
+/* Load data into ZA byte by byte from p.  */
+static void __attribute__ ((noinline))
+load_za (const void *p)
+{
+  register unsigned long x15 asm ("x15") = 0;
+  register unsigned long x16 asm ("x16") = (unsigned long)p;
+  register unsigned long x17 asm ("x17") = svl;
+
+  asm volatile (
+    ".inst   0xd503437f  /* smstart sm  */\n"
+    ".L_ldr_loop:\n"
+    ".inst   0xe1006200  /* ldr     za[w15, 0], [x16]  */\n"
+    "add     w15, w15, 1\n"
+    ".inst   0x04305030  /* addvl   x16, x16, 1  */\n"
+    "cmp     w15, w17\n"
+    "bne     .L_ldr_loop\n"
+    ".inst   0xd503427f  /* smstop  sm  */\n"
+    : "+r"(x15), "+r"(x16), "+r"(x17));
+}
+
+/* Set tpidr2 to BLK.  */
+static void
+set_tpidr2 (struct blk *blk)
+{
+  register unsigned long x0 asm ("x0") = (unsigned long)blk;
+  asm volatile (
+    ".inst   0xd51bd0a0  /* msr     tpidr2_el0, x0  */\n"
+    :: "r"(x0) : "memory");
+}
+
+/* Returns tpidr2.  */
+static void *
+get_tpidr2 (void)
+{
+  register unsigned long x0 asm ("x0");
+  asm volatile (
+    ".inst   0xd53bd0a0  /* mrs     x0, tpidr2_el0  */\n"
+    : "=r"(x0) :: "memory");
+  return (void *) x0;
+}
+
+static void
+print_data(const char *msg, void *p)
+{
+  unsigned char *a = p;
+  printf ("%s:\n", msg);
+  for (int i = 0; i < svl; i++)
+    {
+      printf ("%d: ", i);
+      for (int j = 0; j < svl; j++)
+	printf("%02x,", a[i*svl+j]);
+      printf("\n");
+    }
+  printf(".\n");
+  fflush (stdout);
+}
+
+__attribute__ ((noinline))
+static void
+do_longjmp (jmp_buf env)
+{
+  longjmp (env, 1);
+}
+
+__attribute__ ((noinline))
+static void
+do_setcontext (const ucontext_t *p)
+{
+  setcontext (p);
+}
+
+static void
+longjmp_test (void)
+{
+  unsigned long svcr;
+  jmp_buf env;
+  void *p;
+  int r;
+  struct blk blk = {za_save, svl, {0}};
+
+  printf ("longjmp test:\n");
+  p = get_tpidr2 ();
+  printf ("initial tp2 = %p\n", p);
+  if (p != NULL)
+    FAIL_EXIT1 ("tpidr2 is not initialized to 0");
+  svcr = get_svcr ();
+  if (svcr != 0)
+    FAIL_EXIT1 ("svcr != 0: %lu", svcr);
+  set_tpidr2 (&blk);
+  start_za ();
+  load_za (za_orig);
+
+  print_data ("za save space", za_save);
+  p = get_tpidr2 ();
+  printf ("before setjmp: tp2 = %p\n", p);
+  if (p != &blk)
+    FAIL_EXIT1 ("tpidr2 is not set to BLK %p", (void *)&blk);
+  if (setjmp (env) == 0)
+    {
+      p = get_tpidr2 ();
+      printf ("before longjmp: tp2 = %p\n", p);
+      if (p != &blk)
+	FAIL_EXIT1 ("tpidr2 is clobbered");
+      do_longjmp (env);
+      FAIL_EXIT1 ("longjmp returned");
+    }
+  p = get_tpidr2 ();
+  printf ("after longjmp: tp2 = %p\n", p);
+  if (p != NULL)
+    FAIL_EXIT1 ("tpidr2 is not set to 0");
+  svcr = get_svcr ();
+  if (svcr != 0)
+    FAIL_EXIT1 ("svcr != 0: %lu", svcr);
+  print_data ("za save space", za_save);
+  r = memcmp (za_orig, za_save, svl*svl);
+  if (r != 0)
+    FAIL_EXIT1 ("saving za failed");
+}
+
+static void
+setcontext_test (void)
+{
+  volatile int setcontext_done = 0;
+  unsigned long svcr;
+  ucontext_t ctx;
+  void *p;
+  int r;
+  struct blk blk = {za_save, svl, {0}};
+
+  printf ("setcontext test:\n");
+  p = get_tpidr2 ();
+  printf ("initial tp2 = %p\n", p);
+  if (p != NULL)
+    FAIL_EXIT1 ("tpidr2 is not initialized to 0");
+  svcr = get_svcr ();
+  if (svcr != 0)
+    FAIL_EXIT1 ("svcr != 0: %lu", svcr);
+  set_tpidr2 (&blk);
+  start_za ();
+  load_za (za_orig);
+
+  print_data ("za save space", za_save);
+  p = get_tpidr2 ();
+  printf ("before getcontext: tp2 = %p\n", p);
+  if (p != &blk)
+    FAIL_EXIT1 ("tpidr2 is not set to BLK %p", (void *)&blk);
+  r = getcontext (&ctx);
+  if (r != 0)
+    FAIL_EXIT1 ("getcontext failed");
+  if (setcontext_done == 0)
+    {
+      p = get_tpidr2 ();
+      printf ("before setcontext: tp2 = %p\n", p);
+      if (p != &blk)
+	FAIL_EXIT1 ("tpidr2 is clobbered");
+      setcontext_done = 1;
+      do_setcontext (&ctx);
+      FAIL_EXIT1 ("setcontext returned");
+    }
+  p = get_tpidr2 ();
+  printf ("after setcontext: tp2 = %p\n", p);
+  if (p != NULL)
+    FAIL_EXIT1 ("tpidr2 is not set to 0");
+  svcr = get_svcr ();
+  if (svcr != 0)
+    FAIL_EXIT1 ("svcr != 0: %lu", svcr);
+  print_data ("za save space", za_save);
+  r = memcmp (za_orig, za_save, svl*svl);
+  if (r != 0)
+    FAIL_EXIT1 ("saving za failed");
+}
+
+static int
+do_test (void)
+{
+  unsigned long hwcap2;
+
+  hwcap2 = getauxval (AT_HWCAP2);
+  if ((hwcap2 & HWCAP2_SME) == 0)
+    return 77;
+
+  svl = get_svl ();
+  printf ("svl: %lu\n", svl);
+  if (svl < 16 || svl % 16 != 0 || svl >= (1 << 16))
+    FAIL_EXIT1 ("invalid svl");
+
+  za_orig = malloc (svl*svl);
+  za_save = malloc (svl*svl);
+  za_dump = malloc (svl*svl);
+  memset (za_orig, 1, svl*svl);
+  memset (za_save, 2, svl*svl);
+  memset (za_dump, 3, svl*svl);
+  for (int i = 0; i < svl; i++)
+    for (int j = 0; j < svl; j++)
+      za_orig[i*svl+j] = i*svl+j;
+  print_data ("original data", za_orig);
+
+  longjmp_test ();
+
+  memset (za_save, 2, svl*svl);
+  memset (za_dump, 3, svl*svl);
+
+  setcontext_test ();
+
+  free (za_orig);
+  free (za_save);
+  free (za_dump);
+  return 0;
+}
+
+#include <support/test-driver.c>
-- 
2.25.1



* Re: [PATCH 1/4] aarch64: Add SME runtime support
  2023-12-08 16:32 ` [PATCH 1/4] aarch64: Add SME runtime support Szabolcs Nagy
@ 2023-12-28 13:41   ` Adhemerval Zanella Netto
  2024-01-02 17:15     ` Szabolcs Nagy
  0 siblings, 1 reply; 11+ messages in thread
From: Adhemerval Zanella Netto @ 2023-12-28 13:41 UTC (permalink / raw)
  To: libc-alpha, Szabolcs Nagy



On 08/12/23 13:32, Szabolcs Nagy wrote:
> The runtime support routines for the call ABI of the Scalable Matrix
> Extension (SME) are mostly in libgcc. Since libc.so cannot depend on
> libgcc_s.so have an implementation of __arm_za_disable in libc for
> libc internal use in longjmp and similar APIs.
> 
> __libc_arm_za_disable follows the same PCS rules as __arm_za_disable,
> but it's a hidden symbol so it does not need variant PCS marking.
> 
> Using __libc_fatal instead of abort because it can print a message and
> works in ld.so too. But for now we don't need SME routines in ld.so.
> 
> To check the SME HWCAP in asm, we need the _dl_hwcap2 member offset in
> _rtld_global_ro in the shared libc.so, while in libc.a the _dl_hwcap2
> object is accessed.

LGTM, thanks.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> ---
>  sysdeps/aarch64/Makefile                |  10 ++-
>  sysdeps/aarch64/__arm_za_disable.S      | 112 ++++++++++++++++++++++++
>  sysdeps/aarch64/rtld-global-offsets.sym |  10 +++
>  3 files changed, 129 insertions(+), 3 deletions(-)
>  create mode 100644 sysdeps/aarch64/__arm_za_disable.S
>  create mode 100644 sysdeps/aarch64/rtld-global-offsets.sym
> 
> diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
> index 6a9559e5f5..9d8844d9c8 100644
> --- a/sysdeps/aarch64/Makefile
> +++ b/sysdeps/aarch64/Makefile
> @@ -48,7 +48,9 @@ endif
>  endif
>  
>  ifeq ($(subdir),csu)
> -gen-as-const-headers += tlsdesc.sym
> +gen-as-const-headers += \
> +  tlsdesc.sym \
> +  rtld-global-offsets.sym
>  endif
>  
>  ifeq ($(subdir),gmon)
> @@ -62,8 +64,10 @@ endif
>  
>  ifeq ($(subdir),misc)
>  sysdep_headers += sys/ifunc.h
> -sysdep_routines += __mtag_tag_zero_region \
> -		   __mtag_tag_region
> +sysdep_routines += \
> +  __mtag_tag_zero_region \
> +  __mtag_tag_region \
> +  __arm_za_disable
>  endif
>  
>  ifeq ($(subdir),malloc)

Ok (although it usually makes more sense for the Makefile reflow to be
an unrelated patch).

> diff --git a/sysdeps/aarch64/__arm_za_disable.S b/sysdeps/aarch64/__arm_za_disable.S
> new file mode 100644
> index 0000000000..f9e2d942f2
> --- /dev/null
> +++ b/sysdeps/aarch64/__arm_za_disable.S
> @@ -0,0 +1,112 @@
> +/* Libc internal support routine for SME.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include <rtld-global-offsets.h>
> +
> +#define HWCAP2_SME_BIT 23
> +
> +/* Disable ZA.  Call ABI:
> +   - Private ZA, streaming-compatible.
> +   - x0-x13, x19-x29, sp and fp regs are call preserved.
> +   - On return tpidr2_el0 = 0, ZA = 0.
> +   - Takes no argument.
> +   - Does not return a value.
> +   - Can abort on failure (then registers are not preserved).  */
> +
> +ENTRY (__libc_arm_za_disable)
> +
> +	/* Check if SME is available.  */
> +#ifdef SHARED
> +	/* In libc.so.  */
> +	adrp	x14, :got:_rtld_global_ro
> +	ldr	x14, [x14, :got_lo12:_rtld_global_ro]
> +	ldr	x14, [x14, GLRO_DL_HWCAP2_OFFSET]
> +#else
> +	/* In libc.a, may be PIC.  */
> +	adrp	x14, _dl_hwcap2
> +	ldr	x14, [x14, :lo12:_dl_hwcap2]
> +#endif
> +	tbz	x14, HWCAP2_SME_BIT, L(end)
> +
> +	.inst	0xd53bd0ae  /* mrs	x14, tpidr2_el0  */
> +	cbz	x14, L(end)
> +
> +	/* check reserved bytes.  */

Maybe add that the action chosen is to abort if non-zero bytes
are found.

> +	ldrh	w15, [x14, 10]
> +	ldr	w16, [x14, 12]
> +	orr	w15, w15, w16
> +	cbnz	w15, L(fail)
> +
> +	ldr	x16, [x14]
> +	cbz	x16, L(end)
> +	ldrh	w17, [x14, 8]
> +	cbz	w17, L(end)
> +
> +	/* x14: tpidr2, x15: 0,
> +	   x16: za_save_buffer, x17: num_za_save_slices.  */
> +
> +L(save_loop):
> +	.inst	0xe1206200  /* str	za[w15, 0], [x16]  */
> +	.inst	0xe1206201  /* str	za[w15, 1], [x16, 1, mul vl] */
> +	.inst	0xe1206202  /* str	za[w15, 2], [x16, 2, mul vl] */
> +	.inst	0xe1206203  /* str	za[w15, 3], [x16, 3, mul vl] */
> +	.inst	0xe1206204  /* str	za[w15, 4], [x16, 4, mul vl] */
> +	.inst	0xe1206205  /* str	za[w15, 5], [x16, 5, mul vl] */
> +	.inst	0xe1206206  /* str	za[w15, 6], [x16, 6, mul vl] */
> +	.inst	0xe1206207  /* str	za[w15, 7], [x16, 7, mul vl] */
> +	.inst	0xe1206208  /* str	za[w15, 8], [x16, 8, mul vl] */
> +	.inst	0xe1206209  /* str	za[w15, 9], [x16, 9, mul vl] */
> +	.inst	0xe120620a  /* str	za[w15, 10], [x16, 10, mul vl] */
> +	.inst	0xe120620b  /* str	za[w15, 11], [x16, 11, mul vl] */
> +	.inst	0xe120620c  /* str	za[w15, 12], [x16, 12, mul vl] */
> +	.inst	0xe120620d  /* str	za[w15, 13], [x16, 13, mul vl] */
> +	.inst	0xe120620e  /* str	za[w15, 14], [x16, 14, mul vl] */
> +	.inst	0xe120620f  /* str	za[w15, 15], [x16, 15, mul vl] */
> +	add	w15, w15, 16
> +	.inst	0x04305a10  /* addsvl	x16, x16, 16  */
> +	cmp	w17, w15
> +	bhi	L(save_loop)
> +	.inst	0xd51bd0bf  /* msr	tpidr2_el0, xzr  */
> +	.inst	0xd503447f  /* smstop	za  */
> +L(end):
> +	ret
> +L(fail):
> +#if HAVE_AARCH64_PAC_RET
> +	PACIASP
> +	cfi_window_save
> +#endif
> +	stp	x29, x30, [sp, -32]!
> +	cfi_adjust_cfa_offset (32)
> +	cfi_rel_offset (x29, 0)
> +	cfi_rel_offset (x30, 8)
> +	mov	x29, sp
> +	.inst	0x04e0e3f0  /* cntd	x16  */
> +	str	x16, [sp, 16]
> +	cfi_rel_offset (46, 16)
> +	.inst	0xd503467f  /* smstop  */
> +	adrp	x0, L(msg)
> +	add	x0, x0, :lo12:L(msg)
> +	bl	HIDDEN_JUMPTARGET (__libc_fatal)
> +END (__libc_arm_za_disable)
> +
> +	.section        .rodata
> +	.align  3
> +L(msg):
> +	.string "FATAL: __libc_arm_za_disable failed.\n"

Ok.

> diff --git a/sysdeps/aarch64/rtld-global-offsets.sym b/sysdeps/aarch64/rtld-global-offsets.sym
> new file mode 100644
> index 0000000000..23cdaf7d9e
> --- /dev/null
> +++ b/sysdeps/aarch64/rtld-global-offsets.sym
> @@ -0,0 +1,10 @@
> +#define SHARED 1
> +
> +#include <ldsodefs.h>
> +
> +#define GLRO_offsetof(name) offsetof (struct rtld_global_ro, _##name)
> +
> +-- Offsets of _rtld_global_ro in libc.so
> +
> +GLRO_DL_HWCAP_OFFSET	GLRO_offsetof (dl_hwcap)

It seems to be unused.

> +GLRO_DL_HWCAP2_OFFSET	GLRO_offsetof (dl_hwcap2)

OK.


* Re: [PATCH 2/4] aarch64: Add longjmp support for SME
  2023-12-08 16:32 ` [PATCH 2/4] aarch64: Add longjmp support for SME Szabolcs Nagy
@ 2023-12-28 13:42   ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 11+ messages in thread
From: Adhemerval Zanella Netto @ 2023-12-28 13:42 UTC (permalink / raw)
  To: Szabolcs Nagy, libc-alpha



On 08/12/23 13:32, Szabolcs Nagy wrote:
> For the ZA lazy saving scheme to work, longjmp has to call
> __libc_arm_za_disable.
> 
> In ld.so we assume ZA is not used so longjmp does not need
> special support there.

LGTM, thanks.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> ---
>  sysdeps/aarch64/__longjmp.S | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/sysdeps/aarch64/__longjmp.S b/sysdeps/aarch64/__longjmp.S
> index d743e7478d..659199d7d4 100644
> --- a/sysdeps/aarch64/__longjmp.S
> +++ b/sysdeps/aarch64/__longjmp.S
> @@ -49,6 +49,28 @@ ENTRY (__longjmp)
>  
>  	PTR_ARG (0)
>  
> +#if IS_IN(libc)
> +	/* Disable ZA state of SME in libc.a and libc.so, but not in ld.so.  */
> +# if HAVE_AARCH64_PAC_RET
> +	PACIASP
> +	cfi_window_save
> +# endif
> +	stp	x29, x30, [sp, -16]!
> +	cfi_adjust_cfa_offset (16)
> +	cfi_rel_offset (x29, 0)
> +	cfi_rel_offset (x30, 8)
> +	mov	x29, sp
> +	bl	__libc_arm_za_disable
> +	ldp	x29, x30, [sp], 16
> +	cfi_adjust_cfa_offset (-16)
> +	cfi_restore (x29)
> +	cfi_restore (x30)
> +# if HAVE_AARCH64_PAC_RET
> +	AUTIASP
> +	cfi_window_save
> +# endif
> +#endif
> +
>  	ldp	x19, x20, [x0, #JB_X19<<3]
>  	ldp	x21, x22, [x0, #JB_X21<<3]
>  	ldp	x23, x24, [x0, #JB_X23<<3]


* Re: [PATCH 3/4] aarch64: Add setcontext support for SME
  2023-12-08 16:32 ` [PATCH 3/4] aarch64: Add setcontext " Szabolcs Nagy
@ 2023-12-28 13:42   ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 11+ messages in thread
From: Adhemerval Zanella Netto @ 2023-12-28 13:42 UTC (permalink / raw)
  To: Szabolcs Nagy, libc-alpha



On 08/12/23 13:32, Szabolcs Nagy wrote:
> For the ZA lazy saving scheme to work, setcontext has to call
> __libc_arm_za_disable.
> 
> Also fixes swapcontext which uses setcontext internally.

LGTM, thanks.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> ---
>  sysdeps/unix/sysv/linux/aarch64/setcontext.S | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/sysdeps/unix/sysv/linux/aarch64/setcontext.S b/sysdeps/unix/sysv/linux/aarch64/setcontext.S
> index d7756015c5..fe75adf61e 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/setcontext.S
> +++ b/sysdeps/unix/sysv/linux/aarch64/setcontext.S
> @@ -49,6 +49,25 @@ ENTRY (__setcontext)
>  	cbz	x0, 1f
>  	b	C_SYMBOL_NAME (__syscall_error)
>  1:
> +	/* Disable ZA of SME.  */
> +#if HAVE_AARCH64_PAC_RET
> +	PACIASP
> +	cfi_window_save
> +#endif
> +	stp	x29, x30, [sp, -16]!
> +	cfi_adjust_cfa_offset (16)
> +	cfi_rel_offset (x29, 0)
> +	cfi_rel_offset (x30, 8)
> +	mov	x29, sp
> +	bl	__libc_arm_za_disable
> +	ldp	x29, x30, [sp], 16
> +	cfi_adjust_cfa_offset (-16)
> +	cfi_restore (x29)
> +	cfi_restore (x30)
> +#if HAVE_AARCH64_PAC_RET
> +	AUTIASP
> +	cfi_window_save
> +#endif
>  	/* Restore the general purpose registers.  */
>  	mov	x0, x9
>  	cfi_def_cfa (x0, 0)


* Re: [PATCH 4/4] aarch64: Add longjmp test for SME
  2023-12-08 16:32 ` [PATCH 4/4] aarch64: Add longjmp test " Szabolcs Nagy
@ 2023-12-28 14:36   ` Adhemerval Zanella Netto
  2024-01-02 17:20     ` Szabolcs Nagy
  0 siblings, 1 reply; 11+ messages in thread
From: Adhemerval Zanella Netto @ 2023-12-28 14:36 UTC (permalink / raw)
  To: libc-alpha, Szabolcs Nagy



On 08/12/23 13:32, Szabolcs Nagy wrote:
> Includes test for setcontext too.
> 
> The test directly checks after longjmp if ZA got disabled and the
> ZA contents got saved following the lazy saving scheme. It does not
> use ACLE code to verify that gcc can interoperate with glibc.

LGTM, thanks.  Some minor suggestions below.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> ---
>  sysdeps/aarch64/Makefile      |   3 +
>  sysdeps/aarch64/tst-sme-jmp.c | 278 ++++++++++++++++++++++++++++++++++
>  2 files changed, 281 insertions(+)
>  create mode 100644 sysdeps/aarch64/tst-sme-jmp.c
> 
> diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
> index 9d8844d9c8..141d7d9cc2 100644
> --- a/sysdeps/aarch64/Makefile
> +++ b/sysdeps/aarch64/Makefile
> @@ -68,6 +68,9 @@ sysdep_routines += \
>    __mtag_tag_zero_region \
>    __mtag_tag_region \
>    __arm_za_disable
> +
> +tests += \
> +  tst-sme-jmp
>  endif
>  
>  ifeq ($(subdir),malloc)

Ok.

> diff --git a/sysdeps/aarch64/tst-sme-jmp.c b/sysdeps/aarch64/tst-sme-jmp.c
> new file mode 100644
> index 0000000000..08dd291b0c
> --- /dev/null
> +++ b/sysdeps/aarch64/tst-sme-jmp.c
> @@ -0,0 +1,278 @@
> +/* Test for SME longjmp.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <setjmp.h>
> +#include <ucontext.h>
> +#include <sys/auxv.h>
> +#include <support/check.h>
> +
> +struct blk {
> +  void *za_save_buffer;
> +  uint16_t num_za_save_slices;
> +  char __reserved[6];
> +};
> +
> +static unsigned long svl;
> +static uint8_t *za_orig;
> +static uint8_t *za_dump;
> +static uint8_t *za_save;
> +
> +static unsigned long
> +get_svl (void)
> +{
> +  register unsigned long x0 asm ("x0");
> +  asm volatile (
> +    ".inst   0x04bf5820  /* rdsvl   x0, 1  */\n"
> +    : "=r" (x0));
> +  return x0;
> +}
> +
> +/* PSTATE.ZA = 1, set ZA state to active.  */
> +static void
> +start_za (void)
> +{
> +  asm volatile (
> +    ".inst   0xd503457f  /* smstart za  */");
> +}
> +
> +/* Read SVCR to get SM (bit0) and ZA (bit1) state.  */
> +static unsigned long
> +get_svcr (void)
> +{
> +  register unsigned long x0 asm ("x0");
> +  asm volatile (
> +    ".inst   0xd53b4240  /* mrs     x0, svcr  */\n"
> +    : "=r" (x0));
> +  return x0;
> +}
> +
> +/* Load data into ZA byte by byte from p.  */
> +static void __attribute__ ((noinline))
> +load_za (const void *p)
> +{
> +  register unsigned long x15 asm ("x15") = 0;
> +  register unsigned long x16 asm ("x16") = (unsigned long)p;
> +  register unsigned long x17 asm ("x17") = svl;
> +
> +  asm volatile (
> +    ".inst   0xd503437f  /* smstart sm  */\n"
> +    ".L_ldr_loop:\n"
> +    ".inst   0xe1006200  /* ldr     za[w15, 0], [x16]  */\n"
> +    "add     w15, w15, 1\n"
> +    ".inst   0x04305030  /* addvl   x16, x16, 1  */\n"
> +    "cmp     w15, w17\n"
> +    "bne     .L_ldr_loop\n"
> +    ".inst   0xd503427f  /* smstop  sm  */\n"
> +    : "+r"(x15), "+r"(x16), "+r"(x17));
> +}
> +
> +/* Set tpidr2 to BLK.  */
> +static void
> +set_tpidr2 (struct blk *blk)
> +{
> +  register unsigned long x0 asm ("x0") = (unsigned long)blk;
> +  asm volatile (
> +    ".inst   0xd51bd0a0  /* msr     tpidr2_el0, x0  */\n"
> +    :: "r"(x0) : "memory");
> +}
> +
> +/* Returns tpidr2.  */
> +static void *
> +get_tpidr2 (void)
> +{
> +  register unsigned long x0 asm ("x0");
> +  asm volatile (
> +    ".inst   0xd53bd0a0  /* mrs     x0, tpidr2_el0  */\n"
> +    : "=r"(x0) :: "memory");
> +  return (void *) x0;
> +}
> +
> +static void
> +print_data(const char *msg, void *p)
> +{
> +  unsigned char *a = p;
> +  printf ("%s:\n", msg);
> +  for (int i = 0; i < svl; i++)
> +    {
> +      printf ("%d: ", i);
> +      for (int j = 0; j < svl; j++)
> +	printf("%02x,", a[i*svl+j]);
> +      printf("\n");
> +    }
> +  printf(".\n");
> +  fflush (stdout);
> +}
> +
> +__attribute__ ((noinline))
> +static void
> +do_longjmp (jmp_buf env)
> +{
> +  longjmp (env, 1);
> +}
> +
> +__attribute__ ((noinline))
> +static void
> +do_setcontext (const ucontext_t *p)
> +{
> +  setcontext (p);
> +}
> +
> +static void
> +longjmp_test (void)
> +{
> +  unsigned long svcr;
> +  jmp_buf env;
> +  void *p;
> +  int r;
> +  struct blk blk = {za_save, svl, {0}};
> +
> +  printf ("longjmp test:\n");
> +  p = get_tpidr2 ();
> +  printf ("initial tp2 = %p\n", p);
> +  if (p != NULL)
> +    FAIL_EXIT1 ("tpidr2 is not initialized to 0");
> +  svcr = get_svcr ();
> +  if (svcr != 0)
> +    FAIL_EXIT1 ("svcr != 0: %lu", svcr);
> +  set_tpidr2 (&blk);
> +  start_za ();
> +  load_za (za_orig);
> +
> +  print_data ("za save space", za_save);
> +  p = get_tpidr2 ();
> +  printf ("before setjmp: tp2 = %p\n", p);
> +  if (p != &blk)
> +    FAIL_EXIT1 ("tpidr2 is not set to BLK %p", (void *)&blk);
> +  if (setjmp (env) == 0)
> +    {
> +      p = get_tpidr2 ();
> +      printf ("before longjmp: tp2 = %p\n", p);
> +      if (p != &blk)
> +	FAIL_EXIT1 ("tpidr2 is clobbered");
> +      do_longjmp (env);
> +      FAIL_EXIT1 ("longjmp returned");
> +    }
> +  p = get_tpidr2 ();
> +  printf ("after longjmp: tp2 = %p\n", p);
> +  if (p != NULL)
> +    FAIL_EXIT1 ("tpidr2 is not set to 0");
> +  svcr = get_svcr ();
> +  if (svcr != 0)
> +    FAIL_EXIT1 ("svcr != 0: %lu", svcr);
> +  print_data ("za save space", za_save);
> +  r = memcmp (za_orig, za_save, svl*svl);
> +  if (r != 0)
> +    FAIL_EXIT1 ("saving za failed");
> +}
> +
> +static void
> +setcontext_test (void)
> +{
> +  volatile int setcontext_done = 0;
> +  unsigned long svcr;
> +  ucontext_t ctx;
> +  void *p;
> +  int r;
> +  struct blk blk = {za_save, svl, {0}};
> +
> +  printf ("setcontext test:\n");
> +  p = get_tpidr2 ();
> +  printf ("initial tp2 = %p\n", p);
> +  if (p != NULL)
> +    FAIL_EXIT1 ("tpidr2 is not initialized to 0");
> +  svcr = get_svcr ();
> +  if (svcr != 0)
> +    FAIL_EXIT1 ("svcr != 0: %lu", svcr);
> +  set_tpidr2 (&blk);
> +  start_za ();
> +  load_za (za_orig);
> +
> +  print_data ("za save space", za_save);
> +  p = get_tpidr2 ();
> +  printf ("before getcontext: tp2 = %p\n", p);
> +  if (p != &blk)
> +    FAIL_EXIT1 ("tpidr2 is not set to BLK %p", (void *)&blk);
> +  r = getcontext (&ctx);
> +  if (r != 0)
> +    FAIL_EXIT1 ("getcontext failed");
> +  if (setcontext_done == 0)
> +    {
> +      p = get_tpidr2 ();
> +      printf ("before setcontext: tp2 = %p\n", p);
> +      if (p != &blk)
> +	FAIL_EXIT1 ("tpidr2 is clobbered");
> +      setcontext_done = 1;
> +      do_setcontext (&ctx);
> +      FAIL_EXIT1 ("setcontext returned");
> +    }
> +  p = get_tpidr2 ();
> +  printf ("after setcontext: tp2 = %p\n", p);
> +  if (p != NULL)
> +    FAIL_EXIT1 ("tpidr2 is not set to 0");
> +  svcr = get_svcr ();
> +  if (svcr != 0)
> +    FAIL_EXIT1 ("svcr != 0: %lu", svcr);
> +  print_data ("za save space", za_save);
> +  r = memcmp (za_orig, za_save, svl*svl);
> +  if (r != 0)
> +    FAIL_EXIT1 ("saving za failed");
> +}
> +
> +static int
> +do_test (void)
> +{
> +  unsigned long hwcap2;
> +
> +  hwcap2 = getauxval (AT_HWCAP2);
> +  if ((hwcap2 & HWCAP2_SME) == 0)
> +    return 77;

Use EXIT_UNSUPPORTED here.

> +
> +  svl = get_svl ();
> +  printf ("svl: %lu\n", svl);
> +  if (svl < 16 || svl % 16 != 0 || svl >= (1 << 16))
> +    FAIL_EXIT1 ("invalid svl");
> +
> +  za_orig = malloc (svl*svl);
> +  za_save = malloc (svl*svl);
> +  za_dump = malloc (svl*svl);

Use xmalloc here (or xcalloc).

> +  memset (za_orig, 1, svl*svl);
> +  memset (za_save, 2, svl*svl);
> +  memset (za_dump, 3, svl*svl);
> +  for (int i = 0; i < svl; i++)
> +    for (int j = 0; j < svl; j++)
> +      za_orig[i*svl+j] = i*svl+j;
> +  print_data ("original data", za_orig);
> +
> +  longjmp_test ();
> +
> +  memset (za_save, 2, svl*svl);
> +  memset (za_dump, 3, svl*svl);
> +
> +  setcontext_test ();
> +
> +  free (za_orig);
> +  free (za_save);
> +  free (za_dump);
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>


* Re: [PATCH 1/4] aarch64: Add SME runtime support
  2023-12-28 13:41   ` Adhemerval Zanella Netto
@ 2024-01-02 17:15     ` Szabolcs Nagy
  0 siblings, 0 replies; 11+ messages in thread
From: Szabolcs Nagy @ 2024-01-02 17:15 UTC (permalink / raw)
  To: Adhemerval Zanella Netto, libc-alpha

The 12/28/2023 10:41, Adhemerval Zanella Netto wrote:
> On 08/12/23 13:32, Szabolcs Nagy wrote:
> > The runtime support routines for the call ABI of the Scalable Matrix
> > Extension (SME) are mostly in libgcc. Since libc.so cannot depend on
> > libgcc_s.so have an implementation of __arm_za_disable in libc for
> > libc internal use in longjmp and similar APIs.
> > 
> > __libc_arm_za_disable follows the same PCS rules as __arm_za_disable,
> > but it's a hidden symbol so it does not need variant PCS marking.
> > 
> > Using __libc_fatal instead of abort because it can print a message and
> > works in ld.so too. But for now we don't need SME routines in ld.so.
> > 
> > To check the SME HWCAP in asm, we need the _dl_hwcap2 member offset in
> > _rtld_global_ro in the shared libc.so, while in libc.a the _dl_hwcap2
> > object is accessed.
> 
> LGTM, thanks.
> 
> Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

thanks.

> > -sysdep_routines += __mtag_tag_zero_region \
> > -		   __mtag_tag_region
> > +sysdep_routines += \
> > +  __mtag_tag_zero_region \
> > +  __mtag_tag_region \
> > +  __arm_za_disable
> >  endif
> >  
> >  ifeq ($(subdir),malloc)
> 
> Ok (although usually the Makefile reflow makes more sense as an
> unrelated patch).

i kept this as it is a minor refactor.

> > +	/* check reserved bytes.  */
> 
> Maybe add that the action chosen is to abort if non-zero bytes
> are found.

updated the comment.
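
As a C rendering of the check being commented (the real code is
assembly and is not reproduced here), a minimal sketch could look like
the following.  blk_reserved_ok is a hypothetical helper name; the
struct layout is the one used by the test in patch 4/4.

```c
#include <stddef.h>
#include <stdint.h>

/* TPIDR2 block layout as used by the test in patch 4/4.  */
struct blk
{
  void *za_save_buffer;
  uint16_t num_za_save_slices;
  char reserved[6];
};

/* Hypothetical helper: returns 1 when every reserved byte is zero
   and 0 otherwise.  Per the review discussion, the libc routine
   aborts in the non-zero case instead of returning.  */
static int
blk_reserved_ok (const struct blk *b)
{
  for (size_t i = 0; i < sizeof b->reserved; i++)
    if (b->reserved[i] != 0)
      return 0;
  return 1;
}
```

Returning a status keeps the sketch testable; the patch description
says the libc code reports the error with __libc_fatal.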


* Re: [PATCH 4/4] aarch64: Add longjmp test for SME
  2023-12-28 14:36   ` Adhemerval Zanella Netto
@ 2024-01-02 17:20     ` Szabolcs Nagy
  0 siblings, 0 replies; 11+ messages in thread
From: Szabolcs Nagy @ 2024-01-02 17:20 UTC (permalink / raw)
  To: Adhemerval Zanella Netto, libc-alpha

The 12/28/2023 11:36, Adhemerval Zanella Netto wrote:
> On 08/12/23 13:32, Szabolcs Nagy wrote:
> > Includes test for setcontext too.
> > 
> > The test directly checks after longjmp if ZA got disabled and the
> > ZA contents got saved following the lazy saving scheme. It does not
> > use ACLE code to verify that gcc can interoperate with glibc.
> 
> LGTM, thanks.  Some minor suggestions below.
> 
> Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

thanks.

> > +  hwcap2 = getauxval (AT_HWCAP2);
> > +  if ((hwcap2 & HWCAP2_SME) == 0)
> > +    return 77;
> 
> Use EXIT_UNSUPPORTED here.

changed.
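
For reference, a standalone sketch of the resulting check.  In the
glibc tree EXIT_UNSUPPORTED comes from the test support headers; it is
defined locally here only so the fragment builds on its own.  The
fallback value for HWCAP2_SME is an assumption about the Linux AArch64
hwcap bit, and sme_test_status is a hypothetical helper that takes the
hwcap word as a parameter instead of calling getauxval.

```c
#include <stdlib.h>

/* In the glibc tree this is provided by the test support headers;
   defined here only so the sketch builds on its own.  */
#ifndef EXIT_UNSUPPORTED
# define EXIT_UNSUPPORTED 77
#endif

/* Assumed value of the Linux AArch64 HWCAP2_SME bit, used when the
   system headers do not provide it.  */
#ifndef HWCAP2_SME
# define HWCAP2_SME (1UL << 23)
#endif

/* Hypothetical helper: map the AT_HWCAP2 word to the exit status the
   test should use before doing any SME work.  */
static int
sme_test_status (unsigned long hwcap2)
{
  if ((hwcap2 & HWCAP2_SME) == 0)
    return EXIT_UNSUPPORTED;
  return EXIT_SUCCESS;
}
```

The named constant tells the test driver to report the test as
unsupported, the same as the bare 77 did, without the magic number.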

> > +  za_orig = malloc (svl*svl);
> > +  za_save = malloc (svl*svl);
> > +  za_dump = malloc (svl*svl);
> 
> Use xmalloc here (or xcalloc).

updated to xmalloc.
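
A rough standalone sketch of what the switch buys.  xmalloc_sketch is
a hypothetical stand-in for the xmalloc declared in glibc's
<support/support.h>, and alloc_za_buffer is a made-up helper showing
the svl*svl allocation pattern from the test.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for glibc's test-support xmalloc: terminates
   the process on allocation failure instead of returning NULL for a
   non-zero request.  */
static void *
xmalloc_sketch (size_t n)
{
  void *p = malloc (n);
  if (p == NULL && n != 0)
    {
      fprintf (stderr, "error: out of memory (%zu bytes)\n", n);
      exit (EXIT_FAILURE);
    }
  return p;
}

/* Allocate one svl x svl byte buffer, as the test does for za_orig,
   za_save and za_dump, and fill it with a marker byte.  */
static unsigned char *
alloc_za_buffer (unsigned long svl, int fill)
{
  size_t n = (size_t) svl * svl;
  unsigned char *p = xmalloc_sketch (n);
  memset (p, fill, n);
  return p;
}
```

With the failure path inside the allocator, the memset calls that
follow in do_test cannot be reached with a null pointer.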


Thread overview: 11+ messages
2023-12-08 16:31 [PATCH 0/4] aarch64: Add SME support Szabolcs Nagy
2023-12-08 16:32 ` [PATCH 1/4] aarch64: Add SME runtime support Szabolcs Nagy
2023-12-28 13:41   ` Adhemerval Zanella Netto
2024-01-02 17:15     ` Szabolcs Nagy
2023-12-08 16:32 ` [PATCH 2/4] aarch64: Add longjmp support for SME Szabolcs Nagy
2023-12-28 13:42   ` Adhemerval Zanella Netto
2023-12-08 16:32 ` [PATCH 3/4] aarch64: Add setcontext " Szabolcs Nagy
2023-12-28 13:42   ` Adhemerval Zanella Netto
2023-12-08 16:32 ` [PATCH 4/4] aarch64: Add longjmp test " Szabolcs Nagy
2023-12-28 14:36   ` Adhemerval Zanella Netto
2024-01-02 17:20     ` Szabolcs Nagy
