public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] powerpc: Add support for system call vectored
@ 2020-12-03 17:15 Matheus Castanho
  2020-12-03 17:15 ` [PATCH v2 1/2] powerpc: Runtime selection between sc and scv for syscalls Matheus Castanho
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Matheus Castanho @ 2020-12-03 17:15 UTC (permalink / raw)
  To: libc-alpha

This patchset enables the usage of system call vectored (scv) instruction by
system calls for improved performance on POWER9 and later processors.

This new version dropped the patch to move brk to a C-based implementation, in
favor of Adhemerval's patch [0]. So it should be applied on top of that.

Also, it modified the scv check mechanism to remove the thread pointer check as
suggested by Florian. Now the dynamic loader always uses sc, static code still
checks the thread pointer before accessing the TCB, and shared code (outside the
dl) accesses the TCB directly.

Please check each patch for more details, specially 1/2 which explains the
rationale and details of the runtime mechanism to choose between the old
behavior (sc) and the new one (scv 0).

Tested on powerpc, powerpc64, and powerpc64le (with and without scv)

[0] https://sourceware.org/pipermail/libc-alpha/2020-November/119770.html

---

v2:
  - Fix typos and comments to match GNU style
  - Remove the check for the thread pointer on shared code, and always use sc
    when inside the dynamic loader
  - Drop the patch implementing brk in C

Matheus Castanho (2):
  powerpc: Runtime selection between sc and scv for syscalls
  powerpc: Use scv instruction on clone when available

 sysdeps/powerpc/powerpc32/sysdep.h            |  15 ++-
 sysdeps/powerpc/powerpc64/sysdep.h            | 120 +++++++++++++++++-
 .../unix/sysv/linux/powerpc/powerpc64/clone.S |  36 +++++-
 .../unix/sysv/linux/powerpc/powerpc64/vfork.S |   6 +-
 sysdeps/unix/sysv/linux/powerpc/syscall.S     |  11 +-
 sysdeps/unix/sysv/linux/powerpc/sysdep.h      |  93 ++++++++++----
 6 files changed, 241 insertions(+), 40 deletions(-)

--
2.26.2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/2] powerpc: Runtime selection between sc and scv for syscalls
  2020-12-03 17:15 [PATCH v2 0/2] powerpc: Add support for system call vectored Matheus Castanho
@ 2020-12-03 17:15 ` Matheus Castanho
  2020-12-10 17:28   ` Matheus Castanho
  2020-12-30 21:29   ` Tulio Magno Quites Machado Filho
  2020-12-03 17:15 ` [PATCH v2 2/2] powerpc: Use scv instruction on clone when available Matheus Castanho
  2020-12-08 18:24 ` [PATCH v2 0/2] powerpc: Add support for system call vectored Lucas A. M. Magalhaes
  2 siblings, 2 replies; 8+ messages in thread
From: Matheus Castanho @ 2020-12-03 17:15 UTC (permalink / raw)
  To: libc-alpha

This patch depends on the brk consolidation implemented by [0].

[0] https://sourceware.org/pipermail/libc-alpha/2020-November/119770.html

---8<---

Linux kernel v5.9 added support for system calls using the scv
instruction for POWER9 and later.  The new codepath provides better
performance (see below) if compared to using sc.  For the
foreseeable future, both sc and scv mechanisms will co-exist, so this
patch enables glibc to do a runtime check and use scv when it is
available.

Before issuing the system call to the kernel, we check hwcap2 in the TCB
for PPC_FEATURE2_SCV to see if scv is supported by the kernel.  If not,
we fallback to sc and keep the old behavior.

The kernel implements a different error return convention for scv, so
when returning from a system call we need to handle the return value
differently depending on the instruction we used to enter the kernel.

For syscalls implemented in ASM, entry and exit are implemented by
different macros (PSEUDO and PSEUDO_RET, resp.), which may be used in
sequence (e.g. for templated syscalls) or with other instructions in
between (e.g. clone).  To avoid accessing the TCB a second time on
PSEUDO_RET to check which instruction we used, the value read from
hwcap2 is cached on a non-volatile register.

This is not needed when using INTERNAL_SYSCALL macro, since entry and
exit are bundled into the same inline asm directive.

The dynamic loader may issue syscalls before the TCB has been setup
so it always uses sc with no extra checks.  For the static case, there
is no compile-time way to determine if we are inside startup code,
so we also check the value of the thread pointer before effectively
accessing the TCB.  For such situations in which the availability of
scv cannot be determined, sc is always used.

Support for scv in syscalls implemented in their own ASM file (clone and
vfork) will be added later. For now simply use sc as before.

Average performance over 1M calls for each syscall "type":
  - stat: C wrapper calling INTERNAL_SYSCALL
  - getpid: templated ASM syscall
  - syscall: call to gettid using syscall function

  Standard:
     stat : 1.573445 us / ~3619 cycles
   getpid : 0.164986 us / ~379 cycles
  syscall : 0.162743 us / ~374 cycles

  With scv:
     stat : 1.537049 us / ~3535 cycles <~ -84 cycles  / -2.32%
   getpid : 0.109923 us / ~253 cycles  <~ -126 cycles / -33.25%
  syscall : 0.116410 us / ~268 cycles  <~ -106 cycles / -28.34%

Tested on powerpc, powerpc64, powerpc64le (with and without scv)
---
 sysdeps/powerpc/powerpc32/sysdep.h            |  15 ++-
 sysdeps/powerpc/powerpc64/sysdep.h            | 120 +++++++++++++++++-
 .../unix/sysv/linux/powerpc/powerpc64/clone.S |   9 +-
 .../unix/sysv/linux/powerpc/powerpc64/vfork.S |   6 +-
 sysdeps/unix/sysv/linux/powerpc/syscall.S     |  11 +-
 sysdeps/unix/sysv/linux/powerpc/sysdep.h      |  93 ++++++++++----
 6 files changed, 214 insertions(+), 40 deletions(-)

diff --git a/sysdeps/powerpc/powerpc32/sysdep.h b/sysdeps/powerpc/powerpc32/sysdep.h
index 829eec266a..a3fc1f0ca7 100644
--- a/sysdeps/powerpc/powerpc32/sysdep.h
+++ b/sysdeps/powerpc/powerpc32/sysdep.h
@@ -92,7 +92,10 @@ GOT_LABEL:			;					      \

 #define DO_CALL(syscall)						      \
     li 0,syscall;							      \
-    sc
+    DO_CALL_SC
+
+#define DO_CALL_SC \
+	sc

 #undef JUMPTARGET
 #ifdef PIC
@@ -106,14 +109,20 @@ GOT_LABEL:			;					      \
 # define HIDDEN_JUMPTARGET(name) __GI_##name##@local
 #endif

+#define TAIL_CALL_SYSCALL_ERROR \
+    b __syscall_error@local
+
 #define PSEUDO(name, syscall_name, args)				      \
   .section ".text";							      \
   ENTRY (name)								      \
     DO_CALL (SYS_ify (syscall_name));

+#define RET_SC \
+    bnslr+;
+
 #define PSEUDO_RET							      \
-    bnslr+;								      \
-    b __syscall_error@local
+    RET_SC;								      \
+    TAIL_CALL_SYSCALL_ERROR
 #define ret PSEUDO_RET

 #undef	PSEUDO_END
diff --git a/sysdeps/powerpc/powerpc64/sysdep.h b/sysdeps/powerpc/powerpc64/sysdep.h
index d557098898..f6a0619743 100644
--- a/sysdeps/powerpc/powerpc64/sysdep.h
+++ b/sysdeps/powerpc/powerpc64/sysdep.h
@@ -17,6 +17,7 @@
    <https://www.gnu.org/licenses/>.  */

 #include <sysdeps/powerpc/sysdep.h>
+#include <tls.h>

 #ifdef __ASSEMBLER__

@@ -263,10 +264,80 @@ LT_LABELSUFFIX(name,_name_end): ; \
   TRACEBACK_MASK(name,mask);	\
   END_2(name)

-#define DO_CALL(syscall) \
-    li 0,syscall; \
+/* We will allocate a new frame to save LR and the non-volatile register used to
+   read the TCB when checking for scv support on syscall code.  We actually just
+   need the minimum frame size plus room for 1 reg (8 bytes).  But the ABI
+   mandates stack frames should be aligned at 16 Bytes, so we end up allocating
+   a bit more space then what will actually be used.  */
+#define SCV_FRAME_SIZE (FRAME_MIN_SIZE+16)
+#define SCV_FRAME_NVOLREG_SAVE FRAME_MIN_SIZE
+
+/* Allocate frame and save register */
+#define NVOLREG_SAVE \
+    stdu r1,-SCV_FRAME_SIZE(r1); \
+    std r31,SCV_FRAME_NVOLREG_SAVE(r1); \
+    cfi_adjust_cfa_offset(SCV_FRAME_SIZE);
+
+/* Restore register and destroy frame */
+#define NVOLREG_RESTORE	\
+    ld r31,SCV_FRAME_NVOLREG_SAVE(r1); \
+    addi r1,r1,SCV_FRAME_SIZE; \
+    cfi_adjust_cfa_offset(-SCV_FRAME_SIZE);
+
+/* Check PPC_FEATURE2_SCV bit from hwcap2 in the TCB.  If it is not set, scv is
+   not available, then go to JUMPFALSE (label given by the macro's caller).  We
+   save the value we read from the TCB in a non-volatile register so we can
+   reuse it later when exiting from the syscall in PSEUDO_RET.  Note that for
+   the static case we need an extra check to guarantee the thread pointer has
+   already been initialized, otherwise we may try to access an invalid address
+   if a syscall is called before the TLS has been setup.  */
+    .macro CHECK_SCV_SUPPORT REG JUMPFALSE
+
+#ifndef SHARED
+    /* Check if thread pointer has already been setup.  */
+    cmpdi r13,0
+    beq \JUMPFALSE
+#endif
+
+    /* Read PPC_FEATURE2_SCV from TCB and store it in REG */
+    ld \REG,TCB_HWCAP(PT_THREAD_POINTER)
+    andis. \REG,\REG,PPC_FEATURE2_SCV>>16
+
+    beq \JUMPFALSE
+    .endm
+
+#if IS_IN(rtld)
+# define DO_CALL(syscall) \
+    li r0,syscall; \
+    DO_CALL_SC
+#else
+/* Before doing the syscall, check if we can use scv.  scv is supported by P9
+   and later with Linux v5.9 and later.  If so, use it.  Otherwise, fallback to
+   sc.  We use a non-volatile register to save hwcap2 from the TCB, so we need
+   to save its content beforehand.  */
+# define DO_CALL(syscall) \
+    li r0,syscall; \
+    NVOLREG_SAVE; \
+    CHECK_SCV_SUPPORT r31 0f; \
+    DO_CALL_SCV; \
+    b 1f; \
+0:  DO_CALL_SC; \
+1:
+#endif /* IS_IN(rtld) */
+
+/* DO_CALL_SC and DO_CALL_SCV expect the syscall number to be in r0.  */
+#define DO_CALL_SC \
     sc

+#define DO_CALL_SCV \
+    mflr r9; \
+    std r9,FRAME_LR_SAVE(r1); \
+    cfi_offset(lr,FRAME_LR_SAVE); \
+    scv 0; \
+    ld r9,FRAME_LR_SAVE(r1); \
+    mtlr r9; \
+    cfi_restore(lr);
+
 /* ppc64 is always PIC */
 #undef JUMPTARGET
 #define JUMPTARGET(name) FUNC_LABEL(name)
@@ -304,9 +375,32 @@ LT_LABELSUFFIX(name,_name_end): ; \
     .endif
 #endif

-#define PSEUDO_RET \
-    bnslr+; \
+#if IS_IN(rtld)
+# define PSEUDO_RET \
+    RET_SC; \
     TAIL_CALL_SYSCALL_ERROR
+#else
+/* This should only be called after a DO_CALL.  In such cases, r31 contains the
+   value of PPC_FEATURE2_SCV read from hwcap2 by CHECK_SCV_SUPPORT.  If it is
+   set, we know we have entered the kernel using scv, so handle the return code
+   accordingly.  */
+# define PSEUDO_RET \
+    cmpdi cr5,r31,0; \
+    NVOLREG_RESTORE; \
+    beq cr5,0f; \
+    RET_SCV; \
+    b 1f; \
+0:  RET_SC; \
+1:  TAIL_CALL_SYSCALL_ERROR
+#endif
+
+#define RET_SCV \
+    cmpdi r3,0; \
+    bgelr+; \
+    neg r3,r3;
+
+#define RET_SC \
+    bnslr+;

 #define ret PSEUDO_RET

@@ -319,8 +413,15 @@ LT_LABELSUFFIX(name,_name_end): ; \
   ENTRY (name);						\
   DO_CALL (SYS_ify (syscall_name))

-#define PSEUDO_RET_NOERRNO \
+#if IS_IN(rtld)
+# define PSEUDO_RET_NOERRNO \
     blr
+#else
+/* This should only be called after a DO_CALL.  */
+# define PSEUDO_RET_NOERRNO \
+    NVOLREG_RESTORE; \
+    blr
+#endif /* IS_IN(rtld) */

 #define ret_NOERRNO PSEUDO_RET_NOERRNO

@@ -333,8 +434,15 @@ LT_LABELSUFFIX(name,_name_end): ; \
   ENTRY (name);						\
   DO_CALL (SYS_ify (syscall_name))

-#define PSEUDO_RET_ERRVAL \
+#if IS_IN(rtld)
+# define PSEUDO_RET_ERRVAL \
+    blr
+#else
+/* This should only be called after a DO_CALL.  */
+# define PSEUDO_RET_ERRVAL \
+    NVOLREG_RESTORE; \
     blr
+#endif /* IS_IN(rtld) */

 #define ret_ERRVAL PSEUDO_RET_ERRVAL

diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S b/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S
index b30641c805..fc496fa671 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S
@@ -68,7 +68,8 @@ ENTRY (__clone)
 	cfi_endproc

 	/* Do the call.  */
-	DO_CALL(SYS_ify(clone))
+	li 	r0,SYS_ify(clone)
+	DO_CALL_SC

 	/* Check for child process.  */
 	cmpdi	cr1,r3,0
@@ -82,7 +83,8 @@ ENTRY (__clone)
 	bctrl
 	ld	r2,FRAME_TOC_SAVE(r1)

-	DO_CALL(SYS_ify(exit))
+	li	r0,(SYS_ify(exit))
+	DO_CALL_SC
 	/* We won't ever get here but provide a nop so that the linker
 	   will insert a toc adjusting stub if necessary.  */
 	nop
@@ -104,7 +106,8 @@ L(parent):
 	cfi_restore(r30)
 	cfi_restore(r31)

-	PSEUDO_RET
+	RET_SC
+	TAIL_CALL_SYSCALL_ERROR

 END (__clone)

diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/vfork.S b/sysdeps/unix/sysv/linux/powerpc/powerpc64/vfork.S
index 17199fb14a..a71f69e929 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/vfork.S
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/vfork.S
@@ -28,9 +28,11 @@
 ENTRY (__vfork)
 	CALL_MCOUNT 0

-	DO_CALL (SYS_ify (vfork))
+	li r0,SYS_ify (vfork)
+	DO_CALL_SC

-	PSEUDO_RET
+	RET_SC
+	TAIL_CALL_SYSCALL_ERROR

 PSEUDO_END (__vfork)
 libc_hidden_def (__vfork)
diff --git a/sysdeps/unix/sysv/linux/powerpc/syscall.S b/sysdeps/unix/sysv/linux/powerpc/syscall.S
index 48dade4642..9fc4ddd3cb 100644
--- a/sysdeps/unix/sysv/linux/powerpc/syscall.S
+++ b/sysdeps/unix/sysv/linux/powerpc/syscall.S
@@ -25,6 +25,13 @@ ENTRY (syscall)
 	mr   r6,r7
 	mr   r7,r8
 	mr   r8,r9
-	sc
-	PSEUDO_RET
+#if !IS_IN(rtld) && (defined(__PPC64__) || defined(__powerpc64__))
+	CHECK_SCV_SUPPORT r9 0f
+	DO_CALL_SCV
+	RET_SCV
+	b 1f
+#endif
+0:	DO_CALL_SC
+	RET_SC
+1:	TAIL_CALL_SYSCALL_ERROR
 PSEUDO_END (syscall)
diff --git a/sysdeps/unix/sysv/linux/powerpc/sysdep.h b/sysdeps/unix/sysv/linux/powerpc/sysdep.h
index b2bca598b9..7f69804edc 100644
--- a/sysdeps/unix/sysv/linux/powerpc/sysdep.h
+++ b/sysdeps/unix/sysv/linux/powerpc/sysdep.h
@@ -64,39 +64,84 @@
 #define INTERNAL_VSYSCALL_CALL(funcptr, nr, args...)			\
   INTERNAL_VSYSCALL_CALL_TYPE(funcptr, long int, nr, args)

+#define DECLARE_REGS				\
+  register long int r0  __asm__ ("r0");		\
+  register long int r3  __asm__ ("r3");		\
+  register long int r4  __asm__ ("r4");		\
+  register long int r5  __asm__ ("r5");		\
+  register long int r6  __asm__ ("r6");		\
+  register long int r7  __asm__ ("r7");		\
+  register long int r8  __asm__ ("r8");

-#undef INTERNAL_SYSCALL
-#define INTERNAL_SYSCALL_NCS(name, nr, args...) \
-  ({									\
-    register long int r0  __asm__ ("r0");				\
-    register long int r3  __asm__ ("r3");				\
-    register long int r4  __asm__ ("r4");				\
-    register long int r5  __asm__ ("r5");				\
-    register long int r6  __asm__ ("r6");				\
-    register long int r7  __asm__ ("r7");				\
-    register long int r8  __asm__ ("r8");				\
-    LOADARGS_##nr (name, ##args);					\
-    __asm__ __volatile__						\
-      ("sc\n\t"								\
-       "mfcr  %0\n\t"							\
-       "0:"								\
-       : "=&r" (r0),							\
-         "=&r" (r3), "=&r" (r4), "=&r" (r5),				\
-         "=&r" (r6), "=&r" (r7), "=&r" (r8)				\
-       : ASM_INPUT_##nr							\
-       : "r9", "r10", "r11", "r12",					\
-         "cr0", "ctr", "memory");					\
-    r0 & (1 << 28) ? -r3 : r3;						\
+#define SYSCALL_SCV(nr)				\
+  ({						\
+    __asm__ __volatile__			\
+      ("scv 0\n\t"				\
+       "0:"					\
+       : "=&r" (r0),				\
+	 "=&r" (r3), "=&r" (r4), "=&r" (r5),	\
+	 "=&r" (r6), "=&r" (r7), "=&r" (r8)	\
+       : ASM_INPUT_##nr			\
+       : "r9", "r10", "r11", "r12",		\
+	 "lr", "ctr", "memory");		\
+    r3;					\
   })
-#define INTERNAL_SYSCALL(name, nr, args...)				\
-  INTERNAL_SYSCALL_NCS (__NR_##name, nr, args)
+
+#define SYSCALL_SC(nr)				\
+  ({						\
+    __asm__ __volatile__			\
+      ("sc\n\t"				\
+       "mfcr %0\n\t"				\
+       "0:"					\
+       : "=&r" (r0),				\
+	 "=&r" (r3), "=&r" (r4), "=&r" (r5),	\
+	 "=&r" (r6), "=&r" (r7), "=&r" (r8)	\
+       : ASM_INPUT_##nr			\
+       : "r9", "r10", "r11", "r12",		\
+	 "cr0", "ctr", "memory");		\
+    r0 & (1 << 28) ? -r3 : r3;			\
+  })
+
+/* This will only be non-empty for 64-bit systems, see below.  */
+#define TRY_SYSCALL_SCV(nr)

 #if defined(__PPC64__) || defined(__powerpc64__)
 # define SYSCALL_ARG_SIZE 8
+
+/* For the static case, unlike the dynamic loader, there is no compile-time way
+   to check if we are inside startup code.  So we need to check if the thread
+   pointer has already been setup before trying to access the TLS.  */
+# ifndef SHARED
+#  define CHECK_THREAD_POINTER (__thread_register != 0)
+# else
+#  define CHECK_THREAD_POINTER (1)
+# endif
+
+/* When inside the dynamic loader, the thread pointer may not have been
+   initialized yet, so don't check for scv support in that case.  */
+# if !IS_IN(rtld)
+#  undef TRY_SYSCALL_SCV
+#  define TRY_SYSCALL_SCV(nr)						\
+  CHECK_THREAD_POINTER && THREAD_GET_HWCAP() & PPC_FEATURE2_SCV ?	\
+      SYSCALL_SCV(nr) :
+# endif
+
 #else
 # define SYSCALL_ARG_SIZE 4
 #endif

+# define INTERNAL_SYSCALL_NCS(name, nr, args...)	\
+  ({							\
+    DECLARE_REGS;					\
+    LOADARGS_##nr (name, ##args);			\
+    TRY_SYSCALL_SCV(nr)					\
+    SYSCALL_SC(nr);					\
+  })
+
+#undef INTERNAL_SYSCALL
+#define INTERNAL_SYSCALL(name, nr, args...)				\
+  INTERNAL_SYSCALL_NCS (__NR_##name, nr, args)
+
 #define LOADARGS_0(name, dummy) \
 	r0 = name
 #define LOADARGS_1(name, __arg1) \
--
2.26.2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 2/2] powerpc: Use scv instruction on clone when available
  2020-12-03 17:15 [PATCH v2 0/2] powerpc: Add support for system call vectored Matheus Castanho
  2020-12-03 17:15 ` [PATCH v2 1/2] powerpc: Runtime selection between sc and scv for syscalls Matheus Castanho
@ 2020-12-03 17:15 ` Matheus Castanho
  2020-12-30 21:30   ` Tulio Magno Quites Machado Filho
  2020-12-08 18:24 ` [PATCH v2 0/2] powerpc: Add support for system call vectored Lucas A. M. Magalhaes
  2 siblings, 1 reply; 8+ messages in thread
From: Matheus Castanho @ 2020-12-03 17:15 UTC (permalink / raw)
  To: libc-alpha

clone already uses r31 to temporarily save input arguments before doing the
syscall, so we use a different register to read from the TCB. We can also avoid
allocating another stack frame, which is not needed since we can simply extend
the usage of the red zone.
---
 .../unix/sysv/linux/powerpc/powerpc64/clone.S | 33 +++++++++++++++++--
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S b/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S
index fc496fa671..247e0de68c 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S
@@ -38,9 +38,11 @@ ENTRY (__clone)
 	beq-	cr0,L(badargs)
 
 	/* Save some regs in the "red zone".  */
+	std	r28,-32(r1)
 	std	r29,-24(r1)
 	std	r30,-16(r1)
 	std	r31,-8(r1)
+	cfi_offset(r28,-32)
 	cfi_offset(r29,-24)
 	cfi_offset(r30,-16)
 	cfi_offset(r31,-8)
@@ -69,11 +71,26 @@ ENTRY (__clone)
 
 	/* Do the call.  */
 	li 	r0,SYS_ify(clone)
-	DO_CALL_SC
+	CHECK_SCV_SUPPORT r28 0f
+	/* This is equivalent to DO_CALL_SCV, but we cannot use the macro here
+	because it uses CFI directives and we just called cfi_endproc.  */
+	mflr 	r9
+	std 	r9,FRAME_LR_SAVE(r1)
+	scv 	0
+	ld 	r9,FRAME_LR_SAVE(r1)
+	mtlr 	r9
+
+	/* Check for child process.  */
+	/* When using scv, error is indicated by negative r3.  */
+	cmpdi	cr1,r3,0
+	b 1f
+0:      DO_CALL_SC
 
 	/* Check for child process.  */
+	/* With sc, error is indicated by cr0.SO.  */
 	cmpdi	cr1,r3,0
 	crandc	cr1*4+eq,cr1*4+eq,cr0*4+so
+1:
 	bne-	cr1,L(parent)		/* The '-' is to minimise the race.  */
 
 	std	r2,FRAME_TOC_SAVE(r1)
@@ -95,19 +112,29 @@ L(badargs):
 	TAIL_CALL_SYSCALL_ERROR
 
 L(parent):
+	/* Check if scv is available. */
+	cmpdi cr1,r28,0
+
 	/* Parent.  Restore registers & return.  */
+	cfi_offset(r28,-32)
 	cfi_offset(r29,-24)
 	cfi_offset(r30,-16)
 	cfi_offset(r31,-8)
+	ld	r28,-32(r1)
 	ld	r29,-24(r1)
 	ld	r30,-16(r1)
 	ld	r31,-8(r1)
+	cfi_restore(r28)
 	cfi_restore(r29)
 	cfi_restore(r30)
 	cfi_restore(r31)
 
-	RET_SC
-	TAIL_CALL_SYSCALL_ERROR
+	beq cr1,0f
+	RET_SCV
+	b 1f
+0:	RET_SC
+1:	TAIL_CALL_SYSCALL_ERROR
+
 
 END (__clone)
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/2] powerpc: Add support for system call vectored
  2020-12-03 17:15 [PATCH v2 0/2] powerpc: Add support for system call vectored Matheus Castanho
  2020-12-03 17:15 ` [PATCH v2 1/2] powerpc: Runtime selection between sc and scv for syscalls Matheus Castanho
  2020-12-03 17:15 ` [PATCH v2 2/2] powerpc: Use scv instruction on clone when available Matheus Castanho
@ 2020-12-08 18:24 ` Lucas A. M. Magalhaes
  2 siblings, 0 replies; 8+ messages in thread
From: Lucas A. M. Magalhaes @ 2020-12-08 18:24 UTC (permalink / raw)
  To: Matheus Castanho, libc-alpha

Hi Matheus,
Worked on top of Adhemerval's brk patch. But it fails if applied
directly on master.

Tested on a POWER9 with and without scv support.

Tested-by: Lucas A. M. Magalhães <lamm@linux.ibm.com>

Quoting Matheus Castanho via Libc-alpha (2020-12-03 14:15:26)
> This patchset enables the usage of system call vectored (scv) instruction by
> system calls for improved performance on POWER9 and later processors.
> 
> This new version dropped the patch to move brk to a C-based implementation, in
> favor of Adhemerval's patch [0]. So it should be applied on top of that.
> 
> Also, it modified the scv check mechanism to remove the thread pointer check as
> suggested by Florian. Now the dynamic loader always uses sc, static code still
> checks the thread pointer before accessing the TCB, and shared code (outside the
> dl) accesses the TCB directly.
> 
> Please check each patch for more details, specially 1/2 which explains the
> rationale and details of the runtime mechanism to choose between the old
> behavior (sc) and the new one (scv 0).
> 
> Tested on powerpc, powerpc64, and powerpc64le (with and without scv)
> 
> [0] https://sourceware.org/pipermail/libc-alpha/2020-November/119770.html
> 
> ---
> 
> v2:
>   - Fix typos and comments to match GNU style
>   - Remove the check for the thread pointer on shared code, and always use sc
>     when inside the dynamic loader
>   - Drop the patch implementing brk in C
> 
> Matheus Castanho (2):
>   powerpc: Runtime selection between sc and scv for syscalls
>   powerpc: Use scv instruction on clone when available
> 
>  sysdeps/powerpc/powerpc32/sysdep.h            |  15 ++-
>  sysdeps/powerpc/powerpc64/sysdep.h            | 120 +++++++++++++++++-
>  .../unix/sysv/linux/powerpc/powerpc64/clone.S |  36 +++++-
>  .../unix/sysv/linux/powerpc/powerpc64/vfork.S |   6 +-
>  sysdeps/unix/sysv/linux/powerpc/syscall.S     |  11 +-
>  sysdeps/unix/sysv/linux/powerpc/sysdep.h      |  93 ++++++++++----
>  6 files changed, 241 insertions(+), 40 deletions(-)
> 
> --
> 2.26.2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/2] powerpc: Runtime selection between sc and scv for syscalls
  2020-12-03 17:15 ` [PATCH v2 1/2] powerpc: Runtime selection between sc and scv for syscalls Matheus Castanho
@ 2020-12-10 17:28   ` Matheus Castanho
  2020-12-10 18:07     ` Adhemerval Zanella
  2020-12-30 21:29   ` Tulio Magno Quites Machado Filho
  1 sibling, 1 reply; 8+ messages in thread
From: Matheus Castanho @ 2020-12-10 17:28 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer, Adhemerval Zanella

Hi Florian an Adhemerval,

I was able to get rid of the thread pointer check, leaving it only on libc.a, and the DL
is using the old interface (sc insn). Do you have further comments?

On 12/3/20 2:15 PM, Matheus Castanho via Libc-alpha wrote:
> This patch depends on the brk consolidation implemented by [0].
> 
> [0] https://sourceware.org/pipermail/libc-alpha/2020-November/119770.html
> 

[...]

Adhemerval, do you plan to merge this patch (with the change suggested by Tulio) 
in this release? I see it already has a Reviewed-by.

I'd like to try to merge this change before the freeze, but it depends on that patch
now.

Thanks,
Matheus Castanho

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/2] powerpc: Runtime selection between sc and scv for syscalls
  2020-12-10 17:28   ` Matheus Castanho
@ 2020-12-10 18:07     ` Adhemerval Zanella
  0 siblings, 0 replies; 8+ messages in thread
From: Adhemerval Zanella @ 2020-12-10 18:07 UTC (permalink / raw)
  To: Matheus Castanho, libc-alpha; +Cc: Florian Weimer



On 10/12/2020 14:28, Matheus Castanho wrote:
> Hi Florian an Adhemerval,
> 
> I was able to get rid of the thread pointer check, leaving it only on libc.a, and the DL
> is using the old interface (sc insn). Do you have further comments?
> 
> On 12/3/20 2:15 PM, Matheus Castanho via Libc-alpha wrote:
>> This patch depends on the brk consolidation implemented by [0].
>>
>> [0] https://sourceware.org/pipermail/libc-alpha/2020-November/119770.html
>>
> 
> [...]
> 
> Adhemerval, do you plan to merge this patch (with the change suggested by Tulio) 
> in this release? I see it already has a Reviewed-by.
> 
> I'd like to try to merge this change before the freeze, but it depends on that patch
> now.
> 
> Thanks,
> Matheus Castanho
> 

I will do it, thanks for remind me.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/2] powerpc: Runtime selection between sc and scv for syscalls
  2020-12-03 17:15 ` [PATCH v2 1/2] powerpc: Runtime selection between sc and scv for syscalls Matheus Castanho
  2020-12-10 17:28   ` Matheus Castanho
@ 2020-12-30 21:29   ` Tulio Magno Quites Machado Filho
  1 sibling, 0 replies; 8+ messages in thread
From: Tulio Magno Quites Machado Filho @ 2020-12-30 21:29 UTC (permalink / raw)
  To: Matheus Castanho, libc-alpha

Matheus Castanho via Libc-alpha <libc-alpha@sourceware.org> writes:

> Linux kernel v5.9 added support for system calls using the scv
> instruction for POWER9 and later.  The new codepath provides better
> performance (see below) if compared to using sc.  For the
> foreseeable future, both sc and scv mechanisms will co-exist, so this
> patch enables glibc to do a runtime check and use scv when it is
> available.
>
> Before issuing the system call to the kernel, we check hwcap2 in the TCB
> for PPC_FEATURE2_SCV to see if scv is supported by the kernel.  If not,
> we fallback to sc and keep the old behavior.
>
> The kernel implements a different error return convention for scv, so
> when returning from a system call we need to handle the return value
> differently depending on the instruction we used to enter the kernel.
>
> For syscalls implemented in ASM, entry and exit are implemented by
> different macros (PSEUDO and PSEUDO_RET, resp.), which may be used in
> sequence (e.g. for templated syscalls) or with other instructions in
> between (e.g. clone).  To avoid accessing the TCB a second time on
> PSEUDO_RET to check which instruction we used, the value read from
> hwcap2 is cached on a non-volatile register.
>
> This is not needed when using INTERNAL_SYSCALL macro, since entry and
> exit are bundled into the same inline asm directive.
>
> The dynamic loader may issue syscalls before the TCB has been setup
> so it always uses sc with no extra checks.  For the static case, there
> is no compile-time way to determine if we are inside startup code,
> so we also check the value of the thread pointer before effectively
> accessing the TCB.  For such situations in which the availability of
> scv cannot be determined, sc is always used.
>
> Support for scv in syscalls implemented in their own ASM file (clone and
> vfork) will be added later. For now simply use sc as before.

LGTM.

Pushed as 68ab82f56690ada86ac1e0c46bad06ba189a10ef.

Thanks!

-- 
Tulio Magno

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] powerpc: Use scv instruction on clone when available
  2020-12-03 17:15 ` [PATCH v2 2/2] powerpc: Use scv instruction on clone when available Matheus Castanho
@ 2020-12-30 21:30   ` Tulio Magno Quites Machado Filho
  0 siblings, 0 replies; 8+ messages in thread
From: Tulio Magno Quites Machado Filho @ 2020-12-30 21:30 UTC (permalink / raw)
  To: Matheus Castanho, libc-alpha

Matheus Castanho via Libc-alpha <libc-alpha@sourceware.org> writes:

> clone already uses r31 to temporarily save input arguments before doing the
> syscall, so we use a different register to read from the TCB. We can also avoid
> allocating another stack frame, which is not needed since we can simply extend
> the usage of the red zone.

LGTM.
Pushed as 41f013cef24884604c303435dd1915be2ea5c0e0.

Thanks!

-- 
Tulio Magno

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-12-30 21:30 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-03 17:15 [PATCH v2 0/2] powerpc: Add support for system call vectored Matheus Castanho
2020-12-03 17:15 ` [PATCH v2 1/2] powerpc: Runtime selection between sc and scv for syscalls Matheus Castanho
2020-12-10 17:28   ` Matheus Castanho
2020-12-10 18:07     ` Adhemerval Zanella
2020-12-30 21:29   ` Tulio Magno Quites Machado Filho
2020-12-03 17:15 ` [PATCH v2 2/2] powerpc: Use scv instruction on clone when available Matheus Castanho
2020-12-30 21:30   ` Tulio Magno Quites Machado Filho
2020-12-08 18:24 ` [PATCH v2 0/2] powerpc: Add support for system call vectored Lucas A. M. Magalhaes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).