public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH 0/4] powerpc: Add support for system call vectored
@ 2020-11-18 14:46 Matheus Castanho
  2020-11-18 14:47 ` [PATCH 1/4] powerpc: Replace brk.S with a C implementation Matheus Castanho
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Matheus Castanho @ 2020-11-18 14:46 UTC (permalink / raw)
  To: libc-alpha; +Cc: tuliom

This patchset enables the usage of system call vectored (scv) instruction by
system calls for improved performance on POWER9 and later processors.

Please check each patch for more details, specially 3/4 which explains the
rationale and details of the runtime mechanism to choose between the old
behavior (sc) and the new one (scv 0).

Tested on powerpc, powerpc64, and powerpc64le (with and without scv)

Matheus Castanho (3):
  powerpc: Make PT_THREAD_POINTER available to assembly code
  powerpc: Runtime selection between sc and scv for syscalls
  powerpc: Use scv instruction on clone when available

Tulio Magno Quites Machado Filho (1):
  powerpc: Replace brk.S with a C implementation

 sysdeps/powerpc/nptl/tls.h                    | 26 +++---
 sysdeps/powerpc/powerpc32/sysdep.h            | 19 ++--
 sysdeps/powerpc/powerpc64/sysdep.h            | 90 ++++++++++++++++++-
 .../linux/powerpc/{powerpc64/brk.S => brk.c}  | 41 ++++-----
 sysdeps/unix/sysv/linux/powerpc/dl-brk.S      |  1 -
 .../unix/sysv/linux/powerpc/powerpc32/brk.S   | 52 -----------
 .../unix/sysv/linux/powerpc/powerpc64/clone.S | 36 +++++++-
 .../unix/sysv/linux/powerpc/powerpc64/vfork.S |  6 +-
 sysdeps/unix/sysv/linux/powerpc/syscall.S     | 11 ++-
 sysdeps/unix/sysv/linux/powerpc/sysdep.h      | 78 +++++++++++-----
 10 files changed, 235 insertions(+), 125 deletions(-)
 rename sysdeps/unix/sysv/linux/powerpc/{powerpc64/brk.S => brk.c} (58%)
 delete mode 100644 sysdeps/unix/sysv/linux/powerpc/dl-brk.S
 delete mode 100644 sysdeps/unix/sysv/linux/powerpc/powerpc32/brk.S

--
2.26.2

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] powerpc: Replace brk.S with a C implementation
  2020-11-18 14:46 [PATCH 0/4] powerpc: Add support for system call vectored Matheus Castanho
@ 2020-11-18 14:47 ` Matheus Castanho
  2020-11-18 20:00   ` Adhemerval Zanella
  2020-11-18 21:28   ` Andreas Schwab
  2020-11-18 14:47 ` [PATCH 2/4] powerpc: Make PT_THREAD_POINTER available to assembly code Matheus Castanho
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 20+ messages in thread
From: Matheus Castanho @ 2020-11-18 14:47 UTC (permalink / raw)
  To: libc-alpha; +Cc: tuliom

From: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>

There is no need to maintain a separate assembly implementation of brk for
powerpc.  It basically does the same thing as the generic C-based version, so we
can reuse that instead.
---
 .../linux/powerpc/{powerpc64/brk.S => brk.c}  | 41 +++++++--------
 sysdeps/unix/sysv/linux/powerpc/dl-brk.S      |  1 -
 .../unix/sysv/linux/powerpc/powerpc32/brk.S   | 52 -------------------
 3 files changed, 18 insertions(+), 76 deletions(-)
 rename sysdeps/unix/sysv/linux/powerpc/{powerpc64/brk.S => brk.c} (58%)
 delete mode 100644 sysdeps/unix/sysv/linux/powerpc/dl-brk.S
 delete mode 100644 sysdeps/unix/sysv/linux/powerpc/powerpc32/brk.S

diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/brk.S b/sysdeps/unix/sysv/linux/powerpc/brk.c
similarity index 58%
rename from sysdeps/unix/sysv/linux/powerpc/powerpc64/brk.S
rename to sysdeps/unix/sysv/linux/powerpc/brk.c
index f206909b72..af7d1d73a2 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/brk.S
+++ b/sysdeps/unix/sysv/linux/powerpc/brk.c
@@ -1,6 +1,6 @@
-/* brk system call for Linux.  PowerPC64 version.
-   Copyright (C) 1995-2020 Free Software Foundation, Inc.
+/* Copyright (C) 2020 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
+   Contributed by Chris Metcalf <cmetcalf@tilera.com>, 2011.
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -13,31 +13,26 @@
    Lesser General Public License for more details.
 
    You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
+   License along with the GNU C Library.  If not, see
    <https://www.gnu.org/licenses/>.  */
 
+#include <errno.h>
+#include <unistd.h>
 #include <sysdep.h>
-#define _ERRNO_H	1
-#include <bits/errno.h>
 
-	.comm	__curbrk,8,8
-	.section	".toc","aw"
-.LC__curbrk:
-	.tc __curbrk[TC],__curbrk
-	.section ".text"
-ENTRY (__brk)
-	CALL_MCOUNT 1
+/* This must be initialized data because commons can't have aliases.  */
+void *__curbrk = 0;
 
-	std	r3,-8(r1)
-	DO_CALL(SYS_ify(brk))
-	ld	r6,-8(r1)
-	ld	r5,.LC__curbrk@toc(r2)
-	std     r3,0(r5)
-	cmpld   r6,r3
-	li	r3,0
-	blelr+
-	li      r3,ENOMEM
-	TAIL_CALL_SYSCALL_ERROR
-END (__brk)
+int
+__brk (void *addr)
+{
+  __curbrk = (void *) INTERNAL_SYSCALL_CALL (brk, addr);
+  if (__curbrk < addr)
+    {
+      __set_errno (ENOMEM);
+      return -1;
+    }
 
+  return 0;
+}
 weak_alias (__brk, brk)
diff --git a/sysdeps/unix/sysv/linux/powerpc/dl-brk.S b/sysdeps/unix/sysv/linux/powerpc/dl-brk.S
deleted file mode 100644
index eeb96544e3..0000000000
--- a/sysdeps/unix/sysv/linux/powerpc/dl-brk.S
+++ /dev/null
@@ -1 +0,0 @@
-#include <brk.S>
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/brk.S b/sysdeps/unix/sysv/linux/powerpc/powerpc32/brk.S
deleted file mode 100644
index f3b960795e..0000000000
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/brk.S
+++ /dev/null
@@ -1,52 +0,0 @@
-/* brk system call for Linux/ppc.
-   Copyright (C) 1995-2020 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-#define _ERRNO_H	1
-#include <bits/errno.h>
-
-	.comm	__curbrk,4,4
-	.section ".text"
-ENTRY (__brk)
-	mflr	r0
-	stwu    r1,-16(r1)
-	cfi_adjust_cfa_offset (16)
-	stw	r3,8(r1)
-	stw	r0,20(r1)
-	cfi_offset (lr, 4)
-	DO_CALL(SYS_ify(brk))
-	lwz     r6,8(r1)
-#ifdef PIC
-	SETUP_GOT_ACCESS(r5,got_label)
-	addis	r5,r5,__curbrk-got_label@ha
-	stw	r3,__curbrk-got_label@l(r5)
-#else
-	lis     r4,__curbrk@ha
-	stw     r3,__curbrk@l(r4)
-#endif
-	lwz	r0,20(r1)
-	cmplw   r6,r3
-	addi    r1,r1,16
-	mtlr	r0
-	li	r3,0
-	blelr+
-	li      r3,ENOMEM
-	b	__syscall_error@local
-END (__brk)
-
-weak_alias (__brk, brk)
-- 
2.26.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 2/4] powerpc: Make PT_THREAD_POINTER available to assembly code
  2020-11-18 14:46 [PATCH 0/4] powerpc: Add support for system call vectored Matheus Castanho
  2020-11-18 14:47 ` [PATCH 1/4] powerpc: Replace brk.S with a C implementation Matheus Castanho
@ 2020-11-18 14:47 ` Matheus Castanho
  2020-11-19 19:58   ` Tulio Magno Quites Machado Filho
  2020-11-18 14:47 ` [PATCH 3/4] powerpc: Runtime selection between sc and scv for syscalls Matheus Castanho
  2020-11-18 14:47 ` [PATCH 4/4] powerpc: Use scv instruction on clone when available Matheus Castanho
  3 siblings, 1 reply; 20+ messages in thread
From: Matheus Castanho @ 2020-11-18 14:47 UTC (permalink / raw)
  To: libc-alpha; +Cc: tuliom

PT_THREAD_POINTER is currenty defined inside a #ifndef __ASSEMBLER__ block, but
its usage should not be limited to C code, as it can be useful when accessing
the TLS from assembly code as well.
---
 sysdeps/powerpc/nptl/tls.h | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/sysdeps/powerpc/nptl/tls.h b/sysdeps/powerpc/nptl/tls.h
index 261eecfd18..d585968d04 100644
--- a/sysdeps/powerpc/nptl/tls.h
+++ b/sysdeps/powerpc/nptl/tls.h
@@ -29,8 +29,24 @@
 
 #else /* __ASSEMBLER__ */
 # include <tcb-offsets.h>
+# define __ASSEMBLY__
+# include <asm/ptrace.h>
 #endif /* __ASSEMBLER__ */
 
+#ifndef __powerpc64__
+/* Register r2 (tp) is reserved by the ABI as "thread pointer". */
+# define PT_THREAD_POINTER PT_R2
+# ifndef __ASSEMBLER__
+register void *__thread_register __asm__ ("r2");
+# endif
+
+#else /* __powerpc64__ */
+/* Register r13 (tp) is reserved by the ABI as "thread pointer". */
+# define PT_THREAD_POINTER PT_R13
+# ifndef __ASSEMBLER__
+register void *__thread_register __asm__ ("r13");
+# endif
+#endif /* __powerpc64__ */
 
 #ifndef __ASSEMBLER__
 
@@ -106,16 +122,6 @@ typedef struct
   (sizeof (struct pthread)						      \
    + ((sizeof (tcbhead_t) + TLS_TCB_ALIGN - 1) & ~(TLS_TCB_ALIGN - 1)))
 
-# ifndef __powerpc64__
-/* Register r2 (tp) is reserved by the ABI as "thread pointer". */
-register void *__thread_register __asm__ ("r2");
-#  define PT_THREAD_POINTER PT_R2
-# else
-/* Register r13 (tp) is reserved by the ABI as "thread pointer". */
-register void *__thread_register __asm__ ("r13");
-#  define PT_THREAD_POINTER PT_R13
-# endif
-
 /* The following assumes that TP (R2 or R13) points to the end of the
    TCB + 0x7000 (per the ABI).  This implies that TCB address is
    TP - 0x7000.  As we define TLS_DTV_AT_TP we can
-- 
2.26.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 3/4] powerpc: Runtime selection between sc and scv for syscalls
  2020-11-18 14:46 [PATCH 0/4] powerpc: Add support for system call vectored Matheus Castanho
  2020-11-18 14:47 ` [PATCH 1/4] powerpc: Replace brk.S with a C implementation Matheus Castanho
  2020-11-18 14:47 ` [PATCH 2/4] powerpc: Make PT_THREAD_POINTER available to assembly code Matheus Castanho
@ 2020-11-18 14:47 ` Matheus Castanho
  2020-11-18 15:16   ` Florian Weimer
  2020-11-18 19:00   ` Paul A. Clarke
  2020-11-18 14:47 ` [PATCH 4/4] powerpc: Use scv instruction on clone when available Matheus Castanho
  3 siblings, 2 replies; 20+ messages in thread
From: Matheus Castanho @ 2020-11-18 14:47 UTC (permalink / raw)
  To: libc-alpha; +Cc: tuliom

Linux kernel v5.9 added support for system calls using the scv
instruction for POWER9 and later.  The new codepath provides better
performance (see below) if compared to using sc.  For the
foreseeable future, both sc and scv mechanisms will co-exist, so this
patch enables glibc to do a runtime check and always use scv when it is
available.

Before issuing the system call to the kernel, we check hwcap2 in the TCB
for PPC_FEATURE2_SCV to see if scv is supported by the kernel.  If not,
we fallback to sc and keep the old behavior.

The kernel implements a different error return convention for scv, so
when returning from a system call we need to handle the return value
differently depending on the instruction we used to enter the kernel.

For syscalls implemented in ASM, entry and exit are implemented by
different macros (PSEUDO and PSEUDO_RET, resp.), which may be used in
sequence (e.g. for templated syscalls) or with other instructions in
between (e.g. clone).  To avoid accessing the TCB a second time on
PSEUDO_RET to check which instruction we used, the value read from
hwcap2 is cached on a non-volatile register.

This is not needed when using INTERNAL_SYSCALL macro, since entry and
exit are bundled into the same inline asm directive.

Since system calls may be called before the TCB has been setup (e.g.
inside the dynamic loader), we also check the value of the thread
pointer before effectively accessing the TCB.  For such situations in
which the availability of scv cannot be determined, sc is always used.

Support for scv in syscalls implemented in their own ASM file (clone and
vfork) will be added later. For now simply use sc as before.

Average performance over 1M calls for each syscall "type":
  - stat: C wrapper calling INTERNAL_SYSCALL
  - getpid: templated ASM syscall
  - syscall: call to gettid using syscall function

  Standard:
     stat : 1.573445 us / ~3619 cycles
   getpid : 0.164986 us / ~379 cycles
  syscall : 0.162743 us / ~374 cycles

  With scv:
     stat : 1.537049 us / ~3535 cycles <~ -84 cycles  / -2.32%
   getpid : 0.109923 us / ~253 cycles  <~ -126 cycles / -33.25%
  syscall : 0.116410 us / ~268 cycles  <~ -106 cycles / -28.34%

Tested on powerpc, powerpc64, powerpc64le (with and without scv)
---
 sysdeps/powerpc/powerpc32/sysdep.h            | 19 ++--
 sysdeps/powerpc/powerpc64/sysdep.h            | 90 ++++++++++++++++++-
 .../unix/sysv/linux/powerpc/powerpc64/clone.S |  9 +-
 .../unix/sysv/linux/powerpc/powerpc64/vfork.S |  6 +-
 sysdeps/unix/sysv/linux/powerpc/syscall.S     | 11 ++-
 sysdeps/unix/sysv/linux/powerpc/sysdep.h      | 78 +++++++++++-----
 6 files changed, 174 insertions(+), 39 deletions(-)

diff --git a/sysdeps/powerpc/powerpc32/sysdep.h b/sysdeps/powerpc/powerpc32/sysdep.h
index 829eec266a..bff18bdc8b 100644
--- a/sysdeps/powerpc/powerpc32/sysdep.h
+++ b/sysdeps/powerpc/powerpc32/sysdep.h
@@ -90,9 +90,12 @@ GOT_LABEL:			;					      \
   cfi_endproc;								      \
   ASM_SIZE_DIRECTIVE(name)
 
-#define DO_CALL(syscall)						      \
-    li 0,syscall;							      \
-    sc
+#define DO_CALL(syscall) \
+	li 0,syscall; \
+	DO_CALL_SC
+
+#define DO_CALL_SC \
+	sc
 
 #undef JUMPTARGET
 #ifdef PIC
@@ -106,14 +109,20 @@ GOT_LABEL:			;					      \
 # define HIDDEN_JUMPTARGET(name) __GI_##name##@local
 #endif
 
+#define TAIL_CALL_SYSCALL_ERROR \
+    b __syscall_error@local
+
 #define PSEUDO(name, syscall_name, args)				      \
   .section ".text";							      \
   ENTRY (name)								      \
     DO_CALL (SYS_ify (syscall_name));
 
+#define RET_SC \
+    bnslr+;
+
 #define PSEUDO_RET							      \
-    bnslr+;								      \
-    b __syscall_error@local
+    RET_SC;								      \
+    TAIL_CALL_SYSCALL_ERROR
 #define ret PSEUDO_RET
 
 #undef	PSEUDO_END
diff --git a/sysdeps/powerpc/powerpc64/sysdep.h b/sysdeps/powerpc/powerpc64/sysdep.h
index d557098898..2d7dde64da 100644
--- a/sysdeps/powerpc/powerpc64/sysdep.h
+++ b/sysdeps/powerpc/powerpc64/sysdep.h
@@ -17,6 +17,7 @@
    <https://www.gnu.org/licenses/>.  */
 
 #include <sysdeps/powerpc/sysdep.h>
+#include <tls.h>
 
 #ifdef __ASSEMBLER__
 
@@ -263,10 +264,72 @@ LT_LABELSUFFIX(name,_name_end): ; \
   TRACEBACK_MASK(name,mask);	\
   END_2(name)
 
+/* We will allocate a new frame to save LR and the non-volatile register used to
+   read the TCB when checking for scv support on syscall code.  We actually just
+   need the minimum frame size plus room for 1 reg (64 bits).  But the ABI
+   mandates stack frames should be aligned at 16 Bytes, so we end up allocating
+   a bit more space then what will actually be used.  */
+#define SCV_FRAME_SIZE (FRAME_MIN_SIZE+16)
+#define SCV_FRAME_NVOLREG_SAVE FRAME_MIN_SIZE
+
+/* Allocate frame and save register */
+#define NVOLREG_SAVE \
+    stdu r1,-SCV_FRAME_SIZE(r1); \
+    std r31,SCV_FRAME_NVOLREG_SAVE(r1); \
+    cfi_adjust_cfa_offset(SCV_FRAME_SIZE);
+
+/* Restore register and destroy frame */
+#define NVOLREG_RESTORE	\
+    ld r31,SCV_FRAME_NVOLREG_SAVE(r1); \
+    addi r1,r1,SCV_FRAME_SIZE; \
+    cfi_adjust_cfa_offset(-SCV_FRAME_SIZE);
+
+/* Check PPC_FEATURE2_SCV bit from hwcap2 in the TCB and update CR0
+ * accordingly.  First, we check if the thread pointer != 0, so we don't try to
+ * access the TCB before it has been initialized, e.g. inside the dynamic
+ * loader.  If it is already initialized, check if scv is available.  On both
+ * negative cases, go to JUMPFALSE (label given by the macro's caller).  We
+ * save the value we read from the TCB in a non-volatile register so we can
+ * reuse it later when exiting from the syscall in PSEUDO_RET.  */
+    .macro CHECK_SCV_SUPPORT REG JUMPFALSE
+
+    /* Check if thread pointer has already been setup */
+    cmpdi r13,0
+    beq \JUMPFALSE
+
+    /* Read PPC_FEATURE2_SCV from TCB and store it in REG */
+    ld \REG,TCB_HWCAP(PT_THREAD_POINTER)
+    andis. \REG,\REG,PPC_FEATURE2_SCV>>16
+
+    beq \JUMPFALSE
+    .endm
+
+/* Before doing the syscall, check if we can use scv.  scv is supported by P9
+ * and later with Linux v5.9 and later.  If so, use it.  Otherwise, fallback to
+ * sc.  We use a non-volatile register to save hwcap2 from the TCB, so we need
+ * to save its content beforehand. */
 #define DO_CALL(syscall) \
-    li 0,syscall; \
+    li r0,syscall; \
+    NVOLREG_SAVE; \
+    CHECK_SCV_SUPPORT r31 0f; \
+    DO_CALL_SCV; \
+    b 1f; \
+0:  DO_CALL_SC; \
+1:
+
+/* DO_CALL_SC and DO_CALL_SCV expect the syscall number to be loaded on r0.  */
+#define DO_CALL_SC \
     sc
 
+#define DO_CALL_SCV \
+    mflr r9; \
+    std r9,FRAME_LR_SAVE(r1); \
+    cfi_offset(lr,FRAME_LR_SAVE); \
+    scv 0; \
+    ld r9,FRAME_LR_SAVE(r1); \
+    mtlr r9; \
+    cfi_restore(lr);
+
 /* ppc64 is always PIC */
 #undef JUMPTARGET
 #define JUMPTARGET(name) FUNC_LABEL(name)
@@ -304,9 +367,26 @@ LT_LABELSUFFIX(name,_name_end): ; \
     .endif
 #endif
 
+/* This should only be called after a DO_CALL. In such cases, r31 contains the
+ * value of PPC_FEATURE2_SCV read from hwcap2 by CHECK_SCV_SUPPORT.  If it is
+ * set, we know we have entered the kernel using scv, so handle the return code
+ * accordingly.  */
 #define PSEUDO_RET \
-    bnslr+; \
-    TAIL_CALL_SYSCALL_ERROR
+    cmpdi cr5,r31,0; \
+    NVOLREG_RESTORE; \
+    beq cr5,0f; \
+    RET_SCV; \
+    b 1f; \
+0:  RET_SC; \
+1:  TAIL_CALL_SYSCALL_ERROR
+
+#define RET_SCV \
+    cmpdi r3,0; \
+    bgelr+; \
+    neg r3,r3;
+
+#define RET_SC \
+    bnslr+;
 
 #define ret PSEUDO_RET
 
@@ -319,7 +399,9 @@ LT_LABELSUFFIX(name,_name_end): ; \
   ENTRY (name);						\
   DO_CALL (SYS_ify (syscall_name))
 
+/* This should only be called after a DO_CALL.  */
 #define PSEUDO_RET_NOERRNO \
+    NVOLREG_RESTORE; \
     blr
 
 #define ret_NOERRNO PSEUDO_RET_NOERRNO
@@ -333,7 +415,9 @@ LT_LABELSUFFIX(name,_name_end): ; \
   ENTRY (name);						\
   DO_CALL (SYS_ify (syscall_name))
 
+/* This should only be called after a DO_CALL.  */
 #define PSEUDO_RET_ERRVAL \
+    NVOLREG_RESTORE; \
     blr
 
 #define ret_ERRVAL PSEUDO_RET_ERRVAL
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S b/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S
index b30641c805..fc496fa671 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S
@@ -68,7 +68,8 @@ ENTRY (__clone)
 	cfi_endproc
 
 	/* Do the call.  */
-	DO_CALL(SYS_ify(clone))
+	li 	r0,SYS_ify(clone)
+	DO_CALL_SC
 
 	/* Check for child process.  */
 	cmpdi	cr1,r3,0
@@ -82,7 +83,8 @@ ENTRY (__clone)
 	bctrl
 	ld	r2,FRAME_TOC_SAVE(r1)
 
-	DO_CALL(SYS_ify(exit))
+	li	r0,(SYS_ify(exit))
+	DO_CALL_SC
 	/* We won't ever get here but provide a nop so that the linker
 	   will insert a toc adjusting stub if necessary.  */
 	nop
@@ -104,7 +106,8 @@ L(parent):
 	cfi_restore(r30)
 	cfi_restore(r31)
 
-	PSEUDO_RET
+	RET_SC
+	TAIL_CALL_SYSCALL_ERROR
 
 END (__clone)
 
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/vfork.S b/sysdeps/unix/sysv/linux/powerpc/powerpc64/vfork.S
index 17199fb14a..a71f69e929 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/vfork.S
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/vfork.S
@@ -28,9 +28,11 @@
 ENTRY (__vfork)
 	CALL_MCOUNT 0
 
-	DO_CALL (SYS_ify (vfork))
+	li r0,SYS_ify (vfork)
+	DO_CALL_SC
 
-	PSEUDO_RET
+	RET_SC
+	TAIL_CALL_SYSCALL_ERROR
 
 PSEUDO_END (__vfork)
 libc_hidden_def (__vfork)
diff --git a/sysdeps/unix/sysv/linux/powerpc/syscall.S b/sysdeps/unix/sysv/linux/powerpc/syscall.S
index 48dade4642..23ce2f69c9 100644
--- a/sysdeps/unix/sysv/linux/powerpc/syscall.S
+++ b/sysdeps/unix/sysv/linux/powerpc/syscall.S
@@ -25,6 +25,13 @@ ENTRY (syscall)
 	mr   r6,r7
 	mr   r7,r8
 	mr   r8,r9
-	sc
-	PSEUDO_RET
+#if defined(__PPC64__) || defined(__powerpc64__)
+	CHECK_SCV_SUPPORT r9 0f
+	DO_CALL_SCV
+	RET_SCV
+	b 1f
+#endif
+0:	DO_CALL_SC
+	RET_SC
+1:	TAIL_CALL_SYSCALL_ERROR
 PSEUDO_END (syscall)
diff --git a/sysdeps/unix/sysv/linux/powerpc/sysdep.h b/sysdeps/unix/sysv/linux/powerpc/sysdep.h
index b2bca598b9..19f4321c6b 100644
--- a/sysdeps/unix/sysv/linux/powerpc/sysdep.h
+++ b/sysdeps/unix/sysv/linux/powerpc/sysdep.h
@@ -64,39 +64,69 @@
 #define INTERNAL_VSYSCALL_CALL(funcptr, nr, args...)			\
   INTERNAL_VSYSCALL_CALL_TYPE(funcptr, long int, nr, args)
 
+#define DECLARE_REGS				\
+  register long int r0  __asm__ ("r0");		\
+  register long int r3  __asm__ ("r3");		\
+  register long int r4  __asm__ ("r4");		\
+  register long int r5  __asm__ ("r5");		\
+  register long int r6  __asm__ ("r6");		\
+  register long int r7  __asm__ ("r7");		\
+  register long int r8  __asm__ ("r8");
+
+#define SYSCALL_SCV(nr)				\
+  ({						\
+    __asm__ __volatile__			\
+      ("scv 0\n\t"				\
+       "0:"					\
+       : "=&r" (r0),				\
+	 "=&r" (r3), "=&r" (r4), "=&r" (r5),	\
+	 "=&r" (r6), "=&r" (r7), "=&r" (r8)	\
+       : ASM_INPUT_##nr			\
+       : "r9", "r10", "r11", "r12",		\
+	 "lr", "ctr", "memory");		\
+    r3;					\
+  })
 
-#undef INTERNAL_SYSCALL
-#define INTERNAL_SYSCALL_NCS(name, nr, args...) \
-  ({									\
-    register long int r0  __asm__ ("r0");				\
-    register long int r3  __asm__ ("r3");				\
-    register long int r4  __asm__ ("r4");				\
-    register long int r5  __asm__ ("r5");				\
-    register long int r6  __asm__ ("r6");				\
-    register long int r7  __asm__ ("r7");				\
-    register long int r8  __asm__ ("r8");				\
-    LOADARGS_##nr (name, ##args);					\
-    __asm__ __volatile__						\
-      ("sc\n\t"								\
-       "mfcr  %0\n\t"							\
-       "0:"								\
-       : "=&r" (r0),							\
-         "=&r" (r3), "=&r" (r4), "=&r" (r5),				\
-         "=&r" (r6), "=&r" (r7), "=&r" (r8)				\
-       : ASM_INPUT_##nr							\
-       : "r9", "r10", "r11", "r12",					\
-         "cr0", "ctr", "memory");					\
-    r0 & (1 << 28) ? -r3 : r3;						\
+#define SYSCALL_SC(nr)				\
+  ({						\
+    __asm__ __volatile__			\
+      ("sc\n\t"				\
+       "mfcr %0\n\t"				\
+       "0:"					\
+       : "=&r" (r0),				\
+	 "=&r" (r3), "=&r" (r4), "=&r" (r5),	\
+	 "=&r" (r6), "=&r" (r7), "=&r" (r8)	\
+       : ASM_INPUT_##nr			\
+       : "r9", "r10", "r11", "r12",		\
+	 "cr0", "ctr", "memory");		\
+    r0 & (1 << 28) ? -r3 : r3;			\
   })
-#define INTERNAL_SYSCALL(name, nr, args...)				\
-  INTERNAL_SYSCALL_NCS (__NR_##name, nr, args)
 
 #if defined(__PPC64__) || defined(__powerpc64__)
 # define SYSCALL_ARG_SIZE 8
+
+# define INTERNAL_SYSCALL_NCS(name, nr, args...)			\
+  ({									\
+    DECLARE_REGS;							\
+    LOADARGS_##nr (name, ##args);					\
+    __thread_register != 0 && THREAD_GET_HWCAP() & PPC_FEATURE2_SCV ?	\
+      SYSCALL_SCV(nr) : SYSCALL_SC(nr);					\
+  })
 #else
 # define SYSCALL_ARG_SIZE 4
+
+# define INTERNAL_SYSCALL_NCS(name, nr, args...)	\
+  ({							\
+    DECLARE_REGS;					\
+    LOADARGS_##nr (name, ##args);			\
+    SYSCALL_SC(nr);					\
+  })
 #endif
 
+#undef INTERNAL_SYSCALL
+#define INTERNAL_SYSCALL(name, nr, args...)				\
+  INTERNAL_SYSCALL_NCS (__NR_##name, nr, args)
+
 #define LOADARGS_0(name, dummy) \
 	r0 = name
 #define LOADARGS_1(name, __arg1) \
-- 
2.26.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 4/4] powerpc: Use scv instruction on clone when available
  2020-11-18 14:46 [PATCH 0/4] powerpc: Add support for system call vectored Matheus Castanho
                   ` (2 preceding siblings ...)
  2020-11-18 14:47 ` [PATCH 3/4] powerpc: Runtime selection between sc and scv for syscalls Matheus Castanho
@ 2020-11-18 14:47 ` Matheus Castanho
  3 siblings, 0 replies; 20+ messages in thread
From: Matheus Castanho @ 2020-11-18 14:47 UTC (permalink / raw)
  To: libc-alpha; +Cc: tuliom

clone already uses r31 to temporarily save input arguments before doing the
syscall, so we use a different register to read from the TCB. We can also avoid
allocating another stack frame, which is not needed since we can simply extend
the usage of the red zone.
---
 .../unix/sysv/linux/powerpc/powerpc64/clone.S | 33 +++++++++++++++++--
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S b/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S
index fc496fa671..247e0de68c 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S
@@ -38,9 +38,11 @@ ENTRY (__clone)
 	beq-	cr0,L(badargs)
 
 	/* Save some regs in the "red zone".  */
+	std	r28,-32(r1)
 	std	r29,-24(r1)
 	std	r30,-16(r1)
 	std	r31,-8(r1)
+	cfi_offset(r28,-32)
 	cfi_offset(r29,-24)
 	cfi_offset(r30,-16)
 	cfi_offset(r31,-8)
@@ -69,11 +71,26 @@ ENTRY (__clone)
 
 	/* Do the call.  */
 	li 	r0,SYS_ify(clone)
-	DO_CALL_SC
+	CHECK_SCV_SUPPORT r28 0f
+	/* This is equivalent to DO_CALL_SCV, but we cannot use the macro here
+	because it uses CFI directives and we just called cfi_endproc.  */
+	mflr 	r9
+	std 	r9,FRAME_LR_SAVE(r1)
+	scv 	0
+	ld 	r9,FRAME_LR_SAVE(r1)
+	mtlr 	r9
+
+	/* Check for child process.  */
+	/* When using scv, error is indicated by negative r3.  */
+	cmpdi	cr1,r3,0
+	b 1f
+0:      DO_CALL_SC
 
 	/* Check for child process.  */
+	/* With sc, error is indicated by cr0.SO.  */
 	cmpdi	cr1,r3,0
 	crandc	cr1*4+eq,cr1*4+eq,cr0*4+so
+1:
 	bne-	cr1,L(parent)		/* The '-' is to minimise the race.  */
 
 	std	r2,FRAME_TOC_SAVE(r1)
@@ -95,19 +112,29 @@ L(badargs):
 	TAIL_CALL_SYSCALL_ERROR
 
 L(parent):
+	/* Check if scv is available. */
+	cmpdi cr1,r28,0
+
 	/* Parent.  Restore registers & return.  */
+	cfi_offset(r28,-32)
 	cfi_offset(r29,-24)
 	cfi_offset(r30,-16)
 	cfi_offset(r31,-8)
+	ld	r28,-32(r1)
 	ld	r29,-24(r1)
 	ld	r30,-16(r1)
 	ld	r31,-8(r1)
+	cfi_restore(r28)
 	cfi_restore(r29)
 	cfi_restore(r30)
 	cfi_restore(r31)
 
-	RET_SC
-	TAIL_CALL_SYSCALL_ERROR
+	beq cr1,0f
+	RET_SCV
+	b 1f
+0:	RET_SC
+1:	TAIL_CALL_SYSCALL_ERROR
+
 
 END (__clone)
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] powerpc: Runtime selection between sc and scv for syscalls
  2020-11-18 14:47 ` [PATCH 3/4] powerpc: Runtime selection between sc and scv for syscalls Matheus Castanho
@ 2020-11-18 15:16   ` Florian Weimer
  2020-11-19 20:29     ` Matheus Castanho
  2020-11-18 19:00   ` Paul A. Clarke
  1 sibling, 1 reply; 20+ messages in thread
From: Florian Weimer @ 2020-11-18 15:16 UTC (permalink / raw)
  To: Matheus Castanho via Libc-alpha; +Cc: Matheus Castanho, tuliom

* Matheus Castanho via Libc-alpha:

> +/* Check PPC_FEATURE2_SCV bit from hwcap2 in the TCB and update CR0
> + * accordingly.  First, we check if the thread pointer != 0, so we don't try to
> + * access the TCB before it has been initialized, e.g. inside the dynamic
> + * loader.  If it is already initialized, check if scv is available.  On both
> + * negative cases, go to JUMPFALSE (label given by the macro's caller).  We
> + * save the value we read from the TCB in a non-volatile register so we can
> + * reuse it later when exiting from the syscall in PSEUDO_RET.  */

This comment style is not GNU (sorry).

I think you can avoid the conditional check and replace it with #if
IS_IN (rtld).  Then ld.so will use the old interface unconditionally,
but that should be okay.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] powerpc: Runtime selection between sc and scv for syscalls
  2020-11-18 14:47 ` [PATCH 3/4] powerpc: Runtime selection between sc and scv for syscalls Matheus Castanho
  2020-11-18 15:16   ` Florian Weimer
@ 2020-11-18 19:00   ` Paul A. Clarke
  2020-11-19 20:34     ` Matheus Castanho
  1 sibling, 1 reply; 20+ messages in thread
From: Paul A. Clarke @ 2020-11-18 19:00 UTC (permalink / raw)
  To: Matheus Castanho; +Cc: libc-alpha, tuliom

On Wed, Nov 18, 2020 at 11:47:02AM -0300, Matheus Castanho via Libc-alpha wrote:
> Linux kernel v5.9 added support for system calls using the scv
> instruction for POWER9 and later.  The new codepath provides better
> performance (see below) if compared to using sc.  For the
> foreseeable future, both sc and scv mechanisms will co-exist, so this
> patch enables glibc to do a runtime check and always use scv when it is
> available.

nit: "always" is perhaps too strong here, as there are exceptions, as noted
in your message further below.

> Before issuing the system call to the kernel, we check hwcap2 in the TCB
> for PPC_FEATURE2_SCV to see if scv is supported by the kernel.  If not,
> we fallback to sc and keep the old behavior.
> 
> The kernel implements a different error return convention for scv, so
> when returning from a system call we need to handle the return value
> differently depending on the instruction we used to enter the kernel.
> 
> For syscalls implemented in ASM, entry and exit are implemented by
> different macros (PSEUDO and PSEUDO_RET, resp.), which may be used in
> sequence (e.g. for templated syscalls) or with other instructions in
> between (e.g. clone).  To avoid accessing the TCB a second time on
> PSEUDO_RET to check which instruction we used, the value read from
> hwcap2 is cached on a non-volatile register.
> 
> This is not needed when using INTERNAL_SYSCALL macro, since entry and
> exit are bundled into the same inline asm directive.
> 
> Since system calls may be called before the TCB has been setup (e.g.
> inside the dynamic loader), we also check the value of the thread
> pointer before effectively accessing the TCB.  For such situations in
> which the availability of scv cannot be determined, sc is always used.
> 
> Support for scv in syscalls implemented in their own ASM file (clone and
> vfork) will be added later. For now simply use sc as before.
> 
> Average performance over 1M calls for each syscall "type":
>   - stat: C wrapper calling INTERNAL_SYSCALL
>   - getpid: templated ASM syscall
>   - syscall: call to gettid using syscall function
> 
>   Standard:
>      stat : 1.573445 us / ~3619 cycles
>    getpid : 0.164986 us / ~379 cycles
>   syscall : 0.162743 us / ~374 cycles
> 
>   With scv:
>      stat : 1.537049 us / ~3535 cycles <~ -84 cycles  / -2.32%
>    getpid : 0.109923 us / ~253 cycles  <~ -126 cycles / -33.25%
>   syscall : 0.116410 us / ~268 cycles  <~ -106 cycles / -28.34%
> 
> Tested on powerpc, powerpc64, powerpc64le (with and without scv)
> ---
>  sysdeps/powerpc/powerpc32/sysdep.h            | 19 ++--
>  sysdeps/powerpc/powerpc64/sysdep.h            | 90 ++++++++++++++++++-
>  .../unix/sysv/linux/powerpc/powerpc64/clone.S |  9 +-
>  .../unix/sysv/linux/powerpc/powerpc64/vfork.S |  6 +-
>  sysdeps/unix/sysv/linux/powerpc/syscall.S     | 11 ++-
>  sysdeps/unix/sysv/linux/powerpc/sysdep.h      | 78 +++++++++++-----
>  6 files changed, 174 insertions(+), 39 deletions(-)
> 
> diff --git a/sysdeps/powerpc/powerpc32/sysdep.h b/sysdeps/powerpc/powerpc32/sysdep.h
> index 829eec266a..bff18bdc8b 100644
> --- a/sysdeps/powerpc/powerpc32/sysdep.h
> +++ b/sysdeps/powerpc/powerpc32/sysdep.h
> @@ -90,9 +90,12 @@ GOT_LABEL:			;					      \
>    cfi_endproc;								      \
>    ASM_SIZE_DIRECTIVE(name)
> 
> -#define DO_CALL(syscall)						      \
> -    li 0,syscall;							      \
> -    sc
> +#define DO_CALL(syscall) \
> +	li 0,syscall; \
> +	DO_CALL_SC

nit: there are some innocuous whitespace changes which could be avoided to minimize
the diff (moving the '\' closer and changing from spaces to a tab).

> +
> +#define DO_CALL_SC \
> +	sc
> 
>  #undef JUMPTARGET
>  #ifdef PIC
> @@ -106,14 +109,20 @@ GOT_LABEL:			;					      \
>  # define HIDDEN_JUMPTARGET(name) __GI_##name##@local
>  #endif
> 
> +#define TAIL_CALL_SYSCALL_ERROR \
> +    b __syscall_error@local
> +
>  #define PSEUDO(name, syscall_name, args)				      \
>    .section ".text";							      \
>    ENTRY (name)								      \
>      DO_CALL (SYS_ify (syscall_name));
> 
> +#define RET_SC \
> +    bnslr+;
> +
>  #define PSEUDO_RET							      \
> -    bnslr+;								      \
> -    b __syscall_error@local
> +    RET_SC;								      \
> +    TAIL_CALL_SYSCALL_ERROR
>  #define ret PSEUDO_RET
> 
>  #undef	PSEUDO_END
> diff --git a/sysdeps/powerpc/powerpc64/sysdep.h b/sysdeps/powerpc/powerpc64/sysdep.h
> index d557098898..2d7dde64da 100644
> --- a/sysdeps/powerpc/powerpc64/sysdep.h
> +++ b/sysdeps/powerpc/powerpc64/sysdep.h
> @@ -17,6 +17,7 @@
>     <https://www.gnu.org/licenses/>.  */
> 
>  #include <sysdeps/powerpc/sysdep.h>
> +#include <tls.h>
> 
>  #ifdef __ASSEMBLER__
> 
> @@ -263,10 +264,72 @@ LT_LABELSUFFIX(name,_name_end): ; \
>    TRACEBACK_MASK(name,mask);	\
>    END_2(name)
> 
> +/* We will allocate a new frame to save LR and the non-volatile register used to
> +   read the TCB when checking for scv support on syscall code.  We actually just
> +   need the minimum frame size plus room for 1 reg (64 bits).  But the ABI

nit: Since everything is in bytes below, suggest changing "64 bits" to "8 bytes".

> +   mandates stack frames should be aligned at 16 Bytes, so we end up allocating
> +   a bit more space then what will actually be used.  */
> +#define SCV_FRAME_SIZE (FRAME_MIN_SIZE+16)

8 for the register save area + 8 more to maintain 16-byte alignment.  OK.

> +#define SCV_FRAME_NVOLREG_SAVE FRAME_MIN_SIZE
> +
> +/* Allocate frame and save register */
> +#define NVOLREG_SAVE \
> +    stdu r1,-SCV_FRAME_SIZE(r1); \
> +    std r31,SCV_FRAME_NVOLREG_SAVE(r1); \
> +    cfi_adjust_cfa_offset(SCV_FRAME_SIZE);
> +
> +/* Restore register and destroy frame */
> +#define NVOLREG_RESTORE	\
> +    ld r31,SCV_FRAME_NVOLREG_SAVE(r1); \
> +    addi r1,r1,SCV_FRAME_SIZE; \
> +    cfi_adjust_cfa_offset(-SCV_FRAME_SIZE);
> +
> +/* Check PPC_FEATURE2_SCV bit from hwcap2 in the TCB and update CR0
> + * accordingly.  First, we check if the thread pointer != 0, so we don't try to
> + * access the TCB before it has been initialized, e.g. inside the dynamic
> + * loader.  If it is already initialized, check if scv is available.  On both
> + * negative cases, go to JUMPFALSE (label given by the macro's caller).  We
> + * save the value we read from the TCB in a non-volatile register so we can
> + * reuse it later when exiting from the syscall in PSEUDO_RET.  */

Florian already mentioned removing the leading '*' for proper formatting.

> +    .macro CHECK_SCV_SUPPORT REG JUMPFALSE
> +
> +    /* Check if thread pointer has already been setup */
> +    cmpdi r13,0
> +    beq \JUMPFALSE
> +
> +    /* Read PPC_FEATURE2_SCV from TCB and store it in REG */
> +    ld \REG,TCB_HWCAP(PT_THREAD_POINTER)
> +    andis. \REG,\REG,PPC_FEATURE2_SCV>>16
> +
> +    beq \JUMPFALSE
> +    .endm
> +
> +/* Before doing the syscall, check if we can use scv.  scv is supported by P9
> + * and later with Linux v5.9 and later.  If so, use it.  Otherwise, fallback to
> + * sc.  We use a non-volatile register to save hwcap2 from the TCB, so we need
> + * to save its content beforehand. */

Here, too. Also need one more space after '.'.

>  #define DO_CALL(syscall) \
> -    li 0,syscall; \
> +    li r0,syscall; \
> +    NVOLREG_SAVE; \
> +    CHECK_SCV_SUPPORT r31 0f; \
> +    DO_CALL_SCV; \
> +    b 1f; \
> +0:  DO_CALL_SC; \
> +1:
> +
> +/* DO_CALL_SC and DO_CALL_SCV expect the syscall number to be loaded on r0.  */

nit: s/loaded on/in/

rest looks OK.

With the comments fixed, LGTM. Fixing the nits is up to you.

PC

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] powerpc: Replace brk.S with a C implementation
  2020-11-18 14:47 ` [PATCH 1/4] powerpc: Replace brk.S with a C implementation Matheus Castanho
@ 2020-11-18 20:00   ` Adhemerval Zanella
  2020-11-19 17:37     ` Tulio Magno Quites Machado Filho
  2020-11-18 21:28   ` Andreas Schwab
  1 sibling, 1 reply; 20+ messages in thread
From: Adhemerval Zanella @ 2020-11-18 20:00 UTC (permalink / raw)
  To: libc-alpha



On 18/11/2020 11:47, Matheus Castanho via Libc-alpha wrote:
> From: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
> 
> There is no need to maintain a separate assembly implementation of brk for
> powerpc.  It basically does the same thing as the generic C-based version, so we
> can reuse that instead.

LGTM.  You also might want to check my brk consolidation [1], it
consolidate not only powerpc, but all implementations.

[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119770.html

> ---
>  .../linux/powerpc/{powerpc64/brk.S => brk.c}  | 41 +++++++--------
>  sysdeps/unix/sysv/linux/powerpc/dl-brk.S      |  1 -
>  .../unix/sysv/linux/powerpc/powerpc32/brk.S   | 52 -------------------
>  3 files changed, 18 insertions(+), 76 deletions(-)
>  rename sysdeps/unix/sysv/linux/powerpc/{powerpc64/brk.S => brk.c} (58%)
>  delete mode 100644 sysdeps/unix/sysv/linux/powerpc/dl-brk.S
>  delete mode 100644 sysdeps/unix/sysv/linux/powerpc/powerpc32/brk.S
> 
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/brk.S b/sysdeps/unix/sysv/linux/powerpc/brk.c
> similarity index 58%
> rename from sysdeps/unix/sysv/linux/powerpc/powerpc64/brk.S
> rename to sysdeps/unix/sysv/linux/powerpc/brk.c
> index f206909b72..af7d1d73a2 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/brk.S
> +++ b/sysdeps/unix/sysv/linux/powerpc/brk.c
> @@ -1,6 +1,6 @@
> -/* brk system call for Linux.  PowerPC64 version.
> -   Copyright (C) 1995-2020 Free Software Foundation, Inc.
> +/* Copyright (C) 2020 Free Software Foundation, Inc.
>     This file is part of the GNU C Library.
> +   Contributed by Chris Metcalf <cmetcalf@tilera.com>, 2011.

No new 'Contributed by' in newer code.

>  
>     The GNU C Library is free software; you can redistribute it and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -13,31 +13,26 @@
>     Lesser General Public License for more details.
>  
>     You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> +   License along with the GNU C Library.  If not, see
>     <https://www.gnu.org/licenses/>.  */
>  
> +#include <errno.h>
> +#include <unistd.h>
>  #include <sysdep.h>
> -#define _ERRNO_H	1
> -#include <bits/errno.h>
>  
> -	.comm	__curbrk,8,8
> -	.section	".toc","aw"
> -.LC__curbrk:
> -	.tc __curbrk[TC],__curbrk
> -	.section ".text"
> -ENTRY (__brk)
> -	CALL_MCOUNT 1
> +/* This must be initialized data because commons can't have aliases.  */
> +void *__curbrk = 0;
>  
> -	std	r3,-8(r1)
> -	DO_CALL(SYS_ify(brk))
> -	ld	r6,-8(r1)
> -	ld	r5,.LC__curbrk@toc(r2)
> -	std     r3,0(r5)
> -	cmpld   r6,r3
> -	li	r3,0
> -	blelr+
> -	li      r3,ENOMEM
> -	TAIL_CALL_SYSCALL_ERROR
> -END (__brk)
> +int
> +__brk (void *addr)
> +{
> +  __curbrk = (void *) INTERNAL_SYSCALL_CALL (brk, addr);
> +  if (__curbrk < addr)
> +    {
> +      __set_errno (ENOMEM);
> +      return -1;
> +    }
>  
> +  return 0;
> +}
>  weak_alias (__brk, brk)
> diff --git a/sysdeps/unix/sysv/linux/powerpc/dl-brk.S b/sysdeps/unix/sysv/linux/powerpc/dl-brk.S
> deleted file mode 100644
> index eeb96544e3..0000000000
> --- a/sysdeps/unix/sysv/linux/powerpc/dl-brk.S
> +++ /dev/null
> @@ -1 +0,0 @@
> -#include <brk.S>
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/brk.S b/sysdeps/unix/sysv/linux/powerpc/powerpc32/brk.S
> deleted file mode 100644
> index f3b960795e..0000000000
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/brk.S
> +++ /dev/null
> @@ -1,52 +0,0 @@
> -/* brk system call for Linux/ppc.
> -   Copyright (C) 1995-2020 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <sysdep.h>
> -#define _ERRNO_H	1
> -#include <bits/errno.h>
> -
> -	.comm	__curbrk,4,4
> -	.section ".text"
> -ENTRY (__brk)
> -	mflr	r0
> -	stwu    r1,-16(r1)
> -	cfi_adjust_cfa_offset (16)
> -	stw	r3,8(r1)
> -	stw	r0,20(r1)
> -	cfi_offset (lr, 4)
> -	DO_CALL(SYS_ify(brk))
> -	lwz     r6,8(r1)
> -#ifdef PIC
> -	SETUP_GOT_ACCESS(r5,got_label)
> -	addis	r5,r5,__curbrk-got_label@ha
> -	stw	r3,__curbrk-got_label@l(r5)
> -#else
> -	lis     r4,__curbrk@ha
> -	stw     r3,__curbrk@l(r4)
> -#endif
> -	lwz	r0,20(r1)
> -	cmplw   r6,r3
> -	addi    r1,r1,16
> -	mtlr	r0
> -	li	r3,0
> -	blelr+
> -	li      r3,ENOMEM
> -	b	__syscall_error@local
> -END (__brk)
> -
> -weak_alias (__brk, brk)
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] powerpc: Replace brk.S with a C implementation
  2020-11-18 14:47 ` [PATCH 1/4] powerpc: Replace brk.S with a C implementation Matheus Castanho
  2020-11-18 20:00   ` Adhemerval Zanella
@ 2020-11-18 21:28   ` Andreas Schwab
  1 sibling, 0 replies; 20+ messages in thread
From: Andreas Schwab @ 2020-11-18 21:28 UTC (permalink / raw)
  To: Matheus Castanho via Libc-alpha; +Cc: Matheus Castanho, tuliom

On Nov 18 2020, Matheus Castanho via Libc-alpha wrote:

> +/* This must be initialized data because commons can't have aliases.  */
> +void *__curbrk = 0;

This is not true, you are not adding any aliases.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] powerpc: Replace brk.S with a C implementation
  2020-11-18 20:00   ` Adhemerval Zanella
@ 2020-11-19 17:37     ` Tulio Magno Quites Machado Filho
  0 siblings, 0 replies; 20+ messages in thread
From: Tulio Magno Quites Machado Filho @ 2020-11-19 17:37 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha

Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org> writes:

> On 18/11/2020 11:47, Matheus Castanho via Libc-alpha wrote:
>> From: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
>> 
>> There is no need to maintain a separate assembly implementation of brk for
>> powerpc.  It basically does the same thing as the generic C-based version, so we
>> can reuse that instead.
>
> LGTM.  You also might want to check my brk consolidation [1], it
> consolidate not only powerpc, but all implementations.
>
> [1] https://sourceware.org/pipermail/libc-alpha/2020-November/119770.html

I think it's worth dropping this patch in favor of yours, Adhemerval.

-- 
Tulio Magno

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/4] powerpc: Make PT_THREAD_POINTER available to assembly code
  2020-11-18 14:47 ` [PATCH 2/4] powerpc: Make PT_THREAD_POINTER available to assembly code Matheus Castanho
@ 2020-11-19 19:58   ` Tulio Magno Quites Machado Filho
  2020-11-24 17:33     ` Matheus Castanho
  0 siblings, 1 reply; 20+ messages in thread
From: Tulio Magno Quites Machado Filho @ 2020-11-19 19:58 UTC (permalink / raw)
  To: Matheus Castanho, libc-alpha

Matheus Castanho via Libc-alpha <libc-alpha@sourceware.org> writes:

> PT_THREAD_POINTER is currenty defined inside a #ifndef __ASSEMBLER__ block, but
> its usage should not be limited to C code, as it can be useful when accessing
> the TLS from assembly code as well.

This patch LGTM.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>

Thanks!

-- 
Tulio Magno

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] powerpc: Runtime selection between sc and scv for syscalls
  2020-11-18 15:16   ` Florian Weimer
@ 2020-11-19 20:29     ` Matheus Castanho
  2020-11-19 20:35       ` Florian Weimer
  0 siblings, 1 reply; 20+ messages in thread
From: Matheus Castanho @ 2020-11-19 20:29 UTC (permalink / raw)
  To: Florian Weimer, Matheus Castanho via Libc-alpha; +Cc: tuliom



On 11/18/20 12:16 PM, Florian Weimer wrote:
> * Matheus Castanho via Libc-alpha:
> 
>> +/* Check PPC_FEATURE2_SCV bit from hwcap2 in the TCB and update CR0
>> + * accordingly.  First, we check if the thread pointer != 0, so we don't try to
>> + * access the TCB before it has been initialized, e.g. inside the dynamic
>> + * loader.  If it is already initialized, check if scv is available.  On both
>> + * negative cases, go to JUMPFALSE (label given by the macro's caller).  We
>> + * save the value we read from the TCB in a non-volatile register so we can
>> + * reuse it later when exiting from the syscall in PSEUDO_RET.  */
> 
> This comment style is not GNU (sorry).

Oops, fixed.

> 
> I think you can avoid the conditional check and replace it with #if
> IS_IN (rtld).  Then ld.so will use the old interface unconditionally,
> but that should be okay.
> 

That should work for shared libc, but in the static case we may also hit the same problem:
trying to access the TLS to read hwcap2 before it has been initialized, but this time in csu/libc-tls.c

Is there a way to also check if we are in the static startup code at compile time? If not, I'm afraid
I'll have to keep the check for the thread pointer.

Thanks,
Matheus Castanho

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] powerpc: Runtime selection between sc and scv for syscalls
  2020-11-18 19:00   ` Paul A. Clarke
@ 2020-11-19 20:34     ` Matheus Castanho
  0 siblings, 0 replies; 20+ messages in thread
From: Matheus Castanho @ 2020-11-19 20:34 UTC (permalink / raw)
  To: Paul A. Clarke; +Cc: libc-alpha, tuliom



On 11/18/20 4:00 PM, Paul A. Clarke wrote:
> On Wed, Nov 18, 2020 at 11:47:02AM -0300, Matheus Castanho via Libc-alpha wrote:
>> Linux kernel v5.9 added support for system calls using the scv
>> instruction for POWER9 and later.  The new codepath provides better
>> performance (see below) if compared to using sc.  For the
>> foreseeable future, both sc and scv mechanisms will co-exist, so this
>> patch enables glibc to do a runtime check and always use scv when it is
>> available.
> 
> nit: "always" is perhaps too strong here, as there are exceptions, as noted
> in your message further below.

Right. I'll change that on v2.

> 
>> Before issuing the system call to the kernel, we check hwcap2 in the TCB
>> for PPC_FEATURE2_SCV to see if scv is supported by the kernel.  If not,
>> we fallback to sc and keep the old behavior.
>>
>> The kernel implements a different error return convention for scv, so
>> when returning from a system call we need to handle the return value
>> differently depending on the instruction we used to enter the kernel.
>>
>> For syscalls implemented in ASM, entry and exit are implemented by
>> different macros (PSEUDO and PSEUDO_RET, resp.), which may be used in
>> sequence (e.g. for templated syscalls) or with other instructions in
>> between (e.g. clone).  To avoid accessing the TCB a second time on
>> PSEUDO_RET to check which instruction we used, the value read from
>> hwcap2 is cached on a non-volatile register.
>>
>> This is not needed when using INTERNAL_SYSCALL macro, since entry and
>> exit are bundled into the same inline asm directive.
>>
>> Since system calls may be called before the TCB has been setup (e.g.
>> inside the dynamic loader), we also check the value of the thread
>> pointer before effectively accessing the TCB.  For such situations in
>> which the availability of scv cannot be determined, sc is always used.
>>
>> Support for scv in syscalls implemented in their own ASM file (clone and
>> vfork) will be added later. For now simply use sc as before.
>>
>> Average performance over 1M calls for each syscall "type":
>>   - stat: C wrapper calling INTERNAL_SYSCALL
>>   - getpid: templated ASM syscall
>>   - syscall: call to gettid using syscall function
>>
>>   Standard:
>>      stat : 1.573445 us / ~3619 cycles
>>    getpid : 0.164986 us / ~379 cycles
>>   syscall : 0.162743 us / ~374 cycles
>>
>>   With scv:
>>      stat : 1.537049 us / ~3535 cycles <~ -84 cycles  / -2.32%
>>    getpid : 0.109923 us / ~253 cycles  <~ -126 cycles / -33.25%
>>   syscall : 0.116410 us / ~268 cycles  <~ -106 cycles / -28.34%
>>
>> Tested on powerpc, powerpc64, powerpc64le (with and without scv)
>> ---
>>  sysdeps/powerpc/powerpc32/sysdep.h            | 19 ++--
>>  sysdeps/powerpc/powerpc64/sysdep.h            | 90 ++++++++++++++++++-
>>  .../unix/sysv/linux/powerpc/powerpc64/clone.S |  9 +-
>>  .../unix/sysv/linux/powerpc/powerpc64/vfork.S |  6 +-
>>  sysdeps/unix/sysv/linux/powerpc/syscall.S     | 11 ++-
>>  sysdeps/unix/sysv/linux/powerpc/sysdep.h      | 78 +++++++++++-----
>>  6 files changed, 174 insertions(+), 39 deletions(-)
>>
>> diff --git a/sysdeps/powerpc/powerpc32/sysdep.h b/sysdeps/powerpc/powerpc32/sysdep.h
>> index 829eec266a..bff18bdc8b 100644
>> --- a/sysdeps/powerpc/powerpc32/sysdep.h
>> +++ b/sysdeps/powerpc/powerpc32/sysdep.h
>> @@ -90,9 +90,12 @@ GOT_LABEL:			;					      \
>>    cfi_endproc;								      \
>>    ASM_SIZE_DIRECTIVE(name)
>>
>> -#define DO_CALL(syscall)						      \
>> -    li 0,syscall;							      \
>> -    sc
>> +#define DO_CALL(syscall) \
>> +	li 0,syscall; \
>> +	DO_CALL_SC
> 
> nit: there are some innocuous whitespace changes which could be avoided to minimize
> the diff (moving the '\' closer and changing from spaces to a tab).
> 

Fixed.

>> +
>> +#define DO_CALL_SC \
>> +	sc
>>
>>  #undef JUMPTARGET
>>  #ifdef PIC
>> @@ -106,14 +109,20 @@ GOT_LABEL:			;					      \
>>  # define HIDDEN_JUMPTARGET(name) __GI_##name##@local
>>  #endif
>>
>> +#define TAIL_CALL_SYSCALL_ERROR \
>> +    b __syscall_error@local
>> +
>>  #define PSEUDO(name, syscall_name, args)				      \
>>    .section ".text";							      \
>>    ENTRY (name)								      \
>>      DO_CALL (SYS_ify (syscall_name));
>>
>> +#define RET_SC \
>> +    bnslr+;
>> +
>>  #define PSEUDO_RET							      \
>> -    bnslr+;								      \
>> -    b __syscall_error@local
>> +    RET_SC;								      \
>> +    TAIL_CALL_SYSCALL_ERROR
>>  #define ret PSEUDO_RET
>>
>>  #undef	PSEUDO_END
>> diff --git a/sysdeps/powerpc/powerpc64/sysdep.h b/sysdeps/powerpc/powerpc64/sysdep.h
>> index d557098898..2d7dde64da 100644
>> --- a/sysdeps/powerpc/powerpc64/sysdep.h
>> +++ b/sysdeps/powerpc/powerpc64/sysdep.h
>> @@ -17,6 +17,7 @@
>>     <https://www.gnu.org/licenses/>.  */
>>
>>  #include <sysdeps/powerpc/sysdep.h>
>> +#include <tls.h>
>>
>>  #ifdef __ASSEMBLER__
>>
>> @@ -263,10 +264,72 @@ LT_LABELSUFFIX(name,_name_end): ; \
>>    TRACEBACK_MASK(name,mask);	\
>>    END_2(name)
>>
>> +/* We will allocate a new frame to save LR and the non-volatile register used to
>> +   read the TCB when checking for scv support on syscall code.  We actually just
>> +   need the minimum frame size plus room for 1 reg (64 bits).  But the ABI
> 
> nit: Since everything is in bytes below, suggest changing "64 bits" to "8 bytes".

Fixed.

> 
>> +   mandates stack frames should be aligned at 16 Bytes, so we end up allocating
>> +   a bit more space then what will actually be used.  */
>> +#define SCV_FRAME_SIZE (FRAME_MIN_SIZE+16)
> 
> 8 for the register save area + 8 more to maintain 16-byte alignment.  OK.
> 
>> +#define SCV_FRAME_NVOLREG_SAVE FRAME_MIN_SIZE
>> +
>> +/* Allocate frame and save register */
>> +#define NVOLREG_SAVE \
>> +    stdu r1,-SCV_FRAME_SIZE(r1); \
>> +    std r31,SCV_FRAME_NVOLREG_SAVE(r1); \
>> +    cfi_adjust_cfa_offset(SCV_FRAME_SIZE);
>> +
>> +/* Restore register and destroy frame */
>> +#define NVOLREG_RESTORE	\
>> +    ld r31,SCV_FRAME_NVOLREG_SAVE(r1); \
>> +    addi r1,r1,SCV_FRAME_SIZE; \
>> +    cfi_adjust_cfa_offset(-SCV_FRAME_SIZE);
>> +
>> +/* Check PPC_FEATURE2_SCV bit from hwcap2 in the TCB and update CR0
>> + * accordingly.  First, we check if the thread pointer != 0, so we don't try to
>> + * access the TCB before it has been initialized, e.g. inside the dynamic
>> + * loader.  If it is already initialized, check if scv is available.  On both
>> + * negative cases, go to JUMPFALSE (label given by the macro's caller).  We
>> + * save the value we read from the TCB in a non-volatile register so we can
>> + * reuse it later when exiting from the syscall in PSEUDO_RET.  */
> 
> Florian already mentioned removing the leading '*' for proper formatting.

Fixed.

> 
>> +    .macro CHECK_SCV_SUPPORT REG JUMPFALSE
>> +
>> +    /* Check if thread pointer has already been setup */
>> +    cmpdi r13,0
>> +    beq \JUMPFALSE
>> +
>> +    /* Read PPC_FEATURE2_SCV from TCB and store it in REG */
>> +    ld \REG,TCB_HWCAP(PT_THREAD_POINTER)
>> +    andis. \REG,\REG,PPC_FEATURE2_SCV>>16
>> +
>> +    beq \JUMPFALSE
>> +    .endm
>> +
>> +/* Before doing the syscall, check if we can use scv.  scv is supported by P9
>> + * and later with Linux v5.9 and later.  If so, use it.  Otherwise, fallback to
>> + * sc.  We use a non-volatile register to save hwcap2 from the TCB, so we need
>> + * to save its content beforehand. */
> 
> Here, too. Also need one more space after '.'.

Fixed and fixed.

> 
>>  #define DO_CALL(syscall) \
>> -    li 0,syscall; \
>> +    li r0,syscall; \
>> +    NVOLREG_SAVE; \
>> +    CHECK_SCV_SUPPORT r31 0f; \
>> +    DO_CALL_SCV; \
>> +    b 1f; \
>> +0:  DO_CALL_SC; \
>> +1:
>> +
>> +/* DO_CALL_SC and DO_CALL_SCV expect the syscall number to be loaded on r0.  */
> 
> nit: s/loaded on/in/

Fixed too.

> 
> rest looks OK.
> 
> With the comments fixed, LGTM. Fixing the nits is up to you.
> 
> PC
> 

Queued changes for v2.

Thanks,
Matheus Castanho

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] powerpc: Runtime selection between sc and scv for syscalls
  2020-11-19 20:29     ` Matheus Castanho
@ 2020-11-19 20:35       ` Florian Weimer
  2020-11-23 18:00         ` Matheus Castanho
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Weimer @ 2020-11-19 20:35 UTC (permalink / raw)
  To: Matheus Castanho; +Cc: Matheus Castanho via Libc-alpha, tuliom

* Matheus Castanho:

> That should work for shared libc, but in the static case we may also
> hit the same problem: trying to access the TLS to read hwcap2 before
> it has been initialized, but this time in csu/libc-tls.c

Ahh.

> Is there a way to also check if we are in the static startup code at
> compile time? If not, I'm afraid I'll have to keep the check for the
> thread pointer.

Is the thread pointer in a regular register?  Then you could install a
fake TCB early on that has a zero bit in the right place.

The other option would be to always use the old interface for !SHARED.
Just saying. 8-)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] powerpc: Runtime selection between sc and scv for syscalls
  2020-11-19 20:35       ` Florian Weimer
@ 2020-11-23 18:00         ` Matheus Castanho
  2020-12-01 12:50           ` Matheus Castanho
  2020-12-01 13:11           ` Florian Weimer
  0 siblings, 2 replies; 20+ messages in thread
From: Matheus Castanho @ 2020-11-23 18:00 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Matheus Castanho via Libc-alpha, tuliom

Hi Florian,

On 11/19/20 5:35 PM, Florian Weimer wrote:
> * Matheus Castanho:
> 
>> That should work for shared libc, but in the static case we may also
>> hit the same problem: trying to access the TLS to read hwcap2 before
>> it has been initialized, but this time in csu/libc-tls.c
> 
> Ahh.
> 
>> Is there a way to also check if we are in the static startup code at
>> compile time? If not, I'm afraid I'll have to keep the check for the
>> thread pointer.
> 
> Is the thread pointer in a regular register?  Then you could install a
> fake TCB early on that has a zero bit in the right place.
> 
> The other option would be to always use the old interface for !SHARED.
> Just saying. 8-)
> 

I believe adding the fake TCB is a bit out of the scope of this patch, and I'd
also prefer to keep the same behavior for both static and shared libcs, so we
avoid surprises in the future.

Do you see this as a blocker for merging this patch?

Thanks,
Matheus Castanho

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/4] powerpc: Make PT_THREAD_POINTER available to assembly code
  2020-11-19 19:58   ` Tulio Magno Quites Machado Filho
@ 2020-11-24 17:33     ` Matheus Castanho
  0 siblings, 0 replies; 20+ messages in thread
From: Matheus Castanho @ 2020-11-24 17:33 UTC (permalink / raw)
  To: Tulio Magno Quites Machado Filho, libc-alpha



On 11/19/20 4:58 PM, Tulio Magno Quites Machado Filho wrote:
> Matheus Castanho via Libc-alpha <libc-alpha@sourceware.org> writes:
> 
>> PT_THREAD_POINTER is currenty defined inside a #ifndef __ASSEMBLER__ block, but
>> its usage should not be limited to C code, as it can be useful when accessing
>> the TLS from assembly code as well.
> 
> This patch LGTM.
> 
> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
> 
> Thanks!
> 

Pushed as 1e0a7fd0997ad5454d3fee480ceb392c4b49c064

Thanks,
Matheus Castanho

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] powerpc: Runtime selection between sc and scv for syscalls
  2020-11-23 18:00         ` Matheus Castanho
@ 2020-12-01 12:50           ` Matheus Castanho
  2020-12-01 13:11           ` Florian Weimer
  1 sibling, 0 replies; 20+ messages in thread
From: Matheus Castanho @ 2020-12-01 12:50 UTC (permalink / raw)
  To: Florian Weimer; +Cc: tuliom, Matheus Castanho via Libc-alpha

Gentle ping.

On 11/23/20 3:00 PM, Matheus Castanho via Libc-alpha wrote:
> Hi Florian,
> 
> On 11/19/20 5:35 PM, Florian Weimer wrote:
>> * Matheus Castanho:
>>
>>> That should work for shared libc, but in the static case we may also
>>> hit the same problem: trying to access the TLS to read hwcap2 before
>>> it has been initialized, but this time in csu/libc-tls.c
>>
>> Ahh.
>>
>>> Is there a way to also check if we are in the static startup code at
>>> compile time? If not, I'm afraid I'll have to keep the check for the
>>> thread pointer.
>>
>> Is the thread pointer in a regular register?  Then you could install a
>> fake TCB early on that has a zero bit in the right place.
>>
>> The other option would be to always use the old interface for !SHARED.
>> Just saying. 8-)
>>
> 
> I believe adding the fake TCB is a bit out of the scope of this patch, and I'd
> also prefer to keep the same behavior for both static and shared libcs, so we
> avoid surprises in the future.
> 
> Do you see this as a blocker for merging this patch?

Florian, any thoughts?

Thanks,
Matheus Castanho

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] powerpc: Runtime selection between sc and scv for syscalls
  2020-11-23 18:00         ` Matheus Castanho
  2020-12-01 12:50           ` Matheus Castanho
@ 2020-12-01 13:11           ` Florian Weimer
  2020-12-01 13:35             ` Adhemerval Zanella
  1 sibling, 1 reply; 20+ messages in thread
From: Florian Weimer @ 2020-12-01 13:11 UTC (permalink / raw)
  To: Matheus Castanho via Libc-alpha; +Cc: Matheus Castanho, tuliom

* Matheus Castanho via Libc-alpha:

> Hi Florian,
>
> On 11/19/20 5:35 PM, Florian Weimer wrote:
>> * Matheus Castanho:
>> 
>>> That should work for shared libc, but in the static case we may also
>>> hit the same problem: trying to access the TLS to read hwcap2 before
>>> it has been initialized, but this time in csu/libc-tls.c
>> 
>> Ahh.
>> 
>>> Is there a way to also check if we are in the static startup code at
>>> compile time? If not, I'm afraid I'll have to keep the check for the
>>> thread pointer.
>> 
>> Is the thread pointer in a regular register?  Then you could install a
>> fake TCB early on that has a zero bit in the right place.
>> 
>> The other option would be to always use the old interface for !SHARED.
>> Just saying. 8-)
>> 
>
> I believe adding the fake TCB is a bit out of the scope of this patch, and I'd
> also prefer to keep the same behavior for both static and shared libcs, so we
> avoid surprises in the future.
>
> Do you see this as a blocker for merging this patch?

I think the run-time check in the shared builds is unnecessary.  I don't
have an opinion on the static case, but I think ld.so should use the
legacy interface unconditionally, and shared libc.so should use a
dynamic check while assuming that the TCB is initialized.  If we can
avoid making things harder for the branch predict, we should do so.

In the end, it's your port though, and I don't have a strong opinion
here.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] powerpc: Runtime selection between sc and scv for syscalls
  2020-12-01 13:11           ` Florian Weimer
@ 2020-12-01 13:35             ` Adhemerval Zanella
  2020-12-03 17:19               ` Matheus Castanho
  0 siblings, 1 reply; 20+ messages in thread
From: Adhemerval Zanella @ 2020-12-01 13:35 UTC (permalink / raw)
  To: libc-alpha



On 01/12/2020 10:11, Florian Weimer via Libc-alpha wrote:
> * Matheus Castanho via Libc-alpha:
> 
>> Hi Florian,
>>
>> On 11/19/20 5:35 PM, Florian Weimer wrote:
>>> * Matheus Castanho:
>>>
>>>> That should work for shared libc, but in the static case we may also
>>>> hit the same problem: trying to access the TLS to read hwcap2 before
>>>> it has been initialized, but this time in csu/libc-tls.c
>>>
>>> Ahh.
>>>
>>>> Is there a way to also check if we are in the static startup code at
>>>> compile time? If not, I'm afraid I'll have to keep the check for the
>>>> thread pointer.
>>>
>>> Is the thread pointer in a regular register?  Then you could install a
>>> fake TCB early on that has a zero bit in the right place.
>>>
>>> The other option would be to always use the old interface for !SHARED.
>>> Just saying. 8-)
>>>
>>
>> I believe adding the fake TCB is a bit out of the scope of this patch, and I'd
>> also prefer to keep the same behavior for both static and shared libcs, so we
>> avoid surprises in the future.
>>
>> Do you see this as a blocker for merging this patch?
> 
> I think the run-time check in the shared builds is unnecessary.  I don't
> have an opinion on the static case, but I think ld.so should use the
> legacy interface unconditionally, and shared libc.so should use a
> dynamic check while assuming that the TCB is initialized.  If we can
> avoid making things harder for the branch predict, we should do so.
I agree, for static case it might be more complicate to disentangle the
loader code so the check might be required.

> 
> In the end, it's your port though, and I don't have a strong opinion
> here.
> 
> Thanks,
> Florian
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] powerpc: Runtime selection between sc and scv for syscalls
  2020-12-01 13:35             ` Adhemerval Zanella
@ 2020-12-03 17:19               ` Matheus Castanho
  0 siblings, 0 replies; 20+ messages in thread
From: Matheus Castanho @ 2020-12-03 17:19 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha; +Cc: Florian Weimer



On 12/1/20 10:35 AM, Adhemerval Zanella via Libc-alpha wrote:
> 
> 
> On 01/12/2020 10:11, Florian Weimer via Libc-alpha wrote:
>> * Matheus Castanho via Libc-alpha:
>>
>>> Hi Florian,
>>>
>>> On 11/19/20 5:35 PM, Florian Weimer wrote:
>>>> * Matheus Castanho:
>>>>
>>>>> That should work for shared libc, but in the static case we may also
>>>>> hit the same problem: trying to access the TLS to read hwcap2 before
>>>>> it has been initialized, but this time in csu/libc-tls.c
>>>>
>>>> Ahh.
>>>>
>>>>> Is there a way to also check if we are in the static startup code at
>>>>> compile time? If not, I'm afraid I'll have to keep the check for the
>>>>> thread pointer.
>>>>
>>>> Is the thread pointer in a regular register?  Then you could install a
>>>> fake TCB early on that has a zero bit in the right place.
>>>>
>>>> The other option would be to always use the old interface for !SHARED.
>>>> Just saying. 8-)
>>>>
>>>
>>> I believe adding the fake TCB is a bit out of the scope of this patch, and I'd
>>> also prefer to keep the same behavior for both static and shared libcs, so we
>>> avoid surprises in the future.
>>>
>>> Do you see this as a blocker for merging this patch?
>>
>> I think the run-time check in the shared builds is unnecessary.  I don't
>> have an opinion on the static case, but I think ld.so should use the
>> legacy interface unconditionally, and shared libc.so should use a
>> dynamic check while assuming that the TCB is initialized.  If we can
>> avoid making things harder for the branch predict, we should do so.
> I agree, for static case it might be more complicate to disentangle the
> loader code so the check might be required.
> 

[snip]

Ok, thanks for the feedback. I believe I was able to address this in v2 [0].

[0] https://sourceware.org/pipermail/libc-alpha/2020-December/120353.html

Thanks,
Matheus Castanho

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-12-03 17:19 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-18 14:46 [PATCH 0/4] powerpc: Add support for system call vectored Matheus Castanho
2020-11-18 14:47 ` [PATCH 1/4] powerpc: Replace brk.S with a C implementation Matheus Castanho
2020-11-18 20:00   ` Adhemerval Zanella
2020-11-19 17:37     ` Tulio Magno Quites Machado Filho
2020-11-18 21:28   ` Andreas Schwab
2020-11-18 14:47 ` [PATCH 2/4] powerpc: Make PT_THREAD_POINTER available to assembly code Matheus Castanho
2020-11-19 19:58   ` Tulio Magno Quites Machado Filho
2020-11-24 17:33     ` Matheus Castanho
2020-11-18 14:47 ` [PATCH 3/4] powerpc: Runtime selection between sc and scv for syscalls Matheus Castanho
2020-11-18 15:16   ` Florian Weimer
2020-11-19 20:29     ` Matheus Castanho
2020-11-19 20:35       ` Florian Weimer
2020-11-23 18:00         ` Matheus Castanho
2020-12-01 12:50           ` Matheus Castanho
2020-12-01 13:11           ` Florian Weimer
2020-12-01 13:35             ` Adhemerval Zanella
2020-12-03 17:19               ` Matheus Castanho
2020-11-18 19:00   ` Paul A. Clarke
2020-11-19 20:34     ` Matheus Castanho
2020-11-18 14:47 ` [PATCH 4/4] powerpc: Use scv instruction on clone when available Matheus Castanho

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).