public inbox for libc-alpha@sourceware.org
* [PATCH 0/4] Simplify internal single-threaded usage
@ 2022-06-08 16:49 Adhemerval Zanella
  2022-06-08 16:49 ` [PATCH 1/4] misc: Optimize internal usage of __libc_single_threaded Adhemerval Zanella
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Adhemerval Zanella @ 2022-06-08 16:49 UTC (permalink / raw)
  To: libc-alpha, Wilco Dijkstra

Glibc currently has three different internal ways to check if a process
is single-threaded: the exported global variable __libc_single_threaded,
the internal-only __libc_multiple_threads, and the variant used by some
architectures and allocated on TCB, multiple_threads.  Also each port
can define SINGLE_THREAD_BY_GLOBAL to either use __libc_multiple_threads
or multiple_threads.

__libc_single_threaded and __libc_multiple_threads have essentially the
same semantics: both are global variables whose value is not reset
if/when the process becomes single-threaded again.  The issue with using
__libc_single_threaded internally is that it can be accessed through a
copy relocation, so both copies must be updated.  This is fixed in the
first patch.
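
As a rough illustration of keeping two copies in sync, here is a minimal
sketch.  All names are hypothetical stand-ins rather than the glibc
symbols, and the external address is passed in by hand, whereas the
patch looks it up with dlsym.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the glibc symbols: the copy libc reads
   internally and the copy an executable ends up with after a copy
   relocation.  */
static char demo_internal_single_threaded = 1;
static char demo_executable_copy = 1;

/* Pointer to the copy that code outside libc reads.  */
static char *demo_external_single_threaded;

/* Done once at startup.  The patch resolves the address with dlsym;
   here it is simply passed in.  */
static void
demo_record_external_copy (char *addr)
{
  assert (addr != NULL);
  demo_external_single_threaded = addr;
  *demo_external_single_threaded = demo_internal_single_threaded;
}

/* Called when the first extra thread is created: both copies change.  */
static void
demo_mark_multi_threaded (void)
{
  demo_internal_single_threaded = 0;
  *demo_external_single_threaded = 0;
}
```

The point of the pointer is that internal readers keep a direct access
to the internal variable, while the less frequent writes pay for the
extra store.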

The second patch replaces __libc_multiple_threads with
__libc_single_threaded, while also fixing a bug where architectures that
define SINGLE_THREAD_BY_GLOBAL did not actually enable the optimization.
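
The ordering problem behind that bug can be sketched with the
preprocessor alone.  The DEMO_ names below are hypothetical and only
mimic the shape of the headers: a conditional that is evaluated before
the port's define is seen silently takes the default branch.

```c
#include <assert.h>

/* Stand-in for the shared header, reached first through another
   include: the conditional is evaluated here.  */
#ifdef DEMO_SINGLE_THREAD_BY_GLOBAL
# define DEMO_CHECK_USES_GLOBAL 1
#else
# define DEMO_CHECK_USES_GLOBAL 0
#endif

/* Stand-in for the port header: the define arrives too late to
   influence the conditional above.  */
#define DEMO_SINGLE_THREAD_BY_GLOBAL 1

static_assert (DEMO_CHECK_USES_GLOBAL == 0,
	       "the late define has no effect");
```

Placing the define in a small per-arch header that then pulls in the
shared one guarantees the define is visible first.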

The third patch replaces multiple_threads with __libc_single_threaded,
to simplify a possible single-thread lock optimization.  On most
architectures, accessing an internal global variable should be as fast
as going through the TCB (it seems that using the TCB would be faster
only on legacy ABIs that need an extra code sequence to materialize
global accesses, such as i686 and sparc).

i686 seems to be the only architecture that optimizes the lock access
directly by reimplementing the atomic operations.  In this case some
are rewritten using compiler builtins along with the SINGLE_THREAD_P
macro, while other unused macros are just removed (for instance
atomic_add_zero).  The idea is to phase out this specific atomic
implementation in favor of compiler builtins and move the single-thread
optimization to be arch-neutral.
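
A minimal sketch of what such an arch-neutral single-thread fast path
could look like.  It assumes a GCC-compatible compiler; the demo_ names
are hypothetical, and __atomic_fetch_add is used here only for
illustration (the third patch itself mentions the older __sync
builtins for i686).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for SINGLE_THREAD_P; not the glibc macro.  */
static bool demo_single_thread_p = true;

/* Add VALUE to *MEM and return the previous value.  The atomic builtin
   is used only once more than one thread may exist.  */
static int
demo_fetch_add (int *mem, int value)
{
  assert (mem != NULL);
  if (demo_single_thread_p)
    {
      /* Plain read-modify-write: no other thread can observe it.  */
      int old = *mem;
      *mem = old + value;
      return old;
    }
  return __atomic_fetch_add (mem, value, __ATOMIC_ACQ_REL);
}
```

Both branches return the same value, so callers do not need to know
which path was taken.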

The last patch just removes the single-thread.h header and moves the
definition to the internal sys/single_threaded.h, so now there is only
one place to add such optimizations.

Adhemerval Zanella (4):
  misc: Optimize internal usage of __libc_single_threaded
  Replace __libc_multiple_threads with __libc_single_threaded
  Remove usage of TLS_MULTIPLE_THREADS_IN_TCB
  Remove single-thread.h

 dlfcn/dlsym.c                               |   1 +
 elf/libc_early_init.c                       |   9 +
 include/dlfcn.h                             |   4 +
 include/sys/single_threaded.h               |  20 +-
 misc/single_threaded.c                      |   2 +
 misc/tst-atomic.c                           |   1 +
 nptl/Makefile                               |   1 -
 nptl/allocatestack.c                        |  12 -
 nptl/descr.h                                |  17 +-
 nptl/libc_multiple_threads.c                |  28 --
 nptl/pthread_cancel.c                       |   9 +-
 nptl/pthread_create.c                       |  11 +-
 sysdeps/generic/single-thread.h             |  25 --
 sysdeps/i386/htl/tcb-offsets.sym            |   1 -
 sysdeps/i386/nptl/tcb-offsets.sym           |   1 -
 sysdeps/i386/nptl/tls.h                     |   4 +-
 sysdeps/ia64/nptl/tcb-offsets.sym           |   1 -
 sysdeps/ia64/nptl/tls.h                     |   2 -
 sysdeps/mach/hurd/i386/tls.h                |   4 +-
 sysdeps/mach/hurd/sysdep-cancel.h           |   5 -
 sysdeps/nios2/nptl/tcb-offsets.sym          |   1 -
 sysdeps/or1k/nptl/tls.h                     |   2 -
 sysdeps/powerpc/nptl/tcb-offsets.sym        |   3 -
 sysdeps/powerpc/nptl/tls.h                  |   3 -
 sysdeps/s390/nptl/tcb-offsets.sym           |   1 -
 sysdeps/s390/nptl/tls.h                     |   6 +-
 sysdeps/sh/nptl/tcb-offsets.sym             |   1 -
 sysdeps/sh/nptl/tls.h                       |   2 -
 sysdeps/sparc/nptl/tcb-offsets.sym          |   1 -
 sysdeps/sparc/nptl/tls.h                    |   2 +-
 sysdeps/unix/sysdep.h                       |   2 +-
 sysdeps/unix/sysv/linux/aarch64/sysdep.h    |   2 -
 sysdeps/unix/sysv/linux/alpha/sysdep.h      |   2 -
 sysdeps/unix/sysv/linux/arc/sysdep.h        |   2 -
 sysdeps/unix/sysv/linux/arm/sysdep.h        |   2 -
 sysdeps/unix/sysv/linux/hppa/sysdep.h       |   2 -
 sysdeps/unix/sysv/linux/microblaze/sysdep.h |   2 -
 sysdeps/unix/sysv/linux/s390/sysdep.h       |   3 -
 sysdeps/unix/sysv/linux/single-thread.h     |  44 ---
 sysdeps/unix/sysv/linux/x86_64/sysdep.h     |   2 -
 sysdeps/x86/atomic-machine.h                | 327 +++-----------------
 sysdeps/x86_64/nptl/tcb-offsets.sym         |   1 -
 42 files changed, 96 insertions(+), 475 deletions(-)
 delete mode 100644 nptl/libc_multiple_threads.c
 delete mode 100644 sysdeps/generic/single-thread.h
 delete mode 100644 sysdeps/unix/sysv/linux/single-thread.h

-- 
2.34.1



* [PATCH 1/4] misc: Optimize internal usage of __libc_single_threaded
  2022-06-08 16:49 [PATCH 0/4] Simplify internal single-threaded usage Adhemerval Zanella
@ 2022-06-08 16:49 ` Adhemerval Zanella
  2022-06-08 17:44   ` Florian Weimer
  2022-06-08 16:49 ` [PATCH 2/4] Replace __libc_multiple_threads with __libc_single_threaded Adhemerval Zanella
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Adhemerval Zanella @ 2022-06-08 16:49 UTC (permalink / raw)
  To: libc-alpha, Wilco Dijkstra

This avoids a GOT indirection for internal usages.  On some
architectures, __libc_single_threaded can be accessed through copy
relocations, so both copies must be updated; the external copy is
found with dlsym.

Checked on x86_64-linux-gnu and i686-linux-gnu.
---
 dlfcn/dlsym.c                 |  1 +
 elf/libc_early_init.c         |  9 +++++++++
 include/dlfcn.h               |  4 ++++
 include/sys/single_threaded.h | 11 +++++++++++
 misc/single_threaded.c        |  2 ++
 nptl/pthread_create.c         |  6 +++++-
 6 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/dlfcn/dlsym.c b/dlfcn/dlsym.c
index 2e9ff98e79..43c7ee8c4d 100644
--- a/dlfcn/dlsym.c
+++ b/dlfcn/dlsym.c
@@ -88,3 +88,4 @@ ___dlsym (void *handle, const char *name)
 }
 weak_alias (___dlsym, dlsym)
 #endif /* !SHARED */
+libc_hidden_def (___dlsym)
diff --git a/elf/libc_early_init.c b/elf/libc_early_init.c
index 3c4a19cf6b..18966900c4 100644
--- a/elf/libc_early_init.c
+++ b/elf/libc_early_init.c
@@ -16,7 +16,9 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
+#include <assert.h>
 #include <ctype.h>
+#include <dlfcn.h>
 #include <elision-conf.h>
 #include <libc-early-init.h>
 #include <libc-internal.h>
@@ -38,6 +40,13 @@ __libc_early_init (_Bool initial)
   __libc_single_threaded = initial;
 
 #ifdef SHARED
+  /* __libc_single_threaded can be accessed through copy relocations, so
+     the external copy must be updated as well.  */
+  __libc_external_single_threaded = ___dlsym (RTLD_DEFAULT,
+					      "__libc_single_threaded");
+  assert (__libc_external_single_threaded != NULL);
+  *__libc_external_single_threaded = initial;
+
   __libc_initial = initial;
 #endif
 
diff --git a/include/dlfcn.h b/include/dlfcn.h
index ae25f05303..95b8756770 100644
--- a/include/dlfcn.h
+++ b/include/dlfcn.h
@@ -135,5 +135,9 @@ extern int __dladdr1 (const void *address, Dl_info *info,
 extern int __dlinfo (void *handle, int request, void *arg);
 extern char *__dlerror (void);
 
+/* Internal interfaces to avoid intra-PLT calls.  */
+extern __typeof (dlsym) ___dlsym;
+libc_hidden_proto (___dlsym);
+
 #endif
 #endif
diff --git a/include/sys/single_threaded.h b/include/sys/single_threaded.h
index 18f6972482..258b01e0b2 100644
--- a/include/sys/single_threaded.h
+++ b/include/sys/single_threaded.h
@@ -1 +1,12 @@
 #include <misc/sys/single_threaded.h>
+
+#ifndef _ISOMAC
+
+libc_hidden_proto (__libc_single_threaded);
+
+# ifdef SHARED
+extern __typeof (__libc_single_threaded) *__libc_external_single_threaded
+  attribute_hidden;
+# endif
+
+#endif
diff --git a/misc/single_threaded.c b/misc/single_threaded.c
index 96ada9137b..201d86a273 100644
--- a/misc/single_threaded.c
+++ b/misc/single_threaded.c
@@ -22,6 +22,8 @@
    __libc_early_init (as false for inner libcs).  */
 #ifdef SHARED
 char __libc_single_threaded;
+__typeof (__libc_single_threaded) *__libc_external_single_threaded;
 #else
 char __libc_single_threaded = 1;
 #endif
+libc_hidden_data_def (__libc_single_threaded)
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index e7a099acb7..5633d01c62 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -627,7 +627,11 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
   if (__libc_single_threaded)
     {
       late_init ();
-      __libc_single_threaded = 0;
+      __libc_single_threaded =
+#ifdef SHARED
+        *__libc_external_single_threaded =
+#endif
+	0;
     }
 
   const struct pthread_attr *iattr = (struct pthread_attr *) attr;
-- 
2.34.1



* [PATCH 2/4] Replace __libc_multiple_threads with __libc_single_threaded
  2022-06-08 16:49 [PATCH 0/4] Simplify internal single-threaded usage Adhemerval Zanella
  2022-06-08 16:49 ` [PATCH 1/4] misc: Optimize internal usage of __libc_single_threaded Adhemerval Zanella
@ 2022-06-08 16:49 ` Adhemerval Zanella
  2022-06-08 16:49 ` [PATCH 3/4] Remove usage of TLS_MULTIPLE_THREADS_IN_TCB Adhemerval Zanella
  2022-06-08 16:49 ` [PATCH 4/4] Remove single-thread.h Adhemerval Zanella
  3 siblings, 0 replies; 10+ messages in thread
From: Adhemerval Zanella @ 2022-06-08 16:49 UTC (permalink / raw)
  To: libc-alpha, Wilco Dijkstra

This also fixes the SINGLE_THREAD_P macro for SINGLE_THREAD_BY_GLOBAL:
single-thread.h was included in the wrong order, since the define
needs to come before including sysdeps/unix/sysdep.h.  The macro
is now moved to a per-arch single-thread.h header.
---
 nptl/Makefile                                 |  1 -
 nptl/allocatestack.c                          |  6 ----
 nptl/libc_multiple_threads.c                  | 28 -------------------
 nptl/pthread_cancel.c                         |  2 +-
 .../unix/sysv/linux/aarch64/single-thread.h   |  2 ++
 sysdeps/unix/sysv/linux/aarch64/sysdep.h      |  2 --
 sysdeps/unix/sysv/linux/alpha/sysdep.h        |  2 --
 sysdeps/unix/sysv/linux/arc/single-thread.h   |  2 ++
 sysdeps/unix/sysv/linux/arc/sysdep.h          |  2 --
 sysdeps/unix/sysv/linux/arm/single-thread.h   |  2 ++
 sysdeps/unix/sysv/linux/arm/sysdep.h          |  2 --
 sysdeps/unix/sysv/linux/hppa/single-thread.h  |  2 ++
 sysdeps/unix/sysv/linux/hppa/sysdep.h         |  2 --
 .../sysv/linux/microblaze/single-thread.h     |  2 ++
 sysdeps/unix/sysv/linux/microblaze/sysdep.h   |  2 --
 sysdeps/unix/sysv/linux/s390/single-thread.h  |  2 ++
 sysdeps/unix/sysv/linux/s390/sysdep.h         |  3 --
 sysdeps/unix/sysv/linux/single-thread.h       | 11 ++++----
 .../unix/sysv/linux/x86_64/single-thread.h    |  2 ++
 sysdeps/unix/sysv/linux/x86_64/sysdep.h       |  2 --
 20 files changed, 20 insertions(+), 59 deletions(-)
 delete mode 100644 nptl/libc_multiple_threads.c
 create mode 100644 sysdeps/unix/sysv/linux/aarch64/single-thread.h
 create mode 100644 sysdeps/unix/sysv/linux/arc/single-thread.h
 create mode 100644 sysdeps/unix/sysv/linux/arm/single-thread.h
 create mode 100644 sysdeps/unix/sysv/linux/hppa/single-thread.h
 create mode 100644 sysdeps/unix/sysv/linux/microblaze/single-thread.h
 create mode 100644 sysdeps/unix/sysv/linux/s390/single-thread.h
 create mode 100644 sysdeps/unix/sysv/linux/x86_64/single-thread.h

diff --git a/nptl/Makefile b/nptl/Makefile
index b585663974..3d2ce8af8a 100644
--- a/nptl/Makefile
+++ b/nptl/Makefile
@@ -50,7 +50,6 @@ routines = \
   events \
   futex-internal \
   libc-cleanup \
-  libc_multiple_threads \
   lowlevellock \
   nptl-stack \
   nptl_deallocate_tsd \
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 01a282f3f6..98f5f6dd85 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -292,9 +292,6 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
 
       /* This is at least the second thread.  */
       pd->header.multiple_threads = 1;
-#ifndef TLS_MULTIPLE_THREADS_IN_TCB
-      __libc_multiple_threads = 1;
-#endif
 
 #ifdef NEED_DL_SYSINFO
       SETUP_THREAD_SYSINFO (pd);
@@ -413,9 +410,6 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
 
 	  /* This is at least the second thread.  */
 	  pd->header.multiple_threads = 1;
-#ifndef TLS_MULTIPLE_THREADS_IN_TCB
-	  __libc_multiple_threads = 1;
-#endif
 
 #ifdef NEED_DL_SYSINFO
 	  SETUP_THREAD_SYSINFO (pd);
diff --git a/nptl/libc_multiple_threads.c b/nptl/libc_multiple_threads.c
deleted file mode 100644
index 0c2dc33d0d..0000000000
--- a/nptl/libc_multiple_threads.c
+++ /dev/null
@@ -1,28 +0,0 @@
-/* Copyright (C) 2002-2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <pthreadP.h>
-
-#if IS_IN (libc)
-# ifndef TLS_MULTIPLE_THREADS_IN_TCB
-/* Variable set to a nonzero value either if more than one thread runs or ran,
-   or if a single-threaded process is trying to cancel itself.  See
-   nptl/descr.h for more context on the single-threaded process case.  */
-int __libc_multiple_threads;
-libc_hidden_data_def (__libc_multiple_threads)
-# endif
-#endif
diff --git a/nptl/pthread_cancel.c b/nptl/pthread_cancel.c
index e67b2df5cc..e1735279f2 100644
--- a/nptl/pthread_cancel.c
+++ b/nptl/pthread_cancel.c
@@ -161,7 +161,7 @@ __pthread_cancel (pthread_t th)
 	   points get executed.  */
 	THREAD_SETMEM (THREAD_SELF, header.multiple_threads, 1);
 #ifndef TLS_MULTIPLE_THREADS_IN_TCB
-      __libc_multiple_threads = 1;
+	__libc_single_threaded = 0;
 #endif
     }
   while (!atomic_compare_exchange_weak_acquire (&pd->cancelhandling, &oldval,
diff --git a/sysdeps/unix/sysv/linux/aarch64/single-thread.h b/sysdeps/unix/sysv/linux/aarch64/single-thread.h
new file mode 100644
index 0000000000..a5d3a2aaf4
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/aarch64/single-thread.h
@@ -0,0 +1,2 @@
+#define SINGLE_THREAD_BY_GLOBAL
+#include_next <single-thread.h>
diff --git a/sysdeps/unix/sysv/linux/aarch64/sysdep.h b/sysdeps/unix/sysv/linux/aarch64/sysdep.h
index 3b230dccf1..f1853e012f 100644
--- a/sysdeps/unix/sysv/linux/aarch64/sysdep.h
+++ b/sysdeps/unix/sysv/linux/aarch64/sysdep.h
@@ -164,8 +164,6 @@
 # define HAVE_CLOCK_GETTIME64_VSYSCALL	"__kernel_clock_gettime"
 # define HAVE_GETTIMEOFDAY_VSYSCALL	"__kernel_gettimeofday"
 
-# define SINGLE_THREAD_BY_GLOBAL		1
-
 # undef INTERNAL_SYSCALL_RAW
 # define INTERNAL_SYSCALL_RAW(name, nr, args...)		\
   ({ long _sys_result;						\
diff --git a/sysdeps/unix/sysv/linux/alpha/sysdep.h b/sysdeps/unix/sysv/linux/alpha/sysdep.h
index 3051a744b4..77ec2b5400 100644
--- a/sysdeps/unix/sysv/linux/alpha/sysdep.h
+++ b/sysdeps/unix/sysv/linux/alpha/sysdep.h
@@ -32,8 +32,6 @@
 #undef SYS_ify
 #define SYS_ify(syscall_name)	__NR_##syscall_name
 
-#define SINGLE_THREAD_BY_GLOBAL 1
-
 #ifdef __ASSEMBLER__
 #include <asm/pal.h>
 #include <alpha/regdef.h>
diff --git a/sysdeps/unix/sysv/linux/arc/single-thread.h b/sysdeps/unix/sysv/linux/arc/single-thread.h
new file mode 100644
index 0000000000..a5d3a2aaf4
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/arc/single-thread.h
@@ -0,0 +1,2 @@
+#define SINGLE_THREAD_BY_GLOBAL
+#include_next <single-thread.h>
diff --git a/sysdeps/unix/sysv/linux/arc/sysdep.h b/sysdeps/unix/sysv/linux/arc/sysdep.h
index 29b0e0161c..d0c1a78381 100644
--- a/sysdeps/unix/sysv/linux/arc/sysdep.h
+++ b/sysdeps/unix/sysv/linux/arc/sysdep.h
@@ -132,8 +132,6 @@ L (call_syscall_err):			ASM_LINE_SEP	\
 
 #else  /* !__ASSEMBLER__ */
 
-# define SINGLE_THREAD_BY_GLOBAL		1
-
 # if IS_IN (libc)
 extern long int __syscall_error (long int);
 hidden_proto (__syscall_error)
diff --git a/sysdeps/unix/sysv/linux/arm/single-thread.h b/sysdeps/unix/sysv/linux/arm/single-thread.h
new file mode 100644
index 0000000000..a5d3a2aaf4
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/arm/single-thread.h
@@ -0,0 +1,2 @@
+#define SINGLE_THREAD_BY_GLOBAL
+#include_next <single-thread.h>
diff --git a/sysdeps/unix/sysv/linux/arm/sysdep.h b/sysdeps/unix/sysv/linux/arm/sysdep.h
index 7bdd218063..1f270b961e 100644
--- a/sysdeps/unix/sysv/linux/arm/sysdep.h
+++ b/sysdeps/unix/sysv/linux/arm/sysdep.h
@@ -408,8 +408,6 @@ __local_syscall_error:						\
 #define INTERNAL_SYSCALL_NCS(number, nr, args...)              \
   INTERNAL_SYSCALL_RAW (number, nr, args)
 
-#define SINGLE_THREAD_BY_GLOBAL	1
-
 #endif	/* __ASSEMBLER__ */
 
 #endif /* linux/arm/sysdep.h */
diff --git a/sysdeps/unix/sysv/linux/hppa/single-thread.h b/sysdeps/unix/sysv/linux/hppa/single-thread.h
new file mode 100644
index 0000000000..a5d3a2aaf4
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/hppa/single-thread.h
@@ -0,0 +1,2 @@
+#define SINGLE_THREAD_BY_GLOBAL
+#include_next <single-thread.h>
diff --git a/sysdeps/unix/sysv/linux/hppa/sysdep.h b/sysdeps/unix/sysv/linux/hppa/sysdep.h
index 42f7705852..2f339a4bd6 100644
--- a/sysdeps/unix/sysv/linux/hppa/sysdep.h
+++ b/sysdeps/unix/sysv/linux/hppa/sysdep.h
@@ -474,6 +474,4 @@ L(pre_end):					ASM_LINE_SEP	\
 #define PTR_MANGLE(var) (void) (var)
 #define PTR_DEMANGLE(var) (void) (var)
 
-#define SINGLE_THREAD_BY_GLOBAL	1
-
 #endif /* _LINUX_HPPA_SYSDEP_H */
diff --git a/sysdeps/unix/sysv/linux/microblaze/single-thread.h b/sysdeps/unix/sysv/linux/microblaze/single-thread.h
new file mode 100644
index 0000000000..a5d3a2aaf4
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/microblaze/single-thread.h
@@ -0,0 +1,2 @@
+#define SINGLE_THREAD_BY_GLOBAL
+#include_next <single-thread.h>
diff --git a/sysdeps/unix/sysv/linux/microblaze/sysdep.h b/sysdeps/unix/sysv/linux/microblaze/sysdep.h
index dfd6312506..fda78f6467 100644
--- a/sysdeps/unix/sysv/linux/microblaze/sysdep.h
+++ b/sysdeps/unix/sysv/linux/microblaze/sysdep.h
@@ -308,8 +308,6 @@ SYSCALL_ERROR_LABEL_DCL:                            \
 # define PTR_MANGLE(var) (void) (var)
 # define PTR_DEMANGLE(var) (void) (var)
 
-# define SINGLE_THREAD_BY_GLOBAL	1
-
 #undef HAVE_INTERNAL_BRK_ADDR_SYMBOL
 #define HAVE_INTERNAL_BRK_ADDR_SYMBOL 1
 
diff --git a/sysdeps/unix/sysv/linux/s390/single-thread.h b/sysdeps/unix/sysv/linux/s390/single-thread.h
new file mode 100644
index 0000000000..a5d3a2aaf4
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/s390/single-thread.h
@@ -0,0 +1,2 @@
+#define SINGLE_THREAD_BY_GLOBAL
+#include_next <single-thread.h>
diff --git a/sysdeps/unix/sysv/linux/s390/sysdep.h b/sysdeps/unix/sysv/linux/s390/sysdep.h
index 78c7e8c7e2..2d0a26779c 100644
--- a/sysdeps/unix/sysv/linux/s390/sysdep.h
+++ b/sysdeps/unix/sysv/linux/s390/sysdep.h
@@ -93,9 +93,6 @@
 #define ASMFMT_5 , "0" (gpr2), "d" (gpr3), "d" (gpr4), "d" (gpr5), "d" (gpr6)
 #define ASMFMT_6 , "0" (gpr2), "d" (gpr3), "d" (gpr4), "d" (gpr5), "d" (gpr6), "d" (gpr7)
 
-#define SINGLE_THREAD_BY_GLOBAL		1
-
-
 #define VDSO_NAME  "LINUX_2.6.29"
 #define VDSO_HASH  123718585
 
diff --git a/sysdeps/unix/sysv/linux/single-thread.h b/sysdeps/unix/sysv/linux/single-thread.h
index 4529a906d2..208edccce6 100644
--- a/sysdeps/unix/sysv/linux/single-thread.h
+++ b/sysdeps/unix/sysv/linux/single-thread.h
@@ -19,6 +19,10 @@
 #ifndef _SINGLE_THREAD_H
 #define _SINGLE_THREAD_H
 
+#ifndef __ASSEMBLER__
+# include <sys/single_threaded.h>
+#endif
+
 /* The default way to check if the process is single thread is by using the
    pthread_t 'multiple_threads' field.  However, for some architectures it is
    faster to either use an extra field on TCB or global variables (the TCB
@@ -27,16 +31,11 @@
    The ABI might define SINGLE_THREAD_BY_GLOBAL to enable the single thread
    check to use global variables instead of the pthread_t field.  */
 
-#ifndef __ASSEMBLER__
-extern int __libc_multiple_threads;
-libc_hidden_proto (__libc_multiple_threads)
-#endif
-
 #if !defined SINGLE_THREAD_BY_GLOBAL || IS_IN (rtld)
 # define SINGLE_THREAD_P \
   (THREAD_GETMEM (THREAD_SELF, header.multiple_threads) == 0)
 #else
-# define SINGLE_THREAD_P (__libc_multiple_threads == 0)
+# define SINGLE_THREAD_P (__libc_single_threaded != 0)
 #endif
 
 #define RTLD_SINGLE_THREAD_P SINGLE_THREAD_P
diff --git a/sysdeps/unix/sysv/linux/x86_64/single-thread.h b/sysdeps/unix/sysv/linux/x86_64/single-thread.h
new file mode 100644
index 0000000000..a5d3a2aaf4
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/single-thread.h
@@ -0,0 +1,2 @@
+#define SINGLE_THREAD_BY_GLOBAL
+#include_next <single-thread.h>
diff --git a/sysdeps/unix/sysv/linux/x86_64/sysdep.h b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
index e1ce3b62eb..740abefcfd 100644
--- a/sysdeps/unix/sysv/linux/x86_64/sysdep.h
+++ b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
@@ -379,8 +379,6 @@
 
 # define HAVE_CLONE3_WRAPPER			1
 
-# define SINGLE_THREAD_BY_GLOBAL		1
-
 #endif	/* __ASSEMBLER__ */
 
 
-- 
2.34.1



* [PATCH 3/4] Remove usage of TLS_MULTIPLE_THREADS_IN_TCB
  2022-06-08 16:49 [PATCH 0/4] Simplify internal single-threaded usage Adhemerval Zanella
  2022-06-08 16:49 ` [PATCH 1/4] misc: Optimize internal usage of __libc_single_threaded Adhemerval Zanella
  2022-06-08 16:49 ` [PATCH 2/4] Replace __libc_multiple_threads with __libc_single_threaded Adhemerval Zanella
@ 2022-06-08 16:49 ` Adhemerval Zanella
  2022-06-08 16:49 ` [PATCH 4/4] Remove single-thread.h Adhemerval Zanella
  3 siblings, 0 replies; 10+ messages in thread
From: Adhemerval Zanella @ 2022-06-08 16:49 UTC (permalink / raw)
  To: libc-alpha, Wilco Dijkstra

Instead use __libc_single_threaded on all architectures.  The TCB
field is renamed to avoid changing the struct layout.
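
Why a rename is enough can be shown with a small sketch.  The structs
below are hypothetical and much simpler than the real tcbhead_t; they
only show that keeping the slot leaves the offsets of later fields
untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified TCB header before the change.  */
struct demo_tcb_old
{
  void *tcb;
  void *dtv;
  void *self;
  int multiple_threads;
  uintptr_t sysinfo;
};

/* Same header after the change: the slot is kept and only renamed.  */
struct demo_tcb_new
{
  void *tcb;
  void *dtv;
  void *self;
  int unused_multiple_threads;
  uintptr_t sysinfo;
};

static_assert (offsetof (struct demo_tcb_old, sysinfo)
	       == offsetof (struct demo_tcb_new, sysinfo),
	       "renaming must not move later fields");
```

Removing the field outright would instead shift every following member,
which matters for fields whose offsets are fixed elsewhere.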

The i686 atomics need some adjustments since they have a single-thread
optimization.  They now use SINGLE_THREAD_P along with the old compiler
builtins (__sync), since the generic code provides the C11 ones.
Some catomic macros are also removed, since they are not used.
---
 misc/tst-atomic.c                       |   1 +
 nptl/allocatestack.c                    |   6 -
 nptl/descr.h                            |  17 +-
 nptl/pthread_cancel.c                   |   7 +-
 nptl/pthread_create.c                   |   5 -
 sysdeps/i386/htl/tcb-offsets.sym        |   1 -
 sysdeps/i386/nptl/tcb-offsets.sym       |   1 -
 sysdeps/i386/nptl/tls.h                 |   4 +-
 sysdeps/ia64/nptl/tcb-offsets.sym       |   1 -
 sysdeps/ia64/nptl/tls.h                 |   2 -
 sysdeps/mach/hurd/i386/tls.h            |   4 +-
 sysdeps/nios2/nptl/tcb-offsets.sym      |   1 -
 sysdeps/or1k/nptl/tls.h                 |   2 -
 sysdeps/powerpc/nptl/tcb-offsets.sym    |   3 -
 sysdeps/powerpc/nptl/tls.h              |   3 -
 sysdeps/s390/nptl/tcb-offsets.sym       |   1 -
 sysdeps/s390/nptl/tls.h                 |   6 +-
 sysdeps/sh/nptl/tcb-offsets.sym         |   1 -
 sysdeps/sh/nptl/tls.h                   |   2 -
 sysdeps/sparc/nptl/tcb-offsets.sym      |   1 -
 sysdeps/sparc/nptl/tls.h                |   2 +-
 sysdeps/unix/sysv/linux/single-thread.h |  15 +-
 sysdeps/x86/atomic-machine.h            | 327 ++++--------------------
 sysdeps/x86_64/nptl/tcb-offsets.sym     |   1 -
 24 files changed, 55 insertions(+), 359 deletions(-)

diff --git a/misc/tst-atomic.c b/misc/tst-atomic.c
index 6d681a7bfd..ddbc618e25 100644
--- a/misc/tst-atomic.c
+++ b/misc/tst-atomic.c
@@ -18,6 +18,7 @@
 
 #include <stdio.h>
 #include <atomic.h>
+#include <support/xthread.h>
 
 #ifndef atomic_t
 # define atomic_t int
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 98f5f6dd85..3e0d01cb52 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -290,9 +290,6 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
 	 stack cache nor will the memory (except the TLS memory) be freed.  */
       pd->user_stack = true;
 
-      /* This is at least the second thread.  */
-      pd->header.multiple_threads = 1;
-
 #ifdef NEED_DL_SYSINFO
       SETUP_THREAD_SYSINFO (pd);
 #endif
@@ -408,9 +405,6 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
 	     descriptor.  */
 	  pd->specific[0] = pd->specific_1stblock;
 
-	  /* This is at least the second thread.  */
-	  pd->header.multiple_threads = 1;
-
 #ifdef NEED_DL_SYSINFO
 	  SETUP_THREAD_SYSINFO (pd);
 #endif
diff --git a/nptl/descr.h b/nptl/descr.h
index bb46b5958e..77b25d8267 100644
--- a/nptl/descr.h
+++ b/nptl/descr.h
@@ -137,22 +137,7 @@ struct pthread
 #else
     struct
     {
-      /* multiple_threads is enabled either when the process has spawned at
-	 least one thread or when a single-threaded process cancels itself.
-	 This enables additional code to introduce locking before doing some
-	 compare_and_exchange operations and also enable cancellation points.
-	 The concepts of multiple threads and cancellation points ideally
-	 should be separate, since it is not necessary for multiple threads to
-	 have been created for cancellation points to be enabled, as is the
-	 case is when single-threaded process cancels itself.
-
-	 Since enabling multiple_threads enables additional code in
-	 cancellation points and compare_and_exchange operations, there is a
-	 potential for an unneeded performance hit when it is enabled in a
-	 single-threaded, self-canceling process.  This is OK though, since a
-	 single-threaded process will enable async cancellation only when it
-	 looks to cancel itself and is hence going to end anyway.  */
-      int multiple_threads;
+      int unused_multiple_threads;
       int gscope_flag;
     } header;
 #endif
diff --git a/nptl/pthread_cancel.c b/nptl/pthread_cancel.c
index e1735279f2..6d26a15d0e 100644
--- a/nptl/pthread_cancel.c
+++ b/nptl/pthread_cancel.c
@@ -157,12 +157,9 @@ __pthread_cancel (pthread_t th)
 
 	/* A single-threaded process should be able to kill itself, since
 	   there is nothing in the POSIX specification that says that it
-	   cannot.  So we set multiple_threads to true so that cancellation
-	   points get executed.  */
-	THREAD_SETMEM (THREAD_SELF, header.multiple_threads, 1);
-#ifndef TLS_MULTIPLE_THREADS_IN_TCB
+	   cannot.  So we set __libc_single_threaded to false so that
+	   cancellation points get executed.  */
 	__libc_single_threaded = 0;
-#endif
     }
   while (!atomic_compare_exchange_weak_acquire (&pd->cancelhandling, &oldval,
 						newval));
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 5633d01c62..d43865352f 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -882,11 +882,6 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
 	   other reason that create_thread chose.  Now let it run
 	   free.  */
 	lll_unlock (pd->lock, LLL_PRIVATE);
-
-      /* We now have for sure more than one thread.  The main thread might
-	 not yet have the flag set.  No need to set the global variable
-	 again if this is what we use.  */
-      THREAD_SETMEM (THREAD_SELF, header.multiple_threads, 1);
     }
 
  out:
diff --git a/sysdeps/i386/htl/tcb-offsets.sym b/sysdeps/i386/htl/tcb-offsets.sym
index 7b7c719369..f3f7df6c06 100644
--- a/sysdeps/i386/htl/tcb-offsets.sym
+++ b/sysdeps/i386/htl/tcb-offsets.sym
@@ -2,7 +2,6 @@
 #include <tls.h>
 #include <kernel-features.h>
 
-MULTIPLE_THREADS_OFFSET offsetof (tcbhead_t, multiple_threads)
 SYSINFO_OFFSET          offsetof (tcbhead_t, sysinfo)
 POINTER_GUARD           offsetof (tcbhead_t, pointer_guard)
 SIGSTATE_OFFSET         offsetof (tcbhead_t, _hurd_sigstate)
diff --git a/sysdeps/i386/nptl/tcb-offsets.sym b/sysdeps/i386/nptl/tcb-offsets.sym
index 2ec9e787c1..1efd1469d8 100644
--- a/sysdeps/i386/nptl/tcb-offsets.sym
+++ b/sysdeps/i386/nptl/tcb-offsets.sym
@@ -6,7 +6,6 @@ RESULT			offsetof (struct pthread, result)
 TID			offsetof (struct pthread, tid)
 CANCELHANDLING		offsetof (struct pthread, cancelhandling)
 CLEANUP_JMP_BUF		offsetof (struct pthread, cleanup_jmp_buf)
-MULTIPLE_THREADS_OFFSET	offsetof (tcbhead_t, multiple_threads)
 SYSINFO_OFFSET		offsetof (tcbhead_t, sysinfo)
 CLEANUP			offsetof (struct pthread, cleanup)
 CLEANUP_PREV		offsetof (struct _pthread_cleanup_buffer, __prev)
diff --git a/sysdeps/i386/nptl/tls.h b/sysdeps/i386/nptl/tls.h
index 91090bf287..48940a9f44 100644
--- a/sysdeps/i386/nptl/tls.h
+++ b/sysdeps/i386/nptl/tls.h
@@ -36,7 +36,7 @@ typedef struct
 			   thread descriptor used by libpthread.  */
   dtv_t *dtv;
   void *self;		/* Pointer to the thread descriptor.  */
-  int multiple_threads;
+  int unused_multiple_threads;
   uintptr_t sysinfo;
   uintptr_t stack_guard;
   uintptr_t pointer_guard;
@@ -57,8 +57,6 @@ typedef struct
 _Static_assert (offsetof (tcbhead_t, __private_ss) == 0x30,
 		"offset of __private_ss != 0x30");
 
-# define TLS_MULTIPLE_THREADS_IN_TCB 1
-
 #else /* __ASSEMBLER__ */
 # include <tcb-offsets.h>
 #endif
diff --git a/sysdeps/ia64/nptl/tcb-offsets.sym b/sysdeps/ia64/nptl/tcb-offsets.sym
index b01f712be2..ab2cb180f9 100644
--- a/sysdeps/ia64/nptl/tcb-offsets.sym
+++ b/sysdeps/ia64/nptl/tcb-offsets.sym
@@ -2,5 +2,4 @@
 #include <tls.h>
 
 TID			offsetof (struct pthread, tid) - TLS_PRE_TCB_SIZE
-MULTIPLE_THREADS_OFFSET offsetof (struct pthread, header.multiple_threads) - TLS_PRE_TCB_SIZE
 SYSINFO_OFFSET		offsetof (tcbhead_t, __private)
diff --git a/sysdeps/ia64/nptl/tls.h b/sysdeps/ia64/nptl/tls.h
index 8ccedb73e6..008e080fc4 100644
--- a/sysdeps/ia64/nptl/tls.h
+++ b/sysdeps/ia64/nptl/tls.h
@@ -36,8 +36,6 @@ typedef struct
 
 register struct pthread *__thread_self __asm__("r13");
 
-# define TLS_MULTIPLE_THREADS_IN_TCB 1
-
 #else /* __ASSEMBLER__ */
 # include <tcb-offsets.h>
 #endif
diff --git a/sysdeps/mach/hurd/i386/tls.h b/sysdeps/mach/hurd/i386/tls.h
index 264ed9a9c5..d33e91c922 100644
--- a/sysdeps/mach/hurd/i386/tls.h
+++ b/sysdeps/mach/hurd/i386/tls.h
@@ -33,7 +33,7 @@ typedef struct
   void *tcb;			/* Points to this structure.  */
   dtv_t *dtv;			/* Vector of pointers to TLS data.  */
   thread_t self;		/* This thread's control port.  */
-  int multiple_threads;
+  int unused_multiple_threads;
   uintptr_t sysinfo;
   uintptr_t stack_guard;
   uintptr_t pointer_guard;
@@ -117,8 +117,6 @@ _hurd_tls_init (tcbhead_t *tcb)
   /* This field is used by TLS accesses to get our "thread pointer"
      from the TLS point of view.  */
   tcb->tcb = tcb;
-  /* We always at least start the sigthread anyway.  */
-  tcb->multiple_threads = 1;
 
   /* Get the first available selector.  */
   int sel = -1;
diff --git a/sysdeps/nios2/nptl/tcb-offsets.sym b/sysdeps/nios2/nptl/tcb-offsets.sym
index 3cd8d984ac..93a695ac7f 100644
--- a/sysdeps/nios2/nptl/tcb-offsets.sym
+++ b/sysdeps/nios2/nptl/tcb-offsets.sym
@@ -8,6 +8,5 @@
 # define __thread_self          ((void *) 0)
 # define thread_offsetof(mem)   ((ptrdiff_t) THREAD_SELF + offsetof (struct pthread, mem))
 
-MULTIPLE_THREADS_OFFSET		thread_offsetof (header.multiple_threads)
 TID_OFFSET			thread_offsetof (tid)
 POINTER_GUARD			(offsetof (tcbhead_t, pointer_guard) - TLS_TCB_OFFSET - sizeof (tcbhead_t))
diff --git a/sysdeps/or1k/nptl/tls.h b/sysdeps/or1k/nptl/tls.h
index c6ffe62c3f..3bb07beef8 100644
--- a/sysdeps/or1k/nptl/tls.h
+++ b/sysdeps/or1k/nptl/tls.h
@@ -35,8 +35,6 @@ typedef struct
 
 register tcbhead_t *__thread_self __asm__("r10");
 
-# define TLS_MULTIPLE_THREADS_IN_TCB 1
-
 /* Get system call information.  */
 # include <sysdep.h>
 
diff --git a/sysdeps/powerpc/nptl/tcb-offsets.sym b/sysdeps/powerpc/nptl/tcb-offsets.sym
index 4c01615ad0..a0ee95f94d 100644
--- a/sysdeps/powerpc/nptl/tcb-offsets.sym
+++ b/sysdeps/powerpc/nptl/tcb-offsets.sym
@@ -10,9 +10,6 @@
 # define thread_offsetof(mem)	((ptrdiff_t) THREAD_SELF + offsetof (struct pthread, mem))
 
 
-#if TLS_MULTIPLE_THREADS_IN_TCB
-MULTIPLE_THREADS_OFFSET		thread_offsetof (header.multiple_threads)
-#endif
 TID				thread_offsetof (tid)
 POINTER_GUARD			(offsetof (tcbhead_t, pointer_guard) - TLS_TCB_OFFSET - sizeof (tcbhead_t))
 TAR_SAVE			(offsetof (tcbhead_t, tar_save) - TLS_TCB_OFFSET - sizeof (tcbhead_t))
diff --git a/sysdeps/powerpc/nptl/tls.h b/sysdeps/powerpc/nptl/tls.h
index 22b0075235..fd5ee51981 100644
--- a/sysdeps/powerpc/nptl/tls.h
+++ b/sysdeps/powerpc/nptl/tls.h
@@ -52,9 +52,6 @@
 # define TLS_DTV_AT_TP	1
 # define TLS_TCB_AT_TP	0
 
-/* We use the multiple_threads field in the pthread struct */
-#define TLS_MULTIPLE_THREADS_IN_TCB	1
-
 /* Get the thread descriptor definition.  */
 # include <nptl/descr.h>
 
diff --git a/sysdeps/s390/nptl/tcb-offsets.sym b/sysdeps/s390/nptl/tcb-offsets.sym
index 9c1c01f353..bc7b267463 100644
--- a/sysdeps/s390/nptl/tcb-offsets.sym
+++ b/sysdeps/s390/nptl/tcb-offsets.sym
@@ -1,6 +1,5 @@
 #include <sysdep.h>
 #include <tls.h>
 
-MULTIPLE_THREADS_OFFSET		offsetof (tcbhead_t, multiple_threads)
 STACK_GUARD			offsetof (tcbhead_t, stack_guard)
 TID				offsetof (struct pthread, tid)
diff --git a/sysdeps/s390/nptl/tls.h b/sysdeps/s390/nptl/tls.h
index ff210ffeb2..d69ed539f7 100644
--- a/sysdeps/s390/nptl/tls.h
+++ b/sysdeps/s390/nptl/tls.h
@@ -35,7 +35,7 @@ typedef struct
 			   thread descriptor used by libpthread.  */
   dtv_t *dtv;
   void *self;		/* Pointer to the thread descriptor.  */
-  int multiple_threads;
+  int unused_multiple_threads;
   uintptr_t sysinfo;
   uintptr_t stack_guard;
   int gscope_flag;
@@ -44,10 +44,6 @@ typedef struct
   void *__private_ss;
 } tcbhead_t;
 
-# ifndef __s390x__
-#  define TLS_MULTIPLE_THREADS_IN_TCB 1
-# endif
-
 #else /* __ASSEMBLER__ */
 # include <tcb-offsets.h>
 #endif
diff --git a/sysdeps/sh/nptl/tcb-offsets.sym b/sysdeps/sh/nptl/tcb-offsets.sym
index 234207779d..4e452d9c6c 100644
--- a/sysdeps/sh/nptl/tcb-offsets.sym
+++ b/sysdeps/sh/nptl/tcb-offsets.sym
@@ -6,7 +6,6 @@ RESULT			offsetof (struct pthread, result)
 TID			offsetof (struct pthread, tid)
 CANCELHANDLING		offsetof (struct pthread, cancelhandling)
 CLEANUP_JMP_BUF		offsetof (struct pthread, cleanup_jmp_buf)
-MULTIPLE_THREADS_OFFSET	offsetof (struct pthread, header.multiple_threads)
 TLS_PRE_TCB_SIZE	sizeof (struct pthread)
 MUTEX_FUTEX		offsetof (pthread_mutex_t, __data.__lock)
 POINTER_GUARD		offsetof (tcbhead_t, pointer_guard)
diff --git a/sysdeps/sh/nptl/tls.h b/sysdeps/sh/nptl/tls.h
index 76591ab6ef..8778cb4ac0 100644
--- a/sysdeps/sh/nptl/tls.h
+++ b/sysdeps/sh/nptl/tls.h
@@ -36,8 +36,6 @@ typedef struct
   uintptr_t pointer_guard;
 } tcbhead_t;
 
-# define TLS_MULTIPLE_THREADS_IN_TCB 1
-
 #else /* __ASSEMBLER__ */
 # include <tcb-offsets.h>
 #endif /* __ASSEMBLER__ */
diff --git a/sysdeps/sparc/nptl/tcb-offsets.sym b/sysdeps/sparc/nptl/tcb-offsets.sym
index f75d02065e..e4a7e4720f 100644
--- a/sysdeps/sparc/nptl/tcb-offsets.sym
+++ b/sysdeps/sparc/nptl/tcb-offsets.sym
@@ -1,6 +1,5 @@
 #include <sysdep.h>
 #include <tls.h>
 
-MULTIPLE_THREADS_OFFSET		offsetof (tcbhead_t, multiple_threads)
 POINTER_GUARD			offsetof (tcbhead_t, pointer_guard)
 TID				offsetof (struct pthread, tid)
diff --git a/sysdeps/sparc/nptl/tls.h b/sysdeps/sparc/nptl/tls.h
index d1e2bb4ad1..b78cf0d6b4 100644
--- a/sysdeps/sparc/nptl/tls.h
+++ b/sysdeps/sparc/nptl/tls.h
@@ -35,7 +35,7 @@ typedef struct
 			   thread descriptor used by libpthread.  */
   dtv_t *dtv;
   void *self;
-  int multiple_threads;
+  int unused_multiple_threads;
 #if __WORDSIZE == 64
   int gscope_flag;
 #endif
diff --git a/sysdeps/unix/sysv/linux/single-thread.h b/sysdeps/unix/sysv/linux/single-thread.h
index 208edccce6..dd80e82c82 100644
--- a/sysdeps/unix/sysv/linux/single-thread.h
+++ b/sysdeps/unix/sysv/linux/single-thread.h
@@ -23,20 +23,7 @@
 # include <sys/single_threaded.h>
 #endif
 
-/* The default way to check if the process is single thread is by using the
-   pthread_t 'multiple_threads' field.  However, for some architectures it is
-   faster to either use an extra field on TCB or global variables (the TCB
-   field is also used on x86 for some single-thread atomic optimizations).
-
-   The ABI might define SINGLE_THREAD_BY_GLOBAL to enable the single thread
-   check to use global variables instead of the pthread_t field.  */
-
-#if !defined SINGLE_THREAD_BY_GLOBAL || IS_IN (rtld)
-# define SINGLE_THREAD_P \
-  (THREAD_GETMEM (THREAD_SELF, header.multiple_threads) == 0)
-#else
-# define SINGLE_THREAD_P (__libc_single_threaded != 0)
-#endif
+#define SINGLE_THREAD_P (__libc_single_threaded != 0)
 
 #define RTLD_SINGLE_THREAD_P SINGLE_THREAD_P
 
diff --git a/sysdeps/x86/atomic-machine.h b/sysdeps/x86/atomic-machine.h
index f24f1c71ed..56f96bf034 100644
--- a/sysdeps/x86/atomic-machine.h
+++ b/sysdeps/x86/atomic-machine.h
@@ -51,53 +51,30 @@
 #define atomic_compare_and_exchange_bool_acq(mem, newval, oldval) \
   (! __sync_bool_compare_and_swap (mem, oldval, newval))
 
-
-#define __arch_c_compare_and_exchange_val_8_acq(mem, newval, oldval) \
+#define __arch_c_compare_and_exchange_val_x_acq(mem, newval, oldval) \
   ({ __typeof (*mem) ret;						      \
-     __asm __volatile ("cmpl $0, %%" SEG_REG ":%P5\n\t"			      \
-		       "je 0f\n\t"					      \
-		       "lock\n"						      \
-		       "0:\tcmpxchgb %b2, %1"				      \
-		       : "=a" (ret), "=m" (*mem)			      \
-		       : BR_CONSTRAINT (newval), "m" (*mem), "0" (oldval),    \
-			 "i" (offsetof (tcbhead_t, multiple_threads)));	      \
+     if (SINGLE_THREAD_P)						      \
+       {								      \
+         ret = (*mem);							      \
+         if (ret == (oldval))						      \
+	   *(mem) = (newval);						      \
+       }								      \
+     else								      \
+       ret = __sync_val_compare_and_swap (mem, oldval, newval);		      \
      ret; })
 
+#define __arch_c_compare_and_exchange_val_8_acq(mem, newval, oldval) \
+  __arch_c_compare_and_exchange_val_x_acq (mem, newval, oldval)
+
 #define __arch_c_compare_and_exchange_val_16_acq(mem, newval, oldval) \
-  ({ __typeof (*mem) ret;						      \
-     __asm __volatile ("cmpl $0, %%" SEG_REG ":%P5\n\t"			      \
-		       "je 0f\n\t"					      \
-		       "lock\n"						      \
-		       "0:\tcmpxchgw %w2, %1"				      \
-		       : "=a" (ret), "=m" (*mem)			      \
-		       : BR_CONSTRAINT (newval), "m" (*mem), "0" (oldval),    \
-			 "i" (offsetof (tcbhead_t, multiple_threads)));	      \
-     ret; })
+  __arch_c_compare_and_exchange_val_x_acq (mem, newval, oldval)
 
 #define __arch_c_compare_and_exchange_val_32_acq(mem, newval, oldval) \
-  ({ __typeof (*mem) ret;						      \
-     __asm __volatile ("cmpl $0, %%" SEG_REG ":%P5\n\t"			      \
-		       "je 0f\n\t"					      \
-		       "lock\n"						      \
-		       "0:\tcmpxchgl %2, %1"				      \
-		       : "=a" (ret), "=m" (*mem)			      \
-		       : BR_CONSTRAINT (newval), "m" (*mem), "0" (oldval),    \
-			 "i" (offsetof (tcbhead_t, multiple_threads)));       \
-     ret; })
+  __arch_c_compare_and_exchange_val_x_acq (mem, newval, oldval)
 
 #ifdef __x86_64__
 # define __arch_c_compare_and_exchange_val_64_acq(mem, newval, oldval) \
-  ({ __typeof (*mem) ret;						      \
-     __asm __volatile ("cmpl $0, %%fs:%P5\n\t"				      \
-		       "je 0f\n\t"					      \
-		       "lock\n"						      \
-		       "0:\tcmpxchgq %q2, %1"				      \
-		       : "=a" (ret), "=m" (*mem)			      \
-		       : "q" ((int64_t) cast_to_integer (newval)),	      \
-			 "m" (*mem),					      \
-			 "0" ((int64_t) cast_to_integer (oldval)),	      \
-			 "i" (offsetof (tcbhead_t, multiple_threads)));	      \
-     ret; })
+  __arch_c_compare_and_exchange_val_x_acq (mem, newval, oldval)
 # define do_exchange_and_add_val_64_acq(pfx, mem, value) 0
 # define do_add_val_64_acq(pfx, mem, value) do { } while (0)
 #else
@@ -175,167 +152,26 @@
      result; })
 
 
-#define __arch_exchange_and_add_body(lock, pfx, mem, value) \
-  ({ __typeof (*mem) __result;						      \
-     __typeof (value) __addval = (value);				      \
-     if (sizeof (*mem) == 1)						      \
-       __asm __volatile (lock "xaddb %b0, %1"				      \
-			 : "=q" (__result), "=m" (*mem)			      \
-			 : "0" (__addval), "m" (*mem),			      \
-			   "i" (offsetof (tcbhead_t, multiple_threads)));     \
-     else if (sizeof (*mem) == 2)					      \
-       __asm __volatile (lock "xaddw %w0, %1"				      \
-			 : "=r" (__result), "=m" (*mem)			      \
-			 : "0" (__addval), "m" (*mem),			      \
-			   "i" (offsetof (tcbhead_t, multiple_threads)));     \
-     else if (sizeof (*mem) == 4)					      \
-       __asm __volatile (lock "xaddl %0, %1"				      \
-			 : "=r" (__result), "=m" (*mem)			      \
-			 : "0" (__addval), "m" (*mem),			      \
-			   "i" (offsetof (tcbhead_t, multiple_threads)));     \
-     else if (__HAVE_64B_ATOMICS)					      \
-       __asm __volatile (lock "xaddq %q0, %1"				      \
-			 : "=r" (__result), "=m" (*mem)			      \
-			 : "0" ((int64_t) cast_to_integer (__addval)),     \
-			   "m" (*mem),					      \
-			   "i" (offsetof (tcbhead_t, multiple_threads)));     \
-     else								      \
-       __result = do_exchange_and_add_val_64_acq (pfx, (mem), __addval);      \
-     __result; })
-
-#define atomic_exchange_and_add(mem, value) \
-  __sync_fetch_and_add (mem, value)
-
-#define __arch_exchange_and_add_cprefix \
-  "cmpl $0, %%" SEG_REG ":%P4\n\tje 0f\n\tlock\n0:\t"
-
-#define catomic_exchange_and_add(mem, value) \
-  __arch_exchange_and_add_body (__arch_exchange_and_add_cprefix, __arch_c,    \
-				mem, value)
-
-
-#define __arch_add_body(lock, pfx, apfx, mem, value) \
-  do {									      \
-    if (__builtin_constant_p (value) && (value) == 1)			      \
-      pfx##_increment (mem);						      \
-    else if (__builtin_constant_p (value) && (value) == -1)		      \
-      pfx##_decrement (mem);						      \
-    else if (sizeof (*mem) == 1)					      \
-      __asm __volatile (lock "addb %b1, %0"				      \
-			: "=m" (*mem)					      \
-			: IBR_CONSTRAINT (value), "m" (*mem),		      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else if (sizeof (*mem) == 2)					      \
-      __asm __volatile (lock "addw %w1, %0"				      \
-			: "=m" (*mem)					      \
-			: "ir" (value), "m" (*mem),			      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else if (sizeof (*mem) == 4)					      \
-      __asm __volatile (lock "addl %1, %0"				      \
-			: "=m" (*mem)					      \
-			: "ir" (value), "m" (*mem),			      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else if (__HAVE_64B_ATOMICS)					      \
-      __asm __volatile (lock "addq %q1, %0"				      \
-			: "=m" (*mem)					      \
-			: "ir" ((int64_t) cast_to_integer (value)),	      \
-			  "m" (*mem),					      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else								      \
-      do_add_val_64_acq (apfx, (mem), (value));				      \
-  } while (0)
-
 # define atomic_add(mem, value) \
-  __arch_add_body (LOCK_PREFIX, atomic, __arch, mem, value)
-
-#define __arch_add_cprefix \
-  "cmpl $0, %%" SEG_REG ":%P3\n\tje 0f\n\tlock\n0:\t"
+  __sync_fetch_and_add (mem, value)
 
 #define catomic_add(mem, value) \
-  __arch_add_body (__arch_add_cprefix, atomic, __arch_c, mem, value)
+  ({									     \
+    if (SINGLE_THREAD_P)						     \
+      {									     \
+        __typeof (*mem) __incr = (value);				     \
+        *(mem) += __incr;						     \
+      }									     \
+   else									     \
+     atomic_add (mem, value);						     \
+  })
 
 
-#define atomic_add_negative(mem, value) \
-  ({ unsigned char __result;						      \
-     if (sizeof (*mem) == 1)						      \
-       __asm __volatile (LOCK_PREFIX "addb %b2, %0; sets %1"		      \
-			 : "=m" (*mem), "=qm" (__result)		      \
-			 : IBR_CONSTRAINT (value), "m" (*mem));		      \
-     else if (sizeof (*mem) == 2)					      \
-       __asm __volatile (LOCK_PREFIX "addw %w2, %0; sets %1"		      \
-			 : "=m" (*mem), "=qm" (__result)		      \
-			 : "ir" (value), "m" (*mem));			      \
-     else if (sizeof (*mem) == 4)					      \
-       __asm __volatile (LOCK_PREFIX "addl %2, %0; sets %1"		      \
-			 : "=m" (*mem), "=qm" (__result)		      \
-			 : "ir" (value), "m" (*mem));			      \
-     else if (__HAVE_64B_ATOMICS)					      \
-       __asm __volatile (LOCK_PREFIX "addq %q2, %0; sets %1"		      \
-			 : "=m" (*mem), "=qm" (__result)		      \
-			 : "ir" ((int64_t) cast_to_integer (value)),	      \
-			   "m" (*mem));					      \
-     else								      \
-       __atomic_link_error ();						      \
-     __result; })
-
-
-#define atomic_add_zero(mem, value) \
-  ({ unsigned char __result;						      \
-     if (sizeof (*mem) == 1)						      \
-       __asm __volatile (LOCK_PREFIX "addb %b2, %0; setz %1"		      \
-			 : "=m" (*mem), "=qm" (__result)		      \
-			 : IBR_CONSTRAINT (value), "m" (*mem));		      \
-     else if (sizeof (*mem) == 2)					      \
-       __asm __volatile (LOCK_PREFIX "addw %w2, %0; setz %1"		      \
-			 : "=m" (*mem), "=qm" (__result)		      \
-			 : "ir" (value), "m" (*mem));			      \
-     else if (sizeof (*mem) == 4)					      \
-       __asm __volatile (LOCK_PREFIX "addl %2, %0; setz %1"		      \
-			 : "=m" (*mem), "=qm" (__result)		      \
-			 : "ir" (value), "m" (*mem));			      \
-     else if (__HAVE_64B_ATOMICS)					      \
-       __asm __volatile (LOCK_PREFIX "addq %q2, %0; setz %1"		      \
-			 : "=m" (*mem), "=qm" (__result)		      \
-			 : "ir" ((int64_t) cast_to_integer (value)),	      \
-			   "m" (*mem));					      \
-     else								      \
-       __atomic_link_error ();					      \
-     __result; })
-
-
-#define __arch_increment_body(lock, pfx, mem) \
-  do {									      \
-    if (sizeof (*mem) == 1)						      \
-      __asm __volatile (lock "incb %b0"					      \
-			: "=m" (*mem)					      \
-			: "m" (*mem),					      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else if (sizeof (*mem) == 2)					      \
-      __asm __volatile (lock "incw %w0"					      \
-			: "=m" (*mem)					      \
-			: "m" (*mem),					      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else if (sizeof (*mem) == 4)					      \
-      __asm __volatile (lock "incl %0"					      \
-			: "=m" (*mem)					      \
-			: "m" (*mem),					      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else if (__HAVE_64B_ATOMICS)					      \
-      __asm __volatile (lock "incq %q0"					      \
-			: "=m" (*mem)					      \
-			: "m" (*mem),					      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else								      \
-      do_add_val_64_acq (pfx, mem, 1);					      \
-  } while (0)
-
-#define atomic_increment(mem) __arch_increment_body (LOCK_PREFIX, __arch, mem)
-
-#define __arch_increment_cprefix \
-  "cmpl $0, %%" SEG_REG ":%P2\n\tje 0f\n\tlock\n0:\t"
+#define atomic_increment(mem) \
+  atomic_add (mem, 1)
 
 #define catomic_increment(mem) \
-  __arch_increment_body (__arch_increment_cprefix, __arch_c, mem)
+  catomic_add (mem, 1)
 
 
 #define atomic_increment_and_test(mem) \
@@ -361,39 +197,16 @@
      __result; })
 
 
-#define __arch_decrement_body(lock, pfx, mem) \
-  do {									      \
-    if (sizeof (*mem) == 1)						      \
-      __asm __volatile (lock "decb %b0"					      \
-			: "=m" (*mem)					      \
-			: "m" (*mem),					      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else if (sizeof (*mem) == 2)					      \
-      __asm __volatile (lock "decw %w0"					      \
-			: "=m" (*mem)					      \
-			: "m" (*mem),					      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else if (sizeof (*mem) == 4)					      \
-      __asm __volatile (lock "decl %0"					      \
-			: "=m" (*mem)					      \
-			: "m" (*mem),					      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else if (__HAVE_64B_ATOMICS)					      \
-      __asm __volatile (lock "decq %q0"					      \
-			: "=m" (*mem)					      \
-			: "m" (*mem),					      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else								      \
-      do_add_val_64_acq (pfx, mem, -1);					      \
-  } while (0)
-
-#define atomic_decrement(mem) __arch_decrement_body (LOCK_PREFIX, __arch, mem)
-
-#define __arch_decrement_cprefix \
-  "cmpl $0, %%" SEG_REG ":%P2\n\tje 0f\n\tlock\n0:\t"
+#define atomic_decrement(mem) \
+  __sync_fetch_and_sub (mem, 1)
 
 #define catomic_decrement(mem) \
-  __arch_decrement_body (__arch_decrement_cprefix, __arch_c, mem)
+  ({									     \
+    if (SINGLE_THREAD_P)						     \
+      *(mem) -= 1;						     	     \
+   else									     \
+      atomic_decrement (mem);						     \
+  })
 
 
 #define atomic_decrement_and_test(mem) \
@@ -467,69 +280,21 @@
      __result; })
 
 
-#define __arch_and_body(lock, mem, mask) \
-  do {									      \
-    if (sizeof (*mem) == 1)						      \
-      __asm __volatile (lock "andb %b1, %0"				      \
-			: "=m" (*mem)					      \
-			: IBR_CONSTRAINT (mask), "m" (*mem),		      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else if (sizeof (*mem) == 2)					      \
-      __asm __volatile (lock "andw %w1, %0"				      \
-			: "=m" (*mem)					      \
-			: "ir" (mask), "m" (*mem),			      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else if (sizeof (*mem) == 4)					      \
-      __asm __volatile (lock "andl %1, %0"				      \
-			: "=m" (*mem)					      \
-			: "ir" (mask), "m" (*mem),			      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else if (__HAVE_64B_ATOMICS)					      \
-      __asm __volatile (lock "andq %q1, %0"				      \
-			: "=m" (*mem)					      \
-			: "ir" (mask), "m" (*mem),			      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else								      \
-      __atomic_link_error ();						      \
-  } while (0)
-
-#define __arch_cprefix \
-  "cmpl $0, %%" SEG_REG ":%P3\n\tje 0f\n\tlock\n0:\t"
-
-#define atomic_and(mem, mask) __arch_and_body (LOCK_PREFIX, mem, mask)
+#define atomic_and(mem, mask) \
+  __sync_fetch_and_and ((mem), (mask))
 
 #define catomic_and(mem, mask) __arch_and_body (__arch_cprefix, mem, mask)
 
 
-#define __arch_or_body(lock, mem, mask) \
-  do {									      \
-    if (sizeof (*mem) == 1)						      \
-      __asm __volatile (lock "orb %b1, %0"				      \
-			: "=m" (*mem)					      \
-			: IBR_CONSTRAINT (mask), "m" (*mem),		      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else if (sizeof (*mem) == 2)					      \
-      __asm __volatile (lock "orw %w1, %0"				      \
-			: "=m" (*mem)					      \
-			: "ir" (mask), "m" (*mem),			      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else if (sizeof (*mem) == 4)					      \
-      __asm __volatile (lock "orl %1, %0"				      \
-			: "=m" (*mem)					      \
-			: "ir" (mask), "m" (*mem),			      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else if (__HAVE_64B_ATOMICS)					      \
-      __asm __volatile (lock "orq %q1, %0"				      \
-			: "=m" (*mem)					      \
-			: "ir" (mask), "m" (*mem),			      \
-			  "i" (offsetof (tcbhead_t, multiple_threads)));      \
-    else								      \
-      __atomic_link_error ();						      \
-  } while (0)
-
-#define atomic_or(mem, mask) __arch_or_body (LOCK_PREFIX, mem, mask)
+#define atomic_or(mem, mask) __sync_fetch_and_or (mem, mask)
 
-#define catomic_or(mem, mask) __arch_or_body (__arch_cprefix, mem, mask)
+#define catomic_or(mem, mask) \
+  ({									     \
+    if (SINGLE_THREAD_P)						     \
+      *(mem) |= mask;						     	     \
+   else									     \
+      atomic_or (mem, mask);						     \
+  })
 
 /* We don't use mfence because it is supposedly slower due to having to
    provide stronger guarantees (e.g., regarding self-modifying code).  */
diff --git a/sysdeps/x86_64/nptl/tcb-offsets.sym b/sysdeps/x86_64/nptl/tcb-offsets.sym
index 2bbd563a6c..8ec55a7ea8 100644
--- a/sysdeps/x86_64/nptl/tcb-offsets.sym
+++ b/sysdeps/x86_64/nptl/tcb-offsets.sym
@@ -9,7 +9,6 @@ CLEANUP_JMP_BUF		offsetof (struct pthread, cleanup_jmp_buf)
 CLEANUP			offsetof (struct pthread, cleanup)
 CLEANUP_PREV		offsetof (struct _pthread_cleanup_buffer, __prev)
 MUTEX_FUTEX		offsetof (pthread_mutex_t, __data.__lock)
-MULTIPLE_THREADS_OFFSET	offsetof (tcbhead_t, multiple_threads)
 POINTER_GUARD		offsetof (tcbhead_t, pointer_guard)
 FEATURE_1_OFFSET	offsetof (tcbhead_t, feature_1)
 SSP_BASE_OFFSET		offsetof (tcbhead_t, ssp_base)
-- 
2.34.1



* [PATCH 4/4] Remove single-thread.h
  2022-06-08 16:49 [PATCH 0/4] Simplify internal single-threaded usage Adhemerval Zanella
                   ` (2 preceding siblings ...)
  2022-06-08 16:49 ` [PATCH 3/4] Remove usage of TLS_MULTIPLE_THREADS_IN_TCB Adhemerval Zanella
@ 2022-06-08 16:49 ` Adhemerval Zanella
  3 siblings, 0 replies; 10+ messages in thread
From: Adhemerval Zanella @ 2022-06-08 16:49 UTC (permalink / raw)
  To: libc-alpha, Wilco Dijkstra

Move the SINGLE_THREAD_P and RTLD_SINGLE_THREAD_P macros to
sys/single_threaded.h and remove the now-redundant single-thread.h
headers.
---
 include/sys/single_threaded.h                 | 15 +++++++---
 sysdeps/generic/single-thread.h               | 25 ----------------
 sysdeps/mach/hurd/sysdep-cancel.h             |  5 ----
 sysdeps/unix/sysdep.h                         |  2 +-
 .../unix/sysv/linux/aarch64/single-thread.h   |  2 --
 sysdeps/unix/sysv/linux/arc/single-thread.h   |  2 --
 sysdeps/unix/sysv/linux/arm/single-thread.h   |  2 --
 sysdeps/unix/sysv/linux/hppa/single-thread.h  |  2 --
 .../sysv/linux/microblaze/single-thread.h     |  2 --
 sysdeps/unix/sysv/linux/s390/single-thread.h  |  2 --
 sysdeps/unix/sysv/linux/single-thread.h       | 30 -------------------
 .../unix/sysv/linux/x86_64/single-thread.h    |  2 --
 12 files changed, 12 insertions(+), 79 deletions(-)
 delete mode 100644 sysdeps/generic/single-thread.h
 delete mode 100644 sysdeps/unix/sysv/linux/aarch64/single-thread.h
 delete mode 100644 sysdeps/unix/sysv/linux/arc/single-thread.h
 delete mode 100644 sysdeps/unix/sysv/linux/arm/single-thread.h
 delete mode 100644 sysdeps/unix/sysv/linux/hppa/single-thread.h
 delete mode 100644 sysdeps/unix/sysv/linux/microblaze/single-thread.h
 delete mode 100644 sysdeps/unix/sysv/linux/s390/single-thread.h
 delete mode 100644 sysdeps/unix/sysv/linux/single-thread.h
 delete mode 100644 sysdeps/unix/sysv/linux/x86_64/single-thread.h

diff --git a/include/sys/single_threaded.h b/include/sys/single_threaded.h
index 258b01e0b2..c08bd52ab8 100644
--- a/include/sys/single_threaded.h
+++ b/include/sys/single_threaded.h
@@ -1,12 +1,19 @@
-#include <misc/sys/single_threaded.h>
+#ifndef __ASSEMBLER__
+# include <misc/sys/single_threaded.h>
 
-#ifndef _ISOMAC
+# ifndef _ISOMAC
 
 libc_hidden_proto (__libc_single_threaded);
 
-# ifdef SHARED
+#  ifdef SHARED
 extern __typeof (__libc_single_threaded) *__libc_external_single_threaded
   attribute_hidden;
+#  endif
+
+#  define SINGLE_THREAD_P (__libc_single_threaded != 0)
+
+#  define RTLD_SINGLE_THREAD_P SINGLE_THREAD_P
+
 # endif
 
-#endif
+#endif /* __ASSEMBLER__ */
diff --git a/sysdeps/generic/single-thread.h b/sysdeps/generic/single-thread.h
deleted file mode 100644
index 7f8222b38a..0000000000
--- a/sysdeps/generic/single-thread.h
+++ /dev/null
@@ -1,25 +0,0 @@
-/* Single thread optimization, generic version.
-   Copyright (C) 2019-2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#ifndef _SINGLE_THREAD_H
-#define _SINGLE_THREAD_H
-
-#define SINGLE_THREAD_P (0)
-#define RTLD_SINGLE_THREAD_P (0)
-
-#endif /* _SINGLE_THREAD_H  */
diff --git a/sysdeps/mach/hurd/sysdep-cancel.h b/sysdeps/mach/hurd/sysdep-cancel.h
index 669c17151a..9311367ab9 100644
--- a/sysdeps/mach/hurd/sysdep-cancel.h
+++ b/sysdeps/mach/hurd/sysdep-cancel.h
@@ -6,11 +6,6 @@ void __pthread_disable_asynccancel (int oldtype);
 #pragma weak __pthread_enable_asynccancel
 #pragma weak __pthread_disable_asynccancel
 
-/* Always multi-thread (since there's at least the sig handler), but no
-   handling enabled.  */
-#define SINGLE_THREAD_P (0)
-#define RTLD_SINGLE_THREAD_P (0)
-
 #define LIBC_CANCEL_ASYNC() ({ \
 	int __cancel_oldtype = 0; \
 	if (__pthread_enable_asynccancel) \
diff --git a/sysdeps/unix/sysdep.h b/sysdeps/unix/sysdep.h
index a1d9df4c73..a8abecb92b 100644
--- a/sysdeps/unix/sysdep.h
+++ b/sysdeps/unix/sysdep.h
@@ -16,7 +16,7 @@
    <https://www.gnu.org/licenses/>.  */
 
 #include <sysdeps/generic/sysdep.h>
-#include <single-thread.h>
+#include <sys/single_threaded.h>
 #include <sys/syscall.h>
 #define	HAVE_SYSCALLS
 
diff --git a/sysdeps/unix/sysv/linux/aarch64/single-thread.h b/sysdeps/unix/sysv/linux/aarch64/single-thread.h
deleted file mode 100644
index a5d3a2aaf4..0000000000
--- a/sysdeps/unix/sysv/linux/aarch64/single-thread.h
+++ /dev/null
@@ -1,2 +0,0 @@
-#define SINGLE_THREAD_BY_GLOBAL
-#include_next <single-thread.h>
diff --git a/sysdeps/unix/sysv/linux/arc/single-thread.h b/sysdeps/unix/sysv/linux/arc/single-thread.h
deleted file mode 100644
index a5d3a2aaf4..0000000000
--- a/sysdeps/unix/sysv/linux/arc/single-thread.h
+++ /dev/null
@@ -1,2 +0,0 @@
-#define SINGLE_THREAD_BY_GLOBAL
-#include_next <single-thread.h>
diff --git a/sysdeps/unix/sysv/linux/arm/single-thread.h b/sysdeps/unix/sysv/linux/arm/single-thread.h
deleted file mode 100644
index a5d3a2aaf4..0000000000
--- a/sysdeps/unix/sysv/linux/arm/single-thread.h
+++ /dev/null
@@ -1,2 +0,0 @@
-#define SINGLE_THREAD_BY_GLOBAL
-#include_next <single-thread.h>
diff --git a/sysdeps/unix/sysv/linux/hppa/single-thread.h b/sysdeps/unix/sysv/linux/hppa/single-thread.h
deleted file mode 100644
index a5d3a2aaf4..0000000000
--- a/sysdeps/unix/sysv/linux/hppa/single-thread.h
+++ /dev/null
@@ -1,2 +0,0 @@
-#define SINGLE_THREAD_BY_GLOBAL
-#include_next <single-thread.h>
diff --git a/sysdeps/unix/sysv/linux/microblaze/single-thread.h b/sysdeps/unix/sysv/linux/microblaze/single-thread.h
deleted file mode 100644
index a5d3a2aaf4..0000000000
--- a/sysdeps/unix/sysv/linux/microblaze/single-thread.h
+++ /dev/null
@@ -1,2 +0,0 @@
-#define SINGLE_THREAD_BY_GLOBAL
-#include_next <single-thread.h>
diff --git a/sysdeps/unix/sysv/linux/s390/single-thread.h b/sysdeps/unix/sysv/linux/s390/single-thread.h
deleted file mode 100644
index a5d3a2aaf4..0000000000
--- a/sysdeps/unix/sysv/linux/s390/single-thread.h
+++ /dev/null
@@ -1,2 +0,0 @@
-#define SINGLE_THREAD_BY_GLOBAL
-#include_next <single-thread.h>
diff --git a/sysdeps/unix/sysv/linux/single-thread.h b/sysdeps/unix/sysv/linux/single-thread.h
deleted file mode 100644
index dd80e82c82..0000000000
--- a/sysdeps/unix/sysv/linux/single-thread.h
+++ /dev/null
@@ -1,30 +0,0 @@
-/* Single thread optimization, Linux version.
-   Copyright (C) 2019-2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#ifndef _SINGLE_THREAD_H
-#define _SINGLE_THREAD_H
-
-#ifndef __ASSEMBLER__
-# include <sys/single_threaded.h>
-#endif
-
-#define SINGLE_THREAD_P (__libc_single_threaded != 0)
-
-#define RTLD_SINGLE_THREAD_P SINGLE_THREAD_P
-
-#endif /* _SINGLE_THREAD_H  */
diff --git a/sysdeps/unix/sysv/linux/x86_64/single-thread.h b/sysdeps/unix/sysv/linux/x86_64/single-thread.h
deleted file mode 100644
index a5d3a2aaf4..0000000000
--- a/sysdeps/unix/sysv/linux/x86_64/single-thread.h
+++ /dev/null
@@ -1,2 +0,0 @@
-#define SINGLE_THREAD_BY_GLOBAL
-#include_next <single-thread.h>
-- 
2.34.1



* Re: [PATCH 1/4] misc: Optimize internal usage of __libc_single_threaded
  2022-06-08 16:49 ` [PATCH 1/4] misc: Optimize internal usage of __libc_single_threaded Adhemerval Zanella
@ 2022-06-08 17:44   ` Florian Weimer
  2022-06-08 18:03     ` Andreas Schwab
  2022-06-08 18:14     ` Adhemerval Zanella
  0 siblings, 2 replies; 10+ messages in thread
From: Florian Weimer @ 2022-06-08 17:44 UTC (permalink / raw)
  To: Adhemerval Zanella via Libc-alpha; +Cc: Wilco Dijkstra, Adhemerval Zanella

* Adhemerval Zanella via Libc-alpha:

> diff --git a/elf/libc_early_init.c b/elf/libc_early_init.c
> index 3c4a19cf6b..18966900c4 100644
> --- a/elf/libc_early_init.c
> +++ b/elf/libc_early_init.c
> @@ -16,7 +16,9 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */
>  
> +#include <assert.h>
>  #include <ctype.h>
> +#include <dlfcn.h>
>  #include <elision-conf.h>
>  #include <libc-early-init.h>
>  #include <libc-internal.h>
> @@ -38,6 +40,13 @@ __libc_early_init (_Bool initial)
>    __libc_single_threaded = initial;
>  
>  #ifdef SHARED
> +  /* _libc_single_thread can be accessed through copy relocations, so it
> +     requires to update the external copy.  */
> +  __libc_external_single_threaded = ___dlsym (RTLD_DEFAULT,
> +					      "__libc_single_threaded");
> +  assert (__libc_external_single_threaded != NULL);
> +  *__libc_external_single_threaded = initial;
> +
>    __libc_initial = initial;
>  #endif

Typo in the comment: [_]_libc_single_thread.

You must use __libc_dlsym, to avoid clobbering dlerror.  No need to add
___dlsym.

Is it necessary to cache the address?

> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
> index e7a099acb7..5633d01c62 100644
> --- a/nptl/pthread_create.c
> +++ b/nptl/pthread_create.c
> @@ -627,7 +627,11 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>    if (__libc_single_threaded)
>      {
>        late_init ();
> -      __libc_single_threaded = 0;
> +      __libc_single_threaded =
> +#ifdef SHARED
> +        *__libc_external_single_threaded =
> +#endif
> +	0;
>      }
>  
>    const struct pthread_attr *iattr = (struct pthread_attr *) attr;

I think you can call __libc_dlsym here.

Thanks,
Florian



* Re: [PATCH 1/4] misc: Optimize internal usage of __libc_single_threaded
  2022-06-08 17:44   ` Florian Weimer
@ 2022-06-08 18:03     ` Andreas Schwab
  2022-06-08 18:14     ` Adhemerval Zanella
  1 sibling, 0 replies; 10+ messages in thread
From: Andreas Schwab @ 2022-06-08 18:03 UTC (permalink / raw)
  To: Florian Weimer via Libc-alpha; +Cc: Florian Weimer, Wilco Dijkstra

On Jun 08 2022, Florian Weimer via Libc-alpha wrote:

> * Adhemerval Zanella via Libc-alpha:
>
>> diff --git a/elf/libc_early_init.c b/elf/libc_early_init.c
>> index 3c4a19cf6b..18966900c4 100644
>> --- a/elf/libc_early_init.c
>> +++ b/elf/libc_early_init.c
>> @@ -16,7 +16,9 @@
>>     License along with the GNU C Library; if not, see
>>     <https://www.gnu.org/licenses/>.  */
>>  
>> +#include <assert.h>
>>  #include <ctype.h>
>> +#include <dlfcn.h>
>>  #include <elision-conf.h>
>>  #include <libc-early-init.h>
>>  #include <libc-internal.h>
>> @@ -38,6 +40,13 @@ __libc_early_init (_Bool initial)
>>    __libc_single_threaded = initial;
>>  
>>  #ifdef SHARED
>> +  /* _libc_single_thread can be accessed through copy relocations, so it
>> +     requires to update the external copy.  */
>> +  __libc_external_single_threaded = ___dlsym (RTLD_DEFAULT,
>> +					      "__libc_single_threaded");
>> +  assert (__libc_external_single_threaded != NULL);
>> +  *__libc_external_single_threaded = initial;
>> +
>>    __libc_initial = initial;
>>  #endif
>
> Typo in the comment: [_]_libc_single_thread.

__libc_single_threaded

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

* Re: [PATCH 1/4] misc: Optimize internal usage of __libc_single_threaded
  2022-06-08 17:44   ` Florian Weimer
  2022-06-08 18:03     ` Andreas Schwab
@ 2022-06-08 18:14     ` Adhemerval Zanella
  2022-06-08 19:00       ` Adhemerval Zanella
  1 sibling, 1 reply; 10+ messages in thread
From: Adhemerval Zanella @ 2022-06-08 18:14 UTC (permalink / raw)
  To: Florian Weimer, Adhemerval Zanella via Libc-alpha; +Cc: Wilco Dijkstra



On 08/06/2022 14:44, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-alpha:
> 
>> diff --git a/elf/libc_early_init.c b/elf/libc_early_init.c
>> index 3c4a19cf6b..18966900c4 100644
>> --- a/elf/libc_early_init.c
>> +++ b/elf/libc_early_init.c
>> @@ -16,7 +16,9 @@
>>     License along with the GNU C Library; if not, see
>>     <https://www.gnu.org/licenses/>.  */
>>  
>> +#include <assert.h>
>>  #include <ctype.h>
>> +#include <dlfcn.h>
>>  #include <elision-conf.h>
>>  #include <libc-early-init.h>
>>  #include <libc-internal.h>
>> @@ -38,6 +40,13 @@ __libc_early_init (_Bool initial)
>>    __libc_single_threaded = initial;
>>  
>>  #ifdef SHARED
>> +  /* _libc_single_thread can be accessed through copy relocations, so it
>> +     requires to update the external copy.  */
>> +  __libc_external_single_threaded = ___dlsym (RTLD_DEFAULT,
>> +					      "__libc_single_threaded");
>> +  assert (__libc_external_single_threaded != NULL);
>> +  *__libc_external_single_threaded = initial;
>> +
>>    __libc_initial = initial;
>>  #endif
> 
> Typo in the comment: [_]_libc_single_thread.

Ack.

> 
> You must use __libc_dlsym, to avoid clobbering dlerror.  No need to add
> ___dlsym.

In fact we do need to use ___dlsym so we can use RTLD_DEFAULT; __libc_dlsym
does not support it and I am not sure how easy it would be to do it (I am
checking, but running into some issues).

> 
> Is it necessary to cache the address?

Not really, but it is a duplicated effort when pthread_create is called.
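
The trade-off can be sketched outside of glibc.  This is only an illustration
under stated assumptions: every name below is a hypothetical stand-in (the two
flags model libc's own definition and the copy-relocated instance in the
executable, and lookup_external models the symbol lookup), not the patch
itself:

```c
#include <stddef.h>

/* Hypothetical stand-ins, not glibc internals.  */
static char internal_flag = 1;   /* models libc's own definition */
static char external_copy = 1;   /* models the copy in the executable */
static char *cached_external;    /* lookup result saved at early init */
static int lookup_count;         /* how many lookups were performed */

/* Models the symbol lookup that finds the external copy.  */
static char *
lookup_external (void)
{
  lookup_count++;
  return &external_copy;
}

/* Early initialization: look up once, cache, and set both copies.  */
void
early_init (int initial)
{
  internal_flag = initial;
  cached_external = lookup_external ();
  *cached_external = initial;
}

/* First thread creation with the cached address: no second lookup.  */
void
mark_multi_threaded_cached (void)
{
  internal_flag = 0;
  *cached_external = 0;
}

/* First thread creation without caching: the lookup is repeated.  */
void
mark_multi_threaded_uncached (void)
{
  internal_flag = 0;
  *lookup_external () = 0;
}

/* Accessors so a caller can observe the sketch's state.  */
int sketch_lookup_count (void) { return lookup_count; }
int sketch_flags_agree (void) { return internal_flag == external_copy; }
```

Both variants keep the two copies in sync; the cached one trades a pointer of
storage for skipping the second lookup.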

> 
>> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
>> index e7a099acb7..5633d01c62 100644
>> --- a/nptl/pthread_create.c
>> +++ b/nptl/pthread_create.c
>> @@ -627,7 +627,11 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>>    if (__libc_single_threaded)
>>      {
>>        late_init ();
>> -      __libc_single_threaded = 0;
>> +      __libc_single_threaded =
>> +#ifdef SHARED
>> +        *__libc_external_single_threaded =
>> +#endif
>> +	0;
>>      }
>>  
>>    const struct pthread_attr *iattr = (struct pthread_attr *) attr;
> 
> I think you can call __libc_dlsym here.
> 
> Thanks,
> Florian
> 

* Re: [PATCH 1/4] misc: Optimize internal usage of __libc_single_threaded
  2022-06-08 18:14     ` Adhemerval Zanella
@ 2022-06-08 19:00       ` Adhemerval Zanella
  2022-06-08 19:41         ` Florian Weimer
  0 siblings, 1 reply; 10+ messages in thread
From: Adhemerval Zanella @ 2022-06-08 19:00 UTC (permalink / raw)
  To: Florian Weimer, Adhemerval Zanella via Libc-alpha; +Cc: Wilco Dijkstra



On 08/06/2022 15:14, Adhemerval Zanella wrote:
> 
> 
> On 08/06/2022 14:44, Florian Weimer wrote:
>> * Adhemerval Zanella via Libc-alpha:
>>
>>> diff --git a/elf/libc_early_init.c b/elf/libc_early_init.c
>>> index 3c4a19cf6b..18966900c4 100644
>>> --- a/elf/libc_early_init.c
>>> +++ b/elf/libc_early_init.c
>>> @@ -16,7 +16,9 @@
>>>     License along with the GNU C Library; if not, see
>>>     <https://www.gnu.org/licenses/>.  */
>>>  
>>> +#include <assert.h>
>>>  #include <ctype.h>
>>> +#include <dlfcn.h>
>>>  #include <elision-conf.h>
>>>  #include <libc-early-init.h>
>>>  #include <libc-internal.h>
>>> @@ -38,6 +40,13 @@ __libc_early_init (_Bool initial)
>>>    __libc_single_threaded = initial;
>>>  
>>>  #ifdef SHARED
>>> +  /* _libc_single_thread can be accessed through copy relocations, so it
>>> +     requires to update the external copy.  */
>>> +  __libc_external_single_threaded = ___dlsym (RTLD_DEFAULT,
>>> +					      "__libc_single_threaded");
>>> +  assert (__libc_external_single_threaded != NULL);
>>> +  *__libc_external_single_threaded = initial;
>>> +
>>>    __libc_initial = initial;
>>>  #endif
>>
>> Typo in the comment: [_]_libc_single_thread.
> 
> Ack.
> 
>>
>> You must use __libc_dlsym, to avoid clobbering dlerror.  No need to add
>> ___dlsym.
> 
> In fact we do need to use ___dlsym so we can use RTLD_DEFAULT; __libc_dlsym
> does not support it and I am not sure how easy it would be to do it (I am
> checking, but running into some issues).
> 

Ok, I could make __libc_dlsym work with RTLD_DEFAULT; the trick is we need to
use the global scope instead of the local one.

>>
>> Is it necessary to cache the address?
> 
> Not really, but it is a duplicated effort when pthread_create is called.
> 
>>
>>> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
>>> index e7a099acb7..5633d01c62 100644
>>> --- a/nptl/pthread_create.c
>>> +++ b/nptl/pthread_create.c
>>> @@ -627,7 +627,11 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>>>    if (__libc_single_threaded)
>>>      {
>>>        late_init ();
>>> -      __libc_single_threaded = 0;
>>> +      __libc_single_threaded =
>>> +#ifdef SHARED
>>> +        *__libc_external_single_threaded =
>>> +#endif
>>> +	0;
>>>      }
>>>  
>>>    const struct pthread_attr *iattr = (struct pthread_attr *) attr;
>>
>> I think you can call __libc_dlsym here.
>>
>> Thanks,
>> Florian
>>

* Re: [PATCH 1/4] misc: Optimize internal usage of __libc_single_threaded
  2022-06-08 19:00       ` Adhemerval Zanella
@ 2022-06-08 19:41         ` Florian Weimer
  0 siblings, 0 replies; 10+ messages in thread
From: Florian Weimer @ 2022-06-08 19:41 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: Adhemerval Zanella via Libc-alpha, Wilco Dijkstra

* Adhemerval Zanella:

> Ok, I could make __libc_dlsym work with RTLD_DEFAULT; the trick is we need to
> use the global scope instead of the local one.

Doesn't GL(dl_ns)[LM_ID_BASE]._ns_loaded work as the handle/link map?
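
A rough user-level analogue of this question, using only the public API: the
handle returned by dlopen (NULL, ...) refers to the main program, and a lookup
through it can be compared with an RTLD_DEFAULT lookup.  This is a sketch under
assumptions, not the internal __libc_dlsym path; the function name is made up,
and the RTLD_DEFAULT fallback assumes glibc's definition:

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Assumption: glibc defines RTLD_DEFAULT as a null handle.  The fallback
   only matters if _GNU_SOURCE did not take effect.  */
#ifndef RTLD_DEFAULT
# define RTLD_DEFAULT ((void *) 0)
#endif

/* Hypothetical helper: returns 1 if NAME resolves to the same address
   through RTLD_DEFAULT and through the main program's handle, 0 if the
   addresses differ, and -1 if either lookup fails.  */
int
same_address_via_both_lookups (const char *name)
{
  /* A null file name yields a handle for the main program.  */
  void *main_handle = dlopen (NULL, RTLD_LAZY);
  if (main_handle == NULL)
    return -1;

  void *via_default = dlsym (RTLD_DEFAULT, name);
  void *via_handle = dlsym (main_handle, name);
  dlclose (main_handle);

  if (via_default == NULL || via_handle == NULL)
    return -1;
  return via_default == via_handle;
}
```

In an ordinary dynamically linked program, a call such as
same_address_via_both_lookups ("environ") should return 1.  The sketch says
nothing about secondary namespaces, which is the case raised in the next
paragraph.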

If it is about multiple namespaces, I think we can make pthread_create
fail with ENOSYS in secondary namespaces.  It is currently broken
anyway.

Thanks,
Florian


Thread overview: 10+ messages
2022-06-08 16:49 [PATCH 0/4] Simplify internal single-threaded usage Adhemerval Zanella
2022-06-08 16:49 ` [PATCH 1/4] misc: Optimize internal usage of __libc_single_threaded Adhemerval Zanella
2022-06-08 17:44   ` Florian Weimer
2022-06-08 18:03     ` Andreas Schwab
2022-06-08 18:14     ` Adhemerval Zanella
2022-06-08 19:00       ` Adhemerval Zanella
2022-06-08 19:41         ` Florian Weimer
2022-06-08 16:49 ` [PATCH 2/4] Replace __libc_multiple_threads with __libc_single_threaded Adhemerval Zanella
2022-06-08 16:49 ` [PATCH 3/4] Remove usage of TLS_MULTIPLE_THREADS_IN_TCB Adhemerval Zanella
2022-06-08 16:49 ` [PATCH 4/4] Remove single-thread.h Adhemerval Zanella
